Proposed slightly radical change to CODE_PHRASE in Text package in openEHR

Dear all,

we just came across a "feature" of the current openEHR Reference Model which, in the light of current implementations (particularly XML), seems worth altering slightly. The change has no semantic effect; it only affects how data are persisted. For my liking at least, we are too close to the Release 1.0 date to be making such changes, but on the other hand I suspect that this change would be of universal benefit.

The class concerned from the current model is as follows:

class CODE_PHRASE
    terminology_id: TERMINOLOGY_ID -- Identifier of the distinct terminology from which the code_string (or its elements) was extracted.
    code_string: String -- The key used by the terminology service to identify a concept or coordination of concepts.
        -- This string is most likely parsable inside the terminology service, but nothing can be assumed
        -- about its syntax outside that context.
end

The effect of this in XML data is to have to store:

    <name xsi:type="DV_CODED_TEXT">
        <value>clinical finding</value>
        <defining_code>
            <code_string>404684003</code_string>
            <terminology_id>
                <value>SNOMED-CT</value>
            </terminology_id>
        </defining_code>
    </name>

The PROPOSED CHANGE to the class is as follows:

class CODE_PHRASE
    value: String
        -- the string concatenation form of the term, in the form
        -- "terminology_id::code_string", e.g. "SNOMED-CT::404684003"

    terminology_id(): TERMINOLOGY_ID {}
        -- NOW A FUNCTION
        -- Identifier of the distinct terminology from which the code_string (or its elements) was extracted.

    code_string(): String {}
        -- NOW A FUNCTION
        -- The key used by the terminology service to identify a concept or coordination of concepts.
        -- This string is most likely parsable inside the terminology service, but nothing can be assumed
        -- about its syntax outside that context.
end

The effect of this in XML data would be to store only:

    <name xsi:type="DV_CODED_TEXT">
          <value>clinical finding</value>
          <defining_code>
                 <value>SNOMED-CT::404684003</value>
          </defining_code>
    </name>

We believe this will also enable more powerful path expressions using the syntax form of coded terms, since the whole coded term code is now just one attribute, e.g. "SNOMED-CT::404684003".

openEHR already uses "micro-syntaxes" for various kinds of identifiers, such as archetype id, and a few other things, like units, so we see this proposed change as compatible with the existing state of affairs. It also has no semantic effect on the object-oriented view of CODE_PHRASE, since terminology_id and code_string both return the same values as before (it is just that now they are computed).
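
To make the "computed" point concrete, here is a minimal Python sketch of the proposed class. It is only an illustration, not part of any openEHR implementation: it assumes the "::" separator from the example above, splits on its first occurrence, and simplifies TERMINOLOGY_ID to a plain string.

```python
from dataclasses import dataclass

# Separator taken from the "terminology_id::code_string" form in the proposal.
SEPARATOR = "::"


@dataclass(frozen=True)
class CodePhrase:
    """Sketch of the proposed CODE_PHRASE: only `value` is stored."""

    value: str  # e.g. "SNOMED-CT::404684003"

    def terminology_id(self) -> str:
        # Computed: everything before the first "::".
        # Assumption: TERMINOLOGY_ID is simplified to a plain string here.
        return self.value.partition(SEPARATOR)[0]

    def code_string(self) -> str:
        # Computed: everything after the first "::"; the code stays opaque.
        return self.value.partition(SEPARATOR)[2]


phrase = CodePhrase("SNOMED-CT::404684003")
print(phrase.terminology_id())
print(phrase.code_string())
```

Callers that only use terminology_id() and code_string() see the same values as with the two stored fields; only the persisted form differs.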

The only downside to this change we can see is that some current implementors will need to change their software and schemas.

In the balance of considerations, I would prefer to impose the nuisance value now, and have a better reference model to live with for the next 5-10 years.

Do others agree?
Are there violent objections?
Does anyone see a flaw in the proposal?

thanks,

- thomas beale

Hi Tom

I prefer the original structure, mainly from a general dislike of aggregate
attributes (such as the new one combining the terminology and the code id).
I can see the need to decompose the attribute to talk to the right
terminology being a nuisance and it could make some sensible operations
difficult when dealing with CODE_PHRASEs from multiple terminologies.

Cheers,
Hugh

Hi,

As an engineer, I prefer the new proposal.
A code without its terminology is like a number without its unit - and engineers have been trained to hate that :wink:
So IMHO: the closer the two concepts are to one another, the better the model.

Cheers,

Philippe

Hugh Grady wrote:

Sorry for not knowing better but I'll ask anyway; why
not something like:

<name xsi:type="DV_CODED_TEXT">
   <value>clinical finding</value>
   <defining_code>
      <code_string terminology_id="SNOMED-CT">404684003</code_string>
   </defining_code>
</name>

Ed Dodds

Hi Ed
This is pretty close to the current schema - we are trying to keep attributes to a minimum.
Sam

Ed Dodds wrote:

Hugh Grady wrote:

Hi Tom

I prefer the original structure, mainly from a general dislike of aggregate
attributes (such as the new one combining the terminology and the code id).

In general I agree with the sentiment; I would only say that coded terms are somewhat of an exception - or any identifier with a name-space prepended, which is essentially what this is.

I can see the need to decompose the attribute to talk to the right
terminology being a nuisance and it could make some sensible operations
difficult when dealing with CODE_PHRASEs from multiple terminologies.

Can you elaborate on the last point above, Hugh?

- thomas

Ed Dodds wrote:

Sorry for not knowing better but I'll ask anyway; why
not something like:

<name xsi:type="DV_CODED_TEXT">
   <value>clinical finding</value>
   <defining_code>
      <code_string terminology_id="SNOMED-CT">404684003</code_string>
   </defining_code>
</name>

Hi Ed,

this works in XML-schema only, but not in any other formalism. We also don't make any special use of XML attributes, since it does not map to object models - the only exception we make with this is the use of archetype_node_id in all data nodes, which is implemented as an XML-attribute in the XML-schema. But otherwise we use a systematic mapping of all object-oriented properties to XML elements. Doing otherwise causes a lot of problems - there ends up being an arbitrarily chosen mapping of object attributes to XML attributes or elements - often for completely irrelevant aesthetic reasons.

What we are suggesting is that if the syntax form "SNOMED-CT::404684003" were used as the persistent form of the object, then it would work in any formalism, including XML, where it would appear as shown in my original post (second example).

- thomas beale

sorry I missed this one - didn't get the message in fact - something odd going on with this list....

Heath Frankel wrote:

Tom,
My only comment relates to the resulting XML schema. Is there any reason we
couldn't simplify the XML further to the following:

    <name xsi:type="DV_CODED_TEXT">
          <value>clinical finding</value>
          <defining_code>SNOMED-CT::404684003</defining_code>
    </name>

I don't care too much what the XML looks like - i.e. if it diverges from the form it should have if strictly following the object structure - which would be the following:

     <name xsi:type="DV_CODED_TEXT">
           <value>clinical finding</value>
           <defining_code><value>SNOMED-CT::404684003</value></defining_code>
     </name>

since CODE_PHRASE is still an object; whereas your (neater) XML treats it as a String field or function of DV_CODED_TEXT. Maybe this is the solution...to be technically correct, you could even have a new function or field defined on DV_CODED_TEXT called e.g. defining_code_string, then your XML would be exactly correct:

     <name xsi:type="DV_CODED_TEXT">
           <value>clinical finding</value>
           <defining_code_string>SNOMED-CT::404684003</defining_code_string>
     </name>

Even your initial XML is "correct", as long as we are happy to make the XML-schema diverge from the object model at that level. I think we have to be resigned to that kind of thing with XML, since otherwise it just generates too much garbage.

Having data enclosed within the defining_code element indicates that this is
the value anyway so we don't need the additional value child element. The
only potential downside of this is if there are additional attributes or
associations added to code_phrase later which would need to then be
represented as follows:

    <name xsi:type="DV_CODED_TEXT">
          <value>clinical finding</value>
          <defining_code>SNOMED-CT::404684003
               <some_other_attribute>some data</some_other_attribute>
          </defining_code>
    </name>

Even though the result is still valid XML it is not the normal
representation in XML.

Is the above valid XML? I didn't know you could do that...But it seems pretty unorthodox, especially if we want to have a clear mapping to object structures, which I think we should consider the "statement of truth".
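
On the well-formedness question, a quick check is possible with Python's standard xml.etree.ElementTree. The sketch below parses only the defining_code fragment from the example (some_other_attribute is the hypothetical element from that example). It shows that an element can carry text and a child element together (mixed content). Whether a given XML-schema accepts it is a separate matter, since in W3C XML Schema the complex type would have to be declared as mixed.

```python
import xml.etree.ElementTree as ET

# Fragment taken from the example above; text and a child element share one parent.
fragment = """
<defining_code>SNOMED-CT::404684003
    <some_other_attribute>some data</some_other_attribute>
</defining_code>
"""

elem = ET.fromstring(fragment.strip())

# The leading text of the element, with the trailing newline and indentation removed.
code = elem.text.strip()
# The text of the child element.
other = elem.findtext("some_other_attribute")

print(code)
print(other)
```

Note that the code has to be recovered by trimming whitespace from the element's text, which is part of why this layout is awkward to map back to an object structure.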

We would then need to change the schema to
include the value element again. What is the likelihood of additional
attributes to Code_Phrase?

Sorry for the discussion of what the XML looks like, but you started it :>.

touché :wink: The thing to keep in mind is to do with paths. We will obviously use XPaths in XML data, but we also use XPath-like paths as logical paths against objects in memory, and non-XML forms of the data. We want all these paths to be the same (or let's say, the XPath-like openEHR paths to be a nearly strict syntactical subset of the W3C XPath syntax). In these paths, ideally we would be able to just reference something like SNOMED-CT::404684003 rather than having to write stuff like
    some_attr [ terminology_id = "SNOMED-CT" AND code_string= "404684003" ]
...well, it seems like a nicer idea, but maybe it doesn't matter that much? Doing the latter means you can more easily have expressions like:
    some_attr [ terminology_id = "SNOMED-CT" AND ( code_string= "404684003" OR code_string = "404684017") ]
and so on, which Hugh or someone else mentioned here....if we think this is a distinct possibility, then sticking with the current model may be better anyway. On the other hand, if we think that the XML instance will just be too heavy because of this, we should go with the new proposal.
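
The trade-off between the two forms can be tried out with a rough Python sketch using the standard xml.etree.ElementTree module, which supports only a small XPath subset. The wrapper element `data`, the xmlns:xsi declaration and the helper names match_current and match_proposed are assumptions made for this illustration; the inner XML is taken from the examples earlier in the thread.

```python
import xml.etree.ElementTree as ET

# Namespace declaration added so that xsi:type parses; not shown in the thread.
XSI = 'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'

CURRENT_FORM = f"""
<data {XSI}>
  <name xsi:type="DV_CODED_TEXT">
    <value>clinical finding</value>
    <defining_code>
      <code_string>404684003</code_string>
      <terminology_id><value>SNOMED-CT</value></terminology_id>
    </defining_code>
  </name>
</data>
"""

PROPOSED_FORM = f"""
<data {XSI}>
  <name xsi:type="DV_CODED_TEXT">
    <value>clinical finding</value>
    <defining_code><value>SNOMED-CT::404684003</value></defining_code>
  </name>
</data>
"""


def match_current(root, terminology, codes):
    """Match one terminology and any of several codes, using the separate fields."""
    hits = []
    for name in root.iter("name"):
        code = name.findtext("defining_code/code_string")
        term = name.findtext("defining_code/terminology_id/value")
        if term == terminology and code in codes:
            hits.append(name.findtext("value"))
    return hits


def match_proposed(root, terminology, codes):
    """Same query on the single-string form: each full key is rebuilt first."""
    wanted = {f"{terminology}::{code}" for code in codes}
    return [
        name.findtext("value")
        for name in root.iter("name")
        if name.findtext("defining_code/value") in wanted
    ]


current_root = ET.fromstring(CURRENT_FORM.strip())
proposed_root = ET.fromstring(PROPOSED_FORM.strip())
codes = {"404684003", "404684017"}

# Exact lookup of one coded term is a single predicate in the proposed form.
exact = proposed_root.findall(".//defining_code[value='SNOMED-CT::404684003']")

print(len(exact))
print(match_current(current_root, "SNOMED-CT", codes))
print(match_proposed(proposed_root, "SNOMED-CT", codes))
```

The single-string form makes the exact lookup short, while the "one terminology, several codes" query reads more naturally against the separate fields.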

Further thoughts? (Given that we are very close to our 1.0 release date, I am inclined to leave this as it is, and allow for a divergence in the XML-schema which enables fewer tags, as in Heath's original piece of XML).

- thomas beale

This thread seemed to start with the assumptions that firstly, a
DV_CODED_TEXT will inevitably be stored in XML and that, secondly, the
mapping to XML will be a very direct one, even up to the names of the
corresponding elements. Are either of these valid assumptions? Even if they
are, should the low-level implementation details really be a concern in the
ADL language design? I'm sure there are all sorts of clever tricks that
will be used to persist openEHR data efficiently, and it's by no means a
given that the persistence solution will involve XML when large datasets are
being managed.

I'm coming around to this argument, and I am also against making openEHR specifications particularly XML oriented (what if XML disappears one day?-)... Heath Frankel also pointed out to me privately that although having SNOMED::12314134 etc in the XML might be nice sometimes, it could also be incredibly annoying, since a) you have to keep stripping the XXXXX:: bit off all the codes, and b) you may want to search on the code value or the terminology id separately for various reasons.

I think that the correct answer to this is to leave the spec as is, and potentially to put a function into DV_CODED_TEXT that will generate the single string form; adding such a function makes no difference to the persisted data or anything else in the reference model.
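
A minimal Python sketch of that idea follows, assuming the function is called defining_code_string (the name floated earlier in the thread) and simplifying TERMINOLOGY_ID to a plain string. The two fields stay stored exactly as in the current model; the single-string form is derived on demand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodePhrase:
    """CODE_PHRASE as in the current model: two stored fields."""

    terminology_id: str  # simplified: TERMINOLOGY_ID in the reference model
    code_string: str


@dataclass(frozen=True)
class DvCodedText:
    """DV_CODED_TEXT with an extra derived function; stored data is unchanged."""

    value: str
    defining_code: CodePhrase

    def defining_code_string(self) -> str:
        # Derived on demand in the "terminology_id::code_string" form.
        code = self.defining_code
        return f"{code.terminology_id}::{code.code_string}"


name = DvCodedText("clinical finding", CodePhrase("SNOMED-CT", "404684003"))
print(name.defining_code_string())
```

Searching on the code value or the terminology id alone still works directly on the stored fields, with no prefix to strip.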

The XML-schema still has the freedom to use the single string form if desired, but I think it might be better if we went the orthodox object route there as well. We will need more implementation experience to know which is best.

- thomas beale