DV_CODED_TEXT with open/extensible set of codes (value set)

I would like to pick up a discussion we started in the ADL2 meeting last year in Braunschweig:

We had another request (this time from HiGHmed) to support an open/extensible set of codes in a DV_CODED_TEXT.
In this case, the reason is that they cannot standardise all codes across the various sites, but wherever it is possible they want to use the available codes as defined. You will probably have seen various variations of such a request (e.g.: we cannot use Snomed, but need to use another terminology; we can only mandate a few, but then there may be more from the same terminology; we want to use our codes where possible, but need normal free text as well.)

To my knowledge there is not really a good way of supporting these use cases, neither in 1.4 nor 2.0.

Some have argued this should not be done, and this was my opinion a few years ago as well, but for better or worse, I do not think we can or should ignore this any longer. The reason is that ignoring these requirements will only lead to workarounds we like even less from an idealistic point of view, and there is no hope that these requirements will go away in the next couple of years or even decades.
In any case, even if this does not result in a spec change but a clear recommendation of what to do (and not do) in such cases, that would be valuable as well.

See a summary of the initial discussion at https://openehr.atlassian.net/wiki/spaces/ADL/pages/386007225/Local+Value-set+Replacement - We had identified 3 “Candidate Solutions”:

  • use a choice of DV_CODED_TEXT + DV_TEXT
  • subsumption code + reduction redefinitions
  • new required markers - such as a required flag on C_TERMINOLOGY_CODE or a recommendation: required | preferred | example - the latter is similar to what FHIR is doing, if I remember correctly.

The concrete example is a DV_TEXT that is coded using a terminology in the oet. But the oet’s limit-to-list flag cannot be applied in this case (the corresponding list_open flag is on CString and cannot be applied to the Code Phrase). In case you are interested in the details of this example, see https://ckm.highmed.org/ckm/templates/1246.169.69/27 (Look for: Probenart). In addition, the “limit-to-list” discussion from the Braunschweig meeting is also related: https://openehr.atlassian.net/wiki/spaces/ADL/pages/386007194/OET+Template+-+Limit-to-list

From my point of view, I think we need to discuss what could be allowed in addition to the codes available in the coded text, and how this could be expressed:

  • any (free or coded) text
  • any other codes from the same specified terminology (snomed)
  • any other codes from any terminology

The first is what HiGHmed need, and what is probably the most common requirement, and maybe all that is required.
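
The three options listed above could be sketched roughly as follows. This is only an illustration in Python: the `Openness` enum, the `Value` tuple and the `allowed` function are hypothetical names, not part of ADL or the RM, and the code strings are placeholders.

```python
from enum import Enum
from typing import NamedTuple, Optional


class Openness(Enum):
    """Hypothetical labels for what is allowed beyond the listed codes."""
    ANY_TEXT = 1          # any (free or coded) text
    SAME_TERMINOLOGY = 2  # any other code from the same terminology
    ANY_TERMINOLOGY = 3   # any other code from any terminology


class Value(NamedTuple):
    text: str
    terminology: Optional[str] = None  # None means plain (uncoded) text
    code: Optional[str] = None


def allowed(value: Value, terminology: str, listed: set, openness: Openness) -> bool:
    """Return True if the value is acceptable under the given openness."""
    is_coded = value.terminology is not None and value.code is not None
    if is_coded and value.terminology == terminology and value.code in listed:
        return True  # one of the codes defined in the archetype
    if openness is Openness.ANY_TEXT:
        return True  # free text or any other code is acceptable
    if openness is Openness.SAME_TERMINOLOGY:
        return is_coded and value.terminology == terminology
    return is_coded  # ANY_TERMINOLOGY: must be coded, from any terminology


listed_codes = {"c1", "c2"}  # placeholder codes, not real SNOMED codes
free_text = Value("something local")
other_snomed = Value("other", "snomed", "c9")
local_code = Value("local", "local", "x1")
```

Under this reading, `allowed(free_text, "snomed", listed_codes, Openness.ANY_TEXT)` is true, while the same free text is rejected by the other two options because it carries no code.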

Anyway, I hope we can continue the Braunschweig discussion and come to a conclusion.

For cross-reference - see https://openehr.atlassian.net/browse/SPECPR-70 (an older, closely related, PR).

Further to today’s discussion, and @ian.mcnicoll’s worry about whether local code-sets can be ‘broken’ in a specialisation by adding a new code, not just restricting (as per standard constraint logic): this can be done, if you assume that the parent code set has this new marker of extensible | preferred or even example - i.e. not required. This is exactly the intention of this setting on a C_TERMINOLOGY_CODE - to allow the universal constraint rules to be broken.

So I think the real question is: assuming we implement this (in ADL2, as required: Boolean + strength or similar, and the same on C_CODE_PHRASE for ADL 1.4), then for all existing (ADL1.4) archetypes (say, in CKM), what are the values of those settings? I presume it is required = False and strength = preferred?

Potentially all existing archetypes would need to be migrated to include these settings, or we could define the additions in such a way that required = False and strength = preferred are the result if no changes are made to the archetype.

@rong.chen said something about only applying this logic to bindings - I’m not sure I quite understand yet. But guessing: if we bind ac1 to some http://snomed.info/vset/1234 we want to mark that with required+strength as above, but on the other hand if ac1 is internally defined to be {at1, at2, at3}, it seems to me logical to want to be able to apply the same settings to the latter as well. But I may have misunderstood.

Also, just to be clear on why value-set constraints are just the same as any other constraint in a machine-processing sense: the universal rule we have is that the data created according to any archetype will be a valid instance of any parent of that archetype. If that is not the case, archetypes don’t do anything, and querying based on matching specialised archetypes also won’t work.

So the required + strength marker on C_TERMINOLOGY_CODE indicates whether this rule can be broken or not. If in some particular data created by (local, obscure) archetype Ca (a specialisation of international archetype C), there is a coded field with required = False and a preferred vset of {mild, moderate, severe}, then if you put a code 'mild to moderate' (according to @ian.mcnicoll’s example) in that field, then this will still validate against the archetype C (the one that everyone has access to), precisely because of that setting.
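
A rough Python sketch of this example, assuming a hypothetical `validate` function and treating the value set as plain strings (neither is taken from the ADL specification):

```python
def validate(code: str, value_set: set, required: bool) -> bool:
    """Hypothetical check of one coded value against a parent constraint.

    With required=True the value set is a hard constraint; with
    required=False it is only a suggestion, so any code passes.
    """
    return code in value_set or not required


# Value set of the international archetype C, as in the example above.
parent_vset = {"mild", "moderate", "severe"}

# Data created via the local specialisation Ca, which added a code.
print(validate("mild to moderate", parent_vset, required=False))  # True
print(validate("mild to moderate", parent_vset, required=True))   # False
print(validate("mild", parent_vset, required=True))               # True
```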

As soon as you have required = False on C_TERMINOLOGY_CODE, in fact from a machine point of view, there is no constraint, it’s just different flavours of suggestion. I believe this is what our intention is.

Well, this isn’t true if we allow new nodes derived from RM classes to be created in specializations/templates, and this is exactly the same thing but with data types.

So-called ‘new’ nodes can only be created if the archetype permits it. By default, container attributes do allow such nodes, unless specifically prohibited using an exclusion node. See here.

However, the value range of a leaf type, i.e. Integer, String, Date, Terminology_code etc is not a set of child objects of a container, but a set of possible values for an attribute of that type. Just as a specialised archetype can’t redefine an Integer range (which is just a set of possible values) {|0..10|} to be {|0..11|}, neither can it redefine a Terminology_code value set from {at1, at2, at3} to {at1, at2, at3, at4}. If either of these redefinitions were allowed, the golden rule of conformance to parent archetypes of data created by specialisations would be violated.
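
The conformance rule for these two cases can be illustrated with a small Python sketch. The function names are made up for this post, and intervals are modelled as inclusive `(lower, upper)` tuples, which is a simplification of the real constraint types.

```python
def interval_conforms(child: tuple, parent: tuple) -> bool:
    """True if the child interval lies within the parent interval (inclusive)."""
    child_lo, child_hi = child
    parent_lo, parent_hi = parent
    return parent_lo <= child_lo and child_hi <= parent_hi


def value_set_conforms(child: set, parent: set) -> bool:
    """True if every code allowed by the child is also allowed by the parent."""
    return child <= parent


parent_codes = {"at1", "at2", "at3"}

# Widening is rejected, narrowing is accepted.
print(interval_conforms((0, 11), (0, 10)))                             # False
print(interval_conforms((2, 8), (0, 10)))                              # True
print(value_set_conforms({"at1", "at2", "at3", "at4"}, parent_codes))  # False
print(value_set_conforms({"at1", "at3"}, parent_codes))                # True
```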

Breaking the golden rule is pretty unattractive, since it means you are back to having no control over your data, and you’ll just get garbage, as in most systems today - the better ones only control input to some extent by virtue of UI controls with separately defined limits.

The proposal for required + strength for C_TERMINOLOGY_CODE can either be seen as a means of breaking the rule but documenting it, or it may be understood that, by default, terminology constraints are just various flavours of suggestion, unless explicitly marked as ‘required’, and so most of the time, no reliable constraint is actually in place at all, and so no or very limited machine processing can take place. Which seems to correspond to the situation in terminology as it is today…

Just to illustrate with a timely example from the Covid-19 modelling.

We created a specialization of the health risk evaluation archetype. The purpose of this was to model the Covid-19 risk factors. It was great to use the archetype for such purposes regarding the translations into different languages and also the documentation of the information model.

The table below lists the first risk factors.

| Term | Description |
| --- | --- |
| Contact with confirmed Covid-19 case | Contact with confirmed Covid-19 case within 14 days before symptom onset. |
| Contact with suspected case/ pneumonia case | Contact with suspected case/ pneumonia case within 14 days before symptom onset. |
| Contact with birds in China | Contact with birds in China in 10 days before symptom onset. |
| Contact with confirmed human case of Avian flu in China | Contact with confirmed human case of Avian flu in China in 10 days before symptom onset. |
| Contact with severe, unexplained respiratory disease | Contact with severe, unexplained respiratory disease in 10 days before symptom onset. |
| Potential contact exposure based on location | Potential contact exposure based on location. |

Later in the epidemic, new risk factors were added. Some might even be national or regional. We wanted a way to add more terms into the archetype.

And with the rapid development of the disease we also wanted to be able to add free text in some situations.

Yeah, I was referring to creating new DV_TEXT/DV_CODED_TEXT nodes, which are actually objects. Domain types kind of hide it a little, but it’s just an RM-specific way of defining alternatives.

I would disagree about having garbage: you would still have an actual archetype/template that governs your data. It wouldn’t strictly be a specialization, but neither are specializations now, with the ‘open unless specifically prohibited’ approach. I would argue that this is less critical in this case because the semantics of the element node can be very clear (sometimes you could even infer the expression that defines the subset).

Arguably, if the same ‘open unless specifically prohibited’ approach is also allowed here, then we could also simulate the strength attribute in ADL (i.e. we can probably find equivalent prohibitions, or fix things to certain values and prohibit others, to mimic that behavior), so strength probably becomes useful syntactic sugar.

So I would assume this corresponds to the need to mark the term constraint as either preferred or maybe extensible as per our discussion yesterday?

Well actually, strictly speaking, so-called ‘added’ nodes like ELEMENT and so on are proper specialisations, since they still fit the data space defined by the RM (e.g. CLUSTER.items: List<ITEM> can always have another ELEMENT of whatever form) - unless ‘closed’ or prohibited in some way. Currently the spec assumes ‘open unless closed’, but the opposite assumption could be made, or it could even be settable on a template or at OPT generation. Which way it is doesn’t matter that much, as long as it is clear what the rules actually are in some particular data processing situation.

Creating a new DV_CODED_TEXT with extra codes in it, for the same data item that you already have a DV_CODED_TEXT[id4] for in the parent archetype might work in a kind of messy way, since that new DV_CODED_TEXT[id0.2] would not be understood as a specialisation of the existing one in the parent, but rather an alternative (in the ADL understanding of that word) to the one in the parent. So the runtime processor would allow you to put in any of the terms in either the [id4] or [id0.2] nodes which will create the effect you want in the data, but now there is no longer a single constraint containing the logical intended value set - it’s spread across two constraints. It would be like creating 2 separate Integer range alternatives like {|0..3|} and {|4..10|} to obtain the effect of {|0..10|}. So if you were visualising the archetype, or doing any tool processing, it won’t be obvious that some of the alternative constraints should logically be joined together.
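
The Integer analogy can be shown with a few lines of Python. This is only a sketch with made-up names: ranges are inclusive `(lower, upper)` tuples, and a list of tuples stands in for ADL alternatives.

```python
# Two separate alternatives, as a tool would see them in the archetype.
alternatives = [(0, 3), (4, 10)]

# The single logical range that was actually intended.
single = (0, 10)


def accepted_by_alternatives(value: int) -> bool:
    """True if any one of the alternative ranges accepts the value."""
    return any(lo <= value <= hi for lo, hi in alternatives)


def accepted_by_single(value: int) -> bool:
    """True if the single range accepts the value."""
    lo, hi = single
    return lo <= value <= hi


# The effect on the data is identical for every integer tried here...
same_effect = all(
    accepted_by_alternatives(v) == accepted_by_single(v) for v in range(-5, 16)
)
print(same_effect)        # True

# ...but structurally there are two constraints rather than one.
print(len(alternatives))  # 2
```

A visualiser or other tool would have to work out for itself that the two entries belong together, which is the messiness described above.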

Anyway, do you agree that the required + strength change achieves what we want (assume that the required is really just a function that checks to see if strength = required) or do we need something more? I’m not sure if I understood the last bit, you might need to expand on that a bit!

The last bit was related to your second paragraph: explicit id4 specializations could be created, and id4 could even be prohibited. This would also give some actual meaning to the occurrences in DV (i.e. if it is defined as alternatives of 1..1 objects it wouldn’t be possible to remove “terms” from the subset, but if they are defined as 0..1 then you probably can).

In any case I think the proposed change would be enough :slight_smile:

Sounds good to me.
It has the additional advantage (over my initial proposal with code_list_open in 1.4) that 1.4 and 2.0 are more aligned. (My intention in suggesting code_list_open was to keep the 1.4 change minimal and aligned with C_STRING’s list_open.)

OET’s “limit-to-list=false” in the described case would then probably best match “required = false / strength=extensible” (unless someone thinks strength=“preferred” is better?)

I think the default if not specified for existing archetypes should be required=true / strength=required.
That’s my understanding of what it means at the moment and I wouldn’t change that. No migration needed, no change in meaning.
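
For illustration, this default could look as follows in code. A hedged Python sketch: `Strength`, `CodeConstraint` and its fields are made-up names, not the actual AOM classes, and `required` is derived from `strength` as suggested earlier in the thread.

```python
from dataclasses import dataclass
from enum import Enum


class Strength(Enum):
    """Hypothetical enumeration of the strength values discussed here."""
    REQUIRED = "required"
    EXTENSIBLE = "extensible"
    PREFERRED = "preferred"
    EXAMPLE = "example"


@dataclass
class CodeConstraint:
    """Illustrative stand-in for a terminology code constraint."""
    codes: frozenset
    # Archetypes that say nothing keep today's meaning: a closed list.
    strength: Strength = Strength.REQUIRED

    @property
    def required(self) -> bool:
        """Derived flag: true only when strength is 'required'."""
        return self.strength is Strength.REQUIRED


# An existing archetype with no marker: no migration, no change in meaning.
existing = CodeConstraint(frozenset({"at1", "at2", "at3"}))

# A constraint that has been explicitly opened up.
opened = CodeConstraint(frozenset({"at1", "at2", "at3"}), Strength.EXTENSIBLE)

print(existing.required)  # True
print(opened.required)    # False
```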

Well that’s certainly what the formal meaning is. But is that compatible with @ian.mcnicoll’s requirements that internal code-sets can be ‘expanded’?

Yes, I’m OK with that - it reflects the current position in ADL1.4. It can be the default for existing archetypes where not expressly stated but we can always change existing archetypes (I assume going from required = false to true would be a breaking change) or decide what is needed for new archetypes.

I agree on this.

As an alternative, could all elements with the combination DV_CODED_TEXT with a value set and DV_TEXT, have required=false as default?

I can see why you want this, but technically it would require interpreting a DV_CODED_TEXT with a certain definition differently depending on whether there was a DV_TEXT next to it. But I don’t think we need to do it quite like that - consider that the presence of the DV_TEXT enables you to use the DV_TEXT anyway, regardless of what the DV_CODED_TEXT says. So if the DV_CODED_TEXT says ‘required’ but there is a DV_TEXT as well, then I think the correct interpretation is:

  • if you code this, it must be from these codes
  • or you can use text
  • (but you can’t use some other code)
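
As a minimal sketch of that interpretation in Python (the function and parameter names are hypothetical, and a value is reduced to a text plus an optional code):

```python
from typing import Optional


def valid_element_value(text: str, code: Optional[str], allowed_codes: set) -> bool:
    """Check a value against a 'required' DV_CODED_TEXT plus a DV_TEXT alternative.

    code is None for a plain DV_TEXT-like value; otherwise the value is
    treated as DV_CODED_TEXT-like and must use one of the listed codes.
    """
    if code is None:
        return True  # the DV_TEXT alternative accepts any text
    return code in allowed_codes  # if you code it, it must be from these codes


allowed_codes = {"at1", "at2", "at3"}

print(valid_element_value("one of the listed terms", "at1", allowed_codes))  # True
print(valid_element_value("some local wording", None, allowed_codes))        # True
print(valid_element_value("some other code", "at9", allowed_codes))          # False
```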

Does this achieve what you want?

But… the entire point of making this change was to get away from the DV_CODED_TEXT + DV_TEXT pattern?

Then I am confused: I read your previous question to be referring to archetypes that do contain a constraint of the form DV_CODED_TEXT + DV_TEXT (i.e. as currently exists)?

Technically (i.e. from a software processing point of view), if there is no DV_TEXT at all, then it means that you can’t create a DV_TEXT instance for that data item, and satisfy the archetype. I don’t see any way out of that…

But maybe I misunderstood the requirement.

The DV_CODED_TEXT + DV_TEXT pattern is generally (only?) used when we identify that an element needs to be coded and there is a set of terms that are generally used, but we can’t be sure that the codes we include in the DV_CODED_TEXT are the only ones for every use case. The idea is that the DV_TEXT would be made into a DV_CODED_TEXT, and new terms/codes added, in the template. We’ve been told that’s not a good pattern to use, for Technical Reasons.