A case for hierarchical value sets

There has been sporadic previous discussion about the addition of hierarchical value sets. This is a much-needed addition, which would save us from splitting up concepts that should be one element into several. A good example is “Regularity” from Pulse/heart beat (https://ckm.openehr.org/ckm/archetypes/1013.1.4295):

I’d like to revive the discussion about this issue, hopefully to get this defined and implemented in tools as soon as possible. :innocent:


I think splitting (specialising further) the possible codes is allowed. Why do you want to avoid that? I would say it is even semantically perfectly fine.

It is allowed, but currently, at-codes and ac-codes (in the id/at/ac scheme of ADL2) can only be specialised according to the specialisation level of the archetype in which they appear. We had another thread in which we surmised that we could potentially relax this, such that you could create e.g. the following, all in one archetype:

at4    - irregular heartbeat
  at4.1 - irregularly irregular heartbeat
    at4.1.1 - weirdly irregularly irregular heartbeat
  at4.2 - regularly irregular heartbeat
etc
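To make the current restriction concrete, here is a toy sketch (illustrative Python only, not actual openEHR tooling; the function names are mine) of the depth rule and the relaxation discussed above:

```python
# Sketch (not openEHR tooling): the current ADL2 rule ties the specialisation
# depth of an at/ac/id code (its number of '.'-separated extra segments) to
# the specialisation level of the archetype it appears in.

def code_depth(code: str) -> int:
    """Depth of a code: 'at4' -> 0, 'at4.1' -> 1, 'at4.1.1' -> 2."""
    return code.count(".")

def allowed_in_archetype(code: str, archetype_level: int, relaxed: bool = False) -> bool:
    """Current rule: depth must equal the archetype's specialisation level.
    The relaxation discussed above would allow deeper codes in one archetype."""
    if relaxed:
        return code_depth(code) >= archetype_level
    return code_depth(code) == archetype_level

# In a top-level archetype (level 0) today:
assert allowed_in_archetype("at4", 0)
assert not allowed_in_archetype("at4.1.1", 0)
# With the proposed relaxation, the whole hierarchy could live in one archetype:
assert allowed_in_archetype("at4.1.1", 0, relaxed=True)
```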

Now, this might seem liberating, but I suspect it will quickly get out of hand, as specialised archetypes start creating specialised codes as well, but tools can no longer easily figure out where there are clashes etc. And pretty soon, we will start wishing we had an openEHR Terminology to contain all the at/ac code content of archetypes. Which we possibly should be thinking about!

Anyway, we will need to contemplate this as a group.

We are mixing up 2 slightly different ideas here.

  1. The ability to specialise terms in a valueset, which I think we agree is allowed but not supported in tooling. The ADL2 rule which I think is too restrictive is that any specialised code must inherit semantically from an existing code. I think it is sufficient, and far more realistic, to rule that any new codes must logically live within the scope of the valueset. Either way, this is a rule that is almost impossible to police technically. We have to rely on the modellers’ good sense.

  2. The ability to lay out the terms as a hierarchy, not just as a flat list (which FHIR valuesets allow). In a perfect world this would follow the semantics closely, but that is not always the case - that was one reason why the Ocean OTS terminology server had some very sophisticated ways of re-shaping SNOMED valuesets to fit front-line use.

at3    - regular heartbeat
at4    - irregular heartbeat
  - at4.1 - irregularly irregular heartbeat
  - at4.2 - regularly irregular heartbeat
  - at4.1.1 - weirdly irregularly irregular heartbeat

This is because we want to flatten out the bottom level of the irregular hierarchy.
FHIR uses a nested ‘contains’ structure for this, making it clear that no semantic relationship is inferred by such nesting. I’d certainly be happy to adopt that approach - it would make conversion/reuse between openEHR and FHIR much easier as well.
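For illustration only - a hand-written Python structure mirroring the shape of a FHIR ValueSet expansion, not an official resource instance - the nesting carries no semantics, so a consumer simply collects all codes regardless of level:

```python
# Illustrative only: FHIR ValueSet expansions use nested 'contains' entries
# purely for presentation; the nesting implies no IS-A relationship.
expansion = {
    "contains": [
        {"code": "at3", "display": "regular heartbeat"},
        {"code": "at4", "display": "irregular heartbeat", "contains": [
            {"code": "at4.1", "display": "irregularly irregular heartbeat"},
            {"code": "at4.2", "display": "regularly irregular heartbeat"},
            {"code": "at4.1.1", "display": "weirdly irregularly irregular heartbeat"},
        ]},
    ]
}

def flatten(entries):
    """Collect every code, regardless of nesting level."""
    for e in entries:
        yield e["code"]
        yield from flatten(e.get("contains", []))

assert list(flatten(expansion["contains"])) == [
    "at3", "at4", "at4.1", "at4.2", "at4.1.1"]
```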

I do also think we need our valuesets to allow for mixed codesystems. It is horrible, but right now it is very common for local/national codesystems to need to be intermixed with SNOMED or LOINC.

Well, ADL2 doesn’t require that a specialised code such as at4.1 ‘inherit semantically’ from its parent at4. It’s just the case that the former code specialises the latter, which means it is treated as an IS-A child. That is the way it will be computed in tools, runtime EHRs and especially queries. Of course, none of these tools look at the words associated with the codes.

So if you go and define at4.1 as not being a semantic specialisation of at4, none of the info systems or tools will care, but humans, CDS, GDL and anything relying on semantics won’t work properly when something coded with say ‘brain tumour’ comes under ‘angina’.

I don’t know what it means to say that ‘new codes must logically live within the value set’. We didn’t invent any new rules about how terminology works in archetype-land - we’re just following standard subsumption rules that any HIT system expects. To make a value set work properly, and ‘add’ new codes to it, requires that those new codes are children of some code(s) already in the value set; otherwise you just have a different value-set.

If we break that rule, bad things will happen, because data from a specialised archetype will no longer conform to the parent archetype. This logic is the same as in Snomed or any ontology.
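Expressed as code, the subset rule is easy to sketch - illustrative Python only, not real openEHR tooling, and the function names are mine:

```python
# Sketch: in ADL2, IS-A is computed from the code strings themselves, so
# subsumption is a simple prefix check.
def subsumed_by(code: str, ancestor: str) -> bool:
    return code == ancestor or code.startswith(ancestor + ".")

def conforms(child_vs: set, parent_vs: set) -> bool:
    """Golden-rule check: every code in the specialised value set must be
    subsumed by some code already in the parent value set."""
    return all(any(subsumed_by(c, p) for p in parent_vs) for c in child_vs)

parent = {"at10", "at11", "at12"}
assert conforms({"at10", "at12.1"}, parent)     # at12.1 specialises at12: OK
assert not conforms({"at10", "at0.1"}, parent)  # 'fatal' as at0.1 breaks the rule
```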

Having said all that, if the newly added ‘terminology constraint strength’ is used to make the constraint extensible or preferred, then I guess you could add whatever you like to a local value set in a specialised archetype. But most of the runtime environment will just be ignoring whatever those value-sets say anyway, since they’re really just different kinds of suggestion.

But working HIT systems do not use termsets wholly based on standard subsumption rules - they are very often a mix, because the terms that people want to work with often come from different subsumption hierarchies. We learnt that ages ago. It is exactly why terminology tooling such as OTS incorporates a whole bunch of facilities to create termsets that are not wholly based on subsumption hierarchies.

We discussed this before.

If I have a list which says

Mild
Moderate
Severe

and a specialisation wishes to add ‘Fatal’, I would say this is perfectly acceptable. It is in line with the scope and intent of the valueset as a whole, unlike say ‘Blue’.

Which parent code do I inherit from? I would push back on inheriting from any parent: the closest match, which would say that Fatal inherits from Severe, seems pretty spurious if not actually misleading. In any case, it is most unlikely we will be doing any heavy-duty subsumption in internal term lists - the pulse example is as far as it goes.

You are trying to impose a level of ontological rigour which is simply not supported by any terminology, nor indeed should it be, and which does not in any case line up with practical system development.
Just because terms do not inherit does not stop them being computable or adding value.

Of course their generalisability is more limited but that’s a trade-off that the modeller/system designer has to make

I’d like others’ views please, as I cannot see this as in any way workable or necessary. We have been talking about this long enough - time to decide. It is stopping us getting ADL2 into all the tooling.

Having said all that, can we also address Silje’s question, which is not about ontology but visual containment/nesting.

As it stands, the FHIR ValueSet is hitting a lot more usability points from my PoV.

Yes please. :blush:

And in turn can you give us your thoughts on the other issue around specialisations of existing codes, or more accurately whether it should be possible to extend an existing valueset, whether in a specialisation or a template to add terms that are not direct descendants of an existing term. Archetype Editor allowed us to do that, but not Archetype Designer as it is following current ADL2 rules. Or at least that is my understanding.

Ok, that is true. I had not realised you were trying to make archetype value-sets work like intensional ref-sets.

NB: everything that follows applies to the strict case, which we now call required, in a terminology constraint.

But ultimately, the same principle still applies down the specialisation lineage. In other words, if you establish an initial value set, say ac4, with a mix of codes that are theoretically from multiple subsumption hierarchies (i.e. meaning that’s where they would be found if they were in SNOMED or FMA or OGMS or wherever), then if you want to specialise the value-set in a specialised archetype, you are still up for following standard subsumption rules under each code in the original value set. If you don’t do this, it means that the data built according to that child archetype can contain codes that are not any kind of the things in the parent version of the value-set, which means:

  • a) the golden rule is broken, i.e. data created according to a child archetype doesn’t conform to the parent archetype.
  • b) the value-set in the parent archetype isn’t controlling anything - but if it is required, then the overall system will break.

In your example, ‘fatal’ should be defined as a child of ‘severe’ (one assumes death is a fairly severe thing…). If you create it separately, you will get something like this:

  • at10 - mild
  • at11 - moderate
  • at12 - severe
  • at0.1 - fatal

This will fail archetype validation, because at0.1 not being a child of anything already in the set breaks the golden rule. But let’s just imagine such data were illegally written to the EHR anyway: when a query processor using just the shared parent archetype runs and looks for instances containing <<at10 or <<at12, it won’t find the ‘fatal’ cases. It can’t, because it doesn’t know anything about any at0.1.
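As a toy illustration of why the query misses the illegal code - a Python sketch of mine, not how an AQL engine is actually built:

```python
# Sketch: in ADL2, subsumption is computed from the code strings, so
# '<<at12' means "at12 or any code of the form at12.x, at12.x.y, ...".
def subsumed_by(code: str, ancestor: str) -> bool:
    return code == ancestor or code.startswith(ancestor + ".")

# Data in the EHR, including the illegally written 'fatal' code at0.1:
ehr_data = ["at10", "at12", "at0.1"]

# A query over the shared parent archetype for <<at10 or <<at12:
hits = [c for c in ehr_data
        if subsumed_by(c, "at10") or subsumed_by(c, "at12")]
assert hits == ["at10", "at12"]  # the 'fatal' (at0.1) record is invisible
```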

None of this logic is specific to ADL - it’s just the basic maths of constraint logic: specialisations are subsets, otherwise the data don’t validate against the parent models. This isn’t really about ‘ontological rigour’ or anything like that - it’s just about making systems and tools function correctly, based on the assumptions of the artefacts they are computing on.

I appreciate that this isn’t always convenient, but computers are dumb - so we have to feed them things they expect, or else they will malfunction.

Once again, if the ‘strength’ is not set to required, then you could presumably do what you like in child archetypes. (I had not really thought this far in doing the ‘strength’ CR, so we need to discuss that case.) I would have thought that this change took care of the problem - if you know enough to specify required in the parent, it means you have built an exhaustive value-set (possibly including some NOC-like term); if the authors don’t know whether they have full coverage, then they should just set extensible.

Just to be 100% clear - I don’t want to impede any evolution in handling terminology or whatever else. I just want to make sure the specs and the software based on them retain internal integrity, and things keep working as expected.


As long as they are NOT domain types, it should be supported (as they wouldn’t be any different from any other node in terms of specialisation).

It seems to me that required means you must stick to Thomas’ golden rule.
For any of the other binding strengths, all bets are off and you can extend the value set. This comes with reduced computability, of course, whenever you set any binding strength other than required.
(For extensible you could imagine an implicit “other” code as part of the parent value set: it is this implicit code that is being specialised when you add a completely new code.)

So as long as nobody suggests that we need to extend value sets with binding strength required we are more or less on the same page?

One question I have is whether one can specialise existing(!) codes of a value set if the binding strength is required. Here Thomas’ golden rule is still in place, but I’m not sure whether this is still, strictly speaking, an extension (and thus e.g. binding strength extensible). If you look at FHIR at https://www.hl7.org/fhir/terminologies.html#extensible, it seems they would consider it extensible in that case?

Not sure where this is leading us, but for purely visual hierarchies à la FHIR’s “contains”, you probably then also want a way to express whether a parent is abstract or not (I mean: whether it can be selected by a user as a code itself, or only any of its leaf nodes).

Yes.

This is what I want. In my example the child terms are strict semantic subclasses of their parent, but as Ian says this is often not the case.

This is what I want to be able to do. Note that this is not in a specialised archetype.

(I also want to be able to specialise terms in a value set in a specialised archetype, but that is a different thing. Could we move that discussion to a new thread? :smile:)


Yes, you can do this, but not in a specialised archetype (so you can’t do what you want just yet :slight_smile:)

However, I think you should consider that a PR, and if someone could create it with sufficient links to @siljelb and @ian.mcnicoll original discussion / points etc, that would be much appreciated (sorry I am in the middle of major home renovation and will be following things badly for a while).

So if we could then separate out Ian’s query, which AFAIK is solved by the value-set ‘strength’ work item already underway - @ian.mcnicoll if you think it is not, at least @sebastian.garde and I don’t seem to be understanding why, so we need to go a bit further.

I’ve given it a try: https://openehr.atlassian.net/projects/SPECPR/issues/SPECPR-364



Thanks Sebastian,

A good summary of how we might have (possibly accidentally!!) arrived at consensus.

  1. The golden rule applies for ‘required’ valuesets but not for other, more relaxed flavours.

  2. I think we should apply the rule strictly, even to specialisations of ‘required’. I only say that because I think you will find that we apply ‘required’ pretty rarely in archetypes (only where adherence is a clinical safety issue), and much more commonly in templates.

Agree re the hierarchies. We probably do need something to signify that a branch node is potentially not selectable.


I think we are good @Thomas - consensus has been achieved - the white smoke has appeared.


I love it when something burning is a good thing. I do need to add more into the constraint strength CR to cover specialisation rules for non-required status term constraints.


Just a thought: we don’t actually need to represent the hierarchy in the codes themselves (in fact, giving semantics to the codes is usually regarded as a bad idea). If a hierarchy is needed, maybe it is just a matter of defining the codes as plain atxxxx codes and adding more semantic rules (SKOS?) to relate the atxxxx codes in a more semantic way, maybe even as triplets:

{at1,skos:broader,at2}

I propose this because I can see problems when you redefine child codes across several archetypes:

at1 (a code)
at1.1 & at1.2 (child codes of at1, in the same archetype)

The problem is that at1.1 and at1.2 don’t even need to be at the same “semantic level” while both being children of their parent. If you add more codes later in the specialisation, you cannot correctly express the hierarchy (e.g. adding at1.3 that is semantically a child of at1 and a parent of at1.2).
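A rough sketch of how the triple idea could work - illustrative Python, with skos:broader used loosely and the names mine. Re-parenting at1.2 under a later-added at1.3 is then just a change of triples, with no renumbering of codes:

```python
# Sketch: keep the at-codes flat and assert the hierarchy separately via
# skos:broader-style (child, parent) pairs.
broader = {
    ("at1.1", "at1"),
    ("at1.2", "at1.3"),  # at1.2 re-parented under the later-added at1.3
    ("at1.3", "at1"),
}

def ancestors(code):
    """Follow the broader links transitively."""
    for child, parent in broader:
        if child == code:
            yield parent
            yield from ancestors(parent)

# at1.2 now sits under at1.3, which in turn sits under at1:
assert set(ancestors("at1.2")) == {"at1.3", "at1"}
```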

I could be attracted to this, but it means that now we would have two ways that values (at-) codes can be specialised: by the usual specialisation (at4 → at4.1), and by adding relationships. I think it will be hard to keep track of this.

Unless… specialisation of the form at4 → at4.1 → at4.1.6 (in child archetypes) could be modified to always cause the addition of a relationship, as you suggest. Then, if we add this new method of simply asserting other relationships, all semantic hierarchies of at-codes (ADL2) would be determined by those relationships, not by the codes.

So we would treat code specialisation of the form at4 → at4.1, id9 → id9.3 etc. as something like a ‘mechanical’ mode of specialisation, whereas the addition of relationships would be understood as asserting proper IS-A (or even other) relationships.

This is not a solid theory, just reacting to @yampeku !
