Rules in archetypes - getting it right

I’ve been cogitating on how rules could be written properly in archetypes, a topic I never spent sufficient time on in the past. As usual, @pieterbos has pushed the envelope and forced me to think about this more carefully. @borut.fabjan, @yampeku and @sebastian.garde might want to take a look, since they are close to tools. If @ian.mcnicoll could take a look from the authoring point of view, that would also be good. (Obviously I would like everyone to have a look; I’m just pinging those groups since they probably have concrete opinions.)

The challenge in getting rules (i.e. expressions in the form of assertions, assignments etc) in archetypes right is to solve the semantics of how they map to runtime data instances. Rules are instance-level entities whose symbols attach to values at runtime. However, as defined in an archetype, they can only refer indirectly to data, via the archetype paths. So the question is what happens at runtime exactly? Given that data will usually have multiple occurrences of particular structures (e.g. Event structures, multiple Elements in a Cluster, etc), how do those multiple instances get mapped to the symbols in rules? Couldn’t there be an explosion of permutations? Generally speaking, it is intuitively obvious how rules are intended to be applied, but specifying them formally so as to be implemented is not quite so obvious.

Below I propose a couple of variants on how we should understand rules in archetypes that we could use to make sense of these questions.

The first is the default approach. This uses the Apgar sum as a simple example of a rule. In the below, you will see the rule, and near the bottom, the bindings to paths. The intention of the rule is that it is applied to each Apgar Event, i.e. potentially up to 5 times (typically 2 or 3).

archetype (adl_version=2.0.6; rm_release=1.0.3; generated; uid=f27fac48-3acb-4061-9619-c783fd8346ab)
  openEHR-EHR-OBSERVATION.apgar.v1.0.1

description
  lifecycle_state = <"unmanaged">
  original_author = <...>
  details = <...>
    
definition
  OBSERVATION[id1] ∈ {  -- Apgar score
    data ∈ {
      HISTORY[id3] ∈ {
        events ∈ {
          POINT_EVENT[id4] occurrences ∈ {0..1} ∈ { -- 1 minute
            offset ∈ {
              DV_DURATION[id42] ∈ {
                value ∈ {PT1M}
              }
            }
            data ∈ {
              ITEM_LIST[id2] ∈ {
                items ∈ {
                  ELEMENT[id10] occurrences ∈ {0..1} ∈ {
                    value ∈ {
                      DV_ORDINAL[id43] ∈ {
                        [value, symbol] ∈ { -- apgar_respiratory_value
                          [{0}, {[at11]}],
                          [{1}, {[at12]}],
                          [{2}, {[at13]}]
                        }
                      }
                    }
                  }
                  ELEMENT[id6] occurrences ∈ {0..1} ∈ {...}
                  ELEMENT[id14] occurrences ∈ {0..1} ∈ {...}
                  ELEMENT[id18] occurrences ∈ {0..1} ∈ {...}
                  ELEMENT[id22] occurrences ∈ {0..1} ∈ {...}
                  ELEMENT[id26] occurrences ∈ {0..1} ∈ {
                    value ∈ {
                      DV_COUNT[id48] ∈ {
                        magnitude ∈ {|0..10|} -- apgar_total_value
                      }
                    }
                  }
                }
              }
            }
          }
          POINT_EVENT[id27] occurrences ∈ {0..1} ∈ { -- 2 minutes
            offset ∈ {
              DV_DURATION[id49] ∈ {
                value ∈ {PT2M}
              }
            }
            data ∈ {
              use_node ITEM_LIST[id50] /data[id3]/events[id4]/data[id2]
            }
          }
          POINT_EVENT[id28] occurrences ∈ {0..1} ∈ {...}  -- 3 minutes
          POINT_EVENT[id29] occurrences ∈ {0..1} ∈ {...}  -- 5 minutes
          POINT_EVENT[id32] occurrences ∈ {0..1} ∈ {...}  -- 10 minutes
        }
      }
    }
  }
  
rules
  check apgar_total_value = apgar_heartrate_value + apgar_respiratory_value + 
        apgar_reflex_value + apgar_muscle_tone_value + apgar_skin_colour_value
    
symbols
  symbol_definitions = <
    ["en"] = <
       ["apgar_respiratory_value"] = <
           text = <"Apgar score respiratory value">
       >
       ["apgar_heartrate_value"] = <
           text = <"Apgar score heartrate value">
       >
       ["apgar_muscle_tone_value"] = <
           text = <"Apgar score muscle tone value">
       >
       ["apgar_reflex_value"] = <
           text = <"Apgar score reflex value">
       >
       ["apgar_skin_colour_value"] = <
           text = <"Apgar score skin_colour value">
       >
       ["apgar_total_value"] = <
           text = <"Apgar score total value">
       >
    >
  >

  symbol_bindings = <
    ["apgar_respiratory_value"] =   <"/data[id3]/events/data[id2]/items[id10]/value[id43]/value">
    ["apgar_heartrate_value"] =     <"/data[id3]/events/data[id2]/items[id6]/value[id44]/value">
    ["apgar_muscle_tone_value"] =   <"/data[id3]/events/data[id2]/items[id14]/value[id45]/value">
    ["apgar_reflex_value"] =        <"/data[id3]/events/data[id2]/items[id18]/value[id46]/value">
    ["apgar_skin_colour_value"] =   <"/data[id3]/events/data[id2]/items[id22]/value[id47]/value">
    ["apgar_total_value"] =         <"/data[id3]/events/data[id2]/items[id26]/value[id48]/magnitude">
  >

These bindings are to absolute paths, i.e. from the archetype root. But if we think about it more carefully, the rule is actually logically embedded within the enclosing ITEM_LIST object, and it could even be written like that (an idea I proposed 10 years ago):

              ITEM_LIST[id2] ∈ {
                items ∈ {
                  ELEMENT[id10] occurrences ∈ {0..1} ∈ {
                    value ∈ {
                      DV_ORDINAL[id43] ∈ {
                        [value, symbol] ∈ { -- apgar_respiratory_value
                          [{0}, {[at11]}],
                          [{1}, {[at12]}],
                          [{2}, {[at13]}]
                        }
                      }
                    }
                  }
                  ELEMENT[id6] occurrences ∈ {0..1} ∈ {...}
                  ELEMENT[id14] occurrences ∈ {0..1} ∈ {...}
                  ELEMENT[id18] occurrences ∈ {0..1} ∈ {...}
                  ELEMENT[id22] occurrences ∈ {0..1} ∈ {...}
                  ELEMENT[id26] occurrences ∈ {0..1} ∈ {
                    value ∈ {
                      DV_COUNT[id48] ∈ {
                        magnitude ∈ {|0..10|} -- apgar_total_value
                      }
                    }
                  }
                }
                rules
                    check apgar_total_value = apgar_heartrate_value + 
                                              apgar_respiratory_value + 
                                              apgar_reflex_value + 
                                              apgar_muscle_tone_value + 
                                              apgar_skin_colour_value
              }

For this to make sense, the bindings would need to be something like this, i.e. relative to the path of the enclosing object:

    symbol_bindings = <
      ["apgar_respiratory_value"] =   <"data[id2]/items[id10]/value[id43]/value">
      ["apgar_heartrate_value"] =     <"data[id2]/items[id6]/value[id44]/value">
      ["apgar_muscle_tone_value"] =   <"data[id2]/items[id14]/value[id45]/value">
      ["apgar_reflex_value"] =        <"data[id2]/items[id18]/value[id46]/value">
      ["apgar_skin_colour_value"] =   <"data[id2]/items[id22]/value[id47]/value">
      ["apgar_total_value"] =         <"data[id2]/items[id26]/value[id48]/magnitude">
    >

Now if we want to stick to rules expressed only at the top-level of an archetype, we could define the path bindings like this:

    symbol_bindings = <
      ["apgar_events"] =   <
          root = <"/data[id3]/events">
          children = <
              ["apgar_respiratory_value"] =   <"data[id2]/items[id10]/value[id43]/value">
              ["apgar_heartrate_value"] =     <"data[id2]/items[id6]/value[id44]/value">
              ["apgar_muscle_tone_value"] =   <"data[id2]/items[id14]/value[id45]/value">
              ["apgar_reflex_value"] =        <"data[id2]/items[id18]/value[id46]/value">
              ["apgar_skin_colour_value"] =   <"data[id2]/items[id22]/value[id47]/value">
              ["apgar_total_value"] =         <"data[id2]/items[id26]/value[id48]/magnitude">
          >
       >
    >

In the above, there are now ‘root points’ and ‘child paths’. The rule could then be expressed like this:

rules
  check
    for_all evt: apgar_events | 
         evt/apgar_total_value = 
           evt/apgar_heartrate_value + 
           evt/apgar_respiratory_value + 
           evt/apgar_reflex_value + 
           evt/apgar_muscle_tone_value + 
           evt/apgar_skin_colour_value

This form of the rule can easily be executed at runtime, since the bindings of the instance data items to the symbols are now unambiguous. However, both the rule and the bindings are more complex. If we just used the original form from the first example, the semantics that this more complex rule and bindings make explicit would have to be inferred.
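As a sketch of how a runtime might execute this scoped form, assuming JSON-like instance data and purely illustrative names, paths and helper functions (this is not an openEHR API):

```python
# Sketch of scoped rule evaluation: resolve the 'root' binding to a list of
# event nodes, then evaluate the 'child' bindings relative to each one.
# All data shapes and names here are simplified for illustration.

def resolve(node, path):
    """Follow a '/'-separated path of attribute names through nested dicts."""
    for segment in path.strip("/").split("/"):
        node = node[segment]
    return node

def check_for_all(data, root_path, child_paths, rule):
    """Apply 'rule' once per node matched by root_path (a multiply-valued node).

    child_paths maps symbol names to paths relative to each matched node;
    rule receives a dict of symbol -> value and returns True/False.
    """
    events = resolve(data, root_path)          # e.g. the list of Apgar events
    return all(
        rule({sym: resolve(evt, p) for sym, p in child_paths.items()})
        for evt in events
    )

# One Apgar event, with already-simplified leaf values:
data = {"data": {"events": [
    {"respiratory": 2, "heartrate": 2, "reflex": 1,
     "muscle_tone": 2, "skin_colour": 2, "total": 9},
]}}

child_paths = {s: s for s in
               ["respiratory", "heartrate", "reflex",
                "muscle_tone", "skin_colour", "total"]}

apgar_rule = lambda v: v["total"] == (v["respiratory"] + v["heartrate"] +
                                      v["reflex"] + v["muscle_tone"] +
                                      v["skin_colour"])

print(check_for_all(data, "/data/events", child_paths, apgar_rule))  # True
```

The point is that once root and child bindings are explicit, the engine never has to guess which instances belong together: each event carries its own set of symbol values.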

We could potentially allow the following in order to simplify the rule a bit:

rules
  check
    apgar_events.for_all (
         apgar_total_value = 
         apgar_heartrate_value + 
         apgar_respiratory_value + 
         apgar_reflex_value + 
         apgar_muscle_tone_value + 
         apgar_skin_colour_value
    )

This requires the runtime to infer that the expression inside the parentheses is really something like the following lambda:

  agent (evt: EVENT) {
    evt.child(apgar_total_value) = evt.child(apgar_heartrate_value) + ...
  }
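In Python terms, the inferred agent could be sketched as a closure over a child-access function (all names hypothetical, and `child()` standing in for whatever per-event path lookup the runtime provides):

```python
# Illustrative Python analogue of the inferred lambda: the runtime wraps the
# bare expression into a per-event function before applying for_all.

def make_agent(child):
    """child(evt, symbol) fetches the value bound to 'symbol' under evt."""
    def agent(evt):
        return child(evt, "apgar_total_value") == (
            child(evt, "apgar_heartrate_value") +
            child(evt, "apgar_respiratory_value") +
            child(evt, "apgar_reflex_value") +
            child(evt, "apgar_muscle_tone_value") +
            child(evt, "apgar_skin_colour_value"))
    return agent

# With dict-based events, child() can be a plain key lookup:
agent = make_agent(lambda evt, sym: evt[sym])
evt = {"apgar_total_value": 9, "apgar_heartrate_value": 2,
       "apgar_respiratory_value": 2, "apgar_reflex_value": 1,
       "apgar_muscle_tone_value": 2, "apgar_skin_colour_value": 2}
print(agent(evt))  # True
```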

There are more challenges when the rule(s) include paths from elsewhere in the archetype (e.g. /protocol or /state in an openEHR archetype) because then you have some paths inside a multiply valued node (EVENT[*]) and some outside. I won’t complicate the story with that for now, since I think the above gives us enough to think about.

In summary: the question is about striking the right balance between explicitly stated and inferred semantics for rules within archetypes.

Reality check: it should be remembered that there are other places rules could be applied to data. For example, a more flexible approach would be to associate a DLM (Decision Logic Module) with a template, or just with an application. This allows rules to be written that can be bound to:

  • variables anywhere in a template, not just in an archetype
  • data returned from AQL queries ranging across completely separate Compositions etc from the EHR
  • data not even in an openEHR system, e.g. MPI demographic data.

The general case is the reason for the work on DLMs and the Subject Proxy Service.

I mention this because I think that there are limits on what we should try to do with rules inside archetypes, and the complexity of the binding problem may not justify making them too powerful. Others may disagree :wink:

My personally preferred solution would be the more explicit (i.e. more complex) representation, because:

  • it makes implementation of rule execution much easier
  • we can assume that rule-writing will be tool assisted, e.g. AD, LinkEHR etc would add some smarts to make it easy for the author to select paths etc, and the tool will generate the correct rule expression, bindings etc.
  • rules probably won’t be used that much in archetypes anyway, since more realistic rules will usually range over variables from multiple archetypes within a template or an application retrieval data set.

(It is also simpler to specify in the ADL specification.)

All feedback welcome.

This is how we deal with permutations in LinkEHR: by defining “object builders” that tell you what that subtree should look like. Object builders can also have conditions (for example, from this root get only the medications with no end time), and the rules are then applied only to those.

I looked for a relatively simple actual example of a score calculation that is a bit more complex than the Apgar score. I found the Motricity Index. It’s a couple of simple sums, with two sub-scores, a summed total, and an average of the two scores as its final score — with the exception that the result of 33 + 33 + 33 is defined as 100 instead of 99. Below I added a simple visualisation of the archetype and the rules, because it is slightly easier to read than the raw rules. All the elements are ordinals, except for the summed scores, which are DV_COUNTs.

This is a visualization of an ADL 2 archetype, I uploaded it as openEHR-EHR-OBSERVATION.motricity_index.v1.0.0.adls (22.9 KB)

Note that here some fields are both used as input and output bindings at the same time, and it assumes that the rules are evaluated from top to bottom. Whether this is correct according to the standard is impossible to say, since the execution model is not defined, but it is how we interpreted it.
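A minimal sketch of the top-to-bottom evaluation this assumes: rules run in order, so a field assigned by an earlier rule can be read by a later one. The element names are illustrative (taken from the obvious Motricity items), and this is our interpretation of the execution model, not a defined standard:

```python
# Motricity Index, evaluated top to bottom: arm_score and leg_score are both
# outputs (of the sub-score rules) and inputs (to the summed/total rules).

def motricity(v):
    """v: dict of element values; mutated in rule order, as in the archetype."""
    arm = v["pinch_grip"] + v["elbow_extension"] + v["shoulder_abduction"]
    v["arm_score"] = 100 if arm == 99 else arm        # 33+33+33 is defined as 100
    leg = v["ankle_dorsiflexion"] + v["knee_extension"] + v["hip_flexion"]
    v["leg_score"] = 100 if leg == 99 else leg
    v["summed_score"] = v["arm_score"] + v["leg_score"]   # reads earlier outputs
    v["total_score"] = v["summed_score"] / 2
    return v

r = motricity({"pinch_grip": 33, "elbow_extension": 33, "shoulder_abduction": 33,
               "ankle_dorsiflexion": 33, "knee_extension": 33, "hip_flexion": 33})
print(r["summed_score"], r["total_score"])  # 200 100.0
```

If the engine did not guarantee this ordering (or build a dependency graph to the same effect), the summed and total scores could be computed from stale values.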

Some observations:

  • using subscores that make sense as separate scores in the model, and that are also used to compute a total score, is very common. That means there needs to be some way to express this, for example with bindings that are both input and output at the same time
  • naming the input/output in different languages is obviously useful in many places, but in archetypes specifically, I wonder how useful these are - they could also be auto-generated from the archetype, as shown in the visualisation. For CDS/GDL that is obviously very different.
  • names for the tags for rules in many languages could actually be useful, as well as custom messages to show in language if validations defined in rules are failing.
  • some earlier versions of a new syntax for this expression language did not allow for variables, only direct path bindings. This example would be a problem without variables.
  • It could be a good idea to explicitly bind these rules to specific parts of the archetype, with the semantics ‘these should be executed and valid for every occurrence in this part of this archetype’. This could be as simple as defining a for_all block around all these rules, saying it’s for all events in the motricity score. Note that in these cases, it could still be required to use values outside the given scope, at least as long as they are single-valued or processed with some operation or function that accepts lists.

But we probably need agreement on the following before working on this:

  • A clear list of use cases for these
  • A list of goals we want to achieve
  • A number of people actually wanting to help write and implement a new standard.

Without those being present, I find a new language very hard to justify.


Nice example!

Just to be clear: we already have a new language (meta-model = BMM Expressions; syntax = EL; Antlr4 grammar imminent), the question is just how much we use it in archetypes. The basic difference in this newer language is that it separates expressions from what the symbols are bound to, so that all expressions and rules, no matter where they appear, have the same syntax and semantics.

I’m fine if we don’t use this in archetypes - in fact, my main proposal is just to introduce the path-binding so as to make the rules purely symbolic. But we can stick with the inline paths as well, although it complicates specification maintenance a bit, because we need to reinstate the older expression language (which was never completed, and had significant limitations). But that’s life in specification-land - if that’s the consensus, I’m happy to do it.

I think the bigger question is what do we see as the general approach for representing:

  • general CDS guidelines, scores, plan-related logic etc that range over data sets larger than just an archetype
  • cross-field logic in application forms

I have some concern that if we represent say Apgar, GCS, Barthel, etc in one way, and these CDS and plan rules in a different way then it is at least confusing as to how we sell this to the outside world.

Here’s an example of some CDS guideline logic:

Some of these have related plans, which can be seen if you scroll back up. The notion of ‘subscores’ is generalised to rule chains.

It would be easy to represent say Apgar, GCS and other rules that relate to single archetypes in this way as well. If we put them in archetypes, we have 2 representations.

Indeed there are already GDL2 representations of:

So concrete questions in the future might be: where do I look for openEHR’s GCS score calculator; where do I look for openEHR’s qRisk3 guideline?

Once again - I’m fine with all this as long as we are making a conscious choice about it. This is a separate question from whether we make an official release that captures the state of the ADL rules in the form implemented by Nedap and Veratech (and indeed by me, in ADL Workbench 10 years ago) - we are doing that with ADL 2.1.0 anyway.


I converted the motricity example to the symbolic form, just to think about it. Below, the original, then the converted.

rules
    $arm_score:Integer ::= /data[id2]/events[id3]/data[id4]/items[id5]/items[id7]/value/value + /data[id2]/events[id3]/data[id4]/items[id5]/items[id9]/value/value + /data[id2]/events[id3]/data[id4]/items[id5]/items[id11]/value/value
    arm: $arm_score < 99 implies /data[id2]/events[id3]/data[id4]/items[id5]/items[id13]/value/magnitude = $arm_score
    arm_round_up: $arm_score = 99 implies /data[id2]/events[id3]/data[id4]/items[id5]/items[id13]/value/magnitude = 100
    $leg_score:Integer ::= /data[id2]/events[id3]/data[id4]/items[id6]/items[id14]/value/value + /data[id2]/events[id3]/data[id4]/items[id6]/items[id16]/value/value + /data[id2]/events[id3]/data[id4]/items[id6]/items[id18]/value/value
    leg: $leg_score < 99 implies /data[id2]/events[id3]/data[id4]/items[id6]/items[id20]/value/magnitude = $leg_score
    leg_round_up: $leg_score = 99 implies /data[id2]/events[id3]/data[id4]/items[id6]/items[id20]/value/magnitude = 100
    sum_score: /data[id2]/events[id3]/data[id4]/items[id24]/value/magnitude = /data[id2]/events[id3]/data[id4]/items[id5]/items[id13]/value/magnitude + /data[id2]/events[id3]/data[id4]/items[id6]/items[id20]/value/magnitude
    total_score: exists /data[id2]/events[id3]/data[id4]/items[id24]/value/magnitude implies  (/data[id2]/events[id3]/data[id4]/items[id22]/value/magnitude = /data[id2]/events[id3]/data[id4]/items[id24]/value/magnitude / 2) 

Reworked in binding form:

rules
    arm_score_computed:Integer ::= pinch_grip + elbow_extension + shoulder_abduction
    arm: arm_score_computed < 99 implies arm_score = arm_score_computed
    arm_round_up: arm_score_computed = 99 implies arm_score = 100

    leg_score_computed:Integer ::= ankle_dorsiflexion + knee_extension + hip_flexion
    leg: leg_score_computed < 99 implies leg_score = leg_score_computed
    leg_round_up: leg_score_computed = 99 implies leg_score = 100

    sum_score: summed_score = arm_score + leg_score
    total_score: exists summed_score implies  (total_score = summed_score / 2) 

symbols
  bindings = <
    ["pinch_grip"] = <"/data[id2]/events[id3]/data[id4]/items[id5]/items[id7]/value/value">
    ["elbow_extension"] = <"/data[id2]/events[id3]/data[id4]/items[id5]/items[id9]/value/value">
    ["shoulder_abduction"] = <"/data[id2]/events[id3]/data[id4]/items[id5]/items[id11]/value/value">
    ["arm_score"] = <"/data[id2]/events[id3]/data[id4]/items[id5]/items[id13]/value/magnitude">
    ["ankle_dorsiflexion"] = <"/data[id2]/events[id3]/data[id4]/items[id6]/items[id14]/value/value">
    ["knee_extension"] = <"/data[id2]/events[id3]/data[id4]/items[id6]/items[id16]/value/value">
    ["hip_flexion"] = <"/data[id2]/events[id3]/data[id4]/items[id6]/items[id18]/value/value">
    ["leg_score"] = <"/data[id2]/events[id3]/data[id4]/items[id6]/items[id20]/value/magnitude">
    ["summed_score"] = <"/data[id2]/events[id3]/data[id4]/items[id24]/value/magnitude">
    ["total_score"] = <"/data[id2]/events[id3]/data[id4]/items[id22]/value/magnitude">
  >

I’ve been thinking further about the rules question.
Let’s make the following assumptions:

  • rules are in a separate section of an archetype (and not spread through the main definition section)
    • why? This distinguishes cleanly ‘first order’ constraints (ADL) at a per-object level from n-th order constraints (rules) that operate across n variables
  • we use a style of bindings to establish ‘scopes’, in the same way we would understand it in programming, rather than every variable being bound to an absolute path.
    • why? This has the equivalent effect of rules being stated within the relevant {} ADL sections, without literally having to do that. It also makes the execution engine easier to implement, since it doesn’t have to infer much.

I would therefore propose the following way of doing rules, as per the for_all example earlier in this thread.

In that example, the expression evt/apgar_total_value is shorthand for evt.child(apgar_total_value), i.e. an XPath-like expression that gets a child at a certain path.

If we were to agree on this, the next question is in what version do we add it? Possibilities:

  • I think we leave it out of Release 2.2.0, which otherwise has additions that are mostly implemented or being implemented in tools.
  • Possibly the same argument for release 2.3.0.
  • We define the next release as ADL 2.5, and add it there.
  • We define a 3.0 release and add it there.

I would propose we put it in a release called ADL 2.5, because we are not proposing to change ADL itself, i.e. ADL2 and AOM2 are still valid; the rules section just adds something.

If we agree on this, I will try to complete ADL 2.2 and ADL 2.3 in a form that doesn’t update the rules section syntax (the meta-model will change to the newer EL meta-model, because the old EL meta-model is now some years out of date and no longer maintained).

Thoughts?


One thing that would break every ADL parser out there but could be quite useful in this context would be to declare aliases in a kind of AQL way:
e.g.

ELEMENT[id10] as myvariable occurrences ∈ {0..1} ∈ {
  value ∈ {
    DV_ORDINAL[id43] ∈ {
      [value, symbol] ∈ { -- apgar_respiratory_value
        [{0}, {[at11]}],
        [{1}, {[at12]}],
        [{2}, {[at13]}]
      }
    }
  }
}

This would remove the need for the symbols section, arguably making the archetype less verbose and clearer. In any case I would expect these additional rules to be declared when creating templates, so having them in the archetypes is not a big deal.


There’s a better way to do that, but it’s for ADL3; it’s what the Decision Language does, i.e. symbols are just codes. Then you can have:

definition
  ELEMENT[respiratory_effort] occurrences ∈ {0..1} ∈ {
    value ∈ {
      DV_ORDINAL[id43] ∈ {
        [value, symbol] ∈ {
          [{0}, {[respiratory_effort_absent]}],
          [{1}, {[respiratory_effort_weak]}],
          [{2}, {[respiratory_effort_normal]}]
        }
      }
    }
  }

terminology
	term_definitions = <
		["en"] = <
			items = <
				["apgar_score"] = <
					text = <"Apgar score">
					description = <"A tool to assess the clinical status of the newborn infant immediately after birth and their response to resuscitation.">
				>
				["respiratory_effort"] = <
					text = <"Respiratory effort">
					description = <"Observation of the infant's respiratory effort.">
				>
				["respiratory_effort_absent"] = <
					text = <"Absent">
					description = <"No effort to breath.">
				>
				["respiratory_effort_weak"] = <
					text = <"Weak or irregular">
					description = <"Some effort to breath, moving chest.">
				>
				["respiratory_effort_normal"] = <
					text = <"Normal">
					description = <"Breathing normally or crying.">
				>
			>
		>
	>

I like that, but how do specialisations work?

@ian.mcnicoll are you referring to my last proposal, i.e. #6 above?

Yes - specialisations. I think the realistic approach is that the terms (i.e. symbol names) created for rules exist in all children, as they would in say a Java or Python class inheriting names from a parent class.

For the rules, they can either be added to (rules with new tag) or overridden - define a rule with the same tag. I.e. in the following:

    Apgar_sum_validity: for_all evt: apgar_events | 
         evt/apgar_total_value = 
           evt/apgar_heartrate_value + 
           evt/apgar_respiratory_value + 
           evt/apgar_reflex_value + 
           evt/apgar_muscle_tone_value + 
           evt/apgar_skin_colour_value

In the above, Apgar_sum_validity is the tag. If you were to create a rule in a child with this same tag (some specialised idea of Apgar sum), then it would replace the original.
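The tag-based override semantics can be sketched as a keyed merge, where a child archetype's rule with the same tag replaces the parent's (the rule bodies here are hypothetical strings, just to show the mechanism):

```python
# Sketch of tag-based rule specialisation: rules are keyed by tag; merging the
# child's rule set over the parent's gives the effective set, with same-tag
# rules in the child replacing the parent's.

parent_rules = {
    "Apgar_sum_validity":
        "total = heartrate + respiratory + reflex + muscle_tone + skin_colour",
}
child_rules = {
    # Same tag: overrides the parent rule (illustrative specialised sum).
    "Apgar_sum_validity":
        "total = heartrate + respiratory + reflex + muscle_tone + skin_colour + adjustment",
    # New tag: added alongside inherited rules.
    "Adjustment_range": "adjustment >= 0",
}

effective = {**parent_rules, **child_rules}   # child wins on a tag clash
print(sorted(effective))  # ['Adjustment_range', 'Apgar_sum_validity']
```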

If we don’t like ‘tags’, we can define rules as proper functions, as I have done in the Decision Language. This would technically be a breaking change, but it would be a lot nicer. Then you would have:

apgar_sum_valid: Boolean
    Result := for_all evt: apgar_events | 
         evt/apgar_total_value = 
           evt/apgar_heartrate_value + 
           evt/apgar_respiratory_value + 
           evt/apgar_reflex_value + 
           evt/apgar_muscle_tone_value + 
           evt/apgar_skin_colour_value
    ;

Now we have a properly defined function apgar_sum_valid, not just an expression, and it can be made clearer when this function is executed - for example, at every call to the archetype validator, or on some other basis. Then how redefinition works is clearer as well, since apgar_sum_valid has to remain a Boolean-returning 0-argument function in any child.
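In Python terms, this 'rules as functions' idea maps naturally onto inheritance: the parent defines apgar_sum_valid as a 0-argument Boolean function, and a child redefines it under the same name and signature. The data shapes and the specialised rule below are purely illustrative:

```python
# Sketch of rule functions with redefinition via inheritance.

class ApgarRules:
    def __init__(self, events):
        self.events = events

    def apgar_sum_valid(self) -> bool:
        return all(e["total"] == e["heartrate"] + e["respiratory"] +
                   e["reflex"] + e["muscle_tone"] + e["skin_colour"]
                   for e in self.events)

class SpecialisedApgarRules(ApgarRules):
    # Same name, still a Boolean-returning 0-argument function: this replaces
    # the parent rule, exactly as redefinition-by-tag would.
    def apgar_sum_valid(self) -> bool:
        return super().apgar_sum_valid() and all(
            0 <= e["total"] <= 10 for e in self.events)

events = [{"total": 9, "heartrate": 2, "respiratory": 2,
           "reflex": 1, "muscle_tone": 2, "skin_colour": 2}]
print(SpecialisedApgarRules(events).apgar_sum_valid())  # True
```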

This would break Nedap’s current archetype-level rules, but I am interested to know what @pieterbos thinks as a future direction.

I’m wondering how querying would work with a term specialised from “apgar_score”?

Ah - the ADL3 version.

For the purpose of illustration, assume that some org ACME published a specialised kind of Apgar score called a ‘neonate score’.

There are two ways to go:

Partially self-defining hierarchy

We specialise the term apgar_score with the term apgar_score.neonate, i.e. using a dot (or some other character) to establish the IS-A relation. Then an even more specialised code could be apgar_score.neonate.weird. If we wanted to preserve the ability to detect the depth at which a code is defined, then we would also need to allow apgar_score..strange, which is the same idea as a code like at2.0.4. For the same reason, codes like ..really_strange would then be possible. It is clear that ids starting with dots (or whatever else) and/or having double and triple dots internally are not going to be attractive kinds of ids for writing rules. We could go halfway, such that apgar_score.neonate is a specialisation child (at any level) of apgar_score, and really_strange is just a new code, again at any level; apgar_score..strange would just be apgar_score.strange, no matter at what levels apgar_score and apgar_score.strange were defined.

This gets us out of defining IS-A relations explicitly, but we’d lose the knowledge of what level of archetype specialisation a code was defined at (and we probably wouldn’t care). I think this might work well enough, but needs to be analysed more carefully.
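Under the simpler 'halfway' variant, IS-A becomes a purely lexical test on the dotted codes, with no relation table at all. A minimal sketch:

```python
# Sketch of the partially self-defining hierarchy: with dotted codes, the
# IS-A relation can be tested lexically, without any explicit relations.

def is_a(code: str, ancestor: str) -> bool:
    """True if 'code' equals 'ancestor' or is a dotted descendant of it."""
    return code == ancestor or code.startswith(ancestor + ".")

print(is_a("apgar_score.neonate", "apgar_score"))        # True
print(is_a("apgar_score.neonate.weird", "apgar_score"))  # True
print(is_a("really_strange", "apgar_score"))             # False
```

This is also where the scheme shows its loss: the prefix test says nothing about which level of archetype specialisation introduced each code.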

Full independent terminology approach

The above is likely to be unwieldy, and a better approach is likely to be that we forget the old system of self-defining term hierarchies based on lexical ids like at4 ← at4.1 ← at4.1.7 etc, and just do the same as Snomed, modern ontologies etc. So you could potentially define acme_neonate_score as a child of apgar_score. To do this, we have to do what Snomed or any ontology does: add the relation as well, i.e. acme_neonate_score IS-A apgar_score. This might look as follows (in JSON5-ish syntax):

-- in Parent archetype 
symbols: {
  symbol_definitions: {
    en:  {
       apgar_respiratory_value: {
           text: "Apgar score respiratory value"
       },
       apgar_heartrate_value: {
           text: "Apgar score heartrate value"
       },
       apgar_muscle_tone_value: {
           text: "Apgar score muscle tone value"
       },
       apgar_reflex_value: {
           text: "Apgar score reflex value"
       },
       apgar_skin_colour_value: {
           text: "Apgar score skin_colour value"
       },
       apgar_score: {
           text: "Apgar score total value"
       }
    }
  }
}


-- in child archetype:
symbols: {
  symbol_definitions: {
    en:  {
       neonate_score: {
           text: "ACME neonate score total value"
       }
    }
  }
  symbol_relations: [
    ["neonate_score", "is-a", "apgar_score"]
  ]
}

-- In 2nd level child
symbols: {
  symbol_definitions: {
    en:  {
       weird: {
           text: "weird kind of neonate score"
       }
       really_strange: {
           text: "something really strange"
       }
    }
  }
  symbol_relations: [
    ["weird", "is-a", "neonate_score"],
    ["really_strange", "is-a", "apgar_score"]
  ]
}

This approach means that querying has to work logically as if you were querying with Snomed or similar, except that the query processor has the archetypes to provide the term hierarchy.
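The query-processing side of this is a small subsumption walk over the symbol_relations declared in the archetype lineage. A sketch, assuming single-parent relations for brevity:

```python
# Sketch of IS-A subsumption from explicit symbol_relations triples, as a
# query processor might evaluate it (single inheritance assumed for brevity).

def is_a(code, ancestor, relations):
    """relations: list of (child, 'is-a', parent) triples from the archetypes."""
    parents = {c: p for c, _, p in relations}
    while code in parents:
        if code == ancestor:
            return True
        code = parents[code]
    return code == ancestor

relations = [("neonate_score", "is-a", "apgar_score"),
             ("weird", "is-a", "neonate_score"),
             ("really_strange", "is-a", "apgar_score")]
print(is_a("weird", "apgar_score", relations))     # True
print(is_a("weird", "really_strange", relations))  # False
```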

Controversial, I know, but I think what we will mostly need is the self-defining hierarchy, as these kinds of term specialisation will not be commonly required, and ‘proper ontologising’ even more rarely.

apgar_score.0.strange or apgar_score._.strange

If there is a need for proper ontologising, I would leave that to labelling with an external terminology, otherwise we enter a whole world of other pain!!

The argument against doing that is: say you define this variable neonate_score, and you also want it to be in an IS-A relationship with apgar_score, for querying purposes. If we stick with self-defining codes, wherever the neonate variable is mentioned, it will look like this:

apgar_score.neonate_score: Integer
    Result := a + b + c + 1.4 -- or whatever
    ;

adjusted_score: Integer
    Result := 80% * (apgar_score.neonate_score + 2 ) -- or whatever
    ;

Not easy to read, plus verbose. Compared to:

neonate_score: Integer
    Result := a + b + c + 1.4 -- or whatever
    ;

adjusted_score: Integer
    Result := 80% * (neonate_score + 2 ) -- or whatever
    ;

If we add the capability to represent IS-A relationships to archetypes (we are already talking about this remember), processing them at runtime in the query processor will not be complicated - it’s not like the processor has to call out to an actual separate ontology or terminology.

As usual, to be discussed.