Missing rule in AOM 1.4 for non-unique sibling nodeIds

pablo · 27 March 2024 19:30

We are working on modeling questionnaires, which has been an ongoing discussion for some time now.

In short, we are reusing a generic question archetype in such a way that the template (OPT) has a multiple attribute whose C_OBJECTs have the same nodeId but different names (using the name as a differentiator).

While reading the AOM 1.4 spec it says:

4.2.3.1. Node_id and Paths

The node_id attribute in the class C_OBJECT, inherited by all subtypes, is of great importance in the archetype constraint model. It has two functions:

+ it allows archetype object constraint nodes to be individually identified, and in particular, guarantees sibling node unique identification;

+ it is the main link between the archetype definition (i.e. the constraints) and the archetype ontology, because each node_id is a 'term code' in the ontology section.

REF: Archetype Object Model 1.4 (AOM1.4)

But in our case this “rule” is broken: the spec says node_id “guarantees sibling node unique identification”, yet our siblings can’t be identified by node_id alone and need the name too.

From other sample OPTs in the CKM and the AOM itself, the way we are using the OPT seems correct, since I can’t find a rule in AOM 1.4 that says we can’t put two C_OBJECTs that have the same node_id inside the same C_MULTIPLE_ATTRIBUTE.children. However, I can’t find a rule that says that node_id plus name should be unique either. That matters because, if both duplicate node_ids and duplicate node_id-plus-name pairs are allowed, it’s impossible to know which C_OBJECT inside the C_MULTIPLE_ATTRIBUTE.children a given RM object instance matches (important for validating RM instances against an OPT and also for querying).

I think a rule requiring node_id plus name to be unique among siblings should be added to the AOM 1.4 spec. This is crucial for models that reuse the same archetype at the same level, as with questionnaire questions or laboratory test result analytes.

With that rule, we can say that an OPT that breaks it is not valid. Right now we can’t, so systems implement that case in different ways, making behavior unpredictable (note this has to do with conformance).

We are actually implementing that rule, and returning an error if the OPT doesn’t comply.
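
To make the check concrete, here is a minimal Python sketch of it. The `CObject` class and `check_sibling_uniqueness` function are hypothetical names for illustration only (they are not from any openEHR library), and the name constraint is simplified to a single optional string.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CObject:
    """Simplified stand-in for an AOM C_OBJECT (hypothetical, for illustration)."""
    node_id: str
    name: Optional[str] = None  # constrained name; None if the name is unconstrained


def check_sibling_uniqueness(children: List[CObject]) -> List[str]:
    """Return one error per (node_id, name) pair used by more than one sibling."""
    counts = Counter((c.node_id, c.name) for c in children)
    return [
        f"duplicate sibling: node_id={node_id!r}, name={name!r} appears {n} times"
        for (node_id, name), n in counts.items()
        if n > 1
    ]


# Children of one C_MULTIPLE_ATTRIBUTE, all reusing the same node_id.
children = [
    CObject("at0001", "do you smoke?"),
    CObject("at0001", "do you exercise?"),
    CObject("at0001", "do you smoke?"),  # same question added twice
]
errors = check_sibling_uniqueness(children)
```

An OPT loader could reject the template whenever `errors` is non-empty. A real implementation would need to compare full name constraints (for example coded names) rather than plain strings.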

sebastian.garde · 2 April 2024 08:32

It may be worth clarifying this rule. Agree that it does not seem to be phrased correctly for these cases.

Re what is correct: I think the original idea at least was that it does not have to be the name, but can be another attribute as well, as long as the result is that the sibling nodes can be uniquely identified. Typically, it is the name of course.

thomas.beale · 2 April 2024 16:12

That is correct.

pablo · 2 April 2024 17:11

If the C_OBJECTs that appear in the same C_MULTIPLE_ATTRIBUTE.children comply with the same archetype_id, I guess any constraint that is different on any attribute of the C_OBJECTs could serve as a differentiator. For instance, if the C_OBJECTs are for ELEMENTs, and each ELEMENT.value has a different constraint, that might work, though to be sure, the constraints’ codomains (their sets of allowed values) shouldn’t intersect. Take constraints A: 1…10 and B: 3, 21, 50: the value 3 is valid against both, so when you have 3 in a data instance you don’t know which specific C_OBJECT it maps to.

When the constraints are text-based or code-based it’s easier for the modeler to assign texts or codes that don’t collide (that is, whose codomains don’t intersect).
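
The overlap problem with A: 1…10 and B: 3, 21, 50 can be sketched in Python as follows. `IntRange`, `IntSet`, `accepts` and `overlap` are hypothetical simplifications made up for this example, not AOM classes; real value constraints are much richer than closed integer intervals and integer lists.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class IntRange:
    """Closed integer interval lower..upper (hypothetical simplification)."""
    lower: int
    upper: int


@dataclass(frozen=True)
class IntSet:
    """Explicit list of allowed integers (hypothetical simplification)."""
    values: FrozenSet[int]


Constraint = Union[IntRange, IntSet]


def accepts(constraint: Constraint, value: int) -> bool:
    """True if the value satisfies the constraint."""
    if isinstance(constraint, IntRange):
        return constraint.lower <= value <= constraint.upper
    return value in constraint.values


def overlap(a: Constraint, b: Constraint) -> bool:
    """True if at least one integer satisfies both constraints."""
    if isinstance(a, IntRange) and isinstance(b, IntRange):
        return max(a.lower, b.lower) <= min(a.upper, b.upper)
    if isinstance(a, IntSet):
        return any(accepts(b, v) for v in a.values)
    # Remaining case: a is an IntRange and b is an IntSet.
    return any(accepts(a, v) for v in b.values)


A = IntRange(1, 10)
B = IntSet(frozenset({3, 21, 50}))
ambiguous = overlap(A, B)  # the value 3 satisfies both constraints
```

A validation rule along these lines would flag any pair of sibling constraints for which `overlap` is true, because a data value in the shared region cannot be attributed to a single C_OBJECT.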

If that reasoning is correct, I think the specs might be missing a stricter definition (maybe it’s there and it’s just that I can’t find it!).

If such a rule is added, as an archetype or template validation rule, we can use it to check the validity of such cases of repeating constraints inside C_MULTIPLE_ATTRIBUTES. I think this also applies when having C_SINGLE_ATTRIBUTES.alternatives, but in that case it’s sufficient for the data to match ANY of the alternatives, so the first match is enough.

This is also important for querying, since the paths for the C_MULTIPLE_ATTRIBUTE.children might be all the same, even if they have different constraints for attributes like name inside, so paths in AQLs might need extra predicates like name/value=... or if another attribute is used, some_attribute/some_field=....
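
As an illustration of the extra predicate, here is a small Python helper that builds such a path segment. The function name `child_path` and the archetype id are assumptions made for this sketch, and the helper does not escape quotes inside the name.

```python
from typing import Optional


def child_path(attribute: str, node_id: str, name: Optional[str] = None) -> str:
    """Build a path segment, adding a name/value predicate when a name is given."""
    if name is None:
        return f"{attribute}[{node_id}]"
    # NOTE: no escaping of quotes in `name`; a real implementation needs it.
    return f"{attribute}[{node_id} and name/value='{name}']"


archetype_id = "openEHR-EHR-ADMIN_ENTRY.question.v1"  # hypothetical archetype id

# Same path for every sibling: cannot tell the questions apart.
plain = child_path("content", archetype_id)

# Path narrowed to one sibling by its name.
smoke = child_path("content", archetype_id, "do you smoke?")
```

Here `plain` is identical for all sibling questions, while `smoke` selects only the one whose name matches.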

@thomas.beale what do you think?

Going one extra step, and this is a separate topic (I don’t want to move the focus from the first question about improving the rules in the spec), if we don’t deprecate AOM/TOM 1.4 anytime soon, what do you think about adding another node differentiator?

COMPO

content
- ENTRY arch_id_1, node_idx_1
- ENTRY arch_id_1, node_idx_2
- ENTRY arch_id_1, node_idx_3

Though I know something like this was added to ADL2, I’m not sure when we will actually stop using 1.4 stuff.

thomas.beale · 2 April 2024 17:42

That’s certainly true, for the reason you describe.

I’ll need to look, but one thing to remember is that the semantic definitions are vastly better in ADL2/AOM2 than in the ADL/AOM14 docs. And tooling wise, the preferred path these days is to convert ADL 1.4 archetypes into ADL2 (via Archie or other tool) and then everything that goes on is defined by AOM2 / ADL2 semantics.

However, I think you may be right - as far as I remember, the question of value constraint overlap is not formally defined, so there is still something else we need to do.

We created a PR and I think a CR (for a new attribute called sequence_id or similar) but we have not actioned it so far.

pablo · 2 April 2024 20:54

I tried to find a reference for this case in AOM and I couldn’t. This might be a marginal case, which is why others might not focus too much on it, but it happens when reusing the same structure in the same place for different things of the same type, like questionnaire questions or laboratory result analytes.

Was that for 1.4? I’ll check JIRA, it might be worth another try.

(UPDATE: I see this one, but the change proposed is on the RM, not on the AOM; I would expect the differentiator to be in C_OBJECT: Issue navigator - openEHR JIRA)

I believe adding an extra differentiator, at least for this case, would be a simple change in tools, though it also has some repercussions on paths and AQL. I feel it’s worth the improvement, since this is generating the need for workarounds and patches outside the specs that might not work everywhere.

thomas.beale · 2 April 2024 21:02

See here. [SPECRM-63] - openEHR JIRA

Many of us have thought it would have been good to add in the past, but it raises interesting and non-trivial questions to do with versioning. The problem is that adding it now would not be easy, because the various vendor products all deal with the requirement in other ways. This is not to say it should not be done, but by now it might be in openEHRv2. Whether it could be done in current openEHR is a cost/value question for platform implementers I guess.

pablo · 2 April 2024 21:06

I’m sure it would also affect specialization.

I see this one, but the change proposed is on the RM, not on the AOM; I would expect the differentiator to be in C_OBJECT: Issue navigator - openEHR JIRA

thomas.beale · 2 April 2024 21:09

Well anything that works like a data instance sequence number needs to be RM level, and you would not normally want to constrain this field.

pablo · 2 April 2024 21:14

But the problem I’m facing is having data items and not knowing which C_OBJECT they correspond to. If the differentiator/sequence number is only in the RM data, you still can’t point to a specific sibling C_OBJECT inside C_MULTIPLE_ATTRIBUTE.children.

I think THE solution would be to have the differentiator in the data and in C_OBJECTs so you can find the right C_OBJECT, for instance, when validating that data structure against an OPT, or when indexing data on the database, or when building queries on a query editor and you want to retrieve data that matches one specific C_OBJECT among several sibling ones that all have the same archetype ID and node ID. Of course I might be missing something.
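
A rough Python sketch of that idea, assuming a differentiator called `node_idx` exists both on the constraint and on the data item. That field is purely hypothetical: it is not part of AOM 1.4 or the RM, and the class and function names below are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CObject:
    """Simplified C_OBJECT carrying a hypothetical differentiator."""
    node_id: str
    node_idx: int  # hypothetical differentiator, not part of AOM 1.4


@dataclass(frozen=True)
class RmObject:
    """Simplified RM data item carrying the matching hypothetical differentiator."""
    archetype_node_id: str
    node_idx: int  # hypothetical counterpart in the data, not part of the RM


def find_constraint(children: List[CObject], item: RmObject) -> CObject:
    """Return the single sibling C_OBJECT that the data item corresponds to."""
    matches = [
        c for c in children
        if c.node_id == item.archetype_node_id and c.node_idx == item.node_idx
    ]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one matching C_OBJECT, found {len(matches)}")
    return matches[0]


children = [
    CObject("arch_id_1", 1),
    CObject("arch_id_1", 2),
    CObject("arch_id_1", 3),
]
matched = find_constraint(children, RmObject("arch_id_1", 2))
```

With the differentiator on both sides, validation, indexing and query building can all resolve a data item to exactly one sibling, and anything else is reported as an error instead of being guessed.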

thomas.beale · 2 April 2024 23:17

I realise I did not quite understand your original post properly - I had thought that you wanted an alternative to ‘name’ for distinguishing data instances derived from the same archetype node.

Looking more carefully at your original question, I would advise modelling each question as a distinctly coded (in the archetype) node, e.g. a CLUSTER. Different questions are clearly semantically different things; to model them as instances of the same archetype node with just changes in the name field doesn’t correspond with the modelling intent in my view.

pablo · 3 April 2024 00:15

Don’t worry, let me create a full example so I can exemplify this better. I’m currently writing an article about this, focusing on questionnaire modeling.

damoca · 3 April 2024 07:57

I agree, I think that this topic is more about modeling practices rather than a technical issue. Let me develop it further.

An archetype is a general purpose, semantic definition of an information structure. A template is a combination and constraint of archetypes for specific purposes. The key word here is constraint, templates constrain the semantics and accepted data values defined in the archetypes.

Then, in practice, we create a template, we clone archetype nodes and rename them to better fit the final use case scenarios. But what are the implications of that renaming? Are we really just adjusting the text to make it more suitable for the final users, or are we creating cloned nodes with new semantics? I think this is the key question.

If the cloned nodes in the template are just adjustments of the name, or of the accepted data values, then identifying them is not that important, since in the end they are still semantically defined by the atNNNN/idNNNN code of the original archetypes.

But if cloned nodes in a template are used to create completely new elements, with a particular meaning and semantics (I imagine that this is the case of the questionnaires in the original message of @pablo), maybe templates are not the place to create them. If we are creating new semantic definitions, we have to do it at the archetype level, and each new node will have its own atNNNN/idNNNN code. Or, depending on the case, in a specialization, with its atNNNN.X identifiers.

The tricky question here would be how we, or the tooling, can differentiate between both cases in order to create faithful structures and validation processes.

pablo · 3 April 2024 16:15

@damoca you touch on a very sensitive point.

What I have tried is to have two archetypes:

COMPOSITION.questionnaire
ADMIN_ENTRY.question

The question has an ELEMENT with no constraints (ANY) on the value, and I tested two possibilities for the ELEMENT.name:

- have no constraint in the archetype (when the question will be just interpreted by humans)
- have a DV_CODED_TEXT in the archetype (when the question needs to be identified by code, e.g. for statistical analysis)

Then in the template we have many ADMIN_ENTRY.question nodes, one per question, and each question is semantically different from the other questions. The semantics are set by the name of the question, which is the textual question: how much do you smoke a day?, how often do you exercise?, etc.

Now the semantic part: as a modeler you can say all the questions are the same, independently of the text/name of the question (abstract view), or you can say each question is semantically different, depending on the text/name of the question (concrete view).

Before you take either side, there is another point to consider: this is more similar to a terminology constraint than a data structure constraint. For instance, you can consider each question text/name to be an item of a terminology, which is true since the question text is actually a term that means something, though it is not an assertion but a question (we are used to terminologies of assertions or statements like “this is blood pressure” or “this is diabetes”, but not so used to terms like “do you have diabetes?”).

So my approach is terminological, in conjunction with very abstract archetypes, and combining the two I get the final template (I’m writing about this right now).

So in the template you will have:

COMPOSITION.questionnaire
ADMIN_ENTRY.question: name = ‘do you smoke?’
ADMIN_ENTRY.question: name = ‘do you exercise?’
ADMIN_ENTRY.question: name = ‘…’

In that case, COMPOSITION.content has 3 C_OBJECT children, each with a different name constraint. But because the rule I mentioned above doesn’t seem to be in the specs, we need to manually ensure the name constraints don’t collide, or in other words (and just for my case), that someone doesn’t add the same question twice.

In parallel, I think this approach also applies to lab results, since we can have a similar generic set of archetypes and define the specific individual test results (analytes) as terminological constraints, which might take advantage of LOINC. In terms of modeling, I think this case is also closer to a terminological approach than to a data structure approach.

damoca · 3 April 2024 18:06

I completely understand your scenario. It’s another example of the complexity of modeling questionnaires.

So, back to your initial question, I agree that the non-unique sibling nodeId spec text can be improved to clarify some use cases.

As for your proposal of adding an additional node differentiator in the RM: if you don’t want to wait for a change in the RM, you should proceed with archetype specializations and not model the questions in the template, as I said before.
In a specialized archetype you can create as many atN.X codes as you need, inheriting from the parent node, and constrain them individually. And that should be enough for any validation needs you have.
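
As a rough illustration of why this removes the ambiguity, the Python sketch below assigns one dotted atN.X code per question under a parent code. The function `specialised_codes` and the parent code `at0001` are assumptions for the example; real tooling manages code allocation itself.

```python
from typing import Dict, List


def specialised_codes(parent_code: str, questions: List[str]) -> Dict[str, str]:
    """Map one specialised code (parent_code.1, parent_code.2, ...) to each question."""
    return {
        f"{parent_code}.{i}": question
        for i, question in enumerate(questions, start=1)
    }


codes = specialised_codes("at0001", ["do you smoke?", "do you exercise?"])
```

Each question then sits on its own node code, so siblings are unique by node_id alone and no name-based differentiator is needed.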

The problem I found is that I couldn’t figure how to do this with the Archetype Designer. With LinkEHR we can have this parent archetype:

And then create this specialization:

pablo · 3 April 2024 18:10

I proposed the change in the AOM.

The proposal for the change in the RM was made by @thomas.beale

pablo · 3 April 2024 19:31

@damoca @thomas.beale this might help in understanding my use case: Patient Questionnaires in openEHR: the missing link

thomas.beale · 3 April 2024 19:54

Exactly.

Also - that should be the intent of renaming within a template - it’s a UI / display level idea, not a semantic one.

I would build a Questionnaire as an OBSERVATION (since it’s obtaining data from the patient) containing a number of CLUSTERs, each representing a question. Each CLUSTER can define the question language (including translations), the intended semantics of the question, any coding, and the structure & type of the possible answers.

The COMPOSITION level would include meta-data about the questionnaire, and also the situation in which it was filled in - at doctor’s surgery, at home etc.

pablo · 4 April 2024 00:23

I use ADMIN_ENTRY for its simplicity. This is a first approach towards a full solution (check the article above).

heather.leslie · 4 April 2024 02:22

It is interesting to watch this thread evolve.

I went through the process of trying to design a generic questionnaire over 10 years ago - I blogged on the situation in Feb 2014 here. The blog post conclusions are way outdated now and our approach has moved on, but I include it to demonstrate that there has been a long history of trying to solve this problem that precedes this technical discussion.

In case you are interested, our current approach is below…
I invite my Editorial colleagues to join in if I misrepresent our current approach - @siljelb, @John_Tore_Valand @varntzen.

1. PROMS/PREMS and other validated scores and scales - are carefully represented as standalone OBSERVATIONs that undergo the peer review and publication process. You can find a collection of them in various states of publication in the ‘Scores and Scales’ project on CKM. You will also find a demo template for PROMIS-29 which aggregates CLUSTERs in a mix-and-match way to represent many permutations and combinations of the PROMIS item components but contained within a single PROMIS framework OBSERVATION. If questionnaires for a specific purpose are identified that will support reuse, even if not ubiquitously, then they are a candidate for this group and the associated modelling/governance process.
2. In recent years, especially triggered by the need for COVID-19 screening, we have slowly been developing a library of screening questionnaires that represent common ways of screening for critical clinical information. They represent ubiquitous topics in a loose way that deliberately balances standardisation/structure with flexibility. Modelling each screening questionnaire concept as a discrete archetype at least supports very high-level querying based on the concept, but has the potential for broader reuse, including common querying. You can find a collection of them, also in various states of publication, in the ‘Screening questionnaires’ project. They have been surprisingly successful and have been reused extensively in several projects, hence the significant number which have been successfully published. The reuse has been focused on initial patient screening use cases but has also been found to be very valuable in secondary use domains such as disease registries and infectious disease surveillance. The other value of this approach is that we strategically align the concepts with archetypes that will capture ‘positive presence’ as a more detailed persistent model - eg Symptom/sign screening - a ‘Yes’, can trigger the use of CLUSTER.symptom/sign to capture more information. Discerning where screening starts/stops and what should be left to the persistent model is often the hardest part, to avoid the potential of duplication in different models - we have found this to be a significant governance issue.
3. The notion of a generic archetype that provides a general structure without any semantics has been explored and discarded. We decided that there is essentially no value in a generic ‘pattern for pattern’s sake’ approach - and we did try, pushing this very hard for a long time - but even if an appropriate design could be made, there is little shareable value if most or all of the semantics need to be added in the template. So, for the situation where questionnaires have been developed for a specific organisation or project, and where there will be little, limited or no opportunity for reuse, we recommend modellers build their own archetype and govern it locally. In reality, there will be a myriad of these messy, locally configured archetypes that will come into existence, add value to a local user, product or project, be managed locally and never need to be shared. If it is discovered that there is value in sharing, then they can later be proposed to CKM as per option #1, above.