Missing rule in AOM 1.4 for non-unique sibling nodeIds

Thanks Heather. On the article linked above I explored a different path, combining:

  1. generic archetypes
  2. an external questionnaire model, with specific constraints on questions and answers (it’s focused on questions, not on other types of information recording)
  3. a library of questionnaires and questions based on the model in 2., managed the way an external terminology service manages concepts and terms, but with questions and their related metadata
  4. a program that generates a template from a questionnaire definition, following the generic archetype model (which is more in line with the organization-specific questionnaires); a rough sketch of this step follows the list
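
To make step 4 concrete, here is a minimal Python sketch of the idea. All names are hypothetical (QuestionDef, QuestionnaireDef, generate_template and the archetype ids are illustrative only, not the actual artifacts from the article); it just shows each question definition becoming one sibling constraint based on the same generic question archetype (one of the two template options described below).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QuestionDef:
    # One "concept" in the questionnaire library: text + answer type +
    # multiplicity + answer options, managed like a terminology concept.
    code: str
    text: str
    answer_type: str                       # e.g. "DV_CODED_TEXT", "DV_COUNT"
    multiple: bool = False
    options: List[str] = field(default_factory=list)


@dataclass
class QuestionnaireDef:
    id: str
    questions: List[QuestionDef]


def generate_template(q: QuestionnaireDef) -> dict:
    # Every question becomes one sibling constraint based on the same generic
    # question archetype; under AOM 1.4 the only per-sibling differentiator
    # available is the name constraint.
    return {
        "template_id": f"questionnaire.{q.id}",
        "root": "openEHR-EHR-OBSERVATION.generic_questionnaire.v1",   # hypothetical id
        "items": [
            {
                "archetype_node_id": "openEHR-EHR-CLUSTER.generic_question.v1",  # same id for all siblings
                "name_constraint": qd.text,
                "answer_type": qd.answer_type,
                "multiple": qd.multiple,
                "options": qd.options,
            }
            for qd in q.questions
        ],
    }
```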

On the template side we explored two options:

a. having one question for each ENTRY
b. having one ENTRY and all the questions inside as CLUSTERs

Option b. has a simpler data structure. Querying on either option is very similar. The only issue is the one that triggered this thread: a limitation in the AOM, which affects data validation and querying.

This is all still a PoC but seems to work end-to-end.

Thanks Heather,

Great summary, and having been involved in those complex and painful attempts to develop generic questionnaire archetypes, I completely agree that the current approach of specific PROMs, a set of themed screening archetypes, and ultimately some local Q+A type archetypes is working very well for us.

As you say there are some grey areas around when to use the screening archetypes vs. ‘semantic/persistent’ equivalents, and it requires a bit of detailed analysis to understand the status of the information being entered. Is it preparatory information capture of medical history, true ‘formal recording’ of a diagnosis, or, at the other end of the process, perhaps a protocol-driven questionnaire for registry purposes?

@Pablo - I don’t really understand the problem you are trying to solve (that is not solved as Heather has detailed above). Even if you can manage a list of ‘questions’, which does somewhat replicate what LOINC is doing, outside the area of simple scores those questions generally also need other data beyond present/absent, like ‘Date of diagnosis’ or severity.

That was the problem that we found - simple Y/N answers nearly always needed to be extended to cover other datapoints.

The ability of the latest CDRs (Better and Ehrbase, I understand) to do context-agnostic queries on ELEMENTs will make it easy to cross-query the various flavours of record.

4 Likes

Thanks @heather.leslie and @ian.mcnicoll for your responses, I agree completely with you both about the questionnaire question!

Regarding validated/standardised scores and scales, in addition to the OBSERVATION pattern outlined by Heather, there’s a sibling pattern for gradings and classifications that are closely associated with specific concepts modelled as ENTRY archetypes. The pattern is described in this Confluence page: ENTRY-linked gradings and classifications - openEHR Clinical - Confluence

1 Like

@ian.mcnicoll note the issue I reported is from the specs themselves, not about questionnaire modeling in particular. It’s just that while working with questionnaires I bumped into this AOM spec limitation.

The questionnaires we are dealing with are plain questions with single or multiple answers of the same type. There is no associated context information, like “do you have diabetes?” followed by a related question “what’s the date of detection?”, so there is no severity either. Nor are they simple yes/no questions.

Though we do have groups of questions, which are CLUSTERs. There we could add context data, but right now there is no need for that in our requirements.

The article I shared has more info about this use case.

Hope that helps.

Well, we do have node differentiation in every C_OBJECT in an archetype. It’s just that you are cloning at nearly the data level in a template. So you are going around the standard AOM approach to representing semantically different nodes!

This is roughly the way I would expect to model questionnaires. There are undoubtedly more sophisticated approaches that we have not developed yet, but this PROMIS modelling enables semantically distinct and reusable questions within a containing questionnaire, which is usually what is needed.

Why not do this as an archetype - then you will get the effect you want? Maybe I am missing something obvious here?

2 Likes

Duplicating sibling nodes in the template is not new. Check the template with multiple sibling SECTION.adhoc nodes: Template: International Patient Summary [openEHR Clinical Knowledge Manager]. The only differentiator for the C_OBJECTs is the name constraint at the section level. I believe I’m using the same technique.

Having a single archetype for each question, or for each question + context, was explored. If we scale this to new questionnaires we might end up with 100s of archetypes for questions, which we would need to manage over time, and templates would need to be regenerated with each change to an archetype, also having to manage the dependencies. IMO the terminology-like approach moves that burden away from openEHR modelling, though it’s healthy to have and explore different approaches.

The orthogonal question is what the spec allows and doesn’t allow, and where the spec has limitations for valid cases. We can create a separate thread about different approaches to modelling questionnaires.

From a more general view, what I’m exploring right now is the ability to generate openEHR artifacts from external requirements, based on other models, which is like a domain-specific language but for models. It seems this is a semantic level between archetypes and templates. There might be more implementers doing things like this, but I couldn’t find any papers about it, so it might be something new and worth exploring (not only for questionnaires).

This is done in an archetype, it’s just generic. The specific questions are defined in the template in either case (we tested both): one specific question per ENTRY, or one specific question per CLUSTER inside one ENTRY. Either way the problem of several siblings having the same archetype ID is still present, in the first case at the ENTRY level and in the second at the CLUSTER level.

Yep, this modelling pattern works well for the PROMIS type of questionnaire, which is one of potentially many predetermined and standardised combinations of predetermined and standardised questions. But it won’t work well for your average questionnaire, which is usually a semi-random collection of questions with limited standardisation or internal coherence. We need to support those as well, and this is one of the requirements that led to the development of the Screening questionnaire family of archetypes.

That’s not how either the PROMIS pattern or the Screening questionnaire pattern work; neither of them (pattern-wise) have archetypes containing only one predetermined question.

I see sets of single questions per cluster e.g. Cluster Archetype: PROMIS Item Bank v1.0 - Anxiety [openEHR Clinical Knowledge Manager]

That’s a single archetype for four individual questions about the same topic, anxiety in this case. So I would phrase it differently: a single archetype per topic, with a few questions inside. This is what we call a “group” in our questionnaire model.

So in the process I described, we model these externally (outside archetypes), and end up with an auto-generated template with constraints similar to the current PROMIS archetypes, resulting in something very similar but following a different methodology, conceptually and technically.

I’ve encountered a similar requirement to Pablo’s when modelling care plans. We decided to use the UID RM attribute for this. (Can’t find exactly which anymore.)
We used ADL2, so node identification of sibling nodes works in ADL2 specialisations (archetype or template alike), but at runtime the order isn’t guaranteed to be stable, and we needed stable identification for linking data.
I would suggest, first, to hasten the migration to ADL2.4/3, and second, to adopt the pattern from ADL2 in ADL1.4 in the interim, or at least to be very careful to have an easy migration path.

Interesting. You mean the order of the items in the data, right?

I don’t think ordering should play a role in this. The issue of not being able to identify sibling nodes with the same archetype_node_id on the AOM side is the first part. The second part is that we need to be able to do the following: given any object in an RM data instance, find the specific C_OBJECT constraint that defines that RM object. That means we need to be able to store the extra identifier (differentiator) in the RM, and in the AOM to comply with the first part. But that identifier/differentiator in the AOM or RM shouldn’t depend on the position of things (position-based identifiers are too 90’s hehe). Thinking about this requirement, I guess “sequence” is not a good name, since it implies some dependency on position.

I don’t know how all that works in AOM 2, but if the second requirement I mentioned above depends on the position of objects, we are screwed… If that’s the case, maybe you can mention that to the new ADL3 editors so they can contemplate the requirement and hopefully remove any dependency on position-based IDs or indexes.

You will - that’s unavoidable if questionnaires are to be properly managed.

Well you can still use terminology to represent question texts, descriptions and so on, with the archetype-based modelling style. Unless you are changing questionnaire structures, very few changes will be needed in the archetypes - changes to terminology will take care of most needs. But you’ll have properly semantically distinguished questions rather than just data level clones.

Right - that is just creating data instance clones that are all instances of one ‘adhoc section heading’ concept. This is not the same as what you want to do.

How do questions get out of order?

1 Like

Yes, sibling identification in specialisations (like templates) has been solved in ADL2, in my experience. But the idX.X specialisation syntax alone doesn’t guarantee a consistent order in the data. So the problem I described is related but not identical.

Agree, and it doesn’t: we used the ELEMENT’s LOCATABLE.uid as a constant identifier of a specific node (in data).
In many situations the node id in ADL2 itself is fine for identification, but if there’s reason to expect changes in the node id (order), there’s an optional attribute for consistent identification.
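
A rough illustration of that idea, with simplified stand-ins for the RM classes (the node id, names and values are made up; only the role of LOCATABLE.uid is the point):

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    # Stripped-down stand-in for the RM ELEMENT, keeping only the LOCATABLE
    # attributes relevant here.
    archetype_node_id: str
    name: str
    value: str
    uid: Optional[str] = None   # LOCATABLE.uid, used as a stable node identifier


# Two sibling instances of the same node. The user may reorder the list to
# indicate priority, but each node's uid stays constant, so links pointing at
# a specific goal remain valid regardless of position.
goal_a = Element("id12", "Goal", "Walk 1 km daily", uid=str(uuid.uuid4()))
goal_b = Element("id12", "Goal", "Stop smoking", uid=str(uuid.uuid4()))
plan_goals = [goal_b, goal_a]   # reordered; uids unaffected
```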

Well, the questions in Pablo’s example don’t get out of order if you define them in an (ADL2) archetype or template. But the data for a multiple-occurrence item can get out of order.
In our situation it was part of an episodic care plan composition, with lists of problems, goals and actions, grouped by sections, where the order of nodes in a list could be edited by the user to indicate priority.
https://archetype-editor.nedap.healthcare/advanced/plan/archetypes/openEHR-EHR-COMPOSITION.plan_element-SFMPC.v1.0.0
https://archetype-editor.nedap.healthcare/advanced/plan/archetypes/openEHR-EHR-COMPOSITION.plan-SFMPC.v0.3.0

Those numbers do not indicate lexical order… but whatever the codes are, the lexical order of archetype nodes in the ADL is significant (hence the ‘before’ and ‘after’ operators).

Right. The archetype cannot guarantee anything about that.

I thought you might say that, and I don’t know our exact reasoning anymore. And I no longer have access to nedap documentation. @MattijsK do you?
But given the complexity of a care plan (multiple parties editing, across organisations, and potentially lifelong currency) on the one hand, and the importance of the semantic correctness of the link (clinically, and potentially for determining authorisation scopes) on the other, I still feel it is much safer to add a uid to a data node. The spec text states it’s intended for root nodes, but I don’t see much of an issue with this usage; does anyone?

It’s different if this is managed by the openEHR process rather than by an external process specific to this use case.

What I wrote in the article is not about using terminologies to handle the question texts, it’s about using ideas from terminology management to manage complete question definitions. That is why I mentioned a “terminology-like approach” rather than creating a terminology for questions.

I think we are getting lost in translation here :slight_smile:

That is creating clones of the AOM nodes and adding constraints over the name. This is exactly what I’m doing. I don’t think “data instance” applies here, since it’s all AOM, not RM.

Then when we have an RM data instance, we need to know which C_OBJECT corresponds to each question node in the RM instance. We find candidate C_OBJECTs by the archetype_node_id of the LOCATABLE instance, and then use the LOCATABLE.name value to find the C_OBJECT whose name constraint matches (the one that validates). The issue is that there can be two C_OBJECTs whose constraints both pass for the same name value. This is the limitation we are facing.

IMO it should be possible to add an extra differentiator (code, ID, etc.) to the C_OBJECTs in the archetype or template instance that fall into this category. Tooling should be able to detect this situation and add those differentiators accordingly. Of course this is not in the current 1.4 spec.
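
To illustrate the lookup problem (and the kind of differentiator meant above), here is a small sketch with simplified stand-ins for C_OBJECT and LOCATABLE; the `differentiator` attribute is hypothetical and is exactly what AOM 1.4 lacks:

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CObject:
    # Simplified sibling constraint: same archetype_node_id, name constraint
    # expressed as a regex, plus the hypothetical extra differentiator.
    archetype_node_id: str
    name_pattern: str
    differentiator: Optional[str] = None   # not part of AOM 1.4


@dataclass
class Locatable:
    archetype_node_id: str
    name: str


def matching_constraints(node: Locatable, siblings: List[CObject]) -> List[CObject]:
    # Step 1: filter by archetype_node_id; step 2: filter by the name constraint.
    # If two sibling C_OBJECTs share the id and both name constraints accept the
    # value, the lookup is ambiguous: the limitation discussed in this thread.
    return [c for c in siblings
            if c.archetype_node_id == node.archetype_node_id
            and re.fullmatch(c.name_pattern, node.name)]


siblings = [
    CObject("openEHR-EHR-CLUSTER.generic_question.v1", r"Question .*"),
    CObject("openEHR-EHR-CLUSTER.generic_question.v1", r"Question .*"),
]
data_node = Locatable("openEHR-EHR-CLUSTER.generic_question.v1", "Question 2")
print(len(matching_constraints(data_node, siblings)))   # 2 -> ambiguous without a differentiator
```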

Yep, I wasn’t precise, but I did get that you are managing question texts in a private terminology-like approach, which makes sense to me.

Aha - but AOM structures are somewhere between the reference model (which is just the model of kinds of Lego bricks, and how they stick together) and real data which is Lego dogs, tractors and helicopters. Template structures often get very close to the final data. So it’s not a binary distinction, there’s a continuum here.

So far so good.

This is not the usual thing to do. Consider: what you are trying to do here is semantically disambiguate 2 data instances based on the constraints of their name field. But there’s a much easier way to mark data semantically: create it based on semantically distinct archetype nodes, which means: define your nodes (questions or whatever they may be) as proper nodes. Now the node codes (id-codes in ADL2) will be in the data and you can instantly tell them apart.

Different name values but same archetype node id means: these are (multiple) instances of the same semantic thing, just distinguished by some other field that corresponds to their creation, typically time, but it could be some other contextual property.
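
As a contrast between the two situations (name-only differentiation vs. semantically distinct node ids), a tiny sketch with made-up codes and example questions taken from later in this thread:

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    archetype_node_id: str
    name: str
    value: str


# 1.4-style name-based disambiguation: same node id, only the runtime name
# value tells the two questions apart.
same_id = [
    DataNode("at0005", "Have you ever smoked?", "No"),
    DataNode("at0005", "Did you ever have a heart problem?", "Yes"),
]

# ADL2-style semantically distinct nodes: each question carries its own
# id-code in the data, so the instances can be told apart without
# inspecting names.
distinct_ids = [
    DataNode("id5.1", "Have you ever smoked?", "No"),
    DataNode("id5.2", "Did you ever have a heart problem?", "Yes"),
]
```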

There’s actually nothing in the AOM/ADL2 specs that says you can’t add new C_OBJECT nodes in a template, but most tools probably don’t handle it, or even prevent it. The reason is that it’s always easy just to create some local specialised archetypes to model the local content you want; this will get you an archetype code per node. Then build the template to define the questionnaire.

Most interesting archetype semantics are not in the ADL1.4 spec - we stopped developing it 15 years ago :wink:

Just to clarify: what would be a concept or term in a terminology is, here, a question text + question answer type + question answer multiplicity + question answer options. All of that is one single concept that is managed in the “questionnaire service”, like terms or concepts are managed in a “terminology service”.

I understand your point. Though the model is almost the same for AOM/TOM in terms of how constraints are defined. I know we are defining the constraints at the template level, closer to the final data structure in the RM, but the problem with the non-unique sibling nodes is still there. Even if we were not using templates, the archetypes would have the same issue. So the problem is not really about the level at which we define the data, which belongs to the orthogonal discussion about questionnaire modelling; the initial discussion is the AOM identification issue. We are mixing two conversations :slight_smile:

Remember we are reusing the same generic archetype for each question, and the specific questions are in the template, complying with that generic archetype.

We could go on with this discussion forever; the challenge is to define the different semantic levels and semantic boundaries as a framework we can use for the discussion. Until we have a mathematical definition of that, there is a grey area. What I see is: the RM is the base semantic level for the structure of general concepts. Archetypes can define more semantics, at different levels since we have specialisation, but the boundary is that all concepts defined at the archetype level should be for general use. The third, template level is the final semantic level, adding semantics and constraints over the archetype level for specific use. Then there are other semantic levels that are not in the scope of openEHR.

Considering that, our questions are for specific use, so my criterion is to put them at the template level, but we need the underlying archetype structure so we can base our templates on something. We don’t need much specific structure there, so the archetypes are pretty generic.

Now, with that context, we have the AOM/TOM node identification issue.

Of course our hypothesis could be argued with, but I consider it makes sense. I by no means claim this is the only way, though I believe it’s conceptually and technically correct.

Sadly openEHR’s focus shifted away from the AOM spec, and we only recently understood that we needed more at that level, and here we are discussing limitations of the most widely used AOM version :slight_smile:

Right - but semantically that’s the problem. The question / answer possibilities for ‘have you ever smoked?’ are not just a runtime data variant of the question / answer possibilities for ‘Did you ever have a heart problem?’

Right; it also forms the upper ontology of epistemic primitives for the domain models, expressed as archetypes.

Well - more like use-case independent use. Don’t forget, archetypes can be specialised purely for local reasons, e.g. adding some funny data point to the usual collection for heart rate or whatever. So they won’t be used outside of the teaching hospital doing that. But within any deployment, locally specialised archetypes are generally applicable - you still need to create templates or other artefacts (e.g. workflows) that use them.

The template level is conceived of as providing two functions to create use-case specific data set definitions:

  • aggregating elements from specific archetypes (including ‘removing’ those not needed)
  • narrowing constraints (particularly terminology) to correspond to local use

By convention, template modellers don’t add new data points. For one thing, ‘template modellers’ are almost always non-domain experts, more like business analysts or even just regular developers. They assume all the semantics have been worked out; they try to build data sets that will provide the data for forms in their software, plus implement local terminology bindings.

Well all the semantics are in the ADL2/AOM2 specs. Much of the semantics that are implicated in this discussion just can’t be accommodated in ADL1.4, because it doesn’t do specialisation (in ADL1.4, an archetype is a flat archetype, not a differential). It also has a terrible coding system and other undesirable problems. For any question about how to do archetypes properly, I have always directed people to the ADL2/AOM2 specs, even if concretely they had to find a way to implement the ideal approach in 1.4 tools. Today, the most used tool - Better’s AD - has had AOM2 inside it for at least 5 years (I suspect closer to 7 or 8).

It’s not a perfect system, and it doesn’t do everything, but it does have a reasonable amount of internal coherence.

I’m struggling to understand how moving the complexities of representing the specific semantics of questionnaire questions (where they matter) from the archetype level to the template level solves anything. To my understanding the only real gain is additional complexity.

The current principles for modelling questionnaires in openEHR can be summed up as the following:

  1. Most questionnaires are ad-hoc garbage, and their exact semantics don’t matter outside of the context of the questionnaire itself and immediate reuse situations. Semantically it doesn’t matter too much how these are modelled, but the Screening questionnaire archetypes have been created to simplify the creation of yes/no/unknown structures, together with context information like terminology codes for a diagnosis, and temporal information. As @heather.leslie mentioned earlier, this simplifies the common requirement to transfer questionnaire based information into corresponding positive-presence archetypes such as Problem/diagnosis.
  2. Questionnaire questions which have been deliberately designed and verified by research, for example NEWS2 or PROMIS, need to be modelled in such a way that they can be uniquely identified across use cases. This is why these are modelled as specific archetypes, either as a complete tool like NEWS2, or as item banks which can be combined in a template like PROMIS. (Note that the PROMIS question banks are much larger than the current archetypes, and the intent was to grow the archetypes as the remaining questions are required.)

Is the problem you’re trying to solve the ad-hoc or the deliberate, @pablo? Or is it even a third category that we haven’t previously identified?

6 Likes

+1 @siljelb

Despite the significant pushback from the most experienced international openEHR modellers, both in this thread and on LinkedIn, it is concerning that software engineers are intent on solving a technical modelling problem in complete isolation from any collaboration with the modelling community, without considering what would improve the current situation, or what would be sustainable or governable.

4 Likes