Reviving an old thread… we’re looking at OPT2 details right now, and I’m thinking about @pieterbos's issues on OPT2 (reachable from this PR).
Originally (= 10 y ago) I thought we would just generate a simple OPT structure consisting of C_COMPLEX_OBJECTs and C_PRIMITIVE_OBJECTs (and any remaining ARCHETYPE_SLOTs).
Then we ran into two issues (at least the first described by @ian.mcnicoll in the past):
- sometimes there are slots with very specific node-ids, e.g. something like ‘work contact’ and ‘home contact’ (maybe in some SDOH kind of archetype) that would both be filled by the same generic CLUSTER.contact_info archetype. We would want to retain the owning archetype’s id-codes (i.e. the work and home) because if we lose them in the data, we can’t distinguish which is the work and which the home data structure - the archetype id CLUSTER.contact_info is no help.
- the reverse situation can happen: two generic slots e.g. something like ‘other_details’ and ‘extension’ (ok, bad modelling, but still…) being filled by two very specific archetypes, e.g. ‘care_plan details’ and ‘financial data’. In this case, the owning archetype node ids are not very helpful, it’s the filler archetypes that tell you what the data really are.
In summary we may need both the id-code and the filler archetype id at any archetype root point in an OPT in order to form useful paths for AQL queries.
The real problem is that our current approach to marking nodes in an OPT and therefore forming paths is not quite good enough. The current approach has the LOCATABLE nodes being filled as follows:
[openEHR-EHR-COMPOSITION.encounter.v1]
/content
[openEHR-EHR-OBSERVATION.ear_exam.v1]
/data
[id29]
[id40]
/items
[openEHR-EHR-CLUSTER.device.v1]
etc
[id72]
etc
It is at those root points with the archetype ids that we want to retain the id-codes from the parent archetype nodes. We could do that in the OPT if we change the rules, but not (currently) in the data.
To maintain the id-codes on all nodes (and not get confused as to which archetype any id-code belonged to), we would need some scheme like the following:
[archetype=openEHR-EHR-COMPOSITION.encounter.v1]
/content
[archetype=openEHR-EHR-OBSERVATION.ear_exam.v1
node_id=openEHR-EHR-COMPOSITION.encounter.v1::id4]
/data
[node_id=openEHR-EHR-OBSERVATION.ear_exam.v1::id29]
[node_id=openEHR-EHR-OBSERVATION.ear_exam.v1::id40]
/items
[archetype=openEHR-EHR-CLUSTER.device.v1
node_id=openEHR-EHR-OBSERVATION.ear_exam.v1::id47]
etc
[node_id=openEHR-EHR-OBSERVATION.ear_exam.v1::id72]
etc
In the above, the ‘node_id’ meta-attributes can be used like proper terminology codes, no need to go looking for what archetype they came from. So something like node_id=openEHR-EHR-OBSERVATION.ear_exam.v1::id47 is a fully qualified code telling us what the node is statically defined to mean in the archetype. We also don’t lose id-codes from slot nodes.
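To make the 'fully qualified code' idea concrete, here is a minimal Python sketch that splits such a value into its archetype id and id-code, the same way one would split a 'terminology::code' pair. The names QualifiedNodeId and parse_node_id are made up for illustration, and the '::' separator is simply taken from the example above; none of this is an existing openEHR API.

```python
from typing import NamedTuple


class QualifiedNodeId(NamedTuple):
    """Hypothetical holder for a fully qualified node id."""
    archetype_id: str  # e.g. "openEHR-EHR-OBSERVATION.ear_exam.v1"
    id_code: str       # e.g. "id47"


def parse_node_id(text: str) -> QualifiedNodeId:
    """Split 'archetype_id::id_code' at the first '::' (assumed separator)."""
    archetype_id, sep, id_code = text.partition("::")
    archetype_id, id_code = archetype_id.strip(), id_code.strip()
    if not sep or not archetype_id or not id_code:
        raise ValueError(f"not a qualified node id: {text!r}")
    return QualifiedNodeId(archetype_id, id_code)


qualified = parse_node_id("openEHR-EHR-OBSERVATION.ear_exam.v1::id47")
print(qualified.archetype_id, qualified.id_code)
```

The point is only that the defining archetype travels with the id-code, so nothing has to be inferred from the position of the node in the path.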
The above would of course make for very long paths, but the node codes would be fully specified, just like ‘snomed_ct::123456789’ or similar. That could be shortened by using some replacement for the long archetype ids, e.g. 10-digit codes similar to SNOMED.
Another difference of this scheme is that we would not have [archetype_id] path predicates as such, only predicates of the form [archetype_id::node_id].
At the root points we also have ‘archetype=xxx’ (LOCATABLE.archetype_details) to indicate we have changed archetypes via slot filling or external reference. This might be statically defined (i.e. at modelling time in an archetype or template) or dynamically filled.
The above would lead to different archetype paths than we have today although the two varieties are easily machine mappable.
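As a rough illustration of one direction of that mapping (proposed predicates to today's), here is a small Python sketch. It assumes each node in the proposed scheme carries an optional 'archetype' value and an optional 'node_id' of the form 'archetype_id::id_code'; the function name to_current_predicate is hypothetical. Going the other way would presumably need the OPT to recover the slot id-codes that today's paths drop at archetype root points.

```python
from typing import Optional


def to_current_predicate(archetype: Optional[str], node_id: Optional[str]) -> str:
    """Map one proposed-style node predicate to the current style.

    Assumption: archetype root points are identified by archetype id today,
    all other nodes by their bare id-code.
    """
    if archetype is not None:
        # Root point: today's path keeps only the filler archetype id.
        return f"[{archetype}]"
    if node_id is None:
        raise ValueError("node has neither archetype nor node_id")
    # Interior node: drop the archetype qualifier, keep the id-code.
    _, sep, id_code = node_id.partition("::")
    if not sep or not id_code:
        raise ValueError(f"not a qualified node id: {node_id!r}")
    return f"[{id_code}]"


# Nodes taken from the example structure earlier in this post.
nodes = [
    ("openEHR-EHR-OBSERVATION.ear_exam.v1", "openEHR-EHR-COMPOSITION.encounter.v1::id4"),
    (None, "openEHR-EHR-OBSERVATION.ear_exam.v1::id29"),
    ("openEHR-EHR-CLUSTER.device.v1", "openEHR-EHR-OBSERVATION.ear_exam.v1::id47"),
]
for archetype, node_id in nodes:
    print(to_current_predicate(archetype, node_id))
```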
Thoughts on whether this or some other improved scheme is worth looking at?