archetype node_ids again - looking for final solution

I spent a bit of time trawling back through that last discussion on C_OBJECT.node_id (the property that carries at-codes) and whether it should be mandatory or optional, and also whether empty is valid.

Currently it is mandatory, and can’t be empty.

We need to make the spec work for 2 distinct styles of modelling:

  • 13606 style - requires an at-code on every node, but only some have to be defined in the ontology section

      • in this case, the node codes really are like identifiers only, some of which happen to have a ‘meaning’ defined for them

      • the result of this is the ability to do more mechanical processing of structures, since absolutely every node has an id.

  • openEHR / CIMI style - at-codes are required according to rules needed to ensure uniqueness; if present, they are always defined in the ontology section; they are optional on any other node

      • in this case, the node codes (where they exist) really are codes, and are always defined as terms

      • the result of this, especially in larger archetypes, is far fewer node ids (since very few are at leaves), and shorter paths.

In theory, in both approaches the ontology section comes out about the same, since the LinkEHR-style ontology only contains at-codes with some ‘interesting’ or meaningful definition.

I would think our goal would be that archetypes written by anyone should work in anyone else’s tool. At the moment this probably isn’t the case:

  • the AE and ADL workbench would both complain about archetypes with at-codes with no definition in the ontology section
  • LinkEHR would complain about archetypes that had some nodes with no at-codes
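
As a rough illustration of why the two sets of tools reject each other's archetypes, here is a minimal sketch of the two checks. The `Node` class, the method names and the RM type names are hypothetical stand-ins invented for this example; they are not the AOM classes and not the actual logic of the AE, the ADL workbench or LinkEHR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NodeIdPolicies {
    /** Hypothetical stand-in for an archetype object node; not the real AOM C_OBJECT. */
    static class Node {
        final String rmType;
        final String nodeId;          // null means "no at-code on this node"
        final List<Node> children;

        Node(String rmType, String nodeId, Node... children) {
            this.rmType = rmType;
            this.nodeId = nodeId;
            this.children = Arrays.asList(children);
        }
    }

    /** First check: at-codes that are used on a node but not defined in the ontology. */
    static List<String> codesWithoutDefinition(Node node, Set<String> ontologyCodes) {
        List<String> found = new ArrayList<>();
        if (node.nodeId != null && !ontologyCodes.contains(node.nodeId)) {
            found.add(node.nodeId);
        }
        for (Node child : node.children) {
            found.addAll(codesWithoutDefinition(child, ontologyCodes));
        }
        return found;
    }

    /** Second check: RM type names of nodes that carry no at-code at all. */
    static List<String> nodesWithoutCode(Node node) {
        List<String> found = new ArrayList<>();
        if (node.nodeId == null) {
            found.add(node.rmType);
        }
        for (Node child : node.children) {
            found.addAll(nodesWithoutCode(child));
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> ontology = new HashSet<>(Arrays.asList("at0000", "at0001"));

        // openEHR / CIMI style: the leaf node has no at-code.
        Node sparse = new Node("CLUSTER", "at0000",
                new Node("ELEMENT", "at0001", new Node("DV_TEXT", null)));

        // 13606 style: every node has an at-code, but at0002 has no ontology entry.
        Node dense = new Node("CLUSTER", "at0000",
                new Node("ELEMENT", "at0001", new Node("DV_TEXT", "at0002")));

        System.out.println(codesWithoutDefinition(sparse, ontology)); // []
        System.out.println(nodesWithoutCode(sparse));                 // [DV_TEXT]
        System.out.println(codesWithoutDefinition(dense, ontology));  // [at0002]
        System.out.println(nodesWithoutCode(dense));                  // []
    }
}
```

Each sample tree passes one check and fails the other, which is the incompatibility described above.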

As Pablo Pazos said in that previous discussion, we should make this work.

Can we arrive at a single set of rules that can accommodate everyone?

The thing that I think is the greatest point of difficulty isn’t node_id optionality (since a tool built with this assumption will accommodate archetypes with at-codes everywhere), but the question of whether there can be codes with no definitions in the ontology.

The original idea of archetypes was to ‘overload’ the basic types available in a generic information model to have domain level meanings. To my mind that is still the basis of the approach, as per this example from the blood_match archetype:

The driver for node codes isn’t primarily uniqueness, it is ‘meaning’. So above we have 3 ELEMENTs representing ‘ABO’, ‘Rhesus’ and ‘Anti-bodies detected’, and two others representing ‘Antibody’ and ‘Details’ - that’s the idea of domain ‘overloading’ of a reference model.

So it seems logical that:

  • any ‘overloaded’ node should have a definition of its id, otherwise, why overload?

  • Additionally, it seems obvious that where there are multiple sibling object nodes under an attribute, they all should have distinct codes and meanings, since otherwise … you don’t know what they mean, and the archetype isn’t doing its job.

  • It can also be the case that for some nodes, the RM default meaning (e.g. something like ‘OBSERVATION.protocol’) needs to be overloaded by a more specific meaning. So an at-code is added there as well.

So far so good. I think both camps agree on this - which nodes are ‘semantically overloaded’ should always come out the same in a proper modelling discussion.

Now, we also agree (as far as I know) that it’s a good idea to be able to identify any node in an archetype by a unique path, so that archetype nodes can reliably be associated and re-associated with data instance nodes (and also processed in all kinds of ways inside design time tools). Due to the above requirements, this is more or less guaranteed to be satisfied anyway.
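
To make the path argument concrete, here is a minimal sketch, assuming a simplified path syntax of `/attribute[at-code]` with the bracketed part omitted when a node has no at-code. `ObjNode` and the helper methods are hypothetical names for this illustration only, not AOM classes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NodePaths {
    /** Hypothetical object node: an optional at-code plus named attributes holding child nodes. */
    static class ObjNode {
        final String nodeId; // may be null
        final Map<String, List<ObjNode>> attributes = new LinkedHashMap<>();

        ObjNode(String nodeId) {
            this.nodeId = nodeId;
        }

        ObjNode add(String attribute, ObjNode child) {
            attributes.computeIfAbsent(attribute, k -> new ArrayList<>()).add(child);
            return this;
        }
    }

    /** Lists the path of every node below root, in depth-first order. */
    static List<String> allPaths(ObjNode root) {
        List<String> out = new ArrayList<>();
        walk(root, "", out);
        return out;
    }

    private static void walk(ObjNode node, String prefix, List<String> out) {
        for (Map.Entry<String, List<ObjNode>> entry : node.attributes.entrySet()) {
            for (ObjNode child : entry.getValue()) {
                String predicate = child.nodeId == null ? "" : "[" + child.nodeId + "]";
                String path = prefix + "/" + entry.getKey() + predicate;
                out.add(path);
                walk(child, path, out);
            }
        }
    }

    /** True when no two nodes below root share the same path. */
    static boolean pathsAreUnique(ObjNode root) {
        List<String> paths = allPaths(root);
        return new HashSet<>(paths).size() == paths.size();
    }

    public static void main(String[] args) {
        // Siblings under 'items' carry distinct codes; the single child under 'value' has none.
        ObjNode coded = new ObjNode("at0000")
                .add("items", new ObjNode("at0001").add("value", new ObjNode(null)))
                .add("items", new ObjNode("at0002").add("value", new ObjNode(null)));
        System.out.println(allPaths(coded));
        System.out.println(pathsAreUnique(coded));   // true

        // Two siblings under 'items' without codes collapse onto the same path.
        ObjNode uncoded = new ObjNode("at0000")
                .add("items", new ObjNode(null))
                .add("items", new ObjNode(null));
        System.out.println(pathsAreUnique(uncoded)); // false
    }
}
```

In this sketch, codes on sibling nodes are enough to keep every path unique, even though the single-child leaf nodes carry no id.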

However, LinkEHR has an additional requirement: an id must exist on every single object node - including nodes with no special ‘meaning’, nor requiring a uniqueness marker - and so it adds that. It doesn’t require such nodes to have ontology section definitions, so the ontology isn’t affected, but it does now mean that there are node ids with no definition in the ontology. This is the point of breakage between tools.

I know the UPV guys have probably explained this before (maybe some years ago) but I am struggling to remember why this is needed, and what it adds. Can someone provide a summary of the explanation here (and any comments on the above)? I think that would help to know where we should go next with this.

- thomas

Thomas, thanks for your effort, it is very interesting, but I am not able to respond to it in more detail right now. I’ll come back to this in a few days.

Bert


Let me split this email up a bit, so that it doesn’t get confusing.

------------------------------------------------------

Hi Thomas,

As you probably know by now, my kernel works with both Reference Models and has the AOM as its base definition; every constraint a RM puts on top of that applies only to that particular Reference Model. The AOM only needs to support it. In fact it is the other way around: a Reference Model that wants to work with ADL archetypes must be able to fit into the AOM.

------------------------------------------------------

I have a question about node_id being mandatory. I can read it in the specs; you are right, it is there. It may sound stupid, but that is not how I ever understood this requirement, because C_PRIMITIVE_OBJECT is also derived from C_OBJECT, and I mostly see them in openEHR archetypes without a node_id. I checked a few examples: the ADL parser accepts them with no node_id, and returns null for the node id if there is none.

In the AOM Java code I find the following rule in the CObject constructor:

    if (nodeID != null && StringUtils.isEmpty(nodeID)) {
        throw new IllegalArgumentException("empty nodeID");
    }

To my mind, this contradicts the specs. Or am I wrong? I must be overlooking something stupid. Please put me on the right track!

------------------------------------------------------

I think this is because of purpose. The developers only have their specific purpose in mind, and do not want to create an archetype editor that serves purely the AOM specs. At the moment there are only two Reference Models, but it could well be that new ones are to come, because the AOM concept is very powerful. It can have a life of its own, far beyond the purpose of openEHR and 13606. Reference models should be, as I said, just constraints on the AOM. And if they receive recognition in major software development, then that is a good thing, but it has nothing to do with the AOM. The AOM is (only) a modelling environment; it must be as wide open as possible. So the discussion must not target the internals of the AOM, but the reference model.

------------------------------------------------------

Ah, Pablo said so too? Thanks Pablo!!! It is the first time in years that someone other than me has brought this to attention (as far as I know; maybe I missed some). I have had this discussion a few times on this list, let’s say once every two years, and it always brought up a mist of arguments, and in the end nothing changed. That was a bit tiresome, so I left it as it was and did my own thing, which was following the AOM as written by Rong. Which now seems to conflict with the specs? (Please enlighten me.)

I think it is important: having good tooling is essential for the acceptance of a standard (the AOM is part of an ISO standard, as I believe). It is really a pity this fact is not well recognized.

The answer does not look too complicated. But maybe, again, I am overlooking complexities. Speaking of the ArchetypeEditor GUI: make the addition of a node_id optional when creating a C_PRIMITIVE_OBJECT. Create an archetype editor that conforms to the AOM, and make support of the AP optional. It shouldn’t be that hard. In fact, in my spare hours, I am working on one, a purely AOM archetype editor (but I do not have many spare hours).

Bert
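
For reference, the behaviour of the CObject constructor rule quoted above can be reproduced in isolation. This is only a sketch: `NodeIdCheck` and `checkNodeId` are hypothetical names, and the standard `String.isEmpty()` is used in place of `StringUtils.isEmpty` (an assumption made to keep the example free of dependencies; the two agree for non-null strings).

```java
class NodeIdCheck {
    /** Mirrors the quoted rule: null passes through, an empty string is rejected. */
    static String checkNodeId(String nodeID) {
        if (nodeID != null && nodeID.isEmpty()) {
            throw new IllegalArgumentException("empty nodeID");
        }
        return nodeID;
    }

    public static void main(String[] args) {
        System.out.println(checkNodeId("at0001")); // at0001
        System.out.println(checkNodeId(null));     // null
        try {
            checkNodeId("");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage()); // rejected: empty nodeID
        }
    }
}
```

So the rule as quoted treats the node id as optional but never empty.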