CLUSTER.exam-specializations are expecting a previous version of CLUSTER.exam archetype

Pieter has probably already got all the logic working, but just in case, here is the wiki page that has (AFAIK) the right algorithm for bidirectional code conversion.

1 Like

@pieterbos / Archie’s conversion log works very well.
Pieter and I have validated (and fine-tuned) this by integrating it into CKM to reliably convert the various revisions of an adl14 archetype to adl2, while keeping the at/id codes consistent even with node deletions and additions.

So 1.4 → 2 conversion works quite well in CKM using Archie.
(There is the issue that you can only guess which revision of a specialisation belongs to which revision of the parent.)

Anyway, next step would probably be a 2->1.4 conversion.
If that works reliably as well (so that ideally you can round-trip as far as that is possible), it would be an important step towards also enabling adl2 archetype upload and basing more and more components directly on adl2, while continuing to offer adl14 export (and import as well, of course :face_with_peeking_eye:).

Funding obviously never hurts, if only to prioritize.
The main issue with adl2 support is that it is one of the biggest (most complex and comprehensive) changes to a tool like CKM that I can imagine.

Do we have an explanation of specialisation semantics for 1.4 in the specs? The current Archetype Technology Overview document reads as a description of the ADL 2.0 flavour of specialisation.

I had to look back through old releases - the AOM 1.4 doc has this para (oops):

Archetypes can be specialised. The formal rules of specialisation are described in the openEHR Archetype Semantics document (forthcoming), but in essence are easy to understand. Briefly, an archetype is considered a specialisation of another archetype if it mentions that archetype as its parent, and only makes changes to its definition such that its constraints are ‘narrower’ than those of the parent. Any data created via the use of the specialised archetype is thus conformant both to it and its parent. This notion of specialisation corresponds to the idea of ‘substitutability’, applied to data.

ADL 1.4 doesn’t discuss specialisation other than to mention the specialise section.

But here is what was in the ADL 0.9 document (2004):

5.4 Specialisation
Archetypes can be specialised. The primary rule for specialisation is that data created according to the more specialised archetype are guaranteed to conform to the parent archetype. Specialised archetypes have an identifier based on the parent archetype id, but with a modified section, as described earlier.
Since ADL archetypes are designed to be usable in a standalone fashion, they include all the text of their definition. This applies to specialised archetypes as well, meaning that the contents of the ADL file of a specialised archetype contains all the relevant parts from its parent, with additions or modifications according to the specialisation rules. In an analogy with object-oriented class definitions, ADL archetypes can be thought of as always being in “inheritance-flattened” form. Validation of a specialised archetype requires that its parent be present, and relies on being able to locate equivalent sections using node identifiers. For this reason, nodes in specialised archetypes carry either the same identifiers as the corresponding nodes in the parent, or else a node identifier derived by “specialising” the parent node id, using “dot” notation. The following describes in detail how archetypes are specialised.

I started work on ADL2 in about 2007, so it looks like I never attempted to write an explanation for 1.4 - and there is no good explanation available. ADL 1.4 archetypes are pre-flattened archetypes, so specialisation = copy+fork.
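To make the contrast concrete, here is a minimal sketch (hypothetical archetype ids, node codes and constraints; language, terminology and other metadata sections omitted) of what an ADL2 differential child looks like. Unlike a 1.4 copy+fork, it contains only the overridden and added nodes, identified by “dot” specialisation of the parent’s id codes:

```
archetype (adl_version=2.0.5)
    openEHR-EHR-CLUSTER.exam-skin.v1.0.0

specialize
    openEHR-EHR-CLUSTER.exam.v1

definition
    -- the root node id gets a '.1' suffix; only changed nodes appear
    CLUSTER[id1.1] matches {
        /items matches {
            ELEMENT[id2.1] occurrences matches {1} matches {   -- narrowed: parent allowed 0..1
                value matches { DV_CODED_TEXT[id3.1] }         -- narrowed: parent allowed DV_TEXT
            }
            ELEMENT[id0.1] matches {                           -- id0.N = node newly added at this level
                value matches { DV_TEXT[id0.2] }
            }
        }
    }
```

The flattened form, equivalent to what a 1.4 archetype would contain in full, is then computed by the tooling rather than maintained by hand.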

The only workable form of specialisation of anything is always differential :wink:

Thanks Tom. It did not occur to me to look at pre-1.4 versions of the ADL docs. I suspect another source of useful information is the content in the wiki. The current wiki link on the AM landing page (at least for ADL, I think) is broken, but I did a search in the wiki for ADL and lots of useful stuff came up. I have not gone through it all yet.

The reason for the question, and for me looking for documentation, is to get a clearer picture of the changes to how specialisation is handled at the syntax/semantics level. There’s knowledge in the form of code in Archie etc., but that’s not as convenient as having it in the specs :slight_smile: I’ll try to spare an hour or so today to do some reading. I have not touched this part of things for years; it’s good to refresh every now and then.

1 Like

Here’s the ADL2 explanation with numerous syntax examples.

The AOM2 spec has specialisation semantics all over the place, and you will see a lot of conformance functions like this, where the ‘other’ parameter is the corresponding node from the flat parent archetype.

--
-- C_ATTRIBUTE conformance to parent
--

c_conforms_to (other: like Current): Boolean
        -- True if this node on its own (ignoring any subparts) expresses the same or narrower constraints as `other'.
        -- Returns False if any of the following is incompatible:
        --     * cardinality
        --     * existence
    require
        other /= Void
    do
        Result := existence_conforms_to (other) and
            ((is_single and other.is_single) or else
            (is_multiple and cardinality_conforms_to (other)))
    end

c_congruent_to (other: like Current): Boolean
        -- True if this node on its own (ignoring any subparts) expresses no additional constraints than `other'.
    require
        other /= Void
    do
        Result := existence = Void and ((is_single and other.is_single) or
                (is_multiple and other.is_multiple and cardinality = Void))
    end

existence_conforms_to (other: like Current): Boolean
        -- True if the existence of this node conforms to other.existence
    require
        other_exists: other /= Void
    do
        if existence /= Void and other.existence /= Void then
            Result := other.existence.contains (existence)
        else
            Result := True
        end
    end

There’s also section 8.1.2.2 on validation, flattening and diffing.

The documentation isn’t as good as I’d like - even with pulling a fair bit of the code from the ADL2 workbench where it was worked out. There’s a lot of code in there that is reasonably readable if you want diffing / flattening algorithms, for example. The 3-phase validator is also useful for understanding the details. Getting these algorithms right (according to the regression tests :wink:) took about 5 years.

Archie’s equivalent code is mostly under tools, and I seem to remember @pieter telling me he created different (maybe more concise) diffing and flattening algorithms.

1 Like

And I look forward to seeing a more formal description of how specialisation should be used: when to use it, and its pros and cons for clinical modelling.

Over the years I have moved more in the direction of slots with specific clusters for specific details. IMHO this is a more scalable approach than specialisation. I think this is not (only) an AM/ADL question. I am afraid people will use too much specialisation if it becomes more easily available.

The problem described initially in this thread will be there with ADL2 as well!? When creating hierarchical bindings, there will be an issue with changes. This will degrade the overall capacity for change in openEHR modelling.

I have not yet seen descriptions of how this will impact the overall governance of archetypes.

An example which illustrates my point is this great question: Archetype specialization vs new archetypes

The answer from @siljelb is good. Still, the question of management is not (only) about the problems with ADL 1.4 and copying. No matter how this is technically implemented or managed, you still have the semantic and governance connection, which needs to be managed in the long run. We can make it technically simpler, but the information model retains the same complexity, which only a few people globally will be able to cope with.

Well, we actually need both. Consider any archetype with a slot for some purpose like ‘device’ or ‘physical exam’. What do you want to allow in that slot? Archetypes for devices, or for physical exams. Now, if you can specify the slot with a constraint like openEHR-EHR-CLUSTER.device_description or openEHR-EHR-OBSERVATION.physical_exam, and there are specialised children of both of those, then nothing else is needed. But if there is no specialisation, a) those archetypes can’t be authored properly in the first place, and b) specifying slots becomes a real problem - you are stuck with enumerating all the possible device archetypes and physical exam archetypes, and that list keeps changing. You can fake it a bit by requiring that the names of all those archetypes match some pattern, and that will work, but it provides no guarantee of what data points could appear in the slot, whereas with specialisation, you are guaranteed to have certain data points.
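As a concrete illustration (ADL 1.4 slot syntax; the node codes and archetype names are hypothetical), the two approaches to constraining a slot look something like this:

```
-- Slot admitting the device_description archetype and, via the ADL 1.4 naming
-- convention (specialised ids append '-xxx' to the parent id), any specialisation of it:
allow_archetype CLUSTER[at0003] occurrences matches {0..*} matches {
    include
        archetype_id/value matches {/openEHR-EHR-CLUSTER\.device_description(-[a-zA-Z0-9_]+)*\.v1/}
}

-- Without specialisation: enumerate every permitted archetype, and maintain the list forever
allow_archetype CLUSTER[at0004] occurrences matches {0..*} matches {
    include
        archetype_id/value matches {/openEHR-EHR-CLUSTER\.infusion_pump\.v1|openEHR-EHR-CLUSTER\.catheter\.v1/}
}
```

The first form only delivers a guarantee about data points if the matched children really are formal specialisations of the parent; the second gives a closed set but has to be revisited whenever a new candidate archetype appears.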

So it’s really about how we achieve re-use and the balance between composition and specialisation. If we throw out specialisation altogether, we lose first class substitutability, which really reduces the power of any formalism.

We rarely specify which archetypes should go into a SLOT; most commonly we keep it open for any archetype and leave it to the template. Or, in cases where we know one or several archetypes, we add them as “included”, but still don’t restrict others. Only in cases where there is no doubt, or where there is no conceivable use case for other archetypes, do we restrict to named archetypes.

1 Like

Our advice to new modellers is very much in line with what @varntzen and @bna advocate: less inheritance and more slot-filling/aggregation. ADL2 will actually help that, since it essentially allows those slot-fills to be fully concretised. Pretty close to Embedded templates, but where e.g. a cardiac ultrasound imaging result is a specialisation of the generic imaging result, with the new content defined by a slotted-in details CLUSTER archetype. In my head I’m calling these ‘aggregated archetypes’.

And I totally agree with keeping slots open in those generic archetypes.

2 Likes

I’m not sure I argued against specialisations :slight_smile: I just stated that the modelling pattern is now not to be too specific about what should go into any SLOT, because we a) don’t know what the future archetype to be put into that slot will be (it might not be made, or we don’t know how it will be named) and b) don’t want to backtrack through all archetypes to correct the slots when there has been a change to an existing archetype.

1 Like

Fair enough, sorry for overstating on your behalf!! Based on the work we did in the past using a lot of inheritance/specialisation, that turned out to be a bit of a blind alley, and maximising ‘generality’ is much more manageable using aggregated clusters. But we are really still learning as we go in areas of very fine granularity like physical examination, imaging results and detailed anatomical pathology.

3 Likes

It’s really a question of balance. If every exam-like thing (to use a current example) has 6 common attributes, and whatever number of varying attributes, then it’s a strong candidate for specialisation from a common ‘core’ exam archetype, with the extra attributes modelled locally in each archetype and/or via slots defined either locally or in the parent.

I agree that we should use composition (slots) in a significant number of cases, but not using specialisation where it helps makes things harder for a) model definition and b) querying. Also, specialisation is usually essential to make composition work properly.

Formal substitutability at run-time (which is the consequence of specialisation) means that a lot of querying power comes for free - querying for any of those core exam attributes, for example - since it relies on the query engine knowing that all kinds of exams are specialisations of the core one. If all those exams have no specialisation relationship, writing the query becomes nearly impossible. The same argument applies to any kind of clinical data that has a lot of variations on a common base.

All I’d really advocate for is that with ADL2/AOM2 (which implements specialisation properly), it’s worth figuring out what the right balance is. ADL1.4 archetype modelling is not a good guide to that.

Just a bunch of opinions from a different angle, to give you some food for thought Tom :slight_smile: :

Maybe you could consider distinguishing the use of specialisation for describing clinical concepts (archetype authoring via differentials à la ADL2) from composing the descriptions of clinical concepts by using slots and describing what’s allowed in a slot. These are not necessarily the same use cases for inheritance.

“…you are stuck with enumerating…” → Some may call this being specific, explicit and preferring a closed set.
“…and that list keeps changing…” → not every day. Slower actually, as the applications mature. Some may prefer the tests to break if that list changes btw :slight_smile:

Well then, don’t fake inheritance with archetype names, instead provide a closed set, and you’re back to knowing which data points are guaranteed to appear in the slot, for each member of the closed set.

I wish you could see this as a trade-off between having a guaranteed set of data items in a slot and not having to discover common data items between two different sets of data, which is what I suspect is pushing @bna to his approach. Consider the case in which the author of the archetypes has no clue about the requirements of those who compose the archetypes into templates (via slots), as in Bjorn’s case. At best they’ll have to work together to follow the maximal data set principle for the next version of an archetype; at worst, they’ll fail to find the common subset but will have delayed actual solution delivery, which again is what I suspect is hitting @bna.

Admittedly, preferring explicit one-of semantics (for the curious, google ‘sum types’ please) over is-a semantics will introduce challenges in other areas, say AQL :slight_smile: My point is that implementation concerns and opinions may point at a different way of approaching modelling, and it may be good to consider these. Maybe even support them with ADL constructs (better than regexes, that is).

Just some opinions as I said in the beginning. I may be wildly off the mark of course, given where I take my archetype advice these days: Archetypes with Meghan: A Spotify Original Podcast | Archewell

2 Likes

I’ll just note that there’s a reason no-one does this in software development :wink:

Oh it’s certainly a trade-off - that’s why specialisation isn’t used alone to achieve re-use & substitutability - it would make for terrible archetypes and terrible software.

What I would say here is that in a design sense, it’s never just a question of whether there may be some common data points, but whether the overall concept of the archetypes is essentially the same (e.g. they are all some kind of ‘exam’). There are certainly situations in which there are superficially similar sets of data points (or maybe even not that superficial) that still don’t point to specialisation as the solution.

All I am really saying is that:

  • ADL2 does inheritance properly (such that specialising an archetype doesn’t create a fork), plus fixes 20+ problems identified in the past, including … everything to do with templates
  • what is thought to be the optimum balance of specialisation and composition is influenced by the inability of ADL1.4 archetypes to do specialisation properly, so I recommend that when we get to using ADL2, that optimum should be reviewed
  • quite possibly, we still have not realised the benefits of type-substitutability at an archetype level, for runtime querying, which is the main deal in any system…

These are indeed my main concerns: a) being able to (possibly machine) create queries that are efficient, powerful and succinct and b) being able to build more resilient applications that can deal with e.g. all varieties of a specialised parent archetype, rather than just an ad hoc collection of named archetypes, or archetypes whose names happen to match some regex.

Agree - the thing I want to add is a constrainer type for ARCHETYPE_HRID class, i.e. C_ARCHETYPE_HRID. It would support operators like SNOMED’s ‘<<’, and ‘<’, meaning ‘any specialisation child of this archetype’. And so on. See this wiki page for an 8-year old proposal to do this.
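For illustration only - this syntax does not exist in any current ADL release, and the operator semantics are taken from the proposal above, so treat the whole snippet as hypothetical - such a C_ARCHETYPE_HRID constraint might look like:

```
-- Hypothetical syntax: '<<' = this archetype or any specialisation descendant of it,
-- '<'  = any strict specialisation descendant
allow_archetype CLUSTER[id4] matches {
    include
        archetype_id matches {<< openEHR-EHR-CLUSTER.exam.v2}
}
```

This would replace both the enumerated lists and the name-pattern regexes with a single constraint that the archetype repository can evaluate against the actual specialisation hierarchy.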

I’m not trying to tell anyone to model archetypes differently, I’m just suggesting that when we move the ecosystem to ADL2, it would be worth reviewing the balance of specialisation v composition, since specialisation will now work.

1 Like

and I’ll just note that I hope as many people as possible think this way :wink:

1 Like

haha touché!

1 Like

Hi @bna ,

The original pattern that I advocated, built & tested for the Physical exam archetypes was nesting CLUSTERs that shared common (manually copied) core (but not necessarily all) data elements, to keep patterns aligned while simultaneously trying to avoid the issues that come from ADL specialisation/tooling. Not ideal in many ways, but modelling each CLUSTER explicitly meant we modelled exactly and precisely the relevant content for that use case; no data elements were included that were not directly relevant to the clinical concept.

Specialisations work most effectively where the patterns are universal, and in reality even strong patterns in clinical medicine very often break the rules, unless the core pattern is so generic it is nigh on useless in an implementation. In some (many?) use cases, whether using ADL 1.4 or 2.0, overlaying rigid patterns on child archetypes via specialisation can increase templating complexity and ambiguity in some areas as much as it might add benefits in others.

It may be contentious to some, but my philosophy has always been that the clinical content in archetypes should primarily be designed to be accurate, unambiguous and understandable to the clinicians (first priority) and to support modellers with minimal training to represent the data appropriately (second), not for technical elegance or purity. Implementation requirements should be considered as a third priority.

In any case, creating coherent families of patterns is not simple and whether manually managed or specialised, there are arguments for and against. So Physical examination was our first test bed of this approach - with a fractal mix-and-match design requirement for the detailed examination of specific body regions/parts etc. By early 2019 we had developed many archetypes using this pattern and it worked well from a clinical modeller’s point of view.

In February 2019 @Silje and I were requested to attend a meeting where you & @anca.heyd pushed very strongly for a specialisation pattern for the Physical examination CLUSTERs. You argued that DIPS needed to be able to query exam findings based on specialisations and especially the need to add the index ‘System or structure examined’ data element carrying a SNOMED code for the system or structure already identified in the archetype name. This made the focus of the examination explicit and unambiguous in the model as well as supporting your implementation requirement to be able to query for all related examination findings based on the SNOMED hierarchy eg all chest exam findings - chest wall, heart, lungs etc. This is my recollection of the reason and subsequent justification for the development of a parent (generic) CLUSTER.exam and transformation of all of the existing Physical exam CLUSTER archetypes available at the time to specialisations.

I remember the meeting well because the consequences of changing the modelling pattern at that point were already huge. As one of the most experienced implementers at the time, we listened and (reluctantly - me, at least) agreed to your proposal. I then remodelled the CLUSTERs as specialisations - from CKM records it seems that work started in July 2019 and has been gradual and ongoing. This was NOT a trivial task - many hours, attention to detail, and considerable cost.

Since then more Physical exam specialisation archetypes have been added to the library, as well as breaking changes to the CLUSTER.exam.v1 resulting in a v2. None of the CLUSTER specialisations have been upgraded (a manual process, due to ADL1.4 & current tooling) and one new examination CLUSTER for placenta was added recently. When we started modelling the imaging exam domain, which has similar fractal, mix-and-match requirements, we naturally used the same pattern.

This is the history as I recall my involvement…

3 years later, it is incredibly frustrating to find this thread playing out. The consequences of reverting back to the previous patterns are even greater now - larger numbers of archetypes across multiple domains.

@bna, most of all, I would appreciate you clearly explaining to us all the reason for changing your mind - in the thread above I can only see you’ve stated ‘over the years I’ve moved more in the direction of slots with specific clusters… this is more scalable than specialisation….’ OK, but precisely what has shifted between your very strongly held view in 2019 and your current view?

I have always favoured the nesting-of-CLUSTERs approach, so while I would welcome the reversion back to the original patterns in principle, it would be very helpful if we can understand more about what you have learned in those intervening years as we weigh up our choices, especially the cost of reverting in terms of availability of skilled people, resources and $$. It has implications for us all to understand how best to solve this; so that the new Clinical modelling leadership group can document the philosophy/logic/rationale as a critical consideration for modelling decisions in the future, and so we can develop an agreed approach to creating aligned families of archetypes: essentially the pros & cons of specialisations vs non-specialised ‘pseudo copies’.

(Please don’t @ me about ADL 2.0 - that’s a related but somewhat separate issue from pattern design POV. Assume for this discussion that perfect tooling for ADL specialisation design and implementation exists)

Cheers

Heather

2 Likes

Dear Heather,

I am sorry if my writing caused some confusion. I should have been more precise when talking about specialisation and composition patterns. I didn’t make it explicit that I was talking about generic modelling patterns, both for information models and for more traditional models like classes in e.g. C#, Java or TypeScript. When I said I have moved from specialisation to composition, I meant both for programming patterns and for my thinking about clinical modelling. This is not only about specialisation and/or cluster-in-cluster.

It is also related to the maximal data set pattern. Maybe we can create smaller and more sustainable archetypes, combined with SLOTs for the specific details? Maybe this can make long-term governance more maintainable? There are question marks here, since I really don’t know the best solution for all requirements in the different clinical domains, the technical impact on vendors, or the impact on global governance of archetypes.

For archetypes: I don’t think I have ever modelled a specialised archetype for production usage. I explored the capability but never found a good way to make it scale. This could be due to the tooling or to ADL 1.4; I think it is mostly because it didn’t fit the modelling domain or the human brain :slight_smile:

I don’t remember the details of the meeting you mention back in 2019. I do remember looking at the patterns for physical examinations over some time. The way it was modelled, with a shared set of attributes to define the system or structure examined, surely had some features pointing in the direction of specialisation.

I also think it would be easier to develop AQL to get all physical examinations, using some terminology to define the system or structure examined.

I don’t think I vetoed anything on this; I just shared my thinking.

There could be several ways of achieving this, e.g. using a CLUSTER with an attribute/element defining the system or structure examined, and then expanding by slots for the specific examinations and systems examined.

What I also wanted was a review or discussion with the technical group, to make sure we had the right consensus in this important modelling domain. It could even be that some patterns were candidates for an upgrade of the RM. We in openEHR should be an agile community, able to work horizontally and vertically on such changes.

I don’t have the definitive answer on how this should be done. There might be different modelling problems which require different patterns. When introducing a pattern, e.g. CLUSTER-in-CLUSTER or specialisation, it would be nice to have some background info about the decision and the functional and non-functional requirements behind it.