Modelling pattern - imaging examination - what do you think?

I forgot to mention something important: in ADL 1.4-based systems (currently most vendors), the query engine can work out whether there is data from specialised archetypes just by searching on the top archetype ID with any extended form of the concept part of the ID, i.e. if the parent is


in ADL 1.4, children are named like




So the query engine just has to know to search for openEHR-EHR-CLUSTER.exam%.v1 or similar (maybe there is something smarter you can do - need to check with SQL experts); there's no need to have access to the archetype repository.
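As a rough illustration of the "something smarter" on the SQL side, here is a minimal sketch using Python's built-in sqlite3. It assumes a hypothetical `entries` table with an `archetype_id` column, ADL 1.4 IDs of the form `<rm_entity>.<concept>.v<N>`, and children that extend the parent concept with `-`. Two details matter in LIKE patterns: `_` (common in concept names) is itself a single-character wildcard, and a bare `%` straight after the concept also matches unrelated concepts that merely start with the same letters. The child IDs below are made up.

```python
import sqlite3


def escape_like(text: str) -> str:
    """Escape LIKE wildcards so '_' and '%' in archetype IDs match literally."""
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def specialisation_pattern(parent_id: str) -> str:
    """LIKE pattern for ADL 1.4 syntactic specialisations of parent_id.

    Assumes IDs of the form <qualified_rm_entity>.<concept>.v<N> and that
    children extend the parent concept with '-' (exam -> exam-xyz). Like the
    'exam%.v1' example, it assumes the child has the same version suffix.
    """
    rm_entity, concept, version = parent_id.rsplit(".", 2)
    return f"{escape_like(rm_entity)}.{escape_like(concept)}-%.{escape_like(version)}"


def find_entries(conn: sqlite3.Connection, parent_id: str) -> list[str]:
    """Return stored archetype IDs equal to parent_id or specialising it."""
    # Note: SQLite's LIKE is case-insensitive for ASCII letters by default.
    sql = (
        "SELECT archetype_id FROM entries "
        "WHERE archetype_id = ? OR archetype_id LIKE ? ESCAPE '\\' "
        "ORDER BY archetype_id"
    )
    rows = conn.execute(sql, (parent_id, specialisation_pattern(parent_id)))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (archetype_id TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?)",
    [
        ("openEHR-EHR-CLUSTER.exam.v1",),
        ("openEHR-EHR-CLUSTER.exam-fallopian_tube.v1",),  # hypothetical child
        ("openEHR-EHR-CLUSTER.examination_notes.v1",),    # hypothetical unrelated concept
        ("openEHR-EHR-CLUSTER.device.v1",),
    ],
)
print(find_entries(conn, "openEHR-EHR-CLUSTER.exam.v1"))
# → ['openEHR-EHR-CLUSTER.exam-fallopian_tube.v1', 'openEHR-EHR-CLUSTER.exam.v1']
```

A naive `LIKE 'openEHR-EHR-CLUSTER.exam%.v1'` would also return the hypothetical `examination_notes` row, which is why the sketch requires the `-` after the parent concept.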

Yes, but I’ve been focused on RM, not modelling, and things are different when it comes to these two :slight_smile:

When it comes to your question, it's a difficult one. Some of the points Tom made in response to you deserve a dedicated response of their own, but I'm not sure I can find the time for that, so I'll try to respond to you (mainly).

IMHO the criteria for choosing between inheritance and composition differ between the software development and data modelling contexts in openEHR. Even though the terms have largely the same meanings, their mechanics differ between object-oriented programming languages and openEHR models.

I think I can only share some thoughts I consider to be decision criteria, and I’d expect someone to compose those (no pun intended) when making modelling decisions. These are the thoughts of a programmer, not a clinician, but I’d expect my arguments to make sense to clinicians as well.

openEHR archetypes are meant to be maximal datasets. They tend to grow in content over time, and that growth is meant to keep clinical data created years later compatible with earlier data. Breaking changes in models reintroduce the very "this lump of data is not compatible with that lump of data" problem that we're trying to minimise with openEHR, though the problems introduced by breaking changes in archetypes are far more isolated and controllable than chasing a retired nurse for the source code of the Delphi program they wrote, which has been running in a department for 18 years…

So one criterion for a modeller is: how important is it for them to ensure data compatibility between versions?
This criterion interacts with another one: how much freedom would the modeller like to give other modellers to reuse their model? If we keep adding more data points to an archetype, inheritance turns them into a convenient set of data points for any archetype specialising it (inheriting from it). This is where the second level of openEHR's design has an advantage over its first level and over mainstream OO languages: a specialisation can throw away the data points that are not useful or relevant via templating, but there's no such option in implementations of the RM or in mainstream OO programming languages: you have to live with what you inherit.

So archetype modelling is more robust than data modelling in programming languages, because the templating mechanism lets it deal with the antipattern known as the god class.

There are gotchas, though. If templating and specialisation were all we needed, openEHR would have a single archetype with an ever-growing number of data points, and templates would eliminate everything they did not need, just to keep a few data points.

Leaving aside the difficulty of navigating such a semantic beast, there's one other criterion that stops this from happening, a modelling criterion that also interacts with the others: is this a mandatory data point?
See, mandatory data points bring back the problem from OO languages: you can't get rid of them by templating, so if you inherit from the archetype, you have to live with that data point. Our modeller now has a choice. If they put the mandatory data point into an archetype, reuse via inheritance comes at a price. Not only that, but data based on future versions of that archetype must also populate this field, so there's a responsibility imposed on other modellers even if they never inherit from it. (To conclude my point above: if openEHR had a god archetype, it'd have so many mandatory data points that it'd be impossible for downstream users to produce anything sensible via templating.)
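To make the asymmetry concrete, here is a toy Python model (not an openEHR API; the classes and data point names are hypothetical). A specialisation inherits every parent data point; a template may drop optional ones, but trying to drop a mandatory one is rejected, since templates can only narrow constraints.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataPoint:
    name: str
    mandatory: bool = False


@dataclass
class Archetype:
    """Toy stand-in for an archetype: just a name and a flat list of data points."""
    name: str
    points: list[DataPoint] = field(default_factory=list)

    def specialise(self, name: str, extra: list[DataPoint]) -> "Archetype":
        # A specialisation inherits every data point of its parent and may add more.
        return Archetype(name, self.points + extra)


def apply_template(archetype: Archetype, remove: set[str]) -> list[str]:
    """Return the data points a template keeps after removing the named ones.

    Optional points can be removed; mandatory ones cannot, so trying to
    remove one raises ValueError.
    """
    kept = []
    for point in archetype.points:
        if point.name not in remove:
            kept.append(point.name)
        elif point.mandatory:
            raise ValueError(f"cannot remove mandatory data point {point.name!r}")
    return kept


# Hypothetical parent and child; the data point names are made up.
exam = Archetype("exam", [DataPoint("body_site", mandatory=True),
                          DataPoint("description"), DataPoint("image")])
tube_exam = exam.specialise("exam-fallopian_tube", [DataPoint("patency")])

print(apply_template(tube_exam, {"image", "patency"}))  # ['body_site', 'description']
```

Calling `apply_template(tube_exam, {"body_site"})` raises `ValueError`: every specialisation of `exam`, and every template built on it, carries `body_site`.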

Most of the benefits of composition over inheritance in OO programming language land come from avoiding having to deal with stuff you inherited and cannot omit. My humble opinion is: if you don't have a strong conviction that a data point is mandatory, the combination of specialisation and templating is a nice way of offering reuse of your models.

Composition can still come to your help, though :slight_smile: One situation is when you're making various optional data points available to future specialisations, but there is some semantic cohesion within a subset of those data points, i.e. they're meant to be inherited together to be useful, they relate to each other, or there are invariants that must hold when a combination of the optional data points is used together.

You cannot express these without explicitly identifying that semantically coherent group of data points, and if they don't have any dependence (cohesion) on the rest of their siblings, you may want to switch to composition over inheritance by introducing a new archetype containing these data points, which would let you make the implicit points explicit.

The same applies when a data point being mandatory is conditional upon the use or values of other data points. In that case, there's no need to leave a mandatory data point high up in the archetype inheritance/specialisation hierarchy. You can pack the data points relevant to the mandatory one into an archetype and use composition (slots) to reuse it (optionally), while keeping the mandatory data point mandatory within that archetype. Now you've isolated that strong condition to a smaller model rather than forcing it as a contract on all specialisations. I think I saw a comment from @siljelb hinting in this direction, though not entirely:
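A toy Python sketch of that idea (hypothetical classes, not an openEHR API): the mandatory element lives inside a separate slotted cluster, so the rule is only enforced when the cluster is actually used, rather than on every specialisation of the host archetype. The archetype ID and element names are made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cluster:
    """Toy slotted CLUSTER archetype whose own rules include mandatory elements."""
    archetype_id: str
    mandatory: tuple[str, ...]
    values: dict[str, str] = field(default_factory=dict)

    def errors(self) -> list[str]:
        return [f"{self.archetype_id}: missing {name}"
                for name in self.mandatory if name not in self.values]


@dataclass
class Observation:
    """Toy host model with one optional slot (occurrences 0..1)."""
    values: dict[str, str] = field(default_factory=dict)
    slot: Optional[Cluster] = None

    def errors(self) -> list[str]:
        # The mandatory rule lives in the cluster, so it is only enforced
        # when the cluster is actually used in the slot.
        return self.slot.errors() if self.slot is not None else []


CLUSTER_ID = "openEHR-EHR-CLUSTER.hypothetical_group.v0"  # hypothetical ID

without_cluster = Observation({"finding": "normal"})
incomplete = Observation({"finding": "normal"}, Cluster(CLUSTER_ID, ("method",)))
complete = Observation({"finding": "normal"},
                       Cluster(CLUSTER_ID, ("method",), {"method": "ultrasound"}))
print(without_cluster.errors(), incomplete.errors(), complete.errors())
```

Only `incomplete` reports an error; the host model itself never has to carry `method` as a mandatory element.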

I think if the element mentioned in the quote above had some relevant data points, there could have been another way to make it available to AQL queries. If the data point and its kin had enough significance to become an archetype, then using composition (slots) to include it in models would make it possible to do

    SELECT cls/..data_point_we_want/.../value FROM ... CONTAINS CLUSTER cls[that_extracted_cluster_id]

because the cluster archetype would provide a semantic root from which we can access the data point, no matter where that semantic root sits in any other model that includes it. Happy to be corrected on this one.
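A minimal Python sketch of why the cluster acts as a semantic root: a `CONTAINS`-style search finds every node with the given archetype ID at any depth, and the data point is then read by a path relative to that node. The dict-based instance tree, the name-based paths (real AQL uses at-coded paths) and all IDs are simplifying assumptions.

```python
from typing import Any, Iterator

Node = dict[str, Any]


def contains(node: Node, archetype_id: str) -> Iterator[Node]:
    """Yield every node whose archetype_node_id is archetype_id, at any depth."""
    if node.get("archetype_node_id") == archetype_id:
        yield node
    for child in node.get("items", []):
        yield from contains(child, archetype_id)


def resolve(root: Node, path: list[str]) -> Any:
    """Follow child names from a semantic root; return the leaf value or None."""
    node = root
    for name in path:
        node = next((c for c in node.get("items", []) if c.get("name") == name), None)
        if node is None:
            return None
    return node.get("value")


CLUSTER_ID = "openEHR-EHR-CLUSTER.extracted_cluster.v1"  # hypothetical


def cluster(value: int) -> Node:
    return {"archetype_node_id": CLUSTER_ID,
            "items": [{"name": "data_point_we_want", "value": value}]}


# The same cluster used at two different depths in two different host models.
composition: Node = {
    "archetype_node_id": "openEHR-EHR-COMPOSITION.report.v1",
    "items": [
        {"archetype_node_id": "openEHR-EHR-OBSERVATION.first.v1", "items": [cluster(1)]},
        {"archetype_node_id": "openEHR-EHR-OBSERVATION.second.v1",
         "items": [{"name": "nested_group", "items": [cluster(2)]}]},
    ],
}

# Rough analogue of: SELECT cls/.../value FROM ... CONTAINS CLUSTER cls[CLUSTER_ID]
values = [resolve(c, ["data_point_we_want"]) for c in contains(composition, CLUSTER_ID)]
print(values)  # [1, 2]
```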

So if I were a clinical modeller, these are the things I'd keep an eye on when making decisions. They may be entirely rubbish, of course, in which case I more than welcome some education :slight_smile:

ps: even though I said this deserves a dedicated post, I have to say, @thomas.beale: to be specific, subtype polymorphism is undefined in AQL as things stand, as far as I know. If any vendors are implementing it, it'd be interesting to know :slightly_smiling_face:
ps2: lots of grammar errors and typos, but I’m really busy, sorry


The specialised archetypes all (AFAIK) follow the same naming pattern as the one you've found and shown above. If modellers keep being as clever as they have been until now and stick to that naming convention, your suggestion will work.

Hi @bna,

EHRbase currently does not allow wildcards in the archetype ID, so a query on a parent archetype would not catch any entries based on its specialisations. I think we could add this without breaking things, but it's not on the roadmap yet.

To be clear: my intended meaning is not that the query author has to think of putting in the wildcard; the AQL engine should always do it automatically.


What would happen in the case where someone only wants to find data within the specialized archetype?

In my experience, databases rarely work with inheritance. For example, when creating a SQL query to find employees, I might not want all the entries from a person table, only the ones relevant to the set of employees. From a clinical safety point of view, I think being explicit about the information need would be desirable.


That would be a breaking change in our CDR. Such a requirement (if we agree on it) must be stated clearly in the specifications.


I would say it needs a switch to change the processing. We have to remember: if the query processor fails to pick up data of specialised versions of any archetype, it’s a real error, and it may have real-world consequences.

Not sure if this is true. Have a look at the PostgreSQL documentation on table inheritance. You can see that you have to do something special (ONLY) to get instances of the parent kind but not of the child kinds (here, cities is the parent table and capitals is the specialisation):

SELECT name, elevation
    FROM ONLY cities
    WHERE elevation > 500;

Ah, sorry, you are right and I misunderstood the case:

I was referring to the case where someone queries explicitly for the specialization. I think we have a common understanding that this should only retrieve data from instances of the specialized archetype.

For the query on the “parent”, it should retrieve data from the specialized instances, too.

Fully agree, it should be done this way.


Yep, that is certainly true.

I agree, wild-carding should be supported but not as a default.

Example from the Better docs:

    SELECT
        bp/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value as systolic,
        bp/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value as diastolic
    FROM EHR e[ehr_id/value='e119f88b-36b7-4537-9914-22bb9396e101']
        CONTAINS OBSERVATION bp[...]



so can be used for 1.4 syntactic specialisations.


Depends on where the wildcarding is. For the ‘v*’ part, you are right: hidden wildcards would be an error. Likewise, if the engine were to expand to *pressure, that also has to be a user choice.

But for any concept of the form xxxxx, not automatically retrieving data for the archetype xxxxx-* is an error. Note - the ‘-’ has to be there.

For ADL2 systems, archetype lookup is needed, since concept names don’t follow the xxxx-yyyy-zzzz pattern.
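For the ADL2 case, a minimal Python sketch of what "archetype lookup" could mean: given a child-to-parent map taken from an archetype repository (hypothetical here, with made-up IDs), compute the set of IDs a query should cover as the transitive closure of specialisations. This also encodes the semantics agreed earlier in the thread: a query on the parent includes its children, a query on a child does not include the parent, and an `only` switch (similar to PostgreSQL's `FROM ONLY`) restricts the query to the exact ID.

```python
from collections import defaultdict

# Hypothetical excerpt of an ADL2 archetype repository: child ID -> parent ID.
# In ADL2 the parent comes from the archetype's 'specialise' section, not its name.
PARENT_OF = {
    "openEHR-EHR-CLUSTER.tubal_exam.v1.0.0": "openEHR-EHR-CLUSTER.exam.v1.0.0",
    "openEHR-EHR-CLUSTER.left_tubal_exam.v1.0.0": "openEHR-EHR-CLUSTER.tubal_exam.v1.0.0",
    "openEHR-EHR-CLUSTER.device_check.v1.0.0": "openEHR-EHR-CLUSTER.device.v1.0.0",
}


def ids_to_query(archetype_id: str, parent_of: dict[str, str], only: bool = False) -> set[str]:
    """Archetype IDs that a query on archetype_id should cover.

    Default: the ID plus all of its transitive specialisations. only=True
    restricts the query to the ID itself, similar to PostgreSQL's FROM ONLY.
    """
    if only:
        return {archetype_id}
    children: dict[str, list[str]] = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    found: set[str] = set()
    stack = [archetype_id]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children[current])
    return found


print(sorted(ids_to_query("openEHR-EHR-CLUSTER.exam.v1.0.0", PARENT_OF)))
```

In a real CDR the closure would more likely be precomputed when archetypes are uploaded, rather than rebuilt per query.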


From a safety point of view, I think I disagree. I would want to explicitly include specialisations, not have them included automatically by default.

From a safety point of view, it is the other way around: it can easily be the case that some template is recording exactly the data in a more basic template, using a specialised version of the archetype, and the important data is in the inherited items, which could be the case for CLUSTER.exam-* archetypes. For the query processor not to pick up this data in response to a query using the parent archetype ID is unequivocally an error: it's the same data. The users are simply not seeing some of the data defined by the parent archetype just because it happens to be inherited into a specialised child.

We need to talk about this, it’s a serious issue! I’ve put it on the SEC agenda for next meeting.


Safety in this sense is IMHO about being consistent. A given query on the same data should give the same vendor-neutral result, and should also be consistent across versions of the CDRs.

There are some important rules to agree upon here. I agree.


Agree. But the result also needs to be not wrong :wink:


We use specialisation patterns extensively. We're still implementing AQL, and it's designed to return all data from specialisations. I think it's the only right design. Since you're querying for a concept, e.g. "imaging exam", and, as Thomas explained, "imaging exam of Fallopian tube" is_a "imaging exam", it should be returned by default. I even struggle to think of a scenario where you don't want the children, but there are probably edge cases.

Yes, this is the only convention that makes sense to me. Query on parent returns result of children. Query on child doesn’t give results for parent. There may be usecases where you want the reverse as an option, but I would spend most of my time making sure the default works.
With regard to the wildcard in the archetype ID: I only like the version number wildcard. Using it for specialisations is way too brittle IMHO. And I find it hard to imagine *-pressure giving useful data. E.g. cluster.tyre-pressure used in cluster.device is hardly something useful to query at the same time as blood-pressure. So I find this pattern dangerous. I'm not against it if people know what they're doing, but I wouldn't recommend it as a pattern for regular queries, e.g. for querying children.


It would be great to see some of these archetypes. There isn't much specialisation in the available CKM. I would like to see how you are doing it, i.e. the granularity and clinical domains.

I am also curious about how you handle distributed versioning of a hierarchy of specialisations. One example might be the need to change the parent archetype with a major change, like going from v1 to v2. Based on the naming regime for archetype IDs, you will not be able to see which version of the parent it is specialised from. How do you handle this?


You could have a look at my care plan design: Obsolete status for care plan composition archetype - #25 by joostholslag

Or our COVID vaccination archetypes (the link points to an Auth0 sign-in page).
(Register for a free account first. Please don't look at the rest of the design; it's driven by our limitations, e.g. no action support.)

In general, my advice would be to specialise at the minor version level, because that way patch updates (typos, etc.) are picked up automatically, but breaking changes are not.
One reason to specialise at patch level is if you have a hard-coded UI, where an extra character may break the layout of a text label.
Another would be if you don't want new elements to show up in the UI automatically.
Major updates include breaking changes; those should generally be reviewed before updating, IMHO. An exception could be a measurement scale or something else that is in itself a concept with versioning (a new version of the scale means a new archetype, not a new version of an archetype).
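The version-level advice can be pictured as a prefix match on version numbers. This is a hypothetical Python resolver written for illustration only; it is not ADL2's specified algorithm for resolving a parent reference. A reference pinned at minor level (`v1.2`) accepts any `1.2.x` patch, a patch-level pin (`v1.2.3`) accepts only that exact release, and a major-level pin (`v1`) accepts any `1.x.y`.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """'v1.2.3' or '1.2.3' -> (1, 2, 3)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def accepts(pinned: str, candidate: str) -> bool:
    """True if candidate matches the parent reference at the pin's precision.

    Hypothetical rule: compare only as many version components as the pin has.
    """
    pin = parse_version(pinned)
    return parse_version(candidate)[: len(pin)] == pin


print(accepts("v1.2", "1.2.5"))    # True: patch update picked up automatically
print(accepts("v1.2", "1.3.0"))    # False: minor update (e.g. new elements) needs review
print(accepts("v1.2.3", "1.2.4"))  # False: patch-level pin, e.g. for a hard-coded UI
```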

These recommendations come from a scenario where most archetypes are scales and forms are auto-generated from ADL2 templates. The template itself is often auto-generated from an ADL2 observation (wrapped in comp.result-report, with eval.clinical_assesment added).

Happy to share more or explain in detail, since the models are quite complex, and ADL2 is so different that it may be a challenge to understand from this message alone.


I think this is good advice once an archetype reaches v1. But can we specialise on the minor version in ADL 1.4?