Syntax for including archetypes in SLOTs, regardless of version

Hi,

Sebastian Garde and I had a brainstorm a while ago about how to handle inclusion of archetypes in SLOTs (either CLUSTERs within ENTRY archetypes, or ENTRY archetypes within COMPOSITIONs or SECTIONs). At the moment this has to be stated explicitly (whether because of tooling or the specifications, I don’t know), so in order to include, for example, all historical versions and specialisations of the Body Mass Index archetype in a COMPOSITION or SECTION, I have to include all three of openEHR-EHR-OBSERVATION.body_mass_index(-[a-zA-Z0-9_]+)*.v0, openEHR-EHR-OBSERVATION.body_mass_index(-[a-zA-Z0-9_]+)*.v1 and openEHR-EHR-OBSERVATION.body_mass_index(-[a-zA-Z0-9_]+)*.v2. If we ever make a v3 BMI archetype, it will also need to be added. This is a hassle when modelling archetypes in the first place, and an even worse problem when governing them over time.

Based on the discussion I had with Sebastian, and with the kind help of some regex geeks on Twitter (you know who you are), I propose one of the following as the default syntax for including any version of a given archetype in a SLOT:

Hi Silje

I may not have got this right, but why not “v[0-9][0-9]?” (or simply not care about what follows “body_mass_index”)?

In other words, add a pattern to catch “any single (possibly double) digit version number” (?).

This looks like a straightforward case of “constrain to openEHR-EHR-OBSERVATION.body_mass_index”.

All the best

Athanasios Anastasiou

v[0-9][0-9] would also include v00-v09, which we don’t want.

\d{1,2}

Karsten
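A quick way to compare the version-number patterns suggested above is to test them against a few sample version strings. The sketch below uses Python's `re.fullmatch`, assuming the version part has to match in full (a point the thread only settles later); the `no_leading_zero` variant is a hypothetical addition, not one proposed in the thread. It shows that both `[0-9][0-9]?` and `\d{1,2}` accept a leading zero such as `05`, so ruling out v00 to v09 needs an explicit alternative.

```python
import re

# Patterns for the digits after "v" only; the rest of the archetype id is ignored here.
CANDIDATES = {
    "athanasios": r"[0-9][0-9]?",         # one or two digits
    "karsten": r"\d{1,2}",                # one or two digits (same set for ASCII input)
    "no_leading_zero": r"0|[1-9][0-9]?",  # hypothetical: 0-99 without a leading zero
}

SAMPLES = ["0", "2", "05", "10", "100"]


def accepts(pattern: str, version: str) -> bool:
    """Return True if the whole version string matches the pattern."""
    return re.fullmatch(pattern, version) is not None


# Print which sample versions each candidate pattern accepts.
for name, pattern in CANDIDATES.items():
    print(name, [v for v in SAMPLES if accepts(pattern, v)])
```

Under full matching, the first two patterns accept the same samples (0, 2, 05, 10), while the hypothetical variant drops 05; none of them accepts a three-digit version.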

Silje,

just as a technical note, the proper regex for including

openEHR-EHR-OBSERVATION.body_mass_index(-[a-zA-Z0-9_]+)*.v0, openEHR-EHR-OBSERVATION.body_mass_index(-[a-zA-Z0-9_]+)*.v1 and openEHR-EHR-OBSERVATION.body_mass_index(-[a-zA-Z0-9_]+)*.v2

is
openEHR-EHR-OBSERVATION.body_mass_index(-[a-zA-Z0-9_]+)*.v(0|1|2)

or

openEHR-EHR-OBSERVATION.body_mass_index(-[a-zA-Z0-9_]+)*.v[0-2]

To allow any version, just leave the version id part off entirely.

Note that different major versions of an archetype are technically different archetypes - i.e. they contain some breaking change. So whether allowing any major version of an archetype in a slot is a good default probably needs to be thought about carefully.

- thomas
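Thomas's two forms can be checked side by side. The sketch below assumes Python's `re` module, escapes the dots so they only match a literal `.` (unescaped, `.` matches any character), and uses a hypothetical specialisation id `body_mass_index-child`. It shows that `v(0|1|2)` and `v[0-2]` accept the same ids, and that whether leaving the version part off matches every version or none depends on whether the pattern is applied as a partial match (`re.search`) or a full match (`re.fullmatch`).

```python
import re

# Shared part of the pattern; dots escaped so they match only a literal ".".
PREFIX = r"openEHR-EHR-OBSERVATION\.body_mass_index(-[a-zA-Z0-9_]+)*"
ALTERNATION = PREFIX + r"\.v(0|1|2)"
CHAR_CLASS = PREFIX + r"\.v[0-2]"
NO_VERSION = PREFIX  # version id part left off entirely

IDS = [
    "openEHR-EHR-OBSERVATION.body_mass_index.v0",
    "openEHR-EHR-OBSERVATION.body_mass_index.v1",
    "openEHR-EHR-OBSERVATION.body_mass_index-child.v2",  # hypothetical specialisation
    "openEHR-EHR-OBSERVATION.body_mass_index.v3",
]


def full_matches(pattern: str, ids: list[str]) -> list[str]:
    """Ids matched by the pattern from first to last character."""
    return [i for i in ids if re.fullmatch(pattern, i)]


def partial_matches(pattern: str, ids: list[str]) -> list[str]:
    """Ids containing a match of the pattern anywhere."""
    return [i for i in ids if re.search(pattern, i)]


print(full_matches(ALTERNATION, IDS) == full_matches(CHAR_CLASS, IDS))  # True
print(len(full_matches(NO_VERSION, IDS)), len(partial_matches(NO_VERSION, IDS)))  # 0 4
```

With full matching, the version-less pattern accepts nothing; with partial matching, it accepts every version, including the v3 that neither enumerated form allows.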

Ah, sorry, was misled by *number*.

Karsten

For reference, these are the various regexes I use in the ADL workbench.

Thanks Thomas,

The idea here is that we (likely in 99% of the cases) do want to include any version of the archetype, so the .v[0-2] variant isn’t relevant. The reason for this is that even though an archetype will have breaking changes from one major version to the next, the clinical concept will stay the same (or it should have a completely new ID). We don’t generally include archetypes based on their specific content at the time of inclusion, but on the clinical concept they represent.

If leaving the version part out completely is the correct way to leave versioning open when including archetypes, CKM will need to change its behaviour, since it currently rejects any archetype that does this:

Hi Silje, hi Thomas, hi all,

Whether the CKM validation errors from below are correct or bogus boils down to my question from before whether it is valid or not to just leave the version part out completely.

In my understanding, the regex needs to be fully matched, which means you cannot just leave the version part out completely. However, as far as I can see this is not 100% clear from the specs (see my excerpt from the ADL2 specs from before).

If we assume the regex does NOT need to be a full match, then the validation errors from CKM below are bogus of course.

But if partial matches are sufficient, this in turn requires some reinterpretation of existing regexes as well:

For example, openEHR-EHR-OBSERVATION\.body_mass_index(-[a-zA-Z0-9_]+)*\.v1 would then also match openEHR-EHR-OBSERVATION.body_mass_index.v15 (or v10, v11, etc.).

The example below, openEHR-EHR-OBSERVATION\.body_mass_index(-[a-zA-Z0-9_]+)*\.v[0-2], would then accept not only v0, v1 and v2, but also v10, v15 and v27, to name a few.

A few archetypes in CKM (mainly demographics, I think) have shortened this further by omitting the openEHR-EHR- prefix. This currently shows up as a validation error in CKM as well, but under the partial-match interpretation it would be valid. It can be seen as brief and elegant, or as confusing (especially with CLUSTERs from the EHR or DEMOGRAPHIC parts).

Personally, I think it is better to always require a full match here – this is more explicit and avoids unintended side-effects like the ones described above.
But most importantly, I think this needs to be clarified, so that either that regex or one with an additional open-version pattern can be used to express what you want to model: any version is allowed (and a template can then tie this down).

Regards,
Sebastian
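Sebastian's examples of what partial matching would let through can be reproduced in a few lines. This is a sketch assuming Python's `re`, with `re.search` standing in for partial matching and `re.fullmatch` for full matching; the `CLUSTER.address` ids are hypothetical and only illustrate the shortened form without the openEHR-EHR- prefix.

```python
import re

V1 = r"openEHR-EHR-OBSERVATION\.body_mass_index(-[a-zA-Z0-9_]+)*\.v1"
V0_TO_2 = r"openEHR-EHR-OBSERVATION\.body_mass_index(-[a-zA-Z0-9_]+)*\.v[0-2]"
NO_PREFIX = r"CLUSTER\.address(-[a-zA-Z0-9_]+)*\.v1"  # hypothetical, "openEHR-EHR-" omitted


def match_modes(pattern: str, archetype_id: str) -> tuple[bool, bool]:
    """Return (partial match?, full match?) for one archetype id."""
    return (
        re.search(pattern, archetype_id) is not None,
        re.fullmatch(pattern, archetype_id) is not None,
    )


CASES = [
    (V1, "openEHR-EHR-OBSERVATION.body_mass_index.v15"),      # "v1" is a prefix of "v15"
    (V0_TO_2, "openEHR-EHR-OBSERVATION.body_mass_index.v27"),  # "v2" is a prefix of "v27"
    (NO_PREFIX, "openEHR-EHR-CLUSTER.address.v1"),             # hypothetical EHR cluster
    (NO_PREFIX, "openEHR-DEMOGRAPHIC-CLUSTER.address.v1"),     # hypothetical demographic cluster
]

for pattern, archetype_id in CASES:
    # Each case prints (True, False): accepted only under partial matching.
    print(archetype_id, match_modes(pattern, archetype_id))
```

The last two cases show why the shortened form can be confusing: one prefix-less pattern picks up clusters from both the EHR and the DEMOGRAPHIC side.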



If you are using the regex to validate ids, then you will need the full regex to match any valid id. If the regex is just to filter ids that are validated elsewhere, then you can minimise the regex.


The regex character class [0-2] matches exactly one character in the range 0-2, i.e. 0, 1 or 2.

Now that you mention it, I do seem to remember we specified a very long time ago that those regexes did have to be full validating ones. So that means using something more like

openEHR-EHR-OBSERVATION\.body_mass_index(-[a-zA-Z0-9_]+)*\.v(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*)){0,2}

as the checker regex in CKM, and slot patterns like openEHR-EHR-OBSERVATION\.body_mass_index(-[a-zA-Z0-9_]+)*\.v.*, where the trailing '.*' matches anything and the validator regex ensures that only semver-style dotted version patterns get through.

- thomas


[SG] Sure, but if only a partial match were required, anything could be added to the left or the right (as long as the result is a valid archetype id).

[SG] On your recollection that we specified long ago that those regexes have to be full validating ones: that’s exactly what I mean, yes, thank you. (And that then ensures that the above does not match v10 or v15.)
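The approach Thomas outlines can be sketched as two checks: a full-match validator for well-formedness, and a loose slot pattern that only says which archetype is meant (in line with his point that ids validated elsewhere only need a minimal filter). This assumes Python's `re`; the function name `slot_accepts` is hypothetical. The validator writes the repetition count as `{0,2}`; written as `{0.2}`, Python does not parse the braces as a repetition count, so the validator would reject even a plain `.v1`.

```python
import re

BMI = r"openEHR-EHR-OBSERVATION\.body_mass_index(-[a-zA-Z0-9_]+)*"

# Validator: "v" plus one to three dot-separated numbers without leading zeros.
VALIDATOR = re.compile(BMI + r"\.v(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*)){0,2}")
# Same regex written with "{0.2}": not a repetition count, so it is matched as text.
BROKEN_VALIDATOR = re.compile(BMI + r"\.v(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*)){0.2}")
# Slot pattern: any version; the trailing ".*" leaves the version check to VALIDATOR.
SLOT_ANY_VERSION = re.compile(BMI + r"\.v.*")


def slot_accepts(archetype_id: str) -> bool:
    """Hypothetical check: the id must be well formed AND fully match the slot pattern."""
    return (
        VALIDATOR.fullmatch(archetype_id) is not None
        and SLOT_ANY_VERSION.fullmatch(archetype_id) is not None
    )


for archetype_id in [
    "openEHR-EHR-OBSERVATION.body_mass_index.v1",
    "openEHR-EHR-OBSERVATION.body_mass_index.v3",      # a future major version
    "openEHR-EHR-OBSERVATION.body_mass_index.v1.0.2",  # dotted version
    "openEHR-EHR-OBSERVATION.body_mass_index.v01",     # leading zero: rejected
    "openEHR-EHR-OBSERVATION.body_mass_index.vx",      # not a version: rejected
]:
    print(archetype_id, slot_accepts(archetype_id))
```

The design keeps the slot pattern short and stable (a new v3 needs no edit), while the one validator regex carries the version syntax.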


Hi everyone,

In our modelling, it is safe to assume that the latest version of an archetype is the best candidate on offer for anyone using an archetype and filling a SLOT for the first time. The option to use previous versions may be useful for implementers who have older versions in their current systems and don’t want to run two different versions or update all their systems to the latest version.

I totally agree that from a governance point of view SLOT inclusions won’t need to specify a version in 99% of cases. However, in some situations it may theoretically be appropriate to fix a specific version in a SLOT. In fact, I can’t think of a use case YET where we need to specify a certain version, but no doubt one will turn up at some point in the near future.

In all our modelling it seems that as soon as we limit our options one way or another we discover a use case that breaks our most recently made rule! Murphy’s law?

So we want to have our cake and eat it too – default to any or all versions of an archetype, with the option to specify one (or maybe even multiple) if needs be. Same theory applies to exclusions in a SLOT as well.

The governance overhead of currently specifying v0 and/or v1 and/or v2 will only increase as time goes on, and at present, as CKAs, we have people upset that v0 is specifically included when that archetype has since been published as v1. They want to see v1 specifically included. Not unreasonably, they don’t understand the theory behind it: as long as no archetypes (or versions) are explicitly excluded, a SLOT that suggests v0 as an inclusion technically doesn’t stop a v1 from being inserted. So including all versions also has an important design guidance function. Newbies may not understand that if an archetype of the appropriate class is not actively excluded, then any or all of the archetypes of that class are technically valid for adding into a template.

Regards

Heather
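The include/exclude behaviour Heather describes can be modelled in a small sketch. Everything here is hypothetical (the `Slot` class, its fields and methods), and it encodes the thread's description rather than the ADL specification: only an explicit exclusion, or a wrong RM class, rules an archetype out, while includes act as guidance on what is recommended. A version-agnostic include then recommends a newly published v1 without anyone editing the slot.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Slot:
    """Hypothetical slot: an RM class plus include and exclude id patterns."""
    rm_class: str
    includes: list[str] = field(default_factory=list)
    excludes: list[str] = field(default_factory=list)

    def allows(self, archetype_id: str) -> bool:
        # Assumption from the thread: any archetype of the right class is valid
        # unless an exclude pattern matches it in full.
        qualified_class = archetype_id.split(".")[0]  # e.g. "openEHR-EHR-OBSERVATION"
        if not qualified_class.endswith("-" + self.rm_class):
            return False
        return not any(re.fullmatch(p, archetype_id) for p in self.excludes)

    def recommends(self, archetype_id: str) -> bool:
        # Includes only signal which archetypes the modeller intended.
        return self.allows(archetype_id) and any(
            re.fullmatch(p, archetype_id) for p in self.includes
        )


BMI = r"openEHR-EHR-OBSERVATION\.body_mass_index(-[a-zA-Z0-9_]+)*"
V1_ID = "openEHR-EHR-OBSERVATION.body_mass_index.v1"

pinned_v0 = Slot("OBSERVATION", includes=[BMI + r"\.v0"])
any_version = Slot("OBSERVATION", includes=[BMI + r"\.v.*"])
exclude_v1 = Slot("OBSERVATION", includes=[BMI + r"\.v.*"], excludes=[BMI + r"\.v1"])

print(pinned_v0.allows(V1_ID), pinned_v0.recommends(V1_ID))      # allowed, not recommended
print(any_version.allows(V1_ID), any_version.recommends(V1_ID))  # allowed and recommended
print(exclude_v1.allows(V1_ID))                                  # explicitly excluded
```

Pinning a version, when a use case does turn up, then means adding an exclude rather than narrowing every include.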


Amen.

So, do we have a conclusion that toolmakers can reference? Can we document this somewhere in the specs or elsewhere?
