Semantic versioning for pre and post release states

This is copied from a SEC conversation which discussed adopting full ADL2-like template naming conventions.

One issue that arose was how much of the Semantic versioning (https://semver.org) rules should be adopted, in particular, the pre and post release numbering aspects.

For clarity, this is primarily about project-level artefacts, not really for CKM-type work, and mostly about templates, though I think it probably does apply to new archetypes.

I agree that it would be helpful to support more ‘non-published’ parts of semver in tooling - @pablo - I think the earlier discussion on whether to support these or not was really centred on whether CDRs and REST API should support them, and there I agree they should not.

However, as @siljelb says, there is definitely value in having at least some support in tooling, especially for templates.

There is quite a lot of variation allowed in semver in this area. Is it worth trying to profile a subset of all the options, to make it easier for tooling implementers to support them correctly?

  1. A pre-release version MAY be denoted by appending a hyphen and a series of dot separated identifiers immediately following the patch version. Identifiers MUST comprise only ASCII alphanumerics and hyphens [0-9A-Za-z-]. Identifiers MUST NOT be empty. Numeric identifiers MUST NOT include leading zeroes. Pre-release versions have a lower precedence than the associated normal version. A pre-release version indicates that the version is unstable and might not satisfy the intended compatibility requirements as denoted by its associated normal version. Examples: 1.0.0-alpha, 1.0.0-alpha.1, 1.0.0-0.3.7, 1.0.0-x.7.z.92, 1.0.0-x-y-z.--.
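The quoted pre-release rule is compact enough to check mechanically. A minimal sketch in Python that validates only the pre-release part (the text after the first hyphen); the function name is our own and not taken from any openEHR tool:

```python
import re

# One identifier: non-empty, ASCII alphanumerics and hyphens only.
_IDENTIFIER = re.compile(r"[0-9A-Za-z-]+")


def is_valid_prerelease(pre: str) -> bool:
    """Validate the part after the first hyphen of MAJOR.MINOR.PATCH-<pre>."""
    for ident in pre.split("."):
        if not _IDENTIFIER.fullmatch(ident):
            return False  # empty, or contains a character outside [0-9A-Za-z-]
        if ident.isdigit() and len(ident) > 1 and ident.startswith("0"):
            return False  # numeric identifier with a leading zero
    return True
```

All the examples quoted from the spec pass, while `01`, `alpha..1` and `alpha_1` are rejected.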

Personally, I would be happy with

1.0.0-alpha.1.x.y.z (with alpha, beta, rc allowed).

If an artefact is pre-release then tooling auto-updates the second set of semver numbers when it detects changes, rather than the first.
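A minimal sketch of such a profile, assuming the shape `MAJOR.MINOR.PATCH-STAGE.X.Y.Z` used in the `1.0.0-alpha.0.0.1` examples later in the thread, with STAGE limited to alpha, beta or rc. The class name, the `bump` method and the change kinds ("breaking", "addition", "fix") are hypothetical; the point is only that tooling bumps the X.Y.Z counters after the stage label and leaves the intended release version alone.

```python
import re
from dataclasses import dataclass, replace
from typing import Tuple

# Hypothetical profile: MAJOR.MINOR.PATCH-STAGE.X.Y.Z, STAGE in alpha|beta|rc,
# with no leading zeroes in any numeric part.
_NUM = r"(0|[1-9][0-9]*)"
_PROFILE = re.compile(
    rf"{_NUM}\.{_NUM}\.{_NUM}-(alpha|beta|rc)\.{_NUM}\.{_NUM}\.{_NUM}"
)


@dataclass(frozen=True)
class PreReleaseVersion:
    release: Tuple[int, int, int]  # the intended normal version, e.g. (1, 0, 0)
    stage: str                     # "alpha", "beta" or "rc"
    pre: Tuple[int, int, int]      # the pre-release X.Y.Z counters

    @classmethod
    def parse(cls, text: str) -> "PreReleaseVersion":
        m = _PROFILE.fullmatch(text)
        if m is None:
            raise ValueError(f"not in the profiled form: {text!r}")
        # Groups 0-2 and 4-6 are numbers; group 3 is the stage label.
        n = [int(g) for i, g in enumerate(m.groups()) if i != 3]
        return cls((n[0], n[1], n[2]), m.group(4), (n[3], n[4], n[5]))

    def bump(self, change: str) -> "PreReleaseVersion":
        """Bump only the pre-release counters; the release part is untouched."""
        x, y, z = self.pre
        if change == "breaking":
            new = (x + 1, 0, 0)
        elif change == "addition":
            new = (x, y + 1, 0)
        elif change == "fix":
            new = (x, y, z + 1)
        else:
            raise ValueError(f"unknown change kind: {change!r}")
        return replace(self, pre=new)

    def __str__(self) -> str:
        a, b, c = self.release
        x, y, z = self.pre
        return f"{a}.{b}.{c}-{self.stage}.{x}.{y}.{z}"
```

For example, `PreReleaseVersion.parse("1.0.0-alpha.0.3.4").bump("breaking")` prints as `1.0.0-alpha.1.0.0`, and the intended release `1.0.0` stays the same.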

I think I’d prefer to put some constraints on semver in this area, especially if that can help power some tooling support.

Our strong preference for new project level templates and archetypes is to start at 1.0.0-alpha, rather than v0, as this allows much easier transition into production, where we now have a policy of not deploying any .v0 archetypes.

From @siljelb

I’d prefer being able to use 1.0.0-alpha+1.x.y.z (including beta, rc) as well, to clarify that it’s a post-publication revision.

I guess this could work for archetypes as well, but should we mandate the build versioning like 1.0.0-alpha.0.0.1?
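Two properties of semver 2.0.0 precedence are relevant to these proposals: build metadata after `+` MUST be ignored when determining precedence, so under plain semver ordering `1.0.0-alpha+1.2.3` and `1.0.0-alpha+4.5.6` rank equal; and a normal version such as `1.0.0` always outranks any of its pre-releases. A minimal, hand-written comparison in Python (not taken from any openEHR tool) shows both:

```python
from functools import cmp_to_key


def _split(version: str):
    """Split 'M.m.p-pre+build' into ((M, m, p), [pre identifiers])."""
    without_build = version.split("+", 1)[0]  # build metadata is ignored
    core, _, pre = without_build.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    return (major, minor, patch), (pre.split(".") if pre else [])


def _compare_identifier(a: str, b: str) -> int:
    a_num, b_num = a.isdigit(), b.isdigit()
    if a_num and b_num:
        return (int(a) > int(b)) - (int(a) < int(b))
    if a_num != b_num:
        return -1 if a_num else 1  # numeric sorts before alphanumeric
    return (a > b) - (a < b)       # ASCII lexical order


def compare(v1: str, v2: str) -> int:
    """Return -1, 0 or 1 following semver 2.0.0 precedence rules."""
    core1, pre1 = _split(v1)
    core2, pre2 = _split(v2)
    if core1 != core2:
        return -1 if core1 < core2 else 1
    if not pre1 or not pre2:
        # A normal version outranks any pre-release of the same core version.
        return (not pre1) - (not pre2)
    for a, b in zip(pre1, pre2):
        result = _compare_identifier(a, b)
        if result:
            return result
    # All shared identifiers are equal: the longer identifier list wins.
    return (len(pre1) > len(pre2)) - (len(pre1) < len(pre2))


versions = ["1.0.0", "1.0.0-alpha.0.0.1", "1.0.0-beta.0.0.1", "1.0.0-alpha.5.6.7"]
print(sorted(versions, key=cmp_to_key(compare)))
# ['1.0.0-alpha.0.0.1', '1.0.0-alpha.5.6.7', '1.0.0-beta.0.0.1', '1.0.0']
```

This also illustrates why the first published revision can stay `1.0.0`: it sorts after every `1.0.0-alpha.*` revision, however high the pre-release counters go.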


Mostly makes sense to me. I think we’re already doing a lot of good things in openEHR around semantic versioning but definitely opportunities to improve.

One question I have is whether a use_archetype (and similar constructs, like ‘specialise’ in ADL2 and AQL) that includes composition.vital_signs.v1.0.0 would match composition.vital_signs.v1.0.0-alpha.1234. That could be dangerous, assuming ‘-alpha’ marks an unsafe artefact. We probably do want the answer to be yes, but we need to be careful to manage ‘-alpha’ artefacts so they don’t accidentally end up in production data. Probably the default should be that production environments do not allow ‘alpha’ archetypes and templates.
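One way tooling could encode that suggested default (a plain `v1.0.0` reference matches its pre-releases, but production refuses any pre-release artefact) is sketched below. The identifier shape, the function names and the `production` flag are assumptions for illustration; they are not the ADL2 or AQL matching rules.

```python
import re
from typing import Optional, Tuple

# Hypothetical identifier shape: "<concept>.v<MAJOR>.<MINOR>.<PATCH>[-<pre-release>]"
_ID = re.compile(
    r"(?P<concept>.+)\.v(?P<core>[0-9]+\.[0-9]+\.[0-9]+)(?:-(?P<pre>[0-9A-Za-z.-]+))?"
)


def parse_id(archetype_id: str) -> Tuple[str, str, Optional[str]]:
    match = _ID.fullmatch(archetype_id)
    if match is None:
        raise ValueError(f"unrecognised identifier: {archetype_id!r}")
    return match.group("concept"), match.group("core"), match.group("pre")


def reference_matches(reference: str, candidate: str, production: bool) -> bool:
    """Decide whether a use_archetype-style reference may resolve to candidate."""
    ref_concept, ref_core, ref_pre = parse_id(reference)
    cand_concept, cand_core, cand_pre = parse_id(candidate)
    if (ref_concept, ref_core) != (cand_concept, cand_core):
        return False
    if production and cand_pre is not None:
        return False  # never let a pre-release artefact into production data
    if ref_pre is not None:
        return ref_pre == cand_pre  # an explicit pre-release reference is exact
    return True  # a plain 1.0.0 reference accepts 1.0.0 and its pre-releases
```

With this rule, `composition.vital_signs.v1.0.0` resolves to `composition.vital_signs.v1.0.0-alpha.1234` in a development environment but not when `production=True`.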

This does bring me to other opportunities to improve.

CKM currently has draft archetypes as ‘.v0’ with [“revision”] = <“0.0.1-alpha”>. In ADL2 this is assumed to lead to the HRID ‘openEHR-EHR-ACTION.care_plan.v0.0.1-alpha’ (recent consensus is that a namespace still needs to be added). Now this ‘-alpha’ means no v0 CKM archetypes are available for use in production. This is not desirable, since CKM v0 archetypes are usually still a lot better than local (copied?) archetypes, where the version number can also be locally controlled.
We should consider dropping the ‘-alpha’ from v0 archetypes, since the ‘v0’ already signals that it’s an unstable artefact.
Which leads me to the next opportunity: I do want CKM .v0 archetypes to be stable in the sense that they are semantically versioned, because at the moment it’s unreasonably hard to discover (breaking) changes to an archetype, which is key for managing (production) data stability and querying.
Proper semantic versioning is probably too much work for such unstable artefacts. The least we could do is increment some part of the version number after every change. We might also be able to signify whether a change is breaking, which would be very helpful, also in development settings. Some time ago it was proposed to use the second digit for breaking changes and the third for non-breaking ones (minor and patch). I don’t like this very much, because we would be making custom rules for a generic versioning problem. So we should study the semantic versioning spec to see how it would handle this, and try to stay close to it.
I also doubt how unstable ‘.v0’ archetypes actually are. My guess is they are unstable mostly when the editors start their work to prepare for the publication process. This might be the point where the version number changes from .v0(.0.1) to .v1.0.0-alpha (and perhaps ‘-beta’ during review and ‘-rc’ after review but before publication, potentially with no actual changes between the latest beta, a single rc, and the published version without a suffix). We should align this with the

lifecycle_state = <“in_development”>

The specs already say quite a lot about this. So I’d like to study and update that.
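For reference, semver 2.0.0 itself only says that major version zero (0.y.z) is for initial development and that anything MAY change at any time, so it does not prescribe how to flag breaking changes there. The earlier proposal mentioned above (second digit for breaking changes, third for non-breaking ones) could be implemented roughly as follows; the function name and signature are hypothetical:

```python
def bump_v0(version: str, breaking: bool) -> str:
    """Hypothetical v0 rule: 0.<breaking changes>.<non-breaking changes>."""
    major, minor, patch = (int(part) for part in version.split("."))
    if major != 0:
        raise ValueError(f"rule only applies to v0 versions, got {version!r}")
    if breaking:
        return f"0.{minor + 1}.0"    # second digit counts breaking changes
    return f"0.{minor}.{patch + 1}"  # third digit counts everything else
```

For example, `bump_v0("0.0.1", breaking=True)` returns `"0.1.0"`, and a subsequent non-breaking edit gives `"0.1.1"`.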

Curious for your thoughts

I agree unpublished archetypes should be semantically versioned, but it needs to be done after the -alpha so we don’t end up with v23.0.0 as the first published revision. I’m wondering whether it’d be a good idea to move to something similar to what we’re proposing for templates, so like org.openehr::openEHR-EHR-ACTION.care_plan.v1.0.0-alpha.0.3.4 to org.openehr::openEHR-EHR-ACTION.care_plan.v1.0.0-alpha.5.6.7 and so on. Then the first published version can still be org.openehr::openEHR-EHR-ACTION.care_plan.v1.0.0 without breaking anything.


I think there remains a place for v0 archetypes, at the very least in the meantime while we transition to v1.0.0-alpha.X. I would suggest dropping the ‘-alpha’ from v0 asap, and starting to increment the final digit for every change.
Somewhat separately, I would make the transition from v0.0.1 → v1.0.0-alpha an editorial one. To me it makes sense to do this once the publication process has started (e.g. editors preparing the archetype, an intention to run reviews, etc.) and until that point leave it as .v0.X.X. Does that make sense, @siljelb?
How do others feel about this? @heather.leslie @KBarwise? @sebastian.garde ?

Just noted something in the RM: we might need to make VERSION_TREE_ID a SEMVER in the RM if we want to maintain the exact semantics in the RM data versioning.

  1. VERSION_TREE_ID looks like a semver, but it has a different purpose.
  2. My interpretation is that the class is for detecting/tracking distributed changes to the data, like you would do with Git for code, that is: creating and tracking a directed acyclic graph of branches that can then be merged into the trunk or into a parent branch.
  3. From my own experience, and from what I see, most implementers are not using that class in that way, but more like a semantic version.

So I have two questions:

  1. Do we need an explicit SEMVER class somewhere in the RM?
  2. Is anyone actually using VERSION_TREE_ID for branch tracking, or are they using it as something like SEMVER?
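To make the difference concrete: as we read the openEHR Common IM, a VERSION_TREE_ID has the lexical form `trunk_version[.branch_number.branch_version]`, so `1.2.3` means version 3 on branch 2 of trunk version 1, whereas semver would read the same string as major 1, minor 2, patch 3. A small Python parser (our own class, not the RM class itself) illustrates the branch reading:

```python
from typing import NamedTuple, Optional


class VersionTreeId(NamedTuple):
    """Branch-oriented reading of a VERSION_TREE_ID string."""
    trunk_version: int
    branch_number: Optional[int] = None
    branch_version: Optional[int] = None

    @classmethod
    def parse(cls, text: str) -> "VersionTreeId":
        parts = [int(part) for part in text.split(".")]
        if len(parts) == 1:
            return cls(parts[0])                      # e.g. "2": trunk version 2
        if len(parts) == 3:
            return cls(parts[0], parts[1], parts[2])  # e.g. "1.2.3": a branch
        raise ValueError(f"expected 1 or 3 numeric parts, got {text!r}")

    def is_branch(self) -> bool:
        return self.branch_number is not None
```

Under this reading, a two-part value such as `1.2` is not valid at all, which is one quick way to spot data where VERSION_TREE_ID is being used semver-style.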

discussion with @heather.leslie:

Agreements:
The active changes CKM editor do work well.
The v0 → v1 should be kept.
Editors don’t manage the versioning before v0.
Editors don’t do anything with the ‘-alpha’. There is no impact on editors if we (automatically) change the ‘x’ parts of v0.X.x-xx; it is OK to change these to fit implementers’ needs (just not the ‘v0’ part).

conclusions from Joost:

  • ok to drop ‘-alpha’ (from v0 archetypes)
  • ok to start updating the version number for v0 archetypes for ‘every’ change to the archetype.

Thoughts:
Concern from Heather: designating archetypes that are ‘in review’ as ‘v1-alpha’ may lead to confusion among users of the archetype, and may incur a lot of manual work from scarce resources.
v1 is a clear signal to clinical modellers that an artefact is stable enough for e.g. translations.

Todo:
Discuss with tooling developers (e.g. @sebastian.garde) changing CKM to drop ‘-alpha’ and to update the version number.
Think more on how, and whether, to use ‘v1.0.0-alpha’ for archetypes in review/pre-publication.

I think the indicator for resource stability is not the version but the lifecycle_state = “approved” (or published in the CKM = green check).

REF: Common Information Model

In semver 1.0.0-alpha means it’s a pre-release of 1.0.0, so by their definition, semantics are very clear.

What might not be so clear is that we have two different concepts for versioning and lifecycle state that actually cross in versioning = 1.0.0 and lifecycle state = “approved” / “published”, but beyond that specific point, both should be independent IMHO.

I understand that going to 1.0.0 is an important milestone and carries the extra burden of being sure that what’s marked as approved/published is actually stable. In software it is the same, though in software we might have 0.5.1-alpha, whereas for archetypes maybe only 1.0.0-alpha makes sense, as a pre-step to verify everything is OK so the lifecycle state can change to published. But then, for archetypes pre-releases might also be important for the major releases: 2.0.0-alpha, 3.0.0-alpha, etc., even though the archetype is already published.

In software we also use m1, m2, m3 to mark pre-release milestones; for instance, m1 could mean all technical validation passes, m2 that all translations are complete, etc.
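One practical caveat with milestone labels: semver compares alphanumeric identifiers lexically in ASCII order, so `m10` sorts before `m2`, while a separate numeric identifier (as in `m.10`) compares numerically. A minimal sort key for the pre-release part, hand-written in Python following the semver 2.0.0 precedence rules, shows the difference:

```python
def identifier_key(ident: str):
    # Numeric identifiers compare numerically and rank below alphanumeric ones.
    if ident.isdigit():
        return (0, int(ident), "")
    return (1, 0, ident)  # alphanumeric identifiers: ASCII lexical order


def prerelease_key(pre: str):
    # Python compares lists element by element and ranks a strict prefix first,
    # which matches semver precedence: "alpha" < "alpha.1".
    return [identifier_key(ident) for ident in pre.split(".")]


print(sorted(["m2", "m10", "m1"], key=prerelease_key))     # ['m1', 'm10', 'm2']
print(sorted(["m.2", "m.10", "m.1"], key=prerelease_key))  # ['m.1', 'm.2', 'm.10']
```

This key only orders pre-release labels of the same MAJOR.MINOR.PATCH; it is not a full version comparison.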

Also note my previous message about that we might need to add something to the RM to support all these, because it’s not only about the syntax, it’s about the ARCHETYPE_ID and the REVISION_HISTORY_ITEM version_id associated with the AUTHORED_RESOURCE (see common package).

Sure, that’s the theory, and easy for an engineer. This was what we relied on until this new approach to Archetype identification was developed. The synchronous change of the lifecycle state and version from v0 to v1 was a deliberate decision to make it absolutely clear to everyone, including non-technical participants eg translators.

From memory, at the time of the revision of the Archetype Identification spec in ~2014/5, we were dealing with v1 archetypes as the default for initial or draft archetypes generated from the initial archetype designer, and it was not clear what the version transition was on publication, i.e. every archetype version that carried the lifecycle transition to published would potentially be different. Moving from v1 (the default for drafts) to v2 on publication did not make sense, yet publication signified a significant change in status. So we reverted to v0 for draft and initial archetypes.

If we’re dealing with archetypes ‘in the wild’ (and we were in 2015), the combo of v1 and ‘published’ helped to confirm they were following the specs rather than random versioning. A v1 plus any state other than published or a v0 with ‘published’ should be considered ‘dodgy’ until proven otherwise.
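That rule of thumb is simple enough to automate as a check on archetypes found ‘in the wild’. A sketch, assuming lifecycle states are compared as plain strings such as "published" or "in_development"; the function name is hypothetical:

```python
def looks_dodgy(major: int, lifecycle_state: str) -> bool:
    """Heuristic from the thread: v1 should go with 'published', v0 should not."""
    published = lifecycle_state == "published"
    if major == 0:
        return published      # a v0 claiming to be published
    if major == 1:
        return not published  # a v1 in any other state
    return False              # the heuristic says nothing about v2 and later
```

A flagged archetype is not necessarily wrong; the check only marks it for a closer look.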

To designate something as ‘pre-release’ needs a clear trigger. I can’t think of a reason for this from a knowledge governance POV.

Initiation of a review, which is part of the publication lifecycle, has been suggested as a potential trigger for designation as ‘pre-release’. However, it means no more than that a first review has been initiated, i.e. that someone has expressed interest in and intent to progress the archetype through the public review process. The unfortunate reality is that we have a number of archetypes that have been stuck for years (redesignated from ‘in review’ to ‘paused’), so the notion of ‘pre-release’ is meaningless in those situations. The time to pass through review ranges from weeks to years, depending on complexity, the level of difficulty and interest. However, once an archetype is deemed ready for publication following consensus on the clinical content, the time to publication is only one week: the notice period of pending publication on Discourse as part of our publication process. It doesn’t seem helpful to use this as a trigger for changing the version to pre-release in such a short time.

Regards

Heather


Hi Heather, I understand the confusion while migrating to a different versioning scheme, though that’s independent of professional background; for me it was also confusing :slight_smile: What I’m talking about applies from now on, now that the new versioning scheme is established and known to both clinical modellers and engineers.

I just wanted to add more info about what @joostholslag mentioned in the previous message about whether or not to use -alpha (or whatever pre-release qualifier semver supports), and what -alpha would mean in terms of the stability of an archetype.

From my PoV, every archetype publicly available in the CKM is actually “released” (because even if you’re in the middle of a translation, you are working on a branch with a specific version assigned there), so now that we know that version and status are actually overlapping concepts, both need to be co-managed. For instance, if x.y.z-alpha would be used, then we should all know that’s a pre-release of version x.y.z that for whatever reason is still being reviewed, tested, translated, validated, etc. to make it stable for the x.y.z release. Though that might be changing the archetype’s version without changing the status, so I’m not sure if pre-releases in terms of semver have any value for managing archetypes. So I agree with this:

I would suggest doing the exercise of simulating the archetype lifecycle from a newly proposed archetype to a published one, to see how the statuses and semantic versions could change, and to detect any possible inconsistencies or new requirements, for instance whether -alpha means something or is not needed in this context.

My interest is not directly on the archetypes but on the templates, because I would like to apply the same versioning logic to OPTs.

Best,
Pablo.
