Hi Bert,
Whilst I understand and agree with the need for very precise identification of a particular archetype via MD5 or some other ‘meaningless identifier/locator’, it is also necessary to clearly identify the version, revision and build of an archetype for authoring and governance purposes, as well as in early stages of development and testing.
During the authoring phase, where archetypes and templates can be very fluid, and have multiple dependencies, our experience has been that MD5 hashing is very difficult to manage since it forces very strict dependencies, where in practice we perhaps only want to ensure that the archetype Version is correct, and temporarily ignore Revision changes or minor ‘Build’ changes, until publication/deployment. In terms of regular software development, imagine the consequences if all of your source code files were MD5 hashed, and each dependency MD5 tagged, resulting in a compile failure if there is a mismatch between the dependent files.
I am not convinced that abandoning the existing archetypeID mechanism is the right approach, but even if it were, I think we would still need to support it for legacy purposes. Let’s also leave aside the OID vs ReverseURI issue for now (there is a possible case for supporting both).
As I said in a previous post, I am working on some proposals, based on earlier discussions, Thomas’s wiki material, the suggestions for Semantic Versioning at semver.org and experience from both CKM and non-CKM repository governance.
The suggestion will be to introduce an extendedArchetypeID, composed of …
OriginalDomain: The originating Domain namespace where the archetype was first governed e.g. org.openehr but could be an OID. This never changes even when the archetype is transferred to a different repository.
ArchetypeID: The current archetypeID (including Version) e.g. openEHR-EHR-OBSERVATION.blood_pressure.v1
RevisionNumber: The revision number e.g. ‘2’, restarting from 0 with each new Version.
NonOperationalModifier: A state suffix e.g. (initial, draft, rc, inactive).
BuildID: This is repository-defined. In the case of CKM it might be the ‘citeable-ID’ e.g. 1031.1.1296_2 but it could equally be an MD5 Hash for a non-CKM repository. The only stipulation is that any archetype made visible by the repository must increment the BuildID, whenever ANY change is made to the archetype.
So the full ID for a published archetype might look something like
org.openEHR::openEHR-EHR-OBSERVATION.blood_pressure.v1.0+build1031.1.1296_2
or for the same archetype taken back into review
org.openEHR::openEHR-EHR-OBSERVATION.blood_pressure.v1.1-draft+build1031.1.1296_7
The nomenclature is closely based on semver.org, with some modifications.
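To make the layout of the two example IDs above concrete, here is a minimal Python sketch that splits such a string into its component parts. The class name, field names and regular expression are my own assumptions for illustration and are inferred only from the two examples; they are not taken from any openEHR specification.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern, inferred from the two example IDs only:
#   <OriginalDomain>::<ArchetypeID incl. .vN>.<Revision>[-<modifier>]+build<BuildID>
_EXTENDED_ID = re.compile(
    r"^(?P<domain>[^:]+)::"
    r"(?P<archetype_id>.+?\.v(?P<version>\d+))"
    r"\.(?P<revision>\d+)"
    r"(?:-(?P<modifier>[A-Za-z]+))?"
    r"\+build(?P<build>.+)$"
)


@dataclass(frozen=True)
class ExtendedArchetypeId:
    original_domain: str
    archetype_id: str        # current archetypeID, including the Version
    version: int
    revision: int
    modifier: Optional[str]  # e.g. "draft" or "rc"; None when absent
    build_id: str            # repository-defined, treated as opaque


def parse_extended_id(text: str) -> ExtendedArchetypeId:
    """Split an extended archetype ID string into its parts."""
    match = _EXTENDED_ID.match(text)
    if match is None:
        raise ValueError(f"not an extended archetype ID: {text!r}")
    return ExtendedArchetypeId(
        original_domain=match["domain"],
        archetype_id=match["archetype_id"],
        version=int(match["version"]),
        revision=int(match["revision"]),
        modifier=match["modifier"],
        build_id=match["build"],
    )


draft = parse_extended_id(
    "org.openEHR::openEHR-EHR-OBSERVATION.blood_pressure.v1.1-draft+build1031.1.1296_7"
)
print(draft.archetype_id, draft.revision, draft.modifier, draft.build_id)
```

The BuildID is deliberately kept as an opaque string, since it could equally be a CKM citeable-ID or an MD5 hash.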
The metadata should also carry:
AuthoringLifeCycle: (Initial, Draft, Team Review, Suspended Review, Published, Rejected, Superseded, Moved). This is effectively a superset of the operational states, to account for authoring and governance requirements which do not affect the operational status of the archetype, i.e. an archetype which moves from Draft to Team Review and then Suspended Review remains a ‘beta’ archetype from an operational perspective. There is a direct mapping between the operational states and one or more authoring lifecycle states.
CurrentDomain: identifier for the current controlling domain, to allow for situations where control/governance of the archetype is transferred.
We can probably carry all of the necessary extra metadata within the current ADL1.4 specification under other_details.
In general I would expect archetypes and templates to refer to dependent archetypes or templates using the full extended identifier, but associated tools could choose to ignore e.g build or even revision incompatibilities where appropriate, and update the references automatically to point to the currently used artifacts, if modelling is still at a stage where loose coupling is necessary.
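As a rough sketch of how a tool might apply this kind of loose coupling, the Python below compares a stored reference against a candidate artifact at a chosen strictness level. The level names ("version", "revision", "build"), the function name and the regular expression are hypothetical and inferred only from the example IDs earlier in this post.

```python
import re

# Hypothetical pattern for the extended ID, inferred from the examples only.
_ID = re.compile(
    r"^(?P<domain>[^:]+)::(?P<archetype_id>.+?\.v\d+)"
    r"\.(?P<revision>\d+)(?:-(?P<modifier>[A-Za-z]+))?"
    r"\+build(?P<build>.+)$"
)

# Fields that must agree at each (hypothetical) strictness level.
_FIELDS = {
    "version": ("domain", "archetype_id"),
    "revision": ("domain", "archetype_id", "revision", "modifier"),
    "build": ("domain", "archetype_id", "revision", "modifier", "build"),
}


def reference_matches(reference: str, candidate: str, level: str = "build") -> bool:
    """Return True if candidate satisfies reference at the given strictness."""
    ref, cand = _ID.match(reference), _ID.match(candidate)
    if ref is None or cand is None:
        raise ValueError("not an extended archetype ID")
    return all(ref[name] == cand[name] for name in _FIELDS[level])


published = "org.openEHR::openEHR-EHR-OBSERVATION.blood_pressure.v1.0+build1031.1.1296_2"
in_review = "org.openEHR::openEHR-EHR-OBSERVATION.blood_pressure.v1.1-draft+build1031.1.1296_7"

# During authoring, only the Version needs to agree.
print(reference_matches(published, in_review, level="version"))
# At publication/deployment, the reference must match exactly.
print(reference_matches(published, in_review, level="build"))
```

A tool could default to "build" for published artifacts and relax to "version" while modelling is still fluid.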
In terms of the original discussion, I think my main concern is that our understanding of how DCMs/archetypes should be managed, governed and versioned is still very much in its early phases, particularly in the very distributed governance environment that I think we all agree will be required. openEHR archetypes (? also FHIR resources) are a bit of a special case, since we expect these to form the basis of persisted, queryable data, not just to act as messaging definitions. This makes the challenge of versioning / governance rules much harder, but I think we are pretty close to having an implementable solution.
More soon … and I would be interested in others’ thoughts about the BuildID, particularly in the context of Git or Subversion-based repositories.
Ian