openEHR artefact namespace identifiers

Hi,

About a year ago Thomas published a draft of some detailed artefact identification proposals at http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/knowledge_id_system.pdf

to help with the rapidly approaching scenario of having to cope with similarly named artefacts being published by different authorities. We are starting to see this scenario emerging in real-world projects and whilst potential collisions can be managed informally for now, we will need a formal mechanism before long.

I would like to raise one aspect which I think might need to be rethought in the light of a recent IHTSDO proposal for SNOMED covering the same ground.

In the pdf Thomas says

" When an archetype is moved from its original PO (e.g. a local health authority, or a specialist peak
body) to a more central authoring domain (e.g. a national library, openEHR.org) its namespace will be
changed to the new domain, as part of the review and handover process. The archetype’s semantic
definition may or may not change. In order for tools to know that an archetype was not created new
locally, but was moved from another PO, an explicit reference statement can be made in the archetype
in the description section of an archetype as follows:"

id_history = <“se.skl.epj::openEHR-EHR-EVALUATION.problem.v1”>

The IHTSDO proposals cover the same scenario, i.e. a SNOMED code originally authored in one namespace subsequently being managed in a new namespace. A good example might be a SNOMED term which is originally used at a national level but is then adopted internationally. They suggest that the term keeps its original authored namespace, and that it is the namespace of the managing entity that changes, arguing that this is much less disruptive to systems that are using the term concerned.

I think we should consider adopting the same approach for openEHR archetypes, as otherwise the formal identifier of an archetype will change if a locally developed archetype becomes promoted to international use, a relatively common occurrence.

We would then need to record the current publisher so that the agency with current responsibility could be identified:
current_publisher = <“se.skl.epj”>
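
To make the difference concrete, here is a minimal Python sketch of the suggested behaviour. The `ArchetypeRecord` class, the `hand_over` function and the `org.openehr` namespace are assumptions made up for illustration; none of them comes from the draft specification. The only thing taken from the draft is the `namespace::id` form shown above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ArchetypeRecord:
    """Hypothetical record; the field names are illustrative only."""
    archetype_id: str        # full namespaced id, fixed at authoring time
    current_publisher: str   # agency currently responsible for the archetype


def original_namespace(archetype_id: str) -> str:
    """Return the part before '::', i.e. the namespace the id was authored in."""
    namespace, sep, _ = archetype_id.partition("::")
    if not sep:
        raise ValueError(f"no namespace in {archetype_id!r}")
    return namespace


def hand_over(record: ArchetypeRecord, new_publisher: str) -> ArchetypeRecord:
    """Change the managing agency while leaving the identifier untouched."""
    return replace(record, current_publisher=new_publisher)


local = ArchetypeRecord(
    archetype_id="se.skl.epj::openEHR-EHR-EVALUATION.problem.v1",
    current_publisher="se.skl.epj",
)
# "org.openehr" is an assumed namespace for the receiving organisation.
promoted = hand_over(local, "org.openehr")
```

With this shape, systems that stored the identifier before the handover still match it afterwards, and only the publisher lookup changes.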

Thoughts would be welcome as I think we need to start making these (or alternative) specifications formal to enable tooling and application support to go ahead.

Ian

Dr Ian McNicoll
office +44 (0)1536 414994
fax +44 (0)1536 516317
mobile +44 (0)775 209 7859
skype ianmcnicoll
ian.mcnicoll@oceaninformatics.com

Clinical analyst, Ocean Informatics, UK
openEHR Clinical Knowledge Editor www.openehr.org/knowledge
Honorary Senior Research Associate, CHIME, UCL
BCS Primary Health Care www.phcsg.org

Hello,

I like that approach regarding namespaces, it will be needed sooner than later.

Related to archetype identifiers, there is another problem still to be solved: how do they deal with RM evolution? The current openEHR RM release is 1.0.2, but it can change in the future. Nowhere in an archetype is it stated which RM version was used to define it. This information should go, at least, in the archetype header, but probably should also be represented in the archetype id. Otherwise we will not be able to differentiate between an archetype for one version of the RM and the same archetype (modified if necessary) for a different one.

David

2011/4/5 Ian McNicoll <Ian.McNicoll@oceaninformatics.com>

Also, some of the restrictions on the identifier name seem arbitrary,
such as the minimum number of characters or which characters are allowed.

Hi Ian,

This sounds more sensible to me, I was always worried about the change in
identifier when it got escalated to a higher PO.

I wonder if we can get a summary of the rest of the SNOMED namespacing
scheme so that we can see if it is usable in its entirety.

I have always been a supporter of the readable identifiers but I am now
starting to see their limitations in reality. I think we should seriously
consider an existing standard unique identifier scheme (UUID/GUID or OID)
rather than trying to invent a new one. I understand that there are issues
with using these existing schemes but I can't say that I am seeing that
a namespaced archetype ID helps with these issues, the only benefit is some
readability but this is countered by clashes in the wild and the governance
overhead to get it controlled.

The Archetype class has a UID attribute, I think we need to start using this
as an object identifier, after all an archetype is an object. In the
artefact maintenance area, we can start using the ObjectVersionId scheme to
manage the PO (creating system) and revisions. Alternatively we can use a
single OID to represent the PO with a root and the artefact ID as an
extension.

The issue with this suggested change is that we will have to work out how we
make this work with the existing archetype IDs used in data or transition
away from using existing archetype IDs. In the short term I think we need
to concentrate on template identifiers, as the problem is worse there, with many
more organisations producing templates that overlap and fewer being reused
between organisations. Therefore, we can try a standard UID approach for
templates without the legacy and once we have this sorted we can look at
integrating back into archetypes.

Personally, I would like to propose the use of OIDs for controlled artefacts
as it is an ISO standard and already used in health informatics for
identifying knowledge artefacts such as terminologies. I know OIDs are
not liked due to their length, unreadability and managed allocation, but to me
it is a natural fit for this kind of artefact ID. Each publishing
organisation can get an OID and manage the items that they produce, this can
be done using a content management system automatically as is done by CKM.
And to be honest, the new namespaced ID scheme is likely to be longer and
requires management, and barely legible once we include the namespace and
additional delimiters.

We can also have a fallback to GUID for organisations that don't have an OID
and a knowledge management system to maintain an OID. This would be the
default when a new template is created in a template editor but can be
upgraded to an OID once submitted to a knowledge management system.

Regards

Heath

Note that in the SNOMED case, there are two identifiers in play: the concept identifier (which contains the namespace ID) and a module identifier. The idea is that the namespace in the concept identifier will remain fixed and thus indicate the entity that originally introduced the concept, while the moduleId associated with the defining entries in the release files changes to indicate the entity currently maintaining that concept.
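
As a rough illustration of the two identifiers, here is a small Python sketch. It assumes the usual long-format SCTID layout (item digits, a 7-digit namespace, a 2-digit partition identifier whose first digit is 1, and one check digit). The `ConceptRow` class is a hypothetical simplification of a release-file entry, the identifier used below is made up, and the check digit is deliberately not verified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConceptRow:
    """Hypothetical, simplified view of one concept entry in a release file."""
    concept_id: str  # stays fixed; embeds the originating namespace
    module_id: str   # changes when another entity takes over maintenance


def namespace_of(sctid: str) -> Optional[str]:
    """Return the 7-digit namespace of a long-format SCTID, else None.

    Assumed layout: item digits + 7-digit namespace + 2-digit partition id
    + 1 check digit. The check digit is NOT validated here.
    """
    if not sctid.isdigit() or len(sctid) < 11:
        return None
    partition = sctid[-3:-1]
    if partition[0] != "1":  # a leading 0 means short format, with no namespace
        return None
    return sctid[-10:-3]


# Made-up identifier: item "1234", namespace "1000001", partition "10", digit "5".
made_up_id = "1234" + "1000001" + "10" + "5"
before = ConceptRow(concept_id=made_up_id, module_id="national-module")
after = ConceptRow(concept_id=made_up_id, module_id="international-module")
```

The concept identifier, and so the namespace recovered from it, is identical in `before` and `after`; only the module value records who maintains the concept now.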

michael

Hi!

artefact identification proposals
at http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/knowledge_id_system.pdf

...

se.skl.epj::openEHR-EHR-EVALUATION.problem.v1

...Then discussions regarding UUIDs, OIDs etc followed in several messages....

Is not the simplest thing to just use URIs [
http://en.wikipedia.org/wiki/Uniform_Resource_Identifier ], or even
better allowing non-latin characters by using IRIs [
http://tools.ietf.org/html/rfc3987 ]?

Then organizations can choose if they want to base IDs on
domain-names, UUIDs, OIDs or whatever that fits in a URI (which might
be a URN, see list at http://www.iana.org/assignments/urn-namespaces/
). Some archetype authoring organizations may like names with
semantics, some may not, so why enforce either view?

Now since metadata is going to be well defined inside the file, the
need for semantics in identifiers or file names is gone so the main
thing left is that we want a _unique_ string. URIs are supposed to be
unique.

Some URI-examples:
urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
urn:oid:1.3.6.1.2.1.27
urn:lsid:chemacx.cambridgesoft.com:ACX:CAS967582:1
http://id.skl.se/openEHR/EHR-EVALUATION.problem.v1
http://schema.openehr.org/openEHR/EHR/EVALUATION/problem/v3
urn:nbn:se:liu:diva-38012
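
A small Python sketch, using only the standard library, of how software could accept any of these forms and still tell them apart. The `id_scheme` helper is a hypothetical name introduced here for illustration, not part of any openEHR tooling.

```python
import uuid
from urllib.parse import urlsplit


def id_scheme(identifier: str) -> str:
    """Classify an identifier by URI scheme; for URNs, include the namespace id."""
    parts = urlsplit(identifier)
    if parts.scheme == "urn":
        # For "urn:oid:1.3.6..." the path is "oid:1.3.6...", so take the first part.
        nid = parts.path.split(":", 1)[0]
        return f"urn:{nid.lower()}"
    return parts.scheme


examples = [
    "urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    "urn:oid:1.3.6.1.2.1.27",
    "http://id.skl.se/openEHR/EHR-EVALUATION.problem.v1",
]
schemes = [id_scheme(e) for e in examples]

# An organisation with no naming policy could simply mint a random UUID URN.
fresh_id = uuid.uuid4().urn
```

Whatever the scheme, the only property relied on is that the whole string is unique.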

I see no point in enforcing usage of OIDs as suggested in some responses.

The idea of not changing the ID if/when transferring responsibility of
an archetype between authorities sounds very reasonable if the content
is unchanged.

When I visited Brazil, I noticed that the MLHIM project's development
version was using UUIDs for the artifacts (CCDs) that correspond to
what is called archetypes in openEHR.

Best regards,
Erik Sundvall
erik.sundvall@liu.se http://www.imt.liu.se/~erisu/ Tel: +46-13-286733

Hello,

I like that approach regarding namespaces, it will be needed sooner than later.

Related to archetype identifiers, there is another problem still to be solved: how do they deal with RM evolution? The current openEHR RM release is 1.0.2, but it can change in the future. Nowhere in an archetype is it stated which RM version was used to define it. This information should go, at least, in the archetype header, but probably should also be represented in the archetype id. Otherwise we will not be able to differentiate between an archetype for one version of the RM and the same archetype (modified if necessary) for a different one.

It should go in the archetype, that is for sure - but it should be understood only as ‘the RM version used when this archetype was authored / quality assured etc’ - rather than ‘the RM version for which this archetype is valid’. The reason is easy to understand: for some particular archetype, authored at RM 1.0.2 let’s say, it may be valid for many RM revisions after that, even RM 2.x, and not only that, it might be perfectly valid for prior revisions e.g. 1.0, 1.0.1, even 0.95 - it can depend a lot on what parts of the RM the archetype happens to use. This is the reason I argued against including the RM version in the archetype id, because it doesn’t tell us anything about validity. (We had a long discussion about this on the technical list last year or 2009 I forget which).

Now.. if the RM changes, let’s say to 2.0.0, then we might assume that there are one or two breaking changes, and that a few archetypes could break. The only way I can see to deal with this is:

  • we stick with the rule that minor RM change numbers never break archetypes (or indeed existing data), i.e. 1.0.1 → 1.0.2 → 1.0.3 etc is guaranteed safe

  • we say that for a major RM version change, i.e. 2.x, 3.x etc, that includes breaking changes, a validity test has to be run on all archetypes.

  • any that don’t pass, i.e. are compromised by the change need to be marked in some way, maybe a header field with the meaning ‘valid up to RM release xxx’ or so.

  • such archetypes would themselves then have to be versioned (xxxx.v1 => xxxx.v2)

It should be remembered that we can undertake many innovations and ‘fixes’ that don’t break anything in the RM, and therefore don’t require a major release. So openEHR 2.x, 3.x etc are likely to be extremely rare events.
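
The rules listed above could be sketched in Python roughly as follows. The function names are hypothetical, and the sketch assumes dotted numeric release strings such as "1.0.2"; it only decides when a validity test has to be run, it does not perform the test.

```python
import re
from typing import Tuple


def parse_release(release: str) -> Tuple[int, ...]:
    """Turn a release string such as '1.0.2' into a tuple of integers."""
    return tuple(int(part) for part in release.split("."))


def needs_revalidation(authored_rm: str, target_rm: str) -> bool:
    """True when the major RM number differs, so a validity test must be run.

    Changes within one major release are assumed safe, per the first rule.
    """
    return parse_release(authored_rm)[0] != parse_release(target_rm)[0]


def bump_version(archetype_id: str) -> str:
    """Produce the next archetype version id, e.g. xxxx.v1 => xxxx.v2."""
    match = re.fullmatch(r"(.+\.v)(\d+)", archetype_id)
    if match is None:
        raise ValueError(f"no version suffix in {archetype_id!r}")
    return f"{match.group(1)}{int(match.group(2)) + 1}"
```

An archetype that fails the test after a major release would be marked ‘valid up to’ the old release and continue as the id returned by `bump_version`.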

- thomas

Hi!

I like Erik’s idea of having a global and unique URI to reference each archetype (and also each template). This will help to build a global archetype server, and the domain of the URI can act as the namespace of the archetype. And, if those domains really exist (like openehr.org), an archetype URI can be a real working URL :D, so we can request any archetype directly from the server via HTTP requests.

It would also be great to build automated archetype tests to check the validity of an archetype against a version of the RM, as Thomas said, and to have this functionality integrated with the archetype editor.

Dear Thomas,

I agree with your general approach, but you miss two important things.

  1. If openEHR spreads, you cannot expect that all implementations will always be up to date. That means that just upgrading the version number of the archetype will not be enough for a system to automatically select the right version for the RM version it has implemented. If we just say “Ok, but since that information will be in the archetype header we don’t need it in the identifier”, we can also say the same for the concept, the RM class, the RM and the organisation. In the end, we will find that we don’t need a semantic id, as Erik said, and that a UUID, OID or whatever is enough for identifying archetypes.

  2. Archetypes are not only for openEHR. We must always keep in mind that other reference models can be used, with their own life-cycles that may not be as fine-grained as openEHR’s. For example, we are now creating HL7 CDA R2 archetypes, but CDA R3 is expected to be approved this year. How will we differentiate archetypes of R2 from archetypes of R3? Once again, R2 will still be used for many years and updating the version number isn’t enough.

Finally, I also agree that talking about the “RM version of an archetype” is not the same as talking about “archetype validity regarding an RM”, but one should not exclude the other.

David

2011/4/7 Thomas Beale <thomas.beale@oceaninformatics.com>

Dear Thomas,

I agree with your general approach, but you miss two important things.

1. If openEHR spreads, you cannot expect that all implementations will
always be up to date. That means that just upgrading the version
number of the archetype will not be enough for a system to
automatically differentiate the right version for the RM version they
have implemented. If we just say "Ok, but since that information will
be at the archetype header we don't need it at the identifier", we can
also say the same for the concept, the RM class, the RM and the
organisation. In the end, we will find that we don't need a semantic
id, as Erik said, and that a UUID, OID or whatever is enough for
identifying archetypes.

I have no problem with a UUID or similar, but I don't think they are
that useful on their own, and they require more tooling support. If you
want to make inferences about what RM class, and what clinical concept
a given archetype is about, and you only have a UUID, you need to know
what / where to look up. You also have to be able to have rules somewhere
to know when to assign a new UUID, when to treat two different UUIDs as
referring to archetypes that are actually revisions (i.e. both compatible
with the same data), when to treat them as versions (not compatible with
the same data). We could potentially make the RM class name and concept
id just new fields in the description. The thing you lose (and maybe it
is worth losing) is that by creating a tuple of a particular subset of
meta-data items, viz: publishing organisation + RM issuer + RM class
name + archetype concept id + archetype version id, this happens to be
the unique key in archetype space (a new revision or new translation on
the other hand is obviously a new artefact, but it is not semantically a
new archetype). To achieve the equivalent with a UUID-based id system
means smarter software and either centralised id management (ISO OIDs)
or some distributed id system, possibly like DNS. At the moment we can
process the archetype id and directly know the relationship between two
archetypes. I think going down the UUID road will require more agreement
on tools and governance infrastructure.
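
To illustrate what processing the archetype id directly can look like, here is a minimal Python sketch. The regular expression is an assumption that covers only the id shapes quoted in this thread (an optional `namespace::` prefix, then issuer-model-class, concept and `.vN`); it is not the normative grammar, and `ArchetypeKey`, `parse_id` and `relationship` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern for ids such as "se.skl.epj::openEHR-EHR-EVALUATION.problem.v1".
ID_PATTERN = re.compile(
    r"(?:(?P<namespace>[^:]+)::)?"
    r"(?P<rm_issuer>[^-.:]+)-(?P<rm_name>[^-.:]+)-(?P<rm_class>[^-.:]+)"
    r"\.(?P<concept>.+)\.v(?P<version>\d+)"
)


class ArchetypeKey(NamedTuple):
    namespace: Optional[str]  # publishing organisation, if present
    rm_issuer: str
    rm_name: str
    rm_class: str
    concept: str
    version: int


def parse_id(archetype_id: str) -> ArchetypeKey:
    """Split an archetype id into the tuple that acts as its unique key."""
    match = ID_PATTERN.fullmatch(archetype_id)
    if match is None:
        raise ValueError(f"unrecognised archetype id: {archetype_id!r}")
    fields = match.groupdict()
    fields["version"] = int(fields["version"])
    return ArchetypeKey(**fields)


def relationship(id_a: str, id_b: str) -> str:
    """Infer how two archetypes relate, using nothing but their ids."""
    a, b = parse_id(id_a), parse_id(id_b)
    if a == b:
        return "same archetype"      # any difference is a revision or translation
    if a[:-1] == b[:-1]:
        return "different versions"  # not compatible with the same data
    return "unrelated"
```

With opaque UUIDs the same answer would need a lookup against some registry, which is the extra tooling and governance referred to above.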

2. Archetypes are not only for openEHR. We must always have in mind
that other reference models can be used with their own life-cycle that
could be not so fine-grained as in openEHR. For example, we are now
creating HL7 CDA R2 archetypes, but CDA R3 is expected to be approved
this year. How will we differentiate archetypes of R2 from
archetypes of R3? Once again, R2 will still be used for many years and
updating the version number isn't enough.

I don't know what R3 looks like, but if it is a different reference
model, then the ids would be something like

hl7-cda3-Entry.diagnosis.v1

If CDA r3 on the other hand is a clean extension / superset of CDAr2,
then the 'reference model' is really just 'cda', and there is no simple
relationship between particular archetypes and particular releases of
the CDA model.

- thomas

Now since metadata is going to be well defined inside the file, the

it is not well enough defined yet, but it could be. We would need to do
some work on that to define the exact rules.

need for semantics in identifiers or file names is gone so the main
thing left is that we want a _unique_ string. URIs are supposed to be
unique.

Some URI-examples:
urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
urn:oid:1.3.6.1.2.1.27
urn:lsid:chemacx.cambridgesoft.com:ACX:CAS967582:1
http://id.skl.se/openEHR/EHR-EVALUATION.problem.v1
http://schema.openehr.org/openEHR/EHR/EVALUATION/problem/v3
urn:nbn:se:liu:diva-38012

There are two ways to see this. We could say that, assuming archetypes
become quite widespread in IT in general, that anything should be
allowed, just make it a URI formatted id. However, for major domains
like health, I don't know if this helps, I think we need better
standards than that. It would be a bit like saying to SNOMED national
release centres: go make your own concept ids, you don't need to follow
IHTSDO rules, only a meta-rule that says: don't take an id that has
already been used.

I think that the URI/IRI/etc argument is a different dimension from the
content of the ID. URIs et al are about technical accessibility within a
notionally online info-fabric. But specific communities are still going
to want to control their spaces- e.g. I don't think we will see ISBNs
die out any time soon.

- thomas

OIDs are probably the one kind of id I would not propose for archetypes; the multi-axial id in current use + the proposed namespace id is equivalent to an OID, just with some more constrained rules on what is on the axes, and readable values. The need for a highly managed id assignment system plus loss of readability and inferencing capability seems like a backward step to me. UUIDs seem a more obvious step. Note that UUIDs don’t cope properly with namespaces or versions, and there are already id systems that assign a UUID to the ‘artefact’ and a second UUID to the version, so that it can be inferred whether two concrete artefact instances are really just versions of the same thing. Note that a UUID is massive overkill for a version id of something! But this just shows that simple assignment of UUIDs or OIDs is no panacea…

- thomas

2. Archetypes are not only for openEHR. We must always have in mind
that other reference models can be used with their own life-cycle that
could be not so fine-grained as in openEHR. For example, we are now
creating HL7 CDA R2 archetypes, but CDA R3 is expected to be approved
this year. How will we differentiate archetypes of R2 from
archetypes of R3? Once again, R2 will still be used for many years and
updating the version number isn’t enough.

I don’t know what R3 looks like, but if it is a different reference
model, then the ids would be something like

hl7-cda3-Entry.diagnosis.v1

If CDA r3 on the other hand is a clean extension / superset of CDAr2,
then the ‘reference model’ is really just ‘cda’, and there is no simple
relationship between particular archetypes and particular releases of
the CDA model.

- thomas

So, in the end, you are putting the RM version somewhere in the identifier… :-)

(btw, I don’t know what the changes in R3 will be)

David

The fact that we are very cautious about reference model version changes
doesn't mean that every other organization does the same.
Look at HL7 v2 & v3 for example ;-)

no - I am just using ‘r2’, ‘r3’ as parts of a name. I.e. ‘CDAr2’ and ‘CDAr3’ may be two different names just like ‘Marco’ and ‘Pierro’, … or like ‘ICD9’ and ‘ICD10’. Here I am assuming that HL7 will allow anything to happen in CDAr3, including radical changes. If that is not the case, and CDAr3 is really just a clean extension of r2, then the RM we are talking about is just ‘CDA’, and any given archetype may or may not work with any particular minor or major release of it.

- thomas

Hi Erik,
I wasn't suggesting that we enforce OIDs; in fact my intent was similar to
yours: to open up the choice of what is used and not enforce the specially
designed ID scheme currently used, which requires upgrading to support
namespacing, giving it the same issues as the standard UID schemes.

I like the suggestion of URIs, although I also agree with Tom's later
comment that within openEHR implementations we should try to limit the
options of the URI schemes used. However, ADL and AOM shouldn't be
restricted to this same set, to allow other implementation profiles for
other reference models to make their own choices.

Heath

Thomas,

Your proposed changes to the archetype identifiers and governance actually align with the same management and inferencing requirements as OIDs; the only benefit left is the readability, but even that is becoming hard to achieve with the additional namespaces and delimiters. In addition, having meaningful IDs and deriving meaning from IDs is counter to good practice in terminology identifier management.

If we choose a GUID (or any other standard UID) for the archetype ID, then I see no reason why the VersionedObjectId scheme cannot be used for managing versions of the archetype as long as it is properly administered.

Heath

I will have a play around with some new meta-data additions to archetypes and put them on this list in the next day or so. Let’s then think about what is needed in terms of different kinds of ‘identifiers’, both assigned and generated.

- thomas

Hi!

While we are discussing metadata and identifiers... Shouldn't the
metadata/description part of an archetype/template be based on Dublin
Core ( http://en.wikipedia.org/wiki/Dublin_Core ) instead of an
openEHR specific approach? That might make librarians, search engines
and other existing artifact storage/search software feel more at home
:-) Using Dublin Core does not take away the requirement of having
identifiers though, since something has to go into the Dublin Core
identifier field. There are several levels of Dublin Core and the ones
with good structural requirements on the values may be useful.

The idea of using Dublin Core for archetype/template-like-artifacts I
got from Tim Cook et al. in MLHIM.

Best regards,
Erik Sundvall
erik.sundvall@liu.se http://www.imt.liu.se/~erisu/ Tel: +46-13-286733

Hello

I have been working for some time on DCM/archetype metadata. Dublin Core
is suitable for that; however, there is an ISO norm (ISO 15699. Health
informatics. Clinical knowledge resources. Metadata) which is an
extension of Dublin Core for health informatics and is even more
suitable. They have even defined 'archetype' as one of the valid
document types! I would be interested in helping on the metadata topic.
When do we start? ;-)