openEHR system_id syntax in specification? ...and in practice (survey)?

erik.sundvall · 20 October 2024 08:58

Hi!

From Karolinska we’d just like to double-check the recommended syntax of the system_id for CDRs. Examples in different specification documents are a bit confusing.

Introduction

I do not know if there is any explicit specification text regarding the syntax/content of system_id, not even in the main description in section 6.1.1. of the Architecture Overview - perhaps that should be added? Other places describing system_id don’t define syntax either…

In AUDIT_DETAILS.system_id it is just described as a String.
In FEEDER_AUDIT_DETAILS.system_id it is just described as a String (but that kind of feeder-system is not necessarily an openEHR based CDR)
In EHR.system_id it is described as a HIER_OBJECT_ID

…but there are some different formats of the system_id part of composition-IDs used in instance examples.

In section “9.2.2. Levels of Identification” of the Architecture Overview reverse DNS/domain notation is used in examples, such as au.gov.health.rdh.ehr1
In section “6.3.3. Version Identification” of the Common Information Model reverse DNS/domain notation is used in examples, such as nz.gov.msh
In section “5.3.2.5. Identifying Versions within openEHR Versioned Containers” of the Base Types specification reverse DNS/domain notation is used in examples, such as uk.nhs.ehr1
The overview in REST Specification uses forwards DNS/domain notation in examples, such as openEHRSys.example.com - is this an error?
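
For anyone who wants to check which form their own stored version IDs use: in these examples the system_id is the middle of the three `::`-separated parts of a version ID (object_id::creating_system_id::version_tree_id). A minimal Python sketch of pulling it out; the function and class names are hypothetical and the UUID is made up for illustration:

```python
from typing import NamedTuple


class VersionIdParts(NamedTuple):
    object_id: str
    creating_system_id: str
    version_tree_id: str


def split_version_id(version_uid: str) -> VersionIdParts:
    """Split 'object_id::creating_system_id::version_tree_id' into its parts."""
    parts = version_uid.split("::")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a three-part version id: {version_uid!r}")
    return VersionIdParts(*parts)


# The UUID below is invented; uk.nhs.ehr1 is the system_id used in the
# Base Types examples mentioned above.
example = "8849182c-82ad-4088-a07f-48ead4180515::uk.nhs.ehr1::2"
print(split_version_id(example).creating_system_id)
```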

Main question

So… is reverse DNS/domain notation the recommended form or not? I “grew up” with that, reading the examples by @thomas.beale and others.

Survey

What syntax are you and your customers using? At Ocean @Seref & @chunlan.ma? What about experience from freshEHR @ian.mcnicoll? Better @Bostjan_Lah & @matijap? Nedap @joostholslag & @MattijsK? Vitasystems @birger.haarbrandt & @vidi42? DIPS @bna? Cabolabs @pablo? Code24 @sebastian.iancu? Cambio @martin.grundberg, @mikael & @rong.chen? Medblocks @Sidharth_Ramesh? And others I forgot to mention here?

Background

Very soon we’ll be putting a new CDR product into real production (replacing another product), and fairly soon after that we will start adding new original data to it. Once we store original data that is not found anywhere else, there is no easy turning back regarding system_id without a messy change to already stored, possibly signed and cross-linked, EHR content. The production use cases we have had in CDRs so far have been based on openEHR-formatted converted copies of data from other source systems; those imports can always be re-run if/when we change CDR product, so the system_id syntax can be changed if we want to.

Our current assumption would be that something like the following IDs would work as system_id inside different logical openEHR EHRs for us:

system_id for default logical CDR instance (“main regional EHR system”):
se.regionstockholm.openehr
system_id for dev instance of a logical CDR used for technical development, only containing fake patient data:
se.regionstockholm.openehr.dev
system_id example for a possible, even further pseudonymized, research-only (logical) CDR, possibly mixing content from different source CDRs:
se.regionstockholm.openehr.research

birger.haarbrandt · 20 October 2024 10:09

Hi @erik.sundvall,

we are using the reverse domain notation within EHRbase and HIP. As we provide a multi-tenant system, the tenant name is reflected in the different system IDs. For example:

karolinska.hip-cdr-core-ehrbase-enterprise.sandbox.vghip.cloud

Hope this helps

erik.sundvall · 21 October 2024 05:30

Ooops, that response is a bit confusing in itself @birger.haarbrandt: cloud is the top-level domain (corresponding to .se, .de, .com, .org etc.) and vghip is registered to “vitagroup AG”. This means you are not using reverse domain name notation, see:

I assume the example is only intended for a potential test system. If I represented a healthcare provider setting up an openEHR system intended for lifelong health records, I would most certainly try to avoid having product names or hosting solutions in the system_id. I would also try to base the system_id on the (reverse) domain of the hospital, region or some national EHR organisation, depending on the scope of the CDR.

P.S. We have sometimes seen providers confusing the (logical) system_id with the more “physical” HTTP server name. Those usually have different purposes and lifespans. Sometimes they may of course align if you (e.g. using a DNS CNAME alias) point the forward version of the address, e.g. https://openehr.regionstockholm.se, to the CDR with the openEHR system_id se.regionstockholm.openehr - but that is not necessary.
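
For the optional case where the two do align, the relation between the forward host name and the reverse-notation system_id is just a reversal of the dot-separated labels. A small Python sketch (the helper names are hypothetical, and nothing in openEHR requires deriving one from the other):

```python
from urllib.parse import urlsplit


def reverse_labels(name: str) -> str:
    """Reverse the dot-separated labels of a domain-style name."""
    return ".".join(reversed(name.strip(".").split(".")))


def system_id_from_url(url: str) -> str:
    """Derive a reverse-notation id from the host part of a URL."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host name in {url!r}")
    return reverse_labels(host)


print(system_id_from_url("https://openehr.regionstockholm.se"))
# se.regionstockholm.openehr
print(reverse_labels("se.regionstockholm.openehr"))
# openehr.regionstockholm.se
```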

birger.haarbrandt · 21 October 2024 07:16

Hi Erik,

thanks for pointing this out, it seems I was a bit too quick to reply. This is indeed a test system. I will check whether I can find a more helpful example.

bna · 22 October 2024 16:26

Hi Erik
AFAIK we use a GUID as the representation of the system in system_id. We have had several discussions over the years internally in the organisation on this topic.

They tend to go like: should we use the system_id more actively to state something about the semantics of the installations or the system at hand?

In practical installations, in internal test environments, in customers’ different staging environments, and also over the years when merging different hospitals into the same system, we have not found a way to cope with a reverse domain notation.

So to sum up: We use a GUID to identify the system.

erik.sundvall · 24 October 2024 15:49

@thomas.beale do you have any input on this for those wanting semantics in the system_id (as opposed to DIPS)? It looks like you suggested reverse DNS notation in the specification examples.

Personally I find them good as a way to point out that we mean logical belonging rather than a physical host, and most such similar identifiers (e.g. OIDs, IP-numbers, Version numbering etc) go left to right from coarse-grained/general to more fine-grained/specific.

sebastian.iancu · 30 October 2024 17:20

Hi Erik,

Since 2011 we have used an internal format for the id, a short unique 16-character format, as it was allowed and worked best with our DB index. Besides that, we also have an RM type called system, which besides system-id also has a system-name (like a domain name) and potentially other info. This is basically what I would like to suggest be adopted for RM 1.2. There is a ticket on my pile about exactly the problems you mentioned in your assessment. I think some years ago I also made a Confluence page about it. If we adopt that new RM type we could always use id or name, and we would just have to settle on the id syntax - perhaps UUID.
Another aspect related to system-id appears when you look at AQL federation and the need for a node-id, which is semantically different from system-id - it has to do with virtual systems, distributed systems over physical CDRs, multitenancy, etc.

pablo · 31 October 2024 18:48

Hi, in our case it’s used more like a code from a dictionary/nomenclature/code list.

We have our master lists of codes, one of those lists should be “System IDs”, then the entries on that list are the ones valid for the openEHR system_id. In that master list we can assign other attributes to the code if needed. Then depending on the environment those codes could mean different things. For instance, it could be the ID for a specific application/system for recording clinical data, it could be a hospital ID, a clinic or hospital network ID, a region ID, etc. That is because the definition of “system” in different environments might mean different things, so we like the flexibility to adapt to as many cases as possible.

The values themselves don’t have any specific structure or syntax from which we could derive meaning, either from the whole value or from its parts (it’s an opaque code); the meaning is really defined in the dictionary.
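
Roughly, the idea can be sketched like this in Python. This is only an illustration of the pattern, not our actual implementation; the codes, attribute names and descriptions are all invented:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemEntry:
    code: str         # opaque value stored as the openEHR system_id
    kind: str         # what "system" means in this environment
    description: str


# Hypothetical master list: the meaning lives here, not in the code's syntax.
MASTER_LIST = {
    entry.code: entry
    for entry in (
        SystemEntry("SYS-0001", "application", "Clinical recording application"),
        SystemEntry("SYS-0002", "hospital", "A single hospital"),
        SystemEntry("SYS-0003", "region", "A regional network"),
    )
}


def lookup_system(system_id: str) -> SystemEntry:
    """Return the master-list entry for a system_id, rejecting unknown codes."""
    try:
        return MASTER_LIST[system_id]
    except KeyError:
        raise ValueError(f"unknown system_id: {system_id!r}") from None


print(lookup_system("SYS-0002").kind)
```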

Hope that helps!

thomas.beale · 31 October 2024 19:58

I have to say that I still don’t see any better id than a reverse domain name. Usually the front part of a reverse domain name indicates the org’s identity, like uk.nhs or similar, and the rest is usually something like the facility (hospital or ‘trust’ in the UK etc) and then some system ids after that.

Although I like Guids for nearly everything else, it seems a bit painful for this use to me. But DIPS might have reasons I don’t know about.

erik.sundvall · 1 November 2024 12:33

Thanks for the feedback @thomas.beale! I agree and reverse domain notation fits our use cases in Region Stockholm for a single main logical instance of regional EHR. Since the system_id appears in each composition and it also can be good/interesting for a human to understand when merging data from different systems, human-understandable identifiers are good.

The question remains whether any recommendations should be mentioned in the spec for different use case categories, e.g.

1. If global conflict-free federation possibilities are desired, then:
a. In cases where identifier semantics, e.g. based on organisational/regional ownership of the EHR, are stable and clear, reverse domain name notation is recommended (not forward domain notation).
b. In cases where the semantics of the system_id cannot or should not be easily determined and stable, other globally unique identifiers (like UUIDs) are recommended.
2. If it is known ahead of time that EHR content from the system will never be used in federations where system_ids can conflict (or that the system_id will be changed to a globally unique identifier before content is used outside the system), then any string can be used as identifier.
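
To make the categories a bit more concrete, here is a rough Python sketch of how an existing system_id could be sorted into them. Everything in it is an assumption for illustration: the function name, the dotted-name pattern, and above all the tiny sample set of top-level domains, which is only a heuristic stand-in for a real TLD list and is not taken from any specification.

```python
import re
import uuid

# Assumed shape of a dotted name: two or more labels of letters, digits, hyphens.
_DOTTED = re.compile(r"[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")
# Illustrative sample only, far from a complete list of top-level domains.
_SAMPLE_TLDS = {"se", "uk", "au", "nz", "com", "cloud"}


def classify_system_id(system_id: str) -> str:
    """Guess which kind of identifier a system_id string is."""
    try:
        uuid.UUID(system_id)
        return "uuid"
    except ValueError:
        pass
    if not _DOTTED.fullmatch(system_id):
        return "opaque"
    labels = system_id.lower().split(".")
    starts = labels[0] in _SAMPLE_TLDS
    ends = labels[-1] in _SAMPLE_TLDS
    if starts and not ends:
        return "reverse-domain"
    if ends and not starts:
        return "forward-domain"
    return "dotted-unknown"  # cannot tell the direction from the sample list


for sid in ("se.regionstockholm.openehr", "openEHRSys.example.com"):
    print(sid, "->", classify_system_id(sid))
```

A check like this can only flag likely forward-notation ids for a human to review; it cannot prove that an id is globally unique.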

sebastian.iancu · 10 January 2025 10:12

Linking a related PR ticket here: Specification PR tracker - Issues - openEHR JIRA

@erik.sundvall do you now have an overview of all usages from the community? Have you got anything other than what is mentioned above?

Jelte · 10 January 2025 10:17

Within Nedap we’re also using reverse domain name, e.g. something similar to com.nedap.tentant.

thomas.beale · 11 January 2025 18:40

Should that ‘tentant’ be ‘tenant’? Or is that just Dutch for ‘tenant’? Just asking because everything in the cloud is indeed a tenant within some kind of subscription like AWS, Azure etc.

Jelte · 21 January 2025 09:56

That was a typo, it should be tenant.