Implementation fine details - case sensitivity and date time formats

Hi implementers,

We have come across a few issues recently which will affect interoperability between openEHR implementations if we don’t have some tighter guidelines for implementation. I will start with a couple that relate to case sensitivity of code strings and identifier values. I am interested in people’s views about handling case sensitivity in the following cases.

The first is with regard to territory and encoding code phrase strings. Rong recently found, when using the Ocean EhrBank Service with the Java reference implementation, that we differ in which case we use. Rong expects “AU” while I use “au”, and similarly “UTF-8” versus “utf-8”.

ISO 3166-1 specifies territory codes as upper case values, while the openEHR terminology specification provides examples using lower case.

Similarly, when I use two different XML editors that I have on my machine to create a new XML document I get one with an upper case encoding and the other lower case.

The other case is with regard to identifier values and URIs, in particular EHR_URIs. We use GUIDs for our EHR ID, VERSIONED_OBJECT uid and CONTRIBUTION uid values etc. We also currently use a GUID for our system ID. Again there seems to be some inconsistency between implementations in how they create GUIDs (this time from the same vendor). My development framework creates lower case GUIDs while my database system creates upper case GUIDs. Hence I get version UIDs such as the following.

0835608c-4863-4a54-b38a-3c7150c0ba57::EA13C291-8433-4961-ADB8-CB8541E8B171::1
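
As a minimal sketch of one way to make such version UIDs comparable, the Python below parses the two GUID segments with the standard `uuid` module, which accepts hex digits in either case and always prints lower case. The function name `normalise_version_uid` and the assumption that the first two `::`-separated segments are always GUIDs are illustrative only; nothing here is mandated by the openEHR specifications.

```python
import uuid


def normalise_version_uid(version_uid: str) -> str:
    """Return the version UID with its GUID segments in canonical lower case.

    Assumes the form <object GUID>::<system GUID>::<version tree id>,
    as in the example above; other identifier types are not handled.
    """
    object_id, system_id, version_tree_id = version_uid.split("::")
    # uuid.UUID parses hex digits in either case; str() emits lower case.
    return "::".join(
        [str(uuid.UUID(object_id)), str(uuid.UUID(system_id)), version_tree_id]
    )


mixed = "0835608c-4863-4a54-b38a-3c7150c0ba57::EA13C291-8433-4961-ADB8-CB8541E8B171::1"
upper = mixed.upper()

# As plain strings the two differ, but they normalise to the same value.
print(mixed == upper)
print(normalise_version_uid(mixed) == normalise_version_uid(upper))
```

Whether lower case is the right canonical form is exactly the open question; the sketch only shows that picking one is cheap once it is agreed.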

Similarly, when I create an EHR_URI I get something like the following.

ehr://0c34ac1c-0ffa-4958-81c4-ca914a3ab913@EA13C291-8433-4961-ADB8-CB8541E8B171/0835608c-4863-4a54-b38a-3c7150c0ba57@latest_trunk_version

When I used a URI class from my development framework with the above URI and compared the ToString() result with the original URI, it was different because the URI class had converted it to lower case. This might seem OK for Internet URLs etc., but because the EHR_URI contains the VERSIONED_OBJECT uid, possibly in another system, we need to ensure that we can find this object reliably across implementations.
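
A similar effect can be reproduced with Python's standard `urllib.parse`, which is not the framework referred to above but behaves comparably for the host part: the raw authority string is kept as given, while the parsed `hostname` is folded to lower case. The helper `same_ehr_uri` is a hypothetical comparison policy, shown only as a sketch of what "case insensitive EHR_URIs" could mean in code.

```python
from urllib.parse import urlsplit

EHR_URI = (
    "ehr://0c34ac1c-0ffa-4958-81c4-ca914a3ab913"
    "@EA13C291-8433-4961-ADB8-CB8541E8B171"
    "/0835608c-4863-4a54-b38a-3c7150c0ba57@latest_trunk_version"
)

parts = urlsplit(EHR_URI)

# The raw authority component keeps the case it was given...
print(parts.netloc)
# ...but the parsed host name is folded to lower case by the library.
print(parts.hostname)


def same_ehr_uri(a: str, b: str) -> bool:
    """Hypothetical policy: two EHR_URIs are equal if they differ only in case."""
    return a.casefold() == b.casefold()
```

Comparing the whole string case-insensitively is an assumption; a stricter policy might fold only the GUID segments and leave the rest of the path untouched.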

I would also suggest the same issue arises for archetype IDs and node IDs. Is “openEHR-EHR-OBSERVATION.blood_pressure.v1” the same as “openehr-ehr-observation.blood_pressure.v1”, and is “at0001” the same as “AT0001”? We had an issue with the latter when doing some work with the AQL grammar, where we wanted the keywords to be case insensitive but the node ID to be lower case only. The parser tool we were using treated all literals in the grammar as case insensitive.
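
To illustrate the distinction between case-insensitive keywords and lower-case-only node IDs, here is a small sketch using Python regular expressions rather than the parser tool mentioned above. The toy pattern (a `WHERE` keyword followed by a bracketed node ID) is a made-up fragment, not real AQL, and the node ID shape `atNNNN` with optional dotted suffixes is an assumption for the example.

```python
import re

# Hypothetical toy grammar: a keyword followed by a bracketed node ID,
# e.g. "WHERE [at0001]". This is not real AQL, only an illustration.
# (?i:...) makes just the keyword case-insensitive; the node ID part
# stays case-sensitive and therefore accepts lower case "at" only.
PREDICATE = re.compile(r"(?i:where)\s+\[(at[0-9]{4}(?:\.[0-9]+)*)\]")


def node_id_of(text: str):
    """Return the node ID if the text matches the toy grammar, else None."""
    match = PREDICATE.fullmatch(text)
    return match.group(1) if match else None


print(node_id_of("WHERE [at0001]"))
print(node_id_of("where [at0001]"))
print(node_id_of("Where [AT0001]"))
```

The first two calls yield the node ID and the third yields `None`, which is the behaviour a grammar tool needs to support if node IDs are to stay lower case only.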

Enough background.

Should code phrase strings be case sensitive? If so, which case do we use for territory and encoding, etc.?

Should object identifier values be case sensitive?

Should archetype/node identifiers be case sensitive?

Should URIs be case sensitive?

Should EHR_URIs be case sensitive?

Thoughts?

Another issue we need to think about is Date/Time value formats using ISO 8601. My experience is that intrinsic development framework classes (and XML) do not support ISO 8601 completely or properly, especially for the requirements of openEHR such as partial date/times. We have had to implement our own set of classes to ensure ISO 8601 is properly implemented. So the questions are:

Is basic or extended format preferred?

Is it allowable to convert original data format to another format for internal use?

What is the normalised format for the purposes of creating a signature digest?

I certainly have opinions on these but I would like to hear others first.
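
To make the partial date question concrete, the following is a minimal sketch of a parser that accepts both basic and extended date forms and keeps track of which components are missing. The names `PartialDate` and `parse_partial_date` are invented for the example, times and time zones are left out entirely, and the basic form `YYYYMM` is rejected because, to my understanding, ISO 8601 does not permit it.

```python
import calendar
import re
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    def to_extended(self) -> str:
        """Render in extended format, keeping only the known components."""
        text = f"{self.year:04d}"
        if self.month is not None:
            text += f"-{self.month:02d}"
        if self.day is not None:
            text += f"-{self.day:02d}"
        return text


# Extended: YYYY, YYYY-MM or YYYY-MM-DD. Basic: YYYY or YYYYMMDD.
_EXTENDED = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")
_BASIC = re.compile(r"(\d{4})(?:(\d{2})(\d{2}))?")


def parse_partial_date(text: str) -> PartialDate:
    match = _EXTENDED.fullmatch(text) or _BASIC.fullmatch(text)
    if match is None:
        raise ValueError(f"not a recognised ISO 8601 date: {text!r}")
    year, month, day = (int(g) if g is not None else None for g in match.groups())
    if month is not None and not 1 <= month <= 12:
        raise ValueError(f"month out of range: {text!r}")
    if day is not None and not 1 <= day <= calendar.monthrange(year, month)[1]:
        raise ValueError(f"day out of range: {text!r}")
    return PartialDate(year, month, day)
```

With this sketch, `parse_partial_date("20080315")` and `parse_partial_date("2008-03-15")` compare equal, and `parse_partial_date("2008-03")` keeps the day as unknown instead of defaulting it, which is the property a framework date class typically loses.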

Regards

Heath

Heath Frankel
Product Development Manager

Ocean Informatics

Ground Floor, 64 Hindmarsh Square

Adelaide, SA, 5000

Australia

ph: +61 (0)8 8223 3075

mb: +61 (0)412 030 741
email: heath.frankel@oceaninformatics.com

Hi Heath,

First of all my basic opinion is that implementations should be case
insensitive. All the items you mentioned are "... like a box of
chocolates. You never know what you're gonna get."
--Forrest Gump's momma

The ISO-3166-1 specifies these as upper case values while the openEHR
terminology specification provides examples using lower case.

I do not recall case sensitivity being 'specified'.

The other case is with regard to identifier values and URIs, in
particular EHR_URIs. We use GUIDs for our EHR ID, VERSIONED_OBJECT
uid and CONTRIBUTION uid values etc. We also currently use a GUID
for our system ID. Again there seems to be some inconsistency between
implementations of creating GUIDs (this time from the same vendor).

Another example of the need for case insensitivity.

I would also suggest the same issue arises for archetype IDs and node
IDs. Is “openEHR-EHR-OBSERVATION.blood_pressure.v1” the same as
“openehr-ehr-observation.blood_pressure.v1” and is “at0001” the same
as “AT0001”. The latter we had an issue with when doing some work with
the AQL grammar where we wanted the keywords to be case insensitive but
the node ID to be lower case only.

Why do node IDs have to be lowercase?

Should code phrase strings be case sensitive?

No.

Should object identifier values be case sensitive?

No.

Should archetype/node identifiers be case sensitive?

No.

Should URIs be case sensitive?

No.

Should EHR_URIs be case sensitive?

No.

Another issue we need to think about is Date/Time value formats using
ISO-8601.

Probably my most painful implementation issue. :)

My experience is that intrinsic development framework classes (and
XML) do not support ISO-8601 completely/properly, especially for the
requirements of openEHR such as partial date/times etc. We have had
to implement our own set of classes to ensure ISO-8601 is properly
implemented.

Glad you brought this up. I would like to see openEHR develop a profile
similar to what W3C did to make 8601 implementable.
http://www.w3.org/TR/NOTE-datetime

What is the normalised format for the purposes of creating a signature
digest?

I'm not certain what you mean by this.

Cheers,
Tim

Should code phrase strings be case sensitive? If so, which case do we use
for territory and encoding, etc.?

Should object identifier values be case sensitive?

Should archetype/node identifiers be case sensitive?

Should URIs be case sensitive?

Should EHR_URIs be case sensitive?

Unless there is a legitimate reason for there to be two distinct
items with differing case (i.e. unless AU and au should actually
_mean_ something different), I would vote for case insensitivity
throughout. Which is not to say that I want the case to be
discarded: I would think most systems should be case
preserving, but case insensitive.
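
As a rough sketch of what "case preserving, but case insensitive" can look like in practice, the Python class below stores each code under a case-folded key while remembering the spelling it was first given. The class name `CaseInsensitiveCodes` and its two methods are invented for illustration and do not correspond to any openEHR reference implementation.

```python
class CaseInsensitiveCodes:
    """Case-preserving, case-insensitive lookup table (illustrative only)."""

    def __init__(self):
        # Maps the case-folded code to (code as first stored, value).
        self._items = {}

    def add(self, code: str, value):
        # setdefault keeps the first spelling if the code is added again.
        self._items.setdefault(code.casefold(), (code, value))

    def get(self, code: str):
        """Return (original spelling, value), or None if the code is unknown."""
        return self._items.get(code.casefold())


territories = CaseInsensitiveCodes()
territories.add("AU", "Australia")

print(territories.get("au"))  # ('AU', 'Australia')
```

Lookups succeed for any casing, yet what goes back out on the wire is the spelling that was originally stored.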

Another issue we need to think about is Date/Time value formats using
ISO-8601. My experience is that intrinsic development framework classes
(and XML) do not support ISO-8601 completely/properly, especially for the
requirements of openEHR such as partial date/times etc. We have had to
implement our own set of classes to ensure ISO-8601 is properly implemented.
So questions are:

Is basic or extended format the preferred?

Doesn't matter to me, as long as it is defined with enough detail in
the openEHR spec.

Is it allowable to convert original data format to another format for
internal use?

What is the issue here regarding internal use? Is it a loss of
precision in round-tripping through the internal format or some other issue?

What is the normalised format for the purposes of creating a signature
digest?

Do we even have an agreed-upon canonical format for the rest of the data
with regard to digital signatures?

Andrew

I think it's better to stick to ISO when a definition is available
there, as outside people don't need to be aware of local decisions.

For instance, country codes in ISO 3166 always seem to be in
uppercase, so I believe it is better to fix them to uppercase only.
http://www.iso.org/iso/country_codes/iso_3166_code_lists/english_country_names_and_code_elements.htm

Hi Heath,

Thanks for this thoughtful post on the implementation details.

I think we should stick to case insensitivity whenever possible and decide on either upper case or lower case for internal representation. I also think this type of guideline should be included in the design specifications, e.g. in the class descriptions of CODE_PHRASE and TERMINOLOGY_ACCESS, so the intention is well understood by implementers.

I am not sure I completely understand your second question regarding date/time format. As long as the actual value is the same (meaning not truncated), different ways of representing the same value shouldn’t impede interoperability, right? So to your 2nd date/time question: yes, it should be allowed as long as the value (in milliseconds?) is not altered. For the purpose of a signature digest, I guess if you use the string representation instead of the actual value, we really need to agree on a particular format for this purpose.
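
A small sketch of why the string format matters for a digest: the two strings below denote the same instant in ISO 8601 basic and extended format, yet hash differently until they are rewritten into one agreed form. The choice of SHA-256, of extended format as the normal form, and the helper names `normalise` and `digest` are all assumptions made for the example, and only UTC values with whole seconds are handled.

```python
import hashlib
from datetime import datetime

# Two spellings of the same UTC instant: ISO 8601 basic and extended format.
BASIC = "20080315T103000Z"
EXTENDED = "2008-03-15T10:30:00Z"


def normalise(text: str) -> str:
    """Rewrite either spelling as extended format (UTC, whole seconds only)."""
    for pattern in ("%Y%m%dT%H%M%SZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(text, pattern).isoformat() + "Z"
        except ValueError:
            continue
    raise ValueError(f"unsupported date/time format: {text!r}")


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Same value, different strings: the raw digests do not match.
print(digest(BASIC) == digest(EXTENDED))
# After normalising to one agreed format, they do.
print(digest(normalise(BASIC)) == digest(normalise(EXTENDED)))
```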

Cheers,
Rong