RFC - CR-000150 - express language etc as a String

thomas.beale · 18 August 2005 11:32

As part of the current group of CRs being analysed for openEHR, we are considering CR-000150 (http://coruscant.chime.ucl.ac.uk:8200/openEHR_Collector/projects/specifications/CR/150 in the CR system), which proposes that wherever an attribute in the reference model represents language, encoding or territory, we should use Strings directly rather than CODE_PHRASE as we do now.

Current Situation

rong.chen · 20 August 2005 20:45

Tim Cook wrote:

We would be interested in opinions on this proposal. I personally do not know whether we can regard the ISO and IANA code sets for country, language and character-sets as 'safe for all time'; does anyone have inside knowledge on this? Any other opinions welcome.

I certainly do not have any inside information on the stability of ISO
codes.

I do think it is fairly obvious that hardwiring ANY specific code set is
against the foundational premise that openEHR specifications define
(hope to define) the future proof EHR model. The only thing we can be
certain of is that things defined today will change in the future.

I agree. We need to have this indirection to handle changes.

Even if we are certain that these codes will not change in the future and we feel the need to hardcode them in the software, some enumeration type instead of string type should be used to implement them since string is not type safe.
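To illustrate the type-safety point, here is a minimal sketch in Python. The `LanguageCode` enum, its three members and `parse_language` are hypothetical names chosen for the example, not part of any openEHR specification, and Python only checks this at run time, whereas a statically typed language could reject a bad value at compile time.

```python
from enum import Enum


class LanguageCode(Enum):
    """Hypothetical, tiny subset of ISO 639 two-letter language codes."""
    EN = "en"
    NL = "nl"
    SV = "sv"


def parse_language(raw: str) -> LanguageCode:
    """Convert an incoming string to an enum member, rejecting unknown codes."""
    try:
        return LanguageCode(raw)
    except ValueError:
        raise ValueError(f"unknown language code: {raw!r}") from None


# A plain string silently accepts any value, including a typo.
entry_language_as_string = "enn"

# The enumerated type only ever holds a member of the agreed set.
entry_language = parse_language("en")
```

Calling `parse_language("enn")` raises `ValueError`, so the typo is caught where the value is created rather than somewhere downstream.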

Cheers,

Rong

system · 21 August 2005 07:31

My ideas about this are:

- coding systems will never be stable.
- the way to handle change in OpenEHR (and CEN En13606) is via archetypes.
- select a coding system and produce an ‘ancestor archetype’ that uses codes from that specific coding system.
- over time a new ‘ancestor archetype’ will be produced using a new version of that coding system, or a new coding system altogether.
- the question now is how to handle interoperability. The answer is the use of an ‘archetype ontology’, which we lack at this moment.
- having this ‘archetype ontology’ makes it possible to define synonyms, antonyms, etc., and makes semantic interoperability possible.
- for it to work properly we need:
  - ‘ancestor archetypes’ that can be inserted in templates
  - an archetype ontology
  - an ‘archetype editor’ that can not only produce archetypes and templates, but also assemble ancestor archetypes and normal archetypes into templates and constrain each of them further.

Gerard Freriks

PS: I sometimes use the term ‘proto-archetype’ for the ‘ancestor archetype’.

Isabel_Roman · 21 August 2005 09:51

I agree with the idea of an archetype ontology.
In fact an archetype is an ontology, and the concepts reused in different archetypes can be part of the ancestor-archetype…

I tried something similar, only as an experiment, when I expressed demographic archetypes in OWL, but it was very much centred on demographics.
This was the very, very simple ontology I used to represent some useful archetype concepts I needed for demographics:
http://trajano.us.es/~isabel/EHR/basearquetipo.owl
This could be named the ancestor-archetype for demographics, very simple of course…

And this was the person archetype:
http://trajano.us.es/~isabel/EHR/person.owl
all this using the openEHR ontology
http://trajano.us.es/~isabel/EHR

All of these are in OWL but could be expressed in ADL too.

Regards
Isabel Román
http://trajano.us.es/~isabel

thomas.beale · 21 August 2005 12:38

Gerard Freriks wrote:

My ideas about this are:

- coding systems never will be stable.
- the way to handle change in OpenEHR (and CEN En13606) is via archetypes.

well, in general that's the idea. But the question at hand is about coding in the reference model itself, i.e. for the structural (hard-wired) attributes that have coded values - in other words, things which we have specifically chosen not to archetype. There isn't much mileage in archetyping the code-set of ENTRY.language, for example - we don't want to open such a basic thing up to variation in archetypes. Instead we want it controlled inside the reference model and openEHR vocabularies. The original question of CR-150 was whether we should bypass even this flexibility and simply specify that such attributes are of type String (or maybe an enumerated type) and hard code them into the model. In my view, this is problematic in all sorts of ways - the main one is that each implementor will do this in a different, probably incompatible way.

- select a coding system and produce an 'ancestor archetype' that uses codes from a specific coding system.

This topic of 'ancestor archetypes' is a different issue. I am not yet sure what they are - are they any different from a normal specialisation parent archetype?

- over time a new 'ancestor-archetype' will be produced using a new version of the specific coding system or a new coding system altogether.

well, this already happens for the code-sets for language and country etc, using the openEHR vocabulary approach for them - that is, we have openEHR_language and openEHR_country vocabularies which wrap the ISO code-sets; this allows us to change what they wrap, add extra codes and so on.

- the question now is how to handle interoperability. The answer is the use of an 'archetype ontology'. One we miss at this moment.

in general, that's true, but it wouldn't make any difference for the basic vocabularies of language and territory.

- thomas

williamtfgoossen · 21 August 2005 13:56

In a message dated 21-8-2005 14:43:17 Western Europe (summer time), Thomas.Beale@OceanInformatics.biz writes:

coding systems never will be stable.

the way to handle change in OpenEHR (and CEN En13606) is via archetypes.

well, in general that’s the idea. But the question at hand is about
coding in the reference model itself, i.e. for the structural
(hard-wired) attributes that have coded values - in other words, things
which we have specifically chosen not to archetype. There isn’t much
mileage in archetyping the code-set of ENTRY.language, for example - we
don’t want to open such a basic thing up to variation in archetypes.
Instead we want it controlled inside the reference model and openEHR
vocabularies. The original question of CR-150 was whether we should
bypass even this flexibility and simply specify that such attributes are
of type String (or maybe an enumerated type) and hard code them into the
model. In my view, this is problematic in all sorts of ways - the main
one is that each implementor will do this in a different, probably
incompatible way.

Some coding systems must remain stable, e.g. those in clinical instruments with specific clinimetric or psychometric characteristics; examples include the Barthel index and the Apgar score, which are already archetyped to some extent.
The answer categories and value sets here are themselves standardized. These must be hard coded into the archetype to enforce compatibility.
Here the intelligent semantic interoperability comes into view: the clinical and evidence base itself enforces specific semantics (variable, values, codes) and thus requires that the data be stored and exchanged in only this format. There is no optionality for string here, and also enumeration cannot be done on the fly.
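A minimal sketch of such a fixed value set, using the Apgar score (five items, each scored 0, 1 or 2, total 0 to 10). The item keys and the function name `apgar_total` are assumptions made for the example; this is not how any existing archetype encodes the instrument.

```python
# The five Apgar items; each may only take a value from the fixed set below.
APGAR_ITEMS = ("appearance", "pulse", "grimace", "activity", "respiration")
ALLOWED_SCORES = (0, 1, 2)


def apgar_total(scores):
    """Validate one Apgar observation against the fixed value set and sum it."""
    if set(scores) != set(APGAR_ITEMS):
        raise ValueError("exactly the five Apgar items are required")
    for item, value in scores.items():
        # The type check rejects strings, floats and booleans alike.
        if type(value) is not int or value not in ALLOWED_SCORES:
            raise ValueError(f"{item}: {value!r} is not in the fixed value set")
    return sum(scores.values())


# An observation with the top score on every item.
healthy = dict.fromkeys(APGAR_ITEMS, 2)
total = apgar_total(healthy)
```

A free-text value such as `"good"` for any item is rejected, which is the point: the instrument, not the implementor, fixes the permitted values.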

See examples of not-yet-archetypes on www.zorginformatiemodel.nl. These 90 or so instruments and observation sets will become available in English (we work on a request-by-request basis).
Here we will achieve no mileage in archetyping the code-set but probably lightyears

My 2 Eurocents.

William Goossen

thomas.beale · 21 August 2005 16:10

Williamtfgoossen@cs.com wrote:

In a message dated 21-8-2005 14:43:17 Western Europe (summer time), Thomas.Beale@OceanInformatics.biz writes:

Some coding systems must remain stable, e.g. those in clinical instruments with specific clinimetric or psychometric characteristics; examples include the Barthel index and the Apgar score, which are already archetyped to some extent.
The answer categories and value sets here are themselves standardized. These must be hard coded into the archetype to enforce compatibility.

that is of course possible and intended when you are talking about clinical categories.

Here the intelligent semantic interoperability comes into view: the clinical and evidence base itself enforces specific semantics (variable, values, codes) and thus requires that the data be stored and exchanged in only this format. There is no optionality for string here, and also enumeration cannot be done on the fly.

See examples of not-yet-archetypes on www.zorginformatiemodel.nl. These 90 or so instruments and observation sets will become available in English (we work on a request-by-request basis).

This set of models will be of great use, but we will need to extract the clinical aspects and remove all the HL7 message-related specifics - then we will have the basis of interoperable clinical models the whole community can work with.

Here we will achieve no mileage in archetyping the code-set but probably lightyears

do you have a ruler?-)

- thomas beale

Sam · 21 August 2005 21:52

Dear All

I am not proposing that the language codes are stable for all time, rather that they are from a fixed set in one version of the reference model. I am worried that some people will read CODE_PHRASE as meaning that you can set these to whatever you like (despite the invariants - otherwise why have it?), will do so, and that this will add room for errors.

I am not so concerned about it as I used to be, but simplification has its value - just look at how CCR is gaining momentum in the US - because people can use it!

Sam

Thomas Beale wrote:

As part of the current group of CRs being analysed for openEHR, we are considering CR-000150 (http://coruscant.chime.ucl.ac.uk:8200/openEHR_Collector/projects/specifications/CR/150 in the CR system), which proposes that wherever an attribute in the reference model represents language, encoding or territory, we should use Strings directly rather than CODE_PHRASE as we do now.

Current Situation

These particular attributes, which occur, for example in the class DV_TEXT (see http://svn.openehr.org/specification/TRUNK/publishing/architecture/rm/data_types_im.pdf) use international standard codesets as follows:

- language is represented by a CODE_PHRASE object containing a code-set id for openehr-languages, which is the same as ISO 639 2-character language codes
- encoding is similarly represented, but using codes from IANA character sets, see http://www.iana.org/assignments/character-sets
- territory is similarly represented, using ISO 3166 2-character country codes

All three of these codesets are currently ‘wrapped’ by openEHR code-sets (see Support IM, http://svn.openehr.org/specification/TRUNK/publishing/architecture/rm/support_im.pdf), and it is the openEHR code-sets which are mentioned in the reference model invariants, thus forcing the appropriate attributes always to be a code from the appropriate code set. This level of indirection allows openEHR to use different code sets for this purpose in the future (e.g. the ISO 3-character code sets, or perhaps an ISO replacement for the IANA character set names, or even IANA equivalents for the ISO code sets); the reference model would remain valid regardless.
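A rough sketch of this indirection, in Python. Everything here is an assumption made for illustration: the class and method names (`CodeSetRegistry`, `has_code`), the external identifiers (`ISO_639-1`, `ISO_639-2`) and the three-code subsets are not taken from the Support IM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodePhrase:
    """Minimal stand-in for CODE_PHRASE: a code-set id plus a code string."""
    terminology_id: str
    code_string: str


class CodeSetRegistry:
    """Maps openEHR code-set ids to the external code sets they wrap."""

    def __init__(self, external_sets, wrapped_by):
        self.external_sets = external_sets  # external id -> set of codes
        self.wrapped_by = wrapped_by        # openEHR id -> external id

    def has_code(self, code_set_id, code):
        """Invariant check: the code names this code set and is a member of it."""
        if code.terminology_id != code_set_id:
            return False
        return code.code_string in self.external_sets[self.wrapped_by[code_set_id]]


# Tiny illustrative subsets of the real code sets.
EXTERNAL_SETS = {
    "ISO_639-1": {"en", "nl", "sv"},
    "ISO_639-2": {"eng", "nld", "swe"},
}

registry = CodeSetRegistry(EXTERNAL_SETS, {"openehr-languages": "ISO_639-1"})
two_letter = CodePhrase("openehr-languages", "en")
three_letter = CodePhrase("openehr-languages", "eng")

valid_before = registry.has_code("openehr-languages", two_letter)

# Re-point the wrapper at the 3-character codes; the invariant is unchanged.
registry.wrapped_by["openehr-languages"] = "ISO_639-2"
valid_after = registry.has_code("openehr-languages", three_letter)
```

The invariant only ever names `openehr-languages`, so swapping the wrapped code set is a change to the registry, not to the model or to the code that checks it.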

The logic for choosing to model these codes as CODE_PHRASEs in openEHR was for consistency: every coded entity in openEHR is either a DV_CODED_TEXT (which contains a CODE_PHRASE) or a CODE_PHRASE (used when the codes themselves carry the meaning, as most of the ISO and IANA codesets do). In practical terms it does of course mean slightly more data instances at a fine-grained level; e.g. in XML you would see more tags and data items for each CODE_PHRASE compared to a simple String field.
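To make the size difference concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. The element layout (`language`, `terminology_id`, `value`, `code_string`) is an assumed serialisation for the example, not the official openEHR XML schema, and both helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET


def language_as_code_phrase(code_set_id, code):
    """Serialise a language as a CODE_PHRASE-like structure (assumed layout)."""
    language = ET.Element("language")
    terminology_id = ET.SubElement(language, "terminology_id")
    ET.SubElement(terminology_id, "value").text = code_set_id
    ET.SubElement(language, "code_string").text = code
    return language


def language_as_string(code):
    """Serialise the same language as a single String field."""
    language = ET.Element("language")
    language.text = code
    return language


code_phrase_xml = ET.tostring(
    language_as_code_phrase("openehr-languages", "en"), encoding="unicode"
)
string_xml = ET.tostring(language_as_string("en"), encoding="unicode")

print(code_phrase_xml)
print(string_xml)
```

Under this assumed layout the CODE_PHRASE form needs four elements where the String form needs one; both carry the same code.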

Proposed Situation

Sam Heard has proposed that these three types of codes should be hard-wired into the reference model - as direct string attributes, and that the reference model documentation should simply say that the particular ISO or IANA codes are mandatory in each case.

This is a reasonable position - these codesets seem to be very stable - some would say they are the most stable of any coded entity today. There is undoubtedly software around which does hardwire such codes, and has never had a problem. There is also an argument for simpler object structures - a String is simpler than a CODE_PHRASE. However, semantically, the current and proposed solutions are the same - in the current situation, invariants guarantee that the codes must come from the appropriate codesets for each particular attribute.

Possible objections are:

- the indirection we currently have is useful: there is no guarantee that we won’t have to move to another code-set which better serves the same purpose
- the consistency in the software (all coded entities are always dealt with via the terminology service, no matter what they are) is preferable to having certain fields that the software itself directly knows the codes of

We would be interested in opinions on this proposal. I personally do not know whether we can regard the ISO and IANA code sets for country, language and character-sets as ‘safe for all time’; does anyone have inside knowledge on this? Any other opinions welcome.

thomas beale

system · 21 August 2005 21:55

– –

Gerard Freriks, arts

Huigsloterdijk 378

2158 LR Buitenkaag

The Netherlands

T: +31 252 544896

M: +31 654 792800

Gerard Freriks wrote:

well, in general that’s the idea. But the question at hand is about coding in the reference model itself, i.e. for the structural (hard-wired) attributes that have coded values - in other words, things which we have specifically chosen not to archetype.

There isn’t much mileage in archetyping the code-set of ENTRY.language, for example - we don’t want to open such a basic thing up to variation in archetypes. Instead we want it controlled inside the reference model and openEHR vocabularies. The original question of CR-150 was whether we should bypass even this flexibility and simply specify that such attributes are of type String (or maybe an enumerated type) and hard code them into the model. In my view, this is problematic in all sorts of ways - the main one is that each implementor will do this in a different, probably incompatible way.

I agree.

select a coding system and produce an ‘ancestor archetype’ that uses codes from a specific coding system.

This topic of ‘ancestor archetypes’ is a different issue. I am not yet sure what they are - are they any different from a normal specialisation parent archetype?

Certain archetype fragments will be needed. They are the prototypic ones. The ones I’m calling ‘ancestor archetypes’ are the standardised starting points from which we derive archetypes.
They act as the start of families of archetypes.
The ones dealing with observations, the ones expressing measurements, etc.

over time a new ‘ancestor-archetype’ will be produced using a new version of the specific coding system or a new coding system altogether.

well, this already happens for the code-sets for language and country etc, using the openEHR vocabulary approach for them - that is, we have openEHR_language and openEHR_country vocabularies which wrap the ISO code-sets; this allows us to change what they wrap, add extra codes and so on.

Good. Am I correct to see the wrapper as the ‘ancestor archetype’ and the variants as archetypes each constraining the ‘ancestor’?

the question now is how to handle interoperability. The answer is the use of an ‘archetype ontology’. One we miss at this moment.

in general, that’s true, but it wouldn’t make any difference for the basic vocabularies of language and territory.

Several parts of the archetype ontology will be extremely simple. Other parts will not.

thomas.beale · 22 August 2005 11:07

Gerard Freriks wrote:

Certain archetype fragments will be needed. They are the prototypic ones. The ones I'm calling 'ancestor archetypes' are the standardised starting points from which we derive archetypes.
They act as the start of families of archetypes.
The ones dealing with observations, the ones expressing measurements, etc.

ah, well, you know my view on that! I believe that basic categories such as Observation, Evaluation, Instruction and Act belong in the reference model, for two reasons:
a) it proves possible to devise formal models of such concepts which work for all possible specific types of the same concept. This is proven by building archetypes. For example, no matter what kind of clinical observation we model with an archetype, the openEHR Observation concept still works. In some recent cases described by Grahame Grieve and Sam Heard, there may be a small change needed. This is how these classes can be evolved into solid, invariant definitions which work for all clinical uses.

b) we want to avoid the situation where archetype developers, or even developers of 'proto-archetypes', are arguing about what an Observation, Evaluation etc are, and producing competing ancestor archetypes of differing versions of the concept. This will not help interoperability, and in any case, isn't even an interesting topic for most clinical people. They want to model concepts like "Haemoglobin A1c measurement", not "Observation". There is already a place for those that do want to debate what an Observation is: the reference model - they can always review that, and propose changes.

Sam and I have a paper under development which provides what we think is a solid theoretical and practical basis for basic types in the reference model, and provides a comprehensive typology of Entry subtypes. I think this will make the matter of what 'proto-archetypes' should and should not be used for clearer.

- thomas

system · 22 August 2005 14:17

Thomas,

read below.

Gerard

ah, well, you know my view on that! I believe that basic categories such as Observation, Evaluation, Instruction and Act belong in the reference model, for two reasons:

a) it proves possible to devise formal models of such concepts which work for all possible specific types of the same concept. This is proven by building archetypes. For example, no matter what kind of clinical observation we model with an archetype, the openEHR Observation concept still works. In some recent cases described by Grahame Grieve and Sam Heard, there may be a small change needed. This is how these classes can be evolved into solid, invariant definitions which work for all clinical uses.

The items you mention have to be part of a standard. We agree fully.
The reference model or another place is fine, as long as it is part of a standard.

The problem is where? I reserved in my mind part 3 of EHRcom for this.

b) we want to avoid the situation where archetype developers, or even developers of ‘proto-archetypes’, are arguing about what an Observation, Evaluation etc are, and producing competing ancestor archetypes of differing versions of the concept. This will not help interoperability, and in any case, isn’t even an interesting topic for most clinical people. They want to model concepts like “Haemoglobin A1c measurement”, not “Observation”. There is already a place for those that do want to debate what an Observation is: the reference model - they can always review that, and propose changes.

Sam and I have a paper under development which provides what we think is a solid theoretical and practical basis for basic types in the reference model, and provides a comprehensive typology of Entry subtypes. I think this will make the matter of what ‘proto-archetypes’ should and should not be used for clearer.

Looking forward to an early draft.
It will get my full attention.

GF

thomas.beale · 22 August 2005 15:04

Gerard Freriks wrote:

Thomas,

read below.

Gerard

ah, well, you know my view on that! I believe that basic categories such as Observation, Evaluation, Instruction and Act belong in the reference model, for two reasons:

a) it proves possible to devise formal models of such concepts which work for all possible specific types of the same concept. This is proven by building archetypes. For example, no matter what kind of clinical observation we model with an archetype, the openEHR Observation concept still works. In some recent cases described by Grahame Grieve and Sam Heard, there may be a small change needed. This is how these classes can be evolved into solid, invariant definitions which work for all clinical uses.

The items you mention have to be part of a standard. We agree fully.
The reference model or another place is fine, as long as it is part of a standard.

The problem is where? I reserved in my mind part 3 of EHRcom for this.

Don't forget the new work item - the Archetype Knowledge Framework. In this there is the 'Domain Base Concept Model' - it is an agreed UML model on which to base interoperable archetypes. We believe it will look a lot like openEHR, but my most recent analysis is that it won't be the same; openEHR is slightly deficient in places. The needs of the Danish G-EPJ would also be directly addressed.

- thomas

Hamm_Russell_A · 23 August 2005 17:27

Hi John,

Sorry for the delay in getting back to you.

I just downloaded the most recent version of Protégé (Protégé 3.1), and can open the OWL files Isabel specifies directly in Protégé.

File->New Project

In the “Create New Project” window check the “Create from Existing Sources” checkbox. Click “Next” Then paste the URL Isabel specifies in the text box. It may throw an exception, but the file appears to open.

-russ
