CEN meeting and data types

Dear All

I have been at the CEN working group meetings representing Standards Australia. It is clear to me that CEN needs to take on the openEHR data types in order to progress quickly. The ISO data types are likely to be appropriate for the HL7 environment and will map to openEHR - but the openEHR data types are ready for archetypes and the cluster element (leaf node) architecture.

You can have a look at the ISO data type proposal likely to come through HL7 soon at:

http://informatics.mayo.edu/wiki/index.php/ISO_Datatypes

user name: wiki

password: wikiwiki

It will be helpful to make your views known on this list.

Cheers, Sam

Sam,

It would be helpful to provide (more) arguments for your opinion.

Gerard

– –
Gerard Freriks, MD
Huigsloterdijk 378
2158 LR Buitenkaag
The Netherlands

T: +31 252544896
M: +31 620347088
E: gfrer@luna.nl

Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety. Benjamin Franklin 11 Nov 1755

hey Sam

I'll bite ;-)

> but the openEHR data types are ready for
> archetypes and the cluster element (leaf node) architecture.

If you want, we can go round and round on semantic issues. Always
a pleasure ;-). But is there anything specific that makes
you think that it would be inappropriate or unwise to use the
iso datatypes in the document with 13606? (so not including
general issues)

Grahame

I guess it depends on what CEN wants to achieve, and also on what the
implementation state and intention of the ISO types are. Possibilities I see:

    * Let's say that the ISO types provide a set of types whose purpose
      is to facilitate data type conversion between HL7 & HL7-like (e.g.
      various flavours of v2, v3 etc), openEHR and others (UN/CEFACT? ASTM?
      etc). Then the kind of implementations will be limited to XML
      conversion.
    * On the other hand, if they were used as "real data types", say in
      CEN, then there is now the job of implementing them in all the
      major technologies and testing them. Plus they need to be checked
      for use with archetypes.
    * If CEN used the openEHR data types, they get something implemented
      in Java, C#, Eiffel, XSD (others?), that is heavily debugged and
      in production use now, and for which the constraint semantics and
      syntax are already known and tested in ADL. This includes
      constraint types for String (C_STRING), Integer (C_INTEGER),
      ... Date (C_DATE) ... plus specialist constrainer types for
      DV_ORDINAL (C_DV_ORDINAL), DV_QUANTITY (C_DV_QUANTITY) and
      CODE_PHRASE (C_CODE_PHRASE). These have all been tested and are
      known to work, and numerous archetypes have used them. Also, the
      openEHR data types are founded on existing standard data types
      (ISO 11404), and assume the standard semantics for all the usual
      built-in things (String, Integer, Boolean, Array<>, List<>, ...)
      plus the ISO 8601 date/time types (Date, Time, etc).
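
To make the idea of "constrainer types" in the last bullet concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the class names `CString`, `CInteger` and `CDvOrdinal`, their fields, and the `valid_value` method are simplified stand-ins invented for this example, not the actual openEHR archetype model definitions, and the `at0001`-style codes are made up.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CString:
    """Simplified stand-in for C_STRING: constrains a String by a regex pattern."""
    pattern: str

    def valid_value(self, value: str) -> bool:
        return re.fullmatch(self.pattern, value) is not None


@dataclass(frozen=True)
class CInteger:
    """Simplified stand-in for C_INTEGER: constrains an Integer to a closed range."""
    lower: Optional[int] = None
    upper: Optional[int] = None

    def valid_value(self, value: int) -> bool:
        if self.lower is not None and value < self.lower:
            return False
        if self.upper is not None and value > self.upper:
            return False
        return True


@dataclass(frozen=True)
class CDvOrdinal:
    """Simplified stand-in for C_DV_ORDINAL: a fixed set of (value, code) pairs."""
    allowed: Tuple[Tuple[int, str], ...]

    def valid_value(self, value: int, code: str) -> bool:
        return (value, code) in self.allowed


# Illustrative constraints, as an archetype might state them (values are made up).
systolic = CInteger(lower=0, upper=1000)
pain_score = CDvOrdinal(allowed=((0, "at0001"), (1, "at0002"), (2, "at0003")))
pressure_unit = CString(pattern=r"mm\[Hg\]|kPa")
```

The point of the sketch is only that each constrainer is a small, separately testable object paired with one data type, which is what makes it practical to validate data against an archetype.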

Now, since CEN is an archetype-enabled standard, it might make sense to
use data types that are known to work in software and known to work for
archetypes.

So one question is: what is the intended use of the new ISO data types
(conversion, or to be the 'real thing')? Secondly, how will CEN EN13606
be validated with a new set of data types?

- thomas beale

Thomas,

I agree with you that if CEN (13606) adopts the openEHR data types, they know it is proven technology.
Right now, with CEN/TC251 EN13606 about to be published at any moment, we badly need proven data types to implement it.

If CEN does choose openEHR, my remaining questions are:
What harm is done?
How can CEN/TC251 EN13606 be aligned, some years from now, with the forthcoming ISO data type standard?
Can it be aligned? Or can't it?

Gerard

I think Grahame will ensure that it is not only 'aligned', but safely inter-convertible... - he is working on it now.

- thomas

I will try, anyway.

Grahame

In a message dated 22-2-2007 11:36:46 W. Europe Standard Time, Thomas.Beale@OceanInformatics.biz writes:

> Now, since CEN is an archetype-enabled standard, it might make sense to
> use data types that are known to work in software and known to work for
> archetypes.
>
> So one question is: what is the intended use of the new ISO data types
> (conversion, or to be the 'real thing')? Secondly, how will CEN EN13606
> be validated with a new set of data types?

Good points / questions,

my 2 cents on this:

I would like to distinguish between the few data types that are basic and work in software, in archetypes, in HL7 v2 and in HL7 v3 (not many, but there will be several), and the ones that are specific to a technical implementation. From the clinician's point of view, most day-to-day data will then be represented, and they will not have to worry about unimportant technical details (unimportant because smart technicians have found conversion methods to deal with them).

imho the ISO standard should define the generic real thing. Integer is real, string is real; OpenEHRstring is one technical artifact derived from the real thing to work in some software. Next, it should help prevent battles and make conversions possible.

This can only be solved if we step back from the technical data specification and use the clinical data specification as the point of reference, and map from there to CEN, openEHR, ISO, HL7 v2 and v3. It is like the standards: no explosions wanted.

Hope this helps,

William

Dear all,

We need to be careful.

Wiki defines:

Built-in abstract data types

Because some ADTs are so common and useful in computer programs, some programming languages build implementations of ADTs into the language as native types or add them into their standard libraries. For instance, Perl arrays can be thought of as an implementation of the List or Deque ADTs and Perl hashes can be thought of in terms of Map or Table ADTs. The C++ Standard Library and Java libraries provide classes that implement the List, Stack, Queue, Map, Priority Queue, and String ADTs.

The rest are other things, other 'types of data types' at best.
They are artefacts that need a proper definition and scope before we use them in an argument.

What CEN, HL7, openEHR and ISO need is agreement about the ADTs they basically need,
plus the next level (layer) of other artefacts they need in the models that are deployed using the set of agreed ADTs.

CEN is using the term 'CEN Data Types' for these.
It is a mixture of CEN-specific definitions and ADTs.
HL7 is using the term 'Abstract Data Types' for their CEN-like collection of artefacts.
Even addresses and telephone numbers are part of their scope.
Here I sense one of several confusions created by the terms used in the HL7 community and its products.

On top of all this there are artefacts (archetypes/templates) that are the leaf nodes in that context, and we must never use the term data type for those.
Data types 'live', are defined, and have a scope in the ICT world of programming languages and databases.
The leaf nodes in archetypes and templates are defined at the healthcare level.
In no way can they be considered 'data types', and they must not be classified as such.

The way archetypes and templates are expressed in ADL does contain real 'data types', since this is the ICT world.

It is for these reasons that the contribution by William confuses me.
Things are getting mixed up, which creates problems.

With regards,

Gerard

Williamtfgoossen@cs.com wrote:

> imho the ISO standard should define the generic real thing. Integer is
> real, string is real, OpenEHRstring is one technical artifact derived
> from real thing to work in some software. Next, it should facilitate
> in preventing battles to make conversions possible.

I am not sure what you mean by this William...there isn't any special
'OpenEHRstring' - in openEHR, all those 'inbuilt types' are assumed -
see the Support IM -
http://svn.openehr.org/specification/BRANCHES/Release-1.1-candidate/publishing/architecture/rm/support_im.pdf
- Same goes for Integer, Real, Boolean etc. This was one of the points
of departure from HL7, which redefined all the basic types (INT, REAL,
ST, BL etc).

> This can only be solved if we step back from the technical data
> specification and use the clinical data specification as point of
> reference, map from there to CEN, Open EHR, ISO, HL7 v2 and v3. It is
> like the standards, no explosions wanted.

that sounds right, but I am not sure if I have understood you in the way
intended. The data types in openEHR have largely been driven by clinical
input (particularly expert pathology input) and archetyping experience.
But they are of course technical artifacts, else we cannot write software...

- thomas

Hi Sam

some responses

> The first issue to note is the ANY class which contains a number of complex
> attributes not shown in the package. nullFlavour is dealt with in CEN and
> openEHR at the ELEMENT level which is far more appropriate in systems - ie test
> the datavalue, if it is Null then test the flavourOfNull on the element. As HL7
> does not have the idea of container, this has to be here. All the other
> attributes are dealt with at different levels (eg Template Id, which may apply to
> an ELEMENT but never to a data value). Adding these to CEN would cause major
> duplication, increase complexity without benefit and degrade performance.

well, the concept of the ISO datatypes (also true for 11404) is that you can
have direct and indirect conformance. If you are claiming indirect conformance
(which will be true for HL7 and openEHR), then you must provide a mapping that
explains how your implementation differs from the ISO datatypes as they stand.

I have drafted an openEHR mapping - but it's just a series of notes at this time.
But the openEHR mapping would explain all these things above and ANY would not
have these properties. The same would apply to 13606.
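
The ELEMENT-level null handling described in the quoted paragraph ("test the datavalue, if it is Null then test the flavourOfNull on the element") can be sketched roughly as follows. This is a hedged illustration only: the `Element` class, the `null_flavour` attribute name, the `describe` helper and the flavour strings are assumptions made for the example, not definitions taken from CEN 13606, openEHR or the ISO draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical leaf-node container holding either a value or a null flavour."""
    name: str
    value: Optional[object] = None
    null_flavour: Optional[str] = None  # e.g. "unknown" (illustrative string)

    def __post_init__(self) -> None:
        # Exactly one of value / null_flavour is expected to be present.
        if (self.value is None) == (self.null_flavour is None):
            raise ValueError("an Element needs either a value or a null flavour")


def describe(element: Element) -> str:
    # Test the data value first; consult the null flavour only when it is absent.
    if element.value is not None:
        return f"{element.name}: {element.value}"
    return f"{element.name}: no value ({element.null_flavour})"
```

With this shape the data value classes themselves carry no null-related attributes, which is the duplication the quoted paragraph is worried about.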

> The next issue is the inheritance hierarchy. In systems one would expect to be
> able to substitute on the basis of specialisation. While we can write invariants
> occasionally for good reason to prohibit this in some situations, generally this
> should apply. What strikes me about the inheritance model here is that it is
> really a way of constraining the large class 'Term' for particular purposes.
> Take for instance CV - it is a term that has translations added (CD), and then
> has qualifiers removed (CE) then has translations removed! While there may be
> use cases when Term has all the attributes it does, this hierarchy cries out for
> remodelling!

> Further, is it usable? How would clinicians know which one to use in an archetype?

yeah, I have some sympathy for this point. CE/CV are just constraints on CD, and
defining them as separate types is rather inelegant. For mapping purposes, I'd
simply drop CE and CV from openEHR & 13606. Since Term and Translation are private
types, that simply leaves CD: coded value. That's not hard for clinicians to pick
between ;-)
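
One way to read "CE/CV are just constraints on CD" is to keep a single coded-value type and express CE and CV as checks on it rather than as subclasses. The sketch below assumes that reading; the field names and the representation of qualifiers as plain strings are simplifications invented for the example, not the ISO or HL7 definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CD:
    """Illustrative coded value; field names are assumptions for this sketch."""
    code: str
    code_system: str
    translations: List["CD"] = field(default_factory=list)
    qualifiers: List[str] = field(default_factory=list)  # simplified to strings


def conforms_to_ce(cd: CD) -> bool:
    # CE as described in the thread: a CD with the qualifiers removed.
    return not cd.qualifiers


def conforms_to_cv(cd: CD) -> bool:
    # CV as described in the thread: additionally has the translations removed.
    return conforms_to_ce(cd) and not cd.translations
```

Written this way there is only one type to pick in an archetype, and the narrower forms become constraints that can be checked where they matter.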

> Translations are the equivalent of the openEHR mapping. By including all the
> text and qualifiers in translations one may find a good deal of difficulty
> understanding the meaning.

well, translations may require qualifiers. And the openEHR spec (rightly)
allows this too.

as for text on qualifiers and translations, this is an interesting point.
To be really precise and pedantic, there are use cases for this. But if
you aren't really concerned with being pedantic and precise, you should
be able to ignore these things. I guess this is something I could usefully
add to the spec - that the meaning of the text associated with qualifiers
and translations can never have any independent contribution to the meaning
of the whole term - I'll have to think about how to phrase that properly.

> The issue of qualifiers is a large one and while the argument for the HL7
> approach is that this provides a syntax for coordination of terms, it is not
> expressed in the model. CR is ambiguous from my point of view with two terms
> that are both optional and the value may have translations and qualifiers.
> Perhaps a computable set of rules will arise for how to control this space but I
> suspect some gnomes will be required. This approach was first published in GEHR
> in the early 90s and has been dropped in openEHR as it was deemed unworkable
> from a semantic point of view. One can argue that the CODE_PHRASE in openEHR is
> as problematic potentially - but as it is provided by a terminology service, it
> is far less likely that enthusiastic implementers will start adding data willy
> nilly.

So the HL7 qualifier thing is (mostly) simply a predefined expression syntax for
post-coordination. It overlaps with terminologies that have their own expression
syntax - such as SNOMED. The HL7 model does allow a richer expression of the
details of the construction of the expression - such as which text led to which
qualifier, but this is, as I said, for precision and pedantry. Not for normal
everyday use. So the question is, is it better to squeeze things into the
text of a CODE_PHRASE, or to squeeze things into xml? Either way, you need to
have a terminology service to do anything useful with the data. So what's the
difference? I don't have a strong feeling about that.

Grahame

I think the main point here is that modelling CODE_PHRASE and other similar
parts of the openEHR model (and this applies to any model at all) using an
internal syntax string (which could itself be XML - who is to say it isn't?)
implies quite strongly that the contents of the relevant attributes
(CODE_PHRASE.code_string) are the business of some outside system, not
openEHR itself. In purely technical terms, using a class modelling approach
for such things may be the same as using the syntax approach - i.e. any
code_string generated by a terminology server can most likely be modelled
using a class model as well, something like HL7's classes. But....
* there is always the possibility that it can't - because the class
model commits to one idea of terminology coordination, while the syntax
approach leaves it open
* the information model shouldn't dictate to the terminology environment
how to represent its artefacts.

The key point about the openEHR approach in this area is that a
CODE_PHRASE code_string is just a 'key' to a database that just happens
to be a terminology service. The construction of the keys is the
latter's business not the business of the client of the service.
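
The "code_string is just a key" idea can be illustrated with a small Python sketch. Everything here other than the names CODE_PHRASE and code_string is an assumption made for the example: the `TerminologyService` class, its `rubric_for` method, the `terminology_id` values and the codes are hypothetical, and a real service would sit behind a network or library interface rather than a dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CodePhrase:
    terminology_id: str
    code_string: str  # opaque to the information model


class TerminologyService:
    """Hypothetical in-memory stand-in for an external terminology service."""

    def __init__(self, entries: Dict[Tuple[str, str], str]) -> None:
        self._entries = entries

    def rubric_for(self, phrase: CodePhrase) -> str:
        # The client never parses code_string; it is used only as a lookup key.
        return self._entries[(phrase.terminology_id, phrase.code_string)]


# Made-up entries: one plain code and one whose key happens to have inner structure.
service = TerminologyService({
    ("example-terminology", "A1"): "example term",
    ("example-terminology", "A1:B2"): "example term, qualified",
})
```

Whether the key has internal structure ("A1:B2") is invisible to the client, which is the sense in which the construction of keys stays the terminology service's business.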

- thomas beale

Hi

> * there is always the possibility that it can't - because the class
> model commits to one idea of terminology coordination, while the syntax
> approach leaves it open

do we know of any case?

> * the information model shouldn't dictate to the terminology environment
> how to represent its artefacts.

I have some sympathy for this. I have been tempted to toast the qualifier
and push everything into code as you guys have done, for the same reasons.
But I haven't found any case where the existing qualifier syntax is a
problem, and there are accepted requirements for originalText on the
qualifiers (at least, HL7 has accepted them). So I didn't toast it, but
I did say in the openEHR mapping that you'd collapse the qualifiers into
the code phrase. I don't have a strong feeling for whether this would be
necessary or appropriate for 13606.

Grahame

Grahame,

There cannot be one model (or model of systems) that does it all perfectly.
We must agree that it is best to leave every domain with its own problems and the models that help solve them.
We must have a standard way to deal with that.

About what is needed for EN13606 (a formal European standard, and a national one in all Member States), I do not know for certain.
On the one hand, everything in the EHRcom extract traveling between systems will be resolved completely, with no dependence on services that may or may not exist somewhere in between.
But this holds for legacy systems as we know them.
What we want, and expect to happen, is that those old legacy systems are replaced by openEHR-conformant (EN13606-conformant) systems.
Then the rules will be different, because we cannot build the systems of the future without several standardised services used by those EHR systems.
Provided that those services are based on (European) Standards, no doubt.

Gerard

Grahame Grieve wrote:

>> * there is always the possibility that it can't - because the class
>> model commits to one idea of terminology coordination, while the syntax
>> approach leaves it open
>
> do we know of any case?

I don't have a concrete one - so fair enough

>> * the information model shouldn't dictate to the terminology environment
>> how to represent its artefacts.
>
> I have some sympathy for this. I have been tempted to toast the qualifier
> and push everything into code as you guys have done, for the same reasons.
> But I haven't found any case where the existing qualifier syntax is a
> problem, and there is accepted requirements for originalText on the
> qualifiers (at least, HL7 has accepted them). SO I didn't toast it, but
> I did say in the openEHR mapping that you'd collapse the qualifiers into
> the code phrase. I don't have a strong feeling for whether this would be
> necessary or appropriate for 13606

I don't have a problem; I just think we need a clear description of the
equivalence so mappings are safe.
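
As a rough illustration of what a "clear description of the equivalence" might look like in practice, the sketch below collapses qualifiers into a single code string and expands them again, so that the round trip can be checked. The `code:name=value,name=value` syntax, the `QualifiedCode` class and both function names are invented for this example; an actual mapping would presumably use whatever expression syntax the terminology itself defines. The sketch also assumes codes and qualifier parts never contain `:`, `,` or `=`.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class QualifiedCode:
    """Hypothetical class-model form: a code plus (name, value) qualifiers."""
    code: str
    qualifiers: List[Tuple[str, str]] = field(default_factory=list)


def collapse(qc: QualifiedCode) -> str:
    """Fold qualifiers into one code string using a made-up syntax."""
    if not qc.qualifiers:
        return qc.code
    parts = ",".join(f"{name}={value}" for name, value in qc.qualifiers)
    return f"{qc.code}:{parts}"


def expand(code_string: str) -> QualifiedCode:
    """Inverse of collapse, so that the mapping can be tested as a round trip."""
    code, _, rest = code_string.partition(":")
    qualifiers = [tuple(p.split("=", 1)) for p in rest.split(",")] if rest else []
    return QualifiedCode(code, qualifiers)
```

A mapping is "safe" in this narrow sense when `expand(collapse(x)) == x` holds for every value the class model can represent; anything the string syntax cannot carry, such as text attached to a qualifier, would show up as a failure of that check.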

- thomas