Data Types RM

These comments relate to v1.7.1

Section 4.2 DV_BOOLEAN. are the values {true, false}
case sensitive? Generally, is openehr case sensitive
or not?

Section 5.1.1. Terminology Id's - this section doesn't
seem consistent with the Terminology section of
the external package. Also, terminology versions
aren't included, but this is usually important

Section 5.1.5. includes the following example:
  blah blah Ross River infection blah blah blah bronchial pneumonia
with some annotations about how these concepts are coded.
But I don't understand what the value of this is when the text
could be either of:
  Patient has Ross River infection which caused severe bronchial pneumonia
  Lab Result of Ross River infection is wrong. Patient has bronchial pneumonia

5.1.5.2 This section introduces the "match" attribute, which has one
of the following values:
   -1 code provided is more specific than it should be
   0 code matches intent
   +1 code provided is not as specific as the intent
The use of values like this seems poorly modelled - surely this
is a variation of the datum interpretation concept from the design
principles? Also, the use of a numerical value seems to fall into the
design practices condemned in that section, and makes me think that
some users/implementors will start putting in -10 when they feel strongly
about the coding issue (and who doesn't feel strongly about coding...)

5.1.6 Language Translations. This whole issue is a nightmare, but it bothers
me that because "it's not known what will be needed", we won't allow multiple
languages. Surely, if the source material is in multiple languages - not
too unusual here with a significant migrant population and many doctors from
these ethnic groups - then we know what the requirements are?

5.2.1 TERM_MAPPING.purpose - no terminology? how could this have any practical
meaning with a controlled usage?

5.4.1 TEXT_FORMAT_PROPERTY. It seems strange to me that this makes use of
CSS, rather than introducing semantic based markup. HTML is a mess, but
semantic based markup is a good thing, whereas CSS is the opposite

6.3 - interval. I'm not sure why there is an implicit type Interval<T> and
this extra type - it's hard to work out what the implications of their
differences are

6.4 reference Range type - this seems a little simple to handle the gloriously
complicated reference ranges much beloved by endocrinologists. What is the
intent in these cases?

12.2.5 HL7 types. Ok, I have to comment.
1. I changed BIN to List<BN> from LIST<BL> so that the null issue is clarified.
2. ARRAY<CHAR> is not the same as ARRAY<BYTE>
3. ST is a problem but it is a list<ST> (oh, err, LIST<CHAR>) not a LIST<BL>.
4. I reworked the null stuff a little,
so that's out of date now ;-)

Grahame

Grahame Grieve wrote:

These comments relate to v1.7.1

Section 4.2 DV_BOOLEAN. are the values {true, false}
case sensitive? Generally, is openehr case sensitive
or not?

no they're not. I'm not sure what it means to say "is openEHR case-sensitive or not" - wherever Strings occur, it is, since String processing in computing is always case-sensitive by default. Only in specific places (e.g. the Windows NTFS file system) has it been removed.
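To make the distinction concrete, here is a minimal Python sketch. It assumes (hypothetically, this is not from the spec) that a DV_BOOLEAN is serialised as the lexical strings "true"/"false" and that a parser accepts them in any letter case, while ordinary String comparison stays case-sensitive. The name `parse_dv_boolean` is invented for illustration.

```python
def parse_dv_boolean(lexical: str) -> bool:
    """Hypothetical parser for a serialised DV_BOOLEAN value.

    Accepts "true"/"false" in any letter case, since a Boolean is not a
    String and has no case once parsed.
    """
    normalised = lexical.strip().lower()
    if normalised == "true":
        return True
    if normalised == "false":
        return False
    raise ValueError(f"not a Boolean literal: {lexical!r}")


# Ordinary String comparison is case-sensitive by default...
print("TRUE" == "true")  # False
# ...but both spellings parse to the same Boolean value.
print(parse_dv_boolean("TRUE") == parse_dv_boolean("true"))  # True
```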

Section 5.1.1. Terminology Id's - this section doesn't
seem consistent with the Terminology section of
the external package. Also, terminology versions
aren't included, but this is usually important

As far as we can determine, there is no extant standard for names of terminologies. There is an ISO standard 11179 which we have yet to obtain and study. If this provides a more standard way of expressing names of terminologies, then we will use it. So far, we have used the form "name(version)" for terminologies, e.g. "SNOMED-CT(2003)". In the terminology ids section, we have for example:

Terminology_id_Languages: SET<STRING> = {ISO:639-1(1988), ISO:639-2(1998)...}
-- ISO language names; see http://www.loc.gov/standards/iso639-2/langhome.html
Terminology_id_Countries: SET<STRING> = {ISO:3166-1, ISO:3166-2, ...}
-- ISO country codes & country subdivision codes

So here, the name of the first terminology is "ISO:639-1" and the version "1988". I don't think this is inconsistent, but there may be better ways to express such names.
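A small sketch of how the "name(version)" convention could be split into its parts. The regular expression, the `TerminologyId` tuple and `parse_terminology_id` are assumptions for illustration, not a definition from the spec; the version is treated as optional, as in "ISO:3166-1".

```python
import re
from typing import NamedTuple, Optional


class TerminologyId(NamedTuple):
    name: str
    version: Optional[str]  # None when no "(version)" suffix is present


# Hypothetical pattern for the "name(version)" convention described above.
_TERMINOLOGY_ID = re.compile(r"^(?P<name>[^()]+?)(?:\((?P<version>[^()]*)\))?$")


def parse_terminology_id(text: str) -> TerminologyId:
    match = _TERMINOLOGY_ID.match(text.strip())
    if match is None:
        raise ValueError(f"malformed terminology id: {text!r}")
    return TerminologyId(match.group("name"), match.group("version"))


print(parse_terminology_id("SNOMED-CT(2003)"))
# TerminologyId(name='SNOMED-CT', version='2003')
print(parse_terminology_id("ISO:3166-1"))
# TerminologyId(name='ISO:3166-1', version=None)
```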

Section 5.1.5. includes the following example:
blah blah Ross River infection blah blah blah bronchial pneumonia
with some annotations about how these concepts are coded.
But I don't understand what the value of this is when the text
could be either of:
Patient has Ross River infection which caused severe bronchial pneumonia
Lab Result of Ross River infection is wrong. Patient has bronchial pneumonia

actually, there was no intention that these two terms be linked in the text - the intention was to show that both text (DV_TEXT) and a coded term (DV_CODED_TEXT) could have mappings of other terms added.
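A rough Python sketch of the point: both free text (DV_TEXT) and coded text (DV_CODED_TEXT, a subtype) can carry a list of mappings to other terms. The field names (`terminology_id`, `code_string`, `defining_code`, `mappings`) are assumed for illustration, and the "LOCAL"/"OTHER" terminologies and their codes are invented placeholders, not real codes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CodePhrase:
    terminology_id: str  # e.g. "SNOMED-CT(2003)"
    code_string: str


@dataclass
class TermMapping:
    target: CodePhrase
    match: int  # -1, 0 or +1 as described in 5.1.5.2


@dataclass
class DvText:
    value: str
    mappings: List[TermMapping] = field(default_factory=list)


@dataclass
class DvCodedText(DvText):
    # Optional only because dataclass fields after a defaulted field need a
    # default; a real coded text would always carry its code.
    defining_code: Optional[CodePhrase] = None


# Free text and coded text can both carry mappings to other terms.
# +1: the target code is not as specific as the original intent.
free = DvText("Ross River infection",
              [TermMapping(CodePhrase("LOCAL", "arbovirus-infection"), +1)])
coded = DvCodedText("bronchial pneumonia",
                    defining_code=CodePhrase("LOCAL", "bronchopneumonia"))
coded.mappings.append(TermMapping(CodePhrase("OTHER", "pneumonia"), +1))
```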

5.1.5.2 This section introduces the "match" attribute, which has one
of the following values:
-1 code provided is more specific than it should be
0 code matches intent
+1 code provided is not as specific as the intent
The use of values like this seems poorly modelled - surely this
is a variation of the datum interpretation concept from the design
principles?

not sure why you say this - the only meaning for this attribute is to indicate match. There is no other data in the attribute. But you might argue that it should be an enumerated type rather than an Integer. We went for an integer for the following reasons:
- it is quite common in the C languages for -1/0/+1 to be the results of a comparison, e.g. with the string routines.
- to allow for the possibility in the future that sensible meanings could indeed be attached to higher/lower values.

Also, the use of a numerical value seems to fall into the
design practices condemned in that section, and makes me think that
some users/implementors will start putting in -10 when they feel strongly
about the coding issue (and who doesn't feel strongly about coding...)

They might. But the specification explicitly says that only comparisons of >, =, and < 0 are meaningful.
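If only comparisons with 0 are meaningful, a consumer can read the attribute by its sign alone, so a -10 is simply read as -1. A minimal sketch; `describe_match` is a hypothetical helper, and the wording follows the three meanings listed in 5.1.5.2.

```python
def describe_match(match: int) -> str:
    """Interpret TERM_MAPPING.match by its sign alone.

    Only comparison with 0 is meaningful, so -10 is read exactly like -1.
    """
    if match < 0:
        return "code provided is more specific than the intent"
    if match > 0:
        return "code provided is not as specific as the intent"
    return "code matches the intent"


# An over-enthusiastic -10 collapses to the same reading as -1.
print(describe_match(-10) == describe_match(-1))  # True
```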

5.1.6 Language Translations. This whole issue is a nightmare, but it bothers
me that because "it's not known what will be needed", we won't allow multiple
languages. Surely, if the source material is in multiple languages - not

multiple languages are allowed - no doubt about that - it's why DV_TEXT records language at the finest level.

too unusual here with a significant migrant population and many doctors from
these ethnic groups - then we know what the requirements are?

The requirements are very complex. It is easy to find use cases; the problem is that there are so many use cases that nobody has worked out a general way of dealing with them all yet. Some of the challenges:
- translation and version control - should translations be allowed as new trunk versions, side versions or be forced to be new version trees?
- what if some of the latest most up to date information is only available in another language (due to the patient getting sick while in the Amazon for example)?
- what if translations don't translate the whole text of existing entries?

5.2.1 TERM_MAPPING.purpose - no terminology? how could this have any practical
meaning with a controlled usage?

I think it probably should be coded as well (I presume you meant here "without controlled usage"). But there are no terminologies for this at the moment, although it would not be hard to invent one.
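What a coded purpose would buy is easy to show: free-text values such as "Billing" and "for billing" never compare reliably, whereas codes drawn from a controlled set do. Since no terminology for this exists yet, the terminology id, codes and rubrics below are entirely hypothetical, as is the `MappingPurpose` class.

```python
from dataclasses import dataclass

# Hypothetical micro-terminology for TERM_MAPPING.purpose; no such
# terminology exists yet, so the id and codes are invented for illustration.
PURPOSE_TERMINOLOGY_ID = "local-mapping-purpose(0.1)"
PURPOSE_RUBRICS = {
    "billing": "Billing or reimbursement",
    "public-health": "Public health reporting",
    "research": "Research",
}


@dataclass(frozen=True)
class MappingPurpose:
    code: str
    terminology_id: str = PURPOSE_TERMINOLOGY_ID

    def __post_init__(self) -> None:
        # Controlled usage: reject anything outside the terminology.
        if self.code not in PURPOSE_RUBRICS:
            raise ValueError(f"unknown mapping purpose: {self.code!r}")

    @property
    def rubric(self) -> str:
        return PURPOSE_RUBRICS[self.code]


# Free-text purposes such as "Billing" and "for billing" would not compare
# equal; coded purposes do.
print(MappingPurpose("billing") == MappingPurpose("billing"))  # True
```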

5.4.1 TEXT_FORMAT_PROPERTY. It seems strange to me that this makes use of
CSS, rather than introducing semantic based markup. HTML is a mess, but
semantic based markup is a good thing, whereas CSS is the opposite

the whole point here is not to introduce any semantic markup. We only want simple formatting. If you want to include whole tracts of semantic text, use DV_PARSABLE or DV_ENCAPSULATED...

6.3 - interval. I'm not sure why there is an implicit type Interval<T> and
this extra type - it's hard to work out what the implications of their
differences are

DV_INTERVAL is just a type which inherits from DATA_VALUE; INTERVAL does not. DV_INTERVAL could easily be implemented using an INTERVAL<>.
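One way to read this, sketched in Python: INTERVAL<T> is a plain generic interval that knows nothing about data values, and DV_INTERVAL<T> is a DATA_VALUE that wraps one and delegates to it. The attribute names (`lower`, `upper`, `lower_included`, `upper_included`) and the method `has` are assumptions for illustration, and composition is just one possible implementation choice.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Interval(Generic[T]):
    """INTERVAL<T>: a plain mathematical interval, not a DATA_VALUE.

    None for a bound means unbounded on that side.
    """
    lower: Optional[T] = None
    upper: Optional[T] = None
    lower_included: bool = True
    upper_included: bool = True

    def has(self, value: T) -> bool:
        if self.lower is not None:
            if value < self.lower or (value == self.lower and not self.lower_included):
                return False
        if self.upper is not None:
            if value > self.upper or (value == self.upper and not self.upper_included):
                return False
        return True


class DataValue:
    """Stand-in for the abstract DATA_VALUE root of the data types."""


@dataclass(frozen=True)
class DvInterval(DataValue, Generic[T]):
    """DV_INTERVAL<T>: a DATA_VALUE implemented by wrapping an INTERVAL<T>."""
    interval: Interval[T]

    def has(self, value: T) -> bool:
        return self.interval.has(value)


dv = DvInterval(Interval(1, 5, upper_included=False))
print(dv.has(3), dv.has(5))  # True False
print(isinstance(dv, DataValue), isinstance(dv.interval, DataValue))  # True False
```

The only difference that matters to the rest of the model is the inheritance: a DvInterval can appear wherever a DATA_VALUE is expected, a bare Interval cannot.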

6.4 reference Range type - this seems a little simple to handle the gloriously
complicated reference ranges much beloved by endocrinologists. What is the
intent in these cases?

to record only the reference range for a given datum, for a given patient, for the given test etc - not to record whole reference range tables. In most pathology this appears to suffice. But I'll say again - it's not to include all the reference range tables or data - only to record the particular values for the particular measurement and patient ...

12.2.5 HL7 types. Ok, I have to comment.
1. I changed BIN to List<BN> from LIST<BL> so that the null issue is clarified.

for BINs, EDs and STs I presume. Not for anything else...

2. ARRAY<CHAR> is not the same as ARRAY<BYTE>

because of double width characters presumably?

3. ST is a problem but it is a list<ST> (oh, err, LIST<CHAR>) not a LIST<BL>.

Going by the 4th ballot, I must have missed something - it seems to be that ST inherits from ED then BIN, which is bound to LIST<BN>.

4. I reworked the null stuff a little so that's out of date now ;-)

as soon as I can understand the 4th ballot, I will write an update for this section;-)

- thomas beale

These comments relate to v1.7.1

Section 4.2 DV_BOOLEAN. are the values {true, false}
case sensitive? Generally, is openehr case sensitive
or not?

no they're not. I'm not sure what it means to say "is openEHR case-sensitive or not" - wherever Strings occur, it is, since String processing in computing is always case-sensitive by default. Only in specific places (e.g. the Windows NTFS file system) has it been removed.

hmm. I will come back to this later.

5.1.5.2 This section introduces the "match" attribute, which has one
of the following values:
-1 code provided is more specific than it should be
0 code matches intent
+1 code provided is not as specific as the intent
The use of values like this seems poorly modelled - surely this
is a variation of the datum interpretation concept from the design
principles?

not sure why you say this - the only meaning for this attribute is to indicate match. There is no other data in the attribute. But you might argue that it should be an enumerated type rather than an Integer. We went for an integer for the following reasons:
- it is quite common in the C languages for -1/0/+1 to be the results of a comparison, e.g. with the string routines.
- to allow for the possibility in the future that sensible meanings could indeed be attached to higher/lower values.

yes, and we could use 0 to indicate null too, which is common practice in the C languages.
I do argue that this should be an enumerated type. If it gets extended to higher
and lower values, then being a terminology makes the problems with changing it
explicit

5.2.1 TERM_MAPPING.purpose - no terminology? how could this have any practical
meaning with a controlled usage?

I think it probably should be coded as well (I presume you meant here "without controlled usage"). But there are no terminologies for this at the moment, although it would not be hard to invent one.

oh yes - I did mean without. This is one of those horrible little corners for which
one must invent a terminology

5.4.1 TEXT_FORMAT_PROPERTY. It seems strange to me that this makes use of
CSS, rather than introducing semantic based markup. HTML is a mess, but
semantic based markup is a good thing, whereas CSS is the opposite

the whole point here is not to introduce any semantic markup. We only want simple formatting. If you want to include whole tracts of semantic text, use DV_PARSABLE or DV_ENCAPSULATED...

oh - parsable? interesting. I read that and ignored it since it said so little
about what it was. why is it constrained to plain text?

back to the point. Why don't you want semantic markup?

6.4 reference Range type - this seems a little simple to handle the gloriously
complicated reference ranges much beloved by endocrinologists. What is the
intent in these cases?

to record only the reference range for a given datum, for a given patient, for the given test etc - not to record whole reference range tables. In most pathology this appears to suffice. But I'll say again - it's not to include all the reference range tables or data - only to record the particular values for the particular measurement and patient ...

but the reason endocrinologists have such wonderful reference ranges is that the particular values may not be
known (or even knowable). LH and FSH are classic examples - here's a set of reference ranges for different
stages of the cycle. The test was probably ordered to give some indication of where the cycle is - so instead
of interpreting the test result in terms of the reference range for the patient's situation, we are interpreting
the patient's situation in terms of the result compared to its reference ranges.

But more generally, pathology test providers do not have enough information to assign
a patient to one of a set of reference ranges, where they have them

12.2.5 HL7 types. Ok, I have to comment.
1. I changed BIN to List<BN> from LIST<BL> so that the null issue is clarified.

for BINs, EDs and STs I presume. Not for anything else...

yes, but there's no other LIST<BL> anywhere. and elsewhere, the presence of
null items in the list isn't so outright stupid as in LIST<BL>.

2. ARRAY<CHAR> is not the same as ARRAY<BYTE>

because of double width characters presumably?

yes

3. ST is a problem but it is a list<ST> (oh, err, LIST<CHAR>) not a LIST<BL>.

Going by the 4th ballot, I must have missed something - it seems to be that ST inherits from ED then BIN, which is bound to LIST<BN>.

ah yes, but we tricked you and pulled a fast one by redefining its ancestor. This is a fun trick to pull, so
I'll explain it a bit better:

   LIST<T>
     head() : T
     tail() : LIST<T>
   BIN = LIST<BN>
     head() : BN
     tail() : LIST<BN>
   ED
     head() : BN
     tail() : LIST<BN>
   ST
     head() : ST
     tail() : ST

because BIN is a LIST<BN>, then the head and tail take types BN and list<BN>.
But when we come to string, we pull a fast one, and ST is actually a LIST<ST>.
or ST is actually a ST. or something. So you see what I mean by us saying that
we pulled a fast one. Though I think mostly we tricked ourselves. And it should
be clear, that as editor of the spec, my current project is to fix that little mess!

4. I reworked the null stuff a little so that's out of date now ;-)

as soon as I can understand the 4th ballot, I will write an update for this section;-)

oh - well I won't hold my breath. When you do understand, can you let everyone know?

Grahame

Grahame Grieve wrote:

5.1.5.2 This section introduces the "match" attribute, which has one
of the following values:
-1 code provided is more specific than it should be
0 code matches intent
+1 code provided is not as specific as the intent
The use of values like this seems poorly modelled - surely this
is a variation of the datum interpretation concept from the design
principles?

not sure why you say this - the only meaning for this attribute is to indicate match. There is no other data in the attribute. But you might argue that it should be an enumerated type rather than an Integer. We went for an integer for the following reasons:
- it is quite common in the C languages for -1/0/+1 to be the results of a comparison, e.g. with the string routines.
- to allow for the possibility in the future that sensible meanings could indeed be attached to higher/lower values.

yes, and we could use 0 to indicate null too, which is common practice in the C languages.

yes - that is an example of the evil I was talking about. But null is not allowed (or even possible) for this field.

I do argue that this should be an enumerated type. If it gets extended to higher
and lower values, then being a terminology makes the problems with changing it explicit

We did originally think of this, and there is a micro terminology in CEN or ISO, I forget which, which defines "<", "=", ">" as symbolic terms with defined meanings.
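For comparison, an enumerated version of the match attribute keyed by the symbols "<", "=" and ">" might look like the Python sketch below. Which symbol carries which meaning is an assumption here, not taken from the CEN/ISO micro-terminology, and `MappingMatch.from_integer` is a hypothetical helper for reading legacy integer values by their sign.

```python
from enum import Enum


class MappingMatch(Enum):
    """Enumerated alternative to the integer TERM_MAPPING.match.

    The symbol-to-meaning assignment is assumed for illustration.
    """
    NARROWER = "<"    # code provided is more specific than the intent (was -1)
    EQUIVALENT = "="  # code matches the intent (was 0)
    BROADER = ">"     # code provided is not as specific as the intent (was +1)

    @classmethod
    def from_integer(cls, value: int) -> "MappingMatch":
        """Convert a legacy integer value, honouring only its sign."""
        if value < 0:
            return cls.NARROWER
        if value > 0:
            return cls.BROADER
        return cls.EQUIVALENT


print(MappingMatch.from_integer(-10))  # MappingMatch.NARROWER
print(MappingMatch(">").name)          # BROADER
```

With a closed enumeration, extending the set later is a visible change to the type rather than a silent reinterpretation of integers.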

5.2.1 TERM_MAPPING.purpose - no terminology? how could this have any practical
meaning with a controlled usage?

I think it probably should be coded as well (I presume you meant here "without controlled usage"). But there are no terminologies for this at the moment, although it would not be hard to invent one.

oh yes - I did mean without. This is one of those horrible little corners for which
one must invent a terminology

I tend to agree (and I've noted this in a compendium of CRs that eventually must be sorted out!)

oh - parsable? interesting. I read that and ignored it since it said so little
about what it was. why is it constrained to plain text?

do you mean - why is the 'value' a String? It is assumed to be parsable text - that's all. It could be XML - but not necessarily. It could also be stored as a DV_MULTIMEDIA - i.e. opaquely.

back to the point. Why don't you want semantic markup?

semantic markup is fine, just not in DV_TEXTs - otherwise we will get people putting all kinds of stuff in there in massive tracts of XML, while others will just put in small strings, which is what is intended. This would be a nightmare for decision support. If semantic markup is needed, it can just be a DV_PARSABLE or DV_MULTIMEDIA - no loss of meaning, and much easier to know what to do with.
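A sketch of the split being described: DV_TEXT holds short text, while semantically marked-up content goes into a parsable value whose `value` is still a String, with a separate field saying how to parse it. The `formalism` attribute and the "text/xml" label are assumptions for illustration; the point is that XML carried this way is still held as a String, just not as plain prose.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class DvText:
    """Short free text: simple formatting at most, no semantic markup."""
    value: str


@dataclass(frozen=True)
class DvParsable:
    """Parsable text: the value is still a String; `formalism` names its syntax."""
    value: str
    formalism: str  # assumed attribute; e.g. "text/xml"

    def parse_xml(self) -> ET.Element:
        if self.formalism != "text/xml":
            raise ValueError(f"not XML: formalism is {self.formalism!r}")
        return ET.fromstring(self.value)


note = DvText("Chest clear.")
finding = DvParsable("<finding><site>chest</site><state>clear</state></finding>",
                     "text/xml")
print(finding.parse_xml().findtext("site"))  # chest
```

A decision-support consumer can then branch on the type and formalism instead of guessing whether a DV_TEXT happens to contain markup.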

6.4 reference Range type - this seems a little simple to handle the gloriously
complicated reference ranges much beloved by endocrinologists. What is the
intent in these cases?

to record only the reference range for a given datum, for a given patient, for the given test etc - not to record whole reference range tables. In most pathology this appears to suffice. But I'll say again - it's not to include all the reference range tables or data - only to record the particular values for the particular measurement and patient ...

but the reason endocrinologists have such wonderful reference ranges is that the particular values may not be
known (or even knowable). LH and FSH are classic examples - here's a set of reference ranges for different
stages of the cycle. The test was probably ordered to give some indication of where the cycle is - so instead
of interpreting the test result in terms of the reference range for the patient's situation, we are interpreting
the patient's situation in terms of the result compared to its reference ranges.

But more generally, pathology test providers do not have enough information to assign
a patient to one of a set of reference ranges, where they have them

what we think is reasonable at the moment is:
- to allow multiple reference ranges (e.g. normal, critical etc) to be built into the data type - this will take care of all the mundane reference range situations
- for complex reference data, it just has to be supplied as its own Entry or Structure as part of an Entry, and will probably need archetyping.

I know this is a hairy area, and I doubt we have it 100% correct for now - more experimentation is needed.
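A minimal sketch of the first point, several reference ranges built into a quantity value, each with a meaning. The class and attribute names are hypothetical, and the numbers and units are placeholders, not clinical values.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ReferenceRange:
    meaning: str  # e.g. "normal", "critical high"
    lower: float
    upper: float

    def contains(self, value: float) -> bool:
        return self.lower <= value <= self.upper


@dataclass
class DvQuantity:
    magnitude: float
    units: str
    # Several ranges built into the value itself; attribute name hypothetical.
    reference_ranges: List[ReferenceRange] = field(default_factory=list)

    def matching_ranges(self) -> List[str]:
        """Meanings of every reference range that contains this magnitude."""
        return [r.meaning for r in self.reference_ranges if r.contains(self.magnitude)]


# Placeholder numbers and units, not clinical values.
ranges = [ReferenceRange("normal", 3.0, 10.0),
          ReferenceRange("critical high", 20.0, math.inf)]
print(DvQuantity(5.0, "unit", ranges).matching_ranges())   # ['normal']
print(DvQuantity(25.0, "unit", ranges).matching_ranges())  # ['critical high']
```

Anything richer than this, such as full reference tables, would go in its own Entry or Structure as described in the second point.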

3. ST is a problem but it is a list<ST> (oh, err, LIST<CHAR>) not a LIST<BL>.

Going by the 4th ballot, I must have missed something - it seems to be that ST inherits from ED then BIN, which is bound to LIST<BN>.

ah yes, but we tricked you and pulled a fast one by redefining its ancestor. This is a fun trick to pull, so
I'll explain it a bit better:

  LIST<T>
    head() : T
    tail() : LIST<T>
  BIN = LIST<BN>
    head() : BN
    tail() : LIST<BN>
  ED
    head() : BN
    tail() : LIST<BN>
  ST
    head() : ST
    tail() : ST

because BIN is a LIST<BN>, then the head and tail take types BN and list<BN>.
But when we come to string, we pull a fast one, and ST is actually a LIST<ST>.
or ST is actually a ST. or something. So you see what I mean by us saying that
we pulled a fast one. Though I think mostly we tricked ourselves. And it should
be clear, that as editor of the spec, my current project is to fix that little mess!

but hang on - ED has no formal type parameter - there is no way that ST can redefine T to be ST - it has already been fixed in BIN. The only way out of this is if ED is ED<T>. And you cannot do a direct redefine of head:ST from head:BN, since ST is not conformant to BN. Nor could tail: ST (or should it be tail:List<ST>?) be a valid redefinition of tail: LIST<BN> for similar reasons. Looking at the above, I doubt if I could make it work in any language. I think there is more work to do here... Not to mention that ST really being a LIST<ST> is not likely to convince too many people!
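One possible restructuring along the lines suggested (ED<T>) keeps the type parameter open until the leaves. The catch is that T can then no longer be fixed in BIN above ED, so in this sketch the binary case becomes a sibling of ST, named `BinaryED` here (a hypothetical name). ST conforms to ED<CHAR> instead of trying to redefine head: BN as head: ST. This is an illustration under those assumptions, not the 4th ballot's definitions or the eventual fix; `ListT` stands in for LIST<T> with the head()/tail() operations from the listing above.

```python
from typing import Generic, Sequence, TypeVar

T = TypeVar("T")


class ListT(Generic[T]):
    """Stand-in for LIST<T>, with the head()/tail() operations used above."""

    def __init__(self, items: Sequence[T]) -> None:
        self._items = tuple(items)

    def head(self) -> T:
        return self._items[0]

    def tail(self) -> "ListT[T]":
        # type(self) keeps each subclass closed under tail(): an ST's tail is an ST.
        return type(self)(self._items[1:])


class ED(ListT[T]):
    """ED kept generic (ED<T>), so the element type is fixed only in subclasses."""


class BinaryED(ED[int]):
    """Hypothetical ED<BN>: every element is a bit, 0 or 1."""


class ST(ED[str]):
    """ST as ED<CHAR>: every element is a one-character string."""


s = ST("abc")
print(s.head(), type(s.tail()).__name__, s.tail().head())  # a ST b
```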

- thomas

Grahame

It is like being reviewed by a tornado - even my teeth feel clean!

>>These comments relate to v1.7.1
>>
>>Section 4.2 DV_BOOLEAN. are the values {true, false}
>>case sensitive? Generally, is openehr case sensitive
>>or not?
>
>no they're not. I'm not sure what it means to say "is openEHR
>case-sensitive or not" - wherever Strings occur, it is, since String
>processing in computing is always case-sensitive by default. Only in
>specific places (e.g. the Windows NTFS file system) has it been removed.

hmm. I will come back to this later.

This is quite usual I guess.

>>5.1.5.2 This section introduces the "match" attribute, which has one
>>of the following values:
>>-1 code provided is more specific than it should be
>>0 code matches intent
>>+1 code provided is not as specific as the intent
>>The use of values like this seems poorly modelled - surely this
>>is a variation of the datum interpretation concept from the design
>>principles?
>
>not sure why you say this - the only meaning for this attribute is to
>indicate match. There is no other data in the attribute. But you might
>argue that it should be an enumerated type rather than an Integer. We went
>for an integer for the following reasons:
>- it is quite common in the C languages for -1/0/+1 to be the results of a
>comparison, e.g. with the string routines.
>- to allow for the possibility in the future that sensible meanings could
>indeed be attached to higher/lower values.

yes, and we could use 0 to indicate null too, which is common practice in
the C languages.
I do argue that this should be an enumerated type. If it gets extended to
higher and lower values, then being a terminology makes the problems with
changing it explicit

I tend to agree with your reasoning - really equivalence is 1 from a
mathematical point of view. These values will only be included if they are
already in mapping tables and would only be used for automatic processing -
and could be determined independently of the EHR and perhaps should be - but
it might be useful when you translate - for example - so Ross River
(Australian disease) might come out as an Arbo Virus Infection in Italy -
they will not know Ross River anyway and will see that the statement was
more specific - they will know it is a type of arboviral infection that is
not in their set.

>>5.2.1 TERM_MAPPING.purpose - no terminology? how could this have any
>>practical meaning with a controlled usage?
>
>I think it probably should be coded as well (I presume you meant here
>"without controlled usage"). But there are no terminologies for this at
>the moment, although it would not be hard to invent one.

oh yes - I did mean without. This is one of those horrible little corners
for which one must invent a terminology

This was an area we wanted to cover but not control at the moment. It will
only have textual meanings at present, but it could be helpful in some
jurisdictions - they can code it!

>>5.4.1 TEXT_FORMAT_PROPERTY. It seems strange to me that this makes use of
>>CSS, rather than introducing semantic based markup. HTML is a mess, but
>>semantic based markup is a good thing, whereas CSS is the opposite
>
>the whole point here is not to introduce any semantic markup. We only want
>simple formatting. If you want to include whole tracts of semantic text,
>use DV_PARSABLE or DV_ENCAPSULATED...

oh - parsable? interesting. I read that and ignored it since it said so
little about what it was. why is it constrained to plain text?

back to the point. Why don't you want semantic markup?

>>6.4 reference Range type - this seems a little simple to handle the
>>gloriously complicated reference ranges much beloved by endocrinologists.
>>What is the intent in these cases?
>
>to record only the reference range for a given datum, for a given patient,
>for the given test etc - not to record whole reference range tables. In
>most pathology this appears to suffice. But I'll say again - it's not to
>include all the reference range tables or data - only to record the
>particular values for the particular measurement and patient ...

but the reason endocrinologists have such wonderful reference ranges is
that the particular values may not be known (or even knowable). LH and FSH
are classic examples - here's a set of reference ranges for different
stages of the cycle. The test was probably ordered to give some indication
of where the cycle is - so instead of interpreting the test result in terms
of the reference range for the patient's situation, we are interpreting
the patient's situation in terms of the result compared to its reference
ranges.

That is why we can have more than one - and then one can be marked as the
relevant one.
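The two directions discussed here can be sketched together: reading the patient's situation off a result (which ranges contain it), and flagging one range as relevant, possibly after the fact. The `relevant` flag and all helper names are hypothetical, and the phase names and numbers are placeholders, not real LH/FSH reference data.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class ReferenceRange:
    meaning: str
    lower: float
    upper: float
    relevant: bool = False  # hypothetical flag: the range that applies to this patient


def consistent_ranges(result: float, ranges: List[ReferenceRange]) -> List[str]:
    """Read the patient's situation off the result: which ranges contain it?"""
    return [r.meaning for r in ranges if r.lower <= result <= r.upper]


def mark_relevant(ranges: List[ReferenceRange], meaning: str) -> List[ReferenceRange]:
    """Flag one range as relevant, e.g. after the fact once the situation is known."""
    return [replace(r, relevant=(r.meaning == meaning)) for r in ranges]


def relevant_range(ranges: List[ReferenceRange]) -> Optional[ReferenceRange]:
    return next((r for r in ranges if r.relevant), None)


# Phase names and numbers are placeholders, not real LH/FSH reference data.
phases = [ReferenceRange("phase A", 2.0, 10.0),
          ReferenceRange("phase B", 15.0, 60.0),
          ReferenceRange("phase C", 1.0, 12.0)]
print(consistent_ranges(8.0, phases))  # ['phase A', 'phase C']
print(relevant_range(phases))          # None (nothing flagged yet)
phases = mark_relevant(phases, "phase C")
print(relevant_range(phases).meaning)  # phase C
```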

I will leave the rest to you - hope you get some sleep - Sam

But more generally, pathology test providers do not have enough
information to assign a patient to one of a set of reference ranges, where
they have them

That is true - but it can be after the fact if that is appropriate.

oh - parsable? interesting. I read that and ignored it since it said so little
about what it was. why is it constrained to plain text?

do you mean - why is the 'value' a String? It is assumed to be parsable text - that's all. It could be XML

but if it's XML, then it's not plain text?

>>because BIN is a LIST<BN>, then the head and tail take types BN and list<BN>.

But when we come to string, we pull a fast one, and ST is actually a LIST<ST>.
or ST is actually a ST. or something. So you see what I mean by us saying that
we pulled a fast one. Though I think mostly we tricked ourselves. And it should
be clear, that as editor of the spec, my current project is to fix that little mess!

but hang on - ED has no formal type parameter - there is no way that ST can redefine T to be ST

it's a good thing that head and tail are fictional properties then.
but I will be proposing to fix this. we will see how it goes

Grahame