# Data Types RM

**Category:** [Technical (archive)](https://discourse.openehr.org/c/technical-archive/156)
**Created:** 2003-03-25 10:35 UTC
**Views:** 4
**Replies:** 5
**URL:** https://discourse.openehr.org/t/data-types-rm/15736

---

## Post #1 by @grahamegrieve

These comments relate to v1.7.1

Section 4.2 DV_BOOLEAN. are the values {true, false} case sensitive? Generally, is openehr case sensitive or not?

Section 5.1.1. Terminology Id's - this section doesn't seem consistent with the Terminology section of the external package. Also, terminology versions aren't included, but this is usually important

Section 5.1.5. includes the following example:

    blah blah Ross River infection blah blah blah bronchial pneumonia

with some annotations about how these concepts are coded. But I don't understand what the value of this is when the text could be either of:

    Patient has Ross River infection which caused severe bronchial pneumonia
    Lab Result of Ross River infection is wrong. Patient has bronchial pneumonia

5.1.5.2 This section introduces the "match" attribute, which has one of the following values:

    -1 code provided is more specific than it should be
     0 code matches intent
    +1 code provided is not as specific as the intent

The use of values like this seems poorly modelled - surely this is a variation of the datum interpretation concept from the design principles? Also, the use of a numerical value seems to fall into the design practices condemned in that section, and makes me think that some users/implementors will start putting in -10 when they feel strongly about the coding issue (and who doesn't feel strongly about coding...)

5.1.6 Language Translations. This whole issue is a nightmare, but it bothers me that because "it's not known what will be needed", we won't allow multiple languages.
Surely, if the source material is in multiple languages - not too unusual here with a significant migrant population and many doctors from these ethnic groups - then we know what the requirements are?

5.2.1 TERM_MAPPING.purpose - no terminology? how could this have any practical meaning with a controlled usage?

5.4.1 TEXT_FORMAT_PROPERTY. It seems strange to me that this makes use of CSS, rather than introducing semantic based markup. HTML is a mess, but semantic based markup is a good thing, whereas CSS is the opposite

6.3 - interval. I'm not sure why there is an implicit type Interval<T> and this extra type - it's hard to work out what the implications of their differences are

6.4 reference Range type - this seems a little simple to handle the gloriously complicated reference ranges much beloved by endocrinologists. What is the intent in these cases?

12.2.5 HL7 types. Ok, I have to comment.

1. I changed BIN to List<BN> from LIST<BL> so that the null issue is clarified.
2. ARRAY<CHAR> is not the same as ARRAY<BYTE>
3. ST is a problem but it is a list<ST> (oh, err, LIST<CHAR>) not a LIST<BL>.
4. I reworked the null stuff a little so that's out of date now ;-)

Grahame

---

## Post #2 by @thomas.beale

Grahame Grieve wrote:

> These comments relate to v1.7.1
>
> Section 4.2 DV_BOOLEAN. are the values {true, false}
> case sensitive? Generally, is openehr case sensitive
> or not?

no they're not. I'm not sure what it means to say "is openEHR case-sensitive or not" - wherever Strings occur, it is, since String processing in computing is always case-sensitive by default. Only in specific places (e.g. the Windows NTFS file system) has it been removed.

> Section 5.1.1. Terminology Id's - this section doesn't
> seem consistent with the Terminology section of
> the external package.
> Also, terminology versions
> aren't included, but this is usually important

As far as we can determine, there is no extant standard for names of terminologies. There is an ISO standard 11179 which we have yet to obtain and study. If this provides a more standard way of expressing names of terminologies, then we will use it. So far, we have used the form "name(version)" for terminologies, e.g. "SNOMED-CT(2003)". In the terminology ids section, we have for example:

    Terminology_id_Languages: SET<STRING> = {ISO:639-1(1988), ISO:639-2(1998)...}
        -- ISO language names; see http://www.loc.gov/standards/iso639-2/langhome.html
    Terminology_id_Countries: SET<STRING> = {ISO:3166-1, ISO:3166-2, ...}
        -- ISO country codes & country subdivision codes

So here, the name of the first terminology is "ISO:639-1" and the version "1988". I don't think this is inconsistent, but there may be better ways to express such names.

> Section 5.1.5. includes the following example:
> blah blah Ross River infection blah blah blah bronchial pneumonia
> with some annotations about how these concepts are coded.
> But I don't understand what the value of this is when the text
> could be either of:
> Patient has Ross River infection which caused severe bronchial pneumonia
> Lab Result of Ross River infection is wrong. Patient has bronchial pneumonia

actually, there was no intention that these two terms be linked in the text - the intention was to show that both text (DV_TEXT) and a coded term (DV_CODED_TEXT) could have mappings of other terms added.

> 5.1.5.2 This section introduces the "match" attribute, which has one
> of the following values:
> -1 code provided is more specific than it should be
> 0 code matches intent
> +1 code provided is not as specific as the intent
> The use of values like this seems poorly modelled - surely this
> is a variation of the datum interpretation concept from the design
> principles?

not sure why you say this - the only meaning for this attribute is to indicate match. There is no other data in the attribute. But you might argue that it should be an enumerated type rather than an Integer. We went for an integer for the following reasons:

- it is quite common in the C languages for -1/0/+1 to be the results of a comparison, e.g. with the string routines.
- to allow for the possibility in the future that sensible meanings could indeed be attached to higher/lower values.

> Also, the use of a numerical value seems to fall into the
> design practices condemned in that section, and makes me think that
> some users/implementors will start putting in -10 when they feel strongly
> about the coding issue (and who doesn't feel strongly about coding...)

They might. But the specification explicitly says that only comparisons of >, =, and < 0 are meaningful.

> 5.1.6 Language Translations. This whole issue is a nightmare, but it bothers
> me that because "it's not known what will be needed", we won't allow multiple
> languages. Surely, if the source material is in multiple languages - not

multiple languages are allowed - no doubt about that - it's why DV_TEXT records language at the finest level.

> too unusual here with a significant migrant population and many doctors from
> these ethnic groups - then we know what the requirements are?

The requirements are very complex. It is easy to find use cases, the problem is that there are so many use cases that nobody has worked out a general way of dealing with them all yet. Some of the challenges:

- translation and version control - should translations be allowed as new trunk versions, side versions or be forced to be new version trees?
- what if some of the latest most up to date information is only available in another language (due to the patient getting sick while in the Amazon for example)?
- what if translations don't translate the whole text of existing entries?
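
Returning to the "match" attribute discussed earlier in this reply, the two readings can be put side by side in a small sketch. This is an illustration only, not part of the openEHR specification: the names `TermMatch`, `parse_match` and `sign_of` are hypothetical. `parse_match` treats the attribute as a closed enumeration, so a value like -10 is rejected; `sign_of` follows the integer reading, where only the comparison with 0 carries meaning.

```python
from enum import IntEnum


class TermMatch(IntEnum):
    """Hypothetical enumeration of the three documented 'match' values."""
    MORE_SPECIFIC = -1   # code provided is more specific than it should be
    EQUIVALENT = 0       # code matches intent
    LESS_SPECIFIC = 1    # code provided is not as specific as the intent


def parse_match(raw: int) -> TermMatch:
    """Enumerated reading: accept only -1, 0 or +1 and reject anything else."""
    try:
        return TermMatch(raw)
    except ValueError:
        raise ValueError(f"unsupported match value: {raw!r}") from None


def sign_of(raw: int) -> TermMatch:
    """Integer reading: only the comparison with 0 (<, =, >) is meaningful."""
    return TermMatch((raw > 0) - (raw < 0))
```

Under the enumerated reading, extending the set of values is an explicit change to the type; under the integer reading, a stray -10 is silently collapsed to "more specific".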

> 5.2.1 TERM_MAPPING.purpose - no terminology? how could this have any practical
> meaning with a controlled usage?

I think it probably should be coded as well (I presume you meant here "without controlled usage"). But there are no terminologies for this at the moment, although it would not be hard to invent one.

> 5.4.1 TEXT_FORMAT_PROPERTY. It seems strange to me that this makes use of
> CSS, rather than introducing semantic based markup. HTML is a mess, but
> semantic based markup is a good thing, whereas CSS is the opposite

the whole point is here not to introduce any semantic markup. We only want simple formatting. If you want to include whole tracts of semantic text, use DV_PARSABLE or DV_ENCAPSULATED...

> 6.3 - interval. I'm not sure why there is an implicit type Interval<T> and
> this extra type - it's hard to work out what the implications of their
> differences are

DV_INTERVAL is just a type which inherits from DATA_VALUE, INTERVAL does not. DV_INTERVAL could easily be implemented using an INTERVAL<>.

> 6.4 reference Range type - this seems a little simple to handle the gloriously
> complicated reference ranges much beloved by endocrinologists. What is the
> intent in these cases?

to record only the reference range for a given datum, for a given patient, for the given test etc - not to record whole reference range tables. In most pathology this appears to suffice. But I'll say again - it's not to include all the reference range tables or data - only to record the particular values for the particular measurement and patient ...

> 12.2.5 HL7 types. Ok, I have to comment.
> 1. I changed BIN to List<BN> from LIST<BL> so that the null issue is clarified.

for BINs, EDs and STs I presume. Not for anything else...

> 2. ARRAY<CHAR> is not the same as ARRAY<BYTE>

because of double width characters presumably?

> 3. ST is a problem but it is a list<ST> (oh, err, LIST<CHAR>) not a LIST<BL>.

Going by the 4th ballot, I must have missed something - it seems to be that ST inherits from ED then BIN, which is bound to LIST<BN>.

> 4. I reworked the null stuff a little so that's out of date now ;-)

as soon as I can understand the 4th ballot, I will write an update for this section ;-)

\- thomas beale

---

## Post #3 by @grahamegrieve

>> These comments relate to v1.7.1
>>
>> Section 4.2 DV_BOOLEAN. are the values {true, false}
>> case sensitive? Generally, is openehr case sensitive
>> or not?
>
> no they're not. I'm not sure what it means to say "is openEHR case-sensitive or not" - wherever Strings occur, it is, since String processing in computing is always case-sensitive by default. Only in specific places (e.g. the Windows NTFS file system) has it been removed.

hmm. I will come back to this later.

>> 5.1.5.2 This section introduces the "match" attribute, which has one
>> of the following values:
>> -1 code provided is more specific than it should be
>> 0 code matches intent
>> +1 code provided is not as specific as the intent
>> The use of values like this seems poorly modelled - surely this
>> is a variation of the datum interpretation concept from the design
>> principles?
>
> not sure why you say this - the only meaning for this attribute is to indicate match. There is no other data in the attribute. But you might argue that it should be an enumerated type rather than an Integer. We went for an integer for the following reasons:
> - it is quite common in the C languages for -1/0/+1 to be the results of a comparison, e.g. with the string routines.
> - to allow for the possibility in the future that sensible meanings could indeed be attached to higher/lower values.

yes, and we could use 0 to indicate null too, which is common practice in the C languages. I do argue that this should be an enumerated type.
If it gets extended to higher and lower values, then being a terminology makes the problems with changing it explicit

>> 5.2.1 TERM_MAPPING.purpose - no terminology? how could this have any practical
>> meaning with a controlled usage?
>
> I think it probably should be coded as well (I presume you meant here "without controlled usage"). But there are no terminologies for this at the moment, although it would not be hard to invent one.

oh yes - I did mean without. This is one of those horrible little corners for which one must invent a terminology

>> 5.4.1 TEXT_FORMAT_PROPERTY. It seems strange to me that this makes use of
>> CSS, rather than introducing semantic based markup. HTML is a mess, but
>> semantic based markup is a good thing, whereas CSS is the opposite
>
> the whole point is here not to introduce any semantic markup. We only want simple formatting. If you want to include whole tracts of semantic text, use DV_PARSABLE or DV_ENCAPSULATED...

oh - parsable? interesting. I read that and ignored it since it said so little about what it was. why is it constrained to plain text?

back to the point. Why don't you want semantic markup?

>> 6.4 reference Range type - this seems a little simple to handle the gloriously
>> complicated reference ranges much beloved by endocrinologists. What is the
>> intent in these cases?
>
> to record only the reference range for a given datum, for a given patient, for the given test etc - not to record whole reference range tables. In most pathology this appears to suffice. But I'll say again - it's not to include all the reference range tables or data - only to record the particular values for the particular measurement and patient ...

but the reason endocrinologists have such wonderful reference ranges is that the particular values may not be known (or even knowable). LH and FSH are classic examples - here's a set of reference ranges for different stages of the cycle.
The test was probably ordered to give some indication of where the cycle is - so instead of interpreting the test result in terms of the reference range for the patient's situation, we are interpreting the patient's situation in terms of the result compared to its reference ranges.

But more generally, pathology test providers do not have enough information to assign a patient to one of a set of reference ranges, where they have them

>> 12.2.5 HL7 types. Ok, I have to comment.
>> 1. I changed BIN to List<BN> from LIST<BL> so that the null issue is clarified.
>
> for BINs, EDs and STs I presume. Not for anything else...

yes, but there's no other LIST<BL> anywhere. and elsewhere, the presence of null items in the list isn't so outright stupid as in LIST<BL>.

>> 2. ARRAY<CHAR> is not the same as ARRAY<BYTE>
>
> because of double width characters presumably?

yes

>> 3. ST is a problem but it is a list<ST> (oh, err, LIST<CHAR>) not a LIST<BL>.
>
> Going by the 4th ballot, I must have missed something - it seems to be that ST inherits from ED then BIN, which is bound to LIST<BN>.

ah yes, but we tricked you and pulled a fast one by redefining its ancestor. This is a fun trick to pull, so I'll explain it a bit better:

    LIST<T>
      head() : T
      tail() : LIST<T>
    BIN = LIST<BN>
      head() : BN
      tail() : LIST<BN>
    ED
      head() : BN
      tail() : LIST<BN>
    ST
      head() : ST
      tail() : ST

because BIN is a LIST<BN>, then the head and tail take types BN and list<BN>. But when we come to string, we pull a fast one, and ST is actually a LIST<ST>. or ST is actually a ST. or something. So you see what I mean by saying that we pulled a fast one. Though I think mostly we tricked ourselves. And it should be clear, that as editor of the spec, my current project is to fix that little mess!

>> 4\.
>> I reworked the null stuff a little so that's out of date now ;-)
>
> as soon as I can understand the 4th ballot, I will write an update for this section ;-)

oh - well I won't hold my breath. When you do understand, can you let everyone know

Grahame

---

## Post #4 by @thomas.beale

Grahame Grieve wrote:

>>> 5.1.5.2 This section introduces the "match" attribute, which has one
>>> of the following values:
>>> -1 code provided is more specific than it should be
>>> 0 code matches intent
>>> +1 code provided is not as specific as the intent
>>> The use of values like this seems poorly modelled - surely this
>>> is a variation of the datum interpretation concept from the design
>>> principles?
>>
>> not sure why you say this - the only meaning for this attribute is to indicate match. There is no other data in the attribute. But you might argue that it should be an enumerated type rather than an Integer. We went for an integer for the following reasons:
>> - it is quite common in the C languages for -1/0/+1 to be the results of a comparison, e.g. with the string routines.
>> - to allow for the possibility in the future that sensible meanings could indeed be attached to higher/lower values.
>
> yes, and we could use 0 to indicate null too, which is common practice in the C languages.

yes - that is an example of the evil I was talking about. But null is not allowed (or even possible) for this field.

> I do argue that this should be an enumerated type. If it gets extended to higher
> and lower values, then being a terminology makes the problems with changing it explicit

We did originally think of this, and there is a micro terminology in CEN or ISO, I forget which, which defines "<", "=", ">" as symbolic terms with defined meanings.

>>> 5.2.1 TERM_MAPPING.purpose - no terminology? how could this have any practical
>>> meaning with a controlled usage?
>>
>> I think it probably should be coded as well (I presume you meant here "without controlled usage"). But there are no terminologies for this at the moment, although it would not be hard to invent one.
>
> oh yes - I did mean without. This is one of those horrible little corners for which
> one must invent a terminology

I tend to agree (and I've noted this in a compendium of CRs that eventually must be sorted out!)

> oh - parsable? interesting. I read that and ignored it since it said so little
> about what it was. why is it constrained to plain text?

do you mean - why is the 'value' a String? It is assumed to be parsable text - that's all. It could be XML - but not necessarily. It could also be stored as a DV_MULTIMEDIA - i.e. opaquely.

> back to the point. Why don't you want semantic markup?

semantic markup is fine, just not in DV_TEXTs - otherwise we will get people putting all kinds of stuff in there in massive tracts of XML, and others will just put in small strings, which is the intention. This would be a nightmare for decision support. If semantic markup is needed, it can just be a DV_PARSABLE or DV_MULTIMEDIA - no loss of meaning, and much easier to know what to do with.

>>> 6.4 reference Range type - this seems a little simple to handle the gloriously
>>> complicated reference ranges much beloved by endocrinologists. What is the
>>> intent in these cases?
>>
>> to record only the reference range for a given datum, for a given patient, for the given test etc - not to record whole reference range tables. In most pathology this appears to suffice. But I'll say again - it's not to include all the reference range tables or data - only to record the particular values for the particular measurement and patient ...
>
> but the reason endocrinologists have such wonderful reference ranges is that the particular values may not be
> known (or even knowable).
> LH and FSH are classic examples - here's a set of reference ranges for different
> stages of the cycle. The test was probably ordered to give some indication of where the cycle is - so instead
> of interpreting the test result in terms of the reference range for the patient's situation, we are interpreting
> the patient's situation in terms of the result compared to its reference ranges.
>
> But more generally, pathology test providers do not have enough information to assign
> a patient to one of a set of reference ranges, where they have them

what we think is reasonable at the moment is:

- to allow multiple reference ranges (e.g. normal, critical etc) to be built into the data type - this will take care of all the mundane reference range situations
- for complex reference data, it just has to be supplied as its own Entry or Structure as part of an Entry, and will probably need archetyping.

I know this is a hairy area, and I doubt we have it 100% correct for now - more experimentation is needed.

>>> 3. ST is a problem but it is a list<ST> (oh, err, LIST<CHAR>) not a LIST<BL>.
>>
>> Going by the 4th ballot, I must have missed something - it seems to be that ST inherits from ED then BIN, which is bound to LIST<BN>.
>
> ah yes, but we tricked you and pulled a fast one by redefining its ancestor. This is a fun trick to pull, so
> I'll explain it a bit better:
>
>     LIST<T>
>       head() : T
>       tail() : LIST<T>
>     BIN = LIST<BN>
>       head() : BN
>       tail() : LIST<BN>
>     ED
>       head() : BN
>       tail() : LIST<BN>
>     ST
>       head() : ST
>       tail() : ST
>
> because BIN is a LIST<BN>, then the head and tail take types BN and list<BN>.
> But when we come to string, we pull a fast one, and ST is actually a LIST<ST>.
> or ST is actually a ST. or something. So you see what I mean by saying that
> we pulled a fast one. Though I think mostly we tricked ourselves.
> And it should
> be clear, that as editor of the spec, my current project is to fix that little mess!

but hang on - ED has no formal type parameter - there is no way that ST can redefine T to be ST - it has already been fixed in BIN. The only way out of this is if ED is ED<T>. And you cannot do a direct redefine of head:ST from head:BN, since ST is not conformant to BN. Nor could tail: ST (or should it be tail:List<ST>?) be a valid redefinition of tail: LIST<BN> for similar reasons. Looking at the above, I doubt if I could make it work in any language. I think there is more work to do here... Not to mention that ST really being a LIST<ST> is not likely to convince too many people!

\- thomas

---

## Post #5 by @Sam

Grahame

It is like being reviewed by a tornado - even my teeth feel clean!

> >>These comments relate to v1.7.1
> >>
> >>Section 4.2 DV_BOOLEAN. are the values {true, false}
> >>case sensitive? Generally, is openehr case sensitive
> >>or not?
> >
> >no they're not. I'm not sure what it means to say "is openEHR
> >case-sensitive or not" - wherever Strings occur, it is, since String
> >processing in computing is always case-sensitive by default. Only in
> >specific places (e.g. the Windows NTFS file system) has it been removed.
>
> hmm. I will come back to this later.

This is quite usual I guess.

> >>5.1.5.2 This section introduces the "match" attribute, which has one
> >>of the following values:
> >>-1 code provided is more specific than it should be
> >>0 code matches intent
> >>+1 code provided is not as specific as the intent
> >>The use of values like this seems poorly modelled - surely this
> >>is a variation of the datum interpretation concept from the design
> >>principles?
> >
> >not sure why you say this - the only meaning for this attribute is to
> >indicate match. There is no other data in the attribute. But you might
> >argue that it should be an enumerated type rather than an Integer.
> >We went for an integer for the following reasons:
> >- it is quite common in the C languages for -1/0/+1 to be the results of a
> >comparison, e.g. with the string routines.
> >- to allow for the possibility in the future that sensible meanings could
> >indeed be attached to higher/lower values.
>
> yes, and we could use 0 to indicate null too, which is common practice in
> the C languages.
> I do argue that this should be an enumerated type. If it gets extended to
> higher and lower values, then being a terminology makes the problems
> with changing it explicit

I tend to agree with your reasoning - really equivalence is 1 from a mathematical point of view. These values will only be included if they are already in mapping tables and would only be used for automatic processing - and could be determined independent of the EHR and perhaps should be - but it might be useful when you translate - for example - so Ross River (Australian disease) might come out as an Arbo Virus Infection in Italy - they will not know Ross River anyway and will see that the statement was more specific - they will know it is a type of arboviral infection that is not in their set.

> >>5.2.1 TERM_MAPPING.purpose - no terminology? how could this have any
> >>practical
> >>meaning with a controlled usage?
> >
> >I think it probably should be coded as well (I presume you meant here
> >"without controlled usage"). But there are no terminologies for this at
> >the moment, although it would not be hard to invent one.
>
> oh yes - I did mean without. This is one of those horrible little corners
> for which one must invent a terminology

This was an area we wanted to cover but not control at the moment. It will only have textual meanings at present but it could be helpful in some jurisdictions - they can code it!

> >>5.4.1 TEXT_FORMAT_PROPERTY.
> >>It seems strange to me that this makes use of
> >>CSS, rather than introducing semantic based markup. HTML is a mess, but
> >>semantic based markup is a good thing, whereas CSS is the opposite
> >
> >the whole point is here not to introduce any semantic markup. We only want
> >simple formatting. If you want to include whole tracts of semantic text,
> >use DV_PARSABLE or DV_ENCAPSULATED...
>
> oh - parsable? interesting. I read that and ignored it since it said so little
> about what it was. why is it constrained to plain text?
>
> back to the point. Why don't you want semantic markup?
>
> >>6.4 reference Range type - this seems a little simple to handle the
> >>gloriously
> >>complicated reference ranges much beloved by endocrinologists. What is the
> >>intent in these cases?
> >
> >to record only the reference range for a given datum, for a given patient,
> >for the given test etc - not to record whole reference range tables. In
> >most pathology this appears to suffice. But I'll say again - it's not to
> >include all the reference range tables or data - only to record the
> >particular values for the particular measurement and patient ...
>
> but the reason endocrinologists have such wonderful reference ranges is
> that the particular values may not be
> known (or even knowable). LH and FSH are classic examples -
> here's a set of reference ranges for different
> stages of the cycle. The test was probably ordered to give some indication
> of where the cycle is - so instead
> of interpreting the test result in terms of the reference range for the
> patient's situation, we are interpreting
> the patient's situation in terms of the result compared to its reference
> ranges.

That is why we can have more than one - and then one can be marked as the relevant one.
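
The idea of attaching several reference ranges to one result and marking one as relevant, possibly after the fact, might be sketched as below. This is only an illustration under stated assumptions: `ReferenceRange`, `Quantity`, and their fields are hypothetical names, not the classes of the openEHR reference model, and the numbers in the usage lines are arbitrary placeholders, not clinical values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ReferenceRange:
    """Hypothetical named range, e.g. one per stage of a cycle."""
    meaning: str
    lower: float
    upper: float

    def contains(self, value: float) -> bool:
        return self.lower <= value <= self.upper


@dataclass
class Quantity:
    """Hypothetical measured value carrying zero or more reference ranges."""
    magnitude: float
    units: str
    ranges: List[ReferenceRange] = field(default_factory=list)
    relevant: Optional[str] = None  # meaning of the applicable range, if known

    def relevant_range(self) -> Optional[ReferenceRange]:
        """Return the range marked as relevant, or None if none is marked."""
        for r in self.ranges:
            if r.meaning == self.relevant:
                return r
        return None

    def matching_ranges(self) -> List[str]:
        """Meanings of every attached range that the value falls in."""
        return [r.meaning for r in self.ranges if r.contains(self.magnitude)]


# Placeholder numbers only; they are not real laboratory reference values.
result = Quantity(5.0, "unit", ranges=[
    ReferenceRange("stage A", 1.0, 4.0),
    ReferenceRange("stage B", 3.0, 8.0),
])
candidates = result.matching_ranges()  # which situations fit the result
result.relevant = "stage B"            # marked later, once the situation is known
```

Leaving `relevant` unset covers the case where the laboratory cannot assign the patient to a range; `matching_ranges` then supports reading the patient's situation from the result rather than the other way round.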

I will leave the rest to you - hope you get some sleep - Sam

> But more generally, pathology test providers do not have enough
> information to assign
> a patient to one of a set of reference ranges, where they have them

That is true - but it can be after the fact if that is appropriate.

---

## Post #6 by @grahamegrieve

>> oh - parsable? interesting. I read that and ignored it since it said so little
>> about what it was. why is it constrained to plain text?
>
> do you mean - why is the 'value' a String? It is assumed to be parsable text - that's all. It could be XML

but if it's XML, then it's not plain text?

>> because BIN is a LIST<BN>, then the head and tail take types BN and list<BN>.
>> But when we come to string, we pull a fast one, and ST is actually a LIST<ST>.
>> or ST is actually a ST. or something. So you see what I mean by we saying that
>> we pulled a fast one. Though I think mostly we tricked ourselves. And it should
>> be clear, that as editor of the spec, my current project is to fix that little mess!
>
> but hang on - ED has no formal type parameter - there is no way that ST can redefine T to be ST

it's a good thing that head and tail are fictional properties then. but I will be proposing to fix this. we will see how it goes

Grahame

---

**Canonical:** https://discourse.openehr.org/t/data-types-rm/15736
**Original content:** https://discourse.openehr.org/t/data-types-rm/15736