# Data Types

**Category:** [Technical (archive)](https://discourse.openehr.org/c/technical-archive/156)
**Created:** 2002-06-04 04:57 UTC
**Views:** 5
**Replies:** 43
**URL:** https://discourse.openehr.org/t/data-types/14440

---

## Post #1 by @Tim_Cook5

DV\_PARTIAL\_DATE \- Purpose: Incorrectly assumes that a 'day' is unknown with a known or unknown month\. It is very realistic to see a situation in which a person will recall that something occurred on the 1st of the month 10 years ago but cannot recall if it was June or July\. People relate things in their life and if an event is recurrent on a specific day of the month they will recall that though they may have no reference to which month it was\.

DV\_PARTIAL\_TIME \- Purpose: Incorrectly assumes that an hour will be known\. Same reasoning as above that a person may not be certain if an event occurred at half\-past 10 or half\-past 11\.

R/S, Tim Cook

---

## Post #2 by @thomas.beale

\[to readers of this list: please remember to cc:all when replying\! The list policy is not to automatically set reply\-to: to the sender due to spam and vacation email problems\]

Tim Cook wrote:

> DV\_PARTIAL\_DATE \- Purpose: Incorrectly assumes that a 'day' is unknown with a known or unknown month\. It is very realistic to see a situation in which a person will recall that something occurred on the 1st of the month 10 years ago but cannot recall if it was June or July\. People relate things in their life and if an event is recurrent on a specific day of the month they will recall that though they may have no reference to which month it was\.

yes, we had this debate, and I seem to remember that we thought that a date like 2/?/1993 was artificial in the sense that even if the patient did remember for a fact that it was the 1st or whatever, the date was no better than as if the day was also forgotten \- from a mathematical/processing point of view\.
Personally I'm agnostic on this, and I would lean toward the "faithfulness" requirement of GEHR which would say record it anyway\. What do others think? > DV\_PARTIAL\_TIME \- Purpose: Incorrectly assumes that an hour will be > known\. Same reasoning as above that a person may not be certain if > an event occurred at half\-past 10 or half\-past 11\. > same argument either way for time I guess\. \- thomas beale --- ## Post #3 by @Tim_Benson Surely the criterion for any structured data is whether another application is expected to use that structured data in a way that \(a\) adds value and \(b\) is safe\. If either \(a\) or \(b\) are not true then structure simply adds cost and complexity without benefit\. --- ## Post #4 by @thomas.beale Tim Benson wrote: > Surely the criterion for any structured data is whether another application > is expected to use that structured data in a way that \(a\) adds value and \(b\) > is safe\. If either \(a\) or \(b\) are not true then structure simply adds cost > and complexity without benefit\. > Tim, I agree with the premise; but what is your solution in this case? The structure would only change in a very trivial way i\.e\. by adding a flag which means "day\_unknown"\. Are you asking for a use case which proves that this should exist? I agree \- that's what we need\. Tim has provided the simplest of all \- if the patient said it, we should record it\. Is it enough \- I don't know\.\.\. \- thomas beale --- ## Post #5 by @thomas.beale Tim Benson wrote: > Surely the criterion for any structured data is whether another application > is expected to use that structured data in a way that \(a\) adds value and \(b\) > is safe\. If either \(a\) or \(b\) are not true then structure simply adds cost > and complexity without benefit\. > Tim, I agree with the premise; but what is your solution in this case? The structure would only change in a very trivial way i\.e\. by adding a flag which means "day\_unknown"\. 
Are you asking for a use case which proves that this should exist? I agree \- that's what we need\. Tim has provided the simplest of all \- if the patient said it, we should record it\. Is it enough \- I don't know\.\.\.

\- thomas beale

---

## Post #6 by @Sam

Tim

I think this is true but from a date point of view we can only know the year if the month is unknown \- if it is one or two then the person will have to guess and store it as a fuzzy date\. I think this is the only sensible approach\. We can record in text the time issues that have been mentioned\.

Sam

---

## Post #7 by @Sam

I do not think he is right \- Sam

---

## Post #8 by @Tony_Grivell

I agree with both aspects of Thomas's argument\. In terms of future practical "longitudinal" use of the data, the effective time\-precision will be set much more by the month than the day\. However, it's also true that there \_may\_ be some other \(presently\-unimagined\) use for the more\-precisely defined day \- which argues in favour of it being recorded\.

tony grivell

---

## Post #9 by @thomas.beale

Tony Grivell wrote:

> I agree with both aspects of Thomas's argument\. In terms of future practical "longitudinal" use of the data, the effective time\-precision will be set much more by the month than the day\. However, it's also true that there \_may\_ be some other \(presently\-unimagined\) use for the more\-precisely defined day \- which argues in favour of it being recorded\.

one use case I was trying to imagine was \- what if the remembered date corresponded to some other significant date in the patient's history, and doctors were trying to figure out say medications or other interventions which the patient was unsure about/couldn't remember\. So let's say there is an admission for 15/dec/1990 \(start of an episode where a fracture was treated\), and at some later time, the patient is telling their GP that on the "15th of I forget what month near the end of 1990, I fractured my leg"\.
The GP might review the record and think that the two dates were probably the same, and that it was therefore the same fracture\. I would guess that this is a real contrived long shot, and probably unrealistic, but we need some more evidence from clinicians\.\.\. \- thomas beale --- ## Post #10 by @Tim_Cook5 > record and think that the two dates were probably the > same, and that it > was therefore the same fracture\. I would guess that this > is a real > contrived long shot, and probably unrealistic, but we > need some more > evidence from clinicians\.\.\. \*\*\* CAUTION \*\*\* This turned into a rambling about sociological issues and may have no technical merit\. I am not a clinician, but \.\.\.\.\.\.\.\.\.\.\.\.\.\. I believe this is NOT unrealistic\. It is 'how' people think\. Especially so when confronted with a question they did not expect in a usually stressful environment\. Yes, a family physician's office is a stressful environment for most people\. It is unfamiliar, they are usually ill and the questions seem to come from out of nowhere\. Physician's tend to put together puzzles about their patients and they seldom have all the pieces at one time\. We probably cannot name a system \(at this time\) that allows physicians to quickly and easily put those pieces together\. But, if given a place to store those pieces, "in context", an implementation of this model will be able to perform those type of retrieval functions\. Envision being able to scan a medical record for all partial dates\. Retrieve those dates along with some context of the CONTRIBUTION\. A computer could do very little with that information in most cases\. But a human mind \(physician\) could probably see relationships/patterns very quickly\. The 'idea' of an EHR should be to provide the clinician with appropriate information quickly, so they can do their job better \(improved patient care\)\. Dr\. 
Robert Shepherd so aptly describes the real benefits an EHR \(with or without decision support\) provides to a family physician\. I hope I can paraphrase it in an understandable way\. He says that a family doc will spend 90% of their clinical time on things they are familiar with\. The other 10% is where they benefit the most from the added information that an EHR can provide quickly\. That information may come in the form of linked guidelines, accurate and searchable documentation \(possibly from other patients EHR's\), current patient's history and problem list or sophisticated decision support system\. Dr\. Shepherd supports this with community / sociological reasoning\. So even though a family doc is a generalist, they tend to specialize to some degree because of these social realities\. So, for these reasons I believe that providing ways to capture imprecise dates/times in other than DV\_TEXT is of benefit and allowing for them to contain only 1/3 of the total data is not without merit\. --- ## Post #11 by @pschloeffel I would support Tim's view on the grounds of faithfulness also\. Peter Schloeffel --- ## Post #12 by @pschloeffel This would obviously be very uncommon but certainly a possible clinical scenario which supports the case for inclusion\. Peter Schloeffel --- ## Post #13 by @Tim_Benson Tom, I do not think that structure can be justified if that structure is unlikely to add either value or safety down the line\. So in the situation where we are not able to rely on a time as being either a strict point in time or an interval is likely to create semantic problems\. Unless you can rely on strict chronological listing it is unhelpful to try to give spurious precision\. So my suggestion is that such fuzzy dates should be put into free text only and all dates associated with any entry should only be the ones we can rely on, such as date and time of entry\. 
What is more precise: "the first of the month, but do not remember which month", "the night it rained" or "the morning that the kids were late for school"? To me there is no point in using anything other than free text for any of these\. Julian dates can be very useful, but not all date information fits the simple model and errors are made when we try to force it in\.

We should always have a time stamp for computer entry, which should be flagged if this is the only Julian\-type date information that is available \(and must be used with great caution alongside free text data\)\.

Tim

---

## Post #14 by @thomas.beale

Tim Benson wrote:

> Tom,
> I do not think that structure can be justified if that structure is unlikely to add either value or safety down the line\.

a priori, I agree 100% with this; our unwritten motto is: only make it structured if it is safely computable\.\.\.

> So in the situation where we are not able to rely on a time as being either a strict point in time or an interval is likely to create semantic problems\. Unless you can rely on strict chronological listing it is unhelpful to try to give spurious precision\. So my suggestion is that such fuzzy dates should be put into free text only and all dates associated with any entry should only be the ones we can rely on, such as date and time of entry\.

well, I think it depends on how fuzzy\. If someone just can't remember which date in the first week of July 2001 she first experienced organ rejection symptoms after a kidney transplant, that's not a completely unusable date; and one can easily imagine researchers wanting this kind of date to be in computable form in a study which is trying to characterise the efficacy of immunosuppressant drugs on transplant patients\. It seems to me that this is a good case for creating a PARTIAL\_DATE object with the day unknown \(or maybe an INTERVAL<DATE> \- it doesn't matter\)\.
We also added the following routines to PARTIAL\_DATE:

\- probable\_date: DATE
\- possible\_dates: INTERVAL<DATE>

These provide statistically reasonable approximations to the true date, for the purposes of querying and research\.

So the question in my mind is: when is a date or a time \_too\_ fuzzy to record as a structured object? \(I should point out that we agree completely that very unreliable dates/times should indeed be text; but where is the cut\-off point?\)

> What is more precise: "the first of the month, but do not remember which month", "the night it rained" or "the morning that the kids were late for school"? To me there is no point in using anything other than free text for any of these\. Julian dates can be very useful, but not all date information fits the simple model and errors are made when we try to force it in\.

again, on the face of it I agree\. But it may be in some circumstances that during a 1 minute discussion with the receptionist at the surgery, the patient agrees that "the night it rained" was indeed either Tuesday or Wednesday of last week; then we do in fact have a reasonable partial date\.\.\. apart from these \(probably few\) cases, I agree, we do not want to suggest that all statements about time no matter how fuzzy should be encoded as structured instances\.

> We should always have a time stamp for computer entry, which should be flagged if this is the only Julian\-type date information that is available \(and must be used with great caution along side free text data\)\.

agree\. Ultimately it is up to the clinician to read the entry and act properly upon it of course\.

I think we need to make some additions to the openEHR RM text to do with "when is a date too fuzzy to record in a structured way"\.
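The `probable_date` and `possible_dates` routines mentioned in this post could be sketched roughly as follows. This is a minimal Python illustration, not the openEHR specification: the class name `PartialDate`, its fields, and the use of a `(low, high)` tuple in place of INTERVAL<DATE> are assumptions made for the example. The defaults (15th of the month when the day is unknown, 1 June when both day and month are unknown) are the values mentioned later in this thread.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class PartialDate:
    """Hypothetical stand-in for PARTIAL_DATE: year required, month/day optional."""

    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    def __post_init__(self) -> None:
        # Mirrors the design under discussion: a known day with an
        # unknown month is not representable in this sketch.
        if self.month is None and self.day is not None:
            raise ValueError("day cannot be known when month is unknown")

    def probable_date(self) -> date:
        """A single plausible date for statistical use."""
        if self.month is None:
            return date(self.year, 6, 1)            # day and month unknown
        if self.day is None:
            return date(self.year, self.month, 15)  # day unknown
        return date(self.year, self.month, self.day)

    def possible_dates(self) -> Tuple[date, date]:
        """Outer limits (inclusive) of the dates this value could mean."""
        if self.month is None:
            return date(self.year, 1, 1), date(self.year, 12, 31)
        if self.day is None:
            last_day = calendar.monthrange(self.year, self.month)[1]
            return (date(self.year, self.month, 1),
                    date(self.year, self.month, last_day))
        exact = date(self.year, self.month, self.day)
        return exact, exact


# Example: symptoms first noticed some time in July 2001.
onset = PartialDate(2001, 7)
low, high = onset.possible_dates()
print(onset.probable_date(), low, high)
```

A query can then test whether a fully known date falls between `low` and `high`, while statistical code uses `probable_date()` as a point estimate.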
thanks for your comments

\- thomas

---

## Post #15 by @Paul_Stephen_Woolman

In matching patient records for statistical purposes or for longitudinal tracking it is frequently the case that dates are entered erroneously, sometimes by the data entry person, sometimes by the patient themselves\. For instance the date of an operation can easily be "around March 1993"\. If there is a date field in the system the operator will usually add "1st" so the date becomes 1993\-03\-01 which is clearly incorrect but is the best assumption given the information\. Often the operator cannot proceed through the system unless the date field is entered\. We are humans and most clinical people will do this sort of approximation in entering dates and "safety" depends a lot on clinical judgement\. Most computer systems do not allow for free text entry of dates so I don't think that is a starter\.

What is useful in longitudinal tracking of records is having some knowledge of the accuracy of a date so a field like "date certainty" is used and has values like "certain, sure, approximate, guess"\.

Here is an example of matching two records for longitudinal study:

record 1

- surname: BLOGGS
- date of birth: 1993\-03\-01
- certainty: APPROXIMATE
- first name: JAMES STUART
- Health number: 1234567890

record 2

- surname: BLOGGS
- date of birth: 1993\-03\-12
- certainty: CERTAIN
- first name: JAMEY S\.
- Health number: 123456789

These two records are on the face of it different and could be a different person but in fact are of the same person recorded in different places at different times by different people\. Human error explains the different recordings\. Having the certainty field helps to match the records together by giving less weight to the approximate date in the first record\. A matching algorithm does this as well as taking the initial in the second record as matching the second forename in the first record\.
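The weighting idea described above can be sketched as follows. This is only an illustration under assumed values: the `Record` class, the numeric weights in `CERTAINTY_WEIGHT` and the scoring rule are hypothetical, and a real matching algorithm would also compare forenames, initials and health numbers.

```python
from dataclasses import dataclass, replace
from datetime import date

# Hypothetical weights: how much a date-of-birth comparison is allowed to count.
CERTAINTY_WEIGHT = {"CERTAIN": 1.0, "SURE": 0.8, "APPROXIMATE": 0.3, "GUESS": 0.1}


@dataclass(frozen=True)
class Record:
    surname: str
    date_of_birth: date
    certainty: str
    first_name: str
    health_number: str


def date_of_birth_evidence(a: Record, b: Record) -> float:
    """Signed evidence from the dates of birth, scaled by the less certain record."""
    weight = min(CERTAINTY_WEIGHT[a.certainty], CERTAINTY_WEIGHT[b.certainty])
    return weight if a.date_of_birth == b.date_of_birth else -weight


def match_score(a: Record, b: Record) -> float:
    """Toy score: surname agreement plus certainty-weighted date evidence."""
    surname = 1.0 if a.surname.upper() == b.surname.upper() else -1.0
    return surname + date_of_birth_evidence(a, b)


record_1 = Record("BLOGGS", date(1993, 3, 1), "APPROXIMATE", "JAMES STUART", "1234567890")
record_2 = Record("BLOGGS", date(1993, 3, 12), "CERTAIN", "JAMEY S.", "123456789")

# The same first record, but with its date wrongly treated as certain.
record_1_as_certain = replace(record_1, certainty="CERTAIN")

print(match_score(record_1, record_2))
print(match_score(record_1_as_certain, record_2))
```

With the date flagged as approximate, the disagreement between the two dates of birth counts against the match far less than it would if both dates were treated as certain.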
Paul Woolman
XML program manager
Information and Statistics Division
NHS Scotland

Quoting Thomas Beale <thomas@deepthought\.com\.au>:

---

## Post #16 by @system

Tim,

The receiving system is not the criterion\. The criterion is the requirement to be able to store narrative, and to retrieve/search narrative text, in a safe way\.

Gerard

> Surely the criterion for any structured data is whether another application is expected to use that structured data in a way that \(a\) adds value and \(b\) is safe\. If either \(a\) or \(b\) are not true then structure simply adds cost and complexity without benefit\.

\-\- <private> \-\- Gerard Freriks, arts Huigsloterdijk 378 2158 LR Buitenkaag The Netherlands \+31 252 544896 \+31 654 792800

---

## Post #17 by @thomas.beale

Our current solution to this situation, as I mentioned in another post, was to add the following routines to the PARTIAL\_DATE type:

\- probable\_date: DATE
\- possible\_dates: INTERVAL<DATE>

If the user GUI was then constructed so that a blank date, and a blank month were allowed, the software behind would create a PARTIAL\_DATE object\. These functions would provide sensible values for statistical computation \(i\.e\. 15th of the month if date unknown, 1st/June if date and month unknown\)\. The possible\_dates function provides the outer limits of the possible values of the date, which can be used for query matching\. I am not sure of how much skew this introduces, but it has to be better than having falsely accurate dates, or else no structured date at all\.

Paul, what do you think of this approach?

\- thomas beale

---

## Post #18 by @system

Language is a funny thing\. Sometimes a concept that is used is very precise \(a specific date, a specific object, e\.g\. a particular chair\)\. Sometimes the concept is vague \(some time, some date, any object with the name chair\)\. Sometimes it is possible to define exactly what we mean\.
But what is a perfect definition of a chair, so a person not having seen any in his life understands it? In Informatics many tend to think that everything can be defined precisely\. Reality is, more often than not, it can't \. The problem is how to handle these imprecise concepts faithfully\. By having an attribute \(like HL7\) indicating this? Gerard Ps: Hi Paul\. Is NHS Scotland interested to take part in EN 13606 work with CEN? > Our current solution to this situation, as I mentioned in another post > was to add the following routines to the PARTIAL\_DATE type: > probable\_date: DATE > possible\_dates:INTERVAL<DATE> > > If the user GUI was then constructed so that a blank date, and a blank > month were allowed, the software behind would create a PARTIAL\_DATE > object\. These functions would provide sensible values for statistical > computation \(i\.e\. 15th of the month if date unknown, 1st/June if date > and month unknown\)\. The possible\_dates function provides the outer > limits of the possible values of the date, which can be used for query > matching\. I am not sure of how much skew this introduces, but it has to > be better than having falsely accuracte dates, or else no structured > date at all\. > > Paul, what do you think of this approach? > > \- thomas beale > > \- > If you have any questions about using this list, > please send a message to d\.lloyd@openehr\.org \-\- <private> \-\- Gerard Freriks, arts Huigsloterdijk 378 2158 LR Buitenkaag The Netherlands \+31 252 544896 \+31 654 792800 --- ## Post #19 by @Tim_Cook5 \[many very good points deleted for brevity\] > > Envision being able to scan a medical record for all partial dates\. > > Retrieve those dates along with some context of the CONTRIBUTION\. A > > computer could do very little with that information in most cases\. > > But a human mind \(physician\) could probably see > > relationships/patterns very quickly\. > Perhaps\. Or it could be a mess of obscurantist screen > junk\. 
But if the mess was organised as the text of a story told by one human > being to another at a particular time it might be OK\. Exactly\. The implementation of that vision is indeed very tricky\. My point is that the "model" must accommodate it before any attempt at implementation\. > the other point of course that is very important for usability is the need for the > record to present less and less precision as the years recede: Really? I had not considered this\. Is it really distracting to 'see' a full date? Does the brain not transform that during the process? Is this 'really important' or is it a 'really cool' technology problem to conquer? --- ## Post #20 by @thomas.beale Tim Benson wrote: >Tom, >I do not think that structure can be justified if that structure is unlikely >to add either value or safety down the line\. So in the situation where we >are not able to rely on a time as being either a strict po
--- ## Post #21 by @Tim_Benson Hi Tom, I do not know which HL7 TCs and SIGs should host this debate\. It is not humanly possible to follow everything, but I would have thought that EHR and MnM should also have a point of view\. My thrust all along has been to make sure that we really understand what are the specific requirements of each use case, which is less controversial than asking what is the set of all possible requirements\. Once the requirements are clear, then it is another debate about how best to realise them in a design specification\. I still feel that these two stages are being melded together, when they should be kept as separate as possible\. Tim --- ## Post #22 by @thomas.beale Tim Benson wrote: > Hi Tom, > I do not know which HL7 TCs and SIGs should host this debate\. It is not > humanly possible to follow everything, but I would have thought that EHR and > MnM should also have a point of view\. > > My thrust all along has been to make sure that we really understand what are > the specific requirements of each use case, which is less controversial than > asking what is the set of all possible requirements\. > it is, and it is the right thing to do when building a system\. 
But when building a standard, or a product, or something which is clearly going to have application outside the situations which can possibly be thought of when it is being written, things are somewhat different \- we cannot just stick to software\-engineering as usual\. This is why the GEHR/openEHR work uses a 2\-level framework rather than the typical single\-level one, which is very limited and does not behave well in time\. That said, I am all for finding use cases to justify the inclusion of things, and avoiding unncessary optionality \- within an overall framework which provides for future\-proofing\. \- thomas --- ## Post #23 by @system Hi, In my mind I always placed the Null\-stuff as an attribute within an element/Class\. Not in a datatype\. It is meta\-information\. Gerard > Tim Benson wrote: > >> Tom, >> I do not think that structure can be justified if that structure is > > unlikely >> to add either value or safety down the line\. So in the situation where we >> are not able to rely on a time as being either a strict point in time > > or an >> interval is likely to create semantic problems\. Unless you can rely on >> strict chronological listing it is unhelpful to try to give spurious >> precision\. So my suggestion is that such fuzzy dates should be put into >> free text only and all dates associated with any entry should only be the >> ones we can rely on, such as date and time of entry\. >> > > We are having an interesting debate on just this topic on the HL7 CQ > list \(don't know if you get that\)\. The HL7 data type modelling approach > seems to be to include Null markers all over the place inside the data > types, so that no matter how little you know, you can still create an > instance of a structured data item\. 
My reponse to this has been: > > \- it makes the data type specification quite a lot more complex, since > now the semantics have to always include the possibility of an attribute > or function result of a data item being Null \(just start thinking about > this and it will become more obvious\) > \- it will make the implementation of data types and also software that > uses them more complicated > \- it will create some data instances where parts of the item are > missing, which will IMO be quite unexpected by most software\. E\.g\. > IVL<T>s with missing upper and lower limits \(but the principle is > general and applies to all data types\)\. I think there is the potential > for unsafe data via this approach\. > > In the long term, I think this may cause pollution of EHRs and other > systems with unreliable data items, and cause erroneous results in some > decision support and query\-based applications\. It will also prevent > applications based on a more typical concept of data types from working > properly\. > > I am not saying the HL7 approach is invalid \- it is valid \- but it is > also quite complex, and overkill in most cases \(in some parts of the RIM > it is in fact in error, but that's another argument\)\. > > The openEHR approach is much simpler: > \- data types are "clean" \- Null markers are specified at the next level > up in the model > \- some special partial data types such as PARTIAL\_DATE are specified, > because they occur commonly\. The model of PARTIAL\_DATE explicitly says > what can be missing and what cannot be, and defines all its semantics > accordingly > \- if not enough information is known to create a data item, it should be > recorded as narrative\. This way, decision support and querying will not > be operating on unreliable data\. > > This approach can be summarised as an "all\-or\-nothing" approach \- either > you have the required values to create the data item, or you don't\. 
The > HL7 approach can be described as an "anything\-goes" approach \- you can > create a structured data item no matter how little you know; it will > just have fewer or more Null markers\. > > I am partway though writing up the different design approaches, which I > will post if anyone wants to see it\. > > I wonder what others think\. > > \- thomas beale > > \- > If you have any questions about using this list, > please send a message to d\.lloyd@openehr\.org \-\- <private> \-\- Gerard Freriks, arts Huigsloterdijk 378 2158 LR Buitenkaag The Netherlands \+31 252 544896 \+31 654 792800 --- ## Post #24 by @Tim_Benson Tom Beale wrote: > But when > building a standard, or a product, or something which is clearly going > to have application outside the situations which can possibly be thought > of when it is being written, things are somewhat different \- we cannot > just stick to software\-engineering as usual\. Well here I disagree\. It is always the case that something designed to do a set of well specified jobs well, will always find other valuable uses\. > This is why the GEHR/openEHR work uses a 2\-level framework rather than > the typical single\-level one, which is very limited and does not behave > well in time\. I am all in favour of multilevel frameworks\. You might like to check out the e\-Service Development Framework at: http://www.govtalk.gov.uk/documents/eSDFprimerV1b.pdf This has a three level framework with a high level architecture, reusable elements and standards \(which can be implemented\)\. 
In another representation this is presented as the central three layer sandwich of a 5\-layer framework with generic standards \(such as UML, and XML\) above the architecture and actual instantiations below the standards giving: \- Generic standards \(e\.g XML\) \_ High level information architecture \- Reusable elements \- Standards \- Instantiation Each of these 5 layers can be divided vertically into Requirements, Design and Technology Implementation, giving a matrix\. Tim --- ## Post #25 by @Tony_Grivell > \[many very good points deleted for brevity\] > >> > Envision being able to scan a medical record for all partial > > dates\. >> > Retrieve those dates along with some context of the > > CONTRIBUTION\. A >> > computer could do very little with that information in most > > cases\. >> > But a human mind \(physician\) could probably see >> > relationships/patterns very quickly\. > >> Perhaps\. Or it could be a mess of obscurantist screen >> junk\. But if the mess was organised as the text of a story told by > > one human >> being to another at a particular time it might be OK\. > > Exactly\. > > The implementation of that vision is indeed very tricky\. My point > is that the "model" must accommodate it before any attempt at > implementation\. > >> the other point of course that is very important for usability is > > the need for the >> record to present less and less precision as the years recede: > > Really? I had not considered this\. Is it really distracting to > 'see' a full date? > Does the brain not transform that during the process? Is this > 'really important' or is it a 'really cool' technology problem to > conquer? The \_apparent\_ need \(or acceptability\) for less precision in dates as they age is not necessarily true\. For example, in retrospective review of 'causes and effects' \(eg a drug trial\) time \_intervals\_ will be required to be calculatable at optimum precision\. 
A general principle is, surely, that we can't and should not anticipate the future use/value of any of the data\!

Tony Grivell

---

## Post #26 by @Tim_Cook5

> The \_apparent\_ need \(or acceptability\) for less precision in dates as they age is not necessarily true\. For example, in retrospective review of 'causes and effects' \(eg a drug trial\) time \_intervals\_ will be required to be calculatable at optimum precision\.
> A general principle is, surely, that we can't and should not anticipate the future use/value of any of the data\!

It was my impression that he was speaking of an implementation issue\. Not that the actual data should degrade in precision over time\. So the presentation to the physician is what is in question here\. If that is what users would want/need to see then I have no objection to it\. I was merely asking if that was truly a 'need'\.

Tim

---

## Post #27 by @Sam

Tim

This is definitely a mistake \- many disorders have a date of onset that is fuzzy from a month point of view but is worthwhile \- last Pap smear, last attendance at Ophthalmologist etc\. The point about a fuzzy date is that it is helpful for human interpretation \- a month that a spouse died will be very worthwhile even if a day is not known \- when chasing records at another centre \- knowing that a date is accurate or not will overcome a lot of frustration\.

Sam

---

## Post #28 by @William_E_Hammond

Time to weigh in on fuzzy dates\. We have been using fuzzy dates at Duke and in TMR since the early 70s for just the reason Sam states\. Often patients will know only the year, more frequently the month and year only but no date\. We discover that partial data is much more useful than no data\. So we used fuzzy dates\. The fuzzy dates are displayed with ?? for the unknown parts\. Whenever we sort, a fuzzy day sorts to the 15th of the month, and a fuzzy year sorts to July\. Statisticians are generally unhappy with fuzzy dates and want to throw them out\.
But everyone seems happy when someone records the date of onset for hypertension as July 4, 1976\. Where are the hours, minutes and seconds? I argue that fuzzy dates are acceptable and valid data points and should be used in statistical analysis\.

In a datetime stamp, unknowns are stored as 00\. Thank goodness, we use another symbol for a totally unknown date\.

Ed Hammond

---

## Post #29 by @Tim_Benson

Sam, I think you have misunderstood me\. Human beings love complex patterns, but computers hate them\. Of course you must keep the richness of "the day before the big storm", but you should not try to put that sort of thing into a Julian date field\. Let people do what they are good for and let us use computing for what it is good for\. The fact is computers do not like ambiguity\. The question is always what do we want to use this info for? Is it to structure a record in chronological order or what?

Tim

---

## Post #30 by @thomas.beale

Tim Benson wrote:

> Sam, I think you have misunderstood me\. Human beings love complex patterns, but computers hate them\. Of course you must keep the richness of "the day before the big storm", but you should not try to put that sort of thing into a Julian date field\. Let people do what they are good for and let us use computing for what it is good for\. The fact is computers do not like ambiguity\. The question is always what do we want to use this info for? Is it to structure a record in chronological order or what?
>
> Tim

I think Tim's general considerations are correct \(or at least I agree with them ;\-\) \- the reasons to use structured v non\-structured data items \(or any items for that matter\) are:

- if you have enough raw data to build the structured item
- if the information is to be used in computation

I think these principles are correct\.\. but we do need to understand them\.
The general design of the openEHR data types follows these principles in that you cannot create any item unless you can provide the required data to the creation routine; i\.e\. you can only create valid data items, be they quantities, terms or whatever\.

However, there are times when you don't quite have all the raw data, but a\) you have enough to build a reasonable version of a data instance, and b\) you want to be able to compute on the instance\. Partial dates and times fall into this category, and this is why we have created separate classes for them\. If you have year and month only, you cannot create a valid DATE instance, but you can create a valid PARTIAL\_DATE instance, which will still satisfy the computational requirements of DATEs \(by synthesising reasonable mid\-month dates, etc\)\.

For data which is really quite unreliable, we suggest that it be recorded as narrative text, as Tim mentioned earlier\.

Contrast this with the HL7 data type approach where every type and every attribute and function result can be Null, indicating it is unknown\. The idea of this \(according to Gunther\) is so that no matter how little you know, you can record it in structured form\. We can think of this design approach as a completely fuzzy approach\. As an example, you can have an IVL<TS> \(interval of time\) with unknown low and high values\. I have noted that this makes it nearly useless for computation, since you can't even call the contains\(a\_time:TS\) routine \- well you can, but you will get back "UNK" \(unknown\) as a value\.

I see dangers in this approach:

- the specification is more complex, since the semantics have to include the case where each and every attribute might be Null\.
Complex specifications are more likely to lead to implementation bugs
- software will be more complex because it has to be able to handle UNKs
- unreliable raw data is being used to create structured data instances which might be treated by software as being more reliable than they really are
- if there is software operating on the data that does not understand the possibility that UNK can be returned from function calls, it is not clear that the data is safely processable

I can see the theoretical interest of recording unreliable data in a structured way, even if half of it is missing, but practically I don't think that it is a very useful thing to do, except in exceptional \(and common\) cases like date & time\. Gunther says that people may come back and fill in the missing bits, but in general I think this is quite unlikely \- no\-one has time\. \(Exceptions might be partial data gathered in A&E or similar situations\)\.

Hence we have opted for a simpler approach:

- in general data types are designed in a pure fashion \- no general facility for unknown elements
- special data types for partial data are specified; the advantage of this is that the semantics of these types are clear
- Null markers are recorded, not inside data instances, but where they are used, e\.g\. in the ELEMENT class in the EHR reference model

thoughts?

\- thomas beale

---

## Post #31 by @thomas.beale

William E Hammond wrote:

> Time to weigh in on fuzzy dates\. We have been using fuzzy dates at Duke and in TMR since the early 70s for just the reason Sam states\. Often patients will know only the year, more frequently the month and year only but no date\. We discover that partial data is much more useful than no data\.
>
> So we used fuzzy dates\. The fuzzy dates are displayed with ?? for the unknown parts\. Whenever we sort, a fuzzy day sorts to the 15th of the month, and a fuzzy year sorts to July\.

Ed, presumably you meant "a fuzzy month"\.
This is the design we have used, so that's encouraging \(when can we install it at Duke?\-\)\. > Statisticians are generally unhappy > with fuzzy dates and want to throw them out\. > I am not convinced that the statistical arguments are so great \- I can see that there would be a skew towards things that happen more often on the 15th of the month, due to the day\-less dates in the system, but I can't think of any clinical research that would be looking at that\. Are there any studies on the dangers of fuzzy dates in statistical analysis? > But every one seems happy > when someone records the date of onset for hypertension as July 4, 1976\. > Where is the hour, minutes and seconds\. I argue that fuzzy dates are > acceptable and valid data points and should be used in statistical > analysis\. > > In a datetime stamp, unknowns are stored as 00\. Thank goodness, we use > another saymbol for a totally unknown date\. > > Ed Hammond > \- thomas beale --- ## Post #32 by @William_E_Hammond Thomas, You are correct \- I meant fuzzy month, not year\. I wish Duke were in a position to let you install\. Ed --- ## Post #33 by @Douglas_Carnall > I can see the theoretical interest of recording unreliable data in a > structured way, even if half of it is missing, but practically I don't > think that it is a very useful thing to do, except in exceptional \(and > common\) cases like date & time\. Gunther says that people may come back > and fill in the missing bits, but in general I think this is quite > unlikely \- no\-one has time\. \(Exceptions might be partial data gathered > in A&E or similar situations\)\. All clinical data is unreliable data\. Or at least, all clinical data has varying degrees of reliability\. On paper records, the structure tends to allow the person reading the record to make a shrewd assessment of its reliability\. We pay more attention to the consultant's clinic letter than the house officer's handwritten note at 4am\. 
We note the gaps as much as the actual data\. We cast an eye down the problem list summary and instantly have a feel for how well it has been maintained, without necessarily obsessionally correlating that list with the narrative in the notes\. If I see a patient in A&E who has been punched on the nose, look up his nose, draw a picture of his nose, and write "no septal haematoma" in the notes it is pretty clear that when the patient comes back with a septal haematoma that it developed subsequent to my examination\. \(Yes, it turns out the patient was punched a second time\.\.\.\)\. If I make a statement only about the external bruising, there will be doubt\. If I see a patient who subsequently turns out to have thyrotoxicosis, but do not record the presence or absence of certain key clinical findings \(e\.g\. pulse, weight, tremor\), and do not order thyroid function blood tests, then there must be doubt if I even considered the diagnosis\. Abstracting clinical data out of context is problematic\. One of the skills of an expert is to read a record, and quickly form an accurate impression of the patient's problems\. Part of this is knowing what to ignore, as well as recognising positive cues\. Now paper records have many problems, but unfortunately most computerised records seem to elevate the desire of computer scientists, ontologists, expert system designers, statisticians, politicians, and other busybodies to atomise data elements from the records of clinical consultations for their own purposes\. Fine\. There are lots of good reasons for doing this\. But it must not be the detriment of the primary utility of the record for the clinician\. I know it's a windup to make this statement to this list, but we now have enough cheap gadgets and computing power at the desktop to model a paper record graphically\. Maybe this would be a good starting point for a clinical record that truly gave first priority to the clinicians using it\. 
Would the open\-ehr archetypes provide the building blocks for a designer who wanted to take this approach?

D\.

---

## Post #34 by @Clem_McDonald

What Ed has described is the proper way to bet on the number that is not there when you have to for statistical purposes\.

You have to assume some complete date for the purpose of the statistics\. Taking the mean date for the month or the mean month for the year is the best you can do\. \(We have used the first of the month and the first of the year and that DOES produce a bias\.\)

As it turns out, the Social Security Administration and the Tumor Registry systems do the same thing \(that Ed recommends\) when the specifics are not known\.

Thomas Beale wrote:

---

## Post #35 by @thomas.beale

Clem McDonald wrote:

> What Ed has described is the proper way to bet on the number that is not there when you have to for statistical purposes\.
>
> You have to assume some complete date for the purpose of the statistics\. Taking the mean date for the month or the mean month for the year is the best you can do\. \(We have used the first of the month and the first of the year and that DOES produce a bias\.\)

Right, so we have followed the first approach in the model:

- day missing from date => synthesize 15th / MM / yyyy
- day and month missing from date => synthesize 30 / JUN / yyyy

We also added the function possible\_dates:INTERVAL<DATE>

- For a missing day => synthesize \{1 / MM / yyyy \- days\_in\_month\(MM\) / MM / yyyy\}
- For a missing month => synthesize \{1 / 1 / yyyy \- 31 / DEC / yyyy\}

These functions can be used to catch the fuzzy dates when querying\. I guess there will be a skew when multiple successive queries are run on the same data because fuzzy dates will match more queries, so there must be a better way to do this\. Methods I can think of include:

- assigning a random date to each fuzzy date\.
This is hard, because as more dates are added to the system, you have to keep monitoring them to be sure that you are still setting the fuzzy ones to a truly random value
- using the interval method above, but when doing a series of queries, remembering when you already have matched an item, to prevent double inclusion\.

Do Regenstrief or Duke have any method for making fuzzy dates match queries?

\- thomas beale

---

## Post #36 by @thomas.beale

Douglas Carnall wrote:

> \.\.\.
>
> If I see a patient who subsequently turns out to have thyrotoxicosis, but do not record the presence or absence of certain key clinical findings \(e\.g\. pulse, weight, tremor\), and do not order thyroid function blood tests, then there must be doubt if I even considered the diagnosis\.
>
> Abstracting clinical data out of context is problematic\.

this is certainly our point of view\. We would say:

- record what is stated, measured, checked etc
- do it in such a way that it works a\) for patient care b\) for decision support c\) for other uses, in that order\.
- assume that physicians and other health workers are thinking people, and will in general use data to make inferences leading to care decisions; don't try to record data in a way that makes presumptions about this, or prejudices the thinking process of the clinician

That said, there is still the technical challenge, at the reductionist end of the data\-recording spectrum, of when to try and record data items in structured form \(so they are computable\) and when not to\.
Structured data is much better for: \- computation, especially decision support \- interoperability, since every communicating party can agree on the one standard for what a "Quantity" etc looks like However, many people, including myself, have strong reservations about recording very unreliable data \(either partially specified or from a known/suspected unreliable source\) in structured form, particularly if values are synthesized to make it fit the requirements of creation of the structured data object in question\. > I know it's a windup to make this statement to this list, but we now have > enough cheap gadgets and computing power at the desktop to model a paper > record graphically\. Maybe this would be a good starting point for a clinical > record that truly gave first priority to the clinicians using it\. > > Would the open\-ehr archetypes provide the building blocks for a designer who > wanted to take this approach? > first thing to say is that CEN/GEHR/openEHR approaches do not predispose \(we hope\) the visual appearance of EHRs in applications to any particular model; there is no reason why the clinician's view of the record on the screen should not look like the paper record they are used to\. Once you start looking at forms for recording information in the paper record, it is clear that these forms often represent a\) a long\-term refinement of important data items for the purpose, and b\) a long\-term refinement of the arrangement of the questions and way of recording answers\. So in many cases, forms will be a starting point for archetypes\. But I should stress that archetypes \(as we have defined them in GEHR/openEHR\) are constraint models of data, not models for forms as such\. Now consider a form like the diabetic interview form in our current project\. 
The first\-time interview form has boxes for information that clinicians recognise as being in various well\-known categories, such as lifestyle \(the smoking, diet and exercise questions\), family history \(diabetes in the family\), current medications, and so on\. We envisage archetypes primarily for structuring data in the record, so there will be archetypes for each of these well\-known categories of information\. This means that if a different clinician uses an unrelated form for the patient, which also asks for \(probably different\) data to do with lifestyle, family history and so on, what we want are archetypes for lifestyle, fam hist etc, which cover the data being asked for in each place\. Over time, the design of such archetypes crystallises, and specialisations may be created for certain kinds of patients\.

Where does this leave forms? One of the reasons I / DSTC have proposed a more formal concept of "contributions" is so that data gathered on a form, which might well be committed to different parts of the EHR according to various thematic \(data\-oriented rather than screen\-oriented\) archetypes, can be re\-assembled easily into the original form\.

Secondly, there are various people thinking about "visual archetypes" and stylesheets for archetypes, and I have seen a system in Europe which I think could be integrated with the GEHR archetypes to build screen forms whose elements and element groups are based on archetypes, but where the overall design of the screen form resembles something the clinicians are used to seeing\.

It is early days yet\.\.\.\.

\- thomas beale

---

## Post #37 by @Douglas_Carnall

Hmm\. If I saw a patient with an absent pulse and failed to note that finding, either mentally or in a clinical record, I think I might rightly be accused of being a poor observer of the human condition\.
Come to think of it, a weightless patient would be interesting too :\-\)

And everyone has a physiological tremor, though it is often exaggerated in moderate to severe untreated thyrotoxicosis\.

So in fact, now I come to think about it again, I didn't mean presence or absence of pulse, weight or tremor, but presence or absence of\-\-all right, use the word, fuzzy, values for pulse, weight or tremor\.

But you knew what I meant didn't you?

;\-\)

D\.

---

## Post #38 by @thomas.beale

One thing to be clear on \- we must differentiate between "not recorded" and "not there"\. Not recording someone's weight does not make them "weightless" \(don't worry I understood the joke, but this is a serious point as well\)\. A better example would be \- not recording smoking status doesn't make the patient a non\-smoker\.

There are 5 possible situations I know of that can occur with data:

1. it is not recorded \(nothing is recorded\)
2. it was asked \(e\.g\. by an application GUI\) but remains unknown due to various reasons \(patient unconscious, refused to divulge, etc\)
3. it is completely known and recorded
4. it is recorded, but there are bits missing
5. it is recorded, but in the negative \(no known allergies, no previous surgery, etc etc\)

Cases 2, 4 and 5 have not always been properly catered for in systems\.

Case 2 is dealt with by the use of what I would call "data quality markers", i\.e\. what HL7 calls "flavours of Null"\. Actually, we call them that in the openEHR model, and use HL7's flavours of null \(although we use them in a different way\)\.

Case 4 is dealt with in openEHR by partial data types e\.g\. DV\_PARTIAL\_DATE, and with Null Flavours in HL7\.

Case 5 requires proper structuring of the health record, so that negatives can be recorded; archetypes/templates help in this\.
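
For readers who want to see Case 4 in concrete terms, here is a minimal Python sketch of the partial\-date behaviour described in post #35: a missing day is synthesised as the 15th, a missing day and month as 30 June, and `possible_dates` returns the interval the real date could lie in. The class name `PartialDate`, its fields and the helper `may_fall_in` are assumptions made for illustration only; this is not the openEHR DV\_PARTIAL\_DATE specification, and it deliberately does not model the "known day, unknown month" case raised in post #1.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class PartialDate:
    """Hypothetical partial date: the year is known, month and day may be missing."""
    year: int
    month: Optional[int] = None  # None means the month is unknown
    day: Optional[int] = None    # None means the day is unknown

    def __post_init__(self):
        # Assumption of this sketch: a known day requires a known month.
        if self.month is None and self.day is not None:
            raise ValueError("known day with unknown month is not modelled here")

    def synthesized(self) -> date:
        """One representative date, e.g. for sorting or statistics."""
        if self.month is None:
            return date(self.year, 6, 30)           # day and month missing
        if self.day is None:
            return date(self.year, self.month, 15)  # day missing
        return date(self.year, self.month, self.day)

    def possible_dates(self) -> Tuple[date, date]:
        """Inclusive (earliest, latest) interval the real date could lie in."""
        if self.month is None:
            return date(self.year, 1, 1), date(self.year, 12, 31)
        if self.day is None:
            last_day = calendar.monthrange(self.year, self.month)[1]
            return (date(self.year, self.month, 1),
                    date(self.year, self.month, last_day))
        exact = date(self.year, self.month, self.day)
        return exact, exact

    def may_fall_in(self, start: date, end: date) -> bool:
        """True if the possible interval overlaps the query range [start, end]."""
        earliest, latest = self.possible_dates()
        return earliest <= end and start <= latest
```

A query that should "catch" fuzzy dates would call `may_fall_in`, while sorting would use `synthesized()`. The skew mentioned in post #35 shows up here too: a day\-less date matches every query range that touches its month.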
\- thomas beale

Douglas Carnall wrote:

---

## Post #39 by @system

Hi,

When I was thinking about this many years ago, I needed 4 attributes for a data item/statement/fragment/transaction/archetype, etc.

One on **Asked/Answered**:

| | **Answered: Yes** | **Answered: No** |
|---|---|---|
| **Asked: Yes** | The 'normal' answer (3) | No response (2) |
| **Asked: No** | Unsolicited answer | The real 'Null', not recorded (1) |

And then there is the attribute on **Certainty**.
And then the one on **Completeness**.
And the one on **Negation**.

Gerard

> One thing to be clear on - we must differentiate between "not recorded" and "not there". Not recording someone's weight does not make them "weightless" (don't worry I understood the joke, but this is a serious point as well). A better example would be - not recording smoking status doesn't make the patient a non-smoker.
>
> There are 5 possible situations I know of that can occur with data:
>
> 1. it is not recorded (nothing is recorded)
> 2. it was asked (e.g. by an application GUI) but remains unknown due to various reasons (patient unconscious, refused to divulge, etc)
> 3. it is completely known and recorded
> 4. it is recorded, but there are bits missing
> 5. it is recorded, but in the negative (no known allergies, no previous surgery, etc etc)
>
> Cases 2, 4 and 5 have not always been properly catered for in systems.
>
> Case 2 is dealt with by the use of what I would call "data quality markers", i.e. what HL7 calls "flavours of Null". Actually, we call them that in the openEHR model, and use HL7's flavours of null (although we use them in a different way)
>
> Case 4 is dealt with in openEHR by partial data types e.g. DV_PARTIAL_DATE, and with Null Flavours in HL7.
>
> Case 5 requires proper structuring of the health record, so that negatives can be recorded; archetypes/templates help in this.
> > - thomas beale > > > Douglas Carnall wrote: > >> >>>> If I see a patient who subsequently turns out to have thyrotoxicosis, but >>>> do >>>> not record the presence or absence of certain key clinical findings (e.g. >>>> pulse, weight, tremor) >>>> >> >> Hmm. If I saw a patient with an absent pulse and failed to note that >> finding, either mentally or in a clinical record, I think I might rightly be >> accused of being a poor observer of the human condition. Come to think of >> it, a weightless patient would be interesting too :-) >> >> And everyone has a physiological tremor, though it is often exaggerated in >> moderate to severe untreated thyrotoxicosis. >> >> So in fact, now I come to think about it again, I didn't mean presence or >> absence of pulse, weight or tremor, but presence or absence of--all right, >> use the word, fuzzy, values for pulse, weight or tremor. >> >> But you knew what I meant didn't you? >> >> ;-) >> >> D. >> -- -- Gerard Freriks, arts Huigsloterdijk 378 2158 LR Buitenkaag The Netherlands +31 252 544896 +31 654 792800 --- ## Post #40 by @Sam Tom, This area is interesting within the concept of the EHR as we have developed it\. Firstly, the place holders for key information may be an organiser rather than an entry\. For example with adverse reactions to medication \- a statement that the patient reports no known adverse reactions or even allergies is worth knowing \- but it is not an entry of the type allergy or adverse reaction\. A negative report such as this requires updating \- how long is it reasonable to assume that no new allergies have developed? \- whereas a report of an allergy of penicillin if well documented will probably have the same status from that point on\. 
Reporting that someone is a non\-smoker is not of the same order \- it is a negative finding but if the person is an adult you would anticipate that it is a stable finding \- whereas ex\-smoker has a different 'half\-life' \- it is also likely to be included in an entry about smoking \- how many per day, perhaps the number of pack years, and the person's current smoking status\. So negative recordings might be different\. The difference then is that a state of tobacco intake = 0 i\.e\. a non\-smoker is different than the state of having no known allergies \- the count of allergies = 0 but this is not an attribute of most people's health record\. I would propose that we have an entry that is EMPTY and returns a DV\_TEXT that can be displayed if required \- but will be dated and we will know who added it\. This will allow organisers to be useful mandatory placeholders and know unambiguously that there are no allergies for example \(and when it was asked and by whom\) Cheers, Sam > One thing to be clear on \- we must differentiate between "not recorded" > and "not there"\. Not recording someone's weight does not make them > "weightless" \(don't worry I understood the joke, but this is a serious > point as well\)\. A better example would be \- not recording smoking status > doesn't make the patient a non\-smoker\. > > There are 5 possible situations I know of that can occur with data: > > 1\. it is not recorded \(nothing is recorded\) > 2\. it was asked \(e\.g\. by an application GUI\) but remains unknown due to > various reasons \(patient uncounscious, refused to divulge, etc\) > 3\. it is completely known and recorded > 4\. it is recorded, but there are bits missing > 5\. it is recorded, but in the negative \(no known allergies, no previous > surgery, etc etc\) > > Cases 2, 4and 5 have not always been properly catered for in systems\. > > Case 2 is dealt with in by the use of what i would call "data quality > markers", i\.e\. what HL7 calls "flavours of Null"\. 
Actually, we call them > that in the openEHR model, and use HL7's flavours of null \(although we > use them in a different way\) > > Case 4 is dealt in openEHR by partial data types e\.g\. DV\_PARTIAL\_DATE, > and with Null Flavours in HL7\. > > Case 5 requires proper structurinng of the health record, so that > negatives can be recorded; archetypes/templates help in this\. > > \- thomas beale > > Douglas Carnall wrote: > > > > >>>If I see a patient who subsequently turns out to have > thyrotoxicosis, but do > >>>not record the presence or absence of certain key clinical > findings \(e\.g\. > >>>pulse, weight, tremor\) > >>> > > > >Hmm\. If I saw a patient with an absent pulse and failed to note that > >finding, either mentally or in a clinical record, I think I > might rightly be > >accused of being a poor observer of the human condition\. Come to think of > >it, a weightless patient would be interesting too :\-\) > > > >And everyone has a physiological tremor, though it is often > exaggerated in > >moderate to severe untreated thyrotoxicosis\. > > > >So in fact, now I come to think about it again, I didn't mean presence or > >absence of pulse, weight or tremor, but presence or absence > of\-\-all right, > >use the word, fuzzy, values for pulse, weight or tremor\. > > > >But you knew what I meant didn't you? > > > >;\-\) > > > >D\. > > > > \-\- > \.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\. > Deep Thought Informatics Pty Ltd > > mailto:thomas@deepthought.com.au > open EHR \- http://www.openEHR.org > Archetype Methodology \- http://www.deepthought.com.au/it/archetypes.html Community Informatics \- http://www.deepthought.com.au/ci/rii/Output/mainTOC.html \.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\. 

---

## Post #41 by @thomas.beale

\[original post from Prof John Roddick, Flinders University South Australia, which failed to get through\]

---

## Post #42 by @thomas.beale

> \[original post from Prof John Roddick, Flinders University South Australia, which failed to get through\]
>
>> Parsons, S\., 1996\. Current approaches to handling imperfect information in data and knowledge bases\. IEEE Transactions on Knowledge and Data Engineering 8 \(3\): 353\-372\.
>
>> in which he identifies five types of imperfection in data\. Namely:
>
>> 1\. Incomplete\. \(eg\. test results not known or qualified as in "interim results only"\)

I think this is an aspect of the real\-world situation, and just means that the information currently captured is only a "snapshot" along some timeline; later, the final information will \(presumably\) be available\. In openEHR, this would be indicated in the clinical info itself, e\.g\. pathology results might say "preliminary results"\. We don't need to do anything special in this case\.

In cases like an unconscious person coming to A&E, where the admission form on the screen requires all sorts of things which cannot be answered for now, traditional computer systems do completely the wrong thing, and either prevent the form from being committed with what is known \(physical description=xxxx, presenting complaint=partially severed left hand\.\.\.\.\) or create dummy \(but wrong\) values for the fields that could not be filled in\.

For this kind of situation, we have taken a lead from SCADA control systems \(where I learned about software\) and HL7's "flavours of null" approach\. In control systems, all values have an associated "data quality" marker; if it indicates that the value is "old" or that serial communication from the field has stopped, you ignore the actual value \(which might otherwise look like a completely legitimate transformer voltage or whatever\)\.
In HL7, all their data types include the notion of Null values in every possible field, and they include a "flavour of null" \- a reason for why the value is not available \- e\.g\. "unknown", "unavailable", "not asked", "asked but refused", "not applicable" etc \(that's from memory so the values might be a bit off\)\.

The approach we have taken in openEHR is similar to the control system approach, and uses HL7's flavours of null\. Thus, the class ELEMENT has attributes:

- value: DATA\_VALUE
- null\_flavour: DV\_CODED\_TEXT \{value from HL7 null flavours domain\}

This approach also works for database systems \- there is no need to mix fake null/0 values into the type value domain for a value field \- it's a separate field, but always associated with the value field\. So even if Oracle forces you to have a real date in the date\-of\-birth field \(e\.g\. "1\-1\-1800"\), the null\_flavour sitting next to it has the value "UNK", meaning \- "unknown \- ignore what is in the value field"\.

>> 2\. Imprecise\. \(eg\. age "between 25 and 30" etc\.\)\. This arises from a lack of granularity\.

we definitely have to deal with this\. The possible ways include:

- DV\_INTERVAL<T> type for ranges
- partial dates & times
- using narrative text

do we need more?

>> 3\. Vague\. \(eg\. blood pressure "high", smokes "a lot", pain "acute", etc\.\) This arises from the use of fuzzy terms\.

we also have to deal with this, and the typical clinical version found in pathology and other areas where you get values from sets like \{trace, \+, \+\+, \+\+\+, \.\.\.\}\. Currently we have avoided a complex fuzzy data type, and provided the DV\_ORDINAL data type, which allows ordinal numbers to be associated with symbols \(or words\)\. So for smoking, if you really want to avoid characterising quantitatively, you could use a DV\_ORDINAL, which comes from a "Lilliputian DOH tobacco consumption" domain/set: \{1=none; 2=occasional; 3= regular/light; 4=heavy; 5=going to die real soon now\}\.
From the medical perspective I imagine that this particular example would be a spectacularly bad way to record this particular datum\.\.\.\.\. but the model will certainly let you do it, and it will also allow comparison \(use of the '<' operator\) by virtue of the ordinal numbers associated with the symbols\. For recording pain, or the Apgar characteristics, or urinalysis values, this approach seems fairly common among clinicians\. Our idea with DV\_ORDINAL was primarily not to prevent doctors from using "\+", "\+\+", "\+\+\+" type values, and to add a little bit of rigour \(ensuring comparability\)\.

What we are not doing is implementing a mathematical fuzzy model where each symbol is associated with a sub\-section of a numerical range\. For those of you into fuzzy maths, you know that to characterise these mappings requires a fair bit of extra information\. However, this kind of information can be stored in archetypes, and is not needed in the data \(the mappings should not change with respect to the patient\), so we should probably consider this when designing the archetype version of the DV\_ORDINAL class \(and maybe other quantitative classes as well\)\.

>> 4\. Uncertain\. \(eg\. a 95% chance of accuracy\)\. Arises from a lack of knowledge or subjective assessment\.

for this we include a "confidence: REAL" attribute in the ENTRY class\.

>> 5\. Inconsistent\. \(ie\. contradictory information\)\.

I'm not sure what should be done about this, but I think it is in the clinical domain; the level of or reason for inconsistency should be characterised in the data by its authors; I don't think it needs anything special in the reference model\. \(Anyone disagree?\)

>> to that you can add a sixth
>
>> 6\. Out\-of\-date\. \(ie\. correct when stored but unlikely to be true now\)\.

this is a tricky one, and an example is "smoking status"=smoker which might be true up until two years ago, but change then\.
Also, the converse \- the EHR shows that the patient was recorded as a smoker 15 years ago, but there is no new information regarding smoking at all\. Is s/he still a smoker? In general the time\-based transaction concept of GEHR gives systems the basic tool for recording updates to things\. Sam has been contemplating ways of representing the idea of "confirming" previous information whose value does not change, but where we want a more recent update on the situation \(and medico\-legally, the practitioner wants to show in the record that they did indeed review various things on such\-and\-such a date\)\. This might require a special marker which does not change the value of something, but says that it was verified to be the same\. I don't think we have an answer yet for this in the architecture\.

>> These can, of course, be combined\!
>
>> Incompleteness has traditionally been handled in databases with the null value\. In my opinion this has been totally inadequate but that doesn't stop it being the only option available in most systems\. Imprecision and uncertainty is often handled through coercion to the nearest value with all the problems that might cause and vagueness and inconsistency is often not handled at all\. Out\-of\-date\-ness is handled by assuming it doesn't happen\.

John's long experience with the horrors of inadequate data handling certainly rings true with me\.

>> For the purposes of GEHR, I would suggest that No\. 5\. Inconsistent data is a fact of life and since this is somewhat different \(it required two pieces of information for example\) then we should leave this category to constraint handling and expert interpretation\.

Agree\.

>> However, I would suggest we need to find a way of handling the other 5\. It's not initially clear how though\. Perhaps a qualifying field for each critical value?

how do you feel about the current ways of dealing with the problems, detailed above? We would value your expert opinion\.
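
As a rough illustration of two of the mechanisms described in this post, the Python sketch below pairs a value with a separate null flavour \(the ELEMENT idea\) and attaches ordinal numbers to symbols so they can be compared \(the DV\_ORDINAL idea\). All names here \(`Element`, `NullFlavour`, `Ordinal`, `has_usable_value`\) are hypothetical, and apart from "UNK", which is quoted in the post, the enum values are just the phrases listed "from memory" above, not official HL7 codes.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Any, Optional


class NullFlavour(Enum):
    """Illustrative reasons for a missing value (not the official HL7 code list)."""
    UNKNOWN = "UNK"  # the only code quoted in the thread
    UNAVAILABLE = "unavailable"
    NOT_ASKED = "not asked"
    ASKED_BUT_REFUSED = "asked but refused"
    NOT_APPLICABLE = "not applicable"


@dataclass
class Element:
    """Hypothetical ELEMENT: the null flavour sits next to the value, not inside it."""
    value: Any = None
    null_flavour: Optional[NullFlavour] = None

    @property
    def has_usable_value(self) -> bool:
        # If a null flavour is set, whatever is stored in `value` must be ignored.
        return self.null_flavour is None and self.value is not None


@dataclass(frozen=True, order=True)
class Ordinal:
    """Hypothetical DV_ORDINAL: ordering and equality use the number, not the symbol."""
    value: int
    symbol: str = field(compare=False)


# A database-enforced dummy date of birth, flagged as unknown.
date_of_birth = Element(value=date(1800, 1, 1), null_flavour=NullFlavour.UNKNOWN)

# Urinalysis-style symbols made comparable through their ordinal numbers.
trace, one_plus, two_plus = Ordinal(1, "trace"), Ordinal(2, "+"), Ordinal(3, "++")
```

Comparisons such as `trace < two_plus` only make sense between ordinals drawn from the same domain; checking that is left out of this sketch.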

\- thomas beale

---

## Post #43 by @Douglas_Carnall

> In cases like an unconscious person coming to A&E, and the admission
> form on the screen requires all sorts of things which cannot be answered
> for now, traditional computer systems do completely the wrong thing, and
> either prevent the form from being committed with what is known
> \(physical description=xxxx, presenting complaint=partially severed left
> hand\.\.\.\.\) or create dummy \(but wrong\) values for the fields that could
> not be filled in\.

Yes\. This is a VITAL thing to recognise: that any data entry method that forces clinicians to commit when we do not wish to commit will lead either to:

\(1\) nonsensical data \(garbage in\.\.\. \)

\(2\) user anomie \("it forced me to lie"\)

> In control systems, all values have an associated "data
> quality" marker, which, if it indicates that the value is "old" or that
> serial communication from the field has stopped, you ignore the actual
> value \(which might otherwise look like a completely legitimate
> transformer voltage or whatever\)\. In HL7, all their data types include
> the notion of Null values in every possible field, and they include a
> "flavour of null" \- reason for why the value is not available \- e\.g\.
> "unknown", "unavailable", "not asked", "asked but refused", "not
> applicable" etc \(that's from memory so the values might be a bit off\)\.

The need for a language of uncertainty\.\.\. interesting\. Will think more about this\.

> >> 2\. Imprecise\. \(eg\. age "between 25 and 30" etc\.\)\. This arises from
> >> a lack of granularity\.
>
> we definitely have to deal with this\. The possible ways include:
> \- DV\_INTERVAL<T> type for ranges
> \- partial dates & times
> \- using narrative text
>
> do we need more?

I often find, when I'm coding a consultation, that I am happy to use a precise Clinical Term as a starting point for a statement of a diagnosis, but want to qualify it in some way\. Most commonly, I want to add "?" or "??"
or "???", or all three in a list of differential diagnoses of descending probability, or utility\.

e\.g\.

chest pain ?ischaemic ??muscular ???emotional

One often wants to rule out important, but less likely diagnoses\. In the example above, my hunch might be that the pain has an emotional cause, but I want to catch the ECG technician before she goes home, then complete my examination, before bringing the patient back to a longer appointment to discuss her recent bereavement\.

I think the answer to this problem as a system designer is to recognise that qualifiers are likely to be needed in many situations, and leave a space for their collection and definition\. The space should be as unassuming as possible about the kind of data that may be found there\.

Once the system has been used, one important use for the data gathered in the uncertainty space will be to enable a retrospective qualitative examination of the kinds of qualifiers that have been felt useful by clinicians, and attempt some taxonomy for Version 2\.

> >> 3\. Vague\. \(eg\. blood pressure "high", smokes "a lot", pain "acute",
> >> etc\.\) This arises from the use of fuzzy terms\.
>
> we also have to deal with this, and the typical clinical version found
> in pathology and other areas where you get values from sets like \{trace,
> \+, \+\+, \+\+\+, \.\.\.\}\.

I'm not sure that I can be certain about the difference between "vague" and "imprecise" ;\-\)

All of the examples of vague data given above are of course amenable to further quantification, should someone feel it is important to take the time to do so\. The individual clinician probably knows what he or she means when using such a term\. I carry a number of definitions in my head of what I mean by "high" blood pressure for example, but they may vary \(hopefully not too wildly\) from those of my colleague down the corridor, or those that the Professor of Cardiovascular Medicine uses\.
If the system could record a set of "imprecision preferences" for each individual user, it could enable:

1\) subsequent users of a vague value to get a feel for how the individual who recorded it has used vague values in other instances;

2\) individuals to compare their own use of vague values with others, and migrate towards a mean\.

> Currently we have avoided a complex fuzzy data type, and provided the
> DV\_ORDINAL data type, which allows ordinal numbers to be associated with
> symbols \(or words\)\. So for smoking, if you really want to avoid
> characterising quantitatively, you could use a DV\_ORDINAL, which comes
> from a "Lilliputian DOH tobacco consumption" domain/set: \{1=none;
> 2=occasional; 3= regular/light; 4=heavy; 5=going to die real soon now\}\.
> From the medical perspective I imagine that this particular example
> would be a spectacularly bad way to record this particular datum\.\.\.\.\.

The most useful quantitative predictor for harm caused by tobacco consumption is the pack year \(=packs/day \* years\_of\_consumption\) i\.e\. if I smoke 10/day \(half a pack of 20\) for 15 years this would be 7\.5 pack years\. But if I was a tobacco cessation therapist, I might be interested in recording that someone who had previously smoked 18/day had cut down to 17/day over the last week \(a negligible change in terms of calculating health risk\)\.

Although you're right that your ordinal set is not the way I'd choose to record smoking data myself, if another clinician chose to use that framework, I could still draw useful inferences from it subsequently\.

There must be lots of data gathering designs out there on this topic; I think

  never?/ever?/now?
are the main top heads for tobacco consumption;

  if never record null value

  if now=no but ever=yes
    then offer opportunity to record start date, cessation date and pack years

  if now=yes
    then offer opportunity to record start date and daily consumption

  \(note, people don't always smoke the same amount each day\)

is the way I like it \(and think about it\) but others might find implementations based on this irritating\. So why not allow clinicians to have a box in which they can either:

1\) just write text about tobacco consumption

2\) set up their own structure that is meaningful for them \(and share those structures on the internet, like complicated config files are shared, for example those of mutt or bash\)\.

> but the model will certainly let you do it, and it will also allow
> comparison \(use of the '<' operator\) by virtue of the ordinal numbers
> associated with the symbols\. For recording pain, or the Apgar
> characteristics, or urinalysis values, this approach seems fairly common
> among clinicians\.
>
> Our idea with DV\_ORDINAL was primarily not to prevent doctors from
> using "\+", "\+\+", "\+\+\+" type values, and to add a little bit of rigour
> \(ensuring comparability\)\.

Let's not inflict rigour on people\. Let's offer clinicians delightful opportunities to express what they thought, and what they mean\. Comparison should be a secondary function\.

> What we are not doing is implementing a mathematical fuzzy model where
> each symbol is associated with a sub\-section of a numerical range\. For
> those of you into fuzzy maths, you know that to characterise these
> mappings requires a fair bit of extra information\. However, this kind of
> information can be stored in archetypes, and is not needed in the data
> \(the mappings should not change with respect to the patient\), so we
> should probably consider this when designing the archetype version of
> the DV\_ORDINAL class \(and maybe other quantitative classes as well\)\.
>
> >> 4\.
Uncertain\. \(eg\. a 95% chance of accuracy\)\. Arises from a lack
> >> of knowledge or subjective assessment\.
>
> for this we include a "confidence: REAL" attribute in the ENTRY class\.

Err\.\.\. I'm quite happy telling a patient that I'm 90% confident that it's just a virus, but I'll see them next week if it's not settling down\. I'm not so sure that it'll be meaningful to make comparisons with my colleagues' statements that they were 85% and 95% confident that the patients they saw had viruses too\.

Just because you've got two numbers doesn't mean that you can perform arithmetic with them\.

> >> 5\. Inconsistent\. \(ie\. contradictory information\)\.
>
> I'm not sure what should be done about this, but I think it is in the
> clinical domain; the level of or reason for inconsistency should be
> characterised in the data by its authors; I don't think it needs anything
> special in the reference model\. \(Anyone disagree?\)

See my suggestion for a meta/qualifier/uncertainty space above\.

> >> to that you can add a sixth
> >>
> >> 6\. Out\-of\-date\. \(ie\. correct when stored but unlikely to be true now\)\.
>
> this is a tricky one, and an example is "smoking status"=smoker which
> might be true up until two years ago, but change then\.

I'm much more relaxed about this\. Clinicians are experts at interpreting old data from clinical records\. As long as I know that, at date X, clinician Y thought Z was true, I can form a judgement about what that means, and what action if any I need to take to refresh the data now\.

> Sam has been contemplating ways of representing the idea of
> "confirming" previous information whose value does not change, but we
> want a more recent update on the situation \(and medico\-legally, the
> practitioner wants to show in the record that they did indeed review
> various things on such\-and\-such a date\)\. This might require a special
> marker which does not change the value of something, but says that it
> was verified to be the same\.
I don't think we have an answer yet for
> this in the architecture\.

Sort of like the Unix "touch" command?

> >> These can, of course, be combined\!

Ha ha\! That's the world I live in for sure\. :\-\)

D\.

---

## Post #44 by @thomas.beale

Douglas Carnall wrote:

>> In control systems, all values have an associated "data
>> quality" marker, which, if it indicates that the value is "old" or that
>> serial communication from the field has stopped, you ignore the actual
>> value \(which might otherwise look like a completely legitimate
>> transformer voltage or whatever\)\. In HL7, all their data types include
>> the notion of Null values in every possible field, and they include a
>> "flavour of null" \- reason for why the value is not available \- e\.g\.
>> "unknown", "unavailable", "not asked", "asked but refused", "not
>> applicable" etc \(that's from memory so the values might be a bit off\)\.
>
> The need for a language of uncertainty\.\.\. interesting\. Will think more about this\.

the HL7 ballot has the full explanation if you are interested\. But there is a big difference in the way we apply it and the way they do\. They specify that not only can a whole datum be Null \(with its flavour of null stated\), but any attribute thereof can as well\. This means that you can get partially populated data items \(e\.g\. a quantity with no units, an interval with missing limits etc\) which I have argued is more complex to process, and more likely to result in software errors \(given the quality of real world software\)\. Theoretically, there is nothing wrong with their approach \(in fact it's quite an interesting idea\), but for the moment, we are going to go in a simpler, more expected direction\. Time and experience will tell which approach is more appropriate\.

>>>> 2\. Imprecise\. \(eg\. age "between 25 and 30" etc\.\)\. This arises from
>>>> a lack of granularity\.
>
> I often find, when I'm coding a consultation, that I am happy to use a precise Clinical Term as a starting point for a statement of a diagnosis, but want to qualify it in some way\. Most commonly, I want to add "?" or "??" or "???", or all three in a list of differential diagnoses of descending probability, or utility\.
>
> e\.g\.
>
> chest pain ?ischaemic ??muscular ???emotional

ok \- at the moment, we would say that a differential diagnosis would be defined by archetypes, which in your case would be associations of terms and confidence factors expressed as what we call DV\_ORDINALs, giving you the ability to just use "?", "??", "???"\. This means your software could be written to accept exactly what you have put in above\.

> Once the system has been used, one important use for the data gathered in the uncertainty space will be to enable a retrospective qualitative examination of the kinds of qualifiers that have been felt useful by clinicians, and attempt some taxonomy for Version 2\.

agree \- we need some more in\-use experience before further theorising\.\.\.

>>>> 3\. Vague\. \(eg\. blood pressure "high", smokes "a lot", pain "acute",
>>>> etc\.\) This arises from the use of fuzzy terms\.
>
> If the system could record a set of "imprecision preferences" for each individual user, it could enable:
>
> 1\) subsequent users of a vague value to get a feel for how the individual who recorded it has used vague values in other instances;
>
> 2\) individuals to compare their own use of vague values with others, and migrate towards a mean\.

what this means is actually using fuzzy quantitative mappings for imprecise terms\. The fuzzy \(numeric\) data has to be carried with the symbolic datum each time, so it can be compared to others' data, and the comparison will work, even if your "high" is someone else's "critical"\. We haven't yet got this facility, but I think it is important enough to start designing into the archetype model\.
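
As a rough illustration of carrying the numeric range with the symbol, the following Python sketch compares two vague values by their ranges instead of their words\. It is only a sketch under stated assumptions: `FuzzyTerm` and `compare` are hypothetical names, a plain interval stands in for a proper fuzzy membership function, and the blood pressure figures are invented for the example and are not clinical thresholds\.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FuzzyTerm:
    """A vague symbol together with the numeric range its author means by it."""
    symbol: str
    low: float
    high: float
    units: str


def compare(a: FuzzyTerm, b: FuzzyTerm) -> Optional[int]:
    """Return -1 if a lies wholly below b, 1 if wholly above, None if the ranges overlap."""
    if a.units != b.units:
        raise ValueError("cannot compare ranges in different units")
    if a.high < b.low:
        return -1
    if a.low > b.high:
        return 1
    return None  # overlapping ranges: the data cannot settle the order


# Illustrative numbers only, not clinical thresholds.
dr_a_high = FuzzyTerm("high", 160, 200, "mmHg")
dr_b_high = FuzzyTerm("high", 140, 159, "mmHg")
dr_b_critical = FuzzyTerm("critical", 160, 200, "mmHg")

print(compare(dr_b_high, dr_a_high))      # same word, different ranges
print(compare(dr_a_high, dr_b_critical))  # different words, same range
```

Returning `None` for overlapping ranges is one possible design choice: it reports that the two values cannot be ordered from the data alone, instead of forcing an answer\.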
> Although you're right that your ordinal set is not the way I'd choose to record smoking data myself, if another clinician chose to use that framework, I could still draw useful inferences from it subsequently\.

right\.

> There must be lots of data gathering designs out there on this topic; I think
>
> never?/ever?/now? are the main top heads for tobacco consumption;
>
> if never record null value
>
> if now=no but ever=yes
>   then offer opportunity to record start date, cessation date and pack years
>
> if now=yes
>   then offer opportunity to record start date and daily consumption
>
> \(note, people don't always smoke the same amount each day\)
>
> is the way I like it \(and think about it\) but others might find implementations based on this irritating\. So why not allow clinicians to have a box in which they can either:
>
> 1\) just write text about tobacco consumption
>
> 2\) set up their own structure that is meaningful for them \(and share those structures on the internet, like complicated config files are shared, for example those of mutt or bash\)\.

well, this is what archetypes are about\. But we can go further with them, since we can allow 2 or 3 alternative smoking archetypes, and computationally convert between them, by comparing their interfaces\. This won't necessarily be easy, and in some cases will be very challenging \(e\.g\. comparing a numeric number of packets to "heavy smoker"\) but the principle is there\.\.\.\.

>> Our idea with DV\_ORDINAL was primarily not to prevent doctors from
>> using "\+", "\+\+", "\+\+\+" type values, and to add a little bit of rigour
>> \(ensuring comparability\)\.
>
> Let's not inflict rigour on people\. Let's offer clinicians delightful opportunities to express what they thought, and what they mean\. Comparison should be a secondary function\.

it's hidden in the model anyway \- they won't see it\. But it's useful for pseudo\-standardised sets of symbols for e\.g\. urinalysis

>>>> 4\. Uncertain\. \(eg\.
a 95% chance of accuracy\)\. Arises from a lack
>>>> of knowledge or subjective assessment\.
>>
>> for this we include a "confidence: REAL" attribute in the ENTRY class\.
>
> Err\.\.\. I'm quite happy telling a patient that I'm 90% confident that it's just a virus, but I'll see them next week if it's not settling down\. I'm not so sure that it'll be meaningful to make comparisons with my colleagues' statements that they were 85% and 95% confident that the patients they saw had viruses too\.
>
> Just because you've got two numbers doesn't mean that you can perform arithmetic with them\.

this is true, but I'm not sure what the alternative is, since at least a % is more neutral than "low", "med", "high"\. Is there any research in this area I wonder?

>>>> 6\. Out\-of\-date\. \(ie\. correct when stored but unlikely to be true now\)\.
>>
>> this is a tricky one, and an example is "smoking status"=smoker which
>> might be true up until two years ago, but change then\.
>
> I'm much more relaxed about this\. Clinicians are experts at interpreting old data from clinical records\. As long as I know that, at date X, clinician Y thought Z was true, I can form a judgement about what that means, and what action if any I need to take to refresh the data now\.

right\. I am right behind the idea that the EHR is to help clinicians do their job \(which is a lot of the time: thinking, evaluating, deciding\.\.\.\) not try to take it over\. We'll leave it to Bill Gates and his paper clip to do that\.

>> Sam has been contemplating ways of representing the idea of
>> "confirming" previous information whose value does not change, but we
>> want a more recent update on the situation \(and medico\-legally, the
>> practitioner wants to show in the record that they did indeed review
>> various things on such\-and\-such a date\)\. This might require a special
>> marker which does not change the value of something, but says that it
>> was verified to be the same\.
I don't think we have an answer yet for
>> this in the architecture\.
>
> Sort of like the Unix "touch" command?

that's it\.

thanks for the input\. I think the two items for us to consider from this are:

a\) need for fuzzy quantification in archetypes to correspond to ordinal symbols \(\+, \+\+, \+\+\+, ?, ??, ??? etc\)

b\) possible re\-evaluation of % as a way of expressing subjective certainty\.

\- thomas beale

---

**Canonical:** https://discourse.openehr.org/t/data-types/14440
**Original content:** https://discourse.openehr.org/t/data-types/14440