Pathology numeric values not supported in DV_Quantity

Hi everyone,

We want to report an issue that has arisen in data processing in Australia.

The issue is the somewhat random ability of systems to report a >xx or <yy range where a quantity is expected - there are still units and still a normal range. This is common with TSH and GFR - but can turn up in unexpected instances - e.g. we had a baby with an HCO3 of <5 mmol/L. This can be dealt with at present by substituting an interval - but it is a bit weird, as there is still a normal range - it kind of works, as there is only a lower or upper value of the interval, and so this single quantity can carry the normal range.

The point is that it is really a point measurement that is outside the range of the measuring device. Also, it means that we will have to have archetypes that allow multiple datatypes for all quantities that could conceivably be measured in this way.

The alternative is to consider a DV_QUANTITY_RANGE that inherits from DV_QUANTITY - it still has only one value - but now it has the ability to set this as the upper or lower value - and also whether this number is included or not.

The advantage is that there would still be a number to graph, and this data type could always be substituted for a DV_QUANTITY (i.e. without archetyping).
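To make the idea concrete, a minimal sketch in Python (the class names follow the proposal above; the attribute names are only illustrative guesses, not any agreed model):

    from dataclasses import dataclass

    @dataclass
    class DVQuantity:
        magnitude: float
        units: str

    @dataclass
    class DVQuantityRange(DVQuantity):
        bound: str = "upper"          # which bound the single magnitude represents
        bound_included: bool = False  # False => strict < or >

        def __str__(self) -> str:
            op = {("upper", False): "<", ("upper", True): "<=",
                  ("lower", False): ">", ("lower", True): ">="}
            return f"{op[(self.bound, self.bound_included)]}{self.magnitude} {self.units}"

    # "HCO3: <5 mmol/L" - a point value known only to lie below 5,
    # but still a quantity, so a normal range can ride along with it
    hco3 = DVQuantityRange(magnitude=5.0, units="mmol/L", bound="upper")
    print(hco3)  # <5.0 mmol/L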

I wonder what others think.

Cheers, Sam

Hi Sam,
I believe this case is actually about the accuracy of a result.

In my pathology system, I deal with it by having a separate attribute on quantity.
It takes values such as >, <, >=, <=, ~, I
(~ => approximately, I => inaccurate)

The value "I" may be used, for instance, when an analyser returns a potassium value
which on subsequent examination of the blood is shown to be erroneously
high due to haemolysis. This is usually accompanied by some text which
is displayed instead of the numeric value, e.g. HAEM, but the underlying
numeric value needs to be stored anyway as well.

This of course makes the logic for deciding whether a result is within a normal range
more interesting, and graphing routines etc. need to take this flag into account.
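A rough Python sketch of that scheme (the class and attribute names here are illustrative, not taken from Vince's actual system):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class QualifiedQuantity:
        value: float                        # the underlying number, always stored
        units: str
        qualifier: str = "="                # one of =, >, <, >=, <=, ~, I
        display_text: Optional[str] = None  # text shown instead of the number

        def display(self) -> str:
            if self.display_text is not None:
                return self.display_text
            prefix = "" if self.qualifier == "=" else self.qualifier
            return f"{prefix}{self.value} {self.units}"

    # Haemolysed specimen: the potassium value is kept, but flagged as
    # inaccurate and displayed as text
    k = QualifiedQuantity(6.8, "mmol/L", qualifier="I", display_text="HAEM")
    print(k.display())  # HAEM
    print(k.value)      # 6.8 - still stored for later analysis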

I don't feel strongly about whether you deal with this as part of the quantity datatype or have a new
datatype inheriting from quantity.

Regards
Vince

Dr Vincent McCauley MB BS, Ph.D
McCauley Software Pty Ltd

I may be missing the mark, as I come originally from a process control
background, so apologies if this sounds like an engineering solution.

There is a mechanism in OPC (a common protocol for getting information out of
machines) whereby each data point can be identified according to the
quality of the data - e.g. OPC_QUALITY_GOOD, OPC_QUALITY_BAD,
OPC_QUALITY_UNCERTAIN. There is a further qualification for why the data is
bad (connection problems, config error, etc.), but the record still contains an
actual value, so it can still be plotted but also filtered
out. I guess this is essentially what you were saying, Vince.
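In rough Python terms, the shape of that mechanism is something like this (a sketch only; the names are shortened from the OPC constants and this is not the actual OPC API):

    from dataclasses import dataclass
    from enum import Enum

    class Quality(Enum):
        GOOD = 0
        BAD = 1
        UNCERTAIN = 2

    @dataclass
    class DataPoint:
        value: float          # always present, so it can still be plotted
        quality: Quality
        substatus: str = ""   # why the data is bad: "config error", etc.

    points = [DataPoint(5.2, Quality.GOOD),
              DataPoint(99.0, Quality.BAD, "sensor fault")]
    # plot everything, but let the user filter on quality
    plottable = [p for p in points if p.quality is not Quality.BAD]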

Unless I've misunderstood what Sam has proposed, the problem with a
substituted value is that it's not going to reflect the recorded value - i.e.
a chart won't show the "true" erroneous data.

-Tom

Hi,

<yy
What does it mean?

To my mind it semantically means a state of exception: meaning not only that the measurement is <yy, but that it is unmeasurable.

If this reasoning is true, then each archetype with a measurement needs an exception attribute.
In general this will be true in many more circumstances.

Each possible statement (data item and/or archetype) can have a few states:
requested/expected vs. unrequested/not expected (e.g. a TSH measurement is expected, but the response TSH >2000 arrives unrequested and unexpected, as an indication of exception).
As exception there are at least two possibilities:
known vs. unknown (e.g. RR 120/unknown mmHg; TSH was measured and presented, but it must not be considered a real result, it is in doubt)
true vs. untrue (e.g. I measured RR 60/80; I consider this measurement untrue, but it is what was measured. TSH >2000 is untrue because it was unmeasurable)

Gerard

– –

Gerard Freriks, arts

Huigsloterdijk 378

2158 LR Buitenkaag

The Netherlands

T: +31 252 544896

M: +31 654 792800

Vince

I like the flags - I wonder if we should have a '?' or '!' for a value affected by a quality issue - what do others think? The quality issues themselves are dealt with in the laboratory archetypes.

Sam

Vincent McCauley wrote:

Probably "?" for dubitable results. "!" is commonly used
here for marking up (perhaps unexpectedly) clinically
important results (such as an *unusually* high titer of
something where I expected either normal or somewhat
elevated).

Karsten

Just going through the replies we have had on this one…

  • Gerard's point about <5 etc. being an exception is not quite right - it's very common; it's usually to do with the sensitivity of instruments (i.e. accuracy), but there are also analytes which are reported as just being over a threshold, since any number larger than X is fine (e.g. glomerulin, Sam tells me).

  • this is not an indication that the data type is really a DV_INTERVAL or DV_QUANTITY_RANGE - it is clearly not. When we see “HCO3: <5 mmol/L” we are not reporting an interval of 0 - 5 mmol/L, we are reporting a point value somewhere in 0-5, but we don’t quite know where.

  • Tom Tuddenham's point is also correct. In openEHR, we actually do have a data quality marker (I used to work in SCADA as well, and lived with this kind of stuff for years!). It is called null_flavour and is defined on the ELEMENT class, next to the value attribute, which is the one that holds the Quantity we are talking about (or some other kind of data value in other circumstances). Here we have a more fine-grained occurrence of the same problem, for slightly different reasons: the instrument or measuring method and data communications are working as they should; it's just that either the value is too low or too high for quantification by the instrument, or else the instrument doesn't bother reporting it above or below a certain threshold, since it is known that any value above/below is healthy. Nevertheless, we have to treat it in a similar way - probably with a flag that indicates the 'status' of the value.

  • in practical terms we have to deal with the fact that quantities in the form of single-sided intervals with <, >, <=, >= can be mixed in with normal point value quantities, or replace them, on a per test-result basis.

  • we also have to have a solution that is easily comprehensible in the model and for software developers. Allowing INTERVALs to magically replace QUANTITY, as is done in HL7, is not the way to do it, since there is no clean basis in the modelling for this (i.e. it's not normally possible in OO languages - you have to do something quirky to make it happen); in any case, as pointed out above, DV_INTERVAL is not semantically correct in these cases anyway.

My analysis is that we need to slightly extend DV_QUANTIFIED (supertype of DV_COUNT and DV_QUANTITY, as well as all the date/time types), in the way that Vince has said (probably Vince worked this solution out years ago ;-)… so that the semantics are:

  • a magnitude

  • NEW ATTRIBUTE: a status flag - with the following possible values:

  • > : greater than

  • < : less than

  • >= : greater than or equal to (Vince, do we really need this and the next one - do you get real values where it is reported like this?)

  • <= : less than or equal to

  • = : exact point value (i.e. the default situation)

  • ~ : approximately equal to, i.e. like ‘=’ but with some unknown error

  • ? : inaccurate…what does this mean? If it is due to haemolysed blood then is it "inaccurate" or is it really just plain "wrong" ("incorrect")?

  • ..other attributes, depending on subtype

Adding a flag will be easy in modelling and software terms. What we have to do is carefully design the values; Vince has provided what is probably just about right, but I would like to be sure - see notes above on the list.

Also, remember the openEHR DV_QUANTIFIED class already has accuracy as a Real - it can be a % or an absolute value, so that any DV_QUANTIFIED can be created with a +/- 5% or whatever. Given this, do we need the '~' flag (maybe we do: maybe there is no accuracy data available, and all we can get from a legacy feed is '~')? And isn't the "inaccurate" flag (as Vince named it) about something else?

As Vince said, doing this means more careful data analysis to determine whether a value is normal or not, and how it should be graphed. Do we need to take this into account in the model in some way - there is already another CR to adjust how normal_range is modelled, and we have an is_normal function defined on DV_ORDERED (the ancestor of all the Quantity types in openEHR).
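As a sketch only (the flag and enum names below are placeholders for discussion, not a decided model):

    from dataclasses import dataclass
    from enum import Enum

    class MagnitudeStatus(Enum):
        EQUAL = "="             # exact point value (the default)
        LESS_THAN = "<"
        GREATER_THAN = ">"
        LESS_OR_EQUAL = "<="
        GREATER_OR_EQUAL = ">="
        APPROXIMATE = "~"       # like '=' but with some unknown error
        QUESTIONABLE = "?"      # value present but of doubtful accuracy

    @dataclass
    class DVQuantified:
        magnitude: float
        magnitude_status: MagnitudeStatus = MagnitudeStatus.EQUAL
        accuracy: float = 0.0              # the existing +/- band
        accuracy_is_percent: bool = False  # the existing Boolean

    @dataclass
    class DVQuantity(DVQuantified):
        units: str = ""

    # "TSH >2000": still a point value, just one we can only bound from below
    tsh = DVQuantity(magnitude=2000.0, units="mIU/L",
                     magnitude_status=MagnitudeStatus.GREATER_THAN)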

If we can get a bit more discussion on these details, I think we can fairly quickly state what changes are needed and write a CR for them.

- thomas

Sam Heard wrote:

Thomas,

I agree it is very common.
But when <5 is reported, in essence it means that it is an exception.
It is not a precise result. It does not mean only that it is less than 5; it means that something of an exceptional state is in order.
It could be zero, it could be 4.999, or anything in between, but not an exact figure like x = 5.1 units of a kind.

Gerard

– –

Gerard Freriks, arts

Huigsloterdijk 378

2158 LR Buitenkaag

The Netherlands

T: +31 252 544896

M: +31 654 792800

Gerard,
There are cases where we get a result like > 60 which is not an exception because the normal range is >60.

Heath

Hi Thomas,
Specific points:

  1. My pathology software supports <= and >= in this context, but I have not come across an automated blood analyser
    interface that supports or requires this, and in the couple of databases
    I looked at (approx. 15×10^6 numeric values over 8 years) no user has used these values.
    So probably good for completeness, but no apparent use in the real world!

  2. The "Inaccurate" flag generally means that for some reason this value should be treated with caution
    or is unreliable.
    It may in fact be perfectly accurate (as in the haemolysed blood K+ example) but not actually a
    measure of the defined analyte - i.e. rather than a serum K+ value, what was measured was
    serum K+ contaminated with intracellular K+.
    Similar issues occur with cold agglutinins (inaccurate values if performed on a specimen
    not kept at body temperature) and serum glucose measured on a specimen which does not
    contain a metabolic inhibitor ("fluoride tube") and has been kept at room temperature for too long.
    At other times it may actually be inaccurate, e.g. due to failure to calibrate an
    analyser correctly.

The fact that the value is unreliable often only becomes apparent after the event,
e.g. the doctor rings up to query a high-normal K+ on a patient whose K+ value
was expected to be low, and a visual/microscopic examination of the specimen
reveals haemolysis.
In the case of a badly calibrated analyser, statistical analysis of values performed routinely may demonstrate
that there has been unacceptable variability in results, or that the average result
was significantly higher than expected, or
(as happened very memorably in my practice in the Emergency department)
the values over a period of time fail to correlate with the clinical condition of the patients.

So it is absolutely necessary to be able to record (and keep) the
value, but also to be able to flag it, either at the time or sometime later, as unreliable/inaccurate.
It is probably not worthwhile (or even in some cases possible) to decide
whether an erroneous result is inaccurate or unreliable.

  3. Rules need to be provided as to how such values should be treated when comparing
    with normal ranges. For example, if the normal range is 0-6 and the value given is <5, then
    this is normal. However, if the normal range is 0-3, is this a normal value or not?
    This can be dealt with by "flavours of null" on a "normality flag" (see the sketch after this list).

  4. Applications such as graphing, statistics packages etc. need to be aware of
    such values and treat them appropriately. Some general guidance/rules around this for
    developers/users may be appropriate.
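Such rules might look like the following sketch, which treats a qualified value as the interval of possible true values and returns a three-valued answer (one possible "flavours of null" treatment; all names and choices here are illustrative):

    def is_normal(value, qualifier, low, high):
        """Three-valued normality check: True/False when decidable,
        None ('flavour of null' on the normality flag) when not.
        The qualified value is mapped to the interval of possible true
        values, clamped at 0 since most analytes cannot be negative."""
        inf = float("inf")
        intervals = {"=": (value, value),
                     "<": (0.0, value), "<=": (0.0, value),
                     ">": (value, inf), ">=": (value, inf)}
        if qualifier not in intervals:
            return None                # '~', 'I' etc.: leave undetermined
        lo, hi = intervals[qualifier]
        if low <= lo and hi <= high:
            return True                # every possible value is normal
        if hi < low or lo > high:
            return False               # no possible value is normal
        return None                    # partial overlap: indeterminate

    print(is_normal(5, "<", 0, 6))               # True - Vince's first example
    print(is_normal(5, "<", 0, 3))               # None - indeterminate
    print(is_normal(60, ">", 60, float("inf")))  # True - Heath's GFR case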

Regards
Vince

hi

I don't think that the concept of <, > etc. should be conflated with the concepts of approximate and doubtful in the model. Approximate and doubtful always raise the issue of why and how, and so I think that should be a matter for the archetype to resolve. However, < and > etc. should be a data type thing.

Grahame

Thomas Beale wrote:

Raise the proper flag that indicates that it is TRUE and we know how to interpret it.

GF

– –

Gerard Freriks, arts

Huigsloterdijk 378

2158 LR Buitenkaag

The Netherlands

T: +31 252 544896

M: +31 654 792800

Concerning my last reply on this subject,

I feel the appropriate solution is:
* add an attribute value_qualifier of type STRING with allowable values >, <, >=, <=, = (since this is a closed list, using coded terms doesn't seem to be useful)
* allow ELEMENT.null_flavour and DV_QUANTIFIED.accuracy to be used to cover Vince's "Inaccurate" (probably wrong) and '~' (slightly inaccurate, but usable value) cases respectively. In the latter case, it seems to me that if accuracy is going to be reported, it should be quantified, the way we do it in openEHR, i.e. +/- 5%, +/- 2 and so on. Vince - am I being unreasonable? Did you have '~' because lab devices output this?

Unless better ideas surface, I will create a CR on the basis of the above.

- thomas


Grahame, you are right - to express ">5 (inaccurate)" we need two flags...

I can't think of great names off the top of my head, but how about:

    * value_qualifier - the attribute that carries the <, >, = etc
    * value_status - an attribute that carries some other possible
      flags, e.g. ?, ~, others?

I am suggesting that Vince's '~' is more like a data quality marker than an indicator of how to read the value...'?' means inaccurate....possibly wildly? Are '~' and '?' really different? If the second flag was just to say accurate / inaccurate then we could just use a Boolean. That would probably cover 95% of needs and be simple at the same time....Vince - any comments on that?
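A sketch of that two-flag shape, with the Boolean simplification (all names provisional, just to make the discussion concrete):

    from dataclasses import dataclass

    @dataclass
    class DVQuantified:
        magnitude: float
        value_qualifier: str = "="   # how to read the number: =, <, >, <=, >=
        accurate: bool = True        # the simple Boolean alternative

    # ">5 (inaccurate)": the two flags carry independent information
    result = DVQuantified(magnitude=5.0, value_qualifier=">", accurate=False)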

I think we are close to a solution here.

- thomas

Thomas Beale wrote:

If you are going to capture error limits around a scalar quantity, then
you need to also capture the nature of those limits. Sometimes they are
simply co-efficients of variation, sometimes one or two (or 1.96)
standard deviations (as frequentist confidence intervals for normally
distributed data, or asymptotically normal confidence limits for
non-normally distributed data), sometimes they are non-normal confidence
limits, and occasionally (but often with clinical trials etc) they are
Bayesian credible intervals. Then there is the confidence level - often
95% but sometimes 99%, sometimes less. Will the proposed solution cover
these and other scenarios?

Tim C

Tim Churches wrote:

Thomas Beale wrote:
  

Concerning my last reply on this subject,

I feel the appropriate solution is:
* add an attribute value_qualifier of type STRING with allowable values
    

, <, >=, <=, = (since this is a closed list, using coded terms doesn't
      

seem to be useful)
* allow ELEMENT.null_flavour and DV_QUANTIFIED.accuracy to be used to
cover Vince's Inaccurate (probably wrong) and '~' (slightly inaccurate,
but usable value) cases respectively. In the latter case, it seems to me
that if accuracy is going to be reported, it should be quantified, the
way we do it in openEHR, i.e. +/- 5%, +/-2 and so on. Vince - am I being
unreasonable? Did you have '~' because labs devices output this?
    
If you are going to capture error limits around a scalar quantity, then
you need to also capture the nature of those limits. Sometimes they are
simply co-efficients of variation, sometimes one or two (or 1.96)
standard deviations (as frequentist confidence intervals for normally
distributed data, or asymptotically normal confidence limits for
non-normally distributed data), sometimes they are non-normal confidence
limits, and occasionally (but often with clinical trials etc) they are
Bayesian credible intervals. Then there is the confidence level - often
95% but sometimes 99%, sometimes less. Will the proposed solution cover
these and other scenarios?
  

Hm....that's a good question. Currently the model (see http://www.openehr.org/uml/Browsable/_9_0_76d0249_1109599337877_94556_1510Report.html) only captures limits as either a +/- percent, or as a +/- absolute value (see the accuracy attributes in the diagram) - it does this via the attribute accuracy_is_percent which is just a Boolean. What you are asking for would be accommodated by making it a code which indicated the meaning of the accuracy band.

So far we have not had such requirements expressed for the openEHR models, but as I happen to know you are coming from an epidemiological/public health/statistical point of view, clearly we need to accommodate them.

Tim, if the accuracy_is_percent attribute was upgraded to a coded value, could you suggest a set of meanings that would cover all the epi/PH needs?

- thomas

Thomas Beale wrote:
> Hm....that's a good question. Currently the model (see
> http://www.openehr.org/uml/Browsable/_9_0_76d0249_1109599337877_94556_1510Report.html)
> only captures limits as either a +/- percent, or as a +/- absolute value

I think the term "absolute value" as you are using it here (and I understand how you are using it) can lead you into a semantic trap, because +/- 2 is almost always a confidence band: the producer of the data is rarely absolutely certain that the true value is no more than 2 higher or 2 lower than the nominal value, just that there is a 95% (or whatever) likelihood that the true value is within those limits (and for the epidemiological and statistical purists, yes, I know that is not actually a correct statement for frequentist limits, but it is the most commonly used, albeit slightly mistaken, interpretation...).

Note also that confidence limits are not always symmetrical (because not all values have a normal distribution), so you need to be able to capture both upper and lower limits as separate values.

> (see the accuracy attributes in the diagram) - it does this via the
> attribute accuracy_is_percent which is just a Boolean. What you are
> asking for would be accommodated by making it a code which indicated the
> meaning of the accuracy band.
>
> So far we have not had such requirements expressed for the openEHR
> models, but as I happen to know you are coming from an
> epidemiological/public health/statistical point of view, clearly we need
> to accommodate them.

Almost all lab results are just best guesses from a statistical distribution, so it is not just an epidemiological/public health concern.

> Tim, if the accuracy_is_percent attribute was upgraded to a coded value,
> could you suggest a set of meanings that would cover all the epi/PH
> needs?

You'll have to tell me what that would involve. A single coded value? Upper and lower limits? Confidence level? Type of limit?

Tim C

Tim Churches wrote:

> You'll have to tell me what that would involve. A single coded value?
> Upper and lower limits? Confidence level? Type of limit?

well, essentially what you are proposing would require (let's not get too pure about how I use the word "accuracy" here for the moment):
- lower accuracy limit: Real
- upper accuracy limit: Real
- accuracy limit type: coded term
- confidence level (or this could be part of the previous coded attribute, since only a small number of confidence bands are used in practice, aren't they?)

Now, what we currently have is a set of general-purpose quantity classes designed to enable recording of any quantitative data we have come across so far. Between various MDs such as Sam, Vince and others, I think we have pathology covered from a practical point of view (well, we do once we get this <, >, etc. thing sorted).

The real question is: what is the type and origin of the data that needs to be represented in the more sophisticated way we are now suggesting? Is it a different category of data? Should we leave the current DV_QUANTITY as is and add a new subtype? Or should we consider a quantity with a 95% t-distribution confidence interval a pretty normal thing? Should we then start considering the "simple" idea of a symmetric accuracy range (+/- xxx) as really just one specific type of confidence interval (it might translate to something like 98% on a normal curve)? In other words, should we generalise the "accuracy" notion into a "confidence interval" notion?
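One hypothetical shape for such a generalisation, just to frame the question (names, codes and structure are all assumptions, not a proposal for the model):

    from dataclasses import dataclass
    from enum import Enum
    from typing import Optional

    class LimitType(Enum):
        PLUS_MINUS_ABSOLUTE = "abs"   # the current +/- absolute accuracy
        PLUS_MINUS_PERCENT = "pct"    # the current +/- percent accuracy
        STANDARD_DEVIATIONS = "sd"    # e.g. 1.96 SD
        CONFIDENCE_INTERVAL = "ci"    # frequentist confidence limits
        CREDIBLE_INTERVAL = "cri"     # Bayesian credible interval

    @dataclass
    class AccuracyBand:
        lower: float                  # separate limits, since (as Tim notes)
        upper: float                  # they are not always symmetrical
        limit_type: LimitType
        confidence_level: Optional[float] = None  # e.g. 0.95, where applicable

    # a 95% confidence interval of (4.1, 4.9) around a reported 4.4
    band = AccuracyBand(4.1, 4.9, LimitType.CONFIDENCE_INTERVAL, 0.95)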

- thomas

Hi,

A few words from a non-techie.

Quantity means the resulting figure expressing a quantity, e.g.:
Hb: 8.5 mmol/L

A property of the Hb measurement can be an uncertainty.
This is not an uncertainty of the figure "8.5", but of the Hb measurement, where 8.5 is the correct resulting number and mmol/L the code for the units.
There can be the question whether the reported 8.5 really is 8.5, with or without round-off error.
Only round-off could be added to DV_QUANTITY as an added extra property, I think.

Uncertainty is added information: the uncertainty of the measurement is plus or minus something according to a specified (or implied) distribution type.
In my view uncertainty is a property of the measurement, i.e. of the specific archetype/template that will express the number.
This uncertainty will be expressed in an archetype using attributes of DV_QUANTITY, expressing the uncertainty as limits and a distribution-type term (with a default Gaussian distribution?).

Gerard Freriks

– –
Gerard Freriks, arts
Huigsloterdijk 378
2158 LR Buitenkaag
The Netherlands

T: +31 252 544896
M: +31 654 792800

Gerard Freriks wrote:


The foregoing seems sensible to me.

I think that uncertainty or confidence interval (or credible interval)
information for a scalar quantity (such as a biochemistry result) should
be treated in a similar manner to normal ranges for lab test results.
How does openEHR handle normal ranges, which, depending on the type of
test, may be specific to each lab/assay method or kit and reference
population? My microbiologist colleagues keep reminding me that most
serological results can't really be interpreted in the absence of assay-
and lab-specific reference titres, and the same is true of many NAT
(nucleic acid test) assays, especially those involving PCR. Usually the
microbiologist or pathologist will provide their lab- and assay-specific
interpretation of the numbers for the requesting doctor, e.g. "titre 1 in
256, i.e. positive for XYZ", and it could be argued that it is enough to
capture just the interpretation of the numbers - but that doesn't seem
to be the guiding principle elsewhere in openEHR. For example, I marvel
at the completeness of the archetype for capturing blood pressure
measurements, right down to the detail of which phase of the Korotkoff
sounds is used, as I recall. Applying that same degree of attention to
detail to lab results means having the ability to accommodate quite a
lot of metadata about each scalar result. Mostly that detailed metadata
about accuracy or confidence limits or about assay types won't be
collected, won't be available or won't matter, but occasionally it will
matter, and I suppose that's what openEHR needs to plan for, within reason.

Tim C

Thomas Beale wrote:

> well, essentially what you are proposing

Not proposing anything, I'm just asking the question "Have you thought
about this?"

> would require (let's not get too pure about how I use the word
> "accuracy" here for the moment):
> - lower accuracy limit: Real
> - upper accuracy limit: Real
> - accuracy limit type: coded term
> - confidence level (or this could be part of the previous coded
>   attribute, since only a small number of confidence bands are used in
>   practice, aren't they?)
>
> Now, what we currently have is a set of general-purpose quantity classes
> designed to enable recording of any quantitative data we have come
> across so far. Between various MDs such as Sam, Vince and others, I
> think we have pathology covered from a practical point of view (well, we
> do once we get this <, >, etc. thing sorted).

Just curious: have you had much input from pathologists, microbiologists
and lab scientists? The more one talks to such people, the more one
discovers about the uncertainties inherent in certain assay techniques,
and the differences in the scalar (and qualitative or Boolean) results
produced by different assay kits and different labs.

Oh, there's another form of uncertainty which typically is of relevance
to Boolean/dichotomous results (positive/negative, detected/not detected
etc.) and that is the sensitivity and specificity of the test, or the
related quantities PPV (positive predictive value) and NPV. (Note to
computer scientists: "specificity" and "sensitivity" are cognate with
"precision" and "recall".)

> The real question is: what is the type and origin of the data that needs
> to be represented in the more sophisticated way we are now suggesting?
> Is it a different category of data? Should we leave the current
> DV_QUANTITY as is and add a new subtype? Or should we consider a
> quantity with a 95% t-distribution confidence interval a pretty normal
> thing? Should we then start considering the "simple" idea of a symmetric
> accuracy range (+/- xxx) as really just one specific type of confidence
> interval (it might translate to something like 98% on a normal curve)?
> In other words, should we generalise the "accuracy" notion into a
> "confidence interval" notion?

I think that a one or two day workshop with a range of pathologists,
microbiologists, lab scientists, epidemiologists and statisticians (and
some clinicians and computer scientists, of course) would suffice to
come up with a sensible answer to your question. I'd be happy to
participate and to suggest other participants. First half day would need
to be spent bringing everyone up to speed on openEHR so they understand
the nature of the question(s) to be addressed (and a good means of
spreading the openEHR gospel while you're at it...).

Might be possible to hold a cyber-workshop instead, via email or
real-time conferencing. The former would be much slower, of course.

Tim C