Pathology numeric values not supported in DV_Quantity

Sam would be better able to give an idea of all the health professionals who have been consulted, but certainly in Australia, Vince McCauley (a pathologist) has been extremely helpful on pathology result detail. Also, people like Heath Frankel and Grahame Grieve who have worked with HL7v2 messages for years have provided quite a lot of input on details (for example in Release 1.0, there is now a summary attribute for Historical data structures, directly due to Grahame's advice on the shape of lab data his software handles - see http://www.openehr.org/uml/Browsable/_9_0_76d0249_1109157527311_729550_7234Report.html).

Is it enough? At this stage I would be fairly confident that the models are good enough for most pathology data (certainly everything any of the docs working with openEHR has seen). Are they perfect? Of course not. We always need more input. The confidence level stuff implied by your requirements (let's treat them as epi/public health data requirements) would make things better; we just have to determine a) what scope of data they apply to (e.g. how much sophistication do we need in the EHR compared to say a dedicated data warehouse designed for statistical studies?) and b) how to add them to the current model in a way compatible with what is there.

I think that the idea of a workshop is a good one; I would prefer to see clinical professionals here take up the suggestion and do something with it; I don't see these kinds of discussions as being IT driven - they are all about articulating requirements.

- t

Tim Churches wrote:

I agree with this - that it's good enough now.

I think this thread is starting to talk about things which aren't properly part of the data type; they are conceptual things about the result values, and should be modelled explicitly in the archetypes.

Grahame

Thomas Beale wrote:

Grahame Grieve wrote:

I agree with this - that it's good enough now.

I think this thread is starting to talk about things which aren't properly part of the data type; they are conceptual things about the result values, and should be modelled explicitly in the archetypes.

Grahame,

is it your feeling that we need to have a better model of accuracy, i.e. more like the confidence interval idea? Or are we ok with what we have? My gut feeling is to leave the current DV_QUANTITY the way it is and consider either
a) doing nothing - treat Tim's requirements as requirements not on primary data going into the EHR, but on generated data of the kind found in an epidemiological/statistical style of system or
b) add a variant of DV_QUANTITY (probably a subtype) that does the full deal (and is convertible into a vanilla DV_QUANTITY).
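Option (b) can be made concrete with a sketch. All class and attribute names below are illustrative only, not the actual openEHR reference-model definitions; the idea is a hypothetical subtype that carries an optional confidence interval and degrades gracefully to a plain quantity:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DvQuantity:
    """Vanilla quantity: a magnitude plus units (names are illustrative,
    not the actual openEHR reference-model class)."""
    magnitude: float
    units: str

@dataclass
class DvStatisticalQuantity(DvQuantity):
    """Hypothetical subtype doing 'the full deal': an optional
    confidence interval alongside the point estimate."""
    ci_lower: Optional[float] = None
    ci_upper: Optional[float] = None
    ci_level: float = 0.95  # e.g. a 95% interval

    def to_vanilla(self) -> DvQuantity:
        # Conversion keeps the point estimate and drops the interval.
        return DvQuantity(self.magnitude, self.units)

hb = DvStatisticalQuantity(140.0, "g/L", ci_lower=138.0, ci_upper=142.0)
plain = hb.to_vanilla()  # -> DvQuantity(magnitude=140.0, units='g/L')
```

The point of such a subtype is that systems which don't understand the statistical detail can always fall back to the vanilla form.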

thoughts anyone?

- thomas

hi Thomas

is it your feeling that we need to have a better model of accuracy, i.e. more like the confidence interval idea? Or are we ok with what we have?

Well, a measured quantity is a group of data, with some or all of the following things known:
- what was measured
- how it was measured (+ who & when & where & environment conditions!)
- units for the values
- a possible range of values

What was measured usually does need to be known, though in data-type terms it is usually considered to sit outside the data type itself. I wonder whether that is actually right, but we're not discussing that right now.

In real life, we generally don't count how something was measured as part of the value; the idea is that you go and read the "methods section" (whatever that means) if you care. In clinical medicine, though, there are a few things where how something is measured matters; there, we generally say that something else was measured at that point. A classic example is Total Calcium and Ionized Calcium (and it's not wrong to call them different things; my point is that the distinction is arbitrary). Anyhow, I've never heard anyone argue that this should be part of the data type.

I don't think there's much point in differentiating between a measured
and a non-measured quantity - that's for philosophy.

so, back to the possible range of values. This is a complex concept. Generally, the possible range of values is a bell-shaped probability distribution (or a log bell curve), but it's rarely properly known whether it actually is - it's generally just assumed to be a bell curve. You could *approximate* the concept of a probability distribution by reporting a central value with a +/-, or an interval that covers 95% of the distribution.
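As a minimal sketch of that approximation (assuming the error distribution really is normal, and using the conventional 1.96-standard-deviation half-width; the numbers are purely illustrative):

```python
def interval_95(mean: float, sd: float) -> tuple:
    """Approximate an assumed-normal error distribution by a central
    value +/- 1.96 standard deviations: the interval expected to
    contain about 95% of repeat measurements."""
    half_width = 1.96 * sd
    return (mean - half_width, mean + half_width)

# Illustrative numbers: a result of 140 with an analytic SD of 1
lo, hi = interval_95(140.0, 1.0)  # approximately (138.04, 141.96)
```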

I know that we got taught in uni to track uncertainties (and sometimes even to quantitate the distribution curve), and to bring them through our equations (and conclusions!), but out in the real world, it's rarely done in published papers (shame, really) and I've never seen it done in clinical work (even in clinical research).

In clinical medicine, the only behaviour I've seen is to report a single value - what was actually measured - and not say anything at all about the uncertainty. No, I'm wrong: I once used to perform an assay where the methodological uncertainty in the number was clinically significant. We used to report a range rather than a point value, so that the doctors couldn't be mistaken about its meaning.

Reporting <X or >X for a value is something that you have to do if you
aren't normally reporting a range of values. So you said you didn't want
to model that as an interval, but I was less than convinced - if you always
reported an interval, it would be consistent. But even if you were consistent
in this way, the methodological basis for the "interval" <5 or >5000 is not
the same as the methodological basis for 100-110. These concepts overlap.

If you added a confidence interval - as an optional item - then you get an interesting situation. If I say that this value is <50 (ci=100), what am I saying? (and don't laugh, this is a common clinical result value to report).

Also, in clinical medicine, the things that may corrupt the result (interference from drugs, unusual medical conditions, etc.) don't contribute to the distribution range, so they're not usually significant here.

This is starting to ramble. As I said, in clinical medicine, we report only a single value and let the interpreter figure out the distribution themselves. If they're not sure, they should contact the number on the report (in all legal jurisdictions I know, there must be one).

I think that for the rare cases where the distribution range needs to be conveyed/stored outside the generating system, then the archetype should
store it. The archetype already includes some of the other stuff in my original data grouping, so I don't see it as inappropriate to solve it this way.

so, leave it as it is.

Grahame

Thomas Beale wrote:

Sam would be better able to give an idea of all the health professionals
who have been consulted, but certainly in Australia, Vince McCauley (a
pathologist) has been extremely helpful on pathology result detail.
Also, people like Heath Frankel and Grahame Grieve who have worked with
HL7v2 messages for years have provided quite a lot of input on details
(for example in Release 1.0, there is now a summary attribute for
Historical data structures, directly due to Grahame's advice on the
shape of lab data his software handles - see
http://www.openehr.org/uml/Browsable/_9_0_76d0249_1109157527311_729550_7234Report.html).

Is it enough? At this stage I would be fairly confident that the models
are good enough for most pathology data (certainly everything any of the
docs working with openEHR has seen). Are they perfect? Of course not. We
always need more input. The confidence level stuff implied by your
requirements (let's treat them as epi/public health data requirements)
would make things better; we just have to determine a) what scope of
data they apply to (e.g. how much sophistication do we need in the EHR
compared to say a dedicated data warehouse designed for statistical
studies?) and b) how to add them to the current model in a way
compatible with what is there.

Sure, that's a very reasonable position. I was not suggesting that
openEHR *must* accommodate such things, but as someone else opened
the Pandora's Box of +/- accuracy as a data value property, as opposed
to part of a higher-level Archetype construct, I felt obliged to point
out that there was more to it than there might appear at first glance.
But a system for EHRs can't accommodate every subtlety of the Universe,
so best to force the lid back down on the Box in this case.

I think that the idea of a workshop is a good one; I would prefer to see
clinical professionals here take up the suggestion and do something with
it; I don't see these kinds of discussions as being IT driven - they are
all about articulating requirements.

Happy to participate and to suggest other participants if someone wishes
to organise one.

Tim C

Thomas Beale wrote:

Grahame Grieve wrote:

I agree with this - that it's good enough now.

I think this thread is starting to talk about things which aren't
properly part of the data type; they are conceptual things about
the result values, and should be modelled explicitly in the archetypes.

Grahame,

is it your feeling that we need to have a better model of accuracy, i.e.
more like the confidence interval idea? Or are we ok with what we have?
My gut feeling is to leave the current DV_QUANTITY the way it is and
consider either
a) doing nothing - treat Tim's requirements as requirements not on
primary data going into the EHR, but on generated data of the kind found
in an epidemiological/statistical style of system or

I think that is the best option, but I must point out that not all of
the things I mentioned were purely of interest to epidemiologists and
statisticians. I don't have time to look right now, but I am sure
openEHR has the issue of normal/reference ranges for lab results well
and truly covered, but the issue of the specificity/sensitivity/PPV/NPV
of a particular type/method/brand of test *is* of immediate clinical
interest in some circumstances, and doesn't belong only in a data
warehouse. But it probably belongs in the Archetype, not the reference
model.

Tim C

Dear Tom and all

I must say that quantifying accuracy and uncertainty is very difficult - and I do like the inclusion of ~ in the set of flags to mean 'approximately', for when there is no idea of the accuracy from a mathematical point of view. I think we may lose something if we try to get it all into the notion we currently have of a measured accuracy.

Sam

Thomas Beale wrote:

Sam Heard wrote:

Dear Tom and all

I must say that quantifying accuracy and uncertainty is very difficult - and I do like the inclusion of ~ in the set of flags to mean 'approximately', for when there is no idea of the accuracy from a mathematical point of view. I think we may lose something if we try to get it all into the notion we currently have of a measured accuracy.

Sam

notwithstanding the fact that the '~' flag probably worked well in Vince's system, I cannot help but wonder what it really adds; if you don't know either the absolute limits (of the measuring device) or statistical confidence limits, how do you compute with it? How can I write a decision support program that takes notice of a '~' flag?

- thomas

Tom,
Does leaving "the current DV_QUANTITY the way it is" include the ability to
record "< 5 mmol/L" for example?

Heath

Heath Frankel wrote:

Tom,
Does leaving "the current DV_QUANTITY the way it is" include the ability to
record "< 5 mmol/L" for example?
  

yes - sorry - that was ambiguous - we have to make that addition (using a coded attribute).

- t

The 'coding' is surely 'Accuracy' ('Measurement' has 'Accuracy'), where this can be None|~|Unknown|Percentage(value)|SD(distribution type, value),
which would cover any measurement (e.g. height, heart rate), not just pathology lab values.

Regards,
Colin Sutton
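One way to read Colin's None|~|Unknown|Percentage|SD alternation is as a small coded type. This is only a sketch of his proposal; the labels and the "normal" distribution string below are my own assumptions, not an agreed vocabulary:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class AccuracyKind(Enum):
    EXACT = "none"          # no measurement error at all, e.g. a count
    APPROXIMATE = "~"       # approximate, degree unquantified
    UNKNOWN = "unknown"     # accuracy simply not recorded
    PERCENTAGE = "percent"  # +/- a percentage of the magnitude
    SD = "sd"               # standard deviation of a named distribution

@dataclass
class Accuracy:
    kind: AccuracyKind
    value: Optional[float] = None        # used by PERCENTAGE and SD
    distribution: Optional[str] = None   # used by SD, e.g. "normal"

height = Accuracy(AccuracyKind.PERCENTAGE, value=1.0)               # +/- 1%
heart_rate = Accuracy(AccuracyKind.SD, value=2.5, distribution="normal")
```

Note that this bakes in exactly the ambiguity Thomas raises further down: whether 'None' means "exact" or "not recorded" has to be decided up front, which is why the sketch splits them into EXACT and UNKNOWN.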

It is a flag that says the value is very uncertain - accuracy is not known (how do we say this?) - or that a quality factor makes the reading very uncertain. I just want to be able to see how we express when accuracy is poor but not quantifiable.
Sam

Thomas Beale wrote:

Sam Heard wrote:

It is a flag that says the value is very uncertain - accuracy is not known (how do we say this?) - or that a quality factor makes the reading very uncertain. I just want to be able to see how we express when accuracy is poor but not quantifiable.
Sam

doesn't it mean that the value is completely useless, and that instead, a null-flavour flag should be set in the Element, and no data value be recorded at all?

- thomas

Thomas,

In a data type like DV - in my mind - the only flags that can be raised are those that indicate the technicalities of that number, and that means the "round-off error" with which it is reported.

All other flags are at the archetype level.
Null-flavours belong there. It is all at the semantic level, the knowledge level, and not at the numeric interpretation level.
And not the numeric interpretation level.

Gerard

-- CEN/tc251 Convenor --

Gerard Freriks, MD
convenor CEN/tc251 WG1

TNO ICT
Brassersplein 2
Delft

T: +31 15 2857105
M: +31 6 54792800

Thomas Beale wrote:

Sam Heard wrote:

It is a flag that says the value is very uncertain - accuracy is not
known (how do we say this?) - or that a quality factor makes the reading
very uncertain. I just want to be able to see how we express when
accuracy is poor but not quantifiable.
Sam

doesn't it mean that the value is completely useless, and that instead,
a null-flavour flag should be set in the Element, and no data value be
recorded at all?

There is no such thing as a pre-condition in clinical medicine: the
clinician has to compute with whatever parameter values are provided, no
matter how shabby, incomplete and inconsistent they are... Floyd and
Hoare would have hated being doctors of medicine, I suspect. :wink:

Tim C

Tom
That is unlikely - it just means it is the best that could be done. Remember this can be used usefully outside of labs too.
Sam

Thomas Beale wrote:

Colin Sutton wrote:

The 'coding' is surely 'Accuracy' ('Measurement' has 'Accuracy'), where this can be None|~|Unknown|Percentage(value)|SD(distribution type, value),
which would cover any measurement (e.g. height, heart rate), not just pathology lab values.
  

this seems pretty close to a correct model. Slight corrections I would suggest are:
- I am still uncomfortable with '~', since it seems to mean "approximate", but "we don't know how approximate"...
- does "None" mean a) none recorded (i.e. don't know, i.e. same as '~') or b) no accuracy, i.e. an exact value (reasonable for some things, e.g. the answer to the question "number of previous pregnancies")?
- in the case of a statistical distribution, one value may not be enough to characterise the limits, since the distribution may be asymmetric (I don't remember enough beyond normal/T/Chi2 to remember if there are distributions that need even more parameters).

The question for us in openEHR is how much to implement of such a model: we have to be driven by real use cases.

- thomas beale

Thomas Beale wrote:

Colin Sutton wrote:

The 'coding' is surely 'Accuracy' ('Measurement' has 'Accuracy'), where
this can be None|~|Unknown|Percentage(value)|SD(distribution type, value),
which would cover any measurement (e.g. height, heart rate), not just
pathology lab values.
  

this seems pretty close to a correct model. Slight corrections I would
suggest are:
- I am still uncomfortable with '~', since it seems to mean
"approximate", but "we don't know how approximate"...
- does "None" mean a) none recorded (i.e. don't know, i.e. same as '~')
or b) no accuracy, i.e. an exact value (reasonable for some things, e.g.
the answer to the question "number of previous pregnancies")?
- in the case of a statistical distribution, one value may not be enough
to characterise the limits, since the distribution may be asymmetric (I
don't remember enough beyond normal/T/Chi2 to remember if there are
distributions that need even more parameters).

In terms of statistical confidence limits/intervals, the parameters are:
the type of limits/interval (frequentist "confidence interval" or
Bayesian "credible interval"), the confidence value (typically 95%, but
often not), and the underlying assumed *error* distribution (normal,
Poisson, Student's T, Weibull, etc.).

However, confidence intervals/limits don't indicate where in a
population distribution a particular value lies - quantiles are more
often used for this - the actual quantile of the value (eg for growth
measurements read against a normogram), or the values of the quartiles
or 5th and 95th percentile, or variations on that.
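The quantile reading can be sketched under a normality assumption using only the standard library's error function. The reference mean and SD below are invented for illustration, not real growth-chart values:

```python
import math

def population_centile(value: float, mean: float, sd: float) -> float:
    """Fraction of an assumed-normal population lying below `value` -
    i.e. the centile a measurement would be read off at on a chart."""
    z = (value - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical reference population: mean 100 cm, SD 4 cm
centile = population_centile(96.0, 100.0, 4.0)  # z = -1, about the 16th centile
```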

The question for us in openEHR is how much to implement of such a model:
we have to be driven by real use cases.

If you really want to nail this problem, a workshop involving a range of
people (from lab scientists to pathologists to clinicians to
epidemiologists and biostatisticians) is required, I think. It could be
in the form of a virtual workshop via email, but you really need to
gather together a diverse group, state the problem/s to be solved (eg
lab values only, or physical measures only, or to include other things
like measures derived from psychological scales or population or study
measures like odds ratios and relative risks or age-standardised
rates?), and get them to generate use cases and explore the issues.
Happy to be involved, but as an epidemiologist, I'd feel more
comfortable if some mathematical statisticians and some lab scientists
were involved too.

Tim C

Folks,

I will repeat myself.

You are talking about a data type.
This DV_Quantity is a number.
The question is how we embellish this data type, and the number it presents, with extra codes/numbers to indicate types of certainty/uncertainty and statistical distributions.

The only real meaning of an extra attribute as part of DV_Quantity pertains to the number given and not the meaning (interpretation).
The extra attribute in DV_Quantity will provide information about the precision of the number, only.

Any extra information is a property of the concept in which DV_Quantity is used, e.g. certainty/uncertainty, distribution, etc.
It is related to the specific concept and its context that is being expressed and not the expression of a number/data type.

~, statistical distributions, etc will have to be expressed at the level of Concept definition and therefore the Archetype.

Greetings

Gerard

--

Gerard Freriks, arts

Huigsloterdijk 378

2158 LR Buitenkaag

The Netherlands

T: +31 252 544896

M: +31 654 792800

Tim,

I agree with the workshop idea, and assume that it could at least be done in Australia as a starting point. Thus, for the short term, I am inclined to add only the very simple "<, >, <=, >=, =" indicator, and possibly consider the "~" one (since these at least allow us to properly represent very low and very high path test values that are sent as "<5" and similar). The complex stuff that Tim has described below needs proper modelling and in the end will lead to new data types (and as Gerard says, it may well lead to something in the archetypes). As with everything, we need to really understand the exact requirements first, and that probably won't happen without a workshop.
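A minimal sketch of that simple coded indicator (attribute names are my own, not the final openEHR ones):

```python
from dataclasses import dataclass

MAGNITUDE_STATUS = {"=", "<", ">", "<=", ">=", "~"}  # the proposed codes

@dataclass
class Quantity:
    """Sketch of a quantity carrying the coded comparison indicator."""
    magnitude: float
    units: str
    magnitude_status: str = "="  # "=" is the ordinary, exact-value case

    def __post_init__(self):
        if self.magnitude_status not in MAGNITUDE_STATUS:
            raise ValueError(f"unknown magnitude_status: {self.magnitude_status}")

    def __str__(self) -> str:
        prefix = "" if self.magnitude_status == "=" else self.magnitude_status
        return f"{prefix}{self.magnitude:g} {self.units}"

result = Quantity(5.0, "mmol/L", magnitude_status="<")
print(result)  # prints "<5 mmol/L"
```

This captures the "<5 mmol/L" pathology case Heath asked about without committing to any of the heavier statistical machinery.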

- thomas

Tim Churches wrote: