Z-scores and percentiles

We already have a ton of optional attributes like that - accuracy, ref ranges, magnitude_status, normal_status are not used in 95% of quantities - one more won’t hurt!!

2 Likes

Indeed, for DV_DURATION it is a Real, without indication of what this means if it is not a percentage; for all the other types it is correctly a DV_DURATION.

1 Like

Er yes - sign of old age when you can’t read your own model :wink:

I remembered an older design from Gehr, where we had something called ‘customary quantity’ or maybe ‘customary units’ - can’t quite remember. Anyway, the idea was to deal with things like:

  • months and days
  • weeks and days
  • feet and inches (still the norm for height in UK as well as US)
  • lb / oz
  • stones + pounds (body weight in the UK)

We might need to think a bit more in that mode if we want a more elegant solution to the current question.

That was the thinking that led us to stick to a simpler system. The only problem, of course, is if the originally entered units are not the canonical ones that would be recomputed later, e.g. 1 year and 5 days, or 40 weeks and 2 days - the top unit has to be remembered. But these are real edge cases, I think.

1 Like

Then the question: is this problem big enough to require a new datatype or an adaptation to the existing one? Is this needed often enough to motivate vendors to implement something new?

A pattern to be used could be something like:

  • Cluster: quantity with Z-score
    • Element: quantity: the measurement
    • Element: quantity: Z-score or percentile score
    • Element: dv_text with distribution

Obviously with more concrete naming. It's relatively cumbersome to do. As a pattern that is just reused, it’s doable; as an archetype, it gets problematic. In a template you could override the units and other constraints of the quantity that is the measurement; in an archetype that would require a specialisation for every use. Does this happen often, or is this just a few times in the entire scope of what should be in the CKM?
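As a rough illustration of what one instance of this pattern might look like, here is a sketch in plain Python. The class names and fields are invented, simplified stand-ins, not the actual RM specification classes, and the values are made up:

```python
from dataclasses import dataclass, field

# Hypothetical, simplified stand-ins for openEHR building blocks --
# names and fields are illustrative only.
@dataclass
class Quantity:
    magnitude: float
    units: str

@dataclass
class Element:
    name: str
    value: object

@dataclass
class Cluster:
    name: str
    items: list = field(default_factory=list)

# The pattern instantiated for a head circumference measurement:
hc = Cluster(
    name="Head circumference with Z-score",
    items=[
        Element("Measurement", Quantity(47.5, "cm")),
        Element("Z-score", Quantity(-1.2, "1")),  # dimensionless
        Element("Distribution", "WHO Child Growth Standards"),
    ],
)
```

The template-vs-archetype problem shows up as soon as you try to fix `units` per measurement: each concrete use needs its own constraint on the first Element.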

2 Likes

I have a problem with this topic. I don’t think Z-scores or percentiles are properties of the data; they are more a view of the data in a specific context. The percentile distribution will change depending on the dataset it is based on.

As such, this kind of information needs to be stored together with some decision support system where the recommendation is based on population X and not Y.

This might be modelled using existing features in tooling and specifications.

I discussed this with Silje the other day, and I didn’t really see the total picture here. Just sharing my thoughts; I look forward to feedback.

2 Likes

The problem with this is that we don’t want a generic CLUSTER archetype to represent the core elements of archetypes, for example Head circumference. This would have to be constrained in a template, which partly defeats the point of making specific archetypes for specific measurements.

Agree, and this is why the data set of the hypothetical STATISTICAL_VALUE class needs to contain an element for the distribution/source/whatever.

Strongly disagree. Percentiles and Z-scores are bits of information derived from population distributions and used for making clinical decisions, but they’re not inherently part of decision support systems wholly separate from the EHR data. They are used together with the measurements they’re derived from, and should be persisted in context.

3 Likes

Yes - this is what we look at differently. I can’t see why a percentile or Z-score should be stored together with e.g. the head circumference or a body height. The measurement is the same no matter what kind of percentile or Z-score is used to evaluate the consequence of the measurement.

@siljelb and I tried to understand each other's views the other day. Now we need someone to advise us on this :slight_smile:

1 Like

The estimated gestation is actually not a measurement but reflects the typical gestation associated with the actual measured parameters of a fetus in utero. The associated variance, expressed as +/- days, reflects the distribution associated with a cohort/community that may or may not be as formal as SDs, but in clinical practice it is used as a proxy for them. I’m not sure of the science behind where these gestation estimates +/- variance came from originally.
The actual things measured are uterine height, fetal BPD etc. The assigned gestation in weeks +/- days is inferred from the population statistics with associated variance, more akin to recording an associated population mean & SDs than not.
In any case, we are dealing with a Duration datatype, not an Amount/Quantity and we need to be able to associate units with it.

It needs to be recorded as a separate data element that is probably sourced from a test result or external knowledge base but is used as the basis for ongoing antenatal care.

As Silje and I discussed yesterday, actually quite a different thing to the Z score after all, and a diversion from the main subject (apologies for introducing the added confusion) but still needs its unique resolution.

As best I can see at the moment, accuracy is only available related to Quantity, as a real number or a percentage. The specs are not at all clear to me, and could do with more examples to clarify.

In addition, from a tooling POV, we need to be able to actively turn this kind of attribute ‘on’ in the modelling where we know it is always relevant, e.g. so that clinicians can review it correctly, as well as provide implementation advice to vendors. In other use cases, it can be available in the RM when needed.

3 Likes

The Child growth indicators archetype has been developed with that kind of view in mind. It is pretty clunky but it kind of works.
The recent use case is about spirometry. While the measurement is accurate, the Z score is increasingly used as a statistical tool for the expression of results, to not rely on a number measured by the machine but instead focus on where a patient sits on the bell curve from a similar population cohort as a more accurate way to express the severity of disease.
In this situation having a Z-score as an attribute/alternative expression for the measured value deserves tight alignment, especially as there are multiple spirometry measurements potentially expressed this way. Multiple CLUSTERs identifying knowledgebase, cohort, Z-score etc is unwieldy and can easily be disconnected from the actual result.

I suspect that we will see this kind of expression of measured values to become increasingly prominent, especially as we get more population-based health data and AI etc. Manually adding a Z-score associated with every measurement in the Spirometry model is possible but unwieldy. Potentially adding it into any/every Quantity data type will become a future nightmare.

3 Likes

I guess it’s still an individual estimate, but I see what you mean - in the ideal world of knowing exactly the cohort corresponding to the observable facts about the current pregnancy, you would just be quoting the EDD of that cohort rather than ‘estimating’ as such. But that’s probably close to impossible since the Obs/midwife is taking a figure from a much more general cohort and estimating adjustments to it based on e.g. course of previous pregnancies as well as current observables… so probably each woman is a cohort of one. The real question is probably whether you foresee the need to include some population related values like SD, variance, distribution etc with EDD.

Yep - but we’ll need to treat this as a separate question from the z-score one I think.

That is correct; we should add more examples. If you create a new PR here (press Create button at top), and add a subject and short description, that will record this need…

Do you mean multiple raw values, each with its z-value, and then separately from that, the relevant fixed distribution data? Is there an example that you/someone could create to show what the most complex case of this type of data recorded in a single encounter is? E.g. bullet-point logical structure or mindmap etc.

1 Like

“While the measurement is accurate, the Z score is increasingly used as a statistical tool for the expression of results (https://pubmed.ncbi.nlm.nih.gov/29873048/), to not rely on a number measured by the machine but instead focus on where a patient sits on the bell curve from a similar population cohort as a more accurate way to express the severity of disease.”

An interesting discussion from different perspectives, with good arguments both ways. Bjorn invited wider input, so here are my thoughts.

Placing measurements in context is always useful, and so is interpreting their meaning and consequence. Like any such statistic, the z-score rests on assumptions about the population it describes, and these are matters for empirical investigation. The z-score rests on the assumption of a normal distribution [Z = (X − μ)/σ]. This is often a good approximation and sometimes very much not: in the world of endocrinology, distributions I have tracked, even after logarithmic transformation, were highly skewed. Even assuming a distribution is usefully characterised as normal, the question of what ranks as a similar population will often remain highly contextual. As David Spiegelhalter’s wonderful book, The Art of Statistics, emphasises, it is important to visualise the raw data. It is a tricky path when we start to treat statistics themselves as if they are data. Hence, I imagine, some of Bjorn’s concern.
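To make the dependence on the normality assumption concrete, here is a small sketch with made-up reference values: the z-score itself, its percentile reading under normality, and a skewed (log-normal) counter-example where the same mapping misreads the rank:

```python
from math import exp
from statistics import NormalDist

def z_score(x, mu, sigma):
    # Z = (X - mu) / sigma
    return (x - mu) / sigma

# With a hypothetical reference population of 3.0 +/- 0.5:
z = z_score(2.1, 3.0, 0.5)           # -> -1.8
pct = NormalDist().cdf(z) * 100      # ~3.6th percentile, IF normality holds

# Counter-example: a log-normal (skewed) population with median 1.
# Its true 97.5th percentile is exp(1.96) ~= 7.10, but pushing the
# z-score of that value through a normal curve misreads its rank:
mean = exp(0.5)                               # ~1.65
sd = ((exp(1) - 1) * exp(1)) ** 0.5           # ~2.16
z_skewed = z_score(exp(1.96), mean, sd)       # ~2.52
naive_pct = NormalDist().cdf(z_skewed) * 100  # ~99.4, not 97.5
```

The last two lines are the skew problem in miniature: once the distribution is no longer symmetric, the same |z| corresponds to quite different population ranks.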

I can see the practical arguments in and around how best to capture such scores within the openEHR modelling paradigm. I can see the argument that confusing scores with raw data can become a slippery slope. As so often in matters medical, these are judgements about general and particular cases. openEHR methodology is a general formalism and struggles, as all such formalisms must do, to balance this with particular cases. Too many particular adaptations and the generality loses its power and appeal. Too much blanket generalisation and the real world loses touch.

I haven’t yet read the later postings this morning. I will follow with interest and hope a wider group will add their thoughts, as Bjorn encouraged!

3 Likes

This should be carved on the lintel of the main entrance to the openEHR edifice :wink:

3 Likes

Hi everyone.
My contribution to the discussion will elaborate on the use case of Z-scores I am most familiar with: LFTs. Although I can’t confidently choose one side of the ‘barricade’, I hope this insight can help the discussion.

  • Historically, ‘normal’ values for spirometry were defined by equations derived from data from relatively small, male-predominant populations; in Europe, this was aggravated by the fact that this population was almost exclusively composed of white male miners. Women’s ‘normal’ was defined as 80% of the male average values, and for each gender the normal interval was defined as 80-120% of the average value;

  • Recently, new ‘normal’ values were derived from GLI (Global Lung Initiative) raw data;

  • This same movement proposed to use Z-scores to define normality (−1.64 < Z < +1.64), on the assumption that 80-120% was an even worse approximation of the normal interval;

  • Both GLI equations (to define normality) and z-scores (to express results) are now routinely used.
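For what it's worth, the ±1.64 cut-off above corresponds, under the normality assumption, to the central 90% of the reference population, i.e. roughly the 5th-95th percentile band:

```python
from statistics import NormalDist

# Percentile rank of the z cut-offs, assuming a normal distribution:
p_low = NormalDist().cdf(-1.64) * 100   # ~5.0
p_high = NormalDist().cdf(1.64) * 100   # ~95.0
```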

Nevertheless, absolute ‘raw’ values still support decision making in some settings (e.g. FEV1 > 1 L for lung resection in cancer). Moreover, if we don’t record the raw values (measurement and equation-derived normal), it will be harder to accommodate future changes in test result expression.

My punchline could be ‘raw data is more future-proof, but clinicians need immediate, effortless data transformations for ongoing decision support’.

4 Likes

Deja vu you say?
https://www.mail-archive.com/openehr-clinical@lists.openehr.org/msg04184.html

(responses at the time are at the bottom)

2 Likes

C’est vrai, Seref! - just had a look – it might be worth re-posting the points you made then.

2 Likes

Sure David. Here it is:

Hi Heather,

I’d humbly advise against making Z-score an attribute of every quantity data type.

The first reason is that the Z-score is only a meaningful metric under the assumption of normality, that is, when the possible values of the numeric quantity demonstrate a particular characteristic (the bell curve).

In a non-normally distributed case, the Z-score is meaningless, and even though many stakeholders end up assuming a normal distribution, there is a lot of data out there that is not distributed normally.

So if Z-score becomes an attribute of every quantity type, it will be an attribute that implies a particular statistical/probabilistic context when it is set, and it won’t be set that many times given all the instances of quantity types in openEHR.

So I believe, from a software design/implementation perspective, Z-score is a higher-level concept than the quantity primitives we have in the reference model, and should be modelled at the archetype level.

The slightly more subtle reason for not making the Z-score an attribute of quantity types is the wider definition of a probabilistic event. A particular diagnosis in a population could be interpreted as a probabilistic event, for which there is a population distribution - the prevalence, if we bend the language towards the clinical lingo.
Now, when that diagnosis is expressed with a DV_CODED_TEXT and mappings to a snomed_ct code, there is also a valid requirement to associate a Z-score with it, say to assess the likelihood of a particular diagnosis in the context of the patient’s data.
So now it may make sense to add a Z-value to dv_coded_text as well :slight_smile:

So even though it looks like a simple metric, the Z-value associates statistics and all things based on it (machine learning, risk estimation etc.) with clinical data, and that association should be very clearly described IMHO, which would not be possible if we do it at the RM level.

If all of the above makes no sense and I completely misunderstood the suggestion, then accept my apologies along with a nice pint that I’ll buy when I see you next time :slight_smile:

Kind regards
Seref

2 Likes

Replying to myself from 2017: introducing a Stochastic package to the RM, with Dv_Beta, Dv_Normal etc., say, based on the research at hand, may solve the problem I pointed at. At the archetype level, modellers can compose values of these types with the DV_whatever’s we already have, using a cluster.

And no, Tom, they are not to inherit from DV_Stochastic - just go with composition over inheritance for once, please :slight_smile: Let them float freely in that package…
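A minimal sketch of that composition idea follows. All class names here are hypothetical (there is no DV_NORMAL in the RM), and the values are invented:

```python
from dataclasses import dataclass

@dataclass
class DvNormal:
    """Hypothetical distribution type from the proposed Stochastic
    package; deliberately does NOT inherit from any DV_* class."""
    mean: float
    sd: float

    def z(self, x: float) -> float:
        return (x - self.mean) / self.sd

@dataclass
class DvQuantity:
    """Simplified stand-in for the existing DV_QUANTITY."""
    magnitude: float
    units: str

# Composition in a cluster: the measurement and its reference
# distribution sit side by side; the z-score is derivable rather
# than a new attribute on the quantity itself.
fev1 = DvQuantity(2.4, "L")
ref = DvNormal(mean=3.2, sd=0.4)
z = ref.z(fev1.magnitude)   # -> -2.0
```

The design point is that the quantity type stays untouched; the statistical context lives in its own value, named and sourced explicitly by the archetype.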

1 Like

I’m concerned modelling z-scores et al manually would make for very unwieldy archetypes. Especially if it was to be done as a cluster. What would something like that look like, and how would it impact archetypes as standardised information “packages” for specific clinical concepts?

I’m also having trouble seeing the difference in principle between this and reference ranges. Reference ranges are usually derived from a distribution (normal or otherwise), with the lower and upper limits defined by certain lower and upper percentiles.
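That equivalence is easy to show with invented reference values: a 2.5th-97.5th percentile reference range and a |z| < 1.96 criterion encode the same distributional information:

```python
from statistics import NormalDist

# Hypothetical reference population: mean 3.0, SD 0.5
dist = NormalDist(mu=3.0, sigma=0.5)

# Reference range expressed as percentiles of the distribution...
lower = dist.inv_cdf(0.025)   # ~2.02
upper = dist.inv_cdf(0.975)   # ~3.98

# ...and the same bounds recovered from the z cut-off:
z_cut = NormalDist().inv_cdf(0.975)   # ~1.96
assert abs(lower - (3.0 - z_cut * 0.5)) < 1e-9
assert abs(upper - (3.0 + z_cut * 0.5)) < 1e-9
```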

1 Like