Z-scores and percentiles

I’m reading and learning (all this is new to me) but have some UI related input.

I started generating forms from the OPTs and it helps to have a clear purpose for the DV_ types. They are prime candidates for the optimized entry components.

So just from the UI point of view I would vote for a new DV_ type to be considered.

Well, they might just be defined by physics / biochemistry etc. as well - e.g. a systolic of 180 will have a physical implication for weaker vessels / stroke etc. that could be determined by observation of populations, but also (in more recent times) by a physics / fluid dynamics model.

Taking a random example: I see in Chernecky & Berger 3rd Ed. that PaCO2 has a reference range termed ‘panic values’: <= 20mmHg, > 70mmHg. SpO2 panic range: <= 60%. Is this determined from population studies, or is there a model that predicts that <=60% SpO2 will cause irreversible cell injury? I don’t know, but I don’t think reference ranges should have to be based only on statistical variance etc.

1 Like

I agree, and I’m not saying they always are. Just usually. (Reference range - Wikipedia)

1 Like

I am not knowledgeable enough anymore to weigh the requirement and implementation issues in play in this discussion. It is clearly an important matter, though, as I outlined before. I hope the following further reflections may be useful in helping decide the immediate issues you face with handling z-scores.

Silje’s question is important and the approach she rightly points out for defining a reference range for a relevant, well-defined and well-sampled reference population seems unarguable and useful to have available in context of interpreting the raw measurement. This use relates directly to the datum itself and not to a statistic derived both from the datum and summary characteristics of an assumed population from which it is drawn (mean and standard deviation in the context of Z-score). That is a difference which may carry significant consequences downstream – both for openEHR methodology and how it is used in practical clinical context. It can open a door to noise and bias in judgement (see below), so needs to be understood carefully. One has to think about how such information is going to be used.

I see long lists of clinical measurements and their reference ranges on my phone, in my GP health record. At my age, I get screened from time to time using automated tests of blood biochemistry that report maybe 50 such numbers. The GP is alerted about any that may be drifting outside this range and is invited to comment for the record when they do. Usually ‘acceptable’ or ‘will keep an eye on this next year’ or the like – in my case, fortunately, still! It’s a rule-of-thumb judgement based on personal experience – the GP’s experience of lots of folks in their mid-seventies, like me, and of my particular profile. It is a judgement, and as Kahneman emphasises in his brilliant recent book (Noise: A Flaw in Human Judgment), there is a lot to be said for treating such judgement as akin to a measurement, with associated bias and noise. The examples he uses in developing this argument, notably in judicial proceedings, are mind-boggling! It’s a great read.

In the clinical world, trends within a statistically defined reference range may well be individually relevant and may be significant (someone normally at one end of the range and progressing steadily towards the other end, for example). Normal clinical variability in a population is notoriously wide and can mask ongoing trends that are already well advanced by the time they drift outside the defined reference range. It’s hard not to see this surveillance of data merging into the world of analysis and judgement. The issues being debated here are about a grey area between characterising and recording measurement and characterising and recording judgement and action.

The kinds of measurements Thomas was mentioning – partial pressure of CO2 or oxygen saturation – are not so easy, or perhaps useful, to compartmentalise in reference ranges. They reflect all sorts of non-linearities that can quickly become acute problems. The shape of the oxygen dissociation curve is such that the body copes well until deoxygenation progresses to a different region of the curve, where the blood’s ability to transport oxygen to support metabolism becomes rapidly more at risk. I doubt anyone would thank their doctors for awaiting a crisis alert at oxygen saturation as low as Thomas quoted! Likewise, the CO2 range he mentions needs to be understood in the context of the ongoing physiological mechanisms of lung function, gas exchange and tissue metabolism. The numbers quoted are so wide-ranging that they mean rather little in this practical context of management. I spent nearly 20 years modelling such physiology and putting it to work in clinical context. It’s hard to get beyond applying fairly simple-to-follow, experience-based rules of thumb in this area and hoping for the best! I’ve seen this reality at first hand in both neonatal and adult ICU contexts over the years.

It seems to me that it is unwise to let decision support-like issues creep too far into data definitions. That’s why this particular use-case around the Z-score seems a fairly pivotal one. I assume no one is arguing for a machine learning algorithm to become associated with data definitions. If so, we should probably abandon openEHR and let the machines get on with it! We’re not quite ready for that (yet, or hopefully ever!), but where do we steer our modelling paradigm as boundary cases like this challenge existing openEHR methodology and require it to evolve further? It’s an empirical matter, and that’s why implementation, implementation, implementation (in all the domains openEHR interacts with) must remain the top three priorities guiding development of both openEHR methodology and its wider community. That was written into our community ethos from day one.

I hope this doesn’t sound too irritatingly detached from the nitty gritty of what you are trying to resolve. One tends to a helicopter view of much of life as one gets older!


2 Likes

Ok, this time replying to Silje rather than myself from the past.

I did a quick (and still incomplete) read of the responses so far, and it looks like even though your original question sounds impartial, your later responses suggest you do have a preference: to have z-scores in the RM as an attribute of a suitable type.

Which is absolutely fine. That’s a preference, and, as most things are, a design choice. Your preference is based on the CLUSTER-based design (which Tom also mentions in his early responses) producing very unwieldy archetypes, in your own words.

If I try to reword my old response: I think having this as an attribute in an existing RM type may lower the precision of the RM as an alphabet of clinical data. In exchange for that loss of precision in expressing data semantics, you gain convenience in modelling concepts: because of inheritance, the attributes are just there for you to populate, with no need to express them via a cluster (or so I read your response).

We can live with the loss of precision if it’s an 80/20 or maybe even a 90/10 situation. The funny thing is, of course, that my view may not find support from anybody else. I am more inclined to think about statistical concepts from the perspective of a statistician rather than that of a clinician, and my view of the domain may therefore be different from that of a clinician, who’s used to using a z-score as a pragmatic metric that is implicitly connected to a number of concepts: normality, population mean etc.
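
To make the statistician’s framing concrete, here is a minimal sketch in plain Python (not tied to any openEHR type, and with made-up numbers) of how a z-score and its percentile are derived from an observed value plus assumed population parameters:

```python
from statistics import NormalDist

def z_score(observed: float, population_mean: float, population_sd: float) -> float:
    """Standard score: how many SDs the observation lies above the population mean."""
    return (observed - population_mean) / population_sd

def percentile(z: float) -> float:
    """Percentile under an assumed normal distribution (the usual clinical reading)."""
    return NormalDist().cdf(z) * 100

# Illustrative numbers only: 105 against a population with mean 100 and SD 10
# sits 0.5 SD above the mean, i.e. roughly the 69th percentile.
z = z_score(105, population_mean=100, population_sd=10)
print(round(z, 2), round(percentile(z), 1))  # 0.5 69.1
```

The result depends on both the datum and the assumed population mean/SD, which is exactly the coupling being debated in this thread.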

@bna’s point about why we are even putting this into models at all is even more interesting, because he makes a good point about this being a kind of metadata, especially if you adopt a more statistical view. The distribution of the data is indeed metadata, just like how it can be displayed on the screen! I have always been against higher-level concepts leaking into clinical models, so if I adopt @bna’s pov, then I’d have to agree with not having z-scores in the RM or archetypes :slight_smile:

So I’ll suggest we discuss who the stakeholders are:

  • clinicians,
  • secondary use consumers (statisticians etc),
  • software developers (and companies who pay their salaries :slight_smile: )

and then decide who our priority is here and what kind of agreement would make all stakeholders reasonably happy, or at least leave them grumbling the least :slight_smile:

1 Like

We should heed this advice…

Well, only if we put it in the wrong place / wrong way. Dedicated data types, possibly wrapping other raw values, are the way to go in my view - not the addition of stats-related attributes to existing data types, which all represent raw data rather than statistically annotated or processed data. If we want to add something to the RM, there will be a way to do it that has no adverse effect on existing data or software.

I would say that we can assume that if point-of-care clinicians are looking at z-scores etc, then having a clear representation that doesn’t mess with any other data is essential. But the long term priority for statistical annotation / generated values is surely your second category - secondary use consumers (statisticians etc). That implies solving the general problem properly - i.e. representing any kind of distribution (at least, recording the name) and relevant params.

We still only want to limit ourselves to the data types we need in the individual patient record or reprocessed versions thereof (e.g. HighMed-like) - full CDS, trial study solutions etc will have needs around representing entire populations and their statistical characteristics.

What would this look like in terms of practical modelling? Would we have to manually change the data type in every use case where we anticipate the need for z-scores or percentiles?

Agree.

Good question… I think initially you (modellers) would need to make it fairly black and white: is the intention in this data field to record a z-score (or a similar piece of statistical info)? Consider that emerging low-code form & app generators will use what is in the archetype to generate entry widgets / labels / controls. We don’t want to see all that extra stats stuff being added to ‘normal’ fields, but we do want to see it on the fields where we really need it. This doesn’t answer your question properly, but I think the CM group needs to analyse not just the data but the question of when you do / might need it.

To me, this means that we may have to make breaking changes to a significant number of the published archetypes in order to introduce this modelling change. As Heather pointed out, this kind of information will likely be used in an increasing number of use cases in the near future. Does this mean that we should model every DV_QUANTITY like this in the future, to avoid having to make breaking changes if a z-score requirement pops up at a later point?

Hm.

Solution #1

That can be an argument for modelling z-scores and similar as archetyped fields rather than RM types, if you want to be able to retrospectively add z-score data points on top of existing DV_QUANTITY and similar raw items. You would need to leave the current items where they are and add either direct siblings, or a sibling CLUSTER, for each field of name xxx, named (say) xxx-statistical or so. This will mean the changed archetype will still work with old data, and the old archetype version will work with data created by the new version (i.e. in other systems, apps etc).
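
Purely to visualise the sibling idea, here is a rough sketch in Python dict form; the field names, values and the ‘-statistical’ naming are hypothetical placeholders, not agreed modelling:

```python
# Hypothetical sketch only: the raw item is untouched; a sibling CLUSTER-like node
# carries the statistical annotation, so old archetypes and old data keep working.
weight_entry = {
    "weight": {"magnitude": 61.3, "units": "kg"},   # existing raw DV_QUANTITY-like item
    "weight-statistical": {                          # new sibling "xxx-statistical" node
        "z_score": 1.2,
        "reference_population": "placeholder reference set",
    },
}
print(weight_entry["weight-statistical"]["z_score"])
```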

This will be fairly clunky and I suspect will not feel very clean for the variety of statistical-related needs going forward.

Solution #2

We could add a new optional attribute to (say) DV_QUANTIFIED (or maybe even DATA_VALUE - seeing what @Seref said earlier) that pointed to some DV_STATISTICAL_VALUE kind of type (let’s just assume this could be agreed and designed, or @Seref just works it out :wink:). If you opened an old archetype in the Archetype Designer, with the new RM containing this attribute loaded, your DV_QUANTITYs and other data types would now have some further archetypeable things that you could constrain - this is where you’d say z-score / normal, or T + params, or X + params etc.
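
A very rough structural sketch of what Solution #2 might look like - every name here, including the fields assumed for DV_STATISTICAL_VALUE, is a placeholder for discussion rather than a proposed RM definition:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class StatisticalValue:
    """Stand-in for the hypothetical DV_STATISTICAL_VALUE - not a real RM class."""
    value: float                                    # e.g. the z-score itself
    distribution: str                               # e.g. "normal", "t", ...
    parameters: dict = field(default_factory=dict)  # e.g. {"mean": 100.0, "sd": 10.0}
    reference_population: Optional[str] = None

@dataclass
class QuantityWithStats:
    """Stand-in for a DV_QUANTIFIED descendant gaining the new optional attribute."""
    magnitude: float
    units: str
    statistical_value: Optional[StatisticalValue] = None  # absent => behaves like today
```

Because the new attribute is optional, data that never populates it looks exactly like today’s data, which is the backwards-compatibility property described below.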

Updated software and forms could now show (say) an optional button next to your raw value for adding z-score or similar (depending on what you archetyped). As long as all that data were optional, that new software won’t die with old data, and old software probably won’t die with new data that have those fields filled in (it depends a bit on DB representation and method of data materialisation).

Note this solution is the opposite of ‘wrapping’ - indeed it is more like the way reference ranges are attached to Quantity data items.

This kind of change really requires much more careful analysis than I have given here, by (at least) the usual suspects, i.e. experts on the clinical side, plus @borut.fabjan, @yampeku and @pieterbos to think it through properly, plus some secondary processing people.

I’d recommend a dedicated wiki page to explicate the problem / needs, and probably some dedicated calls to figure out all angles of proposed technical change impact on tools, data etc.

You have opened not a can of worms but a reservoir of alligators :wink:

2 Likes

I like this suggestion.

I’d be happy to participate. We should probably find some clinicians who are particularly well versed in statistics too.

I suspected as much when I started the thread. We’ve discussed this at least twice before, including in the thread referenced by Seref, without finding a workable solution. Hopefully third time’s the charm :crossed_fingers:

2 Likes

I have not yet seen any evidence or examples of this. Would be great to see some examples.

My reason for asking is of course coloured by my initial thinking that this kind of information is not part of the datum/measurement.

I would appreciate a concrete example for this kind of modelling.

2 Likes

I’d say that feels like a move towards a god class. I’d live with it in the name of being pragmatic to keep modellers happy, but DV_DATE_TIME ending up with an optional z-score will lead to some interesting conversations years down the line when someone asks ‘why is this here?’. As I said before, it’s a tradeoff.

Yes - stakeholders, as I called them.

Wholly agreed.

3 Likes

We don’t do God classes, only ancestors :wink: I wasn’t being 100% serious about DATA_VALUE, but we do need an analysis of whether a statistical value type is needed for anything outside DV_ORDERED or Quantity types. My gut feel is that’s as far as we should go, but others may know better. When we know what existing data types we want to apply statistical annotations / values to, we’ll be able to see how to model it properly.

1 Like

We do need to plan for managing the way data might be collected and used in the future. We have a well-documented example in front of us, supported by academic papers. The Spirometry example is perfect - it’s a new approach developed only in the past couple of years. I’d never heard of it before. And it makes good clinical sense.

As personalised medicine comes to the fore, the Z-score for various measurements such as Spirometry - identifying where someone sits on the bell curve for their personal characteristics, genetic profile, etc. - will become more meaningful. The raw value recorded will still need to be stored, but the Z-score that places that raw data within a population context will likely be used to determine severity or treatment options, and warrants storage as well. It could well be that the Z-score, rather than the raw score, becomes the trigger for CDS.
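
As a hedged illustration of triggering CDS on the z-score rather than the raw value: in spirometry interpretation a z-score of about -1.645 corresponds to the 5th percentile, commonly taken as the lower limit of normal, regardless of what absolute value is ‘normal’ for that particular patient. A minimal sketch:

```python
from statistics import NormalDist

LLN_Z = NormalDist().inv_cdf(0.05)   # ~ -1.645: 5th percentile as lower limit of normal

def below_lln(z_score: float) -> bool:
    """Hypothetical CDS trigger acting on the z-score rather than the raw value."""
    return z_score < LLN_Z

# The same raw FEV1 can be normal for one person and abnormal for another;
# the z-score already folds in age, sex, height etc. via the reference equations.
print(below_lln(-2.1), below_lln(-0.4))   # True False
```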

We don’t have a crystal ball to foresee how much medicine will change, but as we get more data on populations, the capacity to refine clinical treatment to the individual within an appropriate population context will rely on these kinds of derived data.

3 Likes

I’ve been tracking this topic. At first I wasn’t really interested, since I’ve not come across this requirement myself. But I do think Heather is right: with the advance of personalised medicine, z-scores are very useful.

I would like to add one perspective that may explain a bit about the disagreement between Silje and Bjørn: as stated before, a z-score indicates how far the recorded measurement deviates from a population mean, expressed in standard deviations. This is not an observed value in itself. But since the population data is/was pretty fixed, the z-score became pretty much a 1:1 transformation of the observed value, and this feels like an observable entity. But it isn’t, since the population data will change, changing the z-score, and this is against the principle of an observable entity: it shouldn’t change if circumstances change. This is currently a negligible problem, since the population data is very fixed, the observed value is event-based, and the relevance of the observed data expires long before the population data (and thus the z-score) changes. But with the advance of personalised medicine, the population data may get tailored to the individual and thus become much smaller and more sensitive to change. It may even be continuously calculated from live data, or even estimated using predictive statistics/ML. And so it becomes increasingly important to record and show the relevant metadata to the user interpreting the data, e.g. population used, population data last updated, calculating algorithm etc.
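
A tiny worked example of that point: the observed value stays fixed, but the z-score moves as soon as the reference population’s mean or SD is updated - the behaviour of a derived/meta value rather than an observation (numbers are illustrative only):

```python
def z(observed: float, mean: float, sd: float) -> float:
    return (observed - mean) / sd

observed = 95.0                        # the measurement itself never changes
print(z(observed, mean=100, sd=10))    # -0.5   against the original reference population
print(z(observed, mean=90, sd=8))      # 0.625  after the reference population is revised
```
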
I’m worried about storing this info in the observation, since it’s not observable and, as I’ve argued, it’s not going to feel close to observable for much longer. How to solve this I don’t know. The main use case is decision support, I’d say - both supporting the user in interpreting the data, and computerised decision support.
So storing the z-score as an evaluation already feels a little better. But this still feels off, since there’s no human interpretation/evaluation of the data involved. A new entry type might be a bit much? How about GDL algorithms? But where would that algorithm then store the z-score? Does it have to, since it could be calculated live at the moment of display? On the other hand, if a clinician bases their decisions on the z-score, you do want to record it as it was at the moment of decision making, if only for legal auditability. Recording it in the observation might be the best place. But recording it in the element of the original measured value feels wrong, conceptually. A fourth archetypeable attribute of the observation (and parent) classes, next to data, state and protocol?
I do tend to agree that the z-score and the like should be solved at RM level, mainly because it’s a fundamental concept and the usage feels too universal for a hack like archetyping it with clusters, as suggested.
Tough stuff.

2 Likes

I agree with you that this isn’t easy, but I think this is a key point. A central question then becomes: will z-scores ever be recalculated at a later point based on an old measurement, or will they always be based on a fresh measurement of the relevant clinical parameter? And if they’re recalculated, will the new scores be reinserted into old compositions as a new version, or will they be persisted as a new composition referencing the old measurement?

Well, this is mainly a clinical question. I’d say the calculated z-value should be recorded at the time of decision making. And if the population changes, this shouldn’t be recalculated automatically, because it may make your decision ‘wrong’. So what may happen is: if the population data (or the patient characteristics on the basis of which a population was selected) changes after the decision was made, you may want to re-evaluate that decision and record a new (version of a) composition that records both the new z-score (part of a new version of the observation that contains the measured value) and the new decision. We have to be careful with the timing though: we must not change the timing of the moment of observation in a new version. The RM supports this fine, but it’s easy to go wrong in implementation, since the time of recording is usually so close to the time of measurement that it’s often used as a surrogate - which goes wrong if you change the z-score in the observation days later. Another argument against storing the z-score in the observation.

1 Like

Hm. Do I understand you correctly if I read this as a suggestion to record a Z-score as an entirely separate ENTRY archetype? I can see some pros to this approach, but I suspect clear bindings to the measurement the Z-score was based on would be messy at best. We’d probably need to use the LINK parameter as a direct reference to the specific data element of the measurement, which is explicitly discouraged in the specs… (Common Information Model)

I’m persuaded by Heather and Silje’s examples that, whilst this is right now a bit unusual, it may well become more common as a different way of expressing the results of an Observation, and probably merits RM support.

Also, I can see that it is closely related to the raw data point, but I’m wary about overloading quantity/amount yet again with even more content which is quite obscure for most use-cases.

I’m coming round to the idea of a new datatype, as I can see all sorts of metadata needing to be captured alongside the Z-score. We already handle other ‘derived values’ such as calculated scores (NEWS2) as separate Elements, and that feels like a good compromise - also perhaps easier to explain to newbies than delving deep into an overloaded quantity datatype.