Z-scores and percentiles

How do you envisage this new data type being used?

Agree.

Yes, just an extra element would work. Perhaps in a way it is not so different from the case where some score has multiple intermediate scores or even multiple final scores. Those are directly calculated from the data, but need to be stored separately as well, to ensure the data a decision was based on is stored exactly as it was shown at that point in time.

Should those be separate elements in the protocol of the observation?

1 Like

That’s exactly the difference: z-scores are not directly calculated from the data, they are calculated against a reference population data set. Currently that data set is often so fixed (e.g. head circumference for boys) that it feels like a direct calculation. But it isn’t, and we’ll get into trouble if we make design decisions on that false assumption. I’m fine with saying it’s too hard to solve right now, so just be pragmatic and go with a separate element in an archetype, or even a cluster with a raw value and statistical metadata pattern. But that’s postponing doing the hard stuff, and technical/modelling debt building up. That could be fine if we say this is a niche thing. But if we say (as all the clinical people in this thread do) that this is going to be a ‘bigger’ thing, this is our chance to do it right.

2 Likes

It was what I was ‘suggesting’. Just to keep a broad view of possible solutions. And because conceptually it doesn’t fit the current entry classes. But you make a good point against it. And I dislike the inelegance of it myself.
Do we handle any similar derived/secondary data in other archetypes?

I don’t agree with the NEWS pattern comparison, since (1) it is a ‘self-standing’ datapoint and (2) it only depends on the chosen calculation algorithm, not on an external dataset (yet). So recording the values in data and the algorithm in protocol is a nice conceptual match. Not so for the z-score.

1 Like

Even those ‘directly calculated from the data’ have a method or formula in between that is not stored in the data. The difference is that with the z-score some external data is used (the distribution). But even that is not so different from some of the calculation methods of scores, where for example the calculation is different for different ages or genders.
And even if it is viewed as more complex than directly calculated from data, that reinforces the point I was trying to make - if we already store calculated values from more straightforward calculations, I think they should be stored in this case as well. And they could be stored in the same way as we do with scores, with just an extra element, and perhaps some extra information in protocol to record what steps were followed to calculate these scores.
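To illustrate the distinction being discussed, here is a minimal sketch (purely illustrative; the reference table and numbers below are invented, not real growth data) of why a z-score depends on an external reference distribution keyed by e.g. sex and age, rather than only on the entry’s own data:

```python
# Illustrative only: the reference table below is invented, not real data.
# A z-score needs an external reference distribution (mean and SD for the
# matching population stratum), unlike a score computed purely from the
# entry's own recorded data.
REFERENCE = {
    # (sex, age_months): (mean_cm, sd_cm) -- hypothetical head circumference
    ("male", 12): (46.0, 1.3),
    ("female", 12): (44.9, 1.3),
}

def z_score(value, sex, age_months, reference=REFERENCE):
    """Standard z-score: how many SDs the value lies from the stratum mean."""
    mean, sd = reference[(sex, age_months)]
    return (value - mean) / sd
```

The point is that `REFERENCE` lives outside the archetype: if the reference data set is updated or swapped, the same raw measurement yields a different z-score, which is why the provenance of the distribution matters.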

3 Likes

No - I think they are tightly bound to the Z-score. I’d actually prefer to tag elements as being protocol vs data etc. rather than the current structural commitment, as although it works well in most cases there are situations where e.g. multiple events need different protocol (different device); it’s also easier to change if you get it wrong.

2 Likes

Well, largely agree:
If there are calculations happening, they should be in the rules section of the archetype actually executing the calculation, and/or there should be some datum that states which calculation is used, so as to allow the consuming clinician to interpret the data.
What we could do is record the z-score as an element in data. And add elements for meta data (population, algorithm etc) in protocol.
But the clinicians expect this to be such a common use case that it will have to be modelled in most archetypes, including changing most current ones. Since this is a large amount of manual work, it would be nice if it were solved by the RM. That could even solve it for current archetypes (by updating the RM version), right?

I tend to like the idea. But inheritance is problematic here: protocol is in care_entry, while data is only in observation and evaluation. And would you do the same for state? How would you handle the difference between the state of the entire observation vs the state of an element?
If we can only do protocol under data, not state, it is not very useful, right?

1 Like

Yes, this is an important point.

3 Likes

Please, no! Protocol is non-data in the sense that it provides no clinical semantics, only information about methods of doing things.

1 Like

This only works for calculations whose inputs are wholly within the data set of the archetype. Calculations that are effectively inferences relying on other knowledge resources have to be done elsewhere, e.g. in executions of CDS / guideline algorithms.

This will be fractured bits and pieces, and a misuse of the protocol subtree, if the z-scores are to be considered as clinically relevant data.

I thought observable raw values were the scope of discussion here? Recording of z-scores in Evaluations could make sense of course - maybe more so, since they might be considered as a kind of interpretable assessment rather than raw data. But there is no obstacle to doing this if the appropriate new data types are created.

1 Like

These are definitely data and need to be recorded as such.

Remember that there may be more than one Z score per archetype. In Spirometry there will be more than one measurement that can have an associated Z score. BP has 4 measurable data elements. See the pattern…?

We can assume that each measurement in an archetype references the same population cohort for deriving a Z score, but it is an assumption and as soon as we lock that in no doubt we’ll find a use case to break it.

So we need to record the metadata that enabled each Z score to be derived, and commit it alongside the raw measurement so that we have provenance/foundation for how the treatment was chosen. If the cohort or derived value changes, the treatment protocol may also need to change and we need to consider how we might model that, but the intimate connection of the original raw data/Z score pair should remain intact in the record. For that reason, I’m increasingly convinced that they should be recorded using the same model, which is stronger than just using a link.
We record interpretations of the results of a scale/score or examination findings in OBSERVATIONs all the time.

3 Likes

Yep - I wouldn’t assume that for exactly that reason - as soon as you do, there’ll be some data point about a donated organ or other family member that needs another population distribution recorded for it.

Sounds right to me.

I also would not prefer to use the Link solution.

2 Likes

I am not entirely convinced that it should be a change to the RM - is there really a big switch in the field happening to percentiles and z-scores as a basis for interpreting observations, and will this be done in so many places that just modelling these will not be enough? And does it really need to be stored, or is this some kind of app?

If it should, a possibility could be to add one new attribute to DV_AMOUNT or perhaps even DV_ORDERED (not sure whether ordinal and scale should support this). It could be called something like population_comparison or frequency_distribution_score or even just population_score, or perhaps a better term exists. Its type would then be a new class, with the following properties: the score itself (a DV_QUANTITY or a Real?), a coded text indicating which type of score this is (z-score, percentile, t-score, etc.), and a text indicating which distribution data is used. The cardinality of the attribute could be 0…1, or even 0…*, in case multiple scores need to be recorded. It is probably necessary to add a normal range here as well, since its purpose is evaluating the range of the observed value.
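A minimal sketch of what such a class might look like. Every name below (`PopulationScore`, `score_type`, `distribution`, the stand-in `DvAmount`) is an assumption for discussion only, not an agreed openEHR design, and the growth-reference string is a made-up example:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical sketch only: all names here are assumptions for discussion,
# not an agreed openEHR RM design.
@dataclass
class PopulationScore:
    score: float            # the score value itself (z-score, percentile, ...)
    score_type: str         # coded text: "z-score", "percentile", "t-score", ...
    distribution: str       # text naming the reference data set used
    normal_range: Optional[Tuple[float, float]] = None  # e.g. (-2.0, 2.0)

@dataclass
class DvAmount:
    """Stand-in for DV_AMOUNT; only the proposed new attribute is shown."""
    magnitude: float
    population_scores: List[PopulationScore] = field(default_factory=list)  # 0..*

# Usage: the raw measurement stays the primary value; scores ride alongside it.
hc = DvAmount(magnitude=47.3)
hc.population_scores.append(
    PopulationScore(score=1.0, score_type="z-score",
                    distribution="hypothetical growth reference, boys 12 months",
                    normal_range=(-2.0, 2.0)))
```

Grouping the score, its type, the distribution and the normal range into one object keeps the attributes that must occur together in a single unit, and a 0..* list accommodates multiple scores per value.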

This has several benefits over a new DV_ type:

  • Data stored in the RM is backwards compatible
  • This can be used without any changes in existing archetypes
  • Existing archetypes or templates can be easily changed to add constraints to the new scores
  • It keeps the type of the recorded data the same, which conceptually seems right

Alternatively, perhaps a DV_POPULATION_SCORE could be made, as a separate element from the recorded quantity, to avoid having to change all archetypes. The main drawbacks I see are that it has a less explicit binding to the recorded observation, and that it requires explicit modelling to use.

4 Likes

I’ve spent an evening digging back into the history of z-score in the standardization and interpretation of clinical measurements.

I started with spirometry and then other areas. These publications (a few listed below in case of interest) had interesting results and insights.

There are different ways reported for calculating z-scores, and comparative studies of their efficacy in standardizing measurement and assisting clinical interpretation. Taking just one example – a measurement of something like vital capacity may involve a different reference population and computational method when standardizing it as a z-score, according to the volume measured. That hadn’t occurred to me but it seems very plausible when one thinks about it. The record of the method used in calculating a z-score involves more detail than a mean and standard deviation of a Gaussian distribution (mu and sigma) – methods also take account of distribution skewness (lambda), and the numerical methods used in calculating the z-score look to be often quite complex algorithms. One author warned of situations with potential to amplify measurement error through use of z-scores, and another group made an extensive study of clinical severity ratings and how they changed according to the different methodologies in use.
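For concreteness, the skewness-adjusted approach alluded to above is commonly formulated as the LMS (lambda-mu-sigma) method. A sketch of the core formula only; real implementations additionally interpolate L, M and S from age- and sex-specific reference tables, which is where much of the complexity lives:

```python
import math

def lms_z_score(x, L, M, S):
    """Skewness-adjusted z-score per the LMS method:
    z = ((x/M)**L - 1) / (L*S) when L != 0, else ln(x/M) / S.
    L (lambda) captures skewness, M (mu) the median, S (sigma) the
    coefficient of variation of the reference distribution."""
    if L != 0:
        return ((x / M) ** L - 1.0) / (L * S)
    return math.log(x / M) / S
```

With L = 1 (no skew adjustment) this reduces to the familiar (x - M)/(M·S) form, which illustrates why recording only a mean and SD is insufficient metadata once skewness-adjusted methods are in play.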

I’m not in any way disagreeing that clinical use of z-scores must find a place in records – clearly it must. What seems apparent from my brief foray into ten years of research and international standardization of measurement through z-scores is that it has clear advantages, along with details of which its users should be kept aware and in touch. There is also caution expressed about the validity of the standardization methods used in relation, for example, to different ethnic groups. If used as a proxy measurement, one needs to be aware, as one of the papers shows, that cross plots of z-score and raw respiratory measurement show considerable scatter – how much of this stems from limitations of the raw data and how much arises through computation of the z-score is an empirical question which is still being explored in many different contexts.

As I said before, I’m not sufficiently au fait with the current openEHR clinical/technical sequelae of this discussion, but the issues and choices do seem to merit the careful and cautious consideration shown in the posts thus far. I’m just setting out here the chain of thought that Bjorn’s questions have stimulated in me.

David

Some stuff I read and found useful, in case of interest:

The Global Lung Function Initiative: dispelling some myths of lung function test interpretation | European Respiratory Society

https://breathe.ersjournals.com/content/9/6/462

Spirometry, Static Lung Volumes, and Diffusing Capacity

http://rc.rcjournal.com/content/respcare/62/9/1137.full.pdf

Advances in spirometry testing for lung function analysis

https://doi.org/10.1080/17476348.2019.1607301

Optimisation of children z-score calculation based on new statistical techniques

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0208362

The use of Z-scores in paediatric cardiology

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3487208/

The story wrt anthropometric data like height and weight and growth charts looks very well established.

2 Likes

Agreed.

@heather.leslie I agree with the other clinicians that there’s a clear expectation. But I also think Pieter’s is a fair question, since modelling in the RM should generally be minimised, and there are currently few archetypes I’m aware of that need the z-score. Are there any data or papers that back up our expectations?

I think this is a nice approach. It keeps the transformed data very close to the raw value, while still indicating it’s a transformation of that raw data, not an observed value. And it’s nice for backwards compatibility indeed.

So to summarise, the proposal is to store the observed data, e.g. length, in the value attribute, and add optional attributes for

  • transformed value (other name suggestions?),
  • algorithm for transformations (z/t score, reference, custom raw algo)
  • comparing dataset
  • normal range

0…* makes sense. But we’d have to find a way to group the attributes; they should occur together, not individually.

1 Like

I also agree with this suggested approach. I suspect DV_AMOUNT is sufficient. I would be very surprised if a Z-score is needed for a DV_ORDINAL or DV_SCALE.

I think this can be handled in the same way as other_reference_ranges, i.e. as a List of objects?

2 Likes

I feel something good coming together! Only 77 posts to get here :wink:

2 Likes

Great! How do we progress from here? :smiley:

1 Like

You or @heather.leslie or anyone (but better from clinical side) creates a PR in the usual place, and puts a small summary of the bits that matter of this conversation into the description field (plus a link to this thread).

(We like to do it this way so that the author of the PR is actually the person or group for whom the problem exists - then we can see a historical record of requests coming from CM group or anywhere else for that matter).

2 Likes

Having seen the detail of how z-scores are calculated and used in practice, it will be good to work from this as a use case in specifying what you wish to be recorded and accessible, and where. Respirology has been adduced most often in this thread, so maybe a respiratory physician might be found able and willing to help with this.

4 Likes