# Z-scores and percentiles

**Category:** [Clinical](https://discourse.openehr.org/c/clinical/5) **Created:** 2021-11-19 08:35 UTC **Views:** 1407 **Replies:** 79 **URL:** https://discourse.openehr.org/t/z-scores-and-percentiles/2087

---

## Post #1 by @siljelb

[Z-scores, or standard scores](https://en.wikipedia.org/wiki/Standard_score), and [percentiles](https://en.wikipedia.org/wiki/Percentile) are used for a whole range of clinical measurements. We haven't been modelling these into archetypes as of yet, and I'm not sure that would be a good way to represent them, since they're closely bound to each measurement. So this made me think: could Z-scores and percentiles be represented using an additional RM element of the DV_QUANTITY (edit: or maybe DV_AMOUNT?) data type? Or do we need to model them into every archetype where they're potentially relevant?

---

## Post #2 by @heather.leslie

There was a discussion around this some time ago related to [OBSERVATION.child_growth](https://ckm.openehr.org/ckm/archetypes/1013.1.2741). It was modelled as an inline attribute at the time. Also, we need similar for [EDD](https://ckm.openehr.org/ckm/archetypes/1013.1.4340) in pregnancy - eg 34 weeks 4 days +/- 7 days. I would prefer it as part of the RM too - for quantity and duration.

---

## Post #3 by @siljelb

Yep, it's needed for spirometry too. Specs folks, what do you think? @pieterbos @thomas.beale

[quote="heather.leslie, post:2, topic:2087"] Also, we need similar for [EDD](https://ckm.openehr.org/ckm/archetypes/1013.1.4340) in pregnancy - eg 34 weeks 4 days +/- 7 days. [/quote]

Isn't this uncertainty rather than a percentile/Z-score?

---

## Post #4 by @thomas.beale

Is it that you want to record just the z-score number, but have it indicated that it is a z-score value, or do you want the z-score number as well as the raw number? If the latter, it currently needs another data element. The EDD example is an accuracy thing.

---

## Post #5 by @siljelb

In most cases we want the base measurement (for example height 150 cm) plus the Z-score (for example height for age 0.91) and/or percentile (for example height for age 81.9).

---

## Post #6 by @thomas.beale

I would potentially treat this trio (or more) as a pattern, and possibly create a CLUSTER containing three ELEMENTs, named raw, percentile, z-score or whatever names you want meaning that. Then in some other archetype where you want a 'tri-value' of this kind (i.e. the triple), you just use that CLUSTER. In ADL2, you can just plug the archetype directly into the parent archetype - no need for any slot. If this need is a) fairly common and b) the shape of the data is always the same, i.e. the same 3 items (or 4, or 5 or whatever), then a pattern would make sense. You would see paths like:

`.../items[id14|tri-value|]/items[id309|z-score|]/value`

or in readable form,

`.../tri-value/z-score/value`

Another solution would be a sort of 'template group' of `ELEMENTs`, which would just be the set of 3 `ELEMENTs`, no `CLUSTER`; if the tools supported such groups, they would be dragged and dropped into the parent `CLUSTER`, but it would not necessarily be so obvious that the 3 belonged together. Finally, if we regarded this triple (again, I'm assuming triple) as really common and useful - like an Ordinal or a Quantity - we might actually create a new descendant type under `DV_QUANTIFIED` or `DV_ORDERED` in the RM, called ...
`STATISTICAL_POPULATION_VALUE` or hopefully something shorter ;)

---

## Post #7 by @heather.leslie

[quote="siljelb, post:3, topic:2087"] Isn't this uncertainty rather than a percentile/Z-score? [/quote]

Standard deviation is essentially a reflection of the amount of variability within a given data set. EDD +/- n days is documenting the variability in estimating the EDD at a given gestation. We can argue the semantics, but the issues around representation are the same.

---

## Post #8 by @pieterbos

It seems like we have three separate but very much related things in this discussion:

- accuracy of a single measurement or estimate
- a measure of variability and distribution of the measurement in a population
- a way of indicating how a single measurement relates to the distribution, so the z-score or percentile.

Accuracy, so the +- x days example, seems well specified, in https://specifications.openehr.org/releases/RM/latest/data_types.html#_accuracy_and_uncertainty , i.e. the `accuracy` and `accuracy_is_percent` attributes of DV_AMOUNT and its subclasses, as specified in https://specifications.openehr.org/releases/RM/latest/data_types.html#_dv_amount_class . Is that a good way to solve that particular issue of 'this many weeks, +- 7 days'?

I think standard deviation is different in the sense that it is not an accuracy - the measurement could have any kind of accuracy plus a mean value and standard deviation. This relates to a distribution of measurements, and the mean value will also need to be stored alongside the standard deviation. There are reference ranges in the RM, in which μ ± σ, i.e. the mean ± standard deviation, could be stored, with the meaning indicating what it is, preferably with some (snomed?) coding. I would prefer to indicate mean and standard deviation, instead of μ ± σ, but I guess this could work within the current specification?

Then the Z-score or percentile: maybe a different DV_QUANTITY, with the z-score, perhaps in a cluster with the measurement value? In these kinds of scores, is the distribution also needed in the data? In case of a normal distribution, that is sort of possible in a reference range in the same way as above, but if it's not a normal distribution (possible with a percentile score, probably not with a z-score?), that will add more complexity. Do these indications with a distribution occur often? Often enough to necessitate an addition to the specification?

---

## Post #9 by @thomas.beale

[quote="pieterbos, post:8, topic:2087"] Accuracy, so the ± x days example, seems well specified, in [Data Types Information Model](https://specifications.openehr.org/releases/RM/latest/data_types.html#_accuracy_and_uncertainty) [/quote]

Yes - this represents single measurements or estimates, with accuracy, i.e. error.

[quote="pieterbos, post:8, topic:2087"] I think standard deviation is different in the sense that it is not an accuracy [/quote]

Right - it's an artefact of the statistical analysis of a population.

[quote="pieterbos, post:8, topic:2087"] There are reference ranges in the RM, in which μ ± σ, i.e. the mean ± standard deviation, could be stored, with the meaning indicating what it is, preferably with some (snomed?) coding. I would prefer to indicate mean and standard deviation, instead of μ ± σ, but I guess this could work within the current specification? [/quote]

Well, theoretically, but then we're overriding the normal meaning of that part of the model, and people then have to write special code to look for special reference range names to see that it's an SD band instead.
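To make that concrete - a minimal sketch of the workaround and its cost, using simplified Python stand-ins for the RM classes (the 'mean ± 1 SD' meaning label, the field shapes and the numbers are illustrative assumptions, not defined conventions):

```python
from dataclasses import dataclass, field

# Simplified stand-ins for the RM classes - not the real openEHR API.
@dataclass
class ReferenceRange:
    meaning: str     # DV_TEXT in the RM; here just a label
    lower: float
    upper: float

@dataclass
class Quantity:
    magnitude: float
    units: str
    other_reference_ranges: list = field(default_factory=list)

# Storing mean ± SD as a reference range with a special 'meaning' label...
height = Quantity(150.0, "cm", [ReferenceRange("mean ± 1 SD", 143.2, 157.6)])

# ...forces every consumer to sniff for magic labels like this:
sd_band = next((r for r in height.other_reference_ranges
                if "SD" in r.meaning), None)
```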
[quote="pieterbos, post:8, topic:2087"] In these kinds of scores, is the distribution also needed in the data? [/quote]

I think this is likely in the general case, so if we were thinking of adding a type to the RM, I'd probably want to model it more fully, including a field for the name of the distribution. Also, a type called something like `STATISTICAL_VALUE` is likely to make code and data much safer, and such a type would be useful in secondary / aggregated applications.

---

## Post #10 by @pieterbos

[quote="thomas.beale, post:9, topic:2087"] Well, theoretically, but then we're overriding the normal meaning of that part of the model, and people then have to write special code to look for special reference range names to see that it's an SD band instead. [/quote]

The concept of a reference range does not seem to be well defined enough in the specification to say it is something else. If you take the definition in https://en.wikipedia.org/wiki/Reference_range , then it would be overriding the normal meaning, but the RM specification does not seem to limit it to that. However, often when the standard deviation is important, it is actually to define a reference range. That will in most cases not be `mean +- standard deviation`, but more likely to be `mean +- n*standard deviation`. So my question to the modellers is: is this to indicate something that fits a definition of a reference range, or something else?

[quote="thomas.beale, post:9, topic:2087"] I think this is likely in the general case, so if we were thinking of adding a type to the RM, I'd probably want to model it more fully, including a field for the name of the distribution. Also, a type called something like `STATISTICAL_VALUE` is likely to make code and data much safer, and such a type would be useful in secondary / aggregated applications. [/quote]

Could be solved with RM changes, or just by modelling this with a couple of ELEMENTs or a standardised archetyped CLUSTER. If RM changes, I am not sure it should be a new data type. The measurement or quantity is still a regular number, often with a unit, so a subclass of DV_AMOUNT, and not a 'statistical value'. What is desired here is a bit of extra information on how this number is to be interpreted, which in these cases happens to be information about the relation between the quantity and the distribution of this quantity in the population. Could just be some changes to DV_AMOUNT or DV_QUANTITY to add this information?

---

## Post #11 by @siljelb

[quote="pieterbos, post:8, topic:2087"] It seems like we have three separate but very much related things in this discussion:

* accuracy of a single measurement or estimate
* a measure of variability and distribution of the measurement in a population
* a way of indicating how a single measurement relates to the distribution, so the z-score or percentile. [/quote]

Agree, thanks for putting it this clearly.

[quote="pieterbos, post:8, topic:2087"] Accuracy, so the ± x days example, seems well specified, in [Data Types Information Model](https://specifications.openehr.org/releases/RM/latest/data_types.html#_accuracy_and_uncertainty) , i.e. the `accuracy` and `accuracy_is_percent` attributes of DV_AMOUNT and its subclasses, as specified in [Data Types Information Model](https://specifications.openehr.org/releases/RM/latest/data_types.html#_dv_amount_class) . Is that a good way to solve that particular issue of 'this many weeks, ± 7 days'?
[/quote]

I'm not sure this is usable for the '± 7 days' example, as *accuracy* in DV_AMOUNT is Real and not a type with units. Another example could be '3h7m ± 2 minutes', for which we need to specify which unit we're talking about.

[quote="pieterbos, post:8, topic:2087"] Then the Z-score or percentile: maybe a different DV_QUANTITY, with the z-score, perhaps in a cluster with the measurement value? In these kinds of scores, is the distribution also needed in the data? In case of a normal distribution, that is sort of possible in a reference range in the same way as above, but if it's not a normal distribution (possible with a percentile score, probably not with a z-score?), that will add more complexity. Do these indications with a distribution occur often? Often enough to necessitate an addition to the specification? [/quote]

I think this needs to be closely tied to the measurement in question, and I don't see how we could closely associate a CLUSTER archetype with a specific data element in each archetype. I think an RM attribute of DV_AMOUNT would be better. For that I think we need another class, say STATISTICAL_VALUE as suggested by @thomas.beale , consisting of:

* value (Real, for example '65')
* type (DV_TEXT, for example 'Percentile')
* distribution (DV_TEXT, for example '2000 CDC Growth Charts for the United States')

Edit: This class would need to be repeatable in a *List*, similar to mappings and other_reference_ranges.

---

## Post #12 by @ian.mcnicoll

Does this ever exist in the absence of an actual 'magnitude', i.e. is it only the STATISTICAL_VALUE that is needed?

---

## Post #13 by @siljelb

I can't say, but I would be surprised if it was ever separated from the value it's derived from.

---

## Post #14 by @thomas.beale

[quote="pieterbos, post:10, topic:2087"] the RM specification does not seem to limit it to that [/quote]

It might not be documented well enough in the spec, but the design intention was clear from the start: represent reference ranges in lab results, vital signs and any other observable for which such a range is commonly used in medicine.

[quote="pieterbos, post:10, topic:2087"] However, often when the standard deviation is important, it is actually to define a reference range. That will in most cases not be `mean +- standard deviation`, but more likely to be `mean +- n*standard deviation`. [/quote]

That can be true, but today reference ranges could just as easily be derived by data mining (i.e. comparing input variables to outcomes). Practically speaking, the reference ranges in openEHR are not trying to do anything other than represent the ranges used by labs or other sources for that kind of patient (say, a pregnant woman) for the analyte in question.

[quote="pieterbos, post:10, topic:2087"] Could just be some changes to DV_AMOUNT or DV_QUANTITY to add this information [/quote]

Well then we're adding optional fields that will be void on 95% of all data, but will probably confuse developers reading the documentation. I'd prefer to see another data type, or a wrapping data type, e.g. it could be done in the form `STATISTICAL_VALUE` where the outer class is adding the extra bits and pieces, and the `DV_QUANTIFIED` (usually a `DV_QUANTITY`) carries the original raw value (potentially with its own reference ranges).

---

## Post #15 by @thomas.beale

[quote="siljelb, post:11, topic:2087"] I'm not sure this is usable for the '± 7 days' example, as *accuracy* in DV_AMOUNT is Real and not a type with units.
Another example could be '3h7m ± 2 minutes', for which we need to specify which unit we're talking about. [/quote]

Accuracy in `DV_DURATION` is a `DV_DURATION`. [See here](https://specifications.openehr.org/releases/RM/latest/data_types.html#_overview_5).

---

## Post #16 by @siljelb

[quote="thomas.beale, post:14, topic:2087"] Well then we're adding optional fields that will be void on 95% of all data, but will probably confuse developers reading the documentation. I'd prefer to see another data type, or a wrapping data type, e.g. it could be done in the form `STATISTICAL_VALUE` where the outer class is adding the extra bits and pieces, and the `DV_QUANTIFIED` (usually a `DV_QUANTITY`) carries the original raw value (potentially with its own reference ranges). [/quote]

The problem is, Z-scores and percentiles could potentially be used on any clinical measurement, from IQ tests via lab results and head circumferences, to spirometry. I would think they would be used across a larger number of different concepts than reference ranges, which are mainly used for lab results.

---

## Post #17 by @thomas.beale

[quote="siljelb, post:16, topic:2087"] The problem is, Z-scores and percentiles could potentially be used on any clinical measurement, from IQ tests via lab results and head circumferences, to spirometry. I would think they would be used across a larger number of different concepts than reference ranges, which are mainly used for lab results [/quote]

This points even more toward modelling it as something like `STATISTICAL_VALUE`, because then you can just have a statistical version of any other kind of value. But we probably need a comprehensive statement of the problem first - maybe a wiki page? I wouldn't like to half-solve this...

---

## Post #18 by @siljelb

[quote="thomas.beale, post:17, topic:2087"] This points even more toward modelling it as something like `STATISTICAL_VALUE`, because then you can just have a statistical version of any other kind of value. [/quote]

How would that work in practice?

---

## Post #19 by @pieterbos

[quote="siljelb, post:11, topic:2087"] I'm not sure this is usable for the '± 7 days' example, as *accuracy* in DV_AMOUNT is Real and not a type with units. Another example could be '3h7m ± 2 minutes', for which we need to specify which unit we're talking about. [/quote]

If a DV_DURATION, I guess that would be represented either in seconds or as a percentage? It's a bit technical, but very possible to build a user interface that allows input in days/weeks/etc and stores it in seconds or as a percentage. If a DV_QUANTITY with unit weeks, it can be represented as a fraction of weeks, since it is a real number. A bit ugly to represent currently, but it would be correct. I guess a change to make DV_DURATION.accuracy an iso_duration itself might be better, but it would be a breaking change.

---

## Post #20 by @siljelb

[quote="thomas.beale, post:15, topic:2087"] Accuracy in `DV_DURATION` is a `DV_DURATION`. [See here](https://specifications.openehr.org/releases/RM/latest/data_types.html#_overview_5). [/quote]

Isn't that for DV_TEMPORAL, which DV_DURATION afaics doesn't inherit?

---

## Post #21 by @ian.mcnicoll

We already have a ton of optional attributes like that - accuracy, ref ranges, magnitude_status, normal_status are not used in 95% of quantities - one more won't hurt!!

---

## Post #22 by @pieterbos

[quote="siljelb, post:20, topic:2087"] Isn't that for DV_TEMPORAL, which DV_DURATION afaics doesn't inherit?
[/quote]

Indeed - for DV_DURATION it is a Real, without any indication of what this means if it is not a percentage; for all the other types it is correctly a DV_DURATION.

---

## Post #23 by @thomas.beale

[quote="siljelb, post:20, topic:2087"] Isn't that for DV_TEMPORAL, which DV_DURATION afaics doesn't inherit? [/quote]

Er yes - sign of old age when you can't read your own model ;) I remembered an older design from GEHR, where we had something called 'customary quantity' or maybe 'customary units' - can't quite remember. Anyway, the idea was to deal with things like:

* months and days
* weeks and days
* feet and inches (still the norm for height in UK as well as US)
* lb / oz
* stones + pounds (body weight in UK)

We might need to think a bit more in that mode if we want a more elegant solution to the current question.

[quote="pieterbos, post:19, topic:2087"] If a DV_DURATION, I guess that would be represented either in seconds or as a percentage? It's a bit technical, but very possible to build a user interface that allows input in days/weeks/etc and stores it in seconds or as a percentage. [/quote]

That was the thinking that led us to stick to a simpler system. The only problem of course is if the originally entered units are not the canonical ones that would be recomputed later, e.g. 1y and 5 days, or 40 weeks and 2 days - the top unit has to be remembered. But these are real edge cases I think.

---

## Post #24 by @pieterbos

Then the question: is this problem big enough to require a new datatype or an adaptation to the existing one? Is this needed often enough to motivate vendors to implement something new?

[quote="siljelb, post:11, topic:2087"] I think this needs to be closely tied to the measurement in question, and I don't see how we could closely associate a CLUSTER archetype with a specific data element in each archetype. I think an RM attribute of DV_AMOUNT would be better. For that I think we need another class, say STATISTICAL_VALUE as suggested by @thomas.beale , consisting of: [/quote]

A pattern to be used could be something like:

- Cluster: quantity with Z-score
  - Element: quantity: the measurement
  - Element: quantity: z-score or percentile score
  - Element: dv_text with distribution

obviously with more concrete naming. Relatively cumbersome to do. As a pattern that is just reused, it's doable. As an archetype, it gets problematic. In a template you could override the units and other constraints of the quantity that is the measurement; in an archetype that would require a specialization for every use. Does this happen often, or is this just a few times in the entire scope of what should be in the CKM?

---

## Post #25 by @bna

I have a problem with this topic. I don't think Z-scores or percentiles are a property of the data. They are more a view of the data in a specific context. The percentile distribution will change depending on the dataset it is based on. As such this kind of information needs to be stored together with some decision support system where the recommendation is based on population X and not Y. This might be modelled using existing features in tooling and specifications. I discussed this with Silje the other day, and I didn't really see the total picture here. Just sharing my thoughts; I look forward to feedback.
---

## Post #26 by @siljelb

[quote="pieterbos, post:24, topic:2087"] A pattern to be used could be something like:

- Cluster: quantity with Z-score
  - Element: quantity: the measurement
  - Element: quantity: z-score or percentile score
  - Element: dv_text with distribution [/quote]

The problem with this is that we don't want a generic CLUSTER archetype to represent the core elements of archetypes, for example Head circumference. This would have to be constrained in a template, which partly defeats the point of making specific archetypes for specific measurements.

---

## Post #27 by @siljelb

[quote="bna, post:25, topic:2087"] The percentile distribution will change depending on the dataset it is based on. [/quote]

Agree, and this is why the data set of the hypothetical STATISTICAL_VALUE class needs to contain an element for the distribution/source/whatever.

[quote="bna, post:25, topic:2087"] As such this kind of information needs to be stored together with some decision support system where the recommendation is based on population X and not Y. [/quote]

Strongly disagree. Percentiles and Z-scores are bits of information derived from population distributions and used for making clinical decisions, but they're not inherently part of decision support systems wholly separate from the EHR data. They are used together with the measurements they're derived from, and should be persisted in context.

---

## Post #28 by @bna

[quote="siljelb, post:27, topic:2087"] Strongly disagree. Percentiles and Z-scores are bits of information derived from population distributions and used for making clinical decisions, but they're not inherently part of decision support systems wholly separate from the EHR data. They are used together with the measurements they're derived from, and should be persisted in context. [/quote]

Yes - this is what we look at differently. I can't see why a percentile or Z-score should be stored together with e.g. the head circumference or a body height. The measurement is the same no matter what kind of percentile or Z-score is used to evaluate the consequence of the measurement. @siljelb and I tried to understand each other's views the other day. Now we need someone to advise us on this :-)

---

## Post #29 by @heather.leslie

[quote="thomas.beale, post:9, topic:2087"] Yes - this represents single measurements or estimates, with accuracy, i.e. error. [/quote]

The estimated gestation is actually not a measurement but reflects the typical gestation associated with the actual measured parameters of a fetus in utero. The associated variance expressed as +/- days reflects the distribution associated with a cohort/community that may or may not be as formal as SDs, but in clinical practice it is used as a proxy for it. I'm not sure of the science behind where these gestation estimates +/- variance came from originally. The actual things measured are uterine height, fetal BPD etc. The assigned gestation in weeks +/- days is inferred from the population statistics with associated variance, more akin to recording an associated population mean & SDs than not. In any case, we are dealing with a Duration datatype, not an Amount/Quantity, and we need to be able to associate units with it. It needs to be recorded as a separate data element that is probably sourced from a test result or external knowledge base but is used as the basis for ongoing antenatal care.
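To make the representation problem concrete, a quick worked example under the current RM, where duration accuracy is a bare Real (the two encodings shown are illustrative assumptions, not prescribed by the spec):

```python
# EDD stated as '34 weeks 4 days +/- 7 days' (the example from post #2).
# With accuracy typed as a bare Real, the +/- 7 days must be squeezed into
# either a percentage or some implicitly agreed canonical unit:
total_days = 34 * 7 + 4            # 242 days of estimated gestation
as_percent = 7 / total_days * 100  # ~2.89, if accuracy_is_percent = True
as_seconds = 7 * 24 * 60 * 60      # 604800, if seconds are the assumed unit
```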
As Silje and I discussed yesterday, this is actually quite a different thing to the Z-score after all, and a diversion from the main subject (apologies for introducing the added confusion), but it still needs its own resolution. As best I can see at the moment, accuracy is only available related to Quantity, as a real number or a percentage. The specs are not at all clear to me, and could do with more examples to clarify. In addition, from a tooling POV, we need to be able to actively turn this kind of attribute 'on' in the modelling where we know it is always relevant, e.g. so that clinicians can review it correctly, as well as provide implementation advice to vendors. In other use cases, it can be available in the RM when needed.

---

## Post #30 by @heather.leslie

[quote="bna, post:28, topic:2087"] The measurement is the same no matter what kind of percentile or Z-score is used to evaluate the consequence of the measurement. [/quote]

The [Child growth indicators archetype](https://ckm.openehr.org/ckm/archetypes/1013.1.2741) has been developed with that kind of view in mind. It is pretty clunky but it kind of works. The recent use case is about spirometry. While the measurement is accurate, the [Z score is increasingly used as a statistical tool for the expression of results](https://pubmed.ncbi.nlm.nih.gov/29873048/), to not rely on a number measured by the machine but instead focus on where a patient sits on the bell curve from a similar population cohort as a more accurate way to express the severity of disease. In this situation having a Z-score as an attribute/alternative expression for the measured value deserves tight alignment, especially as there are multiple spirometry measurements potentially expressed this way. Multiple CLUSTERs identifying knowledgebase, cohort, Z-score etc are unwieldy and can easily be disconnected from the actual result. I suspect that we will see this kind of expression of measured values become increasingly prominent, especially as we get more population-based health data and AI etc. Manually adding a Z-score associated with every measurement in the Spirometry model is possible but unwieldy. Potentially adding it into any/every Quantity data type will become a future nightmare.

---

## Post #31 by @thomas.beale

[quote="heather.leslie, post:29, topic:2087"] The assigned gestation in weeks +/- days is inferred from the population statistics with associated variance, more akin to recording an associated population mean & SDs than not [/quote]

I guess it's still an individual estimate, but I see what you mean - in the ideal world of knowing exactly the cohort corresponding to the observable facts about the current pregnancy, you would just be quoting the EDD of that cohort rather than 'estimating' as such. But that's probably close to impossible, since the Obs/midwife is taking a figure from a much more general cohort and estimating adjustments to it based on e.g. the course of previous pregnancies as well as current observables... so probably each woman is a cohort of one. The real question is probably whether you foresee the need to include some population-related values like SD, variance, distribution etc with EDD.

[quote="heather.leslie, post:29, topic:2087"] In any case, we are dealing with a Duration datatype, not an Amount/Quantity, and we need to be able to associate units with it. [/quote]

Yep - but we'll need to treat this as a separate question from the z-score one, I think.
[quote="heather.leslie, post:29, topic:2087"] As best I can see at the moment, accuracy is only available related to Quantity, as a real number or a percentage. The specs are not at all clear to me, and could do with more examples to clarify [/quote]

That is correct; we should add more examples. If you [create a new PR here](https://openehr.atlassian.net/issues/?filter=11103) (press the Create button at the top), and add a subject and short description, that will record this need...

---

## Post #32 by @thomas.beale

[quote="heather.leslie, post:30, topic:2087"] In this situation having a Z-score as an attribute/alternative expression for the measured value deserves tight alignment, especially as there are multiple spirometry measurements potentially expressed this way [/quote]

Do you mean multiple raw values, each with its z-value, and then separately from that, the relevant fixed distribution data? Is there an example that you/someone could create to show what the most complex case of this type of data recorded in a single encounter is? E.g. a bullet-point logical structure or mindmap etc.

---

## Post #33 by @DavidIngram

"While the measurement is accurate, the Z score is increasingly used as a statistical tool for the expression of results, to not rely on a number measured by the machine but instead focus on where a patient sits on the bell curve from a similar population cohort as a more accurate way to express the severity of disease."

An interesting discussion from different perspectives, with good arguments both ways. Bjorn invited wider input, so here are my thoughts. Placing measurements in context is always useful, and likewise for interpreting their meaning and consequence. Like any such statistic, the z-score rests on assumptions about the population it describes, and these are matters for empirical investigation. The z-score rests on an assumption of normal distribution: Z = (X − μ) / σ. This is often a good approximation and sometimes very much not - in the world of endocrinology, distributions I have tracked, even after logarithmic transformation, were highly skewed. Assuming a distribution is usefully characterised as normal, the question of what ranks as a similar population will often remain highly contextual. As David Spiegelhalter's wonderful book, The Art of Statistics, emphasises, it is important to visualise the raw data. It is a tricky path when we start to treat statistics themselves as if they are data. Hence, I imagine, some of Bjorn's concern. I can see the practical arguments in and around how best to capture such scores within the openEHR modelling paradigm. I can see the argument that confusing scores with raw data can become a slippery slope. As so often in matters medical, these are judgements about general and particular cases. openEHR methodology is a general formalism and struggles, as all such formalisms must do, to balance this with particular cases. Too many particular adaptations and the generality loses its power and appeal. Too much blanket generalisation and the real world loses touch. I haven't yet read the later postings this morning. I will follow with interest and hope a wider group will add their thoughts, as Bjorn encouraged!

---

## Post #34 by @thomas.beale

[quote="DavidIngram, post:33, topic:2087"] Too many particular adaptations and the generality loses its power and appeal.
Too much blanket generalisation and the real world loses touch [/quote]

This should be carved on the lintel of the main entrance to the openEHR edifice ;)

---

## Post #35 by @Leuschner

Hi everyone. My contribution to the discussion will elaborate on the use case of Z-scores I am most familiar with: LFTs. Although I can't confidently choose one side of the 'barricade', I hope this insight can help the discussion.

* Historically, 'normal' values for spirometry were defined by equations derived from data from relatively small, male-predominant populations; in Europe, this was aggravated by the fact that this population was almost exclusively composed of white male miners. Women's 'normal' was defined as 80% of the male average values; and for each gender, the normal interval was defined as 80-120% of the average value;
* Recently, new 'normal' values were derived from GLI (Global Lung Initiative) raw data;
* This same movement proposed using Z-scores to define normality (-1.64 < Z < +1.64), on the assumption that 80-120% was an even worse approximation to the normal interval;
* Both GLI equations (to define normality) and z-scores (to express results) are now routinely used. Nevertheless, absolute 'raw' values still support decision making in some settings (e.g. FEV1 > 1L for lung resection in cancer). Moreover, if we don't record the raw values (measurement and equation-derived normal), it will be harder to accommodate future changes in test result expression.

My punchline could be 'raw data is more future-proof, but clinicians need immediate, effortless data transformations for ongoing decision support'.

---

## Post #36 by @Seref

Deja vu you say? https://www.mail-archive.com/openehr-clinical@lists.openehr.org/msg04184.html (responses at the time are at the bottom)

---

## Post #37 by @DavidIngram

C'est vrai, Seref! - just had a look - it might be worth re-posting the points you made then.

---

## Post #38 by @Seref

Sure David. Here it is:

Hi Heather, I'd humbly advise against making Z-Score an attribute of every quantity data type. The first reason is that the Z-Score is a meaningful metric only under the assumption of normality, that is, that the possible values of the numeric quantity demonstrate a particular characteristic (the bell curve). In a non-normally distributed case, the Z score is meaningless, and even though many stakeholders end up assuming a normal distribution, there is a lot of data out there that is not distributed normally. So if Z-Score becomes an attribute of every quantity type, it will be an attribute that implies a particular statistical/probabilistic context when it is set, and won't be set that many times given all the instances of quantity types in openEHR. So I believe from a software design/implementation perspective, Z-Score is a higher level concept than the quantity primitives we have in the reference model and should be modelled at the archetype level. The slightly more subtle reason for not making z-score an attribute of quantity types is the wider definition of a probabilistic event. A particular diagnosis in a population could be interpreted as a probabilistic event, for which there is a population distribution, the prevalence, if we bend the language towards the clinical lingo. Now when that diagnosis is expressed with a DV_CODED_TEXT and mappings to a snomed_ct code, there is also a valid requirement to associate a z-score with it, say to assess the likelihood of a particular diagnosis in the context of the patient's data.
So now it may make sense to add Z-Value to dv_coded_text as well :) So even though it looks like a simple metric, the Z-Value is associating statistics and all things based on it (machine learning, risk estimation etc) with clinical data, and that association should be very clearly described IMHO, which would not be possible if we do it at the RM level. If all of the above makes no sense and I completely misunderstood the suggestion, then accept my apologies along with a nice pint that I'll buy when I see you next time :)

Kind regards

Seref

---

## Post #39 by @Seref

Replying to myself from 2017: introducing a Stochastic package to the RM, with Dv_Beta, Dv_Normal etc, say, [based on the research at hand](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5603665/#:~:text=The%20results%20show%20that%20the,the%20lognormal%2C%20and%20the%20exponential.), may solve the problem I pointed at. At archetype level, modellers can compose values of these types with the DV_whatevers we already have, using a cluster. And no Tom, they are not to inherit from DV_Stochastic - just go with composition over inheritance for once please :slight_smile: Let them float freely in that package...

---

## Post #40 by @siljelb

I'm concerned that modelling z-scores et al manually would make for very unwieldy archetypes, especially if it was to be done as a cluster. What would something like that look like, and how would it impact archetypes as standardised information "packages" for specific clinical concepts? I'm also having trouble seeing the difference in principle between this and reference ranges. Reference ranges are usually derived from a distribution (normal or otherwise), with the lower and upper ranges defined by a certain lower and higher percentile.

---

## Post #41 by @borut.jures

I'm reading and learning (all this is new to me) but have some UI-related input. I started generating forms from the OPTs, and it helps to have a clear purpose for the DV_ types. They are prime candidates for optimized entry components. So just from the UI point of view I would vote for a new DV_ type to be considered.

---

## Post #42 by @thomas.beale

[quote="siljelb, post:40, topic:2087"] Reference ranges are usually derived from a distribution (normal or otherwise), with the lower and upper ranges defined by a certain lower and higher percentile [/quote]

Well, they might just as well be defined by physics / biochemistry etc - e.g. a systolic of 180 will have a physical implication for weaker vessels / stroke etc that could be determined by observation of populations, but also (in more recent times) by a physics / fluid dynamics model. Taking a random example: I see in Chernecky & Berger 3rd Ed. that PaCO2 has a reference range termed 'panic values': <= 20mmHg, > 70mmHg. SpO2 panic range: <= 60%. Is this determined from population studies, or is there a model that predicts that <= 60% SpO2 will cause irreversible cell injury? I don't know, but I don't think reference ranges _should_ have to be based only on statistical variance etc.

---

## Post #43 by @siljelb

[quote="thomas.beale, post:42, topic:2087"] I don't think reference ranges *should* have to be based only on statistical variance etc. [/quote]

I agree, and I'm not saying they *always* are. Just usually. (https://en.wikipedia.org/wiki/Reference_range#Standard_definition)

---

## Post #44 by @DavidIngram

I am not knowledgeable enough anymore to weigh the requirement and implementation issues in play in this discussion. It is clearly an important matter, though, as I outlined before.
I hope the following further reflections may be useful in helping decide the immediate issues you face with handling z-scores. Silje's question is important, and the approach she rightly points out for defining a reference range for a relevant, well-defined and well-sampled reference population seems unarguable and useful to have available in the context of interpreting the raw measurement. This use relates directly to the datum itself, and not to a statistic derived both from the datum and from the summary characteristics of an assumed population from which it is drawn (mean and standard deviation, in the context of the Z-score). That is a difference which may carry significant consequences downstream - both for openEHR methodology and for how it is used in practical clinical context. It can open a door to noise and bias in judgement (see below), so needs to be understood carefully. One has to think about how such information is going to be used. I see long lists of clinical measurements and their reference ranges on my phone, in my GP health record. At my age, I get screened from time to time using automated tests of blood biochemistry that report maybe 50 such numbers. The GP is alerted about any that may be drifting outside this range and is invited to comment for the record when they do. Usually 'acceptable' or 'will keep an eye on this next year' or the like, in my case, fortunately, still! It's a rule-of-thumb judgement based on personal experience - the GP's experience of lots of folks in their mid-seventies, like me, and of my particular profile. It is a judgement, and as Kahneman emphasises in his brilliant recent book (Noise: A Flaw in Human Judgment), there is a lot to be said for treating such judgement as akin to a measurement, with associated bias and noise. The examples he uses in developing this argument, notably in judicial proceedings, are mind-boggling! It's a great read. In the clinical world, trends within a statistically defined reference range may well be individually relevant and significant (someone normally at one end of the range and progressing steadily towards the other end, for example). Normal clinical variability in a population is notoriously wide and can mask ongoing trends that are already well advanced when they drift outside the defined reference range. It's hard not to see this surveillance of data merging into the world of analysis and judgement. The issues being debated here are about a grey area between characterising and recording measurement, and characterising and recording judgement and action. The kinds of measurements Thomas was mentioning - partial pressure of CO2 or oxygen saturation - are not so easy, or perhaps useful, to compartmentalise in reference ranges. They reflect all sorts of non-linearities that can quickly become acute problems. The shape of the oxygen dissociation curve is such that the body copes well until deoxygenation progresses to a different region of the curve, where the blood's ability to transport oxygen to support metabolism becomes rapidly more at risk. I doubt anyone would thank their doctors for awaiting a crisis alert at oxygen saturation as low as Thomas quoted! Likewise, the CO2 range he mentions needs to be understood in the context of the ongoing physiological mechanisms of lung function, gas exchange and tissue metabolism. The numbers quoted are so wide-ranging that they mean rather little in this practical context of management. I spent nearly 20 years modelling such physiology and putting it to work in clinical context.
It's hard to get beyond applying fairly simple-to-follow, experience-based rules of thumb in this area and hoping for the best! I've seen this reality at first hand in both neonatal and adult ICU contexts over the years. It seems to me that it is unwise to let decision support-like issues creep too far into data definitions. That's why this particular use case around the Z-score seems a fairly pivotal one. I assume no one is arguing for a machine learning algorithm to become associated with data definitions. If so, we should probably abandon openEHR and let the machines get on with it! We're not quite ready for that (yet, or hopefully ever!), but where do we steer our modelling paradigm as boundary cases like this challenge existing openEHR methodology and require it to evolve further? It's an empirical matter, and that's why implementation, implementation, implementation (in all the domains openEHR interacts with) must remain the top three priorities guiding development of both openEHR methodology and its wider community. That was written into our community ethos from day one. I hope this doesn't sound too irritatingly detached from the nitty gritty of what you are trying to resolve. One tends to a helicopter view of much of life as one gets older!

---

## Post #45 by @Seref

Ok, this time replying to Silje rather than myself from the past. I did a quick (and still incomplete) read of the responses so far, and it looks like even though your original question sounds impartial, based on your later responses you do have a preference, and that is to have z-scores in the RM as an attribute of a suitable type. Which is absolutely fine. That's a preference, and, as most things are, a design choice. Your preference is based on the Cluster-based design (which Tom also mentions in his early responses) producing very unwieldy archetypes, in your own words. If I try to reword my old response: I think having this as an attribute in an existing RM type may lower the precision of RM as an alphabet of clinical data. In exchange for loss of precision in expressing data semantics, you gain convenience in modelling concepts: because of inheritance, attributes are just there for you to populate, no need to express that with a cluster (or so I read your response). We can live with the loss of precision if it's an 80/20 or maybe even a 90/10 situation. The funny thing is, of course, my view may not find support from anybody else. I am more inclined to think about statistical concepts from the perspective of a statistician rather than a clinician, and my view of the domain may therefore be different from that of a clinician, who's used to using a z-score as a pragmatic metric that is implicitly connected to a number of concepts: normality, population mean etc. @bna 's point about why we are even putting this into models is even more interesting, because he makes a good point about this being a kind of metadata, especially if you're adopting a more statistician-like view. The distribution of data is meta-data indeed, just like how it can be displayed on the screen!
I have always been against higher-level concepts leaking into clinical models, so if I adopt @bna 's pov, then I'd have to agree with not having z-scores in the RM or archetypes :)

So I'll suggest we discuss who the stakeholders are:

* clinicians,
* secondary use consumers (statisticians etc),
* software developers (and companies who pay their salaries :) )

and then decide who's our priority here and what kind of agreement would make all stakeholders reasonably happy, or at least leave them grumbling least :)

---

## Post #46 by @thomas.beale

[quote="DavidIngram, post:44, topic:2087"] It seems to me that it is unwise to let decision support-like issues creep too far into data definitions [/quote]

We should heed this advice...

[quote="Seref, post:45, topic:2087"] I think having this as an attribute in an existing RM type may lower the precision of RM as an alphabet of clinical data. [/quote]

Well, only if we put it in the wrong place / wrong way. Dedicated data types, possibly wrapping other raw values are the way to go in my view, not the addition of stats-related attributes into existing data types, which all represent raw data rather than statistically annotated or processed data. If we want to add something to the RM, there will be a way to do it that has no adverse effect on existing data or software.

[quote="Seref, post:45, topic:2087"] So I'll suggest we discuss who the stakeholders are [/quote]

I would say that we can assume that if point-of-care clinicians are looking at z-scores etc, then having a clear representation that doesn't mess with any other data is essential. But the long term priority for statistical annotation / generated values is surely your second category - secondary use consumers (statisticians etc). That implies solving the general problem properly - i.e. representing any kind of distribution (at least, recording the name) and relevant params. We still only want to limit ourselves to the data types we need *in the individual patient record or reprocessed versions thereof* (e.g. HighMed-like) - full CDS, trial study solutions etc will have needs around representing entire populations and their statistical characteristics.

---

## Post #47 by @siljelb

[quote="thomas.beale, post:46, topic:2087"] Dedicated data types, possibly wrapping other raw values are the way to go in my view [/quote]

What would this look like in terms of practical modelling? Would we have to manually change the data type in every use case where we anticipate the need for z-scores or percentiles?

[quote="thomas.beale, post:46, topic:2087"] I would say that we can assume that if point-of-care clinicians are looking at z-scores etc, then having a clear representation that doesn't mess with any other data is essential. But the long term priority for statistical annotation / generated values is surely your second category - secondary use consumers (statisticians etc). That implies solving the general problem properly - i.e. representing any kind of distribution (at least, recording the name) and relevant params. [/quote]

Agree.

---

## Post #48 by @thomas.beale

[quote="siljelb, post:47, topic:2087"] What would this look like in terms of practical modelling? Would we have to manually change the data type in every use case where we anticipate the need for z-scores or percentiles? [/quote]

Good question... I think initially you (modellers) would need to make it fairly black and white: is the intention in this data field to record a z-score (or similar stats piece of info)?
Consider that emerging low-code form & app generators will use what is in the archetype to generate entry widgets / labels / controls. We don't want to see all that extra stats stuff being added to 'normal' fields, but we do want to see it on the fields where we really need it. This doesn't answer your question properly, but I think the CM group needs to analyse not just the data but the question of when you do / might need it.

---

## Post #49 by @siljelb

[quote="thomas.beale, post:48, topic:2087"] Good question... I think initially you (modellers) would need to make it fairly black and white: is the intention in this data field to record a z-score (or similar stats piece of info)? [/quote]

To me, this means that we may have to make breaking changes to a significant number of the published archetypes, making this modelling change. As Heather pointed out, this kind of information will likely be used in an increasing number of use cases in the near future. Does this mean that we should model every DV_QUANTITY like this in the future, to avoid having to make breaking changes if a z-score requirement pops up at a later point?

---

## Post #50 by @thomas.beale

[quote="siljelb, post:49, topic:2087"] To me, this means that we may have to make breaking changes to a significant number of the published archetypes, making this modelling change [/quote]

Hm.

**Solution #1**

That can be an argument for modelling z-score and similar fields in archetypes rather than RM types, if you want to be able to retrospectively add z-score data points on top of existing DV_QUANTITY and similar raw items. You would need to leave the current items where they are and make either direct siblings, or a sibling CLUSTER, for each field of name xxx, named (say) xxx-statistical or so. This will mean the changed archetype will still work with old data, and the old archetype version will work with data created by the new version (i.e. in other systems, apps etc). This will be fairly clunky, and I suspect will not feel very clean for the variety of statistical-related needs going forward.

**Solution #2**

We could add a new optional attribute to (say) DV_QUANTIFIED (or maybe even DATA_VALUE - seeing what @Seref said earlier) that pointed to some DV_STATISTICAL_VALUE kind of type (let's just assume this could be agreed and designed, or @Seref just works it out ;) ). If you opened an old archetype with the Archetype Designer having loaded the new RM with this new attribute, your DV_QUANTITYs and other data types would now have some further archetypeable things that you could constrain - this is where you'd say z-score / normal, or T + params, or X + params etc (see the sketch below). Updated software and forms could now show (say) an optional button next to your raw value for adding a z-score or similar (depending on what you archetyped). As long as all that data were optional, the new software won't die with old data, and old software *probably* won't die with new data that have those fields filled in (it depends a bit on DB representation and method of data materialisation). Note this solution is the opposite of 'wrapping' - indeed it is more like the way reference ranges are attached to Quantity data items.

This kind of change really requires much more careful analysis than I have given here, by (at least) the usual suspects i.e. experts on the clinical side, and @borut.fabjan , @yampeku , @pieterbos to analyse it through properly, plus some secondary processing people.
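A minimal sketch of how Solution #2 could hang together, with hypothetical names and simplified types (`DvStatisticalValue`, its fields and the example values are assumptions for illustration, not a spec proposal):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names and simplified types - a sketch of Solution #2, not a spec.
@dataclass
class DvStatisticalValue:
    value: float                   # the derived score itself, e.g. -1.8
    kind: str                      # e.g. "z-score" or "percentile"
    distribution: str              # e.g. "normal"; named distribution assumed
    source: Optional[str] = None   # reference dataset the score is based on

@dataclass
class DvQuantity:
    magnitude: float
    units: str
    # the new optional attribute mooted for DV_QUANTIFIED:
    statistical_value: Optional[DvStatisticalValue] = None

# FEV1 recorded as a raw value plus its z-score against a reference population:
fev1 = DvQuantity(2.9, "l", DvStatisticalValue(
    value=-1.8, kind="z-score", distribution="normal",
    source="GLI spirometry reference equations"))
```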
I'd recommend a dedicated wiki page to explicate the problem / needs, and probably some dedicated calls to figure out all angles of proposed technical change impact on tools, data etc. You have opened not a can of worms but a reservoir of alligators ;)

---

## Post #51 by @siljelb

[quote="thomas.beale, post:50, topic:2087"] Solution #2 [/quote]

I like this suggestion.

[quote="thomas.beale, post:50, topic:2087"] This kind of change really requires much more careful analysis than I have given here, by (at least) the usual suspects [/quote]

I'd be happy to participate. We should probably find some clinicians who are particularly well versed in statistics too.

[quote="thomas.beale, post:50, topic:2087"] You have opened not a can of worms but a reservoir of alligators :wink: [/quote]

I suspected as much when I started the thread. We've discussed this at least twice before, including the thread referenced by Seref, without finding a workable solution. Hopefully third time's the charm :crossed_fingers:

---

## Post #52 by @bna

[quote="siljelb, post:49, topic:2087"] this kind of information will likely be used in an increasing number of use cases in the near future. [/quote]

I have not yet seen any evidence or examples of this. It would be great to see some examples. The reason I am asking is of course biased by my initial thinking that this kind of information is not part of the datum/measurement. I would appreciate a concrete example for this kind of modelling.

---

## Post #53 by @Seref

[quote="thomas.beale, post:50, topic:2087"] or maybe even DATA_VALUE - seeing what @Seref said earlier [/quote]

I'd say that feels like a move towards a god class. I'd live with it in the name of being pragmatic to keep modellers happy, but dv_date_time ending up with an optional z-score will lead to some interesting conversations years down the line when someone asks "why's this here?". As I said before, it's a tradeoff.

[quote="thomas.beale, post:50, topic:2087"] This kind of change really requires much more careful analysis than I have given here, by (at least) the usual suspects i.e. experts on the clinical side, and @borut.fabjan , @yampeku , @pieterbos to analyse it through properly, plus some secondary processing people. [/quote]

Yes - stakeholders, as I called them.

[quote="thomas.beale, post:50, topic:2087"] I'd recommend a dedicated wiki page to explicate the problem / needs, and probably some dedicated calls to figure out all angles of proposed technical change impact on tools, data etc. [/quote]

Wholly agreed.

---

## Post #54 by @thomas.beale

[quote="Seref, post:53, topic:2087"] I'd say that feels like a move towards a god class. [/quote]

We don't do God classes, only ancestors ;) I wasn't being 100% serious about DATA_VALUE, but we do need an analysis of whether a statistical value type is needed for anything outside DV_ORDERED or Quantity types. My gut feel is that's as far as we should go, but others may know better. When we know what existing data types we want to apply statistical annotations / values to, we'll be able to see how to model it properly.

---

## Post #55 by @heather.leslie

We do need to plan for managing the way data might be collected and used in the future. We have a well-documented example in front of us, supported by academic papers. The Spirometry example is perfect - it's a new approach developed only in the past couple of years. I'd never heard of it before. And it makes good clinical sense.
As personalised medicine comes to the fore, the Z-score for various measurements such as spirometry - identifying where someone sits in the bell curve for their personal characteristics, genetic profile etc - will become more meaningful. The raw value recorded will still need to be stored, but the Z-score that places that raw data within a population context will likely be used to determine severity or treatment options, and warrants storage as well. It could well be that the Z-score, rather than the raw score, becomes the trigger for CDS. We don't have a crystal ball here to foresee how much medicine changes, but as we get more data on populations, the capacity to refine clinical treatment to the individual within an appropriate population context will rely on these kinds of derived data.

---

## Post #56 by @joostholslag

I've been tracking this topic. At first I wasn't really interested, since I've not come across this requirement. But I do think Heather is right: with the advance of personalised medicine, z-scores are very useful. I would like to add one perspective that may explain a bit about the disagreement between Silje and Bjørn: as stated before, z-scores indicate the deviation of the recorded measurement from a population. This is not an observed value in itself. But since the population data is/was pretty fixed, the z-score became pretty much a 1:1 transformation of the observed value. And this feels like an observable entity. But it isn't, since the population data will change, changing the z-score, and this is against the principle of an observable entity: it shouldn't change if circumstances change. This currently is a negligible problem, since the population data is very fixed, the observed value is event-based, and the relevance of the observed data expires long before the population data (and thus the z-score) changes. But with the advance of personalised medicine, the population data may get tailored to the individual, and thus become much smaller and more sensitive to change. It may even be continuously calculated from live data, or even estimated using predictive statistics/ML. And so it gets increasingly important to record and show the relevant metadata to the user interpreting the data, e.g. population used, population data last updated, calculating algorithm etc. I'm worried about storing this info in the observation, since it's not observable and, as I've shown, is not going to feel close to that for long. How to solve this I don't know. The main use case is decision support, I'd say. That's both supporting the user to interpret the data as well as computerised decision support. So storing the z-score as an evaluation already feels a little better. But still this feels off, since there's no human interpretation/evaluation of the data involved. A new entry type might be a bit much? How about GDL algorithms? But where would that algo then store the z-score? Does it have to, since it could be live-calculated at the moment of display? On the other hand, if a clinician bases their decisions on the z-score, you do want to record it as it was at the moment of decision making, if only for legal auditability. Recording it in the observation might be the best place. But recording it in the element of the original measured value feels wrong, conceptually. A fourth archetypeable attribute of the observation (and parents) class, next to data, state and protocol?
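An aside to make the shifting-population point concrete: the same observed value yields a different z-score (and percentile) as the reference dataset changes. A minimal sketch, with hypothetical populations and numbers:

```python
from statistics import NormalDist

# Same measured value, two reference populations -> two different z-scores.
# This is why provenance metadata (population used, last updated) matters.
height_cm = 150.0
pop_old = NormalDist(mu=148.0, sigma=6.5)   # hypothetical reference dataset
pop_new = NormalDist(mu=151.0, sigma=6.0)   # hypothetical updated dataset

z_old = (height_cm - pop_old.mean) / pop_old.stdev   # ~ +0.31
z_new = (height_cm - pop_new.mean) / pop_new.stdev   # ~ -0.17
pct_old = pop_old.cdf(height_cm) * 100               # ~ 62nd percentile
```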
I do tend to agree the z-score and the like should be solved at RM level, mainly because it's a fundamental concept and the usage feels too universal for a hack like archetyping with clusters, as suggested. Tough stuff.

---

## Post #57 by @siljelb

[quote="joostholslag, post:56, topic:2087"]
if a clinician bases a decision on the z-score, you do want to record it as it was at the moment of decision making
[/quote]

I agree with you this isn't easy, but I think this is a key point. A central question then becomes: will z-scores ever be recalculated at a later point based on an old measurement, or will they always be based on a fresh measurement of the relevant clinical measurement? And if they're recalculated, will the new scores be reinserted into old compositions as a new version, or will they be persisted as a new composition referencing the old measurement?

---

## Post #58 by @joostholslag

[quote="siljelb, post:57, topic:2087"]
will z-scores ever be recalculated at a later point based on an old measurement, or will they always be based on a fresh measurement of the relevant clinical measurement? And if they're recalculated, will the new scores be reinserted into old compositions as a new version, or will they be persisted as a new composition referencing the old measurement?
[/quote]

Well, this is mainly a clinical question. I'd say the calculated z-value should be recorded at the time of decision making, and if the population changes it shouldn't be recalculated automatically, because that may make your decision 'wrong'. So what may happen is: if the population data (or the patient characteristics on the basis of which a population was selected) changes after the decision was made, you may want to re-evaluate that decision and record a new (version of a) composition that records both the new z-score (part of a new version of the observation that contains the measured value) and the new decision as an observation.

We have to be careful with the timing though: we must not change the time of the moment of observation in a new version. The RM supports this fine, but it's easy to go wrong in implementation; since the time of recording is usually so close to the time of measurement, it is often used as a surrogate, which goes wrong if you change the z-score in the observation days later. Another argument against storing the z-score in the observation.

---

## Post #59 by @siljelb

Hm. Do I understand you correctly if I read this as a suggestion to record a Z-score as an entirely separate ENTRY archetype? I can see some pros to this approach, but I suspect clear bindings to the measurement the Z-score was based on would be messy at best. We'd probably need to use the LINK attribute as a direct reference to the specific data element of the measurement, which is explicitly discouraged in the specs... (https://specifications.openehr.org/releases/RM/latest/docs/common.html#_link_class)

---

## Post #60 by @ian.mcnicoll

I'm persuaded by Heather and Silje's examples that whilst this is right now a bit unusual, it may well become more common as a different way of expressing the results of an Observation, and probably merits RM support.

I can also see that it is closely related to the raw datapoint, but I'm wary about overloading quantity/amount yet again with even more content which is quite obscure for most use-cases. I'm coming round to the idea of a new datatype, as I can see all sorts of metadata needing to be captured alongside the Z-score.
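As a small illustration of why that metadata matters: a z-score and a percentile are interchangeable only when the reference distribution is normal (or has been transformed to normality first). A minimal sketch in Python, assuming `scipy` is available:

```python
from scipy.stats import norm

# Under a normal reference distribution, z-score and percentile are
# two views of the same quantity:
z = 1.0
percentile = norm.cdf(z) * 100       # the 84.1st percentile
z_back = norm.ppf(percentile / 100)  # recovers z = 1.0

# For skewed raw distributions (e.g. the GLI spirometry equations),
# z cannot be computed as (x - mu) / sigma; methods such as LMS
# transform the raw value first. So the method and reference data
# used must be stored alongside the score for it to be interpretable.
```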
We already handle other 'derived values', such as calculated scores (NEWS2), as separate Elements, and that feels like a good compromise; it is also perhaps easier to explain to newbies than delving deep into an overloaded quantity datatype.

---

## Post #61 by @siljelb

[quote="ian.mcnicoll, post:60, topic:2087"]
I'm coming round to the idea of a new datatype
[/quote]

How do you envisage this new data type being used?

[quote="ian.mcnicoll, post:60, topic:2087"]
I can see all sorts of metadata needing to be captured alongside the Z-score
[/quote]

Agree.

---

## Post #62 by @pieterbos

[quote="ian.mcnicoll, post:60, topic:2087"]
We already handle other 'derived values', such as calculated scores (NEWS2), as separate Elements, and that feels like a good compromise; it is also perhaps easier to explain to newbies than delving deep into an overloaded quantity datatype.
[/quote]

Yes, just an extra element would work. Perhaps in a way it is not so different from the case where some score has multiple intermediate scores or even multiple final scores. Those are directly calculated from the data, but need to be stored separately as well, to ensure the data a decision was based on is stored exactly as it was shown at that point in time.

[quote="ian.mcnicoll, post:60, topic:2087"]
I can see all sorts of metadata needing to be captured alongside the Z-score
[/quote]

Should those be separate elements in the protocol of the observation?

---

## Post #63 by @joostholslag

[quote="pieterbos, post:62, topic:2087"]
Those are directly calculated from the data,
[/quote]

That's exactly the difference: z-scores are not directly calculated from the data, they are calculated relative to a reference population data set. Currently that data set is often so fixed (e.g. head circumference for boys) that it feels like a direct calculation. But it isn't, and we'll get into trouble if we make design decisions on that false assumption.

I'm fine with saying it's too hard to solve right now, so let's be pragmatic and go with a separate element in an archetype, or even a cluster with a raw-value-plus-statistical-metadata pattern. But that's postponing the hard stuff, and lets technical/modelling debt build up. That could be fine if we say this is a niche thing. But if, as all the clinical people in this thread do, we say this is going to be a 'bigger' thing, this is our chance to do it right.

---

## Post #64 by @joostholslag

[quote="siljelb, post:59, topic:2087, full:true"]
Hm. Do I understand you correctly if I read this as a suggestion to record a Z-score as an entirely separate ENTRY archetype? I can see some pros to this approach, but I suspect clear bindings to the measurement the Z-score was based on would be messy at best. We'd probably need to use the LINK attribute as a direct reference to the specific data element of the measurement, which is explicitly discouraged in the specs… ([Common Information Model](https://specifications.openehr.org/releases/RM/latest/docs/common.html#_link_class))
[/quote]

It was what I was 'suggesting', just to keep a broad view of possible solutions, and because conceptually it doesn't fit the current entry classes. But you make a good point against it, and I dislike the inelegance of it myself. Do we handle any similar derived/secondary data in other archetypes? I don't agree with the NEWS pattern comparison, since it (1) is a 'self-standing' datapoint and (2) only depends on the chosen calculation algorithm, not on an external dataset (yet).
So recording the values in data and the algorithm in protocol is a nice conceptual match. Not so for the z-score.

---

## Post #65 by @pieterbos

[quote="joostholslag, post:63, topic:2087"]
That's exactly the difference: z-scores are not directly calculated from the data, they are calculated relative to a reference population data set.
[/quote]

Even those 'directly calculated from the data' have a method or formula in between that is not stored in the data. The difference is that with the z-score some external data is used (the distribution). But even that is not so different from some of the calculation methods of scores, where for example the calculation is different for different ages or genders. And even if it is viewed as more complex than directly calculated from data, that reinforces the point I was trying to make: if we already store calculated values from more straightforward calculations, I think it should be stored in this case as well. And it could be stored in the same way as we do with scores, with just an extra element, and perhaps some extra information in protocol to store what steps were followed to calculate these scores.

---

## Post #66 by @ian.mcnicoll

No - I think they are tightly bound to the Z-score. I'd actually prefer to tag elements as being protocol vs data etc. rather than the current structural commitment, as although it works well in most cases, there are situations where e.g. multiple events need different protocol (different devices); it is also easier to change if you get it wrong.

---

## Post #67 by @joostholslag

[quote="pieterbos, post:65, topic:2087"]
Even those 'directly calculated from the data' have a method or formula in between that is not stored in the data. The difference is that with the z-score some external data is used (the distribution). But even that is not so different from some of the calculation methods of scores, where for example the calculation is different for different ages or genders. And even if it is viewed as more complex than directly calculated from data, that reinforces the point I was trying to make: if we already store calculated values from more straightforward calculations, I think it should be stored in this case as well. And it could be stored in the same way as we do with scores, with just an extra element, and perhaps some extra information in protocol to store what steps were followed to calculate these scores.
[/quote]

Well, I largely agree: if there are calculations happening, they should be in the rules section of the archetype actually executing the calculation, and/or there should be some datum that states which calculation is used, so as to allow the consuming clinician to interpret the data. What we could do is record the z-score as an element in data, and add elements for metadata (population, algorithm etc.) in protocol. But the clinicians expect this to be such a common use case that it would have to be modelled in most archetypes, including changing most current ones. Since this is a large amount of manual work, it would be nice if it were solved by the RM. That can even solve it for current archetypes (by updating the RM version), right?

[quote="ian.mcnicoll, post:66, topic:2087"]
I'd actually prefer to tag elements as being protocol vs data
[/quote]

I tend to like the idea. But inheritance is problematic here: protocol is in CARE_ENTRY, while data is only in OBSERVATION and EVALUATION. And would you do the same for state? How would you handle the difference between the state of the entire observation vs the state of an element?
If we can only do protocol under data, not state, it is not very useful, right?

---

## Post #68 by @thomas.beale

[quote="joostholslag, post:63, topic:2087"]
That's exactly the difference: z-scores are not directly calculated from the data, they are calculated relative to a reference population data set. Currently that data set is often so fixed (e.g. head circumference for boys) that it feels like a direct calculation. But it isn't, and we'll get into trouble if we make design decisions on that false assumption
[/quote]

Yes, this is an important point.

---

## Post #69 by @thomas.beale

[quote="pieterbos, post:62, topic:2087"]
Should those be separate elements in the protocol of the observation?
[/quote]

Please not! Protocol is non-data in the sense that it provides no clinical semantics, only info about methods of doing things.

---

## Post #70 by @thomas.beale

[quote="joostholslag, post:67, topic:2087"]
if there are calculations happening, they should be in the rules section of the archetype actually executing the calculation, and/or there should be some datum that states which calculation is used, so as to allow the consuming clinician to interpret the data.
[/quote]

This only works for calculations whose inputs are wholly within the data set of the archetype. Calculations that are effectively inferences relying on other knowledge resources have to be done elsewhere, e.g. in executions of CDS / guideline algorithms.

[quote="joostholslag, post:67, topic:2087"]
What we could do is record the z-score as an element in data, and add elements for metadata (population, algorithm etc.) in protocol.
[/quote]

This will be fractured bits and pieces, and a misuse of the protocol subtree, if the z-scores are to be considered clinically relevant data.

[quote="joostholslag, post:67, topic:2087"]
data is only in OBSERVATION and EVALUATION
[/quote]

I thought observable raw values were the scope of discussion here? Recording of z-scores in Evaluations could make sense of course - maybe more so, since they might be considered a kind of interpretable assessment rather than raw data. But there is no obstacle to doing this if the appropriate new data types are created.

---

## Post #71 by @heather.leslie

These are definitely data and need to be recorded as such. Remember that there may be more than one Z-score per archetype. In Spirometry there will be more than one measurement that can have an associated Z-score. BP has 4 measurable data elements. See the pattern...?

We can assume that each measurement in an archetype references the same population cohort for deriving a Z-score, but it is an assumption, and as soon as we lock that in, no doubt we'll find a use case to break it. So we need to record the metadata that enabled each Z-score to be derived, and commit it alongside the raw measurement so that we have provenance/foundation for how the treatment was chosen. If the cohort or derived value changes, the treatment protocol may also need to change, and we need to consider how we might model that, but the intimate connection of the original raw data/Z-score pair should remain intact in the record. For that reason, I'm increasingly convinced that they should be recorded within the same model, as something stronger than just a link. We record interpretations of the results of a scale/score or examination findings in OBSERVATIONs all the time.
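To illustrate the shape of data this implies, here is a minimal, purely hypothetical sketch; every name below is invented for illustration and is not an existing archetype path or RM attribute:

```python
# Hypothetical data shape only; names are illustrative.
spirometry_observation = {
    "FEV1": {
        "raw": {"magnitude": 3.1, "units": "L"},
        "z_score": -1.2,
        "reference": "GLI-2012, female, age 34, height 168 cm",
    },
    "FVC": {
        "raw": {"magnitude": 4.0, "units": "L"},
        "z_score": -0.4,
        "reference": "GLI-2012, female, age 34, height 168 cm",
    },
}
# Nothing forces the two 'reference' entries to be the same cohort,
# which is exactly why the shared-cohort assumption should not be
# locked into the model.
```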
---

## Post #72 by @thomas.beale

[quote="heather.leslie, post:71, topic:2087"]
We can assume that each measurement in an archetype references the same population cohort for deriving a Z-score, but it is an assumption, and as soon as we lock that in, no doubt we'll find a use case to break it.
[/quote]

Yep - I wouldn't assume that, for exactly that reason; as soon as you do, there'll be some data point about a donated organ or other family member that needs another population distribution recorded for it.

[quote="heather.leslie, post:71, topic:2087"]
So we need to record the metadata that enabled each Z-score to be derived, and commit it alongside the raw measurement so that we have provenance/foundation for how the treatment was chosen.
[/quote]

Sounds right to me.

[quote="heather.leslie, post:71, topic:2087"]
For that reason, I'm increasingly convinced that they should be recorded within the same model, as something stronger than just a link.
[/quote]

I also would prefer not to use the Link solution.

---

## Post #73 by @pieterbos

I am not entirely convinced that it should be a change to the RM - is there really a big switch in the field happening towards percentiles and z-scores as a basis for interpreting observations, and will this be done in so many places that just modelling these will not be enough? And does it really need to be stored, or is this rather something for some kind of app?

If it should, a possibility could be to add one new attribute to DV_AMOUNT or perhaps even DV_ORDERED (I am not sure whether ordinal and scale should support this). It could be called something like `population_comparison` or `frequency_distribution_score` or even just `population_score`, or perhaps a better term exists. Its type would then be a new class, whose properties would be the score itself (a DV_QUANTITY or a Real?), a coded text indicating which type of score this is (z-score, percentile, t-score, etc.), and a text indicating which distribution data is used. The cardinality of the attribute could be 0..1, or even 0..*, in case multiple scores need to be recorded. It is probably necessary to add a normal range here as well, since its purpose is evaluating the range of the observed value.

This has several benefits over a new DV_ type:

- Data stored in the RM is backwards compatible
- This can be used without any changes in existing archetypes
- Existing archetypes or templates can easily be changed to add constraints to the new scores
- It keeps the type of the recorded data the same, which conceptually seems right

Alternatively, perhaps a DV_POPULATION_SCORE could be made, as a separate element from the recorded quantity, to not have to change all archetypes. The main drawbacks I see are that it has a less explicit binding to the recorded observation, and that it requires explicit modelling to use.

---

## Post #74 by @DavidIngram

I've spent an evening digging back into the history of the z-score in the standardization and interpretation of clinical measurements. I started with spirometry and then other areas. These publications (a few listed below in case of interest) had interesting results and insights. There are different ways reported for calculating z-scores, and comparative studies of their efficacy in standardizing measurement and assisting clinical interpretation. Taking just one example – a measurement of something like vital capacity may involve a different reference population and computational method when standardizing it as a z-score, according to the volume measured.
That hadn't occurred to me, but it seems very plausible when one thinks about it. The record of the method used in calculating a z-score involves more detail than the mean and standard deviation of a Gaussian distribution (mu and sigma) – methods also take account of distribution skewness (lambda), and the numerical methods used in calculating the z-score often look to be quite complex algorithms. One author warned of situations with the potential to amplify measurement error through use of the z-score, and another group made an extensive study of clinical severity ratings and how they changed according to the different methodologies in use.

I'm not in any way disagreeing that clinical use of the z-score must find a place in records – clearly it must. What seems apparent from my brief foray into ten years of research and international standardization of measurement through the z-score is that it has clear advantages, and also details that its users should be kept aware of and in touch with. There is also caution expressed about the validity of the standardization methods used in relation, for example, to different ethnic groups. If used as a proxy measurement, one needs to be aware, as one of the papers shows, that cross plots of z-score and raw respiratory measurement have considerable scatter; how much of this is a limitation of the raw data and how much arises through computation of the z-score is an empirical question which is still being explored in many different contexts.

As I said before, I'm not sufficiently au fait with the current openEHR clinical/technical sequelae of this discussion, but the issues and choices do seem to merit the careful and cautious consideration shown in the posts thus far. I'm just setting out here the chain of thought that Bjørn's questions have stimulated in me.

David

Some stuff I read and found useful, in case of interest:

- The Global Lung Function Initiative: dispelling some myths of lung function test interpretation | European Respiratory Society [https://breathe.ersjournals.com/content/9/6/462](https://breathe.ersjournals.com/content/9/6/462)
- Spirometry, Static Lung Volumes, and Diffusing Capacity [http://rc.rcjournal.com/content/respcare/62/9/1137.full.pdf](http://rc.rcjournal.com/content/respcare/62/9/1137.full.pdf)
- Advances in spirometry testing for lung function analysis [https://doi.org/10.1080/17476348.2019.1607301](https://doi.org/10.1080/17476348.2019.1607301)
- Optimisation of children z-score calculation based on new statistical techniques [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0208362](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0208362)
- The use of Z-scores in paediatric cardiology [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3487208/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3487208/)

The story wrt anthropometric data like height and weight and growth charts looks very well established.

---

## Post #75 by @joostholslag

[quote="heather.leslie, post:71, topic:2087"]
Remember that there may be more than one Z-score per archetype. In Spirometry there will be more than one measurement that can have an associated Z-score. BP has 4 measurable data elements. See the pattern...?
We can assume that each measurement in an archetype references the same population cohort for deriving a Z-score, but it is an assumption, and as soon as we lock that in, no doubt we'll find a use case to break it. So we need to record the metadata that enabled each Z-score to be derived, and commit it alongside the raw measurement so that we have provenance/foundation for how the treatment was chosen. If the cohort or derived value changes, the treatment protocol may also need to change, and we need to consider how we might model that, but the intimate connection of the original raw data/Z-score pair should remain intact in the record. For that reason, I'm increasingly convinced that they should be recorded within the same model, as something stronger than just a link.
[/quote]

Agreed.

[quote="pieterbos, post:73, topic:2087"]
I am not entirely convinced that it should be a change to the RM - is there really a big switch in the field happening towards percentiles and z-scores as a basis for interpreting observations, and will this be done in so many places that just modelling these will not be enough?
[/quote]

@heather.leslie I agree with the other clinicians that there's a clear expectation. But I also think Pieter's is a fair question, since modelling in the RM should generally be minimised, and there are currently few archetypes I'm aware of that need the z-score. Are there any data or papers that back up our expectations?

[quote="pieterbos, post:73, topic:2087"]
If it should, a possibility could be to add one new attribute to DV_AMOUNT or perhaps even DV_ORDERED
[/quote]

I think this is a nice approach. It keeps the transformed data very close to the raw value, while still indicating it's a transformation of that raw data, not an observed value. And it's nice for backwards compatibility indeed. So to summarise, the proposal is to store the observed data, e.g. length, in the value attribute, and to add optional attributes for:

* transformed value (other name suggestions?)
* algorithm for transformation (z/t score, reference, custom raw algorithm)
* comparison dataset
* normal range

0..* makes sense. But we'd have to find a way to group the attributes; they should occur together, not individually.

---

## Post #76 by @siljelb

[quote="joostholslag, post:75, topic:2087"]
[quote="pieterbos, post:73, topic:2087"]
If it should, a possibility could be to add one new attribute to DV_AMOUNT or perhaps even DV_ORDERED
[/quote]
I think this is a nice approach. It keeps the transformed data very close to the raw value, while still indicating it's a transformation of that raw data, not an observed value. And it's nice for backwards compatibility indeed. So to summarise, the proposal is to store the observed data, e.g. length, in the value attribute, and to add optional attributes for:

* transformed value (other name suggestions?)
* algorithm for transformation (z/t score, reference, custom raw algorithm)
* comparison dataset
* normal range
[/quote]

I also agree with this suggested approach. I suspect DV_AMOUNT is sufficient; I would be very surprised if a Z-score is needed for a DV_ORDINAL or DV_SCALE.

[quote="joostholslag, post:75, topic:2087"]
0..* makes sense. But we'd have to find a way to group the attributes; they should occur together, not individually.
[/quote]

I think this can be handled in the same way as `other_reference_ranges`, i.e. as a `List` of objects?
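A minimal sketch of what that grouping might look like, written as Python dataclasses purely for illustration; the class and attribute names here are placeholders, not proposed RM names:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PopulationScore:
    """One derived score, grouped with the metadata needed to interpret it."""
    value: float                                # e.g. -1.2
    score_type: str                             # coded: "z-score" | "percentile" | "t-score"
    method: Optional[str] = None                # e.g. "GLI-2012 LMS equations"
    reference_population: Optional[str] = None  # cohort the score is relative to
    normal_range: Optional[Tuple[float, float]] = None


@dataclass
class AmountWithScores:
    """Stand-in for a DV_AMOUNT-like value; only the proposed addition is shown."""
    magnitude: float
    units: str
    population_scores: List[PopulationScore] = field(default_factory=list)  # 0..*
```

Grouping the metadata inside each score object, rather than as parallel optional attributes, is one way to guarantee the items occur together rather than individually.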
---

## Post #77 by @thomas.beale

[quote="siljelb, post:76, topic:2087"]
I think this can be handled in the same way as `other_reference_ranges`, i.e. as a `List` of objects?
[/quote]

I feel something good coming together! Only 77 posts to get here ;)

---

## Post #78 by @siljelb

[quote="thomas.beale, post:77, topic:2087"]
I feel something good coming together! Only 77 posts to get here :wink:
[/quote]

Great! How do we progress from here? :smiley:

---

## Post #79 by @thomas.beale

You or @heather.leslie or anyone (but preferably from the clinical side) creates a PR in the usual place, and puts a small summary of the bits of this conversation that matter into the description field (plus a link to this thread). (We like to do it this way so that the author of the PR is actually the person or group for whom the problem exists; then we can see a historical record of requests coming from the CM group, or anywhere else for that matter.)

---

## Post #80 by @DavidIngram

Having seen the detail of how z-scores are calculated and used in practice, it will be good to work from this as a use case in specifying what you wish to be recorded and where you wish it to be accessible. Respirology has been adduced most often in this thread, so maybe a respiratory physician might be found who is able and willing to help with this.

---

**Canonical:** https://discourse.openehr.org/t/z-scores-and-percentiles/2087
**Original content:** https://discourse.openehr.org/t/z-scores-and-percentiles/2087