Pattern for modelling of scores (and PEWS)

We have recently been looking into the requirements around PEWS in Sweden.

I am by no means an expert on the topic, but PEWS seems to be interpreted differently depending on country and other contexts. There does not seem to be any consensus on how to calculate a PEWS score, and there is no PEWS archetype in the international CKM either.

I had a brief discussion with @siljelb, who mentioned the Norwegian PEWS archetype (https://arketyper.no/ckm/#showarchetype_1078.36.1402). It turns out that in Norway, PEWS is always calculated in the same way, whereas in Sweden the calculation differs based on patient age.

My initial thought was that this means we cannot share the Norwegian PEWS archetype, especially as it has the scoring rules for the calculation explicitly modelled in the archetype, e.g. which respiration interval corresponds to which score.

Then I started to think about the impact for us in Sweden of modelling the actual scoring rule into the archetype like this. Should we then have five or so different PEWS archetypes, one for each patient age interval? That does not sound reasonable.

Instead, wouldn't it be better not to include this type of rule inside the archetype? That would let us create one common international PEWS (assuming we can agree on the different types of observations that could be included in the rule, or agree to include the maximum set). A PEWS is a PEWS. You would then have some other attribute (as part of the Protocol?) where you define which rule (a GDL rule?) the score was based on.

I can of course see a hardcore informatics view on this: a PEWS calculated by one formula is not exactly the same as a PEWS calculated by another formula, hence they need to be modelled as different archetypes. But what benefit does this really give? It won't make life easy for implementers or enable interoperability. Can't we see PEWS as a score for assessing the clinical status of paediatric patients, regardless of the method of getting to the score?

There are other concepts like this. We are currently looking into the RETTS triage method, which is common in Sweden. It also has age intervals impacting the method of determining the RETTS triage level. But having discussed that use case, the RETTS score is seen as the same regardless of how it has been calculated. Meaning, we should probably not model different RETTS archetypes per age interval, and we should probably not model the rule inside the RETTS archetype.

Question
Isn't it a more sustainable pattern for modelling scores and similar concepts to not include the actual formula, rule, protocol etc. that gives you the score inside the archetype, and instead include some type of metadata for this in the protocol or similar?

1 Like

Hi Martin,

This is tricky!! There is a new UK PEWS score coming out (it would have been released by now if not for COVID-19). I will try to see if it is possible to get a heads-up on the changes.

Where a score has different variants because of age or other significant criteria, it is always a difficult decision whether to try to cope with the differences in a single archetype, which is easier to manage. But often the differences become so convoluted that it is just not possible to shoehorn all of the variants into a single artefact - you could even argue that NEWS/PEWS is a variant of this. In the brief look I had at PEWS2, my initial thought was that it could be modelled in a single archetype, but that was a very early judgement. So the single archetype approach may work for PEWS2, but I'm not sure we could make that principle accommodate all of the various flavours safely and manageably. Once you start to add in something like GDL or internal rules, it all starts to get very complex, and if nothing else very hard to explain to clinical colleagues and developers.

“A PEWS is a PEWS” - is it? I get that the difference may seem unimportant from a local frontline perspective, but it will matter to people doing research, and it may matter when doing cross-national handovers, where there are subtle differences in scoring or interpreting the scores.

I’d love to see a single international score, but I don’t think we can force that through archetype development - we can certainly support it, but it would have to come from significant clinical leadership.

The issue of where to include calculation rules is an interesting one. Right now we can only really use GDL, which of course makes perfect sense since the calculations often involve raw inputs from a number of archetypes. There is actually a feature in ADL called ‘rules’ which allows some sort of rule syntax to be embedded in a single archetype. The problem in populating that, for now, is deciding what language to use - an ‘Expression language’ is being worked on right now, based on experience from GDL and Task Planning.

So I’d probably agree with your last statement, in principle: where there is strong clinical consensus that ‘PEWS is PEWS’ or ‘RETTS is RETTS’, there may be a case for having a single archetype. However, I suspect that in many cases getting that clear consensus may be difficult, and actually building the models and making them manageable might then be tricky.

4 Likes

I fully understand that maybe a PEWS is not always a PEWS, or at least, I am not the person to say whether that is the case :slight_smile:

But can't you argue the same about other clinical concepts? Take blood pressure: different blood pressure measurements are not comparable depending on their state and/or protocol. Couldn't you argue that whether PEWS scores are comparable depends on your use case? Either your use case is more generic, i.e. interested only in the overall concept of paediatric early warning scores regardless of how they have been calculated, or your use case is more precise, in which case you use something like the PEWS protocol and/or data to distinguish them.

What would be the recommended approach for the Swedish PEWS, with its different age intervals, if we do not model the rule outside the archetype?

This does not seem to be a unique requirement for PEWS in Sweden either; this seems to be the case for the Scottish PEWS too: “There are five age appropriate charts (0-11mths; 12-23mths; 2-4yrs; 5-11yrs; >12years).”
https://www.clinicalguidelines.scot.nhs.uk/ggc-paediatric-guidelines/ggc-guidelines/intensive-and-critical-care/paediatric-early-warning-score-pews/

If we come to the conclusion that there should be a single archetype within a Swedish (or Scottish) context, couldn't that reasoning be used to argue for a single international archetype as well? And will modelling them as separate archetypes really help e.g. research, where all PEWS are localized in their implementation?

PEWS is an example we are currently looking at, but I think we could have the same discussion around many similar concepts. As you say, NEWS is the same, but seems to have less divergence in the rules.

I am a bit new to this, and especially the GDL side of things, so I need to look into that a bit more and how it can be tied in with ADL. Thanks for the tip! :slight_smile:

1 Like

@ian.mcnicoll, I think this comes down to two separate questions:

1) Are “the same” scores, calculated based on different formulas, the same concept that can be modelled as one archetype? E.g. is a PEWS always a PEWS?
-I guess the answer is “it depends” and requires clinical review per score

2) Should decision support rules, business logic etc. be modelled into the archetype?
-This seems to be the norm currently for scores, but it can't be a sustainable model, can it? Doesn't that force us in Sweden to model five PEWS archetypes? In RETTS (https://predicare.eu/) there are nine age intervals; does that mean we should do nine RETTS archetypes?

It would be great to have a scalable pattern for scores, as they are such a common use case. Is the current pattern for scores really sustainable?

IMHO the answer to this question is generally “no”. Can you safely and predictably compare two *EWS scores made based on different thresholds? If not, they are really separate things, and should be treated as such.

In general, I don’t think decision support rules should be built into archetypes. Simpler business logic like the formula to calculate a score could be though, if we had an established language to do so. I don’t claim to have the answer for your question about the Swedish PEWS, or RETTS, but in general I’d tend towards modelling them as different archetypes, for the reason outlined above. If I understand the way these are built correctly, they’re in principle no different from GCS vs pGCS. Should those be modeled as one archetype?

I realise I’m going to come across as a naysayer and party pooper here, but here goes: This is a very tempting idea, but based on experience I think this will turn out to be more complex and less useful once you get into the details. Some clinical concepts lend themselves to generic models, but those are usually quite tricky to get right, and always depend heavily on terminology to avoid mixing up different things.

I understand the argument about sustainability, and there are a lot of scores and scales out there. However, they’re usually fairly straightforward to model (with some exceptions), and once made they’re usually very stable. I think we’d make more headaches for ourselves trying to shoehorn a new score or scale into a generic score archetype pattern rather than just churning out a new archetype.

Something we maybe should discuss though, is the review requirements for score and scale archetypes. Do they really need a proper community review, or could they be fast tracked through by having a few people compare the source materials with the archetypes?

2 Likes

@siljelb, can’t you argue the same about two different blood pressure measurements? Excuse the poor example, but let's say one where the patient is at rest and lying down, and the other after significant physical effort, standing up and with a different measurement location. I.e., blood pressure measurements are not always comparable, but they're still the same generic concept of blood pressure. Modelling them as different archetypes makes things more difficult for interoperability, implementation etc.

I also get your point: with my argument you could also say there should be a “score” archetype that caters for all scores (NEWS, PEWS etc.), where the scoring formula is kept elsewhere and only referenced through the score archetype metadata (protocol etc.). That is not what I am arguing for :slight_smile:

I have little experience in modelling archetypes, so I value your and the rest of the community's input (and may very well soon realize I'm way off here :D)! But from other areas of modelling and separating responsibilities in a large and complex EHR, this doesn't feel scalable to me. I'm just worried that we are not using the right tools in the wider informatics toolbox, and are instead treating everything as a nail because we have our well-known archetype-model hammer.

1 Like

Hi,

This is a very interesting topic. We seem to lack a consensus regarding the best way to express these second order information structures.

@martin.grundberg You make a good point with the BP example. The multiple nodes on that archetype accommodate the differences between each kind of measurement. They change the interpretation of the values, and direct comparisons should be avoided. Moreover, there's even a calculation included in the archetype (mean arterial pressure).
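To make the mean arterial pressure point concrete, here is a minimal sketch of that kind of intra-archetype calculation, using the conventional MAP estimate (diastolic plus one third of the pulse pressure). The function name and the example reading are illustrative, not taken from the archetype itself:

```python
def mean_arterial_pressure(systolic: float, diastolic: float) -> float:
    """Estimate MAP from a single BP reading.

    This is the common approximation (diastolic + 1/3 of pulse pressure),
    the same kind of simple, self-contained formula that the blood
    pressure archetype can carry as an internal calculation.
    """
    return diastolic + (systolic - diastolic) / 3.0

# A typical adult reading of 120/80 mmHg:
print(round(mean_arterial_pressure(120, 80), 1))
```

The point is that the formula depends only on data already inside the one archetype, which is what makes it a candidate for intra-archetype logic rather than GDL.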

Another scenario that poses a similar problem: evolving classifications. An example would be TNM staging for solid tumors. It evolves over time; patients who would be staged as IIIb today might have been staged as IIIa under a previous version of the staging criteria for that disease. The actual version used at the time must be preserved, and decisions that were made on that basis should be interpreted with regard to that specific version, even though a newer staging version has since been published.

Would ‘intra-archetype’ logic solve this, or, as @siljelb points out, will it turn out to mess things up? Or should we actually separate knowledge and data modelling, expressing the former in GDL and accommodating all knowledge versioning on that side of the HIS?

1 Like

Yes and no. The blood pressure archetype is a good example of a tricky clinical concept that seems deceptively simple on first look. As @Leuschner points out, there are multiple elements in the BP archetype that may provide context to enable reusers to assess whether the information can be reused for their purposes. But:

  1. There’s also a higher level of context that’s very important in this assessment, for example is the measurement performed on an ambulatory patient as part of a routine GP checkup, or on a moribund patient in ICU?
  2. The additional context elements in the archetype may not be recorded.

Reusers must take all of this into account when assessing whether they can reuse the information in their particular use case. In some use cases this may be critical, while in others it may not matter.

Scores and scales are somewhat simpler, in that they’re fully and not just partly human constructs, and they’re somewhat standardised in nature.

It’s interesting that you bring this up! TNM is a special case, in that it’s not just a score, but a versioned knowledge base (usually in book form) of staging criteria for ~all cancer types. New versions are released regularly, where the base elements are more or less the same, but the thresholds may or may not be different. We’ve chosen to model this as two generic archetypes, one for cTNM and one for pTNM. They both contain elements for specifying the version used (something that’s shockingly absent from the data sets of a lot of cancer registries and pathology reporting templates in many locations :astonished:), and all the value sets must be provided from the TNM knowledge base.

If Predicare regularly released a new revision of RETTS every 2 years (do they?), we might do the same thing for those archetypes.

Another consideration is that with clinical scores, the tendency is for every local implementation and research project to make their own version of a validated score, based on little or no evidence and without changing its name. This tendency, which makes their data incomparable with every other project seemingly using the same score, is made less attractive by locking the archetypes down to more specific content.

Where the logic is truly intra-archetype, e.g. calculation of the final NEWS2 score and grade from the intermediate sub-categories, or the MAP example, then yes, I think it makes sense to keep the rules in there. But for aligning NEWS2 with the base observations like SpO2, respirations etc. - no, this should always be kept separate, as GDL, I think.
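The NEWS2 total is a good example of logic that never leaves the archetype: it is simply the sum of the seven already-scored parameter sub-scores. A sketch, where the parameter names follow NEWS2 but the sub-score values are invented for illustration (they are not a clinical chart):

```python
def news2_total(sub_scores: dict) -> int:
    """Intra-archetype calculation: the NEWS2 total is just the sum of
    the per-parameter sub-scores already recorded in the same entry."""
    return sum(sub_scores.values())

# Hypothetical sub-scores for the seven NEWS2 parameters:
obs = {
    "respiration_rate": 2,
    "spo2": 0,
    "air_or_oxygen": 2,
    "systolic_bp": 0,
    "pulse": 1,
    "consciousness": 0,
    "temperature": 1,
}
print(news2_total(obs))  # 6
```

Deriving each sub-score *from raw observations in other archetypes* is the part that crosses archetype boundaries, which is why that belongs in GDL instead.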

1 Like

the tendency is for every local implementation and research project to make their own version of a validated score, based on little or no evidence and without changing its name

Indeed - we had a few things to say about this at the London openEHR day last year!

The whole story of archetype design over the past 15 years has been about trying to find generic, reusable patterns. @heatherleslie in particular did a huge amount in the early days around generic patterns of examination, and I did a fair bit on trying to find common patterns in cancer reporting, including an earlier attempt at TNM.

I guess my general conclusion is that while you can sometimes unearth a few nuggets of things that can be easily modelled generically, it all has a habit of going wrong as you push further into the detail required by implementations, or need to accommodate both valid and invalid variances.

Remember that we are not just trying to represent these ideas in a technical manner, whether for messaging or persistence; we are also trying to create shared discussion spaces where non-tech stakeholders can have a meaningful discussion.

TNM is a great example of what people think of as a ‘standard’ when it is actually just a very generic framework that is very different for each cancer type and each new edition. In a perfect world, I would create a new archetype for each cancer's TNM, and possibly for each edition, specialised from the generic ones, but this is a huge task and really ought to be part of the TNM knowledge base.

Tobacco/ smoking/substance misuse is another area where we arrived at a usable compromise after a great deal of ‘wrangling’.

The main message is that we can draw on a lot of experience, but it is often hard to tease out any exact rules. As an example, I said earlier that the draft PEWS2 score does manage to encompass all of the age ranges in a single archetype. In contrast, we have also been working on a set of sepsis assessment archetypes, which also have variants based on age/clinical setting, and found that we could not fuse these into a single archetype.

3 Likes

Is this archetype available anywhere? :smiley:

We are getting to the point where we can say “this new concept works like that one” and suggest that a modeller create a model in a similar pattern. This saves time and resources, but the reality is that as soon as you work out a pattern, you also find something that breaks it. If it was possible, we’d have written the 10-step ‘fool proof’ manual :upside_down_face:
But as we slowly chip away at building a coherent, coordinated ecosystem of archetypes, experience is important, and this is our challenge… to distill and share this wisdom so that others can contribute as well…

2 Likes

Yes but I am currently embargoed as PEWS2 has not been officially released - I have just asked about the status.

1 Like

When we make a new score, we always try to find the original source by searching medical publications, Google Scholar, PubMed, etc., and then make the score according to that (or a later modified one). Very often we find local adjustments, national or per hospital, or versions that have been translated slightly differently. I guess this is what happened with the Scottish and Swedish PEWS scores. There is no rule that says such scores are wrong or not allowed; these deviations can be perfectly in order. We have a few with a _no extension, and other countries can do the same.
But for the international repository, I feel that we should stick to the original. Those are common medical knowledge, and messing them up will cause confusion among those we serve.
A general score archetype? OK, but why stop there? We could have one Observation, one Action, one Instruction and one Evaluation, and leave the application to differentiate between instances of each by adding terminology or logic. It’s possible. But it would be extremely unfamiliar to clinicians, and would also add a lot of workload on the applications and message brokers.
The point of openEHR's “dual modelling” would disappear, reintroducing the gap (or more rightly, the canyon) between informaticians and clinicians. It’s a no-go area.

2 Likes

I don’t have a problem with having multiple PEWS scores in CKM if they are valid instruments and clinically useful to others outside of a local context. We would need to manage their naming carefully so they are distinguishable from one another, and their IDs so they are technically distinct.
It is another issue completely if implementers or researchers want to compare data across the models - that comes with enormous complexity.
But putting the models together in one place might serve to identify how many similar (but different) models exist, and trigger the discussion that leads to convergence rather than even more variations.
We can hope :grin:

4 Likes

Some really good points and perspectives here! :slight_smile:

I agree with @heatherleslie that it is a good thing if a “local” score is made available in the international CKM. It can be used anywhere, so it is really only its origin that is local; its usage should ideally not be constrained by locking it into a local repository. It is also very helpful to see what others have done. The review process for such an archetype could be a bit interesting, though.

@varntzen, you are of course right that the generalization can be taken ad absurdum. I don't think a generic score is a good thing. I do think that there are good arguments for (and likely also against) having a common PEWS. Another one is how a system implementer or consumer of an openEHR-based platform should know what to query for if there can be an arbitrary number of PEWS archetypes. PEWS seems like a somewhat international concept, so why should you not be able to query for it? Let's say that the business process in hospital 1 is to use PEWS X and in hospital 2 they use PEWS Y. Having distinct archetypes makes such a use case difficult for implementers, as the consumer needs knowledge of all potential PEWS archetypes to not risk missing important clinical data.
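The querying problem can be sketched as follows. The archetype IDs are hypothetical, and the simple list filter stands in for whatever AQL or API call a real platform would use:

```python
# Hypothetical archetype IDs for two locally divergent PEWS variants.
KNOWN_PEWS_IDS = {
    "openEHR-EHR-OBSERVATION.pews_x.v1",  # used by hospital 1
    "openEHR-EHR-OBSERVATION.pews_y.v1",  # used by hospital 2
}

def find_pews(entries: list) -> list:
    """Filter mocked entries down to the PEWS observations we know about.

    With one archetype per local variant, the consumer must enumerate
    every ID up front; any variant missing from KNOWN_PEWS_IDS is
    silently lost from the result.
    """
    return [e for e in entries if e["archetype_id"] in KNOWN_PEWS_IDS]

data = [
    {"archetype_id": "openEHR-EHR-OBSERVATION.pews_x.v1", "score": 3},
    # An unknown third variant - clinically a PEWS, but missed by the query:
    {"archetype_id": "openEHR-EHR-OBSERVATION.pews_z.v1", "score": 5},
]
print(len(find_pews(data)))  # 1
```

This is the "missing important clinical data" risk in miniature: the second record is a PEWS clinically, but invisible to a consumer that only knows two of the archetype IDs.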

Maybe with binding of the archetypes to SNOMED CT you could use subsumption testing for that use case. As in this example, the consensus seems to be to model the archetype at the “lowest” level of specificity. If, let's say, there is a Norwegian PEWS concept that is a child of the generic PEWS concept, that could solve part of the problem.
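As a sketch of what subsumption testing buys you here, below is a tiny, invented is-a hierarchy standing in for SNOMED CT concepts (the codes and parent links are hypothetical, not real SNOMED content):

```python
# Invented is-a hierarchy: child concept -> parent concept.
PARENT = {
    "pews_no": "pews",        # Norwegian PEWS is-a PEWS
    "pews_se": "pews",        # Swedish PEWS is-a PEWS
    "pews": "clinical_score", # PEWS is-a clinical score
}

def subsumed_by(code: str, ancestor: str) -> bool:
    """Walk the parent links to test whether `ancestor` subsumes `code`."""
    while code is not None:
        if code == ancestor:
            return True
        code = PARENT.get(code)
    return False

# A generic query for "PEWS" also finds the Norwegian variant:
print(subsumed_by("pews_no", "pews"))  # True
```

With archetypes bound to such concepts, a consumer could query for the generic PEWS concept and retrieve all nationally specialised variants without enumerating them.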


My main argument (or mainly, the topic I'm raising for discussion) is really that modelling the clinical knowledge outside the information model makes it easier to model concepts with more advanced or local rules, while at the same time not removing any clinical precision. Not that all PEWS are alike, or even that they should be modelled in the same archetype.

Well, this is equally difficult for the clinicians: how can they know that the result from PEWS X is different from PEWS Y? If the result is shown regardless of X and Y, the result is lying. “Someone” or “something” has to know the difference, and as humans are less capable of remembering “ward a) in hospital 1 is using PEWS X, while we are using PEWS Y”, it is better handled by the system.

Having read the Wikipedia PEWS page (see how good my research is :wink:), my initial impression is that there are different PEWS scores, not just with different scoring methods, but sometimes with different input variables. From an informatics perspective, these have to be different things, and ‘PEWS’ would need to be understood as a general notion, not a fixed method (cf. Apgar etc).

Now, that is fairly annoying, because if you did create PEWS(NO), PEWS(SE), PEWS(UK), or some other flavours (maybe named after whoever manages to promulgate their version better), from a computational point of view the variants are possibly not directly comparable. But if you think about it, having distinct variants means you could devise a comparator or data converter function such that (say) each of those PEWS(NO) etc. would logically have a convert_to_basic_pews(), where ‘basic PEWS’ is some comparable form.
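The converter idea can be sketched like this. Everything below is hypothetical: the two variant classes, the ‘basic PEWS’ form, and especially the mapping logic, which in reality would have to be clinically derived and validated:

```python
class PewsNo:
    """Hypothetical Norwegian PEWS record."""
    def __init__(self, score: int):
        self.score = score

    def convert_to_basic_pews(self) -> int:
        # Identity mapping assumed here purely for illustration.
        return self.score

class PewsSe:
    """Hypothetical Swedish PEWS record, scored per age band."""
    def __init__(self, score: int, age_band: str):
        self.score = score
        self.age_band = age_band

    def convert_to_basic_pews(self) -> int:
        # Invented age-band rescaling, for illustration only.
        return self.score + (1 if self.age_band == "0-11m" else 0)

# The variants become comparable only through the common form:
records = [PewsNo(4), PewsSe(3, "0-11m")]
print([r.convert_to_basic_pews() for r in records])  # [4, 4]
```

The design point is that each variant owns its own mapping to the comparable form, so downstream consumers compare basic-PEWS values without knowing every variant's internals.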

The point here is not that they are not comparable, it’s that they have the same data structure - that’s why it’s one archetype. Indeed, the state attribute is there precisely to enable computational logic to determine if two BPs are comparable or not - that’s the purpose of that field.

The equivalent of that in PEWS land would be something like two PEWS scores (same structures), but where the state of one child includes the fact that he has a known heart defect or some other condition that changes the normal ranges.

If it is a question of same structures and data items (i.e. input variables) but differing normal ranges, then you can at least create a parent PEWS archetype, and specialise that into various children based on the x, y and z methods of scoring and/or differing normal ranges. In the latter case, you’d have a PEWS archetype and then some child ones like PEWS_0_11m, PEWS_12_23m, etc. If there are still variant scoring algorithms, then… see the next point.

There is already a method of defining formal expressions in archetypes (see here) which we are improving, based on an improved expression language. We have considered putting formulae in archetypes to define how a score is computed. Based on that, another possibility is to create a PEWS archetype (assuming the data structure and variable set is the same) and then add more than one scoring function (maybe called things like score_by_smith_method(), score_by_larsson_method(), or whatever it may be).
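The multiple-scoring-function idea can be sketched as follows, using a single input variable for brevity. The method names mirror the hypothetical score_by_smith_method() / score_by_larsson_method() above, and the thresholds are invented:

```python
# Two hypothetical scoring functions over the same input variable,
# mirroring the idea of attaching several named scoring methods to one
# PEWS archetype. The respiration-rate thresholds are invented.
def score_by_smith_method(resp_rate: int) -> int:
    return 0 if resp_rate < 30 else 2

def score_by_larsson_method(resp_rate: int) -> int:
    return 0 if resp_rate < 25 else 1

METHODS = {
    "smith": score_by_smith_method,
    "larsson": score_by_larsson_method,
}

def pews_score(resp_rate: int, method: str) -> int:
    """Same data structure and input variable; the chosen method
    determines how the score is computed."""
    return METHODS[method](resp_rate)

print(pews_score(28, "smith"), pews_score(28, "larsson"))  # 0 1
```

The same observation yields different scores per method, which is exactly why the method used would need to be recorded alongside the result.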

If your design intention was to generate notifications based on continuously monitored variables (e.g. neonate ICU) then you could do the modelling, and add GDL to detect changes and generate such warnings. That still means solving the archetype modelling side, but you could put more of the rules in the GDL. You’d need to ping @rong.chen to find out the details of how to do that.

We do have a basic question (in general) of what rules should appear in archetypes, and which ones elsewhere. It seems clear that some could go in, e.g. the formula for mean arterial pressure could be included in the BP archetype, since it would make life easier for EHR systems to compute the MAP on the spot. On the other hand, where the data is the same shape but the computations different - i.e. potentially this PEWS problem, then it seems the rules need to go into a decision logic module of some kind, of which GDL is a current implementation in openEHR. We don’t have a full answer for all this yet.

I don’t have the time to trawl through the PEWS literature properly, so consider the above just a few ideas that might help to think about how the machinery of archetypes might be used to do what you want.

3 Likes

We are very close to having this now.

1 Like