I created a first draft of the NEWS2 guideline (RCP, 2017) as another example to exercise the Task Planning / Decision Logic Module approach to formulating guidelines (see here). I have yet to encode the workflow (Task Planning) part.
In common with other guidelines, NEWS2 has some tricky elements for formal representation, particularly the ‘scale 2’ form of the SpO2 score, which can be seen on the visual algorithm representation below.
The SpO2 (Scale 2) row contains the following:
88–92% → 0 points;
≥93% on air → 0 points;
followed by the remaining cells to the right, marked ‘oxygen’. It takes a bit of work to formulate this correctly in a computable way, as can be seen in the example. I am assuming that ‘air’ means PP air; if not, I don’t see how this cell makes sense, since ambient air is clearly the default.
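For comparison, the Scale 2 row can be written as a plain Python function. This is a minimal sketch, not the DLM form: it assumes integer SpO2 percentages and a hypothetical boolean `on_oxygen` input meaning "receiving supplemental oxygen", so it reads ‘air’ simply as "not on oxygen" and does not settle the PP air question. The band boundaries are taken from the published NEWS2 chart as I read it.

```python
def spo2_scale2_score(spo2: int, on_oxygen: bool) -> int:
    """NEWS2 SpO2 Scale 2 sub-score (sketch; integer % readings assumed)."""
    if spo2 <= 83:
        return 3
    if spo2 <= 85:
        return 2          # 84-85%
    if spo2 <= 87:
        return 1          # 86-87%
    if spo2 <= 92:
        return 0          # 88-92%, whether on air or oxygen
    # From 93% upwards the score depends on the second input.
    if not on_oxygen:
        return 0          # >=93% on air
    if spo2 <= 94:
        return 1          # 93-94% on oxygen
    if spo2 <= 96:
        return 2          # 95-96% on oxygen
    return 3              # >=97% on oxygen
```

Written this way, the ‘on air’ cell and the ‘oxygen’ cells turn out to be one SpO2 range split by a second input, which is what makes the published row awkward to read as a single row of bands.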
There are other minor challenges as well: NEWS2 doesn’t name the 5 (or sometimes 4) bands it uses for the various kinds of monitoring and clinical response.
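One way to name them is an enumeration with an explicit precedence order. The names below are hypothetical (NEWS2 does not supply any), and the thresholds are those of the NEWS2 clinical response chart as I read it: aggregate 0, aggregate 1–4, a score of 3 in any single parameter, aggregate 5–6, aggregate 7 or more. The rule that a higher aggregate band overrides a single-parameter 3 is an assumption.

```python
from enum import Enum


class News2Band(Enum):
    # Hypothetical names: NEWS2 itself leaves these bands unnamed.
    ZERO = "zero"              # aggregate 0
    LOW = "low"                # aggregate 1-4
    LOW_MEDIUM = "low_medium"  # a score of 3 in any single parameter
    MEDIUM = "medium"          # aggregate 5-6
    HIGH = "high"              # aggregate 7 or more


def news2_band(total: int, any_single_three: bool) -> News2Band:
    """Classify an aggregate score, checking the most severe band first."""
    if total >= 7:
        return News2Band.HIGH
    if total >= 5:
        return News2Band.MEDIUM
    if any_single_three:
        return News2Band.LOW_MEDIUM
    if total >= 1:
        return News2Band.LOW
    return News2Band.ZERO
```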
Another interesting question is exactly what the workflow (Task Plan) should look like. Is it essentially a timed monitoring loop, with exit branches for urgent / ICU care? In other words, does NEWS2 become the controlling logic for managing very ill patients?
As usual, I am interested in any clinician feedback, particularly about whether the formalism (although not yet colourised in a nice editor) looks like something that you could use with a bit of training (assuming you have the tools).
Partly, this kind of exercise makes me think that organisations such as the RCP, NICE etc. that create guidelines would benefit from direct input from clinical modelling experts and even some technical experts, so as to improve the computable representation and maintainability of the guidelines they publish, and one day possibly even publish primarily in a mixed computational and documentary format of the kind we are trying to develop here.
> In common with other guidelines, NEWS2 has some tricky elements for formal representation, particularly the ‘scale 2’ form of the SpO2 score, which can be seen on the visual algorithm representation below.
The guideline says:
> Use Scale 2 if target range is 88–92%, eg in hypercapnic respiratory failure
Would it be better to split this into separate DLMs, where DLM 1 decides between NEWS2 with normal saturation (Scale 1) and NEWS2 Scale 2? That DLM could even be assisted by a query on the target saturation (e.g. in an EVALUATION.goal archetype). That would simplify the reading of this instrument.
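A sketch of what such a ‘DLM 1’ might look like, written in Python rather than DLM syntax. It assumes the target range is available as a (low, high) pair, hypothetically from a query on an EVALUATION.goal record, and falls back to a hypercapnic respiratory failure flag only when no target is recorded; that precedence is an assumption, not something the guideline states.

```python
from typing import Optional, Tuple

SCALE_2_TARGET = (88, 92)  # the target range named in the guideline


def select_spo2_scale(target_range: Optional[Tuple[int, int]],
                      hypercapnic_resp_failure: bool) -> int:
    """Hypothetical 'DLM 1': decide which SpO2 scale the scoring DLM uses."""
    if target_range is not None:
        # A recorded target range decides on its own.
        return 2 if target_range == SCALE_2_TARGET else 1
    # No recorded target: fall back to the diagnosis (assumed precedence).
    return 2 if hypercapnic_resp_failure else 1
```

The scoring DLM then only ever sees one scale, which is the simplification suggested above.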
> Another interesting question is exactly what the workflow (Task Plan) should look like. Is it essentially a timed monitoring loop, with exit branches for urgent / ICU care? In other words, does NEWS2 become the controlling logic for managing very ill patients?
I’m not an expert on the subject, but I don’t expect so: it’s a tool to help assess the severity of illness/shock. It is used by nurses to know when to contact a doctor, and by (junior) doctors to aid decisions on when to escalate care (to ICU) and to keep track of the progress of an illness and its treatment over time. The main goal of the scale, afaik, is in the name: “early warning”.
The main use settings will be normal (and nursing home) wards, where vital signs are checked ‘only’ about 2 times a day. On the ICU, with continuous monitoring, NEWS has less value (I assume, so if there are any ICU specialists here, correct me if I’m wrong). So I do not expect it to be used as the controlling logic.
> As usual, I am interested in any clinician feedback, particularly about whether the formalism (although not yet colourised in a nice editor) looks like something that you could use with a bit of training (assuming you have the tools).
I think I’d need to see a visual representation in a tool to be able to answer this question properly. I can (kind of) follow the logic, but in terms of easier reading for clinicians without programming experience, I do not see a big advantage compared to functions in a regular programming language.
> Partly, this kind of exercise makes me think that organisations such as the RCP, NICE etc. that create guidelines would benefit from direct input from clinical modelling experts and even some technical experts, so as to improve the computable representation and maintainability of the guidelines they publish, and one day possibly even publish primarily in a mixed computational and documentary format of the kind we are trying to develop here.
Yes! Right now I imagine guideline committees fight a lot over wording, whereas making the choices very specific could make their decisions easier. But I do not have experience writing guidelines.
I did think of that, and it might be the way to go. My initial instinct is not to mess with the guideline in its published form, since if we do that too much, we may end up with a guideline that users are not confident is truly NEWS2, i.e. it may appear to be some loose (or not) derivative. But sticking fairly rigidly to the published form means creating some strange rules, etc.
Indeed, the scale 1/scale 2 idea is not at all how I would have defined this in the first place - I would create two guidelines with some shared parts, because we’re really talking about 2 kinds of patients as far as I can see:
people whose target SpO2 is ‘normal’
people with impaired respiratory function, e.g. COPD etc
If vitals (which make up most of the input variables of NEWS2) are being checked, say, every 8h or even every 4h, based on the result of a single NEWS2 score, but the patient gets worse quickly, then it is not going to be NEWS that gives you an alert to get the patient to ICU … it will be a Siemens monitor beeping due to SpO2 < 95% (or whatever the threshold is), or the patient in distress ringing for help. So I’m still a bit mystified how NEWS really works as an early warning of an acutely ill patient going downhill if it’s not being checked fairly frequently. I’d like to understand the real-world workflow of NEWS use today…
Well, you can’t write the following in most PLs, although I see Java 13 finally has something close:
```
temperature_score: Integer
    Result := case temperature in
        =================
        |≤35.0|: 3,
        -----------------
        |35.1..36.0|: 1,
        -----------------
        |36.1..38.0|: 0,
        -----------------
        |38.1..39.0|: 1,
        -----------------
        |≥39.1|: 2
        =================
    ;
```
The idea is that it looks pretty close to the published form. You also can’t write a declaration like this:
```
neutrophils: Quantity
    currency = 3d
    ranges =
        ----------------------------------
        [normal]: |>1 x 10^9/L|,
        [low]: |0.5 - 1 x 10^9/L|,
        [very_low]: |<0.5 x 10^9/L|
        ----------------------------------
    ;
```
This is because mainstream PLs don’t have any meta-model at all for the notion of tracked variables and range declarations. Another thing that mainstream PLs don’t handle is:
```
molecular_subtype: Terminology_term
    Result :=
        choice of
            =========================================================
            er_positive and
            her2_negative and
            not ki67.in_range ([high]): [luminal_A],
            ---------------------------------------------------------
            er_positive and
            her2_negative and
            ki67.in_range ([high]): [luminal_B_HER2_negative],
            ---------------------------------------------------------
            er_positive and
            her2_positive: [luminal_B_HER2_positive],
            ---------------------------------------------------------
            er_negative and
            pr_negative and
            her2_positive and
            ki67.in_range ([high]): [HER2],
            ---------------------------------------------------------
            er_negative and
            pr_negative and
            her2_negative and
            ki67.in_range ([high]): [triple_negative],
            ---------------------------------------------------------
            *: [none];
            =========================================================
    ;
```
which is the functional equivalent of an if/then/else chain. None of these tricks is that complicated, of course, but if you actually had to use e.g. Java directly to write these rules, it would be painful and take a lot more lines of code.
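For comparison, a rough Python equivalent of the ‘choice of’ table needs an explicit, ordered rule list plus a fallback. This sketch assumes each marker is a boolean (so `er_negative` is read as `not er_positive`, and likewise for PR and HER2) and replaces `ki67.in_range ([high])` with a hypothetical `ki67_high` flag, since no Ki-67 threshold is given here.

```python
from typing import Callable, Dict, List, Tuple

Profile = Dict[str, bool]

# (condition, result) pairs tested in order; the first match wins,
# mirroring the rows of the DLM 'choice of' table.
RULES: List[Tuple[Callable[[Profile], bool], str]] = [
    (lambda p: p["er_positive"] and not p["her2_positive"] and not p["ki67_high"],
     "luminal_A"),
    (lambda p: p["er_positive"] and not p["her2_positive"] and p["ki67_high"],
     "luminal_B_HER2_negative"),
    (lambda p: p["er_positive"] and p["her2_positive"],
     "luminal_B_HER2_positive"),
    (lambda p: not p["er_positive"] and not p["pr_positive"]
               and p["her2_positive"] and p["ki67_high"],
     "HER2"),
    (lambda p: not p["er_positive"] and not p["pr_positive"]
               and not p["her2_positive"] and p["ki67_high"],
     "triple_negative"),
]


def molecular_subtype(p: Profile) -> str:
    for condition, result in RULES:
        if condition(p):
            return result
    return "none"  # the '*' (otherwise) row
```

The logic survives, but the tabular shape of the published rule is lost, which is the readability point being made.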
I could be wrong of course about the value of this approach, which is why I am trying to express a reasonable number of different types of CPG in this form to see how well it works. I’m working on a formal grammar, so we should have a colourising editor soon-ish…
Hi Thomas,
It’s not clear to me from your description whether you are modelling NEWS2 de novo as a guideline or if you are using the published archetype as the basis - https://ckm.openehr.org/ckm/archetypes/1013.1.3342?
It has been modelled on the assumption that clinicians will need to capture the parameters used to calculate the score, even if some or all values have been derived from devices or other measurements.
It should impact how you create a guideline.
And the decision about the use of Scale 1 or 2 is based on establishing the presence or absence of an existing condition before starting to use the guideline, as per the archetype ‘Use’: “If the patient has hypercapnic respiratory failure, calculation of the NEWS score should use ‘SpO₂ Scale 2’. In all other situations calculation of the NEWS2 score should use ‘SpO₂ Scale 1’.”
Ah yes, I should have mentioned that. I did review the archetype to see how you had interpreted particularly the varying SpO2 score, but didn’t otherwise use it. I didn’t aim to integrate it directly in this initial exercise, since I am assuming that the various vitals are routinely available from normal obs anyway (they should be), and the NEWS can thus be run using available data. I can however model the workflow part such that the task in which the NEWS is calculated displays a form whose contents include the 6 input data items, and items that have no values can be filled in (e.g. consciousness state).
I ended up building the respiratory failure v normal patient condition into the algorithm so it will generate the right answer, because that’s the way it’s documented, but it does seem like two variant forms of the DLM would be better. I might try that.
One of us is confused, and while I assume that one is me I’ll still bite on the off chance it isn’t…
The default state for relevant patients is that a human actively checks vitals at some interval determined by the procedures in place. Twice a day or once per shift would be pretty typical I think. The intent of early warning scores like NEWS is that you use it to calculate a score that indicates whether you need to change your procedure.
Off the top of my head, for NEWS the escalating changes for progressively worse scores are to start checking vitals (and recalculating the score) every 4 hours, then 2 hours, then switch to continuous monitoring (which may or may not mean a transfer to the ICU). I can’t immediately recall if NEWS2 also specifies whether you should have a doctor present etc. for the different stages (I believe it does). Even if vitals are being collected automatically, scoring with NEWS gives you a point at which to reassess overall status and care decisions. The worse the calculated score, the more often you need to reassess, and NEWS(2) gives you both the way to calculate that score and the resulting recommended interval for reassessment, the latter being just as important as the former.
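The escalation described above can be sketched as a timed loop in which each score sets the time of the next check, with an exit branch to continuous monitoring. This is a minimal sketch under stated assumptions: the intervals are the minimum monitoring frequencies from the NEWS2 clinical response chart as I read it (rather than the recollection above), ‘4–6 hourly’ is taken at its stricter end, and `plan_checks` is a hypothetical helper fed a pre-recorded sequence of (total, any-single-3) scores rather than live data.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple


def min_monitoring_interval(total: int, any_single_three: bool) -> Optional[timedelta]:
    """Minimum observation interval; None means continuous monitoring."""
    if total >= 7:
        return None
    if total >= 5 or any_single_three:
        return timedelta(hours=1)
    if total >= 1:
        return timedelta(hours=4)   # chart says 4-6 hourly; stricter end assumed
    return timedelta(hours=12)


def plan_checks(start: datetime,
                scores: Iterable[Tuple[int, bool]]) -> List[Tuple[datetime, str]]:
    """Hypothetical Task Plan loop: schedule each next check from the last
    score, and leave the loop once continuous monitoring is required."""
    plan: List[Tuple[datetime, str]] = []
    t = start
    for total, red in scores:
        interval = min_monitoring_interval(total, red)
        if interval is None:
            plan.append((t, "escalate: continuous monitoring"))
            break
        t = t + interval
        plan.append((t, f"next NEWS2 check (last total {total})"))
    return plan
```

Any score after the escalation is ignored, reflecting the idea that the loop hands control to a different care pathway rather than continuing.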
And an adjacent point: early warning scores are about looking at a “totality” (select variables) of a patient’s state rather than any single variable. You can conceivably have a patient that doesn’t trip a single individual threshold alarm (your “SpO2 < 95%”) but is still minutes away from coding, even though they walked into the emergency room under their own power mere minutes ago. It would be better to think of it as a score that determines the appropriate intensity of care on a finer-grained scale than “ICU / not ICU”, rather than as a binary “Now we need to send them to the ICU” flag.
<Insert obligatory: not a healthcare professional-caveat here>
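A hypothetical set of observations, invented for illustration and scored with SpO2 Scale 1 against the NEWS2 bands as I read them, shows how this can happen: no parameter scores 3 and SpO2 stays above 95%, yet the aggregate reaches the ‘7 or more’ band (respiration rate 23/min, SpO2 96% on air, systolic BP 100 mmHg, pulse 120/min, alert, temperature 38.6 °C).

```latex
\begin{aligned}
\text{NEWS2} &= \underbrace{2}_{\text{RR }23} + \underbrace{0}_{\text{SpO}_2\ 96\%} + \underbrace{0}_{\text{air}} + \underbrace{2}_{\text{SBP }100} + \underbrace{2}_{\text{pulse }120} + \underbrace{0}_{\text{alert}} + \underbrace{1}_{\text{temp }38.6\,^\circ\text{C}} \\
&= 7
\end{aligned}
```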
Thank you for this great discussion; I’m interested, as I am researching scales like NEWS2.
My concern is that these numbers in the scale are crude whole numbers, and lose the accuracy that is present in the underlying data, as per the NEWS2 archetype header: "Not to be used to record actual measurements for each variable. Use specific OBSERVATION archetypes for this purpose:
OBSERVATION.blood_pressure;
OBSERVATION.pulse;
OBSERVATION.respiration;
OBSERVATION.body_temperature;
OBSERVATION.acvpu; and
OBSERVATION.pulse_oximetry."
BTW, that’s not a complete set of its components, e.g. inspired air/O2 is given a binary value of 2 points: could there be a range of actual O2 % values using openEHR-EHR-CLUSTER.inspired_oxygen.v1?
Similarly, ACVPU is given a single value of 3, rather than using any of the 5 components.
All the continuous variables are chunked, e.g. pulse into 6 very unequal value ranges, yet surely a pulse change of 1 that crosses a band boundary should not have a greater arithmetic effect than a change of 15 that doesn’t?
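The pulse bands illustrate the concern. A minimal Python sketch, using the NEWS2 pulse bands as I read them from the chart (≤40, 41–50, 51–90, 91–110, 111–130, ≥131 beats/min) and assuming integer readings:

```python
def pulse_score(pulse: int) -> int:
    """NEWS2 pulse sub-score: 6 bands of very unequal width (beats/min)."""
    if pulse <= 40:
        return 3
    if pulse <= 50:
        return 1
    if pulse <= 90:
        return 0
    if pulse <= 110:
        return 1
    if pulse <= 130:
        return 2
    return 3


# A change of 1 bpm across a boundary moves the score by a point...
print(pulse_score(91) - pulse_score(90))   # 1
# ...while a change of 15 bpm inside one band moves it not at all.
print(pulse_score(106) - pulse_score(91))  # 0
```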
AFAICS, the reason for this loss of detail is that “mental calculation” of a score in an emergency is the prime purpose, so the numbers must be simple enough for “mental arithmetic”.
But as more of its components are machine-generated or machine-stored, could we revert the calculation to using the raw data, machine-computed complete with non-linearities, and present the result to a clinical user on a mobile or point-of-care device (or process it further by machine)?
Please excuse me if this is off-topic or not helpful.
(not a CCU person, have never used NEWS for real…)
Indeed. So say you do a NEWS2, and it tells you to start monitoring 4–6 hourly (i.e. say twice per standard shift). But one hour later, the patient gets much worse, and if someone were to do a NEWS2, it would indicate (say) hourly monitoring. But no-one is there to do that. Someone at the nursing station gets an alert that the monitor is reading SpO2 = 89%, and they immediately act on that. So I am not clear how NEWS has acted as an early warning here: standard monitoring of the VS is what will cause a response.
NEWS2 risk bands 4 and 5 require a doctor according to my reading.
Anyway, what I am really getting at is this: you can detect most of the NEWS input parameters with normal monitors, plus some indication of whether the patient is on PPO. The remaining variable is consciousness assessment. So NEWS could be computed from 5 variables more or less in real time, which would then be a real ‘early warning tool’. I’m not sure how to compensate for consciousness, but this would come close to addressing your ‘totality’ point.
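A minimal sketch of that idea, assuming a hypothetical feed that pushes each new reading into an object that re-scores on demand. `LiveScore`, the per-input staleness limits, and the policy "score anyway, but report stale inputs" are all assumptions, offered as one possible way to handle a consciousness assessment that is only refreshed manually; the scoring function itself is passed in.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class LiveScore:
    """Hypothetical near-real-time wrapper: keeps the latest value of each
    input and re-scores whenever asked."""
    score_fn: Callable[[Dict[str, object]], int]
    max_age: Dict[str, timedelta]          # assumed per-input staleness limits
    latest: Dict[str, Tuple[datetime, object]] = field(default_factory=dict)

    def update(self, name: str, value: object, at: datetime) -> None:
        self.latest[name] = (at, value)

    def stale_inputs(self, now: datetime) -> List[str]:
        # Inputs never seen, or older than their allowed age
        # (e.g. a manual consciousness assessment taken hours ago).
        return sorted(name for name, limit in self.max_age.items()
                      if name not in self.latest
                      or now - self.latest[name][0] > limit)

    def current(self, now: datetime) -> Tuple[Optional[int], List[str]]:
        stale = self.stale_inputs(now)
        if any(name not in self.latest for name in self.max_age):
            return None, stale             # cannot score with inputs missing
        values = {name: v for name, (_, v) in self.latest.items()}
        return self.score_fn(values), stale
```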
Exactly … and indeed, I could easily base the rules on more accurate value ranges.
Yeah, this point slightly bothers me, since it would be quite reasonable to assume most parameters could be obtained by direct connection to the data from EHR, monitoring devices etc.
They should be, but at least from a medicolegal POV the parameter scores at the time and the sum should be recorded as well. I’m sure that it could all be found via queries or forensically, but the data required to store this score content is trivial. The archetype content provides a useful clinical snapshot at a point in time, potentially even allowing the parameters to be graphed over time, so it possibly should be integrated into the guideline, even if only to record the query results contemporaneously.
Good point. I had not thought enough about that. The reason for not using the openEHR archetype directly in the DLM, however, isn’t that there’s anything wrong with the archetype; it’s that a major design goal of Task Planning/Decision Logic is to work over any systems environment, including one with no openEHR EHR. So it obtains the data from the Subject Proxy Service, which gets it from wherever it is available in each environment. In a full openEHR environment, there are various tricks that can be used to make all this very efficient, of course.
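The Subject Proxy idea can be sketched as an interface that rules call instead of any particular data store. Everything below (`SubjectProxy`, `latest`, `DictProxy`, `ChainedProxy`) is a hypothetical illustration, not the actual Subject Proxy Service API; a dictionary stands in for each real source (device feed, openEHR EHR, etc.), and the temperature rule treats each band’s upper bound as inclusive, which is an assumption for values between the published one-decimal bands.

```python
from typing import Dict, List, Optional, Protocol


class SubjectProxy(Protocol):
    """Hypothetical interface: where a DLM gets its input values from."""
    def latest(self, variable: str) -> Optional[float]: ...


class DictProxy:
    """Dictionary-backed stand-in for any one source."""
    def __init__(self, readings: Dict[str, float]) -> None:
        self.readings = readings

    def latest(self, variable: str) -> Optional[float]:
        return self.readings.get(variable)


class ChainedProxy:
    """Tries each source in order, e.g. device feed first, then an EHR."""
    def __init__(self, sources: List[SubjectProxy]) -> None:
        self.sources = sources

    def latest(self, variable: str) -> Optional[float]:
        for source in self.sources:
            value = source.latest(variable)
            if value is not None:
                return value
        return None


def temperature_score(proxy: SubjectProxy) -> Optional[int]:
    """The rule only talks to the proxy, never to a specific EHR or device."""
    t = proxy.latest("temperature")
    if t is None:
        return None
    if t <= 35.0:
        return 3
    if t <= 36.0:
        return 1
    if t <= 38.0:
        return 0
    if t <= 39.0:
        return 1
    return 2
```

Swapping environments then means swapping the proxy passed in, while the rule stays unchanged.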
In a current project that uses NEWS2 we are storing the raw scores, and the sub-scores, and the total. Part of the reason is that the total score may not be calculable because of missing raw data but the intermediate scores still have utility.
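A minimal sketch of that storage pattern: keep the raw values, compute every sub-score that can be computed, and leave the total empty if anything is missing. The input names are hypothetical, only SpO2 Scale 1 is handled (Scale 2 would need the on-oxygen branch), band boundaries are from the NEWS2 chart as I read it, and treating each upper bound as inclusive (so 35.05 °C falls in the 35.1–36.0 band) is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# (inclusive upper bound, score) pairs checked in order; a None bound means
# "anything above the previous bound".
Bands = List[Tuple[Optional[float], int]]

BANDS: Dict[str, Bands] = {
    "respiration_rate": [(8, 3), (11, 1), (20, 0), (24, 2), (None, 3)],
    "spo2_scale1":      [(91, 3), (93, 2), (95, 1), (None, 0)],
    "systolic_bp":      [(90, 3), (100, 2), (110, 1), (219, 0), (None, 3)],
    "pulse":            [(40, 3), (50, 1), (90, 0), (110, 1), (130, 2), (None, 3)],
    "temperature":      [(35.0, 3), (36.0, 1), (38.0, 0), (39.0, 1), (None, 2)],
}


def band_score(value: float, bands: Bands) -> int:
    for upper, score in bands:
        if upper is None or value <= upper:
            return score
    raise ValueError("bands must end with an open-ended (None) entry")


def news2(obs: Dict[str, object]) -> Dict[str, object]:
    """Raw values, sub-scores where possible, and a total only if complete."""
    sub: Dict[str, Optional[int]] = {}
    for name, bands in BANDS.items():
        value = obs.get(name)
        sub[name] = None if value is None else band_score(value, bands)
    on_oxygen = obs.get("on_oxygen")
    sub["air_or_oxygen"] = None if on_oxygen is None else (2 if on_oxygen else 0)
    acvpu = obs.get("acvpu")       # "A" = alert; C, V, P or U scores 3
    sub["consciousness"] = None if acvpu is None else (0 if acvpu == "A" else 3)
    missing = sorted(k for k, v in sub.items() if v is None)
    total = None if missing else sum(sub.values())
    return {"raw": dict(obs), "sub_scores": sub, "missing": missing, "total": total}
```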