Should openEHR specify a rounding mode?

Hi all,

We were discussing numeric rounding in openEHR and realized that, as far as we can tell, the specifications do not define a normative rounding mode.

One example is DV_QUANTITY.precision, where a value may be rounded to a certain number of decimal places. However, this could also matter in other places, for example in archetype rules or other validation/calculation contexts. Without a specified rounding mode, different implementations could produce different results for the same input.

For example, when rounding -2.5 to an integer:

round half toward positive infinity: -2   // e.g. Java Math.round()
HALF_UP / half away from zero:       -3
HALF_EVEN / banker's rounding:       -2
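
The three results above can be reproduced with a minimal Java sketch. The class and helper names (`RoundingModeComparison`, `round`) are made up for illustration and are not part of Archie or any openEHR specification.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class RoundingModeComparison {
    // Rounds a decimal string to an integer using an explicitly named rounding mode.
    static long round(String value, RoundingMode mode) {
        return new BigDecimal(value).setScale(0, mode).longValueExact();
    }

    public static void main(String[] args) {
        // Math.round: ties go toward positive infinity.
        System.out.println(Math.round(-2.5));                        // -2
        // HALF_UP: ties go away from zero.
        System.out.println(round("-2.5", RoundingMode.HALF_UP));     // -3
        // HALF_EVEN: ties go to the even neighbour.
        System.out.println(round("-2.5", RoundingMode.HALF_EVEN));   // -2
    }
}
```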

In Archie, we currently use round half toward positive infinity in some places, simply because we rely on Java’s Math.round(). However, we’re not sure whether that is the most appropriate behavior to standardize, or whether something like HALF_EVEN or HALF_UP would be preferable.

We’re also wondering whether there are existing recommendations or standards in the medical or scientific domain (or IEEE/ISO/HL7/UCUM) that openEHR should align with.

Our questions are:

  1. Is the absence of a specified rounding mode intentional?

  2. Would it make sense for openEHR to define a normative rounding mode?

  3. If so, which rounding mode would be most appropriate?

  4. Are there existing standards or conventions that openEHR should follow?

  5. If this were to be specified, where would be the most appropriate place? We were thinking of somewhere central in the BASE specification (e.g. on the base_types page), so that the same rounding semantics apply consistently throughout openEHR.

We’re curious to hear your thoughts.

I don’t know the answer to your questions, but it feels important to explore the possibilities a bit, including alignment with other standards (if applicable).

If we are speaking about a few functions, BASE is probably the right place. Perhaps you can elaborate or sketch how you would like to have them in BASE?

On the other hand, if we are speaking about more than just a few functions, I think Thomas made some proposals in the past that are part of BEL/EL and expressions: built-in functions (which need to be specified, perhaps in an implementation-language-agnostic way). Does that relate in any way to your needs?

(Sorry something went wrong)

The openEHR Foundation Types define:

  • Real as a 32-bit real number in an interoperable representation, including single-width IEEE floating point.
  • Double as a 64-bit real number in an interoperable representation, including double-precision IEEE floating point.

I’m not sure whether that refers to IEEE 754, but it probably does.

openEHR also references implementation technologies such as Java, C#, and C++, and Java’s numeric model is based on java.lang.Number.

For Java-based implementations, the JVM specification is more explicit: floating-point operations use the IEEE 754 default rounding mode for binary floating-point arithmetic. See “Chapter 2. The Structure of the Java Virtual Machine” in the JVM specification.

That default mode is:

Round to nearest, ties to even

This means that inexact binary floating-point results are rounded to the nearest representable value. If the result is exactly halfway between two representable values, the one with an even least-significant bit is chosen.

This matches the IEEE 754 default rounding rule for binary floating-point arithmetic:

Round to nearest, ties to even is the default for binary floating point and the recommended default for decimal floating point.
Round to nearest, ties away from zero is only required for decimal implementations.

This should not be confused with Java’s Math.round().

Math.round() is an API-level integer rounding function. It does not define the JVM’s normal floating-point arithmetic rounding mode. Its tie behavior is different: ties are rounded toward positive infinity.

JavaScript’s Math.round() works exactly the same way.
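
To make the difference in tie behavior concrete, here is a small Java sketch that compares `Math.round()` with `Math.rint()`, the standard-library function that rounds ties to even. The class and method names are illustrative only.

```java
class TieBehaviour {
    // API-level integer rounding: ties go toward positive infinity.
    static long tiesTowardPositiveInfinity(double value) {
        return Math.round(value);
    }

    // Nearest mathematical integer as a double: ties go to the even neighbour.
    static double tiesToEven(double value) {
        return Math.rint(value);
    }

    public static void main(String[] args) {
        System.out.println(tiesTowardPositiveInfinity(2.5));   // 3
        System.out.println(tiesTowardPositiveInfinity(-2.5));  // -2
        System.out.println(tiesToEven(2.5));                   // 2.0
        System.out.println(tiesToEven(-2.5));                  // -2.0
        System.out.println(tiesToEven(3.5));                   // 4.0
    }
}
```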

Those two lines in the Foundation Types are also all I could find. Whatever mode is defined, I feel it would be good to describe it more explicitly.

In critical applications like banking or health care, using floats isn’t the best real number representation since it’s lossy. We should use decimal representations of real numbers which are lossless.

The openEHR specs are written in terms of ‘precision’, with the intent that sufficient precision would always be used to achieve the necessary accuracy. Any rule for rounding is therefore probably either local or (more likely) the default mode inside the relevant software libraries (Java, or whatever).

There is another rounding-related question, the ‘floating point problem’: a number like 5.0, when stored as floating point, may be read back as 4.9999999999999999. This can make some data / documents unreadable and also cause equality failures. This problem is well recognised, of course (especially in finance), and is the reason for types like Decimal or BigDecimal in various programming languages.

Since there is no Decimal type in openEHR’s primitive types, the practical approach is to map Real to Decimal or whatever the decimal type is called in your software environment. openEHR also has the Double type, which I would leave mapped to IEEE FP Double, on the assumption that Double is used when floating point numbers really are wanted.
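
As an illustration of why a decimal type avoids these equality failures, the following Java sketch uses the well-known 0.1 + 0.2 case rather than the 5.0 example. The class and method names are made up for this sketch.

```java
import java.math.BigDecimal;

class DecimalVersusBinary {
    // Binary floating point: 0.1 and 0.2 have no exact representation,
    // so their sum is not equal to the double closest to 0.3.
    static boolean binarySumIsExact() {
        return 0.1 + 0.2 == 0.3;
    }

    // Decimal arithmetic: the same sum is exact.
    static boolean decimalSumIsExact() {
        BigDecimal sum = new BigDecimal("0.1").add(new BigDecimal("0.2"));
        return sum.compareTo(new BigDecimal("0.3")) == 0;
    }

    public static void main(String[] args) {
        System.out.println(0.1 + 0.2);              // 0.30000000000000004
        System.out.println(binarySumIsExact());     // false
        System.out.println(decimalSumIsExact());    // true
    }
}
```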

+1

That’s exactly what we do and what I suggested above.

Thanks for the replies! We think our original question may not have been entirely clear, as we weren’t primarily referring to floating-point arithmetic or the mapping of Real to Decimal/BigDecimal (which we completely agree with).

What we’re really wondering about is cases where an implementation is required to convert a real-valued result to an integer value.

For example, suppose an archetype rule calculates the product of two DV_QUANTITY values, but the result is assigned to a DV_COUNT. If the calculation results in 1.5, the implementation cannot store that value directly, since DV_COUNT only allows integers. The same applies to -1.5.

At that point, the implementation has to decide what to do:

  • 1.5 → 1 or 2?

  • -1.5 → -1 or -2?

The answer depends on the rounding strategy (e.g. half toward positive infinity, HALF_UP, HALF_EVEN, etc.).

Our concern is that if this choice is implementation-defined, two conformant openEHR implementations could legitimately persist different values for the same calculation. That not only affects interoperability at the moment the calculation is performed, but could also make it difficult or even impossible to switch between implementations while preserving existing data, because the persisted values would already differ depending on which implementation originally performed the calculation.

Would this be considered acceptable, or should openEHR define deterministic behavior for these kinds of conversions? Alternatively, should such conversions be disallowed unless the archetype or expression explicitly specifies how the value should be converted?
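
Both options can be sketched in Java: a conversion that requires the caller to name a rounding mode, and a strict conversion that rejects any value that is not already an integer. The helpers `toCount`, `toCountStrict` and `halfTowardPositiveInfinity` are hypothetical and do not exist in openEHR or Archie.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class CountConversion {
    // Converts a calculated value to an integer count using the mode chosen by the caller.
    static long toCount(BigDecimal value, RoundingMode mode) {
        return value.setScale(0, mode).longValueExact();
    }

    // Strict variant: throws ArithmeticException if any rounding would be needed.
    static long toCountStrict(BigDecimal value) {
        return toCount(value, RoundingMode.UNNECESSARY);
    }

    // Math.round()-style behaviour for decimals, computed as floor(value + 0.5).
    static long halfTowardPositiveInfinity(BigDecimal value) {
        return toCount(value.add(new BigDecimal("0.5")), RoundingMode.FLOOR);
    }

    public static void main(String[] args) {
        BigDecimal negative = new BigDecimal("-1.5");
        System.out.println(halfTowardPositiveInfinity(negative));      // -1
        System.out.println(toCount(negative, RoundingMode.HALF_UP));   // -2
        System.out.println(toCount(negative, RoundingMode.HALF_EVEN)); // -2
        System.out.println(toCountStrict(new BigDecimal("2.0")));      // 2
        try {
            toCountStrict(negative);
        } catch (ArithmeticException e) {
            System.out.println("conversion rejected: no rounding mode was specified");
        }
    }
}
```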

Forgive me if I don’t understand the case properly. I’m a modeller, not techie.

Why would anyone assign the product of two DV_QUANTITY values to a DV_COUNT? To me, this is a problem in the archetype, not a matter of the specs missing a rounding mode.

If a modeller chose, for some reason, to use a DV_COUNT as the target element of a computation on two or more Real values in an archetype, then he or she should have an opinion on which rounding rules should be applied, and that will vary with what the concept or element is about. Sometimes the rounding rules don’t really matter, and sometimes they do. If they do, the archetype element must document how the value should be rounded. It will then be up to the logic in the application to comply with that documentation, and to store and display the value accordingly.

Hard agree, @varntzen. It would be good to understand the use case, but this feels like an issue that can only be resolved case by case, because I think setting any kind of default could be fraught with risk.

I guess we could define the various rounding modes as functions in the Basic Expression Language, e.g.

  • ROUND = nearest value.
  • ROUNDUP = always higher.
  • ROUNDDOWN = always lower.
  • MROUND = nearest multiple.
  • CEILING = next multiple up.
  • FLOOR = next multiple down.

Then you could use them consistently in rules.

But having a few use cases for this would be really helpful to understand the need and detailed requirements.
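
A rough Java sketch of how such functions might behave is given below. The signatures and exact semantics are assumptions made for illustration, not proposals from the BEL specification: ROUND is assumed to break ties with HALF_UP, "always higher" and "always lower" are read literally as toward positive and negative infinity, and the multiple-based functions assume a positive multiple.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class BelRoundingSketch {
    // ROUND: nearest value at the given number of decimal digits (ties: HALF_UP, an assumption).
    static BigDecimal round(BigDecimal x, int digits) {
        return x.setScale(digits, RoundingMode.HALF_UP);
    }

    // ROUNDUP: always higher, read here as toward positive infinity.
    static BigDecimal roundUp(BigDecimal x, int digits) {
        return x.setScale(digits, RoundingMode.CEILING);
    }

    // ROUNDDOWN: always lower, read here as toward negative infinity.
    static BigDecimal roundDown(BigDecimal x, int digits) {
        return x.setScale(digits, RoundingMode.FLOOR);
    }

    // Shared helper: round x / multiple to an integer, then scale back up.
    private static BigDecimal toMultiple(BigDecimal x, BigDecimal multiple, RoundingMode mode) {
        return x.divide(multiple, 0, mode).multiply(multiple);
    }

    // MROUND: nearest multiple.
    static BigDecimal mround(BigDecimal x, BigDecimal multiple) {
        return toMultiple(x, multiple, RoundingMode.HALF_UP);
    }

    // CEILING: next multiple up.
    static BigDecimal ceiling(BigDecimal x, BigDecimal multiple) {
        return toMultiple(x, multiple, RoundingMode.CEILING);
    }

    // FLOOR: next multiple down.
    static BigDecimal floor(BigDecimal x, BigDecimal multiple) {
        return toMultiple(x, multiple, RoundingMode.FLOOR);
    }

    public static void main(String[] args) {
        BigDecimal x = new BigDecimal("17");
        BigDecimal five = new BigDecimal("5");
        System.out.println(mround(x, five));   // 15
        System.out.println(ceiling(x, five));  // 20
        System.out.println(floor(x, five));    // 15
    }
}
```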

An example would be assessments like this: Catherine Bergego Scale (CBS) – Strokengine.

Scoring:
The CBS uses a 4-point rating scale to indicate the severity of neglect for each item:

0 = no neglect
1 = mild neglect (patient always explores the right hemispace first and slowly or hesitantly explores the left side)
2 = moderate neglect (patient demonstrates constant and clear left-sided omissions or collisions)
3 = severe neglect (patient is only able to explore the right hemispace)

This results in a total score out of 30.

Azouvi et al. (2002, 2003) have reported arbitrary ratings of neglect severity according to total scores:

0 = No behavioral neglect
1-10 = Mild behavioral neglect
11-20 = Moderate behavioral neglect
21-30 = Severe behavioral neglect

In cases of severe impairment the patient may not be able to perform an item of the CBS. In these instances the item is considered invalid, is not scored, and is not included in the final score. As such, the total score would be a calculation of the average score of the valid questions (i.e. sum of individual scores divided by number of valid questions x 10).

=====

The above calculation can easily result in fractions. For example, if a patient can perform only 9 items, the sum of the scores is multiplied by 10/9. The result then has to be compared to the ranges to determine the degree of behavioral neglect. In practice, users did not want to see fractions in the total score, so a DV_QUANTITY with precision > 0 was unwanted. With a DV_COUNT, or a DV_QUANTITY with precision = 0, we have to apply rounding. Depending on the rounding strategy, a patient can end up in a different category of neglect.
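
The total-score calculation can be sketched in Java as follows, to show how the stored integer depends on the rounding mode. The method `totalScore` and the input numbers are hypothetical and chosen only for illustration; they are not taken from real assessments.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class CbsTotalScore {
    // Sum of the valid item scores, divided by the number of valid items, times 10,
    // rounded to an integer with the given rounding mode.
    static int totalScore(int sumOfValidScores, int validItems, RoundingMode mode) {
        return BigDecimal.valueOf(sumOfValidScores * 10L)
                .divide(BigDecimal.valueOf(validItems), 0, mode)
                .intValueExact();
    }

    public static void main(String[] args) {
        // 9 valid items with a summed score of 13: 130 / 9 = 14.44...
        System.out.println(totalScore(13, 9, RoundingMode.HALF_UP));   // 14
        System.out.println(totalScore(13, 9, RoundingMode.CEILING));   // 15

        // 4 valid items with a summed score of 5: 50 / 4 = 12.5, an exact tie.
        System.out.println(totalScore(5, 4, RoundingMode.HALF_UP));    // 13
        System.out.println(totalScore(5, 4, RoundingMode.HALF_EVEN));  // 12
    }
}
```

Two implementations that differ only in the mode passed here would persist different totals for the same item scores.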

I agree that this would be ideal, @sebastian.garde: having an option to define a rounding mode, with a default mode that always applies.

I think this is less about deciding case by case whether you should do something, and more about whether the specification allows you to do it at all.

What I mean is that openEHR is a specification, and there do not seem to be strict restrictions on what you can combine in rules. For example, in BEL and EL, you do not directly use the RM-based DV_ types with their add, multiply, and other methods. Those methods are only defined within the RM itself. BEL and EL instead reference primitive types.

Because of that, I think there is nothing really preventing you from doing any calculation, as long as the result is cast or assigned to the correct type. This seems to happen on assignment. I think AQL has a similar situation, assuming I understand the specification correctly.

As far as I know, BEL, EL, and AQL do not implement the AM in a way that forces these rules. That probably makes case-by-case enforcement harder.

So having some kind of default rounding mode applied would already be a good start. Having the option to define the rounding mode case by case, wherever that is relevant, would be even better.

Thank you, @StefanTeijgeler, this clarifies the case.

One thing is whether, in a case like the CBS instrument, it makes sense to average a limited number of scores in the first place and to derive a total score from that. That’s not really up to us as modellers or implementers to decide, but I have wondered many times how aware those who design scores and scales are of this question.

Anyways, if the score is designed that way, so be it. :smiley:

The next thing that comes to mind is to ask the clinicians who use this score: how do they sum and score when there is a “rounding problem”? If this is done consistently, then apply that rule in the application.

However, I would guess that, if you ask around, some clinicians have a habit of weighing some of the sub-scores individually, as exemplified in the article by Azouvi et al., with varying degrees of difficulty between questions 1-10. For example (made up by me): “There was no answer to question 8, so the average becomes 17.4, but the result of question 1 was 3, so we round up”. That makes it much harder to write an algorithm that chooses which rounding mode to use. Should your system actually compute this total score and store it, or should it be recorded manually by the user? Perhaps it is better to display the average and have the clinician select the group they assert the patient belongs to?

The third issue is that scores and scales are themselves not really precise instruments. There is a degree of uncertainty in both the scoring process and the assessment. A scale and its total score should not pretend to be more decisive than they actually are; most of them are only indications and should be evaluated by a clinician and compared with other observations. So the rounding mode may not be that important after all?

Sorry to introduce more questions than answers. This is the “dirty” nature of the medical domain :smiley:

By the way, if you have an archetype of Catherine Bergego Scale, please upload it as an Archetype Proposal in the Clinical Knowledge Manager.