Uncertain, unknown and no information

Clinical data is hard to get right. One problem is how to handle unknown and/or uncertain semantics. I assume you have all met the challenge of modelling a value list such as:

  • yes
  • no
  • uncertain
  • unknown
  • not applicable

Norway is currently implementing a screening program for colorectal cancer. There will be #fhir messages with the data and extensive use of #snomed-ct. DIPS is developing #openehr models (templates and archetypes) to be used within our systems. We need to map these value sets into openEHR somehow.

The straightforward solution is a DV_BOOLEAN element with some use of null flavours.

OpenEHR has unknown as a null flavour. Some argue that unknown and uncertain are different statements.

Thomas Beale wrote this wiki page back in 2007: https://openehr.atlassian.net/wiki/spaces/spec/pages/4915211/Null+Flavours+and+Boolean+data+in+openEHR

The question is:

What do you think about adding more terms to the null flavours, perhaps with uncertain as a first candidate?


I believe null flavour codes can be extended by the implementation and then harmonized with the openEHR terminology, because we first need examples from the use cases before we can start defining new codes.

Current codes are few …

… and semantics are not well defined in the specs AFAIK:


Data values are connected to spatial structures via the value attribute of the ELEMENT class of the representation cluster. This class also carries the attribute null_flavour, whose value indicates how to read the contents of the value attribute. Values from the openEHR null flavours vocabulary, including 253|unknown|, 271|no information|, 272|masked|, and 273|not applicable| are used to populate it. Only a small number of generic codes are defined, in order to avoid complex processing for most data instances, for which this simple classification of null is sufficient.

In some circumstances however, additional detail is required in addition to the null flavour code. Examples include reporting and where specific reasons for lack of data have medico-legal ramifications, e.g. ‘patient was unconscious’, ‘patient refused to tell me’, ‘no reason provided’. For these situations, the optional null_reason field may be used to record a specific reason.


@bna What does ‘uncertain’ mean? In what context - data presence/absence or clinical certainty? Serious question. We probably all use it differently unless it’s attached to a value set.

In the archetype modelling we only use Boolean data types when we are sure there is only a Yes/No or True/false clinical answer. Then the null flavours are related to whether data is available or not.

In clinical modelling reality we hardly ever use Booleans, because there are hardly any clinical situations where there is a clear black and white answer. Too often there are subtle shades of grey, which I suspect might be one of the reasons driving your request for ‘uncertain’. In those ‘grey’ situations in archetypes, we tend to use the pattern of CODED_TEXT with a codable value set ‘Present/Absent/Indeterminate’. In this context ‘indeterminate’ means ‘we looked but we couldn’t tell’ - and that could be because of inexperience or simply that it could not be discerned even by the most experienced clinician, but it is still a very important and valid clinical finding that needs to be recorded in its own right, not as a RM ‘flavour’.
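To make the distinction concrete, here is a minimal Python sketch of that pattern. It is only an illustration: the `Finding` enum and `record_finding` helper are made-up names, and the returned dict is a simplified stand-in for an ELEMENT, not a real openEHR structure. The point is that ‘Indeterminate’ travels as a value like any other, while absence of data is handled separately.

```python
from enum import Enum
from typing import Optional


class Finding(Enum):
    """Hypothetical local value set for a 'grey' clinical question."""
    PRESENT = "Present"
    ABSENT = "Absent"
    INDETERMINATE = "Indeterminate"  # we looked but could not tell


def record_finding(answer: Optional[Finding]) -> dict:
    """Return a simplified element for the given answer.

    A finding, including 'Indeterminate', is stored as the value.
    None means nothing was recorded, which is a data-absence concern
    and not a clinical finding.
    """
    if answer is None:
        return {"value": None}
    return {"value": answer.value}
```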

Just throwing this into the mix and stirring the soup :confounded:



Having been through this journey repeatedly, and going through it again with another ‘registry-type’ dataset, I’m definitely with Heather on this.

I avoid DV_BOOLEAN almost everywhere and always prefer to use a DV_CODED_TEXT even if there are only Present/Absent type answers, partly for similar reasons to Heather: booleans have a nasty habit of not staying that way (‘grey Booleans’). Also, at least in the UK, the present/absent values are often carried as SNOMED terms. It is also very hard to come up with standard terms for the unknown/indeterminate/equivocal variants. Right now null_flavours are tricky to use (if mixed with non-null options), and I think they should mostly be reserved for essentially technical gotchas, not for normal clinical recording.

Having said all that, I wonder if there is an opportunity to see if we can work out some standard patterns and usage, and make use of SNOMED terms as either the primary terms or mappings. That would require a conversation with SCT International about licensing but I sense there is a change in their approach with the Global Patient Set, so perhaps there is a conversation to be had there.


First: Sorry for leading the discussion into DV_BOOLEAN :slight_smile: We prefer DV_CODED_TEXT or DV_ORDINAL for most such use-cases. The boolean track is a rabbit hole we want to avoid…

The issue I want to discuss is whether we should do some work to find patterns to be used for these use cases. I think it is of interest to have a shared way to express such statements. It would benefit the data for secondary use.

I don’t have a clear idea of how to do this. Some options:

  1. DV_BOOLEAN with an extended NULL_FLAVOUR
  2. DV_CODED_TEXT with value set defined in archetype
  3. DV_TEXT specialized with external terminologies to reuse the value set between elements
  4. Archetype some ELEMENTs to be reused
  5. Other options?

The most common pattern we follow is DV_TEXT in the archetype, which we specialize into DV_CODED_TEXT with terminologies. It works reasonably well.

The reason I am revisiting this topic now is the work on the colonoscopy report, where the national program is developing FHIR resources. Here they use a combination of HL7 FHIR null flavour and/or absent terminologies combined with local value sets for the specific resource or quality registry. This is why I had the idea of making such value sets a part of the reference model.

I am not sure if it is a good idea. I can see pros and cons.

Pros:

  • Make such statements semantically defined
  • Suited for international models

Cons:

  • Many of the statements we find are not semantically well defined, so it makes no sense to put them in a small box of terminologies

Other thoughts?


What about the use of DV_BOOLEAN in the context of deceased? Whether a patient is deceased is a yes/no clinical answer, but what if the information is stored ambiguously - e.g. ‘death or major disability’ in source?


Hi Michelle,

Welcome to our world. Firstly I agree that you should avoid booleans, precisely because someone always wants to add a third option.

Specifically, your example of ‘death or major disability’ is a classic example of someone designing a really bad questionnaire without any thought of if/how the data might be originally captured or used.

‘Death or major disability’ is a nonsense question about the patient unless you are really only interested if ‘really bad stuff happened’ perhaps as a part of pharmaceutical research. From a direct patient care perspective this makes no sense!!

However, this kind of mixed semantics is not uncommon when working at the registry/reporting end of data capture, or indeed at the very early stages of data capture, which is why there are a number of ‘screening’ observation archetypes that recognise that this is not ‘primary semantic data’.


Hi Ian,

Thanks for that! I couldn’t agree more. We’re also having the conversation about the value vs. the reason for the value (e.g. value is unknown, because it wasn’t captured or because it is stored ambiguously)

I am curious as to how it would work in this scenario if deceased is DV_BOOLEAN where allowable values are true or false. Does this mean deceased is false, the null flavour is ‘unknown’ and the null reason is something like ‘ambiguous in source’ or is deceased null, with null flavour ‘unknown’ and null reason ‘ambiguous in source’?



Null flavours are designed to be used when there is no value for ‘deceased’ at all. You cannot have a value AND a null_flavour - you have to choose one or the other:

name: ‘deceased’
value: true


name: ‘deceased’
null_flavour: ‘no information’
null_reason: ‘internet failed’
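The same either/or rule can be sketched in Python. This is a simplified stand-in, not a reference model implementation: the `Element` class is hypothetical, and it assumes that an element carries exactly one of value or null_flavour (an optional element with neither would simply be omitted). The codes are the four listed in the spec excerpt quoted earlier in the thread.

```python
from dataclasses import dataclass
from typing import Optional

# Codes quoted from the openEHR null flavours vocabulary earlier in the thread.
NULL_FLAVOURS = {
    "253": "unknown",
    "271": "no information",
    "272": "masked",
    "273": "not applicable",
}


@dataclass
class Element:
    """Hypothetical, simplified ELEMENT holding a boolean value."""
    name: str
    value: Optional[bool] = None
    null_flavour: Optional[str] = None  # a code from NULL_FLAVOURS
    null_reason: Optional[str] = None

    def __post_init__(self) -> None:
        has_value = self.value is not None
        has_flavour = self.null_flavour is not None
        # Assumption: exactly one of the two must be present.
        if has_value == has_flavour:
            raise ValueError("set exactly one of value or null_flavour")
        if has_flavour and self.null_flavour not in NULL_FLAVOURS:
            raise ValueError(f"unknown null flavour code: {self.null_flavour}")
        if self.null_reason is not None and not has_flavour:
            raise ValueError("null_reason only makes sense with a null_flavour")
```

With this sketch, `Element("deceased", value=True)` and `Element("deceased", null_flavour="271", null_reason="internet failed")` are both accepted, while supplying a value together with a null flavour raises `ValueError`.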

I think most of us try to avoid using null_flavours for ‘routine’ expressions of ‘unknown’ i.e where we can expect ‘unknown’ to be a reasonable answer - I would be explicitly adding that to a coded_text list.

I would only use null_flavour for genuinely unexpected ‘unknowns’, particularly where the element is mandatory i.e you can’t get the information but the archetype insists - using null_flavour is the get out clause.

Remember too, that unless an element is mandatory, it can always be left empty.
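As a rough sketch of that decision rule in Python (the function name and dict shapes are invented for illustration, and the choice of ‘no information’ as the fallback flavour is just an example):

```python
from typing import Optional


def encode_answer(answer: Optional[str], mandatory: bool) -> Optional[dict]:
    """Decide how to store an answer under the rule of thumb above.

    - Any recorded answer, including an explicit 'Unknown' from the
      coded list, is stored as a normal value.
    - A missing answer on a mandatory element falls back to a null flavour.
    - A missing answer on an optional element is simply left out (None).
    """
    if answer is not None:
        return {"value": answer}
    if mandatory:
        return {"null_flavour": "no information"}
    return None
```

So `encode_answer("Unknown", mandatory=True)` stays an ordinary coded value, and the null flavour only appears when nothing at all was recorded for an element the archetype insists on.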

Of course this is (I hope!!) good advice if you have a clean room to work in but sometimes you have no option but to replicate exactly what is already established, perhaps after a small battle to persuade them of the error of their ways!!


I have a question mostly for @birger.haarbrandt
Is null flavour supported by EHRBase?
What is the syntax then?
If it is not supported, how do you account for a compulsory field whose information is unknown?

Are you referring to the flat format?


Well. The question was incomplete. What I would like to know is how you can treat data that can be unknown both from the template and the composition point of view. When I talk about the template I mean both the opt and the webtemplate (json for flat composition).

In short, can you write some example lines showing what I have to put in both the opt and webtemplate files, and what I have to put in a composition, in order to account for unknown data in EHRBase?

Ah, in this case it’s quite a general openEHR question. Unfortunately, I’m “on the road” until the weekend. I hope that somebody can provide an example in the meantime. Otherwise, I will be able to provide an example next week.

thank you @birger.haarbrandt ,
I’ll appreciate very much the examples.
It’s quite a general openEHR question but I’m mostly interested in what EHRBase supports

Here is a valid example (at least in Better):

 "initial_assesment/glasgow_coma_scale_gcs/best_motor_response_m/_null_flavour|code": "271", 
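If you are building the flat composition in code, a small helper along these lines might be handy. It is a sketch only: the function is hypothetical, it reuses the path from the line above and the `_null_flavour|code` suffix shown there, and the optional `|value` and `|terminology` keys are my assumption about how a coded entry is usually spelled out in flat format. I have not checked any of this against EHRBase.

```python
import json

# Name-to-code lookup for the four null flavours quoted earlier in the thread.
NULL_FLAVOUR_CODES = {
    "unknown": "253",
    "no information": "271",
    "masked": "272",
    "not applicable": "273",
}


def null_flavour_entry(element_path: str, flavour: str, verbose: bool = False) -> dict:
    """Build the flat-format key(s) that mark an element as null.

    By default only the '|code' key from the example above is emitted.
    With verbose=True the '|value' and '|terminology' keys are added;
    those two are an assumption, not something confirmed in this thread.
    """
    code = NULL_FLAVOUR_CODES[flavour]
    entry = {f"{element_path}/_null_flavour|code": code}
    if verbose:
        entry[f"{element_path}/_null_flavour|value"] = flavour
        entry[f"{element_path}/_null_flavour|terminology"] = "openehr"
    return entry


if __name__ == "__main__":
    path = "initial_assesment/glasgow_coma_scale_gcs/best_motor_response_m"
    print(json.dumps(null_flavour_entry(path, "no information"), indent=2))
```

The resulting dict can be merged into the rest of the flat composition before posting it.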

Thanks Ian. And for the same example, do I have to do something in the template?
Let me try to explain better. Do I have to know beforehand which data can be absent and declare that in some way in the template (this data may be present or not), or is it something that is done only at composition level?
I assume it is the latter. I don’t always know in advance which data will be more susceptible to absence.

It would nearly always be a run-time ‘decision’ but occasionally (as per the example above) be a case where the archetype enforces mandation on a sub-score which is not recorded, only the total score. I’m not sure TBH if you can constrain that in templates.

That’s correct. However, if I remember correctly, you can limit the allowed absence reasons if this is required.


@birger.haarbrandt sorry to bother you, but can you post an example that works with EHRBase?

Does anyone have experience with EHRBase and null values in flat format? Is it doable?