Flexibility without compromising data integrity

Our journey with openEHR began with replacing an EPROMS (electronic patient-reported outcome measures) system. We used openEHR to build the archetypes and templates, and Better Studio to build the forms. Our experience is that in this area the specification of what data to collect, and how to elicit it, is far more subjective than in other clinical workflows.

In order to maintain some control, the questions and responses were defined by a clinical working group and we designed to that specification. As new teams join the EPROMS project and the original teams review the data collected, we are receiving unanticipated change requests that we are unsure how to manage.

These can be summarised as

  1. Adding questions
  2. Adding responses
  3. Changing responses
  4. Changing questions

Had we anticipated these requests we might have designed differently, so the question is how to design to accommodate flexibility without compromising data integrity. Browsing the community I’ve noticed approaches such as allowing multiple data types for one element (e.g. Text and Coded Text) and using ELEMENTS and CLUSTERS, but I wondered what the optimal solution is for a scenario where change is certain, so it is not as simple as training staff in a particular direction.

Change is inevitable even in scenarios where concepts appear rigid, e.g. EWS to NEWS to NEWS2, so I would appreciate some advice on accommodating that within openEHR, which relies on common data to facilitate sharing, or at least that’s my understanding.

Thanks in advance

Ah … welcome to our world!! One of the reasons that we have a maximal dataset philosophy is to try to get ahead of the churn of change as you engage more clinical groups.

It might be helpful to get a couple of examples.

The main technical principle is that adding new content, i.e. new data points or clusters, is a non-breaking change, i.e. one that will not affect existing data integrity. Adding new questions is not a problem at all.
Adding new responses to existing questions, e.g. adding Unknown to an existing Yes/No response, is not a breaking change from a technical POV, but it can have consequences for downstream software or processes that are not expecting ‘Unknown’. That’s not an openEHR issue per se, just a factor in clinical informatics.
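
To make the distinction concrete, here is a minimal Python sketch that compares two versions of a questionnaire definition and sorts the differences into additive and potentially breaking changes. It is only an illustration: the `Question` class, the `classify_changes` function and the question codes are all invented for this example and are not part of openEHR or any CDR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    code: str           # stable identifier, hypothetical (in the spirit of an at-code)
    text: str           # wording shown to the patient
    answers: frozenset  # allowed answer codes


def classify_changes(old, new):
    """Compare two questionnaire versions, each a dict of code -> Question."""
    report = {"added_questions": [], "added_answers": [], "breaking": []}
    # Additive changes: new questions, or new answers on existing questions.
    for code, question in new.items():
        if code not in old:
            report["added_questions"].append(code)
            continue
        extra = question.answers - old[code].answers
        if extra:
            report["added_answers"].append((code, sorted(extra)))
    # Potentially breaking changes: anything existing data may rely on that is gone.
    for code, question in old.items():
        if code not in new:
            report["breaking"].append((code, "question removed"))
            continue
        missing = question.answers - new[code].answers
        if missing:
            detail = "answers removed: " + ", ".join(sorted(missing))
            report["breaking"].append((code, detail))
    return report


v1 = {"q1": Question("q1", "Do you smoke?", frozenset({"yes", "no"}))}
v2 = {
    "q1": Question("q1", "Do you smoke?", frozenset({"yes", "no", "unknown"})),
    "q2": Question("q2", "Do you drink alcohol?", frozenset({"yes", "no"})),
}
report = classify_changes(v1, v2)
print(report)
```

The "added_answers" list is the one worth showing to downstream consumers, since that is where an unexpected ‘Unknown’ would surface. The sketch deliberately does not judge reworded questions, because whether a rewording changes the meaning is a clinical decision rather than a mechanical one.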

One thing to be aware of is that even breaking changes are not necessarily blockers. Either you can query the two versions jointly or, in many cases, it is pretty easy to bulk update data captured with the older version to the new format. I know most of the CDR vendors do that fairly regularly. Best avoided, of course, but somewhat inevitable from time to time.

Another possibility is to tease out common question sets, perhaps as CLUSTERS that can be mixed and matched for different conditions.
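
As a rough illustration of those two options, the Python sketch below uses plain dictionaries as stand-ins for stored data. Everything in it is an assumption made for the example: the field names, the `version` marker and the recoding table are invented, and a real CDR would do this through its own query and versioning mechanisms rather than code like this.

```python
# Hypothetical recoding between two form versions; none of this is a real CDR API.
FIELD_RENAMES = {"mobility": "mobility_problem"}
VALUE_RECODES = {"mobility": {"Y": "yes", "N": "no"}}


def migrate_v1_to_v2(record):
    """Return a version-2 copy of a version-1 record without mutating the input."""
    migrated = {"version": 2}
    for key, value in record.items():
        if key == "version":
            continue
        recode = VALUE_RECODES.get(key)
        if recode is not None:
            if value not in recode:
                # Refuse to guess: unmapped values need a human decision.
                raise ValueError(f"no v2 mapping for {key}={value!r}")
            value = recode[value]
        migrated[FIELD_RENAMES.get(key, key)] = value
    return migrated


def read_as_v2(records):
    """Normalise a mixed list of v1 and v2 records at query time."""
    return [r if r["version"] == 2 else migrate_v1_to_v2(r) for r in records]


stored = [
    {"version": 1, "patient": "A", "mobility": "Y"},
    {"version": 2, "patient": "B", "mobility_problem": "no"},
]
normalised = read_as_v2(stored)
print(normalised)
```

`read_as_v2` corresponds to querying both versions together and normalising on the way out, while running `migrate_v1_to_v2` once over the old data corresponds to a bulk update. Either way the mapping table is the part that needs clinical sign-off.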

Changing questions and responses is entirely possible since under the hood they are generally backed by internal coded terms (at-codes), but you need to be sure that you are not changing the meaning in any significant way, such that querying becomes misleading.

In all of this there is probably a cultural angle, of drawing at least some of the clinical people into the hard decisions about potential breaking changes. Clinicians do need to better understand the potential cost/impact of the changes they request. People are used to tinkering and tweaking, which of course is often good for local ownership/innovation, but it can come at a significant cost, both financially and in terms of data integrity.
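
A small Python sketch of why the stable code matters more than the wording. It assumes each stored answer keeps a code alongside the label that was displayed at capture time; the codes and labels here are invented for illustration.

```python
from collections import Counter

# Hypothetical stored answers: a stable code plus the label shown at capture time.
stored = [
    {"code": "at0005", "label": "Some problems"},
    {"code": "at0005", "label": "Moderate problems"},  # captured after a rewording
    {"code": "at0006", "label": "Severe problems"},
]


def count_by_code(rows):
    """Group answers by their stable code, ignoring wording changes."""
    return Counter(row["code"] for row in rows)


def count_by_label(rows):
    """Group answers by display text, which splits reworded answers apart."""
    return Counter(row["label"] for row in rows)


print(count_by_code(stored))
print(count_by_label(stored))
```

Counting by code treats the two wordings as the same answer, which is only safe if "Moderate problems" really does mean the same thing as "Some problems". If it does not, reusing the code is exactly the case where querying becomes misleading.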

We have all been there!! And there would be no shame in making some radical changes now to set yourselves up for more flexibility in the future.

The modeling challenge with questions is similar to that of laboratory results: you can model each question with its specific question and answer constraints, or you can do it in a generic way, by having one model for all questions and using alternative constraints for all the possible results. You then keep a dictionary of questions and answer options that is used to fill in the openEHR models and to let the user enter or pick the response. This is very similar to how terminologies work, but instead of having a concept code you have the text of the question and the possible answers for each question.

With laboratory results it’s similar: you can model each result in its own archetype, or just create a generic archetype and have a terminology on the side specifying what a result will look like, for instance by using LOINC. In your case, the question dictionary is to questionnaires what LOINC is to lab results.

Just my 2 cents.


That is a good suggestion Pablo, as an alternative.

We did try something like that in this template - generic archetype plus controlled terminology.

@NeillY - it would be really helpful to see some examples of the particular challenges you are facing.

Hi Neill, thanks for sharing this question. Ian answered most of it already in a great way. I’d like to add that designing archetypes for wide use is not easy. It’s a task for experts who have a deep understanding of both the domain (usually from years of practising) and of informatics. And a unique talent that’s hard to nail down. Then mix in years of trial and error.
There are some pointers to ‘prevent’ many of the issues you encounter. Ian and Pablo described specific ones. For one, it’s good to follow the openEHR design guide and the described modelling patterns. https://openehr.atlassian.net/wiki/spaces/healthmod/pages/304742401/CKM+Editorial+Style+Guide
Another is to put the archetypes out for wide (international) review early (at first draft, and definitely before hitting production); that way you get most of the expected change requests early. As is well known in software development, a change that comes one step later in the process takes roughly 10x as long to fix. So although it may feel cumbersome to do a review before trying a model in a real-world app, it’s often worth it.
By asking questions like this you’ll get a lot of expert help (for free), and it will help you develop your own openEHR modelling expertise. But it could be smart to cheat and contract an expert like Ian or @heather.leslie.

I was proposing something like that but with questions, which is not exactly terminology.

By analyzing the semantics of the questionnaire questions and the requirement to update questionnaires, questions and even answers, we can consider the questions to be content themselves, not just data structures or data points. In that sense, questions and answers look like concepts in a terminology, with a “question dictionary” playing the role of the “terminology”. The issue with modeling each question in its own archetype is maintainability: each new question, or modification to a question or an answer, can trigger a cascade of updates to the archetypes and templates that use the updated one.

So if the archetype is generic, the “content” (questions and answers) is managed externally to the archetype (in the question dictionary), and the metadata linking each question to a specific type of questionnaire is also managed outside archetypes (in the dictionary too), then any questionnaire will be mappable to an openEHR COMPOSITION without the need to deal with archetypes and templates each time an update is needed. In fact, templates could be autogenerated from the linking data managed in the question dictionary, but there are many ways of implementing this.
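
One possible shape for this, as a minimal Python sketch. It is an assumption-laden illustration, not the implementation I used: `QuestionDef`, `QUESTIONS`, `QUESTIONNAIRES` and the item fields are invented names, and the generic item is a plain dictionary standing in for whatever generic openEHR structure would actually hold the question and answer.

```python
from dataclasses import dataclass


@dataclass
class QuestionDef:
    text: str      # wording shown to the patient
    answers: dict  # answer code -> display text


# Hypothetical dictionary content, managed outside archetypes and templates.
QUESTIONS = {
    "mobility": QuestionDef("How is your mobility today?",
                            {"none": "No problems", "some": "Some problems"}),
    "pain": QuestionDef("Do you have pain?",
                        {"none": "No pain", "severe": "Severe pain"}),
}

# Linking metadata: which questions make up which questionnaire.
QUESTIONNAIRES = {"hip-proms": ["mobility", "pain"]}


def build_item(question_id, answer_code):
    """Fill one generic question/answer item from the dictionary."""
    question = QUESTIONS[question_id]
    if answer_code not in question.answers:
        raise ValueError(
            f"{answer_code!r} is not an allowed answer for {question_id!r}")
    return {
        "question_id": question_id,
        "question_text": question.text,
        "answer_code": answer_code,
        "answer_text": question.answers[answer_code],
    }


def build_questionnaire(name, responses):
    """Build generic items, in questionnaire order, for the answered questions."""
    return [build_item(qid, responses[qid])
            for qid in QUESTIONNAIRES[name] if qid in responses]


items = build_questionnaire("hip-proms", {"pain": "none", "mobility": "some"})
print(items)
```

Adding a question here means adding an entry to `QUESTIONS` and listing it in `QUESTIONNAIRES`; the generic item structure does not change. The answer constraints now live in the dictionary instead of the archetype, so the dictionary itself needs to be governed and versioned.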

Of course I’m not a clinical modeler, though I have had this problem before and came up with an engineering solution for it, which might or might not make sense depending on where you are coming from :)