Philosophy fun - model questionnaires literally or via existing archetypes?

Just to make all the people who do serious modelling of actual questionnaires crazy, I’d like to raise a philosophical question: should we model questionnaires as literal tree structures, isomorphic to the implied tree structure of the forms they are usually expressed as? Or should we attempt to represent every question / option / field using some existing archetype data point? Let’s call this the literal versus semantic approach.

NB: When I say ‘we’, I of course mean ‘you’, the clinical modellers :wink:

But the question is a serious one and there are data processing, querying and semantics angles that I think are important.

I’ve been taking a good look at the JAMAR (Juvenile Arthritis Multidimensional Assessment Report) questionnaire, among others.

The first thing one realises when trying to do the modelling in a semantic way is that the match for items in CKM is often fairly approximate, semantically as well as structurally. For example, some parts could in theory be represented by data points in the various signs/symptoms CLUSTER archetypes, arranged within OBSERVATIONs, or perhaps by pain archetypes, but it’s not a very close match. Other questions, or question sets, are clearly an artefact of the tool design, i.e. they follow some theory of experts who have worked out which questions reveal the most diagnostic or disease state information.

My initial conclusion is that we should model questionnaires in a literal fashion, i.e. without trying to represent data points with any proper ‘semantic’ archetypes. I think we should do this because:

  1. it’s the only way to be faithful to the questionnaire, and
  2. there’s no way to know if the true intention of any item in the questionnaire really is the same semantically as some previously defined openEHR archetype, except maybe for a few things like pain scale.

If this approach were used, implications would seem to include:

  • every questionnaire is its own thing (no / little reuse)
  • since there’s no use of ‘standard’ archetype data points, queries for say ‘joint stiffness’ based on some standard archetype are not going to find any ‘joint stiffness’ answers in recorded questionnaires.

Initially the above seems sub-optimal, but if we look a bit harder at the problem, we would potentially try to:

  • create a growing library of ‘standard question archetypes’ to cover questions that recur across questionnaires (there are a number of EMR related products that work on this basis)
  • start to design questionnaires more based on this standard library rather than each one being a new invention
  • find a way to link such questions to what we think are the relevant data points in the main clinical archetypes (perhaps by terminology)

One problem with the idea of a ‘standard questionnaire library’ approach is of course that questionnaires that come into use tend to come from all over the world (research groups, specific institutions, PhD authors, and so on), i.e. there’s little common culture (at least that I am aware of). Whether there is any future in such a library seems, to me, orthogonal to the original proposition: that we should model questionnaires using a mostly literal approach.

A fair bit of this would seem to apply to scores and scales as well - they are all tools of some kind, designed to make it efficient to triage or classify patients so as to determine an appropriate care pathway, classify them as at-risk, etc.

Anyway, the purpose of posting here is to run the idea past people who know a lot more about clinical questionnaires than I do, and see if any of the above resonates. There might be practical conclusions, such as having dedicated questionnaire and/or question RM types to make direct representation easier.

It is even more complex than this, Thomas. I regard the questionnaire as a modeller’s nemesis, because they are mostly not designed with computers in mind. And often in the conversion to a digital format, modellers realise all of the humanness that is required to answer a question that actually has two dependent components but allows only one answer for both. Humans somehow manage it, but it’s a nightmare to try to model purely or academically, and in the end, more often than you want, you can only directly copy the original paper form into a dedicated archetype.

Yes, the CKAs do have an approach. It is a practical dilemma we have to deal with everyday and the current approach has only come from practical experience, good and bad.

  • Some things we can’t influence and have to model explicitly - think licensed and research-validated scores or scales, e.g. the Glasgow Coma Scale, or the PROMIS questionnaires, which require their own project to hold the mix & match question bank. We’ve only just started to model this family of questionnaires, which are aggregated to varying degrees and in different combinations. The same goes for the EORTC family of models, which will need a similar approach should we gain permission to access the content.
  • Many scores and scales are ubiquitous, and we model them as their own OBSERVATIONs, with some editorial guidance found here. Some of them have been gathered in this project.
  • During the initial COVID work, we built the first generic openEHR archetypes to support screening for a variety of common topics (meds, conditions, symptoms etc), and they seem to be supporting subsequent modelling requirements well. We deliberately kept them to Yes/No (Present/Absent) responses, with the intent that more detail about any medication, condition or symptom should be recorded in the formal archetypes, so that we avoid duplication. So far, they are proving to be of significant reuse value in our testing, especially for registries, research, clinical screening and similar use cases. The reason why is that they are simple and reflect common clinical activity.
  • I spent way too much time trying to build generic patterns for archetypes that allowed groupings and infinite nesting, with Question/Answer pairs where the answers were an ‘Any’ data type. They end up looking like fancy trees, but any content had to be added at template level, so they actually provided zero value, especially from a reuse POV. So we gave this up as a bad approach, only to find the FHIR Questionnaire resource come along a long time later to replicate this very idea, with all actual content added in a profile. :woozy_face: Again, zero interop value, but at least they get some consistency re metadata, I guess.

Ultimately, I think that questionnaires are largely patternless, more a reflection of how creative the author is. This bodes badly for generic questionnaire patterns in principle. We are always looking for opportunity for reuse value, hence the recent addition of screening questionnaires. But for the most part, we often just have to cut our losses and be pragmatic… The hope would be that as researchers develop questionnaires into the future, they look to archetypes to help structure it sensibly :thinking:


Thanks for the answer, I need to digest that. But one thing I should note: when I talk about modelling questionnaires ‘literally’, I mean: just make the tree structure in the archetype. Today, that would be using the data part inside an Observation Event, but we could add another Entry type to allow this - and we could include meta-data elements to enable connections / links / references from question items to other resources, terms or whatever. It would be similar to FHIR indeed, and entirely pragmatic. (BTW, FHIR questionnaire is wrongly named, it’s about representing data from any UI form, not questionnaires as such).
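To make the ‘literal’ idea concrete, here is a minimal sketch of a question tree with per-item meta-data slots for links to terms or archetype nodes. All class and field names are invented for illustration; this is not openEHR RM, and the JAMAR items shown are placeholders.

```python
# Hypothetical sketch of 'literal' questionnaire modelling: a plain
# question tree, faithful to the paper form, with optional meta-data
# links per item. Names are invented; this is not openEHR RM.
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class QuestionItem:
    text: str                            # question or section text, verbatim from the form
    answer: Optional[Any] = None         # recorded response, if a leaf question
    term_binding: Optional[str] = None   # e.g. a terminology code, if one genuinely applies
    archetype_ref: Optional[str] = None  # link to a 'proper' archetype node, when known
    children: list["QuestionItem"] = field(default_factory=list)

# A JAMAR-style fragment is just nested items (wording invented):
jamar = QuestionItem("JAMAR", children=[
    QuestionItem("Pain/swelling in joints, past week", children=[
        QuestionItem("Left knee", answer=True),
        QuestionItem("Right knee", answer=False),
    ]),
    QuestionItem("Morning stiffness on waking, past week", answer=False),
])
```

The point is that the tree mirrors the form exactly; any semantic connection to standard archetypes is optional meta-data, not the structure itself.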


The problem is a real conundrum.
1- Data about the patient system is recorded/shown in an EHR for the purpose of providing health and care by a healthcare provider. This is the primary use of data. Data about the patient system is coded using an ontology.
2- A questionnaire most often summarises or abstracts data already recorded. Many times generic or specific classifications are used. The data obtained in a questionnaire is used for statistical analysis, i.e. it will be used to investigate a population. This is typical re-use of EHR data.
3- A special case is the questionnaire as a documentation tool, where the result is a primary, de novo result based on reuse of data about the patient system. It can result in a (rules-based) abstraction of existing observations, i.e. an abstraction that is the result of a (rules-based) evaluation of existing data.

(2) is a unique context specific Template.
(3) is a generic standardised Evaluation Archetype

All questionnaires follow the generic pattern of any Panel.

Yessss - I needed that sentence. I’ve seen it demoed as a questionnaire format and it makes sense in that context at the time. Then I go back and look at the model and I get confused again. The dissonance is real.

I think this is actually the QuestionnaireResponse Resource that contains the UI data and not the Questionnaire Resource?

From section 2.37.2 of Questionnaire resource - “Questionnaire is focused on user-facing data collection. It describes specific questions and includes information such as what number/label should be displayed beside each question, conditions in which questions should be displayed (or not), what instructions should be provided to the user, etc.”
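The quoted features can be seen in a minimal FHIR Questionnaire instance, shown here as a Python dict for readability: `prefix` carries the number/label displayed beside a question, and `enableWhen` carries the conditional-display logic. The `linkId` values and question wording are invented for illustration.

```python
# A minimal FHIR (R4) Questionnaire instance, shown as a Python dict,
# illustrating a per-question display label (prefix) and conditional
# enablement (enableWhen). linkIds and wording are invented examples.
questionnaire = {
    "resourceType": "Questionnaire",
    "status": "draft",
    "item": [
        {"linkId": "q1", "prefix": "1.", "type": "boolean",
         "text": "Any joint pain in the past week?"},
        {"linkId": "q1.1", "prefix": "1a.", "type": "string",
         "text": "Which joints?",
         # only displayed when q1 was answered 'yes'
         "enableWhen": [{"question": "q1", "operator": "=",
                         "answerBoolean": True}]},
    ],
}
```

Note that nothing here is clinical content in the archetype sense - it is purely form structure and display/flow behaviour.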

And this is exactly the context in which I’ve seen it demoed. Each question needs to be added for the implementation - a questionnaire profile for FHIR, which would be the equivalent of us creating an openEHR template containing CONTENTLESS archetypes, as I understand it.


I think questionnaires are defined as Questionnaire instances, and the data collected in QuestionnaireResponse instances.


That is correct, and actually the two FHIR resources are not too badly designed. We have an ePROMs solution and our own model is similar to FHIR (Questionnaire and QuestionnaireResponse). I think they help to get more than just the meta-data right, to be honest. I strongly suggest a generic archetype pattern for modelling questionnaires.

I am thinking generic RM pattern, that would make the archetyping even easier.


I agree, any contentless patterns should be in the RM. We are finding some generic reusable patterns for some parts of questionnaires though, and they seem to be working well for a lot of different questionnaires. See

I’m honestly not sure that it would, or that it makes sense to have dedicated Questionnaire archetypes. It is dead easy to take a questionnaire and create a custom archetype for it - way easier than constraining a generic structure, which you would have to do in a specialisation or template anyway. I would be pretty sure that these could be turned into FHIR artefacts very readily as well.

The main challenge we have is really not replicating the basic questionnaire structure, it is mixing and matching with ‘proper archetypes’, e.g. the example where ‘maximal blood pressure’ pops up in what is otherwise a typical questionnaire. That is largely an issue of analysis, but there is also a modelling issue about how these things interact - the FHIR folks are struggling with that too.

What they do have in their Questionnaire resource is conditional ‘enablement’ of particular questions, which creeps into UI territory but can also be argued to have a data quality aspect.

From what I can see, almost anything else in there right now is doable in existing archetypes.

so from my POV

Building local questionnaire archetypes is easier than templating a generic questionnaire archetype (we tried that years ago), and there are a few thematic questionnaire archetypes already up on CKM.

Being able to pump these out as FHIR resources would be nice.

Adding conditional-enablement capacity would be very nice

Solving the problem of how and where to interact with ‘normal’ archetypes is the holy grail - but solving some of the issues about ‘nesting’ Entries, i.e. being able to more easily associate a Blood pressure Entry with a leaf node in a question, would be very helpful.

I assume you really meant ‘RM type’ here?

Right - but of what RM type? Today it’s Observation, which is not a great fit. And there are probably a few node-level meta-data items that would be useful for Questionnaire items that are not in the standard Cluster/Element structure, e.g.

  • question text (maybe with variants)
  • link or ref to an associated ?type of standard Obs or Eval (e.g. your BP example), to be machine populated from the Question response, possibly via a conversion function (your holy grail)
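The node-level meta-data items listed above could be sketched roughly as follows. All names here are invented for illustration only; this is not a proposal text, nor existing openEHR RM, and the blood pressure example follows the one discussed earlier in the thread.

```python
# Hypothetical sketch of extra node-level meta-data a questionnaire item
# might carry beyond the standard CLUSTER/ELEMENT structure. All names
# are invented; this is not openEHR RM.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class QuestionMeta:
    question_text: str                    # canonical question text
    text_variants: list[str] = field(default_factory=list)  # e.g. translations, short forms
    target_ref: Optional[str] = None      # ref to an associated standard Obs/Eval node
    conversion: Optional[str] = None      # name of a function mapping the response
                                          # onto the target node (the 'holy grail')

bp_question = QuestionMeta(
    question_text="What was your highest blood pressure reading?",
    text_variants=["Highest BP reading?"],
    target_ref="openEHR-EHR-OBSERVATION.blood_pressure.v2",
    conversion="parse_bp_reading",  # hypothetical converter, not implemented here
)
```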

(Note that most of the other FHIR Questionnaire meta-data consists of constraints, all taken care of by archetyping; also, the meta-data in the root type is mostly ‘Authored resource meta-data’, all taken care of by openEHR’s Resource model.)

Hence my original (maybe devil’s advocate) thesis - just make every questionnaire a custom thing, maybe in the long run, compiled from ‘standard questions’ from a Questionnaire Question library, but not trying to build in direct Obs or other real archetype data elements, because they don’t really have the same semantics.

This approach would also make it easier to generate FHIR equivalent artefacts.

Kinda meant either an RM class or a set of very generic archetypes that mirrored something like the FHIR artefacts. We actually did this a long time ago and gave up!! If we were going to do anything, I would just mimic what is in FHIR at the moment, but in my world there is often a mix of ‘pure’ archetypes and questionnaires, and essentially a need for pure archetypes to have a questionnaire-type facade.

So, I think there is merit in seeing whether some of the conditionality and extra labelling attributes, e.g. ‘question text’, that you talked about can just be added to existing RM classes, e.g. ELEMENT? I think that would have much more utility. Quite a lot of what is currently modelled in FHIR Qs (fed up with mis-spelling questionnaire), we already model, IMV correctly, as Observations - Apgar, GCS, scales, scores.

To be honest, I’m not bothered about the RM class - OBSERVATION is fine.

The more ENTRY classes we add, the more semantic subtlety becomes a problem - why is this an Observation and that a Q?

I doubt there is such a thing as a standard set of questions - that’s what makes them a pain (though as Silje has indicated, there are some helpful partially reusable patterns). And if none of the existing patterns fit, then just build another archetype. If the ‘extra features’, esp. conditionality, were included as part of perhaps ELEMENT, as far as I can see the transform of an Observation archetype of a ‘dumb questionnaire’ to a FHIR Q would be pretty easy (and similarly for run-time instances).

The place where we do see standard questions as part of FHIR Qs is things like Apgar, GCS or some other scores like EORTC, which has question banks, variably used in different scores. I can see a place for carrying those standard sets outside archetypes, almost like reference terminologies, which is I think along the lines you are suggesting, but in our world these would definitely sit inside OBSERVATIONs, not Qs (IMO!!).

Well, you wouldn’t do that - FHIR has had to custom-model all the meta-data that we have automatically, for both the artefact as a whole (Authored Resource) and for the interior node constraints, which come for free with archetypes. There appear to be a couple of substantively useful fields we’d need, which is one reason to consider a custom RM type.

Agree on the scales/scores. These are not questionnaires, they are structured Observations.

Conditionality we handle via archetype rules (I am considering ways to make that a lot nicer as part of the TP/GDL work).

We are talking slightly at cross purposes here. Funnily enough, the semantics are important and not all questionnaires are equal, after all.

Use case 1
Consider this form -$File/Adult-Health-Check-Proforma.pdf
It is an Aboriginal and Torres Strait Islander Health Assessment and currently the use case for the CSIRO project here in Australia.

  • The content designers perceive it as a questionnaire due to the layout and how they anticipate that clinicians will fill it out.
  • I have modelled it as an openEHR template, as a master data set for all ages, representing the clinical concepts as they should be recorded for persistence within a clinical system. Please note that it does contain a few of the new screening questionnaires, where there is screening with yes/no answers.
  • The FHIR team are currently building a demo output using the FHIR questionnaire resource, which is how it is intended to be implemented as a standalone app, leading clinicians through the process of filling it out and presenting conditional content as required.

This is a really common thing in health IT - but is it a questionnaire? I differentiate between the content, which is largely not a questionnaire (except where I’ve modelled it explicitly as such), and the process of presentation in UI and workflow. In the latter situation, the FHIR questionnaire resource seems to fit very well, but remember that it is a clinical-content-free zone. Should we try to replicate this in openEHR? Perhaps it would be useful to capture the essence of the UI/process/workflow etc. here. But content-free archetypes? A ‘one size fits all’ abstract pattern will result in ‘one size fits none’ in reality. (We did try, and failed, many years ago.) Where all content needs to be added at template level? This provides very little benefit, even if we start to extensively share templates - adding all semantics at template level has to be problematic in terms of sharing, interoperability and clear meaning and intent.

Use case 2
On the other hand, there are many scores and scales for assessment of a single, simple concept, all well defined and understood, which can be accurately represented as repeatable, point-in-time OBSERVATIONs. That approach works very well. These are the PREMs & PROMs, the Apgars and Glasgow Coma Scales.

We need to discern whether we are talking about a group of questions focused on a single concept, or a data set that uses a questionnaire format, UI or workflow to collect the answers. Then there is the messy grey zone in between - EORTC and PROMIS are perfect examples of mix-and-match sub-questionnaires (or reusable question banks) within a variety of more complex questionnaires - and we are currently approaching these by modelling each sub-questionnaire as a separate CLUSTER archetype that fits within the parent OBSERVATION.

For @thomas.beale’s initial example, there are essentially two choices:

  • copy and paste it into a single archetype, but that removes any simple opportunity to reuse the data once it has been entered into the clinical system; OR
  • the data needs to be teased out carefully and modelled using existing archetypes and the ‘questionnaireness’ of it needs to be managed in the UI/workflow etc, no matter what the implementation. This is similar to Use case #1, above.
    For example: pain and swelling in the joints could be captured using the Symptom/screening archetype - present/true is the only option allowed for each joint, plus the same archetype can be used to record ‘no joints with pain or swelling’ at the top level. See as an example.
    The question about joint stiffness on waking could be managed using the same archetype, with the history event carrying the semantics of ‘during the past week’.

Honestly, the other side of it is that if I were to create that questionnaire, I’d do it differently if I were considering how data would be recorded for posterity in clinical systems, especially if I were considering reuse. But the reality is that most questionnaires at the moment are created in a vacuum, for a local reason, with local thinking. And the consequences are the messy clinical requirements commonly found all over. Sometimes you just have to cut your losses and create an exact copy, because you can’t keep wrangling bad design.

One day, assessments like Thomas’ JAMAR form or my assessment example will be designed with a standardised clinical knowledge foundation and we’ll be in some kind of data nirvana. One day… :pleading_face:

We definitely don’t need another class of archetype for this, please :woozy_face:.


How about an approach that is exactly the reverse of templating: start from a questionnaire (e.g. PROMs) and then work your way into linking to appropriate archetypes (references to nodes) and terminology - we could call these links ‘archetype bindings’. The questionnaire itself will have certain openEHR constraints in terms of RM data structures/types and also standard cardinality, so it is an improvement by itself. This way, clinicians / non-techy people could easily get started with creating/managing content, while a smaller expert group could do the binding work. I’d be interested in the community’s thoughts. I honestly think taking a 100-question PROMs or PREMs and trying to create each and every section and question via an appropriate archetype, and then doing very complex templating, will NOT work for end users, or even most of the openEHR community.
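The ‘reverse templating’ idea could be sketched as a simple lookup from the questionnaire’s own item identifiers to optional archetype-node and terminology bindings, added after the fact by an expert group. All identifiers, paths and codes below are illustrative placeholders, not real bindings.

```python
# Sketch of 'archetype bindings': questionnaire items exist first, and
# bindings to archetype nodes / terminology are attached afterwards.
# All ids, paths and codes are illustrative placeholders.
archetype_bindings = {
    # questionnaire item id -> (archetype id, node path within it)
    "q_systolic": ("openEHR-EHR-OBSERVATION.blood_pressure.v2",
                   "/data/events/data/items[systolic]"),  # illustrative path
}
terminology_bindings = {
    "q_joint_pain": "SNOMED-CT::joint-pain",  # placeholder, not a verified code
}

def bindings_for(link_id: str):
    """Return (archetype binding, terminology binding) for an item; None where absent."""
    return (archetype_bindings.get(link_id),
            terminology_bindings.get(link_id))
```

An unbound item simply returns `(None, None)`, which is the point: the questionnaire is usable immediately, and semantic linkage can grow incrementally.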