I will offer both some agreement and a counter-opinion. Everyone always wants things ‘simpler’, and without foresight, it is very easy to make them too simple. Let’s take the two cases: ITEM_STRUCTURE data structures, and Observation.data (of type HISTORY). It turns out that the discipline in structures we thought existed in clinical data - Lists, Tables, Trees etc. - is less clear in reality, and apparently clinical modellers want to just make everything a tree-like structure. I personally am still somewhat sceptical that this is really true, because for the many tabular, particularly bilateral, information structures (reflexes, audiogram, much ophthalmology, and so on), throwing out the ability to define a Table structure in the data simply means that tabular structures may get built in different ways by different groups. Internationally standard archetypes can partly remove this problem, but software still has the job of working out how to correctly put the data on the screen and also capture it, in ways familiar to clinical people. But let’s assume that, all things considered, this simplification were to prove useful, and the consequential downstream costs were acceptable. We could in that case make some small simplification of the current models.
Now consider the counter-examples: Observation.data, and the structure of Action and Instruction. Let’s stick with the first one for the moment. As mentioned in previous posts, we chose to define a History data structure in the RM because it is absolutely ubiquitous in observational data, no matter whether it is one or many samples, and no matter what part of medicine or science it comes from. Here are some paths from the Apgar archetype - you can see where they contain ‘events[1 minute]’, ‘events[5 minute]’ etc.
– note that everything below shows English language translations of terms
/data: HISTORY
/data/origin: DV_DATE_TIME
/data/duration: DV_DURATION
/data/period: DV_DURATION
/data/events[1 minute]: EVENT
/data/events[1 minute]/offset: DV_DURATION
/data/events[1 minute]/data/items[Heart Rate]/value: DV_ORDINAL
/data/events[1 minute]/data/items[Respiratory effort]: ELEMENT
/data/events[1 minute]/data/items[Respiratory effort]/value: DV_ORDINAL
/data/events[1 minute]/data/items[Muscle tone]/value: DV_ORDINAL
…
/data/events[2 minute]/data/items[Heart Rate]/value: DV_ORDINAL
…
/data/events[5 minute]/data/items[Heart Rate]/value: DV_ORDINAL
For an archetype containing Interval_events, e.g. any rolling BPs, temperature etc, you get paths like:
/data/events[any event]/offset: DV_DURATION
/data/events[any event]/width: DV_DURATION
/data/events[any event]/sample_count: Integer
/data/events[any event]/math_function: DV_CODED_TEXT – e.g. value could be a code for ‘average’
These paths correspond to formal data structures, as you can see here. Have a close look at these classes. Everything here can be archetyped. The INTERVAL_EVENT class, for example, provides attributes for sample width, sample_count (allowing data compression from ICU and other devices), and math_function. In all there are around a dozen formally specified attributes here that would not exist in a simple Entry.data: Cluster model. That’s not just 12 things that archetype specifiers have to invent, but also 12 things that implementers have to invent - i.e. find names for, decide on types for, and structure (hint: there are not just 12 ways of doing this, but hundreds of millions).
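To make the point concrete, here is a minimal Python sketch of what a typed history structure buys you. It is not the openEHR RM itself: the class names `History`, `Event` and `IntervalEvent`, the helper `event_time`, and the use of plain Python types in place of DV_DURATION and ITEM_STRUCTURE are all assumptions made for illustration. Only the attribute names (origin, period, duration, offset, width, sample_count, math_function) are taken from the paths above, and the sample values are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional


@dataclass
class Event:
    name: str              # archetype node name, e.g. "1 minute"
    offset: timedelta      # always relative to History.origin
    data: Dict[str, Any]   # stand-in for an ITEM_STRUCTURE


@dataclass
class IntervalEvent(Event):
    width: timedelta                    # length of the interval summarised
    math_function: str                  # e.g. a code meaning "average"
    sample_count: Optional[int] = None  # number of raw samples summarised


@dataclass
class History:
    origin: datetime
    events: List[Event] = field(default_factory=list)
    period: Optional[timedelta] = None
    duration: Optional[timedelta] = None

    def event_time(self, name: str) -> datetime:
        """Absolute time of a named event, derived from origin + offset."""
        for event in self.events:
            if event.name == name:
                return self.origin + event.offset
        raise KeyError(name)


# Illustrative data only.
apgar = History(
    origin=datetime(2020, 1, 1, 10, 0),
    events=[
        Event("1 minute", timedelta(minutes=1), {"Heart Rate": 2}),
        Event("5 minute", timedelta(minutes=5), {"Heart Rate": 2}),
    ],
)

bp = IntervalEvent(
    name="any event",
    offset=timedelta(hours=4),
    data={"diastolic": 80.0},
    width=timedelta(hours=4),
    math_function="average",
    sample_count=48,
)
```

Because every producer uses the same attribute names and types, a helper such as `apgar.event_time("5 minute")` can be written once and reused on any observation, rather than once per modeller.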
History tells us that organisations in the latter group (implementers) will do this differently every time. Note that it took probably 4 years to get this model into its current form, where it works for everything we can throw at it now. I have to ask - why go back through that pain? Why go to a model that is too simple, only to discover too late a whole raft of problems that the current model already takes away? If anyone doubts this, just have a go at remodelling 20 or so of the most common Observation archetypes you can find on CKM in the 13606 model. You can try this in the existing openEHR tools by using either an Evaluation structure (which does happen to be a tree) or the openEHR Generic_entry type. Any such archetype can be validated and viewed in the ADL Workbench or Archetype Editor.
Let’s see what would happen to the paths and types if you do this:
Fred’s version of Apgar based on Cluster/Element structures:
/data: CLUSTER
/data/items: CLUSTER
/data/items[origin]: ELEMENT
/data/items[origin]/value: DV_DATE_TIME
/data/items[duration] – not defined by Fred
/data/items[period]/value: Real – oops - Fred prefers ‘Real’ here
/data/items[1 minute]: CLUSTER
/data/items[1 minute]/time: DV_DATE_TIME – oh no… Fred wants absolute time points, not offsets
/data/items[1 minute]/items: CLUSTER
/data/items[1 minute]/items/items[Heart Rate]/value: Integer – Hm. Fred just wants to use an Integer here
/data/items[2 minute]/items/items[Muscle Tone]/value: Integer – Hm. Fred just wants to use an Integer here
Jane’s version of Apgar based on Cluster/Element structures:
/data: CLUSTER
/data/items: CLUSTER
/data/items[origin]: CLUSTER – oh no, Jane wants to use a CLUSTER here!
/data/items[origin]/value: DV_DATE_TIME
/data/items[duration]: Real – Jane decided on a Real here
/data/items[period]/value – not defined by Jane
/data/items[Any time]: CLUSTER – Jane is not going to control the times at all
/data/items[Any time]/timepoint: ELEMENT – Jane records the time in this new attribute
/data/items[Any time]/items: CLUSTER
/data/items[Any time]/items/items[Heart Rate]/value
Bob’s version of 24h BP based on Cluster/Element structures:
/data/items[4 hour]/items[diastolic]/value: Real – Bob has collapsed the path here; Fred’s software that understood Fred’s structure of logical ‘history of events’ data now won’t work on Bob’s data.
…
X 1,000 other clinical modellers around the world
X 1,000 other archetypes
X 10,000 software developers around the world.
= 10 billion more incompatible combinations than we have today
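The interoperability cost can be shown with a small Python sketch. Everything in it is an assumption made for illustration: the nested-dict representation of the data, the `resolve` helper, and the sample values are not part of any openEHR tool. It shows that a query written against the History-based path finds nothing in Fred’s Cluster/Element data, and vice versa.

```python
import re
from typing import Any, Optional

# One path segment: an attribute name with an optional [node name] predicate.
_SEGMENT = re.compile(r"([^/\[\]]+)(?:\[([^\]]+)\])?")


def resolve(node: Any, path: str) -> Optional[Any]:
    """Walk a nested dict along a path like '/data/events[1 minute]/...'.

    Returns None as soon as any segment is missing.
    """
    for segment in path.strip("/").split("/"):
        match = _SEGMENT.fullmatch(segment)
        if match is None or not isinstance(node, dict):
            return None
        attribute, name = match.groups()
        node = node.get(attribute)
        if name is not None:
            if not isinstance(node, dict):
                return None
            node = node.get(name)
        if node is None:
            return None
    return node


# Illustrative data only: the same Apgar heart-rate score, stored two ways.
history_based = {
    "data": {"events": {"1 minute": {
        "data": {"items": {"Heart Rate": {"value": 2}}}}}}
}
fred = {
    "data": {"items": {"1 minute": {
        "items": {"items": {"Heart Rate": {"value": 2}}}}}}
}

HISTORY_QUERY = "/data/events[1 minute]/data/items[Heart Rate]/value"
FRED_QUERY = "/data/items[1 minute]/items/items[Heart Rate]/value"

found_in_history = resolve(history_based, HISTORY_QUERY)  # the score is found
found_in_fred = resolve(fred, HISTORY_QUERY)              # nothing is found
```

Each additional ad hoc layout needs its own query, and every consumer has to carry all of them.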
Now consider the internal structures of Instruction and Action (here). Consider just the simple fact that Action is a conjunction of ‘time’ (would anyone argue against this?), ‘description’ (ditto), and various other details to do with the Instruction state machine. I am not saying that this model is perfectly correct; maybe it needs an ‘other_details’ attribute as is used elsewhere in openEHR. But again, just do the thought experiment of having nothing but a Cluster/Element structure to work with. You immediately have to a) work out a model of where you will record time and description, b) if you want it to be interoperable, get into some kind of standards discussion with other people to agree on the same structure, and c) complicate all the tooling software and also downstream software in order for it to ‘know’ that this particular Cluster/Element structure is really an Action structure, and to display it as such. The downstream complications will be quite significant. It is the difference between giving a software developer UML pictures like the above, or … a completely generic tree structure, which is as good as giving them nothing.
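A rough Python sketch of point c), under stated assumptions: the `Action` class here carries only the two attributes named above (time and description) and is not the full openEHR class; `display`, `guess_action` and the key names in `TIME_KEYS` are hypothetical, invented to stand in for whatever local conventions each group would adopt.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class Action:
    time: datetime
    description: Dict[str, Any]


def display(entry: Any) -> str:
    """With a typed model, software recognises an Action by its type."""
    if isinstance(entry, Action):
        return f"Action at {entry.time.isoformat()}: {entry.description}"
    return f"Unrecognised structure: {entry!r}"


# Hypothetical naming conventions a generic-tree consumer must know about.
TIME_KEYS = ("time", "timepoint", "performed_at")


def guess_action(tree: Dict[str, Any]) -> Optional[Action]:
    """With a generic tree, software can only guess from agreed key names."""
    items = tree.get("items", {})
    for key in TIME_KEYS:
        if key in items and "description" in items:
            return Action(time=items[key], description=items["description"])
    return None  # unknown convention: cannot be treated as an Action


when = datetime(2020, 1, 1, 9, 0)
typed = Action(time=when, description={"procedure": "dressing change"})

# Two generic trees carrying the same information under different names.
tree_a = {"items": {"timepoint": when,
                    "description": {"procedure": "dressing change"}}}
tree_b = {"items": {"when": when,
                    "details": {"procedure": "dressing change"}}}
```

Here `guess_action(tree_a)` recovers the Action only because its key names happen to be in the list, while `guess_action(tree_b)` returns `None`. The typed version needs no such list.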
In summary: there is a strong argument in my view for avoiding all this modelling and downstream complexity. A challenge to the health informatics community is to actually show how it will be better for modelling and software to just go to a simple Cluster/Element model (or in HL7 terms, Act/Act-relationship). If it can be done and save time and trouble compared to what we have done in openEHR, I will be the first to embrace it.
Note: flatter data paths are aesthetically pleasing to humans, but computers don’t care about them. They care about consistent path structures.
