Trying to understand the openEHR Information Model

I just spent quite a few profitable hours today with ehr_im.pdf, which appears to be the main resource for understanding the “Information Model” or “Reference Model,” available for download from the CKM web site.

Overall, it’s a very well-written document that anyone trying to design or implement any sort of EHR system should read. I’m left with a few questions about instantiation, isolation, persistence, querying, and the impact of changes on stored content. I hate to take up anyone’s valuable time answering my questions, so maybe all I need are some more references.

I’ll first explain what I think I understand of how it all works.

From what I can see, the entire system consists of a hierarchy of classes. Some, like EHR, Composition, Instruction, Observation, Evaluation and Action, are defined as part of the reference model, while the archetypes, which are not part of the reference model, all inherit from one of these RM classes. There are other RM classes (like Entry, Navigation, Folder, Data_Structure, etc.) that are also part of the RM and are properties of the archetypes. EHR is the base class, containing, by reference, all the others. Navigational information inside the composition archetypes is apparently critical. The Composition type is the basic container for all other archetypes that might be used within a single “contribution.” And templates specify which archetypes will exist in the composition types and in what arrangement. All of this seems quite clear.

Several things would seem to follow from all this:

To access even the smallest detail from the overall record, the software would need to request the entire record from the server, presumably in the form of a binary stream, deserialize it all, and then instantiate everything from the EHR class on down. It is somewhat analogous to loading a document of some sort, something you load into memory in its entirety before you can read anything from it. Am I mistaken here? Or is there a way to instantiate small pieces of it? That, it seems, would depend on the level at which serialization occurs: whether it is serialized in one big blob (or XML document) or in smaller units.

If it is all in one piece, how do you manage isolation? Can only one user “check out” the record at one time? Or does it work something like source control systems like SVN, where different people can commit to a common project, merge differences, etc.? Once you obtain the binary stream from the server, you know nothing, from then on, of changes others might also be making.

It would also seem to follow that when you want to save your work (say you added some composition) you would serialize the entire record–which may contain years of information–and send it to the server as a fresh new document, completely replacing the old one, which, presumably, would be moved to some “past version” archive. Correct? If so, how do you cope with your storage requirements roughly doubling with every tiny addition to the record? I’m probably way off here; you’ve probably got an elegant answer to this, namely, some sort of segmented storage, with each composition persisted in its own little blob??

You have event classes and you have persistent classes, well described in the pdf. A persistent class would be something like a current drug list. Following on with my understanding, it would seem that any change to this list via a new composition submission, would effectively create an entirely new copy of the list, embracing any changes, however slight. Would the old one then be archived in the now-obsolete former EHR record?

How, in all this, would querying work? Would the server itself have to deserialize and instantiate hundreds or thousands of complete EHR records in order to search within them? I understand that you do have some path information persisted outside the EHR blob, giving you some idea what is inside of what, but that would still not eliminate the need to do a server-side deserialization and instantiation in order to read specific information pointed to by the externally-stored paths. Or so I would think. If I’m right, how fast are your queries and what sort of hardware does it take to run them?

Your documentation is clear that navigation information within individual compositions is consulted in queries. That would seem to require server-side instantiation, and then, subsequent to that, probing the internal items piece by piece.

Thank you very much for any help!

Randy Neall

With this as the structure, accessing the record would mean instantiating the whole mass of information, the complete record.

I suppose there would be nothing to stop some inventive engineers from persisting everything, down to the lowest-level element, in a regular relational database, but I suspect that would be a kluge. This is, after all, a system of classes.

Hi!

Good questions! Many of the questions regarding versioning etc. are explained in chapter 6 of
http://www.openehr.org/releases/1.0.2/architecture/rm/common_im.pdf

I’ll briefly address some questions and hope others have time for the rest and more details.

From what I can see, the entire system consists of a hierarchy of classes, some, like the EHR, Composition, Instruction, Observation, Evaluation and Action are defined as part of the reference model while others, the archetypes, which are not part of the reference model, all inherit from one of these RM classes.

This is one of the parts that openEHR-learners often find tricky.

Archetypes do not “inherit” from the RM classes in the ordinary object-oriented sense of the word inherit.

The archetypes can be seen as a list of external validation rules, names, etc., describing how to pick, name and combine pieces from the RM for a specific clinical purpose. (To “name” a piece here refers to setting a value of a specific attribute of the object, not changing the RM class name.) The serialized EHR data for a patient only contains RM objects that in turn contain references to the archetypes that were used for naming and validating this particular combination of RM objects.

I don’t know if that simplified explanation helps. It might be a start.
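To make that concrete, here is a deliberately simplified sketch (the class name, rule table and node ids like “at0004” are illustrative, not taken from any real archetype or the actual RM): the RM object carries a *reference* into the archetype, and the archetype is just an external rule set used for validation, not a superclass.

```python
from dataclasses import dataclass, field

@dataclass
class RmObject:
    """Stands in for any RM class (CLUSTER, ELEMENT, ...)."""
    rm_type: str                     # e.g. "ELEMENT"
    name: str                        # runtime name, set per the archetype
    archetype_node_id: str           # e.g. "at0004" -- a pointer INTO the archetype
    children: list = field(default_factory=list)

# The "archetype" here is only a rule table: node id -> (required RM type, name).
# It is consulted for validation; nothing inherits from it.
BLOOD_PRESSURE_RULES = {
    "at0004": ("ELEMENT", "Systolic"),
    "at0005": ("ELEMENT", "Diastolic"),
}

def validate(node: RmObject, rules: dict) -> bool:
    """Check that an RM object conforms to the archetype's constraints."""
    expected = rules.get(node.archetype_node_id)
    if expected is None:
        return False
    rm_type, name = expected
    return node.rm_type == rm_type and node.name == name

systolic = RmObject("ELEMENT", "Systolic", "at0004")
assert validate(systolic, BLOOD_PRESSURE_RULES)
```

The stored EHR data would contain only `RmObject` instances plus the `archetype_node_id` references; the rule table lives outside the record.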

To access even the smallest detail from the overall record, the software would need to request the entire record from the server, presumably in the form of a binary stream, deserialize it all, and then instantiate everything from the EHR class on down. It is somewhat analogous to loading a document of some sort, something you load into memory in its entirety before you can read anything from it. Am I mistaken here? Or is there a way to instantiate small pieces of it?

I think most implementations work with pieces at the level of VERSIONs of VERSIONED_OBJECTs (for example versioned compositions), or smaller, when storing and querying data. See the previously linked common_im.pdf.
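As a rough sketch of that idea (names are hypothetical, not the real RM classes): the EHR can be held as an index of versioned compositions, each stored and retrieved on its own, so reading one composition never requires deserializing the whole record.

```python
class VersionedComposition:
    """Toy stand-in for a VERSIONED_OBJECT: an append-only list of versions."""

    def __init__(self):
        self.versions = []           # each entry is one serialized version

    def commit(self, data):
        self.versions.append(data)
        return len(self.versions)    # version number of the new version

    def latest(self):
        return self.versions[-1]

# The "EHR" is just an index from composition uid to its version container.
ehr = {"meds-list": VersionedComposition()}
ehr["meds-list"].commit({"items": ["aspirin"]})
ehr["meds-list"].commit({"items": ["aspirin", "metformin"]})

# Only this one small object is touched; the rest of the EHR stays on disk.
assert ehr["meds-list"].latest() == {"items": ["aspirin", "metformin"]}
```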

Or does it work something like source control systems like SVN, where different people can commit to a common project, merge differences, etc?

Very much like a distributed version control system, for example Git.

You have event classes and you have persistent classes, well described in the pdf. A persistent class would be something like a current drug list.

Actually they are instantiations of the same COMPOSITION class, just with different values for one of the attributes.
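A tiny sketch of that point (attribute and value names here are illustrative; see the RM spec for the actual attribute): the “event” and “persistent” compositions are instances of one and the same class, distinguished only by an attribute value.

```python
from dataclasses import dataclass

@dataclass
class Composition:
    name: str
    category: str    # e.g. "event" or "persistent" -- the only difference

visit = Composition("GP encounter", category="event")
drug_list = Composition("Current medications", category="persistent")

# Same class, different attribute value -- no subclassing involved.
assert type(visit) is type(drug_list)
```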

Best regards,
Erik Sundvall
erik.sundvall@liu.se http://www.imt.liu.se/~erisu/ Tel: +46-13-286733

Hi Randy,

It’s a standard operation to query for and obtain objects of all sizes, from whole Compositions (the largest contiguous objects in an openEHR system) to an Element or even just the Quantity inside the Element. The wiki seems to be down right now, but there is a specification of the return structures for querying describing this.

The update logic is Composition-level, and you can’t commit something smaller than a Composition. The default logic is ‘optimistic’, meaning that there is no locking per se; instead, each request for a Composition includes the version (in meta-data not visible to the data author), and an attempt to write back a new version of a Composition will cause a check between the current top version and the ‘current version’ recorded for the Composition when it was retrieved. If they are identical, the write will succeed. There is also branching supported in the specification. Read the specification for the details.

Not correct :wink: The EHR is a virtual information object, and has no containment relationship to the Compositions or other items it includes.

Yep, that’s much closer. Actually the spec doesn’t say how the new version is stored, only that its logical contents have to be the current full contents of the medications list (or whatever). Vendors could implement differential version representation if they want to.

No, the basic approach is:

Well, that’s often true, depending on the query content, but you can probably guess from the above that the amount of deserialisation is actually quite limited and completely manageable. It’s not even the bottleneck - the bottleneck is almost always web services, and XML serialise / deserialise.

Hope this clarifies.

- t
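The optimistic check described in Thomas’s reply can be sketched as follows (a minimal illustration, not any real openEHR API; class and method names are invented): a writer presents the version it originally read, and the commit succeeds only if that version is still the head.

```python
class VersionStore:
    """Toy optimistic-concurrency store for one versioned composition."""

    def __init__(self):
        self.head = 0        # version number of the current top version
        self.data = None

    def read(self):
        # The reader gets the data plus the version it is based on.
        return self.head, self.data

    def commit(self, based_on_version, new_data):
        # Reject the write if someone else committed since our read.
        if based_on_version != self.head:
            raise RuntimeError("version conflict: record changed since read")
        self.head += 1
        self.data = new_data
        return self.head

store = VersionStore()
v, _ = store.read()
store.commit(v, "new composition")     # succeeds: head unchanged since read

try:
    store.commit(v, "clashing edit")   # stale snapshot: head has moved on
except RuntimeError:
    pass                               # conflict detected, no silent overwrite
```

No locks are held between read and write; the clash is simply detected at commit time and handed back to the caller to resolve.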

I had meant to say here: this makes sense for EHR and similar systems because there is very low / no write competition for the same piece of the same patient record, as a general rule. - thomas

this makes sense for EHR and similar systems because there is very low / no write competition for the same piece of the same patient record

Can you give an example of parts of records which are at big risk for competitive updates?

Thanks
Bert.

“big risk” - it’s a combination of how likely it is, and how bad it is if they are.

Generally, current location, current medication lists, and summary lists are things where contention can happen. Quite often, I’ve seen a cascade of things happen on a patient simultaneously as multiple people focus on the patient.

The other place where contention is a problem I’ve experienced has been pathology reports that are not complete - in a busy lab doing 2000 reports/day, I observed editing contention 10-20x a day on average. That’s pretty low, but the consequences of a clash… bad…

Grahame

In the lab, are those updates, or new records?

How do you deal with long-running transactions? Suppose a lab edits a dataset and saves it in an archetype model which will be used to store more items. Then the lab employee does the following test and saves it. Should it be saved in the same dataset, or in a new version? I don’t think you should have long-lasting transactions, lasting longer than one millisecond. Maybe in a lab there should be a client/GUI which stores/caches locally until the results are complete. So it depends on the EHR system and archetypes.

And in the current medication list, how big is the chance that two edits are done simultaneously? And is it also a GUI question, to refresh the screen once in a while, so that the chance of a care professional looking at the really current medications increases from 99,99% to 99,999%? There can always be a late update from a pharmacy; mostly they update the system at the moment of providing medication, but it can always go wrong - power failures, etc. Those screen refreshes are also a GUI thing. But, of course, it must be prevented.

But I think you will agree that there is no need for fancy isolation schemes. The most basic one will do. And transactions should never take longer than a minimum of time, say one millisecond. Not everything needs to be resolved by an openEHR kernel. Some things are really a GUI matter.

But I am interested in arguments.

Bert

that's very interesting. I don't think we've seen anything like that - not that I doubt what you are saying here. It would be very interesting to know in what circumstances competitive updates to Rx and Dx list for a patient occur. Smart systems might track such things and turn on pessimistic (i.e. locking-based) versioning.

- thomas

I thought about this a few years ago and came to the conclusion that the GUI/client would need quite a bit of savvy HCI. The person working on the data needs to be kept informed of how/when the system may be changing under him.

Google Docs has now come along and does something like that. You're busy editing one section of an article, then a networked colleague begins to edit the same thing. GDocs tells you who it is and how to communicate with them by a secondary channel (the EHR would be the primary channel). You can both still keep editing, but at least you know you are going to have to double-check the result afterwards. Conflict resolution is best avoided by timely human intervention rather than automated attempts afterwards.

And GDocs does well even when clients go offline for a short time.

my 2cents

Gavin Brelstaff - CRS4

Hi Thomas,

I can certainly see a situation where e.g A medication order was
issued and the medication administered within a short time period,
requiring dynamic persistent medication summary updates (with
references/links to the original Entries in event Compositions) where
a lazy commit could cause an issue. A problem summary list collision
is less likely but possible e.g. where an EHR is fully
problem-oriented and a patient sees the GP, then visits a practice
nurse, without the GP record being committed first.

Ian

Grahame - can you elucidate on this? Are you saying that you have seen multiple parallel committers trying to update the same lab report (same patient, order etc) at the same time? The only way I can imagine this is if multiple specialist lab systems contribute to a common overall report (i.e. some kind of order grouping). In this case, there is unavoidably logic to do with how the pieces get stitched together anyway, so I am not sure how contention errors could arise.

- thomas

You’ve all been very helpful and clear in responding to my questions.

What I’ve learned is that the basic unit of storage–and retrieval–is a single composition, nothing bigger, nothing smaller, and certainly not the complete roster of compositions as I had thought (based on my mistaken notion that one could not easily serialize only a section of a complex instantiated object tree). That resolves a lot of my concerns.

However, this has taken the conversation into an interesting area, namely, those types of compositions that contain what you call “persistent” information, such as drug lists, problem lists, family history and so on, where subsequent compositions must modify the states of earlier compositions and where, as a result, subsequent compositions must embody and repeat much of what is contained in prior compositions. The same issue, I would think, would also arise in your workflow situations (observation / instruction / action), where, again, subsequent compositions–often in hierarchical relation to yet prior compositions–must modify the states of items in prior compositions. Once again, since everything–and its hierarchical context–is immutable and cannot be modified in place, you have to reproduce that entire context in each composition that modifies the state of–or has a dependency on–something, however small, in a prior composition. And to compound the issue even more, these subsequent compositions, whose contents address prior compositions, might also contain “event” as opposed to “persistent” information. So, as items in prior compositions undergo state changes, it is not a simple matter of apples-to-apples substitutions as you replace them with new versions, because both prior and subsequent compositions could also contain “event” information. So maybe the versioning process actually splits compositions, declaring only pieces of them obsolete.

Obviously, you’ve all found ways to make this work, perhaps elegantly, but, as some are suggesting, at the very least this would enlarge the amount and scope of information involved in a single commit, thus inviting contention. I see some real complexity here. I’ll have to read more about how versioning works, using the references you have provided me. I did look at the common_im.pdf Erik referenced, and versioning, from my brief exposure to it in this PDF, is obviously one of the most complex aspects of the openEHR specification, as well it would be.

An openEHR record, as I’m coming to understand it, is basically an indexed collection of very sophisticated “documents” analogous to PDFs, which, like ordinary documents, are persisted as single digital streams that can be hashed and signed, and that must also be deserialized and parsed. That seems good for adding stand-alone new information to the collection, but somewhat more complex for new information with a distinct dependence on stuff in prior “documents.” This, of course, forbids in-place editing of state; a new document must be issued to change the old and–necessarily–embody the old, even what has not changed, in the process.

A completely different approach would entail saving everything in conventional RDBMS tables and columns or object databases, and allow in-place modifications–if one could solve the problem of preserving prior states of entire aggregations of data, signing and attestation, to say nothing of the problem of sparse data schemas. I can see why you’ve gone the way you have, but in so doing, you have your own set of challenges. But who doesn’t?

Randy

Hi Thomas,

I can certainly see a situation where e.g A medication order was
issued and the medication administered within a short time period,

well, 'short' here probably means at least minutes... that's 'long' in computing terms.

requiring dynamic persistent medication summary updates (with
references/links to the original Entries in event Compositions) where
a lazy commit could cause an issue.

lazy commits (i.e. due to caching) are a different (and real) issue. Proper cache management should avoid them.

  A problem summary list collision
is less likely but possible e.g. where an EHR is fully
problem-oriented and a patient sees the GP, then visits a practice
nurse, without the GP record being committed first.

yes, that's certainly a possibility, if the practice solution isn't designed to deal with it, and the staff are not trained...

- thomas

Yes, in the lab situation we typically saw this multiple times a day - multiple people trying to update the same cluster of records at the same time. So the scenario is a typical relational database: a cluster of related records, some information in fields, and some in blobs as structured text. Someone would start editing that cluster in a GUI, and then either someone else or a machine would also want to perform some operation that caused updates to some portion of the same cluster of records. A user might spend several minutes editing the record - or even several hours, particularly if they get distracted by phone calls, and it’s a complex report like an autopsy, for instance.

So you can’t afford to do this as database transactions, but you can’t afford to do version-based merging either, or to lose either the previously committed information or the newly committed information - and the users managing this are not abstract thinkers with the time to figure out the clash. And losing good clinical information due to bad IT - the users are particularly intolerant of this. And as I said, it happened much more often than you’d expect. I spent a couple of years refining the Kestral system for managing this issue.

I haven’t seen the same against current lists in an EHR - just that they are updated continually. I’ve no reason to think that the in principle issue is different, though the frequency might be.

To Randy’s point - managing concurrency is a real issue. Period.

Grahame

In the Netherlands there is, what we call, the "door-handle-patient". At the moment he is leaving the room, and is busy opening the door, he tells what he is really worried about.
"I am afraid it is cancer, doc"
The GP asks the patient to sit down for an extra minute and explains why he thinks it is not cancer, or he makes another appointment because he thinks the patient has a point.
So a GP at latest should commit after the door is closed and the patient has definitely gone and just before the new patient enters.
At the moment a patient arrives again at the nurse's or assistant's desk, the dossier should be fully up to date, or it should be recognizable that it is not up to date, and then the nurse has to wait until the lock is released.

The kernel can support this best by pessimistic locking.

The GUI should support the workflow in a medical practice, by using the kernels locking-features.

Bert

But what if every user, nurses or GP create a new composition, when they do an addition. Then there is nothing lost.

Bert

>>patient sees the GP, then visits a practice
>>nurse, without the GP record being committed first.
>
>yes, that's certainly a possibility, if the practice solution isn't
>designed to deal with it, and the staff are not trained...

In the Netherlands there is, what we call, the "door-handle-patient".
At the moment he is leaving the room, and is busy opening the door,
he tells what he is really worried about.

That's standard GP land.

The GP asks the patient to sit down for an extra minute and explains
why he thinks it is not cancer, or he makes another appointment
because he thinks the patient has a point..
So a GP at latest should commit after the door is closed and the
patient has definitely gone and just before the new patient enters.

For one thing that moment (the patient being "gone for
good") never comes in reality.

However, there's no need to define such a moment in time.
The GP writes into the EMR whatever is known at any point
during the consultation. Yes, that will be subject to
editing, deleting, amending, but that's normal !

The nurse (that is, any other workplace of the GPs network)
will see whatever has been committed. Whenever something is
committed a change notification is pushed out by the storage
engine and clients can update themselves if relevant (that's
how GNUmed does it). This, of course, does not yet solve the
conflict of the user editing something that's just being
changed but at least there's no chance to not be aware of
it.

At the moment a patient arrives again at the nurses or
assistants-desk, the dossier should be fully up to date, or it should
be recognizable that it is not up to date

In reality "fully up to date" never happens. It is always
the current state of affairs.

and then the nurse has to wait until the lock is released.

Ah, no, it doesn't make a difference whether the nurse waits
for a lock to be released or not - because even if the GP
released the lock the nurse has no way of knowing whether
the GP committed everything (instructions) needing
committing or whether the GP forgot something. That can only
be assured by out-of-band means, say, the patient knowing
what the nurse needs to do for him (or GP and patient
agreeing and sending an "action sheet" *before* the patient
leaves the room -- and still that does not prove the GP does
not remember something needing doing after the patient left
the exam room).

It is a problem not solvable by technical means alone.

Karsten

That was, more or less the point I was trying to illustrate.

But technical means should be able to support these kind of situations in the agreed work-flow in that practice.

Bert

Actually, versioning is supported (and routinely used) for all Compositions. Its meaning for non-persistent Compositions is that it is an error correction or update. The only real difference is that there will be many more version updates to persistent Compositions over the lifetime of an EHR than for any other kind of Composition in that EHR.

The change set in openEHR is actually not a single Composition, it’s a set of Composition Versions, which we call a ‘Contribution’. Each such Version can be: a logically new Composition (i.e. a Version 1), a changed Composition (version /= 1) or a logical deletion (managed by the Version lifecycle state marker). So a visit to the GP could result in the following Contribution to the EHR system:

Due to the referencing approach, this is not really a problem. But systems do have to be careful to create references that point to particular versions, or ‘latest version’ (whatever it might be).

I have to say, in the systems I know of, the contention issue is vanishingly small. It’s not to say it will never occur, but it’s not a general problem that I know of.

Luckily no clinical person ever sees this in archetype land, or else they would all go mad :wink: (After shooting the evil spec developers for having the temerity to think they should even see such gears and cogs.)

There is an emerging set of ‘second order’ object definitions that use the URI-based referencing approach in very sophisticated ways to represent things like care plans, medication histories and so on. I can’t point to a spec right now, but they will start to appear.

It’s true, but the gain from avoiding the 3NF modelling approach is that any data complexity can be accommodated - it doesn’t matter if someone comes up with a weird microbiology result structure with 95 nodes of data in some very specific tree structure - the database and query service just keep working.
Referencing, larger logical structures like ‘episodes’, and the update semantics don’t come for free, and require careful design. So I think we have bought into a new area of difficulty, as the price of quite a significant gain over ‘single level’ systems where the class model or ER model encode all the information semantics. We need something to keep us off the streets… - thomas
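The ‘Contribution’ idea described above can be sketched minimally like this (a toy illustration; the uids, field names and lifecycle values are invented, not the real openEHR attribute names): one commit groups several composition versions, mixing new compositions, updates and logical deletions.

```python
# One Contribution = the set of composition versions committed together
# in a single change set (e.g. the result of one GP visit).
contribution = [
    # a logically new Composition -> its version 1
    {"uid": "encounter-note", "version": 1, "lifecycle": "complete"},
    # an update to an existing persistent Composition -> version /= 1
    {"uid": "meds-list",      "version": 7, "lifecycle": "complete"},
    # a logical deletion, recorded via the lifecycle state marker
    {"uid": "old-problem",    "version": 3, "lifecycle": "deleted"},
]

# The whole set is committed atomically; nothing smaller than a
# Composition version, and nothing is ever overwritten in place.
assert all(v["version"] >= 1 for v in contribution)
```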