Dual Model EHR implementation

Dear all,

Within the Virgen del Rocío University Hospital we are analysing how to implement an EHR based on the Dual Model approach. When we analysed direct implementation of a database based on either the openEHR Reference Model or ISO 13606, we found that it could have slow performance. Given that we are concerned about this problem, we would like to know what strategies implementers have identified to speed up storage and querying.

Also, the granularity level is an open issue that impacts performance. I would like to know whether the level of granularity of the archetypes contained within the openEHR CKM can satisfy the requirements of an EHR with more than 1 million records.

Kind Regards

Alberto

Alberto Moreno Conde
GIT-Grupo de Innovación Tecnológica
Hospital Universitario Virgen del Rocío
Edif. Centro de Documentación Clínica Avanzada
Av. Manuel Siurot, s/n.
C.P.: 41013 SEVILLA

Alberto,
I am not sure if your second question is entirely valid. If you are
talking about the modelling granularity of the archetypes, the impact
of this factor on performance is directly related to the design you
are using for implementation. So someone may claim that the
granularity is OK, while it may cause issues in another
implementation. A performance-based metric for granularity of
models would therefore not be valid, IMHO.

Regarding performance, we may have a more useful discussion if you can
provide more details about what you have done. What kind of
architecture did you use, that led to the conclusion that you may have
performance issues?

My feedback regarding your question would be as follows:

1) Specialize in the DB layer. It is good to be able to switch from DB
to DB, but don't buy this argument blindly. It would take too long to
go into details, but I suggest that you pick one DB for your
project and see what kind of performance improvement you can achieve.
On PostgreSQL, I've managed to increase performance a lot by using
stored procedures. Most software engineers will claim that it is bad
design to write code in the DB, but remember that I'm not talking
about business-logic code. I'm talking about persistence-mechanism
code, which brings us to the next point.

2) Do not let domain information dominate DB design. Just because
concepts in the models (openEHR/13606/CDA...) have relations among
them does not mean that you need to model those relations in the DB.
Yes, it would be easy to have a similar or identical representation
across multiple layers: the same structures powering the GUI, then
sent to the middle tier, and later persisted to the DB. Sadly, this
has become the dominant approach in most projects, but the problem is
that the EHR domain uses complex tree structures, with dynamic links
created by the different data needed for different clinical cases. So
a single design that spans multiple layers does not really perform
that well for EHR work. Yes, it may work, and there are systems that
prove that, but there is more you can achieve if you specialize in
your layers, especially in the DB.
Take a look at http://en.wikipedia.org/wiki/Entity-attribute-value_model
. See references at the bottom. Why do you think medical systems are
mentioned alongside this model?
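For readers unfamiliar with the entity-attribute-value idea referenced above, here is a minimal sketch, using Python's built-in sqlite3 as a stand-in for a real DBMS. The table and column names are purely illustrative and are not taken from any openEHR specification:

```python
import sqlite3

# Minimal EAV sketch: one narrow table holds every clinical data point,
# whatever archetype it comes from. Names here are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE eav (
        entity    TEXT,   -- e.g. a COMPOSITION or ENTRY identifier
        attribute TEXT,   -- archetype path of the data element
        value     TEXT    -- the value, stored as text for simplicity
    )
""")

# Two readings from the same blood pressure observation share one entity id;
# a new archetype needs no schema change, just new attribute strings.
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    ("comp-001", "/blood_pressure/systolic",  "142"),
    ("comp-001", "/blood_pressure/diastolic", "88"),
    ("comp-002", "/body_weight/weight",       "81.5"),
])

systolic = conn.execute(
    "SELECT value FROM eav WHERE attribute = '/blood_pressure/systolic'"
).fetchone()[0]
print(systolic)  # -> 142
```

The flexibility is exactly what makes EHR trees easy to store, and exactly what makes conventional relational tooling awkward, as the next point discusses.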

3) The price you'll pay for an unorthodox DB design is that
you'll have to consider means of getting data out of the DB, and
putting it back, for secondary purposes. You won't be able to write
SQL queries directly if you go for a specialized DB layer, so you need
to evaluate your use cases carefully. You won't be able to use
off-the-shelf reporting tools, for example, most of which assume that
you can connect to a DB and provide some tables or some SQL queries.
With an entity-attribute-value model, this is not straightforward, and
most of the time it is not possible at all.
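To illustrate why reporting tools struggle here: reconstructing the one-row-per-observation shape they expect from an EAV table takes a conditional aggregate (or a self-join) per attribute. A small sketch, again with sqlite3 standing in for a real DBMS and invented names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT)")
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    ("comp-001", "systolic", "142"),
    ("comp-001", "diastolic", "88"),
    ("comp-002", "systolic", "118"),
    ("comp-002", "diastolic", "76"),
])

# Every attribute that should become a report column needs its own
# conditional aggregate; a plain `SELECT *` is useless to a report writer.
report = conn.execute("""
    SELECT entity,
           MAX(CASE WHEN attribute = 'systolic'  THEN value END) AS systolic,
           MAX(CASE WHEN attribute = 'diastolic' THEN value END) AS diastolic
    FROM eav
    GROUP BY entity
    ORDER BY entity
""").fetchall()
print(report)  # [('comp-001', '142', '88'), ('comp-002', '118', '76')]
```

With hundreds of attributes this pivoting has to be generated programmatically, which is why generic reporting tools tend to give up.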

Cheers
Seref

Hi, I'm somewhat new to the group and to openEHR (on a technical level), so excuse my ignorance. I was wondering about the use of NoSQL databases. Has anyone tried to see if something like CouchDB or MongoDB could be used to store EHR data? I know there are definite performance gains with NoSQL databases, but I don't know enough about the storage of these documents to know if something like this could be used. Just wondering if anyone has tried this or conceived how this may or may not be possible.

Dr. Robert Stark

Hi all.

IMHO the best approach (or at least what I have done and feel is reasonable) is to take only some of the classes in the reference model and represent them in the database. I have seen some implementations which adopt an automatic code-generation approach, direct from the reference model, but that builds certain structures into the database which are unnecessary and/or may hinder performance. When analysing openEHR it seems to me it was not conceived with its database implementation in mind (which is an absolutely reasonable approach). The way information is persisted, I guess, is left to implementers, and I believe that is probably Alberto's issue.

To solve the multiple-database problem, the openEHR database structure could be designed using ORM (Object-Role Modeling) tools such as those at http://www.ormfoundation.org (again, that is what I have used).

I agree that archetypes should pose no performance problem at the database level if care is exercised not to try to represent them in the database. In the final analysis, it seems to me that that is what openEHR is all about: separating (represented, archetyped) knowledge from the (storage) structure.

Hi Robert,
The kind of systems you mention have the main advantage of scaling
up without very expensive software licenses. I would not say anything
about definite performance gains without a well-defined context,
though.

These persistence mechanisms mainly serve large data processing tasks
these days, and they require a different thinking compared to
relational dbs. They become quite interesting choices if you are using
something like the EAV model

The first version of a persistence mechanism I implemented for
openEHR used an object persistence mechanism, and it ended up being
much slower than its shiny accompanying documents claimed. These
kinds of scalable, flat(ish) persistence systems would be my choice
if I were looking at analysis over a few million EHR instances or
more.

Hi Alberto, we are working on this subject:

http://code.google.com/p/open-ehr-gen-framework/
http://code.google.com/p/open-ehr-gen-framework/wiki/Optimizacion
http://code.google.com/p/open-ehr-gen-framework/wiki/EstructurasDeDatos
http://www.openehr.org/wiki/display/impl/Playing+with+Pablo's+Open+EHR-Gen+Framework

Seref,

You make three points, beginning with “specialize in the DB layer.” Could you elaborate just a bit more?

I understand your second basic point (“Do not let domain information dominate DB design”), but you lose me beginning where you say, “Yes, it would be easy to have a similar/same representation across multiple layers …” From that point on I don’t follow your elaboration of your second point.

From your third point I take it you are opposed to EAV of any sort. The only other alternative would be to persist serialized byte streams and/or XML documents, which is what I assume the openEHR approach to be.

Thanks,

Randy Neall

Hi Alberto,

A few naive comments/questions from a clinical modelling perspective.

The granularity of the archetypes is mostly determined by issues of
clinical validity and reusability, rather than performance. We do try
to keep archetype tree structures as 'flat' as possible, i.e. remove
unnecessary clusters, and some work needs to be done to revise some of
the older draft CKM archetypes (e.g. the OBSERVATION.examination series)
in this regard. It would be interesting to know what you meant by the
question - can you give a couple of examples of more and less
granular archetypes, from your perspective?

We have recently been able to develop a high-performance EHR using 80%
CKM archetypes (or those of similar granularity/reusability) with
querying 100% via AQL. The key change we made to improve performance
was to make sure that dates are well supported by indexing, e.g. date
recorded, Composition start_date, observation time, etc. Most
operational queries in a live EHR are time-based, e.g. chart of most
recent results, current admissions, etc. OTOH many reporting-style
queries will be terminology-based, i.e. patients with diabetes, and I
expect this is an area where specific indexing might help further. I
know that UK GP systems which have traditional RDBMS-type
architectures have extensive indexing on diagnostic codes. Workflow
indexing will also be important in other applications for tracking
orders and resultant activities. This is a facility that Ocean is
currently implementing in OceanEHR.

So indexing on dates, diagnosis / procedure codes and workflow IDs is
probably the key.
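A minimal sketch of this indexing advice, with sqlite3 standing in for a production DBMS; the table layout and diagnosis codes are entirely illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE composition (
        id         INTEGER PRIMARY KEY,
        ehr_id     TEXT,
        start_time TEXT,   -- ISO 8601, so lexical order matches time order
        diag_code  TEXT
    )
""")
# The two index families suggested above: time first, then terminology codes.
conn.execute("CREATE INDEX idx_comp_time ON composition (ehr_id, start_time)")
conn.execute("CREATE INDEX idx_comp_diag ON composition (diag_code)")

conn.executemany("INSERT INTO composition VALUES (?, ?, ?, ?)", [
    (1, "ehr-1", "2011-03-01T09:00", "E11"),   # E11: diabetes in ICD-10
    (2, "ehr-1", "2011-03-05T14:30", "I10"),
    (3, "ehr-2", "2011-03-02T10:15", "E11"),
])

# Operational, time-based query ("most recent composition for this patient"),
# served by idx_comp_time:
latest = conn.execute("""
    SELECT id FROM composition
    WHERE ehr_id = 'ehr-1' ORDER BY start_time DESC LIMIT 1
""").fetchone()[0]
print(latest)  # -> 2

# Reporting-style, terminology-based query ("patients with diabetes"),
# served by idx_comp_diag:
diabetic = [r[0] for r in conn.execute(
    "SELECT DISTINCT ehr_id FROM composition "
    "WHERE diag_code = 'E11' ORDER BY ehr_id")]
print(diabetic)  # -> ['ehr-1', 'ehr-2']
```

The same principle applies whatever the underlying store: the indexed columns are a small projection of the compositions, maintained at commit time.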

You might find it helpful to speak to the HL7 RIMBAA community.
Although starting from a very different RM, they are essentially
facing a similar low-level engineering problem. (I will get killed by
both communities for that statement!!).

I am interested in your question re granularity - can you explain
further what you were concerned about?

Cheers,

Ian

Dr Ian McNicoll
office +44 (0)1536 414 994
+44 (0)2032 392 970
fax +44 (0)1536 516317
mobile +44 (0)775 209 7859
skype ianmcnicoll
ian.mcnicoll@oceaninformatics.com

Clinical Modelling Consultant, Ocean Informatics, UK
openEHR Clinical Knowledge Editor www.openehr.org/knowledge
Honorary Senior Research Associate, CHIME, UCL
BCS Primary Health Care www.phcsg.org

Hi!

When we have approached openEHR storage we have tried to keep use
cases separate and not solve everything with only one database
structure.

A simplified (perhaps naive) version of the reasoning:
1. If you want to access data as big chunks in a certain serialization
format (like openEHR COMPOSITIONs used in daily patient care), then
store them in such chunks (and index on fields relevant to your
retrieval needs).
2. If you want to access small data fields from huge amounts of EHRs
(like population wide statistics/epidemiology) then store them as such
small pieces.

openEHR's append-only (or "never physically delete") principle,
combined with its clearly timestamped operations, makes replication
between these databases easier. If the DB in use case 2 is not used
for entering data, and if it can tolerate a few minutes of lag behind
the live EHRs (1.), then implementation is not that hard.
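The two-store split described above might be sketched like this, with sqlite3 and JSON standing in for whatever DBMSs and serialization format are actually used; all table and field names are illustrative:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Store 1: whole compositions as serialized chunks, for point-of-care reads.
conn.execute("""
    CREATE TABLE composition_chunk (
        uid       TEXT PRIMARY KEY,
        ehr_id    TEXT,
        committed TEXT,   -- commit timestamp; rows are appended, never updated
        doc       TEXT    -- the full serialized composition (JSON here)
    )
""")
# Store 2: shredded atomic values, for population-level queries.
conn.execute("CREATE TABLE atom (comp_uid TEXT, path TEXT, value REAL)")

comp = {"systolic": 142.0, "diastolic": 88.0}
conn.execute("INSERT INTO composition_chunk VALUES (?, ?, ?, ?)",
             ("c1", "ehr-1", "2011-03-01T09:00", json.dumps(comp)))

# Replication: because commits are append-only and timestamped, a shredder
# only ever needs to process rows newer than its last run; here it simply
# processes everything.
for uid, doc in conn.execute("SELECT uid, doc FROM composition_chunk").fetchall():
    for path, value in json.loads(doc).items():
        conn.execute("INSERT INTO atom VALUES (?, ?, ?)", (uid, path, value))

# Use case 1: fetch the whole chunk for display in daily care.
whole = json.loads(conn.execute(
    "SELECT doc FROM composition_chunk WHERE uid = 'c1'").fetchone()[0])
print(whole["systolic"])  # -> 142.0

# Use case 2: a population query touches only the narrow atom table.
n = conn.execute("SELECT COUNT(*) FROM atom "
                 "WHERE path = 'systolic' AND value > 130").fetchone()[0]
print(n)  # -> 1
```

In a real deployment the two stores would be separate databases with the shredder running as a lagging batch job, as the poster describes.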

Some more hints regarding our reasoning is available in the poster at:
http://www.imt.liu.se/~erisu/2010/EEE-Poster-multipage.pdf
A proper detailed paper is (still) in the works...

I suspect that if you aim for a "1.5" in between 1 and 2, then
what you will get is exactly a compromise that is optimal for none of
the above-mentioned use cases. :)

Best regards,
Erik Sundvall
erik.sundvall@liu.se http://www.imt.liu.se/~erisu/ Tel: +46-13-286733

Greetings,
Somehow this conversation failed to fall into my primary mail screen,
hence the late response and apologies...

When I say specialize in the DB layer, I mean do not fall into the
trap of delegating DB layer design and use completely to
something like Hibernate. There is now an incredibly common dogma in
the software world where everyone repeats "it is bad to have
logic in the DB, so don't go there; instead use this shiny nice tool…"
That is carrying useful advice way beyond its usefulness, IMHO. Think
a lot before you buy the argument "but you can easily change
databases if you use ORM and this tool and that." Yes, you can, but
very few real-life products support multiple DBs. There is great power
in the DB layer, and you can leverage it if you learn the specific
features and capabilities of particular DBs. When I say specialize, I
mean: learn the capabilities of the DB server you choose, and
leverage them without breaking the generally accepted principles of
software engineering. It is possible. You can use Hibernate, but still
carve a lot of performance out of the DB. I chose to specialize in
PostgreSQL, and that allows me to stay under the 500-millisecond barrier
for queries, inserts, etc. (an unofficial psychological barrier
that was introduced to my work by Thomas Beale).

My third point is: nothing is free, so if you invest in EAV, get ready
to pay the price. You'll sometimes have to write more code for simple
stuff, so plan ahead. Don't buy the EAV argument blindly, just as you
should not buy the "oh, Hibernate can generate the whole DB" argument.

In general, I'm trying not to sing along with the chorus in the
software domain, which is getting ever louder and more crowded :)
Healthcare is a very challenging domain, so practices from e-commerce,
or advice from books written by academics who have never implemented a
single production system in their lives, won't cut it.

Best Regards
Seref

Hi,

Speaking from the HL7 RIMBAA perspective, and echoing a lot of older
cross-industry best practices: to create a database solely based on the
RM is fine - if your aim is to have a database that's not optimized for
any particular purpose, one that can be used for any purpose. In HL7
we've seen a lot of implementations of this type of approach when it
comes to research, clinical trials, and public health databases. One will
never know what sort of queries will be important at some point in the
future, so the architecture is (on purpose) chosen in such a way that it
is generic. In general such an approach is known as OLAP. This approach
carries a penalty in terms of performance, but that may not be an issue
in research/clinical trials/public health anyway.

If you're going to use the data for a very specific workflow or purpose,
or if you know exactly how things are going to be queried for, then
obviously the database structure (and indexes) are going to be optimized
for that specific workflow. In other words: OLTP. But this has the
significant drawback (like many a proprietary DBMS schema) that it's less
flexible: if the workflows change, one has to redo the database schema to
maintain performance.

For this reason one quite often sees both next to each other: an OLTP
database to support the ongoing workflow and a limited number of
optimized queries, and an OLAP store for long term archival and to allow
ad-hoc complex queries, or to extract new data sets for new OLTP-style
databases.

When we have approached openEHR storage we have tried to keep use
cases separate and not solve everything with only one database
structure.

A simplified (perhaps naive) version of the reasoning:
1. If you want to access data as big chunks in a certain serialization
format (like openEHR COMPOSITIONs used in daily patient care), then
store them in such chunks (and index on fields relevant to your
retrieval needs).
2. If you want to access small data fields from huge amounts of EHRs
(like population wide statistics/epidemiology) then store them as such
small pieces.

I concur. Differences in granularity, and differences in reliance on
(and maturity of) composite structures such as archetypes and templates,
tend to lead (looking at HL7) to OLAP-style databases. For HL7 CDA
implementations we've seen the approach where CDA sections are persisted
as XML blobs. CDA sections are the level of aggregation used within the
application, so it didn't make sense to have a database model that's
fully atomic with regard to the RM.
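A sketch of that pattern: the section stored whole as an XML blob, with a few hand-picked fields copied into ordinary indexed columns at write time. sqlite3 is a stand-in here, and the column names, XML shape and LOINC code are purely illustrative:

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# The section is stored whole as an XML blob; a few search-relevant
# fields are extracted into ordinary indexed columns when it is written.
conn.execute("""
    CREATE TABLE section (
        id       INTEGER PRIMARY KEY,
        doc_date TEXT,    -- extracted for indexing
        loinc    TEXT,    -- extracted for indexing
        xml_blob TEXT     -- the section itself, opaque to SQL
    )
""")
conn.execute("CREATE INDEX idx_section_date ON section (doc_date)")

xml_doc = "<section><code loinc='8480-6'/><value>142</value></section>"
root = ET.fromstring(xml_doc)
conn.execute("INSERT INTO section VALUES (?, ?, ?, ?)",
             (1, "2011-03-01", root.find("code").get("loinc"), xml_doc))

# The index answers the search; the blob is parsed only for the hits.
blob = conn.execute(
    "SELECT xml_blob FROM section WHERE doc_date = '2011-03-01'").fetchone()[0]
value = ET.fromstring(blob).find("value").text
print(value)  # -> 142
```

The trade-off is the one discussed throughout this thread: anything not extracted into an indexed column can only be reached by deserializing the blob.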

The granularity of the archetypes is mostly determined by issues of
clinical validity and reusability, rather than performance. We do try
to keep archetype tree structures as 'flat' as possible, i.e. remove ...

Ian wrote:

So indexing on dates, diagnosis / procedure codes and workflow IDs is
probably the key.

It probably is, if one tries to generalize from workflow-supporting
applications in daily use. Although one can probably make general
recommendations about indexing on certain aspects, it's difficult to do
so if you look at the full variance of such workflows in healthcare.

You might find it helpful to speak to the HL7 RIMBAA community.
Although starting from a very different RM, they are essentially
facing a similar low-level engineering problem. (I will get killed by
both communities for that statement!!).

It is indeed a similar problem, which is why we (as HL7 RIMBAA) are
looking with interest to the implementation experiences being discussed
on this list, and why we had a joint implementers meeting in Sydney
(during the HL7 meeting).

TTYL,

-Rene

Hi Seref,

From what we've seen in HL7 RIMBAA, EAV is rarely embraced wholesale -
in practice people (if they use EAV at all) use a mixed RDBMS-EAV
approach: RDBMS for its speed and indexing capabilities, EAV for
blob-type storage of rarely searched/used data. If you don't need the
ultimate flexibility/extensibility of EAV, you probably shouldn't use it,
if only for performance reasons (and for a human it gets hard to
understand pretty fast).

In general, I'm trying not to sing along with the chorus in the
software domain, which is getting ever louder and more crowded :)
Healthcare is a very challenging domain, so practices from e-commerce,
or advice from books written by academics who have never implemented a
single production system in their lives, won't cut it.

Like always, there are no silver bullets .. I agree one should listen to
those who have already built systems within healthcare, and not just
theorize about building such systems. The proof of the pudding is in the
eating, not in the chewing on the recipe.

TTYL,

-Rene

Hi Rene,
I did not mean to suggest that EAV is the only design one should
use. I guess a better attempt to express what I have in mind is this:
people use relational design too heavily in the DB layer, mostly due to
the benefit of tooling in other layers.

Good analogy about the pudding :)

I agree with Alan. In OpenEHR-Gen, we have modeled almost all the classes between Folder and the data types (Folder, Composition, Section, Entry, Item, DvText, etc.) and represented all those concepts in our DB schema. Here you can find our data model: http://code.google.com/p/open-ehr-gen-framework/downloads/detail?name=model.png

I think my friend Alan refers to our implementation of the OpenEHR-Gen Framework, which automatically generates the DB schema from the Reference Model classes we programmed in the Grails Framework (http://www.grails.org/). Grails has a great ORM tool (Object-Relational Mapping; this is different from the ORM mentioned by Alan).
Through this experience, we have seen that this complex structured model has some shortcomings in performance, but it does not take hours or minutes to complete tasks like data binding and saving, or data querying, and this can be boosted by good servers, fast disks and a good DBMS.

What we are doing now is redesigning the data model, dividing the classes into two groups: one group just for structure classes, and the second group for content classes. The first group has the classes that are part of the structure but don’t have clinical content, like Section or Cluster. The second group has the clinical/demographic content, like Element or Composition. Since we can infer the “structure classes” from an archetype, we need not include them in the persistent model; we’ll model only the content classes, and add some meta-information to help reconstruct the complete RM structure as if it had been persisted in the database.

So, in the persistent layer we’ll have: archetypes, a reduced persistent RM, and metadata.
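A rough sketch of the structure/content split Pablo describes, assuming a simple path-based metadata scheme. The class and field names here are invented for illustration and are not from the OpenEHR-Gen code base:

```python
from dataclasses import dataclass

@dataclass
class PersistedElement:            # a "content class" row
    archetype_id: str              # which archetype defines the structure
    path: str                      # where in that structure this value sits
    value: object

# What actually goes to the database: flat content rows, with no
# SECTION/CLUSTER rows persisted at all.
rows = [
    PersistedElement("openEHR-EHR-OBSERVATION.blood_pressure.v1",
                     "/data/events/systolic", 142),
    PersistedElement("openEHR-EHR-OBSERVATION.blood_pressure.v1",
                     "/data/events/diastolic", 88),
]

def reconstruct(rows):
    """Rebuild a nested structure from flat rows using their paths.

    In the real design the archetype would supply the types and names of
    the intervening structure nodes; here the paths alone stand in for
    that metadata."""
    tree = {}
    for row in rows:
        node = tree
        parts = row.path.strip("/").split("/")
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = row.value
    return tree

print(reconstruct(rows))
# -> {'data': {'events': {'systolic': 142, 'diastolic': 88}}}
```

The point of the exercise is that the structure nodes are derivable, so persisting them buys nothing except extra joins.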

This will be the next step in our open source project.

Hope that helps.

What Pablo is doing in his redesign is precisely the way I think openEHR implementations should be tackled. As far as I can infer, the openEHR model could best be described as a sort of business-layer model. Some of its classes (generally speaking, the “structure classes”) should be represented in an almost “as-is” manner in the DB backend (with adequate management, of course, of the impedance-mismatch issues associated with ORM). Regarding “content structures”, these could be persisted as “objects” in ad hoc fields in the database.

Before openEHR, in a model somewhat similar to openEHR that we had designed, we persisted these complex structures as XML data in XML fields (in Oracle or SQL Server) or (huge) text fields (in MySQL or Postgres). Unfortunately, in my experience at least, indexing these fields (and making them performant) can be a real nuisance with the native tools provided by commercial databases (for example, the SQL Server XML API is, IMHO, too complex and not too efficient). Probably that is where there is most room for improvement in openEHR implementations, as I feel that is where performance problems could arise.

Regarding EAV approaches, if I properly understood the discussion taking place here (implementing EAV at the database level), I believe EAV should be used only in some exceptional cases. Although EAV models can be made performant by adding hardware, clustering and optimizing the database (something which, BTW, cannot be done with all databases), I feel it is intrinsically underperformant.

Regarding “content structures”, these could be persisted as “objects” in ad hoc field in the database.

What kind of “objects”? And how would any approach of this sort be much better than XML? You’d still have to retrieve, parse or otherwise deserialize the entire blob before you could productively read, or navigate to, the tiniest part of it, taking time and resources. And it would seem that your object would have to be mixed with a lot of instance-level metadata (as in XML), further bloating its size, complexity and internal overhead. And are there non-proprietary ways to do this?

I don’t see how the functionality of such objects would greatly exceed that of a PDF text document (possibly including a document-level table of contents), which, at the end of the day, is what a lot of EMR systems essentially amount to. Doctors typically pull up text-based notes often autogenerated from discrete fields never searched upon again and which may even die upon the generation of the note. I understand that one approach is to provide some basic indexed “pointers” to the blob within the DB, but that does not really overcome the basic problem that blobs pose.

One could argue that this at least avoids the problems often associated with EAV, but at the expense of easy and efficient access to discrete data elements. If a weight is too heavy to lift one solution is simply not to lift it.

Randy Neall

Regarding “content structures”, these could be persisted as “objects” in ad hoc field in the database.

What kind of “objects”? And how would any approach of this sort be much better than XML? You’d still have to retrieve, parse or otherwise deserialize the entire blob before you could productively read, or navigate to, the tiniest part of it, taking time and resources. And it would seem that your object would have to be mixed with a lot of instance-level metadata (as in XML), further bloating its size, complexity and internal overhead. And are there non-proprietary ways to do this?

I never mentioned blobs. Precisely… XML structures would be one of the best choices (or probably better: a JSON “object”). If you read on you’ll see that that is precisely what I did in the past. That is why I enclosed the word object in double quotes (meaning the concept is used metaphorically). Serializing (real) runtime objects into XML or JSON “objects” and storing these “complex data types” (I should probably have used this terminology) is, to the best of my understanding, the most convenient choice. So I fully agree with your comments.

I don’t see how the functionality of such objects would greatly exceed that of a PDF text document (possibly including a document-level table of contents), which, at the end of the day, is what a lot of EMR systems essentially amount to. Doctors typically pull up text-based notes often autogenerated from discrete fields never searched upon again and which may even die upon the generation of the note. I understand that one approach is to provide some basic indexed “pointers” to the blob within the DB, but that does not really overcome the basic problem that blobs pose.

Sure. Blobs are ghastly and under no circumstance would I propose their use for storing information of the type we are dealing with here.

One could argue that this at least avoids the problems often associated with EAV, but at the expense of easy and efficient access to discrete data elements. If a weight is too heavy to lift one solution is simply not to lift it.

Alan,

Thanks for clarifying. I thought in your earlier post you had ruled out XML. I was curious what the alternative would be. JSON, as you suggest, would be better.

Since writing my post I realized I had not given you credit for one innovation I had not seen before, namely, placing “structure classes” directly into the DB. That would allow you to keep your JSON content instances relatively small and hence searchable with formal queries, at least to some extent. I could be mistaken, but I think an alternative approach I had heard about on this forum would be to create basically just one blob or just one XML document to contain an entire medical record for a patient, a record that would be extended with each encounter, with the prior instance of the record deleted or at least never referenced again. Again, I will probably be corrected here. Your approach sounds better.

Randy

Randolph & Alan,

In our approach we will give a shot to storing the group of “structure classes” I mentioned before directly in a relational DB, in an atomic way (we will have tables and columns for each field, not a blob to store the whole structure). With this approach I think we can boost performance by about 70%; the big bottleneck in our implementation is the dynamic data binding (putting data input from a user into an RM structure), and with these improvements, 1. this logic will be much simpler, and 2. the DB schema will be simpler also. I hope we’ll have some results soon. We’ll be evaluating performance on MySQL and Postgres DBMSs. The XML approach is our plan B :D

This approach is based on some requirements:

  1. We need the clinical data in the production DB (that DB is relational, and the clinical data is stored in the “content classes” I mentioned before).

  2. The repository must support querying data at an atomic level without loading a huge amount of data on memory (for example: find all the patients with systolic BP over 130).

  3. In the end we need the complete structure: the structure classes can be inferred from archetypes and metadata (which can be persisted on the filesystem, or instantiated in memory).

There are other options too: 1. use an object-oriented database (and get rid of the Object-Relational Mapping tool), 2. use a document-oriented DB: a. a native XML DB, or b. a native JSON DB like http://www.mongodb.org/, 3. a mix of relational or OO DBs and the filesystem (to store XML or JSON documents).

Just my 2 cents.

Yes… what we did is store these small XML “documents” in ad hoc fields designated as XML data types (in SQL Server). These fields are indexable, but the API is a programmer’s nightmare and, for reasons I fail to grasp (one probably being my own incompetence… that can’t be ruled out ;-) ), not too efficient. But the fact remains that that’s probably the best way to go. How to index the data elements inside the XML structure so that individual records may be retrieved admits many different solutions. Probably a mix of several approaches would do the trick: EAV for some particular fields (i.e. fields often used in searches or sorting), indexing the XML fields with the database API for others. Even indexing outside the database by way of Lucene or something similar could do (but that could bring serious synchronization problems).

I’ve given a great deal of thought to these problems and experimented whenever possible, but have lacked the time to approach these issues in a more orderly manner. I am still waiting for that crucial paper or technical document which may help clarify the issue, as I think there lies the key to widespread adoption of a model as elegant and intelligent as openEHR strikes me to be.

Cheers and thanks for listening