Persistence of Compositions

Hey there,

We are making good progress in understanding and applying openEHR to our data warehouse project. The first data mappings have been done, and a PoC ETL process using MapForce-generated C# code has been created for vital parameters (originating from the ICU). Currently, I’m working on the data model of the persistence layer. I know that there has been some discussion about this topic, both in the past and recently. I checked implementation examples like Pablo’s EHRGen project and Erik Sundvall’s LiU EEE. Additionally, I read a paper that gave a helpful overview of persistence strategies, and I enjoyed the discussion Bert started on LinkedIn.

To get to the point: we intend to use a mixed approach based on MS SQL Server. More or less static data is supposed to live in the relational part, while most clinical data (with the exception of lab values, for example) should be stored in columns of the XML datatype. Erik’s approach was to develop a proprietary XML Schema to wrap compositions and contained entries. Obviously, this might work in a native XML database like eXist, but it does not serve our needs. Additionally, storing the archetypes in a relational fashion is not our first choice. Therefore, I’m interested to learn if any of you has already given some thought to best practices for splitting compositions into individual XML documents while keeping the relationships across different tables and/or rows.
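One possible shape for such a split, as a minimal sketch only: each top-level entry of a composition goes into its own row as an XML fragment, and a foreign key plus a position column preserve the relationship to the parent composition. Everything here is an assumption for illustration: the table and column names are hypothetical, the sample XML is heavily simplified (real openEHR XML is namespaced and much richer), and `sqlite3` with a `TEXT` column stands in for SQL Server and its XML datatype.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified composition (not the real openEHR XML format).
COMPOSITION_XML = """
<composition uid="c-001" archetype_node_id="openEHR-EHR-COMPOSITION.encounter.v1">
  <content archetype_node_id="openEHR-EHR-OBSERVATION.blood_pressure.v1">
    <value>120/80</value>
  </content>
  <content archetype_node_id="openEHR-EHR-OBSERVATION.pulse.v1">
    <value>72</value>
  </content>
</composition>
"""


def create_schema(conn):
    """Create one table for compositions and one for their entries."""
    conn.executescript("""
        CREATE TABLE composition (
            uid          TEXT PRIMARY KEY,
            archetype_id TEXT NOT NULL
        );
        CREATE TABLE entry (
            id              INTEGER PRIMARY KEY,
            composition_uid TEXT NOT NULL REFERENCES composition(uid),
            position        INTEGER NOT NULL,  -- keeps the original entry order
            archetype_id    TEXT NOT NULL,
            xml             TEXT NOT NULL      -- an XML column in SQL Server
        );
    """)


def store_composition(conn, xml_text):
    """Split a composition into one row per top-level entry."""
    root = ET.fromstring(xml_text)
    uid = root.get("uid")
    conn.execute(
        "INSERT INTO composition (uid, archetype_id) VALUES (?, ?)",
        (uid, root.get("archetype_node_id")),
    )
    for position, entry in enumerate(root.findall("content")):
        conn.execute(
            "INSERT INTO entry (composition_uid, position, archetype_id, xml) "
            "VALUES (?, ?, ?, ?)",
            (uid, position, entry.get("archetype_node_id"),
             ET.tostring(entry, encoding="unicode")),
        )
    return uid


def load_entries(conn, uid):
    """Return the entries of a composition in their original order."""
    rows = conn.execute(
        "SELECT xml FROM entry WHERE composition_uid = ? ORDER BY position",
        (uid,),
    )
    return [ET.fromstring(row[0]) for row in rows]
```

With this layout, entries can be selected by `archetype_id` without parsing any XML, while the full composition can still be reassembled from its rows.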

Cheers,

Hi Birger,

In our EHRServer we took the opposite approach: we get XML compositions committed, store them in the filesystem, and extract parts of those XMLs into records in a relational database. The difference from EHRGen is that EHRGen has the composition model implemented as classes, and we have an ORM tool that maps objects to relational databases (it generates the DB schema, does CRUD, validates constraints, etc.; we deal only with the object structure, not directly with the database).
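To make the "store the XML, index parts of it" idea concrete, here is a minimal sketch. It is not EHRServer's actual code: the indexed paths, the `data_index` table, and the simplified sample XML are all assumptions made for illustration, and `sqlite3` stands in for whatever relational database is used.

```python
import sqlite3
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical list of paths worth indexing; a real system would derive
# these from archetypes or configuration.
INDEXED_PATHS = ["context/start_time", "content/value"]

# Hypothetical, simplified composition XML.
SAMPLE_XML = """
<composition>
  <context><start_time>2014-02-05T20:34:00</start_time></context>
  <content archetype_node_id="openEHR-EHR-OBSERVATION.pulse.v1"><value>72</value></content>
</composition>
"""


def create_index_schema(conn):
    """Create the table that holds the extracted path/value pairs."""
    conn.execute("""
        CREATE TABLE data_index (
            composition_uid TEXT NOT NULL,
            path            TEXT NOT NULL,
            value           TEXT
        )
    """)


def commit(conn, store_dir, uid, xml_text):
    """Write the XML to the filesystem and index the selected paths."""
    path = Path(store_dir) / f"{uid}.xml"
    path.write_text(xml_text, encoding="utf-8")
    root = ET.fromstring(xml_text)
    for indexed_path in INDEXED_PATHS:
        for node in root.findall(indexed_path):
            conn.execute(
                "INSERT INTO data_index (composition_uid, path, value) "
                "VALUES (?, ?, ?)",
                (uid, indexed_path, node.text),
            )
    return path


def find_compositions(conn, path, value):
    """Query the relational index; the XML files are only read on retrieval."""
    rows = conn.execute(
        "SELECT DISTINCT composition_uid FROM data_index "
        "WHERE path = ? AND value = ?",
        (path, value),
    )
    return [row[0] for row in rows]
```

Queries run against the index only, so the XML on disk stays the authoritative record and is read just when a full composition has to be returned.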

I’m really interested in comparing the pros and cons of our different approaches, mostly on the querying and retrieval side.

Here is a small demo of EHRServer: http://www.youtube.com/watch?v=D-hs-Ofb8SY

Also, the project is public under my GitHub account: https://github.com/ppazos

Hi Birger,

Erik's approach
was to develop a proprietary XML Schema to wrap compositions and
contained entries. Obviously, this might work in a native XML database
like eXist but does not serve our needs.

Could you tell more about why an XML database does not serve your needs?

Additionally, storing the
archetypes in a relational fashion is not our first choice.

Why not?

Therefore, I'm interested to learn if any of you has already given some
thought to best practices for splitting compositions into individual XML
documents while keeping the relationships across different tables
and/or rows.

I just released information about our "archetyped" MEDrecord. Our goal
was to create a more friendly API for MEDrecord based on archetypes.
While developing this API it proved to be a small step to also generate
SQL schemas from the archetype so that is what we tried.
Now we have a version of MEDrecord which stores data in an XML database
and a version which stores data in an SQL database whose schema is
generated using archetypes.

More information about the generated API and schemas can be found here:

  http://www.medrecord.nl/medrecord-json-api/

We are planning to implement some use cases and then see which approach
(XML database or SQL database) works best.

But like I said, the main goal of the "archetyped" MEDrecord was to
provide a clean, type safe API to clients and not something which can
store anything without generating some code first.
In time the "archetyped" MEDrecord might grow into such a solution, but
currently, for our use cases this is not important.
The important aspect is that we can create a clean API quickly which is
based on archetypes.

What works best, relational database, native XML database or a
combination (like PostgreSQL with its XML datatype) is something we
still have to figure out. Although I see the benefits of using a native
XML database, I do not believe it will have decent performance any time
soon on cheap hard- and software for the type of queries we need for
building user interfaces without adding many more complexities.

Also, we basically treat archetypes as the schema for the information so
putting a schemaless database beneath it seems to be a bit of a
mismatch. Instead we attempt to convert the schema provided by
archetypes into a schema suitable for relational databases and I must
say I am quite pleased with the lack of complexity on the server side.
The server code turned out to be quite straightforward and simple.
Even the complexity of the generator which converts the schemas is
manageable.
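As a rough illustration of what converting an archetype-derived schema into a relational schema can look like, here is a minimal sketch. It is not MEDrecord's generator: the flat field description, the type mapping in `SQL_TYPES`, and the naming rules are all assumptions, and a real archetype is a nested constraint tree rather than a flat dictionary.

```python
import re

# Hypothetical mapping from openEHR data value types to SQL column types.
SQL_TYPES = {
    "DV_QUANTITY": "DOUBLE PRECISION",
    "DV_TEXT": "VARCHAR(255)",
    "DV_DATE_TIME": "TIMESTAMP",
    "DV_BOOLEAN": "BOOLEAN",
}


def sql_identifier(name):
    """Turn an archetype id or field name into a safe lowercase identifier."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def generate_ddl(archetype_id, fields):
    """Build a CREATE TABLE statement.

    fields maps a field name to (data value type, required), a flattened
    stand-in for the constraints an archetype would provide.
    """
    columns = ["id INTEGER PRIMARY KEY"]
    for name, (rm_type, required) in fields.items():
        not_null = " NOT NULL" if required else ""
        columns.append(f"{sql_identifier(name)} {SQL_TYPES[rm_type]}{not_null}")
    body = ",\n    ".join(columns)
    return f"CREATE TABLE {sql_identifier(archetype_id)} (\n    {body}\n);"


if __name__ == "__main__":
    print(generate_ddl(
        "openEHR-EHR-OBSERVATION.blood_pressure.v1",
        {"Systolic": ("DV_QUANTITY", True), "Comment": ("DV_TEXT", False)},
    ))
```

In this sketch, existence constraints become `NOT NULL`, which is one reason a generated relational schema can take over part of the validation work.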

Of course there is room for improvement and maybe it is not enough to
implement all possibilities of OpenEHR but for now it is enough to
implement the use cases we have in the foreseeable future.
We plan to develop it as new use cases present themselves instead of
trying to build something which can do anything first and then see if we
can fit the use cases in there.

Regards,

Ralph van Etten
MEDvision360

Hi Ralph,

I am very impressed by the innovativeness and originality of this plan.
It looks, at first sight, like the best two-level-modeling kernel architecture I have seen for years.

I trust that it will work as you say, but I haven't looked deeply enough into it to judge for myself.
It is also very good that you share this with the world.

You write:
"Of course there is room for improvement and maybe it is not enough to implement all possibilities of OpenEHR "

I don't know how to understand this.
Do you mean that, for the moment, it is not possible to implement all OpenEHR requirements?

Best regards
Bert Verhees

Ralph van Etten wrote on 5-2-2014 20:34:

Hi Ralph,

Hi Birger,

Erik's approach
was to develop a proprietary XML Schema to wrap compositions and
contained entries. Obviously, this might work in a native XML database
like eXist but does not serve our needs.

Could you tell more about why an XML database does not serve your needs?

That is because we have both complex and simple data structures. For example, we have millions of identically structured demographic records and lab values that can be represented perfectly in a static data schema. Furthermore, there is a lot of catalogue data that fits best into relational tables. (Another important reason is that our available ETL and BI tools are optimized to work with SQL Server...)

Additionally, storing the
archetypes in a relational fashion is not our first choice.

Why not?

We have to make a trade-off between performance and understandability of the data model. Medical data can become very complex. As we need to build data marts from our database, it's more important to understand the data, since it's not a real-time system. In my opinion, that's a lot easier when the data is structured hierarchically as a document (which XML is perfect for).

Therefore, I'm interested to learn if any of you has already given some
thought to best practices for splitting compositions into individual XML
documents while keeping the relationships across different tables
and/or rows.

I just released information about our "archetyped" MEDrecord. Our goal
was to create a more friendly API for MEDrecord based on archetypes.
While developing this API it proved to be a small step to also generate
SQL schemas from the archetype so that is what we tried.
Now we have a version of MEDrecord which stores data in an XML database
and a version which stores data in an SQL database whose schema is
generated using archetypes.

Are both versions available as open source? Yesterday I took a look at it (just the code) and was impressed by your work!

More information about the generated API and schemas can be found here:

http://www.medrecord.nl/medrecord-json-api/

We are planning to implement some use cases and then see which approach
(XML database or SQL database) works best.

But like I said, the main goal of the "archetyped" MEDrecord was to
provide a clean, type safe API to clients and not something which can
store anything without generating some code first.

I'm wondering about its capabilities for validating clinical data against archetypes. Is it possible to deserialize openEHR XML and check it for consistency, or is it a one-way ticket so far?

In time the "archetyped" MEDrecord might grow into such a solution, but
currently, for our use cases this is not important.
The important aspect is that we can create a clean API quickly which is
based on archetypes.

What works best, relational database, native XML database or a
combination (like PostgreSQL with its XML datatype) is something we
still have to figure out. Although I see the benefits of using a native
XML database, I do not believe it will have decent performance any time
soon on cheap hard- and software for the type of queries we need for
building user interfaces without adding many more complexities.

I'm happy to exchange experiences. I already have two ideas, but I will need to do the implementation first. If it works, I will write an article in the wiki.

Also, we basically treat archetypes as the schema for the information so
putting a schemaless database beneath it seems to be a bit of a
mismatch. Instead we attempt to convert the schema provided by
archetypes into a schema suitable for relational databases and I must
say I am quite pleased with the lack of complexity on the server side.
The server code turned out to be quite straightforward and simple.
Even the complexity of the generator which converts the schemas is
manageable.
Of course there is room for improvement and maybe it is not enough to
implement all possibilities of OpenEHR but for now it is enough to
implement the use cases we have in the foreseeable future.
We plan to develop it as new use cases present themselves instead of
trying to build something which can do anything first and then see if we
can fit the use cases in there.

I think your project might be a great starting point to implement a reference CRUD system people can learn from to build their own applications. Thank you very much for sharing :smiley:

Regards,

Ralph van Etten
MEDvision360

Best,

Birger

Hi Bert,

Thanks for your kind words.

You write:
"Of course there is room for improvement and maybe it is not enough to
implement all possibilities of OpenEHR "

I don't know how to understand this.
Do you mean that, for the moment, it is not possible to implement all
OpenEHR requirements?

At the moment only the functionality required for our use cases is
implemented. For instance, OpenEHR allows archetype slots with
wildcards. This is something we do not need for our use cases and
therefore we have not implemented it yet. There are many things like this.

Eventually I think we will support all OpenEHR requirements but we are
only planning to implement them when there is a need for them.

Regards,

Ralph van Etten
MEDvision360

Ralph van Etten wrote on 6-2-2014 9:45:

At the moment only the functionality required for our use cases is
implemented. For instance, OpenEHR allows archetype slots with
wildcards. This is something we do not need for our use cases and
therefore we have not implemented it yet. There are many things like this.

This brings in a little code complexity.

I solved it by validating every part against its own archetype and checking the archetype IDs against the wildcard. If everything fits, the parts can legally be glued together into one dataset, which was, in my case, an XML dataset.

I think there is no other way, because you only know at runtime which archetype is going to be used in a slot.
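A minimal sketch of the procedure described above: check each part's archetype ID against the slot's wildcard, validate the part against its own archetype, and only then assemble the parts. The function names, the per-archetype validator callables, and the plain regular-expression handling of include and exclude patterns are assumptions; the real openEHR slot semantics are more subtle than this.

```python
import re


def slot_allows(archetype_id, includes, excludes=()):
    """Return True if the archetype id matches the slot's wildcard patterns.

    includes and excludes are regular expressions, as slot constraints
    are typically written; an exclude match always wins in this sketch.
    """
    if any(re.fullmatch(pattern, archetype_id) for pattern in excludes):
        return False
    return any(re.fullmatch(pattern, archetype_id) for pattern in includes)


def assemble(parts, slot_includes, validators):
    """Validate each part on its own, then glue the parts together.

    parts      -- list of (archetype_id, data) pairs, known only at runtime
    validators -- hypothetical dict: archetype_id -> callable(data) -> bool
    """
    for archetype_id, data in parts:
        if not slot_allows(archetype_id, slot_includes):
            raise ValueError(f"{archetype_id} is not allowed in this slot")
        validate = validators.get(archetype_id)
        if validate is None or not validate(data):
            raise ValueError(f"{archetype_id} failed validation against its archetype")
    return [data for _, data in parts]


if __name__ == "__main__":
    wildcard = r"openEHR-EHR-OBSERVATION\..*\.v1"
    validators = {
        "openEHR-EHR-OBSERVATION.pulse.v1": lambda d: 0 < d["rate"] < 400,
    }
    parts = [("openEHR-EHR-OBSERVATION.pulse.v1", {"rate": 72})]
    print(assemble(parts, [wildcard], validators))
```

The lookup in `validators` happens per part at runtime, which reflects the point that the archetype filling a slot is not known in advance.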

I am afraid you cannot escape this use case for a long time. If I were you, I would prioritize this one.

Good luck, I am sure you will succeed.

Bert

Regarding consistency checks, we have been able to generate Schematron rules from the archetypes to check the constraints stated in the archetype against the XML files. I think it is also what HL7 people use for validating instances.
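As an illustration of generating Schematron from archetype constraints, here is a minimal sketch that emits one rule with `count()` assertions for occurrence constraints. It is not the generator mentioned above: the input format (a dictionary of child names to minimum and maximum occurrences) and the element names are assumptions, and the output still has to be run by a Schematron processor, which is outside this sketch.

```python
import xml.etree.ElementTree as ET

# Namespace of ISO Schematron.
SCH_NS = "http://purl.oclc.org/dsdl/schematron"


def occurrence_rules(context, constraints):
    """Generate a Schematron schema that checks occurrences.

    context     -- XPath selecting the parent node the rule applies to
    constraints -- hypothetical dict: child name -> (min, max); max may
                   be None for "unbounded"
    """
    ET.register_namespace("sch", SCH_NS)
    schema = ET.Element(f"{{{SCH_NS}}}schema")
    pattern = ET.SubElement(schema, f"{{{SCH_NS}}}pattern")
    rule = ET.SubElement(pattern, f"{{{SCH_NS}}}rule", context=context)
    for child, (lower, upper) in constraints.items():
        test = f"count({child}) >= {lower}"
        if upper is not None:
            test += f" and count({child}) <= {upper}"
        assertion = ET.SubElement(rule, f"{{{SCH_NS}}}assert", test=test)
        upper_text = "*" if upper is None else upper
        assertion.text = f"{child} must occur {lower}..{upper_text} times"
    return ET.tostring(schema, encoding="unicode")


if __name__ == "__main__":
    print(occurrence_rules("composition", {"context": (1, 1), "content": (0, None)}))
```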

Hi Diego,

I am also using Schematron for validation purposes, but I use it together with RelaxNG, because RelaxNG is better at testing structures and Schematron is better at validating simple constraints. But creating a RelaxNG schema from an archetype is difficult; it results in complex, hard-to-maintain code. Now that I see your message, I think it could be possible to create Schematron only and check every possible constraint, which mostly (not on leaf nodes) are occurrences and cardinality. Do you have every possible constraint in an archetype covered by Schematron?

Bert

Hi Birger,

Could you tell more about why a XML database does not serve your
needs?

That is because we got both complex and simple data structures. For
example, we got millions of equally structured demographic data and
lab values that can perfectly be represented in a static data schema.
Furthermore, there is lots of catalogue data that fits best into
relational tables. (Another important reason is that our available
ETL and BI tools are optimized to work with SQL Server...)

Ah, yes, about the same reasons as ours. Thanks for sharing!

Are both versions available as open source? Yesterday I took a look
at it (just the code) and was impressed by your work!

Yes, they will be available as open source.

I'm wondering about its capabilities to do validation of clinical
data against archetypes. Is it possible to deserialize openEHR XML
and check it for consistency or is it a one way ticket so far?

Yes, it is possible, although it is not implemented yet. So far I have
only implemented serialization to openEHR XML. Deserialization will be
possible, but only for the archetypes known to MEDrecord.
In fact, the original idea was to just create an API which
(de)serialized everything to/from XML and just stores the XML.
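A minimal sketch of such a round trip, with deserialization restricted to known archetypes. This is not MEDrecord's code: the `Pulse` class, the `KNOWN_ARCHETYPES` registry, and the flat XML layout are assumptions for illustration and do not follow the real openEHR XML schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Pulse:
    """Hypothetical generated class for one archetype."""
    rate: float
    position: str


# Hypothetical registry: only archetypes listed here can be deserialized.
KNOWN_ARCHETYPES = {"openEHR-EHR-OBSERVATION.pulse.v1": Pulse}


def serialize(archetype_id, obj):
    """Write each dataclass field as a child element (simplified layout)."""
    root = ET.Element("entry", {"archetype_node_id": archetype_id})
    for field in fields(obj):
        ET.SubElement(root, field.name).text = str(getattr(obj, field.name))
    return ET.tostring(root, encoding="unicode")


def deserialize(xml_text):
    """Rebuild the typed object, refusing archetypes that are not known."""
    root = ET.fromstring(xml_text)
    archetype_id = root.get("archetype_node_id")
    cls = KNOWN_ARCHETYPES.get(archetype_id)
    if cls is None:
        raise ValueError(f"unknown archetype: {archetype_id}")
    # field.type is the annotated class (float, str), used as a converter.
    values = {f.name: f.type(root.findtext(f.name)) for f in fields(cls)}
    return cls(**values)


if __name__ == "__main__":
    xml_text = serialize("openEHR-EHR-OBSERVATION.pulse.v1", Pulse(72.0, "sitting"))
    print(xml_text)
    print(deserialize(xml_text))
```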

Regards,

Ralph van Etten
MEDvision360

For non-leaf nodes, I’m sure you can easily check occurrences and types. I don’t remember whether cardinality was taken into account; I have to check that. For leaf nodes, I don’t remember having any trouble specifying that kind of constraint.

Thanks :wink: Bert