First I’d like to introduce myself: I’m a research associate at the University of Braunschweig, Germany. We are involved in a clinical data warehouse project at Hannover Medical School. I have a background in biomedical informatics and computer science.
We would like to use openEHR to generate a somewhat generic data model that serves the needs of researchers in translational medicine. I have an architecture in mind as follows:
Specialists create archetypes and templates for their specific domains. Then an XSD is created from the particular template. From the XSD we derive an XML document that is filled with data from our source systems (SAP, HL7 v2 messages etc.) with the help of ETL tools (data cleaning and so on). The document is then validated against the XSD and stored persistently in an XML database (or maybe MS SQL Server 2012; we would need to evaluate its limitations). This consolidated database serves as the repository for the creation of dedicated data marts.
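The fill step described above can be sketched roughly as follows. This is only a minimal illustration using the Python standard library: the element names (`observation`, `code`, `name`, `value`), the helper functions and the sample OBX segment are all hypothetical, and the structural check is a simple stand-in for real XSD validation, which would need a schema-aware library and the XSD exported from the actual template.

```python
import xml.etree.ElementTree as ET


def parse_obx(segment: str) -> dict:
    """Split one HL7 v2 OBX segment into the few fields used in this sketch."""
    fields = segment.split("|")
    # OBX-3 is a coded identifier: code^text^coding system
    code, name, _system = (fields[3].split("^") + ["", "", ""])[:3]
    return {"code": code, "name": name, "value": fields[5], "units": fields[6]}


def build_document(record: dict) -> ET.Element:
    """Fill a small XML document from one cleaned source record.

    The element names are hypothetical; a real document would follow
    the XSD generated from the template.
    """
    root = ET.Element("observation")
    ET.SubElement(root, "code").text = record["code"]
    ET.SubElement(root, "name").text = record["name"]
    value = ET.SubElement(root, "value", units=record["units"])
    value.text = record["value"]
    return root


REQUIRED = ("code", "name", "value")


def missing_elements(root: ET.Element) -> list:
    """Stand-in for schema validation: list required children that are absent or empty."""
    missing = []
    for tag in REQUIRED:
        child = root.find(tag)
        if child is None or not (child.text or "").strip():
            missing.append(tag)
    return missing


obx = "OBX|1|NM|8480-6^Systolic blood pressure^LN||120|mm[Hg]"
doc = build_document(parse_obx(obx))
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
print("missing:", missing_elements(doc))
```

Only documents for which the check reports nothing missing would be passed on to the database in this sketch.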
As far as I understand the architecture of openEHR, we don’t need any of the openEHR ‘server’ functions when our goal is to store data according to the openEHR reference model and the data is just for research purposes. Is there a trial version of the Template Designer? The sales people at Ocean Software have not responded yet.
This is of course just a rough sketch, but I would highly appreciate some comments and thoughts about this approach. To be honest: at first I wanted to give the RIM a try. Then I tried their tools. End of story.
Hi Birger,
This will only work if the templates do not need to be validated against the underlying archetypes and reference model; that is, if the templates and their data are treated as a list of paths and values with no reference to any underlying model. For certain kinds of data mining, this may do. But if that is your plan, why would you need XML Schema? What would you be validating?
----------------
If this is not the case, consider the following. If ADL had the same characteristics as XML Schema (1.1), we would not have needed ADL; we would simply have written our archetypes in XML Schema (1.1). But this is not the case. We need ADL.
There are XSDs representing the Reference Model, and you can use them to validate at that level. You can, I think, validate that an XML document represents an instance that is valid against the Reference Model. But there it stops: you cannot specialize XSDs in the same way you specialize the Reference Model into archetypes. This is because extension and restriction have a different meaning in XML Schema than in the reference model. You will run into conflicts. So you cannot check whether a derived XSD is valid against the top XSD. Every archetype must be represented by a stand-alone XSD.
You will also run into another problem: XML Schema does not allow siblings that have the same name but different types. This happens all the time in archetypes. For example, an item-list has elements, and one element has a dv_text as value, the other a dv_boolean. The XML Schema parsers (from Saxon and also Apache) see them as different types, so the item-list mostly cannot have children called “element”. You can solve that by name-mangling to get a valid XSD. I would not advise this route.
More people have run into these problems. One has solved it by giving every element a different name, with a GUID in it. It has become a completely new Reference Model. You can find it when you search for MLHIM. I do not comment on this solution, to avoid an old discussion.
It is also possible to validate XML representing rm-objects with RelaxNG together with Schematron. But you have to write an engine that translates ADL to these schema languages. Writing that engine is a bit hard, but I still think this is the best way, and there are complexities which you may not realize now. For example, how do you validate documents that have archetype slots in them?
----------
I saw other people are offering help too; maybe your project can be arranged in a way that you will not run into problems.
Good luck.
Bert
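Two of the points in Bert's post can be illustrated with a small sketch: treating the data as a plain list of paths and values, and the case of same-named siblings whose values have different types. The sample XML is a simplified, hypothetical fragment (not canonical openEHR XML), and the function names are invented for this illustration; only the Python standard library is used.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Simplified, hypothetical item-list: two siblings called "element",
# one carrying a DV_TEXT value and the other a DV_BOOLEAN value.
SAMPLE = """<items xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <element><name>Comment</name><value xsi:type="DV_TEXT">stable</value></element>
  <element><name>Fasting</name><value xsi:type="DV_BOOLEAN">true</value></element>
</items>"""


def path_values(node, prefix=""):
    """Yield (path, text) for every leaf, numbering same-named siblings."""
    counts = defaultdict(int)
    for child in node:
        counts[child.tag] += 1
        path = f"{prefix}/{child.tag}[{counts[child.tag]}]"
        if len(child) == 0:
            yield path, (child.text or "").strip()
        else:
            yield from path_values(child, path)


def sibling_value_types(parent, tag="element"):
    """Collect the xsi:type of the value under each same-named sibling."""
    return [e.find("value").get(f"{{{XSI}}}type") for e in parent.findall(tag)]


root = ET.fromstring(SAMPLE)
pairs = list(path_values(root))
types = sibling_value_types(root)

for path, value in pairs:
    print(path, "=", value)
print("value types under 'element':", types)
```

The path/value list carries no type information at all, which is the "no reference to any underlying model" case. The second function shows why one schema declaration named "element" is not enough once each sibling's value has to be constrained to a different type.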
The Template Designer is free to use these days. It will be open sourced soon (when we get around to cleaning up the code a bit). The usual build server is not available right now, so we have added a download link on the openEHR website. - thomas beale
I work for Ocean Informatics and am sorry to hear you have not heard back from my colleagues. I have emailed you privately to follow this up. I am a clinical modeller, and although I have a good grasp of openEHR technology I am not technical, so take what you wish from my comments!!
Although you are correct that in theory you do not need many of the advanced features of an openEHR server (clinical data repository) and might be able to do something simple as you have suggested, this can be a tricky area, as Bert has pointed out, and I would urge you to take advantage of one of the existing implementations. openEHR can be a complex technology, and there are myriad academic attempts to do something quick and simple that tend to consume a great deal of resource and divert from the key task at hand. Peter Linhardt’s offer is well worth considering, since his team has practical experience of persisting and querying openEHR data.
The approach I would take (as used by Peter’s team) is based on the Ocean Template Data Document approach but has no dependency on the Ocean server or any other paid-for product. There are several other non-Ocean products which use an identical approach. The basic approach is to create a template using the Ocean Template Designer which aligns to your input data as closely as possible. This template is used to export a Template Data Schema (TDS, an XSD), and from that a Template Data Document (TDD, instance data) is created by populating it (normally via XSLT) from your source data. There is a standard transform (from any TDS) to ‘canonical’ openEHR XML, available at
The key requirements I think you need to consider are:
Can I easily persist and retrieve openEHR compositions without getting deeply embroiled in coding?
Can I query the data easily (critical in your case), preferably using AQL (Archetype Query Language)?
I still think you would be better to approach some existing server providers (or take up Peter’s offer) to save you considerable effort. Once you have a better understanding of openEHR technologies, you might well decide to ‘roll your own’ persistence layer but that would be my last resort not my first!
I hope I have managed to avoid any commercial bias here. To my knowledge at least 3 other openEHR developers are using the TDD approach.
Thanks for the fast responses. You are right that I'm still pretty new to openEHR; that's the reason why I try to understand things on a high level first.
The solution we are trying to find does not have to be quick and simple; I would prefer to call it lean and appropriate. If there is a good reason to use an openEHR server, then we should take that road. From my perspective, the most important feature that we need is a good set of modeling tools and a healthy underlying reference model that enables us to develop a data model that is somewhat stable and independent of source systems.
Obviously I was ambiguous in my description of what we want to achieve, as your first suggestion (TDD) is similar to what I had in mind (I found some of your company's slides that go into more detail about such an approach). If I understand you correctly, validation is done by an openEHR server rather than by W3C schema validation against the TDS.
I will of course take up the kind offer and get in contact with Peter to learn how they achieved things.
Thank you very much again for giving me a good start.