openEHR prototype

Hi everyone,

I will develop a prototype in openEHR using openEHR.NET, C# implementation.

Basically I want to build a web service for an openEHR repository. I have studied relational, XML and object databases. Recently I discovered NoSQL databases. This type of database seems to match the openEHR objectives. There are four types of NoSQL databases: key/value, column-family, document and graph.

I think NoSQL document storage is the best type to use.

Does anyone know anything about this that could help me?

Thanks in advance for your time,
José Silva

Hi Jose,
Do you have a list of components you’ll need to develop to implement your prototype? Can you list them?

Regards
Seref

Hi Seref

I didn’t quite understand your question, but I will answer in two ways:

  • I want to create an openEHR database and one web-service for this repository. I don’t want to build or care about the user interface. The objective is that several applications could use this web-service. In the future it should be possible to convert openEHR data to other standards to communicate with other applications that don’t “talk” openEHR, but for now this is not important.

  • For now I will use only therapeutic and administrative prescriptions.

Regards
José Silva

Ok, you’re basically talking about implementing an openehr repository. You asked: “Does anyone know anything about this that could help me?”
If you don’t care about the access mechanism to your repository, you are free to use anything. If you have more specific requirements for saving to and reading from your repository, data size, transaction support etc, then you’re better with some options than others.

Right, this question was about the repository. I am thinking of using a NoSQL document store like MongoDB. Am I choosing the right type?

My repository must be prepared to handle a large amount of data. Query response speed is a bit more important than storage speed because there will be more queries than inserts/updates.

Regards,
José Silva

Jose,
Don’t get me wrong, I have no intention to offend you, but I think you’re trying to do something without adequately understanding what you’re trying to do.
EHR persistence in general is probably the most complex component of EHR implementation, both for openEHR and similar standards based data.

If you want to know if you’re choosing the right type of persistence technology, you (or anybody else) can only know this if there is a persistence strategy/design in place.
You’re going to have some representation of openEHR based data, and you’re going to build a mechanism to persist and to read it. There are many ways of doing it. You can use something like db4o, mongodb, relational dbs…
There is a huge missing block in your thinking and in your plan, and you’re asking questions which can be answered only if someone knows what is in that missing block.

My suggestion is, try to implement the simplest possible method of writing data to somewhere (even a file would do) and read it back somehow. See what it is like to have a complete data loop. I started with db4o about 5 years ago.
Do it in the fastest way you can, to see the whole picture, then start identifying the components and optimizing/replacing them.
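
A minimal sketch of such a write-then-read loop, using a plain JSON file. The record layout and the function names are made up for illustration and are not part of openEHR or openEHR.NET; Python is used only for brevity:

```python
import json
import os
import tempfile


def save_composition(path, composition):
    """Write one composition (here just a plain dict) to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(composition, f)


def load_composition(path):
    """Read the composition back from the same file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Hypothetical record; a real COMPOSITION is far richer than this.
    record = {"uid": "demo-1", "kind": "prescription", "dose_mg": 500}
    target = os.path.join(tempfile.mkdtemp(), "demo-1.json")
    save_composition(target, record)
    print(load_composition(target) == record)
```

Once this loop works end to end, the file can be swapped for any other store without touching the callers.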

Brutal honesty warning: It takes years of work to be able to answer your questions, and it is very valuable information, unlikely to be provided for free. So start building a small prototype, and ask questions as you encounter issues.

Regards
Seref

No, it’s ok Seref. You are right, I’m new in this field and I’m trying to start to do something.

Thank you for your advice.

Regards,
José Silva

Seref, I certainly wish I had received a response like this one 5-6 years ago! :smiley:

Jose, in (small) addition to what Seref says, your "problem" is NOT mongoDB...Your "problem" is not 100% technological.

The first thing that you have to do is understand the openEHR data structure and how a tiny little humble number (like a single OBSERVATION for example, just one number) is stored and associated with the rest of the data in a subject's EHR.

Do this mentally, based on the specification documents. Don't worry about technology. If you had to do this "by hand" how would you do it? Where do you start? What do you need? What do you need first, second, third, etc. And then try to generalise to how would you do it for Archetypes & Templates of ANY structure.

What you are dealing with here is a Dual Model (it specifies both a Reference Model (RM) and an Archetype Model (AM)). The Reference Model specifies abstract data structures (like a list of numbers for example, or a table, etc) and the Archetype Model specifies how these abstract data structures are pieced together into even larger structures that support a specific use-case. (And Templates do the exact same thing by piecing together different Archetypes).

The key-point here is that the AM contains a huge amount of extremely important data: _Constraints_ . And, it is not always possible to map an Archetype's (or Template's) constraints to whatever similar mechanism some DBMS is using. It's not always as simple as "NOT NULL"....It's more like "This list of numbers should be between 4 and 8 items long and each entry of this list, being a number, should have constraints (lowLimit<(<=)x<(<=)highLimit) that depend on the unit that the user will select!!! (and which units are allowed are also specified BY the Archetype!!!!!)". In other words, an Archetype may be allowing you to specify "length" in feet / inches / meters / cm / etc and the constraint "0-1 meters" has different "physical" representation (0-1, 0-100, 0-3.28, etc) depending on the unit.....This "detail" cannot be ignored or simplified....This specificity is the actual objective.
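
To make the unit-dependent constraint concrete, here is a small sketch. It is not how any openEHR library represents constraints; the table, the names and the 4-to-8 cardinality are assumptions taken from the example in the paragraph above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


# Hypothetical constraint "length between 0 and 1 metre", written out once
# per unit the (imaginary) archetype allows. Units missing from this table
# are rejected, because the archetype also decides which units are legal.
LENGTH_LIMITS = {
    "m": Range(0.0, 1.0),
    "cm": Range(0.0, 100.0),
    "ft": Range(0.0, 3.28),
}

MIN_ITEMS, MAX_ITEMS = 4, 8  # hypothetical cardinality of the list


def check_quantity(magnitude: float, unit: str) -> bool:
    """True if the unit is allowed and the magnitude fits that unit's range."""
    limits = LENGTH_LIMITS.get(unit)
    return limits is not None and limits.contains(magnitude)


def check_list(quantities) -> bool:
    """quantities is a list of (magnitude, unit) pairs."""
    if not MIN_ITEMS <= len(quantities) <= MAX_ITEMS:
        return False
    return all(check_quantity(m, u) for m, u in quantities)
```

The same magnitude, 50, is valid in "cm" and invalid in "m", which is exactly the kind of rule a plain column constraint in a DBMS cannot express.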

Once this way of describing data is specified, once this huge data structure is in place, you can't just leave it there. You now need a way to query it. This is where the Archetype Query Language (AQL) comes in. This is a project on its own (I am not joking). You have to parse AQL and then plug the parameters into a function that will actually implement the query. The "best case" scenario is one where you can "translate" a query from AQL to whatever a DBMS is using....But that can turn into bloodshed pretty quickly too, so better keep AQL on the radar from the beginning.

Once you have your data in place, all impeccably organised and queryable...you can then start using them to actually generate some useful information...This is where GDL comes in (http://www.openehr.org/news_events/releases/20130311)…But we are already in 2020 by now (and everything has gone beautifully, ideally and full-time)...so let's come back to what you are trying to do.

In general, even ignoring the GUI part, you will end up implementing an openEHR DBMS to a certain extent. Whether it's totally file based like Seref is proposing or it makes use of facilities provided by some underlying DBMS, you will end up implementing functionality that CRUDs (Create, Read, Update, Delete) this large data structure specified over several PDFs on openEHR.org.

The mongoDB business is a tiny little branch of the tree (not insignificant). You have to stand back and appreciate the bigger picture because this will save you a huge amount of design-redesign cycles (and the more you have built, the worse the "tearing down" is).

Of course, you can always scour the XML of an Archetype (AT SOME VERSION!!!!!), grab all the paths and then assign values to those paths in a key/value kind of way and query the graph using the query language of the DBMS......Solved. (???)
(Maybe this is a way to handle JUST the "last 10 meters" of the persistence)
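
A rough sketch of that path/value idea, assuming the data has already been loaded into nested dicts and lists. The path syntax produced here is invented for the example and is not the openEHR path syntax:

```python
def flatten(node, prefix=""):
    """Yield (path, value) pairs for every leaf of a nested dict/list."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, f"{prefix}/{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        yield prefix, node


if __name__ == "__main__":
    # Hypothetical observation; the keys are illustrative only.
    observation = {"data": {"events": [{"systolic": 120, "diastolic": 80}]}}
    for path, value in flatten(observation):
        print(path, "=", value)
```

Each pair could then be written to a key/value store as is. Note that the paths are only stable for one version of the structure, which is the caveat raised above.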

I am in no way trying to scare you off, but as Seref says, you need to understand what is going on, and you need to do this before you write even a single line of code. This will bring the right questions to the surface.

I hope this helps.

All the best
Athanasios Anastasiou

Hi Jose, I think trying & learning from current openehr open source software is the best first step for what you are trying to do.

Ing. Pablo Pazos
www.cabolabs.com

Actually,
I’d still suggest that he tries it on his own first. It takes about 2 years to learn how to write code properly with a language. It takes 20 to learn how to read code.
Due to the massive number of frameworks and concepts, today’s code is a whole different challenge compared to 20 years ago. He’d lose lots of time just setting up stuff, following function calls in source code etc etc.

My humble suggestion to Jose is write first, read later.

Hi everyone

I don’t know if it’s just me, but I’m not sure that Jose’s question really got answered.

I myself am starting to dive into development of an openEHR system and I found some of the comments and responses to Jose’s original question a little puzzling. Specifically the statement “It takes years of work to be able to answer your questions, and it is very valuable information, unlikely to be provided for free”.

Seref, I have read your blog and seen your postings, and I respect your knowledge of openEHR and contributions. But I don’t know why asking about database implementation would take years to answer and not be provided for free. I think he just wants an opinion from those who have implemented a NoSQL datastore.

Maybe I’m not understanding this correctly (so please correct me if I’m wrong), but I think that we need to understand that there are going to be all kinds of developers with different levels of experience. We want to attract as many of these people as possible and make it as easy as possible to start their journey into openEHR. And therefore, it would be great if someone who has used a NoSQL database like Mongo could answer the question about their experience with it, whether it worked and what the pitfalls are. This kind of information should be shared as it increases the chances of the mainstream adoption of the project itself as a whole.

openEHR can be quite intimidating even for experienced developers because of things like the dual model approach, AQL, ADL, Archetypes, Templates, etc. These are not ideas that are necessarily mainstream in regular computer software development. And so putting all of these things together is frankly intimidating since there is kind of a black hole on exactly how to implement all of this together. We want to show that implementation is possible and share ideas of how. There are a lot of ideas about how to go about doing this, and it would be great if this kind of information was shared freely as our ultimate goal here is the same. My hope is that the openEHR community embraces this way of thinking as well.

Dr. Rob Stark

Hi!

Sorry for joining the discussion a bit late. I hope you don’t mind if I split the discussion thread and (soon) start another separate additional renamed thread-branch regarding modularization.

In [1] we discuss a modular persistence approach where also NoSQL persistence approaches could be plugged in.

Regarding NoSQL solutions, Sergio Freire has recently ended a productive post-doc year with us at Linköping University (we miss him already) and together we are now in the middle of experiments using different approaches:

  • Some initial XML experiments were reported in [2]. As expected the investigated XML-databases did of course not scale for epidemiological queries if used in a simple straightforward non-optimized way, but some (e.g. BaseX) worked well for non-epidemiological queries where you already know the patient identity.
  • Hadoop with an openEHR-path-specialized indexing mechanism and map-reduce (experiment led by Fang Wei-Kleiner)
  • Couchbase with openEHR data stored in JSON format (experiment led by Sergio Freire)

All of the solutions above will likely be available as open source later, but as you probably understand they are experimental incomplete research implementations done in very limited time with limited resources and thus far from ready for production use. The preliminary performance results so far regarding the last two are promising also for epidemiological queries. (Sergio is also exploring an RDBMS-based variant.)

I know that the vendor Marand has been exploring different NoSQL approaches too (including MongoDB) before settling on a well performing RDBMS-hybrid approach using an additional inverted index. This is mentioned briefly in an upcoming survey paper [3] regarding different openEHR persistence implementations used around the world.

Best regards,
Erik Sundvall
erik.sundvall@liu.se http://www.imt.liu.se/~erisu/ Tel: +46-13-286733

References:
[1] Sundvall E, Nyström M, Karlsson D, Eneling M, Chen R, Örman H. Applying representational state transfer (REST) architecture to archetype-based electronic health record systems. Accepted to BMC Medical Informatics and Decision Making. 2013;
Preprint manuscript available via email request from erik.sundvall@liu.se
Limited parts of the paper are also described in chapter 3.2 of my thesis http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-87702

[2] Freire S M, Sundvall E, Karlsson D, Lambrix P. Performance of XML Databases for Epidemiological Queries in Archetype-Based EHRs. In: Proceedings Scandinavian Conference on Health Informatics 2012. Scandinavian Conference on Health Informatics 2012, October 2-3, Linköping, Sweden. Linköping: Linköping University Electronic Press; 2012. p. 51-57. Linköping Electronic Conference Proceedings, 70.
Available from: http://www.ep.liu.se/ecp/070/009/ecp1270009.pdf

[3] Samuel Frade, Sergio M. Freire, Erik Sundvall, José Hilário Patriarca-Almeida and Ricardo Cruz-Correia. Survey of openEHR storage implementations. Submitted to CBMS 2013

Let's try to get a little perspective here.

The problem of storing openEHR data in a database is the same as for any other so-called 'complex object data'. The openEHR data are defined by the openEHR Reference Model. That has around 110 classes, including 28 clinical data types, and 40 descendants of LOCATABLE - this includes the demographic model classes.

There are probably around 65 classes that 'matter' to normal developers. Most data are structures consisting of COMPOSITION / SECTION / various ENTRY types / HISTORY / EVENT / CLUSTER / ELEMENT, and most clinical data complexity ends up as just more CLUSTER / ELEMENT structures, i.e. 'free hierarchy'.

This class model is pretty tractable for software developers.

In terms of persistence, the main complicating factor is the versioning, which although it is modelled in a specific way in openEHR (and includes the 'Contribution' concept, i.e. change-sets), is far from an openEHR-unique feature. So to implement an openEHR persistence solution, you just need to be able to store:

  * all updates as change-sets, known as Contributions, where
  * each Contribution consists of one or more
      o logical content item additions
      o logical content item deletions
      o logical content item changes
  * and where each such content item is either
      o a COMPOSITION or
      o one of the type EHR_ACCESS, EHR_STATUS or FOLDER
      o in the case of demographics, a PARTY or PARTY_RELATIONSHIP.
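
The list above can be sketched as a tiny in-memory store. This is only an illustration of the change-set idea; the class and field names are hypothetical and do not follow the openEHR class definitions, and audit details, version identifiers and persistence are all left out:

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

KINDS = ("addition", "change", "deletion")


@dataclass
class Change:
    item_uid: str                             # uid of the versioned content item
    kind: str                                 # one of KINDS
    content: Optional[Dict[str, Any]] = None  # None for deletions


@dataclass
class Contribution:
    committer: str
    changes: List[Change]


class VersionedStore:
    def __init__(self) -> None:
        self.contributions: List[Contribution] = []
        # item uid -> every version ever committed; None marks a logical deletion
        self.history: Dict[str, List[Optional[Dict[str, Any]]]] = {}

    def latest(self, item_uid: str) -> Optional[Dict[str, Any]]:
        versions = self.history.get(item_uid)
        return versions[-1] if versions else None

    def commit(self, contribution: Contribution) -> None:
        uids = [c.item_uid for c in contribution.changes]
        if not uids or len(set(uids)) != len(uids):
            raise ValueError("need one or more changes, at most one per item")
        # Validate everything first so the change-set is applied all-or-nothing.
        for change in contribution.changes:
            if change.kind not in KINDS:
                raise ValueError(f"unknown change kind: {change.kind}")
            exists = self.latest(change.item_uid) is not None
            if change.kind == "addition" and exists:
                raise ValueError(f"{change.item_uid} already exists")
            if change.kind != "addition" and not exists:
                raise ValueError(f"{change.item_uid} does not exist")
            if change.kind != "deletion" and change.content is None:
                raise ValueError("additions and changes need content")
        # Nothing is overwritten: every change appends a new version.
        for change in contribution.changes:
            version = None if change.kind == "deletion" else change.content
            self.history.setdefault(change.item_uid, []).append(version)
        self.contributions.append(contribution)
```

A deletion here is logical: the earlier versions stay in the history, so the record as it stood at any earlier point can still be reconstructed.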

I wouldn't regard this as a 'big problem' - and it's not specific to openEHR.

All of the 'complexity' of openEHR resides in how the data are understood at the next level, when created in the first place, and when retrieved from persistence. From the point of view of basic store and retrieve by e.g. some indexed UID, this doesn't matter.

However it starts to matter when you get into retrieval by query. Then you suddenly start to care 'what the data mean' and about how to implement a query processor based on not just the structure (the reference model classes) but also on the 'soft typing' provided by archetypes / templates.

Seref's comments relate to implementing not just brute-force persistence, but also query-based retrieval, and doing both in a truly scalable and portable way. In other words, the requirements of the 'maximum capability' openEHR solution one could imagine.

Obviously it's reasonable to implement lesser things, and I would encourage this.

The need for the two functions (persistence and querying) doesn't go away, but we can certainly make some simplifications if we don't assume the need for e.g. 20 million EHRs with sub-second query response from day 1. So I would encourage developers to focus on a) a working SQL or noSQL persistence solution and b) even a rudimentary AQL query processor. By the way, a 'no SQL persistence solution' can be achieved with a normal SQL database like MySQL - just use blobbing. Or use one of the XML databases like http://www.exist-db.org/exist/apps/doc/ . I think the performance may be weak at the outset, but Erik Sundvall's and other groups have been looking at these solutions and gathering evidence as to what kind of performance can be expected and realised.
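
As an illustration of the blobbing approach, here is a sketch that uses SQLite from the Python standard library in place of MySQL, purely so the example runs without a server. The table layout and function names are assumptions made for the example:

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store with one row per serialized composition."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS composition ("
        " uid TEXT PRIMARY KEY,"
        " ehr_id TEXT NOT NULL,"
        " body BLOB NOT NULL)"
    )
    return conn


def put(conn, uid, ehr_id, composition):
    """Serialize the whole composition and store it as an opaque blob."""
    body = json.dumps(composition).encode("utf-8")
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO composition (uid, ehr_id, body) VALUES (?, ?, ?)",
            (uid, ehr_id, body),
        )


def get(conn, uid):
    """Fetch one composition by uid, or None if it is not stored."""
    row = conn.execute(
        "SELECT body FROM composition WHERE uid = ?", (uid,)
    ).fetchone()
    return None if row is None else json.loads(row[0].decode("utf-8"))
```

The database only ever sees the uid and the EHR id; everything inside the blob is invisible to SQL, which is why retrieval by uid is easy and retrieval by content needs a separate query layer.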

Once you decide to go no-SQL, either blobs, path-based, XML or whatever, the AQL query implementation becomes pretty easy (again, the hard stuff is the optimisation, not the core functionality) and I would encourage developers to try something here.
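
A brute-force sketch of the core functionality, with no optimisation at all. This is not AQL: a real processor would parse the query text, whereas here the archetype id, the path and the condition are passed in directly. The dict layout and the slash-separated path convention are assumptions made for the example:

```python
def resolve(node, path):
    """Follow a slash-separated path through nested dicts and lists.

    Returns None when any step of the path is missing.
    """
    for step in path.strip("/").split("/"):
        if isinstance(node, dict) and step in node:
            node = node[step]
        elif isinstance(node, list) and step.isdigit() and int(step) < len(node):
            node = node[int(step)]
        else:
            return None
    return node


def select(compositions, archetype_id, path, predicate):
    """Scan every composition and keep those whose value at path satisfies predicate."""
    matches = []
    for composition in compositions:
        if composition.get("archetype_id") != archetype_id:
            continue
        value = resolve(composition, path)
        if value is not None and predicate(value):
            matches.append(composition)
    return matches
```

For example, `select(all_compositions, "blood_pressure", "/data/systolic", lambda v: v > 140)` returns the high readings, where `all_compositions` and the archetype id are again hypothetical. Every call deserializes and walks every record, which is the optimisation problem mentioned above.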

As a final suggestion, would it help to set up an 'openEHR persistence' wiki page and even mailing list to gather intelligence across the community and share it better?

- thomas beale

There already are a few good starting points for such a wiki page:

http://www.openehr.org/wiki/pages/viewpage.action?pageId=786487
http://www.openehr.org/wiki/display/resources/Persistence+FAQs
http://www.openehr.org/wiki/display/dev/Design+pattern+for+persistence (to a certain extent)

Hi Robert,

This is a paper that describes an implementation of openEHR and
a querying system using Scala and MongoDB
http://link.springer.com/chapter/10.1007%2F978-3-642-37134-9_15

Shinji

Hi,

I would like to thank you all for your opinions.
I’m seeing that this community is very active and I’m learning too with your comments.

Rob saw my point and I agree with his opinion.

I’m exploring C# implementation from CodePlex to create my web-service.

I think my steps should be:

  1. Choose the archetypes I will need;
  2. Create one class for each archetype allowing this use for several templates;
  3. Create a repository to store this information taking into account what Thomas Beale said (as I said, I was thinking of NoSQL);
  4. Store and query the repository creating the respective modules;
  5. Use the queries’ answer to build an XML message to send to the client that did the query.

Please, correct me if I’m wrong or missing something.

By the way, this project I’m doing is for my master’s thesis.

Regards,
José Silva

Jose,
I do not understand what you mean by #2. Have you seen this?: http://serefarikan.com/2012/11/08/openehr-for-practical-people-cleaned-up/
Did you read this bit on Codeplex?: http://openehr.codeplex.com/documentation

Thanks Shinji. Are you using any of these databases with your Ruby Implementation?

Jose,

Your steps are a little vague so it’s hard to know for sure what you are trying to do on a few steps. But I thought I might put this out there for others to look at what we are going to try and do. And by the way, thanks everyone for pointers about nosql. I still have to read up on all this so I may find the answers I am looking for in those documents.

Right now we are still researching. So here is where we are starting at with our Rails Project. Please let me know if I am way off base on any of this.

  1. We will develop specialized archetypes for the dental domain using the archetype editors. We can then use these in conjunction with openEHR archetypes to develop templates.

  2. We are going to use the Ruby library (Thank you Shinji!) to convert our ADL documents into object models in Ruby for the system to manipulate (Create, Edit, Delete Objects). Once a patient record is created, we will store it as JSON data in either Mongo or Couchbase. We haven’t really decided on which one to go with yet.

  3. We will try and use JSON formatted data when communicating with clients (iOS and browser).

So right now, where I am a little foggy is on how we are going to query and what we are going to do with AQL. I am still researching this. Of course I know we can query JSON data from the nosql database, but I have more to work on here.

Just wondering Jose why you would use xml for communication when you could use json which is supported by many languages as well as mongo?

I know I’ll sound grumpy again when I say this, but here we go:

Most nosql databases are quite young. They are all emphasizing the ease of scalability, and the relatively easier models of querying data. If anybody here is looking at a nosql db for a system that is going to be used for clinical care, my suggestion is to consider the ACID support and immediate consistency.

Most nosql dbs argue that durability is achieved via replication: the data is there, one way or another. However, not all installations can afford multiple servers, and most nosql servers also scale with eventual consistency. During clinical care, you can’t rely on eventual consistency. You want immediate consistency.

Relational dbs are extremely good at ACID and immediate consistency, which is the reason they can’t easily scale out: you need a global transaction manager to scale out with immediate consistency, and that is really really hard. Please take a look at this before getting overexcited about nosql db concepts: http://en.wikipedia.org/wiki/CAP_theorem So far no one has beaten the CAP theorem.

The nosql hype in a way reminds me of the perpetual motion machine claims that never die. Yet, the second law of thermodynamics is out there, unchallenged.

So if you’re doing research work, where you are not building an OLTP system for clinical care, nosql may be great. But if you’re responsible for systems that will support clinical care during the actual care process, I suggest a bit more thinking.

I’ll simply post a link to another message I’ve written in the past for JSON: http://lists.openehr.org/pipermail/openehr-technical_lists.openehr.org/2011-December/006406.html Please check the bits about abstract types and concrete instances etc.

Without AQL, you simply can’t have portable access to data. If I have to learn access method A to read from openEHR repo A and access method B to read from openEHR repo B, then how on earth are we going to have smart healthcare apps that can run across multiple systems? Adapting systems to each repository is so costly that it’ll never take off. Just for the fun of it, google “curly braces problem” and see what that brings.