Best Database for openEHR

@pablo, I struggle with this. I am new to micro-services and still learning, but my current concept of an EHR doesn’t seem to fit the general micro-service model. From my understanding, each micro-service (bounded by a business context that is determined by behavior/function) owns its own data. Wouldn’t the openEHR model of having one massive data repository violate this? How can one conform to the core tenet of micro-service architecture (services that have independent behavior and control their own data) and still maintain the outward appearance of a consolidated data store built with openEHR archetype components? What are your thoughts on this?

Thanks for your thoughts,

@woodpk

Hey @woodpk,

I posted an article related to this topic a while ago: https://www.cabolabs.com/blog/article/microservice_architectures_and_open_platforms_for_health_care_information_systems-5cab9ab60d88a.html

About your question: one thing is focusing on one business area per micro-service, and another is how “massive” the data repositories are. You could have a lot of data related to one business area; for instance, the clinical data repository is one area, and it is massive.

But, in general, you will have many systems in a hospital architecture; each can be focused on one business area, all can exchange the information they manage, and each can have its own repository. What you won’t have, if you are following the micro-service approach, is one database with all the data from all the components.

On a lower level, you might want to break the clinical repository into several repositories, for example to optimize usage (like using shards divided by time range), or to store different information “types” in different repos. At the end of the day, you will find that most clinical information consists of “documents”, so the clinical repository will be a massive document store, but it still focuses on one business area.
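To make the “shards divided by time range” idea concrete, here is a minimal sketch of routing compositions to shards by commit date. The shard names and yearly boundaries are hypothetical; a real CDR would usually delegate this to the database’s own partitioning features rather than application code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Shard:
    name: str
    start: date  # inclusive
    end: date    # exclusive


# Hypothetical yearly shards; names and boundaries are illustrative only.
SHARDS = [
    Shard("cdr_2019", date(2019, 1, 1), date(2020, 1, 1)),
    Shard("cdr_2020", date(2020, 1, 1), date(2021, 1, 1)),
    Shard("cdr_2021", date(2021, 1, 1), date(2022, 1, 1)),
]


def shard_for(committed: date) -> str:
    """Return the shard whose [start, end) range contains the commit date."""
    for shard in SHARDS:
        if shard.start <= committed < shard.end:
            return shard.name
    raise LookupError(f"no shard covers {committed.isoformat()}")


def shards_for_range(start: date, end: date) -> list[str]:
    """Shards overlapping [start, end): a time-bounded query only touches these."""
    return [s.name for s in SHARDS if s.start < end and start < s.end]
```

The payoff is on the read side: a query restricted to, say, the last few months only needs to visit the shards whose ranges overlap that window.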

Hope that helps :)

@pablo Thanks a lot for such a fast and thoughtful reply.

I can see how different subsystems could easily have their own openEHR data back end. I have been looking into the Task Planning aspect of openEHR, and I see these task planning systems as Sagas. Udi Dahan argues (I might be misinterpreting this) that Sagas are a better definition of a bounded context than what most people use (e.g. a Client/Patient Manager or another service defined in terms of a data model). It just seems like complex Sagas, with the data they “own”, would naturally “break up” or overlap the data sovereignty of the massive openEHR CDR document store.
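For readers unfamiliar with the Saga pattern, a minimal orchestration sketch may help: each step has an action and a compensating action, and if a later step fails, the completed steps are undone in reverse order. The step names (booking a slot, ordering a lab test) are hypothetical examples, and this is not the openEHR Task Planning API.

```python
from typing import Callable

# (step name, action, compensating action); each callable mutates a shared context.
Step = tuple[str, Callable[[dict], None], Callable[[dict], None]]


def run_saga(steps: list[Step], ctx: dict) -> bool:
    """Run steps in order; on failure, compensate the completed steps in reverse."""
    ctx.setdefault("log", [])
    done: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action(ctx)
        except Exception as exc:
            ctx["log"].append(f"failed {name}: {exc}")
            for done_name, _, compensate in reversed(done):
                compensate(ctx)
                ctx["log"].append(f"compensated {done_name}")
            return False
        done.append(step)
        ctx["log"].append(f"done {name}")
    return True


# Hypothetical task-plan steps, for illustration only.
def book_slot(ctx: dict) -> None:
    ctx["slot"] = "ward-3/bed-12"


def release_slot(ctx: dict) -> None:
    ctx.pop("slot", None)


def order_lab(ctx: dict) -> None:
    raise RuntimeError("lab system unavailable")  # simulated failure


def cancel_lab(ctx: dict) -> None:
    ctx.pop("lab_order", None)
```

Running `run_saga([("book_slot", book_slot, release_slot), ("order_lab", order_lab, cancel_lab)], {})` returns `False` and leaves no booked slot behind. The saga keeps its own progress state, which is the “data it owns” that the question is about.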

Do you think this issue could be solved by having the back-end store for each “service” be an openEHR CDR? Would that even apply to billing and other such services?

I know these are too many questions to discuss adequately in a forum post, but I appreciate your attention and thoughts.

Out of curiosity, has anyone ever attempted, or thought about, an event sourcing / event store strategy for openEHR persistence, @ian.mcnicoll?

I worked with event sourcing in a legal context. The concept would also work in a clinical context, because there not only the current state is important but also how we got there.

So it could make the openEHR data versioning system superfluous, which would be a relief. On the other hand, new versions of the same data should not be stored very often anyway.

Many queries are about the current state, and what I saw in event store systems were shadow databases that stored the current state where applicable.
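A minimal sketch of the two ideas in the last paragraphs: an append-only event log as the source of truth, a “shadow” current-state projection updated on every append, and replay of the log to recover the state at any earlier point (which is what could stand in for explicit version objects). The flat path strings are hypothetical simplifications, not openEHR archetype paths.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    ehr_id: str
    path: str      # hypothetical flat path, e.g. "bp/systolic"
    value: object
    seq: int       # position in the append-only log


@dataclass
class EventStore:
    log: list[Event] = field(default_factory=list)
    # "Shadow" read model holding only the current state per EHR.
    current: dict[str, dict[str, object]] = field(default_factory=dict)

    def append(self, ehr_id: str, path: str, value: object) -> Event:
        event = Event(ehr_id, path, value, len(self.log))
        self.log.append(event)  # the log is never updated in place
        self.current.setdefault(ehr_id, {})[path] = value  # keep projection in step
        return event

    def state_at(self, ehr_id: str, seq: int) -> dict[str, object]:
        """Rebuild one EHR's state as of log position seq by replaying events."""
        state: dict[str, object] = {}
        for e in self.log[: seq + 1]:
            if e.ehr_id == ehr_id:
                state[e.path] = e.value
        return state
```

State queries hit `current` directly, while “how did we get here” questions replay the log; in a real system the projection would live in a separate database, not in memory.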

I wonder whether an event store adds something to openEHR. I can imagine one strong point: event stores can handle long-lasting transactions. In banking and business, some transactions take days, but more often minutes or hours.

openEHR does not really support that. When a Composition is ready, it is stored in a single, very short transaction.

It would be better if the different parts of a Composition could be committed as soon as they occur. Then it becomes possible to store unfinished Compositions, or rather events (e.g. Observations), in case something happens that makes it impossible to finish the Composition, even when the client system goes offline.
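A minimal sketch of that idea, assuming a hypothetical `draft_entry` staging table (openEHR itself defines no such table): each entry is committed in its own short transaction as soon as it exists, so it survives a client going offline, and a later step assembles the drafts into one composition atomically. SQLite in memory is used only to keep the example self-contained.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # Hypothetical staging table for entries of not-yet-finished compositions.
    db.execute("CREATE TABLE draft_entry (composition_id TEXT, seq INTEGER, entry TEXT)")
    db.execute("CREATE TABLE composition (composition_id TEXT PRIMARY KEY, content TEXT)")
    return db


def add_entry(db: sqlite3.Connection, composition_id: str, entry: dict) -> None:
    """Commit one entry (e.g. an Observation) immediately, in its own transaction."""
    with db:  # commits on success, rolls back on exception
        (n,) = db.execute(
            "SELECT COUNT(*) FROM draft_entry WHERE composition_id = ?", (composition_id,)
        ).fetchone()
        db.execute("INSERT INTO draft_entry VALUES (?, ?, ?)",
                   (composition_id, n, json.dumps(entry)))


def finalize(db: sqlite3.Connection, composition_id: str) -> dict:
    """Assemble the drafts into one composition and clear them, atomically."""
    with db:
        rows = db.execute(
            "SELECT entry FROM draft_entry WHERE composition_id = ? ORDER BY seq",
            (composition_id,),
        ).fetchall()
        content = {"composition_id": composition_id,
                   "content": [json.loads(r[0]) for r in rows]}
        db.execute("INSERT INTO composition VALUES (?, ?)",
                   (composition_id, json.dumps(content)))
        db.execute("DELETE FROM draft_entry WHERE composition_id = ?", (composition_id,))
    return content
```

The trade-off is that queries must decide whether unfinished drafts count as clinical data, which is exactly the semantic question raised in the next paragraph.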

Another use case for long-lasting transactions in a clinical context is that Compositions really can take hours or days to complete. For example, lab results can take longer, but semantically (one could argue) they belong to the Composition that provides the clinical context for the lab order.

Blockchain can also be a great help in such situations.

@Bert_Verhees One of the things I have been wondering about, given my understanding of the openEHR system structure, is how the main EHR ‘database’ operates under significant load from multiple users across regions. I’m sure I’m not seeing this correctly, but isn’t the main EHR database a significant single point of failure?

I was thinking that with a CQRS/event-sourced type of architecture, not only would you get the openEHR audit system by default, but you would also be able to partition your data into actual micro-services (services that own their own data). I suppose what I am having the hardest time wrapping my head around is that, while I have read @pablo's article on a micro-services implementation, it does not meet the definition of a micro-service as I understand it. The main EHR database seems like it might be a big monolith that all other services directly depend on?

Is my perception of this wrong? Are there things I am not seeing in the openEHR architecture?
I really appreciate the feedback both of you have given thus far. It is helping me understand the overall system better.

A database is always a potential single point of failure; from that perspective openEHR is not unique.

But that risk is greatly mitigated by cloud technology, with mirrored databases in several geographical locations, plus a local mirror for when the internet connection goes down. Your kernel should also run in the cloud in several geo-locations. The API will route you to the best mirror at that moment.

This is not openEHR-specific, and in our experience not all hospitals take the necessary precautions. Last week a hospital in the Netherlands had big problems in its server room and was almost a full day without data. This can cost human lives.

Whether with openEHR or another system, these kinds of problems are no longer necessary, and a hospital that has them without having taken precautions must be held responsible for loss of life if it occurs.

As I wrote, an event store would be a good idea for openEHR, and in fact for all kinds of clinical engines, but you need to solve the CQRS part, just as you would have to without event sourcing. That mainly means having an AQL engine, which is not very hard to arrange on a NoSQL database.

Because you run the queries against a serverless database in the cloud, you don’t need to know which mirror you are addressing; the cloud has intelligent solutions for all these kinds of situations.
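A minimal CQRS sketch, to show what “solving the CQRS part” means: the command side validates writes and publishes events but answers no queries, while a separate read model is kept up to date from those events and serves queries. The `BpRecorded` event and the threshold query are hypothetical stand-ins; a real openEHR system would expose AQL over the read side instead.

```python
from typing import Callable


class CommandSide:
    """Write model: validates commands and publishes events; it serves no queries."""

    def __init__(self) -> None:
        self.events: list[dict] = []
        self.subscribers: list[Callable[[dict], None]] = []

    def record_bp(self, ehr_id: str, systolic: int) -> None:
        if not 0 < systolic < 400:
            raise ValueError("implausible systolic value")
        event = {"type": "BpRecorded", "ehr_id": ehr_id, "systolic": systolic}
        self.events.append(event)
        for notify in self.subscribers:
            notify(event)


class QuerySide:
    """Read model: a denormalised view kept up to date from events."""

    def __init__(self) -> None:
        self.latest: dict[str, int] = {}

    def apply(self, event: dict) -> None:
        if event["type"] == "BpRecorded":
            self.latest[event["ehr_id"]] = event["systolic"]

    def ehrs_with_systolic_above(self, threshold: int) -> list[str]:
        # Stand-in for an AQL query filtering on the latest systolic value.
        return sorted(e for e, v in self.latest.items() if v > threshold)
```

Wiring is one line, `cmd.subscribers.append(query.apply)`; in a distributed setup that subscription would be a message bus, and the read model would be eventually consistent with the write side.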

Hi!

For many use cases of a specific healthcare provider (or region), I’d say that having a single logical EHR/CDR database is often a reasonable approach; you could then of course split it physically, using your own sharding or the automated distribution/sharding built into general-purpose data back-end products.
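One common way to physically split a single logical CDR is to shard by EHR id, so that every composition of one patient lands on the same shard and per-patient queries never cross shards. A minimal sketch, assuming a fixed shard count (the function name is hypothetical):

```python
import hashlib


def shard_for_ehr(ehr_id: str, shard_count: int) -> int:
    """Map an EHR id to a shard index. A stable hash (SHA-256) is used instead of
    Python's built-in hash(), which is salted per process for strings."""
    digest = hashlib.sha256(ehr_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

A design caveat: with plain modulo, changing `shard_count` moves most EHRs to a different shard, which is why products with built-in sharding typically use consistent hashing or range-based splits that can be rebalanced incrementally.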

Other parts of an openEHR platform than the EHR database/storage could be split into micro-services, though. Some suggestions are available in Applying representational state transfer (REST) architecture to archetype-based electronic health record systems (BMC Medical Informatics and Decision Making).

It would probably not be great to make every box in that image its own micro-service, but it may give some ideas regarding what to group or split. If you want to split by “owning data”, then the Demographics, the “Contribution builder” and some caches/trigger handlers could be suitable pieces.

The paper above also mentions some sharding options.

Please note that the paper pre-dates most openEHR REST-implementations and specifications, so the URLs and several other things/thoughts do not match the current openEHR REST standard.

Hi everybody,

I think it generally makes sense to partition EHRs horizontally. What could require more attention/implementation is “distributed” versioning. This would work similarly to git: there is a master CDR, and slave CDRs (like the database of a particular application system) can clone from the master. Commits can then be made on the slave systems and merged into the master.

In the architecture overview, this is briefly described as follows:

some EHRs in an EHR server in one location are mirrored into one or more other EHR servers (e.g. at care providers where the relevant patients are also treated); the mirroring process requires asynchronous synchronisation between servers to work seamlessly, regardless of the location, time, or author of any data created.

Plus the link to the spec: Common Information Model
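To illustrate the git-like workflow, here is a deliberately simplified sketch: versions are plain increasing integers, a slave clones the master, commits locally, and pushes back. The push fast-forwards if the master has not moved since the slave’s base version; otherwise it reports a conflict. The class and function names are hypothetical; the Common Information Model’s actual versioning uses version tree ids that can record branches instead of simply rejecting concurrent edits.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Hypothetical versioned store: object id -> list of (version_no, content)."""
    history: dict[str, list[tuple[int, str]]] = field(default_factory=dict)

    def head(self, obj: str) -> int:
        return self.history[obj][-1][0] if obj in self.history else 0

    def commit(self, obj: str, content: str, based_on: int) -> int:
        if based_on != self.head(obj):
            raise ValueError(f"{obj}: based on v{based_on}, head is v{self.head(obj)}")
        version = based_on + 1
        self.history.setdefault(obj, []).append((version, content))
        return version


def clone(master: Repo) -> Repo:
    return Repo({k: list(v) for k, v in master.history.items()})


def push(slave: Repo, master: Repo, obj: str, base: int) -> str:
    """Send slave versions newer than base to master. Fast-forward if master
    has not moved since base; otherwise report a conflict for a manual merge."""
    if master.head(obj) != base:
        return "conflict"
    for version, content in slave.history[obj]:
        if version > base:
            master.commit(obj, content, version - 1)
    return "fast-forward"
```

If two slaves both edit version 1 of the same composition, the first push fast-forwards the master to version 2 and the second returns `"conflict"`, which is the case that asynchronous synchronisation between CDRs has to resolve.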

I think it would be great if this could be implemented across openEHR vendors soon, so that we can really start syncing between vendor-independent CDRs and reach maximum scalability.

Best,

Birger

It is all arranged by the cloud providers; you don’t need to worry about that as a customer. Even if you have local databases, synchronisation can happen as a cloud service, because you can add local machines to the cloud mechanism.

Bert

Thanks for this interesting contribution

Bert

@woodpk for some published performance tests, see the links at the end of post #50 (Dec '19) above.

Good framework, abstracting the database layer and making quick development possible.

IMHO, the best suggestion in this thread.

Hi,
What do you think about Apache HBase?
We are developing an openEHR-based system and using HBase as our DB.
It would be great to have @thomas.beale's opinion too.
Thanks

I can’t comment on the basis of any experience with Apache HBase, but I read that it is conceptually Google’s Bigtable idea on top of Hadoop and HDFS. So whether this DB is any good for openEHR can really only be answered once you determine your schema approach. If you are going to use a 3NF approach, or close to it, I would not guess there is much advantage, and indeed standard Postgres is probably easier (you’ll want inheritance support between tables, for example; I don’t know if HBase has that).

On the other hand, if you are thinking of some esoteric schema architecture that creates large tables, maybe based on the idea of archetypes as table definitions (i.e. equivalent to classes), then it might indeed be interesting (templates then become views of archetype table projections; 1 view per template in the system). I think some Chinese openEHR projects have followed an approach like this. I’d consider it a research question rather than a production approach but you may know better. I’d be very interested to see it actually working!
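A toy illustration of the “archetypes as table definitions, templates as views” idea: one table per archetype, and one view per template that joins and projects only the archetype columns the template uses. The table, column and view names are hypothetical and heavily simplified (real archetypes have deep, optional, repeating structures), and SQLite in memory stands in for whatever big-table store would actually be used.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # One table per archetype (hypothetical, simplified columns).
    db.execute("""CREATE TABLE obs_blood_pressure (
        ehr_id TEXT, composition_id TEXT, systolic REAL, diastolic REAL, position TEXT)""")
    db.execute("""CREATE TABLE obs_pulse (
        ehr_id TEXT, composition_id TEXT, rate REAL, regular INTEGER)""")
    # One view per template: it projects only the columns the template constrains
    # (here 'position' and 'regular' are left out).
    db.execute("""CREATE VIEW tpl_vital_signs AS
        SELECT bp.ehr_id, bp.composition_id, bp.systolic, bp.diastolic, p.rate
        FROM obs_blood_pressure bp
        JOIN obs_pulse p ON p.composition_id = bp.composition_id""")
    db.execute("INSERT INTO obs_blood_pressure VALUES ('e1', 'c1', 128, 82, 'sitting')")
    db.execute("INSERT INTO obs_pulse VALUES ('e1', 'c1', 72, 1)")
    return db
```

Querying `SELECT systolic, diastolic, rate FROM tpl_vital_signs` then returns one row per composition. The open research questions are how to handle archetype specialisation, slots and repeating clusters in such a layout.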

The other thing HBase could be good for is as a target for ETL extracts for ‘study’ purposes i.e. implementing SDTM tables for research. Each ETL target then looks more like a ‘big table’ i.e. big data resource on which to do some data mining or Bayesian logic or whatever.
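The core step of such an ETL is flattening nested composition data into rows a big-table store or a research schema can hold. A minimal sketch that turns a nested structure into (path, value) pairs; the path syntax is ad hoc, not openEHR archetype paths or SDTM variable names, and mapping to SDTM domains would be a separate, study-specific step.

```python
def flatten(node: object, path: str = "") -> list[tuple[str, object]]:
    """Depth-first flattening of nested dicts/lists into (path, value) rows."""
    if isinstance(node, dict):
        rows: list[tuple[str, object]] = []
        for key, child in node.items():
            rows.extend(flatten(child, f"{path}/{key}"))
        return rows
    if isinstance(node, list):
        rows = []
        for i, child in enumerate(node):
            rows.extend(flatten(child, f"{path}[{i}]"))
        return rows
    return [(path, node)]
```

`dict(flatten(composition))` gives one wide row per composition, with one column per leaf path, which is the shape a “big table” is good at scanning.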

Certainly interested to know your experience.

Hi Pourya,

I cannot comment on HBase, but I’m interested in whether you tried EHRbase with its Postgres approach. I would really appreciate learning about your findings and what criteria are leading you to build your own openEHR server (of course, creating your own commercial IP is a sensible reason).

Hi Pourya,

Just to back up what Birger said: of course it is always great to see new openEHR CDR datastore implementations, and the specs are there to be used, but it is a pretty challenging engineering task. I always suggest that people interested in doing that work for a while with an existing openEHR CDR, so that they get their heads around a somewhat different paradigm of data management.

That is particularly true if you have an application waiting on the CDR to be developed but even if the primary goal is just to build a CDR, a bit of time working with an existing product will give you a much better feel for the challenge and potential solutions.

Of course you may know all that and want to press ahead, in which case feel free to keep asking questions, and good luck!!

I’d like to suggest considering a Distributed SQL database for this project. If you’re not familiar with this database category, it includes databases that are built natively as distributed systems. As “distributed-native” systems, they have that in common with NoSQL databases like Cassandra and MongoDB, mentioned in earlier comments here. But this category of databases also supports “SQL”, as the name indicates. Specifically, “SQL” refers to relational algebra and support for ANSI-standard SQL. In the past, we as an industry moved to NoSQL systems with the idea that relational databases could not scale cost-effectively, or at all in some cases. For the RDBMS technology at that time (~2007) that assertion was true. But now Distributed SQL databases have solved that problem and can effectively scale relational joins in a distributed system. Distributed SQL databases are easily adopted partly because they use existing common database protocols, viz. MySQL and PostgreSQL, so they work with existing drivers in any programming language, like Java, and with ORMs like Hibernate, mostly out of the box. Distributed SQL databases are also multi-model these days, meaning they support documents and JSON as first-class citizens just as well as traditional relational tables and rows. Because of these capabilities, Distributed SQL databases are replacing the last era of NoSQL databases like MongoDB and Cassandra as well as RDBMSs like Oracle, SQL Server and IBM DB2. Some examples of Distributed SQL databases are SingleStoreDB, CockroachDB and YugabyteDB. (Disclosure: I work for SingleStore.)

Thank you, Domenic, for your thoughts. Do you have any experience, insights or results (that you can share) from implementing openEHR on SingleStoreDB?