Data extraction from openEHR - a demanding and challenging task?

This is actually the most time-consuming task, but it is entirely independent of the data source: it could be a plain text file, CSV, SQL, whatever, even non-tabular. The challenge is to annotate the source data with enough information that it can actually be mapped to an openEHR reference model hierarchy. With that part done, it’s very easy to generate an OPT.

So the challenges are:

  1. Design the right metadata, which means taking into account the openEHR ENTRY ontology and the data value model.
  2. Actually annotate the source data.

With those two steps done, generating the OPT is trivial.
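As a rough sketch of step 2, the annotation metadata could be a mapping from source columns to archetype node paths plus data value (DV_*) types. The archetype paths, column names and helper below are purely illustrative, not taken from any real template or tool:

```python
# Hypothetical annotation metadata: each source column is tagged with the
# archetype node path and openEHR data value (DV_*) type it maps to.
# Paths and names are illustrative only.
ANNOTATIONS = {
    "systolic": {
        "path": "/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value",
        "rm_type": "DV_QUANTITY",
        "units": "mm[Hg]",
    },
    "diastolic": {
        "path": "/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value",
        "rm_type": "DV_QUANTITY",
        "units": "mm[Hg]",
    },
}

def annotate_row(row):
    """Turn a flat source row into path/value pairs ready for RM mapping."""
    nodes = []
    for column, ann in ANNOTATIONS.items():
        nodes.append({
            "path": ann["path"],
            "value": {
                "_type": ann["rm_type"],
                "magnitude": float(row[column]),
                "units": ann["units"],
            },
        })
    return nodes

nodes = annotate_row({"systolic": "120", "diastolic": "80"})
print(nodes[0]["value"]["magnitude"])  # 120.0
```

Once every source field carries this kind of annotation, walking the annotations to emit the corresponding template structure is mostly mechanical.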

Our openEHR CDR implementation (https://atomik.app/) is 100% relational, so there’s no problem there. It will always depend on how you design your relational database (you need enough semantics to support your use case and also to extract openEHR-valid data from it). You can get creative and use a mixed approach too, like relational+files or relational+document (note this could be within the same DBMS or by combining two different systems).


openEHR archetypes use 2-level modeling. All(?) CDR implementations store data at the “RM” level. The 100% relational approach would be to store the data at the “domain” level.

There are research papers on using object-relational mapping (ORM) to achieve this, but I’m not aware of any implementation using ORM to store openEHR data.

What I want to test is transforming RM data to primitive SQL tables and storing openEHR data using traditional RDBMS approaches. Additionally, all relationships between RM objects use native foreign keys. This way clinical modelers use 2-level modeling to model archetypes, which are then stored as “low-level” RDBMS tables (I don’t have a good name for this “traditional” way of storing data in relational tables).

This means there is a “blood pressure” table in the DB. This has been controversial (but requested by the followers of the Domain Driven Design). I believe @NY_Frank has a similar approach in mind with “an initially RDBMS-based model”.
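To make the “blood pressure table” idea concrete, here is a minimal sketch using SQLite: a domain-level table linked to its parent composition through a native foreign key. The schema, table and column names are illustrative, not taken from any real CDR:

```python
import sqlite3

# Hypothetical "domain level" storage: a dedicated blood_pressure table,
# linked to its parent composition via a native foreign key.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE composition (
        id           INTEGER PRIMARY KEY,
        ehr_id       TEXT NOT NULL,
        archetype_id TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE blood_pressure (
        id             INTEGER PRIMARY KEY,
        composition_id INTEGER NOT NULL REFERENCES composition(id),
        systolic_mmhg  REAL,
        diastolic_mmhg REAL,
        recorded_at    TEXT
    )""")

conn.execute("INSERT INTO composition VALUES "
             "(1, 'ehr-123', 'openEHR-EHR-COMPOSITION.encounter.v1')")
conn.execute("INSERT INTO blood_pressure VALUES "
             "(1, 1, 120, 80, '2024-05-01T10:00:00')")

# Querying is plain SQL with a join, as in any traditional RDBMS design.
row = conn.execute("""
    SELECT c.ehr_id, bp.systolic_mmhg, bp.diastolic_mmhg
    FROM blood_pressure bp
    JOIN composition c ON bp.composition_id = c.id
""").fetchone()
print(row)  # ('ehr-123', 120.0, 80.0)
```

The obvious trade-off is that every archetype needs its own table (or set of tables), so schema management moves from the RM level to the domain level.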

I want to implement a CDR using both approaches and then compare their performance. It is just an interesting project to avoid boredom :blush:

Well, Atomik uses a 100% relational approach with ORM.

It’s based on the EHRServer approach and tech stack, though it has been improved, extended and optimized in many areas.

I think you checked the EHRServer already, which was the first approach to an open source openEHR CDR, back from 2012-2013. It was derived from my thesis work from a couple of years earlier, called EHRGen, which worked with archetypes instead of templates. In that one I merged the CDR and the data entry together, so it was an app with openEHR storage, with all forms autogenerated on the fly. I think that’s also in my GitHub if you want to check it. EHRGen is on the same tech stack and uses ORM. In fact I’ve been using that tech stack with ORM since 2007.

It’s OK to defer these, but you’ll almost certainly find bolting them onto your relational implementation later to be too challenging a task.

That’s OK too. I’ve long advocated benefiting from openEHR at whatever level works for you. If you look at the models in the international CKM and draw something on a napkin inspired by those, that’s a win in my book.

That being said, analysis on openEHR data is a different beast. I’ll be talking about this in the upcoming Barcelona event: we’ve been running an analytics server for various clients for years now. You can represent openEHR data in a relational format, sure, but scaling that in different dimensions (usability, performance etc) is not a trivial task.

You can certainly drop some of the OLTP-related concerns from your design, but I don’t want to say it’ll be a walk in the park when you mention patient journey views and the like. The more you optimise for analytics, the further you’ll have to move from designs that can support AQL and composition persistence/retrieval in a performant way. Others may disagree, but that’s my 2 pennies.

So my feedback is: you can build something for analytics inspired by openEHR, but I’d say you’ll find it difficult to grow that into a more single-EHR/care-centric CDR later.


Great advice from @Seref .

I’m a ‘clinical hacktitioner’, not a developer, and I can see the attraction of using openEHR archetypes purely to guide the clinical content but building in a more traditional RDBMS fashion, particularly if a key output needs to play nicely in the SQL world for reporting purposes. I can also understand how diving straight into ‘full openEHR’ can seem very daunting, and indeed if you try to build your own CDR, it is a significant challenge to do in a performant way.

However, there is a third way, which is to use an existing openEHR CDR, and I guess in your situation an open source example such as Pablo’s Atomik, or Ehrbase.

The major advantage is that you can then get into the really tricky areas of designing your data content via archetypes/templates, with real-time deployment and all the advantages of full versioning, queryability via AQL, etc.


And even if you decide to revert to an RDBMS or even a DIY CDR, you will have a much better idea of what the openEHR CDR ecosystem gives you for free, and how easy or otherwise it is to export AQL resultsets to more SQL-flavoured outputs.
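For a sense of what that export can look like: the openEHR REST API returns AQL results as a columns-plus-rows structure, which flattens to CSV almost trivially. A minimal sketch (the column names and values below are made up):

```python
import csv
import io

# Shape follows the openEHR REST API query response (columns + rows);
# the column names and values are illustrative only.
resultset = {
    "columns": [{"name": "systolic"}, {"name": "diastolic"}],
    "rows": [[142, 91], [118, 76]],
}

def resultset_to_csv(rs):
    """Flatten an AQL-style resultset into a CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(col["name"] for col in rs["columns"])
    writer.writerows(rs["rows"])
    return buf.getvalue()

print(resultset_to_csv(resultset))
```

The awkward part in practice is not this flattening but deciding which paths to SELECT so that each row really is one flat observation.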

And definitely, as Seref said, building for reporting/analytics and then back-building to support direct care is IMO almost impossible; going from direct care to reporting is doable.

We have been working with a multi-national specialist renal medicine provider who are using openEHR to normalise outputs from local legacy EPRs into a normalised data platform, primarily for analytics purposes but with a view to supporting direct care, with the archetypes and templates shaped accordingly. So not too far from your use case.
