Business Intelligence tools with openEHR support

Jelte · 1 February 2021 16:30

Does anybody know Business Intelligence (BI) tools that support processing openEHR data?

E.g., to generate reports that contain data from openEHR Compositions combined with data from other sources.

birger.haarbrandt · 1 February 2021 18:50

Not directly BI, but there are tools to directly connect to “R” and Python:

Also, Medexter has an interface to execute Arden Syntax MLMs on openEHR:

We (HiGHmed and vitagroup) are also interested in BI tool connectivity to openEHR and I hope that soon we can offer direct integration with Power BI and such.

ian.mcnicoll · 1 February 2021 21:46

The general approach is to use AQL to extract data in more tabular/condensed formats to a secondary RDBMS and do much of the BI against that.

Some companies have, I understand, developed more sophisticated approaches but I’m not sure if these are available separately to the main CDR product or specific applications.

pablo · 2 February 2021 21:08

I’m not a BI expert, but I think an openEHR data source wouldn’t be different from any other clinical data source from a BI standpoint. Tough openEHR has elements that might add value to a BI environment:

a standard data format (all data represented in the same way helps to simplify ETL required for BI)
a standard query mechanism (it is easy to create new data services over any openEHR server to provide data to a BI environment)

Those elements would help minimize and standardize the development of data brokers, where the data is processed in order to load the data in the BI environment, to be able to generate those reports, indicators, charts, etc.

In my experience with BI, most of the work done for ELT from the data sources to the BI env is custom, since depends on the indicators and types of reports you want out of the BI, but also depends on the source data sets you are working with. It’s very difficult and complex to create a generic tool in this area. So even if the openEHR data is standardized, when you say “tools that support processing openEHR data”, any existing tool might not be applicable to a different environment since it’s created to comply with custom requirements.

Considering that, you should have the desired reports designed beforehand to be able to select a tool to implement that environment. It would be very difficult to apply an existing tool if the requirements are not documented.

An idea to implement the generation of reports considering the data source is our EHRServer (GitHub - ppazos/cabolabs-ehrserver: Open platform to manage and share standardized clinical data, designed by @ppazos at CaboLabs Health Informatics.) would be:

design the reports
design the BI data model (star, snowflake, etc)
define the data model to feed those reports (openEHR templates)
load the templates in EHRServer
load data for those templates in the EHRServer
create queries to extract the data subsets you need to map to your reports
create a broker that maps the output of the queries (JSON/XML) to the BI data model
implement the reports in the BI tool
generate the reports
periodically execute the broker to load updated data into the BI data

Hope that helps.

joostholslag · 7 February 2021 10:48

Thanks for the replies everyone!
We at Nedap are specifically looking for a generalised solution that exports all openEHR data to a tabular format that’s directly importable by tools like powerBI and custom ‘dashboarding’ tools that will user our apis. Since we do not create custom solutions this needs to be coded and implemented by partnering consultancy agencies. We were hoping to save them the trouble off learning the openEHR RM.
I guess we will end up with some compromise of AQL and a tool for third parties to simplify building queries.

birger.haarbrandt · 7 February 2021 15:54

Be aware that we are currently building such functionality (easy to use AQL query builder and CSV exporter) as part of the German COVID-19 platform. Might not be 100% aligned with your needs (as we are still using ADL 1.4 respectively the old OPT format) but might be a good starting point. As this is all gonna be open source under Apache 2 it might be worth taking a look: NUM Forschungsdatenplattform · GitHub and GitHub - ehrbase/aql-editor-backend

joostholslag · 7 February 2021 16:32

Hi Birger,

This looks very interesting! Is there a more descriptive ‘manual’ available? And/or docker images or some other way to play around with this?

birger.haarbrandt · 7 February 2021 16:47

Unfortunately not at the moment as this is still under development. However, I think this should become possible around the beginning of March. Feel free to drop me an e-mail and I will keep you in the loop.

thomas.beale · 7 February 2021 20:16

@pablo do you define templates specifically based on the ETL or report data sets? Do they have embedded queries?

yampeku · 8 February 2021 00:20

One thing I want to try is to embed AQL queries (more like REST calls with AQL queries) in XQuery to extract and transform data. I’d love to have XML data from available openEHR EHR servers, but people seem to be happier with just json

pablo · 8 February 2021 00:53

There should be a relationship but there is an indirect one.

In general, for BI what is needed is some kind of indicator or “measure” in data warehousing lingo, which is the answer to certain question, that could be navigated in different ways. Like “average of X”.

Then you have “dimensions” or “sex”, like time. So you can have “average of X in the last year for Males”.

Those dimensions have a dependency to the original data, maybe aggregated and summarized in a certain way, or could be just the raw original data.

IMHO the “original” data is what should be defined by templates.

Then there is another construct that is the “fact”, which is an event or snapshot of a business event, that was captured in all it’s dimensions (time, sex, …) and is used to calculate the “measure”.

So to feed a data warehouse, we need an ETL to transform the original data defined by templates, into the facts with all their dimensions, in order to be able to calculate the output measures.

In general I wouldn’t represent the facts or the dimensions directly with templates. Maybe there could be some cases where the original data is very similar to the fact tables, but this is not always the case.

What I see as an advantage, which is a very difficult step to start with in BI projects, is with openEHR all the clinical data sources are standardized. Sanitation and standardization of the sources is a required step for any ETL, and openEHR simplifies that a lot having all the data defined by templates.

But there is a missing link, which I think in a certain way is covered by the FHIR mapping language, is to have some formal way to express mappings from openEHR to custom data models and from custom data models to openEHR, which will also cover part of the gap for any ETL, for BI or for simple data migration from legacy to openEHR repos, or between openEHR repos to custom databases like research ones.

Maybe other guys can explain this better, my memory is a littler rusty in terms of BI concepts, but I have the basics.

All this is also why I told @Jelte that specific requirements should be specified before looking for tools, since the ETL is a required but currently custom step for BI.

joostholslag · 8 February 2021 06:43

Does ETL mean extract, transform and load?

joostholslag · 8 February 2021 06:56

Makes sense Pablo, thanks for your elaborate response. The problem is we don’t want to have specific requirements for BI measures since we don’t provide service billed by the hour. Our consultancy partners do this, but they lack openEHR knowledge.
The way we do this with our legacy data is we provide a ‘raw’ MySQL database dump to our customers with all their data we have in our systems. Than the customer hires a 3rd party BI consultant to connect (and ETL?) that database dump to powerBI or a custom build BI tool by one of those BI consultancy companies.
We could do the same with openEHR data, but since one cell contains most of the relevant data (as a json rmObject) the required ETL is beyond the possibilities of BI tools.
So we need a way to either 1:transform and export all openEHR data in a tabular SQL like way, so it’s possible to load it into a BI tool and create the processing and dashboarding there.
Or 2: do the processing in AQL and export as a measure to specific csv (or something like that). But then the consultants will have to learn AQL and openEHR. In that case we as Nedap probably want to invest in a (or adopt an open source) query builder tool to hide most of openEHR complexity.
@birger.haarbrandt s project looks promising!

thomas.beale · 8 February 2021 11:14

ETL may be one of the possible uses of the Subject Proxy Service, which can extract data sets from anywhere, including non-openEHR sources (It’s a more generalised approach of more or less what @rong.chen / Cambio is already doing in their GDL2 systems).

The Subject Proxy approach (currently) allows you to defined typed & named ‘variables’ e.g. ‘sex: CodedTerm’, which might differ from how they are stored in some back-end system (e.g. ‘gender: Coded’ or whatever). Currently the declarations are in ‘Decision Logic Modules’ (spec here), but the same kind of modules could be used to extract data, and generate derived values via rules for an ETL purpose.

I have not tried to think about the ETL use case too much yet, so just providing food for thought, in case it strikes a chord.

pablo · 8 February 2021 12:55

If that is the case, you don’t have any BI requirements there, your requirement is how to export openEHR data so others can read it, which is the first step of the ETL, extraction…

openEHR has a flat format that can be used, but I don’t like it, since most clinical information is hierarchical by definition. If the BI team works with clinical data, I’m sure they understand this. The issue is not openEHR or the format, the issue is how to effectively communicate to another team where the data they need can be found, basically documentation.

You can provide a normal JSON openEHR instance, I mean all the structure, but then you need to document what it means. If they are used to do ETL, they know how to process different data sources and formats. Of course, explaining an openEHR structure is like describing the template used to defined the structure, and that is like a CDA implementation guide. This kind of guide, is really helpful for big ETL projects, because there is no common language to express the ETL, something I mentioned above.

Then what they need to do is to implement the mappings from the openEHR data to extract each data point they require. This is part of the custom requirements, the other part is the data you are actually providing, so you can’t escape the “custom” part. With the data points selected, they will proceed with the transformation and loading, then the rest.

In the EHRServer, when you create a datavalue query, you can set the type of result you want from that query, it could be grouped by path or composition. By path, is like having a table of results, which is what you might use for the initial step of ETL. This is a guide on how to use it: https://www.youtube.com/watch?v=zSftiFBjboE&t=2s

My description doesn’t consider your business model, you need to adapt the parts to your own reality. This is a general description on how any kind of integration is done for analytics and BI projects.

ian.mcnicoll · 8 February 2021 14:50

Nice. I suspect most of the CDR vendors have some kind of ‘shaping’ service - Better have a simple /query/csv out put but also a more sophisticated sever-side js post-processing later thet lets you shape data. I kmopw that Code24 also to have something similar and Ocean have a lot of experience, and some interesting aql->BI adaptors in this area.

pablo · 8 February 2021 16:00

Exactly, different implementations might have their own outputs for queries. In fact I have a ticket to implement CSV output for the EHRServer but didn’t have the time to implement it

Considering that, it wouldn’t be difficult to implement such service outside the CDR, like a broker between the CDR and the BI environment.

joostholslag · 12 February 2021 13:09

Yes! thank you for rephrasing my question, sometimes asking good questions is very hard (at least for me)

So we are looking for a generic way to export any openEHR data to csv/sql in a way that makes it easy to transform and load the data into regular BI tools. Without the one doing the transform and load needing (in depth) openEHR knowledge.

The issue is not openEHR or the format
I disagree: our current/legacy medical data is in a (my)sequel db where every column contains a specific kind of information e.g. column 1: patient id, 2: time of measurement 3: body temperature.
If i give this to a BI consultant it’s easy for them to load this into a BI tool. It’s not as easy if I give a db copy where column 3 contains a json RM object.

I’m currently watching

This is a guide on how to use it: EHRServer Flow Demo Step by Step: load and query openEHR clinical data - YouTube

great that you made these videos!

Not sure you are aware, but we have our own openEHR CDR implementation. So we would probably need to build something like

Better have a simple /query/csv out put

We were hoping for a plugin to let e.g. powerBI import json rm objects, a json rm object → csv exporter java library or an aql builder to more easily create usecase specific EHR.

I think this topic gives a nice overview of the implementation scenario’s and current availble solutions: proprietary and custom. Thanks for the input everyone!

I think us running into building this ourselfs, and ehrbase building this already is an argument against building your own CDR:)

pablo · 12 February 2021 18:02

This is your implementation and your way to communicate with your partner.

The real problem is they can’t process something they don’t understand if you just dump openEHR data to them, because that is a very ineffective way of communication. IMO when you provide all the information (note I didn’t use “data” here but a broad definition of “information”), they should be able to process that. So the issue is about communication (human to human!), not data format.

Another thing is: openEHR doesn’t tell you how to store your data. If you want you can put each data point in a different column in an relational database, or you can use a JSON database, or text files, openEHR doesn’t provide a data storage schema or schema-less recommendation.

You need to provide “information” to your partners. That is: the data, the definition of the data, the schemas of the formats, and any hints on how to find the specific data they need. All this is not an openEHR concern in terms of the scope of the specifications.

ian.mcnicoll · 13 February 2021 11:27

Hi Joost - good discussion but what’s important to recognise is tat this is not really an openEHR problem, it is a generic issue of the growing mismatch of the complexity of data inside a semantically-rich data ecosystem ,and the need for outsiders to navigate that easily.

The simple DB example you gave makes sense until you have to extract only ‘some’ kinds of data e.g only that recorded in a care home, or as part of a cvid assessment or … at that point you need a much more detailed understanding of the table structures, relationships etc. (and they might need ot change for technical reasons.

This is where openEHR scores , at one level - it is just one giant queryable data tree where the structure of that tree, and related querying is described by archetypes and templates.

The down-side is that as the scope and complexity of the data tree grows, even though it is technically accessible, it remains a very complex space. That complexity needs to be opaque to BI tools)or indeed possibly third-party app developers, that just want a few selected data points from within this complex space.

We know that this is equally challenging to other sectors /tech where the internal dataspace starts to get really complex. @pablo mentioned Star schema - Star schema - Wikipedia.

My approach would be to write AQL’s that returned scalar values and then export as CSV (or more sophisticated tabular output.

Several CDR suppliers also have more sophisticated ‘data-bending’ scripting that sits on top of AQL or other openEHR querying and can br run server-side to present simpler formats.

e.g. View-AgeBandsSearch API.md (9.2 KB)