Does anybody know Business Intelligence (BI) tools that support processing openEHR data?
E.g., to generate reports that contain data from openEHR Compositions combined with data from other sources.
Not directly BI, but there are tools to directly connect to "R" and Python:
Also, Medexter has an interface to execute Arden Syntax MLMs on openEHR:
We (HiGHmed and vitagroup) are also interested in BI tool connectivity to openEHR and I hope that soon we can offer direct integration with Power BI and such.
The general approach is to use AQL to extract data in more tabular/condensed formats to a secondary RDBMS and do much of the BI against that.
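A minimal sketch of that pipeline's loading half, assuming the `columns`/`rows` resultSet shape that openEHR REST API query responses use, with SQLite standing in for the secondary RDBMS (the column names and data values are invented):

```python
import sqlite3

def resultset_to_table(resultset, conn, table="obs"):
    """Load an AQL resultSet (columns + rows) into an RDBMS table."""
    names = [col["name"] for col in resultset["columns"]]
    cols = ", ".join(f'"{n}"' for n in names)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
    marks = ", ".join("?" for _ in names)
    conn.executemany(f"INSERT INTO {table} VALUES ({marks})", resultset["rows"])
    conn.commit()

# Invented sample, shaped like an openEHR /query response resultSet
sample = {
    "columns": [{"name": "patient_id"}, {"name": "temperature"}],
    "rows": [["p1", 37.2], ["p2", 38.6]],
}
conn = sqlite3.connect(":memory:")
resultset_to_table(sample, conn)
rows = conn.execute("SELECT * FROM obs ORDER BY patient_id").fetchall()
```

The BI tooling then points at the relational table and never sees the RM structures; the AQL projection decides what lands in each column.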
Some companies have, I understand, developed more sophisticated approaches, but I'm not sure if these are available separately from the main CDR product or specific applications.
I'm not a BI expert, but I think an openEHR data source wouldn't be different from any other clinical data source from a BI standpoint. Though openEHR has elements that might add value to a BI environment:
Those elements would help minimize and standardize the development of data brokers, where the data is processed in order to load it into the BI environment, to be able to generate those reports, indicators, charts, etc.
In my experience with BI, most of the work done for ETL from the data sources to the BI environment is custom, since it depends on the indicators and types of reports you want out of the BI, but also on the source data sets you are working with. It's very difficult and complex to create a generic tool in this area. So even if the openEHR data is standardized, when you say "tools that support processing openEHR data", any existing tool might not be applicable to a different environment, since it was created to comply with custom requirements.
Considering that, you should have the desired reports designed beforehand to be able to select a tool to implement that environment. It would be very difficult to apply an existing tool if the requirements are not documented.
An idea for implementing report generation, assuming the data source is our EHRServer (GitHub - ppazos/cabolabs-ehrserver: Open platform to manage and share standardized clinical data, designed by @ppazos at CaboLabs Health Informatics), would be:
Hope that helps.
Thanks for the replies everyone!
We at Nedap are specifically looking for a generalised solution that exports all openEHR data to a tabular format that's directly importable by tools like Power BI and custom "dashboarding" tools that will use our APIs. Since we do not create custom solutions, this needs to be coded and implemented by partnering consultancy agencies. We were hoping to save them the trouble of learning the openEHR RM.
I guess we will end up with some compromise of AQL and a tool for third parties to simplify building queries.
Be aware that we are currently building such functionality (an easy-to-use AQL query builder and CSV exporter) as part of the German COVID-19 platform. It might not be 100% aligned with your needs (as we are still using ADL 1.4 and the old OPT format), but it might be a good starting point. As this is all gonna be open source under Apache 2, it might be worth taking a look: NUM Forschungsdatenplattform · GitHub and GitHub - ehrbase/aql-editor-backend
Hi Birger,
This looks very interesting! Is there a more descriptive "manual" available? And/or Docker images or some other way to play around with this?
Unfortunately not at the moment as this is still under development. However, I think this should become possible around the beginning of March. Feel free to drop me an e-mail and I will keep you in the loop.
@pablo do you define templates specifically based on the ETL or report data sets? Do they have embedded queries?
One thing I want to try is to embed AQL queries (more like REST calls with AQL queries) in XQuery to extract and transform data. I'd love to have XML data from available openEHR EHR servers, but people seem to be happier with just JSON.
There is a relationship, but an indirect one.
In general, for BI what is needed is some kind of indicator or "measure" in data warehousing lingo, which is the answer to a certain question and can be navigated in different ways. Like "average of X".
Then you have "dimensions", like time or sex. So you can have "average of X in the last year for males".
Those dimensions have a dependency on the original data; they may be aggregated and summarized in a certain way, or could be just the raw original data.
IMHO the "original" data is what should be defined by templates.
Then there is another construct, the "fact", which is an event or snapshot of a business event, captured in all its dimensions (time, sex, …) and used to calculate the "measure".
So to feed a data warehouse, we need an ETL to transform the original data defined by templates into the facts with all their dimensions, in order to be able to calculate the output measures.
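A toy sketch of that ETL step in Python, just to illustrate facts, dimensions, and a measure (the records, dimension values, and surrogate keys are all invented; a real star schema would live in the warehouse, not in dicts):

```python
from datetime import date

# Invented extracted records (e.g. AQL output), one per clinical event
records = [
    {"sex": "male", "when": date(2021, 1, 5), "value": 120},
    {"sex": "female", "when": date(2021, 1, 9), "value": 110},
    {"sex": "male", "when": date(2020, 12, 30), "value": 130},
]

# Dimension tables: each distinct value gets a surrogate key
sex_dim = {}
time_dim = {}

def dim_key(dim, value):
    """Return the surrogate key for a dimension value, creating it if new."""
    return dim.setdefault(value, len(dim) + 1)

# Fact table: one row per event, holding dimension keys plus the measure
facts = [
    (dim_key(sex_dim, r["sex"]), dim_key(time_dim, r["when"].year), r["value"])
    for r in records
]

# The "measure": average of X for males in 2021, navigated via the dimensions
male_2021 = [v for (s, y, v) in facts if s == sex_dim["male"] and y == time_dim[2021]]
avg = sum(male_2021) / len(male_2021)
```

The point is that the fact rows carry only dimension keys and the raw value, so any "average of X by Y" question becomes a filter over keys.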
In general I wouldn't represent the facts or the dimensions directly with templates. Maybe there could be some cases where the original data is very similar to the fact tables, but this is not always the case.
What I see as an advantage, since this is a very difficult step to start with in BI projects, is that with openEHR all the clinical data sources are standardized. Sanitization and standardization of the sources is a required step for any ETL, and openEHR simplifies that a lot by having all the data defined by templates.
But there is a missing link, which I think is partly covered by the FHIR mapping language: a formal way to express mappings from openEHR to custom data models and from custom data models to openEHR. That would also close part of the gap for any ETL, whether for BI, for simple data migration from legacy systems to openEHR repos, or between openEHR repos and custom databases such as research ones.
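To illustrate what such a mapping could look like, a hypothetical path-to-column table in Python (the paths, column names, and values are invented; a formal mapping language would let you express this declaratively instead of in code):

```python
# Hypothetical, hand-written mapping from openEHR-side paths to columns
# in a custom research/BI model
mapping = {
    "temperature/value/magnitude": "body_temp",
    "temperature/value/units": "body_temp_units",
}

def apply_mapping(flat_source, mapping):
    """Project a flat source record onto the custom target model."""
    return {col: flat_source.get(path) for path, col in mapping.items()}

# Invented flat record, as it might come out of an extraction step
record = {"temperature/value/magnitude": 38.1, "temperature/value/units": "Cel"}
target = apply_mapping(record, mapping)
```

The same table read in reverse would cover the custom-model-to-openEHR direction, which is why a shared formal notation for it would help both ETL and migration.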
Maybe other guys can explain this better; my memory is a little rusty in terms of BI concepts, but I have the basics.
All this is also why I told @Jelte that specific requirements should be specified before looking for tools, since the ETL is a required but currently custom step for BI.
Does ETL mean extract, transform and load?
Makes sense Pablo, thanks for your elaborate response. The problem is we don't want to have specific requirements for BI measures, since we don't provide services billed by the hour. Our consultancy partners do this, but they lack openEHR knowledge.
The way we do this with our legacy data is we provide a "raw" MySQL database dump to our customers with all their data we have in our systems. Then the customer hires a third-party BI consultant to connect (and ETL?) that database dump to Power BI or a BI tool custom-built by one of those BI consultancy companies.
We could do the same with openEHR data, but since one cell contains most of the relevant data (as a JSON RM object), the required ETL is beyond the possibilities of BI tools.
So we need a way to either 1: transform and export all openEHR data in a tabular, SQL-like way, so it's possible to load it into a BI tool and create the processing and dashboarding there.
Or 2: do the processing in AQL and export each measure to a specific CSV (or something like that). But then the consultants will have to learn AQL and openEHR. In that case we as Nedap would probably want to invest in (or adopt an open source) query builder tool to hide most of openEHR's complexity.
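For option 1, a generic flattener is roughly what's needed: turn the nested RM object in that one cell into one column per leaf value. A naive Python sketch (the sample `cell` is an invented fragment, not a real canonical composition):

```python
def flatten(node, prefix=""):
    """Recursively flatten a nested JSON-like object into path/value pairs,
    one column per leaf value."""
    out = {}
    if isinstance(node, dict):
        for key, value in node.items():
            out.update(flatten(value, f"{prefix}/{key}"))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            out.update(flatten(value, f"{prefix}[{i}]"))
    else:
        out[prefix] = node
    return out

# Invented fragment standing in for the JSON RM object stored in one DB cell
cell = {"data": {"events": [{"value": {"magnitude": 37.2, "units": "Cel"}}]}}
row = flatten(cell)
# row maps paths like "/data/events[0]/value/magnitude" to their leaf values
```

The resulting path-keyed row is something a BI consultant can load like any wide table, though the column names are only as understandable as the documentation that goes with them.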
@birger.haarbrandt's project looks promising!
ETL may be one of the possible uses of the Subject Proxy Service, which can extract data sets from anywhere, including non-openEHR sources (it's a more generalised approach of more or less what @rong.chen / Cambio is already doing in their GDL2 systems).
The Subject Proxy approach (currently) allows you to define typed & named "variables", e.g. "sex: CodedTerm", which might differ from how they are stored in some back-end system (e.g. "gender: Coded" or whatever). Currently the declarations are in "Decision Logic Modules" (spec here), but the same kind of modules could be used to extract data, and generate derived values via rules, for an ETL purpose.
I have not tried to think about the ETL use case too much yet, so just providing food for thought, in case it strikes a chord.
If that is the case, you don't have any BI requirements there; your requirement is how to export openEHR data so others can read it, which is the first step of the ETL, extraction…
openEHR has a flat format that can be used, but I don't like it, since most clinical information is hierarchical by definition. If the BI team works with clinical data, I'm sure they understand this. The issue is not openEHR or the format; the issue is how to effectively communicate to another team where the data they need can be found, basically documentation.
You can provide a normal JSON openEHR instance, I mean the whole structure, but then you need to document what it means. If they are used to doing ETL, they know how to process different data sources and formats. Of course, explaining an openEHR structure is like describing the template used to define the structure, and that is like a CDA implementation guide. This kind of guide is really helpful for big ETL projects, because there is no common language to express the ETL, something I mentioned above.
Then what they need to do is to implement the mappings from the openEHR data to extract each data point they require. This is part of the custom requirements; the other part is the data you are actually providing, so you can't escape the "custom" part. With the data points selected, they will proceed with the transformation and loading, then the rest.
In the EHRServer, when you create a datavalue query, you can set the type of result you want from that query: it can be grouped by path or by composition. Grouped by path is like having a table of results, which is what you might use for the initial step of ETL. This is a guide on how to use it: https://www.youtube.com/watch?v=zSftiFBjboE&t=2s
My description doesnât consider your business model, you need to adapt the parts to your own reality. This is a general description on how any kind of integration is done for analytics and BI projects.
Nice. I suspect most of the CDR vendors have some kind of "shaping" service: Better have a simple /query/csv output but also a more sophisticated server-side JS post-processing layer that lets you shape data. I know that Code24 also has something similar, and Ocean have a lot of experience, and some interesting AQL->BI adaptors, in this area.
Exactly, different implementations might have their own outputs for queries. In fact I have a ticket to implement CSV output for the EHRServer but didn't have the time to implement it.
Considering that, it wouldn't be difficult to implement such a service outside the CDR, like a broker between the CDR and the BI environment.
Yes! Thank you for rephrasing my question; sometimes asking good questions is very hard (at least for me).
So we are looking for a generic way to export any openEHR data to csv/sql in a way that makes it easy to transform and load the data into regular BI tools. Without the one doing the transform and load needing (in depth) openEHR knowledge.
The issue is not openEHR or the format
I disagree: our current/legacy medical data is in a (My)SQL DB where every column contains a specific kind of information, e.g. column 1: patient ID, 2: time of measurement, 3: body temperature.
If I give this to a BI consultant, it's easy for them to load this into a BI tool. It's not as easy if I give a DB copy where column 3 contains a JSON RM object.
I'm currently watching
This is a guide on how to use it: EHRServer Flow Demo Step by Step: load and query openEHR clinical data - YouTube
great that you made these videos!
Not sure you are aware, but we have our own openEHR CDR implementation. So we would probably need to build something like
Better have a simple /query/csv output
We were hoping for a plugin to let e.g. Power BI import JSON RM objects, a JSON RM object → CSV exporter Java library, or an AQL builder to more easily create use-case-specific AQL.
I think this topic gives a nice overview of the implementation scenarios and currently available solutions: proprietary and custom. Thanks for the input everyone!
I think us running into building this ourselves, while ehrbase is already building it, is an argument against building your own CDR :)
This is your implementation and your way to communicate with your partner.
The real problem is they can't process something they don't understand if you just dump openEHR data on them, because that is a very ineffective way of communication. IMO when you provide all the information (note I didn't use "data" here but a broad definition of "information"), they should be able to process that. So the issue is about communication (human to human!), not data format.
Another thing is: openEHR doesn't tell you how to store your data. If you want, you can put each data point in a different column in a relational database, or you can use a JSON database, or text files; openEHR doesn't provide a data storage schema or schema-less recommendation.
You need to provide "information" to your partners. That is: the data, the definition of the data, the schemas of the formats, and any hints on how to find the specific data they need. All this is not an openEHR concern in terms of the scope of the specifications.
Hi Joost - good discussion, but what's important to recognise is that this is not really an openEHR problem; it is a generic issue of the growing mismatch between the complexity of data inside a semantically rich data ecosystem and the need for outsiders to navigate it easily.
The simple DB example you gave makes sense until you have to extract only "some" kinds of data, e.g. only that recorded in a care home, or as part of a covid assessment, or … at that point you need a much more detailed understanding of the table structures, relationships etc. (and they might need to change for technical reasons).
This is where openEHR scores, at one level: it is just one giant queryable data tree where the structure of that tree, and related querying, is described by archetypes and templates.
The downside is that as the scope and complexity of the data tree grows, even though it is technically accessible, it remains a very complex space. That complexity needs to be hidden from BI tools (or indeed possibly third-party app developers) that just want a few selected data points from within this complex space.
We know that this is equally challenging in other sectors/tech where the internal dataspace starts to get really complex. @pablo mentioned the star schema - Star schema - Wikipedia.
My approach would be to write AQLs that return scalar values and then export as CSV (or a more sophisticated tabular output).
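A rough sketch of that export step, assuming the `columns`/`rows` resultSet shape used by openEHR REST API query responses (the column names and values here are invented):

```python
import csv
import io

def resultset_to_csv(resultset):
    """Serialise an AQL resultSet of scalar values as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(col["name"] for col in resultset["columns"])
    writer.writerows(resultset["rows"])
    return buf.getvalue()

# Invented sample, shaped like a /query response resultSet
sample = {
    "columns": [{"name": "ehr_id"}, {"name": "systolic"}],
    "rows": [["e1", 120], ["e2", 135]],
}
csv_text = resultset_to_csv(sample)
```

As long as the AQL projects only scalars, this stays trivially generic; the hard part moves into writing (or generating) the AQL itself.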
Several CDR suppliers also have more sophisticated "data-bending" scripting that sits on top of AQL or other openEHR querying and can be run server-side to present simpler formats.
e.g. View-AgeBandsSearch API.md (9.2 KB)