INFOBANCO: an openEHR-based health research platform

damoca · 13 July 2023 09:28

Hello,

I want to share with all of you the results of the INFOBANCO project, which was developed between April 2022 and June 2023 for the Madrid Health Service, Spain. Veratech for Health has had the privilege of participating in it from its initial concept to its final implementation, encouraging the adoption of openEHR and the archetype modeling methodology.

INFOBANCO is the result of a Public Procurement of Innovation project with the aim of building a regional data platform for health research. This platform is intended to provide services to clinicians, managers, and researchers, making it possible to combine data from multiple sources. It is equipped with tools for data governance, collection, transformation, interrogation, visualization, and analysis to obtain knowledge and to support decision making.

The architecture of the INFOBANCO platform can be seen in the following figure:

The innovative idea behind this architecture is to put an openEHR CDR in the center of a research platform, and to use it as the source for transformations of data (ETL processes) to other health data standards commonly used in the clinical research field (OMOP CDM, HL7 FHIR, CDISC ODM, i2b2). The hypothesis for this work was that the openEHR reference model and archetypes provide the most complete set of information (both health information and context information) to feed any other information model used by other standards.

Components of the platform:

Inputs. Two different information systems have been integrated, including the EHR information from Hospital 12 de octubre and EHR information from the Primary Care area.
Data lake. A first repository for raw data integration, providing a single entry point for processing it. This data lake offers data in multiple layers: raw (data as it exists at its source), curated (basic normalization, such as date or number formats), and consumption (classification/organization of data according to its domain).
openEHR CDR. Data from the data lake has been normalized following openEHR archetypes and templates. Initially, only data covered by existing archetypes and required by the output formats has been included in the openEHR CDR. This CDR is built using the Better platform.
Standard outputs. ETL processes have been implemented to convert openEHR data to other standard formats. The usual pathway has been to select the relevant data for each output using AQL first, and then to implement the data transformations using the most suitable technology in each case: Python, Java, Pentaho.
Non-standard outputs. Some use cases required information that has not yet been included in the openEHR CDR (mostly data not covered by existing archetypes or internal management data from the input systems). In those cases, for example to build a BI control panel, data can still be accessed directly from the data lake.
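
As a rough illustration of the "AQL first, then transform" pathway described under Standard outputs, here is a minimal Python sketch. It is not the project's actual code. The AQL text, the archetype node paths, the sample result set, the `PERSON_IDS` lookup, and the helper names `rows_as_dicts` and `to_condition_occurrence` are all assumptions made for the example. The output only borrows a few column names from the OMOP CDM `condition_occurrence` table and skips concept mapping entirely.

```python
from datetime import datetime

# Illustrative AQL only: the archetype ID and node paths are assumptions,
# not the query used in the project.
AQL = """
SELECT e/ehr_id/value AS ehr_id,
       p/data[at0001]/items[at0002]/value/defining_code/code_string AS code,
       p/data[at0001]/items[at0077]/value/value AS onset
FROM EHR e
CONTAINS COMPOSITION c
CONTAINS EVALUATION p[openEHR-EHR-EVALUATION.problem_diagnosis.v1]
"""

# Made-up sample data, shaped like a columns/rows result set.
# In a real pipeline this would come from running AQL against the CDR.
SAMPLE_RESULT = {
    "columns": [{"name": "ehr_id"}, {"name": "code"}, {"name": "onset"}],
    "rows": [
        ["ehr-001", "44054006", "2021-03-15T00:00:00+01:00"],
        ["ehr-002", "38341003", "2019-11-02T10:30:00+01:00"],
    ],
}

# Hypothetical lookup from openEHR EHR identifiers to OMOP person_id values.
PERSON_IDS = {"ehr-001": 1, "ehr-002": 2}


def rows_as_dicts(result):
    """Turn a columns/rows result set into a list of dicts keyed by column name."""
    names = [col["name"] for col in result["columns"]]
    return [dict(zip(names, row)) for row in result["rows"]]


def to_condition_occurrence(record, person_ids):
    """Map one query record to a simplified condition_occurrence-like row."""
    onset = datetime.fromisoformat(record["onset"])
    return {
        "person_id": person_ids[record["ehr_id"]],
        "condition_source_value": record["code"],
        "condition_start_date": onset.date().isoformat(),
    }


conditions = [
    to_condition_occurrence(record, PERSON_IDS)
    for record in rows_as_dicts(SAMPLE_RESULT)
]
```

Keeping the selection step (AQL) separate from the mapping step means the same query result could feed a different mapper for FHIR or i2b2, which matches the idea of the CDR as the single source for all standard outputs.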

The tasks of the project did not include any specific archetype modeling activity. Only templates were created using already existing archetypes. At this stage, more than 35 existing archetypes from the CKM were used to build 21 templates representing data such as Demographics, Encounters, Health problems, Medication administration, Immunizations, Alerts, Phenotype report, Genomic report, Family history, etc.

By the end of June 2023, the platform had been completed and the project had finished. A first set of 100,000 patients has been loaded into the platform, with the intention of loading the 450,000 patients of Hospital 12 de octubre in the following months, and up to the 6.5 million patients of the Madrid region in the foreseeable future.

This project has been made possible thanks to the collaboration of the following organizations:

Hospital 12 de octubre, Madrid
Área de Atención Primaria, Madrid
Veratech for Health
NTT Data Spain
RHEA Group
Better

Funding and management:

European Union, European Regional Development Fund (ERDF)
Ministerio de Sanidad de España
Consejería de Sanidad de la Comunidad de Madrid

More information:

JillRiley · 13 July 2023 09:54

@damoca can we use this information publicly; thinking I’d ask Pete to do something on this?

damoca · 13 July 2023 10:20

Yes, of course. All this information was presented at a public event in Madrid two weeks ago, and also presented at the openEHR conference in Barcelona by Miguel Pedrera.

Kanthan_Theivendran · 24 July 2023 10:56

This is some amazing work. Well done.

olivier.franssen · 11 February 2025 08:46

Hello, when I look at the first diagram above, it seems that the BI part is not based on the openEHR CDR, but on the upstream data lake. Could you give a little more information on this architecture? How is the data lake data model related to openEHR?

damoca · 11 February 2025 11:07

The datalake has three main purposes:

To provide a uniform access layer to all sources of data
To store all kinds of detailed information that is not normalized/included in openEHR archetypes (management info of the clinical processes mainly)
To support all the existing applications, especially existing BI processes.

Regarding this last point, the hospital has a large number of BI components running (hundreds, if I remember correctly). The idea was to keep all those components without modifications. Thus, the datalake is the best source for them.

In the future, new components can be built on top of the openEHR CDR, but that was not the main objective of this initial deployment.

olivier.franssen · 11 February 2025 12:18

Thanks David.
Another related question: in the openEHR CDR, how do you take into account the time perspective in tables that would be the source of slowly changing dimensions? For example, keeping the different historical versions of a patient, a care provider, a hospital ward, etc.
Olivier
