Ehrbase Performance Issue

We are using EHRbase (v2.5) in production with approximately 150 GB of clinical data.
Infrastructure
EHRbase (Kubernetes):

  • Replicas: 3

  • CPU: 2–4 cores per pod

  • Memory: 4–16 GB per pod

Database:

  • CPU: 8 cores

Our data model includes relatively standard lab-related archetypes:

  • openEHR-EHR-OBSERVATION.laboratory_test_result.v1

  • openEHR-EHR-CLUSTER.specimen.v1

  • openEHR-EHR-CLUSTER.laboratory_test_analyte.v1

Use Case

We are executing a cohort-style AQL query to retrieve:

  • Lab tests conducted for a given day

  • Corresponding test results

  • Associated specimen details

Essentially, this is a time-filtered query (single day) over lab observations with related clusters.

Problem is Even when querying for a single day, the response time is very high ~20 min.

Are there known optimizations or indexing strategies in EHRbase/PostgreSQL for improving performance of such AQL queries.

PS:

select e/ehr_id/value as ehr_id, c/uid/value as composition_id, c/name/value as template_id, c1/items[at0001]/value/value as case_id, o1/data[at0001]/events[at0002]/data[at0003]/items[at0005]/value/value as test_name, o1/data[at0001]/events[at0002]/data[at0003]/items[at0005]/value/defining_code/code_string as test_name_code, o1/data[at0001]/events[at0002]/data[at0003]/items[at0005]/value/defining_code/terminology_id/value as test_name_codesystem, o1/data[at0001]/events[at0002]/data[at0003]/items[at0073]/value/value as overal_test_status from EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.report.v1] CONTAINS OBSERVATION o1[openEHR-EHR-OBSERVATION.laboratory_test_result.v1] AND CLUSTER c1[openEHR-EHR-CLUSTER.case_identification.v0] WHERE c/archetype_details/template_id/value=‘lab_report_template.en.v0’ AND c/context/start_time/value>=‘2026-04-01’ AND c/context/start_time/value<=‘2026-04-02’

@Mancini before thinking of indexes at the DB level, I want to mention some options:

  1. (tip) When comparing start_time, use the time object in comparisons instead of /value, so c/context/start_time>"2026-04-01"
  2. I see in your projections that you are reusing a common path o1/data[at0001]/events[at0002]/data[at0003], maybe think of extracting it in a contains if this still fulfills your needs
  3. Split this into 2 queries: one that only handles the retrieval of composition IDs that match your criteria, and another one that does the projections

For more dedicated support you can reach out to me in DMs

Thanks @vidi42