Why use AQL instead of SQL in openEHR?

Nathan · 14 June 2022 06:03

Would someone be able to articulate clearly for non-technical types like me why Archetype Query Language (AQL) has been developed for and used in openEHR, instead of the widely available generic alternatives?

There is quite a bit online about AQL, but I haven’t yet found anything that explains it clearly to the layperson.

It would be great to have a ‘short version’ and a ‘long version’!

siljelb · 14 June 2022 06:45

Short short version from a clinical modeller/low code dev pov:

You can get an entire composition (“clinical document”) of standardised structured hierarchical information with one short line, or one very specific data element from a very specific context with a slightly longer line with more paths involved.
You don’t have to know or think about tables or anything like that, just the archetypes and templates.

birger.haarbrandt · 14 June 2022 07:40

The idea is to define queries completely on the domain level. This means that AQL does not “know” anything about the underlying database technology and therefore is technology and vendor agnostic. This allows vastly improved portability of applications and algorithms.

We also find it a very good abstraction to make querying easier. To give an example, compare the expressions from EHRbase in AQL vs. the automatically derived SQL query for obtaining all body temperature measurements for a patient if the measurement is >37.5°C:

AQL:

SELECT o/data[at0002]/events[at0003]/data[at0001]/items[at0004]/value/magnitude
  FROM EHR e
    CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.report.v1]
    CONTAINS OBSERVATION o [openEHR-EHR-OBSERVATION.body_temperature.v2]
 WHERE e/ehr_id/value = '65e7badd-15a9-42ac-95da-cef9c758141b'
    AND o/data[at0002]/events[at0003]/data[at0001]/items[at0004]/value/magnitude > 37.5
 LIMIT 10
 ORDER BY o/data[at0002]/events[at0003]/data[at0001]/items[at0004]/value/magnitude DESC

SQL:

select "alias_44068444"."/data[at0002]/events[at0003]/data[at0001]/items[at0004]/value/magnitude"
from (select array_1454511093_13.var_1454511093_14::numeric as "/data[at0002]/events[at0003]/data[at0001]/items[at0004]/value/magnitude"
      from "ehr"."entry"
               right outer join "ehr"."composition" as "composition_join"
                                on "composition_join"."id" = "ehr"."entry"."composition_id"
               right outer join "ehr"."ehr" as "ehr_join" on "ehr_join"."id" = "composition_join"."ehr_id"
               join lateral (
          select cast((ehr.xjsonb_array_elements(("ehr"."entry"."entry" #>>
                                                  '{/composition[openEHR-EHR-COMPOSITION.report.v1],/content[openEHR-EHR-SECTION.adhoc.v1]}')::jsonb) #>>
                       '{/items[openEHR-EHR-OBSERVATION.body_temperature.v2],0,/data[at0002],/events,/events[at0003],0,/data[at0001],/items[at0004],0,/value,magnitude}') as numeric)
                     AS var_1454511093_14) as "array_1454511093_13" on true
      where ("ehr"."entry"."template_id" = 'Corona_Anamnese' and
             ("ehr_join"."id" = '65e7badd-15a9-42ac-95da-cef9c758141b' AND
              array_1454511093_13.var_1454511093_14 > 37.5))) as "alias_44068444"
order by "/data[at0002]/events[at0003]/data[at0001]/items[at0004]/value/magnitude" desc
offset 0 rows fetch next 10 rows only

Of course the complexity of the SQL query can look differently in other systems, but I think this gives a rough idea why doing some abstraction is helpful.

ian.mcnicoll · 14 June 2022 08:08

Here goes!!

Short version

SQL is a widely used ‘physical’ query language for relational databases. To create an SQL, you need to know the exact table structure and table relationships, which in healthIT systems can be very, very complex and are also unique to each application.

Making relational databases work efficiently require ‘normalisation’ which can make understanding the logical connections between data items very hard to maintain and understand. Often the logical tree-like hierarchy of health data gets very convoluted when represented in relational databases.

When you query a relational database, your ‘schema’ (map) is the table structure itself, not the logical view of the data you would use when programming

Not every healthIT system uses relational databases e.g EPIC uses a version of MUMPS. which does not use SQL .

AQL is not a physical query language - it is a logical query language. openEHR does not decree what kind of database is use in a CDR - it might be a relational database , or MUMPS, or MongoDB. When you query with AQL your schema /map’ is provided by the archetype and templates that are in use in your CDR, not the schema of the underlying database. It is up to the CDR provider to convert the AQL, on the fly, to whatever physical query language the internal database uses.

Although most current CDR providers do use relational databases in some way, the tables inside are optimised around the core Reference Model concepts like Composition and Entry. You will not find a Blood pressures or Medications table.

Using a logical query language like this makes an openEHR datastore tech-neutral i.e id I change my CDR provider to use a new faster/cheaper product which e.g. uses MongoDb rather than a relational database, my application or reporting queries do not change.

Even more importantly this hands control of data access to the care provider - there is no need to ask the CDR vendor to open up and explain or adapt their internal data structure.

Of course, where SQL still really holds sway is in the reporting world, and most CDRS / AQL can export the data in a way that can be readily used by SQL-based reporting databases. The use of a secondary datastore for reporting purposes is extremely common , even when the primary datastore is SQL-based, just because of privacy and performance issues.

ian.mcnicoll · 14 June 2022 08:26

birger.haarbrandt:

SELECT o/data['data']/events['Any event']/data['Tree']/items['Temperature']/value/magnitude as Systolic
  FROM EHR e
    CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.report.v1]
    CONTAINS OBSERVATION o [openEHR-EHR-OBSERVATION.body_temperature.v2]
 WHERE e/ehr_id/value = '65e7badd-15a9-42ac-95da-cef9c758141b'
    AND  o/data['data']/events['Any event']/data['Tree']/items['Temperature']/value/magnitude >  37.5
 LIMIT 10
 ORDER BY o/data['data']/events['Any event']/data['Tree']/items['Temperature']/value/magnitude DESC

I’ve tidied @birger.haarbrandt 's nice example to add the human names in the AQL paths instead of the atCodes (which are defined in the archetype and make it language-neutral) , just to make it clearer for clinical folks/newbies.

the ehr/ehrId/value is the patient identifier. So this query says give me any temperatures for patient with id
where the temperature is greater than 37.5

If you want a sense of how differently individual applications represent health data at database level have a look at this blog about a JAMIA paper on interoperability

bna · 14 June 2022 15:26

That’s good for newbies. The beauty of AQL is that it is vendor neutral and language agnostic. Which is a solid foundation for portable applications. There is an entrance barrier from “good old database” thinking. But it’s a small exchange for portability and data that survive system changes or upgrades.

thomas.beale · 14 June 2022 18:42

Exactly right. AQL is designed so that queries are based only on clinical content models (archetypes). If these are shared across systems, then the queries will work on those systems. Which means that application forms, CDS, reporting, ETL, BI and everything else relying on querying can be portable, regardless of what concrete DB solution / approach has been chosen in those systems.

Model-based portability instead of DB schema lock-in.

pablo · 17 June 2022 00:08

Short version (IMHO): an openEHR data model / schema is defined by archetypes not by tables/columns, it’s a higher level construct, where the underlying implementation at the persistence level could be relational, document oriented, object oriented, key/value, or whatever technology you can find out there. This openEHR “data model” is independent from the implementation technology, so the standard specification can’t provide a SQL way of querying, because that would imply an implementation technology is required, and it isn’t. Though if a relational DB is used as implementation technology, then somewhere in the AQL interpreter, it does an AQL to SQL conversion, but that is an internal software component, not a user-facing feature.

Though AQL is part of the openEHR specs, there are many ways of querying an openEHR data store without AQL at the user-facing querying interface, but every openEHR query mechanism will rely heavily on archetype IDs and paths, because that is the openEHR “data schema definition”.

This could give you an idea of a non-AQL querying mechanism to query openEHR data, that can be used by non techies (it’s all point and click in a UI, there is no need of knowing a specific language). I think you will be able to see how difficult/complex would be to create a similar query in plain SQL, and also see the value on having clinical professionals creating their own queries on the fly EHRServer: openEHR queries + SNOMED CT expressions 2 - YouTube

Hope that helps.