Improve AQL to simplifying the querying of terms

sebastian.iancu · 17 April 2024 09:21

This is a spinoff from How to make mappings easier to query and to document in openEHR and [EHRbase] Storing and querying data without an standalone archetype focusing on issue 4 as described below:

How to make mappings easier to query and to document in openEHR

scope which is really about how/where to record/query additional term codes, alongside internal codes like atcodes.

This is (IMO) about how to be able to define and record additional multiple termCodes, to support querying via those additional terms

In scope

Where to add additional terms e.g LOINC and/or SNOMED to

a Node name (LOCATABLE.name/value), alongside the archetypeNodeId

a Node value, usually DV_CODED_TEXT, in addition to any defining_code e.g an internal atCode is used as the value defining_code but we also want to carry the SNOMED equivalent code in the patient record

How do we constrain archetypes/templates to ‘force’ any additional codes to be carried?

Can and should we constrain these additional codes as mandatory, or impose validation?

Can we improve AQL to simplifying the querying of terms?

[EHRbase] Storing and querying data without an standalone archetype

What is new in Better CDR 4.0 is the ability to query an ELEMENT for one of these terms , regardless of the parent archetype context.

This uses a code predicate. Not how I might have done it but I guess it aligns with the FHIR approach.
SELECT
    sys/value as Systolic,
    dia/value as Diastolic
FROM COMPOSITION c
CONTAINS (ELEMENT sys[code='SNOMED-CT::271649006'] AND ELEMENT dia[code='SNOMED-CT::271650006'])

ian.mcnicoll · 3 April 2024 08:44

One thing that would help here is an AQL function that would treat the defining_code and any mappings as a single flat list (like FHIR CodeableConcept), plus allow token matches ( terminologyId and term combined)
e.g hasTerm{"SNOMEDCT::“123456”)

birger.haarbrandt · 12 April 2024 19:45

SELECT
    sys/value as Systolic,
    dia/value as Diastolic
FROM COMPOSITION c
CONTAINS (ELEMENT sys[code='SNOMED-CT::271649006'] AND ELEMENT dia[code='SNOMED-CT::271650006'])

Would be nice to hear from @matijap about design considerations. I know that we can introduce conventions into AQL but introducing a predicate that is like an alias and another syntax for the codes and values does not feel right.

Edit: after fiddling around a bit I must confess that this is ugly when sticking to XPath conventions so I start understanding where this approach is coming from

matijap · 14 April 2024 14:46

This approach is simple, straightforward and well-defined. The last point is important so that we can actually deliver an implementation. There were many things we considered, but I don’t think I can reconstructs all the arguments that went into it…

(One of the things was whether this would be the first functionality to break our approach of independence between AQL and the model. We were considering finding all the code mappings in the archetypes and converting them to at codes before executing the query, which would mean you could add new codes to old archetypes and have your old data automatically be mapped to those new codes. At the end we opted against it…)

birger.haarbrandt · 14 April 2024 15:56

Why not define a function like this :

ELEMENT sys[CONTAINS_CODE('SNOMED-CT', '271649006')]

matijap · 14 April 2024 16:25

Yes, but we abandoned that idea.

Fair point…

birger.haarbrandt · 14 April 2024 16:30

Sorry, I messed up when I tried to edit. With a function, we could also introduce another parameter if the “match” field needs to be considered.

ian.mcnicoll · 15 April 2024 08:14

Perhaps we can get the best of both worlds.

I like the use of a token to shortcut the terminologyId and codestring. These appear in FHIR and GDL

Wr also need something to blur the difference between a term carried in defining_code and one carried in mappings.

So introduce a function like

CONTAINS_TERM(‘SNOMED-CT’::271649006’,'=" )

which means … find that SNOMED code either as a defining_code or as a mapping, where the mapping.matches = ‘=’

Better still if the first parameter was a list of terms that could be returned from a terminology expand operation.

This could be used as both

WHERE ELEMENT.name/value/CONTAINS_TERM(‘SNOMED-CT’::271649006’,'=" )

or

WHERE ELEMENT.value/CONTAINS_TERM(‘SNOMED-CT’::271649006’,'=" )

and then

CONTAINS (ELEMENT sys[code=‘SNOMED-CT::271649006’]

is essentially a shortcut for

WHERE sys.name/value/CONTAINS_TERM(‘SNOMED-CT’::271649006’,'=" )

@matijap - does your current code construct include something in ELEMENT.name/value/defining_code?

matijap · 15 April 2024 08:37

No, only mappings are considered, not the defining code that could be there if the name is a DV_CODED_TEXT.

SevKohler · 15 April 2024 12:00

What about DV_CODED_TEXT as answer for e.g. Problem/Diagnose name ?
That would be the only case i still see that term mappings should not cover.
If no, how did you imagine the queries then ?
Searching Element for both with an OR ?

ian.mcnicoll · 15 April 2024 12:31

I had assumed that [code=…] only applied to LOCATABLE.name but perhaps I’m wrong

That why I was suggesting separate function that could be applied to both name and value, and could include defining_code.

It is possible to use current AQL to do this, of course, but it is really convoluted.

matijap · 16 April 2024 06:35

In our case not even to all LOCATABLE.name but only to ELEMENT.name, and within that only to ELEMENT.name.mappings. (We might consider including ELEMENT.name.defining_code in the future though.)

birger.haarbrandt · 16 April 2024 11:16

This is what I have on my mind, too. Would only write as separate parameters

CONTAINS_TERM('SNOMED-CT','271649006', '=' )or

CONTAINS_TERM('SNOMED-CT::271649006', '=' )

and I think there would be merit to add ELEMENT.name.defining_code.

@matijap would this work for you as well?

sebastian.iancu · 17 April 2024 08:27

Nice discussion, focused on problem solving.
Could you try to address this problem in a more formal way? I’m thinking that you should make a proposal for SEC or and AQL WG to extend the list of functions with this one. Because this AQL might be implemented now mainly at EHRBase & Better, but in fact there are more parties involved or impacted. Also, the specs will have to be updated, with Antlr included - and the change should be consistent with previous spec and styles. Of course, you can still implement this outside of AQL standard, but that’s not that nice I guess.

birger.haarbrandt · 17 April 2024 09:48

Hi @sebastian.iancu,

this would be the next step for me. Once we got a rough outline, we can create tickets, discuss in the SEC meeting etc.

sebastian.iancu · 17 April 2024 09:55

If I am not mistaken, the current grammar does not support function-calls on the CONTAINS expression, but it does allow the above [code='...'] as predicate even though code does not exist.
Is a WHERE expression always a solution to avoid CONTAINS expressions with function calls?

sebastian.iancu · 17 April 2024 10:02

Based on current grammar and spec-style I would suggest
CONTAINS_TERM(sys.name/value, 'SNOMED-CT', '271649006', '=' )
instead of
sys.name/value/CONTAINS_TERM('SNOMED-CT', '271649006', '=' )
unless we make this CONTAINS_TERM a function on the RM level.

Name of the function ‘CONTAINS_TERM’ is also debatable from my point of view if we make it an AQL function.

Also, consider the complex functionality supported by TERMINOLOGY function, which is not only matching codes, but can also expand, validate, map, subsume. How would that relate to CONTAINS_TERM working?

ian.mcnicoll · 17 April 2024 15:35

I agree we need to support TERMINOLOGY expansions so the term codes/terminologyIds need to be an array, and I think pushes us back to an array of tokens and we may well have a mixed terminology expansion

CONTAINS_TERM(sys.name/value, [‘SNOMED-CT::271649006’, LOINC::2345-3], ‘=’ )

or

CONTAINS_TERM(sys.name/value, TERMINOLOGY(…), ‘=’ )

damoca · 17 April 2024 15:52

Following this, I would be very careful when adding syntactic sugar to AQL. Not only because you need to have a very clear definition of it to avoid different interpretations and implementations, but more importantly, if we add syntactic sugar that it is only applicable to an openEHR reference model we would be deviating from the purpose of AQL (ARCHETYPE Query Language) and end with an OpenEHR Query Language. This would mean that the specification could not be valid for other reference models or even with the openEHR RM itself if it evolves in the following 10, 20, 30 years (future-proof, you know).

birger.haarbrandt · 17 April 2024 15:55

@sebastian.iancu the function based approach (also without providing the sub-path to my understanding) is compliant with XPath which I think is a good point of reference how we should deal with predicates. In contrast, defining “virtual” predicates like “code” which are not present in the instance. As a user, I personally would find this confusing.