AQL to retrieving data for an element with cardinality 0..*

siljelb · 10 November 2022 09:38

Setting aside the fact that the specific archetype in question in this topic isn’t designed to be templated in this way:
Does AQL have a mechanism to specify that two (or more) parts of the WHERE clause should operate on the same instance of an internal cluster of an archetype? Otherwise, I would think it would return any instance of the immediate parent archetype where any instance of the cluster corresponds to the first part of the WHERE clause, and any, potentially other, instance of the cluster corresponds to the secord part of the WHERE clause?

chaya · 10 November 2022 09:57

@pablo yes, raised a bug in EHRBase GitHub

Seref · 10 November 2022 10:06

My expectation from an AQL implementation would be that this is done by using a path from/including the cluster.
As in

        ... CONTAINS CLUSTER cls[...] ...
        ...
       WHERE cls/..../foo0 = 'bar0' and cls/.../foo1 = 'bar1'

even if there are multiple CLUSTERs in a COMPOSITION the implicit meaning of the above is get me the whatever (SELECT) only if the COMPOSITION containing it has a CLUSTER such that it satisfies these 2 conditions (expressed above in WHERE using and). Most query languages work on this semantics, that the alias cls refers to the same instance of data everywhere in the query and if you provide n constraints then it must satisfy them all.

Now how a vendor interprets the semantics of the syntax I used above is up to them, so I qualified my explanation with “My expectation”

damoca · 10 November 2022 10:13

I think that the current AQL specifications do not mention how to deal with paths including objects with multiple occurrences in the middle. As you say, we could have two different interpretations:

Each match of the full path is processed as a single valid result, and it is combined with any other matched results.
The match of the path with existing instances breaks when an object with multiple occurrences is found.

For example, in the Symptom example, the path

o/data[at0001]/events[at0002]/data[at0003]/items[at0022]/items

includes an object with multiple occurrences (the at0022 CLUSTER). How do we combine the results?

I suspect we can make explicit this behavior using the current AQL syntax, using the containment definition in the FROM clause. In theory, it is not needed to have it defined as an independent archetype. So, this shoulb be a valid AQL:

SELECT 
c/uid/value as compid 
FROM EHR e 
CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.encounter.v1] 
  CONTAINS OBSERVATION o[openEHR-EHR-OBSERVATION.symptom_sign_screening.v0] 
    CONTAINS CLUSTER c
WHERE c/items[at0004]/value/value='Depression' 
      AND c/items[at0005]/value/value='Yes'

Observe that here we will have a problem if the OBSERVATION archetype contains different internal CLUSTERs. We should be able to differentiate the specific CLUSTER we want to work with.
It could be done with the following expression:

-- According to the specifications, this is a perfectly valid predicate
-- although I have never seen it in use
CONTAINS CLUSTER c[at0022]

or it could be done at the WHERE clause:

-- I think I've never seen this kind of filter but, again, it seems syntactically valid
WHERE c[at0022]
      AND c/items[at0004]/value/value='Depression' 
      AND c/items[at0005]/value/value='Yes'

I don’t know if any of those possibilities is currently supported by implementations.

siljelb · 10 November 2022 10:15

damoca:

Observe that here we will have a problem if the OBSERVATION archetype contains different internal CLUSTERs. We should be able to differentiate the specific CLUSTER we want to work with.
It could be done with the following expression:
-- According to the specifications, this is a perfectly valid predicate
-- although I have never seen it in use
CONTAINS CLUSTER c[at0022] 
or it could be done at the WHERE clause:
-- I think I've never seen this kind of filter but, again, it seems syntactically valid
WHERE c[at0022]
      AND c/items[at0004]/value/value='Depression' 
      AND c/items[at0005]/value/value='Yes' 
I don’t know if any of those possibilities is currently supported by implementations.

Yep, this is what I was wondering about. Wouldn’t this be needed to be able to do what @chaya wanted?

ian.mcnicoll · 10 November 2022 11:56

@Seref - the issue here is that the Cluster concerned is ‘internal’, and not an independent CLUSTER archetype.

I’m not aware of ant current implementations allowing CONTAINS on non-root nodes e.g. CONTAINS CLUSTER cls [at002] as per yours and @siljelb suggestions.

I’ll try to test the behaviour when the CONTAINS is applied at CLUSTER archetype level (lab analytes being a good example).

Does the Ocean CDR support this, and if not, what would be your expectation of the behaviour of the query that lacks an explicit ‘contains cluster’ statement.

Right now it looks as if Better CDR ‘assumes’ that a default contains exists i.e that if a cluster is referred to in the where clause, there conditions apply to each cluster instance, not across every instance

That feels to be a reasonable assumption for clinical data but appreciate that might not align with a technical querying perspective.

OTOH ehrBase seems to assume the absence of an explicit CONTAINS (support for which may not exist, at least on non root nodes). That may be technically more correct but I suspect is unhelpful in the majority of clinical querying.

This is quite an important topic so we should definitely discuss ASAP in SEC to provide clear guidance.

chaya · 10 November 2022 11:56

@damoca Thanks.
True, I had tried node predicate with name/value criterion. But looks like it is not supported by EHRBase.

chaya · 10 November 2022 12:00

Thanks @ian.mcnicoll
This is a major issue for us and a quick resolution would help.

ian.mcnicoll · 10 November 2022 12:04

Did you use the approach that @Sijle suggested? i.e apply the rename to the presence node? - and can you share your original template as a template fileset as this makes it much easier to play around with?

ian.mcnicoll · 10 November 2022 12:09

I would say at least according to @Seref 's advice, that is definitely wrong. There is an explicit CONTAINS CLUSTER statement that, at least in Seref’s view, should cause the where clause to applied internally within that cluster instance only.

I’m less clear what should happen if there is no explicit CONTAINS

ian.mcnicoll · 10 November 2022 12:15

Otherwise, I would think it would return any instance of the immediate parent archetype where any instance of the cluster corresponds to the first part of the WHERE clause, and any, potentially other, instance of the cluster corresponds to the secord part of the WHERE clause?

I think that is the key question and while I understand that might be technically correct, I think it is pretty counter-intuitive and I’m struggling to think of examples where it should apply.

If we removed CONTAINS clauses and performed full path queries, would we expect cross-instance container queries at let’s say OBSERVATION level on systolic and diastolic elements?

Seref · 10 November 2022 12:27

That’s irrelevant at the data level. The CLUSTER is a data item sitting in a COMPOSITION It’s semantics at the modelling level do not exist at the data level. You can only stick a cluster archetype where it’s allowed in the RM, so the personal history of the CLUSTER makes no difference (at least to me) Happy to hear if you disagree (or maybe not…)

Once again we hit another vendor choice. Fair enough, in that case, you may be in trouble because you cannot assign an alias to a path segment (as per syntax) and you lose the ability to imply it’s the same data instance that should conform to criteria. As in:

... CONTAINS INSTRUCTION i[bla] ...
... WHERE i/path/segments/all_/the/way/to/cluster/and_/its/element/value = 'foo1'
          and
          i/path/segments/all_/the/way/to/cluster/and_/its/element/value = 'foo2'

The problem above is this composition can have > 1 CLUSTERs and there’s nothing that says “a cluster should satisfy both conditions” instead, this query says “fetch me the composition that contains a cluster with criteria 1 (foo1) and also contains cluster with criteria2 (foo2)” If a composition has two clusters, each satisfying one criteria, you’ll end up selecting it, which is not what you want. (I used the underscores in the paths above to disable syntax colouring of discourse, ignore them)

Unless you introduce some mechanism that’s not in the commonly implemented subset of AQL, your best bet is to fetch these and filter the query results at the application layer.
Implementers can do different things by interpreting the same syntax in a different way, but I’m always conservative when it comes to language semantics, so I cannot see an aql level solution unless the implementation supports CLUSTER in the FROM clause

siljelb · 10 November 2022 12:47

I’m not sure I agree it’s counter-intuitive. IMO the best solution for this generic problem would be if the FROM clause supported internal clusters, both with and without a path or at-code.

Seref · 10 November 2022 13:00

I agree. Sorry, I should have given more focus to your earlier comment, I only responded to its first sentence, then posted something that agrees with the remaining part of it.

You’re right (at least at the same camp with me ) in your interpretation (so we’re right).

Re your suggested solution: vendors keep the supported types in FROM to a limited subset because for most technologies used to implement AQL, there is a significant cost (processing, memory, disk space) to supporting more types, or at-code-free queries. It is also hard to see where this’d stop, i.e. other types being promoted to FROM clause.

chaya · 10 November 2022 16:25

No not yet, Please find attached the template native format.
sample_questionnaire.en.v0.t.json (117.0 KB)
The archetype used are
openEHR-EHR-COMPOSITION.encounter.v1 and
openEHR-EHR-OBSERVATION.symptom_sign_screening.v0

bna · 14 November 2022 13:01

A long thread. I will explore this on our CDR. Do we have some example composition to validate against?

chaya · 14 November 2022 13:03

Hope this helps.

ian.mcnicoll · 14 November 2022 13:17

@bna - I confirmed the behaviour on ehrBase and that Better CDRs handles this as if there was CONTAINS[at0022] clause.

@siljelb - I don’t really agree that this AQL is not using the archetype ‘as -designed’ - the query should work without needing to add the name/value as you suggested. I would probably have tried that s a workaround but it should not be necessary.

bna · 14 November 2022 19:14

I added the two compositions and runned the AQL as provided by @chaya above

SELECT
   c/uid/value as compid
FROM
   ehr e
      CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.encounter.v1]
         CONTAINS OBSERVATION o[openEHR-EHR-OBSERVATION.symptom_sign_screening.v0]
WHERE
   o/data[at0001]/events[at0002]/data[at0003]/items[at0022]/items[at0004]/value/value = 'Depression' and
   o/data[at0001]/events[at0002]/data[at0003]/items[at0022]/items[at0005]/value/value = 'Yes' and
   c/archetype_details/template_id/value = 'sample_questionnaire.en.v0'
ORDER BY c/context/start_time/value DESC

Yes will return one row and No will return one row. As expected.
I am not sure if I understand what the problem is. This seems like a trivial use-case for a CDR.

bna · 15 November 2022 19:44

Seems like we have this feature…