Problem selecting feeder_audit ids in AQL for Ehrbase

Chrizi · 22 January 2024 13:12

Hi, we are having trouble running the following AQL in EHRbase posting to rest endpoint /ehrbase/rest/openehr/v1/query/aql:

SELECT c/feeder_audit/originating_system_item_ids/id FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.encounter.v1]

We receive rows with “null” as response from EHRbase .

But running the same query in EHR Studio | Better gives the expected result, i.e. rows with the values of id. Has anyone else run into a similiar problem or know about any wokround?

The composition starts like this:

{
“_type”: “COMPOSITION”,
“name”: {
“_type”: “DV_TEXT”,
“value”: “Treatment X”
},
“archetype_details”: {
“archetype_id”: {
“value”: “openEHR-EHR-COMPOSITION.encounter.v1”
},
“template_id”: {
“value”: “Treatment_x.v2”
},
“rm_version”: “1.0.4”
},
“feeder_audit”: {
“_type”: “FEEDER_AUDIT”,
“originating_system_item_ids”: [
{
“_type”: “DV_IDENTIFIER”,
“id”: “2640706”,
“type”: “IdDescriptionName”
}
],
“feeder_system_item_ids”: [
{
“_type”: “DV_IDENTIFIER”,
“id”: “SystemName”,
“type”: “mapper.name”
}
],
…

vidi42 · 22 January 2024 13:50

Hey @Chrizi

Querying the feeder audit in EHRbase is indeed limited. However we are improving this with the new AQL engine version of EHRbase that we’re working on and should be available in the near future.

You can see here a preview of it from the OpenEHR NL Conference https://youtube.com/clip/Ugkx0bl0mnvx2Ru3TlbvWhsBPd_D4c51x9jG?si=snIb1Z5Xt8rlGiVI

Chrizi · 29 January 2024 09:25

Hi @vidi42 ,

We are experiencing a similar problem when trying to retrieve a value that is part of a list object outside of feeder audit. Can the following be the same problem?

This query returns 0 rows in EHRBase, but gives the expected value of “time/value” in Better sandbox:

SELECT c/content[openEHR-EHR-ACTION.medication.v1]/time/value FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.encounter.v1]

However, the following query works and returns the expected time/value in EHRBase as well:

SELECT a/time/value FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS ACTION a[openEHR-EHR-ACTION.medication.v1]

Example composition and template here:

joshua.grisham · 29 January 2024 14:05

Hi @vidi42 the example in the clip you linked to looks like it will work 100% with this use case we are looking at right now, that is great!

Regarding @Chrizi 's second question I think it is semi related but maybe with FEEDER_AUDIT you are avoiding the problem in using CONTAINS now, however, just to be sure…

I think the original problem as well as the 2nd problem seem to mostly be rooted in the fact that these are arrays of elements we are dealing with as opposed to a cluster+items or list of archetypes (meaning, with a specific archetype type or archetype_node_id, which is the case that does seem to work well with arrays).

I have tried to troubleshoot this now a bit by turning on full logging in the database and examining the actual SQL query that is executed against the database when trying different type of AQL queries. Basically what I have seen is that as long as you are querying against arrays of items which have an archetype_node_id (and/or maybe with the attributes “name” and “value” – the “name_value_predicate” case that can be found in your code?) then the query generated against the database seems to be handling it in the right way (specifically that it is eventually using the PostgreSQL function jsonb_array_elements to first parse and split the array into elements before then trying to extract the values), but if you can’t specify an archetype_node_id in the query (and/or maybe this “name_value_predicate” case which does not match List<DV_IDENTIFIER> of things like originating_system_item_ids either), then it is not doing anything with jsonb_array_elements but instead trying to execute jsonb_extract_path_text directly against the array itself (which will never match anything, I think?).

I have tried a lot of different “hacks” now to try and force the path to be interpreted as an array, for example to SELECT with the following different variations:

c/feeder_audit/originating_system_item_ids/id - (same as original example) rows are returned but values are null
c/feeder_audit/originating_system_item_ids[type='SomeSystemId']/id - rows are returned but values are null
c/feeder_audit/originating_system_item_ids[DV_IDENTIFIER]/id - ParseCancellationException from API
c/feeder_audit/originating_system_item_ids[DvIdentifier]/id - ParseCancellationException from API
c/feeder_audit/originating_system_item_ids[*]/id - ParseCancellationException from API
c/feeder_audit/originating_system_item_ids[0]/id - ParseCancellationException from API
c/feeder_audit/originating_system_item_ids[]/id -ParseCancellationException from API
etc etc

All with no real success, unfortunately.

When I monitor the logs from the database at the same time, it never seems to succeed in interpreting this path as a JSON array (never tries to use jsonb_array_elements to handle the array itself).

Here is the database query I am seeing from the original example:

select
  jsonb_extract_path_text(cast("composition_join"."feeder_audit" as jsonb), 'originating_system_item_ids', 'id') as "/feeder_audit/originating_system_item_ids/id"
from "ehr"."entry"
  right outer join "ehr"."composition" as "composition_join"
    on "composition_join"."id" = "ehr"."entry"."composition_id" 
  right outer join "ehr"."ehr" as "ehr_join"
    on "ehr_join"."id" = "composition_join"."ehr_id"
where
  ("ehr"."entry"."template_id" = $1 and ("ehr_join"."id" = 'bbe596e6-567d-44e9-871e-b718a66f8beb'))

If I take my second variant (c/feeder_audit/originating_system_item_ids[type='SomeSystemId']/id) then the query becomes like this:

select 
  array_239963547_43.var_239963547_44 as "/feeder_audit/originating_system_item_ids[type='SomeSystemId']/id" 
from "ehr"."entry" 
  right outer join "ehr"."composition" as "composition_join" 
    on "composition_join"."id" = "ehr"."entry"."composition_id" 
  right outer join "ehr"."ehr" as "ehr_join" 
    on "ehr_join"."id" = "composition_join"."ehr_id" 
  left outer join lateral 
    (select 
      (select jsonb_extract_path_text(cast("composition_join"."feeder_audit" as jsonb), '/originating_system_item_ids[type=''SomeSystemId'']', '/id') as "/feeder_audit/originating_system_item_ids[type='SomeSystemId']/id" 
      where 
        cast(jsonb_extract_path_text(cast("ehr"."js_composition"(cast("composition_join"."id" as uuid), cast('local.ehrbase.org' as text)) as jsonb), 'type') as varchar) 
          =  cast('SomeSystemId' as varchar)) as "var_239963547_44") as "array_239963547_43"
    on $1 
where 
  ("ehr"."entry"."template_id" = $2 and ("ehr_join"."id" = 'bbe596e6-567d-44e9-871e-b718a66f8beb'))

(a lot more complex but still never parsing the array via jsonb_array_elements so it will still not match the array items correctly, plus it seems that it is taking the full syntax including the constraint value as plain text of the field name? field name it is looking for is originating_system_item_ids[type=''SomeSystemId''] instead of originating_system_item_ids?)

If I execute an AQL query against an array which has archetype types / node IDs then you see a totally different kind of query being generated (which does execute jsonb_array_elements (your wrapped version of it, anyway)):

SELECT c/context/other_context/items[openEHR-EHR-CLUSTER.organisation.v1]/items[at0001] FROM EHR e[ehr_id/value='bbe596e6-567d-44e9-871e-b718a66f8beb'] CONTAINS COMPOSITION c

becomes:

select (array_77380713_39.var_77380713_40)::TEXT as "/context/other_context/items[openEHR-EHR-CLUSTER.organisation.v1]/items[at0001]" from "ehr"."entry" right outer join "ehr"."composition" as "composition_join" on "composition_join"."id" = "ehr"."entry"."composition_id" join "ehr"."event_context" on "ehr"."event_context"."composition_id" = "ehr"."entry"."composition_id" right outer join "ehr"."ehr" as "ehr_join" on "ehr_join"."id" = "composition_join"."ehr_id" join lateral (
          select jsonb_extract_path_text(
            ehr.xjsonb_array_elements(cast(jsonb_extract_path(
              cast("ehr"."js_context"("ehr"."event_context"."id") as jsonb),
              'other_context',
              '/items[openEHR-EHR-CLUSTER.organisation.v1]'
            ) as jsonb)),
            '/items[at0001]'
          )
         AS var_77380713_40) as "array_77380713_39" on true where ("ehr"."entry"."template_id" = $1 and ("ehr_join"."id" = 'bbe596e6-567d-44e9-871e-b718a66f8beb'))

I think this then becomes the key question for me now: how will this work in general with arrays in the new release of EHRbase? Is there a new way to specify these paths when the array items do not have an archetype type or node id, and/or when querying directly on the path without constraining the array (via []) will it return a row in the AQL response for each item in the array?

Good that we will be able to use CONTAINS on what seems like any object type now (including FEEDER_AUDIT) which seems to handle this well (based on the clip) but I assume that there will be other cases where one might still need to do something against a path without using CONTAINS where there could be arrays of elements like this?

birger.haarbrandt · 29 January 2024 15:21

Hi @joshua.grisham,

first of all, kudos for digging that deep and make such comprehensive analysis!

This case should as well be covered by the new AQL engine. We might initially not support all possible predicates though. This means that, if possible, you should filter using the WHERE clause instead of the predicate for this feeder_audit example:

c/feeder_audit/originating_system_item_ids[type='SomeSystemId']/id

However, it is mostly a matter of prioritization to add support for these fields as well at a later point in time.

ian.mcnicoll · 29 January 2024 15:22

It would be good to try to get a wee bit of cross-CDR consensus on which set of predicates to support as a priority?