Safety features in AQL: subject

birger.haarbrandt · 12 November 2019 17:29

Hi everybody,

we recently discussed topics regarding use-cases in transplant medicine. This is a domain where the subject in entries is quite important to distinguish between donors and receivers. For example, lab values may need to be compared to create a Banff classification. However, within AQL, values from donor and receiver cannot be easily distinguished, as this would require an explicit statement about the subject of care within the AQL statement.

Hence, I would like to re-open the discussion on whether subject “self” should be the default, so that paths belonging to entries about the donor (or fetus etc.) are only retrieved when this is stated explicitly, or whether there should be a way to provide information within the result set when a particular value is associated with a subject of care other than “self”.

My favorite solution would actually be, as discussed during the last(?) SEC, that a subject other than “self” changes the path in a way that it cannot be confused.

Would be great to get some feedback on this!

pablo · 12 November 2019 18:48

Hi Birger,

It would be nice to have more details about the use case, like details about the data that should be modeled.

That said, I don’t see this as something specific to AQL: depending on how data is modeled in OPTs, the queries used to get the related data might change.

Also, looking at the RM, it can easily accommodate data that is not directly about the patient (PARTY_SELF) in the patient’s EHR.

Each entry has subject: PARTY_PROXY https://specifications.openehr.org/releases/RM/latest/ehr.html#_entry_and_its_subtypes
PARTY_PROXY concrete type could be different than PARTY_SELF, with two options: PARTY_IDENTIFIED or PARTY_RELATED https://specifications.openehr.org/releases/RM/latest/common.html#_overview_3

So I guess data about the donor could be added to the EHR of the receiver using one of those types in the corresponding ENTRY.subject. When that is queried, we need some kind of type checker that I think we lack in AQL, like:

SELECT … FROM … COMPOSITION c CONTAINS ENTRY e WHERE e/subject IS PARTY_IDENTIFIED and e/subject/identifiers matches {…}

or

SELECT … FROM … COMPOSITION c CONTAINS ENTRY e WHERE e/subject IS NOT PARTY_SELF and e/subject/identifiers matches {…}

We need the type checker here because PARTY_PROXY is not archetypable, so we can’t constrain it via CONTAINS, nor can we use an archetype_id to verify its type.
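To make the idea concrete, here is a minimal Python sketch of what such a type check would have to do when applied outside AQL, to entries serialised as canonical JSON (where each object carries a `_type` field). The `IS` operator above is a proposal, not existing AQL. The names `RM_PARENT`, `is_a` and `subject_is` are hypothetical, the sample entries are made up and heavily simplified, and the parent table only covers the PARTY_PROXY branch of the RM.

```python
# Hypothetical sketch: emulate "e/subject IS <type>" on canonical-JSON entries.
# Only the PARTY_PROXY hierarchy is modelled; PARTY_RELATED inherits from
# PARTY_IDENTIFIED, so a type check has to follow the inheritance chain.
RM_PARENT = {
    "PARTY_SELF": "PARTY_PROXY",
    "PARTY_IDENTIFIED": "PARTY_PROXY",
    "PARTY_RELATED": "PARTY_IDENTIFIED",
}


def is_a(rm_type, expected):
    """Return True if rm_type equals expected or inherits from it."""
    while rm_type is not None:
        if rm_type == expected:
            return True
        rm_type = RM_PARENT.get(rm_type)
    return False


def subject_is(entry, expected):
    """Type check on the (simplified) subject of one entry."""
    return is_a(entry["subject"]["_type"], expected)


# Made-up, heavily simplified entries (illustration only).
entries = [
    {"name": "mother_bp", "subject": {"_type": "PARTY_SELF"}},
    {"name": "fetal_heart_rate", "subject": {"_type": "PARTY_RELATED"}},
]

# Equivalent of "WHERE e/subject IS NOT PARTY_SELF"
not_self = [e["name"] for e in entries if not subject_is(e, "PARTY_SELF")]
```

Note that a check for PARTY_IDENTIFIED would also match PARTY_RELATED subjects here, which is probably what a query author expects, but an actual operator would need to specify this.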

I’m not sure if this is the issue you raised, there is info missing, but this analysis raises the issue about the impossibility of querying some data appropriately.

Seref · 12 November 2019 19:03

My high level understanding is you want some convenience to query records based on subject of care. I’d have strong objections to your first suggestion, that by default entries with subject == self are retrieved and others are ignored unless stated. That is a conditional behaviour for a query language based on actual value of data and in my opinion that should not be implemented.

Why not SELECT the subject in the result set? You’d have a column that either says ‘self’ or something else. You could use the WHERE clause to filter ‘self’ in or out. Wouldn’t that do what you’re asking for?

pablo · 12 November 2019 19:22

I agree with Seref, also post-processing of the results is always possible.

As a general comment, when trying to analyze data or generate some kind of report, post-processing of query result sets is almost unavoidable. That case might be more frequent than we think, since I doubt we can do everything we need in AQL to get exactly the result set we need to operate on. Some kind of middleware will be needed on top of those results.
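As a rough illustration of that kind of middleware, the following Python sketch assumes the query also selected the RM type of ENTRY.subject as an extra column. The row layout, the column name `subject_type` and the function `split_by_subject` are assumptions made for this example, and the values are made up.

```python
# Hypothetical result rows: one leaf value plus the RM type of ENTRY.subject.
rows = [
    {"analyte": "creatinine", "value": 1.1, "subject_type": "PARTY_SELF"},
    {"analyte": "creatinine", "value": 0.8, "subject_type": "PARTY_IDENTIFIED"},
    {"analyte": "creatinine", "value": 1.3, "subject_type": "PARTY_SELF"},
]


def split_by_subject(rows):
    """Partition rows into (about the patient, about somebody else)."""
    own, other = [], []
    for row in rows:
        if row["subject_type"] == "PARTY_SELF":
            own.append(row)
        else:
            other.append(row)
    return own, other


own_rows, other_rows = split_by_subject(rows)
own_values = [r["value"] for r in own_rows]
other_values = [r["value"] for r in other_rows]
```

A generic patient chart would then plot only `own_values`, while a transplant view could show both groups side by side.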

birger.haarbrandt · 12 November 2019 20:08

Hi Seref,

I think the issue here is that this will require actively filtering out non-self subjects in every single query, at least when we let users generate the AQL queries. This might also cause problems if we want to show the latest lab values, diagnoses etc. on a generic patient chart.

The problem here is that we are talking about expectations. As openEHR is working with a patient-centric architecture, users/developers can be surprised by the results when entries are actually not about the patient. So being explicit (or cautious) here can help to prevent errors in my opinion.

Using the WHERE clause is imo not a good solution as you would have to apply it a lot to always be sure that you are not retrieving wrong (=unexpected) data.

Seref · 12 November 2019 20:47

Isn’t the unusual/unexpected thing here putting data that does not belong to the patient into the patient’s EHR?
A query language queries data, that’s it. If the data is in the repo, it would return it.

Can’t you create a dedicated ehr for whatever data that does not belong to patient A and provide references to that data from the EHR you’re accessing?

AQL should not consider any semantics above the RM. You want RM data, it gives you RM data. If we allow higher level semantics to leak into AQL, there is no turning back from that.

I’m trying to think about some feature at the CDR level, above AQL but even then this feels wrong because it is not something above the data layer (such as user name of the querying user etc), it is literally a particular value in the RM.

I guess my point is that consistency is always more important than convenience, and convenience is not always available.

birger.haarbrandt · 12 November 2019 21:09

Well, there is nothing you can do in transplant medicine or obstetrics to prevent putting these things inside a patient’s EHR. I guess this is exactly the reason why this is supported by openEHR (and FHIR).

Given the paradigm that AQL should not contain semantics outside the RM (which is reasonable), then we might need to think about a change to the RM. This is similar to the issue in CDA where a flag somewhere in the tree can change the meaning of the data entirely (mood-codes…). This is not a good design.

The point @pablo raised regarding post-processing is not clear to me: when I only got the values from the leaves, there is no way to know if these values belong to a subject that is different from the patient.

To clarify, this is primarily not about convenience, it is about patient safety.

varntzen · 12 November 2019 22:01

As a non-techie, I can support Birger in his concern.

I understand the use case: documenting observations and evaluations for others than the patient is a valid and not uncommon use case. Typically an unborn child, which normally won’t have a record itself. Everything is documented in the mother’s EHR, even some time after birth. Also, as Birger explains, it will happen with data belonging to a donor (it could turn out that the donor was CMV positive, which can have severe effects on the recipient).

This is similar to the EVALUATION.family_history, where an internal CLUSTER with information belonging to family members is stored. If you nest, for example, a CLUSTER for lab results in a SLOT within that internal cluster, you will fetch results from the fetus by the path, but you can’t differentiate results from the fetus and the mother if you do a query on the lab-result archetype “overall”.

If this is the case Birger is raising, it is a valid question and has nothing to do with convenience in querying.

pablo · 13 November 2019 04:14

The context of queries is the EHR: data is returned for each EHR, independently of the entry.subject type.

We are really missing some specification around your use case.

pablo · 13 November 2019 04:21

Queries need to return what you need to satisfy your requirements. If leaves are not enough, you need to return the full entries, or even the full compositions.
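A minimal Python sketch of that approach, assuming the full composition comes back as canonical-JSON-like nested dicts and lists: the walker tags each DV_QUANTITY leaf with the type of the nearest enclosing `subject`. The function name `leaves_with_subject` is hypothetical, and the sample composition is made up and far simpler than real RM structures.

```python
def leaves_with_subject(node, subject_type=None):
    """Yield (magnitude, subject_type) for every DV_QUANTITY found under node,
    where subject_type is the _type of the nearest enclosing 'subject'."""
    if isinstance(node, dict):
        subject = node.get("subject")
        if isinstance(subject, dict) and "_type" in subject:
            subject_type = subject["_type"]
        if node.get("_type") == "DV_QUANTITY":
            yield node["magnitude"], subject_type
        for key, child in node.items():
            if key != "subject":
                yield from leaves_with_subject(child, subject_type)
    elif isinstance(node, list):
        for child in node:
            yield from leaves_with_subject(child, subject_type)


# Made-up, heavily simplified composition (illustration only).
composition = {
    "_type": "COMPOSITION",
    "content": [
        {
            "_type": "OBSERVATION",
            "subject": {"_type": "PARTY_SELF"},
            "data": {"items": [{"value": {"_type": "DV_QUANTITY", "magnitude": 120.0}}]},
        },
        {
            "_type": "OBSERVATION",
            "subject": {"_type": "PARTY_RELATED"},
            "data": {"items": [{"value": {"_type": "DV_QUANTITY", "magnitude": 140.0}}]},
        },
    ],
}

tagged = list(leaves_with_subject(composition))
```

The cost is the one raised earlier in the thread: the client has to fetch much more than the leaves it actually needs.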

Seref · 13 November 2019 09:43

I understand your point re safety but the risk to patient safety exists due to the way the modeller created the models. The potential safety issue then exists only in the context of the use of that model, not across the unbounded number of models openEHR is designed to handle. This is something that needs to be handled at the application level, and the use of a WHERE clause in AQL is one suggestion to do that.

I’ll now be even more annoying and say that changing the RM is just another case of pushing some semantics from the second level of two level modelling down and is not a good approach.

Given openEHR’s design, the RM is an attempt to generalise all data types that can arise in healthcare. Your requirement here is a specific case that arises in one particular domain, and it is meant to be handled by archetypes/templates by design. Handling that at AQL and RM level, both of which are data layer concepts, makes all sorts of architectural integrity/consistency alarms go off in my head.

So given my objection based on the architectural principles, my suggestion would be to discuss constraining the RM so that you don’t have to add clinical domain cases to data types or queries, i.e. finding a way of constraining the models to handle your requirement. The discussion should focus on ADL and templates/archetypes. I really don’t think AQL or RM is the right level to address this requirement, but we have a lot of clever people around and I’d be delighted to be proven wrong here.

matijap · 13 November 2019 13:07

I was about to propose a slight hack but then I discussed with Fabio and we came to a conclusion that it is actually a big hack and runs orthogonal (in a bad way) to the existing attribute subject that Pablo mentioned. My (bad) idea was to specialise the archetypes for the (probably rare) things that we do measure for another subject (like donor), for instance to have a “donor blood pressure” archetype as a specialisation of “blood pressure”. When querying, it is mandatory to specify the archetypes and usually it is not done in a way that would return data for the entire hierarchy of archetypes, so that would work well. But it is a hack and in the same way it would simplify some generic things (like charts), it would complicate others (like the entry forms for donor blood pressure which could then not be exactly the same ones as entering blood pressure in other settings).

Apart from spreading awareness of this semantically important attribute I’m not sure what the solution would be. (And I’m sure that even our own (=Better by Marand) team doing the medications prescription and application tracking software disregards that attribute and cannot distinguish between medications that need to be applied to a woman and those that need to be applied to her newborn child…)

birger.haarbrandt · 13 November 2019 14:39

@Seref: I totally disagree here. When the openEHR RM provides an attribute and this attribute is used as intended, it is not fair to say that this is a modelling problem. This is clearly a problem within the RM and AQL.

@pablo: I think the use-case is sufficiently clear; at least the answers provided so far are definitely addressing the issue I raised.

@matijap: I had something similar in mind, though it would add another node in the RM path under the entry which needs to be explicitly defined (self, fetus etc.). This way, it is not possible to confuse the subjects. The drawback would be that this is a breaking change for the RM and it would be necessary to have more projections inside the query to get all values from all subject types.

Would be great to get some more feedback. To me it sounds like clinical people like @varntzen tend to agree on the need.

pablo · 13 November 2019 15:53

@birger.haarbrandt the requirement is not 100% clear for me. I don’t see the whole picture because your question was focused on AQL, but maybe AQL is not the solution to tackle the specific functionality you need, or maybe it is a combination of different things. To understand the whole problem and come up with a good solution I need to understand the whole use case.

Seref · 13 November 2019 18:13

@birger.haarbrandt I’m not the one saying that this is a problem. I don’t think I even used the word. I’m merely suggesting that what you’re describing as a problem is not at the level you’re trying to fix it at.

I insist this should be solved above AQL, but maybe a solution similar to how relational databases support changes to execution behaviour may be adopted as an approach in the middle.

That is, a CDR can accept an explicit statement during execution of an AQL query or above the query text itself such as
SET SUBJECT_SELF_FILTER ON

This is configuring CDR behaviour from AQL execution context and CDR can either support or not support this. But it is much better than AQL execution acting different just to address a specific case. If the server does not support it, you get an error. It is explicit and potentially millions of different queries don’t end up paying the price of the performance hit.

Would this be an acceptable solution?

edit: You know what? Forget it. This is a bad idea. Leaving it as an example of a bad idea. Are we then supposed to introduce flags/settings for every similar use case? @varntzen is talking about the case in Evaluations and clusters. Then what? Another flag for that? Do we change AQL execution for that as well? Who’ll keep track of this?

I repeat my initial suggestions: write the where clauses or separate that data to another model/ehr and use references. We can do some clever stuff with resolving the references to help but that’s another topic.

thomas.beale · 13 November 2019 19:05

Note that it is normal for data with subject=someone else to be in a patient’s EHR - most common case is foetal heartbeat, ultrasound etc for pregnant woman; organ transplant data is probably the second most common case.

thomas.beale · 13 November 2019 19:10

Birger,
Just to get some clarity, what is your ideal behaviour if no special WHERE condition is stated on ENTRY.subject? That only subject=self ENTRYs are returned?

pablo · 13 November 2019 21:32

Hi Thomas, check my first comment here: is it worth having an operator in AQL to check RM types for parts that are polymorphic and not archetypable? PARTY_PROXY is just one example.

birger.haarbrandt · 13 November 2019 22:08

I think we have several options here. Though these are only educated guesses:

1. Queries only return results for subject=self when no WHERE condition on subject is provided. When there is a WHERE condition on subject (e.g. “fetus”), only this explicitly stated condition should be used. It might also make sense to add a wildcard to get data for any subjects.
2. We could also think about an additional subject column as part of the result set that is automatically added when any data is provided that is not subject self. Though I think this would affect performance because this would require an additional lookup for each entry and this would not work for aggregation queries.
3. The AQL syntax could be enhanced so that every single query needs to contain an explicit WHERE condition for the targeted subject. This would avoid mistakes but create lots of boilerplate code in most queries.
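The intended behaviour of the first option can be pinned down with a small Python sketch over in-memory entries. This only illustrates the proposed semantics and is not how any existing AQL engine or CDR behaves. The names `ANY_SUBJECT` and `select_entries`, the plain-string `relationship` field and the sample entries are all assumptions made for this example.

```python
ANY_SUBJECT = "*"  # hypothetical wildcard meaning "all subjects"


def select_entries(entries, subject=None):
    """Proposed default: without an explicit subject, only PARTY_SELF entries."""
    if subject == ANY_SUBJECT:
        return list(entries)
    if subject is None:
        return [e for e in entries if e["subject"]["_type"] == "PARTY_SELF"]
    # Explicit subject: only entries whose (simplified) relationship matches.
    return [e for e in entries if e["subject"].get("relationship") == subject]


# Made-up, heavily simplified entries (illustration only).
entries = [
    {"id": "a", "subject": {"_type": "PARTY_SELF"}},
    {"id": "b", "subject": {"_type": "PARTY_RELATED", "relationship": "fetus"}},
    {"id": "c", "subject": {"_type": "PARTY_RELATED", "relationship": "donor"}},
]

default_ids = [e["id"] for e in select_entries(entries)]
fetus_ids = [e["id"] for e in select_entries(entries, "fetus")]
all_ids = [e["id"] for e in select_entries(entries, ANY_SUBJECT)]
```

With these semantics a generic chart that never mentions the subject would see only entry `a`, and donor or fetus data would appear only when asked for.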

I guess the important part for you is to get the idea. Maybe the solutions are bad, but they make clear what system behavior I would like to avoid in the future.

pablo · 13 November 2019 22:37

I don’t think 1. is true; queries actually return what is in the EHR, which was my previous comment about the context being the EHR. If you set ENTRY.subject to something other than PARTY_SELF and you query the EHR, you will get those entries too, and I think this is correct, because it is information that is important to the EHR of the patient even though it might not be directly about the patient. Another example would be knowing whether a parent has any diseases that could be inherited: that is info about the parent that is important for the EHR of the child.

For 2. you can actually return the full ENTRY so you can check the subject on post-processing if needed.

I think adding a type check is enough. I don’t think some kind of scoping to constrain the results adds much value, and it might be too specific for a small set of use cases. Knowing there are alternative solutions, this might not be the best design for a general purpose query language. Also, type checking operators can be used in other cases, as mentioned before: when there is polymorphism and the data is not archetypable. In other cases you can just filter with CONTAINS by archetype ID, which is like a type check.