Safety features in AQL: subject

bna · 25 November 2019 08:31

Coming late to this topic is hard. So many discussions going back and forth…

I want to bring two other possible solutions to the table:

Solution 1: A FOLDER which links in data from the donor
AFAIK the problem here is to collect and store data belonging to both the donor and the recipient. They are different persons. In normal circumstances the data would belong to different EHRs.

For such a use-case I am thinking about storing the data in the EHR to which it belongs, and then using a FOLDER to collect data from the related EHRs. The solution is to create a FOLDER in the recipient EHR which links in data from the donor.

Solution 2: Change the defaults of AQL
Since all ENTRY instances have a subject and we normally expect PARTY_SELF (the EHR) to be the subject - let’s make the default of AQL be to return only entries whose subject is PARTY_SELF. Applications that need other kinds of subject can query for specific subjects, or for all of them.

This solution is the same as the way we handle REPORT compositions in DIPS EHR Store. Compositions with the report category are not returned by default. You have to query for them.
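A minimal sketch of what Solution 2 would mean for a client, written in Python rather than AQL. The `Entry` class and the `select_entries` function are hypothetical stand-ins and not part of any openEHR API; the only point illustrated is "PARTY_SELF unless the caller asks for more".

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

PARTY_SELF = "PARTY_SELF"
LAB = "openEHR-EHR-OBSERVATION.laboratory_test.v1"


@dataclass(frozen=True)
class Entry:
    """Hypothetical, heavily simplified stand-in for an RM ENTRY."""
    archetype_id: str
    subject_type: str  # RM type name of ENTRY.subject
    value: str


def select_entries(entries: Iterable[Entry],
                   archetype_id: str,
                   subject_types: Optional[Set[str]] = None) -> List[Entry]:
    """Return entries of one archetype, defaulting to subject = PARTY_SELF.

    Callers that want other subjects must pass subject_types explicitly.
    """
    wanted = {PARTY_SELF} if subject_types is None else subject_types
    return [e for e in entries
            if e.archetype_id == archetype_id and e.subject_type in wanted]


# Illustrative data only: one recipient result and one donor result.
records = [
    Entry(LAB, PARTY_SELF, "recipient creatinine"),
    Entry(LAB, "PARTY_IDENTIFIED", "donor creatinine"),
]

default_result = select_entries(records, LAB)
everything = select_entries(records, LAB, {PARTY_SELF, "PARTY_IDENTIFIED"})
```

With this default, an existing client that never mentions the subject keeps seeing only the recipient's data, while a transplant application opts in to the donor entries.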

I have not been able to read through all the comments in the thread, so forgive me if someone has suggested the same solutions before.

thomas.beale · 25 November 2019 11:27

I’ll just point out that there is a reason we have ENTRY.subject as an attribute in the model - it’s precisely to handle the case when the subject of the data in the Entry isn’t the same as the subject of the record, even though the Entry is relevant to the subject of the record.

This was in openEHR from the earliest days - based on wide clinical consultation - to support representation of:

foetal heart rate measurements (in the mother’s record)
donor organ data (in the recipient’s record)
disease information of an infectious carrier (in the record of a newly infected patient)
family history information

Although in many systems PARTY_SELF is likely to be the subject, in many others - those specialising in any of the above kinds of data - it is just as likely that subject = foetus, donor or some other person.

I think taking such information out of the patient’s EHR would clearly compromise care, or just render it impossible.

Personally, I think we can solve the challenge of making querying work as intuitively expected but also be ‘easy’ - with some generic additions in query tooling.

It is not impossible to imagine also / alternatively maintaining some index object in the EHR that contains pointers to ENTRYs for which subject /= self, and using that to make it easy for queries to distinguish. I think such methods are likely to be fragile however.

I don’t think FOLDERs will be useful here, due to the basic fact that an original COMPOSITION could very easily contain two Entries having different subjects, e.g. maternal (subject=self) BP, heart rate, and foetal heart rate (subject=foetus #N).
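The point about mixed subjects inside one COMPOSITION can be sketched as follows. This is an illustration in Python with hypothetical, simplified classes, assuming a folder is reduced to a list of composition ids; it is not the RM FOLDER model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    name: str
    subject: str  # "self", or a label such as "foetus #1"


@dataclass
class Composition:
    uid: str
    entries: List[Entry] = field(default_factory=list)


# One antenatal composition holding maternal and foetal data together.
antenatal = Composition("comp-1", [
    Entry("blood pressure", "self"),
    Entry("heart rate", "self"),
    Entry("foetal heart rate", "foetus #1"),
])

store: Dict[str, Composition] = {antenatal.uid: antenatal}

# Simplified folder: it can only point at whole compositions.
folder_items = ["comp-1"]

# Everything reachable through the folder, regardless of subject.
via_folder = [e for uid in folder_items for e in store[uid].entries]
subjects_via_folder = {e.subject for e in via_folder}

# Separating the subjects still needs a check on each entry.
self_only = [e.name for e in via_folder if e.subject == "self"]
```

Because the grouping happens at composition level, the folder cannot separate the maternal entries from the foetal one; some per-entry check on the subject is still required.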

I still think that the main ‘problem’ here is data user expectation v real data not being aligned. It is a matter of education within our community on how to do querying properly and safely. If we want to avoid doing some kind of type check on ENTRY.subject, then it’s a matter of modal querying and/or query pipelining, something which we don’t have at all in openEHR right now, but which is basic in the research / analytics domain.

bna · 25 November 2019 14:52

@thomas.beale - I think we both agree then. And as a follow up we, as a community, have to work out some patterns on how to manage such data use-cases.

IMHO it will be reasonable to say that the default behaviour of an AQL query engine is to respond with only ENTRYs belonging to PARTY_SELF. If the client needs more we have to implement some flag or query patterns.

I don’t think the answer to this postulate is a simple yes/no, and I look forward to feedback.

Seref · 25 November 2019 15:51

@bna Tom and I provided opinions against the default behaviour you suggested above. As long as this discussion may have been, it is a valuable one and the discussion above is worth reading.

bna · 25 November 2019 15:58

@Seref - Yes I read that earlier and was naive enough to think that you changed your minds for this better approach.

@matijap - I guess you like the last two words of the sentence above

thomas.beale · 25 November 2019 16:41

Well that ‘default approach’ means the query engine knows something specific about the RM, which is a basic IT anti-pattern so we need to stay away from that.

I would currently advocate that the appropriate filter be included as a pipeline stage which is automatically added by a query tool working in a specific mode that has been switched on by the user.
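A rough sketch of that idea, in Python with hypothetical names (`QueryTool`, `subject_is_self`, `run`): the engine just runs whatever stages it is given, and the tool appends the subject filter only when the user has switched the mode on.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]
Stage = Callable[[Iterable[Row]], Iterable[Row]]


def subject_is_self(rows: Iterable[Row]) -> Iterable[Row]:
    """Pipeline stage: keep only rows whose subject type is PARTY_SELF."""
    return (r for r in rows if r["subject_type"] == "PARTY_SELF")


class QueryTool:
    """Hypothetical query tool; the engine underneath knows nothing of the RM."""

    def __init__(self, self_only_mode: bool = False) -> None:
        self.self_only_mode = self_only_mode

    def build_pipeline(self, user_stages: List[Stage]) -> List[Stage]:
        stages = list(user_stages)
        if self.self_only_mode:
            # Added by the tool, not by the engine.
            stages.append(subject_is_self)
        return stages


def run(rows: Iterable[Row], stages: List[Stage]) -> List[Row]:
    """Generic 'engine': apply each stage in order."""
    for stage in stages:
        rows = stage(rows)
    return list(rows)


rows = [
    {"name": "maternal BP", "subject_type": "PARTY_SELF"},
    {"name": "foetal heart rate", "subject_type": "PARTY_IDENTIFIED"},
]

plain = run(rows, QueryTool().build_pipeline([]))
safe = run(rows, QueryTool(self_only_mode=True).build_pipeline([]))
```

The design choice here is that knowledge of ENTRY.subject lives in the tool's mode, so the engine stays generic.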

Seref · 25 November 2019 17:02

I’m very keen to hear why “this” is a better approach (compared to alternatives) because personally I think the most promising suggestion to move forward came from Birger and even that scares me a bit when I think about it.

May I kindly request you elaborate a bit on your suggestions, in comparison to, say, static analysis of AQL as discussed above? I’d be delighted to have a few more options on the table.

ian.mcnicoll · 25 November 2019 17:13

Are we looking at a sort of linter for AQL that allows us to check for known snafus and potential safety issues?

Seref · 25 November 2019 17:42

“sort of Linter” is good enough for me to say: “yes, that’s what I have in mind based on what Birger said previously”, but that is me. Tom’s definition sounds like a runtime solution but still backed by some static analysis.

I think some sort of static analysis of AQL is inevitable and I’m personally in favour of this approach as much as possible. I’m in the mind to start a new thread based on what I consider to be most useful input from this one and discuss this particular approach there.

bna · 25 November 2019 17:53

Are you talking about this kind of solution @Seref?
If so - then we agree on the same kind of solution. My suggestion was to let the AQL engine add this kind of structure, if not present.

What we really want is to be backward compatible. We don’t want to introduce some new “mandatory” structures of AQL. I guess current querying does not consider that the subject might be something else. Thus it will be quite safe to add this assumption to the engine. And this feature might even be a configuration depending on the running environment.

bna · 25 November 2019 17:56

I think we have the same assumptions then. The appropriate filter is included as some pipeline stage in the AQL engine - given either a configuration or some hints given by the client.

The overall requirement here is to not introduce any breaking changes for existing clients, and still be able to cover the use-case given by @birger.haarbrandt.

Seref · 26 November 2019 11:15

I’ve been objecting to the AQL engine adding this kind of structure automatically, after a query without this structure arrives. My suggestion is to raise a warning or an error during parsing of the AQL statement, before it reaches the engine. Whether or not to raise the warning/error would likely be determined by metadata in the archetype. Modellers should at least take the responsibility of alerting downstream users of their models that the model may contain data that belongs to parties other than the subject of the EHR. So it’s their responsibility to put that metadata into the model.

The AQL processing stack can check archetype metadata, find the need to check for subject or set subject in the query (as pointed out by the semantics of the warning put in place by the modeller), then raise a warning or an error (based on the severity level set by the modeller). Then it is the query author’s responsibility to act and put in place whatever that warning/error requires. From the point of view of the AQL engine, there are no warnings, no conditional execution, it just runs whatever it receives and returns the results.

We may have different terminologies, I refer to engine as the AQL runtime, i.e. the chunk of code that does the actual fetching of data. All of the above can be performed by AQL authoring tool and/or by the CDR that does the parsing of AQL. It’d play nice with everything.
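To make the parse-time check concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `subject_hints` dictionary stands in for whatever metadata a modeller would put in the archetype, and the "does the query mention subject" test is a crude text match where a real tool would inspect the parsed query.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Matches archetype ids such as openEHR-EHR-OBSERVATION.laboratory_test.v1
ARCHETYPE_ID = re.compile(r"openEHR-EHR-[A-Z_]+\.[A-Za-z0-9_-]+\.v\d+")


@dataclass(frozen=True)
class Finding:
    severity: str  # "warning" or "error", as chosen by the modeller
    archetype_id: str
    message: str


def lint_subject(aql: str, subject_hints: Dict[str, str]) -> List[Finding]:
    """Flag queries that use hinted archetypes without mentioning subject.

    subject_hints maps archetype id -> severity (hypothetical metadata).
    """
    if re.search(r"/subject\b", aql):
        return []  # the author has addressed subject in some way
    findings = []
    for archetype_id in sorted(set(ARCHETYPE_ID.findall(aql))):
        severity = subject_hints.get(archetype_id)
        if severity is not None:
            findings.append(Finding(
                severity, archetype_id,
                "query does not constrain ENTRY.subject"))
    return findings


hints = {"openEHR-EHR-OBSERVATION.laboratory_test.v1": "error"}
query = ("SELECT l FROM EHR e CONTAINS OBSERVATION "
         "l[openEHR-EHR-OBSERVATION.laboratory_test.v1]")
# The WHERE clause below is a placeholder, not valid AQL.
guarded_query = query + " WHERE l/subject/<placeholder condition>"

findings = lint_subject(query, hints)
no_findings = lint_subject(guarded_query, hints)
```

The check runs before the query reaches the engine, so the engine itself needs no knowledge of the hint.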

So far, this is the best approach I can think of.

Revisiting my point above re modellers taking responsibility, please remember that the core design goal of openEHR is to let clinicians take charge but without forcing clinicians to become techies or techies to act as clinicians. I think the approach I suggest above fits that overarching design principle. Techies should not implement logic that watches for conditions without a clinician telling them to do so.

The suggested approach above may go down in flames during technical implementation or may have glaring holes in it but it attempts to preserve the core principles of openEHR with concern for whatever system design principles I have (which are always open to challenge and improvement).

ian.mcnicoll · 26 November 2019 14:50

“Modellers should at least take the responsibility of alerting downstream users of their models that the model may contain data that belongs to parties other than the subject of the EHR.”

But that’s the point - this is not part of the archetypes, it is part of the RM. Any archetype can be assigned a non-subject party, at run-time or in a template. Now this is all fine in a smallish, controlled semantic space where there is a single app, and a set of devs who can be relatively easily kept informed. And although this kind of usage is anticipated, it will still only occur in a tiny minority of templates/run-time situations, other than in specialist sectors. AQL tools cannot help us create safe queries since the risk is going to arise when I am querying cross-template, e.g. at Entry level. The only safe approach is to explicitly query for PARTY_SELF, unless you are absolutely sure that no-one, anywhere, has used the archetype with a non-self subject. This is easy enough in a constrained app environment but definitely prone to a critical error in a full-blown national EHR.

I am beginning to think that we may need to change the RM to ensure that this context-switch does not cause a serious downstream issue.

Seref · 26 November 2019 15:09

Well, everything is part of the RM. It is at the archetype level we handle the specific case. How “handling” may work may be subject to discussion but that’s where it’s meant to be done.

Whatever you want to express by changing the RM, we must find a way of doing it by constraining the RM. I’m really not trying to be a zealot or annoy people here but this semantics we’re trying to express is just too high level for RM.

Can you tell me what kind of change you’d propose to RM and what’s the problem with my suggestion above?

I’m sure we’re all tired of the discussion at this point and honestly I’d just let it go if it did not feel so against the design of this whole thing we’ve spent our careers on.

If you add the warning/check to the archetype that contains that entry, tools can warn you no matter what templates you use, unless you issue a query that refers to an ENTRY without referring to its parent composition (which is a bad idea anyway)

ian.mcnicoll · 26 November 2019 15:52

“It is at the archetype level we handle the specific case.”

But the whole point is that non-self subject can be applied to ANY archetype, at template or run-time.

"If you add the warning/check to the archetype that contains that entry, "

But that’s not the problem. The archetypes very rarely constrain subject. The issue is with (the majority situation) where an archetype can, completely legitimately, be assigned at run-time either to self or to non-self.

Lab test (as in the donor examples) is a good use-case, but almost any archetype could potentially be used that way - and the problem is that I cannot know if/how it has been used in any given CDR. This is not a property of an archetype. So the only safe approach is to always explicitly search for subject=PARTY_SELF, except on those rare occasions where I might want to do otherwise.

I should never ignore subject in ANY AQL because I cannot ever know if/how someone might use any given archetype in non-self mode in the future, at which point I might return some dangerously wrong information, without me doing anything new. The same applies to AQL fragments in GDL, or Task Planning.

I am still open to any/all suggestions about mitigation but I’m afraid this does present a very significant future risk and IMO needs to be ‘designed out’ so that a query that omits .subject should never be capable of returning Entries that apply to non-self.

i.e. this

SELECT l FROM EHR e CONTAINS OBSERVATION l[openEHR-EHR-OBSERVATION.laboratory_test.v1]

must never be capable of returning data where l.subject <> PARTY_SELF.

If that means re-engineering how we handle non-self subject, then that’s what we need to do.

RM change - there is not a simple structural solution, I agree, other than to drop the idea of non-self subject altogether.

However, we do have quite a lot of functional specifications already - we are just adjusting the guidance on disjoint merges because of a safety issue. This is a similar behavioural issue that I think should be enshrined in the specs, just in the same way that we say that a query over a persistent composition should only return the latest version.

Guidance is not enough, tooling is not enough, education is not enough. Sorry to bang on but this is important.

matijap · 26 November 2019 19:24

I must say I am against session-lived “flags” that Thomas proposed, like SET SUBJECT_FILTER ON. And while I cannot object to his arguments about abstract RM and AQL, I find Ian’s points stronger. Either we must get rid of the subject attribute, which will make openEHR inadequate for certain medical settings, or we must implicitly ignore data not directly related to the EHR’s owner unless explicitly instructed otherwise. This fact is not necessarily a part of the AQL itself, but rather a property of the query engine within the CDR.

The only other option (that is hard to implement consistently and is backwards-incompatible to a greater extent than the implicit filtering of data, and still requires special handling of the subject attribute within query engines) is to make the subject filter mandatory for all cases where the engine cannot prove that the data cannot contain entries for other subjects. That is, the only exception would be when querying directly from entries using an archetype predicate where that archetype is known to constrain the subject to PARTY_SELF and does not contain any other entries itself (or those are again constrained in that way). But that would put pressure on archetype modellers to have lengthy discussions for each archetype about whether it makes, or will make in the future, sense for it to be used in subject!=PARTY_SELF scenarios, because constraining it would make queries simpler. I don’t think this is the most sensible approach here.
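The "mandatory unless provably safe" rule could be sketched roughly like this. The Python below is only an illustration: `ArchetypeInfo`, the `catalogue` and the example ids are all hypothetical, and a real engine would have to derive these facts from the archetypes themselves.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ArchetypeInfo:
    """Hypothetical summary of what an archetype guarantees."""
    constrains_subject_to_self: bool
    nested_archetypes: Tuple[str, ...] = ()


def subject_filter_required(archetype_id: Optional[str],
                            catalogue: Dict[str, ArchetypeInfo]) -> bool:
    """True unless every reachable archetype provably has subject = PARTY_SELF."""
    if archetype_id is None:
        return True  # no archetype predicate, so nothing can be proven
    seen = set()
    pending = [archetype_id]
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        info = catalogue.get(current)
        if info is None or not info.constrains_subject_to_self:
            return True  # unknown or unconstrained: the filter is mandatory
        pending.extend(info.nested_archetypes)
    return False


# Hypothetical ids and constraints, for illustration only.
SELF_ONLY = "example-self-only.v1"
UNCONSTRAINED = "example-unconstrained.v1"
WRAPPER = "example-wrapper.v1"

catalogue = {
    SELF_ONLY: ArchetypeInfo(True),
    UNCONSTRAINED: ArchetypeInfo(False),
    WRAPPER: ArchetypeInfo(True, (UNCONSTRAINED,)),
}

needs_filter = {aid: subject_filter_required(aid, catalogue)
                for aid in (SELF_ONLY, UNCONSTRAINED, WRAPPER)}
```

Only the archetype that is constrained all the way down escapes the mandatory filter, which shows how much would hinge on modellers constraining subject up front.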

bna · 27 November 2019 04:22

This is important as @ian.mcnicoll pointed out.

My thinking is still that the vast majority (> 95%) of entries will belong to PARTY_SELF. This makes it a good choice to let the AQL engine set this filter if it’s not explicitly set by the client. With this approach all existing clients will work as before, even when entries not belonging to PARTY_SELF are added. The existing clients never wanted data not belonging to the subject of care.

This is the same argument as:

It’s impossible to change all client software within a limited period of time. It might be delivered by third party vendors. This is why backward compatibility is important and necessary. I really don’t think existing clients of an AQL engine expected data from a subject other than PARTY_SELF. Let’s keep it that way.

When implementing an openEHR system you must have some reference to the RM. The data is defined and serialised according to the RM. The RM has an attribute for subject. This attribute solves the use-case at hand, the donor case. Yet again we see the RM come to the rescue when real life use-cases appear. Great work!

I don’t know why this use-case forces a change in the RM. IMHO we only need to change our minds about what kind of data might be stored in the CDR, and provide some reasonable defaults to make life as a client software developer easy, quick and open for extension.

Let the AQL engine query only entries whose subject is PARTY_SELF unless explicitly told otherwise.

matijap · 27 November 2019 07:52

Of course I would argue that such an implicit filter has to be well documented, probably in more than one place. We must acknowledge it is a deviation from genericity, but one that will confuse fewer people than the current state does.

varntzen · 27 November 2019 08:41

What happens when we extract data from the EHR to a data warehouse? Won’t that be at ENTRY level, not referring to the composition?

Seref · 27 November 2019 09:09

Good question. Technically another discussion though, but here we go anyway: It heavily depends on your warehouse design as well as your use of data warehouse. We’re currently running multiple large scale data warehouses and BI and reporting based on that is never done without consideration of the models that set the rules for the creation of the data.

We, at Ocean, subscribe to archetypes being the ubiquitous language across all use cases and try to leverage them in every context we can. You can see how that view is shaping my responses in this thread.

Reporting on openEHR data is a whole domain in itself and it took us years to get things working to the satisfaction of most stakeholders (noticed how I avoided saying get things right?)