Safety features in AQL: subject

thomas.beale · 15 November 2019 13:44

yeah… no!

You have to solve questions with 2nd (or nth) order semantics in upper layers of the system, not in the base information models, which can’t be tasked with knowing how particular types of queries should be processed.

thomas.beale · 15 November 2019 13:46

Well, we are thinking about something related - see this page on Reports support in RM.

ian.mcnicoll · 15 November 2019 14:55

I’d like to explore that idea further - can you repost in Clinical so we can discuss it without tripping over the AQL discussion, which is complex enough, though I can see the relevance here.

ian.mcnicoll · 17 November 2019 13:19

Here goes (I know I am going to regret this conversation!!).

My view, possibly based on a misunderstanding of the RM, is that it is a (very good) attempt to formalise how the known requirements for an EHR can be universally modelled. This is required because health data encompasses a whole host of ‘special cases’ which one does not find in other sectors. Many of these ‘special cases’ are best dealt with in the archetype layer, but underpinning that is an attempt to capture patterns and rules that can be defined globally. So we have the specialised ENTRY classes, which were controversial but are accepted as bringing value. We have the ideas of ‘persistent’ vs. ‘event’ compositions, again ‘special cases’ but universally so.

Now we have the ‘special case’ of an ENTRY which contains ‘non-self-data’. There is a clear requirement to be able to handle this kind of data, but we are allowing subject to be redefined in a way that is potentially risky from a querying perspective. This has not mattered too much until now, since in general the people building the queries have been the same as those building the templates; they are aware of this potential risk and build the appropriate ‘safe queries’. The problem that Birger is highlighting is that as the scope of use of a set of patient data grows, the people querying may not be aware of this risk, especially since the use of ‘non-self-data’ is actually pretty rare in the overall picture.

So we know that the ‘non-self-data’ construct is helpful, but we also know that it could represent a significant risk if not used appropriately, and that there is a very clear ‘default’ use, i.e. PARTY_SELF, which will apply in 99% of usage.

As such, I think this is exactly the kind of knowledge that should be instantiated in the RM design. How we do that is a separate issue. It is definitely not an AQL issue per se, since any other query mechanism will have exactly the same issue. We can mitigate by adding WHERE clauses or predicates, but these will depend on good practice and documentation or tooling, and they do add complexity to every query to protect against a pretty obscure set of use cases.

I would prefer that something (optionally) ends up in the instance metadata that potentially prevents inadvertent querying, and I think this will be very helpful in a number of other more subtle cases - such as the one Vebjorn has highlighted.

@thomas.beale - I am not suggesting solving these kinds of questions at RM level, simply that there are some kinds of answers that can only be resolved by adding attribution to the data tree, since querying is archetype- and template-neutral. The DIPS ‘report’ tag is an example of this. The idea is good, just, for me, not expressive enough for other similar uses, and it should be at ENTRY level (or perhaps even ELEMENT level), not at COMPOSITION level. I think that gives us a much more flexible approach.

thomas.beale · 17 November 2019 15:54

Don’t shoot me, but I have to point out that this is not the case. There are no ‘special cases’ in the RM in terms of the RM. The RM, like any information model just says what it says. We built it precisely to be able to represent clinical statements about things (foetal heart rate, a donor kidney) whose owning subject is not the subject of the record. But the RM of course has no idea that this might be rare - in fact it would be common in an organ transplant system or dedicated obs/childbirth system - nor that getting the subject wrong in a query could have major consequences. The RM’s job is correct representation, not outguessing querying needs, or statistical occurrences of particular configurations of data.

Putting special markers in the data for this case is a bad option: as @Seref pointed out, it opens the door to endless special cases, which means endless hacking of the RM and, worse, endless hacking of query processors. In the end, it would become nearly impossible to know whether either the data or the AQL processor were correct, since correctness would incorporate these special markers and special processing capabilities.

It’s slightly annoying, but I don’t see any big problem in implementing this with some generic additions to AQL that enable modal querying, which is of the form:

SET SOME_MODE
do some queries
UNSET SOME_MODE

Where the effect of SOME_MODE is to modify all the queries, when turned on, in some standard way. In this particular case, the most obvious modification is the type-checking one, which looks for non-PARTY_SELF ENTRYs and includes or excludes them.
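To make the SET/UNSET idea concrete, here is a minimal in-memory sketch of modal querying in Python. Everything in it is hypothetical: ModalQueryEngine, the EXCLUDE_NON_SELF mode name and the archetype id are illustrative assumptions, and the Entry class is a simplified stand-in for an openEHR ENTRY that only records which PARTY_PROXY subtype its subject is. A context manager plays the role of SET ... UNSET, and while the mode is active every query gets the same type check on ENTRY.subject.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    # Simplified stand-in for an openEHR ENTRY; subject_type names the
    # PARTY_PROXY subtype of ENTRY.subject, e.g. "PARTY_SELF" or "PARTY_RELATED".
    archetype_id: str
    subject_type: str
    value: float


class ModalQueryEngine:
    """Hypothetical engine in which a mode modifies every query issued while it is set."""

    def __init__(self, entries):
        self.entries = list(entries)
        self.modes = set()

    @contextmanager
    def mode(self, name):
        # Plays the role of SET name ... UNSET name, and unsets even if a query fails.
        already_set = name in self.modes
        self.modes.add(name)
        try:
            yield self
        finally:
            if not already_set:
                self.modes.discard(name)

    def query(self, archetype_id):
        results = [e for e in self.entries if e.archetype_id == archetype_id]
        if "EXCLUDE_NON_SELF" in self.modes:
            # The 'standard modification': a type check on ENTRY.subject.
            results = [e for e in results if e.subject_type == "PARTY_SELF"]
        return results


HR = "openEHR-EHR-OBSERVATION.heart_rate.v1"  # illustrative archetype id
engine = ModalQueryEngine([
    Entry(HR, "PARTY_SELF", 72),      # mother
    Entry(HR, "PARTY_RELATED", 140),  # foetus
])
with engine.mode("EXCLUDE_NON_SELF"):
    print([e.value for e in engine.query(HR)])  # prints [72]
print(len(engine.query(HR)))  # prints 2
```

Note that the mode is global state for everything inside the block, which is exactly the "binary switch" property criticised in the following reply.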

Other better solutions may be available, but none of them would involve marking data in a special way, or having to modify the query processor in non-generic ways.

Seref · 18 November 2019 18:08

@thomas.beale I’m taking back my suggestion of SET X / UNSET X. It is a bad idea. As I wrote above, this approach has multiple problems:

Its binary nature is not sufficiently clear for potentially multi-dimensional behaviour choices. Even in RDBMSs, where the context is simpler, this approach leads to lots of problems.
It binds runtime behaviour to actual data. Whether or not we do it with SET switches does not matter. This’ll create a backdoor to a bad design choice and requests will start piling up; every clinician with a ‘magical’ wish will get in the queue with their boolean flags instead of writing WHERE clauses:
We’ll be trying to explain to clinicians why SET HIDE_NOT_CODED_DIAGNOSIS ON is a bad idea within a few months.

What we can do is to introduce ‘semantic-safety’ to AQL, similar to how static type checking works. This is me thinking on @birger.haarbrandt 's response to my request for clarification.
As @ian.mcnicoll says, my suggestion to write the WHERE clauses can be problematic because users of AQL are prone to forgetting to do that (I’ll set aside the proper uses/improper uses section of archetype metadata for the moment). OK, then we force the CDRs and modelling tools to remind users to write the WHERE clause or path predicate (as Birger suggested).
This leaves all the runtime behaviour as it is, we can make the suggestions as complicated as they need to be, and we handle the whole requirement above the AQL runtime. If the modelling tool can’t catch it, then the CDR would, during query parsing.

If a CDR doesn’t support this feature, it is no different from someone forgetting to type SET X ON, but this approach leaves the runtime unmodified/simple and is more expressive than binary flags.
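A parse-time "semantic-safety" check of this kind could look roughly like the following Python sketch. It is an assumption-laden illustration, not a real AQL parser: the rule ("every ENTRY alias must be constrained on subject"), the function name subject_safety_warnings and the regex approach are all hypothetical. It finds ENTRY-typed aliases in the CONTAINS chain and warns when the query never references alias/subject, leaving the runtime untouched.

```python
import re

# RM ENTRY subtypes whose subject may differ from the EHR subject.
ENTRY_TYPES = ("OBSERVATION", "EVALUATION", "INSTRUCTION", "ACTION", "ADMIN_ENTRY")

# Matches "OBSERVATION o" etc. in a CONTAINS chain and captures the alias.
ENTRY_DECL = re.compile(r"\b(" + "|".join(ENTRY_TYPES) + r")\s+(\w+)", re.IGNORECASE)


def subject_safety_warnings(aql: str) -> list[str]:
    """Hypothetical parse-time rule: every ENTRY alias must be constrained on subject.

    A regex sketch, not a real AQL parser: it only looks for some reference
    to '<alias>/subject' anywhere in the query text.
    """
    warnings = []
    for match in ENTRY_DECL.finditer(aql):
        rm_type, alias = match.group(1).upper(), match.group(2)
        if not re.search(rf"\b{re.escape(alias)}/subject\b", aql):
            warnings.append(f"{rm_type} {alias}: no constraint on {alias}/subject")
    return warnings


query = (
    "SELECT o FROM EHR e CONTAINS COMPOSITION c "
    "CONTAINS OBSERVATION o[openEHR-EHR-OBSERVATION.step_count.v1]"
)
print(subject_safety_warnings(query))  # prints ['OBSERVATION o: no constraint on o/subject']
```

A modelling tool could surface the same warnings while the query is being written, and a CDR could run the check again when the query is parsed.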

Now, how do we define these semantic-safety rules and where do we put them (ideally into a template that includes this extra semantic constraint) is another discussion, but I’ll cross that bridge once I win some ground on this binary flag thing.
Thoughts?

Seref · 18 November 2019 18:13

Ok, let me try to win you over, at least for the effort you put into your response. As @thomas.beale says, the RM is a lot dumber than you make it out to be. It is just a combination of data types with more semantics than plain text or integers, but it is not anything more than that.

You have a solid point here regarding the difficulty of following good practice. So let’s enforce good practice via the platform and keep the fundamental principles intact. Please read my response to Tom: Safety features in AQL: subject

wouterzanen · 20 November 2019 19:05

Sorry, just dropped in and haven’t been able to read the full discussion.

As you know, we are doing quite a bit with transplants. For me, it seems that the best solution is to consider the donor and the recipient as separate EHRs. We will create different subject namespaces.

The only thing we are now facing is what to do with the transplant as this is a clinical procedure involving both donor and recipient (so perhaps we add them to both).

birger.haarbrandt · 20 November 2019 19:21

Hi Wouter,

Good to have you in this discussion. Do you actually point to the record entry within the donor’s EHR? We are currently concerned with kidney transplants (part of the Screen Reject and NephroDigital projects) and found some data that, from my perspective, just needs to be kept permanently available in the recipient’s patient record (e.g. the lab value that led to the Banff classification). Do you also see this from an EHR perspective, as we do within a hospital, or is your case a bit different?

Birger

wouterzanen · 21 November 2019 07:06

We don’t use the Banff classification, but as I understand it, it is a classification of the organ (graft). I agree that this is information that should be permanently available to the recipient. I don’t have a final solution yet, or maybe it should be a case-by-case solution.

So there are a couple of use cases.

Use by either the recipient physicians or the donor coordinators (donor side), who are usually the composers of the data on the donor or recipient side.
The operational registry managing the allocation process (an organ exchange organisation like Eurotransplant), and thus the link between donors and recipients.
A clinical registry for study purposes.

Let me start by defining a transplanted organ as a graft; you can consider this the same as tissue (skin) or even a mechanical graft like a ventricular assist device (heart pump). As soon as it is disconnected from the donor’s body, it is no longer an object of that donor. Considering this, I would say all information on the graft is gathered via a proxy.

Now for the first use case:

A recipient physician would need to view the data collected by the donor coordinator on the graft, so he should be able to retrieve this data, and it will be part of the patient record. This data on the graft will be enriched by the recipient physician. There are two cases in which the information enriched by the recipient should flow back to the donor team. The first is when a serious adverse event takes place, e.g. a malignant tumour is found. This is a separate report (template) that will be sent to the organ exchange organisation and added to the donor record, as well as to the records of all other recipients of that donor’s organs. The second is a quality report for the procurement surgeon. I think an EHR might not be the best place to keep the quality reports.
So for Eurotransplant we would consider information on the graft as a separate object closely linked to the transplant. For now, most of the above information resides in the donor EHR and in the system for quality and adverse event reactions, and we would present the data to different parties from the recipient, donor or transplant (allocation) perspective. I think managing the link between these domains is one of the core tasks of Eurotransplant. So how would we make this work in openEHR? openEHR in itself seems not well suited to managing relations between EHRs, so we might not even use openEHR for this. If we did, graft information would stay with the donor, because during allocation a donor graft might be allocated to one recipient who declines and then end up with other recipients. Transplant/allocation information has to reside on both sides so that the connection can always be made.
Now, for a research database for either a donor or a recipient, I would just present graft information in both EHRs. It might get you some redundant data, but it is easy and will be the most usable. As for your original question of how to recognise this as data about the graft: in a study where we are using this, we have used headings to differentiate between graft, donor, recipient and transplant information. But what we have also done is create an extension archetype to fit the protocol extension slot when using . It adds information on the procedure (on procurement, during transport, during transplant) or something like that, perhaps even just whether it is donor- or recipient-derived.

Gr,

Wouter

ian.mcnicoll · 21 November 2019 09:20

One other approach might be to use top-level folders to carry non-EHR-subject data. I’ve been considering something similar to separate unmanaged consumer wearable-type info from managed care records.

Seref · 21 November 2019 15:35

Just to clarify @ian.mcnicoll: when you say top-level folders, you’re referring to folders associated with a particular EHR that contain data that is not within the EHR, i.e. using folders as a metadata mechanism for the EHR. Did I get it right?

ian.mcnicoll · 21 November 2019 16:04

Not quite. They are in the EHR but recognise a different level of governance. Normally I would only want to query within the professional folder.

wouterzanen · 21 November 2019 18:02

@birger.haarbrandt just one other concern with using party proxy to distinguish between recipient and donor: what if the recipient receives organs from multiple donors, either at the same time (for instance pancreatic islet transplants) or sequentially? You would still need to identify which donor and which organ the values relate to. It seems that querying on that information is better.

Then, on the matter of privacy: deceased transplant donors have no privacy rights under the GDPR, since only living persons do. However, morally we should respect their privacy. We also have the case of living-donor transplants; however, I would say that consent in this case comes with consenting to donate an organ and does not have to be made explicit.

Although I think that making it easier to query on party-proxy is a good idea, I still think it is not a good solution for this use case…

Seref · 22 November 2019 10:56

Sorry @ian.mcnicoll, you lost me there. My question is in the context of data organisation. I don’t understand what you mean by a different level of governance or the professional folder.

ian.mcnicoll · 22 November 2019 12:11

In my example there is a need to carry both ‘personal’ unmanaged, uncurated compositions of patient-provided wearable device data (typically daily step counts in a well person) and similar ‘professional’ patient-provided wearable device data (perhaps daily step counts as part of a pre-operative fitness program, initiated and reviewed by an anaesthetist).

Same data, different composition/template, but the bulk ‘personal’ data is generally not going to be queried along with the ‘professional’ data.

My suggestion was to create 2 top-level folders, one for ‘personal’ compositions and one for ‘professional’ compositions, and to routinely use something like the following for professional clinical activity:

SELECT … AS step_count FROM EHR e[…] CONTAINS FOLDER f['professional'] CONTAINS COMPOSITION c CONTAINS OBSERVATION o[openEHR-EHR-OBSERVATION.step_count.v1]
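The effect of putting CONTAINS FOLDER f['professional'] at the top of the containment chain can be sketched in Python as follows. This is a simplified model under stated assumptions: the Folder, Composition and Observation classes are hypothetical stand-ins (a real openEHR FOLDER holds OBJECT_REF references to compositions rather than containing them), and the step counts are illustrative values. Only compositions filed under the named top-level folder are considered.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Observation:
    archetype_id: str
    value: int


@dataclass
class Composition:
    uid: str
    observations: list


@dataclass
class Folder:
    # Simplification: the real FOLDER refers to compositions via OBJECT_REFs.
    name: str
    compositions: list = field(default_factory=list)


STEP_COUNT = "openEHR-EHR-OBSERVATION.step_count.v1"


def step_counts_in_folder(folders, folder_name):
    """Mimic CONTAINS FOLDER f[name] CONTAINS COMPOSITION c CONTAINS OBSERVATION o."""
    results = []
    for folder in folders:
        if folder.name != folder_name:
            continue  # compositions outside the named folder are never visited
        for comp in folder.compositions:
            for obs in comp.observations:
                if obs.archetype_id == STEP_COUNT:
                    results.append(obs.value)
    return results


folders = [
    Folder("personal", [Composition("c1", [Observation(STEP_COUNT, 8000)])]),
    Folder("professional", [Composition("c2", [Observation(STEP_COUNT, 3000)])]),
]
print(step_counts_in_folder(folders, "professional"))  # prints [3000]
```

The filtering happens at composition granularity, which suits this use case because each composition is either wholly ‘personal’ or wholly ‘professional’.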

Does that help?

Seref · 22 November 2019 12:17

Thanks, I think I got it now.

bna · 25 November 2019 08:31

Coming late to this topic is hard. So many discussions going back and forth…

I want to bring a couple of other possible solutions to the table:

Solution 1: FOLDER which links in data from the donor
AFAIK the problem here is to collect and store data belonging to both the donor and the recipient. They are different persons. In normal circumstances the data would belong to different EHRs.

For such a use case, I am thinking about storing data in the EHR to which it belongs, and then using a FOLDER to collect data from the related EHRs. The solution is to create a FOLDER in the recipient EHR which links in data from the donor.

Solution 2: Change the defaults of AQL
Since all ENTRY instances have a subject, and we normally expect PARTY_SELF (the subject of the EHR) to be the subject, let’s make the default of AQL be to return only PARTY_SELF entries. Applications that need other kinds of subject could query for specific or all subjects.

This solution is equivalent to the way we handle REPORT compositions in DIPS EHR Store: compositions with the report category are not returned by default. You have to query for them.
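The "safe default" in Solution 2 differs from modal querying in that nothing has to be switched on: the filter is the default and callers opt out explicitly. A minimal Python sketch, assuming a hypothetical query function, an ALL_SUBJECTS sentinel and an illustrative archetype id (none of these are part of any AQL specification), might look like this:

```python
from dataclasses import dataclass

ALL_SUBJECTS = None  # hypothetical sentinel: explicitly ask for every subject


@dataclass(frozen=True)
class Entry:
    archetype_id: str
    subject_type: str  # PARTY_PROXY subtype of ENTRY.subject
    value: object


def query(entries, archetype_id, subjects=("PARTY_SELF",)):
    """Return matching ENTRYs, by default only those about the EHR subject.

    Callers opt in to other subjects by passing an explicit tuple of
    subject types, or ALL_SUBJECTS to switch the filter off.
    """
    hits = [e for e in entries if e.archetype_id == archetype_id]
    if subjects is ALL_SUBJECTS:
        return hits
    return [e for e in hits if e.subject_type in subjects]


HR = "openEHR-EHR-OBSERVATION.heart_rate.v1"  # illustrative archetype id
entries = [
    Entry(HR, "PARTY_SELF", 72),
    Entry(HR, "PARTY_RELATED", 140),
]
print(len(query(entries, HR)))                         # prints 1
print(len(query(entries, HR, subjects=ALL_SUBJECTS)))  # prints 2
```

The trade-off is that a query author who does want foetal or donor data must know to ask for it, which mirrors how REPORT compositions must be requested explicitly in the DIPS approach described above.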

I have not been able to read through all the comments in the thread, so forgive me if someone has suggested the same solutions before.

thomas.beale · 25 November 2019 11:27

I’ll just point out that there is a reason we have ENTRY.subject as an attribute in the model - it’s precisely to handle the case when the subject of the data in the Entry isn’t the same as the subject of the record, even though the Entry is relevant to the subject of the record.

This was in openEHR from the earliest days - based on wide clinical consultation - to support representation of:

foetal heart rate measurements (in the mother’s record)
donor organ data (in the recipient’s record)
disease information of an infectious carrier (in the record of a newly infected patient)
family history information

Although in many systems PARTY_SELF is likely to be the subject, in others (those specialising in any of the above kinds of data) it is just as likely that subject = foetus, donor or some other person.

I think taking such information out of the patient’s EHR would clearly compromise care, or just render it impossible.

Personally, I think we can solve the challenge of making querying work as intuitively expected but also be ‘easy’ - with some generic additions in query tooling.

It is not impossible to imagine also / alternatively maintaining some index object in the EHR that contains pointers to ENTRYs for which subject /= self, and using that to make it easy for queries to distinguish. I think such methods are likely to be fragile however.
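The index-object idea can be sketched in a few lines of Python, which also shows where the fragility lies. The NonSelfSubjectIndex class, its method names and the key format (composition uid plus an entry path) are hypothetical illustrations, and the paths are placeholder strings rather than real archetype paths. The index is only correct if it is rebuilt on every commit of every composition version.

```python
class NonSelfSubjectIndex:
    """Hypothetical per-EHR index of ENTRYs whose subject is not PARTY_SELF.

    Keys are (composition_uid, entry_path) pairs; both are illustrative strings.
    """

    def __init__(self):
        self._refs = set()

    def on_commit(self, composition_uid, entries):
        """Rebuild the index for one composition from (path, subject_type) pairs.

        This must run on every commit of every composition version; any missed
        update leaves the index out of step with the data.
        """
        self._refs = {ref for ref in self._refs if ref[0] != composition_uid}
        for path, subject_type in entries:
            if subject_type != "PARTY_SELF":
                self._refs.add((composition_uid, path))

    def non_self_paths(self, composition_uid):
        return sorted(path for uid, path in self._refs if uid == composition_uid)


index = NonSelfSubjectIndex()
index.on_commit("antenatal-visit-1", [
    ("/content[0]", "PARTY_SELF"),     # maternal heart rate
    ("/content[1]", "PARTY_RELATED"),  # foetal heart rate
])
print(index.non_self_paths("antenatal-visit-1"))  # prints ['/content[1]']
```

Any code path that writes compositions without calling on_commit (imports, migrations, versioning corrections) silently breaks the index, which is one concrete form of the fragility mentioned above.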

I don’t think FOLDERs will be useful here, due to the basic fact that an original COMPOSITION could very easily contain two Entries having different subjects, e.g. maternal (subject=self) BP, heart rate, and foetal heart rate (subject=foetus #N).
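The granularity argument can be shown with a tiny Python model of the antenatal example. The Entry class and the subject labels are simplified, hypothetical stand-ins for ENTRY.subject. Because a FOLDER files whole compositions, it can only take all three entries or none of them, whereas a filter on the entry subject separates them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    subject: str  # simplified: "self" or e.g. "foetus #1"


# One antenatal COMPOSITION mixing subjects, as in the example above.
antenatal_visit = [
    Entry("blood pressure", "self"),
    Entry("heart rate", "self"),
    Entry("foetal heart rate", "foetus #1"),
]

# A FOLDER files whole compositions, so it can only take all three entries or none.
folder_view = list(antenatal_visit)

# Filtering on ENTRY.subject works at the granularity where the subject actually varies.
self_only = [e.name for e in antenatal_visit if e.subject == "self"]
print(len(folder_view))  # prints 3
print(self_only)  # prints ['blood pressure', 'heart rate']
```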

I still think that the main ‘problem’ here is that data users’ expectations and the real data are not aligned. It is a matter of education within our community on how to do querying properly and safely. If we want to avoid doing some kind of type check on ENTRY.subject, then it’s a matter of modal querying and/or query pipelining, something which we don’t have at all in openEHR right now, but which is basic in the research / analytics domain.

bna · 25 November 2019 14:52

@thomas.beale - I think we both agree then. And as a follow up we, as a community, have to work out some patterns on how to manage such data use-cases.

IMHO it would be reasonable to say that the default behaviour of an AQL query engine is to respond with only ENTRYs belonging to PARTY_SELF. If the client needs more, we have to implement some flag or query patterns.

I don’t think the answer to this postulate is a simple yes/no, and I look forward to feedback.