Safety features in AQL: subject

Crikey, I forget to switch Discourse notifications on and AQL carnage ensues. I have not quite absorbed all the discussion, but I think we can agree that it would be preferable to find a way to protect ‘naive’ queries that could be performed in a way that is potentially unsafe. By naive I don’t imply stupidity, just that the underlying system may be using more advanced constructs than the querier is used to, e.g. mixed parent/child records. This will get worse as the scope and complexity of the records increases.

I’d agree with Heather that we should probably avoid partitioning non-subject data with a specialisation or separate archetype, though I have done so on one occasion for a very obscure and potentially risky example of parental medication in the child record - yuck.

There are a few places in the RM where a naive query which ignores a related ‘status’ attribute might mangle the intended result - subject, current_state, careflow_step, but not any others that I can think of and of course, as @Seref says, us modellers can really go to town but we have to take responsibility there. The examples that Birger has highlighted are by-design in the RM - nothing to do with the clinical modelling layer. It is an elegant design and maximises re-use of archetypes but it does have this weakness around querying.

The various arguments for and against different solutions are all well made, but I’m starting to come round to Thomas’s idea of a togglable ‘safe-search’ mode, on by default, such that somewhere in the AQL processing chain subject is required to be PARTY_SELF unless otherwise specified or safe_search is off. Similarly for current_state when retrieving an ACTION.
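To make the safe-search idea concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory `Entry` record and a hypothetical `select_entries` helper; neither is part of openEHR or of any AQL engine, and a real implementation would sit somewhere in the AQL processing chain rather than filter a list.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

PARTY_SELF = "PARTY_SELF"


@dataclass
class Entry:
    """Hypothetical, simplified stand-in for an openEHR ENTRY."""
    archetype_id: str   # illustrative id, not a real archetype id
    subject_type: str   # e.g. "PARTY_SELF" or "PARTY_RELATED"
    value: str


def select_entries(
    entries: Iterable[Entry],
    archetype_id: str,
    subject_type: Optional[str] = None,
    safe_search: bool = True,
) -> List[Entry]:
    """Return matching entries, defaulting to PARTY_SELF subjects only.

    - An explicit subject_type always wins.
    - Otherwise, safe_search=True (the default) keeps only PARTY_SELF.
    - safe_search=False with no subject_type returns every subject.
    """
    results = []
    for entry in entries:
        if entry.archetype_id != archetype_id:
            continue
        if subject_type is not None:
            if entry.subject_type != subject_type:
                continue
        elif safe_search and entry.subject_type != PARTY_SELF:
            continue
        results.append(entry)
    return results
```

With a maternal and a foetal heart rate stored under the same (illustrative) archetype id, the default call returns only the mother's value, and the foetal value is returned only when the caller asks for it, either by naming the subject type or by switching safe_search off.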

Ok, I may have misunderstood you, because I thought you were suggesting that AQL execution changes behaviour and excludes data unless subject is explicitly stated.
This suggestion of making an attribute mandatory is at least explicit and sounds like a more promising direction for discussion than the one I thought you were suggesting. Thanks for the clarification (on a train and writing on the phone, so bad formatting is inevitable).

@ian.mcnicoll see above please

Thanks @Seref,

I did read it but have not fully absorbed all of the suggestions/counter-suggestions. I’ll re-read.

And this is a cat that needed to be let out of the bag, and the worm out of the can… I think it is a great example of how this community works. While on a rainy walk, it occurred to me that this is actually part of a wider question about potentially limiting query traversal. AQL gives us great power but we do need to limit its ability to dig out data, whether for semantic reasons (subject), privacy rules or Bjorn’s ‘report’ flag on compositions to indicate that these should not normally be found by queries.

So I’ll re-read the various comments, but here is another suggestion - why do we not consider adding some kind of RM attribute (??) on PATHABLE that says ‘do not traverse in queries’, with perhaps a few flavours? It would be up to the system designer to set that attribute on compositions, generally aided by tooling and/or system defaults, but capable of being overridden as per Thomas’s example. That moves the problem to the system design space, outside of AQL.

Thanks Ian,
The RM should not know about how it is to be used in terms of computation.
Querying, UI, serialisation to XML, JSON etc. are all use cases for an implementation of the RM. Use cases should not go into the RM; that effectively eliminates the whole point of two-level modelling.
Birger’s last suggestion is sounding promising to me atm (emphasis on atm :) )

Maybe on the side here, but somehow related:
When it comes to data that are registered exclusively for reporting purposes (research, financial, sick leave forms for the authorities dealing with sickness benefit, etc.), we have for now abandoned the report flag on compositions and are instead making ugly local archetypes to duplicate data born in “real” archetypes made for primary documentation. Example: the diagnosis in a sick leave form is not always the correct diagnosis in the EHR, and so would pollute the patient’s diagnosis history.

If we could avoid making these duplicate “monster archetypes” and instead rely on an attribute in the RM indicating that this particular information is a duplicate or for reporting purposes only… Is this possible?

No misunderstanding, I was just not that precise, as I had several ideas floating around in my mind and no clue whether this is good or bad.

yeah… no!

You have to solve questions with 2nd (or nth) order semantics in upper layers of the system, not in the base information models, which can’t be tasked with knowing how particular types of queries should be processed.

Well, we are thinking about something related - see this page on Reports support in RM.

I’d like to explore that idea further - can you repost in Clinical so we can discuss it without tripping over the AQL discussion, which is complex enough, though I can see the relevance here.

Here goes (I know I am going to regret this conversation!!).

My view, possibly a misunderstanding of the RM, is that it is a (very good) attempt to formalise which of the known requirements for an EHR can be universally modelled. This is required because health data encompasses a whole host of ‘special cases’ which one does not find in other sectors. Many of these ‘special cases’ are best dealt with in the archetype layer, but underpinning that is an attempt to capture patterns and rules which can be defined globally. So we have the specialised ENTRY classes, which were controversial but are accepted as bringing value. We have the ideas of ‘persistent’ vs. ‘event’ compositions, again ‘special cases’ but universally so.

Now we have the ‘special case’ of an ENTRY which contains ‘non-self-data’. There is a clear requirement to be able to handle this kind of data, but we are allowing subject to be re-defined in a way that is potentially risky from a querying perspective. This has not mattered too much until now, since in general the people building the queries have been the same as those building the templates, and will be aware of this potential risk and build the appropriate ‘safe queries’. The problem that Birger is highlighting is that as the scope of use of a set of patient data grows, the people querying may not be aware of this risk, especially since the use of ‘non-self-data’ is actually pretty rare in the overall picture.

So we know that the ‘non-self-data’ construct is helpful, but we also know that it could represent a significant risk if not used appropriately, and that there is a very clear ‘default’ use, i.e. PARTY_SELF, which will apply in 99% of usage.

As such, I think this is exactly the kind of knowledge that should be instantiated in the RM design. How we do that is a separate issue. It is definitely not an AQL issue per se, since any other query mechanism will have exactly the same issue. We can mitigate by adding WHERE clauses or predicates, but these will depend on good practice and documentation or tooling, and do add complexity to every query to protect against a pretty obscure set of use cases.

I would prefer that something (optionally) ends up in the instance metadata that potentially prevents inadvertent querying, and I think this will be very helpful in a number of other more subtle cases - such as the one Vebjorn has highlighted.

@thomas.beale - I am not suggesting solving these kinds of questions at RM level, simply that there are some kinds of answers that can only be resolved by adding attribution to the data tree, since querying is archetype- and template-neutral. The DIPS ‘report’ tag is an example of this - the idea is good, just for me not expressive enough for other similar uses, and it should be at ENTRY level (?? or even at ELEMENT), not at Composition. I think that gives us a much more flexible approach.

Don’t shoot me, but I have to point out that this is not the case. There are no ‘special cases’ in the RM in terms of the RM. The RM, like any information model just says what it says. We built it precisely to be able to represent clinical statements about things (foetal heart rate, a donor kidney) whose owning subject is not the subject of the record. But the RM of course has no idea that this might be rare - in fact it would be common in an organ transplant system or dedicated obs/childbirth system - nor that getting the subject wrong in a query could have major consequences. The RM’s job is correct representation, not outguessing querying needs, or statistical occurrences of particular configurations of data.

Putting special markers in the data for this case is a bad option. As @Seref pointed out, that opens the door to endless special cases, which means endless hacking of the RM and, worse, endless hacking of query processors. In the end, it would become nearly impossible to know if either the data or the AQL processor were correct, since correctness would incorporate these special markers and special processing capabilities.

It’s slightly annoying, but I don’t see any big problem in implementing this with some generic additions to AQL that enable modal querying, which is of the form:

SET SOME_MODE
do some queries
UNSET SOME_MODE

Where the effect of SOME_MODE is to modify all the queries, when turned on, in some standard way. In this particular case, the most obvious modification is the type-checking one, which looks for non-PARTY_SELF ENTRYs and includes or excludes them.

Other better solutions may be available, but none of them would involve marking data in a special way, or having to modify the query processor in non-generic ways.
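
As a rough illustration of the modal querying proposed above, here is a Python sketch that scopes a mode to a block of queries with a context manager. `QuerySession`, the `INCLUDE_NON_SELF` mode name and the dict-based rows are all hypothetical; this is not AQL syntax and not any CDR's API, only a way to show the SET / do some queries / UNSET shape.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List, Set


class QuerySession:
    """Hypothetical session whose active modes modify every query run in it."""

    def __init__(self, entries: List[Dict]) -> None:
        self._entries = entries
        self._modes: Set[str] = set()

    @contextmanager
    def mode(self, name: str) -> Iterator["QuerySession"]:
        """SET the mode on entry and UNSET it on exit, even if a query fails."""
        already_set = name in self._modes
        self._modes.add(name)
        try:
            yield self
        finally:
            if not already_set:
                self._modes.discard(name)

    def run(self, archetype_id: str) -> List[Dict]:
        """Run a toy query; non-PARTY_SELF rows are excluded unless the mode is on."""
        rows = [e for e in self._entries if e["archetype_id"] == archetype_id]
        if "INCLUDE_NON_SELF" not in self._modes:
            rows = [e for e in rows if e["subject_type"] == "PARTY_SELF"]
        return rows
```

The query itself is unchanged between the two calls; only the surrounding mode differs, which is both the appeal of the approach and the source of the objections raised in the replies.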

@thomas.beale I’m taking back my suggestion of SET X / UNSET X. It is a bad idea. As I wrote above, this approach has multiple problems:

  • Its binary nature is not sufficiently clear for potentially multi-dimensional behaviour choices. Even in RDBMSs, where the context is simpler, this approach leads to lots of problems.
  • It binds runtime behaviour to actual data. Whether or not we do it with SET switches does not matter. This’ll create a backdoor to a bad design choice and requests will start piling up; every clinician with a ‘magical’ wish will get in the queue with their boolean flags instead of writing WHERE clauses:
    We’ll be trying to explain to clinicians why SET HIDE_NOT_CODED_DIAGNOSIS ON is a bad idea within a few months.

What we can do is to introduce ‘semantic safety’ to AQL, similar to how static type checking works. This is me thinking about @birger.haarbrandt’s response to my request for clarification.
As @ian.mcnicoll says, my suggestion to write the WHERE clauses can be problematic because users of AQL are prone to forgetting to do that (I’ll forget the proper uses/improper uses section of archetype metadata for the moment). Ok, then we force the CDRs and modelling tools to remind the users to write the WHERE clause or path predicate (as Birger suggested).
This leaves all the runtime behaviour as it is, we can make the suggestions as complicated as they need to be, and we handle the whole requirement above the AQL runtime. If the modelling tool can’t catch it, then the CDR would do so at query parse time.

If a CDR doesn’t support this feature, it is no different from someone forgetting to type SET X ON, but this approach leaves the runtime unmodified and simple, and is more expressive than binary flags.
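
A parse-time check of this kind could be sketched in Python as below. It is a deliberately crude, regex-based lint under stated assumptions: `check_query` and the default list of guarded attributes are hypothetical, the query strings in the usage note are illustrative only, and a real tool would work on a proper AQL parse tree and would also have to inspect path predicates, which this sketch ignores.

```python
import re
from typing import List, Sequence


def check_query(aql: str, guarded: Sequence[str] = ("subject",)) -> List[str]:
    """Return a warning for each guarded attribute the WHERE clause never mentions.

    Nothing is rewritten or executed: the query text is only inspected, so the
    runtime behaviour of the query stays exactly as it is.
    """
    # Everything after the first WHERE keyword (this may include ORDER BY etc.).
    match = re.search(r"\bWHERE\b(.*)", aql, flags=re.IGNORECASE | re.DOTALL)
    where_clause = match.group(1) if match else ""

    warnings = []
    for attr in guarded:
        # Look for a path segment such as "/subject" inside the WHERE clause.
        if not re.search(r"/" + re.escape(attr) + r"\b", where_clause):
            warnings.append(f"query does not constrain '{attr}' in its WHERE clause")
    return warnings
```

For example, `check_query("SELECT o FROM EHR e CONTAINS OBSERVATION o")` yields one warning, while the same query followed by `WHERE o/subject/external_ref/id/value = 'abc'` yields none. The rules can grow as rich as needed (per archetype, per template) without the query engine itself changing.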

Now, how we define these semantic-safety rules and where we put them (ideally into a template that includes this extra semantic constraint) is another discussion, but I’ll cross that bridge once I win some ground with this binary flag thing :)
Thoughts?

Ok, let me try to win you over, at least for the effort you put into your response. As @thomas.beale says, RM is a lot dumber than you see it. It is just a combination of data types with more semantics than plain text or integers but it is not anything more than that.

You have a solid point here regarding the difficulty of following good practice. So let’s enforce good practice via the platform and keep the fundamental principles intact. Please read my response to Tom: Safety features in AQL: subject

Sorry, just dropped in and haven’t been able to read the full discussion.

As you know, we are doing quite a bit with transplants. For me, the best solution seems to be to consider donor and recipient as separate EHRs. We will create different subject namespaces.

The only thing we are now facing is what to do with the transplant as this is a clinical procedure involving both donor and recipient (so perhaps we add them to both).

Hi Wouter,

Good to have you in this discussion. Do you actually point to the record entry within the donor’s EHR? We are currently concerned with kidney transplants (part of the Screen Reject and NephroDigital projects) and found some stuff that, from my perspective, just needs to be kept permanently available in the recipient’s patient record (e.g. the lab value that led to the Banff classification). Do you also see this from an EHR perspective, as we do within a hospital, or is your case a bit different?

Birger

We don’t use the Banff classification, but as I understand it, it is a classification of the organ (graft). I agree that this is information that should be permanently available to the recipient. I don’t have a final solution yet; maybe it should be a case-by-case solution.

So there are a couple of use cases.

  1. For use by either the recipient physicians or the donor coordinators (donor side), who are usually the composers of the data on either the donor or the recipient side.
  2. The operational registry managing the allocation process (an organ exchange organisation like Eurotransplant) and thus the link between donors and recipients.
  3. A clinical registry for study purposes.

Let me start with defining a transplanted organ as a graft; you can consider this the same as tissue (skin) or even a mechanical graft like a ventricular assist device (heart pump). So as soon as it is disconnected from the donor body, it is no longer an object of that donor. Considering this, I would say all information on the graft is gathered via a proxy.

Now for the first use case:

  1. A recipient physician would need to view the data collected by the donor coordinator on the graft. So he should be able to retrieve this data, and it will be part of the patient record. This data on the graft will be enriched by the recipient physician. Now there are two cases in which the information that was enriched on the recipient side should flow back to the donor team. The first is if a serious adverse event takes place, e.g. a malignant tumour is found. This is a separate report (template) that will be sent to the organ exchange organisation and added to the donor record as well as to the records of all other recipients of the donor’s organs. The second is a quality report for the procurement surgeon. I think an EHR might not be the best place to keep the quality reports.

  2. So for Eurotransplant we would consider information on the graft as a separate object closely linked to the transplant. For now most of the above information resides in the donor EHR and in the system for quality and adverse event reaction, and we would present the data to different parties from either the recipient, donor or transplant (allocation) perspective. I think managing the link between these domains is one of the core tasks of Eurotransplant. So how would we make this work in openEHR? openEHR in itself does not seem well suited to managing relations between EHRs, so we might not even use openEHR for this. If we did, graft information would stay with the donor, as during our allocation a donor graft might be allocated to one recipient who declines and end up with another recipient. Transplant/allocation information has to reside on both sides to be able to always make the connection.

  3. Now for a research database for either a donor or a recipient, I would just present graft information on both EHRs. This might get you some redundant data, but it is easy and will be most usable. So, to your original question of how to recognise this as data of the graft: for a study where we are using this, we have used headings to differentiate between graft, donor, recipient and transplant information. But what we have also done is create an extension archetype to fit the protocol extension slot when using . This adds information on the procedure (on procurement, during transport, during transplant) or something like that, perhaps even just whether it is donor or recipient derived.

Gr,

Wouter

One other approach might be to use top-level folders to carry non-EHR-subject data. I’ve been considering something similar to separate unmanaged consumer wearable-type info from managed care records.

Just to clarify, @ian.mcnicoll: when you say top-level folders, you’re referring to folders that are associated with a particular EHR but contain data that is not within the EHR, i.e. using folders as a metadata mechanism for the EHR. Did I get it right?

Not quite. The data is in the EHR, but the folder recognises a different level of governance. Normally I would only want to query within the professional folder.