# Safety features in AQL: subject

**Category:** [AQL](https://discourse.openehr.org/c/aql/43)
**Created:** 2019-11-12 17:29 UTC
**Views:** 2476
**Replies:** 104
**URL:** https://discourse.openehr.org/t/safety-features-in-aql-subject/137

---

## Post #1 by @birger.haarbrandt

Hi everybody,

we recently discussed topics regarding use-cases in transplant medicine. This is a domain where the subject in entries is quite important to distinguish between donors and receivers. For example, lab values may need to be compared to create a Banff classification. However, within AQL, values from donor and receiver cannot be easily distinguished, as this would require an explicit statement about the subject of care within the AQL statement.

Hence, I would like to re-open the discussion if subject self should be the default and other paths belonging to entries about the donor (or fetus etc.) are only retrieved when this is stated explicitly or there is a way to provide information within the result set if a particular value is associated with a subject of care which is different from "self".

My favorite solution would actually be, as discussed during the last(?) SEC, that a different subject than "self" changes the path in a way that it cannot be confused.

Would be great to get some feedback on this!

---

## Post #2 by @pablo

Hi Birger,

It would be nice to have more details about the use case, like details about the data that should be modeled.

On the other hand, I don't see this as a thing specific to AQL: depending on how data is modeled in OPTs, the queries that will be used to get the related data might change.

Also, looking at the RM, it can easily accommodate data that is not directly about the patient in the EHR of the patient (PARTY_SELF).

1. Each entry has subject: PARTY_PROXY https://specifications.openehr.org/releases/RM/latest/ehr.html#_entry_and_its_subtypes
2.
   PARTY_PROXY concrete type could be different than PARTY_SELF, with two options: PARTY_IDENTIFIED or PARTY_RELATED https://specifications.openehr.org/releases/RM/latest/common.html#_overview_3

So I guess data about the donor could be added to the EHR of the receiver using one of those types in the corresponding ENTRY.subject. When that is queried, we need some kind of type checker that I think we lack in AQL, like:

SELECT .. FROM .. COMPOSITION c CONTAINS ENTRY e WHERE e/subject IS PARTY_IDENTIFIED and e/subject/identifiers matches {...}

or

SELECT .. FROM .. COMPOSITION c CONTAINS ENTRY e WHERE e/subject IS NOT PARTY_SELF and e/subject/identifiers matches {...}

We need the type checker here because PARTY_PROXY is not archetypable, so we can't constrain it via CONTAINS and we can't use an archetype_id to verify its type.

I'm not sure if this is the issue you raised, there is info missing, but this analysis raises the issue about the impossibility of querying some data appropriately.

---

## Post #3 by @Seref

My high level understanding is you want some convenience to query records based on subject of care. I'd have strong objections to your first suggestion, that by default entries with subject == self are retrieved and others are ignored unless stated. That is a conditional behaviour for a query language based on the actual value of data, and in my opinion that should not be implemented.

Why not SELECT the subject in the result set? You'd have a column that either says 'self' or something else. You could use the WHERE clause to filter 'self' in or out. Wouldn't that do what you're asking for?

---

## Post #4 by @pablo

I agree with Seref, also post-processing of the results is always possible.

As a general comment, when trying to analyze data or generate some kind of report, post-processing of query result sets is almost unavoidable.
That case might be more frequent than we think, since I don't know that we can do everything we need in AQL to get exactly the result set we need to operate on. Some kind of middleware will be needed on top of those results.

---

## Post #5 by @birger.haarbrandt

Hi Seref,

I think the issue here is that this will require actively filtering for non-self subjects on every single query, at least when we let users generate the AQL queries. This might also cause problems if we want to show the latest lab values, diagnosis etc. on a generic patient chart.

The problem here is that we are talking about expectations. As openEHR is working with a patient-centric architecture, users/developers can be surprised by the results when entries are actually not about the patient. So being explicit (or cautious) here can help to prevent errors in my opinion. Using the WHERE clause is imo not a good solution, as you would have to apply it a lot to always be sure that you are not retrieving wrong (=unexpected) data.

---

## Post #6 by @Seref

Isn't the unusual/unexpected thing here putting data that does not belong to the patient into the patient's EHR? A query language queries data, that's it. If the data is in the repo, it would return it. Can't you create a dedicated EHR for whatever data does not belong to patient A and provide references to that data from the EHR you're accessing?

AQL should not consider any semantics above the RM. You want RM data, it gives you RM data. If we allow higher level semantics to leak into AQL, there is no turning back from that.

I'm trying to think about some feature at the CDR level, above AQL, but even then this feels wrong because it is not something above the data layer (such as the user name of the querying user etc), it is literally a particular value in the RM.

I guess my point is consistency is always more important than convenience and sometimes convenience is not always available.
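
A minimal sketch of the "SELECT the subject and post-process" option from posts #3 and #4, assuming the query returns each entry's `subject` alongside the value and that the client receives rows as dictionaries. The row layout, the `magnitude` key, the simplified `relationship` string and the function name `split_by_subject` are illustrative assumptions, not part of any openEHR API; only the `_type` values mirror the RM class names mentioned above.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def split_by_subject(rows: Iterable[Row]) -> Tuple[List[Row], List[Row]]:
    """Partition rows into (about the patient, about someone else).

    Assumption: each row carries the ENTRY.subject as a dict whose
    "_type" key holds the RM class name (e.g. "PARTY_SELF").
    Rows without a usable subject are treated as "not self" so that
    they are never silently mixed into the patient's own values.
    """
    own: List[Row] = []
    other: List[Row] = []
    for row in rows:
        subject = row.get("subject")
        subject_type = subject.get("_type") if isinstance(subject, dict) else None
        if subject_type == "PARTY_SELF":
            own.append(row)
        else:
            other.append(row)
    return own, other


# Hypothetical result rows: one for the mother, one for the foetus.
rows = [
    {"subject": {"_type": "PARTY_SELF"}, "magnitude": 37.2},
    {"subject": {"_type": "PARTY_RELATED", "relationship": "foetus"}, "magnitude": 36.9},
]
own, other = split_by_subject(rows)
```

This only works when the subject is actually part of the result set; with leaf values alone (the concern raised later in the thread) there is nothing to partition on.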

---

## Post #7 by @birger.haarbrandt

Well, there is nothing you can do in transplant medicine or obstetrics to prevent putting these things inside a patient's EHR. I guess this is exactly the reason why this is supported by openEHR (and FHIR).

Given the paradigm that AQL should not contain semantics outside the RM (which is reasonable), then we might need to think about a change to the RM. This is similar to the issue in CDA where a flag somewhere in the tree can change the meaning of the data entirely (mood-codes...). This is not a good design.

The point @pablo raised regarding post-processing is not clear to me: when I only get the values from the leaves, there is no way to know if these values belong to a subject that is different from the patient.

To clarify, this is primarily not about convenience, it is about patient safety.

---

## Post #8 by @varntzen

As a non-techie, I can support Birger in his concern. I understand the use case: documenting observations and evaluations for others than the patient is a valid and not uncommon use case. Typically an unborn child, which normally won't have a record itself. Everything is documented in the mother's EHR, even some time after birth. Also, as Birger explains, it will happen with data belonging to a donor (it could turn out that the donor was CMV positive, which can have severe effects on the recipient).

This is similar to the EVALUATION.family_history, where an internal CLUSTER with information belonging to family members is stored. If you nest, for example, a CLUSTER for lab results in a SLOT within that internal cluster, you will fetch results from the fetus by the path, but you can't differentiate results from the fetus and the mother if you do a query on the lab-result archetype "overall".

If this is the case Birger is raising, it is a valid question and has nothing to do with convenience in querying.

---

## Post #9 by @pablo

The context of queries is the EHR; data is returned for each EHR, independently of the entry.subject type.

We are really missing some specification around your use case.

---

## Post #10 by @pablo

Queries need to return what you need to satisfy your requirements. If leaves are not enough, you need to return the full entries, or even the full compositions.

---

## Post #11 by @Seref

I understand your point re safety, but the risk to patient safety exists due to the way the modeller created the models. The potential safety issue then exists only in the context of the use of that model, not across the infinite number of models openEHR is designed to handle. This is something that needs to be handled at the application level, and the use of a WHERE clause in AQL is one suggestion to do that.

I'll now be even more annoying and say that changing the RM is just another case of pushing some semantics from the second level of two level modelling down, and is not a good approach. Given openEHR's design, the RM is an attempt to generalise all data types that can arise in healthcare, which is a generalisation. Your requirement here is a specific case that arises in one particular domain and it is meant to be handled by archetypes/templates by design. Handling that at AQL and RM level, both of which are data layer concepts, makes all sorts of architectural integrity/consistency alarms go off in my head.

So given my objection based on the architectural principles, my suggestion would be to discuss constraining the RM so that you don't have to add clinical domain cases to data types or queries, i.e. finding a way of constraining the models to handle your requirement.
The discussion should focus on ADL and templates/archetypes. I really don't think AQL or RM is the right level to address this requirement, but we have a lot of clever people around and I'd be delighted to be proven wrong here :)

---

## Post #12 by @matijap

I was about to propose a slight hack, but then I discussed with Fabio and we came to the conclusion that it is actually a big hack and runs orthogonal (in a bad way) to the existing attribute *subject* that Pablo mentioned.

My (bad) idea was to specialise the archetypes for the (probably rare) things that we do measure for another subject (like donor), for instance to have a "donor blood pressure" archetype as a specialisation of "blood pressure". When querying, it is mandatory to specify the archetypes, and usually it is not done in a way that would return data for the entire hierarchy of archetypes, so that would work well. But it is a hack, and in the same way it would simplify some generic things (like charts), it would complicate others (like the entry forms for donor blood pressure, which could then not be exactly the same ones as for entering blood pressure in other settings).

Apart from spreading awareness of this semantically important attribute, I'm not sure what the solution would be. (And I'm sure that even our own (=Better by Marand) team doing the medications prescription and application tracking software disregards that attribute and cannot distinguish between medications that need to be applied to a woman and those that need to be applied to her newborn child...)

---

## Post #13 by @birger.haarbrandt

@Seref: I totally disagree here. When the openEHR RM provides an attribute and this attribute is used as intended, it is not fair to say that this is a modelling problem. This is clearly a problem within the RM and AQL.

@pablo: I think the use-case is sufficiently clear; at least the answers provided so far are definitely addressing the issue I raised.
@matijap: I had something similar in mind, though it would add another node in the RM path under the entry which needs to be explicitly defined (self, fetus etc.). This way, it is not possible to confuse the subjects. The drawback would be that this is a breaking change for the RM, and it would be necessary to have more projections inside the query to get all values from all subject types.

Would be great to get some more feedback. To me it sounds like clinical people like @varntzen tend to agree on the need.

---

## Post #14 by @pablo

@birger.haarbrandt the requirement is not 100% clear **for me**. I don't see the whole picture because your question was focused on AQL, but maybe AQL is not the solution to tackle the specific functionality you need, or maybe it is a combination of different things. To understand the whole problem and come up with a good solution I need to understand the whole use case.

---

## Post #15 by @Seref

@birger.haarbrandt I'm not the one saying that this is a problem. I don't think I even used the word. I'm merely suggesting that what you're describing as a problem is not at the level you're trying to fix it at.

I insist this should be solved above AQL, but maybe a solution similar to how relational databases support changes to execution behaviour may be adopted as an approach in the middle. That is, a CDR can accept an explicit statement during execution of an AQL query or above the query text itself, such as `SET SUBJECT_SELF_FILTER ON`

This is configuring CDR behaviour from the AQL execution context, and a CDR can either support or not support this. But it is much better than AQL execution acting differently just to address a specific case. If the server does not support it, you get an error. It is explicit, and potentially millions of different queries don't end up paying the price of the performance hit.

Would this be an acceptable solution?

**edit:** You know what? forget it. This is a bad idea. Leaving it as an example of a bad idea.
Are we then supposed to introduce flags/settings for every similar use case? @varntzen is talking about the case in Evaluations and clusters. Then what? Another flag for that? Do we change AQL execution for that as well? Who'll keep track of this?

I repeat my initial suggestions: write the where clauses, or separate that data into another model/EHR and use references. We can do some clever stuff with resolving the references to help, but that's another topic.

---

## Post #16 by @thomas.beale

Note that it is normal for data with subject=someone else to be in a patient's EHR - the most common case is foetal heartbeat, ultrasound etc for a pregnant woman; organ transplant data is probably the second most common case.

---

## Post #17 by @thomas.beale

Birger,

Just to get some clarity, what is your ideal behaviour if no special WHERE condition is stated on ENTRY.subject? That only subject=self ENTRYs are returned?

---

## Post #18 by @pablo

Hi Thomas, check my first comment here: is it worth having an operator to check RM types for parts that have polymorphism and are not archetypable in AQL? PARTY_PROXY is just one example.

---

## Post #19 by @birger.haarbrandt

I think we have several options here. Though these are only educated guesses:

1. Queries only return results for subject=self when no WHERE condition on subject is provided. When there is a WHERE condition on subject (e.g. "fetus"), only this explicitly stated condition should be used. It might also make sense to add a wildcard to get data for any subjects.
2. We could also think about an additional subject column as part of the result set that is automatically added when any data is provided that is not subject self. Though I think this would affect performance because this would require an additional lookup for each entry, and this would not work for aggregation queries.
3. The AQL syntax could be enhanced so that every single query needs to contain an explicit WHERE condition for the targeted subject.
   This would avoid mistakes but create lots of boilerplate code in most queries.

I guess the important part for you is to get the idea. Maybe the solutions are bad, but they make clear what system behavior I would like to avoid in the future.

---

## Post #20 by @pablo

I don't think 1. is true: actually queries return what is in the EHR, that was my previous comment about the context being the EHR. If you set ENTRY.subject to something other than PARTY_SELF and you query the EHR, you will get those entries also, and I think this is correct, because it is information that is important to the EHR of the patient even though it might not be directly about the patient. Another example would be to know if a parent has any diseases that could be inherited: that is info about the parent that is important for the EHR of the child.

For 2. you can actually return the full ENTRY, so you can check the subject in post-processing if needed.

3. I think adding a type check is enough. I don't think some kind of scoping to constrain the results adds much value, and it might be too specific for a small set of use cases. Knowing there are other alternative solutions, this might not be the best design for a general purpose query language. Also, type checking operators can be used in other cases as mentioned before: when there is polymorphism and the data is not archetypable, because in other cases you can just filter with CONTAINS by archetype ID, which is like a type check.

---

## Post #21 by @thomas.beale

As Seref said earlier, we don't want to build special cases into a general language. And normal queries should return any matching data, that's also basic.

Ideally we find a way to make it possible and easy to state subject = self or /= self in a way that is completely general (e.g. could be used to only get INTERVAL_EVENTs, not all EVENTs). Pablo's suggestion of a type predicate is worth considering.
If 'obs' identifies the OBSERVATION then something like

```
is_type (obs/subject , 'PARTY_SELF')
-- or not is_type (...)
```

in the WHERE clause is not particularly onerous.

---

## Post #22 by @birger.haarbrandt

I hope this is not becoming repetitive, but I'm not sure if my point here is fully considered by your answer. The issue is the following: every time people use a query like this:

select
e/ehr_id,
a_a/data[at0002]/events[at0003]/data[at0001]/items[at0004]/value/magnitude,
a_a/data[at0002]/events[at0003]/time/value
from EHR e
contains COMPOSITION a
contains OBSERVATION a_a[openEHR-EHR-OBSERVATION.body_temperature.v1]
where a_a/data[at0002]/events[at0003]/data[at0001]/items[at0004]/value/magnitude>38
AND a_a/data[at0002]/events[at0003]/data[at0001]/items[at0004]/value/units = '°C'
AND e/ehr_id/value MATCHES { '849bf097-bd16-44fc-a394-10676284a012', '34b2e263-00eb-40b8-88f1-823c87096457'}

there is a risk of not receiving the expected result (I assume that everybody agrees that average developers or analysts would expect the body temperature measurements of THE patient, when even the experienced colleagues from Better seem to have made this assumption) and not even knowing that the result set contains data that is not directly about the subject.

Hence, I think it is fair to ask if average devs really expect the behavior you described and would be aware that they would additionally have to filter for subject on potentially multiple single entries. In a patient-centered record, average people who did not work on the spec like you and Seref might actually be surprised by not only receiving data about the subject from the query above.

Putting an info box with a warning into the spec/docs would certainly be a first step (besides implementing Pablo's suggestion regarding predicates; I was not even aware of this technical problem). Maybe we need to collect some more real-world experience first from colleagues at Tieto, DIPS to learn if they are aware of this.
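
To illustrate the kind of warning a tool could raise for a query like the one above, here is a rough sketch that flags ENTRY aliases for which the query text never mentions `alias/subject`. It is a naive text scan, not an AQL parser: the function name `aliases_without_subject_check` and the regular expression are assumptions made for this example, and it will misfire on queries that declare an ENTRY type without an alias.

```python
import re
from typing import List

# Concrete ENTRY subtypes in the RM, plus ENTRY itself.
ENTRY_TYPES = "OBSERVATION|EVALUATION|INSTRUCTION|ACTION|ADMIN_ENTRY|ENTRY"

# Matches "<ENTRY type> <alias>", e.g. "OBSERVATION a_a" in a FROM clause.
# The archetype id "openEHR-EHR-OBSERVATION.body_temperature.v1" is not
# matched because the type name there is followed by "." and not whitespace.
ALIAS_PATTERN = re.compile(rf"\b(?:{ENTRY_TYPES})\s+(\w+)", re.IGNORECASE)


def aliases_without_subject_check(aql: str) -> List[str]:
    """Return ENTRY aliases whose subject is never referenced in the query."""
    aliases = ALIAS_PATTERN.findall(aql)
    return [alias for alias in aliases if f"{alias}/subject" not in aql]


# Shortened version of the body temperature query from the post above.
query = (
    "select a_a/data/events/data/items/value/magnitude "
    "from EHR e contains COMPOSITION a "
    "contains OBSERVATION a_a[openEHR-EHR-OBSERVATION.body_temperature.v1] "
    "where a_a/data/events/data/items/value/magnitude > 38"
)
unchecked = aliases_without_subject_check(query)
```

A query builder could show a hint for every alias in `unchecked`, leaving the decision (self only, others only, or both) to the query author.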

---

## Post #23 by @heather.leslie

I totally endorse Vebjorn's comments.

Storing data on behalf of another individual is fraught with all sorts of privacy, authorisation issues etc as well. It is an absolute no-brainer from a clinician's point of view that querying should be able to clearly, safely and unambiguously identify health data contributions to a health record sourced from other individuals.

We definitely don't want to have to hack this in models.

---

## Post #24 by @thomas.beale

Well there's no technical problem; the query will just return whatever is in the system that conforms to it. If you have body temps of another subject B in subject A's EHR that you are querying, you will get subject B data as well. No developer should expect anything else ;)

However, the semantic expectation is another question. It's actually not that much different from the expectation that when you query for blood pressure and heart rate that you get *actual measured* BP and HR. However, in many systems today, you could easily pick up *target* BP or HR as well. This is a question of *epistemic status*, solved (in the great majority of cases) in openEHR by a) the distinct Entry subtypes and b) archetypes specific to the various epistemic categories.

The question of subject isn't however an epistemic category, it's just a contingent fact. The problem though is that we mentally think of everything in subject A's EHR as being about subject A (including parts, systems, tissue etc). In fact, it contains everything *relevant* to the care of subject A, which can include foetal heart rates, candidate organ genetic match profile, etc.

No-one is generally surprised by the fact that foetal heart rates are included in the EHR of a pregnant mother, so I am not convinced that the contents of the EHR are contrary to the expectations of clinical professionals. It's just that many of us often ... forget.

We can make documentary additions to the specs to make it clearer, probably a good idea.
But in terms of a technical fix, I suggest that @pablo's type-checking approach I posted above, or something similar is probably the kind of thing to look at.

Secondly, I can imagine some preset query 'modes' like the one @Seref posted above could be implemented in a generic way, so that if you did something like

```
SET SUBJECT_SELF_FILTER ON
```

it either modified the original query to include the type-check to force only `PARTY_SELF` Entries to be returned, or maybe it re-filters the result set (clearly the latter is not a great idea if you have `SUBJECT_SELF_FILTER OFF`, in terms of performance). Either way, some technical trick is performed that has the effect of adding this condition into the query, thus always creating the expected answer.

In the above, `SET SUBJECT_SELF_FILTER` could just be a symbolic name of a configured query filter / modifier that is used a lot, so that it saves the query author some typing. I would consider the true query to be *the result of having applied any such filters*, i.e. these filters are just a tooling convenience that saves time. If this is not the case, then a bunch of saved queries have to include these filter setting statements as well, to be complete. Not out of the question, but more complicated I would suggest.

---

## Post #25 by @Seref

@thomas.beale as I added to my response above, I regret having let that worm out of the can. The problem with `SET SUBJECT_SELF_FILTER ON` is that it won't be sufficient to address all instances of the case Birger is raising. @varntzen already gave another example which is related to ENTRY level. Adopting this approach would lead to lots of flags/variables and boolean flags become a nightmare to change behaviour for complicated cases. This is exactly the situation with SQL server for example.

I'm a strong proponent of pushing things towards the use of functions and keeping the core lang as small as possible as you know.
@pablo's suggestion, with your suggested function based approach, is what I'd see as an acceptable solution and I'd prefer that over SET...ON. If the semantics of execution we'd like to express become more complicated, which is bound to happen (again, see @varntzen's example), we have more flexibility using functions, which we can add more variables to etc etc.

---

## Post #26 by @birger.haarbrandt

> No-one is generally surprised by the fact that foetal heart rates are included in the EHR of a pregnant mother, so I am not convinced that the contents of the EHR are contrary to the expectations of clinical professionals. It’s just that many of us often … forget.

Thanks Thomas, that's exactly the point why I think we need to have a safety net that enforces/motivates developers to make a conscious decision.

I think it is worth taking a look at other domains: of course you can say that C++ is safe to use because all behavior is clearly defined in the specifications. Still we see segmentation faults because C++ allows you to shoot yourself in the foot (and head). Now we have to think about whether openEHR, which is applied in safety critical environments, actively helps us prevent doing harm to people. From my perspective, patient safety comes before technical consistency. This is one of the reasons why I think that CDA is badly designed, with lots of flags in the middle of the paths inside large trees.

Now the question: does the WHERE condition solve the issue? I think it is not sufficient because AQL does not actively motivates its use. I would rather have an explicit statement enforced by the generator of the query to make sure there is a conscious decision.
What do you think about including a **mandatory** predicate like (not sure about the actual syntax but just to explain the idea):

`a_a/data[at0002, subject=self]/events[at0003]/data[at0001]/items[at0004]/value/magnitude`

---

## Post #27 by @Seref

[quote="birger.haarbrandt, post:26, topic:137"]
From my perspective, patient safety comes before technical consistency.
[/quote]

You cannot have patient safety without conceptual consistency. You'd be improving one case while doing harm in another, ironically arriving at inconsistent patient safety due to inconsistent behaviour.

[quote="birger.haarbrandt, post:26, topic:137"]
What do you think about including a **mandatory** predicate like (not sure about the actual syntax but just to explain the idea):
[/quote]

A predicate like this would require AQL type checking, because in order to have that mandatory subject constraint in the predicate you'd have to know that the path is indeed an RM object with the subject attribute. So this is an approach above AQL, which is similar to how Flow from Facebook deals with JavaScript's dynamic type system. It is conceptually a solution at the tooling/CDR level (if it does the check during parsing), which is what I've been suggesting.

Unless I'm missing something, it is not your initial suggestion, which is the AQL execution changing behaviour based on the subject's type.

Did I get any of the above wrong? Happy to be corrected.

---

## Post #28 by @birger.haarbrandt

Hi Seref,

I quote myself:

> Hence, I would like to re-open the discussion if subject self should be the default and other paths belonging to entries about the donor (or fetus etc.) are only retrieved when this is stated explicitly

If we had a predicate as suggested that is **mandatory**, this would somewhat fulfill one of my initial ideas. This is also not too far away from the idea of changing the path, even when technically the predicate is a bit different from that.
I argued a bit later that always explicitly stating the subject (may it be within the predicate or a WHERE clause) would bloat the AQL statements. If this is accepted, I would be happy with this solution. Hence, it might be that I did not fully understand your suggestion and our ideas were not too far away. Does this make sense to you?

---

## Post #29 by @thomas.beale

Well doing anything like that is a tooling issue; it's just an alternate shortcut to the `SET SUBJECT_SELF_FILTER ON` or any other such thing.

I don't disagree with any of your comments about patient safety (at all). But a basic rule of reliable, maintainable software, and tractable data is that you don't build special cases into general languages / models; you take care of them in another layer dedicated to handling special cases. As soon as we start hacking AQL or the RM or anything else to have special semantics in certain places because of the subjective (at a technical level) importance of that particular model element, we are dead.

Now, having said that, we do actually have a general feature in BMM (and even in UML, via profiling) that enables specific classes and attributes to be classified in certain ways - currently you can classify model attributes in BMM as infrastructure, runtime, or (the default) semantic. These classifiers were originally proposed in order to alert tools as to which things not to allow the archetype author to constrain - e.g. most date/times in the models (only knowable at runtime). The ADL Workbench reads these markers, and other tools could as well. In theory, we could have a classifier that marks certain parts of the model as having relevance to e.g. 'safety' or somesuch. Personally, I think this method is far too obscure to achieve the desired effect, just mentioning it for completeness.

---

## Post #30 by @thomas.beale

[quote="birger.haarbrandt, post:26, topic:137"]
Now the question: does the WHERE condition solve the issue?
I think it is not sufficient because AQL does not actively motivates its use.
[/quote]

Not sure what you mean here. AQL doesn't describe how AQL should be used, only what AQL is (or it will when our super-heros @Seref and @pablo and others are done with it :)

What we are talking about really is that there is nothing in any of the tooling environments that alerts query authors, and one assumes report definers etc, to certain domain-level semantic issues, one being the correct understanding of what 'subject' any given query is targeting, and whether that is the intended one.

We can imagine a tool, something like Better's EhrExplorer, with a friendly query builder, and one of the check boxes is:

□ include data with subject /= self

(or maybe pre-checked with the text 'subject = self'). The tool would then do something along the lines of what has already been suggested, and modify the queries being created by the author appropriately.

The whole question of patient safety really is a layer above the mechanics of AQL, the RM or anything else in the data-processing layer. We just have to work out where to handle it.

Having said that, adding some prominent NBs or whatever in certain parts of the RM spec may be a good idea to alert tool implementers to actually think about e.g. ENTRY.subject and Querying.

---

## Post #31 by @pablo

In that case, with what we currently have, you need to return the OBSERVATION, so it should be SELECT ..., a_a FROM ...

Then check a_a.subject in post-processing of the result set. I don't think there will be any system just displaying on a screen directly what comes from the database without any processing in the middle; there should be a business logic layer between persistence and presentation.

---

## Post #32 by @birger.haarbrandt

`Not sure what you mean here.
AQL doesn’t describe how AQL should be used, only what AQL is (or it will when our super-heros @Seref and @pablo and others are done with it`

Well, AQL gives its users a frame to express their information need. I consider AQL to be a domain specific language that makes certain assumptions about the Reference Model. For example, AQL does not allow searching on abstract entry classes; we need to be explicit about the archetype. That's a decision the designers may have made on purpose. Forcing the user to be explicit about the information need by requiring a predicate is imo just another design decision for the DSL's syntax.

I have to say that for me it is not really clear if the tool layer is really the one that should fix this issue. If the RM was designed differently regarding the representation of other subjects, this would also affect AQL and potentially prevent the issue by design. Hence, I think this is more about fixing the issue on a different layer without introducing breaking changes to the RM. Though, I have to say that I'm not 100% sure if my thinking is correct.

---

## Post #33 by @thomas.beale

Actually, there is nothing preventing AQL searching on abstract ENTRY types; if there were archetypes based directly on ENTRY (we don't have them AFAIK, but nothing prevents that), it should search all ENTRYs based on that archetype. Whether current implementations are using the machine representation of the RM (i.e. the BMM) to do that properly I don't know.

In any case, if the FROM part just states ENTRY, with no archetype id, the querying will operate over all ENTRYs, within whatever kind of COMPOSITIONs are stated in the FROM (if any) - again, assuming this is implemented properly.

In general AQL doesn't know anything specific about any specific RM; but it does need access to a meta-model of the RM so it can determine things like concrete child classes of an abstract class it should look for etc.

W.r.t.
the subject issue, I don't see why you think the RM is broken or should be defined differently - what should it look like?

---

## Post #34 by @birger.haarbrandt

Without being able to provide all the details, I think that having paths like this in compositions would make accessing the EHR safer:

`[openEHR-EHR-OBSERVATION.body_temperature.v1]/party[subject='self']/data[at0002]/events[at0003]/data[at0001]/items[at0004]/value/magnitude`

Edit: gosh, I think I learned something today. Until 1 second ago, I had the idea in my mind that only the following expression should return a non-empty result in AQL:

`a_a/data[at0002]/events[at0003]/data[at0001]/items[at0004]/value/magnitude`

However, I really did not recognize that this is still valid AQL:

`a_a/data/events/data/items/value/magnitude`

Nice to feel very stupid :)

---

## Post #35 by @thomas.beale

Well that is going to break every path-based processor in the land, since there is no attribute called 'party' under class OBSERVATION...

---

## Post #36 by @pablo

I don't like that solution: that rule needs to be added everywhere in the system and repository logic, will need several tweaks, and fragments the EHR information. As commented before, the context (or domain) of AQL is the whole EHR, and doesn't depend on the type of certain attributes.

If we need to check types, the proposed type checking operator would be better and implementation refactoring will be minimal. I'm talking from the standpoint of someone that needed to implement a query DSL from scratch and did those refactors many times, and adding new operators or functions is always easier than messing with path internal structures.

---

## Post #37 by @ian.mcnicoll

Crikey, I forget to switch Discourse notifications on and AQL carnage ensues.

I have not quite absorbed all the discussion but I think we can agree that it would be preferable to find a way to protect 'naive' queries that could be performed in a way that is potentially unsafe.
By naive I don't imply stupidity, just that the underlying system may be using more advanced constructs than the querier is used to, e.g. mixed parent/child records. This will get worse as the scope and complexity of the records increases.

I'd agree with Heather that we probably should avoid partitioning non-subject data with a specialisation or separate archetype, though I have done so on one occasion for a very obscure and potentially risky example of parental medication in the child record - yuck.

There are a few places in the RM where a naive query which ignores a related 'status' attribute might mangle the intended result - subject, current_state, careflow_step - but not any others that I can think of and of course, as @Seref says, us modellers can really go to town but we have to take responsibility there.

The examples that Birger has highlighted are by-design in the RM - nothing to do with the clinical modelling layer. It is an elegant design and maximises re-use of archetypes but it does have this weakness around querying.

The various arguments for and against different solutions are all well made but I'm starting to come round to Thomas's idea of a togglable 'safe-search' mode, on by default, such that somewhere in the AQL processing chain, subject is required to be PARTY_SELF, unless otherwise specified, or safe_search is off.
Similarly for current_state, when retrieving an ACTION.

---

## Post #38 by @Seref

Ok, I may have misunderstood you because I thought you were suggesting that AQL execution changes behaviour and excludes data unless subject is explicitly stated.

This suggestion of making an attribute mandatory is at least explicit and sounds like a more promising direction for discussion than the one I thought you were suggesting.

Thanks for the clarification (on a train and writing on the phone so bad formatting is inevitable)

---

## Post #39 by @Seref

[quote="Seref, post:25, topic:137"]
as I added to my response above, I regret having let that worm out of the can. The problem with `SET SUBJECT_SELF_FILTER ON` is that it won’t be sufficient to address all instances of the case Birger is raising. @varntzen already gave another example which is related to ENTRY level. Adopting this approach would lead to lots of flags/variables and boolean flags become a nightmare to change behaviour for complicated cases. This is exactly the situation with SQL server for example. I’m a strong proponent of pushing things towards the use of functions and keeping the core lang as small as possible as you know. @pablo’s suggestion, with your suggested function based approach is what I’d see as an acceptable solution and I’d prefer that over SET…ON. If the semantics of execution we’d like to express becomes more complicated, which is bound to happen (again, see @varntzen 's example), we have more flexibility using functions, which we can add more variables to etc etc
[/quote]

@ian.mcnicoll see above please

---

## Post #40 by @ian.mcnicoll

Thanks @Seref,

I did read it but have not fully absorbed all of the suggestions/counter-suggestions. I'll re-read. And this is a cat that needed to be let out of the bag, and the worm out of the can... I think it is a great example of how this community.
While on a rainy walk, it occurs to me that this is actually part of a wider question about potentially limiting the query traversal. AQL gives us great power but we do need to limit its ability to dig out data, whether for semantic reasons (subject), privacy rules or Bjorn's 'report' flag on compositions to indicate that these should not normally be found by queries.

So I'll re-read the various comments but here is another suggestion - why do we not consider adding some kind of RM attribute ?? on PATHABLE that says 'do not traverse in queries' - with perhaps a few flavours. It would be up to the system designer to set that attribute on compositions, generally aided by tooling and/or system defaults, but capable of being overridden as per Thomas's example.

That moves the problem to the system design space, outside of AQL.

---

## Post #41 by @Seref

Thanks Ian,

RM should not know about how it’s to be used in terms of computation. Querying, UI, serialisation to xml, json etc are all use cases for an implementation of the RM. Use cases should not go there, there being RM; that effectively eliminates the whole point of two level modelling.

Birger’s last suggestion is sounding promising to me atm (emphasis on atm :) )

---

## Post #42 by @varntzen

Maybe on the side here, but somehow related: When it comes to data that are exclusively registered for reporting purposes (research, financial, sick leave formulas to authorities dealing with sickness benefit, etc), we've for now abandoned the report flag on compositions, and are instead making ugly, local archetypes to be used to duplicate data born in "real" archetypes made for primary documentation. Example: the diagnosis in "Sick leave formulas" is not always the correct diagnosis in the EHR, hence polluting the patient's diagnosis history.

If we could avoid making these duplicate "monster-archetypes" and instead rely on an attribute in the RM indicating this particular information is a duplicate or for a reporting purpose only .......
Is this possible?

---

## Post #43 by @birger.haarbrandt

No misunderstanding, I was just not that precise as I had several ideas floating in my mind and no clue if this is good or bad.

---

## Post #44 by @thomas.beale

[quote="ian.mcnicoll, post:40, topic:137"]
So I’ll re-read the various comments but here is another suggestion - why do we not consider adding some kind of RM attribute ?? on PATHABLE that says ‘do not traverse in queries’ - with perhaps a few flavours. It would be up to the system designer to set that attribute on compositions, generally aided by tooling and/or system defaults, but capable of being overridden as per Thomas’s example. That moves the problem to the system design space, outside of AQL.
[/quote]

yeah... no! You have to solve questions with 2nd (or nth) order semantics in upper layers of the system, not in the base information models, which can't be tasked with knowing how particular types of queries should be processed.

---

## Post #45 by @thomas.beale

[quote="varntzen, post:42, topic:137"]
If we could avoid making this duplicate, “monster-archetypes” and instead rely on an attribute in the RM indicating this particular information is a duplicate or for a reporting purpose only … Is this possible?
[/quote]

Well, we are thinking about something related - [see this page on Reports support in RM](https://openehr.atlassian.net/wiki/spaces/spec/pages/92358988/Reports).

---

## Post #46 by @ian.mcnicoll

I'd like to explore that idea further - can you repost in Clinical so we can discuss without tripping over the AQL discussion, which is complex enough, though I can see the relevance here.

---

## Post #47 by @ian.mcnicoll

[quote="Seref, post:41, topic:137"]
RM should not know about how it’s to be used in terms of computation. Querying, UI, serialisation to xml, json etc are all use cases for an implementation of the RM.
Use cases should not go there, there being RM; that effectively eliminates the whole point of two level modelling. Birger’s last suggestion is sounding promising to me atm (emphasis on atm
[/quote]

Here goes (I know I am going to regret this conversation!!).

My view, possibly a misunderstanding of the RM, is that it is (a very good) attempt to formalise how the known requirements for an EHR can be universally modelled. This is required because health data encompasses a whole host of 'special cases' which one does not find in other sectors. Many of these 'special cases' are best dealt with in the archetype layer but underpinning that is an attempt to capture patterns and rules which can be defined globally. So we have the specialised ENTRY classes, which were controversial but are accepted as bringing value. We have the ideas of 'persistent' vs. 'event' compositions, again 'special cases' but universally so.

Now we have the 'special case' of an ENTRY which contains 'non-self-data'. There is a clear requirement to be able to handle this kind of data but we are allowing subject to be re-defined in a way that is potentially risky from a querying perspective. This has not mattered too much until now, since in general the people building the queries have been the same as those building the templates, and will be aware of this potential risk, and build the appropriate 'safe-queries'.

The problem that Birger is highlighting is that as the scope of use of a set of patient data grows, the people querying may not be aware of this risk, especially since the use of 'non-self-data' is actually pretty rare in the overall picture.

So we know that the 'non-self-data' construct is helpful, but we also know that it could represent a significant risk if not used appropriately, and that there is a very clear 'default' use, i.e. PARTY_SELF, which will apply in 99% of usage. As such I think this is exactly the kind of knowledge that should be instantiated in the RM design.
How we do that is a separate issue. It is definitely not an AQL issue per se since any other query mechanism will have exactly the same issue. We can mitigate by adding WHERE clauses or predicates but these will depend on good practice and documentation or tooling, and do add complexity to every query to protect against a pretty obscure set of use cases.

I would prefer that something (optionally) ends up in the instance metadata that potentially prevents inadvertent querying, and I think this will be very helpful in a number of other more subtle cases - such as the one Vebjorn has highlighted.

@thomas.beale - I am not suggesting solving these kinds of questions at RM level, simply that there are some kinds of answers that can only be resolved by adding attribution to the data tree, since querying is archetype and template neutral.

The DIPS 'report' tag is an example of this - the idea is good, just for me not expressive enough for other similar uses and should be at ENTRY level ?? even at ELEMENT, not at Composition. I think that gives us a much more flexible approach.

---

## Post #48 by @thomas.beale

[quote="ian.mcnicoll, post:47, topic:137"]
Now we have the ‘special case’ of an ENTRY which contains ’ non-self-data’.
[/quote]

Don't shoot me, but I have to point out that this is not the case. There are no 'special cases' in the RM in terms of the RM. The RM, like any information model, just says what it says. We built it precisely to be able to represent clinical statements about things (foetal heart rate, a donor kidney) whose owning subject is not the subject of the record. But the RM of course has no idea that this might be rare - in fact it would be common in an organ transplant system or dedicated obs/childbirth system - nor that getting the subject wrong in a query could have major consequences. The RM's job is correct representation, not outguessing querying needs, or statistical occurrences of particular configurations of data.
Putting special markers in the data for this case is a bad option; as @Seref pointed out, that opens the door to endless special cases, which means endless hacking of the RM, and worse, endless hacking of query processors. In the end, it would become nearly impossible to know if either the data or the AQL processor were correct, since correctness would incorporate these special markers, and special processing capabilities.

It's slightly annoying, but I don't see any big problem in implementing this with some generic additions to AQL that enable modal querying, which is of the form:

    SET SOME_MODE
    do some queries
    UNSET SOME_MODE

Where the effect of the SOME_MODE is to modify all the queries when turned on, in some standard way. In this particular case, the most obvious modification is the type checking one, that looks for non-PARTY_SELF ENTRYs and includes or excludes them.

Other better solutions may be available, but none of them would involve marking data in a special way, or having to modify the query processor in non-generic ways.

---

## Post #49 by @Seref

@thomas.beale I'm taking back my suggestion of SET X / UNSET X. It is a bad idea. As I wrote above, this approach has multiple problems:

* Its binary nature is not sufficiently clear for potentially multi-dimensional behaviour choices. Even in RDBMSs, where the context is simpler, this approach is leading to lots of problems.
* It binds runtime behaviour to actual data. Whether or not we do it with SET switches does not matter. This'll create a backdoor to a bad design choice and requests will start piling up; every clinician with a 'magical' wish will get in the queue with their boolean flags instead of writing WHERE clauses: we'll be trying to explain to clinicians why `SET HIDE_NOT_CODED_DIAGNOSIS ON` is a bad idea within a few months.

What we can do is to introduce 'semantic-safety' to AQL, similar to how static type checking works.
This is me thinking on @birger.haarbrandt 's response to my request for clarification.

As @ian.mcnicoll says, my suggestion to write the WHERE clauses can be problematic because users of AQL are prone to forgetting to do that (I'll forget the proper uses/improper uses section of archetype metadata for the moment).

Ok, then we force the CDRs and modelling tools to remind the users to write the WHERE clause or path predicate (as Birger suggested). This leaves all the runtime behaviour as it is, we can make the suggestions as complicated as they need to be and we handle the whole requirement above the AQL runtime. If the modelling tool can't catch it, then the CDR would during query parse. If a CDR doesn't support this feature, it is not different than someone forgetting to type SET X ON, but this approach leaves the runtime unmodified/simple and is more expressive than binary flags.

Now, how do we define these semantic-safety rules and where do we put them (ideally into a template that includes this extra semantic constraint) is another discussion but I'll cross this bridge once I win some ground with this binary flag thing :slight_smile:

Thoughts?

---

## Post #50 by @Seref

Ok, let me try to win you over, at least for the effort you put into your response.

As @thomas.beale says, RM is a lot dumber than you see it. It is just a combination of data types with more semantics than plain text or integers but it is not anything more than that. You have a solid point here regarding difficulty of following good practice. So let's enforce good practice via the platform and keep the fundamental principles intact.

Please read my response to Tom: https://discourse.openehr.org/t/safety-features-in-aql-subject/137/49?u=seref

---

## Post #51 by @wouterzanen

Sorry, just dropped in, and haven't been able to read the full discussion.

As you know we are doing quite a bit with transplants. And for me it seems that the best solution is to consider both donor and recipient as separate EHRs.
We will create different subject namespaces. The only thing we are now facing is what to do with the transplant, as this is a clinical procedure involving both donor and recipient (so perhaps we add them to both).

---

## Post #52 by @birger.haarbrandt

Hi Wouter,

good to have you in this discussion. Do you actually point to the record entry within the donor's EHR? We are currently concerned with kidney transplants (part of Screen Reject and NephroDigital projects) and found some stuff that from my perspective just needs to be kept permanently available in the receiver's patient record (e.g. the lab value that led to the Banff classification). Do you also see this from an EHR perspective as we do within a hospital, or is your case a bit different?

Birger

---

## Post #53 by @wouterzanen

We don't use the Banff classification but as I understand it, it is a classification of the organ (graft). I agree that this is information that should be permanently available to the recipient.

I don't have a final solution yet, or maybe it should be a case by case solution.

So there are a couple of use cases.

1. For use by either the recipient physicians or donor coordinators (donor side), who are usually the composers of the data on either side, donor or recipient.
2. The operational registry managing the allocation process (organ exchange organisation like Eurotransplant) and thus the link between donors and recipients.
3. A clinical registry for study purposes.

Let me start with defining a transplanted organ as a graft; you can consider this the same as tissue (skin) or even a mechanical graft like a ventricular assist device (heart pump). So as soon as it is disconnected from the donor body it is no longer an object of that donor. Considering this I would say all information on the graft is gathered via a proxy.

Now for the first use case:

1. A recipient physician would need to view the data collected by the donor coordinator on the graft.
So he should be able to retrieve this data and it will be part of the patient record. This data on the graft will be enriched by the recipient physician. Now there are two cases in which the information that was enriched by the recipient should flow back to the donor team. One is if a serious adverse event takes place, e.g. a malignant tumour is found. This is a separate report (template) that will be sent to the organ exchange organisation and added to the donor record as well as all other recipients of donor organs. The second is for a quality report for the procurement surgeon. I think an EHR might not be the best place to keep the quality reports.

2. So for Eurotransplant we would consider information on the graft as a separate object closely linked to the transplant. For now most of the above information resides in the donor EHR and in the system for quality and adverse event reaction, and we would present the data to different parties from either the recipient, donor or transplant (allocation) perspective. I think managing the link between these domains is one of the core tasks of Eurotransplant. So how would we make this work in openEHR? openEHR seems in itself not well suited to manage relations between EHRs, so we might not even use openEHR for this. If we would, graft information would stay with the donor, as during our allocation a donor graft might be allocated to one recipient who declines and end up in other recipients. Transplant/allocation information has to reside on both sides to be able to always make the connection.

3. Now for a research database for either a donor or recipient I would just present graft information on both EHRs. This might get you some redundant data but it is easy, and will be most usable.

So, to your original question of how to recognize this as data of the graft: for a study where we are using this, we have used headings to differentiate between graft, donor, recipient and transplant information.
But what we have also done is create an extension archetype to fit the protocol extension slot when using . Add information on the procedure (on procurement, during transport, during transplant) or something like that, perhaps even just whether it is donor or recipient derived.

Gr,
Wouter

---

## Post #54 by @ian.mcnicoll

One other approach might be to use top level folders to carry non-EHR-subject data. I've been considering something similar to separate unmanaged consumer wearable type info from managed care records.

---

## Post #55 by @Seref

Just to clarify @ian.mcnicoll, when you say top level folders, you're referring to folders **associated** to a particular EHR, but containing data that is not within the EHR, i.e. using folders as a metadata mechanism for the EHR. Did I get it right?

---

## Post #56 by @ian.mcnicoll

Not quite. In the EHR, but recognising a different level of governance. Normally I would only want to query within the professional folder.

---

## Post #57 by @wouterzanen

@birger.haarbrandt just one other concern with using party proxy to distinguish between recipient and donor. What if the recipient receives organs from multiple donors, either at the same time (for instance pancreas islet transplants) or sequentially? You would still need to identify which donor and which organ the values relate to. It seems that querying on that information is better.

Then on the matter of privacy: a deceased transplant donor has no privacy under the GDPR, only living persons have privacy. However, morally we should respect their privacy. We also have the case of living transplants; however, I would say that consent in this case comes with consenting to donate an organ and does not have to be made explicit.

Although I think that making it easier to query on party-proxy is a good idea, I still think it is not a good solution for this use case.

---

## Post #58 by @Seref

sorry @ian.mcnicoll you lost me there. My question is in the context of data organisation.
I don't understand what you mean with different level of governance or the professional folder :(

---

## Post #59 by @ian.mcnicoll

In my example there is a need to carry both 'personal' unmanaged, uncurated compositions of patient-provided wearable device data (typically daily step counts in a well person) *and* similar 'professional' patient-provided wearable device data (perhaps daily step counts as part of a pre-operative fitness program, initiated and reviewed by an anaesthetist).

Same data, different composition/template, but the bulk 'personal' data is *generally* not going to be queried along with the 'professional' data.

My suggestion was to create 2 top-level folders, one for 'personal' compositions, one for 'professional' compositions, and to use something like

`SELECT .... as step_count FROM EHR e[...] CONTAINS FOLDER f ['professional'] CONTAINS COMPOSITION c CONTAINS OBSERVATION o[openEHR-EHR-OBSERVATION.step_count.v1]`

routinely for professional clinical activity.

Does that help?

---

## Post #60 by @Seref

Thanks, I think I got it now.

---

## Post #61 by @bna

Coming late to this topic is hard. So many discussions going back and forth....

I want to bring other possible solutions to the table:

**Solution 1: Folder which links in data from donor**

AFAIK the problem here is to collect and store data belonging to both the donor and the recipient. They are different persons. In normal circumstances the data would belong to different EHRs.

For such a use-case I am thinking about storing data in the EHR to which it belongs, and then using FOLDER to collect data from the related EHRs. The solution is to create a FOLDER in the recipient EHR which links in data from the donor.

**Solution 2: Change the defaults of AQL**

Since all ENTRY instances have a subject and we normally expect PARTY_SELF (the EHR) to be the subject - let's make the default of AQL be to return only PARTY_SELF. Applications which need all kinds of subject might query for specific or all subjects.
This solution is equivalent to the way we handle REPORT compositions in DIPS EHR Store. Compositions with the report category are not returned by default. You have to query for them.

I have not been able to read through all the comments in the thread, so forgive me if someone has suggested the same solutions before.

---

## Post #62 by @thomas.beale

I'll just point out that there is a reason we have ENTRY.subject as an attribute in the model - it's precisely to handle the case when the subject of the data in the Entry isn't the same as the subject of the record, *even though the Entry is relevant to the subject of the record*. This was in openEHR from the earliest days - based on wide clinical consultation - to support representation of:

* foetal heart rate measurements (in the mother's record)
* donor organ data (in the recipient's record)
* disease information of an infectious carrier (in the record of a newly infected patient)
* family history information

Although in many systems PARTY_SELF is likely to be the subject, in many systems - those specialising in any of the above kinds of data - it is just as likely that subject = foetus, donor or some other person.

I think taking such information out of the patient's EHR would clearly compromise care, or just render it impossible. Personally, I think we can solve the challenge of making querying work as intuitively expected but also be 'easy' - with some generic additions in query tooling.

It is not impossible to imagine also / alternatively maintaining some index object in the EHR that contains pointers to ENTRYs for which subject /= self, and using that to make it easy for queries to distinguish. I think such methods are likely to be fragile however.

I don't think FOLDERs will be useful here, due to the basic fact that an original COMPOSITION could very easily contain two Entries having different subjects, e.g. maternal (subject=self) BP, heart rate, and foetal heart rate (subject=foetus #N).
I still think that the main 'problem' here is data user expectation v real data not being aligned. It is a matter of education within our community on how to do querying properly and safely. If we want to avoid doing some kind of type check on ENTRY.subject, then it's a matter of modal querying and/or **query pipelining**, something which we don't have at all in openEHR right now, but which is basic in the research / analytics domain.

---

## Post #63 by @bna

@thomas.beale - I think we both agree then. And as a follow up we, as a community, have to work out some patterns on how to manage such data use-cases.

IMHO it will be reasonable to say that the default behaviour of an AQL query engine is to respond with only ENTRYs belonging to PARTY_SELF. If the client needs more we have to implement some flag or query patterns.

I don't think the answer is yes/no on this postulate and look forward to feedback.

---

## Post #64 by @Seref

@bna I and Tom provided opinions against the default behaviour you suggested above. As long as this discussion may have been, it is a valuable one and the discussion above is worth reading.

---

## Post #65 by @bna

@Seref - Yes I read that earlier and was naive enough to think that you changed your minds for this *better* approach.

@matijap - I guess you like the last two words of the sentence above :-)

---

## Post #66 by @thomas.beale

Well that 'default approach' means the query engine knows something specific about the RM, which is a basic IT anti-pattern so we need to stay away from that.

I would currently advocate that the appropriate filter be included as a pipeline stage which is automatically added by a query tool working in a specific mode that has been switched on by the user.

---

## Post #67 by @Seref

I'm very keen to hear why "this" is a better approach (compared to alternatives) because personally I think the most promising suggestion to move forward came from Birger and even that scares me a bit when I think about it.
May I kindly request you elaborate a bit on your suggestions, in comparison to, say, static analysis of AQL as discussed above? I'd be delighted to have a few more options on the table.

---

## Post #68 by @ian.mcnicoll

Are we looking at a sort of linter for AQL that allows us to check for known snafus and potential safety issues?

---

## Post #69 by @Seref

"sort of Linter" is good enough for me to say: "yes, that's what I have in mind based on what Birger said previously", but that is me. Tom's definition sounds like a runtime solution but still backed by some static analysis. I think some sort of static analysis of AQL is inevitable and I'm personally in favour of this approach as much as possible.

I'm minded to start a new thread based on what I consider to be the most useful input from this one and discuss this particular approach there.

---

## Post #70 by @bna

[quote="birger.haarbrandt, post:26, topic:137"]
`a_a/data[at0002, subject=self]/events[at0003]/data[at0001]/items[at0004]/value/magnitude`
[/quote]

Are you talking about this kind of solution @Seref? If so - then we agree on the same kind of solution.

My suggestion was to let the AQL engine add this kind of structure, if not present. What we really want is to be backward compatible. We don't want to introduce some new "mandatory" structures of AQL.

I guess current querying does not consider that the subject might be something else. Thus it will be quite safe to add this assumption to the engine. And this feature might even be a configuration depending on the running environment.

---

## Post #71 by @bna

I think we have the same assumptions then. The *appropriate filter is included as some pipeline stage* in the AQL engine - given either a configuration or some *hints* given by the client.

The overall requirement here is to not introduce any breaking changes for existing clients, and still be able to cover the use-case given by @birger.haarbrandt.
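
To make the pipeline-stage idea a bit more concrete, here is a minimal Python sketch of a filter applied to a result set after the query has run. It is only an illustration under stated assumptions: the query is assumed to have selected the whole ENTRY (as suggested in post #31), rows are assumed to arrive as canonical-JSON-style dicts in which `subject._type` names the concrete PARTY_PROXY type, and `subject_type` / `filter_by_subject` are hypothetical names, not part of any CDR or AQL API.

```python
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]


def subject_type(entry: Dict[str, Any]) -> str:
    """Return the concrete PARTY_PROXY type of an ENTRY dict, or '' if absent."""
    subject = entry.get("subject") or {}
    return subject.get("_type", "")


def filter_by_subject(rows: Iterable[Row], column: str,
                      self_only: bool = True) -> Iterator[Row]:
    """Hypothetical pipeline stage.

    With self_only=True, keep only rows whose ENTRY subject is PARTY_SELF
    (rows with a missing subject are dropped as well). With self_only=False,
    pass every row through unchanged.
    """
    for row in rows:
        if not self_only or subject_type(row[column]) == "PARTY_SELF":
            yield row


# Made-up result set: one value about the patient, one about another party.
rows = [
    {"obs": {"subject": {"_type": "PARTY_SELF"}, "magnitude": 1.1}},
    {"obs": {"subject": {"_type": "PARTY_IDENTIFIED"}, "magnitude": 2.4}},
]

safe_rows = list(filter_by_subject(rows, "obs"))                   # 'safe' mode
all_rows = list(filter_by_subject(rows, "obs", self_only=False))   # mode switched off
print(len(safe_rows), len(all_rows))
```

Because the stage sits outside the query runtime, the engine itself stays unaware of ENTRY.subject; whether the stage is switched on by a tool, a configuration or a client hint is left open here.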
---

## Post #72 by @Seref

I've been objecting to the AQL engine adding this kind of structure automatically, after a query without this structure arrives.

My suggestion is to raise a warning or an error during parsing of the AQL statement, before it reaches the engine. Whether or not to raise the warning/error would likely be determined by metadata in the archetype.

Modellers should at least take the responsibility of alerting downstream users of their models that the model may contain data that belongs to parties other than the subject of the EHR. So it's their responsibility to put that metadata into the model.

The AQL processing stack can check archetype metadata, find the need to check for subject or set subject in the query (as pointed out by the semantics of the warning put in place by the modeller), then raise a warning or an error (based on the severity level set by the modeller). Then it is the query author's responsibility to act and put in place whatever that warning/error requires.

From the point of view of the AQL engine, there are no warnings, no conditional execution; it just runs whatever it receives and returns the results. We may have different terminologies: I refer to engine as the AQL runtime, i.e. the chunk of code that does the actual fetching of data.

All of the above can be performed by an AQL authoring tool and/or by the CDR that does the parsing of AQL. It'd play nice with everything. So far, this is the best approach I can think of.

Revisiting my point above re modellers taking responsibility, please remember that the core design goal of openEHR is to let clinicians take charge but without forcing clinicians to become techies or techies to act as clinicians. I think the approach I suggest above fits that overarching design principle. Techies should not implement logic that watches for conditions without a clinician telling them to do so.
The suggested approach above may go down in flames during technical implementation or may have glaring holes in it, but it attempts to preserve the core principles of openEHR with concern for whatever system design principles I have (which are always open to challenge and improvement).

---

## Post #73 by @ian.mcnicoll

> Modellers should at least take the responsibility of alerting downstream users of their models that the model may contain data that belongs to parties other than the subject of the EHR.

But that's the point - this is not part of the archetypes, it is part of the RM. Any archetype can be assigned a non-subject party, at run-time or in a template.

Now this is all fine in a smallish, controlled semantic space where there is a single app, and set of devs who can be relatively easily kept informed. And although this kind of usage is anticipated, it will still only occur in a tiny minority of templates/run-time situations, other than in specialist sectors.

AQL tools cannot help us create safe queries since the risk is going to arise when I am querying cross-template e.g. at Entry level.

The only safe approach is to explicitly query for PARTY_SELF, unless you are absolutely sure that no-one, anywhere. This is easy enough in a constrained app environment but definitely prone to a critical error in a full-blown national EHR.

I am beginning to think that we may need to change the RM to ensure that this context-switch does not cause a serious downstream issue.

---

## Post #74 by @Seref

[quote="ian.mcnicoll, post:73, topic:137"]
But that’s the point - this is not part of the archetypes, it is part of the RM
[/quote]

Well, everything is part of the RM. It is at the archetype level we handle the specific case. How "handling" may work may be subject to discussion but that's where it's meant to be done. Whatever you want to express by changing the RM, we must find a way of doing it by constraining the RM.
I'm really not trying to be a zealot or annoy people here but the semantics we're trying to express are just too high level for the RM. Can you tell me what kind of change you'd propose to the RM and what's the problem with my suggestion above? I'm sure we're all tired of the discussion at this point and honestly I'd just let it go if it did not feel so against the design of this whole thing we've spent our careers on.

[quote="ian.mcnicoll, post:73, topic:137"]
AQL tools cannot help us create safe queries since the risk is going to arise when I am querying cross-template e,g. at Entry level
[/quote]

If you add the warning/check to the archetype that contains that entry, tools can warn you no matter what templates you use, unless you issue a query that refers to an ENTRY without referring to its parent composition (which is a bad idea anyway).

---

## Post #75 by @ian.mcnicoll

"It is at the archetype level we handle the specific case."

But the whole point is that non-self subject can be applied to ANY archetype, at template or run-time.

"If you add the warning/check to the archetype that contains that entry, "

but that's not the problem. The archetypes very rarely constrain subject. The issue is with (the majority situation) where an archetype can, completely legitimately, be assigned at run-time either to self or to non-self. Lab test (as in donor examples) is a good use-case, but almost any archetype could potentially be used that way - and the problem is that I cannot know if/how it has been used in any given CDR. This is not a property of an archetype.

So therefore the only safe approach is to always explicitly search for subject=PARTY_SELF except on those rare occasions where I might want to do otherwise. I should never ignore subject in *ANY* AQL because I cannot ever know if/how someone might use any given archetype in non-self mode in the future, at which point I might return some dangerously wrong information, without me doing anything new.
Same applies to AQL fragments in GDL, or Task planning. I am still open to any/all suggestions about mitigation but I'm afraid this does present a very significant future risk and IMO needs to be 'designed out' so that a query that omits .subject should never be capable of returning Entries that apply to non-self . i.e. this SELECT l FROM EHR e CONTAINS OBSERVATION l[openEHR-EHR-OBSERVATION.laboratory_test.v1] must **never** be capable of returning data where l.subject <> PARTY_SELF. If that means re-engineering how we handle non-self subject, then that's what we need to do. RM change - there is not a simple structural solution, I agree, other than to drop the idea of non-self subject altogether. However, we do have quite a lot of functional specifications already - we are just adjusting the guidance on disjoint merges because of a safety issue. This is a similar behavioural issue that I think should be enshrined in the specs, just in the same way that we say that a query over a persistent composition should only return the latest version. Guidance is not enough, tooling is not enough, education is not enough. Sorry to bang on but this is important. --- ## Post #76 by @matijap I must say I am against session-lived "flags" that Thomas proposed, like *SET SUBJECT_FILTER ON*. And while I cannot object his arguments about abstract RM and AQL, I find Ian's points stronger. Either we must get rid of the subject attribute, which will make openEHR inadequate for certain medical settings, or we must implicitly ignore data not directly related to the EHR's owner unless explicitly instructed otherwise. This fact is not necessarily a part of the AQL itself, but rather a property of the query engine within CDR. 
The only other option (that is hard to implement consistently and is backwards-incompatible to a greater extent than the implicit filtering of data, and still requires special handling of the subject attribute within query engines) is to make the subject filter mandatory for all cases where the engine cannot prove that the data cannot contain entries for other subjects. That is, the only exception would be when querying directly from entries using an archetype predicate where that archetype is known to constrain the subject to PARTY_SELF and does not contain any other entries itself (or those are again constrained in that way). But that would put pressure on archetype modellers to have lengthy discussions for each archetype about whether it makes, or will make in the future, sense for it to be used in subject!=PARTY_SELF scenarios, because constraining it would make queries simpler. I don't think this is the most sensible approach here.

---

## Post #77 by @bna

This is important as @ian.mcnicoll pointed out.

My thinking is still that the vast majority (> 95%) of entries will belong to PARTY_SELF. Which makes it a good choice to let the AQL engine set this filter if it's not explicitly set by the client. With this approach all existing clients will work as before even when entries not belonging to PARTY_SELF are added. The existing clients never wanted data not belonging to the subject of care.

This is the same argument as:

[quote="matijap, post:76, topic:137"]
we must implicitly ignore data not directly related to the EHR’s owner unless explicitly instructed otherwise. This fact is not necessarily a part of the AQL itself, but rather a property of the query engine within CDR.
[/quote]

It's impossible to change all client software within a limited period of time. They might be delivered by third party vendors. This is why backward compatibility is important and necessary. I really don't think existing clients of an AQL client expected data from another subject than PARTY_SELF.
Let's keep it that way.

When implementing an openEHR system you must have some reference to the RM. The data is defined and serialised according to the RM. The RM has an attribute for subject. This attribute solves the use-case at hand, the donor case. Yet again we see the RM as a rescue when real life use-cases appear. Great work!

I don't know why this use-case forces a change in the RM. IMHO we only need to change our minds about what kind of data might be stored in the CDR, and provide some reasonable defaults to make life as a client software developer easy, quick and open for extension. Let the AQL engine query only entries with subject party self if not explicitly told not to.

---

## Post #78 by @matijap

Of course I would argue that such an implicit filter has to be well documented, probably in more than one place. We must acknowledge it is a deviation from genericity, but one that will confuse fewer people than the current state does.

---

## Post #79 by @varntzen

[quote="Seref, post:74, topic:137"]
"tools can warn you no matter what templates you use, unless you issue a query that refers to an ENTRY without referring to its parent composition (which is a bad idea anyway)"
[/quote]

What happens when we extract data from the EHR to a data warehouse? Won't that be on ENTRY-level, not referring to the composition?

---

## Post #80 by @Seref

Good question. Technically another discussion though, but here we go anyway:

It heavily depends on your warehouse design as well as your use of the data warehouse. We're currently running multiple large scale data warehouses, and BI and reporting based on them is never done without consideration of the models that set the rules for the creation of the data. We, at Ocean, subscribe to archetypes being the ubiquitous language across all use cases and try to leverage them in every context we can. You can see how that view is shaping my responses in this thread.
Reporting on openEHR data is a whole domain in itself and it took us years to get things working to the satisfaction of most stakeholders (noticed how I avoided saying get things right? ;) ) --- ## Post #81 by @Seref Thank you, this is a very good explanation of the clinical requirements. Your point about anybody being able to update the data with different subjects is the tricky bit here. I can see how the RM design leaves you frustrated in this case. You would not want to explicitly archetype subject to PARTY_SELF because it is redundant in say 95+% of the cases. I'd say the inconvenience is that the RM is giving equal opportunity to very likely and a lot less likely distributions of data by design. I'd say a long term solution would be to make it harder to chuck someone else's data next to EHR's subject's data. I'm fine with the rare case being explicit and relatively inconvenient compared to common case. I could keep thinking/discussion about remedies at higher levels to manage this situation without a change to RM but maybe we should have a discussion about a change to RM too, which may keep things cleaner and still avoid modifying AQL behaviour. @ian.mcnicoll can you please start another thread regarding what a potential change to RM may look like to deal with this subject != PARTY_SELF situation? --- ## Post #82 by @heather.leslie Totally endorse Ian's position here --- ## Post #83 by @thomas.beale [quote="ian.mcnicoll, post:75, topic:137"] I am still open to any/all suggestions about mitigation but I’m afraid this does present a very significant future risk and IMO needs to be ‘designed out’ so that a query that omits .subject should never be capable of returning Entries that apply to non-self . i.e. this SELECT l FROM EHR e CONTAINS OBSERVATION l[openEHR-EHR-OBSERVATION.laboratory_test.v1] must **never** be capable of returning data where l.subject <> PARTY_SELF. 
[/quote] From a business / clinical semantics point of view, I agree with the importance and patient safety etc. No argument there. However, the idea that the raw AQL query would not return all matching data if some of that data were Entries for other subjects is *objectively wrong in any DB + querying system*. Now, if we consider that instead the above query is something like what the query designer sees, within a certain tool in which certain options are set to fix things like which subjects data are being retrieved for, then we are fine. We need to study this a bit more, but solutions that look very reasonable to me are: * the automatic inclusion by tools of 'subject=self' in the relevant predicates due to a subject=self option being selected in the tool (this is a nice lightweight approach I think @birger.haarbrandt mentioned somewhere in the previous 765 posts ; ); * the automatic inclusion of a second stage pipeline filter that removes results for any subject not covered by the selected option in the tool. These approaches don't under-estimate the clinical importance of getting the expected result, but also don't violate truly basic rules of DB and query language design. --- ## Post #84 by @ian.mcnicoll > However, the idea that the raw AQL query would not return all matching data if some of that data were Entries for other subjects is *objectively wrong in any DB + querying system* . and whilst I think I understand that perspective, I would take the view that openEHR is not just a 'DB + querying system'. It encompasses much more, in terms of functional and behavioural specifications. I accept that is definitely not a pure AQL issue, any domain-driven query mechanism would encounter the same issue. 
We are not querying against DB, we are querying against a domain model which is carefully crafted to meet the requirements for a health IT, and if that domain model (for good reason) introduces a potential significant context switch, the domain model/spec should mitigate that directly IMO. Whilst the query language measure out lined would mitigate the risk, I don't think this is strong enough for a safety-critical environment. One way or another I think this behaviour must be default out-of-the-box for any CDR implemented against the Spec. We cannot afford to leave this to tooling, and it has to apply to any current or future domain querying mechanism. --- ## Post #85 by @varntzen What Ian said. :smiley: --- ## Post #86 by @Seref [quote="ian.mcnicoll, post:84, topic:137"] openEHR is not just a ‘DB + querying system [/quote] Careful. AQL **is** just a query language addressing a logical data model. --- ## Post #87 by @wouterzanen So really enjoying this discussion and learning a lot as well. Now I understand the problem, I have to say I agree with both positions. It would be clearly wrong for a query not to give the expected results, in both cases. If I do an AQL query I would expect the where clause to filter data not increase data. It gives a very messy query experience in the 5% of cases that would need this. The only solutions I see now: - add a parameter to the template so that on design time, the possible parties for this template can be set. This could default to party_self. Update AQL to require Subject either in the select statement or in the where clause so there is no ambiguity - Another option, but messier, would be to not allow query results for more than one Subject in a single query unless Subject is part of the Select statement. --- ## Post #88 by @ian.mcnicoll I was being careful - I said 'openEHR' not 'AQL'. ;) This is most definitely not an AQL problem. 
It will affect any querying system applied on top of the openEHR RM that allows the state of ENTRY.subject to be ignored, and that basically means any properly featured system - the power being in the ability to query the full data tree.

So, I get (finally!) the argument that at some level there needs to be a 'pure' querying mechanism. However the clinical safety officer in me is fretting that exposing this routinely to an increasingly inexpert community of devs and clinical people, will lead to some potentially dangerous problems. If I was CCIO of an openEHR CDR product, I would be insisting that some kind of safety control was applied in this area by default.

It feels that there is broad consensus that at some point in the query-building/running process, we should routinely apply subject = PARTY.SELF in AQL or equivalent querying language.

The discussion now is really about where that should be applied:

1. In training / educational materials
2. In tooling
3. "In the CDR" i.e. in software
4. In the RM

I do not feel that education or tooling are enough here. As I said above, whether part of the openEHR spec or not, as a CCIO I would not sell or procure a CDR unless it mitigated this risk in the software i.e. it injects/applies the subject clause, unless it is already specified.

This could be a vendor-specific feature but I would far prefer for it to be part of the spec, perhaps as a 'query profile' that I could ask to have applied and test against.

Does that bring us together?

---

## Post #89 by @thomas.beale

Well the only place this can happen is above the DB, and above the technical querying language. Note: this is the same answer regardless of DB, query language, or product. This debate is identical for a Cerner system or anything else.

So the solution can only be concretised in a layer above the raw QL.
Some 'Clinical Query language' maybe, or modal querying, which would require something like existing queries to be put into 'Querying sets' which have some options on them.

Anyway, technically the solution is not difficult. The interesting question is determining what constitutes an 'openEHR CDR' - and what to do with existing queries. I would argue for upgrading them to the correct form.

---

## Post #90 by @Seref

There go my Christmas drinks from you

[quote="ian.mcnicoll, post:88, topic:137"]
Does that bring us together?
[/quote]

Not really, because despite doing a great job in explaining the problem and the requirement, you're then insisting on a particular implementation method that you see as the only solution to this problem:

[quote="ian.mcnicoll, post:88, topic:137"]
As a CCIO I would not sell or procure a CDR unless it mitigated this risk in the software **i.e. it injects/applies the subject clause**, unless it is already specified
[/quote]

You seem to have a very particular, technical opinion about how this risk should be mitigated (based on 3 in your list) and that's where we really disagree :) It would be great if we could explore a combination of 2 and 4, because you're now pushing a technical argument, not a clinical or even a user experience one.

I sincerely apologise if I'm getting this wrong but I feel like we need to exhaust other options even if they deliver less than perfect results, which is something we'll have to live with sometimes.

---

## Post #91 by @birger.haarbrandt

> I sincerely apologise if I’m getting this wrong but I feel like we need to exhaust other options even if they deliver less than perfect results, which is something we’ll have to live with sometimes.

Not putting purity of design over patient safety is a hill I certainly would die on.

Maybe I misunderstand the argument: on which level does the AQL filtering happen in DIPS as described by @bna?

---

## Post #92 by @Seref

@birger.haarbrandt no need to die on any hills.
There are already too few openEHR implementers. My position is certainly not that of purity (for the sake of purity). I am concerned that following some options we identified here will lead to more safety issues down the road than the ones we fix.

---

## Post #93 by @varntzen

Dear all

This seems to be a discussion of a challenge that isn't possible to solve in Discourse. Personally I do not see all the implications of the technical suggestions, and I think others do not see the implications for modellers of some of the suggestions. But I'm quite sure all understand that there is a real safety issue, unless we come up with a way of handling it. But it isn't a purely technical issue, nor purely a modelling or a specification issue. It's between all of them.

Should this be further discussed in a workshop? Online or physical? Is anyone able to present a summary of the suggested solutions, with pros and cons?

---

## Post #94 by @ian.mcnicoll

@varntzen - actually I think we are creeping to a conclusion so I am keen to push on here for a bit.

@Seref - you were off my Xmas list 3 posts ago, mate ;)

> you’re then insisting on a particular implementation method that you see as the only solution to this problem:

I was trying not to 'solutionise' so let me try again to explain where I sit.

1. This is an issue that arises because of a feature in the RM. None of us want to lose that feature but I think we all recognise that it can present a risk because of the very flexible way that openEHR data can, and indeed should, be queried (by AQL or something else).
2. It is easy to 'fix' the issue by creating queries that explicitly handle subject in every use-case. That could be done by education/training, better still by tooling but I still feel that the risk presented, although rare, is sufficiently impactful that I as a CDR product consumer, would want that CDR to handle subject explicitly.
I don't actually care how/where that happens but it has to have the same effect as if tooling added a subject clause by default. My test for this in an AQL-enabled system would be to check that any AQL which lacked a subject clause returned a resultset that omitted any non-self subjects. Whether that is by pre or post-processing, I don't care - it is the safe behaviour that matters. And for a different query engine, a different tech solution would be required but the end behaviour should be the same - a query over openEHR that returns ENTRY objects should never return non-subject entries unless those have been explicitly requested.

3. It might be argued that this could be down to specific CDR implementation but I feel it is such a universal issue that we need to say or do something about it as part of the specifications/conformance, even if this is an optional profile. As a purchaser, I would require that profile to be available, others may take a different view.

@thomas.beale asked the good question about 'what is a CDR' in this context. For me that is a datastore that conforms behaviourally to the openEHR specification (with optional levels of conformance). I'd want 'anti-skid braking' to be available and turned on.

@Seref - are we at least agreed on that as an optional requirement?

---

## Post #95 by @thomas.beale

[quote="birger.haarbrandt, post:91, topic:137"]
Not putting purity of design over patient safety is a hill I certainly would die on.
[/quote]

There's purity and there's purity. If any DB technology (including query language) such as Postgres, Oracle, MySQL, CouchDB, StrawberryHill, LucyInTheSkyWithDiamonds or anything else doesn't completely and exactly implement the formal logic of its DB technology (relational, graph or whatever), it won't generate reliable results, and can never be trusted. That is not a viable position for any DB technology.

AQL is a DB technology, albeit a rather small one, but it is like any other 'new' QL, e.g.
SPARQL, GraphQL etc - it has to actually work in expected ways. It can't react in special ways to one particular RM (i.e. DB information schema) or just not bring back certain results because they don't conform with the 'most obvious assumption of the most typical business users'. That's suicide for any technology. So, at this level, things are very pure, and if they are not, it's just a bug that needs to be fixed.

So these semantics have to be addressed above the AQL level, there is no way around that. Now, it can be done in various ways, many not onerous. It seems to me that the main worry is simply that most authors of AQL queries don't think about subject or have an easy way of filtering on it. So we need to change that situation.

---

## Post #96 by @ian.mcnicoll

> So these semantics have to be addressed above the AQL level, there is no way around that.

It took me a while but agreed :face_with_monocle:

However I will still argue that we need to build that filter into CDR software, not rely on education or tooling, and that guidance on the requirement (and conformance testing) should be part of the openEHR spec (optional to implement).

---

## Post #97 by @bna

This is not a topic on AQL. It's a topic about the service interface of an openEHR RM CDR. We are talking about the conformance rules for such an API. I think @ian.mcnicoll summarised it well above.

Given another context or use case the discussion is different.

An analogy: We are talking AQL into a view. This view has a preset default filter which is to only include entries with PARTY_SELF. The same would be true if the "view" had access control and authorisation. Then you would only get the entries you were allowed to see.

As you see from the analogy and the example, this is not a discussion about AQL but rather a discussion about the requirements for an openEHR CDR.
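
The conformance-style test described in post #94 could be sketched as a small harness that is independent of how a CDR achieves the behaviour. The sketch rests on assumptions: `non_self_rows`, `raw_execute` and `view_execute` are hypothetical names, the in-memory `STORE` stands in for real test data, and each result row is assumed to expose the subject type under a `subject_type` key, which no real result set format guarantees.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Execute = Callable[[str], Iterable[Row]]


def non_self_rows(execute: Execute, aql: str) -> List[Row]:
    """Return rows a query WITHOUT a subject clause should not have produced."""
    if "subject" in aql.lower():
        raise ValueError("this check only applies to queries without a subject clause")
    return [row for row in execute(aql) if row.get("subject_type") != "PARTY_SELF"]


# Hypothetical test data: one result for the patient, one for a donor.
STORE: List[Row] = [
    {"value": 4.2, "subject_type": "PARTY_SELF"},
    {"value": 9.9, "subject_type": "PARTY_IDENTIFIED"},
]


def raw_execute(aql: str) -> List[Row]:
    """Stand-in for a 'pure' query layer that returns every matching row."""
    return list(STORE)


def view_execute(aql: str) -> List[Row]:
    """Stand-in for the service-level 'view' with a preset PARTY_SELF filter."""
    return [row for row in raw_execute(aql) if row["subject_type"] == "PARTY_SELF"]


query = "SELECT l FROM EHR e CONTAINS OBSERVATION l[openEHR-EHR-OBSERVATION.laboratory_test.v1]"
raw_violations = non_self_rows(raw_execute, query)
view_violations = non_self_rows(view_execute, query)
```

The harness only observes results, so it is indifferent to whether the filtering was done by pre-processing, post-processing or a view, which matches the "I don't care how/where" framing of the requirement.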

---

## Post #99 by @wouterzanen

[quote="ian.mcnicoll, post:88, topic:137"]
However the clinical safety officer in me is fretting that exposing this routinely to an increasingly inexpert community of devs and clinical people, will lead to some potentially dangerous problems
[/quote]

Probably in other areas as well. Isn't it time to define a formal certification for openEHR CDRs?

---

## Post #100 by @bna

[quote="birger.haarbrandt, post:91, topic:137"]
Maybe I misunderstand the argument: on which level does the AQL filtering happen in DIPS as described by @bna?
[/quote]

The filter is applied at the service level.

I think I am discussing this topic from the CDR API view, i.e. the openEHR REST API. At this level data will be filtered according to the rules defined. Access control and authorization might also influence the behaviour of AQL.

I assume @thomas.beale and @Seref are discussing from the viewpoint of a super-administrator (sudo). Then there will be no filter and you are operating at your own risk. Right?

Based on this we have several levels of conformance requirements.

---

## Post #101 by @ian.mcnicoll

> We are talking AQL into a view. This view has a preset default filter which is to only include entries with PARTY_SELF.

Does that include queries retrieving composition objects - do you filter out any non-subject ENTRIES?

What about whole GET /composition retrievals?

---

## Post #102 by @bna

[quote="ian.mcnicoll, post:101, topic:137"]
What about whole GET /composition retrievals?
[/quote]

Good question. Then you have to get the whole thing. We can't exclude stuff from the Composition.

---

## Post #103 by @Seref

Well, if the drinks are gone, I may as well keep pushing, being the purist I am...

[quote="ian.mcnicoll, post:94, topic:137"]
I don’t actually care how/where that happens but it has to have the same effect as if tooling added a subject clause by default.
[/quote] [quote="ian.mcnicoll, post:94, topic:137"] My test for this in an AQL-enabled system would be to check that any AQL which lacked a subject clause returned a resultset that omitted any non-self subjects [/quote] These two suggestions both lead to a query other than written by the query author to be run. From an openEHR specifications perspective, this should not make it to AQL specifications. It is the implicit nature of this that is the problem so the way forward is potentially * making this interception/filtering explicit * doing it somewhere above the AQL layer and specifying it outside of AQL If we add a dedicated REST endpoint that is guaranteed to apply some filter to otherwise unfiltered aql results like @bna suggests , it would be shifting the problem elsewhere, because now we have to make sure that calls should be made to that endpoint in some cases and if you call the unfiltered endpoint then you're back to same case in which you forgot to force subject, only X layers above and lots of engineering time && money later. I guess the primary difference between our approaches is I'm in favour of "force the user to write the correct query and don't let them run it until it's safe" and you're in favour of "fix it for the user if they fail" (explicit vs implicit). Both @thomas.beale and I are trying to explain the latter is problematic. Same checks to ensure safety can be performed both at the client side and the server side btw. Here is a sampling of my failed attempts to suggest server side checks as part of CDR: [quote="Seref, post:72, topic:137"] We may have different terminologies, I refer to engine as the AQL runtime, i.e. the chunk of code that does the actual fetching of data. 
All of the above can be performed by AQL authoring tool **and/or by the CDR that does the parsing of AQL** [/quote] [quote="Seref, post:27, topic:137"] It is conceptually a solution at the tooling/**CDR level (if it does the check during parsing),** which is what I’ve been suggesting. [/quote] [quote="Seref, post:6, topic:137"] I’m trying to think about some feature **at the CDR level, above AQL** [/quote] [quote="Seref, post:49, topic:137"] then **we force the CDRs and modelling tools to remind the users** to write the WHERE clause or path predicate (as Birger suggested). [/quote] --- ## Post #104 by @thomas.beale [quote="wouterzanen, post:99, topic:137"] Probably in other areas as well. Isn’t it time to define a formal certification for openEHR CDR’s? [/quote] Well we do have a [partly built conformance spec](https://specifications.openehr.org/releases/CNF/latest/index). We don't however have a formal definition of where the subject 'option' would go, such that we could determine system conformance to it. --- ## Post #105 by @birger.haarbrandt For EHRbase, we are using the openEHR conformance spec to define integration tests which we use in our continuous integration pipeline. This could serve as conformance framework to certify conformance. We will hopefully find the time to put the stuff in a separate project (it's already within our github repo). --- ## Post #106 by @thomas.beale That spec is rough - if you develop new tests, changed tests, or find tests that don't make sense, please make them known! Slack #conformance is probably good... --- **Canonical:** https://discourse.openehr.org/t/safety-features-in-aql-subject/137 **Original content:** https://discourse.openehr.org/t/safety-features-in-aql-subject/137