# Aql teaser 1 for implementers **Category:** [AQL](https://discourse.openehr.org/c/aql/43) **Created:** 2020-02-27 09:54 UTC **Views:** 1034 **Replies:** 32 **URL:** https://discourse.openehr.org/t/aql-teaser-1-for-implementers/373 --- ## Post #1 by @Seref Here is a little something for aql implementers that I know of (where Ian counts for EhrBase) @bna @matijap @ian.mcnicoll Given the composition on the left, and the AQL query after that, what should the result set for the select clause on the right be? Take this as a tease between me and you in a light hearted way please ;) think on it whenever you have some free time. ![image|690x359](upload://kpvriham1kuZOPjnlvhLw3CP3BK.png) SELECT c,s,i,o,e FROM COMPOSITION C c CONTAINS SECTION s CONTAINS (INSTRUCTION i AND (OBSERVATION o OR EVALUATION e)) (i) on the left most tree is composition|section|observation... instances, i.e. a single composition instance --- ## Post #2 by @matijap Could you write the query in text, please? :slight_smile: --- ## Post #3 by @Seref sure, I updated the post :) --- ## Post #4 by @thomas.beale So if we assume that the second CONTAINS is distributed over the Boolean operators in the usual fashion, the query will indeed return everything, since the logical condition is satisfied. However, the CONTAINS semantics over logical operators is not defined in the current spec (other than 'by example')... --- ## Post #5 by @bna This is a tricky one. Need to look into it in detail. A quick shooting from the hip (is it a saying in English?) :slightly_smiling_face: You will get two rows. One for each of the combination of the observation and evaluation. I will come back on it tomorrow. --- ## Post #6 by @Seref Take your time. Also consider: could it be 3 rows? Why? How? ;) --- ## Post #7 by @ian.mcnicoll Naively - because I do not 'get' resultset rows at all, if there were a single candidate composition in the EHR, I would be saying 5 rows for each object in the Select. One for the composition, one for the section, one for each entry. but this is getting close to maths :nauseated_face: --- ## Post #8 by @thomas.beale Ah I see, the Q is about the return row structure. In that case, the SELECT part is arguably invalid, because it cannot be satisfied by some possible FROM structures, which can be C/S/i+o and C/S/i+e. It can only be satisfied by a FROM part as follows: ```` FROM COMPOSITION C c CONTAINS SECTION s CONTAINS INSTRUCTION i AND OBSERVATION o AND EVALUATION e ```` Now, if there happen to be any C/S/i+o+e instances in the data, the SELECT part can be seen as valid, and will return a row with {C, S, i, o, e} in it for each distinct instance in the EHR. Otherwise it should return nothing. So the main questions here is: * how to validate the SELECT part if it implies a different FROM part (equivalently, the FROM part does not guarantee a base data set that can always satisfy the SELECT projection)? Right now, AQL does not specify these validity criteria, but it should. I would argue that the original query is invalid on the grounds that the FROM part specifies possible structures that make the SELECT projection impossible to obtain. If we took the opposite stance, we would have to say that a SELECT projection only has to be *possible* w.r.t. the FROM data, not guaranteed. I think this doesn't make sense, because it ignores the fact that the FROM part is just wrongly specified. Additional remark: the C/S/i/e/o SELECT part doesn't really make sense anyway, since it is returning the outer containers (the C and S) and the things they contain. If we really want to allow that, then it also has to be stated whether you are getting separate clones of the S, i, e and o, or else if the contained items are just refs into the outer containing C structure. --- ## Post #9 by @Seref Thanks. I'll wait for @bna and @matijap to respond before I do ;) --- ## Post #10 by @bna I have discussed with my developers, and we stand by my previous statement with two rows. The reason is that you have two possible leaf branches with the OR between observation and evaluation. We agree that the problem presented is problematic. There is no existing shared rules on how to interpret such functionality. The presented problem is similar to the use-cases we presented a few years ago on the permutation problem. See [Here](https://github.com/bjornna/openehr-conformance/blob/master/aql/case1.1-permutation_bp/index.adoc) --- ## Post #11 by @pablo I guess of we have "SELECT a, b" a validity rule could be "b should not be included on a" and vice versa, since that will clearly maps a tree structure into the plain row/column structure, like: a1 | b1 a1 | b2 (if the a1 instance contains b1 and b2 instances) That might work if returning structures, but the same happens if we return any attributes of "a" and "b" and "a contains b". So if we have: "SELECT a/path, b/path", "a shouldn't contain b an vice versa" (I guess is the same rule at the object level, but now considering paths). --- ## Post #12 by @matijap Okay, finally had the time to try this out on our server and it returns one row. I see no reason for there to be two rows. I see how it could be 3 (i.e. returning all possible combinations: with just instruction, with just evaluation, and with both), but that would make no practical sense. I haven't studied the AQL spec, but logically there is an OR operator in containment, not XOR, so why exclude one or the other in each row? --- ## Post #13 by @Seref @matijap many thanks for taking the time to respond. I appreciate, same goes to @bna To clarify, I am certainly not advocating any particular behaviour, but I want to find out how you see some particular semantics and have a discussion between vendors. I chose these teasers based on some questions I'd like to bounce of you all. Now, given your perfectly sensible response and question, may I kindly ask you to take a look at teaser 4, which I just posted? The interpretation/behaviour which doesn't make sense (initially) here, becomes important in teaser 4, or else, two implementations can return the single instruction instance under column i_1 or i_2. I'll stop talking about that teaser here now, but I'd love to get your feedback on that one, keeping your reasoning on this one in mind. I'll then extend the discussion to others (maybe I should just put all together in a single post?) Thanks again. --- ## Post #14 by @matijap Before even looking at Teaser 4 (and I'll probably progress through 2 and 3 first, although I think I have an idea of what #4 might be like), I'd like to add that actually we interpret disjunction just like SQL RDBMs interpret outer joins, so in case of multiple occurrences we return a cartesian product (by default; we have more proprietary tricks up our sleeves). And if multiple variables can match the same object (i.e. two INSTRUCTION variables with identical archetype predicate), we'll return them in all possible orderings, unfortunately. (And if that's the question in teaser #4, I'll congratulate myself for the guess with a beer in the evening.) Edit: as explained in teaser #2, there is an added condition in this join that the two variables cannot represent exactly the same object, so if you require containment of two instructions with the same archetype, a composition that has only one will not be selected, and for a composition with 2 matching instructions (call them i1 and i2) you will get 2 rows (for permutations (i1,i2) and (i2,i1)), not 4 (you will not get (i1,i1) nor (i2,i2)). --- ## Post #15 by @Seref The beer certainly is on me @matijap and a well deserved one. I was curious about the behaviour of your implementation in different cases, since that's where differences in resultsets will come from among vendors. My understanding is your interpretation of disjunction changes based on the number of matches. Would you say that if a pre-defined simplification is applied to the permutative interpretation, where possible, that would describe the behaviour of your implementation? --- ## Post #16 by @matijap [quote="Seref, post:15, topic:373"] Would you say that if a pre-defined simplification is applied to the permutative interpretation, where possible, that would describe the behaviour of your implementation? [/quote] Sorry, I do not understand the question (and in particular what you mean by pre-defined simplification). --- ## Post #17 by @matijap [quote="Seref, post:15, topic:373"] The beer certainly is on me @matijap and a well deserved one. [/quote] Well, not exactly, I thought there would be a conjunction in #4, but there is disjunction, and it makes a lot of difference. :) If there was conjunction, and if we accept the fact that two variables cannot represent the same object, there are clearly no rows. But for the disjunction case I'll answer in teaser #4 thread. --- ## Post #18 by @Seref [quote="matijap, post:17, topic:373"] and it makes a lot of difference [/quote] Actually, based on your answer, it doesn't :) Well, at least one aspect of your answer I was referring to. You're clearly thinking in terms of permutative behaviuor, i.e. all possible rows the query can generate given the composition, then you're removing some of those, which is what I referred to as simplification. Your verbal definition may be removing duplicate rows as much as possible, but the end result is the same. You have a starting, maximal set or rows, then you remove as much as you can, for the purposes of convenience. My point and the reason I gave an example based on disjunction, is the consistency argument. If you apply the logic above to conjunction, it would make sense to apply to disjunction, wouldn't it? And your answer actually does that, both here and there. So you still deserve the beer :) --- ## Post #19 by @matijap [quote="pablo, post:11, topic:373"] I guess of we have “SELECT a, b” a validity rule could be “b should not be included on a” and vice versa, since that will clearly maps a tree structure into the plain row/column structure [/quote] I disagree we want to have that rule. The user may select whatever they want and they have to consider the consequences. And there definitely is use for "select c/uid/value, instr from composition c contains instruction instr[...]". Our implementation does support packing into arrays instead of producing cartesian products, but I do not think that should be default, at least not in this version of AQL when we have so many successfully working production queries out there in the wild. --- ## Post #20 by @matijap [quote="Seref, post:18, topic:373"] all possible rows the query can generate given the composition, then you’re removing some of those [/quote] While this can have the same end result, it is not the terms I like to think in. I prefer to say we are binding objects to variables, recursively over all variables, but only objects that have not been bound to other variables are considered for the next variable. If there are no unbound candidate objects left and the variable is part of a disjunction (OR), we can leave that variable empty. Now, this means that in the teaser #4 we would actually only have one row (i_1 would never be empty because we could bind it and then there would not be any objects left for i_2). Well... :) --- ## Post #21 by @Seref [quote="matijap, post:19, topic:373"] I disagree we want to have that rule [/quote] Ocean agrees. I won't go into details, because I really want to keep these posts to discussions specific. --- ## Post #22 by @Seref [quote="matijap, post:20, topic:373"] Now, this means that in the teaser #4 we would actually only have one row [/quote] And this would be a case where your way of formalising the execution of a query breaks. I've been in this situation many times and this is actually where I'm coming from. I'm not claiming that you're doing anything wrong, you have a way of thinking, which allows you to explain the result set for this teaser, and the output contains all the bindings we'd consider as the minimum set to return. Now when you apply the same thinking to teaser 4, it does not deliver the output that your implementation is already delivering, so your way of thinking is not suitable for generalisation to disjunction and conjunction, regardless of the number of data items in the input. It is OK to think in terms of different 'cases' as long as we're sure that our set of cases is exhaustive, but the way of thinking I suggested could deliver the minimum required results in both cases. Except, we'd then have to define case specific 'trimmings', to make things easier for consumer of result set, so there is still a trade off. The important point is, if we identify and define cases in an exhaustive manner, we can guarantee all implementations would return the same results in all uses of logical operators with CONTAINS. These teasers are serving that purpose :) Thanks a lot for your time and responses, this is very valuable. --- ## Post #23 by @ian.mcnicoll I could probably argue a case that as a potential query creator, that if I ask for 3 objects in the SELECT, I should get 3 objects back ... BUT!! .. I would be very happy if the implementers got t together and said, no this is dumb, and you can't have it. At least I know the rules and can adapt my ideas accordingly. --- ## Post #24 by @Seref [quote="ian.mcnicoll, post:23, topic:373"] At least I know the rules and can adapt my ideas accordingly. [/quote] You hit the hammer on the head. As I said before, I'm very happy to make things easier for you, as long as we do it in a safe way, in the same way across all vendors. Sometimes it won't be possible (to make it easy for you) but that's a balance every query lang has to set. --- ## Post #25 by @matijap [quote="ian.mcnicoll, post:23, topic:373"] I ask for 3 objects in the SELECT, I should get 3 objects back [/quote] As an application developer, I would sometimes like to hit more than one fly with one slap (and would not like to select the entire composition and extract things from it when the EHR server can already do it for me). I will point out that giving application developer so much freedom can sometimes lead to errors though; the following example from the fluid balance module of one of our end-user applications only works because the cartesian product problem is irrelevant if all templates limit cardinality of all those observations to at most 1: SELECT c/context/start_time, c/context/other_context/items, ofo, ofov, ofof, ofou, cbss, c/uid FROM EHR[ehr_id/value='$ehrId'] CONTAINS Composition c[openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS ( ( Observation ofo[openEHR-EHR-OBSERVATION.fluid_output.v1] CONTAINS ( Cluster cbs[openEHR-EHR-CLUSTER.bodily_substance.v1] OR Cluster bsa[openEHR-EHR-CLUSTER.bodily_substance-ascites.v1] OR Cluster bss[openEHR-EHR-CLUSTER.bodily_substance-saliva.v1] OR Cluster bsbc[openEHR-EHR-CLUSTER.bodily_substance-bile_content.v1] OR Cluster bssemen[openEHR-EHR-CLUSTER.bodily_substance-semen.v1] OR Cluster bsother[openEHR-EHR-CLUSTER.bodily_substance-mnd.v1] OR Cluster bscsf[openEHR-EHR-CLUSTER.bodily_substance-csf.v1] OR Cluster bslv[openEHR-EHR-CLUSTER.bodily_substance-lavages_vagina.v1] OR Cluster bsblood[openEHR-EHR-CLUSTER.bodily_substance-blood.v1] OR Cluster cbssweat[openEHR-EHR-CLUSTER.bodily_substance-sweat.v1] OR Cluster bse[openEHR-EHR-CLUSTER.bodily_substance-exudate.v1] ) ) OR ( Observation ofov[openEHR-EHR-OBSERVATION.fluid_output-vomiting.v1] CONTAINS Cluster cbsv[openEHR-EHR-CLUSTER.bodily_substance-vomit.v1] ) OR ( Observation ofof[openEHR-EHR-OBSERVATION.fluid_output-faeces.v1] CONTAINS Cluster cbsf[openEHR-EHR-CLUSTER.bodily_substance-faeces.v1] ) OR ( Observation ofou[openEHR-EHR-OBSERVATION.fluid_output-urine.v1] CONTAINS Cluster cbsu[openEHR-EHR-CLUSTER.bodily_substance-urine.v1] ) OR ( Observation ogz[openEHR-EHR-OBSERVATION.global_zn.v1] CONTAINS Cluster cbss[openEHR-EHR-CLUSTER.bodily_substance-sweat.v1] ) ) --- ## Post #26 by @bna [quote="Seref, post:1, topic:373"] SELECT c,s,i,o,e FROM COMPOSITION C c CONTAINS SECTION s CONTAINS (INSTRUCTION i AND (OBSERVATION o OR EVALUATION e)) [/quote] ![image|188x105](upload://wLuaZ02fkoldqZ5cUyh1NebsuMS.png) Based on the following AQL: SELECT c/name/value as C1, s/name/value as S1, i/name/value as I1, o/name/value as O1, e/name/value as E1 FROM COMPOSITION c CONTAINS SECTION s CONTAINS (INSTRUCTION i AND (OBSERVATION o OR EVALUATION e)) The example data is given here: https://github.com/bjornna/openehr-conformance/tree/master/aql/case5-seref in the composition c0.xml|json --- ## Post #27 by @bna And if we change to an OR we get: ![image|164x128](upload://9T5NlmtdhvokIB0ilgllI5aSVmm.png) Based on the following AQL: SELECT c/name/value as C1, s/name/value as S1, i/name/value as I1, o/name/value as O1, e/name/value as E1 FROM COMPOSITION c CONTAINS SECTION s CONTAINS (INSTRUCTION i OR (OBSERVATION o OR EVALUATION e)) And the data is located here: https://github.com/bjornna/openehr-conformance/tree/master/aql/case5-seref in the composition c0.xml|json --- ## Post #28 by @pablo [quote="matijap, post:19, topic:373"] The user may select whatever they want and they have to consider the consequences. [/quote] If the user or query designer don't know the internal rules, they can't handle the consequences because they don't know what those are until the query is executed, and also will depend on the data set they are querying. I disagree on delegating the consequences of querying to users, when the rules are not explicitly stated. They don't know what they don't know, and they are not the developers implementing the rules. We have a gap there. [quote="matijap, post:19, topic:373"] at least not in this version of AQL when we have so many successfully working production queries out there in the wild [/quote] I think the current discussions should focus on AQL2 since we will introduce breaking changes, but IMO we will get a more solid spec that way, and quicker, instead of fixing all the issues AQL1 has now, and trying to harmonize different implementations along the way. --- ## Post #29 by @matijap [quote="pablo, post:28, topic:373"] I disagree on delegating the consequences of querying to users, when the rules are not explicitly stated. [/quote] No disagreement here, we need to define processing rules. You were kind of suggesting we can avoid that by hampering the AQL somewhat. What I disagree with is both the conclusion (that doing this will simplify processing rules definition or even remove the need for it) and the general direction of the idea (hampering the query language). --- ## Post #30 by @matijap [quote="pablo, post:11, topic:373"] So if we have: “SELECT a/path, b/path”, “a shouldn’t contain b an vice versa” [/quote] Based on slack debate, I'm commenting on this one more time at the risk of repeating myself. This renders us unable to do one of the probably most common type of query: "give me all observations of some kind, and let me know the UID of the composition they're found in", so "select o, c/uid/value from composition c contains observation o[some_archetype]". --- ## Post #31 by @ian.mcnicoll I agree @Matija. I don't think we should be breaking things that for the most part work extremely well. It is only when the CONTAINS clauses get complex and nested that the confusion arises. --- ## Post #32 by @Seref [quote="matijap, post:29, topic:373"] What I disagree with is both the conclusion (that doing this will simplify processing rules definition or even remove the need for it) and the general direction of the idea (hampering the query language). [/quote] Completely agree. --- ## Post #33 by @Seref [quote="matijap, post:30, topic:373"] This renders us unable to do one of the probably most common type of query [/quote] and many more. As you said another thread, this makes AQL almost completely useless. --- **Canonical:** https://discourse.openehr.org/t/aql-teaser-1-for-implementers/373 **Original content:** https://discourse.openehr.org/t/aql-teaser-1-for-implementers/373