I am creeping towards a much better understanding of why I, at least, have struggled to understand some of the principles here. Here is my current attempt âŚ
- The âgutsâ of the query is actually in the FROM clause and below. This defines which object structures to return and the filters (WHERE clause) on any matching objects.
so FROM COMPOSITION C where c/start_time/value = â2020-03-03â says âfind me all composition objects with that start_time.â
then critically it means âeach time you find a unique object matching those criteria, create a new row in the resultset.â
So far so good - we get a row back for each matching object
Now SELECT kicks in and says, from that row you have given me, here are the specific objects or scalar values that I really want to see.
Critically, SELECT is a filter on each object in the array of rows returned by the FROM clause, each of which holds an object.
If we work with Bjornâs example we will get 2 compositions (assuming they have the same start_time)
- We can apply the same principle with FROM OBSERVATION o where o/origin/value = â2020-03-03â
If we work with Bjornâs example we will get 3 Observations (assuming they have the same origin)
In neither example is the WHERE clause necessary - just to demonstrate that the first phase of the process is the query into the FROM space and any WHERE clauses to return an array of matching objects - one per row. Iâm also ignoring FROM EHR e for brevity.
So far again so good, no surprises in the returned content in either example?
- but what happens when we start nesting the CONTAINS clause.
FROM COMPOSITION C CONTAINS OBSERVATION o where c/start_time/value = â2020-03-03â and o/origin/value = â2020-03-03â
Until about an hour ago I would have said that this means
âfind me all composition objects with that start_time which also contain OBSERVATION o with that particular origin valueâ
but I now understand that actually means âfind me all Observation objects with that particular origin whose parent composition objects have that particular start_timeâ and return the whole structure - parent composition and matching observation.
i.e. executing a FROM clause will always generate matches at the lowest level of object matching - one row for each object match - if the matching object happens to share a parent composition with another matching object, it doesnât care.
So in Bjornâs example, we now get 3 rows - one for C1 that has one Observation match, 2 for C2 that has two observation matches.
So now the critical thought-error (for me at least) - by specifying SELECT c or SELECT c, o, or anything else, I am somehow controlling the object rows that are matched - Iâm not, I am merely filtering the object that the FROM clause returns in each row and that will always look for unique matches at the furthest end of the data tree it has been asked to traverse.
- FROM COMPOSITION c CONTAINS OBSERVATION o CONTAINS CLUSTER x where c/start_time/value = â2020-03-03â and o/origin/value = â2020-03-03â
So this meansâŚ
Find me any Cluster objects which along with their parent structures, and generate an object row for each match,
if each of those OBSERVATIONS has 2 clusters we should end up with 6 rows because the âtargetâ object is a CLUSTER match, which (because of CONTAINS composition, expects to also return it within the context of the parent COMPOSITION.
Anything I express in SELECT has no effect on that number of matches, it merely filters what is returned in each row.
Iâll stop here for a lie down but, at least at one level now my head understands what is happening.
There also does appear to me to be a requirement to allow the âprojectionsâ approach as well as the âsingle-composition approachâ so I like Seref;s suggestion of using EXIST as an alternative to CONTAINS to control that aspect, so that at very least we can use the EXIST alias as part of the WHERE clause.
There are further challenges around SELECT but if we can agree an approach