Hello,
I have another question concerning the semantics of AQL queries:
In the documentation there are queries of the form
SELECT a/data[at0001]/items[at0004]/value
FROM EHR e CONTAINS COMPOSITION a[openEHR-EHR-COMPOSITION.encounter.v1]
WHERE b/data[at0001]/items[at0004]/value/value >= 140
What ensures that the identified path in the SELECT section references the same data instances that are constrained by the same identified path in the WHERE section? It could be argued that there is only one "data[at0001]" in "b", only one "items[at0004]" in "a/data[at0001]", and so on. But is this already the full explanation for why the expression is unambiguous? The aliases used in queries (e.g. "a") ensure that a reference to an alias definitely means the same instance. Queries like the one above suggest that aliases are only syntactic sugar and are not functionally needed. Is this correct?
Greetings
Georg
That query does not look valid to me. There is no ‘b’ defined in the FROM clause. Once an alias for a node is created in the FROM clause, as in COMPOSITION a[..], the alias ‘a’ refers to the same matching data node in the FROM, SELECT and WHERE clauses.
I would not assume anything based on the query you’ve given because that query is invalid as far as I can see.
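One way to picture the alias rule is a toy evaluation loop in Python: FROM produces one binding per matching node, WHERE filters those bindings, and SELECT projects from the bindings that survive. This is only a sketch, not an AQL engine. The dictionary layout, the `resolve` helper and the sample values are assumptions made for illustration, and the undefined `b` is replaced by `a`, which is a guess at what the documentation example intended.

```python
# Toy model: each dict stands in for one COMPOSITION matched by the FROM clause.
# The layout and the values are invented for illustration.
compositions = [
    {"uid": "c1", "data[at0001]": {"items[at0004]": {"value": {"value": 150}}}},
    {"uid": "c2", "data[at0001]": {"items[at0004]": {"value": {"value": 120}}}},
]

VALUE_PATH = ("data[at0001]", "items[at0004]", "value")


def resolve(node, path):
    """Follow a sequence of keys starting from a bound node (hypothetical helper)."""
    for key in path:
        node = node[key]
    return node


def run_query(nodes):
    rows = []
    for a in nodes:  # FROM: the alias 'a' is bound to one match at a time
        # WHERE a/data[at0001]/items[at0004]/value/value >= 140
        if resolve(a, VALUE_PATH + ("value",)) >= 140:
            # SELECT a/data[at0001]/items[at0004]/value, read from the same 'a'
            rows.append(resolve(a, VALUE_PATH))
    return rows


result = run_query(compositions)
print(result)
```

Because WHERE and SELECT both start from the same bound `a`, the selected value always belongs to the composition that passed the filter in this model.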
Recently we discussed other problematic cases, like having criteria over a structured datatype like:
WHERE xxxx/data[at0001]/items[at0004]/value/magnitude >= 140 AND xxxx/data[at0001]/items[at0004]/value/units = "mmHg"
Internally, “magnitude” and “units” should be interpreted as attributes of the same DV_QUANTITY instance, but do all AQL implementations actually do that?
But maybe that kind of query should be written as:
WHERE dv/magnitude >= 140 AND dv/units = "mmHg"
In that case, dv should be defined in the FROM, and all variables/aliases should point to the same data instance.
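The concern can be made concrete with a small Python sketch that contrasts two possible readings of the conjunctive WHERE clause when an item occurs more than once. The `Quantity` class, the two function names and the sample values are assumptions for illustration; the sketch does not describe how any particular AQL implementation behaves.

```python
from dataclasses import dataclass


@dataclass
class Quantity:
    """Minimal stand-in for a DV_QUANTITY (illustrative only)."""
    magnitude: float
    units: str


# Two occurrences of the same item under one hypothetical parent node.
values = [Quantity(150.0, "kPa"), Quantity(120.0, "mmHg")]


def independent_paths(quantities):
    """Each criterion is satisfied by some value, not necessarily the same one."""
    return (any(q.magnitude >= 140 for q in quantities)
            and any(q.units == "mmHg" for q in quantities))


def same_instance(quantities):
    """Both criteria must hold on one and the same quantity (the 'dv' reading)."""
    return any(q.magnitude >= 140 and q.units == "mmHg" for q in quantities)


print(independent_paths(values))
print(same_instance(values))
```

With this data the first reading accepts the parent node although no single quantity is both at least 140 and in mmHg, while the second reading rejects it. An explicit `dv` alias removes any doubt about which reading is meant.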
Hi Pablo, see inline please
Recently we discussed other problematic cases, like having criteria over a structured datatype like:
WHERE xxxx/data[at0001]/items[at0004]/value/magnitude >= 140 AND xxxx/data[at0001]/items[at0004]/value/units = "mmHg"
Internally, “magnitude” and “units” should be interpreted as attributes of the same DV_QUANTITY instance, but do all AQL implementations actually do that?
Who knows?
The syntax is correct though, so what they do internally does not matter: the WHERE clause is clearly introducing two constraints on the same DV_QUANTITY instance.
But maybe that kind of query should be written as:
WHERE dv/magnitude >= 140 AND dv/units = "mmHg"
In that case, dv should be defined in the FROM, and all variables/aliases should point to the same data instance.
This is also valid AQL, but you may have two problems here:
- AQL implementations may not support the full AQL semantics. That is, even though EHR e[..] CONTAINS DV_QUANTITY dv is legal AQL, a particular back end may not support it. In your example snippet, the FROM clause (which we don’t see) should have an xxxx. To use the dv alias as you’ve suggested, that FROM clause would have to become: xxxx/data[at0001]/items[at0004]/value as dv, and this is the case that a back end, or even all back ends, may not support, because a common assumption among vendors is that users would select more meaningful, higher-level nodes (mostly entries or clusters under them) and access data via SELECT or apply criteria in WHERE clauses using relative paths.
- If you’d like to apply two criteria to xxxx, then you have to declare xxxx in the FROM clause and write WHERE xxxx/path1/value AND xxxx/path2/value. So you have to define xxxx here; if you do what I described above, you only have the DV_QUANTITY leaf node and you cannot introduce constraints on other nodes relative to xxxx. So you have pros and cons, or ‘high level nodes giveth and taketh away’…
Another option, which should actually be legal given the current AQL grammar, may be to move all constraints on xxxx to the immediate predicate of xxxx, but this should be used only when there is no ambiguity. As in:
FROM … CONTAINS xxxx[data/items/value/magnitude >= 140 AND data/items/value/units = 'mmHg']
This would neatly move the joint criteria up, but I would be uncomfortable with this because you cannot specify archetype node ids for data/items anymore, and this would likely trigger a cartesian explosion in result sets. Then you’d have Ian complaining about duplicate results.
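A rough Python sketch of why dropping the node ids widens the match: in a toy tree, a path step with an archetype node id selects one item, while a bare `items` step selects every item, and two such unconstrained paths in one predicate can then pair up any two items. The tree layout, the `items` helper, the node id at0033 and all values are invented for illustration, and whether a given engine actually pairs paths this way is an assumption.

```python
from itertools import product

# Toy OBSERVATION: node ids are kept as part of each key (illustrative layout).
observation = {
    "data[at0001]": {
        "items[at0004]": {"magnitude": 150, "units": "mmHg"},
        "items[at0005]": {"magnitude": 95, "units": "mmHg"},
        "items[at0033]": {"magnitude": 80, "units": "/min"},  # hypothetical extra item
    }
}


def items(obs, node_id=None):
    """Return item values; with no node id, every item under data matches."""
    found = []
    for data in obs.values():
        for key, value in data.items():
            if node_id is None or key == f"items[{node_id}]":
                found.append(value)
    return found


with_id = items(observation, "at0004")
without_id = items(observation)

# Two unconstrained paths in one predicate could pair up any two items.
pairs = list(product(without_id, repeat=2))
print(len(with_id), len(without_id), len(pairs))  # 1 3 9
```

In this toy case one item with a node id becomes three candidates without it, and nine candidate pairs once two such paths are combined.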
All the best
Seref
The correct way to do this IMO, and my earliest idea for AQL in about 2006, is to use archetype fragments, which are formally a kind of Frame-logic approach. Today, people think in terms of GraphQL, but you need constraints.
E.g.
WHERE
dv matches {
magnitude matches {|>=140.0|}
units matches {"mm[Hg]"}
}
Or, more readably:
WHERE
dv ∈ {
magnitude ∈ {|>=140.0|}
units ∈ {"mm[Hg]"}
}
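As a rough illustration of the fragment idea, the following Python sketch treats a fragment as a nested set of attribute constraints that must all hold on one object. The `at_least`, `one_of` and `matches` helpers and the dictionary representation are hypothetical; this is not ADL parsing and not part of any AQL grammar.

```python
def at_least(limit):
    """Constraint loosely corresponding to an interval such as |>=140.0|."""
    return lambda value: value >= limit


def one_of(*allowed):
    """Constraint loosely corresponding to a value set such as {"mm[Hg]"}."""
    return lambda value: value in allowed


def matches(obj, fragment):
    """True if every attribute constraint in the fragment holds on obj."""
    for attribute, constraint in fragment.items():
        if attribute not in obj:
            return False
        value = obj[attribute]
        if isinstance(constraint, dict):  # nested fragment
            if not (isinstance(value, dict) and matches(value, constraint)):
                return False
        elif not constraint(value):
            return False
    return True


# dv matches { magnitude matches {|>=140.0|} units matches {"mm[Hg]"} }
fragment = {"magnitude": at_least(140.0), "units": one_of("mm[Hg]")}

high = {"magnitude": 150.0, "units": "mm[Hg]"}
low = {"magnitude": 120.0, "units": "mm[Hg]"}
print(matches(high, fragment), matches(low, fragment))  # True False
```

Because the whole fragment is checked against a single object, the question of whether magnitude and units belong to the same instance does not arise in this form.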
- thomas
Hi Tom,
I remember reading this somewhere. Can you remember if the current ANTLR grammar supports that embedded ADL syntax?
Cheers
Seref
It seems to support something: search for ‘identifiedEquality’ in the grammar part of the AQL spec.
Hi Seref, see below
Hi Pablo, see inline please
Recently we discussed other problematic cases, like having criteria over a structured datatype like:
WHERE xxxx/data[at0001]/items[at0004]/value/magnitude >= 140 AND xxxx/data[at0001]/items[at0004]/value/units = "mmHg"
Internally, “magnitude” and “units” should be interpreted as attributes of the same DV_QUANTITY instance, but do all AQL implementations actually do that?
Who knows?
The syntax is correct though, so what they do internally does not matter: the WHERE clause is clearly introducing two constraints on the same DV_QUANTITY instance.
That should be defined by the spec to avoid inconsistent implementations.
On the other hand, paths might not reference the same data instance. Consider multiple occurrences of the ELEMENT in xxxx/data[at0001]/items[at0004]: the only way to be sure that the paths reference the same data instance is to write something like xxxx/data[at0001]/items[at0004][0]/value/… or xxxx/data[at0001]/items[at0004][1]/value/…; without the index, a path references a list of objects.
But maybe that kind of query should be written as:
WHERE dv/magnitude >= 140 AND dv/units = "mmHg"
In that case, dv should be defined in the FROM, and all variables/aliases should point to the same data instance.
This is also valid AQL, but you may have two problems here:
- AQL implementations may not support the full AQL semantics. That is, even though EHR e[..] CONTAINS DV_QUANTITY dv is legal AQL, a particular back end may not support it.
I was thinking about EHR CONTAINS OBSERVATION CONTAINS … CONTAINS DV_QUANTITY dv, not directly referencing a DV from the EHR, but as you said, it is valid.
The CONTAINS also has to handle the multiple occurrences of contained items on attributes that are collections.
In your example snippet, the FROM clause (which we don’t see) should have an xxxx. To use the dv alias as you’ve suggested, that FROM clause would have to become: xxxx/data[at0001]/items[at0004]/value as dv, and this is the case that a back end, or even all back ends, may not support, because a common assumption among vendors is that users would select more meaningful, higher-level nodes (mostly entries or clusters under them) and access data via SELECT or apply criteria in WHERE clauses using relative paths.
- If you’d like to apply two criteria to xxxx, then you have to declare xxxx in the FROM clause and write WHERE xxxx/path1/value AND xxxx/path2/value. So you have to define xxxx here; if you do what I described above, you only have the DV_QUANTITY leaf node and you cannot introduce constraints on other nodes relative to xxxx. So you have pros and cons, or ‘high level nodes giveth and taketh away’…
Another option, which should actually be legal given the current AQL grammar, may be to move all constraints on xxxx to the immediate predicate of xxxx, but this should be used only when there is no ambiguity. As in:
FROM … CONTAINS xxxx[data/items/value/magnitude >= 140 AND data/items/value/units = 'mmHg']
Wow, that is nice
I like this!
Comments inline
That should be defined by the spec to avoid inconsistent implementations.
Sorry, I misunderstood what you wrote earlier. My response ‘who knows?’ is not valid. The semantics are clear: magnitude and units are attributes of the same instance. Those are the AQL semantics.
On the other hand, paths might not reference the same data instance. Consider multiple occurrences of the ELEMENT in xxxx/data[at0001]/items[at0004]: the only way to be sure that the paths reference the same data instance is to write something like xxxx/data[at0001]/items[at0004][0]/value/… or xxxx/data[at0001]/items[at0004][1]/value/…; without the index, a path references a list of objects.
Very short answer: no. Just as the attributes are attributes of the same DV_QUANTITY instance, paths are relative paths from a single ‘match’, i.e. an actual object that fits the criteria defined by the FROM clause. There may be multiple matches, but the WHERE clause would not apply criterion 1 to match one and criterion 2 to match two. You can see the same semantics used by XQuery and SPARQL as well.
Obligatory moaning for the millionth time: all of this confusion stems from the lack of a formal model defining AQL semantics. I used tree pattern queries in my research, so I can use those to explain AQL, but as it stands the standard we have does not have this, so users will keep getting confused.
This. A query language is more than just a syntax. And we probably don’t need to invent behaviors that differ from those of technologies that already work (XPath/XQuery, for example).
we are open to upgrading the AQL spec at any time
- thomas
You’ll be hearing from me, kind sir
Obligatory moaning for the millionth time: all of this confusion stems from the lack of a formal model defining AQL semantics. I used tree pattern queries in my research, so I can use those to explain AQL, but as it stands the standard we have does not have this, so users will keep getting confused.
This. A query language is more than just a syntax. And we probably don’t need to invent behaviors that differ from those of technologies that already work (XPath/XQuery, for example).
That is exactly why I didn’t implement AQL yet, the spec should include syntax, behavior, semantics and result sets defined. And lots of examples of those things combined. Because impregnada should include parser+engine and the parser is the simple part. Until we have that all implementations will have differences when querying the same database.
You are way more technical than just a consumer of AQL, so you may consider adopting semantics from XQuery/XPath/SPARQL. It would not really leave you a million miles from the current AQL semantics. I’ll leave it at that, since this is not a technical list as far as my understanding goes. Feel free to re-initiate the discussion on the technical list if you have comments (goes for everyone).
All the best
Seref
“impregnada” should be “implementations”, damn phone auto correction
That is exactly why I didn’t implement AQL yet, the spec should include syntax, behavior, semantics and result sets defined. And lots of examples of those things combined. Because implementations should include parser+engine and the parser is the simple part. Until we have that all implementations will have differences when querying the same database.
@Diego Boscá Tomás about the technology, look at this thing https://www.arangodb.com/
Search for AQL (funny name for a generic query language :P). It seems like a strangely good fit for implementing an openEHR AQL backend: since it can be used to query many backend technologies, it is another abstraction layer. I’ll test this thing when I have time, and would love to hear from others who test it.