If no predicate for VERSION is used, are all versions or only the latest returned ?
SELECT e/ehr_id/value, v/commit_audit/time_committed/value from EHR e
CONTAINS VERSION v
CONTAINS COMPOSITION c
ORDER BY vo/commit_audit/time_committed DESC
If no predicate for VERSION is used, are all versions or only the latest returned ?
SELECT e/ehr_id/value, v/commit_audit/time_committed/value from EHR e
CONTAINS VERSION v
CONTAINS COMPOSITION c
ORDER BY vo/commit_audit/time_committed DESC
First question is: whatâs the FROM part? Secondly, AQL only returns the data from one version. By default that is the latest. I donât remember building a query like the one you showed, but I see that AQL has been modified to allow âCONTAINS VERSION ()â, so someone else might have to explain that. (In my view, in âCONTAINS VERSIONâ, the CONTAINS operator doesnât have the same semantics as for normal containment).
Whatâs the definition of ânormal containmentâ? Just trying to understand why itâs different for a VERSION.
With normal containment, if there were two Bs contained under A (say two OBSERVATIONs under COMPOSITION), then they co-exist as siblings within the total data object represented by the COMPOSITION. This is not the semantics of VERSIONs - here, âcontainmentâ of two or more VERSIONs makes each one an alternative, because their âcontainerâ (a VERSIONED_OBJECT) isnât a single data object, but a little machine that maintains multiple versions over time of some other data object.
So the semantics of normal containment can be understood e.g. via path-based referencing, e.g. COMP_A/content[1] and COMP_A/content[2] (using Xpath like syntax), whereas the semantics of versioning is VO_A.atTime(t1), VO_A.atTime(t2), VO_A.atVersionId(â1.2.4â) and so on. You cannot even know if VO_A/versions[1] and VO_A/versions[2] contains full objects, diffs, or some of smart representation.
So for me, using the CONTAINS operator in AQL makes no sense for accessing versions. There used to be a concept called âtime windowâ for making the query run against the state of the data at the time point mentioned in the TIME WINDOW clause, but that was removed some years ago.
A bit of digging around on the topic reveals that some DBs support a SYSTEM TIME idea:
Version âas ofâ a point in time:
SELECT * FROM MyTable FOR SYSTEM_TIME AS OF '2025-01-01T10:00:00'; returns the row versions that were valid at that timestamp.â
All versions of a row:
SELECT * FROM MyTable FOR SYSTEM_TIME ALL WHERE Id = 123 ORDER BY SysStartTime; returns the full version history for that key.â
Versions over a period:
FOR SYSTEM_TIME FROM start TO end, BETWEEN start AND end, or CONTAINED IN (start, end) filter row versions by validity intervals.â
Other DBs have things like the following:
In platforms without native temporal syntax, versioning is often modeled manually using validity columns such as ValidFrom/ValidTo (and possibly a second pair for transaction time). Different versions are then accessed by explicit predicates on those columns:
âAs ofâ a time (valid-time only):
WHERE Id = 123 AND ValidFrom <= :t AND :t < ValidTo returns the version effective at time :t.â
Main conclusion is that access data in a versioned system / DB usually requires a dedicated operator or access method.
The problem Iâd see with that interpretation (and it is an interpretation, not definition from an AQL spec p.o.v, much like the most of AQL spec is) is that it is inconsistent with the rest of the query semantics. When most of the query semantics of a query is explained in data terms, having a âlittle machineâ next to the data items, exposed only through a functional interface, to which the fundamental operator for composing structure of queries (CONTAINS) does not apply, breaks the conceptual consistency of the query semantics. Weâre suddenly not âquerying data in the CDRâ as 99% of people are told to, but weâre querying an object model. Is it possible? Sure, it is. Is it a good design choice? I would say no; it will confuse users, who mostly see versions as discrete instances of something, snapshots of a thing that they are interested in getting back.
The alternative is to hide the RMâs way of version access (through function calls) and present versions of things as discrete, full snapshots of compositions etc.
IMO youâre also coupling means of access to versions with the representation of versions, which Iâd try to avoid. Applying a time criteria to a number of things does not preclude those things from having a representation that is consistent with the rest of the world model AQL exposes. I.e. having an operator/function that says the instance with the highest version at time or all versions that existed at time does not mean those instances cannot be a thought as a bunch of things that happened to have version numbers.
Iâd prefer to present a view of the system that does not force the user of AQL to think in different ways as much as possible. All of this is subjective of course: each to their own (though it would help if implementers agreed, this being a standard query language and all that
)
Not really - weâre just querying with some kind of operator(s) that enable the writer to state what time or version id they want. Iâm not saying how this should look in AQL, only that it be independent of representation.
Using CONTAINS in fact implies a representation that is quite often unlikely to be the actual one.
Well, it appears that most DB manufacturers disagree - they mostly use special semantics for accessing versions of data, meaning special additions to SQL and/or special schema patterns etc.
Does anyone thing of versioning as just being like object containment?
I understand what you say, but if we look at the RM and how CONTAINS is used in AQL, you can have âEHR e CONTAINS COMPOSITION câ where the containment relationship between EHR and COMPOSITION is through an OBJECT_REF to a VERSIONED_COMPOSITION and taking the last VERSION inside that VERSIONED_COMPO.
So itâs used in different ways in different areas, because resolving the EHR-COMPOSITION containment is different than resolving the COMPOSITION-OBSERVATION containment, which is not through an OBJECT_REF, neither involves a VERSIONED_OBJECT or a VERSION.
Everyone using AQL kind of knows that because itâs common usage (convention), though the semantics are not formally defined.
So a third possibility can be also true: when VERSION is mentioned in an AQL containment, that also works in a specific way, itâs just that is not common usage yet. IMHO what @SevKohler wants to do (or similar), by using VERSION in the query, should be possible without special operators, we just need to define the convention. For instance by saying that this will actually use the last version:
SELECT e/ehr_id/value, v/commit_audit/time_committed/value from EHR e
CONTAINS VERSION v
CONTAINS COMPOSITION c
ORDER BY vo/commit_audit/time_committed DESC
yeah as said its already in the ANTLR of AQL and its pretty much implemented like that for some vendors.
Its just what the VERSION v is returning.
But i think we just add that to the spec ![]()
Not much use of changing how VERSION is queried now.
Exactly my point. Ignore the implementation; the logical containment is:
There are no âversionsâ in the logical containment picture; versioning is a mechanism to make the EHR act like a versioning file system, or Git.
The example query makes no sense. You donât normally want to obtain the particular VERSION object containing a COMPOSITION; you want the query processor to run the query on the whole EHR at some time point in prior commit history. Very rarely some DB admin might want to investigate specific VERSION objects, but they wonât bother to use AQL for that, theyâll just do it on the raw DB.
Even if you really want to have EHR CONTAINS VERSION CONTAINS COMPOSITION it still doesnât provide the semantics required for normal versioned querying, i.e., run this query, but against the EHR as it was 45 days ago.
Iâd expect (and implement) all versions to be returned. Thatâs because there is no criteria expressed that differentiates between VERSION instances. Thereâs the slight issue of VERSION not being an RM type (it is Version<T>) but this is one of the simplifications an AQL implementation may use.
@thomas.beale Iâm trying to interpret what you say, and I think you are saying âwhat should be in the specs is thisâŚâ, while in the current specs there is no mention of ânormal containmentâ or âlogical containmentâ, two terms you are using like those were specified.
On the other hand, Iâm not focusing on implementation, I just mentioned how the RM is specified and whatâs in the current RM for the EHR-COMPOSITION containment and COMPOSITION-ENTRY containment (letâs call it âreal containmentâ or âphysical containmentâ for context).
Back to the AQL spec, there it only says what the syntax is and not how it should work (this is an issue I ranted about AQL for years though nobody seem to care: AQL is defined as syntax, though how it should work is not well defined), containment there is defined as:
3.13.2.2. Containment
Since archetypes are in hierarchical structure, AQL has a containment constraint which specifies the hierarchical relationships between parent and child data items. The
FROMclause utilises this hierarchical constraint along with class expression to determine the data source to which the AQL query is applied.The syntax of containment constraint expression is very simple: using
CONTAINSoperator between two class expressions. Left-hand operand class expression is the parent object of the right-hand operand class expression, e.g.EHR e CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.referral.v1]
And:
Containment expression
The openEHR RM is used to identify the hierarchical relationships among the found archetypes so as to define the containment expression. For example, a Composition archetype is the parent of an Observation archetype, so the AQL query looks as follows:
FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS OBSERVATION [openEHR-EHR-OBSERVATION.blood_pressure.v1]
Formally, all we have is this âcontainment constraint which specifies the hierarchical relationships between parent and child data itemsâ.
Iâm not saying what you say is right or wrong, but itâs not on the specs, so itâs clearly your own mental model or interpretation. Another conclusion is that the AQL spec is currently incomplete in many ways, especially in how queries should work technically.
As said before, when doing EHR CONTAINS COMPOSITION, the CONTAINS works in a different way than doing COMPOSITION CONTAINS OBSERVATION, then I donât see the problem (my own interpretation) of using VERSION as another data item from the RM in AQL, though, when included, the CONTAINS will also work in a specific way.
My main point is: we need to define these things.
Well maybe not exactly, but versioning is defined in a special part of the RM, independent of everything else. It shouldnât be in the âRMâ of course, it should be in the BASE component - being in the RM is a historical anomaly.
Right - âchild data itemsâ - i.e. multiple co-existing sibling items, typically found in a container data structure like a list or similar. That description doesnât apply to a Versioned object. The Versions are not siblings inside the data, but snapshots of the whole data item, at different points in time (i.e. due to committing changes at various points in time).
No argument there.
I understand what you say, and Iâm not in disagreement, though we need to discuss based on the current spec, not in the desire for a better one. I think those are two different conversations, and might trigger two different sets of JIRA tickets, one for the AQL support for versions and another for the improvements you mentioned (what should be and isnât).
That description doesnât apply to a Versioned object. The Versions are not siblings inside the data, but snapshots of the whole data item, at different points in time (i.e. due to committing changes at various points in time).
I also understand that, but thatâs also interpretation. What applies and doesnât applies is not defined. If itâs not defined, you can say anything about that and can be true, and you can say the opposite and also be true. Iâm just trying to think logically, not adding my own view to the discussion.
Another view could be just looking at the RM as classes and relationships, without extra semantics, in this view VERSION is just another class, nothing special or that should be processed differently from other classes. You are actually adding those semantics on top to provide the interpretation on how AQL should work.
Yet another element to consider is that some providers already implemented support for VERSION in AQL. I would like to know how many and in which way, because I would prefer to formalize that instead of leaving it open, generating different implementations of the same thing that might lead to different query results (inconsistencies between implementations).
In any case I think we agree that the specification should be modified to tell exactly what CONTAINS is and how it should behave, because this discussion can go on forever and the real thing is that we are discussing opinions not current definitions and rules. We need the spec first to give us a framework to discuss, at least to agree on the basic definitions and behavior.