What is a query on VERSION v returning ?

If no predicate for VERSION is used, are all versions or only the latest returned ?

SELECT e/ehr_id/value, v/commit_audit/time_committed/value from EHR e
CONTAINS VERSION v
CONTAINS COMPOSITION c
ORDER BY vo/commit_audit/time_committed DESC

First question is: what’s the FROM part? Secondly, AQL only returns the data from one version. By default that is the latest. I don’t remember building a query like the one you showed, but I see that AQL has been modified to allow ‘CONTAINS VERSION ()’, so someone else might have to explain that. (In my view, in ‘CONTAINS VERSION’, the CONTAINS operator doesn’t have the same semantics as for normal containment).

What’s the definition of ‘normal containment’? Just trying to understand why it’s different for a VERSION.

  1. Thanks i think we need to add this info to the Spec for clarification.
    Also including an example.
    Since there is ALL_VERSIONS and LATEST_VERSION as predicates, its somewhat ambigous.
  2. Thats how it is defined in the AQL ANTLR.
    Its VERSION< Composition > so its semantically correct, is it ?
1 Like

With normal containment, if there were two Bs contained under A (say two OBSERVATIONs under COMPOSITION), then they co-exist as siblings within the total data object represented by the COMPOSITION. This is not the semantics of VERSIONs - here, ‘containment’ of two or more VERSIONs makes each one an alternative, because their ‘container’ (a VERSIONED_OBJECT) isn’t a single data object, but a little machine that maintains multiple versions over time of some other data object.

So the semantics of normal containment can be understood e.g. via path-based referencing, e.g. COMP_A/content[1] and COMP_A/content[2] (using Xpath like syntax), whereas the semantics of versioning is VO_A.atTime(t1), VO_A.atTime(t2), VO_A.atVersionId(‘1.2.4’) and so on. You cannot even know if VO_A/versions[1] and VO_A/versions[2] contains full objects, diffs, or some of smart representation.

So for me, using the CONTAINS operator in AQL makes no sense for accessing versions. There used to be a concept called ‘time window’ for making the query run against the state of the data at the time point mentioned in the TIME WINDOW clause, but that was removed some years ago.

A bit of digging around on the topic reveals that some DBs support a SYSTEM TIME idea:

Version “as of” a point in time:

SELECT * FROM MyTable FOR SYSTEM_TIME AS OF '2025-01-01T10:00:00'; returns the row versions that were valid at that timestamp.​

All versions of a row:

SELECT * FROM MyTable FOR SYSTEM_TIME ALL WHERE Id = 123 ORDER BY SysStartTime; returns the full version history for that key.​

Versions over a period:

FOR SYSTEM_TIME FROM start TO end, BETWEEN start AND end, or CONTAINED IN (start, end) filter row versions by validity intervals.​

Other DBs have things like the following:

In platforms without native temporal syntax, versioning is often modeled manually using validity columns such as ValidFrom/ValidTo (and possibly a second pair for transaction time). Different versions are then accessed by explicit predicates on those columns:

“As of” a time (valid-time only):

WHERE Id = 123 AND ValidFrom <= :t AND :t < ValidTo returns the version effective at time :t.​

Main conclusion is that access data in a versioned system / DB usually requires a dedicated operator or access method.

2 Likes

The problem I’d see with that interpretation (and it is an interpretation, not definition from an AQL spec p.o.v, much like the most of AQL spec is) is that it is inconsistent with the rest of the query semantics. When most of the query semantics of a query is explained in data terms, having a “little machine” next to the data items, exposed only through a functional interface, to which the fundamental operator for composing structure of queries (CONTAINS) does not apply, breaks the conceptual consistency of the query semantics. We’re suddenly not “querying data in the CDR” as 99% of people are told to, but we’re querying an object model. Is it possible? Sure, it is. Is it a good design choice? I would say no; it will confuse users, who mostly see versions as discrete instances of something, snapshots of a thing that they are interested in getting back.

The alternative is to hide the RM’s way of version access (through function calls) and present versions of things as discrete, full snapshots of compositions etc.

IMO you’re also coupling means of access to versions with the representation of versions, which I’d try to avoid. Applying a time criteria to a number of things does not preclude those things from having a representation that is consistent with the rest of the world model AQL exposes. I.e. having an operator/function that says the instance with the highest version at time or all versions that existed at time does not mean those instances cannot be a thought as a bunch of things that happened to have version numbers.

I’d prefer to present a view of the system that does not force the user of AQL to think in different ways as much as possible. All of this is subjective of course: each to their own (though it would help if implementers agreed, this being a standard query language and all that :slight_smile: )

Not really - we’re just querying with some kind of operator(s) that enable the writer to state what time or version id they want. I’m not saying how this should look in AQL, only that it be independent of representation.

Using CONTAINS in fact implies a representation that is quite often unlikely to be the actual one.

Well, it appears that most DB manufacturers disagree - they mostly use special semantics for accessing versions of data, meaning special additions to SQL and/or special schema patterns etc.

Does anyone thing of versioning as just being like object containment?

I understand what you say, but if we look at the RM and how CONTAINS is used in AQL, you can have “EHR e CONTAINS COMPOSITION c” where the containment relationship between EHR and COMPOSITION is through an OBJECT_REF to a VERSIONED_COMPOSITION and taking the last VERSION inside that VERSIONED_COMPO.

So it’s used in different ways in different areas, because resolving the EHR-COMPOSITION containment is different than resolving the COMPOSITION-OBSERVATION containment, which is not through an OBJECT_REF, neither involves a VERSIONED_OBJECT or a VERSION.

Everyone using AQL kind of knows that because it’s common usage (convention), though the semantics are not formally defined.

So a third possibility can be also true: when VERSION is mentioned in an AQL containment, that also works in a specific way, it’s just that is not common usage yet. IMHO what @SevKohler wants to do (or similar), by using VERSION in the query, should be possible without special operators, we just need to define the convention. For instance by saying that this will actually use the last version:

SELECT e/ehr_id/value, v/commit_audit/time_committed/value from EHR e
CONTAINS VERSION v
CONTAINS COMPOSITION c
ORDER BY vo/commit_audit/time_committed DESC

yeah as said its already in the ANTLR of AQL and its pretty much implemented like that for some vendors.
Its just what the VERSION v is returning.
But i think we just add that to the spec :slight_smile:

Not much use of changing how VERSION is queried now.

Exactly my point. Ignore the implementation; the logical containment is:

  • EHR
    • COMPOSITION
      • SECTION
        • …
      • ENTRY
        • CLUSTER

There are no ‘versions’ in the logical containment picture; versioning is a mechanism to make the EHR act like a versioning file system, or Git.

The example query makes no sense. You don’t normally want to obtain the particular VERSION object containing a COMPOSITION; you want the query processor to run the query on the whole EHR at some time point in prior commit history. Very rarely some DB admin might want to investigate specific VERSION objects, but they won’t bother to use AQL for that, they’ll just do it on the raw DB.

Even if you really want to have EHR CONTAINS VERSION CONTAINS COMPOSITION it still doesn’t provide the semantics required for normal versioned querying, i.e., run this query, but against the EHR as it was 45 days ago.

I’d expect (and implement) all versions to be returned. That’s because there is no criteria expressed that differentiates between VERSION instances. There’s the slight issue of VERSION not being an RM type (it is Version<T>) but this is one of the simplifications an AQL implementation may use.

@thomas.beale I’m trying to interpret what you say, and I think you are saying “what should be in the specs is this…”, while in the current specs there is no mention of “normal containment” or “logical containment”, two terms you are using like those were specified.

On the other hand, I’m not focusing on implementation, I just mentioned how the RM is specified and what’s in the current RM for the EHR-COMPOSITION containment and COMPOSITION-ENTRY containment (let’s call it “real containment” or “physical containment” for context).

Back to the AQL spec, there it only says what the syntax is and not how it should work (this is an issue I ranted about AQL for years though nobody seem to care: AQL is defined as syntax, though how it should work is not well defined), containment there is defined as:

3.13.2.2. Containment

Since archetypes are in hierarchical structure, AQL has a containment constraint which specifies the hierarchical relationships between parent and child data items. The FROM clause utilises this hierarchical constraint along with class expression to determine the data source to which the AQL query is applied.

The syntax of containment constraint expression is very simple: using CONTAINS operator between two class expressions. Left-hand operand class expression is the parent object of the right-hand operand class expression, e.g.

EHR e CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.referral.v1]

And:

Containment expression

The openEHR RM is used to identify the hierarchical relationships among the found archetypes so as to define the containment expression. For example, a Composition archetype is the parent of an Observation archetype, so the AQL query looks as follows:

FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1]
    CONTAINS OBSERVATION [openEHR-EHR-OBSERVATION.blood_pressure.v1]

Formally, all we have is this “containment constraint which specifies the hierarchical relationships between parent and child data items”.

I’m not saying what you say is right or wrong, but it’s not on the specs, so it’s clearly your own mental model or interpretation. Another conclusion is that the AQL spec is currently incomplete in many ways, especially in how queries should work technically.

As said before, when doing EHR CONTAINS COMPOSITION, the CONTAINS works in a different way than doing COMPOSITION CONTAINS OBSERVATION, then I don’t see the problem (my own interpretation) of using VERSION as another data item from the RM in AQL, though, when included, the CONTAINS will also work in a specific way.

My main point is: we need to define these things.

1 Like

Well maybe not exactly, but versioning is defined in a special part of the RM, independent of everything else. It shouldn’t be in the ‘RM’ of course, it should be in the BASE component - being in the RM is a historical anomaly.

Right - ‘child data items’ - i.e. multiple co-existing sibling items, typically found in a container data structure like a list or similar. That description doesn’t apply to a Versioned object. The Versions are not siblings inside the data, but snapshots of the whole data item, at different points in time (i.e. due to committing changes at various points in time).

No argument there.

I understand what you say, and I’m not in disagreement, though we need to discuss based on the current spec, not in the desire for a better one. I think those are two different conversations, and might trigger two different sets of JIRA tickets, one for the AQL support for versions and another for the improvements you mentioned (what should be and isn’t).

I also understand that, but that’s also interpretation. What applies and doesn’t applies is not defined. If it’s not defined, you can say anything about that and can be true, and you can say the opposite and also be true. I’m just trying to think logically, not adding my own view to the discussion.

Another view could be just looking at the RM as classes and relationships, without extra semantics, in this view VERSION is just another class, nothing special or that should be processed differently from other classes. You are actually adding those semantics on top to provide the interpretation on how AQL should work.

Yet another element to consider is that some providers already implemented support for VERSION in AQL. I would like to know how many and in which way, because I would prefer to formalize that instead of leaving it open, generating different implementations of the same thing that might lead to different query results (inconsistencies between implementations).

In any case I think we agree that the specification should be modified to tell exactly what CONTAINS is and how it should behave, because this discussion can go on forever and the real thing is that we are discussing opinions not current definitions and rules. We need the spec first to give us a framework to discuss, at least to agree on the basic definitions and behavior.