AQL: Formal definition of FROM clause

thomas.beale · 22 February 2020 00:35

I’m not sure about using the word ‘scope’ w.r.t. SQL or AQL. In simple terms, the various bits are as follows:

SELECT
projection (= subset of columns of a Table or View, or properties of a class/type)
FROM
domain / universe (= tables or classes/types from which columns/properties projection is defined)
WHERE
criteria (= row selection, by filtering on values)

pablo · 22 February 2020 01:41

That seems reasonable @thomas.beale, but in terms of:

If we think in terms of functions, FROM could be a function. The source data set for that function, which could be EHR/DEMOGRAPHIC/xxx, is the domain (of that function). The result, or co-domain, of FROM applied to that domain is then the domain for the query as a whole, since the query could also be considered a function.

But you can also consider that the query as a whole is applied to EHR/DEMOGRAPHIC/xxx, so that would be the domain for the query, not the result of the FROM, since the query would be a composition of functions, each applied to the result of the previous one: QUERY_RESULT = SELECT(WHERE(FROM(domain))).

The difference is subtle, but really depends on what you are focusing on, the FROM clause or the complete AQL.

Even more, SELECT and WHERE are also functions, WHERE is a boolean function and SELECT is a mapping function. I would say FROM is a sub-set definition function (could be a “selection” function but gets weird having the SELECT clause).
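
The composition QUERY_RESULT = SELECT(WHERE(FROM(domain))) can be sketched in a few lines of Python. This is only an illustration of the functional reading above, not how any AQL engine works; the function names, the row shape and the sample data are all made up for the example.

```python
from typing import Callable, Iterable

Row = dict  # one data item, keyed by property name (hypothetical shape)


def from_clause(domain: Iterable[Row], class_name: str) -> list[Row]:
    """Sub-set definition: keep only instances of the named class."""
    return [row for row in domain if row["type"] == class_name]


def where_clause(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> list[Row]:
    """Boolean function applied per row: keep rows for which it is true."""
    return [row for row in rows if predicate(row)]


def select_clause(rows: Iterable[Row], columns: list[str]) -> list[Row]:
    """Mapping function: project each row onto the requested properties."""
    return [{column: row[column] for column in columns} for row in rows]


# Stand-in for EHR/DEMOGRAPHIC/xxx: the domain of the query as a whole.
domain = [
    {"type": "OBSERVATION", "name": "body temperature", "magnitude": 38.2},
    {"type": "OBSERVATION", "name": "body temperature", "magnitude": 36.9},
    {"type": "EVALUATION", "name": "diagnosis", "magnitude": None},
]

# QUERY_RESULT = SELECT(WHERE(FROM(domain)))
query_result = select_clause(
    where_clause(
        from_clause(domain, "OBSERVATION"),
        lambda row: row["magnitude"] > 38.0,
    ),
    ["name", "magnitude"],
)
```

Seen this way, the output of from_clause is the domain of where_clause, while the domain of the query as a whole is still the original data set, which is the subtle difference described above.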

This is how I understand it; I’m not saying this is the most correct way of understanding or defining things.

bna · 22 February 2020 08:24

Regarding the FROM as a filter into the domain

DIPS found a need to expand the query model so that the same functional AQL can be run with different constraints. This was proposed for the openEHR REST API v1.0, but since the SEC group wanted to keep the first version minimal, the feature was postponed to later versions. We use the following request model. tagScope and partitionBy are used a lot in production. The use case is ward lists that query, e.g., the latest (partitionBy = EpisodeOfCareId) body temperature for each episode of care (tag = EpisodeOfCareId).

    {
        "aql": "string",
        "compositionUids": [
            "string"
        ],
        "ehrIds": [
            "string"
        ],
        "tagScope": {
            "tags": [
                {
                    "values": [
                        "string"
                    ],
                    "tag": "string"
                }
            ]
        },
        "partitionBy": {
            "tag": "string",
            "limit": 0
        },
        "correlationId": "string"
    }
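
To make the ward-list use case concrete, here is a rough Python sketch of what a partitionBy step might do to a result set: group rows by a tag and keep the newest rows in each group. The post does not state the exact semantics of partitionBy or of limit, so the grouping rule, the ordering by a "time" field, and the partition_by helper itself are assumptions made for illustration only.

```python
from collections import defaultdict


def partition_by(rows, tag, limit):
    """Keep at most `limit` rows per distinct value of `tag`, newest first.

    Assumption: rows carry a sortable "time" value; the real request model
    does not say how rows inside a partition are ordered.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[tag]].append(row)

    result = []
    for members in groups.values():
        newest_first = sorted(members, key=lambda r: r["time"], reverse=True)
        result.extend(newest_first[:limit])
    return result


# Hypothetical body temperature rows tagged with an episode of care.
rows = [
    {"EpisodeOfCareId": "e1", "time": "2020-02-20T08:00", "temperature": 37.1},
    {"EpisodeOfCareId": "e1", "time": "2020-02-21T08:00", "temperature": 38.4},
    {"EpisodeOfCareId": "e2", "time": "2020-02-19T10:00", "temperature": 36.8},
]

# Latest body temperature for each episode of care.
latest = partition_by(rows, "EpisodeOfCareId", 1)
```

Under these assumptions the same AQL text can be reused unchanged, with the request parameters alone deciding which episodes are considered and how many rows come back per episode.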

sebastian.iancu · 23 February 2020 08:44

Quite a lot of things were said here that, at least in my opinion, I think are important:

Well, I don’t know how fully and deeply others understand all the aspects of the above quote, but for me:

I get @thomas.beale advocating for a formal description in a BMM, it is perhaps the right place
but I also agree with @Seref about the extra burden of depending on BMM
the whole discussion is around the AQL processor and the AQL formalism specification, to make it model-agnostic, but the data storage is not formally specified (neither db-type nor data-definition or structure), which (I guess) means that the AQL execution itself is implementation-specific - I wonder how much (if at all) the BMM can be used at that level; I have the impression that it is hard-wired (as opposed to ADL parsing, which benefits directly from BMM).

I suggest adding an extra chapter or a few paragraphs at the beginning of the AQL specs that will capture these conceptual design aspects from the dialog above between @thomas.beale and @Seref. It might be useful for implementors to better understand the necessity of BMM in relation to AQL.

sebastian.iancu · 23 February 2020 09:07

This is a nice simple one:

but if we would like to use it in specs, then I would change it a bit:

SELECT
projection (= subset of columns or properties of the selected rowset)
FROM
domain (= rowset source, usually tables or classes/types from which columns/properties projection is defined)
WHERE
criteria (= rowset retrieval criteria, by filtering on their values)

thomas.beale · 23 February 2020 11:26

Yep, this is also good, probably better. I wasn’t trying to provide a proper text BTW, just to state a sort of common sense understanding of these things, in the interests of not getting too complicated or academic. I leave it to the rest here to get the text right for the users of the AQL specification and tools.

thomas.beale · 23 February 2020 11:29

sebastian.iancu:

Well, I don’t know how fully and deeply others understand all the aspects of the above quote, but for me:

I get @thomas.beale advocating for a formal description in a BMM, it is perhaps the right place

but I also agree with @Seref about the extra burden of depending on BMM

the whole discussion is around the AQL processor and the AQL formalism specification, to make it model-agnostic, but the data storage is not formally specified (neither db-type nor data-definition or structure), which (I guess) means that the AQL execution itself is implementation-specific - I wonder how much (if at all) the BMM can be used at that level; I have the impression that it is hard-wired (as opposed to ADL parsing, which benefits directly from BMM).

I suggest adding an extra chapter or a few paragraphs at the beginning of the AQL specs that will capture these conceptual design aspects from the dialog above between @thomas.beale and @Seref. It might be useful for implementors to better understand the necessity of BMM in relation to AQL.

We don’t have to create the full BMM approach for AQL right now, it will take some time. But we do need to simulate it in the sense that the knowledge of logical whole/part containment not be directly part of the AQL spec or implementation, but instead be in e.g. some other file that is read to discover the semantics of such relationships.

In the long run, the semantics of a model should be fully stated in the model definition. The short term question is just where this model definition comes from.

Seref · 24 February 2020 12:52

Thanks Sebastian,
I think some clarification is needed, at least re what I suggested.

I for one am not discussing AQL processor(s), mainly because that’s an implementation topic. I do have consideration for implementations when I make my suggestions, but I won’t mention impl. unless I think that something I’m writing may be problematic from that perspective.

Regarding implementation, ITS is where we may have recommendations, but I’ll repeat my point that the base spec (on the left in Tom’s diagram) should not have references to ITS, because it then becomes something that cannot be implemented without using <particular_ITS_option> and something that used to be in the ITS box now exists in the BASE.

This would be crossing the Rubicon for openEHR, on the way to defining a virtual machine, which will make it a very unpopular option compared to FHIR. I appreciate no one else may see it that way, nor worry about the implications as much as I do, but this is my opinion.

I attempted to define AQL behaviour in the context of openEHR data and not go beyond that in terms of formalising it. In my humble opinion, you cannot half formalise it, it just confuses readers/implementers and a full formalisation is a mighty challenge, which I’ve done once as part of research. Talking about projections and universes requires the reader to apply those concepts to the task at hand to implement the spec and you have to make sure that there is not a too large gap between the formalism you use and the actual behaviour. In other words, if you’re going to use another formalism, it has to be at the right level of abstraction in terms of its proximity to semantics of AQL, but again, this is subjective and my view of using formalisms.

Re being model agnostic: no one clearly wrote this, but my understanding is that demographics is now implicitly considered a goal for AQL, which I assume is what the points about being model agnostic are referring to. Even if we’re now talking about RM + demographics, my view of AQL would be a query language that works on RM + demographics data, which is a limited scope that can be defined in terms of an object model that can be represented with UML. I think we had a conversation in the last online SEC call about introducing a SYSTEM concept that sits at the root of EHR and Demographics together to address this concern.

At this point, I am not sure I have more to offer in terms of the way forward and I made all the points I’d like to make. So I’ll let the rest of the SEC to solidify how we’ll define AQL. I appreciate you taking the time to follow discussion and points made.

thomas.beale · 24 February 2020 14:14

Actually, in my view, AQL should work for any model. I am not clear on what semantics would make it specific to openEHR RM, or even just openEHR RM + Demographics. It should work the same way for archetyped data based on any model.

So the question just becomes: how does AQL know about the model underlying the queries written? Either the model definition is hard-wired in to an implementation, which is often a quick and reasonable way to get going - but it means that for every model, you have to do more hard-wiring - or it is defined in a more generic way.

As everyone who has ever tried this knows, you always get to a point where you say ok, enough, let’s do this generically, and you go to a generic model-representation approach. So when you do that, you move all the hard-wired definition semantics out to the generic representation - which for us is BMM, or it could be straight UML/XMI, or even JSON-schema, if you can get those things to behave properly. But the representation is in a way just a detail; the point is to have the model semantics in a place that can be interrogated by an AQL processor, so that your AQL semantics are clearly separated from your model semantics. And with a smart model representation (like BMM is aiming to be), you can include all the meta-data you need, even for tricky things like logical containment represented by reference relationships (to be fair, we can even do that in UML, with stereotypes, I’ve just never put it in the model).

Now, since most AQL implementations to date didn’t try to deal with anything more than openEHR RM data, the above issue was not as apparent as it would have been if we were being more agnostic. But as soon as we try to solve issues like the CONTAINS semantics going over reference relationships, it becomes clear that the internal hard-wired representation of the model is deficient. Now, implementers could just go and hard-wire that further info (which I am not against, BTW), but regardless of where it is concretely expressed, it is not logically anything to do with AQL; it is to do with the model of the data that some particular AQL queries are targeted at. So logically, it is not part of the AQL spec.

I am not convinced I am saying anything different to anyone else here

sebastian.iancu · 24 February 2020 15:15

I’m not sure if, at this time, we need a generic formal definition, other than for only the sake of making this in a nice proper way.

I think we just need to describe AQL so that we know how to apply it for EHR + Demographics, plus perhaps TP ?! Should this be made in a generic, model-agnostic way? Sure, why not (do it right)…? But I think this (BMM) will only describe the AQL semantics, leaving out the architectural aspects of how EHR+Demographic should (or should not) work together, or how TP might be involved, etc.; see also the discussion we had about the System concept.
Therefore I wonder if your effort @thomas.beale, to have such a BMM definition, would be valued accordingly (I have no idea how much time that would take for you). But, as an AQL implementor, I couldn’t just use BMM and have all things automatically done (like an AQL processor, or an AQL runner, etc…); I would still have to rely on hard-wiring in my implementation. And this is different from an ADL app, which can use BMM, isn’t it?

But perhaps I’m not the right person to comment on this, as I’m implementing neither AQL nor BMM at this point, so I could be biased, or just plain wrong…

bna · 24 February 2020 15:28

Yes. This is IMHO true.

thomas.beale · 24 February 2020 15:53

Well it will go into BMM soon anyway, because BMM is undergoing a major revamp, which is close to finished, to make it do expressions, and full model representation (you can look in the working version if you like). But the logical containment semantics will also go into the UML models of openEHR - it’s something I should have done long ago, but it’s not hard to do.

Those changes will take time to filter through to libraries and tools, i.e. to be directly usable.

My point is, regardless of when the implementation of the change to BMM gets done, or even if you don’t use BMM, the semantics of model relationships of particular models should never be part of the AQL specification, they should always be stated in the model specification.

So even if right now, implementers do just hard-wire the semantics in, this should always be understood as a pragmatic implementation choice, not a specification-level thing.

Hopefully this is a bit clearer. I’ll show how this can be specified in the UML shortly.

sebastian.iancu · 24 February 2020 15:56

yes, it gets clear

Seref · 24 February 2020 16:16

this is a statement I’d like to understand better. My question to you is: what is the model specification? Let me give you an example that seems to run counter to what you said above:
From a query perspective/in the context of query semantics, the composition references in the EHR are interpreted as “contained” and this interpretation is what an implementation of CONTAINS keyword is supposed to implement.
See, I just defined some AQL behaviour, and the model relationship, if I get you correctly, is specified in the AQL because it is a relationship that is meaningful in the query use case. You cannot express this relationship in the model specification, if the model specification means RM, because that’d only be possible if you said something along the lines of

…The references to compositions in the EHR are interpreted as containment in the context of querying…

at which point you’d end up pushing a higher level aspect, querying, into a lower level, which is something we have opposed together for years.

You can extend BMM to express this containment relationship and that being in the ITS, it is OK if this is what you mean by model in that statement above.

however, you’re still left with the task of defining the relationship between EHR and compositions in the AQL specification to explain and therefore formalise AQL.

How can you do this if you don’t refer to RM?

if you say something along the lines of

CONTAINS interprets the composition references in EHR as per BMM’s…

the BASE part of openEHR in the left hand side of the diagram you pasted is no longer self contained and, just like the assumed types of platforms, BMM is now a prerequisite to implement openEHR. Am I correct to assume you’re considering moving on from UML to BMM completely at some point? BMM, being a metamodel, is already more computable than UML, and that fits into your current train of thought as far as I can see.

I’d love to understand where I’m wrong in the above picture I drew. I cannot seem to get my head around your statement re the model relationships to begin with.

thomas.beale · 24 February 2020 18:43

Right - but the EHR → COMPOSITION relationships could be in a model containing the definitions HOUSE and ROOM, with the same kind of reference relationship between the two, and also the same logical compositional relationship. There is nothing special about the EHR / COMPOSITION relationship - indeed, there is nothing in the entire openEHR RM that isn’t in other models. So the specification of AQL should only know about kinds of relationships in general, not something about a model called ‘openEHR RM’.

An AQL query processor has to look up some model description to find out these kinds of things. For example, PARTY.relationships is not a containment relationship; the target PARTYs aren’t deleted if you delete a PARTY whose relationships point to them. How does an AQL query processor know if the keyword ‘CONTAINS’ is even allowed between two classes? It has to discover that the two classes are in a transitive containment relationship. Determining that properly means being able to query a Model object (e.g. a UML model, a BMM model etc) and make a call like:

    has_containment_relation (classA, classB: String): Boolean

e.g. you might want to ask if COMPOSITION can ‘contain’ CLUSTER, which it can, but not directly.

The fact of containment is not itself anything to do with querying, it’s just a definitional fact of the model semantics. There are no doubt other tools, unrelated to querying, that could use this information. E.g. imagine a tool that can process a JSON representation of an EHR as a (probably giant) in-line hierarchy. A consumer of that data could determine a) that it was logically valid, and b) how to instantiate it properly into EHR and COMPOSITION objects.

Also, to be clear, ‘CONTAINS’ is nothing to do with EHR, COMPOSITION or any other specific types. CONTAINS is just an operator that asserts a (possibly transitive) compositional containment relationship between instances of two types.

In a model-driven implementation, the AQL processor sees the class name ‘EHR’, the keyword ‘CONTAINS’ and the class (say) ‘CLUSTER’, and (somehow) it knows this is openEHR, so it looks up the openEHR BMM and uses a function like the one above to ask if CLUSTERs can indeed really be logically contained inside EHRs. The BMM will have some marker on the various relationships (like the UML black diamond) that says ‘is_composition’, and that function will assess those relationships, and discover that yes, there is a transitive compositional containment path from EHR to CLUSTER.
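
A minimal sketch of such a lookup, in Python, assuming a toy model in which each relationship carries an is_composition flag. The Relationship and Model classes and the heavily simplified class graph are hypothetical; this is neither the BMM API nor the actual openEHR RM, only an illustration of a transitive containment check driven by model meta-data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    is_composition: bool  # plays the role of the UML black diamond


class Model:
    """Toy stand-in for a model description that a query processor can interrogate."""

    def __init__(self, relationships):
        # Index only the compositional relationships, keyed by source class.
        self._children = {}
        for rel in relationships:
            if rel.is_composition:
                self._children.setdefault(rel.source, set()).add(rel.target)

    def has_containment_relation(self, class_a: str, class_b: str) -> bool:
        """True if class_b is reachable from class_a via composition links only."""
        seen = set()
        stack = [class_a]
        while stack:
            current = stack.pop()
            for child in self._children.get(current, ()):
                if child == class_b:
                    return True
                if child not in seen:  # guards against cycles such as CLUSTER -> CLUSTER
                    seen.add(child)
                    stack.append(child)
        return False


# Deliberately simplified relationships, for illustration only.
model = Model([
    Relationship("EHR", "COMPOSITION", is_composition=True),
    Relationship("COMPOSITION", "OBSERVATION", is_composition=True),
    Relationship("OBSERVATION", "CLUSTER", is_composition=True),
    Relationship("CLUSTER", "CLUSTER", is_composition=True),
    Relationship("PARTY", "PARTY", is_composition=False),  # like PARTY.relationships
])

ehr_contains_cluster = model.has_containment_relation("EHR", "CLUSTER")
party_contains_party = model.has_containment_relation("PARTY", "PARTY")
```

Nothing in has_containment_relation mentions EHR, COMPOSITION or any other class by name; swapping in a HOUSE/ROOM model would only change the list of relationships passed to Model.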

AQL thus has to know nothing at all about the openEHR RM to process this instance of ‘CONTAINS’. Everything I said would be exactly the same if the RM and archetypes and queries in use were FHIR or NIEM or who knows what.

Am I getting anywhere?

bna · 24 February 2020 19:03

What about the assumptions made in the following topic, how can you interpret such logic without knowing the RM?

https://discourse.openehr.org/t/aql-same-logical-aql-with-different-syntax/366/2

thomas.beale · 24 February 2020 19:11

Same as for any RM: you look up the RM definition and then validate or execute the queries.

bna · 24 February 2020 19:22

The assumptions made in the example are correct?
(BTW : This is an honest question since it took us some time to figure out that it had to be interpreted this way)

thomas.beale · 24 February 2020 19:28

If you mean on that other post, yes, that all looks correct. But normally you would not mention the VERSION classes in a query.

bna · 24 February 2020 19:34

To be sure : You agree that they will all produce exactly the same result since they are identical operators on the data?