AQL: Formal definition of FROM clause

pablo · 25 February 2020 21:26

I really like the idea of adding something like that as a summary at the beginning of the spec; it’s short and straightforward. I would suggest not using the terms “row” or “rowset”, because they could suggest a particular implementation technology.

If we are not referring to “row” in the sense of relational databases but still want to use these terms, we should define our own “row” and “rowset” semantics in the AQL spec. That is also related to giving an idea of how an AQL processor/evaluator/execution engine should work.

BTW, I like the idea of giving implementation hints in the spec, but I don’t know whether we need to separate these things or create a more complete spec. By this I mean that, to have a complete query spec, we need to define:

syntax
query processing/evaluation/execution
result set

We are very close to having a good “syntax spec”, but we are lacking on the rest.

pablo · 25 February 2020 21:36

@all I have committed an improvement to the FROM definition in my PR: https://github.com/openEHR/specifications-QUERY/pull/5/commits/9517f7b2dedf3dc083b23c656065a97d5114d14b

I tried to remove any reference to a CDR or a specific RM; this is still WIP. While rewriting it, I realized we need to mention that AQL is for any RM, but the RM must comply with a couple of conditions:

it should be an OO RM (since the FROM clause uses class names)
the RM should be used in a dual-model environment (without this we don’t have archetype IDs or paths)
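Both conditions are visible in a single FROM class expression such as `COMPOSITION c[openEHR-EHR-COMPOSITION.encounter.v1]`: `COMPOSITION` is a class name from the OO RM, and the bracketed archetype ID only exists in a dual-model environment. Below is a minimal sketch under a deliberately simplified grammar (class name, optional variable, optional archetype ID predicate), not the full AQL grammar; `parse_class_expression`, `ClassExpression` and both regular expressions are hypothetical helpers written for illustration only.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified grammar for one AQL class expression:
#   RM_CLASS_NAME [variable] ['[' archetype_id ']']
CLASS_EXPR = re.compile(
    r"^\s*(?P<rm_type>[A-Z][A-Z0-9_]*)"          # OO RM class name, e.g. COMPOSITION
    r"(?:\s+(?P<variable>[a-z][A-Za-z0-9_]*))?"  # optional variable, e.g. c
    r"(?:\s*\[(?P<archetype_id>[^\]]+)\])?\s*$"  # optional archetype ID (dual model)
)

# Rough, simplified shape of an archetype ID: publisher-package-CLASS.concept.vN
ARCHETYPE_ID = re.compile(r"^[A-Za-z]+-[A-Za-z]+-[A-Z_]+\.[A-Za-z0-9_-]+\.v\d+$")


@dataclass
class ClassExpression:
    rm_type: str                 # needs an OO RM: FROM names classes
    variable: Optional[str]
    archetype_id: Optional[str]  # needs a dual-model environment


def parse_class_expression(text: str) -> ClassExpression:
    """Split a simplified class expression into its parts, or raise ValueError."""
    m = CLASS_EXPR.match(text)
    if m is None:
        raise ValueError(f"not a class expression: {text!r}")
    archetype_id = m.group("archetype_id")
    if archetype_id is not None and not ARCHETYPE_ID.match(archetype_id):
        raise ValueError(f"malformed archetype ID: {archetype_id!r}")
    return ClassExpression(m.group("rm_type"), m.group("variable"), archetype_id)


print(parse_class_expression("COMPOSITION c[openEHR-EHR-COMPOSITION.encounter.v1]"))
# ClassExpression(rm_type='COMPOSITION', variable='c', archetype_id='openEHR-EHR-COMPOSITION.encounter.v1')
```

An RM without classes would leave nothing for `rm_type` to name, and an RM without archetypes would make the bracketed predicate (and the archetype paths built on it) meaningless, which is why both conditions belong in the introduction.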

In the current v1.0 spec we have “Archetype Query Language (AQL) is a declarative query language developed specifically for expressing queries used for searching and retrieving the clinical data found in archetype-based EHRs.”

We might be covered for point 2 by “…data found in archetype-based EHRs…”, but it is not very explicit. Also, there are still many references to the openEHR RM and to EHRs in general, which constrain the scope of AQL to CDRs only; that is a constraint we need to remove.

Still, I think both conditions (OO model + dual-model) should be explicitly mentioned in the AQL introduction, together with a statement that AQL works on any RM that complies with them.

What do you think?

bna · 25 February 2020 22:30

This is a great answer, @matijap, and you truly show why you are Better.

Response to your two questions:

As I wrote earlier:

What I meant by this is:

To teach the (internal) developers using our backend how to use it properly, and to explain what output to expect from the queries they write. Most developers have very high competence and experience with SQL. They think AQL is the same, but it’s not. That is confusing and, for many, disappointing.
For the core team, who have been working with openEHR since 2010: to find out how a given AQL query is expected to map onto complex hierarchical data structures and produce the result set that both a “semi-clinical-tech” person and a “highly competent openEHR expert” think is correct. We often find a discrepancy here. For example, @ian.mcnicoll had some issues accepting the Glasgow Coma Scale example given here. And we, as a community and SEC group, have not yet found a shared solution for the permutation problem as explained here. For both of the latter examples, we at DIPS are working on some assumptions and query logic that seem to solve them. I will share it as soon as I understand what the developers are currently doing (AFAIK it’s heavy stuff, but I think, or have heard rumours, that the Better guys already have some solution to this).
And of course the ORDER BY issues, such as “should there be a default order if no order is given?” and how to order data types. Similarly, how should NULL be handled when ordering? Do we need an operator to explicitly give the AQL engine hints, e.g. NULL FIRST?
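The NULL question is the one SQL answers with NULLS FIRST / NULLS LAST. As a minimal sketch of what an agreed rule could look like, assuming a row is a plain tuple and NULL is represented as Python `None`, the hypothetical `order_by` below keeps NULL placement independent of the sort direction and relies on a stable sort, so rows with equal keys keep their input order:

```python
from typing import Any, List, Sequence, Tuple

Row = Tuple[Any, ...]


def order_by(rows: Sequence[Row], column: int, descending: bool = False,
             nulls_first: bool = False) -> List[Row]:
    """Hypothetical ORDER BY on one column with explicit NULL placement."""
    nulls = [r for r in rows if r[column] is None]
    values = [r for r in rows if r[column] is not None]
    # sorted() is stable, also with reverse=True, so ties keep input order.
    values = sorted(values, key=lambda r: r[column], reverse=descending)
    return nulls + values if nulls_first else values + nulls


rows = [("a", 3), ("b", None), ("c", 1)]
print(order_by(rows, 1))                    # [('c', 1), ('a', 3), ('b', None)]
print(order_by(rows, 1, descending=True))   # [('a', 3), ('c', 1), ('b', None)]
print(order_by(rows, 1, nulls_first=True))  # [('b', None), ('c', 1), ('a', 3)]
```

Whether NULLs go last by default, and how values of different data types compare, are exactly the rules that would still need agreement; for comparison, PostgreSQL by default sorts NULL as if it were larger than any non-null value.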

All the examples above are more informal and descriptive than a formal modelling definition. Others might have a different view on this. But I must say that for us at DIPS, what is important in the short term is to define the expected rules for the problems raised above, and I cannot see how to work on these kinds of problems without discussing them. So far, my questions related to ORDER BY have been answered with “this can be fixed in BMM by some infix operators”. That’s fine, I think, but for AQL we simply don’t care, because our AQL pipeline is heavily handcrafted and optimized for our specific implementation. All we need is an informal description of what we agree on as an SEC group.

Current use of AQL is limited to querying data based on the EHR RM. I agree with @matijap that we need some clarifications in the text covering the use cases that customers or clients will face. If at some point in the future we do more work on DEMOGRAPHICS or TASKPLANNING, we can add text to clarify those use cases. I think @sebastian.iancu will provide some good use cases for DEMOGRAPHICS, and we are eager to learn about their experiences.

And as a final note to self: @Seref, I am sorry for not responding to your initial post in this topic. I think you made a really good start to the discussion; it was so good that I didn’t have any specific comments on it. It made sense to me.

bna · 25 February 2020 22:31

@pablo, sorry, I lost the context for this reply. Which of my posts are you referring to?

sebastian.iancu · 26 February 2020 08:29

I was referring (hinting) to our (openEHR) row/result-set, or whatever type name we end up with in our specification (because we should have those types specified, including JSON Schema and XSD).

bna · 26 February 2020 08:38

I agree. Rows and columns in this context describe the format of the result of an executed AQL query. They are not implementation specific; the terms are AQL-specific definitions of the models/types/classes used in AQL.
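As a minimal sketch of rows and columns as AQL-level types rather than implementation details: a result set carries its columns (a name plus the AQL path each one was projected from) and positional rows with exactly one value per column. The shape is loosely modelled on the JSON result set returned by the openEHR REST query API, but the class and field names here are assumptions for illustration, not a normative definition.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Column:
    name: str  # column alias from the SELECT clause
    path: str  # AQL path the column was projected from


@dataclass
class ResultSet:
    q: str  # the executed AQL query
    columns: List[Column]
    rows: List[List[Any]] = field(default_factory=list)

    def add_row(self, values: List[Any]) -> None:
        # A row is positional: exactly one value per column, None for NULL.
        if len(values) != len(self.columns):
            raise ValueError(
                f"expected {len(self.columns)} values, got {len(values)}")
        self.rows.append(list(values))

    def to_json(self) -> str:
        # Serialise to a JSON object with "q", "columns" and "rows" keys.
        return json.dumps({
            "q": self.q,
            "columns": [{"name": c.name, "path": c.path} for c in self.columns],
            "rows": self.rows,
        })


rs = ResultSet(
    q="SELECT c/uid/value AS uid, c/name/value AS name "
      "FROM EHR e CONTAINS COMPOSITION c",
    columns=[Column("uid", "c/uid/value"), Column("name", "c/name/value")],
)
rs.add_row(["example-uid", None])  # a missing name becomes JSON null
print(rs.to_json())
```

A JSON Schema or XSD for the same structure would pin down the wire format; the classes above only illustrate the row/column semantics that the spec would need to define.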

pablo · 26 February 2020 09:01

That’s ok, my point is: if we use those terms, we need to define them in the spec or reference an external definition.

In fact, that is the point of this discussion, and also of the other thread about defining the operators for the simple types: we build complex concepts without defining the basic semantics of their internal components.