AQL (and related) documentation

ian.mcnicoll · 3 March 2020 10:33

As we have started to get into the detail of tidying-up the AQL specification and related issues like datatype ordering and comparison rules, there has been a pretty lively discussion here about the best way of documenting the required behaviour of AQL in the context of the openEHR Reference models.

I think we now have to make a decision so that we can make tangible progress. Broadly speaking two approaches have been suggested, and debated.

1. Define and document the rules and behaviours directly in terms of openEHR classes, inside the current specifications, e.g. as part of the UML, or indeed defined directly in BMM, using somewhat abstract modelling language.
2. Define and document the rules and behaviours as a separate part of the specification, working much more directly and in human-language terms along the lines that @Seref started here, but referring, of course, to the existing specifications. I think of this as a document profiling the generic AQL specification for use with openEHR RM - perhaps not the correct use of ‘profile’, but we know that we have to give explicit advice about how AQL should work in openEHR CDRs.

I very strongly favour option 2, and it was my reading of the discussion that this was also the strong preference of the current CDR implementers. The sense I got was that, while everyone understood some theoretical benefits of option (1), none of the current CDR implementations make use of BMM or intend to do so.

There are, I believe, also disadvantages in making the specifications even more difficult for those looking on from outside. Whilst it may not be important for newbies to understand the AQL spec, we should at least try to make it accessible for those coming into our world. BMM may well be where we head (particularly in the tooling space) but it is not what most CDR implementers seem to need right now.

I think we need to decide - can we get some kind of consensus / decision here?

sebastian.iancu · 4 March 2020 09:38

I agree, with the remark that 2.) does not exclude the 1.), so that it can come at a later stage, and might be still beneficial for some of the community.

thomas.beale · 4 March 2020 09:42

There still seems to be some confusion here. Including openEHR RM specific semantics in the definition of AQL (i.e. the spec) is something we absolutely should not do.

This is not predicated in any way upon using BMM. BMM is just one meta-model method to implement AQL so that its semantics are correct. There are other ways, including hard-wired libs etc. None of this should be visible in the AQL spec.

ian.mcnicoll · 4 March 2020 16:39

I accept that there should be an rm neutral expression of aql. But we also badly need to pin down how aql should behave in the context of openehr. The latter cannot be derived automatically from the former.

The only point at issue is how we document expected behaviour for aql in the context of openehr ref models. You can assume that the separate model neutral aql spec will exist separately. Two documents required.

thomas.beale · 4 March 2020 17:14

Still don’t really see the problem, at least not if we are talking about how ‘CONTAINS’ works. Because all that is needed is that an AQL processor has some way (doesn’t even have to be specified, for now) of determining which relationships in an underlying information model are logically ‘composition’. Nothing needs to be mentioned about openEHR anywhere at specification level.

ian.mcnicoll · 4 March 2020 22:39

That is not the message that was coming from the implementers that responded (most of them). There is sufficient wiggle-room in the high-level AQL spec that they see the need to be explicit about how e.g. CONTAINS works in openEHR. Seref has already pointed out that EHR CONTAINS COMPOSITION is quite different (practically) from COMPOSITION CONTAINS, and there are many other scenarios where the correct (or simply agreed) behaviours cannot be reliably divined from a generic AQL spec - just look at Seref’s teasers. The responses from experienced implementers have all been different, and no-one has really argued that the alternative viewpoints are actually wrong in principle. So the behaviour is legitimately open to interpretation, but people have an appetite to reduce or eliminate that variation (for openEHR). The generic AQL spec stands, but from my reading of the discussion it is not sufficient to direct the consistent behaviour that implementers want to agree.

thomas.beale · 4 March 2020 23:52

With respect to CONTAINS, I’m afraid this is in error. There is no need for any reference to any particular RM for AQL semantics to be specified. What can be generically specified is that CONTAINS can be used both over direct (by-value) model associations and also over indirect (by-id) model associations, where logical containment is defined in a model. For the latter case, an AQL processor would have to have a method provided of resolving an id to a target object. None of this is specific to openEHR RM; it can exist in any model, and is routine in modelling.
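As a rough illustration of the point above, here is a minimal Python sketch of containment matching that follows both by-value parts and by-id references through a supplied resolver. It is not taken from any AQL specification or implementation: `Node`, `Ref`, `descendants`, `contains` and the `resolve` callback are all hypothetical names, the type names are used only as example labels, and the sketch assumes the containment graph has no cycles.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List


@dataclass
class Ref:
    """An indirect (by-id) association to a logically contained object."""
    target_id: str


@dataclass
class Node:
    """A generic model object; type_name stands in for a class name."""
    type_name: str
    node_id: str
    # Each part is either a Node (held by value) or a Ref (held by id).
    parts: List[object] = field(default_factory=list)


def descendants(node: Node, resolve: Callable[[str], Node]) -> Iterator[Node]:
    """Yield every object logically contained in node, following by-value
    parts directly and by-id references via the resolve callback."""
    for part in node.parts:
        child = resolve(part.target_id) if isinstance(part, Ref) else part
        yield child
        yield from descendants(child, resolve)


def contains(node: Node, type_name: str, resolve: Callable[[str], Node]) -> List[Node]:
    """Return the contained objects of the given type, in traversal order."""
    return [d for d in descendants(node, resolve) if d.type_name == type_name]


# Example data: the root holds one child by id; that child holds a leaf by value.
leaf = Node("OBSERVATION", "obs-1")
doc = Node("COMPOSITION", "comp-1", parts=[leaf])
store: Dict[str, Node] = {"comp-1": doc}  # stands in for a data access layer
root = Node("EHR", "ehr-1", parts=[Ref("comp-1")])

found = contains(root, "OBSERVATION", store.__getitem__)
print([n.node_id for n in found])
```

Nothing in `descendants` or `contains` knows about a particular model; only the example data and the resolver do, which is the separation being argued for here.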

The teasers are something else - they show that people have different ideas of what the current projection semantics are, because a) they are not yet formally defined and b) they are not necessarily intuitive such that everyone who does the thought experiment gets the same answer. What the semantics should actually be is a question, and @Seref has thought more about this than anyone else in his PhD work. I expect the final semantics we define will be based on that work or at least heavily informed by it. But that has nothing to do with the containment question, or openEHR RM in particular.

The spec certainly needs a lot of work on the semantics to be done. It doesn’t need to refer to any particular RM to achieve this.

ian.mcnicoll · 5 March 2020 00:38

I’m afraid that is not the message I am getting from the implementers (I may be wrong).

I asked the question and have been told repeatedly that they want to frame further documentation directly in the context of the openEHR RM. That is the work that Seref suggested at the top of the other thread, and it was supported by all of those who have implemented, or are intending to implement, AQL on openEHR. I am not aware of anyone currently implementing AQL on other models.

We are wasting time now IMO. @bna @Seref @matijap - please express a preference of the options above so we can actually do this work in whatever mode is preferred. Or restate the options if I am not expressing these correctly.

Seref · 5 March 2020 09:04

My position has been clear from the beginning but happy to repeat:

I think AQL is a query language specific to openEHR RM, with a potential, but not necessary extension to openEHR demographics.

Its behaviour should be defined in reference to RM types and structure implied by those types. I don’t want to use a meta model for this definition, but instead describe the expected behaviour using RM types (EHR, COMPOSITION, …)

I gave my reasons for my position many times, so I won’t do that again.

thomas.beale · 5 March 2020 10:36

Again, there is no need for any reference from AQL to any particular reference model. That’s not an opinion, it’s a formal fact. I’m not sure why this is not clear. The idea of logical containment being realised in different ways (direct reference, id-reference) is absolutely standard in IT, it’s not specific to any particular model. There is no reason we would not make AQL work for demographics or any other model. The semantics are identical.

If current implementers want to only make it work for the openEHR RM, that’s fine - I might do the same practically. But that’s an implementation question, not a specification question.

I don’t know how to make this any clearer. AQL (or EQL, as it was when we published it in 2007) was always intended as a general language. There’s no need to make it otherwise now.

ian.mcnicoll · 5 March 2020 10:43

“If current implementers want to only make it work for the openEHR RM, that’s fine - I might do the same practically. But that’s an implementation question, not a specification question.”

It is an openEHR specification question - we need to have this to be able to do conformance testing and cross-vendor querying. If people see it as part of ITS, I’m happy but we need it, in the openEHR specs.

thomas.beale · 5 March 2020 10:50

To perform conformance testing of AQL implementations, we need to know that the AQL implementation knows how to correctly process the CONTAINS statement w.r.t. an underlying model - any model. The same goes for a lot of other semantics implied by Seref’s teasers, none of which are specific to any model.

No doubt for the moment, we will specify actual conformance test sets and results using openEHR RM. But later on we will use openEHR demographics. And later, something else, probably TP queries and structures. Specific tests & regression results can certainly be specific to a model; the spec of the language semantics they are testing cannot.

thomas.beale · 5 March 2020 10:56

That doesn’t make sense to me. There is nothing special about EHR or COMPOSITION. What we have to do is specify AQL containment semantics (which is what we are specifically discussing here) in terms of the kinds of relationships they may apply to, viz: direct-reference and indirect-reference. Indirect references need a resolution mechanism, but that can be assumed to be there in the data access layer, since it is always needed to make the system function properly anyway. There is no need to mention any particular model or particular classes from that model (you might use them as examples, which is another thing entirely).

ian.mcnicoll · 5 March 2020 10:59

Which may be true in theory (I am not qualified to comment) but what I am hearing very consistently from implementers is that there is no appetite to do it that way. They want to work on a common set of rules based on known experience with how AQL can be and has been applied to the openEHR RM, not to work at an abstract ‘how any model should work’ layer.

These are the people who have to make this work; I really think we need to start listening to them. Ultimately openEHR will not survive without their support in kind and in fees.

We can carry on this argument indefinitely but most of us have real projects and customers who expect us to deliver on the promise of vendor-neutral querying.

thomas.beale · 5 March 2020 11:10

I have not seen any statements to the effect that AQL should be specified in terms of one particular information model, other than by @Seref . I doubt very much whether that is the consensus. If it were, I’m afraid it doesn’t make it any more correct. It also would complicate, not simplify the AQL specification. I have no idea why anyone would want to substitute complexity and unclarity for the opposite.

Doing things properly, clearly and simply is the aim we should always have - this is what makes implementation easier, and reduces long-term maintenance costs. It’s why archetypes work. Half-baked hacking we can leave to SDOs and other orgs who don’t want to do their homework - with that approach you get the huge compendium of impossible-to-integrate special case junk like this.

Doing AQL properly is not in any way a theoretical consideration, it’s how normal, real-world engineering is done.

Doing things properly in a shared platform development is really the only option. Hacking just leads to non-reusability, fragility, lack of maintainability and is contrary to commercial interests, let alone the interests of correctness (i.e. clinical safety).

ian.mcnicoll · 5 March 2020 11:43

From the other thread

@Matija

We need a good (i.e. understandable and strict) human-readable specification so that most questions like the ones that @bna provides a constant stream of (and now he revealed why) can be answered simply by pointing out a sentence or paragraph in the specification (that can hopefully be interpreted only in one way).

@bna

Current use of AQL is limited to querying EHR RM based data. I agree with @matijap that we need some clarifications in text which cover the use-cases that customers or clients will face. If at some time in the future we do more work on DEMOGRAPHICS or TASKPLANNING, then we may add text to clarify such use-cases. I think @sebastian.iancu will provide some good use-cases for DEMOGRAPHICS and we will be eager to learn about their experiences.

I have asked Birger for his views from an ehrBase perspective, as he is not a SEC member, and he has said much the same thing.

No-one is hacking

“Doing AQL properly is not in any way a theoretical consideration, it’s how normal, real-world engineering is done.”

but it is not how all of the current successful implementations have been done, and I think you need to be careful of suggesting that the current CDRs are not doing ‘normal, real-world engineering’. They may not be doing it the way you imagine it could or should be done ‘properly’, but that is not how the current CDR vendors have successfully implemented ‘real-world engineering’, or how they wish to go forward.

“Doing things properly in a shared platform development is really the only option. Hacking just leads to non-reusability, fragility, lack of maintainability and is contrary to commercial interests, let alone the interests of correctness (i.e. clinical safety).”

@Thomas I think you need to choose your words more carefully. You can justifiably accuse me of having a hacker mindset but not others here, who have real-world successful deployments and a desire to push forward, just not necessarily in the manner you have proposed.

thomas.beale · 5 March 2020 12:01

I have not said any such thing. On the contrary, they are all quite obviously doing real-world engineering, with all the compromises that entails. I am, after all, an actual engineer; I know engineering when I see it.

What I am talking about is how we (in the SEC) perform the specification work properly. Hacking is what we observe in the SDO world, and in the majority of application development in IT in general. Everyone knows that most software is bad. Most standards (in e-health at least) are bad too.

All I am saying is that we should not start ‘hacking’ at the specification level. If we don’t do specifications properly, we are done. We get incomprehensible junk. The industry is already drowning in that (I get paid to study it). I’m not interested in creating more.

I am not accusing anyone of hacking, on the contrary, the openEHR community has been characterised by strategic, long-term thinking, and has embraced good engineering. I am just saying that we (collectively) should not start down that course.

On the question whether there is a clear consensus that the following is true:

we should make the AQL specification specifically dependent upon the openEHR RM, rather than being general (as it now is);
we should do this even though it is perfectly possible to specify it generically;
if we change the RM, then it is OK for the AQL spec to break (the logical consequence of the above);
if we expect to want similar querying on e.g. demographics, then we are up for writing another query language specification;
we intend that no general-purpose AQL processor be implementable.

I have not heard anything like a clear consensus, or even clear discussion on the above from implementers. That is surely yet to come.

Seref · 5 March 2020 12:39

Leaving your view of my suggestions aside, people who took the time to respond to last few weeks’ discussions deserve more respect than that.
But then again, is there a point?

matijap · 5 March 2020 12:40

I’m standing somewhere in the middle ground here, actually.

I agree with @thomas.beale (and therefore disagree with @Seref) that AQL is a generic language with generic execution rules that can be applied over any model. I agree that we should not add arbitrary openEHR-RM-related execution rules to it, like “CONTAINS INSTRUCTION should behave in a slightly different manner than CONTAINS OBSERVATION” just because that would be convenient in one special case.

However, where we could and indeed should (and will) define such exceptions (like “data from VERSIONs whose lifecycle state is ‘incomplete’ is not returned unless the v/lifecycle_state attribute is directly referenced in any part of the query”), is a document that describes behaviour of AQL when invoked in CDRs over openEHR RM-based data. I think such a document must exist if we don’t want to lose user base due to even-steeper-than-need-be learning curve due to (over)abstractness of the standard. I will even argue that composing such a document, while formally unnecessary and incomplete, has a priority over the abstract AQL specification.
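To make the example rule above concrete, here is a minimal Python sketch of how a CDR-side filter might apply it. This is only an illustration under stated assumptions: `visible_versions`, the dictionary layout and both query strings are hypothetical, and the substring check is a deliberate simplification of "directly referenced in any part of the query".

```python
from typing import Dict, List


def visible_versions(versions: List[Dict[str, str]], query_text: str) -> List[Dict[str, str]]:
    """Apply the example rule: hide 'incomplete' versions unless the query
    mentions lifecycle_state. The substring test is a simplification; a real
    processor would inspect the parsed query rather than its text."""
    if "lifecycle_state" in query_text:
        return list(versions)
    return [v for v in versions if v["lifecycle_state"] != "incomplete"]


versions = [
    {"uid": "v1", "lifecycle_state": "complete"},
    {"uid": "v2", "lifecycle_state": "incomplete"},
]

# Hypothetical query strings, used only to drive the check above.
plain = "SELECT c FROM EHR e CONTAINS COMPOSITION c"
explicit = "SELECT v FROM EHR e CONTAINS VERSION v WHERE v/lifecycle_state/value = 'incomplete'"

print([v["uid"] for v in visible_versions(versions, plain)])
print([v["uid"] for v in visible_versions(versions, explicit)])
```

The rule sits outside both the generic query semantics and the model itself, which is why it would belong in a separate document describing AQL behaviour in openEHR CDRs.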

Like I stated earlier, I understand Thomas’ argument that if you have properly formally specified model (RM) and query language (AQL), they can be specified independently and everything will just snap together. The problem is, it won’t: we’ll find a lot of behaviours we won’t like for practical reasons, and then we’ll patch up one or the other spec. I’d prefer a bottom-up approach where we clarify border-cases of AQL on openEHR RM first, and then infer the rules of one and the other after the fact. And there will be some leftover rules that we will not want to include in either, and that will be “Implementation guidelines for AQL in openEHR CDR” or something like that – the rule with incomplete versions might fall into this category.

Seref · 5 March 2020 12:43

this was never suggested. just for the record.