AQL semantics: separating RM semantics from AQL semantics

thomas.beale · 16 November 2020 15:04

All,
I am afraid I remain in the dark as to why we are talking about making AQL semantics specifically dependent on openEHR RM - I thought we already agreed to the opposite on this thread.

We already have the main RM, Demographics, and TP meta-models published and in use, in BMM, JSON-schema, and XSD formats. The class relations can just be looked up, as described in the above-referenced thread.

Not doing so makes an AQL engine fragile, whereas if the relations are looked up, the engine will work on any data. Not only that, but with hard-wired rules every addition we make to the RM will require tinkering with the AQL semantics spec, and with everyone’s AQL engines, to write in special rules for each new relationship (example: the addition of EHR.folders). If anyone can explain why we wouldn’t want to do this generically, I would be very appreciative.

NB: I don’t mean to say that today’s implementations, which are probably hard-wired, should immediately (or maybe ever) change - I’m just talking about how we specify the semantics of AQL. I really don’t see any reason why the very nice work @Seref has done would not be generic to any model - openEHR RM is not special in any way.

Generic rules will just be of the form:

if is_composition (C1, C2) then action_aaa

if is_reachable (C1, C2) then action_aaa

where action_aaa is some action to do in the query processor. Since is_composition() and other such lookups can be done very easily against the meta-models we already have, I’m not sure why we would document anything else. It doesn’t matter whether C1 happens to be EHR or any other class.
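A minimal sketch of how such lookups could work against a meta-model, purely for illustration. The `META_MODEL` dictionary below is a toy, hypothetical fragment (it is not the published BMM, and its attribute names and containment flags are assumptions); only the names is_composition and is_reachable come from the rules above.

```python
from collections import deque

# Toy meta-model fragment (hypothetical, NOT the published openEHR BMM).
# class name -> list of (attribute name, target class, is_containment)
META_MODEL = {
    "EHR": [("compositions", "COMPOSITION", True), ("folders", "FOLDER", True)],
    "COMPOSITION": [("content", "OBSERVATION", True)],
    "OBSERVATION": [("data", "ELEMENT", True)],
    "FOLDER": [("items", "COMPOSITION", False)],  # assumed to be a reference
    "ELEMENT": [],
}


def is_composition(c1, c2, model=META_MODEL):
    """True if class c1 directly contains class c2 via a containment attribute."""
    return any(
        target == c2 and contained
        for _attr, target, contained in model.get(c1, [])
    )


def is_reachable(c1, c2, model=META_MODEL):
    """True if c2 can be reached from c1 by following one or more attributes."""
    seen = {c1}
    queue = deque([c1])
    while queue:
        current = queue.popleft()
        for _attr, target, _contained in model.get(current, []):
            if target == c2:
                return True
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return False
```

In this sketch, adding a new relation such as EHR.folders only changes the meta-model data; the two functions, and any rule written in terms of them, stay the same.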

Looking for enlightenment…

yampeku · 16 November 2020 15:35

I think the key point is deciding how/when 2 instances of the same class are different and cannot be removed from the permutation without losing results.

You have to make each class provide some kind of “identity” function, making each row a set of “identity” results. If two rows share all the identifiers then you can ignore one of them as a duplicate. This can be easy to calculate/decide with higher level RM classes (composition, observation…), but it will need to be decided for lower level RM classes (do two ELEMENTs with the same meaning and value represent the same measurement?). The rules become obvious when using these kinds of “identity” functions.
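A rough sketch of the idea, under stated assumptions: instances are plain dictionaries, and the field names (`type`, `uid`, `path`, `value`) and the `IDENTITY` table are hypothetical. The choice of identity for ELEMENT here is only one possible answer to the open question above, not a settled rule.

```python
# Hypothetical per-class identity functions, keyed by class name.
IDENTITY = {
    "COMPOSITION": lambda inst: inst["uid"],
    # One possible choice for a low-level class; this is an assumption.
    "ELEMENT": lambda inst: (inst["path"], inst["value"]),
}


def row_identity(row):
    """Identity of a result row: the tuple of identities of its instances."""
    return tuple(IDENTITY[inst["type"]](inst) for inst in row)


def drop_duplicate_rows(rows):
    """Keep the first row for each distinct row identity, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = row_identity(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

Two rows are treated as duplicates only if every instance in them has the same identity, so changing what counts as a duplicate means changing an entry in `IDENTITY`, not the deduplication logic.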

This is not an openEHR specific problem. As @Seref said, it’s present in all hierarchical data. The good thing is that we can decide which set of data would allow us to tell two instances apart.

I don’t really think we can make this fully generic, as you would need additional rules if you introduce new classes (e.g. if is_folder then action_aaa).

thomas.beale · 16 November 2020 16:05

I.e. two distinct instances with identical content? It could certainly happen with non-identified value instances e.g. DV_TEXT or similar. However, it’s not just a question of the instances themselves, but what relation they are at the end of e.g. LOCATABLE.name or COMPOSITION.category. That is equivalent to saying that their paths have to be distinct, even if their values are the same.

I’m not sure if this is the main problem in generation of permutations though - I would have said that if the same container is matched once based on what it contains (e.g. Comp containing Obs[BP] and Action[medication admin]) then you don’t try to match it again. For the OR case, I think it’s the same - you match every container (EHR, or COMPOSITION) if a hit on any of the sub-parts is encountered, but only once. Those containers are your ‘FROM’ data set. Then you do the SELECT extract on that.
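To illustrate the OR case as described, here is a small sketch. The container layout (an `id` plus a list of `parts`, each with an `archetype` label) and the predicate functions are hypothetical, chosen only to show that a container enters the ‘FROM’ set at most once, however many of its sub-parts hit.

```python
def from_data_set(containers, part_predicates):
    """Return each container at most once if any part satisfies any predicate."""
    matched = []
    for container in containers:
        # any() stops at the first hit, so a container is never added twice.
        hit = any(
            pred(part)
            for part in container["parts"]
            for pred in part_predicates
        )
        if hit:
            matched.append(container)
    return matched


# Hypothetical predicates standing in for Obs[BP] and Action[medication admin].
def is_bp(part):
    return part["archetype"] == "BP"


def is_medication_admin(part):
    return part["archetype"] == "medication_admin"
```

The SELECT extract would then run over the returned containers, rather than over one row per matching sub-part.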

yampeku · 17 November 2020 08:33

Every time we use LOCATABLE.name for identification purposes, god kills a kitten. Those are exactly the kind of paths we should keep out of the identification function (if we want to do queries that support more than one language, that is).

thomas.beale · 17 November 2020 10:13

Haha true! It was just an example though.