AQL: Clarify types to be supported by ORDER BY

Raised by @bna on slack. I’d suggest basing this on Aql: clarify types supported by relational operators Copying my response from slack:

ORDER BY is closely related to this since we can define it as based on the result of the relational operator x < y. So the RM types within the scope of ORDER by would be defined by the underlying relational operator < This would be a consistent way to define which types ORDER by would work on, based which types relational operators <,> and = work on

1 Like

As a first simplification, we could say “ORDER BY could be specified only over simple types like numbers and strings” and put some examples in the spec. Then discuss if that is enough or if we really need to order by an openEHR DV, which have multiple attributes and the AQL processor will really be doing a mapping to an internal attribute that will still be a simple type.

IMO the only thing we gain from ordering by a DV is to write a little shorter path, but at the price of making AQL processors, and the AQL spec, more complex.

Maybe not - as long as the ‘<’ operator is defined on any type (which it is in the openEHR DV_ORDERED types), then any ordering is easy enough to implement, assuming instantiation of the relevant DV objects.

Which I think is what @Seref was saying…

yep. what I’m saying is define order by in terms of ‘definition(s) of relational operator(s)’ and you have both consistency and simplicity.

I agree, but that will take some time for some types that have “accuracy” ans discussed on today’s meeting. Maybe we can break this into two issues, first specify ORDER BY for simple/assumed types **, then when we agree on how to order DVs, add support to DVs in AQL’s ORDER BY.

** BTW, I’m not sure is the assumed types also have some definition for “<” and “==”, or if we are relying on the programming languages for that (most are reasonable about == but some have differences, and might generate inconsistent results, something to investigate at least in the implementation technologies we have: Java/Groovy, .Net, PHP, Ruby, Python(?), Eiffel (I know one guy working on this one :slight_smile: )).

Yes, when handwaving. No, when real world hits you with its performance requirements. I agree with @pablo’s statement that there’s not much value added for customers, so I don’t think this pays off.

1 Like

I fully support this. The reason why I am boring you with this is the requirements to be able to ORDER houndred of millions of records . This is when you need to be careful about the details. The first mandatory step is to define exactly which attributes to use for ordering and also for equal operations.

The first level is to agree on: Do we support ORDER and EQUAL operations on non-primitives?

If yes: Which complex openEHR datatype should we support ORDER and EQUAL on and which attributes on the classes should be used? As a side note on this we also need to agree on the serialized form for the EQUAL operator of complex datastructures (ie. :: for DV_CODED_TEXT/CODE_PHRASE ).

As I mentioned in another topic I have come the (temporarily) conclusion to not support ORDER and EQUAL on non-primitives. The reason for this is that we so far are able to agree on the rules for the operations.

Not sure what you mean here. I assume we agree that we want ordering on the Ordered primitive types. In the Ordered type, the '<' operator is defined, and along with '=', that gives you all you need to define all the other operators.

At the specification level, these just state assumed semantics of whatever classes these Foundation classes are implemented by - suggestions are in the type x-ref list.

Now it may be that whatever library is being used say in Java doesn’t have a ‘<’ operator for Iso8601_date, only a lessThan() method. This is where type-checking and mapping of operator aliases to method names is required. In the future, the BMM will specify this, and an AQL implem would just look it up. Also in the future, AQL queries would be validated and compiled to directly executable form, which would avoid this lookup each time it is encountered in the same query.

Now we consider relational operators defined on RM and other classes - classes for which we write the libraries, we need to not only specify the interface but also the semantics for things like '<' on DvQuantity. This would map to a function like:

less_than (other: DvQuantity[1]): Boolean {
    pre_condition
        valid_comparison: is_strictly_comparable_to (other)
    // specify algorithm
}

is_strictly_comparable_to (other: DvQuantity[1]): Boolean {
    // specify algorithm
}

The algorithms are specific to each descendant type of DvOrdered, and they are whatever we agree on. They would then be implemented in the relevant types in the relevant implementation libraries.

I am not saying we need to implement ordering on any DV types, only that if we do, the above is the way to do it: agree on the semantics, add them to the specification, and implement in the normal way.

The only thing that is not solved in the current framework is that we don’t have the latest form of BMM implemented that would include the mappings from operator symbols to method names for classes openEHR defines. For assumed types, I would suggest that a private operator symbol → method mapper is always needed, since the methods may be different in each implementation, since they are supplied by Java, C#, etc and whatever libs are being used for the primitive types, date/times etc.

None of this seems controversial to me.

I mean it’s easy to have two instances of some type or other and implement comparison any way specification defines, and it’s easy to do this for any defined type, and it’s easy to implement a generic method that takes two objects and determines whether they are (according to specification) comparable and, if they are, compare them.

What is hard is to implement this comparison over indices that enable AQL engine to filter through billions of stored compositions in sub-second time. It’s still doable (usually), but it’s not straightforward, not generic, requires a lot of ingenuity and might not be worth it if it brings nothing major to the customers.

1 Like

Ok - so I think you are talking performance optimisation. I would have thought that approaches to that would include being able to access more or less any fragment of code from a main library (in Java, C# etc) from a ‘stored procedure’ invocation, which is possible for some years in Postgresql and I assume other DBs. Not really my area admittedly, but it doesn’t seem to me that performance considerations would change how we specify the semantics? Or if so, I don’t yet see how - are you proposing that we specify a dedicated ‘data comparison module’ for AQL or similar?

NB: I also agree that if some specific semantics like DvCodedText < DvCodedText doesn’t bring obvious value to AQL users, we probably shouldn’t be doing it…

Of course my point of view is impacted by implementation considerations as well. :slight_smile: If I saw it as something very beneficial for users, I’d be pushing it, but I really don’t. We’ll spend a lot of time discussing how to compare quantities or durations, and at the end the query authors won’t be sure any more what goes and will mostly work with primitive types anyway. (Comparing quantities with automatic conversion in expression language would be nice, though. :slight_smile: )

1 Like

Inclined to agree. Anyway, to implement DvQuantity ‘<’ properly, you actually have to define a function like this:

less_than (other: DvQuantity[1]; quantity_checker: lambda (dvq1, dvq2: DvQuantity): Boolean): Boolean {
    pre_condition
        valid_comparison: is_strictly_comparable_to (other, quantity_checker)
    // specify algorithm
}

where quantity_checker is an agent / lambda that takes two DvQuantities and (probably) goes and looks in a units/properties knowledge service to figure out that ‘mm[Hg]’ and ‘kPa’ are in fact comparable, and what is the conversion factor.

The other way to implement ‘<’ on some complex types like DvQuantity is indeed to define a DataValueComparator class, or similar, that knows all about units knowledge base, terminologies, and other resources needed to do proper comparisons.

One of the above approaches is actually the minimum for correct processing (my earlier bit of code conveniently skimmed past this detail :wink:

I am happy to stay well away from this right now…

just for future references, i will cross-link this here: Aql: clarify types supported by relational operators

1 Like