AQL: clarify the types supported by relational operators

I’d suggest the following clarifications to the specification regarding the use of relational operators:

Let’s add a list of the types supported by the relational operators > >= = != <= < and define the expected behaviour.
We can later extend this list, but I think the ones below from the quantity package have high priority because they are used frequently. The type of the value attribute is given in parentheses.

DV_ORDINAL(Integer)
DV_SCALE(Real)
DV_PROPORTION(Real,Real)
DV_QUANTITY(Real,Int)
DV_COUNT(Int64)

DV_DURATION(String/ISO8601) => Non-string-based comparison suggested (string-based comparison works in some cases but not all; let’s discourage it). Otherwise the usual logic and considerations for ISO 8601 dates apply, i.e. time zones etc.
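To make the argument concrete, here is a minimal TypeScript sketch of why string comparison of ISO 8601 durations is unsafe; `parseDurationSeconds` is a hypothetical helper, not part of any openEHR library, and it deliberately ignores years and months, whose length depends on a reference date:

```typescript
// Hypothetical helper: convert a day/time-only ISO 8601 duration to seconds.
// Years and months are deliberately unsupported here, since their length
// depends on a reference date.
function parseDurationSeconds(iso: string): number {
  const m = iso.match(/^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$/);
  if (!m) throw new Error(`unsupported duration: ${iso}`);
  const [, d, h, min, s] = m;
  return Number(d ?? 0) * 86400 + Number(h ?? 0) * 3600 +
         Number(min ?? 0) * 60 + Number(s ?? 0);
}

// String comparison gets the order wrong; numeric comparison does not:
const lexical = "P10D" < "P2D";                                             // true (wrong)
const numeric = parseDurationSeconds("P10D") < parseDurationSeconds("P2D"); // false (right)
```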

DV_TEMPORAL subtypes when accuracy is null
DV_TIME(String/ISO8601) => Same as DV_DURATION
DV_DATE_TIME(String/ISO8601) => Same as DV_DURATION
DV_DATE(String/ISO8601) => Same as DV_DURATION
DV_TEMPORAL subtypes when accuracy is not null
Currently not supported. I have no idea at the moment how we can specify behaviour for relational operators, if my understanding of accuracy is correct. I think there are some cases where value and accuracy can work for comparison, as in: 1.0 with 0.3 accuracy is definitely smaller than 2.0 with 0.4 accuracy, since the [0.7, 1.3] interval won’t overlap with [1.6, 2.4]. But when the intervals overlap, it gets tricky. Happy to be corrected if I’m missing something here.
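The interval argument above can be sketched as follows; the `Accurate` type and `lessThan` function are illustrative names, not taken from the spec:

```typescript
// A value with accuracy a denotes the interval [value - a, value + a];
// '<' is only decidable when the two intervals do not overlap.
type Accurate = { value: number; accuracy: number };

// Returns true, false, or null when the intervals overlap (undecidable).
function lessThan(x: Accurate, y: Accurate): boolean | null {
  const [xLo, xHi] = [x.value - x.accuracy, x.value + x.accuracy];
  const [yLo, yHi] = [y.value - y.accuracy, y.value + y.accuracy];
  if (xHi < yLo) return true;   // x lies entirely below y
  if (yHi < xLo) return false;  // y lies entirely below x
  return null;                  // overlap: comparison undefined
}

lessThan({ value: 1.0, accuracy: 0.3 }, { value: 2.0, accuracy: 0.4 }); // true
lessThan({ value: 1.0, accuracy: 0.8 }, { value: 2.0, accuracy: 0.4 }); // null (overlap)
```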

Behaviour when at least one operand has non null accuracy?
=> exclude from the result, optionally include a warning along the lines of ‘undefined DV_TEMPORAL comparison encountered due to different accuracy of operands’

I’m not sure if there is any value in discussing <, > etc. for types outside of the quantity package. The = and != operators have wider applicability, so we can list them as a separate subset, including types such as DV_TEXT, URI etc.
Please list any types you’d like to see support relational operators in AQL.

What about DV_BOOLEAN using just = and != operations?


Yep, certainly. We’ll need to go through the existing (data) types like this, assuming we’re OK with what I’m suggesting above in terms of clarifying relational operators.

I think this thread is related to point r. on my AQL issues report https://docs.google.com/document/d/1g8zOh06LhSNi1yFZWKuBzUX0bJN88r7mKpAFqDNi2JI/edit?usp=sharing

Yes, it is indeed. What I’m suggesting above should address some of your points, but let me get a bit more specific:

  1. to which type each comparison operator could be applied (all the current examples compare only simple types, but don’t say anything about if comparison of DATA_VALUE or other complex types is supported or not)

See above. An extended list based on the above would go into the spec.

  2. define if comparison of path to value (a/b/c > 123) should require the data type of a/b/c to be comparable with the type of the constant 123, and, if it doesn’t match, what kind of result should be expected (this is a rule for AQL processing implementations that should be defined at the syntax level; basically it refers to type checking for paths).

I’m always in favour of type safety. I’m also in favour of having some result that encodes either success + data or failure + error details rather than using things like exceptions. In the context of AQL, which we’re exposing in REST as well, I’d suggest we define the result type/structure to reflect this. I suspect it may already have been done. I’ll check this and update this point.

  3. related to point 2: for the “not equals” comparison, maybe in this case, if the path doesn’t have the same type as the value (a/b/c > 123), the result could be “false”, so it compares the type and then the value. Some languages like JS and PHP have the === and !== (identical and not identical) operators, which check first the type and then the value; both must be equal for the operation to return true.

I’d suggest total failure. If types don’t match, that query should not work, and the result mechanism I mentioned above should say: sorry, fix your types and come back. Falling back to a false result weakens the type checking by allowing the same behaviour for what should be undefined (orange == apple) and normal comparison (1.0 == 1.0). We should not allow this.
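A minimal sketch of what such a success/failure result structure could look like, assuming a discriminated-union shape; the type and field names are invented for illustration and are not part of any openEHR spec:

```typescript
// A result that encodes either success + data or failure + error details,
// instead of silently returning 'false' rows on a type mismatch.
type QueryResult<T> =
  | { ok: true; rows: T[] }
  | { ok: false; error: { code: string; message: string } };

// Hypothetical helper a query engine could use when type checking fails.
function typeMismatch(path: string, expected: string, actual: string): QueryResult<never> {
  return {
    ok: false,
    error: {
      code: "TYPE_MISMATCH",
      message: `operands at ${path} not comparable: expected ${expected}, got ${actual}`,
    },
  };
}

const r = typeMismatch("a/b/c", "Integer", "DV_TEXT");
// r.ok === false; the caller must fix the query rather than receive empty/false rows
```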

  4. related to type checking: when doing path-to-path comparisons (a/b/c > a/b/d), should we also check that the types are comparable?

Paths point at actual instances of RM types so we’re back to actual comparison of data, which should behave with type safety in mind as per my comment above.

  5. are other types of comparison supported? (path to path, variable to variable, value to value, variable to value, etc.)

There is a need to support path-to-path comparison (in other words, value at path1 versus value at path2) in the WHERE clause. One such requirement I had was to make sure that I could distinguish data in a template in the ophthalmology domain. In the visual acuity archetype, I think a cluster was used twice to record data for the left and right eye, and I needed a way to say X should not be equal to Y, where X and Y were the paths of the ‘left’ and ‘right’ values that distinguish the eyes. It is impossible to express this without path comparisons, and the most natural location for them is the WHERE clause at the moment.
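A hedged sketch of what such a path-to-path WHERE clause could look like; the archetype paths below are purely illustrative, not taken from the actual visual acuity archetype:

```
SELECT o
FROM EHR e CONTAINS COMPOSITION c CONTAINS OBSERVATION o
WHERE o/data[at0001]/events[at0002]/data/items[at0010]/value/value !=
      o/data[at0001]/events[at0002]/data/items[at0011]/value/value
```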

  6. the semantics of each comparison operator should be defined in detail, and examples of when the operator returns true or false should be given; this is missing from the spec, leaving the definition open to interpretation.

My suggestion above would define this by referring to the assumed primitive types’ behaviour, which should hold across most implementation technologies (unless someone does this whole thing in JavaScript… hence the ‘most’).

I wrote a table on how we interpret the comparison/ordering of structures in the DV family:

To illustrate how to order DV_DURATION I also added a very simple TypeScript example: https://github.com/bjornna/openehr-conformance/tree/master/aql/case4-ordering/order_duration_example

WARNING - we updated the description on comparative functions in AQL with the following text:

Updated 17 February 2020. After a long discussion we found that the best way to handle this is not to make generic handling of data types. The client has to make an AQL path down to the primitive value, which may be sorted by the underlying platform (operating system, programming language). There is one exception to this: for date and datetime we will make a convenient way to ORDER BY a path to the data type. The reason for this is that ordering on time is so common for health applications.

Some type issues:

  • For string values we use the .NET StringComparer.OrdinalIgnoreCase Property
  • If a path locates different datatypes/primitives the engine will give a non-deterministic result on the ORDER BY or COMPARE function
  • Magnitude is introduced as a function attribute on DV_PROPORTION and DV_DURATION to make comparison and ORDER BY possible for those data types.
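As an illustration of this approach, the client navigates down to a primitive value and orders on that; the path and alias below are invented for the example:

```
SELECT o/data[at0001]/events[at0002]/data/items[at0004]/value/magnitude AS systolic
FROM EHR e CONTAINS OBSERVATION o
ORDER BY systolic DESC
```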

See more here: https://github.com/bjornna/openehr-conformance/tree/master/aql/case4-ordering

This was what I originally argued on the call. However!!

I am persuaded that there is a good argument for working at the datatype element level (as made by Thomas), as this will be required for the expression language (and has already been done, I think by Cambio for GDL), so that we can do things like

if thisCodedText == ‘SNOMED-CT::123456| the text of this thing|’

The rules being that both the code_string and terminology must match

That not only makes expressions much easier to represent in GDL and TP, but will also make AQL statements much easier to use, particularly around terminologies and observables.

Perhaps this need only apply to a subset of datatypes but I think it is worth doing.

@ian - that was my original idea and intention too: to work out some defaults for the comparators. But when looking into it I found that there were more differences than similarities between the DATA_TYPES, and then I didn’t even start considering ELEMENTs with choice and accuracy.

So I ended up with this “defensive” proposal to not do any “magic” handling of the different data types.

I am still open for discussions on the topic of course, but want to share my current position as clear as possible.

BTW - we have some “fuzziness” in the current version of EHR Craft Store (the CDR), but we’ve found that this actually leads to more confusion for clients, and in sum I think it is a cleaner approach to leave it to the client to make the assumptions about which primitives to use for comparison and ordering.

For now that is probably a good approach, but I think we may come back to ‘magic’ handling quite quickly, as, if nothing else, it will be helpful to make sure that people are not mis-querying some of those primitives, e.g. mixing magnitudes with different units.

Accuracy probably needs to stay out of this as it will be very use-case specific, choice also - needs to be handled but at a different level.

We definitely need a condensed form (and associated rules) for CDSS/TP/EL and other expression use-cases.

Just so people are clear, this is very easy to do in a routine way, which is as follows:

  • the type DvCodedText is defined to have a method equivalent_to_code(aCode: TerminologyCode): Boolean (or it might be equivalent_to_code(aCode: CodePhrase): Boolean)
  • the method definition includes a sub-clause of the form alias infix '==' which maps the '==' symbol to this method
  • the AQL parser parses the code, determines that the LHS is a DvCodedText, looks for methods in the BMM on that class that have the '==' symbol as an infix alias, and then looks to see if the RHS - a literal TerminologyCode (Foundation types) is the right type for the method - which it is, assuming the defs above.
  • The method implementation takes care of differing types, e.g. DvCodedText versus TerminologyCode;
  • There can of course be more than one method doing similar equality comparisons, with different signatures, e.g. equal(other: DvCodedText): Boolean will do a direct compare on two DvCodedText objects.

The same approach works for anything, including all the quantity operators - it’s just a case of mapping operator symbols to class methods. There are a few examples in the latest BMM.
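A minimal TypeScript sketch of the dispatch idea above, using the method name from the post (`equivalent_to_code`); the alias registry shape is an assumption for illustration, not a real BMM mechanism:

```typescript
type TerminologyCode = { terminologyId: string; codeString: string };

class DvCodedText {
  constructor(public value: string, public definingCode: TerminologyCode) {}

  // The method the '==' infix alias would resolve to for a TerminologyCode RHS.
  equivalent_to_code(aCode: TerminologyCode): boolean {
    return this.definingCode.terminologyId === aCode.terminologyId &&
           this.definingCode.codeString === aCode.codeString;
  }
}

// A parser could keep a table of infix aliases per class, e.g.:
const infixAliases: Record<string, string> = { "==": "equivalent_to_code" };

const ct = new DvCodedText("the text of this thing",
  { terminologyId: "SNOMED-CT", codeString: "123456" });
const method = infixAliases["=="];
const matched = (ct as any)[method]({ terminologyId: "SNOMED-CT", codeString: "123456" });
// matched is true only when both terminology and code match
```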

My question is very simple and has two levels:

  1. Should we compare/sort on DV_CODED_TEXT level in openEHR?
  2. If yes - which attribute(s) do we compare/sort on?

We need to agree on this first. Then we can add the agreed rules into the BMM and the software packages from different vendors.

Our suggested answers to the questions are as follows:

  1. Should we compare/sort on DV_CODED_TEXT level in openEHR?
    NO - the client must choose for themselves which primitive attribute to order on. This way the client may choose to order on value, defining_code/code_string, defining_code/terminology_id/value, or some combination of them.

This is not a final answer. We can adjust on this but we need to agree on these rules as a community.

Well, conceptually there is no ‘order’ for terminology concepts. However, there are a couple of possibilities for understanding order.

For some terminologies like ICDx, the codes are prefixed with letters, and the letters indicate the main diagnostic areas, e.g. upper-O = Obstetrics. After that, AFAIK, the numbers have no meaning. More generally, terminology codes have no innate order.

The second possibility is the lexical order of the description, which is potentially useful in displays, and certainly useful in searching. So, we could in theory define a less_than() function on DV_CODED_TEXT with an alias infix '<' that implements that idea. Normally, we would order within terminologies, so a lexical ordering would be based on strings like:

icd10::atrial fibrillation
snomed_ct::anxiety
snomed_ct::atrial fibrillation
snomed_ct::etc
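A small sketch of that lexical ordering using plain string sort keys of the form `terminology::text`; the `key` helper is illustrative:

```typescript
// Build a sort key that groups by terminology first, then by description.
const key = (terminologyId: string, text: string) => `${terminologyId}::${text}`;

const items = [
  key("snomed_ct", "atrial fibrillation"),
  key("icd10", "atrial fibrillation"),
  key("snomed_ct", "anxiety"),
];
items.sort(); // default lexicographic sort
// → ["icd10::atrial fibrillation", "snomed_ct::anxiety", "snomed_ct::atrial fibrillation"]
```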

@bna - agree, the challenge here is figuring out a basic set of rules that can be safely applied, particularly where parts of the datatype are ‘indivisible’, e.g. magnitude + units or defining_code + terminology_id.

Answers

1a. Yes, but not in this tranche of AQL.
2. Yes - let’s make a start on DV_TEXT/DV_CODED_TEXT - I think there are close connections with the use of ‘stringified’ elements in GDL/Flat formats and ExpLang.

I think there is considerable value in this well beyond sort/compare in AQL.

@Thomas - that’s not exactly the issue - it is more that, when doing a comparison, I would argue that both terminology_id AND code_string have to match.

I would then say that the ordering should be on defining_code, on the basis that this has ‘some meaning’ for terminologies like ICD, but understanding that it is meaningless for SNOMED-CT.

If folks want to sort on the value - they can do that directly, and clinically that is likely to be cross-terminology or indeed include free-text.

The point is that these are actually the rules we have to agree.

Right, but that’s an equality relation (is_equal(other: DvCodedTerm): Boolean), not an ordering relation (less_than(other: DvCodedTerm): Boolean). Any data type may have an equality relation, but only some will have ordering.

All I want to know is how to implement order or equality on DV_CODED_TEXT.

As we all know, there are a few attributes to choose among. How do we expect implementations to do it?

I am sorry if my question is too trivial. But I need to know, and I want the logic to be the same for all openEHR systems.

I have made a suggestion in the linked page on GitHub. It’s more or less the same as @ian.mcnicoll suggests: sort and compare on the defining code, terminology id first and code second. This could be good enough IMHO.

Do we agree?

From my point of view, the equality relation can easily be defined (and I agree with the approach @ian.mcnicoll and @bna suggest). So this covers = and !=.

Re Ordering:
I would advise against using a sort order that imposes an order on elements that we define as “equal”.
This is what Ian’s suggestion on order (use terminology_id AND code_string for equality and defining_code as a whole) would do at least in edge cases if I understand the suggestion correctly.

Further, it is not clear to me - generally speaking - what ONE purpose sorting of a coded text can really have.
Sure, sorting makes sense and both the sorting options Thomas has documented above make sense - but it just depends on your use case at hand.

I am therefore not sure we should define a standard way of ordering (and thus the <, <=, >=, > for this at all) as part of the specs. Equality is important to define, however ordering seems to be use case dependent to me and sounds like trouble.

In any case, if you really want to define a default order (mainly I think this would be to make life easier in AQL and elsewhere), I think it should probably employ the exact same attributes that determine equality, e.g. terminology_id::code_string
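A sketch of that suggestion: derive the sort key from exactly the attributes that determine equality, so two values that compare equal never receive an arbitrary relative order. Field names mirror the RM, but the code itself is illustrative:

```typescript
type CodePhrase = { terminology_id: string; code_string: string };

// Default sort key built from the same attributes as equality.
const sortKey = (c: CodePhrase) => `${c.terminology_id}::${c.code_string}`;
const codesEqual = (a: CodePhrase, b: CodePhrase) => sortKey(a) === sortKey(b);

// Two DV_CODED_TEXTs with different display texts but the same defining
// code compare equal and share one sort position.
const a = { terminology_id: "SNOMED-CT", code_string: "123456" };
const b = { terminology_id: "SNOMED-CT", code_string: "123456" };
codesEqual(a, b); // true
```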


Hi Sebastian,

I think there is a middle ground, but agree we should only do this where it adds real value, is clinically safe, and there is clear consensus. If not, as Seref has suggested, the AQL engine should say ‘not valid’. We absolutely cannot and should not try to cover every combination, certainly for comparison but also for equality.


I would rather stay away from allowing comparison and ordering of any complex objects, even DVs, in AQL. Are there any sensible use cases for that which cannot be worked around by using slightly more complex (but not incomprehensible, and arguably more specific and to-the-point) AQL? I see one or perhaps two: comparing/ordering DV_QUANTITYs, and comparing/ordering dates and times.

The second one we (and, as I understand, @bna’s team as well) solve by specially handling ISO 8601 primitives so that they get interpreted and actually ordered/compared with time zones taken into account, etc.

The first one could be solvable by defining an AQL function to convert between units based on openly defined rules, which would probably need to disregard any accuracy data, but as @ian.mcnicoll stated, that is to be handled elsewhere. Then one could write where convert(o/path/to/dvquantity, "mmol/l")/magnitude > 100 and similar. Comparison (or ordering) would still be done on a primitive value of magnitude, it would just get converted first (at least in principle; CDRs could optimise this with various kinds of trickery).
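A sketch of the convert-then-compare idea; the conversion table and function are assumptions for illustration (a real implementation would use UCUM rules), shown here with mass units to keep the table tiny:

```typescript
type DvQuantityLike = { magnitude: number; units: string };

// Illustrative factors to a base unit for one dimension (here: grams).
const toGram: Record<string, number> = { g: 1, mg: 0.001, kg: 1000 };

// Hypothetical convert(): normalise to the target unit before comparing
// magnitudes; throws on inconvertible units instead of guessing.
function convert(q: DvQuantityLike, targetUnits: string): DvQuantityLike {
  const from = toGram[q.units];
  const to = toGram[targetUnits];
  if (from === undefined || to === undefined) {
    throw new Error(`inconvertible units: ${q.units} -> ${targetUnits}`);
  }
  return { magnitude: (q.magnitude * from) / to, units: targetUnits };
}

// Comparison still happens on a primitive magnitude, just after conversion:
convert({ magnitude: 1500, units: "mg" }, "g").magnitude > 100; // false: 1.5 g
```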

This gets us to the topic of how to handle run-time errors (like inconvertible units, possibly only on a subset of available data), but that is discussed elsewhere.
