AQL: clarify types supported by relational operators

Yes, it is indeed. What I’m suggesting above should address some of your points, but let me get a bit more specific:

  1. to which types each comparison operator can be applied (all the current examples compare only simple types, but say nothing about whether comparison of DATA_VALUE or other complex types is supported)

See above. An extended list based on the above would go into the spec.

  2. define whether comparison of path to value (a/b/c > 123) should require the data type of a/b/c to be comparable with the type of the constant 123, and if it is not, what kind of result should be expected (this is a rule for AQL processing implementations that should be defined at the syntax level; basically it refers to type checking for paths)

I’m always in favour of type safety. I’m also in favour of having some result that encodes either success + data or failure + error details rather than using things like exceptions. In the context of AQL, which we’re exposing in REST as well, I’d suggest we define the result type/structure to reflect this. I suspect it may already have been done. I’ll check this and update this point.
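For illustration, something like the following (just a sketch in TypeScript; none of these names come from the spec):

```typescript
// Illustrative only: a query result that explicitly encodes success or
// failure instead of relying on exceptions.
interface QueryError {
  code: string;      // e.g. "TYPE_MISMATCH" (hypothetical error code)
  message: string;   // details, e.g. which path and which types clashed
}

type QueryResult<T> =
  | { status: "success"; rows: T[] }
  | { status: "failure"; error: QueryError };

// A comparison like a/b/c > 123 where a/b/c is not numeric would then come
// back as { status: "failure", error: { code: "TYPE_MISMATCH", ... } }
// rather than silently evaluating to false.
```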

  3. related to point 2: for the “not equals” comparison, maybe in this case, if the path doesn’t have the same type as the value (a/b/c > 123), the result could be “false”, so it compares the type and then the value. Some languages like JS and PHP have the === and !== (identical and not identical) operators that check the type first and then the value, and both must match for the operation to return true.

I’d suggest total failure. If types don’t match, that query should not work, and the result mechanism I mentioned above should say: sorry, fix your types and come back. Falling back to a false result weakens the type checking by allowing the same behaviour for what should be undefined (orange == apple) and for a normal comparison (1.0 == 1.0). We should not allow this.

  4. related to type checking: when doing path to path comparisons (a/b/c > a/b/d), should we also check that the types are comparable?

Paths point at actual instances of RM types so we’re back to actual comparison of data, which should behave with type safety in mind as per my comment above.

  5. are other types of comparison supported? (path to path, variable to variable, value to value, variable to value, etc.)

There is a need to support path to path (or in other words, value at path1 vs value at path2) comparison in the WHERE clause. One such requirement I had was to make sure that I could distinguish data in a template in the ophthalmology domain. In the visual acuity archetype, I think a cluster was used twice to record data for the left and right eye, and I needed a way to say X should not be equal to Y, where X and Y were the paths of the ‘left’ and ‘right’ values that distinguish the eyes. It is impossible to express this without path comparisons, and the most natural location for this is the WHERE clause at the moment.

  6. the semantics of each comparison operator should be defined in detail, and examples of when the operator returns true or false should be given; this is missing from the spec, leaving the definition open to interpretation.

My suggestion above would define this by referring to the assumed primitive types’ behaviour, which should hold across most implementation technologies (unless someone does this whole thing in Javascript… hence the ‘most’).

I wrote a table on how we interpret the comparison/ordering of structures in the DV family (see the GitHub link below).

To illustrate how to order DV_DURATION, I also added a very simple TypeScript example: https://github.com/bjornna/openehr-conformance/tree/master/aql/case4-ordering/order_duration_example
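The linked code is not reproduced here, but the basic idea is easy to sketch: map an ISO-8601 duration onto one comparable number (seconds, with nominal lengths assumed for years and months) and sort on that.

```typescript
// Sketch only: order ISO-8601 durations by converting them to seconds,
// using nominal lengths for years (365 d) and months (30 d).
function durationToSeconds(iso: string): number {
  const m = iso.match(
    /^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)W)?(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$/,
  );
  if (!m) throw new Error(`Not an ISO-8601 duration: ${iso}`);
  const n = (g?: string) => Number(g ?? "0");
  return (
    n(m[1]) * 365 * 86400 + // years (nominal)
    n(m[2]) * 30 * 86400 +  // months (nominal)
    n(m[3]) * 7 * 86400 +   // weeks
    n(m[4]) * 86400 +       // days
    n(m[5]) * 3600 +        // hours
    n(m[6]) * 60 +          // minutes
    n(m[7])                 // seconds
  );
}

// ORDER BY on a DV_DURATION value then reduces to a numeric sort:
["PT90M", "P1D", "PT2H"].sort(
  (a, b) => durationToSeconds(a) - durationToSeconds(b),
); // -> ["PT90M", "PT2H", "P1D"]
```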

WARNING - we updated the description on comparative functions in AQL with the following text:

Updated 17 February 2020. After a long discussion, we found that the best way to handle this is not to define generic handling of data types. The client has to write an AQL path down to the primitive value, which can then be sorted by the underlying platform (operating system, programming language). There is one exception to this: for date and datetime we will provide a convenient way to ORDER BY a path to the data type. The reason for this is that ordering on time is so common in health applications.

Some type issues:

  • For string values we use the .NET StringComparer.OrdinalIgnoreCase property (see the sketch below)
  • If a path locates different datatypes/primitives, the engine will give a non-deterministic result for the ORDER BY or COMPARE function
  • Magnitude is introduced as a function attribute on DV_PROPORTION and DV_DURATION to make comparison and ORDER BY possible for those datatypes.

See more here: https://github.com/bjornna/openehr-conformance/tree/master/aql/case4-ordering
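As a rough illustration of the string-ordering bullet above, a case-insensitive ordinal comparison could look like this sketch (it only approximates .NET’s OrdinalIgnoreCase; culture-specific casing edge cases will differ):

```typescript
// Approximation of .NET StringComparer.OrdinalIgnoreCase: upper-case both
// operands, then compare by UTF-16 code units.
function ordinalIgnoreCaseCompare(a: string, b: string): number {
  const ua = a.toUpperCase();
  const ub = b.toUpperCase();
  return ua < ub ? -1 : ua > ub ? 1 : 0;
}

["b", "A", "a"].sort(ordinalIgnoreCaseCompare); // -> ["A", "a", "b"] (stable sort keeps input order for equal keys)
```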

This was what I originally argued on the call. However!!

I am persuaded that there is a good argument for working at the datatype element level (as made by Thomas), as this will be required for the expression language (and has already been done, I think, by Cambio for GDL), so that we can do things like:

if thisCodedText == ‘SNOMED-CT::123456| the text of this thing|’

The rule being that both the code_string and the terminology must match.

That does make expressions much easier to represent in GDL and TP, but it will also make AQL statements much easier to use, particularly around terminologies and observables.

Perhaps this need only apply to a subset of datatypes but I think it is worth doing.
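To make that rule concrete, here is a minimal sketch (the field names are illustrative and only loosely follow the RM; the literal syntax follows the GDL-style example above):

```typescript
// Sketch: parse a coded literal of the form 'TERMINOLOGY::CODE|optional text|'
// and treat two coded texts as matching when both the terminology id and the
// code match; the display text is ignored.
interface CodedLiteral {
  terminologyId: string;
  codeString: string;
  text?: string;
}

function parseCodedLiteral(literal: string): CodedLiteral {
  const m = literal.match(/^([^:]+)::([^|]+)(?:\|(.*)\|)?$/);
  if (!m) throw new Error(`Not a coded text literal: ${literal}`);
  return { terminologyId: m[1].trim(), codeString: m[2].trim(), text: m[3]?.trim() };
}

function codesMatch(a: CodedLiteral, b: CodedLiteral): boolean {
  return a.terminologyId === b.terminologyId && a.codeString === b.codeString;
}

// codesMatch(parseCodedLiteral("SNOMED-CT::123456|the text of this thing|"),
//            parseCodedLiteral("SNOMED-CT::123456|a different label|")) // -> true
```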

@ian - that was my original idea and intention too, to work out some defaults for the comparators. But when looking into it I found that there were more differences than commonalities among the DATA_TYPES, and then I didn’t even start considering ELEMENTs with choice and accuracy.

So I ended up with this “defensive” proposal to not do any “magic” handling of the different data types.

I am still open to discussion on the topic, of course, but I want to share my current position as clearly as possible.

BTW - we have some “fuzziness” in the current version of EHR Craft Store (the CDR), but we’ve found that this actually leads to more confusion for clients, and in sum I think it is a cleaner approach to leave it to the client to make the assumptions about which primitives to use for comparison and ordering.

For now that is probably a good approach, but I think we may come back to ‘magic’ handling quite quickly, as, if nothing else, it will be helpful to make sure that people are not mis-querying some of those primitives, e.g. mixing magnitudes with different units.

Accuracy probably needs to stay out of this as it will be very use-case specific; choice also needs to be handled, but at a different level.

We definitely need a condensed form (and associated rules) for CDSS/TP/EL and other expression use-cases.

Just so people are clear, this is very easy to do in a routine way, which is as follows:

  • the type DvCodedText is defined to have a method equivalent_to_code(aCode: TerminologyCode): Boolean (or it might be equivalent_to_code(aCode: CodePhrase): Boolean)
  • the method definition includes a sub-clause of the form alias infix '==' which maps the '==' symbol to this method
  • the AQL parser parses the code, determines that the LHS is a DvCodedText, looks for methods in the BMM on that class that have the ‘==’ symbol as an infix alias, and then looks to see if the RHS - a literal TerminologyCode (Foundation types) - is the right type for the method, which it is, assuming the defs above.
  • The method implementation takes care of differing types, e.g. DvCodedText versus TerminologyCode;
  • There can of course be more than one method doing similar equality comparisons, with different signatures, e.g. equal(other: DvCodedText): Boolean will do a direct compare on two DvCodedText objects.

The same approach works for anything, including all the quantity operators - it’s just a case of mapping operator symbols to class methods. There are a few examples in the latest BMM.
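A simplified TypeScript sketch of that idea, with the RM types heavily reduced and the code literal taken from the example above (the actual BMM method names may differ):

```typescript
// Sketch: equality between a coded text and a terminology code is an ordinary
// class method; the '==' symbol is just an alias the parser resolves to it.
interface TerminologyCode {
  terminologyId: string;
  codeString: string;
}

class DvCodedText {
  constructor(
    public value: string,                  // display text
    public definingCode: TerminologyCode,
  ) {}

  // Candidate target of the '==' infix alias when the RHS is a code literal.
  equivalentToCode(aCode: TerminologyCode): boolean {
    return (
      this.definingCode.terminologyId === aCode.terminologyId &&
      this.definingCode.codeString === aCode.codeString
    );
  }

  // A separate signature for comparing two DvCodedText objects directly.
  isEqual(other: DvCodedText): boolean {
    return this.equivalentToCode(other.definingCode);
  }
}

const lhs = new DvCodedText("the text of this thing", {
  terminologyId: "SNOMED-CT",
  codeString: "123456",
});
lhs.equivalentToCode({ terminologyId: "SNOMED-CT", codeString: "123456" }); // true
```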

My question is very simple and has two levels:

  1. Should we compare/sort on DV_CODED_TEXT level in openEHR?
  2. If yes - what is the attribute(s) to compare/sort on ?

We need to agree on this first. Then we can add the agreed rules into the BMM and the software packages from different vendors.

Our suggested answers to the questions are as follows:

  1. Should we compare/sort on DV_CODED_TEXT level in openEHR?
    NO - the client must itself choose which primitive attribute to order on. This way the client may choose to order on value, defining_code/code_string, defining_code/terminology_id/value, or some combination of them.

This is not a final answer. We can adjust it, but we need to agree on these rules as a community.

Well, conceptually there is no ‘order’ for terminology concepts. However, there are a couple of possibilities for understanding order.

For some terminologies like ICDx, the codes are prefixed with letters, and the letters indicate the main diagnostic areas, e.g. upper-O = Obstetrics. After that, AFAIK, the numbers have no meaning. More generally, terminology codes have no innate order.

The second possibility is the lexical order of the description, which is potentially useful in displays, and certainly useful in searching. So, we could in theory define a less_than() function on DV_CODED_TEXT with an alias infix ‘<’ that implements that idea. Normally, we would order within terminologies, so a lexical ordering would be based on strings like:

icd10::atrial fibrillation
snomed_ct::anxiety
snomed_ct::atrial fibrillation
snomed_ct::etc

@bna - agree the challenge here is figuring out a basic set of rules that can be safely applied, particularly where parts of the datatype are ‘indivisible’, e.g. magnitude + units or defining_code + terminology_id.

Answers

1a. Yes but not in this tranche of AQL
2. Yes - let’s make a start on DV_TEXT/CODED_TEXT - I think there are close connections with the use of ‘stringified’ elements in GDL/Flat formats and ExpLang.

I think there is considerable value in this well beyond sort/compare in AQL.

@Thomas - that’s not exactly the issue - it is more that, when doing a comparison, I would argue that both terminology_id AND code_string have to match.

I would then say that the ordering should be on defining_code, on the basis that this has ‘some meaning’ for terminologies like ICD, while understanding that it is meaningless for SNOMED-CT.

If folks want to sort on the value - they can do that directly, and clinically that is likely to be cross-terminology or indeed include free-text.

The point is that these are exactly the rules we have to agree on.

Right, but that’s an equality relation (is_equal(other: DvCodedTerm): Boolean), not an ordering relation (less_than(other: DvCodedTerm): Boolean). Any data type may have an equality relation, but only some will have ordering.

All I want to know is how to implement order or equality on DV_CODED_TEXT.

As we all know, there are a few attributes to choose among. How do we expect implementations to do it?

I am sorry if my question is too trivial. But I need to know, and I want the logic to be the same for all openEHR systems.

I have made a suggestion in the linked page on GitHub. It’s more or less the same as @ian.mcnicoll suggests: sort and compare on the defining code, the terminology id first and the code second. This could be good enough IMHO.
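Concretely, the suggested rule as a comparator (a sketch; attribute names simplified from the RM):

```typescript
// Sketch of the suggestion: sort/compare DV_CODED_TEXT on its defining code,
// terminology id first, then code string (plain string comparison assumed).
interface CodePhrase {
  terminologyId: string;
  codeString: string;
}

function compareDefiningCode(a: CodePhrase, b: CodePhrase): number {
  if (a.terminologyId !== b.terminologyId) {
    return a.terminologyId < b.terminologyId ? -1 : 1;
  }
  if (a.codeString !== b.codeString) {
    return a.codeString < b.codeString ? -1 : 1;
  }
  return 0; // same terminology id and same code -> equal
}
```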

Do we agree?

From my point of view, the equality relation can easily be defined (and I agree with the way @ian.mcnicoll and @bna suggest). So this covers = and !=

Re Ordering:
I would advise against using a sort order that imposes an order on elements that we define as “equal”.
This is what Ian’s suggestion on ordering (use terminology_id AND code_string for equality, and defining_code as a whole for ordering) would do, at least in edge cases, if I understand the suggestion correctly.

Further, it is not clear to me - generally speaking - what single purpose sorting of a coded text can really serve.
Sure, sorting makes sense and both the sorting options Thomas has documented above make sense - but it just depends on your use case at hand.

I am therefore not sure we should define a standard way of ordering (and thus <, <=, >=, > for this) at all as part of the specs. Equality is important to define; ordering, however, seems to be use-case dependent to me and sounds like trouble.

In any case, if you really want to define a default order (mainly I think this would be to make life easier in AQL and elsewhere), I think it should probably employ the exact same attributes that determine equality, e.g. terminology_id::code_string


Hi Sebastian,

I think there is a middle ground, but I agree we should only do this where it adds real value, is clinically safe, and there is clear consensus. If not, as Seref has suggested, the AQL engine should say ‘not valid’. We absolutely cannot and should not try to cover every combination, certainly for comparison but also for equality.


I would rather stay away from allowing comparison and ordering of any complex objects, even DVs, in AQL. Are there any sensible use cases for that which cannot be worked around by using slightly more complex (but not incomprehensible, and arguably more specific and to-the-point) AQL? I see one or perhaps two: comparing/ordering DV_QUANTITY values and comparing/ordering dates and times.

The second one we (and, as I understand, @bna’s team as well) solve by specially handling ISO-8601 primitives so that they get interpreted and actually ordered/compared with time zones taken into account, etc.

The first one could be solved by defining an AQL function to convert between units based on openly defined rules, which would probably need to disregard any accuracy data, but as @ian.mcnicoll stated, that is to be handled elsewhere. Then one could write where convert(o/path/to/dvquantity, “mmol/l”)/magnitude > 100 and similar. Comparison (or ordering) would still be done on the primitive magnitude value; it would just get converted first (at least in principle; CDRs could optimise this with various kinds of trickery).
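For illustration only, with made-up conversion factors (a real engine would presumably use UCUM or a similar rule set):

```typescript
// Sketch of a convert() helper for DV_QUANTITY: look up a factor between
// units, then compare on the converted magnitude. The factor table is a
// stand-in, not real conversion data.
interface DvQuantity {
  magnitude: number;
  units: string;
}

// Hypothetical factors: target unit -> source unit -> multiplier.
const FACTORS: Record<string, Record<string, number>> = {
  "mmol/l": { "mmol/l": 1, "mol/l": 1000 },
  mg: { mg: 1, g: 1000, kg: 1_000_000 },
};

function convert(q: DvQuantity, targetUnits: string): DvQuantity {
  const factor = FACTORS[targetUnits]?.[q.units];
  if (factor === undefined) {
    throw new Error(`Cannot convert ${q.units} to ${targetUnits}`);
  }
  return { magnitude: q.magnitude * factor, units: targetUnits };
}

// The WHERE clause above would then boil down to something like:
const q: DvQuantity = { magnitude: 0.12, units: "mol/l" };
convert(q, "mmol/l").magnitude > 100; // true (0.12 mol/l = 120 mmol/l)
```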

This gets us to the topic of how to handle run-time errors (like inconvertible units, possibly only on a subset of available data), but that is discussed elsewhere.


I think to have a good solution in the spec, we need to go back to the core of this issue: comparing primitive/assumed types.

IMO we need to revisit this model [1]:

[1] https://specifications.openehr.org/releases/BASE/latest/foundation_types.html#_overview_2

Some comments:

  1. Any has an is_equal() method; should we say exactly what “is equal” means for each type?
  2. In the UML, the methods of Any don’t appear.
  3. Most operators are defined with the keyword “infix”; I don’t think that is valid UML.
  4. All operators defined as “infix” could be defined as normal methods in the UML.
  5. Then if some implementation needs to map those methods to an infix operator, it could be done, but not all implementations need that.
  6. It seems all Numeric classes are Ordered, so why not inherit Numeric from Ordered instead of having the Ordered_Numeric class?
  7. Ordered seems to be more an interface than an abstract class, since it only has a method.
  8. Numeric also seems to be more an interface than a class.
  9. Ordered has lowerThan() defined as “infix <”; I think we need to define how that works for all the ordered types.

With this considered, mainly points 1 and 9, we would have a good base to define is_equal() and lower_than() for other types, like “numeric DVs”.

In theory we should do that.

In that particular view, they don’t, but they are visible here.

It isn’t; that is some old text I should fix, which was imported from the manual diagrams. The methods should be defined with proper names like less_than(), is_equal(), etc. UML doesn’t have a meta-model facility for operator aliases, so I’ll need to find another way to put them in.

We could do that. I followed (probably 15 years ago) the type design of the Eiffel libraries, which happened to have them separate (back then most other languages had not sorted out basic types beyond things like int, long, etc.). These types aren’t implemented, of course; they just enable the semantics of the assumed primitive types to be formalised.

These classes are defined as abstract classes rather than interfaces because they theoretically provide implementation of the relevant routines; they just aren’t directly instantiable. An ‘interface’ strictly has no implementation. Again, these distinctions are not that important in this part of the model.

Point 9 is also correct, and I can easily build a revised version of this model with the appropriate cleaning up.

The other option is to say “we delegate this logic to any programming language used as implementation technology”, but if we go that way, different languages might have different criteria for = and <. I would prefer to spend some time defining all the basic semantics inside the spec to give a solid semantic base to the rest of the RM. That way we avoid leaving things open to interpretation at this base level.

Thinking of is_equal or lower_than, I think an abstract class can’t provide the implementation because that will depend on the specific type; that is why I mentioned the interface idea.

The way the functions less_than(), greater_than(), less_than_or_equal() and greater_than_or_equal() work is that you leave less_than() abstract and define it in concrete descendants, but the other three you can implement in the abstract class, and they will work correctly in all the descendants. Although in today’s languages I guess you will be forced to manually rewrite the signatures because you don’t have anchored types (‘like T’, including ‘like self’), nevertheless the implementations from Any will still be correct.
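A small TypeScript sketch of that pattern (TypeScript has no anchored types, so the descendant narrows the argument manually, which is exactly the signature-rewriting point above):

```typescript
// less_than() is abstract and defined in concrete descendants; the other
// three comparison operators are implemented once here, in terms of it.
abstract class Ordered {
  abstract lessThan(other: Ordered): boolean;

  greaterThan(other: Ordered): boolean {
    return other.lessThan(this);
  }
  lessThanOrEqual(other: Ordered): boolean {
    return !other.lessThan(this);
  }
  greaterThanOrEqual(other: Ordered): boolean {
    return !this.lessThan(other);
  }
}

class OrderedNumber extends Ordered {
  constructor(public value: number) {
    super();
  }
  // No anchored types ('like self'), so we narrow the argument by hand.
  lessThan(other: Ordered): boolean {
    return this.value < (other as OrderedNumber).value;
  }
}

new OrderedNumber(1).lessThanOrEqual(new OrderedNumber(2)); // true
new OrderedNumber(3).greaterThan(new OrderedNumber(2));     // true
```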

Also, there is no in-principle reason why a class like Any or any other abstract class should not contain implementation, whereas an ‘interface’ class is, in principle, just that: a type with no implementation.

I have only used UML interfaces to indicate service interfaces, i.e. APIs, mainly in the abstract platform service spec.