Case-insensitive AQL queries

Callum_Anderson · 7 November 2024 12:25

I wonder if anyone has found a working solution for case-insensitive AQL queries? In EHRBase, it is technically possible to implement it at the database level, using Postgres collate. This can be applied either per field, or across the whole DB. The caveat is that you can no longer use “LIKE” comparisons in your underlying SQL queries.
And then of course if you’re target search fields are within a composition, well that’s stored as a JSON blob. Collations can’t be applied to jsonb types in Postgres.

Other than that, of course you can adjust the ORM (Jooq) mappings, if you’re custom-building your own EHRBase image. This isn’t possible of course if you’re using one of the published Docker images.

The biggest issue that I can see though, is that AQL doesn’t appear to have any provision for indicating case-sensitivity? At least not that I could see, looking at the documentation.

Paulmiller · 11 November 2024 09:37

Just bouncing this back up for @SEC @SEC-experts to see if anyone has any comment.

Paul

pablo · 11 November 2024 15:03

This seems to be an implementation constraint and might not have to do with the AQL specs.

Callum_Anderson · 11 November 2024 15:37

Thanks for the response Pablo. If that is the case, then there must be an AQL keyword that indicates a case-insensitive search? I can’t find any mention of it in the spec.

pablo · 11 November 2024 16:33

@Callum_Anderson in general when there is no definition in the specs, it formally allows any implementation. What should include a word about that (to clarify the functionality to users) is the implementation technology itself.

On a side note this is a good example of when the Conformance Verification process is needed (GitHub - ppazos/openehr-conformance-verification: Conformance verification framework or openEHR implementations) and this is exactly the role of a Conformance Statement document: it’s a document that explains how a vendor implemented the openEHR specs, which design and technical decisions were made (where the spec is open to interpretation), or where any divergence from the specs were implemented.

Callum_Anderson · 12 November 2024 08:50

Thanks @pablo. So in effect, it is not in the AQL, nor is it likely to be included in future.

I suppose we could aim to implement it within EHRBase, and then update the Conformance Statement accordingly. I’d never heard of that before, that’s useful to know, thanks. Sounds a bit like the Capability Statement in FHIR, although it doesn’t look like it’s mandated that the Conformance Statement should be returned via a REST API call. There is mention of it in the REST API spec under the OPTIONS method, but reads like it is optional.

vidi42 · 12 November 2024 09:44

@Callum_Anderson we are also considering internally how case-insensitive AQL queries would behave and what would it mean technically so maybe we could align at some point.

Also, I will bring this up in the SEC AQL workgroup as a discussion point to expand more on it.

In the meantime, if you are doing already some work on EHRbase in this regards please consider opening a PR GitHub - ehrbase/ehrbase: An open source openEHR server.

Callum_Anderson · 12 November 2024 10:51

Thanks Alex, that would be fantastic. I’m currently looking into a potential solution for EHRBase which will likely use a flag to indicate searches should be case-insensitive, using an environment variable to input that flag.

If it works out, I’ll certainly submit a PR.

pablo · 12 November 2024 17:47

The Conformance Statement is something pretty standard in the DICOM world, when working with imaginology almost every vendor will have a DICOM Conformance Statement document that says how they implemented the DICOM standard and which parts of it are supported, also documents any customizations or extensions done.

Since I started working with openEHR Conformance Verification around 2019 for EHRBase when I worked at HiGHmed, I’ve been promoting the idea for openEHR to incorporate and standardize the Conformance Statement in the openEHR world, but without any luck. I think it’s a crucial part for conformance verification but most think running some tests against a system is enough for openEHR conformance. For instance, it’s different to test a system that complies with openEHR RM 1.0.2 than RM 1.1.0, that’s another example of the info that a conformance statement should have.

You can check section 2 of this doc openehr-conformance-verification/openEHR Conformance Framework Design (2023-11-21).pdf at master · ppazos/openehr-conformance-verification · GitHub

For reference, this is a sample conformance statement from our Atomik Server Atomik standardized Clinical Data Repository and Demographic Data Repository

REFS about DICOM C.S.

https://dicom.nema.org/medical/dicom/current/output/chtml/part02/chapter_6.html

It’s something like that, though in FHIR it’s all about the API and in openEHR it could include some internal decisions, like the case sensitiveness you mentioned, which might not be a thing you see explicitly at the API level.

Hope that helps.

pablo · 18 November 2024 16:50

@Callum_Anderson I just found this comment in Atomik’s code and found it funny:

“for MySQL is LIKE, for postgres is ILIKE”

In MySQL LIKE is case insensitive, but for the same behavior in PostgreSQL ILIKE should be used instead of LIKE. I’m guessing EHRBase uses LIKE internally, though fixing that would be just adding the “I”.

Since Atomik could use multiple databases, we need to have those considerations to reach a consistent querying behavior.

ian.mcnicoll · 19 November 2024 16:47

Better’s LIKE is case-insensitive.

Do we ever need a case-sensitive LIKE in EHR querying? I can;t think of any, off -hand.

If not there is perhaps a case for just making LIKE case-insensitive (as per MySQL) and ?? adding LIKE BINARY ???

Altnernatively , explicitly add ILIKE to AQL.

One for SEC to chew over as part of conformance

pablo · 19 November 2024 17:32

From my experience case sensitiveness causes more issues than it solves. The query creator should consider that every time and make adjustments to avoid missing results on each query.

IMHO case sensitiveness might not be something we need to add to AQL, that is having an operation that is case sensitive and another that isn’t, though adding a comment might help implementers.

Note in MySQL all basic operators are case insensitive and include the characters with accents.

" Simple comparison operations (>=, >, =, <, <=, sorting, and grouping) are based on each character’s “sort value.” Characters with the same sort value are treated as the same character. For example, if e and é have the same sort value in a given collation, they compare as equal." https://dev.mysql.com/doc/refman/8.4/en/case-sensitivity.html

So it’s not only the LIKE in MySQL. It would be nice to test EHRBase and Better’s to see what happens with the simple comparisons. In EHRServer and Atomik it will work as MySQL works.

What I do think is needed is that implementations that do implement case sensitiveness (on purpose or not) might need to add a comment for that on their documentation, or better in their Conformance Statement document (mentioned above).

ian.mcnicoll · 29 November 2024 09:04

Might the easiest solution to this be to add Upper() and Lower() string functions?

Avoids postgres vs. MySQL flavour wars?

pablo · 30 November 2024 02:24

Note the issue is not about MySQL vs PostgreSQL. It’s more that an AQL implementation over PostgreSQL uses LIKE instead of ILIKE internally. So this is purely about an implementation decision or something that might have been overseen.

Also note that you might need to uppercase a path since you don’t know how the data at the path is stored, something like this IMHO is overkill:

UPPER(a/b/value) LIKE UPPER(‘hello’)

pablo · 9 December 2024 03:32

Just read something about SQL optimization that made me remember this thread. If you use a function in the WHERE over a column, then if there is an index on that column, the index won’t be used since the value needs to be evaluated to execute the function. I know we don’t have indexes and columns when we talk about AQL, but when AQL gets translated to SQL this could cause a performance problem on databases with millions of records. That’s another reason why I wouldn’t use UPPER() or LOWER() in the WHERE statement for doing data filtering.