AQL to Database SQL convertor

pablo · 28 February 2024 18:13

It’s long/complex to explain, you’ll need to study the whole AQL and we’ll need to do a workshop or two to go through all the issues in the spec. Though I did study the specs and tried to improve what I could while working for HiGHmed, but don’t trust me, you can check the spec yourself.

One small example, consider the WHERE clause definition: Archetype Query Language (AQL)

Then think about how that should be evaluated and executed and how that affects the results of a query, and all the issues that could happen from that evaluation/execution. You will find many things are not defined by the spec, for instance what happens when you have o/some/path > 123 and o/some/path points to a string, we don’t have a strongly typed syntax right now. The spec is focused on the syntax, not on the model and functions under it, and there is a good reason for that: when that is specified, it needs a reference schema, and since openEHR persistence is open for implementation, each vendor can define it’s own schema, so each vendor can implement AQL over different schemas in different was, then we have divergent results in some cases.

On the other hand, consider the WHERE definition in SQL:

Do you see the difference? SQL has a more mathematical/scientific approach for the definition. It also mentions how it applies to results (rows) in relation to the FROM clause, and gives some rules that defines how it should be evaluated/executed. All that is defined over a given model/schema that defines tables, columns, rows, etc. which we don’t have in openEHR.

Another point: AQLto SQL is not always the transformation you want. In fact a CDR could be implemented on a JSON or XML database, or on a graph database, so AQL will need to be translated to the speicific query language used by the database, like XQuery in XML dbs. For good or bad AQL resembles SQL which for newcomers might be confusing since most think that SHOULD be translated to SQL.

Certainly is not about that, it’s about all CDRs interpreting an AQL in the same way and execute exactly the same query (semantically) in the database brand they use, even if it’s no-SQL. That is why (for me) the model behind the query and the execution steps are more important than the syntax of the query. If two CDRs use the same query model and follow the same query execution rules, they can use any syntax they want to represent their queries as strings.

With that being said, I don’t think it’s practically possible to change the current spec in that direction, because it’s really difficult to define the DB schema, which is needed to specify the query model and execution details. Though I believe it’s possible to define some abstract/virtual model to have a framework that allow us to define those aspects in the spec, and that allows each vendor to map to their own technology stack. I also think this is a green area for innovation, since we could have different query languages for specific domains like data analysis, reporting, CDS, etc). For instance our query model is focused more on those areas than on being a general purpose query model (AQL is a general purpose query SYNTAX).

Hope that helps for the discussion.