AQL: Clarify use of parameters in AQL queries

Seref · 14 February 2020 11:38

Going over the section of 3.5 of the spec, based on Pablo’s comments re the need to clarify how quotes are to be applied, I realised that we need to do more than that.

Reading Section 3.5.3 (where and how parameters are resolved), I can see that the spec is implying that some parameters can be recognised by the backends by default and their value would be provided during query execution. One such example is ehrUid.

I suggest we remove this particular feature of parameters. The example of an ehrUID is particularly problematic: how would a server know which ehr uid to set during query execution? This could only work via use of a mechanism such as a session?

In the past, Ocean’s CDR had support for sessions. We’d open a session, then use the session token so that the CDR could set various data during execution. Without such a session/token mechanism, it is impossible for a CDR to know what ehrUid parameter should be set to. We walked away from this approach for various reasons, mainly the same reasons for server side computing moving to more stateless designs.

If we keep this definition of parameters, then we’re implying the need for use of a session mechanism, which is an implementation detail, which we’d rather not push into specs. If we don’t suggest sessions, then we’re specifying a vague behaviour, which would require an approach that we came to conclude to be hard to scale. Other vendors are welcome to share their experiences of course, this is Ocean’s.

I would be keen to hear use cases other vendors may have for having a list of pre-allocated server side parameter references/names. Unless we have some convincing arguments, we’d better remove this particular aspect of parameters.

Another reason for doing so is the REST API for Aql assuming that parameter values are provided along with the execution request for AQL queries. Whether or not the query is stored on the server side doesn’t matter. This use of parameters is similar to parameterised SQL queries used in JDBC, ADO.NET and other data access layers. This is indeed the most common way of using parameters in Ocean. Again, other vendors are most welcome to share their experiences and used cases.

Based on the above, and Pablo’s point about clarifying the quoting behaviour of parameters, I’d suggest the following content for section 3.5 (Parameters). I’ll write this in more detail if I can get some high level feedback

…
Parameters can be used at any place in an AQL query where value of a primitive RM type is expressed. The primitive types (base/org/openehr/base/foundation_types/primitive_types) for which a parameter can be used are:

		Boolean
    		Character 
    		Double
    		Integer
    		Integer64
    		Octet
    		Real
    		String 
    		Uri

When a parameter is used to refer to a value of Character, String or an Uri type, values provided to the parameter are automatically enclosed in single quotes by the AQL implementation. Parameter’s used for values of remaining basic types are replaced without modification in the query text.

The actual values of parameters are provided to the execution context of AQL query using the particular AQL implementation’s APIs. Users of AQL’s should consult vendor documentation to find out how to pass actual values of parameters to a query. For example, openEHR REST specification provides one such API definition for invoking AQL queries, supported by vendors who implement the REST API.

The use of parameters in AQL queries allow input sanitation (against malicious input values similar to SQL injection attacks), caching (when same query is invoked multiple times with same parameter values) and ease of solution development (reusing a stored query definition by passing only parameters.)
…

thomas.beale · 14 February 2020 12:56

I’m not sure this is as problematic as you think. The param can either:

be set inline to some actual id (useful for testing)
set by the execution context to an id it determines by previous processing, or
will be substituted with every EHRId in some Ehr repository, which must be known in the context.

I am pretty sure we want to retain replaceable params in AQL queries (and any other kind of query for that matter).

Seref · 14 February 2020 16:23

Can you please clarify these cases? first one sounds like the canonical use case I mentioned above and the other too are too vague for me.

Some pseudo-code and/or better definition of the use-case would be much appreciated.

thomas.beale · 14 February 2020 19:33

well essentially, either we have [ehr_id/value=1234], i.e. some actual id, or else [ehr_id/value=$ehr_id], i.e. contextual substitution is assumed. I don’t think AQL needs to say how the latter happens. Maybe I am misunderstanding the problem?

pablo · 14 February 2020 22:41

I think the spec lacks the context of execution of a query, which might clarify the use of parameters. Also, that might be OK since this is more related to the syntax than the processing/evaluation/execution, which might be good for a guide or an ITS spec.

As I understand, when an AQL expression is evaluated, something like this should happen:

QueryResult result = AQLProcessor.process(aqlExpression, paramsMap, contextMap, …)

If aqlExpression has $XYZ, params should have some an entry with key = XYZ.

And for stuff that might be used as params but are context of the app, stuff like current ehr_id, session info, etc. could be in the contextMap, or even included in the paramsMap with known keys.

IMO there is no problem mentioning $ehr_id per se in the examples, but of course we can add a comment about how that is obtained.

The most open use case I can think of is to allow users to execute queries at will. If that is the case, the query definition should include which parameters are required, what is the type of each param, and ask for the params, via a GUI or some kind of data entry, then execute the query considering those params.

That leads to another issue: there is no formal definition of parameters in the AQL syntax. IMO if an AQL expression uses $XYZ, there should be some kind of definition for the type of $XYZ inside the query expression, e.g.

def $XYZ: Integer
SELECT …
FROM …
WHERE … < $XYZ

About quotes for parameters

This query works in EHRSCAPE:

{
“q”: “select e/ehr_id, e/time_created, e/system_id from EHR e CONTAINS COMPOSITION c WHERE c/archetype_node_id=’$archetype_id’”,
“query_parameters”: {
“archetype_id”: “openEHR-EHR-COMPOSITION.minimal.v1”
}
}

This one doesn’t work:

{
“q”: “select e/ehr_id, e/time_created, e/system_id from EHR e CONTAINS COMPOSITION c WHERE c/archetype_node_id=$archetype_id”,
“query_parameters”: {
“archetype_id”: “openEHR-EHR-COMPOSITION.minimal.v1”
}
}

Seems the quotes are needed in ‘$archetype_id’, is that mandatory by the syntax?

We need to state if $archetype_id without the quotes is valid or invalid there.

matijap · 18 February 2020 07:49

Ehrscape (which is basically Better’s implementation of CDR) is a bit non-standard here in a way that I’m not really keen on proposing as the way to standardise (although that would be really convenient for us).

If you specify the parameter placeholder with a dollar sign, like you did, it gets inserted verbatim into AQL. This is injection prone, but useful for things like archetype predicates (in the FROM or CONTAINS section where they are not quoted), and those are usually not formed from user input anyway, so that’s okay if you know what you’re doing.

If you specify the parameter placeholder with a colon, like :archetype_id, it gets inserted into AQL quoted, taking care of all the needed escaping. That’s what you would use in a context like the one in your example.

We do not actually handle parameters in the AQL processor, like we probably should have, but do it all in the pre-processing phase and then parse AQL when all the parameters have already been inserted.

Regarding session-bound attributes like EHR ID I agree with @Seref we should not define them in the spec. Our implementation will probably retain that special parameter when the legacy sessioned API is used, but that’s only in legacy installations anyway.

sebastian.iancu · 19 February 2020 13:42

I’m afraid this will make automate testing and conformance checking quite difficult if not even impossible. passing parameters to AQL should eventually be standardized both on AQL spec as well as REST API spec.

sebastian.iancu · 19 February 2020 14:02

I think for security-injection reason as well as optimization purposes we would need support for parameters, but whether or not these are ‘consumed’ in a pre-processor, or handled by the AQL processor level I think is just an implementation aspect.

For backwards compatibility we should perhaps relay on (or support also) the $ sign. But we should however not facilitate and neither propagate things like

WHERE c/archetype_node_id=’$archetype_id’”

which for me suggests that it gets inserted verbatim.

Other than this, I think this text is quite clear, when it comes to value-types:

Seref · 21 February 2020 12:19

Thanks Sebastian,

I think at this point, we don’t have a complex parameter passing requirement beyond key/value pairs so I’m not sure if we need to model that and make part of conformance testing. Even if we did that, would that then mean that we’ll check conformance based on technology specific tests? As in, if I implement this in C#, then we’d have to write a .NET test harness, then Java, python etc…

I think ITS based conformance tests are the only realistically feasible ones and we already have REST API for that (to test AQL as a subset of it), which specifies how parameters are passed.

Does this make sense to you?

Seref · 21 February 2020 12:20

This is very helpful, thanks!