Which queries should EHR_STATUS.is_queryable actually affect?

Hey,

We want to implement the is_queryable restrictions, so before we start I wanted to clarify some questions I came across while reading the specs.

The EHR Information Model states that is_queryable only affects population queries, meaning single EHR queries should still work.
Where I am a bit unsure is what actually counts as a population query:

The Archetype Query Language (AQL) spec states that the FROM clause of the query determines whether it is a single EHR or a population query, where `FROM EHR [ehr_id/value='1234']` would make it a single EHR query.

The Query API spec states that a single EHR query can be achieved by supplying the openEHR-EHR-id header or the ehr_id query parameter.

Question 1: Would these be applied to enforce the restriction to a single EHR regardless of whether the supplied query actually makes use of an ehr_id parameter or are they only applied if the query actually contains the parameter?

Question 2: Neither definition mentions the WHERE clause, in which one could also restrict the query to a single EHR, like `WHERE e/ehr_id/value='1234'`. So am I correct to assume that an AQL query in which the limitation to a single EHR is done in the WHERE clause would still count as a population query?

Question 3: Is every query that is not a single EHR query a population query?

A population query is one that does not specify an EhrId, and for which the Query service will determine the matching cohort of EHRs.

Thanks. I am still a bit confused about what “specifies an EhrId” means concretely. Here are some small examples to illustrate my current understanding and what is unclear to me.

This would be a population query:

```
SELECT e/ehr_id/value, c/uid/value
FROM EHR e
CONTAINS COMPOSITION c
```

But then if I send the openEHR-EHR-id header or the ehr_id query parameter, it would become a single EHR query, because now it is supposed to run only on the specified EHR, right?

For these two I would think they are population queries, since the WHERE conditions are evaluated over the entire population, but I am not sure that is correct, since the WHERE clauses also specify one or more EHR IDs:

```
SELECT e/ehr_id/value, c/uid/value
FROM EHR e
CONTAINS COMPOSITION c
WHERE e/ehr_id/value='1234'
```

```
SELECT e/ehr_id/value, c/uid/value
FROM EHR e
CONTAINS COMPOSITION c
WHERE e/ehr_id/value='1234' OR e/ehr_id/value='5678'
```

While this one would never be a population query, regardless of the header/query parameter, because it is always limited to one EHR:

```
SELECT e/ehr_id/value, c/uid/value
FROM EHR e[ehr_id/value=$ehrId]
CONTAINS COMPOSITION c
```

This could also be specified in the WHERE clause, because filters in the FROM clause are a kind of shortcut for, or equivalent to, conditions in the WHERE clause. Something like `SELECT * FROM EHR e WHERE e/ehr_id/value='1234'` is totally valid IMO.

That is related to your second question.

This is correct, though the wording in that section is not 100% accurate, since a query with `FROM EHR [ehr_id/value='1234']` does not actually have any “parameters”; a parameter would be `ehr_id/value=$ehr_id_parameter`. IMO this needs rewording @thomas.beale @sebastian.iancu

That partly answers your first question: it is not about the parameter but about providing a value for the ehr_id, either as a parameter or inline in the query itself.

The challenge is that actually checking for a single EHR query requires parsing the AQL into an AST and analyzing it for conditions on the ehr_id, in the FROM or in the WHERE clause.

There could be a population query with a single EHR result. What makes a query a population query is that there is no filter for a single EHR in the query's FROM / WHERE clauses.
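To make the AST analysis concrete, here is a minimal Python sketch, assuming the AQL has already been parsed into a toy AST (the `Eq`/`And`/`Or`/`Query` classes are hypothetical, not from any openEHR library). It follows the interpretation above: an ehr_id equality in the FROM predicate, or one that every matching row must satisfy in the WHERE clause, makes it a single EHR query; anything else, including an OR over two EHR ids, is treated as a population query.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Eq:
    """Hypothetical equality node `path = value` (value is a literal or a $parameter)."""
    path: str
    value: str


@dataclass
class And:
    left: "Cond"
    right: "Cond"


@dataclass
class Or:
    left: "Cond"
    right: "Cond"


Cond = Union[Eq, And, Or]


@dataclass
class Query:
    ehr_alias: str                     # alias bound by FROM EHR e
    from_ehr_id: Optional[str] = None  # value in FROM EHR e[ehr_id/value=...]
    where: Optional[Cond] = None       # parsed WHERE clause, if any


def pinned_ehr_id(cond: Cond, alias: str) -> Optional[str]:
    """Return the single ehr_id value the condition forces, or None."""
    if isinstance(cond, Eq):
        return cond.value if cond.path == f"{alias}/ehr_id/value" else None
    left = pinned_ehr_id(cond.left, alias)
    right = pinned_ehr_id(cond.right, alias)
    if isinstance(cond, And):
        # A conjunction is pinned to one EHR if either side is.
        return left if left is not None else right
    # A disjunction is only pinned if both sides pin the same EHR.
    return left if left is not None and left == right else None


def is_single_ehr_query(q: Query) -> bool:
    """Single EHR if the FROM predicate or the WHERE clause pins the ehr_id."""
    if q.from_ehr_id is not None:
        return True
    return q.where is not None and pinned_ehr_id(q.where, q.ehr_alias) is not None
```

Under this sketch the `OR` example from the question is classified as a population query, while the FROM-predicate form with `$ehrId` is a single EHR query even before the parameter value is bound. A real AQL AST would also need to handle NOT, `matches`, and conditions written with the operands reversed, which this sketch ignores.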

Hope that helps.


If a query that doesn’t mention an EHR is submitted with that header, then the server needs to modify the query on the fly to add the EHR filter condition. Is that how this should work @thomas.beale @sebastian.iancu ?

If that is the way this was designed to work, IMO it is a weird design. I would expect the server to process the query as submitted, not changing it internally just because an extra parameter is there.

But I really don’t know how this works.
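If the header were handled by rewriting the query, as asked above, it could look roughly like the following Python sketch. It assumes a toy `Query` that holds the WHERE clause as text and a hypothetical AQL parameter name `header_ehr_id`; it is not how any particular server is known to work.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Query:
    ehr_alias: str               # alias bound by FROM EHR e
    where: Optional[str] = None  # WHERE clause text, without the keyword


def scope_to_ehr(query: Query, ehr_id: str) -> Tuple[Query, Dict[str, str]]:
    """Hypothetical rewrite: AND an ehr_id filter onto the WHERE clause.

    The id is bound as an AQL parameter (assumed name: header_ehr_id) rather
    than spliced into the text, so no quoting or injection issues arise.
    """
    cond = f"{query.ehr_alias}/ehr_id/value=$header_ehr_id"
    if query.where is None:
        where = cond
    else:
        # Parenthesise the original condition so an OR in it cannot
        # escape the new filter.
        where = f"({query.where}) AND {cond}"
    return replace(query, where=where), {"header_ehr_id": ehr_id}
```

The parentheses matter: assuming AND binds tighter than OR, as in SQL, `c/x = 1 OR c/y = 2 AND ...` would apply the new filter only to the second operand.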

Yes very much so. Thanks :slight_smile:

So just to be sure: if I have, for example, a disjunction of multiple EHR IDs in the WHERE clause (like `WHERE e/ehr_id/value='1' OR e/ehr_id/value='2'`), this would then be a population query, since it does not target a single EHR.
(Sorry if i ask/state things multiple times. I’m a “rather ask/say it one time too much than not enough” person :smiley: )
And yes i agree, analyzing the WHERE-clause in particular may prove to be a challenge.

That is at least what the “Single EHR queries” section in Query API openEHR specs implies IMO.
EDIT: Also, the REST spec is not clear on whether the header and the ehr_id “query parameter” are mutually exclusive. If they are not, it would be good to know which one takes precedence when modifying the query.
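Since the spec leaves this open, one possible policy, purely an assumption here, is to accept either source, accept both when they agree, and reject the request when they name different EHRs. A Python sketch (the `AmbiguousEhrId` exception and `resolve_ehr_id` function are hypothetical):

```python
from typing import Mapping, Optional


class AmbiguousEhrId(ValueError):
    """Raised when the header and the query parameter name different EHRs."""


def resolve_ehr_id(headers: Mapping[str, str], params: Mapping[str, str]) -> Optional[str]:
    """Return the EHR id to scope the query to, or None for no scoping.

    Assumed policy: header and ehr_id parameter may both be present only
    if they agree; otherwise the request is rejected.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    header = {k.lower(): v for k, v in headers.items()}.get("openehr-ehr-id")
    param = params.get("ehr_id")
    if header is not None and param is not None and header != param:
        raise AmbiguousEhrId(f"header says {header!r}, ehr_id parameter says {param!r}")
    return header if header is not None else param
```

A server following this policy would presumably answer the conflicting case with a client error rather than silently picking one source.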

From a developer perspective, I would agree that this feels like a weird design and will probably lead to confusion for developers.

IMO this is not a population query. I think the distinction is that we only want to “see” active EHRs as part of population queries. For example, we might want a list of EHRs that have an open order entry for a medication or image study. In this case, we are really only interested in active records.

Sure, if in theory we used an “IN” operator (or “matches”) and put in all EHR IDs from our system, then we could argue that it is a population query. But I don’t think that is a helpful way to think about it.
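The behaviour described in the EHR Information Model, where is_queryable only affects population queries, could be sketched like this in Python. It assumes the single-EHR decision has already been made (for example by a FROM/WHERE analysis) and uses a hypothetical in-memory `Ehr` record in place of a real EHR_STATUS lookup.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Ehr:
    """Hypothetical stand-in for an EHR and its EHR_STATUS."""
    ehr_id: str
    is_queryable: bool  # mirrors EHR_STATUS.is_queryable


def candidate_ehrs(all_ehrs: Iterable[Ehr], single_ehr_id: Optional[str]) -> List[Ehr]:
    """Select the EHRs a query is evaluated against."""
    if single_ehr_id is not None:
        # Single EHR query: is_queryable is not consulted.
        return [e for e in all_ehrs if e.ehr_id == single_ehr_id]
    # Population query: only EHRs flagged as queryable take part.
    return [e for e in all_ehrs if e.is_queryable]
```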

There is not really a formal definition that I know of for ‘population query’. But in my view a query of this form is just N individual queries being defined in a slightly more efficient fashion.

I see a population query as one that is trying to find out which individuals in a population match some criteria, or what some aggregate characteristic is over the population (e.g. average number of visits per year to the doctor). In a population query (according to this definition) you don’t know initially which EHRs are of interest - indeed, EHR ids are likely to be in the result, not the query.

That is my view as well.

I agree with that, though there is a gray area right now in the specs. My rationale is:

  1. query classes divide into: single EHR queries / population queries
  2. when there is no ehr_id filter (note I said filter, not parameter, which I think is the correct name), the result will be for a population
  3. the specs don’t say whether a query that uses a multi-value filter for ehr_id (like `ehr_id IN (a,b,c)`) is a population one

Maybe we are missing (in the specs) another classification.

@thomas.beale mentioned cohort. IMO a cohort query (I propose that name) is one where you want to target multiple specific EHRs.

When doing population studies you initially want to target ANY EHR (population) and get the EHRs that comply with some condition.

Then you might want to further query, but just those EHRs that resulted from the population query, I would call these cohort queries.

Then you might want to target individual EHRs to check some conditions or, for a population study, not even query individual EHRs.
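The proposed three-way split could be expressed as a small Python classification, assuming the ehr_id values that the FROM/WHERE clauses restrict the query to have already been extracted. “Cohort” is the name proposed in this post, not a spec term, and whether is_queryable should apply to cohort queries is left open here.

```python
from enum import Enum
from typing import FrozenSet, Optional


class QueryKind(Enum):
    SINGLE_EHR = "single EHR"
    COHORT = "cohort"          # proposed term, not from the spec
    POPULATION = "population"


def classify(pinned_ehr_ids: Optional[FrozenSet[str]]) -> QueryKind:
    """Classify a query by the EHR ids its FROM/WHERE clauses restrict it to.

    pinned_ehr_ids is None when the query places no ehr_id restriction at all.
    """
    if pinned_ehr_ids is None:
        return QueryKind.POPULATION
    if not pinned_ehr_ids:
        raise ValueError("an empty set of ehr_ids matches no EHR")
    if len(pinned_ehr_ids) == 1:
        return QueryKind.SINGLE_EHR
    return QueryKind.COHORT
```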

Yep, that is exactly what I thought when I read that part of the spec. I don’t like the system changing my queries because some parameter/header appears in the request.