# openEHR Querying specifications

**Category:** [Technical (archive)](https://discourse.openehr.org/c/technical-archive/156) **Created:** 2008-06-03 15:39 UTC **Views:** 3 **Replies:** 26 **URL:** https://discourse.openehr.org/t/openehr-querying-specifications/14765

---

## Post #1 by @thomas.beale

As part of the ongoing specification work this year, we have started to build some resource pages for the various specifications\. One of them concerns a querying solution for openEHR \- see the wiki page at: http://www.openehr.org/wiki/display/spec/openEHR+Query+Specifications

I have uploaded the Ocean Informatics developed 'Archetype Query Language' \(AQL\) as a candidate solution for querying archetype\-based data\. As explained in the query specification home page, AQL can be treated as a starting point for defining a normative openEHR querying language, or it may be considered to be one candidate amongst several, if there are others available\. Ocean Informatics undertakes to continue the development of this language in the openEHR space, so that if the openEHR community wishes to use AQL, the most recent modifications will be available\.

The timetable we have initially suggested for openEHR is to have a solid development draft ready in Q3 2008, i\.e\. before end September\. See the roadmap for the current delivery plan \- http://www.openehr.org/specifications/spec_roadmap_2008.html \. A stable release should probably be available by mid 2009\.

The current material is therefore intended as a resource for discussion and definition of a query language for openEHR\. A team can be defined after sufficient time for the community to react to this material and determine if it makes sense to use AQL as the basis or to seek other solutions or candidates\.

\- thomas beale

---

## Post #2 by @Tim_Cook2

Hi Tom,

> I have uploaded the Ocean Informatics developed 'Archetype Query
> Language' \(AQL\) as a candidate solution for querying archetype\-based
> data\.
> As explained in the query specification home page, AQL can be
> treated as a starting point for defining a normative openEHR querying
> language, or it may be considered to be one candidate amongst several,
> if there are others available\. Ocean Informatics undertakes to continue
> the development of this language in the openEHR space, so that if the
> openEHR community wishes to use AQL, the most recent modifications will
> be available\.

This certainly 'looks' functional\. I assume that Ocean Informatics has done a fair amount of testing it to get to this point\.

From my \(possibly too brief\) read: are there any cases where the FROM clause wouldn't contain either EHR or DEMOGRAPHICS as its first parameter?

Also, does the FROM clause support wildcard matching such as:

FROM EHR e\[\*\] CONTAINS COMPOSITION c\[openEHR\-EHR\-COMPOSITION\.report\.v1\]

Oooops\! Re\-reading the very first step, I see that the default is ALL EHRs\. But isn't explicit better than implicit in this case? Otherwise this query would read:

FROM CONTAINS COMPOSITION c\[openEHR\-EHR\-COMPOSITION\.report\.v1\]

Correct?

Thanks,
\-\-Tim

---

## Post #3 by @Greg_Caulton

---

## Post #4 by @ian.mcnicoll

Fair point\. Perhaps AQL should support ranges of version numbers to simplify the query, as in many cases the query will not be affected by a structural change to the archetype, e\.g\.

> FROM EHR \[ehr\_id/value=$ehrUid\]
> CONTAINS COMPOSITION \[openEHR\-EHR\-COMPOSITION\.encounter\.v\[BETWEEN 1\.5 AND 2\]\]
> CONTAINS OBSERVATION obs \[openEHR\-EHR\-OBSERVATION\.blood\_pressure\.v\[>=1\]\]
> WHERE \(
> obs/data\[at0001\]/events\[at0006\]/data\[at0003\]/items\[at0004\]/value/value >=
> 140\)

Versions and revisions would need to be handled\.

Ian

---

## Post #5 by @system

I suspect changes between versions could potentially break the paths in the WHERE clause. So maybe the version information isn't significant here, since either the path works and the criteria are checked, or the path doesn't work and fails silently.

/Rong

---

## Post #6 by @mikael

I disagree with Rong.
Suppose, for example, that the change between the first and second version is a change in a position value set from “sitting”, “standing” and “other” to “sitting”, “standing”, “lying” and “other”. If a query is then written for the first version of the archetype, searching for all cases where the position is “not sitting” and “not standing”, the query will search for the position “other” and return a correct answer.

If we implement Rong’s suggestion, the query will also work with the second version of the archetype, but it will return a different answer from the one intended. It will return the cases where the position is “not sitting”, “not standing” and “not lying”, instead of the intended “not sitting” and “not standing”.

I therefore think that excluding the version information can result in a mess.

/Micke

---

## Post #7 by @Karsten_Hilbert

It shouldn't, of course, be excluded by default, but it should be excludable on demand, say by allowing regex matching for path definitions\.

Karsten

---

## Post #8 by @mikael

Karsten Hilbert wrote:

---

## Post #9 by @system

Hi Tim,

> Hi Tom,
>
> > I have uploaded the Ocean Informatics developed 'Archetype Query
> > Language' \(AQL\) as a candidate solution for querying archetype\-based
> > data\. As explained in the query specification home page, AQL can be
> > treated as a starting point for defining a normative openEHR querying
> > language, or it may be considered to be one candidate amongst several,
> > if there are others available\. Ocean Informatics undertakes to
> > continue the development of this language in the openEHR space, so
> > that if the openEHR community wishes to use AQL, the most recent
> > modifications will be available\.
>
> This certainly 'looks' functional\.
> I assume that Ocean Informatics has
> done a fair amount of testing it to get to this point\.

\[Chunlan Ma\] Yes, Ocean has implemented the parser and query engine to process AQL queries, even though some features are not supported yet; they are in our plan\. Currently, all Ocean products are using AQL to retrieve data from the backend\.

> From my \(possibly too brief\) read; are there any cases where the FROM
> clause wouldn't contain either EHR or DEMOGRAPHICS as its first
> parameter?

\[Chunlan Ma\] Certainly, a FROM clause doesn't necessarily include EHR if it is not required\. In the FROM clause, if you specify a particular ehr\_id value as the criterion, then you are querying data within a single EHR\. Otherwise, you are doing population queries, i\.e\. querying data across all EHRs\.

If you want to retrieve EHR\-associated information, then you need to specify the EHR variable in the FROM clause, such as:

SELECT e/ehr\_id/value FROM EHR e CONTAINS\.\.\.

However, if you are not interested in the EHR\-associated information, then you can just leave it out\. For example:

SELECT c/composer/name FROM COMPOSITION c\[openEHR\-EHR\-COMPOSITION\.report\.v1\]\.\.\.

> Also, does the FROM clause support wildcard matching such as:
>
> FROM EHR e\[\*\] CONTAINS COMPOSITION c\[openEHR\-EHR\-COMPOSITION\.report\.v1\]
>
> Oooops\! re\-reading the very first step I see that the default is ALL
> EHRs\. But, isn't explicit better than implicit in this case? Otherwise
> this query would read:
>
> FROM CONTAINS COMPOSITION c\[openEHR\-EHR\-COMPOSITION\.report\.v1\]
>
> Correct?

\[Chunlan Ma\] We don't use a wildcard in the FROM clause because, as you said, it has the same meaning as FROM EHR e CONTAINS\.\.\.\.
If it is required, a wildcard can be supported in the WHERE or SELECT clause for the openEHR path, such as //\*\[at0002\]

Cheers,
Chunlan

---

## Post #10 by @Heath_Frankel3

Versions should be handled using the regular expression syntax of the archetype ID, as is done in ADL to represent slot constraints and action\_archetype\_id in ACTIVITY\. E\.g\.

\[openEHR\-EHR\-COMPOSITION\.encounter\.v1\*\]

BTW, using the OR operator you could have had

\.\.\. CONTAINS COMPOSITION \[openEHR\-EHR\-COMPOSITION\.encounter\.v1\] OR COMPOSITION \[openEHR\-EHR\-COMPOSITION\.encounter\.v2\] \.\.\.

Heath

> Fair point\. Perhaps AQL should support ranges of version numbers to
> simplify the query as in many cases the query will not be affected by
> a structural change to the archetype
>
> e\.g\.
>
> > FROM EHR \[ehr\_id/value=$ehrUid\]
> > CONTAINS COMPOSITION \[openEHR\-EHR\-COMPOSITION\.encounter\.v\[BETWEEN 1\.5 AND 2\]
> > CONTAINS OBSERVATION obs \[openEHR\-EHR\-OBSERVATION\.blood\_pressure\.v\[>=1\]
> > WHERE \(
> > obs/data\[at0001\]/events\[at0006\]/data\[at0003\]/items\[at0004\]/value/value >=
> > 140
>
> Versions and revisions would need to be handled\.
>
> Ian
>
> > Perhaps this has been answered but as the archetypes change version is it
> > expected that the AQL will need to keep up with that \- I assume our historic
> > data would be specific to the archetype version \- not 'upgraded' ?
> >
> > i\.e\. after v1:
> >
> > FROM EHR \[ehr\_id/value=$ehrUid\] CONTAINS COMPOSITION
> > \[openEHR\-EHR\-COMPOSITION\.encounter\.v1\]
> > CONTAINS OBSERVATION obs \[openEHR\-EHR\-OBSERVATION\.blood\_pressure\.v1\]
> > WHERE obs/data\[at0001\]/events\[at0006\]/data\[at0003\]/items\[at0004\]/value/value

---

## Post #11 by @system

Dear All,

There is a paper about AQL published in MedInfo 2007\. It may be of interest\.

http://www.openehr.org/downloads/publications/archetypes/MedInfo_2007_EQL_MA.pdf

Cheers,
Chunlan

---

## Post #12 by @Stef_Verlinden1

I'm not a technical person, but to me it seems very cumbersome if such 'differences' could exist between 2 versions of the same archetype. This would mean that for every query one has to go into the details of every version of that AT, which could mean a lot of work.

To my understanding, versions of ATs should be 'backward compatible'. One can only add (and maybe remove) items, but never rename an existing item. Only then can a lot of unnecessary work for 'query-makers' be avoided.

Is this assumption indeed correct?

Cheers,
Stef

---

## Post #13 by @mikael

The difference I mentioned is an addition of a value to a value set, not a renaming. It is just another variant of the classical “not elsewhere classified” problem in classifications like ICD.

We probably have to be even more aware of the problem of varying value sets when data is reused, if we use queries to retrieve value sets from external terminologies instead of including the value sets in the archetype.
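
Mikael's value-set example (Posts #6, #20 and #26) can be made concrete with a small sketch. The record layout and function names below are hypothetical, and the filters are plain Python list comprehensions, not AQL. The only facts taken from the thread are the two value sets: v1 contains sitting, standing and other, and v2 adds lying. A criterion tied to the v1 coding (position = "other") quietly drops the v2 "lying" record. The criterion stated as "not sitting and not standing" gives the intended answer for both versions, which is the resolution Rong and Mikael agree on.

```python
# Hypothetical records, each tagged with the archetype version it was recorded against.
# v1 value set: sitting, standing, other.  v2 value set adds: lying.
RECORDS = [
    {"version": "v1", "position": "sitting"},
    {"version": "v1", "position": "other"},   # in v1, anything not sitting/standing is "other"
    {"version": "v2", "position": "standing"},
    {"version": "v2", "position": "lying"},
    {"version": "v2", "position": "other"},
]


def matches_other(records):
    """Criterion tied to the v1 coding: position = 'other'."""
    return [r for r in records if r["position"] == "other"]


def matches_not_sitting_or_standing(records):
    """Criterion stated directly: position is neither 'sitting' nor 'standing'."""
    return [r for r in records if r["position"] not in {"sitting", "standing"}]


if __name__ == "__main__":
    print(len(matches_other(RECORDS)))                    # 2: the v2 "lying" record is missed
    print(len(matches_not_sitting_or_standing(RECORDS)))  # 3: intended answer for both versions
```

The second filter encodes the clinical intent rather than the v1 coding, so it does not depend on which values a later version adds.
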
Greetings,
Mikael

---

## Post #14 by @Karsten_Hilbert

If it's excluded on demand in a particular query, then the query is written to work on any version \- quite by design\. Of course, the version can be included in the SELECT list so that one may learn which version actually matched the query\.

Karsten

---

## Post #15 by @Karsten_Hilbert

\[openEHR\-EHR\-COMPOSITION\.encounter\.v\\d\+\], hopefully?

Karsten

---

## Post #16 by @thomas.beale

Stef Verlinden wrote:

> I'm not a technical person but to me it seems very cumbersome if such
> 'differences' could exist between 2 versions of the same archetypes\.
> This would mean that for every query one has to go into detail of
> every version of that AT which could mean al lot of work\.
>
> To my understanding versions of AT's should be 'backward compatible'\.
> One can only add \(and maybe remove\) items, but never rename an
> existing item\. Only then a lot of unnecessary work for 'query\-makers'
> can be avoided\.
>
> Is this assumption indeed correct?

all, we need to be very clear about archetype 'versions'\. There are two dimensions to the problem of 'archetype change' that need to be remembered\.

The first is the archetype development lifecycle\. Before an archetype is finally released after progressing through the quality assurance process, it will undergo many changes as part of normal authoring\. During this period, no production data are created, and there is no issue of backward or forward compatibility of archetypes or queries with data\.

The second dimension is to do with the kinds of change that can be made\. First of all, many variations in data can be accommodated with no change at all \- archetypes are constraint models, not cookie\-cutter templates\. Most archetypes are designed to be very open\. Many changes to archetypes, including all translations and any semantic change that loosens the constraints, can be accommodated by a 'revision'\. Only incompatible changes result in a version change, i\.e\.
a change in the '\.v1', '\.v2' etc\. part of the archetype identifier\.

Now, after an archetype is released, such changes should be minimal, if not non\-existent\. Remember: this means only incompatible changes, such as ones that change the structure of information, make optional items mandatory and other such things\. When a new version is created, a data migration algorithm has to be published with it\. The result of this is that new \_versions\_ of officially released archetypes should be very low in number and should always have a formal definition of how to migrate data created using an older version\.

The confusing factor that people are seeing now is that, due to the current tooling, most archetype authors are creating new 'versions' when in fact the changes are only new revisions\. We are also seeing many archetypes that have not been quality assured\. These limitations are being addressed with new tooling that will soon be available, and a better defined version numbering system, using a 3\-part identifier\. One of the things the new ADL parser will be able to do is to determine whether a given change requires a new version or not\. The algorithms for doing this are not trivial and it has taken some time to get them worked out\.

In production systems, different archetype versions may give rise to the following:

- automatic migration of data from an old version to a new version of an archetype
- automatic on\-the\-fly translation of data from an old version to the form required by a new version

If either or both facilities are in operation, then only one version of any given archetype will effectively be visible in the database \- the latest\. For situations where data created by more than one version is left intact, we consider this as if it were two archetypes\. I\.e\. there is a general need, if you are querying 'systolic blood pressure', to find all archetypes in which this could be recorded and to generate an appropriate query\.
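
Thomas's suggestion of treating two versions of one logical archetype like two distinct archetypes can be sketched as follows. This is a minimal illustration under stated assumptions, not openEHR tooling. The regular expression is a simplification that accepts only IDs shaped like `openEHR-EHR-OBSERVATION.blood_pressure.v1`, with no revision suffix and no `draft` variant. `hypothetical_vitals` is an invented archetype name. The `CONTAINS ... OR ...` string copies the form Heath used in Post #10 and has not been checked against an AQL grammar.

```python
import re
from collections import defaultdict

# Simplified archetype ID shape, e.g. openEHR-EHR-OBSERVATION.blood_pressure.v1
# (assumption: no revision suffix such as ".v1.2" and no "draft" variants).
ARCHETYPE_ID = re.compile(
    r"(?P<originator>[A-Za-z]+)-(?P<rm_name>[A-Za-z]+)-(?P<rm_entity>[A-Z_]+)"
    r"\.(?P<concept>[A-Za-z0-9_-]+)\.v(?P<version>\d+)"
)


def split_id(archetype_id):
    """Return (logical archetype key, major version number) for one ID."""
    m = ARCHETYPE_ID.fullmatch(archetype_id)
    if m is None:
        raise ValueError(f"unrecognised archetype id: {archetype_id}")
    key = (m["originator"], m["rm_name"], m["rm_entity"], m["concept"])
    return key, int(m["version"])


def group_by_logical_archetype(archetype_ids):
    """Map each logical archetype to the sorted list of versions found for it."""
    groups = defaultdict(list)
    for aid in archetype_ids:
        key, version = split_id(aid)
        groups[key].append(version)
    return {key: sorted(versions) for key, versions in groups.items()}


def contains_clause(rm_type, archetype_ids):
    """Each found ID, whatever its version, becomes one OR'ed alternative."""
    return "CONTAINS " + " OR ".join(f"{rm_type} [{aid}]" for aid in archetype_ids)


if __name__ == "__main__":
    found = [
        "openEHR-EHR-OBSERVATION.blood_pressure.v1",
        "openEHR-EHR-OBSERVATION.blood_pressure.v2",
        "openEHR-EHR-OBSERVATION.hypothetical_vitals.v1",  # invented for illustration
    ]
    print(group_by_logical_archetype(found))
    print(contains_clause("OBSERVATION", found))
```

Grouping only reports which IDs share a concept. Each ID, whichever version it is, still becomes its own alternative in the query, which is the equivalence Thomas describes.
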
If, let's say, 2 out of 3 found archetypes happen to be two versions of one logical archetype, this is essentially the same situation as if 3 distinct archetypes had been found that carried this data item\. The key to managing this is the forthcoming online archetype repository classification ontology, which allows you to do this search\. This ontology already exists in a basic form, and is what supports the querying at the old prototype repository at http://archetypes.com.au.

I hope this clarifies things\.

\- thomas

---

## Post #17 by @Stef_Verlinden1

Absolutely, thanks.

Stef

---

## Post #18 by @Heath_Frankel3

\[openEHR\-EHR\-COMPOSITION\.encounter\.v1\*\] \(or perhaps more correctly \[openEHR\-EHR\-COMPOSITION\.encounter\.v1\.\*\], where the dot means any character, not the version delimiter\) and \[openEHR\-EHR\-COMPOSITION\.encounter\.v\\d\+\] are different\. The first allows all revisions of \.v1 \(e\.g\. v1\.1, v1\.2, \.\.\), whilst the latter allows all versions\. The rules for archetype major versions state that they cannot be compared, whereas revisions have strict rules about what can be modified, to ensure compatibility\. Thomas has explained this elsewhere\. Therefore the former is more useful in queries\.

Heath

---

## Post #19 by @system

Dear all,

The text below by Thomas warrants a conclusion:

- *open*EHR needs a (place in a) document that defines the correct wording and meaning of: version and revision.

To my mind these words are too similar and can generate confusion.

Alternatives:

- Package (a new Archetype that breaks the previous package archetype, plus a conversion script from the Old to the New Archetype)
- Version (a new Archetype resulting only from editorial changes, not breaking the previous version)

Gerard

> The result of this is that new _versions_ of officially released
> archetypes should be very low in number and should always have a formal
> definition of how to migrate data created using an older version.
>
> The confusing factor that people are seeing now is that due to the
> current tooling, most archetype authors are creating new 'versions' when
> in fact the changes are only new revisions

-- -- Gerard Freriks, MD Huigsloterdijk 378 2158 LR Buitenkaag The Netherlands T: +31 252544896 M: +31 620347088 E: [gfrer@luna.nl](mailto:gfrer@luna.nl) Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. Benjamin Franklin 11 Nov 1755

---

## Post #20 by @system

> I disagree with Rong.
>
> If for example the change between the first and second version is a change in a position value set from "sitting", "standing" and "other" to "sitting", "standing", "lying" and "other". If then a query is written for the first version of the archetype searching for all cases where the position is "not sitting" and "not standing" the query will search for the position "other" and return a correct answer.
>
> If we implement Rong's suggestion the query will work also with the second version of the archetype, but it will return another answer than the intended, namely the cases where the position is "not sitting" and "not standing" and "not lying" instead of the intended "not sitting" and "not standing".

Micke, what if you keep the original search criteria "not sitting" and "not standing" instead of searching for "other"? Will you get the expected result with both versions?

I was thinking of potentially broken paths between changes when I made my guess. Since archetypes are now expressed more as "maximum datasets", it seems that changes removing parts of an archetype will be kept to a minimum. Most changes will perhaps be additions to allow more relevant data entries. If this is the case, the original path should remain valid across versions.

I was too "quick" about a broken path failing silently. The RM PATHABLE.item_at_path function (underlying path-based query support in the RM) requires the path to be valid.
Would this mean that, during the query execution phase, any invalid path in the query should stop execution, or exclude the current data instance from the result set? I think we need to make this clear in the query specification.

The idea of "hiding" the version number can be achieved if the query generator tool is smart enough to tell, for a given set of versions of archetypes, whether a query is not only valid with respect to its paths, but also semantically consistent across all versions. We probably want similar validation on queries once they hit query engines.

Cheers,
Rong

---

## Post #21 by @Tim_Cook2

Hmm, it seems to me that you are introducing a new term AND omitting a term that is in use, while not clarifying the previous terms, which Tom did quite well\.

I believe that Tom very well defined a version change as a change that substantially modifies a previous version in such a manner as to render it incompatible with previous versions\. A Revision \(which you omitted\) is a change that may further constrain or otherwise modify an archetype but does not render the expressed information unusable\. This is also the same information that can be found in the relevant documents\.

But I see no reference to 'Package' as far as archetypes are concerned\. Package implies a grouping of some type\. An archetype is \(AFAIK\) considered to be the expression of a single clinical concept\.

Regards,
Tim

---

## Post #22 by @ian.mcnicoll

Hi Gerard,

I'm afraid I agree with Tim on this one\. The difference between 'breaking' Version changes and non\-breaking Revisions is clearly documented in the specification\. The reason for confusion and non\-use within the NHS is simply that the current tools do not support Revisions\. Now that the Template Model has been produced, I would hope that we can properly support Revisions in both Archetype Editor and Template Designer\.
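
The archetype ID patterns traded in this thread can be checked mechanically: Karsten's in Post #15, Heath's "more correct" form in Post #18 and Peter's in Post #23. The sketch below assumes that an ID predicate must match the whole archetype ID (Python's `re.fullmatch`). That is an assumption; neither the posts nor the AQL BNF state it. The candidate version suffixes are made up. Under that assumption:

- Heath's `v1.*` also accepts `.v10` and `.v1draft`.
- Peter's `v1\..*` accepts only the dotted revision, and not a bare `.v1`.
- Karsten's `v\d+` accepts every plain major version, but not the dotted revision `.v1.2`.

```python
import re

PREFIX = "openEHR-EHR-COMPOSITION.encounter."

# Patterns as quoted in the thread. The unescaped dots in the prefix match any
# character, which happens to be harmless for these candidates.
PATTERNS = {
    "Heath (Post #18)": r"openEHR-EHR-COMPOSITION.encounter.v1.*",
    "Peter (Post #23)": r"openEHR-EHR-COMPOSITION.encounter.v1\..*",
    "Karsten (Post #15)": r"openEHR-EHR-COMPOSITION.encounter.v\d+",
}

# Made-up version suffixes to test against.
SUFFIXES = ["v1", "v1.2", "v10", "v2", "v1draft"]


def accepted(pattern, suffixes=SUFFIXES):
    """Return the suffixes whose full archetype ID matches the whole pattern."""
    return [s for s in suffixes if re.fullmatch(pattern, PREFIX + s)]


if __name__ == "__main__":
    for author, pattern in PATTERNS.items():
        print(f"{author:20} {accepted(pattern)}")
```

If an engine used substring search (`re.search`) instead of whole-ID matching, `v\d+` would also accept `.v1.2`. The matching mode is therefore worth pinning down in the query specification.
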
I appreciate that at first, Version / Revision is confusing because of the similarity in English, but I think it is too late now to change these terms, which at least have clear technical definitions\.

I was interested to see Thomas's comment that a future version of the parser will offer guidance on whether an altered archetype needs to be re\-versioned\.

BTW, what would be the equivalents in Dutch for Revision and Version?

Ian

---

## Post #23 by @Peter_Gummer1

Heath Frankel wrote:

> \[openEHR\-EHR\-COMPOSITION\.encounter\.v1\*\] \(or perhaps more correctly
> \[openEHR\-EHR\-COMPOSITION\.encounter\.v1\.\*\], where the dot means any
> character not the version delimiter\) and \[openEHR\-EHR\-COMPOSITION\.encounter\.v\\d\+\]
> are different\. The first allows all revisions of \.v1 \(e\.g\. v1\.1, v1\.2, \.\.\)

Close, but not quite\!

\[openEHR\-EHR\-COMPOSITION\.encounter\.v1\.\*\] allows all revisions of \.v1, \.v10, \.v11, \.\.\. \.v100, \.v101, etc\.

To allow all revisions of \.v1, we would need this:

\[openEHR\-EHR\-COMPOSITION\.encounter\.v1\\\.\.\*\]

But what about \.v1draft? This regex wouldn't catch it\. Does this matter? Or is that old "draft" convention going to be phased out?

\- Peter

---

## Post #24 by @Heath_Frankel3

The v1draft convention is already deprecated\. The BNF for AQL deliberately doesn't support it, to ensure only non\-draft archetypes are used when committing/retrieving data\.

Heath

---

## Post #25 by @system

I must disappoint you:

Dutch: Revisie, versie.

Gerard

> BTW What would be the equivalents in Dutch for Revision and Version?

---

## Post #26 by @mikael

> Mikael Nyström wrote:
>
>> If for example the change between the first and second version is a change
>> in a position value set from "sitting", "standing" and "other" to "sitting",
>> "standing", "lying" and "other". If then a query is written for the first
>> version of the archetype searching for all cases where the position is "not
>> sitting" and "not standing" the query will search for the position "other"
>> and return a correct answer.
>>
>> If we implement Rong's suggestion the query will work also with the second
>> version of the archetype, but it will return another answer than the
>> intended, namely the cases where the position is "not sitting" and "not
>> standing" and "not lying" instead of the intended "not sitting" and "not
>> standing".
>
> Micke, what if you keep the original search criteria "not sitting" and "not
> standing" instead of searching "others", will you get expected result with
> both versions?

Yes, that works, and that is the proper way to do it. My example showed what people quite often do without realizing the consequences.
They think that if something works in a specific version, it will work in subsequent versions. I think I have seen these kinds of problems too often in other areas of medical informatics (like the “not elsewhere classified” problem in ICD), and I therefore think that people will also do things without realizing the consequences when they query archetypes.

Greetings,
Mikael

---

## Post #27 by @Tim_Cook2

The previously referred to AQL BNF carries this header:

"Name" = 'EhrBank Query Lanaguage \(EQL\) \- \{Equal\}'
"Version" = '0\.4'
"Date" = '14 September 2006'
"Author" = 'Chunlan Ma & Heath Frankel'

We know it has been renamed from EQL to AQL, but I am wondering if there is a newer version available anywhere?

Thanks,
Tim