# [EHRbase] Storing and querying data without a standalone archetype

**Category:** [Platform](https://discourse.openehr.org/c/platform-implem/7)
**Created:** 2024-03-28 17:45 UTC
**Views:** 926
**Replies:** 39
**URL:** https://discourse.openehr.org/t/ehrbase-storing-and-querying-data-without-an-standalone-archetype/5058

---

## Post #1 by @damoca

**System**: EHRbase
**Version**: v0.32.0

I've come across two problems that are probably closely related.

I was doing some tests and I created a single archetype containing the full structure I needed. That is, a COMPOSITION, which includes an OBSERVATION, which includes several ELEMENTs.

![image|356x374](upload://87sToKr8npORYuqrtvsCXAwrTrw.png)

I know that's not common good modelling practice, but it is completely legal.

With that archetype I created the template, and then some instances using the Cabolabs openEHR toolkit. To my surprise, when I tried to load those instances in EHRbase, I got this error (at0004 is the OBSERVATION):

```
{
  "error": "Unprocessable Entity",
  "message": "/content[at0002, 2]/items[at0004, 1]: Invariant Is_archetypeRoot failed on type OBSERVATION"
}
```

My deduction is that since the archetype_node_id in the OBSERVATION instance has an atNNNN code and not an archetypeID, it is rejected. I can understand that this kind of instance could have an impact on the indexing and querying processes, but nothing in the specifications says that it is an incorrect instance.

This deduction was supported later, when I was trying to write an AQL query that returns all ELEMENTs in all instances.

```
SELECT ele
FROM EHR e
CONTAINS COMPOSITION c
CONTAINS OBSERVATION o
CONTAINS ELEMENT ele
```

First I got no results. Then I made another test creating an ELEMENT archetype, inserted it into the OBSERVATION, created instances, and only those identified elements were successfully returned with the previous query.
That means that an archetypeID is needed to index and query data, although ELEMENTs without it are accepted by the server without complaints.

I see a lack of coherence there:

* I cannot store OBSERVATION instances (I guess that it will happen with any kind of ENTRY) if they are not defined as an independent archetype.
* I cannot query ELEMENT nodes unless they are defined in an independent archetype, although they can be stored being "anonymous" (as is always the case).

I insist, I completely understand that there could be efficiency reasons for these limitations, but they are affecting valid use cases of the specifications.

Any thoughts about this?

---

## Post #2 by @birger.haarbrandt

Would you mind attaching the template and an example composition?

Edit: I think the validation is done by Archie. I would have to check how the new AQL engine deals with the structure.

---

## Post #3 by @birger.haarbrandt

@damoca Alright, I was able to reproduce this. In EHRbase you can set a configuration like this:

```
# Option to disable strict invariant validation.
disable-strict-validation: true
```

This will allow you to store the composition. We might need to discuss with @MattijsK about any implications for Archie.

Then I also checked with EHRbase's new AQL engine and you can do the following:

```
{
  "q": "SELECT o FROM EHR e CONTAINS COMPOSITION c CONTAINS OBSERVATION o[at0002] CONTAINS ELEMENT e "
}
```

and your query also works:

```
{
  "q": "SELECT ele FROM EHR e CONTAINS COMPOSITION c CONTAINS OBSERVATION o CONTAINS ELEMENT ele "
}
```

---

## Post #4 by @joostholslag

I don't think it's Archie, since I made a similar archetype in the Nedap archetype editor, which validates archetypes using Archie (iirc), and it doesn't give any errors.

https://archetype-editor.nedap.healthcare/advanced/joost/archetypes/nl.joostholslag::openEHR-EHR-COMPOSITION.Teat.v0.0.1/editor

I thought ehrbase only used the RM classes from Archie, not the archetype processing, since that's adl2 based.
But maybe I remember wrongly.

---

## Post #5 by @joostholslag

Since in adl2 templates are technically archetypes, I regularly use this pattern of inline definition of archetypeable structures (mostly sections). So I definitely feel it's a useful pattern to support.

---

## Post #6 by @birger.haarbrandt

@joostholslag thanks a lot for taking a look into it! Then I think we need to investigate a bit on the EHRbase side.

---

## Post #7 by @thomas.beale

The information structure is certainly valid. It's not necessarily even 'bad modelling' - it's just that the typical archetype joining (created by slots or use_archetype references) is not being used, so there are no archetype ids at the usual locations such as the root node of OBSERVATION.

A vanilla system built on published principles should therefore be able to deal with this.

However... to make querying work in a reasonable way we tend to add some extra (non-published) requirements - that root points of ENTRYs, for example, are always a new archetype. Without this, it's hard to expect querying to work properly, since some ENTRYs won't be found by the usual methods.

Note that we already do both kinds of modelling with CLUSTERs, and this is assumed. Thus, an AQL query looking for (say) CLUSTER device under OBSERVATION abc will not find it if there are inline CLUSTER structures inside the containing OBSERVATION rather than device data always being represented by a device CLUSTER archetype.

There are two possible solutions I can see:

* we say that an openEHR 'system' (CDR, however we want to call it) breaks information up according to certain levels, driven by some 'querying profile' or 'configuration'
* we say that querying needs to be able to deal with any information structure, regardless of where the boundaries are, and provide ways to make it work.

Following the first option, the 'levels' would presumably be ones that have semantic significance.
We generally agree that ENTRYs make sense as stand-alone statements of truth, for example. The story gets more complicated with CLUSTERs, so it's not quite a question of 'levels' but of independent entities having their own archetypes. Hence, a CLUSTER archetype is required for device, but not (generally) for its subparts.

The second option implies that querying would be complicated by trying to allow both modes of representation for what we consider to be information entities representing independent (and therefore independently queryable) entities.

I would therefore favour a modelling approach approximating the first option.

---

## Post #8 by @birger.haarbrandt

@thomas.beale Not sure about Better and DIPS, but with the new AQL engine we seem to be able to deal with option 2 in EHRbase. So depending on the general experience of implementers, I think option 2 can work as well.

---

## Post #9 by @damoca

I attach the archetype, the template and a JSON instance. As I said, it was just a quick technical example.

[openEHR-EHR-COMPOSITION.proves_tecniques.v0.adl|attachment](upload://8IKw5Y7esoUT9Go67k3OTWDe3gf.adl) (3,6 KB)
[Proves tecniques.opt|attachment](upload://sa8YZXBv5E10Q5PnXeR2p1XDFUV.opt) (18,3 KB)
[Proves tècniques - instància.json|attachment](upload://p9xPnpd3DscjYZNW92YyPndo9zR.json) (6,2 KB)

---

## Post #10 by @thomas.beale

I don't know that option 2 would be that hard to implement; what we have to think of is the numerous queries that are written over long periods of time, and managed as knowledge resources. Having inline structures representing separate entities will be problematic.

I believe we should have a convention that says that an information object that IS-ABOUT an independent entity (thing or process - including an event, as understood by e.g. [BFO2](https://specifications.openehr.org/releases/UML/latest/index.html#Diagrams___19_0_3_8fe028d_1652093019886_864595_5188)) should always be modelled by its own archetype.
Whereas parts of entities (things, process segments etc) may be modelled inline or (for re-use purposes) as separate archetypes.

---

## Post #11 by @damoca

[quote="birger.haarbrandt, post:3, topic:5058"]
In EHRbase you can set a configuration like this:

```
# Option to disable strict invariant validation.
disable-strict-validation: true
```
[/quote]

Thank you! I'm using the docker distribution, and I imagine that option corresponds to this docker run parameter:

`SERVER_DISABLESTRICTVALIDATION --> Disable strict validation of openEHR input`

I will try it, although it sounds as if it will accept anything you send. I hope not :sweat_smile:

---

## Post #12 by @damoca

I think it is important to distinguish between functionality and performance of AQL.

It is reasonable to expect that in AQL you can query at least for any LOCATABLE children because, well, they are locatable. Then, if they are archetyped, they can be indexed more easily and the query performance is better... that's fair but just complementary.

BTW, this is a good example of what @pablo mentioned some weeks ago about the need to improve the AQL specifications. What can be queried in the FROM clause? Right now it is quite an informal definition:

![image|690x178](upload://3LWYl58KGiWePTOLNm4kueawitL.png)

Does that include other non-LOCATABLE classes?

https://discourse.openehr.org/t/aql-to-database-sql-convertor/4967/7?u=damoca

---

## Post #13 by @birger.haarbrandt

Hi @damoca, it won't accept anything, it will still check for most constraints from the OPT + RM.

I checked with your OPT and example and can confirm that you can do stuff like this:

```
{
  "q": "SELECT tree FROM COMPOSITION c CONTAINS OBSERVATION o[at0004] CONTAINS ITEM_TREE tree[at0006] CONTAINS ELEMENT e[at0007]"
}
```

---

## Post #14 by @damoca

I tried with the parameter you mentioned, and now the instance is accepted :+1:

The AQL is not working though.
I'm willing to take a look at that new AQL engine implementation :grin:

```
{
  "error": "Bad Request",
  "message": "Could not process query/stored-query, reason: org.antlr.v4.runtime.misc.ParseCancellationException: AQL Parse exception: line 1: char 54 mismatched input 'at0004' expecting ARCHETYPEID"
}
```

In any case, my idea was not even to need the atNNNN code in the AQL, but to filter by a term_binding code. But that's another story.

---

## Post #15 by @birger.haarbrandt

> In any case, my idea was not even to need the atNNNN code in the AQL, but to filter by a term_binding code. But that's another story.

Can you share some details on what you have in mind?

---

## Post #16 by @damoca

There are many facets of the problem.

* We are working with multilingual templates and instances, thus we have to avoid any filter using textual names.
* We are working with clones of the nodes, so the atNNNN code is repeated, but the term_binding changes for each clone, so we can use that for filtering results.
* And finally, if we really believe in semantic interoperability, we should be able to locate any structure by its term_binding, no matter where it is located in one or multiple archetypes.

Pushing the technology to its limits :smiley:

---

## Post #17 by @thomas.beale

Something we are doing in Graphite is to use LOINC codes to name every data node in every archetype, including archetypes created from openEHR archetypes. In ADL2 these will appear as term bindings. These will be published openly soon (we need to make a bit more progress on obtaining codes, which means quality checking natural language keys) and I would recommend we think about the same approach in openEHR.

---

## Post #18 by @pablo

[quote="damoca, post:1, topic:5058"]
With that archetype I created the template, and then some instances using the Cabolabs openEHR toolkit.
[/quote]

@damoca did you check the generated instance was correct?
Just double checking whether this is an issue in the openEHR Toolkit rather than in EHRbase. Though I don't remember making any assumptions in the instance generator about where SLOTs should be used.

---

## Post #19 by @damoca

Not in full detail, but I took a quick look at it and it seemed right.

---

## Post #20 by @siljelb

That's a very interesting approach! What happens if there aren't any semantically identical LOINC codes?

---

## Post #21 by @SevKohler

[quote="damoca, post:16, topic:5058"]
And finally, if we really believe in semantic interoperability, we should be able to locate any structure by its term_binding, no matter where it is located in one or multiple archetypes.
[/quote]

This is very important, thinking about the strength of querying children of codes using e.g. a term server.

There is also the question of how this code appears in the composition. The binding from the archetypes should appear in the composition as a term mapping, pulled from the archetype bindings.

---

## Post #22 by @SevKohler

Do you use term bindings for that?

---

## Post #23 by @damoca

[quote="SevKohler, post:21, topic:5058"]
This is very important, thinking about the strength of querying children of codes using e.g. a term server.
[/quote]

Exactly, in theory AQL supports a function to query subsets (or even an ECL expression). But I don't know if any provider implements this.

https://specifications.openehr.org/releases/QUERY/Release-1.1.0/AQL.html#_terminology

```
WHERE e/value/defining_code/code_string matches TERMINOLOGY('expand', 'hl7.org/fhir/4.0', 'http://snomed.info/sct?fhir_vs=isa/50697003')
```

[quote="SevKohler, post:21, topic:5058"]
There is also the question of how this code appears in the composition.
[/quote]

In the data instance, the code goes into the **name** attribute of the RM class. It could be inside the mappings of the DV_TEXT, or a defining_code of a DV_CODED_TEXT.
```
Sistòlica SNOMED-CT 271649006 111 mm[Hg]
```

---

## Post #24 by @SevKohler

[quote="damoca, post:23, topic:5058"]
In the data instance, the code goes into the **name** attribute of the RM class. It could be inside the mappings of the DV_TEXT, or a defining_code of a DV_CODED_TEXT.
[/quote]

Yeah, but what about e.g. DV_ORDINAL answers? The only really consistent solution here is using term_mapping tbh (if it's not a DV_CODED per se). Also, DV_CODED allows only for one coding, so what about LOINC? Bind codings to them and render them into the composition using term mappings.

---

## Post #25 by @damoca

The symbol of the DV_ORDINAL is a DV_CODED_TEXT, so you should be able to filter the values that you are looking for in the WHERE clause.

I thought I was brave enough by wanting to search ELEMENTs by their term mapping, since we should be careful with the context where those elements are used. I can't imagine querying for a DV_ORDINAL value (i.e. select any DV_ORDINAL whose value is "sitting") without selecting at least the appropriate ELEMENT first.

Regarding the multiple codings, the only solution is to use the **mappings** attribute of the DV_TEXT to put the equivalent codes there.

---

## Post #26 by @thomas.beale

[quote="siljelb, post:20, topic:5058"]
codes
[/quote]

We provide the keys, i.e. rubrics - which are either archetype text fields from archetype terms, or may be something adjusted from that - to LOINC.org and they give us provisional codes (which are proper LOINC codes, i.e. NNNNN-N numerics), and then they review later. So sometimes they may rescind a provisional code (or maybe its meaning). We're about to test this process with LOINC for the first time this week, so I'll report on how it goes.

It might seem weird extending LOINC to 'everything' when most people are used to thinking of it as a lab coding system, but given the LOINC / SNOMED collaboration, it seems to make sense - SNOMED enables subsumption between LOINC codes to be asserted.
We have also worked out a way of marking archetype fields with 'canonical LOINC codes'. Example: the LOINC code for 'body temperature' would be used instead of any specialised variant such as 'body temperature, tympanic method' (a typical precoordinated LOINC code).

More soon.

---

## Post #27 by @pablo

Can you share the compo?

---

## Post #28 by @damoca

It was in a previous message :+1:

---

## Post #29 by @SevKohler

I speak of the DV_ORDINAL answers, why shouldn't you query on answers? There are other examples. Also, you will then have term mappings plus a coding in the DV_CODED_TEXT name, and sometimes you have at-codes as answers that you annotate with term mappings. I mean both are ok, but I'd rather have my codings all in one place ;)

---

## Post #30 by @damoca

[quote="SevKohler, post:29, topic:5058"]
I speak of the DV_ORDINAL answers, why shouldn't you query on answers?
[/quote]

I did not say that you cannot query them, I said that you probably first need to query the question (the container ELEMENT), and then filter the answers you need in the WHERE clause. :wink:

---

## Post #31 by @SevKohler

Sure, but what do you do if the ELEMENT coded text uses a LOINC code as its name instead of the SNOMED code you expect, while the SNOMED code, on the other hand, is contained in the term mapping to the LOINC code? Then you need to query for both on the name. I think just using the term mappings here makes things easier and covers everything at once ;) All eggs in one basket, except for typical DV_CODED_TEXTs, which again often have at000* local codes that you also don't want to change or replace with SNOMED, and would rather annotate with term mappings.

---

## Post #32 by @SevKohler

In the latest release, Better just added autogeneration of term mappings in compositions, using the bindings of the archetype. I think that's a very elegant solution and the way to go.

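
As a rough illustration of the idea in post #32 (deriving term mappings from archetype term bindings when a composition is stored), here is a minimal Python sketch. It is not Better's implementation. The function names, the shape of the `BINDINGS` dictionary, and the at-codes `at0004`/`at0005` are assumptions made up for the example; the `@class`-style JSON only mirrors the snippet quoted in post #37 of this thread, and the SNOMED CT codes are the ones already mentioned here.

```python
from typing import Dict, List

# Hypothetical, simplified view of an archetype's term bindings:
# {terminology_id: {at_code: code_string}}. Real archetypes store these differently.
BINDINGS: Dict[str, Dict[str, str]] = {
    "SNOMED-CT": {
        "at0004": "271649006",  # assumed node id for "Systolic"
        "at0005": "271650006",  # assumed node id for "Diastolic"
    },
}


def term_mappings_from_bindings(
    bindings: Dict[str, Dict[str, str]], node_id: str
) -> List[dict]:
    """Build one TERM_MAPPING dict per terminology that binds node_id."""
    mappings = []
    for terminology, codes in sorted(bindings.items()):
        code = codes.get(node_id)
        if code is None:
            continue  # this terminology has no binding for the node
        mappings.append({
            "@class": "TERM_MAPPING",
            "match": "=",
            "target": {
                "@class": "CODE_PHRASE",
                "terminology_id": {"@class": "TERMINOLOGY_ID", "value": terminology},
                "code_string": code,
            },
        })
    return mappings


def annotate_element(
    element: dict, bindings: Dict[str, Dict[str, str]], node_id: str
) -> dict:
    """Return a copy of element whose name carries the bound codes as mappings.

    Mappings already present are kept, and duplicates are not added twice.
    """
    name = dict(element.get("name", {}))
    mappings = list(name.get("mappings", []))
    seen = {
        (m["target"]["terminology_id"]["value"], m["target"]["code_string"])
        for m in mappings
    }
    for m in term_mappings_from_bindings(bindings, node_id):
        key = (m["target"]["terminology_id"]["value"], m["target"]["code_string"])
        if key not in seen:
            mappings.append(m)
            seen.add(key)
    name["mappings"] = mappings
    return {**element, "name": name}


if __name__ == "__main__":
    systolic = {"@class": "ELEMENT", "name": {"@class": "DV_TEXT", "value": "Systolic"}}
    print(annotate_element(systolic, BINDINGS, "at0004"))
```

Keeping the mapping on the element's `name` means the at-code and the local text stay untouched, which fits the preference voiced above for annotating local codes rather than replacing them.
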

---

## Post #33 by @sebastian.iancu

13 posts were merged into an existing topic: [Improve AQL to simplifying the querying of terms](/t/improve-aql-to-simplifying-the-querying-of-terms/5120/2)

---

## Post #34 by @ian.mcnicoll

Can we maybe strip this out as a separate conversation on how to make mappings easier to query and to document? It is becoming an increasingly common issue as the use of LOINC and SNOMED becomes more universal.

---

## Post #35 by @birger.haarbrandt

Is this done automatically when storing a composition?

---

## Post #36 by @SevKohler

Yes, as far as I understood it, you can then export the composition with or without it, and it's searchable via AQL.

---

## Post #37 by @ian.mcnicoll

The ability to 'push' term mappings has actually been in the Better Ehrscape /composition POST for some time, but I think it only works on LOCATABLE name/value and

![image|690x65](upload://mA0xcl6hpZd7OP6Vgaz341JocEg.png)

It is applied at run-time per-composition, not in the template, and generates something like this:

```
{
  "@class": "ELEMENT",
  "name": {
    "@class": "DV_TEXT",
    "value": "Systolic",
    "mappings": [
      {
        "@class": "TERM_MAPPING",
        "match": "=",
        "target": {
          "@class": "CODE_PHRASE",
          "terminology_id": {
            "@class": "TERMINOLOGY_ID",
            "value": "SNOMED-CT"
          },
          "code_string": "271649006"
        }
      }
    ]
  }
}
```

which was what I was suggesting in the other thread. I think it may only work on name/value.

What is new in Better CDR 4.0 is the ability to query an ELEMENT for one of these terms, regardless of the parent archetype context. This uses a `code` predicate. Not how I might have done it, but I guess it aligns with the FHIR approach.

```
SELECT sys/value as Systolic, dia/value as Diastolic
FROM COMPOSITION c
CONTAINS (ELEMENT sys[code='SNOMED-CT::271649006'] AND ELEMENT dia[code='SNOMED-CT::271650006'])
```

---

## Post #38 by @SevKohler

What's also new is that they can pull/generate term mappings out of the term bindings of the archetypes.
Whether it works only for LOCATABLE name/value I don't know, I'd have to try it.

---

## Post #47 by @damoca

Back to my initial question: I tested with the new EHRbase v2.0 as suggested by @birger.haarbrandt and now it works perfectly. An AQL like this returns the list of all ELEMENTs without any problem :+1:

```
SELECT ele
FROM EHR e
CONTAINS COMPOSITION c
CONTAINS OBSERVATION o
CONTAINS ELEMENT ele
```

---

## Post #48 by @sebastian.iancu

For readability of this thread, I moved some AQL-relevant posts to discourse.openehr.org/t/improve-aql-to-simplifying-the-querying-of-terms/5120 to continue there on AQL specifics.

Some other RM-related posts are in the https://discourse.openehr.org/t/how-to-make-mappings-easier-to-query-and-to-document-in-openehr/5063 topic.

---

**Canonical:** https://discourse.openehr.org/t/ehrbase-storing-and-querying-data-without-an-standalone-archetype/5058
**Original content:** https://discourse.openehr.org/t/ehrbase-storing-and-querying-data-without-an-standalone-archetype/5058