Best Database for openEHR

Bert_Verhees · 12 November 2019 20:57

I will take a look at the project and see how it goes. Using a JPA interface makes it modulair which is good. I wonder if you are able to separate the DB layer from the dats access code in a way that it is possible to use different kind of DB’s. This has as consequence that table-structures are invisible at data access level.

I think it is necessary for creating a modular OpenEhr engine.

Bert_Verhees · 12 November 2019 21:04

It is true Pablo, there is no best way, there are advantages/disadvantages to an architecture. And also legacy is important, this counts extra if modularity is lost, which happens often when a product exists longer.

I agree with the lessons you learn us, and I want to add one.

All software guru’s learn us to keep the architecture modular, even if it costs extra money or extra time or extra refactoring. Also extensive tests are important to make refactoring possible.

birger.haarbrandt · 12 November 2019 21:15

In theory, this might work. In (our) practice, EtherCIS and EHRbase are using lots of specific functions from Postgres. I think that especially AQL is the hard part here that requires some tweaking. So migrating to a different database technology would actually be very expensive right now.

Bert_Verhees · 12 November 2019 21:29

That is often that way, that keeping it strictly modular costs more then letting the different layers becoming visible in each other.

But in the longer term, strictly modular code always pays it back and strictly modular code is also better testable.

However, who am I to lay that upon you. You are building a great project. Congratulations with that and good luck.

pablo · 13 November 2019 16:04

I don’t think dependency on one specific DBMS is an issue, since most openEHR vendors actually use one specific DBMS, I guess is because all the tweaks needed to actually support AQL.

In the EHRServer we don’t support AQL, we have a Simplified Domain Specific Language for querying, that works similarly to AQL but doesn’t need customization at the database level. Also because we use Grails Framework, that uses Hibernate, the DMBS could be changed from configuration, and any relational DB is supported. This was also made by design, knowing different clients might prefer one DBMS over others, and they might have licenses for a specific DMBS. On the other hand, if we think of microservices, this is another microservice in the architecture and is like a black box working independently form the rest of the components, so the specific DBMS might not matter, it is just that for some it does. For instance, open source projects might choose MySQL/MariaDB over PostgreSQL or viceversa.

Bert_Verhees · 13 November 2019 19:07

Hi Pablo, very intesting paper you have written. I did not click on it the first time that I saw it.

I think like you that AQL is not necessarily needed to build a good CDR. Also because it is not a standard and not portable.

Path-based query languages like XQuery are better portable and maybe a better choice.

Google for fun on XQuery and mongodb.

Writing an AQL engine is of course possible but what is the point, what is it worth if there are high quality portable alternatives.

One can also consider in the context of being interoperable one should avoid using proprietary languages or languages with small user-base.

thomas.beale · 13 November 2019 19:15

Well AQL makes for portable queries, which are very interesting, especially to the CDS industry. No EMR systems have this at the moment.

Secondly, there is no reason why an openEHR system shouldn’t support more than one query language. There is no reason in principle not to add a specification corresponding to Pablo’s query language for example - we’ve just never tried to do it.

Bert_Verhees · 13 November 2019 19:39

In my opinion it would have been better when OpenEhr had chosen XQuery as preferred query language. It is an acknowledged w3 standard and guess capable of the same queries and like many systems using XQuery, OpenEhr has a path-based storage system.

But time will learn which query language will be used most. It is encouraging that more people consider other solutions.

pablo · 13 November 2019 22:42

My paper states that some AQL capabilities actually are against portability, like using name based predicates make the queries impossible to use on multi-language implementations. Also some vendors have their own extensions, which generate portability issues, like we have on MySQL vs. PostgreSQL queries, some are slightly different so are not 100% portable.

pablo · 13 November 2019 22:47

IMHO for some types of projects, there might be a specific way of querying that might be easier / simpler to implement than AQL and that actually satisfies the projects requirements very well. I’m not against AQL at all, just saying it is not the only solution, and it is not the best solution for all requirements, and sometimes it is not possible to implement. So I thought of the idea of having many domain specific languages for querying, designed to satisfy specific requirements, like support clinical decisions, help on analytics or population health.

Another nice touch in the openEHR REST API is that for queries, there is actually a “formalism” parameter that allows to send other things than AQL to the query services. In the near future that will be implemented in the EHRServer, using this simplified querying model.

thomas.beale · 13 November 2019 23:58

Well if literal strings are used in names, that’s true, but that’s going to happen in queries in any language, it’s not specific to AQL. It’s really about what’s in the data.

Vendor differences are hopefully a temporary thing.

thomas.beale · 14 November 2019 00:02

I would have been quite happy with XQuery, if it worked. But it doesn’t, directly. XQuery works fine expressing paths and predicates over serial XML documents. It doesn’t know about archetypes or two-layer modelling. AQL is about as close to XQuery as you can get for the more sophisticated data we have.

Bert_Verhees · 14 November 2019 06:55

True, working with XQuery requires a technical approach instead of a more conceptual, but on the other hand, shouldn’t querying always be something for technicians instead of clinicians.

The target user group that is focused with AQL is very small. Limited on:
One side by people who cannot understand that archetype identifier is nothing more then a technical property of a datastructure indicating the structure and RM class to expect.
On the other side by people who are not able to understand that blood pressure has a measurement-unit and a location in a datastructure, people who are not aware of datastructures in general.

In fact, the focussed group consists of non-technical clinical information analysts.

Good query editors therefor hide for the focussed groups the dirty technical details and generate queries in any language that is required.

So AQL should not be a database oriented query language but a language that is GUI-oriented and then underwater creates the needed technical query.
So being an intermediate language used in the GUI. It should not be implemented in a data access layer, but be abstracted independent of database-brand or - type

thomas.beale · 14 November 2019 10:44

AQL is exactly not a ‘database-oriented query language’ - it’s a model-based query language. For that reason, it is eminently suitable for clinician use. I am assuming the use of tools that hide ugly technical things like path codes etc - as Better(Marand) have done in their tools, and probably others by now. Having a query language at the domain-model level, in a model-driven editor with intellisense makes building queries pretty easy for anyone. Our tooling is what is lagging behind.

pablo · 14 November 2019 15:19

If you consider just the syntactical part, I agree, but I think the underlying problem is on how archetypes are modeled.

There are two cases,

a free text search where the text value used to compare is partial and is probable that comes from user input, which I think will be common but we are not doing that much in AQL, and,
when the text is hardcoded into the AQL expression as a filter for specific data, which I’ve seen a lot in AQL, but this case uses the text as it is a code, which I don’t think is too reliable and affects portability.

The issue with 2 is that could be avoided if the archetype model the text as a coded text, which is how the text-based criteria is currently used. Another point is translations could change in archetypes, those are fixed and improved more frequently than the internal structure, so if just one character is changed in the archetype, all queries that used that string to filter will be broken.

IMO using hardcoded text in AQL is bad design of queries and bad design of archetypes, which could be fixed by using codes. Note this is not a problem of AQL, is just that AQL is affected by how archetypes are design, but we end up with queries we can’t move between any openEHR system compliant with AQL.

pablo · 14 November 2019 15:21

I agree, in fact I recommend my students and customers to delegate query creation to clinicians. We just need to provide tools they can use.

thomas.beale · 14 November 2019 16:00

Well the choice of codes rather than text is always a runtime application choice, since any DV_TEXT can always be instantiated as a DV_CODED_TEXT. Maybe certain templates should force DV_TEXT to DV_CODED_TEXT more often, but that’s a local choice. Generally though, it would be better if free text search keys were substituted via parameters rather than hard-wired in. I’m not sure any of this is a problem of AQL as such (let’s assume we agree that we need better general text / terminology operators, semantics etc).

ian.mcnicoll · 15 November 2019 15:12

As in everything, this is largely about the art of the possible and having to fit your design decisions into existing tooling, client budget and the exact use-case.

I think what you are referring to, Pablo, is the very common practice of using templated name/values i.e renaming an existing archetype node. We generally do this for two purposes.

To localise the node name so that it fits better with local requirements. Of course the disadvantage here is that this is not (in adl.1.4) language independent. This is sort of worked around in the Better Archetype Designer which uses ADL2 under the hood but once we get into proper ADL2.0 we will get language independence. It would actually be possible to add language independent terms to name/value and I have done this on some occasions but in general, this is overkill for the particular client requirement, as they will need to manage that terminology. However this is not really an issue in querying as the local name should not differ semantically from the parent.
To distinguish cloned nodes whether whole archetypes, clusters or individual elements. This is more problematic, since the cloned name is often subtly semantically different from the parent (though normally be a sub-type) and we are definitely reliant on that name/value override to distingush nodes with identical atcodes but slightly different semantics. Again we could use coded_text here, though the tools do not make that easy, and ADL 2.0 should further mitigate but there may also be other situations where we need to distinguish cloned nodes at runtime - vis-a-viz the other conversation.

I guess, so far most of the time we have not had to worry about universla portability since it depends on a huge number of things - common external terminologies, common archetypes, common templating practice. This is just another design trade-off that will be easier to mange over time.

pablo · 15 November 2019 15:32

Yes @ian.mcnicoll I’m referring to that practice of using name nodes to define what data is inside, and keep those as DV_TEXT. The issue would be solved by using DV_CODED_TEXT, since the problem is really using a free text as some kind of code to define which information is recorded. IMO there should be a recommendation somewhere to specialize archetypes that have DV_TEXT and are used in that way, and define DV_CODED_TEXT in the specialization. The problem now is: the archetype that should appear on the data and queries should be the specialized one, not the one (maybe) downloaded from the CKM.

I would say that AQL is as portable between different implementations as SQL is portable between different implementations. In SQL portability ends when you use some custom implementation feature or when the query is written in a way that is not compatible with other SQL implementations.

Seref · 19 November 2019 09:08

which is the case for pretty much every computing standard out there. SQL is a hugely successful query language in terms of adoption. What you seem to point out as a weakness is more of a misuse and pales “if” you consider the fact that SQL achieved and delivered more than any other query language . Don’t want to comment further in this particular thread which is way too subjective and flamewar-prone , but humble suggestion from one implementer to another: stick to AQL, it’ll go to places. Implement alternatives if you want to, but you’ll find those hard to sell compared to AQL. Whatever shortcomings you can come up with will mostly be handled within the standard, just like SQL did with window functions etc in time.