openEHR REST APIs - Release 0.9.0 / invitation for comments

The REST API Team (Bostjan Lah, Erik Sundvall, Sebastian Iancu, Heath Frankel, Pablo Pazos, and others on the SEC and elsewhere) have made a 0.9.0 Release of the ITS (Implementation Technology Specifications) component, in order to make a pre-1.0.0 release of the REST APIs available for wider comment.

The key point about this current release is that it is meant to be a ‘core basics’ foundation of APIs to build on, and some services like CDS, and more sophisticated querying (e.g. that Erik Sundvall has published in the past) will be added over time.

NOTE THAT IN THE 0.9.0 RELEASE, BREAKING CHANGES ARE POSSIBLE.

You can see the ITS Release 0.9.0 link here, while the links you see on the specs ‘working baseline’ page are the result of 0.9.0 plus any modifications made due to feedback. The .apib files are here in Github (see mainly the includes directory).

We are aiming to release 1.0.0 at the end of February, at which point the formal process kicks in.

Since we are in Release 0.9.0, the formal PR/CR process is not needed, and you can comment here. However, if you want to raise a PR you can, in which case, please set the Component and Affects Version fields appropriately:

  • thomas beale

Hello,
I have some questions on the explicit definition of how the result rows are returned in the query API:
1. In the example at https://www.openehr.org/releases/ITS/Release-0.9.0/docs/query.html#header-resultset-example there is one result row with one fact for each of the columns that were requested. Three of the columns contain primitives (i.e. strings), but one column contains a JSON representation of an object. It is difficult to grasp from the example why some facts are returned as primitives and some as serialized JSON objects. It would be beneficial to see the corresponding archetype definitions in order to understand how columns containing different parts of the archetypes are encoded.
2. When one of the columns contains a list of items in each cell (because the denoted data type is an ITEM_LIST), will the content of that cell be JSON-serialized? When a user requests all laboratory values (e.g. calcium measurements) for a specific patient, a table would be returned. But instead of receiving the individual measurements in separate columns, so that the result could be processed immediately in tools like Excel or SPSS, the data would have to be processed again by another tool that turns the table-JSON compound into a real table.
2b. What happens when several columns can each contain multiple values? If every column containing multiple values (ITEM_LISTs) is encoded as JSON and written to a single cell, this is not an issue. But if the multiple values are instead distributed over multiple cells (or rows), some mechanism has to be defined to cope with the combination of two multi-valued columns for one record. In SQL JOINs this is handled with the Cartesian product of the sets of facts. But this bears the risk of combinatorial explosion when queries are defined with too many columns containing too many values per record.
Greetings
Georg
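
To make the two options in question 2b concrete, here is a minimal sketch. It is not taken from the REST specification: the record, its field names and both helper functions are hypothetical and only illustrate the difference between one JSON cell per multi-valued column and a Cartesian product over rows.

```python
import json
from itertools import product

# Hypothetical record with two independent multi-valued columns.
record = {
    "ehr_id": "ehr-1",
    "calcium": [2.1, 2.4],
    "diagnoses": ["dx-a", "dx-b", "dx-c"],
}


def rows_json_cells(rec):
    """One row per record; each multi-valued column is JSON-serialised into one cell."""
    return [[rec["ehr_id"], json.dumps(rec["calcium"]), json.dumps(rec["diagnoses"])]]


def rows_cartesian(rec):
    """One row per combination of values, as an SQL join would produce."""
    return [
        [rec["ehr_id"], calcium, diagnosis]
        for calcium, diagnosis in product(rec["calcium"], rec["diagnoses"])
    ]


if __name__ == "__main__":
    print(len(rows_json_cells(record)))  # a single row, but cells need further parsing
    print(len(rows_cartesian(record)))   # 2 * 3 rows; grows multiplicatively per column
```

With the JSON-cell form the row count stays fixed but a spreadsheet tool cannot use the cells directly. With the Cartesian form every cell is a primitive, but the number of rows is the product of the list lengths.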

Hi Georg,
Please see comments inline

I noticed the recent version of the REST api only works with templates, and not archetypes.

As our EHR is ADL 2 only, this has some interesting consequences.

Of course, you can upload just the operational templates and probably create a fully functional EHR, but if you work with ADL 2, you would always need an external tool to create the OPT2 from the ADL repository that you want to use, instead of an EHR that generates the OPT2s from the source archetypes itself. Of course, you could just use the ADL workbench or the Archie adlchecker command-line utility to do it for you, but I’m not sure if it’s a nice thing to do.

Also if you only use OPT2, you might want to do queries such as ‘retrieve all information that is stored in an EHR that has been derived from archetype with id X, including archetypes specializing from X’ (not just operational template X). An example: retrieve all reports, or all care plans, regardless of the used template. To do that, you probably need a specialization section in the OPT2, which according to the specs should have been removed, and you also need to create operational templates for archetypes that you never use to directly create compositions from. Or the tool querying the EHR must be fully aware of the full archetype specialization tree and use that to create a query with all specializing archetypes included in the query.

So, is it intended that the REST API only works with operational templates, and never archetypes?

Regards,

Pieter Bos

Hi,

We can add a definition endpoint for archetypes as well, no problem.
Actually we had it in the previous version, so I'll just add it back.

Best regards,
Bostjan

I’ve been developing an abstract version of the platform definition in the background, mostly reverse-engineered from the REST API, in order to capture the pure call semantics (i.e. without HTTP protocol effects, JSON or XML serialisation etc), and in that I proposed two interfaces, one for ADL14 and one for ADL2, which seems a clearer approach to me - see here. This might be helpful in determining the separation we want.

  • thomas

I think it would be good if the community who did this fantastic job also divided the kernel into micro-services with interfaces between them, with one entry service through which the micro-services expose themselves to the outside world over REST, for now.

There is knowledge available about how to structure micro-services, and which patterns to use between them and inside each service.

The kernel for openEHR is too large not to divide it. Think about:
- OpenEhr-Terminology,
- Archetype parsing,
- Archetype serializing,
- Storage,
- Validating of data,
- validating of users/authorizations (or connecting to an external user-service),
- measurement (ucum),
- SNOMED,
- Patient repository (connecting to external)
- Messaging to other EHR eco-systems
- Internal Message queue

It would be a great help if these microservices were defined, so that everyone could build and borrow them; when they are defined at community level, they will be interchangeable.

This is a good starting point for learning about that subject, but an experienced architect would be welcome.
https://www.packtpub.com/application-development/microservice-patterns-and-best-practices

Bert

Bert,

agree, and that’s a good list.

  • thomas

Hi Thomas,
I have taken a look at the abstract service model interfaces and have compared them to the interfaces we have developed for our own data warehouse architecture:
* IQueryService:
In our experience, queries can take quite some time to finish, so a synchronous/blocking method call did not work well as an interface. Queries can fail because of lack of memory or lack of disk space when the result sets get too large. Asynchronous query execution worked better. In addition to the execute methods, which return a running_query_ID, there have to be methods that check the status of running queries. The result sets are also accessed via the running_query_ID. A running query should be cancelable. When possible, a running query should report the expected amount of time it still needs to finish (as a percentage of the total or in seconds).
* Definition package
It is sometimes useful for query designers to be able to work with and save unfinished or syntactically incorrect queries. The definition interface should provide methods that check the syntactic correctness of queries and provide comments (String-based) on the state of a given query.
The provenance of queries that are used for medical research should be logged as strictly as all code, specifications and data involved in the provision of clinical data. Therefore the query descriptor calls need members for the version and the creator of the query.
To support the management of queries, it should be possible to store them in a kind of tree-like file structure. A query should therefore belong to a folder with which it is associated.
* Admin package
An important part of querying is the logging of executed queries. There should be an interface for the history of all executed queries, including AQL code, user, execution time, execution duration and number of results.
* I_EHR service interface
Our data warehouse system is based on an EAV schema. Therefore it was easy to provide methods that return all instances of a certain attribute. With this kind of access it is easier to debug or create summary reports than by using the query interfaces for the same job. I have to admit that I have no idea how to provide this kind of access in an archetype-based model. The sole access at the moment is via the ehr_id. More methods that provide orthogonal access possibilities based on archetype IDs would be nice.
Greetings
Georg

Good point - we have thought about this a lot in the past, but never formalised it. The obvious approach would seem to be that point-of-care queries, which are usually pre-built and part of UI app screens, should be served fast and synchronously, ideally from a dedicated query service instance. Secondly, research queries, business process queries and most population queries (e.g. on reports) should probably be served asynchronously. So I think we need both kinds. I think you are asking for calls that enable an async execution to start, and return a handle to the ‘running query’. That seems eminently sensible. If you want to propose it more formally, let me know; otherwise I can put in some features to do this, as I understand the idea.

Good point. We could make ‘authored queries’ be based on the same class as archetypes perhaps, or some subset. This provides a lot of meta-data.

We coined the term ‘query set’ to cover the idea of a set of related queries, but we have not yet formalised it. If you have specific suggestions here, we can incorporate them.

Good point - I’ll add this.

Well, the general idea I think is an ETL concept, whereby AQL querying can be used to export flat DB tables of the kind you want. Ocean at least did this in the past, and built some template additions, from memory, that mapped template paths to DB table columns. I don’t have any detailed spec for this however, and would in any case need input from people who work more in this area, such as your team ;)

- thomas

Hi Pieter,

Besides the API, I think for ADL2 archetypes and templates/OPTs have the same model, and archetype IDs / template IDs will follow the same structure. So for ADL2, using archetypes or templates would be the same in the API.

Which endpoints do you find problematic in terms of using ADL2?

About querying, analyzing your use case, I think there are two ways of knowing the full specialization hierarchy. One is to query an archetype repo/CKM while evaluating a query, without holding that info in the data repo. For example, “give me all archetype IDs that specialize arch ID X” returns [A, B, C]; you then use that list in the query on your data repo, like “archetype_id IN [A, B, C]”.

The other option is to have the archetype repo/CKM integrated with the clinical data repo (which I don’t like architecturally speaking), so the “give me all the archetype IDs that specialize arch ID X” is resolved internally.

Considering there is a component that has knowledge about the specialization, and that can be used internally (behind the API) I don’t see the need of adding explicit support in the API for archetypes.

What I think is better is to define an archetype repo/CKM API to manage archetypes and to resolve specialization queries among other queries like “this path exists for this archetype ID?”, etc. If this is possible, we can have interoperability between archetype repos and your queries can use my repo to get specialization info, and vice-versa.

Best,

Pablo.
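
The first option above (resolve the specializations in an archetype repository, then use the resulting list in the data query) can be sketched as follows. Everything here is an assumption made for illustration: the in-memory map stands in for an archetype repo/CKM API that does not exist yet, the IDs X, A, B, C are the placeholders used in the message, and the "archetype_id IN [...]" text mirrors the informal notation above rather than confirmed AQL syntax.

```python
from typing import Dict, List

# Hypothetical parent -> direct specialisations map, as an archetype repo might expose it.
SPECIALISATIONS: Dict[str, List[str]] = {
    "X": ["A", "B"],
    "A": ["C"],
}


def with_specialisations(archetype_id: str, tree: Dict[str, List[str]]) -> List[str]:
    """Return the archetype ID followed by all of its direct and indirect specialisations."""
    found = [archetype_id]
    for child in tree.get(archetype_id, []):
        found.extend(with_specialisations(child, tree))
    return found


def archetype_filter(ids: List[str]) -> str:
    """Build the informal 'archetype_id IN [...]' condition from a list of IDs."""
    quoted = ", ".join(f"'{archetype_id}'" for archetype_id in ids)
    return f"archetype_id IN [{quoted}]"


if __name__ == "__main__":
    ids = with_specialisations("X", SPECIALISATIONS)
    print(archetype_filter(ids))
```

The sketch includes X itself in the list, since the original use case asks for data derived from X as well as from its specializations.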

Sounds like a good proposal Pablo.

For ADL 2 a single archetype api can be used for both archetypes and templates. However, it makes sense to allow the get api of archetypes to specify the form you want the result in: differential, flattened, or operational template (opt2).

Our EHR will still integrate the archetype part and the query part, as well as the option to choose the archetype used for a slot at runtime. This could all be built with separate services and APIs, but once you have everything integrated it makes for a very easy-to-use API for both EHR and data warehouse usage, without needing sophisticated client libraries. However, you need much more complex server-side tools in the EHR, of course.

Regards,

Pieter

Hi Pieter,

Besides the API, I think for ADL2 archetypes and templates/OPTs have the same model,

ADL2 archetypes and templates have the same model as each other, but not the same as ADL14 archetypes or templates…

and archetype IDs / template IDs will follow the same structure. So for ADL2, using archetypes or templates would be the same in the API.

they are nearly the same; the ADL2 ids can have namespaces and also 3-part version ids.

Which endpoints do you find problematic in terms of using ADL2?

About querying, analyzing your use case, I think there are two ways of knowing the full specialization hierarchy. One is to query an archetype repo/CKM while evaluating a query, without holding that info in the data repo. For example, “give me all archetype IDs that specialize arch ID X” returns [A, B, C]; you then use that list in the query on your data repo, like “archetype_id IN [A, B, C]”.

The other option is to have the archetype repo/CKM integrated with the clinical data repo (which I don’t like architecturally speaking), so the “give me all the archetype IDs that specialize arch ID X” is resolved internally.

agree - CKM and other source management repos should be kept separate. Operational artefact repos should only include valid artefacts that are part of some release that is intended to be used with the system, including for query computations.

Hi Seref,
You already had quite some discussion on this topic last year. The examples are a bit hard to grasp because of their size. They are not really that complex, but perhaps the problems should be discussed using the least complex examples possible.

I would find it more understandable if the API documentation stated explicitly that all result data is generally encoded in JSON (with a link/reference to the corresponding reference model page). But that wouldn't be sufficient. The description of the encoding concerns not only how a single element is encoded but, even more, how collections are encoded. Those collections are not ITEM_LISTs or any other part of the openEHR reference model, because they are formed at the moment the query is executed. If, in the example from the API, the "uid" member of "c" were of type ITEM_LIST (or any other collection type) and the member "value" of "uid" were a primitive, the requested column "c/uid/value" would represent a collection, although the "value"s themselves are primitives. This can't be handled with a generic object-serialization mechanism. The only object containing all desired "value"s is the parent "c" object. This could be generically serialized, but it isn't requested in the query. So there has to be an iteration over all "value"s, which can themselves be serialized, but all their resulting JSON strings would have to be concatenated (outside the serializing mechanism) and written to the same result cell. Or, when no concatenation is desired, some solution with something like a Cartesian product would be needed.

How do the commercial implementations of AQL, like the software from Marand, handle this stuff?

Greetings
Georg

Thanks, I forgot the query engine as a service, and Rong's decision engine.

I am a bit busy for the coming two weeks, but maybe, when I have time, I will publish a micro-service design. I will also follow the discussions and see which existing software is suitable to wrap with a service layer, so that kernel micro-services can be created without writing too much software. These should work together, and the less well-performing pieces can then be replaced by new software. That is one advantage of the microservices concept.

Sounds like fun to me, and I already see quite some possibilities in the open source domain.

But first we need a plan, architecture. We’ll see.

Bert

Hi Pieter,

Hi, sorry to interfere, if I understand well,

I think a possible problem could be that "respiratory infection caused by a virus" can cause a number of derived codes to be returned, although in this case there are not so many.

However to use this mechanism generally, it can happen that really many derived codes will be returned from the SNOMED engine, and in that case the AQL query would need to be executed many times. Once for each possible derived code.

One could also consider to hand over the result set from AQL to the SNOMED engine to see if it is derived, which could cause less executions.

But in both cases it is data mining, for which it is always difficult to estimate what the best strategy is in a specific case.

A good idea may be to design an intelligent query-strategy decision engine which offers advice on what works best. This engine could execute limited queries, for example with a count operator, so that it does not need to go all the way when a limit is reached.

It is true, as you write, that data mining queries are seldom expected to return in real time, but I have seen situations in marketing where they ran for hours and queried almost one million dossiers; we even created intermediate databases.

That decision engine could also be an external service.

It is good to hear that you are thinking about separate services anyway. That works to the advantage of a microservices architecture.

Bert

Hi,

In order to reach full interoperability and interpretability we need a clear-cut separation between the constituent models that are part of the interoperability standards stack.
It is the function of a Terminology to create a system of concepts that coherently defines concepts in a domain.
And that is and must be the only function of SNOMED.

When it comes to queries we need to take into account data values in a context, the epistemology.
SNOMED will NEVER be able to model or contain the full temporal and spatial epistemology.
That context/epistemology is defined by meta-data in the COMPOSITION that is committed to databases and documents.
In addition, the world of databases is the domain of what is called the Closed World Assumption, while terminologies like SNOMED are part of the Open World Assumption domain.
Mixing these two creates severe problems.

So I oppose the idea of creating search engines in the SNOMED terminology domain.
CIMI has adopted this fine, sharp divide between the worlds of Archetypes and Terminology.

Gerard Freriks

In an ideal world you would probably just ask whether the code is in the subset (passing both as parameters). On the SNOMED side, the evaluation cost of the two operations ("give me all the codes" and "is this code in the subset") is virtually the same, or the latter may cost less. Several caching techniques could also be used in both scenarios, so that recurring queries are instant even for the “A in B” operation.
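
As a rough sketch of the cached “A in B” check, assuming a tiny made-up concept hierarchy in place of a real terminology server: the concept names, the PARENTS map and both functions are hypothetical, and the cache is simply Python's functools.lru_cache.

```python
from functools import lru_cache

# Hypothetical child -> parents map standing in for a terminology service.
PARENTS = {
    "viral_respiratory_infection": ("respiratory_infection", "viral_infection"),
    "respiratory_infection": ("infection",),
    "viral_infection": ("infection",),
    "infection": (),
}


@lru_cache(maxsize=None)
def is_a(code: str, ancestor: str) -> bool:
    """True if code equals ancestor or is derived from it; repeated calls hit the cache."""
    if code == ancestor:
        return True
    return any(is_a(parent, ancestor) for parent in PARENTS.get(code, ()))


def filter_rows(rows, ancestor):
    """Keep only query result rows whose code falls under the given ancestor."""
    return [row for row in rows if is_a(row["code"], ancestor)]


if __name__ == "__main__":
    rows = [{"code": "viral_respiratory_infection"}, {"code": "infection"}]
    print(filter_rows(rows, "respiratory_infection"))
```

Here the query result set is checked row by row against the terminology, instead of first expanding the subset into a list of codes and running the data query once per code.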

That is why openEHR and SNOMED can be a strong combination.