Archie version 0.1.0 released

A few months ago I announced the development of an open source Java openEHR library called Archie. It has progressed enough that I’m now pleased to announce the release of version 0.1.0 of Archie. It supports the latest ADL 2 and reference model versions and is licensed under the Apache license.

The documentation and source can be found at https://github.com/nedap/archie. It’s available on Maven Central as com.nedap.healthcare:archie:0.1.0.

Why another library?

The existing open source openEHR libraries were either ADL 1.4 or published under the Affero GPL. This means there is no library available for non-GPL ADL 2 openEHR projects. We are building a non-GPL openEHR implementation, so we needed a library and wrote it. We believe the openEHR community benefits from having up to date open source tools, so we decided to release Archie.

Features:

  * ADL parser, including a generic ODIN to Java objects mapper
  * Archetype Object Model
  * The EHR part of the Reference Model
  * Basic APath-queries on AOM and RM objects
  * Flattener
  * Operational template creation
  * Easy to use APIs for object creation, terminology lookup, archetype model tree walking and constraint checking
  * RM serializes to XML in accordance with the openEHR-published XSD, or to JSON using jackson
  * AOM serialization to JSON for easy use in JavaScript based web-applications.
  * Tools for reference model object creation and attribute setting based on archetype constraints
  * Pluggable reference model architecture: the reference model can be swapped for some other implementation and the tools keep working

Experimental features:

These features are included, but the API and implementation of these features will probably change in the coming versions:

  * Rules evaluation (with some minor syntax changes for now, see the project readme)
  * Full APath and XPath expression evaluation on reference model objects using the JAXP–implementation of your choice

Future releases and contributions

We’re continuing the development, so there will be more releases in the near future. If you want to contribute: pull requests and issue reports are very welcome!

Regards,

Pieter Bos
Nedap Healthcare

Dear Pieter,

thank you for the info. I will certainly take a look.

Kind Regards,
Mate

Very good Pieter, inspiring features.

Congratulations!

Bert

Hi Pieter,

This looks a really interesting and valuable piece of work. Many thanks for your efforts - hopefully others will want to contribute. It would be nice if we could somehow bring this work and the existing AOM2/ADL2 work together (or at least not duplicate work). I appreciate there is a different licensing approach but I don’t think that is necessarily set in stone.

Ian

Hi Pieter,

This is a significant piece of work! Congratulations for reaching this point and thank you for sharing it with the rest of the world. I for one look forward to hearing good news about your openEHR implementation.

All the best

Seref

Hi Ian,

Good to hear this work is being appreciated.

It could certainly be possible to merge Archie with the existing adl2-core library. The adl2-core library looks like it has good quality code and a decent API. A merge could be interesting because, although there is quite a bit of overlap in functionality, Archie has functionality that adl2-core does not have, and adl2-core has functionality Archie does not yet have. It would depend on the owners of that library being willing to relicense their code, or at least parts of it, under a different license. We’re open to releasing this code under a different license, but only if the resulting work can be used in non-GPL software.

Regards,

Pieter Bos
Nedap Healthcare

Nice work, Pieter!

I've seen quite a lot of open source projects with dual licensing.
Maybe this is the way to go so we can please everyone

Regards

Hello Diego,

That is possible, but has some complications:

To make a dual licensing approach work in this case, it would require us to release Archie under the AGPL, combine it with adl2-core, and get a license from Marand to use their contributions to the resulting library in our products, combined with a license from us to them to use our contributions in their products. Also, all future contributors would have to sign a document allowing their contributions to be released under a different license by Marand and Nedap.

That would leave the resulting combined library still unusable for other non-GPL projects by others.

I would prefer another way forward :slight_smile:

Regards,

Pieter Bos

I’m sure something can be worked out. Not my call personally of course.

But just a thought for everyone who instantly thinks ‘oh no, not another wheel re-invention’… the work described here probably is slightly re-inventing something, but as Pieter has said, there are overlaps and also unique elements to each library.

Anyway, my thought is this: even a perfectly redundant wheel-reinvention exercise does achieve one very useful thing: it creates a new dev team that understands the specification and model intimately, and knows how to code with it - in other words we are growing the developer community. This is very valuable.

  • thomas

Thomas, I couldn't agree more.

Not only in its license but also in its technical and architectural approach, this is not just another library but one with unique possibilities, some of which may only be discovered later, when someone needs them. It is nicely programmed, and the code is easy to understand and walk through.

For example, how many libraries are there for processing XML for Java with a lot of similar functionality?
I know at least five, but there are more. That is how the software eco-system works, very Darwinian.
How many Office-applications, how many SQL databases, how many this, how many that are there?
I would not compare software-writing with another wheel-invention, software is complex, and even for wheels, they get reinvented all the time.

I am very pleased by the work of Nedap/Pieter Bos, and I am very pleased that a Dutch company is building an openEHR implementation.
When there are more companies/organizations/projects building around openEHR, openEHR will be accepted on the market even more easily.
It is one of the factors that make a software concept a success: many implementations.

By the way, another AOM-library is here:
https://github.com/BertVerhees/archetyped_kernel
I only work one hour a day on it, when time permits, so not every day, but still, it is already 75% complete. I hope to finish that work before August.

Best regards
Bert Verhees

My congratulations too, Pieter!

To add to Thomas, a perfectly redundant wheel-reinvention based on the specs also validates the specs nicely and highlights potential problems/inaccuracies/underspecified things in the underlying specs as we have seen previously with the various 1.4 implementations.

(Yes, even with the nearly perfect openEHR specs :wink: )

Sebastian

It certainly does validate specs. In fact, it already has caused some corrections to both the specs and the ANTLR-grammar.

And we already found a few more issues in the specs. I’ll soon file an issue report about the rules section, to specify how to handle operators on multiple-valued path expressions without a for_all :slight_smile:

Pieter

Pieter,

With respect to the ‘rules’ bit of ADL, and also GDL, there is a new draft ‘Expressions’ spec in the BASE component. This is a working draft, and partly lifted from ADL/AOMs specs (those now just include this one), plus some extensions to show how rule extensions are done properly.

This spec proposes an improved syntax, but it’s definitely not finished (e.g. I am thinking of getting rid of the $var style syntax), and it would be great to have some other collaborators on it who have a lot of experience with expressions / rules systems. So please have a look and feel free to comment - comments here probably make sense since others may be interested.

The draft of this spec will be released soon in a new release of the BASE component. All that means is that changes from then need to be documented by PRs and CRs in the normal fashion.

  • thomas

Hello Thomas,

I had already noticed the expressions part and based my experimental implementation on that. This email got quite long, so let’s start with a summary:

Summary:
- The current spec is quite similar to XPath. We can keep this even closer by referencing to the XPath specification in our specification in more places. It allows for tool reuse and resolves ambiguities in the specification.
- Some other problems/questions were found regarding the spec, including grammar ambiguities and how to handle them, and a question about node ids that exist in the AOM, but not always in the RM.

I have not implemented the full expression language yet, so I might find more, for example when I implement functions.

XPath and the relation to the expressions language:

Before I note my issues, I would like to point out that the language is very similar to XPath. In fact, you can convert almost all of the expressions language to valid XPath 2.0 expressions with some simple steps:

  1. Split into separate statements. For every statement:
  2. Replace Apath shorthand notation with xpath: [id1] to [@archetype_node_id = ‘id1’], etc.
  3. Replace symbolic form of operators with the textual form
  4. Replace for_all … In … … with ‘every $var in /path satisfies …’
  5. Replace implies with ‘if … then …’
  6. Replace exists(expression) with count(expression) > 0
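
As a rough illustration of steps 2 and 6 above, here is a minimal Java sketch that rewrites the Apath shorthand and the exists operator with plain regular expressions. The class and method names are made up for this example and are not part of Archie, and a real converter would work on a parse tree rather than on strings: this version only handles exists(...) arguments without nested parentheses.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Archie: string-level rewriting of two of the steps.
class ApathToXpathSketch {
    // Step 2: [id1] or [id1.1] becomes [@archetype_node_id = 'id1']
    private static final Pattern NODE_ID = Pattern.compile("\\[(id[0-9.]+)\\]");
    // Step 6: exists(expr) becomes count(expr) > 0 (argument must not contain parentheses)
    private static final Pattern EXISTS = Pattern.compile("\\bexists\\s*\\(([^()]*)\\)");

    static String toXpath(String expression) {
        String withNodeIds = NODE_ID.matcher(expression)
                .replaceAll("[@archetype_node_id = '$1']");
        return EXISTS.matcher(withNodeIds).replaceAll("count($1) > 0");
    }

    public static void main(String[] args) {
        System.out.println(toXpath("exists(/context[id2]/items[id3])"));
    }
}
```

The remaining steps (operators, for_all, implies) need knowledge of the statement structure, so they are better done on the parsed expression tree than with regular expressions.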

Then, get an Xpath implementation that works on your reference model, or just convert to XML first. Then for every assertion, evaluate the expression to a boolean. For every variable declaration, evaluate the expression to the type given in the variable declaration and store it under the given name.
Then implement the standard functions and variables. Functions and variables are part of standard Xpath, and so is defining your own.
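
To make the evaluation part concrete, here is a small sketch using only the standard JAXP API (javax.xml.xpath) that ships with the JDK, which implements XPath 1.0. The XML snippet is a heavily simplified, hypothetical serialization and not the openEHR XSD format, and the class name is invented for this example. A variable declaration is evaluated to a number and stored under its name; an assertion is then evaluated to a boolean, with $var references resolved through an XPathVariableResolver.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch, not Archie's implementation.
class XpathRuleEvaluationSketch {
    // Simplified, made-up XML; the real RM serialization looks different.
    static final String XML = "<composition>"
            + "<content archetype_node_id='id5'><value><magnitude>7</magnitude></value></content>"
            + "</composition>";

    private final Map<String, Object> variables = new HashMap<>();
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private final Document document;

    XpathRuleEvaluationSketch(String xml) {
        document = parse(xml);
        // $name references in expressions are looked up in the variables map.
        xpath.setXPathVariableResolver(name -> variables.get(name.getLocalPart()));
    }

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Variable declaration: evaluate to a number and store it under the given name.
    void declareNumber(String name, String expression) {
        try {
            variables.put(name, xpath.evaluate(expression, document, XPathConstants.NUMBER));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid expression: " + expression, e);
        }
    }

    // Assertion: evaluate the expression to a boolean.
    boolean check(String assertion) {
        try {
            return (Boolean) xpath.evaluate(assertion, document, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid assertion: " + assertion, e);
        }
    }

    public static void main(String[] args) {
        XpathRuleEvaluationSketch rules = new XpathRuleEvaluationSketch(XML);
        rules.declareNumber("magnitude",
                "/composition/content[@archetype_node_id = 'id5']/value/magnitude");
        System.out.println(rules.check("$magnitude > 3"));
    }
}
```

Custom functions can be plugged in the same way through XPath.setXPathFunctionResolver, which is also part of standard JAXP.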

If you do this, you just implemented full assertion support with very little effort and code, and very little chance of mistakes!

(If all you have is xpath 1, the for all and implies require manual handling. You might need to do a bit of extra work for some datatypes, especially terminology codes)

Having noticed this, I’m strongly in favour of keeping the syntax as close to XPath as possible. This means we can reuse tools. Or, if you have reasons to write your own (I do, unfortunately), at least you can validate your implementation easily by testing against a known implementation.

So I would argue strongly in favour of keeping the $var syntax, because it is the same as the xpath-standard.

Some constructions in the expressions have a valid reason why they are different than Xpath, for example, the shorthand notation for archetype node ids really helps. I would say this could include the exists operator, because it expresses something that is often needed and expressing it explicitly allows for some really nice features in user interfaces.

However, I think this does not apply to the for_all and implies statements. If they could be replaced with the corresponding Xpath-syntax, I would think that is a good idea.

Problems in the specification

Here are the problems I found in the spec so far:

Multiple-valued paths and type conversion:

  * The spec does not say how to handle multiple-valued expressions, outside for_all statements. We could just follow the xpath-standard
  * The spec says nothing about type conversion. We could just follow the xpath-standard.

Whitespace aware grammar

The current definition of the language needs a whitespace aware grammar. If not, the following is ambiguous:

$var:Integer ::= /path/to/value
/path/to/another/value > 3

Because there is no way to see which part of /path/to/value/path/to/another/value belongs to the first or second statement without considering whitespace in your parser. And that’s fine in a lexer, but harder to do in a parser – although still possible. Alternatively, it’s easily solved by demarcating your assertions, for example by requiring a ‘;’ after every assertion
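
A small Java sketch of the problem, using a toy tokenizer that discards whitespace. The token pattern is made up for this example and covers only the symbols used here; it is not the real grammar. The two-statement input and the single-statement input produce exactly the same token stream, while a ‘;’ terminator makes the split trivial.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only, not the actual ADL/expressions lexer.
class WhitespaceAmbiguitySketch {
    // Toy token set: identifiers (optionally starting with $), '::=', and '/', ':', '>'.
    private static final Pattern TOKEN = Pattern.compile("\\$?\\w+|::=|[/:>]");

    // Returns the tokens of the source; whitespace between tokens is skipped.
    static List<String> tokens(String source) {
        List<String> result = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(source);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    // With an explicit ';' after every statement, splitting needs no whitespace rules.
    static String[] splitStatements(String source) {
        return source.trim().split("\\s*;\\s*");
    }

    public static void main(String[] args) {
        String twoStatements = "$var:Integer ::= /path/to/value\n/path/to/another/value > 3";
        String oneStatement = "$var:Integer ::= /path/to/value/path/to/another/value > 3";
        // Same token stream for both inputs, so the parser cannot tell them apart.
        System.out.println(tokens(twoStatements).equals(tokens(oneStatement)));

        String delimited = "$var:Integer ::= /path/to/value;\n/path/to/another/value > 3;";
        System.out.println(splitStatements(delimited).length);
    }
}
```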

The same problem happens in a second place:

for_all $var in /path /some/other/path > $var/subpath

This is actually even a bit hard to read for a human, because the space after /path is easily overlooked. Both the whitespace-awareness and the human readability could be easily solved by replacing for_all with the every .. In … satisfies syntax of xpath.

Node ids in archetype/reference model objects

In archetypes, some nodes have node ids, that have no node id in the corresponding reference model object. This is tricky, because a valid path to an archetype node, converted to Xpath, is NOT a valid path to the corresponding reference model objects. For example, the context attribute of a Composition is an EVENT_CONTEXT. This does not have an archetype node id. But it always has one in the ADL/AOM. So if you write the path /context[id2], you can convert it to Xpath as /composition/context[@archetype_node_id = ‘id2’]. But this will result in an empty node set, because there is no matching attribute called archetype_node_id. Instead, you could just write /context, which works.
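
This can be reproduced with the JDK's built-in XPath implementation. The XML below is a simplified, hypothetical serialization (not the openEHR XSD format) in which the context element carries no archetype_node_id attribute; the class name is invented for this sketch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration, not Archie code.
class NodeIdPathSketch {
    // Made-up XML: the composition is a locatable with a node id, the context is not.
    static final String XML = "<composition archetype_node_id='id1'>"
            + "<context><setting>home</setting></context>"
            + "</composition>";

    // Returns the number of nodes the XPath expression selects in XML.
    static int count(String expression) {
        try {
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, document, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Path converted from /context[id2]: selects nothing, the attribute is absent.
        System.out.println(count("/composition/context[@archetype_node_id = 'id2']"));
        // Path without the predicate: selects the context element.
        System.out.println(count("/composition/context"));
    }
}
```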

So, there are several options to address this in the specification, for example:

  1. Specify that paths to non-locatables should NOT have a [idx] predicate, even though the id in the archetype is present
  2. Specify that paths to non-locatables can have a [idx] predicate, but it should be ignored in implementations

Option 2 is harder to implement, because you can no longer convert from Apath to XPath without knowledge of the model. But as Apath expressions are not new, I’m thinking some other people will have an opinion on this :slight_smile:

Regards,

Pieter Bos

Hello Thomas,

I had already noticed the expressions part and based my experimental implementation on that. This email got quite long, so let’s start with a summary:

Summary:
- The current spec is quite similar to XPath. We can keep this even closer by referencing to the XPath specification in our specification in more places. It allows for tool reuse and resolves ambiguities in the specification.

I'm assuming you think we should reference specific parts of the W3C XPath spec from specific parts of the openEHR Expressions spec? That sounds sensible to me. Do you have suggestions? If you report them in a PR, we can incorporate them in the spec.

- Some other problems/questions were found regarding the spec, including grammar ambiguities and how to handle them, and a question about node ids that exist in the AOM, but not always in the RM.

I remember responding to some earlier issues. If there are new issues not yet reported, please report them, either here or as PRs.

I have not implemented the full expression language yet, so I might find more, for example when I implement functions.

XPath and the relation to the expressions language:

Before I note my issues, I would like to point out that the language is very similar to XPath. In fact, you can convert almost all of the expressions language to valid XPath 2.0 expressions with some simple steps:

   1. Split into separate statements. For every statement:
   2. Replace Apath shorthand notation with xpath: [id1] to [@archetype_node_id = ‘id1’], etc.
   3. Replace symbolic form of operators with the textual form
   4. Replace for_all … In … … with ‘every $var in /path satisfies …’
   5. Replace implies with ‘if … then …’
   6. Replace exists(expression) with count(expression) > 0

Then, get an Xpath implementation that works on your reference model, or just convert to XML first. Then for every assertion, evaluate the expression to a boolean. For every variable declaration, evaluate the expression to the type given in the variable declaration and store it under the given name.
Then implement the standard functions and variables. Functions and variables are part of standard Xpath, and so is defining your own.

Would it make sense to include an appendix on 'XPath Adaptation' or similar, with this logical algorithm included? If you are interested in writing content for such an appendix, that would be very welcome.

If you do this, you just implemented full assertion support with very little effort and code, and very little chance of mistakes!

... for an XML data context of course....

There needs to be an ODINpath and a JSONpath....

(If all you have is xpath 1, the for all and implies require manual handling. You might need to do a bit of extra work for some datatypes, especially terminology codes)

Having noticed this, I’m strongly in favour of keeping the syntax as close to XPath as possible. This means we can reuse tools. Or, if you have reasons to write your own (I do, unfortunately), at least you can validate your implementation easily by testing against a known implementation.

even if not, just being able to re-use grammar or grammar design ideas is useful.

So I would argue strongly in favour of keeping the $var syntax, because it is the same as the xpath-standard.

OK, that seems like a good reason.

Some constructions in the expressions have a valid reason why they are different than Xpath, for example, the shorthand notation for archetype node ids really helps. I would say this could include the exists operator, because it expresses something that is often needed and expressing it explicitly allows for some really nice features in user interfaces.

However, I think this does not apply to the for_all and implies statements. If they could be replaced with the corresponding Xpath-syntax, I would think that is a good idea.

on the one hand I prefer first order predicate logic operators, because absolutely everyone understands them, and the meaning is universal, but on the other hand I see the value of sticking closer to a well-known syntax. I'm inclined to not worry too much about the 'surface syntax' if it can be absolutely guaranteed that lossless (and easy) conversion can be performed. But others may have other ideas.

Problems in the specification

Pieter, can you put the issues you raise below in a new PR 'Expression language issues' or similar?

thanks

- thomas