How about creating an openEHR test base?

Hi all,

I’m analyzing different ways of getting more people involved in openEHR software development in our Spanish-speaking openEHR community (http://openehr.org.es).

The idea of having a test base with sample artifacts just came to mind.
Obviously this could be beneficial for all the openEHR community!

The idea is to have a public repository with some archetypes, templates and OPTs, the term sets they reference, and some composition and extract instances in XML/JSON format. This would be great for all of us, because we could try implementing and exchanging openEHR data using those artifacts.

What I have trouble with is finding valid composition and extract instances in XML format. Also, several issues with the current openEHR XSDs have been reported, and I don’t know whether they were fixed, or whether the current XSDs are valid and correct: http://www.openehr.org/releases/1.0.2/its/XML-schema/index.html

Can we think of creating something like that in the near future?

Just drop me a line if you want to collaborate!

I’ve created a page on the wiki: http://www.openehr.org/wiki/display/dev/Development+test+base

We’ll keep it up to date with our progress.

Hi Pablo,

I have been looking for such a repository to share our artefacts,
but I have hesitated to publish them because of licensing issues.
I know that all the artefacts will be available under the Apache 2
license, but this has not been officially announced, and the pages
about licensing on openEHR.org are confusing:
http://www.openehr.org/free_commercial_use.htm.html
http://www.openehr.org/298-OE.html
That said, we cannot wait too long to get started.
Shall we share our materials on GitHub?

Best regards,
Shinji Kobayashi

Hi Shinji and guys,

Right now I’m not worried about license issues; if we have problems in the future, we can just create our own testing archetypes and templates and carry on with the development :smiley:

About publishing, I think we need to discuss a little how we will govern this repository, and how we will converge on a common and consistent set of artifacts for testing.

I have proposed here *** that we start by attaching files to the wiki and linking them under our names; each of us can describe each artifact, what issues it has, what tweaks and fixes have been made to it, etc.

When we have this “baseline”, we can start working on fixing problems, harmonizing formats, etc.

Then we can create consistent test sets, and small pieces of software that can process those test sets and execute some tests (like unit testing for openEHR). At that stage we can move those test sets to something more powerful like GitHub.

What do you think about this plan? Does it make sense? Do we all agree?

Please leave a comment at (or edit) the wiki page: *** http://www.openehr.org/wiki/display/dev/Development+test+base

pablo pazos wrote:

I have proposed here *** that we start by attaching files to the wiki and linking them under our names; each of us can describe each artifact, what issues it has, what tweaks and fixes have been made to it, etc.

Hi Pablo,

I get the impression that you aren't aware that this test repository already exists:

  http://www.openehr.org/svn/knowledge2/TRUNK/archetypes/ADL_1.5_test

Have you considered building on this, rather than starting a whole new repository?

- Peter

Hi Peter, thanks for the pointer.

I think this is only ADL-related, and only for ADL 1.5. My idea is to include ADL 1.4 archetypes, RM instances in XML and JSON, the RM & AOM XSDs, and also term sets.
Maybe we can take some samples from there, but I believe this new repo has a wider scope. What do you think?

Hi Pablo,

It makes more sense to me to add all of that to the existing repository rather than fragmenting the effort.

- Peter

Hi Peter,

That makes sense, but I think we are at an earlier stage than deciding the physical location of the files.
Right now we are trying to see which artifacts each of us has developed individually and what problems we have with our implementations, and to improve and harmonize all of that. Then we’ll look for a location for it.

http://www.openehr.org/wiki/display/dev/Development+test+base

Artifact governance

Just to start with a clear view of what we have and in which state, each published artifact will be under the collaborator’s name, because each one of us might have different versions (maybe structurally different, with different tweaks and fixes) of those artifacts. Then we will try to converge on a common version for each artifact type.

Please attach files to this page and link them in the sections below. Please add a short description to each file (e.g. what it represents), and if you have tweaked the format or fixed a problem with it, please mention that too.

I would say the scope of that repository is different, as it is part
of the tests for the current, evolving 1.5 syntax and does not include
'real' archetypes.

Diego Boscá wrote:

I would say the scope of that repository is different, as it is part
of the tests for the current, evolving 1.5 syntax and does not include
'real' archetypes.

My understanding was that Pablo was not proposing real archetypes either. In his original post, Pablo proposed a "test base with sample artifacts".

How would this be different from the purpose of the existing http://www.openehr.org/svn/knowledge2 repository? The only difference that I can see is that Pablo has proposed adding a greater variety of artefacts (OPTs, etc.), so it seems natural to add them to the existing repository.

- Peter

Pablo also mentioned 'RM instances in a variety of formats', which are
not 'artefacts'.

Hi Diego & Peter,

What Diego said about the evolving ADL 1.5 tests is true: we don’t want to test the tools or the specs, we want to test our implementations (EHRs, services, repositories, etc.).

I agree this overlaps in some way with the CKM content (archetypes and templates), but our focus is on flat archetypes and operational templates, things that will be used by systems, not on source ADL archetypes with slots, abstract types and other things that make implementation a pain in the 4$$… you know what I mean.

I agree with what Diego said in his last message: we want RM instances (XML) in the repo, which will be valid against the XSDs (which we need to test and fix; the XSDs will be included in the repo too). JSON instances will be welcome too :smiley:

To give more context, this is taken from a private message to Erik:

What I have in mind is to create something like unit tests for openEHR applications and services, with archetypes, RM instances and term sets. E.g. a test set with some archetypes, a template, some term sets and a couple of instances in XML and JSON formats, plus some small software that can handle those test sets: validating instances against schemas, validating structures against archetypes, etc., and maybe getting data from the instances and doing something with it, …
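As a rough illustration of one check such "small software" could run, here is a minimal sketch that validates DV_QUANTITY values in an RM XML instance against archetype-style constraints. The XML layout, the element names and the flat `CONSTRAINTS` table are hypothetical simplifications, not the real openEHR XSD structure or AOM; XSD validation itself would need a separate schema library and is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified RM XML instance (NOT the real openEHR XSD layout).
INSTANCE = """
<composition>
  <observation archetype_id="openEHR-EHR-OBSERVATION.blood_pressure.v1">
    <element name="systolic">
      <value type="DV_QUANTITY"><magnitude>120</magnitude><units>mm[Hg]</units></value>
    </element>
    <element name="diastolic">
      <value type="DV_QUANTITY"><magnitude>1200</magnitude><units>mm[Hg]</units></value>
    </element>
  </observation>
</composition>
"""

# Hypothetical flattened constraints, standing in for what a flat archetype / OPT would state.
CONSTRAINTS = {
    "systolic": {"units": "mm[Hg]", "min": 0.0, "max": 1000.0},
    "diastolic": {"units": "mm[Hg]", "min": 0.0, "max": 1000.0},
}

def validate(xml_text, constraints):
    """Return a list of human-readable errors; an empty list means the instance passed."""
    errors = []
    root = ET.fromstring(xml_text)
    for element in root.iter("element"):
        name = element.get("name")
        rule = constraints.get(name)
        if rule is None:
            errors.append(f"{name}: no constraint defined")
            continue
        value = element.find("value")
        magnitude = float(value.findtext("magnitude"))
        units = value.findtext("units")
        if units != rule["units"]:
            errors.append(f"{name}: units {units!r} != {rule['units']!r}")
        if not rule["min"] <= magnitude <= rule["max"]:
            errors.append(f"{name}: magnitude {magnitude} outside [{rule['min']}, {rule['max']}]")
    return errors

for problem in validate(INSTANCE, CONSTRAINTS):
    print(problem)  # prints: diastolic: magnitude 1200.0 outside [0.0, 1000.0]
```

In a shared test base, each test set would ship its instances together with the expected list of errors, so different implementations could compare results.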

Hi!

I agree that we need some RM instances etc. initially. We have
versioned compositions in the demo server for our LiU EEE-system. We
don't know if they are 100% according to the spec since they have not
been extensively tested. I'll upload some of them to the wiki page after a
deadline I have this week (remind me if they are not there next Monday
:wink:). I can give a limited number of people access to them now via
REST interfaces (HTTP via a browser works fine). Mail me off-list if
you are in a hurry.

Would EHR data reflecting a number of realistic patient stories be
interesting to collaborate on as a second step? I am in desperate need
of such EHR data in order to create and test EHR visualisations.
Getting access to "real" patient data is a pain, and even if we get
it we can never share it. Could we share the effort of creating a
number of such EHR instances (and perhaps write a joint academic
paper about it)? If so, let's first check/discuss some of the options
for data entry, and once that is settled we can involve more clinicians
to create and improve/review the stories. A shared set could be reused
in several projects and make them more comparable too.

Best regards,
Erik Sundvall
erik.sundvall@liu.se http://www.imt.liu.se/~erisu/ Tel: +46-13-286733

My view is that this existing repository should be expanded to include all test case archetypes in ADL and any of the other serialised formalisms. Today it mainly concentrates on ADL/AOM 1.5 test cases. Let’s think about what other test case material could be added, and how it should be organised. Rong Chen (Sweden) and Koray Atalag (NZ) have thought quite a lot about this in the past and I am sure would have ideas to contribute; Erik Sundvall has been thinking about some of the other serialisations. I have to admit to only having seriously thought about test cases for bidirectional tool processing, which is currently ADL, dADL, and will extend to XML-AOM (I just haven’t gotten around to this yet).

I have not thought too much about test cases for JSON or YAML, but I have done the output serialisations for them. Having done the first implementation of JSON, I think it is too weak a formalism to be seriously useful, because it lacks too many basic semantics - particularly dynamic type markers. Its cousin YAML is over-complicated (and in its whitespace form, nearly impossible to get right!), but does have proper OO semantics and I think can be used as a lossless serialisation. Others may have more evolved ideas on how these particular formalisms should be used in openEHR, so I am very happy to be educated by the experts. My main aim is to make sure that the transformations of ADL => JSON and ADL => YAML are correct. You can experiment with JSON, YAML and XML outputs of any ADL 1.4 or 1.5 archetypes right now, using the ADL workbench, which has a bulk export mode into these formalisms.
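To make the point about missing dynamic type markers concrete: a plain JSON object carries no class name, so a reader cannot tell whether `{"value": "Hypertension", ...}` was a DV_TEXT or its subtype DV_CODED_TEXT. A common workaround is to embed an explicit type key. The sketch below assumes a key named `_type` and two toy classes; both are illustrative assumptions, not an agreed openEHR JSON format.

```python
import json
from dataclasses import dataclass, asdict

# Toy stand-ins for two RM data types; DV_CODED_TEXT specialises DV_TEXT.
@dataclass
class DvText:
    value: str

@dataclass
class DvCodedText(DvText):
    code_string: str = ""
    terminology_id: str = ""

# Hypothetical mapping between a type marker string and a Python class.
TYPES = {"DV_TEXT": DvText, "DV_CODED_TEXT": DvCodedText}
NAMES = {cls: name for name, cls in TYPES.items()}

def dumps_typed(obj):
    """Serialise with an explicit type marker so the concrete subtype survives."""
    return json.dumps({"_type": NAMES[type(obj)], **asdict(obj)})

def loads_typed(text):
    """Rebuild the concrete class from the type marker."""
    data = json.loads(text)
    cls = TYPES[data.pop("_type")]
    return cls(**data)

coded = DvCodedText("Hypertension", code_string="at0001", terminology_id="local")

# Without a marker, the round trip yields a plain dict: the DV_CODED_TEXT identity is lost.
plain = json.loads(json.dumps(asdict(coded)))
print(type(plain).__name__)               # dict

# With the marker, the subtype is recovered.
print(loads_typed(dumps_typed(coded)) == coded)  # True
```

The marker only works if every producer and consumer agrees on the key name and the type names, which is exactly the kind of convention a shared test base could pin down.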

Last week we already discussed with Rong & Sebastian moving the openEHR terminology there, and how to manage it more effectively, so the scope of this knowledge repository is going to continue to grow anyway. So any community input on how to expand this repository and manage it is welcome from my point of view (I realise the above might only be a subset of your original scope, Pablo, so there are probably some things that still need to be done elsewhere).

  • thomas

Hi Tom!

Could we use the openEHR github project (that you registered) for
hosting a subproject with the openEHR Terminology? I believe it can
make ongoing branching/patching more visible and easier to
merge/administrate.

There is no hurry to move existing test archetypes there, but for new
efforts (terminology, RM instance examples etc.) we might as well start
there (perhaps as a separate subproject).

Best regards,
Erik Sundvall
erik.sundvall@liu.se http://www.imt.liu.se/~erisu/ Tel: +46-13-286733

Yes, we will obviously migrate to GitHub in the coming months. I have a slight concern about how to avoid chaos, and I do think we need to think carefully about how we organise Git projects/subprojects in general. The openEHR terminology is not large (at all), but according to a discussion the other day it looks like it will become more than one file (I will write this up and post it before doing anything). I was thinking it should be part of a broader openEHR knowledge repository. Although… I have listed it as a distinct ‘component’ of the specification program, so maybe it should have its own repository anyway. Its translations will substantially multiply the number of files over time, which is perhaps another reason for a separate repository.

I think test archetypes & templates probably should be separate from test & example data, so that is two repositories right there. That would give us:

  • openEHR terminology

  • test archetypes & templates

  • test & example data

We need to add existing active software projects:

  • Java ref implem project

  • ADL Workbench

  • (Ocean) Archetype Editor

  • Opereffa

Not sure about the following:

  • LiU modelling tools

Ruby, I think, is in its own repository; the Python implementation, I believe, is no longer openEHR but some kind of custom fork in its own repositories. openEHR on .Net is on CodePlex.

Any others?

  • thomas

Hi Thomas, just to be sure we are on the same page:

From previous emails:

What we need is to test our implementations (EHRs, services, repositories, etc), we don’t want to test the tools or the specs (i.e. we will not use an archetype for a “guitar” concept).

We want to concentrate on flat archetypes and operational templates, things that will be used by systems, not on source ADL archetypes with slots, abstract types and other things that make implementation a pain in the 4$$… you know what I mean.

JSON and other serialization formats should be considered only for transport purposes, not for modelling. BTW, I only mentioned RM instances in JSON, not archetype instances (although it’s possible to transport archetypes and templates using JSON).

What I want (and maybe others want too) is:

  1. to be sure that the RM XSDs are correct with respect to the specs,
  2. to have some RM XML instances that are correct and validated against the XSDs,
  3. to have RM XML instances generated for some OPTs, with the referenced source archetypes and term sets accessible too,
  4. to create JSON forms of those RM XML instances to play around with REST services and web browser/JavaScript apps,
  5. to create some test cases in our own projects to make sure we are OK, and maybe share those tests and results,
  6. maybe to do some interoperability tests, e.g. generate some of these artifacts in one system, transport them to another and see whether the test cases pass.
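Item 4 (deriving a JSON form of each RM XML instance) can be prototyped with the standard library alone. This is a minimal sketch assuming a naive element-to-object mapping: attributes become keys prefixed with `@`, repeated child elements become arrays, and text-only elements become strings. The element names are hypothetical and this is not an agreed openEHR JSON mapping; note that it also drops any type information, which is the JSON weakness discussed earlier in the thread.

```python
import json
import xml.etree.ElementTree as ET

def element_to_json(elem):
    """Naively convert an ElementTree element into JSON-compatible Python values."""
    children = list(elem)
    if not children and not elem.attrib:
        return (elem.text or "").strip()          # text-only leaf -> string
    result = {f"@{key}": value for key, value in elem.attrib.items()}
    for child in children:
        converted = element_to_json(child)
        if child.tag in result:
            # A repeated tag becomes a list, preserving document order.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(converted)
        else:
            result[child.tag] = converted
    if not children and elem.text and elem.text.strip():
        result["#text"] = elem.text.strip()       # leaf with attributes and text
    return result

# Hypothetical, simplified composition fragment.
XML = """<composition archetype_node_id="openEHR-EHR-COMPOSITION.encounter.v1">
  <name><value>Encounter</value></name>
  <content archetype_node_id="openEHR-EHR-OBSERVATION.blood_pressure.v1"/>
  <content archetype_node_id="openEHR-EHR-OBSERVATION.pulse.v1"/>
</composition>"""

print(json.dumps(element_to_json(ET.fromstring(XML)), indent=2))
```

One known weakness of this kind of mapping: a `content` element that occurs once becomes an object, while two occurrences become an array, so a real mapping would need to use the schema to know which attributes are multi-valued.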

What do you think guys?

Kind regards,
Pablo.

I don’t have the time to do what I’m going to suggest next, but if someone has time on their hands, I’d suggest writing a tool that automatically generates valid XML RM documents, such as compositions.

Archetypes and templates define the boundaries of the set of all valid instances of clinical models, and one can generate random instances that belong to this set. The current version of Opereffa supports this, but not with XML output. I used this approach to test the performance of persistence options.
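A minimal sketch of the idea of generating random instances inside the boundaries set by the models. It assumes a tiny hypothetical constraint format (node name mapped to an occurrences range and either a value interval or a list of allowed choices) in place of a real flat archetype or OPT; it is not Opereffa's implementation, and it emits plain Python dicts rather than XML.

```python
import random

# Hypothetical, flattened constraints: occurrences (min, max) plus a value interval or choices.
CONSTRAINTS = {
    "systolic":  {"occurrences": (1, 1), "interval": (0.0, 1000.0)},
    "diastolic": {"occurrences": (1, 1), "interval": (0.0, 1000.0)},
    "comment":   {"occurrences": (0, 1), "choices": ["at rest", "after exercise"]},
}

def generate_instance(constraints, rng):
    """Return a dict node -> list of values that satisfies every constraint."""
    instance = {}
    for node, rule in constraints.items():
        low, high = rule["occurrences"]
        count = rng.randint(low, high)       # randint is inclusive at both ends
        if "interval" in rule:
            lower, upper = rule["interval"]
            values = [round(rng.uniform(lower, upper), 1) for _ in range(count)]
        else:
            values = [rng.choice(rule["choices"]) for _ in range(count)]
        instance[node] = values
    return instance

rng = random.Random(42)                      # seeded so test runs are repeatable
instances = [generate_instance(CONSTRAINTS, rng) for _ in range(100)]
print(instances[0])
```

Because every value is drawn from within the constraints, each generated instance is valid by construction, which is what makes the approach useful for bulk persistence or performance testing.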

If our argument is that all the clinical information can be represented via models, why should we create RM instances by hand?

Regards
Seref

Hi Seref, I have a tool that generates composition instances from archetypes and data; what I don’t have is a way to generate a valid XML form of those compositions.

In order to do that, we should resolve the currently reported issues with the XSDs (see my first email), and then generate XMLs, at first maybe by hand, later integrated into tools.
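Once the XSD issues are settled, serialising an in-memory composition to XML is largely mechanical. The sketch below assumes a hypothetical, simplified nested-dict representation of a composition, with element names that merely echo RM attribute names; it does not follow the published openEHR XSDs, so its output would still have to be validated against them.

```python
import xml.etree.ElementTree as ET

def to_xml(tag, node):
    """Recursively build an Element from nested dicts, lists and scalars."""
    elem = ET.Element(tag)
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, list):
                # Lists become repeated sibling elements with the same tag.
                for item in value:
                    elem.append(to_xml(key, item))
            else:
                elem.append(to_xml(key, value))
    else:
        elem.text = str(node)                 # scalars become element text
    return elem

# Hypothetical in-memory composition (not the real RM object model).
composition = {
    "name": {"value": "Encounter"},
    "language": {"code_string": "en"},
    "content": [
        {"name": {"value": "Blood pressure"}},
        {"name": {"value": "Pulse"}},
    ],
}

root = to_xml("composition", composition)
print(ET.tostring(root, encoding="unicode"))
```

A real generator would also need element ordering, namespaces and `xsi:type` markers as dictated by the schema, which is why fixing the XSDs first matters.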

I'm working on that, but the instances being generated at the moment
still need some further processing before they can be considered
clinically valid (e.g. if the archetype says that a number <1000 is
expected, one valid value is -1234567, which makes no sense from a
clinical perspective). It needs work, but looks promising so far.
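One possible way to avoid values like -1234567, sketched under assumptions: intersect each archetype interval (which may be open-ended, as in "<1000") with a separately maintained "clinically plausible" range before sampling. The `PLAUSIBLE` table and its numbers are hypothetical placeholders that clinicians would need to supply; they are not part of the archetype or of any openEHR specification.

```python
import math
import random

# Archetype constraint "< 1000" has no lower bound, so a naive generator may pick -1234567.
# (The upper bound is treated as closed here for simplicity.)
ARCHETYPE_INTERVAL = (-math.inf, 1000.0)

# Hypothetical plausibility ranges, to be reviewed by clinicians (placeholder values).
PLAUSIBLE = {"heart_rate": (20.0, 250.0)}

def sampling_range(archetype, plausible):
    """Intersect the archetype interval with the plausible range."""
    low = max(archetype[0], plausible[0])
    high = min(archetype[1], plausible[1])
    if low > high:
        raise ValueError("archetype and plausibility ranges do not overlap")
    return low, high

def plausible_value(archetype, plausible, rng):
    """Draw a value that satisfies both the archetype and the plausibility range."""
    low, high = sampling_range(archetype, plausible)
    return rng.uniform(low, high)

rng = random.Random(0)
low, high = sampling_range(ARCHETYPE_INTERVAL, PLAUSIBLE["heart_rate"])
print(low, high)   # 20.0 250.0
```

Keeping the plausibility ranges outside the archetypes means the archetypes stay as published, while the test-data generator can be tuned per use case.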