How about creating an openEHR test base?

Seref · 7 May 2012 15:10

That is my point exactly. If you use only the model (archetype && || template) and your data is not clinically valid, then should not this mean that the archetype needs work?
What if someone in a clinical setting entered -1234567 in real life? You would not be able to catch that with archetype based validation.
In this case, you should not be amending your code, you should be amending the archetype, or that is how I’d attempt to do it.
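
A minimal sketch of this point, assuming a much-simplified stand-in for an archetype range constraint. The `Interval` class, the `validate` function and the bounds used are hypothetical illustrations, not taken from any real archetype or openEHR library. The validating code stays generic; only the constraint (the "archetype") is amended.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for an archetype magnitude constraint."""
    lower: Optional[float] = None  # None means unbounded
    upper: Optional[float] = None

    def contains(self, value: float) -> bool:
        if self.lower is not None and value < self.lower:
            return False
        if self.upper is not None and value > self.upper:
            return False
        return True


def validate(value: float, constraint: Interval) -> bool:
    """Generic validator: no clinical knowledge, it only applies the constraint."""
    return constraint.contains(value)


# An archetype without a range lets nonsense through...
unconstrained = Interval()
# ...while an amended archetype (illustrative bounds only) rejects it.
amended = Interval(lower=0, upper=1000)

print(validate(-1234567, unconstrained))
print(validate(-1234567, amended))
```

The code path is identical in both calls; what changes the outcome is the model.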

Kind regards
Seref

thomas.beale · 7 May 2012 16:09

Hi Thomas, just to be sure we are on the same page:

From previous emails:

What we need is to test our implementations (EHRs, services, repositories, etc), we don’t want to test the tools or the specs (i.e. we will not use an archetype for a “guitar” concept).

We want to concentrate on flat archetypes and operative templates, things that will be used by systems, not on source ADL archetypes with slots, abstract types and other things that make implementation a pain in the 4$$… you know what I mean.

probably a couple of choices:

if you are thinking that the archetypes and templates are ‘test’ content, then put these archetypes and / or templates in the knowledge repository along with other test archetypes (it may be only templates, e.g. if the archetypes are straight from CKM).
Alternatively, if they are realistic structures, they could go into CKM as well (you can manage templates, ref-sets and downstream artefacts in CKM). The clinical modelling group are already thinking about using templates in CKM because this is more or less the only way to review archetypes properly.

JSON and other serialization formats should be considered only for transport purposes, not for modelling, BTW I mentioned only RM instances in JSON, not archetype instances (but it’s possible to transport archetype and templates using JSON).

I have no personal opinion on what people might use JSON for, my only concern is that it (in its current definition) doesn’t support dynamically bound data items (e.g. a List where the actual members are CLUSTERs and ELEMENTs), without some external schema for each data package, or else some kind of type hinting additions to JSON. Anyway, I am sure someone will figure out the solution here…
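
To illustrate the "type hinting" option mentioned above, here is a rough sketch. The `_type` key, the class names and the payload are assumptions made for this example only; nothing in this thread defines such a convention. Plain JSON carries no class information, so the decoder needs some hint to decide whether a list member is a CLUSTER or an ELEMENT.

```python
import json
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Element:
    name: str
    value: str


@dataclass
class Cluster:
    name: str
    items: List[Union["Cluster", Element]] = field(default_factory=list)


def decode_item(obj: dict) -> Union[Cluster, Element]:
    """Rebuild the concrete class from the hypothetical "_type" hint."""
    kind = obj["_type"]
    if kind == "ELEMENT":
        return Element(name=obj["name"], value=obj["value"])
    if kind == "CLUSTER":
        return Cluster(name=obj["name"],
                       items=[decode_item(i) for i in obj["items"]])
    raise ValueError(f"unknown _type: {kind!r}")


# Invented payload: a list whose members are a mix of CLUSTERs and ELEMENTs.
payload = json.loads("""
{"_type": "CLUSTER", "name": "device", "items": [
    {"_type": "ELEMENT", "name": "model", "value": "X1"},
    {"_type": "CLUSTER", "name": "parts", "items": []}
]}
""")
root = decode_item(payload)
print([type(item).__name__ for item in root.items])
```

Without the hint, the decoder would have to consult an external schema (or the archetype/template) for every data package to know which class to instantiate.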

What I want (and maybe others) is:

to be sure that RM XSDs are correct compared to the specs,

if you mean the XSDs of the Reference Model, then they are already hosted in the specification repository and will continue to be.

The list below partly relates to conformance testing, which is also part of the Specification Program. We need a set of definitive conformance test input data for each openEHR Release, as part of a structured test set that can be used to validate any implementation of openEHR to some level of conformance. These could be in both XML and JSON.

to have some RM XML instances that are correctly validated against the XSDs,

to have RM XML instances generated for some OTPs, with the referenced source archetypes and term sets accessible too,

create some JSON form of those RM XML instances to play around with REST services and web browser/javascript apps,

create some test cases in our own projects to be sure we are ok, maybe share those tests and results,

maybe do some interoperability tests, e.g. generate some of these artifacts in one system, transport them to another and see if test cases pass or not.

probably we need to separate the test data used for official conformance testing from other test data that might be specific to one implementation.

Anyway, this is a good list of things, I think we just need to organise where each thing goes.
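
The item in the list above about creating a JSON form of XML instances could start from something like the following sketch. The mapping rules (`@attributes`, `#text`, children always as lists) are one arbitrary choice among many, and the XML fragment is made up for the example; it is not a valid openEHR RM instance.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem: ET.Element) -> dict:
    """Naive mapping: attributes, text, and children (always as lists)."""
    node: dict = {}
    if elem.attrib:
        node["@attributes"] = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    for child in elem:
        node.setdefault(child.tag, []).append(element_to_dict(child))
    return node


# Made-up fragment, NOT a valid openEHR RM instance.
SAMPLE = """
<composition archetype_node_id="example.v1">
  <name><value>Encounter</value></name>
  <content kind="OBSERVATION"><name><value>Pulse</value></name></content>
</composition>
""".strip()

root = ET.fromstring(SAMPLE)
as_json = json.dumps({root.tag: element_to_dict(root)}, indent=2)
print(as_json)
```

Because every toolkit can pick different mapping rules, two systems would have to agree on them before exchanging such JSON.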

thomas

thomas.beale · 7 May 2012 16:14

I don't have the time to do what I'm going to suggest next, but if someone has time on their hands, I'd suggest writing a tool that will automatically generate valid XML RM documents, such as compositions etc.

yes, a standard data synthesiser has been on the wish list for a long time, and is a needed tool. Ideally it would take a set of configuration inputs that cause it to generate data with some specific statistical properties, e.g. a realistic spread of signs, symptoms, diagnoses, demographic characteristics and so on.
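
As a rough illustration of "configuration inputs that cause it to generate data with some specific statistical properties", here is a toy sketch. The field names and weights in `CONFIG` are invented for the example and are not real epidemiology; a real synthesiser would also have to emit proper RM structures rather than flat dictionaries.

```python
import random
from typing import Dict, List

# Hypothetical configuration: relative weights, NOT real epidemiology.
CONFIG: Dict[str, Dict[str, float]] = {
    "sex": {"female": 0.5, "male": 0.5},
    "diagnosis": {"hypertension": 0.6, "diabetes": 0.3, "asthma": 0.1},
}


def synthesise(config: Dict[str, Dict[str, float]], n: int, seed: int = 0) -> List[dict]:
    """Draw n records whose fields follow the configured weights."""
    rng = random.Random(seed)  # seeded so a test set is reproducible
    records = []
    for _ in range(n):
        record = {}
        for field_name, weights in config.items():
            values = list(weights)
            record[field_name] = rng.choices(
                values, weights=[weights[v] for v in values], k=1)[0]
        records.append(record)
    return records


records = synthesise(CONFIG, n=1000, seed=42)
print(records[:3])
```

Seeding the generator matters for a shared test base: everyone who runs the tool with the same configuration and seed gets the same data set.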

Archetypes and templates define boundaries of all valid instances of clinical models, and one can generate random instances that belong to this set. Opereffa's current version supports this, but not with XML output. I used this approach to test performance of persistence options

I think random data is only useful up to a point: you can test basic scalability and some performance metrics, but it won't reveal realistic performance tendencies for many types of queries, because real clinical data only occupies a tiny, specific subset of the N-space represented by purely randomly generated data (and N is a high number!)

- thomas

pablo · 7 May 2012 16:58

Hi Thomas,

Hi Thomas, just to be sure we are on the same page:

From previous emails:

What we need is to test our implementations (EHRs, services, repositories, etc), we don’t want to test the tools or the specs (i.e. we will not use an archetype for a “guitar” concept).

We want to concentrate on flat archetypes and operative templates, things that will be used by systems, not on source ADL archetypes with slots, abstract types and other things that make implementation a pain in the 4$$… you know what I mean.

probably a couple of choices:

if you are thinking that the archetypes and templates are ‘test’ content, then put these archetypes and / or templates in the knowledge repository along with other test archetypes (it may be only templates, e.g. if the archetypes are straight from CKM).
Alternatively, if they are realistic structures, they could go into CKM as well (you can manage templates, ref-sets and downstream artefacts in CKM). The clinical modelling group are already thinking about using templates in CKM because this is more or less the only way to review archetypes properly.

I’m looking into the second option, but as I said before (http://lists.openehr.org/pipermail/openehr-technical_lists.openehr.org/2012q2/007049.html) at this stage we are not discussing where to put things, we need to discuss how we will create the things we need.

JSON and other serialization formats should be considered only for transport purposes, not for modelling, BTW I mentioned only RM instances in JSON, not archetype instances (but it’s possible to transport archetype and templates using JSON).

I have no personal opinion on what people might use JSON for, my only concern is that it (in its current definition) doesn’t support dynamically bound data items (e.g. a List where the actual members are CLUSTERs and ELEMENTs), without some external schema for each data package, or else some kind of type hinting additions to JSON. Anyway, I am sure someone will figure out the solution here…

I don’t think I’m following you there. As I imagine it, JSON RM instances will be just for data, not for class definitions; if you need the definition, just look into the archetype/template.

What I want (and maybe others) is:

to be sure that RM XSDs are correct compared to the specs,

if you mean the XSDs of the Reference Model, then they are already hosted in the specification repository and will continue to be.

Think of all these items as unit test resources needed to execute those tests, and of creating something that lets you download all you need automatically, or with just one click: archetypes, templates, term sets, XSDs, XMLs, JSON, etc.

Currently I have to download all this stuff from different sources, and some of these things are unavailable (e.g. from time to time someone asks on the lists for valid XML compositions).

The list below partly relates to conformance testing, which is also part of the Specification Program. We need a set of definitive conformance test input data for each openEHR Release, as part of a structured test set that can be used to validate any implementation of openEHR to some level of conformance. These could be in both XML and JSON.

I agree, but formalizing that will take some time, and I think we (implementers) have pressing needs right now (obviously this work is not meant to be official conformance testing right now, but I think we can get to that happy ending with a little experimentation).

to have some RM XML instances that are correctly validated against the XSDs,

to have RM XML instances generated for some OTPs, with the referenced source archetypes and term sets accessible too,

create some JSON form of those RM XML instances to play around with REST services and web browser/javascript apps,

create some test cases in our own projects to be sure we are ok, maybe share those tests and results,

maybe do some interoperability tests, e.g. generate some of these artifacts in one system, transport them to another and see if test cases pass or not.

probably we need to separate the test data used for official conformance testing from other test data that might be specific to one implementation.

Anyway, this is a good list of things, I think we just need to organise where each thing goes.

Right now I prefer to concentrate my efforts on what artifacts we need to create, and later on where each thing goes.

Kind regards,
Pablo.


system · 7 May 2012 19:08

Hi Pablo,
The xml-binding component in the Java reference implementation does
just that. It binds RM object instances to generated XML objects that
can be serialized according to the published XSD.
/Rong

pablo · 7 May 2012 21:37

Hi Rong,

That’s great news, but we have our own RM implementation because it handles ORM too.
But I think I can adapt your xml-binding component to use our RM impl, what do you think?

Heath_Frankel3 · 7 May 2012 22:49

Hi Erik,
I think that using an EHR service to store RM instances would be better than storing in SVN or GIT. Ultimately if the service was able to work from a GIT repository we would have the best of both worlds.
I had considered offering the Ocean EHR server but I assumed the usual issues relating to the commercial backend would have made this not suitable so I didn’t bother.
Would your service be an alternative, especially since it is RESTful?
Perhaps there is a need for multiple service implementations to be available working from the same instance repository, I am sure each have their strengths and weaknesses and interface approaches. For example the ocean EHR service picked up a data validation error reported on the list that another didn’t.
We can also use this to start comparing service models.
Heath

Heath_Frankel3 · 7 May 2012 23:02

Hi Pablo,
What issues do you have with the XSD? We have been producing valid instances for years. I have tools that can validate these in seconds. I am sitting on hundreds of test instances. Problem is I am not sitting around with nothing to do. If you have a student willing to do some dot NET code with little support you can go to openehr.codeplex.com to get what you need to create and validate openehr instances against OPTs and RM.

BTW, I have a local xsd that further constrains the published schema that picks up several additional RM invariants. Happy to contribute this but don’t want to confuse the status of the official schema. I also have a demographic schema which I believe is currently not part of the current openEHR release.
Heath

Heath_Frankel3 · 7 May 2012 23:14

Seref,
I think meaningful data is more useful than random maximal or minimal data.
I think that using the template data schema approach could be an easier way to produce data by hand if a GUI is not available but I am assuming this is not the case for Pablo.
The Ocean Template Designer is free to download, a TDS can be generated from it, and some work with a good XML tool can produce an instance fairly quickly. The TDD to openEHR transform is available on openehr.codeplex.com; you can use your language of choice to load the instance into an XML DOM, validate it against the schema to inject the default and fixed values, and transform it to openEHR.

From there you will need a bit more OPT and RM validation but you will be 90% of the way (especially if you use my further constrained version of basetypes.xsd, which I might make available on codeplex along with the transform).
Heath

Heath_Frankel3 · 7 May 2012 23:40

Pablo,

This is a good list. I have already commented on 1-3 and I am also interested in 4-6.

I think a JSON format project would be good to make sure we get consistency sooner rather than later; it is not like XML, where you can publish a schema, and I suspect various toolkits will have their variations.

Producing test data is a time-consuming effort. Producing valid instances is easy enough, but at present clinical archetypes are still moving, so these get out of date quickly. The real work is developing known bad instances, because there are so many ways something can be bad. So we need to define the scope of this effort, and perhaps using the test archetypes on openEHR is not a bad approach, as these may be more stable than the clinical archetypes at this point. Having said that, perhaps as part of the CKM review process we can produce test instances that can be made available to CKM users and developers alike; this could be done at any stage of the review process, not just at completion.
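
One way to attack the "known bad instances" problem is to derive them mechanically from a valid instance, breaking one rule at a time. The sketch below assumes a toy instance and a toy validator (`VALID`, `REQUIRED`, `is_valid` are all hypothetical and far simpler than real RM structures); it only shows the shape of the approach.

```python
import copy
from typing import Iterator, Tuple

# Hypothetical, simplified instance and rules; not real RM structures.
VALID = {"name": "Pulse", "magnitude": 72, "units": "/min"}
REQUIRED = ("name", "magnitude", "units")


def is_valid(instance: dict) -> bool:
    """Toy validator: required keys present, magnitude a non-negative number."""
    if any(key not in instance for key in REQUIRED):
        return False
    mag = instance["magnitude"]
    return (isinstance(mag, (int, float))
            and not isinstance(mag, bool)
            and mag >= 0)


def bad_variants(instance: dict) -> Iterator[Tuple[str, dict]]:
    """Yield (label, mutated copy) pairs, each breaking exactly one rule."""
    for key in REQUIRED:
        broken = copy.deepcopy(instance)
        del broken[key]
        yield f"missing {key}", broken
    negative = copy.deepcopy(instance)
    negative["magnitude"] = -1234567
    yield "negative magnitude", negative
    wrong_type = copy.deepcopy(instance)
    wrong_type["magnitude"] = "seventy-two"
    yield "magnitude is not a number", wrong_type


variants = list(bad_variants(VALID))
for label, variant in variants:
    print(label, is_valid(variant))
```

Each labelled variant documents which rule it violates, so a failing implementation can report exactly which kind of bad data it let through.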

Interoperability testing is extremely important; the IHE process demonstrates the benefits of this. I did this with Rong several years ago and we found a few slight variations and assumptions that we needed to reconcile. Again, I think we should do this sooner rather than later, before we have too many systems installed with their own sets of assumptions. This really needs resourcing, and I think it should be the vendors that do this, since ultimately they will be the beneficiaries of having an openEHR compatible system, but we do need some governance and tooling to support this process, so we need some additional contributions to kick-start it.

I think your initiative is a really good start, it is certainly not a new idea but you’re making it happen, keep it up.

Regards

Heath

system · 8 May 2012 02:59

Hi Thomas Beale,

Our Ruby implementation repository already moved to GitHub last year
for our convenience.
I was wondering if we could move our repository under
github://openehr/ruby-impl-openhr.
It would be more comprehensible there than under skoba/ruby-impl-openehr,
and better for publicity.

Regards,
Shinji Kobayashi

Seref · 8 May 2012 07:52

Interesting point again. There are various bits of functionality implemented in different projects, but the projects have different open source licences.
I’m not Rong of course, but his code uses MPL, and since I used his code when I started Opereffa, Opereffa is MPL too (though it’ll be Apache very soon).
So you’d need to check how licensing issues need to be handled if you use Rong’s code, assuming your work is not under MPL.

I think you’ve touched another important point Pablo

Kind regards
Seref

thomas.beale · 8 May 2012 09:04

you certainly can. I have to travel for a few days, but once I am back I will get on to organising with you and other teams how to structure the openEHR Github area.

thomas

ANASTASIOU_A1 · 8 May 2012 10:38

Dear Erik and all

(This email might appear a bit long but it actually makes just two points a) Data Synthesizer Tool, b)Availability of Realistic Subject data)

A) Data Synthesizer Tool
I absolutely agree on the "data synthesizer" tool.

It is something i would like to do as a test case for parsing an archetype's definition node and generating a representative object because in this case, each and every node defined in the spec would have to be handled.

It's not that much of a time consuming task if you already have the RM builder. The AM provides everything that is needed (For example: http://postimage.org/image/mcytss26f/ bounds for primitive types, cardinality / multiplicity for other data structures), so instead of just creating an object from the RM and attaching it in a hierarchy (just by calling its constructor maybe), some values would have to be generated and attached to its fields as well.

Once the RM object is constructed it can be serialized to anything (XML included) (and there goes a first "test base")
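
The idea in (A) can be sketched as follows. The dictionary-based `CONSTRAINT` tree is a hypothetical, heavily simplified stand-in for what an AM parser would hand you (real archetype constraint objects are much richer), and the output is a plain dictionary rather than real RM objects. The point is only that bounds and occurrences are enough to drive value generation.

```python
import random

# Hypothetical, simplified constraint tree; real AM objects are richer.
CONSTRAINT = {
    "rm_type": "CLUSTER",
    "children": [
        {"rm_type": "DV_COUNT", "name": "score", "range": (0, 30),
         "occurrences": (1, 1)},
        {"rm_type": "DV_BOOLEAN", "name": "fasting", "occurrences": (0, 1)},
        {"rm_type": "DV_COUNT", "name": "reading", "range": (40, 200),
         "occurrences": (1, 3)},
    ],
}


def generate(node: dict, rng: random.Random):
    """Build a plain-dict instance that satisfies the constraint node."""
    if node["rm_type"] == "DV_COUNT":
        low, high = node["range"]
        return rng.randint(low, high)  # bounds are inclusive
    if node["rm_type"] == "DV_BOOLEAN":
        return rng.choice([True, False])
    if node["rm_type"] == "CLUSTER":
        instance = {}
        for child in node["children"]:
            lo, hi = child["occurrences"]
            count = rng.randint(lo, hi)
            instance[child["name"]] = [generate(child, rng) for _ in range(count)]
        return instance
    # Every node type in the spec would need a branch here.
    raise ValueError(f"unhandled type: {node['rm_type']}")


instance = generate(CONSTRAINT, random.Random(1))
print(instance)
```

The final `raise` is where the "each and every node defined in the spec would have to be handled" remark shows up in practice: any unhandled type fails loudly instead of silently producing incomplete data.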

From this perspective, it is absolutely essential that the XSDs are valid (to ensure a valid structure) and also (Seref's got a very good point) that the archetypes are valid to ensure a valid content.

B) Availability of Realistic Subject Data
As far as clinically realistic datasets are concerned, i would like to suggest the following:

The Alzheimer's Disease Neuroimaging Initiative (ADNI) in the US is a long term project that collects, longitudinally, various clinical parameters from subjects at various stages in the disease (http://adni.loni.ucla.edu/).

At the moment, the dataset contains about 800 subjects. Each subject would have 4-5 sessions associated with it (at 6 month intervals usually) and for each session a number of parameters would be collected such as MMSE scores, ADAS Cog scores, received medication, lab tests and others as well as imaging biomarkers (MRI mostly). A basic "demographics" section is also available for each subject.

(To put it in the context of a visualisation, the story that these data reveal is the progression of AD on a subject / population of subjects which is very interesting.)

The data are made available as CSV files (about 12 MB just for the numerical data). An application must be made to ADNI to obtain the data. As redistribution of the data is prohibited (http://adni.loni.ucla.edu/wp-content/uploads/how_to_apply/ADNI_DSP_Policy.pdf) we would be working towards a tool that would accept a set of ADNI CSV files and transform them into a local openEHR enabled repository.

The task here would be to create some archetypes / templates that reflect the structure of the data shared by ADNI and then scan the CSVs and populate the openEHR enabled repository.

The CSV files are not in the best of conditions (the structure has been changed from version to version, certain fields (such as dates) might be in a number of different formats, the terminology is not exactly standardised, etc).

For us (ctmnd.org) to work on these files we have created an SQL database and a set of scripts that sanitize and import the CSVs.
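
A small sketch of the kind of sanitising step described above, for the "dates in a number of different formats" problem. The column names, the sample rows and the list of date formats are all invented for this example; they are not ADNI data and the real files may use different formats.

```python
import csv
import io
from datetime import datetime
from typing import Optional

# Assumed formats for illustration; the real files may differ.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def normalise_date(raw: str) -> Optional[str]:
    """Return an ISO 8601 date, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


# Invented rows with invented column names, not ADNI data.
SAMPLE = """subject_id,visit_date,mmse
S001,2010-03-15,28
S002,09/21/2010,24
S003,05.01.2011,19
S004,sometime,27
"""

rows = []
for row in csv.DictReader(io.StringIO(SAMPLE)):
    row["visit_date"] = normalise_date(row["visit_date"])
    row["mmse"] = int(row["mmse"])
    rows.append(row)

print(rows)
```

Returning `None` for unparseable dates, rather than guessing, keeps the bad rows visible so they can be reviewed before anything is loaded into a repository.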

I would be interested in turning this database into an openEHR enabled repository (whether a set of XML files or "proper" openEHR database) because it can be used for a number of things (especially for testing AQL).

If you think that this can be of help, let me know how we can progress with it.

Obviously the tool can be made available to everybody who can then apply to download the ADNI data locally.

I am not so sure about the data (even if they become totally anonymised), i will have to check, but in any case, going from "I have nothing" to "I have a database of multi-modal data from 800 subjects that is more realistic than test data" has got to be worth the trouble of converting the CSVs.

Looking forward to hearing from you
Athanasios Anastasiou

Seref · 8 May 2012 11:16

Hi Athanasios,
The problem is always about time. If someone is willing to model an existing clinical data set, then for those who do not know about it, the UCI machine learning repository has some interesting clinical data sets. They’re freely available for research, and I think it would be fairly easy to use them for the type of test base we’re discussing. Just google UCI machine learning repository, and you should see what I’m talking about.
If the openEHR community has members who can put time into creating models for any of these (or other) data sets, and then turning them to valid RM serializations, I for one will not say no to that

Kind regards
Seref

ANASTASIOU_A1 · 8 May 2012 12:33

Hello Seref

Many thanks for the UCI reference, i was personally not aware of it and it's a great resource.

Well, as it seems there are plenty of "dummy but realistic" (!) dataset opportunities out there for creating a "test-base", it is indeed a matter of time and i am sorry to not have more experience with actually building archetypes, i can see the value in this and i'd definitely give it a try.

Perhaps we can create drafts though and even if these are not entirely correct they would be edited by others (?)

All the best
Athanasios Anastasiou

Heath_Frankel3 · 8 May 2012 12:53

Once again we have tooling to convert CSV files to openEHR using template data schema, but someone has to do the hard work of creating the archetypes, templates and transforms to make it all happen. This continues to be the blocker for this kind of initiative. Let us know if anyone has the bandwidth.
Heath

system · 8 May 2012 13:31

Pablo,
The xml-binding component leverages the annotated constructors in the
RM classes for instantiating RM objects. It uses reflection
extensively. Take a look at the XMLBinding class for some inspiration.
I am sure you can adapt it for your own classes.
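
For readers unfamiliar with the technique, here is a rough Python analogue of reflection-driven binding. It is not the Java component's API and does not reproduce the XMLBinding class; the `DvText` and `Element` classes, the `bind` function and the XML fragment are all hypothetical. Constructor parameter names and type annotations play the role that annotated constructors play in the description above.

```python
import inspect
import xml.etree.ElementTree as ET


class DvText:
    def __init__(self, value: str):
        self.value = value


class Element:
    def __init__(self, name: DvText, value: DvText):
        self.name = name
        self.value = value


def bind(elem: ET.Element, cls):
    """Instantiate cls from elem by inspecting its constructor parameters."""
    if cls is str:
        return elem.text or ""
    kwargs = {}
    for param in inspect.signature(cls).parameters.values():
        child = elem.find(param.name)  # direct child with the parameter's name
        if child is None:
            raise ValueError(f"<{elem.tag}> lacks <{param.name}>")
        # The parameter's type annotation says which class to build next.
        kwargs[param.name] = bind(child, param.annotation)
    return cls(**kwargs)


# Made-up fragment shaped to match the toy classes above.
SAMPLE = ("<element><name><value>Pulse</value></name>"
          "<value><value>72</value></value></element>")
obj = bind(ET.fromstring(SAMPLE), Element)
print(obj.name.value, obj.value.value)
```

Adapting such a binder to a different RM implementation mostly means making sure the other classes expose the same kind of constructor metadata.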
/Rong

pablo · 8 May 2012 21:52

Hi Heath,

I don’t want to open the scope too much at this stage. I know this is a process that will take some time. Maybe some of us can focus on artifacts and others on services & repositories.

I really like the idea of having different repositories sharing the same artifacts, this can be a good technical proof of concept of a distributed CKM. (not a new topic, but maybe a forgotten one: http://lists.openehr.org/pipermail/openehr-clinical_lists.openehr.org/2011-September/002201.html). If some of you want to open the access to your services, I can write clients for the EHRGen project to consume artifacts and evaluate how it all works together.

Kind regards,
Pablo.

pablo · 8 May 2012 22:03

Hi Heath,

The issues I mentioned came from seeing emails on the lists from other colleagues reporting problems; until now I hadn’t worked with the openEHR XSDs. I remember someone mentioned a mismatch between the XSDs and the openEHR specs.

Maybe each member can mention what problems they had (Erik?, Athanasios?). Just for fun I’ve searched XSD on the lists:

https://www.google.com/search?sourceid=chrome&ie=UTF-8&q=xsd+site%3Alists.openehr.org%2Fpipermail%2Fopenehr-implementers_lists.openehr.org%2F#hl=es&sclient=psy-ab&q=xsd+site:lists.openehr.org%2Fpipermail%2Fopenehr-implementers_lists.openehr.org&oq=xsd+site:lists.openehr.org%2Fpipermail%2Fopenehr-implementers_lists.openehr.org&aq=f&aqi=&aql=&gs_l=serp.3…42653.42653.0.42798.1.1.0.0.0.0.0.0..0.0…0.0.C216hd-inng&pbx=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.,cf.osb&fp=ca1c69677034f246&biw=1280&bih=687

https://www.google.com/search?sourceid=chrome&ie=UTF-8&q=xsd+site%3Alists.openehr.org%2Fpipermail%2Fopenehr-technical_lists.openehr.org%2F#hl=es&sclient=psy-ab&q=xsd+site:lists.openehr.org%2Fpipermail%2Fopenehr-technical_lists.openehr.org&oq=xsd+site:lists.openehr.org%2Fpipermail%2Fopenehr-technical_lists.openehr.org&aq=f&aqi=&aql=&gs_l=serp.3…2087.2087.0.2601.1.1.0.0.0.0.242.242.2-1.1.0…0.0.3-xa3a0gTaY&pbx=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.,cf.osb&fp=ca1c69677034f246&biw=1280&bih=687

Please do contribute! You can add your name and attach the files here: http://www.openehr.org/wiki/display/dev/Development+test+base so there’s no mix-up with current releases. Please mention what changes you have made to the XSDs here: http://www.openehr.org/releases/1.0.2/its/XML-schema/index.html
If you have some XML instances for those schemas, that would be great too!
