Trying to understand the openEHR Information Model

Hi Randy

I guess it does so far at least. I guess there will only be a few back end openEHR servers in the future, one or two of which are likely to be open source.

The idea is that the query layer is further away from the implementation layer than is usual for a health care system. The way the openEHR data can be stored is still an experimental science.

Ocean has an AQL parser which then queries the openEHR repository and is now optimised in a number of ways. Some of these are independent of the storage mechanism entirely.

Cheers, Sam

Sam Heard
FRACGP, MRCGP, DRCOG, FACHI
Consultant & Chairman, Ocean Informatics
Chairman, openEHR Foundation
Chairman, NTGPE
Senior Visiting Research Fellow, University College London

by ‘sharing’ here? Most back-ends are designed as an open platform, obeying (or starting to obey) standardised service interfaces and the AQL language. So being able to replace one implementation with another is the whole idea, and certainly possible with some vendors already. - thomas

Vendors implement it in different ways, but to be honest, I think the main value isn't in the straightforward part of the implementation, which isn't that complex (depending on how you do blobbing). It's in the optimisations, and those depend heavily on the exact persistence layer choices.

I don't think there is an AQL engine open source yet, but in any case it only makes sense when there is an open source openEHR EHR service, which there currently is not.

- thomas

I don't think there is an AQL engine open source yet, but in any case it only makes sense when there is an open source openEHR EHR service, which there currently is not.

I don't think it is possible to write an AQL engine right now, because it is not well defined yet. One can only anticipate what it will be.

So, my guess is it will be something between XPath and SQL, using paths in place of field names. I think the specification is taking too long to arrive. It seems quite obvious to me what it will be.

But writing an engine for it from scratch will cost more than one year for a very experienced developer, and even then.....
Companies like Oracle spend a substantial part of their development investment on good query engines.

Storing data is peanuts, querying them is the hard part.

But there is a way around: use what others have done, namely path-based query engines. There are quite a few, including good open source ones.

It is just a matter of storing your OpenEHR datasets path-based and querying them path-based, for example using an XML database. Maybe there are other possibilities too.
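As an illustration of the path-based idea, here is a minimal sketch using only Python's standard library. The XML document is a hypothetical, heavily simplified stand-in for an openEHR composition (element names, the `archetype_node_id` attribute placement and the at-codes are assumptions for the example, not the official serialisation), and `query_magnitudes` is an invented helper. A real XML database would evaluate full XPath/XQuery with indexes rather than the limited XPath subset that ElementTree supports.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified composition; real openEHR XML is considerably richer.
COMPOSITION_XML = """
<composition archetype_node_id="openEHR-EHR-COMPOSITION.encounter.v1">
  <content archetype_node_id="openEHR-EHR-OBSERVATION.blood_pressure.v1">
    <data archetype_node_id="at0001">
      <events archetype_node_id="at0006">
        <items archetype_node_id="at0004">
          <value><magnitude>142</magnitude><units>mm[Hg]</units></value>
        </items>
        <items archetype_node_id="at0005">
          <value><magnitude>91</magnitude><units>mm[Hg]</units></value>
        </items>
      </events>
    </data>
  </content>
</composition>
"""


def query_magnitudes(root, node_id):
    """Return the magnitudes of all <items> nodes carrying the given node id."""
    # ElementTree supports a small XPath subset, including attribute predicates.
    path = f".//items[@archetype_node_id='{node_id}']/value/magnitude"
    return [float(el.text) for el in root.findall(path)]


root = ET.fromstring(COMPOSITION_XML)
systolic = query_magnitudes(root, "at0004")
print(systolic)
```

The point of the sketch is only that the data are addressed by path and node identifier, not by table and column names.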

And then you have a full-featured AQL engine, as I think it will look in the future, when the specs are finally written. Maybe some syntax translation is needed.
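The kind of syntax translation mentioned here could look roughly like the sketch below. It assumes that archetype-style paths put a node identifier in square brackets (for example `/data[at0001]/events[at0006]`) and that the XML serialisation carries that identifier in an attribute named `archetype_node_id`; both the function name and the regular expression are illustrative assumptions, and a complete translator would also have to handle name predicates, the rest of the query (FROM/CONTAINS/WHERE) and terminology expressions.

```python
import re

# Matches a bracketed at-code (at0001, at0001.1) or an archetype id (openEHR-...).
_NODE_PREDICATE = re.compile(r"\[(at\d+(?:\.\d+)*|openEHR-[\w.-]+)\]")


def archetype_path_to_xpath(path: str) -> str:
    """Rewrite [node_id] predicates as XPath attribute tests."""
    return _NODE_PREDICATE.sub(
        lambda m: f"[@archetype_node_id='{m.group(1)}']", path
    )


xpath = archetype_path_to_xpath("/data[at0001]/events[at0006]/value/magnitude")
print(xpath)
```

Steps without a predicate (such as `value` and `magnitude`) pass through unchanged, which is why the two path syntaxes are close to each other.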

What promises could the AQL specification hold that are not already delivered by XPath/XQuery right now? I cannot think of anything.

I think it would be wise for the OpenEHR community to look very closely at what commercial companies and open source communities have been doing for years now, instead of reinventing the wheel, unless of course there are good reasons.
It would make the introduction of many OpenEHR implementations much easier, and that would be good for the worldwide success of the OpenEHR specifications.

OK, these are my two cents. I am very anxious to learn why the current XPath/XQuery-specifications are not good enough.

Have a nice Sunday.

Bert.

I am very anxious to learn why the current XPath/XQuery-specifications are not good enough.

Sent from my iPad

I meant to write curious instead of anxious, stupid autocorrection of iPad.

Bert

I meant to write curious instead of anxious, stupid autocorrection of iPad.

That is one dangerous–and very amusing–iPad. :slight_smile: Speak calmly to it next time. Bert, by this you got my day off to a rollicking start.

Randy

Yes, it is an iPad configured for use in Dutch, and sometimes it spontaneously starts understanding English, sometimes it mixes both languages, and sometimes it silently rewrites words.

There is no quick way I know of to change the language, so I look at all words with red lines under them. Sometimes I see a wrong word a few lines back, but it is very hard to get the cursor there.

I sometimes have an Android tablet too (in fact, my two sons have them, one Apple, one Android), and Android tablets are just as bad. But Android has a language selector in the keyboard.
Since we got tablets, my family has removed all computers from the living space.

But, it gave you some pleasure. That is the good thing about it.

:wink:

Bert

Hi Bert

At the risk of being a “pleasure killer” ;), on my iPad 3 (and iPhone) I have a little globe symbol to the left of the space bar that allows toggling of languages.

As http://www.theipadguide.com/faq/how-can-i-type-different-languages-turn-international-keyboards-ipad explains, it requires that you add at least a 2nd keyboard layout.

Not sure whether this requires a newer version of iOS…

Cheers
Thilo

Hi Thilo, it is a “pleasure killer” that my iPad does not have that, I don’t know which version it is.
I bought it for my son, for Christmas, 1.5 years ago. Maybe I need to change settings somewhere.

But I must say that I think the original subject, AQL, is much more interesting.
I would appreciate it if people would respond to that.

Thanks, Bert

Hi Bert,

XQuery wasn't stable in 2006 when we needed a query language. AQL was implemented by Ocean by 2007 and has been working since then, and something similar has been implemented by companies in Brazil. Later on, Marand implemented it, and I suspect someone else has too.

I don't know of anyone who has done a serious analysis to show whether XQuery could do the job. At the very least archetype identifiers and pathing would need to be catered for, but the rest might be made to work. I would welcome any kind of analysis like this.

- thomas


I am sorry, I have no time to provide a thorough analysis, but I do have an opinion.

XQuery stabilized in 2007. XPath has been around somewhat longer, but as I understand it, XPath 2.0 is a subset of XQuery 1.0. I am reading Priscilla Walmsley’s O’Reilly book on XQuery; she explains it very thoroughly (as we are used to from her).

AQL as shown in the Wiki (which is all I know of AQL) can very well be served by a syntax transformation to XPath/XQuery. Should one do that? Syntax transformations carry a risk. In favor of XQuery, there are query engines available almost out of the box, open source or closed, some of which have been in development for 10 years, are based on good indexing, and are still being actively developed. With all respect, I think very good work has been done worldwide, and one should profit from it if possible.

XQuery can also be used directly to query OpenEHR datasets. I see no reasons against this well-working solution. There is not really a need for a separate query language. At this moment AQL is a niche and XQuery is a standard. I have read somewhere that Cache from Intersystems also supports XQuery in an additional module, but marketing language is often gibberish; one can never be sure what is really possible.

Apart from that, maybe there is a wish to complete the ADL/AQL ecosystem for those who choose not to store in XML and want to write their own AQL query engine on the database concept of their choice. In that case AQL should, in my opinion, be defined as close as possible to XPath/XQuery. I think very, very close is possible and even obvious, because the basic goal is the same: to offer a generic query language. Other arguments could be: to make life easier for developers, to profit from what has already been done (in standards definition and in tooling), and to provide interoperability with that part of the world which understands XML better than ADL/AQL.

But then the next issue comes up. A shortcoming of the OpenEHR documentation is the expression of the RM in XML Schema. Derived OpenEHR datasets can never be validated legally in XML Schema 1.1 or 1.0. So defining the RM in an XML Schema is quite useless, and leads people down a dead-end street. There are, however, good alternatives, even better ones. But maybe this is another subject.

Bert

Dear Bert,

The language switching button only appears (automatically) next to the space bar if you have set more than one language for your keyboard, in Settings. I only have one (UK) so I do not see that button.

With best wishes,

Dipak

Thanks, I already found it by following Thilo’s link.

Bert

that’s fine for XML data. But many implementations do not use XML as the storage format - and there are good reasons for that - XML Schema representations of object data require transformation, and have efficiency problems that have to be addressed in one way or another. The general need we have in openEHR is for an abstract query language that can be used to express queries to any openEHR (or 13606 or other archetype-based) system, regardless of whether its concrete persistence happens to be in XML.

If you are suggesting that we use XQuery/XPath even for non-XML data representation cases, that’s a different conversation. It won’t work out of the box, because we use a more efficient path syntax (but which is easily convertible), and XQuery/XPath make other assumptions due to being targeted at XML, e.g. they assume the XML attribute/element dichotomy, which doesn’t exist in normal object data; they don’t assume an object inheritance model, and so on.

Nevertheless, if it could be shown that AQL could be mapped to a clean subset of XQuery/XPath as a standard formalism, that’s likely to be useful. It would mean that those implementers who choose XML as their internal data representation would be able to use standard products out of the box, as you say. Others might be able to use some components, e.g. XQuery parsers, in order to build a query engine that talks to non-XML data.

well it’s a bit more than that - it’s to define a query language that a) is based on the logical content models of the data and b) needs to know nothing about the concrete persistence representation of the data. The query language also has to support terminology-based query expressions and subsumption. But if it can be aligned, let’s do it. It just needs someone to do the work.

Do you mean just that the Release 1.0.2 XSDs need to be better designed? We certainly know that, and welcome any proposals on that (of which there are already many).

Not sure what you are saying here, Bert. XML openEHR data is regularly used as an exchange format for applications and systems. Can you explain a bit better what you mean by the above?

- thomas
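To make the subsumption requirement mentioned above concrete, here is a minimal sketch of a terminology-aware filter. The is-a table, the codes and the record layout are all invented for illustration; a real system would delegate this to a terminology service rather than a hard-coded dictionary.

```python
# Hypothetical is-a hierarchy: child code -> parent code (not real terminology content).
IS_A = {
    "viral_pneumonia": "pneumonia",
    "bacterial_pneumonia": "pneumonia",
    "pneumonia": "lung_disease",
    "asthma": "lung_disease",
}


def is_subsumed_by(code, ancestor):
    """True if code equals ancestor or reaches it by following is-a links upward."""
    while code is not None:
        if code == ancestor:
            return True
        code = IS_A.get(code)  # None once the top of the hierarchy is reached
    return False


# Hypothetical (ehr_id, diagnosis code) pairs as they might come back from a query.
records = [("ehr1", "viral_pneumonia"), ("ehr2", "asthma"), ("ehr3", "fracture")]

# "All patients with any kind of pneumonia" matches the viral case as well.
matches = [ehr_id for ehr_id, code in records if is_subsumed_by(code, "pneumonia")]
print(matches)
```

A plain path-and-value comparison would miss `viral_pneumonia` when asked for `pneumonia`, which is the part a generic XML query language does not provide by itself.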

I don’t see any transformation needed: only leaf data are stored as data, and those are always simple data, not objects, so there is no transformation needed and no efficiency lost. There are no proven efficiency problems in XML; that is only a story, from a poor study lacking details. We had that discussion. The technique has been an industry standard for many purposes for over 15 years.

But I understand your point, and we can discuss it without bashing XML. You are saying that people may want to use another storage than XML databases, and then they can’t use XQuery. You are right, but can they use AQL? There is only an incomplete definition of AQL in a Wiki, which has had no substantial changes for a long time, thus hardly any progress. There is no guarantee that the Wiki is stable. I think you know what kind of effort and risk it is to write a new query engine on a new language concept for any database concept of choice.

Seref said it to Randolph a few days ago: there is hardly any work done by third parties, only two implementations of AQL, and in the same sentence he calls AQL almost the most important part of the OpenEHR ecosystem. Quote of Seref in this context:

One could, reading this, start to doubt whether OpenEHR can exist without a query language. I think Seref is right. It cannot. And then there is no stable specification?

Also consider this. How can two companies have implemented AQL if there is no stable definition? How much money do they put at stake with an uncertain result? These are

It brings me to the conclusion that for third parties there is only one way to go, and that is XML and XQuery; there is no other way to get an OpenEHR system ready at this time and in the coming few years. The query language is one difficult part; the other difficult part is validation. Both can be solved using standard industry tools. I come back to this at the end of this message. And I am not talking about MLHIM. :wink: The OpenEHR ecosystem for XML is ready and full of features.
I don’t say XML is the only way to write a kernel. But it has many advantages, because of the wide industry support and the thousands of man-years of development behind it. Choosing any other solution means having to write a query engine for a query language which is still not declared stable, and having to write a validation tool, which, as far as I know, only exists for DADL. For a software vendor, implementing OpenEHR without using XML is hardly an option.

By chance, tomorrow I go to Intersystems for a technical introduction to Cache and its tooling. I am especially interested in the (proprietary) path-based query formalism they support. I will ask them about XQuery support; I’ve read on their website that it is possible. It would not be surprising if their proprietary path-based query formalism is very much like XPath, because how can a serious database vendor live without XQuery support nowadays? All big database vendors support XML structures, and they also support XQuery. Check Microsoft, check Oracle. XML is here to stay, and has been for 15 years.

XPath 2.0 (which is a subset of XQuery 1.0) is very similar to the path-based AQL; “easily convertible”, as you call it. Maybe not a 100% mapping, that can only be said once there is a stable AQL definition, but the examples from the AQL Wiki can be mapped.

I did something similar once, writing a virtual query engine. It was an engine which could query “mapped-to-objects-third-party-likely-structured-databases” (excusez les mots). It could connect to an old COBOL database, a MUMPS hierarchical database, an API-based database, and some SQL databases. With this product they could all be approached through SQL, via a common simplified virtual data model. That was the goal, and it worked, more or less.

It is about the same as what you propose in this sentence: taking the grammar coming out of a query engine and using it on another database concept with a similar but not identical structure and probably other kinds of optimizations. It is very difficult to do something like that.
It will cost man-months or man-years to get it performing fast and more or less bug-free. The easy part, simple selects, will take a few months, but then comes optimizing with different kinds of indexes, including user-defined indexes, plus multi-user support, unions, sub-selects, aggregations, and authorization. It is not easy at all, and I would definitely not advise a company to go this way.

As far as I can see from the Wiki, AQL is not going in any advanced direction; it looks very much like what one can expect from a generic query language. I don’t know the state of the art of what Seref is talking about when he says that AQL is implemented by two vendors.

Terminology can possibly be done by preprocessing, depending of course on the terminology.

No, I mean that it is impossible to represent RM 1.0.2 in W3C XML Schema. It is unusable. You cannot validate any XML dataset modeled from a CKM archetype against the XSDs on the OpenEHR website, regardless of the XML Schema version. It is simply impossible, illegal. OpenEHR is breaking several XML Schema rules. XML Schema in any version is not ready for multi-level modeling. With some tricks it can be done, and I do that now, but it is not very elegant.

But I have not found any reason why it cannot be defined in RelaxNG, which is a widely used Oasis standard. I must admit I have not completely finished researching this, but I am more than 80% done. It relaxes on the points where XML Schema has its blocking restrictions. It looks promising; I will let you know, I think at the end of next month, when I start working on this again. My goal, however, is not only to represent OpenEHR in a schema language, but everything that can be defined in ADL 1.4, so including OpenEHR. And the translation from ADL to schema needs to be done automatically.

Oasis, as you will know, is an industry standardization organization; it is a Domain Member of OMG, and it is also sponsored by OMG (and the members of OMG).
There are several RelaxNG schema definitions which made it to ISO standard, and it has been stable for many years now.

I am sorry to say it, but writing a W3C XML Schema that represents an archetype and conforms to the base schemas published on the OpenEHR website is not possible, not even for one single archetype from CKM. So the XSDs are useless, meaning there is no way they are useful.

The conversion to the exchange format cannot be validated against a constrained schema representing the archetype in which the data are defined. I am pretty sure of this. You know what you have as source data, maybe objects in Cache, or DADL, or path/value combinations. But you don’t know whether the target/exchange XML data are still valid. You can guess they are, but you cannot prove they will always be valid. I think validation after data transformation is very important. A guess should not be good enough.

Bert

well it has been complete enough to be implemented and used in production systems for some years now. You are right, there are some unfinished bits, but they are not key elements - they don’t prevent large scale systems using AQL.

well, again, the specification actually has been stable for a long time. It has not been made official like the other specifications (that should happen this year), and it probably should have been earlier, but I guess this way we have a lot of industry knowledge about it now, so we know it works.

it hasn’t been a problem for the implementing companies.

I don’t understand why you would say that; there are many systems already running in production in clinical environments.

An AQL implementation is actually a lot easier than you think, assuming that the main data are stored in blobs.

that’s not at all the case. It’s perfectly normal to implement the whole system in Java, C#, Python, Ruby, whatever, and use numerous kinds of native storage, object storage, it could be MUMPS, relational+blob storage, XML as well. But there is nothing that I can think of in XML technology that makes it more attractive than anything else as a basis for implementing a core system (it’s more or less unavoidable for interfacing). XML is one option, there are many others, and they work well.

same as AQL in fact. There is no getting away from paths :wink:

I’m not really disagreeing here, it’s just that no one has done a recent analysis on whether XQuery or a subset will do what AQL does.

you are over-estimating the difficulty. The only thing that would make it hard is using an orthodox 3rd NF relational table design, but that should be avoided anyway, for multiple reasons. Connecting AQL to a well-designed back-end is really not hard. Optimising takes some work, but that’s to be expected.

Apparently some successful companies did not take your advice :wink:

which rules is it breaking? As far as I know, openEHR XML documents validate normally against the schemas.

well, we already had that debate. It’s not what we use it for - we don’t do any ‘modelling’ in XSD, it’s just an interoperability schema.

I am inclined to agree with this, from my limited research into Relax NG, it seems significantly better designed than XSD.

I think you should target ADL 1.5, because ADL 1.4 has one or two errors in it, plus quite a few limitations.

but they are widely used. So there is something not right in what you are saying. Are you referring to the XSDs for archetypes, or for the RM?

ok - so here, you mean - how can it be proved that the XSD-based XML representation of the data is in fact a faithful representation of the original object data? As far as I know, the conversion of object to XML based on the XSDs is not 100% lossless in all cases. That’s the price of using XSD, but unfortunately huge parts of industry want it. If you are saying that we should publish Relax NG schemas and advertise those as being ‘safe to use’, I say: let’s do it. I’m all for that.

- thomas
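The "relational + blob" arrangement described above can be sketched with Python's built-in `sqlite3` and `json` modules. This is not how any particular vendor does it: the table layout, the JSON blob format, the simplified slash-separated paths and all function names are assumptions made for the example. The idea shown is only that a small relational index narrows down candidate blobs, and the path is then resolved inside each deserialised blob.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE composition (id INTEGER PRIMARY KEY, ehr_id TEXT, blob TEXT)")
conn.execute("CREATE TABLE archetype_index (composition_id INTEGER, archetype_id TEXT)")


def store(ehr_id, archetype_id, data):
    """Serialise the whole composition as one blob and index its archetype id."""
    cur = conn.execute(
        "INSERT INTO composition (ehr_id, blob) VALUES (?, ?)",
        (ehr_id, json.dumps(data)),
    )
    conn.execute("INSERT INTO archetype_index VALUES (?, ?)", (cur.lastrowid, archetype_id))


def resolve(data, path):
    """Walk a '/'-separated path through nested dicts; return None if absent."""
    node = data
    for step in path.strip("/").split("/"):
        if not isinstance(node, dict) or step not in node:
            return None
        node = node[step]
    return node


def query(archetype_id, path, predicate):
    """Index lookup first, then path resolution and filtering inside each blob."""
    rows = conn.execute(
        "SELECT c.ehr_id, c.blob FROM composition c "
        "JOIN archetype_index i ON i.composition_id = c.id "
        "WHERE i.archetype_id = ? ORDER BY c.id",
        (archetype_id,),
    )
    result = []
    for ehr_id, blob in rows:
        value = resolve(json.loads(blob), path)
        if value is not None and predicate(value):
            result.append((ehr_id, value))
    return result


BP = "openEHR-EHR-OBSERVATION.blood_pressure.v1"
store("ehr1", BP, {"data": {"systolic": {"magnitude": 142}}})
store("ehr2", BP, {"data": {"systolic": {"magnitude": 118}}})
store("ehr3", "openEHR-EHR-OBSERVATION.body_weight.v1", {"data": {"weight": {"magnitude": 80}}})

high = query(BP, "/data/systolic/magnitude", lambda v: v >= 140)
print(high)
```

The optimisation work mentioned in the thread would go into making the index smarter (for example indexing selected path values), so that fewer blobs need to be deserialised per query.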

which rules is it breaking? As far as I know, openEHR XML documents validate normally against the schemas.

yes, I said it wrong, later in the message I said it better and I forgot to remove this statement.

So let me correct myself:

You cannot represent all archetype constraints in XML Schema. You can of course validate against the master schema, but that is not very interesting. To validate, you need to validate against the constraints. That is the important point of multi-level modeling.

I discovered some important problems besides the restriction/extension structure, which is quite disturbing: you are not allowed to restrict and extend a derived element at the same time. Just for clarity, a restriction in deriving in XML Schema is not the same as constraining in ADL.
Read the Priscilla Walmsley book on this, she explains it very well.

There are ways around this, but it is not very elegant.

Another very important restriction for using XML Schema, in my opinion, is that you cannot have two or more elements with the same name but a different data type. This data type must be in detail the same. XML Schema regards an Element with a Dv_Text as a different datatype from an Element with a Dv_CodedText.

Both elements will be called "items" in an XML schema representing an OpenEhr data structure, and thus they are not allowed to differ in the details of their data types. This brought Tim Cook to use GUIDs in the element names, which is unworkable in my opinion, and above all probably unnecessary, because in RelaxNG this restriction does not exist.
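The restriction described here appears to correspond to what the XML Schema specification calls the "Element Declarations Consistent" constraint: within one content model, element declarations with the same name must have the same type. The sketch below is not a schema validator; it is a small illustrative check over a hypothetical content model, written as (element name, type name) pairs with invented type names, that shows why two differently constrained "items" declarations collide.

```python
from collections import defaultdict


def conflicting_declarations(content_model):
    """Return element names that are declared with more than one type."""
    types_by_name = defaultdict(set)
    for name, type_name in content_model:
        types_by_name[name].add(type_name)
    return {
        name: sorted(types)
        for name, types in types_by_name.items()
        if len(types) > 1
    }


# Hypothetical archetype-derived content model for a CLUSTER: one "items" child
# constrained to hold a DV_TEXT, another constrained to hold a DV_CODED_TEXT.
cluster_model = [
    ("name", "DV_TEXT"),
    ("items", "ELEMENT_WITH_DV_TEXT"),
    ("items", "ELEMENT_WITH_DV_CODED_TEXT"),
]

conflicts = conflicting_declarations(cluster_model)
print(conflicts)
```

Renaming the elements (for instance by adding identifiers to the names) makes the conflict disappear, which is the workaround criticised in the message above.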

Other tricks are also possible, for example augmenting element-names during validation-time, but also that is cumbersome code, and that just for avoiding the problems of an ivory tower stupid W3C standard?

So this is indeed an important restriction, which makes the clean use of XML Schema impossible in OpenEhr-rm, or any other ADL based multi level modeling system. Dirty use, tricks, ignoring validation errors, etc of course remain possible.

There are more restrictions, but less important ones. For example, it is not possible to support the Dv_Time constraint pattern hh:??:??, and the same goes for Dv_DateTime. Dv_Date also has a problem, but it can be worked around with the "alternative" rule, though in a different way than it is meant to be used.
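For readers unfamiliar with these patterns, the sketch below shows what a time constraint such as hh:??:?? means, independent of any schema language. It assumes the usual ADL reading in which `mm`/`ss` mark a mandatory part, `??` an optional part and `XX` a part that must be absent; the function name and the regular-expression approach are illustrative choices, and only the simple hh:…:… form is handled.

```python
import re


def time_pattern_to_regex(pattern):
    """Compile a simplified ADL-style time pattern (e.g. 'hh:??:??') to a regex."""
    hours, minutes, seconds = pattern.split(":")
    if hours != "hh":
        raise ValueError("pattern must start with 'hh'")

    def kind(marker, mandatory_marker):
        if marker == mandatory_marker:
            return "required"
        if marker == "??":
            return "optional"
        if marker == "XX":
            return "forbidden"
        raise ValueError("unknown marker: " + marker)

    minute_kind = kind(minutes, "mm")
    second_kind = kind(seconds, "ss")

    two_digits = r":[0-5]\d"
    second_part = {
        "required": two_digits,
        "optional": "(?:" + two_digits + ")?",
        "forbidden": "",
    }[second_kind]

    # Seconds can only appear if minutes are present, so they nest inside.
    minute_part = two_digits + second_part
    if minute_kind == "optional":
        minute_part = "(?:" + minute_part + ")?"
    elif minute_kind == "forbidden":
        minute_part = ""

    return re.compile(r"(?:[01]\d|2[0-3])" + minute_part)


partial_time = time_pattern_to_regex("hh:??:??")
for candidate in ["14", "14:30", "14:30:05", "14::05", "25:00"]:
    print(candidate, bool(partial_time.fullmatch(candidate)))
```

With hh:??:?? the values 14, 14:30 and 14:30:05 are all acceptable, whereas a type that always demands hours, minutes and seconds would reject the first two.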

Anyway, after a few weeks I will probably define the OpenEhr RM and all possible constraints in RelaxNG.

Bert

well, we already had that debate. It's not what we use it for - we don't do any 'modelling' in XSD, it's just an interoperability schema.

Sorry that I explained it again; maybe someone reading had missed it.

:slight_smile:

I think we agree largely on this.

Bert

Sent from my iPad

which rules is it breaking? As far as I know, openEHR XML documents validate normally against the schemas.

yes, I said it wrong, later in the message I said it better and I forgot to remove this statement.

So let me correct myself:

You cannot represent all archetype constraints in XML Schema. You can of course validate against the master schema, but that is not very interesting. To validate, you need to validate against the constraints. That is the important point of multi-level modeling.

that's true if you try to use XSD in its native form. I have been saying the same thing for years. But you can represent archetypes in XML in another way - as a straight object serialisation of an AOM structure. Have a look at the XML output of the current ADL workbench. I didn't create an XSD for that, but it would certainly be possible.

The XML format used by the Archetype Editor is of the latter form.

I discovered some important problems besides the restriction/extension structure, which is quite disturbing: you are not allowed to restrict and extend a derived element at the same time. Just for clarity, a restriction in deriving in XML Schema is not the same as constraining in ADL.
Read the Priscilla Walmsley book on this, she explains it very well.

yes, and she is correct, it's a mess. See my comments to Tim earlier :wink: But there is no danger of openEHR doing this, I think, since we know it won't work effectively. That's why all the

There are ways around this, but it is not very elegant.

Another very important restriction for using XML Schema, in my opinion, is that you cannot have two or more elements with the same name but a different data type. This data type must be in detail the same. XML Schema regards an Element with a Dv_Text as a different datatype from an Element with a Dv_CodedText.

Both elements will be called "items" in an XML schema representing an OpenEhr data structure, and thus they are not allowed to differ in the details of their data types. This brought Tim Cook to use GUIDs in the element names, which is unworkable in my opinion, and above all probably unnecessary, because in RelaxNG this restriction does not exist.

Other tricks are also possible, for example augmenting element-names during validation-time, but also that is cumbersome code, and that just for avoiding the problems of an ivory tower stupid W3C standard?

So this is indeed an important restriction, which makes the clean use of XML Schema impossible in OpenEhr-rm, or any other ADL based multi level modeling system. Dirty use, tricks, ignoring validation errors, etc of course remain possible.

There are more restrictions, but less important ones. For example, it is not possible to support the Dv_Time constraint pattern hh:??:??, and the same goes for Dv_DateTime. Dv_Date also has a problem, but it can be worked around with the "alternative" rule, though in a different way than it is meant to be used.

Anyway, after a few weeks I will probably define the OpenEhr RM and all possible constraints in RelaxNG.

I agree with most of this, but I don't understand the issue - we don't do any of the above anyway. That's why we have ADL, AOM, and object transforms of the AOM... am I missing something?

- thomas

have ADL, AOM, and object transforms

What is missing is that XML offers validation and query out of the box, which means it has been developed and optimized for years by many companies and communities, and is mostly good-quality software.