The Truth About XML was: openEHR Subversion => Github move progress

Hi Tom,

I have amended the Subject Line since the thread has diverged a bit.

[comments inline]

one of the problems with LinkEHR (which does have many good features) is
that it is driven off XSD. In principle, XSD is already a deformation of any
but the most trivial object model, due to its non-OO semantics. As time goes
on, it is clear that the XSD expression of data models like openEHR, 13606
etc will be more and more heavily optimised for XML data. This guarantees
such XSDs will be a further deformation of the original object model - the
view that programmers use.

I agree with you that you cannot represent an object model, fully, in
XML Schema language.
However, you seem to promote the idea that object oriented modelling
is the only information modelling approach[1].
This is a critical failure. There are many ways to engineer software
using many different modelling approaches.
So abstract information modelling, as you have noted, does not
necessarily fit all possible software modelling approaches and it is
unrealistic to think that it does. In designing the openEHR model you
chose to use object oriented modelling. The openEHR reference
implementation uses a rather obscure, though quite pure,
implementation language, Eiffel. I think that history has shown that
this has caused some issues in development in other object oriented
languages.

So now if you build archetypes based on the XSD,
you are not defining models of object data that software can use (apart from
the low layer that deals with XML data conversion). I am unclear how any
tool based on XSD can be used for modelling object data (and that's nearly
all domain data in the world today, due to the use of object-oriented
programming languages).

I think that if you look, you will find that "nearly all of the domain
data in the world" exists in SQL models, not object oriented models.
So this is a rather biased statement designed to fit your message,
not a representation of reality.

That said, the abstract concept of multi-level modelling, where there
is a separation of a generic reference model from the domain concept
models, is crucial. Another crucial factor is implementability, as
promoted by the openEHR Foundation mantra: "implementation,
implementation, implementation".

The last and possibly most crucial issue relates to implementability,
which is the availability of a talent pool and tooling. In order to
attract more than a handful of users to a technology there needs to
exist some level of talent as well as robust and commonly available
tools.

The two previous paragraphs are the reasons that the Multi-Level
Healthcare Information Modelling (MLHIM) project exists.

MLHIM is modelled from the ground up around the W3C XML Schema Language 1.1 [2].
The reason for this is that the family of XML technologies are the
most ubiquitous tools throughout the global information processing
domain today. There is a significant number of open source and
proprietary tools from parser/validators to various levels of editors,
readily available. While serious XML development is not taught in all
university computer science programs, every student does get
introduced to XML in some manner.

The relationship of XML with emerging knowledge modelling tools like
Protégé in languages such as OWL[3] and vocabularies expressed in
RDF/XML[4] is an obvious advantage. There is an enormous skills pool
available for using XML data with REST APIs and in translating XML to
JSON for over-the-wire communications. There are thousands of websites
with information on how to do these things. It is irrelevant which
programming language you choose to use - Java, Eiffel, Ruby, Lua,
Python, etc. - there are XML binding tools and access to XML validators.
There are tried and true methods of storing XML data in SQL databases,
XML databases and NoSQL databases. XQuery and XPath are very robust
and well known. Another big advantage is having the ability to do data
validation using commonly available tools in a complete path: from the
instance data to the concept model to the reference model to the W3C XML
Schema specification to the W3C XML specification.
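As one small illustration of the XML-to-JSON translation mentioned above, here is a minimal sketch using only the Python standard library. The element names are invented for illustration and are not taken from any MLHIM or openEHR model:

```python
import json
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Recursively convert an XML element tree into plain dicts/strings.

    Note: this sketch assumes child element names are unique; repeated
    siblings would collapse into one key.
    """
    children = list(elem)
    if not children:
        return elem.text
    return {child.tag: element_to_dict(child) for child in children}

# A hypothetical fragment of instance data (not a real MLHIM/openEHR payload)
xml_doc = "<bloodPressure><systolic>120</systolic><diastolic>80</diastolic></bloodPressure>"
root = ET.fromstring(xml_doc)
print(json.dumps({root.tag: element_to_dict(root)}))
# {"bloodPressure": {"systolic": "120", "diastolic": "80"}}
```

In practice one would use a proper binding tool as described above; the point here is only that the XML-to-JSON path needs nothing beyond commonly available libraries.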

With XML Schema Language 1.1, we have the ability to build complex
structures using substitution groups and do very intricate data
analysis and validation, across models, using XPath in assert
statements. All without having to resort to RelaxNG or Schematron.
There are also tools and experience in using XML Schemas to
automatically generate generic XForms for presentation and data entry.
So maybe we had to make concessions in deciding to use XML technology
in MLHIM. However, I cannot think of anything that is missing at this
point.
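For readers unfamiliar with XSD 1.1 assertions, here is a sketch of the idea: a cross-field rule expressed in XPath directly inside a complex type. The type and element names are hypothetical, not taken from the MLHIM reference model:

```xml
<!-- Hypothetical illustration only; not actual MLHIM schema content -->
<xs:complexType name="BloodPressureType">
  <xs:sequence>
    <xs:element name="systolic" type="xs:decimal"/>
    <xs:element name="diastolic" type="xs:decimal"/>
  </xs:sequence>
  <!-- XSD 1.1 assertion: a cross-field constraint in XPath,
       checked by the validator with no Schematron needed -->
  <xs:assert test="systolic gt diastolic"/>
</xs:complexType>
```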

In developing the MLHIM project we took the best ideas from HL7v3 and
openEHR. Then we sliced away the things that are not required to
maintain a robust healthcare concept modelling infrastructure. You can
get a copy of the XMind based template model here:
https://github.com/mlhim/tools.git There are detailed examples of
Concept Constraint Definitions (CCDs), instance data as well as
documentation for them (and the reference model) here:
https://github.com/mlhim/tech-docs.git There are many more places to
get information about and tools used for MLHIM and the various
sub-projects. A primary starting location is the MLHIM Community on
Google Plus http://gplus.to/MLHIMComm

Version 2.4.2 of the reference model is scheduled for release on
30 April 2013. It has been informed by PhD and MSc projects as well
as other implementation projects. We are currently developing a
web-based tool to make CCD development even easier for non-technical
domain modelling experts. Access is currently restricted but if anyone
would like to participate in evaluation of the tool by building CCDs then
please send me an email privately and I can arrange to get access for
you.

The MLHIM eco-system supports development of CCDs without the need for
a consensus. The knowledge modeller has full autonomy to build models
appropriate for their application(s) ranging from small purpose
specific device apps to large institutional EHRs. There is no need to
be constrained by what has been developed previously. Of course, a
good system supports re-usability of high quality components. But
fitness for purpose is left to the model user. More than 1500
published CCDs are currently available in the Healthcare Knowledge
Component Repository at http://www.hckr.net

I want to close by saying that I am grateful for the work done in the
openEHR community. In my more than ten years of involvement with five
years on the ARB, I learned a lot! I learned how to do things right
as well as what can go wrong. MLHIM represents those lessons learned.

Regards,
Tim

[1] - http://en.wikipedia.org/wiki/Software_modelling

[2] - http://www.w3.org/TR/xmlschema11-1/

[3] - http://en.wikipedia.org/wiki/Web_Ontology_Language

[4] - http://www.w3.org/TR/rdf-syntax-grammar/

Hi Tim,

I don't see any problem here. The extant open 'reference implementation' of openEHR has been in Java for years now, and secondarily in Ruby (openEHR.jp) and C# (codeplex.com). The original Eiffel prototype was from nearly 10 years ago and was simply how I prototyped things from the GEHR project, while other OO languages matured. I am not sure that we have suffered any critical failure - can you point it out?

ok, so I'll clarify what I meant a bit: most domain (i.e. industry vertical) applications are being written in object languages these days - Java, Python, C#, C++, Ruby, etc. The software developer's view of the data is normally via the 'class' construct of those languages. You are right of course that the vast majority of the data physically resides in some RDBMS or other. However, the table view isn't the primary 'model' of the data for, I would guess, a majority of software systems development these days. There are of course major exceptions - systems written totally or mainly in SQL stored procedures or whatever, but new developments don't tend to go this route. In terms of sheer amount of data, these latter systems are probably still in the majority - since tax databases, military systems, legacy bank systems etc are written this way - but in terms of numbers of software projects, I am pretty sure the balance is heavily in the other direction.

well, since the primary openEHR projects are in Java, Ruby, C#, PHP, etc, I don't see where the disconnect between the projects and the talent pool is. I think if you look at the 'who is using it' pages, and also the openEHR Github projects, you won't find much that doesn't connect to the mainstream.

<NB: in the below I am talking about the industry standard XSD 1.0, not the 9-month old XML Schema 1.1 spec>

well I don't really have anything to add to any of that. For the moment, industry (including openEHR, which publishes XSDs for all its models for years now) is still using XML, although one has to wonder how long that will go on.

But XML schema as an information modelling language has been of no serious use, primarily because its inheritance model is utterly broken. There are two competing notions of specialisation - restriction and extension. Restriction is not a tool you can use in object-land because the semantics are additive down the inheritance hierarchy, but you can of course try and use it for constraint modelling, although it is generally too weak for anything serious, and most projects I have seen going this route eventually give in and build tools to interpolate Schematron statements to do the job properly. Now you have two languages, plus you are mixing object (additive) and constraint (subtractive) modelling. Add to this the fact that the inheritance rules for XML attributes and Elements are different, and you have a modelling disaster area. James Clark, designer of Relax NG, sees inheritance in XML as a design flaw (from http://www.thaiopensource.com/relaxng/design.html#section:15). Difficulties in using type restriction (i.e. subtyping) in XSD seem well-known. Not to mention the inability to deal with generic types of any kind, e.g. Interval<Date>, necessitating the creation of numerous fake types.

And of course, the underlying inefficiency and messiness of the data are serious problems as well. Google and Facebook (and I think Amazon) don't move data around internally in XML for this reason.

None of this is to say that XML or XML-schema can't be 'used' - I don't know of any product or project in openEHR space that doesn't use it somewhere, and of course it's completely ubiquitous in the general IT world. What I am saying here is that the minute you try to express your information model primarily in XSD, you are in a world of pain. My lessons from projects using XSD are:

- XSDs are good for one thing: describing the contents of XML documents. That's it - but what we need are models that can describe data, software, documents, documentation, interfaces, etc.
- get imported data out of XML as soon as possible, and into a tractable computational formalism.
- treat XSDs as interface specifications, to be generated from the underlying primary information models, not as any kind of primary expression in their own right.
- define XSDs with as little inheritance as possible, avoid subtyping, i.e. define types as standalone, regardless of the duplication.
- maximise the space optimisation of the data, no matter what it takes. It usually requires all kinds of tricks, heavy use of XML attributes, structure flattening from the object model and so on. If you don't do this, any XML data storage will cost twice what it should and web services using XML will be horribly slow.

I know there are all kinds of tricks to mitigate these problems, I've seen a lot of them. The fact that there is a mini-tech sector around XSD problem mitigation / optimisation testifies to the difficulty of this technology.

XML Schema 1.1 introduces useful things that may reduce some of the above problems (good overview here), however as far as I can tell, its inheritance model is not much better than XSD 1.0 (although you can now inherit attributes properly, so that's good).

well I guess the main thing is seamlessness between your information model and your programming model view. I am not saying it's the only way, but the approach in openEHR was oriented towards making sure that expressions of the information model, including all its semantics, are as close as possible to the software developer's programming model. If we had done the primary specifications in XML, there would always be a significant disconnect between the models and the software (actually, the specs would have been nearly impossible to write). Not to mention, life would be hard for working with all the other data formats now in use, including JSON, and various binary formats.

An approach that has emerged in industrial openEHR systems in the last few years is to generate message XSDs from templates - 1 XSD per template - and write a generic XML <=> canonical data conversion gateway. This means we can do all modelling in powerful formalisms like UML 2, EMF/Ecore (for the information models) and all constraint modelling in ADL / AOM 1.5, and treat XML as one possible data transport.

From what I can see, the major direction in information modelling for the future will be Eclipse Modelling Framework, using Ecore-based models. This is where I think the computational expression of openEHR's Reference Model will move to. The OHT Model Driven Health Tools (MDHT) project is already showing the way on this, at the same time adopting ADL 1.5 concepts for constraint modelling.

I have no experience with XSD 1.1, and I think it will be years before mainstream industry catches up with it. But it may be that it does what is needed.

we'll obviously differ on our analysis of what is the best modelling formalism. The above are the conclusions I have come to over the years. Others may have other, better ideas, and it may be that an XSD 1.1 modelling effort in openEHR could make sense.

I think the key thing would have been to ensure that the archetypes could be shared across openEHR and MLHIM. Archetypes are pretty widely used these days, and there are many projects now creating them. I don't know if this is still possible; if not, it presents clinicians with the dilemma: model in ADL/AOM, or model in MLHIM? Replicated models aren't fun to maintain...

- thomas


I should make it clear that the above is not exhaustive or definitive - there is the openEHRgen framework using Groovy, at least one openEHR PHP product and much more diversity out there. - thomas

Hi Tom,

However, you seem to promote the idea that object oriented modelling
is the only information modelling approach[1].
This is a critical failure. There are many ways to engineer software
using many different modelling approaches.

I don't see any problem here. The extant open 'reference implementation' of
openEHR has been in Java for years now, and secondarily in Ruby (openEHR.jp)
and C# (codeplex.com). The original Eiffel prototype was from nearly 10
years ago and was simply how I prototyped things from the GEHR project,
while other OO languages matured.

I am not sure that we have suffered any critical failure - can you point it
out?

If you re-read the paragraph you will note that I said that the assumption
that OO modelling is mandatory is the critical failure; not any
particular type of language.

well, since the primary openEHR projects are in Java, Ruby, C#, PHP, etc, I
don't see where the disconnect between the projects and the talent pool is.
I think if you look at the 'who is using it' pages, and also the openEHR
Github projects, you won't find much that doesn't connect to the mainstream.

The discussion about the talent pool is about the data representation and
constraint languages: XML and ADL. The development languages are common
across the application domain.
I know that you believe that ADL is superior because it was designed
specifically to support
the openEHR information model. It is an impressive piece of work, but
this is where its value falls off.
XML has widespread industry acceptance and a plethora of development and
validation tools against a global standard.

<NB: in the below I am talking about the industry standard XSD 1.0, not the
9-month old XML Schema 1.1 spec>

The industry standard XML Schema Language is 1.1. The first draft was
published in April 2004, making it nine years old.

well I don't really have anything to add to any of that. For the moment,
industry (including openEHR, which publishes XSDs for all its models for
years now) is still using XML, although one has to wonder how long that will
go on.

A curious prognostication indeed.

But XML schema as an information modelling language has been of no serious
use, primarily because its inheritance model is utterly broken. There are
two competing notions of specialisation - restriction and extension.

Interesting. I believe that the broader industry sees them as
complementary, not competing.

Restriction is not a tool you can use in object-land because the semantics
are additive down the inheritance hierarchy, but you can of course try and
use it for constraint modelling.

Restriction, as its name implies, is exactly intended and very useful
for constraint modelling.
Constraint modelling by restriction is, as you know, the cornerstone
of multi-level modelling,
not OO modelling. Which is, of course, why openEHR has a reference
model and a constraint model.
They are used for the two complementary aspects of multi-level modelling.
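A minimal sketch of restriction-as-constraint in schema terms (the type names here are invented for illustration and are not actual MLHIM or openEHR content): a generic reference-model type is narrowed by derivation for a specific concept:

```xml
<!-- Hypothetical example only -->
<!-- Generic reference-model type -->
<xs:complexType name="QuantityType">
  <xs:sequence>
    <xs:element name="magnitude" type="xs:decimal"/>
    <xs:element name="units" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

<!-- Concept-level constraint: derive by restriction, repeating the
     content model and pinning the units down to a fixed value -->
<xs:complexType name="BodyTemperatureType">
  <xs:complexContent>
    <xs:restriction base="QuantityType">
      <xs:sequence>
        <xs:element name="magnitude" type="xs:decimal"/>
        <xs:element name="units" type="xs:string" fixed="Cel"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```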

Although it is generally too weak for
anything serious, and most projects I have seen going this route eventually
give in and build tools to interpolate Schematron statements to do the job
properly. Now you have two languages, plus you are mixing object (additive)
and constraint (subtractive) modelling.

Those examples you are referring to are not using XML Schema 1.1.
Or at least not in its specified capacity. There is no longer a need
for RelaxNG or Schematron to be mixed-in.
Your information on XML technologies seems to be quite a bit out of date.

Add to this the fact that the inheritance rules for XML attributes and
Elements are different, and you have a modelling disaster area.

I will confess that XML attributes are, IMHO, overused and inject
semantics into a model
that shouldn't be there. For example HL7v3 and FHIR use them extensively.

James Clark, designer of Relax NG, sees inheritance in XML as a design flaw
(from http://www.thaiopensource.com/relaxng/design.html#section:15 ):

Of course! But then you are referencing an undated document by the
author of a competing/complementary tool, one that announces RelaxNG
as new and whose most recent reference is from 2001.
So, my guess is that it is at least a decade old. Hardly a valid opinion today.

Difficulties in using type restriction (i.e. subtyping) in XSD seem
well-known - here. Not to mention the inability to deal with generic types
of any kind, e.g. Interval<Date>, necessitating the creation of numerous
fake types.

Hmmmmm, what is wrong with xs:duration?
I don't think I understand what you mean by "fake types".

And of course, the underlying inefficiency and messiness of the data are
serious problems as well. Google and Facebook (and I think Amazon) don't
move data around internally in XML for this reason.

That is kind of vague. Can you expand on this?
The fact that any other domain does or doesn't use XML is really pretty
irrelevant to multi-level modelling in healthcare. I am comfortable in assuming
that none of them use ADL for anything. So the comparison is quite the
red herring.
I think that limiting the conversation to multi-level modelling in
healthcare is
an appropriate approach. Otherwise, it is kind of pointless.

None of this is to say that XML or XML-schema can't be 'used' - I don't know
of any product or project in openEHR space that doesn't use it somewhere,
and of course it's completely ubiquitous in the general IT world. What I am
saying here is that the minute you try to express your information model
primarily in XSD, you are in a world of pain.

I will admit that expressing the MLHIM information model in XML Schema
1.1 terms
and then developing the actual implementation was a challenge at first.
But if you take a look today you will see that it is quite easy to understand,
standards compliant and fully functional.
The original challenge was to overcome my prejudice against XML.

My lessons from projects using XSD are:

XSDs are good for one thing: describing the contents of XML documents.
That's it.

Seems to be a pretty useful goal if you have XML documents that
contain your data.

but what we need are models that can describe data, software, documents,
documentation, interfaces, etc

But these are all VERY different artifacts and require different
models, tools and languages.

get imported data out of XML as soon as possible, and into a tractable
computational formalism

Very much like "get data out of ADL as soon as possible" - once you build
some tools to do that.

treat XSDs as interface specifications, to be generated from the underlying
primary information models, not as any kind of primary expression in their
own right
Define XSDs with as little inheritance as possible, avoid subtyping, i.e.
define types as standalone, regardless of the duplication.

I am not sure you understand XML Schema 1.1. Again you seem to be
approaching multi-level
modelling in healthcare as if OO modelling were the only choice. This
is the "I have a hammer,
everything is a nail" approach. It isn't very effective in the real
world, where various tools
are needed to solve various problems.

Maximise the space optimisation of the data, no matter what it takes. It
usually requires all kinds of tricks, heavy use of XML attributes, structure
flattening from the object model and so on. If you don't do this, any XML
data storage will cost twice what it should and web services using XML will
be horribly slow.

So, in your opinion, should you build your APIs in ADL? Of course not.
I fail to see your arguments against using XML for what it is designed for;
data representation and constraint modelling. Of course then you have all of
the related tools such as XSLT, SOAP, WSDL, XPath, XQuery, etc.
for other tasks.
A fairly complete suite. There isn't a real, practical reason to
re-invent them. They make the interactions
smoother and more easily understood by the IT industry as compared to using
a domain specific language.

I know there are all kinds of tricks to mitigate these problems, I've seen a
lot of them. The fact that there is a mini-tech sector around XSD problem
mitigation / optimisation testifies to the difficulty of this technology.

I do not consider anything inside the specification to be a "trick".
It seems pretty straightforward to me. Use cases were presented,
solutions were specified and documented, and industry adopted them. What is
tricky about that?

XML Schema 1.1 introduces useful things that may reduce some of the above
problems (good overview here), however as far as I can tell, its inheritance
model is not much better than XSD 1.0 (although you can now inherit
attributes properly, so that's good).

It is not an OO language. If you are judging it based on those
characteristics, please
see my critical failure comment above. OO is not the be-all, end-all
solution in
computer science. Much less in multi-level modelling.

well I guess the main thing is seamlessness between your information model
and your programming model view. I am not saying it's the only way, but the
approach in openEHR was oriented towards making sure that expressions of the
information model, including all its semantics, are as close as possible to
the software developer's programming model. If we had done the primary
specifications in XML, there would always be a significant disconnect
between the models and the software (actually, the specs would have been
nearly impossible to write). Not to mention, life would be hard for working
with all the other data formats now in use, including JSON, and various
binary formats.

At the point in time when you developed ADL, you had no choice. In the
late 1990s, XML Schema 1.0 was broken.
It was only slightly better than using DTDs. But the IT industry
advances very rapidly.
Keeping in touch with technology changes is crucial.

An approach that has emerged in industrial openEHR systems in the last few
years is to generate message XSDs from templates - 1 XSD per template, and
write a generic XML <=> canonical data conversion gateway. This means we can
do all modelling in powerful formalisms like UML
2, EMF/Ecore (for the
information models) and all constraint modelling in ADL / AOM 1.5, and treat
XML as one possible data transport.

EMF/Ecore will be nice when they finally can generate standards
compliant XSDs without
Ecore cruft in them. At this point, once you use Ecore, you are
infected and can't leave.
I really wanted to use EMF for MDD. Maybe, someday.

From what I can see, the major direction in information modelling for the
future will be Eclipse Modelling Framework, using Ecore-based models. This
is where I think the computational expression of openEHR's Reference Model
will move to. The OHT Model Driven Health Tools (MDHT) project is already
showing the way on this, at the same time adopting ADL 1.5 concepts for
constraint modelling.

(see above comment)

I have no experience with XSD 1.1, and I think it will be years before
mainstream industry catches up with it. But it may be that it does what is
needed.

Well, I can't predict how long it will take to be used on a broader basis.
Probably, like most things, as people need the capability. Sometimes,
people resist change.
It takes them outside their comfort zone and they don't like it.

I can tell you that XML Schema 1.1 is very functional. It is
supported by open source and proprietary
tools and it is working quite well, without tricks, in MLHIM.

we'll obviously differ on our analysis of what is the best modelling
formalism. The above are the conclusions I have come to over the years.

I am not looking for "the best" modelling formalism. I am looking for
what works and is simple to implement, in order to move forward the
main and necessary concept of multi-level modelling so that we can
solve the semantic interoperability issue between healthcare
applications, from purpose-specific mobile apps to enterprise systems.

Others may have other, better ideas, and it may be that an XSD 1.1 modelling
effort in openEHR could make sense.

I think the key thing would have been to ensure that the archetypes could be
shared across openEHR and MLHIM. Archetypes are pretty widely used these
days, and there are many projects now creating them. I don't know if this is
still possible; if not, it presents clinicians with the dilemma: model in
ADL/AOM, or model in MLHIM? Replicated models aren't fun to maintain...

I am not sure that there is any requirement for mapping.

While there are a number of people producing openEHR archetypes,
AFAICT only a dozen or so are in compliance with the openEHR
specifications, specifically the "Knowledge Artefact Identification"
document. To address a couple of issues I have with the current
openEHR eco-system:

Section 2.2 says:
"It is possible to define an identification scheme in which either or
both ontological and machine identifiers are used. If machine
identification only is used, all human artefact 'identification' is
relegated to meta-data description, such as names, purpose, and so on.
One problem with such schemes is that meta-data characteristics are
informal, and therefore can clash – preventing any formalisation of
the ontological space occupied by the artefacts. Discovery of overlaps
and in fact any comparative feature of artefacts cannot be formalised,
and therefore cannot be made properly computable."

I will argue that UUIDs are very definitely "computable", without
ambiguity. Metadata characteristics are very definitely formalized,
and have been since at least 1995 (DCMI); DCMI has been an ISO
standard since at least 2003. Therefore this paragraph is inaccurate
in its description of the usefulness of machine-processable
identifiers and of using metadata for formal descriptions.
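As a trivial sketch of that argument (the dict layout and the DCMI-style keys are my own illustration, not any MLHIM structure):

```python
# Machine identifiers (UUIDs) compare unambiguously, and formal metadata
# can still be carried alongside them. The "dc:" field names follow
# Dublin Core element names purely for illustration.
import uuid

# Two artefacts that happen to share a human-readable name...
a = {"id": uuid.uuid4(), "dc:title": "blood pressure", "dc:creator": "group A"}
b = {"id": uuid.uuid4(), "dc:title": "blood pressure", "dc:creator": "group B"}

# ...are still distinguishable by a single, formal comparison.
assert a["id"] != b["id"]              # identity is computable, unambiguously
assert a["dc:title"] == b["dc:title"]  # a metadata clash is detectable, too
```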

Section 3.1 says:
"The general approach for identifying source artefacts is with an
ontological identifier, prefixed by a namespace identifier if the
artefact is managed within a Publishing Organisation or in some other
production environment. The lack of a namespace in the identifier
indicates an ad hoc, uncontrolled artefact, but its presence does not
guarantee any particular kind of 'control' - quality can only be
inferred if the PO is accredited with a central governance body as
using a minimum quality process."

As far as I can find, there is no reference as to what this
accreditation process looks like, or who manages it, outside of the
openEHR CKM (which may be deployed locally). However ...

Section 3.2 says:
"Note that the name_space_id is constructed from a publisher
organisation identifier plus at least one level of library/package
identification. The latter condition ensures that a PO that starts
with only one 'library' can always evolve to having more than one.
All archetypes and templates should be identified with this style of
identifier. Any archetype or template missing the name_space_id part
is deemed to be an uncontrolled artefact of unknown quality."

Archetypes on the NEHTA CKM http://dcm.nehta.org.au/ckm/ carry only
the openEHR RM namespace. They are therefore uncontrolled and of
unknown quality, by openEHR's own definition.
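The rule quoted above is mechanically checkable. Here is a loose sketch; the '::' separator and the reverse-domain namespace shape are my assumptions based on the draft identification scheme, not a normative parser:

```python
# Hedged sketch of the rule in Section 3.2: an archetype id missing the
# name_space_id part is deemed "uncontrolled". The id syntax assumed here
# (namespace, then '::', then the ontological id) is illustrative only.
def is_controlled(archetype_id: str) -> bool:
    """True if the id carries a namespace (publisher + library) prefix."""
    if "::" not in archetype_id:
        return False
    namespace, _ = archetype_id.split("::", 1)
    # "publisher organisation identifier plus at least one level of
    # library/package identification" -> expect at least one dot
    return namespace.count(".") >= 1

assert is_controlled("org.openehr::openEHR-EHR-OBSERVATION.bp.v1")
assert not is_controlled("openEHR-EHR-OBSERVATION.bp.v1")
```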

There have been, in the past, archetypes that carried an nhs-dev
designator. I can't find them now, but they were obviously in
development and not deployed. If you browse the internet looking for
openEHR archetypes you can find hundreds, maybe thousands. This shows
that there are people interested in building knowledge models.

They just aren't interested in the top-down, consensus-controlled,
openEHR approach. This creates a chaotic, dangerous environment for
healthcare data. There can easily be multiple archetypes with the same
ID that have different structures and therefore different instance
data. An instance of data cannot determine which of the competing
archetypes it is supposed to be validated against without unambiguous
identification. I believe that Dr. Dipak Kalra used the word
"unacceptable" when Bert Verhees confronted him with this issue on
"Healthcare IT Live!" in December 2012: http://goo.gl/UP2Z1

The openEHR eco-system is well engineered. It just isn't
sociologically acceptable. People want to be free to
design their concept models without top-down consensus. MLHIM allows
that with industry standard, off the shelf
tooling.

XML Schema 1.1, Concept Constraint Definitions and MLHIM: "Try it,
you'll like it".
See http://gplus.to/MLHIM and http://gplus.to/MLHIMComm for more
information. You may also enjoy the website
at www.mlhim.org and the GitHub site at https://github.com/mlhim

Also, be sure to enjoy Healthcare IT Live! on YouTube
https://www.youtube.com/watch?v=HG7rRPT9KY0&list=PL5BDmBjSV7CsBYbzNBw-D03WEqSJcWxbP
where our guest today is Mr. Alex Fair, CEO of MedStartr.com
"Crowd-funding for healthcare."
https://plus.google.com/events/cof6sdrpjll3ca3stp0440k6ihc
DATE: 2013-04-04 1900 BRT, 2200 UTC, 1800 EDT
Local time and date finder: http://goo.gl/orcJU
You must RSVP "Yes" to receive a panel invitation to the hangout.

Regards,
Tim

Hi Tim,

I will leave the XML vs. ADL aspects for others to discuss and address
your comments re namespacing.

I think we all agree that openEHR artefacts need proper identification
control, almost certainly something like namespacing. Some draft
suggestions have been made and are carried on documentation on the
openEHR website.

Where I would take issue is that this implies some sort of
sociological approach to enforce people to use the archetypes in the
international CKM space. As an editor of these archetypes but also a
consumer, I can assure you that I have no hesitation in using, or
recommending the use of local archetypes wherever and whenever
necessary. The only issue is that people must be aware of potential
name conflicts, which I resolve by adding some kind of simple suffix
to the archetypeID e.g. OBSERVATION.blood_pressure_simi.v1. Of course
namespacing would be a better way of doing this, but I see this as a
technical issue, and I do not recognise the kind of top-down mentality
that you are suggesting is a blocker.

Of course we would like to develop international archetypes of
sufficient quality and general applicability that developers can pick
these up without needing to redevelop locally. Apart from enhancing
interoperability, there are clear efficiencies in re-using something
that has already been developed. For every suggestion, like yours,
that we are trying to enforce a top-down process, we get twice as many
implementers asking why there is not an international archetype for
x, y, z. IMO both perspectives are valid.

I think the point re the deficiencies in archetype naming is well-made,
is non-contentious and is being addressed. It is easy to work around,
in my experience, and is certainly not a blocker to their inclusion in
real-world implementations. It is absolutely not a blocker to the
development and use of local/ regional or national archetypes where
the international equivalents are felt to be unsuitable, whether for
technical or sociological reasons.

The last 3 projects I have worked on (all major) have successfully
blended international, national and vendor archetypes without any
technical or sociological impediments. If the international ones fit,
use them; if they don't, don't. Over time, as the international ones
improve, and the need to interoperate more widely also grows, I would
expect to see the balance change, but it is up to the consumers to
decide where the balance of benefit lies.

You are conflating two quite different issues.

Regards,

Ian

I don't think so. I demonstrated what the specifications have to say
on the matter.
Then pointed out that an internationally respected expert on ADL and
archetypes states that this situation is unacceptable.
The two issues are not a conflation. They are specifically inter-twined.

Your work-arounds, expedient as they may be, are dangerous and
counter-productive to interoperability.

Those are the facts, not my assumptions.

Regards,
Tim

Tim Cook wrote:

actually, ADL was specifically designed to not support any information model, and it doesn't. It's just an abstract syntax, free of the vagaries of any other syntax.

sure. In terms of being able to convert archetypes to XML, that has been available for probably a decade, and is in wide use. Some users ignore ADL entirely. I don't think anyone has an issue with this.

well, but it's been stillborn for years, everyone knows that…

if you mean the competing inheritance models - I have yet to meet any XML specialist who thinks they work. The maths are against it.

but your original statement was (I thought) that you are using XML for the information model as well. That's where it breaks, because of the inability to represent basic concepts like inheritance in the way that is normally used in object modelling (and most database schema languages these days).

I'm just reporting what I know to be the case in various current national e-health modelling initiatives, none of which I am directly involved in… all the serious ones use XSD 1.0 + Schematron. I can't say whether it is valid with respect to XSD 1.1, but it remains valid with respect to 1.0. I don't see that XSD 1.1 has a healthier inheritance model, so it seems to me that anyone trying to do information modelling (not constraint modelling) is still going to get into trouble. I can't see anything that contradicts Clark's statements, even if they are not from last week.

But let's assume I don't know what I am talking about. It must (according to you) be easy to express e.g. in XSD 1.1. I would be very interested to see how it deals with the generic types and inheritance, both handled by any normal programming language. You can't define the type Interval in XSD - it doesn't have parameterised types, even though all programming languages and UML have them.

what I was pointing out is that XML as a general technology is far from a 'final solution' in any area of application. In modelling it is well known as problematic, and in data representation as well.

Can you point to some MLHIM models that show specialisation, redefinition, clarity of expression, that sort of thing? I tried to find some but ran into raw XML source.

well that's just the point, they don't - it's possible to define a model so that an XSD form, a programming form, a display screen form and many others are all derivable from that source model. We only want to define the model of 'microbiology result' once, after all. This single-source modelling is a key goal of the approach.

there is no data in ADL, only models. Not sure what you are trying to say here…

well, pretty much the whole world is using programming languages that are essentially object-oriented or object-enabled - even uber languages like Haskell do most OO tricks. You're using Python, that's an OO language.

XML wasn't designed for data representation, it was designed for structured document mark-up. That's why it's so horrible for data representation.

well let me just point to a single feature of object languages (including ADL) - inheritance / specialisation. Are you saying that's of no use? How do you propose to adapt a model that you have to include local needs, without breaking the parent model semantics?

if you can point to some online MLHIM models so we can see the result - the information model, and layers of MLHIM archetypes specialised based on that - it would be very helpful.

Firstly, we are in the process of rewriting this, but also I think you misread what it said (which might not have been very clear) - it's saying that machine ids are computable (obviously they are, at a basic level).

the spec you are quoting from is about the future of identification, not the past (or present).

so does openEHR, that's what namespaces are about. If two groups both define a 'blood pressure' archetype today, there is an immediate problem. In the future with namespaced ids, the problem becomes manageable, since both forms can co-exist.

I followed some of your URLs, but I still can't locate a) the MLHIM reference model or any b) MLHIM archetypes that I can understand / read. I know they are lurking out there somewhere… can you provide some links?

- thomas

[original post by Tim bounced; reposting manually for him]

On Thu, Apr 4, 2013 at 12:50 PM, Thomas Beale [<thomas.beale@oceaninformatics.com>](mailto:thomas.beale@oceaninformatics.com) wrote:

if you mean the competing inheritance models - I have yet to meet any XML
specialist who thinks they work. The maths are against it.

Interesting that you, the creator of a technology that makes many
people very uncomfortable (multi-level modelling), thinks that
conventional users of XML have something to say regarding XML as a
multi-level implementation.  Confusing.

but your original statement was (I thought) that you are using XML for the
information model as well.

Not specifically. We knew that we wanted to exercise all of the
capabilities of XML in actual implementation. So, when building the
information model we remained conscious of that fact. So, we knew
there were limitations. Otherwise, the model would just be openEHR
without the EHR structures. But we wanted to be better prepared for
implementability without having to build all of the tools and
technologies that already exist. We took a great idea and applied
pragmatism to it.

Can you point to some MLHIM models that show specialisation, redefinition,
clarity of expression, that sort of thing? I tried to find some but ran into
raw XML source.

There is no need for specialisation or redefinition in MLHIM. Concept
Constraint Definitions (CCDs) are immutable once published. In
conjunction with their included Reference Model version, they endure
in order to remain the model for that instance data. Unlike you, I
believe that the ability to read and validate XML data will be around
for a looooong time to come. There is simply too much of it for it to
go away anytime soon. When it does go away, there will be ways to
translate it to whatever comes next, just as there are today.

The conceptual model expressed as a mindmap (XMind template):
[https://github.com/mlhim/tools/blob/master/xmind_templates/MLHIM_Model-2.4.2.xmt](https://github.com/mlhim/tools/blob/master/xmind_templates/MLHIM_Model-2.4.2.xmt)

A UML(ish) view:
[https://drive.google.com/a/mlhim.org/?tab=mo#folders/0B9KiX8eH4fiKUFhTb2w2ZGJlWVU](https://drive.google.com/a/mlhim.org/?tab=mo#folders/0B9KiX8eH4fiKUFhTb2w2ZGJlWVU)
I know that there is now a convention to express XML in UML models but
I have not had the time to study it properly.

There are examples; from instance data -> CCDs -> RM along with
documentation and XQL examples:
[https://github.com/mlhim/tech-docs](https://github.com/mlhim/tech-docs)

There are more than 1500 CCDs published on the Healthcare Knowledge
Component Repository site:
[http://www.hkcr.net/](http://www.hkcr.net/)

Historical code and other information is on Launchpad along with
sub-projects at: [http://launchpad.net/mlhim](http://launchpad.net/mlhim)

HTH.  Thanks for asking.

well that's just the point, they don't  - it's possible to define a model so
that an XSD form, a programming form, a display screen form and many others
are all derivable from that source model. We only want to define the model
of 'microbiology result' once, after all. This single-source modelling is a
key goal of the approach.

Right, and being able to be 'transformed' into all of those
expressions is what the XML family of tools is very well known for.
So, I misunderstood your original comment.

there is no data in ADL, only models. Not sure what you are trying to say
here....

Really?  I have seen several examples of dADL with instance information in it.

well, pretty much the whole world is using programming languages that are
essentially object-oriented or object-enabled - even uber languages like
Haskell do most OO tricks. You're using Python, that's an OO language.

That is true.  And each and every one of them has binding libraries
for XML Schema.

It must (according to you) be easy to express e.g. this part of the openEHR RM in XSD 1.1. I would be very interested to see how it deals with the generic types and inheritance, both handled by any normal programming language.

I don't think you will find where I ever used the word easy. But yes,
it is possible.  If you are interested enough to study it, then you
can discover how it can be done.  Prior to removing the unnecessary
things from the RM (for MLHIM 2.x), MLHIM was openEHR 1.0.1 compliant.
I am not sure now if those artifacts exist.  You can check on
Launchpad.

XML wasn't designed for data representation, it was designed for structured
document mark-up. That's why it's so horrible for data representation.

That is technically true; originally.  However, it is not
representative of what XML is today, and it is the reason why XML
Schema was designed and revised. In your opinion it is horrible, but
there is a global industry that doesn't agree with you.

well let me just point to a single feature of object languages (including
ADL) - inheritance / specialisation. Are you saying that's of no use? How do
you propose to adapt a model that you have to include local needs, without
breaking the parent model semantics?

Witness my use of XML Schema 1.1.  Yes, it makes some XML users
uncomfortable too, because it is unconventional.  But it is standards
compliant, valid, and allows for valid CCDs and instance data.  But
then, you don't have to take my word for it.  The examples are there.
Use your favorite off-the-shelf parser/validator that supports 1.1
(Xerces and Saxon come to mind) and see for yourself.

I won't discuss the merits of using or not using XML any further.  You
may or may not agree with me.  But I am very confident in my
multi-level model approach along with the use of RDF metadata to
extend semantics as the modeller sees fit.  MLHIM is a platform to
provide semantic interoperability.  We don't tell you how to structure
your application data (outside of having a small concept instance).
We don't tell you how to do workflow (though there are hooks to enable
it) nor do we tell you how to build your APIs.  The information
industry has published an enormous amount of openly available
documentation on using XML instance data "in healthcare" in these
scenarios.

MLHIM only provides a structure to separate the reference and domain
models.  There is only enough semantics in the RM to support the
separation of categories such as clinical, demographic and admin
entries.  Semantics in healthcare are well defined in ontologies and
controlled vocabularies, and are referenceable via RDF.
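To make that claim concrete, here is a minimal sketch of attaching semantics by reference; the predicate choice (rdfs:isDefinedBy), the URI shapes, and the example SNOMED code are my illustrative assumptions, not taken from any MLHIM artefact:

```python
# Sketch: semantics attached to a model node by RDF reference, not baked
# into the reference model. All URIs and the predicate are hypothetical
# illustrations; a real system would choose its own vocabulary bindings.
def semantic_link(node_uuid: str, snomed_code: str) -> str:
    """Return one RDF triple (Turtle) tying a model node to a concept."""
    return (f"<urn:uuid:{node_uuid}> "
            f"<http://www.w3.org/2000/01/rdf-schema#isDefinedBy> "
            f"<http://snomed.info/id/{snomed_code}> .")

# Example: link a (hypothetical) node to a (hypothetical) terminology code.
print(semantic_link("1db16547-82cd-4c3c-a0f5-9a2f9ffe115b", "271649006"))
```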

if you can point to some online MLHIM models so we can see the result - the
information model, and layers of MLHIM archetypes specialised based on that,
it would be very helpful.

See above.  But you first need to realize that specialization is not a
necessary feature and is in fact quite messy.  At least according to
the domain experts I have had discussions with in attempting to
specialize the 12 archetypes on the CKM.   I think that by now it is
evident that people and groups of people want to define their models
and use them, whether or not their model fits some definition of
maximal coverage. Unless you have an actual maximal-coverage
archetype, there will be missing concepts for some sub-domains.  So,
we removed that approach in our definition of multi-level modelling.

Firstly , we are in the process of rewriting this, but also I think you
misread what it said (which might not have been very clear) - it's saying
that machine ids are computable (obviously they are, at a basic level)

Okay, well that document was written in 2009 and revised in 2010. It
is in the published branch.

Archetypes on the NEHTA CKM [http://dcm.nehta.org.au/ckm/](http://dcm.nehta.org.au/ckm/) carry only
the openEHR RM namespace.
So, are therefore uncontrolled and of unknown quality; by openEHR
definition.

the spec you are quoting from is about the future of identification, not the
past (or present).

It is difficult to tell from the various arguments whether this should
be considered to be part of the spec or not.  On one hand we are told
that the problem with conflicting IDs has been addressed. Then in
another context, we are told that I am quoting from a 'future'
document.  My point is; people are building systems and not adhering
to the full specifications. I just think it is important for everyone
to know that the openEHR eco-system is very well engineered and it
will only work correctly when all of the parts are implemented
correctly.

The openEHR eco-system is well engineered. It just isn't
sociologically acceptable. People want to be free to
design their concept models without top-down consensus. MLHIM allows
that with industry standard, off the shelf tooling.

so does openEHR, that's what namespaces are about. If two groups both define
a 'blood pressure' archetype today, there is an immediate problem. In the
future with namespaced ids, the problem becomes manageable, since both forms
can co-exist.

Thanks for confirming this problem, for today.  I hope that people
realize the potential issues that they are creating by operating
outside of the eco-system.  I also hope that whenever 'the future'
arrives, people will understand the need to use this namespace
capability. Are there estimates yet as to when the future will arrive?

I followed some of your URLs, but I still can't locate a) the MLHIM
reference model or any b) MLHIM archetypes that I can understand / read. I
know they are lurking out there somewhere... can you provide some links?

If the links above do not answer your questions please inquire on the
MLHIM Google Plus page(s) or on the mailing lists from Launchpad.

Have a great day.

--Tim

============================================
Timothy Cook, MSc           +55 21 94711995
MLHIM [http://www.mlhim.org](http://www.mlhim.org)
Like Us on FB: [https://www.facebook.com/mlhim2](https://www.facebook.com/mlhim2)
Circle us on G+: [http://goo.gl/44EV5](http://goo.gl/44EV5)
Google Scholar: [http://goo.gl/MMZ1o](http://goo.gl/MMZ1o)
LinkedIn Profile:[http://www.linkedin.com/in/timothywaynecook](http://www.linkedin.com/in/timothywaynecook)

Thanks Tom. I probably posted with the incorrect email address again.
Arrrgh, organizing the simple things is difficult.

--Tim

not sure what you want to say here! I don't disagree with that obviously. All openEHR systems I am aware of process XML data routinely, including HL7v2 data, and CDAs.

But if you say there is no need for specialisation or redefinition, it means there is no re-use to speak of - every model is its own thing. This is a major departure from the archetype approach, which is founded upon model reuse and adaptation.

Now more or less. New versions of the documents are being published imminently, and the tooling is catching up to namespaces (also other things like annotations).

- thomas

Cook: There is no need for specialisation or redefinition in MLHIM. Concept
Constraint Definitions (CCDS) are immutable once published. In
conjunction with their included Reference Model version they endure in
order to remain as the model for that instance data. Unlike you, I
believe that the ability to read and validate XML data will be around
for a looooong time to come. There is simply too much of it for it to
go away anytime soon. When it does go away, there will ways to
translate it to whatever comes next. Such as there is today.

Beale: I don’t disagree with that obviously. All openEHR systems I am aware of process XML data routinely, including HL7v2 data, and CDAs.

Beale: But if you say there is no need for specialisation or redefinition it means there is no re-use to speak of - every model is its own thing. This is a major departure from the archetype approach, which is founded upon model reuse and adaptation.

And that is the issue, and what is at the root of this dispute. Tim does not see the point of specialization or redefinition, which, in my opinion, is why he can hold forth so strongly for XML.

Randy Neall

Hi Tim

There is no need for specialisation or redefinition in MLHIM. Concept
Constraint Definitions (CCDS) are immutable once published. In
conjunction with their included Reference Model version they endure in
order to remain as the model for that instance data. Unlike you, I
believe that the ability to read and validate XML data will be around
for a looooong time to come. There is simply too much of it for it to
go away anytime soon. When it does go away, there will ways to
translate it to whatever comes next. Such as there is today.

Response from Tom: But if you say there is no need for specialisation or redefinition it means there is no re-use to speak of - every model is its own thing. This is a major departure from the archetype approach, which is founded upon model reuse and adaptation.

What was apparently an argument about XML modeling was actually an argument about something quite different, namely, whether specialization or redefinition have any place in EHR data modeling. I’m surprised you would mount such a challenge (if that’s what you meant to do), familiar as you have been with openEHR since the beginning. This is fundamental and basic. It would be interesting to hear you address this point directly and explicitly, leaving aside the preferred modeling ecosystem for now. Is specialization or redefinition actually important? Or was it important only until MLHIM appeared? I looked briefly at some of the MLHIM XSDs, which appear to model something akin to flat files, but then I probably missed the parent schema into which they all fit.

Randy Neall

[This is Tim again, initially bounced]

And that is the issue, and what is at the root of this dispute. Tim does not
see the point of specialization or redefinition, which, in my opinion, is
why he can hold forth so strongly for XML.

Randy Neall

You are mostly correct.  It isn't that I don't think that re-use is a
good idea.  The knowledge modellers and developers are telling us by
their actions that they do not want to participate in the top-down,
maximal data model approach.  As I have said many times, for many
years: it is a wonderfully engineered eco-system. Now we know it just
doesn't work in real practice on a global basis.

So that had to change. Add in some other simplifications in the RM and
openEHR turns into MLHIM.  My goal is to encourage multi-level
modelling to solve the semantic interoperability issue. Whatever
acronym you want to tie to it.

I know that MLHIM isn't perfect, but it is designed with agility and
data durability in mind.

--Tim

Tim, obviously some of us are interested in this statement. You say ‘it just doesn’t work in real practice’. Our experience is different, and I am interested in your evidence / justification of this statement. - thomas

actually, I will be a bit more specific. Let's say we are talking about archetypes for some of the following topics (the following are some openEHR CLUSTER archetypes): None of these can be defined by 'developers'. They are clinical content, and only clinical professionals can develop proper versions of them.

So what you are saying is that 'knowledge modellers' (presumably physicians) don't want to build such models by participating in a modelling exercise in which they communicate with other physicians working on the same models? It seems to me that the only alternative is that they build their own private models and ignore everyone else. That's expedient, but it's also a guarantee of non-interoperability.

Maybe you can explain your statements in more detail? thanks

- thomas

(attachments)

caheidaa.png

Two remarks, just some thoughts that occur to me in this discussion.

First, data-model re-use, as with inheritance, is a technical approach; it is not proven that it will always be good from a medical-informatics point of view. So it should not be a leading principle in design. It is just handy for developers if it can be used. A user of a system is not aware of these principles, but may have requirements which do not necessarily fit inside a closed technical approach.

Second, designing for real practice on a global basis sounds megalomaniacal. It sounds like carrying the world on one's shoulders, which is quite some weight. I think it is better to accept that there are other good ways too, and one should find solutions to cooperate instead of thinking that one's system is good for every purpose.

I think that this is a weakness in openEHR as well as in MLHIM.

I like the Chinese saying: let a thousand flowers bloom.

Bert

That's expedient, but it's also a guarantee of non-interoperability.

As far as I can see, also from my experience, neither openEHR nor MLHIM will be the only data-model system in the world. Cooperation with other systems will always need a message format. The same goes for other systems. Mapping will always be (at least partly) done manually.

The goal, what the customer wants, is not a solution which dictates that he throw away his system; he wants connectivity in which his system can participate.

This fact makes this discussion purely academic.

Bert

As far as I am concerned, the projects I work on (which are in fact quite a few) all use my openEHR kernel as the base system, with wonderful architectures for interoperability and GUIs built on top.

The main reason for using openEHR is its flexibility: without writing new kernels, it supports many different goals, from homecare to hospitalized heart-failure monitoring. We collect data from HL7v2 and also some non-standard proprietary systems; whatever we find, we write software to get the data. At this moment, data mostly reach their end station in our system, but change is coming. There is probably a desire to export to vMR (an HL7 standard). We are not afraid of that.

Also, in the Netherlands we will in the near future need to support some HL7v3 messages, designed by Nictiz, in a way that allows a lot of medical information to be exchanged and is also acceptable to most (very diverse) system builders.

The fact that openEHR has as a goal that it wants to serve the world is not very useful to us; in fact, it is even a little bit disturbing. We write our own archetypes and we learn a lot from CKM. We don't want a third party to tell us that we should conform to a global standard; we just like freedom. Flexibility is a way to be free.

This is reality in Europe, and I love it. It keeps developers busy, lets customers require their own things, and keeps the market sharp and innovative.

Bert