Hi Tom,
However, you seem to promote the idea that object-oriented modelling
is the only information modelling approach[1].
This is a critical failure. There are many ways to engineer software
using many different modelling approaches.
I don't see any problem here. The extant open 'reference implementation' of
openEHR has been in Java for years now, and secondarily in Ruby (openEHR.jp)
and C# (codeplex.com). The original Eiffel prototype was from nearly 10
years ago and was simply how I prototyped things from the GEHR project,
while other OO languages matured.
I am not sure that we have suffered any critical failure - can you point it
out?
If you re-read the paragraph you will note that I said that the assumption
that OO modelling is mandatory is a critical failure; not any particular
choice of language.
well, since the primary openEHR projects are in Java, Ruby, C#, PHP, etc, I
don't see where the disconnect between the projects and the talent pool is.
I think if you look at the 'who is using it' pages, and also the openEHR
Github projects, you won't find much that doesn't connect to the mainstream.
The discussion about the talent pool is about the data representation and
constraint languages: XML and ADL. The development languages are common
across the application domain.
I know that you believe that ADL is superior because it was designed
specifically to support
the openEHR information model. It is an impressive piece of work, but
this is where its value falls off.
XML has widespread industry acceptance and a plethora of development and
validation tools against a global standard.
<NB: in the below I am talking about the industry standard XSD 1.0, not the
9-month old XML Schema 1.1 spec>
The industry standard XML Schema language is 1.1. The first draft was
published in April 2004, making it nine years old.
well I don't really have anything to add to any of that. For the moment,
industry (including openEHR, which publishes XSDs for all its models for
years now) is still using XML, although one has to wonder how long that will
go on.
A curious prognostication indeed.
But XML schema as an information modelling language has been of no serious
use, primarily because its inheritance model is utterly broken. There are
two competing notions of specialisation - restriction and extension.
Interesting. I believe that the broader industry sees them as
complementary, not competing.
Restriction is not a tool you can use in object-land because the semantics
are additive down the inheritance hierarchy, but you can of course try and
use it for constraint modelling.
Restriction, as its name implies, is exactly intended and very useful
for constraint modelling.
Constraint modelling by restriction is, as you know, the cornerstone
of multi-level modelling.
Not OO modelling. Which is, of course, why openEHR has a reference
model and a constraint model.
They are used for the two complementary aspects of multi-level modelling.
Although it is generally too weak for
anything serious, and most projects I have seen going this route eventually
give in and build tools to interpolate Schematron statements to do the job
properly. Now you have two languages, plus you are mixing object (additive)
and constraint (subtractive) modelling.
Those examples you are referring to are not using XML Schema 1.1.
Or at least not in its specified capacity. There is no longer a need
for RelaxNG or Schematron to be mixed-in.
Your information on XML technologies seems to be quite a bit out of date.
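To make the extension/restriction distinction above concrete, here is a minimal Python sketch (the type and constraint names are invented for illustration; they are not openEHR or MLHIM APIs). Extension adds properties down a hierarchy, the additive OO pattern, while restriction narrows the allowed values of an existing type, which is how constraint modelling uses it:

```python
# "Extension" vs "restriction", sketched in plain Python.
# Names here are hypothetical and purely illustrative.

from dataclasses import dataclass

@dataclass
class Quantity:            # reference-model style type: any units, any magnitude
    magnitude: float
    units: str

# Extension: a subtype that ADDS a property (additive semantics).
@dataclass
class AnnotatedQuantity(Quantity):
    comment: str

# Restriction: a constraint that NARROWS Quantity without subclassing,
# the way an archetype (or an xs:restriction) narrows a base type.
def systolic_bp_constraint(q: Quantity) -> bool:
    return q.units == "mm[Hg]" and 0 <= q.magnitude <= 1000

ok = systolic_bp_constraint(Quantity(120.0, "mm[Hg]"))   # True
bad = systolic_bp_constraint(Quantity(120.0, "kg"))      # False
```

The point of the sketch is only that the two mechanisms answer different questions: extension changes what a type *is*, restriction changes what values a type *admits*.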
Add to this the fact that the inheritance rules for XML attributes and
elements are different, and you have a modelling disaster area.
I will confess that XML attributes are, IMHO, overused and inject
semantics into a model
that shouldn't be there. For example, HL7v3 and FHIR use them extensively.
James Clark, designer of Relax NG, sees inheritance in XML as a design flaw
(from http://www.thaiopensource.com/relaxng/design.html#section:15 ):
Of course! But then you are referencing an undated document by the
author of a competing/complementary tool,
one that announces RelaxNG as new and whose most recent reference is 2001.
So, my guess is that it is at least a decade old. Hardly a valid opinion today.
Difficulties in using type restriction (i.e. subtyping) in XSD seem
well-known - here. Not to mention the inability to deal with generic types
of any kind, e.g. Interval<Date>, necessitating the creation of numerous
fake types.
Hmmmmm, what is wrong with xs:duration?
I don't think I understand what you mean by "fake types".
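The Interval&lt;Date&gt; point is easy to show in a language with generics: one parameterised type covers every ordered point type, whereas XSD 1.0 forces a separately named type per combination. A hedged Python sketch, with illustrative names only:

```python
# A single generic Interval[T] replaces the family of "fake types"
# (IntervalOfDate, IntervalOfQuantity, ...) that XSD 1.0 would require.

from dataclasses import dataclass
from datetime import date
from typing import Generic, TypeVar

T = TypeVar("T")

@dataclass
class Interval(Generic[T]):
    lower: T
    upper: T

    def contains(self, value: T) -> bool:
        # Works for any type supporting ordering: dates, numbers, strings...
        return self.lower <= value <= self.upper

birthdays = Interval(date(2000, 1, 1), date(2000, 12, 31))
```

(xs:duration, for comparison, models a length of time, not a bounded range over an arbitrary ordered type, which is why it does not answer the Interval&lt;Date&gt; objection directly.)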
And of course, the underlying inefficiency and messiness of the data are
serious problems as well. Google and Facebook (and I think Amazon) don't
move data around internally in XML for this reason.
That is kind of vague. Can you expand on this?
The fact that any other domain does or doesn't use XML is really pretty
irrelevant to multi-level modelling in healthcare. I am comfortable in assuming
that none of them use ADL for anything. So the comparison is quite the
red herring.
I think that limiting the conversation to multi-level modelling in
healthcare is
an appropriate approach. Otherwise, it is kind of pointless.
None of this is to say that XML or XML-schema can't be 'used' - I don't know
of any product or project in openEHR space that doesn't use it somewhere,
and of course it's completely ubiquitous in the general IT world. What I am
saying here is that the minute you try to express your information model
primarily in XSD, you are in a world of pain.
I will admit that expressing the MLHIM information model in XML Schema
1.1 terms
and then developing the actual implementation was a challenge at first.
But if you take a look today you will see that it is quite easy to understand,
standards compliant and fully functional.
The original challenge was to overcome my prejudice against XML.
My lessons from projects using XSD are:
XSDs are good for one thing: describing the contents of XML documents.
That's it.
Seems to be a pretty useful goal if you have XML documents that
contain your data.
but what we need are models that can describe data, software, documents,
documentation, interfaces, etc
But these are all VERY different artifacts and require different
models, tools and languages.
get imported data out of XML as soon as possible, and into a tractable
computational formalism
Very much like get data out of ADL as soon as possible. Once you build
some tools to do that.
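As a rough sketch of the "get data out of XML as soon as possible" lesson above: parse once at the boundary into plain objects, then compute on those. Element and field names here are invented for illustration:

```python
# Convert incoming XML into a tractable in-memory form at the system
# boundary; everything downstream works with typed objects, not XML.

import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class BloodPressure:          # hypothetical canonical object
    systolic: float
    diastolic: float
    units: str

def from_xml(doc: str) -> BloodPressure:
    root = ET.fromstring(doc)
    return BloodPressure(
        systolic=float(root.findtext("systolic")),
        diastolic=float(root.findtext("diastolic")),
        units=root.get("units", "mm[Hg]"),
    )

bp = from_xml('<bp units="mm[Hg]"><systolic>120</systolic>'
              '<diastolic>80</diastolic></bp>')
```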
treat XSDs as interface specifications, to be generated from the underlying
primary information models, not as any kind of primary expression in their
own right
Define XSDs with as little inheritance as possible, avoid subtyping, i.e.
define types as standalone, regardless of the duplication.
I am not sure you understand XML Schema 1.1. Again you seem to be
approaching multi-level modelling in healthcare as if OO modelling
were the only choice. This is the "I have a hammer, everything is a
nail" approach. It isn't very effective in the real world, where
various tools are needed to solve various problems.
Maximise the space optimisation of the data, no matter what it takes. It
usually requires all kinds of tricks: heavy use of XML attributes, structure
flattening from the object model and so on. If you don't do this, any XML
data storage will cost twice what it should, and web services using XML will
be horribly slow.
So, in your opinion, should you build your APIs in ADL? Of course not.
I fail to see your arguments against using XML for what it is designed for:
data representation and constraint modelling. Of course then you have all of
the related tools such as XSLT, SOAP, WSDL, XPath, XQuery, etc.
for other tasks.
A fairly complete suite. There isn't a real, practical reason to
re-invent them. They make the interactions
smoother and more easily understood by the IT industry as compared to using
a domain specific language.
I know there are all kinds of tricks to mitigate these problems, I've seen a
lot of them. The fact that there is a mini-tech sector around XSD problem
mitigation / optimisation testifies to the difficulty of this technology.
I do not consider anything inside the specification to be a "trick".
It seems pretty straightforward to me. Use cases were presented,
solutions were specified and documented, and industry adopted them.
What is tricky about that?
XML Schema 1.1 introduces useful things that may reduce some of the above
problems (good overview here), however as far as I can tell, its inheritance
model is not much better than XSD 1.0 (although you can now inherit
attributes properly, so that's good).
It is not an OO language. If you are judging it based on those
characteristics, please
see my critical failure comment above. OO is not the be-all, end-all
solution in
computer science. Much less, in multi-level modelling.
well I guess the main thing is seamlessness between your information model
and your programming model view. I am not saying it's the only way, but the
approach in openEHR was oriented towards making sure that expressions of the
information model, including all its semantics, are as close as possible to
the software developer's programming model. If we had done the primary
specifications in XML, there would always be a significant disconnect
between the models and the software (actually, the specs would have been
nearly impossible to write). Not to mention, life would be hard when working
with all the other data formats now in use, including JSON and various
binary formats.
At the point in time when you developed ADL, you had no choice. In the
late 1990s, XML Schema 1.0 was broken.
It was only slightly better than using DTDs. But the IT industry
advances very rapidly.
Keeping in touch with technology changes is crucial.
An approach that has emerged in industrial openEHR systems in the last few
years is to generate message XSDs from templates - 1 XSD per template, and
write a generic XML <=> canonical data conversion gateway. This means we can
do all modelling in powerful formalisms like UML 2, EMF/Ecore (for the
information models) and all constraint modelling in ADL / AOM 1.5, and treat
XML as one possible data transport.
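The "one XSD per template" generation step above can be sketched roughly as follows; the template structure, function name and field list are invented for illustration, not any real openEHR tool:

```python
# Derive a minimal per-template message schema from a template definition
# held in the primary model, instead of hand-writing XSD.

def template_to_xsd(template_name, fields):
    """fields: list of (element_name, builtin_xsd_type) pairs."""
    lines = [
        '<?xml version="1.0"?>',
        '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">',
        f'  <xs:element name="{template_name}">',
        '    <xs:complexType><xs:sequence>',
    ]
    for name, xtype in fields:
        lines.append(f'      <xs:element name="{name}" type="xs:{xtype}"/>')
    lines += [
        '    </xs:sequence></xs:complexType>',
        '  </xs:element>',
        '</xs:schema>',
    ]
    return "\n".join(lines)

xsd = template_to_xsd("blood_pressure", [("systolic", "decimal"),
                                         ("diastolic", "decimal")])
```

A real generator would of course carry over occurrence constraints, terminology bindings and so on; the sketch only shows the direction of derivation: model first, XSD as a generated interface.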
EMF/Ecore will be nice when they can finally generate standards-compliant
XSDs without
Ecore cruft in them. At this point, once you use Ecore, you are
infected and can't leave.
I really wanted to use EMF for MDD. Maybe, someday.
From what I can see, the major direction in information modelling for the
future will be Eclipse Modelling Framework, using Ecore-based models. This
is where I think the computational expression of openEHR's Reference Model
will move to. The OHT Model Driven Health Tools (MDHT) project is already
showing the way on this, at the same time adopting ADL 1.5 concepts for
constraint modelling.
(see above comment)
I have no experience with XSD 1.1, and I think it will be years before
mainstream industry catches up with it. But it may be that it does what is
needed.
Well, I can't predict how long it will take to be used on a broader basis.
Probably, like most things, as people need the capability. Sometimes,
people resist change.
It takes them outside their comfort zone and they don't like it.
I can tell you that XML Schema 1.1 is very functional. It is
supported by open source and proprietary
tools and it is working quite well, without tricks, in MLHIM.
we'll obviously differ on our analysis of what is the best modelling
formalism. The above are the conclusions I have come to over the years.
I am not looking for "the best" modelling formalism. I am looking for
what works and is simple
to implement in order to move forward the main and necessary concept
of multi-level modelling
so that we can solve the semantic interoperability issue between
healthcare applications from purpose
specific mobile apps to enterprise systems.
Others may have other, better ideas, and it may be that an XSD 1.1 modelling
effort in openEHR could make sense.
I think the key thing would have been to ensure that the archetypes could be
shared across openEHR and MLHIM. Archetypes are pretty widely used these
days, and there are many projects now creating them. I don't know if this is
still possible; if not, it presents clinicians with the dilemma: model in
ADL/AOM, or model in MLHIM? Replicated models aren't fun to maintain...
I am not sure that there is any requirement for mapping.
While there are a number of people producing openEHR archetypes,
AFAICT there are only a dozen or so
that are in compliance with the openEHR specifications;
specifically the "Knowledge Artefact Identification" document.
To address a couple of issues I have with the current openEHR eco-system:
Section 2.2 says:
"It is possible to define an identification scheme in which either or
both ontological and machine identifiers are used. If machine
identification only is used, all human artefact 'identification' is
relegated to meta-data description, such as names, purpose, and so on.
One problem with such schemes is that meta-data characteristics are
informal, and therefore can clash – preventing any formalisation of
the ontological space occupied by the artefacts. Discovery of overlaps
and in fact any comparative feature of artefacts cannot be formalised,
and therefore cannot be made properly computable."
I will argue that UUIDs are very definitely "computable"; without ambiguity.
Metadata characteristics are very definitely formalized, and have been
since at least 1995 (DCMI); that work has been an
ISO standard since at least 2003. Therefore this paragraph is
inaccurate in its description of the usefulness
of machine-processable identifiers and of using metadata for formal
descriptions.
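A small sketch of why machine identifiers are computable without ambiguity: name-based uuid5 derives the same identifier from the same namespace and name every time, so artefact comparison is exact rather than metadata-based. The namespace URL and artefact names below are examples only:

```python
# Machine identifiers are fully computable: deterministic derivation,
# exact comparison, no reliance on informal metadata.

import uuid

# Example namespace derived from a URL (illustrative, not a real scheme).
NS = uuid.uuid5(uuid.NAMESPACE_URL, "http://www.mlhim.org")

a = uuid.uuid5(NS, "ccd-blood-pressure")
b = uuid.uuid5(NS, "ccd-blood-pressure")
c = uuid.uuid5(NS, "ccd-body-weight")

same = (a == b)        # same name in same namespace -> same ID, always
distinct = (a != c)    # distinct artefacts never clash
```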
Section 3.1 says:
"The general approach for identifying source artefacts is with an
ontological identifier, prefixed by a namespace identifier if the
artefact is managed within a Publishing Organisation or in some other
production environment. The lack of a namespace in the identifier
indicates an ad hoc, uncontrolled artefact, but its presence does not
guarantee any particular kind of ‘control’ - quality can only be
inferred if the PO is accredited with a central governance body as
using a minimum quality process."
As far as I can find, there is no reference as to what this
accreditation process looks like or who manages it,
outside of the openEHR CKM (which may be deployed locally). However ...
Section 3.2 says:
"Note that the name_space_id is constructed from a publisher
organisation identifier plus at least one level of library/package
identification. The latter condition ensures that a PO that starts
with only one ‘library’ can always evolve to having more than one.
All archetypes and templates should be identified with this style of
identifier. Any archetype or template missing the name_space_id part
is deemed to be an uncontrolled artefact of unknown quality."
Archetypes on the NEHTA CKM http://dcm.nehta.org.au/ckm/ carry only
the openEHR RM namespace,
and are therefore, by the openEHR definition, uncontrolled and of
unknown quality.
There have been, in the past, archetypes that carried an nhs-dev
designator. I can't find them now. But they were
obviously in development and not deployed. If you browse the internet
looking for openEHR archetypes you can find
hundreds, maybe thousands. This shows that there are people
interested in building knowledge models.
They just aren't interested in the top-down, consensus controlled,
openEHR approach. This creates a chaotic, dangerous
environment for healthcare data. There can easily be multiple
archetypes with the same ID that have different
structures and therefore different instance data. Each instance of
data will not be able to determine which of
the competing archetypes it is supposed to be validated against
without unambiguous identification. I believe that
Dr. Dipak Kalra used the word "unacceptable" when Bert Verhees
confronted him with this issue on "Healthcare IT
Live!" in December 2012: http://goo.gl/UP2Z1
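One illustrative way to make such collisions detectable is to fingerprint an artefact's canonical content, so a data instance can reference the exact definition it was validated against. This is only a sketch; neither openEHR nor MLHIM prescribes this particular scheme:

```python
# Two artefacts sharing a human-readable ID but differing in structure
# get different content fingerprints, so instance data can bind to the
# exact definition it was validated against.

import hashlib

def artefact_fingerprint(canonical_text: str) -> str:
    """Hash the artefact's canonical serialisation (illustrative)."""
    return hashlib.sha256(canonical_text.encode("utf-8")).hexdigest()[:16]

v1 = artefact_fingerprint("<archetype id='bp'>...structure A...</archetype>")
v2 = artefact_fingerprint("<archetype id='bp'>...structure B...</archetype>")

collision_detectable = (v1 != v2)   # same ID, distinguishable structures
```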
The openEHR eco-system is well engineered. It just isn't
sociologically acceptable. People want to be free to
design their concept models without top-down consensus. MLHIM allows
that with industry standard, off the shelf
tooling.
XML Schema 1.1, Concept Constraint Definitions and MLHIM: "Try it,
You'll like it".
http://gplus.to/MLHIM and http://gplus.to/MLHIMComm for more
information. You may also enjoy the website
at www.mlhim.org and the GitHub point at https://github.com/mlhim
Also, be sure to enjoy Healthcare IT Live! on YouTube
https://www.youtube.com/watch?v=HG7rRPT9KY0&list=PL5BDmBjSV7CsBYbzNBw-D03WEqSJcWxbP
where our guest today is Mr. Alex Fair, CEO of MedStartr.com
"Crowd-funding for healthcare."
https://plus.google.com/events/cof6sdrpjll3ca3stp0440k6ihc
DATE: 2013-04-04 1900 BRT, 2200 UTC, 1800 EDT
Local time and date finder: http://goo.gl/orcJU
You must RSVP "Yes" to receive a panel invitation to the hangout.
Regards,
Tim