openEHR queries

Randy,

We have already indicated that we store indexed blobs. We can store these blobs as XML, DADL or binary; it doesn't matter, since that is just a serialisation format and the "magic" happens in the object layer. Another benefit of this is that it obfuscates the EHR content, forcing data access through the EHR Server so that the semantics and security of the content are maintained. This is a deterrent to traditional application developers bypassing these important EHR requirements.
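
A minimal sketch of this pattern in Python, with all names invented for illustration: blobs are opaque to the database, a handful of index fields are extracted at commit time, and deserialisation happens only inside the server-side object layer.

```python
# Blobs are opaque to the database; a few fields extracted at commit time
# form the index, and deserialisation happens only in the object layer.
import json

class BlobStore:
    """Hypothetical store: blobs keyed by id, plus a small index."""

    def __init__(self):
        self._blobs = {}   # blob_id -> serialised bytes (XML, DADL, binary...)
        self._index = {}   # blob_id -> index fields extracted at commit time

    def commit(self, ehr_id, composition):
        # The serialisation format is incidental; JSON stands in here.
        blob_id = len(self._blobs)
        self._blobs[blob_id] = json.dumps(composition).encode()
        self._index[blob_id] = {
            "ehr_id": ehr_id,
            "archetype_id": composition["archetype_details"]["archetype_id"],
        }
        return blob_id

    def candidates(self, archetype_id):
        # The index narrows the search without touching any blob.
        return [bid for bid, ix in self._index.items()
                if ix["archetype_id"] == archetype_id]

    def materialise(self, blob_id):
        # The server-side object layer is the only place this happens.
        return json.loads(self._blobs[blob_id])
```

Because the database never looks inside the blob, swapping XML for DADL or a binary encoding only changes the serialisation line.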

Regards

Heath

Heath Frankel
Product Development Manager

Ocean Informatics

Ground Floor, 64 Hindmarsh Square

Adelaide, SA, 5000

Australia

ph: +61 (0)8 8223 3075

mb: +61 (0)412 030 741
email: heath.frankel@oceaninformatics.com

Hi Randolph

Depends what you mean by trade secrets.

If you mean that there are some parts of openEHR that are secret or not disclosed then that’s not right. The whole specification is out there for anyone to use and implement. It is the most complete specification for semantic interoperability in the world today.

If you mean that Ocean has developed one possible commercial implementation of this and spent a number of years and a lot of money on it and is not publishing the source code and a detailed explanation of how it was done, then you are right! :-)

That said, there are still a lot of Ocean people contributing to the lists…

regards Hugh

Heath Frankel wrote:

Randolph Neall wrote:

Can I assume that what Thomas here advocates ("relational databases can be used very effectively as a low-level store of blobs keyed by path") is how the Ocean persistence layer actually works? Beyond this, Thomas apparently has little use for the capacities of SQL-type RDBMSs to handle clinical information. Does the Ocean system ultimately amount to blobs keyed by paths (presumably string paths)? If so, what kind of blobs: XML blobs, or some other structured text format?

To clarify: yes, more or less. If you lock the relational schema, or even an object schema (i.e. an object model expressed as classes and/or as an ODB schema, say in ODL), directly to a model of the real-world phenomena your system deals with (e.g. patient visits, path results, GP notes, physical examinations, referrals etc) then there will be permanent problems of maintainability. This has been borne out for as long as I have known anything about computers (let's say 20 years working plus 5 years at university, when Edition 2 of Sommerville was our idea of 'software engineering').

I have kept wondering why software engineering talks about the problem of maintenance and having to continually throw away and rebuild systems, the problems of drifting away from requirements and so on, as if these problems were being solved. But they are mostly not being solved. The technical bookshelves are full of books teaching the same old thing (count the number of information-system books using airline, hotel or conference booking case studies), implying that it works. But in real life it doesn't. We don't seem to have any sustainable information systems - we have to keep fiddling with them. (Note: I am talking about 'information' systems here - there are many other computational systems whose information is more or less static and which mostly do number crunching or visualisation or some other job).

The root problem in my view is that if you build an information system in such a way that its business logic and database encode the facts of the domain (as gathered last week, by you and a few colleagues, perhaps following 'use case' analysis or some such idea), the database and logic of the system are connected to the reality being modelled. However, since reality keeps changing, along with our idea of it (and hence our modelling of it), the system is never correct; we have to keep modifying it. This might be easy with a small system, but with large distributed systems and billions of records, and numerous changing requirements, it doesn't perform well at all (although we often delude ourselves that it is ok by restricting our work practices to those that fit the software).

In other words, directly modelling the aspects of reality we are concerned with as first-order concepts in the software and database is a recipe for costly and permanent maintenance. I believe we have to instead model the reality as a second-order concept, with first-order concepts in the system being stable models of the classes of things found in the reality of the domain. Hence in openEHR we model only things like Composition ('recording'), Section ('heading'), Party, various kinds of Entry like Observation and so on. We don't model any medical or clinical thing directly. Everything modelled as a first-order concept is domain-invariant - it has the same meaning right across the entire domain of application. Of course we can debate whether we have gotten it right or not, but this is the intention.
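
To make the two-level idea concrete, here is a toy Python illustration (not the actual openEHR reference model): the classes are domain-invariant, and clinical meaning is carried entirely by the archetype identifiers attached to each node; the identifiers shown are invented.

```python
# Domain-invariant classes: no clinical concept appears as a class.
# Meaning comes from the archetype_node_id each node carries.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Locatable:
    archetype_node_id: str   # e.g. an invented "openEHR-EHR-OBSERVATION.bp.v1"
    name: str

@dataclass
class Observation(Locatable):
    data: dict = field(default_factory=dict)

@dataclass
class Section(Locatable):
    items: List[Locatable] = field(default_factory=list)

@dataclass
class Composition(Locatable):
    content: List[Locatable] = field(default_factory=list)

# A blood pressure recording and a lab result would use the *same* classes;
# only the archetype ids differ, so the schema never changes.
bp = Observation("openEHR-EHR-OBSERVATION.bp.v1", "blood pressure",
                 data={"systolic": 120, "diastolic": 80})
visit = Composition("openEHR-EHR-COMPOSITION.encounter.v1", "encounter",
                    content=[Section("openEHR-EHR-SECTION.vitals.v1",
                                     "vital signs", items=[bp])])
```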

We are not the first to do something like this of course - there are hundreds
of solutions in a similar vein, various kinds of business rule modelling
languages and so on. What we do in openEHR is just one (fairly comprehensive,
we think) approach to solving the problem of unmaintainable software systems.
It also happens to help solve the problem of getting the requirements from the
domain experts - far better than 'use case analysis a la Jacobson' does.

It now seems quite clear to me that building any real world concepts directly
into the software infrastructure of an information system is a mistake. We can
do far better than that, and we always should aim to do so.

Having said all this, then there is the obvious question: well what do we use
the database for? My view on that (and there are far better experts than I) is
this: once you clear your head of any idea of trying to model patients,
visits, pathology results, prescriptions, medications, etc etc etc in the
database, you have an incredible freedom to use the power of these systems -
and most modern databases are extremely powerful. We just use them in the
wrong way a lot of the time. Using a relational database for medicine in my
view makes sense as long as you build a schema that has no first-order domain
concepts in it, and instead encodes information in a generic way. That could
be blobs, paths, indexable columns, or some other method. There are many
variations available, and I doubt if we have really started to understand them.
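
One possible generic schema along these lines, sketched with SQLite; the table and column names, and the archetype path, are hypothetical:

```python
# A sketch of a generic schema: no clinical concept appears as a table or
# column; blobs are opaque, and archetype paths are indexed alongside them.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blob (
        blob_id  INTEGER PRIMARY KEY,
        ehr_id   TEXT NOT NULL,
        content  BLOB NOT NULL            -- serialised composition, any format
    );
    CREATE TABLE path_index (
        blob_id  INTEGER REFERENCES blob(blob_id),
        path     TEXT NOT NULL,           -- archetype path into the blob
        value    TEXT                     -- optional indexed leaf value
    );
    CREATE INDEX ix_path ON path_index(path, value);
""")

# Committing a composition stores the opaque blob plus its index rows.
conn.execute("INSERT INTO blob VALUES (1, 'ehr-001', x'00')")
conn.execute(
    "INSERT INTO path_index VALUES (1, ?, ?)",
    ("/content[openEHR-EHR-OBSERVATION.lab_result.v1]/items/hba1c", "6.5"),
)

# A query narrows candidate blobs by path, never by a domain-specific table.
rows = conn.execute(
    "SELECT DISTINCT blob_id FROM path_index WHERE path LIKE ?",
    ("%lab_result%hba1c",),
).fetchall()
print(rows)  # [(1,)]
```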

All these comments also apply to object models - encode the real phenomena
your system processes directly into the software classes of your system, and
you will have a never-ending game of catchup on your hands.

- thomas beale

In other words, directly modelling the aspects of reality we are concerned with as first-order concepts in the software and database is a recipe for costly and permanent maintenance. I believe we have to instead model the reality as a second-order concept, with first-order concepts in the system being stable models of the classes of things found in the reality of the domain. Hence in openEHR we model only things like Composition ('recording'), Section ('heading'), Party, various kinds of Entry like Observation and so on. We don't model any medical or clinical thing directly. Everything modelled as a first-order concept is domain-invariant - it has the same meaning right across the entire domain of application. Of course we can debate whether we have gotten it right or not, but this is the intention.

Thanks for explaining your view, Thomas.

Doesn't this division of the problem domain into two levels of concepts just mean that you're shifting the bulk of development and maintenance complexity/costs from the first-order concept level to the second-order concept level? The reality of the problem domain _will_ change over time, and must be reflected in software somewhere, somehow.

Of course I agree that divide-and-conquer is one of the best strategies a human mind can work with, especially when at some point one no longer has to worry about the basic first-order concepts because they have crystallized into stability. But this looks very much like well-known modern programming languages and the libraries (abstract classes in OOP) available for them. Or is this a wrong analogy?

Roger

Hi Roger,

DISCLAIMER:
I do not, nor do I have any intention of speaking for anyone but myself.

In many ways you are EXACTLY correct in that we (the plural openEHR
Foundation we) are shifting the complexity to the second level. That
*really* is the point. Our (my) point is to develop software
applications that can deal with the changing knowledge model of
healthcare.

I have no intention of being the domain expert. I could really "care
less" (as we say in the US) whether it is a podiatrist or a cardiologist
that wants to commit "information" to my application. I MUST be able to
consume and manage that information. Their responsibility is to create
archetypes that represent their knowledge *AND* fit into the information
management model that we make available for them.

So, while we may be shifting a bit of the work (which is sharable via openly available archetypes), we are also saying that if you create a clinical application specific to cardiology or podiatry, the patient-centric PHR/EMR/EHR can still understand that information "in context", i.e. in the context in which it was created.

Yep, it's a shift. Hopefully it is a shift that makes sense to those who have seen the real-world issues of previous healthcare information systems.

Cheers,
Tim

Heath,

Good clear answer. Thanks.

You enable me to take us back to where this conversation started. I can now make a possibly uneducated guess at how the Ocean querying works: you parse AQL queries into two distinct parts, (1) for the relational DB and its paths and (2) for the "object layer." The first part narrows down the range of blobs you must look at as far as possible, and the second part penetrates into specific values in the blobs themselves. If you end up with 10,000 blobs resulting from part 1, you must parse and instantiate each one into the memory of your object server and then step through each one to find which of them satisfies the query. If I'm right, your system could run reasonably fast if part 1 of the query does not yield a large result, or if your object server runs on some heavy iron. I'm not sure whether one of your blobs represents one complete patient record or merely a fragment of it, but if it does encompass the complete record, the blob size could be large and the instantiation process (involving parsing, if XML) would in itself consume resources and take time. Maybe you've found that querying against a large set of blobs is seldom necessary.
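
For what it's worth, the two-phase evaluation guessed at here can be sketched in a few lines of Python (reusing the hypothetical BlobStore from the earlier sketch; the function names and predicate are invented):

```python
# Phase 1 narrows candidate blobs through the index (the "relational" part);
# phase 2 materialises each survivor and applies the rest of the predicate
# in the object layer. 'store' is the hypothetical BlobStore sketched above.
def query(store, archetype_id, predicate):
    candidate_ids = store.candidates(archetype_id)   # cheap, index-only
    results = []
    for blob_id in candidate_ids:
        obj = store.materialise(blob_id)             # parsing cost per blob
        if predicate(obj):                           # object-layer filtering
            results.append(obj)
    return results

# e.g. all compositions whose (invented) systolic reading exceeds 140:
# query(store, "openEHR-EHR-OBSERVATION.bp.v1",
#       lambda c: c["data"]["systolic"] > 140)
```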

There are tradeoffs no matter which way one goes, and I can see your logic. You mention the need for (1) obfuscation and (2) semantic integrity. Thomas's concern centered more on the complexities of expressing hierarchies in traditional relational terms, and on maintaining ever-changing models (see his extended comment in this thread). Either way, you end up with your chosen architecture.

Did Ocean consider, at the beginning, using a relational node-based graph structure (vertices, edges, etc.), without blobs and without the schema itself ever having to change, and reject the idea?

Best regards,
Randolph

They can answer that but for your info I considered it and accepted it
:-). Which brings me to the next question but I will start a new
thread.

Thomas, thanks for your extended remarks. Your point is one you’ve made for a long time, that relational db schemas cannot keep up with the real world. I’m just wondering if moving the problem out of the relational DB and into blobs (persisted objects, I take it) solves the problem you so eloquently depict. Yes, it solves the schema problem. I grant you that. But you’re still left with imperfect and changing models even with blobs. I’ve read the openEHR specs enough to know that when an archetype version changes, one is obliged to convert all existing records (blobs) to conform to the new version, and that, it seems, would not be trivial. That task would worry me. Every affected blob would have to be rewritten. Maybe the only real problem you’re even trying to solve is that of the relational schema. You allow that there are many ways other than the path-blob approach, but you’ve made it clear that your preference is definitely that, another indication of how intractable the root problem actually is.

Thanks again.

Randolph

One more comment, Thomas. I think, at base, what you're saying is that when you try to put your data into traditional rows and columns, you've made a heavy, unchangeable commitment, or at least one that is not easily changed. But if you use something else, like structured text documents (such as XML), you've made a lighter, more retractable commitment, and one that more easily expresses flexible hierarchies.

Randolph

Thomas, thanks for your extended remarks. Your point is one you've made for a long time, that relational db schemas cannot keep up with the real world. I'm just wondering if moving the problem out of the relational DB and into blobs (persisted objects, I take it) solves the problem you so eloquently depict. Yes, it solves the schema problem. I grant you that. But you're still left with imperfect and changing models even with blobs. I've read the openEHR specs enough to know that when an archetype version changes, one is obliged to convert all existing records (blobs) to conform to the new version, and that, it seems, would not be trivial.

That is not true. When an archetype version changes, new data are created and validated by the new version of the archetype, while old data (blobs or whatever) are still processed by the old archetypes. In the root node of the data there is always information about the archetype (and its version) used to create the data (LOCATABLE.archetype_details). So there is really no need to convert existing data when an archetype changes. Hope this clarifies the matter.
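
A small Python sketch of this point: the attribute path (archetype_details.archetype_id) follows the openEHR LOCATABLE class, but the dict-based data and repository lookup are invented. Because the version is part of the archetype identifier, v1 data keeps resolving to the v1 archetype even after v2 exists.

```python
# v1 and v2 are simply two different archetypes in the repository;
# data committed under v1 keeps resolving to v1 forever.
def archetype_for(datum, repository):
    """Return the exact archetype that governs this datum."""
    archetype_id = datum["archetype_details"]["archetype_id"]
    return repository[archetype_id]

repository = {
    "openEHR-EHR-OBSERVATION.bp.v1": "<archetype v1>",
    "openEHR-EHR-OBSERVATION.bp.v2": "<archetype v2>",
}
old_datum = {"archetype_details": {"archetype_id": "openEHR-EHR-OBSERVATION.bp.v1"}}
assert archetype_for(old_datum, repository) == "<archetype v1>"
```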

Cheers,
Rong

So are you saying that persisted clinical data is never converted to conform to newer versions of an archetype, or simply that one is not compelled to convert?

Randolph

So are you saying that persisted clinical data is never converted to conform to newer versions of an archetype, or simply that one is not compelled to convert?

A new version of an archetype is created when the changes are so significant that it can't be backwards compatible. In other words, converting old data to conform to a newer version of an archetype might not even be possible.

Quoted from Archetype Principles: http://www.openehr.org/svn/specification/TAGS/Release-1.0.1/publishing/architecture/am/archetype_principles.pdf

“Principle 14: There is a means of evolving existing archetypes to accommodate changing requirements, without invalidating data created with earlier versions. Since archetypes are used to create data, changes to archetypes must be regarded as creating a new archetype; i.e. the identifier of an archetype must incorporate its version. The only types of change to archetypes that can be made without changing the version are those which do not invalidate previously created data. Formally, such changes must not ‘narrow’ constraints expressed in the existing version.”

/Rong

Thanks, Rong.

Heath,

Good clear answer. Thanks.

You enable me to take us back to where this conversation started. I can now make a possibly uneducated guess at how the Ocean querying works: you parse AQL queries into two distinct parts, (1) for the relational DB and its paths and (2) for the "object layer." The first part narrows down the range of blobs you must look at as far as possible, and the second part penetrates into specific values in the blobs themselves. If you end up with 10,000 blobs resulting from part 1, you must parse and instantiate each one into the memory of your object server and then step through each one to find which of them satisfies the query.

Without going into the details of the exact architecture (it's not interesting), you are missing one important factor: we can know in advance a lot about what is in the blobs due to having the archetype 'path-map' for each blob. For example, if the query concerns HbA1c levels, we know that only the HbA1c archetype is implicated. There is no need to even bother instantiating any blob not containing the HbA1c archetype paths. The archetype path-maps give you an X-ray view of the data.
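
A minimal sketch of this pruning idea, assuming each blob carries the set of archetype identifiers it contains (the data structures and identifiers are invented):

```python
# Each blob's 'path-map' reveals which archetypes it contains, so a query
# touching one archetype never deserialises unrelated blobs.
def prune(path_maps, wanted_archetype):
    """path_maps: blob_id -> set of archetype ids present in that blob."""
    return [bid for bid, archetypes in path_maps.items()
            if wanted_archetype in archetypes]

path_maps = {
    1: {"openEHR-EHR-OBSERVATION.hba1c.v1"},
    2: {"openEHR-EHR-OBSERVATION.bp.v1"},
    3: {"openEHR-EHR-OBSERVATION.hba1c.v1", "openEHR-EHR-OBSERVATION.bp.v1"},
}
# An HbA1c query needs to open blobs 1 and 3 only; blob 2 is never parsed.
assert prune(path_maps, "openEHR-EHR-OBSERVATION.hba1c.v1") == [1, 3]
```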

- thomas beale

Thomas, thanks for your extended remarks. Your point is one you've made for a long time, that relational db schemas cannot keep up with the real world. I'm just wondering if moving the problem out of the relational DB and into blobs (persisted objects, I take it) solves the problem you so eloquently depict. Yes, it solves the schema problem. I grant you that.

Well, it does more than that - it allows the database to be used to perform optimised querying for a certain category of data (e.g. archetyped, blobbed data), rather than for particular content (e.g. path results versus patient details).

But you're still left with imperfect and changing models even with blobs.

imperfect in what sense?

I've read the openEHR specs enough to know that when an archetype version changes, one is obliged to convert all existing records (blobs) to conform to the new version, and that, it seems, would not be trivial.

No, not at all. Archetypes are immutable, and the data always conform to the same archetype. A new 'version' is in fact a new archetype which just happens to have a similar name. New versions are not made very often, because built-in flexibility (see e.g. generic archetypes like the lab archetype), new revisions (compatible changes) and specialisations are used to deal with almost all new requirements relating to existing archetypes. If a new version becomes necessary, it is a new archetype. Data created with the previous version still conform to that previous version, which is always available in the repository. The data could be migrated to conform to the new version, but this is only sensible if it is a useful thing to do.

The main thing to understand is that new 'versions' are not the main way of dealing with changing requirements in archetypes.

That task would worry me. Every affected blob would have to be rewritten. Maybe the only real problem you're even trying to solve is that of the relational schema. You allow that there are many ways other than the path-blob approach, but you've made it clear that your preference is definitely that, another indication of how intractable the root problem actually is.

You could just as easily use an object database. We have, for example, used Matisse on other projects - it is an excellent product, and allows object data to be correctly represented in a language-independent way (unlike VM-style caches like db4o). It could be used perfectly well with openEHR.

- thomas

So are you saying that persisted clinical data is never converted to conform to newer versions of an archetype, or simply that one is not compelled to convert?

Usually there would be no reason to convert - no more than there would be to convert data of archetype A into a form corresponding to archetype B. In openEHR, new versions are only used to upgrade an archetype when a new _incompatible_ requirement occurs on an existing archetype. There are unlikely to be many of these once you analyse a few possibilities - and it is important to realise that such changes are most likely to be local.

Most changes that you can think of are dealt with by:

- judicious initial design; e.g. adding the 4th sound to BP (systolic and diastolic are the 1st and 5th sound pressures respectively) can be done with no changes to the BP measurement archetype.

- specialisation: if you want a new kind of lab test to be specifically modelled, just create a new specialisation of the lab result archetype (see the sketch after this list).

- revision: certain changes, like adding a new field or relaxing a cardinality from 1..1 to 0..1, can be made to the current version of the archetype.

- template changes: templates can turn parts of archetypes on and off and rename them, with no effect on the archetype semantics.
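
A small sketch of how two of these mechanisms surface in archetype identifiers, assuming the usual openEHR naming convention in which a specialisation appends a hyphenated segment to the concept name and only incompatible changes bump the trailing .vN:

```python
# Specialisation grows the concept part of the id; the version suffix
# only changes for incompatible revisions. Identifiers here are invented.
def is_specialisation_of(child_id, parent_id):
    child_concept = child_id.rsplit(".", 1)[0]    # strip trailing ".vN"
    parent_concept = parent_id.rsplit(".", 1)[0]
    return child_concept.startswith(parent_concept + "-")

lab = "openEHR-EHR-OBSERVATION.lab_result.v1"
lipids = "openEHR-EHR-OBSERVATION.lab_result-lipids.v1"
assert is_specialisation_of(lipids, lab)    # new content, no new version
assert not is_specialisation_of(lab, lipids)
```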

Hope this helps.

- thomas

The reality of the problem domain _will_ change over time, and must be reflected in software somewhere, somehow.

Not directly in the software. The following types of logic can be represented formally outside the software:
- many kinds of business rules, often expressed in a special language
- workflow specifications, e.g. in languages like XPDL or BPEL
- computerised guidelines, in languages like GLIF or Arden
- structural specifications like archetypes
- terminology, like SNOMED

Changing any of these artefacts does not require changes to the database or software in general; of course, if someone decides to build, say, a GUI or application specific to a given guideline, then that will be affected. But normally this is to be avoided.

- thomas beale

Thank you very much everyone, and lately you, Thomas. You’ve clarified a lot. I now understand (as well as I could at the moment) how change is managed. And thanks for posting your piece on data storage. That was helpful.

Randolph