multiple questions: database, archetypes, templates

Hi,

In the last couple of weeks I've been studying a bunch of openEHR documents
to see whether or not we want to use it in our own project, but I have run
into some questions. Perhaps somebody in this community is able to point me
in the right direction for answering these...

1. openEHR has a quite solid model, but at the end of the day you always
need some form of persistence. I've read about some suggested ways of
working on the openEHR website, but I'm not convinced any of these are
actually all that practical. What is your experience? How do you persist
all the data being collected?

2. The archetypes are quite fascinating and I am very interested in
decoupling all the clinical concepts from the hard technical layer, but for
this to work fluently it seems that the archetypes themselves should be
standardised. Is a separate community working on this? And if no standards
exist yet (I know of the archetype mindmap, but I have the feeling we would
need more specific ones), what are the thoughts of the openEHR community on
exchanging information between different archetypes trying to express the
same thing? Connecting archetypes by connecting ontologies? And how about
exchanging information with HL7-based systems?

3. Is Ocean's Template Designer the only template designer currently in use?
Are any other projects in the making? [I tried to get a temporary license
for it, but the e-mail address for requesting one apparently doesn't
exist.]

I realise I have a lot of questions, but perhaps someone can direct me to
some valuable resources.
Thanks in advance,
NMazur

Hi Nancy,

I am interested in hearing more about your specific project.

> 1. openEHR has a quite solid model, but at the end of the day you always
> need some form of persistence. I've read about some suggested ways of
> working on the openEHR website, but I'm not convinced any of these are
> actually all that practical. What is your experience? How do you persist
> all the data being collected?

You can use any persistence approach and database you wish with
the Python implementation, though it comes with an object database that
makes persisting the naturally hierarchical information transparent. So
there is no need for an ORM layer (saving 30% or more application code).

> 2. The archetypes are quite fascinating and I am very interested in
> decoupling all the clinical concepts from the hard technical layer, but for
> this to work fluently it seems that the archetypes themselves should be
> standardised. Is a separate community working on this? And if no standards
> exist yet (I know of the archetype mindmap, but I have the feeling we would
> need more specific ones), what are the thoughts of the openEHR community on
> exchanging information between different archetypes trying to express the
> same thing? Connecting archetypes by connecting ontologies? And how about
> exchanging information with HL7-based systems?

This is probably better answered by others but my limited experience is
that most applications will need to specialize the existing archetypes.
Also note (AFAIK) all existing archetypes are in draft mode. None have
been released officially by the Clinical Review Board. So yes there is a
governance model but it is still in the process of formally approving
any of the available archetypes.

> I realise I have a lot of questions, but perhaps someone can direct me to
> some valuable resources.

The Python implementation is wrapped by an application server to make
writing web apps faster and easier. Even at that, there is a lot to do
and learn in order to be productive after installing the software. We
are currently finishing up a 10-day workshop and that code is being
committed to the LNCC branch. It will be merged with the TRUNK next
week, or you can get a snapshot of it at any time if you wish.

While it is all still 'alpha' code we have learned a lot over this
workshop and are proceeding at a rapid pace. For more info you can
visit the links below.

--Timm

Hi Tim,

what a quick reply! Thanks!

> I am interested in hearing more about your specific project.

We’re working on an order-entry/result-reporting subsystem…

[openEHR <> persistance]

> You can use any persistence approach and database you wish with
> the Python implementation, though it comes with an object database that
> makes persisting the naturally hierarchical information transparent. So
> there is no need for an ORM layer (saving 30% or more application code).

I think I will have to take a closer look at your code... Up until now I've only looked at the
Java reference implementation.
As for us, we are clearly bound to be working in a relational database context, and that makes it
relevant how to map things and model your data at that persistence layer.

[standardized archetypes]

> This is probably better answered by others but my limited experience is
> that most applications will need to specialize the existing archetypes.
> Also note (AFAIK) all existing archetypes are in draft mode. None have
> been released officially by the Clinical Review Board. So yes there is a
> governance model but it is still in the process of formally approving
> any of the available archetypes.

If the archetypes are not standardized yet, then the openEHR community must have some opinion on how
different openEHR systems communicate, if each of these uses a different archetype for a specific clinical
object of interest?

> The Python implementation is wrapped by an application server to make
> writing web apps faster and easier. Even at that, there is a lot to do
> and learn in order to be productive after installing the software. We
> are currently finishing up a 10-day workshop and that code is being
> committed to the LNCC branch. It will be merged with the TRUNK next
> week, or you can get a snapshot of it at any time if you wish.
>
> While it is all still 'alpha' code we have learned a lot over this
> workshop and are proceeding at a rapid pace. For more info you can
> visit the links below.

I will definitely have a look there! Thanks for the info!

Nancy

Hi Nancy,

NMazur wrote:

> 1. openEHR has a quite solid model, but at the end of the day you always
> need some form of persistence. I've read about some suggested ways of
> working on the openEHR website, but I'm not convinced any of these are
> actually all that practical. What is your experience? How do you persist
> all the data being collected?

There is not much difference in storing openEHR data and most other kinds of object data that have some inherent complexity. The general classes of approach are:

  • classic object/relational mapping layer
  • object database
  • XML → XML database
  • XML → relational database

In our product (Ocean Informatics) we use a fairly typical Object/Relational approach that is used in many products - the Compositions are expressed as XML, compressed and stored in a relational database along with various meta-data and indexing columns. This works surprisingly fast and we have no problems so far with performance in the 1,000,000 EHR range. There are also object database approaches which I have worked with before, and Tim has mentioned the Zope/Python approach.
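
To make the "compressed XML in a relational table, plus meta-data and indexing columns" idea concrete, here is a minimal Python sketch. It is not Ocean's implementation: the table layout, the column names and the functions `open_store`, `save_composition` and `load_compositions` are assumptions made up for illustration, and SQLite with zlib merely stands in for whatever database and compression a real product uses.

```python
import sqlite3
import zlib

# Hypothetical schema: one row per composition. The XML body is stored
# compressed; a few meta-data columns are kept alongside so they can be indexed.
SCHEMA = """
CREATE TABLE composition (
    uid          TEXT PRIMARY KEY,
    ehr_id       TEXT NOT NULL,
    archetype_id TEXT NOT NULL,
    committed_at TEXT NOT NULL,
    body         BLOB NOT NULL
);
CREATE INDEX idx_composition_ehr ON composition (ehr_id, archetype_id);
"""


def open_store(path=":memory:"):
    """Open (and initialise) the illustrative composition store."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_composition(conn, uid, ehr_id, archetype_id, committed_at, xml_text):
    """Compress the XML and store it with its meta-data columns."""
    body = zlib.compress(xml_text.encode("utf-8"))
    conn.execute(
        "INSERT INTO composition VALUES (?, ?, ?, ?, ?)",
        (uid, ehr_id, archetype_id, committed_at, body),
    )
    conn.commit()


def load_compositions(conn, ehr_id, archetype_id):
    """Find rows via the indexed columns, then decompress the XML bodies."""
    rows = conn.execute(
        "SELECT body FROM composition "
        "WHERE ehr_id = ? AND archetype_id = ? ORDER BY committed_at",
        (ehr_id, archetype_id),
    )
    return [zlib.decompress(body).decode("utf-8") for (body,) in rows]
```

The point of the layout is that lookups touch only the small indexed columns; the XML itself is opaque to the database and is only unpacked once the right rows have been found.
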
> 2. The archetypes are quite fascinating and I am very interested in
> decoupling all the clinical concepts from the hard technical layer, but for
> this to work fluently it seems that the archetypes themselves should be
> standardised. Is a separate community working on this? And if no standards
> exist yet (I know of the archetype mindmap, but I have the feeling we would
> need more specific ones), what are the thoughts of the openEHR community on
> exchanging information between different archetypes trying to express the
> same thing? Connecting archetypes by connecting ontologies? And how about
> exchanging information with HL7-based systems?

The international governance of archetypes is the key to developing archetypes cooperatively and not creating incompatible duplicates. A major online resource for this will be announced soon, holding all of the hundreds of current openEHR archetypes, and allowing the international clinical community to become actively involved in developing and maintaining these models.

Exchanging information with other systems is done via templates designed to mimic the structure of messages, database tables and so on.

> 3. Is Ocean's Template Designer the only template designer currently in use?
> Are any other projects in the making? [I tried to get a temporary license
> for it, but the e-mail address for requesting one apparently doesn't
> exist.]

The Ocean Template Editor is currently the only one. This will change over the coming months, but the main event of importance is that a proper open standard for templates is being finalised - see http://www.openehr.org/wiki/display/spec/openEHR+Templates+and+Specialised+Archetypes . The next generation of template editors, viewers and browsers will be based on this specification.

- thomas beale

Hi,

Congratulations! You have found a vein of gold.
I am also hacking on it.

<snip>

> Up until now I've only looked at the Java reference implementation.
> As for us, we are clearly bound to be working in a relational database
> context, and that makes it relevant how to map things and model your
> data at that persistence layer.

Hi Nancy,

No reason you cannot persist archetype-derived content in a relational
database, of course. While I still have work to do, I can point you to
some code which uses the Java parser to open the ADL files that represent
archetypes, navigates the tree and pulls out the data elements it
needs.

To see the net effect as far as the application goes there is a video
on this page that shows how a blood pressure archetype is opened into
a form builder.

http://www.patientos.org/software/video_files/openehr/patientos_openehr.htm

Then if you are curious to see how forms look and interact within the
application from an end user perspective you can see the latest
version here:

http://www.patientos.org/forum_temp/82/82overview.htm

Here is the (messy) code that I threw together one night to parse the ADLs...

http://patientos.svn.sourceforge.net/viewvc/patientos/trunk/src/com/patientis/client/external/ImportArchetype.java?revision=45&view=markup

That's about as far as I have got :)

I think you only need adl-parser.jar (Rong Chen is the expert on that)

Good luck

Greg

http://www.patientos.org

> with various meta-data and indexing columns. This works surprisingly fast
> and we have no problems so far with performance in the 1,000,000 EHR range.
> There are also object database approaches which I have worked with before,
> and Tim has mentioned the Zope/Python approach.

I don't think there is much difference between 10,000 and 1,000,000
EHRs.
In those cases it comes down to the speed of the database engine; if the
code is slow, that is already noticeable with 10,000 records.

This is because the code must retrieve the data that together make up the
wanted entity.
The time needed to do so has two parts:
- one, the code, which does the querying and optimizing;
- two, the database, which only executes what the code asks it to
execute.

So, the code and optimizations for storing 10,000 EHRs are exactly the same
as the code and optimizations for 1,000,000 EHRs.

Or am I wrong in this? If so, please explain.

Thanks, Bert

There are older SQL adapters but you'll likely want to look at
RelStorage http://pypi.python.org/pypi/RelStorage/1.1c1 if you are
interested in using the Python code.

Cheers,
Tim

> So, the code and optimizations for storing 10,000 EHRs are exactly the same
> as the code and optimizations for 1,000,000 EHRs.

Forgot to add: the concept is (thus) very scalable; the same code can run
10,000 or 1,000,000 EHRs.

Bert

> So, the code and optimizations for storing 10,000 EHRs are exactly the same
> as the code and optimizations for 1,000,000 EHRs.
>
> Forgot to add: the concept is (thus) very scalable; the same code can run
> 10,000 or 1,000,000 EHRs.

I think what matters more is how many concurrent accesses can enter and
read data. The isolation of transactions is more important than how
many EHRs a database can contain without performance collapsing, because
the latter is mostly a database issue (indexes, etc.), and the
former is purely a code issue.

Bert

Bert Verhees wrote:

> So, the code and optimizations for storing 10,000 EHRs are exactly the same
> as the code and optimizations for 1,000,000 EHRs.

*except when it comes to population querying, database reindexing, and
disk file system management...

- thomas

> So, the code and optimizations for storing 10,000 EHRs are exactly the same
> as the code and optimizations for 1,000,000 EHRs.
>
> *except when it comes to population querying, database reindexing, and
> disk file system management...

What you tell me reminds me of this:

I once saw a hospital information system based on object storage.
The system was initially built in 1996. I will keep the name of the system
private.

They put everything in a kind of BLOB (memory dumps to disk; it could have
been XML had it been 2000, but in 1996 XML was something very new). So
they were binary dumps.
Every time they queried a patient, they opened that patient's BLOB and had
everything, so it was a very fast system, and very flexible. They could
add objects, version objects, define new classes/objects, etc.

Data exchange was just a matter of BLOB exchange. The BLOBs were
self-describing.

As for querying and data mining: at idle time, and on request, they opened
the BLOBs and pumped the required data into a relational database, and did
the querying and reports there. Every time a BLOB was updated, they also
updated, if appropriate, the database running alongside it.

Everything worked fine. It was not very handy for ad hoc querying, but it
worked.

This system could be used for openEHR storage (maybe they read this and
think that is THE idea; they can contact me for help).

I don't do it that way; I don't trust BLOBs. I use a mixture of XML and
a relational database. But my system is from 2008 instead of 1996,
although the 1996 system is still running.

It is typical that we all arrive at a mixture of XML (or BLOBs) and
relational data.

Bert

Hi Greg, Bert, Thomas,

sorry for having been offline in this discussion for a couple of days, but being new to all this I needed some time to digest/lookup/read what you’ve been writing about.

Which brought me to a number of other questions:

  • I've been reading some bits and pieces about XML databases, whether native or not, and it all seems interesting. Bert, you also mentioned BLOB-based storage, and I can understand that the overhead of indexing and the like can be incurred at specific moments, but I have understood that unforeseen queries scale much worse on XML-based databases than on relational ones. That still gives me the uneasy feeling that querying your EHRs can become a nightmare?
  • One thing that I'm a bit confused about is the actual relation between an archetype and the reference model. Say that you have an INSTRUCTION archetype; then persisting it consists of persisting the data that is required/mandatory/optional in the reference model definition of an INSTRUCTION, accompanied by all the data that is added by the archetype. Right? So in a sense, this feels like an open invitation to map the reference model to your database model, and have the archetype-defined data stored in your BLOB-like database? What are your ideas on this?
  • In your database model, you tend to have everything expressed explicitly (i.e. sort of pure relational). Am I right?

Thanks in advance,
Nancy

Nancy Mazur wrote:

> - I've been reading some bits and pieces about XML databases, whether
> native or not... and that all seems interesting. Also, Bert, you've
> mentioned the BLOB-based storage, and I can understand that the
> overhead of indexing and the like can be incurred at specific moments,
> but I have understood that unforeseen queries have a much worse
> scalability issue with XML-based databases than with relational ones.
> Which still gives me the uneasy feeling that querying your EHRs can
> become a nightmare?

the way we approach querying in openEHR is with an archetype-based
querying language, AQL. See
http://www.openehr.org/wiki/display/spec/openEHR+Query+Specifications

This needs to be implemented for each kind of persistence back-end. To
have really good global performance, what items are indexed and how the
indexing works should be self-adapting to query traffic, but even
without this, our implementation (Ocean Informatics) has subsecond
response for most reads and writes.

> - One thing that I'm a bit confused about is the actual relation
> between an archetype and the reference model. Say that you have an
> INSTRUCTION archetype, then persisting this consists of persisting
> data that is required/mandatory/optional in the reference model
> definition of an INSTRUCTION, accompanied by all the data that is added
> by the archetype. Right? So in a sense, this feels like an open
> invitation to map the reference model to your database model, and
> have the archetype-defined data stored in your BLOB-like database?
> What are your ideas on this?

To be correct, we talk of 'persisting reference model data, built
according to an archetype'. An archetype is a constraint statement - if
you use it to build data, it is a bit like using a multi-dimensional
cookie cutter - you don't store the cookie cutter, you store the cookies
you make with it. Of course, the cookie cutter itself does have to be
stored somewhere, but that is in a knowledge base, not the operational
EHR database.
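
The cookie-cutter idea can be illustrated with a small Python sketch. This is only a toy: real archetypes are written in ADL and constrain reference model classes, whereas the `BLOOD_PRESSURE_CONSTRAINTS` dictionary, its item names and the `validate` function below are hypothetical stand-ins. What it shows is the division of labour: the constraints are used to check data when it is built, and only the data is handed to the EHR store.

```python
# Hypothetical, heavily simplified stand-in for an archetype: a set of
# constraints on named items. It lives in a "knowledge base", not in the
# operational EHR database.
BLOOD_PRESSURE_CONSTRAINTS = {
    "systolic": {"type": float, "min": 0.0, "max": 1000.0, "required": True},
    "diastolic": {"type": float, "min": 0.0, "max": 1000.0, "required": True},
    "comment": {"type": str, "required": False},
}


def validate(data, constraints):
    """Return a list of constraint violations; an empty list means valid."""
    errors = []
    for name, rule in constraints.items():
        if name not in data:
            if rule["required"]:
                errors.append(f"missing required item: {name}")
            continue
        value = data[name]
        if not isinstance(value, rule["type"]):
            errors.append(f"wrong type for item: {name}")
            continue
        if "min" in rule and value < rule["min"]:
            errors.append(f"value below minimum: {name}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"value above maximum: {name}")
    for name in data:
        if name not in constraints:
            errors.append(f"item not allowed by constraints: {name}")
    return errors


# The "cookie": plain data that passed validation. Only this is persisted.
reading = {"systolic": 120.0, "diastolic": 80.0}
if not validate(reading, BLOOD_PRESSURE_CONSTRAINTS):
    record_to_store = dict(reading)
```
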

> - In your database model, you tend to have everything expressed
> explicitly (i.e. sort of pure relational). Am I right?

this may be true of some implementations, but certainly not all. For
some further ideas on persistence of archetyped data, see
http://www.openehr.org/wiki/pages/viewpage.action?pageId=786487

- thomas beale

> or not... and that all seems interesting. Also, Bert, you've mentioned the
> BLOB-based storage, and I can understand that the overhead of indexing and
> the like can be incurred at specific moments, but I have understood that
> unforeseen queries have a much worse scalability issue with XML-based
> databases than with relational ones. Which still gives me the uneasy
> feeling that querying your EHRs can become a nightmare?

For the record, the solution I was talking about was not mine; it is the
solution used in an EHR system from 1996, not openEHR, but also object-based.
Querying was not a nightmare most of the time, because most queries are
well known. Unknown queries needed preparation.

Nowadays, one must ask how far this is really a problem. A technician who
understands the RM can always build and execute any query in less than a
few hours.

Strictly querying the openEHR RM classes is not too difficult; one
could think about solutions such as HQL from Hibernate, or OCL.
This could be possible without writing much code. I believe there is an
OCL implementation for Java objects; the only thing needed is that the
classes can transparently load their data. You start at the top, with the
EHR object, and query into the compositions, or anything else.
But there is a problem: OCL may be easy to implement, but it is not
easy to use. This is mainly because, for example, a "List" is not often
used in the RM, but an "ItemList" takes its place; there is no text
attribute type, but there is a DvText, etc.
This kind of RM complexity makes languages such as OCL hard to
understand and use.

It would be quite an advantage to build and implement a query language
that works on the RM, is user-friendly, and eliminates the
need for technicians.

> - One thing that I'm a bit confused about is the actual relation between
> an archetype and the reference model. Say that you have an INSTRUCTION
> archetype, then persisting this consists of persisting data that is
> required/mandatory/optional in the reference model definition of an
> INSTRUCTION, accompanied by all the data that is added by the
> archetype. Right? So in a sense, this feels like an open invitation to
> map the reference model to your database model, and have the
> archetype-defined data stored in your BLOB-like database? What are your
> ideas on this?
> - In your database model, you tend to have everything expressed explicitly
> (i.e. sort of pure relational). Am I right?

No, let me explain.

I will now describe briefly, in a few lines, how I solve this problem; maybe
it can help you and others to think about a solution.

I would really like others to do the same (describe their database
solution), but that does not happen very much. That is a pity.

So, my database structure is not a relational mirror of the RM (a table for
every class), and it is not purely relational.
I have parts of the data stored in XML (especially the simpler
structures), mostly to avoid excessive querying of the database. This
works fine: I retrieve one XML string from the database (a fast query) and
have a complete structure filled with data.

What also gives the kernel database layer speed is that I wrote the
data retrieval to be "lazy". This means that data are collected only on
request. I also take advantage of the database rows: data that are collected
implicitly (while collecting other data, in one single database read)
already fill the classes. So the querying is lazy and, at other moments,
also proactive.

The RM classes fill themselves, so you can ask for a list of
compositions from an EHR without making any database call or being aware
of a database.
This is a transparent layer that does not know anything about the
database, and below it I can put whatever I want. For now I have the
solution described here, but switching to another solution (for example the
BLOB structure, or InterSystems Caché) is easy, and will not disturb the
application layers.
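
As an illustration of lazy retrieval behind a transparent layer, here is a minimal Python sketch. It is not Bert's code (his kernel is not shown in this thread); the classes `InMemoryBackend` and `Ehr` and the method `fetch_compositions` are assumptions invented for the example. The application only touches `ehr.compositions`; the back-end is consulted on first access and can be swapped without the application noticing.

```python
class InMemoryBackend:
    """Hypothetical storage back-end. Any object with the same
    fetch_compositions() method (relational, BLOB, ...) could replace it."""

    def __init__(self, compositions_by_ehr):
        self._data = compositions_by_ehr
        self.reads = 0  # counts back-end reads, to make the laziness visible

    def fetch_compositions(self, ehr_id):
        self.reads += 1
        return list(self._data.get(ehr_id, []))


class Ehr:
    """Reference-model-like object that fills itself on demand."""

    def __init__(self, ehr_id, backend):
        self.ehr_id = ehr_id
        self._backend = backend
        self._compositions = None  # nothing is loaded at construction time

    @property
    def compositions(self):
        # Load on first access only; later accesses reuse the cached list.
        if self._compositions is None:
            self._compositions = self._backend.fetch_compositions(self.ehr_id)
        return self._compositions


backend = InMemoryBackend({"ehr-1": ["composition-1", "composition-2"]})
ehr = Ehr("ehr-1", backend)     # no back-end read yet
first = ehr.compositions         # triggers exactly one read
second = ehr.compositions        # served from the object itself
```
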

I am thinking about implementing OCL as a query language, but this will
still be the domain of technicians.

I have read some discussions about AQL, which seems very interesting, but I
don't see an implementable solution or specification yet (maybe I missed
it).

As soon as it is ready, I will look further into that subject.

I hope this helps

Regards
Bert

Hi Nancy and Bert,

If you have an AQL-compliant system, then you can build ad hoc queries just as you would with a relational system. We have built such a system using a BLOB-based storage approach and, as Tom said, we are getting very good performance with high load and large databases (more than 25 million compositions in our load test system). The queries can be built with a tool in a few seconds using a drag-and-drop approach. The really exciting thing about AQL is that the syntax is based on the semantics of the query, not the data structure (as in a relational SQL query). This means that you can write an AQL query and run it on ANY AQL-compliant system and expect to get the same result set independent of the persistence model. The draft specification for AQL is here, as Tom has pointed out:

http://www.openehr.org/wiki/display/spec/openEHR+Query+Specifications

There is a difference of course with population vs individual EHR queries. Population queries are slower as you would expect (and as they are in a relational db) and can be tuned by setting up specific indexes for data that you know is going to participate in a query frequently (as you would for a relational db.) We are still seeing performance that differs little from a pure relational approach and offers hugely increased flexibility in the ability to manage complex clinical data.

regards Hugh

Bert Verhees wrote:

Thanks Hugh, it is very interesting, and very promising.

I will rack my brain over it sometime later, but I see good opportunities
for implementing this in Java, with a few more indexes.

Bert

<cookie-cut>

> - One thing that I'm a bit confused about is the actual relation
> between an archetype and the reference model. Say that you have an
> INSTRUCTION archetype, then persisting this consists of persisting
> data that is required/mandatory/optional in the reference model
> definition of an INSTRUCTION, accompanied by all the data that is added
> by the archetype. Right? So in a sense, this feels like an open
> invitation to map the reference model to your database model, and
> have the archetype-defined data stored in your BLOB-like database?
> What are your ideas on this?

> To be correct, we talk of 'persisting reference model data, built
> according to an archetype'. An archetype is a constraint statement - if
> you use it to build data, it is a bit like using a multi-dimensional
> cookie cutter - you don't store the cookie cutter, you store the cookies
> you make with it. Of course, the cookie cutter itself does have to be
> stored somewhere, but that is in a knowledge base, not the operational
> EHR database.
>
> - thomas beale

Exactly, I do not store the cookie cutter (I potentially could for
reference, but that is not useful yet) - rather I leverage the Java
implementation of the ADL parser to cut out the cookie and store that.
I can potentially map it back and theoretically leverage either
someone else's AQL parser or write one to consume AQL queries.
Performance is not really a concern - no more than for the rest of the
system - because the stored data (cookie) is not in a complex hierarchy
or BLOB form. It is stored relatively simply with references back to
that hierarchy.

Bert, there is an AQL spec but it looked like it may be simplified in
the near term. Even less extensively used, I believe, is the ability to
export/import using an openEHR specification.

Personally I suspect that the more immediate value is not necessarily the
exchange of interoperable data. Rather, if an organization defined
all their clinical documentation in terms of archetypes and templates
and then wanted to switch EHRs, they would not lose all that 'build' work,
which would save significant cost in implementation time of the second
EHR.

Greg

http://www.patientos.org

> Bert, there is an AQL spec but it looked like it may be simplified in
> the near term. Even less extensively used, I believe, is the ability to
> export/import using an openEHR specification.

As it is now, it can easily be translated to OCL, and is therefore easy to
implement; just add some indexes.
Another way would be separate tables that contain a PATH and pointers to
the stored locatable objects.
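
A rough Python sketch of the "separate table with PATH and pointers" idea, using SQLite. Everything here is an assumption for illustration: the table names `locatable` and `path_index`, the functions `open_index_store`, `store_locatable` and `find_ehrs`, and the example path string are invented, and this is not an AQL implementation. It only shows how a path/value index can answer a population-style question without opening every stored object.

```python
import sqlite3

# Hypothetical layout: the stored object is opaque text; selected
# (path, value) pairs are copied into an indexed side table that points
# back to the object they came from.
SCHEMA = """
CREATE TABLE locatable (
    id     INTEGER PRIMARY KEY,
    ehr_id TEXT NOT NULL,
    body   TEXT NOT NULL
);
CREATE TABLE path_index (
    path         TEXT NOT NULL,
    value        TEXT,
    locatable_id INTEGER NOT NULL REFERENCES locatable(id)
);
CREATE INDEX idx_path_value ON path_index (path, value);
"""


def open_index_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_locatable(conn, ehr_id, body, indexed_paths):
    """Store the object and one path_index row per indexed (path, value)."""
    cur = conn.execute(
        "INSERT INTO locatable (ehr_id, body) VALUES (?, ?)", (ehr_id, body)
    )
    locatable_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO path_index VALUES (?, ?, ?)",
        [(p, str(v), locatable_id) for p, v in indexed_paths.items()],
    )
    conn.commit()
    return locatable_id


def find_ehrs(conn, path, value):
    """Return the EHR ids that have an object with this value at this path."""
    rows = conn.execute(
        "SELECT DISTINCT l.ehr_id FROM path_index p "
        "JOIN locatable l ON l.id = p.locatable_id "
        "WHERE p.path = ? AND p.value = ? ORDER BY l.ehr_id",
        (path, str(value)),
    )
    return [row[0] for row in rows]
```

Values are stored as text here to keep the sketch short; a real index would need typed columns to support range comparisons.
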

But these are quick estimates, I admit.

When I look at the AQL roadmap, it is not something that will be hot in
the coming year.

Bert

Hi Thomas,

> - One thing that I'm a bit confused about is the actual relation
> between an archetype and the reference model. Say that you have an
> INSTRUCTION archetype, then persisting this consists of persisting
> data that is required/mandatory/optional in the reference model
> definition of an INSTRUCTION, accompanied by all the data that is added
> by the archetype. Right? So in a sense, this feels like an open
> invitation to map the reference model to your database model, and
> have the archetype-defined data stored in your BLOB-like database?
> What are your ideas on this?
>
> To be correct, we talk of 'persisting reference model data, built
> according to an archetype'. An archetype is a constraint statement - if
> you use it to build data, it is a bit like using a multi-dimensional
> cookie cutter - you don't store the cookie cutter, you store the cookies
> you make with it. Of course, the cookie cutter itself does have to be
> stored somewhere, but that is in a knowledge base, not the operational
> EHR database.

No no, I'm not talking about storing the cookie cutter, but indeed about storing the cookie... but
the raisins that are on the cookie are not specified by the cookie cutter, although in the end
it all ends up in the cookie box, of course (and is then eaten)...

Back to the data captured based on an archetype... part of the data is specified by the
archetype kind (say, INSTRUCTION, EVALUATION or whatever CARE_ENTRY), and part of the
data comes from the archetype specification itself...

I was just wondering whether that doesn't naturally imply a relational approach to the RM, and an XML/BLOB approach
to the rest of the data captured by an archetype...

> - In your database model, you tend to have everything expressed
> explicitly (i.e. sort of pure relational). Am I right?
>
> this may be true of some implementations, but certainly not all. For
> some further ideas on persistence of archetyped data, see
> http://www.openehr.org/wiki/pages/viewpage.action?pageId=786487

Thanks for the pointer!

Nancy