Request: openEHR temporal data instance needed ...

hi list,

for my research on temporal data management of hierarchical medical data i am looking for an openEHR data instance that can be used as a running example to work with.

the data instance should have the following properties:

* contain time intervals, e.g. a history of patient's symptoms, history of medications etc.
* the time intervals should possibly overlap.
* aggregates like count, min, max, avg over the data (e.g. number of medications taken, avg of an observed value, etc.) should be possible.
* the event context start time should possibly be after the time intervals in the data or at least start after the first time interval in the data.
* the data instance should be in openEHR's XML format

does anybody have such a data instance and the corresponding archetype?
or at least, can anybody give me a clue as to which archetypes have instances that satisfy the above properties?

cheers
bruno

hi again,

since i wrote my request to the list regarding openehr data instances i have browsed the archetype repository and found some archetypes whose instances, i think, could satisfy my temporal requirements.

openEHR-EHR-COMPOSITION.prescription.v1.adl:

in the lower level archetype openEHR-EHR-ITEM_TREE.medication.v1 the dates of the first and last administration can be recorded, which together form an interval. an instance of this archetype can record more than one medication, thus the administration intervals might overlap. the number of medications and other aggregates can be calculated over the recorded medications. the administration interval might not be contained in the context time.

the same applies to the archetypes below, i.e. they have lower level archetypes which can record data over time intervals and over which aggregates can be calculated.

openEHR-EHR-COMPOSITION.problem_list.v1draft.adl
openEHR-EHR-COMPOSITION.report.v1.adl
openEHR-EHR-COMPOSITION.history-medical_surgical.v1draft.adl

what do you think?
are my interpretations of those archetypes right?
does anybody have example instances of those archetypes that s/he can give me?
i have some instances, but maybe somebody has better ones, maybe even real ones.

cheers
bruno

Bruno Cadonna wrote:

Hi Bruno,

I think you are probably on the right track. The COMPOSITION
archetypes are containers which will include a number of different
clinical statements such as medication, diagnosis, lab results, all or
any of which might have associated overlapping timings. It would be
helpful to know a little more of what you mean by "temporal data
management of hierarchical medical data" to help us give better
advice.

Ian

Dr Ian McNicoll
Consultant - Ocean Informatics

Hi Ian,

I am sorry for my explanation being too vague.

Temporal data management is a field of data management focusing on the temporal aspect of data. Given temporal data, i.e. data with one or more time dimensions, operations like temporal joins and temporal aggregates are different compared to their counterparts for non-temporal data.
To make things clearer, the following example describes one type of temporal aggregation called instant temporal aggregation:

A patient was prescribed 2 medications. The first medication was prescribed for the interval 0 to 10, the second medication was prescribed for the interval 5 to 15. Assume you want to know how many medications were prescribed for this patient over time. First, you have to compute the time intervals over which the data does not change. This operation is called time slicing. In this example the constant intervals after time slicing are:
[0, 4]
[5, 10]
[11, 15]
The second step in temporal aggregation is to calculate the aggregate value -- in this case the number of medications -- for each constant interval, which are:
[0, 4], 1
[5, 10], 2
[11, 15], 1
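
The two steps above can be sketched in a few lines of Python. This is only a minimal illustration, assuming closed intervals over discrete integer time (as in the example); the function name `instant_count` is made up for this sketch and is not part of any openEHR tool.

```python
def instant_count(intervals):
    """Instant temporal COUNT over closed integer intervals [start, end].

    Returns a list of ((slice_start, slice_end), count) for every constant
    interval that is covered by at least one input interval.
    """
    # Step 1 (time slicing): every start, and every instant right after an
    # end, is a boundary where the set of valid intervals may change.
    deltas = {}
    for start, end in intervals:
        deltas[start] = deltas.get(start, 0) + 1
        deltas[end + 1] = deltas.get(end + 1, 0) - 1

    # Step 2 (aggregation): sweep over the boundaries in time order and
    # keep a running count of the intervals that are currently valid.
    result = []
    active = 0
    points = sorted(deltas)
    for point, next_point in zip(points, points[1:]):
        active += deltas[point]
        if active > 0:
            result.append(((point, next_point - 1), active))
    return result


prescriptions = [(0, 10), (5, 15)]
for (start, end), count in instant_count(prescriptions):
    print(f"[{start}, {end}], {count}")
```

For the two prescriptions above this prints the same three lines as the hand-computed result.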

In this example there are only two intervals, but there might be many more with many more overlapping sections, which makes the computation challenging. Instant temporal aggregation is just one type of temporal aggregation; there are others.
With relational DBMSs you can do temporal aggregation using complicated SQL queries; however, those are rather inefficient. In the last two decades researchers in temporal data management have come up with temporal data models, temporal operations and corresponding efficient algorithms, mainly for the relational data model. My research focuses on temporal data management on hierarchical data, like XML. Since I like the openEHR idea and I have worked in Health Informatics for the last few years, I would like to use openEHR data instances.
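
To give an impression of what such a plain SQL formulation can look like, here is one possible version of the medication count, run against an in-memory SQLite database from Python. The table `prescription` and its columns are hypothetical, the intervals are closed integer intervals as in the example, and this is just one of several ways to write the query, not a reference solution.

```python
import sqlite3

# Boundary points are derived from the data, turned into slices with a
# correlated subquery, and then joined back to the base table.
ITA_COUNT_SQL = """
WITH points AS (
    SELECT t_start AS p FROM prescription
    UNION
    SELECT t_end + 1 FROM prescription
),
slices AS (
    SELECT p1.p AS s_start,
           (SELECT MIN(p2.p) FROM points p2 WHERE p2.p > p1.p) - 1 AS s_end
    FROM points p1
)
SELECT s.s_start, s.s_end, COUNT(*) AS n
FROM slices s
JOIN prescription r
  ON r.t_start <= s.s_start AND r.t_end >= s.s_end
WHERE s.s_end IS NOT NULL
GROUP BY s.s_start, s.s_end
ORDER BY s.s_start
"""


def count_by_sql(rows):
    """Load (med, t_start, t_end) rows and run the aggregation query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE prescription (med TEXT, t_start INTEGER, t_end INTEGER)"
    )
    conn.executemany("INSERT INTO prescription VALUES (?, ?, ?)", rows)
    result = conn.execute(ITA_COUNT_SQL).fetchall()
    conn.close()
    return result


print(count_by_sql([("A", 0, 10), ("B", 5, 15)]))
```

Even for a simple count, the query needs a union, a correlated subquery and a join on inequality conditions, which is the kind of complexity meant above.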

At the moment I am looking for a sound running example, which is clinically relevant and needs temporal data management. I was thinking about some temporal aggregates over a prescription list, a problem list or some other archetyped data with potentially overlapping time intervals. If somebody has an idea, s/he is really welcome.

I hope things are clearer now.

Cheers
Bruno

Ian McNicoll wrote:

Hi Bruno,

Your research topic is very interesting and I believe the openEHR architecture
eases hierarchical temporal data management. I don't have openEHR data
instances which satisfy your requirements. However, if you or anybody else has
a real case scenario, I would be able to generate the instances for you.

Cheers,

Chunlan

Hi Chunlan,

Thank you for being willing to help me with the data instances.
That is great and I really appreciate it.

I have a question regarding your last post.
Could you explain why you believe that the openEHR architecture eases hierarchical temporal data management?

Cheers
Bruno

Chunlan Ma wrote:

Hi Bruno,

The reason I believe the openEHR architecture eases hierarchical temporal
data management is that I think the archetypes and semantic query
language that openEHR offers would smooth the process of deploying your
temporal data management model. It has nothing to do with the development
of the management model itself.

I guess that your current research is to develop the mathematical model
dealing with all sorts of temporal data. Data quality, data
representation/data schema, and data retrieval may be outside your research
scope. However, these factors have to be taken into account when your model
is deployed in a real system.

At the moment, I see archetypes as the most flexible and powerful technology
for representing complicated clinical data, including temporal data. Both
archetypes and AQL (Archetype Query Language) are separate from system
implementation. They are reusable and sharable across institutions. They can
be used to represent and retrieve the data sources/data inputs of your
model.

Cheers,
Chunlan

Hi Chunlan,

You are right, when you write archetyped data can be used to represent the data sources/data inputs of my research, but I am not sure about the retrieval of the data.

If we look at figure 37 of the openEHR overview document, my research focuses on the persistence layer rather than on the upper layers. I believe costly computations with very large data -- and I assume EHRs will contain very large data -- should be done in the persistence layer using the capabilities of the underlying data model and database management system. I consider my research related to the implementation-dependent part of temporal queries on openEHR data.

If I understood it right, AQL is a domain specific query interface to the openEHR system. Looking again at the figure mentioned above and at the service model definition, it is not clear to me how exactly a query should be answered:
* Does the query engine chop the query into minimal pieces and mainly process the data in the application logic layer, or
* does the query engine minimally preprocess the query and forward it to the back-end service layer, which forwards it further to the persistence layer, where the query is translated into the native query language of the database management system and executed? The results would go the inverse way with all needed postprocessing.

I believe the second variant is more efficient than the first, especially for aggregate queries.
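
A rough sketch of the two variants for a simple average, with SQLite standing in for the persistence layer. Everything here is an assumption made for illustration: the `observation` table, its columns and the function names are hypothetical, and nothing is taken from an actual openEHR implementation.

```python
import sqlite3


def make_store(rows):
    """Create a toy persistence layer holding (patient_id, value) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE observation (patient_id TEXT, value REAL)")
    conn.executemany("INSERT INTO observation VALUES (?, ?)", rows)
    return conn


def avg_in_application_layer(conn, patient_id):
    # Variant 1: fetch every matching row, then aggregate outside the store.
    cursor = conn.execute(
        "SELECT value FROM observation WHERE patient_id = ?", (patient_id,)
    )
    values = [value for (value,) in cursor]
    return sum(values) / len(values) if values else None


def avg_in_persistence_layer(conn, patient_id):
    # Variant 2: translate the aggregate into the store's native query
    # language so that only the final result leaves the store.
    (average,) = conn.execute(
        "SELECT AVG(value) FROM observation WHERE patient_id = ?", (patient_id,)
    ).fetchone()
    return average


store = make_store([("p1", 120.0), ("p1", 130.0), ("p2", 110.0)])
print(avg_in_application_layer(store, "p1"))
print(avg_in_persistence_layer(store, "p1"))
```

Both functions return the same value; the difference is how many rows have to be moved out of the store before the aggregate is known.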

I hope I did not misunderstand you.

Cheers
Bruno

Chunlan Ma wrote:

Hi Bruno,

> If I understood it right, AQL is a domain specific query interface to
> the openEHR system. Looking again on the figure mentioned above and on
> the service model definition, it is not clear to me how exactly a query
> should be answered:

[Chunlan Ma]
AQL is an archetype specific query language.

> * Does the query engine chop the query in minimal pieces and mainly
> process the data in the application logic layer or
> * does the query engine minimally preprocess the query, forward it to
> the back-end service layer which forwards it further to the
> persistence layer where the query is translated into the native query
> language of the database management system and executed? The results
> would go the inverse way with all needed postprocessing.

[Chunlan Ma]
An AQL query engine can be implemented in various ways and each approach
would have its own pros and cons. For instance, a query engine can be fully
independent of the persistence layer, but the trouble is that processing
may be slow. Your second option would reduce the workload of the query
engine and improve efficiency, but you need to think about how to keep
system maintenance at a low level.

No matter how the query engine and the persistence layer are
implemented, the AQL query statement must be the same and the returned
result set must be the same if the query is executed against the same data.

At the moment, we haven't done much work on AQL temporal queries. Only at
the archetype level, can you please list some of your requirements, i.e.
what sorts of data you want to retrieve?

Cheers,
Chunlan

Hi Chunlan,

> AQL is an archetype specific query language.

Ok, I meant the same, I just used wrong terminology.

> No matter how a query engine is implemented and how persistent layer is
> implemented, the AQL query statement must be the same and the returned
> result set must be same if the query is executed against same data.

That is completely clear to me and I think this is really really an important requirement.

> ... but you need to think about how to keep
> the system maintenance at a low level.

I agree with you. My research will focus more on the effectiveness, though.

> At the moment, we haven't done much work on AQL temporal queries. Only at
> the archetype level, can you please list some of your requirements, i.e.
> what sorts of data you want to retrieve?

As I wrote in my last post, I am working on the persistence layer leveraging database technology to answer temporal queries on data in openEHR format. My results will hopefully be useful for future query engines for AQL.
From the AQL language definition viewpoint, my most important requirement is the definition of syntactical constructs for temporal operators like sliding temporal window, instant temporal aggregation, etc. Probably the functionality of those operators could also be expressed in plain AQL, hiding the complexity of the queries in query tools. So, why do we need temporal operators in the language definition? If no temporal operators are defined, the query engine will always use the same algorithm to process a query, regardless of whether the query is temporal or not. In contrast, if the query engine is able to recognise a temporal operator, it could use an algorithm specially developed for that operator, hopefully leading to a shorter runtime.
I am at the beginning of my research, so I do not yet know exactly what such temporal operators for AQL could look like, but I would be glad to collaborate with the openEHR community to define some.
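
To illustrate the dispatch idea only (this is not a proposal for AQL syntax), here is a toy sketch in Python. The operator name `INSTANT_COUNT`, the dictionary used as a parsed query and all function names are invented for this example. Both plans return the same result, so the query semantics stay independent of the plan chosen.

```python
from bisect import bisect_left, bisect_right


def count_generic(intervals, instants):
    """Generic plan: re-scan every interval for every requested instant."""
    return {t: sum(1 for s, e in intervals if s <= t <= e) for t in instants}


def count_specialised(intervals, instants):
    """Specialised plan: sort the endpoints once, then binary-search."""
    starts = sorted(s for s, _ in intervals)
    ends = sorted(e for _, e in intervals)
    # Intervals that started by t, minus intervals that ended before t.
    return {t: bisect_right(starts, t) - bisect_left(ends, t) for t in instants}


def execute(query, intervals):
    # `query` is a hypothetical parsed query; "INSTANT_COUNT" is a made-up
    # operator name that lets the engine recognise the temporal aggregate.
    if query.get("operator") == "INSTANT_COUNT":
        return count_specialised(intervals, query["instants"])
    return count_generic(intervals, query["instants"])


prescriptions = [(0, 10), (5, 15)]
instants = [3, 7, 12, 20]
print(execute({"operator": "INSTANT_COUNT", "instants": instants}, prescriptions))
print(execute({"instants": instants}, prescriptions))
```

Without the explicit operator the engine has no cheap way to tell that the specialised plan applies, which is the argument for having such operators in the language.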

Cheers
Bruno

Chunlan Ma wrote: