persistence

HAPPY NEW YEAR!!!

My question is about a persistence layer for openEHR Java.
I was wondering if anybody could give me a link to one.

Also, is it possible to generate a Hibernate mapping for the openEHR Java classes, for use as a DB schema? In other words, do the openEHR classes include a data model layer?

(Apologies if it's a stupid question, but could someone kindly elucidate?)

Cheers,
Ime

Hi Ime,

There is some information on the developer's wiki
http://www.openehr.org/wiki/display/dev/Developers+Home

I personally believe (though I haven't yet proved it) that it is much
more efficient to use an object database. The gains would come from code
reduction (30%?) and from easier, faster queries.

A project for another day is for me to write this up and post it on the
wiki as well.

Cheers,
Tim

Ime Asangansi schreef:

HAPPY NEW YEAR!!!

A bit early (13 hours to go until the new year over here), but no problem.

There are many possibilities for persistence layers, and all solutions have
pros and cons, for example:
- as Tim suggested, an object database
- Hibernate
- mimic the Hibernate architecture in code (this is part of my solution;
it is a lot of work, but I have everything under control)
- BLOBs in a relational database
- a hybrid between BLOBs and relational tables
- dump objects to XML
- more hybrids are possible...
- add extra indexes for OLAP

etc., etc.

It also depends on what your purpose is, and will be, and on scalability
(small scale gives you possibilities that are better avoided at
large scale).

But whatever you choose, you have to buy it or build it yourself,
because no one is publishing their solution.
If you build one yourself, I would advise you to take the
Java code as a good source of inspiration, but feel free to change
small but important things.

The immutability in the Java kernel can sometimes stand in your way.
When this happens to you, my solution is: use what is usable for you,
but do not let your choice restrict you to that choice.
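A common way around this, shown here only as a minimal sketch under assumptions, is to leave the immutable kernel classes untouched and map them to and from a separate mutable record that persistence code (an ORM or a hand-written loader) can fill field by field. `Quantity`, `QuantityRecord` and `QuantityMapper` are hypothetical names; `Quantity` merely stands in for an immutable kernel class and is not the openEHR Java API.

```java
import java.util.Objects;

/** Stand-in for an immutable kernel class (hypothetical, not the real openEHR Java API). */
final class Quantity {
    private final double magnitude;
    private final String units;

    Quantity(double magnitude, String units) {
        // Invariants are checked once, at construction; there are no setters.
        this.units = Objects.requireNonNull(units, "units");
        this.magnitude = magnitude;
    }

    double getMagnitude() { return magnitude; }
    String getUnits() { return units; }
}

/** Mutable, persistence-friendly record: no-arg constructor plus setters, as ORMs expect. */
class QuantityRecord {
    private Long id;          // surrogate key, belongs to the storage side only
    private double magnitude;
    private String units;

    QuantityRecord() { }

    Long getId() { return id; }
    void setId(Long id) { this.id = id; }
    double getMagnitude() { return magnitude; }
    void setMagnitude(double magnitude) { this.magnitude = magnitude; }
    String getUnits() { return units; }
    void setUnits(String units) { this.units = units; }
}

/** Converts between the two, so the kernel class never has to change. */
final class QuantityMapper {
    private QuantityMapper() { }

    static QuantityRecord toRecord(Quantity q) {
        QuantityRecord r = new QuantityRecord();
        r.setMagnitude(q.getMagnitude());
        r.setUnits(q.getUnits());
        return r;
    }

    static Quantity fromRecord(QuantityRecord r) {
        // The immutable object is rebuilt in one step, which re-runs its invariant checks.
        return new Quantity(r.getMagnitude(), r.getUnits());
    }
}
```

The cost is one mapper per persisted class; the benefit is that every load goes through the kernel constructor, so its invariants stay enforced.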

I built a persistence layer, first in Hibernate, but there were
some problems, so I switched to my own solution.

The best thing (IMHO) to achieve would be a small, abstract persistence
layer, open-sourced, to which the kernel can connect from above, and below
which very many possible solutions can connect.

This would have some strong advantages:
- the reference kernel could be used unchanged by everyone; it could
really be shared code, which, for now, in my case, it is not
- design problems that affect the kernel code, especially its
immutability, could be solved; they are not solved now.
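A minimal sketch of such an abstract layer, assuming a simple append-only versioning model (loosely inspired by openEHR's versioned objects; all interface and class names are hypothetical): the kernel-side `EhrRepository` depends only on `VersionedStore`, so an in-memory store, a Hibernate-backed one or an object-database adapter could be plugged in underneath without touching it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical back-end-neutral API: the kernel sees only this interface. */
interface VersionedStore<T> {
    /** Appends a new version and returns its version number (1-based). */
    int commit(String objectId, T value);

    /** The latest version, if the object exists. */
    Optional<T> latest(String objectId);

    /** All versions, oldest first; empty if the object is unknown. */
    List<T> history(String objectId);
}

/** Simplest possible back-end: everything in memory (useful for tests and demos). */
class InMemoryVersionedStore<T> implements VersionedStore<T> {
    private final Map<String, List<T>> versions = new HashMap<>();

    @Override
    public synchronized int commit(String objectId, T value) {
        List<T> list = versions.computeIfAbsent(objectId, k -> new ArrayList<>());
        list.add(value);
        return list.size();
    }

    @Override
    public synchronized Optional<T> latest(String objectId) {
        List<T> list = versions.get(objectId);
        return (list == null || list.isEmpty()) ? Optional.empty() : Optional.of(list.get(list.size() - 1));
    }

    @Override
    public synchronized List<T> history(String objectId) {
        List<T> list = versions.get(objectId);
        return list == null ? Collections.emptyList() : Collections.unmodifiableList(new ArrayList<>(list));
    }
}

/** Kernel-side code depends on the interface only, so back-ends can be swapped freely. */
class EhrRepository {
    private final VersionedStore<String> store;

    EhrRepository(VersionedStore<String> store) { this.store = store; }

    int save(String compositionId, String serialisedComposition) {
        return store.commit(compositionId, serialisedComposition);
    }

    String load(String compositionId) {
        return store.latest(compositionId)
                .orElseThrow(() -> new IllegalArgumentException("unknown composition: " + compositionId));
    }
}
```

Only the in-memory back-end is shown; any other back-end would implement the same three methods.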

I suggest this on this list every three months, but the discussion
always dies without any reason being stated.

It is not that I am giving up. I can go my own way, no problem: the specs
are in the documents, not in the reference Java implementation. I use what I
can use from the kernel, and I thank Rong and others for their great work,
but it is a pity we cannot work together.

I believe it is rather sensitive, because when you publish a
persistence layer, you have a full-blown product which others can use.
I think people fear putting themselves out of business if they publish
too much knowledge. That, I believe, is the reason why the
discussions about this subject often die.

Those are my feelings about the situation, there is a chance that I am
wrong.

Kind regards, and have a happy new year, in about 13 hours.
Bert Verhees

I just discovered that this page says more or less what I wrote:

http://www.openehr.org/wiki/pages/viewpage.action?pageId=786487

Bert

Thanks a lot Bert and Tim and others,

Tim, it will be interesting to explore your option. Object DBs are getting more popular now.

Bert, thanks for being very frank about the issues. I will think over the possibilities (vis-a-vis my still-small programming experience). I would support your open-source approach. I am (still) thinking of working on an open source project. I don't think anything stops anyone (license-wise, on the openEHR side) from making an open persistence product.

Another option might be XML persistence, Xindice-based, but I fear speed issues with XML (in large DBs).

Cheers,
Ime

Bert Verhees wrote:

I believe it is rather sensitive, because when you publish a
persistence layer, you have a full-blown product which others can use.
I think people fear putting themselves out of business if they publish
too much knowledge. That, I believe, is the reason why the
discussions about this subject often die.


I would suggest various reasons:

  • building an enterprise-usable persistence layer - one that is highly performant, reliable (in terms of data integrity) and scalable - is a serious endeavour. It requires real investment, not just for the design and implementation, but for load and performance testing. Trying to do it open source is likely to be a slow project, because it requires concentrated ongoing effort.

  • there is no such thing as the perfect back-end for all use contexts, so a single mighty build-it-once-forever open source solution is likely to be flawed from the outset. What would be useful as open source is the binding layer containing an agreed API, e.g. an object db style API, or my current node+path idea (http://www.openehr.org/wiki/pages/viewpage.action?pageId=786487).

  • because of the different needs of different contexts, commercial implementations are more likely to be the right business model, as long as they conform to the interfaces demanded by the community, possibly by incorporating lightweight open source components.

Therefore one of the needs of openEHR (and of information processing in general) is a standardised persistence layer API that provides the right kind of semantics without predetermining limiting choices to do with technology, performance or scalability too much. We have recognised this for a long time in openEHR, but have not had the resources to implement it. I would describe the levels of interfaces needed as follows:

  • a virtual EHR API - this is a fine-grained application interface for talking to an openEHR system, including building, saving and retrieving EHR data. It does not contain any persistence primitives, but provides the main interface for any application writer, who should be able to largely forget about everything else

  • an EHR service interface - this is a coarse-grained interface that knows about Compositions, Contributions, queries

  • a persistence layer that is archetype- (i.e. path-) enabled, in the sense of the node+path model described above.

For the first two we have developed working versions in our own products, and will release the entire interfaces soon in documentary form. There is not much secret here, and we would expect an openEHR 'standard' for these interfaces to be developed by integrating such APIs built by anyone in the community, or defining alternative/componentised APIs, if it makes sense in some cases. The normal community process is the correct vehicle for doing this (i.e. discussion, proposals, Change Requests, ARB review etc).
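The three levels can be sketched as Java interfaces. This is a hypothetical illustration, not the interfaces referred to above: compositions are reduced to maps from path to string value, a contribution is just a batch of compositions, and `TreeMapPathStore` stands in for a real path-enabled database.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.SortedMap;
import java.util.TreeMap;

/** Level 3 (hypothetical): path-enabled storage; knows about paths, not about openEHR. */
interface PathStore {
    void put(String path, String value);
    Optional<String> get(String path);
    /** All entries whose key starts with the given prefix, in key order. */
    SortedMap<String, String> under(String prefix);
}

/** Level 2 (hypothetical): coarse-grained service dealing in whole compositions. */
interface EhrService {
    /** Stores a batch of compositions (uid to path to value) as one contribution; returns its id. */
    String commitContribution(String ehrId, Map<String, Map<String, String>> compositionsByUid);
    /** Returns one composition as a path-to-value map (empty if unknown). */
    Map<String, String> getComposition(String ehrId, String compositionUid);
}

/** Level 1 (hypothetical): fine-grained API for application writers; no persistence primitives. */
class VirtualEhr {
    private final EhrService service;
    private final String ehrId;
    private final Map<String, Map<String, String>> pending = new LinkedHashMap<>();

    VirtualEhr(EhrService service, String ehrId) {
        this.service = service;
        this.ehrId = ehrId;
    }

    /** Sets one value in a composition under construction, addressed by path. */
    VirtualEhr set(String compositionUid, String path, String value) {
        pending.computeIfAbsent(compositionUid, k -> new TreeMap<>()).put(path, value);
        return this;
    }

    /** Sends everything built so far to the service as a single contribution. */
    String save() {
        String contributionId = service.commitContribution(ehrId, new LinkedHashMap<>(pending));
        pending.clear();
        return contributionId;
    }
}

/** Trivial level-3 back-end: a sorted map keyed by full path. */
class TreeMapPathStore implements PathStore {
    private final TreeMap<String, String> data = new TreeMap<>();

    @Override public void put(String path, String value) { data.put(path, value); }

    @Override public Optional<String> get(String path) { return Optional.ofNullable(data.get(path)); }

    @Override public SortedMap<String, String> under(String prefix) {
        // Keys sharing the prefix sort between the prefix and prefix + Character.MAX_VALUE.
        return Collections.unmodifiableSortedMap(new TreeMap<>(data.subMap(prefix, prefix + Character.MAX_VALUE)));
    }
}

/** Level 2 implemented on top of level 3: composition nodes become "ehrId/uid/path" keys. */
class PathStoreEhrService implements EhrService {
    private final PathStore store;
    private int contributionCount = 0;

    PathStoreEhrService(PathStore store) { this.store = store; }

    @Override
    public String commitContribution(String ehrId, Map<String, Map<String, String>> compositionsByUid) {
        compositionsByUid.forEach((uid, nodes) ->
                nodes.forEach((path, value) -> store.put(ehrId + "/" + uid + path, value)));
        contributionCount++;
        return ehrId + "::contribution-" + contributionCount;
    }

    @Override
    public Map<String, String> getComposition(String ehrId, String compositionUid) {
        String prefix = ehrId + "/" + compositionUid;
        Map<String, String> result = new TreeMap<>();
        store.under(prefix + "/").forEach((key, value) -> result.put(key.substring(prefix.length()), value));
        return result;
    }
}
```

An application writer only touches `VirtualEhr`; swapping the level-3 store would not change the code above it.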

The 3rd layer above is not standardised at all, and would not have to be openEHR-specific, but needs to know minimally about paths. Creating a specification for this could be useful for creating archetype-based information processing systems in general. This could be done by the same process, but will undoubtedly take longer since more implementation-based evidence is required.
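What "knowing minimally about paths" could mean in practice: a sketch with hypothetical `Node` and `PathFlattener` classes (not the kernel's classes) that walks a tree of nodes and emits one (path, value) row per leaf, using the archetype-path style `attribute[nodeId]`. It assumes siblings are distinguished by their node ids; real data with repeated siblings sharing a node id would need extra predicates in the path.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal tree node (hypothetical): attribute name, optional node id, optional leaf value. */
final class Node {
    final String attribute;   // e.g. "items"
    final String nodeId;      // e.g. "at0004"; null when the node has no archetype node id
    final String value;       // leaf value; null for interior nodes
    final List<Node> children = new ArrayList<>();

    Node(String attribute, String nodeId, String value) {
        this.attribute = attribute;
        this.nodeId = nodeId;
        this.value = value;
    }

    Node add(Node child) {
        children.add(child);
        return this;
    }

    /** Path segment in the style attribute[nodeId]. */
    String segment() {
        return nodeId == null ? "/" + attribute : "/" + attribute + "[" + nodeId + "]";
    }
}

final class PathFlattener {
    private PathFlattener() { }

    /** Walks the tree depth-first and returns one (path, value) row per leaf, in visiting order. */
    static Map<String, String> flatten(Node root) {
        Map<String, String> rows = new LinkedHashMap<>();
        walk(root, "", rows);
        return rows;
    }

    private static void walk(Node node, String parentPath, Map<String, String> rows) {
        String path = parentPath + node.segment();
        if (node.value != null) {
            rows.put(path, node.value);
        }
        for (Node child : node.children) {
            walk(child, path, rows);
        }
    }
}
```

Each row could then be stored as a plain (path, value) record, which is all a generic path-enabled layer needs to understand.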

Lastly, implementing a highly performant database layer is largely a matter of experience. Beginners may think that you can just plug an object model into a Hibernate tool and generate a schema, but it will be dreadfully inefficient. Object databases, as Tim has said, are better aligned semantically to the kind of data we are dealing with, although they generally don't know much about paths. XML-enabled databases may in fact be a better rather than a worse bet than a straight relational design, due to their path capabilities, although I have no direct experience with them (and I agree that almost anything to do with XML is sadly compromised at the outset by the terrible inefficiency of XML itself - but nevertheless, using XML in our own SQL Server system is surprisingly fast).
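To illustrate the path argument for XML-enabled storage, here is a sketch using the JDK's standard `javax.xml.xpath` API: a simple archetype-style path is rewritten into an XPath predicate on an `archetype_node_id` attribute and evaluated against a composition fragment. The XML shape is simplified and assumed, and the regex rewrite only handles plain `atNNNN` node ids.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class XmlPathQuery {
    private XmlPathQuery() { }

    /** Rewrites e.g. /data[at0001]/items[at0004]/value into XPath on archetype_node_id attributes. */
    static String toXPath(String archetypePath) {
        return archetypePath.replaceAll("\\[(at\\d+(?:\\.\\d+)*)\\]", "[@archetype_node_id='$1']");
    }

    /** Returns the text at the given archetype-style path, or "" if nothing matches. */
    static String valueAt(String xml, String archetypePath) {
        try {
            // For untrusted input the parser should be hardened (e.g. external entities disabled).
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (String) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(toXPath(archetypePath), doc, XPathConstants.STRING);
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("cannot query " + archetypePath, e);
        }
    }
}
```

An XML-enabled DBMS would typically evaluate comparable XPath expressions server-side over its own storage, instead of parsing each document in the application as this sketch does.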

There are two main positions for people in this community to take:

  • front-end people - application developers who just want to get real systems up and running,
  • back-end people - people who want to engineer enterprise ready openEHR back-ends

The latter job is the expensive one, and I recommend that people think carefully before just jumping in. This is not to discourage competition, diversity etc. - it is simply that the world is littered with dead projects which turned out to be 10x harder than the developers thought. I therefore believe that the best approach is to find the correct balance between standards / open source, and commercial implementations based on those standards. Defining these API and service layers is a key part of that approach, and is how commercial implementations can be kept honest. By the way, commercial implementations may or may not be open source.

In the end, our approach in openEHR is overturning some basic beliefs about how to build information systems, so we probably have to accept that making the whole thing successful will not be instant. We seem to be making good progress however, and I think the effort will be worthwhile.

- thomas beale

Hi everybody,

just a short note:

I am more of a front-end person (I plan to start an OSS GUI project in
2008), although I have a vested interest in an open persistence
solution, since I would like to see an end-to-end system demonstrator
based on OSS components (GUI, kernel, persistence). IMO (and in Rong's
opinion, etc.) this would really boost openEHR.

I believe generic persistence layer API(s) - as Tom said - are the
way to go. This won't happen in one go. So in a truly agile
development style this has to happen over several iterations, while
every iteration's product has to be usable!

The reason for this post is that I recently investigated the IBM DB2 9
DBMS. This could be a good starting point or reference to build the
API layer on.

Reasons for IBM DB2 9 DBMS
(http://www-306.ibm.com/software/data/db2/express/) are:
- the Express-C version is free and only has restrictions regarding
the hardware (max 2 CPUs and 2 GB RAM), whereas the 'Express'
versions from Oracle and Microsoft limit the DB size etc.
- it is a long-established enterprise SQL DBMS, now with additional native
XML support (the pureXML feature)
- it seems to have good documentation (2 books and several recent
articles) and an active community
- there is a well-maintained performance testing toolkit for it
(http://tpox.sourceforge.net/).
- the Express-C license can even be used commercially! If the hardware
limitations become a problem, an upgrade can be bought without having
to change the underlying code.

Cheers, Thilo

If you need any help, let me know.

Bert

For openEHR I will concentrate on the GUI part. Had to investigate it
for a uni project.

Just wanted to let everybody know about IBM DB2 9.5, which I think is
a fair, "uncrippled" offer.

Thilo Schuler schreef:

oh

?!?!?

Sorry, I clicked Send accidentally.

I hope someone will pick up this subject; I would be glad to help.

So, whether someone considers DB2 or any other DB shouldn't matter;
there are many free DB engines, and the choice should not be visible in the API.

Bert

True, the persistence layer API should be generic, as previously
mentioned. Initially, though, it needs to be developed against a
reference DBMS, and for this DB2 looks attractive (quick results?) at
first sight.

Not any more than, say, PostgreSQL. Actually less so, some are likely to argue.

I, for one, wouldn't bother trying to get DB2 to work on my Debian systems were I to test-drive an OpenEHR persistence engine. Other equally capable and more conveniently licensed DB engines come bundled with many a self-respecting operating system.

Karsten

Karsten Hilbert schreef:

Not any more than, say, PostgreSQL. Actually less so, some are likely to argue.

DB benchmarking is very difficult and depends on many factors.
It is mostly done per sub-functionality.
For example, MySQL knows two table types and three index types, and
maybe you can add third-party software too.
Also, it runs on BSD, Linux, Windows, and maybe Mac (I don't know about
the last), and how many flavours of BSD, Linux and Windows are out
there, which filesystems are used, how do you connect, etc.
Here is an example of benchmarking:
http://polepos.sourceforge.net/results/html/index.html

Let's just say that most of the popular DB engines do a good job; that
is my experience. Speed is not the only thing that matters: connectivity,
availability, CPU use, memory use, ANSI SQL compliance and other features
count too.

Which database suits your purpose best depends on many factors.
An important factor is the schema you use.

Hi Thilo

I would think that Postgres or MySQL or even Firebird would be more open-source friendly for such a project. Nothing against DB2, but the open source engines have no limitations at all and are easy to run on any platform. Lots of source code, examples, books etc.

regards Hugh

Thilo Schuler wrote:

Hi Hugh

I know that Postgres and MySQL do a fantastic job with
relational data, but I am not sure about their XML capabilities (which
I think will be needed for the fine-grained openEHR compositions).
This is where DB2 is supposed to perform well, due to its "pureXML"
technology that acts like a native XML DB. Here is a serious test
result: http://www.intel.com/cd/ids/developer/asmo-na/eng/328960.htm
Regarding books
(http://publib-b.boulder.ibm.com/redbooks.nsf/Redbooks?SearchView&Query=db2&SearchMax=4999),
examples (http://www-306.ibm.com/software/data/db2/ad/) and community
(http://www-306.ibm.com/software/data/db2/express/community.html),
it doesn't look too bad...

Anyway, it was just an idea, and the people who will work on the backend
will have to decide. But for the demonstrator I could imagine a
quick-and-dirty persistence solution based on a single table
with a couple of columns for searching and an XML column for the
composition data, like Tom depicted in
http://www.openehr.org/wiki/pages/viewpage.action?pageId=786487 as
"basic serialisation". For this simple approach, good XML features are
very important.
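A rough in-memory model of that single-table layout, assuming hypothetical column names: three plain columns that queries filter on, plus the whole composition kept as opaque XML. In a real DBMS the plain columns would be indexed and the XML column would use the database's XML type; here a list and a stream filter stand in for both.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** One row of the hypothetical single table. */
final class CompositionRow {
    final String ehrId;          // searchable column
    final String archetypeId;    // searchable column, e.g. the composition's root archetype
    final Instant committedAt;   // searchable column
    final String xml;            // serialised composition, opaque to the filters

    CompositionRow(String ehrId, String archetypeId, Instant committedAt, String xml) {
        this.ehrId = ehrId;
        this.archetypeId = archetypeId;
        this.committedAt = committedAt;
        this.xml = xml;
    }
}

final class CompositionTable {
    private final List<CompositionRow> rows = new ArrayList<>();

    void insert(CompositionRow row) { rows.add(row); }

    /** Filters on the plain columns only, with time range [from, to); XML is returned, not inspected. */
    List<String> find(String ehrId, String archetypeId, Instant from, Instant to) {
        return rows.stream()
                .filter(r -> r.ehrId.equals(ehrId))
                .filter(r -> r.archetypeId.equals(archetypeId))
                .filter(r -> !r.committedAt.isBefore(from) && r.committedAt.isBefore(to))
                .map(r -> r.xml)
                .collect(Collectors.toList());
    }
}
```

Anything not covered by the plain columns would have to be found inside the XML, which is where the DBMS's XML features come in.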

Cheers, Thilo

Exactly, these are backend questions, not important when building an
API. There are many possible solutions, and we can discuss them, but people
hesitate to do that. That is OK.

But what we, as a community, can do is build an API to which the kernel
can connect, and below which all or most possible persistence solutions
should be able to fit.

That is where I would like to help. That is the missing part in openEHR
which we can build together, so why don't we?

Why do discussions die, when they enter this subject?

I would like to know that.

People discuss all kinds of subjects very seriously and very deeply, which is
OK, but why do they never discuss a solution for a persistence layer API?
(Not the layer itself, only the API.)
Another thing that is never discussed is the service layer API:
same question; it is not the service layer itself that needs discussing,
only the API.

Why is that?

I never found an answer to these questions, so I can only guess. Or
maybe I missed the answers; I do not read all emails, so that is possible.
Please tell me, don't stay quiet; tell me what is happening. Not only me:
I think many silent people would want to know.

Bert

the main reason is that a persistence layer API is not an essential part
of openEHR - anyone can make up their own, and it makes no difference to
the semantics of openEHR. It is only really useful if there is going to
be a market in archetype-driven databases, which is not happening just
yet....

- thomas

Thomas Beale schreef:

the main reason is that a persistence layer API is not an essential part
of openEHR - anyone can make up their own, and it makes no difference to
the semantics of openEHR. It is only really useful if there is going to
be a market in archetype-driven databases, which is not happening just
yet....

You say there is no market? My experience is different: people are
interested, but they look for a market, a market which consists of
competing products in various flavours: closed, open, with SLA, without,
etc. (in fact, that is what the word market means).
In fact, Dutch enterprises in health ICT had a meeting about this
product, a good meeting, maybe 100 important people were there, but they
did not find a market; they only found one single product. That is not a
market, and that is not what many businesspeople look for.

From this point of view an API could be very useful; it could help create
that market.
(Do you think Linux would have been a success if Linus Torvalds had
waited for a market to come?)

Same thing with openEHR: if only a few companies offer the product,
it will stay a niche. We have to make the product boom by offering it to
the market in several ways.

You know, many people don't even know or understand the status of this
project; they think it is an open source project which you can download
and run. Even prominent members of the health ICT community told me that I
should "hang a database under it and run it". I feel uncomfortable in
this situation, often having the feeling that I am talking about
something forbidden; discussions often just stop without reason, etc.

That is why I think, a persistence layer, or at least an API to it, will
help a lot, same thing with the services.