# OpenEHR and Oracle XML DB problems **Category:** [Technical (archive)](https://discourse.openehr.org/c/technical-archive/156) **Created:** 2015-04-15 13:47 UTC **Views:** 4 **Replies:** 23 **URL:** https://discourse.openehr.org/t/openehr-and-oracle-xml-db-problems/13407 --- ## Post #1 by @Dmitry_Baranov1 Hi everyone, According Bert experience \(https://www.linkedin.com/groups/Choice-OpenEHR-persistence-layer-144276.S.208531138), one must not try to adopt OpenEHR model to relational storage since almost all popular database engines able to process native XML\. So I'm experimenting with Oracle XML DB for almost two weeks, and I'm in despair and kindly ask you for help\. Here is a Github repository where I've collected few files \- https://github.com/da-baranov/openehr-ora : 1\) EHR\.xsd \- flattened OpenEHR XML schema 2\) regschema\.sql \- a script that creates an Oracle directory where test files should be copied to, registers XML schema and creates a table with XML\-schema\-based column that matches global "version" element 3\) composition\_1191\_0\.xml, composition\_1322\_0\.xml and composition\_1531\_0\.xml are test instance files which I borrowed from Pablo Pazos github \(https://github.com/ppazos/cabolabs-emrapp) 4\) I'm using Oracle 11\.2\.0\.2\.0 Express \(for some reasons can't use 12c\)\. The key problem is that Oracle rejects all the instance files producing the following error: insert into versions\(x\) values\(   xmltype\(     bfilename\('OPENEHR', 'composition\_1531\_0\.xml'\),     NLS\_CHARSET\_ID\('AL32UTF8'\)   \) \) --- ## Post #2 by @system Looks like a namespace\-problem\. I cannot judge that without seeing your XML Schema and XML\-file\. And I am not going to look at them\. Sorry for sounding harsh, but I am not an Oracle specialist\. I use Oracle 12c, so I cannot advise you with your Oracle\-version\. You should use 12c, it cannot be an hardware issue\. I have run Oracle 12c on a cheap netbook, with 4G memory\. So, my advise, buy another machine, a cheap one, let it be dedicated for Oracle\. For developing purpose I have an old Lenovo desktop\-machine, I bought for 100 Euro as Oracle server\. For demo purpose, as explained, that netbook, runs like hell on both\. For production, of course you need something else\. Install Oracle Linux, Oracle fits best on Oracle Linux\. Just do it, and then it should run fine, although you have to change the XML\-Schemas \(a bit\) before registering them to Oracle\. I already explained to you before what and why\. Anyway, your error\-situation may become less ambiguous\. Bert --- ## Post #3 by @Dmitry_Baranov1 Sorry Bert \) I had to explain that Oracle 11 is a business requirement, not a hardware limitation\. > Just do it, and then it should run fine, although you have to change the > XML\-Schemas \(a bit\) before registering them to Oracle\. I already > explained to you before what and why\. I remember your advise but not sure that changing all the sequences to choices is a right way for many reasons\. And I'm almost sure that it's an Oracle bug since other XSD validation tools \(visual studio, netbeans and eclipse\) say that my instance files are all OK and conform to XML schema\. There is a message on Oracle forum, similar symptoms \- https://community.oracle.com/message/10778556 Ok, I'll try 12c this night, thanks\. --- ## Post #4 by @system > Sorry Bert \) I had to explain that Oracle 11 is a business requirement, not a hardware limitation\. > >> Just do it, and then it should run fine, although you have to change the >> XML\-Schemas \(a bit\) before registering them to Oracle\. I already >> explained to you before what and why\. > > I remember your advise but not sure that changing all the sequences to choices is a right way for many reasons\. And I'm almost sure that it's an Oracle bug since other XSD validation tools \(visual studio, netbeans and eclipse\) say that my instance files are all OK and conform to XML schema\. You are right it is a messy solution, but if you use "sequence", like it is defined in the XML Schema, you have to take care that all the nodes in your XML are in the right/defined order\. I think this is hard to achieve when XML\-files are created in production, but of course, you can arrange that as an alternative\. I think it is a stupid rule in the XML\-Schema standard\. The only alternative is "choice", but then you are not able to have the counting of the elements validated\. Check the book written by Priscilla Walmsley, she is the all knowing saint regarding XML Schema\. If you don't have it, you can check the w3\-website, where the standard is published, but less comfortable to read\. So, validating completely against XML Schema is not feasible because of weakness in the XML\-Schema standard\. So why should you then register the XML\-Schema in the database? There is another reason for that, and that is because Oracle is able to optimize the querying by using the structures in the XML\-Schema\. And for that purpose, replacing the "sequence" with "choice" is not a problem\. I don't know if comparing the results with other products is a reason to say that your work is done right\. You should always follow the standard in your analyses, before you declare one product buggy and another not\. There is a message on Oracle forum, similar symptoms \- https://community.oracle.com/message/10778556 I see in that message that there was a patch recommended, did you try that? Always use the latest patch level\. > Ok, I'll try 12c this night, thanks\. Maybe on success, you are able to change the business requirement\. good luck\. Bert --- ## Post #5 by @thomas.beale I just hit this in doing the AOM2 schema\. It's a completely senseless rule, clearly a hangover from 'document' thinking \- nothing to do with 'data' thinking\. I ended up replacing <sequence> with\. <xs:choice maxOccurs="unbounded"> I have to say, XML\-schema based data looks extremely unattractive as a basis for anything except data exchange\. I wouldn't try to implement anything important inside a system with it, you are too compromised in too many ways\. \- thomas --- ## Post #6 by @system >>> Sorry Bert \) I had to explain that Oracle 11 is a business requirement, not a hardware limitation\. >>> >>>> Just do it, and then it should run fine, although you have to change the >>>> XML\-Schemas \(a bit\) before registering them to Oracle\. I already >>>> explained to you before what and why\. >>> >>> I remember your advise but not sure that changing all the sequences to choices is a right way for many reasons\. And I'm almost sure that it's an Oracle bug since other XSD validation tools \(visual studio, netbeans and eclipse\) say that my instance files are all OK and conform to XML schema\. >> >> You are right it is a messy solution, but if you use "sequence", like it is defined in the XML Schema, you have to take care that all the nodes in your XML are in the right/defined order\. >> I think this is hard to achieve when XML\-files are created in production, but of course, you can arrange that as an alternative\. >> >> I think it is a stupid rule in the XML\-Schema standard\. > > I just hit this in doing the AOM2 schema\. It's a completely senseless rule, clearly a hangover from 'document' thinking \- nothing to do with 'data' thinking\. I ended up replacing <sequence> with\. > > <xs:choice maxOccurs="unbounded"> It is a poor man's choice, because of the side\-effects\. The "choice"\-constraint makes the engine ignore the minOccurs/maxOccurs constraints in the elements under the choice element\. > I have to say, XML\-schema based data looks extremely unattractive as a basis for anything except data exchange\. I wouldn't try to implement anything important inside a system with it, you are too compromised in too many ways\. It is also used, as I wrote yesterday, to tell an XML\-database how, in a specific namespace, the data are arranged, in that way the database can auto\-create indexes, etc\. There is no other way to communicate the structure of XML\-documents, and even if we found another way, or invented it ourselves, we still would need a broad acceptation\. I never heard that there is any work going on in the committees, because XML Schema 1\.1 is almost 15 years old\. I believe it is considered finished\. If that is true, it is very sad\. Bert --- ## Post #7 by @Dmitry_Baranov1 Hi Bert, I give up\. The problem appears in 12c as well\. And people agree that it is an Oracle bug\. https://community.oracle.com/message/13008017 --- ## Post #8 by @thomas.beale We still need some kind of schema for XML docs/ messages, but I would not even think of making any persistence be based on XML, especially not XML schema\. Probably Relax NG would have been the better one for messages etc\. Keep XML at the boundaries, that's the key to happiness\! I guess JSON with its own schema will replace it in the next couple of years\. \- thomas --- ## Post #9 by @yampeku In fact, JSON\-Schema is only lacking a more detailed typing system to replace ADL :\) --- ## Post #10 by @system Dmitry, if you read the post carefully, not people, but one person says, it *appears* to be a bug He has to escalate it to engineering, and if it is a confirmed bug, is something you must wait for He says he can reproduce it, this means he used your files. I checked them also, they validate right in Oxygen. Oracle in different versions stumbling over your used OpenEHR XML Schema with your XML-instances? That is possible. There is a bug in a particular use of your schema. I am stressing to this because I am posting all the time Compositions in Oracle 12c, and all kind of other XML-instances from several namespaces. No problem at all. But I have things structured in another way. For example, when I post a Composition, the XML-instance starts with the Document-element called Composition. This is allowed because I have an xs:element COMPOSITION on top level in the XML-Schema (. I noticed in a glance that you have that too, and it is logical that you have, because you flattened the same XML Schema's as I did. When you use that, you have to store your audit and version information elsewhere. When I look at your XML Schema and instance, just a quick glance, then I see that you have at top level, and derived from that Original Version, which has as child-element data of type xs:anyType. I would not do that, because you throw away any optimization. Oracle can't do anything with this. You could at least have it to be Locatable, because data in version are always of type Locatable. You could even have a original-version of type composition, and one of type party, this would help optimization even more. I regret that I cannot post the XML Schema and XML-instances I use, because they are not of my IP. But they are structured in another way, more dedicated to efficiency. Bert --- ## Post #11 by @system With RelaxNG, IMHO, you cannot communicate structure, RelaxNG only serves validation\. So, if I am right, optimizing an XML database using RelaxNG is impossible\. You, can say, that if it validates in a RelaxNG\-schema, then it must be structured right, but learning a structure from a RelaxNG schema seems hard to me\. I don't think JSON wll replace XML, JSON has no attribute\-mechanism, and must treat everything as data\. It has its use for wellknown structures\. It is efficient\. But if you want JSON to be as rich as XML, then it must be restructured as XML, and then it will become XML2\. Which is OK for me, but then we still have to solve the problems with XML Schema, but then on JSON\-Schema Bert --- ## Post #12 by @system Hi there, there might be good reasons \(as mentioned here\) to not like XML but its mature and rich technology stack \(XSLT, XPath/XQuery, Schematron, RelaxNG\), available tooling \(Mapforce, XML Spy\.\.\.\) and support \(Most major database vendors\.\.\.\) makes it a working solution in many use\-cases\. I really can't say that we are unhappy with it at our persistence layer \(though one of my master students currently investigates the use of intersystems caché as an alternative\.\.\.I'm not quite sure if AQL will be possible in an "easy" way as in XML databases \) Of course, I like the elegance of Json but there is plenty of work to do until it's getting a real substitute \(if at all\)\.\.\. Best, Birger --- ## Post #13 by @Dmitry_Baranov2 > I regret that I cannot post the XML Schema and XML\-instances I use, because they are not of my IP\. But they are structured in another way, more dedicated to efficiency\. XML schema is intellectual property, I agree, but why might you or somebody else not to provide community with a couple \(well, a couple of hundreds will be better\) of depersonalized sample instance files? :\) We'ld appreciate that very much\. I'm a beginner with OpenEHR and it seems to me that there is a lack of OpenEHR instance examples on the Internet; HL7 CDA has plenty of samples and their FHIR has an excellent web site with instance samples repository, in various formats\. --- ## Post #14 by @system Sory Dmitry, I did not explain right\. The work I do is not of my IP\. I work for someone and we agreed he automatically has the IP of my work\. So that is why I can't share the XML Schema's and XML\-instances I use\. Bert --- ## Post #15 by @yampeku I can generate random sample instances from current archetypes for you if you need them\. Generated data may not make much sense as it only tries to follow the archetype constraints, but it should be enough for application testing and benchmark --- ## Post #16 by @Dmitry_Baranov1 Diego, that'll be great\. Hope that OpenEHR github owners will provide us with an instance samples repository some day or other :\) --- ## Post #17 by @thomas.beale Indeed, it would be a great thing\. The reason it doesn't exist so far, is that to be useful we need synthesised data sets that have some realistic statistical spread of values\. Since we are talking at multiple levels \- not just vital signs measurements, but covariance of all kinds of measurements with assessments \(diagnosis etc\), plans and orders and actions, the complexity is not trivial\. A data synthesiser to do this for openEHR would be a fantastic Master's project \(hint :\)\. \- thomas --- ## Post #18 by @system > > Indeed, it would be a great thing\. The reason it doesn't exist so far, is that to be useful we need synthesised data sets that have some realistic statistical spread of values\. Since we are talking at multiple levels \- not just vital signs measurements, but covariance of all kinds of measurements with assessments \(diagnosis etc\), plans and orders and actions, the complexity is not trivial\. > > A data synthesiser to do this for openEHR would be a fantastic Master's project \(hint :\)\. I use Oxygen, it can generate XML instances to XML Schema's, but first we need to change the data\-element of version to have type Locatable, or Composition\. If wanted, I can generate them too, it is only one minute work\. Bert --- ## Post #19 by @system Hi Thomas, do we really need realistic data sets? At least in the beginning, many programmers that are new to openEHR would already be happy with some valid data instances\. If anyone likes to do such a synthesier: Katrin Dentler has done some work on patient data generation \(http://www.cs.vu.nl/~frankh/postscript/KR4HC-2013-APDG.pdf)... maybe she can provide some existing code and knowledge if anybody wants to build such a thing :\) \(It might be a little out of scope for my department\) Best, Birger --- ## Post #20 by @Seref Hi Birger, Thanks for link to Katrin's work. Regards Seref --- ## Post #21 by @ANASTASIOU_A Had to make this a separate discussion about the data synth to distinguish it from the Oracle XML DB problems\. If you have access to primary data, e\.g\. the usual Pat\_ID, Timestamp, Code, Value, you can create very simple generative models by training a neural network to generate sequences with similar statistics\. In this case, the time parameter, code covariance and code value dynamics \(for those that have a value attached to them\) are all lumped together\. You can set inclusion/exclusion criteria that specify a sub\-population and have the neural network be trained and generate data for that specific population\. As a practical example, in a pilot study of a few thousand elderly patients, we found that along with dementia, it was inevitable to be getting significant cardiovascular problems and also, a lot of\.\.\.flu\. It's a crude way to be training a network to produce similar sets of codes and values but the dream of specifying patient data as a linear mixture of the profiles of different conditions is a bit far off yet :\) Of course, when this is done, we are still left with mapping a given clinical encoding scheme to the suitable openEHR archetype\. \(If we are looking at a general solution that is\)\. All the best Athanasios --- ## Post #22 by @system As part of NHS Code4Health, we should be able to expose a number of realistic composition instances, kindly donated by system vendors. On my Todo list :) Ian --- ## Post #23 by @pablo Hi Dmitry, as a side conceptual note, the openEHR IM is not a persistence model, and the XSDs are a way to express instances of the IM. The XSDs would require some refactoring work to express an specific persistence model for the Information Model, like simplifying some datatypes of the IM into datatypes used/supported by your persistence technology, like dates, texts, coded texts, ordinals, etc. I would suggest to explore hybrid approaches using relational+xml or relational+json persistence technologies. It might make your work easier. Cheers, Pablo. --- ## Post #24 by @Chamorro_Gordejuela Hi Dimitry, We are using the same approach as ​Bert (storing compositions, not versions) using Oracle 11.2.0.4.0 and it's working fine. We use openEHR schemas (Composition, Content, Structure and BaseTypes) with slight modifications (datetime regexp, for example). --- **Canonical:** https://discourse.openehr.org/t/openehr-and-oracle-xml-db-problems/13407 **Original content:** https://discourse.openehr.org/t/openehr-and-oracle-xml-db-problems/13407