Index in arrays in ADL

Hi all,

I had the following interesting discussion.

Suppose we have a cluster containing the name of a person.

There is a field called firstname and a field called lastname.

The question is which ADL path should be used.

This is a part of the archetype:

CLUSTER[at0007] occurrences matches {0..1} matches { -- voornamen (first names)
                        items cardinality matches {0..*; ordered} matches {
                            ELEMENT[at0008] occurrences matches {0..*} matches { -- firstname
                                value matches {
                                    DV_TEXT matches {*}
                                }
                            }
                            ELEMENT[at0009] occurrences matches {0..1} matches { -- lastname
                                value matches {
                                    DV_TEXT matches {*}
                                }
                            }
                        }

This is a hypothetical example, only to explain:

at0008 has an upper occurrence bound of more than 1 (a person can have more than one first name),
and at0009 has an upper occurrence bound of 1 (people have only one last name in this hypothetical country).

If we want to express data as path/value combinations, what would be the best solution?

   /items[at0008][1]/value/value = Jan
   /items[at0008][2]/value/value = Peter
   /items[at0009]/value/value = Balkenende

   /items[at0008][1]/value/value = Jan
   /items[at0008][2]/value/value = Peter
   /items[at0009][1]/value/value = Balkenende

   /items[at0008][1]/value/value = Jan
   /items[at0008][2]/value/value = Peter
   /items[at0009][3]/value/value = Balkenende

Or would another solution be better?
Thanks in advance for suggestions.

Kind regards
Bert Verhees

Hi Bert,

Personally I would use:

  /items[at0008]/value[1]/value = Jan
  /items[at0008]/value[2]/value = Peter
  /items[at0009]/value/value = Balkenende

Regards

Agree with Diego, I would use indexes only for repeatable data.

I would say

/items[at0008,1]/value/value = Mark
/items[at0008,2]/value/value = Rutte

Alessandro

Hi Alessandro,

I think you propose this?

  /items[at0008,1]/value/value = Mark
  /items[at0009,2]/value/value = Rutte

regards
Bert Verhees

Alessandro Torrisi wrote on 19-11-2013 20:19:

Either this or Bert's original (if it's legal XPath) is correct, assuming the data look something like this (I just added the outer bit and header to make it work in a tool):

<?xml version="1.0" encoding="UTF-8"?> <cluster xmlns:xsi= archetype_node_id="openEHR-EHR-CLUSTER.bert.v1" xmlns=> Jan Peter Balkenende

I'm just messing around in Oxygen with some XQueries and XPaths.

- thomas

Bert,

I was not clear, but this is my suggestion:

/items[at0008,1]/value/value = Jan
/items[at0008,2]/value/value = Peter

/items[at0009]/value/value = Balkenende

Alessandro

Hi Alessandro,

I like that. It is neat and fits well with the existing specialisation
dot notation. It is something we might be able to make use of in
templates as well.

Ian

Hi all,

Thank you very much for your response.

First I want to respond to this one, because there is also an XML issue.
I do not mean to be unfriendly, but this also needs to be discussed.

I assume the OpenEHR XSDs are the inspiration for this OpenEHR XML.
I write "inspiration" because it is not possible to use the XSDs to define XML instances.
I filed an issue on JIRA for that yesterday:
http://www.openehr.org/issues/browse/SPECPR-93

But now I see another problem.
Even if the "element" thing (see JIRA) were repaired in the XSDs, in my opinion it would still not be possible to arrive at the following XML fragment.
This is also a point that needs to be discussed.

IMHO, it would look like this:

<?xml version="1.0" encoding="UTF-8"?>
<cluster xmlns="http://schemas.openehr.org/v1"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://schemas.openehr.org/v1 file:Structure.xsd" archetype_node_id="openEHR-EHR-CLUSTER.bert.v1">
         <items archetype_node_id="at0008">
             <value>
                 <value>Jan</value>
             </value>
         </items>
         <items archetype_node_id="at0008">
             <value>
                 <value>Peter</value>
             </value>
         </items>
         <items archetype_node_id="at0009">
             <value>
                 <value>Balkenende</value>
             </value>
         </items>
</cluster>

I'm just messing around in Oxygen with some XSD's.

Me too :wink:

Apart from that, I like the XML approach to defining paths, although I am not sure about the XPath style, because I think XPath is a query language.
That is one of the reasons why I started this discussion.

I happen to be renovating some kernel internals, and I want the paths used to have some kind of legitimacy. That is why I ask you to help me think about it.

Thanks,
Bert

Hi,

The idea below came from Leo Simons in a private discussion. I also mention my remarks on it, to ask your opinions.

Suppose we had produced this XML from a dataset:
<?xml version="1.0" encoding="UTF-8"?>
<cluster xmlns="http://schemas.openehr.org/v1"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://schemas.openehr.org/v1 file:Structure.xsd" archetype_node_id="openEHR-EHR-CLUSTER.bert.v1">
         <items xsi:type="ELEMENT" archetype_node_id="at0008">
             <value xsi:type="DV_TEXT">
                 <value>Jan</value>
             </value>
         </items>
         <items xsi:type="ELEMENT" archetype_node_id="at0008">
             <value xsi:type="DV_TEXT">
                 <value>Peter</value>
             </value>
         </items>
         <items xsi:type="ELEMENT" archetype_node_id="at0009">
             <value xsi:type="DV_TEXT">
                 <value>Balkenende</value>
             </value>
         </items>
</cluster>

The XPaths would look like:

/cluster[@archetype_id='openEHR-EHR-CLUSTER.bert.v1' and @archetype_node_id='at0007']/items[position()=1 and @archetype_node_id='at0008']/value/value=Jan
/cluster[@archetype_id='openEHR-EHR-CLUSTER.bert.v1' and @archetype_node_id='at0007']/items[position()=2 and @archetype_node_id='at0008']/value/value=Peter
/cluster[@archetype_id='openEHR-EHR-CLUSTER.bert.v1' and @archetype_node_id='at0007']/items[position()=3 and @archetype_node_id='at0009']/value/value=Balkenende

As you can see, I use the attribute archetype_id, which is not in the specifications, but I also filed a JIRA issue for that:
http://www.openehr.org/issues/browse/SPECPR-92

But apart from that,
there are a few things that I don't like about this XPath notation for non-query purposes.

First is the connecting operator "and" (in position()=1 and @archetype_node_id='at0008'); it is meaningless here, because I am not defining a logical condition.
The second thing is that items which differ in archetype_node_id also get an index that continues from the previous ones (3 in this case).

The problem is that our intuition, built up over the years, is that elements with a different archetype_node_id are completely different elements, even though they belong to the same items list in code.
In the OpenEHR culture we tend to omit the index when the element is unique because of its archetype_node_id, but I wonder, is that the right thing to do?

So I wonder, is XPath compatibility the right answer to this?

So, summarizing, three problems with XPath:
- the logical operator "and", while it is not a query
- the index still counting across different archetype_node_ids
- there was one more, but it slipped my mind, and I have to leave right now in a hurry.

Maybe not following an XPath-like path notation is a better idea?

I will reply later to the other mails regarding this

Thank you all very much
Bert

For reference, in terms of 'real XML' here is an example of a coded term:

                         <items xsi:type="ELEMENT" archetype_node_id="at0001">
                             <name>
                                 <value>Episodicity</value>
                             </name>
                             <value xsi:type="DV_CODED_TEXT">
                                 <value>First ever</value>
                                 <mappings>
                                     <match>=</match>
                                     <target>
                                         <terminology_id>
                                             <value>SNOMED-CT</value>
                                         </terminology_id>
                                         <code_string>255217005</code_string>
                                     </target>
                                 </mappings>
                                 <defining_code>
                                     <terminology_id>
                                         <value>local</value>
                                     </terminology_id>
                                     <code_string>at0033</code_string>
                                 </defining_code>
                             </value>
                         </items>

Of course the XSD for the next release of openEHR could be much more efficient than that. But that's what we use today.

- thomas

In terms of this particular issue, the current XSDs were designed for only a few 'top-level' document root objects, such as Composition, Party and so on. I agree that it needs to be more flexible. But I don't think that affects the particular question of paths we are discussing here (but thanks for recording the issue - that will definitely get addressed).

You are right that my little example is not technically valid in this way - I just wrapped the core XML so as to make Oxygen happier :wink:

Well, again, this is 'illegally' wrapping a small fragment to make it standalone. And the at0007 level is missing...

As for the paths, it depends on your starting point in the hierarchy as well. I like Alessandro's solution, if it is legal. From your other post you had:

/cluster[@archetype_id='openEHR-EHR-CLUSTER.bert.v1' and @archetype_node_id='at0007']/items[position()=1 and @archetype_node_id='at0008']/value/value=Jan
/cluster[@archetype_id='openEHR-EHR-CLUSTER.bert.v1' and @archetype_node_id='at0007']/items[position()=2 and @archetype_node_id='at0008']/value/value=Peter
/cluster[@archetype_id='openEHR-EHR-CLUSTER.bert.v1' and @archetype_node_id='at0007']/items[position()=3 and @archetype_node_id='at0009']/value/value=Balkenende

If I rewrite that in openEHR-shortened form, so we can read it:

/cluster[at0007]/items[at0008 and position()=1]/value/value='Jan'
/cluster[at0007]/items[at0008 and position()=2]/value/value='Peter'
/cluster[at0007]/items[at0009 and position()=3]/value/value='Balkenende'

It would be a small step to allow:

/cluster[at0007]/items[at0008, 1]/value/value='Jan'
/cluster[at0007]/items[at0008, 2]/value/value='Peter'
/cluster[at0007]/items[at0009, 3]/value/value='Balkenende'

BTW some existing information on paths is in the paths and locator chapter.

- thomas

Take into account that [position()=1] is equivalent to [1] in XPath.
In fact, another thing worth noticing is that if you can ensure unique
atCodes you only need to put the last one. Using both, the XPaths look like this:

/cluster[@archetype_id='openEHR-EHR-CLUSTER.bert.v1']/items[@archetype_node_id='at0008']/value[1]/value=Jan
/cluster[@archetype_id='openEHR-EHR-CLUSTER.bert.v1']/items[@archetype_node_id='at0008']/value[2]/value=Peter
/cluster[@archetype_id='openEHR-EHR-CLUSTER.bert.v1']/items[@archetype_node_id='at0009']/value/value=Balkenende

By the way, I see this as an argument for using atCodes in all
object nodes. I don't think anyone would argue that using atCodes is
less clear than using [1], [2], and [3] (with [3] being in a completely
different branch).

This indicates one items[at0008] element with two value elements in it. I don't think that is right.

It should be

/cluster[@archetype_id='openEHR-EHR-CLUSTER.bert.v1']/items[@archetype_node_id='at0008'][1]/value/value=Jan
/cluster[@archetype_id='openEHR-EHR-CLUSTER.bert.v1']/items[@archetype_node_id='at0008'][2]/value/value=Peter
/cluster[@archetype_id='openEHR-EHR-CLUSTER.bert.v1']/items[@archetype_node_id='at0009']/value/value=Balkenende

regards
Bert

It is right in XPath, though maybe less intuitive, but I would say these
are even less intuitive and still work:
/cluster[@archetype_id='openEHR-EHR-CLUSTER.bert.v1']/items/value[3]/value=Balkenende
/cluster[@archetype_id='openEHR-EHR-CLUSTER.bert.v1']/items[3]/value/value=Balkenende

I think we should decide what this is for:

Is this for ADL paths? then use ADL path syntax.
Is this for XML query? then use XPath

No need to reinvent the wheel

I am thinking the same; I was just wondering if there was something which could be optimized. But first, let me give some background to get to my explanation.

In XML-land, there is only one path notation, XPath, and it serves for queries. That is why I think it is not right for expressing paths. In ADL-land, there is a path notation which can serve both path expressions and queries, as described in the Architecture Overview document.

There is room for erroneous ambiguities in path definitions, because of duplicated information; there are also inconsistencies in the ADL path syntax, and there are unnecessary restrictions.

Also, an archetype is a data-structure description; it should not have data in it. To define the position of a leaf node, no data or leaf-node constraints should be used. One could compare it with XSD, which plays the same role for XML.

When you have a data value, how do you define to which node it belongs, and where that node resides in a complex data construct? This is described on page 66 of the Overview document. But how to use the first node is not specified. Maybe it is somewhere else. So this should be, in ADL path notation:

So the lower-case rm-type is placed before the archetype_id, while the archetype_id already contains the rm_type information in its id. A chance for ambiguity! Never demand that information be written twice!

It is also inconsistent, because in the case of an archetype slot, the name of the attribute which contains the slot is used instead of the lower-case rm-type. Like this, where the attribute "data" has the archetype id "an.another.archetype.id":

This is not nice either, because it restricts, and does not tell us what happens. It restricts because it leaves no room for an archetype_node_id, which can be on the attribute data (without an archetype_node_id it is impossible to refer to it in the ontology). So the following construct would emerge, if we want data to refer to the ontology:

optionally with an index in it

or

But this is impossible because of the removal of the attribute archetype_id, and having to put the archetype_id in the attribute archetype_node_id. This is an unnecessary restriction.

------------------------------------------

So, quite some time ago, I came to a way with fewer restrictions, fewer inconsistencies and fewer chances for erroneous ambiguities.

- If there is an archetype_id, there is no need to mention the rm-type, because the information is in the archetype_id.
- If there is a slot, the attribute in which the slot is placed is written as if there were another attribute to follow, just the same as always.
- In this way there are no restrictions on also mentioning the archetype_node_id and, optionally, an index.
- An archetype id in a path can always be recognized by its square brackets and no accompanying text (rm-type or attribute name).

So the example path will be:

The leading slash is not important and can be omitted.

------------------------------------------

I even had a short notation (I explain this just for fun):

The curly brackets were needed to distinguish an archetype_node_id from an archetype_id. But this short path notation was never used much; not intuitive enough, I guess. So I forgot it.

------------------------------------------

So, why am I telling all this?

First, I am not happy with the specs and how they define paths; I explained why.
Second, I am not happy with my own solution, mostly because there is no consensus; it is a lonely adventure.

And I was not happy with my index notation, and with when to use an index notation. And in that part, you all did help me. I learned to better define and explain the way I go.

When you look at an archetype, it has a parent-child construction. An index notation should be used when there are two similar children on one parent, so that without the index confusion could occur. Children are similar if they have the same attribute name, the same archetype_node_id and the same archetype_id.

But I think that the notation with the comma is better; I am going to use that. It is not very important; it is just about defining how to describe the position of a data value/leaf node in the archetype construct.

The drawback is that I still need ways to reconstruct paths, etc., for compatibility reasons.

Thanks for your help, and if someone wants to discuss it further, I am still open to that.

It would be a good thing if the OpenEHR community would reconsider the way paths are treated.

regards
Bert Verhees
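The indexing rule described in this message (an index only when one parent has several similar children) can be sketched in Python. This is only an illustration, not something taken from the openEHR specifications: `path_value_pairs` is a hypothetical helper, it treats children as similar when they share an archetype_node_id under a single `items` attribute (archetype_id is ignored), and it assumes the comma index counts among siblings with the same node id, as in Alessandro's examples.

```python
from collections import Counter

def path_value_pairs(items):
    """Build (path, value) pairs from (archetype_node_id, text) tuples.

    Hypothetical helper: an index is written only when a node id occurs
    more than once under the same parent, and it counts occurrences of
    that node id, not absolute positions in the items list.
    """
    totals = Counter(node_id for node_id, _ in items)
    seen = Counter()
    pairs = []
    for node_id, text in items:
        seen[node_id] += 1
        if totals[node_id] > 1:
            predicate = f"{node_id},{seen[node_id]}"  # repeated: add index
        else:
            predicate = node_id                       # unique: node id only
        pairs.append((f"/items[{predicate}]/value/value", text))
    return pairs

names = [("at0008", "Jan"), ("at0008", "Peter"), ("at0009", "Balkenende")]
for path, value in path_value_pairs(names):
    print(f"{path} = {value}")
```

With the example data this yields the same three paths as Alessandro's suggestion. A rule driven by the archetype's occurrences constraint rather than by the data would need the archetype as an extra input.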

Hey folks,

Take into account that [position()=1] is equivalent to [1] in XPath.

Yup! The only reason to have the discussion is the difference, in xpath, between

  (: from the items with node id at0009, take the first one :)
  /*[@archetype_node_id='at0009'][1]

  (: from all the items, take the first one, then take all the ones with node id at0009 :)
  /*[1][@archetype_node_id='at0009']

  (: from all the items, take the first one iff it has node id at0009 :)
  /*[@archetype_node_id='at0009' and position()=1]

Depending on the data you're xpathing, these may or may not have the same result, since the construct [][] is a sub-select. I believe, looking at everyone's answers, the intent of ADL paths is that most-precise third option (AND of the predicates), which is also what we've implemented previously on our SQL backend, and conceptually matches that proposal of

  [at0009,1]

(the , meaning AND I think), and that would make for an easy and unambiguous ADL path --> xpath translation.
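To make the difference concrete without depending on any particular XPath engine, here is a small Python simulation of the three predicate forms over the example data. It is a sketch under stated assumptions: the function names are made up for this illustration, the list stands in for the three `items` children in document order, and each function returns the 1-based document positions of the nodes it selects.

```python
# Node ids of the three <items> children, in document order.
items = ["at0008", "at0008", "at0009"]

def filter_then_index(nodes, node_id, n):
    """[@archetype_node_id=X][n]: filter by id, then take the n-th survivor."""
    survivors = [pos for pos, nid in enumerate(nodes, start=1) if nid == node_id]
    return survivors[n - 1:n]

def index_then_filter(nodes, node_id, n):
    """[n][@archetype_node_id=X]: take the n-th node, keep it if the id matches."""
    picked = [pos for pos, _ in enumerate(nodes, start=1) if pos == n]
    return [pos for pos in picked if nodes[pos - 1] == node_id]

def both_at_once(nodes, node_id, n):
    """[@archetype_node_id=X and position()=n]: both tests on the same node."""
    return [pos for pos, nid in enumerate(nodes, start=1)
            if nid == node_id and pos == n]

print(filter_then_index(items, "at0009", 1))  # [3]
print(index_then_filter(items, "at0009", 1))  # []
print(both_at_once(items, "at0009", 1))       # []
print(both_at_once(items, "at0009", 3))       # [3]
```

In this model the second and third forms always agree for a single step; only the filter-then-index form counts within the at0009 items rather than within all items. That is why `[at0009][1]` and an AND reading of `[at0009,1]` pick different nodes here.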

But, those aren't the current rules, and with our xml backend, right now, we have a not so easy ADL path --> xpath translation, because people want to write

  [at0009][1]

which we dutifully turn into xpath

  [@archetype_node_id='at0009'][1]

which _seems_ the most spec-compatible behavior insofar as ADL paths are precisely defined -- but that does not match what people mean.
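A rough sketch of such a translation, covering only the two notations discussed in this thread, might look as follows. Everything here is an assumption for illustration: `adl_to_xpath` is a hypothetical function, it handles only at-codes with an optional index (not archetype ids or the full ADL path grammar), it maps `[atNNNN][n]` to two chained predicates, and it maps `[atNNNN,n]` to a single predicate joined with `and`, following the reading given above.

```python
import re

# attribute name, optional [at-code] or [at-code,n], optional trailing [n]
_SEGMENT = re.compile(
    r"^(?P<attr>\w+)"
    r"(?:\[(?P<code>at[\d.]+)(?:\s*,\s*(?P<comma_idx>\d+))?\])?"
    r"(?:\[(?P<chain_idx>\d+)\])?$"
)

def adl_to_xpath(path: str) -> str:
    """Translate a simplified ADL-style path into an XPath string."""
    steps = []
    for part in path.strip("/").split("/"):
        m = _SEGMENT.match(part)
        if m is None:
            raise ValueError(f"cannot translate segment: {part!r}")
        step = m["attr"]
        if m["code"]:
            test = f"@archetype_node_id='{m['code']}'"
            if m["comma_idx"]:
                # comma form: one predicate, both conditions on the same node
                test += f" and position()={m['comma_idx']}"
            step += f"[{test}]"
        if m["chain_idx"]:
            # chained form: a second predicate applied to the filtered nodes
            step += f"[{m['chain_idx']}]"
        steps.append(step)
    return "/" + "/".join(steps)

print(adl_to_xpath("/items[at0009][1]/value/value"))
print(adl_to_xpath("/items[at0009,1]/value/value"))
```

Keeping the two input forms distinct in the translator makes the semantic choice explicit instead of silently picking one.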

Hence, confusion, anger, suffering, and me asking Bert and Bert asking you :slight_smile:

regards,

Leo

PS: Jan Peter Balkenende is prof. mr. dr. dr.h.c. mult. Jan __Pieter__ Balkenende; Jan Peter is his roepnaam (the name he goes by in daily life), which should probably be kept as one string.....

Firstly, the statement about atcodes above is right - we only need to do this [1] [2] business when there are multiple instances of the same at-code.

In the above, the [1] and [2] selectors aren't to select different values from under an ELEMENT (which is what the at0008 selects), so I would have expected them where Bert put them. But I'm not sure if the above is wrong either. Someone should see what Saxon makes of the above compared to Bert's or Alessandro's version. I'll try on the weekend if no one else gets round to it before then...

- thomas

It is right in XPath, though maybe less intuitive, but I would say these
are even less intuitive and still work:
/cluster[@archetype_id='openEHR-EHR-CLUSTER.bert.v1']/items/value[3]/value=Balkenende
/cluster[@archetype_id='openEHR-EHR-CLUSTER.bert.v1']/items[3]/value/value=Balkenende

Diego,
do you mean you tested this? Could you provide the XML source content you did this on?

I think we should decide what this is for:

Is this for ADL paths? then use ADL path syntax.
Is this for XML query? then use XPath

Well, what we want, I believe, is for ADL paths to be easily mappable (i.e. by a simple algorithm) to the correct XPaths. I'm not yet convinced we have the answers right here. Happy to be corrected though.

- thomas

   (: from all the items, take the first one, then take all the ones with node id at0009 :)
   /*[1][@archetype_node_id='at0009']

   (: from all the items, take the first one iff it has node id at0009 :)
   /*[@archetype_node_id='at0009' and position()=1]

i.e., so two predicates in a row act like a pipeline of filters...

Depending on the data you're xpathing, these may or may not have the same result, since the construct [][] is a sub-select. I believe, looking at everyone's answers, the intent of ADL paths is that most-precise third option (AND of the predicates), which is also what we've implemented previously on our SQL backend, and conceptually matches that proposal of

   [at0009,1]

this was my intention when specifying it (years ago now!).

(the , meaning AND I think), and that would make for an easy and unambiguous ADL path --> xpath translation.

But, those aren't the current rules, and with our xml backend, right now, we have a not so easy ADL path --> xpath translation, because people want to write

   [at0009][1]

So I am unclear why you prefer to write this? Is there some system or customer requirement?

which we dutifully turn into xpath

   [@archetype_node_id='at0009'][1]

which _seems_ the most spec-compatible behavior insofar as ADL paths are precisely defined -- but that does not match what people mean.

we could certainly extend the ADL path spec to allow this, it just needs to be specified as having different semantics from the above version.

If I were you, I would work on the principle that paths of the form [@archetype_node_id='at0009'][1] are legal (even if I don't quite understand why you want these particular paths) and have the semantics you have explained above.

- thomas