LOCATABLE.item_at_path

in class LOCATABLE there is the following function:

item_at_path (a_path: String): Any
require
a_path /= Void and then valid_path(a_path)
ensure
Result /= Void

There has been a request to add a function:

items_at_path (a_path: String): List
require
a_path /= Void and then valid_path(a_path)
ensure
Result /= Void

And to make the semantics as follows:

  • item_at_path returns only one item; if the path target is a container, it returns the first item
  • items_at_path returns the whole container if the path points to a container attribute; if it points to a single-valued attribute, it returns that item as the sole member of a container
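To make the two proposed semantics concrete, here is a minimal Python sketch. It assumes a toy model in which an RM object is a dict, a container attribute is a Python list, and paths are plain attribute names separated by '/' with no predicates. The helper _target and the sample data are hypothetical and do not come from any openEHR kernel.

```python
from typing import Any, List


def _target(root: dict, path: str) -> Any:
    """Follow an attribute-only path such as '/data/events' through nested dicts."""
    obj: Any = root
    for attr in path.strip("/").split("/"):
        obj = obj[attr]  # raises KeyError if the path does not exist
    return obj


def item_at_path(root: dict, path: str) -> Any:
    """Proposed semantics: always one item; the first member if the target is a container."""
    target = _target(root, path)
    if isinstance(target, list):
        return target[0]  # an empty container raises IndexError: the proposal leaves this open
    return target


def items_at_path(root: dict, path: str) -> List[Any]:
    """Proposed semantics: always a list; a single-valued target becomes its sole member."""
    target = _target(root, path)
    return list(target) if isinstance(target, list) else [target]


# A toy OBSERVATION-like instance (hypothetical data).
observation = {"data": {"origin": "2006-02-01T10:00",
                        "events": [{"name": "e1"}, {"name": "e2"}]}}

print(item_at_path(observation, "/data/events"))   # {'name': 'e1'}
print(items_at_path(observation, "/data/origin"))  # ['2006-02-01T10:00']
```

The `target[0]` and `list(...)` lines are where the open questions of ordering and of which container type to use for wrapping come into play.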

This is in contrast with the current specification, which implies that item_at_path returns a single item or a container of items, i.e. whatever is found at the path.

In the current scheme, your code doesn’t know whether it is getting a single object or a List back; in the proposed scheme, you (presumably) call each function based on the path itself and on knowledge, from the RM, of what a given path points to.

Does anyone have preferences?

- thomas beale

I like this change.

Sam

Thomas Beale wrote:

in class LOCATABLE there is the following function:

item_at_path (a_path: String): Any
require
a_path /= Void and then valid_path(a_path)
ensure
Result /= Void

There has been a request to add a function:

items_at_path (a_path: String): List
require
a_path /= Void and then valid_path(a_path)
ensure
Result /= Void

And to make the semantics as follows:

  • item_at_path returns only one item; if the path target is a container, it returns the first item
  • items_at_path returns the whole container if the path points to a container attribute; if it points to a single-valued attribute, it returns that item as the sole member of a container

It doesn’t seem to be necessary, since the client code can easily switch on the returned type and do the right processing to achieve the same effect. This change would force the client code to decide which method to call based on knowledge of the path, but sometimes the path is known only at runtime (or config time).

Other potential issues with this proposed change:

  1. “if the path target is a container, it returns the first item” for item_at_path would break the current semantics of this method, causing confusion. If the new semantics were indeed found to be very useful, I would suggest we keep the original version of item_at_path and add two new functions with different names.
  2. If the container is not ordered, the returned first item may differ from time to time.
  3. “if only one attribute, return as the sole member in a container” seems difficult to implement, e.g. which specific container type should be used?

Cheers,
Rong

Hi Rong,
For almost the same reasons that you give below, we should have an items_at_path method. For starters, I do not believe that the semantics of item_at_path are currently stated unambiguously; in fact, proof of this is evident in the difference between the Java kernel implementation of this method and that of the Ocean kernel. Adding more methods will make it even more confusing which method to use. Part of this CR is to ensure that the semantics of the method are unambiguous.

Ocean has developed a data integration tool which uses very generic code to build openEHR compositions, and having to check the types returned by item_at_path is very cumbersome, especially to determine whether the result needs to be iterated further or not. Having a mechanism that ensures you will always get a list with 0, 1 or more items will simplify this kind of generic code, where “the path is known only at runtime (or config time)”.

The only reason the first item in an unordered list would be different time to time is if the instance has changed, surely?

I think Tom’s use of container in the discussion was too generic and the container must always be a list (as specified in definition of the method).

The big problem that this CR solves is the guesswork about what you are going to get back from the method, in particular when items have an upper cardinality > 1. To handle the case where there is a single item, you have to handle the return of that class; but when there is more than one item you get a list, and if all you want is the first item you then have to get it from that list. Having two methods that always return a single item or a list of items, respectively, will allow simpler code logic. This pattern is common in XML object models.

Regards

Heath

Hi Rong,
For almost the same reasons that you give below, we should have an items_at_path method. For starters, I do not believe that the semantics of item_at_path are currently stated unambiguously; in fact, proof of this is evident in the difference between the Java kernel implementation of this method and that of

Hi Heath,

I am curious about the difference between the two implementations. Maybe you could elaborate a bit, since I don’t have the privilege of looking at the two implementations :-)

the Ocean kernel. Adding additional methods will make it even more confusing which method to use. Part of this CR is to ensure that the semantics of the method are unambiguous.

Ocean has developed a data integration tool which uses very generic code to build openEHR compositions, and having to check the types returned by item_at_path is very cumbersome, especially to determine whether the result needs to be iterated further or not. Having a mechanism that ensures you will always get a list with 0, 1 or more items will simplify this kind of generic code, where “the path is known only at runtime (or config time)”

But the logic required to handle an unknown returned type seems quite simple to me: just switch on the type information, and if it’s a list, take the first element.
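Rong's type switch can be written as a minimal Python sketch. It assumes that item_at_path under the current semantics returns either a single object or a Python list, which stands in for the RM List. first_item and as_list are hypothetical helper names and are not part of any kernel.

```python
from typing import Any, List, Optional


def first_item(result: Any) -> Optional[Any]:
    """Normalise a current-semantics item_at_path result to a single object."""
    if isinstance(result, list):  # the runtime type check Heath wants to avoid
        return result[0] if result else None
    return result


def as_list(result: Any) -> List[Any]:
    """Normalise the same result to a list for iteration."""
    return list(result) if isinstance(result, list) else [result]


print(first_item([{"name": "e1"}, {"name": "e2"}]))  # {'name': 'e1'}
print(as_list({"name": "e1"}))                       # [{'name': 'e1'}]
```

The switch is short, which is Rong's point. The counterpoint is that every generic caller has to repeat it on every call.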

The only reason the first item in an unordered list would be different time to time is if the instance has changed, surely?

Well, if the container is not ordered, the elements are returned in no particular order. You may get a different ‘first item’ from time to time (even when the instance hasn’t changed at all), which I guess is not desired.

I think Tom’s use of container in the discussion was too generic and the container must always be a list (as specified in definition of the method).

Using List will solve the ordering problem: since a List is ordered, the ‘first item’ will be the same every time. But it also creates a new issue: what if I really want the container node itself, which is not a List? Which method should I use then? Apparently not the ‘new’ item_at_path, since it only returns the first element; and not items_at_path either, since it will wrap all the child nodes in a List rather than the original container type.

Regards,
Rong

Rong,
See comments below prefixed with HF:

Heath

Dear all,

After reflecting on this for the last month or so, I tried to resolve it
by asking how I would do it in Eiffel. The first thing is that in normal
code, one should almost never have to check the type of something. It
does happen in very generic code, and we are bordering on that. If I
were to code access to some attributes via path in Eiffel, or in fact in
any language, I would do it like this:

-- assume we know that some_path is a valid path of some kind
-- assume also that we are talking about ordered containers (called
-- sequences in Eiffel)
if valid_container_path(some_path) then
    obj_list := items_at_path(some_path) -- or container_at_path() would be better
else
    obj := item_at_path(some_path)
end

Now, you might think: I don't want to do that all the time. But most of
the time you won't have to - you will know in context of your code that
you are expecting a container, or a single item back, so you will
directly call the correct function. The above code is when your calling
code really has no idea.

But if we do it the other way, in Eiffel we would have to do:

obj_list ?= item_at_path(some_path) -- assignment attempt
if obj_list = Void then
    obj ?= item_at_path(some_path)
end

Even if we are in code that knows what it expects, you still have to do:

obj_list ?= item_at_path(some_path) -- expensive operation; not statically checked at compile time

Or it could be done by checking types:

any_obj := item_at_path(some_path)
if conforms_to(dynamic_type(any_obj), some_list_type_id) then
    obj_list ?= any_obj
...

as you can see, you can't escape doing an assignment attempt, which is
an expensive and non-type-safe operation. There is an equivalent in C#.
So if you don't know the return type of the function, you always have to
do some check or test when you call it - every time.

Given that most of the time the code is most likely to know what kind of
thing the path it has is pointing to, it would be far better to have a
normal type-safe function to return that kind of thing, and not be doing
runtime type guessing.

This would mean using a set of functions like:

item_at_path (a_path: String): Locatable (or is it Any? Locatable means the smallest thing you can get is an Element)
       -- return whatever is found at path - could be a container or a single Locatable
    require
       a_path_valid: a_path /= Void and then valid_item_path(a_path)
    ensure
       Result /= Void

container_at_path (a_path: String): Container<Any>
    require
       a_path_valid: a_path /= Void and then valid_container_path(a_path)
    ensure
       Result /= Void
  
valid_path (a_path: String): Boolean
       -- True if path exists at all in structure

valid_item_path (a_path: String): Boolean
       -- True if path refers to single attribute object
       -- e.g. sometimes paths like aaa/bbb, but also sometimes aaa/bbb[xxx], if xxx resolves to single item

valid_container_path (a_path: String): Boolean
       -- True if path refers to container object
       -- e.g. sometimes paths like aaa/bbb, but also sometimes aaa/bbb[xxx] if nested containers

path_of_item (a_loc: Locatable): String
       -- only works for Locatables, not containers

In the above, I deliberately used 'Container' rather than 'List',
since in theory a fairly high-level type should be chosen.
In Eiffel I would use Sequence; in openEHR List might be alright - we
can decide on that later.
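A minimal Python sketch of this function set shows how the preconditions partition paths. It makes the following assumptions: RM objects are dicts, container attributes are Python lists, and paths are '/'-separated attribute names without predicates. Validity is decided by inspecting the instance, although a kernel could instead consult the RM schema. Contract violations raise ValueError. path_of_item is omitted. fetch is a hypothetical name for the dispatch that a caller with no idea what a path points to would use.

```python
from typing import Any, List

_MISSING = object()  # sentinel: the path does not resolve


def _target(root: dict, path: str) -> Any:
    """Follow an attribute-only path such as '/content' through nested dicts."""
    obj: Any = root
    for attr in path.strip("/").split("/"):
        if not isinstance(obj, dict) or attr not in obj:
            return _MISSING
        obj = obj[attr]
    return obj


def valid_path(root: dict, path: str) -> bool:
    """True if the path exists at all in the structure."""
    return _target(root, path) is not _MISSING


def valid_container_path(root: dict, path: str) -> bool:
    """True if the path refers to a container object."""
    return isinstance(_target(root, path), list)


def valid_item_path(root: dict, path: str) -> bool:
    """True if the path refers to a single attribute object."""
    return valid_path(root, path) and not valid_container_path(root, path)


def item_at_path(root: dict, path: str) -> Any:
    if not valid_item_path(root, path):
        raise ValueError(f"precondition a_path_valid violated: {path}")
    return _target(root, path)


def container_at_path(root: dict, path: str) -> List[Any]:
    if not valid_container_path(root, path):
        raise ValueError(f"precondition a_path_valid violated: {path}")
    return _target(root, path)


def fetch(root: dict, path: str) -> Any:
    """Dispatch for the rare caller that has no idea what the path points to."""
    if valid_container_path(root, path):
        return container_at_path(root, path)
    return item_at_path(root, path)


composition = {"context": {"start_time": "2006-03-01"},
               "content": [{"name": "obs1"}, {"name": "obs2"}]}
print(item_at_path(composition, "/context/start_time"))  # 2006-03-01
print(len(container_at_path(composition, "/content")))   # 2
```

Code that already knows what it expects calls item_at_path or container_at_path directly. Only fetch pays for the extra check.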

I also have not yet included a function that would return say a single
DV_DATE_TIME or Integer or whatever, as well as some complex types that
are not LOCATABLEs - and clearly paths into various parts of the model
can produce all such things. It may be that we want functions not for
'item' but for 'locatable', since these would be nice to use for some
kinds of path processing.

reactions?

- thomas

Heath Frankel wrote:

Thomas,
This is sort of going down the route that I was hoping but there is one key
scenario that needs clarification which was actually the main issue that I
had with item_at_path.

So to summarise, item_at_path can be used for full data paths such as

  /data/events[atxxxx and name/value='xxx']/data/items[atxxxx and name/value='xxx']/items[atxxxx and name/value='xxx']

which returns an Element assuming that item_at_path is called on an
observation and the ITEM_STRUCTURE of the event data is a LIST. Assuming
that item_at_path returns ANY rather than LOCATABLE it can also be used for
the following path

  /data/events[atxxxx and name/value='xxx']/data/items[atxxxx and name/value='xxx']/items[atxxxx and name/value='xxx']/value

which returns a DATA_VALUE associated with the ELEMENT value.

container_at_path can be used for data paths such as

  /content
and
  /data/events

which return the collection of CONTENT_ITEMs associated with a composition
and data EVENTs associated with an OBSERVATION respectively.

However, which method should be used and what will be returned for the
following data paths?

  /content[openEHR-EHR-OBSERVATION.xxx.v1]

  /data/events[atxxxx]

  /data/events[atxxxx and name/value='xxx']/data/items[atxxxx and name/value='xxx']/items[atxxxx]

The point is that none of these paths are unique because they do not include
a name/value predicate so there is a chance that there are multiple
occurrences that satisfy the above data paths. Of course an archetype may
constrain these nodes to a single occurrence but I don't see why a generic
application should need to know about the archetype to determine if they are
going to get a collection of items or a single item.
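Heath's point can be shown in a few lines of Python. The same predicate-only path matches one event in one instance and two in another, so the number of results depends on the data rather than on the path. The dict layout and the match_events helper are hypothetical. Only the archetype node id is checked, standing in for the [atxxxx] predicate.

```python
from typing import Any, Dict, List


def match_events(history: Dict[str, Any], node_id: str) -> List[Dict[str, Any]]:
    """Evaluate the equivalent of /events[node_id] against one HISTORY-like dict."""
    return [e for e in history["events"] if e["archetype_node_id"] == node_id]


# Two hypothetical instances built from the same archetype.
one_reading = {"events": [{"archetype_node_id": "at0002", "name": "first"}]}
two_readings = {"events": [{"archetype_node_id": "at0002", "name": "first"},
                           {"archetype_node_id": "at0002", "name": "second"}]}

print(len(match_events(one_reading, "at0002")))   # 1
print(len(match_events(two_readings, "at0002")))  # 2
```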

So, I would like some clarification in the specs to know:

* Should I use item_at_path or container_at_path for non-unique data paths
like above?

* Can container_of_path be used for data paths to elements with cardinality > 1, what is the container?

* Should I expect a collection of items only when there are multiple
occurrences or a collection of items always returned even when a single
occurrence exists and the archetype constrains the node to a single
occurrence?

* Can we use item_at_path with a data path to a container or non-unique path
to return the first item in a container?

Regards

Heath

Heath Frankel wrote:

Thomas,
This is sort of going down the route that I was hoping but there is one key
scenario that needs clarification which was actually the main issue that I
had with item_at_path.

So to summarise, item_at_path can be used for full data paths such as

  /data/events[atxxxx and name/value='xxx']/data/items[atxxxx and name/value='xxx']/items[atxxxx and name/value='xxx']

which returns an Element assuming that item_at_path is called on an
observation and the ITEM_STRUCTURE of the event data is a LIST. Assuming
that item_at_path returns ANY rather than LOCATABLE it can also be used for
the following path

  /data/events[atxxxx and name/value='xxx']/data/items[atxxxx and name/value='xxx']/items[atxxxx and name/value='xxx']/value

which returns a DATA_VALUE associated with the ELEMENT value.

container_at_path can be used for data paths such as

  /content
and
  /data/events

which return the collection of CONTENT_ITEMs associated with a composition
and data EVENTs associated with an OBSERVATION respectively.

However, which method should be used and what will be returned for the
following data paths?

  /content[openEHR-EHR-OBSERVATION.xxx.v1]
  
  /data/events[atxxxx]

  /data/events[atxxxx and name/value='xxx']/data/items[atxxxx and name/value='xxx']/items[atxxxx]

The point is that none of these paths are unique because they do not include
a name/value predicate so there is a chance that there are multiple
occurrences that satisfy the above data paths. Of course an archetype may
constrain these nodes to a single occurrence but I don't see why a generic
application should need to know about the archetype to determine if they are
going to get a collection of items or a single item.
  

Ok, this clarifies something which we already knew, but had not factored
into the functional interface, namely that there are 3 cases for paths:

1. paths that correspond directly to a single attribute object, like
Observation.data
2. paths that correspond directly to a container object, like History.events
3. paths that when evaluated produce a set/list as an answer.

I propose that the functions on Locatable only deal with the first 2
types of paths, which we can think of as 'static' paths, i.e. paths
whose target can be determined just by looking at the path and the
reference model schema, but not the data. In fact, it can be done just
by looking at the path itself - there can be no predicate value
expressions, only navigation elements - as long as the path elements
actually exist in the RM structure.

The 3rd kind of path is really a true Xpath or Xquery kind of path;
maybe we can call it a 'dynamic' path or a 'query' path or something.
Its result is only knowable at runtime. This kind of path is not
sensible to handle in functions on Locatable; it should only be handled
in the vEHR API, query service, etc. The functions on Locatable should
be reserved for static paths.

This means that the functions I proposed would still stand but all the
valid*path() ones would be understood as rejecting non-static paths, so
I would have to document them that way.
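One reading of the 'static path' test, as a minimal Python sketch: a path is classified using only the path text and an RM schema, never the data. The SCHEMA fragment is a hypothetical and heavily simplified stand-in, with a few attributes marked 'item' or 'container'. A path is rejected as non-static if it contains any predicate or navigates past a container attribute.

```python
from typing import Dict, Optional, Tuple

# Hypothetical RM schema fragment: class -> attribute -> (multiplicity, member class).
SCHEMA: Dict[str, Dict[str, Tuple[str, str]]] = {
    "OBSERVATION": {"data": ("item", "HISTORY")},
    "HISTORY": {"origin": ("item", "DV_DATE_TIME"), "events": ("container", "EVENT")},
    "EVENT": {"data": ("item", "ITEM_STRUCTURE")},
}


def classify_static_path(root_class: str, path: str) -> Optional[str]:
    """Return 'item' or 'container' for a static path, or None if it is not static."""
    kind, current = "item", root_class
    for attr in path.strip("/").split("/"):
        if "[" in attr:
            return None  # predicates make the target data-dependent
        if kind == "container":
            return None  # stepping past a container needs a predicate to pick a member
        entry = SCHEMA.get(current, {}).get(attr)
        if entry is None:
            return None  # attribute does not exist in the RM structure
        kind, current = entry
    return kind


def valid_item_path(root_class: str, path: str) -> bool:
    return classify_static_path(root_class, path) == "item"


def valid_container_path(root_class: str, path: str) -> bool:
    return classify_static_path(root_class, path) == "container"


print(classify_static_path("OBSERVATION", "/data/events"))          # container
print(classify_static_path("OBSERVATION", "/data/events[at0002]"))  # None
```

Because this check never touches instance data, a given path always gets the same answer.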

So, I would like some clarification in the specs to know:

* Should I use item_at_path or container_at_path for non-unique data paths
like above?
  

use the valid_xxx_path() functions first, to work out what the paths
point to.

* Can container_of_path be used for data paths to elements with cardinality > 1, what is the container?
    

I assume you mean container_at_path - based on the above - no - it can
only be used with static container paths.

* Should I expect a collection of items only when there are multiple
occurrences or a collection of items always returned even when a single
occurrence exists and the archetype constrains the node to a single
occurrence?
  

see above - this should now be clear.

* Can we use item_at_path with a data path to a container or non-unique path
to return the first item in a container?
  

Only by using [1], assuming we allow that.

- thomas

I have to say that this looks a little too complicated for running massive queries. The Xpath idea of always expecting a list back looks far more straightforward to me, and then you can test the number of returned values or even take the first one if appropriate.
Sam

Thomas Beale wrote:

Sam Heard wrote:

I have to say that this looks a little too complicated for running
massive queries. The Xpath idea of always expecting a list back looks
far more straightforward to me, and then you can test the number of
returned values or even take the first one if appropriate.
Sam

well, remember we are talking about the path functions on LOCATABLE; are
we saying we want the full power of Xpath implemented right there; are
we also saying we don't want functions that can actually return what a
path is pointing to (rather than always returning a List)?

- thomas

Sam Heard wrote:

I have to say that this looks a little too complicated for running
massive queries. The Xpath idea of always expecting a list back looks
far more straightforward to me, and then you can test the number of
returned values or even take the first one if appropriate.
Sam

I am starting to come around to this way of thinking I have to admit.
The general situation with an X-path like path is that it will return
0..n results. Although in some particular cases we can inspect the RM
and the path, and know in advance that we will get back only 0 or 1
items, I think it does make more sense for this kind of function to
always return a List<Any>, and for a subsequent inspect of the RM schema
to be used to know how to process what is in the list. So a function
items_at_path(a_path: String): List<Any> would make sense for this purpose.
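Here is a minimal Python sketch of the XPath-like option, in which items_at_path always returns a list of 0..n results. The assumptions are illustrative and not taken from any kernel: RM objects are dicts, container attributes are Python lists, and the only supported predicate is a bare archetype node id such as [at0002]. name/value predicates are left out.

```python
import re
from typing import Any, List

_SEGMENT = re.compile(r"^(\w+)(?:\[(\w+)\])?$")  # attribute, optional [node_id]


def items_at_path(root: Any, path: str) -> List[Any]:
    """Return every object the path reaches; [] when nothing matches."""
    current: List[Any] = [root]
    for segment in path.strip("/").split("/"):
        m = _SEGMENT.match(segment)
        if m is None:
            raise ValueError(f"unsupported path segment: {segment!r}")
        attr, node_id = m.group(1), m.group(2)
        found: List[Any] = []
        for obj in current:
            if not isinstance(obj, dict) or attr not in obj:
                continue
            value = obj[attr]
            members = value if isinstance(value, list) else [value]
            if node_id is not None:
                members = [x for x in members
                           if isinstance(x, dict) and x.get("archetype_node_id") == node_id]
            found.extend(members)
        current = found
    return current


observation = {"data": {"events": [
    {"archetype_node_id": "at0002", "data": {"value": 120}},
    {"archetype_node_id": "at0002", "data": {"value": 118}},
    {"archetype_node_id": "at0003", "data": {"value": 95}},
]}}

print(items_at_path(observation, "/data/events[at0002]/data/value"))  # [120, 118]
print(items_at_path(observation, "/data/events[at0009]"))             # []
```

A container attribute such as /data/events is flattened into its members here. This means the container object itself is never returned, which is the situation Rong raised earlier in the thread.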

I can also imagine wanting a function that only ever returned single
objects like EVENT_CONTEXT (off COMPOSITION) or HISTORY
(OBSERVATION.data), for the purpose of data modification. In these
cases, we could assume that the relevant returned object would carry
sufficient interface to perform modification operations. E.g. HISTORY
might carry some List operations (which will be natively available on
the List at HISTORY.events) such as clear, insert(x), extend(x),
append(a_list), remove, go(), etc. Here we are starting to be more
prescriptive about the fine-grained data modification interface of a
kernel. I don't want to do this in Release 1.0.1, but we should think
about doing it, so that the vEHR kernel interfaces from different
implementations are compatible (not necessarily the same).

Now, the astute among you will have already noticed that my second point can
only be true if the path provided can be checked to be unique with
respect to the data. It seems to me that functions:

path_unique(a_path: String): Boolean
path_non_unique(a_path: String): Boolean

could be quite useful.

If we had these, then the semantics of the functions item_at_path() and
items_at_path() are more like what Heath and Sam were imagining. You
would do:

if valid_path(a_path) then
    if path_unique(a_path) then
        x := item_at_path(a_path)
        -- process one item
    else
        list_of_x := items_at_path(a_path)
        -- iterate
    end
end

Or...do we just stay simple and always do:

if valid_path(a_path) then
    list_of_x := items_at_path(a_path)
end
if path_unique(a_path) then
    -- process one item
else
    -- iterate
end
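Thomas's first option can be sketched in Python with a data-relative path_unique, meaning "the path resolves to exactly one object in this instance". The toy resolver handles only attribute names and a bare [node_id] predicate over dicts and lists. handle is a hypothetical caller. None of these names are kernel APIs.

```python
import re
from typing import Any, List, Tuple

_SEGMENT = re.compile(r"^(\w+)(?:\[(\w+)\])?$")  # attribute, optional [node_id]


def _resolve(root: Any, path: str) -> List[Any]:
    """Toy XPath-like evaluation: every object the path reaches in this instance."""
    current: List[Any] = [root]
    for segment in path.strip("/").split("/"):
        m = _SEGMENT.match(segment)
        if m is None:
            return []  # treat an unparseable path as matching nothing
        attr, node_id = m.groups()
        found: List[Any] = []
        for obj in current:
            if isinstance(obj, dict) and attr in obj:
                value = obj[attr]
                for member in (value if isinstance(value, list) else [value]):
                    if node_id is None or (isinstance(member, dict)
                                           and member.get("archetype_node_id") == node_id):
                        found.append(member)
        current = found
    return current


def valid_path(root: Any, path: str) -> bool:
    return len(_resolve(root, path)) > 0


def path_unique(root: Any, path: str) -> bool:
    """Unique with respect to this data instance: exactly one object matches."""
    return len(_resolve(root, path)) == 1


def item_at_path(root: Any, path: str) -> Any:
    matches = _resolve(root, path)
    if len(matches) != 1:
        raise ValueError(f"precondition violated, path not unique: {path}")
    return matches[0]


def items_at_path(root: Any, path: str) -> List[Any]:
    return _resolve(root, path)


def handle(root: Any, path: str) -> Tuple[str, Any]:
    """Thomas's first option: decide per call which function to use."""
    if valid_path(root, path):
        if path_unique(root, path):
            x = item_at_path(root, path)
            return ("one", x)            # process one item
        list_of_x = items_at_path(root, path)
        return ("many", list_of_x)       # iterate
    return ("none", None)


observation = {"data": {"events": [{"archetype_node_id": "at0002", "name": "a"},
                                   {"archetype_node_id": "at0003", "name": "b"}]}}
print(handle(observation, "/data/events[at0003]/name"))  # ('one', 'b')
print(handle(observation, "/data/events/name"))          # ('many', ['a', 'b'])
```

Because path_unique is relative to the instance here, the same path can switch branches as data is added.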

reactions?

- thomas

if valid_path(a_path) then
    if path_unique(a_path) then
        x := item_at_path(a_path)
        -- process one item
    else
        list_of_x := items_at_path(a_path)
        -- iterate
    end
end

Or…do we just stay simple and always do:

if valid_path(a_path) then
    list_of_x := items_at_path(a_path)
end
if path_unique(a_path) then
    -- process one item
else
    -- iterate
end

If we have a function like count_items_at_path, which returns 0 for an invalid path, we can do the following with one fewer path query:

count := count_items_at_path(a_path)
if count = 1 then
    -- process one item
elseif count > 1 then
    -- iterate
end
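Rong's single-query variant can be sketched in Python with a hypothetical count_items_at_path that returns 0 for a path that does not exist. The toy resolver works over dicts and lists, supports an optional bare [node_id] predicate, and is included only for illustration. The usage shows that a count of 1 cannot tell a single-valued path from a container that currently holds one member.

```python
import re
from typing import Any, List

_SEGMENT = re.compile(r"^(\w+)(?:\[(\w+)\])?$")  # attribute, optional [node_id]


def _resolve(root: Any, path: str) -> List[Any]:
    """Toy XPath-like evaluation: every object the path reaches in this instance."""
    current: List[Any] = [root]
    for segment in path.strip("/").split("/"):
        m = _SEGMENT.match(segment)
        if m is None:
            return []
        attr, node_id = m.groups()
        found: List[Any] = []
        for obj in current:
            if isinstance(obj, dict) and attr in obj:
                value = obj[attr]
                for member in (value if isinstance(value, list) else [value]):
                    if node_id is None or (isinstance(member, dict)
                                           and member.get("archetype_node_id") == node_id):
                        found.append(member)
        current = found
    return current


def count_items_at_path(root: Any, path: str) -> int:
    """Hypothetical: number of objects at the path; 0 if the path does not exist."""
    return len(_resolve(root, path))


def dispatch(root: Any, path: str) -> str:
    count = count_items_at_path(root, path)
    if count == 1:
        return "process one item"
    elif count > 1:
        return "iterate"
    return "nothing to do"


one_event = {"data": {"events": [{"archetype_node_id": "at0002"}]}}
two_events = {"data": {"events": [{"archetype_node_id": "at0002"},
                                  {"archetype_node_id": "at0002"}]}}

# The same container path lands in different branches depending on
# how many events this particular instance happens to hold.
print(dispatch(one_event, "/data/events"))   # process one item
print(dispatch(two_events, "/data/events"))  # iterate
print(dispatch(one_event, "/data/bogus"))    # nothing to do
```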

/Rong

Dear All
Why not just get them and then check that you only have one - that is how queries work everywhere else.

Sam,
It is this checking what we get back that we want to avoid. Also, when you query something you know what you expect back (because you state it in the query). The problem here is that what we can get back may change depending if there is a single occurrence or multiple occurrences.

We need to know that we are going to get either a LOCATABLE or a Collection<LOCATABLE> every time for the same path against any data instance. (Note that I use collection, not container, as a container may be a LOCATABLE, such as SECTION and CLUSTER.)

Rong,
The problem with count_items_at_path is that a count of 1 does not distinguish between a path to a collection with 1 item (which may have more items another time) and a path that always has 1 item. If we know within the context of the application that a path will always return a single occurrence, we should be able to just -- process one item, and if we know that there may be more than one instance, we just -- iterate. But when the code is generic and we don’t know whether the path will return a unique item or not, we need a way to distinguish between paths that will return a LOCATABLE and those that will return a Collection<LOCATABLE>.

Tom,
I wonder if path_exists may be a better name than valid_path, because valid_path could be misread as meaning that the path is valid against an archetype/template rather than against a data instance. I know this means a late attribute name change, but I think it will have significant benefit.

I think your first example is the way to go, but this could also work with the one function item_at_path assuming that you have unique_path function to check what you are likely to get in return.

Heath

Hi all,

I would go with the Xpath way - you can never be absolutely SURE you won’t get the wrong thing, or a null (even if the rules say you won’t). Why should you know what you will get from a query in advance? It is not something that you would normally know.

Cheers, Sam

Heath Frankel wrote:

Sam,
When you write an SQL query, you specify the columns that you expect to be returned (even if it is all the columns in the table). When you specify an XPath expression, you usually know what kind of node you are expecting back: an element, attribute or text node. Sure, if you get the path wrong you may get something that you didn’t expect, or a null, but that is the idea of the valid_path method: to help write more robust code. We all know that XSLT (using XPath) is not the most robust and friendly programming language, and I don’t think we should use it as a model programming language.

I would be happy to have item_at_path return null if a node does not exist but we should provide this kind of guidance in the documentation if that is the case (perhaps an invariant stating that item_at_path returns null when valid_path is false). But then what is the value of the valid_path function if we don’t need to use it to determine if a node exists?

Heath

Ah, the latest version of the R1.0.1 candidate Common RM has added a precondition and a postcondition to item_at_path: it requires that valid_path be true and ensures that item_at_path does not return null.

Heath Frankel wrote:

Sam,
It is this checking what we get back that we want to avoid. Also,
when you query something you know what you expect back (because you
state it in the query). The problem here is that what we can get back
may change depending if there is a single occurrence or multiple
occurrences.

We need to know that we are going to get either a LOCATABLE or a
Collection<LOCATABLE> every time for the same path against any data
instance. (Note that I use collection, not container as a
container may be a LOCATABLE, such as SECTION and CLUSTER).

Rong,
The problem with count_items_at_path is that having a count = 1 does
not distinguish between a path to a collection with 1 item (which may
have more items another time) and a path that always has 1 item. If
we know within the context of the application that path will return a
single occurrence always we should be able to just -- process one item
and if we know that we have the possibility to have more than one
instance that we just -- iterate. But when the code is generic and we
don't know if the path will return a unique item or not then we need a
way to distinguish between paths that will return a LOCATABLE or a
Collection<LOCATABLE>.

Tom,
I wonder if path_exists may be a better name than valid_path because
the semantics of valid_path can be confused to mean that the path is
valid against an archetype/template rather than valid against a data
instance. I know this means a late attribute name change, but I think
it will have significant benefit.

good point - I agree.

I think your first example is the way to go, but this could also work
with the one function item_at_path assuming that you have unique_path
function to check what you are likely to get in return.

So the two-function approach is as follows:

if path_exists(a_path) then
    if path_unique(a_path) then
        x := item_at_path(a_path)
        -- process one item
    else
        list_of_x := items_at_path(a_path)
        -- iterate
    end
end

Otherwise we go with the more Xpath approach and just make
item_at_path() always return a List<Any>.

Note that neither of these functions returns Locatable or List<Locatable>
- it has to be Any if we want to process paths right down to the lowest
grain, and to things that are not Locatable inheritors. We could add
another couple of functions to return these types, but it may be overkill.

- thomas

Heath Frankel wrote:

Sam,
It is this checking what we get back that we want to avoid. Also,
when you query something you know what you expect back (because you
state it in the query). The problem here is that what we can get back
may change depending if there is a single occurrence or multiple
occurrences.

We need to know that we are going to get either a LOCATABLE or a
Collection<LOCATABLE> every time for the same path against any data
instance. (Note that I use collection, not container as a
container may be a LOCATABLE, such as SECTION and CLUSTER).

Rong,
The problem with count_items_at_path is that having a count = 1 does
not distinguish between a path to a collection with 1 item (which may
have more items another time) and a path that always has 1 item. If

Agree here.

we know within the context of the application that path will return a
single occurrence always we should be able to just -- process one item
and if we know that we have the possibility to have more than one
instance that we just -- iterate. But when the code is generic and we
don’t know if the path will return a unique item or not then we need a
way to distinguish between paths that will return a LOCATABLE or a
Collection<LOCATABLE>.

Tom,
I wonder if path_exists may be a better name than valid_path because
the semantics of valid_path can be confused to mean that the path is
valid against an archetype/template rather than valid against a data
instance. I know this means a late attribute name change, but I think
it will have significant benefit.

good point - I agree.

I think your first example is the way to go, but this could also work
with the one function item_at_path assuming that you have unique_path
function to check what you are likely to get in return.

So the two-function approach is as follows:

if path_exists(a_path) then
    if path_unique(a_path) then
        x := item_at_path(a_path)
        -- process one item
    else
        list_of_x := items_at_path(a_path)
        -- iterate
    end
end

Otherwise we go with the more Xpath approach and just make
item_at_path() always return a List<Any>.

Note that neither of these functions returns Locatable or List<Locatable>
- it has to be Any if we want to process paths right down to the lowest
grain, and to things that are not Locatable inheritors. We could add

I think so too. We need to support paths that can return attributes of native types.

another couple of functions to return these types, but it may be overkill.

It will be handy to have functions like string_at_path(), integer_at_path() etc. if we do this. The benefit is that the client code doesn’t need to loop through the types and cast the return value to the right type. If these seem too fine-grained, we could have primitive_type_at_path() and domain_type_at_path() to at least get the root type of these values.
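Rong's typed accessors could be layered over an Any-returning lookup. This is a minimal Python sketch that assumes a toy attribute-only resolver over dicts. string_at_path and integer_at_path are the names Rong suggests. The generic _typed_at_path helper and the choice to raise TypeError on a mismatch are illustrative assumptions.

```python
from typing import Any, Type, TypeVar

T = TypeVar("T")


def item_at_path(root: dict, path: str) -> Any:
    """Any-returning lookup down to the lowest grain (attribute-only paths)."""
    obj: Any = root
    for attr in path.strip("/").split("/"):
        obj = obj[attr]
    return obj


def _typed_at_path(root: dict, path: str, expected: Type[T]) -> T:
    value = item_at_path(root, path)
    if not isinstance(value, expected):
        raise TypeError(f"{path} holds {type(value).__name__}, not {expected.__name__}")
    return value


def string_at_path(root: dict, path: str) -> str:
    return _typed_at_path(root, path, str)


def integer_at_path(root: dict, path: str) -> int:
    value = _typed_at_path(root, path, int)
    if isinstance(value, bool):  # bool is a subclass of int in Python
        raise TypeError(f"{path} holds bool, not int")
    return value


element = {"name": {"value": "Systolic"}, "value": {"magnitude": 120, "units": "mm[Hg]"}}
print(string_at_path(element, "/name/value"))        # Systolic
print(integer_at_path(element, "/value/magnitude"))  # 120
```

Callers get a statically typed result, and the single runtime check is moved into the accessor.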

Regards,
Rong