# Named element - and occurences

**Category:** [Archetype Designer](https://discourse.openehr.org/c/archetype-designer/30)
**Created:** 2022-04-04 10:32 UTC
**Views:** 1568
**Replies:** 71
**URL:** https://discourse.openehr.org/t/named-element-and-occurences/2492

---

## Post #1 by @bna

A customer of ours is using Archetype Designer (tools.openehr.org) to create templates. It is possible to give the element a specific name and still have the element occurrences set to unlimited. In the generated OPT the definition is like:

```
NEW_NAME_OF_ELEMENT
```

As we understand this, the name of the element is constrained to only allow a name like `NEW_NAME_OF_ELEMENT`, which in turn makes it impossible to repeat the element and still have it uniquely addressable with a path.

There is a ticket in process to add a LOCATABLE.sequence_id (https://openehr.atlassian.net/browse/SPECRM-63). Before this is resolved and implemented in the specifications and software, it will not be possible to change the name of an element and still have multiple occurrences.

Any other views on this?

---

## Post #2 by @Seref

Just to make sure I understand the issue/question:

[quote="bna, post:1, topic:2492"]
It is possible to give the element a specific name and still have the element occurrences to unlimited.
[/quote]

You mean: it should be possible to constrain the value for the name of an element to a specific value, but have unlimited instances of that element in the data, each valid according to the constraint on the value of name. Did I get it right?

[quote="bna, post:1, topic:2492"]
As we understand this the name of the element is constrained to only allow a name like `NEW_NAME_OF_ELEMENT`, which in turn makes it impossible to repeat the element and still have it uniquely addressable with a path
[/quote]

Interesting one. I think this depends on the interpretation of a constraint on the name of an element. If we're interpreting the specificity of the value of the name as uniqueness, i.e.
there should be only one element at this RM path with name having this value, then the tool's behaviour sounds correct. If the specificity of the value of the name means elements at this path can have only the following name, regardless of how many elements may exist, then the tool's behaviour is not correct.

I'd expect the latter to be the semantics of the constraint on the value of name. A constraint assigned to a string value of a property having side effects on cardinality does not sound right to me. I.e. specificity implying uniqueness.

I'm not sure if this is the kind of view you were expecting, but for what it's worth this is mine :)

---

## Post #3 by @bna

[quote="Seref, post:2, topic:2492"]
I’d expect the latter to be the semantics of the constraint on the value of name. A constraint assigned to a string value of a property having side effects on cardinality does not sound right to me. I.e. specificity implying uniqueness.
[/quote]

I agree with this. This is the intention of the clinical modeller. Still, the RM implementation then has a problem creating unique paths to the LOCATABLE elements. In our implementation we use the # annotation to give each repeating locatable a unique name, like this: `NAME#`

How do other implementers handle this?

---

## Post #4 by @yampeku

[quote="Seref, post:2, topic:2492"]
You mean: it should be possible to constraint the value for the name of an element to a specific value, but have unlimited instances of that element in the data, each valid according to constraint on the value of name.
[/quote]

I think this is the case for temporal series of repeated measures (for example)

---

## Post #5 by @bna

[quote="yampeku, post:4, topic:2492"]
I think this is the case for temporal series of repeated measures (for example)
[/quote]

Yes. This is one of the use cases for repeated data like elements, clusters or entry archetypes. The problem here is how to interpret the constraint on the name.
Should it be interpreted as given and not to be changed? There are several use cases where the name is defined as fixed by the modeller. Or should it be interpreted as the prefix for the name? AFAIK many implementations use a postfix with #id for repeatable structures. So do we. But for this kind of template we assume the modeller has set a fixed name.

---

## Post #6 by @yampeku

If it's in an archetype then I assume it's fixed to that value. If it's in a template you are probably redefining the semantics of the node, so an atXXXX.1, atXXXX.2, atXXXX.n would be needed, where you can fix the value of the name to all the alternatives you need

---

## Post #7 by @Seref

Before I forget, this is my take from today's discussion (which took place elsewhere), kind of a set of meeting minutes:

- There's nothing in the openEHR specs that indicates that constraining the specificity of a name of an element should imply uniqueness of that element in terms of cardinality, if the element is under a type that allows anything beyond 0..1
- Modelling tools, however, interpret a constraint on a name as a constraint on its cardinality as well. Ocean Template Designer and Better's Archetype Designer both modify the cardinality of an element in this case.
- Users are reportedly manually modifying the generated models to relax the cardinality constraint, therefore reverting the interpretation of the tools.
- Bjorn's users do not care about cardinality, they just need the element to have only one name.
  - (I think) this is to ensure queries can fetch this data element

There are two ways the spec can clarify this ambiguity: at the RM level, or at the ADL (constraint) level. This is a discussion to be had in the future in SEC.
The most pragmatic thing for @bna may be to follow other users and modify the tooling output, and also suggest to Better that their tool displays a warning that constraining the name will modify the cardinality as well (@erik.sundvall 's suggestion as far as I can remember)

@heath.frankel I suspect you have some valuable historical background for this, as usual, so feel free to educate me here.

(as the various edits prove, I'm tired, so I'll go have some tea now)

---

## Post #8 by @yampeku

[quote="Seref, post:7, topic:2492"]
There’s nothing in the openEHR specs that indicates that constraining the specificity of a name of an element should imply uniqueness of that element in terms of cardinality, if the element is under a type that allows anything beyond 0…1
[/quote]

I was a bit lost during the meeting, but this has lost me completely:

The name attribute on ELEMENT has a cardinality of 1..1. The value of a DV_TEXT inside of a name also has 1..1 cardinality. When we talk about "lists" in this context (and fixing to a specific value), are we talking about list constraints of the string inside the DV_TEXT? Alternatives of DV_TEXT? Is fixing to a specific text selecting a valid string from an actual list of strings coming from the parent archetype, or is it selecting a DV_TEXT alternative?

---

## Post #9 by @heath.frankel

Quick answer for now. There is much history here, and from memory you are correct: there was always an undocumented rule that the name of a node needed to be unique within its container. This applied at the RM level.

In order for templates to be implementable against the RM, not just logical, when a constraint was applied to the name attribute of a node, the maxOccurs for that name-constrained (cloned) node was explicitly set to 1.

Although this ensured templates would be implementable against this undocumented RM rule, it made logical modelling difficult where the name was constrained purely for labelling the node and not intended to constrain the occurrences.
The cause of this problem was that in ADL1.4, we did not have any means to reference a constrained node other than using a name constraint. I understand this has been resolved in ADL2.

I believe there was a Jira card raised for this issue to make it explicit in the RM that names do NOT need to be unique within a container, and Ocean proceeded to change their EHR libraries to allow this. However, the TemplateDesigner was not changed, partly to ensure this was done in a coordinated manner across the openEHR tooling chain including CKM, Archetype Designer, etc.

Hope this helps.

---

## Post #10 by @joostholslag

Am I correct that this is no longer an issue in ADL2? I think I’ve done this multiple times without issue. (The only issue was redefinition of a mandatory node; that’s been fixed recently)

---

## Post #11 by @Seref

You have every right to feel lost here, it took us a few rounds of communication to land on the same page :)

When we talk about lists, we're talking about containers, as Heath correctly said in his message. So we're talking about things that can contain more than one ELEMENT, i.e. subtypes of ITEM_STRUCTURE. The situation at a high level is: when a constraint is introduced for the name of an ELEMENT, the ITEM_STRUCTURE subtype containing the ELEMENT with that constraint cannot contain more than 1 instance.

I don't know what the tools do exactly for ITEM_TREE, ITEM_LIST etc. but that's the general behaviour we're talking about. Bjorn can clarify where the ELEMENT in his archetype was sitting when this problem emerged :)

---

## Post #12 by @yampeku

I don't think we enforce this behavior BTW

Probably related to the fact that we usually fill the name attribute in the mapping phase to allow multilinguality

---

## Post #13 by @thomas.beale

[quote="heath.frankel, post:9, topic:2492"]
The cause of this problem was that in ADL1.4, we did not have any means to reference a constrained node other than using a name constraint. I understand this has been resolved in ADL2.
[/quote]

We have not quite resolved it - to do so would mean allowing the formation of paths using Xpath-like predicates, e.g. `items[3]/events[1]` and so on, if we wanted to access the Nth item / event.

The original idea of requiring LOCATABLE.name to be unique among siblings was that runtime names should have clinical meaning. A runtime path of the form `items[at0004, 'name1']` (meaning `items[at0004 and name/value = 'name1']`) will access the at0004 node that has its `name` attribute = `name1`. It has been historically assumed that these names had to be unique in order to construct such paths.

If we forget the unique name requirement, which we gave up on some years ago, it means a path like the above could select two or more nodes. Thus, some other kind of predicate is needed to uniquely select nodes in runtime data. The most recent proposal (5y ago) was `LOCATABLE.sequence_id`, which would allow this to work, or we could just use plain list order, as above, i.e. `items[3]` etc.

---

## Post #14 by @yampeku

CDA uses that kind of path in implementation guides, by the way. If you can ensure that objects are ordered inside (which you can mandate in ADL) then paths using position() are just fine

---

## Post #15 by @Seref

Hello @thomas.beale @bna @yampeku

I'm pinging again about this because I just came across some data today that was committed with ..name_1, ...name_2 values, and while searching the forum I found this topic, which I had completely forgotten.

Tom: I have a question. Your comment above is interesting:

[quote="thomas.beale, post:13, topic:2492"]
It has been historically assumed that these names had to be unique in order to construct such paths. If we forget the unique name requirement, which we gave up on some years ago, it means a path like the above could select two or more nodes
[/quote]

What's the problem here? There are many cases in which a path can return N results, `COMPOSITION\content` being an obvious one.
What's the problem with assuming a path may return N results? It's all over the RM in the form of fields with cardinality greater than 1 and we're already living with it in AQL implementations.

Can we progress with this in SEC? I think it's a pretty fundamental thing and I'm of the mind that we should remove the unofficial rule Heath mentioned above, but I'm happy to have a conversation going as a starting point :slight_smile:

---

## Post #16 by @thomas.beale

[quote="Seref, post:15, topic:2492"]
What’s the problem here? There are many cases in which a path can return N results
[/quote]

Yep - that's not the problem (and it's why there are [functions returning multiple items in PATHABLE](https://specifications.openehr.org/releases/RM/latest/common.html#_pathable_class)). The question is: is there a way to build a path that refers to a specific leaf deep in a hierarchy? I.e. a so-called runtime path.

The original problem posed by @bna was whether you can have the `name` field constrained to X and still have multiple occurrences of that archetype node. Answer: you can (that's the rule we changed), but it means you now can't count on the `name` field as a guaranteed node identifier to form a run-time unique path. Which means such paths need to use some other field, or else ordinal position (i.e. `items[3]` or similar).

We have a long-term idea currently known as `sequence_id` (see [this CR](https://openehr.atlassian.net/browse/SPECRM-63)) that would provide another field guaranteed to have a unique id (without killing the system with UIDs or whatever), to help the formation of such paths.

Evidently, solving this issue (not too hard technically) hasn't been a high priority, or we would have done it.
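
To make the distinction concrete, here is a minimal Python sketch of a name-based predicate matching more than one sibling once names are no longer unique, while ordinal position still picks out a single node. It is not an openEHR path implementation: the `Node` class, the `select` helper and the sample data are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical stand-in for a LOCATABLE sibling inside a container.
    archetype_node_id: str
    name: str
    value: str


def select(items, node_id, name=None):
    """Return every sibling matching the node id and, if given, the name."""
    return [
        n for n in items
        if n.archetype_node_id == node_id and (name is None or n.name == name)
    ]


# Two siblings share both the at-code and the (fixed) name.
items = [
    Node("at0004", "sample", "first"),
    Node("at0004", "sample", "second"),
    Node("at0005", "comment", "n/a"),
]

# Rough analogue of items[at0004 and name/value = 'sample']: two matches.
by_name = select(items, "at0004", "sample")

# Ordinal position is unambiguous (zero-based here, as in Python lists).
by_position = items[1]
```

Python lists are zero-based; which base an `items[3]`-style path would use is not settled in this thread.
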

---

## Post #17 by @yampeku

[quote="thomas.beale, post:16, topic:2492"]
without killing the system with UIDs or whateve
[/quote]

But it says it's mandatory in the description, I would assume at least make it optional and use it only where needed

---

## Post #18 by @thomas.beale

[quote="yampeku, post:17, topic:2492"]
mandatory in the description
[/quote]

Yep the idea is that `sequence_id` (sometimes known as an 'accession number') would always be set on every item added under a container attribute, so then it's reliable.

We didn't implement it to date, because it gets tricky over versioning with node deletions, and keeping track of the highest code. Needs a bit more design thinking.

---

## Post #19 by @Seref

[quote="thomas.beale, post:16, topic:2492"]
Answer: you can (that’s the rule we changed),
[/quote]

Ok, this was the bit that was not clear to me. I didn't realise this change took place. Is it in the spec/documented now?

[quote="thomas.beale, post:16, topic:2492"]
but it means you now can’t count on the `name` field as a guaranteed node identifier to form a run-time unique path. Which means such paths need to use some other field, or else ordinal position (i.e. `items[3]` or similar).
[/quote]

that is now a relevant, but separate requirement IMHO. Personally, I apologise for being pedantic, but I disagree with this:

[quote="thomas.beale, post:16, topic:2492"]
it means you now can’t count on the `name` field as a guaranteed node identifier to form a run-time unique path
[/quote]

the path is always unique since there is only one `name` property in the RM. It's the value at the path that may not be singular, and that is not a problem because we know we're pointing at the `name` of something that is sitting in a container thanks to RM. So I'll be even more annoying and redefine your comment as **no longer being able to piggyback the artificial cardinality constraint as index access [0]**.
:)

I am not sure how much real world value we could get from sequential access to containers: why would `[i]` be of interest to someone compared to `[i-1]` or `[i+1]`?

So if the change you mentioned above is made, then this issue is actually solved. Indexed access and its benefits is another discussion IMHO. :)

---

## Post #20 by @thomas.beale

[quote="Seref, post:19, topic:2492"]
Ok, this was the bit that was not clear to me. I didn’t realise this change took place. Is it in the spec/documented now?
[/quote]

https://openehr.atlassian.net/browse/SPECRM-27

[quote="Seref, post:19, topic:2492"]
Personally, I apologise for being pedantic, but I disagree with this: it means you now can’t count on the `name` field as a guaranteed node identifier to form a run-time unique path
[/quote]

I used to, but was convinced otherwise. I blame @heath.frankel ;) But more seriously, various people close to implementation said similar things about that rule being too restrictive.

[quote="Seref, post:19, topic:2492"]
the path is always unique since there is only one `name` property in the RM
[/quote]

Not sure what you mean by 'the path is always unique' - to me that means: a path with fixed 'name' values always picks out exactly one item in a tree. Which it is not guaranteed to, due to the relaxation of the unique name rule.

[quote="Seref, post:19, topic:2492"]
I am not sure how much real world value we could get from sequential access to containers: why would `[i]` be of interest to someone compared to `[i-1]` or `[i+1]` ?
[/quote]

The only interest is that once you have such a path and know which node it corresponds to, that correspondence is guaranteed unique and correct forever. Otherwise, I agree - that's why we have not actioned that CR, and maybe we never will. It's really just a placeholder for the problem of unique paths that are not difficult to construct.
Personally, I would still prefer if the `name` field were unique across sibling objects in a container - that was its intent, and it makes for easy unique runtime path construction. The problem of managing uniqueness *within archetypes* while still being able to constrain the `name` to something could have been solved by allowing the `name` field constraint to be a regex or pattern with wildcards e.g. `"sample*"`. It would be up to the runtime system to ensure unique name values, e.g. like `"sample#1"`, `"sample#2"`, etc.

Another alternative was to allow the fixing of the `name` field, and also occurrences = 1, but then to allow more clones of the same object, with no `name` field constraint, or a pattern-based one. All possible in ADL2.

But the current solution is ok as well.

---

## Post #21 by @Seref

[quote="thomas.beale, post:20, topic:2492"]
Not sure what you mean by ‘the path is always unique’ - to me that means: a path with fixed ‘name’ values always picks out exactly one item in a tree.
[/quote]

I was being really picky, but that's not important, let's leave that aside. My point is that the interpretation of a fixed name that implies uniqueness of the object with that name may be valid, but IMHO it is not compatible with the most common interpretation of containers with elements: a list.

Your interpretation is that of a set. People close to implementation are inclined to think in terms of a list most of the time, but then they see set semantics enforced (well, they saw, now that it has changed) in a tricky way, and they get confused.

There are some interesting questions here though. When are set semantics required? I'd love to hear some clinical examples. @heather.leslie @ian.mcnicoll ? If the uniqueness of elements of the set is determined by the name, does this also indicate their terminology/term mappings should be unique?
The use of lists in the RM types is ubiquitous, but I have not seen a dedicated container type (set) used in entry subtypes etc. So how do the modellers express this? Clusters with different elements with cardinality `0..1`? Something else?

Just say "you're lost" if I completely lost the plot here and I'll go and do my homework :slight_smile:

---

## Post #22 by @thomas.beale

[quote="Seref, post:21, topic:2492"]
Your interpretation is that of a set. The reason people close to implementation are inclined to think in terms of a list
[/quote]

Partly right. Sets usually imply that order is not significant, but we always treat order as significant in openEHR, hence the use of `List<>` containers. (See your second question).

Just for reference - consider the [definition of LOCATABLE.name in the spec](https://specifications.openehr.org/releases/RM/latest/common.html#_locatable_class).

Let's ask the question: why at runtime would anyone give two sibling elements identical names, if the `name` field is the one intended to identify *at runtime* each element? If it's not the name field, what other field makes sibling items uniquely identifiable at runtime (or you could equivalently ask, what set of fields taken together form a unique key for an object)?

That was the original design reasoning. I still think it was correct ;)

---

## Post #23 by @Seref

[quote="thomas.beale, post:22, topic:2492"]
but we always treat order as significant in openEHR, hence the use of `List<>` containers
[/quote]

hmm, that's the XSD:sequence semantics. I'll take your word for it if you say that's how it's defined.

[quote="thomas.beale, post:22, topic:2492"]
Let’s ask the question: why at runtime would anyone give two sibling elements identical names, if the `name` field is the one intended to identify *at runtime* each element?
[/quote]

Please travel to the original question that started this thread.
I revisited this thread because I came across a number of name_1, name_2 instances in data, where it made sense for the elements to have the same name but they ended up as _1, _2 etc. due to the now removed restriction. Example: a list of infection control surveillance actions taken (not RM actions, but actions as in human actions). It is common sense to expect something like "a list of actions taken by the infection control professional" to be represented in a container type in the RM. So I think this is proof that someone may indeed give identical names to two sibling elements during runtime.

---

## Post #24 by @thomas.beale

[quote="Seref, post:23, topic:2492"]
Common sense to expect something like “a list of actions taken by the infection control professional” to be represented in a container type in the RM. So I think this is proof that someone may indeed give identical names to two sibling elements during runtime
[/quote]

Well the user would, like 'sample' or whatever, but if it were me, I'd want the system to either add ordinal numbers or time-stamps, so I could later see the difference. Consider I come back 3 months later (as a clinical user) and want to understand what distinguishes those (say) 10 samples - the names are useless (I know they are 'samples') - I have to search for something else. Hopefully the application knows that in Observations, timestamps can be found on the Event objects, but a generic viewer (e.g. a patient portal) that might not even have the archetypes can't display the samples in any meaningful way to the user.

Removing the unique name rule doesn't mean systems can't do unique naming of course.

Anyway, I lost that argument a long time ago, so implementers are presumably finding other solutions that work. As it should be.

---

## Post #25 by @bna

[quote="thomas.beale, post:24, topic:2492"]
Removing the unique name rule doesn’t mean systems can’t do unique naming of course.
Anyway, I lost that argument a long time ago, so implementers are presumably finding other solutions that work. As it should be.
[/quote]

Given a template with, e.g., two `openEHR-EHR-ACTION.procedure.v1` archetypes: the first one is given the name "P1_NO" and the second "P2_NO" in the Norwegian language, and "P1_EN" and "P2_EN" in the English language, as seen in the user interface of the Archetype Designer.

The OPT is generated with Norwegian as the primary language.

In Better Archetype Designer the OPT will be given a constraint on the name to "P1_NO" for the first instance and "P2_NO" for the latter.

We think the data then MUST have the given name. And if there are multiple occurrences each instance MUST have the same name. And also that the same name will be used to define each element for any language based on the OPT.

As a consequence there is no way to address a specific locatable by using the name. To be able to address a specific procedure instance using e.g. AQL or EHR_URI, some other mechanism must be used. We suggest either:

a) Use the index in the list, i.e. 0 for the first item, like `/content[0 and name/value='P2_NO']`
b) Use the UID for each locatable - `/content[uid/value='some_guid' and name/value='P2_NO']` (here the name is not needed since the guid will identify a unique instance)

Note that the term definition will use the localised name. We think this is used for user-interface purposes like the Archetype Designer or a form renderer. But the data MUST have the given constraint for name.

Note also that the name of the item is defined by the primary language when generating the OPT. This means that the "same item" will have different paths (name/value) depending on the primary language of the OPT. This is something the person generating the OPT must consider. There is a risk of defining data that is not compatible between languages and installations.
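
A small Python sketch of that risk, assuming hypothetical in-memory data and a hypothetical `select_by_name` helper (this is not how any CDR resolves paths): content committed against the Norwegian OPT is not found when queried with the English name, and a name match alone does not single out one instance.

```python
# Hypothetical content committed against an OPT whose primary language is Norwegian.
content = [
    {"archetype_node_id": "openEHR-EHR-ACTION.procedure.v1", "name": "P1_NO"},
    {"archetype_node_id": "openEHR-EHR-ACTION.procedure.v1", "name": "P2_NO"},
    {"archetype_node_id": "openEHR-EHR-ACTION.procedure.v1", "name": "P2_NO"},
]


def select_by_name(items, name):
    """Return (index, item) pairs whose name equals the given value."""
    return [(i, item) for i, item in enumerate(items) if item["name"] == name]


# Querying with the name fixed in the Norwegian OPT finds both repeated instances.
norwegian_hits = select_by_name(content, "P2_NO")

# Querying with the English label finds nothing in the same data.
english_hits = select_by_name(content, "P2_EN")
```

The indexes returned alongside each match correspond to option a) above: the name narrows the candidates, and the position in the list distinguishes them.
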
Two differences between Ocean's Template Designer and Better's Archetype Designer:

a) Ocean's does not update the term definition, only the constraint on name
b) Better's allows multiple occurrences of items with a constrained name.

---

## Post #26 by @thomas.beale

[quote="bna, post:25, topic:2492"]
As a consequence there is no way to adress a specific locatable by using the name. To be able to adress a specific procedure instance using i.e. AQL or EHR_URI some other mechanism must be used. We suggest either: a) Use the index in the list, i.e. 0 for the first item like /content[0 and name/value=‘P2_NO’] b) Use the UID for each locatable - /content[uid/value=‘some_guid’ and name/value=‘P2_NO’] (here the name is not needed since the guid will identify a unique instance
[/quote]

Option b) I think is very unpalatable, since it would force the addition of GUIDs absolutely everywhere, which could nearly double the size of EHRs, but also causes other problems (which I have analysed in the past for ISO 13606).

Option a) would be an 'accession number' or 'sequence id', the topic of [SPECRM-63](https://openehr.atlassian.net/browse/SPECRM-63).

If this were done, then it is a short step back to just appending that number to each new node when it is created, and we are back to names like 'name_2', 'name#2' or similar. If all openEHR systems assumed that a trailing '#xxx' in the name field was the sequence id, then it becomes easy to figure out the 'name' (i.e. the bit before the '#') and also no complexity is needed to find either a specific node (query with 'name#N') or all nodes with a certain name (query with 'name*' or 'name$').

If we don't do that, we need a specific field, and the current proposal is sequence_id. (I'm not sure if the proposal in that CR is too complicated though... we should revisit that).

---

## Post #27 by @Seref

[quote="bna, post:25, topic:2492"]
We think the data then MUST have the given name.
And if there are multiple occurences each instance MUST have the same name.
[/quote]

That's what I advocated as well.

[quote="bna, post:25, topic:2492"]
As a consequence there is no way to adress a specific locatable by using the name. To be able to adress a specific procedure instance using i.e. AQL or EHR_URI some other mechanism must be used
[/quote]

I always find this requirement intriguing. Can you expand the AQL scenario a bit here? Specifically, how do you know the specific locatable is at index N? What makes it specific compared to others sitting in the same collection? If there's something about the element at index N that makes it unique, would it not be more robust to use that criterion in the AQL query?

The other issue is the ordering of elements in a result set in AQL: would not JSON serialisation/deserialisation roundtrips and the technology used by the AQL implementation make it hard, if not impossible, to assume/guarantee your AQL query will always return element E at index N?

---

## Post #28 by @ian.mcnicoll

The main value of having a specific identifier for a specific path would be for referencing that item from an external system, e.g. FHIR, or even internally. I agree it is less likely as part of AQL, other than just to pick up that reference directly

---

## Post #29 by @yampeku

also for transformation and validation

---

## Post #30 by @bna

[quote="Seref, post:27, topic:2492"]
That’s what I advocated as well.
[/quote]

I wonder what is implemented on this? Do all follow this principle?

---

## Post #31 by @bna

[quote="Seref, post:27, topic:2492"]
I always find this requirement intriguing. Can you expand the AQL scenario a bit here? Specifically, how do you known the specific locatable is in index N ? What makes it specific compared to others sitting in the same collection?
[/quote]

I am not sure if I am able to explain here - but I will give it a try.
To be able to query an item in a list, like the example given with one or more ACTION.procedure placed in content, some logic must be applied by the CDR. It has to read the incoming composition and build an index of the items. The index must be stored along with the data in the specific version. The compositions must, of course, be immutable for each version. The serialized form with the index must not be changed.

The use cases for such queries are not common. I have used it a few times to query the n-first items of a known collection. In these use cases the data within each item is not important - it was only needed to query and display the n-first items.

Most use cases with AQL will query for specific data, e.g. the procedure name/type defined by some terminology.

---

## Post #32 by @bna

[quote="bna, post:31, topic:2492"]
In these use-cases the data within each item is not important - it was only needed to query and display the n-first items.
[/quote]

An obvious response to this could be: "Why not use order and limit in the AQL?" I could have done that - but what to order by? Since the only knowledge I have about the data is the ordered sequence in the composition, I have nothing else to order the data on.

---

## Post #33 by @Seref

Thanks, I appreciate the time you took to respond @bna

I think I can see now that your response above was not a specific question, but more of an elaboration of the situation one could end up with if the unique name constraint is not enforced, which we both agree that it should not be (enforced).

The clinical scenario you have in mind seems to be (I'm beginning to sound like clippy): I have a bunch of activities recorded, and they all have the same name, and if I need to reference one or more of these, then what?

It is a pretty good question and I think the answers would be different for AQL and EHR URL scenarios.
I think AQL can do pretty well here given ACTION is a type likely to be supported in the FROM clause and AQL gives us the predicate constraint syntax. So you could eliminate/choose using `... ACTION act[atcode AND something_other_than_name/value = 'criteria'...]` Syntax-wise, predicate constraints can be on any property and logical operators are also supported, so it may help you (assuming I got things right).

EHR URL/path is more interesting because **(as far as I can remember)** it does not have predicates, which leads to a new can of worms: what if we considered paths with predicates, and made EHR URLs the Xpath to our Xquery (AQL)? If my memory is wrong and it does support predicates in paths, then the same solution above applies.

I'll stop now, in case I'm building a tower of assumptions here, but if I got it and you want to discuss a specific use case for accessing activities, happy to discuss that.

---

## Post #34 by @ian.mcnicoll

From memory, the original requirement to have unique names was really driven by the need for Template Designer to distinguish unique paths for different constraints on a 'cloned' node, rather than a run-time requirement primarily. However that behaviour drove CDRs to place pseudo-sequenceIDs onto sibling node names, 'diagnosis#1' etc.

That, in theory, has started to go away, at least in AD, where within the raw templates cloned nodes are given a different nodeId (ADL2 syntax), but this does not find its way into the OPT.

So let's park the design-time need and concentrate on the run-time need to distinguish sibling nodes with identical paths, which is primarily about having a unique ID or path that can be referenced correctly, e.g. to provide a FHIR resourceID, or a reference/path as part of an EHR_URI.
Whilst it is theoretically possible to construct a unique path based on predicates, I don't think this is really practical, as the distinguishing attribute is largely impossible to predict.

Let's say the use case is a list of problem/diagnosis archetypes in a problem list, where each entry needs to be uniquely referenceable.

Example procedure list

| Date | Procedure | Laterality | Comment |
| -------- | ------------ | -------- | -------- |
| 1995 | Hip replacement | | |
| 1996 | Hip replacement | | Tricky procedure |
| 2021 | Hip replacement | Left | First one failed |
| 2021 | Hip replacement | Right | Right implant failed |

In this example the patient had a hip replacement in 1995 (laterality not documented) - the same in 1996, then in 2022 those replacements had to be re-done. Other than basing your predicate logic on every possible attribute/element in the Entry, it would be impossible to have any sort of generic approach to unique identification.

My gut feeling is that we need to go for uid (or uid_compositionId) at Entry level plus sequenceIds on multiple nodes like events, activities as well as clusters and elements - that is not so far from what is actually happening right now, other than that the sequenceId name/suffix approach is a bit hacky and non-standardised.

---

## Post #35 by @Seref

Allow me to disagree Ian :slight_smile:

[quote="ian.mcnicoll, post:34, topic:2492"] Whilst it is theoretically possible to construct a unique path based on predicates, I don’t think this is really practical, as the distinguishing attribute is largely impossible to predict [/quote]

this,

[quote="ian.mcnicoll, post:34, topic:2492"] Other than basing your predicate logic on every possible attribute/element in the Entry, it would be impossible to have any sort of generic approach to unique identification. [/quote]

and this are about the point of uniquely identifying data, but that requirement is context-free in a sense. Why are you in a need to uniquely identify these nodes?
What is your "querying or referencing data" context? Your approach sounds to me to be predicated on the assumption that if you have some unique identifier for the node, that guarantees all requirements for querying and referencing sibling data that has the same at code sitting under a collection. Guess what, it does not :) Because the unique identity of a node does not necessarily imply any clinical data semantics, such as the suggested guid, and both querying and referencing is based on clinical data semantics. Your suggestion has a chicken and egg problem: you need to pull all the siblings first, filter out the data items (siblings) that fit your criteria, then use their guids to reference them, so you have to establish the association between the semantics (query criteria) and data identifiers (guids) for them to be useful. So you have to fall back to using a predicate criteria to get to the guids you're interested in in the first place. If the data is missing as in your hip replacement example, you won't even get to its guid btw. It gets worse, you now lost the semantics if you build a url or a query (following the first one to get to uids first) using the uids because if you have EHR_URL_1 and EHR_URL_2 pointing at the same set of siblings using uids, just looking at the path of the URL won't give you any clues as to what that/those guids are fetching. As in `ehr://../../action/../[ae213-2323-...]` and `ehr://../../action/../[ae213-2323-...]` vs `ehr://../../action/../[outcome = 'outcome 0']` and `ehr://../../action/../[outcome = 'outcome1']` (to reuse the examples above to clarify: you'll need to use the second set to get to guids in the first place) What entry level uids may help is allow system implementers to distinguish between two sibling nodes both of which match the criteria of `ehr://../../action/../[outcome = 'outcome 0']` but that's an internal concern, much like the `.getHashCode()` or similar methods supported in mainstream OO languages. 
They are used all the time when you add/search items in collections, but you never see them exposed or used in actual code.

[quote="ian.mcnicoll, post:34, topic:2492"] Whilst it is theoretically possible to construct a unique path based on predicates, I don’t think this is really practical, as the distinguishing attribute is largely impossible to predict [/quote]

In light of all the headache I've given you above, reconsider this statement please :) My objection is: it is not only practical, but also sensible and useful to use predicate based paths, because they express meaning, stay closer to level 2 of two level modelling, unlike using guids, which fall down to level 1.

If I missed the point as I often do, I'll buy you :beers: as per the usual arrangement...

---

## Post #36 by @bna

[quote="ian.mcnicoll, post:34, topic:2492"]
|Date|Procedure|Laterality|Comment|
| --- | --- | --- | --- |
|1995|Hip replacement|||
|1996|Hip replacement||Tricky procedure|
|2021|Hip replacement|Left|First one failed|
|2021|Hip replacement|Right|Right implant failed|
[/quote]

This is a great example - and the simple question to be implemented is: how do you make a unique EHR_URI to the first item in the table?

I am not sure how the table is generated. Some possibilities are:

1. Multiple instances of the same procedure archetype in a single composition
2. The result set from some AQL to get a list of procedure archetypes matching the procedure Hip replacement

It's not important how the table is generated. My only concern is how to make a single and deterministic identifier of the first item. My use case could, for example, be to send Ian some background data about the first replacement in a structured way to make it possible to update the laterality.

IMHO we need some rule to share between implementations and it can't be based on data. The item might be selected by some user interface where I click on the row. The application has no way to tell which data attributes defined my selection.
Thus we need an index number where siblings are at the same level with the same name or the UID based approach. IMHO all implementations today use the #-based approach building a pseudo-index of siblings with same name. This works reasonably well in our system and I assume in others as well. This pattern is based on the assumption that all name on siblings must be unique and as such adressable/locatable/pathable. If we remove the unique constraint on name we need some other approach to get adressable paths. --- ## Post #37 by @yampeku [quote="thomas.beale, post:26, topic:2492"] Option b) I think is very unpalatable, since it would force the addition of GUIDs absolutely everywhere, which could nearly double the size of EHRs, but also causes other problems (which I have analysed in the past for ISO 13606). [/quote] Well, in principle you could get away with defining GUIDs to every part that can be potentially problematic with current usage. In particular I think it would solve almost all problems if you put guids just in the sibling ad_hoc sections (or in the ad_hoc structures with no real identifier) --- ## Post #38 by @bna [quote="Seref, post:35, topic:2492"] stay closer to level 2 of two level modelling, unlike using guids, which fall down to level 1. [/quote] I am looking for a level 1 approach. My intention for this thread was to give my platform developers an answer on how to identify siblings with the same name. They should not know anything about the clinical context , only the specifications. The identification of the siblings will be used in multiple software components like our Form Designer and Renderer, lists of clinical data with references back to the original, integration layer like two-way FHIR binding, etc. --- ## Post #39 by @Seref [quote="bna, post:38, topic:2492"] My intention for this thread was to give my platform developers an answer on how to identify siblings with the same name. 
They should not know anything about the clinical context , only the specifications. [/quote] Thanks @bna , my response so far was mainly addressing this bit of your input: [quote="bna, post:25, topic:2492"] To be able to adress a specific procedure instance using i.e. AQL or EHR_URI some other mechanism must be used. [/quote] I consider these to be different than this: [quote="bna, post:38, topic:2492"] The identification of the siblings will be used in multiple software components like our Form Designer and Renderer, [/quote] A form designer processing a specific piece of data is quite different then the context I was referring to. In that case you do have a need to identify and distinguish data, which is what I briefly mentioned as: [quote="Seref, post:35, topic:2492"] What entry level uids may help is allow system implementers to distinguish between two sibling nodes both of which match the criteria of `ehr://../../action/../[outcome = 'outcome 0']` but that’s an internal concern, much like the `.getHashCode()` or similar methods supported in mainstream OO languages. [/quote] Read the above as .. allow form designer implementers to distinguish between two sibling nodes.. and it justifies having uids. I've been specifically referring to use cases for AQL and EHR URLs, which I have strong opinions about. Your form designer may be built on these and then uid would leak into these features/concepts, which is something I'm strongly against, but I'm not working at DIPS, so I'll leave it at that :slight_smile: --- ## Post #40 by @thomas.beale [quote="ian.mcnicoll, post:34, topic:2492"] In this example the patient had a hip replacement in 1995 (laterality not documented) - the same in 1996, then in 2022 those replacements had to be re-done. 
[/quote] Well, let's hope that the 'dates' are a bit better than just the year ;) [quote="ian.mcnicoll, post:34, topic:2492"] Other than basing your predicate logic on every possible attribute/element in the Entry, it would be impossible to have any sort of generic approach to unique identification. [/quote] Well that's probably true, hence the idea of having a `sequence_id` attribute, so you are guaranteed that that will always be there no matter what could be in other fields. [quote="ian.mcnicoll, post:34, topic:2492"] My gut feeling is that we need to go for uid (or uid_compositionId) at Entry level plus sequenceIds on multiple nodes like events, activities as well as clusters and elements - that is not so far from what is actually happening right now, other than that the sequenceId name/suffix aproach is a bit hacky and non-standardised. [/quote] With the addition of Guids on Compositions as well, that's pretty much my view as well. It's guaranteed to work and doesn't overload the EHR with billions (yes) of Guids, mostly useless. [quote="Seref, post:35, topic:2492"] Why are you in a need to uniquely identify these nodes [/quote] Well, the main needs were mentioned earlier - to be able to provide a reliable id of a specific atom (e.g. the systolic pressure of the 38th BP sample in a series) to some other system or service, or even for internal use - to generate a citation to just that path. You are right in your general supposition that you won't get to the atom via normal querying. Instead a normal AQL query will bring back (say) 50 BP samples, and maybe graph them. You see a 180 mmHg on the 38th sample, and choose it somehow on the UI. Now the application can obtain the runtime path of just that item, and to do so, it would use the sequence_id field (plus Guid of enclosing Entry etc), assuming we had such a field. 
[quote="Seref, post:35, topic:2492"] you need to pull all the siblings first, filter out the data items (siblings) that fit your criteria, then use their guids to reference them [/quote] I think the first bit is correct; but you don't need Guids on each item, you just need some reliable id - hence the `sequence_id` proposal. [quote="Seref, post:35, topic:2492"] It gets worse, you now lost the semantics if you build a url or a query [/quote] Good analysis - this is another reason the universal Guid approach isn't a good one. Whereas with the sequence_id approach, you will use predicates of the form `[atcode='at0051']` or `[name/value='BP measurement']` to get the siblings in the first place, and then you can create fully unique paths containing predicates like `[atcode='at0051' and sequence_id='38']` or you could do `[name/value='BP measurement' and sequence_id='38']`. In fact, you would just be able to do `[sequence_id='38']` if you really want, since those sequence numbers are always unique amongst siblings. And it might make sense to generate such short paths as a kind of direct ref. But as you say, the semantics are no longer visible. If I understand correctly, your objection #1 is * don't avoid putting the semantic signifier in the predicate, otherwise we don't know what it means And your objection #2 is: * don't use `name/value='BP measurement` in such predicates, and you are saying that we should use only the semantic signifier (at- or id-code) and some unique qualifier. I agree with both of these. The `name` field will not tell you anything more than the at- or id-code (prove me wrong, someone ;), and it could be anything, whereas the at/id code is by definition the semantic indicator. The 3rd part of your position is: to get a unique path, use predicates of the form [atcode=X and some_other_field=unique_qualifier]. Also correct in my view. 
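As a rough illustration of the two-step pattern discussed here (first select the siblings by at-code, then address one of them with the at-code plus a unique qualifier), here is a minimal Python sketch. It assumes the proposed `sequence_id` attribute existed on sibling nodes; the `Node` class and the helper names are hypothetical, and the predicate string simply mirrors the `[atcode='at0051' and sequence_id='38']` form used in this post, not any standardised path syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a LOCATABLE sibling inside a container."""
    archetype_node_id: str  # the at-code, e.g. "at0051"
    name: str               # name/value, possibly identical across siblings
    sequence_id: int        # ASSUMPTION: proposed attribute, not in the RM today


def siblings_by_code(items, at_code):
    """Step 1: select all siblings carrying the same semantic code."""
    return [n for n in items if n.archetype_node_id == at_code]


def unique_predicate(node):
    """Step 2: predicate that keeps the at-code visible and adds the qualifier."""
    return f"[atcode='{node.archetype_node_id}' and sequence_id='{node.sequence_id}']"


def resolve(items, at_code, sequence_id):
    """Resolve the predicate back to exactly one sibling, or fail loudly."""
    matches = [
        n for n in items
        if n.archetype_node_id == at_code and n.sequence_id == sequence_id
    ]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one match, found {len(matches)}")
    return matches[0]
```

Keeping the at-code in the predicate, rather than emitting only the sequence number, is what preserves the visible semantics of the path that the objections above ask for.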
@ian.mcnicoll is saying: yes, but it can't be just any unique qualifier, because you never know which one will be unique. Probably not 100% literally true, but it is practically true - query authors don't have unlimited time to figure out which one it could be. I also agree with this. So, marrying all that together, we should: * use Guids in key places, generally Composition and Entry * include `sequence_id` on every `LOCATABLE` node (only useful on nodes inside containers, but that's most nodes) * initial querying could be on at/id-code OR name/value. At/id-code will get all the siblings; name/value might select just a subset, and so generate a more relevant initial selection list * construct runtime path predicates using the form `[atcode='at0051' and sequence_id='38']` Are we all on the same page? --- ## Post #41 by @Seref [quote="thomas.beale, post:40, topic:2492"] Are we all on the same page? [/quote] Not exactly, though we're significantly overlapping :slight_smile: . Bjorn's use case for querying and referencing data is different than what I assumed it is, but I don't have enough of an understanding to qualify my comments further. That's why I keep insisting on hearing the use case: to be able to think whether or not an approach is a good one in that context and how it effects other contexts/use cases. A UI form's runtime's use of a path (which may be a full EHR path/url) to identify an ELEMENT with DV_TEXT value is different than the use of a path in an AQL query, at least to me. You can justify for having sequence_ids or guids or whatever in the former, but I cannot see why they'd ever end up in a path snippet in an AQL query as I discussed above. Admittedly, I'm rather opinioned about what you should and should not do with AQL so I'm reluctant to reiterate, I'm obviously coming from a different perspective here. --- ## Post #42 by @pablo [quote="bna, post:25, topic:2492"] Given a template with i.e. 
two openEHR-EHR-ACTION.procedure.v1 The first one is given the name “P1_NO” and the second “P2_NO” in the norwegian language and “P1_EN” and “P2_EN” in the english language. Seen in the user interrace of the archetype designer. The OPT is generated with norwegian as the primary language. In Better Archetype Designer the OPT will be given a constraint on the name to “P1_NO” for the first instance and “P2_NO” for the latter. We think the data then MUST have the given name. And if there are multiple occurences each instance MUST have the same name. And also that the same name will be used to define each element for any language based on the OPT. As a consequence there is no way to adress a specific locatable by using the name. To be able to adress a specific procedure instance using i.e. AQL or EHR_URI some other mechanism must be used. We suggest either: a) Use the index in the list, i.e. 0 for the first item like /content[0 and name/value=‘P2_NO’] b) Use the UID for each locatable - /content[uid/value=‘some_guid’ and name/value=‘P2_NO’] (here the name is not needed since the guid will identify a unique instance Note that the term definition will use the localised name. We think this is used for user-interface purposes like the archetype designer or a form renderer. But the data MUST have the given constraint for name. [/quote] I'm late to the party, and not sure I understand all the points. What I see is: 1. the modelling time issue: giving constraints to names 2. the store time issue: storing data in a way that makes each object with the same name uniquely addressable 3. the query issue: how can I get one specific instance if there are many occurrences with the same name in the same compo 4. the reference issue: having an EHR_URI or LOCATABLE_REF that reference one specific instance if there are many occurrences with the same name in the same compo For 1. 
I prefer to define DV_CODED_TEXT constraints: having a code is more robust than a name, since names can change while codes rarely do. Even better, it will be the same code in different languages, so instead of 4 options for P1_NO, P2_NO, P1_EN, P2_EN you have P1 and P2 as codes.

For 2. besides storing the code as constrained in the archetype, we store an instance path, though deciding to store the LOCATABLE.uid is also valid. An instance path would look like this:

`/content[archetype_id=openEHR-EHR-OBSERVATION.lab_test-blood_glucose.v1](0)/data[at0001]/events[at0002](0)/data[at0003]/items[at0005](0)/value/value`

Note we use the archetype_node_id and the index in the collection attributes (kind of the sequence_id mentioned above). So different instances of the same archetype node will be in the same collection with different index numbers (0), (1), ... We chose parentheses to differentiate from the archetype_node_id square brackets.

For 3. use the code to query, not the name. Names make queries not portable, and make them language dependent.

For 4. I would use the instance path. If that could be standardized so we all use the same format, would be cool.

In general I try to stay away from names in paths. Some JSON flat formats use paths with names: when a name changes in the archetype, you then need to change the paths. The same will happen with storing (2), querying (3) and referencing (4). The real problem is that a name could change in an archetype while the version of the archetype doesn't change, so everything will still reference the same archetype version though all your previous uses of the name are invalid. If the archetype version were updated for name changes, this wouldn't be a problem, but that is not how things work today, and I think nobody wants an archetype with v58345 for each single name change that happened through the archetype history.
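To make the instance path format from point 2 concrete, here is a small Python sketch that splits such a path into its archetype path and its collection indices. It assumes the index always appears as `(n)` directly after a closing `]`, as in the example above; the function names are hypothetical and this is not a standardised syntax.

```python
import re
from typing import List

# ASSUMPTION: an index is written as "(n)" immediately after a "]" predicate.
_INDEX = re.compile(r"(?<=\])\((\d+)\)")


def strip_indices(instance_path: str) -> str:
    """Drop the "(n)" parts, leaving a plain archetype path."""
    return _INDEX.sub("", instance_path)


def indices(instance_path: str) -> List[int]:
    """Return the collection indices in the order they appear in the path."""
    return [int(i) for i in _INDEX.findall(instance_path)]
```

For example, `strip_indices` could be used to look up the matching constraint in the OPT, while `indices` tells you which sibling in each collection the path points at.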
I hope I understood the problem, if I didn't get it, please ignore my message :rofl: --- ## Post #43 by @damoca Just FYI: interestingly, the ADL path specification already talks about supporting the ordered predicate access: [Archetype Definition Language 1.4 (ADL1.4) (openehr.org)](https://specifications.openehr.org/releases/AM/latest/ADL1.4.html#_relationship_with_w3c_xpath) ![image|690x250](upload://k7gCVVNqDSfiMq2kuSvwm27IZlg.png) --- ## Post #44 by @thomas.beale [quote="Seref, post:41, topic:2492"] You can justify for having sequence_ids or guids or whatever in the former, but I cannot see why they’d ever end up in a path snippet in an AQL query as I discussed above [/quote] Yep. Until I see a reason I understand, I agree. --- ## Post #45 by @thomas.beale [quote="pablo, post:42, topic:2492"] An instance path would look like this: `/content[archetype_id=openEHR-EHR-OBSERVATION.lab_test-blood_glucose.v1]/data[at0001]/events[at0002]/data[at0003]/items[at0005]/value/value` Note we use the archetype_node_id and the index in the collection attributes (kind of the sequence_id mentioned above). So different instances of the same archetype node will be in the same collection with difference index number (0), (1), … [/quote] The first path won't on its own be an instance path. Are you doing something like this: `/content[archetype_id=openEHR-EHR-OBSERVATION.lab_test-blood_glucose.v1]/data[at0001]/events[at0002]`**(2)**`/data[at0003]/items[at0005]`**(4)**`/value/value` This isn't standard syntax... but it's easy to convert to a form that is. The problem with just using the position in the collection is the position of a specific data item can change from one version to the next. So the same instance path won't necessarily point to the same thing across versions. [quote="pablo, post:42, topic:2492"] For 3. use the code to query, not the name. Names make queries not portable, and make them language dependent. [/quote] Yep, they do. 
But if you are querying for historical data, it must be in some language ;) [quote="pablo, post:42, topic:2492"] For 4. I would use the instance path. If that could be standardized so we all use the same format, would be cool. [/quote] Indeed! --- ## Post #46 by @pablo [quote="thomas.beale, post:45, topic:2492"] `/content[archetype_id=openEHR-EHR-OBSERVATION.lab_test-blood_glucose.v1]/data[at0001]/events[at0002]` **(2)** `/data[at0003]/items[at0005]` **(4)** `/value/value` [/quote] It's discourse formatting, it removed the `(1)` Yes, it was that, just updated my comment. --- ## Post #47 by @pablo [quote="thomas.beale, post:45, topic:2492"] The problem with just using the position in the collection is the position of a specific data item can change from one version to the next. So the same instance path won’t necessarily point to the same thing across versions. [/quote] Is that a requirement? The index is just a way to identify in the same collection not across versions. The only cross-version identifier is LOCATABLE.uid (we keep coming at this and still keep avoiding it) [quote="thomas.beale, post:45, topic:2492"] Yep, they do. But if you are querying for historical data, it must be in some language :wink: [/quote] Data could be in any language, historical or not. If you have compositions in different languages in the same CDR, then a query using names won't get you all the data, it will just miss a lot of records. If there are different CDRs, only one language each, again, you can't use the same query to query in all CDRs, you need to create as many queries as CDRs with different languages you have in your infrastructure. See the problem? 
This is a huge issue for cloud platform providers. That is why I decided a long time ago to avoid names in queries at all costs, and to avoid text constraints where possible, using coded text constraints instead (kind of the CDA approach with OIDs to identify each content item, because there are no archetype node ids or paths in CDA to identify the type of, for instance, an observation).

Anyway, I'm not sure I fully understood the original problem or if I'm going off-topic rambling about the pains I had and how I solved them in my own platform.

---

## Post #48 by @thomas.beale

[quote="pablo, post:47, topic:2492"] Is that a requirement? [/quote]

I would say so. It is going to be confusing and eventually dangerous for a path that points to systolic pressure to wind up pointing to diastolic pressure when resolved against a different version of the data.

[quote="pablo, post:47, topic:2492"] The index is just a way to identify in the same collection not across versions. The only cross-version identifier is LOCATABLE.uid (we keep coming at this and still keep avoiding it) [/quote]

Across versions, `sequence_id` will work fine - if populated correctly. The basic rule is that you always increment the `sequence_id` for each new sibling data node, no matter in what version. That means that:

* the sequence id never changes for a given node
* deletions can't cause problems

[quote="pablo, post:47, topic:2492"] Data could be in any language, historical or not. If you have compositions in different languages in the same CDR, then a query using names won’t get you all the data, it will just miss a lot of records [/quote]

Certainly true. I was just indicating that simple querying with names will work, e.g. if you know all your text data is in Spanish, and was created in some reliable way, so you know spelling etc is the same for the same words etc. But - you'll never know if there is some record with a mis-spelling, or in Portuguese (some Brazilian guy comes to Uruguay) and so on.
So your technical argument is correct. --- ## Post #49 by @pablo [quote="thomas.beale, post:48, topic:2492"] [quote="pablo, post:47, topic:2492"] Is that a requirement? [/quote] I would say so. It is going to be confusing and eventually dangerous for a path that points to systolic pressure to wind up pointing to diastolic pressure when resolved against a different version of the data. [/quote] The concept of instance path is to be locally unique, not to be unique across versions. The only thing that is unique across versions is the LOCATABLE.uid, it should be used alongside with the instance path (each has it's own use case). The instance path is actually a valid absolute (from the locatable root) archetype path when the index parts are removed, so it's easy to get the archetype path to get a constraint from the OPT. We are also using those paths in our flat format, because those don't depend on names and allow to transform back to the original locatable instance from the key/value structure. [quote="thomas.beale, post:48, topic:2492"] Across versions, `sequence_id` will work fine - if populated correctly. [/quote] That would be a challenge, because how can you know which node is which without the UID? This will always depend on the data instance. If the sequence_id is set per object instance then it is playing the role of a secondary ID while not using the primary instance ID that is the UID. So why not using just the UID? [quote="thomas.beale, post:48, topic:2492"] I was just indicating that simple querying with names will work, e.g. i**f you know all your text data is in Spanish**... [/quote] That's the issue: you don't know! Let's say I develop a system, then my client populates the system, and my client is from Switzerland, then records could be in French, German, Italian, Romansh, etc. Then an IT guy needs to get some data to populate a report and he creates a query using names in French. See the issue? Then the report is crap. 
[quote="thomas.beale, post:48, topic:2492"] you’ll never know if there is some record with a mis-spelling [/quote] I don't think that is an issue, because the case we are talking about is when the name constraint is in the template, so the composition instance will have the name values coming from the template not typed in. If something is misspelled, the error is in the template itself, which could happen too! In general, language dependent conditions are not a reliable way to query data, language dependent paths is not a reliable way to represent data. I'm not convinced of many things, most of the time I doubt each step I make, but I'll argue with anyone on this single point :face_with_monocle: --- ## Post #50 by @damoca [quote="pablo, post:49, topic:2492"] That’s the issue: you don’t know! Let’s say I develop a system, then my client populates the system, and my client is from Switzerland, then records could be in French, German, Italian, Romansh, etc. Then an IT guy needs to get some data to populate a report and he creates a query using names in French. See the issue? Then the report is crap. [/quote] I totally agree with Pablo here. We also try to avoid any natural language in paths or queries. For example, in Catalonia each clinician can decide if they write their reports in Spanish or Catalan, and probably freely choose a matching language for the user interface of the EHR system. --- ## Post #51 by @yampeku Seeing that names can change in minor versions and potentially paths can change with names, should name changes always be considered major versions/breaking changes? --- ## Post #52 by @thomas.beale [quote="pablo, post:49, topic:2492"] That’s the issue: you don’t know! [/quote] I'm not disagreeing with you here really. 
Just pointing out that *for now*, there is very little cross-border interop in most openEHR CDRs or other EMR, and although you as a supplier won't know what could be in a customer's system that you built, there will certainly be customers that can know what is in their systems because they know what it is connected to, and they probably know that all apps are in e.g. Slovenian. I am just saying that *if this is the case*, then you could use names in querying. But it's the only circumstance. As interop improves, and/or multiple languages are allowed in the one system (almost certainly happening in health tourism locations), then all bets are off. Language-dependent querying is definitely not to be relied on in any long-term sense, and no-one should build queries or paths they care about using `name/value`. But some still will in the short term, and it might well work OK for a while. --- ## Post #53 by @thomas.beale [quote="yampeku, post:51, topic:2492, full:true"] Seeing that names can change in minor versions and potentially paths can change with names, should name changes always be considered major versions/breaking changes? [/quote] That is an excellent question. I would say not, but we can only say that if we have a formal rule that 'runtime paths are constructed using predicates of the form `[at/id-code, sequence_id='x']` or similar. We don't yet have that `sequence_id`, but if we did, we'd be able to state that as a rule. Then - breaking changes to constraints on other fields like `name` are by definition not breaking changes to an archetype or template. --- ## Post #54 by @ian.mcnicoll I agree the use of 'name' as a differentiator is problematic but is forced on us by .opt1.4 Archetype Designer does correctly provide codes for 'renamed' nodes but has no way of populating the .opt with these. I don't think name changes should be a breaking change, at least not in archetypes. 
We can often minimise any impact of an archetype name change by retaining the same name in the template. So practically speaking, we have found this much less of an issue than it might seem, even where tools like Better Studio do use name-based paths to hook up the form controls to the templates. However we do need to get template level name/value codes into .opt and CDRs ASAP. --- ## Post #55 by @pablo [quote="thomas.beale, post:52, topic:2492"] Just pointing out that *for now*, there is very little cross-border interop in most openEHR CDRs or other EMR [/quote] David's example of Catalonia is a intra-border, multi-language case. Though, if that is not a problem `now`, it will be a problem in 3, 6, or 12 months. Which in `specification`time is kind of `now`. We need to plan ahead, not solve issues as they come IMHO. [quote="thomas.beale, post:52, topic:2492"] although you as a supplier won’t know what could be in a customer’s system that you built, there will certainly be customers that can know what is in their systems because they know what it is connected to, and they probably know that all apps are in [/quote] There are different areas, in some cases might be connected, in other cases not. 1. Platform provider: creates the platform 2. Clinical modelers: create archetypes and templates 3. Content creator: users (clinicians, patients, etc) 4. Maintainers: could be the same platform provider or some local developer, they create apps, queries, etc. If modelers take into account language, content creators enter data in different languages, and maintainers don't take into account language and create content using flat formats with paths that depend on names or queries that depend on names, then that could cause issues, like missing data in query results. I think you are making too many assumptions, which are understandable but not applicable in all contexts, while my view is to plan ahead for the worst case scenario (i.e. make no assumptions). 
I know what you mention could work in 80% of the cases, and I try to focus on the other 20%.

---

## Post #56 by @pablo

[quote="ian.mcnicoll, post:54, topic:2492"]
However we do need to get template level name/value codes into .opt and CDRs ASAP.
[/quote]

Exactly, that is what I mentioned above, which is kind of the CDA approach to differentiate between entries in the body. So the idea is to use DV_CODED_TEXT constraints for the LOCATABLE.name instead of DV_TEXT at the OPT level (it could be text at the archetype level).

---

## Post #57 by @ian.mcnicoll

I've been trying to get my head round the various sub-threads/perspectives here on handling multiple occurrences in AQL paths.

There are several different places where this might be or seem an issue.

1. AQL-based querying

@Seref is coming at this from an AQL perspective, i.e. using an AQL path as part of a SELECT or WHERE clause, where I agree it is very unlikely that you would want to do `WHERE bp[2]/systolic/magnitude > 120`, and that use of predicates could usefully be extended.

2. Unique identification of a node for referencing purposes

Either for external use, e.g. to populate a FHIR resource, or as the target for an EHR_URI, e.g. a reference from a medication order Indication element "Asthma" to the problem-diagnosis entry instance where the original Asthma diagnosis is documented.

In this case there is a list of problem-diagnosis entries with identical name/value 'problem/diagnosis' and I still believe there is no generic way to construct unique paths to each of these entries without potentially including the values of every optional element within the path. In this case, we get no help from template-level coded name overrides, as these will be identical. So either we use uids or some kind of sequenceId.

Using the #suffix approach is how this works right now, but it would be much better if we had a specific attribute, rather than hacking the name/value, e.g. `name/value = 'Problem/diagnosis#1'`, `name/value = 'Problem/diagnosis#2'` etc.
My instinct is to use uid at Entry level but sequenceId below, e.g. for multiple events/activities or multiple clusters and elements, where these have not been renamed explicitly in the template. I think that lets us work most smoothly with the outside world, e.g. a FHIR Id might be a uid plus a short AqlPath/sequenceID to the event instance that equates to a FHIR Observation.

3. Design-time paths.

Subtly different is where we want to reference a specific template constraint path for use in the tooling space, e.g. form building. Previous comments about not using names are correct but this is a limitation of opt1.4. Most of these issues would be resolved by getting coded name/values in there. I've not come across a situation in the design space where we would need to work with multiple occurrences of a design-time path where the name/values are identical (problem-diagnosis example above) but perhaps @bna can give an example. So for now, I'm not clear where having sequenceIds would be of assistance.

In summary I think this comes down to

1. Getting code-based name/value into .opt
2. Using uid and/or sequenceId to allow multiple run-time occurrences of a path based on the same name/value text/codedText. I favour a mix - uid at Entry level, sequenceID below that

---

## Post #58 by @bna

[quote="ian.mcnicoll, post:57, topic:2492"]
I’ve not come across a situation in the design space where we would need to work with multiple occurrences of a design-time path where the name/values are identical (problem-diagnosis example above) but perhaps @bna can give an example. So for now, I’m not clear where having sequenceIds would be of assistance.
[/quote]

You are right. This is not a design time issue/problem. The problem appears when, at design time, you duplicate e.g. an archetype root, give it a new name and define the node with occurrences `0..>1`. Current OPT generation of archetype/template designers is to define the given name as a constraint on the node.
Which means the name can't change, which leads to non-unique names in the COMPOSITION. When generating an `EHR_URI` to define a unique path to a node in a COMPOSITION, our implementation uses the name of the node. If two nodes have the same name you will, of course, get two nodes matching the path.

I am sorry to confuse you with AQL in this manner. That's not the most important issue. When querying you will most likely expect multiple results matching some criteria. The most important thing to get consensus on is, IMHO, how to address a unique node. This is item 2 in your excellent post @ian.mcnicoll :

[quote="ian.mcnicoll, post:57, topic:2492"]
Unique identification of a node for referencing purposes
[/quote]

I will also support you on this. This is very rare and unlikely. Still I argue that it should be possible in openEHR to do this.

[quote="ian.mcnicoll, post:57, topic:2492"]
1. AQL-based querying @Seref is coming at this from an AQL perspective i.e using an AQL path as part of a SELECT or WHERE clause, where I agree it is very unlikely that you would want to do WHERE bp[2]/systolic/magnitude > 120 and that use of predicates could usefully be extended.
[/quote]

[quote="pablo, post:55, topic:2492"]
David’s example of Catalonia is an intra-border, multi-language case. Though, if that is not a problem `now`, it will be a problem in 3, 6, or 12 months. Which in `specification` time is kind of `now`. We need to plan ahead, not solve issues as they come IMHO.
[/quote]

I follow @pablo and @damoca on this. We should plan ahead and establish patterns to make applications portable between systems, installations and languages. This is, IMHO, where openEHR has its strengths.

When we use plain-archetype based templates this is solved today. The problem seems to arise in the template design and generation part of the ecosystem. If we solve the unique path problem described above we will be better prepared.
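Editorial illustration of the ambiguity described in this post: a minimal Python sketch, not any vendor's implementation. The `Node` class and the `select` and `add_suffixes` helpers are hypothetical names invented here; the only things taken from the thread are the idea of resolving a path predicate on `archetype_node_id` plus name, and the `#n` suffix workaround.

```python
from collections import Counter
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a LOCATABLE sibling in a container."""
    archetype_node_id: str
    name: str
    value: str


def select(items, node_id, name):
    """Resolve a predicate of the form [node_id and name/value='name']."""
    return [n for n in items if n.archetype_node_id == node_id and n.name == name]


def add_suffixes(items):
    """Append '#1', '#2', ... to names shared by several siblings with the same node id."""
    totals = Counter((n.archetype_node_id, n.name) for n in items)
    seen = Counter()
    result = []
    for n in items:
        key = (n.archetype_node_id, n.name)
        if totals[key] > 1:
            seen[key] += 1
            n = replace(n, name=f"{n.name}#{seen[key]}")
        result.append(n)
    return result


siblings = [
    Node("at0003", "Other", "first comment"),
    Node("at0003", "Other", "second comment"),
]

# Same node id and same name: the path matches both siblings.
ambiguous = select(siblings, "at0003", "Other")

# With the '#n' suffix each sibling becomes individually addressable.
unique = select(add_suffixes(siblings), "at0003", "Other#2")
```

Note that the suffixed names no longer equal the literal value the template constrains the name to, which is the conflict this thread is about.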
--- ## Post #59 by @pablo [quote="ian.mcnicoll, post:57, topic:2492"] In this case there is a list of problem-diagnosis entries with identical name/value ‘problem/diagnosis’ and I still believe no generic way to construct unique paths to each of these entries without including potentially the values of every optional element with in the path. In this case, we get no help from template-level coded name overrides, as these will be identical. So either we use uids or some kind of sequenceId. Using the #suffix approach is how this works right now but would be much better if we had a specific attribute, rather than hacking the name/value e.g name/value = ‘Problem/diagnosis#1’ name/value = ‘Problem/diagnosis#2’ etc. My instinct is to use uid at Entry level but sequenceId below e.g for multiple events/activities or multiple clusters and elements, where these have not been renamed explicitly in the template. [/quote] That is clearly a requirement for a data identifier, paths don't work for that, even the instance paths I mentioned if we consider the need of identifying something across different versions of the locatable (compo, status, folder, person, etc). I think this should be in a best practices guide, not in the spec?: "if you have this case bla bla bla.... then if you want to reference a node .... then you should set/use locatable.uid ...". Just because the uid is optional. Not sure why we need to mix mechanisms (uid + sequence), since we can use locatable.uid at any level from compo/folder/status to element. A recommendation about the name constraints would also be useful, also as a guide not in the spec, mentioning it would be better to use coded text instead of text constraints because of the potential problems that could occur having texts in a specific language (mentioned several on this thread). So using codes, better paths could be created, though those won't identify a specific item in a multiple attribute collection (content, events, items, etc). 
If the uid is not used, then data should be included in the path, creating a different type of path, the one with conditions on it, not a static one, and for this, processing is needed because it's basically a query over the data, like xpath predicates. So if we go further that way, we might want to have a complete spec for these predicates in the path, so we can filter data by any attribute, making it an approach to solve this issue and to provide a generic way of filtering data in documents, an openehr-x-path kind of thing, but across the RM, not XML.

IMO without the UID there is no static identifier of data, and using a path will require a dynamic method (something needs to be evaluated).

Note the idea of the instance paths is to identify the position of a data item within one locatable, not across versions. It's a local secondary identifier for nodes, and the format helps to get data at that position and, removing the indexes, helps to get constraints from the OPT, because without the indexes it's just a valid archetype path.

---

## Post #60 by @thomas.beale

[quote="pablo, post:59, topic:2492"]
Not sure why we need to mix mechanisms (uid + sequence), since we can use locatable.uid at any level from compo/folder/status to element.
[/quote]

Because Guids take a lot of space, and putting them on every node significantly increases the space cost of the overall DB. Nearly all such Guids are a complete waste - they will never be used for anything, because direct refs to nearly all data nodes are never created. I've done space calculations for an openEHR DB in the past with and without Guids (let's ignore Guids on Compositions and Entries, that's probably 2% of the possible total) and the size increase is significant for 'average' data. I priced out the difference for long-term ITIL3 data-centre RAID 10 persistence - the difference was significant.
In addition, Guids on all LOCATABLE nodes make a mess of data for any kind of human reading (testing...), and they also don't tell you the order of accession of the sibling nodes.

There is a deeper semantic argument for only using Guids on Compositions and Entries. In openEHR, these structures are designed to be semantically stand-alone and have safe interpretations. But a data node like just a systolic pressure is not safe on its own - it could be a measurement, a target, or something else. A procedure might mean it was done, not done, recommended, not recommended. And so on. I think we should always therefore treat such objects as coherent wholes, and only reference internal elements via paths.

[quote="pablo, post:59, topic:2492"]
A recommendation about the name constraints would also be useful, also as a guide not in the spec, mentioning it would be better to use coded text instead of text constraints because of the potential problems that co
[/quote]

It is not intended that name fields be required to be coded. It might be nice, but it won't usually happen. We have to accept that names of things, like text fields, will be in some language. The key thing (as you and @Seref pointed out earlier) is to avoid the use of such fields in querying or reference paths (e.g. in UI forms).

[quote="pablo, post:59, topic:2492"]
Note the idea of the instance paths is to identify the position of a data item within one locatable, not across versions.
[/quote]

Using a sequence_id will actually work across versions, as long as the ids are monotonically increasing over time and never re-used.

---

## Post #61 by @Seref

[quote="thomas.beale, post:60, topic:2492"]
Because Guids take a lot of space, and putting them on every node significantly increases the space cost of the overall DB
[/quote]

I'm going to shoot a horror movie for programmers one day, about what the guids do to pretty much all persistence technologies in terms of indexing....
The current working title is something in the lines of "Attack of the entropy: the demise of the inverted index" --- ## Post #62 by @ian.mcnicoll [quote="thomas.beale, post:60, topic:2492"] Using a sequence_id will actually work across versions, as long as the ids are monotonically increasing over time and never re-used. [/quote] Stoopid question (asking for a friend?) but is that not quite challenging to achieve for a specific node, across all possible composition versions? --- ## Post #63 by @thomas.beale [quote="ian.mcnicoll, post:62, topic:2492"] Stoopid question (asking for a friend?) but is that not quite challenging to achieve for a specific node, across all possible composition versions? [/quote] when a new node is added (e.g. imagine some Elements under some particular Cluster) it gets a sequence id (say, 5). That never changes from then on. The next sequence id to be used needs to be recorded somewhere. In a new version, some more nodes get added. Each time the sequence_id is assigned, and incremented. In the next version our original node (id=5) gets deleted. New nodes get added. And so on. As long as the ids are never re-used and the next id to assign is always remembered correctly - necessary because the current highest id node(s) could be deleted in some version, e.g. 20, 21, 22. The next node assigned will still be 23. So it works on the same logic as terminology - never re-use a code, and always generate new codes in a reliable way (incrementing, or some other more interesting scheme). We do this with id-codes in the ADL Workbench, and in any other ADL2-based tool, so that archetype paths are preserved over versions. The interesting question is: where does the 'next id to assign' get stored? There are various possible answers. But the scheme is not complicated. 
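Editorial illustration of the allocation rule described in this post: a minimal Python sketch under stated assumptions. `SequencedContainer` is a hypothetical class, and keeping the "next id to assign" on the container itself is just one of the possible answers the post leaves open; nothing here is taken from the openEHR specifications.

```python
class SequencedContainer:
    """Sibling nodes keyed by a sequence id that is assigned once and never re-used."""

    def __init__(self, next_id=1):
        # Assumption: the 'next id to assign' is persisted with the container.
        self.next_id = next_id
        self.nodes = {}  # sequence_id -> node value

    def add(self, value):
        """Assign the next sequence id to a new node, then increment the counter."""
        sid = self.next_id
        self.next_id += 1
        self.nodes[sid] = value
        return sid

    def remove(self, sid):
        """Delete a node; the counter is deliberately left untouched."""
        del self.nodes[sid]


# Mirror the example from the post: the highest-numbered nodes 20, 21, 22 are deleted.
cluster = SequencedContainer(next_id=20)
ids = [cluster.add(f"element {i}") for i in range(3)]
for sid in ids:
    cluster.remove(sid)

# The next node still gets 23, because the counter is remembered
# rather than derived from the nodes that currently exist.
new_id = cluster.add("new element")
```

Deriving the next id from the highest surviving node would hand out 20 again in this scenario, which is why the counter has to be stored separately.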
---

## Post #64 by @pablo

[quote="thomas.beale, post:60, topic:2492"]
Because Guids take a lot of space, and putting them on every node significantly increases the space cost of the overall DB. Nearly all such Guids are a complete waste
[/quote]

Sorry, this can't be an argument in 2023. In terms of % of the bytes needed to store a full composition, the UIDs are just a small percentage, and today storage is cheaper than ever. Also the LOCATABLE.uid could store OIDs, which if well designed, could use less space than a UUID. Again, if you store UUIDs as strings, that takes more space than storing them as 128 bit numbers, which is what they really are. So we are talking about 16 bytes per UUID stored as a number.

Then the key is to use them where those are needed, not everywhere. Again, the question to ask, and data should be given: what % of the total space does having UIDs where needed require? Then compare that to providing a hack to the model in order to avoid storing that amount of extra data. Finally, is the solution really worth it? I mean: 1. it adds extra complexity, 2. does it really save that much data?

If you show me a real estimation of how much extra data is needed, and that is significant, I'll shut up.

BTW, you need to store the sequence_id too, which, if it is an int, will take 32 bits, a long will take 64, vs 128 bits needed by the current UID. Are we really optimizing at that level?

---

## Post #65 by @pablo

[quote="ian.mcnicoll, post:62, topic:2492, full:true"]
[quote="thomas.beale, post:60, topic:2492"]
Using a sequence_id will actually work across versions, as long as the ids are monotonically increasing over time and never re-used.
[/quote]

Stoopid question (asking for a friend?) but is that not quite challenging to achieve for a specific node, across all possible composition versions?
[/quote]

I think it's trying to play the role of the locatable.uid; with the uid you know which entry is which even if they have the same name in a collection container.
That is why I don't quite understand the idea of using yet another identifier instead of our old friend the UID :slight_smile: --- ## Post #66 by @thomas.beale [quote="pablo, post:64, topic:2492"] Sorry, this cant be an argument in 2023. In terms of % of the bytes needed to store a full composition, the UIDs is just a small percentage, and today storage is cheaper than ever. [/quote] I know. Everyone always says that. I did a set of calcs for the NHS taking into account a careful size estimate of openEHR data, and worked out the incremental cost increase of RAID 10 storage in a data centre, with NHS pricing. It turns out that universal Guids cost real money when you are into terabytes, RAID 10 etc. I'll have a look around to see if I still have them. [quote="pablo, post:64, topic:2492"] Then the key is to use them where those are needed, not everywhere [/quote] Exactly right - just what I am advocating (Compositions, Entries, Parties, Plans...). [quote="pablo, post:64, topic:2492"] BTW, you need to store the sequence_id too, which is that is an int, it will take 32 bits, long will take 64, vs 128 bits needed by the current UID. Are we really optimizing at that level? [/quote] That is actually a good point. I would use character strings. When you have terabytes (4TB was the size of one UK GP system storage requirement for probably 10m EHRs in the UK, about 10y ago), and peta-bytes over the long term, you always optimise. If you don't, you're always needlessly burning money and other resources. I am however more interested in semantic reasons for not putting Guids everywhere, rather than reasons of space economy... --- ## Post #67 by @thomas.beale [quote="pablo, post:64, topic:2492"] Also the LOCATABLE.uid could store OIDs, which if well designed, could use less space than a UUID [/quote] One thing to note: LOCATABLE.uid is a String field, not an Integer field. --- ## Post #68 by @bna This is a great and important discussion. 
In this post I put forward two postulates regarding sequence identifiers. Let's see if we can agree on them:

- Postulate one: One sequence for each archetype_node_id at the same level
- Postulate two: There is no need to persist the sequence identifier across versions of the COMPOSITION.

**Postulate one: One sequence for each archetype_node_id at the same level**

Given a container on any level of a COMPOSITION which has the possibility to add more than one item with the same archetype_node_id
Then the sequence number must be shared for all instances of the same archetype_node_id.

One simple example

Given an archetype with a node with archetype_node_id at0003
And this node has the multiplicity 0..*

In a template this node might be cloned and given a name constraint. The "original" node does not have a name constraint and can have any possible name. Let's say the node at0003 has the English term "Comment". In a template the node might be cloned and given the name "Other".

We assign A as identifier of the original node and B to the cloned node.

In the data the client/user adds multiple instances of the node.

- A1 and A2 can have any name. Current best practice is to use the hashtag pattern like "Comment#1".
- Since the name of A is unconstrained it is also allowed to give A2 the name "Other".
- B1 is constrained on the name and must have the name "Other"

This gives the following names of the nodes

```
A1 => name: "Comment#1", value: "openEHR is great"
A2 => name: "Other", value: "The RM shines over healthcare"
B1 => name: "Other", value: "DIPS provides great software for health providers"
B2 => name: "Other", value: "Norway has the biggest ski-jump hill in the world"
```

Given this dataset in a COMPOSITION there is no way to identify that A2, B1 and B2 come from different definitions in the template. Since A2 was given the name "Other" it looks equal to B1 and B2.
The sequence identifiers for this dataset might be:

```
A1 - 1
A2 - 2
B1 - 3
B2 - 4
```

This gives us postulate one: There must be one sequence of ids for all nodes at the same level with the same archetype_node_id

**Postulate two: There is no need to persist the sequence identifier across versions of the COMPOSITION.**

The sequence identifier is only a weak, local and version specific way to distinguish nodes at the same level with the same archetype_node_id. In the most extreme case the clinically identical node might change sequence identifier across versions of the COMPOSITION.

Let's follow the example from above and say the user removes A2 and attaches a new instance B3. Then the sequence identifiers will be:

```
A1 = 1
B1 = 2
B2 = 3
B3 = 4
```

Note that B1 has the same content across the versions. Still it changed sequence id from version 1 to version 2 since A2 was removed and the newly calculated sequence gave new numbers. This is needed to make the sequence identification algorithm stateless to allow distributed and asynchronous editing of COMPOSITION.

---

## Post #69 by @pablo

[quote="thomas.beale, post:67, topic:2492, full:true"]
[quote="pablo, post:64, topic:2492"]
Also the LOCATABLE.uid could store OIDs, which if well designed, could use less space than a UUID
[/quote]

One thing to note: LOCATABLE.uid is a String field, not an Integer field.
[/quote]

You mentioned "...increases the space cost of the overall DB...". At the DB level we can do whatever; as long as it is a string when transforming DB data to an RM instance in memory, we are good. So in the DB it can be a 128bit number or a binary(16) like in MySQL.
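Editorial illustration of the size difference being discussed: a small Python sketch using only the standard `uuid` module. The sample UUID is arbitrary, the variable names are made up for this example, and how a given CDR or database actually stores `LOCATABLE.uid` is not something this sketch claims to show.

```python
import uuid

# An arbitrary sample UUID in its usual hyphenated text form.
sample = uuid.UUID("e1fb491b-198f-496c-b5db-72261f9ddc30")

as_text = str(sample)        # what a String-typed field would hold
as_binary = sample.bytes     # the same 128-bit value as raw bytes

text_size = len(as_text.encode("ascii"))
binary_size = len(as_binary)

# The conversion is lossless in both directions, so a store could keep the
# compact form and still hand back a string to the RM layer.
restored = uuid.UUID(bytes=as_binary)
```

This covers only UUID-valued identifiers; other identifier kinds that the field may hold would need their own handling.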
Postgres has a native UUID type which already stores internally as an optimized type (means it doesn't store the UUID as 36 bytes as it will require for the string version "e1fb491b-198f-496c-b5db-72261f9ddc30") Check this interesting post about MySQL https://www.mysqltutorial.org/mysql-uuid/ --- ## Post #70 by @thomas.beale Yes we could do something at the DB level. It will be a bit tricky since the uid field can have other kinds of Id that are not convertible, or at least not easily, to integers. It might be that in openEHRv2 we simplify that field to a Guid and then its type can be Integer, both computationally and in storage. --- ## Post #71 by @pablo [quote="bna, post:68, topic:2492"] Given a container on any level of an COMPOSITION which has the possibility to add more than one item with the same archetype_node_id Then the sequence number must be shared for all instances of the same archetype_node_id. One simple example Given an archetype with a node with archetype_node_id at0003 And this node has the multiplicity 0…* In a template this node might be cloned and given a name constraint. The “original” node does not have a name constraint and can have any possible name. Let’s say the node at0003 has the english term “Comment”. In a template the node might be cloned and given the name “Other” We assign A as identifier of the original node and B to the cloned node In the data the client/user add multiple instances of the node. * A1 and A2 can have any name. Current best practices is to use the hashtag pattern like “Comment#1”. * Since the name of A is unconstrained it is also allowed to give A2 the name “Other”. 
* B1 is constrained on the name and must have the name “Other”

This gives the following names of the nodes

```
A1 => name: "Comment#1", value: "openEHR is great"
A2 => name: "Other", value: "The RM shines over healthcare"
B1 => name: "Other", value: "DIPS provides great software for health providers"
B2 => name: "Other", value: "Norway has the biggest ski-jump hill in the world"
```

Given this dataset in a COMPOSITION there is no way to identify that A2, B1 and B2 come from different definitions in the template. Since A2 was given the name “Other” it looks equal to B1 and B2.

The sequence identifiers for this dataset might be:

```
A1 - 1
A2 - 2
B1 - 3
B2 - 4
```

This gives us postulate one: There must be one sequence of ids for all nodes at the same level with the same archetype_node_id
[/quote]

@bna I now understand the issue, I was focusing on the data not on the model.

1. Rephrasing, in AOM at the OPT level on a multiple attribute constraint you will have two alternatives.
2. These alternatives, in general, are used for alternative types (e.g. in events, to have POINT_EVENT and INTERVAL_EVENT constraints) for the same constraint, not as cloned alternatives of the same type (e.g. having two alternatives for POINT_EVENT at the multiple attribute `events`)
3. I understand that is totally valid, and not only to have two alternatives, but that the two alternatives actually have the same archetype_node_id.

I believe the model, at least AOM1.4, wasn't designed to support that case, and that's why we are discussing it here. It would be nice to confirm that case is not supported and have some kind of statement added to the AOM1.4 spec, and maybe some patch as 1.5, since it's used a lot.

Then yes, some kind of extra differentiator is needed, so when you have data for any of those nodes (POINT_EVENT "A" and POINT_EVENT "B"), the data can reference the right node.
Without that, there is no possible data validation, since when you have a data set, you need to know exactly, for each node, which AOM node constrains it (considering also that if there is no constraint, then that node will be valid).

Considering that:

1. the AOM node differentiator should be defined in the archetype or template, depending on where you have the cloned constraint node
2. the RM data instance should carry the node differentiator from the AOM, to be able to get it for data validation and other functions

That AOM differentiator has nothing to do with the RM instance index I mentioned above that we use on instance paths:

[quote="pablo, post:42, topic:2492"]
An instance path would look like this: `/content[archetype_id=openEHR-EHR-OBSERVATION.lab_test-blood_glucose.v1](0)/data[at0001]/events[at0002](0)/data[at0003]/items[at0005](0)/value/value`
[/quote]

Those indexes are not ids, but locators, and are local; they don't work across versions of the same locatable.

Now I understand that the AOM node differentiator I mentioned above is the sequence_id added to AOM2. @thomas.beale please confirm.

Again, I was not talking about the AOM but the RM above since I didn't understand the whole picture.

---

## Post #72 by @pablo

[quote="thomas.beale, post:66, topic:2492"]
I did a set of calcs for the NHS taking into account a careful size estimate of openEHR data, and worked out the incremental cost increase of RAID 10 storage in a data
[/quote]

Did you count UUIDs as 36 or 16 bytes? (plain text vs binary)

[quote="thomas.beale, post:70, topic:2492"]
Yes we could do something at the DB level. It will be a bit tricky since the uid field can have other kinds of Id that are not convertible, or at least not easily, to integers.
[/quote]

I believe, others could confirm or not, most of the locatable.uid values stored are UUIDs, not OIDs or INTERNET_IDs. Though at the top locatable level the uid is an object version id since we agreed on using the version.uid there.
But for internal uids in locatables, it will be mostly UUIDs. Either way, we are discussing implementation but should be focusing on spec. If the spec says "it is recommended to use xxx" then implementers will figure out the optimizations they need to do, running numbers and costs at scale. Can't consider all those implementation details at the spec level. [quote="thomas.beale, post:70, topic:2492"] It might be that in openEHRv2 we simplify that field to a Guid and then its type can be Integer, both computationally and in storage. [/quote] If there is a bidirectional conversion possible to optimize storage, I don't think we need to worry about storage in the spec. --- **Canonical:** https://discourse.openehr.org/t/named-element-and-occurences/2492 **Original content:** https://discourse.openehr.org/t/named-element-and-occurences/2492