# Polishing node identifier (at-codes) use cases.

**Category:** [Technical (archive)](https://discourse.openehr.org/c/technical-archive/156)
**Created:** 2013-08-27 17:20 UTC
**Views:** 5
**Replies:** 73
**URL:** https://discourse.openehr.org/t/polishing-node-identifier-at-codes-use-cases/15068

---

## Post #1 by @yampeku

Thinking a little about node identifiers, I have come across some problematic use cases. First, this is the current 'rule' in the wiki (http://www.openehr.org/wiki/pages/viewpage.action?pageId=196633) for when node identifiers are really needed. I copy the relevant part to ease the discussion:

---

## Post #2 by @system

In which circumstance can a sibling of a DataValue occur? Certainly not in an ELEMENT. I cannot imagine another circumstance either. So why use a node-value? Write a nodeId if you want, it is not very interesting.

The problem is another one. This issue has annoyed me for quite some time: not whether you use a nodeId or not, or whether your archetype editor does or does not.

***I would say, make it optional, configurable***

But what is the case? The problem is that there are two main archetype editors. One creates nodeIds in DataValues, and the other does not. The designers apparently have different opinions on this. Sometimes the editors crash or choke on the ADL construct the other delivers. And even when they do not choke, when you change one letter in an archetype, maybe in the ontology... what happens? The editor quickly removes or adds the nodeIds on all DataValues (one does this, the other does that).

This makes it impossible to work with them both. It makes it hard to exchange archetypes with other people.

---

## Post #3 by @thomas.beale

this wiki page is (I hate to say...) out of date - the current rules are:

ADL takes a minimalist approach and does not require node identifiers where sibling object nodes can be otherwise distinguished.
Node identifiers are mandatory in the following cases: I think this probably deals with the cases you point out below. I'll now go and update that wiki page :( - thomas --- ## Post #4 by @yampeku Well, I would say that "free or coded text" is quite common in healthcare, but even if you argue that this exact example has no real clinical validity in openEHR it is still an issue that could happen in any other standard \(I would say that CDA for example with his data types and all his inheritances will suffer this problem for sure\) You can see that in these years even at\-code rules have changed quite a bit \(just see current wording and compare with previous ones\)\. Specialization is a BIG part of archetypes and the more we use it the more new problems we find\. If you want we can wait until archetype systems are in use to detect these kind of issues\. Ignoring problems won't make them disappear\. As we have discussed before, we want to add the functionality to turn off the node autocompleting\. But again having all nodes with at\-code is perfectly fine according to the specifications, and given the issues we encounter with specializations I would say that is always be better safe than sorry\. --- ## Post #5 by @yampeku Good to know\. I think the only remaining issue could be the one to confirm if a specialized object should always have at\-code\. And regarding use\_node, I would also add that you have to be careful not to create an internal reference from a sibling \(and if you do then you MUST put an at\-code\) --- ## Post #6 by @thomas.beale > The problem with this rules come with the \(explicit or implicit\) > specialization of single attributes\. 
> take this example:
>
>     ELEMENT[at0009] occurrences matches {0..1} matches {    -- Position
>         value existence matches {0..1} matches {
>             DV_TEXT occurrences matches {0..1} matches {*}
>         }
>     }
>
> What happens if a DV_TEXT is added on the specialization? Does it need at code? Do we consider the rule to be applied to the archetype + parent or only to the archetypes? Do we need to add an at-code also to the parent?

If it were added in a specialisation with no code, it would be assumed to be a redefinition of an existing DV_TEXT constraint, assuming there was one. If added with a code, the post flattening validation (phase 3) in the current ADL workbench would need to detect this. Right now it doesn't have checks in that phase for this. I'll have to run this example through the compiler to see what it does.

> This would need either a rewriting of the rule to state the issue with flat archetypes or a potential problem if an at-code is not specified.

good point; the rules should specify that they apply to flattened archetypes, which means that ids are required even if each specialisation child introduces only one alternative. I'll fix this in the spec.

> Another use case is when valid children types are part of the same class hierarchy (no need for specialization). Do we need at-codes when we create siblings such as DV_TEXT and DV_CODED_TEXT?

according to the current rules (see my previous post), yes. I would actually still rather avoid this, and am still thinking about it...

> If we have several different data types, such as DV_BOOLEAN, DV_QUANTITY, and DV_TEXT and then we want to add a DV_CODED_TEXT, which one of the data types gets an at-code? all? only the text ones? none?

according to the previous rules, none.
But DV\_CODED\_TEXT would be treated as a preferential constraint over DV\_TEXT due to the RM inheritance relationship\. It would be up to apps and tools to make use of that\. > Again, rewording/clarification is needed or problems may occur\. yes \- I doubt that we are at the final version of the wording on this \- but have a look at the new version and see what you think\. > What is so wrong about having at\-codes in every class of the archetype > with no ontology definition for that code? interesting question \- so far \(10 years\!\) we have always treated an at\-code as something that is in the ontology\. At the moment no tools at all would handle the assumption that only some codes had definitions; it would raise questions: how do you know which things need definitions and which don't? My guess is that there would need to be a special definition that is connected to the at\-codes you want to have no definitions, which would complicate the archetype ontology section structure\. \- thomas --- ## Post #7 by @yampeku I don't think archetype ontology would be more complicated at all\. There are currently archetypes with different set of properties in each at code and tools can handle it well \(if I remember correctly, NEHTA archetypes have extra properties\)\. I'm pretty sure tools are currently robust enough to deal with missing at codes at the ontology\. It's even very easy to check if an at\-code is on the ontology \(if I recall correctly from java implementation\.\.\.\) --- ## Post #8 by @thomas.beale Bert, I would be very happy to test some archetypes created with the LinkEHR editor in the ADL workbench, but I don't think any are publicly visible are they? We need some definitive problem reports to know what to fix\. Or log an issue here <http://www.openehr.org/issues/browse/AEPR>\. I am pretty sure we can fix problems in both tools if we know what they are\. 
- thomas

---

## Post #9 by @yampeku

I have quite a few generated archetypes. Not clinically valid (or clinically validated at least) but they should be simple and varied enough to test. Or we could just try to generate a full cycle (I could open a CKM archetype, save it and give it back to you to test it).

---

## Post #10 by @system

Hi Thomas, thanks for your attention.

I experienced the problems with the Ocean Archetype Editor and the LinkEhr editor when they are used in the same work environment, so that archetypes get opened in both editors. It is not difficult to reproduce the errors and inconveniences. The issue is that both, Ocean and LinkEhr, do not recognize their responsibility and do not see a need to change this.

One problem is easy to reproduce in a few steps:

- Create an archetype in the LinkEhr editor, as simple as possible, based on Element with one DataValue. You will see it has a nodeId on the DataValue.
- Open it in the Ocean editor. As I recall, this is not possible at first; you need to change a few things, I forget what exactly. Small things, but this should not be. OK, repair it in a text editor, open it, change one letter in the ontology, and save it.
- The nodeIds in the DataValue are removed without notifying the user. The archetype has changed in places other than the ones the user intended.

You can also do this the other way around, and you will experience problems as well.

The solution is: good behavior would be for an archetype editor to conform to the archetype a user loads into it; changing it without notification is very wrong.
Another solution, also needed, would be that nodeIds in locations where they are not enforced by the ADL definition should be optional.

These problems have existed for years now and everyone knows about them. If it was my software, I would comfort my users and customers with friendly solutions.

regards
Bert

---

## Post #11 by @thomas.beale

well I think Nehta archetypes may have some hacks - they wouldn't conform to ADL 1.5 and I am not even sure if their internal archetypes conform to ADL 1.4. In any case, the things they want (annotations mainly) are done differently in ADL 1.5.

It's certainly easy to check for codes that are not in the ontology - and currently the tools all do check, and generate lots of errors: VONSD, VATDF1, VATDF2, VATCD, VETDF, WETDF, VATCD, VACDF1, VACDF2; see meanings.

I'm not saying it's not an idea worth thinking about, but the current specifications and tools all work on the opposite premise.

- thomas

---

## Post #12 by @Peter_Gummer1

> The issue is that both, Ocean and LinkEhr, do not recognize their responsibility and do not see a need to change this.

Hi Bert,

Glad you've brought this up again, but the problem won't get fixed unless you report it. Can you report the problem at http://www.openehr.org/issues/browse/AEPR and attach an archetype that demonstrates it?

> These problems exist for years now, everyone knows about them, If it was my software, I would comfort my users and customers with friendly solutions.

If I had a problem that I wanted fixed, I would report it in the problem tracker. We are very busy and working on other projects. If this problem is important to you, please report it and we may get around to it some day. Please make sure that you attach an example of an archetype that demonstrates the problem.
Peter

---

## Post #13 by @pablo

Hi all, very interesting discussion.

```
> Another use case is when valid children types are part of the same
> class hierarchy (no need for specialization). Do we need at-codes when
> we create siblings such as DV_TEXT and DV_CODED_TEXT?

according to the current rules (see my previous post), yes. I would
actually still rather avoid this, and am still thinking about it...
```

Thinking about this case, wouldn't it be a better design approach to define DV_TEXT in the base archetype, and the DV_CODED_TEXT alternative (with further constraints) in a specialized archetype? Then the issue with the codes disappears, and anyone can choose to use the very generic archetype or the specialized one.

This is good from an object-oriented design point of view, but right now the specializations defined on the CKM are very specific concepts, not just a way to simplify modeling and reuse of artifacts (as in the case I mentioned). I mean, the specialized archetype is not a more specific concept; it is the same concept with just "a little" more detail. E.g. if we take "Healthcare service request" and "Imaging examination request", the specialized archetype I would create sits right in the middle of those concepts; in fact it is closer to the more generic one.

In short, I don't know if this should be defined at the model level or at the modeling process level.

BTW, I agree with Bert that we need interoperable archetype design tools. Back in 2009 we needed to choose an editor, and we tested LiU, Ocean and LinkEHR, and we couldn't load an archetype created by one tool into another tool :-/ Maybe this has improved by now.

---

## Post #14 by @system

> >> The issue is that both, Ocean and LinkEhr, do not recognize their responsibility and do not see a need to change this.
>
> Hi Bert,
>
> Glad you've brought this up again, Can you report the problem at http://www.openehr.org/issues/browse/AEPR and attach an archetype that demonstrates it?
Dear Peter, First I have to start up a Windows computer, this means, digging up my notebook, clean my desk to have space to put my notebook on, and hope Windows will start on it, which is always a risk\. Last time I remember my virusscanner was preventing Windows to start, I never tried again after that because my notebook also has Linux\. Both the LinkEhr as the Ocean\-editor only run on Windows\. The LinkEhr editor should run on Linux too, but it is for a 32\-bits JVM, and I cannot get it to run\. The Ocean editor should also run on Mono, but simply does not\. When I have succeeded both to run, than I must reproduce something\. It will cost me a day or more\. To whom can I send the bill? It is in the advantage of Ocean and LinkEhr to get this sorted out\. The problems are easy to reproduce\. I told you how to\. Please make a copy of this and the previous email, and put it in Jira\. It contains valuable information\. Or leave it and do nothing with it, like this has been done for years now\. The market maybe will correct you\. Thanks again for your attention\. Have a nice day\. Bert > but the problem won't get fixed unless you report it\. Ah, PS: But don't put the blame on me for your software having problems\. Thanks\. --- ## Post #15 by @Peter_Gummer1 Nothing has been done about it, Bert, because no one has ever logged it as an issue\. If any one out there actually does care about this, then please log it at http://www.openehr.org/issues/browse/AEPR with an example archetype\. Problems logged there do get fixed\. Peter --- ## Post #16 by @system Someday, when I am busy with it, I will do it\. Could well be coming month\. Or maybe I write my own archetype editor and thank Ocean and LinkEhr for giving me this business\-opportunity, maybe even send a box of chocolates then\. :\) Bert --- ## Post #17 by @system From Brussels, of course\. --- ## Post #18 by @thomas.beale Bert, all these tools are free, built and maintained by their originators at their own cost\. 
So you might be sending yourself the chocolates\.\.\. \- thomas --- ## Post #19 by @system There ain't no such thing as a free lunch Bert --- ## Post #20 by @system I'll try to summarize the origin of the different views we have regarding this topic and maybe this can be also useful to see why this is not just a configuration problem of the tools\. We can find the explanation of node identifiers in two places \(I use the latest drafts, I think\): \- In AOM 1\.5 specifications, page 47: "Semantic identifier of this node, used to distinguish sibling nodes of the same type\. \[Previously called ‘meaning’\]\. Each node\_id must be defined in the archetype ontology as a term code\." \- In ADL 1\.5 specifications, page 26: "In cADL, an entity in brackets of the form \[atNNNN\] following a type name is used to identify an object node, i\.e\. a node constraint delimiting a set of instances of the type as defined by the reference model\." and "A Node identifier is required for any object node that is intended to be addressable elsewhere in the same archetype, in a specialised child archetype, or in the runtime data and which would otherwise be ambiguous due to sibling object nodes" The definition in AOM is the one followed by the openEHR editor, i\.e\. a node identifier or atNNNN code is just a pointer to the ontology section and a mechanism to distinguish sibling nodes\. Thus, wherever it is not needed, the tool does not introduce that code in order not to dirty the ontology section\. The first part of the definition in ADL is the one followed in LinkEHR and, in our opinion, more correct formally\. When you introduce an archetype constraint for a C\_OBJECT you are in fact creating a definition of a type \(a sub\-type of the more generic type defined by the reference model class\) that will be used to create a subset of instances\. 
We have to distinguish this sub\-type from the RM type, and since the class name cannot be changed, the only solution is to use the atNNNN as type identifier\. In other words, our interpretation is that atNNNN codes are unique identifiers of each type defined in the archetype, that may be also used to link to the ontology section, but that is the optional part\. In fact, the only exception to this would be when you create constraints using a path, because then you are just navigating through the RM but do not change the meaning of the intermediate classes\. The logic of the tools and the validation checks of archetypes are built based on those interpretations\. I agree with Bert in one thing: tools shouldn't change things without notifications, but in this case we face a methodological difference, not just a configuration one, and that's why it is not easy to be solved\. David --- ## Post #21 by @system Some of the originators are students, working for their academic purposes and forgetting their tools very quickly when the have a job\. Some originators are part of an enterprise and building the tools to promote their enterprise\. Some of the originators are working in a university and getting well paid for spending their time on building such a tool\. Some of the originators are promoting a standard and using the tool as promotion\. Some of the originators are selling their tools for good money\. And I must say, I agree with them all, there is nothing wrong with that\. Nothing at all\. We all have to live, and everyone is doing it on his/her way\. There is nothing dishonorably on working for your profit\. They could be grateful accept the help I offered until now and profit from it, they can also do nothing with it\. It is their choice\. I fully respect that\. But saying that the tool isn't better because I \(me, as a person\) refuse to walk through some time\-consuming formalities, that is not right, is my opinion\. 
I leave it all up to the originators to improve their tooling or leave it as it is\. Once in a year, the subject comes up \(thanks, Diego\), and I write down this old annoyance\. I will stop doing this when I am bald and gray\. Maybe that is today, I just looked into the mirror\. Again, have a nice day, you are good folks\. Bert --- ## Post #22 by @thomas.beale Does LinkEHR actually do this? I.e. only some at-codes are found in the ontology? Your statement above (bolded) is right in theory (or at least that's the way I see it), but then the obvious question is: if I mutate the type (say) ENTRY to ENTRY[at0123], what does ENTRY[at0123] mean? In general we want to equate that with a meaning of some kind (like 'ENTRY' has a meaning). Remember, we could have done something like 'ENTRY[admission]' or 'ENTRY[bp_measurement]' but we don't do that because we want the meanings to be multi-lingual (one day the 'ENTRY' bit should be as well...). So we use term codes. So if we agree that 'mostly' we want those meanings defined, then the question is: which places doesn't it matter? I would say: places where it's obvious, like ELEMENT.value: DV_TEXT. My view has always been that we would avoid at-codes in locations where the meaning is obvious (principally for single-valued attributes, where the archetype meaning is the same as the RM meaning). The other reason for that is to limit the length of paths for Xpath processing. Unnecessary codes can double the length of some paths. If we go the other way, then we are saying: at-codes are 100% mandatory everywhere, but definitions for them are optional. Then we need some rules on when it is optional and when mandatory. What rules would you propose for that? Remembering that a clinical modeller absolutely relies on those rules for understanding the archetype? - thomas --- ## Post #23 by @Peter_Gummer1 Very happy to have the help, Bert\. Without people like you reporting problems, we don't know about them\. 
Look forward to getting that problem report when you get a chance\. It should only take you a couple of minutes … probably a lot quicker than writing all of these emails ;\-\) Peter --- ## Post #24 by @system > > Does LinkEHR actually do this? I\.e\. only some at\-codes are found in the > ontology? Your statement above \(bolded\) is right in theory \(or at least > that's the way I see it\), but then the obvious question is: if I mutate the > type \(say\) ENTRY to ENTRY\[at0123\], what does ENTRY\[at0123\] mean? In general > we want to equate that with a meaning of some kind \(like 'ENTRY' has a > meaning\)\. Remember, we could have done something like 'ENTRY\[admission\]' or > 'ENTRY\[bp\_measurement\]' but we don't do that because we want the meanings > to be multi\-lingual \(one day the 'ENTRY' bit should be as well\.\.\.\)\. So we > use term codes\. > > So if we agree that 'mostly' we want those meanings defined, then the > question is: which places doesn't it matter? I would say: places where it's > obvious, like ELEMENT\.value: DV\_TEXT\. My view has always been that we would > avoid at\-codes in locations where the meaning is obvious \(principally for > single\-valued attributes, where the archetype meaning is the same as the RM > meaning\)\. The other reason for that is to limit the length of paths for > Xpath processing\. Unnecessary codes can double the length of some paths\. > No, currently all atNNNN codes are also found at the ontology in LinkEHR, even if they are empty, to be compatible with the VATDF2 check, although we would like to avoid it :\-\) In my opinion we talk of two different levels of meaning\. One is the explicit meaning, where the definition of the node is defined through a natural text or a terminology binding and that is, of course, the needed for a complete semantic interoperability\. The other is the implicit meaning, when you create e\.g\. 
an OBSERVATION with occurrences \{1\.\.1\} you are creating "An OBSERVATION that only happens once"\. That means something \(otherwise you wouldn't have defined that constraint\), even if you cannot give it a natural name or a terminology code\. And if it means something, it shall have an identifier\. > If we go the other way, then we are saying: at\-codes are 100% mandatory > everywhere, but definitions for them are optional\. Then we need some rules > on when it is optional and when mandatory\. What rules would you propose for > that? Remembering that a clinical modeller absolutely relies on those rules > for understanding the archetype? > I don't think a clinical modeller would have to mind about these aspects\. He/she creates an archetype node \(internally, a unique atNNNN code is created\)\. He/she optionally gives it a name or defines a terminology binding \(internally the ontology structures are created\)\. When the archetype is used or processed, the systems will only use the information they have available\. --- ## Post #25 by @yampeku >> >>> Bert, >>> >>> all these tools are free, built and maintained by their originators at >>> their own cost\. So you might be sending yourself the chocolates\.\.\. >> >> There ain't no such thing as a free lunch > > Some of the originators are students, working for their academic purposes > and forgetting their tools very quickly when the have a job\. > Some originators are part of an enterprise and building the tools to promote > their enterprise\. > Some of the originators are working in a university and getting well paid > for spending their time on building such a tool\. > Some of the originators are promoting a standard and using the tool as > promotion\. > Some of the originators are selling their tools for good money\. 
> BTW, I don't know how much does a university researcher gets paid in the Netherlands, but I can assure you that is not that well paid in Spain ;D --- ## Post #26 by @system It depends on your qualifications, but it must be enough so one can live from it\. A dead researcher is of no good at all\. :\) --- ## Post #27 by @system I think you are right\. I do it, next chance as I happen to work with it\. Bert --- ## Post #28 by @thomas.beale David, I am not totally clear on what you mean. OBSERVATION is a class that always has an at-code on it anyway, because it is a high-level class and needs its clinical meaning defined anyway. If you impose {1..1} on it, you will do that via a SECTION or COMPOSITION slot. Ok, but if tools have no rules, then we can end up with an archetype like this: OBSERVATION matches { data matches { HISTORY matches { .... } } } with no meaning on anything. What is to prevent that? - thomas --- ## Post #29 by @system David, Can I summarise it for my understanding as: - ATxxxx codes are pointers to an 'ontology'. - ATxxxx codes can be considered symbols that represent a particular concept - The 'ontology' provides a name that will be used to display the name of a node (concept) in an archetype. - When a node is specialised the node name used will indicate a new concept (its meaning has changed) - When the archetype is specialised ideally the new concept in the specialisation is a subordinate concept. - When a Node is specialised the standard does not prescribe that the new concept is a sub-set of the previous one. - The question is: is each Node (and the concept it represents) unique or not. - The question is: is it obligatory that each node in the archetype carries a unique code of the form ATxxxx . My answers to both questions are: - Each archetype node is a unique concept that must have attached to it a unique identifier. - Archetype editors must support this. 
And I would like to add: - When specialising each specialised concept must be a subset of its previous one. Gerard Freriks +31 620347088 [gfrer@luna.nl](mailto:gfrer@luna.nl) --- ## Post #30 by @Hugh_Leslie1 Hi Gerard, This is science, not religion. Can you please give reasons for your statements that archetype nodes must be unique concepts and must be uniquely identified? In openEHR and 13606, the archetype is the unique concept which means that nodes quite rightly can have unique meaning in the context of the archetype. This is like human language where the same word can have different meanings depending on the context used. I have never been given a scientific reason why every node in an archetype should be uniquely coded or have unique meaning outside the archetype itself. I have never found a use case that makes this necessary but would be interested if anyone can show me one. Regards Hugh --- ## Post #31 by @pablo Just use archetypeID+nodeID, then you have a unique concept id for each node. --- ## Post #32 by @yampeku don't forget the organization responsible for that archetype ;D --- ## Post #33 by @pablo Yep, that should be necessary in case of archetypeID collisions. Maybe in the future we have an archetypeID server (like a DNS protocol) to query for archetypeID to globally check for uniqueness. --- ## Post #34 by @system True, that is also an old and not yet finished discussion, name\-collissions in archetypeIds\. At this moment there is no solution for this in EN13606 and OpenEHR\. Thanks for reminding\. Bert --- ## Post #35 by @thomas.beale For those who may not realise, Diego is referring \(I assume\) to the ADL 1\.5 namespaced archetype identifiers, which would give paths of the form: namespace::ARCHETYPE\_ID/path\[at0002\]/to\[at0003\]/happiness where the namespace corresponds to the issuing organisation\. \- thomas --- ## Post #36 by @system I must have missed this announcement, sorry, good that there is a solution now\. 
Bert --- ## Post #37 by @thomas.beale see wiki page and the draft spec . - thomas --- ## Post #38 by @system > > No, currently all atNNNN codes are also found at the ontology in > LinkEHR, even if they are empty, to be compatible with the VATDF2 check, > although we would like to avoid it :\-\) > > In my opinion we talk of two different levels of meaning\. One is the > explicit meaning, where the definition of the node is defined through a > natural text or a terminology binding and that is, of course, the needed > for a complete semantic interoperability\. The other is the implicit > meaning, when you create e\.g\. an OBSERVATION with occurrences \{1\.\.1\} you > are creating "An OBSERVATION that only happens once"\. That means something > \(otherwise you wouldn't have defined that constraint\), even if you cannot > give it a natural name or a terminology code\. And if it means something, it > shall have an identifier\. > > David, I am not totally clear on what you mean\. OBSERVATION is a class > that always has an at\-code on it anyway, because it is a high\-level class > and needs its clinical meaning defined anyway\. If you impose \{1\.\.1\} on it, > you will do that via a SECTION or COMPOSITION slot\. > I know that the openEHR archetype editor only allows introducing OBSERVATIONs and the other clinical classes through a slot at the COMPOSITION and SECTION, and probably that is a good methodological approach or good practice to improve archetype governance, but technically it is not the only possibility\. You can create an archetype from the top COMPOSITION to the leaf data values in one single archetype\. And in any case, it is just an example, we can use MY\_CLASS\_NAME or whatever to avoid thinking this is a problem about how things work at the openEHR reference model\. > I don't think a clinical modeller would have to mind about these > aspects\. He/she creates an archetype node \(internally, a unique atNNNN code > is created\)\. 
> He/she optionally gives it a name or defines a terminology binding (internally the ontology structures are created). When the archetype is used or processed, the systems will only use the information they have available.
>
> Ok, but if tools have no rules, then we can end up with an archetype like this:
>
>     OBSERVATION matches {
>         data matches {
>             HISTORY matches {
>                 ....
>             }
>         }
>     }
>
> with no meaning on anything. What is to prevent that?

To be exact, in our approach all those classes should have an atNNNN code, even if it is not described in the ontology section. But in any case, the current atNNNN rules do not force a meaningful description either, except for the root node, because it corresponds to the same concept as the archetype identifier when you create the archetype. It is more a duty of the clinical validation team to check that kind of thing, not something that can be automatically validated by rules. Look at the following brand new archetype created with the openEHR editor, just choosing an OBSERVATION root:

    definition
        OBSERVATION[at0000] matches {    -- Test example
            data matches {
                HISTORY[at0001] matches {    -- Event Series
                    events cardinality matches {1..*; unordered} matches {
                        EVENT[at0002] occurrences matches {0..1} matches {    -- Any event
                            data matches {
                                ITEM_TREE[at0003] matches {*}
                            }
                        }
                    }
                }
            }
        }

    ontology
        term_definitions = <
            ["es"] = <
                items = <
                    ["at0000"] = <
                        text = <"Test example">
                        description = <"unknown">
                    >
                    ["at0001"] = <
                        text = <"Event Series">
                        description = <"@ internal @">
                    >
                    ["at0002"] = <
                        text = <"Any event">
                        description = <"*">
                    >
                    ["at0003"] = <
                        text = <"Tree">
                        description = <"@ internal @">
                    >
                >

The only "meaning" there is the "Test example" name of the OBSERVATION, because it corresponds to the archetype name.
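The purely mechanical part, checking that every at-code used in the definition also has an entry in the ontology (the presence check mentioned earlier in the thread), is simple to automate. A minimal Python sketch, assuming plain regular expressions over ADL text rather than a real ADL parser; the function name `undefined_at_codes` and the deliberately shortened ontology are made up for illustration, and this is not how the openEHR or LinkEHR tools implement their validation. It only reports whether an entry exists, not whether its text says anything useful:

```python
import re

# Hypothetical helpers: a regex scan is a simplification, not a real ADL parser.
AT_CODE = re.compile(r"\[(at\d+(?:\.\d+)*)\]")      # e.g. HISTORY[at0001]
TERM_KEY = re.compile(r'\["(at\d+(?:\.\d+)*)"\]')   # e.g. ["at0001"] = < ... >


def undefined_at_codes(definition: str, ontology: str) -> set:
    """Return at-codes used in the definition text that have no ontology entry."""
    used = set(AT_CODE.findall(definition))
    defined = set(TERM_KEY.findall(ontology))
    return used - defined


definition = """
OBSERVATION[at0000] matches {
    data matches {
        HISTORY[at0001] matches {
            events cardinality matches {1..*; unordered} matches {
                EVENT[at0002] occurrences matches {0..1} matches {
                    data matches {
                        ITEM_TREE[at0003] matches {*}
                    }
                }
            }
        }
    }
}
"""

# The at0003 entry is left out on purpose to show a code without a definition.
ontology = """
["at0000"] = < text = <"Test example"> description = <"unknown"> >
["at0001"] = < text = <"Event Series"> description = <"@ internal @"> >
["at0002"] = < text = <"Any event"> description = <"*"> >
"""

print(sorted(undefined_at_codes(definition, ontology)))  # prints ['at0003']
```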
But all the others have no meaning and no existing rules are checking that \(having "Event series", "Any event" and "Tree" is the same as not saying anything\)\. So, again, those ontological descriptions will be always checked by the authors, not by the tools\. By the way, following the specifications, even that example archetype created with the openEHR editor is not perfect\. Both the at0001 and the at0003 codes should not be needed according to the rules, since they are members of single value attributes without sibling nodes\.\.\. --- ## Post #39 by @system I will stop at this point because I think it is the kernel of the discussion\. Which should be the idea? \- The ontology "must" provide a name or meaning for each atNNNN node in an archetype? This is how thing are supposed to work in current specifications \- The ontology "can" provide a name or meaning for each atNNNN node in an archetype? This is how we think it should be, the ontology provides a semantic description only when it is needed or it is possible\. And what is providing a meaning or semantic description? \- A terminology binding? Of course, we will rely on terminologies and ontologies for a complete semantic interoperability\. \- A natural language description? Well, here is where no automatic rules can exist to check if a description such as "Systolic blood pressure" or "This is a PQ type node" or "The sky is blue" or " " are correct or have a sense, only a human validation check can work here\. --- ## Post #40 by @system Why bring in religion? I want to understand and ask questions. And in the meantime I have an opinion based on my 'GBV'. (see below) > Can you please give reasons for your statements that archetype nodes must be unique concepts and must be uniquely identified? Reasons for my statement? - Each node in an archetype has a meaning - Without an implicit or explicit meaning the archetype node indicates chaos, meaninglessness, nothingness. 
- Meaning is attached to the given name of the node or code attached to that node - In the case of specialisations of archetypes several things can happen, one of those is renaming the node, changing the meaning - This changing the name and meaning of the archetype node needs to be reflected in a new unique code So far I have not explained the scope, the jurisdiction, the namespace, of the unique codes I'm alluding to. At the minimum it must be uniquely defined inside the archetype, and in the case of a code from an external coding system, it must be unique in that namespace. > I have never been given a scientific reason why every node in an archetype should be uniquely coded or have unique meaning outside the archetype itself. I have never found a use case that makes this necessary but would be interested if anyone can show me one. It all depends on scope of the archetype. Is it used in an '**open world**' or '**closed world**' situation? When used in a 'closed system' ( two or more actors that have an agreement of that what is exchanged, IHE with one profile) there is almost no need for external codes attached to archetypes. The explicit or implicit agreement is sufficient. Whatever the name in the archetype node, when the agreement says, the node name means 'Black', but we take to mean 'White' then there is no single problem. In 'Closed world systems' the highest level of semantic interoperability is Level2a/2b at the best. It is the world of messaging with ad-hoc agreements, as we know it. When the archetype is used in an 'open world' system, then we need to be very precise and explicit. Any party that interprets the data using the archetype as source, where it must find the meaning, needs to be informed fully. No single human intervention, human interpretation, must be needed to process fully and safely the data exchanged. Local agreements between actors how to interpret the archetype node name do not exist, the archetype itself is the full agreement. 
In 'Open world' systems the highest level of semantic interoperability is level 3. In this 'open world' situation there are other rules than in the 'closed world' situation. Finally. Do not create a dichotomic world where things are either science or religion. There are many shades. And in order to surprise you: Do not underestimate something that I call in Dutch: '*het gezonde boeren verstand*'. (GBV) Translated: the common sense of the farmer. Many obvious things that happen in life, happen because they happen. I do not have to prove, that water flows, that fire burns, that winds exits, for you and me to accept this is true, with or without a science, with or without any belief system, with or without any dogma. I try to base my own opinions on my 'GBV'. Gerard Freriks +31 620347088 [gfrer@luna.nl](mailto:gfrer@luna.nl) --- ## Post #41 by @thomas.beale yeps, that's certainly true, and it could easily make sense in some situations. well, ok, but that's just like saying that a half-made movie isn't watchable. Intermediate development states of any semantic object can clearly have meaningless placeholders for some period of time, while the designer does his/her thinking. yes, I agree with that. I forget why that tool does that, but in any case, don't take it as a design guide... - thomas --- ## Post #42 by @thomas.beale My view (to date, which I am happy to revise if a better theory comes into view) has been: - the ontology section provides a name or meaning for any node whose clinical/domain meaning is otherwise not understandable. That is, it's not needed for: well the idea here has always been, and remains justified today: --- ## Post #43 by @system I'm not contradicting those positions, which I agree, I'm just saying that this is a very subjective topic, dependant on the context of use, the availability of some resources \(e\.g terminological codes\) and many other factors\. 
So, we can all do our best but it will be very difficult to have rules that guide which nodes of the archetype have to be identified just based on a structural matter \(the rules you asked for\)\.

---

## Post #44 by @pablo

Hi all, Maybe this is OT but is related. I remembered a problem I had some time ago working with algorithms that traverse the archetype structure. For CObjects without nodeID, the path of the CObject is equal to the path of its parent CAttribute, so when I want to get the node with that path using Archetype.node(path), only one of those nodes will be returned. Of course there are workarounds, like checking the type of the returned node, and if a CAttribute is returned but I want the CObject, I just get the node.children()[0]. But that can only be implemented if you know that the path you're using is a path to a CObject, so it depends on the context of your algorithm whether to expect a CObject or a CAttribute for a given path (i.e. if you previously visited a CAttribute and your algorithm traverses from root to leaves, you'll expect the next nodes to be CObjects).

From a developer point of view, having unique paths would avoid a lot of workarounds and ugly code. So having a nodeID for each CObject node is something I would encourage in tooling. I really don't care about having more terms in the ontology :)

---

## Post #45 by @thomas.beale

the usual thing to do here is to provide two (well actually 4, including the 'has' ones) functions:

    c_attribute_at_path (a_path: String): C_ATTRIBUTE
        pre-condition has_attribute_path (a_path)

    c_object_at_path (a_path: String): C_OBJECT
        pre-condition has_object_path (a_path)

in the ADL workbench, we actually pre-compute this in the parse phase, but that isn't necessary of course. - thomas

---

## Post #46 by @thomas.beale

You are probably right\.
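For readers following the path discussion in posts #44 and #45, here is a minimal sketch of the two-lookup idea, written in Python rather than the Eiffel of the ADL workbench. The `CObject`, `CAttribute` and `PathIndex` classes are hypothetical stand-ins, not the AOM classes or the Java reference implementation API; only the four function names come from post #45. Attribute paths and object paths are kept in separate maps, so a CObject without a node id can share its parent attribute's path without the two lookups colliding:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CObject:
    rm_type: str
    node_id: Optional[str] = None  # e.g. "at0002"; None when the node carries no at-code
    attributes: List["CAttribute"] = field(default_factory=list)


@dataclass
class CAttribute:
    name: str
    children: List[CObject] = field(default_factory=list)


class PathIndex:
    """Hypothetical index that stores attribute paths and object paths separately."""

    def __init__(self, root: CObject) -> None:
        self._attributes: Dict[str, CAttribute] = {}
        self._objects: Dict[str, CObject] = {"/": root}
        self._walk(root, "")

    def _walk(self, obj: CObject, prefix: str) -> None:
        for attr in obj.attributes:
            attr_path = f"{prefix}/{attr.name}"
            self._attributes[attr_path] = attr
            for child in attr.children:
                # Without a node id the child shares its parent attribute's path.
                # Siblings without node ids would overwrite each other here; this
                # sketch assumes siblings are always distinguished by a node id.
                suffix = f"[{child.node_id}]" if child.node_id else ""
                child_path = attr_path + suffix
                self._objects[child_path] = child
                self._walk(child, child_path)

    def has_attribute_path(self, a_path: str) -> bool:
        return a_path in self._attributes

    def has_object_path(self, a_path: str) -> bool:
        return a_path in self._objects

    def c_attribute_at_path(self, a_path: str) -> CAttribute:
        if not self.has_attribute_path(a_path):  # pre-condition
            raise KeyError(a_path)
        return self._attributes[a_path]

    def c_object_at_path(self, a_path: str) -> CObject:
        if not self.has_object_path(a_path):  # pre-condition
            raise KeyError(a_path)
        return self._objects[a_path]


# OBSERVATION[at0000] / data / HISTORY (no node id) / events / EVENT[at0002]
event = CObject("EVENT", "at0002")
history = CObject("HISTORY", None, [CAttribute("events", [event])])
root = CObject("OBSERVATION", "at0000", [CAttribute("data", [history])])
index = PathIndex(root)
```

In this example `/data` resolves to the `data` attribute through `c_attribute_at_path` and to the HISTORY object through `c_object_at_path`, so the caller states which kind of node it expects instead of inspecting the type of whatever comes back.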
I think for the moment I would like to get ADL/AOM 1\.5 completed \(more or less\) with the current assumptions, at least until we can obtain some more evidence \(particularly from vendor companies with actual production implementations\) and modellers whose archetypes are deployed for real, that would show that we need to change the current status quo\. Call me conservative, but I don't like changing things without real world justification\! If anyone thinks they can invent better rules for node identification in the meantime, please feel free to post them\. It may be that we can make ADL/AOM work in a way that accommodates different 'modes' of operation\. \- thomas --- ## Post #47 by @pablo Wow, that's nice. Thanks Thomas. I'll propose the change to the Java Ref Impl project on GitHub (the one I'm using). --- ## Post #48 by @thomas.beale You can see the code in , but most of the pathing logic is supplied by an independent set of OG_XX classes (OG = object graph). So the class has most of the logic. If you can be bothered, you can get the source and build just to the point where you can browse with the Eiffel tool, that will allow you to see the code properly. It's all Apache 2 license, take what you want ;-) See for instructions. - thomas --- ## Post #49 by @system Well, LinkEHR is a real implementation in use by several organizations, and we think these identifiers are needed both technically and methodologically, so we will continue our way of doing thing :\-\) --- ## Post #50 by @thomas.beale To be clear, I didn't mean modelling tools, I meant production EHR systems that use the resulting models. I'm still not really clear on the rules that LinkEHR uses to decide when at-codes are not defined in the archetype ontology section. 
- thomas --- ## Post #51 by @system > > Well, LinkEHR is a real implementation in use by several organizations, > and we think these identifiers are needed both technically and > methodologically, so we will continue our way of doing thing :\-\) > > To be clear, I didn't mean modelling tools, I meant production EHR systems > that use the resulting models\. > Of course, me too: http://www.eurorec.org/news_events/newsArchive.cfm?newsID=239 > I'm still not really clear on the rules that LinkEHR uses to decide when > at\-codes are not defined in the archetype ontology section\. > The rules are: \- Every archetype node always has an explicit unique identifier\. We use the atNNNN codes to do so, to minimize the impact with current ADL\. \- The archetype authors decide, during the definition and review process, which nodes need or have a description or terminology binding due to clinical reasons\. --- ## Post #52 by @thomas.beale Yep, I know about that (the more systems the better!). But I would be interested to know what the clinical models look like - are they posted anywhere? And what is the clinical modelling process? I would think after a few years of it, there would be some ideas on which nodes need to be defined and which don't? I'm just trying to get some evidence here, so we can better understand the right set of rules to use in the formalism and its tooling. - thomas --- ## Post #53 by @pablo Hi David, IMO LInkEHR rules are a profile of the rules in the specs, that shouldn't make incompatible archetypes between tooling. I would like to see both Archetype Editors to support this profile: to open an archetype with the default behaviour of the specs (not having a nodeID for every node) on LinkEHR Ed. and work ok, and open a profiled archetype in Ocean AE and also work ok. Is that tough to do? As a developer / investigator / trainer, I really don't care about the decisions made but each software provider, I just need stuff to work :-) E.g. 
I wish one day on an openEHR workshop I can give the option of choosing the Archetype Editor to work with. Right now I only have one option that I know works with archetypes on the CKM, the Ocean one. And some time ago I tried LinkEHR Ed. and it was nice. I wish I can work with that today. --- ## Post #54 by @Peter_Gummer1 Never having seen a LinkEHR archetype myself, I can't say for sure how tough it would be to open one in the Ocean AE successfully and then make sure that the at-codes are preserved in the ADL and XML output, Pablo. I would guess that it would take a couple of days at least … and then it would have to be tested to make sure that the enhancement didn't break any other functionality. I don't have a couple of days right now to spend on something that has no business case behind it. But a good first step towards making such a business case would be if someone who needs this enhancement could find a few minutes to raise the issue on Jira … with an example :-) Peter --- ## Post #55 by @system I have uploaded some archetypes to http://www.en13606.org/resources/files/cat_view/67-sample-archetypes They correspond to the ISO EN 13606 archetypes of the Patient Summary document of the Fuenlabrada Hospital\. They are only in Spanish\. They are mainly based on openEHR archetypes from the CKM, but also enriched with some information from NEHTA specifications and from the Spanish law for clinical documentation\. \![image.png|845x762](upload://cyKAkUiNcJnwPpc15apLMokM4nv.png) --- ## Post #56 by @yampeku By the way, I just found out that archetype_node_id from locatable class from the reference model (common_im document, page 22) is obligatory (!!!). The meaning of the attribute is as follows: "Design-time archetype id of this node taken from its generating archetype; used to build archetype paths. Always in the form of an “at” code, e.g. “at0005”. 
This value enables a "standardised" name for this node to be generated, by referring to the generating archetype local ontology. At an archetype root point, the value of this attribute is always the stringified form of the archetype_id found in the archetype_details object."

If you have to put the atxxxx code and the archetype does not have it, what do you put there? What should systems expect? There is even an invariant defined as "Archetype_node_id_valid: archetype_node_id /= Void and then not archetype_node_id.is_empty". How does this work in your current implementations when the atxxxx code is sometimes not present?

---

## Post #57 by @thomas.beale

it's simpler than you think \- we made that property mandatory so that programmers would never get a null exception\. If it doesn't contain an at\-code or an archetype id, it can be empty \(but not null\), or \(what the ADL workbench currently does\) \- it can contain a dummy id like 'unknown' that the software can easily spot and strip out\. I'm not against making it an optional property if developers would prefer that\. \- thomas

---

## Post #58 by @system

On 20\-9\-2013 17:01, Thomas Beale wrote:

> it's simpler than you think \- we made that property mandatory so that programmers would never get a null exception\.

That must have been a long time ago; nowadays, programmers have no problem handling a null property\. I wonder what the idea behind stuffing the archetype\_id in the archetype\_node\_id property is? Here you make it harder for programmers because the archetype\_id has another syntax in archetype\-paths than the archetype\_node\_id has, and anyway, lots of other functions, and a programmer has to check the string\-layout to find out if it is an archetype\_id or an archetype\_node\_id\. It also blocks the possibility to store the "at"\-code for the root, and check the ontology for its contents\. Bert

---

## Post #59 by @Peter_Gummer1

I think it has to be the dummy id\.
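The invariant quoted in post #56 and the dummy-id workaround from post #57 can be sketched as follows. This is an illustration only: the function names are hypothetical, and the exact placeholder string (`"unknown"`) is an assumption taken from the example in post #57, not a value fixed by the specification.

```python
from typing import Optional

# Placeholder mentioned in post #57; the exact string is an assumption.
DUMMY_NODE_ID = "unknown"


def archetype_node_id_valid(archetype_node_id: Optional[str]) -> bool:
    """Archetype_node_id_valid: the value is neither Void (None) nor empty."""
    return archetype_node_id is not None and len(archetype_node_id) > 0


def effective_node_id(archetype_node_id: Optional[str]) -> Optional[str]:
    """Return the stored id, or None when the field only holds the dummy placeholder."""
    if not archetype_node_id_valid(archetype_node_id):
        raise ValueError("archetype_node_id violates Archetype_node_id_valid")
    if archetype_node_id == DUMMY_NODE_ID:
        return None
    return archetype_node_id
```

Under these assumptions an empty string fails the invariant just as a null value does, while the dummy id passes it and is stripped out afterwards by the consuming software.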
According to the invariant, it can't be empty\. Peter

---

## Post #60 by @thomas.beale

> On 20\-9\-2013 17:01, Thomas Beale wrote:
>> it's simpler than you think \- we made that property mandatory so that programmers would never get a null exception\.
>
> That must have been a long time ago; nowadays, programmers have no problem handling a null property\.

actually, that's not quite true\. It's probably the primary reason for exceptions in object\-oriented software \- method call on a void object\. But I get what you are saying, and for this String field, being null would not pose a great problem\. So we could change the spec to do that\.

> I wonder what the idea behind stuffing the archetype\_id in the archetype\_node\_id property is?
> Here you make it harder for programmers because the archetype\_id has another syntax in archetype\-paths than the archetype\_node\_id has, and anyway, lots of other functions, and a programmer has to check the string\-layout to find out if it is an archetype\_id or an archetype\_node\_id\. It also blocks the possibility to store the "at"\-code for the root, and check the ontology for its contents\.

the idea is that there is only one field to look at to find archetype identifying information in data\. It is either an archetype\_id \(string form\) or an at\-code, or \(for systems that support it\) it's empty / 'unknown' \(which could be replaced by null/void\)\. With the archetype id, you can always look up the archetype and find out the root code \(at0000, or a matching pattern like at0000\.1 or at0000\.1\.1\)\. But if you can't look up the archetype, you are lost, and that's what the archetype\_id is for\. \- thomas

---

## Post #61 by @system

Yes, it is very easy to catch a null-exception and then do something with that information. Anyway, IMHO, specs should not solve technical problems, and they mostly don't do that. I believe this is also defined in UML. Technical problems are for implementers to solve.
That is why this is a strange decision. The point is that the archetype_id is stored in the property archetype_node_id; Pablo implemented it like that in XML, and he found in the specs that it should be that way. I think this is an unneeded complication of the specs. It would have been better to assign a separate property for the archetype_id, besides the archetype_node_id. He found this in common.pdf, section 3.1.2, which states: "The archetype_node_id is the standardised semantic code for a node and comes ..." This makes it difficult to implement, because an implementer has to test whether the archetype_node_id contains an at-code or an archetype_id. This can lead to ambiguities, for example if the XML contains the archetype slots and the connected instances are embedded, which is legal and can really speed up XPath queries. These ambiguities are possible especially because what an at-code looks like is not strictly defined. Bert

---

## Post #62 by @system

Yes, it is very easy to catch a null-exception and then do something with that information. Anyway, IMHO, specs should not solve technical problems, and they mostly don't do that. I believe this is also defined in UML. Technical problems are for implementers to solve. That is why this is a strange decision. The point is that the archetype_id is stored in the property archetype_node_id; Pablo implemented it like that in XML, and he found in the specs that it should be that way. I think this is an unneeded complication of the specs. It would have been better to assign a separate property for the archetype_id, besides the archetype_node_id. He found this in common.pdf, section 3.1.2, which states: "The archetype_node_id is the standardised semantic code for a node and comes ..." This makes it difficult to implement, because an implementer has to test whether the archetype_node_id contains an at-code or an archetype_id.
This can lead to ambiguities, for example if the XML contains the archetype slots and the connected instances are embedded, which is legal and can really speed up XPath queries. These ambiguities can occur because what an at-code looks like is not strictly defined. Bert

---

## Post #63 by @yampeku

How does this 'unknown' value relate to the discussions we already had regarding the need to have all atxxxx codes present in the ontology?

---

## Post #64 by @thomas.beale

Hi Bert, I don't happen to believe in that philosophy. Here's why: if you leave too much open, for implementers to constantly decide, then the 1,000 people (let's say) who download your specification will solve those problems individually. Some may talk on lists, but essentially (knowing developers as I do) they will mostly solve it on their own. Let's say each of those people takes on average 2 hours to decide and test a solution for a given problem. That's 2,000 hours gone. Many of these solutions will be different, and many will have bugs or even be wrong. Let's say 30% are buggy / wrong. Let's say there is 10 hours average remedial time to fix each of these problems. That's 300 x 10 = another 3,000 hours gone. That's 5,000 hours, or over 2 person years. It clearly makes sense to spend 10, 20 or even 50 hours centrally to find a definitive answer once and publish that, rather than waste 2.5 person years at the periphery, creating low-grade chaos!

In addition, let's say some 1% of the original - that's 10 implementations - have not only bugs, but bugs that cause patient harm or economic damage (e.g. wrong query results, downtime etc). Who knows what the cost of that will be. Worse than all of this is the fact that many of the 1,000 solutions to the problem will be different, perhaps 100 flavours. That means we have 100 flavours of solution to just that one tiny issue in the original specifications.
On its own, that's a virtual guarantee that those solutions will not work interoperably without some small adjustment or remediation. The correction is probably small. However, if there are 100 similar decisions / issues in the specifications we are talking about a combinatorial explosion of millions of variants of what should be the same software component (or at least the same one within each programming language / technology), and that is a huge interoperability problem. My belief is that ambiguity is the enemy of good software and interoperability, and of efficiency in development. For that reason I believe specifications should very carefully specify things. I'll give a very simple example. The openEHR specifications routinely specify which properties of a class are mandatory, optional, and which String fields have to be non-empty. Even those simple things help save time. Now, the actual openEHR specs of course have some errors, and wrong decisions. The original specs that most people use today (but are about to be revised) probably have some wrong decisions made by me, as a best guess at the time of the best way to limit ambiguity. So what is really needed is for the communities around each development technology to build up common reference software components that become the one true way (for today) of doing X in Java, or Y in Python. If developers start saying 'X is a strange decision', and upon analysis, there is a better way to do X with no impact on data, quality, performance etc, we should do it. That's how we should progress. But I don't believe in 'leave it to the programmers' because I don't believe in 'programming', I only believe in 'design', carried out at different levels of granularity. Well we thought about that a long time ago, and the view was that then you will have two fields in every LOCATABLE, one of which (hopefully) is null/void in each actual instance. This could easily lead to errors, and wastes a data property. 
We certainly need to make sure that the pathing in the XML expression of the specifications works as it should. I'm not sure if I understand your last statement though. - thomas --- ## Post #65 by @system Sorry for skipping this, but I don't think this is relevant in the discussion. There is really something good in the UML-philosophy which says not to interfere in implementing, but keep specifications clean of implementation-issues. In this discussion are two things which illustrate that very well. Specifications thought of some time ago have tried to solve possible-implementation-errors by interfering in software-development. You are that this UML-philosophy is good. I will explain this. First, we can say about the specs that they are most of the text designed with this UML-philosophy in mind: no database-platform defined, no database structure defined, no programming language, no platform defined, everything in the OpenEHR-specs is open in a way that honors this UML-philosophy, which I think is good. But there are a few exceptions in the OpenEHR-design: One is having a small issue in the design explained in this argument: "we made that property mandatory so that programmers would never get a null exception. ". The other is having one single property in the design for different things, to avoid errors, as you explain below (I disagree). And then you bring in some calculations as argument, I already skipped them. What time do you save? Allowing developers to write sloppy code because they don't need to check for a null-value? Do you think that professional programmers are not able to apply basic programming rules, to check for a null value when retrieving data from a database or external source? I don't know which quality of software-development you expected in the OpenEHR community when writing this spec, but it does not seem that you had much confidence in developers, at that time. 
It is inefficient to have an empty string instead of a null value, it is a waste of processor-time. Now, programmers must check for the contents of a string, if it is empty then it must be considered null. Checking for a null-string (which does not exist in memory) is much more efficient. No String calculations needed, no object creation, etc. It is basic code-optimization, never instantiate a variable if you want it to be null. Your specs force software to be unnecessary inefficient. You are taking responsibility for errors bad or unexperienced programmers could eventually make. It shows disdain for most developers. Ivory tower we call that in the Netherlands. I don't see any errors for having different properties for different things. I see errors in having different things in the same property. A waste of a data-property? I do not understand what you are trying to say. Do you mean that there are occasions in which a specific property is useless? Because it is not used? Then I must say that OpenEHR has a lot of waste, because there are many properties which are not used all the time. :) Why is that a waste? Because of database-space? Maybe it is this: It must be because you don't want null-values and want to put empty strings in the place. That is indeed a waste, I explained above, it is a waste of memory, processor-time, database-usage. There, in that design-part, you justify a waste. Maybe it is time to give some responsibility of software-development to software-developers and stop thinking about decisions as - using one property for two different things - using empty-strings to indicate a null value This is the big-data-society in which programmers are educated in their profession. You should trust them more then you do now. As you say, you thought about this a long time ago. That was also my thought about this, and it would be good to change this. Imagine an archetype-slot, for example, for having contacts in a PERSON. 
There are two ways of implementing it in object-instances or XML-instances. One way is: Having different instances, connected via a not in the specs defined connection indicating that one instance should be placed inside the property of another instance. Talking about errors, here is a situation in which the specs fail to indicate how the connection must be made, and it is left to implementors. Seeing that the spec fail to specify this (and the specs want to protect us against simple programming-errors), we must conclude that the specs want us to really implement archetype-slotted instances to be a materialized part of the containing instance. I think this is a wise thing to do. Because, what do you want to do with data? You want to query them, and do this as efficient as possible. You want database-indexes to be used to find values for ADL-paths (which are easily translated to object-instance-paths or XPaths) The whole OpenEHR ecosystem is build around ADL-paths: AQL, templates, etc. Imagine you write a query which retrieves for you a PERSON (as an object-instance or an XML-instance, or another instanced way), and in that person are paths, ADL paths. Two difficulties arise: One: Now you write software to analyze that PERSON, and you see the "contact"-property, and you don't know at that moment if that contact is included via slots, or is included via a large PERSON-archetype. So in that case, you need to analyze the contents of archetype-node-id of the contacts to detect if it is an archetype_id in it or an at-code. This is very hard, and maybe impossible to do this trustworthy. So the programmer has to check the archetypes to check this. This is a big waste, unnecessary. A waste of a lot of processor-time, thousands lines of code are involved to read the archetype and check if a string in the "contact"-property is an at-code or an archetype_id. Two: Imagine writing a AQL-engine on a database. 
As we know, the syntax for an archetype_id is completely different from the syntax for an archetype_node_id. But the writer of the engine needs to find these completely different things in one property, with no indication which is what, especially in slotted-instance-sets. I think that you can see how difficult that is, he needs, as in the previous problem, to check archetypes to know if the contents of that property is an archetype_id, and interpret/create the ADL-path accordingly. This is not a wild example, we all need to create AQL-engines, to use the OpenEHR ecosystem as meant in the specs. It is not very hard to do, because, ADL is very similar to XPath, and I think that object-database, also have object-path-queries. So it is easy to translate, but we still need to do that, and create/interpret ADL-paths. The situation you have created, as you state, to avoid errors is causing errors or unnecessary difficulties and causing thousands of lines of code to be used (wasted processortime). I hope you agree that this is an error and I hope that you will take care that these two things (the other also in this email) will be changed in the specs. Thanks for your attention Bert --- ## Post #66 by @thomas.beale it's not developers like you or many of the other careful, thoughtful and professional people on these lists. But there are huge numbers of developers out there whose main job is implementing something else, but who have to quickly 'put something together' for this or that project, typically in a department of health, hospital or other provider site. These people have to write code in a rushed way, and will inevitably solve things as fast as possible without deep contemplation. And yet - those pieces of software routinely end up in real health data processing environments. So the aim of the specs is to reduce errors by this kind of development. 
Like I said, particular choices in the specs to achieve that might be wrong, and the community here needs to help improve that. I agree that's usually true. However sometimes there are reasons to never want a null field, which guarantees that software will always deal with the value safely rather than crashing unexpectedly. Occasionally it makes sense to do the same with Lists - ensure there is one, even if occasionally empty. like I said, in this case, it might make sense to change the spec. you have to realise that specification authors (should) try to minimise ambiguity and therefore possible errors for all users of a standard. The unfortunate reality is that everyone programs these days, and many people (who might be surgeons or senior administrators!) do part-time programming, but probably not very well. That's the world today... sure - if you have a separate property to store the archetype id, it is empty in 95% of all object instances, and also you need a class invariant to prevent it being filled at the same time as the archetype_node_id (at-code) property. If it's a single property, it always contains the archetype 'node id', which is either an internal node if (at-code) or an archetype root node id (the archetype id itself). I think that's pretty clear. not generally. You'll see that most string fields in openEHR that are optional can be null; most that are mandatory can't be empty, just as you said above. professional developers (over a certain age;-) may be. Numerous others who nevertheless build software are not. We need experienced professionals to help improve the specs. by XML-instances, do you mean 'by reference'? If you are referring to what the data instance structure looks like, yes if the reference model says it is inline (i.e. included by value) then that's what it is. The corresponding archetype structure technically could be made of multiple archetypes, connected by slots, or by one large archetype acting as a template. 
well, to check in the data whether you have an archetype id or an at-code, it's just going to be something like: `if (archetype_details != null) { /* archetype_node_id contains an archetype id */ } else { /* archetype_node_id contains an at-code */ }`

the specification says this - see p 22 - invariants: `Archetyped_valid: is_archetype_root xor archetype_details = Void`

I think it's only one line of code, as above. If you want to check whether an archetype has a slot at the 'contact' path then that's easy as well, with something like: `if (my_archetype.definition.c_object_at_path(path_to_contact_property) instanceof ArchetypeSlot) { /* it's a slot */ } else { /* something else */ }`

I'm not sure where the difficulty lies. I don't believe any of the implementations of AQL have had any great difficulties in this area. Whatever path is provided in a query, the AQL engine just looks for it. It can easily do this in quite a dumb way.

I can imagine that one day in the future we use Snomed-like codes for both at-code and archetype id, which would mean it's always the same kind of code in the property archetype_node_id in a Locatable, but that wouldn't make a lot of difference.

I'm still not that clear on what the problem really is here. Yes, the archetype_node_id field can contain two different types of value, but they're easy to tell apart. I may just be missing the point here, so feel free to elaborate. Also, if other developers have had problems with this, post your experiences.

- thomas

---

## Post #67 by @system

So we can stop this discussion right here. I respect that you wanted to express your opinion on my message, but there is no need for me to comment on this.

We agree that shit happens all the time, and apart from that, that you will support a change of spec regarding the issue of an empty string representing a null value.

But we have the other issue of having one property to store two different things without any indication of what is stored. You call having a property which is not often used a waste.
I must disagree, it is very common in archetypes, I think it is in 90% of the archetypes that the root of a definition also has a node_id. So in that case both can occur simultaneously. But in the path only the archetype_id will occur, and it is easier for a programmer to find which one is the archetype_id if it is in a separate property. And anyway, I don't think a seldom used property is a waste. It is only bits and bytes, and there is hardly any code involved having this property. But as I showed in example, not having this property can make many thousands of lines code-execution necessary. That is a waste. We, system-builders, and special system-designers like you, do not decide which archetypes are going to be used. There are archetypes of megabytes, they exist. I don't think it is wise to have them, but it is that modeling is not always focused on performance, but more on academical medical ideas. We, builders of two level modeling systems, we must be able to live with this kind of academic exercises. But those archetypes cost one second ore more, just parsing on a medium speed computer. You don't want to do this unnecessary, you don't want to parse that kind of archetypes at every data-entry. It breaks your system. Because there is no sure way of analyzing a string and find out if it is an archetype_node_id or an archetype_id in slotted situations besides parsing and analyzing the archetype, this will make the situation of having one property for two different values inefficient, and in some situations dramatic inefficient. The idea of what I was saying, I think I can express it more clear now, is that there are two ways of embedding a slotted dataset (based on an archetype which fits in the slot) in the containing dataset (based on the archetype which has the slot, so to say, the containing archetype) One way is to add a reference to the container-dataset, which points to the slotted dataset. 
The other way is to add the slotted dataset materialized in the container-dataset. (The expression "materialized" is from Oracle.) The first one is not described in the specs; that is, there is no spec which indicates how to reference the datasets. In theory the specs expect the second situation. The paths in AQL or templates are defined as if the slotted datasets are materialized inside the containing dataset. This is also the simplest way to do this.

This causes, however, a problem. Imagine you have a dataset and you want to express a path to a leaf-value. In that case you must know whether there are slotted datasets in it, because the path will follow other syntax rules in the case of slots. So in a PERSON without slots a contact would look like this:

`[person-archetype]/contacts[at0003]/items[at0004]`.............

In a PERSON with slots it would look like this:

`[person-archetype]/contacts[at0003]/[contact_archetypeId]/items[at0004]`.............

So if you have a large dataset and you want to express ADL-paths to leaf-nodes, you need to know whether there are slots. There is one way to find out: parse the corresponding archetype and see whether there are slots. You need to do that because you cannot trust string analysis of the archetype_node_id. So you have to execute thousands of lines of code to find out whether an archetype contains slots. If there was a separate property for archetype_id, then it would only be a matter of looking at that property to see if it has a null value (or an empty string :)

This is indeed a way to handle this, but two things bother me in this case.

- You cannot have an XPath engine doing this complex querying; it makes path-based queries very complex, and maybe even impossible.
- Maybe technically not so important, but the property name does not indicate what it contains, and it is bad programming practice to have misleading names.

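
For readers following the path question, here is a minimal sketch of how paths to leaf nodes could be derived from the data tree alone, assuming (as described elsewhere in this thread) that slot-filling root points carry an archetype id in `archetype_node_id` and a non-null `archetype_details`. The `Node` class, the helper names, the archetype ids and the simplified path syntax are all hypothetical; this is not the openEHR reference model and not any particular AQL engine.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical, simplified stand-in for a LOCATABLE data node."""
    archetype_node_id: str                    # at-code, or archetype id at a root point
    archetype_details: Optional[str] = None   # archetype id at root points, else None
    children: Dict[str, List["Node"]] = field(default_factory=dict)


def leaf_paths(node: Node, prefix: str = "") -> Iterator[str]:
    """Yield a simplified path for every leaf, using only what is in the data."""
    here = f"{prefix}[{node.archetype_node_id}]"
    if not node.children:
        yield here
        return
    for attr, kids in node.children.items():
        for kid in kids:
            yield from leaf_paths(kid, f"{here}/{attr}")


def slot_fillers(root: Node) -> Iterator[Node]:
    """Yield descendants that are archetype root points (i.e. slot fillers)."""
    for kids in root.children.values():
        for kid in kids:
            if kid.archetype_details is not None:
                yield kid
            yield from slot_fillers(kid)


# Made-up example ids, for illustration only.
PERSON_ID = "openEHR-DEMOGRAPHIC-PERSON.example.v1"
CONTACT_ID = "openEHR-DEMOGRAPHIC-CONTACT.example.v1"

person = Node(
    PERSON_ID,
    archetype_details=PERSON_ID,
    children={
        "contacts": [
            Node(
                CONTACT_ID,
                archetype_details=CONTACT_ID,
                children={"items": [Node("at0004")]},
            )
        ]
    },
)

for path in leaf_paths(person):
    print(path)
```

In this sketch the presence of a slot filler is read off the data (`archetype_details` is set), so no archetype is parsed. Whether that is sufficient for a given architecture is exactly the point under discussion here.
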
I understand that having an archetype_id property creates redundant information, because the information is already in the archetype_details property, but the same also goes for storing the archetype_id in the archetype_node_id. I think this redundancy is ugly and should not occur. I think redundancy is a design error. The reason is that the archetype_details contain other information besides the archetype_id. The best way would be a separate archetype_id property, and eventually archetype_details without archetype_id, or another way of handling the details; these details are also in the archetype itself.

I am not sure what it means if there are two different paths possible to one data-leaf: one path with the slot defined, and one path as if there was no slot. A few weeks ago we both argued to William Goossens that the path is the identifier for a datapoint, not the archetype_node_id. But now you seem to imply that more than one path definition is possible.

By the way, it is getting late (again).

Let's hope this will never happen. How difficult can we get :)

Bert

---

## Post #68 by @thomas.beale

It's 100% ;-) But what I meant was that in any instance structure, say a Composition, most of the nodes in the data tree will have an at-code in archetype_node_id; only a few - the archetype root points - will have archetype ids. The at-node corresponding to the root point is just the at0000 code (or a specialised version of that). Putting that in the data is not much use.

well, there is already a property - archetype_details - for that purpose. Originally we did think of putting the at0000 codes in archetype_node_id, and putting the archetype id in the archetype_details property, which is a separate object. (Some) developers found it was too annoying to use like that, so we changed to the current way of using the properties.

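
As a rough illustration of the two properties being discussed, the sketch below models a node with `archetype_node_id` and an optional `archetype_details`, and tells the two kinds of node id apart by checking whether `archetype_details` is present. The class and function names are hypothetical simplifications rather than the reference model classes, and the consistency check assumes, as this thread describes, that a root point carries the same archetype id in both places.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Archetyped:
    """Hypothetical stand-in for the details object held at archetype root points."""
    archetype_id: str


@dataclass
class Locatable:
    """Hypothetical, simplified data node."""
    archetype_node_id: str
    archetype_details: Optional[Archetyped] = None


def node_id_kind(node: Locatable) -> str:
    """Classify the content of archetype_node_id without inspecting the string."""
    return "archetype_id" if node.archetype_details is not None else "at_code"


def is_consistent(node: Locatable) -> bool:
    """Assumed rule: at a root point both properties hold the same archetype id."""
    if node.archetype_details is None:
        return True
    return node.archetype_node_id == node.archetype_details.archetype_id


# Made-up example id, for illustration only.
root = Locatable(
    "openEHR-DEMOGRAPHIC-PERSON.example.v1",
    Archetyped("openEHR-DEMOGRAPHIC-PERSON.example.v1"),
)
leaf = Locatable("at0004")

print(node_id_kind(root), node_id_kind(leaf))
```

The alternative design argued for in this thread would instead keep the at-code in `archetype_node_id` everywhere and put the archetype id in a separate property; the check above would then reduce to testing that property for null.
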
well, you wouldn't be parsing archetypes at data entry - they have to be pre-parsed, validated, and used to generate Operational Templates (OPTs), which are the final XML structures stored in the server.

as I said in the previous post, you can just check if `archetype_details != null`

correct

well, the more obvious way to find out is to parse the OPT and just get its path set - that is what you can query with.

but the XPath engine doesn't need to do this. It just processes the query paths it finds in the queries. It doesn't need to know what archetypes were used to structure it.

we'll certainly review this, and take the above into account.

we are talking about templates, which are (generally) made up of a composition of archetypes. So it's true that in a system containing PERSON objects, some archetyped by a composition of archetypes, and others archetyped by template-mimicking 'big' archetypes, you can get more than one path to the same object instance node (say PERSON.contacts). But that happens anyway, all the time, simply due to the use of diverse archetypes. There might be 20 templates whose paths are all different, that point to PERSON.contact objects in the data. That's the whole point - those semantically different paths - that's what querying runs on.

My suspicion from what you are saying is that you are not doing a pre-load of operational templates into your back-end system. If you had that, the query service could work very optimally.

- thomas

---

## Post #69 by @thomas.beale

it's a fake node id value I use in the ADL workbench compiler, just to guarantee the non\-void non\-empty requirement\. Like I said earlier, I would rather make the field optional, but mandatory if there are siblings\.

\- thomas

---

## Post #70 by @system

People could use it for some ontology-message. I don't understand why it is there when it is not allowed to be used, or when it is not possible to retrieve the connected information.
Again this is some small thing in the specs which has no purpose while people could expect it. This is caused in chain reaction by the other, I think, ambiguous spec. Because the other spec makes it impossible to query the atcode from the definition of an archetype. I must say that this is not very nice defined. -------------------------- skip skip ------------------------------------------------------ Not quite so, XPath can have properties as path-arguments, and it must possible to query for certain objects with a specific archetype_id and another specific node_id. Since the root of the definition is allowed to have an node_id, one can expect people use it, so there can be a need to query them. As I said before, as a builder/designer of a two level modeling system, you cannot predict which archetypes people use. And you must be sure that the archetypes they use, can be used safely, which is not always the case in the current specs, because there may be possible information which cannot be reached at query-time. It depends if it is wished to have archetypes pre-loaded. If you run a kernel as a public service, and there are hundreds of archetypes operational, then it will cost a lot of memory to pre-load them all. The parsing an archetype should not be done more then once in its lifetime, it is expensive and unnecessary computing, especially when the archetypes are large. Saving them preloaded is a real memory-eater. In my system it is not useful to preload archetypes, because, archetypes are only parsed once in my system. That is when they are saved in the system. They are parsed in order to create a RNG/Schematron definition. That is used to validate the data, and if new data are entered, then they will be checked against that RNG/Schematron definition, not against the parsed archetype. The schema is loaded in microseconds and the validation takes one second. After the data are validated, they are stored in an XML-database, and they will never be validated again. 
They are ready for XPath-queries and XQueries, and all kinds of complicated handling without even looking at an archetype.

So the refusal to specify an "archetype_id" in the specs is, in my architecture, bad for performance, because it forces extra archetype-parsing. So I have that property without the consensus of the specs, and I do not see it as a waste. I make sure that when I have to export data to an OpenEHR system, I will put the archetype_id in the archetype_node_id property.

Thanks for the discussion, sorry that we could not find an agreement.

regards

Bert

---

## Post #71 by @thomas.beale

exactly - that's why you should pre-load them - I don't mean cache them in memory - I mean compile them once, compile the templates, generate the operational templates, and store all of these in a local database or location in a post-parsed form, typically an XML or other object (e.g. JSON) serialisation. If your system encounters a new archetype, that needs to be compiled and saved in the same way as well.

ok, so the downstream form of an archetype you are using is a Schematron schema - so that's the thing that needs to be stored.

right - that sounds like all other archetype-based systems I know of.

but the specs already specify archetype_details, which contains the archetype id. And you can detect that easily in a Schematron schema, I guess. So you can easily figure out that you are on one of those nodes.

Is the real problem simply that the syntax of what is in archetype_node_id on one of those nodes - an archetype_id rather than an at-code - causes some problem in your processing? I am not clear on what though... are you trying to use the at-code texts at runtime? Are they also in the Schematron schema?

well, I'm just trying to understand the problem. There will certainly be a more scientific discussion in the near future with the specifications editorial committee. But it needs clear evidence of problems if it is going to change any specifications.

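
The "compile once, store the post-parsed form" idea from this post could be sketched roughly as follows. This is a minimal illustration under stated assumptions: `CompiledStore` and its methods are hypothetical names, the compile step is passed in as a plain function because no real ADL parser API is assumed, and JSON files in a directory stand in for whatever database or serialisation a real system would use.

```python
import json
from pathlib import Path
from typing import Callable, Dict


class CompiledStore:
    """Hypothetical on-disk store of compiled (post-parsed) archetype forms."""

    def __init__(self, directory: Path, compile_fn: Callable[[str], Dict]):
        self.directory = directory
        self.compile_fn = compile_fn   # stands in for the expensive parse/compile step
        directory.mkdir(parents=True, exist_ok=True)

    def _path(self, archetype_id: str) -> Path:
        return self.directory / f"{archetype_id}.json"

    def get(self, archetype_id: str, source: str) -> Dict:
        """Return the stored compiled form, compiling only on first encounter."""
        path = self._path(archetype_id)
        if path.exists():
            # Already compiled earlier: load the serialised form, no parsing.
            return json.loads(path.read_text(encoding="utf-8"))
        compiled = self.compile_fn(source)
        path.write_text(json.dumps(compiled), encoding="utf-8")
        return compiled
```

With something like this, a newly encountered archetype is compiled and saved on first use, and later requests read the stored form instead of re-parsing the source. The same shape would apply if the stored artefact were an operational template or a Schematron schema rather than JSON.
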
- thomas --- ## Post #72 by @system >> >> In my system it is not useful to preload archetypes, because, archetypes are only parsed once in my system\. >> That is when they are saved in the system\. They are parsed in order to create a RNG/Schematron definition\. > > ok, so the downstream form of an archetype you are using is a Schematron schema \- so that's the thing that needs to be stored\. OK, I misunderstood that part of the discussion, having a form of XML\-schema is a representation of an archetype, which can be for specific purposes like validation more efficient then the archetype\-object, depending on the technical architecture of the kernel\. It seems that we agree on that\. >> That is used to validate the data, and if new data are entered, then they will be checked against that RNG/Schematron definition, not against the parsed archetype\. >> The schema is loaded in microseconds and the validation takes one second\. >> >> After the data are validated, they are stored in an XML\-database, and they will never be validated again\. They are ready for XPath\-queries and XQueries, and all kind of complicated handling without even looking at an archetype\. > > right \- that sounds like all other archetype\-based systems I know of\. > >> So the refusal to specify a "archetype\_id" in the specs is, in my architecture, bad for performance, because it forces extra archetype\-parsing, so I have that property without the consensus with the specs, and I do not see it as a waste\. I make sure that when I have to export data to an OpenEHR system, I will put the archetype\_id in the archetype\_node\_id property\. > > but the specs already specify archetype\_details, which contains the archetype id\. And you can detect that easily in a schematron schema I guess\. So you can easily figure out that you are on one of those nodes\. 
Is the real problem simply that the syntax of what is in archetype\_node\_id on one of those nodes \- an archetype\_id rather than an at\-code \- causes some problem in your processing? I am not clear on what though\.\.\. are you trying to use the at\-code texts at runtime? Are they also in the Schematron schema? We are not talking about the OpenEHR reference model, but about archetyped data\-handling\. I have two arguments, the first one is most simple to explain, so I start with that\. --- ## Post #73 by @thomas.beale Bert, can you raise an issue on SPECPR <http://www.openehr.org/issues/browse/SPECPR> \- that's the issue tracker that we use to feed specification work\. If you just paste most of this post in as the description that will be enough to get back to this when more people can get involved \(which will be fairly soon\)\. thanks \- thomas --- ## Post #74 by @system OK Bert --- **Canonical:** https://discourse.openehr.org/t/polishing-node-identifier-at-codes-use-cases/15068 **Original content:** https://discourse.openehr.org/t/polishing-node-identifier-at-codes-use-cases/15068