# Regex in Archetypes must include TYPE

**Category:** [Technical (archive)](https://discourse.openehr.org/c/technical-archive/156)
**Created:** 2008-07-16 11:21 UTC
**Views:** 1
**Replies:** 17
**URL:** https://discourse.openehr.org/t/regex-in-archetypes-must-include-type/14796

---

## Post #1 by @Adam_Flinton

Dear All,

The regex locator used to get an archetype from another archetype is too loose\.

Given the Archetype ID is something like:

openEHR\-EHR\-CLUSTER\.address\.v1

instead of:

archetype\_id/value matches \{/address\\\.v1/\}

we should have \(at the very least\):

archetype\_id/value matches \{/CLUSTER\\\.address\\\.v1/\}

or \(preferably\):

archetype\_id/value matches \{/openEHR\-EHR\-CLUSTER\\\.address\\\.v1/\}

\(A\) address\\\.v1 matches no particular string exactly as the Archetype ID is openEHR\-EHR\-CLUSTER\.address\.v1

As such you have to match on anything prior to address\. i\.e\. openEHR\-EHR\-CLUSTER\.address\.v1 matches but so would openEHR\-EHR\-CLUSTER\.person\_address\.v1

At the moment I am having to add a dot \(\\\.\) to each regex to bookend the start of the string\. e\.g\.:

\[xslt\] Archetype found from pattern

\[xslt\] p\_pattern = address\\\.v1

\[xslt\] v\_Dotpattern = \\\.address\\\.v1

\[xslt\] Archetype name = openEHR\-EHR\-CLUSTER\.address\.v1

This is both dangerous and pointless as you either have a valid/useable regex or you don't\.

B\) As a result I am now having to split bar'ed regex'es such as:

checklist\_item\-general\-cvs1\.v1|checklist\_item\-general\-cvs2\.v1|checklist\_item\-general\-cvs3\.v2|checklist\_item\-general\-cvs4\.v2draft|checklist\_item\-general\.v1|checklist\_item\-general\.v2|checklist\_item\-general\.v3

i\.e\. I am having to pre\-process the regex instead of being able to apply it directly as a regex\.

C\) It is long term dangerous anyway as the only thing guaranteeing uniqueness of ID/Name is the filename in a given folder\.
This is fine until you consider folders such as "structure" where you could have ITEM\_TREE\.adam\.v1 & SINGLE, LIST, or TABLE\.adam\.v1 \.\.\.

So if in the Structure folder then /adam\\\.v1/ could possibly be ambiguous, i\.e\. if I am adding a "structure" archetype & I am shown a choice of adam\.v1, adam\.v1, adam\.v1 or adam\.v1, which one do I want/should I add?

Seeing ITEM\_TREE\.adam\.v1, SINGLE\.adam\.v1 etc would fix that potential problem\.

D\) Given the Archetype ID/Filename is openEHR\-EHR\-CLUSTER\.address\.v1 then what is saved by not showing at the very least CLUSTER\.address\.v1 in the drop down list & putting in a regex of "CLUSTER\\\.address\\\.v1"?

E\) Going for a more exact regex in the first place then allows one to be more exact even in the looser cases e\.g\. CLUSTER\\\.address\\\.v or where you want to say "any CLUSTER exam"\.

F\) If other sorts of regex expression are required for use then I am not prepared to keep adding to the "regex pre\-processor" to in effect create another regex processor\. Either the regex is correct or it's not\. If it's not then don't use a regex\.

Summary / Conclusion

1\) I would like to see the Archetype designer creating regexes which include at the very least the RM\_TYPE part of the name e\.g\. CLUSTER or ITEM\_TREE\. Preferably it should be the entire Archetype ID as that is the actual string being matched\.

This should be a design time thing and not a publishing problem\. This will allow the correct use of regex for model location\.

Either that or change all the Archetype ID/Filenames from openEHR\-EHR\-CLUSTER\.address\.v1 to address\.v1

Adam

---

## Post #2 by @system

Hi Adam

I know Tom has talked to you about this. I was involved in the original discussions about this many years ago and many points of view were expressed. The issue is that, as you know, the CLASS is specified in the slot definition and there is a regex for the remainder of the ID.
What we have been doing is setting the regex to:

openEHR-EHR-*CLASS_NAME*\.*REGEX_EXPRESSION*

This seems very safe. There are problems if we have the regex including the class when it is already specified as they will have to be the same. I would be quite happy to just use a regex for the whole id constraint, or leave it as it is. Just including the class does not help much as far as I can see, and having both and a potential bug (different class specified) is also a problem.

Can you pick up the class name from the slot and append it to 'openEHR-EHR-' and the regex to get the full statement? This can be explicit in the spec. Or let's have a suggestion to change to a full regex so we get it completed that way.

Cheers, Sam

---

## Post #3 by @Adam_Flinton

Sam Heard wrote:

> Hi Adam
>
> I know Tom has talked to you about this\. I was involved in the
> original discussions about this many years ago and many points of view
> were expressed\. The issue is that, as you know, the CLASS is specified
> in the slot definition and there is a regex for the remainder of the
> ID\. What we have been doing is setting the regex to:
>
> openEHR\-EHR\-/CLASS\_NAME/\\\./REGEX\_EXPRESSION/
>
> This seems very safe\. There are problems if we have the regex
> including the class when it is already specified as they will have to
> be the same\. I would be quite happy to just use a regex for the whole
> id constraint, or leave it as it is\. Just including the class does not
> help much as far as I can see, and having both and a potential bug
> \(different class specified\) is also a problem\.
>
> Can you pick up the class name from the slot and append it to
> 'openEHR\-EHR\-' and the regex to get the full statement? This can be
> explicit in the spec\. Or let's have a suggestion to change to a full
> regex so we get it completed that way\.
> > Cheers, Sam I would definitely say that it should be a full regex as: \(A\) Otherwise it's not a regex, it's a partial regex & then that causes problems further down e\.g\. checklist\_item\-general\-cvs1\.v1|checklist\_item\-general\-cvs2\.v1|checklist\_item\-general\-cvs3\.v2|checklist\_item\-general\-cvs4\.v2draft|checklist\_item\-general\.v1|checklist\_item\-general\.v2|checklist\_item\-general\.v3 Is a pseudo regex & thus it needs to be split by a "regex pre\-processor" & then each sub statement needs to have the "openEHR\-EHR\-CLASS\_NAME" appended to it & then put through the regex engine\. i\.e\. either have a regex or don't\. A pseudo regex means creating an entire "pseudo\-regex" processor which is crazy & for what? Already your own HTML XSLT fails for precisely this reason as you get: Include entries openEHR\-EHR\-CLUSTER\.checklist\_item\-general\-cvs1\.v1|checklist\_item\-general\-cvs2\.v1|checklist\_item\-general\-cvs3\.v2|checklist\_item\-general\-cvs4\.v2draft|checklist\_item\-general\.v1|checklist\_item\-general\.v2|checklist\_item\-general\.v3 B\) You are then asking for repetitive code in every implementation thus introducing the the possibilities of bugs again for no good reason\. I repeat\.\.\.: If you want to use a regex then use a regex which is useable as a regex\. At present it is not & for no good reason\. i\.e\. saying "take the pseudo\-regex & append xyz to it to create the real regex" is both error prone & means that you can't actually use the regex as a regex\. Adam --- ## Post #4 by @Tim_Cook2 \+1 \-\-Tim --- ## Post #5 by @system Hi Adam I take this point and in that case I would suggest that resulting issue to discuss is: Should we drop the class name from the Archetype Slot in ADL and just use the regex? There does not appear to be any reason in the AOM to include the class name. We do need the occurrences for the slot. 
allow_archetype CLUSTER occurrences matches {0..5} matches {
    include
        archetype_id/value matches {/exam\.v1|exam-uterus\.v1|exam-fetus\.v1/}

might become:

allow_archetype occurrences matches {0..5} matches {
    include
        archetype_id/value matches {/openEHR-EHR-CLUSTER\.exam\.v1|openEHR-EHR-CLUSTER\.exam-uterus\.v1|openEHR-EHR-CLUSTER\.exam-fetus\.v1/}

This would have advantages in controlling ordering of included archetypes of mixed classes.

Interested in others views.

Cheers, Sam

---

## Post #6 by @thomas.beale

Sam Heard wrote:

> Hi Adam
>
> I take this point and in that case I would suggest that resulting issue to discuss is:
>
> Should we drop the class name from the Archetype Slot in ADL and just use the regex? There does not appear to be any reason in the AOM to include the class name. We do need the occurrences for the slot.
>
> allow_archetype CLUSTER occurrences matches {0..5} matches {
>     include
>         archetype_id/value matches {/exam\.v1|exam-uterus\.v1|exam-fetus\.v1/}
>
> might become:
>
> allow_archetype occurrences matches {0..5} matches {
>     include
>         archetype_id/value matches {/openEHR-EHR-CLUSTER\.exam\.v1|openEHR-EHR-CLUSTER\.exam-uterus\.v1|openEHR-EHR-CLUSTER\.exam-fetus\.v1/}

no - this is definitely wrong. The class name is always needed in all ADL object blocks. There is no reason to drop it. Why would we do that? That would be rewriting the formalism.

---

## Post #7 by @system

Hi Thomas

I had a look at the AOM and did not see anything, just include and exclude statements - didn't read the ADL spec. The point here is that we could have a slot that allowed different classes which would simplify things for the archetype authors.

Could we have a slot that allows two different classes?
Cheers, Sam

---

## Post #8 by @thomas.beale

Sam Heard wrote:

> Hi Thomas
>
> I had a look at the AOM and did not see anything, just include and
> exclude statements \- didn't read the ADL spec\. The point here is that
> we could have a slot that allowed different classes which would
> simplify things for the archetype authors\.

in the AOM all C\_OBJECTs have the rm\_type\_name attribute\.

> Could we have a slot that allows two different classes?

We already have that \- you can put more than one object constraint in a slot, either as alternatives for a single\-valued attribute or as multiple co\-existing items under a multiply\-valued attribute\.

\- thomas

---

## Post #9 by @system

OK - brilliant. An example of how to add more than one class in a slot....?

Cheers, Sam

---

## Post #10 by @Peter_Gummer1

Sam Heard wrote:

> \.\.\. What we have been doing is setting the regex to:
>
>> openEHR\-EHR\-CLASS\_NAME\\\.REGEX\_EXPRESSION

This is not quite right because REGEX\_EXPRESSION might contain patterns for multiple concepts, as Adam mentioned\. You would have to wrap it up in parentheses:

openEHR\-EHR\-CLASS\_NAME\\\.\(REGEX\_EXPRESSION\)

Then it should work\.

Adam Flinton wrote:

> \.\.\. thus it needs to be split by a "regex pre\-processor"
> & then each sub statement needs to have the
> "openEHR\-EHR\-CLASS\_NAME"
> appended to it & then put through the regex engine\.

No need to do that, Adam, just wrap the regex within parentheses\.
So taking the example you gave, the correctly\-wrapped regex would be: Include entries openEHR\-EHR\-CLUSTER\\\.\(checklist\_item\-general\-cvs1\.v1|checklist\_item\-general\-cvs2\.v1|checklist\_item\-general\-cvs3\.v2|checklist\_item\-general\-cvs4\.v2draft|checklist\_item\-general\.v1|checklist\_item\-general\.v2|checklist\_item\-general\.v3\) The code to generate this is trivial in any programming language \(assuming your programming language has the ability to concatenate strings, which I reckon is a safe bet\)\. Unless you're programming in assembly language, it's probably one simple line of code\. \- Peter --- ## Post #11 by @Adam_Flinton Sam Heard wrote: > Hi Adam > > I take this point and in that case I would suggest that resulting issue to discuss is: > > Should we drop the class name from the Archetype Slot in ADL and just use the regex? There does not appear to be any reason in the AOM to include the class name\. We do need the occurrences for the slot\. > > allow\_archetype CLUSTER occurrences matches \{0\.\.5\} matches \{ >                                         include >                                             archetype\_id/value matches \{/exam\\\.v1|exam\-uterus\\\.v1|exam\-fetus\\\.v1/\} > > might become: > > allow\_archetype occurrences matches \{0\.\.5\} matches \{ >                                        include >                                             archetype\_id/value matches \{/openEHR\-EHR\-CLUSTER\\\.exam\\\.v1|openEHR\-EHR\-CLUSTER\\\.exam\-uterus\\\.v1|openEHR\-EHR\-CLUSTER\\\.exam\-fetus\\\.v1/\} > > This would have advantages in controlling ordering of included archetypes of mixed classes\. > > Interested in others views\. > > Cheers, Sam That would be fine by me\. That would allow me to drop my "regex pre\-processor" which would be nice & would give me some peace of mind wrt people using regex\. 
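
The difference between the short patterns, the prefix-and-wrap workaround, and a full-ID regex can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thread's tooling is XSLT-based, Python's `re` module stands in for whatever regex engine an implementation uses, the names `ARCHETYPE_IDS` and `filter_ids` are hypothetical, and whole-string matching is assumed for the prefixed and full-ID cases because the thread does not say how patterns are anchored.

```python
import re

# Archetype IDs taken from the examples in this thread.
ARCHETYPE_IDS = [
    "openEHR-EHR-CLUSTER.address.v1",
    "openEHR-EHR-CLUSTER.person_address.v1",
    "openEHR-EHR-CLUSTER.checklist_item-general.v1",
    "openEHR-EHR-CLUSTER.checklist_item-general.v2",
]


def filter_ids(pattern, ids, whole=True):
    """Return the ids accepted by pattern (hypothetical helper).

    whole=True requires the pattern to match the entire id (re.fullmatch);
    whole=False accepts a match anywhere in the id (re.search).
    """
    matcher = re.fullmatch if whole else re.search
    return [i for i in ids if matcher(pattern, i)]


# 1. Short pattern searched anywhere in the id: person_address is accepted too.
loose = filter_ids(r"address\.v1", ARCHETYPE_IDS, whole=False)

# 2. Short alternation with a naively prepended prefix: the prefix binds only
#    to the first alternative, so the second alternative never matches a full id.
short_alternation = r"checklist_item-general\.v1|checklist_item-general\.v2"
naive = filter_ids(r"openEHR-EHR-CLUSTER\." + short_alternation, ARCHETYPE_IDS)

# 3. The same alternation wrapped in a group before the prefix is added.
wrapped = filter_ids(
    r"openEHR-EHR-CLUSTER\.(?:" + short_alternation + ")", ARCHETYPE_IDS
)

# 4. A full-id regex stored in the slot: applied directly, no pre-processing.
full = filter_ids(
    r"openEHR-EHR-CLUSTER\.checklist_item-general\.v1"
    r"|openEHR-EHR-CLUSTER\.checklist_item-general\.v2",
    ARCHETYPE_IDS,
)

for name, result in [("loose", loose), ("naive", naive),
                     ("wrapped", wrapped), ("full", full)]:
    print(name, result)
```

Cases 3 and 4 select the same archetypes. The difference is that case 3 depends on every implementation performing the same concatenation step, whereas case 4 uses the stored pattern exactly as written.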
Adam --- ## Post #12 by @Adam_Flinton Thomas Beale wrote: > Sam Heard wrote: >> Hi Adam >> >> I take this point and in that case I would suggest that resulting >> issue to discuss is: >> >> Should we drop the class name from the Archetype Slot in ADL and just >> use the regex? There does not appear to be any reason in the AOM to >> include the class name\. We do need the occurrences for the slot\. >> >> allow\_archetype CLUSTER occurrences matches \{0\.\.5\} matches \{ >>                                         include >>                                             archetype\_id/value >> matches \{/exam\\\.v1|exam\-uterus\\\.v1|exam\-fetus\\\.v1/\} >> >> might become: >> >> allow\_archetype occurrences matches \{0\.\.5\} matches \{ >>                                        include >>                                             archetype\_id/value >> matches >> \{/openEHR\-EHR\-CLUSTER\\\.exam\\\.v1|openEHR\-EHR\-CLUSTER\\\.exam\-uterus\\\.v1|openEHR\-EHR\-CLUSTER\\\.exam\-fetus\\\.v1/\} > > no \- this is definitely wrong\. The class name is always needed in all > ADL object blocks\. There is no reason to drop it\. Why would we do > that? That would be rewriting the formalism\. > Either way is fine by me as the bit I care about is: "archetype\_id/value matches \{/openEHR\-EHR\-CLUSTER\\\.exam\\\.v1|openEHR\-EHR\-CLUSTER\\\.exam\-uterus\\\.v1|openEHR\-EHR\-CLUSTER\\\.exam\-fetus\\\.v1/\}" Adam --- ## Post #13 by @Adam_Flinton Peter Gummer wrote: > Sam Heard wrote: >   >> \.\.\. What we have been doing is setting the regex to: >> >>> openEHR\-EHR\-CLASS\_NAME\\\.REGEX\_EXPRESSION >>>       > This is not quite right because REGEX\_EXPRESSION might contain patterns for > multiple concepts, as Adam mentioned\. You would have to wrap it up in > parentheses: > > openEHR\-EHR\-CLASS\_NAME\\\.\(REGEX\_EXPRESSION\) > > Then it should work\. > > Adam Flinton wrote: >   >> \.\.\. 
thus it needs to be split by a "regex pre\-processor" >> & then each sub statement needs to have the >> "openEHR\-EHR\-CLASS\_NAME" >> appended to it & then put through the regex engine\. >>     > No need to do that, Adam, just wrap the regex within parentheses\. So taking > the example you gave, the correctly\-wrapped regex would be: > > Include entries > openEHR\-EHR\-CLUSTER\\\.\(checklist\_item\-general\-cvs1\.v1|checklist\_item\-general\-cvs2\.v1|checklist\_item\-general\-cvs3\.v2|checklist\_item\-general\-cvs4\.v2draft|checklist\_item\-general\.v1|checklist\_item\-general\.v2|checklist\_item\-general\.v3\) > > The code to generate this is trivial in any programming language \(assuming > your programming language has the ability to concatenate strings, which I > reckon is a safe bet\)\. Unless you're programming in assembly language, it's > probably one simple line of code\. > Hum\. I agree that that would work but it's still wrong in principle IMHO i\.e\. then the regex is still a pseudo/meta regex i\.e\. it requires processing to turn it into a valid regex\. This would have to be duplicated in all the different implementations etc\.etc\. Adam --- ## Post #14 by @thomas.beale Adam Flinton wrote: >> >> No need to do that, Adam, just wrap the regex within parentheses\. So taking >> the example you gave, the correctly\-wrapped regex would be: >> >> Include entries >> openEHR\-EHR\-CLUSTER\\\.\(checklist\_item\-general\-cvs1\.v1|checklist\_item\-general\-cvs2\.v1|checklist\_item\-general\-cvs3\.v2|checklist\_item\-general\-cvs4\.v2draft|checklist\_item\-general\.v1|checklist\_item\-general\.v2|checklist\_item\-general\.v3\) >> >> The code to generate this is trivial in any programming language \(assuming >> your programming language has the ability to concatenate strings, which I >> reckon is a safe bet\)\. Unless you're programming in assembly language, it's >> probably one simple line of code\. >> > > Hum\. 
> > I agree that that would work but it's still wrong in principle IMHO i\.e\. > then the regex is still a pseudo/meta regex i\.e\. it requires processing > to turn it into a valid regex\. > > This would have to be duplicated in all the different implementations > etc\.etc\. > \*All, I also agree with Adam\. A regex should be able to be used over a population of strings \(identifiers in this case\) and have the effect of filtering out what you want\. Having to put the regex together first is inviting problems \- some implementations will forget, others will do it wrongly, the specifications of how to do it will change\.\.\.\. Practically speaking this does not change the specifications, but I suspect we should put some guidance in to the effect that regexes based on full identifiers should be used in archetype slots\. \- thomas --- ## Post #15 by @Arnett_John_NHS_Conn I agree with Adam and Tom, if a REGEX is being used to specify the constraints on the slot then it should be a valid regular expression which allows each permissible archetype to be matched on the basis of its full archetype id as specified by the Archetype ID Syntax\. John --- ## Post #16 by @Peter_Gummer1 Thomas Beale wrote: > I also agree with Adam\. A regex should be able to be used over a > population of strings \(identifiers in this case\) and have the effect of > filtering out what you want\. \.\.\. > > Practically speaking this does not change the specifications, but I > suspect we should put some guidance in to the effect that regexes based > on full identifiers should be used in archetype slots\. >   Surely the specifications should be stronger than just guidance\. Existing tools that are massaging the regex will cease to work if they are given a full regex\. It would be a breaking change, so I think it should be spelled out in the specification\. 
Otherwise, tools are going to have to try to do some clever guesswork to decide whether a given pattern is intended to match the full archetype id or just the domain concept part of it\.

\- Peter

---

## Post #17 by @thomas.beale

Well the problem here is that the specifications don't actually say anything about the regexes, or even that you have to use regexes to identify archetypes in slots \- it is just one way of doing it\. So any tools today that take a particular approach to regexes are already outside the standard\. I think what we should probably do is to state that regexes, if used, must be assumed to be usable as a filter on whole archetype ids without prior modification\. This still does not prevent some tool using the short regexes now in use in the archetype editor, since they clearly can be used at a technical level \- it is just that they might create errors\. And there may be some short patterns which are actually correct\. I'm not sure how we can formally state this\.\.\.\.

\- thomas

---

## Post #18 by @Adam_Flinton

Peter Gummer wrote:

> Thomas Beale wrote:
>
>> I also agree with Adam\. A regex should be able to be used over a
>> population of strings \(identifiers in this case\) and have the effect of
>> filtering out what you want\. \.\.\.
>>
>> Practically speaking this does not change the specifications, but I
>> suspect we should put some guidance in to the effect that regexes based
>> on full identifiers should be used in archetype slots\.
>
> Surely the specifications should be stronger than just guidance\.

I agree\.

> Existing tools that are massaging the regex will cease to work if they
> are given a full regex\. It would be a breaking change, so I think it
> should be spelled out in the specification\. Otherwise, tools are going
> to have to try to do some clever guesswork to decide whether a given
> pattern is intended to match the full archetype id or just the domain
> concept part of it\.
> I agree in that it should state that the regex is the regex, that's it, nothing else is required etc\. Wrt existing tools A\) We already know some tools break with the current system e\.g\. the XSLT for rendering a choice i\.e\. ABC | DEF | GHI etc\. B\) The string you are actually matching on is the Archetype ID\. As such that should be the basis of the regex\. doing a pseudo\-meta regex will hurt in the long run\. Quick example: NB: These are simply examples & are not intended as a source of discussion in & of themselves\. Imagine English speaking people want to use archetypes whose names have meaning to them e\.g\. "clinician"\. Now imagine a variety of English speaking jurisdictions all wanting to have their own definition of "clinician"\. You could have openEHR\-EHR\-CLUSTER\.clinician\-AUS\.v1, openEHR\-EHR\-CLUSTER\.clinician\-NZ\.v1, openEHR\-EHR\-CLUSTER\.clinician\-UK\.v1, openEHR\-EHR\-CLUSTER\.clinician\-US\.v1 etc\. But then what happens if you then specialize one to show it's a surgeon e\.g\. would it be openEHR\-EHR\-CLUSTER\.clinician\-surgeon\-AUS\.v1 or openEHR\-EHR\-CLUSTER\.clinician\-AUS\-surgeon\.v1, etc\. Or to avoid that sort of problem you could namespace it at the other end e\.g\.: openEHR\-EHR\-AUS\-CLUSTER\.clinician\.v1, openEHR\-EHR\-NZ\-CLUSTER\.clinician\.v1, openEHR\-EHR\-UK\-CLUSTER\.clinician\.v1, openEHR\-EHR\-US\-CLUSTER\.clinician\.v1 or even AUS\-openEHR\-EHR\-CLUSTER\.clinician\.v1 Thus having \{/clinician\\\.v1\} and adding "openEHR\-EHR\-CLUSTER\." would not work\. If the archetype is chosen then someone would have chosen openEHR\-EHR\-CLUSTER\.clinician\.v1 if that is the archetype ID or openEHR\-EHR\-AUS\-CLUSTER\.clinician\.v1 if that was\. Fix it now & something like the above becomes a non\-issue in the future\. Adam --- **Canonical:** https://discourse.openehr.org/t/regex-in-archetypes-must-include-type/14796 **Original content:** https://discourse.openehr.org/t/regex-in-archetypes-must-include-type/14796