Archetype IDs starting with a numeric?

siljelb · 25 February 2021 08:30

Hi!

We’ve noticed a behaviour both in CKM and AD where we’re not allowed to assign an identifier (the domain_concept part of the ARCHETYPE_ID) with 0-9 as the first character. We can’t find where this is stated in the specs, and we don’t understand why this limitation exists.

Help please?

sebastian.garde · 25 February 2021 08:53

It is defined here: Archetype Definition Language 1.4 (ADL1.4)
See the regex for V_ARCHETYPE_ID, which requires each part to start with a letter before a digit can be used.

----------/* V_ARCHETYPE_ID */ ---------------------------------------------
[a-zA-Z][a-zA-Z0-9_]+(-[a-zA-Z][a-zA-Z0-9_]+){2}\.[a-zA-Z][a-zA-Z0-9_]+(-[a-zA-Z][a-zA-Z0-9_]+)*\.v[1-9][0-9]*
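To see the effect of that rule, here is a minimal Python sketch. It assumes the pattern is meant to be read with the dots escaped and the trailing specialisation group repeatable, and the helper name `is_valid_archetype_id` is purely illustrative, not something from the spec or any tool.

```python
import re

# V_ARCHETYPE_ID as read from the ADL 1.4 spec (assumption: literal dots are
# escaped and the optional "-specialisation" group may repeat).
V_ARCHETYPE_ID = re.compile(
    r"[a-zA-Z][a-zA-Z0-9_]+(-[a-zA-Z][a-zA-Z0-9_]+){2}"   # e.g. openEHR-EHR-OBSERVATION
    r"\.[a-zA-Z][a-zA-Z0-9_]+(-[a-zA-Z][a-zA-Z0-9_]+)*"   # domain concept (+ specialisations)
    r"\.v[1-9][0-9]*"                                     # version
)


def is_valid_archetype_id(archetype_id: str) -> bool:
    """Return True if the whole string matches the pattern above."""
    return V_ARCHETYPE_ID.fullmatch(archetype_id) is not None


print(is_valid_archetype_id("openEHR-EHR-OBSERVATION.four_at.v1"))  # letter first
print(is_valid_archetype_id("openEHR-EHR-OBSERVATION.4at.v1"))      # digit first
```

The first call matches and the second does not, because every part, including the domain concept, has to begin with `[a-zA-Z]`.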

heather.leslie · 25 February 2021 09:09

So is there a reason that the regex requires a letter first, or is it just arbitrary?

We have archetypes where the concept name starts with a number, usually well-known scores or scales. It would make sense that the id aligns with the concept name.

sebastian.iancu · 25 February 2021 12:58

I don’t personally know the original thoughts and reasons; I guess this happened 15-20 years ago.
What I can assume is that it is related to the fact that domain_concept is an identifier and that, as good programming practice and (perhaps) from the parser/lexer perspective, such identifiers do not start with numbers.

I guess the relevant section is in Archetype Identification, but it does not explicitly state my assumptions above. Perhaps @thomas.beale can give more hints / opinions?

birger.haarbrandt · 25 February 2021 13:03

The challenge is that within programming languages, class names are often not allowed to start with a number. For example, we use Templates to automatically generate classes. Theoretically we can convert these numbers to their word equivalent (9 becomes “Nine” etc.), but this might become awkward. That is no excuse to constrain clinicians with these technical details, but it might be the explanation for this.
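As a rough illustration of that kind of conversion, here is a small Python sketch. The function name `to_class_name` and the naming rules are hypothetical, made up for this example, and not how any particular code generator does it.

```python
import re

# Hypothetical mapping used only for this illustration.
DIGIT_WORDS = {
    "0": "Zero", "1": "One", "2": "Two", "3": "Three", "4": "Four",
    "5": "Five", "6": "Six", "7": "Seven", "8": "Eight", "9": "Nine",
}


def to_class_name(concept: str) -> str:
    """Turn a concept id into a CamelCase name that does not start with a digit."""
    # Split on anything that is not an ASCII letter or digit.
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", concept) if p]
    name = "".join(p[:1].upper() + p[1:] for p in parts)
    # Spell out a leading digit and capitalise the character that follows it.
    if name and name[0] in DIGIT_WORDS:
        name = DIGIT_WORDS[name[0]] + name[1:2].upper() + name[2:]
    return name


print(to_class_name("4at"))          # FourAt
print(to_class_name("6cit"))         # SixCit
print(to_class_name("body_weight"))  # BodyWeight
```

The awkward part is visible straight away: the generated name no longer matches the original identifier, so the mapping has to be remembered somewhere.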

heather.leslie · 26 February 2021 00:30

Yes, we found this. Hence our question.

We have two archetypes at present that are triggering the question - both scores/scales.

The ‘4 ‘A’s Test (4AT)’. The formal name is the “4 ‘A’s test”, but it is commonly known as the ‘4AT’. The ID is currently OBSERVATION.four_at.v0, which seems a little awkward and misaligned with the concept name - Observation Archetype: 4AT [openEHR Clinical Knowledge Manager]; and
The ‘6 Item Cognitive Impairment Test (6CIT)’. The ID is currently OBSERVATION.six_cit.v0, while it is commonly known as ‘6CIT’, with the same mismatch - Observation Archetype: 6 Item Cognitive Impairment Test (6CIT) [openEHR Clinical Knowledge Manager]

It’s not deal-breaking, more just an ugly compromise, I suppose. We just wanted to be sure it was for a reason, and not just that no one had ever thought about it.

thomas.beale · 26 February 2021 11:58

I don’t think there is a strong reason for it - we (in IT-land) just have this 50yo habit of not allowing identifiers to start with numbers, which I have replicated in the spec when I created the regexes for Archetype id. I can’t think off-hand of any specific reason not to allow numbers in this case. Where code generation occurs (as Birger mentioned), appropriate conversions can always be made to generate legal class or module names.

However, I don’t know how fast this could be changed in tools, which are likely to have copied the regexes from the specs. Maybe @pieterbos , @yampeku, @borut.fabjan, @sebastian.garde could have a think about that. We can relax the specs easily enough.

Juha-Pekka.Tolvanen · 1 March 2021 12:18

This does not happen only in openEHR; other communities defining metamodels tend to follow the same rule, like AUTOSAR in automotive, or EAST-ADL for EE systems. To the best of my knowledge, the rule there is due to XML serialization, as an XML element whose name starts with a number is illegal XML. The same applies to modeling support in some UML profile tools, where classes implement it and class names should not start with a number. So unfortunately, technology choices dictate the reality - which they obviously should not.
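The XML point is easy to check with Python's standard library parser. This is only a sketch: the element names `four_at` and `4at` and the helper `is_well_formed` are invented for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the standard library parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<four_at/>"))  # element name starts with a letter
print(is_well_formed("<4at/>"))      # element name starts with a digit
```

The first document parses and the second is rejected as not well-formed, since an XML name cannot begin with a digit.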

ian.mcnicoll · 1 March 2021 12:35

Thanks - @heather.leslie - I think we maybe found our reason for not messing about with the current rules!! It is a little ugly but I think we can live with that if there are potential tech gotchas out there.

thomas.beale · 1 March 2021 12:55

This is true, but since serial formats are not our primary representation of anything, converters that generate out e.g. XML can always have rules added to synthesise legal names etc. Having said that, it puts the lexical ids of serial format entities out of sync with the original artefact (in ADL, for example), which will always create its own problems.

For sanity’s sake it is arguably better to stick with the less technically painful approach, and put up with a bit of cognitive annoyance in the ids.

sebastian.garde · 1 March 2021 13:28

I agree with the core technical challenges Juha-Pekka and Birger refer to.
Yes, you will be able to work around this in extra steps somehow, but that is then also an extra source of errors.

In addition, changing this at this stage could cause problems in every existing tool that defines or manages archetypes, including the underlying parsers, and in anything that directly or indirectly consumes them, including systems, code generators and transforms. There are various places where it may fail downstream, sometimes in subtle ways.

heather.leslie · 3 March 2021 23:43

OK

Thanks for the responses. We’ll stick with ugly.
