Archetype IDs starting with a numeric?

Hi!

We’ve noticed a behaviour both in CKM and AD where we’re not allowed to assign an identifier (the domain_concept part of the ARCHETYPE_ID) with 0-9 as the first character. We can’t find where this is stated in the specs, and we don’t understand why this limitation exists.

Help please? :blush:

It is defined here: Archetype Definition Language 1.4 (ADL1.4)
See the regex for V_ARCHETYPE_ID, which requires each part to start with a letter before a digit can be used.

----------/* V_ARCHETYPE_ID */ ---------------------------------------------
[a-zA-Z][a-zA-Z0-9_]+(-[a-zA-Z][a-zA-Z0-9_]+){2}\.[a-zA-Z][a-zA-Z0-9_]+(-[a-zA-Z][a-zA-Z0-9_]+)*\.v[1-9][0-9]*
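
To see the effect of the rule, here is a minimal Python sketch of the check. It assumes the dots in the spec regex are literal (escaped) dots and that the optional specialisation group is repeatable (`*`). The function name and the example ids are made up for illustration.

```python
import re

# V_ARCHETYPE_ID as transcribed from ADL 1.4. Assumption: the dots are
# escaped literals and the specialisation group is followed by "*".
V_ARCHETYPE_ID = re.compile(
    r"[a-zA-Z][a-zA-Z0-9_]+(-[a-zA-Z][a-zA-Z0-9_]+){2}"   # e.g. openEHR-EHR-OBSERVATION
    r"\.[a-zA-Z][a-zA-Z0-9_]+(-[a-zA-Z][a-zA-Z0-9_]+)*"   # domain_concept, optional specialisations
    r"\.v[1-9][0-9]*"                                     # version, e.g. .v1
)


def is_valid_archetype_id(archetype_id: str) -> bool:
    """Return True if the whole string matches the V_ARCHETYPE_ID pattern."""
    return V_ARCHETYPE_ID.fullmatch(archetype_id) is not None


# Hypothetical ids, for illustration only.
print(is_valid_archetype_id("openEHR-EHR-OBSERVATION.hole_peg_test_9.v1"))
print(is_valid_archetype_id("openEHR-EHR-OBSERVATION.9_hole_peg_test.v1"))
```

With this pattern the first id is accepted and the second is rejected, because the character right after the dot must be a letter.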

So is there a reason that the regex requires a letter first, or is it just arbitrary?

We have archetypes where the concept name starts with a number, usually well-known scores or scales. It would make sense that the id aligns with the concept name.

I don’t personally know the original thinking and reasons; I guess this happened 15-20 years ago.
What I can assume is that it is related to the fact that domain_concept is an identifier, and as good programming practice, and perhaps also from the parser/lexer perspective, such identifiers do not start with numbers.

I guess the relevant section is in Archetype Identification, but it does not explicitly state my assumptions above. Perhaps @thomas.beale can give more hints / opinions?

The challenge is that programming languages often do not allow class names to start with a number. For example, we use Templates to automatically generate classes. Theoretically we could convert these numbers to their word equivalent (9 becomes “Nine” etc.), but this might become awkward. That is no excuse to constrain clinicians with these technical details, but it might be the explanation for this.
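
As a rough illustration of that conversion, here is a minimal Python sketch. The function `to_class_name`, the `DIGIT_WORDS` table and the concept names are hypothetical and not taken from any existing openEHR code generator.

```python
# Hypothetical lookup table: leading digit -> word.
DIGIT_WORDS = {
    "0": "Zero", "1": "One", "2": "Two", "3": "Three", "4": "Four",
    "5": "Five", "6": "Six", "7": "Seven", "8": "Eight", "9": "Nine",
}


def to_class_name(domain_concept: str) -> str:
    """Turn a snake_case concept into a CamelCase name that starts with a letter."""
    parts = [p for p in domain_concept.split("_") if p]
    name = "".join(p[:1].upper() + p[1:] for p in parts)
    # Only the first character is spelled out; later digits are legal in a class name.
    if name and name[0] in DIGIT_WORDS:
        name = DIGIT_WORDS[name[0]] + name[1:]
    return name


print(to_class_name("9_hole_peg_test"))  # NineHolePegTest
print(to_class_name("10_metre_walk"))    # One0MetreWalk
```

The second result shows where it gets awkward: spelling out only the first digit gives a legal but odd name, and spelling out whole numbers needs more rules.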

Yes, we found this. Hence our question.

We have two archetypes at present that are triggering the question - both scores/scales.

It’s not deal-breaking, more just an ugly compromise :face_with_raised_eyebrow:, I suppose. We just wanted to be sure it was there for a reason, not just because no one had ever thought about it.

I don’t think there is a strong reason for it - we (in IT-land) just have this 50-year-old habit of not allowing identifiers to start with numbers, which I replicated in the spec when I created the regexes for the archetype id. I can’t think off-hand of any specific reason not to allow numbers in this case. Where code generation occurs (as Birger mentioned), appropriate conversions can always be made to generate legal class or module names.

However, I don’t know how fast this could be changed in tools, which are likely to have copied the regexes from the specs. Maybe @pieterbos, @yampeku, @borut.fabjan, @sebastian.garde could have a think about that. We can relax the specs easily enough.

This does not happen only in openEHR; other communities defining metamodels tend to follow the same rule, such as AUTOSAR in automotive or EAST-ADL for EE systems. To the best of my knowledge, the rule there is due to XML serialization: an XML element whose name starts with a number is illegal XML. The same applies to modeling support in some UML profile tools, where classes implement it and class names should not start with a number. So unfortunately, technology choices dictate the reality, which they obviously should not.
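
The XML point is easy to check with a standard parser. A minimal Python sketch, where the helper name and the element names are made up for illustration:

```python
import xml.etree.ElementTree as ET


def is_parseable_element_name(name: str) -> bool:
    """Return True if an empty element with this name parses as well-formed XML."""
    try:
        ET.fromstring(f"<{name}/>")
    except ET.ParseError:
        return False
    return True


print(is_parseable_element_name("hole_peg_test_9"))  # True
print(is_parseable_element_name("9_hole_peg_test"))  # False
```

Note that the check parses a string on purpose: building the element with `ET.Element(...)` and serializing it does not validate the name, so a bad name could slip out and only fail in whatever consumes the XML.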

Thanks - @heather.leslie - I think we maybe found our reason for not messing about with the current rules!! It is a little ugly but I think we can live with that if there are potential tech gotchas out there.

This is true, but since serial formats are not our primary representation of anything, converters that generate e.g. XML can always have rules added to synthesise legal names. Having said that, it puts the lexical ids of serial format entities out of sync with the original artefact (in ADL, for example), which will always create its own problems.

For sanity’s sake it is arguably better to stick with the less technically painful approach, and put up with a bit of cognitive annoyance in the ids.

I agree with the core technical challenges Juha-Pekka and Birger refer to.
Yes, you will be able to work around this with extra steps somehow, but that is then also an extra source of errors.

In addition, changing this at this stage has the potential to cause problems in every existing tool that defines or manages archetypes, including underlying parsers, and in anything that directly or indirectly consumes them, including systems, code generators and transforms: various places where it may fail downstream, sometimes in subtle ways.

:face_with_raised_eyebrow: OK

Thanks for the responses. We’ll stick with ugly.
