AOM C_STRING - single regex, or a list of strings?

This is a question we have wrestled with in defining the AOM String constrainer type C_STRING. Bert has brought it up with some useful analysis in this issue. I would be interested in other opinions on this question so that we can decide the best form for the C_STRING constraint.

  • thomas

IMHO, even if regexes are powerful enough, using them to represent
lists or literals is overkill, precisely for the reason you gave:
having to control and escape all characters would make ADL code
less readable (think of a series of specialized atx.x.x codes, which
would always need to be escaped).
I don't see a problem with having just a list of strings and a regex
for C_String.
(I DO think that having a list of regexes is overkill, BTW.)

right - but if we allow List<String> then technically it is possible to end up with more than one regex, unless we add some invariant like

is_regex (constraint.first) implies constraint.count = 1

I am inclined to think it's easier to just program it as:

across constraint as c some
     is_regex (c) and pcre_matcher.matches (extract_regex (c), data) or else c.same_string (data)
end

In Java it would be an iteration loop with an exit as soon as a match is found. You get the idea :slight_smile:
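For readers who prefer something runnable, the loop above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names mirror the pseudocode but are hypothetical, a regex member is assumed to be recognizable by its enclosing forward slashes, Python's `re` module stands in for PCRE, and a whole-string match is assumed (the thread does not say whether the match is anchored).

```python
import re
from typing import Sequence


def is_regex(c: str) -> bool:
    # Assumption: a regex member is delimited by forward slashes, e.g. "/th.t/".
    return len(c) >= 2 and c.startswith("/") and c.endswith("/")


def extract_regex(c: str) -> str:
    # Strip the enclosing slashes to obtain the pattern itself.
    return c[1:-1]


def matches_c_string(constraint: Sequence[str], data: str) -> bool:
    # True as soon as any member matches: regex members are matched as
    # patterns, and every member is also compared as a literal string.
    # Assumption: the pattern must match the whole value (re.fullmatch).
    return any(
        (is_regex(c) and re.fullmatch(extract_regex(c), data) is not None)
        or c == data
        for c in constraint
    )
```

With a mixed list such as `["this", "/th.t/"]`, the value `"that"` is accepted through the regex member, while `"other"` is rejected. A plain member such as `"at0.1"` is compared literally, so its dots need no escaping.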

- thomas

That is also my position. I don't know if it was clear, because when I explain too much, the message sometimes becomes unclear :wink:

So, to summarize:

In my view, there will be two kinds of constraint in a CString:

1) one is a single string, which will always be a regex,
2) the other is a list of strings, which looks like {"this","that"}. It can also contain just one string, but in that case it needs to be written as {"string",...}, which results in a list with one element in the CString constraint (this is compatible with the ODIN syntax).

The string list syntax (2) will never be processed as a regular expression.
If we agree on this, we can drop the enclosing slashes for the regex in the single string (1): a single string is always a regular expression, so everything in it is written as a regular expression. We then also no longer need to escape the forward slash "/" in the regular expression, because the conflict with the enclosing forward slashes is gone.
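A minimal Python sketch of how a checker might treat the two forms described above. The names `Constraint` and `matches_proposed` are hypothetical, Python's `re` stands in for whatever regex engine an implementation uses, and a whole-string match is assumed.

```python
import re
from typing import List, Union

# Hypothetical type: either one regex string (form 1) or a list of literals (form 2).
Constraint = Union[str, List[str]]


def matches_proposed(constraint: Constraint, data: str) -> bool:
    if isinstance(constraint, str):
        # Form (1): a single string is always a regex, written without
        # enclosing slashes, so "/" inside the pattern needs no escaping.
        return re.fullmatch(constraint, data) is not None
    # Form (2): list members are compared literally, never as regexes.
    return data in constraint
```

Under these assumptions, `["at0.1.2"]` accepts only the exact code, whereas the single string `"at0.1.2"` is a pattern whose unescaped dots match any character. That difference is the readability cost of expressing literal codes as regexes that was raised earlier in the thread.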

Completely agree with your proposal, Bert

OK, but there is only one data structure:

constraint: List<String>

Are you saying you want another, separate string field?

It was already like that in the original AOM (pattern & list)

No, that is not necessary: the structure is redefined from CPrimitive, so it can be of type Object, which can result in a String or in a List. This then needs to be explained in the description. I would not have a problem with this. Another solution is indeed to add a property for a single regex. Bert

Do we have agreement on this, as proposed?

If so, I can propose the grammar change according to this agreement, which is not very much, and then the case can be closed.

Please let me know, so that I can proceed.

thanks
Bert

right - but a couple of years ago, we simplified and reworked the C_PRIMITIVE subtypes (mainly based on analysis done due to AML). So today there is only 'constraint'. (There is an additional 'pattern_constraint' in C_TEMPORAL though).

Still not sure what you are proposing though...

- thomas

right - if we redefine it to just String, then the constraint is always a regex, even when it is fixed string(s). Or else we redefine it to List (how it is now) and allow this to represent:

well, we need to get this right for everyone. The current spec works, but you are suggesting that it be changed to make it ‘regex-only’; Diego I think is suggesting something else; but if we were going to change it, we need to get input from openEHR tool implementers, 13606 group, AML developers, CIMI…

  • thomas

That seems like a long discussion to come; it will take years. I know that kind of discussion. In 13606 they are in a renewal, and they are involving ISO as well. The whole world is going to have an opinion about it. The best way to stop innovation is standardization. I learned that from one of your blogs. But I understand your position: ADL 1.5 is part of an ISO standard, so you need ADL 2.0 to have the same position.

OK, I give up. Just leave it as it is. We have a constraint, and that is a List. That List can contain one or more regular expressions, which are recognizable by their enclosing forward slashes, and it can contain one or more strings, which are recognizable by not having those forward slashes. I can handle it in code, no problem; I just thought it was a stupid idea to do it that way. But I am convinced, I cannot stand up against the whole world. I want the AOM to be ready soon, so I will stop arguing and do it this way; it is not that important anyway. As long as all needed constructs are supported, it is good enough.

Have a nice day. It is really not important.

Best regards
Bert

Don’t worry, I am thinking more in terms of weeks not years. But we do have a spec that does what is needed, the debate is whether there is a clearer way, and whether it is worth changing it. But - to make a (possibly) breaking change, we need more than a couple of days to reflect …

  • thomas

No problem, I can proceed with the other primitives, integers, datetime, duration etc, and there is still a lot of structure testing to do also, and I only work one or two hours a day on this. :wink: