ADL 1.5 ANTLR definitions...and a few questions.

Hello everyone

A brief email to let you know that a first version of ADL and cADL
expressed in EBNF and ANTLR's meta-language is now available at:
https://github.com/aanastasiou/adl_ebnf

Next up is ODIN.

The ANTLR definitions specifically are available at:
https://github.com/aanastasiou/adl_ebnf/tree/master/src/antlrDefs/adl

In general, transcribing from:
http://www.openehr.org/wiki/display/spec/ADL+1.5+parser+resources was a
straightforward task for the majority of the rules.

However, there are a few points where I would appreciate some help /
guidance in order to end up with a useful set of definitions for the
specification. A list of these items is attached at the end of this
message.

If you require any additional information, please let me know.

All the best
Athanasios Anastasiou

POINTS OF NOTICE REGARDING ADL 1.5:

1) Are there any ADL 1.5-specific files available out there for testing
purposes?

2) How significant is whitespace such as "[ \t\n\r]*" for ADL?
If no assumptions are made (such as "a new line marks the start of a new
statement"), I would like to include a universal rule that skips such
whitespace.
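
A minimal sketch of what such a universal whitespace-skipping rule does, written in Python rather than ANTLR so that it can be run directly. The token names (WS, IDENT, SYM) and their patterns are hypothetical and are not taken from the ADL grammar; only the whitespace class comes from the question above. In ANTLR 4 the same effect is usually obtained with a lexer rule that ends in "-> skip".

```python
import re

# Hypothetical token set for illustration only; NOT the ADL grammar.
TOKEN_RE = re.compile(r"""
    (?P<WS>[ \t\n\r]+)                    # whitespace: matched, then discarded
  | (?P<IDENT>[A-Za-z_][A-Za-z0-9_]*)     # identifiers and keywords
  | (?P<SYM>[{}\[\]=;])                   # single-character symbols
""", re.VERBOSE)


def tokenize(text):
    """Return (kind, lexeme) pairs, silently dropping all whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":           # the "skip" step
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

With a rule like this, line breaks and indentation never reach the parser, so it is only safe if no production depends on them.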

3) Is ID_CODE_LEADER always going to be 'id'? Is it then considered a
SYMbol?

4) There is something odd about the definition of
V_ISO8601_DURATION_CONSTRAINT_PATTERN. The definition seems to include a
trailing '}' but then the parser puts it back in the stream. In the
ANTLR definition I have omitted the '}'. Would this be a problem? Can we
clarify this rule a bit?

5) Some rules contain comments such as: "rule to be removed once
archetypes containing "T" are gone". Are these archetypes gone by now?
Can I clean up those rules?

6) Throughout the yacc definitions there are some conditionals whose
purpose I do not entirely understand. For example, in the definition
of V_REGEXP, there are conditional definitions for each constituent part
of a regular expression. Why is this? The same goes for V_STRING,
V_CADL_TEXT, V_RULES_TEXT and V_ODIN_TEXT. At the moment, I am matching
these with a non-greedy operator. Would this be a problem?
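
To make the non-greedy question concrete, here is a small Python sketch comparing a plain non-greedy match with an escape-aware one on a quoted string. It assumes that an embedded double quote is written with a backslash escape; that convention, the pattern names and the sample text are assumptions made for illustration, not rules taken from the ADL scanner.

```python
import re

# Non-greedy: stops at the FIRST double quote it meets, escaped or not.
NON_GREEDY = re.compile(r'"(.*?)"', re.DOTALL)

# Escape-aware: a backslash plus the following character is consumed as one
# unit, so an escaped quote cannot terminate the string.
ESCAPE_AWARE = re.compile(r'"((?:\\.|[^"\\])*)"', re.DOTALL)


def scan_string(pattern, text):
    """Return the full lexeme matched at the start of text, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None


# Hypothetical input: a string holding escaped quotes and braces.
sample = r'"say \"hi\" {now}" rest'

short_lexeme = scan_string(NON_GREEDY, sample)    # ends at the escaped quote
full_lexeme = scan_string(ESCAPE_AWARE, sample)   # runs to the real closing quote
```

Under that assumption, a non-greedy match is only equivalent to the escape-aware one when the content contains no escaped terminator. The braces inside the string are consumed as plain content in both cases.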

Athanasios,

I have been re-organising the resource pages somewhat - I have put your information below on the ADL parser resources page - feel free to rewrite & keep up to date.

- thomas

Hello Thomas

Thank you very much, I will try to keep it up to date with current progress.

All the best
Athanasios Anastasiou

1) Hopefully the starting point for all resources is now . Regression test archetypes are .

2) It isn’t. However, the current tools in some places assume that the outer level of keywords - the section names ‘languages’, ‘description’, ‘definition’ etc - are all against the left hand edge of the file and that no other content is. This is to make it easier to detect the separate sections in a simple way, so that the ‘description’ section keyword is not mistaken for the word ‘description’ elsewhere. Of course this is a hack, and no-one should repeat it, but I’m mentioning it here, since you will see it in the . At some point, I’ll remove it, hopefully copying some nicer production rules that someone else comes up with.

3) Good question. Although the usual computer science thinking is: make everything generic all the time, in this case, the precise reason for these code ‘leaders’ (i.e. ‘id’, ‘ac’, ‘at’) is to separate codes into different semantic groups (i.e. ids, value set codes and value codes). In the current conception of ADL I think they should be preserved, because they make parsing (and error reporting) much easier, and make archetypes much easier to read. In the long term future, I think we might move to a different system where all the codes are external, and archetypes have a managed online terminology. We are quite some way off from that technology, and I would therefore assume that moving to it, if we ever get there, means a of existing archetypes. Consequently, I think the current coding approach should be treated as reliable for now (this is not to say that the current system can’t be improved).

4) I guess you mean this rule: This is a scanner rule to get around a bug in the Archetype Editor which may or may not still exist. You should not replicate this one, or any other rule that has similar comments!

5) Yes, I would not replicate any such rules. I have not had the time to determine which tool errors have been fixed, but in any case, I think you should go ‘clean’.

6) For example this

This kind of thing is a pretty standard approach for dealing with strings, regexes and any other chunks of content that can easily contain normal keywords and syntax inside, but where of course the syntax has no meaning, so you don’t want to hit any of the normal rules for {}, keywords etc. So you need to include a way to consume these chunks to the right end point (and don’t forget in Strings, that means going past quoted " to get to the real "). I don’t know what way this is done in Antlr, but there must be a standard way to replicate it. In general, don’t be afraid to find a better scanner or production rule approach than what you see in the current compiler. Some of those rules are very old now.

- thomas
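
As an illustration of the point about code leaders in the reply above, the following Python sketch sorts codes into the three semantic groups by their leader. The group labels follow the reply; the exact code format (a leader followed by dot-separated digits) and the names CODE_RE, CODE_GROUPS and classify_code are assumptions made for this example only.

```python
import re

# Leader -> semantic group, as described in the reply above.
CODE_GROUPS = {
    "id": "id code",
    "ac": "value set code",
    "at": "value code",
}

# ASSUMED format: leader, then digits, optionally with dot-separated parts.
CODE_RE = re.compile(r"(id|ac|at)(\d+(?:\.\d+)*)")


def classify_code(code):
    """Return (group, numeric part) for a code, or raise ValueError."""
    m = CODE_RE.fullmatch(code)
    if m is None:
        raise ValueError(f"not a recognised code: {code!r}")
    return CODE_GROUPS[m.group(1)], m.group(2)
```

Because the leader is fixed, an unrecognised code can be rejected with a specific message at scanning time, which may be part of what makes error reporting easier.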