ADL1.4 Grammar files

richard.kavanagh · 3 April 2023 09:34

I have started to have a ‘play’ with the ANTLR grammar files for archetypes, specifically the ones in adl-antlr/src/main/antlr/adl (master branch of openEHR/adl-antlr on GitHub), which I assume are the latest versions.

Having used ANTLR to generate some code, I find the parsing fails quite quickly. With my very limited knowledge of ANTLR syntax I am puzzled, as the grammar does not seem to match what appears in the archetype ADL.

Now I assume this grammar is used widely, so I assume it is correct.

The first issue I see (within adl14) is with the metadata.

The grammar has the following:

concept_section:
 'concept' '[' AT_CODE ']';

meta_data: '(' meta_data_item  (';' meta_data_item )* ')' ;

meta_data_item:
      meta_data_tag_adl_version '=' VERSION_ID
    | meta_data_tag_uid '=' GUID
    | meta_data_tag_build_uid '=' GUID
    | meta_data_tag_rm_release '=' VERSION_ID
    | meta_data_tag_is_controlled
    | meta_data_tag_is_generated
    | ALPHANUM_ID ( '=' meta_data_value )?
    ;

The definition of ‘VERSION_ID’ (in base_lexer) is:

VERSION_ID          : DIGIT+ '.' DIGIT+ '.' DIGIT+ ( ( '-rc' | '-alpha' ) ( '.' DIGIT+ )? )? ;

The generated parser complains when it sees:

archetype (adl_version=1.4; uid=f51e1f4d-a244-422d-b01e-429c9214b84b)

From what I can see the grammar is expecting a version in the format x.y.z (i.e. three components) and ‘1.4’ only has two.
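
To make the mismatch concrete, here is a rough Python sketch that transcribes the VERSION_ID lexer rule quoted above into a regular expression. It assumes the DIGIT fragment means [0-9] (its definition is not quoted here), and the names VERSION_ID_RE and is_version_id are purely illustrative. It mimics only this one rule; it does not reproduce how the ANTLR lexer chooses between competing tokens.

```python
import re

# Transcription of the VERSION_ID lexer rule quoted above.
# Assumption: the DIGIT fragment is [0-9].
VERSION_ID_RE = re.compile(
    r"[0-9]+\.[0-9]+\.[0-9]+"          # DIGIT+ '.' DIGIT+ '.' DIGIT+
    r"(?:(?:-rc|-alpha)(?:\.[0-9]+)?)?"  # optional '-rc' / '-alpha' suffix
)

def is_version_id(text: str) -> bool:
    """Return True if the whole string has the shape VERSION_ID requires."""
    return VERSION_ID_RE.fullmatch(text) is not None
```

With this transcription, is_version_id("1.4") is False while is_version_id("1.4.0") is True, which is consistent with the parser rejecting adl_version=1.4 at the meta_data_item rule.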

As mentioned previously, I assume these grammar files are used in a lot of tooling, so my understanding of them is probably the issue.

What am I missing?

Seref · 3 April 2023 11:00

Just in case it helps you, @richard.kavanagh, you may want to revise your fundamental assumption slightly.
The ADL grammar repo’s readme says:

This repository contains Antlr4 grammars and related resources for ADL. Currently this is limited to ADL2, but can be extended to ADL 1.4 as needed.

So it may be the case @thomas.beale wrote these for adl2 primarily and adl 1.4 was a secondary concern.
In terms of wide use: Archie from Nedap, led by @pieterbos, uses this grammar, as its readme says, and EhrBase uses Archie. However, I’m not sure if there’s any major ADL 1.4 parsing going on with EhrBase, because its use cases are more related to OPTs. Nedap is an ADL 2.0 shop as far as I know, though you may want to look at the Archie code (integration/unit tests) that translates ADL 1.4 to 2.0, which was experimental if my memory is correct.

This repo, originally developed by @rong.chen, may fit the bill for you. It is not ANTLR-based, but it is used in production and actively maintained, as you can see from the commits.

People I referenced from this response may clarify or correct things further, but my gut feeling is you may want to look at the second repo I referred to above for adl 1.4 related work if you’re seeking the safety of being used in actual production code.

thomas.beale · 3 April 2023 12:00

Hi Richard,
those grammars are the production ones, i.e. used in Archie and so on. So up to date. Maintained mainly by @pieterbos at Nedap and somewhat by me.

I’ll have to spend a bit of time to see what is specifically going wrong there.

I have also created a re-engineered set of grammars for ‘everything’ in openEHR, which has undergone some testing, but needs more and is not in production use. These grammars are ‘modal’ (for ADL) and cover ADL1.4 and ADL2, as well as a lot of other stuff, and are cleaner than the production ones.

Depending on what you are trying to do, you may want to try these.

richard.kavanagh · 4 April 2023 08:56

Thanks @Seref, a lot of useful information there. I am specifically looking to follow an ANTLR grammar-based path, but the repos you signposted are very interesting all the same.

richard.kavanagh · 4 April 2023 09:00

@thomas.beale that’s great, thank you.

I will take a look at them in more detail. I see you have changed the methodology to a two-pass approach, which hopefully will simplify things from my perspective.

Looks like I need to step up my learning on ANTLR grammar syntax, as you have upped the complexity.

I’m not looking to build any production code, just some tools to help me learn more…

thomas.beale · 4 April 2023 11:45

Well, modal grammars do look a bit more complex, but they are a lot easier to reason about than single-mode grammars for languages that have obvious sub-languages. That is almost any language, because even things like embedded regexes are another language. ADL is of course a multi-part language: chunks of ODIN (which will become JSON in the near future), the constraint part, and a rules part.
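
As a rough illustration of the idea (not of the actual openEHR grammars), the sketch below shows a tiny mode-switching tokenizer in Python. The token names, the two modes and the use of / to delimit an embedded regex are all assumptions made up for this example. Unlike ANTLR, which prefers the longest match, this sketch simply takes the first rule that matches in the current mode.

```python
import re

# Hypothetical rules per mode: (token name, pattern, mode to switch to or None).
MODES = {
    "DEFAULT": [
        ("WS", r"\s+", None),
        ("ID", r"[A-Za-z_][A-Za-z0-9_]*", None),
        ("REGEX_OPEN", r"/", "REGEX"),       # enter the regex sub-language
        ("SYM", r"[{}\[\]();=]", None),
    ],
    "REGEX": [
        ("REGEX_BODY", r"(?:\\.|[^/\\])+", None),
        ("REGEX_CLOSE", r"/", "DEFAULT"),    # return to the outer language
    ],
}

def tokenize(text):
    """Split text into (token name, lexeme) pairs, switching modes as rules dictate."""
    mode, pos, tokens = "DEFAULT", 0, []
    while pos < len(text):
        for name, pattern, next_mode in MODES[mode]:
            match = re.compile(pattern).match(text, pos)
            if match:
                if name != "WS":             # whitespace is skipped
                    tokens.append((name, match.group()))
                pos = match.end()
                if next_mode:
                    mode = next_mode
                break
        else:
            raise ValueError(f"no rule matches at position {pos} in mode {mode}")
    return tokens
```

In DEFAULT mode a { is a symbol, but inside REGEX mode the same character is just part of the regex body. Each mode only needs rules for its own sub-language, which is what makes this style easier to reason about.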

If you open either Git repo project in IntelliJ with the Antlr plug-in installed, you will be able to run tests on them, visualise sample text grammar trees etc.

richard.kavanagh · 7 April 2023 23:07

@thomas.beale looking at the ‘combined’ folder in the repo you provided, is there a missing file?
I don’t see an ‘ADL14Lexer.tokens’ file, although the ‘AD2Lexer.tokens’ file does exist.

Not that I plan to use them just at the moment, but some of the other lexer files seem to be missing their token files as well.

thomas.beale · 8 April 2023 08:33

Hi Richard,
The tokens files are generated, and I have not been careful about pushing them up to GitHub. I’ll check that.
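
For context, a .tokens file is a plain-text map from token names (and quoted literals) to integer token types, which the ANTLR tool writes out when it processes a grammar. A minimal Python sketch of reading one is below; the function name and the sample content are made up for illustration and are not taken from the repository.

```python
def parse_tokens_file(text):
    """Parse the text of an ANTLR .tokens file into a {name: token type} dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the LAST '=' because a quoted literal may itself be '='.
        name, _, number = line.rpartition("=")
        mapping[name] = int(number)
    return mapping

# Made-up sample content, not the real ADL14Lexer.tokens.
SAMPLE = "SYM_ARCHETYPE=1\nVERSION_ID=2\n'='=3\n"
```

Because the file is derived entirely from the grammar, regenerating the lexer locally should recreate it even when it is absent from the repository.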

richard.kavanagh · 8 April 2023 08:54

Thanks Thomas. Starting to get my head around ANTLR now.