AQL ANTLR4-grammar

Hi!

At Medinfo2015 I was positively surprised by all the different openEHR implementations that I had not heard of before.

AQL capability is present or planned in several openEHR implementations, both the long-known ones and the ones that have appeared recently.

It would be good to collaborate on testing, updating and practical application of shared AQL grammar resources, so that it becomes easier to support AQL in openEHR implementations.

When implementing AQL in the LiU-EEE REST demonstrator we used JavaCC, which was a bit messy and uncomfortable to work with. Using ANTLR4 now seems like a potentially better way to go (including better tooling etc.), so let’s help each other with the ANTLR4 grammar and its application.

Related wiki-pages:

Please respond and tell the rest of us what kind of plans, ideas and experiences you have regarding AQL parsing/implementation/translation etc.

I am working with Antlr4, and also studying it. Its possibilities are amazing. I think it is the best grammar environment; Terence Parr (the designer of it) explains why, and I think he is right. One important reason is that Antlr4 does not need code fragments in its grammar, so the grammar really is target-language independent. By target language I mean both the engine architecture and the programming language.

There are a lot of grammar examples, and maybe there is one for XQuery, which would be very similar to AQL.

But writing the grammar is one thing; the next question is what to do with the visitor or tree classes that Antlr4 generates (in any programming language).

Writing a query engine could be too much to ask of an open source community. So it would be better to use the grammar to arrive at an AQL-to-XQuery translation. When that is ready, we could use AQL on XML databases, which perform well if one is prepared to pay a good price.

Maybe there is an object-oriented query language whose engine can be used for OO databases, but I am afraid that is not my métier.

So, if the purpose is to use AQL on XML databases (I remember the old discussion we had about XML databases, but maybe we can agree to disagree on that), and translation to XQuery is all right, then I can offer some help. I want to do that because I think a congruent set of standard techniques based on the same paradigm (ADL paths) will offer enormous possibilities.

So, please let me know if help is needed. Maybe others are very experienced in Antlr4, and it could be more efficient if they do it.

What I wrote was a bit confusing; it was too late at night and I was on my mobile (small screen).

The idea I wanted to express is that with Antlr4 you can write a grammar without knowing the purpose of the grammar. Whether it is used to write a query engine, or a translator from AQL to XQuery or even SQL, makes no difference to the grammar. As far as I know, this is the first grammar tool with this feature. The other grammar tools, including the previous versions of Antlr and JJ (used for ADL before), needed code fragments in the grammar and were in this way bound to the target purpose.
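To make the point above concrete, here is a minimal sketch in plain Python (not ANTLR-generated code) of one parse tree being consumed by two independent visitors. The node class `Select`, the visitor names and the output strings are all hypothetical and only illustrate the idea; they are not a real AQL-to-XQuery or AQL-to-SQL mapping.

```python
from dataclasses import dataclass


@dataclass
class Select:
    """Hypothetical parse-tree node for a tiny 'select path from source' query."""
    path: str
    source: str


class Visitor:
    """Base visitor: picks a visit_<ClassName> method for each node."""

    def visit(self, node):
        method = getattr(self, "visit_" + type(node).__name__)
        return method(node)


class ToXQuery(Visitor):
    # Illustrative output only, not a real AQL-to-XQuery translation.
    def visit_Select(self, node):
        return f"for $e in collection('{node.source}') return $e{node.path}"


class ToSql(Visitor):
    # Illustrative output only, not a real AQL-to-SQL translation.
    def visit_Select(self, node):
        return f"SELECT value FROM {node.source} WHERE path = '{node.path}'"


# The same tree feeds both back-ends; nothing in the tree knows about either.
tree = Select(path="/data/items", source="ehr")
xquery_text = ToXQuery().visit(tree)
sql_text = ToSql().visit(tree)
print(xquery_text)
print(sql_text)
```

The grammar (here stood in for by the `Select` node) stays the same; only the visitor changes with the purpose.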

So because a grammar can serve anyone, because the grammar is purpose independent, we can all benefit from this idea from Erik.

That was what I wanted to write yesterday.

Excuse me for any confusion.

Bert

Antlr4 rule capabilities, and particularly pattern matching, are weaker than yacc/lex (in some cases quite a lot weaker), but it's more concise for the production rules, and as you say, it works for any output side. So that's a big win. Over time, I expect we'll convert everything to Antlr4.

Of course, Yacc/lex can only be used to generate C-code. Its functionality in pattern matching is limited to this.

Antlr comes in when you want to do whatever you need to do: writing a compiler, translating one programming language into another (transpiler), translating one query language into another, processing data (JSON, XML, CSV) into many custom-made targets, choosing between a listener, visitor or tree pattern for the generated code, and having scoped symbol tables.
That is why I have seriously started learning to work with it. Regarding pattern matching, its handling of left/right associativity and direct left recursion makes it a unique tool.
It is also very mature, as it is now at version 4.5. It has grown a lot especially in the 4.x versions. Terence Parr has researched parsers and parser generators for 25 years.
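For readers who have not met the associativity point mentioned above: in an Antlr4 grammar a rule may refer to itself on the left, and an operator can be marked as right-associative. The sketch below is not Antlr; it is a small hand-written precedence-climbing parser in Python with a hypothetical operator table, meant only to show what left and right associativity do to the resulting tree.

```python
import re

# Hypothetical operator table: symbol -> (precedence, associativity).
OPS = {"-": (1, "left"), "*": (2, "left"), "^": (3, "right")}


def tokenize(text):
    """Split the input into integer literals and operator symbols."""
    return re.findall(r"\d+|[-*^]", text)


def parse(tokens, min_prec=1):
    """Precedence climbing; returns an int or a nested tuple (op, left, right)."""
    left = int(tokens.pop(0))
    while tokens and tokens[0] in OPS and OPS[tokens[0]][0] >= min_prec:
        op = tokens.pop(0)
        prec, assoc = OPS[op]
        # Left-associative: the right operand may only hold tighter operators.
        # Right-associative: the right operand may hold the same operator again.
        next_min = prec + 1 if assoc == "left" else prec
        right = parse(tokens, next_min)
        left = (op, left, right)
    return left


left_tree = parse(tokenize("8 - 3 - 2"))   # groups as (8 - 3) - 2
right_tree = parse(tokenize("2 ^ 3 ^ 2"))  # groups as 2 ^ (3 ^ 2)
print(left_tree)
print(right_tree)
```

With a generated parser this bookkeeping is done by the tool; the grammar author only states the alternatives and their associativity.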

Translating AQL to XQuery should be very easy.
I think Antlr is very suitable for the task Erik is mentioning; as far as I understand, he only wants to define the grammar.
What to do with it afterwards is reasonably easy when using the Antlr-generated code pattern.

You can even translate the generated Java code to Eiffel by using Antlr :wink:
I think that is not very hard to do.

I tried it by parsing Pascal, Java, CSV, JSON and XML, and I did not encounter any weakness.
I was pleasantly surprised by the simplicity of some grammars.
There are grammars and test files for many other languages.

Maybe the weakness is C-related? That is what one would expect because of the nature of Yacc/lex.
Although Java has a lot of syntax similarities with C.
Of course, there are bugs in Antlr, quite a few (105 open at this moment for Antlr4.5), but I do not yet consider them a weakness of the concept.
But maybe some are.
I have not seen anyone else refer to it in that way.

So I was not aware of any issues in pattern recognition. It sounds like a serious issue.
I think there must be good information about this.
Can you give more information or an example of this pattern-matching weakness?

Thanks
Bert

That said, have you ever seen code generated by yacc/lex, bison or JJ? It is unreadable, and impossible for a normal human being to debug.
I removed one or two little bugs in the ADL parser generated by JJ (regarding some 13606 keywords).
I also once had to find a bug in some Bison-generated C code. I still get sad when I think back to those days.

On both occasions it gave a very unpleasant feeling of uncertainty: in some parts you can see what a small change causes, but how can you ever be sure that nothing unexpected changed in those hundreds of regenerated functions with only cryptic names? You need to run your many JUnit tests, and still you will not be sure.

What those parser generators do is a severe offence against every good programming practice.

How different is that in Antlr4.5, where there is a clean pattern of visitor or listener classes, which will handle your purposes.

It is not without reason that the Java compiler (javac) makes no use of generated parser code at all. Every parsing routine is written by hand.
That is the other way to do it, and it can only be done by a large team.
But what you see if you look at javac is all neat and understandable code, with visitors handling everything.

And that is exactly the same thing Antlr does, not by hand-writing but by generating.
It would not surprise me if the next javac used Antlr for code parsing.

I regard Antlr4.5 as a big step in software development, making complex grammars much easier to handle and to understand.

I would very much welcome it if the openEHR community had all its parsable syntaxes handled with Antlr grammars. I think that is necessary to maintain a good quality standard.

So, was I convincing?

:wink:

Have a nice day
Bert

Over time, I expect we’ll convert everything to Antlr4.

A last remark: I thought there were doubts and delays regarding the conversion to Antlr4, partly because of the doubt expressed about its quality; this message gave me that idea. But when I checked the sources, which I was not aware of, I saw that a lot has really been done already.

Good news. Good progress, thanks to the contributors, for that.

Maybe there are too many communication channels to stay easily up to date with the latest information all the time: wiki, GitHub, mailing lists, even wikis on GitHub.

Good that Stackexchange thing didn’t work out. :wink:

Maybe communication is something to reconsider; sometimes more is not really more. But that is another discussion. What I have in mind is a central point where references to all communications are refreshed daily, something like a Jira dashboard.

I won’t bother you anymore; it is quiet here anyway, I guess the majority is enjoying jet lag from Medinfo.

Have a good sleep
Bert.

Antlr4 rule capabilities and particularly pattern matching is weaker than yacc/lex (in some cases quite a lot weaker),

Of course, Yacc/lex can only be used to generate C-code. Its functionality in pattern matching is limited to this.

nope - it can be used to generate anything (I use it to generate Eiffel code). The point is it can only generate 1 language.

To do whatever you need to do, writing a compiler, translating programming languages into another (transpiler), translating a query language into another, processing data, JSON, XML, CSV, to many custom made targets, generated code accessible to choose for a listener/visitor or tree pattern, and having scoped symbol tables, that is where Antlr comes in.

actually - these are all output stages, and they are perfectly doable with any yacc/lex compiler serialiser - which is what the ADL workbench is, and what it does. Antlr does make a lot of this easier however.

Maybe the weakness is C-related? It is what one expects because the nature of Yacc/lex.

yacc/lex grammars have nothing to do with C - they work the same way with any language.

The weaknesses are mainly in the regex matching for string patterns - Antlr doesn't do anything like full regex. Its stateful sub-grammar handling needs a bit of work as well.

But I agree, it's probably the future, at least for a while.

- thomas

That is interesting, I didn't realize that. Thanks for the info

Bert

A teacher, years ago, told the classroom: if you try to solve a problem with regex, then you have added another problem.
(He was joking, but not entirely.)

I would like to see an example or description of an issue where Antlr4.5 does not work well enough in the openEHR context.
I think that information would help me and other developers a lot.

Thanks in advance,
Bert

I did some shallow research; maybe it can help answer this question.

According to what I read, Antlr4.5 supports standard regular expressions:

http://meri-stuff.blogspot.nl/2011/09/antlr-tutorial-expression-language.html#LexerBasics
I guess this is not the same as full regular expressions, but what is missing?

Here are the language features from the various regex engines:
https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines#Language_features

It would seem natural for Antlr4.5 to support the standard Java regex engine, because that is supporting code one gets for free when programming in Java.

What Antlr adds to the Java regex is recursion. It is described as one of the key grammar features.
I am not sure, though; maybe recursion in a regex context is not the same as recursion in a grammar context.
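As a small illustration of what recursion in a grammar context buys, here is a sketch in plain Python (neither Antlr nor the Java regex engine). The pattern `ONE_LEVEL` and the function names are made up for this example. A classic, non-recursive regular expression has to spell out each nesting level, while a recursive rule handles any depth.

```python
import re

# A non-recursive regex written for at most one level of nested parentheses.
ONE_LEVEL = re.compile(r"\((?:[^()]|\([^()]*\))*\)")


def parse_group(text, pos=0):
    """Recursive rule, roughly: group : '(' (group | other)* ')' ;

    Returns the position just after the closing parenthesis.
    """
    if pos >= len(text) or text[pos] != "(":
        raise ValueError(f"expected '(' at position {pos}")
    pos += 1
    while pos < len(text) and text[pos] != ")":
        if text[pos] == "(":
            pos = parse_group(text, pos)  # the rule calls itself
        else:
            pos += 1
    if pos >= len(text):
        raise ValueError("missing ')'")
    return pos + 1


def is_balanced_group(text):
    """True if the whole text is exactly one balanced parenthesised group."""
    try:
        return parse_group(text) == len(text)
    except ValueError:
        return False


print(ONE_LEVEL.fullmatch("(a(b))") is not None)
print(ONE_LEVEL.fullmatch("(a(b(c)))") is not None)
print(is_balanced_group("(a(b(c)))"))
```

The regex accepts one level of nesting and rejects the deeper input, while the recursive function accepts both.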

I think these are important issues, because if Antlr4.5 is used, there is a different code base than when one of the traditional grammar tools is used.
So, if there is knowledge of shortcomings of Antlr4.5 in the openEHR context, or in the expected future openEHR context, then it is very important that all developers know about it, so they can avoid running into very expensive problems.

Bert