Unhandled exception parsing ADL

I've built an Ant task for generating XML from ADL, and running it against
our ADL archetypes dies when an unhandled error pops up.

Exception in thread "main" se.acode.openehr.parser.TokenMgrError: Lexical error at line 1, column 1. Encountered: "\u00ef" (239), after : ""
    at se.acode.openehr.parser.ADLParserTokenManager.getNextToken(ADLParserTokenManager.java:27554)
    at se.acode.openehr.parser.ADLParser.jj_consume_token(ADLParser.java:7061)
    at se.acode.openehr.parser.ADLParser.archetype(ADLParser.java:214)
    at se.acode.openehr.parser.ADLParser.parse(ADLParser.java:101)

    final public Archetype archetype() throws ParseException, Exception {
<cut>
        jj_consume_token(SYM_ARCHETYPE);

    final private Token jj_consume_token(int kind) throws ParseException {
        Token oldToken;
        if ((oldToken = token).next != null) token = token.next;
        else token = token.next = token_source.getNextToken();
<cut>

    public Token getNextToken()
<cut>
        throw new TokenMgrError(EOFSeen, curLexState, error_line, error_column, error_after, curChar, TokenMgrError.LEXICAL_ERROR);

I.e. jj_consume_token() only declares ParseException, whereas:

A) getNextToken() throws a TokenMgrError within the body of the method;
B) getNextToken() doesn't declare it at the method level (e.g. public Token
getNextToken() throws TokenMgrError { ... });
C) jj_consume_token() neither catches nor declares the TokenMgrError.
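The reason this compiles is that the JavaCC-generated TokenMgrError extends java.lang.Error, i.e. it is unchecked, so no method is ever forced to declare or catch it. A minimal, self-contained sketch of that behaviour follows; the StandIn* classes and PropagationDemo are hypothetical stand-ins, not the generated parser classes.

```java
// Hypothetical stand-ins: in the real parser these are the JavaCC-generated
// TokenMgrError (which extends java.lang.Error) and ParseException.
class StandInTokenMgrError extends Error {
    StandInTokenMgrError(String message) {
        super(message);
    }
}

class StandInParseException extends Exception {
    StandInParseException(String message) {
        super(message);
    }
}

final class PropagationDemo {
    private PropagationDemo() {}

    // Mirrors getNextToken(): throws an Error but has no throws clause.
    static char nextToken(String input) {
        char first = input.charAt(0);
        if (first == 0xEF) {
            throw new StandInTokenMgrError("Lexical error at line 1, column 1. Encountered char 239");
        }
        return first;
    }

    // Mirrors jj_consume_token(): declares only the checked exception,
    // yet the unchecked Error from nextToken() passes straight through.
    static char consume(String input) throws StandInParseException {
        if (input.isEmpty()) {
            throw new StandInParseException("empty input");
        }
        return nextToken(input);
    }

    // A caller that wants to report rather than die must catch the Error itself.
    static String parseOrReport(String input) {
        try {
            return "ok: " + consume(input);
        } catch (StandInParseException e) {
            return "parse error: " + e.getMessage();
        } catch (StandInTokenMgrError e) {
            return "lexical error: " + e.getMessage();
        }
    }
}
```

In an Ant task, the analogous move would be to catch TokenMgrError alongside ParseException around the parse call, so one bad file is reported instead of ending the whole run.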

Adam

The problem comes down to the (optional) UTF-8 Byte Order Mark (BOM).
People have at various times opened ADL files in text editors, and some
editors insert the BOM when saving.

I have written a little class which strips the BOM, but given that the
parser looks to be generated, I am unsure how to integrate it.

The byte sequence to check for at the start of the file is:

    private static final byte UTF8_BOM_1 = (byte) 0xef;
    private static final byte UTF8_BOM_2 = (byte) 0xbb;
    private static final byte UTF8_BOM_3 = (byte) 0xbf;

...

final byte[] utf8Bom = {UTF8_BOM_1, UTF8_BOM_2, UTF8_BOM_3};

This then needs to be removed prior to parsing.
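One way to strip the BOM without touching the generated parser is to wrap the input stream before it reaches the parser. A minimal sketch using java.io.PushbackInputStream; the class name BomStripper is hypothetical. It reads up to three bytes, swallows them if they are EF BB BF, and pushes them back otherwise, so the file only needs to be opened once.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.Arrays;

// Hypothetical helper; not part of the ADL parser.
final class BomStripper {
    private static final byte[] UTF8_BOM = {(byte) 0xef, (byte) 0xbb, (byte) 0xbf};

    private BomStripper() {}

    /**
     * Returns a stream positioned after a leading UTF-8 BOM if one is present;
     * otherwise the returned stream yields exactly the original bytes.
     */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until full or EOF.
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        if (n == head.length && Arrays.equals(head, UTF8_BOM)) {
            return pushback; // BOM consumed and deliberately not pushed back
        }
        if (n > 0) {
            pushback.unread(head, 0, n); // not a BOM: restore the bytes we peeked at
        }
        return pushback;
    }
}
```

The wrapped stream can then be handed to whatever currently receives the raw file stream, e.g. `BomStripper.skipUtf8Bom(new FileInputStream(file))`.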

Any takers?

Any idea at which level this would best be integrated, so that I can do
it and then send the changed files to this list? Or, indeed, how do I get
commit rights etc.?

Adam

Adam Flinton wrote:

Hi Adam,

There is a rule in adl.jj that skips the BOM (from line 224):

< * > SKIP : /* WHITE SPACE */
{
  " "
| "\t"
| "\n"
| "\r"
| "\f"
| "\ufeff" /* UTF-8 Byte Order Mark */
}

Not sure why this doesn't work for you?

Cheers
Sebastian

Adam Flinton wrote:

Sebastian Garde wrote:

Not sure why this doesn't work for you?

Good question... However, it doesn't seem to. Have you tried feeding in
ADL files which have a BOM?

Also... are you sure "\ufeff" is correct?

Not:

"\u00ef" (i.e. decimal 239)?

Also, the BOM is:

239 187 191, i.e. EF BB BF, and not just 239.
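Both values are in fact consistent: U+FEFF is the BOM code point, and EF BB BF is its UTF-8 encoding. Which one the lexer sees depends on how the byte stream is decoded into characters. A minimal sketch (the class name is hypothetical) showing the two decodings; it suggests, though does not prove, that the parser is reading the file with a single-byte charset such as ISO-8859-1 or the platform default rather than UTF-8, in which case a "\ufeff" SKIP rule can never match.

```java
import java.nio.charset.StandardCharsets;

final class BomDecodingDemo {
    // The three bytes a UTF-8 BOM occupies on disk.
    static final byte[] UTF8_BOM_BYTES = {(byte) 0xef, (byte) 0xbb, (byte) 0xbf};

    private BomDecodingDemo() {}

    // Decoded as UTF-8, the three bytes become one char, U+FEFF.
    // Java's UTF-8 decoder does not strip a leading BOM.
    static String asUtf8() {
        return new String(UTF8_BOM_BYTES, StandardCharsets.UTF_8);
    }

    // Decoded one byte per char (ISO-8859-1), they become three chars:
    // 239, 187, 191. The first is the char 239 reported in the lexical error.
    static String asLatin1() {
        return new String(UTF8_BOM_BYTES, StandardCharsets.ISO_8859_1);
    }
}
```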

I've integrated the fix into my own Ant task, though:

    // For BOM removal
    private static final byte UTF8_BOM_1 = (byte) 0xef;
    private static final byte UTF8_BOM_2 = (byte) 0xbb;
    private static final byte UTF8_BOM_3 = (byte) 0xbf;
    final byte[] utf8Bom = { UTF8_BOM_1, UTF8_BOM_2, UTF8_BOM_3 };
    byte[] bomBuffer = new byte[utf8Bom.length];
    long skipL = utf8Bom.length;

<cut>

                InputStream test = getInputStream(inputFile);
                InputStream in = getInputStream(inputFile);
                OutputStream out = getOutputStream(outputFile);
                ADL2XMLConvertor conv = new ADL2XMLConvertor();
                //System.out.println("PCDTO convert() OK to here inputFile = " + inputFile);
                // 'test' is only used to peek at the first bytes; close it afterwards.
                boolean skip;
                try {
                    skip = checkRemoveBOM(test);
                } finally {
                    test.close();
                }
                System.out.println("skip = " + skip);
                if (skip) {
                    // Advance the stream that is actually parsed past the 3-byte BOM.
                    System.out.println("skipL = " + skipL);
                    in.skip(skipL);
                }
                err = conv.convert(in, out);

<cut>

    public boolean checkRemoveBOM(InputStream in) {
        boolean containsBom = false;
        try {
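            // read() may return fewer bytes than requested; only a full
            // 3-byte read can be a BOM (avoids comparing stale buffer bytes).
            int nRead = in.read(bomBuffer, 0, bomBuffer.length);
            if (nRead == bomBuffer.length) {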
                if (java.util.Arrays.equals(bomBuffer, utf8Bom)) {
                    // System.out.write(bomBuffer, 0, nRead);
                    System.out.println("Contains BOM");
                    containsBom = true;
                }
            }
        } catch (java.io.IOException e) {
            System.err.println("I/O error occurred: " + e.getMessage());
        }
        return containsBom;
    }

And this works, so I can get on with the generation to XML, which is the
main thing for me.

I have an XML diff task and the Ocean ADL2XML converter, so I am about to
do some test runs and then eventually run against the NHS ADL, which
is in:

http://svn.openehr.org/knowledge/archetypes/dev-uk-nhs/adl

So I may be coming back with a list of diffs.

TIA

Adam

Hi Adam,

This should be an easy fix: just include your BOM 1, 2, 3 bytes as part of the SKIP block. Can you forward an archetype that has these BOMs, for a test case?
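For a regression test input, one option is to generate a BOM-prefixed copy of any existing archetype rather than relying on a particular editor to add one. A minimal sketch; the class and method names are hypothetical and not part of the parser or its test suite.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for building a BOM test fixture.
final class BomFixture {
    private static final byte[] UTF8_BOM = {(byte) 0xef, (byte) 0xbb, (byte) 0xbf};

    private BomFixture() {}

    /** Writes a copy of {@code source} to {@code target} with a UTF-8 BOM prepended. */
    static void writeWithBom(Path source, Path target) throws IOException {
        byte[] body = Files.readAllBytes(source);
        byte[] out = new byte[UTF8_BOM.length + body.length];
        System.arraycopy(UTF8_BOM, 0, out, 0, UTF8_BOM.length);
        System.arraycopy(body, 0, out, UTF8_BOM.length, body.length);
        Files.write(target, out);
    }
}
```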

Cheers,
Rong