# Unhandled exception parsing ADL

**Category:** [Reference Implementation: Java (archive)](https://discourse.openehr.org/c/reference-implementation-java-archive/154)
**Created:** 2008-10-23 08:28 UTC
**Views:** 3
**Replies:** 4
**URL:** https://discourse.openehr.org/t/unhandled-exception-parsing-adl/16106

---

## Post #1 by @Adam_Flinton

I've built an Ant task for generating XML from ADL. Running it against our ADL archetypes, it dies when an unhandled error pops up.

    Exception in thread "main" se.acode.openehr.parser.TokenMgrError: Lexical error at line 1, column 1. Encountered: "\u00ef" (239), after : ""
        at se.acode.openehr.parser.ADLParserTokenManager.getNextToken(ADLParserTokenManager.java:27554)
        at se.acode.openehr.parser.ADLParser.jj_consume_token(ADLParser.java:7061)
        at se.acode.openehr.parser.ADLParser.archetype(ADLParser.java:214)
        at se.acode.openehr.parser.ADLParser.parse(ADLParser.java:101)

    final public Archetype archetype() throws ParseException, Exception {
    <cut>
        jj_consume_token(SYM_ARCHETYPE);

    final private Token jj_consume_token(int kind) throws ParseException {
        Token oldToken;
        if ((oldToken = token).next != null) token = token.next;
        else token = token.next = token_source.getNextToken();
    <cut>

    public Token getNextToken()
    <cut>
        throw new TokenMgrError(EOFSeen, curLexState, error_line, error_column, error_after, curChar, TokenMgrError.LEXICAL_ERROR);

I.e. the Token jj_consume_token() method only throws a ParseException, whereas getNextToken():

A) Throws a TokenMgrError within the body of the method

B) Doesn't then declare it at the method level (e.g. public Token getNextToken() throws TokenMgrError {})

C) Token jj_consume_token() doesn't then either catch or throw the TokenMgrError.

Adam

---

## Post #2 by @Adam_Flinton

The problem resolves to the (optional) Byte Order Mark for UTF-8.
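
For illustration, here is a minimal sketch of why the lexer would report character 239 at line 1, column 1 for a file that starts with a UTF-8 BOM. It assumes the parser's input was read one byte per character (equivalent to ISO-8859-1 decoding). That is inferred from the error message and is not confirmed anywhere in this thread. The class and method names are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class BomDecodingDemo {
    // The three bytes of the UTF-8 encoded byte order mark.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // First UTF-16 code unit seen when the bytes are decoded with the given charset.
    static int firstCodeUnit(byte[] bytes, Charset charset) {
        return new String(bytes, charset).charAt(0);
    }

    // Hypothetical helper: the given text encoded as UTF-8, preceded by a BOM.
    static byte[] withBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] all = new byte[UTF8_BOM.length + body.length];
        System.arraycopy(UTF8_BOM, 0, all, 0, UTF8_BOM.length);
        System.arraycopy(body, 0, all, UTF8_BOM.length, body.length);
        return all;
    }

    public static void main(String[] args) {
        byte[] file = withBom("archetype");
        // Byte-per-char decoding: the first char is U+00EF (decimal 239).
        System.out.printf("ISO-8859-1: U+%04X%n", firstCodeUnit(file, StandardCharsets.ISO_8859_1));
        // UTF-8 decoding: the three BOM bytes collapse into the single char U+FEFF.
        System.out.printf("UTF-8:      U+%04X%n", firstCodeUnit(file, StandardCharsets.UTF_8));
    }
}
```

So the same three bytes look like either U+00EF followed by two more characters, or a single U+FEFF, depending on how the stream is decoded before it reaches the lexer.
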

Various times people have opened ADL files with text editors, and some editors introduce the BOM upon saving.

I have written a little class which strips the BOM, but given that the parser looks to be generated I am unsure how to integrate it.

The byte array to check for/starts with is:

    private static final byte UTF8_BOM_1 = (byte) 0xef;
    private static final byte UTF8_BOM_2 = (byte) 0xbb;
    private static final byte UTF8_BOM_3 = (byte) 0xbf;
    ...
    final byte[] utf8Bom = {UTF8_BOM_1, UTF8_BOM_2, UTF8_BOM_3};

And then this needs to be removed prior to parsing.

Any takers? Any idea at which level this would be best integrated, so that I can do it & then send the changed files to this list? Or indeed, how do I get commit rights etc.?

Adam

---

## Post #3 by @system

Hi Adam,

there is a parser rule in the adl.jj that ignores the BOM (ll. 224):

    < * > SKIP : /* WHITE SPACE */
    {
      " "
    | "\t"
    | "\n"
    | "\r"
    | "\f"
    | "\ufeff" /* UTF-8 Byte Order Mark */
    }

Not sure why this doesn't work for you?

Cheers
Sebastian

---

## Post #4 by @Adam_Flinton

Sebastian Garde wrote:

> Hi Adam,
>
> there is a parser rule in the adl.jj that ignores the BOM (ll. 224):
>
>     < * > SKIP : /* WHITE SPACE */
>     {
>       " "
>     | "\t"
>     | "\n"
>     | "\r"
>     | "\f"
>     | "\ufeff" /* UTF-8 Byte Order Mark */
>     }
>
> Not sure why this doesn't work for you?

Good question... However, it doesn't seem to. Have you tried firing in ADL files which have a BOM?

Also, are you sure "\ufeff" is correct? Not "\u00ef" (i.e. decimal 239)?

Also the BOM is 239 187 191, i.e. EF BB BF, & not just 239.

I've integrated the fix into my own Ant task though:

    // For BOM removal
    private static final byte UTF8_BOM_1 = (byte) 0xef;
    private static final byte UTF8_BOM_2 = (byte) 0xbb;
    private static final byte UTF8_BOM_3 = (byte) 0xbf;
    final byte[] utf8Bom = { UTF8_BOM_1, UTF8_BOM_2, UTF8_BOM_3 };
    byte[] bomBuffer = new byte[utf8Bom.length];
    long skipL = utf8Bom.length;
    <cut>
        // The file is opened twice: 'test' is only used to sniff for the BOM,
        // 'in' is the stream that is actually converted.
        InputStream test = getInputStream(inputFile);
        InputStream in = getInputStream(inputFile);
        OutputStream out = getOutputStream(outputFile);
        ADL2XMLConvertor conv = new ADL2XMLConvertor();
        //System.out.println("PCDTO convert() OK to here inputFile = "+inputFile);
        boolean skip = checkRemoveBOM(test);
        System.out.println("skip = "+skip);
        if(skip){
            System.out.println("skipL = "+skipL);
            in.skip(skipL);
        }
        err = conv.convert(in, out);
    <cut>
    // Returns true if the first three bytes of the stream are the UTF-8 BOM.
    public boolean checkRemoveBOM(InputStream in) {
        boolean containsBom = false;
        try {
            int nRead = in.read(bomBuffer, 0, bomBuffer.length);
            if (nRead != -1) {
                if (java.util.Arrays.equals(bomBuffer, utf8Bom)) {
                    // System.out.write(bomBuffer, 0, nRead);
                    System.out.println("Contains BOM");
                    containsBom = true;
                }
            }
        } catch (java.io.IOException e) {
            System.err.println("I/O error occurred: " + e.getMessage());
        }
        return containsBom;
    }

& this works, so I can get on with the generation to XML, which is the main thing for me.

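
For comparison, the same idea can be sketched with a single stream instead of opening the file twice. This is only an illustration, not code from the parser or from the Ant task: `BomSkipper` and `skipUtf8Bom` are hypothetical names, and the approach (a `java.io.PushbackInputStream` that puts the bytes back when they turn out not to be a BOM) is a swapped-in technique.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.Arrays;

final class BomSkipper {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Wraps 'in' so that a leading UTF-8 BOM, if present, has already been consumed. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // read() may deliver fewer bytes than requested, so loop until full or EOF.
        while (n < head.length) {
            int r = pin.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM);
        if (!hasBom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: push the bytes back unchanged
        }
        return pin;
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', 'r', 'c'};
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(withBom));
        System.out.println((char) in.read()); // first byte after the BOM
    }
}
```

In the task above, the wrapped stream would presumably be what gets passed to `conv.convert(in, out)`, which avoids the second `getInputStream(inputFile)` call and the separate `skip`.
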
I have an XML diff task & the Ocean ADL2XML converter, so I am about to run some test runs & then eventually run against the NHS ADL, which is in:

http://svn.openehr.org/knowledge/archetypes/dev-uk-nhs/adl

So I may be coming back with a list of diffs.

TIA

Adam

---

## Post #5 by @system

Hi Adam,

This should be an easy fix. Just include your BOM1,2,3 as part of the SKIP block. Can you forward some archetype that has these BOMs for testcase?

Cheers,
Rong

---

**Canonical:** https://discourse.openehr.org/t/unhandled-exception-parsing-adl/16106
**Original content:** https://discourse.openehr.org/t/unhandled-exception-parsing-adl/16106