The unit test testParsingWithoutUTF8Encoding fails on my Mac. I tried to make my Mac use UTF-8 by default instead of Mac Roman, but gave up. However, I am not sure the test is right. Shouldn't the test pass even with a different default encoding?
public void testParsingWithUTF8Encoding() throws Exception {
    try {
        ADLParser parser = new ADLParser(loadFromClasspath(
                "adl-test-entry.unicode_BOM_support.test.adl"), "UTF-8");
        parser.parse();
    } catch(Throwable t) {
        fail("failed to parse BOM with UTF8 encoding..");
    }
}
If the test fails, I believe something is going wrong in the ADLParser as the archetype cannot be parsed.
Something in the ADLParser is going wrong in a Mac environment without UTF-8 default encoding.
Unfortunately I don’t own a Mac, and I am not sure whether Rong does, but could you send us the error message so we can see what the parser is expecting?
I also notice that the archetype is in Windows format (i.e. it uses a CR LF pair to terminate lines, whereas Unix, including Mac OS X, uses an LF character only and classic Mac OS uses a CR character only).
Maybe you can convert it to Unix or Mac format and see if that helps. Be sure to keep the invisible byte order mark (BOM) at the beginning of the file; otherwise the test may pass but will no longer test what it is supposed to test.
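To make the conversion concrete, here is a minimal sketch of converting CR LF to LF at the byte level and then checking that the UTF-8 BOM (bytes EF BB BF) is still in place. The class and method names (BomCheck, startsWithUtf8Bom, crlfToLf) are made up for illustration and are not part of the parser code base; the sample bytes stand in for the real .adl file.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the ADL parser project.
class BomCheck {
    /** True if the data starts with the UTF-8 byte order mark EF BB BF. */
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    /** Drops the CR of every CR LF pair; all other bytes, including a BOM, are copied unchanged. */
    static byte[] crlfToLf(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
        for (int i = 0; i < data.length; i++) {
            boolean crBeforeLf = data[i] == '\r'
                    && i + 1 < data.length
                    && data[i + 1] == '\n';
            if (!crBeforeLf) {
                out.write(data[i]);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Assumed sample content: BOM followed by two CR LF terminated lines.
        byte[] text = "archetype\r\nconcept\r\n".getBytes(StandardCharsets.UTF_8);
        byte[] sample = new byte[text.length + 3];
        sample[0] = (byte) 0xEF;
        sample[1] = (byte) 0xBB;
        sample[2] = (byte) 0xBF;
        System.arraycopy(text, 0, sample, 3, text.length);

        byte[] converted = crlfToLf(sample);
        System.out.println("BOM kept after conversion: " + startsWithUtf8Bom(converted));
        System.out.println("Bytes removed: " + (sample.length - converted.length));
    }
}
```

Working on bytes rather than decoded text avoids a second encoding step that could silently drop or rewrite the BOM. Many text editors do exactly that, so it is worth checking the first three bytes after any conversion.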
The method testParsingWithUTF8Encoding works as expected. The only test that fails is testParsingWithoutUTF8Encoding.
----------------- WHAT SEEMS TO BE HAPPENING ----------------------
testParsingWithoutUTF8Encoding calls ADLParser with only one argument, which in turn calls the constructor
public SimpleCharStream(java.io.InputStream dstream, String encoding, int startline,
int startcolumn, int buffersize)
with the parameter encoding == null. In this case the one-argument InputStreamReader constructor is used; in other words, the inputStream instance variable of SimpleCharStream reads with the platform default encoding. On Macs the default encoding (in Java) is Mac Roman. Instead of using the default encoding, maybe ADLParser should use UTF-8 (I am not sure whether that is right). When ADLParser uses UTF-8 on the Mac, everything works fine (testParsingWithUTF8Encoding passes).
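The effect of the reader's charset on the BOM can be sketched in isolation. The example below is not the parser code: DefaultEncodingDemo and decode are hypothetical names, and ISO-8859-1 is used only as a stand-in for a non-UTF-8 platform default, because the Mac Roman charset is not guaranteed to be installed in every JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the ADL parser project.
class DefaultEncodingDemo {
    /** Decodes the bytes through an InputStreamReader using the given charset. */
    static String decode(byte[] data, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // UTF-8 BOM (EF BB BF) followed by the ASCII text "archetype".
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF,
                'a', 'r', 'c', 'h', 'e', 't', 'y', 'p', 'e'};

        String asUtf8 = decode(data, StandardCharsets.UTF_8);
        String asLatin1 = decode(data, StandardCharsets.ISO_8859_1);

        // With UTF-8 the three BOM bytes become the single character U+FEFF.
        System.out.println("UTF-8 length: " + asUtf8.length()
                + ", first char is U+FEFF: " + (asUtf8.charAt(0) == '\uFEFF'));
        // With a single-byte charset each BOM byte becomes its own character.
        System.out.println("ISO-8859-1 length: " + asLatin1.length());
        System.out.println("Platform default charset: " + Charset.defaultCharset());
    }
}
```

If the grammar only accepts U+FEFF as a leading BOM, the three stray characters produced by a single-byte default charset would plausibly explain a parse failure at the very start of the file. That is an assumption until the actual error message is available.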
If “UTF-8” is specified in the parser constructor, the test will be identical to the previous one, testParsingWithUTF8Encoding. So perhaps we should just specify an encoding other than UTF-8, for example “ISO-8859-1”.
There is a third option to consider: when ADLParser is used with only one argument, the encoding could be UTF-8 instead of the platform default. In this case, no changes to the tests are needed, but the documentation should state clearly that UTF-8 is used when no specific encoding is provided. This is in accordance with “Support Information Model”, page 18, section 3.3.1.1, which states: “… In openEHR, UTF-8 encoding is assumed”.
This is a very good option. It would be nice if the new API didn’t break existing code in other components. Perhaps rename the original class ADLParser to something else and give the new class the name ADLParser, with the same set of constructors. Then, in the constructor of the new class that takes no encoding, we instantiate the real parser with UTF-8 encoding.
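A rough sketch of that idea follows. Everything here is hypothetical: RealADLParser stands in for the renamed generated parser, only two constructors are shown, and parse() is left out. The point is only that the constructor without an encoding delegates with "UTF-8" instead of null.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical stand-in for the existing parser class after renaming.
class RealADLParser {
    private final InputStream input;
    private final String encoding;

    RealADLParser(InputStream input, String encoding) {
        this.input = input;
        this.encoding = encoding;
    }

    String getEncoding() {
        return encoding;
    }
}

// Hypothetical new class that keeps the old name and constructor shapes,
// so existing callers compile unchanged.
class ADLParser {
    /** Assumed default, following the openEHR statement that UTF-8 is assumed. */
    static final String DEFAULT_ENCODING = "UTF-8";

    private final RealADLParser delegate;

    /** No encoding given: use UTF-8 rather than the platform default. */
    ADLParser(InputStream input) {
        this(input, DEFAULT_ENCODING);
    }

    /** Explicit encoding: passed through unchanged. */
    ADLParser(InputStream input, String encoding) {
        this.delegate = new RealADLParser(input, encoding);
    }

    String getEncoding() {
        return delegate.getEncoding();
    }

    public static void main(String[] args) {
        InputStream empty = new ByteArrayInputStream(new byte[0]);
        System.out.println(new ADLParser(empty).getEncoding());
        System.out.println(new ADLParser(empty, "ISO-8859-1").getEncoding());
    }
}
```

With this shape, testParsingWithoutUTF8Encoding would no longer depend on the machine it runs on, and callers that pass an encoding explicitly would see no change in behaviour.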