There was a lot of discussion about this five years ago. Back then, it was decided that ADL files would always be UTF-8.
The problem is that text editors don't know that we always encode ADL as UTF-8. Without the BOM, there is no way for a text editor to know that it is not in some other encoding. Different text editors behave differently; some of them infer that it's UTF-8, presumably by analysing the byte sequences, and they get it right most of the time; but other text editors assume that it's Latin-1, so they don't handle the ADL correctly.
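To make the guessing concrete, here is a minimal sketch of the kind of heuristic an editor might apply. It is an illustration only, not how any particular editor works, and `guess_encoding` is a hypothetical name. An editor that simply defaults to Latin-1 would skip the middle step, which is exactly the failure described above.

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Guess a text file's encoding roughly the way an editor might."""
    # A leading UTF-8 BOM settles the question without reading further.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Without a BOM, the best we can do is check whether the bytes
    # happen to form valid UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Every byte sequence is valid Latin-1, so it serves as a fallback.
        return "latin-1"
```

Note that a pure-ASCII file passes the UTF-8 check trivially, so the heuristic only has evidence to work with once non-ASCII characters appear.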
I am pretty sure that the discussion led to the conclusion that we should always have a BOM, at least when the archetype is served by CKM.
I guess that ~5 years ago UTF-8 was still less widely accepted, and there were problems when you didn’t have the BOM: e.g. people edited the file in a text editor and the non-ASCII characters of non-English languages broke, sometimes without the user noticing, and the file was then uploaded to CKM, creating a character mess.
These problems may be less relevant now, I don’t know, but the conclusion then was that we should always provide the BOM.
While it obviously does not mark a byte order in UTF-8, the BOM is not illegal in UTF-8, and it signals that the file is UTF-8 without anyone first having to look through the file or just assume the encoding.
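For reference, the UTF-8 BOM is just the character U+FEFF encoded as UTF-8, i.e. the three bytes EF BB BF at the start of the file. A small Python check of that fact, using only the standard `codecs` module (the sample string is made up):

```python
import codecs

# The UTF-8 BOM is U+FEFF encoded as UTF-8: the three bytes EF BB BF.
assert codecs.BOM_UTF8 == b"\xef\xbb\xbf"
assert "\ufeff".encode("utf-8") == codecs.BOM_UTF8

# Python's "utf-8-sig" codec writes this signature when encoding.
with_bom = "archetype".encode("utf-8-sig")
assert with_bom == codecs.BOM_UTF8 + b"archetype"
```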
For CKM, it wouldn’t matter much as long as we can safely assume that an uploaded archetype is ALWAYS in UTF8.
I believe that at the moment we actually add the BOM if it is not there.
When it started to appear in archetypes, presumably added by the Archetype Editor, we actually changed the Java parser to be able to deal with the initial BOM.
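The two behaviours described here (adding the BOM when it is missing, and tolerating it when parsing) could look roughly like the sketch below. This is not the CKM or Java parser code; it is written in Python for brevity, and `ensure_bom` and `decode_adl` are hypothetical names.

```python
import codecs

def ensure_bom(data: bytes) -> bytes:
    """Return the bytes with a leading UTF-8 BOM, adding one only if missing."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data

def decode_adl(data: bytes) -> str:
    """Decode ADL bytes as UTF-8, dropping a leading BOM if there is one."""
    # "utf-8-sig" strips one leading BOM when present and otherwise
    # behaves like plain "utf-8", so both kinds of file are accepted.
    return data.decode("utf-8-sig")
```

With this, `decode_adl(raw)` and `decode_adl(ensure_bom(raw))` yield the same text, so a parser fed through `decode_adl` never sees the BOM as a stray first character.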
There are problems either way; the question is which causes fewer problems.
It is not illegal to have, and text editors can get it wrong if it is not present.
Removing it if it is causing a problem seems less problematic to me than messing up characters without realising it. In German, for example, there are only a few of these problematic characters (ä, ö, ü, ß), so you may not notice the problem immediately.
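To illustrate how the characters get messed up, here is a small Python sketch of what happens when UTF-8 bytes are read as Latin-1 and then saved again. The word "Länge" is just an example value, not taken from any archetype.

```python
original = "Länge"  # German for "length"
utf8_bytes = original.encode("utf-8")

# An editor that assumes Latin-1 shows two characters for the single "ä".
misread = utf8_bytes.decode("latin-1")
assert misread == "LÃ¤nge"

# Saving that view back as UTF-8 bakes the damage into the file.
resaved = misread.encode("utf-8")
assert resaved != utf8_bytes
assert resaved.decode("utf-8") == "LÃ¤nge"
```

In a file that is otherwise plain ASCII, only the few words containing such characters change, which is why the corruption is easy to miss before uploading.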
Thanks for all the info. I hadn’t considered what Sebastian said, and it makes a lot of sense: the BOM is needed to deal with languages that don’t use common characters, like Russian or Arabic, because editors can guess the wrong encoding if they find “weird” characters.
From now on I’ll assume that all archetypes downloaded from the CKM have the BOM.