Jonathan Leffler wrote:
You've mis-characterized the problem. UTF-8 doesn't have the quirk -- MS operating systems have the quirk. See: http://unicode.org/faq/utf_bom.html#BOM
We can note one of the parting comments in the FAQ:
A particular protocol (e.g. Microsoft conventions for .txt files) may require use of the BOM on certain Unicode data streams, such as files. When you need to conform to such a protocol, use a BOM.
We can also note that none of the TZ data files are .txt files (they do not have the extension .txt in the file name), and therefore they do not need the BOM. Alternatively, a tool can be provided that stuffs a UTF-8 BOM (bytes 0xEF 0xBB 0xBF, in that sequence) at the start of the file, converting it to the MS convention.
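For illustration only, a tool of the kind suggested above could be as small as the following Python sketch. It is not part of the tz distribution; the names add_utf8_bom and add_bom_to_file are hypothetical. It assumes the file fits in memory and skips files that already start with the BOM.

```python
import codecs

# codecs.BOM_UTF8 is the three-byte sequence 0xEF 0xBB 0xBF.

def add_utf8_bom(data: bytes) -> bytes:
    """Return data with a UTF-8 BOM prepended, unless one is already there."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data

def add_bom_to_file(path: str) -> None:
    """Rewrite the file at path in place so that it starts with a UTF-8 BOM."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(add_utf8_bom(data))
```

Working on raw bytes rather than decoded text means the rest of the file is copied through unchanged, whatever it contains.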
I'm in complete agreement that UTF-8 without BOM is the 'correct' solution. It's worth pointing out that MS Notepad correctly detects and renders UTF-8/no BOM as UTF-8; there's just no way to stop it from writing a BOM when a file is saved. Thus the only people likely to be affected by UTF-8/no BOM are those who download tz files, open and save them in Notepad, then pass these files to a BOM-unaware parser. It all seems fairly unlikely, and really is down to the user's choice of 'faulty' tools.

Andy -- FoxClocks
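For anyone who does hit the Notepad round-trip described above, making a parser tolerate the BOM is cheap. The following is a minimal Python sketch, not taken from any existing tz parser; strip_utf8_bom and read_text_ignoring_bom are hypothetical names.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (0xEF 0xBB 0xBF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def read_text_ignoring_bom(path: str) -> str:
    """Read a UTF-8 text file; the 'utf-8-sig' codec drops a leading BOM."""
    with open(path, encoding="utf-8-sig") as f:
        return f.read()
```

Either function gives the same result for files with and without a BOM, so the parser no longer depends on which editor last saved the file.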