Re: Non-ASCII encoding

July 1, 2008

Jonathan Leffler wrote:
...
You've mis-characterized the problem.  UTF-8 doesn't have the quirk -- 
MS operating systems have the quirk.  See: 
http://unicode.org/faq/utf_bom.html#BOM
We can note one of the parting comments in the FAQ:
"A particular protocol (e.g. Microsoft conventions for .txt files) may 
require use of the BOM on certain Unicode data streams, such as files. 
When you need to conform to such a protocol, use a BOM."
We can also note that none of the TZ data files are .txt files (because 
they do not have the extension .txt in the file name) - and therefore do 
not need the BOM.  Alternatively, a tool can be provided that stuffs a 
UTF-8 BOM (bytes 0xEF 0xBB 0xBF in that sequence) at the start of the 
file, converting it to the MS format.
I'm in complete agreement that UTF-8 without BOM is the 'correct' solution.
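
For what it's worth, the BOM-stuffing tool Jonathan mentions would be 
tiny. A rough sketch in Python, not a proposal for the tz distribution; 
the function name add_utf8_bom is my own invention and it works on raw 
bytes only:

```python
import codecs


def add_utf8_bom(data: bytes) -> bytes:
    """Return data with the UTF-8 BOM (EF BB BF) prepended.

    Hypothetical helper for illustration. If the data already starts
    with a BOM it is returned unchanged, so the call is safe to repeat.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data
```

Anyone who needs the MS convention could run file contents through 
something like this after download; the distributed files would stay 
BOM-free.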

It's worth pointing out that MS Notepad correctly detects and renders 
UTF-8/no BOM as UTF-8; there's just no way to stop it from writing a BOM 
when a file is saved. Thus the only people likely to be affected by 
UTF-8/no BOM are those who download tz files, open and save them in 
Notepad, then pass these files to a BOM-unaware parser. It all seems 
fairly unlikely, and really is down to the user's choice of 'faulty' tools.
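
Even in that case, making a parser BOM-aware is cheap. A minimal sketch 
in Python, assuming the input is available as bytes; the function name 
decode_utf8_bom_tolerant is hypothetical:

```python
import codecs


def decode_utf8_bom_tolerant(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping one leading BOM if present.

    Hypothetical helper for illustration. Only a BOM at the very start
    is removed; the rest of the data is decoded as ordinary UTF-8.
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")
```

Python's built-in "utf-8-sig" codec behaves the same way on decode, so 
data.decode("utf-8-sig") would do equally well there.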

Andy

--
FoxClocks

Andy McDonald