On 01/10/13 05:04, Ian Abbott wrote:
> I'd prefer the comments to be in UTF-8 without the HTML entities and HTML tags, but the non-comment parts of the files to be restricted to plain-old ASCII. The current HTML mark-up tags seem to have been added around December 1997 or earlier, although there have been URLs in the files since 1996 or earlier. The TZ files pre-date HTML by several years and pre-date UTF-8 by several more years.
The HTML markup has bothered me, too; I have found it more distracting than useful. URLs themselves should be fine, but the <a href='...'> business gets in the way.
> I'm not sure how widespread the adoption of UTF-8 text files is in the big, wide world, but I don't suppose we should care as long as the zic compilers don't break and the systems that zic is run on support 8-bit text files.
There is still the problem that people who edit the files with their own text editors may be hampered. In my normal way of editing text across the network (ssh, LC_ALL=C, and emacs -nw), non-ASCII characters are rendered as ugly hexadecimalish strings that are hard to read. I can work around the problem, but it is an annoyance.
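For what it's worth, the restriction proposed above (UTF-8 allowed in comments, plain ASCII everywhere else) would be easy to check mechanically. Here is a minimal sketch in Python. It assumes that a comment runs from the first '#' to the end of the line, and it ignores the fact that zic lets a '#' be enclosed in double quotes as part of a field. The function name and the sample lines are made up for illustration and are not part of the tz distribution.

```python
def non_ascii_outside_comments(text):
    """Return (line, column, char) for each non-ASCII character
    that appears before the first '#' on a line.

    Assumption: everything from the first '#' onward is a comment.
    Quoted '#' characters inside fields are not handled.
    """
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Keep only the part of the line that precedes the comment.
        data, _sep, _comment = line.partition("#")
        for col, ch in enumerate(data, start=1):
            if ord(ch) > 127:
                problems.append((lineno, col, ch))
    return problems


# Made-up sample input: non-ASCII text is fine in the comments on
# lines 1 and 2, but not in the data part of line 3.
sample = (
    "# Z\u00fcrich data, written in a UTF-8 comment\n"
    "Zone Europe/Zurich 0:34:08 - LMT 1853 Jul 16 # Z\u00fcrich\n"
    "Zone Europe/Z\u00fcrich 1:00 - CET\n"
)

if __name__ == "__main__":
    for lineno, col, ch in non_ascii_outside_comments(sample):
        print(f"line {lineno}, column {col}: non-ASCII character {ch!r}")
```

A check along these lines would only police the data fields; it would not help anyone whose editor or terminal cannot display the UTF-8 comments.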