On 01.07.2008 01:26, Rodrigo Severo wrote:
> The text by Paul Schulze about Brazilian timezones is missing all accented characters. Here is the text with the proper characters:

I would like to take this opportunity to clarify the question of how non-ASCII characters are encoded in the tzdata files. This is only a minor point because such characters occur only in the comments, but I think the encoding should at least be defined.

In tzdata2008c there seems to be only one non-ASCII character, the accented e in the name José Miguel Garrido in the file southamerica. It is obviously encoded in ISO 8859-1 (Latin-1).

If more non-ASCII characters are going to be included in the tzdata files, I would like to propose defining UTF-8 as the official encoding of the tzdata files. UTF-8 is widely supported and is a true superset of 7-bit ASCII, so it does not change the encoding of the actual data.

I think it is only a matter of time until the name of a contributor, a location, or an official publication cannot be properly represented in any single 8-bit encoding. For example, the letter "r" in my surname should really be "ř", "Latin Small Letter R With Caron" (U+0159), which is not part of ISO 8859-1.

Best regards
Martin Jerabek
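
P.S. To make the encoding argument concrete, here is a small Python sketch. It is not part of tzdata; the helper names `encodings_of` and `is_valid_utf8` and the sample strings are made up for illustration, and only the standard library is used. It shows that pure ASCII text has the same bytes in ASCII and UTF-8, that the Latin-1 byte for "é" is not valid UTF-8 on its own, and that "ř" cannot be encoded in Latin-1 at all.

```python
def encodings_of(text):
    """Return the bytes of `text` in Latin-1 and UTF-8 (None if not encodable)."""
    result = {}
    for codec in ("latin-1", "utf-8"):
        try:
            result[codec] = text.encode(codec)
        except UnicodeEncodeError:
            # The character does not exist in this encoding.
            result[codec] = None
    return result


def is_valid_utf8(data):
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Illustrative sample strings, not copied from the tzdata files.
ascii_text = "Zone America/Sao_Paulo"
name_with_e_acute = "José"
r_with_caron = "\u0159"  # LATIN SMALL LETTER R WITH CARON

# ASCII text is byte-for-byte identical in ASCII and UTF-8.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

e_acute = encodings_of(name_with_e_acute)
caron = encodings_of(r_with_caron)

print("ASCII unchanged under UTF-8:", same_bytes)
print("José in Latin-1:", e_acute["latin-1"])
print("José in UTF-8:  ", e_acute["utf-8"])
print("Latin-1 bytes valid as UTF-8:", is_valid_utf8(e_acute["latin-1"]))
print("ř in Latin-1:", caron["latin-1"])
print("ř in UTF-8:  ", caron["utf-8"])
```

A check along the lines of `is_valid_utf8` could also be run over each data file to find any bytes that would need converting if UTF-8 were adopted.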