proposed time zone package changes: Brazil; Mauritius; URL fixes

Below find proposed changes to files in the time zone package:

* australasia, europe, and southamerica: changes by Halász Sándor Viktor to fix URLs.
* africa: changes to reflect Mauritius's 2008-2009 DST experiment.
* southamerica and zone.tab: changes to reflect Brazilian time zone realignments.

As usual, I plan to let these percolate over a weekend; if no problems are found tzdata2008d.tar.gz should appear on 2008-07-07.

--ado

diff -r -c old/africa new/africa
*** old/africa Mon May 19 17:48:03 2008
--- new/africa Mon Jun 30 12:18:01 2008
***************
*** 1,4 ****
! # @(#)africa 8.11
  # <pre>

  # This data is by no means authoritative; if you think you know better,
--- 1,4 ----
! # @(#)africa 8.12
  # <pre>

  # This data is by no means authoritative; if you think you know better,
***************
*** 387,395 ****
                        0:00 - GMT

  # Mauritius
! # Zone NAME GMTOFF RULES FORMAT [UNTIL]
! Zone Indian/Mauritius 3:50:00 - LMT 1907 # Port Louis
!                       4:00 - MUT # Mauritius Time

  # Agalega Is, Rodriguez
  # no information; probably like Indian/Mauritius
--- 387,443 ----
                        0:00 - GMT

  # Mauritius
! # From Steffen Thorsen (2008-06-25):
! # Mauritius plans to observe DST from 2008-11-01 to 2009-03-31 on a trial
! # basis.
! #
! # Some information about it from sources there:
! # <a href="http://www.lexpress.mu/display_search_result.php?news_id=109689">
! # http://www.lexpress.mu/display_search_result.php?news_id=109689 (French)
! # </a>
! # <a href="http://www.mcci.org/readmorechamber.aspx?id=359">
! # http://www.mcci.org/readmorechamber.aspx?id=359
! # </a>
! #
! # Based on the articles like those above and contact with government
! # officials, we have written more about it here:
! # <a href="http://www.timeanddate.com/news/time/mauritius-daylight-saving-time.html">
! # http://www.timeanddate.com/news/time/mauritius-daylight-saving-time.html
! # </a>
! #
! # It seems that Mauritius observed daylight saving time from 1982-10-10 to
! # 1983-03-20 as well, but that was not successful. Current zoneinfo or
! # Shanks/Pottenger do not contain any details about this 1982-83 change.
!
! # From Alex Krivenyshev (2008-06-25):
! # Mauritius plan to introduce Summer Time (DST) on 1st November 2008 has
! # been confirmed on authoritative web site-Government Information
! # Service-Government of Mauritius.
! #
! # Please check out
! # <a href="http://economicdevelopment.gov.mu/portal/site/Mainhomepage/menuitem.a42b24128104d9845dabddd154508a0c/?content_id=0a7cee8b5d69a110VgnVCM1000000a04a8c0RCRD">
! # http://economicdevelopment.gov.mu/portal/site/Mainhomepage/menuitem.a42b2412...
! # </a>
! # or
! # <a href="http://www.worldtimezone.com/dst_news/dst_news_mauritius01.html">
! # http://www.worldtimezone.com/dst_news/dst_news_mauritius01.html
! # </a>
! #
! # Thus Summer Time is being introduced on a pilot basis. The clock will
! # be moved forward one hour as from 1st November 2008 to 31st March
! # 2009....
!
! # From Arthur David Olson (2008-06-30):
! # The www.timeanddate.com article cited by Steffen Thorsen notes that "A
! # final decision has yet to be made on the times that daylight saving
! # would begin and end on these dates." As a place holder, use midnight.
!
! # Rule NAME FROM TO TYPE IN ON AT SAVE LETTER/S
! Rule Mauritius 2008 only - Nov 1 0:00 1:00 S
! Rule Mauritius 2009 only - Apr 1 0:00 0:00 -
! # Zone NAME GMTOFF RULES FORMAT [UNTIL]
! Zone Indian/Mauritius 3:50:00 - LMT 1907 # Port Louis
!                       4:00 Mauritius MU%sT # Mauritius Time

  # Agalega Is, Rodriguez
  # no information; probably like Indian/Mauritius
diff -r -c old/australasia new/australasia
*** old/australasia Mon Mar 24 08:30:58 2008
--- new/australasia Mon Jun 30 11:54:56 2008
***************
*** 1,4 ****
! # @(#)australasia 8.8
  # <pre>

  # This file also includes Pacific islands.
--- 1,4 ----
! # @(#)australasia 8.9
  # <pre>

  # This file also includes Pacific islands.
***************
*** 1346,1352 ****
  # * Tonga will introduce DST in November
  #
  # I was given this link by John Letts:
! # <a hef="http://news.bbc.co.uk/hi/english/world/asia-pacific/newsid_424000/424764.stm">
  # http://news.bbc.co.uk/hi/english/world/asia-pacific/newsid_424000/424764.stm
  # </a>
  #
--- 1346,1352 ----
  # * Tonga will introduce DST in November
  #
  # I was given this link by John Letts:
! # <a href="http://news.bbc.co.uk/hi/english/world/asia-pacific/newsid_424000/424764.stm">
  # http://news.bbc.co.uk/hi/english/world/asia-pacific/newsid_424000/424764.stm
  # </a>
  #
***************
*** 1356,1362 ****
  # (12 + 1 hour DST).

  # From Arthur David Olson (1999-09-20):
! # According to <a href="http://www.tongaonline.com/news/sept1799.html>
  # http://www.tongaonline.com/news/sept1799.html
  # </a>:
  # "Daylight Savings Time will take effect on Oct. 2 through April 15, 2000
--- 1356,1362 ----
  # (12 + 1 hour DST).

  # From Arthur David Olson (1999-09-20):
! # According to <a href="http://www.tongaonline.com/news/sept1799.html">
  # http://www.tongaonline.com/news/sept1799.html
  # </a>:
  # "Daylight Savings Time will take effect on Oct. 2 through April 15, 2000
diff -r -c old/europe new/europe
*** old/europe Mon Mar 24 08:30:58 2008
--- new/europe Mon Jun 30 11:54:56 2008
***************
*** 1,4 ****
! # @(#)europe 8.12
  # <pre>

  # This data is by no means authoritative; if you think you know better,
--- 1,4 ----
! # @(#)europe 8.13
  # <pre>

  # This data is by no means authoritative; if you think you know better,
***************
*** 457,463 ****
  Rule EU 1981 max - Mar lastSun 1:00u 1:00 S
  Rule EU 1996 max - Oct lastSun 1:00u 0 -
  # The most recent directive covers the years starting in 2002. See:
! # <a href="http://europa.eu.int/eur-lex/en/lif/dat/2000/en_300L0084.html"
  # Directive 2000/84/EC of the European Parliament and of the Council
  # of 19 January 2001 on summer-time arrangements.
  # </a>
--- 457,463 ----
  Rule EU 1981 max - Mar lastSun 1:00u 1:00 S
  Rule EU 1996 max - Oct lastSun 1:00u 0 -
  # The most recent directive covers the years starting in 2002. See:
! # <a href="http://europa.eu.int/eur-lex/en/lif/dat/2000/en_300L0084.html">
  # Directive 2000/84/EC of the European Parliament and of the Council
  # of 19 January 2001 on summer-time arrangements.
  # </a>
***************
*** 1099,1104 ****
--- 1099,1106 ----

  # From Paul Eggert (2003-03-08):
  # <a href="http://www.parlament-berlin.de/pds-fraktion.nsf/727459127c8b66ee8525662300459099/defc77cb784f180ac1256c2b0030274b/$FILE/bersarint.pdf">
+ # http://www.parlament-berlin.de/pds-fraktion.nsf/727459127c8b66ee852566230045...
+ # </a>
  # says that Bersarin issued an order to use Moscow time on May 20.
  # However, Moscow did not observe daylight saving in 1945, so
  # this was equivalent to CEMT (GMT+3), not GMT+4.
diff -r -c old/southamerica new/southamerica
*** old/southamerica Mon Mar 24 08:30:58 2008
--- new/southamerica Mon Jun 30 18:37:16 2008
***************
*** 1,4 ****
! # @(#)southamerica 8.19
  # <pre>

  # This data is by no means authoritative; if you think you know better,
--- 1,4 ----
! # @(#)southamerica 8.22
  # <pre>

  # This data is by no means authoritative; if you think you know better,
***************
*** 546,551 ****
--- 546,595 ----
  # Decretos sobre o Horario de Verao no Brasil
  # </a>.

+ # From Paul Schulze (2008-06-24):
+ # ...by law number 11.662 of April 24, 2008 (published in the "Dirio Oficial da Unio"...)
+ # in Brazil there are changes in the timezones, effective today (00:00am
+ # at June 24, 2008) as follows:
+ #
+ # a) The timezone UTC+5 is e[x]tinguished, with all the Acre state and the
+ # part of the Amazonas state that had this timezone now being put to the
+ # timezone UTC+4
+ # b) The whole Par state now is put at timezone UTC+3, instead of just
+ # part of it, as was before.
+ #
+ # This change follows a proposal of senator Tio Viana of Acre state, that
+ # proposed it due to concerns about open television channels displaying
+ # programs inappropriate to youths in the states that had the timezone
+ # UTC+5 too early in the night. In the occasion, some more corrections
+ # were proposed, trying to unify the timezones of any given state. This
+ # change modifies timezone rules defined in decree 2.784 of 18 June,
+ # 1913.
+
+ # From Rodrigo Severo (2008-06-24):
+ # Just correcting the URL:
+ # <a href="https://www.in.gov.br/imprensa/visualiza/index.jsp?jornal=3Ddo&secao=3D1&pagina=3D1&data=3D25/04/2008">
+ # https://www.in.gov.br/imprensa/visualiza/index.jsp?jornal=3Ddo&secao=3D1&pag...
+ # </a>
+ #
+ # As a result of the above Decree I believe the America/Rio_Branco
+ # timezone shall be modified from UTC-5 to UTC-4 and a new timezone shall
+ # be created to represent the the west side of the Para State. I
+ # suggest this new timezone be called Santarem as the most
+ # important/populated city in the affected area.
+ #
+ # This new timezone would be the same as the Rio_Branco timezone up to
+ # the 2008/06/24 change which would be to UTC-3 instead of UTC-4.
+
+ # From Alex Krivenyshev (2008-06-24):
+ # This is a quick reference page for New and Old Brazil Time Zones map.
+ # <a href="http://www.worldtimezone.com/brazil-time-new-old.php">
+ # http://www.worldtimezone.com/brazil-time-new-old.php
+ # </a>
+ #
+ # - 4 time zones replaced by 3 time zones-eliminating time zone UTC- 05
+ # (state Acre and the part of the Amazonas will be UTC/GMT- 04) - western
+ # part of Par state is moving to one timezone UTC- 03 (from UTC -04).
+
  # Rule NAME FROM TO TYPE IN ON AT SAVE LETTER/S
  # Decree <a href="http://pcdsh01.on.br/HV20466.htm">20,466</a> (1931-10-01)
  # Decree <a href="http://pcdsh01.on.br/HV21896.htm">21,896</a> (1932-01-10)
***************
*** 662,674 ****
  Rule Brazil 2000 2001 - Oct Sun>=8 0:00 1:00 S
  Rule Brazil 2001 2006 - Feb Sun>=15 0:00 0 -
  # Decree 4,399 (2002-10-01) repeals DST in AL, CE, MA, PB, PE, PI, RN, SE.
! # <a href="http://www.presidencia.gov.br/CCIVIL/decreto/2002/D4399.htm"></a>
  Rule Brazil 2002 only - Nov 3 0:00 1:00 S
  # Decree 4,844 (2003-09-24; corrected 2003-09-26) repeals DST in BA, MT, TO.
! # <a href="http://www.presidencia.gov.br/CCIVIL/decreto/2003/D4844.htm"></a>
  Rule Brazil 2003 only - Oct 19 0:00 1:00 S
  # Decree 5,223 (2004-10-01) reestablishes DST in MT.
! # <a href="http://www.planalto.gov.br/ccivil_03/_Ato2004-2006/2004/Decreto/D5223.htm"></a>
  Rule Brazil 2004 only - Nov 2 0:00 1:00 S
  # Decree <a href="http://pcdsh01.on.br/DecHV5539.gif">5,539</a> (2005-09-19),
  # adopted by the same states as before.
--- 706,718 ----
  Rule Brazil 2000 2001 - Oct Sun>=8 0:00 1:00 S
  Rule Brazil 2001 2006 - Feb Sun>=15 0:00 0 -
  # Decree 4,399 (2002-10-01) repeals DST in AL, CE, MA, PB, PE, PI, RN, SE.
! # <a href="http://www.presidencia.gov.br/CCIVIL/decreto/2002/D4399.htm">4,339</a>
  Rule Brazil 2002 only - Nov 3 0:00 1:00 S
  # Decree 4,844 (2003-09-24; corrected 2003-09-26) repeals DST in BA, MT, TO.
! # <a href="http://www.presidencia.gov.br/CCIVIL/decreto/2003/D4844.htm">4,844</a>
  Rule Brazil 2003 only - Oct 19 0:00 1:00 S
  # Decree 5,223 (2004-10-01) reestablishes DST in MT.
! # <a href="http://www.planalto.gov.br/ccivil_03/_Ato2004-2006/2004/Decreto/D5223.htm">5,223</a>
  Rule Brazil 2004 only - Nov 2 0:00 1:00 S
  # Decree <a href="http://pcdsh01.on.br/DecHV5539.gif">5,539</a> (2005-09-19),
  # adopted by the same states as before.
***************
*** 687,693 ****
  # For dates after mid-2008, the above rules with TO="max" are guesses
  # and are quite possibly wrong, but are more likely than no DST at all.

-
  # Zone NAME GMTOFF RULES FORMAT [UNTIL]
  #
  # Fernando de Noronha (administratively part of PE)
--- 731,736 ----
***************
*** 775,782 ****
                        -4:00 - AMT 2004 Oct 1
                        -4:00 Brazil AM%sT
  #
! # west Para (PA), Rondonia (RO)
! # West Para includes Altamira, Oribidos, Prainha, Oriximina, and Santarem.
  Zone America/Porto_Velho -4:15:36 - LMT 1914
                        -4:00 Brazil AM%sT 1988 Sep 12
                        -4:00 - AMT
--- 818,831 ----
                        -4:00 - AMT 2004 Oct 1
                        -4:00 Brazil AM%sT
  #
! # west Para (PA)
! Zone America/Santarem -3:38:48 - LMT 1914
!                       -4:00 Brazil AM%sT 1988 Sep 12
!                       -4:00 - AMT 2008 Jun 24
!                       -3:00 - BRT
! #
! # Rondonia (RO)
! # Rondonia includes Altamira, Oribidos, Prainha, and Oriximina.
  Zone America/Porto_Velho -4:15:36 - LMT 1914
                        -4:00 Brazil AM%sT 1988 Sep 12
                        -4:00 - AMT
***************
*** 808,816 ****
  # Acre (AC)
  Zone America/Rio_Branco -4:31:12 - LMT 1914
                        -5:00 Brazil AC%sT 1988 Sep 12
!                       -5:00 - ACT

-
  # Chile

  # From Eduardo Krell (1995-10-19):
--- 857,865 ----
  # Acre (AC)
  Zone America/Rio_Branco -4:31:12 - LMT 1914
                        -5:00 Brazil AC%sT 1988 Sep 12
!                       -5:00 - ACT 2008 Jun 24
!                       -4:00 - AMT

  # Chile

  # From Eduardo Krell (1995-10-19):
diff -r -c old/zone.tab new/zone.tab
*** old/zone.tab Mon Mar 24 08:30:59 2008
--- new/zone.tab Mon Jun 30 18:37:16 2008
***************
*** 1,4 ****
! # @(#)zone.tab 8.16
  #
  # TZ zone descriptions
  #
--- 1,4 ----
! # @(#)zone.tab 8.18
  #
  # TZ zone descriptions
  #
***************
*** 92,98 ****
  BR -2332-04637 America/Sao_Paulo S & SE Brazil (GO, DF, MG, ES, RJ, SP, PR, SC, RS)
  BR -2027-05437 America/Campo_Grande Mato Grosso do Sul
  BR -1535-05605 America/Cuiaba Mato Grosso
! BR -0846-06354 America/Porto_Velho W Para, Rondonia
  BR +0249-06040 America/Boa_Vista Roraima
  BR -0308-06001 America/Manaus E Amazonas
  BR -0640-06952 America/Eirunepe W Amazonas
--- 92,99 ----
  BR -2332-04637 America/Sao_Paulo S & SE Brazil (GO, DF, MG, ES, RJ, SP, PR, SC, RS)
  BR -2027-05437 America/Campo_Grande Mato Grosso do Sul
  BR -1535-05605 America/Cuiaba Mato Grosso
! BR -0226-05452 America/Santarem W Para
! BR -0846-06354 America/Porto_Velho Rondonia
  BR +0249-06040 America/Boa_Vista Roraima
  BR -0308-06001 America/Manaus E Amazonas
  BR -0640-06952 America/Eirunepe W Amazonas
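For readers who want to see what the proposed Indian/Mauritius Rule and Zone lines imply, here is a rough Python sketch. It is not zic and not part of the tz package. It hard-codes the two placeholder rules from the patch above, assumes the AT times are local wall-clock times (the zic default when no suffix is given), and ignores the pre-1907 LMT period. The function and constant names are made up for illustration, and the dates are the placeholder values from this proposal, not necessarily what was finally adopted.

```python
from datetime import datetime, timedelta
from typing import Tuple

STANDARD_OFFSET = timedelta(hours=4)  # Zone line: GMTOFF 4:00
DST_SAVE = timedelta(hours=1)         # Rule line for 2008: SAVE 1:00
FORMAT = "MU%sT"                      # Zone line: FORMAT, %s takes the rule's LETTER/S

# Assumption: AT is wall-clock time, so each transition is converted to UTC
# with the offset in force just before it.
# Start: 2008-11-01 0:00 at UTC+4 is 2008-10-31 20:00 UTC.
DST_START_UTC = datetime(2008, 11, 1) - STANDARD_OFFSET
# End: 2009-04-01 0:00 at UTC+5 is 2009-03-31 19:00 UTC.
DST_END_UTC = datetime(2009, 4, 1) - (STANDARD_OFFSET + DST_SAVE)


def mauritius_local(utc: datetime) -> Tuple[datetime, str]:
    """Return (local time, abbreviation) for a naive UTC datetime."""
    in_dst = DST_START_UTC <= utc < DST_END_UTC
    save = DST_SAVE if in_dst else timedelta(0)
    letter = "S" if in_dst else ""  # LETTER/S "-" means an empty string
    return utc + STANDARD_OFFSET + save, FORMAT % letter


for moment in (datetime(2008, 10, 31, 19, 59), datetime(2008, 10, 31, 20, 0),
               datetime(2009, 3, 31, 18, 59), datetime(2009, 3, 31, 19, 0)):
    print(moment, "->", *mauritius_local(moment))
```

If the final decision on the transition times differs from midnight, only the two transition constants would need to change.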

On Mon, Jun 30, 2008 at 7:42 PM, Arthur David Olson <olsona@elsie.nci.nih.gov> wrote:
Below find proposed changes to files in the time zone package: * australasia, europe, and southamerica: changes by Halász Sándor Viktor to fix URLs. * africa: changes to reflect Mauritius's 2008-2009 DST experiment. * southamerica and zone.tab: changes to reflect Brazilian time zone realignments.
As usual, I plan to let these percolate over a weekend; if no problems are found tzdata2008d.tar.gz should appear on 2008-07-07.
The text by Paul Schulze about Brazilian timezones is missing all accented characters. Here is the text with the proper characters:

+ # From Paul Schulze (2008-06-24):
+ # ...by law number 11.662 of April 24, 2008 (published in the "Diário Oficial da União"...)
+ # in Brazil there are changes in the timezones, effective today (00:00am
+ # at June 24, 2008) as follows:
+ #
+ # a) The timezone UTC+5 is e[x]tinguished, with all the Acre state and the
+ # part of the Amazonas state that had this timezone now being put to the
+ # timezone UTC+4
+ # b) The whole Pará state now is put at timezone UTC+3, instead of just
+ # part of it, as was before.
+ #
+ # This change follows a proposal of senator Tião Viana of Acre state, that
+ # proposed it due to concerns about open television channels displaying
+ # programs inappropriate to youths in the states that had the timezone
+ # UTC+5 too early in the night. In the occasion, some more corrections
+ # were proposed, trying to unify the timezones of any given state. This
+ # change modifies timezone rules defined in decree 2.784 of 18 June,
+ # 1913.

Regards,
Rodrigo Severo

On 01.07.2008 01:26, Rodrigo Severo wrote:
The text by Paul Schulze about Brazilian timezones is missing all accented characters. Here is the text with the proper characters:
I would like to use the opportunity to clarify the question of the encoding of non-ASCII characters in the tzdata files. This is only a minor point because they only occur in the comments but I think it should at least be defined.
In tzdata2008c there seems to be only one non-ASCII character, the accented e in the name José Miguel Garrido in the file southamerica. It is obviously encoded in ISO 8859-1 (Latin1). If more non-ASCII characters are going to be included in the tzdata files, I would like to propose to define UTF-8 as the official encoding of the tzdata files. UTF-8 is widely supported and is a true superset of 7-bit ASCII, so it does not change the encoding of the actual data. I think it is only a question of time until the name of a contributor, a location, or an official publication cannot be properly represented in any single 8-bit encoding. For example, the letter "r" in my surname should really be "ř", "Latin Small Letter R With Caron" (U+0159) which is not part of ISO 8859-1. Best regards Martin Jerabek
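To make the encoding difference concrete, here is a small Python sketch (an illustration only, not part of the tz package). It compares how the name mentioned above is stored in ISO 8859-1 and in UTF-8, checks that an ASCII-only data line is byte-for-byte identical in both, and confirms that U+0159 has no ISO 8859-1 representation. The sample zone line and the variable names are chosen purely for illustration.

```python
# Illustration only: compare ISO 8859-1 and UTF-8 encodings of tzdata-style text.
name = "Jos\u00e9 Miguel Garrido"  # contains U+00E9 (e with acute accent)
zone_line = "Zone Indian/Mauritius 3:50:00 - LMT 1907"  # ASCII-only data line

latin1_bytes = name.encode("iso-8859-1")
utf8_bytes = name.encode("utf-8")

# U+00E9 is the single byte E9 in ISO 8859-1 and the two bytes C3 A9 in UTF-8.
print(latin1_bytes.hex())
print(utf8_bytes.hex())

# ASCII-only text has the same bytes in ASCII, ISO 8859-1 and UTF-8,
# so switching the declared encoding does not touch the actual data lines.
ascii_unchanged = (
    zone_line.encode("ascii")
    == zone_line.encode("iso-8859-1")
    == zone_line.encode("utf-8")
)

# U+0159 (r with caron) cannot be encoded in ISO 8859-1 at all.
try:
    "\u0159".encode("iso-8859-1")
    r_caron_fits_latin1 = True
except UnicodeEncodeError:
    r_caron_fits_latin1 = False

print(ascii_unchanged, r_caron_fits_latin1)
```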

-On [20080701 09:30], Martin Jerabek (martin.jerabek@isis-papyrus.com) wrote:
If more non-ASCII characters are going to be included in the tzdata files, I would like to propose to define UTF-8 as the official encoding of the tzdata files.
+1 on that.
For example, the letter "r" in my surname should really be "ř", "Latin Small Letter R With Caron" (U+0159) which is not part of ISO 8859-1. [snip] Martin Jerabek
Funny, you sent your email UTF-8 encoded, but leave your ř as r. ;) -- Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai イェルーン ラウフロック ヴァン デル ウェルヴェン http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B The rain comes falling down, my life flows to the ground, no longer feeling the pain, my flame now fading away...

-On [20080701 09:30], Martin Jerabek (martin.jerabek@isis-papyrus.com) wrote:
If more non-ASCII characters are going to be included in the tzdata files, I would like to propose to define UTF-8 as the official encoding of the tzdata files.
I'm a bit ambivalent on this one. In principle, I agree. In practice UTF-8 has at least one little quirk which has caused me problems: Microsoft operating systems always start UTF-8 encoded files with a Byte Order Mark (BOM) (http://en.wikipedia.org/wiki/Byte_Order_Mark). *nix-like operating systems never do (at least in my experience) and at least one perl-based xml parser running on Linux chokes on the BOM. So I have a practical preference for the 7-bit subset of UTF-8 with no BOM (of course I would never dream of calling this ASCII ;) If we go for UTF-8 can we be very firm about whether a BOM is required or prohibited and please make sure it's not permitted. Julian Cable BBC World Service http://www.bbc.co.uk/ This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated. If you have received it in error, please delete it from your system. Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately. Please note that the BBC monitors e-mails sent or received. Further communication will signify your consent to this.

On 01.07.2008 10:53, Julian Cable wrote:
So I have a practical preference for the 7-bit subset of UTF-8 with no BOM (of course I would never dream of calling this ASCII ;)
Well, the 7-bit subset of UTF-8 with no BOM *is* ASCII, so we might as well call it ASCII. ;-) Pure 7-bit ASCII would of course be the most portable encoding but in 2008 we should no longer have to deny non-English [1] speakers and countries the correct spelling of their names and places.
If we go for UTF-8 can we be very firm about whether a BOM is required or prohibited and please make sure it's not permitted.
Yes, definitely. One of the biggest advantages of UTF-8 is that programs which do not support UTF-8 can usually still process UTF-8-encoded files. There are no embedded zero bytes, and the bytes of a multi-byte character are never equal to 7-bit ASCII characters. If a tzdata file suddenly started with hex EF BB BF, the parser would try to interpret these bytes as the start of a rule, and fail. I understand the tendency of using an encoding mark for Unicode files in the Microsoft world, and it is very useful for UTF-16 and UTF-32, but (1) UTF-8 has only one byte order, and (2) adding it would cause more problems than it is worth. I assume that Windows editors which support UTF-8 can also be manually switched to UTF-8 without the need for a BOM. Best regards Martin Jerabek [1] Yes, there are a few languages other than English whose script only needs 7-bit ASCII.
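The two properties described above can be checked with a short Python sketch. This is only an illustration under simple assumptions: `is_comment_line` is a made-up stand-in for a BOM-unaware parser that recognises comments by a leading `#`, and `utf8_is_ascii_safe` is a hypothetical helper name. Neither reflects how zic itself is written.

```python
import codecs


def is_comment_line(line: bytes) -> bool:
    """Naive, BOM-unaware check: a comment line starts with '#'."""
    return line.startswith(b"#")


def utf8_is_ascii_safe(text: str) -> bool:
    """True if no byte of any multi-byte UTF-8 character is in the ASCII range."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return False
    return True


first_line = b"# @(#)africa 8.12"
with_bom = codecs.BOM_UTF8 + first_line  # bytes EF BB BF placed in front

print(is_comment_line(first_line))  # True
print(is_comment_line(with_bom))    # False: the BOM hides the leading '#'

# Every byte of a multi-byte character is 0x80 or above, so such a byte can
# never be mistaken for '#', a tab, a newline, or a zero byte.
sample = "Di\u00e1rio Oficial da Uni\u00e3o, \u0159"
print(utf8_is_ascii_safe(sample))            # True
print(b"\x00" in sample.encode("utf-8"))     # False
```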

Martin Jerabek said:
Pure 7-bit ASCII would of course be the most portable encoding but in 2008 we should no longer have to deny non-English [1] speakers and countries the correct spelling of their names and places. [...] [1] Yes, there are a few languages other than English whose script only needs 7-bit ASCII.
Of course, English is not one of them. -- Clive D.W. Feather | Work: <clive@demon.net> | Tel: +44 20 8495 6138 Internet Expert | Home: <clive@davros.org> | Fax: +44 870 051 9937 Demon Internet | WWW: http://www.davros.org | Mobile: +44 7973 377646 THUS plc | |

Clive D.W. Feather wrote:
[1] Yes, there are a few languages other than English whose script only needs 7-bit ASCII.
Of course, English is not one of them.
It would be naïve to think otherwise. English has a handful of words that need récherché characters. ASCII presents an adequate façade, but my 2¢ worth is that English also benefits from Unicode™. It even needs the odd symbol from outside Latin-1 enough of the time that UTF-8 seems a sensible way to keep our heads above H₂O. :) My only request is to give a bit of warning. Right now, I have the Tcl version of ZIC coded to expect the input files to be Latin-1. It's a one-line change to make them UTF-8, but it's important to know *when* to make it. -- 73 de ke9tv/2, Kevin
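As a hedged illustration of why the timing of that one-line change matters, the Python sketch below (not the Tcl code mentioned above) decodes the same comment text with a matching and a mismatching encoding. `read_comment` is a hypothetical helper; its `encoding` argument plays the role of the one line that would change.

```python
# The same comment text as it would be stored under each encoding.
text = "Jos\u00e9 Miguel Garrido"
stored_as_latin1 = text.encode("iso-8859-1")
stored_as_utf8 = text.encode("utf-8")


def read_comment(raw: bytes, encoding: str) -> str:
    """Decode raw file bytes; the encoding argument is the 'one-line change'."""
    return raw.decode(encoding)


# Matching encodings round-trip cleanly.
latin1_ok = read_comment(stored_as_latin1, "iso-8859-1") == text
utf8_ok = read_comment(stored_as_utf8, "utf-8") == text

# UTF-8 bytes read as Latin-1 decode without error but yield mojibake:
# the two bytes C3 A9 become two separate Latin-1 characters.
mojibake = read_comment(stored_as_utf8, "iso-8859-1")
print(mojibake)

# Latin-1 bytes read as UTF-8 fail outright, because a lone E9 byte
# followed by a space is not a valid UTF-8 sequence.
try:
    read_comment(stored_as_latin1, "utf-8")
    latin1_read_as_utf8_ok = True
except UnicodeDecodeError:
    latin1_read_as_utf8_ok = False

print(latin1_ok, utf8_ok, latin1_read_as_utf8_ok)
```

Switching the reader before the files change gives a decode error on any Latin-1 byte; switching it after gives silently garbled comments. Either way only comments are affected, since the data lines are ASCII.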

I say let's just get it over with and make the change to UTF-8 now. There are no real problems with it; it's only comments, and it is not worth any extra pondering. -- Foreca Ltd Jaakko.Hyvatti@foreca.com Tammasaarenkatu 5, FI-00180 Helsinki, Finland http://www.foreca.com

In message <486AD3BC.1040706@nycap.rr.com>, Kevin Kenny <kkenny2@nycap.rr.com> writes
Clive D.W. Feather wrote:
[1] Yes, there are a few languages other than English whose script only needs 7-bit ASCII. Of course, English is not one of them. It would be naïve to think otherwise. English has a handful of words that need récherché characters. ASCII presents an adequate façade, but my 2¢ worth is that English also benefits from Unicode™.
Indeed, since I don't have an encyclopædiac knowledge of such usages, merely a mental list of examples, I couldn't disagree with you.
My only request is to give a bit of warning. Right now, I have the Tcl version of ZIC coded to expect the input files to be Latin-1. It's a one-line change to make them UTF-8, but it's important to know *when* to make it.
If I've read the updates correctly, all non-ASCII characters have been removed. So this might be a good time to declare that UTF-8 is the rule for the future. -- Clive D.W. Feather | Internet Expert | Work: <clive@demon.net> Tel: +44 20 8495 6138 | Demon Internet | Home: <clive@davros.org> Fax: +44 870 051 9937 | Thus plc | Web: <http://www.davros.org> Please reply to the Reply-To address, which is: <clive@demon.net>

I assume that Windows editors which support UTF-8 can also be manually switched to UTF-8 without the need for a BOM.
Notepad seems to open UTF-8 documents with no BOM OK, but it puts one back on when it saves the file. Julian http://www.bbc.co.uk/

On Tue, Jul 1, 2008 at 1:53 AM, Julian Cable <julian.cable@bbc.co.uk> wrote:
-On [20080701 09:30], Martin Jerabek (martin.jerabek@isis-papyrus.com) wrote:
If more non-ASCII characters are going to be included in the tzdata files, I would like to propose to define UTF-8 as the official encoding of the tzdata files.
In principle, I agree. In practice UTF-8 has at least one little quirk which has caused me problems:
Microsoft operating systems always start UTF-8 encoded files with a Byte Order Mark (BOM) (http://en.wikipedia.org/wiki/Byte_Order_Mark)
*nix-like operating systems never do (at least in my experience) and at least one perl-based xml parser running on Linux chokes on the BOM.
You've mis-characterized the problem. UTF-8 doesn't have the quirk -- MS operating systems have the quirk. See: http://unicode.org/faq/utf_bom.html#BOM We can note one of the parting comments in the FAQ: A particular protocol (e.g. Microsoft conventions for .txt files) may require use of the BOM on certain Unicode data streams, such as files. When you need to conform to such a protocol, use a BOM. We can also note that none of the TZ data files are .txt files (because they do not have the extension .txt in the file name) - and therefore do not need the BOM. Or a tool can be provided that stuffs a UTF-8 BOM (bytes 0xEF 0xBB 0xBF in that sequence) at the start of the file, transferring it to the MS format. MS operating systems are wrong - for all they represent a large proportion of the installed o/s out there. I'm not sure how often the Olson data are handled on MS systems (probably more than I'd expect). So, I would recommend that the code set is defined as UTF-8 without BOM in files - and the files can be converted to UTF-8 with BOM (for use) on systems that need the BOM. -- Jonathan Leffler <jonathan.leffler@gmail.com> #include <disclaimer.h> Guardian of DBD::Informix - v2008.0513 - http://dbi.perl.org "Blessed are we who can laugh at ourselves, for we shall never cease to be amused."
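A conversion tool of the kind suggested above could be as small as the following Python sketch. It is only an illustration: `add_bom`, `strip_bom` and `convert_file` are made-up names, and no such tool ships with the tz package. It works on raw bytes so that the rest of the file is passed through untouched.

```python
import codecs
from pathlib import Path

BOM = codecs.BOM_UTF8  # the three bytes 0xEF 0xBB 0xBF


def add_bom(data: bytes) -> bytes:
    """Prefix a UTF-8 BOM unless one is already present."""
    return data if data.startswith(BOM) else BOM + data


def strip_bom(data: bytes) -> bytes:
    """Remove a single leading UTF-8 BOM if present."""
    return data[len(BOM):] if data.startswith(BOM) else data


def convert_file(src: Path, dst: Path, with_bom: bool) -> None:
    """Copy src to dst, adding or stripping the BOM as requested."""
    data = src.read_bytes()
    dst.write_bytes(add_bom(data) if with_bom else strip_bom(data))


original = b"# @(#)southamerica 8.22\n"
marked = add_bom(original)
print(marked[:3].hex())               # efbbbf
print(strip_bom(marked) == original)  # True
```

Both helpers are idempotent, so running the tool twice in the same direction leaves a file unchanged.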

Jonathan Leffler wrote:
You've mis-characterized the problem. UTF-8 doesn't have the quirk -- MS operating systems have the quirk. See: http://unicode.org/faq/utf_bom.html#BOM
We can note one of the parting comments in the FAQ:
A particular protocol (e.g. Microsoft conventions for .txt files) may require use of the BOM on certain Unicode data streams, such as files. When you need to conform to such a protocol, use a BOM.
We can also note that none of the TZ data files are .txt files (because they do not have the extension .txt in the file name) - and therefore do not need the BOM. Or a tool can be provided that stuffs a UTF-8 BOM (bytes 0xEF 0xBB 0xBF in that sequence) at the start of the file, transferring it to the MS format.
I'm in complete agreement that UTF-8 without BOM is the 'correct' solution. It's worth pointing out that MS Notepad correctly detects and renders UTF-8/no BOM as UTF-8; there's just no way to stop it from writing a BOM when a file is saved. Thus the only people likely to be affected by UTF-8/no BOM are those who download tz files, open and save them in Notepad, then pass these files to a BOM-unaware parser. It all seems fairly unlikely, and really is down to the user's choice of 'faulty' tools. Andy -- FoxClocks
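For the unlucky case described above, a parser can be made tolerant at very little cost. The Python sketch below is one possible approach and not anything the tz code does: it uses Python's standard "utf-8-sig" codec, which decodes UTF-8 and drops one leading BOM if there is one. `decode_tz_text` is a hypothetical helper name.

```python
import codecs


def decode_tz_text(raw: bytes) -> str:
    """Decode UTF-8 file contents, ignoring a leading BOM if one is present."""
    # "utf-8-sig" behaves like "utf-8" on input without a BOM.
    return raw.decode("utf-8-sig")


clean = "# Par\u00e1 (PA)\n".encode("utf-8")
resaved = codecs.BOM_UTF8 + clean  # what an editor that always writes a BOM would produce

# Both variants decode to the same text, so the first line still starts with '#'.
print(decode_tz_text(clean) == decode_tz_text(resaved))  # True
print(decode_tz_text(resaved).startswith("#"))           # True
```

This keeps "UTF-8 without BOM" as the rule for the distributed files while sparing users whose editor added a BOM behind their back.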

On Jul 1, 2008, at 9:56 AM, Andy McDonald wrote:
I'm in complete agreement that UTF-8 without BOM is the 'correct' solution.
It's worth pointing out that MS Notepad correctly detects and renders UTF-8/no BOM as UTF-8; there's just no way to stop it from writing a BOM when a file is saved. Thus the only people likely to be affected by UTF-8/no BOM are those who download tz files, open and save them in Notepad, then pass these files to a BOM-unaware parser. It all seems fairly unlikely, and really is down to the user's choice of 'faulty' tools.
+1 Deborah

On 7/2/2008 1:33 AM, Jonathan Leffler wrote:
So, I would recommend that the code set is defined as UTF-8 without BOM in files - and the files can be converted to UTF-8 with BOM (for use) on systems that need the BOM.
+1 Please note that proposed changes messages for review have to be encoded in UTF-8 as well. Masayoshi

Jonathan Leffler:
You've mis-characterized the problem. UTF-8 doesn't have the quirk -- MS operating systems have the quirk. See: http://unicode.org/faq/utf_bom.html#BOM
[Julian] Well, the way I read that FAQ, BOMs are explicitly mentioned and allowed, which is why we couldn't beat our supplier up over it and had to code around it. But I think everyone who has so far commented is in agreement that if we use UTF-8 we should "Ban the BOM" (sorry). Julian http://www.bbc.co.uk/
participants (13)
- Andy McDonald
- Arthur David Olson
- Clive D. W. Feather
- Clive D.W. Feather
- Deborah Goldsmith
- Jaakko Hyvätti
- Jeroen Ruigrok van der Werven
- Jonathan Leffler
- Julian Cable
- Kevin Kenny
- Martin Jerabek
- Masayoshi Okutsu
- Rodrigo Severo