Time Zone Localizations

Mark Davis

June 10, 2004

7:04 p.m.

The Common Locale Data Repository project (CLDR), hosted by the Unicode Consortium (www.unicode.org/cldr/), provides for translations of time zone IDs, based on the public domain time zone database at ftp://elsie.nci.nih.gov/pub/.

A number of issues have come up concerning those translations, and we have put together a proposal for changing the way that is done. The goal would be to make changes in CLDR 1.1, which would be released around mid-October of this year. The current version of the proposal is at:

http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/formatting/tim...

I'd very much appreciate any feedback on the proposal.

Mark
__________________________________
http://www.macchiato.com
► शिष्यादिच्छेत्पराजयम् ◄

Paul Eggert

June 2004

11:19 p.m.

"Mark Davis" <mark.davis@jtcsv.com> writes:

...

http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/formatting/tim...

This web page contains several URLs that I can't follow. For example, <http://snjgsa.ibm.com/~marked/public/zones/zone_log.html> doesn't work for me, since the name snjgsa.ibm.com doesn't resolve. Can you please fix this? Thanks.

Mark Davis

11:43 p.m.

Thanks for pointing this out. My apologies: the links were incorrect. They should now work.

Mark

----- Original Message -----
From: "Paul Eggert" <eggert@CS.UCLA.EDU>
To: "Mark Davis" <mark.davis@jtcsv.com>
Cc: <tz@lecserver.nci.nih.gov>
Sent: Thu, 2004 Jun 10 16:19
Subject: Re: Time Zone Localizations

...

"Mark Davis" <mark.davis@jtcsv.com> writes:

...
http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/formatting/tim...

This web page contains several URLs that I can't follow. For example, <http://snjgsa.ibm.com/~marked/public/zones/zone_log.html> doesn't work for me, since the name snjgsa.ibm.com doesn't resolve. Can you please fix this? Thanks.

Guy Harris

1:11 a.m.

On Jun 10, 2004, at 12:04 PM, Mark Davis wrote:

...

I'd very much appreciate any feedback on the proposal.

Some of the countries listed as missing zones are:

Bouvet Island - an uninhabited volcanic island, almost entirely covered by glaciers, controlled by Norway, and designated as a nature reserve, according to

http://www.cia.gov/cia/publications/factbook/geos/bv.html

I don't know if the automated meteorological station on the island cares about time zones or not.

Heard Island and McDonald Islands - uninhabited, barren, sub-Antarctic islands now controlled by Australia, designated as a nature preserve, according to

http://www.cia.gov/cia/publications/factbook/geos/hm.html

They don't even mention any automated meteorological stations, just seals and birds.

Yugoslavia - it's now Serbia and Montenegro. Europe/Belgrade is the correct zone for it.

Some of the time zones listed as missing countries are:

Europe/Belgrade: Serbia and Montenegro, which has the ISO 3166-1 Alpha-2 code CS, according to

http://www.iso.org/iso/en/prods-services/iso3166ma/01whats-new/2003-07-23_statement_cs.html

Asia/Riyadh{87,88,89}: Saudi Arabia, SA - those are historical, from an era when Saudi Arabia used solar time, and apply only to Riyadh (and, if you're really fussy, to a particular location in Riyadh, I guess), so they're not appropriate for Saudi Arabia as a whole. I don't know what names you'd give them.

Etc/GMT{[+-]N} are just for fixed GMT offsets; they don't correspond to countries.

WET, CET, MET, and EET "are for backward compatibility with older versions"; various Europe/XXX rules should presumably be used instead - I guess you could pick cities for each of them.

Mark Davis

1:40 a.m.

Thanks for your feedback.

...

Bouvet Island - an uninhabited volcanic island, almost entirely ... Etc/GMT{[+-]N} are just for fixed GMT offsets; they don't correspond to countries.

Yes, we realize that Bouvet Island and Heard Island and McDonald Islands are completely obscure places; it is more a matter of API/testing completeness.

Understood that Etc/GMT... don't correspond to countries. But in an API and for translation, it is useful to have everything attached to a country, even if it is a pseudo-country. That's why the suggestion in the document is to use ZZ for them, which is a private-use ISO country code, which can be translated as "no country".

As to Yugoslavia, that is a real mess, because the ISO committee just doesn't care about stability of identifiers. You can have a database set up with someone's country of birth stored as CS. All of a sudden by some whim of ISO, that data is invalidated. More on that at http://www.unicode.org/consortium/utc-positions.html#2stability.
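
As a rough illustration of the "everything attached to a country" idea, here is a minimal Python sketch that falls back to the private-use code ZZ when a zone has no country. The table entries and the name `country_for_zone` are made up for this example; they are not taken from the CLDR proposal.

```python
# Hypothetical zone-to-country table; the entries are illustrative only.
ZONE_TO_COUNTRY = {
    "Europe/Belgrade": "CS",
    "Asia/Riyadh": "SA",
    "America/Denver": "US",
}

# Private-use code suggested in the message, displayed as "no country".
NO_COUNTRY = "ZZ"


def country_for_zone(zone_id):
    """Return the country code for zone_id, or ZZ when none applies."""
    return ZONE_TO_COUNTRY.get(zone_id, NO_COUNTRY)


print(country_for_zone("Europe/Belgrade"))
print(country_for_zone("Etc/GMT+5"))
```

With a fallback like this, callers never have to special-case zones such as Etc/GMT+5; they simply receive a pseudo-country that can be translated like any other.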

...

Asia/Riyadh{87,88,89}: Saudi Arabia, SA - those are historical, from an era when Saudi Arabia used solar time, and apply only to Riyadh (and, if you're really fussy, to a particular location in Riyadh, I guess), so they're not appropriate for Saudi Arabia as a whole. I don't know what names you'd give them. ... WET, CET, MET, and EET "are for backward compatibility with older versions"; various Europe/XXX rules should presumably be used instead - I guess you could pick cities for each of them.

For these, I guess my recommendation would be to not bother translating them at all -- they are all compatibility orphans, one wouldn't encourage their use.

Mark

----- Original Message -----
From: "Guy Harris" <guy@alum.mit.edu>
To: "Mark Davis" <mark.davis@jtcsv.com>
Cc: <tz@lecserver.nci.nih.gov>
Sent: Thu, 2004 Jun 10 18:11
Subject: Re: Time Zone Localizations

...

On Jun 10, 2004, at 12:04 PM, Mark Davis wrote:

...
I'd very much appreciate any feedback on the proposal.

Some of the countries listed as missing zones are:

Bouvet Island - an uninhabited volcanic island, almost entirely covered by glaciers, controlled by Norway, and designated as a nature reserve, according to

http://www.cia.gov/cia/publications/factbook/geos/bv.html

I don't know if the automated meteorological station on the island cares about time zones or not.

Heard Island and McDonald Islands - uninhabited, barren, sub-Antarctic islands now controlled by Australia, designated as a nature preserve, according to

http://www.cia.gov/cia/publications/factbook/geos/hm.html

They don't even mention any automated meteorological stations, just seals and birds.

Yugoslavia - it's now Serbia and Montenegro. Europe/Belgrade is the correct zone for it.

Some of the time zones listed as missing countries are:

Europe/Belgrade: Serbia and Montenegro, which has the ISO 3166-1 Alpha-2 code CS, according to

http://www.iso.org/iso/en/prods-services/iso3166ma/01whats-new/2003-07-23_statement_cs.html

Asia/Riyadh{87,88,89}: Saudi Arabia, SA - those are historical, from an era when Saudi Arabia used solar time, and apply only to Riyadh (and, if you're really fussy, to a particular location in Riyadh, I guess), so they're not appropriate for Saudi Arabia as a whole. I don't know what names you'd give them.

Etc/GMT{[+-]N} are just for fixed GMT offsets; they don't correspond to countries.

WET, CET, MET, and EET "are for backward compatibility with older versions"; various Europe/XXX rules should presumably be used instead - I guess you could pick cities for each of them.

Guy Harris

2:06 a.m.

On Jun 10, 2004, at 6:40 PM, Mark Davis wrote:

...

As to Yugoslavia, that is a real mess, because the ISO committee just doesn't care about stability of identifiers. You can have a database set up with someone's country of birth stored as CS. All of a sudden by some whim of ISO, that data is invalidated. More on that at http://www.unicode.org/consortium/utc-positions.html#2stability.

Yes, but the Europe/Belgrade zone isn't missing a country (the country happens to have changed, and its name and 3166 country code changed as a result, but it's currently a zone for Serbia and Montenegro), and Yugoslavia either

1) isn't a country any more, and thus can't have a zone

or

2) is now called "Serbia and Montenegro" and thus has the Europe/Belgrade zone.

I.e., it's not that Europe/Belgrade is missing a country, it's that it's missing a country with a stable ISO 3166 country code, and it's not that Yugoslavia is missing a zone, it's that it's not called "Yugoslavia" any more.

The only way I'd see that as being an issue would be if, in the proposed lookup mechanism, a "country" is identified by its Alpha-2 ISO 3166-1 code; that's the best-known code, and looks as if it might be the only one for which you don't have to pay the ISO a significant amount of money, but if Alpha-3 or the numeric code doesn't have the stability problems that Alpha-2 has, perhaps, if a unique identifier for countries is needed, you could use that. (Or use some special internal codes for Serbia and Montenegro and Czechoslovakia that aren't two-letter codes.)

I.e., it's a mess, but it's not enough of a mess to make it impossible to talk about times and time zones in Serbia and Montenegro in CLDR, or to speak of the locale for the region corresponding to Europe/Belgrade, or to use "ZZ" as the country code for "Europe/Belgrade".

...

...
Asia/Riyadh{87,88,89}: Saudi Arabia, SA - those are historical, from an era when Saudi Arabia used solar time, and apply only to Riyadh (and, if you're really fussy, to a particular location in Riyadh, I guess), so they're not appropriate for Saudi Arabia as a whole. I don't know what names you'd give them. ... WET, CET, MET, and EET "are for backward compatibility with older versions"; various Europe/XXX rules should presumably be used instead - I guess you could pick cities for each of them.

For these, I guess my recommendation would be to not bother translating them at all -- they are all compatibility orphans, one wouldn't encourage their use.

Asia/Riyadh{87,88,89} aren't compatibility orphans; the corresponding country is Saudi Arabia, we just don't happen to have time zone information for the entire country prior to their adopting a standard time zone - and, according to mail from Paul Eggert:

http://www.imc.org/ietf-calendar/archive1/msg02618.html

"Nobody uses solar time any more for civil time. The last holdout was Saudi Arabia; they converted to UTC+3 in 1950."

so I'm not sure why we even still have the "solar{87,88,89}" files any more, as that seems to imply that they *didn't* use solar time in 1987, 1988, or 1989.

Paul Eggert

6:13 a.m.

Guy Harris <guy@alum.mit.edu> writes:

...

"Nobody uses solar time any more for civil time. The last holdout was Saudi Arabia; they converted to UTC+3 in 1950."

so I'm not sure why we even still have the "solar{87,88,89}" files any more, as that seems to imply that they *didn't* use solar time in 1987, 1988, or 1989.

I think those files are there more as a proof of concept than anything else. As far as I can determine, the Saudis use UTC+3 for all civil time, and have done so since 1950. For religious purposes they use direct astronomical observations to determine key times. In practice, many of these observations have turned out to be incorrect, which adds to the fun.

The rest of this message isn't all that relevant to times, but it is relevant to anybody wanting to internationalize for Saudi Arabia.

Saudi Arabia is different in many ways. Their religious calendar differs from their main civil calendar. And their main civil calendar not only differs from the Gregorian calendar: it also differs from the calendars used by Muslims outside the Arabian peninsula. To make things even more interesting, they changed their civil calendrical system in 1999 and again in 2002. One more thing: the Saudis use a solar zodiacal calendar for some civil purposes (the fiscal year, and their National Day holiday).

Reference:

Robert Harry van Gent
The Umm al-Qura Calendar of Saudi Arabia (2003-09-11)
<http://www.phys.uu.nl/~vgent/islam/mecca/ummalqura.htm>

Mark Davis

2:30 p.m.

I'm afraid I left the wrong impression, by being too brief. I agree with your statements: one can do two possible things:

// maintaining stability
YU => Europe/Belgrade
CS => Europe/Prague

or

// not maintaining stability
YU => Europe/Belgrade
CS => Europe/Belgrade

Note that the draft successor to RFC3066 uses the UN geographic codes whenever ISO duplicates a code, since unlike the ISO codes, the UN codes are stable. So it maps

891 => Europe/Belgrade // 891 is Serbia and Montenegro

Mark

----- Original Message -----
From: "Guy Harris" <guy@alum.mit.edu>
To: "Mark Davis" <mark.davis@jtcsv.com>
Cc: <tz@lecserver.nci.nih.gov>
Sent: Thu, 2004 Jun 10 19:06
Subject: Re: Time Zone Localizations

...

On Jun 10, 2004, at 6:40 PM, Mark Davis wrote:

...
As to Yugoslavia, that is a real mess, because the ISO committee just doesn't care about stability of identifiers. You can have a database set up with someone's country of birth stored as CS. All of a sudden by some whim of ISO, that data is invalidated. More on that at http://www.unicode.org/consortium/utc-positions.html#2stability.

Yes, but the Europe/Belgrade zone isn't missing a country (the country happens to have changed, and its name and 3166 country code changed as a result, but it's currently a zone for Serbia and Montenegro), and Yugoslavia either

1) isn't a country any more, and thus can't have a zone

or

2) is now called "Serbia and Montenegro" and thus has the Europe/Belgrade zone.

I.e., it's not that Europe/Belgrade is missing a country, it's that it's missing a country with a stable ISO 3166 country code, and it's not that Yugoslavia is missing a zone, it's that it's not called "Yugoslavia" any more.

The only way I'd see that as being an issue would be if, in the proposed lookup mechanism, a "country" is identified by its Alpha-2 ISO 3166-1 code; that's the best-known code, and looks as if it might be the only one for which you don't have to pay the ISO a significant amount of money, but if Alpha-3 or the numeric code doesn't have the stability problems that Alpha-2 has, perhaps, if a unique identifier for countries is needed, you could use that. (Or use some special internal codes for Serbia and Montenegro and Czechoslovakia that aren't two-letter codes.)

I.e., it's a mess, but it's not enough of a mess to make it impossible to talk about times and time zones in Serbia and Montenegro in CLDR, or to speak of the locale for the region corresponding to Europe/Belgrade, or to use "ZZ" as the country code for "Europe/Belgrade".

...
...
Asia/Riyadh{87,88,89}: Saudi Arabia, SA - those are historical, from an era when Saudi Arabia used solar time, and apply only to Riyadh (and, if you're really fussy, to a particular location in Riyadh, I guess), so they're not appropriate for Saudi Arabia as a whole. I don't know what names you'd give them. ... WET, CET, MET, and EET "are for backward compatibility with older versions"; various Europe/XXX rules should presumably be used instead - I guess you could pick cities for each of them.

For these, I guess my recommendation would be to not bother translating them at all -- they are all compatibility orphans, one wouldn't encourage their use.

Asia/Riyadh{87,88,89} aren't compatibility orphans; the corresponding country is Saudi Arabia, we just don't happen to have time zone information for the entire country prior to their adopting a standard time zone - and, according to mail from Paul Eggert:

http://www.imc.org/ietf-calendar/archive1/msg02618.html

"Nobody uses solar time any more for civil time. The last holdout was Saudi Arabia; they converted to UTC+3 in 1950."

so I'm not sure why we even still have the "solar{87,88,89}" files any more, as that seems to imply that they *didn't* use solar time in 1987, 1988, or 1989.
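
The two mapping options listed at the top of Mark's message can be written as plain lookup tables. This is only a sketch: the dictionary names and the `zone_for` helper are invented for illustration, and the entries are just the ones given in the message.

```python
# Option 1 from the message: maintaining stability.
# CS keeps pointing at Europe/Prague (CS was formerly Czechoslovakia's code).
STABLE = {"YU": "Europe/Belgrade", "CS": "Europe/Prague"}

# Option 2 from the message: not maintaining stability.
# CS follows its reassignment to Serbia and Montenegro.
NOT_STABLE = {"YU": "Europe/Belgrade", "CS": "Europe/Belgrade"}

# UN numeric code, as used by the draft RFC 3066 successor mentioned above.
UN_NUMERIC = {"891": "Europe/Belgrade"}  # 891 is Serbia and Montenegro


def zone_for(code, table):
    """Look up the zone for a region code; None if the table has no entry."""
    return table.get(code)


for name, table in [("stable", STABLE), ("not stable", NOT_STABLE)]:
    print(name, "CS ->", zone_for("CS", table))
print("891 ->", zone_for("891", UN_NUMERIC))
```

The only entry that differs between the two alpha-2 tables is CS, which is exactly where stored data would change meaning.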

Clive D.W. Feather

2:36 p.m.

Mark Davis said:

...

Note that the draft successor to RFC3066

Can you give me a pointer to that?

...

uses the UN geographic codes whenever ISO duplicates a code, since unlike the ISO codes, the UN codes are stable.

Except when they aren't. Like Ethiopia or Panama.

Mark Davis

3:26 p.m.

Hmmm. If the UN codes are not stable either, then we have even more issues to deal with.

Mark

----- Original Message -----
From: "Clive D.W. Feather" <clive@demon.net>
To: "Mark Davis" <mark.davis@jtcsv.com>
Cc: "Guy Harris" <guy@alum.mit.edu>; <tz@lecserver.nci.nih.gov>
Sent: Fri, 2004 Jun 11 07:36
Subject: Re: Time Zone Localizations

...

Mark Davis said:

...
Note that the draft successor to RFC3066

Can you give me a pointer to that?

...
uses the UN geographic codes whenever ISO duplicates a code, since unlike the ISO codes, the UN codes are stable.

Except when they aren't. Like Ethiopia or Panama.

--
Clive D.W. Feather | Work:   <clive@demon.net>     | Tel:    +44 20 8495 6138
Internet Expert    | Home:   <clive@davros.org>    | Fax:    +44 870 051 9937
Demon Internet     | WWW:    http://www.davros.org | Mobile: +44 7973 377646
Thus plc           |                               |

Infoman

5:22 p.m.

Mark,

The issue of a standard for a systematic approach to change management for codes in coded domains is one I have been advocating and working on for a number of years. Unfortunately, it always gets mixed up in the agendas of others.

regards - Jake Knoppers

-----Original Message-----
From: Mark Davis [mailto:mark.davis@jtcsv.com]
Sent: June 11, 2004 11:27 AM
To: Clive D.W. Feather
Cc: Guy Harris; tz@lecserver.nci.nih.gov
Subject: Re: Time Zone Localizations

Hmmm. If the UN codes are not stable either, then we have even more issues to deal with.

Mark

----- Original Message -----
From: "Clive D.W. Feather" <clive@demon.net>
To: "Mark Davis" <mark.davis@jtcsv.com>
Cc: "Guy Harris" <guy@alum.mit.edu>; <tz@lecserver.nci.nih.gov>
Sent: Fri, 2004 Jun 11 07:36
Subject: Re: Time Zone Localizations

...

Mark Davis said:

...
Note that the draft successor to RFC3066

Can you give me a pointer to that?

...
uses the UN geographic codes whenever ISO duplicates a code, since unlike the ISO codes, the UN codes are stable.

Except when they aren't. Like Ethiopia or Panama.


jcowan@reutershealth.com

3:40 p.m.

Clive D.W. Feather scripsit:

...

...
uses the UN geographic codes whenever ISO duplicates a code, since unlike the ISO codes, the UN codes are stable.

Except when they aren't. Like Ethiopia or Panama.

They are not stable in the sense of never changing. They are stable in the sense of never being reassigned, unlike both the 2-alphas and the 3-alphas.

--
Some people open all the Windows;       John Cowan
wise wives welcome the spring           jcowan@reutershealth.com
by moving the Unix.                     http://www.reutershealth.com
  --ad for Unix Book Units (U.K.)       http://www.ccil.org/~cowan
        (see http://cm.bell-labs.com/cm/cs/who/dmr/unix3image.gif)

Clive D.W. Feather

11:36 a.m.

jcowan@reutershealth.com said:

...

...
...
uses the UN geographic codes whenever ISO duplicates a code, since unlike the ISO codes, the UN codes are stable.

...

They are not stable in the sense of never changing. They are stable in the sense of never being reassigned, unlike both the 2-alphas and the 3-alphas.

You hope. What guarantee do you have?

Infoman

12:17 p.m.

ISO rules in ISO 3166-1 allow for the re-assignment of 2-alpha codes ten years after a previously assigned code was withdrawn.

regards - Jake

-----Original Message-----
From: Clive D.W. Feather [mailto:clive@demon.net]
Sent: June 15, 2004 7:36 AM
To: jcowan@reutershealth.com
Cc: tz@lecserver.nci.nih.gov
Subject: Re: Time Zone Localizations

jcowan@reutershealth.com said:

...

...
...
uses the UN geographic codes whenever ISO duplicates a code, since unlike the ISO codes, the UN codes are stable.

...

They are not stable in the sense of never changing. They are stable in the sense of never being reassigned, unlike both the 2-alphas and the 3-alphas.

jcowan@reutershealth.com

9:25 p.m.

Clive D.W. Feather scripsit:

...

jcowan@reutershealth.com said:

...
...
...
uses the UN geographic codes whenever ISO duplicates a code, since unlike the ISO codes, the UN codes are stable.

...
They are not stable in the sense of never changing. They are stable in the sense of never being reassigned, unlike both the 2-alphas and the 3-alphas.

You hope. What guarantee do you have?

There are no guarantees in life. The U.N. has a formal policy that the numbers don't get reassigned unless the national territory changes substantially (Serbia & Montenegro has a new number compared to Yugoslavia) and don't get reused.

--
"How they ever reached any conclusion at all    <jcowan@reutershealth.com>
is starkly unknowable to the human mind."       http://www.reutershealth.com
        --"Backstage Lensman", Randall Garrett  http://www.ccil.org/~cowan

Clive D.W. Feather

5:39 a.m.

Mark Davis said:

...

Understood that Etc/GMT... don't correspond to countries. But in an API and for translation, it is useful to have everything attached to a country, even if it is a pseudo-country. That's why the suggestion in the document is to use ZZ for them, which is a private-use ISO country code, which can be translated as "no country".

I wouldn't use ZZ, because it's useful for AA and ZZ to lie outside the range of codes used. Rather, use one of the other private-use codes - isn't OO available? If I've misremembered that, then certainly QQ and QZ are.

Infoman

11:33 a.m.

In ISO 3166-1, in the 3-digit numeric codes, the 900-999 range is available for private use. Some of these are already "taken", i.e. in widespread use. I note that banking and financial services (ISO TC68) standards for financial transactions do use the 3-digit numeric code because it is the most stable.

Attached is a document which may be of interest. Comments welcome on the proposed default conventions.

With respect to pseudo-countries, one should note that nearly 25% of the entities in ISO 3166-1 are not "countries", i.e. UN member nation states. I will forward some information on this next week.

I am involved in a JTC1/SC32/WG1 eBusiness standard pertaining to "jurisdictional domains". Here date/time referencing and localization requirements are important.

regards - Jake Knoppers

-----Original Message-----
From: Clive D.W. Feather [mailto:clive@demon.net]
Sent: June 11, 2004 1:40 AM
To: Mark Davis
Cc: Guy Harris; tz@lecserver.nci.nih.gov
Subject: Re: Time Zone Localizations

Mark Davis said:

...

Understood that Etc/GMT... don't correspond to countries. But in an API and for translation, it is useful to have everything attached to a country, even if it is a pseudo-country. That's why the suggestion in the document is to use ZZ for them, which is a private-use ISO country code, which can be translated as "no country".

Clive D.W. Feather

11:42 a.m.

Infoman said:

...

In ISO 3166-1, in the 3-digit numeric codes, the 900-999 range is available for private use. Some of these are already "taken", i.e. in widespread use. I note that banking and financial services (ISO TC68) standards for financial transactions do use the 3-digit numeric code because it is the most stable.

The numeric codes serve a different purpose to the alphabetic ones - they identify physical territory. So when a country changes name and therefore alpha code, the number stays the same. But when the territory changes (e.g. absorption of East by West Germany) the number changes even if the alpha code stays the same.

Infoman

12:06 p.m.

Clive,

Actually the numbers stay the same, Russia retains the number for the Soviet Union and united Germany retained the number for West-Germany. The perspective here is that of a jurisdictional nature. Even though the physical territory may change, the jurisdictional entity remains. East-Germany stopped being a member of the UN. With the break-up of the Soviet Union, Russia continued with the same numeric code; other codes were added when parts of the SU became UN recognized member states (this goes via the Security Council). The same happened re Ethiopia (& Eritrea), Yugoslavia, etc.

The 3 digit numeric code and the 3-alpha code are under the control of the UN and maintained by the UN Statistical system. The official long and short names of each country are under the control of each country. However changes in names or names for "new" countries are vetted by the UN. Normally proposed changes do go through "automatically". Only rarely does a UN member object (e.g. as is the current case with "Macedonia"). The ISO 3166-1 standard takes this UN originated information and then adds/manages the 2-alpha codes.

trust that this is of some help - Jake Knoppers

-----Original Message-----
From: Clive D.W. Feather [mailto:clive@demon.net]
Sent: June 11, 2004 7:43 AM
To: Infoman
Cc: Mark Davis; Guy Harris; tz@lecserver.nci.nih.gov
Subject: Re: Time Zone Localizations

Infoman said:

...

In ISO 3166-1, in the 3-digit numeric codes, the 900-999 range is available for private use. Some of these are already "taken", i.e. in widespread use. I note that banking and financial services (ISO TC68) standards for financial transactions do use the 3-digit numeric code because it is the most stable.

Clive D.W. Feather

12:49 p.m.

Infoman said:

...

Actually the numbers stay the same, Russia retains the number for the Soviet Union and united Germany retained the number for West-Germany. [...] The same happened re Ethiopia (& Eritrea), Yugoslavia, etc.

Not so.

On 1990-10-30:
    DD DDR 278 withdrawn (3166-3 entry DDDE)
    DE DEU 280 replaced
    DE DEU 276 added
New numeric code for the same political entity (Federal Germany).

On 1992-06-15:
    SU SUN 810 withdrawn (3166-3 entry SUHH)
    RU RUS 643 added (and others, e.g. KZ KAZ 398)
New numeric code because Russia was not the USSR.

On 1990-08-14:
    YD YMD 720 withdrawn (3166-3 entry YDYE)
    YE YEM 886 replaced
    YE YEM 887 added
Merged Yemen got a new number.

On 1993-07-22:
    PA PAN 590 replaced
    PA PAN 591 added
Panama got a new number when it absorbed the Canal Zone (which didn't have one).

On 1993-07-16:
    ET ETH 230 replaced
    ET ETH 231 added
    ER ERI 232 added
Ethiopia got a new number when Eritrea split off.

On 1993-07-28:
    YU YUG 890 replaced
    YU YUG 891 added
Yugoslavia got a new number to recognise that it was really just Serbia and Montenegro (and this number transferred to the new CS SCG code).

...

The perspective here is that of a jurisdictional nature. Even though the physical territory may change, the jurisdictional entity remains.

And the numeric code changes - see above.

...

The 3 digit numeric code and the 3-alpha code are under the control of the UN and maintained by the UN Statistical system. The official long and short names of each country are under the control of each country. However changes in names or names for "new" countries are vetted by the UN. Normally proposed changes do go through "automatically". Only rarely does a UN member object (e.g. as is the current case with "Macedonia").

The ISO 3166-1 standard takes this UN originated information and then adds/manages the 2-alpha codes.

If you look at the ISO3166MA web site, you'll see that only the numeric codes come from the UN Statistics Division. I can't find my copy of 3166, but I believe you'll find this distinction between alpha and numeric codes in it.

I believe you'll also find there is a 4-digit code as well, which is derived from the two letter code by the calculation 1070+30a+b, where a represents the first letter and b the second letter (A=1, B=2, etc.). In this code, AF is 1106 and GB is 1282.
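
The 1070+30a+b calculation described in this message is easy to check by hand or in a few lines of Python. The function name below is invented for this sketch, and it implements only the formula as stated in the message (which Clive gives from memory), not anything taken from the text of the standard.

```python
def four_digit_code(alpha2):
    """Compute 1070 + 30*a + b for a two-letter code, with A=1, B=2, ..., Z=26."""
    if len(alpha2) != 2 or not alpha2.isascii() or not alpha2.isalpha():
        raise ValueError("expected a two-letter ASCII code, got %r" % (alpha2,))
    # Map each letter to its position in the alphabet (A=1 ... Z=26).
    a, b = (ord(letter) - ord("A") + 1 for letter in alpha2.upper())
    return 1070 + 30 * a + b


print(four_digit_code("AF"))  # 1106, matching the message
print(four_digit_code("GB"))  # 1282, matching the message
```

Because the multiplier 30 is larger than 26, every two-letter pair maps to a distinct number under this formula.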

Infoman

2:35 p.m.

Clive,

Thanks very much. My apologies and I stand to be corrected. The information that I sent was based on that provided by the UN people in their Statistical Division. It is now obvious that whoever I spoke to and corresponded with three years ago is not as precise as the "lord of time".

Once again thanks for the correction - Jake Knoppers

-----Original Message-----
From: Clive D.W. Feather [mailto:clive@demon.net]
Sent: June 11, 2004 8:49 AM
To: Infoman
Cc: Mark Davis; Guy Harris; tz@lecserver.nci.nih.gov
Subject: Re: Time Zone Localizations

Infoman said:

...

Actually the numbers stay the same, Russia retains the number for the Soviet Union and united Germany retained the number for West-Germany. [...] The same happened re Ethiopia (& Eritrea), Yugoslavia, etc.

...

The perspective here is that of a jurisdictional nature. Even though the physical territory may change, the jurisdictional entity remains.

And the numeric code changes - see above.

...

The 3 digit numeric code and the 3-alpha code are under the control of the UN and maintained by the UN Statistical system. The official long and short names of each country are under the control of each country. However changes in names or names for "new" countries are vetted by the UN. Normally proposed changes do go through "automatically". Only rarely does a UN member object (e.g. as is the current case with "Macedonia").

The ISO 3166-1 standard takes this UN originated information and then adds/manages the 2-alpha codes.

Clive D.W. Feather

2:37 p.m.

Infoman said:

...

Thanks very much. My apologies and I stand to be corrected.

You're welcome. Can you prod the people who wrote that PDF you circulated earlier? They'll take more notice of you than me.

...

The information that I sent was based on that provided by the UN people in their Statistical Division. It is now obvious that whoever I spoke to and corresponded with three years ago is not as precise as the "lord of time".

Infoman

5:02 p.m.

Clive et al,

I received some other comments and feedback on JTC1N7335. My intention is to (together with David Clemis) draft a new version and include comments and corrections. This should be completed before the end of July 2004. Comments that you and others may have, including the one already made, will be incorporated.

merci - Jake Knoppers

-----Original Message-----
From: Clive D.W. Feather [mailto:clive@demon.net]
Sent: June 11, 2004 10:38 AM
To: Infoman
Cc: Mark Davis; Guy Harris; tz@lecserver.nci.nih.gov
Subject: Re: Time Zone Localizations

Infoman said:

...

Thanks very much. My apologies and I stand to be corrected.

You're welcome. Can you prod the people who wrote that PDF you circulated earlier? They'll take more notice of you than me.

...

The information that I sent was based on that provided by the UN people in their Statistical Division. It is now obvious that whoever I spoke to and corresponded with three years ago is not as precise as the "lord of time".

Chuck Soper

7:21 a.m.

I think that the tables listed at this link could be revised to improve clarity:

http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/formatting/zon...

Are these tables informational or vital to the process? For example, will the Aliases table be used to "Canonicalize the Olson ID" (step 1 of the Fallback procedure)? If so, then the table should be based on the current time zone data files instead of "current Java data". This could turn into a serious maintenance issue. Ideally, I think that a script could be run on the time zone data files to generate a new aliases table each time the time zone data files are updated (currently tzdata2004a).

### Aliases

The Aliases table appears to contain time zone names from several different tzdata2004a files: backward, etcetera, and systemv. If you use several different tables such as backward aliases, etcetera aliases, and systemv aliases then I expect that they would be easier to maintain as the time zone database is updated.

I suggest avoiding terms like 'Bogus' and 'Real' because they're not very descriptive. Here are some possible new terms:

'Bogus' to 'Obsolete Olson ID from tzdata2004a/backward'
'Real' to 'Valid Olson ID as of tzdata2004a'

The 'Bogus' column in the Aliases table contains both obsolete and valid time zone names. Antarctica/South_Pole and America/Shiprock are valid; they are both listed in the tzdata2004a/zone.tab file. These zones correspond to (are equal to) Antarctica/McMurdo and America/Denver respectively but that doesn't necessarily mean that they are invalid or bogus.

Where did AET (line 2, Aliases table) for Australia/Sydney come from? I can't find any reference to it in tzdata2004a.

### Country to Zones

The 'Country to Zones' table looks good, yet it contains at least one obsolete country code, YU for Yugoslavia. I realize that this is a known issue. You can use the tzdata2004a/zone.tab file to generate an updated 'Country to Zones' table.
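
Generating a 'Country to Zones' table from zone.tab could look roughly like the Python sketch below. It relies on zone.tab being tab-separated with the country code, coordinates, and zone ID in the first three columns and `#` marking comment lines. The sample rows are typed in for illustration and may not match the real tzdata2004a file exactly, and the function name is an assumption of this example.

```python
from collections import defaultdict

# Sample rows in zone.tab's tab-separated layout, typed in for illustration;
# coordinates and comments may differ from the real tzdata2004a file.
SAMPLE_ZONE_TAB = "\n".join([
    "# country-code\tcoordinates\tTZ\tcomments",
    "AQ\t-7750+16636\tAntarctica/McMurdo\tMcMurdo Station, Ross Island",
    "AQ\t-9000+00000\tAntarctica/South_Pole\tAmundsen-Scott Station, South Pole",
    "CS\t+4450+02030\tEurope/Belgrade",
    "US\t+394421-1045903\tAmerica/Denver\tMountain Time",
    "US\t+364708-1084111\tAmerica/Shiprock\tMountain Time - Navajo",
])


def country_to_zones(zone_tab_text):
    """Group zone IDs by the country code found in the first column."""
    table = defaultdict(list)
    for line in zone_tab_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a data row in the expected layout
        country, _coordinates, zone = fields[:3]
        table[country].append(zone)
    return dict(table)


for country, zones in sorted(country_to_zones(SAMPLE_ZONE_TAB).items()):
    print(country, ", ".join(zones))
```

Run against the real file (for example `country_to_zones(open("zone.tab").read())`), the same function would rebuild the table each time the time zone data is updated, so an obsolete code such as YU would only appear if zone.tab itself still listed it.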
For obsolete ISO 3166 country codes such as YU I think that ISO 3166-3 could be referenced. ISO 3166-3 represents codes for formerly used names of countries: http://www.niso.org/standards/resources/3166.html http://www.iso.org/iso/en/prods-services/iso3166ma/04background-on-iso-3166/... Each ISO 3166-3 entry has several fields including a four letter code where the first two letters are the formerly used code and the last two letters are the code that replaced it. The four letter code for Yugoslavia is 'YUCS'. The ISO web site says that ISO 3166-3 was first published in 1998 (or maybe 1999), but I cannot find the original document. Yes, it would be ironic if the standard for formerly used names is formerly used. Should CLDR (or the time zone database) maintain a list of formerly used country codes? This would be similar to the backward file to maintain obsolete time zone names. ### Countries that are missing Zones Could this table be renamed to 'Country Codes that do not have Zones'? Of course, Yugoslavia shouldn't be in the table. ### Zones that are missing Countries Could this table be renamed to "Zones that do not map to specific Countries"? Europe/Belgrade is in this list, yet it's listed in the tzdata2004a/zone.tab file with a country code of CS. I suppose this is a know issue related to the YU country code for Yugoslavia. I don't think that Asia/Riyadh87, Asia/Riyadh88, and Asia/Riyadh89 belong in this list. They correspond to the tzdata2004a files: solar87, solar88, and solar89. Should they go in the alias table(s)? ### Windows IDs This table seems to imply that there is a one-to-one relationship between Olson IDs and Windows IDs. Since you only have 75 Windows IDs listed, I suspect that one Windows ID may map to one or more Olson IDs. ### Equivalent Modern Zones My first reaction whenever I see a significant effort to simplify something for the user is to think that it could lead to problems. 
How did you determine that "A lot of people just don't care about historic differences"? Do most people use time zones merely to keep track of the current time or current differences between another time zone? How else do users use time zones? That's all for now. I hope that my comments are useful. Chuck At 6:40 PM -0700 6/10/04, Mark Davis wrote:

...

Thanks for your feedback.

...
Bouvet Island - an uninhabited volcanic island, almost entirely ... Etc/GMT{[+-]N} are just for fixed GMT offsets; they don't correspond to countries.

Yes, we realize that Bouvet Island and Heard Island and McDonald Islands are completely obscure places; it is more for a matter of API/testing completeness. Understood that Etc/GMT... don't correspond to countries. But in an API and for translation, it is useful to have everything attached to a country, even if it is a pseudo-country. That's why the suggestion in the document is to use ZZ for them, which is a private-use ISO country code, which can be translated as "no country".

As to Yugoslavia, that is a real mess, because the ISO committee just doesn't care about stability of identifiers. You can have a database set up with someone's country of birth stored as CS. All of a sudden by some whim of ISO, that data is invalidated. More on that at http://www.unicode.org/consortium/utc-positions.html#2stability.

...
Asia/Riyadh{87,88,89}: Saudi Arabia, SA - those are historical, from an era when Saudi Arabia used solar time, and apply only to Riyadh (and, if you're really fussy, to a particular location in Riyadh, I guess), so they're not appropriate for Saudi Arabia as a whole. I don't know what names you'd give them. ... WET, CET, MET, and EET "are for backward compatibility with older versions"; various Europe/XXX rules should presumably be used instead - I guess you could pick cities for each of them.

For these, I guess my recommendation would be to not bother translating them at all -- they are all compatibility orphans, one wouldn't encourage their use.

Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄

----- Original Message ----- From: "Guy Harris" <guy@alum.mit.edu> To: "Mark Davis" <mark.davis@jtcsv.com> Cc: <tz@lecserver.nci.nih.gov> Sent: Thu, 2004 Jun 10 18:11 Subject: Re: Time Zone Localizations

...
On Jun 10, 2004, at 12:04 PM, Mark Davis wrote:

...
I'd very much appreciate any feedback on the proposal.

Some of the countries listed as missing zones are:

Bouvet Island - an uninhabited volcanic island, almost entirely covered by glaciers, controlled by Norway, and designated as a nature reserve, according to

http://www.cia.gov/cia/publications/factbook/geos/bv.html

I don't know if the automated meteorological station on the island cares about time zones or not.

Heard Island and McDonald Islands - uninhabited, barren, sub-Antarctic islands now controlled by Australia, designated as a nature preserve, according to

http://www.cia.gov/cia/publications/factbook/geos/hm.html

They don't even mention any automated meteorological stations, just seals and birds.

Yugoslavia - it's now Serbia and Montenegro. Europe/Belgrade is the correct zone for it.

Some of the time zones listed as missing countries are:

Europe/Belgrade: Serbia and Montenegro, which has the ISO 3166-1 Alpha-2 code CS, according to

http://www.iso.org/iso/en/prods-services/iso3166ma/01whats-new/2003-07-23_statement_cs.html

Asia/Riyadh{87,88,89}: Saudi Arabia, SA - those are historical, from an era when Saudi Arabia used solar time, and apply only to Riyadh (and, if you're really fussy, to a particular location in Riyadh, I guess), so they're not appropriate for Saudi Arabia as a whole. I don't know what names you'd give them.

Etc/GMT{[+-]N} are just for fixed GMT offsets; they don't correspond to countries.

WET, CET, MET, and EET "are for backward compatibility with older versions"; various Europe/XXX rules should presumably be used instead - I guess you could pick cities for each of them.

Masayoshi Okutsu

2:49 p.m.

Chuck Soper wrote:

...

Are these tables informational or vital to the process? For example, will the Aliases table be used to "Canonicalize the Olson ID" (step 1 of the Fallback procedure)? If so, then the table should be based on the current time zone data files instead of "current Java data". This could turn into a serious maintenance issue. Ideally, I think that a script could be run on the time zone data files to generate a new aliases table each time the time zone data files are updated (currently tzdata2004a).

I agree. And it's unclear what the "current Java data" is. *My* current Java data is tzdata2004a, which is very unlikely to be shipped with 1.5...

...

Where did AET (line 2, Aliases table) for Australia/Sydney come from? I can't find any reference to it in tzdata2004a.

I guess it came from the JDK 1.1 compatibility names. Someone (not me!) invented 3-letter time zone IDs in JDK 1.1, which failed to support more time zones. Java has used the Olson zone IDs since 1.2, but we still keep the JDK 1.1 compatibility names as long as they don't conflict with any Olson zone IDs.

...

### Equivalent Modern Zones My first reaction whenever I see a significant effort to simplify something for the user is to think that it could lead to problems.

I tend to agree with you. I think we should find another way to reduce the number of localized time zone name strings.

...

How did you determine that "A lot of people just don't care about historic differences"? Do most people use time zones merely to keep track of the current time or current differences between another time zone? How else do users use time zones?

Right. And how can we define "current" for time zones which change their offsets some time in the future, like America/Mendoza in tzdata2004a? The day after tomorrow...? :-) Masayoshi

Mark Davis

6:49 p.m.

comments interleaved below. Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄ ----- Original Message ----- From: "Chuck Soper" <chucks@lmi.net> To: <tz@lecserver.nci.nih.gov> Sent: Fri, 2004 Jun 11 00:21 Subject: Re: Time Zone Localizations

...

I think that the tables listed at this link could be revised to improve clarity:

http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/formatting/zon...

...

Are these tables informational or vital to the process? For example, will the Aliases table be used to "Canonicalize the Olson ID" (step 1 of the Fallback procedure)? If so, then the table should be based on the current time zone data files instead of "current Java data". This could turn into a serious maintenance issue. Ideally, I think that a script could be run on the time zone data files to generate a new aliases table each time the time zone data files are updated (currently tzdata2004a).

The tables are informational, just to provide a different view of the data and provide background for the issues involved. The end goal is to work off of the current time zone data, as provided on ftp://elsie.nci.nih.gov/pub/. It may be necessary to supplement that data, e.g. to have a list of 'outmoded codes' like WET, CET, MET, EET, Asia/Riyadh87, Asia/Riyadh88, Asia/Riyadh89, or to add IDs for missing country codes, but we would much rather that all the data could come from ftp://elsie.nci.nih.gov/pub/.
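As a minimal sketch of the kind of script suggested in the quoted text, the following Python reads `Link` lines in the tz source format (`Link TARGET LINK-NAME`) and builds an alias table from them. The function name `parse_links` and the sample text are assumptions made for illustration; they are not part of CLDR or of the tz distribution, and a real run would read the `backward` file itself.

```python
def parse_links(text):
    """Map each alias (LINK-NAME) to its target zone ID, using tz 'Link' lines."""
    aliases = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        fields = line.split()
        if fields[0] == "Link" and len(fields) >= 3:
            target, link_name = fields[1], fields[2]
            aliases[link_name] = target
    return aliases


# Illustrative sample only; not copied from any tzdata release.
SAMPLE_LINKS = (
    "# Link  TARGET              LINK-NAME\n"
    "Link    Antarctica/McMurdo  Antarctica/South_Pole\n"
    "Link    America/Denver      America/Shiprock\n"
)

alias_table = parse_links(SAMPLE_LINKS)
```

Regenerating the table this way each time the data files change would avoid keeping a hand-edited copy in step with them.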

...

### Aliases The Aliases table appears to contain time zone names from several different tzdata2004a files: backward, etcetera, and systemv. If you use several different tables such as backward aliases, etcetera aliases, and systemv aliases then I expect that they would be easier to maintain as the time zone database is updated.

Good. Part of our supplementary data could be which whole tables to exclude.

...

I suggest avoiding terms like 'Bogus' and 'Real' because they're not very descriptive. Here are some possible new terms: 'Bogus' to 'Obsolete Olson ID from tzdata2004a/backward' 'Real' to 'Valid Olson ID as of tzdata2004a'

The 'Bogus' column in the Aliases table contains both obsolete and valid time zone names. Antarctica/South_Pole and America/Shiprock are valid; they are both listed in the tzdata2004a/zone.tab file. These files correspond to (are equal to) Antarctica/McMurdo and America/Denver respectively but that doesn't necessarily mean that they are invalid or bogus.

You're right: Bogus is an overly familiar term. However, to reduce the translation requirements and make the data more manageable, we do want to set up some uniqueness criteria. If two IDs have exactly the same behavior since the time when time zones were adopted, and have always been in the same country over that period, we only want one of them to be in the main list. The other can be an alternate -- and still work -- but we would recommend an extremely low priority on translation.

...

Where did AET (line 2, Aliases table) for Australia/Sydney come from? I can't find any reference to it in tzdata2004a.

There are some aliases that come from Java. That is noted in the document, but probably not clearly enough.

...

### Country to Zones The 'Country to Zones' looks good, yet it contains at least one obsolete country code, YU for Yugoslavia. I realize that this is a known issue. You can use the tzdata2004a/zone.tab file to generate an updated 'Country to Zones' table.

For obsolete ISO 3166 country codes such as YU I think that ISO 3166-3 could be referenced. ISO 3166-3 represents codes for formerly used names of countries: http://www.niso.org/standards/resources/3166.html

http://www.iso.org/iso/en/prods-services/iso3166ma/04background-on-iso-3166/...

...

Each ISO 3166-3 entry has several fields including a four letter code where the first two letters are the formerly used code and the last two letters are the code that replaced it. The four letter code for Yugoslavia is 'YUCS'.

The ISO web site says that ISO 3166-3 was first published in 1998 (or maybe 1999), but I cannot find the original document. Yes, it would be ironic if the standard for formerly used names is formerly used.

There is separate discussion on the email list on the instability of ISO codes. Unfortunately, the ISO committee makes no stability guarantees about 3 letter codes either.
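To make the four-letter form described in the quoted text concrete, here is a small Python sketch that splits such a code into its two halves. It assumes only what the quoted description states (the first two letters are the former code, the last two the replacement); the function name `split_iso3166_3` is hypothetical.

```python
def split_iso3166_3(code):
    """Split a four-letter code into (former_code, replacement_code)."""
    if len(code) != 4 or not code.isalpha() or not code.isupper():
        raise ValueError(f"not a four-letter uppercase code: {code!r}")
    return code[:2], code[2:]


former, replacement = split_iso3166_3("YUCS")
```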

...

Should CLDR (or the time zone database) maintain a list of formerly used country codes? This would be similar to the backward file to maintain obsolete time zone names.

We would probably use the same mechanism as RFC 3066bis: once a country code is introduced by ISO, we never retract it. If they introduce a different meaning for that code, we don't follow them, and instead use the UN code.

...

### Countries that are missing Zones Could this table be renamed to 'Country Codes that do not have Zones'? Of course, Yugoslavia shouldn't be in the table.

Yes, and you are correct on Yugoslavia. (Apparently the Java implementation filters that out).
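The 'Country to Zones' table and the list of country codes without zones could both be derived from zone.tab, as suggested in the quoted text. The Python below is a minimal sketch, assuming the tab-separated zone.tab layout (country code, coordinates, TZ, optional comments). The function names, the sample rows, and the list of codes passed in are illustrative assumptions.

```python
from collections import defaultdict


def country_to_zones(zone_tab_text):
    """Build {country_code: [zone IDs]} from zone.tab-style text."""
    table = defaultdict(list)
    for line in zone_tab_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a data row
        code, zone = fields[0], fields[2]
        table[code].append(zone)
    return dict(table)


def codes_without_zones(all_codes, table):
    """Return the country codes that have no zone in the table."""
    return sorted(code for code in all_codes if code not in table)


# Illustrative rows only, written in the zone.tab column layout.
SAMPLE_ZONE_TAB = (
    "# code\tcoordinates\tTZ\tcomments\n"
    "CS\t+4450+02030\tEurope/Belgrade\n"
    "US\t+404251-0740023\tAmerica/New_York\tEastern Time\n"
    "US\t+340308-1181434\tAmerica/Los_Angeles\tPacific Time\n"
)

zones_by_country = country_to_zones(SAMPLE_ZONE_TAB)
missing = codes_without_zones(["BV", "CS", "HM", "US"], zones_by_country)
```

With these sample rows, BV and HM are reported as having no zones, while CS maps to Europe/Belgrade.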

...

### Zones that are missing Countries Could this table be renamed to "Zones that do not map to specific Countries"?

Yes

...

Europe/Belgrade is in this list, yet it's listed in the tzdata2004a/zone.tab file with a country code of CS. I suppose this is a known issue related to the YU country code for Yugoslavia.

Yes

...

I don't think that Asia/Riyadh87, Asia/Riyadh88, and Asia/Riyadh89 belong in this list. They correspond to the tzdata2004a files: solar87, solar88, and solar89. Should they go in the alias table(s)?

They really sound like items we should just ignore, for the purposes of this document, since they are not really useful.

...

### Windows IDs This table seems to imply that there is a one-to-one relationship between Olson IDs and Windows IDs. Since you only have 75 Windows IDs listed, I suspect that one Windows ID may map to one or more Olson IDs.

There should be more explanation. These are, to the best of our knowledge, the appropriate mappings to use for Windows IDs, but their presence here is only informational. Windows doesn't try to do historic time zones, nor does it cover all of the modern timezones completely.

...

### Equivalent Modern Zones My first reaction whenever I see a significant effort to simplify something for the user is to think that it could lead to problems. How did you determine that "A lot of people just don't care about historic differences"? Do most people use time zones merely to keep track of the current time or current differences between another time zone? How else do users use time zones?

Many (I would dare say the vast majority) of end users just don't care now that there was once a difference between Dawson, Whitehorse and Los Angeles. When they pick a timezone in some preferences dialog (on their machine, in a website preferences page, etc) they just want to see one choice for that zone, not three different ones that they have to think about. The UI might have an advanced button (as the text discusses) for someone who does really care, but that will be a very small proportion of users.

...

That's all for now. I hope that my comments are useful. Chuck

Absolutely! Actually, another question. We have traditionally referred to the timezone IDs in ftp://elsie.nci.nih.gov/pub/ as "Olson IDs". What is the best way to refer to them?

...

At 6:40 PM -0700 6/10/04, Mark Davis wrote:

...
Thanks for your feedback.

...
Bouvet Island - an uninhabited volcanic island, almost entirely ... Etc/GMT{[+-]N} are just for fixed GMT offsets; they don't correspond to countries.

Yes, we realize that Bouvet Island and Heard Island and McDonald Islands are completely obscure places; it is more for a matter of API/testing completeness. Understood that Etc/GMT... don't correspond to countries. But in an API and for translation, it is useful to have everything attached to a country, even if it is a pseudo-country. That's why the suggestion in the document is to use ZZ for them, which is a private-use ISO country code, which can be translated as "no country".

As to Yugoslavia, that is a real mess, because the ISO committee just doesn't care about stability of identifiers. You can have a database set up with someone's country of birth stored as CS. All of a sudden by some whim of ISO, that data is invalidated. More on that at http://www.unicode.org/consortium/utc-positions.html#2stability.

...
Asia/Riyadh{87,88,89}: Saudi Arabia, SA - those are historical, from an era when Saudi Arabia used solar time, and apply only to Riyadh (and, if you're really fussy, to a particular location in Riyadh, I guess), so they're not appropriate for Saudi Arabia as a whole. I don't know what names you'd give them. ... WET, CET, MET, and EET "are for backward compatibility with older versions"; various Europe/XXX rules should presumably be used instead - I guess you could pick cities for each of them.

For these, I guess my recommendation would be to not bother translating them at all -- they are all compatibility orphans, one wouldn't encourage their use.

Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄

----- Original Message ----- From: "Guy Harris" <guy@alum.mit.edu> To: "Mark Davis" <mark.davis@jtcsv.com> Cc: <tz@lecserver.nci.nih.gov> Sent: Thu, 2004 Jun 10 18:11 Subject: Re: Time Zone Localizations

...
On Jun 10, 2004, at 12:04 PM, Mark Davis wrote:

...
I'd very much appreciate any feedback on the proposal.

Some of the countries listed as missing zones are:

Bouvet Island - an uninhabited volcanic island, almost entirely covered by glaciers, controlled by Norway, and designated as a nature reserve, according to

http://www.cia.gov/cia/publications/factbook/geos/bv.html

I don't know if the automated meteorological station on the island cares about time zones or not.

Heard Island and McDonald Islands - uninhabited, barren, sub-Antarctic islands now controlled by Australia, designated as a nature preserve, according to

http://www.cia.gov/cia/publications/factbook/geos/hm.html

They don't even mention any automated meteorological stations, just seals and birds.

Yugoslavia - it's now Serbia and Montenegro. Europe/Belgrade is the correct zone for it.

Some of the time zones listed as missing countries are:

Europe/Belgrade: Serbia and Montenegro, which has the ISO 3166-1 Alpha-2 code CS, according to

http://www.iso.org/iso/en/prods-services/iso3166ma/01whats-new/2003-07-23_statement_cs.html

Asia/Riyadh{87,88,89}: Saudi Arabia, SA - those are historical, from an era when Saudi Arabia used solar time, and apply only to Riyadh (and, if you're really fussy, to a particular location in Riyadh, I guess), so they're not appropriate for Saudi Arabia as a whole. I don't know what names you'd give them.

Etc/GMT{[+-]N} are just for fixed GMT offsets; they don't correspond to countries.

WET, CET, MET, and EET "are for backward compatibility with older versions"; various Europe/XXX rules should presumably be used instead - I guess you could pick cities for each of them.

Garrett Wollman

7:41 p.m.

<<On Fri, 11 Jun 2004 11:49:03 -0700, "Mark Davis" <mark.davis@jtcsv.com> said: [Text formatting recovered.]

...

Many (I would dare say the vast majority) of end users just don't care now that there was once a difference between Dawson, Whitehorse and Los Angeles. When they pick a timezone in some preferences dialog (on their machine, in a website preferences page, etc) they just want to see one choice for that zone, not three

There are really very few cases where you might give people multiple choices, having already selected a particular country or national region. In the tzsetup(8) user interface which I wrote, users must first select a region and then a "country" (scare quotes because they are actually selecting a 3166-2 code behind the scenes, but the interface doesn't tell them that). The US probably provides most of the complicated cases once you've gotten that far; few other countries have more than one historic zone for each existing modern zone. In any event, if the user has already selected a locale, then you should default to presenting only the time zones associated with the country or region identified by the locale, with an option to "see all". There is no need to identify "equivalent" time zones when most of them are already known to be largely irrelevant.

There is a separate localization issue that comes up when trying to answer the question, "What time is it in _____?". I don't know if the scope of your project extends to that question.

-GAWollman

Mark Davis

10:01 p.m.

There are 29 countries that have multiple zones. If you show only modern zones, you have 99 different zones for those countries; if you show historic zones, you have 164, a considerable increase. For example, for the US you have the following, with a ',' between zones that are the same nowadays, and a ';' between ones that are different. It clutters up the UI and adds more translation requirements if you need to distinguish non-modern zones.

Pacific/Honolulu; America/Adak; America/Anchorage, America/Nome, America/Juneau, America/Yakutat; America/Los_Angeles; America/Phoenix; America/Denver, America/Boise; America/Chicago, America/North_Dakota/Center, America/Menominee; America/Indianapolis, America/Indiana/Knox, America/Indiana/Vevay, America/Indiana/Marengo; America/New_York, America/Kentucky/Monticello, America/Detroit, America/Louisville

Note: we're not saying the modern equivalents must be identified; this is more a matter of setting priorities for translation.

Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄

----- Original Message ----- From: "Garrett Wollman" <wollman@khavrinen.lcs.mit.edu> To: "Mark Davis" <mark.davis@jtcsv.com> Cc: <tz@lecserver.nci.nih.gov> Sent: Fri, 2004 Jun 11 12:41 Subject: Re: Time Zone Localizations

...

<<On Fri, 11 Jun 2004 11:49:03 -0700, "Mark Davis" <mark.davis@jtcsv.com> said:

[Text formatting recovered.]

...
Many (I would dare say the vast majority) of end users just don't care now that there was once a difference between Dawson, Whitehorse and Los Angeles. When they pick a timezone in some preferences dialog (on their machine, in a website preferences page, etc) they just want to see one choice for that zone, not three

There are really very few cases where you might give people multiple choices, having already selected a particular country or national region. In the tzsetup(8) user interface which I wrote, users must first select a region and then a "country" (scare quotes because they are actually selecting a 3166-2 code behind the scenes, but the interface doesn't tell them that). The US probably provides most of the complicated cases once you've gotten that far; few other countries have more than one historic zone for each existing modern zone. In any event, if the user has already selected a locale, then you should default to presenting only the time zones associated with the country or region identified by the locale, with an option to "see all". There is no need to identify "equivalent" time zones when most of them are already known to be largely irrelevant.

There is a separate localization issue that comes up when trying to answer the question, "What time is it in _____?". I don't know if the scope of your project extends to that question.

-GAWollman

Garrett Wollman

10:49 p.m.

<<On Fri, 11 Jun 2004 15:01:15 -0700, "Mark Davis" <mark.davis@jtcsv.com> said:

...

ones that are different. It clutters up the UI and adds more translation requirements if you need to distinguish non-modern zones.

...

Pacific/Honolulu; America/Adak; America/Anchorage, America/Nome, America/Juneau, America/Yakutat; America/Los_Angeles; America/Phoenix; America/Denver, America/Boise; America/Chicago, America/North_Dakota/Center, America/Menominee; America/Indianapolis, America/Indiana/Knox, America/Indiana/Vevay, America/Indiana/Marengo; America/New_York, America/Kentucky/Monticello, America/Detroit, America/Louisville

Well, except that you don't need to translate any of these, unless you're really interested in supporting `ko_US' or the like. None of these are relevant to any other country. Their existence should not cause a maintenance burden for other locales. Likewise, there is no need to perform a Swahili translation for the multitude of Brazilian locales. -GAWollman
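To make the grouping in the quoted US list concrete, the Python sketch below parses it, reading ';' as the separator between groups that differ today and ',' as the separator between zones that behave the same nowadays. The names `parse_groups` and `US_ZONES` are illustrative, and picking the first zone of each group as its representative is an arbitrary assumption, not something the thread specifies.

```python
US_ZONES = (
    "Pacific/Honolulu; America/Adak; "
    "America/Anchorage, America/Nome, America/Juneau, America/Yakutat; "
    "America/Los_Angeles; America/Phoenix; America/Denver, America/Boise; "
    "America/Chicago, America/North_Dakota/Center, America/Menominee; "
    "America/Indianapolis, America/Indiana/Knox, America/Indiana/Vevay, "
    "America/Indiana/Marengo; "
    "America/New_York, America/Kentucky/Monticello, America/Detroit, "
    "America/Louisville"
)


def parse_groups(text):
    """Split on ';' into groups, then on ',' into the zones of each group."""
    groups = []
    for chunk in text.split(";"):
        members = [zone.strip() for zone in chunk.split(",") if zone.strip()]
        if members:
            groups.append(members)
    return groups


groups = parse_groups(US_ZONES)
modern = [group[0] for group in groups]          # one representative per group
total_zones = sum(len(group) for group in groups)
```

For this list the parse gives 9 groups covering 21 zone IDs, which is the difference in UI clutter and translation load that the two positions are arguing about.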

Mark Davis

11:30 p.m.

Those pesky non-English speakers! But there is a point: simply because I don't speak Russian doesn't mean that I would have no occasion to see or use time zones that happen to be in Russia.

Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄

----- Original Message ----- From: "Garrett Wollman" <wollman@khavrinen.lcs.mit.edu> To: "Mark Davis" <mark.davis@jtcsv.com> Cc: "Garrett Wollman" <wollman@khavrinen.lcs.mit.edu>; <tz@lecserver.nci.nih.gov> Sent: Fri, 2004 Jun 11 15:49 Subject: Re: Time Zone Localizations

...

<<On Fri, 11 Jun 2004 15:01:15 -0700, "Mark Davis" <mark.davis@jtcsv.com> said:

...
ones that are different. It clutters up the UI and adds more translation requirements if you need to distinguish non-modern zones.

...
Pacific/Honolulu; America/Adak; America/Anchorage, America/Nome, America/Juneau, America/Yakutat; America/Los_Angeles; America/Phoenix; America/Denver, America/Boise; America/Chicago, America/North_Dakota/Center, America/Menominee; America/Indianapolis, America/Indiana/Knox, America/Indiana/Vevay, America/Indiana/Marengo; America/New_York, America/Kentucky/Monticello, America/Detroit, America/Louisville

Well, except that you don't need to translate any of these, unless you're really interested in supporting `ko_US' or the like. None of these are relevant to any other country. Their existence should not cause a maintenance burden for other locales. Likewise, there is no need to perform a Swahili translation for the multitude of Brazilian locales.

-GAWollman

Eric Muller

11:45 p.m.

Mark Davis wrote:

...

Those pesky non-English speakers! But there is a point: simply because I don't speak Russian doesn't mean that I would have no occasion to see or use time zones that happen to be in Russia.

For example, my coworker in Russia may schedule a meeting, which will occur in Moscow during my visit over there next week, but I have asked the calendaring software to report stuff to me in the en_US locale. Hence I get a message, in English, with a Russian time zone. If the meeting were in San Jose, CA, then I would expect the time zone to be that of San Jose, and my Russian coworker would expect the Russian translation of (the short or the long form of) PDT. Eric.

Garrett Wollman

1:29 a.m.

<<On Fri, 11 Jun 2004 16:30:53 -0700, "Mark Davis" <mark.davis@jtcsv.com> said:

...

Those pesky non-English speakers! But there is a point: simply because I don't speak Russian doesn't mean that I would have no occasion to see or use time zones that happen to be in Russia.

It is, however, less likely that you would need to do so. You *were* writing about the need to prioritize translation efforts; it seems clear to me that the translation of all zones into their "native" locale should be given a higher priority than translating any zone into a language in which it is rarely referenced. Some zones may never need to be translated. Procedurally, I would suggest:

foreach zone {
    foreach locale used in country of zone {
        translate all symbols used in this zone
        translate identifier of zone if necessary
    }
}
; break point 1

foreach locale {
    foreach equivalence class of proleptically isomorphic zones {
        choose exemplar appropriate for this locale
        translate all symbols used in exemplar
        translate identifier of exemplar if necessary
    }
}
; break point 2

translate remaining symbols and identifiers

-GAWollman
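The three-phase procedure above can be sketched in Python as a function that emits (phase, locale, zone) work items in priority order. This is only one reading of the pseudocode: the data structures, the function names, and in particular the `choose_exemplar` rule (prefer a zone in the locale's own country, else the first zone of the class) are assumptions, and the sample data is a toy.

```python
def choose_exemplar(zones, locale, zone_country):
    """Hypothetical rule: prefer a zone in the locale's country, else the first."""
    region = locale.split("_")[-1]
    for zone in zones:
        if zone_country.get(zone) == region:
            return zone
    return zones[0]


def translation_order(zone_country, country_locales, equivalence_classes, locales):
    """Return (phase, locale, zone) tuples in priority order, without repeats."""
    tasks, seen = [], set()

    def add(phase, locale, zone):
        if (locale, zone) not in seen:
            seen.add((locale, zone))
            tasks.append((phase, locale, zone))

    # Phase 1: every zone into the locales used in its own country.
    for zone, country in zone_country.items():
        for locale in country_locales.get(country, []):
            add(1, locale, zone)

    # Phase 2: one exemplar per equivalence class, for every locale.
    for locale in locales:
        for zones in equivalence_classes:
            add(2, locale, choose_exemplar(zones, locale, zone_country))

    # Phase 3: whatever remains.
    for locale in locales:
        for zone in zone_country:
            add(3, locale, zone)
    return tasks


# Toy data, for illustration only.
ZONE_COUNTRY = {
    "America/Los_Angeles": "US",
    "America/Whitehorse": "CA",
    "Europe/Moscow": "RU",
}
COUNTRY_LOCALES = {"US": ["en_US"], "CA": ["en_CA", "fr_CA"], "RU": ["ru_RU"]}
CLASSES = [["America/Los_Angeles", "America/Whitehorse"], ["Europe/Moscow"]]
LOCALES = ["en_US", "en_CA", "fr_CA", "ru_RU"]

tasks = translation_order(ZONE_COUNTRY, COUNTRY_LOCALES, CLASSES, LOCALES)
```

A translator could stop at either break point: everything tagged phase 1 covers native locales, and phase 2 adds one exemplar per class for each locale.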

Mark Davis

2:06 a.m.

Yes, I agree that not every translator will want to translate every zone. And the ordering that you suggest is reasonable. It is hard for us to judge exactly the priority that people in a given country will give to particular zones; our goal is to make it possible to have reasonable fallback behavior for zones that they don't want to translate, and give them guidance as to the effects of their choices. They may have different priorities, once they understand that. For example, it may be that in the Ukraine a lot of business is done with Russia, so it is worthwhile to translate all the Russian zones in detail; but for Australia they may depend on the fallback policy. Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄ ----- Original Message ----- From: "Garrett Wollman" <wollman@khavrinen.lcs.mit.edu> To: "Mark Davis" <mark.davis@jtcsv.com> Cc: <tz@lecserver.nci.nih.gov> Sent: Fri, 2004 Jun 11 18:29 Subject: Re: Time Zone Localizations

...

<<On Fri, 11 Jun 2004 16:30:53 -0700, "Mark Davis" <mark.davis@jtcsv.com> said:

...
Those pesky non-English speakers! But there is a point: simply because I don't speak Russian doesn't mean that I would have no occasion to see or use time zones that happen to be in Russia.

It is, however, less likely that you would need to do so. You *were* writing about the need to prioritize translation efforts; it seems clear to me that the translation of all zones into their "native" locale should be given a higher priority than translating any zone into a language in which it is rarely referenced. Some zones may never need to be translated. Procedurally, I would suggest:

foreach zone {
    foreach locale used in country of zone {
        translate all symbols used in this zone
        translate identifier of zone if necessary
    }
}
; break point 1

foreach locale {
    foreach equivalence class of proleptically isomorphic zones {
        choose exemplar appropriate for this locale
        translate all symbols used in exemplar
        translate identifier of exemplar if necessary
    }
}
; break point 2

translate remaining symbols and identifiers

-GAWollman

jcowan＠reutershealth.com

8:23 p.m.

Mark Davis scripsit:

...

However, to reduce the translation requirements and make the data more manageable, we do want to set up some uniqueness criteria. If two IDs have exactly the same behavior since the time when time zones were adopted,

In fact the Olson data do not separate timezones in a given country that have been the same since 1970-01-01. Otherwise, Indiana would have something like 30 time zones instead of just four.

...

and have always been in the same country over that period, we only want one of them to be in the main list. The other can be an alternate -- and still work-- but we would recommend an extremely low priority on translation.

I think that is a mistake, for two reasons: national chauvinism and future-proofing. About the former, nothing need be said; but the whole point of setting your zone to the country you are in (especially if you live there) is that you don't want to have to reset it if your national legislature changes the rules, either the DST rules or the zone proper. Within the EU, DST rules are harmonized, but which zone to adopt is a purely national decision.

...

Many (I would dare say the vast majority) of end users just don't care now that there was once a difference between Dawson, Whitehorse and Los Angeles.

This strikes me as backwards. If you're in the U.S., you should see U.S. choices; in Canada you should see Canadian ones.

...

Absolutely!

I think the series of fallbacks is unnecessarily complex. In particular, the fallback from "Pacific Time" to "GMT-07:00/08:00" doesn't tell me that much, because I don't know a priori whether it's winter or summer currently.

In addition, it fails to exploit the nice thing about the use of city names in Olson, namely that city names don't need that much localization: in the vast majority of cases, the internationally known name is the only name. (Transliteration might be required if the current locale has no Latin letters.) Thus the full combinatorial explosion of city name x language can mostly be short-circuited.

I propose a simpler scheme, therefore:

1) If you have a translation for the time zone name x the language, use it.
2a) Get the localized name for the city (or if none, the Olson city name);
2b) Get the "Tampo de '%1'" schema for the language (or if none, use just "%1");
2c) Substitute the city name into the schema and use that.

When I'm communicating with users about the Reuters Health system, I always refer to events occurring at such-and-such a time, New York time. That communicates not only a GMT offset but a set of DST rules. This is also what's typically done in legal documents -- see the legal ads for bond redemption announcements in a newspaper.

-- John Cowan <jcowan@reutershealth.com> http://www.reutershealth.com I amar prestar aen, han mathon ne nen, http://www.ccil.org/~cowan han mathon ne chae, a han noston ne 'wilith. --Galadriel, LOTR:FOTR
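The simpler scheme proposed above (steps 1, 2a, 2b, 2c) can be sketched in Python as follows. The lookup tables are plain dictionaries invented for this illustration, the function name `display_name` is hypothetical, and deriving the "Olson city name" from the last path component with underscores turned into spaces is an assumption about what that phrase means.

```python
def display_name(zone_id, lang, zone_names, city_names, schemas):
    """Pick a display string for zone_id in lang using the 1 / 2a / 2b / 2c steps."""
    # Step 1: a direct translation of the time zone name wins.
    translated = zone_names.get((zone_id, lang))
    if translated is not None:
        return translated
    # Step 2a: localized city name, else the city part of the Olson ID.
    olson_city = zone_id.rsplit("/", 1)[-1].replace("_", " ")
    city = city_names.get((zone_id, lang), olson_city)
    # Step 2b: the language's schema, else the bare placeholder.
    schema = schemas.get(lang, "%1")
    # Step 2c: substitute the city name into the schema.
    return schema.replace("%1", city)


# Made-up sample data, for illustration only.
ZONE_NAMES = {("America/New_York", "en"): "Eastern Time"}
CITY_NAMES = {("Europe/Moscow", "eo"): "Moskvo"}
SCHEMAS = {"eo": "Tampo de '%1'"}

moscow_eo = display_name("Europe/Moscow", "eo", ZONE_NAMES, CITY_NAMES, SCHEMAS)
```

Because step 2 needs only one schema per language plus the few city names that actually differ, most zone and language pairs get a usable string without a per-zone translation.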

Mark Davis

10:51 p.m.

comments interleaved below. Since this is getting back into the translation issues, I'm cc'ing the cldr group. Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄ ----- Original Message ----- From: <jcowan@reutershealth.com> To: "Mark Davis" <mark.davis@jtcsv.com> Cc: <tz@lecserver.nci.nih.gov>; "Chuck Soper" <chucks@lmi.net> Sent: Fri, 2004 Jun 11 13:23 Subject: Re: Time Zone Localizations

...

Mark Davis scripsit:

...
However, to reduce the translation requirements and make the data more manageable, we do want to set up some uniqueness criteria. If two IDs have exactly the same behavior since the time when time zones were adopted,

In fact the Olson data do not separate timezones in a given country that have been the same since 1970-01-01. Otherwise, Indiana would have something like 30 time zones instead of just four.

...
and have always been in the same country over that period, we only want one of them to be in the main list. The other can be an alternate -- and still work-- but we would recommend an extremely low priority on translation.

I think that is a mistake, for two reasons: national chauvinism and future-proofing. About the former, nothing need be said; but the whole point of setting your zone to the country you are in (especially if you live there) is that you don't want to have to reset it if your national legislature changes the rules, either the DST rules or the zone proper. Within the EU, DST rules are harmonized, but which zone to adopt is a purely national decision.

I said "have always been in the same country over that period"; this would not make any zones "modern equivalents" that were in different countries. But see below.

...

...
Many (I would dare say the vast majority) of end users just don't care now that there was once a difference between Dawson, Whitehorse and Los Angeles.

This strikes me as backwards. If you're in the U.S., you should see U.S. choices; in Canada you should see Canadian ones.

My fault for confusing you; I mistyped Los Angeles instead of Vancouver. Here is a real example. Each of the items separated by commas are modern equivalents, and all within the same country (Canada). Thus America/Dawson, America/Whitehorse, America/Vancouver are not distinguished by country, and all behave the same nowadays. America/Dawson, America/Whitehorse, America/Vancouver; America/Dawson_Creek; America/Inuvik, America/Yellowknife, America/Edmonton, America/Cambridge_Bay; America/Swift_Current, America/Regina; America/Rainy_River, America/Rankin_Inlet; America/Winnipeg; America/Iqaluit, America/Pangnirtung, America/Nipigon, America/Thunder_Bay, America/Montreal; America/Goose_Bay; America/Glace_Bay, America/Halifax; America/St_Johns

...

...
Absolutely!

I think the series of fallbacks is unnecessarily complex. In particular, the fallback from "Pacific Time" to "GMT-07:00/08:00" doesn't tell me that much, because I don't know a priori whether it's winter or summer currently.

In addition, it fails to exploit the nice thing about the use of city names in Olson, namely that city names don't need that much localization: in the vast majority of cases, the internationally known name is the only name. (Transliteration might be required if the current locale has no Latin letters.) Thus the full combinatorial explosion of city name x language can mostly be short-circuited.

If that were true, we'd not have as much of a problem. And if everyone spoke English this would all be much easier ;-) Look at London, from the CLDR:

<ldml><dates><timeZoneNames><zone type="GMT"><exemplarCity>

"Londain": ·ga·
"Londen": ·nl·
"London": ·da· ·en· ·fr· ·sv·
"Londra": ·it·
"Londres": ·es· ·pt·
"Lontoo": ·fi·
"ロンドン": ·ja·
"伦敦": ·zh·
"倫敦": ·zh_Hant·
"런던": ·ko·
...

These are only for a few languages, but there is a lot of variation. A great part of the motivation for this is to cut down on the amount of data required, just from the sheer magnitude of the problem when you multiply the figures by the 90 languages currently in CLDR, plus the many more languages to come.

...

I propose a simpler scheme, therefore:

1) If you have a translation for the time zone name x the language, use it.

2a) Get the localized name for the city (or if none, the Olson city name);

2b) Get the "Tampo de '%1'" schema for the language (or if none, use just "%1");

2c) Substitute the city name into the schema and use that.
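The scheme quoted above (steps 1, 2a, 2b, 2c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three lookup tables, the function names, and the English pattern "%1 time" are hypothetical stand-ins for whatever locale data is actually available, and the city names are the ones shown in the London example in this thread.

```python
# Hypothetical lookup tables; real data would come from the locale repository.
ZONE_NAMES = {("America/Los_Angeles", "en"): "Pacific Time"}          # step 1
CITY_NAMES = {("London", "es"): "Londres", ("London", "fi"): "Lontoo"}  # step 2a
SCHEMAS = {"en": "%1 time"}                                           # step 2b (assumed pattern)


def olson_city(tzid: str) -> str:
    """Return the city part of an Olson ID: 'America/New_York' -> 'New York'."""
    return tzid.rsplit("/", 1)[-1].replace("_", " ")


def display_name(tzid: str, lang: str) -> str:
    # Step 1: a translated zone name for this language wins outright.
    direct = ZONE_NAMES.get((tzid, lang))
    if direct is not None:
        return direct
    # Step 2a: localized city name, or the Olson city name if none exists.
    city = olson_city(tzid)
    city = CITY_NAMES.get((city, lang), city)
    # Step 2b: per-language schema, or the bare "%1" if none exists.
    schema = SCHEMAS.get(lang, "%1")
    # Step 2c: substitute the city name into the schema.
    return schema.replace("%1", city)


print(display_name("America/Los_Angeles", "en"))  # Pacific Time
print(display_name("America/New_York", "en"))     # New York time
print(display_name("Europe/London", "es"))        # Londres
print(display_name("America/Winnipeg", "fi"))     # Winnipeg
```

The last call shows the point being argued: with no localized data at all, the Olson city name still yields a usable label.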

I can understand your desire for simplicity, and I am not happy with there being 8 possible steps. But depending on city data would be very painful. We already have in CLDR a lot of country data, so if we can leverage that it really helps.

Let's look at the figures. There are 239 countries. Of them, 210 have a single zone. Using a country name for each of them is essentially free. Of the remainder, 8 only have multiple zones historically. So the modern ones are again essentially free. Of the rest, cities might be the best way to go. We would need 99 cities for modern zone distinctions, 140 if we added historic also. If you multiply that by 90 languages it is still a lot of data, but *way*, *way* better than the 558 x 90 we are faced with now! So that is the reason for Step #4.1 in http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/formatting/tim...

But we still need some fallback in case there is no unique country, and no translated city. Now, it may be better to nuke #4.2 and #5, e.g. dropping the GMT part.

GMT format when there is no daylight savings does not lose any information (nowadays). Where there is daylight savings, it does lose information -- although actually not much -- but avoids the problem of using cities that may either be unknown to the user or not in a script s/he can read. The only place where it is ambiguous (within a country) is if you have two zones that have the same summer & winter offsets, but start at different times. That is pretty rare. (Across countries, or historically, it is not quite so rare.) That being said, we are not wedded to the GMT format either; we have to toss it around a bit.

You are right that GMT format does not protect against future changes; but we have to look at likelihood. The city format also doesn't protect against all possible changes; I might use America/Los_Angeles right now meaning my time zone, but if the N. California counties changed to a different zone, splitting that one, then it wouldn't be correct any more.

[Of course, what would really be nice is if the world could agree to all switch to/from daylight savings at the same (local) time, e.g. 02:00 the last Sundays in March and September. Then you could convey all modern zones with three formats, without loss of information:

- GMT-08:00 (for no daylight savings)
- GMT-08:00N (for daylight savings March-Sept), and
- GMT-08:00S (for daylight savings Sept-March).

Of course, the chances of something sensible like this are, well, zip.]
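The figures in this message can be checked with a little arithmetic. The sketch below only multiplies the counts quoted above (239 countries, 210 single-zone, 8 with only historical splits, 99 or 140 cities, 558 IDs, 90 languages); the variable and function names are made up for illustration.

```python
LANGUAGES = 90          # languages currently in CLDR, per the message
ALL_ZONE_IDS = 558      # every zone ID translated individually
MODERN_CITIES = 99      # cities needed for modern zone distinctions
HISTORIC_CITIES = 140   # cities needed if historic distinctions are kept

COUNTRIES = 239
SINGLE_ZONE = 210       # countries with exactly one zone
HISTORIC_ONLY = 8       # countries whose extra zones are only historical


def strings_needed(names: int, languages: int = LANGUAGES) -> int:
    """Number of translated strings if each name is translated per language."""
    return names * languages


# Countries that still need city-level names for modern zones.
print(COUNTRIES - SINGLE_ZONE - HISTORIC_ONLY)   # 21

print(strings_needed(ALL_ZONE_IDS))     # 50220
print(strings_needed(MODERN_CITIES))    # 8910
print(strings_needed(HISTORIC_CITIES))  # 12600
```

So the modern-cities approach needs under a fifth of the strings required by translating every ID, before counting the country names that CLDR already carries.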

...

When I'm communicating with users about the Reuters Health system, I always refer to events occurring at such-and-such a time, New York time. That communicates not only a GMT offset but a set of DST rules. This is also what's typically done in legal documents -- see the legal ads for bond redemption announcements in a newspaper.

-- John Cowan <jcowan@reutershealth.com> http://www.reutershealth.com http://www.ccil.org/~cowan

I amar prestar aen, han mathon ne nen, han mathon ne chae, a han noston ne 'wilith. --Galadriel, LOTR:FOTR

John Cowan

4:13 a.m.

Mark Davis scripsit:

...

America/Dawson, America/Whitehorse, America/Vancouver;

Yes, I agree that this kind of variation can be merged away for localization purposes. I didn't understand it before.

...

Look at London, from the CLDR:

Yes, well, Europe is a particularly bad case, because they all have local names for each others' locations. I rather doubt that the same applies to Vancouver or Winnipeg or Iqaluit. As I said before, you may want to apply your transliteration engine when you know there's a script barrier.

...

There are 239 countries. Of them, 210 have a single zone. Using a country name for each of them is essentially free.

Yes, that is the Right Thing.

...

But we still need some fallback in case there is no unique country, and no translated city.

Use the Olson name of the city in that case. It's not ideal, but it still helps a great deal. (Transliterated when necessary.)

...

Of course, the chances of something sensible like this are, well, zip.]

Not really. The appropriate time for DST changes depends on latitude, and in the Southern Hemisphere it goes the other way.

-- John Cowan jcowan@reutershealth.com http://www.reutershealth.com http://www.ccil.org/~cowan

As you read this, I don't want you to feel sorry for me, because, I believe everyone will die someday. --From a Nigerian-type scam spam

Mark Davis

5:59 a.m.

...

Use the Olson name of the city in that case. It's not ideal, but it still helps a great deal. (Transliterated when necessary.)

Not sure that would be the best, as I said. Have to run it by a number of country contacts.

...

...
Of course, the chances of something sensible like this are, well, zip.]

Not really. The appropriate time for DST changes depends on latitude, and in the Southern Hemisphere it goes the other way.

If you look at the zone data, there is really not much of a correlation between latitude and daylight start/stop. And if you read my example, there were three cases, with a clear indication of whether it was N daylight savings or S daylight savings.

...

-- John Cowan jcowan@reutershealth.com http://www.reutershealth.com http://www.ccil.org/~cowan

As you read this, I don't want you to feel sorry for me, because, I believe everyone will die someday. --From a Nigerian-type scam spam

Paul Eggert

11:52 p.m.

"Mark Davis" <mark.davis@jtcsv.com> writes:

...

We have traditionally referred to the timezone IDs in ftp://elsie.nci.nih.gov/pub/ as "Olson IDs". What is the best way to refer to them?

The zdump(8) man page says "zonenames". Perhaps "Olson zone names"?

Mark Davis

11:58 p.m.

zonenames is not distinct enough. Olson zone names is better, but usually we try to use identifier instead of name, because name can mean the human-visible name, after translation. So it sounds like Olson zone identifier or maybe Olson time zone identifier? Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄ ----- Original Message ----- From: "Paul Eggert" <eggert@CS.UCLA.EDU> To: "Mark Davis" <mark.davis@jtcsv.com> Cc: <tz@lecserver.nci.nih.gov> Sent: Fri, 2004 Jun 11 16:52 Subject: Re: Time Zone Localizations

...

"Mark Davis" <mark.davis@jtcsv.com> writes:

...
We have traditionally referred to the timezone IDs in ftp://elsie.nci.nih.gov/pub/ as "Olson IDs". What is the best way to refer to them?

The zdump(8) man page says "zonenames". Perhaps "Olson zone names"?

Paul Eggert

12:34 a.m.

"Mark Davis" <mark.davis@jtcsv.com> writes:

...

it sounds like Olson zone identifier or maybe Olson time zone identifier?

RFC 2445 calls these sorts of things TZIDs, so perhaps "Olson TZID"? Or if you want to spell out "identifier", then "Olson TZ identifier"? They are used as values of the TZ environment variable, so "TZ" has a good pedigree.

Mark Davis

1:06 a.m.

That sounds like the best abbreviation, with the full name being Olson Time Zone Identifier. And as a reference, should we say something like: Olson Time Zone Identifier (Olson TZID) as defined by ftp://elsie.nci.nih.gov/pub/ Is there a stable link to a document that we could cite as the specification for the IDs, instead of just pointing to an FTP site? http://www.twinsun.com/tz/tz-link.htm is more of a background document than a specification. Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄ ----- Original Message ----- From: "Paul Eggert" <eggert@CS.UCLA.EDU> To: "Mark Davis" <mark.davis@jtcsv.com> Cc: <tz@lecserver.nci.nih.gov> Sent: Fri, 2004 Jun 11 17:34 Subject: Re: Time Zone Localizations

...

"Mark Davis" <mark.davis@jtcsv.com> writes:

...
it sounds like Olson zone identifier or maybe Olson time zone identifier?

RFC 2445 calls these sorts of things TZIDs, so perhaps "Olson TZID"? Or if you want to spell out "identifier", then "Olson TZ identifier"? They are used as values of the TZ environment variable, so "TZ" has a good pedigree.

Paul Eggert

1:56 a.m.

"Mark Davis" <mark.davis@jtcsv.com> writes:

...

ftp://elsie.nci.nih.gov/pub/

Is there a stable link to a document that we could cite as the specification for the IDs, instead of just pointing to an FTP site?

When you ask for a "specification for the IDs", do you mean "which TZIDs are in the Olson database?" or "what syntax is allowed for the TZIDs?"? Either way, I'd say that the FTP site is the stable reference. According to my records that location dates back to April 25, 1992, so it's one of the most stable links on the planet.

Mark Davis

2:13 a.m.

What I mean is that we can link to that site, but there is no obvious document there like "olson_tzid_specification.txt" (or .html or .pdf) that people can read to tell them how the Olson TZIDs are defined by the data files in that directory. Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄ ----- Original Message ----- From: "Paul Eggert" <eggert@CS.UCLA.EDU> To: "Mark Davis" <mark.davis@jtcsv.com> Cc: <tz@lecserver.nci.nih.gov> Sent: Fri, 2004 Jun 11 18:56 Subject: Re: Time Zone Localizations

...

"Mark Davis" <mark.davis@jtcsv.com> writes:

...
ftp://elsie.nci.nih.gov/pub/

Is there a stable link to a document that we could cite as the specification for the IDs, instead of just pointing to an FTP site?

When you ask for a "specification for the IDs", do you mean "which TZIDs are in the Olson database?" or "what syntax is allowed for the TZIDs?"?

Either way, I'd say that the FTP site is the stable reference. According to my records that location dates back to April 25, 1992, so it's one of the most stable links on the planet.

Paul Eggert

5:01 a.m.

"Mark Davis" <mark.davis@jtcsv.com> writes:

...

like "olson_tzid_specification.txt" (or .html or .pdf) that people can read to tell them how the Olson TZIDs are defined by the data files in that directory.

If you just want a list of TZIDs, then the zone.tab file is perhaps your best bet right now. Admittedly there's no single URL for it but it's not hard to give directions for deriving it.

If you want to know exactly how the data are derived then the zic man page is the specification. You can Google "zic man page" if it's too much of a pain to unpack it from the canonical location.
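As a rough illustration of using zone.tab as the list of TZIDs: the file is plain text, with "#" comment lines and tab-separated columns for country code, coordinates, TZ, and an optional comment. The sample rows below are illustrative only (their coordinates and comments are not copied from any particular release), and the function name is hypothetical.

```python
from collections import Counter

# Illustrative sample in zone.tab layout; not a verbatim excerpt.
SAMPLE = (
    "# country-code, coordinates, TZ, comments (tab-separated)\n"
    "CA\t+4916-12307\tAmerica/Vancouver\tPacific Time - west British Columbia\n"
    "CA\t+6043-13503\tAmerica/Whitehorse\tPacific Time - south Yukon\n"
    "GB\t+5130-00007\tEurope/London\n"
)


def parse_zone_tab(text: str) -> list[tuple[str, str, str]]:
    """Return (country_code, tzid, comment) for each data line."""
    zones = []
    for line in text.splitlines():
        # Skip blank lines and '#' comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        country, _coords, tzid = fields[:3]
        comment = fields[3] if len(fields) > 3 else ""
        zones.append((country, tzid, comment))
    return zones


zones = parse_zone_tab(SAMPLE)
print([tzid for _, tzid, _ in zones])
# Counting zones per country shows which countries have a single zone.
print(Counter(country for country, _, _ in zones))
```

Counting rows per country code in this way is one route to figures like "210 countries have a single zone" mentioned earlier in the thread.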

Mark Davis

5:57 a.m.

It's not really that I personally would want it; it is that we can't point people to a document that they can read, that describes a specification. Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄ ----- Original Message ----- From: "Paul Eggert" <eggert@CS.UCLA.EDU> To: "Mark Davis" <mark.davis@jtcsv.com> Cc: <tz@lecserver.nci.nih.gov> Sent: Fri, 2004 Jun 11 22:01 Subject: Re: Time Zone Localizations

...

"Mark Davis" <mark.davis@jtcsv.com> writes:

...
like "olson_tzid_specification.txt" (or .html or .pdf) that people can read to tell them how the Olson TZIDs are defined by the data files in that directory.

If you just want a list of TZIDs, then the zone.tab file is perhaps your best bet right now. Admittedly there's no single URL for it but it's not hard to give directions for deriving it.

If you want to know exactly how the data are derived then the zic man page is the specification. You can Google "zic man page" if it's too much of a pain to unpack it from the canonical location.

Paul Eggert

6:33 a.m.

"Mark Davis" <mark.davis@jtcsv.com> writes:

...

we can't point people to a document that they can read, that describes a specification.

I don't see why not. If you want a list of Olson TZIDs, you can make a copy of zone.tab, and point people at it. Or if you want the specification for the zic input file format, you can make a copy of the zic man page, and point people at that. This stuff is all in the public domain. On the other hand, if the existing documentation is not enough for your needs, then it might be helpful if you could write something that will do the job, and contribute it back to the mainline distribution. Documentation is often the hardest job in volunteer efforts like these, and I'd welcome any good contributions in this area.

Mark Davis

6:27 p.m.

The latter is what we need; it needn't be a long document, and I'd be glad to put together a draft for your consideration. Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄ ----- Original Message ----- From: "Paul Eggert" <eggert@CS.UCLA.EDU> To: "Mark Davis" <mark.davis@jtcsv.com> Cc: <tz@lecserver.nci.nih.gov> Sent: Fri, 2004 Jun 11 23:33 Subject: Re: Time Zone Localizations

...

"Mark Davis" <mark.davis@jtcsv.com> writes:

...
we can't point people to a document that they can read, that describes a specification.

I don't see why not. If you want a list of Olson TZIDs, you can make a copy of zone.tab, and point people at it. Or if you want the specification for the zic input file format, you can make a copy of the zic man page, and point people at that. This stuff is all in the public domain.

On the other hand, if the existing documentation is not enough for your needs, then it might be helpful if you could write something that will do the job, and contribute it back to the mainline distribution. Documentation is often the hardest job in volunteer efforts like these, and I'd welcome any good contributions in this area.

David Keegel

7:49 a.m.

There are general rules for choosing TZID names and various other things in the "Theory" file within tzdata.

I'm not sure if that helps Mark, because I'm not really clear on exactly what sort of specifications he is looking for.

But in any case, newcomers seem to have difficulty finding things like the naming conventions in the Theory file. It might be worth putting pointers to the Theory file from tzcode and the zone.tab file from tzdata in places like http://www.twinsun.com/tz/tz-link.htm and maybe the README for tzcode. Perhaps there should even be a short README for tzdata and/or ftp://elsie.nci.nih.gov/pub/.

] The latter is what we need; it needn't be a long document, and I'd be glad to put
] together a draft for your consideration.
]
] Mark
] __________________________________
] http://www.macchiato.com
] ___ _______________________________________________________________ ___
]
] ----- Original Message -----
] From: "Paul Eggert" <eggert@CS.UCLA.EDU>
] To: "Mark Davis" <mark.davis@jtcsv.com>
] Cc: <tz@lecserver.nci.nih.gov>
] Sent: Fri, 2004 Jun 11 23:33
] Subject: Re: Time Zone Localizations
]
]
] > "Mark Davis" <mark.davis@jtcsv.com> writes:
] >
] > > we can't point people to a document that they can read, that
] > > describes a specification.
] >
] > I don't see why not. If you want a list of Olson TZIDs, you can make
] > a copy of zone.tab, and point people at it. Or if you want the
] > specification for the zic input file format, you can make a copy of
] > the zic man page, and point people at that. This stuff is all in the
] > public domain.
] >
] > On the other hand, if the existing documentation is not enough for
] > your needs, then it might be helpful if you could write something that
] > will do the job, and contribute it back to the mainline distribution.
] > Documentation is often the hardest job in volunteer efforts like
] > these, and I'd welcome any good contributions in this area.
___________________________________________________________________________ David Keegel <djk@cybersource.com.au> http://www.cyber.com.au/users/djk/ Cybersource P/L: Linux/Unix Systems Administration Consulting/Contracting

Mark Davis

7:42 p.m.

Well, here's something like what a spec would be. A document with a stable link, e.g. ftp://elsie.nci.nih.gov/pub/olson_timezone_specification.html, that describes (not necessarily in order):

1. What are all the valid Olson TZIDs (not necessarily a list in the document, but it should be possible to point to a single file that contains that list, in a well-described format).

2. How to determine which are 'canonical' and which are simply included for compatibility

3. What is the meaning of a TZID

4. What is the versioning scheme, including assurance that:
- once a version is issued it is never changed.
- TZIDs are stable, in the sense that they will never be withdrawn or reused with a substantially different semantic in later versions

5. An explicit description of the data representation for all of the data files. It should include common features, such as that data on each line after # is a comment. It should be possible from the document to understand the format of all of the data files, completely independent of any code. Again, this can be indirect: e.g. The data format for the zone.tab file is described in the header of that file.

It should be possible to read this document directly -- and the lists in #1 and #2 -- instead of hunting around in compressed files for all the pieces.

A lot of the material for the above is present, someplace on the site. For example, much of #3 could come from the information in the Theory file from "----- Names of time zone rule files -----" up to "----- Calendrical issues -----", but should be reworded somewhat to remove phrases like "When this package is installed,".

Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄ ----- Original Message ----- From: "David Keegel" <djk@cybersource.com.au> To: "Mark Davis" <mark.davis@jtcsv.com> Cc: "Paul Eggert" <eggert@CS.UCLA.EDU>; <tz@lecserver.nci.nih.gov> Sent: Sun, 2004 Jun 13 00:49 Subject: Re: Time Zone Localizations

...

There are general rules for choosing TZID names and various other things in the "Theory" file within tzdata.

I'm not sure if that helps Mark, because I'm not really clear on exactly what sort of specifications he is looking for.

But in any case, newcomers seem to have difficulty finding things like the naming conventions in the Theory file. It might be worth putting pointers to the Theory file from tzcode and the zone.tab file from tzdata in places like http://www.twinsun.com/tz/tz-link.htm and maybe the README for tzcode. Perhaps there should even be a short README for tzdata and/or ftp://elsie.nci.nih.gov/pub/.

] The latter is what we need; it needn't be a long document, and I'd be glad to put
] together a draft for your consideration.
]
] Mark
] __________________________________
] http://www.macchiato.com
] ___ _______________________________________________________________ ___
]
] ----- Original Message -----
] From: "Paul Eggert" <eggert@CS.UCLA.EDU>
] To: "Mark Davis" <mark.davis@jtcsv.com>
] Cc: <tz@lecserver.nci.nih.gov>
] Sent: Fri, 2004 Jun 11 23:33
] Subject: Re: Time Zone Localizations
]
]
] > "Mark Davis" <mark.davis@jtcsv.com> writes:
] >
] > > we can't point people to a document that they can read, that
] > > describes a specification.
] >
] > I don't see why not. If you want a list of Olson TZIDs, you can make
] > a copy of zone.tab, and point people at it. Or if you want the
] > specification for the zic input file format, you can make a copy of
] > the zic man page, and point people at that. This stuff is all in the
] > public domain.
] >
] > On the other hand, if the existing documentation is not enough for
] > your needs, then it might be helpful if you could write something that
] > will do the job, and contribute it back to the mainline distribution.
] > Documentation is often the hardest job in volunteer efforts like
] > these, and I'd welcome any good contributions in this area.

___________________________________________________________________________ David Keegel <djk@cybersource.com.au> http://www.cyber.com.au/users/djk/ Cybersource P/L: Linux/Unix Systems Administration Consulting/Contracting

John Cowan

1:30 a.m.

Mark Davis scripsit:

...

3. What is the meaning of a TZID

Just what is the meaning of "meaning" in this context?

...

4. What is the versioning scheme, including assurance that: - once a version is issued it is never changed. - TZIDs are stable, in the sense that they will never be withdrawn or reused with a substantially different semantic in later versions

Again, what does that mean? New York might move to a new timezone, if Congress so decided, or if it became an independent country (hey, we pay them more than they pay us). In that case the Eastern zone would have to be renamed America/Philadelphia, or whatever.

More probably, Congress might change the DST rules, in which case the current predictions of the U.S. zones would become incorrect and would have to be updated.

The only other conceivable scenario would be for the Big Apple to have its name changed (Nueva York might be a possibility). In that case, some other New York might come to prominence such that it became the largest city in some other time zone in the Americas. That's not very likely, but it would constitute a genuine reuse.

The advantage and disadvantage of Olson TZIDs is that they aren't arbitrary: they depend entirely on facts on the ground.

-- John Cowan jcowan@reutershealth.com http://www.reutershealth.com

"Mr. Lane, if you ever wish anything that I can do, all you will have to do will be to send me a telegram asking and it will be done." "Mr. Hearst, if you ever get a telegram from me asking you to do anything, you can put the telegram down as a forgery."

Mark Davis

4:17 a.m.

comments below. Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄ ----- Original Message ----- From: "John Cowan" <cowan@ccil.org> To: "Mark Davis" <mark.davis@jtcsv.com> Cc: "David Keegel" <djk@cybersource.com.au>; "Paul Eggert" <eggert@CS.UCLA.EDU>; <tz@lecserver.nci.nih.gov> Sent: Sun, 2004 Jun 13 18:30 Subject: Re: Time Zone Localizations

...

Mark Davis scripsit:

...
3. What is the meaning of a TZID

Just what is the meaning of "meaning" in this context?

A TZID is not just a sequence of letters, it has a meaning. That meaning appears to be explained in Theory, but your comments below make me very nervous.

...

...
4. What is the versioning scheme, including assurance that: - once a version is issued it is never changed. - TZIDs are stable, in the sense that they will never be withdrawn or reused with a substantially different semantic in later versions

Again, what does that mean? New York might move to a new timezone, if Congress so decided, or if it became an independent country (hey, we pay them more than they pay us). In that case the Eastern zone would have to be renamed America/Philadelphia, or whatever.

This is exactly the kind of thing that needs to be made clear, exactly what needs to be specified by the "meaning". There are at least two choices: 1. America/New_York will always be the timezone for the city of New York, even if it secedes from the US, joins Canada, and goes on Newfoundland time. 2. America/New_York is a stand-in for US Eastern Time. If New York City secedes and changes, the America/New_York TZID will associated with US Eastern Time. Of course, that may be unlikely for NYC, but the principle has to be clear so that its application to other TZIDs is well-defined.

...

More probably, Congress might change the DST rules, in which case the current predictions of the U.S. zones would become incorrect and would have to be updated.

The only other conceivable scenario would be for the Big Apple to have its name changed (Nueva York might be a possibility). In that case, some other New York might come to prominence such that it became the largest city in some other time zone in the Americas. That's not very likely, but it would constitute a genuine reuse.

And if America/New_York is to be stable, that would, of course, be a huge mistake, akin to changing CS from Czechoslovakia to Serbia and Montenegro. Simply because the city were renamed should have no effect on the TZID. If it were necessary to have another city because, essentially, the zone that was associated with NYC needs to be split, that is another matter. One would expect a new TZID to be introduced in that case.

...

The advantage and disadvantage of Olson TZIDs is that they aren't arbitrary: they depend entirely on facts on the ground.

That is a bit too facile. They don't depend entirely on the facts on the ground; they have to have a clear meaning so that when the "facts on ground" change, one can have reasonable expectations as to what principles would be maintained in adjusting the TZIDs to the new circumstances.

...

-- John Cowan jcowan@reutershealth.com http://www.reutershealth.com "Mr. Lane, if you ever wish anything that I can do, all you will have to do will be to send me a telegram asking and it will be done." "Mr. Hearst, if you ever get a telegram from me asking you to do anything, you can put the telegram down as a forgery."

Paul Eggert

5:09 a.m.

"Mark Davis" <mark.davis@jtcsv.com> writes:

...

There are at least two choices:

1. America/New_York will always be the timezone for the city of New York, even if it secedes from the US, joins Canada, and goes on Newfoundland time.

That's correct.

...

2. America/New_York is a stand-in for US Eastern Time. If New York City secedes and changes, the America/New_York TZID will associated with US Eastern Time.

No, America/New_York is intended to be New York time.

...

And if America/New_York is to be stable, that would, of course, be a huge mistake,

That's what the 'backward' file is for. For example, when we discovered that the name "Ulaanbaatar" had become far more common in English than "Ulan Bator", we renamed Asia/Ulan_Bator to Asia/Ulaanbaatar and put a link in the "backward" file. This sort of thing happens occasionally, just as ISO 3166 renames country codes.
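To make the role of the 'backward' file concrete, here is a small sketch that reads zic-style "Link TARGET LINK-NAME" lines and maps an old name to its current one. The Ulaanbaatar line mirrors the rename described above; the function names are hypothetical, and the parser handles only Link lines and "#" comments, not the full zic input format.

```python
# One Link line in zic input format: Link TARGET LINK-NAME
BACKWARD_SAMPLE = """\
# Old names kept for compatibility
Link\tAsia/Ulaanbaatar\tAsia/Ulan_Bator
"""


def parse_links(text: str) -> dict[str, str]:
    """Map each link name (old ID) to its target (current ID)."""
    aliases = {}
    for raw in text.splitlines():
        # Drop anything after '#', then split on whitespace.
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[0] == "Link":
            target, link_name = fields[1], fields[2]
            aliases[link_name] = target
    return aliases


def canonical(tzid: str, aliases: dict[str, str]) -> str:
    """Follow links until a name with no further target is reached."""
    seen = set()
    while tzid in aliases and tzid not in seen:
        seen.add(tzid)  # guard against accidental cycles
        tzid = aliases[tzid]
    return tzid


aliases = parse_links(BACKWARD_SAMPLE)
print(canonical("Asia/Ulan_Bator", aliases))   # Asia/Ulaanbaatar
print(canonical("America/New_York", aliases))  # America/New_York
```

A consumer that stores old IDs keeps working after a rename, because the old name still resolves through the link.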

...

they have to have a clear meaning so that when the "facts on ground" change, one can have reasonable expectations as to what principles would be maintained in adjusting the TZIDs to the new circumstances.

This issue is discussed at length in the Theory file.

John Cowan

5:53 a.m.

Mark Davis scripsit:

...

1. America/New_York will always be the timezone for the city of New York, even if it secedes from the US, joins Canada, and goes on Newfoundland time.

Cool idea. That way all other Canadians can hate New Yorkers up and down and sideways, taking some of the pressure off Toronto.

...

2. America/New_York is a stand-in for US Eastern Time. If New York City secedes and changes, the America/New_York TZID will associated with US Eastern Time.

"will remain", I suppose. No, Philadelphia (as of 1990, anyhow; I don't have current population data) would take over as the exemplar.

...

And if America/New_York is to be stable, that would, of course, be a huge mistake, akin to changing CS from Czechoslovakia to Serbia and Montenegro. Simply because the city were renamed should have no effect on the TZID.

The TZID would be reassigned with an alias provided. This has happened. However, AFAIK no TZID has ever been reused with a different meaning.

-- John Cowan http://www.reutershealth.com http://www.ccil.org/~cowan jcowan@reutershealth.com

Here lies the Christian, judge, and poet Peter, Who broke the laws of God and man and metre.

Paul Eggert

4:16 a.m.

"Mark Davis" <mark.davis@jtcsv.com> writes:

...

1. What are all the valid Olson TZIDs
2. How to determine which are 'canonical' and which are simply included for compatibility
...
5. An explicit description of the data representation for all of the data files.

We've covered these issues in our emails so far, I think, so all you'd need to do is write it all down.

...

3. What is the meaning of a TZID

Sorry, I don't know what you mean by "meaning of a TZID". A TZID is just a name. But perhaps you can just consult the Theory file to see the naming convention.

...

4. What is the versioning scheme,

You can get many (most?) old versions at <ftp://munnari.oz.au/pub/oldtz/>; the versioning scheme should be fairly evident though I suppose it wouldn't hurt to write it up.

...

including assurance that:

I'm afraid there is no warranty of any kind. This is entirely an informal volunteer effort.

...

- once a version is issued it is never changed.

That's been true in practice. For example, once tzdata2003d.tar.gz was issued, it wasn't changed; instead a newer version tzdata2003e.tar.gz was issued.

...

- TZIDs are stable, in the sense that they will never be withdrawn or reused with a substantially different semantic in later versions

That's also been true in practice, mostly. Generally speaking, TZIDs are never withdrawn; they're just moved to the 'backward' file.

However, I can think of one exception. In 1994 some of the GMT-related TZIDs did change their semantics to conform to POSIX. For example, the old TZID "GMT-12" was withdrawn and its replacement is called "Etc/GMT+12"; this is because POSIX required different semantics for TZ="GMT-12". This sort of confusion is one of the reasons why I don't encourage the use of TZIDs like "Etc/GMT+12".
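The sign confusion around the "Etc/GMT" names can be made concrete with a small Python sketch. It assumes only the POSIX-style convention that a "+" in the name means west of Greenwich; the helper name `etc_gmt_utc_offset` is made up for illustration and is not part of the tz code.

```python
import re
from datetime import timedelta


def etc_gmt_utc_offset(tzid: str) -> timedelta:
    """Return the offset from UTC for a name of the form "Etc/GMT+N" or "Etc/GMT-N".

    The sign in the name follows the POSIX TZ convention (positive means
    west of Greenwich), which is the reverse of the usual "UTC+N" notation.
    """
    match = re.fullmatch(r"Etc/GMT([+-])(\d{1,2})", tzid)
    if match is None:
        raise ValueError(f"not an Etc/GMT+N or Etc/GMT-N name: {tzid!r}")
    sign, hours = match.group(1), int(match.group(2))
    # "+" in the name means behind UTC, so the actual offset is negated.
    return timedelta(hours=-hours if sign == "+" else hours)


if __name__ == "__main__":
    for name in ("Etc/GMT+12", "Etc/GMT-12"):
        print(name, etc_gmt_utc_offset(name))
```

So "Etc/GMT+12" denotes a zone twelve hours behind UTC, which is easy to misread.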

Paul Eggert

12:08 a.m.

"Mark Davis" <mark.davis@jtcsv.com> writes:

...

If two IDs have exactly the same behavior since the time when time zones were adopted, and have always been in the same country over that period, we only want one of them to be in the main list.

If two Olson zone names are aliased via a "Link" command in the Olson database, it could be that they are true aliases (e.g., Asia/Nicosia and Europe/Nicosia) or it could be that they happen to be the same since 1970 and differed before then but we're not sure about the details (e.g., Europe/Oslo and Arctic/Longyearbyen) or it could be that they happen to have been the same since time zones were introduced but quite possibly will differ in the future (e.g., Antarctica/McMurdo and Antarctica/South_Pole). The database itself doesn't tell you which of these possibilities apply (though the comments give hints). However, one can safely conclude that if the Link command is in the "backward" or "etcetera" file then it is a true alias.

I think that for your purposes it's probably safe to ignore all Link commands, except perhaps for those needed to establish the existence of an Olson zone name for a particular country. If by "country" one means ISO 3166 country code, these zone names would be:

Arctic/Longyearbyen
Europe/Bratislava
Europe/Ljubljana
Europe/San_Marino
Europe/Sarajevo
Europe/Skopje
Europe/Vatican
Europe/Zagreb
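The rule of thumb above (a Link in "backward" or "etcetera" is a true alias; elsewhere it is undetermined) can be sketched in Python. This is only an illustration: the sample lines are invented excerpts rather than text copied from a release, and `Link`, `parse_links`, and `is_known_true_alias` are hypothetical names.

```python
from typing import Iterable, NamedTuple


class Link(NamedTuple):
    target: str       # existing zone that the alias points to
    alias: str        # additional name for that zone
    source_file: str  # tz source file in which the Link line appears


# Per the rule above, Links found in these files are true aliases.
TRUE_ALIAS_FILES = {"backward", "etcetera"}


def parse_links(source_file: str, text: str) -> Iterable[Link]:
    """Yield the Link lines found in the text of one tz source file."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "Link":
            yield Link(fields[1], fields[2], source_file)


def is_known_true_alias(link: Link) -> bool:
    """True only when the file alone guarantees a true alias."""
    return link.source_file in TRUE_ALIAS_FILES


if __name__ == "__main__":
    samples = {
        "europe": "Link\tEurope/Oslo\tArctic/Longyearbyen\n",
        "backward": "Link\tAmerica/New_York\tUS/Eastern\n",
    }
    for name, text in samples.items():
        for link in parse_links(name, text):
            print(link.alias, "->", link.target, is_known_true_alias(link))
```

A False result does not mean the two names differ; it only means the file name alone does not settle the question, and the comments in the data would have to be read.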

Mark Davis

1:21 a.m.

Yes, it would be very nice if there were explicit data in the database that would allow someone to programmatically derive the following:

1. The set of links to not skip (because then there would be country IDs with no TZID).

...

Arctic/Longyearbyen // or Atlantic/Jan_Mayen, don't care which
Europe/Bratislava
Europe/Ljubljana
Europe/San_Marino
Europe/Sarajevo
Europe/Skopje
Europe/Vatican
Europe/Zagreb

(http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/formatting/zon... og.html#skipped_aliases)

2. The set of oddballs to skip (in forming a choice list): WET, CET, MET, EET, Asia/Riyadh87, Asia/Riyadh88, Asia/Riyadh89

3. And if the database had unique TZIDs corresponding to the 'missing' ISO country codes BV, HM, and a mapping in zone.tab for those, and from some private-use ISO country code to the GMT* codes (I suggested ZZ, but any one of the following is available: AA, QM-QZ, XA-XZ and ZZ). Actually, if the oddballs in #2 just didn't have a mapping from a private-use ISO code, that would suffice.

Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄
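As a rough illustration of how such data would be consumed, the following Python sketch forms a choice list from a set of TZIDs. The two constant sets are taken from the lists in this message; the function name `choice_list` and its two-argument interface are assumptions made for the example, not anything the database provides.

```python
# Link-only names that must be kept so every country code has a TZID.
KEPT_LINKS = {
    "Arctic/Longyearbyen", "Europe/Bratislava", "Europe/Ljubljana",
    "Europe/San_Marino", "Europe/Sarajevo", "Europe/Skopje",
    "Europe/Vatican", "Europe/Zagreb",
}

# Names to leave out of a choice list altogether.
ODDBALLS = {
    "WET", "CET", "MET", "EET",
    "Asia/Riyadh87", "Asia/Riyadh88", "Asia/Riyadh89",
}


def choice_list(all_tzids, link_only_tzids):
    """Return the sorted TZIDs to offer to a user.

    all_tzids:       every name known to the database
    link_only_tzids: names that exist only through a Link command
    """
    chosen = []
    for tzid in sorted(all_tzids):
        if tzid in ODDBALLS:
            continue  # skipped oddball
        if tzid in link_only_tzids and tzid not in KEPT_LINKS:
            continue  # ordinary alias, skipped
        chosen.append(tzid)
    return chosen


if __name__ == "__main__":
    names = {"Europe/Oslo", "Arctic/Longyearbyen", "Asia/Nicosia",
             "Europe/Nicosia", "WET"}
    print(choice_list(names, {"Arctic/Longyearbyen", "Europe/Nicosia"}))
```

Today both sets have to be maintained by hand, which is the reason for asking that the database carry the information explicitly.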

Paul Eggert

2:09 a.m.

"Mark Davis" <mark.davis@jtcsv.com> writes:

...

...
Arctic/Longyearbyen // or Atlantic/Jan_Mayen, don't care which

I'd suggest Longyearbyen over Jan Mayen, as its population is much larger.

...

3. And if the database had unique TZIDs corresponding to the 'missing' ISO country codes BV, HM

I think someone else has already addressed this issue, but those "countries" are uninhabited. The TZ database attempts to record the clock values that people actually use, so if there are no people then the local time is undefined.

More generally: as the number of people (at a location) shrinks, the question "what is the local time?" becomes more and more arbitrary. In some Antarctic locations it seems that the answer really and truly depends on who you're talking to. When the number of people equals zero, the value is undefined, so it doesn't make sense to put it into the database.

To give another instance of this problem: the uninhabited island of Clipperton is officially in the PF country code, but it doesn't correspond to any of the TZIDs for PF (Pacific/Tahiti, Pacific/Marquesas, Pacific/Gambier). If Clipperton ever becomes inhabited, it'd undoubtedly have a UTC offset that differed from those three entries (and quite possibly it would no longer belong to PF).

...

From the TZ database point of view, Clipperton is just like Bouvet: it's a small patch of land without any inhabitants, so it doesn't get an entry, even though it would deserve one if it were inhabited.

Masayoshi Okutsu

1:43 p.m.

This is a bit off from the proposal, but related to time zone localizations.

It appears that the Locale Data Markup Language spec for <timeZoneNames> (http://www.unicode.org/reports/tr35/#%3CtimeZoneNames%3E) assumes that a time zone has a single set of long and short names, an assumption that is not valid if a system supports historical time zone changes. Actually, the time zone support in Java has this problem because it has supported historical changes since 1.4 and always displays the "latest" time zone names. I planned to fix it in J2SE 1.5 (a.k.a. Tiger), but I couldn't due to another commitment.

Is it possible for CLDR to make corrections to the <timeZoneNames> spec so that it can represent all historical name changes?

Thanks,
--
Masayoshi Okutsu
Java Internationalization
Sun Microsystems (K.K.)

Mark Davis

3:25 p.m.

Actually, this is directly related, since LDML is the format used for CLDR. However, the comment is based on a misunderstanding: LDML currently does allow for translation of *all* of the timezone IDs, modern and historical.

The problems we are trying to address with this proposal are that the sheer volume of translations is difficult to manage, *and* many languages just don't have corresponding terms. And we didn't give guidance before as to which IDs were the most important to translate, so the translations that are in CLDR were not done in any kind of priority order.

Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄

Masayoshi Okutsu

3:41 p.m.

Mark Davis wrote:

...

Actually, this is directly related, since LDML is the format used for CLDR. However, the comment is based on a misunderstanding: LDML currently does allow for translation of *all* of the timezone IDs, modern and historical.

I guess you don't translate timezone IDs... Anyway, do you mean that LDML allows users to define the DTD? (Sorry if this is not the correct way to talk about XML...) So the syntax of <zone> is really user-defined?

Thanks,
Masayoshi

...

The problems we are trying to address with this proposal are that the sheer volume of translations is difficult to manage, *and* many languages just don't have corresponding terms. And we didn't give guidance before as to which IDs were the most important to translate, so the translations that are in CLDR were not done in any kind of priority order.

Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄


Mark Davis

6:19 p.m.

I don't know where you are getting that. They are *not* user-defined IDs. The text in http://www.unicode.org/reports/tr35/ defines the IDs as matching the IDs in ftp://elsie.nci.nih.gov/pub/. See also http://www.unicode.org/cldr/data_formats.html#Display_Names.

Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄

Masayoshi Okutsu

12:43 a.m.

Probably you have misunderstood my first question... Many zones have changed their time zones in their history. It's not a valid assumption that a single zone ID represents a single time zone.

It may be confusing to say "zone changes time zones". Let me give you an example. Here's the history of Asia/Singapore.

    # Zone  NAME            GMTOFF   RULES  FORMAT  [UNTIL]
    Zone    Asia/Singapore  6:55:25  -      LMT     1901 Jan 1
                            6:55:25  -      SMT     1905 Jun 1   # Singapore M.T.
                            7:00     -      MALT    1933 Jan 1   # Malaya Time
                            7:00     0:20   MALST   1936 Jan 1
                            7:20     -      MALT    1941 Sep 1
                            7:30     -      MALT    1942 Feb 16
                            9:00     -      JST     1945 Sep 12
                            7:30     -      MALT    1965 Aug 9   # independence
                            7:30     -      SGT     1982 Jan 1   # Singapore Time
                            8:00     -      SGT

From 1942-02-16 until 1945-09-12, Asia/Singapore's time zone was Japan Standard Time (GMT+09:00). (Sorry if this isn't a good example...)

The LDML spec didn't appear to support all historical time zones within a single zone. So what I'd like to see is something organized like:

    <zone type="Asia/Singapore" >
      <zoneNameSet format="SGT">
        <long>
          <generic>Singapore Time</generic>
          <standard>Singapore Time</standard>
          <daylight>Singapore Time</daylight>
        </long>
        <short>
          <generic>SGT</generic>
          <standard>SGT</standard>
          <daylight>SGT</daylight>
        </short>
      </zoneNameSet>
      ...
      <zoneNameSet format="LMT">
        <long>
          <generic>Local Mean Time</generic>
          ...
        <short>
          <generic>LMT</generic>
          ...
      </zoneNameSet>
      <exemplarCity>Singapore</exemplarCity>
    </zone>

Obviously use of this structure involves redundant names. Probably it should have all name sets which can be referred to by <zone>.

I got several bug reports on Java time zone support because it has supported all historical changes since 1.4. Some customers were just confused by the historically correct local time. Some of those reports could have been avoided if Java had been capable of giving historically correct time zone names.

Hope my question is now clearer.

Thanks,
Masayoshi
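The Asia/Singapore history quoted in this thread suggests that a display name has to be chosen by date as well as by TZID. A minimal Python sketch of that lookup follows. It treats each UNTIL value as a plain calendar date (ignoring time of day and the local/UTC distinction); `SINGAPORE_HISTORY`, `LONG_NAMES`, and `names_at` are hypothetical names, and the long names are limited to those mentioned in the thread.

```python
from bisect import bisect_right
from datetime import date

# (UNTIL date, FORMAT) pairs transcribed from the Zone lines quoted in this
# thread; the final entry has no UNTIL because it is still in force.
SINGAPORE_HISTORY = [
    (date(1901, 1, 1), "LMT"),
    (date(1905, 6, 1), "SMT"),
    (date(1933, 1, 1), "MALT"),
    (date(1936, 1, 1), "MALST"),
    (date(1941, 9, 1), "MALT"),
    (date(1942, 2, 16), "MALT"),
    (date(1945, 9, 12), "JST"),
    (date(1965, 8, 9), "MALT"),
    (date(1982, 1, 1), "SGT"),
    (None, "SGT"),
]

# Long names keyed by abbreviation; only names given in the thread.
LONG_NAMES = {
    "LMT": "Local Mean Time",
    "JST": "Japan Standard Time",
    "SGT": "Singapore Time",
}


def names_at(history, day):
    """Return (short, long) names in force on `day`.

    A line applies up to, but not including, its UNTIL date. When no long
    name is known, the abbreviation is returned for both.
    """
    untils = [until for until, _ in history if until is not None]
    abbreviation = history[bisect_right(untils, day)][1]
    return abbreviation, LONG_NAMES.get(abbreviation, abbreviation)


if __name__ == "__main__":
    print(names_at(SINGAPORE_HISTORY, date(1942, 2, 16)))
    print(names_at(SINGAPORE_HISTORY, date(2004, 6, 12)))
```

A table that stores one name set per TZID can only ever return the last row, which is the behavior the bug reports complained about.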

Masayoshi Okutsu

2:35 p.m.

It's strange. I responded to your message about 13 hours ago, but it doesn't show up yet... Let me try again (with a short version).

I believe you misunderstood my first question. It's simply an invalid assumption that a zone (in the Olson zoneinfo) represents a single time zone. How do you describe the following with the LDML spec, for example?

    # Zone  NAME            GMTOFF   RULES  FORMAT  [UNTIL]
    Zone    Asia/Singapore  6:55:25  -      LMT     1901 Jan 1
                            6:55:25  -      SMT     1905 Jun 1   # Singapore M.T.
                            7:00     -      MALT    1933 Jan 1   # Malaya Time
                            7:00     0:20   MALST   1936 Jan 1
                            7:20     -      MALT    1941 Sep 1
                            7:30     -      MALT    1942 Feb 16
                            9:00     -      JST     1945 Sep 12
                            7:30     -      MALT    1965 Aug 9   # independence
                            7:30     -      SGT     1982 Jan 1   # Singapore Time
                            8:00     -      SGT

If the given date is, for example, 1942-02-16, the local time zone name has to be "JST" (in the short form). I didn't think LDML would allow for defining historical time zone names. I didn't mean "historical" time zone *IDs*.

(I really don't understand "modern" and "historical" in your message, though. All of (most of?) the zones are modern. They just have their own history. And we just don't know what will happen to zones in the future. What you think is "modern" today might be "historical" tomorrow.)

Thanks,
Masayoshi

Mark Davis

4:24 p.m.

I think we may be talking past one another. LDML provides a way to *localize* the Olson TZIDs. For example, you can localize the term "Asia/Singapore". That latter ID is the thing that identifies a timezone.

LDML does not at all attempt to provide an alternative to *computing* the results of applying the Olson TZID to any given point in time. That is left to the implementation of the Olson time zone database. There is no need, nor desire, to duplicate that in LDML.

As far as we are concerned, 'historic' time zone support simply means the ability for the time zone computation to reflect differences in behavior that existed in the past but no longer exist. An implementation that was limited to 'modern' time zones (like Windows, or older Java or ICU) would not be able to distinguish between two zones that have the same behavior now, but differed at some time in the past.

So the Olson time zone database encompasses historic time zones, and has historic time zone IDs that LDML allows people to attach localizations to.

Is that clearer?

Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄

Masayoshi Okutsu

12:36 p.m.

Probably *my* comment wasn't clear enough for you. But I believe that I've been talking about the same thing as what was pointed out by Paul (the second bullet of Message-id: <87isdxxm5a.fsf@penguin.cs.ucla.edu>). I *am* talking about the names part, not the computing part.

For example, Java is capable of *computing* correct local time as of 1942-02-16 in Asia/Singapore. However, since Java has only the last time zone (display) names (i.e., "Singapore Time" and "SGT" for Asia/Singapore), Java date/time formatting produces "Singapore Time" or "SGT" for 1942-02-16, which should be "Japan Standard Time" or "JST". (tzcode is capable of producing correct abbreviations. But I didn't think that abbreviation only support for historic names was appropriate for Java and I didn't add them in 1.4.) I see the same issue in the LDML spec in <timeZoneNames> as Java currently has.

My earlier comment included the following which may explain what I'd like to see in the LDML spec.

<timeZoneNames>
  <zone type="America/Los_Angeles" >
    <zoneNameSet format="SGT">
      <long> <generic>Singapore Time</generic> ... </long>
      <short> <generic>SGT</generic> ... </short>
    </zoneNameSet>
    ...
    <zoneNameSet format="LMT">
      <long> <generic>Local Mean Time</generic> ... </long>
      <short> <generic>LMT</generic> ... </short>
    </zoneNameSet>
  </zone>
  ...
</timeZoneNames>

The format="..." may be problematic. But I haven't thought out any real syntax for the requirement.

Thanks, Masayoshi

Mark Davis wrote:

...

I think we may be talking past one another. LDML provides a way to *localize* the Olson TZIDs. For example, you can localize the term "Asia/Singapore". That latter ID is the thing that identifies a timezone.

It does not at all attempt to provide an alternative to *computing* the results of applying the Olson TZID to any given point in time. That is left to the implementation of the Olson time zone database. There is no need, nor desire, to duplicate that in the LDML.

As far as we are concerned, 'historic' time zone support simply means the ability for the time zone computation to reflect differences in behavior that existed in the past but no longer exist. An implementation that was only limited to 'modern' time zones (like Windows, or older Java or ICU) would not be able to distinguish between two zones that have the same behavior now, but differed at some time in the past. So the Olson time zone database encompasses historic time zones, and has historic time zone IDs that LDML allows people to attach localizations to.

Is that clearer?

Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄

----- Original Message ----- From: "Masayoshi Okutsu" <Masayoshi.Okutsu@Sun.COM> To: "Mark Davis" <mark.davis@jtcsv.com> Cc: <tz@lecserver.nci.nih.gov> Sent: Sat, 2004 Jun 12 07:35 Subject: Re: Time Zone Localizations

It's strange. I responded to your message about 13 hours ago, but it doesn't show up yet... Let me try again (with a short version).

I believe you misunderstood my first question. It's simply an invalid assumption that a zone (in the Olson zoneinfo) represents a single time zone. How do you describe the following with the LDML spec, for example?

# Zone  NAME            GMTOFF   RULES  FORMAT  [UNTIL]
Zone    Asia/Singapore  6:55:25  -      LMT     1901 Jan 1
                        6:55:25  -      SMT     1905 Jun 1   # Singapore M.T.
                        7:00     -      MALT    1933 Jan 1   # Malaya Time
                        7:00     0:20   MALST   1936 Jan 1
                        7:20     -      MALT    1941 Sep 1
                        7:30     -      MALT    1942 Feb 16
                        9:00     -      JST     1945 Sep 12
                        7:30     -      MALT    1965 Aug 9   # independence
                        7:30     -      SGT     1982 Jan 1   # Singapore Time
                        8:00     -      SGT

If the given date is, for example, 1942-02-16, the local time zone name has to be "JST" (in the short form). I didn't think LDML would allow for defining historical time zone names. I didn't mean "historical" time zone *IDs*. (I really don't understand "modern" and "historical" in your message, though. All of (most of?) the zones are modern. They just have their own history. And we just don't know what will happen to zones in the future. What you think is "modern" today might be "historical" tomorrow.)

Thanks, Masayoshi
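
Editorial sketch: the lookup described in the message above, picking the abbreviation that was in effect on a given date, might look roughly like the following in Python. The rows are transcribed from the quoted Asia/Singapore zone lines; the names ROWS, FINAL_ABBREVIATION and abbreviation_on are hypothetical, and the sketch assumes each UNTIL takes effect at local midnight, which simplifies the real zoneinfo rules.

```python
from bisect import bisect_right
from datetime import date

# (UNTIL date, abbreviation) pairs transcribed from the Asia/Singapore
# zone lines quoted in the message. Each row applies up to, but not
# including, its UNTIL date (simplified to local midnight here).
ROWS = [
    (date(1901, 1, 1), "LMT"),
    (date(1905, 6, 1), "SMT"),
    (date(1933, 1, 1), "MALT"),
    (date(1936, 1, 1), "MALST"),
    (date(1941, 9, 1), "MALT"),
    (date(1942, 2, 16), "MALT"),
    (date(1945, 9, 12), "JST"),
    (date(1965, 8, 9), "MALT"),
    (date(1982, 1, 1), "SGT"),
]
# The last zone line has no UNTIL, so it applies from 1982 onward.
FINAL_ABBREVIATION = "SGT"

UNTILS = [until for until, _ in ROWS]


def abbreviation_on(day: date) -> str:
    """Return the abbreviation in effect on the given local date."""
    # Index of the first row whose UNTIL is strictly after `day`.
    index = bisect_right(UNTILS, day)
    if index < len(ROWS):
        return ROWS[index][1]
    return FINAL_ABBREVIATION


print(abbreviation_on(date(1942, 2, 16)))
print(abbreviation_on(date(2004, 6, 13)))
```

A formatter that keeps only the last row, as the message says Java does, would return "SGT" for every date in this table.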

Mark Davis wrote:

...
I don't know where you are getting that. They are *not* user-defined IDs. The text in http://www.unicode.org/reports/tr35/ defines the IDs as matching the IDs in ftp://elsie.nci.nih.gov/pub/. See also http://www.unicode.org/cldr/data_formats.html#Display_Names.

Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄


Mark Davis

11:57 p.m.

I am just not quite understanding what you are getting at. In CLDR, the English translation of the TZID Asia/Singapore would be in the en locale data; the Japanese translation of TZID Asia/Singapore would be in the ja locale file, etc. (Note: we don't necessarily have the data for everything yet, but the LDML format allows all of that.)

The best I can make out, it sounds like what you want would be the 1942 Japanese names for the TZID Asia/Singapore, etc. While it would certainly be possible to do that with CLDR by providing a variant locale (ja-JP-1942), it seems rather odd. When I generate a date right now, and the date happens to be in the past at some time, I don't generate it with the conventions that would have applied *on that date* (unless I am doing a historical novel, for example). I don't write 1624-1-15 as "The Fifteenth Day of January in the Year of Our Lord Sixteen-Hundred Four-and-Twenty", or whatever would have been accepted usage at the time: I write it (in en_US) as "January 15th, 1624" (or some other modern equivalent).

So I must still be misunderstanding you.

(Also, we looked at using the Olson TZID abbreviations, but they don't appear to have wide currency -- people in the countries in question didn't seem to be familiar with them -- so we decided not to use them.)

Mark

(BTW, I will be on a trip next week, and won't be able to reply very often on this subject)

__________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄

John Cowan

3:05 a.m.

Mark Davis scripsit:

...

When I generate a date right now, and the date happens to be in the past at some time, I don't generate it with the conventions that would have applied in *on that date* (unless I am doing a historical novel, for example).

Well, that turns out not to be the case. Try "date -d 1943-01-01" on a system where GNU 'date' is available (e.g., Linux). Set TZ to an American time zone first if need be.

...

(Also, we looked at using the Olson TZID abbreviations, but they don't appear to have wide currency -- people in the countries in question didn't seem to be familiar with them -- so we decided not to use them.)

They are not meant to be authentic, and exist because time libraries are expected to provide a time zone abbreviation even where they don't really exist.

-- John Cowan jcowan@reutershealth.com www.reutershealth.com www.ccil.org/~cowan Rather than making ill-conceived suggestions for improvement based on uninformed guesses about established conventions in a field of study with which familiarity is limited, it is sometimes better to stick to merely observing the usage and listening to the explanations offered, inserting only questions as needed to fill in gaps in understanding. --Peter Constable

Mark Davis

4 a.m.

...

Well, that turns out not to be the case.

You are saying that if I generated a series of dates, one per year, going back 100 years, that the format would change in the middle? If my spreadsheet did that, I'd figure it was a bug -- *not* a feature.

Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄

Paul Eggert

5 a.m.

"Mark Davis" <mark.davis@jtcsv.com> writes:

...

You are saying that if I generated a series of dates, one per year, going back 100 years, that the format would change in the middle?

Absolutely, if by "dates" you mean time stamps including the name of the local time in question. This is the only feasible way to address the problem. Locations can have several different UTC offsets over the years (not just two). For example, "Sri Lanka Time" means UTC+6 for current time stamps, but it means UTC+6:30 for time stamps in summer 1996 because Sri Lanka changed its definition of standard time in October 1996. Numeric UTC offsets avoid this problem, but of course they have some other problems of their own. There's no free lunch here.
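
Editorial sketch: the ambiguity described above can be illustrated with two fixed-offset time stamps that share one display name. The two offsets are the ones given in the message; the sample dates, the variable names and the helpers with_name and with_offset are illustrative assumptions, not anything taken from tzcode or CLDR.

```python
from datetime import datetime, timedelta, timezone

# Same display name, two different UTC offsets, as stated in the message.
# The exact sample dates below are illustrative choices.
SLT_SUMMER_1996 = timezone(timedelta(hours=6, minutes=30), "Sri Lanka Time")
SLT_2004 = timezone(timedelta(hours=6), "Sri Lanka Time")

summer_1996 = datetime(1996, 7, 1, 12, 0, tzinfo=SLT_SUMMER_1996)
mid_2004 = datetime(2004, 6, 14, 12, 0, tzinfo=SLT_2004)


def with_name(stamp: datetime) -> str:
    """Format with the zone name only; the offset is not visible."""
    return stamp.strftime("%Y-%m-%d %H:%M ") + stamp.tzname()


def with_offset(stamp: datetime) -> str:
    """Format with the numeric UTC offset, which is unambiguous."""
    return stamp.isoformat(timespec="minutes")


for stamp in (summer_1996, mid_2004):
    print(with_name(stamp), "|", with_offset(stamp))
```

Both stamps print the same name, yet noon corresponds to 05:30 UTC in the first case and 06:00 UTC in the second; only the numeric form shows the difference.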

John Cowan

5:56 a.m.

Mark Davis scripsit:

...

You are saying that if I generated a series of dates, one per year, going back 100 years, that the format would change in the middle? If my spreadsheet did that, I'd figure it was a bug -- *not* a feature.

Time zones are infested with the Real World, I fear. L.A. was on UTC-7 in January 1943, but it would be incorrect to call that Pacific Daylight Time.

-- John Cowan www.ccil.org/~cowan www.reutershealth.com jcowan@reutershealth.com All "isms" should be "wasms". --Abbie

Paul Eggert

4:41 a.m.

"Mark Davis" <mark.davis@jtcsv.com> writes:

...

(Also, we looked at using the Olson TZID abbreviations, but they don't appear to have wide currency -- people in the countries in question didn't seem to be familiar with them -- so we decided not to use them.)

Yes, as tz-link.htm says, in many cases they are the invention of the tz maintainers. If you have a reliable source of English-language abbreviations for these other local times, please let us know.

Masayoshi Okutsu

6:18 a.m.

Probably I've given you a confusing example. It's not related to the Japanese locale at all. Since Paul and others have given other examples, like "Pacific War Time", I don't think I need to give additional ones. It is confusing to produce a *correct* (past) local time with the current time zone name, as if the current GMT offset and DST rules applied to that local time.

Thanks, Masayoshi
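
Editorial sketch: one way to read the <zoneNameSet format="..."> idea from earlier in the thread is as a table keyed by both the zone ID and the abbreviation in effect, so that a formatter can pick the historical name, not just the latest one. The structure below is a hypothetical Python model, not LDML syntax; the three name pairs are the ones mentioned in the thread, and NAME_SETS and historical_name are made-up names.

```python
# Hypothetical model: zone ID -> abbreviation in effect -> display names.
# Only names that appear in the thread are listed.
NAME_SETS = {
    "Asia/Singapore": {
        "SGT": {"long": "Singapore Time", "short": "SGT"},
        "JST": {"long": "Japan Standard Time", "short": "JST"},
        "LMT": {"long": "Local Mean Time", "short": "LMT"},
    },
}


def historical_name(tzid: str, abbreviation: str, width: str = "long") -> str:
    """Return the display name for the abbreviation in effect.

    Falls back to the bare abbreviation when no name set is known
    (an assumption of this sketch, not a rule from any spec).
    """
    name_set = NAME_SETS.get(tzid, {}).get(abbreviation)
    if name_set is None:
        return abbreviation
    return name_set[width]


# The abbreviation would come from the time zone computation for the
# date being formatted, for example "JST" for 1942-02-16.
print(historical_name("Asia/Singapore", "JST"))
print(historical_name("Asia/Singapore", "SGT", width="short"))
```

Keying on the abbreviation keeps the computation in the time zone code and the names in the locale data, which is the split both sides of the thread seem to want.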

Guy Harris

3:14 a.m.

On Sat, Jun 12, 2004 at 09:24:34AM -0700, Mark Davis wrote:

...

I think we may be talking past one another. LDML provides a way to *localize* the Olson TZIDs. For example, you can localize the term "Asia/Singapore".

Actually, "Asia/Singapore" is, in the Olson code, a relative pathname, so, in all locales, it's "Asia/Singapore" - i.e., the localization function is a no-op. You might want to localize a textual description of that region, but that's a different matter.

Mark Davis

4:22 a.m.

I must not be making the situation clear. The internal code is invariant across locales. But the "display name" for that code will change dramatically. That is, the TZID will not change. But its representation to local users must; because they may not know English.

For example, "Europe/London" will remain constant. But the "display name" for it will reflect the representations of London (or "British Time"), such as the following:

"Londain": ·ga·
"Londen": ·nl·
"London": ·da· ·en· ·fr· ·sv·
"Londra": ·it·
"Londres": ·es· ·pt·
"Lontoo": ·fi·
"ロンドン": ·ja·
"伦敦": ·zh·
"倫敦": ·zh_Hant·
"런던": ·ko·

Expecting otherwise is like expecting you to recognize 倫敦 as the name of a TZID.

Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄

----- Original Message ----- From: "Guy Harris" <guy@alum.mit.edu> To: "Mark Davis" <mark.davis@jtcsv.com> Cc: "Masayoshi Okutsu" <Masayoshi.Okutsu@sun.com>; <tz@lecserver.nci.nih.gov> Sent: Sun, 2004 Jun 13 20:14 Subject: Re: Time Zone Localizations

...

On Sat, Jun 12, 2004 at 09:24:34AM -0700, Mark Davis wrote:

...
I think we may be talking past one another. LDML provides a way to *localize* the Olson TZIDs. For example, you can localize the term "Asia/Singapore".

Actually, "Asia/Singapore" is, in the Olson code, a relative pathname, so, in all locales, it's "Asia/Singapore" - i.e., the localization function is a no-op.

You might want to localize a textual description of that region, but that's a different matter.

Guy Harris

8:33 p.m.

On Jun 13, 2004, at 9:22 PM, Mark Davis wrote:

...

I must not be making the situation clear. The internal code is invariant across locales. But the "display name" for that code will change dramatically. That is, the TZID will not change. But its representation to local users must; because they may not know English. For example, "Europe/London" will remain constant.

I would not describe that as "localizing the Olson TZIDs"; I'd describe it as "localizing the names corresponding to the regions for given Olson TZIDs" or "localizing the names for the types of time (e.g., "British Summer Time") for given Olson TZIDs". I.e., you do not localize the term "Asia/Singapore". You localize the term "Singapore Time".
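
The distinction drawn here (an invariant TZID versus a localized display name) can be sketched in a few lines of Python. This is only an illustration, not CLDR's actual data model: the table `DISPLAY_NAMES` and the function `display_name` are hypothetical names, the entries are copied from the "Europe/London" examples quoted in this thread, and the fallback by truncating the locale is an assumption made for the sketch.

```python
# Hypothetical table: the TZID is the invariant key; only the display
# name varies by locale. Entries are taken from the thread's examples.
DISPLAY_NAMES = {
    "Europe/London": {
        "ga": "Londain",
        "nl": "Londen",
        "en": "London",
        "it": "Londra",
        "es": "Londres",
        "fi": "Lontoo",
        "ja": "ロンドン",
        "zh": "伦敦",
        "zh_Hant": "倫敦",
        "ko": "런던",
    },
}


def display_name(tzid: str, locale: str) -> str:
    """Return a localized display name, falling back to the TZID itself."""
    names = DISPLAY_NAMES.get(tzid, {})
    # Assumed fallback: try the full locale, then shorter prefixes
    # (e.g. "zh_Hant_TW" -> "zh_Hant" -> "zh").
    parts = locale.split("_")
    while parts:
        candidate = "_".join(parts)
        if candidate in names:
            return names[candidate]
        parts.pop()
    # No translation known: the TZID stays exactly as it is.
    return tzid
```

Note that the TZID string is never modified; a locale with no entry simply gets the raw identifier back.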

Paul Eggert

7:37 a.m.

"Mark Davis" <mark.davis@jtcsv.com> writes:

...

http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/formatting/tim...

Commenting on the version of 2004-06-11:

* The document uses the term "CLDR" without defining it.

* 'These translations mark a difference between "generic" usage (aka "wall time") like "Pacific Time" and a fixed offset from GMT like "Pacific Standard Time" or "Pacific Daylight Time", and also allow for both abbreviated and full names.'

This doesn't seem to allow for variants in names (e.g. DST in Los Angeles has variously been called "Pacific Daylight Time", "Pacific War Time", and "Pacific Peace Time"). Nor does it seem to allow for double DST.

* The document defines "modern" equivalents as those that produce the same result for the current year. More accurate, I think, would be that two zones would have to predict the same results for the current year, the next year, the year after that, etc. In some cases two zones might predict the same results for the current year, but not for next year.

* "windows ids shows a mapping from windows IDs to Olson IDs." You might mention that this is a one-to-many mapping, and the Olson ID is chosen somewhat arbitrarily.

* "reset of the year" -> "rest of the year"

* 'If there is an exact translation, use it. America/Los_Angeles => "Pacific Time"'

This example doesn't seem to allow for the case where a city changes time zones. These things happen. (It just happened in Argentina last week.)

If you really want a name that means US Pacific Time, you'll need to supply one. Historically the Olson TZID "US/Pacific" served this purpose, but it has been supplanted by geographical IDs partly to avoid ambiguity, partly to avoid controversy.

* 'America/Los_Angeles => "Tampo de San Fransisko"' I don't understand this example. Why is Los Angeles being translated as if it were San Francisco?

* "Note that the results are semi-reversible; one cannot necessarily recover the exact time zone that one started with, but can recover a modern equivalent." I don't understand this claim completely, but I'm skeptical. Natural-language time notations are notoriously ambiguous.

* "In Mexico one would prefer to see the America/Mexico_City timezone (appropriately localized), while in the US, the America/Chicago one." This is not merely an issue of ease-of-use: it is also important when one wants to specify the desired behavior in the presence of likely future changes. In 2001, the mayor of Mexico City threatened to abolish DST in that city, and the matter remains somewhat controversial, so it's quite possible that Mexico City and Chicago will diverge in the not-too-distant future.

* Some of the document uses notations like "GMT+7" to mean west of Greenwich; some of it uses the same notation to mean east of Greenwich. It should be consistent. I suggest "GMT+7" to mean east of Greenwich. Also, these days it's more technically accurate to write "UTC" instead of "GMT".

* At the HTML level the document specifies charset=windows-1252 and the four HTML lines containing non-ASCII characters didn't come out right in my browser. Can you please use plain ASCII instead, e.g., use "&laquo;" instead of the Windows-1252 byte that means left-pointing double angle quotation mark? Or it may be simpler to reformulate the examples to avoid non-ASCII characters.

* I don't really understand the proposal. Sorry. It's fairly complicated and I don't fully understand the intended use or the motivation. Could be this is because I'm not in the CLDR universe. Anyway, this means I can't really comment on the fundamentals of the document, only on relatively-minor issues like the above.

Oscar van Vlijmen

3:24 p.m.

New subject: Time Zone Localizations (encodings)

Paul Eggert wrote:

...

* At the HTML level the document specifies charset=windows-1252 and the four HTML lines containing non-ASCII characters didn't come out right in my browser. Can you please use plain ASCII instead, e.g., use "&laquo;" instead of the Windows-1252 byte that means left-pointing double angle quotation mark? Or it may be simpler to reformulate the examples to avoid non-ASCII characters.

Gave no problem in a couple of my browsers, for instance Macintosh Netscape 4 and Internet Explorer 5. The "offending" characters were encoded correctly, namely as literals according to the character set windows-1252.

Another way to encode the "left-pointing double angle quotation mark" is indeed by "&laquo;". A serious problem with this type of "named entity" is that there are 2451 named entities defined in the ISO/IEC DTR 9573-13 2nd Ed. standard <http://www.w3.org/2003/entities/iso9573-2003doc/9573.html> (or a more recent version), but only about one hundred of these are really supported by most browsers since version 4 (Netscape/Microsoft) on most platforms. Much better supported are numbered entities like &#xAB; (hexadecimal) or &#171; (decimal). Decimal entities are preferred due to somewhat better compatibility with older browsers and because you can copy them directly from most DTD entity definition files, e.g. the *.ent files from w3.org.

If you have to use characters outside the defined character set (in this case windows-1252), then hexadecimal/decimal entities are obligatory. In that case a utf-8 character set designation for the HTML document is advised, but this means that all windows-1252 literals should be re-encoded as numbered entities according to Unicode positions.

In short: for multilingual HTML pages one should use a utf-8 character set designation (per meta or HTTP header) and encode all special characters with numbered entities according to Unicode positions.

Oscar van Vlijmen 2004-06-12

Sorry, but I don't Cc emails to discussion partners personally if an email to the tz-list should be sufficient.
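
The three encodings discussed above (a windows-1252 literal, a named entity, and numbered entities) can be compared with a short Python sketch. It uses only the standard library; the helper names `decimal_reference`, `hex_reference` and `to_ascii_html` are made up for this illustration.

```python
import html

LAQUO = "\u00ab"  # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK


def decimal_reference(ch: str) -> str:
    """Decimal numbered entity for a single character."""
    return f"&#{ord(ch)};"


def hex_reference(ch: str) -> str:
    """Hexadecimal numbered entity for a single character."""
    return f"&#x{ord(ch):X};"


def to_ascii_html(text: str) -> str:
    """Replace every non-ASCII character with a decimal numbered entity."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# The literal form: a single byte in the windows-1252 character set.
cp1252_literal = LAQUO.encode("cp1252")

# All three entity spellings decode to the same character.
decoded = [html.unescape(s) for s in ("&laquo;", "&#171;", "&#xAB;")]
```

Converting a whole page with something like `to_ascii_html` yields plain ASCII that no longer depends on the declared character set, which is the property both posters are after.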

Mark Davis

8:15 p.m.

Thanks very much for your comments. I put up a new version of http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/formatting/tim... which tries to address these comments and some of the others that I've gotten on this list.

Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄

----- Original Message ----- From: "Paul Eggert" <eggert@CS.UCLA.EDU> To: "Mark Davis" <mark.davis@jtcsv.com> Cc: <tz@lecserver.nci.nih.gov> Sent: Sat, 2004 Jun 12 00:37 Subject: Re: Time Zone Localizations

...

"Mark Davis" <mark.davis@jtcsv.com> writes:

...
http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/formatting/tim...

Commenting on the version of 2004-06-11:

* The document uses the term "CLDR" without defining it.

* 'These translations mark a difference between "generic" usage (aka "wall time") like "Pacific Time" and a fixed offset from GMT like "Pacific Standard Time" or "Pacific Daylight Time", and also allow for both abbreviated and full names.'

This doesn't seem to allow for variants in names (e.g. DST in Los Angeles has variously been called "Pacific Daylight Time", "Pacific War Time", and "Pacific Peace Time"). Nor does it seem to allow for double DST.

* The document defines "modern" equivalents as those that produce the same result for the current year. More accurate, I think, would be that two zones would have to predict the same results for the current year, the next year, the year after that, etc. In some cases two zones might predict the same results for the current year, but not for next year.

* "windows ids shows a mapping from windows IDs to Olson IDs." You might mention that this is a one-to-many mapping, and the Olson ID is chosen somewhat arbitrarily.

* "reset of the year" -> "rest of the year"

* 'If there is an exact translation, use it. America/Los_Angeles => "Pacific Time"'

This example doesn't seem to allow for the case where a city changes time zones. These things happen. (It just happened in Argentina last week.)

If you really want a name that means US Pacific Time, you'll need to supply one. Historically the Olson TZID "US/Pacific" served this purpose, but it has been supplanted by geographical IDs partly to avoid ambiguity, partly to avoid controversy.

* 'America/Los_Angeles => "Tampo de San Fransisko"' I don't understand this example. Why is Los Angeles being translated as if it were San Francisco?

* "Note that the results are semi-reversible; one cannot necessarily recover the exact time zone that one started with, but can recover a modern equivalent." I don't understand this claim completely, but I'm skeptical. Natural-language time notations are notoriously ambiguous.

* "In Mexico one would prefer to see the America/Mexico_City timezone (appropriately localized), while in the US, the America/Chicago one." This is not merely an issue of ease-of-use: it is also important when one wants to specify the desired behavior in the presence of likely future changes. In 2001, the mayor of Mexico City threatened to abolish DST in that city, and the matter remains somewhat controversial, so it's quite possible that Mexico City and Chicago will diverge in the not-too-distant future.

* Some of the document uses notations like "GMT+7" to mean west of Greenwich; some of it uses the same notation to mean east of Greenwich. It should be consistent. I suggest "GMT+7" to mean east of Greenwich. Also, these days it's more technically accurate to write "UTC" instead of "GMT".

* At the HTML level the document specifies charset=windows-1252 and the four HTML lines containing non-ASCII characters didn't come out right in my browser. Can you please use plain ASCII instead, e.g., use "&laquo;" instead of the Windows-1252 byte that means left-pointing double angle quotation mark? Or it may be simpler to reformulate the examples to avoid non-ASCII characters.

* I don't really understand the proposal. Sorry. It's fairly complicated and I don't fully understand the intended use or the motivation. Could be this is because I'm not in the CLDR universe. Anyway, this means I can't really comment on the fundamentals of the document, only on relatively-minor issues like the above.

Paul Eggert

2:31 a.m.

"Mark Davis" <mark.davis@jtcsv.com> writes:

...

http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/formatting/tim...

A few comments on the 2004-06-12 edition:

* Re "Eastern European Daylight Time". A more common English translation is "Eastern European Summer Time". This is certainly true for British English, and tends to be true even for American sources like the Olson database.

I think the usual idioms in practice are as follows.

legend:
- generic time name (and abbreviation)
0 standard time name (and abbreviation)
1 daylight-saving time name (and abbreviation)

example of typical American English style:
- Eastern Time (ET)
0 Eastern Standard Time (EST)
1 Eastern Daylight Time (EDT)

example of typical Australian English style:
- Eastern Time (ET)
0 Eastern Standard Time (EST)
1 Eastern Summer Time (EST)

example of typical European English style:
- Eastern European Time (EET)
0 Eastern European Time (EET)
1 Eastern European Summer Time (EEST)

Note that some names and/or abbreviations are ambiguous in some locales. This suggests that one may run into some problems with the idea of having UIs use names to distinguish the cases labeled "-", "0", "1".

This is somewhat related to the issue I previously mentioned that, even in the same locale, the name differs depending on what time stamps you're talking about. For example "Pacific War Time" should be used for daylight-saving time in Los Angeles from 1942-02-09 through 1945-08-14.

* I don't understand the syntax of the examples starting <ldml><dates>....

* Re "Note: the Olson TZIDs uses the opposite sign as RFC 822 with GMT formats". This is because the Olson TZIDs use the same sign as POSIX TZ settings.

* "They are organized by cities" -> "They are organized by compact locations, typically cities." Sometimes they're islands or bases.

As far as the requests go (in part F):

* "A list of the set of links to not skip". These would be all the Link lines that are mentioned in zone.tab.

* "The addition of unique TZIDs corresponding to the 'missing' ISO country codes BV, HM". We've discussed this. These areas are uninhabited and have no local time, so it's questionable that they need TZIDs.

By the way, there's one other 'missing' ISO country code that I just discovered: AX, for the Aaland Islands. (AX was added on February 13 but nobody told us. :-) I'll add a new TZID Europe/Mariehamn for it, in my next proposed tz update. It'll be an alias for Europe/Helsinki.

* "A mapping from some private-use ISO country code to the Etc/GMT* TZIDs."

The Olson Etc/GMT* TZIDs are pretty much a subset of POSIX, which allows an essentially unlimited set of time zone IDs. Most of the POSIX TZIDs will never be used in practice; conversely, the Olson Etc/GMT* TZIDs do not suffice (for example, they don't suffice for Los Angeles, or for India).

I'm not sure it makes sense to single out the Etc/GMT* IDs. Perhaps you need a way to specify any POSIX ID instead.
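
Two of the points above lend themselves to a small sketch: the POSIX-style sign of the Etc/GMT* IDs, and the fact that those IDs only cover whole-hour offsets. The Python below is a hedged illustration, not part of the tz code; `etc_gmt_utc_offset` and `etc_gmt_id_for` are hypothetical helper names, and the range Etc/GMT-14 through Etc/GMT+12 is assumed from the tz database's `etcetera` file.

```python
import re
from datetime import timedelta
from typing import Optional

_ETC_GMT = re.compile(r"^Etc/GMT([+-])(\d{1,2})$")


def etc_gmt_utc_offset(tzid: str) -> timedelta:
    """UTC offset (positive = east of Greenwich) for an Etc/GMT+N or Etc/GMT-N ID."""
    m = _ETC_GMT.match(tzid)
    if m is None:
        raise ValueError(f"not an Etc/GMT+N or Etc/GMT-N ID: {tzid!r}")
    sign, hours = m.group(1), int(m.group(2))
    # POSIX-style sign: "+" means west of Greenwich, i.e. behind UTC.
    return timedelta(hours=-hours if sign == "+" else hours)


def etc_gmt_id_for(offset: timedelta) -> Optional[str]:
    """Etc/GMT* ID for a fixed UTC offset, or None if no such ID exists."""
    seconds = int(offset.total_seconds())
    if seconds % 3600 != 0:
        return None  # e.g. India at UTC+5:30 has no Etc/GMT* equivalent
    hours = seconds // 3600
    if not -12 <= hours <= 14:  # assumed range of the Etc/GMT* IDs
        return None
    if hours == 0:
        return "Etc/GMT"
    # Invert the sign again when going from offset back to ID.
    return f"Etc/GMT{'-' if hours > 0 else '+'}{abs(hours)}"
```

A fixed-offset ID also cannot express daylight-saving transitions, which is why a single Etc/GMT* ID does not suffice for Los Angeles even though its standard offset is a whole number of hours.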

Mark Davis

8:03 p.m.

Thanks again. Some comments below.

Mark __________________________________ http://www.macchiato.com ► शिष्यादिच्छेत्पराजयम् ◄

----- Original Message ----- From: "Paul Eggert" <eggert@CS.UCLA.EDU> To: "Mark Davis" <mark.davis@jtcsv.com> Cc: <tz@lecserver.nci.nih.gov> Sent: Sat, 2004 Jun 12 19:31 Subject: Re: Time Zone Localizations

...

"Mark Davis" <mark.davis@jtcsv.com> writes:

...
http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/formatting/tim...

A few comments on the 2004-06-12 edition:

* Re "Eastern European Daylight Time". A more common English translation is "Eastern European Summer Time". This is certainly true for British English, and tends to be true even for American sources like the Olson database.

A bit of background. CLDR permits different translations for different locales, so the same TZID could be translated as "Eastern European Daylight Time" for en_US (the US English locale), and "Eastern European Summer Time" for en_GB (the UK English locale). It all depends on what would be better understood in each place. Of course, if there is a single term that would work for all English locales, it is better to use that. I filed http://www.jtcsv.com/cgibin/locale-bugs?findid=152 on this.

...

I think the usual idioms in practice are as follows.

legend:
- generic time name (and abbreviation)
0 standard time name (and abbreviation)
1 daylight-saving time name (and abbreviation)

example of typical American English style:
- Eastern Time (ET)
0 Eastern Standard Time (EST)
1 Eastern Daylight Time (EDT)

example of typical Australian English style:
- Eastern Time (ET)
0 Eastern Standard Time (EST)
1 Eastern Summer Time (EST)

example of typical European English style:
- Eastern European Time (EET)
0 Eastern European Time (EET)
1 Eastern European Summer Time (EEST)

Note that some names and/or abbreviations are ambiguous in some locales. This suggests that one may run into some problems with the idea of having UIs use names to distinguish the cases labeled "-", "0", "1".

For any given locale, we would disallow a collision between names. Thus for the locale en (English) we can't have AST mean both Alaska Standard Time and Atlantic Standard Time. What we can do is have AST mean one in en_US and another in en_CA (Canada).

...

This is somewhat related to the issue I previously mentioned that, even in the same locale, the name differs depending on what time stamps you're talking about. For example "Pacific War Time" should be used for daylight-saving time in Los Angeles from 1942-02-09 through 1945-08-14.

At this point, we are not really interested in translations of the way that certain TZIDs would have been named in the past; it is enough of a problem to get the present-day names!

...

* I don't understand the syntax of the examples starting <ldml><dates>....

The first line just lists the XML elements and attributes that are common to the languages. The following lines list different translations. The locales using each translation are listed surrounded by dots. The locale format is described in http://www.unicode.org/reports/tr35/. Briefly, in all the cases mentioned it is

language_code ("_" script_code)? ("_" territory_code)? // Perl notation

where the language code is the ISO 639 code. Thus en is English, fi is Finnish, zh is Chinese, etc.
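
As a rough illustration of that locale pattern, here is a Python sketch that splits a locale ID into its parts. The field shapes in the regular expression (2 or 3 lowercase letters for the language, a 4-letter script, a 2-letter territory) are assumptions made for this sketch, not a quotation from TR35, and `LocaleID` and `parse_locale` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class LocaleID(NamedTuple):
    language: str
    script: Optional[str]
    territory: Optional[str]


# language_code ("_" script_code)? ("_" territory_code)?
# Field shapes below are assumed for illustration only.
_LOCALE = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:_(?P<script>[A-Z][a-z]{3}))?"
    r"(?:_(?P<territory>[A-Z]{2}))?$"
)


def parse_locale(text: str) -> LocaleID:
    """Split a locale ID such as 'zh_Hant' or 'en_US' into its components."""
    m = _LOCALE.match(text)
    if m is None:
        raise ValueError(f"unrecognized locale ID: {text!r}")
    return LocaleID(m.group("language"), m.group("script"), m.group("territory"))
```

For example, `parse_locale("zh_Hant")` yields the language `zh` with script `Hant` and no territory, while `parse_locale("en_US")` yields `en` with territory `US` and no script.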

...

* Re "Note: the Olson TZIDs uses the opposite sign as RFC 822 with GMT formats". This is because the Olson TZIDs use the same sign as POSIX TZ settings.

* "They are organized by cities" -> "They are organized by compact locations, typically cities." Sometimes they're islands or bases.

good notes, thanks

...

As far as the requests go (in part F):

I hope you'll have patience with me on this, as I learn more about the structure.

...

* "A list of the set of links to not skip". These would be all the Link lines that are mentioned in zone.tab.

Good.

...

* "The addition of unique TZIDs corresponding to the 'missing' ISO country codes BV, HM". We've discussed this. These areas are uninhabited and have no local time, so it's questionable that they need TZIDs.

I agree that these are little rocks, of no value. So it is very odd that they are given their own country codes by ISO. However, a lot of software is driven by those ISO codes, so having them be missing may be a problem. Have to look more into this.

...

By the way, there's one other 'missing' ISO country code that I just discovered: AX, for the Aaland Islands. (AX was added on February 13 but nobody told us. :-) I'll add a new TZID Europe/Mariehamn for it, in my next proposed tz update. It'll be an alias for Europe/Helsinki.

Thanks.

...

* "A mapping from some private-use ISO country code to the Etc/GMT* TZIDs."

The Olson Etc/GMT* TZIDs are pretty much a subset of POSIX, which allows an essentially unlimited set of time zone IDs. Most of the POSIX TZIDs will never be used in practice; conversely, the Olson Etc/GMT* TZIDs do not suffice (for example, they don't suffice for Los Angeles, or for India).

I'm not sure it makes sense to single out the Etc/GMT* IDs. Perhaps you need a way to specify any POSIX ID instead.

We certainly don't want all the POSIX IDs; as you say, they won't be used in practice. The only reason for including the Etc/GMT* IDs would be if they are needed. I had thought that to get all the timezones that are in use in the world, including those in international waters, wouldn't we have to include the Etc/GMT* ones? Or is this wrong?

...

Paul Eggert

4:37 a.m.

"Mark Davis" <mark.davis@jtcsv.com> writes:

...

For any given locale, we would disallow a collision between names. Thus for the locale en (English) we can't have AST mean both Alaska Standard Time and Atlantic Standard Time.

That collides with common practice in Australia, which is to use "EST" to denote both "Eastern Standard Time" and "Eastern Summer Time". The tz database attempts to support common practice as much as possible, so if your system doesn't allow this sort of thing you'll have to make some special provision for the discrepancies.
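
To make the discrepancy concrete, here is a minimal Python sketch of the kind of per-locale collision check being discussed. The `NAMES` table and the function `abbreviation_collisions` are hypothetical; the entries come from the American and Australian examples given earlier in this thread, not from actual CLDR data.

```python
from collections import defaultdict

# Hypothetical per-locale data: (abbreviation, full name) pairs,
# taken from the examples in this thread.
NAMES = {
    "en_US": [
        ("ET", "Eastern Time"),
        ("EST", "Eastern Standard Time"),
        ("EDT", "Eastern Daylight Time"),
    ],
    "en_AU": [
        ("ET", "Eastern Time"),
        ("EST", "Eastern Standard Time"),
        ("EST", "Eastern Summer Time"),
    ],
}


def abbreviation_collisions(pairs):
    """Map each abbreviation used for more than one name to those names."""
    by_abbrev = defaultdict(set)
    for abbrev, name in pairs:
        by_abbrev[abbrev].add(name)
    return {a: sorted(n) for a, n in by_abbrev.items() if len(n) > 1}
```

Under a rule that disallows collisions within a locale, the `en_AU` table would be rejected even though it reflects common practice, which is the special provision the reply above asks for.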

...

...
For example "Pacific War Time" should be used for daylight-saving time in Los Angeles from 1942-02-09 through 1945-08-14.

At this point, we are not really interested in translations of the way that certain TZIDs would have been named in the past;

I sense a problem in terminology here. The TZID is the same either way: it's "America/Los_Angeles". The problem is merely that Los Angeles has different names for daylight-saving time, depending on the time stamp in question. Even if I'm writing today, it's incorrect for me to write the phrase "January 20, 1943, at 8:00am Pacific Daylight Time" in Los Angeles: I should write "Pacific War Time" instead. Another example: in 1988 Newfoundland observed double-daylight time (two hours ahead of normal), which the tz database calls "NDDT" instead of the usual "NDT". It could happen again. You may not want to support double-daylight saving time, but if so you should note down the problem somewhere.
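
The point that one TZID can have different daylight-saving names depending on the time stamp can be sketched as a lookup keyed by both TZID and date. This Python fragment is only an illustration: `DST_NAME_PERIODS`, `DEFAULT_DST_NAME` and `dst_name` are hypothetical names, the only period listed is the 1942-02-09 through 1945-08-14 range mentioned in this thread, and the function does not check whether daylight-saving time was actually in effect on the given day.

```python
from datetime import date

# Hypothetical table: per TZID, date ranges (inclusive) in which the
# daylight-saving name differed from the present-day one.
DST_NAME_PERIODS = {
    "America/Los_Angeles": [
        (date(1942, 2, 9), date(1945, 8, 14), "Pacific War Time"),
    ],
}

# Present-day daylight-saving name, used outside the listed periods.
DEFAULT_DST_NAME = {
    "America/Los_Angeles": "Pacific Daylight Time",
}


def dst_name(tzid: str, day: date) -> str:
    """Name to use for daylight-saving time in `tzid` on `day`."""
    for first, last, name in DST_NAME_PERIODS.get(tzid, []):
        if first <= day <= last:
            return name
    return DEFAULT_DST_NAME[tzid]
```

The TZID argument is the same in every call; only the date changes the answer, which is the distinction being made about terminology.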

...

I had thought that to get all the timezones that are in use in the world, including those in international waters, wouldn't we have to include the Etc/GMT* ones? Or is this wrong?

In international waters, my understanding is that local time is entirely at the whim of the ship's captain. (So you may have to support all the POSIX TZIDs after all. :-) I suppose the Olson Etc/GMT* TZIDs are better than nothing, but they weren't intended to be (and are not) a complete list of time zone offsets in actual use, whether in international waters or anywhere else.


80 comments

13 participants

Chuck Soper
Clive D.W. Feather
David Keegel
Eric Muller
Garrett Wollman
Guy Harris
Infoman
jcowan＠reutershealth.com
John Cowan
Mark Davis
Masayoshi Okutsu
Oscar van Vlijmen
Paul Eggert