comments interleaved below. Since this is getting back into the translation issues, I'm cc'ing the cldr group.

Mark
__________________________________
http://www.macchiato.com
► शिष्यादिच्छेत्पराजयम् ◄

----- Original Message -----
From: <jcowan@reutershealth.com>
To: "Mark Davis" <mark.davis@jtcsv.com>
Cc: <tz@lecserver.nci.nih.gov>; "Chuck Soper" <chucks@lmi.net>
Sent: Fri, 2004 Jun 11 13:23
Subject: Re: Time Zone Localizations
Mark Davis scripsit:
However, to reduce the translation requirements and make the data more manageable, we do want to set up some uniqueness criteria. If two IDs have exactly the same behavior since the time when time zones were adopted,
In fact the Olson data do not separate timezones in a given country that have been the same since 1970-01-01. Otherwise, Indiana would have something like 30 time zones instead of just four.
and have always been in the same country over that period, we only want one of them to be in the main list. The other can be an alternate -- and still work-- but we would recommend an extremely low priority on translation.
I think that is a mistake, for two reasons: national chauvinism and future-proofing. About the former, nothing need be said; but the whole point of setting your zone to the country you are in (especially if you live there) is that you don't want to have to reset it if your national legislature changes the rules, either the DST rules or the zone proper. Within the EU, DST rules are harmonized, but which zone to adopt is a purely national decision.
I said "have always been in the same country over that period"; this would not make any zones "modern equivalents" that were in different countries. But see below.
Many (I would dare say the vast majority) of end users just don't care now that there was once a difference between Dawson, Whitehorse and Los Angeles.
This strikes me as backwards. If you're in the U.S., you should see U.S. choices; in Canada you should see Canadian ones.
My fault for confusing you; I mistyped Los Angeles instead of Vancouver. Here is a real example. Within each group below, the IDs separated by commas are modern equivalents, all within the same country (Canada); semicolons separate the groups. Thus America/Dawson, America/Whitehorse, and America/Vancouver are not distinguished by country, and all behave the same nowadays.

America/Dawson, America/Whitehorse, America/Vancouver;
America/Dawson_Creek;
America/Inuvik, America/Yellowknife, America/Edmonton, America/Cambridge_Bay;
America/Swift_Current, America/Regina;
America/Rainy_River, America/Rankin_Inlet;
America/Winnipeg;
America/Iqaluit, America/Pangnirtung, America/Nipigon, America/Thunder_Bay, America/Montreal;
America/Goose_Bay;
America/Glace_Bay, America/Halifax;
America/St_Johns
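To make the grouping idea concrete, here is a rough Python sketch of how "modern equivalents" might be collected. It assumes each zone already comes with a country code and some opaque key summarizing its modern behavior; the function name, the tuple layout, and the behavior keys are all hypothetical and not taken from the Olson data or CLDR.

```python
from collections import defaultdict

def group_equivalents(zones):
    """Group zone IDs that share a country and a modern-behavior key.

    `zones` is an iterable of (zone_id, country, behavior_key) tuples.
    The behavior key is a hypothetical stand-in for "same offsets and
    same DST rules nowadays"; how it is computed is left open here.
    """
    groups = defaultdict(list)
    for zone_id, country, behavior_key in zones:
        # Zones only count as equivalent if they are in the same country.
        groups[(country, behavior_key)].append(zone_id)
    # Dicts preserve insertion order, so groups come out in input order.
    return list(groups.values())

sample = [
    ("America/Dawson", "CA", "pacific-dst"),
    ("America/Whitehorse", "CA", "pacific-dst"),
    ("America/Vancouver", "CA", "pacific-dst"),
    ("America/Dawson_Creek", "CA", "mountain-nodst"),
    ("America/Los_Angeles", "US", "pacific-dst"),
]
for group in group_equivalents(sample):
    print(", ".join(group))
```

Only one ID per group would need to go in the main list; which one is a separate decision that this sketch does not make.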
Absolutely!
I think the series of fallbacks is unnecessarily complex. In particular, the fallback from "Pacific Time" to "GMT-07:00/08:00" doesn't tell me that much, because I don't know a priori whether it's winter or summer currently.
In addition, it fails to exploit the nice thing about the use of city names in Olson, namely that city names don't need that much localization: in the vast majority of cases, the internationally known name is the only name. (Transliteration might be required if the current locale has no Latin letters.) Thus the full combinatorial explosion of city name x language can mostly be short-circuited.
If that were true, we'd not have as much of a problem. And if everyone spoke English this would all be much easier ;-) Look at London, from the CLDR:

<ldml><dates><timeZoneNames><zone type="GMT"><exemplarCity>
"Londain": ·ga·
"Londen": ·nl·
"London": ·da· ·en· ·fr· ·sv·
"Londra": ·it·
"Londres": ·es· ·pt·
"Lontoo": ·fi·
"ロンドン": ·ja·
"伦敦": ·zh·
"倫敦": ·zh_Hant·
"런던": ·ko·
...

These are only for a few languages, but there is a lot of variation. A great part of the motivation for this is to cut down on the amount of data required, just from the sheer magnitude of the problem when you multiply the figures by the 90 languages currently in CLDR, plus the many more languages to come.
I propose a simpler scheme, therefore:
1) If you have a translation for the time zone name x the language, use it.
2a) Get the localized name for the city (or if none, the Olson city name);
2b) Get the "Tampo de '%1'" schema for the language (or if none, use just "%1");
2c) Substitute the city name into the schema and use that.
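For concreteness, the scheme in steps 1 through 2c could be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name, the three lookup tables, and the placeholder language tag "xx" are hypothetical, and deriving the Olson city name from the last path component of the ID is my reading of "the Olson city name".

```python
def display_name(zone_id, lang, zone_names, city_names, schemas):
    """Return a display name for an Olson zone ID in a given language.

    zone_names: {(zone_id, lang): translated zone name}   (step 1)
    city_names: {(zone_id, lang): localized city name}    (step 2a)
    schemas:    {lang: pattern containing "%1"}           (step 2b)
    All three tables are hypothetical stand-ins for locale data.
    """
    # Step 1: a translation of the zone name for this language wins.
    translated = zone_names.get((zone_id, lang))
    if translated is not None:
        return translated

    # Step 2a: localized city name, else the Olson city name
    # (last component of the ID, underscores shown as spaces).
    olson_city = zone_id.rsplit("/", 1)[-1].replace("_", " ")
    city = city_names.get((zone_id, lang), olson_city)

    # Step 2b: the language's schema, else just "%1".
    schema = schemas.get(lang, "%1")

    # Step 2c: substitute the city name into the schema.
    return schema.replace("%1", city)

zone_names = {("America/New_York", "en"): "Eastern Time"}
city_names = {("Europe/London", "es"): "Londres",
              ("Europe/London", "xx"): "Londres"}
schemas = {"xx": "Tampo de '%1'"}

print(display_name("America/New_York", "en", zone_names, city_names, schemas))
print(display_name("Europe/London", "xx", zone_names, city_names, schemas))
print(display_name("America/Los_Angeles", "de", zone_names, city_names, schemas))
```

The last call has no data at all for the language and so ends up with the bare Olson city name, which is the cheap case the scheme is counting on.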
I can understand your desire for simplicity, and I am not happy with there being 8 possible steps. But depending on city data would be very painful. We already have in CLDR a lot of country data, so if we can leverage that it really helps.

Let's look at the figures. There are 239 countries. Of them, 210 have a single zone. Using a country name for each of them is essentially free. Of the remainder, 8 have multiple zones only historically. So the modern ones are again essentially free. Of the rest, cities might be the best way to go. We would need 99 cities for modern zone distinctions, 140 if we added historic also. If you multiply that by 90 languages it is still a lot of data, but *way*, *way* better than the 558 x 90 we are faced with now! So that is the reason for Step #4.1 in http://oss.software.ibm.com/cvs/icu/~checkout~/icuhtml/design/formatting/tim...

But we still need some fallback in case there is no unique country, and no translated city. Now, it may be better to nuke #4.2 and #5, e.g. dropping the GMT part. GMT format when there is no daylight savings does not lose any information (nowadays). Where there is daylight, it does lose information -- although actually not much -- but avoids the problem of using cities that may either be unknown to the user or not in a script s/he can read. The only place where it is ambiguous (within a country) is if you have two zones that have the same summer & winter offsets, but start at different times. That is pretty rare. (Across countries, or historically, it is not quite so rare.)

That being said, we are not wedded to the GMT format either; we have to toss it around a bit. You are right that GMT format does not protect against future changes; but we have to look at likelihood. The city format also doesn't protect against all possible changes; I might use America/Los_Angeles right now meaning my time zone, but if the N. California counties changed to a different zone, splitting that one, then it wouldn't be correct any more.
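As a rough illustration of the country-first idea (not the actual numbered steps in the design document, which are not reproduced here), a Python sketch might look like the following. The function name and the three tables are hypothetical; `zone_country` is assumed to contain only the zones kept in the main list.

```python
def country_first_label(zone_id, zone_country, country_names, city_names):
    """Pick a label for a zone: country name if unambiguous, else city.

    zone_country:  {zone_id: country code} for main-list zones only
    country_names: {country code: translated country name}
    city_names:    {zone_id: translated city name}
    Returns None when neither applies, so the caller can use some
    other fallback (for example a GMT-style format).
    """
    country = zone_country.get(zone_id)
    if country is not None:
        # Count how many main-list zones share this country.
        siblings = [z for z, c in zone_country.items() if c == country]
        if len(siblings) == 1 and country in country_names:
            # A single-zone country: the country name is unique and
            # the translation already exists in the country data.
            return country_names[country]
    # Multi-zone country (or no country data): try a translated city.
    return city_names.get(zone_id)

zone_country = {
    "Europe/Paris": "FR",
    "America/New_York": "US",
    "America/Los_Angeles": "US",
}
country_names = {"FR": "France", "US": "United States"}
city_names = {"America/Los_Angeles": "Los Angeles"}

print(country_first_label("Europe/Paris", zone_country, country_names, city_names))
print(country_first_label("America/Los_Angeles", zone_country, country_names, city_names))
print(country_first_label("America/New_York", zone_country, country_names, city_names))
```

On the figures above, city names then cost 99 x 90 = 8,910 strings for modern distinctions (140 x 90 = 12,600 with historic ones), against 558 x 90 = 50,220 for translating every ID.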
[Of course, what would really be nice is if the world could agree to all switch to/from daylight savings at the same (local) time, e.g. 02:00 the last Sundays in March and September. Then you could convey all modern zones with three formats, without loss of information:
- GMT-08:00 (for no daylight savings)
- GMT-08:00N (for daylight savings March-Sept), and
- GMT-08:00S (for daylight savings Sept-March).
Of course, the chances of something sensible like this are, well, zip.]
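Purely to illustrate that wished-for notation, here is a small Python sketch of a formatter for the three forms. The function name and the "north"/"south" argument values are hypothetical; only the GMT-08:00, GMT-08:00N, and GMT-08:00S shapes come from the text above.

```python
def gmt_label(standard_offset_minutes, dst=None):
    """Format a standard offset plus an optional DST-season suffix.

    standard_offset_minutes: offset from GMT in minutes (negative = west)
    dst: None for no daylight savings, "north" for March-Sept,
         "south" for Sept-March (hypothetical argument values).
    """
    sign = "-" if standard_offset_minutes < 0 else "+"
    hours, minutes = divmod(abs(standard_offset_minutes), 60)
    suffix = {None: "", "north": "N", "south": "S"}[dst]
    return f"GMT{sign}{hours:02d}:{minutes:02d}{suffix}"

print(gmt_label(-480))           # no daylight savings
print(gmt_label(-480, "north"))  # daylight savings March-Sept
print(gmt_label(-480, "south"))  # daylight savings Sept-March
```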
When I'm communicating with users about the Reuters Health system, I always refer to events occurring at such-and-such a time, New York time. That communicates not only a GMT offset but a set of DST rules. This is also what's typically done in legal documents -- see the legal ads for bond redemption announcements in a newspaper.
--
John Cowan <jcowan@reutershealth.com> http://www.reutershealth.com
I amar prestar aen, han mathon ne nen, http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith. --Galadriel, LOTR:FOTR