Mark Davis is not on the time zone mailing list (at least not at the address below); direct replies appropriately.

				--ado

----- Original Message -----
From: "Mark Davis" <mark.davis@icu-project.org>
To: "Tz (tz@elsie.nci.nih.gov)" <tz@lecserver.nci.nih.gov>
Cc: "Olson, Arthur David (NIH/NCI)" <olsona@dc37a.nci.nih.gov>
Sent: Tuesday, August 09, 2005 07:28
Subject: Request from CLDR committee

The CLDR technical committee has been looking at issues that have come up in connection with timezones, and has the following requests of the TZ group.

Background.

As discussed previously on this list, the CLDR project supplies localizations for timezone identifiers, based on the TZ IDs from the TZ database. The localizations can either be explicit strings, or if unavailable, use the translated country name (where a country has only a single timezone) or fall back to the last field of the TZID and the translated country name. Thus in German:

America/Havana => "Kuba"
Europe/Moscow => "Moskau (Russische Föderation)"
America/Los_Angeles => "Los Angeles (Vereinigte Staaten)"

The process is somewhat more complicated than this description, but this provides the gist. For other examples, see http://unicode.org/cldr/data/common/test/ with different locales.

(BTW, a corrigendum was issued for the GMT problem raised earlier; see http://unicode.org/cldr/corrigenda.html).

Requests.

1. Missing Country Codes.

There are two missing ISO country codes. While these are uninhabited rocks, they should be added according to Theory, which says:

	"Include at least one location per time zone rule set per country. One such location is enough. Use ISO 3166 (see the file iso3166.tab) to help decide whether something is a country."

There are also good, practical reasons to do this; an implementation that maps from country codes to sets of zones needs to have some value for all ISO country codes.
The two ISO codes are:

HM  53 06 S, 72 31 E  Heard Island and McDonald Islands
BV  54 26 S, 3 24 E   Bouvet Island

(locations are from
http://www.cia.gov/cia/publications/factbook/geos/bv.html
http://www.cia.gov/cia/publications/factbook/geos/hm.html )

Suggested changes:

To zone.tab, add

HM  -5306+7231  Pacific/Heard
BV  -5426+0324  Atlantic/Bouvet

To antarctica, add

Zone Atlantic/Bouvet 0:00 - GMT
Zone Pacific/Heard 5:00 - GMT

2. Enabling Canonical IDs

The Link commands in the database establish equivalence classes between TZIDs (aka "location names"). For an implementation like CLDR, it is important that there be a completely stable canonical TZID that represents any of those equivalents. Based on feedback from this list, we chose it to be:

a) the TZID in zone.tab as of 2004a, or
b) any new TZID in a later version of zone.tab that is not equivalent to a TZID introduced in an earlier version.

That is, we use America/Buenos_Aires since it was in 2004a, and America/Argentina/Tucuman since it was introduced later.

One dependency we have is that the last field in the canonical ID be unique. That is, we can't have both a Europe/London and an America/London. Now, if the TZ database ever added a TZID that was not unique in this sense, we could add our own canonical ID outside of the TZ database, but that is clearly not our preference, not at all. This appears to be the practice in the TZ database, as evidenced by the following in southamerica:

# Bahia (BA)
# There are too many Salvadors elsewhere, so use America/Bahia instead
# of America/Salvador.

We also depend on the feature that every equivalence class (except Etc/...) has exactly one member in zone.tab.

To avoid having to hack around problems in the future, we would like this to be captured in Theory as requirements for the construction of future IDs. This is in no way a functional restriction, just on the choice of names.
Thus, we propose the addition of something like the following to the "rules used for choosing location names" in Theory.

	All locations (the final field in a location name) must be unique. Thus one cannot have Europe/London and either America/London or America/Canada/London.

	Two location names that appear in zone.tab cannot be Linked together, either directly or through a chain of Links.

	Conversely, every location (except for those starting with "Etc") must be Linked to a location name in zone.tab.

3. Definitional Links.

For the purpose of something like CLDR, it is important to separate out the *definitional* equivalents from the *incidental* equivalents (equivalences that happen to be true for now, but could change in the future). You don't want to include two TZIDs in the same definitional equivalence class if they are ever different, or could be in the future, because then comparisons between TZIDs (as equivalent) could be true now, but fail in the future.

After looking at the equivalence classes established by Link, it turns out that there are a few anomalies. Now, Theory says:

	"If all the clocks in a country's region have agreed since 1970, don't bother to include more than one location even if subregions' clocks disagreed before 1970. Otherwise these tables would become annoyingly large."

This makes a great deal of sense. After all, if we go back to when daylight saving time started, then every location on earth (that didn't share a longitude with another location) would be a separate TZID.

However, there are a small number of anomalous cases. List A below contains items that should be Linked, since they are and always will be equivalent (as far as TZ calculations go). List B contains cases that have been the same since 1970, but are not Linked. So they appear to violate the condition above in Theory. Conversely, List C contains cases that clearly reference different locations, and thus before timezones were added, they had different offsets (sun time).
So if the same criteria are applied as in List B, they would be unlinked.

So we request that the items in List A be linked, and each of the pairs in List B and C be treated consistently:

Option 1. Leave (or make) the pair Linked, and pick one item in each pair, and document that it is obsolete, and will never be unlinked from the other.

Option 2. Leave (or make) the pair Unlinked. If it was previously Linked, then, according to #2 above, one of the pair would be added to zone.tab.

And add to Theory, under "rules used for choosing location names".

	As of version X, whenever two location names have been linked in the past, for stability they will remain linked forever.

=============

List A. TZIDs that are not linked, but are the same

001 Etc/GMT   001 Etc/UTC   001 Etc/UCT   -- not linked, identical

List B. TZIDs that are not linked, are different locations, but are the same since 1970

AQ Antarctica/Mawson          AQ Antarctica/Vostok      -- same since 57
FM Pacific/Truk               FM Pacific/Yap            -- same since 70
GB Europe/Belfast             GB Europe/London          -- same since 68
ML Africa/Bamako              ML Africa/Timbuktu        -- same since 60

List C. TZIDs that are linked, but refer to different locations.
(This was derived by inspection; if there are other cases of IDs that refer to different locations, please let us know.)

AQ Antarctica/McMurdo         AQ Antarctica/South_Pole  -- linked, different places
SJ Arctic/Longyearbyen        SJ Atlantic/Jan_Mayen     -- linked, different places
US America/Denver             US America/Shiprock       -- linked, different places
AR America/Argentina/Cordoba  AR America/Rosario        -- linked, different places
AU Australia/Sydney           AU Australia/Canberra     -- linked, different places
BR America/Rio_Branco         BR America/Porto_Acre     -- linked, different places
BR Brazil/Acre ?
IL Asia/Jerusalem             IL Asia/Tel_Aviv          -- linked, different places
MD Europe/Chisinau            MD Europe/Tiraspol        -- linked, different places
MX America/Tijuana            MX America/Ensenada       -- linked, different places
US America/Indianapolis       US America/Fort_Wayne     -- linked, different places
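
=============

A note on request 2: the naming constraints proposed there (unique final field, no two zone.tab names Linked together, every non-Etc equivalence class represented in zone.tab) could be checked mechanically. What follows is only a minimal Python sketch under stated assumptions. The function name check_naming_rules and the toy inputs are hypothetical; it does not parse the real zone.tab or the Link lines; and it applies the uniqueness test to zone.tab names only (the canonical IDs), which is one possible reading of the proposal.

```python
from collections import defaultdict


def _find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def check_naming_rules(zone_tab_ids, all_ids, links):
    """Return a list of human-readable rule violations.

    zone_tab_ids: set of TZIDs listed in zone.tab (assumed already parsed)
    all_ids:      set of every TZID (Zone names and Link aliases)
    links:        iterable of (target, alias) pairs, as in "Link TARGET ALIAS"
    """
    problems = []

    # Rule 1 (assumed scope: zone.tab names): the final field must be unique.
    by_last = defaultdict(list)
    for tzid in sorted(zone_tab_ids):
        by_last[tzid.rsplit("/", 1)[-1]].append(tzid)
    for last, ids in sorted(by_last.items()):
        if len(ids) > 1:
            problems.append(f"duplicate location {last}: {', '.join(ids)}")

    # Build the equivalence classes that the Links establish.
    parent = {tzid: tzid for tzid in all_ids}
    for target, alias in links:
        parent.setdefault(target, target)
        parent.setdefault(alias, alias)
        parent[_find(parent, alias)] = _find(parent, target)
    classes = defaultdict(list)
    for tzid in sorted(parent):
        classes[_find(parent, tzid)].append(tzid)

    # Rules 2 and 3: each class (except Etc/...) has exactly one
    # member in zone.tab.
    for members in classes.values():
        if any(m.startswith("Etc/") for m in members):
            continue
        in_tab = [m for m in members if m in zone_tab_ids]
        if not in_tab:
            problems.append(f"no zone.tab member: {', '.join(members)}")
        elif len(in_tab) > 1:
            problems.append(
                f"zone.tab names linked together: {', '.join(in_tab)}")
    return problems


# Toy data for illustration only; NOT the real contents of zone.tab.
toy_zone_tab = {"Europe/London", "America/London"}
toy_all = toy_zone_tab | {"Europe/Belfast"}
for line in check_naming_rules(toy_zone_tab, toy_all, []):
    print(line)
```

With the toy data above, the sketch reports the clash between the two London names and the class that has no representative in zone.tab. Grouping through a union-find structure means a chain of Links is treated the same way as a direct Link, which is what the proposed wording asks for.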