Why did you rename Russian zone name abbreviations

Hello! https://github.com/eggert/tz/commit/1ac038c2c3f25f7211474ae08feb6afb820e35fe Why did you rename Russian zone name abbreviations? Can you provide a rationale for this change? This change breaks date parsing, etc... With best regards, Sergei Turchanov.

Sergei Turchanov wrote:
Why did you rename Russian zone name abbreviations? Can you provide a rationale for this change?
Sure, the basic idea is that the tz database should reflect common practice in English-language abbreviations, rather than inventing and/or imposing abbreviations. So I have been gradually removing invented abbreviations and replacing them with UT offsets. I think Russia is mostly done now, and it's time to turn our attention to other parts of the world.
This change breaks date parsing, etc...
Alphabetic zone abbreviations are ambiguous in practice and any date parsing depending on them is questionable anyway. The above is documented in more detail in the 'Theory' file.
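To make the ambiguity point concrete, here is a minimal Python sketch. The table is a small hand-picked sample written for illustration (it is not taken from tzdata), and the names `AMBIGUOUS`, `candidate_offsets` and `is_ambiguous` are hypothetical.

```python
# Hand-picked sample of abbreviations that have more than one common meaning.
# Offsets are hours east of UT. Illustrative only; not derived from tzdata.
AMBIGUOUS = {
    "CST": {
        "US Central Standard Time": -6.0,
        "China Standard Time": 8.0,
        "Cuba Standard Time": -5.0,
    },
    "IST": {
        "India Standard Time": 5.5,
        "Israel Standard Time": 2.0,
        "Irish Standard Time": 1.0,
    },
    "BST": {
        "British Summer Time": 1.0,
        "Bangladesh Standard Time": 6.0,
    },
}


def candidate_offsets(abbr):
    """Return the sorted distinct UT offsets that an abbreviation may denote."""
    return sorted(set(AMBIGUOUS.get(abbr, {}).values()))


def is_ambiguous(abbr):
    """True when the sample table lists more than one offset for abbr."""
    return len(candidate_offsets(abbr)) > 1
```

A parser that sees only the string "CST" has no way to choose among these offsets without extra context, which is the sense in which such parsing is questionable.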

On Mon, Oct 31, 2016 at 2:48 AM, Paul Eggert <eggert@cs.ucla.edu> wrote:
Sure, the basic idea is that the tz database should reflect common practice in English-language abbreviations, rather than inventing and/or imposing abbreviations. So I have been gradually removing invented abbreviations and replacing them with UT offsets.
You may have been too aggressive in this effort. Some of the abbreviations that may have been originally invented by you or Olson have since come into common practice. The use of English abbreviations of Russian time zones is fairly common in Russia. Interestingly, with the exception of MSK (МСК), the timezone abbreviations are not translated or transliterated into Russian in Russian-language documents; they are used in Latin script, which is ubiquitous in anything computer-related. See, for example, <https://ru.wikipedia.org/wiki/Время_в_России>.

On 10/31/2016 12:17 PM, Alexander Belopolsky wrote:
Some of the abbreviations that may have been originally invented by you or Olson have since come into common practice.
Yes, web pages derived from older versions of the tz database use them. But they do not appear to be common in general English usage. I just now checked Google News for English-language news articles mentioning both "VLAT" and "Vladivostok", and found zero occurrences, which suggests that the abbreviation has not caught on in practice in English. Instead, I found phrases like this: "Japanese ship Pacific Venus carrying 500 passengers will moor at the port of Vladivostok at 07:00 (local time)..." <http://en.portnews.ru/news/228171/> "Vladivostok is GMT +10" <http://www.wanderlust.co.uk/magazine/blogs/matthew-woodward/keeping-track-of...>

On Mon, Oct 31, 2016 at 4:28 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 10/31/2016 12:17 PM, Alexander Belopolsky wrote:
Some of the abbreviations that may have been originally invented by you or Olson have since come into common practice.
Yes, web pages derived from older versions of the tz database use them. But they do not appear to be common in general English usage.
I am not sure English language web pages reflect the expectations of the majority of users of Russian time zones. As I mentioned earlier, it is common to use English abbreviations in Russian technical documents. I wouldn't be surprised if similar practices exist in other non-English speaking countries, particularly in those with non-Latin scripts.

On 10/31/2016 01:39 PM, Alexander Belopolsky wrote:
I wouldn't be surprised if similar practices exist in other non-English speaking countries
I'm afraid we don't have the resources to cover common practices in languages other than English, even if these languages use Latin-letter abbreviations that may have been derived from English originally. This sort of thing is better addressed by CLDR <http://cldr.unicode.org/>, which has a mandate to cover time zone abbreviations in non-English locales.

On Tue, Nov 1, 2016, at 12:12, Paul Eggert wrote:
On 10/31/2016 01:39 PM, Alexander Belopolsky wrote:
I wouldn't be surprised if similar practices exist in other non-English speaking countries
I'm afraid we don't have the resources to cover common practices in languages other than English, even if these languages use Latin-letter abbreviations that may have been derived from English originally. This sort of thing is better addressed by CLDR <http://cldr.unicode.org/>, which has a mandate to cover time zone abbreviations in non-English locales.
The simple fact is, they don't have a reference implementation of strftime, and we do. Maybe it's time to step up. Also, there's a difference between doing localization in general and accepting a single abbreviation used in a single language that is not English when there is no English abbreviation to conflict. We absolutely do have the resources to do the latter, you just don't *want* to - that's not the same thing.

On 2016-11-01 11:24, Random832 wrote:
On Tue, Nov 1, 2016, at 12:12, Paul Eggert wrote:
On 10/31/2016 01:39 PM, Alexander Belopolsky wrote:
I wouldn't be surprised if similar practices exist in other non-English speaking countries
I'm afraid we don't have the resources to cover common practices in languages other than English, even if these languages use Latin-letter abbreviations that may have been derived from English originally. This sort of thing is better addressed by CLDR <http://cldr.unicode.org/>, which has a mandate to cover time zone abbreviations in non-English locales.
The simple fact is, they don't have a reference implementation of strftime, and we do. Maybe it's time to step up.
Also, there's a difference between doing localization in general and accepting a single abbreviation used in a single language that is not English when there is no English abbreviation to conflict. We absolutely do have the resources to do the latter, you just don't *want* to - that's not the same thing.
Perhaps someone who regularly uses CLDR could post a simple demo of tz abbreviation L10N. Debian packaging configuration distributes I18N templates in 23 languages for dpkg-reconfigure tzdata aka tzselect in /var/lib/dpkg/info/tzdata.templates - also useful to find words other languages use for time zone, countries, territories, and municipalities. -- Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

Thanks for the interesting CLDR discussion. I don’t have a demo offhand that addresses this specific case, but I can try to work on one. CLDR doesn’t provide an implementation, but libraries such as ICU do. In any event, if there are CLDR (or ICU) specific questions, they may be more appropriate on the mailing lists such as those listed on http://www.unicode.org/consortium/distlist.html Regards, Steven (in the middle of several hours of CLDR and ICU presentations at IUC40)
On 11/1/16 3:04 PM, "Brian Inglis" <tz-bounces@iana.org on behalf of Brian.Inglis@SystematicSw.ab.ca> wrote:
On 2016-11-01 11:24, Random832 wrote:
On Tue, Nov 1, 2016, at 12:12, Paul Eggert wrote:
On 10/31/2016 01:39 PM, Alexander Belopolsky wrote:
I wouldn't be surprised if similar practices exist in other non-English speaking countries
I'm afraid we don't have the resources to cover common practices in languages other than English, even if these languages use Latin-letter abbreviations that may have been derived from English originally. This sort of thing is better addressed by CLDR <http://cldr.unicode.org/>, which has a mandate to cover time zone abbreviations in non-English locales.
The simple fact is, they don't have a reference implementation of strftime, and we do. Maybe it's time to step up.
ICU has u_sprintf http://icu-project.org/apiref/icu4c/ustdio_8h.html#a762fa0be07b80294eb9b5eb4...
Also, there's a difference between doing localization in general and accepting a single abbreviation used in a single language that is not English when there is no English abbreviation to conflict. We absolutely do have the resources to do the latter, you just don't *want* to - that's not the same thing.
Perhaps someone who regularly uses CLDR could post a simple demo of tz abbreviation L10N.
Debian packaging configuration distributes I18N templates in 23 languages for dpkg-reconfigure tzdata aka tzselect in /var/lib/dpkg/info/tzdata.templates - also useful to find words other languages use for time zone, countries, territories, and municipalities.
-- Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

On 11/01/2016 10:24 AM, Random832 wrote:
We absolutely do have the resources to do the latter, you just don't *want* to - that's not the same thing.
I don't want to because I don't agree that we have resources for this sort of thing. For example, suppose there are conflicting opinions about the Latin-letter abbreviation to use for the local (non-Beijing) time in Ürümqi, based on usage in Uyghur vs Oirat vs Mandarin vs other sources. In such a situation, our sources are likely to be biased for nontechnical reasons and we can't arbitrate such disputes reliably. It's too much work already to deal with English-language problems, without also trying to deal with obscure-to-us disputes in Xinjiang or Dagestan or whatever. Applications should not care what time zone abbreviation is used in places like Ürümqi and Vladivostok, places where there is no real-world standard. Since that's the case, we should not spend much time worrying about it either.

On Tue, Nov 1, 2016 at 12:12 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 10/31/2016 01:39 PM, Alexander Belopolsky wrote:
I wouldn't be surprised if similar practices exist in other non-English speaking countries
I'm afraid we don't have the resources to cover common practices in languages other than English,
Well, clearly not making a change should take fewer resources than making it. I would think any change like this should be driven by input from the affected regions. Think how much havoc it would cause in the US if you changed EST/EDT to -05/-04. Many users would have no idea what these numbers mean. This is a general problem with UTC offsets: in zones more than 1-2 hours away from the zero meridian, the time at Greenwich is fairly irrelevant and people tend to refer to local times by name (as in the US), or by offset from the central zone (as in Russia).
even if these languages use Latin-letter abbreviations that may have been derived from English originally. This sort of thing is better addressed by CLDR <http://cldr.unicode.org/>, which has a mandate to cover time zone abbreviations in non-English locales.
I don't think CLDR can supply translation of +10 to VLAT without help from tzdb. In fact, I don't think CLDR even supports abbreviations - it looks like they only localize TZID's such as Asia/Vladivostok. Note that many systems don't bother translating the abbreviations even when the rest of the date display is localized. For example on Linux, I get
    $ TZ=Europe/Moscow LANG=ru_RU.UTF-8 date
    Срд Ноя 2 20:50:12 MSK 2016
and on Mac OS X:
    $ TZ=Europe/Moscow LANG=ru_RU date
    среда, 2 ноября 2016 г. 20:48:10 (MSK)
In fact, I think POSIX made a mistake allowing %Z to be localized and most systems don't do that. As a practical matter, while I appreciate all the theory behind the transition, it would be better to limit the changes to cases where "invented" abbreviations are reported to cause user confusion. (And what can be more confusing than EST? Being in the Western Hemisphere, it is not Eastern and being Winter time, it is not Summer!)

On Nov 2, 2016, at 11:07 AM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
I don't think CLDR can supply translation of +10 to VLAT without help from tzdb. In fact, I don't think CLDR even supports abbreviations - it looks like they only localize TZID's such as Asia/Vladivostok.
TZIDs aren't themselves localized, as they're really not intended to be used in user interfaces. CLDR has entries that map from what they call "metazones" to both long and short time zone names; the short time zone names are abbreviations. A "metazone" is a handle that's used for time zone names; it probably usually corresponds to what we think of as a "time zone", but there may be cases where it doesn't. For example, there's a metazone called "America_Eastern", where "America" presumably means "the Americas" - in the same sense of "America" in tzids - rather than "the United States of America", as it's used for Canadian and Mexican time zones as well. Its data in the CLDR's "main/en.xml" file is
    <metazone type="America_Eastern">
        <long>
            <generic>Eastern Time</generic>
            <standard>Eastern Standard Time</standard>
            <daylight>Eastern Daylight Time</daylight>
        </long>
        <short>
            <generic>ET</generic>
            <standard>EST</standard>
            <daylight>EDT</daylight>
        </short>
    </metazone>
and its data in the "main/fr.xml" file is
    <metazone type="America_Eastern">
        <long>
            <generic>heure de l’Est nord-américain</generic>
            <standard>heure normale de l’Est nord-américain</standard>
            <daylight>heure d’été de l’Est</daylight>
        </long>
    </metazone>
so they provide the long time zone name in French but use the same abbreviation for French and English. The same applies for "main/es.xml" - the long names are in Spanish but the abbreviations fall back on the English ones. A locale corresponding to a tzdb entry may move from one metazone to another if, for example, a locale switches from one time zone to another.
For example, the "supplemental/metaZones.xml" file has
    <timezone type="America/Indiana/Knox">
        <usesMetazone to="1991-10-27 07:00" mzone="America_Central"/>
        <usesMetazone to="2006-04-02 07:00" from="1991-10-27 07:00" mzone="America_Eastern"/>
        <usesMetazone from="2006-04-02 07:00" mzone="America_Central"/>
    </timezone>
because the tzdb entry for America/Indiana/Knox is:
    Zone America/Indiana/Knox -5:46:30 -      LMT  1883 Nov 18 12:13:30
                              -6:00    US     C%sT 1947
                              -6:00    Starke C%sT 1962 Apr 29 2:00
                              -5:00    -      EST  1963 Oct 27 2:00
                              -6:00    US     C%sT 1991 Oct 27 2:00
                              -5:00    -      EST  2006 Apr 2 2:00
                              -6:00    US     C%sT
The CLDR doesn't cover the pre-1991 changes - I don't know whether they use the UNIX Epoch as a cutoff or not - but it covers the switch from Central time (C%sT, i.e. CST or CDT) to Eastern time in 1991 and the switch from Eastern time back to Central time in 2006. For Vladivostok, the CLDR has
    <timezone type="Asia/Vladivostok">
        <usesMetazone mzone="Vladivostok"/>
    </timezone>
in supplemental/metaZones.xml, so there's a "Vladivostok" metazone and Asia/Vladivostok is always in that metazone. The data for that metazone in main/en.xml is
    <metazone type="Vladivostok">
        <long>
            <generic>Vladivostok Time</generic>
            <standard>Vladivostok Standard Time</standard>
            <daylight>Vladivostok Summer Time</daylight>
        </long>
    </metazone>
and in main/ru.xml is
    <metazone type="Vladivostok">
        <long>
            <generic>Владивосток</generic>
            <standard>Владивосток, стандартное время</standard>
            <daylight>Владивосток, летнее время</daylight>
        </long>
    </metazone>
They do not have "short" names - i.e., abbreviations - for Vladivostok.
If the entry in main/en.xml were
    <metazone type="Vladivostok">
        <long>
            <generic>Vladivostok Time</generic>
            <standard>Vladivostok Standard Time</standard>
            <daylight>Vladivostok Summer Time</daylight>
        </long>
        <short>
            <generic>VLAT</generic>
            <standard>VLAST</standard>
            <daylight>VLADT</daylight>
        </short>
    </metazone>
or something such as that, the CLDR could handle providing an abbreviation, and if a different abbreviation were appropriate in Russian, <short> entries could be provided in the main/ru.xml entry.
Note that many systems don't bother translating the abbreviations even when the rest of the date display is localized.
For example on Linux, I get
    $ TZ=Europe/Moscow LANG=ru_RU.UTF-8 date
    Срд Ноя 2 20:50:12 MSK 2016
and on Mac OS X:
    $ TZ=Europe/Moscow LANG=ru_RU date
    среда, 2 ноября 2016 г. 20:48:10 (MSK)
The CLDR provides no abbreviations for the "Moscow" metazone, so it couldn't be used to localize the abbreviation. Somebody would have to provide localized abbreviations to the CLDR maintainers if they're desired.
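The lookup described in this message can be sketched in Python. This is only an illustration under stated assumptions: the XML is the America_Eastern and Vladivostok data quoted in the thread, wrapped in a made-up `<metazones>` root element (real CLDR files nest these entries much deeper), and the "French falls back to English" behaviour is modelled as a simple ordered search, which is a simplification of CLDR's actual inheritance rules. The names `load` and `zone_name` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Data copied from the thread; the <metazones> wrapper is an assumption.
EN_XML = """
<metazones>
  <metazone type="America_Eastern">
    <long>
      <generic>Eastern Time</generic>
      <standard>Eastern Standard Time</standard>
      <daylight>Eastern Daylight Time</daylight>
    </long>
    <short>
      <generic>ET</generic>
      <standard>EST</standard>
      <daylight>EDT</daylight>
    </short>
  </metazone>
  <metazone type="Vladivostok">
    <long>
      <generic>Vladivostok Time</generic>
      <standard>Vladivostok Standard Time</standard>
      <daylight>Vladivostok Summer Time</daylight>
    </long>
  </metazone>
</metazones>
"""

FR_XML = """
<metazones>
  <metazone type="America_Eastern">
    <long>
      <generic>heure de l’Est nord-américain</generic>
      <standard>heure normale de l’Est nord-américain</standard>
      <daylight>heure d’été de l’Est</daylight>
    </long>
  </metazone>
</metazones>
"""


def load(xml_text):
    """Index <metazone> elements by their type attribute."""
    root = ET.fromstring(xml_text)
    return {mz.get("type"): mz for mz in root.iter("metazone")}


def zone_name(locales, metazone, width, kind):
    """Return the first name found, searching locales in order.

    width is "long" or "short"; kind is "generic", "standard" or "daylight".
    Returns None when no locale in the chain supplies the name.
    """
    for data in locales:
        mz = data.get(metazone)
        if mz is None:
            continue
        node = mz.find(f"{width}/{kind}")
        if node is not None:
            return node.text
    return None


EN = load(EN_XML)
FR = load(FR_XML)
```

With this data, `zone_name([FR, EN], "America_Eastern", "short", "standard")` finds nothing in the French entry and picks up the English abbreviation, while any "short" lookup for the Vladivostok metazone returns None, so the abbreviation would have to come from somewhere else.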

On Wed, Nov 2, 2016 at 2:46 PM, Guy Harris <guy@alum.mit.edu> wrote:
CLDR has entries that map from what they call "metazones" to both long and short time zone names; the short time zone names are abbreviations.
Thanks for a crash course on CLDR! Does this mean that a date utility that uses CLDR will not be affected by the tzdb change from VLAT to +10? Is timezone data in en.xml derived mechanically from tzdb or maintained independently?

On Nov 2, 2016, at 12:02 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Thanks for a crash course on CLDR! Does this mean that a date utility that uses CLDR will not be affected by the tzdb change from VLAT to +10?
I assume we're talking about a date utility conforming to the Single UNIX Specification, so that the default date format is "+%a %b %e %H:%M:%S %Z %Y", thus including a "Timezone name, or no characters if no timezone is determinable." If so, then it depends on how the code determines the timezone name. If it interprets %Z as an abbreviation - or a short name in CLDR terminology - then, as the CLDR doesn't currently provide abbreviations for the Vladivostok metazone, %Z will have to come from some source other than the metazone, and it may be that it comes from a source other than the CLDR, and it might be affected by the change. Here's more information on the Unicode Locale Data Markup Language, as used by the CLDR: http://www.unicode.org/reports/tr35/tr35-dates.html#Time_Zone_Format_Termino...
Is timezone data in en.xml derived mechanically from tzdb or maintained independently?
I don't know, but I suspect it's maintained independently, by the CLDR maintainers. I think some maintainers are members of the list; if so, hopefully they'll respond.

On Wed, Nov 2, 2016, at 14:07, Alexander Belopolsky wrote:
I don't think CLDR can supply translation of +10 to VLAT without help from tzdb. In fact, I don't think CLDR even supports abbreviations - it looks like they only localize TZID's such as Asia/Vladivostok. Note that many systems don't bother translating the abbreviations even when the rest of the date display is localized.
CLDR does support abbreviations, but a couple steps of indirection are required. You have to take the tzid and an effective date to find out the "metazone", then the "metazone" knows what its abbreviations are. I've advocated in the past for tz files to identify the metazone of each transition because the CLDR should really not be responsible for keeping this up to date.
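The two steps of indirection might look roughly like this in Python. It is a sketch, not CLDR's or ICU's implementation: the period table is transcribed from the metaZones.xml excerpt quoted earlier in the thread, the from/to stamps are assumed to be in UT and to form half-open intervals, and only the America_Eastern short names quoted in the thread are included. The names `USES_METAZONE`, `SHORT_NAMES`, `metazone_for` and `abbreviation` are hypothetical.

```python
from datetime import datetime

# (from, to, metazone) periods per tzid; None means open-ended.
# Stamps are assumed to be UT, with "from" inclusive and "to" exclusive.
USES_METAZONE = {
    "America/Indiana/Knox": [
        (None, "1991-10-27 07:00", "America_Central"),
        ("1991-10-27 07:00", "2006-04-02 07:00", "America_Eastern"),
        ("2006-04-02 07:00", None, "America_Central"),
    ],
    "Asia/Vladivostok": [(None, None, "Vladivostok")],
}

# Only the short names quoted in the thread; other metazones are absent.
SHORT_NAMES = {
    "America_Eastern": {"generic": "ET", "standard": "EST", "daylight": "EDT"},
}


def _parse(stamp):
    """Parse a 'YYYY-MM-DD HH:MM' stamp, passing None through."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M") if stamp else None


def metazone_for(tzid, when_ut):
    """Step 1: tzid plus an effective date gives the metazone name."""
    for start, end, mzone in USES_METAZONE.get(tzid, []):
        lo, hi = _parse(start), _parse(end)
        if (lo is None or when_ut >= lo) and (hi is None or when_ut < hi):
            return mzone
    return None


def abbreviation(tzid, when_ut, kind):
    """Step 2: the metazone gives the abbreviation, or None if it has none."""
    mzone = metazone_for(tzid, when_ut)
    return SHORT_NAMES.get(mzone, {}).get(kind)
```

For example, `metazone_for("America/Indiana/Knox", datetime(2000, 1, 1))` resolves to "America_Eastern" and the standard abbreviation to "EST", whereas Asia/Vladivostok resolves to a metazone for which this table has no short names.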

On Nov 2, 2016, at 9:55 PM, Random832 <random832@fastmail.com> wrote:
You have to take the tzid and an effective date to find out the "metazone", then the "metazone" knows what its abbreviations are. I've advocated in the past for tz files to identify the metazone of each transition because the CLDR should really not be responsible for keeping this up to date.
One way to do that might be to
1) have items either in the source files or in a separate file defining metazones and their tzdb-supplied abbreviations;
2) replace the FORMAT column with a METAZONE column and give the metazone name.
However, that might break code (other than zic, which we'd change) that reads the *source* files rather than the *compiled* files; if that's an issue, we might have to do something such as:
give the new files names ending with ".tz" or some extension;
have part of the process for building a release be running a program or script that takes the .tz files and, if the metazones are in a separate file, the metazone file, and generates old-style files with a FORMAT column.
We'd also have version 4 of the compiled files; one possibility would be to append a count of metazones, a list of metazone names, and a table of metazone name table indices, indexed by the same value that's used as an index in the struct ttinfo table. We could add an API that, for a given time_t value, returns the metazone name. That could be used to look up data in the CLDR.

On Thu, Nov 3, 2016, at 03:35, Guy Harris wrote:
One way to do that might be to
1) have items either in the source files or in a separate file defining metazones and their tzdb-supplied abbreviations;
2) replace the FORMAT column with a METAZONE column and give the metazone name.
I strongly suspect that tzid/abbreviation pairs can be mapped directly to metazones (i.e. there's no case where the same tzid and abbreviation maps to two different metazones at different times). Or we could simply ditch the abbreviations (a process we seem to already have started) and put the metazone in the abbreviation field. Or just maintain the tzid/time to metazone mapping source in a separate file, either in the same format that CLDR does now or in some format (whitespace-separated values?) that's easier to write a parser for. My main point is that it should have the same release cycle as the rest of tzdata, so that they won't fall out of sync, whether by creation of an entirely new zone that CLDR doesn't know about, or moving one to a different 'metazone'. This happens all the time and it only doesn't cause problems because nobody actually uses the data; a status quo that will not survive if we continue the course of removing abbreviations from tzdata and insisting that people who need them switch to using the CLDR.
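As a rough idea of how little code a whitespace-separated mapping file would need, here is a Python sketch. The file layout (four columns: tzid, from, to, metazone, with "-" for an open end and "#" comments) is entirely hypothetical; nothing like it exists in tzdata or CLDR, and the sample rows merely restate the metaZones.xml data quoted earlier in the thread. `SAMPLE` and `parse_mapping` are made-up names.

```python
# Hypothetical file format, invented for this sketch only.
SAMPLE = """\
# tzid                 from              to                metazone
America/Indiana/Knox   -                 1991-10-27T07:00  America_Central
America/Indiana/Knox   1991-10-27T07:00  2006-04-02T07:00  America_Eastern
America/Indiana/Knox   2006-04-02T07:00  -                 America_Central
Asia/Vladivostok       -                 -                 Vladivostok
"""


def parse_mapping(text):
    """Parse the hypothetical format into {tzid: [(from, to, metazone), ...]}.

    "-" in the from or to column becomes None (open-ended).
    """
    table = {}
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(fields)}")
        tzid, start, end, mzone = fields
        table.setdefault(tzid, []).append(
            (None if start == "-" else start, None if end == "-" else end, mzone)
        )
    return table
```

Whatever the concrete syntax, the point made above still applies: the value of such a file would come from shipping it on the same release cycle as the rest of tzdata.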
We'd also have version 4 of the compiled files; one possibility would be to append a count of metazones, a list of metazone names, and a table of metazone name table indices, indexed by the same value that's used as an index in the struct ttinfo table. We could add an API that, for a given time_t value, returns the metazone name. That could be used to look up data in the CLDR.
Incidentally, is there a CLDR mailing list? I feel like they should be a part of this, since what I'm suggesting is essentially a transfer of "ownership" of part of the data they currently maintain.

My main point is that it should have the same release cycle as the rest of tzdata, so that they won't fall out of sync, whether by creation of an entirely new zone that CLDR doesn't know about, or moving one to a different 'metazone'. This happens all the time and it only doesn't cause problems because nobody actually uses the data
Well, I use it. https://github.com/mj1856/TimeZoneNames
Others use it too and I've commented on the release cycle problems before, in issues like this one: https://github.com/nodatime/nodatime/issues/473
Oh, you meant something mainstream, well: http://site.icu-project.org/ And I believe that's used by many major browsers: Chrome, Firefox, Edge, etc.
Incidentally, is there a CLDR mailing list? I feel like they should be a part of this, since what I'm suggesting is essentially a transfer of "ownership" of part of the data they currently maintain.
Yes: http://www.unicode.org/mailman/listinfo/cldr-users More details here: http://www.unicode.org/consortium/distlist.html

On Thu, Nov 3, 2016, at 13:56, Matt Johnson wrote:
nobody actually uses the data
Well, I use it. https://github.com/mj1856/TimeZoneNames
Others use it too and I've commented on the release cycle problems before, in issues like this one: https://github.com/nodatime/nodatime/issues/473
Oh, you meant something mainstream, well: http://site.icu-project.org/
Okay, "nobody" was a bit hyperbolic. My main issue was the fact that most OS's C libraries (which tend to follow tzdb's lead, except for Windows - which has its own timezones and its own localization) don't seem to use it, and as far as I know Python's standard library (which doesn't exactly have its act together with timezones in general either) also doesn't. The fact that we're talking about your project and nodatime rather than the basic class libraries of either C# or Java suggests that the latter don't use the CLDR data, or possibly even the tzdb data at all. The fact that they _do_ tend to provide localization of, say, month names, means a lot of people might not realize that they're inadequate.

Okay, "nobody" was a bit hyperbolic. My main issue was the fact that most OS's C libraries (which tend to follow tzdb's lead, except for Windows - which has its own timezones and its own localization) don't seem to use it, and as far as I know Python's standard library (which doesn't exactly have its act together with timezones in general either) also doesn't.
The fact that we're talking about your project and nodatime rather than the basic class libraries of either C# or Java suggests that the latter don't use the CLDR data, or possibly even the tzdb data at all. The fact that they _do_ tend to provide localization of, say, month names, means a lot of people might not realize that they're inadequate.
Python's standard library doesn't have it, but there's Babel, which does have time zone support from CLDR http://babel.pocoo.org/ I believe Java has been using it for quite some time now. AFAIK, TimeZone.getDisplayName calls into ICU4J, which uses CLDR data. https://docs.oracle.com/javase/8/docs/api/java/util/TimeZone.html#getDisplay... Anyway, I agree that it's not likely to be found in low-level C libraries, but that doesn't mean it's not in use. My personal experience - I've found most of the time zone long names in CLDR to be adequate, but their support for time zone abbreviations is somewhat patchwork. They have lots of them, but they don't cover the world. As far as locale goes, in general they seem to only exist in English, and occasionally in French. (ex, "HNE", or "HEE" for French-Canadian Eastern Time abbreviations. Are these used for real in Canada? Who knows.) They also seem to think that abbreviations should be only used locally, such as "BST" only being used by British English. One would think that if an American was talking about DST in the UK they would also use the term "BST", but no: http://unicode.org/cldr/trac/ticket/8498 Honestly, I don't think pointing folks at CLDR as a better source of time zone abbreviations is very useful. CLDR might be able to host such data, but it just doesn't seem to be curated to the same degree that other things are. </rant>

On 2016-11-03 14:47, Matt Johnson wrote:
As far as locale goes, in general they seem to only exist in English, and occasionally in French. (ex, "HNE", or "HEE" for French-Canadian Eastern Time abbreviations. Are these used for real in Canada? Who knows.)
Yes, they are. The French portion of the Canadian Broadcasting Network, called 'Radio-Canada' uses them on their TV schedules on their website. http://ici.radio-canada.ca/tele/guide-horaire But most Canadians are not francophone, and use the same English abbreviations as Americans do.

The French portion of the Canadian Broadcasting Network, called 'Radio-Canada' uses them on their TV schedules on their website. http://ici.radio-canada.ca/tele/guide-horaire
Nice find. Thanks!

On 2016-11-03 13:03, Matt Johnson wrote:
The French portion of the Canadian Broadcasting Network, called 'Radio-Canada' uses them on their TV schedules on their website. http://ici.radio-canada.ca/tele/guide-horaire
Nice find. Thanks!
Official: http://www.nrc-cnrc.gc.ca/fra/services/heure/fuseaux_horaires.html This weekend they move from Heure Avancee d... to Heure Normale d... examples: http://www.nrc-cnrc.gc.ca/fra/services/heure/faq/index.html#Q5 -- Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
participants (9)
- Alexander Belopolsky
- Brian Inglis
- David Patte ₯
- Guy Harris
- Matt Johnson
- Paul Eggert
- Random832
- Sergei Turchanov
- Steven R. Loomis