Single source of truth for timezone data
Hi everyone,
I am writing this mail with my distribution maintainer hat on since I am responsible for keeping the tzdata package up-to-date in Ubuntu. I would like to propose having a single source of truth for timezone data, which should be the tzdata source package. Updating this package in a distribution like Debian/Ubuntu should be enough to update all consumers in the distribution.
Sadly this is currently not the case. Ubuntu since 20.04 (focal) adds the four icu source files metaZones.txt, timezoneTypes.txt, windowsZones.txt, and zoneinfo64.txt from https://github.com/unicode-org/icu-data to the tzdata source package. Then it uses genrb and icupkg to generate the .res files. Since the icu project lags behind the tzdata release by hours up to days, Ubuntu has to update tzdata twice (first the tzdata release, then the icu update).
metaZones.txt, timezoneTypes.txt, and windowsZones.txt are generated using tools/cldr/cldr-to-icu/build-icu-data.xml from https://github.com/unicode-org/icu. zoneinfo64.txt is generated by tz2icu. build-icu-data.xml uses https://github.com/unicode-org/cldr as input to convert the cldr data to icu. If I saw that correctly, only common/supplemental/metaZones.xml needs to be updated there on an update.
Can tzdata include the necessary data and tools to generate metaZones.xml from its source? Then it would be possible to generate the icu data from the tzdata source without the need to wait for icu to catch up. Then Debian/Ubuntu can ship a tzdata-icu package for it [1].
[1] https://bugs.debian.org/954112
-- Benjamin Drung Debian & Ubuntu Developer
On 12/1/22 05:45, Benjamin Drung via tz wrote the following, which I'm quoting nearly in its entirety because it doesn't seem to have hit the tz archive (presumably because Benjamin is not on the tz mailing list):
I am writing this mail with my distribution maintainer hat on since I am responsible for keeping the tzdata package up-to-date in Ubuntu. I would like to propose having a single source of truth for timezone data, which should be the tzdata source package. Updating this package in a distribution like Debian/Ubuntu should be enough to update all consumers in the distribution.
Sadly this is currently not the case. Ubuntu since 20.04 (focal) adds the four icu source files metaZones.txt, timezoneTypes.txt, windowsZones.txt, and zoneinfo64.txt from https://github.com/unicode-org/icu-data to the tzdata source package. Then it uses genrb and icupkg to generate the .res files. Since the icu project lags behind the tzdata release by hours up to days, Ubuntu has to update tzdata twice (first the tzdata release, then the icu update).
metaZones.txt, timezoneTypes.txt, and windowsZones.txt are generated using tools/cldr/cldr-to-icu/build-icu-data.xml from https://github.com/unicode-org/icu. zoneinfo64.txt is generated by tz2icu. build-icu-data.xml uses https://github.com/unicode-org/cldr as input to convert the cldr data to icu. If I saw that correctly, only common/supplemental/metaZones.xml needs to be updated there on an update.
Can tzdata include the necessary data and tools to generate metaZones.xml from its source? Then it would be possible to generate the icu data from the tzdata source without the need to wait for icu to catch up. Then Debian/Ubuntu can ship a tzdata-icu package for it [1].
Currently when creating data, tzcode uses only software that can be built and run on a barebones POSIX system, and this tzcode software is all public domain. If the tools you're thinking of are all public-domain POSIX apps, then what you're asking might be doable. If not, that might introduce a significant licensing and/or portability issue, which we'd have to think through carefully before proceeding.
I took a brief look at metaZones.xml[1] and saw that it contains much info not derived from TZDB. Its metazone-based data entries are useful for internationalization, which historically has been out of scope for TZDB and has "belonged" to CLDR. Presumably if tzdata contained this i18n info, TZDB would need to coordinate with the CLDR project before any TZDB release that changed any i18n-relevant data.
Unfortunately this coordination would introduce a coupling that would slow down distribution of timezone data. At times the CLDR data have skipped coverage of TZDB releases, due to being so far behind. In contrast, we've had to do TZDB releases with less than 24 hours' notice and that's stretching our capabilities as it is; it's hard to imagine that we could react that quickly even with i18n-related delays factored in.
Also, I don't see how TZDB itself could generate the files in question without running afoul of Unicode, Inc.'s copyright. The metaZones.xml file's terms of use don't allow TZDB to generate its own metaZones.xml copy, which might differ in unimportant ways (e.g., spacing) or even in ways that are semantically significant with whatever Unicode, Inc. eventually decides to do. At best TZDB could merely redistribute the metaZones.xml file that Unicode, Inc. eventually publishes, but this would mean that the metaZones.xml file in a TZDB release would likely be out of date.
You might be better off creating a tzdata-icu package along the lines suggested by Aurelien Jarno[2]. Although adding a package is a hassle (as one can also see with the tzdata-java package), the hassle seems inevitable here, due to the delay between getting new timestamp data and getting new i18n for that data.
[1]: https://github.com/unicode-org/cldr/blob/main/common/supplemental/metaZones....
[2]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=954112
Copyrights notwithstanding, it really isn't the job of the TZDB project to take over the work of either CLDR or ICU. -- Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
Also copyrights notwithstanding (but still important), considering that metaZones.txt file that Paul included in the message to which I am replying as a "source of truth" is laughable. tzdata isn't perfect, though it (at least until a couple of years ago) always strived to have the best known data available to us. That file contains nonsense:
    "Australia:Brisbane"{ { "Australia_Eastern", } }
    "Australia:Sydney"{ { "Australia_Eastern", } }
That cannot be right, the times in Brisbane and Sydney (right now) aren't the same - they cannot possibly (whatever the rest of that data actually means) be treated the same. We also see:
    "Australia:Hobart"{ { "Australia_Eastern", } }
which is less bad right now, as Hobart's and Sydney's clocks, today, are the same - but they haven't always been, not even back to 1970. The same issue applies to:
    "Australia:Adelaide"{ { "Australia_Central", } }
    "Australia:Darwin"{ { "Australia_Central", } }
The clocks in Darwin and Adelaide (right now) are not the same either.
I haven't checked all of the rest of it, but if just this sample is this bad, why would anyone want to even consider distributing this stuff?
Note that the next version of POSIX has extended its definition of TZ (or will have, after it is completed, approved, and published) in a way that allows both America/New_York and Australia:Adelaide (and many other possibilities) to be valid TZ strings (it still allows leading ':' strings for implementation defined specifiers) - but it requires that such timezones contain accurate zone data for all transitions since the epoch (1970-01-01T00:00:00Z) and encourages providing the best possible data for older times. That is, implementations must map Australia:Darwin into Australia/Darwin if using tzdata, not to Australia/Adelaide, and similarly Australia/Sydney, Australia/Brisbane, and Australia/Hobart must refer to 3 different transition lists.
kre
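The claim that these zones do not share clocks can be checked against TZDB itself. What follows is only a minimal sketch, assuming Python 3.9+ with the standard `zoneinfo` module and a tz database installed on the system; the helper name `utc_offsets` is made up for this illustration and is not part of tzcode or ICU.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def utc_offsets(zone_names, instant):
    """Return {zone name: UTC offset as a timedelta} at an aware instant."""
    return {name: instant.astimezone(ZoneInfo(name)).utcoffset()
            for name in zone_names}


if __name__ == "__main__":
    # An instant close to the date of this thread (early December 2022).
    when = datetime(2022, 12, 3, 0, 0, tzinfo=timezone.utc)
    # Zones that the quoted data places under one metazone each.
    eastern = ["Australia/Brisbane", "Australia/Sydney", "Australia/Hobart"]
    central = ["Australia/Adelaide", "Australia/Darwin"]
    try:
        for group in (eastern, central):
            for name, offset in utc_offsets(group, when).items():
                print(f"{name:20} {offset}")
    except ZoneInfoNotFoundError:
        print("no tz database available on this system")
```

In December Sydney, Hobart, and Adelaide observe daylight saving time while Brisbane and Darwin do not, so the offsets within each group differ. A shared metazone can therefore describe a naming family at most, not a shared set of transitions.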
On Dec 1, 2022, at 5:45 AM, Benjamin Drung via tz <tz@iana.org> wrote:
Sadly this is currently not the case. Ubuntu since 20.04 (focal) adds the four icu source files metaZones.txt, timezoneTypes.txt, windowsZones.txt, and zoneinfo64.txt
Those aren't part of CLDR, those are part of icu-data, and I've no idea what the heck they are.
metaZones.txt, timezoneTypes.txt, and windowsZones.txt are generated using tools/cldr/cldr-to-icu/build-icu-data.xml from https://github.com/unicode-org/icu. zoneinfo64.txt is generated by tz2icu. build-icu-data.xml uses https://github.com/unicode-org/cldr as input to convert the cldr data to icu. If I saw that correctly, only common/supplemental/metaZones.xml needs to be updated there on an update.
That would be good, as that would mean that those icu-data files would Not Be Our Problem.
Can tzdata include the necessary data and tools to generate metaZones.xml
*That's* from CLDR
from its source?
A "metazone" name appears to be a key in maps from metazone names to sets of "time zone" names, e.g. the map in common/main/en.xml in CLDR-28 maps "Europe_Western" to "long" names:
    a "generic" name "Western European Time";
    a "standard" name "Western European Standard Time";
    a "daylight" name "Western European Summer Time";
and the "timeZoneNames" map in common/main/es_US.xml in CLDR-28 maps "America_Central" to "short" names:
    a "generic" name "CT";
    a "standard" name "CST";
    a "daylight" name "CDT";
so it's somewhat like the result of combining the FORMAT field in a Zone line with the LETTER/S field of a Rule line, except that it's locale-specific, unlike the FORMAT+LETTER/S time zone abbreviation we provide. In addition, the time zone abbreviation is required by POSIX, while the "long name" isn't.
It Might Be Nice if we could assist in associating specific Zone-line+Rule-line combinations with metazones, but that might require us to know things about locales, not just about time zones, which would require us to know things that I think are clearly not in our scope.
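The shape of that mapping can be pictured as a nested lookup. This is a rough sketch only: the dictionary layout and the function name `display_name` are assumptions made for illustration and do not reproduce CLDR's XML schema or any ICU API. The entries are just the ones quoted in the message above.

```python
# Display names keyed by locale, then metazone, then width, then kind.
# Only the entries quoted in the message above are included.
METAZONE_NAMES = {
    "en": {
        "Europe_Western": {
            "long": {
                "generic": "Western European Time",
                "standard": "Western European Standard Time",
                "daylight": "Western European Summer Time",
            },
        },
    },
    "es_US": {
        "America_Central": {
            "short": {"generic": "CT", "standard": "CST", "daylight": "CDT"},
        },
    },
}


def display_name(locale, metazone, width, kind):
    """Look up a localized name; return None when the table has no entry."""
    try:
        return METAZONE_NAMES[locale][metazone][width][kind]
    except KeyError:
        return None
```

The locale is the outermost key, which is what separates this from the single, locale-independent abbreviation that a Zone line's FORMAT and a Rule line's LETTER/S produce.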
On Dec 2, 2022, at 7:36 PM, Guy Harris via tz <tz@iana.org> wrote:
so it's somewhat like the result of combining the FORMAT field in a Zone line with the LETTER/S field of a Rule line, except that it's locale-specific, unlike the FORMAT+LETTER/S time zone abbreviation we provide.
No, the metazone would be something we might, for example, add to the end of a Zone line - it doesn't indicate standard vs. daylight saving time, it has both names if DST is, or ever was, observed by any tzdb region while in that metazone. (Yes, tzdb regions can move from one metazone to another.)
So we could, in principle:
    have new versions of the source files with an additional METAZONE field at the end of Zone lines;
    have zic handle that field;
    have a tool to strip off the METAZONE field, to produce files to be handled by tools that read source files and that *don't* understand the METAZONE field;
*if* the CLDR
    1) is willing to work with that;
    2) is willing to do the initial work of creating those new versions of the source files;
    3) is willing either to put up with whatever names we choose for metazones any time we create a new Zone that needs a new metazone or is willing to give us a new metazone name in a timely fashion (as in "fast enough that the new metazone name is provided before we need to make a new tzdb release") whenever we need to create a new Zone.
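The "tool to strip off the METAZONE field" could be quite small. The sketch below is purely hypothetical: the thread does not define how such a field would be written, so it assumes a trailing token of the form `metazone=NAME` on Zone and continuation lines, and it assumes that "#" never appears inside a quoted field. The function names are invented for this example.

```python
# HYPOTHETICAL syntax: the thread does not define how a METAZONE field would
# be written, so this sketch assumes a trailing "metazone=NAME" token.
METAZONE_PREFIX = "metazone="


def strip_metazone(line):
    """Return one source line (no trailing newline) without its metazone token."""
    # Leave comment text alone; assumes "#" never appears inside a quoted field.
    body, hash_mark, comment = line.partition("#")
    fields = body.split()
    if not fields or not fields[-1].startswith(METAZONE_PREFIX):
        return line
    trimmed = body.rstrip()
    gap = body[len(trimmed):]  # whitespace between the data and any comment
    kept = trimmed[: len(trimmed) - len(fields[-1])].rstrip()
    return kept + gap + hash_mark + comment


def strip_metazones(text):
    """Apply strip_metazone to every line of a source file's text."""
    return "\n".join(strip_metazone(line) for line in text.splitlines()) + "\n"
```

Lines without the token pass through unchanged, so Rule and Link lines and existing comments are preserved and the output stays readable by tools that know nothing about metazones.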
[Retitling from "Single source of truth for timezone data" as per <https://data.iana.org/time-zones/theory.html#accuracy>.] On 2022-12-02 22:57, Guy Harris via tz wrote:
So we could, in principle:
have new versions of the source files with an additional METAZONE field at the end of Zone lines;
have zic handle that field;
have a tool to strip off the METAZONE field, to produce files to be handled by tools that read source files and that *don't* understand the METAZONE field;
Good suggestion. Possibly even better would be to leave the current files alone (this would cause less disruption downstream), and have a new file containing a table that maps abbreviations+contexts to longer names that disambiguate the abbreviations. These longer names could then be used by CLDR and any other localization efforts. Something like this, say:
    ACST  *                 Australian Central Standard
    ACDT  *                 Australian Central Daylight
    AST   *                 Atlantic Standard
    ADT   *                 Atlantic Daylight
    APT   *                 Atlantic Peace
    AWT   *                 Atlantic War
    ADDT  *                 Atlantic Double Daylight
    ...
    CST   America/*         Central Standard
    CST   Asia/*            China Standard
    ...
    IST   Asia/Jerusalem    Israel Standard
    IST   Asia/Kolkata      India Standard
    IST   Europe/Dublin     Irish Standard
    ...
    BMT   Asia/Jakarta      Batavia Mean
    BMT   Europe/Zurich     Bern Mean
    BMT   America/Barbados  Bridgetown Mean
    BMT   Europe/Chisinau   Bucharest Mean
where the xMT abbreviation list is shortened by omitting entries like the following as they can be deduced automatically:
    BMT   Asia/Baghdad      Baghdad Mean
    BMT   Asia/Bangkok      Bangkok Mean
    BMT   America/Bermuda   Bermuda Mean
    BMT   America/Bogota    Bogota Mean
    BMT   Europe/Brussels   Brussels Mean
    BMT   Europe/Bucharest  Bucharest Mean
and with the convention that LMT always denotes local mean time when it appears in a Zone line (as opposed to a continuation line).
This new table would be a machine-processable improvement on the less-formal abbreviation lists that are already published here:
https://data.iana.org/time-zones/theory.html#abbreviations
Such a table should be relatively easy to maintain, as the set of abbreviations is small and rarely changes now that we've removed invented abbreviations. Of course it would require some work from the CLDR side, if they want to use this new table instead of doing things their current way, so it'd be nice to hear from the CLDR side before proceeding.
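A consumer of such a table might resolve an abbreviation roughly as follows. This is a sketch under stated assumptions, not a proposed format: the first-match-wins ordering, the use of shell-style wildcards, the exact rule for deducing omitted xMT entries (city name starts with the abbreviation's first letter), and the "Local Mean" wording for LMT are all guesses made for illustration. The rows are a subset of the sample rows above.

```python
from fnmatch import fnmatchcase

# (abbreviation, zone pattern, long name), copied from the sample rows above.
TABLE = [
    ("ACST", "*", "Australian Central Standard"),
    ("ACDT", "*", "Australian Central Daylight"),
    ("CST", "America/*", "Central Standard"),
    ("CST", "Asia/*", "China Standard"),
    ("IST", "Asia/Jerusalem", "Israel Standard"),
    ("IST", "Asia/Kolkata", "India Standard"),
    ("IST", "Europe/Dublin", "Irish Standard"),
    ("BMT", "Asia/Jakarta", "Batavia Mean"),
    ("BMT", "Europe/Zurich", "Bern Mean"),
    ("BMT", "America/Barbados", "Bridgetown Mean"),
    ("BMT", "Europe/Chisinau", "Bucharest Mean"),
]


def long_name(abbr, zone):
    """Return the disambiguated long name for abbr in zone, or None."""
    # Explicit rows win; the first matching row is used (an assumption).
    for row_abbr, pattern, name in TABLE:
        if row_abbr == abbr and fnmatchcase(zone, pattern):
            return name
    # Assumed wording for the LMT convention mentioned above.
    if abbr == "LMT":
        return "Local Mean"
    # Assumed deduction rule for omitted xMT entries: "<City> Mean" when the
    # city part of the zone name starts with the abbreviation's first letter.
    city = zone.rsplit("/", 1)[-1].replace("_", " ")
    if len(abbr) == 3 and abbr.endswith("MT") and city.startswith(abbr[0]):
        return f"{city} Mean"
    return None
```

With these rows, `long_name("CST", "Asia/Shanghai")` resolves through the `Asia/*` row, while `long_name("BMT", "Asia/Baghdad")` has no explicit row and falls through to the deduction rule.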
On Dec 3, 2022, at 12:22 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
Possibly even better would be to leave the current files alone (this would cause less disruption downstream), and have a new file containing a table that maps abbreviations+contexts to longer names that disambiguate the abbreviations. These longer names could then be used by CLDR and any other localization efforts. Something like this, say:
    ACST  *  Australian Central Standard
    ACDT  *  Australian Central Daylight
CLDR also provides a "generic" name, so I guess that would be
    ACST  *  Australian Central Standard, Australian Central, ACT
    ACDT  *  Australian Central Daylight, Australian Central, ACT
to allow both abbreviations to be converted to the "generic" zone long - and short - names, unless they'd be able to do that on their end.
Guy Harris wrote in <49DEEDEF-3BA0-4E1C-BFB1-AA372A93F0FD@sonic.net>:
|On Dec 2, 2022, at 7:36 PM, Guy Harris via tz <tz@iana.org> wrote:
...
|(Yes, tzdb regions can move from one metazone to another.)
|
|So we could, in principle:
...
I would like to also point to UN Locode mappings. On a different list many of the readers are also on i said about three weeks ago:
    UN Locode support should possibly be supported or reserved as trade will continue to be the thing, i would suggest via leading @ plus the five letters which makes up a Locode entry. I would presume these Locode IDs become known more and more in the future. [.]IANA TZ over and over discusses the syntax and content of the Area/Location IDs as such, and whereas i would assume backward-compatibility will not be lost, i could imagine a new maintainer (the current one is on the austin ML and reads this, sufficient time provided) switches to a new format that serves ISO 3166 country codes much better, and ISO 3166 country codes are the base (root level) of UN Locodes.
--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
Does that mean that libc APIs will behave differently from whatever is using ICU files? On Android we are waiting for ICU/CLDR patches because we offer libc, ICU, and several Java APIs. I believe Apple also waits for ICU/CLDR patches, though I do not know their reasons.
On Wed, Dec 7, 2022 at 3:38 AM Almaz Mingaleev via tz <tz@iana.org> wrote:
Does that mean that libc APIs will behave differently from whatever is using ICU files?
yes, this is already true, and...
On Android we are waiting for ICU/CLDR patches because we offer libc, ICU, and several Java APIs.
...exactly the reason Android does this :-) (we learned this the hard way a decade ago when we updated tzdata but not icu, and the results of date formatting suddenly depended on which API you used :-( in a sense it doesn't matter --- 99.999% of dates/times a *user* sees on Android have come from icu rather than libc, but that also means that updating tzdata without icu is not as useful as you might otherwise imagine.)
I believe Apple also waits for ICU/CLDR patches, though I do not know their reasons.
i'm not aware of *any* libc that uses icu (or cldr data directly) for this[1], so everyone's in the same boat.
____
1. "this" being specifically just the time/date stuff. note that bionic does, for example, defer to icu for all the iswprint() and wcwidth() type stuff. i wanted to do the same for iconv() but never worked out how to bridge the API gap between that and icu. i've never tried with the date/time stuff, and i'm not aware that anyone else has, so if anyone has implementation experience there, i'd love to hear it!
On 2022-12-07 03:37, Almaz Mingaleev via tz wrote:
Does that mean that libc APIs will behave differently from whatever is using ICU files?
No, as I understand it the idea is to allow automatic rebuilding of the timezone-related part of the CLDR/ICU data, whenever a new TZDB version is released, without having to wait for updates from CLDR/ICU.
As I understand it, if TZDB releases version X, Android and Apple currently wait for ICU/CLDR to release a new version that takes X's changes into account. The idea is that we could improve on this by developing a procedure that, given a new TZDB release X and an older CLDR/ICU release W, would produce a slightly-modified copy W' of the CLDR/ICU data that takes X's changes into account, without having to wait for ICU/CLDR to produce a new version. Of course when ICU/CLDR get around to producing a new version X themselves, you could discard W'.
The main obstacles I see to this are:
* Copyright issues. Unicode, Inc. does not allow people to redistribute modified versions of their data files, even if the modifications are clearly needed.
* Institutional inertia. ICU/CLDR is big and is unaccustomed to moving fast, and timezone info is not high priority for ICU/CLDR.
Although we haven't so far seen any sign of interest here from the Unicode, Inc. side, perhaps they'll get around to it if some big customers ask them in the right way.
By the way, what bad things would happen if Android and Apple *didn't* wait? In other projects I've dealt with, if an upstream program P adds a new timezone abbreviation "XYT", downstream distros like Ubuntu can immediately ship the new P without waiting for i18n updates. P will still work fine in all locales, except that the new "XYT" abbreviation (should it be needed) will appear as-is until translators catch up. Could Android and Apple do something similar with timezone-related strings?
On Wed, Dec 7, 2022 at 10:08 AM Paul Eggert via tz <tz@iana.org> wrote:
On 2022-12-07 03:37, Almaz Mingaleev via tz wrote:
Does that mean that libc APIs will behave differently from whatever is using ICU files?
No, as I understand it the idea is to allow automatic rebuilding of the timezone-related part of the CLDR/ICU data, whenever a new TZDB version is released, without having to wait for updates from CLDR/ICU.
As I understand it, if TZDB releases version X, Android and Apple currently wait for ICU/CLDR to release a new version that takes X's changes into account. The idea is that we could improve on this by developing a procedure that, given a new TZDB release X and an older CLDR/ICU release W, would produce a slightly-modified copy W' of the CLDR/ICU data that takes X's changes into account, without having to wait for ICU/CLDR to produce a new version. Of course when ICU/CLDR get around to producing a new version X themselves, you could discard W'.
The main obstacles I see to this are:
* Copyright issues. Unicode, Inc. does not allow people to redistribute modified versions of their data files, even if the modifications are clearly needed.
* Institutional inertia. ICU/CLDR is big and is unaccustomed to moving fast, and timezone info is not high priority for ICU/CLDR.
Although we haven't so far seen any sign of interest here from the Unicode, Inc. side, perhaps they'll get around to it if some big customers ask them in the right way.
By the way, what bad things would happen if Android and Apple *didn't* wait? In other projects I've dealt with, if an upstream program P adds a new timezone abbreviation "XYT", downstream distros like Ubuntu can immediately ship the new P without waiting for i18n updates. P will still work fine in all locales, except that the new "XYT" abbreviation (should it be needed) will appear as-is until translators catch up. Could Android and Apple do something similar with timezone-related strings?
it's possible i'm adding more confusion here (in which case tell me and i'll shut up) because i haven't done any of these updates in many years now (thanks, almaz and others!), but the problem we had in the past that i alluded to was that icu4c's date formatting was using its own copy of the _transition_ data. so format via libc and you get the tzdata transitions (we're using more-or-less tzcode directly), but format via icu4c (which is almost every string in the UI) and you'd be using the transition data bundled with icu. if that's no longer duplicated, and it's only the _strings_ now, that's awesome, and nothing i've said (in this mail or my previous one) is relevant (for which i apologize!).
On Wed, 7 Dec 2022 at 18:08, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 2022-12-07 03:37, Almaz Mingaleev via tz wrote:
Does that mean that libc APIs will behave differently from whatever is using ICU files?
No, as I understand it the idea is to allow automatic rebuilding of the timezone-related part of the CLDR/ICU data, whenever a new TZDB version is released, without having to wait for updates from CLDR/ICU.
I meant how things work now. My concern was around these lines:
Since the icu project lags behind the tzdata release by hours up to days, Ubuntu has to update tzdata twice (first the tzdata release, then the icu update).
Between these updates you will get different answers from different APIs.
As I understand it, if TZDB releases version X, Android and Apple currently wait for ICU/CLDR to release a new version that takes X's changes into account. The idea is that we could improve on this by developing a procedure that, given a new TZDB release X and an older CLDR/ICU release W, would produce a slightly-modified copy W' of the CLDR/ICU data that takes X's changes into account, without having to wait for ICU/CLDR to produce a new version. Of course when ICU/CLDR get around to producing a new version X themselves, you could discard W'.
ICU and CLDR versions are bumped once a year. We do not wait for a new version, but for patches only.
The main obstacles I see to this are:
* Copyright issues. Unicode, Inc. does not allow people to redistribute modified versions of their data files, even if the modifications are clearly needed.
* Institutional inertia. ICU/CLDR is big and is unaccustomed to moving fast, and timezone info is not high priority for ICU/CLDR.
Although we haven't so far seen any sign of interest here from the Unicode, Inc. side, perhaps they'll get around to it if some big customers ask them in the right way.
By the way, what bad things would happen if Android and Apple *didn't* wait? In other projects I've dealt with, if an upstream program P adds a
On Android we cannot not wait. As I mentioned there are several API surfaces and being consistently wrong about _new_ changes is better than having different answers within the platform. Also, it takes time to prepare and rollout these updates, so not waiting will only cause disruption, extra work, and bugs from confused users.
new timezone abbreviation "XYT", downstream distros like Ubuntu can immediately ship the new P without waiting for i18n updates. P will
It is not a problem if a distro ships it, it is a problem if this new stuff leaks to the external world. It would be naive to expect a service's user base to have up-to-date time zone data even a month after a TZDB release.
still work fine in all locales, except that the new "XYT" abbreviation (should it be needed) will appear as-is until translators catch up.
It is not translations that we are waiting for, but changes like [1]. Recent time zone changes were short notice ones and the ICU team (thanks Yoshito and others!) did these changes very quickly.
[1] https://github.com/unicode-org/icu/pull/2261/files
Could Android and Apple do something similar with timezone-related strings?
On 2022-12-08 05:24, Almaz Mingaleev wrote:
new timezone abbreviation "XYT", downstream distros like Ubuntu can immediately ship the new P without waiting for i18n updates. P will
It is not a problem if a distro ships it, it is a problem if this new stuff leaks to the external world.
Sorry, I'm not following. When Ubuntu ships a new tzdata package then surely that has "leaked" to the outside world?
It would be naive to expect a service's user base to have up-to-date time zone data even a month after a TZDB release.
In the past I've gotten my Ubuntu systems updated within 24 hours of a tzdata release, simply by applying patches as usual from Ubuntu. But as Benjamin indicates, this is not happening with 2022g. I just now checked for updates and I'm still stuck on 2022f. So from my point of view, Ubuntu is slow in this case - instead of taking less than a day, it's taking more than a week.
being consistently wrong about _new_ changes is better than having different answers within the platform.
As an Ubuntu user, I'd prefer tzdata to be up-to-date even though ICU is out-of-date, over having both tzdata and ICU out-of-date. Of course Ubuntu differs from Android in that most apps use tzdata not ICU. Still, I'm a bit curious what end-user-visible problems would occur on Android and/or Ubuntu if tzdata leads ICU slightly. I know you've seen problems, but were they end-user problems or just test-case problems? On Ubuntu various other copies of tzdata (e.g., Python's) can be slightly out of date too, but this doesn't seem to be much of an issue.
It is not translations that we are waiting for, but changes like [1] <https://github.com/unicode-org/icu/pull/2261/files>. Recent time zone changes were short notice ones and the ICU team (thanks Yoshito and others!) did these changes very quickly.
Thanks, I didn't know that. Here's a timeline I see for the latest Mexico change:
* 2022-11-28 17:00 UTC - news article published announcing the change (which is not official yet, I think) <http://puentelibre.mx/noticia/ciudad_juarez_cambio_horario_noviembre_2022/>
* 2022-11-29 03:55:31 UTC - tz mailing list notified <https://mm.icann.org/pipermail/tz/2022-November/032365.html>
* 2022-11-29 17:42:29 UTC (14 hours after notification) - tzdb 2022g announced <https://mm.icann.org/pipermail/tz-announce/2022-November/000076.html>
* 2022-11-29 18:23:41 UTC (less than an hour after tzdb 2022g announcement) - tzdata 2022g-r0 released for Alpine Linux <https://pkgs.alpinelinux.org/package/edge/main/x86/tzdata>
* 2022-11-30 07:06 UTC (7 hours after tzdb 2022g announcement) - tzdata 2022g-1 released for Arch Linux <https://archlinux.org/packages/core/x86_64/tzdata/>
* 2022-12-01 03:08:08 UTC (33 hours after tzdb 2022g announcement) - abovementioned ICU patch committed
* 2022-12-01 12:38:06 UTC (9 hours after ICU patch committed) - Ubuntu patch committed <https://launchpad.net/ubuntu/+source/tzdata/2022g-0ubuntu0.22.10.1>
* 2022-12-05 (4 days after ICU patch committed) - Red Hat Enterprise Linux fix available to users <https://access.redhat.com/errata/RHBA-2022:8785>
* 2022-12-07 (6 days after ICU patch committed) - Android patch committed <https://android.googlesource.com/platform/system/timezone/+/ea3e0ece71974c1d...>
* now (a week after ICU patch committed) - my Ubuntu workstation is still not updated.
We should be able to do better than this; that is, be more like Alpine or Arch Linux, or at least more like RHEL (though I see that Fedora still hasn't released 2022g...). Though ICU is part of the problem (as is tzdb itself :-), most of the delay seems to be occurring even after ICU patches are applied.
On Thu, Dec 8, 2022 at 12:50 PM Paul Eggert via tz <tz@iana.org> wrote:
On 2022-12-08 05:24, Almaz Mingaleev wrote:
new timezone abbreviation "XYT", downstream distros like Ubuntu can immediately ship the new P without waiting for i18n updates. P will
It is not a problem if a distro ships it, it is a problem if this new stuff leaks to the external world.
Sorry, I'm not following. When Ubuntu ships a new tzdata package then surely that has "leaked" to the outside world?
It would be naive to expect from a user base of a service to have up-to-date time zone data even after a month after TZDB release.
In the past I've gotten my Ubuntu systems updated within 24 hours of a tzdata release, simply by applying patches as usual from Ubuntu. But as Benjamin indicates, this is not happening with 2022g. I just now checked for updates and I'm still stuck on 2022f. So from my point of view, Ubuntu is slow in this case - instead of taking less than a day, it's taking more than a week.
being consistently wrong about _new_ changes is better than having different answers within the platform.
As a Ubuntu user, I'd prefer tzdata to be up-to-date even though ICU is out-of-date, over having both tzdata and ICU out-of-date. Of course Ubuntu differs from Android in that most apps use tzdata not ICU. Still, I'm a bit curious what end-user-visible problems would occur on Android and/or Ubuntu if tzdata leads ICU slightly. I know you've seen problems, but were they end-user problems or just test-case problems? On Ubuntu various other copies of tzdata (e.g., Python's) can be slightly out of date too, but this doesn't seem to be much of an issue.
(it's been years since i was responsible for this, but unlike the folks who've owned it since, i'm in the same time zone as you, so i'll share my anecdata anyway...) i think i've seen two different variants: "app A doesn't match app B" and "the app doesn't match the clock at the top of the screen provided by the system". "two different parts of app A don't match each other" ought to be possible, but i don't think i've seen that one in practice, presumably because developers tend to be consistent about what languages/APIs they're using. thanks to work from some of the aforementioned folks over in Europe/London, there should be fewer mismatches today, as more stuff (including at least one platform API) has been moved over to icu.
For general users of computing devices (laptops, mobile, webapps, etc) getting everything to line up with sudden changes is a huge issue. Pushing changes to all the programs on all the computers in all the server farms, all the mobile devices, all the laptops, etc is a massive amount of work that takes time and effort. So countries that only give a few days or even weeks notice can expect a significant period of disruptions: inconsistencies between devices.
Mark
On Thu, 2022-12-08 at 12:50 -0800, Paul Eggert wrote:
In the past I've gotten my Ubuntu systems updated within 24 hours of a tzdata release, simply by applying patches as usual from Ubuntu. But as Benjamin indicates, this is not happening with 2022g. I just now checked for updates and I'm still stuck on 2022f. So from my point of view, Ubuntu is slow in this case - instead of taking less than a day, it's taking more than a week.
Ubuntu has a stable release update (SRU) process [1]. The process includes letting the update age in -proposed for at least seven days. That hasn't changed since the beginning in 2006. So I wonder how you got the updates within 24 hours. Feel free to start a discussion with the SRU team to reduce the ageing time for tzdata updates that are urgent. The technical minimum would be a few hours (for the package build and to run all autopkgtests).
Side note: Ubuntu uses phased updates, which means that updates reach the users step by step. The tzdata updates are copied to -security [2] and therefore all users should get them without phasing. The update for 2022g is tracked in bug #1998321 [3].
[1] https://wiki.ubuntu.com/StableReleaseUpdates#Procedure
[2] https://wiki.ubuntu.com/StableReleaseUpdates#tzdata
[3] https://bugs.launchpad.net/ubuntu/+source/tzdata/+bug/1998321
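To make the seven-day ageing rule concrete, here is a minimal Python sketch that computes the earliest moment an update could leave -proposed. It assumes the clock starts at the upload timestamp given in the timeline of this thread; the real SRU process may start counting later and has further requirements (verification, autopkgtest results), so the function name and the starting point are illustrative assumptions only.

```python
from datetime import datetime, timedelta, timezone

# Minimum ageing period in -proposed, as stated in the message above.
MIN_AGEING = timedelta(days=7)

def earliest_release(uploaded_at: datetime, ageing: timedelta = MIN_AGEING) -> datetime:
    """Return the earliest moment an update could leave -proposed.

    Assumption: ageing starts at the upload timestamp. This is a
    simplification of the real SRU procedure.
    """
    return uploaded_at + ageing

# Timestamp taken from the timeline in this thread (2022g-0ubuntu0.22.10.1).
upload = datetime(2022, 12, 1, 12, 38, 6, tzinfo=timezone.utc)
print(earliest_release(upload).isoformat())
```

Under that assumption the 22.10 update could not have reached users before 2022-12-08 12:38:06 UTC, which is consistent with a workstation still showing 2022f a week after the ICU patch.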
being consistently wrong about _new_ changes is better than having different answers within the platform.
As a Ubuntu user, I'd prefer tzdata to be up-to-date even though ICU is out-of-date, over having both tzdata and ICU out-of-date. Of course Ubuntu differs from Android in that most apps use tzdata not ICU. Still, I'm a bit curious what end-user-visible problems would occur on Android and/or Ubuntu if tzdata leads ICU slightly. I know you've seen problems, but were they end-user problems or just test-case problems? On Ubuntu various other copies of tzdata (e.g., Python's) can be slightly out of date too, but this doesn't seem to be much of an issue.
There should be no copies of tzdata in Ubuntu. All software packages should only rely on tzdata. Please let me know if you find any. The only problematic package that I know of is python3-tz, which includes a hard-coded list of time zone names [4]. Python uses the data from the tzdata package.
[4] https://bugs.launchpad.net/ubuntu/+source/python-tz/+bug/207604
It is not translations that we are waiting for, but changes like [1 <https://github.com/unicode-org/icu/pull/2261/files>]. Recent time zone changes were short notice ones and ICU team (thanks Yoshito and others!) did these changes very quickly.
Thanks, I didn't know that.
Here's a timeline I see for the latest Mexico change:
Let me enhance that timeline.
* 2022-11-28 17:00 UTC - news article published announcing the change (which is not official yet, I think) <http://puentelibre.mx/noticia/ciudad_juarez_cambio_horario_noviembre_2022/>
* 2022-11-29 03:55:31 UTC - tz mailing list notified <https://mm.icann.org/pipermail/tz/2022-November/032365.html>
* 2022-11-29 17:42:29 UTC (14 hours after notification) - tzdb 2022g announced <https://mm.icann.org/pipermail/tz-announce/2022-November/000076.html>
* 2022-11-29 18:23:41 UTC (less than an hour after tzdb 2022g announcement) tzdata 2022g-r0 released for Alpine Linux <https://pkgs.alpinelinux.org/package/edge/main/x86/tzdata>
* 2022-11-30 07:06 UTC (7 hours after tzdb 2022g announcement) - tzdata 2022g-1 released for Arch Linux <https://archlinux.org/packages/core/x86_64/tzdata/>
* 2022-11-30 10:21 UTC - Ubuntu tzdata update ticket created <https://bugs.launchpad.net/ubuntu/+source/tzdata/+bug/1998321>
* 2022-11-30 13:42 UTC - tzdata 2022g-0ubuntu1 uploaded to Ubuntu 23.04 (lunar)
* 2022-11-30 14:46 UTC - tzdata 2022g-0ubuntu0.22.10.0 uploaded to Ubuntu 22.10 (kinetic-proposed)
* 2022-11-30 16:35 UTC - tzdata 2022g-0ubuntu0.22.04.0 uploaded to Ubuntu 22.04 (jammy-proposed)
* 2022-11-30 17:58 UTC - tzdata 2022g-0ubuntu0.20.04.0 uploaded to Ubuntu 20.04 (focal-proposed)
* 2022-11-30 19:08 UTC - tzdata 2022g-0ubuntu0.18.04 uploaded to Ubuntu 18.04 (bionic-proposed)
* 2022-12-01 03:08:08 UTC (33 hours after tzdb 2022g announcement) - abovementioned ICU patch committed
* 2022-12-01 12:30 UTC - tzdata 2022g-0ubuntu2 with ICU update uploaded to Ubuntu 23.04 (lunar)
* 2022-12-01 12:38:06 UTC (9 hours after ICU patch committed) - Ubuntu patch committed <https://launchpad.net/ubuntu/+source/tzdata/2022g-0ubuntu0.22.10.1>
* 2022-12-01 12:50 UTC - tzdata 2022g-0ubuntu0.22.04.1 with ICU update uploaded to Ubuntu 22.04 (jammy-proposed)
* 2022-12-01 12:54 UTC - tzdata 2022g-0ubuntu0.20.04.1 with ICU update uploaded to Ubuntu 20.04 (focal-proposed)
* 2022-12-05 (4 days after ICU patch committed) - Red Hat Enterprise Linux fix available to users <https://access.redhat.com/errata/RHBA-2022:8785>
* 2022-12-07 (6 days after ICU patch committed) - Android patch committed <https://android.googlesource.com/platform/system/timezone/+/ea3e0ece71974c1d...>
* now (a week after ICU patch committed) - my Ubuntu workstation is still not updated.
We should be able to do better than this; that is, be more like Alpine or Arch Linux, or at least more like RHEL (though I see that Fedora still hasn't released 2022g...). Though ICU is part of the problem (as is tzdb itself :-), most of the delay seems to be occurring even after ICU patches are applied.
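The intervals in the timeline above can be recomputed mechanically. The following Python sketch is only an illustration: the helper names are made up here, the timestamps are copied from the list as written, and the results may not match the rounded figures quoted in the list exactly.

```python
from datetime import datetime, timezone

def utc(text: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM[:SS]' as a UTC timestamp."""
    fmt = "%Y-%m-%d %H:%M:%S" if text.count(":") == 2 else "%Y-%m-%d %H:%M"
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)

def hours_between(start: str, end: str) -> float:
    """Elapsed hours from start to end (negative if end is earlier)."""
    return (utc(end) - utc(start)).total_seconds() / 3600

# Timestamps copied from the timeline above.
events = {
    "tz list notified": "2022-11-29 03:55:31",
    "tzdb 2022g announced": "2022-11-29 17:42:29",
    "Alpine tzdata 2022g-r0": "2022-11-29 18:23:41",
    "Arch tzdata 2022g-1": "2022-11-30 07:06",
    "Ubuntu lunar upload (without ICU)": "2022-11-30 13:42",
    "ICU patch committed": "2022-12-01 03:08:08",
    "Ubuntu lunar upload (with ICU)": "2022-12-01 12:30",
}

announced = events["tzdb 2022g announced"]
for name, when in events.items():
    # Signed offset in hours relative to the tzdb 2022g announcement.
    print(f"{hours_between(announced, when):+7.1f} h  {name}")
```

Each printed value is the signed offset in hours from the tzdb 2022g announcement, so events before the announcement show up as negative numbers.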
-- Benjamin Drung Debian & Ubuntu Developer
On 2022-12-09 15:08, Benjamin Drung wrote:
On Thu, 2022-12-08 at 12:50 -0800, Paul Eggert wrote:
In the past I've gotten my Ubuntu systems updated within 24 hours of a tzdata release, simply by applying patches as usual from Ubuntu. But as Benjamin indicates, this is not happening with 2022g. I just now checked for updates and I'm still stuck on 2022f. So from my point of view, Ubuntu is slow in this case - instead of taking less than a day, it's taking more than a week.
Ubuntu has as stable release update (SRU) process [1]. The process includes letting the update age in -proposed for at least seven days. That hasn't changed since the beginning in 2006. So I wonder how you get the updates within 24 hours.
I remember writing to this list about the speed of Ubuntu updates at least once. After some searching today, I found that after the 2015f release was announced August 11, 2015 at 04:37:39 UTC, I wrote on August 13 at 15:01:16 UTC that my Ubuntu workstation was already updated. See:
https://mm.icann.org/pipermail/tz/2015-August/022594.html
I didn't do anything special; I simply ran updates. So it sounds like the SRU process has had some sort of exception in the past for tzdata, and it suggests that Ubuntu could continue to have this exception.
That said, running updates is complicated in Ubuntu as there are several ways to do it. On the Ubuntu 22.10 workstation I'm typing this on, if I click on Activities and search for "update" I get three different Ubuntu-supplied GUI icons, labeled "Software Updater", "Software & Updates", and "Ubuntu Software"; each can update tzdata. I can also do command-line updates with shell commands like "apt" or terminal-window commands like "aptitude". These alternate ways of doing updates typically do not agree, so although I did just update as usual in 2015, perhaps I used a different update procedure than the one you're thinking about. (Back in 2015 I was probably using "aptitude".)
There should be no copies of tzdata in Ubuntu. All software packages should only rely on tzdata. Please let me know if you find any. The only problematic package that I know is python3-tz which includes a hard coded list of time zone names [4].
I suspect there are several packages other than python3-pytzdata that have such lists. Although I by no means have a lot of packages installed on my workstation, I just now ran the command
    grep -rl Africa/Casablanca /usr /snap
and found several files that mentioned this string (which is almost surely derived from tzdata). Here are three:
    /usr/share/liblangtag/common/bcp47/timezone.xml
    /usr/lib/thunderbird/libxul.so
    /usr/lib/x86_64-linux-gnu/libQt5Core.so.5.15.6
from the liblangtag-common, thunderbird, and libqt5core5a packages.
Also, due to Ubuntu's use of snaps there are multiple copies of tzdata on my workstation, not all of which agree. For example, the shell command:
    find /usr /snap '(' -name right -o -name posix ')' -prune -o -name Ojinaga -type f -print
outputs six file names, representing two distinct time zone histories for America/Ojinaga, as shown by the following command. Although I hope apps never use the obsolete Ojinaga files, I don't know how to check this - nor do I know why Ubuntu keeps around the obsolete files given the desire that there be just one copy of tzdata in Ubuntu.
    $ diff -u <(zdump -i /usr/share/zoneinfo/America/Ojinaga) <(zdump -i /snap/core20/1695/usr/share/zoneinfo/America/Ojinaga) | head -n 17
    --- /dev/fd/63 2022-12-10 12:28:58.449876843 -0800
    +++ /dev/fd/62 2022-12-10 12:28:58.449876843 -0800
    @@ -1,5 +1,5 @@
    -TZ="/usr/share/zoneinfo/America/Ojinaga"
    +TZ="/snap/core20/1695/usr/share/zoneinfo/America/Ojinaga"
     
     - - -065740 LMT
     1922-01-01 00 -07 MST
     1927-06-11 00 -06 CST
    @@ -60,4 +60,958 @@
     2021-03-14 03 -06 MDT 1
     2021-11-07 01 -07 MST
     2022-03-13 03 -06 MDT 1
    -2022-10-30 02 -06 CST
    +2022-11-06 01 -07 MST
    +2023-03-12 03 -06 MDT 1
    +2023-11-05 01 -07 MST
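A rough way to see how many distinct copies of a zone file exist is to group them by content hash. The Python sketch below mirrors the find command above (same pruning of the "right" and "posix" subtrees, regular files only); the function name is made up for illustration. Note the caveat: byte-identical files certainly describe the same history, but two files with different bytes might still encode the same history, so a difference in hashes is only a hint and zdump remains the way to confirm it.

```python
import hashlib
import os
from collections import defaultdict

def group_copies(roots, filename, skip_dirs=("right", "posix")):
    """Map SHA-256 digest -> sorted paths of regular files named `filename`."""
    groups = defaultdict(list)
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune the same subtrees that the find command skips.
            dirnames[:] = [d for d in dirnames if d not in skip_dirs]
            if filename not in filenames:
                continue
            path = os.path.join(dirpath, filename)
            # Like 'find -type f', ignore symbolic links.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
            except OSError:
                continue  # unreadable file, skip it
            groups[digest].append(path)
    return {digest: sorted(paths) for digest, paths in groups.items()}
```

Calling group_copies(["/usr", "/snap"], "Ojinaga") on a system like the one described would return one entry per distinct file content, each listing the paths that share it.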
As I understand it, if TZDB releases version X, Android and Apple currently wait for ICU/CLDR to release a new version that takes X's changes into account.
The only changes that CLDR worries about are ones that would have an effect on the choice of 'labels' in formatting dates and times. You can see some of those in <https://unicode-org.github.io/cldr-staging/charts/42/verify/zones/fr.html>.
For example, if a version X of the TZDB has no new zones, and changes to the rules for a zone don't require a change to a metazone (e.g. which zones it encompasses, or a new metazone), then nothing needs to happen.
Even when a change is needed, in many cases the normal fallback among the different types of 'labels' means that people will see something reasonable until an implementation upgrades to the latest CLDR.
Moreover, CLDR has become over time more insulated from unnecessary instabilities in the TZDB, such as changes in spellings for identifiers, changes in the zone1970.tab file (the backward-compatibility aids from the TZDB are extremely helpful for that!), etc.
However, just as with the TZDB, there are cases where countries change their rules on crazy-short notice, and where CLDR needs to respond quickly. We try to do dot releases for those cases, but often translations for zones or metazones will lag since we have over 100 languages to manage. Offline, people can contact me at mark@unicode.org if they have suggestions for improvements.
Mark
[Yoshito is the expert, so he can correct this if some details are off.]
On Wed, Dec 7, 2022 at 10:08 AM Paul Eggert via tz <tz@iana.org> wrote:
On 2022-12-07 03:37, Almaz Mingaleev via tz wrote:
Does that mean that libc APIs will behave differently from whatever is using ICU files?
No, as I understand it the idea is to allow automatic rebuilding of the timezone-related part of the CLDR/ICU data, whenever a new TZDB version is released, without having to wait for updates from CLDR/ICU.
As I understand it, if TZDB releases version X, Android and Apple currently wait for ICU/CLDR to release a new version that takes X's changes into account. The idea is that we could improve on this by developing a procedure that, given a new TZDB release X and an older CLDR/ICU release W, would produce a slightly-modified copy W' of the CLDR/ICU data that takes X's changes into account, without having to wait for ICU/CLDR to produce a new version. Of course when ICU/CLDR get around to producing a new version X themselves, you could discard W'.
The main obstacles I see to this are:
* Copyright issues. Unicode, Inc. does not allow people to redistribute modified versions of their data files, even if the modifications are clearly needed.
* Institutional inertia. ICU/CLDR is big and is unaccustomed to moving fast, and timezone info is not high priority for ICU/CLDR.
Although we haven't so far seen any sign of interest here from the Unicode, Inc. side, perhaps they'll get around to it if some big customers ask them in the right way.
By the way, what bad things would happen if Android and Apple *didn't* wait? In other projects I've dealt with, if an upstream program P adds a new timezone abbreviation "XYT", downstream distros like Ubuntu can immediately ship the new P without waiting for i18n updates. P will still work fine in all locales, except that the new "XYT" abbreviation (should it be needed) will appear as-is until translators catch up. Could Android and Apple do something similar with timezone-related strings?
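Mark's remark that nothing needs to happen when a TZDB release adds no zones suggests a simple first check that any such procedure could start with. The Python sketch below compares the Zone names in two versions of tz source text. It is an illustration under stated assumptions: the function names are made up, the sample lines are simplified rather than real tzdata content, and the check covers only new zones, not rule changes that would require a metazone change.

```python
def zone_names(tz_source: str) -> set[str]:
    """Collect names from 'Zone' lines of tz source text.

    Continuation lines and 'Link' lines are ignored. Only the spellings
    'Zone' and 'Z' are recognized here (an assumption of this sketch).
    """
    names = set()
    for line in tz_source.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments
        if len(fields) >= 2 and fields[0] in ("Zone", "Z"):
            names.add(fields[1])
    return names

def new_zones(old_source: str, new_source: str) -> set[str]:
    """Zone names present in new_source but not in old_source."""
    return zone_names(new_source) - zone_names(old_source)

# Simplified, illustrative excerpts; not real tzdata content.
OLD = """\
# sample data
Zone America/Ojinaga -6:57:40 - LMT 1922
            -7:00 - MST
"""
NEW = OLD + "Zone America/Ciudad_Juarez -7:05:56 - LMT 1922\n"

print(sorted(new_zones(OLD, NEW)))
```

An empty result would mean the release adds no zones, which is the easy half of the condition Mark describes; a non-empty result flags identifiers that the ICU/CLDR side would need to know about.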
On Thu, Dec 8, 2022 at 12:00 PM Mark Davis ☕️ via tz <tz@iana.org> wrote:
As I understand it, if TZDB releases version X, Android and Apple currently wait for ICU/CLDR to release a new version that takes X's changes into account.
The only changes that CLDR worries about are ones that would have an effect on the choice of 'labels' in formatting dates and times. You can see some of those in https://unicode-org.github.io/cldr-staging/charts/42/verify/zones/fr.html .
i think we might be talking at cross-purposes here? i think you're strictly correct for _cldr_ but i think the questions are really about "the whole of icu", so the fact (i think!) that icu _does_ need to change icu4c/source/data/misc/zoneinfo64.txt in response to tzdata changes is sufficient to be a problem as defined in this thread, even if no new translations are needed for cldr specifically.
On Wed, 2022-12-07 at 10:08 -0800, Paul Eggert wrote:
On 2022-12-07 03:37, Almaz Mingaleev via tz wrote:
Does that mean that libc APIs will behave differently from whatever is using ICU files?
No, as I understand it the idea is to allow automatic rebuilding of the timezone-related part of the CLDR/ICU data, whenever a new TZDB version is released, without having to wait for updates from CLDR/ICU.
Exactly.
As I understand it, if TZDB releases version X, Android and Apple currently wait for ICU/CLDR to release a new version that takes X's changes into account. The idea is that we could improve on this by developing a procedure that, given a new TZDB release X and an older CLDR/ICU release W, would produce a slightly-modified copy W' of the CLDR/ICU data that takes X's changes into account, without having to wait for ICU/CLDR to produce a new version. Of course when ICU/CLDR get around to producing a new version X themselves, you could discard W'.
The main obstacles I see to this are:
* Copyright issues. Unicode, Inc. does not allow people to redistribute modified versions of their data files, even if the modifications are clearly needed.
https://github.com/unicode-org/cldr/blob/main/common/supplemental/metaZones.... says "Copyright © 1991-2013 Unicode, Inc." and "For terms of use, see http://www.unicode.org/copyright.html". This page points to https://www.unicode.org/license.txt, whose license looks very similar to BSD/MIT.
-- Benjamin Drung Debian & Ubuntu Developer
On 2022-12-09 15:13, Benjamin Drung wrote:
https://github.com/unicode-org/cldr/blob/main/common/supplemental/metaZones.... says "Copyright © 1991-2013 Unicode, Inc." and "For terms of use, see http://www.unicode.org/copyright.html". This page points to https://www.unicode.org/license.txt
Thanks for pointing this out - I wasn't aware of the intent there. Still, I see some legal issues. copyright.html defines Unicode data files to be the data files published under the following URLs:
https://www.unicode.org/Public/ (except PDF online code charts)
https://www.unicode.org/reports/
https://www.unicode.org/ivd/data/
license.txt says we have permission to redistribute modified versions of these data files. However, I don't see anything that grants permission to redistribute modified versions of <https://github.com/unicode-org/cldr/blob/main/common/supplemental/metaZones....> (the file you mentioned) as I could not find it under any of the abovementioned three URLs.
I expect the intent was that we have permission to redistribute working data like that metaZones.xml URL even though it isn't part of an official release, but if so the intent should be stated formally. (I've learned to be cautious when it comes to copyright....)
On Sat, 2022-12-10 at 13:28 -0800, Paul Eggert wrote:
On 2022-12-09 15:13, Benjamin Drung wrote:
https://github.com/unicode-org/cldr/blob/main/common/supplemental/metaZones.... says "Copyright © 1991-2013 Unicode, Inc." and "For terms of use, see http://www.unicode.org/copyright.html". This page points to https://www.unicode.org/license.txt
Thanks for pointing this out - I wasn't aware of the intent there. Still, I see some legal issues. copyright.html defines Unicode data files to be the data files published under the following URLs:
https://www.unicode.org/Public/ (except PDF online code charts) https://www.unicode.org/reports/ https://www.unicode.org/ivd/data/
license.txt says we have permission to redistribute modified versions of these data files. However, I don't see anything that grants permission to redistribute modified versions of <https://github.com/unicode-org/cldr/blob/main/common/supplemental/metaZones....> (the file you mentioned) as I could not find it under any of the abovementioned three URLs.
I expect the intent was that we have permission to redistribute working data like that metaZones.xml URL even though it isn't part of an official release, but if so the intent should be stated formally. (I've learned to be cautious when it comes to copyright....)
https://github.com/unicode-org/cldr comes with unicode-license.txt. The README.md says: "Usage of CLDR data and software is governed by the Unicode Terms of Use a copy of which is included as unicode-license.txt." and "SPDX-License-Identifier: Unicode-DFS-2016". Some of the files in the repository have this SPDX license header. We should ask them to add this license header to all relevant files including metaZones.xml.
-- Benjamin Drung Debian & Ubuntu Developer
participants (9)
- Almaz Mingaleev
- Benjamin Drung
- Doug Ewell
- enh
- Guy Harris
- Mark Davis ☕️
- Paul Eggert
- Robert Elz
- Steffen Nurpmeso