Re: [tz] Single source of truth for timezone data

Dec. 2, 2022

On 12/1/22 05:45, Benjamin Drung via tz wrote the following, which I'm
quoting nearly in its entirety because it doesn't seem to have hit the
tz archive (presumably because Benjamin is not on the tz mailing list):
...
I am writing this mail with my distribution maintainer hat on since I am
responsible for keeping the tzdata package up-to-date in Ubuntu. I would
like to propose having a single source of truth for timezone data, which
should be the tzdata source package. Updating this package in a
distribution like Debian/Ubuntu should be enough to update all consumers
in the distribution.

Sadly this is currently not the case. Ubuntu since 20.04 (focal) adds
the four icu source files metaZones.txt, timezoneTypes.txt,
windowsZones.txt, and zoneinfo64.txt from
https://github.com/unicode-org/icu-data to the tzdata source package.
Then it uses genrb and icupkg to generate the .res files. Since the icu
project lags behind the tzdata release by hours up to days, Ubuntu has
to update tzdata twice (first the tzdata release, then the icu update).

metaZones.txt, timezoneTypes.txt, and windowsZones.txt are generated
using tools/cldr/cldr-to-icu/build-icu-data.xml from
https://github.com/unicode-org/icu. zoneinfo64.txt is generated by
tz2icu. build-icu-data.xml uses https://github.com/unicode-org/cldr as
input to convert the cldr data to icu. If I saw that correctly, only
common/supplemental/metaZones.xml needs to be updated there on an
update.

Can tzdata include the necessary data and tools to generate
metaZones.xml from its source? Then it would be possible to generate the
icu data from the tzdata source without the need to wait for icu to
catch up. Then Debian/Ubuntu can ship a tzdata-icu package for it [1].

[1] https://bugs.debian.org/954112

Currently when creating data, tzcode uses only software that can be 
built and run on a barebones POSIX system, and this tzcode software is 
all public domain. If the tools you're thinking of are all public-domain 
POSIX apps, then what you're asking might be doable. If not, that might 
introduce a significant licensing and/or portability issue, which we'd 
have to think through carefully before proceeding.

I took a brief look at metaZones.xml[1] and saw that it contains much 
info not derived from TZDB. Its metazone-based data entries are useful 
for internationalization, which historically has been out of scope for 
TZDB and has "belonged" to CLDR. Presumably if tzdata contained this 
i18n info, TZDB would need to coordinate with the CLDR project before 
any TZDB release that changed any i18n-relevant data. Unfortunately this 
coordination would introduce a coupling that would slow down 
distribution of timezone data. At times the CLDR data have skipped 
coverage of TZDB releases, due to being so far behind. In contrast, 
we've had to do TZDB releases with less than 24 hours' notice and that's 
stretching our capabilities as it is; it's hard to imagine that we could 
react that quickly if i18n-related delays were also factored in.

Also, I don't see how TZDB itself could generate the files in question 
without running afoul of Unicode, Inc.'s copyright. The metaZones.xml 
file's terms of use don't allow TZDB to generate its own metaZones.xml 
copy, which might differ in unimportant ways (e.g., spacing) or even 
in ways that are semantically significant with whatever Unicode, Inc. 
eventually decides to do. At best TZDB could merely redistribute the 
metaZones.xml file that Unicode, Inc. eventually publishes, but this 
would mean that the metaZones.xml file in a TZDB release would likely be 
out of date.

You might be better off creating a tzdata-icu package along the lines 
suggested by Aurelien Jarno[2]. Although adding a package is a hassle 
(as one can also see with the tzdata-java package), the hassle seems 
inevitable here, due to the delay between getting new timestamp data and 
getting new i18n for that data.

[1]: 
https://github.com/unicode-org/cldr/blob/main/common/supplemental/metaZones....
[2]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=954112

Paul Eggert