Add option to act like global-tz
I wangled a bit of free time and wrote the attached proposed patch, which adds an option that causes 'make install' to generate time zone data (TZif files) that are byte-for-byte identical to what Stephen Colebourne's global-tz currently generates. This is done via a few lines of awk script. The idea is to give an option to distros that prefer a separate Zone per ISO 3166 country, just as we already have build-time options for rearguard/vanguard form, TZif truncation, leap seconds, etc. I haven't installed this in the development database. Comments welcome as usual.
Hi,

This sounds great! Thanks for taking the time to investigate. Can you clarify what is generated?

As a reminder, the tooling chains that global-tz supports need the *source* files (africa, europe, northamerica...) to contain the data for Zone per ISO-3166. This is because:
- they contain their own parsers of the source data [1] [2] [3]
- the parsers typically operate and expose data in rearguard mode (no negative DST) [4]
- they consume and expose the last Rule line for each zone [5]

If the source data is available as a result of the make command then there should be no long term need for global-tz. If the source data is not available as a result of the make command, then sadly global-tz would still be needed unless the patch can be enhanced.

thanks
Stephen

[1] https://github.com/ThreeTen/threetenbp/blob/master/src/main/java/org/threete...
[2] https://cldr.unicode.org/development/updating-codes/update-time-zone-data-fo...
[3] https://github.com/openjdk/jdk/blob/master/make/jdk/src/classes/build/tools/...
[4] https://github.com/JodaOrg/global-tz/releases/tag/2022agtz
[5] https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/time/zone/...

On Wed, 27 Jul 2022 at 02:19, Paul Eggert via tz <tz@iana.org> wrote:
I wangled a bit of free time and wrote the attached proposed patch, which adds an option that causes 'make install' to generate time zone data (TZif files) that are byte-for-byte identical to what Stephen Colebourne's global-tz currently generates. This is done via a few lines of awk script. The idea is to give an option to distros that prefer a separate Zone per ISO 3166 country, just as we already have build-time options for rearguard/vanguard form, TZif truncation, leap seconds, etc.
I haven't installed this in the development database. Comments welcome as usual.
On 7/27/22 01:47, Stephen Colebourne via tz wrote:
Can you clarify what is generated?
With the patch, this shell command:

    make PACKRATDATA=backzone PACKRATLIST=zone.tab install

installs into /usr/share/zoneinfo files that are identical to those installed by global-tz's "make install", except that /usr/share/zoneinfo/tzdata.zi's lines differ in ways that should not matter. To be precise, tzdata.zi's lines are sorted differently (which is irrelevant) and have one unused ruleset (also irrelevant, and easily fixed with the attached further patch which I have not installed either), and tzdata.zi's version comments differ as follows:

    -# version 2022agtz-122-gcd2e4b5
    +# version 2022a-60-ge743ce9
    +# ddeps backzone zone.tab

None of this should affect how tzdata.zi is interpreted.
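A minimal sketch of how one might check the "identical except tzdata.zi" claim, assuming two already-installed zoneinfo trees and a diff that supports -r, -q and -x (GNU diffutils does). The function name and the directory names in the usage line are hypothetical. It skips tzdata.zi in both trees, since that is the one file expected to differ.

```shell
#!/bin/sh
# Hypothetical helper: compare two installed zoneinfo trees, ignoring
# tzdata.zi (its sort order and version comments are expected to differ).
# Assumes a diff supporting -r/-q/-x, e.g. GNU diffutils.
# Usage: same_tzif_trees DIR_A DIR_B
# Exit status: 0 if the trees match, 1 if they differ.
same_tzif_trees() {
  # -r: recurse into subdirectories (Europe/, America/, ...)
  # -q: only report which files differ, not how
  # -x tzdata.zi: exclude that file name in both trees
  diff -r -q -x tzdata.zi "$1" "$2"
}
```

For example, `same_tzif_trees /tmp/packrat/usr/share/zoneinfo /tmp/global-tz/usr/share/zoneinfo && echo identical`, where both paths stand for wherever the two builds were installed.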
the tooling chains that global-tz supports need the *source* files (africa, europe, northamerica...) to contain the data
You can get that by using the tailored_tarballs target added recently. For example, this shell command:

    make DATAFORM=rearguard \
      PACKRATDATA=backzone PACKRATLIST=zone.tab tailored_tarballs

generates a tarball that should be suitable for the toolchains you mention. I just now did that using the two proposed patches, and put the results here temporarily:
https://www.cs.ucla.edu/~eggert/tz/tzdata2022a-61-g3af508c-tailored.tar.gz
No further comment and I have installed those two patches into the development version on GitHub.
On Wed, 27 Jul 2022 at 19:10, Paul Eggert via tz <tz@iana.org> wrote:
On 7/27/22 01:47, Stephen Colebourne via tz wrote:
Can you clarify what is generated?
[...] the tooling chains that global-tz supports need the *source* files (africa, europe, northamerica...) to contain the data
You can get that by using the tailored_tarballs target added recently. For example, this shell command:
    make DATAFORM=rearguard \
      PACKRATDATA=backzone PACKRATLIST=zone.tab tailored_tarballs
generates a tarball that should be suitable for the toolchains you mention.
Thanks. The output being generated here is correct for some downstream toolchains AFAICT.

Observations:
1) Some of the toolchains (ThreeTen-BP at least) depend on the `leapseconds` file, so can that be added, please?
2) The generated tarball places all data in the etcetera file, rather than in africa/asia etc. This may cause problems for downstream users that wish to only package some files. I suspect that group of people is small; nevertheless, it is worth noting that the generated tarball is not equivalent.
3) The generated tarball omits other files like zone.tab, iso3166.tab, backzone, backward and so on. In effect the contents are significantly less functional than those of global-tz, i.e., global-tz goes to significant lengths to simulate exactly what iana-tz would have looked like if the pre-1970 merges had never happened. The output of global-tz is therefore fully compatible with all toolchains, no matter what they do with the data.

Actions/Questions:
As a minimum we need to add the `leapseconds` file to the generated tarball.
Is it practical to keep the various files separate? Or to include more files? Right now it doesn't really feel like a compatible tzdata tarball, as it has quite different content.
My concern here is non-Java downstream projects that wish to avoid the merges - do they have what they need? (i.e., I would prefer not to maintain global-tz, and it will need a good few hours' work to undo the latest European merges, but I would need to be sure that all potential use cases are covered.)

thanks
Stephen
On 8/3/22 17:01, Stephen Colebourne via tz wrote:
1) Some of the toolchains (ThreeTen-BP at least) depend on the `leapseconds` file, so can that be added please.
Thanks for pointing that out. I installed the attached patch, which also adds the other source files you mentioned. (You also mentioned 'backward' which was already included albeit in size-zero form.) I then built a freshly tailored tarball using "make PACKRATDATA=backzone PACKRATLIST=zone.tab tailored_tarballs", which you can temporarily find here, if you'd like to test with it: https://www.cs.ucla.edu/~eggert/tz/tzdata2022a-77-g9b665ce-tailored.tar.gz
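One way to test such a tarball for the files requested in this thread is a small membership check. This is a sketch: the function name is hypothetical, and the file list in the usage line is simply the set mentioned above. Because the sketch does not assume whether the tarball's members carry a leading directory, it compares basenames only.

```shell
#!/bin/sh
# Hypothetical helper: report which expected files are missing from a
# gzipped tarball. Member names are reduced to their basenames, so the
# check works whether or not the tarball has a top-level directory.
# Usage: check_members TARBALL FILE...
# Exit status: 0 if every FILE is present, 1 otherwise.
check_members() {
  tarball=$1; shift
  # List members and strip everything up to the last "/".
  members=$(tar -tzf "$tarball" | sed 's|.*/||')
  missing=0
  for f in "$@"; do
    # -F fixed string, -q quiet, -x whole-line match.
    if ! printf '%s\n' "$members" | grep -Fqx -- "$f"; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `check_members tzdata2022a-77-g9b665ce-tailored.tar.gz leapseconds zone.tab iso3166.tab backzone backward`.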
2) The generated tarball places all data in the etcetera file, rather than in africa/asia etc. This may cause problems for downstream users that wish to only package some files. I suspect that group of people is small, nevertheless it is worth noting that the generated tarball is not equivalent.
Yes, I added a comment to that effect in the Makefile (see attached patch). I doubt whether anyone packages just 'asia'. If they do then it's fixable but will take some work and complexity that I'd rather avoid. The idea is to support real-world use, and not worry about every theoretically-possible use.
Is it practical to keep the various files separate? Or to include more files?
Including more files is easy and is done by the attached patch. Getting every source file to look just like global-tz's would be harder and I hope it's not needed.
On 8/3/22 18:27, Paul Eggert wrote:
Including more files is easy and is done by the attached patch.
That patch was a little too enthusiastic for tailored tarballs in vanguard form which doesn't need 'pacificnew', so I installed the attached further patch. This patch doesn't affect tarballs in main or rearguard form, as they still have an empty pacificnew file for the benefit of TZUpdater 2.3.1 and earlier.
On Thu, 4 Aug 2022 at 02:28, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 8/3/22 17:01, Stephen Colebourne via tz wrote:
1) Some of the toolchains (ThreeTen-BP at least) depend on the `leapseconds` file, so can that be added please.
Thanks for pointing that out. I installed the attached patch, which also adds the other source files you mentioned. (You also mentioned 'backward' which was already included albeit in size-zero form.)
Thanks. I've had a look at the tar.gz file and it now looks like a "normal" tzdata file. The only difference is that the data is in one etcetera file instead of spread out over various continent-like files. AFAICT, this is sufficient for the Java tooling I'm aware of, although I've yet to actually run tests on all of them. It does look like future versions of the Java libraries can use this new option from tzdb, which is good for everyone concerned.

I'd recommend anyone else that wishes to use a global-tz like non-merged set of data to try the new tar.gz file to see if it meets their needs.

thanks
Stephen
Date: Thu, 4 Aug 2022 22:54:45 +0100
From: Stephen Colebourne via tz <tz@iana.org>
Message-ID: <CACzrW9A2PPmUz8A1=mkRCGwny-oA=Kjd=dTLjweE=X0+_-Umqw@mail.gmail.com>

| I'd recommend anyone else that wishes to use a global-tz like
| non-merged set of data to try the new tar.gz file to see if it meets
| their needs.

I have, while preparing to incorporate the update into NetBSD, and it doesn't seem to (unless I am missing something) do what we need at all.

Generating the appropriate TZif files is only half the battle - NetBSD users mostly have the sources (for everything) (as contrasted with Linux, etc., where even though the sources are available, a very small fraction of users seem to make any use of that).

What I need is for the data to be in the files it has always been in. Whether it gets there by simply unpacking the tzdata tar file, or by running some script upon what is there to produce the desired results, is not important - but running a script to produce some other file, which isn't one where the user would expect to find data should they wish to change it, isn't useful to us at all.

I would have also thought that keeping all of the relevant data in the appropriate files would make maintenance easier - when changes are being made, all the (likely) zones that might be affected are in one place. If you want to ignore some of them and make them links (essentially replacing data which has been characterised as possibly, or even probably, incorrect, with data which is almost guaranteed incorrect), then doing that as the data is processed seems like the better choice; having to go check some other file, and make sure that everywhere links have been made they are still appropriate to remain links, seems like the hard way to me.

So, if you don't make a 2022bgtz version available, I guess I will go back to manually inserting the updates, as I have done previously.
kre

e.g.: since we haven't yet released a NetBSD update for 2022b, any users in Chile who want to be prepared can easily make the update themselves, but they need to get the same results when they do "zic southamerica" (other than the change they made) as the TZif files that are distributed with the system. (And the same for europe, australasia, northamerica, africa, ...)
On 8/15/22 12:23, Robert Elz via tz wrote:
What I need is for the data to be in the files it has always been in.
In what sense is that not true for the tarball format in question[1]? What programs go awry if that tarball's data are used, and what are the symptoms of these failures?

If the main problem is that a user who wants to hand-edit this data will not know to edit 'etcetera' rather than (say) 'australasia', then it'd be easy to change the tarball generator so that instead of 'australasia' being empty it would contain a comment like "# Please edit the file 'etcetera' instead."; would that help? If the problem is somewhere else it'd be helpful to know what it is.

[1] Here's the latest iteration of the tarball format in question:
https://www.cs.ucla.edu/~eggert/tz/tzdata2022b-5-gc9ba895-tailored.tar.gz
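To see concretely where Zone entries live in an unpacked tarball (the crux of the etcetera-versus-australasia question), one could count Zone lines per source file. This is a sketch under two assumptions: the source files are in the current directory, and Zone lines begin at column 1 as they do in the distributed source files. The helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "FILE COUNT" for each tzdata source file,
# where COUNT is the number of lines starting with "Zone" plus whitespace.
# Files with no Zone lines are listed with a count of 0.
# Usage: zones_per_file FILE...
zones_per_file() {
  for f in "$@"; do
    # grep -c prints 0 (and exits 1) when nothing matches; only the
    # printed count is used here, so that status is harmless.
    printf '%s %s\n' "$f" "$(grep -c '^Zone[[:space:]]' "$f")"
  done
}
```

For example: `zones_per_file africa antarctica asia australasia europe northamerica southamerica etcetera`.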
Date: Mon, 15 Aug 2022 13:24:48 -0700
From: Paul Eggert <eggert@cs.ucla.edu>
Message-ID: <fb33ecdc-753d-ea7e-6d4d-f30af7b0e059@cs.ucla.edu>

| In what sense is that not true for the tarball format in question[1]?

The data isn't in the files it belongs in.

| What programs go awry if that tarball's data are used, and what are the
| symptoms of these failures?

One cannot easily check what has changed, as the whole data file (all of them in that format) has gone away. I know you understand version control systems - we use one as well (currently cvs, gradually migrating to mercurial) and being able to look and see what changed in what version (and why) is important.

| If the main problem is that a user who wants to hand-edit this data will
| not know to edit 'etcetera' rather than (say) 'australasia', then it'd
| be easy to change the tarball generator so that instead of 'australasia'
| being empty it would contain a comment like "# Please edit the file
| 'etcetera' instead."; would that help?

No. If you would put the data which belongs in australasia back in the australasia file, that would help (and the same with all the others). Even if the file had markers in it showing which parts belong to (or originate from, if you prefer) which original file, it would be better. That is, assuming that when the backzone data is merged in, it is put into the correct places in the file (that from which it was removed) so that we don't have countless differences which are in reality simply lines moving around.

| [1] Here's the latest iteration of the tarball format in question:
| https://www.cs.ucla.edu/~eggert/tz/tzdata2022b-5-gc9ba895-tailored.tar.gz

Thanks -- in later reading of previously unread messages, I saw there was one of those for 2022a as well; had I seen that one I might have looked at it earlier, not that I think that would have changed anything.

This file (as it doesn't include anything from backzone) doesn't help at all though - the tzdata file as distributed is easier to deal with than that particular one - but I assume that much is able to be altered.

kre
On Wed, Jul 27, 2022 at 2:11 PM Paul Eggert via tz <tz@iana.org> wrote:
With the patch, this shell command:
make PACKRATDATA=backzone PACKRATLIST=zone.tab install
installs into /usr/share/zoneinfo files that are identical to those installed by global-tz's "make install"
Is the intent that the standard install and the "PACKRATDATA=backzone PACKRATLIST=zone.tab" install should not exhibit any differences for 1970+ timestamps?
On 8/16/22 00:08, Bradley White wrote:
Is the intent that the standard install and the "PACKRATDATA=backzone PACKRATLIST=zone.tab" install should not exhibit any differences for 1970+ timestamps?
Yes, and you can test this by running something like the following in a source directory. The diff command should output nothing. (The test takes a while.) I did this sort of test before releasing 2022b.

    make clean
    make TOPDIR=$PWD/tza ZFLAGS=-r@0 install
    ./zdump -i $(awk '/^[^#]/{print $3}' zone1970.tab) >a.tzs
    make clean
    make TOPDIR=$PWD/tzb ZFLAGS=-r@0 \
      PACKRATDATA=backzone PACKRATLIST=zone.tab \
      install
    ./zdump -i $(awk '/^[^#]/{print $3}' zone1970.tab) >b.tzs
    diff a.tzs b.tzs
    rm -r tza tzb a.tzs b.tzs
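The zone list in that test comes from an awk one-liner. Here it is wrapped as a hypothetical helper so the same list can be inspected or reused. It prints the third column (the TZ name) of each non-comment line, which is the zone-name column in both zone1970.tab and zone.tab.

```shell
#!/bin/sh
# Hypothetical helper: list zone names from zone1970.tab or zone.tab.
# Lines starting with "#" (comments) and empty lines are skipped;
# column 3 is the TZ name in both files.
# Usage: list_zones TABFILE
list_zones() {
  awk '/^[^#]/ { print $3 }' "$1"
}
```

In the test above it would be used as `./zdump -i $(list_zones zone1970.tab) >a.tzs`.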
On Tue, Aug 16, 2022 at 5:23 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
On 8/16/22 00:08, Bradley White wrote:
Is the intent that the standard install and the "PACKRATDATA=backzone PACKRATLIST=zone.tab" install should not exhibit any differences for 1970+ timestamps?
Yes, and you can test this by running something like the following in a source directory. The diff command should output nothing. (The test takes a while.) I did this sort of test before releasing 2022b.
    make clean
    make TOPDIR=$PWD/tza ZFLAGS=-r@0 install
    ./zdump -i $(awk '/^[^#]/{print $3}' zone1970.tab) >a.tzs
    make clean
    make TOPDIR=$PWD/tzb ZFLAGS=-r@0 \
      PACKRATDATA=backzone PACKRATLIST=zone.tab \
      install
    ./zdump -i $(awk '/^[^#]/{print $3}' zone1970.tab) >b.tzs
    diff a.tzs b.tzs
    rm -r tza tzb a.tzs b.tzs
That seems to only consider zones in "zone1970.tab", which, by definition, are the zones "where civil timestamps have agreed since 1970." What about examples like ...

    $ TZ=<path-to-standard>/Africa/Freetown date -d @1660625107
    Tue Aug 16 04:45:07 GMT 2022
    $ TZ=<path-to-packrat>/Africa/Freetown date -d @1660625107
    Tue Aug 16 05:05:07 +01 2022

and ...

    $ TZ=<path-to-standard>/Pacific/Saipan date -d @230659200
    Sun Apr 24 03:00:00 GDT 1977
    $ TZ=<path-to-packrat>/Pacific/Saipan date -d @230659200
    Sun Apr 24 02:00:00 +10 1977

It sure seems like those post-1970 timestamps are receiving different treatments (abbreviations notwithstanding).
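Spot checks like these can be scripted. The sketch below renders one timestamp under two TZ settings, prints both results, and exits nonzero if they differ. It assumes GNU date (for `-d @EPOCH`); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: show one instant under two TZ values.
# Assumes GNU date, which accepts -d @SECONDS-SINCE-EPOCH.
# Usage: compare_at TZ_A TZ_B EPOCH
# Exit status: 0 if both renderings agree, 1 otherwise.
compare_at() {
  a=$(TZ=$1 date -d "@$3" '+%Y-%m-%d %H:%M:%S %Z')
  b=$(TZ=$2 date -d "@$3" '+%Y-%m-%d %H:%M:%S %Z')
  printf 'A: %s\nB: %s\n' "$a" "$b"
  [ "$a" = "$b" ]
}
```

Assuming the tza and tzb trees from the earlier test are still present, and that the Makefile installs under `$TOPDIR/usr/share/zoneinfo` (an assumption about the default layout), the Freetown case would be `compare_at "$PWD/tza/usr/share/zoneinfo/Africa/Freetown" "$PWD/tzb/usr/share/zoneinfo/Africa/Freetown" 1660625107`.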
On 8/16/22 14:57, Bradley White wrote:
    $ TZ=<path-to-standard>/Africa/Freetown date -d @1660625107
    Tue Aug 16 04:45:07 GMT 2022
    $ TZ=<path-to-packrat>/Africa/Freetown date -d @1660625107
    Tue Aug 16 05:05:07 +01 2022
and ...
    $ TZ=<path-to-standard>/Pacific/Saipan date -d @230659200
    Sun Apr 24 03:00:00 GDT 1977
    $ TZ=<path-to-packrat>/Pacific/Saipan date -d @230659200
    Sun Apr 24 02:00:00 +10 1977
Thanks, good catches.

The Africa/Freetown glitch was introduced in 2022a when fixing other problems with Sierra Leone.

The Pacific/Saipan glitch is more interesting. It was introduced in 2018h, when we added DST rules for Pacific/Guam but I didn't notice that this should have caused Pacific/Saipan to split off and so I mistakenly kept the latter linked to the former. However, I just now looked into this and discovered that the old Pacific/Saipan data entries (taken from Shanks) were also mistaken. In the default build, as luck would have it, the 2015h merge of Pacific/Guam and Pacific/Saipan, together with my 2018h mistake, actually *fixed* bugs in Pacific/Saipan because the merge and my subsequent mistake caused Shanks's mistakes to be ignored.

I ran the following test to find similar glitches:

    rm -fr tza tzb
    make clean
    make TOPDIR=$PWD/tza ZFLAGS=-r@0 install
    ./zdump -i $(awk '/^[^#]/{print $3}' zone.tab) >a.tzs
    make clean
    make TOPDIR=$PWD/tzb ZFLAGS=-r@0 \
      PACKRATDATA=backzone PACKRATLIST=zone.tab \
      install
    ./zdump -i $(awk '/^[^#]/{print $3}' zone.tab) >b.tzs
    diff a.tzs b.tzs
    rm -r tza tzb a.tzs b.tzs

and I found one more glitch: the time zone abbreviation for Pacific/Midway disagreed (this glitch was introduced in 2015b).

I installed into the development repository the attached patch to 'backzone' to fix these glitches. The above test passes now. Because this patch affects only 'backzone', it doesn't affect any timestamps in the default build.
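The only difference between this test and the earlier one is the zone list: this one also covers zones named in zone.tab but absent from zone1970.tab. A sketch that lists exactly those extra names, using only awk, sort and comm; the helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print zone names that appear in the second table
# but not in the first (e.g. in zone.tab but not in zone1970.tab).
# Usage: extra_zones zone1970.tab zone.tab
extra_zones() {
  tmp=$(mktemp -d) || return
  # Column 3 of each non-comment line is the zone name in both files.
  awk '/^[^#]/ { print $3 }' "$1" | sort -u > "$tmp/a"
  awk '/^[^#]/ { print $3 }' "$2" | sort -u > "$tmp/b"
  # comm -13 suppresses names unique to the first list and names in both,
  # leaving only names unique to the second list.
  comm -13 "$tmp/a" "$tmp/b"
  rm -rf "$tmp"
}
```

Run in a source directory as `extra_zones zone1970.tab zone.tab`.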
On Tue, Aug 16, 2022 at 8:51 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
On 8/16/22 14:57, Bradley White wrote:
    $ TZ=<path-to-standard>/Africa/Freetown date -d @1660625107
    Tue Aug 16 04:45:07 GMT 2022
    $ TZ=<path-to-packrat>/Africa/Freetown date -d @1660625107
    Tue Aug 16 05:05:07 +01 2022
The Africa/Freetown glitch was introduced in 2022a when fixing other problems with Sierra Leone.
I ran the following test to find similar glitches: [...]
I installed into the development repository the attached patch to 'backzone' to fix these glitches. The above test passes now.
For what it's worth, that patch appears to remove the last instance of "DST all year" <https://github.com/eggert/tz/blob/aca1a705795d4d3986a3ac795e673c410ace2693/z...>. Hopefully you have some other hidden tests that continue to exercise those code paths.
On 8/17/22 10:31, Bradley White wrote:
For what it's worth, that patch appears to remove the last instance of "DST all year" <https://github.com/eggert/tz/blob/aca1a705795d4d3986a3ac795e673c410ace2693/z...>. Hopefully you have some other hidden tests that continue to exercise those code paths.
Unfortunately all the tests I normally run are in the Makefile, and none of them exercise "DST all year" as far as I know. It would be nice if someone had the time to write good test cases for "DST all year". For example, POSIX requires support for it via tricky settings like TZ='XXX6PDT7,0/0,J365/23' and tzcode should work with such settings (which I think it does). That being said, I expect that a lot of other software would misbehave under "DST all year". This is partly why tzdata has routinely modeled "DST all year" as standard time in the past. This may sound like a small thing, but it could turn into a real problem if the US adopts a "DST all year" law.
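As a possible starting point (not the test suite asked for here), a probe that prints what the local date command reports for a TZ setting at a few instants. It assumes GNU date; the helper name is hypothetical, and the sample instants are midday UTC on 2022-01-01, 2022-07-01 and 2022-12-31, chosen away from the rule's transition points. If the setting is handled as intended, every line should show the DST abbreviation; the sketch makes no claim about which implementations actually do.

```shell
#!/bin/sh
# Hypothetical probe: print local time, UTC offset and abbreviation for a
# TZ value at three sample instants. Assumes GNU date (-d @EPOCH).
# Samples: 2022-01-01, 2022-07-01 and 2022-12-31, each at 12:00 UTC.
# Usage: probe_tz 'XXX6PDT7,0/0,J365/23'
probe_tz() {
  for epoch in 1641038400 1656676800 1672488000; do
    TZ=$1 date -d "@$epoch" '+%Y-%m-%dT%H:%M:%S %z %Z'
  done
}
```

Running `probe_tz 'XXX6PDT7,0/0,J365/23'` and `probe_tz UTC0` side by side makes it easy to eyeball whether a given system treats the first setting as DST throughout the year.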
On Tue, Aug 16, 2022 at 8:51 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
On 8/16/22 14:57, Bradley White wrote:
    $ TZ=<path-to-standard>/Africa/Freetown date -d @1660625107
    Tue Aug 16 04:45:07 GMT 2022
    $ TZ=<path-to-packrat>/Africa/Freetown date -d @1660625107
    Tue Aug 16 05:05:07 +01 2022
The Africa/Freetown glitch was introduced in 2022a when fixing other problems with Sierra Leone.
I installed into the development repository the attached patch to 'backzone' to fix these glitches.
Because this patch affects only 'backzone', it doesn't affect any timestamps in the default build.
The sense I'm getting from discussions on the backzone issue is that many corporations and redistributors will become "PACKRAT"s when moving to 2022b, if only to provide historical consistency (and independently of what anyone considers historical accuracy). The "global-tz" advocates certainly fall into that category. In that light, you might reconsider the concept of a "default build". It is just one choice, and one which may not be used by the majority of consumers. Concretely, I suggest that backzone changes affecting current timestamps (like the Africa/Freetown case above) be treated with the same release expediency that would be afforded to similar changes in the "default" data.
On 8/17/22 20:30, Bradley White wrote:
The sense I'm getting from discussions on the backzone issue is that many corporations and redistributors will become "PACKRAT"s when moving to 2022b, if only to provide historical consistency
I'm not getting the same sense. And if "historical consistency" means "don't mess with TZDB data", then redistributors should avoid the PACKRAT* options, as switching from 2022a to 2022b+PACKRATDATA introduces more changes to timestamps than simply upgrading from 2022a to 2022b. There are other good reasons not to use the PACKRAT* options. Not only do they add politics (which in the long run will likely cause trouble for people opting for PACKRAT*), they use lower-quality data that, despite repeated requests, nobody has volunteered to maintain. If you want stability this is not a good basis for it.
Concretely, I suggest that backzone changes affecting current timestamps (like the Africa/Freetown case above) be treated with the same release expediency that would be afforded to similar changes in the "default" data.
I don't see a need to issue a new TZDB release merely because of glitches involving the new options. Those who are competent and brave (or foolhardy :-) enough to use these bleeding-edge options can easily apply the patch published in <https://mm.icann.org/pipermail/tz/2022-August/031836.html>, assuming they care about the glitches. The rest of us can wait for the next release. As usual, though, my bottom-line advice is: don't use PACKRAT*.
On Mon, 15 Aug 2022 at 21:19, Robert Elz via tz <tz@iana.org> wrote:
So, if you don't make a 2022bgtz version available, I guess I will go back to manually inserting the updates, as I have done previously.
global-tz 2022bgtz is already available (and built into Joda-Time and ThreeTen-Backport):
https://github.com/JodaOrg/global-tz/releases
(In the short term it was better to go through the pain and get a version of tzdb I'm confident of rather than the PACKRATLIST alternative. However, as I've said previously, I don't particularly want to maintain that in the medium or long term.)

Paul, extrapolating from Robert's point, I think it's one I've made before: it is much simpler to start with the data in the correct files and generate the merged form automatically for those that want it merged than it is to start with the merged form and try to recreate the data in the correct files.

Robert, if there is something specific about global-tz that is "wrong" in your opinion, such as the contents of zonetab, feel free to raise an issue and we can discuss it there to see if anything can/should be done.

thanks
Stephen
Date: Mon, 15 Aug 2022 21:36:52 +0100
From: Stephen Colebourne via tz <tz@iana.org>
Message-ID: <CACzrW9CP9w5SMUX=Er7ana2dyGn4oW0T18HB+mG6-Hg_kwdy_w@mail.gmail.com>

I am going to try this off list, but it looks as if joda.org mail goes to gmail, in which case it is 99% certain that the message will never make it to you (Google don't like my attitudes wrt e-mail and the takeover by the major e-mail providers - they bounce almost all mail from munnari.oz.au).

As I expected, that failed - something called mx.kundenserver.de received the message, then tried forwarding to gmail, which rejected the message. So apologies to people on the list who didn't really need to see this.

| Robert, if there is something specific about global-tz that is "wrong"

I would have preferred if the comments from backzone moved back to the original files, along with the data (the way it was done for 2022agtz), but it worked as it was, in general.

kre
On Wed, 17 Aug 2022 at 14:06, Robert Elz via tz <tz@iana.org> wrote:
| Robert, if there is something specific about global-tz that is "wrong"
I would have preferred if the comments from backzone moved back to the original files, along with the data - the way it was done for 2022agtz but it worked as it was, in general.
Feel free to raise it here. In summary though, comments would be a lot of effort for minimal reward IMO:
https://github.com/JodaOrg/global-tz/issues
(2022cgtz is now available)

thanks
Stephen
participants (5): Bradley White, Derick Rethans, Paul Eggert, Robert Elz, Stephen Colebourne