[PROPOSED PATCH 1/2] Also distribute code and data in a single tarball
Suggested by Arthur David Olson in:
https://mm.icann.org/pipermail/tz/2000-January/010789.html
* Makefile (GNUTARFLAGS): Add --sort=name, so that the single
tarball is more reproducible.
(clean): Remove tzdb.
(tzdb-$(VERSION).tar.gz): New target.
(tarballs): Add it.
(tzdb-$(VERSION).tar.gz.asc): New target.
(signatures): Add it.
* NEWS: Document this.
---
 Makefile | 21 +++++++++++++++++----
 NEWS     |  9 +++++++++
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/Makefile b/Makefile
index fc18819..da697b7 100644
--- a/Makefile
+++ b/Makefile
@@ -326,7 +326,7 @@ OK_LINE= '^'$(OK_CHAR)'*$$'
 
 # Flags to give 'tar' when making a distribution.
 # Try to use flags appropriate for GNU tar.
-GNUTARFLAGS= --numeric-owner --owner=0 --group=0 --mode=go+u,go-w
+GNUTARFLAGS= --numeric-owner --owner=0 --group=0 --mode=go+u,go-w --sort=name
 TARFLAGS= `if tar $(GNUTARFLAGS) --version >/dev/null 2>&1; \
 		then echo $(GNUTARFLAGS); \
 		else :; \
@@ -568,7 +568,7 @@ clean_misc:
 		rm -f core *.o *.out \
 		  date tzselect version.h zdump zic yearistype libtz.a
 clean:		clean_misc
-		rm -fr *.dir $(TZS_NEW)
+		rm -fr *.dir tzdb $(TZS_NEW)
 
 maintainer-clean: clean
 		@echo 'This command is intended for maintainers to use; it'
@@ -672,7 +672,8 @@ check_time_t_alternatives:
 		done
 		rm -fr time_t.dir
 
-tarballs:	tzcode$(VERSION).tar.gz tzdata$(VERSION).tar.gz
+tarballs:	tzcode$(VERSION).tar.gz tzdata$(VERSION).tar.gz \
+		tzdb-$(VERSION).tar.gz
 
 tzcode$(VERSION).tar.gz: set-timestamps.out
 		LC_ALL=C && export LC_ALL && \
@@ -685,7 +686,16 @@ tzdata$(VERSION).tar.gz: set-timestamps.out
 		tar $(TARFLAGS) -cf - $(COMMON) $(DATA) $(MISC) $(TZS) | \
 		  gzip $(GZIPFLAGS) > $@
 
-signatures:	tzcode$(VERSION).tar.gz.asc tzdata$(VERSION).tar.gz.asc
+tzdb-$(VERSION).tar.gz: set-timestamps.out
+		rm -fr tzdb
+		mkdir tzdb
+		ln $(COMMON) $(DOCS) $(SOURCES) $(DATA) $(MISC) $(TZS) tzdb
+		touch -cmr $$(ls -t tzdb/* | sed 1q) tzdb
+		LC_ALL=C && export LC_ALL && \
+		tar $(TARFLAGS) -cf - tzdb | gzip $(GZIPFLAGS) > $@
+
+signatures:	tzcode$(VERSION).tar.gz.asc tzdata$(VERSION).tar.gz.asc \
+		tzdb-$(VERSION).tar.gz.asc
 
 tzcode$(VERSION).tar.gz.asc: tzcode$(VERSION).tar.gz
 		gpg --armor --detach-sign $?
@@ -693,6 +703,9 @@ tzcode$(VERSION).tar.gz.asc: tzcode$(VERSION).tar.gz
 tzdata$(VERSION).tar.gz.asc: tzdata$(VERSION).tar.gz
 		gpg --armor --detach-sign $?
 
+tzdb-$(VERSION).tar.gz.asc: tzdb-$(VERSION).tar.gz
+		gpg --armor --detach-sign $?
+
 typecheck:
 		$(MAKE) clean
 		for i in "long long" unsigned; \
diff --git a/NEWS b/NEWS
index 023883b..18b7c9d 100644
--- a/NEWS
+++ b/NEWS
@@ -32,6 +32,15 @@ Unreleased, experimental changes
     what should be the output of 'zdump -i -c 2050' on primary zones.
     'make check' now checks that zdump generates this output.
 
+    A new distribution format is available in the tarball
+    tzdb-VERSION.tar.gz and the signature tzdb-VERSION.tar.gz.asc.
+    The new tarball has the contents of tzcodeVERSION.tar.gz and
+    tzdataVERSION.tar.gz, in a single top-level directory 'tzdb' with
+    all other files under this directory, as is typical for software
+    distributions.  The new format is intended to replace the old
+    tarball pair.  The old format will continue to be distributed for
+    a while.
+
   Changes to documentation and commentary
 
     tzfile.5 now documents the new restriction on POSIX TZ-like
-- 
2.5.5
* Makefile (maintainer-clean): Remove *.tar.lz too.
(tzdb-$(VERSION).tar.lz): Rename from tzdb-$(VERSION).tar.gz,
and use lzip rather than gzip.  All uses changed.
* NEWS: Document this.
---
 Makefile | 12 ++++++------
 NEWS     |  5 +++--
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/Makefile b/Makefile
index da697b7..98024d0 100644
--- a/Makefile
+++ b/Makefile
@@ -573,7 +573,7 @@ clean: clean_misc
 maintainer-clean: clean
 		@echo 'This command is intended for maintainers to use; it'
 		@echo 'deletes files that may need special tools to rebuild.'
-		rm -f leapseconds $(MANTXTS) $(TZS) *.asc *.tar.gz
+		rm -f leapseconds $(MANTXTS) $(TZS) *.asc *.tar.[gl]z
 
 names:
 		@echo $(ENCHILADA)
@@ -673,7 +673,7 @@ check_time_t_alternatives:
 		rm -fr time_t.dir
 
 tarballs:	tzcode$(VERSION).tar.gz tzdata$(VERSION).tar.gz \
-		tzdb-$(VERSION).tar.gz
+		tzdb-$(VERSION).tar.lz
 
 tzcode$(VERSION).tar.gz: set-timestamps.out
 		LC_ALL=C && export LC_ALL && \
@@ -686,16 +686,16 @@ tzdata$(VERSION).tar.gz: set-timestamps.out
 		tar $(TARFLAGS) -cf - $(COMMON) $(DATA) $(MISC) $(TZS) | \
 		  gzip $(GZIPFLAGS) > $@
 
-tzdb-$(VERSION).tar.gz: set-timestamps.out
+tzdb-$(VERSION).tar.lz: set-timestamps.out
 		rm -fr tzdb
 		mkdir tzdb
 		ln $(COMMON) $(DOCS) $(SOURCES) $(DATA) $(MISC) $(TZS) tzdb
 		touch -cmr $$(ls -t tzdb/* | sed 1q) tzdb
 		LC_ALL=C && export LC_ALL && \
-		tar $(TARFLAGS) -cf - tzdb | gzip $(GZIPFLAGS) > $@
+		tar $(TARFLAGS) -cf - tzdb | lzip -9 > $@
 
 signatures:	tzcode$(VERSION).tar.gz.asc tzdata$(VERSION).tar.gz.asc \
-		tzdb-$(VERSION).tar.gz.asc
+		tzdb-$(VERSION).tar.lz.asc
 
 tzcode$(VERSION).tar.gz.asc: tzcode$(VERSION).tar.gz
 		gpg --armor --detach-sign $?
@@ -703,7 +703,7 @@ tzcode$(VERSION).tar.gz.asc: tzcode$(VERSION).tar.gz
 tzdata$(VERSION).tar.gz.asc: tzdata$(VERSION).tar.gz
 		gpg --armor --detach-sign $?
 
-tzdb-$(VERSION).tar.gz.asc: tzdb-$(VERSION).tar.gz
+tzdb-$(VERSION).tar.lz.asc: tzdb-$(VERSION).tar.lz
 		gpg --armor --detach-sign $?
 
 typecheck:
diff --git a/NEWS b/NEWS
index 18b7c9d..61213b0 100644
--- a/NEWS
+++ b/NEWS
@@ -33,13 +33,14 @@ Unreleased, experimental changes
     'make check' now checks that zdump generates this output.
 
     A new distribution format is available in the tarball
-    tzdb-VERSION.tar.gz and the signature tzdb-VERSION.tar.gz.asc.
+    tzdb-VERSION.tar.lz and the signature tzdb-VERSION.tar.lz.asc.
     The new tarball has the contents of tzcodeVERSION.tar.gz and
     tzdataVERSION.tar.gz, in a single top-level directory 'tzdb' with
     all other files under this directory, as is typical for software
     distributions.  The new format is intended to replace the old
     tarball pair.  The old format will continue to be distributed for
-    a while.
+    a while.  The new format uses lzip compression, which is
+    significantly smaller than gzip and is simpler than xz.
 
   Changes to documentation and commentary
 
-- 
2.5.5
If I understand right, this is proposed to be an additional kit, right? The tzdatannnn kits are expected to continue?

	paul
Paul_Koning@Dell.com wrote:
> If I understand right, this is proposed to be an additional kit, right? The tzdatannnn kits are expected to continue?
Yes. I'd like to discontinue the old-style distribution at some point, but there's no rush.
On Aug 22, 2016, at 11:11 AM, Paul Eggert <eggert@CS.UCLA.EDU> wrote:
> Paul_Koning@Dell.com wrote:
>> If I understand right, this is proposed to be an additional kit, right? The tzdatannnn kits are expected to continue?
> Yes. I'd like to discontinue the old-style distribution at some point, but there's no rush.

If you do, please give plenty of notice. Some of us have automated scripts that rely on the old format.

	paul
Paul_Koning@Dell.com wrote:
> If you do, please give plenty of notice. Some of us have automated scripts that rely on the old format.
Sure, we can do that in the next release, which will still ship the old format.
    Date:        Mon, 22 Aug 2016 08:11:46 -0700
    From:        Paul Eggert <eggert@cs.ucla.edu>
    Message-ID:  <492db551-d050-d0c0-0b46-2bb49d89c2f8@cs.ucla.edu>

  | Yes. I'd like to discontinue the old-style distribution at some point

Old style distribution was a single file containing both code and data. It was split because it was more manageable (for everyone) that way, most people don't really care much about the code - that just serves as a reference implementation, and the various system distributions use their own code (either based originally on the reference implementation, or independently created) and just want the data - which is (or should be) the primary output from this group.

kre

ps: I don't really see any need for a better compression technique than what is currently used (or really, for that matter, any compression at all.) Neither the code, nor the data, and not even the combination of the two, is big enough by modern standards to require it.
I don't see anything driving this disruptive change other than an email from 16 years ago. I only ever use the tzdata, and was considering writing a script myself to download it. This just seems like needless fiddling with something that isn't broken.

Stephen

On 22 August 2016 at 16:39, Robert Elz <kre@munnari.oz.au> wrote:
>     Date:        Mon, 22 Aug 2016 08:11:46 -0700
>     From:        Paul Eggert <eggert@cs.ucla.edu>
>     Message-ID:  <492db551-d050-d0c0-0b46-2bb49d89c2f8@cs.ucla.edu>
>
>   | Yes. I'd like to discontinue the old-style distribution at some point
>
> Old style distribution was a single file containing both code and data. It was split because it was more manageable (for everyone) that way, most people don't really care much about the code - that just serves as a reference implementation, and the various system distributions use their own code (either based originally on the reference implementation, or independently created) and just want the data - which is (or should be) the primary output from this group.
>
> kre
>
> ps: I don't really see any need for a better compression technique than what is currently used (or really, for that matter, any compression at all.) Neither the code, nor the data, and not even the combination of the two, is big enough by modern standards to require it.
Robert Elz wrote:
> Old style distribution was a single file containing both code and data. It was split because it was more manageable (for everyone) that way
I must say it's not more manageable for me, and I recall at least one other complaint about splitting the distribution in two.

The original idea was that the code would be stable and so could be distributed separately and independently from the data. And for a while we did that: for example, in 1999 there was one more data than code release, and users just had to know that tzcode1999h was released contemporaneously (and tested) with tzdata1999i. But this was confusing. On at least one occasion it confused even me, and I released something with the "wrong" version number. So starting in late 2012 code and data releases have been issued in lockstep, even if there is no change to the code other than the version number. So nowadays the release comes as two tarballs, but it's really just one release.

Unfortunately the tarball separation is incomplete, as some files are distributed in both tarballs, which means that if you combine tzcode version X with tzdata version Y (not recommended) you may need to resolve conflicts.

There's another point of confusion with the old format. Its tarballs extract files into the working directory, which is unusual and I recall at least one complaint about it. It's almost universal practice nowadays for distribution tarballs to extract into a subdirectory named after the distribution. In some circles having a bunch of top-level files in an archive is even considered to be a security risk. In the long run it'll be a win if we switch to common practice and avoid these problems.

I hope the above all helps to explain the format switch a bit more. People who just want the data can grab the unified tarball and get just the files they want. That's what they do now anyway, as the data tarball contains several non-data files.
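To make the layout point concrete, here is a minimal sketch that lists the top-level entries of a tar archive, so the two styles can be compared. It assumes a POSIX-like shell with tar, sed, sort and mktemp; the helper name top_level_entries and the two throwaway archives are invented for illustration and are not part of the proposed Makefile.

```shell
#!/bin/sh
# top_level_entries TARFILE: print the distinct first path components of
# every member of an uncompressed tar archive.  (Hypothetical helper name.)
top_level_entries() {
	tar -tf "$1" | sed -e 's|^\./||' -e 's|/.*||' | sort -u
}

# Build two tiny throwaway archives; the file contents are placeholders.
tmp=$(mktemp -d) || exit 1
mkdir "$tmp/tzdb"
echo data > "$tmp/tzdb/africa"
echo code > "$tmp/tzdb/zic.c"
(cd "$tmp" && tar -cf new-style.tar tzdb)                   # one top-level directory
(cd "$tmp/tzdb" && tar -cf ../old-style.tar africa zic.c)   # files at top level

new_layout=$(top_level_entries "$tmp/new-style.tar")
old_layout=$(top_level_entries "$tmp/old-style.tar")
echo "new-style top level: $new_layout"
echo "old-style top level:" $old_layout
rm -rf "$tmp"
```

An archive in the new style yields a single entry, so unpacking it cannot scatter files over the current directory. Picking out individual files from the combined tarball should then be a matter of naming them under that directory, for example by piping `lzip -dc` output into `tar -xf - tzdb/africa`, assuming the archive is lzip-compressed and the member is named that way.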
> ps: I don't really see any need for a better compression technique than what is currently used (or really, for that matter, any compression at all.)
The need for compression will go up once we start distributing .tzs files (which partly motivated the format switch). With the current draft the tarball sizes look like this:

    bytes  file
   665600  tzcodeX.tar
  1669120  tzdataX.tar
  2334720  concatenation of tzcodeX.tar + tzdataX.tar
   202051  tzcodeX.tar.gz
   393052  tzdataX.tar.gz
   595103  concatenation of tzcodeX.tar.gz + tzdataX.tar.gz
   383796  tzdb-X.tar.lz

Granted, if you're well connected all these sizes are small. But if you're paying for each kilobyte of download, a 36% savings over split .gz format (84% savings from raw tarballs) is nice to have. Also, the compressed tzdb tarball is a tad smaller than the compressed tzdata tarball, so even if you want just the data you're still a bit ahead by downloading the new combined format.
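The quoted percentages follow directly from the byte counts listed in the message. A minimal sketch of the arithmetic, assuming a POSIX shell with awk; the helper name pct_saved is made up for illustration:

```shell
#!/bin/sh
# pct_saved NEW OLD: print the percentage saved by NEW relative to OLD,
# rounded to the nearest whole percent.  (Hypothetical helper name.)
pct_saved() {
	awk -v new="$1" -v old="$2" 'BEGIN { printf "%.0f\n", (1 - new / old) * 100 }'
}

# Byte counts copied from the size list in the message.
pct_saved 383796 595103    # combined tarball vs. the two .gz tarballs: 36
pct_saved 383796 2334720   # combined tarball vs. the two raw tarballs: 84
pct_saved 383796 393052    # combined tarball vs. the data-only .gz tarball: 2
```

The last line shows the size of the "a tad smaller" margin: the combined tarball is roughly 2% smaller than the gzip-compressed data tarball on its own.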
Paul Eggert <eggert@cs.ucla.edu> wrote:
 |Robert Elz wrote:
 |> Old style distribution was a single file containing both code and data.
 |> It was split because it was more manageable (for everyone) that way
 |> ps: I don't really see any need for a better compression technique than
 |> what is currently used (or really, for that matter, any compression at all.)
 |Granted, if you're well connected all these sizes are small. But if you're
 |paying for each kilobyte of download, a 36% savings over split .gz format (84%
 |savings from raw tarballs) is nice to have.

I concur, entirely in general that is: it really matters, especially in the countryside, even in highly developed countries (talking about Germany). E.g., I have just recently contributed a message to a DragonFly BSD thread, though only in spirit because my messages go to /dev/null (last message of mine reported "fcntl(2) lock on /dev/null fails with EINVAL", so that is probably fixed (ha ha))

 ||It's getting harder and harder to fit a reasonable base dist onto a CD.
 ||I really recommend using the USB disk image instead of trying to burn
 ||an ISO.
 |
 |I didn't want to start a thread, but now jumping in to add that
 |the release image was only about 170 MB when compressed with
 |xz(1), which would have saved ~100 MB download (and archive
 |storage) compared to bzip2. (This matters here, more often than
 |not).

--steffen
On 22 August 2016 at 14:57, Paul Eggert <eggert@cs.ucla.edu> wrote:
> The need for compression will go up once we start distributing .tzs files (which partly motivated the format switch).
Regardless of whether tzdata (the source dataset) and tzcode (the reference implementation) are merged into one tarball — I have no strong opinion, but would err on the side of long lead times for changes of this sort — I trust that the idea would be to keep the reference output in a separate distribution. I think concatenating the reference output in a single tarball alongside the source dataset would encourage implementations which simply parse the reference output, which may be counter to our goals. -- Tim Parenti
Tim Parenti wrote:
> concatenating the reference output in a single tarball alongside the source dataset would encourage implementations which simply parse the reference output, which may be counter to our goals
I don't see how putting the reference output into a separate tarball would affect the degree of encouragement. Implementations that wanted to parse the reference output could do that regardless of whether it's in a separate tarball. For what it's worth, the reference output format is extensional, so it can lose some of the input's information. For example, the input could specify a special rule for the year 2051, info that is discarded by the current cutoff of 2050. So as things stand now, in principle the reference output is not a good choice for a downstream implementation. (If we change the reference output format to capture everything, this obstacle would go away of course.) By "extensional" I mean this: https://en.wikipedia.org/wiki/Extensional_definition
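The information loss described here can be illustrated with a toy filter. This is only a sketch under stated assumptions: the transition list is invented, the helper name apply_cutoff is hypothetical, and the cutoff is treated as an exclusive upper bound on the year; it is not zdump and says nothing about zdump's exact boundary behaviour.

```shell
#!/bin/sh
# Invented "year  UTC-offset" pairs; not real tz data.
transitions='2030 -08:00
2049 -07:00
2051 -06:30'

# apply_cutoff YEAR: keep only entries dated before YEAR, mimicking an
# output format that stops at a fixed cutoff.  (Hypothetical helper; the
# exclusive upper bound is an assumption of this sketch.)
apply_cutoff() {
	awk -v cutoff="$1" '$1 < cutoff'
}

kept=$(printf '%s\n' "$transitions" | apply_cutoff 2050)
printf '%s\n' "$kept"
```

The 2051 entry is present in the input but absent from the filtered output, so anything that reads only the output has no way to learn about it.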
On 22/08/16 16:11, Paul Eggert wrote:
> Paul_Koning@Dell.com wrote:
>> If I understand right, this is proposed to be an additional kit, right? The tzdatannnn kits are expected to continue?
> Yes. I'd like to discontinue the old-style distribution at some point, but there's no rush.
Personally I only access the data pack. Although if there was some take-up on the provision of a tzdist service that would provide both diffs and a full history of changes, then there would not be a need. The lack of any traction on a service that can be used to identify when published data may need correcting is still a problem :(

-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
participants (7)
- Lester Caine
- Paul Eggert
- Paul_Koning@dell.com
- Robert Elz
- Steffen Nurpmeso
- Stephen Colebourne
- Tim Parenti