Request to provide .tar.gz tarballs for tzdb
Dear colleagues,

We’ve been asked by this correspondent to forward this request on to this list. In response to the original request we explained the lzip distribution is largely an amalgam of the gzip distributions, and they indicated they still wanted gzip to be considered specifically to increase “availability for low end systems as lzip isn’t that widely supported”:

Subject: Provide .tar.gz tarballs for tzdb
Date: Mon, 2 May 2022 13:23:15 +0300
From: "Firas Khalil Khana" <firasuke@gmail.com>
To: iana@iana.org

Hey there,

I've come to notice that both tzcode and tzdata have .tar.gz tarballs but tzdb only has .tar.lz (lzip) tarballs which isn't a commonly used format.

Would it be possible to provide gzipped tarballs for tzdb as well?

Thanks!

--
Kim Davies
VP, IANA Services, ICANN
President, PTI
> they indicated they still wanted gzip to be considered specifically to increase “availability for low end systems as lzip isn’t that widely supported”:

Unfortunately I'm not quite seeing the use case here. If it's a low-end system that supports only gzip-style compression, presumably it is an end-use system that merely needs access to the data and it should get tzdataXXXXX.tar.gz. In contrast, the tzdb tarball contains code intended for use by developers, who are expected to have access to software tools like cc and lzip that a low-end system might not have, and developers can easily gain access to these tools if they don't have them already.

If we were to start from scratch now, Zstandard (Internet RFC 8878) would be a good choice for distributing the data as it outperforms gzip on both speed and compression ratios (though it doesn't compress as well as the much-slower lzip), and my impression is that Zstandard is gaining in popularity. Perhaps we could start by encouraging the distribution of tzdata.zi in Zstandard format in addition to what we already distribute; to represent all of TZDB 2022a's data, tzdata.zi.zst needs just 24,604 bytes, which is nearly as good as tzdata.zi.lz's 22,460 bytes.

One more thing. We shouldn't assume that iana.org is a utility that billions of devices can blithely access for time zone data. If you're building an IoT application for low-end devices, please arrange to have your downstream servers serve up time zone data. And if you do that, you can use any compression format you like.
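The size comparison in this message can be tried out with a rough sketch like the one below. It is only an illustration under stated assumptions: the sample text is a made-up stand-in for tzdata.zi, Python's lzma module writes the .xz container rather than lzip's (both are built on the LZMA algorithm, so it is only a proxy), and Zstandard is left out because it is not in the standard library of most installed Python versions. The byte counts it prints will not match the figures quoted for the real 2022a data.

```python
import gzip
import lzma


def compressed_sizes(data: bytes) -> dict:
    """Return the size in bytes of `data` raw and under two stdlib codecs."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        # Assumption: .xz output is used here as a proxy for lzip, which
        # uses the same LZMA algorithm in a different container format.
        "xz": len(lzma.compress(data, preset=9)),
    }


# Hypothetical stand-in for tzdata.zi: repetitive, line-oriented text.
sample = "".join(
    f"Z Region/City{i} {i % 24}:00 - LMT 19{i % 100:02d}\n" for i in range(2000)
).encode("ascii")

sizes = compressed_sizes(sample)
for name, size in sizes.items():
    print(f"{name:>5}: {size} bytes")
```

To compare on real data, replace `sample` with the bytes of a locally downloaded tzdata.zi.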
Thanks for the info. I'm just having a difficult time understanding why both tzcode and tzdata use gzip, while tzdb uses lzip.

Why not use gzip for all? Or even better Zstandard for all?

Come to think of it, I think it would be better to switch to Zstandard as it's much more popular than lzip.

On Thu, May 5, 2022 at 6:21 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
> > they indicated they still wanted gzip to be considered specifically to increase “availability for low end systems as lzip isn’t that widely supported”:
>
> Unfortunately I'm not quite seeing the use case here. If it's a low-end system that supports only gzip-style compression, presumably it is an end-use system that merely needs access to the data and it should get tzdataXXXXX.tar.gz. In contrast, the tzdb tarball contains code intended for use by developers, who are expected to have access to software tools like cc and lzip that a low-end system might not have, and developers can easily gain access to these tools if they don't have them already.
>
> If we were to start from scratch now, Zstandard (Internet RFC 8878) would be a good choice for distributing the data as it outperforms gzip on both speed and compression ratios (though it doesn't compress as well as the much-slower lzip), and my impression is that Zstandard is gaining in popularity. Perhaps we could start by encouraging the distribution of tzdata.zi in Zstandard format in addition to what we already distribute; to represent all of TZDB 2022a's data, tzdata.zi.zst needs just 24,604 bytes, which is nearly as good as tzdata.zi.lz's 22,460 bytes.
>
> One more thing. We shouldn't assume that iana.org is a utility that billions of devices can blithely access for time zone data. If you're building an IoT application for low-end devices, please arrange to have your downstream servers serve up time zone data. And if you do that, you can use any compression format you like.
On 5/5/22 09:18, Firas Khalil Khana wrote:
> Why not use gzip for all?

gzip doesn't compress as well as lzip.

> Or even better Zstandard for all?
Zstandard wasn't available when lzip was chosen, and it doesn't compress as well as lzip does. The tzdb tarball is intended for developers, where lzip's relative slowness in compression and decompression is unimportant. Also, the benefits of switching to a better compression algorithm need to be weighed against the maintenance hassle of adapting to the switch.
I apologize for what is certainly an off-topic rant, but:
>> Why not use gzip for all?
>
> gzip doesn't compress as well as lzip.

But gzip compresses well enough, and storage is almost infinitely cheap.

> Also, the benefits of switching to a better compression algorithm need to be weighed against the maintenance hassle of adapting to the switch.
Absolutely. And that hassle is HUGE. Every few years, I have to replace my monitor, after putting my fist through it out of frustrated rage, because I've just downloaded some package I'm interested in, but first I have to go through the yak-shaving exercise of downloading and building a decompressor for the nifty new "better" decompression algorithm it uses. (I'm kidding about the broken monitor part, but just barely.)

But, clearly, I'm just a cranky old Luddite. Progress marches on.

Please, let's not clutter this list with a bunch of zealous endorsements of how Very Much Better the latest compression algorithms are, and how stubborn I am for failing to appreciate them.
On May 5, 2022, at 11:21, Paul Eggert via tz <tz@iana.org> wrote:
> One more thing. We shouldn't assume that iana.org is a utility that billions of devices can blithely access for time zone data. If you're building an IoT application for low-end devices, please arrange to have your downstream servers serve up time zone data. And if you do that, you can use any compression format you like.
A point worth emphasizing. For details of the fallout from an infamous and analogous design misstep (not directly involving TZ or IANA), see:

https://en.wikipedia.org/wiki/NTP_server_misuse_and_abuse#D-Link_and_Poul-He...

Cheers!

|---------------------------------------------------------------------|
| Frederick F. Gleason, Jr. | Chief Developer                         |
|                           | Paravel Systems                         |
|---------------------------------------------------------------------|
| All progress is based upon a universal innate desire of every       |
| organism to live beyond its income.                                 |
|                                                                     |
| -- Samuel Butler                                                    |
| "Notebooks"                                                         |
|---------------------------------------------------------------------|
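The advice quoted above, to have downstream servers hand out time zone data instead of pointing every device at iana.org, might look roughly like the sketch below. Everything in it is hypothetical: the CachedTzdata class, the FakeUpstream stand-in and the one-day refresh interval are illustrative names and values, not part of tzdb or of any IANA service. The idea is only that the upstream is contacted at most once per refresh interval, however many devices ask.

```python
import time
from typing import Callable, Optional


class CachedTzdata:
    """Serve a cached copy of upstream data, refetching at most once per TTL."""

    def __init__(self, fetch: Callable[[], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical upstream fetcher
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Optional[bytes] = None
        self._fetched_at = 0.0

    def get(self) -> bytes:
        """Return cached bytes, contacting upstream only when the copy is stale."""
        now = self._clock()
        if self._data is None or now - self._fetched_at >= self._ttl:
            self._data = self._fetch()
            self._fetched_at = now
        return self._data


class FakeUpstream:
    """Stand-in for a real download; counts how often it is contacted."""

    def __init__(self) -> None:
        self.count = 0

    def __call__(self) -> bytes:
        self.count += 1
        return b"# version 2022a\n"


upstream = FakeUpstream()
cache = CachedTzdata(upstream, ttl_seconds=86400)
for _ in range(1000):            # a thousand device requests
    cache.get()
print("upstream fetches:", upstream.count)
```

A real deployment would replace FakeUpstream with an actual download and could recompress the cached bytes in whatever format its devices handle best.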
participants (5)
- Firas Khalil Khana
- Fred Gleason
- Kim Davies
- Paul Eggert
- scs@eskimo.com