On 2020-08-20 14:37, Paul Eggert wrote:
> On 8/20/20 12:33 PM, Brian Inglis wrote:
>> They often have bigger issues with space for decoding, data storage, and use; one suggestion was a stream compressed list of file base names and POSIX strings from the last line of the files e.g.
>
> The main argument for POSIX strings is simplicity rather than saving storage or network capacity or even CPU time. Currently all the tzdb data can be compressed to 22,418 bytes, via 'lzip -9 tzdata.zi'. Although limiting tzdata to (compressed) names and POSIX strings would shrink that considerably, I doubt whether the data shrinkage is worth it, except perhaps for embedded applications where all timestamps are future timestamps.
Agreed - the issues are not on IoT dev SoC boards with GB of RAM and flash, running minimal BSD or Linux distros, but on the deployed SoCs, which may have only MB or even KB of ROM and RAM, and perhaps no flash. The code space and memory required for modern decompressors may be over the available memory budget on the chip, and perhaps over the available time budget for the deployed processor speed. The simple reference lzip decompressor lzd requires 5 MB of virtual memory on a 32-bit system.

Editing approaches such as using tzdata.zi, POSIX TZ strings, airport codes for locations, and eliminating all but a single location with identical current rule sets per offset, probably reduce size more and allow lighter-weight code for limited-memory systems.

For comparison, I took only the current continent data files and concatenated, tarred, zipped, and 7zed them to produce the tz.* files. I also picked the POSIX TZ strings from the last lines of the equivalent zoneinfo files, kept only the basename of each path, and manually eliminated truly redundant duplicated names to get tz-posix.log. I then compressed tz.tar, tz.txt, tzdata.zi, and tz-posix.log with the available compressors, all at the highest -9 compression level where available, including zip with bzip2, to get the following file and archive sizes:

750K tz.tar
741K tz.txt
329K tz.tar.Z
328K tz.txt.Z
251K tz.tar.gz
250K tz.txt.gz
225K tz.tar.zst
224K tz.txt.zst
219K tz.zip
202K tz.tar.7z
202K tz.tar.xz
202K tz.7z
202K tz.txt.7z
202K tz.txt.xz
202K tz.tar.lz
201K tz.txt.lz
192K tz.tar.zip
192K tz.tar.bz2
192K tz.txt.zip
192K tz.txt.bz2

109K tzdata.zi
 39K tzdata.zi.Z
 27K tzdata.zi.gz
 26K tzdata.zi.zst
 22K tzdata.zi.7z
 22K tzdata.zi.xz
 22K tzdata.zi.lz
 22K tzdata.zi.zip
 22K tzdata.zi.bz2

 12K tz-posix.log
5.4K tz-posix.log.Z
3.8K tz-posix.log.gz
3.7K tz-posix.log.zst
3.6K tz-posix.log.zip
3.6K tz-posix.log.7z
3.5K tz-posix.log.xz
3.5K tz-posix.log.lz
3.5K tz-posix.log.bz2

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]
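
For anyone wanting to repeat the tz-posix.log step, here is a rough Python sketch of how the POSIX TZ strings might be pulled from the last line of compiled zoneinfo files and then compressed. It is not the procedure used for the figures above, which was partly manual. The function names are made up for illustration, the default path /usr/share/zoneinfo is an assumption about the local install, only exact duplicate "name string" lines are dropped, and only the compressors in the Python standard library (zlib, bz2, xz) are tried, so the sizes will not match the listing exactly.

```python
import bz2
import lzma
import os
import sys
import zlib


def posix_tz_from_tzif(data: bytes) -> str:
    """Return the POSIX TZ string from a TZif footer, or "" if there is none."""
    # Version 1 files have no footer; version 2+ files end with "\n<TZ>\n".
    if data[:4] != b"TZif" or data[4:5] < b"2" or not data.endswith(b"\n"):
        return ""
    start = data.rfind(b"\n", 0, len(data) - 1)
    if start < 0:
        return ""
    return data[start + 1:-1].decode("ascii", errors="replace")


def collect_posix_lines(root: str) -> list:
    """Build sorted, de-duplicated "basename TZ-string" lines under root."""
    lines = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            with open(os.path.join(dirpath, name), "rb") as fh:
                tz = posix_tz_from_tzif(fh.read())
            if tz:  # skips text files, version 1 files, and empty footers
                lines.add(f"{name} {tz}")
    return sorted(lines)


def compressed_sizes(blob: bytes) -> dict:
    """Sizes in bytes of blob, raw and at the highest stdlib compression levels."""
    return {
        "raw": len(blob),
        "zlib -9": len(zlib.compress(blob, 9)),
        "bz2 -9": len(bz2.compress(blob, 9)),
        "xz -9": len(lzma.compress(blob, preset=9)),
    }


if __name__ == "__main__":
    # Assumed default location; pass another directory as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "/usr/share/zoneinfo"
    blob = "\n".join(collect_posix_lines(root)).encode("ascii", "replace")
    for label, size in compressed_sizes(blob).items():
        print(f"{size:8d} {label}")
```

Keeping only the basename means that copies of a zone under posix/ or right/ collapse to one line, but distinct zones sharing a basename and a rule set collapse too, which is part of why the original list needed manual checking.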