On Tue, Aug 30, 2016 at 5:18 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
> $ ls -l tz*.tar.*z*
> -rw-r--r-- 1 eggert eggert 202609 Aug 30 14:00 tzcode2016X.tar.gz
> -rw-r--r-- 1 eggert eggert 394169 Aug 30 14:00 tzdata2016X.tar.gz
> -rw-r--r-- 1 eggert eggert 426667 Aug 30 14:10 tzdb-2016X.tar.bz2
> -rw-r--r-- 1 eggert eggert 382991 Aug 30 14:00 tzdb-2016X.tar.lz

If the size of the data distribution is a concern, it looks like one can
achieve much better compression by simply discarding the comments in the
data files:

$ cat africa antarctica asia australasia \
      europe northamerica southamerica | wc -c
647830
$ cat africa antarctica asia australasia \
      europe northamerica southamerica | egrep -v '^\w*(#.*|$)' | wc -c
151231

Given the structured (low-entropy) nature of the resulting stream, it
compresses very well:

$ cat africa antarctica asia australasia \
      europe northamerica southamerica | egrep -v '^\w*(#.*|$)' | xz -c | wc -c
24600
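
For anyone who wants to repeat the measurement, the three pipelines can be wrapped in a small script. This is only a sketch: `strip_comments` and `report_sizes` are hypothetical helper names, not part of tzcode or tzdata, and the filter uses `[[:space:]]` where the commands above use `\w`, on the assumption that comment lines with leading whitespace should be dropped too. Because of that difference, the byte counts it reports need not match the figures quoted above exactly.

```shell
#!/bin/sh
# Hypothetical helpers (not part of tzcode/tzdata).

# Drop blank lines and comment-only lines from the named files,
# or from standard input when no files are given.
# Assumption: [[:space:]] is used instead of \w, so that comment
# lines with leading whitespace are removed as well.
# Trailing comments on data lines are left untouched.
strip_comments() {
    cat -- "$@" | grep -Ev '^[[:space:]]*(#.*)?$'
}

# Print byte counts for the raw files, the comment-stripped stream,
# and the comment-stripped stream after xz compression.
# Requires xz to be installed.
report_sizes() {
    # $(( ... )) normalizes wc output, which some systems pad with spaces.
    raw=$(( $(cat -- "$@" | wc -c) ))
    stripped=$(( $(strip_comments "$@" | wc -c) ))
    packed=$(( $(strip_comments "$@" | xz -c | wc -c) ))
    echo "raw=$raw stripped=$stripped xz=$packed"
}
```

After sourcing the script, it would be invoked as `report_sizes africa antarctica asia australasia europe northamerica southamerica`. For scale, by the numbers measured above, the stripped stream is about 23% of the original 647830 bytes, and the xz output is about 3.8% of it.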