On Fri, Aug 26, 2016, at 19:10, Antonio Diaz Diaz wrote:
Paul Eggert wrote:
please use a compression format that can be handled easily by Windows users as well. For instance, choose a format from the list that 7Zip can handle: http://www.7-zip.org/
Thanks for mentioning the problem. xz format is on 7-Zip's list; it's a tiny bit larger than lzip format for our data (0.3% larger for the draft tzdb tarball) but I suppose portability trumps this minor advantage.
Please, do not use xz for a new distribution format. The xz format is defective. See for example http://www.nongnu.org/lzip/xz_inadequate.html
Seems like a lot of fear, uncertainty, and doubt.

"Xz was designed as a fragmented format. Xz implementations may choose what subset of the format they support. For example the xz-embedded decompressor does not support the optional CRC64 check, and is therefore unable to verify the integrity of the files produced by default by xz-utils. Xz files must be produced specially for the xz-embedded decompressor."

Is this last sentence even true? Does xz-embedded fail to open the files, or does it merely skip the integrity check? Someone could write an lzip extractor that ignores the CRC; would that be an indictment of your format?

"It has room for 2^63 filters, which can then be combined to make an even larger number of algorithms. Xz reserves less than 0.8% of filter IDs for custom filters, but even this small range provides about 8 million custom filter IDs for each human inhabitant on earth. There is not the slightest justification for such egregious level of extensibility."

This seems like a criticism of the data type choice? I'm not sure what the point is.

"The 'file' utility does not provide any help:"

"Xz-utils can report the minimum version of xz-utils required to decompress a given file, but it must examine the file contents to find it out,"

How does 'file' work if not by examining the file content?

"Not only data at a random position are interpreted as the CRC. Whatever data that follow the bogus CRC will be interpreted as the beginning of the following field, preventing the successful decoding of any remaining data in the stream."

What are the odds that the bytes found there will coincidentally match the CRC of the short data? And won't a corrupted length field always prevent the successful decoding of any remaining data, regardless of how the CRC is stored relative to it?

----

Anyway, why even use a compressed format? Is the data large enough for it to matter?
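
P.S. To put a rough number on the "what are the odds" question above, here is a back-of-the-envelope sketch. It assumes (my assumption, not something the linked page states) that the bytes misread as the check field are uniformly random and independent of the data they are compared against; the function name is only for illustration.

```python
from fractions import Fraction


def accidental_match_probability(check_bits: int) -> Fraction:
    """Chance that check_bits uniformly random bits equal one fixed check value."""
    return Fraction(1, 2 ** check_bits)


# Assumed check widths: 32 bits for a CRC32 field, 64 bits for a CRC64 field.
for name, bits in (("CRC32", 32), ("CRC64", 64)):
    p = accidental_match_probability(bits)
    print(f"{name}: 1 in {p.denominator:,} (about {float(p):.2e})")
```

Under that assumption, a misplaced 32-bit check would pass by accident roughly once in four billion tries, and a 64-bit one far less often. Real corruption is not necessarily uniform, so treat this as an order-of-magnitude estimate only.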