On 2017-05-23 02:29, Paul Eggert wrote:
We are planning to ship a new subpackage for users who want to have access to the raw zone data files e.g. leapseconds
This is a good idea overall; thanks. Here are some comments and suggestions for improvement.
First, as a terminology issue, we need a better name than "raw zone data". The files we're talking about are ordinary text files, and "raw" has the wrong connotation for text. Also, the package name "tzdata-zonedata" is repetitive and somewhat confusing. Instead, how about a package name like "tzdata-info" or "tzdata-src"?
Or tzdata-source, although Debian packagers may balk at that usage, as RH packagers balk at tzdata-src.
Just as an example we would ship the following files:
LICENSE
The LICENSE file conveys misleading information for the files in question, as they are all public domain, so let's not install it. Of course if you want to install all the source files as a package, then LICENSE should be included along with all the other files in the tzdb tarball; but as I understand it, the goal here is to install only the data source.
A LICENSE file is almost mandatory nowadays for a package to be considered at all. For the avoidance of doubt, it states that all the files are public domain, with exceptions for some of the code.
africa antarctica asia australasia europe northamerica southamerica pacificnew etcetera backward systemv factory backzone
The installed source data should match the installed binary data, so the above list of files needs to be adjusted to match what's installed as binary data. For example, by default 'backzone' should be omitted since its data items are normally not installed.
Packagers may use any of back{ward,zone} and zone{,1970}.tab when generating their binary packages from the reference distribution, depending on their policy decisions and on the tradeoff between space and backward compatibility with earlier releases. Those who want one-for-one source can install the corresponding tzdata distribution source packages. The suggested approach implies that this (sub-)package is for those who want only the source data, for other uses.
Also, that's a long list of file names. I would rather not propagate implementation details like this list into the installation directory. Although the intent may be that "the raw zone data format may change", in practice what happens is that people depend on the format. So we might as well use a simple format rather than a complicated one; see below for a specific proposal.
iso3166.tab zone1970.tab zone.tab.
These files are already installed, and installing copies of them in a different directory would lead to operational problems. How about if we just leave them where they already are?
Do we know whether all, or any, of these are installed with every binary distribution? The proposed package is effectively a data-only developer package for those who do not use the reference distribution code and who, for various reasons, may not want the source code on their systems.
leapseconds leap-seconds.list
We need not and probably should not ship two text files that contain the same leap-second info in different representations. As we're considering removing leap-seconds.list anyway, let's just install 'leapseconds' and skip leap-seconds.list.
Some distributions ship neither, and Debian, for example, ships only the original source file leap-seconds.list, from what I can find. That file is generated, and is meant to be propagated, in conformance with the NTP crypto file guidelines. Under those guidelines the canonical name is leap-seconds.<timestamp>, where <timestamp> is the NTP timestamp of the file's generation time, which allows checking whether a copy is the latest authentic generation. The file is then soft-linked to a generic name, in this case leap-seconds.list.
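To make the timestamp convention concrete, here is a minimal Python sketch that converts NTP timestamps (seconds since 1900-01-01 UTC) to dates and reads the data lines of a leap-seconds.list style file. The function names are my own, and the sketch assumes the usual layout of that file: "#$" marks the last-update time, "#@" the expiry time, other "#" lines are comments, and each data line holds an NTP timestamp followed by the TAI-UTC offset. The "#$" and "#@" values in the sample are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# NTP timestamps count seconds from 1900-01-01 00:00:00 UTC.
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def ntp_to_utc(seconds):
    """Convert an NTP timestamp (integer seconds) to an aware UTC datetime."""
    return NTP_EPOCH + timedelta(seconds=seconds)


def parse_leap_seconds_list(text):
    """Return (updated, expires, entries) from leap-seconds.list style text.

    entries is a list of (effective datetime, TAI-UTC offset in seconds).
    Assumed layout: '#$ <ntp>' last update, '#@ <ntp>' expiry,
    any other '#' line is a comment, data lines are '<ntp> <offset> ...'.
    """
    updated = expires = None
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#$"):
            updated = ntp_to_utc(int(line[2:].split()[0]))
        elif line.startswith("#@"):
            expires = ntp_to_utc(int(line[2:].split()[0]))
        elif line and not line.startswith("#"):
            fields = line.split()
            entries.append((ntp_to_utc(int(fields[0])), int(fields[1])))
    return updated, expires, entries


# Small sample in the assumed layout (header values are illustrative).
SAMPLE_LEAP_LIST = """\
#$ 3676924800
#@ 3707596800
2272060800 10 # 1 Jan 1972
3692217600 37 # 1 Jan 2017
"""

if __name__ == "__main__":
    updated, expires, entries = parse_leap_seconds_list(SAMPLE_LEAP_LIST)
    print("updated:", updated, "expires:", expires)
    for when, offset in entries:
        print(when.date(), offset)
```

A consumer could compare the "#$" value, or the timestamp suffix in the canonical file name, against the copy it already holds to decide whether a newly fetched file is more recent.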
version
I would rather that we didn't recommend installing this file in the tzdb source, as that would be a maintenance hassle and anyway the file is not needed to generate the binary data. Similarly, I don't think the installation directory's name should contain the tzdb version number, as others have proposed. Versioning should be an independent aspect of operations, and it should not be our job.
It is the packager's job to ensure that some indication of version is available, and that indication is now in the version file. It is probably desirable for intended users (and necessary for packagers) to allow multiple releases to be installed simultaneously, with symlinks like ...-latest, or without any suffix, used operationally by admins, packagers, developers, or users to designate the currently preferred release.
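As a rough illustration of the side-by-side arrangement, the Python sketch below repoints a "latest" symlink at one of several installed release directories. The directory layout (tzdata-<release> under a common base) and the link name tzdata-latest are hypothetical, chosen only for this example, and the sketch assumes a POSIX filesystem where renaming over an existing symlink replaces it atomically.

```python
import os


def point_latest(base_dir, release, link_name="tzdata-latest"):
    """Make base_dir/link_name a symlink to base_dir/tzdata-<release>.

    Hypothetical layout: each release lives in its own directory named
    tzdata-<release>. The link is relative, so the tree can be relocated.
    """
    target = "tzdata-" + release
    if not os.path.isdir(os.path.join(base_dir, target)):
        raise FileNotFoundError(target)
    link = os.path.join(base_dir, link_name)
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)          # clear a stale temporary link
    os.symlink(target, tmp)     # build the new link under a temporary name
    os.replace(tmp, link)       # rename over the old link in one step
    return link
```

Calling point_latest(base, "2017b") after installing a new release would switch users of the unsuffixed or -latest name while leaving earlier releases in place for anyone pinned to them.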
With the above in mind, here's a simpler proposal: We optionally install two text files: 'leapseconds' and a new file 'tzdata.zi' containing the parts of asia, australasia, etc. that are actually used to create the binary data.
The idea is that 'zic tzdata.zi' exactly re-creates the installed binary data files, and that 'zic -l leapseconds tzdata.zi' does the same for data with leap seconds. Programs that want text rather than binary data can read tzdata.zi (and optionally, 'leapseconds'). Because tzdata.zi uses the documented zic format, third-party tools can parse it. (".zi" stands for "zoneinfo": ".zi" is to zic as ".c" is to cc.)
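To give a feel for what a third-party reader of such a file might look like, here is a minimal Python sketch that splits zic-format text into Rule, Zone, and Link records. It is an assumption-laden illustration, not a full zic parser: it treats "#" as starting a comment anywhere on a line (so quoted fields containing "#" are not handled), accepts abbreviated keywords, and does not interpret the fields. The function name and the sample data are made up for this example.

```python
def parse_zi(text):
    """Group zic-format lines into (zones, links, rules).

    zones: zone name -> list of field lists (first line plus continuations)
    links: link name -> target zone name
    rules: rule name -> list of field lists
    """
    zones, links, rules = {}, {}, {}
    current = None  # zone whose continuation lines we may still be reading
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split fields
        if not fields:
            continue
        key = fields[0].lower()
        if "rule".startswith(key):             # 'Rule' or an abbreviation
            rules.setdefault(fields[1], []).append(fields[2:])
            current = None
        elif "zone".startswith(key):
            current = fields[1]
            zones[current] = [fields[2:]]
        elif "link".startswith(key):           # Link TARGET LINK-NAME
            links[fields[2]] = fields[1]
            current = None
        elif current is not None:              # continuation of a Zone
            zones[current].append(fields)
        else:
            raise ValueError("unrecognized line: " + raw)
    return zones, links, rules


# Made-up data for illustration only; not real tzdb content.
SAMPLE_ZI = """\
# comment line
Rule EX 1990 max - Mar lastSun 2:00 1:00 S
Rule EX 1990 max - Oct lastSun 3:00 0 -
Zone Example/City 0:10:00 - LMT 1900
        1:00 EX EX%sT
Link Example/City Example/Alias
"""

if __name__ == "__main__":
    zones, links, rules = parse_zi(SAMPLE_ZI)
    print(sorted(zones), links, sorted(rules))
```

A single combined file keeps this kind of reader simple, since it has one input to open rather than a list of region file names to track.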
Packagers prefer to distribute source files as is. The intent of this (sub-)package is presumably to let users develop other products based on the data files, or possibly subset them further, as for embedded distributions. Keeping the original file names, sizes, and timestamps ensures that no files or data are missing from the package.
We can install these two text files by default into the same directory as the already-installed text files iso3166.tab, zone1970.tab, and zone.tab.
All the files are already available in the original and distribution source packages corresponding to the tzdata binary packages. Installing only the source data files into a distinct directory therefore implies that the requesters have other requirements, sufficient for a major vendor distributor to plan on releasing separate packages. These could be used by downstream language packagers (e.g. ghc, java, dotnet, mono, python, ruby) in their ...-tzdata-... packages, as well as by embedded distribution or application packagers (e.g. Oracle), who may have to maintain a strictly documented long-term audit trail from original source data to selected source data to generated binary data, to meet standards and for financial and government systems and applications. Requests about how to handle some of these requirements have been posted on the list.
-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada