Are tzdata2016g & tzdata-latest missing the VERSION file?

Hello,

Thanks to all the folks who have maintained and contributed to this vital database over the years.

Has the format changed somehow? If this and future database files were to break backwards compatibility arbitrarily, it would be an unwelcome, breaking change for potentially tens to hundreds of thousands of deployed endpoints. It would be an expensive burden, requiring many system software providers to repackage and push new software to untold millions to billions of endpoints, some of which are for all practical purposes immutable (such as embedded systems / SCADA). In short: it would make a lot of people mad because it would be very expensive and time-consuming to change.

For example, here's just one error from a widely deployed utility (TZUpdater 2.1) which worked using tzdata2016f, but not with tzdata-latest nor tzdata2016g:

$ java -jar tzupdater.jar -l https://www.iana.org/time-zones/repository/tzdata-latest.tar.gz
Using https://www.iana.org/time-zones/repository/tzdata-latest.tar.gz as source for tzdata bundle.
Source directory does not contain file: VERSION
$
$ java -jar tzupdater.jar -l https://www.iana.org/time-zones/repository/releases/tzdata2016f.tar.gz
Using https://www.iana.org/time-zones/repository/releases/tzdata2016f.tar.gz as source for tzdata bundle.
JRE has the same version as the tzupdater provided one (tzdata2016f).
$

If I might humbly suggest continuing to use what was a de facto standard for stability and least surprise, where at all possible.

Thanks,
Barry Allard

PS: I saw a related past discussion but can't respond to it right now.

Barry Allard wrote:
Has the format changed somehow? If this and future database files were to break backwards-compatibility arbitrarily...
The format of the tzdata files themselves is, I think, sacrosanct. There are no compatibility issues there. The recently-discussed issues all have to do with packaging and distribution, which affects system integrators and people who repackage and redistribute the data. As far as I know it does not affect the end systems at all. Also, I get the impression -- though I am not an expert on the issues -- that most of the recent compatibility issues have involved repackagers and redistributors who had, unbeknownst to the core tz maintainers, become dependent on accidental aspects of the core distribution which were not part of the defined interface. There was no reason to suspect that making an essentially inside-the-black-box change to the implementation would end up breaking someone's downstream repackaging mechanism, until the reports came back that they had. It's true, though, that once a resource has become as widespread and indispensable as the tz database is, there's a fine line to be walked between an appropriate level of conservatism so as not to perturb downstream consumers, versus not making any improvements at all.

On 09/30/2016 02:18 AM, Barry Allard wrote:
Has the format changed somehow?
Yes and no. No, because the documented part of the format did not change. Yes, because TZUpdater apparently relies on an accidental feature (a version-number macro setting in an uninstalled Makefile) that happened to be part of the tz source code starting in 2012f, a feature that never worked well and was adjusted to work in a different way in 2016g. I did not know about this undesirable dependency as I do not use TZUpdater. The TZUpdater folks did not object to the patches circulated earlier on this list that changed how version-numbering works. So there were breakdowns all around. We have been discussing ways to do better next time, starting by documenting better what is supposed to be stable and what is not guaranteed; see <http://mm.icann.org/pipermail/tz/2016-September/024225.html>. For version numbers, we cannot go back to the bad old way, as it is incompatible with the now-common practice of accessing data from the Git repository. We'll have to come up with a better way, and document it and support it, so that tools like TZUpdater that want a version string can get one. That way, the TZUpdater folks can modify their software to use the better way. This is currently a topic of discussion on the tz mailing list. Obviously whatever we come up with won't be working until the next tz release at the earliest.

On Sep 30, 2016, at 11:54 AM, Paul Eggert <eggert@cs.ucla.edu> wrote:
For version numbers, we cannot go back to the bad old way, as it is incompatible with the now-common practice of accessing data from the Git repository. We'll have to come up with a better way, and document it and support it, so that tools like TZUpdater that want a version string can get one.
I am not sure why Makefile is included in tzdata archives at all, but since it is there, why not replace "unknown" in the VERSION setting with the actual version at the time the archive is created?

On Fri, Sep 30, 2016 at 12:15 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 09/30/2016 09:04 AM, Alexander Belopolsky wrote:
why not replace "unknown" in the VERSION setting with the actual version at the time the archive is created?
That would mean the Makefile would differ from what's in the repository.
Why is that a problem? The current logic in the Makefile (git describe) is useless outside of a git clone. It can be modified to use the pre-defined value if it is not "unknown" so that "make version" works in a directory where tzdata is expanded. The advantage over distributing the version file is backward compatibility.
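
A minimal Python sketch of the fallback proposed here, for illustration only: use the version baked into the distributed file unless it is still the placeholder "unknown", and only then ask Git. The names `resolve_version` and `git_describe`, and the injectable `describe` parameter, are assumptions made for this sketch; the real logic would live in the Makefile.

```python
import subprocess
from typing import Callable


def git_describe() -> str:
    """Ask Git for a version string; this only works inside a Git clone."""
    result = subprocess.run(
        ["git", "describe"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def resolve_version(predefined: str,
                    describe: Callable[[], str] = git_describe) -> str:
    """Prefer a pre-defined version; fall back to Git when it is "unknown"."""
    if predefined != "unknown":
        # An expanded release tarball would take this branch.
        return predefined
    # A development checkout would take this branch.
    return describe()
```

Passing the Git lookup in as a parameter keeps the sketch usable in a directory that is not a Git clone, which is exactly the case under discussion.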

On 09/30/2016 10:22 AM, Alexander Belopolsky wrote:
That would mean the Makefile would differ from what's in the repository.
Why is that a problem?
It's normal to distribute files unmodified from a repository. That way, when people compare what they have to what is in the repository, they see just the changes they've made. This is what we've always done for the tz project. You're right that we could go the other way and distribute files that purposely differ from what's in the repository. However, I suspect this would cause more trouble than it cures, in the long run.

On Sep 30, 2016, at 1:35 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 09/30/2016 10:22 AM, Alexander Belopolsky wrote:
That would mean the Makefile would differ from what's in the repository.
Why is that a problem?
It's normal to distribute files unmodified from a repository. That way, when people compare what they have to what is in the repository, they see just the changes they've made. This is what we've always done for the tz project.
That makes sense, but isn't that just a matter of flipping two steps (or adding one) in the release creation process?
1. Pick a version number.
2. Update foo.txt with that version number.
3. Commit foo.txt.
4. Build the kit.
paul

On Fri, Sep 30, 2016 at 1:35 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
It's normal to distribute files unmodified from a repository.
There would be little harm in keeping the version in the repository's Makefile as well. The "version" target seems to ignore the VERSION= setting anyways. It is common for projects to keep the version being worked on somewhere in the project files. Logistically, the extra burden is just to update the value right after the release is tagged. This can be part of an automated release procedure.

On Fri, Sep 30, 2016 at 2:04 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
It is common for projects to keep the version being worked on somewhere in the project files.
I meant: "It is common for projects to keep the version being worked on WRITTEN somewhere in the project files."

On 09/30/2016 11:04 AM, Alexander Belopolsky wrote:
There would be little harm in keeping the version in the repository's Makefile as well.
I'm afraid I see some harm. Yes, bumping the release could be done automatically or semiautomatically as part of a release procedure, along the lines that Paul Koning suggested. But this would mean that the only time the repository Makefile's version number would be correct would be near the time of a release; at most other points of time the version number would be wrong, and this would be confusing. Also, it would mean that bumping the release would not be an atomic operation, as it is now. I'm also worried that downstream distributors will modify the data but still call it "2016g". We are considering installing the version number along with the other data; if we do that, this mislabeling problem will get worse because the wrong label will become part of the runtime environment, and having the wrong version number in a development Makefile will likely contribute to the mislabeling.

On Sep 30, 2016, at 2:53 PM, Paul Eggert <eggert@CS.UCLA.EDU> wrote:
... I'm also worried that downstream distributors will modify the data but still call it "2016g". We are considering installing the version number along with the other data; if we do that, this mislabeling problem will get worse because the wrong label will become part of the runtime environment, and having the wrong version number in a development Makefile will likely contribute to the mislabeling.
It's inherent in any open source project that this can happen. You're dependent on the competence of distributors, which may vary. I know that Red Hat, for example, is careful to mark local changes as such and to add a fourth part to the version number to carry their local change number. If some other distributor is not so competent, you can point out to them the error of their ways. For egregious cases, public shaming is an option.
paul

On Fri, Sep 30, 2016 at 2:53 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
Yes, bumping the release could be done automatically or semiautomatically as part of a release procedure, along the lines that Paul Koning suggested. But this would mean that the only time the repository Makefile's version number would be correct would be near the time of a release; at most other points of time the version number would be wrong, and this would be confusing.
This is not a new problem nor is it specific to this project. The approach that I described has been used by the CPython project for years. The only difference is that the version is specified in configure.ac and then propagated to Makefile through a series of build steps. See <https://github.com/python/cpython/blob/92fc774fcdb1fccd9eb520e7394d4463536cc...>.
Also, it would mean that bumping the release would not be an atomic operation, as it is now.
My suggestion is slightly different from Paul Koning's. I suggest that the version in Makefile is bumped *after* the release is tagged and the work on the next release starts. This is more in line with generally accepted versioning schemes where pre-release versions have the prefix corresponding to the next rather than the previous release. See for example <http://semver.org/#spec-item-9>. Bumping the release is still an atomic operation: once the tag is created a final release can be built. Anything built before that is a pre-release.

One common release engineering pattern is to have a release script which builds, runs tests, bumps the version by removing -dev from the version in a commit, tags it, builds/hashes/signs/releases artifacts, then bumps to the {{next minor ver}}-dev version in a new commit on master, and finally pushes commits and tag. As a result, master usually contains a -dev version (and is always stable) and releases are non-dev. Bonus points for gpg signing commits and tags, and using Signed-off-by.
On Fri, Sep 30, 2016 at 11:53 AM Paul Eggert <eggert@cs.ucla.edu> wrote:
On 09/30/2016 11:04 AM, Alexander Belopolsky wrote:
There would be little harm in keeping the version in the repository's Makefile as well.
I'm afraid I see some harm. Yes, bumping the release could be done automatically or semiautomatically as part of a release procedure, along the lines that Paul Koning suggested. But this would mean that the only time the repository Makefile's version number would be correct would be near the time of a release; at most other points of time the version number would be wrong, and this would be confusing. Also, it would mean that bumping the release would not be an atomic operation, as it is now.
I'm also worried that downstream distributors will modify the data but still call it "2016g". We are considering installing the version number along with the other data; if we do that, this mislabeling problem will get worse because the wrong label will become part of the runtime environment, and having the wrong version number in a development Makefile will likely contribute to the mislabeling.
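
The version-string part of the release-script pattern described at the top of this message can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes MAJOR.MINOR.PATCH version numbers with a "-dev" suffix, which is not how tz releases are named, and the function names are hypothetical.

```python
import re

DEV_SUFFIX = "-dev"
# Assumed layout for this sketch: MAJOR.MINOR.PATCH, digits only.
RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def release_version(dev_version: str) -> str:
    """Drop the -dev suffix to get the version that will be tagged."""
    if not dev_version.endswith(DEV_SUFFIX):
        raise ValueError(f"not a development version: {dev_version!r}")
    return dev_version[: -len(DEV_SUFFIX)]


def next_dev_version(released: str) -> str:
    """Return the next minor version, marked -dev, for the commit after the tag."""
    match = RELEASE_RE.match(released)
    if match is None:
        raise ValueError(f"not a release version: {released!r}")
    major, minor, _patch = (int(part) for part in match.groups())
    return f"{major}.{minor + 1}.0{DEV_SUFFIX}"
```

With these two steps, the main branch carries a -dev version at every commit except the one that is tagged as the release.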

Date: Fri, 30 Sep 2016 10:35:03 -0700
From: Paul Eggert <eggert@cs.ucla.edu>
Message-ID: <08763380-238a-9dc8-3058-3e564a33738c@cs.ucla.edu>

| This is what we've always done for the tz project.

For the vast majority of the life of the tz project, the repository wasn't available, so "always done" really means just the last few years...

And in any case, the repository is constantly changing. If I recall the sequence of events (and I know this was an unusual case), for 2016g the repository had already been updated before the official tarballs were in place, so the "compare and see local mods" approach was never going to work there, and cannot really be expected to work in normal circumstances. Anyone who needs to do this needs their own repo (NetBSD does this - for the data we make no local changes, so it doesn't add much, but the code is different) or at least to keep (or fetch again) a virgin distribution copy they can compare against.

The way NetBSD handles the "version in the repo" problem is to update the repo to the desired version string when a new version is branched (for tz that means released, as we have no updates to released versions, just new ones) and then, immediately after the release tarballs are made, update it again (given branches, one branch now says the equivalent of "2016g + patches", and the other "development for 2016h"). For the simpler distribution policies of the tz project, then just "2016g+" or something in the repo, immediately after the tarballs are made, would be fine. It informs people that what they're seeing from the repo is everything that was in 2016g, plus later patches (initially just that one single patch for the version string, of course), and that 2016h is not available yet.

| You're right that we could go the other way and distribute files that
| purposely differ from what's in the repository. However, I suspect this
| would cause more trouble than it cures, in the long run.

I doubt it, as in practice, after a few days anyway, that is what happens. Of course, whatever is distributed should be available from the repo with suitable extraction parameters (requesting a specific version - I don't know git well enough to know what the terminology is there). That is, the new version string should be checked in, the release made, and then an update checked in immediately after - not just extract the files from the repo, edit one, and then ship that.

kre

ps: for tzdata, NetBSD (currently) tracks the version info manually (we have not always updated to every new version), so we know which we have and which is current, and so which tarballs we should fetch. That allows us to use the version-labelled tarballs rather than "latest", so the version info that's in there, now that there is some, somewhere, is just ignored: we make use of the info before we have fetched it. Once it seems likely that the way the version is represented inside the tarballs has become stable, we will (or at least I intend to) check what we're expecting to get against what the fetched tarball claims to be, as one more validation step. For tzcode we don't much care what the version's label is; NetBSD's code base is different enough (though most of it is based upon the reference impl, originally) that claiming we're on 2016g code would be a misrepresentation. The code gets updated much less frequently than the data (it is considerably more work to merge), which is one reason I am not much of a fan of the combined tarball approach - mostly the code is just trash to discard.
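
The extra validation step mentioned in the postscript (comparing the expected version with what the fetched tarball claims to be) might look roughly like this in Python. It is a sketch under a loud assumption: that the tarball carries its version as a one-line text file, here given the hypothetical default name "version". At the time of this thread no such location had been fixed. The helper `demo_tarball` exists only so the sketch can be tried without downloading anything.

```python
import io
import tarfile


def claimed_version(tarball: bytes, member: str = "version") -> str:
    """Return the version string stored in `member` of an in-memory tarball."""
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:*") as archive:
        extracted = archive.extractfile(member)  # KeyError if member is absent
        if extracted is None:
            raise ValueError(f"{member!r} is not a regular file")
        return extracted.read().decode("ascii").strip()


def check_tarball(tarball: bytes, expected: str, member: str = "version") -> None:
    """Raise ValueError if the tarball does not claim the expected version."""
    found = claimed_version(tarball, member)
    if found != expected:
        raise ValueError(f"expected {expected!r}, tarball claims {found!r}")


def demo_tarball(version: str, member: str = "version") -> bytes:
    """Build a tiny gzip-compressed tarball in memory, for demonstration only."""
    payload = (version + "\n").encode("ascii")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(member)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()
```

For example, `check_tarball(demo_tarball("2016g"), "2016g")` returns silently, while passing "2016f" as the expected version raises ValueError.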

On 09/30/2016 12:38 PM, Robert Elz wrote:
For the simpler distribution policies of the tz project, then just "2016g+" or something in the repo, immediately after the tarballs are made
Although that's better than the pre-2016g tz versioning scheme, it's worse than the current scheme because it would use the same version number "2016g+" for every commit between 2016g and the next release. In contrast, the current tz versioning scheme updates the version number automatically with every commit, and this is more precise. For example, in the current development repository (commit 63207b74698aa9642f9c17f635c65b0114c6d191) the version number is 2016g-11-g63207b7, whereas in the previous commit the version number is 2016g-10-g373261b. In larger projects there may be reasons to use less-precise version numbers, as this avoids rebuilding everything that depends on the version number merely because some otherwise-unrelated component changes. Also there's some inertia, as older projects (such as CPython, mentioned by Alexander Belopolsky in this thread) developed their versioning schemes with repository software that didn't support automatic version-number generation as well as Git does. The tz project is small, though, and we don't have to worry about SCCS compatibility any more, so neither of these issues are significant for us.
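
The version strings quoted above follow the `git describe` layout of tag, number of commits since the tag, and abbreviated commit hash prefixed with "g". A small Python sketch of splitting such a string into its parts follows; the names `Described` and `parse_describe` are hypothetical, and the sketch assumes the tag itself does not end in that same "-N-gHASH" shape.

```python
import re
from typing import NamedTuple, Optional


class Described(NamedTuple):
    tag: str                         # most recent release tag, e.g. "2016g"
    commits_since_tag: int           # 0 when the commit is the tagged release
    abbreviated_hash: Optional[str]  # None when the commit is the tagged release


# "<tag>-<count>-g<hex>" is the form git describe uses between releases.
BETWEEN_RELEASES_RE = re.compile(
    r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<hash>[0-9a-f]+)$"
)


def parse_describe(text: str) -> Described:
    """Split a git-describe style version string into its components."""
    match = BETWEEN_RELEASES_RE.match(text)
    if match is None:
        # A bare tag such as "2016g" identifies the release commit itself.
        return Described(text, 0, None)
    return Described(match["tag"], int(match["count"]), match["hash"])
```

Applied to the two strings in the message, the commit counts come out as 11 and 10, which is what makes this scheme more precise than a fixed "2016g+" label.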

Date: Fri, 30 Sep 2016 13:06:25 -0700
From: Paul Eggert <eggert@cs.ucla.edu>
Message-ID: <631e15b0-31dd-be7f-36ec-1da444456075@cs.ucla.edu>

| Although that's better than the pre-2016g tz versioning scheme, it's
| worse than the current scheme because it would use the same version
| number "2016g+" for every commit between 2016g and the next release.

You obviously don't have to do that ("2016g+") if you don't want; a new version string for every commit is just fine (though personally I find the git auto-generated version strings obnoxious, not that I have any better scheme for auto-generated versions). The point was to explicitly set the version string to the release identifier (overriding any auto-generated version) to make the release tarballs, and then set it back to indicate a development version immediately after that, whatever that version string looks like and however frequently it changes. For NetBSD, we don't change the "2016g+patches" type version ID (any patches there are required to keep compat, so all you ever get is bug fixes), but the "on the way to 2016h" type version ID is updated as required; that one is not constant.

I don't think it matters (especially here where there are not hundreds of actual committers) whether the release generation is in any sense atomic. It is all going to happen with one "make tarballs" or whatever; whether that runs one git command or a dozen really doesn't matter (the only significant cost is that of safely generating the signatures while protecting the private key). All that is required is that there be an easy way to extract the released versions from the repo (and possibly/probably less easy ways to extract the intermediate versions should that ever be needed); any repo software and scheme, managed rationally, should allow all of that.

Also, nothing that you can do (possibly excepting just asking "please don't do that") can possibly solve the problem of redistributors changing the data without changing the version string - assuming the redistributors will all use git and the git automated version stuff will make it happen is just naive. The signatures allow people who actually see the tarballs to know whether they have the original versions or not, but for all those who just get the binary files installed, there's nothing rational to be done (we could generate and distribute them, and sign each one, but that only works if we somehow convince the implementations to actually verify the signatures - and the overhead of that, for everyone, would be absurd).

kre

On 09/30/2016 01:54 PM, Robert Elz wrote:
... the point was to explicitly set the version string to the release identifier (overriding any auto generated version) to make the release tarballs and then set it back to indicate a development version immediately after that
Oh, perhaps I wasn't clear, as the current tz scheme does that too. That is, if you go back in time by typing 'git checkout 2016g' and then type 'make', the automatically-generated version is simply '2016g' instead of a more-verbose version number generated for commits that are between one release and the next. The idea is to use Git's version-number generator rather than reinvent its wheel poorly.
I don't think it matters (especially here where there are not hundreds of actual committers) whether the release generation is in any sense atomic.
I agree that atomicity is a nicety and not an essential for us.
All that is required is that there be an easy way to extract the released versions from the repo
Currently, that's done with 'make version' which is a small front end for 'git describe'. It should be easy for developers who use a Git repository to run simple Git commands like that.
assuming the redistributors will all use git and the git automated version stuff will make it happen is just naive.
Yes, we don't want to assume that. The current version-numbering system assumes either the traditional tarball download that contains a version number built for you, or a Git repository where you generate the version number.
the only real requirement is that the location be fixed once and then not changed (just its value updated)
Yes, that's the main thing. We're not there yet, though.

Oh, I also meant to say in that last message that I don't think it really matters where in the distribution the version identifier is placed. Whether it is in a Makefile, or a version.h or version.txt, or even just in the NEWS, the only real requirement is that the location be fixed once and then not changed (just its value updated) - the redistributors will work out whatever is needed for their systems and add whatever mechanism is needed to get the version info from wherever it is put by the tz project to wherever they want to store it. That's what it sounds like TZUpdater was doing already - just without being aware that the data they were using wasn't something that was considered stable here.
kre

On 2016-09-30 13:38, Robert Elz wrote:
On Fri, 30 Sep 2016 10:35:03 -0700, Paul Eggert wrote:
This is what we've always done for the tz project.
The code gets updated much less frequently than the data (it is considerably more work to merge) which is one reason I am not much a fan of the combined tarball approach - mostly the code is just trash to discard.
Most distributions I've looked at ship zic, zdump, and tzselect as part of a libc-related base utils package. They selectively make and install those and their man pages, and sometimes some docs, using their own package build scripts to the standard places.
--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

On Fri, Sep 30, 2016, at 18:27, Brian Inglis wrote:
Most distributions I've looked at ship zic, zdump, and tzselect as part of a libc related base utils package. They selectively make and install those and their man pages, and sometimes some docs, using their own package build scripts to the standard places.
And are built from the glibc source (or their own respective repository, in the case of other OSes) rather than the tzcode source.

Random832 wrote:
And are built from the glibc source (or their own respective repository, in the case of other OSes) rather than the tzcode source.
Yes, and that's something that can be tricky when one talks about tz "version". For zic, zdump and tzselect the latest glibc is a copy of 2015g tzcode, and many distributions combine this with 2016f or 2016g tzdata. This is true for the machine I'm typing this on, which is running Ubuntu 16.04.1 LTS; on it, tzselect --version reports "zdump (Ubuntu GLIBC 2.23-0ubuntu3) 2.23", so it says it is a modified version of glibc 2.23; as it happens this part of glibc has not been modified other than by configuring the version number, bug-report address, etc. in the usual way. There's often not a single "version number" for the installed binary data files. On my machine the data source files came from 2016f and were compiled with zic that came from 2015g, and both version numbers are relevant to the installed binary data.

participants (8)
- Alexander Belopolsky
- Barry Allard
- Brian Inglis
- Paul Eggert
- Paul.Koning@dell.com
- Random832
- Robert Elz
- scs@eskimo.com