Some thoughts about the way forward
A modest proposal, if I may. Let's take it as read that Paul is going to stick to his guns about the May reorganization --- he's certainly shown no willingness to do otherwise. (I do accept that there are valid reasons for that move, even if I differ as to their importance.) What do we need to do to mitigate the undesirable consequences of that reorganization?
After chewing on this for a while, it seems to me that the really fundamental undesirable consequence is going to be inconsistent received versions of tzdb. Currently, the usage of backzone seems to be quite negligible [1]. But I fear it is certain that some platforms will start using backzone, in order to placate users who are negatively affected by the May changes. That will mean that those platforms will start showing different results from platforms that don't do that, for all zones in backzone (both old and new). I rate this outcome as disastrous on multiple levels, not least being the damage to the credibility of the tzdb project itself.
To avoid that, I already proposed [2] that we drop the links in "backward" that are overwritten by backzone. This would have the effect that a non-backzone build would not offer those zone names at all, instead of presenting them as having the data content of some other zone. This would eliminate the issue of different platforms presenting different results for the "same" zone.
If we do that, it seems probable that instead of some platforms adopting backzone, nearly all will, because otherwise their users will moan about their favorite zone being gone. I therefore further suggest that we make use of backzone be the default. Anybody who wants a "lean and mean" build can leave it out --- but they'll be presenting a clean subset of the data seen in the default build, rather than data that is different and known to be less good in some cases.
This approach suggests that we make some adjustments in how we think about things. I think we ought to rename backzone to "extended" or something like that. We'd view tzdb as offering a "base" set of zones that are considered in-scope per the rule about different-since-1970, plus an "extended" set of zones that are out-of-scope and are not maintained as carefully as the base. (This really is just applying different terminology to the existing understanding about how backzone is maintained.) The primary advantage of doing it this way, rather than the way we're handling backzone now, is that it's a lot clearer to end users what the status of the extended zones is, and we're not confusingly offering two different versions of those zones.
Assuming we make these changes before shipping git tip in its present form, we could expect that users in zones affected by the May changes would see no actual change in their TZ data. There would be a loss of data stability for users in the zones that were moved to backzone previously. I'm not terribly thrilled about that, but it seems like the least amount of damage to the least amount of people, compared to any other likely outcome. We could hopefully placate those users by pointing out that (1) the new data, while likely not perfect, is almost certainly a net improvement over what was shipped before, and (2) this is effectively reverting these zones to their pre-2015 state. We'd thus be acknowledging that the original implementation of backzone was a mistake, and undoing it with the least amount of side-effects we can manage.
I shall now retire and wait to be shot at ...
regards, tom lane
[1] https://mm.icann.org/pipermail/tz/2021-September/030572.html
[2] https://mm.icann.org/pipermail/tz/2021-September/030632.html
On 9/23/21 14:39, Tom Lane via tz wrote:
There would be a loss of data stability for users in the zones that were moved to backzone previously. I'm not terribly thrilled about that, but it seems like the least amount of damage to the least amount of people
No, quite the reverse is true. More timezones (and more people) would be affected by adopting backzone, than by what's in the development version now. For example, the population of Chongqing is about double that of Norway and Sweden combined. And backzone's Asia/Chongqing stands for a lot more than just the municipality of Chongqing. If you want to maximize data stability under the constraint of being fair, then the current development repository beats all other proposals I've seen so far.
If you want to maximize data stability under the constraint of being fair,
You are causing potentially a lot of compatibility issues for people around the globe, in a tremendous (and inexplicable) rush, because of your notion of 'fair'. That notion seems to be shared by few if any other people.
Could you explain *exactly* how people in Africa (for example) are disadvantaged by having pre-1970 data for Oslo and Berlin? How exactly are their lives made worse?
Mark
On 9/23/21 17:44, Mark Davis ☕ wrote:
You are causing potentially a lot of compatibility issues for people around the globe
Not really. We've done this several times before, and the compatibility issues were negligible.
Could you explain *exactly* how people in Africa (for example) are disadvantaged by having pre-1970 data for Oslo and Berlin?
How exactly are their lives made worse?
Do I really have to explain this? If we give COVID-19 shots to people in San Francisco but not Los Angeles, purely for reasons unrelated to public health, we are being unfair even though Los Angelenos' lives will not be made worse - they will die off at the same rate as before.
(Sorry about the gruesome analogy. I just spent the first day back teaching classes at UCLA - in person for the first time since March 2020, yay! - and COVID-19 measures are on everybody's minds.)
So your solution is to give COVID shots to nobody, instead of working to get COVID shots to more people? This may seem snide, but I'm just trying to understand the logic. In any event, that doesn't answer the question.
How exactly are their lives made worse?
My chief concern is instability and incompatibility. Can you please supply just one concrete example:
A person in Kenya will be better off by having Oslo merged with Berlin because: <fill in the blank>
Also, why is it so very, very important to make this change right now, even though essentially everyone who has looked at the situation says to wait? <fill in the blank>
Mark
On 9/23/21 9:00 PM, Mark Davis ☕ wrote:
My chief concern is instability and incompatibility
2021a1 will give you maximum stability and compatibility with 2021a, so you can use that if equity is not as much of a concern for you.
why is it so very, very important to make this change right now
The equity issue has been on the table for months, no other approach has been developed or tested, and the only other approaches proposed would be less stable and compatible than the already built-and-tested 2021b would be.
The equity issue was raised early this year, and we've delayed dealing with it for far too long already. Equity is a real issue of concern, and it's a bad look for us if we continue with a clearly-inequitable primary distribution when a fairer approach has long been implemented and available and nothing else is available.
This is mostly a disagreement about maintenance philosophy not end-user functionality, as the pre-1970 differences between 2021a1 and 2021b will be minor when considered from end users' point of view. We know this because we've made similar changes many times in previous releases.
I'll be happy to collaborate on building something that will accommodate our philosophical differences in later releases, and have already proposed specific (though not-yet-installed) working code that goes a long way toward doing that. Having had some experience with writing and testing that code, I have confidence that this technical approach will succeed if the community wants to work together on this. Of course there will be issues - among other things, the at-least-one-Zone-per-country-code philosophy is even more unstable/incompatible than 2021b will be - but they're clearly solvable.
On Fri, 24 Sept 2021 at 08:09, Paul Eggert via tz <tz@iana.org> wrote:
On 9/23/21 9:00 PM, Mark Davis ☕ wrote:
My chief concern is instability and incompatibility
2021a1 will give you maximum stability and compatibility with 2021a, so you can use that if equity is not as much of a concern for you.
Except that the name 2021a1 is *not* compatible with https://data.iana.org/time-zones/tz-link.html
A requirement of "in order to keep stability and compatibility, you have to adopt a new naming scheme which is incompatible with the old one" seems counterproductive to me. You're basically asking people to choose between stability of naming scheme or stability of data - and unnecessarily, IMO.
If you just changed the names to (say) 2021b and 2021b-equity (or 2021b1 if you prefer) then:
- Those who value stability and compatibility get what they want, at the cost of equity
- Those who value equity get what they want, at the cost of stability and compatibility
Jon
On 9/24/21 12:32 AM, Jon Skeet wrote:
Except that the name 2021a1 is *not* compatible with https://data.iana.org/time-zones/tz-link.html
I don't know of any software that will break due to the name 2021a1. If you know of one, we could issue 2021b and 2021c. I had already considered doing that but thought that it would be more problematic than what I ended up proposing, for reasons that I hope are obvious.
Making the equitable distribution look like an optional flaky branch is not the way to move forward. It shouldn't be considered optional because fairness ought to be one of our core principles. And it shouldn't be considered flaky because it's not flaky; we've done this sort of thing many times before without significant incident.
On Fri, 24 Sept 2021 at 08:51, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 9/24/21 12:32 AM, Jon Skeet wrote:
Except that the name 2021a1 is *not* compatible with https://data.iana.org/time-zones/tz-link.html
I don't know of any software that will break due to the name 2021a1. If you know of one, we could issue 2021b and 2021c. I had already considered doing that but thought that it would be more problematic than what I ended up proposing, for reasons that I hope are obvious.
Concrete issue with Android, from a mail from Almaz Mingaleev:
For Android having 2021a1 and 2021b would be inconvenient. Because there are hardcoded places which expect that tzdata version is exactly 5 characters. And we can't update that code along with time zone files.
(I acknowledge that there's already the potential for problems there if 2021aa is ever needed, but there would at least be fairly clear warning that that was coming - by the time we got to 2021p or so, I'd expect it to be looked at seriously.)
Concern, though less specific, from Derick Rethans:
I can't remember the last time there was a number after the version letter (so 2004, at the latest), and none of the tooling that I've been involved with will know how to handle this.
Speculative concern from Florian Weimer:
I'm slightly worried that people have grown to depend on the \d+[a-z]+ format for version numbers, so this choice of version might break some things.
Known breakage reported by Paul Ganssle (the second sentence):
This is not exactly a guarantee, but 2021a1 does violate that nomenclature, which will likely break scripts that rely on it (I have scripts that actively assert that the version numbering follows this convention, for example).
So that's a mixture of "we know X and Y will break, and we think other things may do as well". Is that sufficient evidence to convince you that 2021a1 is problematic?
Making the equitable distribution look like an optional flaky branch is not the way to move forward. It shouldn't be considered optional because fairness ought to be one of our core principles. And it shouldn't be considered flaky because it's not flaky; we've done this sort of thing many times before without significant incident.
I wouldn't use the word "flaky" but I *would* say it's experimental, in that we don't genuinely know the impact of a change of *this* scale. I would suggest that we've done "this sort of thing" on a smaller scale.
To look at it another way: what's the absolute urgency here? If you *just* release 2021b as "2021a + Samoa" then we're basically in the position we were in before. If there's a pressing need for the "equitable distribution" to be released, then presumably there was before - but it hasn't been released. It feels to me like "the need to get onto the equitable distribution" (ideally with community consensus, which I think is lacking at the moment) and "the need to get the Samoa changes out" are orthogonal - whereas your proposed releases conflate the two.
Jon
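For illustration only, the two informal constraints reported in this message (Android's five-character expectation and the \d+[a-z]+ shape Florian Weimer mentions) can be checked mechanically. This is a minimal Python sketch, not code from Android or any other project named here; the function name `check_version` and the dictionary keys are invented for the example.

```python
import re

# Shape quoted from Florian Weimer's message: digits followed by letters.
DIGITS_THEN_LETTERS = re.compile(r"\d+[a-z]+")

# Length that Android's hardcoded checks are reported to expect.
ANDROID_LENGTH = 5


def check_version(name: str) -> dict:
    """Report which of the two informal constraints a version name meets."""
    return {
        "matches_digits_then_letters": DIGITS_THEN_LETTERS.fullmatch(name) is not None,
        "is_five_characters": len(name) == ANDROID_LENGTH,
    }


for candidate in ("2021a", "2021b", "2021a1", "2021aa"):
    print(candidate, check_version(candidate))
```

Under these assumptions 2021a1 fails both checks, while 2021aa fails only the length check, which matches the distinction drawn above between a name that breaks the nomenclature and one that merely gets longer.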
On 9/24/21 1:08 AM, Jon Skeet wrote:
we don't genuinely know the impact of a change of *this* scale. I would suggest that we've done "this sort of thing" on a smaller scale.
OK, how about if I scale back the current round of link-merging, so that it's on the scale of what we've done in previous releases? I would not at all be happy with such an approach since it would delay the release of an equitable solution, but if this approach will help reach consensus I can prepare a patch along those lines. The idea would be to finish the job in the next few releases.
On Fri, 24 Sept 2021 at 09:39, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 9/24/21 1:08 AM, Jon Skeet wrote:
we don't genuinely know the impact of a change of *this* scale. I would suggest that we've done "this sort of thing" on a smaller scale.
OK, how about if I scale back the current round of link-merging, so that it's on the scale of what we've done in previous releases? I would not at all be happy with such an approach since it would delay the release of an equitable solution, but if this approach will help reach consensus I can prepare a patch along those lines. The idea would be to finish the job in the next few releases.
I'd need to see exactly what's proposed, as would others, but I'm happy to look. However, I wouldn't expect us to be able to get opinions on that *immediately* - whereas we really need a change for Samoa ASAP. So I personally (I'm not going to attempt to speak for anyone else) would be happy with a plan of:
- Release 2021b as "2021a + Samoa" today
- Create the patch with the limited changes and submit it for community approval, with an eye to releasing that (and making progress in your view) in a few weeks, whether or not there are any other data changes to be released.
What we're still missing is any way of taking the community view, admittedly. I'd suggest that a GitHub PR with the change could allow "voting" via reactions (as well as comments on the details, of course). I suspect that's clearer than just mailing list discussions - although of course it requires folks to have GitHub accounts.
Jon
On 9/24/21 2:10 AM, Jon Skeet wrote:
I'd need to see exactly what's proposed, as would others, but I'm happy to look.
OK, I checked old releases and a common pattern was to replace nine Zones with Links (done, for example, in 2014g, 2014h, and 2014j). This was the median number of replaced Zones per release, in releases between 2013 and 2015. So a patch is attached, to do something similar this time for 2021b.
Although the size of this patch may appear forbidding, it greatly shrinks the distance between 2021a and 2021b. The effect of this patch is to revert all the disputed changes, except for the changes to the following nine Zones which are still moved to 'backzone': Africa/Accra, America/Atikokan, America/Blanc-Sablon, America/Creston, America/Curacao, America/Nassau, America/Port_of_Spain, Antarctica/DumontDUrville, Antarctica/Syowa.
These Zones were chosen by using the following shell command:
    git diff 2021a..39df8c8b22605f59f71213cfb92b3fd321e31d3c backzone | awk '/^+Zone/ && $2 != "Pacific/Kanton" && $2 != "Pacific/Enderbury" && $2 != "Africa/Blantyre" { print $2 }' | head -n 9 | sort
In other words: look at all the Zones moved to backzone since 2021a by the current development database, in the order that 'git diff' mentions the Zone, and choose the first nine Zones (excluding Pacific/Kanton and Pacific/Enderbury, which are part of a change not being disagreed about, and Africa/Blantyre, which was already in 'backzone').
I view this as a major concession on my part, as it will cause 2021b to fix the equity problem only part way, meaning that we'll need about four releases to finish the job if we make similarly-sized fixes in the future. However, this does address the concern about the number of changes by 2021b to pre-1970 timestamps, as we've dealt with changes of this size multiple times before. Plus, this approach restores 2021a's Europe/Oslo and Europe/Stockholm, the two Zones most-commonly mentioned. (I did not tweak the shell script just to save those two Zones for now.) The intent is to move forward with this work in later releases.
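As a reading aid, the selection logic of the awk filter in the command above can be restated in Python. This is a rough sketch under stated assumptions, not the script that was actually run: the function name `first_moved_zones` is invented, and `SAMPLE_DIFF` is a made-up fragment with elided fields standing in for real 'git diff' output.

```python
# Zones the message says were deliberately skipped.
EXCLUDED = {"Pacific/Kanton", "Pacific/Enderbury", "Africa/Blantyre"}


def first_moved_zones(diff_text: str, limit: int = 9) -> list:
    """Take the first `limit` added Zone names in diff order, then sort them."""
    picked = []
    for line in diff_text.splitlines():
        fields = line.split()
        # An added Zone line in a unified diff starts with "+Zone";
        # the second whitespace-separated field is the Zone name.
        if line.startswith("+Zone") and len(fields) > 1 and fields[1] not in EXCLUDED:
            picked.append(fields[1])
    # Mirrors "head -n 9 | sort": truncate first, sort afterwards.
    return sorted(picked[:limit])


# Hypothetical diff fragment; the trailing fields are placeholders, not tzdb data.
SAMPLE_DIFF = """\
+# illustrative comment line
+Zone America/Nassau <fields elided>
+Zone Africa/Blantyre <fields elided>
-Zone Europe/Oslo <fields elided>
+Zone Africa/Accra <fields elided>
"""

print(first_moved_zones(SAMPLE_DIFF))
```

Because truncation happens before sorting, the result depends on the order in which the diff lists the Zones, which is the property the message relies on when it says the first nine were taken as 'git diff' mentions them.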
we really need a change for Samoa ASAP.
And we now get to throw Jordan into the mix.... I realize that there's not time to do careful review of this patch before the Samoa change needs to take effect. However, since this patch is simply a reversion, it should be OK. I have applied my usual tests to the resulting tzdb and it succeeds. This is pretty much how similar patches were tested back in 2014 etc., so this should be good to go. I realize that there will still be disagreement about the specific contents of the release 2021b that results from this patch (along with the usual administrivia). We can address that in a followup patch as quickly as may be, and continue to work forward to a better approach.
I guarantee that the name 2021a1 will break tooling and software at Apple. There are tools and OS software with regular expressions based on the documented nomenclature. We can’t release something with a name like that. Please stick to the documented nomenclature. Deborah
Specifically, this is what is documented and this is what our software supports:
Since 1996, each version has been a four-digit year followed by lower-case letter (a through z, then za through zz, then zza through zzz, and so on)
Deborah
On 9/24/21 1:08 PM, Deborah Goldsmith via tz wrote:
Specifically, this is what is documented and this is what our software supports:
Yes, thanks to you (and to others) for bringing this up. It's become clear that we should stick to that naming convention for our next release. That being said, we will likely need more flexibility in the not-too-distant future. Please take a look at the email I sent to Almaz Mingaleev (who raised the same topic). https://mm.icann.org/pipermail/tz/2021-September/030715.html
AFAIK Apple software doesn’t have a problem with length, just the pattern, which currently follows the spec. If you propose changing it, there needs to be a new spec, and a considerable period of time (preferably a year) to adjust to it. Thanks, Deborah
On 9/24/21 7:32 PM, Deborah Goldsmith wrote:
AFAIK Apple software doesn’t have a problem with length, just the pattern, which currently follows the spec.
Yes, thanks for bringing that up. We should consider that in any version number variant we might want to use in the near future. If I read the spec correctly, the pattern Apple uses should be equivalent to the POSIX extended regular expression '[0-9]{4}z*[a-z]' in the C locale. Am I right about Apple's pattern? Also, does Apple software insist that version numbers must be in strict order? For example, does it require that the version after '2021zz' must be either '2021zza' or a version Ya where Y is a four-digit year greater than 2021? or could the next version after '2021zz' be anything matched by the abovementioned pattern? The spec is silent on this subject.
If you propose changing it, there needs to be a new spec, and a considerable period of time (preferably a year) to adjust to it.
Yes, quite so. Plus, the above details should be nailed down. (And the spec should be extended so that it clearly allows year numbers greater than 9999 - even though there should be *plenty* of time before that particular flexibility is needed....)
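To make the pattern question concrete, here is a small Python sketch that applies the extended regular expression suggested above, '[0-9]{4}z*[a-z]', to a few candidate names. It assumes the whole name must match and uses Python's `re` module rather than a POSIX regex library; the function name `matches_spec_pattern` is invented, and nothing here is known to reflect Apple's actual tooling.

```python
import re

# The pattern proposed in the message as a reading of the documented spec.
SPEC_PATTERN = re.compile(r"[0-9]{4}z*[a-z]")


def matches_spec_pattern(name: str) -> bool:
    """True if the entire name matches the proposed pattern."""
    return SPEC_PATTERN.fullmatch(name) is not None


for candidate in ("2021a", "2021z", "2021za", "2021zz", "2021aa", "2021a1", "12021a"):
    print(candidate, matches_spec_pattern(candidate))
```

Two things stand out under this reading: 2021aa is rejected even though it satisfies the looser digits-then-letters shape discussed earlier in the thread, and a five-digit year such as 12021a is rejected, which is the gap the closing remark about years greater than 9999 points at.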
On Sat, 25 Sept 2021 at 06:45, Paul Eggert via tz <tz@iana.org> wrote:
On 9/24/21 7:32 PM, Deborah Goldsmith wrote:
AFAIK Apple software doesn’t have a problem with length, just the pattern, which currently follows the spec.
Yes, thanks for bringing that up. We should consider that in any version number variant we might want to use in the near future.
If I read the spec correctly, the pattern Apple uses should be equivalent to the POSIX extended regular expression '[0-9]{4}z*[a-z]' in the C locale. Am I right about Apple's pattern?
Also, does Apple software insist that version numbers must be in strict order? For example, does it require that the version after '2021zz' must be either '2021zza' or a version Ya where Y is a four-digit year greater than 2021? or could the next version after '2021zz' be anything matched by the abovementioned pattern? The spec is silent on this subject.
Might I suggest we consider creating a simply-sortable order that naturally allows for more than 26 releases? That could be as simple as starting with 2023aa, 2023ab...2023az, 2023ba, 2023bb...2023bz, 2023ca, etc. I really hope we never have to cope with more than 676 releases in a year :)
If you propose changing it, there needs to be a new spec, and a considerable period of time (preferably a year) to adjust to it.
Yes, quite so. Plus, the above details should be nailed down. (And the spec should be extended so that it clearly allows year numbers greater than 9999 - even though there should be *plenty* of time before that particular flexibility is needed....)
I suspect that doing so is likely to be burdensome for very little benefit. I suspect that designing a filename format expected to be stable for longer than 7000 years is futile, whereas there are simplicity benefits in having a fixed year length. If there's ever a sub-group mailing list set up for this (or however we want to discuss it) I'd be happy to be part of it. Jon
On 9/24/21 10:56 PM, Jon Skeet via tz wrote:
Might I suggest we consider creating a simply-sortable order that naturally allows for more than 26 releases?
The current spec does do that, since it allows 2021za as the natural successor of 2021z, and this kind of sequence allows as many versions per year as you like, and is sortable using lexicographic order. A downside of the current approach is that the size of the version number grows linearly (not logarithmically) with the number of versions per year. I hope we don't run into a practical problem with that limitation.... Rearguard tarballs already start with lines like "# version 2021a-rearguard". That's not quite the same thing as the version number in a tzdb distribution tarball's name, although it's closely related.
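The two properties claimed above (lexicographic sortability and linear growth of the name) can be checked with a short Python sketch. It assumes the documented sequence a through z, then za through zz, then zza through zzz, and so on; the helper names `next_suffix` and `versions` are invented for the example.

```python
def next_suffix(suffix: str) -> str:
    """Successor in the documented sequence: a..z, za..zz, zza..zzz, ..."""
    last = suffix[-1]
    if last == "z":
        # 'z' -> 'za', 'zz' -> 'zza': the run of z's grows by one.
        return suffix + "a"
    return suffix[:-1] + chr(ord(last) + 1)


def versions(year: int, count: int) -> list:
    """The first `count` version names for a year, in release order."""
    names, suffix = [], "a"
    for _ in range(count):
        names.append(f"{year}{suffix}")
        suffix = next_suffix(suffix)
    return names


names = versions(2021, 60)
print(names[24:28])              # the names around the z -> za boundary
print(sorted(names) == names)    # release order equals lexicographic order
print(len(versions(2021, 366)[-1]))  # name length at one release per day
```

Each block of 26 releases adds one more 'z' to the name, which is the linear growth mentioned above; a plain string sort still reproduces release order because a shorter name that is a prefix of a longer one sorts first.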
Paul Eggert via tz said:
Might I suggest we consider creating a simply-sortable order that naturally allows for more than 26 releases?
The current spec does do that, since it allows 2021za as the natural successor of 2021z, and this kind of sequence allows as many versions per year as you like, and is sortable using lexicographic order.
A downside of the current approach is that the size of the version number grows linearly (not logarithmically) with the number of versions per year. I hope we don't run into a practical problem with that limitation....
I can't see us ever wanting to do more than one release a day on average (i.e. I can see the need to update a release on the same day, but that would be rare), so we want to allow for up to 366 releases per year. So why not change the sequence to go a to y, then zaa to zzz? That way you never have one ID being the prefix of another.

-- 
Clive D.W. Feather | If you lie to the compiler,
Email: clive@davros.org | it will get its revenge.
Web: http://www.davros.org | - Henry Spencer
Mobile: +44 7973 377646
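As a rough check of the "a to y, then zaa to zzz" suggestion, here is a small Python sketch. The function name `proposed_suffixes` is invented for this illustration. It enumerates the suffixes, then confirms that there are enough for 366 releases, that they sort lexicographically in release order, and that none is a prefix of another.

```python
from itertools import product
from string import ascii_lowercase


def proposed_suffixes():
    """Suffixes in the suggested order: a..y, then zaa..zzz (hypothetical helper)."""
    singles = list(ascii_lowercase[:25])  # 'a' through 'y'
    triples = ["z" + first + second
               for first, second in product(ascii_lowercase, repeat=2)]
    return singles + triples


suffixes = proposed_suffixes()

# 25 single letters plus 26 * 26 three-letter suffixes.
assert len(suffixes) == 25 + 26 * 26
assert len(suffixes) >= 366

# Release order and lexicographic order coincide.
assert suffixes == sorted(suffixes)

# No suffix is a proper prefix of a different suffix.
assert not any(a != b and b.startswith(a) for a in suffixes for b in suffixes)

print(len(suffixes), suffixes[24], suffixes[25], suffixes[-1])
```

The prefix-free property holds because a bare "z" is never issued: every suffix is either one letter other than "z" or exactly three letters starting with "z".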
On 9/25/21 2:59 PM, Clive D.W. Feather wrote:
So why not change the sequence to go a to y, then zaa to zzz? That way you never have one ID being the prefix of another.
I had also been considering "z followed by any sequence of lower-case-letters" as one way to extend the version-numbering scheme in practice, while continuing to conform to the specific wording of the current guidelines. Let's keep this idea in our back pocket in case it might be useful in the long term. In the short run, though, it sounds dubious due to the unfortunate Android compatibility issue Almaz mentioned in <https://mm.icann.org/pipermail/tz/2021-September/030621.html>, where Android allows only 5-character version numbers and so cannot go past "2021z".
On Fri, 24 Sept 2021 at 08:09, Paul Eggert via tz <tz@iana.org> wrote:
2021a1 will give you maximum stability and compatibility with 2021a, so you can use that if equity is not as much of a concern for you.
No we can't. Any downstream project based off the GitHub source repo will only see the tag on the main branch (2021b) and not get the stability. Any project with fixed version naming conventions (eg. Android) will not be able to adopt 2021a1. By making the choice you have, every downstream project that wants stability is forced to make some kind of change in the next 24 hours.
The equity issue was raised early this year, and we've delayed dealing with it for far too long already. Equity is a real issue of concern, and it's a bad look for us if we continue with a clearly-inequitable primary distribution when a fairer approach has long been implemented and available and nothing else is available.
Paul, please consider that I and others consider the tip of main branch to be considerably *less* equitable than 2021a. Until you can accept that your definition of equity is not the only one on this list we won't move forward.
This is mostly a disagreement about maintenance philosophy not end-user functionality, as the pre-1970 differences between 2021a1 and 2021b will be minor when considered from end users' point of view. We know this because we've made similar changes many times in previous releases.
I've already made clear that the tip of main would be utterly disastrous for Joda-Time users. Yesterday I was forced to make a release that tries to block adoption of your proposed 2021b, but unfortunately the new release won't be picked up by the very application teams that will be most affected. There is a big difference between merging the time zones of two African or Caribbean countries and merging the time zones of two European countries. Whether equitable or not, the reality is a lot more of the world's economy will be affected this time.
I'll be happy to collaborate on building something that will accommodate our philosophical differences in later releases
I am also happy to collaborate on a solution. But I sense absolutely no willingness on your part to provide the time and space to make that happen. Apart from Tim Parenti, I don't think any list member wants you to release 2021a1/2021b in the manner you proposed. Especially when there is a low-risk alternative that allows us to progress matters more sensibly. Does that opposition really not matter at all?

I believe the mailing list has spoken very clearly - move the current main to be a branch, reset main to 2021a and release 2021a with minimal Samoa changes (as 2021b). Then take a long weekend and then put forward proposed solutions. (I have two or three proposed solutions ready and waiting, but I don't want to publish until we are past the Samoa release).

Stephen
On 9/24/21 1:33 AM, Stephen Colebourne via tz wrote:
There is a big difference between merging the time zones of two African or Caribbean countries and merging the time zones of two European countries. Whether equitable or not
It's clearly not equitable. We should not make special exceptions for Norway and Sweden while having China, southeast Asia, Africa, etc. follow the same rules as everyone else. There is no timekeeping justification for this; it's purely a political decision and it's a terrible look for us. I've just sent a suggestion that would back off many of the changes you're objecting to. I view this as being a big concession on my part, because I'll now have to defend making a gradual fix to the equity problem. Would that suggestion be acceptable to you? Here it is again, if you haven't seen it in the recent blizzard of emails:
OK, how about if I scale back the current round of link-merging, so that it's on the scale of what we've done in previous releases? I would not at all be happy with such an approach since it would delay the release of an equitable solution, but if this approach will help reach consensus I can prepare a patch along those lines. The idea would be to finish the job in the next few releases.
I really am trying to find a compromise here (even if it's a compromise that nobody likes :-). However, a compromise works only if the other side accepts it.
On Fri, 24 Sept 2021 at 09:58, Paul Eggert <eggert@cs.ucla.edu> wrote:
It's clearly not equitable. We should not make special exceptions for Norway and Sweden while having China, southeast Asia, Africa, etc. follow the same rules as everyone else. There is no timekeeping justification for this; it's purely a political decision and it's a terrible look for us.
I'm not disagreeing with the notion that Norway and Sweden should follow the same rules as everyone else. I am saying that current rules result in what I consider to be an inequitable outcome where Berlin is favoured over Oslo. I understand that you don't see that as inequitable, but please try to understand that I do. (There are also separate, but important downstream issues of stability and breakages that need handling in a more considered manner)
I've just sent a suggestion that would back off many of the changes you're objecting to. I view this as being a big concession on my part, because I'll now have to defend making a gradual fix to the equity problem. Would that suggestion be acceptable to you? Here it is again, if you haven't seen it in the recent blizzard of emails:
OK, how about if I scale back the current round of link-merging, so that it's on the scale of what we've done in previous releases? I would not at all be happy with such an approach since it would delay the release of an equitable solution, but if this approach will help reach consensus I can prepare a patch along those lines. The idea would be to finish the job in the next few releases.
I really am trying to find a compromise here (even if it's a compromise that nobody likes :-). However, a compromise works only if the other side accepts it.
I have a good final position state for tzdb in my head, but I don't want to write it to the list until everything is calmer. (My proposal meets both your and my equity viewpoints). My request is that 2021b contains no link merging so we can discuss things calmly over the next couple of weeks. You won't need to defend a gradual link-merge if we agree in advance on a long-term solution, which I hope you see as a benefit. (This also avoids the risks associated with an immediate fork, whether internal eg 2021a1 or external) Stephen
On Fri, 24 Sept 2021 at 10:15, Stephen Colebourne <scolebourne@joda.org> wrote:
OK, how about if I scale back the current round of link-merging, so that it's on the scale of what we've done in previous releases? I would not at all be happy with such an approach since it would delay the release of an equitable solution, but if this approach will help reach consensus I can prepare a patch along those lines. The idea would be to finish the job in the next few releases.
I really am trying to find a compromise here (even if it's a compromise that nobody likes :-). However, a compromise works only if the other side accepts it.
As an indication of good faith, I also commit to changing Joda-Time's two-decade long behaviour wrt treating Links as full aliases if we can delay the link merging and release a simple 2021a+Samoa/Jordan. Stephen
On 24.09.21 11:15, Stephen Colebourne via tz wrote:
On Fri, 24 Sept 2021 at 09:58, Paul Eggert <eggert@cs.ucla.edu> wrote:
It's clearly not equitable. We should not make special exceptions for Norway and Sweden while having China, southeast Asia, Africa, etc. follow the same rules as everyone else. There is no timekeeping justification for this; it's purely a political decision and it's a terrible look for us.
I'm not disagreeing with the notion that Norway and Sweden should follow the same rules as everyone else. I am saying that current rules result in what I consider to be an inequitable outcome where Berlin is favoured over Oslo. I understand that you don't see that as inequitable, but please try to understand that I do.
(There are also separate, but important downstream issues of stability and breakages that need handling in a more considered manner)
I've just sent a suggestion that would back off many of the changes you're objecting to. I view this as being a big concession on my part, because I'll now have to defend making a gradual fix to the equity problem. Would that suggestion be acceptable to you? Here it is again, if you haven't seen it in the recent blizzard of emails:
OK, how about if I scale back the current round of link-merging, so that it's on the scale of what we've done in previous releases? I would not at all be happy with such an approach since it would delay the release of an equitable solution, but if this approach will help reach consensus I can prepare a patch along those lines. The idea would be to finish the job in the next few releases.
I really am trying to find a compromise here (even if it's a compromise that nobody likes :-). However, a compromise works only if the other side accepts it.
I have a good final position state for tzdb in my head, but I don't want to write it to the list until everything is calmer. (My proposal meets both your and my equity viewpoints).
Before you put out a "good final position", could you please respond to Paul on his compromise proposal. Eliot
On Fri, 24 Sept 2021 at 11:05, Eliot Lear via tz <tz@iana.org> wrote:
Before you put out a "good final position", could you please respond to Paul on his compromise proposal.
I think I did just that. The compromise of a short delay is good. The compromise position of smaller chunks of link-merging is unnecessary if we can agree on a better alternate solution. Once a better approach is agreed we can lay down a specific plan in advance to roll it out, effectively side-stepping the negative aspects of multiple separate link-merges. As such, it makes no sense to have any link merging in 2021b. (Paul's compromise position is unclear as to whether he intends to have no link-merging in 2021b, or just a smaller amount. Given the immediate damage a link-merge causes Joda-Time's millions of users, I don't have the ability to compromise on the contents of 2021b wrt link-merging. But I do have the ability to seek a consensus solution that can be rolled out in a planned manner, even if that requires changes to Joda-Time.) Stephen
The Unicode ICU team discussed the proposed changes in the TZDB in their meeting earlier this week and we are reporting the consensus here. This is an initial report, since time is short.

Members are very concerned about the downstream impact, and the inevitable compatibility mismatches between different implementations. While the pre-1970 data may not seem important to some people, the instability caused by its removal can be considerable, and last for years to come. Even if the TZDB provides a way to produce data compatible with 2021a or before by option, this may introduce confusion. For example, an OS packager may pick a default data package with pre-1970 rules merged, while a library packager like ICU may pick a variant with pre-1970 data preserved. Previously, multiple implementations used a single data set so there is general consistency. With the proposed plan, there could be differences in results before 1970 between multiple implementations, causing problems everywhere - e.g. Linux and Java, ICU and Linux, etc.

If the change is made, here are the probable steps that would happen in ICU, based on the two areas that would be affected.

*1. Dropping zone IDs from the zone.tab.*

The main impact here is that a lot of implementations rely on the mapping of zone IDs to ISO country codes. ICU already has an internal exception table that contains certain (zone IDs, ISO code) mappings that retain information that used to be in zone.tab. We would extend that table to add all of the zones dropped by the proposed change. We would probably also move the data and the rest of zone.tab to CLDR, so that we have a public, structured set of data in XML and JSON. This would effectively clone the zone.tab data. That way, implementations could use the zone.tab information to maintain the difference between Europe/Oslo and Europe/Berlin. That is, while the internal software might map Europe/Oslo to Europe/Berlin via a Link to get rules for evaluation, the library would still treat Europe/Oslo as a separate ID from Europe/Berlin.

*2. Removing the pre-1970 rules*

Or rather, moving the pre-1970 data into a file that is mixed in with other data that is not currently used. ICU doesn't want to get into the business of maintaining a fork of the TZDB, but if another major industry player took that role on, then ICU would consider adopting it so that the data is maintained.

Mark Davis, Unicode Consortium President and Chair of CLDR
Yoshito Umaoka, Vice Chair of ICU

On Fri, Sep 24, 2021 at 3:38 AM Stephen Colebourne via tz <tz@iana.org> wrote:
On Fri, 24 Sept 2021 at 11:05, Eliot Lear via tz <tz@iana.org> wrote:
Before you put out a "good final position", could you please respond to Paul on his compromise proposal.
I think I did just that.
The compromise of a short delay is good. The compromise position of smaller chunks of link-merging is unnecessary if we can agree on a better alternate solution. Once a better approach is agreed we can lay down a specific plan in advance to roll it out, effectively side-stepping the negative aspects of multiple separate link-merges. As such, it makes no sense to have any link merging in 2021b.
(Paul's compromise position is unclear as to whether he intends to have no link-merging in 2021b, or just a smaller amount. Given the immediate damage a link-merge causes Joda-Time's millions of users, I don't have the ability to compromise on the contents of 2021b wrt link-merging. But I do have the ability to seek a consensus solution that can be rolled out in a planned manner, even if that requires changes to Joda-Time.)
Stephen
[retitling from "Re: [tz] Some thoughts about the way forward"] On 9/24/21 6:49 AM, Mark Davis ☕ wrote:
The Unicode ICU team discussed the proposed changes in the TZDB in their meeting earlier this week and we are reporting the consensus here.
Thanks for taking the time to write about this, as I have not had the time to follow how ICU deals with timezones. I am fuzzy, for example, on the relationship between CLDR and ICU when it comes to timezone data.
Members are very concerned about the downstream impact, and the inevitable compatibility mismatches between different implementations.
Yes, and similar concerns were expressed by others. We eventually muddled through by generating just one new tzdb version, 2021b, which nobody really likes but which I hope avoids a tzdb fork for now.
If the change is made, here are the probable steps that would happen in ICU, based on the two areas that would be affected.
*1. Dropping zone IDs from the zone.tab.*
Which zone.tab is this? I didn't see a zone.tab in the ICU4C 69.1 source or data tarballs. The ICU4C data tarball has a file data/misc/zoneinfo64.txt that contains zoneinfo64:table(nofallback) defining Names as an array of tzdb Zone and Link names and some other strings; is that what is meant by zone.tab?
implementations rely on the mapping of zone IDs to ISO country codes. ICU already has an internal exception table that contains certain (zone IDs, ISO code) mappings that retains information that used to be in zone.tab. We would extend that table to add all of the zones dropped by the proposed change.
I'm not quite following, since no names in 2021a were dropped in 2021b. All that happened is that some names were changed from Zones to Links. Although I'm probably barking up the wrong tree, I don't see why the abovementioned Names array would need to worry about Zone-to-Link changes. For example, America/Creston was changed from a Zone in 2021a to a Link in 2021b, but the Names array contains both Zones and Links so its "America/Creston" entry should not need to change.
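To illustrate the point about names surviving a Zone-to-Link change, here is a hedged Python sketch. The function `defined_names` is made up for this example, and the input lines are simplified stand-ins with placeholder offsets, not copies of the real tzdb source files. The only format assumption is zic's documented layout, where a Zone line carries its name in the second field and a Link line reads "Link TARGET LINK-NAME".

```python
def defined_names(source_lines):
    """Collect every identifier introduced by a Zone or Link line."""
    names = set()
    for line in source_lines:
        # Drop trailing comments, then split on whitespace.
        fields = line.split("#", 1)[0].split()
        if not fields:
            continue
        if fields[0] == "Zone":
            names.add(fields[1])   # Zone NAME STDOFF RULES FORMAT [UNTIL]
        elif fields[0] == "Link" and len(fields) >= 3:
            names.add(fields[2])   # Link TARGET LINK-NAME
    return names


# Simplified, illustrative input only; the offsets are placeholders.
before = [
    "Zone America/Phoenix -7:00 - MST",
    "Zone America/Creston -7:00 - MST",
]
after = [
    "Zone America/Phoenix -7:00 - MST",
    "Link America/Phoenix America/Creston",
]

# The set of available names is the same either way.
assert defined_names(before) == defined_names(after)
print(sorted(defined_names(after)))
```

A consumer that only lists names sees no difference; only a consumer that distinguishes Zones from Links, or that reads the per-zone rules, would notice the change.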
We would probably also move the data and the rest of zone.tab to CLDR, so that we have a public, structured set of data in XML and JSON. This would effectively clone the zone.tab data.
Sorry, I'm a bit lost here too. Isn't ICU data mostly sourced from CLDR? That's what <https://unicode-org.github.io/icu/userguide/icu_data/#icu-and-cldr-data> implies.
That way, implementations could use the zone.tab information to maintain the difference between Europe/Oslo and Europe/Berlin. That is, while the internal software might map Europe/Oslo to Europe/Berlin via a Link to get rules for evaluation, the library would still treat Europe/Oslo as a separate ID from Europe/Berlin.
That sounds reasonable. What did ICU and/or CLDR do when tzdb made similar Zone-to-Link changes in previous tzdb releases? For example, Australia/Currie was changed from a Zone in 2020d to a Link in tzdb 2020e. Is there any reason for ICU to treat 2021b's Zone-to-Link changes differently than it treated 2020e's similar change?
Paul Eggert via tz said:
It's clearly not equitable. We should not make special exceptions for Norway and Sweden while having China, southeast Asia, Africa, etc. follow the same rules as everyone else. There is no timekeeping justification for this; it's purely a political decision and it's a terrible look for us.
Who is pushing you to do this urgently? Because I don't see anyone on the list doing so.

Earlier this year I had to go through an "appropriate language" change to a 3200 page document which was *not* just replacing "master" and "slave" with new (and in my opinion better) terms. It was a big exercise and nobody expected it to happen in a week.

Without prejudice to the meaning of "equitable" in this context (though I'd like to see a definition in TZDB terms), provided that there is a clear programme to get the database into an equitable form, any reasonable person will accept that it's not trivial and may take some time to complete, particularly because we don't know the effects of these changes on our downstream users.

I am going to add myself to those who are saying that 2021b should be 2021a plus the Samoa changes *only*. Bring your present top of tree to be the draft 2021c and let's all talk about where we go after the weekend.

-- 
Clive D.W. Feather | If you lie to the compiler,
Email: clive@davros.org | it will get its revenge.
Web: http://www.davros.org | - Henry Spencer
Mobile: +44 7973 377646
By the fact that you are completely dodging the question ("A person in Kenya will be better off by having Oslo merged with Berlin because:..."), my only conclusion can be that in fact you have no answer; that you cannot spell out a concrete case where a person in Kenya is disadvantaged by the merger of Oslo with Berlin. Why make a change if it doesn't help anyone, and has the potential to do considerable damage?

Secondly, your analogy with COVID is even more misplaced. Instead of "If we give COVID-19 shots to people in San Francisco but not Los Angeles, purely for reasons unrelated to public health...", what you are proposing is analogous to *de-vaccinating* all the people in San Francisco so that they are on the same level as Los Angeles. That is a pretty extreme form of "equity".

I'm sure you're trying to do the right thing here, but the nearly unanimous response to your proposal should give you pause, and give us all time to consider issues that have been raised. During my career I've seen many cases where it seemed that some small quick change would have some benefit, but it had to be retracted when it blew up in our faces. Being a core piece of technology for all computers, mobile phones, etc. with many, many players all needing to work in concert so that everything interoperates is a heavy responsibility, and not one to be taken lightly. Please don't rush into this.

Mark

On Fri, Sep 24, 2021 at 12:09 AM Paul Eggert <eggert@cs.ucla.edu> wrote:
On 9/23/21 9:00 PM, Mark Davis ☕ wrote:
My chief concern is instability and incompatibility
2021a1 will give you maximum stability and compatibility with 2021a, so you can use that if equity is not as much of a concern for you.
why is it so very, very important to make this change right now
The equity issue has been on the table for months, no other approach has been developed or tested, and the only other approaches proposed would be less stable and compatible than the already built-and-tested 2021b would be.
The equity issue was raised early this year, and we've delayed dealing with it for far too long already. Equity is a real issue of concern, and it's a bad look for us if we continue with a clearly-inequitable primary distribution when a fairer approach has long been implemented and available and nothing else is available.
This is mostly a disagreement about maintenance philosophy not end-user functionality, as the pre-1970 differences between 2021a1 and 2021b will be minor when considered from end users' point of view. We know this because we've made similar changes many times in previous releases.
I'll be happy to collaborate on building something that will accommodate our philosophical differences in later releases, and have already proposed specific (though not-yet-installed) working code that goes a long way toward doing that. Having had some experience with writing and testing that code, I have confidence that this technical approach will succeed if the community wants to work together on this. Of course there will be issues - among other things, the at-least-one-Zone-per-country-code philosophy is even more unstable/incompatible than 2021b will be - but they're clearly solvable.
On 9/24/21 7:06 AM, Mark Davis ☕ wrote:
By the fact that you are completely dodging the question ("A person in Kenya will be better off by having Oslo merged with Berlin because:..."), my only conclusion can be that in fact you have no answer
Yes I do have an answer. Fairness is not about whether A would be better off if a change is made. It's about how A is treated relative to B. If the US government routinely made payments of $1000 to every registered Republican voter, that'd be unfair regardless of the fact that Democrats would live their lives just as before. It's pretty weird that I have to explain this, to be honest.
Not sure if my previous msg went through, apologies if this is a duplicate. Paul Eggert via tz <tz@iana.org> writes:
On 9/24/21 7:06 AM, Mark Davis ☕ wrote:
By the fact that you are completely dodging the question ("A person in Kenya will be better off by having Oslo merged with Berlin because:..."), my only conclusion can be that in fact you have no answer
Yes I do have an answer. Fairness is not about whether A would be better off if a change is made. It's about how A is treated relative to B.
I'm trying my best to understand your point of view. Although I would prefer the tzdb to adopt more of a linux kernel philosophy of never breaking userspace as first principle, I don't see how this is promoting fairness either. Perhaps I read too many of Abraham Lincoln's works when I was younger, but I've always thought of fairness as equal opportunity rather than absolute equality. In that context: 1) Everyone in the world has the opportunity to add zones and data. 2) No one is at a systematic disadvantage to doing so. I'm having trouble seeing where tzdb is being unfair to anyone at all under this definition. Some volunteer added historical zones for X, Y, Z. Other volunteers can add zones for places that they are interested in. Everyone has equal opportunity to do so. As far as I can see right now, this change seems more unfair to people who have done the work to investigate and provide the data for certain zones. Perhaps I'm naive to the situation, but using an equal opportunity definition of fairness seems pretty apolitical to me and doesn't seem like there's a systematic disadvantage to anyone? I'm sure there would be different areas of concern, but perhaps the delta could be smaller. And, of course, perhaps I missed some relevant examples among the many emails. Under the absolute equality definition of fairness, which I believe you are going for, there are various issues with data stability. Are there any reasons not to adopt the definition of fairness being equal opportunity rather than absolute equality? That could perhaps maximize both stability and fairness, and also keep the focus on improving the data. I've been trying to understand the situation better; could someone point me to the relevant policy change document / rationale? I'm not sure where to look.
On 9/24/21 9:03 AM, TJ wrote:
I'm having trouble seeing where tzdb is being unfair to anyone at all under this definition. Some volunteer added historical zones for X, Y, Z.
That volunteer was me.
Other volunteers can add zones for places that they are interested in. Everyone has equal opportunity to do so.
We can't have a free-for-all in which anyone can add a Zone for (say) Kosciusko County, Indiana to the primary database on the grounds that it differed from Indianapolis back in 1906. Even covering Indiana alone would require hundreds of Zones, the data would be practically impossible to verify, and the overall utility to end users would be negative (due to the resulting complication and confusion).
As far as I can see right now, this change seems more unfair to people who have done the work to investigate and provide the data for certain zones.
The equity issues are not about tzdb contributors. They're about whether tzdb is fair (and appears to be fair) to users. And even if one is truly worried about contributors, I contributed the vast majority of Zones. Tim Parenti has also contributed recently. We don't at all mind having our contributions be in 'backzone' rather than in some other file. I expect future contributors will be similar.
could someone point me to the relevant policy change document / rationale?
The current guidelines are here: https://data.iana.org/time-zones/theory.html#naming
On Sep 24, 2021, at 1:27 PM, Paul Eggert via tz <tz@iana.org> wrote:
We can't have a free-for-all in which anyone can add a Zone for (say) Kosciusko County, Indiana to the primary database on the grounds that it differed from Indianapolis back in 1906. Even covering Indiana alone would require hundreds of Zones, the data would be practically impossible to verify, and the overall utility to end users would be negative (due to the resulting complication and confusion).
And if all those Zones were added (this is, as I understand it, *adding* Zones, *not* converting Links to Zones), would Apple, OpenStreetMaps, Ubuntu (assuming they don't just use OpenStreetMaps and the tz shape files generated from them), etc. start using them? Apple might be using some backzone entries *now*, but I'm not sure they'd sign up to set up their city list and map information to handle all of them. Perhaps a policy of, at minimum, "we will not create any new Zones for pre-1970 differences", if that's not *already* a firm policy, would make it clearer that tzdb isn't "the" time zone database for all of time, it's only going to be "the" time zone database for 1970 and beyond, and if anybody needs correct information on time offsets prior to 1970 for all locations in a given area, they're welcome to develop it and maintain it themselves.
On Thu, Sep 23, 2021 at 6:07 PM Paul Eggert <eggert@cs.ucla.edu> wrote:
Not really. We've done this several times before, and the compatibility issues were negligible.
As a user that will be affected by this, I am more concerned by the practical aspects. I make extensive use of the tzdata between 1900-1970 and have data that depends on the stability of the offset calculation from the timezone stored in the form of America/Los_Angeles, etc. I have never used the backzone file, so I do not want to import extensive amounts of extraneous zones, but I would like to keep the newest changes post-1970. How would I be able to re-build the exact tz data from 1900-1970 that is currently available with future incremental changes? Without knowing that, I will have to maintain my own fork forever and I'm sure others will, too.
If you want to maximize data stability under the constraint of being fair
Do I really have to explain this? If we give COVID-19 shots to people in San Francisco but not Los Angeles, purely for reasons unrelated to public health, we are being unfair even though Los Angelenos' lives will not be made worse - they will die off at the same rate as before.
I'm not sure I understand. Although I would prefer the tzdb to adopt more of a linux kernel philosophy of never breaking userspace as first principle, I don't see how this is promoting fairness either. Perhaps I read too many of Abraham Lincoln's works when I was younger, but I've always thought of fairness as equal opportunity rather than absolute equality. In that context: 1) Everyone in the world has the opportunity to add zones. 2) No one is at a systematic disadvantage to doing so. I'm not particularly interested in a philosophical discussion about fairness, which is much better suited to an in-person conversation, but I just can't see how tzdb is being unfair to anyone at all. Some volunteer added historical zones for X, Y, Z. Other volunteers can add zones for places that they are interested in. Everyone has equal opportunity to do so. As far as I can see right now, this change seems more unfair to people who have done the work to investigate and provide the data. Is there any reason not to adopt the definition of fairness being equal opportunity? That would maximize both stability and fairness, and also keep the focus on improving the data.
On Sep 23, 2021, at 10:25 PM, TJ via tz <tz@iana.org> wrote:
As a user that will be affected by this, I am more concerned by the practical aspects. I make extensive use of the tzdata between 1900-1970 and have data that depends on the stability of the offset calculation from the timezone stored in the form of America/Los_Angeles, etc.
I have never used the backzone file, so I do not want to import extensive amounts of extraneous zones, but I would like to keep the newest changes post-1970.
"I have never used the backzone file" means that you are currently getting Toronto rules for America/Montreal, so presumably that's OK. And presumably you have only been using that data since 2015-04-11 or so, otherwise you *didn't* have stability for the duration of whatever use you have - prior to the 2015c release, America/Montreal was in the northamerica file rather than in the backzone file. (America/Montreal is not the only item that moved from the main files to backzone in the past, so "America/Montreal isn't a problem" doesn't necessarily mean that there are no problems.) Note also that if you want stability of the offset calculation, and somebody else wants errors in pre-1970 timezone information (which may well be present in that information) fixed, one of you will have to lose.
On Sep 24, 2021, at 8:06 PM, Guy Harris via tz <tz@iana.org> wrote:
Note also that if you want stability of the offset calculation, and somebody else wants errors in pre-1970 timezone information (which may well be present in that information), one of you will have to lose.
...and note that you might have had to avoid 2021b *even if it didn't move anything to backzone*; to quote Paul's announcement:
Correct many pre-1993 transitions, fixing entries originally derived from Shanks, Whitman, and Mundell. The fixes include:
- Barbados: standard time was introduced in 1911, not 1932; and DST was observed in 1942-1944
- Cook Islands: In 1899 they switched from east to west of GMT, celebrating Christmas for two days. They (and Niue) switched to standard time in 1952, not 1901.
- Guyana: corrected LMT for Georgetown; the introduction of standard time in 1911, not 1915; and corrections to 1975 and 1992 transitions
- Kanton: uninhabited before 1937-08-31
- Niue: only observed -11:20 from 1952 through 1964, then went to -11 instead of -11:30
- Portugal: DST was observed in 1950
- Tonga: corrected LMT; the introduction of standard time in 1945, not 1901; and corrections to the transition from +12:20 to +13 in 1961, not 1941

Additional fixes to entries in the 'backzone' file include:
- Enderbury: inhabited only 1860/1885 and 1938-03-06/1942-02-09
- The Gambia: 1933 and 1942 transitions
- Malawi: several 1911 through 1925 transitions
- Sierra Leone: several 1913 through 1941 transitions, and DST was NOT observed in 1957 through 1962

(Thanks to P Chan, Michael Deckers, Alexander Krivenyshev and Alois Treindl.)
Until we reach a fixed point, where all the past tzdb data matches past reality, *if* we ever reach such a fixed point, for past data - and perhaps especially pre-1970 data - "stability" is unlikely to be the tzdb's middle name.
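For a user in TJ's position, one way to find out whether a given release disturbs stored data is to compare UTC offsets between two builds of the same zone. The following is only a sketch under stated assumptions: `offset_changes` is a hypothetical helper, not part of tzdb or of any library, and the caller must supply the two zone definitions and the local times to check.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Iterable, List, Tuple


def offset_changes(
    old: tzinfo, new: tzinfo, local_times: Iterable[datetime]
) -> List[Tuple[datetime, timedelta, timedelta]]:
    """Return (local_time, old_offset, new_offset) for every naive local
    time whose UTC offset differs between the two zone definitions.

    Hypothetical helper for illustration only.
    """
    changes = []
    for t in local_times:
        # Attach each definition to the same wall-clock time and compare.
        before = t.replace(tzinfo=old).utcoffset()
        after = t.replace(tzinfo=new).utcoffset()
        if before != after:
            changes.append((t, before, after))
    return changes


if __name__ == "__main__":
    # Fixed offsets stand in for two builds of one zone (assumption:
    # real use would load compiled zone files from each release).
    old_build = timezone(timedelta(hours=-8))
    new_build = timezone(timedelta(hours=-7))
    for row in offset_changes(old_build, new_build, [datetime(1950, 1, 1)]):
        print(row)
```

With real data, the two arguments could be loaded from the compiled output of each release using `zoneinfo.ZoneInfo.from_file` (Python 3.9 or later). Any non-empty result marks timestamps whose interpretation would shift on upgrade.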
Paul Eggert <eggert@cs.ucla.edu> writes:
If you want to maximize data stability under the constraint of being fair, then the current development repository beats all other proposals I've seen so far.
Several other people have already made this point in varying words, but: why are you so insistent that the only way to improve fairness is to make the default contents of tzdb strictly worse? Why not strive to make it strictly better, instead?

I will agree that there's some room for debate as to whether enabling all of backzone by default is "strictly better". Some of the data in it is probably wrong. But a lot of it is probably right, too --- in particular, a whole lot of what you shoved in there since 2021a is very well attested. In any case, the data that we are currently substituting by default is *certainly* wrong. Moreover, getting that data back into mainstream circulation would improve our chances of finding and fixing remaining errors.

I just finished looking through git-tip backzone to get a better idea of exactly what's in there. I count 113 non-commented Zones (up from 82 in 2021a, so this has been a rather large expansion of that category). Of those, only 13 have comments questioning their veracity:

Africa/Douala
Africa/Malabo
Africa/Porto-Novo
America/Creston
America/Montreal
Antarctica/Vostok
Asia/Hanoi
Asia/Vientiane
Europe/Luxembourg
Indian/Cocos
Pacific/Chuuk
Pacific/Enderbury
Pacific/Midway

There's also America/Rosario, which seems more "superseded by other entries" than "wrong", though I've not traced it closely.

So it's pretty clear that a lot of what is in backzone is not there because we have any reason to doubt it. (I am here rejecting the proposition that "if the only source is Shanks then it's probably wrong". You need evidence to call an entry probably wrong.)

Perhaps we ought to subdivide backzone more finely. I'm now thinking about a three-tier classification of zones:

Class A: in-scope per the 1970 cutoff rule. Included in all builds of tzdb.

Class B: out-of-scope per the cutoff rule, but we have no reason to doubt correctness. Included in the default build, but perhaps we could offer an easy way to exclude these.

Class C: out-of-scope and there is evidence that it might be wrong. Not included by default, needs a build choice to include.

Class C would initially be the zones I listed above, but new evidence could cause zones to move to another class.

In any case, I am firmly of the opinion that link-merging is a horrid idea and we should get rid of it, not do more of it. If a given build does not contain the best data we have for a zone, it should not define that zone at all, rather than substitute false data. The path you are currently on is inevitably going to lead to significant populations of systems offering different definitions of these zones than other systems do, and that is going to be a mess.

regards, tom lane
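The three-tier scheme proposed above can be sketched roughly as follows. This is an illustration only, not anything in the tzdb build: `ZoneClass`, `classify` and `select_zones` are hypothetical names, and the in-scope flags in the example are assumptions supplied by the caller. The one property the sketch tries to capture is that an excluded zone is simply absent from the build, never replaced by a link to another zone's data.

```python
from enum import Enum
from typing import Dict, Set


class ZoneClass(Enum):
    A = "in-scope per the 1970 cutoff rule"
    B = "out-of-scope, no documented reason to doubt correctness"
    C = "out-of-scope, with evidence that it might be wrong"


# The 13 backzone entries listed in the message as carrying comments
# that question their veracity; these would initially form Class C.
DOUBTED = frozenset({
    "Africa/Douala", "Africa/Malabo", "Africa/Porto-Novo",
    "America/Creston", "America/Montreal", "Antarctica/Vostok",
    "Asia/Hanoi", "Asia/Vientiane", "Europe/Luxembourg",
    "Indian/Cocos", "Pacific/Chuuk", "Pacific/Enderbury",
    "Pacific/Midway",
})


def classify(zone: str, in_scope: bool) -> ZoneClass:
    """Assign a zone to a class; in_scope is supplied by the caller."""
    if in_scope:
        return ZoneClass.A
    return ZoneClass.C if zone in DOUBTED else ZoneClass.B


def select_zones(
    zones: Dict[str, bool], include_b: bool = True, include_c: bool = False
) -> Set[str]:
    """Return the zone names a build defines. Excluded zones are left
    undefined; no other zone's data is substituted for them."""
    wanted = {ZoneClass.A}
    if include_b:
        wanted.add(ZoneClass.B)
    if include_c:
        wanted.add(ZoneClass.C)
    return {name for name, flag in zones.items() if classify(name, flag) in wanted}


if __name__ == "__main__":
    # Illustrative flags only (assumption, not taken from tzdb itself).
    sample = {
        "Europe/Berlin": True,
        "Europe/Belfast": False,
        "America/Montreal": False,
    }
    print(sorted(select_zones(sample)))
    print(sorted(select_zones(sample, include_b=False)))
```

Under this arrangement a lean build defines a clean subset of what the default build defines, so two systems can disagree about whether a zone exists but not about what it contains.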
On 9/24/21 7:29 AM, Tom Lane wrote:
why are you so insistent that the only way to improve fairness is to make the default contents of tzdb strictly worse?
First, it's not strictly worse. The stuff in 'backzone' is lower quality, and adding lower-quality data is not strictly an improvement. Accuracy and sourcing has been important for the success of tzdb, and we shouldn't lose sight of that by incorporating a bunch of questionable data. Second, supporting unnecessary pre-1970 data is more work for maintainers, repackagers and (most importantly) end users for almost zero benefit. The timestamps in question are unimportant for almost every use of this database. Third, the few people who try to use these old timestamps are typically doing it wrong. The old data are magnets for errors. For example, the very few people who use Europe/Berlin naturally think that Europe/Berlin is reliable for old German timestamps. They're quite wrong: Europe/Berlin is wrong for most of Germany. This continued focus on pre-1970 timestamps is unhealthy for tzdb. It's taken waaaaayy too much of our time. We really need to tone it down.
On 9/24/21 8:04 AM, Paul Eggert via tz wrote:
For example, the very few people who use Europe/Berlin naturally think that Europe/Berlin is reliable for old German timestamps.
I meant to write "the very few people who use Europe/Berlin for pre-1970 timestamps ...". Sorry about the error.
Paul Eggert <eggert@cs.ucla.edu> writes:
On 9/24/21 7:29 AM, Tom Lane wrote:
why are you so insistent that the only way to improve fairness is to make the default contents of tzdb strictly worse?
First, it's not strictly worse. The stuff in 'backzone' is lower quality, and adding lower-quality data is not strictly an improvement. Accuracy and sourcing has been important for the success of tzdb, and we shouldn't lose sight of that by incorporating a bunch of questionable data.
That argument was perhaps true before May. There is no justification for calling much of what you moved in May "lower quality".
Second, supporting unnecessary pre-1970 data is more work for maintainers, repackagers and (most importantly) end users for almost zero benefit. The timestamps in question are unimportant for almost every use of this database.
These arguments might be a reason for removing pre-1970 data altogether. They are not a reason for simply moving it to another file (and, in a default build, replacing it with clearly-inferior data that end users cannot easily tell is wrong). That does not improve anything. regards, tom lane
On 2021-09-24 15:04, Paul Eggert via tz wrote:
The stuff in 'backzone' is lower quality, and adding lower-quality data is not strictly an improvement.
Not true: Europe/Belfast, Europe/Guernsey, Europe/Jersey, Europe/Isle_of_Man are not of "lower quality", and in all the recent NEWS comments for moves of timezones into backzone, the move was never justified by the poor quality of its data. Michael Deckers.
On 9/24/21 1:28 PM, Michael H Deckers via tz wrote:
The stuff in 'backzone' is lower quality, and adding lower-quality data is not strictly an improvement.
Not true: Europe/Belfast, Europe/Guernsey, Europe/Jersey, Europe/Isle_of_Man are not of "lower quality",
You're correct, I should have written "is often of lower quality".
Paul Eggert via tz <tz@iana.org> writes:
On 9/24/21 1:28 PM, Michael H Deckers via tz wrote:
The stuff in 'backzone' is lower quality, and adding lower-quality data is not strictly an improvement.
Not true: Europe/Belfast, Europe/Guernsey, Europe/Jersey, Europe/Isle_of_Man are not of "lower quality",
You're correct, I should have written "is often of lower quality".
I think the current hoo-hah has been brought on precisely by shoving stuff into backzone despite there *not* being an argument that it's of poor quality. I still think there's room to resolve the unhappiness by adopting a three-way classification such as I suggested upthread. There's surely room to negotiate where the boundary between the "in by default" and "out by default" groups falls. It seems you'd prefer a strict standard, akin to "in only if it's documented to a level similar to the in-scope zones". Unfortunately, that seems like a pretty squishy standard, since there are in-scope zones with only the scantiest of documentation (South Georgia and Suriname being the first couple of examples I came across). I think the rule I suggested, "in unless there are documented concerns about correctness", would be a lot simpler to apply and easier to defend. regards, tom lane
On 9/24/21 1:51 PM, Tom Lane wrote:
I think the rule I suggested, "in unless there are documented concerns about correctness", would be a lot simpler to apply and easier to defend.
Having gone through the thought experiment of applying that rule to the data I help maintain, I disagree. For one thing, it's quite backwards about the burden of proof. Essentially the rule you're suggesting is "Here's a Zone; it should go in unless someone can prove me wrong." And even if that were fixed, I see waaayy more opportunities for debate and disagreement if the standards for including data become more subjective. I'm not saying we can't change the guidelines. But it'd need to be done carefully (and after the next release; I'm a bit rushed now), and any revision to be practical would have to be a lot more selective than what you're suggesting.
participants (11): Clive D.W. Feather, Deborah Goldsmith, Eliot Lear, Guy Harris, Jon Skeet, Mark Davis ☕️, Michael H Deckers, Paul Eggert, Stephen Colebourne, TJ, Tom Lane