Paul Eggert <eggert@cs.ucla.edu> writes:
> If you want to maximize data stability under the constraint of being fair, then the current development repository beats all other proposals I've seen so far.

Several other people have already made this point in varying words, but: why are you so insistent that the only way to improve fairness is to make the default contents of tzdb strictly worse? Why not strive to make it strictly better, instead?

I will agree that there's some room for debate as to whether enabling all of backzone by default is "strictly better". Some of the data in it is probably wrong. But a lot of it is probably right, too --- in particular, a whole lot of what you shoved in there since 2021a is very well attested. In any case, the data that we are currently substituting by default is *certainly* wrong. Moreover, getting that data back into mainstream circulation would improve our chances of finding and fixing remaining errors.

I just finished looking through git-tip backzone to get a better idea of exactly what's in there. I count 113 non-commented Zones (up from 82 in 2021a, so this has been a rather large expansion of that category). Of those, only 13 have comments questioning their veracity:

	Africa/Douala
	Africa/Malabo
	Africa/Porto-Novo
	America/Creston
	America/Montreal
	Antarctica/Vostok
	Asia/Hanoi
	Asia/Vientiane
	Europe/Luxembourg
	Indian/Cocos
	Pacific/Chuuk
	Pacific/Enderbury
	Pacific/Midway

There's also America/Rosario, which seems more "superseded by other entries" than "wrong", though I've not traced it closely.

So it's pretty clear that a lot of what is in backzone is not there because we have any reason to doubt it. (I am here rejecting the proposition that "if the only source is Shanks then it's probably wrong". You need evidence to call an entry probably wrong.)

Perhaps we ought to subdivide backzone more finely. I'm now thinking about a three-tier classification of zones:

Class A: in-scope per the 1970 cutoff rule. Included in all builds of tzdb.

Class B: out-of-scope per the cutoff rule, but we have no reason to doubt correctness. Included in the default build, but perhaps we could offer an easy way to exclude these.

Class C: out-of-scope and there is evidence that it might be wrong. Not included by default, needs a build choice to include.

Class C would initially be the zones I listed above, but new evidence could cause zones to move to another class.

In any case, I am firmly of the opinion that link-merging is a horrid idea and we should get rid of it, not do more of it. If a given build does not contain the best data we have for a zone, it should not define that zone at all, rather than substitute false data. The path you are currently on is inevitably going to lead to significant populations of systems offering different definitions of these zones than other systems do, and that is going to be a mess.

			regards, tom lane