Stephen Colebourne wrote:
ubiquity is a key value of the data. The same data is used everywhere from Unix to Java to mobile phones.
No, it's pretty routinely filtered before it hits many platforms. One example: QNX has an unsigned time_t, which by design filters out all data before 1970. Furthermore, there is an inevitable delay in propagating changes to the field. Even on a single host with 64-bit signed time_t (so that it matches Java's 'long'), I've seen situations where Java's copy of the data disagrees with the POSIX copy. And certainly a distributed application cannot assume ubiquity, as the client and server may be updated at different times. So, for various reasons unrelated to the proposed changes, it's already the case that applications cannot assume that the data are ubiquitous and that the same data are used everywhere. That's not to say that we should introduce changes merely for the sake of change; far from it. I agree with you that stability is a good property. But we shouldn't be inhibited from change by the goal of having the data be the same everywhere. That goal is unattainable, and always has been.
I'm not speaking on behalf of myself, but on behalf of Java development generally.
These comments would have more weight if they pointed to user problems that occurred when we made similar changes in the past. Based on my experience, I'm skeptical that there were significant user problems. I've asked the list for reports of problems, but nobody else has reported any either. This suggests that the concerns are misplaced. On this list I have also noted that the changes promise to make life easier for users in some cases, by omitting irrelevant choices. This is a real advantage that should trump stability concerns.
the leading supporters of Paul's approach are from an academic background (Paul, Guy, yourself)
This appears to be based on a misconception. I won't speak for Guy and Russ, but my career has been spent more in industry than in academia. I developed most of the tz database while in industry: I worked on enterprise software, and built several distributed applications involving many clients and using the tz database. I am attempting to use the tz maintenance practices that I used while in industry.
the recent batch of changes is far in excess of what has happened in previous years.
Sometimes I get up the energy to fix things. Often I don't. (Let's not look gift horses in the mouth. :-)
zone ID merging that loses the start date of offsets or abbreviations, even if those are guesswork/invented (because the replacement is not an enhancement, it's worse).
I've had quite a bit of experience dealing with the Shanks data. In my experience the proposed change is a fairer representation of what we know than the previous version was. You're right that we don't know that the new version is correct and the old is wrong (both are guesses), but it's not right to say that the new version is worse.