Stephen Colebourne via tz said:
Following on from the previous thread [1] I wanted to try and classify the IDs we have, which may or may not identify missing IDs.
Again, please avoid talking about pre-1970 data at this point.
Obsolete
------------
IDs that are obsolete and should never be used. They date from many years ago, when tzdb was just starting. Yet these still appear in downstream UIs even today (of course UIs should not use the tzdb ID list, but in reality lots do).
Examples: Portugal, NZ-CHAT, Navajo, Libya.
Proposal: Provide 3-6 months notice, then move obsolete IDs to a new file "obsolete" which downstream projects are strongly encouraged not to include. (I would argue that the time has come to properly remove these IDs, which are very inconsistent in terms of which are provided and which not, eg Portugal, but not Spain)
Counter-proposal. These should be treated as renamings. So Portugal -> Europe/Lisbon and treated like your next category. I presume that these are the same as some canonical zone since 1970. Pre-1970 data in these should be treated however we decide to treat pre-1970 data.
Deprecated, same location
------------------------------------
IDs that have been deprecated with a single clear alternative ID being provided. Both IDs represent the same physical location/city.
Spelling changes: Asia/Katmandu (replaced by Asia/Kathmandu), Asia/Rangoon (replaced by Asia/Yangon)
ID structure changes: America/Louisville (replaced by America/Kentucky/Louisville)
Proposal: Ensure all of these are in `backward`.
Consider: Is there any way to move these IDs to the obsolete file? Maybe after 5 years? Or do we just accept backwards compatibility restrictions on these?
Or make the information available (and possibly tools) to allow downstreams to decide their policy on these. For example, a file that said: Asia/Rangoon Asia/Yangon rename 2005-11-26 (or whatever the actual rename date was). The explicit "rename" there allows this file to show other things, such as merges of zones that only differ pre-1970: Europe/Oslo Europe/Berlin merge 2020-12-31 (or "merge-pre-1970").
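A downstream consumer of such a file might read it roughly as follows. This is only a sketch under stated assumptions: no such file exists in tzdb, the four-column layout is the one suggested above, the names `parse_changes` and `resolve` are made up for illustration, and the dates are the placeholders from the text above, not actual rename or merge dates.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical file contents, reusing the placeholder lines from the message
# above. No such file exists in tzdb and the dates are not real.
SAMPLE = """\
# old-id      new-id         kind    date
Asia/Rangoon  Asia/Yangon    rename  2005-11-26
Europe/Oslo   Europe/Berlin  merge   2020-12-31
"""

@dataclass(frozen=True)
class Change:
    old_id: str
    new_id: str
    kind: str   # "rename" or "merge"
    when: date

def parse_changes(text):
    """Parse whitespace-separated 'old new kind date' lines, skipping comments."""
    changes = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        old_id, new_id, kind, when = line.split()
        changes[old_id] = Change(old_id, new_id, kind, date.fromisoformat(when))
    return changes

def resolve(zone_id, changes, follow=("rename",)):
    """Follow changes whose kind is in `follow` until a current ID is reached."""
    seen = set()
    while zone_id in changes and changes[zone_id].kind in follow:
        if zone_id in seen:
            raise ValueError(f"cycle at {zone_id}")
        seen.add(zone_id)
        zone_id = changes[zone_id].new_id
    return zone_id
```

A downstream that wants to keep pre-1970 distinctions would follow only renames (the default here); one that builds with 1970-onwards data only could pass `follow=("rename", "merge")` and treat both kinds as aliases.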
Legally described mega-zones
-----------------------------------------
IDs for locations where a federal or supra-national body defines rules, eg the EU or US DOT.
Examples: US/Mountain, CET, WET
Consider: Can we write down a rule to identify when something like this should be included? Then move the matching IDs to the main files (eg. are the EU and US DOT the only two examples here?)
The EU doesn't define "CET" or "WET", or even specify the names. The EU specifies constraints on the rule for the zones that cover the places that follow EU rules. So "CET" is not a zone; it's a collection of zones that have been in step since early 1983 or whatever later date they joined the collection. "WET", incidentally, starts from 1998-03-29. Nothing in EU law stops a country moving from "CET" to "WET" or "EET". And these names do not appear anywhere I can find in the legislation (it was "Member States belonging to the zero time zone and the other Member States" and later became "the Member States apart from Ireland and the United Kingdom, on the one hand, and Ireland and the United Kingdom, on the other", which is not the same division; this was probably when Portugal joined). So, since these don't describe "places keeping the same time since 1970", what exactly are they and why do we have them? (I suspect that US/Mountain has a similar problem in that not everywhere in Mountain time observes DST.)
Regions
-----------
IDs for abstract regions that have had the same wall clock since 1970.
Examples: Europe/Berlin, America/New_York, Africa/Abidjan
Proposal: Ensure all of these are in the main files.
Consider: Should there be new IDs for each of these abstract regions to indicate they are a separate and distinct concept? eg. "Region/Berlin". (Maybe something to consider in future threads, as it isn't clear what the benefit of doing so is without considering pre-1970, which I'm still trying to avoid.)
Non-region locations
---------------------------
IDs for locations that are not region IDs. Each ID will have the same wall clock since 1970 as one of the region IDs.
Examples: Europe/Oslo, Europe/Amsterdam, Atlantic/Reykjavik
Consider: Can we write down a rule that covers which IDs are included here?
If I'm understanding correctly what Paul's been doing, these are "IDs that refer to regions that have the same time history since 1970 as another region but a different time history before that and are not the region that uses the ID that would be chosen using our standard conventions (basically 'largest town')". Or, put another way, partition the set of all zones into subsets, each of which have the same history since 1970. In each subset, one is what you've called an abstract region and the rest are non-region locations. The choice of the first is made based on our normal naming rules.
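The partitioning just described can be sketched as below. Everything in it is an assumption made for illustration: the zone names are invented placeholders rather than tzdb IDs, the history keys stand in for "the complete list of transitions since 1970", the populations are made-up numbers, and "largest town" is reduced to a plain population comparison, which simplifies the actual naming rules.

```python
from collections import defaultdict

# Invented placeholder data: zone ID -> (post-1970 history key, population).
# None of these are real tzdb IDs or real figures.
ZONES = {
    "Area/BigTown":   ("history-A", 3_000_000),
    "Area/SmallTown": ("history-A", 700_000),
    "Area/Village":   ("history-A", 5_000),
    "Other/Capital":  ("history-B", 1_200_000),
}

def partition(zones):
    """Group zones by post-1970 history and pick one region ID per group.

    Returns a dict mapping each chosen region ID to the sorted list of
    non-region location IDs that share its history since 1970.
    """
    groups = defaultdict(list)
    for zone_id, (history, population) in zones.items():
        groups[history].append((population, zone_id))

    result = {}
    for members in groups.values():
        members.sort(reverse=True)      # largest population first
        region = members[0][1]
        result[region] = sorted(zone for _, zone in members[1:])
    return result
```

With the placeholder data, `partition(ZONES)` picks `Area/BigTown` and `Other/Capital` as the region IDs and lists `Area/SmallTown` and `Area/Village` as non-region locations under `Area/BigTown`.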
And therefore when a new ID can be added to this set? If we can define a rule, then these can be split so rule-following IDs are in the main files and rule-breaking ones are in `backward` (although ideally they should be separate from the spelling changes).
Hang on, why should we ever add a new ID at all? My view is that we should *not* be adding new IDs, so long as we're talking about a post-1970 database, that is. In other words, the rule is "they stay for backwards compatibility reasons and no other". For someone only building with 1970-onwards data, these would be equivalent to aliases, so are treated as equivalent to renames - see above.
Obviously, we can say these IDs only exist for backwards compatibility, but that seems like a weak justification,
Why? If we were starting a new TZDB from scratch, we could ignore it because there wouldn't be any backwards to be compatible with. But there is, so we need to think about it.
and doesn't tackle the issue of when a new ID would be added to the list (which has been a point of tension).
Why not "never"? Well, apart from following a bug fix.
As is well known, I think the obvious rule is that the IDs follow the ISO-3166-1 standard (rule: one ID per ISO code, additional IDs may be added where clocks have diverged since 1970). Using ISO-3166 can be justified by IANA domain policy [2]:
That's not a justification, since IANA were handing rights over these names to those ISO bodies. And IANA have long given up on that policy, which is why there are .gg and .scot.
As per the previous thread, these non-region location IDs are actively used in downstream business applications, and it is not OK that this only works because tzdb happens to have IDs for backwards compatibility. There needs to be a better justification than that.
Sorry, but that is *exactly* the definition of "backwards compatibility". If someone starts a new application that only uses post-Paul-merges names because that's all they see, they will *not* be using these names nor care in the slightest about them. It's *ALL* about backwards compatibility.
Fixed/etc type rules
--------------------------
IDs with a fixed offset
Examples: GMT, UTC, Etc/GMT-9
Proposal: No change, retain in the main files unless a particular ID is considered obsolete or deprecated
The easiest way to treat these is to deem that there are certain virtual places with their own time history (e.g. "international waters near the 30 degrees east meridian") which deserve their own zone on that basis (this one being Etc/GMT-2, i.e. two hours ahead of UTC).

But you've left out the mammoth in the room, which is pre-1970 data.

-- 
Clive D.W. Feather          | If you lie to the compiler,
Email: clive@davros.org     | it will get its revenge.
Web: http://www.davros.org  |   - Henry Spencer
Mobile: +44 7973 377646