Proposal: Sections for different kinds of backward links
I have taken a stab at splitting the `backward` file into sections. This is based on repeated observations made by multiple people that the file represents different kinds of Link.

https://github.com/eggert/tz/pull/33

The sections I'm suggesting are:

CROSS-ISO - These link across an ISO-3166 boundary
EXCESS - These were added despite the locations having the same timekeeping since 1970
RENAMED - These are simple aliases/renames
OBSOLETE - These are the most obsolete form of IDs
FIXED - These are alternate names for fixed offsets

I don't expect this PR to be merged as is - There will probably need to be some consideration as to the correct split, and as to whether each ID is in the correct section.

The approach I've used is commented sections with stable section names. Alternate approaches to this problem are possible. For example, a new file could be created for each section. That seems to be more disruptive than the proposed approach.

Note that in order for this change to be useful, the section names need to be considered part of the source file API, and maintained over time.

thanks
Stephen
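To illustrate what "section names as part of the source file API" would mean for a downstream reader, here is a minimal sketch that groups Link lines by the section comment preceding them. The `# SECTION:` marker is a hypothetical syntax chosen for illustration and is not necessarily what the PR uses; only the `Link TARGET LINK-NAME` line layout comes from the existing zic input format.

```python
from collections import defaultdict

# Hypothetical marker; the real comment syntax would be whatever tzdb adopts.
SECTION_PREFIX = "# SECTION:"

def links_by_section(text):
    """Group (link_name, target) pairs under the most recent section marker."""
    sections = defaultdict(list)
    current = None  # Links seen before any marker land under None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(SECTION_PREFIX):
            current = line[len(SECTION_PREFIX):].strip()
            continue
        # Drop ordinary trailing comments, then split on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[0] == "Link":
            target, name = fields[1], fields[2]
            sections[current].append((name, target))
    return dict(sections)

SAMPLE = """\
# SECTION: RENAMED
Link	Asia/Kolkata	Asia/Calcutta
# SECTION: FIXED
Link	Etc/UTC	Zulu  # trailing comment
"""

if __name__ == "__main__":
    for section, links in links_by_section(SAMPLE).items():
        print(section, links)
```

A reader like this breaks silently if a section name is changed or a marker is reworded, which is why the names would need to be kept stable.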
On 9 Nov 2021, at 21:30, Stephen Colebourne via tz <tz@iana.org> wrote:
Note that in order for this change to be useful, the section names need to be considered part of the source file API, and maintained over time.
Agreed. If any kind of division is introduced like this, it needs to be consistently machine-readable.

I’m interested in the discussions around the different kinds of Links, and what counts as “deprecated”. My angle comes from lurking in the bug tracker for the Moment Timezone JS library. Users there have raised several issues regarding Zone or Link names that we on the list consider deprecated (or “best to be avoided”), but there is no formal interface to indicate it. Some examples include:

* Provide a way to determine whether a timezone is deprecated. (https://github.com/moment/moment-timezone/issues/956)
* Questions about US/Pacific-New (before it was removed completely).
* Confusion over Asia/Calcutta vs Asia/Kolkata, but a lot of that is caused by different web browsers returning different values.

I also see that the PHP documentation has to maintain a page listing zone identifiers that shouldn’t be used: https://www.php.net/manual/en/timezones.others.php

I’m assuming that it would be more desirable for PHP to have a software API to indicate the “unwanted” nature of those identifiers. (I also assume Derick will chime in to correct my assumption.)

Cheers,
Gil
On 11/9/21 02:30, Stephen Colebourne via tz wrote:
Alternate approaches to this problem are possible. For example, a new file could be created for each section.
This alternative sounds good to me, as I've had bad luck in the past with "comments" that are actually part of an API. Separating the Link commands into different .zi files would make it easy for downstream users to select which links they want by specifying the already-existing Makefile variable BACKWARD. For compatibility with uses that assume a file named 'backward', we could build 'backward' from the concatenation of these new .zi files.
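As a rough sketch of the compatibility step described above (building `backward` from the concatenation of the new files), the following shows the idea in Python. The part file names are hypothetical, and in the real tree this would presumably be a Makefile rule rather than a script.

```python
from pathlib import Path

# Hypothetical per-category file names; the thread does not fix them.
PARTS = ["links-a.zi", "links-b.zi", "links-c.zi"]

def concatenate(texts):
    """Join file contents in order, making sure each non-empty part ends with a newline."""
    return "".join(t if (t.endswith("\n") or not t) else t + "\n" for t in texts)

def build_backward(part_paths, out_path):
    """Write the concatenation of the selected part files to out_path."""
    texts = [Path(p).read_text(encoding="utf-8") for p in part_paths]
    Path(out_path).write_text(concatenate(texts), encoding="utf-8")
```

A downstream user who wants only some of the links would pass a shorter list of parts, which mirrors choosing a different value for BACKWARD.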
On Tue, 9 Nov 2021 at 16:42, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 11/9/21 02:30, Stephen Colebourne via tz wrote:
Alternate approaches to this problem are possible. For example, a new file could be created for each section.
This alternative sounds good to me, as I've had bad luck in the past with "comments" that are actually part of an API. Separating the Link commands into different .zi files would make it easy for downstream users to select which links they want by specifying the already-existing Makefile variable BACKWARD. For compatibility with uses that assume a file named 'backward', we could build 'backward' from the concatenation of these new .zi files.
I don't have a problem with that, although it seems like more work than using comments. I'd be interested in your view as to how to group the IDs. Certainly an `obsolete` file would make a lot of sense. Stephen
Stephen Colebourne via tz said:
This alternative sounds good to me, as I've had bad luck in the past with "comments" that are actually part of an API.
I don't have a problem with that, although it seems like more work than using comments.
I'm with Paul on that. I've had too much pain in the past with file formats or APIs that didn't have room for expansion and then you have a desperate need. Trying to repurpose comments is a terrible idea (lint notwithstanding). Better to have something clean.

--
Clive D.W. Feather          | If you lie to the compiler,
Email: clive@davros.org     | it will get its revenge.
Web: http://www.davros.org  |   - Henry Spencer
Mobile: +44 7973 377646
On 11/9/21 10:43, Stephen Colebourne via tz wrote:
I'd be interested in your view as to how to group the IDs.
These groupings should provide useful information downstream. Perhaps something like the following:

* primary - These name the primary location of a region with clocks the same since 1970, e.g., Europe/London, Asia/Kolkata, Asia/Calcutta, Asia/Singapore, Singapore.
* secondary - These name secondary locations in a region, and exist only for compatibility with older tzdb versions, e.g., Australia/Canberra, Antarctica/South_Pole.
* posixish - These are for platforms that lack support for POSIX TZ settings, e.g., Etc/GMT, Greenwich, CET, EST5EDT, Zulu.
* obsolescent - These are planned to be removed in future releases. We could put "W-SU" into this category as an example, since the Soviet Union hasn't existed for decades. Other names could be moved into this category as they become obsolescent.
* other - These names don't fit into the above categories, and also exist only for compatibility with older tzdb versions, e.g., Cuba, Egypt, GB-Eire, Iceland, Navajo, PRC.

The above groupings don't worry about whether names are Zones or Links because that information is already present in the source code.
On Fri, 12 Nov 2021 at 05:04, Paul Eggert <eggert@cs.ucla.edu> wrote:

Unfortunately, from my perspective your groupings have eliminated useful information I need as a downstream consumer.

The category `posixish` is fine.

The distinction between `obsolescent` and `other` seems unnecessary - they are all IDs no longer suitable for use.

The inability to identify spelling changes is a big problem. As a downstream consumer I need to be able to indicate which spelling users should use. I think this needs a separate category. It does require tzdb to pick which of the spellings is the preferred one, but tzdb already does this when the Link is defined. (In `primary` the information can be derived from whether it is a Link or Zone, but it cannot be derived for other categories.)

The `primary` category is fine as it is factual (based on clocks aligning post-1970). To be useful it would need one ID per region (i.e. only one spelling allowed).

Including all non-primary IDs in `secondary` is not helpful at all. While you may not think that one ID per ISO code is useful, it is surely clear at this point that others do. What you have here is an opportunity to provide some relief for those who want one ID per ISO code without it significantly impacting your maintenance cost. This would require `secondary` to be split, one listing the main ID per ISO code (where not in the `primary`) and one listing all the other excess locations. These categories could be called `secondary` and `tertiary`.

The net result is that `primary` would represent those IDs you believe tzdb should provide, while `primary` and `secondary` combined represent those IDs I believe should be provided (and what was effectively provided prior to 2014). Were this to be done, it would be possible to provide a command line option that pulls the relevant parts of `backzone` in based on the combined list of `primary` and `secondary`.

As a downstream consumer I'd personally prefer one file with a column for the type of ID. Having these as separate files makes this more complex to use (and more complex to maintain IMO).

thanks
Stephen
On 11/14/21 04:16, Stephen Colebourne via tz wrote:
The distinction between `obsolescent` and `other` seems unnecessary - they are all IDs no longer suitable for use.
We could merge the two into a single category 'obsolescent'.
I need to be able to indicate which spelling users should use.
That info is already in zone.tab's column 3. Although that file is marked "deprecated", perhaps we can undeprecate it.
On Sun, 14 Nov 2021 at 21:17, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 11/14/21 04:16, Stephen Colebourne via tz wrote:
I need to be able to indicate which spelling users should use.
That info is already in zone.tab's column 3. Although that file is marked "deprecated", perhaps we can undeprecate it.
I fail to see how zone.tab helps with spelling issues (as per the quoted text). (If there is a spelling change in `secondary`, there is no mechanism to determine which is the preferred spelling, as both IDs will Link to the same primary ID.) Stephen
On 11/14/21 15:16, Stephen Colebourne via tz wrote:
On Sun, 14 Nov 2021 at 21:17, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 11/14/21 04:16, Stephen Colebourne via tz wrote:
I need to be able to indicate which spelling users should use.
That info is already in zone.tab's column 3. Although that file is marked "deprecated", perhaps we can undeprecate it.
I fail to see how zone.tab helps with spelling issues (as per the quoted text).
Sorry, I'm not following. Not sure what you mean by "the quoted text".
(If there is a spelling change in `secondary`, there is no mechanism to determine which is the preferred spelling, as both IDs will Link to the same primary ID.)
I was thinking that one can look in zone.tab. If the spelling is there it's "preferred" otherwise it's not. This heuristic won't suffice for arbitrary questions about "preference" (as it doesn't impose a total order on names) but it should be good enough for questions relating to backward compatibility with older guidelines.
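The heuristic can be sketched as follows. zone.tab is tab-separated with `#` comment lines, and the third column holds the TZ name; the sample row is an abbreviated stand-in for the real file, and the function names are made up for illustration.

```python
def zone_tab_names(text):
    """Collect the names in column 3 of zone.tab-style text."""
    names = set()
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) >= 3:
            names.add(cols[2])
    return names

def preferred_spellings(candidates, tab_names):
    """Return the candidates listed in zone.tab (may be none, one, or several)."""
    return [name for name in candidates if name in tab_names]

# Abbreviated sample in zone.tab layout: country code, coordinates, TZ name.
SAMPLE_TAB = "# comment line\nIN\t+2232+08822\tAsia/Kolkata\n"

if __name__ == "__main__":
    names = zone_tab_names(SAMPLE_TAB)
    print(preferred_spellings(["Asia/Calcutta", "Asia/Kolkata"], names))
```

The result is a list rather than a single name because, as noted above, the heuristic does not impose a total order: two spellings of a name that is absent from zone.tab both come back unpreferred.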
On Tue, 16 Nov 2021 at 18:20, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 11/14/21 15:16, Stephen Colebourne via tz wrote:
(If there is a spelling change in `secondary`, there is no mechanism to determine which is the preferred spelling, as both IDs will Link to the same primary ID.)
I was thinking that one can look in zone.tab. If the spelling is there it's "preferred" otherwise it's not. This heuristic won't suffice for arbitrary questions about "preference" (as it doesn't impose a total order on names) but it should be good enough for questions relating to backward compatibility with older guidelines.
That makes these categories less useful. It involves a downstream consumer processing zone.tab and cross-checking it. It also doesn't handle the case where there is a spelling change for an ID that is not listed in zone.tab. It is not at all clear why you would want the `primary` category to contain both Asia/Kolkata and Asia/Calcutta. Stephen
On 11/17/21 02:03, Stephen Colebourne via tz wrote:
On Tue, 16 Nov 2021 at 18:20, Paul Eggert <eggert@cs.ucla.edu> wrote:
I was thinking that one can look in zone.tab. If the spelling is there it's "preferred" otherwise it's not.
That makes these categories less useful. It involves a downstream consumer processing zone.tab and cross-checking it. It also doesn't handle the case where there is a spelling change for an ID that is not listed in zone.tab.
You're right that it's a bit more work downstream, and that it doesn't handle arbitrary questions about which names are preferred to which other names (for example, it doesn't establish a total order on names). However, it does seem to suffice for the specific problem you mentioned about backwards compatibility (one name per ISO country), and it has the advantage of not burdening the upstream maintenance process with even more political questions.
It is not at all clear why you would want the `primary` category to contain both Asia/Kolkata and Asia/Calcutta.
The idea was that the 'primary' category would be about the timestamps (where Asia/Kolkata and Asia/Calcutta are identical), not about the metadata (where people might disagree about how to spell the city's name). This would help insulate tzdb from political issues. Although we cannot insulate tzdb completely from politics, when it's easy to do so then we should do it.
As a reminder, you said "These groupings should provide useful information downstream."

I've provided feedback that the categories are not helpful to me as a downstream consumer - Having "Singapore" (an obsolete ID) in `primary` really isn't useful info and you might as well not bother with the change. The whole point is to provide a way to prune obsolete IDs like that.

If you want downstream consumers to have useful info from this change you would need a list of all obsolete IDs, all posixish IDs and the minimal set of primary IDs (i.e. excluding spelling and obsolete variants).

thanks
Stephen

On Wed, 17 Nov 2021 at 20:43, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 11/17/21 02:03, Stephen Colebourne via tz wrote:
On Tue, 16 Nov 2021 at 18:20, Paul Eggert <eggert@cs.ucla.edu> wrote:
I was thinking that one can look in zone.tab. If the spelling is there it's "preferred" otherwise it's not.
That makes these categories less useful. It involves a downstream consumer processing zone.tab and cross-checking it. It also doesn't handle the case where there is a spelling change for an ID that is not listed in zone.tab.
You're right that it's a bit more work downstream, and that it doesn't handle arbitrary questions about which names are preferred to which other names (for example, it doesn't establish a total order on names). However, it does seem to suffice for the specific problem you mentioned about backwards compatibility (one name per ISO country), and it has the advantage of not burdening the upstream maintenance process with even more political questions.
It is not at all clear why you would want the `primary` category to contain both Asia/Kolkata and Asia/Calcutta.
The idea was that the 'primary' category would be about the timestamps (where Asia/Kolkata and Asia/Calcutta are identical), not about the metadata (where people might disagree about how to spell the city's name). This would help insulate tzdb from political issues. Although we cannot insulate tzdb completely from politics, when it's easy to do so then we should do it.
On 11/17/21 18:29, Stephen Colebourne via tz wrote:
Having "Singapore" (an obsolete ID) in `primary` really isn't useful info and you might as well not bother with the change. The whole point is to provide a way to prune obsolete IDs like that.
For many years the files zone1970.tab and zone.tab have provided a way to do that. Either of these files suffices to prune away obsolete IDs, for a particular meaning of "obsolete", by using the rule that if the file contains a name, it's not obsolete. The two files differ because they use different definitions of "obsolete". zone1970.tab is intended to correspond to the current guidelines (one name per alike-since-1970 region), whereas zone.tab is intended to correspond to older guidelines (a name for each alike-since-1970 region in each country). If neither of these files corresponds to the definition of "obsolete" that you need, it'd be helpful to see examples of where they both go amiss so that we can think about how to remedy the situation.
If you want downstream consumers to have useful info from this change you would need a list of all obsolete IDs, all posixish IDs and the minimal set of primary IDs (ie. excluding spelling and obsolete variants) .
Here is a simple definition that differs a bit from what I sent earlier, but has the virtue of not requiring reorganizing the data: obsolete IDs are those not listed in zone.tab or zone1970.tab (your pick). Posixish IDs start with "Etc/". The minimal set of primary IDs consists of those listed in zone.tab or zone1970.tab (your pick). These categories overlap, but the overlap shouldn't be much of a problem.
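Written out as code, the definition above looks roughly like this. It is only a sketch: `listed_names` stands for the set of names taken from whichever of zone.tab or zone1970.tab the consumer picks, and the function name is made up for illustration.

```python
def classify(name, listed_names):
    """Return the set of categories for a tz name under the definition above.

    listed_names: names found in zone.tab or zone1970.tab (caller's choice).
    Categories may overlap, so a set is returned rather than a single label.
    """
    categories = set()
    if name.startswith("Etc/"):
        categories.add("posixish")
    if name in listed_names:
        categories.add("primary")
    else:
        categories.add("obsolete")
    return categories

if __name__ == "__main__":
    listed = {"Asia/Kolkata", "Asia/Singapore"}  # tiny stand-in for the real table
    for name in ["Asia/Kolkata", "Asia/Calcutta", "Singapore", "Etc/GMT"]:
        print(name, sorted(classify(name, listed)))
```

The overlap shows up for names such as Etc/GMT, which start with "Etc/" but are not in either table, so they come out as both posixish and obsolete.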
Stephen Colebourne via tz <tz@iana.org> writes:
It is not at all clear why you would want the `primary` category to contain both Asia/Kolkata and Asia/Calcutta.
Why not?

Your comments here seem to imply that you have a goal of getting people to stop using identifiers that you think are deprecated and get them to start using identifiers that you think are better, but it's entirely inobvious to me that this should be a goal of the tz project. If both identifiers will continue to exist forever (and I think that's very likely), who cares which one people use? Why is it important to try to distinguish between them?

So much of the conflict here seems to derive from your belief that a Link somehow indicates that the name is deprecated and should be treated differently than files that are not links, but I don't see any reason why this should be true.

--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>
Stephen Colebourne via tz said:
As a downstream consumer I'd personally prefer one file with a column for the type of ID. Having these as separate files makes this more complex to use (and more complex to maintain IMO).
I would agree with this: a file giving each zone and its status, plus (where not primary) the name of the primary zone.

--
Clive D.W. Feather          | If you lie to the compiler,
Email: clive@davros.org     | it will get its revenge.
Web: http://www.davros.org  |   - Henry Spencer
Mobile: +44 7973 377646
On 11/14/21 13:46, Clive D.W. Feather via tz wrote:
a file giving each zone and its status, plus (where not primary) the name of the primary zone.
A name of the primary zone is already present: just look for a Link line with the same target. Perhaps we could just have a file listing each name and its category, where the categories seem to be 'primary', 'secondary', 'posixish', and 'old' (or 'obsolescent').
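A sketch of how the two pieces could be combined downstream: the target of each name is read from the existing `Link TARGET LINK-NAME` lines, and the category comes from a separate name-to-category table. That table is hypothetical (no such file exists yet), so it is modelled here as a plain dict, with sample categories following the grouping suggested earlier in the thread.

```python
def parse_links(text):
    """Map each link name to its target, from zic 'Link TARGET LINK-NAME' lines."""
    link_target = {}
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()  # ignore trailing comments
        if len(fields) >= 3 and fields[0] == "Link":
            link_target[fields[2]] = fields[1]
    return link_target

def status_rows(categories, link_target):
    """Build (name, category, target) rows; target is '' when the name is not a Link."""
    return [
        (name, category, link_target.get(name, ""))
        for name, category in sorted(categories.items())
    ]

LINKS = "Link\tAsia/Kolkata\tAsia/Calcutta\nLink\tAsia/Singapore\tSingapore\n"

# Hypothetical category table, keyed by name.
CATEGORIES = {
    "Asia/Kolkata": "primary",
    "Asia/Calcutta": "primary",
    "Singapore": "primary",
}

if __name__ == "__main__":
    for row in status_rows(CATEGORIES, parse_links(LINKS)):
        print("\t".join(row))
```

This keeps the category file free of target information, at the cost of making every consumer read the Link lines as well.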
participants (5)
- Clive D.W. Feather
- Gilmore Davidson
- Paul Eggert
- Russ Allbery
- Stephen Colebourne