State of the tzdb


Stephen Colebourne

Sept. 3, 2013

10 a.m.

We're a lot further forward now to regaining stability, but there are a few points still outstanding.

1) Should there be one zone ID in the main files (not "backward") for each inhabited ISO-3166 code?

I've argued for this as a reinstatement of a rule of 16 years standing which is no more than common sense. Paul has argued against.

2) Zone ID for America/Shiprock

Reinstate Yes/No?

3) Zone ID for Antarctica/South_Pole

Reinstate Yes/No? At a minimum, the "backward" link needs to point to McMurdo.

4) Recognise correct usage of "backward" file

As per ECMAscript: http://norbertlindenberg.com/ecmascript/intl.html#sec-6.4.2 the "backward" file is, and will continue to be, used for canonicalization - i.e. identifying which IDs can be normalized away. As such, the tzdb needs to be very careful about cross-ISO-3166 links placed in the "backward" file (IMO there should be none, to avoid political offence via canonicalization).

The use of "backward" should be further documented in Theory to indicate its usage as a canonicalization file, and to indicate that the entire tzdb data set must make sense if the "backward" file is not processed (thus no lines in zone.tab should point to an ID in "backward").

5) Pre-1970 data

Some of this seems uncontroversial at this point (don't remove data, don't create new IDs just for pre-1970). Some of it is still under discussion. Only enhancement of existing IDs is necessary to the applications I work with.

As noted in the other thread, removing IDs that only differ in LMT is harder than initially thought, as the start of fixed offsets and the name of fixed offsets are pieces of data covered under the no deletion rule. As such, it is unlikely that there are many possibilities for merging zones.

6) ... anything I've forgotten?

Finally, I'm sorry I've had to spend so much list time discussing this. Unfortunately, this data is just too important to me and the applications/languages downstream of me to allow me to not object loudly.

thanks
Stephen


Lester Caine

September 2013

11:43 a.m.

Stephen Colebourne wrote:

...

We're a lot further forward now to regaining stability, but there are a few points still outstanding.

1) Should there be one zone ID in the main files (not "backward") for each inhabited ISO-3166 code?

I've argued for this as a reinstatement of a rule of 16 years standing which is no more than common sense. Paul has argued against.

A debate on openstreetmap at the moment relates to the status of historic information in that database. Many people feel that if it does not currently exist on the ground, then it should be removed. It is a little of an academic discussion since the data will remain in the change logs anyway, but it's access to that data which is at question. The current proposal is that some data is archived to an openhistorymap which will allow the use of a 'date' to define what data is displayed, but in many cases the bulk of the main database needs to be combined with the small amount of history to create the final dataset anyway, so openhistorymap has to have a complete copy of the main map! You can't define a reason for NOT including something in the historic map.

Not too dissimilar to what we are talking about here? To my mind all that is missing is an 'end-date' when a legacy zone was merged with a current one? We need the 'backwards' data in the main database, so why not just have all the zone data with both start and end dates ... most of which are open ended, but some 'link' to other zones at a particular time point.

In the case of the IM data, the start of GMT is recorded as 30th March 1883, there is a little commotion about dates for 1921, and then the law is passed to use the English time changes from 1922 ( when we find a copy :) ), so the zone data is tagged as linked to GB from that time on. From the historic perspective there are separate documents approving the changes to 'time' with separate approval dates, but personally I'm happy THAT material is in an archive. It's just the initial facts that differ and it does not add much to complete the whole picture? I think many other mergers are a similar situation with just a small amount of extra data?

Turning the problem around, if new evidence is found that provides additional differences to the change from LMT to a standard time for parts of an existing zone ... the problem that Paul is worried about ... adding a new couple of entries giving that detail and linking to the ongoing 'time thread' is just sensible to my mind. Identifying the extra zone may be more controversial but the evidence would have to provide a reason to include the data anyway? When you look at an historic date you get all the currently active zone names ...

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

Guy Harris

7:30 p.m.

On Sep 3, 2013, at 4:43 AM, Lester Caine <lester@lsces.co.uk> wrote:

...

Stephen Colebourne wrote:

...
We're a lot further forward now to regaining stability, but there are a few points still outstanding.

1) Should there be one zone ID in the main files (not "backward") for each inhabited ISO-3166 code?

I've argued for this as a reinstatement of a rule of 16 years standing which is no more than common sense. Paul has argued against.

A debate on openstreetmap at the moment relates to the status of historic information in that database. Many people feel that if it does not currently exist on the ground, then it should be removed. It is a little of an academic discussion since the data will remain in the change logs anyway, but it's access to that data which is at question. The current proposal is that some data is archived to an openhistorymap which will allow the use of a 'date' to define what data is displayed, but in many cases the bulk of the main database needs to be combined with the small amount of history to create the final dataset anyway, so openhistorymap has to have a complete copy of the main map! You can't define a reason for NOT including something in the historic map.

Not too dissimilar to what we are talking about here?

You're talking about historical data, so you're presumably referring to this point in Stephen's message:

...

5) Pre-1970 data Some of this seems uncontroversial at this point (don't remove data, don't create new IDs just for pre-1970). Some of it is still under discussion. Only enhancement of existing IDs is necessary to the applications I work with.

As noted in the other thread, removing IDs that only differ in LMT is harder than initially thought, as the start of fixed offsets and the name of fixed offsets are pieces of data covered under the no deletion rule. As such, it is unlikely that there are many possibilities for merging zones.

not to the quoted point, which was about ISO 3166 country codes. If, for example, you're talking about rules within the UK, those all correspond to a single ISO 3166 country code, so his point about ISO 3166 country codes doesn't apply.

Lester Caine

8:02 p.m.

Guy Harris wrote:

...

On Sep 3, 2013, at 4:43 AM, Lester Caine <lester@lsces.co.uk> wrote:

...
Stephen Colebourne wrote:

...
We're a lot further forward now to regaining stability, but there are a few points still outstanding.

1) Should there be one zone ID in the main files (not "backward") for each inhabited ISO-3166 code?

I've argued for this as a reinstatement of a rule of 16 years standing which is no more than common sense. Paul has argued against.

A debate on openstreetmap at the moment relates to the status of historic information in that database. Many people feel that if it does not currently exist on the ground, then it should be removed. It is a little of an academic discussion since the data will remain in the change logs anyway, but it's access to that data which is at question. The current proposal is that some data is archived to an openhistorymap which will allow the use of a 'date' to define what data is displayed, but in many cases the bulk of the main database needs to be combined with the small amount of history to create the final dataset anyway, so openhistorymap has to have a complete copy of the main map! You can't define a reason for NOT including something in the historic map.

Not too dissimilar to what we are talking about here?

You're talking about historical data, so you're presumably referring to this point in Stephen's message:

...
5) Pre-1970 data Some of this seems uncontroversial at this point (don't remove data, don't create new IDs just for pre-1970). Some of it is still under discussion. Only enhancement of existing IDs is necessary to the applications I work with.

As noted in the other thread, removing IDs that only differ in LMT is harder than initially thought, as the start of fixed offsets and the name of fixed offsets are pieces of data covered under the no deletion rule. As such, it is unlikely that there are many possibilities for merging zones.

not to the quoted point, which was about ISO 3166 country codes. If, for example, you're talking about rules within the UK, those all correspond to a single ISO 3166 country code, so his point about ISO 3166 country codes doesn't apply.

I was not talking about any particular statement, but the generality of removing older data from a system that is in active use. Many of the comments relate to winnowing old data and essentially ignoring the historic material. I on the other hand would prefer to be able to retain the pre-1970 material in a format that can be used in exactly the same way as any other time window.

I'm not sure as yet that we have agreement on exactly what historic data is fundamentally correct and what needs winnowing simply because it IS on shaky assumptions? The data that I've reviewed confirms what I had already assumed and is valid back to the 1880's and I think there has been a similar statement relating to US data?

I've already outlined what I think is a good compromise to bring in LMT information without adding extra zones, along with the maintaining of extra pre-1970 data which just creates additional zones for just a short period of time. I'm just not sure anybody agrees with my analysis? A system that returns a list of timezones based on a supplied date would seem to me to simplify everything without losing the critical historic zones? And move far enough back we just use LMT (or solar time) based on longitude ...

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk
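The start-date/end-date idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout, the function names `active_zones` and `resolve`, and the specific dates are placeholders invented for the example, not researched tzdb data or an existing tzdb format.

```python
from datetime import date

# Each record: (zone ID, first valid date, end date or None if open ended,
#               zone it is linked to after the end date).
# All dates here are placeholders for illustration only.
RECORDS = [
    ("Europe/London", date(1880, 1, 1), None, None),
    ("Europe/Isle_of_Man", date(1883, 3, 30), date(1922, 1, 1), "Europe/London"),
]


def active_zones(on):
    """Return the zone IDs whose validity window contains the given date."""
    return [
        tzid
        for tzid, start, end, _ in RECORDS
        if start <= on and (end is None or on < end)
    ]


def resolve(tzid, on):
    """Follow merge links until reaching a zone still open on the given date."""
    by_id = {record[0]: record for record in RECORDS}
    while True:
        _, _, end, successor = by_id[tzid]
        if end is None or on < end:
            return tzid
        tzid = successor
```

With this layout, asking for the zones active on a date before the hypothetical merge returns both IDs, while a later date returns only the surviving zone, and `resolve` maps the legacy ID onto it.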

Guy Harris

11:30 p.m.

On Sep 3, 2013, at 1:02 PM, Lester Caine <lester@lsces.co.uk> wrote:

...

I was not talking about any particular statement, but the generality of removing older data from a system that is in active use. Many of the comments relate to winnowing old data and essentially ignoring the historic material. I on the other hand would prefer to be able to retain the pre-1970 material in a format that can be used in exactly the same way as any other time window.

I think Zefram's first usage of the term "winnowing" was when he said:

...

If this is a popular idea, I think I should expand a bit on how to deal with it. I alluded earlier to a problem in manual zone selection, where the user may be forced to choose between zones that are equivalent for eir purposes. The same approach really addresses both issues, and it's worth generalising.

The essential process is to winnow a set of timezones so that only inequivalent zones remain. Equivalence is in general defined by the user, specifically by the user indicating a range of years that is of interest. The kind of cutoff discussed so far describes the lower end of the range; some applications would also benefit from being able to specify an upper end.

The key here is "Equivalence is in general *defined by the user*, specifically by the user indicating a range of years that is of interest." "The user" might be the ultimate end user or the software package using the tzdb.

I.e., the tzdb wouldn't itself force all users to ignore the historic material, but it would *allow* users to do so.

Some or all UN*X systems using the tzdb for localtime() and mktime() might well winnow out all cases where zones differ only prior to 1970, so as to reduce the number of time zones of which somebody configuring the system has to be aware, i.e., not "[forcing] the user to choose between zones that are equivalent for their purposes". That might, however, surprise some current users of those systems, and applications using the system-supplied version of the tzdb on those systems.

Other applications might winnow out nothing, or might winnow out cases where zones only differ prior to 1900, or some other specified date.
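As a rough illustration of the winnowing process quoted above, the following sketch groups zones that are indistinguishable within a user-chosen range of years. It is not Zefram's implementation or any tzdb tool: the `winnow` function, the per-year offset representation, and the `Example/...` zone names and values are all assumptions made up for the example.

```python
from collections import defaultdict


def winnow(zones, first_year, last_year):
    """Group zone IDs that have identical data for every year in the range.

    `zones` maps a zone ID to a dict of {year: UTC offset in minutes}.
    Each returned group holds zones that are equivalent for this user.
    """
    groups = defaultdict(list)
    for tzid, offsets_by_year in sorted(zones.items()):
        # Only the years the user cares about contribute to the comparison key.
        key = tuple(
            (year, offset)
            for year, offset in sorted(offsets_by_year.items())
            if first_year <= year <= last_year
        )
        groups[key].append(tzid)
    return sorted(groups.values())


# Hypothetical toy data, not real tzdb content.
ZONES = {
    "Example/A": {1960: 0, 1970: 60, 1980: 60},
    "Example/B": {1960: 30, 1970: 60, 1980: 60},
    "Example/C": {1960: 0, 1970: 120, 1980: 120},
}
```

With a cutoff of 1970, `Example/A` and `Example/B` fall into one group because they differ only in 1960; widening the range to include 1960 keeps all three zones distinct. The full data set is never modified, which matches the point that the tzdb would allow users to ignore historic material without forcing them to.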

random832＠fastmail.us

12:34 p.m.

On Tue, Sep 3, 2013, at 6:00, Stephen Colebourne wrote:

...

2) Zone ID for America/Shiprock

Reinstate Yes/No?

I will point out that, for instance, for all the ridiculous cluster of zones there are for Indiana, we don't have zones for the few pieces of Indiana that actually match zones outside the state. Nor for the parts of Idaho that don't follow America/Boise.

This seems to be an artifact of a simpler period in the database - the descriptions in the original commit of zone.tab are:

+US +394421-1045903 America/Denver Mountain Time
+US +364708-1084111 America/Shiprock Mountain Time - Navajo
+US +332654-1120424 America/Phoenix Mountain Standard Time - Arizona

With that description for America/Phoenix, it was necessary for America/Shiprock to describe the parts of Arizona that didn't match that rule.

Paul Eggert

2:46 p.m.

random832@fastmail.us wrote:

...

for all the ridiculous cluster of zones there are for Indiana, we don't have zones for the few pieces of Indiana that actually match zones outside the state. Nor for the parts of Idaho that don't follow America/Boise.

This seems to be an artifact of a simpler period in the database

No, it's not an artifact. zone.tab was never intended to be a definitive delineation of sub-country regions, and it does not have that role now, either. Kansas is split into two time zones, for example, but we don't have two entries for it in zone.tab -- the people in Kansas can use America/Denver or America/Chicago as appropriate. The situation for Indiana is similar: people in (say) Gary can use America/Chicago.

random832＠fastmail.us

3:18 p.m.

On Tue, Sep 3, 2013, at 10:46, Paul Eggert wrote:

...

No, it's not an artifact. zone.tab was never intended to be a definitive delineation of sub-country regions, and it does not have that role now, either.

Kansas is split into two time zones, for example, but we don't have two entries for it in zone.tab -- the people in Kansas can use America/Denver or America/Chicago as appropriate. The situation for Indiana is similar: people in (say) Gary can use America/Chicago.

You're being unnecessarily aggressive when all I was doing was pointing out two facts:

A) the description for America/Phoenix said "Arizona"
B) there was no indication in America/Denver that it did not apply to parts of Arizona.

I could go back further and point out that (back when the names were self-explanatory) they were called "US/Arizona" (again, implying in the absence of anything else that it would be all of Arizona) and "Navajo".

random832＠fastmail.us

3:20 p.m.

On Tue, Sep 3, 2013, at 10:46, Paul Eggert wrote:

...

Kansas is split into two time zones, for example, but we don't have two entries for it in zone.tab -- the people in Kansas can use America/Denver or America/Chicago as appropriate. The situation for Indiana is similar: people in (say) Gary can use America/Chicago.

I was pointing out the Indiana thing _in support of_ removing America/Shiprock. I'm not sure what you're disagreeing with here.

Paul Eggert

3:44 p.m.

random832@fastmail.us wrote:

...

I'm not sure what you're disagreeing with here.

My apologies for misunderstanding your comments. I think we're in agreement on the main points.

Guy Harris

7:25 p.m.

On Sep 3, 2013, at 3:00 AM, Stephen Colebourne <scolebourne@joda.org> wrote:

...

1) Should there be one zone ID in the main files (not "backward") for each inhabited ISO-3166 code?

(Presumably meaning "*at least* one zone ID"; having only one zone ID for the ISO 3166 code "US" would obviously be wrong, and the same applies to the codes "CA", "RU", and "AU".)

Guy Harris

8:38 p.m.

On Sep 3, 2013, at 3:00 AM, Stephen Colebourne <scolebourne@joda.org> wrote:

...

4) Recognise correct usage of "backward" file As per ECMAscript: http://norbertlindenberg.com/ecmascript/intl.html#sec-6.4.2 the "backward" is, and will continue to be, used for canonicalization - ie. identifying which IDs can be normalized away.

I presume

If ianaTimeZone is a Link name, then let ianaTimeZone be the corresponding Zone name as specified in the “backward” file of the IANA Time Zone Database.

was intended either to be

If ianaTimeZone is a Link name *that appears in the “backward” file of the IANA Time Zone Database*, then let ianaTimeZone be the corresponding Zone name as specified in that file.

or to be

If ianaTimeZone is a Link name, then let ianaTimeZone be the corresponding Zone name as specified in whichever file of the IANA Time Zone Database specifies the Link name.

because there are Link names that *don't* appear in the "backward" file. (I've sent a message to Norbert to that effect.)

With the first of those formulations, Europe/Zagreb would *not* get canonicalized to Europe/Belgrade, and Europe/Istanbul would *not* get canonicalized to Asia/Istanbul, for example.

With the second of those formulations, *both* of those canonicalizations would occur.

My guess is that the first of those formulations was intended, i.e. that two Linked-together tzids are to be thought of as separate tzids (that happen to share GMT offsets etc.) unless the Link line is in the "backwards" file.
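The difference between the two formulations can be sketched as below. This is a minimal illustration, not the ECMAScript specification's algorithm or any real library's API: the `LINKS` table, its two entries, their assumed file placement, and both function names are hypothetical and chosen only to show the contrast.

```python
# Illustrative link table: Link name -> (Zone name, file holding the Link line).
# The entries and their file placement are assumptions for this sketch.
LINKS = {
    "Europe/Zagreb": ("Europe/Belgrade", "europe"),
    "US/Arizona": ("America/Phoenix", "backward"),
}


def canonicalize_backward_only(tzid):
    """First formulation: only Link lines in "backward" define aliases."""
    target, source_file = LINKS.get(tzid, (tzid, None))
    return target if source_file == "backward" else tzid


def canonicalize_all_links(tzid):
    """Second formulation: every Link name is an alias, whatever its file."""
    target, _ = LINKS.get(tzid, (tzid, None))
    return target
```

Under the first function a Link that lives outside "backward" keeps its own ID, while under the second it is replaced by its Zone name; a name that is not a Link at all is returned unchanged by both.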

Norbert Lindenberg

1:16 a.m.

On Sep 3, 2013, at 13:38 , Guy Harris <guy@alum.mit.edu> wrote:

...

On Sep 3, 2013, at 3:00 AM, Stephen Colebourne <scolebourne@joda.org> wrote:

...
4) Recognise correct usage of "backward" file As per ECMAscript: http://norbertlindenberg.com/ecmascript/intl.html#sec-6.4.2 the "backward" is, and will continue to be, used for canonicalization - ie. identifying which IDs can be normalized away.

I presume

If ianaTimeZone is a Link name, then let ianaTimeZone be the corresponding Zone name as specified in the “backward” file of the IANA Time Zone Database.

was intended either to be

If ianaTimeZone is a Link name *that appears in the “backward” file of the IANA Time Zone Database*, then let ianaTimeZone be the corresponding Zone name as specified in that file.

or to be

If ianaTimeZone is a Link name, then let ianaTimeZone be the corresponding Zone name as specified in whichever file of the IANA Time Zone Database specifies the Link name.

because there are Link names that *don't* appear in the "backward" file. (I've sent a message to Norbert to that effect.)

With the first of those formulations, Europe/Zagreb would *not* get canonicalized to Europe/Belgrade, and Europe/Istanbul would *not* get canonicalized to Asia/Istanbul, for example.

With the second of those formulations, *both* of those canonicalizations would occur.

My guess is that the first of those formulations was intended, i.e. that two Linked-together tzids are to be thought of as separate tzids (that happen to share GMT offsets etc.) unless the Link line is in the "backwards" file.

Thanks for the feedback - this is a bug in the spec. The intent was the second - to treat all Zone names as canonical names and all Link names as aliases. The mistake was in the assumption that all Link entries are contained in the "backward" file. I've filed a bug report.

Norbert

Guy Harris

8:18 a.m.

On Sep 3, 2013, at 6:16 PM, Norbert Lindenberg <ietf@lindenbergsoftware.com> wrote:

...

Thanks for the feedback - this is a bug in the spec. The intent was the second - to treat all Zone names as canonical names and all Link names as aliases. The mistake was in the assumption that all Link entries are contained in the "backward" file. I've filed a bug report.

...so, if the reason to move the Europe/... entries that link to Europe/Belgrade out of "backward" to "europe" is to avoid having tzids with names containing other former Yugoslavian cities canonicalized to Europe/Belgrade, that's not going to suffice - the only way to do that is not to have them be links at all.


13 comments

6 participants

participants (6)

Guy Harris
Lester Caine
Norbert Lindenberg
Paul Eggert
random832＠fastmail.us
Stephen Colebourne
