Issues with pre-1970 information in TZDB

Stephen Colebourne

Sept. 22, 2021

10:52 a.m.

(David Braverman recently asked for a summary of the issue. This is my attempt to summarize in a relatively even-handed way.)

The TZDB theory file starts with the following:

"The tz database attempts to record the history and predicted future of civil time scales. It organizes time zone and daylight saving time data by partitioning the world into timezones whose clocks all agree about timestamps that occur after the POSIX Epoch (1970-01-01 00:00:00 UTC). Although 1970 is a somewhat-arbitrary cutoff, there are significant challenges to moving the cutoff earlier even by a decade or two, due to the wide variety of local practices before computer timekeeping became prevalent."

I have always thought this is a wise choice for the management of time zone data. I suspect that most people on this list would agree.

The rules for creating new IDs where post-1970 data differs are well understood and, I believe, agreed upon. Apart from debates over negative daylight saving, I do not believe there are any significant issues with the data post-1970.

Despite the theory file introduction above, TZDB does in fact contain data for some locations before 1970. The issue at hand is which timezone IDs are allowed to have this data and which are not. And more broadly, the degree to which it is acceptable to change the status quo on the pre-1970 data.

Over many years, pre-1970 data was added to many different IDs. However, over recent years, pre-1970 data has been removed. (Technically, it has been moved to another file, but for the purposes of those consuming the main set of tzdb files it has effectively been removed.)

The net result of recent changes is a set of situations which might be described as "cleaner", "more equitable", "nonsensical", "unacceptable" or "less equitable" depending on your viewpoint. For example, Europe/Berlin has pre-1970 data, but Europe/Oslo and Europe/Stockholm do not. In technical terms, Europe/Oslo and Europe/Stockholm are now aliases for Europe/Berlin (known as Links in tzdb).

Where the problem lies is that a user who queries Europe/Oslo for the timezone offset in 1950 used to get the data for Oslo but will now get the data for Berlin. Depending on your viewpoint this is "irrelevant", "unfortunate" or "offensive".

Why has the data changed? Because Oslo, Stockholm and Berlin all have the same timezone data post-1970, and Berlin is the largest city. The argument is that if only post-1970 data matters (as per the theory file), then there is no justification for three separate data sets when only one will do, and the one chosen is the one with the largest city. The counter argument is that merging data sets across country boundaries is unacceptable and politically naive, particularly when there were no complaints about the previous status quo.

More broadly, different individuals, some representing organizations, have expressed different opinions on what they do or do not want from the tzdb data set. Some would like a full historical record of time zone data, others want stability, and many, I suspect, have absolutely no interest whatsoever in pre-1970 data.

My personal concerns are data stability (agreed managed changes are OK), and the politically-sensitive inaccuracy that results from merging across country boundaries.

Can we keep responses limited on this thread? Perhaps only respond if you think I've mischaracterized the issues at stake here? Or missed something obvious?

Stephen
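To make the effect of a Link concrete, here is a minimal Python sketch. The dictionaries and the `resolve` and `data_for` helpers are hypothetical, and the stored value is a placeholder string rather than real transition data. The only facts taken from the message above are that Europe/Oslo and Europe/Stockholm are now Links to Europe/Berlin.

```python
# Toy model of tzdb Zones and Links (hypothetical structures for illustration).
# The value stored for each Zone is a placeholder string, not real data.
ZONES = {"Europe/Berlin": "Berlin history, including its pre-1970 entries"}
LINKS = {
    "Europe/Oslo": "Europe/Berlin",
    "Europe/Stockholm": "Europe/Berlin",
}


def resolve(name: str) -> str:
    """Follow Links until a Zone is reached and return that Zone's name."""
    seen = set()
    while name in LINKS:
        if name in seen:
            raise ValueError(f"link cycle at {name}")
        seen.add(name)
        name = LINKS[name]
    if name not in ZONES:
        raise KeyError(name)
    return name


def data_for(name: str) -> str:
    """Return the data served for an identifier, whether Zone or Link."""
    return ZONES[resolve(name)]


print(resolve("Europe/Oslo"))  # Europe/Berlin
```

In this model a query for Europe/Oslo can only ever return whatever is stored for Europe/Berlin, for every year, which is the behavior change described above.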

David Braverman

September 2021

2:34 p.m.

Thank you, I appreciate this.

David Braverman

Brooks Harris

6:31 p.m.

I've been following this discussion for months but I now find myself a little confused about what the issues are and where the controversy arises.

As I've understood it, there seem to have been three issues that motivated Paul to implement the 'merger':

1) Reducing the number of time zones to reduce the size of the distribution.
2) Merging time zones that have identical rule sets (at least since 1970).
3) Separating pre-1970 rule sets from post-1970 rule sets.

Stephen's summary is helpful but seems to concentrate on the ramifications of the 'merger' on time zone names (tags) and country codes, which I agree is an important consideration.

I'm inclined to agree with those who feel a reversion to 2021a would be the best approach. Indeed the changes would have significant consequences to my current (in development) tzdb parser which reads the source files directly with no modifications to accumulate all time zones that have existed. Filtering this comprehensive list of time zones to some subset for specific purposes, such as CLDR for Windows, is then my client application's responsibility. I'd really rather not have to make significant changes to accommodate the reorganization of the merged tzdb. I gather I am not alone.

I am most thankful for and impressed by Paul's contributions over the years. Maybe I could ask him to summarize his initial reasons for the merger and why he feels reverting to 2021a is not a good interim solution?

Thanks,
-Brooks Harris

On 2021-09-22 10:34 AM, David Braverman via tz wrote:

...

Paul Eggert

4:47 p.m.

On 9/22/21 11:31 AM, Brooks Harris via tz wrote:

...

Indeed the changes would have significant consequences to my current (in development) tzdb parser which reads the source files directly with no modifications to accumulate all time zones that have existed.

If the goal is to accumulate all timezones that have ever existed in tzdb, the parser should read 'backzone', as 'backzone' has for some time been the repository for entries that were formerly Zones but are now Links in the default database.

If the parser reads 'backzone', you shouldn't notice effects due to the recent alike-since-1970 changes. Otherwise the parser should be changed to read 'backzone', regardless of whether the recent alike-since-1970 changes are present.

Even 'backzone' won't suffice to achieve the goal I mentioned, as I vaguely recall some deleted Zones never made it to backzone way back when. I don't recall the details, unfortunately. You can get most of the details by looking at the Git history, I expect, but it'd be a bit of a job. If you find out anything from that search, please let us know, as I expect these old deleted Zones should be put into 'backzone' though this is low priority.
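As a rough illustration of a parser that reads 'backzone' alongside the main files, the Python sketch below records whether each identifier appears as a Zone or as a Link. The function name `collect_names` and the sample texts are hypothetical. The precedence rule (a Zone wins over a Link of the same name) is an assumption made for this sketch, not a statement about how zic behaves, and the Europe/Oslo Zone line is a placeholder rather than real data.

```python
def collect_names(*sources: str) -> dict[str, str]:
    """Map each identifier to "Zone" or "Link" across several tzdb source texts.

    Assumption for this sketch: a name seen as a Zone in any source is
    reported as a Zone, even if another source declares it as a Link.
    """
    kinds: dict[str, str] = {}
    for text in sources:
        for raw in text.splitlines():
            fields = raw.split("#", 1)[0].split()  # strip comments, tokenize
            if len(fields) >= 2 and fields[0] == "Zone":
                kinds[fields[1]] = "Zone"
            elif len(fields) >= 3 and fields[0] == "Link":
                # Link lines have the form: Link TARGET LINK-NAME
                kinds.setdefault(fields[2], "Link")
    return kinds


# Stand-in for the main data files.
MAIN = """\
Zone Asia/Tokyo 9:18:59 - LMT 1887 Dec 31 15:00u
                9:00 Japan J%sT
Link Europe/Berlin Europe/Oslo
"""

# Stand-in for 'backzone'.
BACKZONE = """\
Zone Europe/Oslo 1:00 - CET  # placeholder era, NOT the real Oslo history
"""

print(collect_names(MAIN))
print(collect_names(MAIN, BACKZONE))
```

With only the main text, Europe/Oslo shows up as a Link; once the backzone text is also read, it shows up as a Zone, while the set of names stays the same.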

Brooks Harris

6:28 p.m.

On 2021-09-23 12:47 PM, Paul Eggert wrote:

...

Thanks Paul. Yes, I'm reading backzone. As mentioned, my approach sought to first accumulate ALL the information and to leave it to higher layers of the client to make choices for its specific purpose. A particular example I'm concerned with is filtering the tzdb data through CLDR windowsZones.xml. Other clients may make other choices for other target OSs or applications.

I have a question. What are the criteria that would qualify a zone or rule set as "pre-1970"? It seems the first rule set of many time zones relevant to a 1970 start precedes 1970. For example:

# Zone  NAME              STDOFF    RULES  FORMAT  [UNTIL]
Zone    America/New_York  -4:56:02  -      LMT     1883 Nov 18 12:03:58
                          -5:00     US     E%sT    1920
                          -5:00     NYC    E%sT    1942
                          -5:00     US     E%sT    1946
                          -5:00     NYC    E%sT    1967
                          -5:00     US     E%sT

The last zone era begins in 1967.

Other time zones may have much earlier 'starting points'. For instance:

# Rule  NAME   FROM  TO    TYPE  IN   ON      AT     SAVE  LETTER/S
Rule    Japan  1948  only  -     May  Sat>=1  24:00  1:00  D
Rule    Japan  1948  1951  -     Sep  Sat>=8  25:00  0     S
Rule    Japan  1949  only  -     Apr  Sat>=1  24:00  1:00  D
Rule    Japan  1950  1951  -     May  Sat>=1  24:00  1:00  D

# Zone  NAME        STDOFF   RULES  FORMAT  [UNTIL]
Zone    Asia/Tokyo  9:18:59  -      LMT     1887 Dec 31 15:00u
                    9:00     Japan  J%sT

That has a single zone era starting in 1887, and the latest DST rule in 1951.

It seems these time zones could not be changed or merged to backzone. So what constitutes "pre-1970" and justifies a move to backzone?

I gather this 'merge' procedure began in 2015? Is there documentation or discussion of that decision?

-Brooks
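The observation that the last New York era begins in 1967 can be checked mechanically. The following Python sketch splits a Zone block into eras. The helper `parse_eras` is hypothetical and handles only the simple layout quoted above (one Zone line followed by continuation lines); it is not a full tzdb parser.

```python
def parse_eras(block: str) -> list[dict]:
    """Split one Zone block into eras with STDOFF, RULES, FORMAT and UNTIL."""
    eras = []
    for raw in block.splitlines():
        fields = raw.split("#", 1)[0].split()  # strip comments, tokenize
        if not fields:
            continue
        if fields[0] == "Zone":
            fields = fields[2:]  # drop the "Zone" keyword and the NAME
        stdoff, rules, fmt, *until = fields
        eras.append({
            "stdoff": stdoff,
            "rules": rules,
            "format": fmt,
            # The final era has no UNTIL field: it is still in effect.
            "until": " ".join(until) or None,
        })
    return eras


NEW_YORK = """\
# Zone NAME STDOFF RULES FORMAT [UNTIL]
Zone America/New_York -4:56:02 - LMT 1883 Nov 18 12:03:58
        -5:00 US E%sT 1920
        -5:00 NYC E%sT 1942
        -5:00 US E%sT 1946
        -5:00 NYC E%sT 1967
        -5:00 US E%sT
"""

eras = parse_eras(NEW_YORK)
# Each era starts where the previous one ends, so the start of the last era
# is the UNTIL value of the second-to-last line.
print(eras[-2]["until"])  # 1967
```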

Paul Eggert

12:35 a.m.

On 9/23/21 11:28, Brooks Harris wrote:

...

It seems these time zones could not be changed or merged to backzone. So what constitutes "pre-1970" and justifies a move to backzone?

This is documented in the guidelines for timezone identifiers:

https://data.iana.org/time-zones/theory.html#naming

Look for "1970" on that page.

...

I gather this 'merge' procedure began in 2015? Is there documentation or discussion of that decision?

I think it was more like 2013 but do not recall the details. Discussions are archived here: http://mm.icann.org/pipermail/tz/

Brooks Harris

1:42 p.m.

On 2021-09-23 8:35 PM, Paul Eggert wrote:

...

Sure enough, a long discussion begins in 2013.

[tz] Dealing with Pre-1970 Data
http://mm.icann.org/pipermail/tz/2013-August/019674.html

There are no doubt many others in later years but they are difficult to find in the archives. Further references appreciated.

Of course I've read 'theory' many times, and reviewed it again. But I still do not understand the criteria by which any time zone would be moved to 'backzone'.

Let me try a simple example question: Why is Asia/Tokyo retained while Europe/Oslo is moved?

Paul Eggert

8:59 p.m.

On 9/24/21 6:42 AM, Brooks Harris wrote:

...

Why is Asia/Tokyo retained while Europe/Oslo is moved?

Among named locations, Asia/Tokyo represents a unique set of post-1970 timestamps. Europe/Oslo does not.

Michael H Deckers

9:59 p.m.

On 2021-09-24 13:42, Brooks Harris via tz asked:

...

Why is Asia/Tokyo retained while Europe/Oslo is moved?

Let me try to explain.

Europe/Oslo and Europe/Berlin agree since 1966, and tzdb only wants to describe local time scales since 1970, so one of them suffices. The tzdb rule for that case mandates that the larger city is taken; the other city (Oslo) is moved to backzone, where all the currently unnecessary timezone data are kept.

As for Asia/Tokyo, when it is found that its local time agrees with that of Pacific/Palau since 1952, then Palau will be moved to backzone.

While the typical interfaces require that local times extend before 1970, the exact values for the far past are usually not so important except for a few user groups: astronomers and astrologers need exact data for the past, while databases and similar systems need stable data for the past.

When a user specifies Europe/Oslo, she gets the data of Europe/Berlin unless the system she uses was built with backzone included (which is rare). This leads to surprises: when Berlin switches to permanent summer time some time in the future, then Europe/Oslo must be released from backzone, and the pre-1970 data for Oslo will change for no reason obvious to the user. Therefore, for better stability backzone should be included (at least the part with post-1970 duplicates), while smaller data size is achieved without it.

Michael Deckers.
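The selection rule described above (keep one Zone per set of identical post-1970 behavior, choosing the larger city) can be sketched in Python as follows. This is not tzdb's actual tooling: the function `choose_zones` is hypothetical, the "fingerprint" strings merely stand in for a zone's post-1970 transition data, and the population figures are rough, illustrative values only.

```python
from collections import defaultdict


def choose_zones(post_1970: dict[str, str],
                 population: dict[str, int]) -> dict[str, list[str]]:
    """Group names by post-1970 behavior and keep the most populous per group.

    Returns a mapping from each kept Zone to the names that would become
    Links to it.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name, fingerprint in post_1970.items():
        groups[fingerprint].append(name)

    result: dict[str, list[str]] = {}
    for names in groups.values():
        keeper = max(names, key=lambda n: population[n])
        result[keeper] = sorted(n for n in names if n != keeper)
    return result


# Placeholder fingerprints: equal strings mean "identical since 1970".
POST_1970 = {
    "Europe/Berlin": "A",
    "Europe/Oslo": "A",
    "Europe/Stockholm": "A",
    "Asia/Tokyo": "B",
}

# Rough, illustrative population figures (assumptions, not authoritative).
POPULATION = {
    "Europe/Berlin": 3_600_000,
    "Europe/Oslo": 700_000,
    "Europe/Stockholm": 1_000_000,
    "Asia/Tokyo": 14_000_000,
}

print(choose_zones(POST_1970, POPULATION))
```

Under these inputs Europe/Berlin is kept with Europe/Oslo and Europe/Stockholm as Links, and Asia/Tokyo is kept on its own because nothing else shares its post-1970 fingerprint.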

Brooks Harris

11:03 p.m.

On 2021-09-24 5:59 PM, Michael H Deckers wrote:

...

Hi Michael, thanks. So smaller data size is the driving motivator?

Michael H Deckers

8:51 p.m.

On 2021-09-23 16:47, Paul Eggert via tz wrote:

...

Even 'backzone' won't suffice to achieve the goal I mentioned, as I vaguely recall some deleted Zones never made it to backzone way back when. I don't recall the details, unfortunately.

I only know of two cases where a timezone with data was converted to a link and the data never reappeared:

America/Santa_Isabel on 2016-01-27 (2016a)
Pacific/Yap on 2005-08-22 (2005l)

Some timezones such as Asia/Ishigaki were in tzdb for a very short time and then were dropped without trace (1999b).

Michael Deckers.

Murray S. Kucherawy

7:16 p.m.

On Wed, Sep 22, 2021 at 3:52 AM Stephen Colebourne via tz <tz@iana.org> wrote:

...

Can we keep responses limited on this thread? Perhaps only respond if you think I've mischaracterized the issues at stake here? Or missed something obvious?

My understanding is that these data are being moved from the regional files to the backzone file. It's been pointed out before that a compile-time option can be set to include those entries in the production output of the build. If that's wrong or incomplete, please do correct me. In either case, can you please explain why that compile-time option is not an acceptable solution for those who object to the change? -MSK

Tom Lane

7:34 p.m.

"Murray S. Kucherawy via tz" <tz@iana.org> writes:

...

My understanding is that these data are being moved from the regional files to the backzone file. It's been pointed out before that a compile-time option can be set to include those entries in the production output of the build.

Correct.

...

If that's wrong or incomplete, please do correct me. In either case, can you please explain why that compile-time option is not an acceptable solution for those who object to the change?

There are two core difficulties from my perspective:

1. It's not possible to separate the new backzone zones from the old. It's therefore impossible to generate a TZif tree that matches the prior dataset: you can either lose the data for the moved zones, or gain it back while also absorbing a whole lot of other changes of dubious quality. (If they weren't dubious, they wouldn't have been in backzone to begin with.) Either choice forces dubious changes to one or another subset of the zones.

We know for certain that not including backzone will degrade the data for the moved zones. It's less clear how damaging adding backzone will be to the overall quality of the data set, but presumably if people felt really good about those entries, they'd have been in the main data set.

2. This approach puts it on individual tzdb distributors to decide which of these two options to choose. Some will choose differently than others, meaning we'll now have two received versions of tzdb, which is as bad as a fork from the perspective of end users.

(It was argued that we already have problem #2 because some distributors already use backzone. AFAICT that's only a small minority though. It'd likely become a much bigger issue.)

regards, tom lane

Tom Lane

7:51 p.m.

I wrote:

...

1. It's not possible to separate the new backzone zones from the old. It's therefore impossible to generate a TZif tree that matches the prior dataset: you can either lose the data for the moved zones, or gain it back while also absorbing a whole lot of other changes of dubious quality. (If they weren't dubious, they wouldn't have been in backzone to begin with.) Either choice forces dubious changes to one or another subset of the zones.

BTW, maybe it's necessary to clarify: adding backzone to the compile does not result in a change in the set of zone names that are generated. What it changes is the data presented for some of the zones.

...

From my perspective, it'd have been better or at least more honest if the moved zones had disappeared from the TZif tree altogether. It would then be apparent to end users that they were depending on a no-longer-supported zone definition. As-is, it's a subtle data change that you might not notice for a really long time, leaving you with a big mess to clean up when you do notice.

regards, tom lane

Paul Eggert

5:06 p.m.

On 9/22/21 12:51 PM, Tom Lane via tz wrote:

...

From my perspective, it'd have been better or at least more honest if the moved zones had disappeared from the TZif tree altogether.

That's what we did long ago, but as I vaguely recall there was resistance to removing data. Hence 'backzone', which is not part of the default database, and which contains data that's unnecessary according to the current guidelines, data that are often of lower quality. The current disagreement is mostly over whether part of 'backzone' is necessary. There is an argument that some (but not all) of the 'backzone' data is necessary in order to accommodate certain political concerns, and that the guidelines should be altered to say that these political concerns are enough to justify Zones that would not otherwise be needed. I'm skeptical of that argument, partly because that means there'll be more low-quality data in the database, partly because it means more work for us and for our users for essentially zero benefit, and partly because the more we entangle tzdb with politics the more time we'll waste in political debates.

Tom Lane

11:02 p.m.

I wrote:

...

2. This approach puts it on individual tzdb distributors to decide which of these two options to choose. Some will choose differently than others, meaning we'll now have two received versions of tzdb, which is as bad as a fork from the perspective of end users.

...

(It was argued that we already have problem #2 because some distributors already use backzone. AFAICT that's only a small minority though. It'd likely become a much bigger issue.)

To put some detail on that claim ... I ran around and checked systems I had handy to see whether the vendor-provided tzdb includes backzone. I checked this by seeing whether Africa/Timbuktu contained different data from Africa/Abidjan, which hasn't been true since 2014f unless you built with backzone. (Some of the system images I checked are a year or two old, but it seems unlikely that any vendors would have changed their policies recently.) Of

	Red Hat (both Fedora and RHEL)
	Debian
	macOS
	FreeBSD
	OpenBSD
	NetBSD

not one is building with backzone. I think it's reasonably safe to assert that the current population of backzone users is negligible.

(Of course, Windows would be the elephant in the room here, but last I heard Windows uses their own timezone database not tzdb.)

regards, tom lane
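One way to reproduce this kind of check is to compare the two compiled files byte for byte. The Python sketch below is an assumption about how such a check could be done, not a description of the method actually used above. The helper name `same_tzif` is hypothetical, and the directory `/usr/share/zoneinfo` mentioned afterwards is a common install location that varies between systems.

```python
from pathlib import Path


def same_tzif(root: str, a: str, b: str) -> bool:
    """Return True if two compiled zone files under `root` are byte-identical."""
    base = Path(root)
    return (base / a).read_bytes() == (base / b).read_bytes()
```

On a system that keeps its compiled data in `/usr/share/zoneinfo`, calling `same_tzif("/usr/share/zoneinfo", "Africa/Timbuktu", "Africa/Abidjan")` and getting `True` would suggest the vendor did not build with backzone. A `False` result only shows that the files differ, which still needs a closer look to confirm the cause.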

Brooks Harris

11:25 p.m.

On 2021-09-22 7:02 PM, Tom Lane via tz wrote:

...

(Of course, Windows would be the elephant in the room here, but last I heard Windows uses their own timezone database not tzdb.)

As I understand it, Windows uses tzdb as filtered through CLDR:

unicode-org/cldr
https://github.com/unicode-org/cldr/blob/main/common/supplemental/metaZones....

So it's "based on" tzdb source data but is a subset of tzdb time zones mapped to backward-compatible Windows "time zones". I don't believe it supports any pre-1970 data.

Howard Hinnant

11:34 p.m.

On Sep 22, 2021, at 7:02 PM, Tom Lane via tz <tz@iana.org> wrote:

...

(Of course, Windows would be the elephant in the room here, but last I heard Windows uses their own timezone database not tzdb.)

Windows now ships IANA data as part of their C++20 std::lib. I believe they use ICU as the delivery vehicle, though that could change in the future. This is in addition to their own timezone database for their OS API. Howard

Eliot Lear

5:49 a.m.

On 22.09.21 21:34, Tom Lane via tz wrote:

...

We know for certain that not including backzone will degrade the data for the moved zones.

Unless, of course, the data moved is questionable or of questionable origin.

Eliot

Scott Kilpatrick

12:40 p.m.

In another thread, Paul wrote:

...

Locations in countries like Norway and Sweden got special treatment by being Zones, whereas locations in countries like Angola and Ethiopia were only Links.

To which Robert Elz asked, why not instead resolve that discrepancy by restoring full data definitions, in the `africa` file, for Africa/Luanda and Africa/Addis_Ababa? That seems like a better resolution. After all, those two IDs had distinct definitions in the past, and then Paul converted them to Links back in 2014 (see commits [1] and [2], respectively).

But really my question is whether Stephen C. et al would accept such a change. If Europe/Stockholm *becoming a Link and losing pre-1970 history* is an unacceptable change to data stability, what about Africa/Luanda *ceasing to be a Link and gaining pre-1970 history*?

[1] https://github.com/eggert/tz/commit/94f941ebd1a02356253290333da2977c3ae28aaf
[2] https://github.com/eggert/tz/commit/6f6f20f0bc739ce3dfd503700b793c5e459e309d

On Thu, Sep 23, 2021 at 1:49 AM Eliot Lear via tz <tz@iana.org> wrote:

...

Michael H Deckers

12:48 p.m.

On 2021-09-23 12:40, Scott Kilpatrick via tz asked:

...

But really my question is whether Stephen C. et al would accept such a change. If Europe/Stockholm *becoming a Link and losing pre-1970 history* is an unacceptable change to data stability, what about Africa/Luanda *ceasing to be a Link and gaining pre-1970 history*?

That is exactly what we had in tzdb from 1993 until 2014g. Michael Deckers.

Stephen Colebourne

12:53 p.m.

On Thu, 23 Sept 2021 at 13:41, Scott Kilpatrick via tz <tz@iana.org> wrote:

...

In another thread, Paul wrote:

...
Locations in countries like Norway and Sweden got special treatment by being Zones, whereas locations in countries like Angola and Ethiopia were only Links.

To which Robert Elz asked, why not instead resolve that discrepancy by restoring full data definitions, in the `africa` file, for Africa/Luanda and Africa/Addis_Ababa? That seems like a better resolution. After all, those two IDs had distinct definitions in the past, and then Paul converted them to Links back in 2014 (see commits [1] and [2], respectively).

But really my question is whether Stephen C. et al would accept such a change. If Europe/Stockholm *becoming a Link and losing pre-1970 history* is an unacceptable change to data stability, what about Africa/Luanda *ceasing to be a Link and gaining pre-1970 history*?

It is one of the possible solutions that is acceptable to me. The question is really "what rule determines whether an ID in the main data files does or does not contain pre-1970 data". Stephen

Tom Lane

2:32 p.m.

Stephen Colebourne via tz <tz@iana.org> writes:

...

...
But really my question is whether Stephen C. et al would accept such a change. If Europe/Stockholm *becoming a Link and losing pre-1970 history* is an unacceptable change to data stability, what about Africa/Luanda *ceasing to be a Link and gaining pre-1970 history*?

...

It is one of the possible solutions that is acceptable to me. The question is really "what rule determines whether an ID in the main data files does or does not contain pre-1970 data".

That's half of the issue. The other half is "if a zone does not contain relevant pre-1970 data, how would end users know that?" The existing answer is "there's no way for them to know", which is pretty awful, especially when we seem to be willing to change the data at the drop of a hat. regards, tom lane

Paul Eggert

5:22 p.m.

On 9/23/21 7:32 AM, Tom Lane via tz wrote:

...

That's half of the issue. The other half is "if a zone does not contain relevant pre-1970 data, how would end users know that?"

For most users the answer is simple: the pre-1970 data should not be relied upon for anything other than an estimate. That data are intended only for exact locations named, and most people in the world do not live in those exact locations. For example, Europe/Berlin is wrong for most of Germany before 1970. Similarly for Europe/Rome and Italy, Asia/Shanghai and China, etc., etc. etc. None of the pre-1970 data should be used for anything important, at least not without understanding its severe limitations. Luckily for us, tzdb users invariably follow the advice I'm giving here, whether consciously or not.

Tom Lane

5:31 p.m.

Paul Eggert <eggert@cs.ucla.edu> writes:

...

On 9/23/21 7:32 AM, Tom Lane via tz wrote:

...
That's half of the issue. The other half is "if a zone does not contain relevant pre-1970 data, how would end users know that?"

...

For most users the answer is simple: the pre-1970 data should not be relied upon for anything other than an estimate. That data are intended only for exact locations named, and most people in the world do not live in those exact locations.

That's fine, but if that is the policy, then the data should in fact be valid (to the best of our ability) for the named location. It's not okay to present data for Europe/Oslo that is simply known incorrect for Oslo. Whether it's exact for anywhere else in Norway is a red herring. In other words, I'm of the opinion that the answer to my question above should be "we don't ever do that". Right now, it isn't. regards, tom lane

Paul Eggert

5:43 p.m.

On 9/23/21 10:31 AM, Tom Lane wrote:

...

...
For most users the answer is simple: the pre-1970 data should not be relied upon for anything other than an estimate. That data are intended only for exact locations named, and most people in the world do not live in those exact locations. That's fine, but if that is the policy, then the data should in fact be valid (to the best of our ability) for the named location.

Oh, sorry, I should have been clear that I was talking about Zones and not Links. Links don't follow that rule, and never have.

Tom Lane

5:49 p.m.

Paul Eggert <eggert@cs.ucla.edu> writes:

...

On 9/23/21 10:31 AM, Tom Lane wrote:

...
That's fine, but if that is the policy, then the data should in fact be valid (to the best of our ability) for the named location.

...

Oh, sorry, I should have been clear that I was talking about Zones and not Links. Links don't follow that rule, and never have.

I think that's a pretty arbitrary and wrongheaded policy, mainly because end users don't know the difference between a zone and a link. All that they can see is whether the data associated with a name is correct for that location; and I maintain that it always should be. Basically the problem here is that the link mechanism, which was (I suppose) designed to serve just to provide common aliases for a zone name, has been abused to allow substitution of more- or less- accurate data for the same zone name. That wasn't a good idea, and its shortcomings are becoming more obvious as more zones are subjected to this abuse. regards, tom lane

Guy Harris

8:03 p.m.

On Sep 23, 2021, at 10:31 AM, Tom Lane via tz <tz@iana.org> wrote:

...

That's fine, but if that is the policy, then the data should in fact be valid (to the best of our ability) for the named location. It's not okay to present data for Europe/Oslo that is simply known incorrect for Oslo. Whether it's exact for anywhere else in Norway is a red herring.

So what is a given tzid supposed to refer to when using it for pre-1970 data? The city in its name, and no other location on Earth? If so, that puts further restrictions on the use of tzdb for pre-1970 dates and times, over and above "this data is based on whatever sources were most recently used to produce it; the correctness of that data depends on the sources used, and some sources are known to have made a number of mistakes, and that data is also subject to change if more demonstrably correct data is found". (It's not acceptable to say "the city in its name, and no other location on Earth" for post-1970 data unless everybody's willing to have a *very large* number of tzids, maintain such a tzdb, and use such a tzdb.)

Tom Lane

8:12 p.m.

Guy Harris <gharris@sonic.net> writes:

...

On Sep 23, 2021, at 10:31 AM, Tom Lane via tz <tz@iana.org> wrote:

...
That's fine, but if that is the policy, then the data should in fact be valid (to the best of our ability) for the named location. It's not okay to present data for Europe/Oslo that is simply known incorrect for Oslo. Whether it's exact for anywhere else in Norway is a red herring.

...

So what is a given tzid supposed to refer to when using it for pre-1970 data? The city in its name, and no other location on Earth?

Why not? That would seem to be the obvious interpretation. Besides which, the existing practice of citing LMT for pre-standardization dates already limits any arguable exactitude to the city's meridian. regards, tom lane

Guy Harris

9:38 p.m.

On Sep 23, 2021, at 1:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

...

Guy Harris <gharris@sonic.net> writes:

...
On Sep 23, 2021, at 10:31 AM, Tom Lane via tz <tz@iana.org> wrote:

...
That's fine, but if that is the policy, then the data should in fact be valid (to the best of our ability) for the named location. It's not okay to present data for Europe/Oslo that is simply known incorrect for Oslo. Whether it's exact for anywhere else in Norway is a red herring.

...
So what is a given tzid supposed to refer to when using it for pre-1970 data? The city in its name, and no other location on Earth?

Why not? That would seem to be the obvious interpretation. Besides which, the existing practice of citing LMT for pre-standardization dates already limits any arguable exactitude to the city's meridian.

OK, then let's make that explicit in, for example, Theory, by indicating that if you use the pre-1970 data for a given tzdb region for anything other than the city that appears in the tzid, there is even less guarantee of correctness than if you use it for the city. What if that data 1) has been determined to be correct based on sources we deem sufficiently reliable and 2) it applies to more than just the city and its environs, but it doesn't apply to, for example, the entire country? Should one or more new tzdb regions be created for those other locations?

scs＠eskimo.com

9:51 p.m.

Guy Harris wrote:

...

What if [the pre-1970] data 1) has been determined to be correct based on sources we deem sufficiently reliable and 2) it applies to more than just the city and its environs, but it doesn't apply to, for example, the entire country? Should one or more new tzdb regions be created for those other locations?

I'm starting to feel like I'm dealing with Vroomfondel and Majikthise here, demanding rigidly-defined areas of doubt and uncertainty. :-) In all seriousness, I think we need to take some of the energy we're expending on the pre-1970 database correctness problem, and redirect it towards exhorting end users *not* to depend on pre-1970 data too much. In particular, if there are people out there who are taking pre-1970 local timestamps, using tzdb to compute proleptic Unix time_t values, and storing those timestamps in databases, such that changes to tzdb will later cause interestingly-wrong local timestamps to pop back out, we need to get the word out that this may not be the best way of doing things, after all!
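A minimal Python sketch of the failure mode described above, under stated assumptions: the two UTC offsets are hypothetical stand-ins for "the rule tzdb gave when the value was stored" and "the rule tzdb gives after an update", and the helper names are illustrative only. It shows a pre-1970 local timestamp being converted to epoch seconds under one offset and read back under another.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_seconds(local_wall, utc_offset):
    """Interpret a naive wall-clock time at the given offset; return Unix seconds."""
    aware = local_wall.replace(tzinfo=timezone(utc_offset))
    return int((aware - EPOCH).total_seconds())

def from_epoch_seconds(seconds, utc_offset):
    """Render Unix seconds as a naive wall-clock time at the given offset."""
    aware = (EPOCH + timedelta(seconds=seconds)).astimezone(timezone(utc_offset))
    return aware.replace(tzinfo=None)

# Hypothetical offsets, NOT taken from tzdb: what the database said when the
# record was written, and what it says after a later data change.
OFFSET_WHEN_STORED = timedelta(hours=1)
OFFSET_AFTER_UPDATE = timedelta(hours=2)

original = datetime(1950, 6, 1, 12, 0)
stored = to_epoch_seconds(original, OFFSET_WHEN_STORED)    # negative: before 1970
read_back = from_epoch_seconds(stored, OFFSET_AFTER_UPDATE)

# The wall-clock time that comes back differs from the one that went in.
print(original, "->", read_back)

# One alternative: keep the wall-clock time and the tzid as recorded, and
# convert only when needed, so a data change cannot silently rewrite the record.
record = (original.isoformat(), "Europe/Oslo")
```

Storing the wall-clock string plus the identifier is only one possible design; it trades the silent shift shown here for a computed instant that can change between tzdb releases.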

Tom Lane

10:20 p.m.

Guy Harris <gharris@sonic.net> writes:

...

OK, then let's make that explicit in, for example, Theory, by indicating that if you use the pre-1970 data for a given tzdb region for anything other than the city that appears in the tzid, there is even less guarantee of correctness than if you use it for the city.

Seems reasonable.

...

What if that data 1) has been determined to be correct based on sources we deem sufficiently reliable and 2) it applies to more than just the city and its environs, but it doesn't apply to, for example, the entire country? Should one or more new tzdb regions be created for those other locations?

I think existing policy is that we don't make separate zones simply because of pre-1970 differences. I'm fine with continuing that, as long as it's modulated by additional policies:

* I don't want to remove any existing zone, even if it's a zone that wouldn't be created per current policy: the amount of grief we'd get for that is just not worth it.

* And I do want to see some policy that allows new per-country zones to be created, even if they don't currently differ from their neighbors. I think that that would largely reduce political pressure from people who feel that "XYZ ought to have its own zone", and it has the merit of future-proofing tzdb against possible country-specific timekeeping law changes.

For example, if Norway someday changes its laws such that it's no longer identical to Europe/Berlin, then it's a whole lot less painful all around if Europe/Oslo already exists and is already being used by most of the affected users. They'll just automatically get the right time from a tzdb update, without having to adjust their settings.

As I've said before, I don't think the TZ Coordinator need be proactive about creating such zones. He need only be willing to accept suitably-researched patches from people who are excited about those cases. regards, tom lane

Guy Harris

10:43 p.m.

On Sep 23, 2021, at 3:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

...

For example, if Norway someday changes its laws such that it's no longer identical to Europe/Berlin, then it's a whole lot less painful all around if Europe/Oslo already exists and is already being used by most of the affected users. They'll just automatically get the right time from a tzdb update, without having to adjust their settings.

It already exists, *and will continue to exist with Paul's most recent changes*. With those changes, it will be a Link rather than a Zone, but that does not prevent any software from using Europe/Oslo as a tzid. Systems that use the current location, if available, can continue to choose Europe/Oslo for locations in Norway (tzdb region boundaries aren't maintained by the tzdb project, and they could continue to treat Europe/Oslo as a separate tzdb region). The issue here is whether any software *above* the tzdb project's reference implementation currently uses backzone tzids, and, if they don't, whether they will stop using Europe/Oslo if it's moved to backzone.

Tom Lane

10:48 p.m.

Guy Harris <gharris@sonic.net> writes:

...

On Sep 23, 2021, at 3:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

...
For example, if Norway someday changes its laws such that it's no longer identical to Europe/Berlin, then it's a whole lot less painful all around if Europe/Oslo already exists and is already being used by most of the affected users. They'll just automatically get the right time from a tzdb update, without having to adjust their settings.

...

It already exists, *and will continue to exist with Paul's most recent changes*.

Perhaps I should have used a different example. Or are you just spreading confusion for the sake of it? regards, tom lane

Guy Harris

11:16 p.m.

On Sep 23, 2021, at 3:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

...

Guy Harris <gharris@sonic.net> writes:

...
On Sep 23, 2021, at 3:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

...
For example, if Norway someday changes its laws such that it's no longer identical to Europe/Berlin, then it's a whole lot less painful all around if Europe/Oslo already exists and is already being used by most of the affected users. They'll just automatically get the right time from a tzdb update, without having to adjust their settings.

...
It already exists, *and will continue to exist with Paul's most recent changes*.

Perhaps I should have used a different example.

The only such example would be one of the tzdb regions that has already been removed, rather than moved to backzone; zones that *were* moved to backzone continue to exist as aliases, and the data is available.

...

Or are you just spreading confusion for the sake of it?

No. I'm pointing out that the data doesn't *disappear*, nor does the ability for software to use Europe/Oslo or America/Montreal or Africa/Asmara as a tzid. However, software that uses tzids might limit itself to tzids corresponding to Zones rather than to Links, even in cases where the tzid does have a Zone in backzone. If so, then the automatic adjustment you mention won't happen; for software that continues to use tzids corresponding to Zones in backzone, it will.

So the question is "what software limits itself to tzids corresponding to Zones rather than Links?", and I suspect the answer is "enough software that moving Zones to backzone could be disruptive".

Perhaps if there were three categories of tzids:

tzids corresponding to Zones in the main files;

tzids corresponding to Zones in backzone;

tzids corresponding to Links regardless of whether you use backzone or not;

(which might involve removing backzone in favor of some other mechanism for putting tzids into those categories) that might make it more likely that tzids corresponding to Zones in backzone will be used by software.

That doesn't solve the pre-1970 data issue, but, again, we need to decide what we want to do with that.
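The three categories can be sketched in Python as follows. This is an illustration under assumptions, not anything tzdb ships: it reads only the unabbreviated `Zone NAME ...` and `Link TARGET LINK-NAME` lines of tzdb source text, the function names are hypothetical, and the sample excerpts use made-up identifiers rather than real data.

```python
def declared_names(source_text):
    """Return (zone_names, link_names) declared in tzdb source text."""
    zones, links = set(), set()
    for raw in source_text.splitlines():
        fields = raw.split("#", 1)[0].split()   # drop comments, split on whitespace
        if len(fields) >= 2 and fields[0] == "Zone":
            zones.add(fields[1])
        elif len(fields) >= 3 and fields[0] == "Link":
            links.add(fields[2])                # Link TARGET LINK-NAME
    return zones, links

def classify(tzid, main_text, backzone_text):
    """Place a tzid in one of the three categories, or 'unknown'."""
    main_zones, main_links = declared_names(main_text)
    back_zones, back_links = declared_names(backzone_text)
    if tzid in main_zones:
        return "zone-in-main"
    if tzid in back_zones:
        return "zone-in-backzone"
    if tzid in main_links or tzid in back_links:
        return "link-only"
    return "unknown"

# Made-up excerpts (hypothetical identifiers, not real tzdb content).
MAIN = """
Zone Example/Capital 1:00 - CET
Link Example/Capital Example/Oldtown   # overridden when backzone is enabled
Link Example/Capital Example/Alias
"""
BACKZONE = """
Zone Example/Oldtown 0:43:00 - LMT 1895
            1:00 - CET
"""

for name in ("Example/Capital", "Example/Oldtown", "Example/Alias"):
    print(name, classify(name, MAIN, BACKZONE))
```

A picker or validator built on such a classification could accept the first two categories as selectable identifiers instead of accepting only Zones from the main files.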

Paul Eggert

5:46 p.m.

On 9/23/21 10:22 AM, Paul Eggert wrote:

...

Luckily for us, tzdb users invariably follow the advice I'm giving here, whether consciously or not.

I should have made it clear that this last sentence was snark. Apologies if anyone took it seriously. (We all know nobody follows my advice, right? :-)

Steffen Nurpmeso

10:33 p.m.

Paul Eggert wrote in <f256f787-9a80-35bc-271b-5269ebf3836f@cs.ucla.edu>:

|On 9/23/21 7:32 AM, Tom Lane via tz wrote:
|> That's half of the issue. The other half is "if a zone does not contain relevant pre-1970 data, how would end users know that?"
|
|For most users the answer is simple: the pre-1970 data should not be relied upon for anything other than an estimate. That data are intended only for exact locations named, and most people in the world do not live in those exact locations.
|
|For example, Europe/Berlin is wrong for most of Germany before 1970.

I do not think so? MEZ (CET) was introduced on 1893-04-01, with clock adjustments all over Germany (-30 .. +30 minute range, I read). This goes so far that GNU date(1) in conjunction with the IANA TZ displays "date: invalid date X", where X is in the range 00:00:00..00:06:31, for TZ=Europe/Berlin!

|Similarly for Europe/Rome and Italy, Asia/Shanghai and China, etc., etc. etc.
|
|None of the pre-1970 data should be used for anything important, at least not without understanding its severe limitations.
|
|Luckily for us, tzdb users invariably follow the advice I'm giving here, whether consciously or not.
--End of <f256f787-9a80-35bc-271b-5269ebf3836f@cs.ucla.edu>

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

Paul Eggert

12:55 a.m.

On 9/23/21 15:33, Steffen Nurpmeso wrote:

...

| |For example, Europe/Berlin is wrong for most of Germany before 1970.

I do not think so? MEZ (CET) was introduced in 1893-04-01,

Sure, but Berlin was under Russian control just after World War II and the Russian commander of the city decided to run Berlin's clocks on Moscow time because that matched the Red Army's clocks. This is why Europe/Berlin is known to be wrong for most of Germany, as well as for Norway and for Sweden. Really, nobody should be taking the pre-1970 part of tzdb seriously unless they really know what they're doing, which they probably don't. That part of tzdb is meant primarily to demonstrate that the infrastructure can in *theory* specify pre-1970 civil time, even though you generally don't want to be *using* that specification outside of artificial Zones like PST8PDT and Etc/UTC.

Michael H Deckers

9:12 p.m.

On 2021-09-23 14:32, Tom Lane via tz asked:

...

That's half of the issue. The other half is "if a zone does not contain relevant pre-1970 data, how would end users know that?"

One way is to determine the offset local - UT for the distant past. Most tzdb timezones that are not links use an offset corresponding closely to the eastern longitude of the location, eg as given in zone.tab. (Locations assumed to be uninhabited in the far past are the exception.) Michael Deckers.
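A rough Python sketch of this check, under stated assumptions: local mean time is taken as 240 seconds per degree of east longitude, the coordinate strings follow the `±DDMM±DDDMM` (or `±DDMMSS±DDDMMSS`) layout used in zone.tab, the sample coordinates for Berlin and Oslo are quoted from memory and should be treated as illustrative, and the 60-second tolerance is an arbitrary choice.

```python
import re

def longitude_degrees(coordinates):
    """Extract the signed longitude from a zone.tab-style coordinate string."""
    m = re.fullmatch(r"[+-]\d{4}(?:\d{2})?([+-])(\d{3})(\d{2})(\d{2})?", coordinates)
    if m is None:
        raise ValueError(f"unrecognized coordinates: {coordinates!r}")
    sign, deg, minutes, seconds = m.groups()
    value = int(deg) + int(minutes) / 60 + int(seconds or 0) / 3600
    return -value if sign == "-" else value

def expected_lmt_seconds(coordinates):
    """Local mean time offset implied by the longitude: 4 minutes per degree."""
    return round(longitude_degrees(coordinates) * 240)

def matches_own_meridian(coordinates, far_past_offset_seconds, tolerance=60):
    """True if the far-past offset is close to the location's own meridian."""
    return abs(expected_lmt_seconds(coordinates) - far_past_offset_seconds) <= tolerance

BERLIN = "+5230+01322"   # illustrative coordinates
OSLO = "+5955+01045"     # illustrative coordinates

berlin_lmt = expected_lmt_seconds(BERLIN)
print(matches_own_meridian(BERLIN, berlin_lmt))  # Berlin's offset fits Berlin
print(matches_own_meridian(OSLO, berlin_lmt))    # Berlin's offset does not fit Oslo
```

The far-past offset itself would come from whatever tz library is in use, for example by asking for the UTC offset of a date such as 1800-01-01 under the tzid in question; a mismatch suggests the identifier is being served another location's data.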

Paul Eggert

5:16 p.m.

On 9/23/21 5:53 AM, Stephen Colebourne via tz wrote:

...

It is one of the possible solutions that is acceptable to me. The question is really "what rule determines whether an ID in the main data files does or does not contain pre-1970 data".

Here's one way to automate any answer to that question.

1. Find out which ISO country codes even 'backzone' doesn't cover, and put into 'backzone' Zone entries for one location in each such country code. (This will require some research.)

2. Use the patch I mentioned here: https://mm.icann.org/pipermail/tz/2021-September/030456.html

3. Write a file 'zonepercc.bp' that determines which part of 'backzone' you want, to support an equitable rule involving one Zone per ISO country code (plus all the Zones that already exist). This is as simple as a list of Zones you want.

4. Run 'make BACKPICK=zonepercc.bp PACKRATDATA=backzone tzdata.zi'

No doubt there are other ways to do it, but for this particular way the software exists already. All that's needed is some patient research and data selection. There's no need to fork; I'd be happy to include the resulting zonepercc.bp file in the tzdb code, and to discuss further ways to make the automation even easier.
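The first step, finding country codes that no Zone covers, might be sketched in Python like this. It is only an illustration: it assumes the tab-separated zone.tab layout (country code, coordinates, tzid, optional comment, one code per row), reads only unabbreviated `Zone` lines, and uses made-up country codes and identifiers as sample input. The function names are hypothetical and are not part of the tzdb tooling.

```python
def zone_names(*sources):
    """Collect every name declared by a 'Zone' line in the given source texts."""
    names = set()
    for text in sources:
        for raw in text.splitlines():
            fields = raw.split("#", 1)[0].split()
            if len(fields) >= 2 and fields[0] == "Zone":
                names.add(fields[1])
    return names

def tzids_by_country(zone_tab_text):
    """Map each country code in zone.tab-style text to its list of tzids."""
    table = {}
    for raw in zone_tab_text.splitlines():
        if not raw.strip() or raw.startswith("#"):
            continue
        columns = raw.split("\t")
        if len(columns) >= 3:
            table.setdefault(columns[0], []).append(columns[2])
    return table

def uncovered_country_codes(zone_tab_text, *sources):
    """Country codes none of whose tzids is a Zone in any given source."""
    zones = zone_names(*sources)
    return sorted(
        cc for cc, ids in tzids_by_country(zone_tab_text).items()
        if not any(tzid in zones for tzid in ids)
    )

# Made-up sample input (hypothetical codes and identifiers).
ZONE_TAB = "\n".join([
    "# country-code\tcoordinates\tTZ\tcomments",
    "\t".join(["XA", "+0100+00100", "Example/Alpha"]),
    "\t".join(["XB", "+0200+00200", "Example/Beta", "sample comment"]),
    "\t".join(["XC", "+0300+00300", "Example/Gamma"]),
])
MAIN = "Zone Example/Alpha 1:00 - CET\nLink Example/Alpha Example/Beta\nLink Example/Alpha Example/Gamma\n"
BACKZONE = "Zone Example/Beta 0:08:00 - LMT 1900\n"

print(uncovered_country_codes(ZONE_TAB, MAIN, BACKZONE))
```

The codes it reports are the ones that would need research before a one-Zone-per-country selection file could be written; the selection file itself is not generated here because its exact format is not specified in this message.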

Robert Elz

3:42 p.m.

Date: Thu, 23 Sep 2021 07:49:31 +0200
From: Eliot Lear via tz <tz@iana.org>
Message-ID: <1949110b-0201-63a3-bb7f-728bc868a7f0@lear.ch>

| Unless of course if the data moved is of questionable origin.

The data may have 3 different (perhaps more if one cared enough) trust levels...

1. It can be backed by verifiable sources that seem to be reliable, so we believe it.

2. It is disputed by verifiable sources, that seem to be reliable, so we consider it incorrect.

3. Neither of those, and we simply have no way to know.

In that, no-one questions use of type 1 data, and type 2 simply becomes type 1 after the obvious correction is made.

The disputes are all about #3 - and certainly you can call that questionable, or arguable, or various other similar terms - none of which actually mean anything. Everything can be questioned, and so is questionable. What the time currently is, right now, in London can be questioned, so that is questionable. Should that data be removed from tzdb because of that?

If you want to claim some data is incorrect, provide evidence to support that, otherwise you're simply maligning whoever supplied the data in the first place (nb: not necessarily the person who edited it into tzdb, but the source of their information.)

For any data to be in tzdb, at some time or other, someone believed it might have been correct (no-one has ever, typos excepted, deliberately added bogus data) - their belief might have been incorrect, but to remove that data, just wondering "was that correct" should not be enough - we should need #2 type evidence - something that shows that some entry is actually wrong. That has happened often enough that we know corrections get made when needed.

Simply discarding data because we neither ever had (or perhaps simply did not record, perhaps because it was not provided to us) the evidence to support some apparent claim is not a sane way forward.

And it is even worse when the effect of this is to (effectively) remove the old data (anything present in backzone is effectively removed, even though it is still there in the file, as no-one uses it).

Eg: to re-use an example that has been used here a bit recently, if someone today looks at the app which translates dates & times of historic events to the local time of the requestor ("what time was it, for me, when Armstrong stepped onto the moon's surface?" etc). If someone in Montreal did that, and got what they believe was the wrong answer (because they remember the event), then investigated why, and saw "We use the data for Toronto, and that answer was correct for Toronto" they'd perhaps wonder why, but that would be the end of it. But if they were instead using the data for Montreal, and it turned out to be incorrect, they'd actually complain (probably to the authors of the app, who would then complain here). That way the data might be corrected. But not if it is buried in backzone.

kre

Tom Lane

4:35 p.m.

Robert Elz via tz <tz@iana.org> writes:

...

If you want to claim some data is incorrect, provide evidence to support that, otherwise you're simply maligning whoever supplied the data in the first place (nb: not necessarily the person who edited it into tzdb, but the source of their information.)

Yeah, that is a fair point. It looks like a lot of the stuff that initially got put into backzone was put there because the only source for it was Shanks, and we've found enough errors in Shanks to have healthy distrust for it. Still, in the absence of other evidence, that remains the best available data.

...

... That way the data might be corrected. But not if it is buried in backzone.

After thinking about this for awhile, I think that a lot of our problems can be summarized as "backzone is misdesigned". Having links in the base dataset that are overwritten by more-extensive information if you enable backzone is just an awful design, because it's got next-door-to-zero discoverability. End users cannot tell if they have the best available info or a lie, unless they understand enough about TZif to look into that file tree --- and unless it's set up using symlinks, which seems to be distinctly a minority practice, even that won't make it very clear. As a modest proposal, therefore, I suggest that we should consider just dropping all the overwritable links. That way, if you have a base dataset, it will be obvious that Africa/Timbuktu is not good data because it won't be there at all. If you enable backzone (which perhaps needs a better name), then you get Africa/Timbuktu along with a ton of other data of perhaps dubious reliability. But you know what you have. The current design where the same zone identifier could refer to two different datasets is bad by any rational standard, and we've only gotten away with it because the field usage of backzone is negligible. But if we keep moving stuff to backzone, that's going to change. There's a separate question of what the rule should be for putting a given zone into the "base" or "extended" collections. But maybe that becomes less of a hill that people are ready to die on. If we do it this way, I foresee a lot of distros starting to ship the "extended" collection -- but they won't be shipping different definitions of the same zone name. regards, tom lane

Eliot Lear

4:37 p.m.

On 23.09.21 18:35, Tom Lane wrote:

...

Robert Elz via tz <tz@iana.org> writes:

...
If you want to claim some data is incorrect, provide evidence to support that, otherwise you're simply maligning whoever supplied the data in the first place (nb: not necessarily the person who edited it into tzdb, but the source of their information.) Yeah, that is a fair point. It looks like a lot of the stuff that initially got put into backzone was put there because the only source for it was Shanks, and we've found enough errors in Shanks to have healthy distrust for it. Still, in the absence of other evidence, that remains the best available data.

Yes. The "maligning" has occurred in the comments in the database already, by the very person who inserted the data. I don't feel the need to go farther. Eliot

Paul Eggert

5:26 p.m.

On 9/23/21 9:35 AM, Tom Lane via tz wrote:

...

After thinking about this for awhile, I think that a lot of our problems can be summarized as "backzone is misdesigned".

I'm fully in agreement that better designs are possible, and would be happy to hear about what those better designs would be. This'll take more work than we could shoehorn into 2021b of course.

Tom Lane

5:38 p.m.

Paul Eggert <eggert@cs.ucla.edu> writes:

...

On 9/23/21 9:35 AM, Tom Lane via tz wrote:

...
After thinking about this for awhile, I think that a lot of our problems can be summarized as "backzone is misdesigned".

...

I'm fully in agreement that better designs are possible, and would be happy to hear about what those better designs would be. This'll take more work than we could shoehorn into 2021b of course.

Agreed. *Please* listen to the universal advice you've received that 2021b should be 2021a plus the Samoa changes. I think there is a path forward, but if you bull forward with shipping git tip as-is, there will not be. regards, tom lane

Paul Eggert

12:31 a.m.

On 9/23/21 10:38, Tom Lane wrote:

...

I think there is a path forward, but if you bull forward with shipping git tip as-is, there will not be.

I don't see why not. The alternate version that has only the Samoa fix should give you a clear path forward in the short term. In the longer term the idea is to come up with a single-repository version that combines the two approaches, which should be a path forward for both of us. Among other things it will give us time to redesign how 'backzone' works, something that really can't be done before the end of the week.

Philip Paeps

2:39 a.m.

On 2021-09-24 08:31:14 (+0800), Paul Eggert via tz wrote:

...

On 9/23/21 10:38, Tom Lane wrote:

...
I think there is a path forward, but if you bull forward with shipping git tip as-is, there will not be.

I don't see why not. The alternate version that has only the Samoa fix should give you a clear path forward in the short term. In the longer term the idea is to come up with a single-repository version that combines the two approaches, which should be a path forward for both of us. Among other things it will give us time to redesign how 'backzone' works, something that really can't be done before the end of the week.

Releasing the current tree as-is will further inflame tempers on this mailing list, making it an even less conducive environment for that discussion. Releasing 2021b as 2021a+Samoa will allow us to discuss the way forward without having to worry about a world with a mix of 2021b and 2021a1 and people who install 2021b and get bitten by behaviour they don't expect. Philip -- Philip Paeps Senior Reality Engineer Alternative Enterprises

Robert Elz

12:52 p.m.

Date: Thu, 23 Sep 2021 17:31:14 -0700
From: Paul Eggert via tz <tz@iana.org>
Message-ID: <56559a18-6bab-d340-67ca-80892dfcf55f@cs.ucla.edu>

| The alternate version that has only the Samoa fix should give you a clear path forward in the short term.

Only if it is tz-latest on the IANA distribution site.

...

From a different message:

| If the rule were "at least one Zone per political unit that has the legal power to set its own rules", we'd have dozens more Zones than we do now, Zones that would cause more trouble than they'd cure.

What trouble would that be? I fail to see it.

And from an earlier message:

| Another reason - more important to my mind - is that sticking with 2021a's blueprint would mean that its equity problems would remain present in whoever uses that blueprint.

Whatever issues exist would remain for a while yes, but assuming they really exist, they've been there for some time now - it is not crucial that any fix be applied this week.

Furthermore, the solution you're adopting is the wrong one, you cannot answer people who claim to be disadvantaged by disadvantaging others - "sorry, we did it to you, but it really isn't just you, we're screwing this other group as well..." Not a rational answer.

The correct fix is to be inclusive. To take an example from a different area which I suspect applies to you. I assume your department does not discriminate against women applicants, right? (Substitute any other sometimes disadvantaged group for "women" in this paragraph if you like). What would happen if one year there were simply no women applicants? Do you go out and kidnap a few, and force them to enrol, in the name of equity? I doubt it. I know, the solution, you refuse to enrol any non-women so that you can show that you're not discriminating against women. That's fair and equitable, right? Perhaps but also insane.

Here we don't even need to go add Angola, Niger, etc - unless someone from there supplies data and requests that it be included. Hypothetical discrimination is not discrimination, just noise. If there is a request to add a zone for one of those, then simply add it. All equitable and fair, and very very simple.

And finally, the most recent suggestion:

| OK, how about if I scale back the current round of link-merging, so that it's on the scale of what we've done in previous releases?

That would depend upon what "scale back" means. If it means "none of" that would be just fine. If it means "all currently proposed, except Oslo" then no, that will not do at all.

Just release 2021a + Samoa (plus Jordan if you feel that's ready, that one is far less urgent) and everything else can wait. There can be another release in a month, or even a week or two, if we can find something we agree upon.

kre

Paul Eggert

9:10 p.m.

On 9/24/21 5:52 AM, Robert Elz wrote:

...

From a different message:

| If the rule were "at least one Zone per political unit that has the legal power to set its own rules", we'd have dozens more Zones than we do now, Zones that would cause more trouble than they'd cure.

What trouble would that be? I fail to see it.

I've tried to cover that topic in other emails but have no doubt missed some points. But here's another point: the set of political units with these powers have changed with time, so do we allow new Zones only political units with these powers since 1970? Or if we reject the other political units then what's our justification for that?

...

If there is a request to add a zone for one of those, then simply add it. All equitable and fair, and very very simple.

Unfortunately it would not work, for reasons I discussed a few minutes ago (not yet in the tzdb archive, but I hope you can find it).

...

| OK, how about if I scale back the current round of link-merging, so that | it's on the scale of what we've done in previous releases?

That would depend upon what "scale back" means. If it means "none of" that would be just fine. If it means "all currently proposed, except Oslo" then no, that will not do at all.

It wouldn't be either. It would be on the scale that we've done in previous releases. Typically, we'd change about 10 Zones to Links (and move the resulting data to 'backzone' so it wouldn't get lost). This worked well in practice.

Robert Elz

10:53 p.m.

Date: Fri, 24 Sep 2021 14:10:29 -0700
From: Paul Eggert <eggert@cs.ucla.edu>
Message-ID: <5189c80f-7242-d8dd-8f87-6e25c714283a@cs.ucla.edu>

| I've tried to cover that topic in other emails but have no doubt missed
| some points. But here's another point: the set of political units with
| these powers have changed with time

Yes.

| so do we allow new Zones only
| political units with these powers since 1970?

For zones that have never existed (are not in the main files, or backzone) only for authorities that exist at the time of asking. Once created (and stable, ie: we can fix typos, and other errors, soon after creation - certainly any time between when created in the source db and the next release) zones should never be removed. Never. Moving to backzone is removed for this purpose.

| Or if we reject the other
| political units then what's our justification for that?

The point is to avoid authorities from feeling the need to fiddle their timezones just to qualify for a zone. Only current authorities with the appropriate actual capacity can do that, nothing that no longer exists, and nothing that is merely claiming power ("I should be in charge") but actually has none.

| Unfortunately it would not work, for reasons I discussed a few minutes
| ago (not yet in the tzdb archive, but I hope you can find it).

I have seen it. I don't believe them.

| It wouldn't be either. It would be on the scale that we've done in
| previous releases. Typically, we'd change about 10 Zones to Links (and
| move the resulting data to 'backzone' so it wouldn't get lost). This
| worked well in practice.

Please, don't. Just don't. Avoid any provocation for now, and do exactly 2021a plus updates for Samoa and Jordan (and anything else similar that exists). No zone merges at all. None of that is urgent. They can all wait for a later update (or perhaps, never happen at all).

kre

Paul Eggert

4:58 p.m.

On 9/22/21 12:34 PM, Tom Lane via tz wrote:

...

1. It's not possible to separate the new backzone zones from the old.

No, actually it is possible. I've done it, by using the patch I described here: https://mm.icann.org/pipermail/tz/2021-September/030456.html That patch does exactly the separation you describe. It separates the new backzone Zones from the old ones. When using it, I obtained tzdata.zi and TZif files that exactly matched 2021a except for the changes you're not objecting to. I verified the match by zdumping the resulting TZif files and comparing the outputs. This all took quite a bit of work, but it was work I was willing to undertake to avoid the (to my mind worse) alternative of forking tzdb.
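The verification step described above (zdumping two sets of TZif files and comparing the outputs) might look roughly like the following. This is only a sketch under stated assumptions, not the procedure actually used: the helper names `run_zdump`, `transitions` and `dumps_match` are made up for illustration, and it assumes `zdump -v` accepts an absolute path to a compiled file and prints that path as the first field of each output line.

```python
import subprocess


def run_zdump(tzif_path):
    """Return the `zdump -v` text for one compiled file.

    Assumption: zdump is on PATH and accepts an absolute file path.
    """
    result = subprocess.run(
        ["zdump", "-v", tzif_path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def transitions(dump_text):
    """Drop the leading zone-name/path field from every line.

    Two builds live in different directories, so the first field differs
    even when the transition data is identical.
    """
    rows = []
    for line in dump_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            rows.append(parts[1].strip())
    return rows


def dumps_match(dump_a, dump_b):
    """True when both dumps describe the same sequence of transitions."""
    return transitions(dump_a) == transitions(dump_b)
```

A caller would run `run_zdump` on the same zone name in each build tree and pass both results to `dumps_match`, repeating for every zone of interest.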

...

2. This approach puts it on individual tzdb distributors to decide which of these two options to choose.

Yes, that problem is inherent to any fork or option-equivalent-to-fork. That's one reason I recommended against using the patch. Another reason - more important to my mind - is that sticking with 2021a's blueprint would mean that its equity problems would remain present in whoever uses that blueprint.

Tom Lane

5:15 p.m.

Paul Eggert <eggert@cs.ucla.edu> writes:

...

On 9/22/21 12:34 PM, Tom Lane via tz wrote:

...
1. It's not possible to separate the new backzone zones from the old.

...

No, actually it is possible. I've done it, by using the patch I described here: https://mm.icann.org/pipermail/tz/2021-September/030456.html

Of course I meant "tzdb, as currently constituted, provides no way to do that without an unreasonable amount of work, as well as needing intimate knowledge of the data set". Applying those patches would change that state of affairs. However ...

...

...
2. This approach puts it on individual tzdb distributors to decide which of these two options to choose.

...

Yes, that problem is inherent to any fork or option-equivalent-to-fork.

... I agree that this isn't a very desirable direction to go in, because we really don't want different distributors shipping different definitions of the same zone name.

I suggested nearby [1] that this is fundamentally caused by misdesign of the backzone mechanism. I think a lot of the current angst is caused by the idea that we are (depending on build options) shipping zone definitions that are known to not be the best available data. That's bad both intrinsically and because it means different platforms might define the same zone name differently. But it would be very simple to just not do that.

regards, tom lane

[1] https://mm.icann.org/pipermail/tz/2021-September/030632.html

dpatte

7:40 p.m.

In particular, in the current scenario, can we propose additions and corrections to the backzone file as new data becomes apparent, including adding new pre1970 zones?

-------- Original message --------
From: "Murray S. Kucherawy via tz" <tz@iana.org>
Date: 2021-09-22 15:16 (GMT-05:00)
To: Stephen Colebourne <scolebourne@joda.org>
Cc: Time Zone Mailing List <tz@iana.org>
Subject: Re: [tz] Issues with pre-1970 information in TZDB

On Wed, Sep 22, 2021 at 3:52 AM Stephen Colebourne via tz <tz@iana.org> wrote:

Can we keep responses limited on this thread? Perhaps only respond if you think I've mischaracterized the issues at stake here? Or missed something obvious?

My understanding is that these data are being moved from the regional files to the backzone file. It's been pointed out before that a compile-time option can be set to include those entries in the production output of the build.

If that's wrong or incomplete, please do correct me. In either case, can you please explain why that compile-time option is not an acceptable solution for those who object to the change?

-MSK

Paul Eggert

5:27 p.m.

On 9/22/21 12:40 PM, dpatte via tz wrote:

...

In particular, in the current scenario, can we propose additions and corrections to the backzone file as new data becomes apparent, including adding new pre1970 zones?

Yes.

Stephen Colebourne

10:08 p.m.

On Wed, 22 Sept 2021 at 20:16, Murray S. Kucherawy <superuser@gmail.com> wrote:

...

On Wed, Sep 22, 2021 at 3:52 AM Stephen Colebourne via tz <tz@iana.org> wrote:

...
Can we keep responses limited on this thread? Perhaps only respond if you think I've mischaracterized the issues at stake here? Or missed something obvious?

My understanding is that these data are being moved from the regional files to the backzone file. It's been pointed out before that a compile-time option can be set to include those entries in the production output of the build.

If that's wrong or incomplete, please do correct me. In either case, can you please explain why that compile-time option is not an acceptable solution for those who object to the change?

The subtlety is in how the data set is consumed. While many downstream projects use the makefile, not all do. A significant portion of downstream users make use of the source files directly, with their own parsers, i.e. there is no ability to use a compile-time option. Those parsers are not set up to use backzone even if it were a valid option (Tom Lane has indicated how you can't reverse engineer the 2021a data cleanly from backzone as it is mixed with lots of other historic data that has been similarly abandoned). Even if the parsers were to be updated (and in the case of CLDR's backwards compatibility it is not clear that they can), it is not appropriate to have to make the change without a significant notice period.

For example, for many years Joda-Time has treated Links as permanent aliases, thus with 2021b any user of the library (which is probably millions of applications) would see Europe/Oslo actively replaced by Europe/Berlin. This behaviour was chosen as Links were originally only really used for spelling changes, i.e. proper aliases. However since the data set has been fiddled with, there is now no way to tell whether a Link is a true alias (a spelling change) or a meaningful ID like Europe/Oslo.

Given the above, tzdb users like myself have asked for the opposite compile-time flag. One where all the data is in the main files, but where a flag is used to slim the data set to focus on post-1970 data for those that need smaller data files.

I hope this explains why backzone and compile-time flags are not a solution to the issue.

Stephen
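To make the alias behaviour concrete, here is a minimal sketch of how a downstream parser that reads the source files directly might treat Link lines. It is an illustration only, not Joda-Time's actual code: `parse_links` and `canonical_id` are hypothetical names, only the spelled-out `Link TARGET LINK-NAME` form is handled (no abbreviated keywords or quoted fields), and the sample Oslo line reflects the merge as described in this thread.

```python
def parse_links(source_text):
    """Map each Link name to its target, reading tzdb-style source text."""
    links = {}
    for raw in source_text.splitlines():
        line = raw.split("#", 1)[0]  # drop trailing comments
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "Link":
            target, name = fields[1], fields[2]
            links[name] = target
    return links


def canonical_id(zone_id, links):
    """Follow Links until a non-Link ID is reached (guards against cycles)."""
    seen = set()
    while zone_id in links and zone_id not in seen:
        seen.add(zone_id)
        zone_id = links[zone_id]
    return zone_id


sample = """\
# illustrative input only
Link Australia/Sydney Australia/ACT
Link Europe/Berlin Europe/Oslo
"""
links = parse_links(sample)
print(canonical_id("Europe/Oslo", links))
```

A parser written this way has no field to consult that would distinguish a spelling alias from a merged ID: both arrive as the same kind of line, which is the ambiguity described above.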

Murray S. Kucherawy

6:15 a.m.

New subject: Interfaces (was Re: Issues with pre-1970 information in TZDB)

On Wed, Sep 22, 2021 at 3:08 PM Stephen Colebourne <scolebourne@joda.org> wrote:

...

The subtlety is in how the data set is consumed. While many downstream projects use the makefile, not all do. A significant portion of downstream users make use of the source files directly, with their own parsers, i.e. there is no ability to use a compile-time option. Those parsers are not set up to use backzone even if it were a valid option (Tom Lane has indicated how you can't reverse engineer the 2021a data cleanly from backzone as it is mixed with lots of other historic data that has been similarly abandoned). Even if the parsers were to be updated (and in the case of CLDR's backwards compatibility it is not clear that they can), it is not appropriate to have to make the change without a significant notice period.

I had a feeling this was the answer. I think this warrants a critical question: What is the intended interface to the data? If the intent of the coordinators/maintainers is that consumers will use the compiled version, then one could argue that those consuming the data directly do so at their own risk, because that interface isn't expressly supported. If so, it would be a layering violation to read the files directly. But maybe this was never specified, and the interface applications are to use is thus ambiguous (i.e., they have a choice), and it's possibly too far along now to compel them to change. If the community does choose to undertake a revision of RFC 6557, this might be one of the things that should be reviewed and codified. -MSK

Guy Harris

6:23 a.m.

New subject: Interfaces (was Re: Issues with pre-1970 information in TZDB)

On Sep 22, 2021, at 11:15 PM, Murray S. Kucherawy via tz <tz@iana.org> wrote:

...

I think this warrants a critical question: What is the intended interface to the data? If the intent of the coordinators/maintainers is that consumers will use the compiled version,

In the early days of the project, I don't remember there being any intent whatsoever to have the tzdb source files being directly read by consumers; the primary use of the database was in UN*X libraries that either incorporated the reference implementation of code to read the compiled files and use them to implement localtime()/mktime()/etc., or in UN*X libraries that implemented their own code to do that (I think that's what GNU libc did).

...

then one could argue that those consuming the data directly do so at their own risk, because that interface isn't expressly supported. If so, it would be a layering violation to read the files directly.

But maybe this was never specified, and the interface applications are to use is thus ambiguous (i.e., they have a choice), and it's possibly too far along now to compel them to change.

I don't think it was *explicitly* specified, and, for whatever reason, at least some consumers chose to directly read the source files. It'd probably be technically difficult, and contentious, to change that.

A possibility would be to have:

the *real* source files, which might even have a different file format if that turns out to be useful (e.g., specifying CLDR-style metazones);

the "textual compiled files", which are in the current format, and which would be generated as part of the release process;

the binary compiled files, just as we currently have.

Stephen Colebourne

9:32 a.m.

New subject: Interfaces (was Re: Issues with pre-1970 information in TZDB)

On Thu, 23 Sept 2021 at 07:16, Murray S. Kucherawy via tz <tz@iana.org> wrote:

...

On Wed, Sep 22, 2021 at 3:08 PM Stephen Colebourne <scolebourne@joda.org> wrote:

...
The subtlety is in how the data set is consumed. While many downstream projects use the makefile, not all do. A significant portion of downstream users make use of the source files directly, with their own parsers, i.e. there is no ability to use a compile-time option. Those parsers are not set up to use backzone even if it were a valid option (Tom Lane has indicated how you can't reverse engineer the 2021a data cleanly from backzone as it is mixed with lots of other historic data that has been similarly abandoned). Even if the parsers were to be updated (and in the case of CLDR's backwards compatibility it is not clear that they can), it is not appropriate to have to make the change without a significant notice period.

I had a feeling this was the answer.

I think this warrants a critical question: What is the intended interface to the data?

While it is a good critical question, the reality is that the source files are used by downstream users. In fact, there are many different downstream parsers of the source files. It is certainly far too late to remove access to the source files - they are a de facto interface to the project. Previous changes to the source files have already been met with strong resistance. Stephen

Paul Eggert

5:41 p.m.

New subject: Interfaces (was Re: Issues with pre-1970 information in TZDB)

On 9/22/21 11:15 PM, Murray S. Kucherawy via tz wrote:

...

If the community does choose to undertake a revision of RFC 6557, this might be one of the things that should be reviewed and codified.

As far as RFCs go, Arthur, Ken and I have already written RFC 8536, which specifies TZif format and in effect codifies tzfile.5. A successor to RFC 8536 is in the works. I'd also welcome a companion RFC to specify .zi files, essentially codifying zic.8. It'd be some work, tho. Plus, it's more than just specifying the input format to 'zic'; it's specifying a lot of the other stuff we've been discussing recently. Which'd be a lot of work.

Paul Eggert

5:36 p.m.

On 9/22/21 3:08 PM, Stephen Colebourne via tz wrote:

...

there is now no way to tell whether a Link is a true alias (a spelling change) or a meaningful ID like Europe/Oslo.

Sure, but this has always been true of Links; there's nothing new here. For example, the very first edition of 'backward', back in 1993, had this Link: Link Australia/Sydney Australia/ACT even though Sydney is not in the Australian Capital Territory. That's not a simple spelling change; it's a meaningful ID, if one's definition of "meaningful" includes political meanings. Europe/Oslo is similar to Australia/ACT in this respect.

Phake Nick

7:26 p.m.

Thing is, if the platform maintainer decided to include only the main tz file but not those in the link which contain more historical time data for more locations, why would they want to switch to a separate fork of the tz database which differs from the main database primarily in terms of containing more historical time data for more locations?

On Wed, 22 Sep 2021 at 18:52, Stephen Colebourne via tz <tz@iana.org> wrote:

...

(David Braverman recently asked for a summary of the issue. This is my attempt to summarize in a relatively even-handed way)

The TZDB theory file starts with the following:

"The tz database attempts to record the history and predicted future of civil time scales. It organizes time zone and daylight saving time data by partitioning the world into timezones whose clocks all agree about timestamps that occur after the POSIX Epoch (1970-01-01 00:00:00 UTC. Although 1970 is a somewhat-arbitrary cutoff, there are significant challenges to moving the cutoff earlier even by a decade or two, due to the wide variety of local practices before computer timekeeping became prevalent."

I have always thought this is a wise choice for the management of time zone data. I suspect that most people on this list would agree.

The rules for creating new IDs where post-1970 data differs are well understood and, I believe, agreed upon. Apart from debates over negative daylight saving, I do not believe there are any significant issues with the data post-1970.

Despite the theory file introduction above, TZDB does in fact contain data for some locations before 1970. The issue at hand is which timezone IDs are allowed to have this data and which are not. And more broadly, the degree to which it is acceptable to change the status quo on the pre-1970 data.

Over many years, pre-1970 data was added to many different IDs. However, over recent years, pre-1970 data has been removed. (Technically, it has been moved to another file, but for the purposes of those consuming the main set of tzdb files it has effectively been removed.)

The net result of recent changes are various situations which might be described as "cleaner", "more equitable", "nonsensical", "unacceptable" or "less equitable" depending on your viewpoint. For example, Europe/Berlin has pre-1970 data, but Europe/Oslo and Europe/Stockholm do not. In technical terms, Europe/Oslo and Europe/Stockholm are now aliases for Europe/Berlin (known as Links in tzdb).

Where the problem lies is that a user who queries Europe/Oslo for the timezone offset in 1950 used to get the data for Oslo but will now get the data for Berlin. Depending on your viewpoint this is "irrelevant", "unfortunate" or "offensive".

Why has the data changed? Because Oslo, Stockholm and Berlin all have the same timezone data post-1970, and Berlin is the largest city. The argument is that if only post-1970 data matters (as per the theory file), then there is no justification for three separate data sets when only one will do, and the one chosen is the one with the largest city. The counter argument is that merging data sets across country boundaries is unacceptable and politically naive, particularly when there were no complaints about the previous status quo.

More broadly, different individuals, some representing organizations, have expressed different opinions on what they do or do not want from the tzdb data set. Some would like a full historical record of time zone data, others want stability, many I suspect have absolutely no interest whatsoever in pre-1970 data.

My personal concerns are data stability (agreed managed changes are OK), and the politically-sensitive inaccuracy that results from merging across country boundaries.

Can we keep responses limited on this thread? Perhaps only respond if you think I've mischaracterized the issues at stake here? Or missed something obvious?

Stephen

Guy Harris

9:05 p.m.

On Sep 22, 2021, at 3:52 AM, Stephen Colebourne via tz <tz@iana.org> wrote:

...

More broadly, different individuals, some representing organizations, have expressed different opinions on what they do or do not want from the tzdb data set. Some would like a full historical record of time zone data, others want stability, many I suspect have absolutely no interest whatsoever in pre-1970 data.

Presumably "stability" here includes "don't convert existing tzdb regions to links merely because they have the same 1970-and-later data" (putting them into the "don't merge" camp); does it also include "don't split existing tzdb regions due to the discovery that different parts of those regions had different pre-1970 data"? (Those who want a full historical record of time zone data would presumably want those existing regions split.)

Tom Lane

9:17 p.m.

Guy Harris via tz <tz@iana.org> writes:

...

Presumably "stability" here includes "don't convert existing tzdb regions to links merely because they have the same 1970-and-later data" (putting them into the "don't merge" camp); does it also include "don't split existing tzdb regions due to the discovery that different parts of those regions had different pre-1970 data"? (Those who want a full historical record of time zone data would presumably want those existing regions split.)

For my own purposes, splits are not particularly problematic as long as the existing name can keep the existing historical data --- nobody is forced to adopt the new zone immediately, and whatever stored timestamps they have still mean the same thing. It gets a little more exciting if we discover that, say, Europe/Berlin's back data is more appropriate to some other place than it is to Berlin. OTOH, it's not clear how that differs from "Europe/Berlin's back data is wrong", so probably we'd just fix it and move on. In general, I think that incremental changes that clearly (or at least plausibly) improve the accuracy of tzdb's description of reality are fine. One thing that's particularly sticking in my craw about the changes under debate is that they undeniably made tzdb worse as a description of reality in the places at issue. regards, tom lane

Guy Harris

9:39 p.m.

On Sep 22, 2021, at 2:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

...

Guy Harris via tz <tz@iana.org> writes:

...
Presumably "stability" here includes "don't convert existing tzdb regions to links merely because they have the same 1970-and-later data" (putting them into the "don't merge" camp); does it also include "don't split existing tzdb regions due to the discovery that different parts of those regions had different pre-1970 data"? (Those who want a full historical record of time zone data would presumably want those existing regions split.)

For my own purposes, splits are not particularly problematic as long as the existing name can keep the existing historical data --- nobody is forced to adopt the new zone immediately, and whatever stored timestamps they have still mean the same thing.

It gets a little more exciting if we discover that, say, Europe/Berlin's back data is more appropriate to some other place than it is to Berlin. OTOH, it's not clear how that differs from "Europe/Berlin's back data is wrong", so probably we'd just fix it and move on.

In general, I think that incremental changes that clearly (or at least plausibly) improve the accuracy of tzdb's description of reality are fine. One thing that's particularly sticking in my craw about the changes under debate is that they undeniably made tzdb worse as a description of reality in the places at issue.

There's "description of 1970-and-after reality" and there's "description of pre-1970 reality". As far as I know, tzdb must accurately describe 1970-and-after reality, especially present-day-reality (otherwise a lot of its users will have to use their own fork that *does* accurately describe it), and if that requires that a region be split, so be it; anybody who wants "stability" in the sense of "never ever split regions" is best advised to spend their time lobbying governments not to take actions that would require region splitting rather than asking the tzdb maintainer not to split regions. It's pre-1970 reality that's causing this issue (as the subject line of this thread indicates). For those who, in Stephen's taxonomy:

...

More broadly, different individuals, some representing organizations, have expressed different opinions on what they do or do not want from the tzdb data set. Some would like a full historical record of time zone data, others want stability, many I suspect have absolutely no interest whatsoever in pre-1970 data.

would like a full historical record of time zone data, their rules would presumably be "if we discover that a given tzdb region didn't uniformly have the same offset or rules all the way back to the establishment of standard time, we should split it". They are presumably using backzone, as, without it, you don't have a full historical record of time zone data, and would thus be unaffected by the proposed merger. For those who have no interest whatsoever in pre-1970 data, some might be "split or don't split, I don't care" and some might be "don't split", depending on the extent to which they care about stability in the sense of "no new regions unless required by 1970-and-later differences". They presumably aren't bothered by the proposed merger, as it only affects pre-1970 data. For those who want stability, they are, presumably, those who *do* care about pre-1970 data, so they presumably don't want regions to disappear in favor of links due to one region sharing 1970-and-later data with another region, and are thus opposed to the proposed merger. Do any of them also want "no new regions unless required by 1970-and-later differences"?

Tom Lane

10:06 p.m.

Guy Harris <gharris@sonic.net> writes:

...

For those who, in Stephen's taxonomy: ... would like a full historical record of time zone data, their rules would presumably be "if we discover that a given tzdb region didn't uniformly have the same offset or rules all the way back to the establishment of standard time, we should split it". They are presumably using backzone, as, without it, you don't have a full historical record of time zone data, and would thus be unaffected by the proposed merger.

I don't think that last statement follows from the available information. People who are not intimately familiar with how tzdb is set up won't even know that backzone exists. Many of the consumers who might have a stake in this discussion don't have any option there anyway, because they are using a tzdb distribution made by somebody else. I will stipulate that if you'd been using backzone all along, you'd be unaffected by the May changes. The problem is that a lot of people who never heard of that, and have no input into its use, will be affected. I'm not convinced that the existing users of backzone are the people to optimize this change for.

...

For those who want stability, they are, presumably, those who *do* care about pre-1970 data, so they presumably don't want regions to disappear in favor of links due to one region sharing 1970-and-later data with another region, and are thus opposed to the proposed merger. Do any of them also want "no new regions unless required by 1970-and-later differences"?

As you pointed out yourself, all consumers of tzdb had better be prepared for zone splits, because one could be forced on them at any time due to governmental decisions. I doubt there are people for whom a split due to ancient history is worse than a split due to current changes. regards, tom lane

Guy Harris

10:32 p.m.

On Sep 22, 2021, at 3:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

...

Guy Harris <gharris@sonic.net> writes:

...
For those who, in Stephen's taxonomy: ... would like a full historical record of time zone data, their rules would presumably be "if we discover that a given tzdb region didn't uniformly have the same offset or rules all the way back to the establishment of standard time, we should split it". They are presumably using backzone, as, without it, you don't have a full historical record of time zone data, and would thus be unaffected by the proposed merger.

I don't think that last statement follows from the available information. People who are not intimately familiar with how tzdb is set up won't even know that backzone exists. Many of the consumers who might have a stake in this discussion don't have any option there anyway, because they are using a tzdb distribution made by somebody else.

Currently, people who need a full historical record of time zone data had better acquaint themselves with how tzdb is set up on the system on which they're using it, as that affects whether they have a full historical record of time zone data - if the data they're using doesn't include backzone, there are a number of tzdb regions for which it does *not* include a full historical record of time zone data, e.g. Montreal.

They'd also better look at the tzdb source files to see where there is pre-1970 data suspected of not being correct - again, see Montreal:

# Canada
#
# From Paul Eggert (2015-03-24):
# Since 1970 most of Quebec has been like Toronto; see
# America/Toronto. However, earlier versions of the tz database
# mistakenly relied on data from Shanks & Pottenger saying that Quebec
# differed from Ontario after 1970, and the following rules and zone
# were created for most of Quebec from the incorrect Shanks &
# Pottenger data. The post-1970 entries have been corrected, but the
# pre-1970 entries are unchecked and probably have errors.
#

I.e., people in the "need a full historical record of time zone data" group are advised not to take an "I'll just trust that somebody's set up the data in the way that I need and have vetted it" approach.

Those in the "would like a full historical record of time zone data" group who aren't in the "need a full historical record of time zone data" group should consider why they would like that full historical record and either 1) put themselves in the "need a full historical record of time zone data" group and do the necessary work to ensure that they have that record or 2) arrange to work without a full historical record of time zone data.

...

I will stipulate that if you'd been using backzone all along, you'd be unaffected by the May changes.

I.e., anybody in the "need a full historical record of time zone data" who has a clue will be unaffected. Those *in that group* (i.e, the *need* group) who do *not* have a clue need to be given one, so that they do the necessary work.

...

The problem is that a lot of people who never heard of that, and have no input into its use, will be affected.

OK, so which group(s) are those people in? Those in the "need a full historical record of time zone data" group, see above. Those in the "a full historical record of time zone data is nice to have, but not necessary" should decide whether they belong in that group, and what that means, or just decide that they're in the "stability is important" group or the "I don't really care about pre-1970 data so, if it gets merged, that's not a real problem" group.

Howard Hinnant

10:23 p.m.

On Sep 22, 2021, at 5:17 PM, Tom Lane via tz <tz@iana.org> wrote:

...

One thing that's particularly sticking in my craw about the changes under debate is that they undeniably made tzdb worse as a description of reality in the places at issue.

This is my objection as well. Before the change the tzdb does a “best effort” on pre-1970 dates. After the change it’s as if we are intentionally putting in data known to be incorrect and different from the correct data we had.

People who consume this data have no clue that there is a 1970 border, and no way to change whatever tzdb version their vendor provides for them. And in an ideal world, when said consumer runs her program on a different platform, she would get the same results. We shouldn’t make that portability less likely.

We should not create multiple competing versions of the database, whether hosted separately or together. The latest released version should have the most accurate data to the best of our ability. Otherwise we get https://xkcd.com/927/ .

Giving vendors competing versions of this database hurts everyone involved:

* It hurts the vendors because they have to spend time figuring out what they ship.
* It hurts library and application programmers because it makes their code less portable.
* It hurts people using programs because they can get different answers when switching products or platforms.
* It hurts the reputation of this project in the eyes of those who discover that we purposefully made such a mess.

Maybe it was a mistake to put pre-1970 data into the tzdb. But that decision was made long ago, and this genie can not be stuffed back into the bottle. Now we have to deal with the fact that people are using pre-1970 data.

Howard
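One way a consumer could check whether two zone IDs agree on a given platform is to sample UTC offsets over a range of years. The following is a rough sketch, not part of tzdb or any library: `offset_differences` is a hypothetical helper, and daily sampling is an assumption that can miss transition pairs falling between two samples.

```python
from datetime import datetime, timedelta, timezone


def offset_differences(tz_a, tz_b, start_year, end_year, step_days=1):
    """Return sampled UTC instants in [start_year, end_year) at which
    the two tzinfo objects report different UTC offsets."""
    instant = datetime(start_year, 1, 1, tzinfo=timezone.utc)
    end = datetime(end_year, 1, 1, tzinfo=timezone.utc)
    step = timedelta(days=step_days)
    diffs = []
    while instant < end:
        off_a = instant.astimezone(tz_a).utcoffset()
        off_b = instant.astimezone(tz_b).utcoffset()
        if off_a != off_b:
            diffs.append(instant)
        instant += step
    return diffs
```

With Python 3.9 or later, passing `zoneinfo.ZoneInfo("Europe/Oslo")` and `zoneinfo.ZoneInfo("Europe/Berlin")` for 1940 to 1970 would show whether the installed tzdata keeps the two apart before 1970. The answer depends on how that platform's data was built, which is the portability concern raised above.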

Guy Harris

10:38 p.m.

On Sep 22, 2021, at 3:23 PM, Howard Hinnant <howard.hinnant@gmail.com> wrote:

...

On Sep 22, 2021, at 5:17 PM, Tom Lane via tz <tz@iana.org> wrote:

...
One thing that's particularly sticking in my craw about the changes under debate is that they undeniably made tzdb worse as a description of reality in the places at issue.

This is my objection as well. Before the change the tzdb does a “best effort” on pre-1970 dates.

"Best" in what sense? The 2021a data makes an effort to keep the data for Europe/Oslo separate from the data for Europe/Berlin. It makes no effort to keep the data for America/Montreal separate from the data for America/Toronto.

The comment in backzone about the America/Montreal data is

    # Canada
    #
    # From Paul Eggert (2015-03-24):
    # Since 1970 most of Quebec has been like Toronto; see
    # America/Toronto. However, earlier versions of the tz database
    # mistakenly relied on data from Shanks & Pottenger saying that Quebec
    # differed from Ontario after 1970, and the following rules and zone
    # were created for most of Quebec from the incorrect Shanks &
    # Pottenger data. The post-1970 entries have been corrected, but the
    # pre-1970 entries are unchecked and probably have errors.
    #

The comment in the 2021a europe about the Europe/Oslo data is

    # Norway
    # http://met.no/met/met_lex/q_u/sommertid.html (2004-01) agrees with Shanks &
    # Pottenger.

so the difference here appears to be how well-vetted the pre-1970 data is.

So does "best effort" mean "provide pre-1970 data in the default configuration if we have reason to believe it's accurate, but provide it only in backzone if we don't"?

Tom Lane

11:43 p.m.

Guy Harris <gharris@sonic.net> writes:

...

So does "best effort" mean "provide pre-1970 data in the default configuration if we have reason to believe it's accurate, but provide it only in backzone if we don't"?

That would be a fine policy in my opinion. Indeed, up till the May changes I thought it *was* the policy. Silly me. regards, tom lane

Guy Harris

11:58 p.m.

On Sep 22, 2021, at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

...

Guy Harris <gharris@sonic.net> writes:

...
So does "best effort" mean "provide pre-1970 data in the default configuration if we have reason to believe it's accurate, but provide it only in backzone if we don't"?

That would be a fine policy in my opinion. Indeed, up till the May changes I thought it *was* the policy. Silly me.

I sent a message to the list noting that: stuff has been moved to backzone in the past; the backzone comment in 2021a speaks of at least some of the data there being questionable; as an example, America/Montreal, moved to backzone in 2015 or so, has comments suggesting it's not reliable, while Europe/Oslo's comment seems to suggest that the original sources agree with The Norwegian Meteorological Institute and thus that the data is presumably considered reliable. Hopefully useful discussion of the "considered-to-be-reliable pre-1970 data vs. not-considered-to-be-reliable pre-1970 data" question will ensue.

Tom Lane

3:24 a.m.

Guy Harris <gharris@sonic.net> writes:

...

I sent a message to the list noting that:

...

stuff has been moved to backzone in the past;

...

the backzone comment in 2021a speaks of at least some of the data there being questionable;

...

as an example, America/Montreal, moved to backzone in 2015 or so, has comments suggesting it's not reliable, while Europe/Oslo's comment seems to suggest that the original sources agree with The Norwegian Meteorological Institute and thus that the data is presumably considered reliable.

...

Hopefully useful discussion of the "considered-to-be-reliable pre-1970 data vs. not-considered-to-be-reliable pre-1970 data" question will ensue.

I just spent a while looking through backzone as of 2021a, and the stuff that was added to it in May. Nearly all of the older entries have either no documentation at all, or comments explicitly questioning their veracity. I count 82 non-comment Zone lines in the 2021a data, of which perhaps three have enough positive supporting commentary that maybe there's a case for promoting them into the default zone set.

The May patches moved 32 new Zones into backzone. These zones are, as far as I can see, mostly *far* better attested than what was there before. There are definitely a few that belong in backzone if the standard is "do we trust this data", but it's hard to avoid the conclusion that a double standard has been applied here.

Don't take my word for it; look through the files for yourself.

regards, tom lane
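For anyone who wants to repeat a count like the one above, here is a minimal sketch of how non-comment Zone lines might be tallied. It assumes the usual tzdb source layout, in which a zone definition starts with the keyword `Zone` as the first field of a line, continuation lines start with whitespace, and `#` begins a comment. The function name `count_zone_lines` and the `Example/...` names in the sample are made up for illustration, and this is not necessarily how the figure of 82 was obtained.

```python
def count_zone_lines(text: str) -> int:
    """Count non-comment lines whose first field is the keyword 'Zone'."""
    count = 0
    for line in text.splitlines():
        # Drop anything after '#', then split into whitespace-separated fields.
        fields = line.split("#", 1)[0].split()
        # Continuation lines begin with an offset, and Link lines with 'Link',
        # so only lines that open a zone definition match here.
        if fields and fields[0] == "Zone":
            count += 1
    return count


# Hypothetical sample in the tzdb source format; these zones are not real data.
SAMPLE = """\
# Zone	NAME		STDOFF	RULES	FORMAT	[UNTIL]
Zone	Example/One	1:00	-	LMT	1900
			1:00	-	CET
# Zone Example/Commented 0:00 - UTC
Zone	Example/Two	0:00	-	UTC
Link	Example/One	Example/Alias
"""

print(count_zone_lines(SAMPLE))  # prints 2
```

To apply it to a real file, read the file's contents into a string and pass that string to `count_zone_lines`.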

Paul Eggert

12:16 a.m.

On 9/22/21 15:23, Howard Hinnant via tz wrote:

...

Before the change the tzdb does a “best effort” on pre-1970 dates.

No, quite the reverse. There are many places in 2021a where we have some data (often unreliable) about pre-1970 timestamps, and we put that data into 'backzone' instead of doing a "best effort" on pre-1970 dates in the default database. Examples include Africa/Addis_Ababa, Africa/Freetown, and Asia/Harbin. I expect that there are more Zones in 2021a where this sort of thing happens than Zones in the changes being discussed. There's a lot of precedent for the recent changes, and a lot of experience showing that the fallout from this kind of change is negligible in practice.

Stephen Colebourne

10:19 p.m.

On Wed, 22 Sept 2021 at 22:05, Guy Harris via tz <tz@iana.org> wrote:

...

On Sep 22, 2021, at 3:52 AM, Stephen Colebourne via tz <tz@iana.org> wrote:

...
More broadly, different individuals, some representing organizations, have expressed different opinions on what they do or do not want from the tzdb data set. Some would like a full historical record of time zone data, others want stability, many I suspect have absolutely no interest whatsoever in pre-1970 data.

Presumably "stability" here includes "don't convert existing tzdb regions to links merely because they have the same 1970-and-later data" (putting them into the "don't merge" camp);

Yes.

...

does it also include "don't split existing tzdb regions due to the discovery that different parts of those regions had different pre-1970 data"? (Those who want a full historical record of time zone data would presumably want those existing regions split.)

Yes, personally I have no desire to split zones solely because of pre-1970 data. (The concept of splitting zones to get one per ISO country is one possible solution to the issue, but not one I am tied to if a different, more acceptable rule can be derived.) The question is really down to which of the post-1970 zones we do have are entitled to have pre-1970 data.

Stephen
