Error in Scandinavian tz data
Hello!
I'm developing an application using the JavaScript Intl API (which is relying on the IANA tz db) for converting between local time and UTC.
I started receiving error reports from users about wrong calculations (particularly from Denmark in the 1940s), and I discovered that in some cases daylight saving time was applied in a way not in accordance with historical facts.
I investigated the IANA tz database files and I discovered that a change has happened in version 2022b (in the file "europe"). Compared to the earlier version (2022a), 4 zones seem to have their rules removed and instead linked to Europe/Berlin:
    Link Europe/Berlin Arctic/Longyearbyen
    Link Europe/Berlin Europe/Copenhagen
    Link Europe/Berlin Europe/Oslo
    Link Europe/Berlin Europe/Stockholm
This, evidently, is not correct, and will result in erroneous results for these zones.
Best wishes,
Anders Rosendahl
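One way to check whether an installed tz database treats Europe/Copenhagen and Europe/Berlin identically for the 1940s is to sample both zones and compare their UT offsets. The following is a minimal Python sketch, not the Intl-based code from the report. The helper name offset_mismatches and the daily sampling step are assumptions made for illustration, and the result depends entirely on which tzdata build zoneinfo finds on the machine.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def offset_mismatches(zone_a: tzinfo, zone_b: tzinfo,
                      start: datetime, end: datetime,
                      step: timedelta = timedelta(days=1)):
    """Return the sampled instants in [start, end) where the zones disagree.

    start and end must be timezone-aware. Hypothetical helper, written
    for illustration only.
    """
    mismatches = []
    instant = start
    while instant < end:
        offset_a = instant.astimezone(zone_a).utcoffset()
        offset_b = instant.astimezone(zone_b).utcoffset()
        if offset_a != offset_b:
            mismatches.append(instant)
        instant += step
    return mismatches


if __name__ == "__main__":
    try:
        copenhagen = ZoneInfo("Europe/Copenhagen")
        berlin = ZoneInfo("Europe/Berlin")
    except ZoneInfoNotFoundError:
        print("no tz database available to zoneinfo on this system")
    else:
        found = offset_mismatches(
            copenhagen, berlin,
            datetime(1940, 1, 1, tzinfo=timezone.utc),
            datetime(1950, 1, 1, tzinfo=timezone.utc),
        )
        print(f"{len(found)} daily samples differ between the two zones")
```

An empty result would be consistent with the two names sharing one set of rules over the sampled range. Daily sampling can miss differences that last less than a day, so a smaller step may be needed for a careful comparison.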
On Sat, Jun 17, 2023 at 08:20:53PM +0200, Anders Rosendahl via tz wrote:
Hello!
I'm developing an application using the JavaScript Intl API (which is relying on the IANA tz db) for converting between local time and UTC.
I started receiving error reports from users about wrong calculations (particularly from Denmark in the 1940s), and I discovered that in some cases daylight saving time was applied in a way not in accordance with historical facts.
I investigated the IANA tz database files and I discovered that a change has happened in version 2022b (in the file "europe"). Compared to the earlier version (2022a), 4 zones seem to have their rules removed and instead linked to Europe/Berlin:
    Link Europe/Berlin Arctic/Longyearbyen
    Link Europe/Berlin Europe/Copenhagen
    Link Europe/Berlin Europe/Oslo
    Link Europe/Berlin Europe/Stockholm
This, evidently, is not correct, and will result in erroneous results for these zones.
Oh dear.
This horse has been beaten to death and then resurrected for another turn; the trenches are very deep.
Basically, IANA tz only cares about dates after 1970-01-01, and for that reason the maintainer has started merging zones that don't differ after that date. The data is still provided, but it isn't included in the default builds (which is what everyone gets from their updates).
Other people have, just like you, said that this is a bad move, and this has led to a fork of the project, but I do not remember where that is located. This fork is used in at least some of the BSDs.
My personal opinion is that the default IANA build should say nothing about dates before 1970-01-01 - it should, in particular, not claim that changes for some random place (like Berlin) apply to some other place (like Copenhagen).
/MF
On Jun 27, 2023, at 12:19 AM, Magnus Fromreide via tz <tz@iana.org> wrote:
My personal opinion is that the default IANA build should say nothing about dates before 1970-01-01 - it should, in particular, not claim that changes for some random place (like Berlin) apply to some other place (like Copenhagen).
And, presumably, it shouldn't make any other claims about Copenhagen before 1970-01-01, either. Or about Berlin, or about New York City, or any of the other cities chosen for tzdb region names, for that matter. (Not that Continent/City should be read as only making claims about that city.)
(At this point, I wish we could get away with doing as Microsoft do, and just return -1 for all attempts to call localtime() for dates prior to the UNIX Epoch.)
On Tue, 27 Jun 2023 at 08:20, Magnus Fromreide via tz <tz@iana.org> wrote:
On Sat, Jun 17, 2023 at 08:20:53PM +0200, Anders Rosendahl via tz wrote:
Other people have, just like you, said that this is a bad move, and this has led to a fork of the project, but I do not remember where that is located. This fork is used in at least some of the BSDs.
https://github.com/JodaOrg/global-tz
I don't want to maintain the fork, but I have no choice given the current situation at IANA. tzdb is simply broken, as the OP notes.
My personal opinion is that the default IANA build should say nothing about dates before 1970-01-01 - it should, in particular, not claim that changes for some random place (like Berlin) apply to some other place (like Copenhagen).
Indeed, this would be an acceptable and equitable outcome. It isn't ideal, as TZDB actually holds the data for pre-1970, but at least the pre-1970 data would be equally wrong everywhere, rather than favouring some places over others (which is my primary objection).
On Tue, 27 Jun 2023 at 08:31, Guy Harris via tz <tz@iana.org> wrote:
(At this point, I wish we could get away with doing as Microsoft do, and just return -1 for all attempts to call localtime() for dates prior to the UNIX Epoch.)
The simplest approach would be to determine a rule, e.g.:
- take the standard offset that applied on most days (modal average) for the 5 years from 1970 to 1975, and use that pre-1970
- or, take the LMT of the city and round to the nearest hour
So long as the answer is reasonable for most cases, it would be fine.
Stephen
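The two candidate rules can be sketched in a few lines of Python. This is only an illustration of the idea under stated assumptions: the function names are hypothetical, offsets are expressed in seconds east of UT, ties in the modal rule are not given any special treatment, and the sample LMT values in the usage note are illustrative inputs rather than figures taken from this thread.

```python
from collections import Counter


def modal_offset(daily_standard_offsets):
    """Most common standard UT offset (seconds) among per-day samples.

    The caller is assumed to supply one standard-time offset per day,
    e.g. for every day from 1970 through 1974.
    """
    if not daily_standard_offsets:
        raise ValueError("need at least one sample")
    return Counter(daily_standard_offsets).most_common(1)[0][0]


def lmt_rounded_to_hour(lmt_offset_seconds):
    """Round an LMT offset (seconds east of UT) to the nearest whole hour.

    Exact half hours round away from zero; that tie rule is an
    assumption, since the proposal does not specify one.
    """
    sign = -1 if lmt_offset_seconds < 0 else 1
    hours = (abs(lmt_offset_seconds) + 1800) // 3600
    return sign * hours * 3600
```

For example, an LMT of 0:50:20 (3020 seconds) rounds to 3600 seconds, i.e. +01:00, and an LMT of -4:56:02 (-17762 seconds) rounds to -18000 seconds, i.e. -05:00.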
On 2023-06-27 01:35, Stephen Colebourne via tz wrote:
The simplest approach would be to determine a rule, e.g.:
- take the standard offset that applied on most days (modal average) for the 5 years from 1970 to 1975, and use that pre-1970
- or, take the LMT of the city and round to the nearest hour
So long as the answer is reasonable for most cases, it would be fine.
This would take the existing TZif files (admittedly problematic, as you say) and make them worse, as they'd become wrong for every location, even the location that names the Zone.
Surely it would be better to discard the pre-1970 data - then users would be on notice that it's missing. And there's a standard way to do that, documented in the Makefile: use 'make ZFLAGS=-r@0'. Perhaps this option should be documented more prominently.
It's not clear that -r@0 should be the Makefile default, though, as that could well cause more trouble than it would cure. For example, it would cause the following behavior:
    $ export TZ=Europe/Copenhagen
    $ date -r 1; date -r 0; date -r -1
    Thu Jan 1 01:00:01 CET 1970
    Thu Jan 1 01:00:00 CET 1970
    Wed Dec 31 23:59:59 -00 1969
and the UT offset zero and abbreviation -00 of pre-1970 timestamps would likely give many users pause. That being said, in installations not needing pre-1970 timestamps, -r@0 is a clear win.
Getting back to Anders's original question, one can get behavior closer to what he asked with something like this:
    make PACKRATDATA=backzone PACKRATLIST=zone.tab install
although I don't recommend this for ordinary applications. One should treat the result with a grain of salt as it's likely wrong for Copenhagen and it's certainly a bit wrong for Aarhus etc. This is a hazard of trying to push tzdata further than it can reliably bear.
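The behavior described for a zone truncated at the epoch can be imitated with a small Python model. This is a simulation for illustration, not how zic or localtime work internally: the function name localtime_truncated is hypothetical, and the fixed +01:00 "CET" offset from the epoch onward is a simplifying assumption that ignores later daylight saving rules.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # naive value standing for 1970-01-01 00:00:00 UT


def localtime_truncated(timestamp, offset_seconds=3600, abbr="CET"):
    """Model a zone whose data starts at the epoch.

    From timestamp 0 onward the given offset and abbreviation apply.
    Before 0 the offset is unknown, modeled as UT offset zero with the
    abbreviation "-00".
    """
    if timestamp < 0:
        offset_seconds, abbr = 0, "-00"
    local = EPOCH + timedelta(seconds=timestamp + offset_seconds)
    return local, abbr


for ts in (1, 0, -1):
    local, abbr = localtime_truncated(ts)
    print(local.isoformat(sep=" "), abbr)
```

The point of the model is the discontinuity at timestamp 0: the wall-clock value jumps by the zone's offset and the abbreviation changes to "-00", which is what a caller would have to look for to notice that the data is missing.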
On Tue, 27 Jun 2023 at 10:31, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 2023-06-27 01:35, Stephen Colebourne via tz wrote:
The simplest approach would be to determine a rule, e.g.:
- take the standard offset that applied on most days (modal average) for the 5 years from 1970 to 1975, and use that pre-1970
- or, take the LMT of the city and round to the nearest hour
So long as the answer is reasonable for most cases, it would be fine.
This would take the existing TZif files (admittedly problematic, as you say) and make them worse, as they'd become wrong for every location, even the location that names the Zone.
Being slightly wrong everywhere is a much better outcome than what we have today.
Surely it would be better to discard the pre-1970 data - then users would be on notice that it's missing. And there's a standard way to do that, documented in the Makefile: use 'make ZFLAGS=-r@0'. Perhaps this option should be documented more prominently.
It's not clear that -r@0 should be the Makefile default, though, as that could well cause more trouble than it would cure. For example, it would cause the following behavior:
    $ export TZ=Europe/Copenhagen
    $ date -r 1; date -r 0; date -r -1
    Thu Jan 1 01:00:01 CET 1970
    Thu Jan 1 01:00:00 CET 1970
    Wed Dec 31 23:59:59 -00 1969
and the UT offset zero and abbreviation -00 of pre-1970 timestamps would likely give many users pause. That being said, in installations not needing pre-1970 timestamps, -r@0 is a clear win.
In most cases, end users do not pick what options to install with. They just get what they are given by their packager. Since the packager cannot know whether the end user wants pre-1970 data or not, a sensible packager will err on the side of providing more data, not less, and thus want to include pre-1970 data.
'make ZFLAGS=-r@0' is of no interest to packagers precisely because it is obviously wrong. I.e. choosing UTC does not make people think, it just means the make option is not used. It is simply not good enough to be a viable choice. It is also not close enough to any values that have previously been placed in long-term storage.
Hopefully all this explains why "Surely it would be better to discard the pre-1970 data - then users would be on notice that it's missing" (ZFLAGS=-r@0) isn't a viable route forward.
The rule-based truncation I outlined above is a compromise position suitable for use as the default in the makefile. I believe it meets the needs of packagers who would hopefully accept it. It provides truncated data pre-1970, but in a way that is not completely unreasonable:
* It is good enough for most use cases except those that really care about historical detail
* It avoids the weird per-second LMT offsets in the far past that often confuse end users
* It is close enough to what end users have in long-term storage to not cause migration issues
Just returning UTC does not meet these goals.
it's likely wrong for Copenhagen
I think you do a disservice to TZDB's many authors here. I'd argue that the data for Copenhagen is likely to be entirely correct, as it has had many eyes on it for many years.
Beyond that, I think there is also a need to recognise that TZDB's pre-1970 data is the de facto truth for large parts of the world. Most end users don't care about the accuracy, just that someone has made an effort to record it.
I still believe that reinstating the data would be by far the best outcome, but a rules-based truncation approach would be a viable alternative if pre-1970 data is not something to be maintained.
Stephen
Beyond that, I think there is also a need to recognise that TZDB's pre-1970 data is the de facto truth for large parts of the world. Most end users don't care about the accuracy, just that someone has made an effort to record it. I still believe that reinstating the data would be by far the best outcome, but a rules-based truncation approach would be a viable alternative if pre-1970 data is not something to be maintained.
Hear hear
On 2023-06-27 04:09, Stephen Colebourne via tz wrote:
'make ZFLAGS=-r@0' is of no interest to packagers precisely because it is obviously wrong.
Then I must not be understanding "the default IANA build should say nothing about dates before 1970-01-01", a proposal that you said was acceptable. I thought "make ZFLAGS=-r@0" would implement that proposal, as it builds TZif files that say nothing about timestamps before 1970. But since you're saying "make ZFLAGS=-r@0" is of no interest, it seems the proposal is about something else. If so, it would be helpful to know what the proposal is.
Just returning UTC does not meet these goals.
"make ZFLAGS=-r@0" does not cause tzdb localtime to just return UTC, as localtime also returns an error indication indicating that the actual offset is unknown. This error indication is in the form of the time zone abbreviation "-00".
the data for Copenhagen is likely to be entirely correct
I doubt that, as some of the data are sourced only from Shanks, which has proven to be unreliable.
there is also a need to recognise that TZDB's pre-1970 data is the de facto truth
That would not be accurate advertising. TZDB is only TZDB. It has never been "the de facto truth", unless "truth" means only "act like TZDB Release X". For many years TZDB's files have said that it is "by no means authoritative", and for many years TZDB installations have varied in minor ways from one platform to another.
On Wed, 28 Jun 2023 at 01:26, Paul Eggert <eggert@cs.ucla.edu> wrote:
Then I must not be understanding "the default IANA build should say nothing about dates before 1970-01-01", a proposal that you said was acceptable.
Re-reading the thread, I see where your confusion comes from. What I was agreeing with was the concept of removing attempts to be accurate pre-1970. I then pointed out why being completely inaccurate (all UTC) is not viable as the default. My proposal was:
The simplest approach would be to determine a rule, e.g.:
- take the standard offset that applied on most days (modal average) for the 5 years from 1970 to 1975, and use that pre-1970
- or, take the LMT of the city and round to the nearest hour
So long as the answer is reasonable for most cases, it would be fine.
I.e. I believe that if tzdb provided a fixed offset pre-1970, but one linked to LMT or common practice in that location, it would be a suitable approach for most packagers. I reckon to the nearest hour would be sufficient.
Stephen
Stephen Colebourne wrote:
Re-reading the thread, I see where your confusion comes from. What I was agreeing with was the concept of removing attempts to be accurate pre-1970. I then pointed out why being completely inaccurate (all UTC) is not viable as the default.
I think Paul noted already that “-00” is not the same as UTC:
"make ZFLAGS=-r@0" does not cause tzdb localtime to just return UTC, as localtime also returns an error indication indicating that the actual offset is unknown. This error indication is in the form of the time zone abbreviation "-00".
-- Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
On 2023-06-28 01:17, Stephen Colebourne via tz wrote:
I believe that if tzdb provided a fixed offset pre-1970, but one linked to LMT or common practice in that location
For timestamps before 1970 we could approximate LMT along with a time zone abbreviation saying that the UT offset is approximate. Something equivalent to this, say:
    Zone America/New_York -5:00 - c-05 1970 Jan 1 0u
                          -5:00 US E%sT
    Zone Asia/Tokyo 9:00 - c+09 1970 Jan 1 0u
                    9:00 Japan J%sT
The idea would be that a time zone abbreviation "c+NN" stands for "circa +NN", where the exact offset is unspecified. Software that cares whether an offset is approximate could look in the abbreviation for leading "c+" and "c-" (along with looking for "-00", which would continue to have its current meaning). POSIX allows "c+" and "c-" at the start of time zone abbreviations so this would pass POSIX muster.
This sort of thing could be generated automatically from the current data.
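The abbreviation check suggested here could look something like the following Python sketch. It assumes the proposed "c+NN" / "c-NN" convention exactly as written in this message, which is a proposal in this thread and not something current tzdb releases emit. The function name classify_abbreviation and the category labels are hypothetical.

```python
import re

# Proposed "circa" form: the letter c, a sign, and two digits of hours.
_CIRCA = re.compile(r"c([+-])(\d{2})")


def classify_abbreviation(abbr):
    """Classify a time zone abbreviation.

    Returns ("unknown", None) for "-00",
    ("approximate", hours) for the proposed "c+NN" / "c-NN" forms,
    and ("ordinary", None) for anything else.
    """
    if abbr == "-00":
        return ("unknown", None)
    match = _CIRCA.fullmatch(abbr)
    if match:
        sign = -1 if match.group(1) == "-" else 1
        return ("approximate", sign * int(match.group(2)))
    return ("ordinary", None)
```

With this sketch, "c-05" is classified as approximate with an offset of -5 hours, "-00" as unknown, and "CET" as ordinary. A caller that needs reliable historical offsets could reject or flag the first two categories.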
On Wed, 28 Jun 2023 at 18:28, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 2023-06-28 01:17, Stephen Colebourne via tz wrote:
I believe that if tzdb provided a fixed offset pre-1970, but one linked to LMT or common practice in that location
For timestamps before 1970 we could approximate LMT along with a time zone abbreviation saying that the UT offset is approximate. Something equivalent to this, say:
    Zone America/New_York -5:00 - c-05 1970 Jan 1 0u
                          -5:00 US E%sT
    Zone Asia/Tokyo 9:00 - c+09 1970 Jan 1 0u
                    9:00 Japan J%sT
The idea would be that a time zone abbreviation "c+NN" stands for "circa +NN", where the exact offset is unspecified. Software that cares whether an offset is approximate could look in the abbreviation for leading "c+" and "c-" (along with looking for "-00", which would continue to have its current meaning). POSIX allows "c+" and "c-" at the start of time zone abbreviations so this would pass POSIX muster.
Or the name could be something consistent across the globe, like "LMT", such as "SET" (Solar estimated time). It would probably be easier to have one consistent name (like LMT) than a prefix.
I think before progressing this further it would need other packagers to chime in and comment on whether they feel it would be acceptable as the default setting for tzdb pre-1970.
Stephen
participants (7)
- Anders Rosendahl
- Doug Ewell
- Guy Harris
- Magnus Fromreide
- Paul Eggert
- Paw Boel Nielsen
- Stephen Colebourne