In this thread I want to try and capture the *requirements* for the project from the perspective of a downstream user. Given this, we may then have a chance to analyse whether the project needs a change of rules.

NOTE: The discussion below is not a proposal, nor does it seek to explore which locations get pre-1970 data. Let's keep that for another thread.

--------
1) "What is the time now?"

To answer this question you need a dataset that records the current timezone rules for each region in the world. Over time that data will naturally build up to form a historic record. To aid the project, a 1970 rule has been adopted for data management.

In some parts of the world, clocks have agreed since 1970 across large parts of a continent. In other parts of the world, different parts of the same country have had different rules since 1970.

Viewing the data through a post-1970 lens splits the planet into regions where clocks have agreed since 1970. Some of these regions are quite abstract, for example the region including Iceland, Ivory Coast, and St Helena.

An abstract region would need to be split if one of the parts of the region changed its timezone rules. If a user had previously selected to follow one of these abstract regions, they might have to change their timezone settings. For example, if a user in Iceland selected their abstract region by the recommended ID based on the largest city they would get `Africa/Abidjan`, but if the Ivory Coast changed its rules, users in Iceland would have to change their selected timezone.

The minimum set of IDs is one ID for each abstract region. The current ID scheme selects the largest city in the abstract region.

My interpretation:
* An abstract region should theoretically have an ID separate to the IDs of the cities/countries in the region
* Tying the ID of the abstract region to the largest city in the region is prone to failure if that city changes timezone rules but the rest of the abstract region stays the same.
* An abstract region ID would naturally have no pre-1970 data, as it doesn't define data in a single city/location.
* The alternative is to encourage users to use their local city, which means that IDs like Europe/Amsterdam and Europe/Oslo cannot be deprecated

--------
2) "What was the time at some instant in the past?"

The project can answer this question pretty reliably for dates from 1970 onwards because of the data set in #1.

The project can never realistically answer the question reliably or accurately before 1970 for *all* locations. For example, timezone rules were entirely local in the US for about 20 years, meaning that hundreds of entries would be needed to correctly capture the data. In addition, much of the historic data globally is unknown and not viable to research, even if someone wanted to do so.

It is possible to research and record historic timezone rules for *some* locations, Europe/London being one example. The project generally operates on the basis that if an ID exists, historical data can be recorded against it, although which file that data lives in has varied over time. IDs are generally not created solely to document historic data.

There is no minimum set of IDs for this use case - the set of possible IDs is unbounded.

My interpretation:
* The majority of pre-1970 data isn't much use
* Some pre-1970 data is high quality, and almost certainly relied upon (eg Europe/London)
* Without a massive increase in scope the best that can be done pre-1970 is to record data against IDs created for some other reason.

--------
3) "An event will happen at this time in the future"

If an end-user wants to store an event in the future, eg. a one hour meeting next month, the correct approach is to store the date, time and zone ID, not the offset (which is not yet known for certain). Even if the local timezone authority changes the tz rules between now and the event, the time of the event is still correct.
https://stackoverflow.com/questions/19166995/java-calendar-date-and-time-man...

For this approach to work, the ID that is stored must be aligned with the timezone authority.

Business contracts are frequently written to describe future events. For example, payment must be made by 2pm Paris time on a given date.

The minimum set of IDs for this use case is one ID per timezone authority. This is typically the country but not necessarily, examples being decisions devolved to states/nations etc.

My interpretation:
* Using TZDB IDs for future events/contracts is very widespread
* In most parts of the world, this requires one ID per country, in some parts of the world additional IDs would ideally be needed such as per state in the US
* One ID for each ISO country (as was the guideline in the past) is a reasonable minimum expectation of end users
* While one might argue IDs for future events are not TZDB's problem, I believe the reality is that they are because usage in this way is already very widespread

--------
4) "An event will happen at this time in the future relative to a shared/common definition"

Some future events are declared relative to a definition that is supra-national or federal. For example, US Mountain, US Pacific, EU Central etc.

A TV show might be defined to air at 8pm Mountain Time in one month's time. It will air at that time regardless of whether any of the states that are currently on Mountain Time change to Pacific Time. For example, even if Denver (and Colorado) moved to Pacific Time or stopped following DST, Mountain Time would still exist and the TV show would still air at 8pm Mountain.

The minimum set of IDs for this use case is one ID per shared/common definition. These IDs are not full aliases for a country or city ID, although they may currently share data. To support this use case, these IDs cannot be deprecated.

My interpretation:
* Some shared zone IDs are distinct and separate enough to warrant a full ID
* US Mountain time is not the same as Denver time when considering the future, the IDs could diverge
* In TZDB terms, the Link from US/Mountain to America/Denver is fine to define the data, but US/Mountain should not be considered deprecated as it has a real and distinct meaning separate from America/Denver
* It would be useful to agree what the criteria for including these shared/federal rules are.

--------
5) "Data should be backwards compatible"

The project has various associated downstream projects. Many of these have an expectation of backwards compatibility. This compatibility extends to the binary format, the source format, and the data contained within.

This use case wants to see the project just continue with the data it has, enhancing it over time.

The project has a pretty good record in terms of not removing IDs. For example, IDs exist for "Portugal" and "Poland", but not "Italy" or "Spain" simply because they were defined once upon a time.

My understanding is that IDs in the `backward` file represent deprecated IDs that should not be used going forward. Yet, downstream projects focussed on backwards compatibility will generally use the `backward` file in full.

The minimum set of IDs for this use case is the set of IDs that existed in the previous release.

My interpretation:
* Many users care about compatibility
* The current IDs are not going away
* Inconsistency has existed in TZDB for many years, see "Portugal" vs "Spain" - no one wants to fix that
* It is hard at present to identify IDs that are deprecated because of proper aliasing, such as spelling changes, vs those where the ID is a real location but its TZDB status has been reduced.
* Change may be possible with a long notice period

--------
Please note that the above is not a proposal, but an exploration of the design space. If you have any use cases I've missed feel free to write them up.

thanks
Stephen
On Wed, Sep 29, 2021 at 5:58 PM Stephen Colebourne via tz <tz@iana.org> wrote:
In this thread I want to try and capture the *requirements* for the project from the perspective of a downstream user. Given this, we may then have a chance to analyse whether the project needs a change of rules.
Thanks for starting this. Are you thinking that these are akin to user stories for just the data, or for everything?

Besides the discussion you opened, there are related topics that can be added, such as (but not limited to):

- What is the public API of the project, and how is it advertised?
- What is the backwards-compatibility policy should a change be needed to the public API?

For those types of topics, there is inspiration from many of the well-managed F/OSS ecosystems.

This may be a nit, but I think "What is the time now?" needs to be more explicitly "What is the wall-time for a particular person now?" and similar for past and future.

--
Matthew Donadio (matt@mxd120.com)
Dear Stephen, On Wednesday, 29 September 2021, you wrote:
In this thread I want to try and capture the *requirements* for the project from the perspective of a downstream user. Given this, we may then have a chance to analyse whether the project needs a change of rules.
Thank you very much for undertaking this effort. I believe this is an important step towards having a fruitful discussion later on about how to handle the timezone data policies.
Please note that the above is not a proposal, but an exploration of the design space. If you have any use cases I've missed feel free to write them up.
I support your suggestion to gather this data first before we delve into details of the implementation / possible changes.

A note regarding your examples:

a) Although possibly a boundary case, I'd like to add that when using location based identifiers, for past or future times, a unique conversion from local time to UTC is sometimes impossible: 2020-10-25 02:30:00 Europe/Copenhagen cannot be mapped unambiguously to UTC, whereas the reverse mapping works of course. I am not aware of any software that takes this into account.

b) Additionally, there exist invalid times: 2020-03-29 02:30:00 Europe/Copenhagen is one example, and when/if negative leap seconds arise dates such as 2022-07-01 01:59:59 Europe/Copenhagen could be another one.

Therefore a reasonable requirement from the downstream user side would be to convey information about such conditions ('invalid time', 'ambiguous time', possibly also 'unconfirmed tzdata' for times not covered in the database).

Cheers,
Jürgen

--
Jürgen Appel
Research Scientist
Denmark's National Metrology Institute
Dansk Fundamental Metrologi, DFM A/S (dfm.dk)
Kogle Allé 5
DK-2970 Hørsholm
Denmark
Mobile: +45 25459049
Email: jap@dfm.dk
VAT: DK-29217939
On Thu, 30 Sept 2021 at 09:42, Jürgen Appel via tz <tz@iana.org> wrote:
a) Although possibly a boundary case, I'd like to add that when using location based identifiers, for past or future times, a unique conversion from local time to UTC is sometimes impossible:
2020-10-25 02:30:00 Europe/Copenhagen cannot be mapped unambiguously to UTC, whereas the reverse mapping works of course. I am not aware of any software that takes this into account.
FYI, java.time.* and Joda-Time provide explicit tools for developers to manage gaps and overlaps on the local timeline.
Therefore a reasonable requirement from the downstream user side would be to convey information about such conditions ('invalid time', 'ambiguous time', possibly also 'unconfirmed tzdata' for times not covered in the database)
From my perspective as a software library author, handling of these two use cases belong at the downstream library/consumer level, not at the TZDB level.
thanks Stephen
On Thu, 30 Sept 2021 at 10:10, Stephen Colebourne via tz <tz@iana.org> wrote:
On Thu, 30 Sept 2021 at 09:42, Jürgen Appel via tz <tz@iana.org> wrote:
a) Although possibly a boundary case, I'd like to add that when using location based identifiers, for past or future times, a unique conversion from local time to UTC is sometimes impossible:
2020-10-25 02:30:00 Europe/Copenhagen cannot be mapped unambiguously to UTC, whereas the reverse mapping works of course. I am not aware of any software that takes this into account.
FYI, java.time.* and Joda-Time provide explicit tools for developers to manage gaps and overlaps on the local timeline.
Ditto Noda Time <https://nodatime.org>, the library for .NET. The .NET standard library provides tools to *check* for gaps and overlaps, but isn't quite as insistent on the user being aware of the possibility as Noda Time is.
Therefore a reasonable requirement from the downstream user side would be to convey information about such conditions ('invalid time', 'ambiguous time', possibly also 'unconfirmed tzdata' for times not covered in the database)
From my perspective as a software library author, handling of these two use cases belong at the downstream library/consumer level, not at the TZDB level.
Indeed - the required data for invalid/ambiguous time is already available in the database; it just needs to be presented appropriately. The "unconfirmed tzdata" is slightly trickier, and I'm not sure what most libraries do about that.

Jon
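As a rough illustration of how a downstream library can surface these conditions from the data that is already there, here is a minimal Python sketch using the standard `zoneinfo` module (Python 3.9+, which needs a system tz database or the `tzdata` package to be installed). The function name `classify_local_time` and its three labels are assumptions made for this example, not part of any library.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def classify_local_time(local: datetime, zone_id: str) -> str:
    """Classify a naive wall-clock time as 'valid', 'ambiguous' or 'invalid'.

    Hypothetical helper for illustration only.
    """
    tz = ZoneInfo(zone_id)
    # fold=0 and fold=1 select the two candidate interpretations of the
    # same wall-clock reading (PEP 495).
    first = local.replace(tzinfo=tz, fold=0)
    second = local.replace(tzinfo=tz, fold=1)
    if first.utcoffset() == second.utcoffset():
        return "valid"
    # The offsets differ, so the reading sits in a gap or an overlap.
    # A reading inside an overlap survives a round trip through UTC;
    # a reading inside a gap does not.
    round_trip = first.astimezone(timezone.utc).astimezone(tz)
    if round_trip.replace(tzinfo=None) == local:
        return "ambiguous"
    return "invalid"


autumn = classify_local_time(datetime(2020, 10, 25, 2, 30), "Europe/Copenhagen")
spring = classify_local_time(datetime(2020, 3, 29, 2, 30), "Europe/Copenhagen")
summer = classify_local_time(datetime(2020, 6, 1, 12, 0), "Europe/Copenhagen")
print(autumn, spring, summer)
```

The sketch only covers gaps and overlaps caused by offset changes; it says nothing about leap seconds or about how far into the future the installed rules can be trusted.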
Thanks for starting this discussion. I looked at your email’s subject line and wrote up some thoughts about the stated topic, as I figured it’d be good to think independently at first so that we could see two fresh approaches. Here are those thoughts. (I will comment on your email more directly in a followup message.)

-----

Knowing one’s users is important when thinking about use cases, so I’ll start by listing tzdb users, as follows. This list is in no particular order and is surely incomplete.

* End users who don’t want to worry about timestamp conversion. They want conversion between UT and local time to work correctly with a minimum of fuss.
* Administrators who set up default timezones for sets of devices under their administration.
* Users or downstream projects that care only about leap seconds (notably, NTP users consuming the leap-seconds.list file).
* Downstream projects that package TZif files for delivery to their users as part of OS releases or patches.
* Downstream projects that communicate TZif files or other TZDB-derived data to their users via TZDIST (Internet RFC 7808) or other network protocols.
* Downstream projects that translate TZif files into some other format, that is then communicated to their users.
* Downstream projects that translate tzdb source files into some non-TZif format, which is then communicated to their users somehow.
* Downstream projects that produce timezone choosers (programs that let users choose a timezone setting), either via global positioning, or selecting from a map, or selecting via a textual interface.
* Downstream projects that produce fancier interfaces, such as extracting timezone histories, comparing two timezone histories, and examining tzdb’s own history.
* Downstream projects that infer metadata from the tzdb source code, metadata that are not explicit in the source.
* Downstream projects that use tzcode.
* Governments that change timezone rules and need to have tzdb updated.
* Humans reading tzdb source files, for understanding of timezone history.
* Timezone historians who might find corrections to old data, or who might find old data not currently present in the database.
* Activists who want governments to change timezone rules.
* Activists who want tzdb to highlight their concerns.
* Researchers who study the timezone data and its evolution as interesting objects in their own right.
* Maintainers who have limited resources to accommodate all the above.

Given the above, here are some use cases. This list also is in no particular order and is surely incomplete.

* Convert near-future or near-past timestamps.
* Convert far-future or distant-past timestamps.
* Select which timezone to use, either by default or for an individual conversion.
* Determine what the conversion rules are, e.g., what rules are used for spring-forward and fall-back.
* Assess conversion accuracy.
* Determine the source of the data used for conversions.
* Assess the consensus level of timezone data, since in some cases the rules were significantly disputed by the people in charge of clocks at the time. Also, rules may be disputed now by timezone historians.
* Patch a distributed tzdb for use further downstream.
* Update a local copy of tzdb because of a new tzdb release or patchset.
* Respond to queries about why tzdb is the way it is.
* Update conversion of previously-converted timestamps after tzdb changes.
* Update tzdb because governments changed the rules.
* Update tzdb because errors were found.
* Update tzdb because data entries are dubious or disputed.
* Update tzdb because it unfairly discriminates (or appears to discriminate) in favor of some countries, ethnic groups, etc.
* Update downstream code from tzcode, even if the downstream code has been patched.
* Update a downstream database that contains tzdb data, perhaps in a different form and perhaps with a superset of a subset of tzdb, and perhaps patched in its own way.
* Derive different versions of timezone data for different needs.
Thanks for these two lists. I think that they cover the users and use cases pretty well. These can be used with my list to get a sense of who might be affected by any changes that are made.

Stephen

On Sat, 2 Oct 2021 at 03:08, Paul Eggert via tz <tz@iana.org> wrote:
Thanks for starting this discussion. I looked at your email’s subject line and wrote up some thoughts about the stated topic, as I figured it’d be good to think independently at first so that we could see two fresh approaches. Here are those thoughts. (I will comment on your email more directly in a followup message.)
On 9/29/21 2:57 PM, Stephen Colebourne via tz wrote:
* An abstract region should theoretically have an ID separate to the IDs of the cities/countries in the region
Yes, others have proposed this, notably Russ Allbery in <https://mm.icann.org/pipermail/tz/2021-September/030518.html>. It's not clear, though, whether merely adding this level of indirection would be worth the cost. It wouldn't remove political concerns, for example. Nor would it address Zone splits any better than the current approach does. Although it might be worth adding abstract IDs to implement a larger project, I'd like to see that larger project's outlines before opining.
* An abstract region ID would naturally have no pre-1970 data, as it doesn't define data in a single city/location.
Sorry, I don't know what "naturally" refers to here. Aren't abstract IDs orthogonal to eliminating pre-1970 data? After all, we could introduce abstract IDs without eliminating pre-1970 data, and vice versa.
timezone rules were entirely local in the US for about 20 years
Perhaps you are referring to 1946-1966? But US timezone rules were local for considerably more than twenty years (1883-1917, 1920-1941, 1946-1966).
3) "An event will happen at this time in the future" If an end-user wants to store an event in the future, eg. a one hour meeting next month, the correct approach is to store the date, time and zone ID
That doesn't suffice, as it doesn't work if the Zone splits in the meantime.
* In most parts of the world, this requires one ID per country
No, it requires far fewer Zones for this use case. I estimate that it would require about 110 Zones, a bit fewer if we omit legacy Zones like PST8PDT. That's far fewer than the 249 country codes in iso3166.tab. (We'd need so few Zones because for this use case we could merge all Zones agreeing in the future.)
* One ID for each ISO country (as was the guideline in the past) is a reasonable minimum expectation of end users
I doubt that, given that we could get by with so few Zones for the given use case.
4) "An event will happen at this time in the future relative to a shared/common definition"
A TV show might be defined to air at 8pm Mountain Time in one month's time. It will air at that time regardless of whether any of the states that are currently on Mountain Time change to Pacific Time.
I don't think we can realistically predict what will happen for future events of this sort, which means this sort of thing should not be a design constraint. tzdb needs flexibility, not a straightjacket, to deal with unknown future events.

Suppose, for example, that in mid-2022 the US east coast switches from -05 (-04 with DST) to -04 all year, something that's under serious consideration. A TV show that was formerly planned for 2022-12-01 19:00 -05 will likely be rescheduled for 2022-12-01 19:00 -04, i.e., still 7pm in New York, Philadelphia, Miami, etc. but with a different UTC. If such a show had been scheduled with TZ='EST5EDT' then that TZ setting would be wrong after the change; whereas if it had been scheduled with TZ='America/New_York' it would be OK.

We can't predict how events like this will shake out. Nor should we be compelled to add legacy-like Zones like "AEST-10AEDT" or "<MSK+07>-10<MSK+07>" merely because of the existence of legacy zones like "PST8PDT". Such complexity would be more trouble than it'd be worth: there are good reasons we moved away from "PST8PDT" and the like.
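The difference between storing an offset and storing a zone ID for this kind of event can be sketched in Python. The rule change is simulated here with fixed offsets and is purely hypothetical; `StoredEvent`, `to_utc`, `rules_today` and `rules_after_change` are names invented for this illustration. A real application would pass `zoneinfo.ZoneInfo` as the lookup (the default below) and would pick up new rules from an updated tzdb.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Callable
from zoneinfo import ZoneInfo


@dataclass(frozen=True)
class StoredEvent:
    local: str    # wall-clock time, ISO 8601 without an offset
    zone_id: str  # tzdb ID, resolved only when the event is read


def to_utc(event: StoredEvent,
           rules: Callable[[str], tzinfo] = ZoneInfo) -> datetime:
    """Resolve a stored event to UTC using whatever rules are current."""
    tz = rules(event.zone_id)
    return (datetime.fromisoformat(event.local)
            .replace(tzinfo=tz)
            .astimezone(timezone.utc))


# Hypothetical rule sets standing in for two tzdb releases.
def rules_today(zone_id: str) -> tzinfo:
    return {"America/New_York": timezone(timedelta(hours=-5))}[zone_id]


def rules_after_change(zone_id: str) -> tzinfo:
    return {"America/New_York": timezone(timedelta(hours=-4))}[zone_id]


show = StoredEvent("2022-12-01T19:00", "America/New_York")
utc_before = to_utc(show, rules_today)
utc_after = to_utc(show, rules_after_change)

# Storing the offset instead pins the UTC instant, so after the simulated
# change the show would no longer land on 19:00 local time.
pinned = datetime.fromisoformat("2022-12-01T19:00-05:00")
pinned_local_after = pinned.astimezone(rules_after_change("America/New_York"))

print(utc_before, utc_after, pinned_local_after)
```

In this sketch the zone-ID record stays at 19:00 wall-clock time under both rule sets and only its UTC instant moves, while the offset-based record drifts to 20:00 local time.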
My understanding is that IDs in the `backward` file represent deprecated IDs that should not be used going forward.
My understanding has been that the names in 'backward' should be supported indefinitely unless there is good reason to remove them. We have on rare occasions removed 'backward' links (I recall Canada/East-Saskatchewan; perhaps there were a few others) but I think this was to fix bugs, not mere cleanups. It sounds like we should document this issue better, if your understanding is the reverse of mine.
* It is hard at present to identify IDs that are deprecated because of proper aliasing, such as spelling changes, vs those where the ID is a real location but its TZDB status has been reduced.
Good point, and it'd be good to document somehow why Link entries exist. Among other things, I suspect that there are more than just the two reasons that you mention.
Paul Eggert via tz <tz@iana.org> writes:
Yes, others have proposed this, notably Russ Allbery in <https://mm.icann.org/pipermail/tz/2021-September/030518.html>. It's not clear, though, whether merely adding this level of indirection would be worth the cost. It wouldn't remove political concerns, for example. Nor would it address Zone splits any better than the current approach does.
Although it might be worth adding abstract IDs to implement a larger project, I'd like to see that larger project's outlines before opining.
I think it would be useful primarily if it made sense to hand off the political concerns to a different body that had the infrastructure to apply more political rules to making decisions (assuming that's considered appropriate). The Unicode CLDR, for example (although I have no idea if they want that duty).

It handles zone splits better in the specific sense that it separates naming from adding the data and makes it uncontroversial to add a new abstract ID with different rules. It doesn't make it any easier to decide what to call that zone (if anything; it would be possible to have abstract zones that have no names), and where the existing name should point, which seems to be where most of the political disputes lie. It just enables those decisions to be made by a different party.

If there's no desire to hand off the naming portions to a different body, the level of indirection may not serve any useful purpose. It does make it easier for different groups to maintain different sets of names while sharing the same underlying data, but that results in inconsistent behavior for users despite using the same name, which is presumably undesirable. (That's currently happening right now, however, and to a lesser extent happens if backzone is included in the data set.)

--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>
On Sat, 2 Oct 2021 at 03:24, Paul Eggert via tz <tz@iana.org> wrote:
* An abstract region ID would naturally have no pre-1970 data, as it doesn't define data in a single city/location.
Sorry, I don't know what "naturally" refers to here. Aren't abstract IDs orthogonal to eliminating pre-1970 data? After all, we could introduce abstract IDs without eliminating pre-1970 data, and vice versa.
An abstract region represents a part of the Earth that has had the same clock rules since 1970. At some point prior to 1970 different locations within the abstract region had different clock rules, even if that was just LMT. There is "naturally" no pre-1970 data because the definition of the abstract region is all about post-1970 data, with pre-1970 data diverging. Imagine there was an abstract region called "Abstract/SameAsGmt" which is what Iceland and Ivory Coast follow. That region *as a whole* does not have *one* set of pre-1970 data, it has *many* sets. ie. Pre-1970 data only realistically works when viewed at a city level, not at an abstract region level.
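One way to make the "same clock rules since 1970" definition concrete is to group zone IDs by the UTC offsets they produce from 1970 onwards. The following Python sketch does that by sampling once a day; `offset_fingerprint` and `group_by_agreement` are hypothetical names, and comparing only sampled offsets (not abbreviations or DST flags) is a simplifying assumption of this example, not how tzdb itself decides. It needs Python 3.9+ with a tz database available to `zoneinfo`.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def offset_fingerprint(zone_id, start, end, step=timedelta(days=1)):
    """Return the UTC offsets of a zone at regularly sampled instants."""
    tz = ZoneInfo(zone_id)
    offsets = []
    instant = start
    while instant < end:
        offsets.append(instant.astimezone(tz).utcoffset())
        instant += step
    return tuple(offsets)


def group_by_agreement(zone_ids, start, end):
    """Group zone IDs whose sampled offsets agree over [start, end)."""
    groups = defaultdict(list)
    for zone_id in zone_ids:
        groups[offset_fingerprint(zone_id, start, end)].append(zone_id)
    return sorted(groups.values())


start = datetime(1970, 1, 1, 12, tzinfo=timezone.utc)
end = datetime(2021, 10, 1, 12, tzinfo=timezone.utc)
regions = group_by_agreement(
    ["Africa/Abidjan", "Atlantic/Reykjavik", "Europe/London"], start, end
)
print(regions)
```

Moving `start` earlier than 1970 is where such a grouping would be expected to fall apart, which is the sense in which the region as a whole has many sets of pre-1970 data rather than one.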
3) "An event will happen at this time in the future" If an end-user wants to store an event in the future, eg. a one hour meeting next month, the correct approach is to store the date, time and zone ID
That doesn't suffice, as it doesn't work if the Zone splits in the meantime.
Of course that is true. But this hasn't been a problem in practice for business applications AFAICT.

The point above is documenting how things *are*. Business applications and calendaring systems really do store the local date/time and zone ID from TZDB for future events. This works precisely because there are enough IDs provided by TZDB to make this work.

If TZDB had only ever offered an ID for each abstract region, then *some other* system/project would have needed to be invented to define a set of IDs suitable for business applications. And moreover, that other system/project would need to provide the mapping from their ID to an underlying abstract region. (This is kind of Russ' point - that the IDs needed for business applications could in theory be separated from the IDs that provide data for abstract regions.)

But TZDB doesn't just provide IDs for abstract regions, it has always provided more than that. And by providing separate IDs like Europe/Oslo and Atlantic/Reykjavik for many years, TZDB has provided the basic tools necessary for business applications to function in the way described above.

While I can understand the conceptual desire to have TZDB only provide abstract regions, it seems completely impractical to do so at this point given how embedded the data actually is across the industry. ie. it is too late to get rid of IDs like Europe/Oslo and Atlantic/Reykjavik. But there may yet be other ways to allow TZDB to focus on the abstract regions (a discussion for a later thread). (Note that this comment only discusses the IDs, not the associated data.)

I hope this explains more clearly the "one ID per country" conclusion in the OP. TZDB *already* effectively provides one ID per country, and applications *already* rely on that fact to meet use case #3.
What is really needed IMO is an RFC/theory update to recognise the vital nature of IDs like Europe/Oslo and Atlantic/Reykjavik to business applications. (Again, the comment here is just about IDs, not the associated data.)
4) "An event will happen at this time in the future relative to a shared/common definition"
A TV show might be defined to air at 8pm Mountain Time in one month's time. It will air at that time regardless of whether any of the states that are currently on Mountain Time change to Pacific Time.
I don't think we can realistically predict what will happen for future events of this sort, which means this sort of thing should not be a design constraint. tzdb needs flexibility, not a straightjacket, to deal with unknown future events.
Suppose, for example, that in mid-2022 the US east coast switches from -05 (-04 with DST) to -04 all year, something that's under serious consideration. A TV show that was formerly planned for 2022-12-01 19:00 -05 will likely be rescheduled for 2022-12-01 19:00 -04, i.e., still 7pm in New York, Philadelphia, Miami, etc. but with a different UTC. If such a show had been scheduled with TZ='EST5EDT' then that TZ setting would be wrong after the change; whereas if it had been scheduled with TZ='America/New_York' it would be OK.
Not sure if the point was missed here. My point is really just an observation of timezone reality IMO - that US Mountain Time (as defined by the US DOT) is not the same as Denver Time (the time experienced in Denver). It is a coincidence that the two happen to be the same, but they are not legally the same. (A legal contract could be written to refer to Denver Time or US Mountain Time, and if the two diverge, that distinction would matter.)

IMO, it is important to ensure that IDs exist for such shared zones. But luckily enough we already have "US/Mountain" and "CET", so the ID part is already sorted (in the US and EU).

Logically, the ID US/Mountain should provide no time zone data prior to the US DOT first defining the rule, but that subtlety probably isn't really necessary.
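To illustrate keeping an ID distinct from the data it currently shares, here is a small Python sketch that reads `Link TARGET LINK-NAME` lines in tzdb source syntax and resolves an ID to the zone that supplies its data. The excerpt in `SAMPLE` is an illustrative stand-in rather than a copy of the `backward` file, and `parse_links` and `resolve_data_source` are hypothetical helper names.

```python
# Illustrative excerpt in tzdb source syntax: Link TARGET LINK-NAME
SAMPLE = """\
# Link	TARGET		LINK-NAME
Link	America/Denver	US/Mountain
Link	Europe/Warsaw	Poland
Link	Europe/Lisbon	Portugal
"""


def parse_links(text):
    """Map each link name to its target, ignoring comments and other lines."""
    links = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[0] == "Link":
            target, link_name = fields[1], fields[2]
            links[link_name] = target
    return links


def resolve_data_source(zone_id, links):
    """Follow links until reaching an ID that is not itself a link."""
    seen = set()
    while zone_id in links and zone_id not in seen:
        seen.add(zone_id)  # guard against accidental cycles
        zone_id = links[zone_id]
    return zone_id


links = parse_links(SAMPLE)
stored_id = "US/Mountain"  # what a contract or calendar entry records
data_source = resolve_data_source(stored_id, links)
unlinked = resolve_data_source("Europe/Paris", links)
print(stored_id, "->", data_source)
```

The design point of the sketch is that the application keeps `stored_id` and only looks up `data_source` when it needs rules, so a later change to the Link would not require rewriting stored records.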
* It is hard at present to identify IDs that are deprecated because of proper aliasing, such as spelling changes, vs those where the ID is a real location but its TZDB status has been reduced.
Good point, and it'd be good to document somehow why Link entries exist. Among other things, I suspect that there are more than just the two reasons that you mention.
Defining the set of IDs the project should provide (still ignoring pre-1970 issues) is IMO the next step in the process which would address the Link/backward point. I'll start a new thread for that soon, unless there is much more to say on this thread. thanks Stephen
On Oct 2, 2021, at 3:02 PM, Stephen Colebourne via tz <tz@iana.org> wrote:
The point above is documenting how things *are*. Business applications and calendaring systems really do store the local date/time and zone ID from TZDB for future events. This works precisely because there are enough IDs provided by TZDB to make this work.
If we're talking about *future* events, the only IDs that would be required to make this work would be IDs corresponding to regions that differ in dates and times after the earliest date and time that matters to the business application in question. And any business application (I'm including calendaring systems in the category of "business application" here) that uses the tzdb couldn't have done so before the tzdb was first released (which was 14-15 years after 1970) - an application could not, in 1963, have recorded an event scheduled for 1965 as a date/time and zone ID from tzdb. So it's not as if there was an *inherent* need for Europe/Oslo as a tzdb ID to make these applications work; if there had never been a Europe/Oslo ID, and Norwegian locations had used Europe/Berlin, that would have been sufficient to, for example, record future events taking place in Norway. The reason why we would continue to have a Europe/Oslo ID even if we put Norway into the Europe/Berlin tzdb region is for *backwards compatibility* with applications that have, for example, stored future events with a tzdb ID of Europe/Oslo.
On Sun, 3 Oct 2021 at 00:29, Guy Harris via tz <tz@iana.org> wrote:
So it's not as if there was an *inherent* need for Europe/Oslo as a tzdb ID to make these applications work; if there had never been a Europe/Oslo ID, and Norwegian locations had used Europe/Berlin, that would have been sufficient to, for example, record future events taking place in Norway.
Here is a not untypical legal contract: https://www.sec.gov/Archives/edgar/data/1308106/000119312512368196/d401408de... "NIBOR means that the rate for an interest period will be the rate for deposits in Norwegian Kroner for a period as defined under Bond Reference Rate which appears on the Reuters Screen NIBR Page as of 12.00 noon, Oslo time" The 100% correct way to represent this in a business application is to define an ID scheme that includes an ID for "Oslo time" and then keep a constantly updated mapping from that ID to the actual zone used in Oslo. In practice, most business applications/companies would find that very expensive and cumbersome to maintain by themselves. Instead, most business applications simply use "Europe/Oslo" to represent the concept. Effectively, this outsources both the ID and the upkeep of the mapping to tzdb. Feel free to say that the business applications are doing it wrong. Feel free to say that this shouldn't be tzdb's issue to solve. But my answer will be the same - that no longer matters because this *is* how tzdb IDs are actually used. And in practice it has worked pretty effectively for many years across much of the planet. And IMO, if tzdb had never published an ID for Oslo, somebody somewhere would have had to invent such a thing - either internal inside lots of companies, or a separate non-tzdb open source project. ie. that there is an inherent need is demonstrated by actual usage. thanks Stephen
Hi, I'm struggling to understand the relevance of the below. For one, there exists a link for Europe/Oslo that accurately reflects the GMT offset in Oslo at this moment. For another, the same sort of contract could be made for Piscataway, NJ, or Nimes, FR, for which there are no specific TZ definitions. Help me understand your point. Eliot On 03.10.21 11:47, Stephen Colebourne via tz wrote:
On Sun, 3 Oct 2021 at 00:29, Guy Harris via tz <tz@iana.org> wrote:
So it's not as if there was an *inherent* need for Europe/Oslo as a tzdb ID to make these applications work; if there had never been a Europe/Oslo ID, and Norwegian locations had used Europe/Berlin, that would have been sufficient to, for example, record future events taking place in Norway. Here is a not untypical legal contract: https://www.sec.gov/Archives/edgar/data/1308106/000119312512368196/d401408de... "NIBOR means that the rate for an interest period will be the rate for deposits in Norwegian Kroner for a period as defined under Bond Reference Rate which appears on the Reuters Screen NIBR Page as of 12.00 noon, Oslo time"
The 100% correct way to represent this in a business application is to define an ID scheme that includes an ID for "Oslo time" and then keep a constantly updated mapping from that ID to the actual zone used in Oslo.
In practice, most business applications/companies would find that very expensive and cumbersome to maintain by themselves. Instead, most business applications simply use "Europe/Oslo" to represent the concept. Effectively, this outsources both the ID and the upkeep of the mapping to tzdb.
Feel free to say that the business applications are doing it wrong. Feel free to say that this shouldn't be tzdb's issue to solve. But my answer will be the same - that no longer matters because this *is* how tzdb IDs are actually used. And in practice it has worked pretty effectively for many years across much of the planet.
And IMO, if tzdb had never published an ID for Oslo, somebody somewhere would have had to invent such a thing - either internal inside lots of companies, or a separate non-tzdb open source project. ie. that there is an inherent need is demonstrated by actual usage.
thanks Stephen
On Sun, 3 Oct 2021 at 12:53, Eliot Lear via tz <tz@iana.org> wrote:
For another, the same sort of contract could be made for Piscataway, NJ, or Nimes, FR, for which there are no specific TZ definitions.
And Guy asks
So what do those applications do for contracts that refer to "Frankfurt time"?
So, it turns out that there is, I believe, a critical difference here between the US view of timezones and the European one. In the US, timezones do not follow the boundary of the whole country, nor even States. In Europe, by contrast, timezone boundaries are very much driven by country boundaries. Thus the answer for Frankfurt is that applications use the zone ID associated with the country, Europe/Berlin, and applications accept that this is not actually "Frankfurt Time", but a good enough approximation. (Same with Nimes and Europe/Paris). Because there are a limited set of countries, which (almost) all have an associated tz ID and zone rules only change at the country-level, this works out just fine. (Again, as mentioned before, if you want to be 100% correct, you cannot rely on tzdb IDs for this, but most applications accept the imperfection with tzdb IDs being good enough). For the US, tzdb actually provides less ability to solve this problem. Even if tzdb had an ID for each State, that would be insufficient to cleanly map a location to a good enough ID. In the US, tzdb effectively doesn't help a user find tz info for Piscataway, NJ - something must *directly* connect Piscataway with America/New_York - no other geographic hierarchy such as the State will do. ie. in Europe, you can map location to country (typically easy to do) and then country to timezone ID (easy) and be happy with the outcome. In the US, you would need a direct mapping from location to timezone for thousands of locations because there is no suitable intermediate location hierarchy. An abstract region model is the only one that really works for the US. Why does this matter in this discussion? Well, one thing I don't want to see in tzdb is an attempt to force the US model onto Europe. The extra level of country implicit in the European zone IDs is really useful to applications. And part of this discussion is to tease out the implicit and make it explicit. thanks Stephen
On Oct 4, 2021, at 4:00 PM, Stephen Colebourne via tz <tz@iana.org> wrote:
On Sun, 3 Oct 2021 at 12:53, Eliot Lear via tz <tz@iana.org> wrote:
For another, the same sort of contract could be made for Piscataway, NJ, or Nimes, FR, for which there are no specific TZ definitions.
And Guy asks
So what do those applications do for contracts that refer to "Frankfurt time"?
So, it turns out that there is, I believe, a critical difference here between the US view of timezones and the European one. In the US, timezones do not follow the boundary of the whole country, nor even States.
Canada, Australia, Russia, and Brazil, possibly among others, say "hi!" The world does not consist solely of "the United States" and "Europe". (And part of Russia is in Europe - and it has at least two European tzdb regions, Europe/Kaliningrad and Europe/Moscow.)
In Europe, by contrast, timezone boundaries are very much driven by country boundaries.
Except where they aren't.
Thus the answer for Frankfurt is that applications use the zone ID associated with the country, Europe/Berlin,
So those applications have some mechanism to map "Frankfurt" to "Europe/Berlin". *If* that mechanism maps "Frankfurt" to "Germany" and then maps "Germany" to "Europe/Berlin", that works... for Europe. There are places where it won't work.
and applications accept that this is not actually "Frankfurt Time", but a good enough approximation. (Same with Nimes and Europe/Paris). Because there are a limited set of countries, which (almost) all have an associated tz ID and zone rules only change at the country-level, this works out just fine.
So applications that use a mechanism as described above for countries outside (Western) Europe will not work in Canada: https://www.sec.gov/Archives/edgar/data/855931/000095012309070801/o58303exv4... "the rate of interest per annum equal to the average annual yield rate for one month Canadian Dollar bankers’ acceptances (expressed for such purpose as a yearly rate per annum in accordance with Section 5.4) which rate is shown on the display referred to as the “CDOR Page” (or any display substituted therefor) of Reuters Limited (or any successor thereto or Affiliate thereof) at 10:00 a.m. (Toronto time) on such day or, if such day is not a Banking Day, on the immediately preceding Banking Day, plus 1.00% per annum," https://www.sec.gov/Archives/edgar/data/1042682/000104746902008250/a2096324z... "1.1.20 "Bankers' Acceptance Discount Rate" means (i) in respect of Bankers' Acceptances to be purchased by the Lenders which are Schedule I banks under the Bank Act (Canada), the average rate for Canadian Dollar bankers' acceptances having Designated Periods of 1, 2, 3 or 6 months quoted on Reuters Service, page CDOR "Canadian Interbank Bid BA Rates" (the "CDOR Rate"), having an identical Designated Period to that of the Bankers' Acceptance to be issued on such day, (ii) in respect of Bankers' Acceptances to be purchased by the Lenders which are Schedule II banks under the Bank Act (Canada) or Schedule III banks under the Bank Act (Canada) which are not subject to the restrictions and requirements referred to in Section 524(2) thereof and in respect of Discount Notes, the rate for Canadian Dollar bankers' acceptances quoted by the BA Schedule II Reference Lender, having an identical Designated Period to that of the Bankers' Acceptance to be issued on such day, and (iii) in respect of Discount Notes to be purchased by Caisse centrale Desjardins, the annual discount rate established by Caisse centrale Desjardins in accordance with its normal practice as corresponding to the 
"all-in cost", expressed as a rate per annum and calculated at approximately 10:00 a.m. (Montreal time) on the date of issue of such Discount Notes, to Caisse centrale Desjardins to provide fixed rate loans denominated in Canadian Dollars to be made for a comparable term as that for which such Discount Notes are to be issued and in an amount approximately equal to the face value of the Discount Notes to be then issued by Caisse centrale Desjardins as a part of such issue; the whole having regard to the costs and charges to Caisse centrale Desjardins in obtaining matching deposits in Canadian Dollars in an amount sufficient to fund such fixed rate loans and for a comparable term, together with all other costs and charges (other than internal costs and charges) incidental thereto including the imputed costs of any primary, secondary or other reserves or special deposits required to be maintained in the circumstances; provided that the rates referred to in clauses (ii) and (iii) above may not exceed the rate determined under paragraph (i) by more than 10 basis points (.10%) (in each of cases (i), (ii) and (iii), the "Discount Rates"). In all cases, the Discount Rates shall be quoted at approximately 10:00 A.M. (Montreal time) on the Acceptance Date calculated on the basis of a year of 365 days." https://www.sec.gov/Archives/edgar/data/1163279/000116327902000002/ex1022.ht... ""Expiry Time" means the earlier of: (i) 4:30 p.m. (Vancouver time) on the Due Date; and (ii) 4:30 p.m. (Vancouver time) on the fifth Business Day after the Qualification Date;" or Australia: https://www.sec.gov/Archives/edgar/data/1564708/000119312520027351/d822982de... 
"BBSY Rate for a period means the higher of zero and the following rate determined at or about 11.00am (Sydney time) on the first day of that period (or if different the time specified by the Facility Agent as the time at which this rate is normally published) and for a period equivalent (in the opinion of the Facility Agent, without the need for instructions) to the Interest Period:" https://www.sec.gov/Archives/edgar/data/869370/000119312504190244/dex21.htm "unless a contrary indication appears, a time of day is a reference to Melbourne time." https://www.sec.gov/Archives/edgar/data/811156/000095012404004125/k87920exv1... "(TIME OF DAY) a reference to time is a reference to Perth time;" or the US: https://www.sec.gov/Archives/edgar/data/1421517/000143774917008029/ex10-1.ht... "Each Advance under the Revolving Credit Loan (each, a ‘Borrowing’) shall be made on notice, given not later than 12:00 Noon (New York time) on the third Business Day prior to the date of the proposed Borrowing in the case of a LIBOR Rate Advance, and not later than 12:00 Noon (New York time) on the date of the proposed Borrowing in the case of a Base Rate Advance, by Borrower to Lender. Unless otherwise agreed in writing by Lender, each such notice of a Borrowing shall be by telephone, confirmed immediately in writing (by telecopier, e-mail or otherwise as permitted hereunder), substantially in the form of Exhibit C (a ‘Notice of Borrowing’), specifying therein the requested (i) date of such Borrowing, (ii) Type of Advance comprising such Borrowing, (iii) aggregate principal amount of such Borrowing, and (iv) the Interest Period, in the case of a LIBOR Rate Advance. Each Borrowing (including any Conversion or Continuation) shall be in an amount equal to One Million Dollars ($1,000,000) or a whole multiple of Five Hundred Thousand Dollars ($500,000) in excess thereof." https://www.sec.gov/Archives/edgar/data/1835256/000119312521052538/d90982dex... "1.6 Time References. 
Unless the context of this Agreement or any other Loan Document clearly requires otherwise, all references to time of day refer to Pacific standard time or Pacific daylight saving time, as in effect in Los Angeles, California on such day. ..." https://www.sec.gov/Archives/edgar/data/1001039/000110465916088718/a16-1412_... "“Eleventh District Cost of Funds Rate” means, with respect to any Interest Determination Date specified below (an “Eleventh District Cost of Funds Interest Determination Date”), the rate equal to the monthly weighted average cost of funds for the calendar month immediately preceding the month in which the Eleventh District Cost of Funds Interest Determination Date falls as set forth under the caption “11th District” on the display on Reuters (or any successor service) on page COFI/ARMS (or any other page as may replace the specified page on that service) (“COFI/ARMS Page”), or if not so displayed, on Bloomberg service (or any successor service) on page ALLX COF (or any other page as may replace the specified page on that service) (“Bloomberg page ALLX COF”) in each case, as of 11:00 A.M., San Francisco time, on the Eleventh District Cost of Funds Interest Determination Date. If such rate does not appear on the COFI/ARMS Page or Bloomberg Page ALLX COF on any related Eleventh District Cost of Funds Interest Determination Date, the Eleventh District Cost of Funds Rate for the Eleventh District Cost of Funds Interest Determination Date will be the Eleventh District Cost of Funds Rate Index, as defined below. If the FHLB of San Francisco fails to announce the rate for the calendar month next preceding the Eleventh District Cost of Funds Interest Determination Date, then the Eleventh District Cost of Funds Rate for that date will be the Eleventh District Cost of Funds Rate in effect on that Eleventh District Cost of Funds Interest Determination Date."
For the US, tzdb actually provides less ability to solve this problem. Even if tzdb had an ID for each State, that would be insufficient to cleanly map a location to a good enough ID. In the US, tzdb effectively doesn't help a user find tz info for Piscataway, NJ - something must *directly* connect Piscataway with America/New_York - no other geographic hierarchy such as the State will do.
ie. in Europe, you can map location to country (typically easy to do) and then country to timezone ID (easy) and be happy with the outcome. In the US, you would need a direct mapping from location to timezone for thousands of locations because there is no suitable intermediate location hierarchy. An abstract region model is the only one that really works for the US.
What's the difference between "map location to country" and "map location to tzdb ID" that renders the former "easy to do" in Europe but that renders the latter *not* easy to do in Europe? Is it that the former doesn't need direct mapping from location to country, so that even with a larger number of cities in Europe, you wouldn't need a table that gives the country for all of those cities?
Why does this matter in this discussion? Well, one thing I don't want to see in tzdb is an attempt to force the US model onto Europe. The extra level of country implicit in the European zone IDs is really useful to applications.
So either 1) you're expressing concern for applications that *only* have to deal with times in "Europe" in the sense of "that region on the Eurasian continent where countries don't have more than one tzdb region" or 2) you're expressing concern for applications that use *different* algorithms for selecting tzdb IDs to represent "XXX Time", where the algorithm used to map between "XXX Time" and a tzdb region is "map XXX time to a country and then map that country to a tzdb region", and don't want those applications to have to come up with a different algorithm for Europe.
And part of this discussion is to tease out the implicit and make it explicit.
Which is exactly what I've been doing in the past couple of messages. So, let's make it explicit in which regions of the world it's important that "map "XXX Time" to a country and map a country to a tzdb region" must work, with the rest of the world possibly using "map "XXX time" directly to a tzdb region" (an algorithm that would *also* work for Europe).
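The two lookup strategies being contrasted can be sketched in Python as follows. The tables are tiny illustrative stand-ins built from places named in this thread, and the function names `via_country` and `direct` are hypothetical; a real application would need far larger tables.

```python
from typing import Optional

# Illustrative tables only, built from places mentioned in this thread.
CITY_TO_COUNTRY = {
    "Frankfurt": "DE",
    "Nimes": "FR",
    "Oslo": "NO",
    "Toronto": "CA",
    "Vancouver": "CA",
}

# Only countries assumed here to have a single tzdb region can appear.
COUNTRY_TO_TZID = {
    "DE": "Europe/Berlin",
    "FR": "Europe/Paris",
    "NO": "Europe/Oslo",
}

# A direct table needs one entry per location, but works everywhere.
CITY_TO_TZID = {
    "Frankfurt": "Europe/Berlin",
    "Nimes": "Europe/Paris",
    "Oslo": "Europe/Oslo",
    "Toronto": "America/Toronto",
    "Vancouver": "America/Vancouver",
    "Piscataway": "America/New_York",
}


def via_country(city: str) -> Optional[str]:
    """'XXX Time' -> country -> tzdb ID; returns None when a step has no answer."""
    country = CITY_TO_COUNTRY.get(city)
    return COUNTRY_TO_TZID.get(country)


def direct(city: str) -> Optional[str]:
    """'XXX Time' -> tzdb ID with no intermediate hierarchy."""
    return CITY_TO_TZID.get(city)
```

With these tables, `via_country("Frankfurt")` yields Europe/Berlin but `via_country("Toronto")` yields nothing, because Canada has no single country-level ID, while `direct` answers for both at the cost of a per-location table.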
Guy Harris via tz said:
So, it turns out that there is, I believe, a critical difference here between the US view of timezones and the European one. In the US, timezones do not follow the boundary of the whole country, nor even States.
Canada, Australia, Russia, and Brazil, possibly among others, say "hi!"
So do Cyprus, the Kingdom of Denmark, France, the Kingdom of the Netherlands, Portugal, and Spain.
In Europe, by contrast, timezone boundaries are very much driven by country boundaries. Except where they aren't.
Zigactly.
ie. in Europe, you can map location to country (typically easy to do)
Except when it isn't (several well-known examples). -- Clive D.W. Feather
On Oct 5, 2021, at 12:54 AM, Clive D.W. Feather <clive@davros.org> wrote:
So do Cyprus, the Kingdom of Denmark, France, the Kingdom of the Netherlands, Portugal, and Spain.
In which case, for example, let's hope no sufficiently notable financial institution has a significant office in the Canary Islands or Melilla, so that the city-to-tzdb-ID mapping algorithm doesn't have to cope with Las Palmas de Gran Canaria Time, Santa Cruz de Tenerife Time, or Melilla Time - *maybe* the existence of Africa/Ceuta will make the algorithm work for "Ceuta Time" if it also tries {Continent}/{tzdb-ID-ification of city name}. (A quick search of sec.gov doesn't find any of those, so *maybe* an "XXX Time" -> country -> tzdb region mapping will work for Spain.)
On 10/4/21 16:00, Stephen Colebourne via tz wrote:
there is, I believe, a critical difference here between the US view of timezones and the European one. In the US, timezones do not follow the boundary of the whole country, nor even States. In Europe, by contrast, timezone boundaries are very much driven by country boundaries.
Certainly country boundaries affect timezone boundaries. But even in Europe it's not entirely true that the ISO 3166 country boundaries determine timezone boundaries. For example, Busingen is in Germany (DE) but observes Swiss (CH) time. There are several other examples of timezones crossing national boundaries in Europe, including Croatia, Luxembourg, and Slovenia. Conversely, Spain has multiple Zones with borders that are not country boundaries.
one thing I don't want to see in tzdb is an attempt to force the US model onto Europe.
Yes, and we discovered this long ago with tzdb. We started out with a US-centric model with timezones like "US/Pacific", then discovered that the US-centric model didn't work as well elsewhere.
On Oct 3, 2021, at 2:47 AM, Stephen Colebourne via tz <tz@iana.org> wrote:
On Sun, 3 Oct 2021 at 00:29, Guy Harris via tz <tz@iana.org> wrote:
So it's not as if there was an *inherent* need for Europe/Oslo as a tzdb ID to make these applications work; if there had never been a Europe/Oslo ID, and Norwegian locations had used Europe/Berlin, that would have been sufficient to, for example, record future events taking place in Norway.
Here is a not untypical legal contract: https://www.sec.gov/Archives/edgar/data/1308106/000119312512368196/d401408de... "NIBOR means that the rate for an interest period will be the rate for deposits in Norwegian Kroner for a period as defined under Bond Reference Rate which appears on the Reuters Screen NIBR Page as of 12.00 noon, Oslo time"
https://www.sec.gov/Archives/edgar/data/1524472/000152447218000006/xyl123120... "Base Rate means for any Loan or Unpaid Sum in EUR the EURIBOR Screen Rate as at 11.00 a.m. (Frankfurt time), on the relevant Rate Fixing Day for such currency for a period equal in length to the Interest Period of that Loan or Unpaid Sum and, if any such rate is below zero, the Base Rate will be deemed to be zero."

And:

#
# This is a directory into which the 2021a lz-compressed tarball was unpacked
#
# We look for a tzid corresponding to Oslo in a source file
#
$ egrep '/Oslo' * | egrep -v 'backzone|to2050.tzs|tzdata.zi|zone.tab|zone1970.tab'
backward:Link Europe/Oslo Atlantic/Jan_Mayen
europe:Zone Europe/Oslo 0:43:00 - LMT 1895 Jan 1
europe:# All these events predate our cutoff date of 1970, so use Europe/Oslo
europe:Link Europe/Oslo Arctic/Longyearbyen
#
# There is one.
#
# Now we look for a tzid corresponding to Frankfurt in a source file
#
$ egrep '/Frankfurt' * | egrep -v 'backzone|to2050.tzs|tzdata.zi|zone.tab|zone1970.tab'
$
#
# There isn't one.
#
The 100% correct way to represent this in a business application is to define an ID scheme that includes an ID for "Oslo time" and then keep a constantly updated mapping from that ID to the actual zone used in Oslo.
In practice, most business applications/companies would find that very expensive and cumbersome to maintain by themselves. Instead, most business applications simply use "Europe/Oslo" to represent the concept. Effectively, this outsources both the ID and the upkeep of the mapping to tzdb.
So what do those applications do for contracts that refer to "Frankfurt time"?
On 2021-10-03 09:47, Stephen Colebourne via tz wrote:
Instead, most business applications simply use "Europe/Oslo" to represent the concept. Effectively, this outsources both the ID and the upkeep of the mapping to tzdb.
Thanks for the clear presentation of the issues for persistent data storage! From the viewpoint of SQL database system providers, one cannot "merge" Europe/Oslo with Europe/Berlin because tzdb does not (and cannot) guarantee that they remain equal in the future. Not even Africa/Asmera can be equated with Africa/Asmara when tzdb does not guarantee that Africa/Asmera will always remain linked to Africa/Asmara in the future as long as it exists. In fact, among the timezone names that were merged since 2013, two have been "unmerged" in the meantime (São_Tomé and Juba). Database systems are used to store consistent sets of data over decades; hence they can only exploit invariants of tzdb that are guaranteed to hold across many releases. The local time scale offsets are modeled in SQL as "nondeterministic" data; these are expected to vary over time and cannot be used in constraints and thus cannot affect data consistency. Most of the guarantees made by tzdb seem to be subject to change in the next release, not just the "merges". For instance, the dst bit used to indicate "summer time" for about 20 years, but since 2018a, the dst bit could also indicate winter time (and it is *hard* to check when it indicates which). If a database system relied on the earlier definition, they had a serious problem convincing Irish users that "summer time" is used in winter. So it appears that the tzdb data description does not make any long-term guarantees except for standardized names for locations, and one or two local time scales for each location. (And that certainly is a praiseworthy feat and deserves our utmost respect!) The second local time scale (the one in /backzone), if any, is probably the better guess of the local time scale at the location indicated by its name, but that is not guaranteed either. Actually, it is not even true that the two time scales always agree after 1970, although most people seem to believe otherwise. 
So I think that database system providers can exploit tzdb only by using (practically all) the tzdb timezone names, without any "merging", and selecting for each of them one of the two local time scale definitions, as they see fit. In my opinion, this "cherry picking" is necessary and becomes more difficult when more time scale definitions are moved to backzone, and unnecessarily so. Michael Deckers.
On 10/3/21 1:58 PM, Michael H Deckers via tz wrote:
From the viewpoint of SQL database system providers, one cannot "merge" Europe/Oslo with Europe/Berlin because tzdb does not (and cannot) guarantee that they remain equal in the future.
That's OK and there's longstanding precedent for that sort of thing, as database users should not "merge" Europe/Vatican to Europe/Rome for the same reason, even though the former is a Link and the latter a Zone in tzdb. This issue predates and is largely independent of whether out-of-scope pre-1970 Zones should be merged within tzdb itself.
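The "do not merge" advice can be sketched in Python as follows: the database keeps the ID exactly as the user supplied it and follows Link entries only at lookup time. The table `LINKS_RELEASE_A` reflects the Europe/Vatican Link mentioned above; `LINKS_RELEASE_B` is a purely hypothetical later release in which that Link has become a Zone of its own, and `resolve` is an illustrative helper rather than part of tzcode.

```python
# Link tables keyed by link name, valued by target.
LINKS_RELEASE_A = {"Europe/Vatican": "Europe/Rome"}
LINKS_RELEASE_B = {}  # hypothetical release where Europe/Vatican is its own Zone


def resolve(tzid: str, links: dict) -> str:
    """Follow Link entries at lookup time; the stored ID itself is never rewritten."""
    seen = set()
    while tzid in links and tzid not in seen:  # guard against accidental cycles
        seen.add(tzid)
        tzid = links[tzid]
    return tzid


stored_id = "Europe/Vatican"  # what the database row keeps

zone_under_a = resolve(stored_id, LINKS_RELEASE_A)
zone_under_b = resolve(stored_id, LINKS_RELEASE_B)
```

Had the row been rewritten to Europe/Rome under release A, nothing in the stored data would let it follow Europe/Vatican again under release B.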
For instance, the dst bit used to indicate "summer time" for about 20 years, but since 2018a, the dst bit could also indicate winter time
If this is talking about so-called "negative DST" where the UT offset with is_dst is less than the UT offset without, then "winter time" is in general a misnomer as the phenomenon can occur in summer (e.g., Morocco) as well as in winter (e.g., Ireland). And negative DST has been required by POSIX and supported by tzcode for decades; the only new thing circa 2018 is that this longstanding feature began to be used in tzdata.
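As a small arithmetic illustration of "negative DST", here is a Python sketch. It assumes the Irish case is modelled as standard time one hour ahead of UT with a one-hour negative adjustment in winter; the `utc_offset` helper and the variable names are illustrative only.

```python
from datetime import timedelta


def utc_offset(standard: timedelta, save: timedelta) -> timedelta:
    """Total UT offset = standard offset + the adjustment applied while is_dst is set."""
    return standard + save


# Assumed modelling for Ireland: standard time is UT+1, and winter applies
# a negative one-hour adjustment with the dst flag set.
ireland_standard = timedelta(hours=1)
ireland_winter_save = timedelta(hours=-1)

ireland_winter = utc_offset(ireland_standard, ireland_winter_save)

# "Negative DST": the offset with is_dst set is less than the offset without.
is_negative_dst = ireland_winter < ireland_standard
```

Under this assumption the dst flag marks the period with the smaller UT offset, which is why it cannot simply be read as "summer time".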
it is not even true that the two time scales always agree after 1970, although most people seem to believe otherwise.
If there are counterexamples please suggest fixes, preferably in 'git format-patch' form. Although 'backzone' is documented to be less reliable, we're certainly open to fixes from the community.
On 2021-10-04 01:45, Paul Eggert commented on my remark:
it is not even true that the two time scales always agree after 1970, although most people seem to believe otherwise.
If there are counterexamples please suggest fixes, preferably in 'git format-patch' form. Although 'backzone' is documented to be less reliable, we're certainly open to fixes from the community.
Before proposing premature fixes, I think it is appropriate to first agree and then specify what the backzone file should contain. The backzone file says:

# This file contains data outside the normal scope of the tz database,
# in that its zones do not differ from normal tz zones after 1970.

If this were intended to mean that

# The timezones in this file agree after 1970 with tz zones
# in other files.

then America/Ensenada must not occur in backzone. But the text does not say that, and America/Ensenada occurs in backzone; so we do not know what, if anything at all, is wrong here.

The backzone file also says:

# ... Many of
# the zones were formerly in other source files, but were removed or
# replaced by links as their data entries were questionable and/or they
# differed from other zones only in pre-1970 timestamps.

What is the purpose of keeping "questionable" versions of a timezone when we have a better version? If we know an error, it should be corrected -- but I do not see any purpose in keeping the incorrect version in backzone. And if we do not know where the errors are, then the timezone is in good company: except for a few carefully researched cases, we are never sure whether we have listed all transitions, or whether all transitions are correct. No need to keep it in backzone either. So the stuff about "questionable" data entries should disappear from backzone. Michael Deckers.
On 10/2/21 3:02 PM, Stephen Colebourne via tz wrote:
An abstract region represents a part of the Earth that has had the same clock rules since 1970. At some point prior to 1970 different locations within the abstract region had different clock rules, even if that was just LMT.
Thanks, I understand that notion better now.
3) "An event will happen at this time in the future" If an end-user wants to store an event in the future, eg. a one hour meeting next month, the correct approach is to store the date, time and zone ID
That doesn't suffice, as it doesn't work if the Zone splits in the meantime.
Of course that is true. But this hasn't been a problem in practice for business applications AFAICT.
If we could ignore problems as significant as Zone splits, we would have a lot of leeway. As you suggested earlier, we could even discard all pre-1970 data, as that would be less of a practical problem than Zone splits are. However, I doubt whether Zone splits are that easy to ignore. Granted, Zone splits don't happen every day - I think the most recent one was in 2018 when Asia/Qyzylorda hived off Asia/Qostanay. However, I expect that the roughly 800,000 people in the newly-created Zone had generated a fair number of affected timestamps in their business planning and scheduling.
by providing separate IDs like Europe/Oslo and Atlantic/Reykjavik for many years, TZDB has provided the basic tools necessary for business applications to function in the way described above.
Sure, and those IDs are still there and still supported. I think we agree that IDs like Europe/Oslo should continue to exist.
IMO, it is important to ensure that IDs exist for such shared zones. But luckily enough we already have "US/Mountain" and "CET", so the ID part is already sorted (in the US and EU).
Unfortunately even that is not sorted. There's a good chance that neither "US/Mountain" nor "CET" will work the way users might expect, if the US and Europe change time zone rules in ways that are seriously being discussed by political leaders. For example, suppose a few US/Mountain locations (just Idaho, say) decide to continue with current US/Mountain rules while most US/Mountain locations switch to CST (something that tzdb currently has no Zone for). In that case, if we had only Zones like "US/Mountain" we'd be asking most users to change their TZ settings, which'd be worse than the current system where "America/Denver" should continue to work for most users. Originally, tzdb started out with zones like "US/Mountain" and "CET". However, when I added data for the rest of the world, I discovered that the old approach didn't generalize well to places less centralized than the US and the EU. It was a stretch even to apply that old scheme to Australia, where different states in the same time zone sometimes (but not always) use different DST rules. I didn't offhand see how to apply the old approach to Russia (which uses a different system mostly unknown in the West, and which was chaotic in the 1990s), much less Africa, the Pacific, etc. I'm not saying it couldn't have been done (at least in theory) given enough time and research into all the legal definitions of time, but the time wasn't available and the research has never been done.
While I can understand the conceptual desire to have TZDB only provide abstract regions, it seems completely impractical to do so at this point
I agree, as mentioned above. For compatibility reasons we should provide names like Europe/Oslo into the indefinite future. If this isn't already clear in the guidelines it'd be good to make it clear.
Stephen Colebourne via tz said:
3) "An event will happen at this time in the future" If an end-user wants to store an event in the future, eg. a one hour meeting next month, the correct approach is to store the date, time and zone ID
That doesn't suffice, as it doesn't work if the Zone splits in the meantime.
Of course that is true. But this hasn't been a problem in practice for business applications AFAICT.
How can you tell? If a business application is in use in a region that has a split, I would expect it to affect the people in at least one half of the split, if not both.
The point above is documenting how things *are*. Business applications and calendaring systems really do store the local date/time and zone ID from TZDB for future events. This works precisely because there are enough IDs provided by TZDB to make this work.
Yes, and more. And it *does not* work if your data includes a zone that gets split.
If TZDB had only ever offered an ID for each abstract region, then *some other* system/project would have needed to be invented to define a set of IDs suitable for business applications.
No, it wouldn't. Assuming that we're sticking with post-1970 data, then a business application would work just fine using "Europe/Berlin" for data relating to Berlin, to Frankfurt, to Oslo, and to Stavanger.
But TZDB doesn't just provide IDs for abstract regions, it has always provided more than that. And by providing separate IDs like Europe/Oslo and Atlantic/Reykjavik for many years, TZDB has provided the basic tools necessary for business applications to function in the way described above.
Completely false. Please explain how the presence of the string "Europe/Oslo", which is interchangeable with the string "Europe/Berlin", has made any difference.
it is too late to get rid of IDs like Europe/Oslo and Atlantic/Reykjavik.
For backwards compatibility, yes, I agree and I think everyone else does. But saying that "Europe/Berlin" and "Europe/Oslo" are equivalent works just fine, just as saying that "Africa/Accra" and "Africa/Abidjan" are equivalent does. The business application only cares about the abstract regions, not the specific strings. (In the guts, anyway; user interface is another question.)

--
Clive D.W. Feather          | If you lie to the compiler,
Email: clive@davros.org     | it will get its revenge.
Web: http://www.davros.org  |  - Henry Spencer
Mobile: +44 7973 377646
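The equivalence argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LINKS` table is a hypothetical alias map in the spirit of tzdb "Link" lines, populated with the pairs named in the message rather than copied from real tzdb data, and `canonical` and `same_rules` are made-up helper names.

```python
# Hypothetical alias table: alias ID -> canonical Zone ID.
# Entries mirror the examples in the message above and are
# assumptions, not a copy of real tzdb data.
LINKS = {
    "Europe/Oslo": "Europe/Berlin",
    "Africa/Accra": "Africa/Abidjan",
}


def canonical(zone_id):
    """Follow aliases until reaching an ID that is not itself an alias."""
    seen = set()
    while zone_id in LINKS:
        if zone_id in seen:
            raise ValueError(f"link cycle at {zone_id}")
        seen.add(zone_id)
        zone_id = LINKS[zone_id]
    return zone_id


def same_rules(a, b):
    """Two IDs share rules when they resolve to the same canonical Zone."""
    return canonical(a) == canonical(b)
```

With such a table, "in the guts" the application compares canonical Zones, while the user interface can keep showing whichever ID the user originally picked.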
participants (10)
- Clive D.W. Feather
- Eliot Lear
- Guy Harris
- Jon Skeet
- Jürgen Appel
- Matthew Donadio
- Michael H Deckers
- Paul Eggert
- Russ Allbery
- Stephen Colebourne