models for timezones
It gets mentioned here often enough, but there's another significant aspect of the tz database which I'm not sure is always sufficiently appreciated, and that's how significantly it extends on the old V7/SysV/Posix model of time zones.

In that old model, a time zone has one offset from Greenwich/UTC, and maybe a second offset on top of that to realize "daylight saving time", and one set of rules for deciding when to transition to and from DST. (Once upon a time, that set of rules might have been limited to the pre-1987 rules for the United States, and it might have been hard-compiled into C code. With the advent of the Posix TZ environment variable definition it's at least a little bit more flexible.)

But of course (and since the beginning) the tz database has expanded on that model in at least two different ways:

1. It's not limited to one "standard" and perhaps one "daylight" time per year; there can be almost arbitrarily many offsets and arbitrarily many transition dates between them.

2. Everything -- the "base" offset (not that this is even a meaningful concept any more), the alternate offsets, and the transition dates between them -- can vary over time.

And of course the reasons for doing this were that both changes were necessary in order to accommodate the real world; the old SysV/Posix model simply couldn't cope.

But this means that there's a significantly imperfect match between the tz database and any programming language or time-handling library that has any of these notions:

* A timezone is primarily defined by its base offset from UTC.

* It's an additional property of a timezone whether it observes DST or not.

* If DST applies to a timezone, it's defined by an additional offset and a single set of rules defining when it applies.

* DST always involves a positive offset from the base offset.

You *might* be able to get away with a tricky and/or imperfect mapping between the tz database and a scheme that has some of those notions -- for example, the tz reference code successfully implements the C/Posix API, isdst flag and all -- but it's bound to have warts or imperfections.

Everything I've said here is, I imagine, familiar and unsurprising to any regular reader of this list, but it probably bears a bit more discussion in the Theory file. I was going to suggest adding these two new bullets under "Extensions to POSIX":

* The TZ database allows for considerably more variation in time per year beyond the familiar "standard" and "daylight savings" times. A time zone can have any number of variations per year, and they may have arbitrary designations, not limited to "standard" and "daylight".

* The parameters defining a time zone can vary over time, reflecting the fact that this happens in the real world all the time, as legislatures continually redefine them. Whether and when a zone observes DST (or other variants), and even the zone's notional "base offset" from UTC, are variable. (Therefore it doesn't even really make sense to talk about a zone's "base offset", since it's not necessarily a single number.)

But that's in the "Time and date functions" section, and easy to overlook. So maybe these words belong as a paragraph higher up, perhaps even in the initial "Scope of the tz database" section:

Because the database's scope encompasses real-world timezones which have wide-ranging complexity and are always evolving, the model for describing time zones allows for considerably more variation in time per year beyond the familiar "standard" and "daylight savings" times. A time zone can have any number of variations per year, and they may have arbitrary designations, not limited to "standard" and "daylight". Furthermore, the parameters defining a time zone can vary over time. Whether and when a zone observes DST (or other variants), and even the zone's notional "base offset" from UTC, are variable. (Therefore it doesn't even really make sense to talk about a zone's "base offset", since it is not necessarily a single number.)
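The contrast between the two models can be made concrete with a small sketch. This is illustrative only, not how the tz reference code is written: the class names, the zone designations (XST, XDT, XDDT, XNT) and every date and offset below are hypothetical, invented for the example.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Tuple

UTC = timezone.utc
HOUR = timedelta(hours=1)


@dataclass
class PosixStyleZone:
    """Old model: one base offset, one optional DST offset, one rule for all years."""
    base_offset: timedelta
    dst_extra: timedelta
    in_dst: Callable[[datetime], bool]  # the single, time-invariant rule

    def lookup(self, utc: datetime) -> timedelta:
        extra = self.dst_extra if self.in_dst(utc) else timedelta(0)
        return self.base_offset + extra


@dataclass
class TransitionTableZone:
    """tz-style model: a sorted table of (UTC instant, total offset, designation)."""
    initial: Tuple[timedelta, str]
    transitions: List[Tuple[datetime, timedelta, str]]

    def lookup(self, utc: datetime) -> Tuple[timedelta, str]:
        # Find the last transition at or before the given instant.
        i = bisect_right([t[0] for t in self.transitions], utc)
        if i == 0:
            return self.initial
        _, offset, name = self.transitions[i - 1]
        return offset, name


# Hypothetical old-style zone: UTC+2, one extra hour from April through October.
old_style = PosixStyleZone(2 * HOUR, HOUR, lambda utc: 4 <= utc.month <= 10)

# Hypothetical tz-style zone: three different offsets within 2001, and a
# redefinition of the "standard" offset in 2005. No single base offset exists.
new_style = TransitionTableZone(
    initial=(2 * HOUR, "XST"),
    transitions=[
        (datetime(2001, 4, 1, tzinfo=UTC), 3 * HOUR, "XDT"),
        (datetime(2001, 6, 1, tzinfo=UTC), 4 * HOUR, "XDDT"),
        (datetime(2001, 9, 1, tzinfo=UTC), 3 * HOUR, "XDT"),
        (datetime(2001, 11, 1, tzinfo=UTC), 2 * HOUR, "XST"),
        (datetime(2005, 1, 1, tzinfo=UTC), timedelta(hours=2, minutes=30), "XNT"),
    ],
)
```

The second structure has no field that could be called a base offset or an isdst flag; anything of that kind would have to be derived from the table, which is where the warts in such mappings tend to come from.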
On Feb 11, 2018, at 7:45 PM, Steve Summit <scs@eskimo.com> wrote:
In that old model, a time zone has one offset from Greenwich/UTC, and maybe a second offset on top of that to realize "daylight saving time", and one set of rules for deciding when to transition to and from DST. (Once upon a time, that set of rules might have been limited to the pre-1987 rules for the United States, and it might have been hard-compiled into C code. With the advent of the Posix TZ environment variable definition it's at least a little bit more flexible.)
Which is why I prefer calling the things identified by tzdb identifiers "tzdb regions" rather than "time zones" - a tzdb region may, over its history, be in more than one of what people (at least in the US and, I suspect, Canada) think of as "time zones", and there may be more than one tzdb region in a given "time zone".

And theory.html starts out saying

To represent this data, the world is partitioned into regions whose clocks all agree about timestamps that occur after the somewhat-arbitrary cutoff point of the POSIX Epoch (1970-01-01 00:00:00 UTC).

although it later refers to those regions as "time zones".
Guy Harris wrote:
On Feb 11, 2018, at 7:45 PM, Steve Summit <scs@eskimo.com> wrote:
In that old model, a time zone has one offset from Greenwich/UTC...
Which is why I prefer calling the things identified by tzdb identifiers "tzdb regions" rather than "time zones" - a tzdb region may, over its history, be in more than one of [...] "time zones", and there may be more than one tzdb region in a given "time zone".
Well said. And even though I was just preaching about "things not always sufficiently appreciated", I'm not sure I've sufficiently appreciated this point. I live in America/New_York, and it's easy to imagine this is just sort of a longhand for EST5EDT, but of course it's not. (And if I lived in, say, northwestern Indiana, I would be less inclined to make this mistake. Perhaps living in northwestern Indiana for a few years should be a prerequisite for anyone aspiring to profess any real expertise on time zone issues.)
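One way to see that America/New_York is not longhand for EST5EDT is to apply a single fixed rule to a year when the real rules were different. The sketch below assumes the POSIX-style string EST5EDT,M3.2.0,M11.1.0 (the current US rule, chosen here purely for illustration) and compares it with whatever tz data Python's zoneinfo module can find. The helper names are hypothetical, and the zoneinfo lookup returns None when no tz data is installed.

```python
from datetime import date, datetime, time, timedelta


def nth_sunday(year: int, month: int, n: int) -> date:
    """Date of the n-th Sunday of a month (POSIX 'Mm.n.0' form, n in 1..4)."""
    first = date(year, month, 1)
    days_to_sunday = (6 - first.weekday()) % 7  # Monday == 0, Sunday == 6
    return first + timedelta(days=days_to_sunday + 7 * (n - 1))


def posix_est5edt_offset_hours(local: datetime) -> int:
    """UTC offset in hours under the fixed rule EST5EDT,M3.2.0,M11.1.0.

    `local` is a naive wall-clock time. Both transitions are taken at 02:00
    local time; the repeated hour in autumn is treated as daylight time.
    """
    start = datetime.combine(nth_sunday(local.year, 3, 2), time(2))
    end = datetime.combine(nth_sunday(local.year, 11, 1), time(2))
    return -4 if start <= local < end else -5


def tzdb_offset_hours(key: str, local: datetime):
    """UTC offset in hours according to installed tz data, or None if unavailable."""
    try:
        from zoneinfo import ZoneInfo  # Python 3.9+
        tz = ZoneInfo(key)
    except (ImportError, KeyError):  # ZoneInfoNotFoundError subclasses KeyError
        return None
    return local.replace(tzinfo=tz).utcoffset() / timedelta(hours=1)


# Mid-January 1974: the US was on year-round DST from 6 January that year.
when = datetime(1974, 1, 15, 12, 0)
fixed_rule = posix_est5edt_offset_hours(when)            # the fixed rule says standard time
from_tzdb = tzdb_offset_hours("America/New_York", when)  # tz data records daylight time
```

With tz data installed, the two answers differ by an hour for this date, because the tz entry carries the 1974 history and the fixed rule cannot.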
On 02/11/2018 08:18 PM, Guy Harris wrote:
And theory.html starts out saying
To represent this data, the world is partitioned into regions whose clocks all agree about timestamps that occur after the somewhat-arbitrary cutoff point of the POSIX Epoch (1970-01-01 00:00:00 UTC).
although it later refers to those regions as "time zones".
Thanks for catching that, and thanks, Steve, for proposing improved wording about tzdb's extensions to the POSIX model. Proposed patch attached; it uses "tz region" as being a bit shorter than "tzdb region".
participants (3): Guy Harris, Paul Eggert, scs@eskimo.com