[tz] TZDB use cases

Sept. 29, 2021

In this thread I want to try to capture the *requirements* for the
project from the perspective of a downstream user. With these written
down, we can then analyse whether the project needs a change of
rules.

NOTE: The discussion below is not a proposal, nor does it seek to
explore which locations get pre-1970 data. Let's keep that for another
thread.

--------
1) "What is the time now?"
To answer this question you need a dataset that records the current
timezone rules for each region in the world. Over time that data will
naturally build up to form a historic record.

To aid the project, a 1970 rule has been adopted for data management.
In some parts of the world, clocks have agreed since 1970 across large
parts of a continent. In other parts of the world, different parts of
the same country have had different rules since 1970.

Viewing the data through a post-1970 lens splits the planet into
regions where clocks have agreed since 1970. Some of these regions are
quite abstract, for example the region including Iceland, Ivory Coast,
and St Helena.

An abstract region would need to be split if one of the parts of the
region changed its timezone rules. If a user had previously selected
to follow one of these abstract regions, they might have to change
their timezone settings. For example, if a user in Iceland selected
their abstract region by the recommended ID based on the largest city
they would get `Africa/Abidjan`, but if the Ivory Coast changed their
rules, users in Iceland would have to change their selected timezone.
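
To make the idea of an abstract region concrete, here is a minimal
Python sketch, assuming the standard `zoneinfo` module and an installed
tz data set. The helper name `agree_since_1970` is hypothetical, and
the check only samples one instant per month, so it is an
approximation rather than a proof that two IDs share rules.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def agree_since_1970(zone_a: str, zone_b: str, last_year: int = 2021) -> bool:
    """Return True if both zones show the same UTC offset at every sampled instant.

    Samples noon UTC on the 15th of each month from 1970 to last_year.
    This is a coarse check (an assumption of this sketch), not a full
    comparison of the underlying rules.
    """
    a, b = ZoneInfo(zone_a), ZoneInfo(zone_b)
    for year in range(1970, last_year + 1):
        for month in range(1, 13):
            instant = datetime(year, month, 15, 12, tzinfo=timezone.utc)
            if instant.astimezone(a).utcoffset() != instant.astimezone(b).utcoffset():
                return False
    return True


if __name__ == "__main__":
    # Iceland and Ivory Coast fall in the same abstract region.
    print(agree_since_1970("Atlantic/Reykjavik", "Africa/Abidjan"))
    # London and Paris do not.
    print(agree_since_1970("Europe/London", "Europe/Paris"))
```

If the Ivory Coast changed its rules, the first comparison would start
to fail for new dates, which is exactly the point at which a user in
Iceland who had selected `Africa/Abidjan` would get the wrong time.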

The minimum set of IDs is one ID for each abstract region. The current
ID scheme selects the largest city in the abstract region.

My interpretation:
* An abstract region should theoretically have an ID separate to the
IDs of the cities/countries in the region
* Tying the ID of the abstract region to the largest city in the
region is prone to failure if that city changes timezone rules but the
rest of the abstract region stays the same.
* An abstract region ID would naturally have no pre-1970 data, as it
doesn't define data in a single city/location.
* The alternative is to encourage users to use their local city, which
means that IDs like Europe/Amsterdam and Europe/Oslo cannot be
deprecated

--------
2) "What was the time at some instant in the past?"
The project can answer this question pretty reliably for dates from
1970 onwards because of the data set in #1.

The project can never realistically answer the question reliably or
accurately before 1970 for *all* locations. For example, timezone
rules were entirely local in the US for about 20 years, meaning that
hundreds of entries would be needed to correctly capture the data. In
addition, much of the historic data globally is unknown and not viable
to research, even if someone wanted to do so.

It is possible to research and record historic timezone rules for
*some* locations. Europe/London being one example.
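
As an illustration of relying on researched pre-1970 data, the sketch
below looks up the offset for a historic local time. It assumes
Python's standard `zoneinfo` module and an installed tz data set that
carries the full Europe/London history; `utc_offset_at` is a
hypothetical helper name.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


def utc_offset_at(zone_id: str, local_time: datetime) -> timedelta:
    """Return the UTC offset the installed tz data records for a naive local time."""
    return local_time.replace(tzinfo=ZoneInfo(zone_id)).utcoffset()


if __name__ == "__main__":
    # January 1969 falls inside the period when the UK stayed one hour
    # ahead of GMT all year round, so the recorded offset is not zero.
    print(utc_offset_at("Europe/London", datetime(1969, 1, 15, 12, 0)))
```

The answer is only as good as the data recorded against the ID, which
is why the same lookup for a location with unresearched history cannot
be trusted to the same degree.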

The project generally operates on the basis that if an ID exists,
historical data can be recorded against it, although which file that
data lives in has varied over time. IDs are generally not created
solely to document historic data.

There is no minimum set of IDs for this use case - the set of possible
IDs is unbounded.

My interpretation:
* The majority of pre-1970 data isn't much use
* Some pre-1970 data is high quality, and almost certainly relied upon
(eg Europe/London)
* Without a massive increase in scope the best that can be done
pre-1970 is to record data against IDs created for some other reason.

--------
3) "An event will happen at this time in the future"
If an end-user wants to store an event in the future, e.g. a one hour
meeting next month, the correct approach is to store the date, time
and zone ID, not the offset (which is not yet known for certain). Even
if the local timezone authority changes the tz rules between now and
the event, the stored local time of the event is still correct.
https://stackoverflow.com/questions/19166995/java-calendar-date-and-time-man...
For this approach to work, the ID that is stored must be aligned with
the timezone authority.

Business contracts are frequently written to describe future events.
For example, payment must be made by 2pm Paris time on a given date.
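
A minimal sketch of this storage approach, assuming Python's standard
`zoneinfo` module and an installed tz data set. The `FutureEvent`
class and its field names are hypothetical and only illustrate the
idea of storing local date, time and zone ID rather than an offset.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


@dataclass(frozen=True)
class FutureEvent:
    """An event stored as wall-clock time plus zone ID, with no offset."""

    local_time: datetime  # naive date and time, as agreed by the parties
    zone_id: str          # ID aligned with the relevant timezone authority

    def to_utc(self) -> datetime:
        # The offset is looked up only when needed, using whatever rules
        # the installed tz data holds at that moment.
        aware = self.local_time.replace(tzinfo=ZoneInfo(self.zone_id))
        return aware.astimezone(timezone.utc)


if __name__ == "__main__":
    # "Payment must be made by 2pm Paris time" on a given date.
    deadline = FutureEvent(datetime(2021, 7, 1, 14, 0), "Europe/Paris")
    print(deadline.to_utc().isoformat())
```

Because only `local_time` and `zone_id` are persisted, a rule change
published before the event alters the computed instant but not the
stored 2pm deadline.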

The minimum set of IDs for this use case is one ID per timezone
authority. This is typically the country, but not always: in some
places the decision is devolved to states, nations etc.

My interpretation:
* Using TZDB IDs for future events/contracts is very widespread
* In most parts of the world, this requires one ID per country, in
some parts of the world additional IDs would ideally be needed such as
per state in the US
* One ID for each ISO country (as was the guideline in the past) is a
reasonable minimum expectation of end users
* While one might argue IDs for future events are not TZDB's problem, I
believe the reality is that they are, because usage in this way is
already very widespread

--------
4) "An event will happen at this time in the future relative to a
shared/common definition"

Some future events are declared relative to a definition that is
supra-national or federal. For example, US Mountain, US Pacific, EU
Central etc.

A TV show might be defined to air at 8pm Mountain Time in one month's
time. It will air at that time regardless of whether any of the states
that are currently on Mountain Time change to Pacific Time. For
example, even if Denver (and Colorado) moved to Pacific Time or
stopped following DST, Mountain Time would still exist and the TV show
would still air at 8pm Mountain.

The minimum set of IDs for this use case is one ID per shared/common
definition. These IDs are not full aliases for a country or city ID,
although they may currently share data. To support this use case,
these IDs cannot be deprecated.

My interpretation:
* Some shared zone IDs are distinct and separate enough to warrant a full ID
* US Mountain time is not the same as Denver time when considering the
future; the IDs could diverge
* In TZDB terms, the Link from US/Mountain to America/Denver is fine
to define the data, but US/Mountain should not be considered
deprecated as it has a real and distinct meaning separate from
America/Denver
* It would be useful to agree what the criteria for including these
shared/federal rules are.
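
The distinction between "shares data with" and "is the same as" can be
sketched as follows. This is an illustrative Python example, not how
any particular downstream library works: the embedded text is written
in the style of tzdb `Link TARGET LINK-NAME` lines, and `parse_links`
and `data_source` are hypothetical helper names.

```python
# Sample lines in the style of the tzdb source format.
LINK_LINES = """\
# Link  TARGET               LINK-NAME
Link    America/Denver       US/Mountain
Link    America/Los_Angeles  US/Pacific
"""


def parse_links(text: str) -> dict[str, str]:
    """Map each link name to the ID whose data it currently shares."""
    links = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) == 3 and fields[0] == "Link":
            _, target, link_name = fields
            links[link_name] = target
    return links


def data_source(zone_id: str, links: dict[str, str]) -> str:
    """Return the ID that supplies the rules, leaving the caller's ID untouched."""
    return links.get(zone_id, zone_id)


if __name__ == "__main__":
    links = parse_links(LINK_LINES)
    stored_id = "US/Mountain"                    # what the TV schedule stores
    print(stored_id, "->", data_source(stored_id, links))
```

The stored ID stays `US/Mountain`; only the lookup follows the Link.
If Denver ever left Mountain Time, the Link could be repointed or
replaced with a full Zone without touching any stored events.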

--------
5) "Data should be backwards compatible"
The project has various associated downstream projects. Many of these
have an expectation of backwards compatibility. This compatibility
extends to the binary format, the source format, and the data
contained within. This use case wants to see the project just continue
with the data it has, enhancing it over time.

The project has a pretty good record in terms of not removing IDs. For
example, IDs exist for "Portugal" and "Poland", but not "Italy" or
"Spain" simply because they were defined once upon a time.

My understanding is that IDs in the `backward` file represent
deprecated IDs that should not be used going forward. Yet, downstream
projects focussed on backwards compatibility will generally use the
`backward` file in full.

The minimum set of IDs for this use case is the set of IDs that
existed in the previous release.
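
A downstream project could check this requirement mechanically. The
sketch below is a minimal Python example; the two ID sets are
hypothetical stand-ins for the IDs extracted from two releases, and
`removed_ids` is a hypothetical helper name.

```python
def removed_ids(previous, current):
    """Return the IDs present in the previous release but missing from the current one."""
    return sorted(set(previous) - set(current))


# Hypothetical, heavily truncated ID sets used only for illustration.
PREVIOUS_IDS = {"Europe/London", "Portugal", "Poland", "US/Mountain"}
CURRENT_IDS = PREVIOUS_IDS | {"America/Denver"}

if __name__ == "__main__":
    missing = removed_ids(PREVIOUS_IDS, CURRENT_IDS)
    # An empty list means the new release is a superset of the old one.
    print(missing)
```

Note that a project using the `backward` file in full would feed the
deprecated IDs into both sets, so they count towards compatibility
like any other ID.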

My interpretation:
* Many users care about compatibility
* The current IDs are not going away
* Inconsistency has existed in TZDB for many years, see "Portugal" vs
"Spain" - no one wants to fix that
* It is hard at present to identify IDs that are deprecated because of
proper aliasing, such as spelling changes, vs those where the ID is a
real location but its TZDB status has been reduced.
* Change may be possible with a long notice period

---------

Please note that the above is not a proposal, but an exploration of
the design space. If you have any use cases I've missed feel free to
write them up.

thanks
Stephen

Stephen Colebourne