I get the impression that this debate is caused by the existence of 2 different schools of thought:

* Descriptive: Paul wants to describe the timezones of the world without regard to how those time zones were created, and merge them into the smallest set that can generate the timekeeping rules. I can see that in this view, merging timezones from different countries into the same equivalence class is reasonable.

* Prescriptive: I think Stephen and others start with the fact that time zones are the creations of political organizations which write the regulations that define the timezones. Those governing bodies are predominantly organized by country in a hierarchical structure. In this view, it does *not* make sense to merge timezones from different countries. This view also implies that the TZ identifiers should reflect the political organizational structure of the world.

I want to suggest that it may be possible for these 2 views to coexist. We could create a new file, e.g. call it 'countryzone', which contains a set of Links organized in a hierarchical tree by country, pointing to the Core zones. Paul can maintain the Core files as before, and 'countryzone' would be maintained by a different set of people. Assuming the Core timezones form a complete set that covers all unique timezones in the world, all other ISO-country-based timezones can then be mapped to one of the Core timezones.
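A minimal sketch of how a 'countryzone' file could sit on top of Core. It assumes the file contains only ordinary Link lines in tzdb's existing "Link TARGET LINK-NAME" order. The file contents, the Canada/Toronto and Italy/Rome names, and the helpers `parse_links` and `resolve` are hypothetical illustrations, not part of any tzdb release. The loop at the end tests the "Core is complete" premise: any countryzone name that does not lead to a Core zone raises.

```python
from __future__ import annotations

# Hypothetical 'countryzone' contents: ordinary Link lines
# ("Link TARGET LINK-NAME") whose targets are Core zones.
COUNTRYZONE = """\
# Canada
Link America/Toronto   Canada/Toronto
Link America/Vancouver Canada/Vancouver
# Italy
Link Europe/Rome       Italy/Rome
"""

# Hypothetical subset of Core zone names (Zone records, not Links).
CORE_ZONES = {"America/Toronto", "America/Vancouver", "Europe/Rome"}


def parse_links(text: str) -> dict[str, str]:
    """Map each link name to its target, ignoring comments and blank lines."""
    links: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        keyword, target, name = line.split()
        if keyword != "Link":
            raise ValueError(f"countryzone should contain only Links: {raw!r}")
        links[name] = target
    return links


def resolve(name: str, links: dict[str, str], core: set[str]) -> str:
    """Follow Links until a Core zone is reached; fail on cycles or gaps."""
    seen: set[str] = set()
    while name not in core:
        if name in seen or name not in links:
            raise KeyError(f"{name!r} does not lead to a Core zone")
        seen.add(name)
        name = links[name]
    return name


links = parse_links(COUNTRYZONE)
for link_name in links:
    resolve(link_name, links, CORE_ZONES)  # raises if the Core set is incomplete
print(resolve("Canada/Toronto", links, CORE_ZONES))  # America/Toronto
```

Because countryzone holds only Links, its maintainers never touch rule data. Their only obligation is that every entry resolves into Core.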

For this to work, I think we need to clarify the semantics of the 'Link' records in the TZ database. As far as I can tell, there are at least 3 different meanings of the Link record:

1) Link Canonical Deprecated

* Deprecated is an old zone which should no longer be used

2) Link Canonical Alternate

* alternate spelling or alias, but not deprecated

3) Link Canonical Merged

* zones which were merged because they have the same rules by chance, but there is no semantic relationship to each other
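The three meanings above can be modelled as an explicit kind on each Link record. In this minimal sketch, the `LinkKind` and `Link` types, the method names, and the placeholder zone names are all hypothetical. The two methods encode only what the list above says: deprecated names deserve a warning, and merged names share rules by coincidence only. Any other behaviour would be an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class LinkKind(Enum):
    """The three meanings of a tzdb 'Link' record listed above."""
    OLD = "deprecated: an old name that should no longer be used"
    ALT = "alternate: an alias or alternate spelling, not deprecated"
    MERGED = "merged: same rules by chance, no semantic relationship"


@dataclass(frozen=True)
class Link:
    canonical: str  # the zone the Link points to
    name: str       # the name being linked
    kind: LinkKind

    def should_warn(self) -> bool:
        # Only deprecated names call for a warning to users.
        return self.kind is LinkKind.OLD

    def may_differ_before_1970(self) -> bool:
        # Merged names share rules only by coincidence, so their pre-1970
        # history (e.g. a separate Zone in 'backzone') may differ.
        return self.kind is LinkKind.MERGED


# Placeholder names, for illustration only.
examples = [
    Link("Area/Canonical", "Area/Old_Name", LinkKind.OLD),
    Link("Area/Canonical", "Area/Alt_Spelling", LinkKind.ALT),
    Link("Area/Canonical", "Area/Other_Country_City", LinkKind.MERGED),
]
for link in examples:
    print(link.name, link.should_warn(), link.may_differ_before_1970())
```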

I propose that we replace the 'Link' keyword with 3 new keywords that identify the precise meaning: LinkOld, LinkAlt, and LinkMerged. (My hope is that keeping the 'Link' prefix will make it easy to update existing TZDB parsers to preserve their previous behavior.) Slight aside: I learned that some 3rd party timezone libraries do not preserve round-trip zone Id for Links. In other words, (pseudo-code) `TimeZone(linkName).getName() != linkName`. I wonder if it is worth defining the expected behavior of each type of Links for downstream libraries.
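A minimal sketch of why keeping the 'Link' prefix helps, assuming the proposed keywords LinkOld, LinkAlt and LinkMerged (these do not exist in current tzdb syntax). A parser that keys on the "Link" prefix keeps its old behaviour. A stricter parser can report the kind. The `TimeZone` class is a hypothetical stand-in for a library zone object that round-trips the requested ID, which is the behaviour the aside asks about. For comparison, Python's standard `zoneinfo.ZoneInfo` exposes the key it was constructed with through its `key` attribute.

```python
from __future__ import annotations

from typing import Optional

# Proposed keywords from this message; NOT part of current tzdb syntax.
PROPOSED_KINDS = {"LinkOld": "old", "LinkAlt": "alt", "LinkMerged": "merged"}


def parse_link_prefix(line: str) -> Optional[tuple[str, str]]:
    """A parser that keys on the 'Link' prefix and ignores the kind."""
    fields = line.split()
    if fields and fields[0].startswith("Link"):
        return fields[1], fields[2]  # (target, link name)
    return None


def parse_link_kind(line: str) -> tuple[str, str, str]:
    """A stricter parser that also reports which kind of Link it saw."""
    keyword, target, name = line.split()
    if keyword == "Link":
        return target, name, "unspecified"  # legacy record, meaning unknown
    if keyword in PROPOSED_KINDS:
        return target, name, PROPOSED_KINDS[keyword]
    raise ValueError(f"not a Link record: {line!r}")


class TimeZone:
    """Hypothetical zone object that preserves the ID it was created with."""

    def __init__(self, zone_id: str, links: dict[str, str]) -> None:
        self._id = zone_id
        # Single hop for brevity: rule data comes from the Link target.
        self._target = links.get(zone_id, zone_id)

    def get_name(self) -> str:
        return self._id  # round-trips, even when zone_id is a Link

    def target_name(self) -> str:
        return self._target


target, name = parse_link_prefix("LinkMerged Area/Canonical Area/Other_Name")
tz = TimeZone(name, {name: target})
print(tz.get_name(), tz.target_name())  # Area/Other_Name Area/Canonical
```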

For the pre-1970 data, it is my understanding that the 'backzone' file contains Zone records which should replace ONLY the LinkMerged records found in the other files. I propose that all LinkMerged records be extracted into a separate file (let's call it 'mergedzone') so that there is a clear symmetry between 'backzone' and 'mergedzone', which allows them to be substituted for each other. The dependency diagram looks something like this:

countryzone
  |
Core (africa, asia, etc...)
  +-- backzone
  +-- mergedzone

Downstream libraries that want only post-1970 data can use: countryzone, Core, mergedzone

Downstream libraries that want to include pre-1970 data can use: countryzone, Core, backzone
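A minimal sketch of the backzone/mergedzone symmetry. It assumes 'mergedzone' holds exactly the LinkMerged records and 'backzone' holds a Zone for each of those same names. The records are illustrative only. They are modelled on the Reykjavik/Abidjan example discussed in this thread and are not copied from any tzdb release. `CORE` is an abbreviated stand-in for the Core files, and all helper names are hypothetical.

```python
from __future__ import annotations

# Abbreviated stand-in for the Core files ("africa, asia, etc...").
CORE = ["africa", "asia", "europe", "northamerica"]

# Illustrative records only; not copied from any tzdb release.
MERGEDZONE = {"Atlantic/Reykjavik": "Africa/Abidjan"}  # LinkMerged name -> target
BACKZONE = {"Atlantic/Reykjavik": "<full pre-1970 Zone history>"}  # Zone name -> data


def check_substitutable(merged: dict[str, str], back: dict[str, str]) -> None:
    """The two files must define exactly the same names to be swappable."""
    mismatch = set(merged) ^ set(back)
    if mismatch:
        raise ValueError(f"names not in both files: {sorted(mismatch)}")


def data_files(include_pre_1970: bool) -> list[str]:
    """Pick the file set for a downstream build."""
    return ["countryzone", *CORE, "backzone" if include_pre_1970 else "mergedzone"]


def effective_zone(name: str, include_pre_1970: bool) -> str:
    """Which Zone's rules a name ends up using in each configuration."""
    if not include_pre_1970 and name in MERGEDZONE:
        return MERGEDZONE[name]  # mergedzone: follow the LinkMerged record
    return name  # backzone (or Core): the name is a Zone in its own right


check_substitutable(MERGEDZONE, BACKZONE)
print(data_files(False))
print(effective_zone("Atlantic/Reykjavik", False))  # Africa/Abidjan
print(effective_zone("Atlantic/Reykjavik", True))   # Atlantic/Reykjavik
```

`check_substitutable` is the property that makes the two files interchangeable. If it holds, switching between post-1970-only and full history changes which file is loaded without changing the set of valid IDs.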

@Stephen: We may be at a point where further debate is not productive. Perhaps we should create an exploratory fork of the TZDB to evaluate these ideas explicitly. It is easier to get feedback from a concrete implementation than to continue discussing ideas and options in a vacuum. I propose a GitHub project with an initial seed of the 10 raw TZDB files. And let's use the usual GitHub PR, Issues, and Discussions workflow, so that proposals can be reviewed and discussed before being committed into the repo.

If there is any chance that this will result in being able to type "Canada/Toronto" instead of "America/Toronto", that would resolve an annoyance that has lasted some 30-35 years.

Brian

On Thu, Nov 4, 2021 at 4:04 PM Stephen Colebourne via tz <tz@iana.org> wrote:

On Wed, 3 Nov 2021 at 22:40, Paul Eggert <eggert@cs.ucla.edu> wrote:
> On 10/18/21 06:07, Stephen Colebourne via tz wrote:
> > What tzdb previously offered was a set of IDs,
> > based on a simple rule - "ID as needed for post-1970 data, with at
> > least one per ISO country". Full history was available for each of
> > these (whether accurate or not).
>
> That wasn't ever the case. For example, there was never full history
> (accurate or not) for San Marino. We shouldn't base our analysis on the
> idea that we formerly had at least one Zone per ISO country, as we never
> had an ironclad rule like that and we did just fine without any such rule.

Let's unpack this for a minute.

Looking at the state of tzdb in mid 2012:
- Europe/San_Marino existed as an ID
- it was an alias for Europe/Rome
https://github.com/eggert/tz/blob/dccd5a16af62c52f2b49a2fe56270a710617cbbd/europe#L1452-L1461

In practical terms as a user:
- you could query it for full history
- the data you got back was accurate post-1970
- the data you got back pre-1970 was of unknown accuracy (except LMT
which was definitely inaccurate)
- the data was the best researched data for San Marino available

As such, I don't think it is correct to say that "there was never full
history" for San Marino. The ID existed and history could be queried.
The data that was available was good enough because San Marino shares
enough geopolitical history with Rome that users can overlook the
distinction. And no-one has ever been motivated to do better. This is
a hugely different scenario to Reykjavik returning data from Abidjan
where you are intending to knowingly make the data worse for
end-users.

The ironclad rule (AFAICT) is that there was always an *ID* for each
ISO country, and that the data it returned was acceptably accurate,
not outrageously wrong.
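The "an ID for each ISO country" invariant described here is mechanically checkable. Below is a minimal sketch. It assumes input in the tab-separated layout of tzdb's zone.tab, where each row gives a country code, coordinates, a TZ ID and optional comments. The sample rows use placeholder coordinates. The ISO code set is an abbreviated, hypothetical list, and VA is deliberately left out of the sample rows so the check has something to report.

```python
from __future__ import annotations

# Sample rows in the style of tzdb's zone.tab: country code, coordinates,
# TZ ID, optional comments, separated by tabs. Coordinates are placeholders.
ZONE_TAB = """\
# comment lines start with '#'
IS\t+0000+00000\tAtlantic/Reykjavik
IT\t+0000+00000\tEurope/Rome
SM\t+0000+00000\tEurope/San_Marino
"""

# Hypothetical, abbreviated set of ISO 3166 codes to check against.
ISO_CODES = {"IS", "IT", "SM", "VA"}


def ids_by_country(zone_tab: str) -> dict[str, list[str]]:
    """Group TZ IDs by the country code in the first column."""
    result: dict[str, list[str]] = {}
    for line in zone_tab.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        result.setdefault(fields[0], []).append(fields[2])
    return result


def countries_without_id(zone_tab: str, iso_codes: set[str]) -> set[str]:
    """ISO codes for which no TZ ID is listed."""
    return iso_codes - set(ids_by_country(zone_tab))


print(countries_without_id(ZONE_TAB, ISO_CODES))  # {'VA'}
```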

> There's no *timekeeping* reason to require a Zone for every ISO country.
> Adding such a requirement would complicate maintenance.

I think someone born in Iceland before 1970 might well disagree that
there is no timekeeping reason at work here.

I think the real problem here is that you are trying to fundamentally
change what tzdb offers. I'm here communicating as clearly as I can
that end-users expect one zone per country as a minimum because that
is what they have had for 15 or 20 years. Retaining backwards
compatibility for IDs is great, but meaningless if those IDs return
backwards incompatible data.

Ultimately, you haven't addressed my key point that a perfectly
rational unified set of IDs has been bifurcated into ones that are
deemed important and ones that are not. That is quite specifically
something *new*, a change from what the project previously provided.
And I think most would objectively judge it as being a degradation of
what is offered by tzdb.

> These downsides of a one-Zone-per-country rule may not appear to be all
> that serious to people who are not actively maintaining the database,
> but as the primary maintainer of a database that I would like to be as
> accurate as possible, I would object to adding distracting and
> error-prone makework like that to my volunteer workload.

To be clear, I think this is exactly why tzdb should move beyond being
a volunteer-led project. In practical terms, the only realistic
financially supported option I'm aware of is CLDR. But it is up to
those funding CLDR to decide if they are willing to pay to expand its
mandate.

In reality I don't think there actually is any extra work, as you have
already separately committed to including any historical data people
provide, and new ISO codes are an extremely rare occurrence. The real
work in recent years has been the fallout from your choice to degrade
what tzdb offers.

If you genuinely do want to reduce your volunteer work to only be the
abstract post-1970 regions and not to maintain any data pre-1970, then
you really should be clear about that. You could then look for an
alternate maintainer of tzdb itself as you would be maintaining what
amounts to a new database, which would best sit in a different git
repo. That data could then be an input to tzdb itself.

Stephen