I get the impression that this debate is caused by the existence of 2 different schools of thought:
* Descriptive: Paul wants to describe the timezones of the world without regard to how those time zones were created, and merge them into the smallest set that can generate the timekeeping rules. I can see that in this view, merging timezones from different countries into the same equivalence class is reasonable.
* Prescriptive: I think Stephen and others start with the fact that time zones are the creations of political organizations which write the regulations that define the timezones. Those governing bodies are predominantly organized by country in a hierarchical structure. In this view, it does *not* make sense to merge timezones from different countries. This view also implies that the TZ identifiers should reflect the political organizational structure of the world.
I want to suggest that it may be possible for these 2 views to coexist. We could create a new file, call it 'countryzone', which contains a set of Links organized in a hierarchical tree by country, pointing to the Core zones. Paul can maintain the Core files as before, and 'countryzone' would be maintained by a different set of people. Assuming the Core timezones form a complete set that covers all unique timezones in the world, every other ISO-country-based timezone can be mapped to one of the Core timezones.
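To make the 'countryzone' idea a bit more concrete, here is a minimal sketch of the consistency check such a file would need: every country-based name must point at a zone that actually exists in Core. The function names and the sample entries are hypothetical; nothing here is an existing TZDB tool or file.

```python
from collections import defaultdict


def dangling_links(countryzone: dict, core: set) -> dict:
    """Return the countryzone entries whose target is not a Core zone."""
    return {name: target for name, target in countryzone.items()
            if target not in core}


def by_country(countryzone: dict) -> dict:
    """Group link names by their first path component (the country)."""
    tree = defaultdict(list)
    for name in sorted(countryzone):
        country = name.split("/", 1)[0]
        tree[country].append(name)
    return dict(tree)


# Hypothetical sample data, for illustration only.
core = {"America/Toronto", "America/Vancouver"}
countryzone = {
    "Canada/Toronto": "America/Toronto",
    "Canada/Vancouver": "America/Vancouver",
    "Canada/Nowhere": "America/Nowhere",  # deliberately broken entry
}
```

With this sample, `dangling_links(countryzone, core)` reports only the deliberately broken entry, and `by_country(countryzone)` groups all three names under "Canada".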
For this to work, I think we need to clarify the semantics of the 'Link' records in the TZ database. As far as I can tell, there are at least 3 different meanings of the Link record:
1) Link Canonical Deprecated
* Deprecated is an old zone which should no longer be used
2) Link Canonical Alternate
* alternate spelling or alias, but not deprecated
3) Link Canonical Merged
* zones which were merged because they have the same rules by chance, but there is no semantic relationship to each other
I propose that we replace the 'Link' keyword with 3 new keywords that identify the precise meaning: LinkOld, LinkAlt, and LinkMerged. (My hope is that keeping the 'Link' prefix will make it easy to update existing TZDB parsers to preserve their previous behavior.) Slight aside: I learned that some 3rd party timezone libraries do not preserve the zone ID of a Link on a round trip. In other words, (pseudo-code) `TimeZone(linkName).getName() != linkName`. I wonder if it is worth defining the expected behavior of each type of Link for downstream libraries.
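As a rough sketch of how a parser might handle the proposed keywords, assuming they keep the field order of today's Link lines (keyword, target, link name): the LinkOld, LinkAlt, and LinkMerged keywords are only my proposal, and the class and function names below are made up for illustration. A plain 'Link' is still accepted and tagged as unspecified, which is the backward-compatible case. The sketch ignores the keyword abbreviations that zic accepts.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LinkKind(Enum):
    OLD = "LinkOld"          # deprecated name, should no longer be used
    ALT = "LinkAlt"          # alternate spelling or alias, not deprecated
    MERGED = "LinkMerged"    # same rules by chance, no semantic relationship
    UNSPECIFIED = "Link"     # legacy keyword, meaning not recorded


@dataclass(frozen=True)
class Link:
    kind: LinkKind
    target: str   # the canonical zone
    name: str     # the name that points at it


def parse_link(line: str) -> Optional[Link]:
    """Parse one source line; return None if it is not a link record."""
    fields = line.split("#", 1)[0].split()   # drop trailing comment
    if len(fields) != 3:
        return None
    try:
        kind = LinkKind(fields[0])
    except ValueError:
        return None
    return Link(kind, fields[1], fields[2])


def lookup(name: str, links: dict) -> tuple:
    """Return (requested name, canonical name), so that the requested
    ID survives a round trip instead of being replaced by the target."""
    link = links.get(name)
    return (name, link.target if link else name)


# Illustrative input lines using the proposed keywords.
records = [parse_link(s) for s in (
    "LinkOld    America/Los_Angeles  US/Pacific   # deprecated name",
    "LinkMerged Europe/Berlin        Europe/Oslo",
)]
links = {r.name: r for r in records if r is not None}
```

Here `lookup("US/Pacific", links)` keeps "US/Pacific" as the ID while still exposing the canonical zone, which is one possible answer to the round-trip question.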
For the pre-1970 data, it is my understanding that the 'backzone' file contains Zone records which should replace ONLY the LinkMerged records found in the other files. I propose that all LinkMerged records be extracted into a separate file (let's call it 'mergedzone') so that there is a clear symmetry between 'backzone' and 'mergedzone', which allows them to be substituted for each other. The dependency diagram looks something like this:
    countryzone
         |
         v
    Core (africa, asia, etc...)
         +-- backzone
         +-- mergedzone
Downstream libraries which want only post-1970 data can use: countryzone, Core, mergedzone
Downstream libraries which want to include pre-1970 data can use: countryzone, Core, backzone
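The substitution between 'mergedzone' and 'backzone' could look roughly like the sketch below. It assumes the symmetry described above holds, i.e. every name that is a LinkMerged in 'mergedzone' has a full Zone record in 'backzone' and vice versa. All function names and the sample data are hypothetical.

```python
def files_for(include_pre_1970: bool, core=("africa", "asia")) -> list:
    """Pick the input files for a build; exactly one of the two
    interchangeable files is used."""
    variant = "backzone" if include_pre_1970 else "mergedzone"
    return ["countryzone", *core, variant]


def asymmetric_names(merged_links: dict, backzone_zones: set) -> set:
    """Names present in only one of mergedzone / backzone.
    An empty result means the two files can be swapped freely."""
    return set(merged_links) ^ backzone_zones


def build_table(core_zones: set, merged_links: dict,
                backzone_zones: set, include_pre_1970: bool) -> dict:
    """Map each zone name to the name whose rules define it."""
    table = {name: name for name in core_zones}
    if include_pre_1970:
        # backzone supplies a full Zone record for each merged name
        table.update({name: name for name in backzone_zones})
    else:
        # mergedzone points each merged name at a Core zone
        table.update(merged_links)
    return table


# Illustrative sample data.
core_zones = {"Europe/Berlin"}
merged_links = {"Europe/Oslo": "Europe/Berlin"}
backzone_zones = {"Europe/Oslo"}
```

Either build ends up with the same set of zone names; only the definition behind the merged names differs.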
@Stephen: We may be at a point where further debate is not productive. Perhaps we should create an exploratory fork of the TZDB to evaluate these ideas explicitly. It is easier to get feedback from a concrete implementation than to continue discussing ideas and options in a vacuum. I propose a GitHub project with an initial seed of the 10 raw TZDB files. And let's use the usual GitHub PR, Issues, and Discussions workflow, so that proposals can be reviewed and discussed before being committed into the repo.
If there is any chance that this will result in being able to type "Canada/Toronto" instead of "America/Toronto", that would resolve an annoyance that has lasted some 30-35 years.
Brian