I will be contributing a longer message on this whole subject within a few days, but I want to state up front that country/city is insufficient in many countries for a fully expandable location database.

Off the top of my head, I can think of three different Richmonds in Canada: one in Quebec, one in Ontario, and one in BC. I know there are Springfields in many US states, and let's not even talk about the potential USA/Columbus.

The key of a database should be unique and recognizable. For many countries, multiple levels of government are the only way to properly differentiate cities.

Canada/British_Columbia/Richmond would be more appropriate.
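
To make the collision concrete, here is a minimal Python sketch. The place list and the helper names (two_level_key, three_level_key, duplicates) are made up for illustration; they are not part of TZDB or of any proposed file format.

```python
from collections import Counter

# Illustrative data only: the three Canadian Richmonds mentioned above,
# as (country, region, city) tuples.
PLACES = [
    ("Canada", "Quebec", "Richmond"),
    ("Canada", "Ontario", "Richmond"),
    ("Canada", "British_Columbia", "Richmond"),
]


def two_level_key(country, region, city):
    """Hypothetical Country/City key; the region is ignored."""
    return f"{country}/{city}"


def three_level_key(country, region, city):
    """Hypothetical Country/Region/City key."""
    return f"{country}/{region}/{city}"


def duplicates(keys):
    """Return the keys that occur more than once, sorted."""
    return sorted(k for k, n in Counter(keys).items() if n > 1)


two_level = [two_level_key(*p) for p in PLACES]
three_level = [three_level_key(*p) for p in PLACES]

print(duplicates(two_level))    # the Country/City scheme collides
print(duplicates(three_level))  # the three-level scheme does not
```

With only these three entries, the two-level scheme already produces a duplicate key, while the three-level scheme keeps every key unique.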

This allows the pre-1970 db to be both infinitely expandable and user friendly, to the point of being fully obvious to all users.








-------- Original message --------
From: Brian Park via tz <tz@iana.org>
Date: 2021-11-05 11:27 (GMT-05:00)
To: Philip Paeps <philip@trouble.is>
Cc: Stephen Colebourne <scolebourne@joda.org>, Time Zone Mailing List <tz@iana.org>
Subject: Re: [tz] Pre-1970 data

On Thu, Nov 4, 2021 at 10:11 PM Philip Paeps <philip@trouble.is> wrote:
On 2021-11-05 12:17:34 (+0800), Brian Park via tz wrote:
> I get the impression that this debate is caused by the existence of 2
> different schools of thought: [...]
>
> I want to suggest that it may be possible for these 2 views to
> coexist.

They de facto coexist right now.  The overwhelming majority of the data
are descriptive.  Only recent efforts have made some of the post-1970
data appear more prescriptive.

They coexist in an ad hoc manner right now, and that seems to be one of the causes for the contention. I am suggesting that we formalize the separation, so that both groups are happier.

> We could create a new file, e.g. call it 'countryzone', which contains a
> set of Links organized in a hierarchical tree by country, pointing to the
> Core zones.

I strongly believe we should continue to carefully avoid attempting to
group data by country.  [I would even avoid using the word "country"
wherever possible.]

Can you explain why? Is it because it will cause arguments about disputed places? I think only a small minority of places around the world are disputed. By separating these ISO-country timezones into a 'countryzone' file, perhaps we can confine the debate to a smaller section of the TZDB. We could create duplicate entries (i.e. Country1/City, Country2/City), or create a pseudo-country called "Disputed" (i.e. Disputed/City). The point is, we can create policies that govern these disputed regions.
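
As a rough sketch of the duplicate-entry idea, in Python: the mapping below stands in for a hypothetical 'countryzone' file, and "Core/City" is a placeholder for whatever Core zone identifier the Links would point to. None of these names exist in TZDB today, and resolve() is just an illustrative helper.

```python
# Hypothetical 'countryzone' content: each Link maps a country-style
# alias to a Core zone identifier. "Core/City" is a placeholder name.
COUNTRYZONE_LINKS = {
    "Country1/City": "Core/City",
    "Country2/City": "Core/City",  # duplicate entry for a disputed place
}


def resolve(zone_id, links=COUNTRYZONE_LINKS):
    """Follow a Link if one exists; otherwise return the id unchanged."""
    return links.get(zone_id, zone_id)


# Both country-style aliases resolve to the same Core zone.
print(resolve("Country1/City"))
print(resolve("Country2/City"))
```

Because the aliases are only Links, a policy change for a disputed place would touch the 'countryzone' entries and leave the Core zone data alone.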

Could we move 'countryzone' into a separate project? Probably, but some amount of initial coordination and refactoring would be required to resolve conflicting zone identifiers.

Overall, I feel like the TZDB data should lean a bit more towards matching how end-users think about timezones in the real world (Prescriptive), and lean slightly less on treating timezones as a clustering problem (Descriptive). But I can see pros and cons of both approaches, which is why I am suggesting ways to make the 2 approaches interoperate better.

> For the pre-1970 data, it is my understanding that the 'backzone' file
> contains Zone records which should replace ONLY the LinkMerged records
> found in the other files. I propose that all LinkMerged records be
> extracted into a separate file (let's call it 'mergedzone') so that there
> is a clear symmetry between 'backzone' and 'mergedzone', which allows them
> to be substituted for each other. The dependency diagram looks something
> like this:

As I've suggested before in another thread, I think we should consider
undoing the split into backzone.  I really liked Stephen's phrasing
earlier in this thread: acceptably accurate, not outrageously wrong.  We
started moving data to backzone to limit the scope of 'active'
maintenance to post-1970 data.  That artificial split led us towards a
more prescriptive worldview.  It seems clear that prescriptive simply
does not work for a real world with people on it.

I think Paul Eggert has made it clear that he does not want to maintain this data. My proposed refactoring of this info into the 'backzone' / 'mergedzone' pair makes it easy for downstream libraries to add back the 'backzone' data if they want. The 'make PACKRATDATA=backzone' hack does not help downstream libraries which do not use TZif or the Makefile.

> If there is any chance that this will result in being able to type
> "Canada/Toronto" instead of "America/Toronto", that would resolve an
> annoyance that has lasted some 30-35 years.
In this context, America refers to the landmass, not to the political
entity occupying a large chunk of it.  [Canada/Eastern etc moved to
backward around 1993, as far as I can tell.]

Virtually no one in the world thinks of "America" as referring to all of "North America" and "South America".

Brian