Destruction of timezone history - please revert (typos corrected)

I consider the decision to merge zones of multiple countries, just because they share a common post-1970 history, one of the worst decisions ever made for the TZ project. There used to be a Zone Europe/Copenhagen, which represented the exact timezone history of Copenhagen, and in fact of all of Denmark, since the introduction of standard time in Denmark in 1890. Now the file 'europe' says '# For Denmark see Europe/Berlin.' But zone Europe/Berlin is a bad representation of Denmark's timezone history: it gives wrong data for 1949 and earlier. The decision to remove zones like Europe/Copenhagen from the main zone files destroys the work of a whole generation of timezone history researchers. It is a completely unnecessary step, forced by nothing. I urge those responsible to revert it.

I assume the main reason why Paul Eggert wants to merge time zones is that it means less maintenance work, for example if EU rules change not only for DST but also for standard time. If something like this happens, multiple zones have to be updated.

Some years ago I developed my own patch to zic.c to make it understand a new keyword, 'follow'. That way, I can represent a zone which deviates from a TZ "standard zone" before a certain date, and then say 'follow standard zone' at the end of the zone data. I have completed that for several countries, for example to represent the deviations from Europe/Berlin within parts of Germany before 1949, or the numerous deviations in France from Europe/Paris. I attach two files which show the application of the 'follow' keyword. Of course I will be happy to share the patch to zic.c which enables it to process the follow keyword. As it is now, I apply my patch each time an update of zic.c is published.

If the follow keyword is introduced into the main distribution, it could be used for zones like Europe/Copenhagen, which instead of being eliminated would become

    Zone Europe/Copenhagen  0:50:20 -       LMT     1890
                            0:50:20 -       CMT     1894 Jan 1 # Copenhagen MT
                            1:00    Denmark CE%sT   1942 Nov 2 2:00s
                            1:00    C-Eur   CE%sT   1945 Apr 2 2:00
                            1:00    Denmark CE%sT   1950
                            0       follow  Europe/Berlin

The information that Copenhagen is the same as Berlin after 1949 would be there, as well as the original data.
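The end-only expansion described above can be sketched in Python. This is not the zic.c patch from the thread (which is not reproduced here); it is a minimal sketch under stated assumptions: zone lines are modeled by a hypothetical `ZoneLine` type, UNTIL values are plain `datetime` objects with the s/u/w suffixes ignored, and a trailing 'follow' is read as "continue with the target's lines still in effect after my last UNTIL". The Europe/Berlin lines are included only for illustration and should be checked against the actual 'europe' file.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ZoneLine:
    stdoff: str                  # kept as text; this sketch never does arithmetic on it
    rules: str                   # rule name, or "-"
    fmt: str                     # abbreviation format such as "CE%sT"
    until: Optional[datetime]    # None: the line never ends (last line of a zone)


@dataclass
class Zone:
    lines: List[ZoneLine]
    follow: Optional[str] = None  # hypothetical: zone followed after the last UNTIL


def expand_follow(zone: Zone, zones: Dict[str, Zone]) -> List[ZoneLine]:
    """Flatten an end-only 'follow' into ordinary zone lines."""
    if zone.follow is None:
        return list(zone.lines)
    cutoff = zone.lines[-1].until
    if cutoff is None:
        raise ValueError("a zone that uses follow must end with an UNTIL")
    target = zones[zone.follow]
    if target.follow is not None:
        raise ValueError(f"{zone.follow} itself uses follow")
    # Keep the target lines still in effect at the cutoff: UNTIL later than it, or none.
    tail = [ln for ln in target.lines if ln.until is None or ln.until > cutoff]
    return list(zone.lines) + tail


zones = {
    # Illustrative only; verify against the real 'europe' file.
    "Europe/Berlin": Zone([
        ZoneLine("0:53:28", "-", "LMT", datetime(1893, 4, 1)),
        ZoneLine("1:00", "C-Eur", "CE%sT", datetime(1945, 5, 24, 2)),
        ZoneLine("1:00", "SovietZone", "CE%sT", datetime(1946, 1, 1)),
        ZoneLine("1:00", "Germany", "CE%sT", datetime(1980, 1, 1)),
        ZoneLine("1:00", "EU", "CE%sT", None),
    ]),
    # The Copenhagen lines from the message, ending in "0 follow Europe/Berlin".
    "Europe/Copenhagen": Zone([
        ZoneLine("0:50:20", "-", "LMT", datetime(1890, 1, 1)),
        ZoneLine("0:50:20", "-", "CMT", datetime(1894, 1, 1)),
        ZoneLine("1:00", "Denmark", "CE%sT", datetime(1942, 11, 2, 2)),
        ZoneLine("1:00", "C-Eur", "CE%sT", datetime(1945, 4, 2, 2)),
        ZoneLine("1:00", "Denmark", "CE%sT", datetime(1950, 1, 1)),
    ], follow="Europe/Berlin"),
}

for ln in expand_follow(zones["Europe/Copenhagen"], zones):
    print(ln)
```

Running it prints the five Copenhagen lines followed by Berlin's Germany and EU lines, i.e. the flat zone that an unpatched converter would need to be given instead.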

Adding a "follow"-like syntax is a fine idea. I think it's been proposed before, but not implemented. I would hope that one could use "follow" not only at the end of a Zone, but also within a zone, such as "location X follows location Y from 1950 to 1990". Also, the code should defend itself against follow cycles (e.g., A follows B follows C follows A).

Any changes along these lines would be complicated by backward-compatibility concerns, though. That is, we could use a "follow"-like syntax only in vanguard format at first. To give you a feel for how long this can take, we added support for %z in zone abbreviations in 2015f, and we're not using this feature even now in the vanguard data, much less the main data (this is mostly due to my lack of time...).
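Defending against follow cycles amounts to a standard depth-first search over "zone follows zone" edges. A minimal Python sketch, assuming a hypothetical input that maps each zone name to the zones it follows anywhere in its definition (zic keeps no such table; the function name is illustrative):

```python
from typing import Dict, List, Optional


def find_follow_cycle(follows: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one follow cycle, e.g. ['A', 'B', 'C', 'A'], or None if there is none.

    follows maps each zone name to the zones it follows anywhere in its definition.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited / on the current path / finished
    color = {name: WHITE for name in follows}
    path: List[str] = []

    def visit(name: str) -> Optional[List[str]]:
        color[name] = GREY
        path.append(name)
        for target in follows.get(name, []):
            state = color.get(target, WHITE)
            if state == GREY:             # back edge: target is on the current path
                return path[path.index(target):] + [target]
            if state == WHITE:
                found = visit(target)
                if found:
                    return found
        path.pop()
        color[name] = BLACK
        return None

    for name in list(follows):
        if color[name] == WHITE:
            found = visit(name)
            if found:
                return found
    return None


print(find_follow_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))    # ['A', 'B', 'C', 'A']
print(find_follow_cycle({"Europe/Copenhagen": ["Europe/Berlin"]}))  # None
```

Because the sketch ignores the time ranges attached to each follow, it is conservative: A following B from 1950 to 1990 while B follows A from 1900 to 1940 would be reported as a cycle, even though expanding either zone would terminate.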

I kept my implementation of 'follow' very simple:
- only at the end;
- not recursive, i.e., the zone to which follow points does not itself use follow.

The reason is my experience with the Shanks database in its software version (not the book variant, which is nearly flat). That one is full of jumps from one table to the other and back. It is extremely complicated to follow this as a human, which makes it extremely hard to maintain. Shanks chose this structure because the database was implemented in the late 1970s in Fortran on machines with very limited RAM, so data volume had to be kept to a minimum. In the book publication there was no space limit, which is why he used flat tables there.

It is better to have a structure as simple and flat as possible. Readability for humans has to be kept in mind. I designed my 'follow' exactly for the purpose of extending the TZ database backwards. All sub-areas which existed before 1970, for example 128 areas in the state of Illinois, 340 in Indiana, 222 in New York state and 114 in Ohio, end up in the single zone which covers post-1970 Illinois, Indiana, New York or Ohio. That gives a simple, readable and maintainable 'follow' syntax. Any more complex follow structures are not worth the trouble. In my implementation, each zone of course gets its own complete stand-alone binary file, so any space saved in the source files by more complex jump structures is lost in the binary files anyway.

A major problem is that zic.c is not the only converter from source files to binary. If the source syntax is changed, all converters which are not patched will fail.

On 14.06.21 09:27, Paul Eggert wrote:
> Adding a "follow"-like syntax is a fine idea. I think it's been proposed before, but not implemented. I would hope that one could use "follow" not only at the end of a Zone, but also within a zone, such as "location X follows location Y from 1950 to 1990". Also, the code should defend itself against follow cycles (e.g., A follows B follows C follows A).
> Any changes along these lines would be complicated by backward-compatibility concerns, though. That is, we could use a "follow"-like syntax only in vanguard format at first. To give you a feel for how long this can take, we added support for %z in zone abbreviations in 2015f, and we're not using this feature even now in the vanguard data, much less the main data (this is mostly due to my lack of time...).
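The two restrictions described above can be checked mechanically. A minimal sketch, assuming zone bodies are available as whitespace-separated text lines with the leading "Zone NAME" stripped, and that a follow line has the layout '0 follow TARGET' used in the Copenhagen example; `follow_target` and `check_follow_rules` are hypothetical names, not part of zic.

```python
from typing import Dict, List, Optional


def follow_target(line: str) -> Optional[str]:
    """Return TARGET for a line of the assumed form '0 follow TARGET', else None."""
    fields = line.split()
    if len(fields) >= 3 and fields[1] == "follow":
        return fields[2]
    return None


def check_follow_rules(zones: Dict[str, List[str]]) -> List[str]:
    """List violations of: (1) follow only on a zone's last line,
    (2) the followed zone must not itself use follow."""
    problems = []
    for name, lines in zones.items():
        for i, line in enumerate(lines):
            target = follow_target(line)
            if target is None:
                continue
            if i != len(lines) - 1:
                problems.append(f"{name}: follow is not on the last line")
            if target not in zones:
                problems.append(f"{name}: unknown zone {target}")
            elif any(follow_target(other) for other in zones[target]):
                problems.append(f"{name}: {target} itself uses follow")
    return problems


zones = {
    "Europe/Berlin": ["1:00 EU CE%sT"],  # toy body, not the real Berlin entry
    "Europe/Copenhagen": [
        "0:50:20 - LMT 1890",
        "0:50:20 - CMT 1894 Jan 1",
        "1:00 Denmark CE%sT 1942 Nov 2 2:00s",
        "1:00 C-Eur CE%sT 1945 Apr 2 2:00",
        "1:00 Denmark CE%sT 1950",
        "0 follow Europe/Berlin",
    ],
    "Hypothetical/Nested": ["0 follow Europe/Copenhagen"],
    "Hypothetical/Misplaced": ["0 follow Europe/Berlin", "1:00 - CET"],
}

for problem in check_follow_rules(zones):
    print(problem)
```

Under these two restrictions no follow cycle can arise: a self-follow or a chain such as A follows B follows C is already rejected by the second check, so a general cycle search is not needed.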

On 6/14/21 6:42 AM, Alois Treindl wrote:
> The reason is my experience with the Shanks database in its software version (not the book variant, which is nearly flat).
> That one is full of jumps from one table to the other and back. It is extremely complicated to follow this as a human
I have never used Shanks's software and didn't know that it exposed its tables to users. And although Shanks's books typically use "follow" only at the end of a time table, they sometimes use it earlier to good effect. For example, the Shanks International Atlas (6th edition) has a table for Melbourne which essentially says "use Sydney's table with these few exceptions", and the exceptions are not all at the start, which means the Melbourne table has multiple "follow"s.
> Readability for humans has to be kept in mind.
Yes, but I expect that readability would be helped by having "follow" within tables, so long as it's done reasonably, as the Shanks book does for Melbourne.

Part of the issue here is: what is the typical use case? For Shanks's books, the main idea is to present the illusion of a complete set of data for everywhere on the planet, and to make it easy to look up a location's timestamp history. These books were automatically generated from more compact tables that are easier to maintain but harder to read. For tzdb, we don't present the illusion of a complete timezone history for every location; instead the goal is merely to record histories since 1970. Also, the tzdb source consists of easier-to-maintain but harder-to-read compact tables, with the idea that downstream software like 'zdump' can make the tables easier to read as needed. So the goals for tzdb's tables differ from the goals of Shanks's books.

In particular, a common scenario in tzdb is for two zones to split: formerly they were identical from before 1970 onward, but now they're different. For example, Pacific/Bougainville split from Pacific/Port_Moresby in 2014i. If we had had "follow" back then, I would have wanted to use it to say the equivalent of "Pacific/Bougainville is like Pacific/Port_Moresby before 2014-12-28 02:00, and here's what it looks like afterwards". But I couldn't do that if "follow" is allowed only at the end.

Anyway, thanks for contributing the code to implement "follow" at the end; it's provided food for thought and perhaps we can see our way to implementing something along those lines. (Of course at first it would be used only in vanguard format.)
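The Bougainville case needs a "follow" that carries its own UNTIL and can appear before other lines. A sketch of how such an entry might be expanded, assuming a hypothetical `Follow(zone, until)` entry whose period runs from the previous entry's UNTIL to its own, with the followed zone's lines clipped to that period. This is one possible semantics, not a tzdb design; Port_Moresby is deliberately reduced to a single line, the RULES column is omitted, and the +10/+11 offsets are supplied for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class Line:
    stdoff: str
    fmt: str                     # RULES column omitted for brevity
    until: Optional[datetime]    # None: the line never ends


@dataclass(frozen=True)
class Follow:                    # hypothetical entry: "follow ZONE until UNTIL"
    zone: str
    until: Optional[datetime]


def ends_before(a: Optional[datetime], b: Optional[datetime]) -> bool:
    """True if end time a is strictly earlier than end time b (None = never ends)."""
    return a is not None and (b is None or a < b)


def expand(entries: List[Union[Line, Follow]], zones: Dict[str, List[Line]]) -> List[Line]:
    """Replace each Follow with the followed zone's lines, clipped to the Follow's period."""
    out: List[Line] = []
    start: Optional[datetime] = None          # None here: the beginning of time
    for e in entries:
        if isinstance(e, Line):
            out.append(e)
        else:
            prev: Optional[datetime] = None   # start of the current target line
            for ln in zones[e.zone]:
                # ln is in effect on [prev, ln.until); keep it if that overlaps [start, e.until)
                overlaps = ((start is None or ends_before(start, ln.until))
                            and (prev is None or ends_before(prev, e.until)))
                if overlaps:
                    until = ln.until if ends_before(ln.until, e.until) else e.until
                    out.append(Line(ln.stdoff, ln.fmt, until))
                prev = ln.until
        start = e.until                       # next entry begins where this one ends
    return out


zones = {"Pacific/Port_Moresby": [Line("10:00", "+10", None)]}  # deliberately simplified
bougainville = [
    Follow("Pacific/Port_Moresby", datetime(2014, 12, 28, 2)),
    Line("11:00", "+11", None),
]

for ln in expand(bougainville, zones):
    print(ln)
```

The same function would also handle a Melbourne-style table with follow entries in the middle, since each `Follow` contributes only the lines that overlap its own period.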

On Mon, 14 Jun 2021 at 08:27, Paul Eggert via tz <tz@iana.org> wrote:
> Any changes along these lines would be complicated by backward-compatibility concerns, though. That is, we could use a "follow"-like syntax only in vanguard format at first.
There is a difference with negative DST, though. In my experience, negative DST is impossible to revert in the general case, and as a result people want to avoid it. By contrast, a follow rule could easily be implemented, as no data is lost.

Stephen
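The claim that an end-only follow loses no data can be illustrated with a round trip: the follow form expands to flat lines that older parsers accept, and a flat zone can be turned back into "own lines + follow" whenever its tail matches the target. A minimal sketch, assuming zone lines as plain tuples with `date` UNTIL values (times and s/u/w suffixes dropped); `expand` and `compress` are hypothetical names, the Berlin lines are illustrative, and the flat Copenhagen zone below is the expansion of the proposal in Alois's message, not the former tzdb entry.

```python
from datetime import date
from typing import List, Optional, Tuple

# (stdoff, rules, format, until); until is a date, None means the line never ends
Line = Tuple[str, str, str, Optional[date]]


def tail_after(target: List[Line], cutoff: date) -> List[Line]:
    """Target lines still in effect after cutoff, as an end-only follow uses them."""
    return [ln for ln in target if ln[3] is None or ln[3] > cutoff]


def expand(own: List[Line], target: List[Line]) -> List[Line]:
    """'own lines + follow target' -> flat lines that an unpatched parser accepts."""
    return own + tail_after(target, own[-1][3])


def compress(flat: List[Line], target: List[Line]) -> Optional[List[Line]]:
    """Shortest prefix `own` with expand(own, target) == flat, or None if there is none."""
    for k in range(1, len(flat)):
        cutoff = flat[k - 1][3]
        if cutoff is not None and flat[k:] == tail_after(target, cutoff):
            return flat[:k]
    return None


berlin = [  # illustrative; check against the real 'europe' file
    ("0:53:28", "-", "LMT", date(1893, 4, 1)),
    ("1:00", "C-Eur", "CE%sT", date(1945, 5, 24)),
    ("1:00", "SovietZone", "CE%sT", date(1946, 1, 1)),
    ("1:00", "Germany", "CE%sT", date(1980, 1, 1)),
    ("1:00", "EU", "CE%sT", None),
]
flat_copenhagen = [  # the proposal's own lines followed by Berlin's tail after 1950
    ("0:50:20", "-", "LMT", date(1890, 1, 1)),
    ("0:50:20", "-", "CMT", date(1894, 1, 1)),
    ("1:00", "Denmark", "CE%sT", date(1942, 11, 2)),
    ("1:00", "C-Eur", "CE%sT", date(1945, 4, 2)),
    ("1:00", "Denmark", "CE%sT", date(1950, 1, 1)),
    ("1:00", "Germany", "CE%sT", date(1980, 1, 1)),
    ("1:00", "EU", "CE%sT", None),
]

own = compress(flat_copenhagen, berlin)
print(len(own), expand(own, berlin) == flat_copenhagen)  # 5 True
```

Because `expand(compress(z), target)` reproduces `z` exactly, a vanguard file using follow could be converted mechanically to the main or rearguard format, which is the property that negative DST lacks.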

Just in case anyone is interested, my patch for zic.c (from 25 April 2021) is attached. It is in the public domain.

On 12.06.21 12:34, Alois Treindl via tz wrote:
> I assume the main reason why Paul Eggert wants to merge time zones is that it means less maintenance work, for example if EU rules change not only for DST but also for standard time.
> If something like this happens, multiple zones have to be updated.
> Some years ago I developed my own patch to zic.c to make it understand a new keyword, 'follow'.
> That way, I can represent a zone which deviates from a TZ "standard zone" before a certain date, and then say 'follow standard zone' at the end of the zone data.
> I have completed that for several countries, for example to represent the deviations from Europe/Berlin within parts of Germany before 1949, or the numerous deviations in France from Europe/Paris.
> I attach two files which show the application of the 'follow' keyword.
> Of course I will be happy to share the patch to zic.c which enables it to process the follow keyword.
> As it is now, I apply my patch each time an update of zic.c is published.
> If the follow keyword is introduced into the main distribution, it could be used for zones like Europe/Copenhagen, which instead of being eliminated would become
>
>     Zone Europe/Copenhagen  0:50:20 -       LMT     1890
>                             0:50:20 -       CMT     1894 Jan 1 # Copenhagen MT
>                             1:00    Denmark CE%sT   1942 Nov 2 2:00s
>                             1:00    C-Eur   CE%sT   1945 Apr 2 2:00
>                             1:00    Denmark CE%sT   1950
>                             0       follow  Europe/Berlin
>
> The information that Copenhagen is the same as Berlin after 1949 would be there, as well as the original data.

I discovered and fixed a bug in my patch. I am not going to continue posting it here, as I do not want to swamp the list with material outside its scope. Before anyone tries to use the patch I posted earlier, please write to me to get an updated version.

On 14.06.21 15:48, Alois Treindl via tz wrote:
> Just in case anyone is interested, my patch for zic.c (from 25 April 2021) is attached. It is in the public domain.
participants (3)
- Alois Treindl
- Paul Eggert
- Stephen Colebourne