Here are some stats on how much difference winnowing makes to the dataset.
Baseline: the source files define 441 distinct zones. Winnowing then
reduces the number of distinct zones thus:

    YEAR  COUNT
    0000    440
    1880    440
    1890    439
    1900    439
    1910    437
    1920    435
    1930    435
    1940    434
    1950    425
    1960    423
    1970    417
    1980    391
    1990    365
    2000    339
    2010    313
    2020    305

The reduction by one zone for a threshold year of 0000 comes from
Pacific/Johnston, a US minor outlying island, which is defined with data
identical to the "HST" zone. (Contrary to the usual practice of using LMT
for the first segment of geographical zones.) As the two tzfiles are
byte-for-byte identical, tzwinnow will merge them regardless of date
thresholds.

The results for a threshold in the future set a lower limit on the size to
which the database can be reduced by this mechanism. Relative to the
present full database of 441 zones, it's only a modest gain. Using a
threshold later than 1970 for installation purposes will probably only be
attractive in a few more decades' time, when there have been many more zone
splits arising from contemporary activity.

A threshold later than 1970 is currently much more valuable for tzselect
purposes. Here it's saving human cognitive load rather than storage space.
The reduction in the number of zones is concentrated disproportionately in
the countries that have the greatest complexity. This is particularly
noticeable with the US zones, the list of which is quite unwieldy in
unwinnowed form.

Over the period where we attempt complete coverage (1970 to today), the
rate at which the number of distinct zones changes is amazingly consistent
at 26 per decade. A similar pattern emerges when winnowing with a varying
upper date limit, showing that there's a long-term roughly constant rate of
zone churn. The number of zones distinct within a single decade is also
roughly constant, around 330. The number distinct within a single year
hovers around 300, possibly showing a slight rising trend, but I suspect
that's an artifact of incomplete data.

This suggests that a strategy of winnowing with a moving threshold that
remains N years ago will produce a roughly constant zone count. By
contrast, the number of zones differing at any time post-1970 (currently
417), or from any other fixed threshold, can grow without bound, and it
looks like it will.

-zefram
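
The "26 per decade" figure can be checked directly from the counts quoted in the message above. The following is only a small arithmetic sketch in Python; the variable and function names are made up for illustration and are not part of tzwinnow.

```python
# Distinct-zone counts by threshold year, copied from the figures
# quoted in the message above.
counts = {
    0: 440, 1880: 440, 1890: 439, 1900: 439, 1910: 437, 1920: 435,
    1930: 435, 1940: 434, 1950: 425, 1960: 423, 1970: 417, 1980: 391,
    1990: 365, 2000: 339, 2010: 313, 2020: 305,
}


def per_step_drop(counts, start, end):
    """Return {(year_a, year_b): drop in zone count} for consecutive
    threshold years within [start, end]."""
    years = sorted(y for y in counts if start <= y <= end)
    return {(a, b): counts[a] - counts[b] for a, b in zip(years, years[1:])}


# Hypothetical name: drops over the period of attempted complete coverage.
changes = per_step_drop(counts, 1970, 2010)
for (a, b), drop in changes.items():
    print(f"{a} -> {b}: {drop} fewer distinct zones")
```

Restricting the range to 1970 through 2010 leaves out the 2020 row, which is a threshold in the future and so measures something different from a completed decade of churn.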
Zefram wrote:
The reduction by one zone for a threshold year of 0000 comes from Pacific/Johnston, a US minor outlying island, which is defined with data identical to the "HST" zone. (Contrary to the usual practice of using LMT for the first segment of geographical zones.)
That's clearly an error in the database; thanks for reporting it. A
proposed patch at the end of this message. Unfortunately I haven't had time
to review the winnowing code yet, but I do plan to get to it.

-----

The old, standalone entry was demonstrably wrong, and the link is more
nearly right. Problem reported by Andrew Main (Zefram) in
<http://mm.icann.org/pipermail/tz/2013-September/019817.html>.
---
 australasia | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/australasia b/australasia
index 797f81c..74ebee2 100644
--- a/australasia
+++ b/australasia
@@ -749,8 +749,17 @@ Zone Pacific/Funafuti 11:56:52 - LMT 1901
 # no information; was probably like Pacific/Kiritimati
 
 # Johnston
-# Zone NAME GMTOFF RULES FORMAT [UNTIL]
-Zone Pacific/Johnston -10:00 - HST
+#
+# From Paul Eggert (2013-09-03):
+# In his memoirs of June 6th to October 4, 1945
+# <http://www.315bw.org/Herb_Bach.htm> (2005), Herbert C. Bach writes,
+# "We started our letdown to Kwajalein Atoll and landed there at 5:00 AM
+# Johnston time, 1:30 AM Kwajalein time." This was in June 1945, and
+# confirms that Johnston kept the same time as Honolulu in summer 1945.
+# We have no better information, so for now, assume this has been true
+# indefinitely into the past.
+#
+Link Pacific/Honolulu Pacific/Johnston
 
 # Kingman
 # uninhabited
-- 
1.8.1.2
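
The Johnston/HST duplication discussed in this thread is the sort of thing that shows up when compiled tzfiles are compared byte for byte. The Python sketch below groups files under a directory by content hash. It is an illustration only, not how tzwinnow is implemented; the function name is invented, and the directory passed in is assumed to be a tree of compiled tzfiles such as a zoneinfo directory.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_identical_files(root):
    """Return lists of relative paths under `root` whose contents are
    byte-for-byte identical (only groups with two or more members)."""
    root = Path(root)
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Hash the whole file; identical bytes give identical digests.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path.relative_to(root).as_posix())
    return [names for names in groups.values() if len(names) > 1]
```

Called on a zoneinfo tree, this also reports names that are links to one another, since a link resolves to the same bytes as its target. Separating true links from independently defined zones that happen to compile identically (as Pacific/Johnston and HST did before the patch) would need a check against the source files as well.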