[PATCH] New file 'pre1970' for zones that differ only in pre-1970 time stamps.

This lets us preserve information about pre-1970 time stamps when we change a Zone to a Link to another zone whose time stamps agree after 1970. This should address concerns about some recent changes that removed this information. This implementation is a stripped-down version of a suggestion by Andrew Main (Zefram) in <http://mm.icann.org/pipermail/tz/2013-August/019615.html> and <http://mm.icann.org/pipermail/tz/2013-August/019639.html> to allow filtering tz data by date range. Unlike Zefram's suggestion, this implementation supports only two date ranges, namely 1970 on, using 'make BACKWARD=backward'; and all dates, using 'make BACKWARD="pre1970 back-pre1970"'. At some point I'd like to improve it to support arbitrary date ranges, but at least we've now restored the data whose loss was of some concern.
* .gitignore: Add back-pre1970.
* Makefile (BACKWARD): New macro.
(YDATA): Use it instead of 'backward'.
(AWK_SCRIPTS): New macro, with additional script back-pre1970.awk.
(MISC): Use it.
(back-pre1970): New rule.
(clean_misc): Clean back-pre1970. Also clean time.tab, while we're at it.
(check_public): Don't require pre1970 to stand alone.
* pre1970, back-pre1970.awk: New files.
---
 .gitignore       |   1 +
 Makefile         |  32 +++++-
 back-pre1970.awk |  18 ++++
 pre1970          | 291 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 337 insertions(+), 5 deletions(-)
 create mode 100644 back-pre1970.awk
 create mode 100644 pre1970
diff --git a/.gitignore b/.gitignore
index 18dbbcc..28b1bc9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,6 +4,7 @@
 *.txt
 *~
 ChangeLog
+back-pre1970
 date
 leapseconds
 time.tab
diff --git a/Makefile b/Makefile
index a74d1a7..ffddb08 100644
--- a/Makefile
+++ b/Makefile
@@ -49,6 +49,22 @@ POSIXRULES=	America/New_York
 
 ZONETABTYPE=	zone
 
+# How to support obsolescent time zones in a backward-compatible way.
+# This variable affects only pre-1970 time stamps, on hosts that support them.
+# It has two possible values, 'backward' and 'pre1970 back-pre1970'.
+#
+# 'backward' is the traditional approach, and is simpler and more efficient;
+# it is designed to generate one zone for each region where clocks have agreed
+# since 1970.
+#
+# 'pre1970 back-pre1970' can generate more than one zone in that situation,
+# which means it can preserve a bit of pre-1970 data that 'backward' does not;
+# almost all pre-1970 data is missing, though, so don't get your hopes up.
+#
+# Sometimes 'backward' is more-compatible with earlier versions of this database,
+# and sometimes 'pre1970 back-pre1970' is; it depends on the situation.
+BACKWARD=	backward
+
 # Also see TZDEFRULESTRING below, which takes effect only
 # if the time zone files cannot be accessed.
 
@@ -322,7 +338,7 @@ COMMON=	Makefile
 DOCS=		README Theory $(MANS) date.1
 PRIMARY_YDATA=	africa antarctica asia australasia \
 		europe northamerica southamerica
-YDATA=		$(PRIMARY_YDATA) pacificnew etcetera backward
+YDATA=		$(PRIMARY_YDATA) pacificnew etcetera $(BACKWARD)
 NDATA=		systemv factory
 SDATA=		solar87 solar88 solar89
 TDATA=		$(YDATA) $(NDATA) $(SDATA)
@@ -330,9 +346,10 @@ TABDATA=	iso3166.tab time.tab zone.tab
 DATA=		$(YDATA) $(NDATA) $(SDATA) $(TABDATA) \
 			leap-seconds.list yearistype.sh
 WEB_PAGES=	tz-art.htm tz-link.htm
+AWK_SCRIPTS=	back-pre1970.awk checktab.awk leapseconds.awk zone-time.awk
 MISC=		usno1988 usno1989 usno1989a usno1995 usno1997 usno1998 \
-			$(WEB_PAGES) checktab.awk leapseconds.awk workman.sh \
-			zoneinfo2tdf.pl
+			$(WEB_PAGES) $(AWK_SCRIPTS) \
+			workman.sh zoneinfo2tdf.pl
 ENCHILADA=	$(COMMON) $(DOCS) $(SOURCES) $(DATA) $(MISC)
 
 # And for the benefit of csh users on systems that assume the user
@@ -423,6 +440,9 @@ zones:		$(REDO)
 time.tab:	$(YDATA) zone.tab zone-time.awk
 		$(AWK) -f zone-time.awk $(YDATA) >$@
 
+back-pre1970:	pre1970 backward
+		$(AWK) -v pre1970=pre1970 -f $@.awk backward >$@
+
 $(TZLIB):	$(LIBOBJS)
 		-mkdir $(TOPDIR) $(LIBDIR)
 		ar ru $@ $(LIBOBJS)
@@ -457,6 +477,7 @@ check_web:	$(WEB_PAGES)
 
 clean_misc:
 		rm -f core *.o *.out \
+		  back-pre1970 time.tab \
 		  date leapseconds tzselect version.h zdump zic yearistype
 clean:		clean_misc
 		rm -f -r tzpublic
@@ -488,7 +509,7 @@ set-timestamps:
 			$$cmd || exit; \
 		done
 
-# The zics below ensure that each data file can stand on its own.
+# The zics below ensure that each non-pre1970 data file can stand on its own.
 # We also do an all-files run to catch links to links.
 
 check_public:	$(ENCHILADA)
@@ -496,7 +517,8 @@ check_public:	$(ENCHILADA)
 		make "CFLAGS=$(GCC_DEBUG_FLAGS)"
 		mkdir tzpublic
 		for i in $(TDATA) ; do \
-		  $(zic) -v -d tzpublic $$i 2>&1 || exit; \
+		  test $$i = pre1970 || $(zic) -v -d tzpublic $$i 2>&1 \
+		    || exit; \
 		done
 		$(zic) -v -d tzpublic $(TDATA)
 		rm -f -r tzpublic
diff --git a/back-pre1970.awk b/back-pre1970.awk
new file mode 100644
index 0000000..f7c54fc
--- /dev/null
+++ b/back-pre1970.awk
@@ -0,0 +1,18 @@
+# Generate 'back-pre1970' from the two input files 'pre1970' and 'backward'.
+# The output consists of all lines in 'backward' that are not links to
+# files mentioned in 'pre1970'. Think of it as 'backward' minus 'pre1970'.
+
+# The 'backward' file is the input.
+# The awk variable 'pre1970' contains the name of the pre1970 file.
+
+# This file is in the public domain.
+
+# Contributed by Paul Eggert.
+
+BEGIN {
+  while ((getline <pre1970) == 1)
+    if ($1 == "Zone")
+      pre1970_zone[$2] = 1
+}
+
+! (/^Link/ && pre1970_zone[$3]) { print }
diff --git a/pre1970 b/pre1970
new file mode 100644
index 0000000..d8b8f34
--- /dev/null
+++ b/pre1970
@@ -0,0 +1,291 @@
+# Pre-1970 data
+
+# This file is in the public domain.
+
+# This file contains zones that were formerly in other source files,
+# but were later removed or replaced by backward-compatibility links
+# as they differ from other zones only in pre-1970 time stamps.
+
+# Although the tz database focuses on post-1970 time stamps, these
+# entries are retained here as they may be of some use to people
+# interested in pre-1970 time stamps, even though they cover only a
+# tiny sliver of pre-1970 data and are unreliable for that data.
+# Also, these entries can help with backward compatibility with some
+# old versions of the tz database. They are incompatible with other
+# old versions of the database, though; it depends on which old
+# version you're interested in.
+
+# Entries are sorted by Zone name. Each entry is preceded by the name
+# of the country that the entry is in, along with any other commentary
+# and rules associated with the entry. Some rules, e.g., 'Canada',
+# are defined by other source files; this file is not intended to be
+# used without those other files.
+
+# Zone	NAME		GMTOFF	RULES	FORMAT	[UNTIL]
+
+# Mali
+# no longer different from Bamako, but too famous to omit
+Zone	Africa/Timbuktu	-0:12:04	-	LMT	1912
+			0:00	-	GMT
+
+# Anguilla
+Zone	America/Anguilla	-4:12:16	-	LMT	1912 Mar 2
+			-4:00	-	AST
+
+# Antigua and Barbuda
+Zone	America/Antigua	-4:07:12	-	LMT	1912 Mar 2
+			-5:00	-	EST	1951
+			-4:00	-	AST
+
+# Argentina
+# Chubut (CH)
+# The name "Comodoro Rivadavia" exceeds the 14-byte POSIX limit.
+Zone	America/Argentina/ComodRivadavia	-4:30:00	-	LMT	1894 Oct 31
+			-4:16:48	-	CMT	1920 May
+			-4:00	-	ART	1930 Dec
+			-4:00	Arg	AR%sT	1969 Oct 5
+			-3:00	Arg	AR%sT	1991 Mar 3
+			-4:00	-	WART	1991 Oct 20
+			-3:00	Arg	AR%sT	1999 Oct 3
+			-4:00	Arg	AR%sT	2000 Mar 3
+			-3:00	-	ART	2004 Jun 1
+			-4:00	-	WART	2004 Jun 20
+			-3:00	-	ART
+
+# Aruba
+Zone	America/Aruba	-4:40:24	-	LMT	1912 Feb 12 # Oranjestad
+			-4:30	-	ANT	1965 # Netherlands Antilles Time
+			-4:00	-	AST
+
+# Canada
+
+Zone	America/Atikokan	-6:06:28	-	LMT	1895
+			-6:00	Canada	C%sT	1940 Sep 29
+			-6:00	1:00	CDT	1942 Feb 9 2:00s
+			-6:00	Canada	C%sT	1945 Sep 30 2:00
+			-5:00	-	EST
+
+Zone	America/Blanc-Sablon	-3:48:28	-	LMT	1884
+			-4:00	Canada	A%sT	1970
+			-4:00	-	AST
+
+# Cayman Is
+Zone	America/Cayman	-5:25:32	-	LMT	1890 # Georgetown
+			-5:07:12	-	KMT	1912 Feb # Kingston Mean Time
+			-5:00	-	EST
+
+# Canada
+Zone	America/Coral_Harbour	-5:32:40	-	LMT	1884
+			-5:00	NT_YK	E%sT	1946
+			-5:00	-	EST
+
+# Curacao
+Zone	America/Curacao	-4:35:47	-	LMT	1912 Feb 12 # Willemstad
+			-4:30	-	ANT	1965 # Netherlands Antilles Time
+			-4:00	-	AST
+
+# Dominica
+Zone	America/Dominica	-4:05:36	-	LMT	1911 Jul 1 0:01 # Roseau
+			-4:00	-	AST
+
+# Mexico
+Zone	America/Ensenada	-7:46:28	-	LMT	1922 Jan 1 0:13:32
+			-8:00	-	PST	1927 Jun 10 23:00
+			-7:00	-	MST	1930 Nov 16
+			-8:00	-	PST	1942 Apr
+			-7:00	-	MST	1949 Jan 14
+			-8:00	-	PST	1996
+			-8:00	Mexico	P%sT
+
+# US
+Zone	America/Fort_Wayne	-5:00	US	E%sT	1946
+			-5:00	-	EST # Always EST as of 1986
+
+# Grenada
+Zone	America/Grenada	-4:07:00	-	LMT	1911 Jul # St George's
+			-4:00	-	AST
+
+# Guadeloupe
+Zone	America/Guadeloupe	-4:06:08	-	LMT	1911 Jun 8 # Pointe a Pitre
+			-4:00	-	AST
+
+# Canada
+# Rule	NAME	FROM	TO	TYPE	IN	ON	AT	SAVE	LETTER/S
+Rule	Mont	1917	only	-	Mar	25	2:00	1:00	D
+Rule	Mont	1917	only	-	Apr	24	0:00	0	S
+Rule	Mont	1919	only	-	Mar	31	2:30	1:00	D
+Rule	Mont	1919	only	-	Oct	25	2:30	0	S
+Rule	Mont	1920	only	-	May	2	2:30	1:00	D
+Rule	Mont	1920	1922	-	Oct	Sun>=1	2:30	0	S
+Rule	Mont	1921	only	-	May	1	2:00	1:00	D
+Rule	Mont	1922	only	-	Apr	30	2:00	1:00	D
+Rule	Mont	1924	only	-	May	17	2:00	1:00	D
+Rule	Mont	1924	1926	-	Sep	lastSun	2:30	0	S
+Rule	Mont	1925	1926	-	May	Sun>=1	2:00	1:00	D
+# The 1927-to-1937 rules can be expressed more simply as
+# Rule	Mont	1927	1937	-	Apr	lastSat	24:00	1:00	D
+# Rule	Mont	1927	1937	-	Sep	lastSat	24:00	0	S
+# The rules below avoid use of 24:00
+# (which pre-1998 versions of zic cannot handle).
+Rule	Mont	1927	only	-	May	1	0:00	1:00	D
+Rule	Mont	1927	1932	-	Sep	lastSun	0:00	0	S
+Rule	Mont	1928	1931	-	Apr	lastSun	0:00	1:00	D
+Rule	Mont	1932	only	-	May	1	0:00	1:00	D
+Rule	Mont	1933	1940	-	Apr	lastSun	0:00	1:00	D
+Rule	Mont	1933	only	-	Oct	1	0:00	0	S
+Rule	Mont	1934	1939	-	Sep	lastSun	0:00	0	S
+Rule	Mont	1946	1973	-	Apr	lastSun	2:00	1:00	D
+Rule	Mont	1945	1948	-	Sep	lastSun	2:00	0	S
+Rule	Mont	1949	1950	-	Oct	lastSun	2:00	0	S
+Rule	Mont	1951	1956	-	Sep	lastSun	2:00	0	S
+Rule	Mont	1957	1973	-	Oct	lastSun	2:00	0	S
+# Zone	NAME		GMTOFF	RULES	FORMAT	[UNTIL]
+Zone	America/Montreal	-4:54:16	-	LMT	1884
+			-5:00	Mont	E%sT	1918
+			-5:00	Canada	E%sT	1919
+			-5:00	Mont	E%sT	1942 Feb 9 2:00s
+			-5:00	Canada	E%sT	1946
+			-5:00	Mont	E%sT	1974
+			-5:00	Canada	E%sT
+
+# Montserrat
+Zone	America/Montserrat	-4:08:52	-	LMT	1911 Jul 1 0:01 # Cork Hill
+			-4:00	-	AST
+
+# Bahamas
+# Rule	NAME	FROM	TO	TYPE	IN	ON	AT	SAVE	LETTER/S
+Rule	Bahamas	1964	1975	-	Oct	lastSun	2:00	0	S
+Rule	Bahamas	1964	1975	-	Apr	lastSun	2:00	1:00	D
+# Zone	NAME		GMTOFF	RULES	FORMAT	[UNTIL]
+Zone	America/Nassau	-5:09:30	-	LMT	1912 Mar 2
+			-5:00	Bahamas	E%sT	1976
+			-5:00	US	E%sT
+
+# Trinidad and Tobago
+Zone	America/Port_of_Spain	-4:06:04	-	LMT	1912 Mar 2
+			-4:00	-	AST
+
+# Brazil
+# Rio_Branco is too ambiguous, since there's a Rio Branco in Uruguay too.
+Zone	America/Porto_Acre	-4:31:12	-	LMT	1914
+			-5:00	Brazil	AC%sT	1988 Sep 12
+			-5:00	-	ACT
+
+# Argentina
+# Santa Fe (SF), Entre Rios (ER), Corrientes (CN), Misiones (MN), Chaco (CC),
+# Formosa (FM), La Pampa (LP), Chubut (CH)
+Zone	America/Rosario	-4:02:40	-	LMT	1894 Nov
+			-4:16:44	-	CMT	1920 May
+			-4:00	-	ART	1930 Dec
+			-4:00	Arg	AR%sT	1969 Oct 5
+			-3:00	Arg	AR%sT	1991 Jul
+			-3:00	-	ART	1999 Oct 3 0:00
+			-4:00	Arg	AR%sT	2000 Mar 3 0:00
+			-3:00	-	ART
+
+# St Kitts-Nevis
+Zone	America/St_Kitts	-4:10:52	-	LMT	1912 Mar 2 # Basseterre
+			-4:00	-	AST
+
+# St Lucia
+Zone	America/St_Lucia	-4:04:00	-	LMT	1890 # Castries
+			-4:04:00	-	CMT	1912 # Castries Mean Time
+			-4:00	-	AST
+
+# Virgin Is
+Zone	America/St_Thomas	-4:19:44	-	LMT	1911 Jul # Charlotte Amalie
+			-4:00	-	AST
+
+# St Vincent and the Grenadines
+Zone	America/St_Vincent	-4:04:56	-	LMT	1890 # Kingstown
+			-4:04:56	-	KMT	1912 # Kingstown Mean Time
+			-4:00	-	AST
+
+# British Virgin Is
+Zone	America/Tortola	-4:18:28	-	LMT	1911 Jul # Road Town
+			-4:00	-	AST
+
+# McMurdo, Ross Island, since 1955-12
+Zone	Antarctica/McMurdo	0	-	zzz	1956
+			12:00	NZAQ	NZ%sT
+
+# Japan
+Zone	Asia/Ishigaki	8:16:36	-	LMT	1896
+			8:00	-	CST
+
+# Israel
+Zone	Asia/Tel_Aviv	2:19:04	-	LMT	1880
+			2:21	-	JMT	1918
+			2:00	Zion	I%sT
+
+# Russia
+Zone	Asia/Tomsk	5:39:52	-	LMT	1924 May 2
+			6:00	-	TSK	1957 Mar
+			7:00	Russia	TS%s	1991 Mar 31 2:00s
+			6:00	1:00	TSD	1991 Sep 29 2:00s
+			6:00	-	TSK	1992 Jan 19 2:00s
+			7:00	Russia	TS%s
+
+# Svalbard & Jan Mayen
+Zone	Atlantic/Jan_Mayen	-1:00	-	EGT
+
+# Australia
+Zone	Australia/Canberra	9:56:32	-	LMT	1895 Feb
+			10:00	-	EST	1917 Jan 1 0:01
+			10:00	Aus	EST	1971 Oct 31 2:00
+			10:00	AN	EST	1981 Oct 25 2:00
+			10:00	1:00	EST	1982 Apr 4 3:00
+			10:00	AN	EST
+
+# UK
+Zone	Europe/Belfast	-0:23:40	-	LMT	1880 Aug 2
+			-0:25:21	-	DMT	1916 May 21 2:00 # Dublin/Dunsink MT
+			-0:25:21	1:00	IST	1916 Oct 1 2:00s # Irish Summer Time
+			0:00	GB-Eire	%s	1968 Oct 27
+			1:00	-	BST	1971 Oct 31 2:00u
+			0:00	GB-Eire	%s	1996
+			0:00	EU	GMT/BST
+
+# Slovenia
+Zone	Europe/Ljubljana	0:58:04	-	LMT	1884
+			1:00	-	CET	1941 Apr 18 23:00
+			1:00	C-Eur	CE%sT	1945 May 8 2:00s
+			1:00	1:00	CEST	1945 Sep 16 2:00s
+			1:00	-	CET	1982 Nov 27
+			1:00	EU	CE%sT
+
+# Bosnia and Herzegovina
+Zone	Europe/Sarajevo	1:13:40	-	LMT	1884
+			1:00	-	CET	1941 Apr 18 23:00
+			1:00	C-Eur	CE%sT	1945 May 8 2:00s
+			1:00	1:00	CEST	1945 Sep 16 2:00s
+			1:00	-	CET	1982 Nov 27
+			1:00	EU	CE%sT
+
+# Macedonia
+Zone	Europe/Skopje	1:25:44	-	LMT	1884
+			1:00	-	CET	1941 Apr 18 23:00
+			1:00	C-Eur	CE%sT	1945 May 8 2:00s
+			1:00	1:00	CEST	1945 Sep 16 2:00s
+			1:00	-	CET	1982 Nov 27
+			1:00	EU	CE%sT
+
+# Moldova
+Zone	Europe/Tiraspol	1:58:32	-	LMT	1880
+			1:55	-	CMT	1918 Feb 15 # Chisinau MT
+			1:44:24	-	BMT	1931 Jul 24 # Bucharest MT
+			2:00	Romania	EE%sT	1940 Aug 15
+			2:00	1:00	EEST	1941 Jul 17
+			1:00	C-Eur	CE%sT	1944 Aug 24
+			3:00	Russia	MSK/MSD	1991 Mar 31 2:00
+			2:00	Russia	EE%sT	1992 Jan 19 2:00
+			3:00	Russia	MSK/MSD
+
+# Croatia
+# Zone	NAME		GMTOFF	RULES	FORMAT	[UNTIL]
+Zone	Europe/Zagreb	1:03:52	-	LMT	1884
+			1:00	-	CET	1941 Apr 18 23:00
+			1:00	C-Eur	CE%sT	1945 May 8 2:00s
+			1:00	1:00	CEST	1945 Sep 16 2:00s
+			1:00	-	CET	1982 Nov 27
+			1:00	EU	CE%sT
-- 
1.8.1.2
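For readers who do not use awk, the filter performed by back-pre1970.awk can be restated in Python roughly as follows. This is only an illustrative sketch and is not part of the patch: the function names zones_defined and back_pre1970 and the sample input lines are assumptions made up for the example. It drops every Link line in 'backward' whose link name is defined as a Zone in 'pre1970', and passes all other lines through.

```python
def zones_defined(pre1970_lines):
    """Collect NAME from every 'Zone NAME ...' line (mirrors the awk BEGIN block)."""
    names = set()
    for line in pre1970_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "Zone":
            names.add(fields[1])
    return names


def back_pre1970(backward_lines, pre1970_lines):
    """Return 'backward' minus the links that 'pre1970' replaces with real zones."""
    zones = zones_defined(pre1970_lines)
    kept = []
    for line in backward_lines:
        fields = line.split()
        # A 'Link TARGET LINK-NAME' line is dropped when LINK-NAME is a pre1970 Zone.
        if line.startswith("Link") and len(fields) >= 3 and fields[2] in zones:
            continue
        kept.append(line)
    return kept


# Illustrative sample input (hypothetical excerpts, not the full files).
sample_pre1970 = [
    "# Mali",
    "Zone Africa/Timbuktu -0:12:04 - LMT 1912",
    "\t\t\t0:00 - GMT",
]
sample_backward = [
    "Link\tAfrica/Bamako\tAfrica/Timbuktu",
    "Link\tAmerica/Chicago\tUS/Central",
    "# a comment line is passed through unchanged",
]

filtered = back_pre1970(sample_backward, sample_pre1970)
for line in filtered:
    print(line)
```

With the sample input, the Africa/Timbuktu link is removed because 'pre1970' supplies a full Zone of that name, while the US/Central link and the comment survive.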

Hello,
So if I'm correct, pre-1970 data will be moved out to another file. For Oracle RDBMS we have our own compiler, which works directly on the TZdata files, and we have a lot of customers who use times from before 1970. Please do not move things out without allowing enough time to adapt to any structural changes to the actual dataset. Especially for a big corporate environment like Oracle, it really takes time to get things done.
If there is so much reluctance to have pre-1970 dates in a result set, then surely adapting the *parser* to omit pre-1970 dates would be more appropriate.
I might have missed a really good reason why this has all been done, but personally I do not see any added value in moving pre-1970 TZ definitions out to another file. It simply complicates things, breaks current behaviour, and makes the data harder to parse or inspect manually.
Regards,
Gunther
Oracle RDBMS TZ coordinator
On 30/08/2013 10:09, Paul Eggert wrote:
This lets us preserve information about pre-1970 time stamps when we change a Zone to a Link to another zone whose time stamps agree after 1970. This should address concerns about some recent changes that removed this information. This implementation is a stripped-down version of a suggestion by Andrew Main (Zefram) in <http://mm.icann.org/pipermail/tz/2013-August/019615.html> and <http://mm.icann.org/pipermail/tz/2013-August/019639.html> to allow filtering tz data by date range. Unlike Zefram's suggestion, this implementation supports only two date ranges, namely 1970 on, using 'make BACKWARD=backward'; and all dates, using 'make BACKWARD="pre1970 back-pre1970"'. At some point I'd like to improve it to support arbitrary date ranges, but at least we've now restored the data whose loss was of some concern. * .gitignore: Add back-pre1970. * Makefile (BACKWARD): New macro. (YDATA): Use it instead of 'backward'. (AWK_SCRIPTS): New macro, with additional script back-pre1970.awk. (MISC): Use it. (back-pre1970): New rule. (clean_misc): Clean back-pre1970. Also clean time.tab, while we're at it. (check_public): Don't require pre1970 to stand alone. * pre1970, back-pre1970.awk: New files. --- .gitignore | 1 + Makefile | 32 +++++- back-pre1970.awk | 18 ++++ pre1970 | 291 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 337 insertions(+), 5 deletions(-) create mode 100644 back-pre1970.awk create mode 100644 pre1970

On Fri, Aug 30, 2013 at 11:29:16AM +0100, Stephen Colebourne <scolebourne@joda.org> wrote:
I oppose this direction for the tzdb on principle.
TL;DR: The tzdb isn't broken. Don't fix it.
I agree, this new direction of removing data without need is too drastic.
I also doubt that you have or will get consensus to push through such a change.
Indeed, I have yet to see somebody on this list backing these changes, so this doesn't even seem to be controversial.
to add a +1 to this email.
+1
On Fri, Aug 30, 2013 at 10:51:35AM +0200, gunther vermeir <gunther.vermeir@oracle.com> wrote:
I might have missed a really good reason why this all has been done ,
I think it hasn't been given. The only explanation so far was that the Theory file requires it, which has been shown not to be so.
-- 
Marc Lehmann <schmorp@schmorp.de>

On 31 August 2013 01:22, Paul Eggert <eggert@cs.ucla.edu> wrote:
Zefram wrote:
Would you be interested in a patch for this from me?
I would, yes. What sort of thing did you have in mind? We do need to retain backward compatibility of the data, as per recent discussion.
Personally, I'm fine with the addition of filtering to allow zic users to reduce their data set. I'd strongly argue such filtering should be optional, and the default should be as it is now.
Stephen

Paul Eggert wrote:
I would, yes. What sort of thing did you have in mind?
At the core, a new program "tzwinnow" that reduces a list of timezones to inequivalent ones, comparing equivalence over the user's choice of date range. It compares tzfiles, not source, partly because that's easier and partly so that it can be used at tzselect time.
Supporting tzwinnow, a new .tab file containing population data, derived from new magic comments in the data source files.
Once that's available, lots of things can be winnowed. time.tab is actually a winnowed version of zone.tab. (Resolving links is the minimal form of winnowing.) At build time, we initially build the maximalist set of tzfiles, for every zone with distinct data. A second stage of building winnows the tzfiles and zone.tab according to the installer's choice of date range.
I'm wondering about the time.tab/zone.tab distinction. We need to start from the existing maximalist zone.tab, of course. Applying winnowing to the full set of zone names in the file would cause cross-country links of the type that time.tab has, which some find objectionable. We can get a winnowed zone.tab that avoids this if each country's zone set is winnowed separately. We can do both, of course, but the regularity of the process may cast doubt on the value of one or the other.
There's been some mention of winnowing the source files. That wouldn't be part of the standard build process, but it'd be easy enough to have a program to do that. The amount of source parsing required is not very great; far less than would be required to have tzwinnow work from source.
Actually I had such a solid design in my head that I went ahead and started work. I have population figures for all the zones added to the source, generation of the .tab, the tzwinnow program, and documentation for all of this. Next up is use of tzwinnow in tzselect, which is the point at which we start to get a tangible benefit, so that's probably when I should start actually posting the patches.
Some early output from tzwinnow has shown that you weren't entirely consistent about removing pre-1970 distinctions. For example, you retained Europe/Copenhagen, Europe/Oslo, and Europe/Stockholm despite the fact that they've all matched Europe/Berlin since 1970. I guess that's because their source descriptions refer to different sets of DST rules for the 1970s, with those rule sets happening to agree in that range (none having any DST transitions since 1965).
-zefram
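The difference between winnowing the whole zone list and winnowing each country separately can be sketched in a few lines of shell and awk. This is only an illustration, not tzwinnow: it assumes every zone already carries a "signature" that summarises its behaviour over the chosen date range (for instance a checksum of zdump output), and the `winnow` function, the three-column input format, the `sigA`-style values and the `Example/...` zones are all made up for the example.

```shell
#!/bin/sh
# Hypothetical sketch, not tzwinnow itself.
# Input (stdin): country <TAB> zone <TAB> signature, one zone per line.
# The signature stands for "behaviour over the chosen date range";
# how it is computed is assumed, not shown.
# Output: the first zone seen for each distinct key.
winnow() {
  # $1 = "global":  key on the signature alone (allows cross-country merging)
  # $1 = "country": key on country + signature (no cross-country merging)
  awk -F '\t' -v mode="$1" '
    {
      key = (mode == "country") ? $1 SUBSEP $3 : $3
      if (!(key in seen)) {
        seen[key] = 1
        print $2
      }
    }
  '
}

# Made-up sample; signatures are placeholders and the XX zones are fictional.
sample=$(printf '%s\t%s\t%s\n' \
  DE Europe/Berlin sigA \
  DK Europe/Copenhagen sigA \
  NO Europe/Oslo sigA \
  SE Europe/Stockholm sigA \
  GB Europe/London sigB \
  XX Example/East sigC \
  XX Example/West sigC)

echo "winnowed over all zones:"
printf '%s\n' "$sample" | winnow global
echo "winnowed per country:"
printf '%s\n' "$sample" | winnow country
```

In "global" mode the four sigA zones collapse into one, which is the cross-country merging that some find objectionable; in "country" mode only the two fictional XX zones merge.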

I wrote:
Next up is use of tzwinnow in tzselect, which is the point at which we start to get a tangible benefit, so that's probably when I should start actually posting the patches.
Now implemented this. Interesting discovery: this is actually the most complex place to apply winnowing. It's not just a matter of making links: the region descriptions in zone.tab have to be merged, and doing that in a way that produces legible results takes some effort. I'm happy with the results I'm getting.
OK, patches. You can pull the changes from my git repo, <git://git.fysh.org/zefram/tz.git> branch zefram/winnow. It's based on Eggert's current master branch. Unfortunately the first patch, which adds the population data, is a bit big for the mailing list:
    $ wc 00*
      4068  25639 172653 0001-add-Wikipedia-URLs-and-population-data.patch
       311   1739  11259 0002-generate-file-of-normalised-population-figures.patch
       276   1539   9871 0003-restore-the-rest-of-the-pre-1970-data.patch
       919   4059  24752 0004-new-program-tzwinnow.patch
       427   2441  17202 0005-optionally-winnow-zone-list-in-tzselect.patch
      6001  35417 235737 total
So I won't post the patches here unless instructed. The third patch is optional. The other four are sequentially interdependent.
-zefram

On Sat, Aug 31, 2013, at 22:46, Zefram wrote:
I wrote:
Next up is use of tzwinnow in tzselect, which is the point at which we start to get a tangible benefit, so that's probably when I should start actually posting the patches.
Now implemented this. Interesting discovery: this is actually the most complex place to apply winnowing. It's not just a matter of making links: the region descriptions in zone.tab have to be merged, and doing that in a way that produces legible results takes some effort. I'm happy with the results I'm getting.
It may be beneficial to add some manual hinting. The largish merged UTC+01 zones in Europe should probably be described as some variation on "Central European Time", a string that appears nowhere in zone.tab.

random832@fastmail.us wrote:
It may be beneficial to add some manual hinting.
I tweaked the descriptions in zone.tab to make automated merging work better.
The largish merged UTC+01 zones in Europe should probably be described as some variation on "Central European Time", a string that appears nowhere in zone.tab.
Actually these are easy cases. European countries have few zones each, and so don't need elaborate region descriptions to distinguish them within the context of the selected country. Stating "Central European Time" doesn't seem useful; usually the whole country is on some variant of CET. Quite unlike US or CA, where there are several base offsets in current use and then DST differences between locations that share base offset. There we benefit greatly from using "{Alaska,Pacific,Mountain,Central,Eastern} Time" as the first level of subdivision within the country. -zefram

I've started to take a look at this, and it appears to be nice work that will head us in the right direction; thanks.
I found a problem, though, in that tzwinnow is not winnowing out as much as I expect. For example, tzwinnow -a 1970z should find that Africa/Accra and Africa/Dakar are duplicates, since they've both been at plain GMT since 1970, but tzwinnow considers them to be distinct for some reason.
One suggestion: it would be nice for tzwinnow to have an option where it ignores differences due only to the time zone abbreviations, for applications that care only about UTC offsets.
I found the problem with Accra and Dakar by running the following test script, which is not intended to be portable or fast but should run on any GNU/Linux host with the necessary packages installed. This script found that the version of the tz database that you used had 417 zones (this does not count links), of which 190 are duplicates from the year 2013 on. Hence it found 227 distinct zones from the year 2013 on, a considerably smaller number than what you found with tzwinnow. If we ignore time zone abbreviations, a variant of the script finds 314 duplicates, which means there are 103 distinct zones today. Having to choose from 103 values should be significantly easier for users than having to choose from 417.
    #! /bin/sh
    TOPDIR=$1
    test -f "$TOPDIR/etc/zdump" || { echo >&2 "$0: usage: $0 topdir"; exit 1; }
    start_time_t=0
    start_year=2013
    limit_year=2500
    # Prepend "." to the path, since this is meant to be run
    # in the source directory, which contains tzwinnow, zdump, and maybe 'date'.
    LC_ALL=C PATH=.:$PATH TZ=UTC0
    export LC_ALL PATH TZ
    date_format='%Y-%m-%dT%H:%M:%S %Z'
    for date_origin_option in '-d@' '-r' ''; do
      test -n "$date_origin_option" || { echo >&2 "date is dumb"; exit 1; }
      date_output=$(date $date_origin_option$start_time_t "+$date_format")
      [ "$date_output" = '1970-01-01T00:00:00 UTC' ] && break
    done
    zonedir=$TOPDIR/etc/zoneinfo
    tmp=$(mktemp -d) || exit
    # trap 'status=$?; rm -fr $tmp; exit $status' 0
    # trap exit 1 2 13 15
    (cd $zonedir &&
     find * ! -name '*.tab' -type f -ls |
     sort |
     awk '{if (inum != $1) print $NF; inum = $1; }' |
     sort
    ) >$tmp/names
    tzwinnow -a ${start_year}z -B ${limit_year}z -z "$zonedir" -l \
      <$tmp/names >$tmp/tzwinnow.out
    for name in $(cat $tmp/names); do
      dest=$tmp/zdump.out/$name
      mkdir -p $(dirname $dest)
      (TZ=$zonedir/$name date $date_origin_option$start_time_t "+$date_format" &&
       zdump -V -c $start_year,$limit_year $name |
       sed 's/^[^ ]* *//'
      ) >$dest || break
    done
    (cd $tmp/zdump.out && fdupes -qr . | sed 's@^\./@@') >$tmp/check.out
    echo "output is in: $tmp"
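The counts quoted in this message are plain subtraction, and a duplicate count can be read off fdupes-style output (groups of identical files, one name per line, groups separated by blank lines). The sketch below assumes that "duplicate" means every file in a group beyond the first; the message does not say exactly how the figures were tallied, and `count_duplicates` is a hypothetical helper name.

```shell
#!/bin/sh
# Count duplicates in fdupes-style output read from stdin.
# Assumption: a "duplicate" is any file in a group beyond the first.
count_duplicates() {
  awk '
    NF == 0 { in_group = 0; next }          # blank line ends a group
    { if (in_group) dups++; in_group = 1 }  # 2nd and later names are duplicates
    END { print dups + 0 }
  '
}

# Arithmetic from the message: 417 zones, of which 190 (or 314) are duplicates.
total=417
echo "distinct zones, abbreviations compared: $((total - 190))"
echo "distinct zones, abbreviations ignored:  $((total - 314))"
```

Something like `count_duplicates <"$tmp/check.out"` would then apply it to the file the script leaves behind.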

Paul Eggert wrote:
For example, tzwinnow -a 1970z should find that Africa/Accra and Africa/Dakar are duplicates, since they've both been at plain GMT since 1970, but tzwinnow considers them to be distinct for some reason.
Good catch. The logic around the transition to the POSIX-TZ rule is wonky. It's perceiving the zones as different because their last transitions are at different times, even though that's 1942-12-30 vs 1941-06-01. I'll fix that.
One suggestion: it would be nice for tzwinnow to have an option where it ignores differences due only to the time zone abbreviations, for applications that care only about UTC offsets.
Sure. It also compares the is_dst flag; I can make that all optional. I'd noticed that the Russian timezones don't merge when tzselect applies winnowing. This is partly due to using different initialisms even when the base offset is shared. -zefram
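Both points in this exchange, transitions before the start of the range that should not matter and comparison that optionally ignores abbreviations and the is_dst flag, can be illustrated with a small filter. This is a sketch under stated assumptions and not tzwinnow's logic: the one-transition-per-line input format, the `clip` function and the sample rows are invented for the example and are not real tz data.

```shell
#!/bin/sh
# Hypothetical sketch of range-limited comparison; not tzwinnow's code.
# Input (stdin): one transition per line, oldest first, in an assumed format:
#   DATE(YYYY-MM-DD) UTC-OFFSET-SECONDS IS_DST ABBREVIATION
clip() {
  # $1 = cutoff date; $2 = "offsets" to compare UTC offsets only (optional)
  awk -v cutoff="$1" -v mode="${2-}" '
    function state(   s) {
      s = $2
      if (mode != "offsets") s = s " " $3 " " $4
      return s
    }
    # Transitions at or before the cutoff only set the starting state;
    # their dates are dropped so they cannot make two zones look different.
    ($1 "") <= (cutoff "") { initial = state(); next }
    { later[++n] = $1 " " state() }
    END {
      print "initial " initial
      for (i = 1; i <= n; i++) print later[i]
    }
  '
}

# Illustrative rows only: last transitions on different pre-1970 dates.
zone_a='1900-01-01 -100 0 LMT
1942-12-30 0 0 GMT'
zone_b='1900-01-01 -4000 0 LMT
1941-06-01 0 0 GMT'
# Same offsets as zone_b but a different abbreviation.
zone_c='1900-01-01 -4000 0 LMT
1941-06-01 0 0 WET'

a=$(printf '%s\n' "$zone_a" | clip 1970-01-01)
b=$(printf '%s\n' "$zone_b" | clip 1970-01-01)
if [ "$a" = "$b" ]; then echo "A and B agree from 1970 on"; fi
```

Passing `offsets` as the second argument drops the is_dst and abbreviation columns before comparison, so the sample zones B and C also compare equal in that mode.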

On Sun, 01 Sep 2013, Zefram wrote:
OK, patches. You can pull the changes from my git repo, <git://git.fysh.org/zefram/tz.git> branch zefram/winnow. It's based on Eggert's current master branch. Unfortunately the first patch, which adds the population data, is a bit big for the mailing list:
While trying to test this code, I encountered two problems in the awk programs.
1. The standard awk split() function does not accept regular expressions delimited by slashes. For example, you should use the standard
    split(region_data, rd, "\\n")
instead of the non-standard
    split(region_data, rd, /\n/)
2. In standard awk, the command line option "-v variable=value" requires the "value" part to be expressed in the same notation as is used for double-quoted strings in awk source code. By my reading of the POSIX specification for awk <http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html>, this implies that raw newlines are not allowed there, although newlines encoded as \n are allowed. You can deal with this using environment variables, like this:
    winnow_result=$(
        d="$region_data" \
        $AWK '
            BEGIN {
                d = ENVIRON["d"]
                gsub(/\t[^\t\n]*/, "", d)
                print d
                exit
            }
        ' </dev/null |
        [...]
Alternatively, you could pass positional arguments instead of options, like this:
    winnow_result=$(
        $AWK '
            BEGIN {
                d = ARGV[1]
                gsub(/\t[^\t\n]*/, "", d)
                print d
                exit
            }
        ' \
        "$region_data" \
        </dev/null |
        [...]
--apb (Alan Barrett)
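The two workarounds above are shown as fragments of a longer pipeline. A self-contained version of both might look like the following; the function names and the sample `region_data` are invented for the illustration, and `AWK` defaults to plain `awk` when unset.

```shell
#!/bin/sh
AWK=${AWK-awk}

# Pass multi-line data through the environment instead of "-v var=value".
strip_via_environ() {
  d="$1" $AWK '
    BEGIN {
      d = ENVIRON["d"]
      gsub(/\t[^\t\n]*/, "", d)   # drop each tab and the field after it
      print d
      exit
    }
  ' </dev/null
}

# Pass the same data as a positional argument; the exit in BEGIN means
# awk never tries to open ARGV[1] as an input file.
strip_via_argv() {
  $AWK '
    BEGIN {
      d = ARGV[1]
      gsub(/\t[^\t\n]*/, "", d)
      print d
      exit
    }
  ' "$1" </dev/null
}

# Made-up sample: two lines, each with a tab-separated description.
region_data=$(printf 'AK\tAlaska Time\nPT\tPacific Time')
strip_via_environ "$region_data"
strip_via_argv "$region_data"
```

Both calls print the first column of each line, with the raw newline in the data passing through untouched.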

Alan Barrett wrote:
While trying to test this code, I encountered two problems in the awk programs.
Thanks for those comments. I'll work on these issues. I spotted another awk issue: the "asort" built-in function that I used turns out to be specific to gawk. gawk's lint mode warns about that and about some of how array modification interacts with looping. -zefram
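One way to avoid asort() without leaving awk is a hand-written sort over a numerically indexed array. The following is a sketch of that idea and not the replacement that went into the patches; `sort_lines` and `isort` are hypothetical names, and for large inputs piping through sort(1) would be the more obvious choice.

```shell
#!/bin/sh
# Sort stdin lines using only portable awk features.
sort_lines() {
  LC_ALL=C awk '
    # Insertion sort on a[1..n]; stands in for the gawk-only asort().
    function isort(a, n,   i, j, tmp) {
      for (i = 2; i <= n; i++) {
        tmp = a[i]
        # Appending "" forces string comparison even for numeric-looking lines.
        for (j = i - 1; j >= 1 && (a[j] "") > (tmp ""); j--)
          a[j + 1] = a[j]
        a[j + 1] = tmp
      }
    }
    { line[NR] = $0 }
    END {
      isort(line, NR)
      for (i = 1; i <= NR; i++) print line[i]
    }
  '
}

printf 'Europe/Oslo\nAfrica/Accra\nEurope/Berlin\n' | sort_lines
```

The loop also copies elements one at a time rather than deleting during iteration, which sidesteps the array-modification-while-looping behaviour that the lint warnings concern.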

It's glorious, a testimonial to the tool's design, how much can be achieved with awk. At about the point where the need to do sophisticated tasks brings the code out of the common, almost universally portable subset, and you find yourself writing to one particular dialect, perhaps a richer language might support a better solution. I'm not an awk programmer, but I do shell scripts occasionally, and about the point where it becomes tempting to code to a superset of Bourne shell like bash or zsh or whatever, I tend to change languages. A tool that doesn't have to be run at single-user boot time is less confined in choice of tools. Does this tool really need to be portable to busybox, or other such spare environment?

On Sat, 07 Sep 2013, Bennett Todd wrote:
Does this tool really need to be portable to busybox, or other such spare environment?
No, but I think that it is desirable for it to be portable to most POSIX systems. Zefram seems willing to remove any gawkisms that have crept into his patches, so I don't see a problem. The shell portion of the code already relies on the select statement, which is in bash and in ksh, but not in POSIX sh, and I think that's fine -- most systems have at least one of ksh or bash. The awk part of the code used to be portable, and should soon be portable again. --apb (Alan Barrett)
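For hosts whose sh lacks select, a numbered menu can be approximated with constructs that POSIX sh does provide. This is a rough sketch of that idea, not code from tzselect or from the patches; the `choose` function is a made-up name, and it writes the menu and prompt to stderr so that only the chosen item appears on stdout.

```shell
#!/bin/sh
# POSIX-only stand-in for a "select" menu.
# Arguments: the menu items. Reads a number from stdin and prints the
# chosen item on stdout; returns 1 if input ends before a valid choice.
choose() {
  n=0
  for item in "$@"; do
    n=$((n + 1))
    printf '%s) %s\n' "$n" "$item" >&2
  done
  while :; do
    printf '#? ' >&2
    read -r reply || return 1
    case $reply in
      '' | *[!0-9]*) ;;            # not a number: fall through to the retry
      *)
        i=0
        for item in "$@"; do
          i=$((i + 1))
          if [ "$i" -eq "$reply" ]; then
            printf '%s\n' "$item"
            return 0
          fi
        done
        ;;
    esac
    echo 'Please enter a number in range.' >&2
  done
}

# Non-interactive demonstration: the answer "2" is supplied on stdin.
printf '2\n' | choose Europe/Berlin Europe/Oslo Europe/Stockholm
```

Interactive use would simply call `choose` with the terminal as stdin.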

Zefram wrote:
I'm wondering about the time.tab/zone.tab distinction.
It could be that we can discard time.tab, if tzwinnow can serve as a substitute. I'll take a look at what you've done, to see whether that would work.
Some early output from tzwinnow has shown that you weren't entirely consistent about removing pre-1970 distinctions.
No, I was consistent, it's just that I was stopped in the middle of the cleanup process.

I oppose this direction for the tzdb on principle.
TL;DR: The tzdb isn't broken. Don't fix it.
In terms of the proposed patch, it is possible to use this structure, but requires extra work to do so. Parsing the new data will require changes to the 3 compilers I maintain. Gunther Vermeir has indicated that Oracle RDBMS also has changes to make, which will be difficult to do. I suspect that there are others.
Your patch generates the back-pre1970 dynamically, rather than checking it in and releasing it in the distribution. That generation code is inaccessible to those who write compilers in other languages, thus the logic must be repeated. (The simplest solution is to process all links after all zones, and ignore links that have the same names as zones). The patch also requires changing the set of input files to be processed, adding attic.
Beyond the immediate patch feedback, I just think this is entirely the wrong direction for the tzdb project.
The key problems are (1) divergence in the meaning of an ID and (2) change for change's sake.
The tzdb currently offers two basic sets of data - "right" (with leapsecs) and "normal". The number of people using "right" is very small, and the associated local time differences are also very small, so the "right" files can be mostly ignored. By contrast, this approach adds another set of data, one which feedback is demonstrating will be relatively widely used.
As such, what this approach does is create a divergence in the meaning of a time-zone ID, of the kind that has not happened before. This is a big no-no to me. My experience with similar problems (divergence between the meaning of an ID between Joda-Time and the Java JDK) indicates that users find it confusing and consider it a bug. One ID should mean the same wherever it is used, from Unix to Java to PHP - that is the ubiquity aspect of the "Principles" email I sent out.
For example, with this approach, the ID "America/Atikokan" will have two very different sets of time-zone history associated with it. A user looking at the zone on a Unix command line will see a different history to that viewed through Java (or PHP from the sounds of it). This difference is hugely negative for my users.
The alternative approach, which I am arguing for, is to make no change at all. Simply leave the data as it was in the tzdb. The key principle is that once you have created an ID, you have a responsibility to maintain it in the long term, and that applies to all data, pre and post 1970. Whether the data is good, bad or indifferent is not important - backwards compatibility matters more. (Note that enhancements to history based on research continue to be fine).
That is what I describe as stability. I'm pretty sure it's what lots of those who have joined in the thread in some small way expressing discomfort have been looking to achieve.
Note: I don't object to the addition of filters in zic, but they should be switched off by default to avoid divergence of ID meaning as much as possible.
To the extent I can, I'm trying to exercise a veto on this change and the entire set of recent amendments. I hope I've expressed why in clear enough terms. I also doubt that you have or will get consensus to push through such a change.
I am sorry that you have made a rod for your own back here. Your attempt to reduce controversy has clearly failed IMO - your cure is worse than the disease:
https://github.com/eggert/tz/commit/d3b025adb25554ee10b986850371e573df92733e
I encourage others who simply want to see tzdb ID history preserved without change (and a focus placed back on current Government changes) to add a +1 to this email.
Stephen
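The parenthetical suggestion in this message, handling all links after all zones and ignoring any link whose name is already a zone, can be sketched with awk over tz source lines. This is an illustration only, not part of the patch or of zic: `usable_links` is a hypothetical helper, and the sample lines use fictional `Example/...` names in the usual "Zone NAME ..." and "Link TARGET LINK-NAME" layout.

```shell
#!/bin/sh
# Print only those Link lines whose link name is not also defined as a Zone.
# Reads tz source from the named files, or from stdin when none are given.
usable_links() {
  awk '
    $1 == "Zone" { zone[$2] = 1 }
    $1 == "Link" { target[++n] = $2; name[n] = $3 }
    END {
      # Links are resolved only after every Zone line has been seen,
      # so the order of the input files does not matter.
      for (i = 1; i <= n; i++)
        if (!(name[i] in zone))
          print "Link", target[i], name[i]
    }
  ' "$@"
}

# Fictional sample: Example/Old appears both as a Link and as a Zone.
usable_links <<'EOF'
Link Example/Main Example/Old
Link Example/Main Example/Alias
Zone Example/Main 1:00 - XST
Zone Example/Old 1:00 - XST 1970
			2:00 - XET
EOF
```

On the sample this keeps the `Example/Alias` link and drops the `Example/Old` one, because a Zone of that name exists.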

Stephen Colebourne wrote:
I encourage others who simply want to see tzdb ID history preserved without change (and a focus placed back on current Government changes) to add a +1 to this email.
The history IS the data ... We need a single view of that data that everyone can use rather than returning to 'personal interpenetration' from other sources, and each API doing its own thing. -- Lester Caine - G8HFL ----------------------------- Contact - http://lsces.co.uk/wiki/?page=contact L.S.Caine Electronic Services - http://lsces.co.uk EnquirySolve - http://enquirysolve.com/ Model Engineers Digital Workshop - http://medw.co.uk Rainbow Digital Media - http://rainbowdigitalmedia.co.uk

Lester Caine wrote:
Stephen Colebourne wrote:
I encourage others who simply want to see tzdb ID history preserved without change (and a focus placed back on current Government changes) to add a +1 to this email.
The history IS the data ... We need a single view of that data that everyone can use rather than returning to 'personal interpenetration' from other sources, and each API doing its own thing.
Spellchecker :) ... 'personal interpretation' -- Lester Caine - G8HFL

Stephen Colebourne <scolebourne@joda.org> wrote:
|I oppose this direction for the tzdb on principle.
What i thought about this project (and that was my impression from the datasets, the only thing i deal[t] with) was that there is a strategy of «best effort». I.e., data is collected and integrated if it seems reasonable. It was clear to me that there is a fuzzy point in the past, for which data cannot be accurate.
|TL;DR: The tzdb isn't broken. Don't fix it.
Therefore i fail to understand the entire direction, and agree with this statement. It seems a lot of people only use the data, not the code. I don't know about an alternative dataset of equal quality that could be used by those people. Therefore, and to me, it seems to be better to leave the data alone and only adjust the tools. If adjusting the tools is accepted by the consumers of these tools, then if that requires additional data to work, then this new data should be placed into a new file.
|I encourage others who simply want to see tzdb ID history preserved
|without change (and a focus placed back on current Government changes)
|to add a +1 to this email.
+1
|Stephen
--steffen

On Fri, Aug 30, 2013 at 6:29 AM, Stephen Colebourne <scolebourne@joda.org> wrote:
I encourage others who simply want to see tzdb ID history preserved without change (and a focus placed back on current Government changes) to add a +1 to this email.
+1 - I have not seen any good reason presented to disrupt the stability of the data. The stability and history of the data set is what motivated porting to it in the first place. -Andrew Bloomberg LP

On Fri, Aug 30, 2013 at 3:29 AM, Stephen Colebourne <scolebourne@joda.org> wrote:
I encourage others who simply want to see tzdb ID history preserved without change (and a focus placed back on current Government changes) to add a +1 to this email.
+1 --- i think the arguments that apply to Joda-Time and the JDK apply equally to Android's C library and Java libraries, both of which use this data. it's hard to justify changing something unless it was demonstrably wrong before and is demonstrably less wrong after.

On Fri, Aug 30, 2013 at 6:29 AM, Stephen Colebourne <scolebourne@joda.org> wrote:
I encourage others who simply want to see tzdb ID history preserved without change (and a focus placed back on current Government changes) to add a +1 to this email.
+1
I have huge appreciation for the work Paul Eggert does on this project, and I don't enjoy the political fights on the mailing list, but continuity is more important.
participants (13)
- Alan Barrett
- Andrew Paprocki
- Bennett Todd
- enh
- gunther vermeir
- Gwillim Law
- Lester Caine
- Marc Lehmann
- Paul Eggert
- random832@fastmail.us
- Steffen Daode Nurpmeso
- Stephen Colebourne
- Zefram