[PROPOSED] Support zi parsers that mishandle negative DST offsets
This is intended to provide a way to support both clients that
require data to have only positive DST offsets, and clients that
do not have this restriction.
* Makefile (XDST, SDST): New macros.
(TZDATA_ZI_DEPS): Add zidst.awk.
(DSTDATA_ZI_DEPS): New macro.
(all): Depend on fulldata.zi and pdstdata.zi.
(fulldata.zi pdstdata.zi): New rule.
(tzdata.zi): Use $(XDST)data.zi instead of reading original source.
(check_zishrink): Check zidst.awk, too.
(clean): Remove all *.zi files, not just tzdata.zi.
* NEWS, europe: Mention this.
* zidst.awk: New file.
---
 Makefile  | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++-----------
 NEWS      | 30 ++++++++++++++++++++++++++++++
 europe    | 39 ++++++++++++++++++++++-----------------
 zidst.awk | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 154 insertions(+), 28 deletions(-)
 create mode 100644 zidst.awk

diff --git a/Makefile b/Makefile
index 8c84cd9..92ddb80 100644
--- a/Makefile
+++ b/Makefile
@@ -10,6 +10,26 @@ VERSION= unknown
 # Email address for bug reports.
 BUGEMAIL= tz@iana.org
 
+# To install the full data, which can contain daylight saving time
+# offsets that are negative (relative to standard time), use
+#	XDST= full
+# To install data containing only positive daylight saving time
+# offsets, but otherwise as close to the full data as practical, use
+#	XDST= pdst
+XDST= pdst
+# Parsers requiring DST offsets to be positive should use the file
+# pdstdata.zi, which contains almost all the data of 'africa' etc.,
+# except with positive DST offsets. This works around a problem that
+# was discovered in January 2018 with negative DST in tests for ICU
+# and OpenJDK. See:
+# https://mm.icann.org/pipermail/tz/2018-January/025825.html
+# https://mm.icann.org/pipermail/tz/2018-January/025822.html
+# Currently the 'africa' etc. files use pdst form if comments are
+# ignored, to ease transition for parsers that do not support
+# negative DST offsets. This is intended to change to full form at
+# some point, so that full-featured zi parsers that use the 'africa'
+# files will get the full data without changing anything.
+
 # Change the line below for your time zone (after finding the zone you want in
 # the time zone files, or adding it to a time zone file).
 # Alternately, if you discover you've got the wrong time zone, you can just
@@ -463,7 +483,8 @@ TDATA= $(YDATA) $(NDATA) $(BACKWARD)
 ZONETABLES= zone1970.tab zone.tab
 TABDATA= iso3166.tab $(TZDATA_TEXT) $(ZONETABLES)
 LEAP_DEPS= leapseconds.awk leap-seconds.list
-TZDATA_ZI_DEPS= zishrink.awk version $(TDATA) $(PACKRATDATA)
+TZDATA_ZI_DEPS= zidst.awk zishrink.awk version $(TDATA) $(PACKRATDATA)
+DSTDATA_ZI_DEPS= zidst.awk $(TDATA) $(PACKRATDATA)
 DATA= $(TDATA_TO_CHECK) backzone iso3166.tab leap-seconds.list \
 	leapseconds yearistype.sh $(ZONETABLES)
 AWK_SCRIPTS= checklinks.awk checktab.awk leapseconds.awk zishrink.awk
@@ -500,7 +521,8 @@ VERSION_DEPS= \
 
 SHELL= /bin/sh
 
-all: tzselect yearistype zic zdump libtz.a $(TABDATA)
+all: tzselect yearistype zic zdump libtz.a $(TABDATA) \
+	  fulldata.zi pdstdata.zi
 
 ALL: all date $(ENCHILADA)
 
@@ -535,11 +557,15 @@ version: $(VERSION_DEPS)
 	printf '%s\n' "$$V" >$@.out
 	mv $@.out $@
 
-# This file can be tailored by setting BACKWARD, PACKRATDATA, etc.
-tzdata.zi: $(TZDATA_ZI_DEPS)
+# These files can be tailored by setting BACKWARD, PACKRATDATA, etc.
+fulldata.zi pdstdata.zi: $(DSTDATA_ZI_DEPS)
+	$(AWK) -v outfile='$@' -f zidst.awk $(TDATA) $(PACKRATDATA) \
+	  >$@.out
+	mv $@.out $@
+tzdata.zi: $(XDST)data.zi version
 	version=`sed 1q version` && \
 	  LC_ALL=C $(AWK) -v version="$$version" -f zishrink.awk \
-	    $(TDATA) $(PACKRATDATA) >$@.out
+	    $(XDST)data.zi >$@.out
 	mv $@.out $@
 
 version.h: version
@@ -721,17 +747,32 @@ check_tzs: $(TZS) $(TZS_NEW)
 check_web: tz-how-to.html
 	$(VALIDATE_ENV) $(VALIDATE) $(VALIDATE_FLAGS) tz-how-to.html
 
-# Check that tzdata.zi generates the same binary data that its sources do.
-check_zishrink: tzdata.zi zic leapseconds $(PACKRATDATA) $(TDATA)
+# The format of the source files, either full or pdst.
+# Currently they are in pdst format, but this is expected to change.
+SDST = pdst
+
+# Check that zishrink.awk does not alter the data, and that zidst.awk
+# preserves $(SDST) data.
+check_zishrink: zic leapseconds $(PACKRATDATA) $(TDATA) \
+	  $(XDST)data.zi tzdata.zi
 	for type in posix right; do \
-	  mkdir -p time_t.dir/$$type time_t.dir/$$type-shrunk && \
+	  mkdir -p time_t.dir/$$type time_t.dir/$$type-$(SDST) \
+	    time_t.dir/$$type-shrunk && \
 	  case $$type in \
 	    right) leap='-L leapseconds';; \
 	    *) leap=;; \
 	  esac && \
-	  $(ZIC) $$leap -d time_t.dir/$$type $(TDATA) && \
-	  $(AWK) '/^Rule/' $(TDATA) | \
+	  $(ZIC) $$leap -d time_t.dir/$$type $(XDST)data.zi && \
+	  $(AWK) '/^Rule/' $(XDST)data.zi | \
 	    $(ZIC) $$leap -d time_t.dir/$$type - $(PACKRATDATA) && \
+	  case $(XDST) in \
+	    $(SDST)) \
+	      $(ZIC) $$leap -d time_t.dir/$$type-$(SDST) $(TDATA) && \
+	      $(AWK) '/^Rule/' $(TDATA) | \
+	        $(ZIC) $$leap -d time_t.dir/$$type-$(SDST) \
+	          $(XDST)data.zi && \
+	      diff -r time_t.dir/$$type time_t.dir/$$type-$(SDST);; \
+	  esac && \
 	  $(ZIC) $$leap -d time_t.dir/$$type-shrunk tzdata.zi && \
 	  diff -r time_t.dir/$$type time_t.dir/$$type-shrunk || exit; \
 	done
@@ -741,7 +782,7 @@ clean_misc:
 	rm -f core *.o *.out \
 	  date tzselect version.h zdump zic yearistype libtz.a
 clean: clean_misc
-	rm -fr *.dir tzdata.zi tzdb-*/ $(TZS_NEW)
+	rm -fr *.dir *.zi tzdb-*/ $(TZS_NEW)
 
 maintainer-clean: clean
 	@echo 'This command is intended for maintainers to use; it'
diff --git a/NEWS b/NEWS
index 4f763c0..c455f3c 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,36 @@ News for the tz database
 
 Unreleased, experimental changes
 
+  Briefly:
+    Support zi parsers that mishandle negative DST offsets
+
+  Changes to build procedure
+
+    The new XDST macro in the Makefile lets the installer choose
+    XDST=full, which allows arbitrary DST offsets in the data, or
+    XDST=pdst, which allows only positive DST offsets. Choosing
+    XDST=full is arguably more correct for Ireland, which observes
+    Irish Standard Time (IST, UTC+01) in summer and GMT (UTC) in
+    winter. Choosing XDST=pdst is better for zoneinfo parsers that do
+    not work well with negative DST offsets, notably OpenJDK+CLDR.
+    On platforms using tzcode or similar APIs, XDST should not affect
+    any behavior other than that depending on the tm_isdst flag.
+
+    For now this change does not affect client-visible behavior by
+    default, as the Makefile defaults to XDST=pdst and uncommented
+    parts of the data source files contain only pdst-format data.
+    After a bit of time for testing, XDST=full and full-format source
+    files are planned to become the default, so that parsers that
+    support negative DST offsets can get full data without changing
+    their build procedures. Parsers requiring positive DST offsets
+    should use the new file pdstdata.zi instead of tzdata.zi or the
+    source files 'africa' etc.: pdstdata.zi is pdst-compatible, it is
+    automatically built from the data source files, and it will
+    continue to be pdst-compatible regardless of XDST. To get
+    full-format data now, use the new file fulldata.zi, which will
+    continue to be full-format regardless of XDST. To get the format
+    selected by XDST, use tzdata.zi.
+
   Changes to code
 
     The code is a bit more portable to MS-Windows. (Thanks to Manuela
diff --git a/europe b/europe
index 6c1ccbe..5aeda33 100644
--- a/europe
+++ b/europe
@@ -508,11 +508,27 @@ Link Europe/London Europe/Jersey
 Link	Europe/London	Europe/Guernsey
 Link	Europe/London	Europe/Isle_of_Man
 
-# From Paul Eggert (2018-01-19):
+# From Paul Eggert (2018-01-30):
+# In January 2018 we discovered that the negative DST offsets in the
+# Eire rules cause problems with tests for ICU:
+# https://mm.icann.org/pipermail/tz/2018-January/025825.html
+# and with tests for OpenJDK:
+# https://mm.icann.org/pipermail/tz/2018-January/025822.html
+# To work around this problem, zidst.awk translates the following data
+# lines into two forms. First, fulldata.zi contains the full data,
+# which includes negative DST offsets. Second, pdstdata.zi uses a
+# traditional approximation for Irish time stamps after 1971-10-31
+# 02:00 UTC; although this approximation has tm_isdst flags that are
+# the reverse of the full data, its UTC offsets are correct and this
+# suffices for ICU and OpenJDK. Although this source file currently
+# has pdstdata.zi lines active and fulldata.zi lines commented out,
+# this is intended to change in the near future and downstream code
+# should not rely on it.
+#
 # The following is like GB-Eire and EU, except with standard time in
 # summer and negative daylight saving time in winter.
-# Although currently commented out, this will need to become uncommented
-# once the ICU/OpenJDK workaround is removed; see below.
+# This rule set is active in fulldata.zi and is commented out in
+# pdstdata.zi.
 # Rule	NAME	FROM	TO	TYPE	IN	ON	AT	SAVE	LETTER/S
 #Rule	Eire	1971	only	-	Oct	31	2:00u	-1:00	GMT
 #Rule	Eire	1972	1980	-	Mar	Sun>=16	2:00u	0	IST
@@ -533,24 +549,13 @@ Zone Europe/Dublin -0:25:00 - LMT 1880 Aug 2
 			 0:00	1:00	IST	1947 Nov 2 2:00s
 			 0:00	-	GMT	1948 Apr 18 2:00s
 			 0:00	GB-Eire	GMT/IST	1968 Oct 27
-# From Paul Eggert (2018-01-18):
-# The next line should look like this:
+# The next line is active in fulldata.zi and commented out in pdstdata.zi.
 #			 1:00	Eire	IST/GMT
-# However, in January 2018 we discovered that the Eire rules cause
-# problems with tests for ICU:
-# https://mm.icann.org/pipermail/tz/2018-January/025825.html
-# and with tests for OpenJDK:
-# https://mm.icann.org/pipermail/tz/2018-January/025822.html
-# To work around this problem, use a traditional approximation for
-# time stamps after 1971-10-31 02:00 UTC, to give ICU and OpenJDK
-# developers breathing room to fix bugs. This approximation has
-# correct UTC offsets, but results in tm_isdst flags are the reverse
-# of what they should be. This workaround is temporary and should be
-# removed reasonably soon.
+# These three lines are active in pdstdata.zi and commented out in
+# fulldata.zi.
 			 1:00	-	IST	1971 Oct 31 2:00u
 			 0:00	GB-Eire	GMT/IST	1996
 			 0:00	EU	GMT/IST
-# End of workaround for ICU and OpenJDK bugs.
 
 ###############################################################################
 
diff --git a/zidst.awk b/zidst.awk
new file mode 100644
index 0000000..7885e9a
--- /dev/null
+++ b/zidst.awk
@@ -0,0 +1,50 @@
+# Convert tzdata source into full or positive-DST form
+
+# Contributed by Paul Eggert. This file is in the public domain.
+
+# This is not a general-purpose converter; it is designed for current tzdata.
+#
+# When converting to full form, the output can use negative DST offsets.
+#
+# When converting to positive-DST form, the output uses only positive
+# DST offsets. The idea is for the output data to simulate the
+# behavior of the input data as best it can within the constraints of
+# positive DST offsets.
+#
+# In the input, lines requiring the full format are commented #[full]
+# and the positive DST near-equivalents are commented #[pdst].
+
+BEGIN {
+  dst_type["full"] = 1
+  dst_type["pdst"] = 1
+
+  # The command line should set OUTFILE to the name of the output file,
+  # which should start with either "full" or "pdst".
+  todst = substr(outfile, 1, 4)
+  if (!dst_type[todst]) exit 1
+}
+
+/^Zone/ { zone = $2 }
+
+{
+  in_comment = /^#/
+
+  # Test whether this line should differ between the full and the pdst versions.
+  Rule_Eire = /^#?Rule[\t ]+Eire[\t ]/
+  Zone_Dublin_post_1968 \
+    = (zone == "Europe/Dublin" && /^#?[\t ]+[01]:00[\t ]/ \
+       && (!$(in_comment + 4) || 1968 < $(in_comment + 4)))
+
+  # If so, uncomment the desired version and comment out the undesired one.
+  if (Rule_Eire || Zone_Dublin_post_1968) {
+    if ((Rule_Eire \
+         || (Zone_Dublin_post_1968 && $(in_comment + 3) == "IST/GMT")) \
+        == (todst == "full")) {
+      sub(/^#/, "")
+    } else if (/^[^#]/) {
+      sub(/^/, "#")
+    }
+  }
+}
+
+{ print }
-- 
2.14.3
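The core of zidst.awk is a comment toggle: lines needed only for the full form are uncommented when building fulldata.zi and commented out when building pdstdata.zi. The following is a minimal sketch of that idea, not the patch's actual script. It assumes a POSIX shell and awk, handles only the 'Rule Eire' lines (not the Europe/Dublin zone continuation lines), and the function name toggle_eire is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): rewrite 'Rule Eire' lines
# read from standard input for the target form given as $1,
# which is assumed to be either "full" or "pdst".
toggle_eire() {
  awk -v todst="$1" '
    # Match Eire rule lines whether or not they are commented out.
    /^#?Rule[\t ]+Eire[\t ]/ {
      if (todst == "full")
        sub(/^#/, "")        # full form: make the rule active
      else if ($0 !~ /^#/)
        sub(/^/, "#")        # pdst form: comment the rule out
    }
    { print }
  '
}

# Sample input: one commented-out Eire rule and one unrelated rule.
sample='#Rule Eire 1971 only - Oct 31 2:00u -1:00 GMT
Rule EU 1981 max - Mar lastSun 1:00u 1:00 S'

printf '%s\n' "$sample" | toggle_eire full
printf '%s\n' "$sample" | toggle_eire pdst
```

In the first call the Eire line should come out uncommented; in the second it should stay commented. The EU line passes through unchanged in both cases, which is the property that lets one source file serve both kinds of parser.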
AFAICT, this does not provide a solution to anything, but perhaps I don't understand it.

Projects like OpenJDK and Joda-Time parse the source files of tzdb. zic is not used. Make is not run. Users are encouraged to update the time-zone data themselves:
http://www.joda.org/joda-time/tz_update.html
http://www.threeten.org/threetenbp/update-tzdb.html

Specifically, users are expected to copy across files like "europe", "northamerica", and "asia". There is no `pdstdata.zi` file checked in to the source repository. Nor is there a vanguard/rearguard file. Since zic/make is not run, how is a downstream consumer going to use them (assuming it were desirable to do so, which I don't accept)?

If the file isn't in tzdata2018c.tar.gz then it effectively doesn't exist.

Stephen
On 02/06/2018 07:40 AM, Stephen Colebourne wrote:
Users are encouraged to update the time-zone data themselves: http://www.joda.org/joda-time/tz_update.html http://www.threeten.org/threetenbp/update-tzdb.html
They should continue to be able to do that. joda.org, threeten.org, etc. can continue to distribute tarballs in rearguard format or whatever other format they like, and users can use those tarballs. Something like this should be done regardless of whether there is any format change, as iana.org doesn't have the resources to support every device on the planet directly.
Since zic/make is not run, how is a downstream consumer going to use them

A process at joda.org, threeten.org, etc. can run 'make' to generate a tarball that contains files in rearguard format, and then joda.org, threeten.org etc. can redistribute that tarball. Assuming the current development version, the process could run something like the following, say.

  make rearguard.zi
  mkdir tzrear.dir
  ln rearguard.zi tzrear.dir/africa
  for file in antarctica asia australasia europe northamerica southamerica etcetera systemv factory backward ; do \
    touch tzrear.dir/$file || exit; \
  done
  (cd tzrear.dir && \
    tar -cf - africa antarctica asia australasia europe northamerica southamerica etcetera systemv factory backward ) | \
    gzip >tzrear2018c.tar.gz

No doubt this will require minor tweaking to pacify whatever quirks OpenJDK has, but I hope you get the idea.

The point is that if OpenJDK wants a particular tweak to the format, then it should be in charge of its own destiny, and not be at the mercy of upstream changes. This is what other distributors do, and it's a good time for OpenJDK to start following their lead.
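A distributor that runs such a process might also want to confirm that the generated file really is free of negative SAVE values before repackaging it. A minimal sketch, assuming a POSIX shell and awk, unshrunk input in which Rule lines keep the column layout shown in the 'europe' file (SAVE in the ninth field), and a hypothetical function name has_negative_save. It inspects Rule lines only, not fixed offsets given in Zone lines.

```shell
#!/bin/sh
# Hypothetical check (not part of tzdb): exit status 0 if any Rule line
# has a negative SAVE value, 1 otherwise.
# Reads the named files, or standard input if no files are given.
has_negative_save() {
  # Columns: Rule NAME FROM TO TYPE IN ON AT SAVE LETTER/S
  awk '$1 == "Rule" && $9 ~ /^-/ { found = 1 } END { exit !found }' "$@"
}
```

For example, `has_negative_save rearguard.zi && echo 'negative SAVE found'` would flag a file that still needs attention before it is redistributed; the file name here is the one used in the commands above.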
Although most Java users like me, and millions of others, really appreciate your work very much, I fear you don't fully understand how the Java world works. The big majority of Java users is not willing to run any shell scripts or macros, simply because that would not be compatible with the major goal of creating platform-independent Java software. So the only real primary interface from the perspective of a Java user consists of the source files like "africa", "europe", etc.

I myself certainly cannot use your "make" procedure, because that would require me and other users of public Java APIs to download extra non-Java software components, even in unknown future platform-specific environments. It is only of interest to those Java users who develop private tools for a known specific platform, but these are clearly a minority.

I still think that extra source files with documented, reliable formats would serve the specific needs of Java users far better, without the need to change the zic input, so there would be no problem for either the Unix world or the Java community.

Macros and shell scripts might be very fine in the Unix world but are a no-go for most Java users. I hope that you take this into consideration.

Meno
Meno Hochschild wrote:
The big majority of Java users is not willing to run any shell scripts or macros
That's fine, and there's no suggestion that they do so. The suggestion is that Java users can continue to use source code files named "africa", "europe", etc., which they get from distributions that are automatically generated by the relevant server sites. Any shell scripts or Makefiles (or Java code, for that matter) used to generate these distributions would be run only on the relatively small number of sites that produce the distributions in question. This is routine practice with other downstream consumers of tzdb, and it is a good way to insulate them from upstream changes that they have compatibility problems with or are otherwise not yet ready for.
On 9 February 2018 at 23:11, Paul Eggert <eggert@cs.ucla.edu> wrote:
Since zic/make is not run, how is a downstream consumer going to use them
A process at joda.org, threeten.org, etc. can run 'make' to generate a tarball that contains files in rearguard format, and then joda.org, threeten.org etc. can redistribute that tarball.
To be clear here, as the maintainer of the project, you are effectively asking some of its key consumers to fork the project. I find that pretty astonishing.

The tzdb project has provided a distribution of the relevant data in a suitable form for consumption for 20 years or so. It still does today, as the change is currently rolled back. I'm certainly not about to go replicating the tzdb distribution, placing a large amount of work and cost on me, merely to work around a change that simply should not be happening.

Mark Davis has recently expressed this again in another thread: there is absolutely no good rationale for making this change, and it clearly causes major pain. The only effect on zic is a flag that everyone seems to agree is pointless/deprecated, and some disagree that the change is correct wrt the flag's specification. A rational observer would be astonished that this issue got beyond 20 emails, never mind 200.

If zic wants to reverse the dst flag, it should do so. The source files should remain with positive SAVE values.

Stephen
Assuming the current development version, the process could run something like the following, say.
make rearguard.zi mkdir tzrear.dir ln rearguard.zi tzrear.dir/africa for file in antarctica asia australasia europe northamerica southamerica etcetera systemv factory backward ; do \ touch tzrear.dir/$file || exit; \ done (cd tzrear.dir && \ tar -cf - africa antarctica asia australasia europe northamerica southamerica etcetera systemv factory backward ) | \ gzip >tzrear2018c.tar.gz
No doubt this will require minor tweaking to pacify whatever quirks OpenJDK has, but I hope you get the idea.
The point is that if OpenJDK wants a particular tweak to the format, then it should be in charge of its own destiny, and not be at the mercy of upstream changes. This is what other distributors do, and it's a good time for OpenJDK to start following their lead.
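As a rough illustration only, the steps above could be wrapped in a small shell function that a redistributor's automation calls. The function name make_rearguard_tarball and its two arguments are hypothetical and not part of tzdb; the sketch assumes rearguard.zi has already been generated with 'make rearguard.zi', uses cp rather than ln so that it also works across filesystems, and relies on mktemp -d being available.

```shell
# Hypothetical wrapper around the steps above; the function name and
# its arguments are assumptions, not anything shipped with tzdb.
make_rearguard_tarball() {
  zi=$1     # path to an already generated rearguard.zi
  out=$2    # path of the gzipped tarball to create
  files="africa antarctica asia australasia europe northamerica southamerica etcetera systemv factory backward"
  dir=$(mktemp -d) || return

  # All of the data goes into 'africa'.
  cp "$zi" "$dir/africa" || return

  # The remaining files only need to exist; touch leaves 'africa' intact.
  for file in $files; do
    touch "$dir/$file" || return
  done

  # $files is deliberately unquoted so that it splits into separate names.
  (cd "$dir" && tar -cf - $files) | gzip >"$out"
  rm -rf "$dir"
}
```

A caller would then run something like make_rearguard_tarball rearguard.zi tzrear2018c.tar.gz and publish the result.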
To be clear here, as the maintainer of the project, you are effectively asking some of its key consumers to fork the project. I find that pretty astonishing.
What you're effectively being asked to do is to fix the broken software that makes the inappropriate assumption that SAVE cannot be negative. He's also providing a compatibility mode for the project so that in the intermediate time it will work with broken downstream consumers, and you're being asked to redistribute the compatibility version if that's all your project can consume. Let's not be hyperbolic here, he's not asking you to maintain an entirely separate time zone database.

I'll note that this is not at *all* unheard of. Note that Debian redistributes my project, python-dateutil, and applies patches to it before redistribution ( https://sources.debian.org/patches/python-dateutil/2.6.1-1/ ) so that they can use their centrally distributed zoneinfo files rather than the ones distributed with the library. Distributing software with small compatibility patches is incredibly common and probably the only sane way to manage the various different timescales on which software is developed. I don't think it's amazingly common for the upstream package to actually do the work of providing the patches *for you*, as Paul has done here.

Frankly, I'm not sure why you want to lobby so hard for this sort of thing in the upstream distribution. If you continue to rely solely on consuming the IANA data directly with no possibility to patch it if it gets out of sync with your software, you are at the mercy of the upstream distribution (as you now see). If, on the other hand, you set up your downstream software to consume patched versions of the software (even if you fix them so that the patch is no longer necessary), you can be much more responsive when something changes in an unexpected way.

On 02/11/2018 03:44 PM, Stephen Colebourne wrote:
On 9 February 2018 at 23:11, Paul Eggert <eggert@cs.ucla.edu> wrote:
Since zic/make is not run, how is a downstream consumer going to use them
A process at joda.org, threeten.org, etc. can run 'make' to generate a tarball that contains files in rearguard format, and then joda.org, threeten.org etc. can redistribute that tarball.
The tzdb project has provided a distribution of the relevant data in a suitable form for consumption for 20 years or so. It still does today, as the change is currently rolled back.
I'm certainly not about to go replicating the tzdb distribution, placing a large amount of work and cost on me, merely to work around a change that simply should not be happening. Mark Davis has recently expressed this again in another thread - there is absolutely no good rationale for making this change, and it clearly causes major pain. The only effect on zic is a flag that everyone seems to agree is pointless/deprecated, and some disagree that the change is correct wrt the flag's specification. A rational observer would be astonished that this issue got beyond 20 emails, never mind 200.
If zic wants to reverse the dst flag, it should do so. The source files should remain with positive SAVE values.
Stephen
Assuming the current development version, the process could run something like the following, say.
   make rearguard.zi
   mkdir tzrear.dir
   ln rearguard.zi tzrear.dir/africa
   for file in antarctica asia australasia europe northamerica southamerica etcetera systemv factory backward ; do \
     touch tzrear.dir/$file || exit; \
   done
   (cd tzrear.dir && \
     tar -cf - africa antarctica asia australasia europe northamerica southamerica etcetera systemv factory backward ) | \
     gzip >tzrear2018c.tar.gz
No doubt this will require minor tweaking to pacify whatever quirks OpenJDK has, but I hope you get the idea.
The point is that if OpenJDK wants a particular tweak to the format, then it should be in charge of its own destiny, and not be at the mercy of upstream changes. This is what other distributors do, and it's a good time for OpenJDK to start following their lead.
Stephen Colebourne wrote:
you are effectively asking some of its key consumers to fork the project. I find that pretty astonishing.
I would find it astonishing too, if that was what was being suggested. But it's not. I'm merely suggesting standard practice that is commonly done by other tzdb consumers. For example, you can see it in action here:

https://packages.debian.org/sid/tzdata

This currently contains a copy of tzdata 2018c, along with a small patch that supplies some time zone links (e.g., SystemV/AST4ADT) that were removed from tzdata in release 2005p, links that Debian kept due to compatibility concerns. This is not a fork of the project in any real sense; it's just routine shimming.
there is absolutely no good rationale for making this change
Although there is good rationale for the change, you clearly disagree. Similarly, back in 2005 there was good rationale to remove SystemV/AST4ADT and clearly not everyone agreed. There was a reasonable approach back then to accommodate that disagreement, and a similar approach would work well here too. After all, it's simply not practical to insist on complete lock-step agreement among all tzdata consumers.
On 11 February 2018 at 22:29, Paul Eggert <eggert@cs.ucla.edu> wrote:
Stephen Colebourne wrote:
you are effectively asking some of its key consumers to fork the project. I find that pretty astonishing.
I would find it astonishing too, if that was what was being suggested. But it's not. I'm merely suggesting standard practice that is commonly done by other tzdb consumers. For example, you can see it in action here:
These are groups/teams for whom packaging/repackaging is their normal role. It's not comparable. In addition, every user of these libraries has gone to IANA for the files for over 15 years - they understand IANA to be the source of timezone data. Breaking that is undesirable.
there is absolutely no good rationale for making this change
Although there is good rationale for the change, you clearly disagree.
The only visible change for zic is a flag that is deprecated/discouraged. So why not change zic or the tooling to alter the flag in the case of Ireland? It's very ego-centric of the project to put the importance of that flag ahead of all other consumers.

We have had an agreement for many years that the zic input files represent an API used by other systems, and are therefore protected for compatibility as an API. While everyone accepts that the importance of positive SAVE values was not previously established, it clearly is now. So, it should simply be declared as part of the API.

Note that I've never precluded the tzdb distribution containing other/additional files with other/additional information, just the protection of the input source files.

Stephen
Stephen Colebourne wrote:
These are groups/teams for whom packaging/repackaging is their normal role. It's not comparable.
Sure it's comparable. We're not talking about crack teams of dozens of software maintainers. We're talking only about a simple script that I wrote in a few minutes, a script that can be run automatically. This sort of thing is routine in software, and it's not reasonable to dismiss it merely on the grounds that it is "packaging/repackaging".
every user of these libraries has gone to IANA for the files for over 15 years
No, as IANA didn't start hosting the data until 2011. Users of these libraries evidently didn't have much trouble switching to iana.org, and they won't have much trouble switching to joda.org either.
why not change zic or the tooling to alter the flag in the case of Ireland?
Because we're trying to model Irish time as best we can, and this includes modeling Irish Standard Time as standard time.
We have had an agreement for many years
No, we never had any agreement to freeze the format. The zic input format has evolved in the past, and it will undoubtedly continue to evolve in the future.
Date: Mon, 12 Feb 2018 08:26:25 +0000
From: Stephen Colebourne <scolebourne@joda.org>
Message-ID: <CACzrW9CyxYAv+=F8X=rmU9E3Cd+ZbgLvNF3LxjNagfi4tfeg-Q@mail.gmail.com>

I typed this reply on Tuesday, then decided to defer sending it, hoping that someone else might refute some of the nonsense that was in this message. The rest of this e-mail is untouched from then (now that Paul has replied, the similarities in the parts we both mention are meaningful I think.)

kre

  | In addition, every user of these libraries has gone to IANA for the
  | files for over 15 years - they understand IANA to be the source of
  | timezone data. Breaking that is undesirable.

Nonsense. First because IANA was not involved in any way at all until (I think) mid (or late) 2012 (perhaps even 2013) - which is something less than 15 years ago ... going to IANA for this data before then would have been a colossal waste of time.

Second, the vast majority of users of this data get it from their system vendor (via MacOS, Android, whichever Linux or BSD they're using, etc - Windows users, as I understand it, are even more isolated though the primary data ends up there too). None of those bother IANA at all (other than the few, like those on this list, who have a greater than normal interest.)

For the people who make those distributions, getting the data from IANA is entirely reasonable - for end users it is not. IANA is not funded to provide that level of service, and it was never part of our agreement with them (via the IESG) to provide that level of access. If it gets abused, they're quite likely to simply say "we're done" and delete the whole thing - if you have (or anyone has) users depending upon it, they would suffer, and there's nothing you could do about it (they would not even have to give any advance notice, though they probably would.)

  | The only visible change for zic is a flag that is
  | deprecated/discouraged.
In the zic output, and currently - but the source data should be correct; deliberately supplying lies just so some software doesn't break (over the long term) is intolerable.

  | We have had an agreement for many years that the zic input files
  | represent an API used by other systems,

Where exactly was that agreement created / documented? We have known that people have other translators of the zic input data format, but it has always been understood, I believe, and generally implemented, that when the format changed it was their responsibility to update to handle it (ie: that is what actually has happened).

  | While everyone accepts that the importance of positive SAVE
  | values was not previously established, it clearly is now.

I disagree. There is some broken software that is making a bogus assumption. That needs to be fixed. There is no fundamental reason why the SAVE value needs to be positive (or non-negative to allow for the pedantry in someone else's message), it isn't even difficult to deal with all possibilities - it just needs to be done.

And no-one, anywhere, should be assuming that the zic input format is fixed, it isn't, and never has been. It has changed several times, and will do again I have little doubt. If you had been taking any notice of what has been happening over the past 15 years (or better, back for the 30+ years this project has existed) you'd know that.

kre
Stephen Colebourne wrote:
To be clear here, as the maintainer of the project, you are effectively asking some of its key consumers to fork the project. I find that pretty astonishing.
No, he's not asking you to fork the project, although I understand why it seems that way to you. Analogously, you're not trying to hijack the project, although I have to say it seems that way to me.

As I understand it, the OpenJDK/CLDR/ICU/Joda projects are currently set up so that millions of Java end users are using the zic input files directly, without modification, and they're fetching the files directly from iana.org. That's a pretty constraining situation! And as long as things work that way, you're *never* going to be in favor of *any* change to the zic input file format, because you're never going to be able to assure yourself that all of those millions of users are going to upgrade their software any time soon.

There have already been multiple incompatibilities (how does OpenJDK/CLDR/ICU/Joda deal with more than two time variants per year, by the way?), and there will certainly be more. So migrating to a strategy that could loosen the currently too-tight coupling between the projects would be to the benefit of all concerned, don't you think?
On February 9 I wrote:
Assuming the current development version, the process could run something like the following, say.
The usual springtime DST uproar is behind us, so now is a good time to work out the kinks in this process. I installed into the GitHub development repository the attached proposed patches to automate an improved version of the process. Turning the crank on this automation results in the gzipped tarball cited below. This tarball does not use negative DST offsets, so it can be used by parsers that do not support negative DST offsets; otherwise this tarball is like the regular tarballs (the idea is that these will resume using negative DST offsets a la tzdb 2018b, this time for Africa/Windhoek, Europe/Dublin, and Europe/Prague). In short, please try out your tzdata parsers on this test tarball: https://web.cs.ucla.edu/~eggert/tz/test/tzdata2018d-22-g89da66c-rearguard.ta... It has an associated .asc signature as usual. Also, its parent directory has the corresponding files that we already generate, if you'd like to compare rearguard to main data.
On 2018-01-30 08:49, Paul Eggert proposed changes:
This is intended to provide a way to support both clients that require data to have only positive DST offsets, and clients that do not have this restriction.
What is meant here by "DST offset"? The term is not used nor defined in the description of zic input. Do you mean the values in the SAVE column of Rule lines in zic input? Probably not -- the SAVE values in zic input cannot be required to be positive as the value 00:00 must be allowed. So what property is guaranteed for the zic input pdstdata.zi? Michael Deckers.
Michael H Deckers via tz wrote:
What is meant here by "DST offset"? The term is not used nor defined in the description of zic input. Do you mean the values in the SAVE column of Rule lines in zic input? Probably not -- the SAVE values in zic input cannot be required to be positive as the value 00:00 must be allowed.
I did mean the SAVE values, but I was sloppy about using "positive" when "nonnegative" was intended. Proposed patch attached. Thanks for mentioning this.
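For anyone wanting to check a data file against that "nonnegative SAVE" property, here is a minimal sketch. It assumes the usual Rule line layout of zic input (NAME FROM TO - IN ON AT SAVE LETTER/S, so SAVE is the ninth whitespace-separated field) and accepts the abbreviated keyword forms that can appear in .zi files. The function name negative_save_rules is hypothetical, and the comment stripping is deliberately naive: it treats every '#' as the start of a comment.

```shell
# Hypothetical helper, not part of tzdb: print every Rule line whose
# SAVE field (column 9) starts with a minus sign.
negative_save_rules() {
  awk '
    { sub(/#.*/, "") }                       # naive comment removal
    $1 ~ /^(R|Ru|Rul|Rule)$/ && $9 ~ /^-/    # Rule line with negative SAVE
  ' "$@"
}
```

Running negative_save_rules pdstdata.zi should then print nothing if the file has the intended property, whereas a file using negative SAVE values would list the offending Rule lines.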
participants (7)
- Meno Hochschild
- Michael H Deckers
- Paul Eggert
- Paul G
- Robert Elz
- scs@eskimo.com
- Stephen Colebourne