[PATCH 1/2] Fix zic parsing of abbreviated line codes
* NEWS: Document this.
* zic.8: Document more clearly that "Zone" etc. can be abbreviated.
* zic.c (line_codes): Remove, replacing with ...
(zi_line_codes, leap_line_codes): ... these new constants.
(infile): Use them to distinguish context more accurately.
Remove no-longer-applicable warning.
---
 NEWS  |  9 +++++++++
 zic.8 |  9 ++++++++-
 zic.c | 16 +++++++++-------
 3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/NEWS b/NEWS
index 0a76781..99f5d64 100644
--- a/NEWS
+++ b/NEWS
@@ -48,6 +48,15 @@ Unreleased, experimental changes
     Also, zic warns about the undocumented usage with a "last-"
     prefix, e.g., "last-Fri".
 
+    Similarly, zic now accepts the unambiguous abbreviation "L" for
+    "Link" in ordinary context and for "Leap" in leap-second context.
+    Conversely, zic no longer accepts non-prefixes such as "La" as
+    abbreviations for words like "Leap".
+
+    zic no longer accepts leap second lines in ordinary input, or
+    ordinary lines in leap second input.  Formerly, zic sometimes
+    warned about this undocumented usage and handled it incorrectly.
+
     Several minor changes have been made to the code to make it a
     bit easier to port to MS-Windows.  (Thanks to Kees Dekker for
     reporting the problems.)
diff --git a/zic.8 b/zic.8
index 6fee96d..ab95b08 100644
--- a/zic.8
+++ b/zic.8
@@ -147,7 +147,14 @@ Any line that is blank (after comment stripping) is ignored.
 Non-blank lines are expected to be of one of three types:
 rule lines, zone lines, and link lines.
 .PP
-Names (such as month names) must be in English and are case insensitive.
+Names must be in English and are case insensitive.
+They appear in several contexts, and include month and weekday names
+and keywords such as
+.BR "maximum" ,
+.BR "only" ,
+.BR "Rolling" ,
+and
+.BR "Zone" .
 A name can be abbreviated by omitting all but an initial prefix; any
 abbreviation must be unambiguous in context.
 .PP
diff --git a/zic.c b/zic.c
index 33cb4e7..ff71af6 100644
--- a/zic.c
+++ b/zic.c
@@ -298,10 +298,13 @@ struct lookup {
 static struct lookup const *	byword(const char * string,
 					const struct lookup * lp);
 
-static struct lookup const	line_codes[] = {
+static struct lookup const zi_line_codes[] = {
 	{ "Rule",	LC_RULE },
 	{ "Zone",	LC_ZONE },
 	{ "Link",	LC_LINK },
+	{ NULL,		0 }
+};
+static struct lookup const leap_line_codes[] = {
 	{ "Leap",	LC_LEAP },
 	{ NULL,		0}
 };
@@ -1114,6 +1117,8 @@ infile(const char *name)
 		} else if (wantcont) {
 			wantcont = inzcont(fields, nfields);
 		} else {
+			struct lookup const *line_codes
+			  = name == leapsec ? leap_line_codes : zi_line_codes;
 			lp = byword(fields[0], line_codes);
 			if (lp == NULL)
 				error(_("input line of unknown type"));
@@ -1130,11 +1135,7 @@ infile(const char *name)
 				wantcont = false;
 				break;
 			case LC_LEAP:
-				if (name != leapsec)
-				  warning(_("%s: Leap line in non leap"
-					    " seconds file %s"),
-						progname, name);
-				else	inleap(fields, nfields);
+				inleap(fields, nfields);
 				wantcont = false;
 				break;
 			default:	/* "cannot happen" */
@@ -1586,7 +1587,8 @@ rulesub(struct rule *rp, const char *loyearp, const char *hiyearp,
 	** Day work.
 	** Accept things such as:
 	**	1
-	**	last-Sunday
+	**	lastSunday
+	**	last-Sunday (undocumented; warn about this)
 	**	Sun<=20
 	**	Sun>=7
 	*/
-- 
2.9.4
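The behavior documented in the NEWS hunk above can be illustrated with a small standalone sketch. This is not zic's byword(); it is a simplified unique-prefix lookup written for illustration, and the names is_prefix, lookup_code, struct entry and the enum values are hypothetical. It assumes that an abbreviation must be a nonempty, case-insensitive initial prefix, and that the table to search is chosen by input context, as the patch does with zi_line_codes and leap_line_codes.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical codes for this sketch; zic.c defines its own LC_* values. */
enum line_code { LC_UNKNOWN = -1, LC_RULE, LC_ZONE, LC_LINK, LC_LEAP };

struct entry {
    const char *word;
    enum line_code code;
};

/* One table per input context, mirroring the split made by the patch. */
static const struct entry zi_codes[] = {
    { "Rule", LC_RULE }, { "Zone", LC_ZONE }, { "Link", LC_LINK },
    { NULL, LC_UNKNOWN }
};
static const struct entry leap_codes[] = {
    { "Leap", LC_LEAP },
    { NULL, LC_UNKNOWN }
};

/* Return 1 if abbr is a nonempty, case-insensitive prefix of word. */
static int is_prefix(const char *abbr, const char *word)
{
    if (*abbr == '\0')
        return 0;
    for (; *abbr != '\0'; abbr++, word++) {
        if (*word == '\0'
            || tolower((unsigned char)*abbr) != tolower((unsigned char)*word))
            return 0;
    }
    return 1;
}

/* Resolve a possibly abbreviated line code in the given context.
   Returns LC_UNKNOWN if nothing matches or the match is ambiguous. */
static enum line_code lookup_code(const char *field, int in_leap_file)
{
    const struct entry *table = in_leap_file ? leap_codes : zi_codes;
    const struct entry *found = NULL;
    const struct entry *e;

    for (e = table; e->word != NULL; e++) {
        if (!is_prefix(field, e->word))
            continue;
        if (found != NULL)
            return LC_UNKNOWN;  /* two words share this prefix */
        found = e;
    }
    return found != NULL ? found->code : LC_UNKNOWN;
}
```

With these tables, lookup_code("L", 0) resolves to the Link code and lookup_code("L", 1) to the Leap code, while "La" is rejected in both contexts and "Leap" is rejected in ordinary input.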
Without this change, tzdata source's 669742 bytes (180832 bytes
compressed) shrank to tzdata.zi's 123627 bytes (22247 bytes
compressed).  With this change, tzdata.zi is 106273 bytes (21203
bytes compressed).  That is, the change's data compression ratio
is about 1.16 (1.05 for compressed data), and the total data
compression ratio of tzdata.zi is now about 6.3 (8.5 for
compressed data).  These figures assume lzip -9 compression.
* Makefile (tzdata.zi): Do not set the Awk PACKRATDATA var,
as zishrink.awk now handles duplicates directly.
* Makefile (zonenames, $(TZS_NEW)):
* checklinks.awk: Work even when line codes are abbreviated.
* zishrink.awk (paw_through_packratdata): Remove; no longer needed.
Caller removed.
(gen_rule_name, output_saved_lines): New functions.
(process_input_line): Use it to abbreviate rule names.
Abbreviate line codes and "max" too.  Save output lines instead
of printing them immediately, so that later output lines can
supersede earlier.
(END): Output saved lines.
---
 Makefile       |  10 +++---
 checklinks.awk |   4 +--
 zishrink.awk   | 105 +++++++++++++++++++++++++++++++++++++++++++--------------
 3 files changed, 85 insertions(+), 34 deletions(-)

diff --git a/Makefile b/Makefile
index 3aa9e04..c3ef931 100644
--- a/Makefile
+++ b/Makefile
@@ -478,9 +478,7 @@ version: $(VERSION_DEPS)
 
 # This file can be tailored by setting BACKWARD, PACKRATDATA, etc.
 tzdata.zi:	$(TZDATA_ZI_DEPS)
-		LC_ALL=C $(AWK) -v PACKRATDATA='$(PACKRATDATA)' \
-		  -f zishrink.awk \
-		  $(TDATA) $(PACKRATDATA) >$@.out
+		LC_ALL=C $(AWK) -f zishrink.awk $(TDATA) $(PACKRATDATA) >$@.out
 		mv $@.out $@
 
 version.h:	version
@@ -558,11 +556,11 @@ zones: $(REDO)
 $(TZS_NEW):	tzdata.zi zdump zic
 		mkdir -p tzs.dir
 		$(zic) -d tzs.dir tzdata.zi
-		$(AWK) '/^Link/{print $$1 "\t" $$2 "\t" $$3}' \
+		$(AWK) '/^L/{print "Link\t" $$2 "\t" $$3}' \
 		   tzdata.zi | LC_ALL=C sort >$@.out
 		wd=`pwd` && \
 		zones=`$(AWK) -v wd="$$wd" \
-			'/^Zone/{print wd "/tzs.dir/" $$2}' tzdata.zi \
+			'/^Z/{print wd "/tzs.dir/" $$2}' tzdata.zi \
 			 | LC_ALL=C sort` && \
 		./zdump -i -c $(TZS_YEAR) $$zones >>$@.out
 		sed 's,^TZ=".*tzs\.dir/,TZ=",' $@.out >$@.sed.out
@@ -826,7 +824,7 @@ typecheck:
 		done
 
 zonenames:	tzdata.zi
-		@$(AWK) '/^Zone/ { print $$2 } /^Link/ { print $$3 }' tzdata.zi
+		@$(AWK) '/^Z/ { print $$2 } /^L/ { print $$3 }' tzdata.zi
 
 asctime.o:	private.h tzfile.h
 date.o:		private.h
diff --git a/checklinks.awk b/checklinks.awk
index 5b3e157..f309010 100644
--- a/checklinks.awk
+++ b/checklinks.awk
@@ -9,7 +9,7 @@ BEGIN {
     Zone = "\n"
 }
 
-/^Zone/ {
+/^Z/ {
     if (defined[$2]) {
         if (defined[$2] == Zone) {
             printf "%s: Zone has duplicate definition\n", $2
@@ -21,7 +21,7 @@ BEGIN {
     defined[$2] = Zone
 }
 
-/^Link/ {
+/^L/ {
     if (defined[$3]) {
         if (defined[$3] == Zone) {
             printf "%s: Link with same name as Zone\n", $3
diff --git a/zishrink.awk b/zishrink.awk
index 235b8f3..2c05a8d 100644
--- a/zishrink.awk
+++ b/zishrink.awk
@@ -6,33 +6,52 @@
 # 'zic' should treat this script's output as if it were identical to
 # this script's input.
 
-function paw_through_packratdata(line)
+
+# Return a new rule name.
+# N_RULE_NAMES keeps track of how many rule names have been generated.
+
+function gen_rule_name(alphabet, base, rule_name, n, digit)
 {
-  if (PACKRATDATA) {
-    while (0 < (getline line <PACKRATDATA)) {
-      if (split(line, field)) {
-        if (field[1] == "Zone") packrat_zone[field[2]] = 1
-        if (field[1] == "Link") packrat_zone[field[3]] = 1
-      }
-    }
-    close(PACKRATDATA)
-  }
+  alphabet = ""
+  alphabet = alphabet "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+  alphabet = alphabet "abcdefghijklmnopqrstuvwxyz"
+  alphabet = alphabet "!$%&'()*+,./:;<=>?@[\\]^_`{|}~"
+  base = length(alphabet)
+  rule_name = ""
+  n = n_rule_names++
+
+  do {
+    n -= rule_name && n <= base
+    digit = n % base
+    rule_name = substr(alphabet, digit + 1, 1) rule_name
+    n = (n - digit) / base
+  } while (n);
+
+  return rule_name
 }
 
-function process_input_line(line, field, end)
+# Process an input line and save it for later output.
+
+function process_input_line(line, field, end, i, n, startdef)
 {
   # Remove comments, normalize spaces, and append a space to each line.
   sub(/#.*/, "", line)
   line = line " "
   gsub(/[[:space:]]+/, " ", line)
 
+  # Abbreviate keywords.  Do not abbreviate "Link" to just "L",
+  # as pre-2017c zic erroneously diagnoses "Li" as ambiguous.
+  sub(/^Link /, "Li ", line)
+  sub(/^Rule /, "R ", line)
+  sub(/^Zone /, "Z ", line)
+
   # SystemV rules are not needed.
-  if (line ~ /^Rule SystemV /) next
+  if (line ~ /^R SystemV /) next
 
   # Replace FooAsia rules with the same rules without "Asia", as they
   # are duplicates.
   if (match(line, /[^ ]Asia /)) {
-    if (line ~ /^Rule /) next
+    if (line ~ /^R /) next
     line = substr(line, 1, RSTART) substr(line, RSTART + 5)
   }
 
@@ -53,7 +72,10 @@ function process_input_line(line, field, end)
     line = substr(line, 1, end - 3) substr(line, end - 1)
   }
 
-  # Abbreviate "only" and month names.
+  # Abbreviate "max", "only" and month names.
+  # Do not abbreviate "min", as pre-2017c zic erroneously diagnoses "mi"
+  # as ambiguous.
+  gsub(/ max /, " ma ", line)
   gsub(/ only /, " o ", line)
   gsub(/ Jan /, " Ja ", line)
   gsub(/ Feb /, " F ", line)
@@ -78,26 +100,57 @@ function process_input_line(line, field, end)
   # Remove unnecessary trailing " Ja" (for January).
   sub(/ Ja$/, "", line)
 
-  # Output lines unless they are later overridden in PACKRATDATA.
-  if (line ~ /^[LRZ]/) {
-    overridden = 0
-    if (FILENAME != PACKRATDATA) {
-      split(line, field)
-      if (field[1] == "Zone")
-        overridden = packrat_zone[field[2]]
-      else if (field[1] == "Link" && packrat_zone[field[3]])
-        next
+  n = split(line, field)
+
+  # Abbreviate rule names.
+  i = field[1] == "Z" ? 4 : field[1] == "Li" ? 0 : 2
+  if (i && field[i] ~ /^[^-+0-9]/) {
+    if (!rule[field[i]])
+      rule[field[i]] = gen_rule_name()
+    field[i] = rule[field[i]]
+  }
+
+  # If this zone supersedes an earlier one, delete the earlier one
+  # from the saved output lines.
+  startdef = ""
+  if (field[1] == "Z")
+    zonename = startdef = field[2]
+  else if (field[1] == "Li")
+    zonename = startdef = field[3]
+  else if (field[1] == "R")
+    zonename = ""
+  if (startdef) {
+    i = zonedef[startdef]
+    if (i) {
+      do
+        output_line[i - 1] = ""
+      while (output_line[i++] ~ /^[-+0-9]/);
     }
   }
-  if (!overridden)
-    print line
+  zonedef[zonename] = nout + 1
+
+  # Save the line for later output.
+  line = field[1]
+  for (i = 2; i <= n; i++)
+    line = line " " field[i]
+  output_line[nout++] = line
+}
+
+function output_saved_lines(i)
+{
+  for (i = 0; i < nout; i++)
+    if (output_line[i])
+      print output_line[i]
 }
 
 BEGIN {
   print "# This zic input file is in the public domain."
-  paw_through_packratdata()
 }
 
 /^[[:space:]]*[^#[:space:]]/ {
   process_input_line($0)
 }
+
+END {
+  output_saved_lines()
+}
-- 
2.9.4
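For readers who do not read awk, the naming scheme in gen_rule_name above can be transliterated into C roughly as follows. This is only a sketch: the function name rule_name_for and its buffer interface are hypothetical, and the counter is passed as a parameter instead of being kept in a global like n_rule_names. The arithmetic is meant to follow the awk loop step by step, using the same 81-character alphabet.

```c
#include <stddef.h>

/* Same alphabet as the awk function: 26 + 26 + 29 = 81 characters. */
static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "!$%&'()*+,./:;<=>?@[\\]^_`{|}~";

/* Write the name of the n-th generated rule (counting from 0) into buf.
   Return buf, or NULL if buf is too small.  Hypothetical helper. */
static char *rule_name_for(unsigned long n, char *buf, size_t bufsize)
{
    unsigned long base = sizeof alphabet - 1;
    unsigned long digit;
    char tmp[32];               /* digits, least significant first */
    size_t len = 0, i;

    do {
        /* Mirrors awk's "n -= rule_name && n <= base": after the first
           digit, shift by one so that two-character names start at "AA". */
        if (len > 0 && n <= base)
            n--;
        digit = n % base;
        if (len == sizeof tmp)
            return NULL;
        tmp[len++] = alphabet[digit];
        n = (n - digit) / base;
    } while (n != 0);

    if (bufsize < len + 1)
        return NULL;
    /* The awk code prepends each digit; here the digits are reversed. */
    for (i = 0; i < len; i++)
        buf[i] = tmp[len - 1 - i];
    buf[len] = '\0';
    return buf;
}
```

Under this scheme the first 81 rules get one-character names ("A" through "~"), and index 81 becomes "AA", so frequently repeated rule names in Rule and Zone lines shrink to one or two characters.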
If you maintain a zoneinfo parser, I suggest testing it on the attached file tzdata.zi, which I built automatically by running 'make tzdata.zi'. This text file is in standard (albeit abbreviated) zic format and contains all the default tzdata. Further shrinking of tzdata.zi can be done once we start assuming tzcode 2015f and later, so that tzdata.zi can use "%z" instead of spelling out numeric abbreviations.
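For parser maintainers, the "%z" expansion mentioned above could be handled along these lines. This is a sketch based on the zic.8 description of "%z" (the UT offset as +hh, +hhmm, or +hhmmss, using the shortest form that loses no information, with "-" for offsets west of UT); the function name format_numeric_abbr is hypothetical and the code is not taken from zic.c.

```c
#include <stdio.h>
#include <stdlib.h>

/* Format a UT offset, given in seconds east of UT, as a numeric
   abbreviation in the shortest of the forms +hh, +hhmm, +hhmmss.
   Returns what snprintf returns.  Hypothetical helper for illustration. */
static int format_numeric_abbr(long utoff, char *buf, size_t bufsize)
{
    char sign = utoff < 0 ? '-' : '+';
    long a = labs(utoff);
    long hh = a / 3600;
    long mm = a / 60 % 60;
    long ss = a % 60;

    if (ss != 0)
        return snprintf(buf, bufsize, "%c%02ld%02ld%02ld", sign, hh, mm, ss);
    if (mm != 0)
        return snprintf(buf, bufsize, "%c%02ld%02ld", sign, hh, mm);
    return snprintf(buf, bufsize, "%c%02ld", sign, hh);
}
```

For example, an offset of 19800 seconds formats as "+0530" and -10800 seconds as "-03".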
participants (1)
-
Paul Eggert