Greetings,
I am not sure if this is technically an error from your perspective or not, but the latest 2024b DB does not load using the Azul ZIUpdater Tool.
It fails with the following message:
Failed: java.lang.Exception: Failed while parsing file '/tmp/ziupdater1990610874409301294343524276772419/srcdir/northamerica' on line 2633 'Rule Mexico 1931 only - April 30 0:00 1:00 D'
From looking at the source files this is the only occurrence of a full month string I could find; the rest are 3-letter abbreviations, e.g. Apr.
I thought I would let you know in case this was a mistake.
Regards
Chris Burke
This message may contain confidential and/or privileged information and is intended solely for the named recipient. Please notify Air New Zealand immediately if received in error and delete the message (do not distribute further). Views expressed are those of the sender and may not represent Air New Zealand. For more information on the Air New Zealand Group, visit us online at https://www.airnewzealand.com/
On 2024-09-05 17:33, Christopher Burke via tz wrote:
Greetings,
I am not sure if this is technically an error from your perspective or not, but the latest 2024b DB does not load using the Azul ZIUpdater Tool.
It fails with the following message:
Failed: java.lang.Exception: Failed while parsing file '/tmp/ziupdater1990610874409301294343524276772419/srcdir/northamerica' on line 2633 'Rule Mexico 1931 only - April 30 0:00 1:00 D'
From looking at the source files this is the only occurrence of a full month string I could find; the rest are 3 letter abbreviations, e.g. Apr.
Thanks, we're aware of that issue and it'll be fixed in the next TZDB release. In the meantime you can just stick with 2024a, or manually change "April" to "Apr". Also, could you please report it as a bug to the Azul folks? Month and day names can be spelled out in full or abbreviated in any unambiguous way, this has been documented for ages, and previous TZDB data used full names on occasion. Thanks.
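The stopgap of changing "April" to "Apr" by hand can also be scripted. The following is a minimal Python sketch, not part of TZDB or of Azul's tooling: the helpers `normalize_rule_line` and `normalize_file` are hypothetical, and the sketch assumes the usual Rule layout (NAME FROM TO - IN ON AT SAVE LETTER/S) with the keyword spelled out as `Rule`, as it is in the `northamerica` source file. It rewrites only the IN field and leaves tabs and other whitespace untouched.

```python
import re

# Spelled-out English month names mapped to the three-letter forms
# that the rest of the TZDB source files use (e.g. "Apr").
FULL_TO_ABBR = {
    "january": "Jan", "february": "Feb", "march": "Mar",
    "april": "Apr", "may": "May", "june": "Jun",
    "july": "Jul", "august": "Aug", "september": "Sep",
    "october": "Oct", "november": "Nov", "december": "Dec",
}

# Rule NAME FROM TO - IN ON AT SAVE LETTER/S: IN is field index 5.
IN_FIELD = 5


def normalize_rule_line(line: str) -> str:
    """Replace a spelled-out month in a Rule line's IN field with its
    three-letter abbreviation; any other line is returned unchanged."""
    # A capturing group makes re.split keep the separators, so tokens sit
    # at even indices and the original whitespace runs at odd indices.
    parts = re.split(r"(\s+)", line)
    idx = 2 * IN_FIELD
    if parts[0] == "Rule" and len(parts) > idx:
        abbr = FULL_TO_ABBR.get(parts[idx].lower())
        if abbr is not None:
            parts[idx] = abbr
    return "".join(parts)


def normalize_file(src_path: str, dst_path: str) -> None:
    """Write a copy of src_path with Rule-line month names abbreviated."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(normalize_rule_line(line))
```

Applied to the reported line, this yields `Rule Mexico 1931 only - Apr 30 0:00 1:00 D`. Zone continuation lines and compact files such as tzdata.zi (which abbreviates `Rule` to `R`) are deliberately out of scope for this sketch.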
Paul Eggert wrote:
Month and day names can be spelled out in full or abbreviated in any unambiguous way, this has been documented for ages,
Can you provide a pointer to the documentation for this? I’ve looked through the files in the distribution and through Bill Seymour’s “how-to” page, and can’t find where this is mentioned one way or the other. Apologies if you’ve provided this link before and I missed it.
-- Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
On 2024-09-06 10:46, Doug Ewell wrote:
Month and day names can be spelled out in full or abbreviated in any unambiguous way, this has been documented for ages,
Can you provide a pointer to the documentation for this?
The zic(8) man page's FILES section <https://man7.org/linux/man-pages/man8/zic.8.html#FILES> says the following:
"Names must be in English and are case insensitive. They appear in several contexts, and include month and weekday names and keywords such as maximum, only, Rolling, and Zone. A name can be abbreviated by omitting all but an initial prefix; any abbreviation must be unambiguous in context."
Wording about names and abbreviations has been present in the TZDB documentation, in some form or another, since release 2010j. To help avoid further confusion in this matter, I recently installed the following patch to give examples of month names and abbreviations, and this patch should appear in the next release:
https://github.com/eggert/tz/commit/96fa7b7dd4cf8bd415a50e0f6a190488fa3c8078
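The rule quoted from zic(8) amounts to a case-insensitive, unique-prefix lookup. Below is a minimal Python sketch of such a lookup; `lookup_name` is a hypothetical helper, and checking for an exact match before falling back to prefixes is a choice made in this sketch rather than something the quoted text spells out.

```python
from typing import Optional, Sequence

# English month names, in calendar order.
MONTH_NAMES = ("January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November",
               "December")


def lookup_name(word: str, table: Sequence[str]) -> Optional[str]:
    """Resolve WORD against TABLE as the zic(8) text describes: case is
    ignored, and any prefix is accepted if exactly one entry starts with it.
    Returns None for unknown or ambiguous words."""
    w = word.lower()
    if not w:
        return None
    # Sketch's choice: an exact (case-insensitive) match wins outright.
    for name in table:
        if name.lower() == w:
            return name
    # Otherwise WORD must be a prefix of exactly one entry.
    matches = [name for name in table if name.lower().startswith(w)]
    return matches[0] if len(matches) == 1 else None
```

With the month table, `April`, `APR`, `ap` and `JANU` all resolve, while `A` is rejected because it also begins August, and `ma` because it begins both March and May.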
Could I gently suggest that Postel's Law [1] applies here? What zic consumes can continue to be lenient. What TZDB produces should be strictly defined. In this case, the TZDB data should always be "Apr" and not "April".
Stephen
[1] https://en.wikipedia.org/wiki/Robustness_principle
On Fri, 6 Sept 2024 at 19:48, Paul Eggert via tz <tz@iana.org> wrote:
On 2024-09-06 10:46, Doug Ewell wrote:
Month and day names can be spelled out in full or abbreviated in any unambiguous way, this has been documented for ages,
Can you provide a pointer to the documentation for this?
The zic(8) man page's FILES section <https://man7.org/linux/man-pages/man8/zic.8.html#FILES> says the following: "Names must be in English and are case insensitive. They appear in several contexts, and include month and weekday names and keywords such as maximum, only, Rolling, and Zone. A name can be abbreviated by omitting all but an initial prefix; any abbreviation must be unambiguous in context."
Wording about names and abbreviations has been present in the TZDB documentation, in some form or another, since release 2010j. To help avoid further confusion in this matter, I recently installed the following patch to give examples of month names and abbreviations, and this patch should appear in the next release:
https://github.com/eggert/tz/commit/96fa7b7dd4cf8bd415a50e0f6a190488fa3c8078
On 2024-09-06 16:30, Stephen Colebourne via tz wrote:
Could I gently suggest that Postel's Law [1] applies here?
Already done yesterday on the TZDB side, via these two patches to the development repository:
https://github.com/eggert/tz/commit/926b507fa5c3192b1b68fab5910cbd3ba9377c97
https://github.com/eggert/tz/commit/7b6fb155cadd5e5ee70b55c2770e1bdd2f5d2a38
In the last couple of days I have been encouraging the other side of Postel's Law downstream too, as parsers should accept both full and abbreviated names.
Thanks. I note that the zic doc mentions accepting "lastsa" instead of "lastSat". Are there checks in place to ensure that TZDB only produces the normalized form of codes like these?
Stephen
On Sat, 7 Sept 2024 at 05:58, Paul Eggert <eggert@cs.ucla.edu> wrote:
On 2024-09-06 16:30, Stephen Colebourne via tz wrote:
Could I gently suggest that Postel's Law [1] applies here?
Already done yesterday on the TZDB side, via these two patches to the development repository:
https://github.com/eggert/tz/commit/926b507fa5c3192b1b68fab5910cbd3ba9377c97
https://github.com/eggert/tz/commit/7b6fb155cadd5e5ee70b55c2770e1bdd2f5d2a38
In the last couple of days I have been encouraging the other side of Postel's Law downstream too, as parsers should accept both full and abbreviated names.
On 2024-09-07 02:20, Stephen Colebourne via tz wrote:
the zic doc mentions accepting "lastsa" instead of "lastSat". Are there checks in place to ensure that TZDB only produces the normalized form of codes like these?
No, and there must be dozens of other normalization issues that are not checked. I haven't bothered putting in checks like that unless problems arise, partly due to lack of time and partly so that I don't have to worry about what the "normal form" should be.
As it happens, "lastSa" does appear in one of the TZDB tarballs, in tzdata.zi (attached). Any parser that can't handle "lastSa" really should be fixed.
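A parser that accepts `lastSa` needs only to strip the `last` prefix and resolve the remainder as an abbreviated weekday name. The Python sketch below shows one way to do that; the helper names are hypothetical, and comparing `last` case-insensitively is an assumption of this sketch, since the zic(8) wording quoted in the thread does not say whether a form like `LASTsun` is valid.

```python
from typing import Optional

WEEKDAYS = ("Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday")


def weekday_from_prefix(word: str) -> Optional[str]:
    """Case-insensitive lookup of an English weekday by unambiguous prefix."""
    w = word.lower()
    matches = [d for d in WEEKDAYS if w and d.lower().startswith(w)]
    return matches[0] if len(matches) == 1 else None


def parse_last_weekday(day: str) -> Optional[str]:
    """Parse an ON field such as lastSun or lastSa; return None otherwise.

    Assumption: "last" is compared case-insensitively here; the zic(8)
    text quoted in this thread does not settle whether that is allowed."""
    if day[:4].lower() != "last":
        return None
    return weekday_from_prefix(day[4:])
```

Because `S` begins both Sunday and Saturday, and `T` both Tuesday and Thursday, `lastS` and `lastT` are rejected while `lastSa`, `lastSun` and `lastTh` succeed.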
On Sat, 7 Sept 2024 at 16:15, Paul Eggert via tz <tz@iana.org> wrote:
On 2024-09-07 02:20, Stephen Colebourne via tz wrote:
the zic doc mentions accepting "lastsa" instead of "lastSat". Are there checks in place to ensure that TZDB only produces the normalized form of codes like these?
No, and there must be dozens of other normalization issues that are not checked. I haven't bothered putting in checks like that unless problems arise, partly due to lack of time and partly so that I don't have to worry about what the "normal form" should be.
As it happens, "lastSa" does appear in one of the TZDB tarballs, in tzdata.zi (attached). Any parser that can't handle "lastSa" really should be fixed.
I assume the "last" part of "lastSunday" does not count as a "name" (as in the paragraph of zic(8) that begins "Names must be in English and are case insensitive"), because that would imply lastSunday could be abbreviated to lSu. But can "last" contain uppercase letters, or does it have to be exactly "last"? My assumption from zic(8) is exactly "last".
I just realised today that my parser handles abbreviations properly but isn't always case insensitive. I'd like to make sure I'm handling "last" correctly at the same time as fixing the case handling.
Paul Eggert wrote:
The zic(8) man page's FILES section <https://man7.org/linux/man-pages/man8/zic.8.html#FILES> says the following:
OK, that is what I was looking for. Happily, zic.8.txt (the whimsically spaced plain-text version of zic.8.html) is in the code distribution, so it is not necessary to hit the Web to get this spec.
Wording about names and abbreviations has been present in the TZDB documentation, in some form or another, since release 2010j. To help avoid further confusion in this matter, I recently installed the following patch to give examples of month names and abbreviations, and this patch should appear in the next release:
https://github.com/eggert/tz/commit/96fa7b7dd4cf8bd415a50e0f6a190488fa3c8078
which includes:
| Month names may be abbreviated as mentioned previously;
| for example, January can appear as
| .q January ,
| .q JANU
| or
| .q Ja ,
| but not as
| .q j
| which would be ambiguous with July.
It’s actually more nuanced than that: “j” is also ambiguous with June, so “Ja” would be acceptable whereas “Ju” (same number of letters) would not.
I realize this might be construed as overexplaining, which has garnered a very bad name of late, so perhaps what is in the patch is best left as is.
-- Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
On 2024-09-07 10:40, Doug Ewell wrote:
It’s actually more nuanced than that: “j” is also ambiguous with June, so “Ja” would be acceptable whereas “Ju” (same number of letters) would not.
Yes, thanks, as it happens Robert Elz also pointed out that glitch to me privately, and I fixed it yesterday as per the attached patch.
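The ambiguity point can be checked mechanically by computing, for each month, the shortest prefix that no other month shares. Here is a small Python sketch under the case-insensitive prefix rule quoted from zic(8); the function name is hypothetical.

```python
from typing import Dict, Sequence

MONTH_NAMES = ("January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November",
               "December")


def shortest_unambiguous_prefixes(names: Sequence[str]) -> Dict[str, str]:
    """Map each name to its shortest prefix that, compared
    case-insensitively, begins no other name in NAMES."""
    lowered = [n.lower() for n in names]
    result: Dict[str, str] = {}
    for name, low in zip(names, lowered):
        for length in range(1, len(low) + 1):
            prefix = low[:length]
            if sum(other.startswith(prefix) for other in lowered) == 1:
                result[name] = name[:length]
                break
        else:
            # Every prefix, including the full name, begins another entry,
            # so only an exact match could select this name.
            result[name] = name
    return result
```

Over the twelve English month names this gives `Ja` for January but `Jun` and `Jul` for June and July, so both `j` and `Ju` are ambiguous, as noted above.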
participants (5)
- Christopher Burke
- Doug Ewell
- Jonathan Wakely
- Paul Eggert
- Stephen Colebourne