New leapseconds.awk doesn't work on macOS
Hi, I just noticed that starting with 2019c, leapseconds.awk is producing a damaged leapseconds file on macOS, e.g.:
# All leap-seconds are Stationary (S) at the given UTC time.
# The correction (+ or -) is made at the given time, so in the unlikely
# event of a negative leap second, a line would look like this:
# Leap YEAR MON DAY 23:59:59 - S
# Typical lines look like this:
# Leap YEAR MON DAY 23:59:60 + S

# POSIX timestamps for the data in this file:
#updated -2208988800 (1900-01-01 00:00:00 UTC)
#expires -2208988800 (1900-01-01 00:00:00 UTC)
It turns out that having a regex in RS is a gawk feature and is not supported in Darwin awk. Commenting out the definition of RS in leapseconds.awk fixed the problem.

I don’t know how many customers of TZ will be using versions of awk that don’t support regexes in RS. I don’t know about FreeBSD, OpenBSD, etc.

Thanks,
Debbie
Would it make sense to just use tr to strip carriage returns from the file before feeding it to the awk script? Thanks, Debbie
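The tr idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `strip_cr` helper, the one-line sample input, and the inline awk program are hypothetical stand-ins, not the real leap-seconds.list or leapseconds.awk.

```shell
# Strip carriage returns before awk ever sees the data, so the awk
# script can keep the default RS (a single newline).
# strip_cr is a hypothetical helper name used only for this sketch.
strip_cr() {
    tr -d '\r'
}

# Illustrative input: one tab-separated data line with a CRLF ending.
# The inline awk program stands in for leapseconds.awk.
out=$(printf '2272060800\t10\t# 1 Jan 1972\r\n' | strip_cr | awk '{ print $2 ":" $NF }')
echo "$out"
```

Without the tr step, the last field would still carry the carriage return. The cost of this approach is an extra process and a change to whatever invokes the script rather than to the script itself.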
NetBSD already supported regular expressions in RS, and they upstreamed that to one-true-awk last year (https://github.com/onetrueawk/awk/commit/643a5a3dad633431c6ce8831944c23059a6... and https://github.com/onetrueawk/awk/commit/7cae39dfa53e17981990f649a2f6b4c1ba8...) so hopefully all the OSes not using gawk can unify at some point.

but, yeah, right now most awks don't have this ability.

(for Android builds we're actually using one-true-awk on both Linux and macOS so developers working on one can't break developers working on the other.)

On Fri, Jan 10, 2020 at 9:31 PM Deborah Goldsmith via tz <tz@iana.org> wrote:
Would it make sense to just use tr to strip carriage returns from the file before feeding it to the awk script?
Thanks, Debbie
On Jan 10, 2020, at 10:26 PM, enh via tz <tz@iana.org> wrote:
> NetBSD already supported regular expressions in RS, and they upstreamed that to one-true-awk last year

macOS:

  https://opensource.apple.com/source/awk/awk-24/

uses what I'm guessing, based on the Lucent copyrights, is the One True Awk:

  https://github.com/onetrueawk/awk

and I suspect most if not all of the *BSDs do so as well, but macOS seems to be using a 2007-vintage version. Perhaps they should update, especially to a version that has...

> (https://github.com/onetrueawk/awk/commit/643a5a3dad633431c6ce8831944c23059a6... and https://github.com/onetrueawk/awk/commit/7cae39dfa53e17981990f649a2f6b4c1ba8...) so hopefully all the OSes not using gawk can unify at some point.

...that change.

> but, yeah, right now most awks don't have this ability.

...so the tzcode should probably not depend on it.
Guy Harris wrote in <4EDFA3A5-DA50-41BE-A248-EB850CB3A581@alum.mit.edu>:
 |On Jan 10, 2020, at 10:26 PM, enh via tz <tz@iana.org> wrote:
 |> NetBSD already supported regular expressions in RS, and they
 |> upstreamed that to one-true-awk last year
 |
 |macOS:
 |
 |  https://opensource.apple.com/source/awk/awk-24/
 |
 |uses what I'm guessing, based on the Lucent copyrights, is the One \
 |True Awk:
 |
 |  https://github.com/onetrueawk/awk
 |
 |and I suspect most if not all of the *BSDs do so as well, but macOS \
 |seems to be using a 2007-vintage version.  Perhaps they should update, \

Debian systems use a historic mawk implementation worth testing against, as it has no gsub and does not have character classes (I have finally made a port for the hobby Linux distribution I use, to be able to do so easily). And Sun xpg4/bin/awk expands \ sequences twice:

-  gsub(/"/, "\\\\\"", LINE)
+  # Sun xpg4/bin/awk expands those twice:
+  #  Notice that backslash escapes are interpreted twice, once in
+  #  lexical processing of the string and once in processing the
+  #  regular expression.
+  i = "\""
+  gsub(/"/, "\\\\\"", i)
+  i = (i == "\"")
+  gsub(/"/, (i ? "\\\\\"" : "\134\134\""), LINE)

 |especially to a version that has...
 |
 |> (https://github.com/onetrueawk/awk/commit/643a5a3dad633431c6ce8831944c23\
 |> 059a6be309
 |> and https://github.com/onetrueawk/awk/commit/7cae39dfa53e17981990f649a2f\
 |> 6b4c1ba856112)
 |> so hopefully all the OSes not using gawk can unify at some point.
 |
 |...that change.
 |
 |> but, yeah, right now most awks don't have this ability.
 |
 |...so the tzcode should probably not depend on it.

On the other hand, in my experience packagers simply start adding build-time dependencies to work around such problems. I unfortunately forced them to use gawk on macOS and Debian in the past...
--End of <4EDFA3A5-DA50-41BE-A248-EB850CB3A581@alum.mit.edu>

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
Steffen Nurpmeso wrote in <20200111204437.3LtvV%steffen@sdaoden.eu>:
 |Guy Harris wrote in <4EDFA3A5-DA50-41BE-A248-EB850CB3A581@alum.mit.edu>:
 ||On Jan 10, 2020, at 10:26 PM, enh via tz <tz@iana.org> wrote:
 ||> NetBSD already supported regular expressions in RS, and they
 ||> upstreamed that to one-true-awk last year
 ...
 |Debian systems use a historic mawk implementation worth
 |testing against, as it has no gsub and does not have character

Correction. gsub it has.

--steffen
On 2020-01-10, at 23:26:15, enh via tz <tz@iana.org> wrote:
NetBSD already supported regular expressions in RS, and they upstreamed that to one-true-awk last year (https://github.com/onetrueawk/awk/commit/643a5a3dad633431c6ce8831944c23059a6... and https://github.com/onetrueawk/awk/commit/7cae39dfa53e17981990f649a2f6b4c1ba8...) so hopefully all the OSes not using gawk can unify at some point. but, yeah, right now most awks don't have this ability. (for Android builds we're actually using one-true-awk on both Linux and macOS so developers working on one can't break developers working on the other.)
POSIX rules. Why take a chance by depending on leading-edge features? From https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html#tag_20_0...

    ...
    RS
    ...
    If RS contains more than one character, the results are unspecified.
    ...
On Fri, Jan 10, 2020 at 9:31 PM Deborah Goldsmith via tz <tz@iana.org> wrote:
Would it make sense to just use tr to strip carriage returns from the file before feeding it to the awk script?
No need. A sub() rule fixes it compatibly. Patch attached. Tested on macOS Mojave with simulated NIST-style (CRLF) input.

-- gil
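For readers without the attachment, a sub() rule of the kind described might look like the sketch below. It is not the attached patch: the variable name `portable_awk`, the sample input, and the print rule are assumptions made for illustration. The idea is to leave RS at its default single newline, which POSIX specifies, and strip a trailing carriage return from each record.

```shell
# Portable CR handling: keep the default RS and remove a trailing
# carriage return from every record with sub().
# portable_awk is a hypothetical name used only in this sketch.
portable_awk='
{ sub(/\r$/, "") }   # drop the CR left over from a CRLF line ending
{ print $NF "|" }    # illustrative rule: show that the last field is clean
'

# Mixed input: the first line ends in CRLF, the second in plain LF.
out2=$(printf 'a b\r\nc d\n' | awk "$portable_awk")
echo "$out2"
```

Because sub() modifies $0, the record is split into fields again afterwards, so later rules see clean fields whether or not the input had CRLF endings.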
Thanks for the quick fix! Debbie
On Jan 11, 2020, at 8:30 AM, Paul Eggert <eggert@CS.UCLA.EDU> wrote:
On 1/10/20 7:32 PM, Deborah Goldsmith via tz wrote:
It turns out that having a regex in RS is a gawk feature and is not supported in Darwin awk.
Thanks for reporting the problem. I installed the attached patch to fix this portability bug. <0001-Fix-leapseconds.awk-portability-bug.patch>
participants (6)
- Deborah Goldsmith
- enh
- Guy Harris
- Paul Eggert
- Paul Gilmartin
- Steffen Nurpmeso