On Dec 22, 2020, at 10:59 PM, Deborah Goldsmith via tz <tz@iana.org> wrote:
> OK, I think I (mostly) figured it out. On Darwin (macOS) the default value of FS is “ “ (space).
On any Single UNIX Specification-compatible system, the default value of FS is space. To quote the awk page in The Open Group Base Specifications Issue 7, 2018 edition:

	https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html

	"FS Input field separator regular expression; a <space> by default."

Apple doesn't claim conformance to that (I've seen it referred to as "V7", which is more than a bit amusing...), but they do claim conformance to UNIX 03, and the UNIX 03 awk page:

	https://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html

says the same thing. That probably goes back to earlier versions - all the way back to V7 (Seventh Edition UNIX, not Issue 7 of the Single UNIX Specification), I'd bet.

The GNU Awk manual's section on default field splitting:

	https://www.gnu.org/software/gawk/manual/gawk.html#Default-Field-Splitting

says

	Fields are normally separated by whitespace sequences (spaces, TABs, and newlines), not by single spaces. Two spaces in a row do not delimit an empty field. The default value of the field separator FS is a string containing a single space, " ". If awk interpreted this value in the usual way, each space character would separate fields, so two spaces in a row would make an empty field between them. The reason this does not happen is that a single space as the value of FS is a special case—it is taken to specify the default manner of delimiting fields.

And the Single UNIX Specification awk page says:

	An extended regular expression can be used to separate fields by using the -F ERE option or by assigning a string containing the expression to the built-in variable FS. The default value of the FS variable shall be a single <space>. The following describes FS behavior:

	* If FS is a null string, the behavior is unspecified.

	* If FS is a single character:

		* If FS is <space>, skip leading and trailing <blank>s; fields shall be delimited by sets of one or more <blank>s.

		* Otherwise, if FS is any other character c, fields shall be delimited by each single occurrence of c.

	* Otherwise, the string value of FS shall be considered to be an extended regular expression. Each occurrence of a sequence matching the extended regular expression shall delimit fields.

so, again, FS = " " is a special case, meaning "one or more blanks separate fields".

The awk.h in Apple's awk is copyright by Lucent Technologies, which indicates that it's presumably an AT&T version that got open-sourced, probably the One True AWK.

I'm not sure which versions of AWK that leaves out, so

	"On Darwin (macOS) the default value of FS is “ “ (space)."

can probably be replaced by

	"in any version of AWK worthy of the name the default value of FS is " " (space)."

so that's not a difference between macOS and other OSes.
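
For anyone who wants to check the field-splitting rules quoted above on their own awk, here is a minimal sketch. The show_fields helper is a hypothetical name made up for this illustration (it is not part of awk or of tzcode); it just prints each field of each input line in brackets, and any arguments given to it are passed to awk ahead of the program. It assumes an awk that follows the quoted specification.

```shell
#!/bin/sh
# show_fields is a hypothetical helper for this illustration only.
# It prints every field of each input line wrapped in [ ], so that
# empty fields are visible. Arguments (e.g. -F,) are passed to awk.
show_fields() {
    awk "$@" '{ for (i = 1; i <= NF; i++) printf "[%s]", $i; print "" }'
}

# Default FS (a single space): leading and trailing blanks are skipped,
# and each run of blanks (spaces or tabs) is one delimiter.
printf '  a  b\tc  \n' | show_fields          # prints [a][b][c]

# FS is a single character other than space: every single occurrence
# delimits, so adjacent or trailing commas produce empty fields.
printf 'a,,b,\n' | show_fields -F,            # prints [a][][b][]

# FS is a bracket expression, i.e. an ERE matching exactly one space:
# the single-space special case no longer applies.
printf 'a  b\n' | show_fields -F'[ ]'         # prints [a][][b]
```

If the first call prints [a][b][c] on a given system, that awk is treating the default FS as the "one or more blanks" special case rather than as a literal single space.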
> I suspect that these failures will occur on any system, not just Darwin,

Probably, as per the above.

> but I don’t have access to a non-Darwin system with a working awk at the moment.
I have a large pile of VMs running Linux, Solaris 11, and various *BSDs (as well as macOS going back to Leopard!), so I can give it a try on several of them (all of them would be a bit tedious:

	$ ls -d ~/Documents/Virtual\ Machines/*.vmwarevm | wc -l
	55

but trying it on the most recent version of each major group of OSes, dumping both Ubuntu and Fedora into the "Linux" group, wouldn't be too bad).