Doubts about a typo fix
Hi Paul,

I don't understand the commit shown below. It breaks the '-' symbol, which now is a hyphen. See the table mentioned yesterday:

┌─────────────────────────────────────────────────────────────────────┐
│Keycap  Appearance and meaning    Special character and meaning      │
├─────────────────────────────────────────────────────────────────────┤
│"       "  neutral double quote   \[dq]        neutral double quote  │
│'       ’  closing single quote   \[aq]        neutral apostrophe    │
│-       ‐  hyphen                 \- or \[-]   minus sign/Unix dash  │
│\       (escape character)        \e or \[rs]  reverse solidus       │
│^       ˆ  modifier circumflex    \(ha         circumflex/caret/“hat”│
│`       ‘  opening single quote   \(ga         grave accent          │
│~       ˜  modifier tilde         \(ti         tilde                 │
└─────────────────────────────────────────────────────────────────────┘

So, at least, it should be (but I believe the initial code was correct):

.q "zic \*\-r @$(date +%s)"

However, I wonder what that \* is intending to do there (I can see no difference in my screen with or without it).

Cheers,

Alex

---
commit 918e10e8963b3c0d38d3b5fb8ec9cf08ecd03757
Author: Paul Eggert <eggert@cs.ucla.edu>
Date:   Tue Jul 12 06:26:53 2022 -0700

    * zic.8: fix minus typo

diff --git a/zic.8 b/zic.8
index 0cd0781e..e8816e5b 100644
--- a/zic.8
+++ b/zic.8
@@ -145,7 +145,7 @@ .SH OPTIONS
 31-bit signed integers.
 On platforms with GNU
 .BR date ,
-.q "zic \-r @$(date +%s)"
+.q "zic \*-r @$(date +%s)"
 omits data intended for past timestamps.
 Although this option typically reduces the output file's size,
 the size can increase due to the need to represent the timestamp range

-- 
<http://www.alejandro-colomar.es/>
Hi Alex, I'm not Paul but I think I can address this item anyway. At 2022-11-23T20:31:22+0100, Alejandro Colomar wrote:
I don't understand the commit shown below.
@@ -145,7 +145,7 @@ .SH OPTIONS
 31-bit signed integers.
 On platforms with GNU
 .BR date ,
-.q "zic \-r @$(date +%s)"
+.q "zic \*-r @$(date +%s)"
 omits data intended for past timestamps.
 Although this option typically reduces the output file's size,
 the size can increase due to the need to represent the timestamp range
It breaks the '-' symbol, which now is a hyphen. See the table mentioned yesterday: [...] So, at least, it should be (but I believe the initial code was correct):
.q "zic \*\-r @$(date +%s)"
However, I wonder what that \* is intending to do there (I can see no difference in my screen with or without it).
Your second point addresses your first, because the zic(8) man page does something old-fashioned: it defines a string for the minus sign. On my system, the page has this in a sort of prologue:

.ie \n(.g \{\
. ds : \:
. ds - \f(CW-\fP
.\}
.el \{\
. ds :
. ds - \-
.\}

You see no diagnostic--I expect you have warnings turned on--because the page has defined a string named `-`, and the *roff input `\*-` interpolates a string named `-`.

Strictly, this string definition should be updated to use the font's minus sign even if the formatter is groff (the `.g` register interpolates a true value).

.ie \n(.g \{\
. ds : \:
. ds - \f(CW\-\fP
.\}

Most people won't see a difference because in groff 1.22.4 (and earlier releases going back to, I think, 2009) the man(7) macro package remaps the hyphen to the minus sign on the 'utf8' output device. This will be changing in groff 1.23 to improve consistency with man page rendering on typesetters.[1] Workarounds are documented.[2]

I also note that "CW" is an old, AT&T device-independent troff-compatible font name.[3] groff's preferred name for this face is "CR", because for the past couple of decades a monospace font (often Courier) has generally been available in all four styles (roman, oblique, bold, and bold-oblique).

All of that said, I wouldn't switch to a monospace font just to render a dash; not if groff is the formatter. Paul is more of a battle-scarred veteran than I am, so there may be a good reason to define this string on proprietary Unix systems--tzdata has to be _really_ portable--but on any system using groff or Heirloom Doctools troff, I can't think of one.[4]

mandoc maintainer Ingo Schwarze and I both recommend against performing string definitions, or interpolating strings, in man pages. So I would see if it's feasible to get away with dropping the definition and use of a `-` string altogether. But if not, there's no _validity_ problem with what Paul has.
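To illustrate why `\*-` renders differently depending on the formatter, here is a simplified Python model of *roff string interpolation. It is not how groff actually parses input, and the names interpolate, groff_strings, and other_strings are hypothetical. It handles the one-character `\*x`, two-character `\*(xx`, and groff long-name `\*[name]` forms and applies them to the two zic(8) definitions quoted above.

```python
import re

# Simplified model (not groff's parser) of string interpolation escapes:
#   \*x      one-character name, so \*- names the string "-"
#   \*(xx    two-character name
#   \*[name] groff-style long name
_STRING_ESCAPE = re.compile(r"\\\*(?:\[([^\]]*)\]|\((..)|(.))")


def interpolate(text, strings):
    """Replace string escapes with their definitions.

    Undefined names interpolate as empty text in this model.
    """
    def expand(match):
        name = match.group(1) or match.group(2) or match.group(3)
        return strings.get(name, "")
    return _STRING_ESCAPE.sub(expand, text)


# The two branches of the zic(8) prologue:
groff_strings = {"-": r"\f(CW-\fP"}  # .ds - \f(CW-\fP  (\n(.g true)
other_strings = {"-": r"\-"}         # .ds - \-          (otherwise)

print(interpolate(r"zic \*-r @$(date +%s)", groff_strings))
print(interpolate(r"zic \*-r @$(date +%s)", other_strings))
```

In other words, `\*` is the interpolation escape and `-` is the string's name, so `\*-r` becomes the string's contents followed by a literal `r`.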
Regards,
Branden

[1] https://git.savannah.gnu.org/cgit/groff.git/tree/NEWS?id=23ffa46c8c951fec1d2...
[2] https://git.savannah.gnu.org/cgit/groff.git/tree/PROBLEMS?id=23ffa46c8c951fe...
[3] https://github.com/n-t-roff/DWB3.3/tree/master/text/devnroff
[4] I don't know of anyone using neatroff to render man pages, but that may simply be because I haven't seen them speak up about it.
Thanks for the info about groff. You're right, tzdb man pages are supposed to be portable to both groff and traditional troff. For the latter I test with /usr/bin/nroff and /usr/bin/troff on Solaris 10, which is the oldest troff I know that is still supported. On 2022-11-23 13:40, G. Branden Robinson wrote:
Strictly, this string definition should be updated to use the font's minus sign even if the formatter is groff (the `.g` register interpolates a true value).
.ie \n(.g \{\
. ds : \:
. ds - \f(CW\-\fP
.\}
If we did that, Groff would set a source string like "\*-\*-help" as "−−help", with two instances of U+2212 MINUS SIGN instead of U+002D HYPHEN-MINUS. Therefore people couldn't cut and paste code examples out of HTML or PDF, and into the shell. "\f(CW-\fP" is used instead of plain "-" because when the output is PDF, it is more clearly visible to humans as a hyphen-minus instead of as a hyphen (U+2010 HYPHEN).
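To make the cut-and-paste concern concrete: a pasted command is only safe for the shell if every option dash is U+002D. Below is a minimal Python sketch that reports lookalikes such as U+2212 MINUS SIGN or U+2010 HYPHEN. The helper name suspicious_chars is hypothetical, and flagging every non-ASCII character is a deliberately blunt heuristic.

```python
import unicodedata


def suspicious_chars(command):
    """Return (offset, code point, Unicode name) for each non-ASCII character.

    Heuristic: shell syntax is ASCII, so any non-ASCII character in a pasted
    command (for example U+2212 where U+002D was meant) deserves a look.
    """
    found = []
    for offset, ch in enumerate(command):
        if ord(ch) > 0x7F:
            found.append((offset, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return found


for text in ["zic --help", "zic \u2212\u2212help"]:
    print(repr(text), suspicious_chars(text) or "OK for the shell")
```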
Most people won't see a difference because in groff 1.22.4 (and earlier releases going back to, I think, 2009) the man(7) macro package remaps the hyphen to the minus sign on the 'utf8' output device.
I noticed the abovementioned problem with PDF output, and I still see it with groff 1.22.4. I see a different issue with groff 1.22.4 on Ubuntu 22.10: I cannot easily see the difference between "\f(CR-\fP" and "\f(CR\-\fP" on output to PDF. If I cut from the output PDF and paste into Emacs or the terminal, the former is indeed U+002D and the latter U+2212, and the difference is readily visible in Emacs or the terminal; but it's not readily visible in the PDF. However, this glitch is not a serious issue for man pages since examples always contain hyphen-minuses, not minus signs, so I didn't worry about it. I assume that it's yet another font thing, since the problem doesn't occur in the default Roman font.
I also note that "CW" is an old, AT&T device-independent troff-compatible font name.[3] groff's preferred name for this face is "CR", because for the past couple of decades a monospace font (often Courier) has generally been available in all four styles (roman, oblique, bold, and bold-oblique).
Thanks, I didn't know that was preferred. I installed the attached patch into the tzdb development repository.
Paul Eggert via tz <tz@iana.org> writes:
Thanks for the info about groff. You're right, tzdb man pages are supposed to be portable to both groff and traditional troff. For the latter I test with /usr/bin/nroff and /usr/bin/troff on Solaris 10, which is the oldest troff I know that is still supported.
[...]
"\f(CW-\fP" is used instead of plain "-" because when the output is PDF, it is more clearly visible to humans as a hyphen-minus instead of as a hyphen (U+2010 HYPHEN).
You have to be very careful with the combination of \f(CW and \fP on Solaris 10 nroff, and I suspect the construct you are using has nascent bugs. \f(CW doesn't produce a font change on Solaris 2.6 with nroff, so if you write something like:

\fBsomething\fP \f(CW-\fP something else

you will discover that "something else" is in bold because the second \fP reverts to the "previous" font, which nroff thinks is \fB because \f(CW was ignored. (Just tested now on a Solaris 10 host.) Pod::Man has fairly elaborate workarounds for this bug.
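A toy Python model of the failure Russ describes, under two stated assumptions: `\fP` swaps the current and previous fonts, and a formatter silently ignores a font name it does not know. The function render_fonts and the font sets are illustrative, not taken from any troff implementation.

```python
def render_fonts(tokens, known_fonts):
    r"""Toy model of *roff font escapes; returns a list of (font, text) runs.

    \fX selects font X and remembers the font it replaced; \fP swaps back to
    the remembered font. An escape naming a font the formatter does not know
    is dropped without changing any state, modeling Solaris nroff and \f(CW.
    """
    current, previous = "R", "R"
    runs = []
    for tok in tokens:
        if tok == r"\fP":
            # Return to the previous font; the font we leave becomes "previous".
            current, previous = previous, current
        elif tok.startswith(r"\f"):
            # \f(XX names a two-character font, \fX a one-character font.
            name = tok[3:5] if tok.startswith(r"\f(") else tok[2:3]
            if name in known_fonts:
                current, previous = name, current
        else:
            runs.append((current, tok))
    return runs


tokens = [r"\fB", "something", r"\fP", " ", r"\f(CW", "-", r"\fP", " something else"]

# Formatter without CW: the ignored \f(CW leaves "previous" set to B.
print(render_fonts(tokens, {"R", "I", "B"}))
# Formatter with CW: the second \fP returns to roman as intended.
print(render_fonts(tokens, {"R", "I", "B", "CW"}))
```

The model shows why the bug is order-dependent: the stray bold only appears when a font change that was silently dropped sits between two `\fP` escapes.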
I also note that "CW" is an old, AT&T device-independent troff-compatible font name.[3] groff's preferred name for this face is "CR", because for the past couple of decades a monospace font (often Courier) has generally been available in all four styles (roman, oblique, bold, and bold-oblique).
Thanks, I didn't know that was preferred. I installed the attached patch into the tzdb development repository.
Just be warned that \f(CR is not a valid font name in all *roff implementations, which is why Pod::Man uses \f(CW by default. Not sure how much you care. (And, to be honest, not sure how much anyone should care about any implementations other than groff and mandoc these days.) -- Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>
On 2022-11-25 19:20, Russ Allbery wrote:
You have to be very careful with the combination of \f(CW and \fP on Solaris 10 nroff
That should be OK, as \f(CW - which is now \f(CR - is used only if \n(.g is nonzero, i.e., only if it's groff and not traditional troff. I toyed with using \f[CW] instead of \f(CW to underscore that it's groff-specific. However, that might be overkill given the number of non-*roff programs that read these files.
Hi Paul, At 2022-11-25T18:18:46-0800, Paul Eggert wrote:
Thanks for the info about groff. You're right, tzdb man pages are supposed to be portable to both groff and traditional troff. For the latter I test with /usr/bin/nroff and /usr/bin/troff on Solaris 10, which is the oldest troff I know that is still supported.
I'm curious to know what that support looks like. Is there evidence of any _development_?
If we did that, Groff would set a source string like "\*-\*-help" as "−−help", with two instances of U+2212 MINUS SIGN instead of U+002D HYPHEN-MINUS. Therefore people couldn't cut and paste code examples out of HTML or PDF, and into the shell.
This hasn't been true for PDFs produced by groff for about 10 years.[1][2] You can copy a U+2212 minus sign and it will paste as a U+002D.
"\f(CW-\fP" is used instead of plain "-" because when the output is PDF, it is more clearly visible to humans as a hyphen-minus instead of as a hyphen (U+2010 HYPHEN).
Okay. It's a shame that's necessary.
Most people won't see a difference because in groff 1.22.4 (and earlier releases going back to, I think, 2009) the man(7) macro package remaps the hyphen to the minus sign on the 'utf8' output device.
I noticed the abovementioned problem with PDF output, and I still see it with groff 1.22.4.
Some distributors do violence to the man.local file. But I am not a PDF expert; for this I'll have to turn, as I often do, to Deri James, who also wrote the gropdf output driver.

Deri, what's a good way to root-cause the issue Paul describes? If I prepare the following document:

$ cat EXPERIMENTS/minus-and-hyphen.man
.TH foo 1 2022-11-25 "groff test suite"
.SH Name
foo \- frobnicate a bar
.SH Description
Copy and paste me: foo\-bar-baz.

and render it with "groff -Tpdf -man" using either groff 1.22.4 or groff Git, then when I copy-and-paste "foo-bar-baz." from the document to a shell prompt, I get this:

$ echo foo-bar-baz. | od -c
0000000   f   o   o   -   b   a   r   -   b   a   z   .  \n
0000015
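For comparison, a rough Python analogue of the `od -c` check shows what a pasted U+2212 MINUS SIGN would look like at the byte level: three UTF-8 bytes instead of the single `-`. The helper name od_c is hypothetical, and its spacing differs from real od output.

```python
def od_c(text):
    """Rough analogue of `echo TEXT | od -c`: show the UTF-8 bytes,
    printable ASCII as characters and everything else in octal."""
    data = (text + "\n").encode("utf-8")
    cells = []
    for byte in data:
        if byte == 0x0A:
            cells.append(r"\n")
        elif 0x20 <= byte < 0x7F:
            cells.append(chr(byte))
        else:
            cells.append(f"{byte:03o}")
    return " ".join(cells)


print(od_c("foo-bar-baz."))       # hyphen-minus: a single "-" byte
print(od_c("foo\u2212bar-baz."))  # minus sign: bytes 342 210 222
```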
I see a different issue with groff 1.22.4 on Ubuntu 22.10: I cannot easily see the difference between "\f(CR-\fP" and "\f(CR\-\fP" on output to PDF. If I cut from the output PDF and paste into Emacs or the terminal, the former is indeed U+002D and the latter U+2212, and the difference is readily visible in Emacs or the terminal;
That's odd. This definitely is not consistent with the groff 1.22.4 behavior I'm familiar with. I find the minus sign and hyphen glyphs fairly distinguishable. I modified my example file above to switch to the CR font. Attaching (cropped, 7.7KiB) screenshot.
but it's not readily visible in the PDF. However, this glitch is not a serious issue for man pages since examples always contain hyphen-minuses not minus signs, so I didn't worry about it. I assume that it's yet another font thing, since the problem doesn't occur in the default Roman font.
Possibly; when fonts aren't embedded in the PDF, we're at the mercy of whatever the renderer supplies. groff 1.23 will be shipping a 380-page compilation of all its man pages in PDF, and it embeds the fonts; I am hopeful that this will provide a reliable basis for comparisons so that we can better track down issues like the ones above.

Regards,
Branden

[1] Commit: https://git.savannah.gnu.org/cgit/groff.git/commit/?id=4536678ce5713907304ad...
[2] One explanation: https://lists.gnu.org/archive/html/groff/2018-05/msg00076.html
At 2022-11-25T19:50:14-0800, Paul Eggert wrote:
On 2022-11-25 19:20, Russ Allbery wrote:
You have to be very careful with the combination of \f(CW and \fP on Solaris 10 nroff
That should be OK, as \f(CW - which is now \f(CR - is used only if \n(.g is nonzero, i.e., only if it's groff and not traditional troff.
Just for precision's sake, the .g register interpolating a true value means (by convention) that an implementation is claiming support for groff extensions. This happens with Heirloom Doctools troff, for instance, if one gives it the "-mg" option. (There are other ways to switch on its "groff mode".) Also, to reiterate, "CW" as a font name is not a groff extension; it has some history in Documenter's Workbench troff and I think it may have appeared in Research Unix troff as well in the 1980s, but I don't have convincing evidence of this, just educated guesses based on man(7) and ms(7) man pages from that era. If I had sources for Research Unix V8-V10 I'd be a happy guy.
I toyed with using \f[CW] instead of \f(CW to underscore that it's groff-specific. However, that might be overkill given the number of non-*roff programs that read these files.
In my opinion that's not necessary, and implies too much. Regards, Branden
G. Branden Robinson wrote in <20221126035253.pli53qzgfx6tbax5@illithid>:
 |At 2022-11-25T18:18:46-0800, Paul Eggert wrote:
 ...
 |> If we did that, Groff would set a source string like "\*-\*-help" as
 |> "−−help", with two instances of U+2212 MINUS SIGN instead of U+002D
 |> HYPHEN-MINUS. Therefore people couldn't cut and paste code examples
 |> out of HTML or PDF, and into the shell.
 |
 |This hasn't been true for PDFs produced by groff for about 10
 |years.[1][2] You can copy a U+2212 minus sign and it will paste as a
 |U+002D.

It would be great if groff would release adjustments to grotty so that one could again use copy+paste also in manuals. And now please do not beat me up over that hyphen-minus for options, and that one should do this or that, but it is for many other characters, too. If i look at the bash manual for example, hyphen-minus is ok, but the caret is not ^ but U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT, and i see U+2018 LEFT SINGLE QUOTATION MARK instead of single-quotes. That is cool and maybe milks the shit out of the typographic capabilities of modern UTF-8 terminal emulators (i think i quote you here, more or less), but i always have to use "LC_ALL=C man XY" to enable copy+paste for myself.

But hey, it is only me, i am not a prof at a University who is proud of dozens of Nobel prize winners and other such prizes, many of them still worth something aka based upon scientific grounds.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
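Until the rendering changes, one workaround besides `LC_ALL=C man` is to map pasted glyphs back to keycaps after the fact. A minimal Python sketch, assuming the rendering follows the keycap table quoted at the start of the thread, plus U+2212 for output that uses the true minus sign. GLYPH_TO_KEYCAP and to_keycaps are hypothetical names, not part of groff or any man tool.

```python
# Rendered glyph -> ASCII keycap, following the keycap table earlier in the
# thread (assumption: the page was rendered with those mappings).
GLYPH_TO_KEYCAP = {
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK (closing single quote)
    "\u2010": "-",  # HYPHEN
    "\u2212": "-",  # MINUS SIGN
    "\u02c6": "^",  # MODIFIER LETTER CIRCUMFLEX ACCENT
    "\u2018": "`",  # LEFT SINGLE QUOTATION MARK (opening single quote)
    "\u02dc": "~",  # SMALL TILDE (modifier tilde)
}
_TABLE = str.maketrans(GLYPH_TO_KEYCAP)


def to_keycaps(pasted):
    """Replace typographic glyphs in pasted man page text with ASCII keycaps."""
    return pasted.translate(_TABLE)


print(to_keycaps("a\u02c6b \u02dc/x \u2212\u2212help"))
```

The mapping is lossy by design: a ‘ in the output is mapped to the grave accent because that is the keycap the table says produces it, even if the page author meant a quotation mark.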
participants (5): Alejandro Colomar, G. Branden Robinson, Paul Eggert, Russ Allbery, Steffen Nurpmeso