On 2023-01-23 15:32, John Sauter via tz wrote:
On Mon, 2023-01-23 at 15:28 -0700, Paul Gilmartin via tz wrote:
On 1/23/23 13:48:02, Paul Eggert via tz wrote:
* Makefile (UNUSUAL_OK_LATIN_1): Allow all non-alphabetic, non-ASCII printable characters that are Latin-1. This is primarily for “§” and we might as well allow them all since even XEmacs 21 supports them all.
+UNUSUAL_OK_LATIN_1 = ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿×÷
Ouch! UTF-8 is too pervasive on desktops and WWW for that to be comfortable.
And on a UTF-8 desktop, GNU sed strangles on non-UTF-8 strings:

1250 $ printf 'a\xa7b\n' | sed -E 's/(.)(.)(.)/1 \1 2 \2 3 \3/'
sed: RE error: illegal byte sequence
1251 $
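The failure comes down to the byte 0xA7 being a valid Latin-1 character ("§") but not a valid UTF-8 sequence on its own. A minimal Python sketch of that distinction, for illustration only: the variable names are hypothetical, and this does not claim to show how sed decodes its input internally.

```python
# The same bytes that printf emits in the sed example: 'a', 0xA7, 'b'.
raw = b"a\xa7b"

# Decoded as Latin-1, every byte maps to exactly one character;
# 0xA7 becomes U+00A7 SECTION SIGN.
as_latin1 = raw.decode("latin-1")

# Decoded as UTF-8, a lone 0xA7 is a continuation byte with no lead byte,
# so strict decoding raises UnicodeDecodeError.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# In UTF-8, U+00A7 is encoded as the two-byte sequence C2 A7.
section_utf8 = "\u00a7".encode("utf-8")

print(utf8_ok, section_utf8)
```

So a file that stores "§" as UTF-8 contains the bytes C2 A7, never a bare A7.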
I think the intent is to allow non-ASCII characters that are in Latin-1, even though the file is coded in UTF-8. That is, not all Unicode characters are allowed, only those that appear in Latin-1.
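A rough sketch of that kind of allowlist check, assuming the text has already been decoded from UTF-8. The function name disallowed_chars is hypothetical, and this is not the Makefile's actual check, which is not shown in this thread; only the character list is taken from the patch.

```python
# Characters copied from the UNUSUAL_OK_LATIN_1 line in the patch.
UNUSUAL_OK_LATIN_1 = "¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿×÷"


def disallowed_chars(text, allowed=UNUSUAL_OK_LATIN_1):
    """Return the distinct non-ASCII characters in text that are not allowed.

    Hypothetical helper: ASCII always passes, and anything above U+007F
    must appear in the allowlist.
    """
    return sorted({ch for ch in text if ord(ch) > 0x7F and ch not in allowed})


# Every allowed character is a single Latin-1 code point (at most U+00FF),
# even though the file holding it is encoded as UTF-8.
assert all(ord(ch) <= 0xFF for ch in UNUSUAL_OK_LATIN_1)

print(disallowed_chars("See \u00a7 3.2"))             # -> []
print(len(disallowed_chars("10 \u20ac or 10 \u00a3")))  # -> 1 (the euro sign)
```

The euro sign is rejected because it is U+20AC, outside Latin-1, while the pound sign passes.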
Nitpick: the ordinal indicators are category Lo (Letter, other), like letters in non-Latin scripts, and the micro sign is Ll (Letter, lowercase), like Western-script letters, so they match [[:alpha:]], not [[:punct:]]:

$ man iso-8859-1 | grep '\s[[:alpha:]]\s' | head -3
252   170   AA   ª   FEMININE ORDINAL INDICATOR
265   181   B5   µ   MICRO SIGN
272   186   BA   º   MASCULINE ORDINAL INDICATOR
$ grep -ah 'ORDINAL\|MICRO SIGN' unicode-symbols.txt \
    unicode/15.0.0/ucd/UnicodeData.txt
ª U+00AA FEMININE ORDINAL INDICATOR
µ U+00B5 MICRO SIGN
º U+00BA MASCULINE ORDINAL INDICATOR
00AA;FEMININE ORDINAL INDICATOR;Lo;0;L;<super> 0061;;;;N;;;;;
00B5;MICRO SIGN;Ll;0;L;<compat> 03BC;;;;N;;;039C;;039C
00BA;MASCULINE ORDINAL INDICATOR;Lo;0;L;<super> 006F;;;;N;;;;;

-- 
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                -- Antoine de Saint-Exupéry
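The General_Category values quoted from UnicodeData.txt can be cross-checked with Python's unicodedata module. This is a minimal sketch: it uses whatever Unicode version the local Python bundles rather than the 15.0.0 files above, and Unicode letter categories are only an approximation of what [[:alpha:]] matches in a given locale.

```python
import unicodedata

# Characters copied from the UNUSUAL_OK_LATIN_1 line in the patch.
UNUSUAL_OK_LATIN_1 = "¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿×÷"

# Keep only the characters whose General_Category is a letter category
# (Lu, Ll, Lt, Lm or Lo); those are the ones that are not "non-alphabetic".
letters = [ch for ch in UNUSUAL_OK_LATIN_1
           if unicodedata.category(ch).startswith("L")]

for ch in letters:
    # Print code point, category and name using ASCII only.
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")
```

Under these assumptions the loop should list only U+00AA, U+00B5 and U+00BA, the same three characters that the grep output identifies.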