At Mon, 13 Oct 2003 23:47:00 +0200, Oscar van Vlijmen <ovv@hetnet.nl> writes:
1. There are in the POSIX 1 region of characters below code position 128 no significant differences between the encodings us-ascii, iso-8859-1 and utf-8.
Yes, if we stick to the ASCII subset (and use only TAB and LF among the control characters) we should be OK. Pretty much everybody can read ASCII. For bytes above 127, ISO-8859-1 is incompatible with UTF-8, EUC, Shift-JIS, etc., so it's a bit more likely to be mishandled. (ISO-8859-1 used to be the default character set for HTML, but that was a while ago now.)
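To illustrate the point about the ASCII subset, here is a minimal Python sketch. The helper name encodes_identically and the sample strings are made up for illustration; the codec names are the standard Python aliases. It only checks whether a string produces the same bytes under all three encodings.

```python
def encodes_identically(text, encodings=("us-ascii", "iso-8859-1", "utf-8")):
    """Return True if text yields the same bytes in every listed encoding."""
    try:
        encoded = {text.encode(name) for name in encodings}
    except UnicodeEncodeError:
        # At least one encoding (here, us-ascii) cannot represent the text.
        return False
    return len(encoded) == 1


# Pure ASCII with only TAB and LF as control characters: same bytes everywhere.
print(encodes_identically("Europe/Amsterdam\tCET\n"))

# A character above code position 127 is one byte in ISO-8859-1
# but two bytes in UTF-8, so the two encodings disagree.
print("caf\u00e9".encode("iso-8859-1"))
print("caf\u00e9".encode("utf-8"))
```

A reader that guesses the wrong one of these two encodings would mangle the last character, which is the kind of mishandling ASCII-only text avoids.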
2. If an html page will be viewed off-line, it would be useful to put a <meta> tag in the <head> section describing a character set. iso-8859-1 would be the most compatible,
For some time the web pages have had <meta> tags that specify US-ASCII, which is a bit more conservative than ISO-8859-1. They also have their US-ASCII encoding specified in their XML declaration. Hmm, I just noticed that the HTTP header still said "Content-Type: text/html; charset=iso-8859-1"; I've changed it to say "us-ascii" so that all three declarations are consistent.
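A rough way to catch this sort of mismatch is to compare the charset named in the HTTP header, the XML declaration and the <meta> tag. The sketch below is only an illustration: the function name declared_charsets, the regular expressions and the sample page are assumptions, not the actual markup of the pages discussed here.

```python
import re


def declared_charsets(http_content_type, document):
    """Collect the charset named in each of the three places, lowercased."""
    found = {}
    m = re.search(r'charset=["\']?([\w.:-]+)', http_content_type, re.I)
    if m:
        found["http"] = m.group(1).lower()
    m = re.search(r'<\?xml[^>]*\bencoding=["\']([\w.:-]+)["\']', document, re.I)
    if m:
        found["xml"] = m.group(1).lower()
    m = re.search(r'<meta[^>]*charset=["\']?([\w.:-]+)', document, re.I)
    if m:
        found["meta"] = m.group(1).lower()
    return found


# Hypothetical page, made up for this example.
page = (
    '<?xml version="1.0" encoding="US-ASCII"?>\n'
    "<html><head>\n"
    '<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII" />\n'
    "</head></html>\n"
)

before = declared_charsets("text/html; charset=iso-8859-1", page)
after = declared_charsets("text/html; charset=us-ascii", page)

# The declarations are consistent when they all name the same charset.
print(before, len(set(before.values())) == 1)
print(after, len(set(after.values())) == 1)
```

This is a pattern match rather than a real HTML or XML parse, so it is only good enough for a quick sanity check.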