Re: [IANA #1365934] Re: Error in bibxlm2 entry for reference.ANSI.X3-4.1986.xml
Hello Amanda,

Sorry for the long delay in answering, and for the length of this answer.

[Copying ietf-charsets@iana.org, where people interested in charsets (if there's anybody left) might be subscribed.]

On 2025-07-04 09:52, Amanda Baber via RT wrote:
Hi Martin,
It was brought to our attention some time ago that the reference to ANSI_X3.4-1968 in the note at the top of the Character Sets registry may be incorrect, and that it should probably point to "INCITS.4-1986," or "INCITS.4-1986, originally published as ANSI X3.4-1986" instead.
[For the record, this gets redirected to https://www.iana.org/assignments/character-sets/character-sets.xhtml, so strictly speaking, the discussion below is about the latter.]
Should IANA update that note? If so, which reference should we use?
TL;DR: It doesn't matter at all.

Main proposal: Because it's irrelevant, eliminate the reference to ANSI_X3.4-1968 in the note at the top, so that we don't need to discuss what reference would be best. Comments are of course welcome.

Details:

The text in question, with a bit of context, is:

====
Note

These are the official names for character sets that may be used in the Internet and may be referred to in Internet documentation. These names are expressed in ANSI_X3.4-1968 which is commonly called US-ASCII or simply ASCII. The character set most commonly use in the Internet and used especially in protocol standards is US-ASCII, this is strongly encouraged. The use of the name US-ASCII is also encouraged.
====

[The note itself continues, but this is the relevant part.]

The first question is what "These names are expressed in" refers to. It may refer to the entries in the page itself. In that case, the statement is correct, because the page is served with "Content-Type: text/html; charset=UTF-8". US-ASCII is by definition also UTF-8, and all the 'names' are in the US-ASCII subset of UTF-8.

"These names are expressed in" may also refer to how these names are to be used in protocols and data. In that case, the "strongly encouraged" applies, but the statement in its general form would be wrong because charset labels may easily be expressed in EBCDIC on an IBM mainframe, or in UTF-16 or UTF-32 inside programs e.g. in Java or Python.

The second question is why ANSI_X3.4-1968/US-ASCII/INCITS.4-1986 is explicitly mentioned with respect to names. All the names use only characters from the invariant subset of ISO 646 (see e.g. https://en.wikipedia.org/wiki/ISO/IEC_646#Code_page_layout); I checked this by searching for each of these characters in the page. The only characters that appear in the IANA page and are redefined in national variants of ISO 646 are '[' and ']', and these only appear around references, not in any of the charset names themselves.
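The two observations above (a label such as US-ASCII has the same bytes in ASCII and UTF-8 but not in EBCDIC or UTF-16, and the charset names stay inside the invariant subset of ISO 646) can be illustrated with a small Python sketch. The function names are hypothetical, the set of twelve national-use positions is an assumption taken from the usual description of ISO/IEC 646 and should be checked against the layout table linked above, and the sample strings are only examples, not a scan of the whole registry.

```python
# Assumption: the twelve code positions that ISO/IEC 646 leaves open to
# national variants; everything else in 7-bit ASCII is treated as invariant.
NATIONAL_USE = set("#$@[\\]^`{|}~")


def non_invariant_chars(label):
    """Return the characters of `label` outside the ISO 646 invariant subset."""
    return {ch for ch in label if ord(ch) > 0x7F or ch in NATIONAL_USE}


def encodings_of(label):
    """Encode `label` in several encodings so the byte sequences can be compared."""
    codecs = ("ascii", "utf-8", "cp037", "utf-16-be")  # cp037 is one EBCDIC code page
    return {codec: label.encode(codec) for codec in codecs}


if __name__ == "__main__":
    # Illustrative samples: two registry-style names and one bracketed reference.
    for sample in ("US-ASCII", "ISO_8859-1:1987", "[RFC1345]"):
        print(sample, sorted(non_invariant_chars(sample)))

    for codec, data in encodings_of("US-ASCII").items():
        print(codec, data.hex())
```

Only the bracketed reference is flagged in this sample, which matches the observation that '[' and ']' occur around references rather than inside charset names.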
My guess is that ANSI_X3.4-1968 was mentioned to stress the importance of US-ASCII, because when the registry was set up, there was quite a chance that people would send emails without charset information that were in any national variant of ISO 646. But this is just a guess, and if correct, it would only be a secondary reason.

The next question is whether the mention of ANSI_X3.4-1968, or anything in place of it, is necessary or helpful. My answer would be that it may have had some value, but is no longer necessary. I don't think that any of the many other IANA registries registering 'names' (as opposed to numbers) says anything about how these names are (to be) encoded. Why should this page say anything more? The national variants of ISO 646 are mostly gone, US-ASCII is the base of text encoding on the Internet and the Web not only because the various standards say so, but also and much more importantly, because it's what's used in practice.

So *my proposal* is to change the first paragraph of the note to:

====
These are the official names for character sets that may be used in the Internet and may be referred to in Internet documentation. The character set most commonly used in the Internet and used especially in protocol standards is US-ASCII, this is strongly encouraged. The use of the name US-ASCII is also encouraged.
====

[also fixing one instance of 'use' -> 'used']

If the above proposal is not accepted, the next question is what are the alternatives, and in particular whether "INCITS.4-1986" is an improvement over "ANSI_X3.4-1968".

First, let's ignore the year. Researching the relationship between ANSI and INCITS, one immediately finds things such as "The InterNational Committee for Information Technology Standards (INCITS), is an ANSI-accredited standards development organization composed of Information technology developers. It was formerly known as the X3 and NCITS.". So INCITS.4-19XX and ANSI_X3.4-19XX would simply be equivalent.
A relationship such as the above between ANSI and INCITS is very frequent in the realm of standards development and approval. Such relationships also come in various shapes and forms, from co-operation to rubber stamping, from republication to simple reference. The main reason for this is that the technical expertise is not in the same place as the official (or semi-official) authority. Just to mention an example, what's generally known as JavaScript is standardized by ECMA as ECMA-262, and is also ISO/IEC 16262 (later ISO/IEC 22275).

Next, let's look at the years. It's currently "ANSI_X3.4-1968", whereas "INCITS.4-1986" is proposed. The question is what difference the year would make. I don't have actual copies of these standards. But it's easy to find the relevant parts on the Web.

For the 1968 version, RFC 20, in its original paper version as scanned at https://www.rfc-editor.org/rfc/rfc20.pdf contains the relevant parts of what it calls USAS X3,4-1968. https://archive.org/details/enf-ascii-1968-1970/ seems to be a full copy, including several appendices and additional material.

For the 1986 version, we can see it in the 2007 proposed review at https://www.unicode.org/L2/L2006/06388-review-incits4.pdf.

While I'm sure that a careful comparison between these two versions will bring up many differences of interest to a historian, there are no essential differences in the allocation of characters to code points. The most noticeable difference is the use of a broken line glyph for the "VERTICAL LINE" character in 1968, which became a single line in 1986. But this is irrelevant for us, because none of the charset names contains a '|', and because glyph variants wouldn't be relevant anyway.

So the conclusion for whether we use the 1968 version or the 1986 version is that it doesn't matter (assuming we use any version at all).

Just in case we use any version, the question would be what exact labels to use.
For the 1986 version, according to https://www.unicode.org/L2/L2006/06388-review-incits4.pdf, page 2, the label would be "ANSI INCITS 4-1986 (R2002)" or "ANSI X3.4-1986 (R1997)", and extrapolating back from the latter we should end up at simply "ANSI X3.4-1986".

For the 1968 version, it is entitled "USA Standard Code for Information Interchange" and labeled USAS X3.4-1968, with "Business Equipment Manufacturers Association" as a 'Sponsor' and approved by the "United States of America Standards Institute".

As an aside, it's interesting to note that the initials in "USA Standard Code for Information Interchange" are the same as in US-ASCII, although my guess is that this is a somewhat indirect coincidence (first "USA Standard Code for Information Interchange" being shortened to "American Standard Code for Information Interchange" (ASCII), and later "US" being added again).

In general, one has to note that the labels under discussion ("ANSI_X3.4-1968" and "INCITS.4-1986") both seem to be out of sync when it comes to correspondence between organization name and year of issue. In 1968, the name ANSI didn't exist (see e.g. the timeline and history by decade at https://www.ansi.org/about/history, in particular the entries for 1960'S and 1970'S), and in 1986, the name INCITS didn't exist (see https://www.incits.org/about/history).

Looking at https://en.wikipedia.org/wiki/ASCII#Revisions, if we want to refer to the actual versions in 1968 or 1986, it would be "USAS X3.4-1968" or "ANSI X3.4-1986". If we want to use INCITS, we should make sure that we refer to one of the revisions/reviews of the 1986 version explicitly, e.g. "ANSI INCITS 4-1986 (R2002)" (if we want ANSI to be included) or "INCITS 4-1986 (R2022)" (if we do not want ANSI to be included, and/or want to make sure we refer to the latest version).
The suggested reference format comes from a discussion with the RFC Editor, which resulted in the text of the first normative reference entry here:
https://datatracker.ietf.org/doc/html/draft-ietf-emailcore-rfc5322bis-12#nam...
Checked. That one seems to prefer the newest version as explained above. Personally, I'd probably have cited RFC 20 (STD 80) instead. Also, I'd probably have preferred ANSI X3.4-1986 in order to show that this is something extremely stable, rather than run the risk that some readers start to wonder what the changes between 1986 and 2022 may be.
If this is OK, we'll ask an AD to sign off.
If it's okay with you, I'd like you to ask the AD to sign off on my proposal, i.e. to *remove* the mention of the standard from the Note at the start of the page. But please let's wait for a week or so to see if there are comments from others.

Regards, Martin.
thanks,
Amanda Baber IANA Operations Manager
On Thursday, 11 September 2025, 05:23:28 (-04:00), Martin J. Dürst wrote:
So *my proposal* is to change the first paragraph of the note to: ==== These are the official names for character sets that may be used in the Internet and may be referred to in Internet documentation. The character set most commonly used in the Internet and used especially in protocol standards is US-ASCII, this is strongly encouraged. The use of the name US-ASCII is also encouraged. ====
Please fix the ugly comma splice, at least.
While I'm sure that a careful comparison between these two versions will bring up many differences of interest to a historian, there are no essential differences in the allocation of characters to code points. The most noticeable difference is the use of a broken line glyph for the "VERTICAL LINE" character in 1968, which became a single line in 1986. But this is irrelevant for us, because none of the charset names contains a '|', and because glyph variants wouldn't be relevant anyway.
So the conclusion for whether we use the 1968 version or the 1986 version is that it doesn't matter (assuming we use any version at all).
There were some changes made since the 1968 edition. I can't say with certainty whether they materially affect the official names, but the 1968 edition permitted "dualities" in certain characters, that is, different interpretations of some code points, such as $ possibly being a different currency. In particular, Appendix D of the 1986 edition describes the standard's own history and important changes.

In my opinion, for clarity and certainty, the reference should be to "ANSI X3.4-1986". That *is* US-ASCII as it is used and understood today. INCITS republished the standard but I don't believe they ever made any changes aside from adding their own cover page.

Sławomir Osipiuk