New subject: [IANA #1365934] Re: Error in bibxlm2 entry for reference.ANSI.X3-4.1986.xml

Sept. 11, 2025

      Hello Amanda,

Sorry for the long delay in answering, and for the length of this answer.

[Copying ietf-charsets@iana.org, where people interested in charsets (if 
there's anybody left) might be subscribed.]

On 2025-07-04 09:52, Amanda Baber via RT wrote:
...
Hi Martin,
It was brought to our attention some time ago that the reference to ANSI_X3.4-1968 in the note at the top of the Character Sets registry may be incorrect, and that it should probably point to "INCITS.4-1986," or "INCITS.4-1986, originally published as ANSI X3.4-1986" instead.
https://www.iana.org/assignments/character-sets
[For the record, this gets redirected to 
https://www.iana.org/assignments/character-sets/character-sets.xhtml, so 
strictly speaking, the discussion below is about the latter.]
...
Should IANA update that note? If so, which reference should we use?
TL;DR: It doesn't matter at all.

Main proposal: Because it's irrelevant, eliminate the reference to 
ANSI_X3.4-1968 in the note at the top, so that we don't need to discuss 
what reference would be best.

Comments are of course welcome.

Details:

The text in question, with a bit of context, is:
====
Note

These are the official names for character sets that may be used in
the Internet and may be referred to in Internet documentation.  These
names are expressed in ANSI_X3.4-1968 which is commonly called
US-ASCII or simply ASCII.  The character set most commonly use in the
Internet and used especially in protocol standards is US-ASCII, this
is strongly encouraged.  The use of the name US-ASCII is also
encouraged.
==== [The note itself continues, but this is the relevant part.]

The first question is what "These names are expressed in" refers to. It 
may refer to the entries in the page itself. In that case, the statement 
is correct, because the page is served with "Content-Type: text/html; 
charset=UTF-8". Any US-ASCII text is by definition also valid UTF-8, and 
all the 'names' are in the US-ASCII subset of UTF-8.
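
[A minimal Python sketch of the point above, not part of the registry or 
of any IANA tooling: for text consisting only of US-ASCII characters, 
the ASCII and UTF-8 encodings are byte-for-byte identical. The sample 
names are taken from the registry; the helper name is made up for 
illustration.]

```python
# Sample charset names from the registry; any all-ASCII string behaves the same.
names = ["US-ASCII", "ANSI_X3.4-1968", "UTF-8", "ISO_8859-1:1987"]


def same_bytes_in_ascii_and_utf8(text: str) -> bool:
    """Return True if `text` encodes to identical bytes in ASCII and UTF-8."""
    # str.encode("ascii") raises UnicodeEncodeError for non-ASCII input,
    # so reaching the comparison already implies the text is pure ASCII.
    return text.encode("ascii") == text.encode("utf-8")


results = {name: same_bytes_in_ascii_and_utf8(name) for name in names}

for name, identical in results.items():
    print(f"{name:20} identical bytes: {identical}")
```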

"These names are expressed in" may also refer to how these names are to 
be used in protocols and data. In that case, the "strongly encouraged" 
applies, but the statement in its general form would be wrong because 
charset labels may easily be expressed in EBCDIC on an IBM mainframe, or 
in UTF-16 or UTF-32 inside programs e.g. in Java or Python.
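
[To illustrate, here is a small Python sketch that encodes the same 
label in several encodings. The choice of "cp037" as a representative 
EBCDIC code page is my assumption; mainframes use various EBCDIC code 
pages. The label is the same sequence of characters each time, only the 
bytes differ.]

```python
label = "US-ASCII"

# "cp037" stands in for EBCDIC here (an assumption; other EBCDIC code
# pages exist). The "-be" variants avoid a byte order mark in the output.
encodings = ["ascii", "cp037", "utf-16-be", "utf-32-be"]

encoded = {enc: label.encode(enc) for enc in encodings}

for enc, data in encoded.items():
    # Show length and hex bytes, then confirm the label round-trips.
    print(f"{enc:10} {len(data):2d} bytes  {data.hex(' ')}")
    assert data.decode(enc) == label
```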

The second question is why ANSI_X3.4-1968/US-ASCII/INCITS.4-1986 is 
explicitly mentioned with respect to names. All the names use only 
characters from the invariant subset of ISO 646 (see e.g. 
https://en.wikipedia.org/wiki/ISO/IEC_646#Code_page_layout); I checked 
this by searching for each of these characters in the page.
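
[The check can also be sketched in a few lines of Python instead of 
searching by hand. This is only an illustration: the invariant set below 
is my transcription of the ISO/IEC 646 code page layout, and the 
function name is hypothetical. The twelve excluded characters are the 
US-ASCII occupants of the positions that national variants may 
redefine.]

```python
import string

# US-ASCII characters at the twelve positions that ISO/IEC 646 leaves
# open to national variation.
VARIANT = set("#$@[\\]^`{|}~")

# Everything else among the printable characters is invariant.
INVARIANT = set(string.ascii_letters + string.digits + " !\"%&'()*+,-./:;<=>?_")


def non_invariant_chars(name: str) -> list[str]:
    """Return the sorted characters of `name` outside the invariant subset."""
    return sorted({ch for ch in name if ch not in INVARIANT})


# A few names from the registry, plus a reference in brackets for contrast.
for sample in ["US-ASCII", "ANSI_X3.4-1968", "ISO_8859-1:1987", "[RFC1345]"]:
    print(f"{sample:20} {non_invariant_chars(sample)}")
```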

The only characters that appear in the IANA page and are redefined in 
national variants of ISO 646 are '[' and ']', and these only appear 
around references, not in any of the charset names themselves. My guess 
is that ANSI_X3.4-1968 was mentioned to stress the importance of 
US-ASCII, because when the registry was set up, there was quite a chance 
that people would send emails without charset information that were in 
any national variant of ISO 646. But this is just a guess, and if 
correct, it would only be a secondary reason.

The next question is whether the mention of ANSI_X3.4-1968, or anything 
in place of it, is necessary or helpful. My answer would be that it may 
have had some value, but is no longer necessary. I don't think that any 
of the many other IANA registries registering 'names' (as opposed to 
numbers) says anything about how these names are (to be) encoded. Why 
should this page say anything more? The national variants of ISO 646 are 
mostly gone, and US-ASCII is the base of text encoding on the Internet 
and the Web, not only because the various standards say so but also, and 
much more importantly, because it's what's used in practice.

So *my proposal* is to change the first paragraph of the note to:
====
These are the official names for character sets that may be used in
the Internet and may be referred to in Internet documentation.  The 
character set most commonly used in the Internet and used especially in 
protocol standards is US-ASCII, this is strongly encouraged.  The use of 
the name US-ASCII is also encouraged.
====
[also fixing one instance of 'use' -> 'used']

If the above proposal is not accepted, the next question is what the 
alternatives are, and in particular whether "INCITS.4-1986" is an 
improvement over "ANSI_X3.4-1968". First, let's ignore the year. 
Researching the relationship between ANSI and INCITS, one immediately 
finds things such as "The InterNational Committee for Information 
Technology Standards (INCITS), is an ANSI-accredited standards 
development organization composed of Information technology developers. 
It was formerly known as the X3 and NCITS.". So INCITS.4-19XX and 
ANSI_X3.4-19XX would simply be equivalent.

A relationship such as the above between ANSI and INCITS is very 
frequent in the realm of standards development and approval. Such 
relationships also come in various shapes and forms, from co-operation 
to rubber stamping, from republication to simple reference. The main 
reason for this is that the technical expertise is not in the same place 
as the official (or semi-official) authority.

Just to mention an example, what's generally known as JavaScript is 
standardized by ECMA as ECMA-262, and is also ISO/IEC 16262 (later 
ISO/IEC 22275).

Next, let's look at the years. It's currently "ANSI_X3.4-1968", whereas 
"INCITS.4-1986" is proposed. The question is what difference the year 
would make. I don't have actual copies of these standards. But it's easy 
to find the relevant parts on the Web.

For the 1968 version, RFC 20, in its original paper version as scanned 
at https://www.rfc-editor.org/rfc/rfc20.pdf contains the relevant parts 
of what it calls USAS X3,4-1968. 
https://archive.org/details/enf-ascii-1968-1970/ seems to be a full 
copy, including several appendices and additional material. For the 1986 
version, we can see it in the 2007 proposed review at 
https://www.unicode.org/L2/L2006/06388-review-incits4.pdf.

While I'm sure that a careful comparison between these two versions will 
bring up many differences of interest to a historian, there are no 
essential differences in the allocation of characters to code points. 
The most noticeable difference is the use of a broken line glyph for the 
"VERTICAL LINE" character in 1968, which became a single line in 1986. 
But this is irrelevant for us, because none of the charset names 
contains a '|', and because glyph variants wouldn't be relevant anyway.

So the conclusion for whether we use the 1968 version or the 1986 
version is that it doesn't matter (assuming we use any version at all).
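
[As a data point from one implementation only, and not as evidence 
about the standards themselves: Python's codec registry treats the 1968 
and the 1986 label as aliases of the same codec. A short sketch:]

```python
import codecs

# Both year-specific labels and the preferred name, as a program might
# meet them in the wild.
labels = ["ANSI_X3.4-1968", "ANSI_X3.4-1986", "US-ASCII"]

# codecs.lookup() normalizes the label and resolves aliases; .name is the
# canonical codec name that Python uses internally.
resolved = {label: codecs.lookup(label).name for label in labels}

for label, codec_name in resolved.items():
    print(f"{label:16} -> {codec_name}")
```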

In case we do use a version, the question is what exact label to use. 
For the 1986 version, according to 
https://www.unicode.org/L2/L2006/06388-review-incits4.pdf, page 2, it 
would be "ANSI INCITS 4-1986 (R2002)" or "ANSI X3.4-1986 (R1997)", and 
extrapolating back from the latter we should end up at simply "ANSI 
X3.4-1986".

For the 1968 version, it is entitled "USA Standard Code for Information 
Interchange" and labeled USAS X3.4-1968, with "Business Equipment 
Manufacturers Association" as a 'Sponsor' and approved by the "United 
States of America Standards Institute". As an aside, it's interesting to 
note that the initials in "USA Standard Code for Information 
Interchange" are the same as in US-ASCII, although my guess is that this 
is a somewhat indirect coincidence (first "USA Standard Code for 
Information Interchange" being shortened to "American Standard Code for 
Information Interchange" (ASCII), and later "US" being added again).

In general, one has to note that the labels under discussion 
("ANSI_X3.4-1968" and "INCITS.4-1986") both seem to be out of sync when 
it comes to correspondence between organization name and year of issue. 
In 1968, the name ANSI didn't exist (see e.g. the timeline and history 
by decade at https://www.ansi.org/about/history, in particular the 
entries for 1960'S and 1970'S), and in 1986, the name INCITS didn't 
exist (see https://www.incits.org/about/history).

Looking at https://en.wikipedia.org/wiki/ASCII#Revisions, if we want to 
refer to the actual versions of 1968 or 1986, it would be "USAS 
X3.4-1968" or "ANSI X3.4-1986". If we want to use INCITS, we should make 
sure that we refer to one of the revisions/reviews of the 1986 version 
explicitly, e.g. "ANSI INCITS 4-1986 (R2002)" (if we want ANSI to be 
included) or "INCITS 4-1986 (R2022)" (if we do not want ANSI to be 
included, and/or want to make sure we refer to the latest version).
...
The suggested reference format comes from a discussion with the RFC Editor, which resulted in the text of the first normative reference entry here:
https://datatracker.ietf.org/doc/html/draft-ietf-emailcore-rfc5322bis-12#nam...
Checked. That one seems to prefer the newest version as explained above. 
Personally, I'd probably have cited RFC 20 (STD 80) instead. Also, I'd 
probably have preferred ANSI X3.4-1986 in order to show that this is 
something extremely stable, rather than run the risk that some 
readers start to wonder what the changes between 1986 and 2022 may be.
...
If this is OK, we'll ask an AD to sign off.
If it's okay with you, I'd like you to ask the AD to sign off on my 
proposal, i.e. to *remove* the mention of the standard from the Note at 
the start of the page. But please let's wait for a week or so to see if 
there are comments from others.

Regards,   Martin.
...
thanks,
Amanda Baber
IANA Operations Manager

Martin J. Dürst

Sławomir Osipiuk
