Hi,

On Tue, Feb 12, 2019 at 03:32:54PM -0500, John Levine wrote:
The second version of IDNA, IDNA2008, recognized this problem and deliberately removed all the mappings. The idea was that experts in different scripts and languages would create mappings that make sense for people who use those scripts and speak those languages. The mappings would turn the user input into standardized U-labels that the IDN software can then use.
This isn't quite correct for the case of the dots in domain names. There are two additional important wrinkles here.

First, IDNA is defined for _labels_, and not for _domain names_. This is perfectly clear in IDNA2008. It is less clear in IDNA2003, because while most of that specification _is_ about labels, there are some places where the whole domain name is implicated. This is particularly true of label separators (the dots).

That brings us, however, to two different problems. First, domain names are distributed in their operation, and that means that there is no way to be sure that the "whole domain name" is in one script. We see this today, quite commonly, where there are IDLs that live under traditional LDH-labels. For most Latin-based languages, this isn't really a problem, but where you have multiple scripts and at least one of them is not Latin, it's hard to be sure exactly which rules ought to apply.

But more importantly, there is an additional problem with domain names: the label separators we are used to seeing _don't appear_ in the DNS. A domain name like crankycanuck.ca. does not appear, in the DNS, as a series of octets separated by a special character (.), but instead as a series of octets bound by length indicators that also function as label separators (conceptually, it's like 12crankycanuck2ca00; the final 0 is a null label to indicate the root). This is, by the way, the reason it is possible to have a label with a . in it in the DNS. You rarely see these, but they sometimes show up in the responsible person field of the SOA record.

Since the separator never actually appears in the DNS, and since you're supposed to go label by label, the separators fall outside what a per-label specification can cover, and that is a problem. Now, it _might_ be that an application that is attempting to handle IDNs that are likely to be entered in a given locale should do some sort of mapping of the normal stops in that locale: that's roughly what RFC 5895 suggests.
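To make the length-prefix point concrete, here is a minimal sketch in Python of how a name is laid out on the wire. The helper names to_wire and from_wire are mine, purely for illustration, and the sketch assumes uncompressed names only (no compression pointers) and labels that have already been converted to octets.

```python
def to_wire(labels):
    """Encode labels (bytes objects) as an uncompressed DNS name.

    Each label is preceded by one octet holding its length; the name
    ends with a zero-length label that stands for the root.
    """
    out = bytearray()
    for label in labels:
        if not 1 <= len(label) <= 63:
            raise ValueError("a label must be 1 to 63 octets long")
        out.append(len(label))   # the length octet doubles as the separator
        out += label
    out.append(0)                # null label for the root
    if len(out) > 255:
        raise ValueError("an encoded name must not exceed 255 octets")
    return bytes(out)


def from_wire(wire):
    """Decode an uncompressed DNS name back into its list of labels."""
    labels = []
    i = 0
    while True:
        n = wire[i]
        i += 1
        if n == 0:               # reached the root label
            return labels
        if n > 63:
            raise ValueError("compression pointers are not handled here")
        labels.append(wire[i:i + n])
        i += n


name = to_wire([b"crankycanuck", b"ca"])
dotted = to_wire([b"john.doe", b"example", b"com"])
```

Here name contains no "." octet at all, and the label b"john.doe" in dotted survives a round trip through from_wire intact, because nothing in the encoding treats "." as special.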
If there were an Armenian mapping for IDNs, when the characters in a domain name are Armenian, it handles Armenian punctuation, and when the characters are Latin, Latin punctuation.
That won't, of course, work, because it is possible to have mixed code point repertoires either within or between labels. _Probably_ it would be safe just to map all stops to ".", but nobody knows and the last time we tried that it didn't work out.

Best regards,

A

-- 
Andrew Sullivan
ajs@anvilwalrusden.com