On 5/13/2019 10:08 AM, John Levine wrote:
On Mon, 13 May 2019, Ram Mohan wrote:
While it's a straightforward argument to say no variants should be allowed on the DNS, the reality in many linguistic locales is that variants are a part of everyday life. Not just in the Han script, but in Indic and Arabic scripts, among others. We can't wish them away, nor do we have the luxury of saying the DNS wasn't designed for it, so it shall never support it.
I think there's a large gap between "many writing systems can write the same thing in different ways" and "those different ways should be in the DNS."
It's easy to see why you'd block variants, but particularly given the utter lack of tools to provision them, and no interest in creating those tools, hard to see why you'd delegate them.
John, sorry, I'm with Ram on this one.

Where I agree with you is on the beneficial nature of blocked variants. They are a cheap and underused tool to limit the attack surface for deceptive registrations. However, once you block a variant, you take away the option for applicants to apply for the variant even if they have registered the original label.

Where variants are unrelated, that's not an issue. But in many scripts you have situations where both variants are used for the same letter, and where different keyboards, for example, may have one of the variants but not the other. By blocking such variants, you exclude one community from "reaching" any label registered for another one.

We don't really have that situation in the Latin script, not even with European languages. The closest you can get is that Danish uses "ø" for the letter for which Swedish uses "ö". However, the same word, like the name of the Danish capital, is often spelled differently elsewhere in the word, e.g. Köpenhamn instead of København. (Therefore, Copenhagen can simply apply for all three spellings, and additional ones as well - such as the German spelling; even if the two code points were variants, the labels are not.) That reduces the case for making these two letters variants of each other.

However, between Arabic and Persian, you'll have cases where geographic names differ ONLY by such local variant use. With only one of the two labels, you could target only one community. Not allowing someone to register both variants in this situation causes just as many problems as ignoring the variant relation altogether and letting an unrelated party register the variant.

There are many similar examples, and the best way to handle them is to support allocatable variants. They still block registration of the variant label by unrelated parties, but allow one applicant to register both. With the new LGR format, you can express some further constraints so that only a limited number of variant *labels* can be allocated.
For example, if you have a pair of code points that are variants, and a pair of labels that contain 2 copies of each (in matching positions), you would normally get a set of 4 variant labels. An easy way to constrain that is to limit all variants to be from the same subset (e.g. either all Persian, or all Arabic).

Allocatable variants still leave it to the discretion of the applicant as to whether to apply for more than one variant. Some people prefer automatic activation, where every applicant would receive all variants. There may be some cases where that would match the overall users' expectations. For example, there are three sets of digits used with the Arabic script, each being in "native" use in a different region. Two of these sets even share many digit forms. Since any numbers these digits represent are obviously the same, and in some cases users cannot tell which set of digits is used, forcing activation may be justified.

No matter how you come down on that last example, there's no escaping the need to deal with scripts that do have such variants. The LGR format in RFC 7940 is making a start in letting you express more of your registry policies in a machine-readable format than was possible before (even with the earlier IDN table format extensions for Chinese and Arabic).

Time to put some of the other tools into place.

A./
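
To make the counting in the variant-label example concrete (one variant pair, two positions, four variant labels, cut down to two by the same-subset constraint), here is a minimal Python sketch. It is only an illustration under stated assumptions: the variant table, the "arabic"/"persian" tags, the function name variant_labels and the sample label are all made up for this note, and none of it is the RFC 7940 XML format or its evaluation procedure.

```python
from itertools import product

KAF = "\u0643"    # ARABIC LETTER KAF, tagged "arabic" in this toy table
KEHEH = "\u06a9"  # ARABIC LETTER KEHEH, tagged "persian" in this toy table
ALEF = "\u0627"   # ARABIC LETTER ALEF, has no variants here

# Hypothetical variant table: code point -> list of (code point, subset tag).
VARIANTS = {
    KAF: [(KAF, "arabic"), (KEHEH, "persian")],
    KEHEH: [(KEHEH, "persian"), (KAF, "arabic")],
}


def variant_labels(label, same_subset_only=False):
    """Return the set of variant labels for `label` (including itself).

    With same_subset_only=True, labels that mix subset tags are dropped,
    so every substituted code point comes from a single subset.
    """
    # Code points without an entry map only to themselves, with no tag.
    choices = [VARIANTS.get(cp, [(cp, None)]) for cp in label]
    result = set()
    for combo in product(*choices):
        tags = {tag for _, tag in combo if tag is not None}
        if same_subset_only and len(tags) > 1:
            continue  # mixed Arabic/Persian label: not allocatable
        result.add("".join(cp for cp, _ in combo))
    return result


# Arbitrary sample label with two occurrences of the variant code point.
label = KAF + ALEF + KAF
all_variants = variant_labels(label)
constrained = variant_labels(label, same_subset_only=True)
print(len(all_variants), len(constrained))  # 4 2
```

The two labels that survive the constraint are the all-KAF and the all-KEHEH forms. Whether the remaining mixed labels are blocked, and whether the surviving ones are allocatable or activated automatically, is a registry policy choice that this sketch does not model.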