In article <3e57fd07-4aba-cb3f-5c18-f54daa110425@ix.netcom.com> you write:
I ran an analysis using an early draft of the Root Zone LGR for Latin.
I just went through all of the IDNs in contracted TLD zone files and checked them for compliance with IDNA2003 and IDNA2008, and also checked that they were valid under the TLDs' LGRs. I still need to clean up my data because new TLDs have often neglected to file their LGRs with IANA, but I have some overall observations.

One is that the number of invalid IDNs is pretty small: in .com, about 1000 out of a million IDNs, and similarly small in most other TLDs. But the invalid ones are much more likely to be malicious. For example, although most of the invalid names in .com are from the early 2000s and predate the rules, not all are. On October 23, 2014, someone registered xn--google-36d.com and xn--google-37d.com, which are google.com with hard-to-see modifiers on the "g". They're registered through Dynadot, which provides no WHOIS info at all, so I can't guess who it is or what they intend.

There's no money in invalid IDNs, and there's no excuse for registries to permit them. The message here is that UA cuts both ways: if we ask users and browsers to handle IDNs, at the same time we ask registries and registrars not to give the users IDNs that are garbage.

Further apropos of the Farsight paper, it is my impression that the near-homographs it uses as examples are less likely to cause trouble than multi-label names with mixed L2R and R2L labels, which can render the names overlaid so that a name looks like paypal.com but is actually paypal.<R2Lstuff>.othername.com. There are ways to defend against this, e.g., the M3AAWG best practice on when to render IDNs, but it's not something we can ignore.

R's,
John
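
For anyone who wants to look at names like the ones above, here is a minimal Python sketch using only the standard library. It is an illustration, not the analysis described in this message: the function names (decode_label, describe_name, mixes_directions) are made up for the example, and it does not check IDNA2003 or IDNA2008 validity or any TLD's LGR. It only decodes each A-label, lists the non-ASCII code points so hard-to-see modifiers become visible, and flags names that contain both L2R and R2L labels.

```python
import unicodedata

ACE_PREFIX = "xn--"


def decode_label(label: str) -> str:
    """Return the Unicode form of one DNS label (A-labels are Punycode-decoded)."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def describe_name(name: str) -> list:
    """Per label: decoded text, non-ASCII code points, and whether it has R2L characters."""
    report = []
    for label in name.rstrip(".").split("."):
        text = decode_label(label)
        non_ascii = [
            (f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>"), unicodedata.category(c))
            for c in text
            if ord(c) > 127
        ]
        # Bidi classes R and AL cover Hebrew, Arabic, and other right-to-left letters.
        has_rtl = any(unicodedata.bidirectional(c) in ("R", "AL") for c in text)
        report.append(
            {"label": label, "unicode": text, "non_ascii": non_ascii, "rtl": has_rtl}
        )
    return report


def mixes_directions(name: str) -> bool:
    """Crude flag (an assumption, not a standard test): some labels R2L, some not."""
    flags = [entry["rtl"] for entry in describe_name(name)]
    return any(flags) and not all(flags)


if __name__ == "__main__":
    for entry in describe_name("xn--google-36d.com"):
        print(entry["label"], entry["non_ascii"], entry["rtl"])
```

Running it on xn--google-36d.com prints the code point, Unicode name, and general category of whatever is attached to the "g". A name flagged by mixes_directions is only a candidate for the overlaid-rendering problem and would still need a look at how it actually displays.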