Maria, thanks for sharing this.
One observation on 1.4.a (script mixing): some writing systems, notably Japanese, routinely mix Unicode scripts within a single label. For Japanese these scripts are Hiragana, Katakana and Han; Romaji (Latin) may also be used in conjunction with the others.
Here are a few resources that touch on that:
ICANN IDN guidelines 4.1
https://www.icann.org/en/system/files/files/idn-guidelines-22sep22-en.pdf
Unicode Technical Standard 39 (Restriction-Level Detection, Highly Restrictive)
http://www.unicode.org/reports/tr39/#Restriction_Level_Detection
Hope this is useful.
Dennis
From: UA-discuss <ua-discuss-bounces@icann.org> on behalf of "UA-discuss@icann.org" <ua-discuss@icann.org>
Reply-To: Maria Kolesnikova <masha@cctld.ru>
Date: Thursday, May 4, 2023 at 7:15 AM
To: "UA-discuss@icann.org" <ua-discuss@icann.org>
Subject: [EXTERNAL] [UA-discuss] Guidelines on linkification for URLs with non-ASCII characters
Dear all,
We are happy to share with you the Guidelines on linkification for URLs with non-ASCII characters, which were recently developed by the Russian Working Group on Universal Acceptance.
The document provides best practices for identifying domain names and email addresses in non-ASCII scripts within text and for automatically creating hyperlinks from them. It can be helpful for software developers implementing linkification mechanisms.
The document also includes some proposals on how to behave if script mixing is detected in any label of the domain name.
We hope these short guidelines will be of assistance in your work on Universal Acceptance implementation.
If you have any comments on the document, we would be glad to hear them.
With best regards,
Maria Kolesnikova