Maria, thanks for sharing this.

One observation on 1.4.a (script mixing): some writing systems, Japanese in particular, routinely mix Unicode scripts within a single label. Japanese combines Hiragana, Katakana and Han, and Romaji (Latin) may be used alongside them as well.

Here are a few resources that touch on that:

ICANN IDN guidelines 4.1 https://www.icann.org/en/system/files/files/idn-guidelines-22sep22-en.pdf

Unicode Technical Standard 39 (Restriction-Level Detection, Highly Restrictive) http://www.unicode.org/reports/tr39/#Restriction_Level_Detection
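
To make the point concrete, here is a rough sketch of a check that accepts the Japanese combination while still flagging other mixes. It is only loosely modeled on the idea behind the restriction levels in UTS 39 and is not an implementation of that algorithm. The function names, the prefix table and the use of Unicode character names as a stand-in for the Script property are all my own assumptions (the Python standard library does not expose Script directly), so treat it as an illustration only.

```python
import unicodedata

# Assumption: approximate a character's script from the start of its
# Unicode name. This is a heuristic, not the real Script property.
NAME_PREFIX_TO_SCRIPT = {
    "HIRAGANA": "Hiragana",
    "KATAKANA": "Katakana",
    "CJK UNIFIED IDEOGRAPH": "Han",
    "LATIN": "Latin",
    "CYRILLIC": "Cyrillic",
}

# The script combination described above for Japanese labels.
JAPANESE_SCRIPTS = {"Hiragana", "Katakana", "Han", "Latin"}


def scripts_in_label(label):
    """Return the set of (approximate) scripts used in one label."""
    scripts = set()
    for ch in label:
        # Digits and the hyphen are shared by all scripts; skip them.
        if ch.isdigit() or ch == "-":
            continue
        name = unicodedata.name(ch, "")
        for prefix, script in NAME_PREFIX_TO_SCRIPT.items():
            if name.startswith(prefix):
                scripts.add(script)
                break
        else:
            # Anything outside the small table above is lumped together.
            scripts.add("Other")
    return scripts


def is_acceptable_mix(label):
    """True for single-script labels and for the Japanese combination."""
    scripts = scripts_in_label(label)
    if "Other" in scripts:
        # Conservative: this sketch cannot classify the label, so flag it.
        return False
    return len(scripts) <= 1 or scripts <= JAPANESE_SCRIPTS


print(is_acceptable_mix("日本語のドメイン"))  # Han + Hiragana + Katakana
print(is_acceptable_mix("p\u0430ypal"))       # Latin + Cyrillic "а"
```

A real implementation would use proper Script and Script_Extensions data (for example from ICU) rather than character names, and would cover the other combinations that UTS 39 permits.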

Hope this is useful.

Dennis

From: UA-discuss <ua-discuss-bounces@icann.org> on behalf of "UA-discuss@icann.org" <ua-discuss@icann.org>
Reply-To: Maria Kolesnikova <masha@cctld.ru>
Date: Thursday, May 4, 2023 at 7:15 AM
To: "UA-discuss@icann.org" <ua-discuss@icann.org>
Subject: [EXTERNAL] [UA-discuss] Guidelines on linkification for URLs with non-ASCII characters

Dear all,

We are happy to share with you the Guidelines on linkification for URLs with non-ASCII characters, which were recently developed by the Russian Working Group on Universal Acceptance.

The document provides best practices for identifying domain names and email addresses in non-ASCII scripts within a text and automatically creating hyperlinks for them. It can be helpful for software developers implementing linkification mechanisms. The document also includes some proposals on how software should behave if script mixing is detected in any label of a domain name.

We hope these short guidelines will be of assistance in your work on Universal Acceptance implementation.

If you have any comments on the document, we would be glad to hear them.

With best regards,

Maria Kolesnikova