Guidelines on linkification for URLs with non-ASCII characters
Dear all, We are happy to share with you the Guidelines on linkification for URLs with non-ASCII characters, recently developed by the Russian Working Group on Universal Acceptance. The document provides best practices for identifying domain names and email addresses in non-ASCII scripts within a text and for automatically creating hyperlinks that contain them. It can be helpful for software developers implementing linkification mechanisms. The document also includes some proposals on how to behave if script mixing is detected in any label of the domain name. We hope these short guidelines will be of assistance in your work on Universal Acceptance implementation. If you have any comments on the document, we would be glad to hear them. With best regards, Maria Kolesnikova
Dear Maria, Thank you very much for sharing, and keep up the good work. These guidelines certainly deserve attention. Well done. Have a nice day. (In reply to the message that Maria Kolesnikova sent via UA-discuss <ua-discuss@icann.org> on May 4, 2023, at 2:15 PM, with the attachment Guidelines_on_linkification_for_URLs_with_nonASCII_characters.pdf.)
_______________________________________________
UA-discuss mailing list
UA-discuss@icann.org
https://mm.icann.org/mailman/listinfo/ua-discuss
_______________________________________________
By submitting your personal data, you consent to the processing of your personal data for purposes of subscribing to this mailing list in accordance with the ICANN Privacy Policy (https://www.icann.org/privacy/policy) and the website Terms of Service (https://www.icann.org/privacy/tos). You can visit the Mailman link above to change your membership status or configuration, including unsubscribing, setting digest-style delivery or disabling delivery altogether (e.g., for a vacation), and so on.
Maria, thanks for sharing this. One observation on 1.4.a (script mixing): some writing systems, notably Japanese, mix Unicode scripts within a single label. These scripts are Hiragana, Katakana and Han; Romaji (Latin) might be used in conjunction with the others too. Here are a few resources that touch on that:
- ICANN IDN guidelines 4.1: https://www.icann.org/en/system/files/files/idn-guidelines-22sep22-en.pdf
- Unicode Technical Standard 39 (Restriction-Level Detection, Highly Restrictive): http://www.unicode.org/reports/tr39/#Restriction_Level_Detection
Hope this is useful. Dennis (In reply to the message from Maria Kolesnikova <masha@cctld.ru> to UA-discuss@icann.org, dated Thursday, May 4, 2023 at 7:15 AM.)
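The script-mixing observation above can be illustrated with a short Python sketch. It is only a rough approximation under stated assumptions and is not taken from the guidelines or from UTS 39: the standard library has no Unicode Script property, so the script of each character is guessed from its Unicode character name, and the helper names `scripts_in_label` and `is_acceptable_mix` as well as the `JAPANESE_MIX` set are hypothetical. A label passes if it uses a single script or only the combination named in the message (Hiragana, Katakana, Han, optionally Latin).

```python
import unicodedata

# Assumption: script is guessed from the start of the Unicode character name.
# A real implementation would use the Script / Script_Extensions properties.
_NAME_PREFIXES = (
    ("HIRAGANA", "Hiragana"),
    ("KATAKANA", "Katakana"),
    ("CJK UNIFIED IDEOGRAPH", "Han"),
    ("LATIN", "Latin"),
    ("CYRILLIC", "Cyrillic"),
)

# Hypothetical allow-list: the Japanese combination mentioned in the thread.
JAPANESE_MIX = frozenset({"Hiragana", "Katakana", "Han", "Latin"})


def scripts_in_label(label):
    """Return the set of (approximate) scripts used in one domain label."""
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue  # digits and hyphen are treated as script-neutral
        name = unicodedata.name(ch, "")
        for prefix, script in _NAME_PREFIXES:
            if name.startswith(prefix):
                scripts.add(script)
                break
        else:
            # Fallback: use the first word of the name as a rough script tag.
            scripts.add(name.split(" ")[0] if name else "Unknown")
    return scripts


def is_acceptable_mix(label):
    """True for single-script labels and for the Japanese combination."""
    scripts = scripts_in_label(label)
    return len(scripts) <= 1 or scripts <= JAPANESE_MIX


if __name__ == "__main__":
    for label in ("пример", "日本語のドメイン", "p\u0430ypal"):
        print(label, sorted(scripts_in_label(label)), is_acceptable_mix(label))
```

With this sketch, a purely Cyrillic label and a Japanese label mixing Han, Hiragana and Katakana are both accepted, while a Latin label containing a single Cyrillic letter is flagged. Whether a flagged label should still be linkified is a policy decision that the sketch does not address.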
Dear All, Dennis, Thank you for all the comments we have received so far on the Guidelines on linkification for URLs with non-ASCII characters. We really appreciate them! Based on your input, the Russian working group has prepared an updated version of the guidelines that takes into account local languages in which script mixing is allowed by default. Initially we focused mostly on the Cyrillic script and the local languages based on it, which is why this aspect of script mixing was not considered. Here are both versions of the document for your convenience: redlined and updated. The script mixing paragraph 1.4 has been extended with point (d), along with some other additions. We hope our work will be helpful for the whole community. If UASG members consider this document valuable, you are welcome to use the proposed guidelines for further development. With warm wishes, Maria Kolesnikova (In reply to the message from Tan Tanaka, Dennis <dtantanaka@verisign.com>, sent Thursday, May 4, 2023 4:54 PM.)
Hi Maria, Thanks for addressing the observation. You are correct that script mixing does not make sense in a single-script context, but since this paper may be intended for a wider audience, a clarification is warranted. The updated version provides that, so thank you. Just one minor edit: the “more information” references in 1.4.d should be 3.3 and 3.4, the IDN guidelines and Unicode Report 39 respectively. Best, Dennis (In reply to the message from Maria Kolesnikova <masha@cctld.ru>, dated Tuesday, June 6, 2023 at 6:37 AM.)
Dear Maria, Congratulations to the Russian Working Group on Universal Acceptance, and especially to Maria. This work is expected to help the entire community. Best regards, Anil Kumar Jain (In reply to the message sent to ua-discuss@icann.org on Thursday, May 4, 2023 4:45:20 PM.)
participants (4)
- Abdalmonem Galila
- Anil Jain
- Maria Kolesnikova
- Tan Tanaka, Dennis