A chance to improve UA - Call for review of Unicode Linkification draft UTS #58
Universal Acceptance colleagues:

We have an opportunity to review a potentially very influential tool for promoting UA. The Unicode Consortium is drafting a proposed Unicode Technical Standard, #58, "Unicode Linkification".

As you know, linkification is the process of recognising URLs in plain text and turning them into active hyperlinks. A big UA problem is that many pieces of software do not linkify Internationalised Domain Names or globally inclusive email addresses. This stems in part from a limited conception that only Latin-script letters can appear in domain names or email addresses, and that only a few top-level domains are valid.

I believe that Unicode Technical Standards are very authoritative in the parts of the software world which we seek to influence. If this UTS gets linkification right, it will be a tool we can use to persuade apps which fall short to do better. If the UTS gets linkification wrong, it will be difficult for us to persuade apps to ignore the UTS and instead do what we say is right.

The Unicode Consortium has published a draft of UTS #58, "Unicode Linkification", at <https://www.unicode.org/reports/tr58/>. It is a detailed technical document, including algorithms for detecting the beginning and end of links and for escaping, and data on character properties.

They welcome feedback via a form linked from the draft. I encourage all UASG participants with ideas on how linkification should work in order to promote Universal Acceptance to review the draft and submit comments. I do not see a specific deadline, but we should not delay.

Best regards,
--Jim DeLaHunt, Vancouver, Canada

--
--Jim DeLaHunt, jdlh@jdlh.com http://blog.jdlh.com/ (http://jdlh.com/)
multilingual websites consultant, Vancouver, Canada
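To make the Latin-only assumption described above concrete, here is a minimal Python sketch. It is not the algorithm from the UTS #58 draft; both patterns and the name `find_links` are hypothetical, written only to show how a hard-coded ASCII pattern with a short TLD list silently skips internationalised names. The sample names use the reserved `.испытание` and `.テスト` test domains.

```python
import re

# Naive pattern: ASCII letters/digits only, and a hard-coded list of three TLDs.
ASCII_ONLY = re.compile(r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.(?:com|org|net)\b")

# A label is one or more runs of Unicode letters/digits, joined by hyphens.
# [^\W_] means "a word character other than underscore". In Python this does
# NOT include combining marks, so scripts that rely on them would still be
# cut short: the pattern is deliberately crude and only illustrative.
_LABEL = r"[^\W_]+(?:-+[^\W_]+)*"

# More inclusive pattern: two or more dot-separated labels, no TLD allow-list.
INCLUSIVE = re.compile(_LABEL + r"(?:\." + _LABEL + r")+")


def find_links(pattern: re.Pattern, text: str) -> list:
    """Return every substring of `text` that `pattern` treats as a domain name."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    sample = "See example.com and пример.испытание and 例え.テスト today."
    print(find_links(ASCII_ONLY, sample))
    print(find_links(INCLUSIVE, sample))
```

The inclusive pattern also over-matches (for instance, it would accept a decimal number such as 3.14), which is part of why a carefully specified standard is more useful than ad hoc patterns like these.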
UA Colleagues:

Someone pointed out to me a challenging case for linkification. I would like your help gathering examples to suggest to the UTS #58 work at Unicode.

As you know, some scripts in the world use right-to-left directionality, e.g. Arabic, Hebrew, Adlam. Many scripts in the world use left-to-right directionality, e.g. Latin, Thai.

Can anyone give me examples of plausible *email addresses* from the real world which include both right-to-left and left-to-right text, and which might cause problems for linkification algorithms?

What about examples of *domain names* which include both right-to-left and left-to-right text?

What about *full URLs* (i.e. domain name plus path) which include both right-to-left and left-to-right text? Maybe the domain name has one directionality and the path elements have the other. Or some path elements have one directionality and other path elements have the other.

I can invent contrived examples, but examples drawn from real usage are much more compelling. I would appreciate any suggestions, either as replies to this email discussion or directly to me. I will submit the most helpful as comments to UTS #58.

Also, I repeat my encouragement for you to submit any suggestions you have for the UTS #58 proposed standard on linkification directly to the Unicode Consortium, following the instructions at <https://www.unicode.org/reports/tr58/>.

Best regards,
--Jim DeLaHunt

On 2024-12-06 13:56, Jim DeLaHunt via UA-discuss wrote:
> Universal Acceptance colleagues:
> We have an opportunity to review a potentially very influential tool for promoting UA. The Unicode Consortium is drafting a proposed Unicode Technical Standard, #58, "Unicode Linkification".
> [...]
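For anyone screening candidate examples of the mixed-direction cases asked about above, a small Python sketch like the following may help. It uses the standard `unicodedata.bidirectional` call to look for strongly left-to-right and strongly right-to-left characters in the same string. The helper names `strong_directions` and `is_mixed_direction` are hypothetical, and the sample strings are contrived rather than drawn from real usage.

```python
import unicodedata


def strong_directions(text: str) -> set:
    """Return the set of strong directions ("LTR", "RTL") present in `text`.

    Bidi class "L" is strong left-to-right; "R" and "AL" are strong
    right-to-left. Neutral and weak characters such as "@", ".", "/" and
    digits are ignored.
    """
    found = set()
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            found.add("LTR")
        elif bidi_class in ("R", "AL"):
            found.add("RTL")
    return found


def is_mixed_direction(text: str) -> bool:
    """True when `text` contains both strong LTR and strong RTL characters."""
    return strong_directions(text) == {"LTR", "RTL"}


if __name__ == "__main__":
    # Contrived samples only: an ASCII address, then Hebrew and Arabic labels
    # combined with Latin-script parts.
    for candidate in (
        "info@example.com",
        "info@דוגמה.example",
        "https://مثال.example/path",
    ):
        print(is_mixed_direction(candidate), sorted(strong_directions(candidate)))
```

This only flags that both directions occur. It says nothing about how the string will be displayed or where a linkifier should end the link, which are the harder questions.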
participants (2): anil Jain, Jim DeLaHunt