Review of UASG018

Don Hollander

Dec. 31, 2018

9:30 p.m.

Please find attached a link to a proposed revision of UASG018.

http://viagenie.ca/ua/UASG_Programming_Language_Framework_Review_v1.0%20-%20...

Comments welcome, of course.

Don

Don Hollander
Secretary General - UASG
Skype: Don_Hollander


Jim DeLaHunt

January 2019

1:36 a.m.

Don:

Thank you for sending this document out for review. It looks like this is only part of the complete content. Either a separately-authored piece of content needs to be merged in, or a companion document needs to be distributed with this one and cross-referenced. In either case, I don't think this .docx file by itself fulfills the goal of UASG018. Details below.

On 2018-12-31 13:30, Don Hollander wrote:

...

Please find attached a link to a proposed revision of UASG018.

http://viagenie.ca/ua/UASG_Programming_Language_Framework_Review_v1.0%20-%20...

Comments welcome, of course.

Don

Don Hollander

Secretary General – UASG

Skype: Don_Hollander

My background: I gave a presentation on the assessments encouraged by this document in Oct 2017 to the 41st Internationalization and Unicode Conference (/Universal Acceptance of non-Latin email addresses and domain names: how does your framework rate? (IUC41 presentation)/, <http://blog.jdlh.com/en/2017/10/31/universal-acceptance-eval-iuc41/>).

Top level comments from a quick read-through of this version:

/About this Document/, p. 5: "Technical details required by those performing library evaluations are presented in a separate document. This separation of documents is purely due to technical restrictions in the document platform. [Footnote] The tables in the technical presentation are wide, and best presented in landscape form. Google Docs cannot at present mix portrait and landscape pages in a single document."

What is this "separate document"? I don't see a cross-reference to it. I don't see it circulated with this document. The purpose of UASG018 is, as I understand it, to describe an evaluation methodology in sufficient detail that some competent engineer could use it as a guide to performing an evaluation of a framework, and that evaluations done by different people following UASG018 for different frameworks would provide comparable insight about the frameworks. Without the "technical details required by those performing library evaluations", this document is incomplete for that task.

And, why accept this limitation of the Google Docs tool as a reason for making UASG018 a half-document? There are many alternatives.

1. Find an authoring tool more up to the requirements of the content.
2. Author the landscape-format tables as a separate Google Doc making a separate PDF file, and merge the PDF files into one UASG018 in PDF form.
3. Author the tables in landscape orientation, on portrait-orientation pages, and let their text become tiny. Those reading online can zoom in to make the text readable.
4. Distribute UASG018 as two PDF or .docx files.

But I don't think this single .docx file as distributed is acceptable to do the work of UASG018.

/Page count/: was 24 pages in Version 0.96 (March 10th, 2017), the previous copy I had locally. It is 17 pages in Version 1.1 (July 13th, 2018). The list of changes doesn't explain to me how 7 pages were cut. Some is editorial: a blank page 2 was dropped. But some must have been substantive. The revision history should give a clue if scope was changed.

/File name/: the document circulated has "1.0" and 2018-12-31 in its file name, /UASG_Programming_Language_Framework_Review_v1.0 - VG 2018-12-31.docx/ . But the content of the doc says it is "Version 1.1 (July 13th, 2018)". I suggest that both version number and date in the filename match the document content, or else it will be hard to find the correct version of the document.

/Font of code samples/: e.g. Appendix A, code sample 1. On my computer, most of the code is in the Roboto Mono font, but the line "const char *name = u8"普遍接受-测试.世界";" appears in SimSun font. This is probably to represent the Chinese text in the example well. However, SimSun looks very different from Roboto Mono. It has thinner lines, different character widths, the glyphs have serifs, and the quote marks appear slanted instead of as upright C-language string delimiters. The code samples should be formatted in a consistent font.

/Appendix B - References/, page 16: The references to external documents are given as links, with link text being the title of the document, and the link reference being the URL of the document. This is workable but not terribly clear. It would be better to give the title, the date of the revision accessed, then the URL of the document as both link text and link reference. This way, someone who copies the reference and pastes into a text-only document won't lose the link. It is also closer to traditional citation style.

/"a set of comprehensive test data", Footnote 3, page 7/: This footnote text links to <ftp://ftp.unicode.org/Public/idna/latest/IdnaMappingTable.txt>. If that resource is useful enough to link to, it should be in Appendix B as well.

/Test case strings in reusable form/: I had suggested before that the strings listed as test cases in this document (and in its "technical details required by those performing library evaluations" sister document) also be provided by UASG as reusable data files. I recall you encouraging me to do this, and I have not yet followed through. I still think it's a good idea. This review cycle might be an opportunity to get it done. Once done, this document should link to that resource also.

I hope these comments are helpful in getting discussion going. I look forward to hearing from others.

Best regards,
--Jim DeLaHunt, Vancouver, Canada

-- 
--Jim DeLaHunt, jdlh@jdlh.com http://blog.jdlh.com/ (http://jdlh.com/)
multilingual websites consultant
355-1027 Davie St, Vancouver BC V6E 4L2, Canada
Canada mobile +1-604-376-8953
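To make the suggestion about reusable test case data more concrete, here is a minimal sketch of what a machine-readable test case file and a loader might look like. The JSON field names (`test`, `input`, `expected`), the helper names `load_cases` and `run_case`, and the address `user@普遍接受-测试.世界` are assumptions made for illustration and are not taken from UASG018. Only the test identifiers H-DND and H-ED and the domain string come from this thread, and the naive string splitting stands in for whatever library is being evaluated.

```python
import json

# Hypothetical layout for a reusable test-case file; the field names are assumptions.
TEST_CASES_JSON = """
[
  {"test": "H-DND", "input": "普遍接受-测试.世界",
   "expected": ["普遍接受-测试", "世界"]},
  {"test": "H-ED", "input": "user@普遍接受-测试.世界",
   "expected": ["user", "普遍接受-测试.世界"]}
]
"""


def load_cases(text):
    """Parse the JSON document into a list of test-case dicts."""
    return json.loads(text)


def run_case(case):
    """Run one case against a naive stand-in for the library under test."""
    if case["test"] == "H-DND":
        # Decompose a domain name into its component labels.
        actual = case["input"].split(".")
    elif case["test"] == "H-ED":
        # Decompose an email address into mailbox and domain at the last "@".
        mailbox, _, domain = case["input"].rpartition("@")
        actual = [mailbox, domain]
    else:
        raise ValueError(f"unknown test id: {case['test']}")
    return actual == case["expected"]


if __name__ == "__main__":
    for case in load_cases(TEST_CASES_JSON):
        print(case["test"], "pass" if run_case(case) else "fail")
```

Keeping inputs and expected results in one data file would let evaluators of different frameworks run the same strings, which is the comparability the methodology is after.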

Jim Hague

4:19 p.m.

On 05/01/2019 01:36, Jim DeLaHunt wrote:

...

/Page count/: was 24 pages in Version 0.96 (March 10th, 2017), the previous copy I had locally. It is 17 pages in Version 1.1 (July 13th, 2018). The list of changes doesn't explain to me how 7 pages were cut. Some is editorial: a blank page 2 was dropped. But some must have been substantive. The revision history should give a clue if scope was changed.

As one of the authors of the initial version of UASG018, I was interested in what substantive changes had been made. On the basis of a brief review, I can summarise these as:

1. All non-technical library evaluation has been removed.
2. Low-level tests removed:
   - L-R2A, IDNA2008, convert registration label to ASCII registry form.
   - L-DNC, IDNA2008, domain name equivalence comparison.
3. High-level tests removed:
   - H-DND, Domain name, decompose into component labels.
   - H-ED, Email address, decompose into components (i.e. mailbox, domain).
   - H-US, URL, syntactic check.
   - H-UD, URL, decompose into components (i.e. scheme, domain, user, port, path, arguments)
4. High-level test added:
   - H-ID, identifier lookup, compare identifier stored in the system against one used to authenticate user (whatever that means). Verify it follows RFC8264.

We would be intrigued to learn the rationale for these changes.

The non-technical library evaluation is intended to summarise information developers need when deciding to incorporate a library into a product. The aspects summarised are, in practice, at least as important as simple technical correctness.

The removed tests all cover operations that application programmers need to perform and, with the possible exception of L-R2A, need to perform frequently. These operations are complex to perform correctly within an application, and so are very commonly not implemented correctly.

Finally, we are puzzled as to the usefulness of H-ID in the context of UA. RFC5890 explicitly notes that it does not use Stringprep (RFC3454). PRECIS (RFC8264), the successor to Stringprep, is therefore surely irrelevant to IDNA/UA?

-- 
Jim Hague - jim@sinodun.com
Never trust a computer you can't lift.

Marc Blanchet

5:31 p.m.

On 7 Jan 2019, at 11:19, Jim Hague wrote:

...

On 05/01/2019 01:36, Jim DeLaHunt wrote:

...
/Page count/: was 24 pages in Version 0.96 (March 10th, 2017), the previous copy I had locally. It is 17 pages in Version 1.1 (July 13th, 2018). The list of changes doesn't explain to me how 7 pages were cut. Some is editorial: a blank page 2 was dropped. But some must have been substantive. The revision history should give a clue if scope was changed.

As one of the authors of the initial version of UASG018, I was interested in what substantive changes had been made.

thanks for the review. A bit of rationale: - we wanted to have a short, to-the-point, implementable, « scientific » list of test cases to be implemented in a test suite.

...

On the basis of a brief review, I can summarise these as:

1. All non-technical library evaluation has been removed.

from rationale above, these were removed on purpose. Not that they don’t have any value, just we did not intend to fill those. Also, many of them are subjective and subject to interpretation.

...

2. Low-level tests removed:
   - L-R2A, IDNA2008, convert registration label to ASCII registry form.
   - L-DNC, IDNA2008, domain name equivalence comparison.
   - H-DND, Domain name, decompose into component labels.

Could not find many libraries (in those that we have tested) offering these operations/API.

...

   - H-ED, Email address, decompose into components (i.e. mailbox, domain).
   - H-US, URL, syntactic check.
   - H-UD, URL, decompose into components (i.e. scheme, domain, user, port, path, arguments)

IRIs are currently in bad shape in the standards. An IETF WG tried to fix the earlier RFC (which is fairly incomplete and buggy), but the WG did not succeed and stopped. Moreover, there are browser specs that differ from the RFCs. So it is currently a difficult and unclear environment. Therefore, we decided to focus only on the domain name of a URL/URI/IRI.
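As a rough illustration of what focusing on the domain name of a URL could mean in practice, the sketch below pulls the host out of a URL string and converts it to its ASCII form. The function names `domain_of` and `to_ascii` and the sample URL are hypothetical. Note that Python's built-in `idna` codec implements IDNA2003, not IDNA2008, so it only stands in for a conformant library here.

```python
from urllib.parse import urlsplit


def domain_of(url):
    """Return the lower-cased host part of a URL/IRI string, or None if absent."""
    return urlsplit(url).hostname


def to_ascii(domain):
    """Convert a domain name to ASCII labels.

    Assumption: uses the stdlib "idna" codec, which is IDNA2003, not IDNA2008.
    """
    return domain.encode("idna").decode("ascii").lower()


if __name__ == "__main__":
    host = domain_of("http://bücher.example:8080/a?b=c")
    print(host, to_ascii(host))
```

Scheme, port, path and query are ignored on purpose; only the host is handed to the IDNA step.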

...

4. High-level test added: - H-ID, identifier lookup, compare identifier stored in the system against one used to authenticate user (whatever that means). Verify it follows RFC8264.

Because email addresses are often used as identifiers in various systems, protocols, and authentication frameworks (e.g. most web apps use your email address as username). Support for them is therefore important: if a library does not correctly handle your i18n email address as an identifier, then you cannot log in. That is pretty big to me! And RFC8264 is the current standard for i18n identifiers.

...

We would be intrigued to learn the rationale for these changes.

Hope the rationale and details above answer your questions.

Regards, Marc (on behalf of Viagenie team)

...

The non-technical library evaluation is intended to summarise information developers need when deciding to incorporate a library into a product. The aspects summarised are, in practice, at least as important as simple technical correctness.

The removed tests all cover operations that application programmers need to perform and, with the possible exception of L-R2A, need to perform frequently. These operations are complex to perform correctly within an application, and so are very commonly not implemented correctly.

Finally, we are puzzled as to the usefulness of H-ID in the context of UA. RFC5890 explicitly notes that it does not use Stringprep (RFC3454). PRECIS (RFC8264), the successor to Stringprep, is therefore surely irrelevant to IDNA/UA?

-- 
Jim Hague - jim@sinodun.com
Never trust a computer you can't lift.

Jim Hague

2:44 p.m.

On 07/01/2019 17:31, Marc Blanchet wrote:

...

On 7 Jan 2019, at 11:19, Jim Hague wrote:

...
As one of the authors of the initial version of UASG018, I was interested in what substantive changes had been made.

thanks for the review. A bit of rationale: - we wanted to have a short, to-the-point, implementable, « scientific » list of test cases to be implemented in a test suite.

OK, thanks. I think this is a bit of a change of emphasis, then; we were aiming at producing a document that would guide developers in their choice of a library and inform them of how much of the functionality they might require is present.

...

...
On the basis of a brief review, I can summarise these as:

1. All non-technical library evaluation has been removed.

from rationale above, these were removed on purpose. Not that they don’t have any value, just we did not intend to fill those. Also, many of them are subjective and subject to interpretation.

There is, indeed, some subjectivity, which is why those items are not scored. We did feel they were items developers would need in order to make a decision about a library. Put another way, it's certainly information that would heavily influence any decision I might make about the worth and usability of any library I'm evaluating for use in my applications.

...

...
2. Low-level tests removed:
   - L-R2A, IDNA2008, convert registration label to ASCII registry form.
   - L-DNC, IDNA2008, domain name equivalence comparison.
   - H-DND, Domain name, decompose into component labels.

Could not find many libraries (in those that we have tested) offering these operations/API.

That's true. Again, we were starting from the point of view of functions that developers might need from a library, and rating libraries against those potential needs, with a view to informing library authors/vendors about items that would be required to improve UA takeup. We feel that the current state of library availability and range of functions inhibits correct implementation of UA. However, if the goal of this document is moved to just document current provision, removing the tests is consistent (though in the case of L-DNC, I note that at least one major library does implement that function).

...

...
   - H-ED, Email address, decompose into components (i.e. mailbox, domain).
   - H-US, URL, syntactic check.
   - H-UD, URL, decompose into components (i.e. scheme, domain, user, port, path, arguments)

IRIs are currently in bad shape in the standards. An IETF WG tried to fix the earlier RFC (which is fairly incomplete and buggy), but the WG did not succeed and stopped. Moreover, there are browser specs that differ from the RFCs. So it is currently a difficult and unclear environment. Therefore, we decided to focus only on the domain name of a URL/URI/IRI.

I understand that IRIs are a minefield; our intention was that developers might have access at least to current best practice functions and have a clear understanding of their limitations rather than carry on rolling their own. Does the same apply to email address decomposition?

...

...
4. High-level test added: - H-ID, identifier lookup, compare identifier stored in the system against one used to authenticate user (whatever that means). Verify it follows RFC8264.

Because email addresses are often used as identifiers in various systems, protocols, and authentication frameworks (e.g. most web apps use your email address as username). Support for them is therefore important: if a library does not correctly handle your i18n email address as an identifier, then you cannot log in. That is pretty big to me! And RFC8264 is the current standard for i18n identifiers.

Our version of the document pre-dates RFC8264, so adding a test for PRECIS identifier comparison is a good thing and (if I read the RFC correctly) forms a superset of L-DNC, where L-DNC is H-ID with the IDNA2008 profile.

However, looking at the case of an email address as identifier, is it your intention that two email addresses with different domains that compare equal under IDNA2008 should compare equal when passed to H-ID? In other words, that H-ID will spot they are email addresses, decompose to mailbox and domain, and perform type-specific comparison on each?

I'm wondering why this function is classified high-level, rather than being presented as a simple low-level comparison or transformation function?

-- 
Jim Hague - jim@sinodun.com
Never trust a computer you can't lift.
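The behaviour asked about above (spot an email address, decompose it, then compare each part with type-specific rules) can be sketched as follows. This is only an illustration under loud assumptions: the function names are hypothetical, the mailbox comparison uses plain NFC normalization as a simplified stand-in for a real PRECIS (RFC8264) profile, and the domain comparison uses Python's built-in `idna` codec, which implements IDNA2003 rather than IDNA2008.

```python
import unicodedata


def split_address(address: str) -> tuple[str, str]:
    """Decompose an email address into (mailbox, domain) at the last "@"."""
    mailbox, sep, domain = address.rpartition("@")
    if not sep or not mailbox or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return mailbox, domain


def mailbox_key(mailbox: str) -> str:
    # Assumption: NFC only, case preserved. A real PRECIS profile does more.
    return unicodedata.normalize("NFC", mailbox)


def domain_key(domain: str) -> bytes:
    # Assumption: stdlib codec is IDNA2003; lower() folds ASCII-only labels.
    return domain.encode("idna").lower()


def same_identifier(a: str, b: str) -> bool:
    """Compare two addresses part by part, each with its own rule."""
    mailbox_a, domain_a = split_address(a)
    mailbox_b, domain_b = split_address(b)
    return (mailbox_key(mailbox_a) == mailbox_key(mailbox_b)
            and domain_key(domain_a) == domain_key(domain_b))


if __name__ == "__main__":
    # Decomposed vs precomposed "é" in the mailbox, different case in the domain.
    print(same_identifier("jose\u0301@BÜCHER.example", "jos\u00e9@bücher.example"))
```

Written this way, the high-level operation is just H-ED followed by two low-level comparisons, which is one way to read the classification question.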
