All,

I've started to put together a list of tentative cross-script homoglyphs. It is based partly on tables published by Unicode and on data found in LGR proposals for the Root Zone. I've augmented the set with some of my own research. I've also indicated whether a code point is considered likely to be in widespread modern use, as indicated by its inclusion in the Maximal Starting Repertoire for the Root Zone, MSR-2.

Unicode's data cover code points that are "intentional" homoglyphs (that is, expected to look the same). Unfortunately for anyone working with IDNA 2008, they contain a lot of irrelevant entries (which might be useful for other types of identifiers, perhaps), and they are presented in a format that requires knowing the NFD decomposition of every code point; easy for an algorithm, difficult for human reviewers working off IDNA 2008 PVALID lists (which are in NFC, that is, composed). Finally, there are some curious omissions in the data. (Unicode publishes a rather larger list of "confusables", but my take is that there, the signal-to-noise ratio is unfavorable.) I have removed the few items that constituted pure in-script duplication.

The LGR data contain some additional suggested homoglyphs. Some of these are not as purely "intentional" as the Unicode set, but as they have been reviewed by the relevant communities, I've added them here.

I have not added homoglyphs that cross script boundaries but stay inside a multi-script writing system, like the set of homoglyph code points that links Hiragana and Katakana. However, it might be useful to add the set of Kana-to-Han homoglyphs (because they might be usable to spoof Chinese-only domains). Whether or not that is useful is one of the questions I hope to get answered by sharing the collection at this stage. (So far, they are not listed.)

Comments welcome,

A./

PS: I have, as yet, not provided the full listing of homoglyph relations where one script has a precomposed code point and the other has a combining sequence.
(Many of these code points are not in the widest use, so they do not constitute a priority.)

PPS: I'm using an RFC 7940-based tool suite, so the result is formatted and looks like an LGR, but that's not the point. It's just like the guy with the hammer, to whom everything looked like a nail. The match is not really that bad in this case; the formatting gives some nice freebies like automatic display of Unicode names, script values and the like. However, a final collection might look very different, so I request feedback on the contents and scope, not the layout.
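
To illustrate the NFC/NFD point and the precomposed-versus-sequence case from the PS, here is a minimal sketch in Python, using only the standard unicodedata module. The helper names (describe, nfd_view) and the sample code points are chosen purely for illustration; they are not taken from the collection or from the tool suite mentioned above.

```python
import unicodedata


def describe(text):
    """Return 'U+XXXX NAME' for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


def nfd_view(code_point):
    """Show the NFD decomposition of a single (NFC) code point."""
    return describe(unicodedata.normalize("NFD", chr(code_point)))


# Two precomposed letters from different scripts. A reviewer working from
# an NFC list sees U+00EB and U+0451; data keyed on NFD instead lists the
# base letters plus the combining mark.
latin_e_diaeresis = nfd_view(0x00EB)  # LATIN SMALL LETTER E WITH DIAERESIS
cyrillic_io = nfd_view(0x0451)        # CYRILLIC SMALL LETTER IO

# Precomposed in one script, combining sequence in the other:
# Latin e + combining acute composes to U+00E9 under NFC, while
# Cyrillic ie + combining acute has no precomposed form and stays
# a two-code-point sequence.
latin_nfc = unicodedata.normalize("NFC", "\u0065\u0301")
cyrillic_nfc = unicodedata.normalize("NFC", "\u0435\u0301")

if __name__ == "__main__":
    print(latin_e_diaeresis)
    print(cyrillic_io)
    print(describe(latin_nfc))
    print(describe(cyrillic_nfc))
```

Both decompositions end in the same combining mark, so the homoglyph relation reduces to the base letters (Latin e and Cyrillic ie). The last two lines show why the precomposed-versus-sequence relations need separate treatment when working from NFC lists.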