This document is mechanically formatted from the XML file for the LGR. It provides additional summary data and explanatory text. The XML file remains the sole normative specification of the LGR.
| Date | 2016-12-12 |
|---|---|
| LGR Version | 1 |
| Unicode Version | 10.0.0 |
Note to Reviewers:
While this document is formatted like an LGR, it does not define a
repertoire or Label Generation Rules. Instead, it uses the variant formalism to present sets of
code points that are identical across scripts. Doing so allowed the use of existing tools for
verification and cross-checking tasks, as well as formatting the data into HTML-based tables.
Review is solicited on whether the code points constitute cross-script homoglyphs and whether having
such a data collection is considered useful.
This file presents a collection of code points and code point sequences that could be considered cross-script homoglyphs. The focus is on code points that cannot be distinguished, because they are shown with identical glyphs in most or all fonts.
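As a small illustration of what "cannot be distinguished" means in practice, the following sketch uses Python's standard `unicodedata` module to show that two entries from this collection, U+0061 and U+0430, are distinct code points with distinct character names, even though most fonts draw them with the same glyph. The helper name `describe` is made up for this example.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

latin_a = "\u0061"     # LATIN SMALL LETTER A
cyrillic_a = "\u0430"  # CYRILLIC SMALL LETTER A

print(describe(latin_a))
print(describe(cyrillic_a))
# The two strings compare unequal although they usually look the same.
print(latin_a == cyrillic_a)
```

A string comparison therefore keeps such labels apart, while a human reader typically cannot.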
This file was generated by starting with Intentional.txt from Unicode Version 10.0.0, filtered to exclude code points that are DISALLOWED in IDNA 2008 (the latter set is based on Unicode Version 6.3.0). Also filtered out were any "in-script" homoglyphs, that is, any code points that are identical to another code point of the same script and are not also cross-script homoglyphs.
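The filtering described above can be sketched roughly as follows. This is not the tool that produced the file: it does not read Intentional.txt or the IDNA 2008 tables, and the names `SCRIPT`, `DISALLOWED`, `PAIRS` and `cross_script_pairs` are assumptions made for this example. The sample data is a toy subset, with two private-use code points standing in for a pair that is identical only within one script.

```python
# Toy stand-ins for the real inputs (all data here is illustrative).
SCRIPT = {
    0x0061: "Latin", 0x0430: "Cyrillic",
    0x01DD: "Latin", 0x0259: "Latin", 0x04D9: "Cyrillic",
    0x0041: "Latin", 0x0410: "Cyrillic",
    0xE000: "Latin", 0xE001: "Latin",  # private-use stand-ins, in-script only
}
DISALLOWED = {0x0041, 0x0410}  # uppercase letters are DISALLOWED in IDNA 2008
PAIRS = [
    (0x0061, 0x0430),
    (0x01DD, 0x0259), (0x0259, 0x04D9), (0x01DD, 0x04D9),
    (0x0041, 0x0410),
    (0xE000, 0xE001),
]

def cross_script_pairs(pairs, script, disallowed):
    # Step 1: drop any pair that involves a DISALLOWED code point.
    allowed = [(a, b) for a, b in pairs
               if a not in disallowed and b not in disallowed]
    # Step 2: collect code points that have at least one cross-script partner.
    has_cross = set()
    for a, b in allowed:
        if script[a] != script[b]:
            has_cross.update((a, b))
    # Step 3: keep a pair only if both of its code points are
    # (also) cross-script homoglyphs.
    return [(a, b) for a, b in allowed if a in has_cross and b in has_cross]

kept = cross_script_pairs(PAIRS, SCRIPT, DISALLOWED)
for a, b in kept:
    print(f"U+{a:04X} <-> U+{b:04X}")
```

In this toy run the uppercase pair and the in-script-only pair are dropped, while the Latin pair U+01DD / U+0259 is kept because both code points are also homoglyphs of Cyrillic U+04D9.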
In addition, code points that were cited as cross-script homoglyphs in relevant LGR proposals have been added and referenced to the corresponding proposal.
This list is augmented by adding a few code points that are not intentionally the same, but effectively identical. Those code points may have a nominally distinct shape, as shown in the code charts, and while some fonts may make that distinction, many or most common fonts do not. In some cases a code point has two common glyph shapes, and one or both may be identical to the shape for another code point.
There is a much larger set of code points that are pairwise similar, sometimes confusingly so; compare the well-known example of DIGIT ONE (1) and SMALL L (l), which predates the development of IDNs. These "confusables" are not considered true homoglyphs and are excluded here.
A number of code points in this file belong to scripts that are not in widespread modern use, or are deprecated or obsolete code points in otherwise modern scripts. For these code points, known cross-script homoglyphs have nevertheless been listed. For deprecated and known obsolete code points, the conservative approach would be to not allow them, which also removes the problem of their homoglyph relations. Where the script itself is in limited use or not in modern use, it should be noted that it may not be understood well enough to be sure that all cross-script homoglyph relations are known. Such scripts may also have other as-yet-unidentified problems for use with identifiers.
Code points referenced with [100] are included in the MSR-2, which is limited to code points in widespread modern use. This does not mean that all of these code points are "safe", but at least they are moderately well understood, and information about them is available in this and similar data collections.
A note on Intentional.txt: the methodology for that file normalizes to NFD (fully decomposed) rather than to NFC (composed), the form in which IDNs are normalized. Accordingly, precomposed code points corresponding to NFD sequences that differ only by code points listed in Intentional.txt are also considered "intentional" here and are referenced as [151].
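The NFD-based reasoning above can be illustrated with a short sketch using the standard `unicodedata` module. The set `INTENTIONAL` below is a two-entry toy subset taken from the table in this document, not the contents of Intentional.txt, and the function name `differ_only_by_intentional` is an assumption made for this example.

```python
import unicodedata

# Toy subset of base-letter homoglyph pairs (e / Cyrillic ie, i / Cyrillic i).
INTENTIONAL = {frozenset(p) for p in [("\u0065", "\u0435"), ("\u0069", "\u0456")]}

def differ_only_by_intentional(a: str, b: str) -> bool:
    """True if the NFD forms of a and b have equal length and every
    position is either equal or a pair listed in INTENTIONAL."""
    da = unicodedata.normalize("NFD", a)
    db = unicodedata.normalize("NFD", b)
    if len(da) != len(db):
        return False
    return all(x == y or frozenset((x, y)) in INTENTIONAL
               for x, y in zip(da, db))

print(differ_only_by_intentional("\u00E8", "\u0450"))  # e-grave vs Cyrillic ie-grave
print(differ_only_by_intentional("\u00EF", "\u0457"))  # i-diaeresis vs Cyrillic yi
print(differ_only_by_intentional("\u00E8", "\u00EF"))  # e-grave vs i-diaeresis
```

The first two calls return `True` because the precomposed letters decompose to a homoglyph base letter plus the same combining mark; the third returns `False` because the base letters are not a listed pair.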
| Number of elements in Repertoire | 135 |
|---|---|
| Number of excluded elements | 2 |
| Total entries in table | 137 |
| Longest code point sequence | 2 |
| Number of code points | 135 |
| Number of sequences | 2 |
The following table lists the repertoire by code point (or code point sequence). The data in the Script and Name columns are extracted from the Unicode Character Database. Where a comment in the original LGR is equal to the character name, it has been suppressed.
For any code point or sequence for which a variant is defined, additional information is provided in the Variants column. Some code points or sequences listed in the following table are not part of the repertoire itself; they document targets for out-of-repertoire variant mappings or optional code points as indicated. See also the legend provided below the table.
| Code Point | Glyph | Script | Name | References | Required Context | Part of Repertoire | Variants | Comment |
|---|---|---|---|---|---|---|---|---|
| U+0061 | a | Latin | LATIN SMALL LETTER A | [100], [150], [202] | ✔ | set 1 | U+0061 is identical to U+0430 | |
| U+0063 | c | Latin | LATIN SMALL LETTER C | [100], [150], [202] | ✔ | set 2 | U+0063 is identical to U+0441 | |
| U+0064 | d | Latin | LATIN SMALL LETTER D | [100], [150] | ✔ | set 3 | U+0064 is identical to U+0501 | |
| U+0065 | e | Latin | LATIN SMALL LETTER E | [100], [150], [202] | ✔ | set 4 | U+0065 is identical to U+0435 | |
| U+0067 | g | Latin | LATIN SMALL LETTER G | [100], [204] | ✔ | set 5 | U+0067 not reliably distinguished from U+0581 | |
| U+0068 | h | Latin | LATIN SMALL LETTER H | [204] | ✔ | set 6 | U+0068 is not always distinct from U+04BB and U+0570 | |
| U+0069 | i | Latin | LATIN SMALL LETTER I | [100], [150], [202] | ✔ | set 7 | U+0069 is identical to U+0456 | |
| U+006A | j | Latin | LATIN SMALL LETTER J | [100], [150], [202] | ✔ | set 8 | U+006A is identical to U+03F3 | |
| U+006C | l | Latin | LATIN SMALL LETTER L | [100] | ✔ | set 9 | U+006C is frequently identical to U+04CF | |
| U+006E | n | Latin | LATIN SMALL LETTER N | [100], [204] | ✔ | set 10 | U+006E not always distinct from U+0578 | |
| U+006F | o | Latin | LATIN SMALL LETTER O | [100], [150], [202] | ✔ | set 11 | U+006F is identical to U+03BF and U+043E and U+0585 | |
| U+0070 | p | Latin | LATIN SMALL LETTER P | [100], [150], [202] | ✔ | set 12 | U+0070 is identical to U+0440 | |
| U+0071 | q | Latin | LATIN SMALL LETTER Q | [100], [204] | ✔ | set 13 | U+0071 not reliably distinguished from U+0566 and identical to U+051B | |
| U+0073 | s | Latin | LATIN SMALL LETTER S | [100], [150], [202] | ✔ | set 14 | U+0073 is identical to U+0455 | |
| U+0075 | u | Latin | LATIN SMALL LETTER U | [100], [204] | ✔ | set 15 | U+0075 not always distinct from U+057D | |
| U+0077 | w | Latin | LATIN SMALL LETTER W | ✔ | set 16 | U+0077 (not in MSR-2) identical to U+051D | ||
| U+0078 | x | Latin | LATIN SMALL LETTER X | [100], [150], [202] | ✔ | set 17 | U+0078 is identical to U+0445 | |
| U+0079 | y | Latin | LATIN SMALL LETTER Y | [100], [150], [202] | ✔ | set 18 | U+0079 is identical to U+0443 | |
| U+00E6 | æ | Latin | LATIN SMALL LETTER AE | [100], [150], [202] | ✔ | set 19 | U+00E6 is identical to U+04D5 | |
| U+00E7 | ç | Latin | LATIN SMALL LETTER C WITH CEDILLA | [100], [151] | ✔ | set 20 | U+00E7 is identical to U+04AB | |
| U+00E8 | è | Latin | LATIN SMALL LETTER E WITH GRAVE | [100], [151] | ✔ | set 21 | U+00E8 is identical to U+0450 | |
| U+00EB | ë | Latin | LATIN SMALL LETTER E WITH DIAERESIS | [100], [151] | ✔ | set 22 | U+00EB is identical to U+0451 | |
| U+00EF | ï | Latin | LATIN SMALL LETTER I WITH DIAERESIS | [100], [151] | ✔ | set 23 | U+00EF is identical to U+0457 | |
| U+00FF | ÿ | Latin | LATIN SMALL LETTER Y WITH DIAERESIS | ✔ | set 24 | U+00FF identical to U+04F0 | ||
| U+0115 | ĕ | Latin | LATIN SMALL LETTER E WITH BREVE | ✔ | set 25 | U+0115 identical to U+04D7 | ||
| U+0127 | ħ | Latin | LATIN SMALL LETTER H WITH STROKE | ✔ | set 26 | U+0127 identical to U+045B | ||
| U+0138 | ĸ | Latin | LATIN SMALL LETTER KRA | [150] | excluded-cp | ✗ | set 27 | Obsolete U+0138 is identical to U+043A and U+03BA |
| U+01DD | ǝ | Latin | LATIN SMALL LETTER TURNED E | [100], [150], [202] | ✔ | set 28 | U+01DD is identical to U+0259 and U+04D9 | |
| U+0259 | ə | Latin | LATIN SMALL LETTER SCHWA | [100], [150] | ✔ | set 28 | U+0259 is identical to U+04D9 and U+01DD | |
| U+025B | ɛ | Latin | LATIN SMALL LETTER OPEN E | [100], [150] | ✔ | set 29 | U+025B is identical to U+03B5 | |
| U+025C | ɜ | Latin | LATIN SMALL LETTER REVERSED OPEN E | [150] | ✔ | set 30 | U+025C is frequently identical to U+0437 | |
| U+0269 | ɩ | Latin | LATIN SMALL LETTER IOTA | [100], [150], [204] | ✔ | set 31 | U+0269 not reliably distinguished from U+0582 and identical to U+03B9 | |
| U+0269 U+0308 | ɩ̈ | [151] | ✔ | set 32 | U+0269 U+0308 not reliably distinguished from U+0582 U+0308 and identical to U+03CA | |||
| U+026A | ɪ | Latin | LATIN LETTER SMALL CAPITAL I | [100], [150] | ✔ | set 9 | U+026A intended to be identical to U+04CF but is often distinct; U+026A may be similar to U+006C, but is commonly distinct | |
| U+0275 | ɵ | Latin | LATIN SMALL LETTER BARRED O | [100], [150] | ✔ | set 33 | U+0275 is identical to U+04E9 | |
| U+0292 | ʒ | Latin | LATIN SMALL LETTER EZH | [100], [150] | ✔ | set 34 | U+0292 is identical to U+04E1 | |
| U+0299 | ʙ | Latin | LATIN LETTER SMALL CAPITAL B | [150] | ✔ | set 35 | U+0299 is identical to U+0432 | |
| U+029C | ʜ | Latin | LATIN LETTER SMALL CAPITAL H | [150] | ✔ | set 36 | U+029C is identical to U+043D | |
| U+0306 | ̆ | Inherited | COMBINING BREVE | ✔ | set 37 | U+0306 (not in MSR-2) not reliably distinguishable from U+A67C | ||
| U+0363 | ͣ | Inherited | COMBINING LATIN SMALL LETTER A | [115] | ✔ | set 38 | U+0363 is identical to U+2DF6 COMBINING LATIN SMALL LETTER A | |
| U+0364 | ͤ | Inherited | COMBINING LATIN SMALL LETTER E | [115] | ✔ | set 39 | U+0364 is identical to U+2DF7 COMBINING LATIN SMALL LETTER E | |
| U+0366 | ͦ | Inherited | COMBINING LATIN SMALL LETTER O | [115] | ✔ | set 40 | U+0366 is identical to U+2DEA COMBINING LATIN SMALL LETTER O | |
| U+0368 | ͨ | Inherited | COMBINING LATIN SMALL LETTER C | [115] | ✔ | set 41 | U+0368 is identical to U+2DED COMBINING LATIN SMALL LETTER C | |
| U+036F | ͯ | Inherited | COMBINING LATIN SMALL LETTER X | [115] | ✔ | set 42 | U+036F is identical to U+2DEF COMBINING LATIN SMALL LETTER X | |
| U+03B4 | δ | Greek | GREEK SMALL LETTER DELTA | ✔ | set 43 | U+03B4 (MSR-2) identical to U+1E9F | ||
| U+03B5 | ε | Greek | GREEK SMALL LETTER EPSILON | [100], [150] | ✔ | set 29 | U+03B5 is identical to U+025B and not reliably distinguished from U+0511 | |
| U+03B7 | η | Greek | GREEK SMALL LETTER ETA | [100], [204] | ✔ | set 44 | U+03B7 not reliably distinguished from U+0572 | |
| U+03B9 | ι | Greek | GREEK SMALL LETTER IOTA | [100], [150], [204] | ✔ | set 31 | U+03B9 not reliably distinguished from U+0582 and identical to U+0269 | |
| U+03BA | κ | Greek | GREEK SMALL LETTER KAPPA | [100], [202] | ✔ | set 27 | (not in intentional) U+03BA is not reliably distinguished from U+043A | |
| U+03BF | ο | Greek | GREEK SMALL LETTER OMICRON | [100], [150], [202] | ✔ | set 11 | U+03BF is identical to U+006F, U+043E and U+0585 | |
| U+03C6 | φ | Greek | GREEK SMALL LETTER PHI | [100], [150], [202] | ✔ | set 45 | U+03C6 in some fonts is identical to U+0444 | |
| U+03CA | ϊ | Greek | GREEK SMALL LETTER IOTA WITH DIALYTIKA | [100], [151] | ✔ | set 32 | U+03CA not reliably distinguished from U+0582 U+0308 and identical to U+0269 U+0308 | |
| U+03F3 | ϳ | Greek | GREEK LETTER YOT | [100], [150] | ✔ | set 8 | U+03F3 is identical to U+006A | |
| U+0430 | а | Cyrillic | CYRILLIC SMALL LETTER A | [100], [150] | ✔ | set 1 | U+0430 is identical to U+0061 | |
| U+0432 | в | Cyrillic | CYRILLIC SMALL LETTER VE | [100], [150] | ✔ | set 35 | U+0432 is identical to U+0299 | |
| U+0433 | г | Cyrillic | CYRILLIC SMALL LETTER GHE | [100], [150] | ✔ | set 46 | U+0433 is identical to U+1D26 | |
| U+0435 | е | Cyrillic | CYRILLIC SMALL LETTER IE | [100], [150], [202] | ✔ | set 4 | U+0435 is identical to U+0065 | |
| U+0437 | з | Cyrillic | CYRILLIC SMALL LETTER ZE | [100], [202] | ✔ | set 30 | U+0437 is identical to U+025C | |
| U+043A | к | Cyrillic | CYRILLIC SMALL LETTER KA | [100], [150] | ✔ | set 27 | U+043A is identical to U+0138 and not reliably distinguished from U+03BA | |
| U+043B | л | Cyrillic | CYRILLIC SMALL LETTER EL | [100], [150] | ✔ | set 47 | U+043B is identical to U+1D2B | |
| U+043C | м | Cyrillic | CYRILLIC SMALL LETTER EM | [100], [150] | ✔ | set 48 | U+043C is identical to U+1D0D | |
| U+043D | н | Cyrillic | CYRILLIC SMALL LETTER EN | [100], [150] | ✔ | set 36 | U+043D is identical to U+029C | |
| U+043E | о | Cyrillic | CYRILLIC SMALL LETTER O | [202] | ✔ | set 11 | (not in intentional) U+043E is identical to U+006F, U+03BF and U+0585 | |
| U+043F | п | Cyrillic | CYRILLIC SMALL LETTER PE | [100], [150] | ✔ | set 49 | U+043F is identical to U+1D28 | |
| U+0440 | р | Cyrillic | CYRILLIC SMALL LETTER ER | [100], [150], [202] | ✔ | set 12 | U+0440 is identical to U+0070 | |
| U+0441 | с | Cyrillic | CYRILLIC SMALL LETTER ES | [100], [150], [202] | ✔ | set 2 | U+0441 is identical to U+0063 | |
| U+0442 | т | Cyrillic | CYRILLIC SMALL LETTER TE | [100], [150] | ✔ | set 50 | U+0442 is identical to U+1D1B | |
| U+0443 | у | Cyrillic | CYRILLIC SMALL LETTER U | [100], [150], [202] | ✔ | set 18 | U+0443 is identical to U+0079 | |
| U+0444 | ф | Cyrillic | CYRILLIC SMALL LETTER EF | [100], [150], [202] | ✔ | set 45 | U+0444 is identical to U+03C6 in some fonts | |
| U+0445 | х | Cyrillic | CYRILLIC SMALL LETTER HA | [100], [150], [202] | ✔ | set 17 | U+0445 is identical to U+0078 | |
| U+0450 | ѐ | Cyrillic | CYRILLIC SMALL LETTER IE WITH GRAVE | [100], [151] | ✔ | set 21 | U+0450 is identical to U+00E8 | |
| U+0451 | ё | Cyrillic | CYRILLIC SMALL LETTER IO | [100], [151] | ✔ | set 22 | U+0451 is identical to U+00EB | |
| U+0455 | ѕ | Cyrillic | CYRILLIC SMALL LETTER DZE | [100], [150] | ✔ | set 14 | U+0455 is identical to U+0073 | |
| U+0456 | і | Cyrillic | CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I | [100], [150] | ✔ | set 7 | U+0456 is identical to U+0069 | |
| U+0457 | ї | Cyrillic | CYRILLIC SMALL LETTER YI | [100], [151] | ✔ | set 23 | U+0457 is identical to U+00EF | |
| U+045B | ћ | Cyrillic | CYRILLIC SMALL LETTER TSHE | ✔ | set 26 | U+045B identical to h-stroke U+0127 | ||
| U+045C | ќ | Cyrillic | CYRILLIC SMALL LETTER KJE | ✔ | set 29 | U+045C 0301 not reliably distinguished from kappa+tonos U+03B5 | ||
| U+049B | қ | Cyrillic | CYRILLIC SMALL LETTER KA WITH DESCENDER | ✔ | set 51 | U+049B (not in MSR-2) identical to k with descender U+2C6A | ||
| U+04A3 | ң | Cyrillic | CYRILLIC SMALL LETTER EN WITH DESCENDER | ✔ | set 52 | U+04A3 (not in MSR-2) identical to h with descender U+2C68 | ||
| U+04AB | ҫ | Cyrillic | CYRILLIC SMALL LETTER ES WITH DESCENDER | [100], [151] | ✔ | set 20 | U+04AB is identical to U+00E7 | |
| U+04BB | һ | Cyrillic | CYRILLIC SMALL LETTER SHHA | [100], [150] | ✔ | set 6 | U+04BB is identical to U+0068 and U+0570 | |
| U+04CF | ӏ | Cyrillic | CYRILLIC SMALL LETTER PALOCHKA | [100], [150] | ✔ | set 9 | U+04CF is identical to U+026A and frequently to U+006C | |
| U+04D5 | ӕ | Cyrillic | CYRILLIC SMALL LIGATURE A IE | [100], [150] | ✔ | set 19 | U+04D5 is identical to U+00E6 | |
| U+04D7 | ӗ | Cyrillic | CYRILLIC SMALL LETTER IE WITH BREVE | ✔ | set 25 | U+04D7 identical to U+0115 | ||
| U+04D9 | ә | Cyrillic | CYRILLIC SMALL LETTER SCHWA | [100], [150], [202] | ✔ | set 28 | U+04D9 is identical to U+01DD and U+0259 | |
| U+04E1 | ӡ | Cyrillic | CYRILLIC SMALL LETTER ABKHASIAN DZE | [100], [150] | ✔ | set 34 | U+04E1 is identical to U+0292 | |
| U+04E9 | ө | Cyrillic | CYRILLIC SMALL LETTER BARRED O | [100], [150] | ✔ | set 33 | U+04E9 is identical to U+0275 | |
| U+04F0 | Ӱ | Cyrillic | CYRILLIC CAPITAL LETTER U WITH DIAERESIS | ✔ | set 24 | U+04F0 identical to U+00FF | ||
| U+0501 | ԁ | Cyrillic | CYRILLIC SMALL LETTER KOMI DE | [150] | ✔ | set 3 | U+0501 is identical to U+0064 | |
| U+0511 | ԑ | Cyrillic | CYRILLIC SMALL LETTER REVERSED ZE | ✔ | set 29 | U+0511 not reliably distinguished from Latin EPSILON U+025B and U+03B5 | ||
| U+051B | ԛ | Cyrillic | CYRILLIC SMALL LETTER QA | ✔ | set 13 | U+051B identical to letter Q U+0071 | ||
| U+051D | ԝ | Cyrillic | CYRILLIC SMALL LETTER WE | ✔ | set 16 | U+051D identical to letter W U+0077 | ||
| U+0566 | զ | Armenian | ARMENIAN SMALL LETTER ZA | [100], [204] | ✔ | set 13 | U+0566 not reliably distinguished from U+0071 | |
| U+0570 | հ | Armenian | ARMENIAN SMALL LETTER HO | [100], [204] | ✔ | set 6 | U+0570 is identical to U+0068 and U+04BB | |
| U+0572 | ղ | Armenian | ARMENIAN SMALL LETTER GHAD | [100], [204] | ✔ | set 44 | U+0572 not reliably distinguished from U+03B7 | |
| U+0578 | ո | Armenian | ARMENIAN SMALL LETTER VO | [100], [204] | ✔ | set 10 | U+0578 not reliably distinguished from U+006E | |
| U+057D | ս | Armenian | ARMENIAN SMALL LETTER SEH | [100], [204] | ✔ | set 15 | U+057D not reliably distinguished from U+0075 | |
| U+0581 | ց | Armenian | ARMENIAN SMALL LETTER CO | [100], [204] | ✔ | set 5 | U+0581 not reliably distinguished from U+0067 | |
| U+0582 | ւ | Armenian | ARMENIAN SMALL LETTER YIWN | [100], [204] | ✔ | set 31 | U+0582 not reliably distinguished from U+0269 and U+03B9 | |
| U+0582 U+0308 | ւ̈ | ✔ | set 32 | U+0582 U+0308 not reliably distinguished from U+0269 U+0308 and U+03B9 U+0308 | ||||
| U+0585 | օ | Armenian | ARMENIAN SMALL LETTER OH | [202] | ✔ | set 11 | (not in intentional) U+0585 is identical to U+006F, U+03BF and U+043E | |
| U+101D | ဝ | Myanmar | MYANMAR LETTER WA | [150] | ✔ | set 53 | Letter U+101D is identical to digit U+1040 | |
| U+1040 | ၀ | Myanmar | MYANMAR DIGIT ZERO | [150] | ✔ | set 53 | Digit U+1040 is identical to letter U+101D | |
| U+17A2 | អ | Khmer | KHMER LETTER QA | [100], [150] | ✔ | set 54 | U+17A2 is identical to deprecated U+17A3 | |
| U+17A3 | ឣ | Khmer | KHMER INDEPENDENT VOWEL QAQ | [150] | excluded-cp | ✗ | set 54 | (deprecated) U+17A3 is identical to U+17A2 |
| U+1835 | ᠵ | Mongolian | MONGOLIAN LETTER JA | [150] | ✔ | set 55 | U+1835 is identical to U+1855 | |
| U+1855 | ᡕ | Mongolian | MONGOLIAN LETTER TODO YA | [150] | ✔ | set 55 | U+1855 is identical to U+1835 | |
| U+199E | ᦞ | New_Tai_Lue | NEW TAI LUE LETTER LOW VA | [150] | ✔ | set 56 | Letter U+199E is identical to digit U+19D0 | |
| U+19B1 | ᦱ | New_Tai_Lue | NEW TAI LUE VOWEL SIGN AA | [150] | ✔ | set 57 | Letter U+19B1 is identical to digit U+19D1 | |
| U+19D0 | ᧐ | New_Tai_Lue | NEW TAI LUE DIGIT ZERO | [150] | ✔ | set 56 | Digit U+19D0 is identical to letter U+199E | |
| U+19D1 | ᧑ | New_Tai_Lue | NEW TAI LUE DIGIT ONE | [150] | ✔ | set 57 | Digit U+19D1 is identical to letter U+19B1 | |
| U+1B0D | ᬍ | Balinese | BALINESE LETTER LA LENGA | [150] | ✔ | set 58 | Letter U+1B0D is identical to digit U+1B52 | |
| U+1B11 | ᬑ | Balinese | BALINESE LETTER OKARA | [150] | ✔ | set 59 | Letter U+1B11 is identical to digit U+1B53 | |
| U+1B28 | ᬨ | Balinese | BALINESE LETTER PA KAPAL | [150] | ✔ | set 60 | Letter U+1B28 is identical to digit U+1B58 | |
| U+1B52 | ᭒ | Balinese | BALINESE DIGIT TWO | [150] | ✔ | set 58 | U+1B52 is identical to U+1B0D | |
| U+1B53 | ᭓ | Balinese | BALINESE DIGIT THREE | [150] | ✔ | set 59 | Digit U+1B53 is identical to letter U+1B11 | |
| U+1B58 | ᭘ | Balinese | BALINESE DIGIT EIGHT | [150] | ✔ | set 60 | Digit U+1B58 is identical to letter U+1B28 | |
| U+1D0D | ᴍ | Latin | LATIN LETTER SMALL CAPITAL M | [150] | ✔ | set 48 | U+1D0D is identical to U+043C | |
| U+1D18 | ᴘ | Latin | LATIN LETTER SMALL CAPITAL P | [150] | ✔ | set 61 | U+1D18 is identical to U+1D29 | |
| U+1D1B | ᴛ | Latin | LATIN LETTER SMALL CAPITAL T | [150] | ✔ | set 50 | U+1D1B is identical to U+0442 | |
| U+1D26 | ᴦ | Greek | GREEK LETTER SMALL CAPITAL GAMMA | [150] | ✔ | set 46 | U+1D26 is identical to U+0433 | |
| U+1D28 | ᴨ | Greek | GREEK LETTER SMALL CAPITAL PI | [150] | ✔ | set 49 | U+1D28 is identical to U+043F | |
| U+1D29 | ᴩ | Greek | GREEK LETTER SMALL CAPITAL RHO | [150] | ✔ | set 61 | U+1D29 is identical to U+1D18 | |
| U+1D2B | ᴫ | Cyrillic | CYRILLIC LETTER SMALL CAPITAL EL | [150] | ✔ | set 47 | U+1D2B is identical to U+043B | |
| U+1E9F | ẟ | Latin | LATIN SMALL LETTER DELTA | ✔ | set 43 | U+1E9F (MSR-2) identical to U+03B4 | ||
| U+2C68 | ⱨ | Latin | LATIN SMALL LETTER H WITH DESCENDER | ✔ | set 52 | U+2C68 identical to U+04A3 | ||
| U+2C6A | ⱪ | Latin | LATIN SMALL LETTER K WITH DESCENDER | ✔ | set 51 | U+2C6A identical to U+049B | ||
| U+2DEA | ⷪ | Cyrillic | COMBINING CYRILLIC LETTER O | [115] | ✔ | set 40 | U+2DEA is identical to U+0366 | |
| U+2DED | ⷭ | Cyrillic | COMBINING CYRILLIC LETTER ES | [115] | ✔ | set 41 | U+2DED is identical to U+0368 | |
| U+2DEF | ⷯ | Cyrillic | COMBINING CYRILLIC LETTER HA | [115] | ✔ | set 42 | U+2DEF is identical to U+036F | |
| U+2DF6 | ⷶ | Cyrillic | COMBINING CYRILLIC LETTER A | [115] | ✔ | set 38 | U+2DF6 is identical to U+0363 | |
| U+2DF7 | ⷷ | Cyrillic | COMBINING CYRILLIC LETTER IE | [115] | ✔ | set 39 | U+2DF7 is identical to U+0364 | |
| U+A67C | ꙼ | Cyrillic | COMBINING CYRILLIC KAVYKA | ✔ | set 37 | U+A67C not reliably distinguishable from U+0306 | ||
| U+1039A | 𐎚 | Ugaritic | UGARITIC LETTER TO | [150] | ✔ | set 62 | U+1039A is identical to U+12038 | |
| U+10486 | 𐒆 | Osmanya | OSMANYA LETTER DEEL | [150] | ✔ | set 63 | U+10486 is identical to U+104A0 | |
| U+104A0 | 𐒠 | Osmanya | OSMANYA DIGIT ZERO | [150] | ✔ | set 63 | U+104A0 is identical to U+10486 | |
| U+12038 | 𒀸 | Cuneiform | CUNEIFORM SIGN ASH | [150] | ✔ | set 62 | U+12038 is identical to U+1039A |
Legend
Throughout this LGR, a code point sequence may be annotated with a string in ALL CAPS that is constructed on the same principle as a name for a Unicode Named Sequence. No claim is made that a sequence thus annotated is in fact a named sequence, nor that the annotation in such case actually corresponds to the formal name of a named sequence.
| Number of variant sets | 63 |
|---|---|
| Largest variant set | 4 |
| Ordinary Variants by Type | blocked (6), cross-script-homoglyph (146), homoglyph (22) |
| Reflexive Variants by Type | |
The following tables list all variant sets defined in this LGR, except for singleton sets. Each table lists all variant mapping pairs of the set; one per row. Mappings are assumed to be symmetric: each row documents both forward (→) and reverse (←) mapping directions. In each table, the mappings are sorted by Source value in ascending code point order; shading is used to group mappings from the same source code point or sequence.
Where the type of both forward and reverse mappings are the same, a single value is given in the Type(s) column, otherwise the types for forward and reverse mapping are given in that order, as indicated by the arrows. The same applies to any comments.
A mapping where source and target are the same is reflexive. Variant sets consisting of only a single reflexive mapping are not shown as a set. Instead, the variant type of the mapping is listed in the Variants column of the Repertoire by Code Point table. Reflexive mappings that are part of a larger set are indicated with a “≡”.
In a properly specified LGR, all members within each variant set are variants of each other; the mappings in each set are symmetric and transitive; and all variant sets are disjoint.
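The properties named above (symmetry, transitivity, disjoint sets) can be checked mechanically. The following is a minimal sketch, assuming the mappings are available as a set of `(source, target)` code point pairs; the function names and the sample data are made up for this example and are not part of any LGR tool.

```python
from itertools import product

def is_symmetric(m):
    # Every mapping s -> t must have a reverse mapping t -> s.
    return all((t, s) in m for s, t in m)

def is_transitive(m):
    # a -> b and b -> d must imply a -> d (reflexive a -> a is not required).
    return all((a, d) in m or a == d
               for (a, b), (c, d) in product(m, repeat=2) if b == c)

def variant_sets(m):
    """Group code points into connected components of the mapping graph.
    Components are disjoint by construction."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for s, t in m:
        parent[find(s)] = find(t)
    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(g) for g in groups.values())

# Sample: the h / shha / ho set plus the a / Cyrillic a set, in both directions.
H = [0x0068, 0x04BB, 0x0570]
full = {(a, b) for a in H for b in H if a != b} | {(0x0061, 0x0430), (0x0430, 0x0061)}
# Same data with the 04BB <-> 0570 mappings left out.
partial = full - {(0x04BB, 0x0570), (0x0570, 0x04BB)}

print(is_symmetric(full), is_transitive(full))
print(is_symmetric(partial), is_transitive(partial))
print([[f"{cp:04X}" for cp in s] for s in variant_sets(full)])
```

With the full data both checks pass. With the two mappings removed, the set is still symmetric but no longer transitive, which is the situation that rows marked "Added for Transitivity" or "Required for Symmetry" are meant to repair.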
Common Legend
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 0061 | a | 0430 | а | ↔ | cross-script-homoglyph | [150] / [150], [202] | U+0061 (a) is identical to U+0430 (а) / U+0430 (а) is identical to U+0061 (a) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 0063 | c | 0441 | с | ↔ | cross-script-homoglyph | [150], [202] | U+0063 (c) is identical to U+0441 (с) / U+0441 (с) is identical to U+0063 (c) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 0064 | d | 0501 | ԁ | ↔ | cross-script-homoglyph | [150] | U+0064 (d) is identical to U+0501 (ԁ) / U+0501 (ԁ) is identical to U+0064 (d) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 0065 | e | 0435 | е | ↔ | cross-script-homoglyph | [150], [202] | U+0065 (e) is identical to U+0435 (е) / U+0435 (е) is identical to U+0065 (e) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 0067 | g | 0581 | ց | ↔ | cross-script-homoglyph | [204] | U+0067 (g) not reliably distinguished from U+0581 (ց) / U+0581 (ց) not reliably distinguished from U+0067 (g) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 0068 | h | 04BB | һ | ↔ | cross-script-homoglyph | [150] | U+0068 (h) is identical to U+04BB (һ) and U+0570 (հ) / U+04BB (һ) is not always distinct from U+0068 (h) |
| 0068 | h | 0570 | հ | ↔ | cross-script-homoglyph | [202], [204] / [204] | U+0068 (h) is identical to U+04BB (һ) and U+0570 (հ) / U+0570 (հ) is not always distinct from U+0068 (h) |
| 04BB | һ | 0570 | հ | ↔ | cross-script-homoglyph | [202], [204] / [100], [204] | U+04BB (һ) is identical to U+0068 (h) and U+0570 (հ) / U+0570 (հ) is identical to U+0068 (h) and U+04BB (һ) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 0069 | i | 0456 | і | ↔ | cross-script-homoglyph | [150] / [150], [202] | U+0069 (i) is identical to U+0456 (і) / U+0456 (і) is identical to U+0069 (i) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 006A | j | 03F3 | ϳ | ↔ | cross-script-homoglyph | [150] / [150], [202] | U+006A (j) is identical to U+03F3 (ϳ) / U+03F3 (ϳ) is identical to U+006A (j) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 006C | l | 026A | ɪ | ↔ | homoglyph | / [150] | U+006C (l) is similar to 026A in mostly sans-serif fonts / U+026A (ɪ) is sometimes identical to U+04CF (ӏ) and sometimes indistinguishable from 006C |
| 006C | l | 04CF | ӏ | ↔ | cross-script-homoglyph | [202] / | (not in intentional) U+006C (l) is frequently identical to U+04CF (ӏ) / (not in intentional) U+04CF (ӏ) is frequently identical to U+006C (l) |
| 026A | ɪ | 04CF | ӏ | ↔ | cross-script-homoglyph | [150] | U+026A (ɪ) is identical to U+04CF (ӏ) / U+04CF (ӏ) is commonly identical to U+006C (l) and sometimes identical to 026A |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 006E | n | 0578 | ո | ↔ | cross-script-homoglyph | [204] | U+006E (n) not reliably distinguished from U+0578 (ո) / U+0578 (ո) not always distinct from U+006E (n) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 006F | o | 03BF | ο | ↔ | cross-script-homoglyph | [150], [202] | U+006F (o) is identical to U+03BF (ο), U+043E (о) and U+0585 (օ) / U+03BF (ο) is identical to U+006F (o) and U+043E (о) and U+0585 (օ) |
| 006F | o | 043E | о | ↔ | cross-script-homoglyph | [150], [202] / [202] | U+006F (o) is identical to U+03BF (ο) and U+043E (о) and U+0585 (օ) / (not in intentional) U+043E (о) is identical to U+03BF (ο) and U+006F (o) and U+0585 (օ) |
| 006F | o | 0585 | օ | ↔ | cross-script-homoglyph | [100], [150], [202] / [202] | U+006F (o) is identical to U+03BF (ο) and U+043E (о) and U+0585 (օ) / (not in intentional) U+0585 (օ) is identical to U+03BF (ο) and U+043E (о) and U+006F (o) |
| 03BF | ο | 043E | о | ↔ | cross-script-homoglyph | [150], [202] / [202] | U+03BF (ο) is identical to U+006F (o) and U+043E (о) and U+0585 (օ) / (not in intentional) U+043E (о) is identical to U+006F (o), U+03BF (ο) and U+0585 (օ) |
| 03BF | ο | 0585 | օ | ↔ | cross-script-homoglyph | [100], [150], [202] / [202] | U+03BF (ο) is identical to U+043E (о), U+006F (o) and U+0585 (օ) / (not in intentional) U+0585 (օ) is identical to U+006F (o), U+043E (о) and U+03BF (ο) |
| 043E | о | 0585 | օ | ↔ | cross-script-homoglyph | [202] | (not in intentional) U+043E (о) is identical to U+03BF (ο), U+006F (o) and U+0585 (օ) / (not in intentional) U+0585 (օ) is identical to U+03BF (ο) and U+043E (о) and U+006F (o) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 0070 | p | 0440 | р | ↔ | cross-script-homoglyph | [150], [202] | U+0070 (p) is identical to U+0440 (р) / U+0440 (р) is identical to U+0070 (p) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 0071 | q | 051B | ԛ | ↔ | cross-script-homoglyph | | U+0071 (q) identical to letter Q U+051B (ԛ) / U+051B (not in MSR-2) identical to U+0071 (q) and not reliably distinguished from U+0566 (զ) |
| 0071 | q | 0566 | զ | ↔ | cross-script-homoglyph | [204] | U+0071 (q) not reliably distinguished from U+0566 (զ) / U+0566 (զ) not reliably distinguished from U+0071 (q) |
| 051B | ԛ | 0566 | զ | → | blocked | | Required for Symmetry |
| | | | | ← | cross-script-homoglyph | | U+051B (not in MSR-2) identical to U+0071 (q) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 0073 | s | 0455 | ѕ | ↔ | cross-script-homoglyph | [150] / [150], [202] | U+0073 (s) is identical to U+0455 (ѕ) / U+0455 (ѕ) is identical to U+0073 (s) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 0075 | u | 057D | ս | ↔ | cross-script-homoglyph | [204] | U+0075 (u) not reliably distinguished from U+057D (ս) / U+057D (ս) not always distinct from U+0075 (u) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 0077 | w | 051D | ԝ | ↔ | cross-script-homoglyph | | U+0077 (w) identical to letter W U+051D (ԝ) / U+051D (ԝ) (not in MSR-2) identical to U+0077 (w) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 0078 | x | 0445 | х | ↔ | cross-script-homoglyph | [150], [202] | U+0078 (x) is identical to U+0445 (х) / U+0445 (х) is identical to U+0078 (x) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 0079 | y | 0443 | у | ↔ | cross-script-homoglyph | [150], [202] | U+0079 (y) is identical to U+0443 (у) / U+0443 (у) is identical to U+0079 (y) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 00E6 | æ | 04D5 | ӕ | ↔ | cross-script-homoglyph | [150] / [150], [202] | U+00E6 (æ) is identical to U+04D5 (ӕ) / U+04D5 (ӕ) is identical to U+00E6 (æ) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 00E7 | ç | 04AB | ҫ | ↔ | cross-script-homoglyph | [100], [151] | U+00E7 (ç) is identical to U+04AB (ҫ) / U+04AB (ҫ) is identical to U+00E7 (ç) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 00E8 | è | 0450 | ѐ | ↔ | cross-script-homoglyph | [151] | U+00E8 (è) is identical to U+0450 (ѐ) / U+0450 (ѐ) is identical to U+00E8 (è) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 00EB | ë | 0451 | ё | ↔ | cross-script-homoglyph | [151] | U+00EB (ë) is identical to U+0451 (ё) / U+0451 (ё) is identical to U+00EB (ë) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 00EF | ï | 0457 | ї | ↔ | cross-script-homoglyph | [151] | U+00EF (ï) is identical to U+0457 (ї) / U+0457 (ї) is identical to U+00EF (ï) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 00FF | ÿ | 04F0 | Ӱ | ↔ | cross-script-homoglyph | | U+00FF (ÿ) identical to U+04F0 (Ӱ) / U+04F0 (Ӱ) identical to U+00FF (ÿ) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 0115 | ĕ | 04D7 | ӗ | ↔ | cross-script-homoglyph | | U+0115 (ĕ) identical to U+04D7 (ӗ) / U+04D7 (ӗ) identical to U+0115 (ĕ) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 0127 | ħ | 045B | ћ | ↔ | cross-script-homoglyph | | U+0127 (ħ) identical to h-stroke U+045B (ћ) / U+045B (ћ) identical to U+0127 (ħ) |
| Source | Glyph | Target | Glyph | | Type(s) | Ref | Comment |
|---|---|---|---|---|---|---|---|
| 0138 | ĸ | 03BA | κ | ↔ | cross-script-homoglyph | [150] / [150], [202] | Obsolete U+0138 (ĸ) is identical to U+043A (к) and U+03BA (κ) / U+03BA (κ) is identical to U+043A (к) and to obsolete U+0138 (ĸ) |
| 0138 | ĸ | 043A | к | ↔ | cross-script-homoglyph | [150] | Obsolete U+0138 (ĸ) is identical to U+043A (к) and U+03BA (κ) / U+043A (к) is identical to U+03BA (κ) and to obsolete U+0138 (ĸ) |
| 03BA | κ | 043A | к | ↔ | cross-script-homoglyph | [202] | (not in intentional) U+03BA (κ) is not reliably distinguished from U+043A (к) / (not in intentional) U+043A (к) is not reliably distinguished from U+03BA (κ) and identical to obsolete U+0138 (ĸ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 01DD | ǝ | 0259 | ə | ↔ | homoglyph | [150] | U+01DD (ǝ) is identical to U+0259 (ə) and U+04D9 (ә) / U+0259 (ə) is identical to U+01DD (ǝ) and U+04D9 (ә) |
| 01DD | ǝ | 04D9 | ә | ↔ | cross-script-homoglyph | [150], [202] | U+01DD (ǝ) is identical to U+0259 (ə) and U+04D9 (ә) / U+04D9 (ә) is identical to U+0259 (ə) and U+01DD (ǝ) |
| 0259 | ə | 04D9 | ә | ↔ | cross-script-homoglyph | [150] | U+0259 (ə) is identical to U+01DD (ǝ) and U+04D9 (ә) / U+04D9 (ә) is identical to U+0259 (ə) and U+01DD (ǝ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 025B | ɛ | 03B5 | ε | ↔ | cross-script-homoglyph | [150] | U+025B (ɛ) is identical to U+03B5 (ε) and not reliably distinguished from U+0511 (ԑ) / U+03B5 (ε) is identical to U+025B (ɛ) |
| 025B | ɛ | 045C | ќ | ↔ | blocked | [150] / | / Added for Transitivity / |
| 025B | ɛ | 0511 | ԑ | ↔ | cross-script-homoglyph | | U+025B (ɛ) not reliably distinguished from Latin EPSILON U+0511 (ԑ) / U+0511 (not in MSR-2) not reliably distinguished from U+025B (ɛ) and U+03B5 (ε) |
| 03B5 | ε | 045C | ќ | → | blocked | | Required for Symmetry |
| 03B5 | ε | 045C | ќ | ← | cross-script-homoglyph | | U+03B5 (ε) 0301 not reliably distinguished from kappa+tonos U+045C (ќ) |
| 03B5 | ε | 0511 | ԑ | ↔ | cross-script-homoglyph | [100], [150] / | U+03B5 (ε) is not reliably distinguished from U+0511 (ԑ) and identical to U+025B (ɛ) / U+0511 (not in MSR-2) not reliably distinguished from U+025B (ɛ) and U+03B5 (ε) |
| 045C | ќ | 0511 | ԑ | ↔ | blocked | | / Added for Transitivity / |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 025C | ɜ | 0437 | з | ↔ | cross-script-homoglyph | [202] / [150] | U+025C (ɜ) is identical to U+0437 (з) / U+0437 (з) is not always distinct from U+025C (ɜ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 0269 | ɩ | 03B9 | ι | ↔ | cross-script-homoglyph | [150], [204] | U+0269 (ɩ) not reliably distinguished from U+0582 (ւ) and identical to U+03B9 (ι) / U+03B9 (ι) not reliably distinguished from U+0582 (ւ) and identical to U+0269 (ɩ) |
| 0269 | ɩ | 0582 | ւ | ↔ | cross-script-homoglyph | [204] | U+0269 (ɩ) not reliably distinguished from U+0582 (ւ) and U+03B9 (ι) / U+0582 (ւ) not reliably distinguished from U+0269 (ɩ) and U+03B9 (ι) |
| 03B9 | ι | 0582 | ւ | ↔ | cross-script-homoglyph | [204] | U+03B9 (ι) not reliably distinguished from U+0582 (ւ) and U+0269 (ɩ) / U+0582 (ւ) not reliably distinguished from U+0269 (ɩ) and U+03B9 (ι) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 0269 0308 | ɩ̈ | 03CA | ϊ | ↔ | cross-script-homoglyph | [151] / [100], [151] | U+0269 U+0308 (ɩ̈) not reliably distinguished from U+0582 (ւ) and identical to U+03CA (ϊ) / U+03CA (ϊ) not reliably distinguished from U+0582 U+0308 (ւ̈) and identical to U+0269 U+0308 (ɩ̈) |
| 0269 0308 | ɩ̈ | 0582 0308 | ւ̈ | ↔ | cross-script-homoglyph | [151] / | U+0269 U+0308 (ɩ̈) not reliably distinguished from U+0582 (ւ) and identical to U+03CA (ϊ) / U+0582 U+0308 (ւ̈) not reliably distinguished from U+0269 U+0308 (ɩ̈) and U+03CA (ϊ) |
| 03CA | ϊ | 0582 0308 | ւ̈ | ↔ | cross-script-homoglyph | [100], [151] / | U+03CA (ϊ) not reliably distinguished from U+0582 U+0308 (ւ̈) and identical to U+0269 U+0308 (ɩ̈) / U+0582 U+0308 (ւ̈) not reliably distinguished from U+0269 U+0308 (ɩ̈) and U+03CA (ϊ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 0275 | ɵ | 04E9 | ө | ↔ | cross-script-homoglyph | [150] | U+0275 (ɵ) is identical to U+04E9 (ө) / U+04E9 (ө) is identical to U+0275 (ɵ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 0292 | ʒ | 04E1 | ӡ | ↔ | cross-script-homoglyph | [150] | U+0292 (ʒ) is identical to U+04E1 (ӡ) / U+04E1 (ӡ) is identical to U+0292 (ʒ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 0299 | ʙ | 0432 | в | ↔ | cross-script-homoglyph | [150] | U+0299 (ʙ) is identical to U+0432 (в) / U+0432 (в) is identical to U+0299 (ʙ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 029C | ʜ | 043D | н | ↔ | cross-script-homoglyph | [150] | U+029C (ʜ) is identical to U+043D (н) / U+043D (н) is identical to U+029C (ʜ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 0306 | ̆ | A67C | ꙼ | ↔ | cross-script-homoglyph | | U+0306 (̆) not reliably distinguishable from U+A67C (꙼) / U+A67C (not in MSR-2) not reliably distinguishable from U+0306 (̆) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 0363 | ͣ | 2DF6 | ⷶ | ↔ | cross-script-homoglyph | | U+0363 (ͣ) is identical to U+2DF6 (ⷶ) COMBINING LATIN SMALL LETTER A / U+2DF6 (ⷶ) is identical to U+0363 (ͣ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 0364 | ͤ | 2DF7 | ⷷ | ↔ | cross-script-homoglyph | | U+0364 (ͤ) is identical to U+2DF7 (ⷷ) COMBINING LATIN SMALL LETTER E / U+2DF7 (ⷷ) is identical to U+0364 (ͤ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 0366 | ͦ | 2DEA | ⷪ | ↔ | cross-script-homoglyph | | U+0366 (ͦ) is identical to U+2DEA (ⷪ) COMBINING LATIN SMALL LETTER O / U+2DEA (ⷪ) is identical to U+0366 (ͦ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 0368 | ͨ | 2DED | ⷭ | ↔ | cross-script-homoglyph | | U+0368 (ͨ) is identical to U+2DED (ⷭ) COMBINING LATIN SMALL LETTER C / U+2DED (ⷭ) is identical to U+0368 (ͨ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 036F | ͯ | 2DEF | ⷯ | ↔ | cross-script-homoglyph | | U+036F (ͯ) is identical to U+2DEF (ⷯ) COMBINING LATIN SMALL LETTER X / U+2DEF (ⷯ) is identical to U+036F (ͯ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 03B4 | δ | 1E9F | ẟ | ↔ | cross-script-homoglyph | | U+03B4 (MSR-2) identical to U+1E9F (ẟ) / U+1E9F (MSR-2) identical to U+03B4 (δ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 03B7 | η | 0572 | ղ | ↔ | cross-script-homoglyph | [204] | U+03B7 (η) not reliably distinguished from U+0572 (ղ) / U+0572 (ղ) not reliably distinguished from U+03B7 (η) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 03C6 | φ | 0444 | ф | ↔ | cross-script-homoglyph | [150], [202] | U+03C6 (φ) in some fonts is identical to U+0444 (ф) / U+0444 (ф) is identical to U+03C6 (φ) in some fonts |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 0433 | г | 1D26 | ᴦ | ↔ | cross-script-homoglyph | [150] | U+0433 (г) is identical to U+1D26 (ᴦ) / U+1D26 (ᴦ) is identical to U+0433 (г) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 043B | л | 1D2B | ᴫ | ↔ | cross-script-homoglyph | [150] | U+043B (л) is identical to U+1D2B (ᴫ) / U+1D2B (ᴫ) is identical to U+043B (л) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 043C | м | 1D0D | ᴍ | ↔ | cross-script-homoglyph | [150] | U+043C (м) is identical to U+1D0D (ᴍ) / U+1D0D (ᴍ) is identical to U+043C (м) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 043F | п | 1D28 | ᴨ | ↔ | cross-script-homoglyph | [150] | U+043F (п) is identical to U+1D28 (ᴨ) / U+1D28 (ᴨ) is identical to U+043F (п) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 0442 | т | 1D1B | ᴛ | ↔ | cross-script-homoglyph | [150] | U+0442 (т) is identical to U+1D1B (ᴛ) / U+1D1B (ᴛ) is identical to U+0442 (т) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 049B | қ | 2C6A | ⱪ | ↔ | cross-script-homoglyph | | U+049B (қ) identical to U+2C6A (ⱪ) / U+2C6A (not in MSR-2) identical to k with descender U+049B (қ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 04A3 | ң | 2C68 | ⱨ | ↔ | cross-script-homoglyph | | U+04A3 (ң) identical to U+2C68 (ⱨ) / U+2C68 (not in MSR-2) identical to h with descender U+04A3 (ң) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 101D | ဝ | 1040 | ၀ | ↔ | homoglyph | [150] | Letter U+101D (ဝ) is identical to Digit U+1040 (၀) / Digit U+1040 (၀) is identical to letter U+101D (ဝ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 17A2 | អ | 17A3 | ឣ | ↔ | homoglyph | [150] | U+17A2 (អ) is identical to deprecated U+17A3 (ឣ) / (deprecated) U+17A3 (ឣ) is identical to U+17A2 (អ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 1835 | ᠵ | 1855 | ᡕ | ↔ | homoglyph | [150] | U+1835 (ᠵ) is identical to U+1855 (ᡕ) / U+1855 (ᡕ) is identical to U+1835 (ᠵ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 199E | ᦞ | 19D0 | ᧐ | ↔ | homoglyph | [150] | Letter U+199E (ᦞ) is identical to digit U+19D0 (᧐) / Digit U+19D0 (᧐) is identical to letter U+199E (ᦞ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 19B1 | ᦱ | 19D1 | ᧑ | ↔ | homoglyph | [150] | Letter U+19B1 (ᦱ) is identical to digit U+19D1 (᧑) / Digit U+19D1 (᧑) is identical to letter U+19B1 (ᦱ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 1B0D | ᬍ | 1B52 | ᭒ | ↔ | homoglyph | [150] | Letter U+1B0D (ᬍ) is identical to digit U+1B52 (᭒) / Digit U+1B52 (᭒) is identical to letter U+1B0D (ᬍ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 1B11 | ᬑ | 1B53 | ᭓ | ↔ | homoglyph | [150] | Letter U+1B11 (ᬑ) is identical to digit U+1B53 (᭓) / Digit U+1B53 (᭓) is identical to letter U+1B11 (ᬑ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 1B28 | ᬨ | 1B58 | ᭘ | ↔ | homoglyph | [150] | Letter U+1B28 (ᬨ) is identical to digit U+1B58 (᭘) / Digit U+1B58 (᭘) is identical to letter U+1B28 (ᬨ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 1D18 | ᴘ | 1D29 | ᴩ | ↔ | cross-script-homoglyph | [150] | U+1D18 (ᴘ) is identical to U+1D29 (ᴩ) / U+1D29 (ᴩ) is identical to U+1D18 (ᴘ) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 1039A | 𐎚 | 12038 | 𒀸 | ↔ | cross-script-homoglyph | [150] | U+1039A (𐎚) is identical to U+12038 (𒀸) / U+12038 (𒀸) is identical to U+1039A (𐎚) |
| Source | Glyph | Target | Glyph | Type(s) | Ref | Comment | |
|---|---|---|---|---|---|---|---|
| 10486 | 𐒆 | 104A0 | 𐒠 | ↔ | homoglyph | [150] | U+10486 (𐒆) is identical to U+104A0 (𐒠) / U+104A0 (𐒠) is identical to U+10486 (𐒆) |
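Entries marked "Added for Transitivity" or "Required for Symmetry" exist because a variant set has to be closed: if A is a variant of B and B of C, then A and C must also be listed against each other. The following is a minimal Python sketch of that closure check, assuming the mappings are available as plain pairs of code point strings. The function names `variant_sets` and `missing_mappings` are illustrative and not part of any LGR tool.

```python
from collections import defaultdict

def variant_sets(pairs):
    """Group code points into variant sets (connected components of the pairs)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = defaultdict(set)
    for cp in list(parent):
        groups[find(cp)].add(cp)
    return sorted(sorted(g) for g in groups.values())

def missing_mappings(pairs):
    """Return the unordered pairs that closure requires but the input lacks."""
    listed = {frozenset(p) for p in pairs}
    missing = []
    for group in variant_sets(pairs):
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if frozenset((a, b)) not in listed:
                    missing.append((a, b))
    return missing

# Homoglyph pairs taken from the set around U+025B; pairs involving
# U+045C other than 03B5/045C are deliberately left out.
pairs = [("025B", "03B5"), ("025B", "0511"), ("03B5", "0511"), ("03B5", "045C")]
print(variant_sets(pairs))
print(missing_mappings(pairs))
```

For this input the two pairs reported as missing, 025B/045C and 045C/0511, are the ones that appear in the table with type `blocked` and the comment "Added for Transitivity".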
The LGR does not define any named or implicit character classes.
The following table lists all named rules defined in the LGR and indicates whether each is used as a trigger in an action or as context (when or not-when) for a code point. Use of context rules for variants is not indicated.
| Name | Used as Trigger | Used as Context | Anchor | Regular Expression | Ref | Comment |
|---|---|---|---|---|---|---|
| excluded-cp | | ✔ | | (^$) | | This rule matches the empty label; if used as a context rule for a code point, it invalidates any label that contains the code point, effectively excluding the code point from the eligible repertoire |
| preceding-hamza-above | | | ✔ | (⚓(?=\u0654)) | | match if code point precedes U+0654 |
| following-soft-dotted | | | | ((?<=[∅=\p{SD=Y}])) | | match if code point follows a soft-dotted character |
Note: The following rules are defined but not used in this LGR: preceding-hamza-above, following-soft-dotted.
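The effect of `excluded-cp` can be illustrated with a short Python sketch. It assumes a simplified model in which a "when" context rule is evaluated against the whole label and the code point is valid only if the rule matches. The helper `label_is_valid` and the choice of U+0138 as the code point carrying the rule are hypothetical and serve only as an example.

```python
import re

# The excluded-cp rule from the table: matches only the empty label.
EXCLUDED_CP = re.compile(r"^$")

def label_is_valid(label, when_rules):
    """Reject a label if any of its code points has a 'when' rule that fails."""
    for ch in label:
        rule = when_rules.get(ch)
        if rule is not None and rule.search(label) is None:
            return False
    return True

# Hypothetical assignment: U+0138 carries the excluded-cp context rule.
when_rules = {"\u0138": EXCLUDED_CP}

print(label_is_valid("abc", when_rules))
print(label_is_valid("a\u0138c", when_rules))
```

Any label that contains the code point is non-empty, so the rule can never match for it. This is why attaching the rule is equivalent to removing the code point from the repertoire.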
The following table lists the actions that are used to assign dispositions to labels and variant labels, based on the specified conditions. The order of actions defines their precedence: the first action triggered by a label is the one defining its disposition.
| # | Condition | Rule / Variant Set | | Disposition | Ref | Comment |
|---|---|---|---|---|---|---|
| 1 | if at least one variant is in | {homoglyph} | → | blocked | | homoglyphs are mutually exclusive by default |
| 2 | if at least one variant is in | {cross-script-homoglyph} | → | blocked | | cross-script-homoglyphs are mutually exclusive by default |
Note: The following variant types defined in this LGR are not used as triggers for any actions: blocked. This is not necessarily an error. Labels containing such types are usually handled in the Catch All action.
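The first-match precedence described for the action table can be sketched as follows. This is an illustrative Python model, not the LGR processing algorithm. The tuple layout of `ACTIONS` and the function name `disposition` are assumptions, and the sketch returns `None` when none of the two listed actions is triggered, leaving any further handling out of scope.

```python
# The two actions from the table, in order of precedence:
# (condition, variant type, resulting disposition)
ACTIONS = [
    ("any-variant", "homoglyph", "blocked"),
    ("any-variant", "cross-script-homoglyph", "blocked"),
]

def disposition(variant_types, actions=ACTIONS):
    """Return the disposition of the first action triggered, else None.

    variant_types is the set of variant types used to derive a variant
    label from the original label.
    """
    for condition, vtype, result in actions:
        if condition == "any-variant" and vtype in variant_types:
            return result  # first triggered action wins
    return None  # no listed action applies (assumption: handled elsewhere)

print(disposition({"cross-script-homoglyph"}))
print(disposition({"blocked"}))
```

In this model a variant label built only from `blocked` mappings triggers neither action, which matches the note that the type `blocked` is not used as a trigger.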
| [100] | MSR-2 Code points included in MSR-2 |
| [115] | MSR-2 Code points excluded from MSR-2 |
| [150] | The Unicode Consortium, "Intentional.txt", Version 10.0.0, http://www.unicode.org/Public/security/10.0.0/intentional.txt Code points considered identical by intention |
| [151] | Derived from NFC plus The Unicode Consortium, "Intentional.txt", Version 10.0.0, http://www.unicode.org/Public/security/10.0.0/intentional.txt Combining sequences involving code points considered identical by intention, after applying NFC. |
| [202] | Cyrillic LGR Root Zone |
| [204] | Armenian LGR Root Zone |