LGR for unspecified language

This document is mechanically formatted from the XML file for the LGR. It provides additional summary data and explanatory text. The XML file remains the sole normative specification of the LGR.

Date 2016-12-12
LGR Version 1
Unicode Version 10.0.0

Description

Preliminary Collection of Data on Cross-script Homoglyphs

Note to Reviewers:
While this document is formatted like an LGR, it does not define a repertoire or Label Generation Rules. Instead, it uses the variant formalism to present sets of code points that are identical across scripts. Doing so allowed the use of existing tools for verification and cross-checking tasks, as well as formatting the data into HTML-based tables.
Review is solicited on whether the code points constitute cross-script homoglyphs and whether having such a data collection is considered useful.

Overview

This file presents a collection of code points and code point sequences that could be considered cross-script homoglyphs. The focus is on code points that cannot be distinguished, because they are shown with identical glyphs in most or all fonts.

This file was generated by starting with Intentional.txt from Unicode Version 10.0.0, filtered to exclude DISALLOWED code points from IDNA 2008 (the latter set is based on Unicode version 6.3.0). Also filtered out were any "in-script" homoglyphs, that is, any code points that are identical to another code point of the same script and are not also cross-script homoglyphs.
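The filtering steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tooling actually used: the function name `filter_cross_script`, the `script_of` mapping and the `disallowed` set are hypothetical stand-ins for data that would come from Intentional.txt, the Unicode Script property and the IDNA 2008 derived property values. Keeping an in-script pair whenever either member is also a cross-script homoglyph is one possible reading of the rule.

```python
def filter_cross_script(pairs, script_of, disallowed):
    """Keep only homoglyph pairs that survive the two filters described above.

    pairs      -- iterable of (cp_a, cp_b) homoglyph pairs (hypothetical input)
    script_of  -- dict mapping code point -> script name (assumed complete)
    disallowed -- set of code points that are DISALLOWED under IDNA 2008
    """
    # Step 1: drop any pair that involves a DISALLOWED code point.
    allowed = [(a, b) for a, b in pairs
               if a not in disallowed and b not in disallowed]

    # Step 2: collect code points that occur in at least one cross-script pair.
    cross = set()
    for a, b in allowed:
        if script_of[a] != script_of[b]:
            cross.update((a, b))

    # Step 3: drop in-script pairs unless a member is also a cross-script homoglyph.
    return [(a, b) for a, b in allowed
            if script_of[a] != script_of[b] or a in cross or b in cross]


# Sample data: U+E000/U+E001 are private-use placeholders standing in for a
# purely in-script pair; the other code points appear in this document,
# except U+0041/U+0391, which are uppercase letters.
script_of = {0x0061: "Latin", 0x0430: "Cyrillic", 0x01DD: "Latin",
             0x0259: "Latin", 0x04D9: "Cyrillic", 0x0041: "Latin",
             0x0391: "Greek", 0xE000: "Placeholder", 0xE001: "Placeholder"}
pairs = [(0x0061, 0x0430), (0x01DD, 0x0259), (0x0259, 0x04D9),
         (0x0041, 0x0391), (0xE000, 0xE001)]
kept = filter_cross_script(pairs, script_of, disallowed={0x0041, 0x0391})
print([(f"U+{a:04X}", f"U+{b:04X}") for a, b in kept])
```

In this sample the uppercase pair is removed by the first filter and the placeholder pair by the second, while the Latin-only pair U+01DD / U+0259 is retained because U+0259 also has a Cyrillic counterpart.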

In addition, code points that were cited as cross-script homoglyphs in relevant LGR proposals have been added and referenced to the corresponding proposal.

This list is augmented by adding a few code points that are not intentionally the same, but effectively identical. Those code points may have a nominally distinct shape, as shown in the code charts, and while some fonts may make that distinction, many or most common fonts do not. In some cases a code point has two common glyph shapes, and one or both may be identical to the shape for another code point.

There is a much larger set of code points that are pairwise similar, sometimes confusingly so; compare the well-known example of DIGIT ONE (1) and SMALL L (l), which predates the development of IDNs. These "confusables" are not considered true homoglyphs and are excluded here.

Obsolete and not widely used code points

A number of code points in this file belong to scripts that are not in widespread modern use, or are deprecated or obsolete code points in otherwise modern scripts. For these code points, known cross-script homoglyphs have nevertheless been listed. For deprecated and known obsolete code points, the conservative approach would be to not allow them, which also removes the problem of their homoglyph relations. Where the script itself is in limited use or not in modern use, it should be noted that such scripts may not be understood well enough to be sure that all cross-script homoglyph relations are known. They may also have other as-yet-unidentified problems for use with identifiers.

Code points referenced with [100] are included in the MSR-2. The MSR-2 is limited to code points in widespread modern use. This does not mean that all of these code points are "safe", but at least they are moderately well understood, and information about them is available in this and similar data collections.

A note on Intentional.txt: the methodology for that file normalizes to NFD (fully decomposed), not NFC (composed), which is the way IDNs are normalized. Accordingly, precomposed code points corresponding to NFD sequences differing only by code points that are listed in Intentional.txt are also considered "intentional" here and are referenced as [151].
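The effect of the two normalization forms can be illustrated with Python's standard `unicodedata` module, using the pair U+00E8 / U+0450 from this collection. The helper `code_points` is introduced here only for illustration.

```python
import unicodedata

latin = "\u00E8"      # LATIN SMALL LETTER E WITH GRAVE
cyrillic = "\u0450"   # CYRILLIC SMALL LETTER IE WITH GRAVE

def code_points(s):
    """Format a string as a list of U+XXXX code points."""
    return [f"U+{ord(ch):04X}" for ch in s]

# NFD splits each precomposed letter into base letter + combining grave.
nfd_latin = unicodedata.normalize("NFD", latin)
nfd_cyrillic = unicodedata.normalize("NFD", cyrillic)
print(code_points(nfd_latin))     # ['U+0065', 'U+0300']
print(code_points(nfd_cyrillic))  # ['U+0435', 'U+0300']

# NFC, the form used for IDNs, recomposes each sequence.
print(unicodedata.normalize("NFC", nfd_latin) == latin)        # True
print(unicodedata.normalize("NFC", nfd_cyrillic) == cyrillic)  # True
```

The two NFD sequences differ only in their base letters, U+0065 and U+0435, which are themselves a cross-script homoglyph pair; the precomposed forms are therefore treated as homoglyphs as well.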

Repertoire

Summary

Number of elements in Repertoire 135
Number of excluded elements 2
Total entries in table 137
Longest code point sequence 2
Number of code points 135
Number of sequences 2

Repertoire by Code Point

The following table lists the repertoire by code point (or code point sequence). The data in the Script and Name columns are extracted from the Unicode Character Database. Where a comment in the original LGR is equal to the character name, it has been suppressed.

For any code point or sequence for which a variant is defined, additional information is provided in the Variants column. Some code points or sequences listed in the following table are not part of the repertoire itself; they document targets for out-of-repertoire variant mappings or optional code points as indicated. See also the legend provided below the table.

Code Point Glyph Script Name References Required Context Part of Repertoire Variants Comment
U+0061 a Latin LATIN SMALL LETTER A [100], [150], [202]   set 1 U+0061 is identical to U+0430
U+0063 c Latin LATIN SMALL LETTER C [100], [150], [202]   set 2 U+0063 is identical to U+0441
U+0064 d Latin LATIN SMALL LETTER D [100], [150]   set 3 U+0064 is identical to U+0501
U+0065 e Latin LATIN SMALL LETTER E [100], [150], [202]   set 4 U+0065 is identical to U+0435
U+0067 g Latin LATIN SMALL LETTER G [100], [204]   set 5 U+0067 not reliably distinguished from U+0581
U+0068 h Latin LATIN SMALL LETTER H [204]   set 6 U+0068 is not always distinct from U+04BB and U+0570
U+0069 i Latin LATIN SMALL LETTER I [100], [150], [202]   set 7 U+0069 is identical to U+0456
U+006A j Latin LATIN SMALL LETTER J [100], [150], [202]   set 8 U+006A is identical to U+03F3
U+006C l Latin LATIN SMALL LETTER L [100]   set 9 U+006C is frequently identical to U+04CF
U+006E n Latin LATIN SMALL LETTER N [100], [204]   set 10 U+006E not always distinct from U+0578
U+006F o Latin LATIN SMALL LETTER O [100], [150], [202]   set 11 U+006F is identical to U+03BF and U+043E and U+0585
U+0070 p Latin LATIN SMALL LETTER P [100], [150], [202]   set 12 U+0070 is identical to U+0440
U+0071 q Latin LATIN SMALL LETTER Q [100], [204]   set 13 U+0071 not reliably distinguished from U+0566 and identical to U+051B
U+0073 s Latin LATIN SMALL LETTER S [100], [150], [202]   set 14 U+0073 is identical to U+0455
U+0075 u Latin LATIN SMALL LETTER U [100], [204]   set 15 U+0075 not always distinct from U+057D
U+0077 w Latin LATIN SMALL LETTER W   set 16 U+0077 (not in MSR-2) identical to U+051D
U+0078 x Latin LATIN SMALL LETTER X [100], [150], [202]   set 17 U+0078 is identical to U+0445
U+0079 y Latin LATIN SMALL LETTER Y [100], [150], [202]   set 18 U+0079 is identical to U+0443
U+00E6 æ Latin LATIN SMALL LETTER AE [100], [150], [202]   set 19 U+00E6 is identical to U+04D5
U+00E7 ç Latin LATIN SMALL LETTER C WITH CEDILLA [100], [151]   set 20 U+00E7 is identical to U+04AB
U+00E8 è Latin LATIN SMALL LETTER E WITH GRAVE [100], [151]   set 21 U+00E8 is identical to U+0450
U+00EB ë Latin LATIN SMALL LETTER E WITH DIAERESIS [100], [151]   set 22 U+00EB is identical to U+0451
U+00EF ï Latin LATIN SMALL LETTER I WITH DIAERESIS [100], [151]   set 23 U+00EF is identical to U+0457
U+00FF ÿ Latin LATIN SMALL LETTER Y WITH DIAERESIS   set 24 U+00FF identical to U+04F0
U+0115 ĕ Latin LATIN SMALL LETTER E WITH BREVE   set 25 U+0115 identical to U+04D7
U+0127 ħ Latin LATIN SMALL LETTER H WITH STROKE   set 26 U+0127 identical to U+045B
U+0138 ĸ Latin LATIN SMALL LETTER KRA [150] excluded-cp set 27 Obsolete U+0138 is identical to U+043A and U+03BA
U+01DD ǝ Latin LATIN SMALL LETTER TURNED E [100], [150], [202]   set 28 U+01DD is identical to U+0259 and U+04D9
U+0259 ə Latin LATIN SMALL LETTER SCHWA [100], [150]   set 28 U+0259 is identical to U+04D9 and U+01DD
U+025B ɛ Latin LATIN SMALL LETTER OPEN E [100], [150]   set 29 U+025B is identical to U+03B5
U+025C ɜ Latin LATIN SMALL LETTER REVERSED OPEN E [150]   set 30 U+025C is frequently identical to U+0437
U+0269 ɩ Latin LATIN SMALL LETTER IOTA [100], [150], [204]   set 31 U+0269 not reliably distinguished from U+0582 and identical to U+03B9
U+0269 U+0308 ɩ̈ [151]   set 32 U+0269 U+0308 not reliably distinguished from U+0582 and identical to U+03B9
U+026A ɪ Latin LATIN LETTER SMALL CAPITAL I [100], [150]   set 9 U+026A intended to be identical to U+04CF but is often distinct; 026A may be similar to 0069, but is commonly distinct
U+0275 ɵ Latin LATIN SMALL LETTER BARRED O [100], [150]   set 33 U+0275 is identical to U+04E9
U+0292 ʒ Latin LATIN SMALL LETTER EZH [100], [150]   set 34 U+0292 is identical to U+04E1
U+0299 ʙ Latin LATIN LETTER SMALL CAPITAL B [150]   set 35 U+0299 is identical to U+0432
U+029C ʜ Latin LATIN LETTER SMALL CAPITAL H [150]   set 36 U+029C is identical to U+043D
U+0306 ̆ Inherited COMBINING BREVE   set 37 U+0306 (not in MSR-2) not reliably distinguishable from U+A67C
U+0363 ͣ Inherited COMBINING LATIN SMALL LETTER A [115]   set 38 U+0363 is identical to U+2DF6 COMBINING LATIN SMALL LETTER A
U+0364 ͤ Inherited COMBINING LATIN SMALL LETTER E [115]   set 39 U+0364 is identical to U+2DF7 COMBINING LATIN SMALL LETTER E
U+0366 ͦ Inherited COMBINING LATIN SMALL LETTER O [115]   set 40 U+0366 is identical to U+2DEA COMBINING LATIN SMALL LETTER O
U+0368 ͨ Inherited COMBINING LATIN SMALL LETTER C [115]   set 41 U+0368 is identical to U+2DED COMBINING LATIN SMALL LETTER C
U+036F ͯ Inherited COMBINING LATIN SMALL LETTER X [115]   set 42 U+036F is identical to U+2DEF COMBINING LATIN SMALL LETTER X
U+03B4 δ Greek GREEK SMALL LETTER DELTA   set 43 U+03B4 (MSR-2) identical to U+1E9F
U+03B5 ε Greek GREEK SMALL LETTER EPSILON [100], [150]   set 29 U+03B5 is identical to U+025B and not reliably distinguished from U+0511
U+03B7 η Greek GREEK SMALL LETTER ETA [100], [204]   set 44 U+03B7 not reliably distinguished from U+0572
U+03B9 ι Greek GREEK SMALL LETTER IOTA [100], [150], [204]   set 31 U+03B9 not reliably distinguished from U+0582 and identical to U+0269
U+03BA κ Greek GREEK SMALL LETTER KAPPA [100], [202]   set 27 (not in intentional) U+03BA is not reliably distinguished from U+043A
U+03BF ο Greek GREEK SMALL LETTER OMICRON [100], [150], [202]   set 11 U+03BF is identical to U+006F, U+043E and U+0585
U+03C6 φ Greek GREEK SMALL LETTER PHI [100], [150], [202]   set 45 U+03C6 in some fonts is identical to U+0444
U+03CA ϊ Greek GREEK SMALL LETTER IOTA WITH DIALYTIKA [100], [151]   set 32 U+03CA not reliably distinguished from U+0582 U+0308 and identical to U+0269 U+0308
U+03F3 ϳ Greek GREEK LETTER YOT [100], [150]   set 8 U+03F3 is identical to U+006A
U+0430 а Cyrillic CYRILLIC SMALL LETTER A [100], [150]   set 1 U+0430 is identical to U+0061
U+0432 в Cyrillic CYRILLIC SMALL LETTER VE [100], [150]   set 35 U+0432 is identical to U+0299
U+0433 г Cyrillic CYRILLIC SMALL LETTER GHE [100], [150]   set 46 U+0433 is identical to U+1D26
U+0435 е Cyrillic CYRILLIC SMALL LETTER IE [100], [150], [202]   set 4 U+0435 is identical to U+0065
U+0437 з Cyrillic CYRILLIC SMALL LETTER ZE [100], [202]   set 30 U+0437 is identical to U+025C
U+043A к Cyrillic CYRILLIC SMALL LETTER KA [100], [150]   set 27 U+043A is identical to U+0138 and not reliably distinguished from U+03BA
U+043B л Cyrillic CYRILLIC SMALL LETTER EL [100], [150]   set 47 U+043B is identical to U+1D2B
U+043C м Cyrillic CYRILLIC SMALL LETTER EM [100], [150]   set 48 U+043C is identical to U+1D0D
U+043D н Cyrillic CYRILLIC SMALL LETTER EN [100], [150]   set 36 U+043D is identical to U+029C
U+043E о Cyrillic CYRILLIC SMALL LETTER O [202]   set 11 (not in intentional) U+043E is identical to U+006F, U+03BF and U+0585
U+043F п Cyrillic CYRILLIC SMALL LETTER PE [100], [150]   set 49 U+043F is identical to U+1D28
U+0440 р Cyrillic CYRILLIC SMALL LETTER ER [100], [150], [202]   set 12 U+0440 is identical to U+0070
U+0441 с Cyrillic CYRILLIC SMALL LETTER ES [100], [150], [202]   set 2 U+0441 is identical to U+0063
U+0442 т Cyrillic CYRILLIC SMALL LETTER TE [100], [150]   set 50 U+0442 is identical to U+1D1B
U+0443 у Cyrillic CYRILLIC SMALL LETTER U [100], [150], [202]   set 18 U+0443 is identical to U+0079
U+0444 ф Cyrillic CYRILLIC SMALL LETTER EF [100], [150], [202]   set 45 U+0444 is identical to U+03C6 in some fonts
U+0445 х Cyrillic CYRILLIC SMALL LETTER HA [100], [150], [202]   set 17 U+0445 is identical to U+0078
U+0450 ѐ Cyrillic CYRILLIC SMALL LETTER IE WITH GRAVE [100], [151]   set 21 U+0450 is identical to U+00E8
U+0451 ё Cyrillic CYRILLIC SMALL LETTER IO [100], [151]   set 22 U+0451 is identical to U+00EB
U+0455 ѕ Cyrillic CYRILLIC SMALL LETTER DZE [100], [150]   set 14 U+0455 is identical to U+0073
U+0456 і Cyrillic CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I [100], [150]   set 7 U+0456 is identical to U+0069
U+0457 ї Cyrillic CYRILLIC SMALL LETTER YI [100], [151]   set 23 U+0457 is identical to U+00EF
U+045B ћ Cyrillic CYRILLIC SMALL LETTER TSHE   set 26 U+045B identical to h-stroke U+0127
U+045C ќ Cyrillic CYRILLIC SMALL LETTER KJE   set 29 U+045C 0301 not reliably distinguished from kappa+tonos U+03B5
U+049B қ Cyrillic CYRILLIC SMALL LETTER KA WITH DESCENDER   set 51 U+049B (not in MSR-2) identical to k with descender U+2C6A
U+04A3 ң Cyrillic CYRILLIC SMALL LETTER EN WITH DESCENDER   set 52 U+04A3 (not in MSR-2) identical to h with descender U+2C68
U+04AB ҫ Cyrillic CYRILLIC SMALL LETTER ES WITH DESCENDER [100], [151]   set 20 U+04AB is identical to U+00E7
U+04BB һ Cyrillic CYRILLIC SMALL LETTER SHHA [100], [150]   set 6 U+04BB is identical to U+0068 and U+0570
U+04CF ӏ Cyrillic CYRILLIC SMALL LETTER PALOCHKA [100], [150]   set 9 U+04CF is identical to U+026A and frequently to U+006C
U+04D5 ӕ Cyrillic CYRILLIC SMALL LIGATURE A IE [100], [150]   set 19 U+04D5 is identical to U+00E6
U+04D7 ӗ Cyrillic CYRILLIC SMALL LETTER IE WITH BREVE   set 25 U+04D7 identical to U+0115
U+04D9 ә Cyrillic CYRILLIC SMALL LETTER SCHWA [100], [150], [202]   set 28 U+04D9 is identical to U+01DD and U+0259
U+04E1 ӡ Cyrillic CYRILLIC SMALL LETTER ABKHASIAN DZE [100], [150]   set 34 U+04E1 is identical to U+0292
U+04E9 ө Cyrillic CYRILLIC SMALL LETTER BARRED O [100], [150]   set 33 U+04E9 is identical to U+0275
U+04F0 Ӱ Cyrillic CYRILLIC CAPITAL LETTER U WITH DIAERESIS   set 24 U+04F0 identical to U+00FF
U+0501 ԁ Cyrillic CYRILLIC SMALL LETTER KOMI DE [150]   set 3 U+0501 is identical to U+0064
U+0511 ԑ Cyrillic CYRILLIC SMALL LETTER REVERSED ZE   set 29 U+0511 not reliably distinguished from Latin EPSILON U+025B and U+03B5
U+051B ԛ Cyrillic CYRILLIC SMALL LETTER QA   set 13 U+051B identical to letter Q U+0071
U+051D ԝ Cyrillic CYRILLIC SMALL LETTER WE   set 16 U+051D identical to letter W U+0077
U+0566 զ Armenian ARMENIAN SMALL LETTER ZA [100], [204]   set 13 U+0566 not reliably distinguished from U+0071
U+0570 հ Armenian ARMENIAN SMALL LETTER HO [100], [204]   set 6 U+0570 is identical to U+0068 and U+04BB
U+0572 ղ Armenian ARMENIAN SMALL LETTER GHAD [100], [204]   set 44 U+0572 not reliably distinguished from U+03B7
U+0578 ո Armenian ARMENIAN SMALL LETTER VO [100], [204]   set 10 U+0578 not reliably distinguished from U+006E
U+057D ս Armenian ARMENIAN SMALL LETTER SEH [100], [204]   set 15 U+057D not reliably distinguished from U+0075
U+0581 ց Armenian ARMENIAN SMALL LETTER CO [100], [204]   set 5 U+0581 not reliably distinguished from U+0067
U+0582 ւ Armenian ARMENIAN SMALL LETTER YIWN [100], [204]   set 31 U+0582 not reliably distinguished from U+0269 and U+03B9
U+0582 U+0308 ւ̈   set 32 U+0582 0308 not reliably distinguished from U+0269 U+0308 and U+03B9 U+0308
U+0585 օ Armenian ARMENIAN SMALL LETTER OH [202]   set 11 (not in intentional) U+0585 is identical to U+006F, U+03BF and U+043E
U+101D Myanmar MYANMAR LETTER WA [150]   set 53 Letter U+101D is identical to digit U+1040
U+1040 Myanmar MYANMAR DIGIT ZERO [150]   set 53 Digit U+1040 is identical to letter U+101D
U+17A2 Khmer KHMER LETTER QA [100], [150]   set 54 U+17A2 is identical to deprecated U+17A3
U+17A3 Khmer KHMER INDEPENDENT VOWEL QAQ [150] excluded-cp set 54 (deprecated) U+17A3 is identical to U+17A2
U+1835 Mongolian MONGOLIAN LETTER JA [150]   set 55 U+1835 is identical to U+1855
U+1855 Mongolian MONGOLIAN LETTER TODO YA [150]   set 55 U+1855 is identical to U+1835
U+199E New_Tai_Lue NEW TAI LUE LETTER LOW VA [150]   set 56 Letter U+199E is identical to digit U+19D0
U+19B1 New_Tai_Lue NEW TAI LUE VOWEL SIGN AA [150]   set 57 Letter U+19B1 is identical to digit U+19D1
U+19D0 New_Tai_Lue NEW TAI LUE DIGIT ZERO [150]   set 56 Digit U+19D0 is identical to letter U+199E
U+19D1 New_Tai_Lue NEW TAI LUE DIGIT ONE [150]   set 57 Digit U+19D1 is identical to letter U+19B1
U+1B0D Balinese BALINESE LETTER LA LENGA [150]   set 58 Letter U+1B0D is identical to digit U+1B52
U+1B11 Balinese BALINESE LETTER OKARA [150]   set 59 Letter U+1B11 is identical to digit U+1B53
U+1B28 Balinese BALINESE LETTER PA KAPAL [150]   set 60 Letter U+1B28 is identical to digit U+1B58
U+1B52 Balinese BALINESE DIGIT TWO [150]   set 58 U+1B52 is identical to U+1B0D
U+1B53 Balinese BALINESE DIGIT THREE [150]   set 59 Digit U+1B53 is identical to letter U+1B11
U+1B58 Balinese BALINESE DIGIT EIGHT [150]   set 60 Digit U+1B58 is identical to letter U+1B28
U+1D0D Latin LATIN LETTER SMALL CAPITAL M [150]   set 48 U+1D0D is identical to U+043C
U+1D18 Latin LATIN LETTER SMALL CAPITAL P [150]   set 61 U+1D18 is identical to U+1D29
U+1D1B Latin LATIN LETTER SMALL CAPITAL T [150]   set 50 U+1D1B is identical to U+0442
U+1D26 Greek GREEK LETTER SMALL CAPITAL GAMMA [150]   set 46 U+1D26 is identical to U+0433
U+1D28 Greek GREEK LETTER SMALL CAPITAL PI [150]   set 49 U+1D28 is identical to U+043F
U+1D29 Greek GREEK LETTER SMALL CAPITAL RHO [150]   set 61 U+1D29 is identical to U+1D18
U+1D2B Cyrillic CYRILLIC LETTER SMALL CAPITAL EL [150]   set 47 U+1D2B is identical to U+043B
U+1E9F Latin LATIN SMALL LETTER DELTA   set 43 U+1E9F (MSR-2) identical to U+03B4
U+2C68 Latin LATIN SMALL LETTER H WITH DESCENDER   set 52 U+2C68 identical to U+04A3
U+2C6A Latin LATIN SMALL LETTER K WITH DESCENDER   set 51 U+2C6A identical to U+049B
U+2DEA Cyrillic COMBINING CYRILLIC LETTER O [115]   set 40 U+2DEA is identical to U+0366
U+2DED Cyrillic COMBINING CYRILLIC LETTER ES [115]   set 41 U+2DED is identical to U+0368
U+2DEF Cyrillic COMBINING CYRILLIC LETTER HA [115]   set 42 U+2DEF is identical to U+036F
U+2DF6 Cyrillic COMBINING CYRILLIC LETTER A [115]   set 38 U+2DF6 is identical to U+0363
U+2DF7 Cyrillic COMBINING CYRILLIC LETTER IE [115]   set 39 U+2DF7 is identical to U+0364
U+A67C Cyrillic COMBINING CYRILLIC KAVYKA   set 37 U+A67C not reliably distinguishable from U+0306
U+1039A 𐎚 Ugaritic UGARITIC LETTER TO [150]   set 62 U+1039A is identical to U+12038
U+10486 𐒆 Osmanya OSMANYA LETTER DEEL [150]   set 63 U+10486 is identical to U+104A0
U+104A0 𐒠 Osmanya OSMANYA DIGIT ZERO [150]   set 63 U+104A0 is identical to U+10486
U+12038 𒀸 Cuneiform CUNEIFORM SIGN ASH [150]   set 62 U+12038 is identical to U+1039A
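
As an illustration of how the set numbers in the Variants column might be used, the sketch below maps each code point of a label to a representative of its variant set and compares the results. It is a hypothetical helper, not part of the LGR or its tooling; only a handful of sets from the table above are included, and the names `VARIANT_SETS`, `REPRESENTATIVE`, `skeleton` and `is_homoglyph_label` are assumptions made for this example.

```python
# A few variant sets copied from the table above (set number -> members).
VARIANT_SETS = {
    1: [0x0061, 0x0430],                   # Latin a, Cyrillic a
    2: [0x0063, 0x0441],                   # Latin c, Cyrillic es
    4: [0x0065, 0x0435],                   # Latin e, Cyrillic ie
    11: [0x006F, 0x03BF, 0x043E, 0x0585],  # Latin o, omicron, Cyrillic o, Armenian oh
    12: [0x0070, 0x0440],                  # Latin p, Cyrillic er
}

# Map every member to the smallest code point of its set.
REPRESENTATIVE = {cp: min(members)
                  for members in VARIANT_SETS.values()
                  for cp in members}

def skeleton(label):
    """Replace each code point by its set representative (others unchanged)."""
    return "".join(chr(REPRESENTATIVE.get(ord(ch), ord(ch))) for ch in label)

def is_homoglyph_label(a, b):
    """True if two distinct labels collapse to the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)

latin = "cope"
cyrillic = "\u0441\u043E\u0440\u0435"  # Cyrillic es, o, er, ie
print(is_homoglyph_label(latin, cyrillic))  # True
print(is_homoglyph_label(latin, "cape"))    # False
```

Choosing the smallest code point as representative mirrors the Source/Target convention used in the variant set tables, but any fixed member of the set would serve equally well.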

Legend

Throughout this LGR, a code point sequence may be annotated with a string in ALL CAPS that is constructed on the same principle as a name for a Unicode Named Sequence. No claim is made that a sequence thus annotated is in fact a named sequence, nor that the annotation in such case actually corresponds to the formal name of a named sequence.

Code Point
A code point or code point sequence.
Name
Shows the character or sequence name from the Unicode Character Database.
Glyph
The shape displayed depends on the fonts available to your browser.
Script
Shows the script property value from the Unicode Character Database. Combining marks may have the value Inherited and code points used with more than one script may have the value Common. Sequences are not annotated with a script value.
References
Links to the references associated with the code point or sequence, if any.
Required Context
Link to the rule defining the required context a code point or sequence must satisfy. If prefixed by "not:", identifies a context that must not occur.
Variants
A link to the variant set the code point or sequence is a member of, except where a code point or sequence maps only to itself, in which case the type of that mapping is listed.
Comment
If the comment in this row consists only of the code point or sequence name, it is suppressed in this view.
✔ - core repertoire
A check mark in the Part-of-Repertoire column indicates a code point is part of the core repertoire.
✗ - excluded from repertoire
A code point shown with ✗ is not part of the repertoire. It is shown only for documentation or review purposes.

Variant Sets

Summary

Number of variant sets 63
Largest variant set 4
Ordinary Variants by Type blocked (6), cross-script-homoglyph (146), homoglyph (22)
Reflexive Variants by Type  

The following tables list all variant sets defined in this LGR, except for singleton sets. Each table lists all variant mapping pairs of the set; one per row. Mappings are assumed to be symmetric: each row documents both forward (→) and reverse (←) mapping directions. In each table, the mappings are sorted by Source value in ascending code point order; shading is used to group mappings from the same source code point or sequence.
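
The number of rows in each table follows from the set size: a set with n members yields n(n-1)/2 symmetric pairs, which is why the 3-member sets have three rows and the 4-member set has six. A minimal sketch of this enumeration, with `mapping_rows` as an illustrative name that does not come from the LGR tooling:

```python
from itertools import combinations

def mapping_rows(members):
    """Return (source, target) pairs with source < target, sorted by source."""
    return list(combinations(sorted(members), 2))

# Variant set 11: Latin o, Greek omicron, Cyrillic o, Armenian oh
rows = mapping_rows([0x0585, 0x006F, 0x043E, 0x03BF])
for source, target in rows:
    print(f"{source:04X} {target:04X}")
```

Each printed pair corresponds to one table row and stands for both the forward and the reverse mapping.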

Where the type of both forward and reverse mappings are the same, a single value is given in the Type(s) column, otherwise the types for forward and reverse mapping are given in that order, as indicated by the arrows. The same applies to any comments.

A mapping where source and target are the same is reflexive. Variant sets consisting of only a single reflexive mapping are not shown as a set. Instead, the variant type of the mapping is listed in the Variants column of the Repertoire by Code Point table. Reflexive mappings that are part of a larger set are indicated with a “≡”.

In a properly specified LGR, all members within each variant set are variants of each other; the mappings in each set are symmetric and transitive; and all variant sets are disjoint.
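
These properties can be checked mechanically. The sketch below is one possible way to do so and is not the verification tool referred to in this document; `check_variant_mappings` is a hypothetical name, and mappings are assumed to be given as directed (source, target) pairs of code points.

```python
from collections import defaultdict

def check_variant_mappings(mappings):
    """Check that directed (source, target) mappings are symmetric and transitive.

    Returns the variant sets as sorted lists. Once symmetry and transitivity
    hold, the sets are the equivalence classes of the mapping relation and
    are therefore disjoint.
    """
    variants = defaultdict(set)
    for src, tgt in mappings:
        variants[src].add(tgt)

    for src, targets in variants.items():
        for tgt in targets:
            # Symmetry: every forward mapping needs a reverse mapping.
            if src not in variants.get(tgt, ()):
                raise ValueError(f"missing reverse mapping {tgt:04X} -> {src:04X}")
            # Transitivity: a variant of a variant must also be a variant of src.
            for other in variants[tgt]:
                if other != src and other not in targets:
                    raise ValueError(f"missing mapping {src:04X} -> {other:04X}")

    sets = {frozenset(targets | {src}) for src, targets in variants.items()}
    return sorted(sorted(s) for s in sets)


# Variant set 6 (Latin h, Cyrillic shha, Armenian ho), both directions listed.
set6 = [(0x0068, 0x04BB), (0x04BB, 0x0068),
        (0x0068, 0x0570), (0x0570, 0x0068),
        (0x04BB, 0x0570), (0x0570, 0x04BB)]
result = check_variant_mappings(set6)
print([[f"{cp:04X}" for cp in s] for s in result])  # [['0068', '04BB', '0570']]
```

Requirements of this kind are the reason some tables in this document contain "blocked" mappings annotated as "Required for Symmetry" or "Added for Transitivity".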

Common Legend

Source
By convention, the smaller of the two code points in a variant mapping pair.
Target
By convention, the larger of the two code points in a variant mapping pair.
Glyph
The shape displayed for source or target depends on the fonts available to your browser.
→ - forward
Indicates that variant Type, Ref and Comment apply to the mapping from source to target.
← - reverse
Indicates that variant Type, Ref and Comment apply to the reverse mapping from target to source.
- both
Indicates that variant Type, Ref and Comment apply to both forward and reverse mapping.
≡ - reflexive
Indicates that variant Type, Ref and Comment are for a reflexive mapping where source equals target.
Type
The type of the variant mapping. There are some predefined variant types such as “allocatable” and “blocked”, while others are defined specifically for each LGR.
Ref
One or more reference IDs (optional). A "/" separates references for reverse / forward mappings, as appropriate.
Comment
A descriptive comment (optional). A "/" separates comments for reverse / forward mappings, as appropriate.

Variant Set 1 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0061 a 0430 а cross-script-homoglyph [150] / [150], [202] U+0061 (a) is identical to U+0430 (а) / U+0430 (а) is identical to U+0061 (a)

Variant Set 2 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0063 c 0441 с cross-script-homoglyph [150], [202] U+0063 (c) is identical to U+0441 (с) / U+0441 (с) is identical to U+0063 (c)

Variant Set 3 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0064 d 0501 ԁ cross-script-homoglyph [150] U+0064 (d) is identical to U+0501 (ԁ) / U+0501 (ԁ) is identical to U+0064 (d)

Variant Set 4 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0065 e 0435 е cross-script-homoglyph [150], [202] U+0065 (e) is identical to U+0435 (е) / U+0435 (е) is identical to U+0065 (e)

Variant Set 5 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0067 g 0581 ց cross-script-homoglyph [204] U+0067 (g) not reliably distinguished from U+0581 (ց) / U+0581 (ց) not reliably distinguished from U+0067 (g)

Variant Set 6 — 3 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0068 h 04BB һ cross-script-homoglyph [150] U+0068 (h) is identical to U+04BB (һ) and U+0570 (հ) / U+04BB (һ) is not always distinct from U+0068 (h)
0068 h 0570 հ cross-script-homoglyph [202], [204] / [204] U+0068 (h) is identical to U+04BB (һ) and U+0570 (հ) / U+0570 (հ) is not always distinct from U+0068 (h)
04BB һ 0570 հ cross-script-homoglyph [202], [204] / [100], [204] U+04BB (һ) is identical to U+0068 (h) and U+0570 (հ) / U+0570 (հ) is identical to U+0068 (h) and U+04BB (һ)

Variant Set 7 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0069 i 0456 і cross-script-homoglyph [150] / [150], [202] U+0069 (i) is identical to U+0456 (і) / U+0456 (і) is identical to U+0069 (i)

Variant Set 8 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
006A j 03F3 ϳ cross-script-homoglyph [150] / [150], [202] U+006A (j) is identical to U+03F3 (ϳ) / U+03F3 (ϳ) is identical to U+006A (j)

Variant Set 9 — 3 Members

Source Glyph Target Glyph   Type(s) Ref Comment
006C l 026A ɪ homoglyph   / [150] U+006C (l) is similar to 026A in mostly sans-serif fonts / U+026A (ɪ) is sometimes identical to U+04CF (ӏ) and sometimes indistinguishable from 006C
006C l 04CF ӏ cross-script-homoglyph [202] /   (not in intentional) U+006C (l) is frequently identical to U+04CF (ӏ) / (not in intentional) U+04CF (ӏ) is frequently identical to U+006C (l)
026A ɪ 04CF ӏ cross-script-homoglyph [150] U+026A (ɪ) is identical to U+04CF (ӏ) / U+04CF (ӏ) is commonly identical to U+006C (l) and sometimes identical to 026A

Variant Set 10 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
006E n 0578 ո cross-script-homoglyph [204] U+006E (n) not reliably distinguished from U+0578 (ո) / U+0578 (ո) not always distinct from U+006E (n)

Variant Set 11 — 4 Members

Source Glyph Target Glyph   Type(s) Ref Comment
006F o 03BF ο cross-script-homoglyph [150], [202] U+006F (o) is identical to U+03BF (ο), U+043E (о) and U+0585 (օ) / U+03BF (ο) is identical to U+006F (o) and U+043E (о) and U+0585 (օ)
006F o 043E о cross-script-homoglyph [150], [202] / [202] U+006F (o) is identical to U+03BF (ο) and U+043E (о) and U+0585 (օ) / (not in intentional) U+043E (о) is identical to U+03BF (ο) and U+006F (o) and U+0585 (օ)
006F o 0585 օ cross-script-homoglyph [100], [150], [202] / [202] U+006F (o) is identical to U+03BF (ο) and U+043E (о) and U+0585 (օ) / (not in intentional) U+0585 (օ) is identical to U+03BF (ο) and U+043E (о) and U+006F (o)
03BF ο 043E о cross-script-homoglyph [150], [202] / [202] U+03BF (ο) is identical to U+006F (o) and U+043E (о) and U+0585 (օ) / (not in intentional) U+043E (о) is identical to U+006F (o), U+03BF (ο) and U+0585 (օ)
03BF ο 0585 օ cross-script-homoglyph [100], [150], [202] / [202] U+03BF (ο) is identical to U+043E (о), U+006F (o) and U+0585 (օ) / (not in intentional) U+0585 (օ) is identical to U+006F (o), U+043E (о) and U+03BF (ο)
043E о 0585 օ cross-script-homoglyph [202] (not in intentional) U+043E (о) is identical to U+03BF (ο), U+006F (o) and U+0585 (օ) / (not in intentional) U+0585 (օ) is identical to U+03BF (ο) and U+043E (о) and U+006F (o)

Variant Set 12 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0070 p 0440 р cross-script-homoglyph [150], [202] U+0070 (p) is identical to U+0440 (р) / U+0440 (р) is identical to U+0070 (p)

Variant Set 13 — 3 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0071 q 051B ԛ cross-script-homoglyph   U+0071 (q) identical to letter Q U+051B (ԛ) / U+051B (not in MSR-2) identical to U+0071 (q) and not reliably distinguished from U+0566 (զ)
0071 q 0566 զ cross-script-homoglyph [204] U+0071 (q) not reliably distinguished from U+0566 (զ) / U+0566 (զ) not reliably distinguished from U+0071 (q)
051B ԛ 0566 զ blocked   Required for Symmetry
cross-script-homoglyph   U+051B (not in MSR-2) identical to U+0071 (q)

Variant Set 14 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0073 s 0455 ѕ cross-script-homoglyph [150] / [150], [202] U+0073 (s) is identical to U+0455 (ѕ) / U+0455 (ѕ) is identical to U+0073 (s)

Variant Set 15 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0075 u 057D ս cross-script-homoglyph [204] U+0075 (u) not reliably distinguished from U+057D (ս) / U+057D (ս) not always distinct from U+0075 (u)

Variant Set 16 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0077 w 051D ԝ cross-script-homoglyph   U+0077 (w) identical to letter W U+051D (ԝ) / U+051D (ԝ) (not in MSR-2) identical to U+0077 (w)

Variant Set 17 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0078 x 0445 х cross-script-homoglyph [150], [202] U+0078 (x) is identical to U+0445 (х) / U+0445 (х) is identical to U+0078 (x)

Variant Set 18 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0079 y 0443 у cross-script-homoglyph [150], [202] U+0079 (y) is identical to U+0443 (у) / U+0443 (у) is identical to U+0079 (y)

Variant Set 19 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
00E6 æ 04D5 ӕ cross-script-homoglyph [150] / [150], [202] U+00E6 (æ) is identical to U+04D5 (ӕ) / U+04D5 (ӕ) is identical to U+00E6 (æ)

Variant Set 20 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
00E7 ç 04AB ҫ cross-script-homoglyph [100], [151] U+00E7 (ç) is identical to U+04AB (ҫ) / U+04AB (ҫ) is identical to U+00E7 (ç)

Variant Set 21 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
00E8 è 0450 ѐ cross-script-homoglyph [151] U+00E8 (è) is identical to U+0450 (ѐ) / U+0450 (ѐ) is identical to U+00E8 (è)

Variant Set 22 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
00EB ë 0451 ё cross-script-homoglyph [151] U+00EB (ë) is identical to U+0451 (ё) / U+0451 (ё) is identical to U+00EB (ë)

Variant Set 23 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
00EF ï 0457 ї cross-script-homoglyph [151] U+00EF (ï) is identical to U+0457 (ї) / U+0457 (ї) is identical to U+00EF (ï)

Variant Set 24 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
00FF ÿ 04F0 Ӱ cross-script-homoglyph   U+00FF (ÿ) identical to U+04F0 (Ӱ) / U+04F0 (Ӱ) identical to U+00FF (ÿ)

Variant Set 25 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0115 ĕ 04D7 ӗ cross-script-homoglyph   U+0115 (ĕ) identical to U+04D7 (ӗ) / U+04D7 (ӗ) identical to U+0115 (ĕ)

Variant Set 26 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0127 ħ 045B ћ cross-script-homoglyph   U+0127 (ħ) identical to h-stroke U+045B (ћ) / U+045B (ћ) identical to U+0127 (ħ)

Variant Set 27 — 3 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0138 ĸ 03BA κ cross-script-homoglyph [150] / [150], [202] Obsolete U+0138 (ĸ) is identical to U+043A (к) and U+03BA (κ) / U+03BA (κ) is identical to U+043A (к) and to obsolete U+0138 (ĸ)
0138 ĸ 043A к cross-script-homoglyph [150] U+0138 (ĸ) is identical to U+XXXX / U+043A (к) is identical to U+03BA (κ) and to obsolete U+0138 (ĸ)
03BA κ 043A к cross-script-homoglyph [202] (not in intentional) U+03BA (κ) is not reliably distinguished from U+043A (к) / (not in intentional) U+043A (к) is not reliably distinguished from U+03BA (κ) and identical to obsolete U+0138 (ĸ)

Variant Set 28 — 3 Members

Source Glyph Target Glyph   Type(s) Ref Comment
01DD ǝ 0259 ə homoglyph [150] U+01DD (ǝ) is identical to U+0259 (ə) and U+04D9 (ә) / U+0259 (ə) is identical to U+01DD (ǝ) and U+04D9 (ә)
01DD ǝ 04D9 ә cross-script-homoglyph [150], [202] U+01DD (ǝ) is identical to U+0259 (ə) and U+04D9 (ә) / U+04D9 (ә) is identical to U+0259 (ə) and U+01DD (ǝ)
0259 ə 04D9 ә cross-script-homoglyph [150] U+0259 (ə) is identical to U+01DD (ǝ) and U+04D9 (ә) / U+04D9 (ә) is identical to U+0259 (ə) and U+01DD (ǝ)

Variant Set 29 — 4 Members

Source Glyph Target Glyph   Type(s) Ref Comment
025B ɛ 03B5 ε cross-script-homoglyph [150] U+025B (ɛ) is identical to U+03B5 (ε) and not reliably distinguished from U+0511 (ԑ) / U+03B5 (ε) is identical to U+025B (ɛ)
025B ɛ 045C ќ blocked [150] /   / Added for Transitivity /
025B ɛ 0511 ԑ cross-script-homoglyph   U+025B (ɛ) not reliably distinguished from Latin EPSILON U+0511 (ԑ) / U+0511 (not in MSR-2) not reliably distinguished from U+025B (ɛ) and U+03B5 (ε)
03B5 ε 045C ќ blocked   Required for Symmetry
cross-script-homoglyph   U+03B5 (ε) 0301 not reliably distinguished from kappa+tonos U+045C (ќ)
03B5 ε 0511 ԑ cross-script-homoglyph [100], [150] /   U+03B5 (ε) is not reliably distinguished from U+0511 (ԑ) and identical to U+025B (ɛ) / U+0511 (not in MSR-2) not reliably distinguished from U+025B (ɛ) and U+03B5 (ε)
045C ќ 0511 ԑ blocked   / Added for Transitivity /

Variant Set 30 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
025C ɜ 0437 з cross-script-homoglyph [202] / [150] U+025C (ɜ) is identical to U+0437 (з) / U+0437 (з) is not always distinct from U+025C (ɜ)

Variant Set 31 — 3 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0269 ɩ 03B9 ι cross-script-homoglyph [150], [204] U+0269 (ɩ) not reliably distinguished from U+0582 (ւ) and identical to U+03B9 (ι) / U+03B9 (ι) not reliably distinguished from U+0582 (ւ) and identical to U+0269 (ɩ)
0269 ɩ 0582 ւ cross-script-homoglyph [204] U+0269 (ɩ) not reliably distinguished from U+0582 (ւ) and U+03B9 (ι) / U+0582 (ւ) not reliably distinguished from U+0269 (ɩ) and U+03B9 (ι)
03B9 ι 0582 ւ cross-script-homoglyph [204] U+03B9 (ι) not reliably distinguished from U+0582 (ւ) and U+0269 (ɩ) / U+0582 (ւ) not reliably distinguished from U+0269 (ɩ) and U+03B9 (ι)

Variant Set 32 — 3 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0269 0308 ɩ̈ 03CA ϊ cross-script-homoglyph [151] / [100], [151] U+0269 U+0308 (ɩ̈) not reliably distinguished from U+0582 (ւ) and identical to U+03CA (ϊ) / U+03CA (ϊ) not reliably distinguished from U+0582 U+0308 (ւ̈) and identical to U+0269 U+0308 (ɩ̈)
0269 0308 ɩ̈ 0582 0308 ւ̈ cross-script-homoglyph [151] /   U+0269 U+0308 (ɩ̈) not reliably distinguished from U+0582 (ւ) and identical to U+03B9 (ι) / U+0582 U+0308 (ւ̈) not reliably distinguished from U+0269 U+0308 (ɩ̈) and U+03B9 U+0308 (ϊ)
03CA ϊ 0582 0308 ւ̈ cross-script-homoglyph [100], [151] /   U+03CA (ϊ) not reliably distinguished from U+0582 U+0308 (ւ̈) and identical to U+0269 U+0308 (ɩ̈) / U+0582 U+0308 (ւ̈) not reliably distinguished from U+0269 U+0308 (ɩ̈) and U+03CA (ϊ)
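
The entries in this set mix a precomposed code point (U+03CA) with combining sequences, which follows from reference [151] ("after applying NFC"). A minimal sketch of that normalization step, using Python's standard unicodedata module; the helper as_hex is a hypothetical name introduced only for display:

```python
import unicodedata

def as_hex(s):
    """Render a string as space-separated code points, as in the tables above."""
    return " ".join(f"{ord(c):04X}" for c in s)

# Greek iota + combining diaeresis has a precomposed form, so NFC yields U+03CA.
greek = unicodedata.normalize("NFC", "\u03b9\u0308")

# Latin iota (U+0269) and Armenian yiwn (U+0582) have no precomposed form with
# diaeresis, so NFC leaves them as two-code-point sequences.
latin = unicodedata.normalize("NFC", "\u0269\u0308")
armenian = unicodedata.normalize("NFC", "\u0582\u0308")

print(as_hex(greek), "|", as_hex(latin), "|", as_hex(armenian))
```

This is why the Greek member appears as a single code point while the Latin and Armenian members remain sequences.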

Variant Set 33 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0275 ɵ 04E9 ө cross-script-homoglyph [150] U+0275 (ɵ) is identical to U+04E9 (ө) / U+04E9 (ө) is identical to U+0275 (ɵ)

Variant Set 34 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0292 ʒ 04E1 ӡ cross-script-homoglyph [150] U+0292 (ʒ) is identical to U+04E1 (ӡ) / U+04E1 (ӡ) is identical to U+0292 (ʒ)

Variant Set 35 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0299 ʙ 0432 в cross-script-homoglyph [150] U+0299 (ʙ) is identical to U+0432 (в) / U+0432 (в) is identical to U+0299 (ʙ)

Variant Set 36 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
029C ʜ 043D н cross-script-homoglyph [150] U+029C (ʜ) is identical to U+043D (н) / U+043D (н) is identical to U+029C (ʜ)

Variant Set 37 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0306 ̆ A67C cross-script-homoglyph   U+0306 (̆) not reliably distinguishable from U+A67C (꙼) / U+A67C (not in MSR-2) not reliably distinguishable from U+0306 (̆)

Variant Set 38 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0363 ͣ 2DF6 cross-script-homoglyph   U+0363 (ͣ) is identical to U+2DF6 (ⷶ) COMBINING LATIN SMALL LETTER A / U+2DF6 (ⷶ) is identical to U+0363 (ͣ)

Variant Set 39 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0364 ͤ 2DF7 cross-script-homoglyph   U+0364 (ͤ) is identical to U+2DF7 (ⷷ) COMBINING LATIN SMALL LETTER E / U+2DF7 (ⷷ) is identical to U+0364 (ͤ)

Variant Set 40 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0366 ͦ 2DEA cross-script-homoglyph   U+0366 (ͦ) is identical to U+2DEA (ⷪ) COMBINING LATIN SMALL LETTER O / U+2DEA (ⷪ) is identical to U+0366 (ͦ)

Variant Set 41 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0368 ͨ 2DED cross-script-homoglyph   U+0368 (ͨ) is identical to U+2DED (ⷭ) COMBINING LATIN SMALL LETTER C / U+2DED (ⷭ) is identical to U+0368 (ͨ)

Variant Set 42 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
036F ͯ 2DEF cross-script-homoglyph   U+036F (ͯ) is identical to U+2DEF (ⷯ) COMBINING LATIN SMALL LETTER X / U+2DEF (ⷯ) is identical to U+036F (ͯ)

Variant Set 43 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
03B4 δ 1E9F cross-script-homoglyph   U+03B4 (MSR-2) identical to U+1E9F (ẟ) / U+1E9F (MSR-2) identical to U+03B4 (δ)

Variant Set 44 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
03B7 η 0572 ղ cross-script-homoglyph [204] U+03B7 (η) not reliably distinguished from U+0572 (ղ) / U+0572 (ղ) not reliably distinguished from U+03B7 (η)

Variant Set 45 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
03C6 φ 0444 ф cross-script-homoglyph [150], [202] U+03C6 (φ) in some fonts is identical to U+0444 (ф) / U+0444 (ф) is identical to U+03C6 (φ) in some fonts

Variant Set 46 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0433 г 1D26 cross-script-homoglyph [150] U+0433 (г) is identical to U+1D26 (ᴦ) / U+1D26 (ᴦ) is identical to U+0433 (г)

Variant Set 47 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
043B л 1D2B cross-script-homoglyph [150] U+043B (л) is identical to U+1D2B (ᴫ) / U+1D2B (ᴫ) is identical to U+043B (л)

Variant Set 48 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
043C м 1D0D cross-script-homoglyph [150] U+043C (м) is identical to U+1D0D (ᴍ) / U+1D0D (ᴍ) is identical to U+043C (м)

Variant Set 49 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
043F п 1D28 cross-script-homoglyph [150] U+043F (п) is identical to U+1D28 (ᴨ) / U+1D28 (ᴨ) is identical to U+043F (п)

Variant Set 50 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
0442 т 1D1B cross-script-homoglyph [150] U+0442 (т) is identical to U+1D1B (ᴛ) / U+1D1B (ᴛ) is identical to U+0442 (т)

Variant Set 51 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
049B қ 2C6A cross-script-homoglyph   U+049B (қ) identical to U+2C6A (ⱪ) / U+2C6A (not in MSR-2) identical to k with descender U+049B (қ)

Variant Set 52 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
04A3 ң 2C68 cross-script-homoglyph   U+04A3 (ң) identical to U+2C68 (ⱨ) / U+2C68 (not in MSR-2) identical to h with descender U+04A3 (ң)

Variant Set 53 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
101D 1040 homoglyph [150] Letter U+101D (ဝ) is identical to Digit U+1040 (၀) / Digit U+1040 (၀) is identical to letter U+101D (ဝ)

Variant Set 54 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
17A2 17A3 homoglyph [150] U+17A2 (អ) is identical to deprecated U+17A3 (ឣ) / (deprecated) U+17A3 (ឣ) is identical to U+17A2 (អ)

Variant Set 55 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
1835 1855 homoglyph [150] U+1835 (ᠵ) is identical to U+1855 (ᡕ) / U+1855 (ᡕ) is identical to U+1835 (ᠵ)

Variant Set 56 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
199E 19D0 homoglyph [150] Letter U+199E (ᦞ) is identical to digit U+19D0 (᧐) / Digit U+19D0 (᧐) is identical to letter U+199E (ᦞ)

Variant Set 57 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
19B1 19D1 homoglyph [150] Letter U+19B1 (ᦱ) is identical to digit U+19D1 (᧑) / Digit U+19D1 (᧑) is identical to letter U+19B1 (ᦱ)

Variant Set 58 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
1B0D 1B52 homoglyph [150] Letter U+1B0D (ᬍ) is identical to digit U+1B52 (᭒) / Digit U+1B52 (᭒) is identical to letter U+1B0D (ᬍ)

Variant Set 59 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
1B11 1B53 homoglyph [150] Letter U+1B11 (ᬑ) is identical to digit U+1B53 (᭓) / Digit U+1B53 (᭓) is identical to letter U+1B11 (ᬑ)

Variant Set 60 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
1B28 1B58 homoglyph [150] Letter U+1B28 (ᬨ) is identical to digit U+1B58 (᭘) / Digit U+1B58 (᭘) is identical to letter U+1B28 (ᬨ)

Variant Set 61 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
1D18 1D29 cross-script-homoglyph [150] U+1D18 (ᴘ) is identical to U+1D29 (ᴩ) / U+1D29 (ᴩ) is identical to U+1D18 (ᴘ)

Variant Set 62 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
1039A 𐎚 12038 𒀸 cross-script-homoglyph [150] U+1039A (𐎚) is identical to U+12038 (𒀸) / U+12038 (𒀸) is identical to U+1039A (𐎚)

Variant Set 63 — 2 Members

Source Glyph Target Glyph   Type(s) Ref Comment
10486 𐒆 104A0 𐒠 homoglyph [150] U+10486 (𐒆) is identical to U+104A0 (𐒠) / U+104A0 (𐒠) is identical to U+10486 (𐒆)
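
Several entries above are marked "Added for Transitivity" or "Required for Symmetry": a variant set must list every pair of its members, in both directions. The following is a minimal sketch of how such missing pairs could be found, not the tooling actually used for this file. The function names and the choice of input pairs (one reading of Variant Set 29) are assumptions made for illustration.

```python
from itertools import combinations

def variant_sets(pairs):
    """Group code points by the symmetric, transitive closure of the pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = {}
    for cp in list(parent):
        groups.setdefault(find(cp), set()).add(cp)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])

def missing_pairs(pairs):
    """Pairs implied by transitivity but not listed explicitly."""
    listed = {frozenset(p) for p in pairs}
    return [
        (a, b)
        for group in variant_sets(pairs)
        for a, b in combinations(group, 2)
        if frozenset((a, b)) not in listed
    ]

# Assumed input: the pairs of Variant Set 29 that carry a homoglyph type.
listed = [(0x025B, 0x03B5), (0x025B, 0x0511), (0x03B5, 0x0511), (0x03B5, 0x045C)]
for a, b in missing_pairs(listed):
    print(f"{a:04X} {b:04X} blocked (added for transitivity)")
```

Under that assumed input, the two pairs reported are the ones the table lists as "blocked" with the comment "Added for Transitivity".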

Classes, Rules and Actions

Character Classes

The LGR does not define any named or implicit character classes.

Whole label evaluation and context rules

The following table lists all named rules defined in the LGR and indicates whether they are used as a trigger in an action or as context (when or not-when) for a code point. (Any use of context rules for variants is not indicated.)

Name   Used as Trigger   Used as Context   Anchor   Regular Expression   Ref   Comment
excluded-cp     (^$)   This rule matches the empty label; if used as context rule for a code point, it invalidates any label that contains the code point, effectively excluding the code point from the eligible repertoire
preceding-hamza-above     (⚓(?=\u0654))   match if code point precedes U+0654
following-soft-dotted       ((?<=[∅=\p{SD=Y}]))   match if code point follows a soft-dotted character

Legend

Used as Trigger
This rule triggers one of the actions listed below.
Used as Context
This rule defines a required context for a code point.
Anchor
This has a placeholder for the code point for which it is evaluated.
Regular Expression
A regular expression equivalent to the rule, shown in the standard notation with some extensions as noted:
⚓ - context anchor
Placeholder for the actual code point, when a context is evaluated. The code point must occur at the position corresponding to the anchor. Rules containing an anchor cannot be used as triggers.
(?<=...) - look-behind
If present encloses required context preceding the anchor.
(?=...) - look-ahead
If present encloses required context following the anchor.
[\p{ }] - property character set
A character set defined by reference to a value for a given Unicode property [\p{prop=val}]. A set defined via "\P" indicates the set complement.
∅= - empty set
Indicates that the following set is empty because of the result of set operations, or because none of its elements are part of the repertoire defined here. A rule with a non-optional empty set never matches.
(^$) - empty label
The regex (^$) matches the empty label. Used as a context rule, it always fails to match, thus disallowing the affected code point in any label. By convention, it is used for context rules that disable code points that are not part of the repertoire, yet explicitly listed in the LGR as excluded or for optional future extension.

Note: The following rules are defined but not used in this LGR: preceding-hamza-above, following-soft-dotted.
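
The anchor notation above can be illustrated with ordinary regular expressions. The sketch below is an assumption about how a context rule might be evaluated, not the reference implementation: the anchor is handled by splitting the label at the code point's position, and the function names are hypothetical. The following-soft-dotted rule is not shown because Python's re module does not support \p{...} property sets.

```python
import re

def context_matches(label, index, before=None, after=None):
    """Evaluate an anchored context rule for the code point at label[index].

    `before` plays the role of the look-behind (?<=...) and must match
    immediately before the anchor; `after` plays the role of the
    look-ahead (?=...) and must match immediately after it.
    """
    if before is not None and not re.search(rf"(?:{before})\Z", label[:index]):
        return False
    if after is not None and not re.match(after, label[index + 1:]):
        return False
    return True

def excluded_cp_matches(label):
    """The (^$) rule: matches only the empty label."""
    return re.fullmatch(r"^$", label) is not None

# preceding-hamza-above: the code point must be followed by U+0654.
label = "\u0627\u0654"  # ALEF followed by HAMZA ABOVE
print(context_matches(label, 0, after="\u0654"))  # ALEF precedes U+0654
print(context_matches(label, 1, after="\u0654"))  # nothing follows U+0654
print(excluded_cp_matches(label))                 # non-empty label
```

Because any label containing a code point is non-empty, the (^$) context can never be satisfied for that code point, which is how it excludes the code point from the repertoire.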

Actions

The following table lists the actions that are used to assign dispositions to labels and variant labels, based on the specified conditions. The order of actions defines their precedence: the first action triggered by a label is the one defining its disposition.

# Condition Rule / Variant Set   Disposition Ref Comment
1 if at least one variant is in {homoglyph} blocked   homoglyphs are mutually exclusive by default
2 if at least one variant is in {cross-script-homoglyph} blocked   cross-script-homoglyphs are mutually exclusive by default

Legend

{...} - variant type set
In the "Rule/Variant Set" column, the notation {...} means a set of variant types.

Note: The following variant types defined in this LGR are not used as triggers for any actions: blocked. This is not necessarily an error. Labels containing such types are usually handled in the Catch All action.
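
The precedence rule for actions (the first action triggered defines the disposition) can be sketched as an ordered lookup. This is an illustration under assumptions: the names ACTIONS and disposition are hypothetical, and returning None stands in for whatever default or Catch All handling applies when no listed action is triggered, which the table above does not specify.

```python
# The two actions from the table above, in order of precedence.
ACTIONS = [
    ("homoglyph", "blocked"),
    ("cross-script-homoglyph", "blocked"),
]

def disposition(variant_types):
    """Return the disposition of the first action triggered by a label.

    `variant_types` is the set of variant types used to derive the label.
    None means no listed action applies (assumption: the label then falls
    through to default handling not shown in this table).
    """
    for variant_type, result in ACTIONS:
        if variant_type in variant_types:
            return result
    return None

print(disposition({"cross-script-homoglyph"}))
print(disposition({"blocked"}))
```

The second call returns None because the type "blocked" does not trigger either listed action, matching the note above.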

Table of References

[100] MSR-2
Code points included in MSR-2
[115] MSR-2
Code points excluded from MSR-2
[150] The Unicode Consortium, "Intentional.txt", Version 10.0.0,
http://www.unicode.org/Public/security/10.0.0/intentional.txt
Code points considered identical by intention
[151] Derived from NFC plus The Unicode Consortium, "Intentional.txt", Version 10.0.0,
http://www.unicode.org/Public/security/10.0.0/intentional.txt
Combining sequences involving code points considered identical by intention, after applying NFC.
[202] Cyrillic LGR Root Zone
[204] Armenian LGR Root Zone