Precomposed character
A precomposed character (alternatively composite character or decomposable character) is a
Precomposed characters are the legacy solution for representing many special letters in various
Comparing precomposed and decomposed characters
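In the following example, the same surname is written in two alternative ways: the first line uses the precomposed characters Å (U+00C5) and ö (U+00F6), and the second uses the base letters A (U+0041) and o (U+006F), each followed by a combining diacritic (ring above, U+030A, and diaeresis, U+0308).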
- Åström (U+00C5 U+0073 U+0074 U+0072 U+00F6 U+006D)
- Åström (U+0041 U+030A U+0073 U+0074 U+0072 U+006F U+0308 U+006D)
The two encodings are canonically equivalent and should render identically. In practice, however, some Unicode implementations still have difficulties with decomposed characters. In the worst case, combining diacritics may be ignored, or rendered as unrecognized characters after their base letters, because not all fonts include them. To work around these problems, some applications simply replace decomposed sequences with the equivalent precomposed characters.
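The replacement described above corresponds to Unicode normalization. The following is a minimal sketch in Python, assuming the standard library's unicodedata module; the variable names are illustrative only. It shows that the two spellings differ as code point sequences but become identical after normalization to Form C (composed) or Form D (decomposed).

```python
import unicodedata

# The two spellings from the example above, written with explicit escapes
# so the distinction survives copy and paste.
precomposed = "\u00C5str\u00F6m"       # Å and ö as single code points
decomposed = "A\u030Astro\u0308m"      # A + ring above, o + diaeresis

# As raw code point sequences, the strings are different.
print(precomposed == decomposed)       # False
print(len(precomposed), len(decomposed))

# NFC replaces decomposed sequences with precomposed characters where they exist.
composed_again = unicodedata.normalize("NFC", decomposed)

# NFD does the opposite: it splits precomposed characters apart.
decomposed_again = unicodedata.normalize("NFD", precomposed)

print(composed_again == precomposed)   # True
print(decomposed_again == decomposed)  # True
```

Comparing strings only after normalizing both to the same form avoids treating the two spellings as different text.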
With an incomplete font, however, precomposed characters may also be problematic – especially if they are more exotic, as in the following example (showing the reconstructed Proto-Indo-European word for "dog"):
- ḱṷṓn (U+1E31 U+1E77 U+1E53 U+006E)
- ḱṷṓn (U+006B U+0301 U+0075 U+032D U+006F U+0304 U+0301 U+006E)
In some situations, the precomposed k, u and o with diacritics on the first line may render as unrecognized characters, or their typographical appearance may differ noticeably from the final letter n, which has no diacritic. On the second line, the base letters should at least render correctly even if the combining diacritics are not recognized.
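The relationship between the two lines can be checked programmatically. The sketch below, again assuming Python's unicodedata module, decomposes the precomposed word and then drops the combining marks, which leaves only the base letters a limited font could still display. The helper name strip_combining_marks is hypothetical and is not part of any library.

```python
import unicodedata

precomposed = "\u1E31\u1E77\u1E53n"              # ḱ ṷ ṓ n as precomposed characters
decomposed = "k\u0301u\u032Do\u0304\u0301n"      # base letters plus combining marks


def strip_combining_marks(text: str) -> str:
    """Hypothetical helper: decompose text and keep only non-combining characters."""
    fully_decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for characters that are not combining marks.
    return "".join(ch for ch in fully_decomposed if unicodedata.combining(ch) == 0)


# Both lines of the example are canonically equivalent.
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# Only the base letters remain once the diacritics are removed.
print(strip_combining_marks(precomposed))  # kuon
```

Dropping diacritics loses information, so this is only an illustration of what remains legible when combining marks are unsupported, not a recommended way to process such text.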
OpenType has the ccmp "feature tag" to define glyphs that are compositions or decompositions involving combining characters.
Chinese characters
In theory, most
See also
- List of precomposed Latin characters in Unicode
- Dead key
- Compose key
- Combining character
- Unicode equivalence
- Complex text layout
- Unicode compatibility characters
- Alphabetic Presentation Forms – (Unicode block)
- Arabic Presentation Forms-A – (Unicode block)
- Arabic Presentation Forms-B – (Unicode block)
Sources
- The Unicode Standard, Version 5.2: Conformance (see Section 3.7 for Decomposition). The Unicode Consortium, December 2009.
- MSDN: Defining a Character Set. April 8, 2010.
- Unicode Normalization Forms (Unicode® Standard Annex #15): http://unicode.org/reports/tr15/
External links
- Free Idg Serif, a derivative of the FreeSerif font with added declarations of precomposed characters.