CNS 11643

Source: Wikipedia, the free encyclopedia.
Alias(es): CSIC (Chinese Standard Interchange Code)
Classification: ISO 2022, DBCS, CJK encoding
Other related encoding(s): CCCII

The CNS 11643 character set (Chinese National Standard 11643), also officially known as the Chinese Standard Interchange Code or CSIC[1] (Chinese: 中文標準交換碼), is officially the standard character set of Taiwan (Republic of China). In practice, variants of the related Big5 character set are the de facto standard.

CNS 11643 is designed to conform to […] CCCII, the encoding of variant characters in CNS 11643 is not related.

EUC-TW is an encoded representation of CNS 11643 and ASCII in Extended Unix Code (EUC) form. Other encodings capable of representing certain CSIC planes include ISO-2022-CN (planes 1 and 2) and ISO-2022-CN-EXT (planes 1 through 7).
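To make the EUC form concrete, the following is a minimal sketch of how a CNS 11643 position could be turned into EUC-TW bytes and back. The byte layout used here is an assumption drawn from the general EUC convention rather than from the text above: ASCII bytes stand for themselves, plane 1 uses two bytes in the range 0xA1 to 0xFE, and any plane from 1 to 16 can be written as the single-shift byte 0x8E followed by a plane byte (0xA0 plus the plane number) and the same two bytes. The function names are hypothetical, and no mapping to Unicode is attempted.

```python
# Sketch of the EUC-TW byte layout (assumed from the general EUC
# convention; the helper names are illustrative, not a library API).

SS2 = 0x8E  # single-shift byte that introduces the four-byte form


def _check(plane: int, row: int, cell: int) -> None:
    """Reject positions outside 16 planes of 94 rows by 94 cells."""
    if not (1 <= plane <= 16 and 1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("plane must be 1-16; row and cell must be 1-94")


def euc_tw_encode(plane: int, row: int, cell: int) -> bytes:
    """Encode one CNS 11643 (plane, row, cell) position as EUC-TW bytes."""
    _check(plane, row, cell)
    pair = bytes([0xA0 + row, 0xA0 + cell])
    if plane == 1:
        return pair  # plane 1 gets the short two-byte form
    return bytes([SS2, 0xA0 + plane]) + pair


def euc_tw_decode(data: bytes) -> list:
    """Split EUC-TW bytes into ASCII characters and (plane, row, cell) tuples."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:  # ASCII byte
            out.append(chr(lead))
            i += 1
        elif lead == SS2:  # four-byte form: SS2, plane byte, row, cell
            if i + 4 > len(data):
                raise ValueError("truncated four-byte sequence")
            pos = (data[i + 1] - 0xA0, data[i + 2] - 0xA0, data[i + 3] - 0xA0)
            _check(*pos)
            out.append(pos)
            i += 4
        else:  # two-byte form, implicitly plane 1
            if i + 2 > len(data):
                raise ValueError("truncated two-byte sequence")
            pos = (1, lead - 0xA0, data[i + 1] - 0xA0)
            _check(*pos)
            out.append(pos)
            i += 2
    return out


print(euc_tw_encode(1, 1, 1).hex())  # a1a1
print(euc_tw_encode(2, 1, 1).hex())  # 8ea2a1a1
```

Under these assumptions, a decoder only needs to look at the first byte of each sequence to know how many bytes to consume, which is the main practical appeal of the EUC form.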

History

The first edition of the standard was published in 1986, and included planes 1 and 2, deriving from levels 1 and 2 of […] HKSCS characters;[3] see also Kangxi Radicals (Unicode block)). Extensions to the standard were subsequently published in 1988 (6319 characters, occupying plane 14) and 1990 (7169 characters, occupying plane 15).[2]: 115–122

Unicode 1.0.0, although it did not yet include hanzi, included characters for compatibility with CNS 11643: the CJK Compatibility Forms block was titled "CNS 11643 Compatibility" in Unicode 1.0.0.[4] When the Unicode CJK Unified Ideographs set was being compiled for Unicode 1.0.1, the national bodies submitted character sets to the CJK Joint Research Group for inclusion. The version of CNS 11643 submitted included the plane 14 extension, in addition to further desired characters appended to plane 14 (after 68-21, the last used code point in the standard version of the extension).[2]: 179–180

In the second edition of the standard, published in 1992, a much larger collection of […] code points 01-01 through 66-38, became plane 3 (with the remaining 171 characters, code points 66-39 through 68-21, being instead distributed amongst plane 4). The plane 15 extension was not included, although 338 of its characters were included amongst planes 4 through 7.[2]: 115–122
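The character counts quoted above can be checked with a little arithmetic on the row-cell code points. The sketch below assumes that each plane is a grid of 94 rows by 94 cells (the usual ISO 2022 layout, not stated explicitly in the text) and that the code points in each range are assigned contiguously with no gaps. The function names are illustrative.

```python
# Row-cell arithmetic for a plane, assuming 94 cells per row and
# contiguous assignment within each range (both are assumptions).

CELLS_PER_ROW = 94


def linear_index(row: int, cell: int) -> int:
    """1-based position of a row-cell code point within its plane."""
    return (row - 1) * CELLS_PER_ROW + cell


def count_between(first: tuple, last: tuple) -> int:
    """Number of code points from first to last, inclusive."""
    return linear_index(*last) - linear_index(*first) + 1


moved_to_plane_3 = count_between((1, 1), (66, 38))
moved_to_plane_4 = count_between((66, 39), (68, 21))

print(moved_to_plane_3)                     # 6148
print(moved_to_plane_4)                     # 171
print(moved_to_plane_3 + moved_to_plane_4)  # 6319
```

Under these assumptions the second range comes out at 171 code points, matching the figure given for the characters moved to plane 4, and the two ranges together total 6319, matching the size of the 1988 plane 14 extension.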

The third edition of the standard, published in 2007, added the […] Roman alphabet support to plane 1. It introduced planes 10 through 14, containing additional hanzi, and incorporated the existing plane 15 extension into the standard itself (with gaps left where the characters already existed in planes 4 through 7). It also added 128 further hanzi to plane 3, starting at code point 68-40.[2]: 115–122

As of 2017, there are several thousand CNS 11643 characters with no corresponding Unicode character, mostly in planes 10 through 14; these are mapped to the Unicode Supplementary Private Use Area.[5]

Relationship to Big5

Levels 1 and 2 of the […] hanzi characters in Big5 or HKSCS),[3] and further additional characters were added to CNS 11643 plane 1 in 2007.[2]: 115–122 The Big5-2003 variant of Big5 is defined as a partial encoding of CNS 11643.

Within the Big5 hanzi repertoire, only one plane 1 character is conventionally mapped to Unicode differently from the corresponding character from the first two CNS 11643 planes: to U+5F5D […] Unihan database currently maps the CNS 11643 character to U+7B9A; U+5284 appears in CNS 11643 plane 14.[3]

References

  1. ISO-IR-171.
  2. […]
  3. UTC L2/22-288.
  4. "3.8: Block-by-Block Charts" (PDF). The Unicode Standard. version 1.0. Unicode Consortium.
  5. "CNS 11643 in Unicode's Supplementary Private Use Area". [chinese mac]. Council on East Asian Studies at Yale University.
  6. Lunde, Ken (1995-12-18). "4.3: CJK Character Set Compatibility Issues - Chinese (Taiwan)". CJK.INF Version 1.9.
  7. IETF.
  8. Adobe Inc.
  9. "ibm-950_P110-1999 (lead byte 0xC2)". International Components for Unicode Converter Explorer. Unicode Consortium. Archived from the original on 2021-07-12.
  10. "ibm-950_P110-1999.ucm". ICU Data Repository. IBM/Unicode Consortium. 2007. <U5284> \xE3\x5A |0
