Code page 950

Source: Wikipedia, the free encyclopedia.
Language: Traditional Chinese
Created by: Microsoft
Extends: Big5
Based on: Big5-ETen

Code page 950 is the code page used on Microsoft Windows for Traditional Chinese. It is Microsoft's implementation of the de facto standard Big5 character encoding. The code page is not registered with IANA,[1] and hence is not a standard for communicating information over the Internet, although it is usually labelled simply as big5, including by Microsoft library functions.[2]
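As a small illustration of the encoding in use, the sketch below round-trips a string through the `cp950` codec that ships with Python's standard library. The sample string is an arbitrary choice made here for demonstration and does not come from the cited sources.

```python
# Round-trip a short Traditional Chinese string through Python's built-in
# cp950 codec. The sample string is an illustrative assumption.
text = "中文"
encoded = text.encode("cp950")   # each of these characters takes two bytes
decoded = encoded.decode("cp950")

print(encoded.hex(" "))          # a4 a4 a4 e5
print(decoded == text)           # True
```

Python keeps `cp950` and `big5` as separate codecs, so the label chosen when decoding matters for the extension ranges.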

Terminology and variants

The major difference between Windows code page 950 and "common" (non-vendor-specific) Big5 is the incorporation of a subset of the ETEN extensions to Big5 at 0xF9D6 through 0xF9FE (comprising seven Chinese characters followed by box drawing characters and block elements). The ranges used by some of the other ETEN extended characters are instead defined as end-user defined (private use) characters.[3]
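The difference can be probed with a short sketch that decodes one code from the incorporated ETEN range under two codec labels. It assumes Python's standard `cp950` and `big5` codecs; the helper name `try_decode` is hypothetical, and the outcome for `big5` depends on the mapping table the codec was built from.

```python
def try_decode(data, codec):
    """Return the decoded string, or None if the codec rejects the bytes."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None

# 0xF9D6 is the first code of the ETEN subset that code page 950 incorporates.
eten_sample = b"\xf9\xd6"

for codec in ("cp950", "big5"):
    print(codec, repr(try_decode(eten_sample, codec)))
```

A `None` result for one label and a character for the other is a quick sign that data labelled big5 was really produced with the Microsoft variant.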

IBM's

Private Use Area as user-defined characters.[3][11] It also includes two non-ETEN extension regions with trail bytes 0x81–A0, i.e. outside the usual Big5 trail byte range but similar to the Big5+ trail byte range: area 5 has lead bytes 0xF2–F9 and contains IBM-selected characters, while area 9 has lead bytes 0x81–8C and is a user-defined region.[12]

Microsoft updated their version of code page 950 in 2000, adding the euro sign (€) at the double-byte code 0xA3E1. IBM refers to the euro sign update of their Big-5 variant as CCSID 1370 (which includes both single-byte (0x80) and double-byte euro signs).[13] It comprises single byte code page 1114 (CCSID 5210) and double byte code page 947 (CCSID 21427).[13][14][15]
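Whether a given converter carries the euro sign update can be checked directly. The sketch below is only a probe, assuming Python's standard codecs; the helper name `encode_or_none` is hypothetical, and the result depends on the version of the mapping table behind each codec.

```python
def encode_or_none(text, codec):
    """Return the encoded bytes, or None if the codec cannot encode the text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None

EURO_DOUBLE_BYTE = b"\xa3\xe1"   # double-byte euro code given in the text

for codec in ("cp950", "big5"):
    result = encode_or_none("\u20ac", codec)
    print(codec, result, result == EURO_DOUBLE_BYTE)
```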

For better compatibility with Microsoft's variant in IBM Db2, IBM also defines the pure double-byte Code page 1372[16] and associated variable-width CCSID 1373, which includes only the double-byte euro sign[17] and matches Microsoft behaviour in which extension regions are included.[18][19][20][21][22]

Single byte codes

The following are the single-byte graphical characters included by IBM. The codes 0x00 through 0x1F and 0x7F may be used for control characters or for graphical characters (see code page 897). As noted above, the single-byte euro sign at 0x80 is not included in IBM CCSIDs 950 or 1373, nor by Microsoft.

Code page 1114[23][24]
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x
1x §
2x SP ! " # $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \ ] ^ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ~
8x

The remaining byte values are parts of a double-byte sequence.
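The split between single-byte and double-byte codes can be sketched as a simple scanner. This is a minimal illustration, not a validating decoder: it assumes every byte from 0x81 to 0xFE acts as a lead byte and does not check the trail byte, and the function name `split_units` is hypothetical.

```python
def split_units(data):
    """Split a byte string into single-byte and double-byte units.

    Assumption: bytes 0x81-0xFE are lead bytes that consume the next byte;
    all other bytes stand alone. Trail bytes are not validated.
    """
    units = []
    i = 0
    while i < len(data):
        is_lead = 0x81 <= data[i] <= 0xFE
        if is_lead and i + 1 < len(data):
            units.append(data[i:i + 2])
            i += 2
        else:
            # Single-byte code, or a truncated lead byte at the end of input.
            units.append(data[i:i + 1])
            i += 1
    return units

print(split_units(b"A\xa4\xa4B"))   # [b'A', b'\xa4\xa4', b'B']
```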

Private Use Area usage

Mapping from Big5 EUDC to PUA code points[25]
Big5 range     Unicode range    Formula[26]
81 40–8D FE    U+EEB8–U+F6B0    0xeeb8 + (157 * (H-0x81)) + ((L<0x80) ? (L-0x40) : (L-0x62))
8E 40–A0 FE    U+E311–U+EEB7    0xe311 + (157 * (H-0x8e)) + ((L<0x80) ? (L-0x40) : (L-0x62))
C6 A1–C8 FE    U+F6B1–U+F848    0xf672 + (157 * (H-0xc6)) + ((L<0x80) ? (L-0x40) : (L-0x62))
FA 40–FE FE    U+E000–U+E310    0xe000 + (157 * (H-0xfa)) + ((L<0x80) ? (L-0x40) : (L-0x62))

In each formula, H is the lead byte and L is the trail byte of the Big5 code.
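The table's formulas can be sketched as a single function. This is an illustrative implementation under stated assumptions: the names `EUDC_RANGES`, `trail_offset` and `big5_eudc_to_pua` are hypothetical, the valid trail bytes are taken to be 0x40–0x7E and 0xA1–0xFE (157 per lead byte, matching the multiplier in the formulas), and the 0xC6 row is rejected below trail byte 0xA1 because the table's range starts there.

```python
# (first lead byte, last lead byte, base code point), copied from the table.
EUDC_RANGES = [
    (0x81, 0x8D, 0xEEB8),
    (0x8E, 0xA0, 0xE311),
    (0xC6, 0xC8, 0xF672),
    (0xFA, 0xFE, 0xE000),
]

def trail_offset(trail):
    """Position of a trail byte within its row of 157 codes."""
    if 0x40 <= trail <= 0x7E:
        return trail - 0x40
    if 0xA1 <= trail <= 0xFE:
        return trail - 0x62
    raise ValueError("trail byte outside 0x40-0x7E and 0xA1-0xFE")

def big5_eudc_to_pua(lead, trail):
    """Map a Big5 EUDC code (lead, trail) to its PUA code point."""
    offset = trail_offset(trail)
    if lead == 0xC6 and trail < 0xA1:
        raise ValueError("the 0xC6 row starts at trail byte 0xA1")
    for first, last, base in EUDC_RANGES:
        if first <= lead <= last:
            return base + 157 * (lead - first) + offset
    raise ValueError("lead byte is outside the EUDC ranges")

print(f"U+{big5_eudc_to_pua(0xFA, 0x40):04X}")   # U+E000
```

Checking the first and last code of each row against the Unicode range column is a quick way to confirm that the base values and the 157 multiplier agree.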

This mapping is also used in HKSCS where a given glyph is not yet found in the Unicode revision specified.[27]

References

  1. ^ "Character Sets". IANA — Protocol Registries.
  2. ^ "Encoding.WindowsCodePage Property - .NET Framework (current version)". MSDN. Microsoft.
  3. ^ RFC 1922.
  4. ^ "CCSID 950 information document". Archived from the original on 2014-12-02.
  5. ^ "CCSID 1114 information document". Archived from the original on 2016-03-27.
  6. ^ "CCSID 947 information document". Archived from the original on 2014-12-01.
  7. ^ "Lead byte A3: ibm-950_P110-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
  8. ^ "Lead byte C6: ibm-950_P110-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
  9. ^ "Lead byte C7: ibm-950_P110-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
  10. ^ "Lead byte C8: ibm-950_P110-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
  11. ^ "Lead byte F9: ibm-950_P110-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
  12. ^ "IBM Traditional Chinese Graphic Character Set for IBM BIG-5 Code" (PDF). IBM. 1999. C-H 3-3220-131 1999-04.
  13. ^ a b "CCSID 1370 information document". Archived from the original on 2016-03-27.
  14. ^ "CCSID 5210 information document". Archived from the original on 2014-11-29.
  15. ^ "CCSID 21427 information document". Archived from the original on 2016-03-27.
  16. ^ "CPGID 01372: MS T-Chinese Big-5 (Special for DB2)". IBM Globalization - Code page identifiers. Archived from the original on 2016-03-17.
  17. ^ "ibm-1373_P100-2002". ICU Demonstration - Converter Explorer. International Components for Unicode.
  18. ^ "Lead byte A3: ibm-1373_P100-2002". ICU Demonstration - Converter Explorer. International Components for Unicode.
  19. ^ "Lead byte C6: ibm-1373_P100-2002". ICU Demonstration - Converter Explorer. International Components for Unicode.
  20. ^ "Lead byte C7: ibm-1373_P100-2002". ICU Demonstration - Converter Explorer. International Components for Unicode.
  21. ^ "Lead byte C8: ibm-1373_P100-2002". ICU Demonstration - Converter Explorer. International Components for Unicode.
  22. ^ "Lead byte F9: ibm-1373_P100-2002". ICU Demonstration - Converter Explorer. International Components for Unicode.
  23. ^ Code Page CPGID 01114 (PDF), IBM
  24. ^ Code Page CPGID 01114 (txt), IBM
  25. ^ "Windows Best Fit Chart: CP950". unicode.org. Retrieved 13 September 2016.
  26. ^ "Big5". Kanji Database. Retrieved 13 September 2016.
  27. ^ "Big5-HKSCS:2008". Archived from the original on 2016-09-13.