GB 2312
|
MIME / IANA | GB_2312-80 (GB2312 for usual EUC form) |
---|---|
Alias(es) | iso-ir-58, chinese, csGB2312, csISO58GB231280 |
Language(s) | |
Extensions | ISO-IR-165 |
Encoding formats |
|
Preceded by | Chinese telegraph code |
Succeeded by | GBK, GB 18030 |
Other related encoding(s) | JIS X 0208, KS X 1001 |
GB/T 2312-1980 is a key official
GB/T 2312-1980 was originally a mandatory national standard designated GB 2312-1980. However, following a National Standard Bulletin of the
As of September 2022, GB2312 is the second-most popular encoding served from China and territories (after UTF-8), with 5.5% of web servers serving a page declaring it.[3] Globally, GB2312 is declared on 0.1% of all web pages.[4] However, all major web browsers decode GB2312-marked documents as if they were marked with the superset GBK encoding, except for Safari and Edge on the label GB_2312.[5]
There is an analogous character set known as
Character range in rows
While GB/T 2312 covers over 99.99% contemporary Chinese text usage,
Characters in GB/T 2312 are arranged in a 94×94 grid (as in
The rows (numbered from 1 to 94) contain characters as follows:
- 01–09, comprising punctuation and other special characters; also Hiragana, Katakana, Greek, Cyrillic, Pinyin, Bopomofo
- 16–55, the first level of Chinese characters, arranged according to Pinyin (3755 characters).
- 56–87, the second level of Chinese characters, arranged according to radical and strokes (3008 characters).
The rows 10–15 and 88–94 are unassigned.
In total, GB/T 2312-1980 contains 682 signs and 6763 Chinese characters.
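The row layout listed above can be sketched as a small lookup. This is only an illustration: the function name `classify_row` and the category labels are hypothetical, and the ranges and counts are taken directly from the list above.

```python
# Character counts for the two Hanzi levels, as stated in the list above.
LEVEL1_COUNT = 3755  # rows 16-55, ordered by Pinyin
LEVEL2_COUNT = 3008  # rows 56-87, ordered by radical and strokes


def classify_row(row: int) -> str:
    """Return the category of a GB/T 2312 row (qu) number in 1..94."""
    if not 1 <= row <= 94:
        raise ValueError(f"row must be in 1..94, got {row}")
    if row <= 9:
        return "non-Hanzi"
    if 16 <= row <= 55:
        return "level-1 Hanzi"
    if 56 <= row <= 87:
        return "level-2 Hanzi"
    # Rows 10-15 and 88-94 are unassigned.
    return "unassigned"


if __name__ == "__main__":
    for r in (1, 12, 45, 60, 90):
        print(r, classify_row(r))
    print("total Hanzi:", LEVEL1_COUNT + LEVEL2_COUNT)
```

The two level counts add up to the 6763 Chinese characters quoted for the standard. Rows 56–87 are completely filled (32 × 94 = 3008), whereas rows 16–55 offer 40 × 94 = 3760 cells, of which 3755 are used.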
Encodings of GB/T 2312
EUC-CN
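In EUC-CN, each GB/T 2312 character is represented by two bytes. The value of the first byte is from 0xA1–0xF7 (161–247), while the value of the second byte is from 0xA1–0xFE (161–254). Since both ranges lie beyond ASCII, it is possible, as in UTF-8, to check whether a byte is part of a multi-byte construct when using EUC-CN, but not whether it is the first or the last byte of that construct.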
Compared to UTF-8, GB/T 2312 (whether native or encoded in EUC-CN) is more storage efficient: while UTF-8 uses three bytes[a] per CJK ideograph, GB/T 2312 only uses two. However, GB/T 2312 does not cover as many ideographs as Unicode does.
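The size difference can be checked with Python's standard `utf-8` and `gb2312` codecs. This is a minimal sketch. The sample character "外" is borrowed from the worked example in this article, and the variable names are arbitrary.

```python
# Compare the encoded size of one CJK ideograph in UTF-8 and in EUC-CN.
text = "外"

utf8_len = len(text.encode("utf-8"))   # bytes used by UTF-8
gb_len = len(text.encode("gb2312"))    # bytes used by EUC-CN (Python's "gb2312" codec)

if __name__ == "__main__":
    print(f"UTF-8: {utf8_len} bytes, GB2312: {gb_len} bytes")
```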
To map the qūwèi code points to EUC bytes, add 160 (0xA0) to both the row number (or qū, 区) and the cell/column number (or wèi, 位). The sum for the row number forms the high byte, and the sum for the cell number forms the low byte.
For example, to encode the character "外" at qūwèi cell 45-66, the high byte will use the row number 45: 45+160=205=0xCD, and the low byte will come from the cell number 66: 66+160=226=0xE2. So, the full encoding is <CD E2>.[10][11]
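The qūwèi-to-EUC-CN mapping described above can be sketched in a few lines. The helper name `quwei_to_euc_cn` is hypothetical. Python's standard `gb2312` codec is used only to cross-check the worked example.

```python
def quwei_to_euc_cn(row: int, cell: int) -> bytes:
    """Map a qūwèi code point (row, cell), each in 1..94, to its two EUC-CN bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError(f"row and cell must be in 1..94, got {row}-{cell}")
    # Add 160 (0xA0) to the row for the high byte and to the cell for the low byte.
    return bytes([row + 0xA0, cell + 0xA0])


if __name__ == "__main__":
    encoded = quwei_to_euc_cn(45, 66)
    print(encoded.hex(" ").upper())   # the two bytes in hexadecimal
    print(encoded.decode("gb2312"))   # decoded back through the standard codec
```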
ISO-2022-CN
In ISO-2022-CN, the value of the first byte is from 0x21–0x77 (33–119), while the value of the second byte is from 0x21–0x7E (33–126). As the byte range overlaps ASCII significantly, special characters are required to indicate whether a character is in the ASCII range or is part of a two-byte sequence of the extended region, namely Shift Out and Shift In.
To map the qūwèi code points to ISO-2022 bytes, add 32 (0x20) to both the row number (or qū, 区) and the cell/column number (or wèi, 位). The sum for the row number forms the high byte, and the sum for the cell number forms the low byte, as in the EUC encoding.
For example, to encode the character "外" at qūwèi cell 45-66, the high byte will use the row number 45: 45+32=77=0x4D, and the low byte will come from the cell number 66: 66+32=98=0x62. So, the full encoding is <4D 62>.[11]
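The 7-bit mapping can be sketched the same way as the EUC one. The helper name `quwei_to_iso2022` is hypothetical. It produces only the two-byte pair and deliberately omits the Shift Out and Shift In handling that a complete ISO-2022-CN encoder would need.

```python
def quwei_to_iso2022(row: int, cell: int) -> bytes:
    """Map a qūwèi code point (row, cell), each in 1..94, to its 7-bit byte pair."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError(f"row and cell must be in 1..94, got {row}-{cell}")
    # Add 32 (0x20) to the row for the high byte and to the cell for the low byte.
    return bytes([row + 0x20, cell + 0x20])


if __name__ == "__main__":
    pair = quwei_to_iso2022(45, 66)
    print(pair.hex(" ").upper())  # the two bytes in hexadecimal
```

Because both bytes fall in the printable ASCII range, the pair for 45-66 is indistinguishable from the ASCII text "Mb" unless the decoder knows that it is in two-byte mode.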
HZ
HZ is another encoding of GB/T 2312 that is used mostly for Usenet postings; characters are represented with the same byte pairs as in ISO-2022-CN, but the byte sequences denoting the beginning and end of a range of GB 2312 text differ.
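As a rough illustration, Python's standard `hz` codec can be used to inspect an HZ-encoded character. The delimiters `~{` and `~}` that wrap the GB 2312 byte pairs come from the HZ specification (RFC 1843), not from the text above, and slicing them off as done here is a simplification that only works for a string consisting of a single run of GB 2312 text.

```python
# Encode one GB 2312 character with the standard "hz" codec.
hz_bytes = "外".encode("hz")

# Strip the two-byte opening and closing delimiters to expose the byte pair.
payload = hz_bytes[2:-2]

if __name__ == "__main__":
    print(hz_bytes)                   # delimiters plus the 7-bit pair
    print(payload.hex(" ").upper())   # the pair itself
```

The payload is the same 7-bit pair that the row-plus-32, cell-plus-32 rule yields for qūwèi 45-66.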
Code charts
In the tables below, where a pair of hexadecimal numbers is given for a prefix byte or a coding byte, the smaller (with the eighth bit unset or unavailable) is used when encoded over GL (
When GB/T 2312 is encoded over GR, both bytes have the eighth bit set (i.e. are greater than 0x7F). GBK and GB 18030 also make use of two-byte codes in which only the first byte has the eighth bit set for extension purposes: such codes are outside of the GB/T 2312 plane, and are not tabulated here.
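The relation between the two forms comes down to the eighth bit, which the following sketch makes explicit. The function names `gl_to_gr` and `gr_to_gl` are hypothetical.

```python
def gl_to_gr(pair: bytes) -> bytes:
    """Set the eighth bit of each byte (7-bit GL form to 8-bit GR form)."""
    return bytes(b | 0x80 for b in pair)


def gr_to_gl(pair: bytes) -> bytes:
    """Clear the eighth bit of each byte (8-bit GR form to 7-bit GL form)."""
    return bytes(b & 0x7F for b in pair)


if __name__ == "__main__":
    gl = bytes([0x4D, 0x62])           # 7-bit pair for qūwèi 45-66
    print(gl_to_gr(gl).hex(" ").upper())
    print(gr_to_gl(gl_to_gr(gl)) == gl)
```

A two-byte code whose second byte has the eighth bit clear, as used by the GBK and GB 18030 extensions, is not a GR pair, so these helpers do not apply to it.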
Lead byte
This chart details the overall layout of the main plane of the GB/T 2312 character set by lead byte. For lead bytes used for characters other than
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
2x/Ax | SP[b] | 1-_ | 2-_ | 3-_ | 4-_ | 5-_ | 6-_ | 7-_ | 8-_ | 9-_ | 10-_ | 11-_ | 12-_ | 13-_ | 14-_ | 15-_ |
3x/Bx | 16-_ | 17-_ | 18-_ | 19-_ | 20-_ | 21-_ | 22-_ | 23-_ | 24-_ | 25-_ | 26-_ | 27-_ | 28-_ | 29-_ | 30-_ | 31-_ |
4x/Cx | 32-_ | 33-_ | 34-_ | 35-_ | 36-_ | 37-_ | 38-_ | 39-_ | 40-_ | 41-_ | 42-_ | 43-_ | 44-_ | 45-_ | 46-_ | 47-_ |
5x/Dx | 48-_ | 49-_ | 50-_ | 51-_ | 52-_ | 53-_ | 54-_ | 55-_ | 56-_ | 57-_ | 58-_ | 59-_ | 60-_ | 61-_ | 62-_ | 63-_ |
6x/Ex | 64-_ | 65-_ | 66-_ | 67-_ | 68-_ | 69-_ | 70-_ | 71-_ | 72-_ | 73-_ | 74-_ | 75-_ | 76-_ | 77-_ | 78-_ | 79-_ |
7x/Fx | 80-_ | 81-_ | 82-_ | 83-_ | 84-_ | 85-_ | 86-_ | 87-_ | 88-_ | 89-_ | 90-_ | 91-_ | 92-_ | 93-_ | 94-_ | DEL[b] |
Lead byte
Unused lead byte |
Non-Hanzi rows
The following charts list the non-
Two implementations of GB2312
EUC-CN | GBK/GB18030 subset | GB2312.TXT | Character name[12]: 3 |
---|---|---|---|
A1A4 | U+00B7 · MIDDLE DOT | U+30FB ・ KATAKANA MIDDLE DOT | 间隔点; 'separator dot' |
A1AA | U+2014 — EM DASH | U+2015 ― HORIZONTAL BAR | 破折号; 'em dash' |
Unicode mappings of the interpunct (Chinese: 间隔点; lit. 'separator dot') and em dash (Chinese: 破折号) in the subset of GBK and GB 18030 corresponding to GB/T 2312 (U+00B7 · MIDDLE DOT and U+2014 — EM DASH) differ from those listed in GB2312.TXT (U+30FB ・ KATAKANA MIDDLE DOT and U+2015 ― HORIZONTAL BAR). GB2312.TXT is a data file that was previously provided by the Unicode Consortium;[13] it has been designated as obsolete since August 2011[14] and is no longer hosted as of September 2016.
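One way to see which mapping a given implementation follows is to decode the two EUC-CN codes and print the resulting code points. The sketch below uses Python's standard `gb2312`, `gbk` and `gb18030` codecs purely as a probe. The names `EUC_CODES` and `probe` are hypothetical, and which table the `gb2312` codec follows is not asserted here, since it may differ between implementations and versions.

```python
# The two EUC-CN codes whose Unicode mapping differs between tables.
EUC_CODES = {
    "interpunct": b"\xa1\xa4",
    "em dash": b"\xa1\xaa",
}


def probe(codec: str) -> dict:
    """Decode both codes with the given codec and report the code points."""
    return {
        name: f"U+{ord(raw.decode(codec)):04X}"
        for name, raw in EUC_CODES.items()
    }


results = {codec: probe(codec) for codec in ("gb2312", "gbk", "gb18030")}

if __name__ == "__main__":
    for codec, mapping in results.items():
        print(codec, mapping)
```

A result of U+00B7 and U+2014 indicates the GBK/GB 18030 subset mapping, while U+30FB and U+2015 indicates the GB2312.TXT mapping.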
As of 2015, Microsoft .Net Framework follows GB 18030 mappings when mapping those two characters in data labelled gb2312
, whereas
gb2312
, which in turn uses a GB18030 decoder.[18]
Other differing mappings have been defined and used by individual vendors,
Character set 0x21/0xA1 (row 1: punctuation and symbols)
This row contains punctuation, mathematical operators, and other symbols. The following table shows the GB 18030 mappings[20] for these GB/T 2312 characters first, followed by any other documented mappings.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
2x/Ax | IDSP | 、 3001 |
。 3002 |
・
|
ˉ 02C9 |
ˇ 02C7 |
¨ 00A8 |
〃 3003 |
々 3005 |
―
|
〜
|
∥
|
⋯
|
‘ 2018 |
’ 2019 | |
3x/Bx | “ 201C |
” 201D |
〔 3014 |
〕 3015 |
〈 3008 |
〉 3009 |
《 300A |
》 300B |
「 300C |
」 300D |
『 300E |
』 300F |
〖 3016 |
〗 3017 |
【 3010 |
】 3011 |
4x/Cx | ± 00B1 |
× 00D7 |
÷ 00F7 |
∶ 2236 |
∧ 2227 |
∨ 2228 |
∑ 2211 |
∏ 220F |
∪ 222A |
∩ 2229 |
∈ 2208 |
∷ 2237 |
√ 221A |
⊥ 22A5 |
∥ 2225 |
∠ 2220 |
5x/Dx | ⌒ 2312 |
⊙ 2299 |
∫ 222B |
∮ 222E |
≡ 2261 |
≌ 224C |
≈ 2248 |
∽ 223D |
∝ 221D |
≠ 2260 |
≮ 226E |
≯ 226F |
≤ 2264 |
≥ 2265 |
∞ 221E |
∵ 2235 |
6x/Ex | ∴ 2234 |
♂ 2642 |
♀ 2640 |
° 00B0 |
′ 2032 |
″ 2033 |
℃ 2103 |
$ FF04 |
¤ 00A4 |
¢
|
£
|
‰ 2030 |
§ 00A7 |
№ 2116 |
☆ 2606 |
★ 2605 |
7x/Fx | ○ 25CB |
● 25CF |
◎ 25CE |
◇ 25C7 |
◆ 25C6 |
□ 25A1 |
■ 25A0 |
△ 25B3 |
▲ 25B2 |
※ 203B |
→ 2192 |
← 2190 |
↑ 2191 |
↓ 2193 |
〓 3013 |
Character set 0x22/0xA2 (row 2: list markers)
This row contains various types of list marker. Lowercase forms of the Roman numerals were not included in the original GB/T 2312[21] nor in GB/T 12345,[6] but are included in both Windows code page 936[22] and GB 18030.[20] A euro sign was also added by GB 18030.[20]
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
2x/Ax | ⅰ 2170 |
ⅱ 2171 |
ⅲ 2172 |
ⅳ 2173 |
ⅴ 2174 |
ⅵ 2175 |
ⅶ 2176 |
ⅷ 2177 |
ⅸ 2178 |
ⅹ 2179 |
||||||
3x/Bx | ⒈ 2488 |
⒉ 2489 |
⒊ 248A |
⒋ 248B |
⒌ 248C |
⒍ 248D |
⒎ 248E |
⒏ 248F |
⒐ 2490 |
⒑ 2491 |
⒒ 2492 |
⒓ 2493 |
⒔ 2494 |
⒕ 2495 |
⒖ 2496 | |
4x/Cx | ⒗ 2497 |
⒘ 2498 |
⒙ 2499 |
⒚ 249A |
⒛ 249B |
⑴ 2474 |
⑵ 2475 |
⑶ 2476 |
⑷ 2477 |
⑸ 2478 |
⑹ 2479 |
⑺ 247A |
⑻ 247B |
⑼ 247C |
⑽ 247D |
⑾ 247E |
5x/Dx | ⑿ 247F |
⒀ 2480 |
⒁ 2481 |
⒂ 2482 |
⒃ 2483 |
⒄ 2484 |
⒅ 2485 |
⒆ 2486 |
⒇ 2487 |
① 2460 |
② 2461 |
③ 2462 |
④ 2463 |
⑤ 2464 |
⑥ 2465 |
⑦ 2466 |
6x/Ex | ⑧ 2467 |
⑨ 2468 |
⑩ 2469 |
€ 20AC |
㈠ 3220 |
㈡ 3221 |
㈢ 3222 |
㈣ 3223 |
㈤ 3224 |
㈥ 3225 |
㈦ 3226 |
㈧ 3227 |
㈨ 3228 |
㈩ 3229 |
||
7x/Fx | Ⅰ 2160 |
Ⅱ 2161 |
Ⅲ 2162 |
Ⅳ 2163 |
Ⅴ 2164 |
Ⅵ 2165 |
Ⅶ 2166 |
Ⅷ 2167 |
Ⅸ 2168 |
Ⅹ 2169 |
Ⅺ 216A |
Ⅻ 216B |
Character set 0x23/0xA3 (row 3: ISO 646-CN)
This row contains
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
2x/Ax | ! 0021 |
" 0022 |
# 0023 |
¥ 00A5 |
% 0025 |
& 0026 |
' 0027 |
( 0028 |
) 0029 |
* 002A |
+ 002B |
, 002C |
- 002D |
. 002E |
/ 002F | |
3x/Bx | 0 0030 |
1 0031 |
2 0032 |
3 0033 |
4 0034 |
5 0035 |
6 0036 |
7 0037 |
8 0038 |
9 0039 |
: 003A |
; 003B |
< 003C |
= 003D |
> 003E |
? 003F |
4x/Cx | @ 0040 |
A 0041 |
B 0042 |
C 0043 |
D 0044 |
E 0045 |
F 0046 |
G 0047 |
H 0048 |
I 0049 |
J 004A |
K 004B |
L 004C |
M 004D |
N 004E |
O 004F |
5x/Dx | P 0050 |
Q 0051 |
R 0052 |
S 0053 |
T 0054 |
U 0055 |
V 0056 |
W 0057 |
X 0058 |
Y 0059 |
Z 005A |
[ 005B |
\ 005C |
] 005D |
^ 005E |
_ 005F |
6x/Ex | ` 0060 |
a 0061 |
b 0062 |
c 0063 |
d 0064 |
e 0065 |
f 0066 |
g 0067 |
h 0068 |
i 0069 |
j 006A |
k 006B |
l 006C |
m 006D |
n 006E |
o 006F |
7x/Fx | p 0070 |
q 0071 |
r 0072 |
s 0073 |
t 0074 |
u 0075 |
v 0076 |
w 0077 |
x 0078 |
y 0079 |
z 007A |
{ 007B |
| 007C |
} 007D |
‾ 203E |
When used in an encoding allowing combination with ASCII such as
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
2x/Ax | ! FF01 |
" FF02 |
# FF03 |
¥ FFE5 |
% FF05 |
& FF06 |
' FF07 |
( FF08 |
) FF09 |
* FF0A |
+ FF0B |
, FF0C |
- FF0D |
. FF0E |
/ FF0F | |
3x/Bx | 0 FF10 |
1 FF11 |
2 FF12 |
3 FF13 |
4 FF14 |
5 FF15 |
6 FF16 |
7 FF17 |
8 FF18 |
9 FF19 |
: FF1A |
; FF1B |
< FF1C |
= FF1D |
> FF1E |
? FF1F |
4x/Cx | @ FF20 |
A FF21 |
B FF22 |
C FF23 |
D FF24 |
E FF25 |
F FF26 |
G FF27 |
H FF28 |
I FF29 |
J FF2A |
K FF2B |
L FF2C |
M FF2D |
N FF2E |
O FF2F |
5x/Dx | P FF30 |
Q FF31 |
R FF32 |
S FF33 |
T FF34 |
U FF35 |
V FF36 |
W FF37 |
X FF38 |
Y FF39 |
Z FF3A |
[ FF3B |
\ FF3C |
] FF3D |
^ FF3E |
_ FF3F |
6x/Ex | ` FF40 |
a FF41 |
b FF42 |
c FF43 |
d FF44 |
e FF45 |
f FF46 |
ɡ[c]
|
h FF48 |
i FF49 |
j FF4A |
k FF4B |
l FF4C |
m FF4D |
n FF4E |
o FF4F |
7x/Fx | p FF50 |
q FF51 |
r FF52 |
s FF53 |
t FF54 |
u FF55 |
v FF56 |
w FF57 |
x FF58 |
y FF59 |
z FF5A |
{ FF5B |
| FF5C |
} FF5D |
 ̄ FFE3 |
Character set 0x24/0xA4 (row 4: Hiragana)
This set contains Hiragana for writing the Japanese language.
Compare with row 4 of JIS X 0208, which this row matches, and with row 10 of KS X 1001 and of KPS 9566, which use the same layout, but in a different row.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
2x/Ax | ぁ 3041 |
あ 3042 |
ぃ 3043 |
い 3044 |
ぅ 3045 |
う 3046 |
ぇ 3047 |
え 3048 |
ぉ 3049 |
お 304A |
か 304B |
が 304C |
き 304D |
ぎ 304E |
く 304F | |
3x/Bx | ぐ 3050 |
け 3051 |
げ 3052 |
こ 3053 |
ご 3054 |
さ 3055 |
ざ 3056 |
し 3057 |
じ 3058 |
す 3059 |
ず 305A |
せ 305B |
ぜ 305C |
そ 305D |
ぞ 305E |
た 305F |
4x/Cx | だ 3060 |
ち 3061 |
ぢ 3062 |
っ 3063 |
つ 3064 |
づ 3065 |
て 3066 |
で 3067 |
と 3068 |
ど 3069 |
な 306A |
に 306B |
ぬ 306C |
ね 306D |
の 306E |
は 306F |
5x/Dx | ば 3070 |
ぱ 3071 |
ひ 3072 |
び 3073 |
ぴ 3074 |
ふ 3075 |
ぶ 3076 |
ぷ 3077 |
へ 3078 |
べ 3079 |
ぺ 307A |
ほ 307B |
ぼ 307C |
ぽ 307D |
ま 307E |
み 307F |
6x/Ex | む 3080 |
め 3081 |
も 3082 |
ゃ 3083 |
や 3084 |
ゅ 3085 |
ゆ 3086 |
ょ 3087 |
よ 3088 |
ら 3089 |
り 308A |
る 308B |
れ 308C |
ろ 308D |
ゎ 308E |
わ 308F |
7x/Fx | ゐ 3090 |
ゑ 3091 |
を 3092 |
ん 3093 |
Character set 0x25/0xA5 (row 5: Katakana)
This set contains Katakana for writing the Japanese language. However, the Japanese long vowel mark, which is used in katakana text and included in row 1 of JIS X 0208, is not included in GB/T 2312, although it is added in GBK and GB 18030 outside of the main GB/T 2312 plane,[24] at 0xA960.[20]
Compare with row 5 of JIS X 0208, which this row matches, and with row 11 of KS X 1001 and of KPS 9566, which use the same layout, but in a different row.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
2x/Ax | ァ 30A1 |
ア 30A2 |
ィ 30A3 |
イ 30A4 |
ゥ 30A5 |
ウ 30A6 |
ェ 30A7 |
エ 30A8 |
ォ 30A9 |
オ 30AA |
カ 30AB |
ガ 30AC |
キ 30AD |
ギ 30AE |
ク 30AF | |
3x/Bx | グ 30B0 |
ケ 30B1 |
ゲ 30B2 |
コ 30B3 |
ゴ 30B4 |
サ 30B5 |
ザ 30B6 |
シ 30B7 |
ジ 30B8 |
ス 30B9 |
ズ 30BA |
セ 30BB |
ゼ 30BC |
ソ 30BD |
ゾ 30BE |
タ 30BF |
4x/Cx | ダ 30C0 |
チ 30C1 |
ヂ 30C2 |
ッ 30C3 |
ツ 30C4 |
ヅ 30C5 |
テ 30C6 |
デ 30C7 |
ト 30C8 |
ド 30C9 |
ナ 30CA |
ニ 30CB |
ヌ 30CC |
ネ 30CD |
ノ 30CE |
ハ 30CF |
5x/Dx | バ 30D0 |
パ 30D1 |
ヒ 30D2 |
ビ 30D3 |
ピ 30D4 |
フ 30D5 |
ブ 30D6 |
プ 30D7 |
ヘ 30D8 |
ベ 30D9 |
ペ 30DA |
ホ 30DB |
ボ 30DC |
ポ 30DD |
マ 30DE |
ミ 30DF |
6x/Ex | ム 30E0 |
メ 30E1 |
モ 30E2 |
ャ 30E3 |
ヤ 30E4 |
ュ 30E5 |
ユ 30E6 |
ョ 30E7 |
ヨ 30E8 |
ラ 30E9 |
リ 30EA |
ル 30EB |
レ 30EC |
ロ 30ED |
ヮ 30EE |
ワ 30EF |
7x/Fx | ヰ 30F0 |
ヱ 30F1 |
ヲ 30F2 |
ン 30F3 |
ヴ 30F4 |
ヵ 30F5 |
ヶ 30F6 |
Character set 0x26/0xA6 (row 6: Greek and vertical extensions)
This row contains basic support for the modern
The highlighted characters are presentation forms of punctuation marks for vertical writing, and are not included in GB/T 2312 proper, but are included in this row by GB/T 12345,[1][6] Windows code page 936,[22] Mac OS Simplified Chinese,[19] and GB 18030.[20] They are seen as "standard extensions to GB 2312".[19] Conversely, ISO-IR-165 includes patterned semigraphic characters in this row (mostly without exact counterparts in Unicode), colliding with the code positions used for the vertical extensions.[25]
Compare with row 6 of JIS X 0208, which this row matches when the vertical forms are not included, and with row 6 of KPS 9566, which includes the same Greek letters in the same layout, but adds Roman numerals rather than vertical forms. Contrast row 5 of KS X 1001, which offsets the Greek letters to include the Roman numerals first.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
2x/Ax | Α 0391 |
Β 0392 |
Γ 0393 |
Δ 0394 |
Ε 0395 |
Ζ 0396 |
Η 0397 |
Θ 0398 |
Ι 0399 |
Κ 039A |
Λ 039B |
Μ 039C |
Ν 039D |
Ξ 039E |
Ο 039F | |
3x/Bx | Π 03A0 |
Ρ 03A1 |
Σ 03A3 |
Τ 03A4 |
Υ 03A5 |
Φ 03A6 |
Χ 03A7 |
Ψ 03A8 |
Ω 03A9 |
|||||||
4x/Cx | α 03B1 |
β 03B2 |
γ 03B3 |
δ 03B4 |
ε 03B5 |
ζ 03B6 |
η 03B7 |
θ 03B8 |
ι 03B9 |
κ 03BA |
λ 03BB |
μ 03BC |
ν 03BD |
ξ 03BE |
ο 03BF | |
5x/Dx | π 03C0 |
ρ 03C1 |
σ 03C3 |
τ 03C4 |
υ 03C5 |
φ 03C6 |
χ 03C7 |
ψ 03C8 |
ω 03C9 |
︐[d] FE10 |
︒[d] FE12 |
︑[d] FE11 |
︓[d] FE13 |
︔[d] FE14 |
︕[d] FE15 |
︖[d] FE16 |
6x/Ex | ︵ FE35 |
︶ FE36 |
︹ FE39 |
︺ FE3A |
︿ FE3F |
﹀ FE40 |
︽ FE3D |
︾ FE3E |
﹁ FE41 |
﹂ FE42 |
﹃ FE43 |
﹄ FE44 |
︗[d] FE17 |
︘[d] FE18 |
︻ FE3B |
︼ FE3C |
7x/Fx | ︷ FE37 |
︸ FE38 |
︱ FE31 |
︙[d] FE19 |
︳ FE33 |
︴ FE34 |
Character set 0x27/0xA7 (row 7: Cyrillic)
This set includes both cases of 33 letters from the Cyrillic script, sufficient to write the modern Russian alphabet and Bulgarian alphabet, although other forms of Cyrillic require additional letters.[27]
Compare with row 7 of JIS X 0208, which this row matches, and with row 12 of KS X 1001 and row 5 of KPS 9566, which use the same layout but in different rows.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
2x/Ax | А 0410 |
Б 0411 |
В 0412 |
Г 0413 |
Д 0414 |
Е 0415 |
Ё 0401 |
Ж 0416 |
З 0417 |
И 0418 |
Й 0419 |
К 041A |
Л 041B |
М 041C |
Н 041D | |
3x/Bx | О 041E |
П 041F |
Р 0420 |
С 0421 |
Т 0422 |
У 0423 |
Ф 0424 |
Х 0425 |
Ц 0426 |
Ч 0427 |
Ш 0428 |
Щ 0429 |
Ъ 042A |
Ы 042B |
Ь 042C |
Э 042D |
4x/Cx | Ю 042E |
Я 042F |
||||||||||||||
5x/Dx | а 0430 |
б 0431 |
в 0432 |
г 0433 |
д 0434 |
е 0435 |
ё 0451 |
ж 0436 |
з 0437 |
и 0438 |
й 0439 |
к 043A |
л 043B |
м 043C |
н 043D | |
6x/Ex | о 043E |
п 043F |
р 0440 |
с 0441 |
т 0442 |
у 0443 |
ф 0444 |
х 0445 |
ц 0446 |
ч 0447 |
ш 0448 |
щ 0449 |
ъ 044A |
ы 044B |
ь 044C |
э 044D |
7x/Fx | ю 044E |
я 044F |
Character set 0x28/0xA8 (row 8: zhuyin and non-ASCII pinyin)
This row contains
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
2x/Ax | ā 0101 |
á 00E1 |
ǎ 01CE |
à 00E0 |
ē 0113 |
é 00E9 |
ě 011B |
è 00E8 |
ī 012B |
í 00ED |
ǐ 01D0 |
ì 00EC |
ō 014D |
ó 00F3 |
ǒ 01D2 | |
3x/Bx | ò 00F2 |
ū 016B |
ú 00FA |
ǔ 01D4 |
ù 00F9 |
ǖ 01D6 |
ǘ 01D8 |
ǚ 01DA |
ǜ 01DC |
ü 00FC |
ê 00EA |
ɑ 0251 |
ḿ[e] 1E3F |
ń 0144 |
ň 0148 |
ǹ[f] 01F9 |
4x/Cx | g[g]
|
ㄅ 3105 |
ㄆ 3106 |
ㄇ 3107 |
ㄈ 3108 |
ㄉ 3109 |
ㄊ 310A |
ㄋ 310B |
ㄌ 310C |
ㄍ 310D |
ㄎ 310E |
ㄏ 310F | ||||
5x/Dx | ㄐ 3110 |
ㄑ 3111 |
ㄒ 3112 |
ㄓ 3113 |
ㄔ 3114 |
ㄕ 3115 |
ㄖ 3116 |
ㄗ 3117 |
ㄘ 3118 |
ㄙ 3119 |
ㄚ 311A |
ㄛ 311B |
ㄜ 311C |
ㄝ 311D |
ㄞ 311E |
ㄟ 311F |
6x/Ex | ㄠ 3120 |
ㄡ 3121 |
ㄢ 3122 |
ㄣ 3123 |
ㄤ 3124 |
ㄥ 3125 |
ㄦ 3126 |
ㄧ 3127 |
ㄨ 3128 |
ㄩ 3129 |
||||||
7x/Fx |
Character set 0x29/0xA9 (row 9: box drawing)
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
2x/Ax | ─ 2500 |
━ 2501 |
│ 2502 |
┃ 2503 |
┄ 2504 |
┅ 2505 |
┆ 2506 |
┇ 2507 |
┈ 2508 |
┉ 2509 |
┊ 250A |
┋ 250B | ||||
3x/Bx | ┌ 250C |
┍ 250D |
┎ 250E |
┏ 250F |
┐ 2510 |
┑ 2511 |
┒ 2512 |
┓ 2513 |
└ 2514 |
┕ 2515 |
┖ 2516 |
┗ 2517 |
┘ 2518 |
┙ 2519 |
┚ 251A |
┛ 251B |
4x/Cx | ├ 251C |
┝ 251D |
┞ 251E |
┟ 251F |
┠ 2520 |
┡ 2521 |
┢ 2522 |
┣ 2523 |
┤ 2524 |
┥ 2525 |
┦ 2526 |
┧ 2527 |
┨ 2528 |
┩ 2529 |
┪ 252A |
┫ 252B |
5x/Dx | ┬ 252C |
┭ 252D |
┮ 252E |
┯ 252F |
┰ 2530 |
┱ 2531 |
┲ 2532 |
┳ 2533 |
┴ 2534 |
┵ 2535 |
┶ 2536 |
┷ 2537 |
┸ 2538 |
┹ 2539 |
┺ 253A |
┻ 253B |
6x/Ex | ┼ 253C |
┽ 253D |
┾ 253E |
┿ 253F |
╀ 2540 |
╁ 2541 |
╂ 2542 |
╃ 2543 |
╄ 2544 |
╅ 2545 |
╆ 2546 |
╇ 2547 |
╈ 2548 |
╉ 2549 |
╊ 254A |
╋ 254B |
7x/Fx |
Hanzi rows
Corrections
GB 5007.1-85 24x24 Bitmap Font Set of Chinese Characters for Information Exchange (Chinese: 信息交换用汉字 24x24 点阵字模集) is the earliest font template based on GB/T 2312 that features corrections and extensions including:
- changing the glyph shape of the Latin letter "g"
- adding 6
- changing "鍾" to "锺"
- adding 94 half-width glyphs in row 10 (half-width form of row 3, equivalent to GB 1988–80
- adding half-width forms of 32 Hanyu Pinyin characters from row 8 in row 11.
GB/T 2312 itself was not amended, but these corrections are included in font templates based on GB/T 2312, including GB/T 12345; its supersets GBK and GB 18030 also include them. GB/T 2312 is also used in ISO-IR-165.
See also
- Guobiao code
- CJK characters
- Chinese character encoding
- Unicode
- Big5 – standard used in Taiwan and Hong Kong
- GB 18030, which has superseded GB/T 2312-1980
- GB/T 12345-1990, traditional counterpart of GB/T 2312-1980, superseded by GB18030
References
- ^ a b c d e Lunde, Ken (2009). CJKV Information Processing (2nd ed.). O'Reilly. ISBN 978-0-596-51447-1.
- ^ "2017年第7号中国国家标准公告 (China National Standard Bulletin 2017 No.7)". Standardization Administration of the People's Republic of China. Retrieved 3 July 2018.
- ^ "Distribution of Character Encodings among websites that use China and territories". w3techs.com. Retrieved 2022-09-04.
- ^ "Historical trends in the usage statistics of character encodings for websites, October 2022". w3techs.com. Retrieved 2022-10-01.
- ^ "Encoding: Summarized test results". www.w3.org. Retrieved 2019-11-15.
- ^ )
- ^ GB12345-80 to Unicode table. Unicode Consortium. 1993-12-06. Archived from the original on 2004-06-17.
- ISBN 9780824818920.
the set provides for better than 99.99 percent of all usage. Nevertheless, the designers found it necessary to add 14,276 "special usage" characters to cover contingencies!
- ^ "GB 2312-1980: Information technology—Chinese ideogram coded character set for information interchange (Basic set)". May 1981.
- ^ "Unicode to GB2312 or GBK table". cs.nyu.edu. Archived from the original on 3 March 2016. Retrieved 11 January 2022.
- ^ ISBN 978-0-596-51447-1.
- ^ "GB 2312-1980: Information technology—Chinese ideogram coded character set for information interchange (basic set)". May 1981. Retrieved 2 October 2016.
- ^ a b Haible, Bruno. "GB2312 (Conversion Tables)". Retrieved 29 September 2016.
- ^ "Readme – MAPPINGS/OBSOLETE/EASTASIA". 9 August 2001. Retrieved 29 September 2016.
- ^ "java-EUC_CN-1.3_P.ucm". Retrieved 29 September 2016.[permanent dead link]
- ^ "libiconv:lib/gb2312.h". GNU Savannah. Retrieved 29 September 2016.
- ^ "Issue 24036". Python Bug Tracker.
- ^ "Encoding § Names and labels". W3C. Retrieved 29 September 2016.
- ^ Apple, Inc.
- ^ a b c d e f g h i j Standardization Administration of China (SAC) (2005-11-18). GB 18030-2005: Information Technology—Chinese coded character set.
- ISO-IR-58.
- ^ a b c d e f Microsoft. "CODEPAGE 936: PRC GBK (XGB) - ANSI, OEM". Unicode Consortium.
- ^ a b Viswanadha, Raghuram (2000-08-30). "Unicode to ISO-IR-165 table". International Components for Unicode. IBM.
- ^ Lunde, Ken (2009). CJKV Information Processing (2nd ed.). O'Reilly. ISBN 978-0-596-51447-1.
- ^ ISO-IR-165.
- ^ Lunde, Dr Ken (4 August 2022). "The GB 18030-2022 Standard". Medium. Retrieved 7 August 2022.
- ^ Czyborra, Roman (1998-11-30) [1998-05-25]. "The Cyrillic Charset Soup". Archived from the original on 2016-12-03. Retrieved 2016-12-03.
- ^ "Unicode Character Encoding Stability Policies". Unicode Consortium. 2017-06-23.
Notes
- ^ Only for ideographs covered by GB/T 2312, all of which fall into Unicode BMP
- ^ As this is an ISO 2022 compatible 94ⁿ-character set, the plain space and delete characters are available as single-byte codes at 0x20 and 0x7F (not 0xA0 and 0xFF) respectively.
- GB 6345.1, including Apple's implementation and GB 18030 (which use 8-32 for U+0261),[20] but for U+0261 by ISO-IR-165.[23]
- ^ Private Use Area, but with a defined glyph,[22][20] and by Apple to the regular fullwidth character with an appended private use character U+F87E as a variation marker.[19] In the GB 18030-2022 update, these Private Use Area mappings have been eliminated, and the characters are now mapped to their standard Unicode code points.[26]
- Private Use Area U+E7C8 by Windows-936.[22]
- ^ Mapped to U+0261 in GB 18030[20] and most other implementations based on GB 6345.1[19] (which use 3-71 for U+FF47), but to U+FF47 in ISO-IR-165.[23][25]
- ^ ɑ (U+0251)
ḿ (U+1E3F; Submitted in Unicode 3.0, thus CP936 did not include this character [1][permanent dead link])
ń (U+0144)
ň (U+0148)
ǹ (U+01F9; Submitted in Unicode 3.0, thus CP936 did not include this character [2][permanent dead link])
ɡ (U+0261)
Further reading
- Lunde, Ken (2009). "Chinese Character Set Standards—China". CJKV Information Processing (2nd ed.). O'Reilly. ISBN 978-0-596-51447-1.
External links
- Graphical View of GB2312 in ICU's Converter Explorer
- Unicode to GB2312 or GBK table
- Chinese Character Codes
- Evolution of GBK and GB2312 into GB18030
- GB2312 Character Set for Chinese Characters
- Coded Chinese Graphic Character Set for Information Interchange ISO-IR 58
- C code generates 6763 basic characters with output
- GB2312-80 standard on China-Language.gov.cn