GB 2312

GB 2312
MIME / IANA	GB_2312-80 (GB2312 for usual EUC form)
Alias(es)	iso-ir-58, chinese, csGB2312, csISO58GB231280
Classification	ISO-2022-compatible DBCS, CJK encoding
Extensions	ISO-IR-165
Encoding formats	EUC-CN (GB2312); HZ-GB-2312; ISO-2022-CN; Shift GB
Preceded by	Chinese telegraph code
Succeeded by	GBK, GB 18030
Other related encoding(s)	JIS X 0208, KS X 1001

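GB/T 2312-1980 is a key official character set of the People's Republic of China, used for Simplified Chinese characters. GB refers to the Guobiao standards (国家标准), whereas the T suffix (推荐; tuījiàn; 'recommendation') denotes a non-mandatory standard.^[1]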

GB/T 2312-1980 was originally a mandatory national standard designated GB 2312-1980. However, following a National Standard Bulletin of the People's Republic of China in 2017, GB 2312 is no longer mandatory, and its standard code was changed to GB/T 2312-1980.^[2] GB/T 2312-1980 has been superseded by GBK and GB 18030, which include additional characters, but GB/T 2312 remains in widespread use as a subset of those encodings.

As of September 2022, GB2312 is the second-most popular encoding served from China and territories (after UTF-8), with 5.5% of web servers serving a page declaring it.^[3] Globally, GB2312 is declared on 0.1% of all web pages.^[4] However, all major web browsers decode GB2312-marked documents as if they were marked with the superset GBK encoding, except for Safari and Edge on the label GB_2312.^[5]

There is an analogous character set known as GB/T 12345 Code of Chinese ideogram set for information interchange supplementary set, which supplements GB/T 2312 with traditional character forms by replacing the simplified forms at their qūwèi codes, and adds 62 supplemental characters.^[6]^[7]

GB-encoded fonts often come in pairs, one with the GB/T 2312 (simplified) character set and the other with the GB/T 12345 (traditional) character set. Further supplementary sets also exist, including GB/T 7589 Code of Chinese ideograms set for information interchange: The 2nd supplementary set and GB/T 7590 Code of Chinese ideograms set for information interchange: The 4th supplementary set. These provide additional variant characters in the same qūwèi encoding format (later used in ISO-2022-CN), but bear no relation to the characters encoded in GB/T 2312.

Character range in rows

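While GB/T 2312 covers over 99.99% of contemporary Chinese text usage, it does not cover every character in use. Alongside its 6,763 Chinese characters, the set includes symbols and punctuation, Japanese kana, the Greek and Cyrillic alphabets, Zhuyin, and a double-byte set of Pinyin letters with tone marks. In total, GB/T 2312-1980 contains 7,445 characters.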

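Characters in GB/T 2312 are arranged in a 94×94 grid, and each character's code point is expressed in qūwèi (区位) form, which gives a row (qū, 区) and a position within that row (wèi, 位), similar to the Japanese kuten. For example, the character "外" (meaning: foreign) is located in row 45 position 66,^[9] thus its qūwèi code is 45-66.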

The rows (numbered from 1 to 94) contain characters as follows:

01–09, comprising punctuation and other special characters; also Hiragana, Katakana, Greek, Cyrillic, Pinyin, Bopomofo
16–55, the first level of Chinese characters, arranged according to Pinyin. (3755 characters).
56–87, the second level of Chinese characters, arranged according to radical and strokes. (3008 characters).

The rows 10–15 and 88–94 are unassigned.

In total, GB/T 2312-1980 contains 682 symbols and 6,763 Chinese characters.
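The row layout listed above can be expressed as a small lookup. The following is a minimal Python sketch; the function name `classify_row` and its descriptive labels are illustrative and are not part of the standard.

```python
def classify_row(row: int) -> str:
    """Return a rough description of a GB/T 2312 row (qū) number."""
    if not 1 <= row <= 94:
        raise ValueError("row must be in the range 1..94")
    if row <= 9:
        return "non-hanzi (symbols, kana, Greek, Cyrillic, pinyin, zhuyin)"
    if 16 <= row <= 55:
        return "level 1 hanzi (ordered by pinyin)"
    if 56 <= row <= 87:
        return "level 2 hanzi (ordered by radical and strokes)"
    return "unassigned"  # rows 10-15 and 88-94


# Row 45 holds the example character "外".
print(classify_row(45))
```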

Encodings of GB/T 2312

EUC-CN

In EUC-CN, two bytes are used to represent every character not found in ASCII. The value of the first byte is in the range 0xA1–0xF7 (161–247), while the value of the second byte is in the range 0xA1–0xFE (161–254). Since both ranges lie beyond ASCII, it is possible, as in UTF-8, to check whether a byte is part of a multi-byte construct when using EUC-CN, but not whether it is the first or the last byte.
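Because the lead and trail ranges overlap, a decoder has to scan from a known character boundary. The sketch below illustrates this under the byte ranges given above; `split_euc_cn` is a hypothetical helper, not a library function.

```python
def split_euc_cn(data: bytes) -> list[bytes]:
    """Split EUC-CN bytes into ASCII bytes and two-byte GB/T 2312 units."""
    units = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            # ASCII bytes stand alone.
            units.append(data[i:i + 1])
            i += 1
            continue
        if not 0xA1 <= lead <= 0xF7:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + 1 >= len(data):
            raise ValueError("input ends in the middle of a two-byte unit")
        if not 0xA1 <= data[i + 1] <= 0xFE:
            raise ValueError(f"invalid trail byte at offset {i + 1}")
        units.append(data[i:i + 2])
        i += 2
    return units


print(split_euc_cn(b"A\xcd\xe2B"))  # [b'A', b'\xcd\xe2', b'B']
```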

Compared to UTF-8, GB/T 2312 (whether native or encoded in EUC-CN) is more storage efficient: while UTF-8 uses three bytes^[a] per CJK ideograph, GB/T 2312 only uses two. However, GB/T 2312 does not cover as many ideographs as Unicode does.
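The size difference can be checked with Python's built-in codecs, where `gb2312` names the EUC-CN form. This is a small sketch; `fits_gb2312` is a hypothetical helper, and U+20000 is used only as an example of an ideograph outside the set.

```python
def fits_gb2312(ch: str) -> bool:
    """Report whether a character can be encoded by Python's gb2312 codec."""
    try:
        ch.encode("gb2312")
    except UnicodeEncodeError:
        return False
    return True


text = "外"
utf8_len = len(text.encode("utf-8"))
euc_len = len(text.encode("gb2312"))
print(utf8_len, euc_len)            # 3 2

# An ideograph outside the BMP is representable in UTF-8 but not in GB/T 2312.
print(fits_gb2312("\U00020000"))    # False
```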

To map the qūwèi code points to EUC bytes, add 160 (0xA0) to both the row number (or qū, 区) and the cell/column number (or wèi, 位). The sum for the row number forms the high byte, and the sum for the cell number forms the low byte.

For example, to encode the character "外" at qūwèi cell 45-66, the high byte will use the row number 45: 45+160=205=0xCD, and the low byte will come from the cell number 66: 66+160=226=0xE2. So, the full encoding is <CD E2>.^[10]^[11]
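The arithmetic above can be sketched in Python as follows. The function name `quwei_to_euc` is illustrative; the result is compared against Python's built-in `gb2312` codec as a sanity check.

```python
def quwei_to_euc(row: int, cell: int) -> bytes:
    """Convert a qūwèi code (row, cell) to its two EUC-CN bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in the range 1..94")
    # Add 0xA0 (160) to each coordinate: row -> high byte, cell -> low byte.
    return bytes([row + 0xA0, cell + 0xA0])


encoded = quwei_to_euc(45, 66)
print(encoded.hex(" ").upper())   # CD E2
print(encoded.decode("gb2312"))   # 外
```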

ISO-2022-CN

ISO-2022-CN is another encoding form of GB/T 2312, and is the encoding specified in the official documentation. It follows the ISO-2022 standard and likewise uses two bytes to encode characters not found in ASCII. However, instead of using the extended region above ASCII, it uses the same byte range as ASCII: the value of the first byte is in the range 0x21–0x77 (33–119), while the value of the second byte is in the range 0x21–0x7E (33–126). As this range overlaps ASCII significantly, special control characters, namely the Shift Out and Shift In functions, are required to indicate whether a byte is an ASCII character or part of a two-byte sequence. This poses a risk of misencoding, as improper handling of text can result in missing information.

To map the qūwèi code points to ISO-2022 bytes, add 32 (0x20) to both the row number (or qū, 区) and the cell/column number (or wèi, 位). As in the EUC encoding, the sum for the row number forms the high byte, and the sum for the cell number forms the low byte.

For example, to encode the character "外" at qūwèi cell 45-66, the high byte will use the row number 45: 45+32=77=0x4D, and the low byte will come from the cell number 66: 66+32=98=0x62. So, the full encoding is <4D 62>.^[11]
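The 7-bit mapping and its relation to the EUC form can be sketched as below. The helper names are illustrative, and the sketch covers only the two-byte code itself, not the shift functions or escape sequences of a full ISO-2022-CN stream.

```python
def quwei_to_gl(row: int, cell: int) -> bytes:
    """Convert a qūwèi code to its 7-bit (ISO-2022 style) byte pair."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in the range 1..94")
    return bytes([row + 0x20, cell + 0x20])


def gl_to_gr(pair: bytes) -> bytes:
    """Set the eighth bit of each byte, giving the EUC-CN form."""
    return bytes(b | 0x80 for b in pair)


gl = quwei_to_gl(45, 66)
print(gl.hex(" ").upper())             # 4D 62
print(gl_to_gr(gl).hex(" ").upper())   # CD E2
```

Setting the eighth bit is equivalent to adding 0x80, which accounts for the difference between the offsets 0x20 and 0xA0.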

HZ

HZ is another encoding of GB/T 2312 that is used mostly for Usenet postings; characters are represented with the same byte pairs as in ISO-2022-CN, but the byte sequences denoting the beginning and end of a range of GB 2312 text differ.
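Python's standard library includes an `hz` codec, which can be used to inspect this relationship. The sketch below assumes the `~{` and `~}` delimiters defined by the HZ specification (RFC 1843), which are not described in the text above.

```python
text = "外"
hz_bytes = text.encode("hz")  # a 7-bit safe byte string

# Drop the HZ mode delimiters to expose the two-byte payload.
payload = hz_bytes.replace(b"~{", b"").replace(b"~}", b"")

print(hz_bytes)
print(payload.hex(" ").upper())        # the same pair as the 7-bit form: 4D 62
print(hz_bytes.decode("hz") == text)   # round trip
```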

Code charts

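In the tables below, where a pair of hexadecimal numbers is given for a prefix byte or a coding byte, the smaller (with the eighth bit unset or unavailable) is used when encoded over GL (0x21–0x7E), as in ISO-2022-CN or HZ, and the larger (with the eighth bit set) is used when encoded over GR (0xA1–0xFE), as in EUC-CN. Qūwèi numbers are given in decimal.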

When GB/T 2312 is encoded over GR, both bytes have the eighth bit set (i.e. are greater than 0x7F). GBK and GB 18030 also make use of two-byte codes in which only the first byte has the eighth bit set for extension purposes: such codes are outside of the GB/T 2312 plane, and are not tabulated here.
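The eighth-bit rule can be used to test whether a two-byte code lies in the GB/T 2312 plane. The following is a minimal sketch; `euc_to_quwei` is a hypothetical helper that returns `None` for pairs outside the 94×94 plane, such as the extension codes mentioned above.

```python
from typing import Optional, Tuple


def euc_to_quwei(pair: bytes) -> Optional[Tuple[int, int]]:
    """Return (row, cell) for a GR-encoded pair, or None if outside the plane."""
    if len(pair) != 2:
        raise ValueError("expected exactly two bytes")
    lead, trail = pair
    # Both bytes must have the eighth bit set and fall in 0xA1..0xFE.
    if 0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE:
        return lead - 0xA0, trail - 0xA0
    return None


print(euc_to_quwei(b"\xcd\xe2"))  # (45, 66)
print(euc_to_quwei(b"\xcd\x40"))  # None: the second byte has the eighth bit unset
```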

Lead byte

This chart details the overall layout of the main plane of the GB/T 2312 character set by lead byte. For lead bytes used for characters other than hanzi, links are provided to charts on this page listing the characters encoded under that lead byte. For lead bytes used for hanzi, links are provided to the appropriate section of Wiktionary's hanzi index.

GB 2312 (lead bytes)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
2x/Ax	SP^[b]	1-_	2-_	3-_	4-_	5-_	6-_	7-_	8-_	9-_	10-_	11-_	12-_	13-_	14-_	15-_
3x/Bx	16-_	17-_	18-_	19-_	20-_	21-_	22-_	23-_	24-_	25-_	26-_	27-_	28-_	29-_	30-_	31-_
4x/Cx	32-_	33-_	34-_	35-_	36-_	37-_	38-_	39-_	40-_	41-_	42-_	43-_	44-_	45-_	46-_	47-_
5x/Dx	48-_	49-_	50-_	51-_	52-_	53-_	54-_	55-_	56-_	57-_	58-_	59-_	60-_	61-_	62-_	63-_
6x/Ex	64-_	65-_	66-_	67-_	68-_	69-_	70-_	71-_	72-_	73-_	74-_	75-_	76-_	77-_	78-_	79-_
7x/Fx	80-_	81-_	82-_	83-_	84-_	85-_	86-_	87-_	88-_	89-_	90-_	91-_	92-_	93-_	94-_	DEL^[b]

Non-Hanzi rows

The following charts list the non-hanzi characters available in GB/T 2312. Notes are given where GB 6345.1 and ISO-IR-165 differ from these. Cross-references are made to articles on other CJK national character sets for comparison.

Two implementations of GB2312

EUC-CN	GBK/GB18030 subset	GB2312.TXT	Character name^[12]
A1A4	U+00B7 · MIDDLE DOT	U+30FB ・ KATAKANA MIDDLE DOT	间隔点; 'separator dot'
A1AA	U+2014 — EM DASH	U+2015 ― HORIZONTAL BAR	破折号; 'em dash'

The Unicode mappings of the interpunct (Chinese: 间隔点; lit. 'separator dot') and the em dash (Chinese: 破折号) in the subset of GBK and GB 18030 corresponding to GB/T 2312 (U+00B7 · MIDDLE DOT and U+2014 — EM DASH) differ from those listed in GB2312.TXT (U+30FB ・ KATAKANA MIDDLE DOT and U+2015 ― HORIZONTAL BAR). GB2312.TXT is a data file that was previously provided by the Unicode Consortium;^[13] it has been designated as obsolete since August 2011^[14] and is no longer hosted as of September 2016.

As of 2015, Microsoft .NET Framework follows GB 18030 mappings when mapping those two characters in data labelled gb2312. The W3C/WHATWG technical recommendation for use with HTML5 specifies that a GBK encoding be inferred for streams labelled gb2312, which in turn uses a GB 18030 decoder.^[18]
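The difference can be observed by decoding the two code positions under several labels. The sketch below uses Python's built-in codecs; the helper `mappings` is illustrative. The output for the `gb2312` label depends on the runtime and its version, so no particular result is claimed for it here.

```python
# EUC-CN codes of the two characters discussed above.
CODES = {"interpunct": b"\xa1\xa4", "em dash": b"\xa1\xaa"}


def mappings(label: str) -> dict:
    """Decode both codes under the given codec label and report code points."""
    return {
        name: f"U+{ord(raw.decode(label)):04X}"
        for name, raw in CODES.items()
    }


for label in ("gb2312", "gbk", "gb18030"):
    print(label, mappings(label))
```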

Other differing mappings have been defined and used by individual vendors, including Apple.^[19]

Character set 0x21/0xA1 (row 1: punctuation and symbols)

This row contains punctuation, mathematical operators, and other symbols. The following table shows the GB 18030 mappings^[20] for these GB/T 2312 characters first, followed by any other documented mappings.

GB 2312 (prefixed with 0x21/0xA1)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
2x/Ax		IDSP	、 3001	。 3002	・	ˉ 02C9	ˇ 02C7	¨ 00A8	〃 3003	々 3005	―	〜	∥	⋯	‘ 2018	’ 2019
3x/Bx	“ 201C	” 201D	〔 3014	〕 3015	〈 3008	〉 3009	《 300A	》 300B	「 300C	」 300D	『 300E	』 300F	〖 3016	〗 3017	【 3010	】 3011
4x/Cx	± 00B1	× 00D7	÷ 00F7	∶ 2236	∧ 2227	∨ 2228	∑ 2211	∏ 220F	∪ 222A	∩ 2229	∈ 2208	∷ 2237	√ 221A	⊥ 22A5	∥ 2225	∠ 2220
5x/Dx	⌒ 2312	⊙ 2299	∫ 222B	∮ 222E	≡ 2261	≌ 224C	≈ 2248	∽ 223D	∝ 221D	≠ 2260	≮ 226E	≯ 226F	≤ 2264	≥ 2265	∞ 221E	∵ 2235
6x/Ex	∴ 2234	♂ 2642	♀ 2640	° 00B0	′ 2032	″ 2033	℃ 2103	＄ FF04	¤ 00A4	¢	£	‰ 2030	§ 00A7	№ 2116	☆ 2606	★ 2605
7x/Fx	○ 25CB	● 25CF	◎ 25CE	◇ 25C7	◆ 25C6	□ 25A1	■ 25A0	△ 25B3	▲ 25B2	※ 203B	→ 2192	← 2190	↑ 2191	↓ 2193	〓 3013

Character set 0x22/0xA2 (row 2: list markers)

This row contains various types of list marker. Lowercase forms of the Roman numerals were not included in the original GB/T 2312^[21] nor in GB/T 12345,^[6] but are included in both Windows code page 936^[22] and GB 18030.^[20] A euro sign was also added by GB 18030.^[20]

GB 2312 (prefixed with 0x22/0xA2)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
2x/Ax		ⅰ 2170	ⅱ 2171	ⅲ 2172	ⅳ 2173	ⅴ 2174	ⅵ 2175	ⅶ 2176	ⅷ 2177	ⅸ 2178	ⅹ 2179
3x/Bx		⒈ 2488	⒉ 2489	⒊ 248A	⒋ 248B	⒌ 248C	⒍ 248D	⒎ 248E	⒏ 248F	⒐ 2490	⒑ 2491	⒒ 2492	⒓ 2493	⒔ 2494	⒕ 2495	⒖ 2496
4x/Cx	⒗ 2497	⒘ 2498	⒙ 2499	⒚ 249A	⒛ 249B	⑴ 2474	⑵ 2475	⑶ 2476	⑷ 2477	⑸ 2478	⑹ 2479	⑺ 247A	⑻ 247B	⑼ 247C	⑽ 247D	⑾ 247E
5x/Dx	⑿ 247F	⒀ 2480	⒁ 2481	⒂ 2482	⒃ 2483	⒄ 2484	⒅ 2485	⒆ 2486	⒇ 2487	① 2460	② 2461	③ 2462	④ 2463	⑤ 2464	⑥ 2465	⑦ 2466
6x/Ex	⑧ 2467	⑨ 2468	⑩ 2469	€ 20AC		㈠ 3220	㈡ 3221	㈢ 3222	㈣ 3223	㈤ 3224	㈥ 3225	㈦ 3226	㈧ 3227	㈨ 3228	㈩ 3229
7x/Fx		Ⅰ 2160	Ⅱ 2161	Ⅲ 2162	Ⅳ 2163	Ⅴ 2164	Ⅵ 2165	Ⅶ 2166	Ⅷ 2167	Ⅸ 2168	Ⅹ 2169	Ⅺ 216A	Ⅻ 216B

Character set 0x23/0xA3 (row 3: ISO 646-CN)

This row contains ISO 646-CN (GB/T 1988-80), a national counterpart to ASCII. Compare row 3 of KS X 1001, which does the same with South Korea's ISO 646 version, and row 3 of JIS X 0208 and of KPS 9566, which include only the alphanumeric subset, but in the same layout. The following chart lists ISO 646-CN.

ISO 646-CN; non-fullwidth mappings
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
2x/Ax		! 0021	" 0022	# 0023	¥ 00A5	% 0025	& 0026	' 0027	( 0028	) 0029	* 002A	+ 002B	, 002C	- 002D	. 002E	/ 002F
3x/Bx	0 0030	1 0031	2 0032	3 0033	4 0034	5 0035	6 0036	7 0037	8 0038	9 0039	: 003A	; 003B	< 003C	= 003D	> 003E	? 003F
4x/Cx	@ 0040	A 0041	B 0042	C 0043	D 0044	E 0045	F 0046	G 0047	H 0048	I 0049	J 004A	K 004B	L 004C	M 004D	N 004E	O 004F
5x/Dx	P 0050	Q 0051	R 0052	S 0053	T 0054	U 0055	V 0056	W 0057	X 0058	Y 0059	Z 005A	[ 005B	\ 005C	] 005D	^ 005E	_ 005F
6x/Ex	` 0060	a 0061	b 0062	c 0063	d 0064	e 0065	f 0066	g 0067	h 0068	i 0069	j 006A	k 006B	l 006C	m 006D	n 006E	o 006F
7x/Fx	p 0070	q 0071	r 0072	s 0073	t 0074	u 0075	v 0076	w 0077	x 0078	y 0079	z 007A	{ 007B	\| 007C	} 007D	‾ 203E

When used in an encoding allowing combination with ASCII such as

yuan sign as above.^[19]

GB 2312 (prefixed with 0x23/0xA3); fullwidth mappings
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
2x/Ax		！ FF01	＂ FF02	＃ FF03	￥ FFE5	％ FF05	＆ FF06	＇ FF07	（ FF08	） FF09	＊ FF0A	＋ FF0B	， FF0C	－ FF0D	． FF0E	／ FF0F
3x/Bx	０ FF10	１ FF11	２ FF12	３ FF13	４ FF14	５ FF15	６ FF16	７ FF17	８ FF18	９ FF19	： FF1A	； FF1B	＜ FF1C	＝ FF1D	＞ FF1E	？ FF1F
4x/Cx	＠ FF20	Ａ FF21	Ｂ FF22	Ｃ FF23	Ｄ FF24	Ｅ FF25	Ｆ FF26	Ｇ FF27	Ｈ FF28	Ｉ FF29	Ｊ FF2A	Ｋ FF2B	Ｌ FF2C	Ｍ FF2D	Ｎ FF2E	Ｏ FF2F
5x/Dx	Ｐ FF30	Ｑ FF31	Ｒ FF32	Ｓ FF33	Ｔ FF34	Ｕ FF35	Ｖ FF36	Ｗ FF37	Ｘ FF38	Ｙ FF39	Ｚ FF3A	［ FF3B	＼ FF3C	］ FF3D	＾ FF3E	＿ FF3F
6x/Ex	｀ FF40	ａ FF41	ｂ FF42	ｃ FF43	ｄ FF44	ｅ FF45	ｆ FF46	ɡ^[c]	ｈ FF48	ｉ FF49	ｊ FF4A	ｋ FF4B	ｌ FF4C	ｍ FF4D	ｎ FF4E	ｏ FF4F
7x/Fx	ｐ FF50	ｑ FF51	ｒ FF52	ｓ FF53	ｔ FF54	ｕ FF55	ｖ FF56	ｗ FF57	ｘ FF58	ｙ FF59	ｚ FF5A	｛ FF5B	｜ FF5C	｝ FF5D	￣ FFE3
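The fullwidth chart above follows a regular pattern: cell n of row 3 generally maps to U+FF00 + n, with the yuan sign and overline positions as exceptions. A minimal sketch follows, assuming the mappings shown in the chart; `row3_fullwidth` is a hypothetical helper, and cell 71 (the letter g) is left on the regular pattern although implementations differ there.

```python
# Cells whose mapping departs from the regular U+FF00 + cell pattern.
ROW3_EXCEPTIONS = {
    4: "\uffe5",   # fullwidth yen/yuan sign in the dollar sign position
    94: "\uffe3",  # fullwidth macron in the tilde position
}


def row3_fullwidth(cell: int) -> str:
    """Return the fullwidth character for a cell (1..94) of row 3."""
    if not 1 <= cell <= 94:
        raise ValueError("cell must be in the range 1..94")
    return ROW3_EXCEPTIONS.get(cell, chr(0xFF00 + cell))


cell = ord("A") - 0x20                  # cell 33
print(row3_fullwidth(cell))             # Ａ
# Cross-check against Python's gb2312 codec (EUC-CN bytes A3 C1).
print(bytes([0xA3, 0xA0 + cell]).decode("gb2312"))
```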

Character set 0x24/0xA4 (row 4: Hiragana)

This set contains Hiragana for writing the Japanese language.

Compare with row 4 of JIS X 0208, which this row matches, and with row 10 of KS X 1001 and of KPS 9566, which use the same layout, but in a different row.

GB 2312 (prefixed with 0x24/0xA4)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
2x/Ax		ぁ 3041	あ 3042	ぃ 3043	い 3044	ぅ 3045	う 3046	ぇ 3047	え 3048	ぉ 3049	お 304A	か 304B	が 304C	き 304D	ぎ 304E	く 304F
3x/Bx	ぐ 3050	け 3051	げ 3052	こ 3053	ご 3054	さ 3055	ざ 3056	し 3057	じ 3058	す 3059	ず 305A	せ 305B	ぜ 305C	そ 305D	ぞ 305E	た 305F
4x/Cx	だ 3060	ち 3061	ぢ 3062	っ 3063	つ 3064	づ 3065	て 3066	で 3067	と 3068	ど 3069	な 306A	に 306B	ぬ 306C	ね 306D	の 306E	は 306F
5x/Dx	ば 3070	ぱ 3071	ひ 3072	び 3073	ぴ 3074	ふ 3075	ぶ 3076	ぷ 3077	へ 3078	べ 3079	ぺ 307A	ほ 307B	ぼ 307C	ぽ 307D	ま 307E	み 307F
6x/Ex	む 3080	め 3081	も 3082	ゃ 3083	や 3084	ゅ 3085	ゆ 3086	ょ 3087	よ 3088	ら 3089	り 308A	る 308B	れ 308C	ろ 308D	ゎ 308E	わ 308F
7x/Fx	ゐ 3090	ゑ 3091	を 3092	ん 3093

Character set 0x25/0xA5 (row 5: Katakana)

This set contains Katakana for writing the Japanese language. However, the Japanese long vowel mark, which is used in katakana text and included in row 1 of JIS X 0208, is not included in GB/T 2312, although it is added in GBK and GB 18030 outside of the main GB/T 2312 plane,^[24] at 0xA960.^[20]

Compare with row 5 of JIS X 0208, which this row matches, and with row 11 of KS X 1001 and of KPS 9566, which use the same layout, but in a different row.

GB 2312 (prefixed with 0x25/0xA5)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
2x/Ax		ァ 30A1	ア 30A2	ィ 30A3	イ 30A4	ゥ 30A5	ウ 30A6	ェ 30A7	エ 30A8	ォ 30A9	オ 30AA	カ 30AB	ガ 30AC	キ 30AD	ギ 30AE	ク 30AF
3x/Bx	グ 30B0	ケ 30B1	ゲ 30B2	コ 30B3	ゴ 30B4	サ 30B5	ザ 30B6	シ 30B7	ジ 30B8	ス 30B9	ズ 30BA	セ 30BB	ゼ 30BC	ソ 30BD	ゾ 30BE	タ 30BF
4x/Cx	ダ 30C0	チ 30C1	ヂ 30C2	ッ 30C3	ツ 30C4	ヅ 30C5	テ 30C6	デ 30C7	ト 30C8	ド 30C9	ナ 30CA	ニ 30CB	ヌ 30CC	ネ 30CD	ノ 30CE	ハ 30CF
5x/Dx	バ 30D0	パ 30D1	ヒ 30D2	ビ 30D3	ピ 30D4	フ 30D5	ブ 30D6	プ 30D7	ヘ 30D8	ベ 30D9	ペ 30DA	ホ 30DB	ボ 30DC	ポ 30DD	マ 30DE	ミ 30DF
6x/Ex	ム 30E0	メ 30E1	モ 30E2	ャ 30E3	ヤ 30E4	ュ 30E5	ユ 30E6	ョ 30E7	ヨ 30E8	ラ 30E9	リ 30EA	ル 30EB	レ 30EC	ロ 30ED	ヮ 30EE	ワ 30EF
7x/Fx	ヰ 30F0	ヱ 30F1	ヲ 30F2	ン 30F3	ヴ 30F4	ヵ 30F5	ヶ 30F6
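In both kana rows, the cell number is a fixed offset from the Unicode code point, as the charts above show. The sketch below encodes that observation; `kana_for_cell` is an illustrative name, and the results are cross-checked against Python's `gb2312` codec.

```python
def kana_for_cell(row: int, cell: int) -> str:
    """Return the kana at the given qūwèi position in row 4 or row 5."""
    if row == 4 and 1 <= cell <= 83:   # Hiragana: U+3041..U+3093
        return chr(0x3040 + cell)
    if row == 5 and 1 <= cell <= 86:   # Katakana: U+30A1..U+30F6
        return chr(0x30A0 + cell)
    raise ValueError("position is not an assigned kana cell")


print(kana_for_cell(4, 1))    # ぁ
print(kana_for_cell(5, 86))   # ヶ
# Cross-check one position against the EUC-CN bytes A4 A1.
print(bytes([0xA4, 0xA1]).decode("gb2312") == kana_for_cell(4, 1))
```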

Character set 0x26/0xA6 (row 6: Greek and vertical extensions)

This row contains basic support for the modern Greek alphabet, without diacritics or the final sigma.

The highlighted characters are presentation forms of punctuation marks for vertical writing, and are not included in GB/T 2312 proper, but are included in this row by GB/T 12345,^[1]^[6] Windows code page 936,^[22] Mac OS Simplified Chinese,^[19] and GB 18030.^[20] They are seen as "standard extensions to GB 2312".^[19] Conversely, ISO-IR-165 includes patterned semigraphic characters in this row (mostly without exact counterparts in Unicode), colliding with the code positions used for the vertical extensions.^[25]

Compare with row 6 of JIS X 0208, which this row matches when the vertical forms are not included, and with row 6 of KPS 9566, which includes the same Greek letters in the same layout, but adds Roman numerals rather than vertical forms. Contrast row 5 of KS X 1001, which offsets the Greek letters to include the Roman numerals first.

GB 2312 (prefixed with 0x26/0xA6)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
2x/Ax		Α 0391	Β 0392	Γ 0393	Δ 0394	Ε 0395	Ζ 0396	Η 0397	Θ 0398	Ι 0399	Κ 039A	Λ 039B	Μ 039C	Ν 039D	Ξ 039E	Ο 039F
3x/Bx	Π 03A0	Ρ 03A1	Σ 03A3	Τ 03A4	Υ 03A5	Φ 03A6	Χ 03A7	Ψ 03A8	Ω 03A9
4x/Cx		α 03B1	β 03B2	γ 03B3	δ 03B4	ε 03B5	ζ 03B6	η 03B7	θ 03B8	ι 03B9	κ 03BA	λ 03BB	μ 03BC	ν 03BD	ξ 03BE	ο 03BF
5x/Dx	π 03C0	ρ 03C1	σ 03C3	τ 03C4	υ 03C5	φ 03C6	χ 03C7	ψ 03C8	ω 03C9	︐^[d] FE10	︒^[d] FE12	︑^[d] FE11	︓^[d] FE13	︔^[d] FE14	︕^[d] FE15	︖^[d] FE16
6x/Ex	︵ FE35	︶ FE36	︹ FE39	︺ FE3A	︿ FE3F	﹀ FE40	︽ FE3D	︾ FE3E	﹁ FE41	﹂ FE42	﹃ FE43	﹄ FE44	︗^[d] FE17	︘^[d] FE18	︻ FE3B	︼ FE3C
7x/Fx	︷ FE37	︸ FE38	︱ FE31	︙^[d] FE19	︳ FE33	︴ FE34

Character set 0x27/0xA7 (row 7: Cyrillic)

This set includes both cases of 33 letters from the Cyrillic script, sufficient to write the modern Russian alphabet and Bulgarian alphabet, although other forms of Cyrillic require additional letters.^[27]

Compare with row 7 of JIS X 0208, which this row matches, and with row 12 of KS X 1001 and row 5 of KPS 9566, which use the same layout but in different rows.

GB 2312 (prefixed with 0x27/0xA7)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
2x/Ax		А 0410	Б 0411	В 0412	Г 0413	Д 0414	Е 0415	Ё 0401	Ж 0416	З 0417	И 0418	Й 0419	К 041A	Л 041B	М 041C	Н 041D
3x/Bx	О 041E	П 041F	Р 0420	С 0421	Т 0422	У 0423	Ф 0424	Х 0425	Ц 0426	Ч 0427	Ш 0428	Щ 0429	Ъ 042A	Ы 042B	Ь 042C	Э 042D
4x/Cx	Ю 042E	Я 042F
5x/Dx		а 0430	б 0431	в 0432	г 0433	д 0434	е 0435	ё 0451	ж 0436	з 0437	и 0438	й 0439	к 043A	л 043B	м 043C	н 043D
6x/Ex	о 043E	п 043F	р 0440	с 0441	т 0442	у 0443	ф 0444	х 0445	ц 0446	ч 0447	ш 0448	щ 0449	ъ 044A	ы 044B	ь 044C	э 044D
7x/Fx	ю 044E	я 044F

Character set 0x28/0xA8 (row 8: zhuyin and non-ASCII pinyin)

This row contains zhuyin (bopomofo) and pinyin characters other than ASCII letters. Some of these characters are not in GB/T 2312 proper, but were added by GB 6345.1,^[19] and are also included in GB/T 12345,^[1]^[6] Windows code page 936,^[22] Mac OS Simplified Chinese^[19] and GB 18030.^[20] They are seen as "standard extensions to GB 2312".^[19]

GB 6345.1 treats the pinyin in this row as fullwidth, and includes halfwidth counterparts as row 11;^[1] GB 18030 does not do this.

GB 2312 (prefixed with 0x28/0xA8)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
2x/Ax		ā 0101	á 00E1	ǎ 01CE	à 00E0	ē 0113	é 00E9	ě 011B	è 00E8	ī 012B	í 00ED	ǐ 01D0	ì 00EC	ō 014D	ó 00F3	ǒ 01D2
3x/Bx	ò 00F2	ū 016B	ú 00FA	ǔ 01D4	ù 00F9	ǖ 01D6	ǘ 01D8	ǚ 01DA	ǜ 01DC	ü 00FC	ê 00EA	ɑ 0251	ḿ^[e] 1E3F	ń 0144	ň 0148	ǹ^[f] 01F9
4x/Cx	ｇ^[g]					ㄅ 3105	ㄆ 3106	ㄇ 3107	ㄈ 3108	ㄉ 3109	ㄊ 310A	ㄋ 310B	ㄌ 310C	ㄍ 310D	ㄎ 310E	ㄏ 310F
5x/Dx	ㄐ 3110	ㄑ 3111	ㄒ 3112	ㄓ 3113	ㄔ 3114	ㄕ 3115	ㄖ 3116	ㄗ 3117	ㄘ 3118	ㄙ 3119	ㄚ 311A	ㄛ 311B	ㄜ 311C	ㄝ 311D	ㄞ 311E	ㄟ 311F
6x/Ex	ㄠ 3120	ㄡ 3121	ㄢ 3122	ㄣ 3123	ㄤ 3124	ㄥ 3125	ㄦ 3126	ㄧ 3127	ㄨ 3128	ㄩ 3129
7x/Fx

Character set 0x29/0xA9 (row 9: box drawing)

GB 2312 (prefixed with 0x29/0xA9)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
2x/Ax					─ 2500	━ 2501	│ 2502	┃ 2503	┄ 2504	┅ 2505	┆ 2506	┇ 2507	┈ 2508	┉ 2509	┊ 250A	┋ 250B
3x/Bx	┌ 250C	┍ 250D	┎ 250E	┏ 250F	┐ 2510	┑ 2511	┒ 2512	┓ 2513	└ 2514	┕ 2515	┖ 2516	┗ 2517	┘ 2518	┙ 2519	┚ 251A	┛ 251B
4x/Cx	├ 251C	┝ 251D	┞ 251E	┟ 251F	┠ 2520	┡ 2521	┢ 2522	┣ 2523	┤ 2524	┥ 2525	┦ 2526	┧ 2527	┨ 2528	┩ 2529	┪ 252A	┫ 252B
5x/Dx	┬ 252C	┭ 252D	┮ 252E	┯ 252F	┰ 2530	┱ 2531	┲ 2532	┳ 2533	┴ 2534	┵ 2535	┶ 2536	┷ 2537	┸ 2538	┹ 2539	┺ 253A	┻ 253B
6x/Ex	┼ 253C	┽ 253D	┾ 253E	┿ 253F	╀ 2540	╁ 2541	╂ 2542	╃ 2543	╄ 2544	╅ 2545	╆ 2546	╇ 2547	╈ 2548	╉ 2549	╊ 254A	╋ 254B
7x/Fx

Hanzi rows

Corrections

GB 5007.1-85 24x24 Bitmap Font Set of Chinese Characters for Information Exchange (Chinese: 信息交换用汉字 24x24 点阵字模集) is the earliest font template based on GB/T 2312 to feature corrections and extensions, including:

changing the glyph shape of the Latin letter "g"
adding 6 Hanyu Pinyin characters (ɑ, ḿ, ń, ň, ǹ and ɡ)^[note 1]
changing "鍾" to "锺"
including 94 half-width glyphs in row 10 (the half-width form of row 3, equivalent to GB 1988–80)
including the half-width forms of 32 Hanyu Pinyin characters from row 8 in row 11

GB/T 2312 itself was not revised with these corrections, but they are included in font templates based on GB/T 2312, including GB/T 12345; its supersets GBK and GB 18030 also include them. GB/T 2312 is also used in ISO-IR-165.

References

1. Lunde, Ken (2009). CJKV Information Processing (2nd ed.). O'Reilly. ISBN 978-0-596-51447-1.
2. "2017年第7号中国国家标准公告 (China National Standard Bulletin 2017 No.7)". Standardization Administration of the People's Republic of China. Retrieved 3 July 2018.
3. "Distribution of Character Encodings among websites that use China and territories". w3techs.com. Retrieved 2022-09-04.
4. "Historical trends in the usage statistics of character encodings for websites, October 2022". w3techs.com. Retrieved 2022-10-01.
5. "Encoding: Summarized test results". www.w3.org. Retrieved 2019-11-15.
6. ISBN 9781565922242.
7. GB12345-80 to Unicode table. Unicode Consortium. 1993-12-06. Archived from the original on 2004-06-17.
8. ISBN 9780824818920. "the set provides for better than 99.99 percent of all usage. Nevertheless, the designers found it necessary to add 14,276 "special usage" characters to cover contingencies!"
9. "GB 2312-1980: Information technology—Chinese ideogram coded character set for information interchange (Basic set)". May 1981.
10. "Unicode to GB2312 or GBK table". cs.nyu.edu. Archived from the original on 3 March 2016. Retrieved 11 January 2022.
11. Lunde, Ken (2009). CJKV Information Processing (2nd ed.). O'Reilly. ISBN 978-0-596-51447-1.
12. "GB 2312-1980: Information technology—Chinese ideogram coded character set for information interchange (basic set)". May 1981. Retrieved 2 October 2016.
13. Haible, Bruno. "GB2312 (Conversion Tables)". Retrieved 29 September 2016.
14. "Readme – MAPPINGS/OBSOLETE/EASTASIA". 9 August 2001. Retrieved 29 September 2016.
15. "java-EUC_CN-1.3_P.ucm". Retrieved 29 September 2016. [permanent dead link]
16. "libiconv:lib/gb2312.h". GNU Savannah. Retrieved 29 September 2016.
17. "Issue 24036". Python Bug Tracker.
18. "Encoding § Names and labels". W3C. Retrieved 29 September 2016.
19. Apple, Inc.
20. Standardization Administration of China (SAC) (2005-11-18). GB 18030-2005: Information Technology—Chinese coded character set.
21. ISO-IR-58.
22. Microsoft. "CODEPAGE 936: PRC GBK (XGB) - ANSI, OEM". Unicode Consortium.
23. Viswanadha, Raghuram (2000-08-30). "Unicode to ISO-IR-165 table". International Components for Unicode. IBM.
24. Lunde, Ken (2009). CJKV Information Processing (2nd ed.). O'Reilly. ISBN 978-0-596-51447-1.
25. ISO-IR-165.
26. Lunde, Dr Ken (4 August 2022). "The GB 18030-2022 Standard". Medium. Retrieved 7 August 2022.
27. Czyborra, Roman (1998-11-30) [1998-05-25]. "The Cyrillic Charset Soup". Archived from the original on 2016-12-03. Retrieved 2016-12-03.
28. "Unicode Character Encoding Stability Policies". Unicode Consortium. 2017-06-23.

Notes

a. Only for ideographs covered by GB/T 2312, all of which fall into the Unicode BMP.

b. As this is an ISO 2022 compatible 94ⁿ-character set, the plain space and delete characters are available as single-byte codes at 0x20 and 0x7F (not 0xA0 and 0xFF) respectively.

c. Used for U+FF47 by implementations based on GB 6345.1, including Apple's implementation and GB 18030 (which use 8-32 for U+0261),^[20] but for U+0261 by ISO-IR-165.^[23]

d. Mapped in some implementations to the Private Use Area, but with a defined glyph,^[22]^[20] and by Apple to the regular fullwidth character with an appended private use character U+F87E as a variation marker.^[19] In the GB 18030-2022 update, these Private Use Area mappings have been eliminated, and the characters are now mapped to their standard Unicode code points.^[26]

e. Mapped to Private Use Area U+E7C7 by the first (2000) edition of GB 18030, and also by Windows-936;^[22] this was amended by the 2005 edition of GB 18030.^[20]

f. Mapped to Private Use Area U+E7C8 by Windows-936.^[22]

g. Mapped to U+0261 in GB 18030^[20] and most other implementations based on GB 6345.1^[19] (which use 3-71 for U+FF47), but to U+FF47 in ISO-IR-165.^[23]^[25]

note 1. The six characters are:
ɑ (U+0251)
ḿ (U+1E3F; Submitted in Unicode 3.0, thus CP936 did not include this character)
ń (U+0144)
ň (U+0148)
ǹ (U+01F9; Submitted in Unicode 3.0, thus CP936 did not include this character)
ɡ (U+0261)

Further reading

Lunde, Ken (2009). "Chinese Character Set Standards—China". CJKV Information Processing (2nd ed.). O'Reilly. ISBN 978-0-596-51447-1.

External links

Graphical View of GB2312 in ICU's Converter Explorer

Unicode to GB2312 or GBK table

Chinese Character Codes

Evolution of GBK and GB2312 into GB18030

GB2312 Character Set for Chinese Characters

Coded Chinese Graphic Character Set for Information Interchange ISO-IR 58

C code generates 6763 basic characters with output

GB2312-80 standard on China-Language.gov.cn

