JIS X 0201
JIS X 0201, a
The first 96 codes comprise an
JIS X 0201 was supplanted by subsequent encodings such as Shift JIS, which combines this standard and JIS X 0208, and later by Unicode.
History
The
In 1963, ISO introduced a draft of ISO R 646 (6 and 7-bit coded character sets for information processing interchange). AIST entrusted the Information Processing Society of Japan (IPSJ) with combining ISO R 646 and a katakana mapping, and IPSJ formed a code standardization committee. The committee did not adopt the 6-bit form of ISO's draft because the katakana set could not fit into its character map. The early JIS draft placed each small katakana character next to its normal-sized counterpart, which was considered convenient for sorting in Gojūon order (JIS X 0208:1978 chose this ordering). Some committee members objected that this would complicate the mechanisms of keyboards, which handled only normal-sized katakana. The later draft moved the small katakana characters to positions 0xA7–0xAF.
The 1964 ISO draft reserved positions 0x24 and 0x5C for first and second currency symbols to be assigned by each country, but using currency symbols that could be localized was considered too dangerous in international communication. The ISO committee had two options: to use a
JIS C 6220 (Codes for information interchange, 情報交換用符号) was published in 1969. It was renumbered JIS X 0201 in the 1987 reorganization of JIS categories, and the 1990 edition renamed it 7-bit and 8-bit coded character sets for information interchange (7ビット及び8ビットの情報交換用符号化文字集合).
The JIS X 0201 character set was widely used in Japan. The Nationwide Banking Data Communication System (全国銀行データ通信システム), the largest funds transfer system in Japan, was established in 1973, and transaction messages between banks used a subset of JIS X 0201. The system was used until 2018, when it was replaced by ZEDI (the Nationwide Banking Electronic Data Interchange System, 全銀EDIシステム), which can handle hiragana and kanji characters.
…, which became the industrial standard for personal computers.
Implementation details
The first half (Roman set) of JIS X 0201 constitutes a Japanese variant of
In the 7-bit format, the
In the 8-bit format, given in the chart below, bytes with the most significant bit set (i.e. 0x80–0xFF) are used for the Kana set, which occupies 0xA1–0xDF, and bytes with it clear (i.e. 0x00–0x7F) are used for the Roman set and control codes.
Names used specifically for the 7-bit Roman set include "JISCII",[8] "JIS Roman",[9] "ISO646-JP",[10][11] "JIS C6220-1969-ro",[11][10] "Japanese-Roman",[12] "Japan 7-Bit Latin",[13] and "ISO-IR-14",[10][11][7] whereas names used specifically for the 7-bit Kana set include "ISO-IR-13",[6][10][11] "JIS C6220-1969-jp"[10][11] and "x0201-7".[10][11]
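The split between the two halves can be illustrated with a minimal C sketch that decodes one byte of the 8-bit form into a Unicode code point. It assumes the correspondence used by the Unicode Consortium's JIS X 0201 mapping table (0x5C to U+00A5 YEN SIGN, 0x7E to U+203E OVERLINE, and kana bytes 0xA1–0xDF to the half-width forms U+FF61–U+FF9F). The function name `jisx0201_to_unicode` is hypothetical, and passing control codes through unchanged is a simplification, not something the standard specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode one byte of 8-bit JIS X 0201 to a Unicode
   code point, or return -1 if the byte has no graphic character. */
static int32_t jisx0201_to_unicode(uint8_t b)
{
    if (b < 0x80) {
        /* Roman set: same as ASCII except for two positions.
           Control codes (0x00-0x1F, 0x7F) are passed through unchanged;
           the standard itself defers them to JIS X 0211. */
        if (b == 0x5C) return 0x00A5;  /* YEN SIGN instead of backslash */
        if (b == 0x7E) return 0x203E;  /* OVERLINE instead of tilde */
        return b;
    }
    if (b >= 0xA1 && b <= 0xDF) {
        /* Kana set: maps linearly onto the half-width forms block. */
        int32_t cp = 0xFF61 + (b - 0xA1);
        assert(cp <= 0xFF9F);  /* 0xDF is the semi-voiced mark U+FF9F */
        return cp;
    }
    /* 0x80-0xA0 (C1 codes or unused) and 0xE0-0xFF (unassigned). */
    return -1;
}
```

For instance, `jisx0201_to_unicode(0xB1)` yields U+FF71 (half-width ｱ), while `jisx0201_to_unicode(0x5C)` yields U+00A5 rather than a backslash.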
The substitution of the yen symbol for backslash can make paths on
printf("Hello, world.¥n");
.
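The underlying cause is that ¥ and the backslash are the same byte, 0x5C, so the C escape `\n` is stored as 0x5C followed by `n` whichever glyph the display shows. A minimal C sketch, assuming 7-bit input and UTF-8 output, converts a byte string into the text a JIS X 0201 Roman display would show; the name `render_jis_roman` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write to `out` the UTF-8 text that a JIS X 0201
   Roman display would show for the bytes of `in`. Only 0x5C and 0x7E
   differ from ASCII; all other bytes are copied unchanged. */
static void render_jis_roman(const char *in, char *out, size_t out_size)
{
    /* Worst case: every byte becomes a 3-byte UTF-8 sequence. */
    assert(out_size >= 3 * strlen(in) + 1);
    size_t j = 0;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        if (*p == 0x5C) {                        /* ASCII backslash byte */
            memcpy(out + j, "\xC2\xA5", 2);      /* U+00A5 YEN SIGN */
            j += 2;
        } else if (*p == 0x7E) {                 /* ASCII tilde byte */
            memcpy(out + j, "\xE2\x80\xBE", 3);  /* U+203E OVERLINE */
            j += 3;
        } else {
            out[j++] = (char)*p;
        }
    }
    out[j] = '\0';
}
```

Rendering the bytes of `C:\Program Files\` this way gives `C:¥Program Files¥`, and the two source bytes of the escape `\n` render as `¥n`; a C compiler still reads them as a newline escape.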
Codepage layout
The following table is the original 8-bit coded character set of JIS X 0201 (with the kana set indicated by bytes with the high bit set).[15][16]
   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
0x | C0 codes[a]
1x |
2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | ¥ | ] | ^ | _ |
6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ‾ | DEL |
8x | C1 codes or Empty Block[a]
9x |
Ax | | 。 | 「 | 」 | 、 | ・ | ヲ | ァ | ィ | ゥ | ェ | ォ | ャ | ュ | ョ | ッ |
Bx | ー | ア | イ | ウ | エ | オ | カ | キ | ク | ケ | コ | サ | シ | ス | セ | ソ |
Cx | タ | チ | ツ | テ | ト | ナ | ニ | ヌ | ネ | ノ | ハ | ヒ | フ | ヘ | ホ | マ |
Dx | ミ | ム | メ | モ | ヤ | ユ | ヨ | ラ | リ | ル | レ | ロ | ワ | ン | ゙ | ゚ |
Ex |
Fx |
As part of Shift JIS
Following is the mapping used for JIS X 0201 as part of
   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
0x |
1x |
2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | ¥ | ] | ^ | _ |
6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ‾ | |
8x |
9x |
Ax | | 。 | 「 | 」 | 、 | ・ | ヲ | ァ | ィ | ゥ | ェ | ォ | ャ | ュ | ョ | ッ |
Bx | ー | ア | イ | ウ | エ | オ | カ | キ | ク | ケ | コ | サ | シ | ス | セ | ソ |
Cx | タ | チ | ツ | テ | ト | ナ | ニ | ヌ | ネ | ノ | ハ | ヒ | フ | ヘ | ホ | マ |
Dx | ミ | ム | メ | モ | ヤ | ユ | ヨ | ラ | リ | ル | レ | ロ | ワ | ン | ゙ | ゚ |
Ex |
Fx |
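Because Shift JIS keeps the JIS X 0201 bytes at their single-byte values, a decoder can tell from the first byte alone whether it starts a one-byte or a two-byte character. A minimal C sketch under stated assumptions: the enum and function names are hypothetical, the lead-byte ranges 0x81–0x9F and 0xE0–0xFC follow common vendor variants such as Windows code page 932 (the Shift JIS of JIS X 0208 itself uses lead bytes only up to 0xEF), and trail bytes are not validated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of a byte that starts a Shift JIS character. */
enum sjis_kind { SJIS_JIS_ROMAN, SJIS_HALFWIDTH_KANA, SJIS_LEAD_BYTE, SJIS_INVALID };

static enum sjis_kind sjis_classify(uint8_t b)
{
    if (b <= 0x7F)
        return SJIS_JIS_ROMAN;      /* single byte; implementations differ on
                                       whether 0x5C decodes as ¥ or backslash */
    if (b >= 0xA1 && b <= 0xDF)
        return SJIS_HALFWIDTH_KANA; /* JIS X 0201 kana, same byte values */
    if ((b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC))
        return SJIS_LEAD_BYTE;      /* first byte of a two-byte character */
    return SJIS_INVALID;            /* 0x80, 0xA0, 0xFD-0xFF */
}

/* Count characters, treating a lead byte plus the next byte as one
   character. Trail bytes are not checked in this sketch. */
static size_t sjis_count_chars(const uint8_t *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; count++) {
        if (sjis_classify(s[i]) == SJIS_LEAD_BYTE && i + 1 < n)
            i += 2;
        else
            i += 1;
    }
    return count;
}
```

For example, the bytes 41 B1 88 9F 5C (A, half-width ｱ, the kanji 亜, and 0x5C) count as four characters.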
Alternative mapping of katakana
The basic
In theory, this mapping is equally correct, as JIS X 0201 itself does not specify display width, although in practice (and especially in duospaced environments) JIS X 0201 is used for half-width katakana.
For ease of comparison with the chart above, the mapping is shown below over the JIS X 0201 katakana encoding and with the high bit set.
   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
Ax | | 。 | 「 | 」 | 、 | ・ | ヲ | ァ | ィ | ゥ | ェ | ォ | ャ | ュ | ョ | ッ |
Bx | ー | ア | イ | ウ | エ | オ | カ | キ | ク | ケ | コ | サ | シ | ス | セ | ソ |
Cx | タ | チ | ツ | テ | ト | ナ | ニ | ヌ | ネ | ノ | ハ | ヒ | フ | ヘ | ホ | マ |
Dx | ミ | ム | メ | モ | ヤ | ユ | ヨ | ラ | リ | ル | レ | ロ | ワ | ン | ゛[b] | ゜[c] |
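This alternative mapping amounts to a 63-entry lookup table. The minimal C sketch below transcribes it from the table above; the array and function names are hypothetical, and the transcription should be checked against a published form such as the WHATWG Encoding Standard's ISO-2022-JP katakana index before being relied on. As the footnotes indicate, the sound marks map to the spacing characters U+309B and U+309C rather than to the combining U+3099 and U+309A.

```c
#include <assert.h>
#include <stdint.h>

/* Full-width equivalents of JIS X 0201 kana bytes 0xA1-0xDF, in byte order. */
static const uint16_t kFullwidthKana[] = {
    0x3002, 0x300C, 0x300D, 0x3001, 0x30FB, 0x30F2, 0x30A1, 0x30A3, /* A1-A8 */
    0x30A5, 0x30A7, 0x30A9, 0x30E3, 0x30E5, 0x30E7, 0x30C3,         /* A9-AF */
    0x30FC, 0x30A2, 0x30A4, 0x30A6, 0x30A8, 0x30AA, 0x30AB, 0x30AD, /* B0-B7 */
    0x30AF, 0x30B1, 0x30B3, 0x30B5, 0x30B7, 0x30B9, 0x30BB, 0x30BD, /* B8-BF */
    0x30BF, 0x30C1, 0x30C4, 0x30C6, 0x30C8, 0x30CA, 0x30CB, 0x30CC, /* C0-C7 */
    0x30CD, 0x30CE, 0x30CF, 0x30D2, 0x30D5, 0x30D8, 0x30DB, 0x30DE, /* C8-CF */
    0x30DF, 0x30E0, 0x30E1, 0x30E2, 0x30E4, 0x30E6, 0x30E8, 0x30E9, /* D0-D7 */
    0x30EA, 0x30EB, 0x30EC, 0x30ED, 0x30EF, 0x30F3, 0x309B, 0x309C  /* D8-DF */
};

/* Compile-time check that the transcription has one entry per kana byte. */
static_assert(sizeof kFullwidthKana / sizeof kFullwidthKana[0] == 0xDF - 0xA1 + 1,
              "one entry per byte 0xA1-0xDF");

/* Map a JIS X 0201 kana byte to full-width katakana; 0 if out of range. */
static uint32_t jisx0201_kana_fullwidth(uint8_t b)
{
    if (b < 0xA1 || b > 0xDF) return 0;
    return kFullwidthKana[b - 0xA1];
}

/* Same mapping keyed by the Unicode half-width forms U+FF61-U+FF9F;
   other code points are returned unchanged. */
static uint32_t halfwidth_to_fullwidth(uint32_t cp)
{
    if (cp < 0xFF61 || cp > 0xFF9F) return cp;
    return kFullwidthKana[cp - 0xFF61];
}
```

Keying the second function by code point rather than by byte reflects how such a mapping is typically applied to Unicode text, for example when half-width kana must be written in an encoding that lacks them.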
Variants and extensions
Shift JIS
IBM's implementations
   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
0x | NUL | ╔ | ╗ | ╚ | ╝ | ║ | ═ | ↓ | BS | ○ | LF | 〿 | FF | CR | ■ | ☼ |
1x | ╬ | DC1 | ↕ | DC3 | ▓ | ╩ | ╦ | ╣ | CAN | ╠ | ░ | ↵ | ↑ | │ | → | ← |
IBM also implements the 7-bit Roman set of JIS X 0201 as
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
6x | ¢
|
£
|
¬
|
\
|
~
|
IBM's
IBM's
Others
- NEC PC-8001 (1979) character set as rendered in the 8×8 pixel font
- PC98series.
Footnotes
- ^ a b Control characters are specified in JIS X 0211.
- compatibility normalization (which would be U+3099, the combining version).[22]
- compatibility normalization (which would be U+309A, the combining version).[22]
References
- ^ OCLC 703804474.
- ^ Fischer, Eric N. (2000-06-20). "The Evolution of Character Codes, 1874–1968". ark:/13960/t07x23w8s. Retrieved 2023-11-02.
- ^ "経理部門の人材不足で悩む会社に朗報、金融EDI「ZEDI」が2018年稼働へ". Nikkei X-TECH. 2017-11-30. Retrieved 2019-07-24.
- ^ ISSN 0385-1680.
- ^ "3.1.1 Details of Problems". Problems and Solutions for Unicode and User/Vendor Defined Characters. The Open Group Japan. Archived from the original on 1999-02-03. Retrieved 2019-04-15.
- ^ a b Japanese Industrial Standards Committee. ISO-IR-13: The Japanese KATAKANA graphic set of characters (PDF). ITSCJ/IPSJ.
- ^ a b Japanese Industrial Standards Committee. ISO-IR-14: The Japanese Roman graphic set of characters (PDF). ITSCJ/IPSJ.
- ^ "IBM-943 and IBM-932", IBM Knowledge Center, IBM
- ^ Apple Inc
- ^ RFC 1345
- ^ a b c d e f "Character Sets". IANA.
- ^ da Cruz, Frank (2010-04-02), "Kermit and MIME Character-Set Names", Kermit Project, Columbia University
- ^ "CP 00895", IBM Globalization — Code page identifiers, IBM, 9 November 2020
- ^ Kaplan, Michael S. (2005-09-17). "When is a backslash not a backslash?".
- ^ JIS X 0201-1997 (in Japanese). Japanese Standards Association. 1997-02-28. p. 17.
- ^ Unicode Consortium (2015-12-02). "JIS X 0201 (1976) to Unicode 1.1 Table". unicode.org. Retrieved 2021-10-01.
- ^ "ibm-943_P130-1999". ICU Demonstration - Converter Explorer. International Components for Unicode.
- ^ Apple, Inc (2005-04-05) [1995-04-15]. "JAPANESE.TXT: Map (external version) from Mac OS Japanese encoding to Unicode 2.1 and later". Unicode Consortium.
- ^ van Kesteren, Anne (2019-02-11). "12.2.2. ISO-2022-JP encoder". Encoding Standard. WHATWG.
- ^ The WHATWG Encoding Standard, for instance, uses it as a transformation when encoding Unicode half-width kana data to ISO-2022-JP.[19]
- ^ van Kesteren, Anne (2018-01-06). "Index ISO-2022-JP Katakana". Encoding Standard. WHATWG.
- ^ a b van Kesteren, Anne (2019-02-11). "5. Indexes". Encoding Standard. WHATWG.
- ^ "Code page identifiers - CP 00897". IBM Globalization. IBM. Archived from the original on 2016-03-17.
- ^ "Code Page 01139" (PDF). IBM. Archived from the original (PDF) on 2015-07-08. Retrieved 2021-10-22.
- ^ "Code Page 01086" (PDF). IBM. Archived from the original (PDF) on 2015-07-08. Retrieved 2021-10-22.
- ^ "CP00897.pdf" (PDF). IBM.
- ^ "CP00897.txt". IBM.
- ^ "Converter Explorer - ibm-943_P130-1999". ICU Demonstration. International Components for Unicode.
- ^ "Coded character set identifiers - CCSID 943". IBM Globalization. IBM. Archived from the original on 2016-03-15.
- ^ Graphics are listed per CP00897.pdf and CP00897.txt provided by IBM.[26][27] Controls are listed, in absence of graphical function or where they differ from ASCII, per the ibm-943_P130-1999 codec provided by IBM to International Components for Unicode[28] (IBM-943 is a Code page 897 superset).[29] SUB is assigned to 0x7F.
- ^ "CP00895.pdf" (PDF). IBM.
- ^ a b "CP00896.pdf" (PDF). IBM.
- ^ "Coded character set identifiers - CCSID 896". IBM Globalization. IBM. Archived from the original on 2016-03-26.
- ^ "Coded character set identifiers - CCSID 4992". IBM Globalization. IBM. Archived from the original on 2016-03-27.
- ^ "11.2 - IBM Extended SBCS Set" (PDF). IBM Japanese Graphic Character Set for Extended UNIX Code (EUC). IBM. p. 315.
- ^ "CP01041.pdf" (PDF). IBM.
- ^ "Code Page 00911" (PDF). IBM. Archived from the original (PDF) on 2015-07-08. Retrieved 2021-10-22.
- ^ "Code page identifiers - CP 903". IBM Globalization. IBM. Archived from the original on 2016-03-17.
- ^ "Coded character set identifiers - CCSID 904". IBM Globalization. IBM. Archived from the original on 2016-03-27.
- ^ "CP00904.pdf" (PDF). IBM.
- ^ "CP00903.pdf" (PDF). IBM.
- ^ "Code Page 01042" (PDF). IBM. Archived from the original (PDF) on 2015-07-08.