UTF-EBCDIC
Created by | IBM |
---|---|
Definitions | Unicode Technical Report #16 |
Based on | UTF-8 |
Transforms / Encodes | Unicode |
UTF-EBCDIC is a character encoding capable of encoding all 1,112,064 valid character code points in Unicode using one to five one-byte (8-bit) code units (in contrast to a maximum of four for UTF-8).[1] It is meant to be EBCDIC-friendly, so that legacy EBCDIC applications on mainframes may process the characters without much difficulty. Its advantages for existing EBCDIC-based systems are similar to UTF-8's advantages for existing ASCII-based systems. Details on UTF-EBCDIC are defined in Unicode Technical Report #16.
To produce the UTF-EBCDIC encoded version of a series of Unicode code points, an encoding based on UTF-8 (known in the specification as UTF-8-Mod) is applied first (creating what the specification calls an I8 sequence). The main difference between this encoding and UTF-8 is that it allows Unicode code points U+0080 through U+009F (the
The UTF-8-Mod transformation leaves the data in an ASCII-based format (for example, U+0041 "A" is still encoded as 01000001), so each byte is fed through a reversible (one-to-one) lookup table to produce the final UTF-EBCDIC encoding. For example, 01000001 in this table maps to 11000001; thus the UTF-EBCDIC encoding of U+0041 (Unicode's "A") is 0xC1 (EBCDIC's "A").
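The first step (code point to I8 sequence) can be sketched in Python. The bit layout used here is an assumption based on Unicode Technical Report #16 and is not spelled out in the text above: trail bytes have the form 101xxxxx and carry five bits each, and the length boundaries fall at U+00A0, U+0400, U+4000 and U+40000. The function name `encode_i8` is hypothetical, and the sketch stops at the I8 stage; the final byte-for-byte lookup into UTF-EBCDIC is not included.

```python
def encode_i8(cp):
    """Encode one Unicode scalar value as an I8 (UTF-8-Mod) byte sequence.

    Assumed layout (check against UTR #16):
      U+0000..U+009F     1 byte   (byte value equals the code point)
      U+00A0..U+03FF     2 bytes  110yyyyy 101xxxxx
      U+0400..U+3FFF     3 bytes  1110zzzz 101yyyyy 101xxxxx
      U+4000..U+3FFFF    4 bytes  11110www 101zzzzz 101yyyyy 101xxxxx
      U+40000..U+10FFFF  5 bytes  1111100v 101wwwww 101zzzzz 101yyyyy 101xxxxx
    """
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %r" % (cp,))

    # ASCII and the U+0080..U+009F block stay single bytes.
    if cp < 0xA0:
        return bytes([cp])

    # (largest code point, lead-byte marker, number of trail bytes)
    forms = (
        (0x3FF, 0xC0, 1),
        (0x3FFF, 0xE0, 2),
        (0x3FFFF, 0xF0, 3),
        (0x10FFFF, 0xF8, 4),
    )
    for limit, lead, trails in forms:
        if cp <= limit:
            # The lead byte holds the bits left over above the trail bytes.
            out = [lead | (cp >> (5 * trails))]
            # Each trail byte holds five bits, most significant group first.
            for i in range(trails - 1, -1, -1):
                out.append(0xA0 | ((cp >> (5 * i)) & 0x1F))
            return bytes(out)
    raise AssertionError("unreachable")


print(encode_i8(0x41).hex())      # 41
print(encode_i8(0xA0).hex())      # c5a0
print(encode_i8(0x10FFFF).hex())  # f9a1bfbfbf
```

Under this layout the longest sequence for a valid Unicode code point is five bytes, matching the one-to-five range given in the introduction.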
This encoding form is rarely used, even on the EBCDIC-based mainframes for which it was designed. IBM EBCDIC-based mainframe operating systems, such as z/OS, usually use UTF-16 for complete Unicode support. For example, IBM Db2, COBOL, PL/I, Java and the IBM XML toolkit support UTF-16 on IBM mainframes.
Codepage layout
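UTF-EBCDIC has 160 characters with single-byte encodings (compared to 128 in UTF-8): the 128 ASCII characters plus the 32 C1 control codes U+0080 through U+009F. The single-byte portion follows IBM-1047 rather than IBM-37, as the position of the square brackets shows: UTF-EBCDIC places [ and ] at hex AD and BD, whereas CCSID 37 has them at hex BA and BB. In the table, • marks a trail byte and the digits 2 to 5 mark the lead byte of a sequence of that many bytes.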
 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0x | NUL | SOH | STX | ETX | ST | HT | SSA | DEL | EPA | RI | SS2 | VT | FF | CR | SO | SI |
1x | DLE | DC1 | DC2 | DC3 | OSC | LF | BS | ESA | CAN | EM | PU2 | SS3 | FS | GS | RS | US |
2x | PAD | HOP | BPH | NBH | IND | NEL | ETB | ESC | HTS | HTJ | VTS | PLD | PLU | ENQ | ACK | BEL |
3x | DCS | PU1 | SYN | STS | CCH | MW | SPA | EOT | SOS | SGCI | SCI | CSI | DC4 | NAK | PM | SUB |
4x | SP | • | • | • | • | • | • | • | • | • | • | . | < | ( | + | \| |
5x | & | • | • | • | • | • | • | • | • | • | ! | $ | * | ) | ; | ^ |
6x | - | / | • | • | • | • | • | • | • | • | • | , | % | _ | > | ? |
7x | • | • | • | • | 2 | 2 | 2 | 2 | 2 | ` | : | # | @ | ' | = | " |
8x | 2 | a | b | c | d | e | f | g | h | i | 2 | 2 | 2 | 2 | 2 | 2 |
9x | 2 | j | k | l | m | n | o | p | q | r | 2 | 2 | 2 | 2 | 2 | 2 |
Ax | 2 | ~ | s | t | u | v | w | x | y | z | 2 | 2 | 2 | [ | 2 | 2 |
Bx | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | ] | 3 | 3 |
Cx | { | A | B | C | D | E | F | G | H | I | 3 | 3 | 3 | 3 | 3 | 3 |
Dx | } | J | K | L | M | N | O | P | Q | R | 3 | 3 | 4 | 4 | 4 | 4 |
Ex | \ | 4 | S | T | U | V | W | X | Y | Z | 4 | 4 | 4 | 5 | 5 |  |
Fx | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |  |  |  |  |  | APC |
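The letter and digit positions in the layout can be turned into a small, reversible lookup. The following Python sketch builds only a partial table (space, letters, digits and the two square brackets) read off the rows above; the complete 256-entry table is defined in UTR #16 and is not reproduced here. All names (`RUNS`, `I8_TO_EBCDIC`, `to_utf_ebcdic`, `from_utf_ebcdic`) are hypothetical.

```python
# Contiguous runs from rows 8x-Fx of the layout:
# (characters, UTF-EBCDIC byte of the first character in the run).
RUNS = [
    ("abcdefghi", 0x81), ("jklmnopqr", 0x91), ("stuvwxyz", 0xA2),
    ("ABCDEFGHI", 0xC1), ("JKLMNOPQR", 0xD1), ("STUVWXYZ", 0xE2),
    ("0123456789", 0xF0),
]


def build_partial_table():
    """Map ASCII byte values to UTF-EBCDIC bytes for a subset of characters."""
    # Space and the square brackets sit outside the alphanumeric runs.
    table = {ord(" "): 0x40, ord("["): 0xAD, ord("]"): 0xBD}
    for chars, start in RUNS:
        for offset, ch in enumerate(chars):
            table[ord(ch)] = start + offset
    return table


I8_TO_EBCDIC = build_partial_table()
# The mapping is one-to-one, so inverting the dictionary gives the decoder.
EBCDIC_TO_I8 = {ebcdic: i8 for i8, ebcdic in I8_TO_EBCDIC.items()}


def to_utf_ebcdic(text):
    """Encode text limited to the partial table; raises KeyError otherwise."""
    return bytes(I8_TO_EBCDIC[ord(ch)] for ch in text)


def from_utf_ebcdic(data):
    """Decode bytes produced by to_utf_ebcdic."""
    return "".join(chr(EBCDIC_TO_I8[b]) for b in data)


print(to_utf_ebcdic("A").hex())      # c1
print(to_utf_ebcdic("Hello").hex())  # c885939396
```

Because every character covered here is a single I8 byte, the lookup is applied once per character; multi-byte I8 sequences would pass each lead and trail byte through the full table in the same way.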
Oracle UTFE
Oracle UTFE is a Unicode 3.0 UTF-8
References
- "UTR #16: UTF-EBCDIC". www.unicode.org. Retrieved 2021-02-23. "You need to search at most five bytes (seven bytes, if the full range of 31 bits of ISO/IEC 10646 is considered) backwards"
- Baird, Cathy; Chiba, Dan; Chu, Winson; Fan, Jessica; Ho, Claire; Law, Simon; Lee, Geoff; Linsley, Peter; Matsuda, Keni; Oscroft, Tamzin; Takeda, Shige; Tanaka, Linus; Tozawa, Makoto; Trute, Barry; Tsujimoto, Mayumi; Wu, Ying; Yau, Michael; Yu, Tim; Wang, Chao; Wong, Simon; Zhang, Weiran; Zheng, Lei; Zhu, Yan; Moore, Valarie (2002) [1996]. "Appendix A: Locale Data". Oracle9i Database Globalization Support Guide (PDF) (Release 2 (9.2) ed.). Oracle Corporation. Oracle A96529-01. Archived (PDF) from the original on 2017-02-14. Retrieved 2017-02-14.
External links
- V.S. Umamaheswaran, Unicode Technical Report #16: the definition of UTF-EBCDIC (2002-04-16)