ISO/IEC 2022

ISO 2022
Language(s)	Various.
Standard	ISO/IEC 4873; EUC;

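ISO/IEC 2022 Information technology—Character code structure and extension techniques is an ISO/IEC standard in the field of character encoding, equivalent to the ECMA standard ECMA-35, the ANSI standard ANSI X3.41 and the Japanese Industrial Standard JIS X 0202. Originating in 1971, it was most recently revised in 1994.^[4]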

ISO 2022 specifies a general structure which character encodings can conform to, dedicating particular ranges of bytes (

ISO/IEC 6429, portions of which are implemented by ANSI.SYS and terminal emulators.

ISO 2022 itself also defines particular control codes and escape sequences which can be used for switching between different

8BITMIME).^[8]

Encodings and conformance

The ASCII character set supports the

ISO 8859 series, conform to ISO 2022,^[9]^[10] while others such as DOS code page 437 do not, usually because they do not reserve the bytes 0x80–9F for control codes.

Certain

EUC-JP also make use of ISO 2022 mechanisms.^[11]^[12]

Since the first 256

Unicode transformation formats such as UTF-8 generally deviate from the ISO 2022 structure in various ways, including:

Using 8-bit bytes, but not representing the C1 codes in their single-byte forms specified in ISO 2022 (most UTFs, one exception being the obsolete UTF-1)
Representing all characters, including control codes, with multiple bytes (e.g. UTF-16, UTF-32)
Mixing bytes with the most significant bit set and unset within the coded representation for a single code point (e.g. UTF-1, GB 18030)

ISO 2022 escape sequences do, however, exist for switching to and from UTF-8 as a "coding system different from that of ISO 2022",^[13] which are supported by certain terminal emulators such as xterm.^[14]

Overview

Elements

ISO/IEC 2022 specifies the following:

An infrastructure of multiple character sets with particular structures which may be included in a single character encoding system, including multiple graphical character sets and multiple sets of both primary (C0) and secondary (C1) control codes,^[15]
A format for encoding these sets, assuming that 8 bits are available per byte,^[16]
A format for encoding these sets in the same encoding system when only 7 bits are available per byte,^[17] and a method for transforming any conformant character data to pass through such a 7-bit environment,^[8]
The general structure of ANSI escape codes,^[6] and
Specific escape code formats for identifying individual character sets,^[7] for announcing the use of particular encoding features or subsets,^[18] and for interacting with or switching to other encoding systems.^[18]

Code versions

A specific implementation does not have to implement all of the standard; the conformance level and the supported character sets are defined by the implementation. Although many of the mechanisms defined by the ISO/IEC 2022 standard are infrequently used, several established encodings are based on a subset of the ISO/IEC 2022 system.

MARC 21 library records.^[3]

Designation escape sequences

The escape sequences for switching to particular character sets or encodings are registered with the ISO-IR registry (except for those set apart for private use, the meanings of which are defined by vendors, or by protocol specifications such as ARIB STD-B24) and follow the patterns defined within the standard. Character encodings making use of these escape sequences require data to be processed sequentially in a forward direction, since the correct interpretation of the data depends on previously encountered escape sequences.

Specific profiles such as ISO-2022-JP may impose extra conditions, such as that the current character set is reset to US-ASCII before the end of a line. Furthermore, the escape sequences declaring the national character sets may be absent if a specific ISO-2022-based encoding permits or requires this, and dictates that particular national character sets are to be used. For example, ISO-8859-1 states that no defining escape sequence is needed.

Multi-byte characters

To represent large character sets, ISO/IEC 2022 builds on

CCCII).

For the two-byte character sets, the

kuten^[a] form, which comprises two numbers between 1 and 94 inclusive, specifying a row^[b] and cell^[c] of that character within the zone. For a three-byte set, an additional plane^[d] number is included at the beginning.^[20]
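As an illustration of the row/cell form, the following Python sketch maps a kuten pair onto the two bytes of a 94ⁿ-character set. It assumes the usual layout in which each number from 1 to 94 occupies one of the byte values 0x21–0x7E when the set is invoked over GL, or 0xA1–0xFE over GR; the function names are hypothetical.

```python
from __future__ import annotations


def kuten_to_bytes(row: int, cell: int, gr: bool = False) -> bytes:
    """Encode a kuten (row, cell) pair of a 94^2 set as two bytes.

    Each number 1..94 is offset onto 0x21..0x7E (GL) or 0xA1..0xFE (GR).
    """
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")
    offset = 0xA0 if gr else 0x20
    return bytes([row + offset, cell + offset])


def bytes_to_kuten(pair: bytes) -> tuple[int, int]:
    """Inverse of kuten_to_bytes; accepts either GL or GR bytes."""
    first, second = (b & 0x7F for b in pair)  # strip the high bit used by GR
    return first - 0x20, second - 0x20
```

For example, row 16, cell 1 becomes 0x30 0x21 over GL and 0xB0 0xA1 over GR.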

The escape sequences do not only declare which character set is being used, but also whether the set is single-byte or multi-byte (although not how many bytes it uses if it is multi-byte), and also whether each byte has 94 or 96 permitted values.

Code structure

Notation and nomenclature

ISO/IEC 2022 coding specifies a two-layer mapping between character codes and displayed characters. Escape sequences allow any of a large registry of graphic character sets to be "designated"^[21] into one of four working sets, named G0 through G3, and shorter control sequences specify the working set that is "invoked"^[22] to interpret bytes in the stream.

Encoding byte values ("bit combinations") are often given in column-line notation, where two decimal numbers in the range 00–15 (each corresponding to a single hexadecimal digit) are separated by a slash.^[23] Hence, for instance, codes 2/0 (0x20) through 2/15 (0x2F) inclusive may be referred to as "column 02". This is the notation used in the ISO/IEC 2022 / ECMA-35 standard itself.^[24] They may be described elsewhere using hexadecimal, as is often used in this article, or using the corresponding ASCII characters,^[25] although the escape sequences are actually defined in terms of byte values, and the graphic assigned to that byte value may be altered without affecting the control sequence.
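The column/line notation is simply the high and low hexadecimal digit of the byte, each written in decimal. A small sketch (function names are illustrative only):

```python
def to_column_line(byte: int) -> str:
    """Format a byte in column/line notation, e.g. 0x2F -> '2/15'."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit byte value")
    column, line = divmod(byte, 16)  # high and low hexadecimal digit
    return f"{column}/{line}"


def from_column_line(text: str) -> int:
    """Parse column/line notation back into a byte value, e.g. '4/6' -> 0x46."""
    column, line = (int(part) for part in text.split("/"))
    return column * 16 + line
```

Applied to the bytes `%`, `/` and `F`, it yields `2/5 2/15 4/6`, the form that appears in the SGML formal public identifier quoted later in the article.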

Byte values from the 7-bit ASCII graphic range (hexadecimal 0x20–0x7F), being on the left side of a character code table, are referred to as "GL" codes (with "GL" standing for "graphics left") while bytes from the "high ASCII" range (0xA0–0xFF), if available (i.e. in an 8-bit environment), are referred to as the "GR" codes ("graphics right").^[5] The terms "CL" (0x00–0x1F) and "CR" (0x80–0x9F) are defined for the control ranges, but the CL range always invokes the primary (C0) controls, whereas the CR range always either invokes the secondary (C1) controls or is unused.^[5]
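A minimal sketch of the four regions described above, assuming an 8-bit environment (in a 7-bit environment only CL and GL exist). The function name is hypothetical.

```python
def region_of(byte: int) -> str:
    """Return the code table region (CL, GL, CR or GR) of an 8-bit byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit byte value")
    if byte <= 0x1F:
        return "CL"  # always the primary (C0) control set
    if byte <= 0x7F:
        return "GL"
    if byte <= 0x9F:
        return "CR"  # secondary (C1) controls, or unused
    return "GR"
```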

Fixed coded characters

The

space character SP (0x20) are designated "fixed" coded characters^[26] and are always available when G0 is invoked over GL, irrespective of what character sets are designated. They may not be included in graphical character sets, although other sizes or types of whitespace character may be.^[27]

General syntax of escape sequences

Sequences using the ESC (escape) character take the form ESC [I...] F, where the ESC character is followed by zero or more intermediate bytes^[28] (I) from the range 0x20–0x2F, and one final byte^[29] (F) from the range 0x30–0x7E.^[30]

The first I byte, or absence thereof, determines the type of escape sequence; it might, for instance, designate a working set, or denote a single control function. In all types of escape sequences, F bytes in the range 0x30–0x3F are reserved for unregistered private uses defined by prior agreement between parties.^[31]
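The general ESC [I...] F syntax can be recognised with a simple forward scan. The sketch below is only an illustration under stated assumptions: it handles a single sequence starting at a given offset, does not interpret what the sequence means, and uses hypothetical function names.

```python
from __future__ import annotations


def parse_escape(data: bytes, start: int = 0) -> tuple[bytes, int, int]:
    """Parse one ESC [I...] F sequence beginning at data[start].

    Returns (intermediate_bytes, final_byte, end_index).
    """
    if data[start:start + 1] != b"\x1b":
        raise ValueError("sequence must start with ESC (0x1B)")
    i = start + 1
    while i < len(data) and 0x20 <= data[i] <= 0x2F:
        i += 1  # skip intermediate (I) bytes
    if i >= len(data) or not 0x30 <= data[i] <= 0x7E:
        raise ValueError("missing or invalid final (F) byte")
    return data[start + 1:i], data[i], i + 1


def is_private_use(final: int) -> bool:
    """F bytes 0x30-0x3F are reserved for private use in every sequence type."""
    return 0x30 <= final <= 0x3F
```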

Control functions from some sets may make use of further bytes following the escape sequence proper. For example, the

Control Sequence Introducer", which can be represented using an escape sequence, is followed by zero or more bytes in the range 0x30–0x3F, then zero or more bytes in the range 0x20–0x2F, then by a single byte in the range 0x40–0x7E, the entire sequence being called a "control sequence".^[32]
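A control sequence can be matched against the three byte ranges above with a regular expression. This sketch assumes the 7-bit representation of CSI as the escape sequence ESC [ (0x1B 0x5B); an 8-bit stream might instead use the single C1 byte 0x9B. The function name is hypothetical.

```python
from __future__ import annotations

import re

# CSI written as ESC [ (0x1B 0x5B), then parameter bytes 0x30-0x3F,
# intermediate bytes 0x20-0x2F and a single final byte 0x40-0x7E.
CONTROL_SEQUENCE = re.compile(rb"\x1b\[([\x30-\x3f]*)([\x20-\x2f]*)([\x40-\x7e])")


def parse_control_sequence(data: bytes) -> tuple[bytes, bytes, bytes, int] | None:
    """Return (parameters, intermediates, final, length), or None if no match."""
    match = CONTROL_SEQUENCE.match(data)
    if match is None:
        return None
    parameters, intermediates, final = match.groups()
    return parameters, intermediates, final, match.end()
```

The "Graphic Character Combination" sequence CSI 0x20 0x5F, for instance, parses as no parameter bytes, one intermediate byte (SP) and the final byte `_`.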

Graphical character sets

Each of the four working sets G0 through G3 may be a 94-character set or a 94ⁿ-character multi-byte set. Additionally, G1 through G3 may be a 96- or 96ⁿ-character set.

In a 96- or 96ⁿ-character set, the bytes 0x20 through 0x7F when GL-invoked, or 0xA0 through 0xFF when GR-invoked, are allocated to and may be used by the set. In a 94- or 94ⁿ-character set, the bytes 0x20 and 0x7F are not used.^[33] When a 96- or 96ⁿ-character set is invoked in the GL region, the space and delete characters (codes 0x20 and 0x7F) are not available until a 94- or 94ⁿ-character set (such as the G0 set) is invoked in GL.^[5] 96-character sets cannot be designated to G0.
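The byte ranges available to a set can be summarised in a few lines. This sketch assumes that the GR ranges mirror the GL ones with the high bit set (so a 94-character set invoked over GR leaves 0xA0 and 0xFF unused); the function name is hypothetical.

```python
def usable_bytes(set_size: int, region: str) -> range:
    """Byte values a 94- or 96-character set (or 94^n / 96^n set) may use."""
    base = {"GL": 0x20, "GR": 0xA0}[region]
    if set_size == 96:
        return range(base, base + 96)      # 0x20-0x7F or 0xA0-0xFF
    if set_size == 94:
        return range(base + 1, base + 95)  # leaves 0x20/0x7F (or 0xA0/0xFF) unused
    raise ValueError("set_size must be 94 or 96")
```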

Registration of a set as a 96-character set does not necessarily mean that the 0x20/A0 and 0x7F/FF bytes are actually assigned by the set; some examples of graphical character sets which are registered as 96-sets but do not use those bytes include the G1 set of

CCITT).^[36]

Combining characters

Characters are expected to be spacing characters, not combining characters, unless specified otherwise by the graphical set in question.^[37] ISO 2022 / ECMA-35 also recognizes the use of the backspace and carriage return control characters as means of combining otherwise spacing characters, as well as the CSI sequence "Graphic Character Combination" (GCC)^[37] (CSI 0x20 (SP) 0x5F (_)).^[38]

Use of the backspace and carriage return in this manner is prohibited by ISO/IEC 4873 / ECMA-43^[39] and by ISO/IEC 8859,^[40]^[41] on the basis that it leaves the graphical character repertoire undefined. ISO/IEC 4873 / ECMA-43 does, however, permit the use of the GCC function provided that the sequence of characters is kept the same and merely displayed in one space, rather than being over-stamped to form a character with a different meaning.^[42]

Control character sets

Control character sets are classified as "primary" or "secondary" control code sets,^[43] respectively also called "C0" and "C1" control code sets.^[44]

A C0 control set must contain the ESC (escape) control character at 0x1B^[45] (a C0 set containing only ESC is registered as ISO-IR-104),^[46] whereas a C1 control set may not contain the escape control whatsoever.^[33] Hence, they are entirely separate registrations, with a C0 set being only a C0 set and a C1 set being only a C1 set.^[44]

If codes from the C0 set of ISO 6429 / ECMA-48, i.e. the ASCII control codes, appear in the C0 set, they are required to appear at their ISO 6429 / ECMA-48 locations.^[45] Inclusion of transmission control characters in the C0 set, besides the ten included by ISO 6429 / ECMA-48 (namely SOH, STX, ETX, EOT, ENQ, ACK, DLE, NAK, SYN and ETB),^[47] or inclusion of any of those ten in the C1 set, is also prohibited by the ISO/IEC 2022 / ECMA-35 standard.^[45]^[33]

A C0 control set is invoked over the CL range 0x00 through 0x1F,^[48] whereas a C1 control function may be invoked over the CR range 0x80 through 0x9F (in an 8-bit environment) or by using escape sequences (in a 7-bit or 8-bit environment),^[43] but not both. Which style of C1 invocation is used must be specified in the definition of the code version.^[49] For example, ISO/IEC 4873 specifies CR bytes for the C1 controls which it uses (SS2 and SS3).^[50] If necessary, which invocation is used may be communicated using announcer sequences.

In the latter case, single control functions from the C1 control code set are invoked using "type Fe" escape sequences,^[33] meaning those where the ESC control character is followed by a byte from columns 04 or 05 (that is to say, ESC 0x40 (@) through ESC 0x5F (_)).^[51]
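Because the type Fe final bytes 0x40–0x5F line up with the C1 range 0x80–0x9F, converting between the two representations is a fixed offset of 0x40. A sketch, with hypothetical function names:

```python
def fe_to_c1(final: int) -> int:
    """Map the final byte of a type Fe escape (ESC 0x40-0x5F) to its 8-bit C1 byte."""
    if not 0x40 <= final <= 0x5F:
        raise ValueError("type Fe final byte must be in 0x40-0x5F")
    return final + 0x40


def c1_to_fe(byte: int) -> bytes:
    """Represent an 8-bit C1 byte (0x80-0x9F) as a 7-bit ESC Fe sequence."""
    if not 0x80 <= byte <= 0x9F:
        raise ValueError("C1 byte must be in 0x80-0x9F")
    return bytes([0x1B, byte - 0x40])
```

This maps, for example, ESC N to 0x8E (SS2) and ESC A to 0x81.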

Other control functions

Additional control functions are assigned to "type Fs" escape sequences (in the range ESC 0x60 (`) through ESC 0x7E (~)); these have permanently assigned meanings rather than depending on the C0 or C1 designations.^[51]^[52] Registration of control functions to type "Fs" sequences must be approved by ISO/IEC JTC 1/SC 2.^[52] Other single control functions may be registered to type "3Ft" escape sequences (in the range ESC 0x23 (#) [I...] 0x40 (@) through ESC 0x23 (#) [I...] 0x7E (~)),^[53] although no "3Ft" sequences are currently assigned (as of 2019).^[54] Some of these are specified in ECMA-35 (ISO 2022 / ANSI X3.41), others in ECMA-48 (ISO 6429 / ANSI X3.64).^[55] ECMA-48 refers to these as "independent control functions".^[56]

Code	Hex	Abbr.	Name	Effect^[54]
`` ESC ` ``	`1B 60`	DMI	Disable manual input	Disables some or all of the manual input facilities of the device.
`ESC a`	`1B 61`	INT	Interrupt	Interrupts the current process.
`ESC b`	`1B 62`	EMI	Enable manual input	Enables the manual input facilities of the device.
`ESC c`	`1B 63`	RIS	Reset to initial state	Resets the device to its state after being powered on.^[57]
`ESC d`	`1B 64`	CMD	Coding method delimiter	Used when interacting with an outer coding / representation system, see below.
`ESC n`	`1B 6E`	LS2	Locking shift two	Shift function, see below.
`ESC o`	`1B 6F`	LS3	Locking shift three	Shift function, see below.
`ESC |`	`1B 7C`	LS3R	Locking shift three right	Shift function, see below.
`ESC }`	`1B 7D`	LS2R	Locking shift two right	Shift function, see below.
`ESC ~`	`1B 7E`	LS1R	Locking shift one right	Shift function, see below.

Escape sequences of type "Fp" (ESC 0x30 (0) through ESC 0x3F (?)) or of type "3Fp" (ESC 0x23 (#) [I...] 0x30 (0) through ESC 0x23 (#) [I...] 0x3F (?)) are reserved for single private use control codes, by prior agreement between parties.^[58] Several such sequences of both types are used by DEC terminals such as the VT100, and are thus supported by terminal emulators.^[14]

Shift functions

By default, GL codes specify G0 characters and GR codes (where available) specify G1 characters; this may be otherwise specified by prior agreement. The set invoked over each area may also be modified with control codes referred to as shifts, as shown in the table below.^[59]

An 8-bit code may have GR codes specifying G1 characters, i.e. with its corresponding 7-bit code using

T.51).^[61]

The codes shown in the table below are the most common encodings of these control codes, conforming to ISO/IEC 6429. The LS2, LS3, LS1R, LS2R and LS3R shifts are registered as single control functions and are always encoded as the escape sequences listed below,^[54] whereas the others are part of a C0 or C1 control code set (as shown below, SI (LS0) and SO (LS1) are C0 controls and SS2 and SS3 are C1 controls), meaning that their coding and availability may vary depending on which control sets are designated: they must be present in the designated control sets if their functionality is used.^[48]^[49] The C1 controls themselves, as mentioned above, may be represented using escape sequences or 8-bit bytes, but not both.

Alternative encodings of the single-shifts as C0 control codes are available in certain control code sets. For example, SS2 and SS3 are usually available at 0x19 and 0x1D respectively in

ISO/IEC 4873 levels 2 and 3.^[69]

Code	Hex	Abbr.	Name	Effect
`SI`	`0F`	SI LS0	Shift In Locking shift zero	GL encodes G0 from now on^[70]^[71]
`SO`	`0E`	SO LS1	Shift Out Locking shift one	GL encodes G1 from now on^[70]^[71]
`ESC n`	`1B 6E`	LS2	Locking shift two	GL encodes G2 from now on^[70]^[71]
`ESC o`	`1B 6F`	LS3	Locking shift three	GL encodes G3 from now on^[70]^[71]
CR area: `SS2` Escape code: `ESC N`	CR area: `8E` Escape code: `1B 4E`	SS2	Single shift two	GL or GR (see below) encodes G2 for the immediately following character only^[72]
CR area: `SS3` Escape code: `ESC O`	CR area: `8F` Escape code: `1B 4F`	SS3	Single shift three	GL or GR (see below) encodes G3 for the immediately following character only^[72]
`ESC ~`	`1B 7E`	LS1R	Locking shift one right	GR encodes G1 from now on^[73]
`ESC }`	`1B 7D`	LS2R	Locking shift two right	GR encodes G2 from now on^[73]
`ESC |`	`1B 7C`	LS3R	Locking shift three right	GR encodes G3 from now on^[73]

Although officially considered shift codes and named accordingly, single-shift codes are not always viewed as shifts,

ISO/IEC 4873 specifies GL, whereas packed EUC specifies GR. In 7-bit environments, only GL is used as the single-shift area.^[74]^[75] If necessary, which single-shift area is used may be communicated using announcer sequences.

The names "locking shift zero" (LS0) and "locking shift one" (LS1) refer to the same pair of C0 control characters (0x0F and 0x0E) as the names "shift in" (SI) and "shift out" (SO). However, the standard refers to them as LS0 and LS1 when they are used in 8-bit environments and as SI and SO when they are used in 7-bit environments.^[59]

The ISO/IEC 2022 / ECMA-35 standard permits, but discourages, invoking G1, G2 or G3 in both GL and GR simultaneously.^[76]
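The locking and single shifts can be modelled as a small state machine over GL. The sketch below is a simplification: it assumes a 7-bit environment, single-byte graphical sets only, the escape-sequence forms ESC N and ESC O for SS2 and SS3, and no designation sequences in the data. It reports which working set each graphic byte belongs to rather than decoding characters, and the generator name is hypothetical.

```python
def gl_working_sets(data: bytes):
    """Yield (working_set, byte) for each byte of a 7-bit stream (sketch)."""
    locking = "G0"   # set invoked over GL by the last locking shift
    single = None    # set invoked by a pending single shift, if any
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x0F:                          # SI / LS0
            locking = "G0"
        elif b == 0x0E:                        # SO / LS1
            locking = "G1"
        elif b == 0x1B and i + 1 < len(data):
            i += 1
            shift = {0x6E: "LS2", 0x6F: "LS3", 0x4E: "SS2", 0x4F: "SS3"}.get(data[i])
            if shift is None:
                raise ValueError(f"unsupported escape sequence ESC 0x{data[i]:02X}")
            if shift.startswith("LS"):
                locking = "G" + shift[-1]      # ESC n -> G2, ESC o -> G3
            else:
                single = "G" + shift[-1]       # ESC N -> G2, ESC O -> G3
        elif 0x21 <= b <= 0x7E:                # graphic byte
            yield (single or locking), b
            single = None                      # a single shift lasts one character
        elif b in (0x20, 0x7F):
            yield "fixed", b                   # SP/DEL; assumes a 94-set is in GL
        else:
            yield "C0", b                      # other C0 control characters
        i += 1
```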

Registration of graphical and control code sets

The ISO International register of coded character sets to be used with escape sequences (ISO-IR) lists graphical character sets, control code sets, single control codes and so forth which have been registered for use with ISO/IEC 2022. The procedure for registering codes and sets with the ISO-IR registry is specified by ISO/IEC 2375. Each registration receives a unique escape sequence, and a unique registry entry number to identify it.

Simplified Chinese is known as ISO-IR-165.

Registration of coded character sets with the ISO-IR registry identifies the documents specifying the character set or control function associated with an ISO/IEC 2022 non‑private-use escape sequence. This may be a standard document; however, registration does not create a new ISO standard, does not commit the ISO or IEC to adopt it as an international standard, and does not commit the ISO or IEC to add any of its characters to the Universal Coded Character Set.^[79]

ISO-IR registered escape sequences are also used encapsulated in a

ISO 646-1983,^[80] and the HTML 4.01 specification uses ISO Registration Number 177//CHARSET ISO/IEC 10646-1:1993 UCS-4 with implementation level 3//ESC 2/5 2/15 4/6 to identify Unicode.^[81] The textual representation of the escape sequence, included in the third element of the FPI, will be recognised by SGML implementations for supported character sets.^[80]

Character set designations

Escape sequences to designate character sets take the form ESC I [I...] F. As mentioned above, the intermediate (I) bytes are from the range 0x20–0x2F, and the final (F) byte is from the range 0x30–0x7E. The first I byte (or, for a multi-byte set, the first two) identifies the type of character set and the working set it is to be designated to, whereas the F byte (and any additional I bytes) identify the character set itself, as assigned in the ISO-IR register (or, for the private-use escape sequences, by prior agreement).

Additional I bytes may be added before the F byte to extend the F byte range. This is currently only used with 94-character sets, where codes of the form ESC ( ! F have been assigned.^[82] At the other extreme, no multibyte 96-sets have been registered, so the multibyte 96-set designation sequences (G1DM6, G2DM6 and G3DM6) in the table below are strictly theoretical.

As with other escape sequence types, the range 0x30–0x3F is reserved for private-use F bytes,^[31] in this case for private-use character set definitions (which might include unregistered sets defined by protocols such as ARIB STD-B24^[83] or MARC-8,^[3] or vendor-specific sets such as DEC Special Graphics).^[84] However, in a graphical set designation sequence, if the second I byte (for a single-byte set) or the third I byte (for a double-byte set) is 0x20 (space), the set denoted is a "dynamically redefinable character set" (DRCS) defined by prior agreement,^[85] which is also considered private use.^[31] A graphical set being considered a DRCS implies that it represents a font of exact glyphs, rather than a set of abstract characters.^[86] The manner in which DRCS sets and associated fonts are transmitted, allocated and managed is not stipulated by ISO/IEC 2022 / ECMA-35 itself, although it recommends allocating them sequentially starting with F byte 0x40 (@);^[87] however, a manner for transmitting DRCS fonts is defined within some telecommunication protocols such as World System Teletext.^[88]

There are also three special cases for multi-byte codes. The code sequences ESC $ @, ESC $ A, and ESC $ B were all registered when the contemporary version of the standard allowed multi-byte sets only in G0, and so must be accepted in place of the sequences ESC $ ( @ through ESC $ ( B to designate to the G0 character set.^[89]

There are additional (rarely used) features for switching control character sets, but this is a single-level lookup, in that (as noted above) the C0 set is always invoked over CL, and the C1 set is always invoked over CR or by using escape codes. As noted above, any C0 character set is required to include the ESC character at position 0x1B, so that further changes are possible. The control set designation sequences (as opposed to the graphical set ones) may also be used from within ISO/IEC 10646 (UCS/Unicode), in contexts where processing ANSI escape codes is appropriate, provided that each byte in the sequence is padded to the code unit size of the encoding.^[90]

A table of escape sequence I bytes and the designation or other function which they perform is below.^[91]

Code	Hex	Abbr.	Name	Effect	Example
`ESC SP F`	`1B 20 F`	ACS	Announce code structure	Specifies code features used, e.g. working sets (see below).^[92]	`ESC SP L` (ISO 4873 level 1)
`ESC ! F`	`1B 21 F`	CZD	C0-designate	`F` selects a C0 control character set to be used.^[93]	`ESC ! @` (ASCII C0 codes)
`ESC " F`	`1B 22 F`	C1D	C1-designate	`F` selects a C1 control character set to be used.^[94]	`ESC " C` (ISO 6429 C1 codes)
`ESC # F`	`1B 23 F`	-	(Single control function)	(Reserved for sequences for control functions, see above.)	`ESC # 6` (private use: DEC Double Width Line)^[95]
`ESC $ F`^[e] `ESC $ ( F`	`1B 24 F`^[e] `1B 24 28 F`	GZDM4	G0-designate multibyte 94-set	`F` selects a 94ⁿ-character set to be used for G0.^[89]	`ESC $ ( C` (KS X 1001 in G0)
`ESC $ ) F`	`1B 24 29 F`	G1DM4	G1-designate multibyte 94-set	`F` selects a 94ⁿ-character set to be used for G1.^[89]	`ESC $ ) A` (GB 2312 in G1)
`ESC $ * F`	`1B 24 2A F`	G2DM4	G2-designate multibyte 94-set	`F` selects a 94ⁿ-character set to be used for G2.^[89]	`ESC $ * B` (JIS X 0208 in G2)
`ESC $ + F`	`1B 24 2B F`	G3DM4	G3-designate multibyte 94-set	`F` selects a 94ⁿ-character set to be used for G3.^[89]	`ESC $ + D` (JIS X 0212 in G3)
`ESC $ , F`	`1B 24 2C F`	-	(not used)	(not used)^[f]	-
`ESC $ - F`	`1B 24 2D F`	G1DM6	G1-designate multibyte 96-set	`F` selects a 96ⁿ-character set to be used for G1.^[89]	`ESC $ - 1` (private use)
`ESC $ . F`	`1B 24 2E F`	G2DM6	G2-designate multibyte 96-set	`F` selects a 96ⁿ-character set to be used for G2.^[89]	`ESC $ . 2` (private use)
`ESC $ / F`	`1B 24 2F F`	G3DM6	G3-designate multibyte 96-set	`F` selects a 96ⁿ-character set to be used for G3.^[89]	`ESC $ / 3` (private use)
`ESC % F`	`1B 25 F`	DOCS	Designate other coding system	Switches coding system, see below.	`ESC % G` (UTF-8)
`ESC & F`	`1B 26 F`	IRR	Identify revised registration	Prefixes designation escape to denote revision.^[g]	`ESC & @ ESC $ B` (JIS X 0208:1990 in G0)
`ESC ' F`	`1B 27 F`	-	(not used)	(not used)	-
`ESC ( F`	`1B 28 F`	GZD4	G0-designate 94-set	`F` selects a 94-character set to be used for G0.^[89]	`ESC ( B` (ASCII in G0)
`ESC ) F`	`1B 29 F`	G1D4	G1-designate 94-set	`F` selects a 94-character set to be used for G1.^[89]	`ESC ) I` (JIS X 0201 Kana in G1)
`ESC * F`	`1B 2A F`	G2D4	G2-designate 94-set	`F` selects a 94-character set to be used for G2.^[89]	`ESC * v` (ITU T.61 RHS in G2)
`ESC + F`	`1B 2B F`	G3D4	G3-designate 94-set	`F` selects a 94-character set to be used for G3.^[89]	`ESC + D` (NATS-SEFI-ADD in G3)
`ESC , F`	`1B 2C F`	-	(not used)	(not used)^[h]	-
`ESC - F`	`1B 2D F`	G1D6	G1-designate 96-set	`F` selects a 96-character set to be used for G1.^[89]	`ESC - A` (ISO 8859-1 RHS in G1)
`ESC . F`	`1B 2E F`	G2D6	G2-designate 96-set	`F` selects a 96-character set to be used for G2.^[89]	`ESC . B` (ISO 8859-2 RHS in G2)
`ESC / F`	`1B 2F F`	G3D6	G3-designate 96-set	`F` selects a 96-character set to be used for G3.^[89]	`ESC / b` (ISO 8859-15 RHS in G3)

Note that the registry of F bytes is independent for the different types. The 94-character graphic set designated by ESC ( A through ESC + A is not related in any way to the 96-character set designated by ESC - A through ESC / A. And neither of those is related to the 94ⁿ-character set designated by ESC $ ( A through ESC $ + A, and so on; the final bytes must be interpreted in context. (Indeed, without any intermediate bytes, ESC A is a way of specifying the C1 control code 0x81.)
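The context-dependence of the final byte can be made explicit by classifying a designation sequence by its intermediate bytes first. The following sketch (hypothetical names) covers the graphical set designations from the table above, including the three legacy short forms ESC $ @, ESC $ A and ESC $ B; it does not distinguish DRCS sets or handle the IRR prefix.

```python
from __future__ import annotations

# First intermediate byte(s) of a graphic set designation -> (working set, set type).
DESIGNATORS = {
    b"(": ("G0", "94"), b")": ("G1", "94"), b"*": ("G2", "94"), b"+": ("G3", "94"),
    b"-": ("G1", "96"), b".": ("G2", "96"), b"/": ("G3", "96"),
    b"$(": ("G0", "94^n"), b"$)": ("G1", "94^n"),
    b"$*": ("G2", "94^n"), b"$+": ("G3", "94^n"),
    b"$-": ("G1", "96^n"), b"$.": ("G2", "96^n"), b"$/": ("G3", "96^n"),
}


def classify_designation(seq: bytes) -> tuple[str, str, bytes]:
    """Return (working set, set type, final-byte key) for ESC I [I...] F."""
    if len(seq) < 3 or seq[0] != 0x1B:
        raise ValueError("not a designation escape sequence")
    body, final = seq[1:-1], seq[-1]
    if body == b"$" and final in b"@AB":
        body = b"$("  # legacy short forms ESC $ @, ESC $ A, ESC $ B designate G0
    prefix = body[:2] if body.startswith(b"$") else body[:1]
    if prefix not in DESIGNATORS:
        raise ValueError(f"unsupported intermediate bytes {body!r}")
    working_set, set_type = DESIGNATORS[prefix]
    # Any further I bytes (e.g. the "!" in ESC ( ! F) extend the final-byte range.
    return working_set, set_type, body[len(prefix):] + bytes([final])
```

The same key `A` comes back for both ESC ( A and ESC - A, but paired with a different set type, which is what keeps the two registrations apart.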

Also note that C0 and C1 control character sets are independent; the C0 control character set designated by ESC ! A (which happens to be the NATS control set for newspaper text transmission) is not the same as the C1 control character set designated by ESC " A (the CCITT attribute control set for Videotex).

Interaction with other coding systems

The standard also defines a way to specify coding systems that do not follow its own structure.

A sequence is also defined for returning to ISO/IEC 2022; the registrations which support this sequence as encoded in ISO/IEC 2022 comprise (as of 2019) various

Unicode/UCS formats, or subsets thereof.^[101]

Code	Hex	Abbr.	Name	Effect
`ESC % @`	`1B 25 40`	DOCS	Designate other coding system ("standard return")	Return to ISO/IEC 2022 from another encoding.^[100]
`ESC % F`	`1B 25 F`	DOCS	Designate other coding system ("with standard return")^[99]	`F` selects an 8-bit code; use `ESC % @` to return.^[100]
`ESC % / F`	`1B 25 2F F`	DOCS	Designate other coding system ("without standard return")^[101]	`F` selects an 8-bit code; there is no standard way to return.^[100]
`ESC d`	`1B 64`	CMD	Coding method delimiter	Denotes the end of an ISO/IEC 2022 coded sequence.^[102]

Of particular interest are the sequences which switch to ISO/IEC 10646 (Unicode) formats which do not follow the ISO/IEC 2022 structure. These include UTF-8 (which does not reserve the range 0x80–0x9F for control characters), its predecessor UTF-1 (which mixes GR and GL bytes in multi-byte codes), and UTF-16 and UTF-32 (which use wider coding units).^[99]^[101]

Several codes were also registered for subsets (levels 1 and 2) of UTF-8, UTF-16 and UTF-32, as well as for three levels of

big-endian formats of UTF-16 and UTF-32 are designated by their escape sequences.^[104]

Unicode Format	Code(s)	Hex^[103]	Deprecated codes	Deprecated hex^[99]^[101]^[103]
UTF-1	(UTF-1 not in current ISO/IEC 10646.)		`ESC % B`	`1B 25 42`
UTF-8	`ESC % G`, `ESC % / I`	`1B 25 47`,^[13] `1B 25 2F 49`^[105]	`ESC % / G`, `ESC % / H`	`1B 25 2F 47`, `1B 25 2F 48`
UTF-16	`ESC % / L`	`1B 25 2F 4C`^[106]	`ESC % / @`, `ESC % / C`, `ESC % / E`, `ESC % / J`, `ESC % / K`	`1B 25 2F 40`, `1B 25 2F 43`, `1B 25 2F 45`, `1B 25 2F 4A`, `1B 25 2F 4B`
UTF-32	`ESC % / F`	`1B 25 2F 46`	`ESC % / A`, `ESC % / D`	`1B 25 2F 41`, `1B 25 2F 44`

Of the sequences switching to UTF-8, ESC % G is the one supported by, for example, xterm.^[14]

Although a variant of the standard return sequence may be used from UTF-16 and UTF-32, each byte of the escape sequence must be padded to the size of the encoding's code unit (e.g. 001B 0025 0040 for UTF-16); that is, the coded form of the standard return sequence does not conform exactly to ISO/IEC 2022. For this reason, the designations for UTF-16 and UTF-32 use the without-standard-return syntax.^[107]
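Padding a sequence to the code unit size amounts to widening each byte into one code unit. A sketch, assuming big-endian code units (the function name is hypothetical):

```python
def pad_escape(sequence: bytes, code_unit_size: int, byteorder: str = "big") -> bytes:
    """Widen each byte of an escape sequence to one code unit of the given size."""
    return b"".join(byte.to_bytes(code_unit_size, byteorder) for byte in sequence)
```

For UTF-16 this reproduces 001B 0025 0040, the same bytes obtained by encoding the three characters ESC % @ as UTF-16BE.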

For specifying encodings by labels, the Compound Text format defines five private-use DOCS sequences.^[108]

Code structure announcements

The sequence "announce code structure" (ESC SP (0x20) F) is used to announce a specific code structure, or a specific group of ISO 2022 facilities which are used in a particular code version. Although announcements can be combined, certain contradictory combinations (specifically, using locking shift announcements 16–23 with announcements 1, 3 and 4) are prohibited by the standard, as is using additional announcements on top of ISO/IEC 4873 level announcements 12–14^[92] (which fully specify the permissible structural features). Announcement sequences are as follows:

Number	Code	Hex	Code version feature announced^[92]
1	`ESC SP A`	`1B 20 41`	G0 in GL, GR absent or unused, no locking shifts.
2	`ESC SP B`	`1B 20 42`	G0 and G1 invoked to GL by locking shifts, GR absent or unused.
3	`ESC SP C`	`1B 20 43`	G0 in GL, G1 in GR, no locking shifts, requires an 8-bit environment.
4	`ESC SP D`	`1B 20 44`	G0 in GL, G1 in GR if 8-bit, no locking shifts unless in a 7-bit environment.
5	`ESC SP E`	`1B 20 45`	Shift functions preserved during 7-bit/8-bit conversion.
6	`ESC SP F`	`1B 20 46`	C1 controls using escape sequences.
7	`ESC SP G`	`1B 20 47`	C1 controls in CR region in 8-bit environments, as escape sequences otherwise.
8	`ESC SP H`	`1B 20 48`	94-character graphical sets only.
9	`ESC SP I`	`1B 20 49`	94-character and/or 96-character graphical sets.
10	`ESC SP J`	`1B 20 4A`	Uses a 7-bit code, even if an eighth bit is available for use.
11	`ESC SP K`	`1B 20 4B`	Requires an 8-bit code.
12	`ESC SP L`	`1B 20 4C`	Complies with ISO/IEC 4873 (ECMA-43) level 1.
13	`ESC SP M`	`1B 20 4D`	Complies with ISO/IEC 4873 (ECMA-43) level 2.
14	`ESC SP N`	`1B 20 4E`	Complies with ISO/IEC 4873 (ECMA-43) level 3.
16	`ESC SP P`	`1B 20 50`	SI / LS0 used.
18	`ESC SP R`	`1B 20 52`	SO / LS1 used.
19	`ESC SP S`	`1B 20 53`	LS1R used in 8-bit environments, SO used in 7-bit environments.
20	`ESC SP T`	`1B 20 54`	LS2 used.
21	`ESC SP U`	`1B 20 55`	LS2R used in 8-bit environments, LS2 used in 7-bit environments.
22	`ESC SP V`	`1B 20 56`	LS3 used.
23	`ESC SP W`	`1B 20 57`	LS3R used in 8-bit environments, LS3 used in 7-bit environments.
26	`ESC SP Z`	`1B 20 5A`	SS2 used.
27	`ESC SP [`	`1B 20 5B`	SS3 used.
28	`ESC SP \`	`1B 20 5C`	Single-shifts invoke over GR.
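In the table above, each announcement number equals the final byte minus 0x40 (A = 1, L = 12, \ = 28). The sketch below relies on that observed pattern; the function names are hypothetical, and the accepted range of final bytes is an assumption rather than something the table states.

```python
def acs_number(final: int) -> int:
    """Announcement number of ESC SP F, following the F - 0x40 pattern in the table."""
    if not 0x41 <= final <= 0x7E:
        raise ValueError("final byte outside the range assumed here")
    return final - 0x40


def announce(number: int) -> bytes:
    """Build the ESC SP F announcer for a given announcement number."""
    return b"\x1b\x20" + bytes([number + 0x40])
```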

ISO/IEC 2022 code versions

cross site scripting

attacks.)

Six 7-bit ISO 2022 code versions (ISO-2022-CN, ISO-2022-CN-EXT, ISO-2022-JP, ISO-2022-JP-1, ISO-2022-JP-2 and ISO-2022-KR) are defined by

replacement character,^[113] due to concerns about code injection attacks such as cross-site scripting.^[111]^[113]

8-bit code versions include Extended Unix Code.^[11]^[12] The ISO/IEC 8859 encodings also follow ISO 2022, in a subset stipulated in ISO/IEC 4873.^[9]^[10]

Japanese e-mail versions

ISO-2022-JP

ISO-2022-JP is a widely used encoding for Japanese, in particular in

IETF RFC 1468, dated 1993.^[114] It has an advantage over other encodings for Japanese in that it does not require 8-bit clean transmission. Microsoft calls it Code page 50220.^[115]

It starts in ASCII and includes the following escape sequences:

ESC ( B to switch to ASCII (1 byte per character)
ESC ( J to switch to JIS X 0201-1976 (ISO/IEC 646:JP) Roman set (1 byte per character)
ESC $ @ to switch to JIS X 0208-1978 (2 bytes per character)
ESC $ B to switch to JIS X 0208-1983 (2 bytes per character)
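Because interpretation depends on the most recent escape sequence, an ISO-2022-JP stream has to be read front to back. The following sketch (hypothetical names) splits a byte string into runs tagged with the character set in effect, recognising only the four escape sequences listed above; it does not decode the JIS X 0208 byte pairs itself.

```python
from __future__ import annotations

# The four escape sequences permitted by ISO-2022-JP (RFC 1468).
ISO2022JP_ESCAPES = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201-1976 Roman",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
}


def split_iso2022jp(data: bytes) -> list[tuple[str, bytes]]:
    """Split ISO-2022-JP bytes into (character set, payload) runs.

    The stream starts in ASCII; ESC never occurs inside a payload, so any
    3-byte match below is a genuine escape sequence.
    """
    runs = []
    current, start, i = "ASCII", 0, 0
    while i < len(data):
        escape = data[i:i + 3]
        if escape in ISO2022JP_ESCAPES:
            if i > start:
                runs.append((current, data[start:i]))
            current = ISO2022JP_ESCAPES[escape]
            i += 3
            start = i
        else:
            i += 1
    if start < len(data):
        runs.append((current, data[start:]))
    return runs
```

Python's standard `iso2022_jp` codec is a convenient way to produce test input for such a sketch.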

Use of the two characters added in JIS X 0208-1990 is permitted, but without including the IRR sequence, i.e. using the same escape sequence as JIS X 0208-1983.^[114] Also, because they were registered before multi-byte sets could be designated to any working set other than G0, the escapes for JIS X 0208 omit the second I byte, ( (0x28).^[89]

The RFC notes that some existing systems did not distinguish ESC ( B from ESC ( J, or did not distinguish ESC $ @ from ESC $ B, but stipulates that the escape sequences should not be changed by systems simply relaying messages such as e-mails.

ISO 646 and World System Teletext).^[114]^[i]

Versions with halfwidth katakana

Use of ESC ( I to switch to the

EUC-JP);^[117]^[118] this is close in both name and structure to an encoding denoted ISO-2022-JPext by DEC, which furthermore adds a two-byte user-defined region accessed with ESC $ ( 0 to complete the coverage of Super DEC Kanji.^[119] The WHATWG/HTML5 variant permits decoding JIS X 0201 katakana in ISO-2022-JP input, but converts the characters to their JIS X 0208 equivalents upon encoding.^[116] Microsoft's code page for ISO-2022-JP with JIS X 0201 kana additionally permitted is Code page 50221.^[115]

Other, older variants known as JIS7 and JIS8 build directly on the 7-bit and 8-bit encodings defined by JIS X 0201 and allow use of JIS X 0201 kana from G1 without escape sequences, using Shift Out and Shift In or setting the eighth bit (GR-invoked), respectively.^[120] They are not widely used;^[120] JIS X 0208 support in extended 8-bit JIS X 0201 is more commonly achieved via Shift JIS. Microsoft's code page for JIS X 0201-based ISO 2022 with single-byte katakana via Shift Out and Shift In is Code page 50222.^[115]

ISO-2022-JP-2

ISO-2022-JP-2 is a multilingual extension of ISO-2022-JP, defined in RFC 1554 (dated 1993), which permits the following escape sequences in addition to the ISO-2022-JP ones. The ISO/IEC 8859 parts are 96-character sets which cannot be designated to G0, and are accessed from G2 using the 7-bit escape sequence form of the single-shift code SS2:^[121]

ESC $ A to switch to GB 2312-1980 (2 bytes per character)
ESC $ ( C to switch to KS X 1001-1992 (2 bytes per character)
ESC $ ( D to switch to JIS X 0212-1990 (2 bytes per character)
ESC . A to switch to ISO/IEC 8859-1 high part, Extended Latin 1 set (1 byte per character) [designated to G2]
ESC . F to switch to ISO/IEC 8859-7 high part, Basic Greek set (1 byte per character) [designated to G2]
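The single-shift mechanism for the G2 sets can be illustrated with a short encoder sketch. It uses Python's built-in `latin-1` and `iso8859_7` codecs to find each character's high-part byte; the function `encode_g2` is hypothetical, handles only one G2 set per call, and ignores line structure.

```python
# G2 designations used by ISO-2022-JP-2 for the ISO/IEC 8859 high parts,
# paired with the Python codec that gives each character's 8-bit code.
G2_SETS = {
    "latin-1": (b"\x1b.A", "latin-1"),
    "greek": (b"\x1b.F", "iso8859_7"),
}
SS2_7BIT = b"\x1bN"  # 7-bit escape form of SINGLE SHIFT TWO


def encode_g2(text: str, set_name: str) -> bytes:
    """Encode high-part characters via G2 and SS2, in the ISO-2022-JP-2 style."""
    designation, codec = G2_SETS[set_name]
    out = bytearray(designation)  # designate the 96-character set to G2 once
    for ch in text:
        code = ch.encode(codec)[0]  # raises UnicodeEncodeError if unmapped
        if code < 0xA0:
            raise ValueError(f"{ch!r} is not in the high part of {set_name}")
        out += SS2_7BIT          # SS2 affects only the next character
        out.append(code - 0x80)  # the character itself is sent in GL (0x20-0x7F)
    return bytes(out)
```

Because the single shift applies to one character only, each Latin-1 or Greek character costs three bytes (`ESC N` plus one byte), and the text that follows remains in whatever set G0 holds.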

ISO-2022-JP with the ISO-2022-JP-2 representation of JIS X 0212, but not the other extensions, was subsequently dubbed ISO-2022-JP-1 by RFC 2237, dated 1997.^[122]

IBM Japanese TCP

IBM implements nine 7-bit ISO 2022 based encodings for Japanese, each using a different set of escape sequences: IBM-956, IBM-957, IBM-958, IBM-959, IBM-5052, IBM-5053, IBM-5054, IBM-5055 and ISO-2022-JP, which are collectively termed "TCP/IP Japanese coded character sets".^[123] CCSID 9148 is the standard (RFC 1468) ISO-2022-JP.^[124]

IBM variants of ISO-2022-JP
Code page / CCSID	ACRI definition number	Escape sequences for ACRI^[110]
956^[125]	TCP-01	`ESC ( J` (JIS X 0201 Roman) `ESC $ ( B` (JIS X 0208, 1983+, long escape sequence) `ESC ( I` (JIS X 0201 Katakana) `ESC $ ( D` (JIS X 0212)
957^[126]	TCP-02	`ESC ( J` (JIS X 0201 Roman) `ESC $ ( @` (JIS X 0208, 1978, long escape sequence) `ESC ( I` (JIS X 0201 Katakana) `ESC $ ( D` (JIS X 0212)
958^[127]	TCP-03	`ESC ( B` (ASCII) `ESC $ ( B` (JIS X 0208, 1983+, long escape sequence) `ESC ( I` (JIS X 0201 Katakana) `ESC $ ( D` (JIS X 0212)
959^[128]	TCP-04	`ESC ( B` (ASCII) `ESC $ ( @` (JIS X 0208, 1978, long escape sequence) `ESC ( I` (JIS X 0201 Katakana) `ESC $ ( D` (JIS X 0212)
5052^[129]	TCP-05	`ESC ( J` (JIS X 0201 Roman) `ESC $ B` (JIS X 0208, 1983+) `ESC ( I` (JIS X 0201 Katakana) `ESC $ ( D` (JIS X 0212)
5053^[130]	TCP-06	`ESC ( J` (JIS X 0201 Roman) `ESC $ @` (JIS X 0208, 1978) `ESC ( I` (JIS X 0201 Katakana) `ESC $ ( D` (JIS X 0212)
5054^[131]	TCP-07	`ESC ( B` (ASCII) `ESC $ B` (JIS X 0208, 1983+) `ESC ( I` (JIS X 0201 Katakana) `ESC $ ( D` (JIS X 0212)
5055^[132]	TCP-08	`ESC ( B` (ASCII) `ESC $ @` (JIS X 0208, 1978) `ESC ( I` (JIS X 0201 Katakana) `ESC $ ( D` (JIS X 0212)
9148^[124]	TCP-16	`ESC ( B` (ASCII) `ESC ( J` (JIS X 0201 Roman) `ESC $ @` (JIS X 0208, 1978) `ESC $ B` (JIS X 0208, 1983+)

JIS X 0213

The JIS X 0213 standard, first published in 2000, defines an updated version of ISO-2022-JP, without the ISO-2022-JP-2 extensions, named ISO-2022-JP-3. The additions made by JIS X 0213 compared to the base JIS X 0208 standard resulted in a new registration being made for the extended JIS plane 1, while the new plane 2 received its own registration. The further additions to plane 1 in the 2004 edition of the standard resulted in an additional registration being added to a further revision of the profile, dubbed ISO-2022-JP-2004. In addition to the basic ISO-2022-JP designation codes, the following designations are recognized:

ESC ( I to switch to JIS X 0201-1976 Kana set (1 byte per character)
ESC $ ( O to switch to JIS X 0213-2000 Plane 1 (2 bytes per character)
ESC $ ( P to switch to JIS X 0213-2000 Plane 2 (2 bytes per character)
ESC $ ( Q to switch to JIS X 0213-2004 Plane 1 (2 bytes per character, ISO-2022-JP-2004 only)

Other 7-bit versions

ISO-2022-KR is defined in RFC 1557, dated 1993.

KS X 1001-1992,^[134]^[135] previously named KS C 5601-1987. Unlike ISO-2022-JP-2, it makes use of the Shift Out and Shift In characters to switch between them, after including ESC $ ) C once at the start of a line to designate KS X 1001 to G1.^[133]
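The SO/SI scheme of ISO-2022-KR can be sketched as follows. For brevity, this hypothetical `encode_iso2022kr` obtains KS X 1001 codes from Python's built-in `euc_kr` codec (which carries KS X 1001 in GR) and clears the high bit; as a simplifying assumption it emits the designator once, at the very beginning of the output.

```python
SO, SI = b"\x0e", b"\x0f"
DESIGNATE_G1 = b"\x1b$)C"  # KS X 1001 as a 94x94 set in G1


def encode_iso2022kr(text: str) -> bytes:
    """Encode text as ISO-2022-KR: ASCII in G0, KS X 1001 in G1 via SO/SI."""
    out = bytearray(DESIGNATE_G1)
    shifted = False  # True while SO (G1 invoked over GL) is in effect
    for ch in text:
        if ord(ch) < 0x80:
            if shifted:
                out += SI  # back to ASCII before any ASCII byte, including newlines
                shifted = False
            out += ch.encode("ascii")
            continue
        euc = ch.encode("euc_kr")  # KS X 1001 code with bit 8 set on both bytes
        if len(euc) != 2:
            raise ValueError(f"{ch!r} has no single KS X 1001 code")
        if not shifted:
            out += SO
            shifted = True
        out += bytes(b & 0x7F for b in euc)  # same code, invoked over GL instead
    if shifted:
        out += SI
    return bytes(out)
```

Because SI is emitted before every ASCII byte that follows Korean text, each line ends in the ASCII state, and the same KS X 1001 code appears in EUC-KR with bit 8 set and in ISO-2022-KR with it cleared.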

ISO-2022-CN and ISO-2022-CN-EXT are defined in RFC 1922, dated 1996. They are 7-bit encodings making use both of the Shift Out and Shift In functions (to shift between G0 and G1), and of the 7-bit escape code forms of the single-shift functions SS2 and SS3 (to access G2 and G3).

traditional Chinese

).

The basic ISO-2022-CN profile uses ASCII as its G0 (shift in) set, and also includes GB 2312 and the first two planes of CNS 11643 (due to these two planes being sufficient to represent all traditional Chinese characters from common Big5, to which the RFC provides a correspondence in an appendix):^[136]

ESC $ ) A to switch to GB 2312-1980 (2 bytes per character) [designated to G1]
ESC $ ) G to switch to CNS 11643-1992 Plane 1 (2 bytes per character) [designated to G1]
ESC $ * H to switch to CNS 11643-1992 Plane 2 (2 bytes per character) [designated to G2]

The ISO-2022-CN-EXT profile permits the following additional sets and planes.^[136]

ESC $ ) E to switch to ISO-IR-165 (2 bytes per character) [designated to G1]
ESC $ + I to switch to CNS 11643-1992 Plane 3 (2 bytes per character) [designated to G3]
ESC $ + J to switch to CNS 11643-1992 Plane 4 (2 bytes per character) [designated to G3]
ESC $ + K to switch to CNS 11643-1992 Plane 5 (2 bytes per character) [designated to G3]
ESC $ + L to switch to CNS 11643-1992 Plane 6 (2 bytes per character) [designated to G3]
ESC $ + M to switch to CNS 11643-1992 Plane 7 (2 bytes per character) [designated to G3]

The ISO-2022-CN-EXT profile further lists additional Guobiao standard graphical sets as being permitted, but conditional on their being assigned registered ISO 2022 escape sequences:^[136]

GB 12345 in G1
GB 7589 or GB 13131 in G2
GB 7590 or GB 13132 in G3
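A decoder for the ISO-2022-CN layout has to track three things at once: which set is designated to each of G1 to G3, whether SO or SI is in effect, and the single shifts. The hypothetical `split_iso2022cn` below sketches this. It labels two-byte codes with the designated set rather than mapping them to Unicode, and it performs no validation beyond what is shown.

```python
# Designations of the ISO-2022-CN and ISO-2022-CN-EXT profiles.
DESIGNATIONS = {
    b"\x1b$)A": ("G1", "GB 2312"),
    b"\x1b$)G": ("G1", "CNS 11643 plane 1"),
    b"\x1b$)E": ("G1", "ISO-IR-165"),
    b"\x1b$*H": ("G2", "CNS 11643 plane 2"),
}
for plane, final in zip(range(3, 8), b"IJKLM"):
    DESIGNATIONS[b"\x1b$+" + bytes([final])] = ("G3", f"CNS 11643 plane {plane}")

SO, SI = 0x0E, 0x0F
SINGLE_SHIFTS = {b"\x1bN": "G2", b"\x1bO": "G3"}  # 7-bit forms of SS2 and SS3


def split_iso2022cn(data: bytes) -> list[tuple[str, bytes]]:
    """Split ISO-2022-CN bytes into (set name, character bytes) pairs."""
    sets = {"G0": "ASCII", "G1": None, "G2": None, "G3": None}
    shifted = False  # SO invokes G1 over GL; SI returns to G0
    out: list[tuple[str, bytes]] = []
    i = 0
    while i < len(data):
        two = data[i:i + 2]
        if two in SINGLE_SHIFTS:
            name = sets[SINGLE_SHIFTS[two]]
            if name is None:
                raise ValueError(f"single shift to undesignated set at offset {i}")
            out.append((name, data[i + 2:i + 4]))  # affects the next character only
            i += 4
        elif data[i] == 0x1B:
            seq = data[i:i + 4]
            if seq not in DESIGNATIONS:
                raise ValueError(f"unsupported escape sequence at offset {i}")
            working_set, name = DESIGNATIONS[seq]
            sets[working_set] = name
            i += 4
        elif data[i] in (SO, SI):
            shifted = data[i] == SO
            i += 1
        elif shifted and 0x21 <= data[i] <= 0x7E:
            if sets["G1"] is None:
                raise ValueError("SO used before G1 was designated")
            out.append((sets["G1"], data[i:i + 2]))
            i += 2
        else:
            out.append(("ASCII", data[i:i + 1]))  # G0, or a control/space byte
            i += 1
    return out
```

The single shifts never change the locking-shift state: after `ESC N` and its two bytes, decoding continues in whichever of G0 or G1 was invoked before.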

The character after the ESC (for single-byte character sets) or ESC $ (for multi-byte character sets) specifies the type of character set and the working set it is designated to. In the above examples, the character ( (0x28) designates a 94-character set to G0, whereas ), * or + (0x29–0x2B) designate a 94-character set to G1, G2 or G3 respectively. Similarly, the . (0x2E) in ESC . A and ESC . F designates a 96-character set to G2.
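This rule can be expressed as a small lookup. The sketch below (the name `parse_designation` is hypothetical) also covers the 96-character intermediates - . / (0x2D–0x2F) for G1 to G3, and the legacy short multi-byte form ESC $ F, which always designates to G0. It does not handle the other kinds of escape sequence defined by ISO 2022.

```python
# Intermediate byte -> (working set, number of characters in the set).
INTERMEDIATES = {
    0x28: ("G0", 94), 0x29: ("G1", 94), 0x2A: ("G2", 94), 0x2B: ("G3", 94),
    0x2D: ("G1", 96), 0x2E: ("G2", 96), 0x2F: ("G3", 96),  # no 96-sets in G0
}


def parse_designation(seq: bytes) -> tuple[str, int, bool, str]:
    """Return (working set, set size, multi-byte?, final byte) for a designation."""
    if len(seq) < 3 or seq[0] != 0x1B:
        raise ValueError("not an escape sequence")
    rest = seq[1:]
    multibyte = rest[0] == 0x24  # '$' marks a multi-byte set
    if multibyte:
        rest = rest[1:]
        if len(rest) == 1:  # legacy short form such as ESC $ B: always G0
            return ("G0", 94, True, chr(rest[0]))
    if len(rest) != 2 or rest[0] not in INTERMEDIATES:
        raise ValueError(f"unsupported designation {seq!r}")
    working_set, size = INTERMEDIATES[rest[0]]
    return (working_set, size, multibyte, chr(rest[1]))
```

For example, `ESC $ ) C` from ISO-2022-KR parses as a multi-byte 94-character set designated to G1, while `ESC , A` is rejected because no 96-character set may be designated to G0.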

ISO-2022-KR and ISO-2022-CN are used less frequently than ISO-2022-JP, and are sometimes deliberately not supported due to security concerns. Notably, the WHATWG Encoding Standard used by HTML5 maps all content labelled as ISO-2022-KR or ISO-2022-CN to the replacement character (�), in order to prevent certain cross-site scripting and related attacks, which exploit a difference in encoding support between the client and server.^[113] Although the same security concern (allowing sequences of ASCII bytes to be interpreted differently) also applies to ISO-2022-JP and UTF-16, they could not be given this treatment because they are much more frequently used in deployed content.^[111]

ISO/IEC 4873

A subset of ISO 2022 applied to 8-bit single-byte encodings is defined by ISO/IEC 4873, also published by Ecma International as ECMA-43. ISO/IEC 8859 defines 8-bit codes for ISO/IEC 4873 (or ECMA-43) level 1.^[9]^[10]

ISO/IEC 4873 / ECMA-43 defines three levels of encoding:^[137]

Level 1, which includes a C0 set, the ASCII G0 set, an optional C1 set and an optional single-byte (94-character or 96-character) G1 set. G0 is invoked over GL, and G1 is invoked over GR. Use of shift functions is not permitted.
Level 2, which includes a (94-character or 96-character) single-byte G2 and/or G3 set in addition to a mandatory G1 set. Only the single-shift functions SS2 and SS3 are permitted (i.e. locking shifts are forbidden), and they invoke over the GL region (including 0x20 and 0x7F in the case of a 96-set). SS2 and SS3 must be available in C1 at 0x8E and 0x8F respectively. This minimal required C1 set for ISO 4873 is registered as ISO-IR-105.^[69]
Level 3, which permits the GR locking-shift functions LS1R, LS2R and LS3R in addition to the single shifts, but otherwise has the same restrictions as level 2.

Earlier editions of the standard permitted non-ASCII assignments in the G0 set, provided that the

¤.^[138] For instance, the 8-bit encoding of JIS X 0201 is compliant with earlier editions. This was subsequently changed to fully specify the ISO/IEC 646:1991 IRV / ISO-IR No. 6 set (ASCII).^[139]^[140]^[141]

The use of the ISO/IEC 646 IRV (synchronised with ASCII since 1991) at ISO/IEC 4873 Level 1 with no C1 or G1 set, i.e. using the IRV in an 8-bit environment in which shift codes are not used and the high bit is always zero, is known as ISO 4873 DV, in which DV stands for "Default Version".^[142]

In cases where duplicate characters are available in different sets, the current edition of ISO/IEC 4873 / ECMA-43 only permits using these characters in the lowest numbered working set which they appear in.^[143] For instance, if a character appears in both the G1 set and the G3 set, it must be used from the G1 set. However, use from other sets is noted as having been permitted in earlier editions.^[141]

ISO/IEC 8859 defines complete encodings at level 1 of ISO/IEC 4873, and does not allow for use of multiple ISO/IEC 8859 parts together. It stipulates that ISO/IEC 10367 should be used instead for levels 2 and 3 of ISO/IEC 4873.^[9]^[10] ISO/IEC 10367:1991 includes G0 and G1 sets matching those used by the first 9 parts of ISO/IEC 8859 (i.e. those which existed as of 1991, when it was published), and some supplementary sets.^[144]

Character set designation escape sequences are used for identifying or switching between versions during information interchange only if required by a further protocol, in which case the standard requires an ISO/IEC 2022 announcer sequence specifying the ISO/IEC 4873 level, followed by a complete set of escapes specifying the character set designations for C0, C1, G0, G1, G2 and G3 respectively (but omitting G2 and G3 designations for level 1), with an F-byte of 0x7E denoting an empty set. Each ISO/IEC 4873 level has its own ISO/IEC 2022 announcer sequence, as follows:^[145]

Code	Hex	Announcement
`ESC SP L`	`1B 20 4C`	ISO 4873 Level 1
`ESC SP M`	`1B 20 4D`	ISO 4873 Level 2
`ESC SP N`	`1B 20 4E`	ISO 4873 Level 3
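Assembling such an identification can be sketched as below. The functions `designate` and `identify_4873` are hypothetical. The sketch assumes the ISO 2022 designation forms ESC ! F and ESC " F for the C0 and C1 sets, ESC ( F for G0, and ESC ) / * / + F or ESC - / . / F for 94- or 96-character G1 to G3 sets. The caller supplies the F-bytes, which should be looked up in the ISO-IR register; in the example, the C0 F-byte @ is an assumption, while B (ASCII) and A (ISO/IEC 8859-1 high part) are the F-bytes given elsewhere in this article.

```python
ANNOUNCERS = {1: b"\x1b\x20\x4c", 2: b"\x1b\x20\x4d", 3: b"\x1b\x20\x4e"}
EMPTY = 0x7E  # F-byte denoting an empty set


def designate(working_set: int, size: int, final: int) -> bytes:
    """ESC I F designating a 94- or 96-character set to G0..G3."""
    if size == 96 and working_set == 0:
        raise ValueError("96-character sets cannot be designated to G0")
    base = 0x28 if size == 94 else 0x2C  # ( ) * +  or  , - . /
    return bytes([0x1B, base + working_set, final])


def identify_4873(level: int, c0: int, c1: int, g0: int,
                  g1: tuple[int, int] = (EMPTY, 94),
                  g2: tuple[int, int] = (EMPTY, 94),
                  g3: tuple[int, int] = (EMPTY, 94)) -> bytes:
    """Announcer plus designations for C0, C1, G0, G1 (and G2, G3 above level 1).

    Each of g1, g2, g3 is a (final byte, set size) pair."""
    out = bytearray(ANNOUNCERS[level])
    out += bytes([0x1B, 0x21, c0])  # C0 designation: ESC ! F
    out += bytes([0x1B, 0x22, c1])  # C1 designation: ESC " F
    out += designate(0, 94, g0)
    extra = [g1] if level == 1 else [g1, g2, g3]
    for n, (final, size) in enumerate(extra, start=1):
        out += designate(n, size, final)
    return bytes(out)
```

For instance, `identify_4873(1, ord("@"), EMPTY, ord("B"), (ord("A"), 96))` announces level 1 with no C1 set, ASCII in G0 and the Latin-1 high part in G1.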

Extended Unix Code

Extended Unix Code (EUC) is an 8-bit variable-width

simplified Chinese. It is based on ISO 2022, and only character sets which conform to the ISO 2022 structure can have EUC forms. Up to four coded character sets can be represented (in G0, G1, G2 and G3). The G0 set is invoked over GL, the G1 set is invoked over GR, and the G2 and G3 sets are (if present) invoked using the single shifts SS2 and SS3, which are used as CR bytes (i.e. 0x8E and 0x8F respectively) and invoke over GR (not GL).^[11] Locking shift codes are not used.^[12]

The code assigned to the G0 set is ASCII, or the country's national

Yen sign in some versions of EUC-JP and a Won sign in some versions of EUC-KR.

G1 is used for a 94x94 coded character set represented in two bytes. The

EUC-TW

can take up to four bytes (i.e. SS2 plus three bytes).

The EUC code itself does not make use of the announcer or designation sequences from ISO 2022; however, it corresponds to the following sequence of four announcer sequences, whose meanings break down as follows:^[146]

Individual sequence	Hexadecimal	Feature of EUC denoted
`ESC SP C`	`1B 20 43`	ISO-8 (8-bit, G0 in GL, G1 in GR)
`ESC SP Z`	`1B 20 5A`	G2 accessed using SS2
`ESC SP [`	`1B 20 5B`	G3 accessed using SS3
`ESC SP \`	`1B 20 5C`	Single-shifts invoke over GR
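Because EUC invokes G1 over GR and uses SS2 and SS3 as single bytes in CR, its byte stream can be split into characters without tracking any state. The hypothetical `split_euc` below defaults to the EUC-JP layout (G1 two bytes, G2 one byte after SS2, G3 two bytes after SS3); other variants need different widths, for example `g2_width=3` for EUC-TW. The example also shows that clearing bit 8 of an EUC-JP G1 code gives the corresponding ISO-2022-JP code, and that subtracting 0xA0 from each byte gives the row and cell.

```python
SS2, SS3 = 0x8E, 0x8F


def split_euc(data: bytes, g1_width: int = 2, g2_width: int = 1,
              g3_width: int = 2) -> list[tuple[str, bytes]]:
    """Split EUC bytes into (working set, bytes) pairs; defaults suit EUC-JP."""
    out: list[tuple[str, bytes]] = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            name, size = "G0", 1             # G0 invoked over GL
        elif b == SS2:
            name, size = "G2", 1 + g2_width  # SS2 byte, then the G2 code in GR
        elif b == SS3:
            name, size = "G3", 1 + g3_width  # SS3 byte, then the G3 code in GR
        elif 0xA1 <= b <= 0xFE:
            name, size = "G1", g1_width      # G1 invoked over GR
        else:
            raise ValueError(f"unexpected byte 0x{b:02X} at offset {i}")
        chunk = data[i:i + size]
        code = chunk[1:] if name in ("G2", "G3") else chunk
        if len(chunk) != size or (name != "G0" and
                                  not all(0xA1 <= x <= 0xFE for x in code)):
            raise ValueError(f"malformed character at offset {i}")
        out.append((name, chunk))
        i += size
    return out


def kuten(code: bytes) -> tuple[int, int]:
    """Row and cell of a two-byte G1 code in EUC form."""
    return code[0] - 0xA0, code[1] - 0xA0
```

Using Python's built-in `euc_jp` codec, `"aあｱ".encode("euc_jp")` gives `b"a\xa4\xa2\x8e\xb1"`, which splits into ASCII `a`, the G1 code for あ (row 4, cell 2) and the SS2-prefixed halfwidth katakana ｱ.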

Compound Text (X11)

The

ISO-8859-1 in its initial state.^[150]

The following F-bytes are used:

ISO 2022 designation sequences used in X11 Compound Text^[151]
Escape sequence type	Final byte	Graphical set
GZD4, G1D4 (for 94-character sets)	`B` (`0x42`)	ASCII
	`I` (`0x49`)	JIS X 0201 katakana
	`J` (`0x4A`)	JIS X 0201 Roman
G1D6 (for 96-character sets)	`A` (`0x41`)	ISO-8859-1 high part
	`B` (`0x42`)	ISO-8859-2 high part
	`C` (`0x43`)	ISO-8859-3 high part
	`D` (`0x44`)	ISO-8859-4 high part
	`F` (`0x46`)	ISO-8859-7 high part
	`G` (`0x47`)	ISO-8859-6 high part
	`H` (`0x48`)	ISO-8859-8 high part
	`L` (`0x4C`)	ISO-8859-5 high part
	`M` (`0x4D`)	ISO-8859-9 high part
GZDM4, G1DM4 (for 2-byte sets)	`A` (`0x41`)	GB 2312
	`B` (`0x42`)	JIS X 0208
	`C` (`0x43`)	KS C 5601
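As an illustration of how these designations are used, the hypothetical `to_compound_text` below switches the G1 set between the ISO 8859-1 and ISO 8859-7 high parts with ESC - A and ESC - F, starting from the Latin-1 initial state. It relies on Python's `latin-1` and `iso8859_7` codecs and handles no other sets.

```python
# (Python codec, G1D6 final byte) for the 96-character sets this sketch supports.
PARTS = [("latin-1", 0x41), ("iso8859_7", 0x46)]  # 'A' and 'F'


def to_compound_text(text: str) -> bytes:
    """Encode text using ASCII in GL and a switchable 96-character set in GR."""
    out = bytearray()
    g1 = 0x41  # initial state: ISO 8859-1 high part in G1
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ord(ch))  # ASCII is always available in GL
            continue
        for codec, final in PARTS:
            try:
                code = ch.encode(codec)[0]
            except UnicodeEncodeError:
                continue
            if code < 0xA0:
                continue  # 0x80-0x9F are C1 controls, not graphic characters
            if final != g1:
                out += bytes([0x1B, 0x2D, final])  # ESC - F: 96-set to G1
                g1 = final
            out.append(code)  # G1 is invoked over GR, so bit 8 stays set
            break
        else:
            raise ValueError(f"{ch!r} is not covered by this sketch")
    return bytes(out)
```

Plain Latin-1 text therefore needs no escape sequences at all, while each change of script costs one three-byte designation.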

For specifying encodings by labels, X11 Compound Text defines five private-use DOCS sequences: ESC % / 0 (1B 25 2F 30) for variable-length encodings, and ESC % / 1 through ESC % / 4 for fixed-length encodings using one through four bytes respectively. Rather than using another escape sequence to return to ISO 2022, the two bytes following the initial escape sequence specify the remaining length in bytes, coded in base-128 using bytes 0x80–FF. The encoding label is included in ISO 8859-1 before the encoded text, and terminated with STX (0x02).^[108]
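The length prefix can be made concrete with a pair of helper functions (hypothetical names; the label in the example is made up). They assume that the counted length covers everything after the two length bytes, i.e. the label, the STX and the encoded text.

```python
STX = 0x02


def extended_segment(label: str, payload: bytes, octets_per_char: int = 0) -> bytes:
    """Build ESC % / F M L, then label, STX and payload (F = '0' to '4')."""
    if not 0 <= octets_per_char <= 4:
        raise ValueError("octets_per_char must be 0 (variable length) or 1-4")
    body = label.encode("latin-1") + bytes([STX]) + payload
    if len(body) >= 128 * 128:
        raise ValueError("segment too long for a two-byte length")
    hi, lo = divmod(len(body), 128)  # base-128 digits
    return bytes([0x1B, 0x25, 0x2F, 0x30 + octets_per_char,
                  0x80 + hi, 0x80 + lo]) + body  # length bytes use 0x80-0xFF


def parse_extended_segment(data: bytes) -> tuple[str, bytes, bytes]:
    """Return (label, payload, bytes after the segment)."""
    if data[:3] != b"\x1b%/" or not 0x30 <= data[3] <= 0x34:
        raise ValueError("not an extended segment")
    length = (data[4] - 0x80) * 128 + (data[5] - 0x80)
    body, rest = data[6:6 + length], data[6 + length:]
    label, _, payload = body.partition(bytes([STX]))  # label ends at the first STX
    return label.decode("latin-1"), payload, rest
```

Since the length is known up front, a reader that does not recognise the label can skip the whole segment without interpreting its contents.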

Comparison with other encodings

Advantages

As ISO/IEC 2022's entire range of graphical character encodings can be invoked over GL, the available glyphs are not significantly limited by an inability to represent GR and C1, such as in a system limited to 7-bit encodings. It accordingly enables the representation of a large set of characters in such a system. Generally, this 7-bit compatibility is not really an advantage except for backwards compatibility with older systems, since the vast majority of modern computers use 8 bits for each byte.

As compared to Unicode, ISO/IEC 2022 sidesteps Han unification by using escape sequences to switch between discrete encodings for different East Asian languages. This avoids the issues^{[citation needed]} associated with unification, such as difficulty supporting multiple CJK languages with their associated character variants in a single document and font.

Disadvantages

Since ISO/IEC 2022 is a stateful encoding, a program cannot jump into the middle of a block of text to search, insert or delete characters. This makes manipulation of the text very cumbersome and slow when compared to non-stateful encodings. Any jump into the middle of the text may require scanning back to the previous escape sequence before the bytes following it can be interpreted.
Due to the stateful nature of ISO/IEC 2022, an identical and equivalent character may be encoded in different character sets, which may be designated to any of G0 through G3, and which may be invoked using single shifts or using locking shifts to GL or GR. Consequently, characters can be represented in multiple ways, meaning that two visually identical and equivalent strings cannot be reliably compared for equality without first decoding them.
Some systems, like DICOM and several e-mail clients, use a variant of ISO-2022 (e.g. "ISO 2022 IR 100"^[152]) in addition to supporting several other encodings.^[153] This type of variation makes it difficult to portably transfer text between computer systems.
UTF-1, the multi-byte Unicode transformation format compatible with ISO/IEC 2022's representation of 8-bit control characters, has various disadvantages in comparison with UTF-8, and switching from or to other charsets, as supported by ISO/IEC 2022, is typically unnecessary in Unicode documents.
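Because of its escape sequences, it is possible to construct attack byte sequences in which a malicious string (such as cross-site scripting) is masked until it is decoded, which may allow it to bypass filtering. For this reason, 7-bit ISO 2022 encodings other than ISO-2022-JP are mapped in their entirety to the replacement character in HTML5 to prevent attacks.^[112]^[113] Restricted ISO 2022 8-bit code versions which do not use designation escapes or locking shift codes, such as Extended Unix Code, do not share this problem.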
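Two of the points in this list can be made concrete for ISO-2022-JP. The hypothetical `state_at` shows the backward scan needed before the bytes at an arbitrary offset can be interpreted; the two flags after it use Python's built-in `iso2022_jp` codec to show that different byte sequences can decode to the same text, and that a filter inspecting raw bytes can be defeated by a redundant escape sequence. The sketch assumes RFC 1468 input, in which 0x1B only ever starts a three-byte designation.

```python
TWO_BYTE = {b"\x1b$@", b"\x1b$B"}  # JIS X 0208 designations


def state_at(data: bytes, offset: int) -> tuple[bytes, bool]:
    """Return the designation in force at `offset` and whether `offset` falls
    on a character boundary, by scanning back to the previous ESC."""
    pos = data.rfind(b"\x1b", 0, offset)
    if pos == -1:
        return b"\x1b(B", True  # before any escape: initial ASCII state
    if offset < pos + 3:
        raise ValueError("offset falls inside an escape sequence")
    designation = data[pos:pos + 3]
    width = 2 if designation in TWO_BYTE else 1
    return designation, (offset - (pos + 3)) % width == 0


# Same text, different bytes: comparing the bytes is not enough.
a = b'\x1b$B$"\x1b(B'  # あ via JIS X 0208-1983
b = b'\x1b$@$"\x1b(B'  # あ via JIS X 0208-1978
same_text = a != b and a.decode("iso2022_jp") == b.decode("iso2022_jp")

# A redundant escape hides "<script" from a check on the raw bytes.
raw = b"<scr\x1b(Bipt>"
masked = b"<script" not in raw and raw.decode("iso2022_jp") == "<script>"
```

Both flags evaluate to `True`, which is why searching, comparison and filtering are normally done only after the whole stream has been decoded.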

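Concatenation can pose issues. Profiles such as ISO-2022-JP specify that the stream starts in the ASCII state and must end in the ASCII state. As a result, when two such streams are concatenated, the escape back to ASCII at the end of the first is often immediately followed by an escape sequence at the start of the second. The WHATWG Encoding Standard decodes such an escape sequence immediately followed by another one as the replacement character ("�"), to prevent such sequences from being used to mask malicious sequences such as cross-site scripting.^[156] Implementing this measure, e.g. in Mozilla Thunderbird, has led to interoperability issues, with unexpected "�" characters being generated where two ISO-2022-JP streams have been concatenated.^[154]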
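A sender can avoid producing consecutive escape sequences at the seam of two concatenated streams. The following sketch (hypothetical function names) assumes both inputs are well-formed ISO-2022-JP that start and end in the ASCII state. It drops the first stream's final return to ASCII when the second stream begins with an escape sequence of its own, which leaves the decoded text unchanged.

```python
ESC = b"\x1b"
TO_ASCII = b"\x1b(B"


def has_back_to_back_escapes(data: bytes) -> bool:
    """True if an escape sequence is immediately followed by another one,
    i.e. there is zero-length content between them (3-byte escapes assumed)."""
    i = data.find(ESC)
    while i != -1:
        if data[i + 3:i + 4] == ESC:
            return True
        i = data.find(ESC, i + 3)
    return False


def concat_iso2022jp(first: bytes, second: bytes) -> bytes:
    """Concatenate two ISO-2022-JP streams without an empty segment at the seam."""
    if first.endswith(TO_ASCII) and second.startswith(ESC):
        first = first[:-len(TO_ASCII)]  # the second stream re-designates anyway
    return first + second
```

Naively joining two copies of `b'\x1b$B$"\x1b(B'` (hiragana あ) produces `ESC ( B ESC $ B` at the seam, whereas `concat_iso2022jp` yields `b'\x1b$B$"\x1b$B$"\x1b(B'`, which still decodes to ああ.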

Footnotes

^ Japanese: 区点, romanized: kuten; Chinese: 区位; pinyin: qūwèi; Korean: 행렬; Hanja: 行列; RR: haeng-nyeol
^ Japanese: 区, romanized: ku, lit. 'zone'; Chinese: 区; pinyin: qū; Korean: 행; Hanja: 行; RR: haeng
^ Japanese: 点, romanized: ten, lit. 'point'; Chinese: 位; pinyin: wèi; lit. 'position'; Korean: 열; Hanja: 列; RR: yeol
^ Japanese: 面, romanized: men, lit. 'face'
^
SoftBank 2G emoji encoding, use additional escapes of this form for non-ISO-2022-compliant purposes.^[96]

^ Listed by MARC-8.^[3] See footnote for ESC , F below for background.
^ F, adjusted to the range 1-63, indicates which (upwardly compatible) revision of the immediately-following registration is needed, so that old systems know that they are old.^[97]
^ In earlier editions, 96-character sets did not exist, and the escape codes now used for 96-character sets were reserved as space for additional 94-character sets. Accordingly, the ESC , F sequence (0x1B 0x2C followed by the F-byte) was defined in early editions of the standard as designating further 94-character sets to G0.^[98] Since 96-character sets cannot be designated to G0, this first I byte is not used by the current edition of the standard. However, it is still listed by MARC-8.^[3]
^ See also, for instance, Printronix (2012), OKI® Programmer's Reference Manual (PDF), p. 26 for a more recent system which uses ESC ( H to switch to ASCII from a DBCS.

References

^ ECMA-35 (1994), Brief History
^ ECMA-35 (1994), p. 51, annex D
^ ^a ^b ^c ^d ^e "Technique 2: Using standard alternate graphic character sets". MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media. Library of Congress. 2007-12-05. Archived from the original on 2020-07-22. Retrieved 2020-07-19.
^ "ECMA-35: Character code structure and extension techniques (web page)". Ecma International. Archived from the original on 2022-04-25. Retrieved 2022-04-27.
^ ^a ^b ^c ^d ECMA-35 (1994), pp. 15–16, chapter 8.1
^ ^a ^b ECMA-35 (1994), chapter 13
^ ^a ^b ECMA-35 (1994), chapters 12, 14
^ ^a ^b ECMA-35 (1994), chapter 11
^ ^a ^b ^c ^d ^e ISO/IEC FDIS 8859-10 (1998), p. 1, chapter 1 ("Scope")
^ ^a ^b ^c ^d ^e ECMA-144 (2000), p. 1, chapter 1 ("Scope")
^ ^a ^b ^c ^d ^e ^f Lunde (2008), pp. 242–245, Chapter 4 ("Encoding Methods"), section "EUC encoding"
^ ^a ^b ^c ^d Lunde (2008), pp. 253–255, Chapter 4 ("Encoding Methods"), section "EUC versus ISO-2022 encodings".
^ ^a ^b ISO-IR-196 (1996)
^ ^a ^b ^c Moy, Edward; Gildea, Stephen; Dickey, Thomas. "Controls beginning with ESC". XTerm Control Sequences. Archived from the original on 2019-10-10. Retrieved 2019-10-04.
^ ECMA-35 (1994), chapters 6, 7
^ ECMA-35 (1994), chapter 8
^ ECMA-35 (1994), chapter 9
^ ^a ^b ECMA-35 (1994), chapter 15
^ Lunde (2008), pp. 228–234, Chapter 4 ("Encoding Methods"), section "ISO-2022 encoding"
^ Lunde (2008), pp. 19–20, Chapter 1 ("CJKV Information Processing Overview"), section "What are Row-Cell and Plane-Row-Cell?"
^ ECMA-35 (1994), p. 4, definition 4.11
^ ECMA-35 (1994), p. 5, definition 4.18
JIS X 0201 Roman set
as ESC 2/8 4/10.

^ ECMA-35 (1994), p. 5, chapter 5.1

JIS X 0201 Roman set
as ESC ( J.

^ ECMA-35 (1994), p. 7, chapter 6.2

^ ECMA-35 (1994), p. 10, chapter 6.3.2

^ ECMA-35 (1994), p. 4, definition 4.17

^ ECMA-35 (1994), p. 4, definition 4.14

^ ECMA-35 (1994), p. 28, chapter 13.1

^ ^a ^b ^c ECMA-35 (1994), p. 33, chapter 13.3.3

^ ECMA-48 (1991), pp. 24–26, chapter 5.4

^ ^a ^b ^c ^d ECMA-35 (1994), p. 11, chapter 6.4.3

^ ISO-IR-208 (1999)

^ ISO-IR-155 (1990)

^ ISO-IR-164 (1992)

^ ^a ^b ECMA-35 (1994), p. 10, chapter 6.3.3

Google Inc. (2014). "ansi.go, line 134". ANSI escape sequence library for Go. Archived from the original on 2022-04-30. Retrieved 2019-09-14.

^ ECMA-43 (1991), p. 5, chapter 7 ("Specification of the characters of the 8-bit code")

^ ISO/IEC FDIS 8859-10 (1998), p. 3, chapter 6 ("Specification of the coded character set")

^ ECMA-144 (2000), p. 3, chapter 6 ("Specification of the coded character set")

^ ECMA-43 (1991), p. 19, annex C ("Composite graphic characters")

^ ^a ^b ECMA-35 (1994), p. 10, chapter 6.4.1

^ ^a ^b ECMA-35 (1994), p. 11, chapter 6.4.4

^ ^a ^b ^c ECMA-35 (1994), p. 11, chapter 6.4.2

^ ISO-IR-104 (1985)

^ ISO-IR-1 (1975)

^ ^a ^b ECMA-35 (1994), p. 19, chapter 8.5.1

^ ^a ^b ECMA-35 (1994), p. 19, chapter 8.5.2

^ ECMA-43 (1991), p. 8, chapter 7.6 ("C1 set")

^ ^a ^b ECMA-35 (1994), p. 29, chapter 13.2.1

^ ^a ^b ECMA-35 (1994), p. 12, chapter 6.5.1

^ ECMA-35 (1994), p. 12, chapter 6.5.2

^ ^a ^b ^c ISO-IR, p. 19, chapter 2.7 ("Single control functions")

^ ECMA-35 (1994), p. 12, chapter 6.5.4

^ ECMA-48 (1991), chapter 5.5

ISO-IR-35.

^ ECMA-35 (1994), p. 12, chapter 6.5.3

^ ^a ^b ECMA-35 (1994), p. 14, chapter 7.3, table 2

^ ISO-IR-14 (1975)

^ ^a ^b ITU-T (1995-08-11). Recommendation T.51 (1992) Amendment 1. Archived from the original on 2020-08-02. Retrieved 2019-12-25.

^ ISO-IR-106 (1985)

^ ECMA-35 (1994), p. 15, chapter 7.3, note 23

^ ISO-IR-140 (1987)

^ ISO-IR-7 (1975)

^ ISO-IR-26 (1976)

^ ISO-IR-36 (1977)

^ ECMA-35 (1980), p. 8, chapter 5.1.7

^ ^a ^b ISO-IR-105 (1985)

^ ^a ^b ^c ^d ECMA-35 (1994), p. 17, chapter 8.3.1

^ ^a ^b ^c ^d ECMA-35 (1994), p. 23, chapter 9.3.1

^ ^a ^b ^c ECMA-35 (1994), p. 19, chapter 8.4

^ ^a ^b ^c ECMA-35 (1994), p. 17, chapter 8.3.2

^ ECMA-35 (1994), pp. 23–24, chapter 9.4

^ ECMA-35 (1994), p. 27, chapter 11.1

^ ECMA-35 (1994), p. 17, chapter 8.3.3

^ ECMA-35 (1994), p. 47, annex B

^ ISO-IR, p. 2, chapter 1 ("Introduction")

^ ISO/IEC 2375 (2003)

^ ^a ^b "Handling of the SGML declaration in SP". SP: an SGML System Conforming to International Standard ISO 8879. W3C.

^ ISO-IR, p. 10, chapter 2.2 ("94-Character graphic character set with second Intermediate byte")

^ ARIB STD-B24 (2008), p. 39, part 2, Table 7-3

^ Mascheck, Sven; Le Breton, Stefan; Hamilton, Richard L. "About the 'alternate linedrawing character set'". ~sven_mascheck/. Archived from the original on 2019-12-29. Retrieved 2020-01-08.

^ ECMA-35 (1994), p. 36, chapter 14.4

^ ECMA-35 (1994), p. 36, chapter 14.4.2, note 48

^ ECMA-35 (1994), p. 36, chapter 14.4.2, note 47

^ ETS 300 706 (1997), p. 103, chapter 14 ("Dynamically Re-definable Characters")

^ ^a ^b ^c ^d ^e ^f ^g ^h ^i ^j ^k ^l ^m ^n ^o ^p ^q ECMA-35 (1994), pp. 35–36, chapter 14.3.2

^ ISO/IEC 10646 (2017), pp. 19–20, chapter 12.4 ("Identification of control function set")

^ ECMA-35 (1994), p. 32, table 5

^ ^a ^b ^c ECMA-35 (1994), pp. 37–41, chapter 15.2

^ ECMA-35 (1994), p. 34, chapter 14.2.2

^ ECMA-35 (1994), p. 34, chapter 14.2.3

^ Digital. "DECDWL—Double-Width, Single-Height Line". VT510 Video Terminal Programmer Information. Archived from the original on 2020-08-02. Retrieved 2020-01-17.

^ Kawasaki, Yusuke (2010). "Encode::JP::Emoji::Encoding". Encode-JP-Emoji. Line 268. Archived from the original on 2022-04-30. Retrieved 2020-05-28.

^ ECMA-35 (1994), pp. 36–37, chapter 14.5

^ ECMA-35 (1980), pp. 14–15, chapter 5.3.7

^ ^a ^b ^c ^d ISO-IR, p. 20, chapter 2.8.1 ("Coding systems with Standard return")

^ ^a ^b ^c ^d ECMA-35 (1994), pp. 41–42, chapter 15.4

^ ^a ^b ^c ^d ^e ISO-IR, p. 21, chapter 2.8.2 ("Coding systems without Standard return")

^ ECMA-35 (1994), p. 41, chapter 15.3

^ ^a ^b ^c ISO/IEC 10646 (2017), p. 19, chapter 12.2 ("Identification of a UCS encoding scheme")

^ ISO/IEC 10646 (2017), pp. 18–19, chapter 12.1 ("Purpose and context of identification")

^ ISO-IR-192 (1996)

^ ISO-IR-195 (1996)

^ ISO/IEC 10646 (2017), p. 20, chapter 12.5 ("Identification of the coding system of ISO/IEC 2022")

^ ^a ^b Scheifler (1989), § Non-Standard Character Set Encodings

^ Lunde (2008), pp. 229–230, Chapter 4 ("Encoding Methods"), section "ISO-2022 encoding" "Those encodings that have been extensively used in the past, or continue to be used today for some purposes, have been highlighted."

^ ^a ^b "Additional Coding-related Required Information". IBM Globalization - Coded Character Set Identifiers. IBM. Archived from the original on 2015-01-07.

^ ^a ^b ^c WHATWG Encoding Standard, section 2 ("Security background")

^ ^a ^b ^c WHATWG Encoding Standard, chapter 4.2 ("Names and labels"), anchor "replacement"

^ ^a ^b ^c ^d WHATWG Encoding Standard, section 14.1 ("replacement")

^ ^a ^b ^c ^d ^e ^f RFC 1468 (1993)

^ ^a ^b ^c "Code Page Identifiers". Windows Dev Center. Microsoft. Archived from the original on 2019-06-16. Retrieved 2019-09-16.

^ ^a ^b WHATWG Encoding Standard, section 12.2 ("ISO-2022-JP")

^ Chang, Hye-Shik. "Modules/cjkcodecs/_codecs_iso2022.c, line 1122". cPython source tree. Python Software Foundation. Archived from the original on 2022-04-30. Retrieved 2019-09-15.

^ "codecs — Codec registry and base classes § Standard Encodings". Python 3.7.4 documentation. Python Software Foundation. Archived from the original on 2019-07-28. Retrieved 2019-09-16.

^ "2: Codesets and Codeset Conversion". DIGITAL UNIX Technical Reference for Using Japanese Features. Digital Equipment Corporation, Compaq.^{[dead link]}

^ ^a ^b Lunde (2008), pp. 236–238, Chapter 4 ("Encoding Methods"), section "The predecessor of ISO-2022-JP encoding—JIS encoding"

^ RFC 1554 (1993)

^ RFC 2237 (1997)

^ "PQ02042: New Function to Provide C/370 iconv() Support for Japanese ISO-2022-JP". IBM. 2021-01-19. Archived from the original on 2022-01-04. Retrieved 2022-01-04.

^ ^a ^b "CCSID 9148". IBM Globalization - Coded Character Set Identifiers. IBM. Archived from the original on 2014-11-29.

^ "CCSID 956". IBM Globalization - Coded Character Set Identifiers. IBM. Archived from the original on 2014-12-02.

^ "CCSID 957". IBM Globalization - Coded Character Set Identifiers. IBM. Archived from the original on 2014-11-30.

^ "CCSID 958". IBM Globalization - Coded Character Set Identifiers. IBM. Archived from the original on 2014-12-01.

^ "CCSID 959". IBM Globalization - Coded Character Set Identifiers. IBM. Archived from the original on 2014-12-02.

^ "CCSID 5052". IBM Globalization - Coded Character Set Identifiers. IBM. Archived from the original on 2014-11-29.

^ "CCSID 5053". IBM Globalization - Coded Character Set Identifiers. IBM. Archived from the original on 2014-11-29.

^ "CCSID 5054". IBM Globalization - Coded Character Set Identifiers. IBM. Archived from the original on 2014-11-29.

^ "CCSID 5055". IBM Globalization - Coded Character Set Identifiers. IBM. Archived from the original on 2014-11-29.

^ ^a ^b RFC 1557 (1993)

^ "KS X 1001:1992" (PDF). Archived (PDF) from the original on 2007-09-26. Retrieved 2007-07-12.

^ ISO-IR-149 (1988)

^ ^a ^b ^c ^d RFC 1922 (1996)

^ ECMA-43 (1991), pp. 9–10, chapter 8 ("Levels")

^ ECMA-43 (1985), pp. 7–11, chapter 7.3 ("The G0 set")

^ ECMA-43 (1991), pp. 6–8, chapter 7.4 ("G0 set")

^ ECMA-43 (1991), p. 11, chapter 10.3 ("Identification of a version")

^ ^a ^b ECMA-43 (1991), p. 23, annex E ("Main differences between the second edition (1985) and the present (third) edition of this ECMA Standard")

IPTC (1995). The IPTC Recommended Message Format (PDF) (5th ed.). IPTC TEC 7901. Archived (PDF) from the original on 2022-01-25. Retrieved 2020-01-14.

^ ECMA-43 (1991), p. 10, chapter 9.2 ("Unique coding of characters")

^ van Wingen, Johan W (1999). "8. Code Extension, ISO 2022 and 2375, ISO 4873 and 10367". Character sets. Letters, tokens and codes. Terena. Archived from the original on 2020-08-01. Retrieved 2019-10-02.

^ ECMA-43 (1991), pp. 10–11, chapter 10 ("Identification of version and level")

^ IBM. "Character Data Representation Architecture (CDRA)". IBM. pp. 157–162. Archived from the original on 2019-06-23. Retrieved 2020-06-18.

^ Scheifler (1989)

^ Scheifler (1989), § Control Characters

^ Scheifler (1989), § Directionality

^ Scheifler (1989), § Standard Character Set Encodings

^ Scheifler (1989), § Approved Standard Encodings

^ "DICOM PS3.2 2016d - Conformance; D.6.2 Character Sets; D.6 Support of Character Sets". Archived from the original on 2020-02-16. Retrieved 2020-05-21.

^ "DICOM ISO 2022 variation". Archived from the original on 2013-04-30. Retrieved 2009-07-25.

^ ^a ^b Sivonen, Henri (2018-12-17). "(UNSUBMITTED DRAFT) No U+FFFD Generation for Zero-Length ASCII-State Content between ISO-2022-JP Escape Sequences" (PDF). Archived (PDF) from the original on 2019-02-21. Retrieved 2019-02-21.

^ "935453 - Gather telemetry about HZ and other encodings we might try to remove". Archived from the original on 2017-05-19. Retrieved 2018-06-18.

^ Davis, Mark; Suignard, Michel (2014-09-19). "3.6.2 Some Output For All Input". Unicode Technical Report #36: Unicode Security Considerations (revision 15). Unicode Consortium. Archived from the original on 2019-02-22. Retrieved 2019-02-21.

Standards and registry indices cited

ARIB (2008). ARIB STD-B24: Data Coding and Transmission Specification for Digital Broadcasting (PDF) (ARIB Standard). 5.2-E1. Vol. 1. Archived (PDF) from the original on 2017-07-10. Retrieved 2017-07-10.

ECMA (1980). ECMA-35: Extension of the 7-bit Coded Character Set (PDF) (ECMA Standard) (2nd ed.).

ECMA (1994). ECMA-35: Character Code Structure and Extension Techniques (PDF) (ECMA Standard) (6th ed.).

ECMA (1985). ECMA-43: 8-Bit Coded Character Set Structure and Rules (PDF) (ECMA Standard) (2nd ed.).

ECMA (1991). ECMA-43: 8-Bit Coded Character Set Structure and Rules (PDF) (ECMA Standard) (3rd ed.).

ECMA (1991). ECMA-48: Control Functions for Coded Character Sets (PDF) (ECMA Standard) (5th ed.).

ECMA (2000). ECMA-144: 8-Bit Single-Byte Coded Graphic Character sets: Latin Alphabet No. 6 (PDF) (ECMA Standard) (3rd ed.).

European Broadcasting Union (1997). ETS 300 706: Enhanced Teletext specification (PDF) (European Telecommunications Standards). ETSI.

ISO.

ISO/IEC JTC 1/SC 2 (1998-02-12). ISO/IEC FDIS 8859-10: Information Technology — 8-bit single-byte coded graphic character sets — Part 10: Latin alphabet No. 6 (PDF) (Final Draft International Standard).

ISO.

ISO-IR: ISO/IEC International Register of Coded Character Sets To Be Used With Escape Sequences (PDF) (Registry Index). ITSCJ/IPSJ.

Scheifler, Robert W. (1989). Compound Text Encoding (X Consortium Standard). X Consortium.

van Kesteren, Anne. WHATWG Encoding Standard (WHATWG Living Standard). WHATWG.

Registered code sets cited

ISO/TC 97/SC 2 (1975-12-01). ISO-IR-1: The set of control characters of the ISO 646 (PDF). ITSCJ/IPSJ.

Sveriges Standardiseringskommission (1975-12-01). ISO-IR-7: NATS Control set for newspaper text transmission (PDF). ITSCJ/IPSJ.

Japanese Industrial Standards Committee (1975-12-01). ISO-IR-14: The Japanese Roman graphic set of characters (PDF). ITSCJ/IPSJ.

IPTC (1976-03-25). ISO-IR-26: Control set for newspaper text transmission (PDF). ITSCJ/IPSJ.

ISO/TC 97/SC 2 (1977-10-15). ISO-IR-36: The set of control characters of ISO 646, with IS4 replaced by Single Shift for G2 (SS2) (PDF). ITSCJ/IPSJ.

ISO/TC97/SC2/WG-7; ECMA (1985-08-01). ISO-IR-104: Minimum C0 set for ISO 4873 (PDF). ITSCJ/IPSJ.

ISO/TC97/SC2/WG-7; ECMA (1985-08-01). ISO-IR-105: Minimum C1 Set for ISO 4873 (PDF). ITSCJ/IPSJ.

ITU (1985-08-01). ISO-IR-106: Teletex Primary Set of Control Functions (PDF). ITSCJ/IPSJ.

Úřad pro normalizaci a měřeni (1987-07-31). ISO-IR-140: The C0 Set of Control Characters of ISO 646, with EM replaced by SS2 (PDF). ITSCJ/IPSJ.

Korea Bureau of Standards (1988-10-01). ISO-IR-149: Korean Graphic Character Set for Information Interchange (KS C 5601:1987) (PDF). ITSCJ/IPSJ.

ISO/IEC/JTC1/SC2/WG3 (1990-04-16). ISO-IR-155: Basic Box-Drawings Set (PDF). ITSCJ/IPSJ.

CCITT (1992-07-13). ISO-IR-164: Hebrew Supplementary Set of Graphic Characters (PDF). ITSCJ/IPSJ.

ECMA (1996-04-22). ISO-IR-192: UCS Transformation Format (UTF-8), implementation level 3, without standard return (PDF). ITSCJ/IPSJ.

ECMA (1996-04-22). ISO-IR-195: UCS Transformation Format (UTF-16), implementation level 3, without standard return (PDF). ITSCJ/IPSJ.

ECMA (1996-04-22). ISO-IR-196: UCS Transformation Format (UTF-8), with standard return (PDF). ITSCJ/IPSJ.

National Standards Authority of Ireland (1999-12-07). ISO-IR-208: Ogham coded character set for information interchange (PDF). ITSCJ/IPSJ.

Internet Requests For Comment cited

Murai, J.; Crispin, M.; van der Poel, E. (1993). "RFC 1468: Japanese Character Encoding for Internet Messages". Requests for Comments. doi:10.17487/rfc1468.

Ohta, M.; Handa, K. (1993). "RFC 1554: ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP". Requests for Comments. doi:10.17487/rfc1554.

Choi, U.; Chon, K.; Park, H. (1993). "RFC 1557: Korean Character Encoding for Internet Messages". Requests for Comments. doi:10.17487/rfc1557.

Zhu, HF.; Hu, DY.; Wang, ZG.; Kao, TC.; Chang, WCH.; Crispin, M. (1996). "RFC 1922: Chinese Character Encoding for Internet Messages". Requests for Comments. doi:10.17487/rfc1922.

Tamaru, K. (1997). "RFC 2237: Japanese Character Encoding for Internet Messages". Requests for Comments. doi:10.17487/rfc2237.

Other published works cited

ISBN 9780596514471.

Further reading

ISBN 1-56592-224-7.

External links

ISO/IEC 2022:1994

ISO/IEC 2022:1994/Cor 1:1999

ECMA-35, equivalent to ISO/IEC 2022 and freely downloadable.

International Register of Coded Character Sets to be Used with Escape Sequences, a full list of assigned character sets and their escape sequences

History of Character Codes in North America, Europe, and East Asia from 1999, rev. 2004

Ken Lunde's CJK.INF: a document on encoding Chinese, Japanese, and Korean (CJK) languages, including a discussion of the variants of ISO/IEC 2022.



Retrieved from "https://en.wikipedia.org/w/index.php?title=ISO/IEC_2022&oldid=1211542858"

[20] Japanese: 区点, romanized: kuten; Chinese: 区位; pinyin: qūwèi; Korean: 행렬; Hanja: 行列; RR: haeng-nyeol

[21] Japanese: 区, romanized: ku, lit. 'zone'; Chinese: 区; pinyin: qū; Korean: 행; Hanja: 行; RR: haeng

[22] Japanese: 点, romanized: ten, lit. 'point'; Chinese: 位; pinyin: wèi; lit. 'position'; Korean: 열; Hanja: 列; RR: yeol

[23] Japanese: 面, romanized: men, lit. 'face'

[101] Encodings such as the SoftBank 2G emoji encoding use additional escapes of this form for non-ISO-2022-compliant purposes.^[96]

[102] Listed by MARC-8.^[3] See footnote for ESC , F below for background.

[104] F, adjusted to the range 1–63, indicates which (upwardly compatible) revision of the immediately following registration is needed, so that older systems can recognize that they are out of date.^[97]

[106] In earlier editions, 96-character sets did not exist, and the escape sequences now used for 96-character sets were reserved for additional 94-character sets. Accordingly, the ESC , (0x1B 0x2C) sequence was defined in early editions of the standard as designating further 94-character sets to G0.^[98] Since 96-character sets cannot be designated to G0, this first I byte is not used by the current edition of the standard. However, it is still listed by MARC-8.^[3]

[125] See also, for instance, Printronix (2012), OKI® Programmer's Reference Manual (PDF), p. 26 for a more recent system which uses ESC ( H to switch to ASCII from a DBCS.

[1] ECMA-35 (1994), Brief History

[2] ECMA-35 (1994), p. 51, annex D

[3] "Technique 2: Using standard alternate graphic character sets". MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media. Library of Congress. 2007-12-05. Archived from the original on 2020-07-22. Retrieved 2020-07-19.

[4] "ECMA-35: Character code structure and extension techniques (web page)". Ecma International. Archived from the original on 2022-04-25. Retrieved 2022-04-27.

[5] ECMA-35 (1994), pp. 15–16, chapter 8.1

[6] ECMA-35 (1994), chapter 13

[7] ECMA-35 (1994), chapters 12, 14

[8] ECMA-35 (1994), chapter 11

[9] ISO/IEC FDIS 8859-10 (1998), p. 1, chapter 1 ("Scope")

[10] ECMA-144 (2000), p. 1, chapter 1 ("Scope")

[11] Lunde (2008), pp. 242–245, Chapter 4 ("Encoding Methods"), section "EUC encoding"

[12] Lunde (2008), pp. 253–255, Chapter 4 ("Encoding Methods"), section "EUC versus ISO-2022 encodings"

[13] ISO-IR-196 (1996)

[14] Moy, Edward; Gildea, Stephen; Dickey, Thomas. "Controls beginning with ESC". XTerm Control Sequences. Archived from the original on 2019-10-10. Retrieved 2019-10-04.

[15] ECMA-35 (1994), chapters 6, 7

[16] ECMA-35 (1994), chapter 8

[17] ECMA-35 (1994), chapter 9

[18] ECMA-35 (1994), chapter 15

[19] Lunde (2008), pp. 228–234, Chapter 4 ("Encoding Methods"), section "ISO-2022 encoding"

[24] Lunde (2008), pp. 19–20, Chapter 1 ("CJKV Information Processing Overview"), section "What are Row-Cell and Plane-Row-Cell?"

[25] ECMA-35 (1994), p. 4, definition 4.11

[26] ECMA-35 (1994), p. 5, definition 4.18

[27] JIS X 0201 Roman set as ESC 2/8 4/10.

[28] ECMA-35 (1994), p. 5, chapter 5.1

[29] JIS X 0201 Roman set as ESC ( J.

[30] ECMA-35 (1994), p. 7, chapter 6.2

[31] ECMA-35 (1994), p. 10, chapter 6.3.2

[32] ECMA-35 (1994), p. 4, definition 4.17

[33] ECMA-35 (1994), p. 4, definition 4.14

[34] ECMA-35 (1994), p. 28, chapter 13.1

[35] ECMA-35 (1994), p. 33, chapter 13.3.3

[36] ECMA-48 (1991), pp. 24–26, chapter 5.4

[37] ECMA-35 (1994), p. 11, chapter 6.4.3

[38] ISO-IR-208 (1999)

[39] ISO-IR-155 (1990)

[40] ISO-IR-164 (1992)

[41] ECMA-35 (1994), p. 10, chapter 6.3.3

[42] Google Inc. (2014). "ansi.go, line 134". ANSI escape sequence library for Go. Archived from the original on 2022-04-30. Retrieved 2019-09-14.

[43] ECMA-43 (1991), p. 5, chapter 7 ("Specification of the characters of the 8-bit code")

[44] ISO/IEC FDIS 8859-10 (1998), p. 3, chapter 6 ("Specification of the coded character set")

[45] ECMA-144 (2000), p. 3, chapter 6 ("Specification of the coded character set")

[46] ECMA-43 (1991), p. 19, annex C ("Composite graphic characters")

[47] ECMA-35 (1994), p. 10, chapter 6.4.1

[48] ECMA-35 (1994), p. 11, chapter 6.4.4

[49] ECMA-35 (1994), p. 11, chapter 6.4.2

[50] ISO-IR-104 (1985)

[51] ISO-IR-1 (1975)

[52] ECMA-35 (1994), p. 19, chapter 8.5.1

[53] ECMA-35 (1994), p. 19, chapter 8.5.2

[54] ECMA-43 (1991), p. 8, chapter 7.6 ("C1 set")

[55] ECMA-35 (1994), p. 29, chapter 13.2.1

[56] ECMA-35 (1994), p. 12, chapter 6.5.1

[57] ECMA-35 (1994), p. 12, chapter 6.5.2

[58] ISO-IR, p. 19, chapter 2.7 ("Single control functions")

[59] ECMA-35 (1994), p. 12, chapter 6.5.4

[60] ECMA-48 (1991), chapter 5.5

[61] ISO-IR-35.

[62] ECMA-35 (1994), p. 12, chapter 6.5.3

[63] ECMA-35 (1994), p. 14, chapter 7.3, table 2

[64] ISO-IR-14 (1975)

[65] ITU-T (1995-08-11). Recommendation T.51 (1992) Amendment 1. Archived from the original on 2020-08-02. Retrieved 2019-12-25.

[66] ISO-IR-106 (1985)

[67] ECMA-35 (1994), p. 15, chapter 7.3, note 23

[68] ISO-IR-140 (1987)

[69] ISO-IR-7 (1975)

[70] ISO-IR-26 (1976)

[71] ISO-IR-36 (1977)

[72] ECMA-35 (1980), p. 8, chapter 5.1.7

[73] ISO-IR-105 (1985)

[74] ECMA-35 (1994), p. 17, chapter 8.3.1

[75] ECMA-35 (1994), p. 23, chapter 9.3.1

[76] ECMA-35 (1994), p. 19, chapter 8.4

[77] ECMA-35 (1994), p. 17, chapter 8.3.2

[78] ECMA-35 (1994), pp. 23–24, chapter 9.4

[79] ECMA-35 (1994), p. 27, chapter 11.1

[80] ECMA-35 (1994), p. 17, chapter 8.3.3

[81] ECMA-35 (1994), p. 47, annex B

[82] ISO-IR, p. 2, chapter 1 ("Introduction")

[83] ISO/IEC 2375 (2003)

[84] "Handling of the SGML declaration in SP". SP: an SGML System Conforming to International Standard ISO 8879.

[85] W3C.

[86] ISO-IR, p. 10, chapter 2.2 ("94-Character graphic character set with second Intermediate byte")

[87] ARIB STD-B24 (2008), p. 39, part 2, Table 7-3

[88] Mascheck, Sven; Le Breton, Stefan; Hamilton, Richard L. "About the 'alternate linedrawing character set'". ~sven_mascheck/. Archived from the original on 2019-12-29. Retrieved 2020-01-08.

[89] ECMA-35 (1994), p. 36, chapter 14.4

[90] ECMA-35 (1994), p. 36, chapter 14.4.2, note 48

[91] ECMA-35 (1994), p. 36, chapter 14.4.2, note 47

[92] ETS 300 706 (1997), p. 103, chapter 14 ("Dynamically Re-definable Characters")

[93] ECMA-35 (1994), pp. 35–36, chapter 14.3.2

[94] ISO/IEC 10646 (2017), pp. 19–20, chapter 12.4 ("Identification of control function set")

[95] ECMA-35 (1994), p. 32, table 5

[96] ECMA-35 (1994), pp. 37–41, chapter 15.2

[97] ECMA-35 (1994), p. 34, chapter 14.2.2

[98] ECMA-35 (1994), p. 34, chapter 14.2.3

[99] Digital. "DECDWL—Double-Width, Single-Height Line". VT510 Video Terminal Programmer Information. Archived from the original on 2020-08-02. Retrieved 2020-01-17.

[100] Kawasaki, Yusuke (2010). "Encode::JP::Emoji::Encoding". Encode-JP-Emoji. Line 268. Archived from the original on 2022-04-30. Retrieved 2020-05-28.

[103] ECMA-35 (1994), pp. 36–37, chapter 14.5

[105] ECMA-35 (1980), pp. 14–15, chapter 5.3.7

[107] ISO-IR, p. 20, chapter 2.8.1 ("Coding systems with Standard return")

[108] ECMA-35 (1994), pp. 41–42, chapter 15.4

[109] ISO-IR, p. 21, chapter 2.8.2 ("Coding systems without Standard return")

[110] ECMA-35 (1994), p. 41, chapter 15.3

[111] ISO/IEC 10646 (2017), p. 19, chapter 12.2 ("Identification of a UCS encoding scheme")

[112] ISO/IEC 10646 (2017), pp. 18–19, chapter 12.1 ("Purpose and context of identification")

[113] ISO-IR-192 (1996)

[114] ISO-IR-195 (1996)

[115] ISO/IEC 10646 (2017), p. 20, chapter 12.5 ("Identification of the coding system of ISO/IEC 2022")

[116] Scheifler (1989), § Non-Standard Character Set Encodings

[117] Lunde (2008), pp. 229–230, Chapter 4 ("Encoding Methods"), section "ISO-2022 encoding": "Those encodings that have been extensively used in the past, or continue to be used today for some purposes, have been highlighted."

[118] "Additional Coding-related Required Information". IBM Globalization - Coded Character Set Identifiers. IBM. Archived from the original on 2015-01-07.

[119] WHATWG Encoding Standard, section 2 ("Security background")

[120] WHATWG Encoding Standard, chapter 4.2 ("Names and labels"), anchor "replacement"

[121] WHATWG Encoding Standard, section 14.1 ("replacement")

[122] RFC 1468 (1993)

[123] "Code Page Identifiers". Windows Dev Center. Microsoft. Archived from the original on 2019-06-16. Retrieved 2019-09-16.

[124] WHATWG Encoding Standard, section 12.2 ("ISO-2022-JP")

[126] Chang, Hye-Shik. "Modules/cjkcodecs/_codecs_iso2022.c, line 1122". cPython source tree. Python Software Foundation. Archived from the original on 2022-04-30. Retrieved 2019-09-15.

[127] "codecs — Codec registry and base classes § Standard Encodings". Python 3.7.4 documentation. Python Software Foundation. Archived from the original on 2019-07-28. Retrieved 2019-09-16.

[128] "2: Codesets and Codeset Conversion". DIGITAL UNIX Technical Reference for Using Japanese Features. Digital Equipment Corporation, Compaq. [dead link]

[129] Lunde (2008), pp. 236–238, Chapter 4 ("Encoding Methods"), section "The predecessor of ISO-2022-JP encoding—JIS encoding"

[130] RFC 1554 (1993)

[131] RFC 2237 (1997)

[132] "PQ02042: New Function to Provide C/370 iconv() Support for Japanese ISO-2022-JP". IBM. 2021-01-19. Archived from the original on 2022-01-04. Retrieved 2022-01-04.

[133] "CCSID 9148". IBM Globalization - Coded Character Set Identifiers. IBM. Archived from the original on 2014-11-29.

[134] "CCSID 956". IBM Globalization - Coded Character Set Identifiers. IBM. Archived from the original on 2014-12-02.

[135] "CCSID 957". IBM Globalization - Coded Character Set Identifiers. IBM. Archived from the original on 2014-11-30.

[136] "CCSID 958". IBM Globalization - Coded Character Set Identifiers. IBM. Archived from the original on 2014-12-01.

[137] "CCSID 959". IBM Globalization - Coded Character Set Identifiers. IBM. Archived from the original on 2014-12-02.

[138] "CCSID 5052". IBM Globalization - Coded Character Set Identifiers. IBM. Archived from the original on 2014-11-29.

[139] "CCSID 5053". IBM Globalization - Coded Character Set Identifiers. IBM. Archived from the original on 2014-11-29.

[140] "CCSID 5054". IBM Globalization - Coded Character Set Identifiers. IBM. Archived from the original on 2014-11-29.

[141] "CCSID 5055". IBM Globalization - Coded Character Set Identifiers. IBM. Archived from the original on 2014-11-29.

[142] RFC 1557 (1993)

[143] "KS X 1001:1992" (PDF). Archived (PDF) from the original on 2007-09-26. Retrieved 2007-07-12.

[144] ISO-IR-149 (1988)

[145] RFC 1922 (1996)

[146] ECMA-43 (1991), pp. 9–10, chapter 8 ("Levels")

[147] ECMA-43 (1985), pp. 7–11, chapter 7.3 ("The G0 set")

[148] ECMA-43 (1991), pp. 6–8, chapter 7.4 ("G0 set")

[149] ECMA-43 (1991), p. 11, chapter 10.3 ("Identification of a version")

[150] ECMA-43 (1991), p. 23, annex E ("Main differences between the second edition (1985) and the present (third) edition of this ECMA Standard")

[151] IPTC (1995). The IPTC Recommended Message Format (PDF) (5th ed.). IPTC TEC 7901. Archived (PDF) from the original on 2022-01-25. Retrieved 2020-01-14.

[152] ECMA-43 (1991), p. 10, chapter 9.2 ("Unique coding of characters")

[153] van Wingen, Johan W (1999). "8. Code Extension, ISO 2022 and 2375, ISO 4873 and 10367". Character sets. Letters, tokens and codes. Terena. Archived from the original on 2020-08-01. Retrieved 2019-10-02.

[154] ECMA-43 (1991), pp. 10–11, chapter 10 ("Identification of version and level")

[155] IBM. "Character Data Representation Architecture (CDRA)". IBM. pp. 157–162. Archived from the original on 2019-06-23. Retrieved 2020-06-18.

[156] Scheifler (1989)

[157] Scheifler (1989), § Control Characters

[158] Scheifler (1989), § Directionality

[159] Scheifler (1989), § Standard Character Set Encodings

[160] Scheifler (1989), § Approved Standard Encodings

[161] "DICOM PS3.2 2016d - Conformance; D.6.2 Character Sets; D.6 Support of Character Sets". Archived from the original on 2020-02-16. Retrieved 2020-05-21.

[162] "DICOM ISO 2022 variation". Archived from the original on 2013-04-30. Retrieved 2009-07-25.

[163] Sivonen, Henri (2018-12-17). "(UNSUBMITTED DRAFT) No U+FFFD Generation for Zero-Length ASCII-State Content between ISO-2022-JP Escape Sequences" (PDF). Archived (PDF) from the original on 2019-02-21. Retrieved 2019-02-21.

[164] "935453 - Gather telemetry about HZ and other encodings we might try to remove". Archived from the original on 2017-05-19. Retrieved 2018-06-18.

[165] Davis, Mark; Suignard, Michel (2014-09-19). "3.6.2 Some Output For All Input". Unicode Technical Report #36: Unicode Security Considerations (revision 15). Unicode Consortium. Archived from the original on 2019-02-22. Retrieved 2019-02-21.
