ISO/IEC 646

Source: Wikipedia, the free encyclopedia.

ISO/IEC 646 encoding family
DEC NRCS, World System Teletext
Adaptations to other alphabets:
ELOT 927, Symbol, KOI-7, SRPSCII and MAKSCII, ASMO 449, SI 960

ISO/IEC 646 is a set of ISO/IEC standards, described as Information technology — ISO 7-bit coded character set for information interchange and developed in cooperation with ASCII at least since 1964.[1][2] Since its first edition in 1967[3] it has specified a 7-bit character code from which several national standards are derived.

ISO/IEC 646 was also ratified by ECMA as ECMA-6. The first version of ECMA-6 had been published in 1965,[4] based on work that ECMA's Technical Committee TC1 had carried out since December 1960.[4]

Characters in the ISO/IEC 646 Basic Character Set are invariant characters.[5] Because this portion of ISO/IEC 646, the invariant character set shared by all countries, specified only the letters of the ISO basic Latin alphabet, countries whose languages use additional letters had to create national variants of ISO/IEC 646 to write their native scripts. Since 8-bit transmission and storage were not standard at the time, the national characters had to fit within 7 bits. They therefore replaced some characters that appear in ASCII, and those characters are absent from the other national variants of ISO/IEC 646.
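To make the idea of a national variant concrete, here is a minimal Python sketch that decodes 7-bit data under one variant by substituting only the replaced code points and leaving every other byte with its ASCII meaning. The `DE_VARIANT` table is taken from the German (DIN 66003) row of the variant comparison chart below; the function and table names are illustrative, not part of any standard API.

```python
# Hypothetical table for the German variant (DIN 66003, ISO-IR-21),
# transcribed from the variant comparison chart in this article.
DE_VARIANT = {
    0x40: "§",
    0x5B: "Ä",
    0x5C: "Ö",
    0x5D: "Ü",
    0x7B: "ä",
    0x7C: "ö",
    0x7D: "ü",
    0x7E: "ß",
}


def decode_646(data: bytes, variant: dict[int, str]) -> str:
    """Decode 7-bit data, substituting the variant's national characters."""
    chars = []
    for byte in data:
        if byte > 0x7F:
            raise ValueError(f"byte 0x{byte:02X} is not a 7-bit code")
        # Code points the variant does not replace keep their ASCII meaning.
        chars.append(variant.get(byte, chr(byte)))
    return "".join(chars)


raw = b"Gr}~e aus K|ln"
print(decode_646(raw, {}))          # Gr}~e aus K|ln  (read as US-ASCII)
print(decode_646(raw, DE_VARIANT))  # Grüße aus Köln  (read as DIN 66003)
```

The same bytes give different text under the two readings, which is why 7-bit data in a national variant could not be interpreted without knowing which variant was in use.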

History


ISO/IEC 646 and its predecessor ASCII (ASA X3.4) largely endorsed existing practice regarding character encodings in the telecommunications industry.


As ASCII did not provide a number of characters needed for languages other than English, a number of national variants were made that substituted some less-used characters with needed ones. Because the national variants were mutually incompatible, an International Reference Version (IRV) of ISO/IEC 646 was introduced, in an attempt to at least restrict the replaced set to the same characters in all variants. The original version (ISO 646 IRV) differed from ASCII: in the 1973 IRV, code point 0x24 was the currency sign (¤) rather than the dollar sign ($), and 0x7E was an overline rather than a tilde.

The corresponding ITU-T Recommendation T.50 is known as International Alphabet No. 5 (IA5). This standard allows users to assign the 12 variable characters (two alternative graphic characters and 10 nationally defined characters). Of these possible assignments, the ISO 646:1991 IRV (International Reference Version) is explicitly defined, and it is identical to ASCII.[6]

The ISO/IEC 10646 standard, directly related to Unicode, supersedes all of the ISO 646 and ISO/IEC 8859 sets with one unified set of character encodings using a larger 21-bit value.


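A legacy of ISO/IEC 646 is visible on Windows, where in many East Asian locales the backslash character used in filenames is rendered as ¥ or as other characters such as ₩. Another legacy is the existence of trigraphs in the C programming language.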
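C's nine trigraphs (??= for #, ??( for [, ??/ for \, ??) for ], ??' for ^, ??< for {, ??! for |, ??> for } and ??- for ~) let programmers write the characters at ISO/IEC 646 national code points that C's syntax needs using only invariant characters. The following is a minimal Python sketch of the replacement step, done as a single left-to-right scan; `replace_trigraphs` is a hypothetical helper, not any compiler's actual implementation.

```python
# Third character of each "??X" trigraph -> the character it stands for.
TRIGRAPHS = {
    "=": "#",
    "(": "[",
    "/": "\\",
    ")": "]",
    "'": "^",
    "<": "{",
    "!": "|",
    ">": "}",
    "-": "~",
}


def replace_trigraphs(source: str) -> str:
    """Replace every ??X trigraph in one left-to-right pass."""
    out = []
    i = 0
    while i < len(source):
        if (source.startswith("??", i)
                and i + 2 < len(source)
                and source[i + 2] in TRIGRAPHS):
            out.append(TRIGRAPHS[source[i + 2]])
            i += 3  # consume the whole trigraph
        else:
            out.append(source[i])
            i += 1
    return "".join(out)


print(replace_trigraphs("??=include <stdio.h>"))          # #include <stdio.h>
print(replace_trigraphs("int a??(2??) = ??< 1, 2 ??>;"))  # int a[2] = { 1, 2 };
```

Because the scan never re-examines replaced output, "???=" becomes "?#", matching the single-pass behaviour of trigraph replacement.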

Published standards

  • ECMA-6 (1965-04-30), first edition (withdrawn)[4]
  • ISO/R646-1967 (withdrawn),[3] or ECMA-6 (1967-06), second edition (withdrawn)[3][4]
  • ECMA-6 (1970-07), third edition (withdrawn)[4][7]
  • ISO 646:1972 (withdrawn), or ECMA-6 (1973-08), fourth edition (withdrawn)[4][7]
  • ISO 646:1983 (withdrawn),[8] or ECMA-6 (1984-12, 1985-03), fifth edition (withdrawn)[4]
  • ITU-T Recommendation T.50 IA5 (1988-11-25) (withdrawn),[9][10] or ISO/IEC 646:1991 (in force),[11][12] or ECMA-6 (1991-12, 1997-08), sixth edition (in force)[11]
  • ITU-T Recommendation T.50 IRA (1992-09-18) (in force)[9][13]

Code page layout

The following table shows the ISO/IEC 646 invariant character set. Each character is shown with its Unicode equivalent. National code points are shown in gray, with the ASCII character they replace. Yellow indicates a character that, in some regions, could be combined with a preceding character as a diacritic using the backspace character, which may affect the choice of glyph.

In addition to the invariant set restrictions, 0x23 is restricted to be either # or £, and 0x24 is restricted to be either $ or ¤ in ECMA-6:1991, equivalent to ISO/IEC 646:1991.[14] However, these restrictions are not followed by all national variants.[15][16]
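The ECMA-6:1991 restriction on 0x23 and 0x24 can be checked mechanically. This hypothetical Python helper reports which of the two code points a variant assigns outside the permitted pairs; the GB, NO2 and CN assignments are transcribed from the variant comparison chart below and show how ISO-IR-61 and ISO-IR-57 depart from the restriction.

```python
# Permitted assignments for the two "alternative" code points (ECMA-6:1991).
ALLOWED = {
    0x23: {"#", "£"},
    0x24: {"$", "¤"},
}


def violations(variant: dict[int, str]) -> list[int]:
    """Return the restricted code points whose assignment is not permitted."""
    bad = []
    for code, allowed in ALLOWED.items():
        # A code point the variant does not replace keeps its ASCII character.
        char = variant.get(code, chr(code))
        if char not in allowed:
            bad.append(code)
    return bad


# Assignments at 0x23/0x24, taken from the variant comparison chart.
GB = {0x23: "£", 0x24: "$"}
NO2 = {0x23: "§", 0x24: "$"}  # ISO-IR-61
CN = {0x23: "#", 0x24: "¥"}   # ISO-IR-57

for name, table in [("GB", GB), ("NO2", NO2), ("CN", CN)]:
    print(name, [hex(c) for c in violations(table)])
# GB []
# NO2 ['0x23']
# CN ['0x24']
```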

ISO/IEC 646(-INV)
     0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
0x   NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
1x   DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
2x   SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3x   0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4x   @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5x   P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6x   `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7x   p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
National code points: 0x23, 0x24, 0x40, 0x5B–0x5E, 0x60 and 0x7B–0x7E.
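Text that uses only invariant characters reads the same under every national variant. As a minimal sketch, assuming the 12 national code points of the INV row in the variant comparison chart below (0x23, 0x24, 0x40, 0x5B–0x5E, 0x60, 0x7B–0x7E), the following Python function lists the characters in a string that are not portable across variants; the function name is illustrative.

```python
# The 12 code points that national variants may redefine.
NATIONAL_CODE_POINTS = frozenset(
    [0x23, 0x24, 0x40, 0x5B, 0x5C, 0x5D, 0x5E, 0x60, 0x7B, 0x7C, 0x7D, 0x7E]
)


def non_invariant(text: str) -> list[str]:
    """Return the characters of `text` that are outside the invariant set."""
    found = []
    for ch in text:
        code = ord(ch)
        # Anything beyond 7 bits, or at a national code point, is not portable.
        if code > 0x7F or code in NATIONAL_CODE_POINTS:
            found.append(ch)
    return found


print(non_invariant('PRINT "HELLO, WORLD"'))  # []
print(non_invariant("a[i] = b{j} # note"))    # ['[', ']', '{', '}', '#']
```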

Variant codes and descriptions

ISO/IEC 646 national variants

Some national variants of ISO/IEC 646 are as follows:

Code ISO-IR ISO/IEC ESC Approved National Standard Description
CA 121 ESC 2/8 7/7 ISO 646 CSA Z243.4-1985-1 Canada (No. 1 alternative, with "î") (French, classical) (Code page 1020[17])
CA2 122 ESC 2/8 7/8 ISO 646 CSA Z243.4-1985-2 Canada (No. 2 alternative, with "É") (French, reformed orthography)
CN 57[18] ESC 2/8 5/4 ? GB/T 1988-80 People's Republic of China (Basic Latin)
CU 151 ESC 2/8 2/1 4/1 ISO 646 NC 99-10:81 / NC NC00-10:81 Cuba (Spanish)
DANO 9-1[19] ESC 2/8 4/5[19] SIS ? NATS-DANO Norway and Denmark (journalistic texts). Invariant code point 0x22 is displayed as « (compare " in the IRV); it is, however, still considered a double quotation mark.[20] Accompanies SEFI (NATS-SEFI).
DE 21[19][18] ESC 2/8 4/11[19] ISO 646 DIN 66003 Germany (German) (Code page 1011,[21] 20106[22][23][24])
DK ? DS 2089[25][26] Denmark (Danish) (Code page 1017[27])
ES 17[19] ESC 2/8 5/10[19] ECMA Olivetti Spanish (international) (Code page 1023[28])
ES2 85[18] ESC 2/8 6/8 ECMA IBM Spain (Basque, Castilian, Catalan, Galician) (Code page 1014[29])
FI 10[18] ISO 646 SFS 4017 Finland (basic version) (Code page 1018[30])
FR 69[18] ESC 2/8 6/6 ISO 646 AFNOR NF Z 62010-1982 France (French) (Code page 1010[31])
FR1 25[19][18] ESC 2/8 5/2[19] ISO 646 AFNOR NF Z 62010-1973 France (obsolete since April 1985) (Code page 1104[32])
GB 4[19][18] ESC 2/8 4/1[19] ISO 646 BS 4730 United Kingdom (English) (Code page 1013[33])
HU 86 ESC 2/8 6/9 ISO 646 MSZ 7795/3 Hungary (Hungarian)
IE 207 ? NSAI 433:1996 Ireland (Irish)
INV 170 ESC 2/8 2/1 4/2 ISO 646 ISO 646:1983 Invariant subset
(IRV) 2[19][18] ESC 2/8 4/0[19] ISO 646 ISO 646:1973 International Reference Version. 0x7E as an overline (ISO-IR-002).[34]
? ? ISO 646 ISO 646:1983 International Reference Version. 0x7E as a tilde.
ISO 646:1991 International Reference Version matches the US variant (see below).
IS ? ? ? Iceland (Icelandic)
IT 15[19][18] ESC 2/8 5/9[19] ECMA UNI 0204-70 / Olivetti? Italian (Code page 1012[37])
JP 14[19][18] ESC 2/8 4/10[19] ISO 646 JIS C 6220:1969-ro Japan (Romaji) (Code page 895[38]). Also used as an 8-bit code with the corresponding Katakana supplementary set.
JP-OCR-B 92 ESC 2/8 6/14 ISO 646 JIS C 6229-1984-b Japan (OCR-B)
KR ? KS C 5636-1989 South Korea
MT ? ? Malta (Maltese, English)
NL ECMA IBM Netherlands (Dutch) (Code page 1019[39])
NO 60[18] ESC 2/8 6/0 ISO 646 NS 4551 version 1[18] Norway (Code page 1016[40])
NO2 61[18] ESC 2/8 6/1 ISO 646 NS 4551 version 2[18] Norway (obsolete since June 1987) (Code page 20108[22][23][41])
PL BN-74/3101-01 Poland (Polish has 18 letters with diacritical marks, but for lack of code space only the 9 lowercase letters are encoded.)
PT 16[18] ESC 2/8 4/12 ECMA Olivetti Portuguese (international)
PT2 84[18] ESC 2/8 6/7 ECMA IBM Portugal (Portuguese, Spanish) (Code page 1015[42])
SE 10[19][18] ESC 2/8 4/7[19] ISO 646 SIS 63 61 27 Sweden (basic Swedish) (Code page 1018,[30] D47)
SE2 11[19][18] ESC 2/8 4/8[19] ISO 646 SIS 63 61 27 Sweden (extended Swedish for names) (Code page 20107,[22][23][43] E47)
SEFI 8-1[19] ESC 2/8 4/3[19] SIS NATS-SEFI Sweden and Finland (journalistic texts). Accompanies DANO (NATS-DANO).
T.61-7bit 102 ESC 2/8 7/5 ? CCITT T.61 Recommendation International (Teletex). Also used with the corresponding supplementary set as an 8-bit code.
TW ? CNS 5205-1996 Republic of China (Taiwan)
US / (IRV) 6[19][18] ESC 2/8 4/2[19] ISO 646 ANSI X3.4-1968 and ISO 646:1983 (also IRV in ISO/IEC 646:1991) United States (Code page 367,[44] 20127[22][23][45])
YU 141 ESC 2/8 7/10 ISO 646 JUS I.B1.002 (YUSCII) former Yugoslavia (Croatian, Slovene, Serbian, Bosnian)
INIS 49 ESC 2/8 5/7 IAEA INIS ISO 646 IRV subset
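The escape sequences in the ESC column use the column/row notation of the 7-bit code table: 2/8 means column 2, row 8, that is, byte 0x28 ("("). In ISO/IEC 2022 terms, ESC 2/8 followed by a final byte designates a 94-character graphic set as G0. The following hypothetical Python helper converts the notation into the bytes actually sent.

```python
def escape_bytes(notation: str) -> bytes:
    """Convert notation such as 'ESC 2/8 4/11' into the bytes it denotes.

    Each 'column/row' pair names a cell of the 8-column by 16-row
    7-bit code table, so its byte value is column * 16 + row.
    """
    tokens = notation.split()
    if not tokens or tokens[0] != "ESC":
        raise ValueError("notation must start with ESC")
    out = bytearray([0x1B])  # ESC itself sits at column 1, row 11
    for token in tokens[1:]:
        column_text, sep, row_text = token.partition("/")
        if not sep or not column_text.isdigit() or not row_text.isdigit():
            raise ValueError(f"malformed position {token!r}")
        column, row = int(column_text), int(row_text)
        if column > 7 or row > 15:
            raise ValueError(f"position {token!r} is outside the 7-bit table")
        out.append(column * 16 + row)
    return bytes(out)


print(escape_bytes("ESC 2/8 4/11"))     # b'\x1b(K'  (DE as G0)
print(escape_bytes("ESC 2/8 2/1 4/1"))  # b'\x1b(!A' (CU as G0)
```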

National derivatives

Some national character sets also exist which are based on ISO/IEC 646 but do not strictly follow its invariant set (see also § Derivatives for other alphabets):

Character set ISO-IR ISO ESC Approved National Standard Description
BS_viewdata 47 ESC 2/8 5/6 British Post Office Viewdata and Teletext. The Viewdata square (⌗) replaces the normally invariant underscore (_), which cannot be displayed on the target hardware.[46] This is the same encoding as Microsoft's WST_Engl.
GR / greek7 88 ESC 2/8 6/10 ? HOS ELOT 927 Greece (withdrawn in November 1986). Uses Greek letters in place of Roman ones[47] and hence is not strictly speaking an ISO 646 variant.
greek7-old 18 ESC 2/8 5/11 ECMA ? Greek graphic set. Similar in concept to greek7, but uses a different mapping of letters. Also, the upper case follows the lower case.
Latin-Greek 19 ESC 2/8 5/12 ECMA ? Latin-Greek combined graphics (capitals only). Follows greek7-old, but includes Latin capitals without modification, and Greek capitals over the Latin lower case.
Latin-Greek-1 27[19] ESC 2/8 5/5[19] ECMA Honeywell-Bull Latin-Greek mixed graphics (Greek capitals only).[19] Visually unifies Greek capitals with Latin capitals where possible, and adds the remaining Greek capitals. Unlike the other Greek versions, all Basic Latin letters remain intact. However, it replaces invariant punctuation as well as national characters,[48] and hence is still not strictly speaking an ISO 646 variant.
swi ECMA Olivetti Switzerland (French, German) (Code page 1021[49]) Invariant code point 0x5F is changed from _ to è. Is a DEC NRCS variant, closely related to ISO 646, but lacks a fully ISO 646 compliant equivalent.

Control characters

All the variants listed above are purely graphic character sets, and are to be used with a C0 control character set such as those listed in the following table:

ISO-IR ISO ESC Approved Description
1[19] ESC 2/1 4/0[19] ISO 646 ISO 646 controls[19] ("ASCII controls")
7[19] ESC 2/1 4/1[19] ISO 646 Scandinavian newspaper (NATS) controls[19]
26[19] ESC 2/1 4/3[19] ISO 646 IPTC controls[19]

Associated supplementary character sets

The following table lists supplementary graphical character sets defined by the same standard as specific ISO/IEC 646 variants. These would be selected by using a mechanism such as shift out or the NATS super shift (single shift),[50] or by setting the eighth bit in environments where one was available:

ISO-IR ISO/IEC ESC National Standard Description
8-2[19] ESC 2/8 4/4[19] NATS-SEFI-ADD Supplementary code used with NATS-SEFI.
9-2[19] ESC 2/8 4/6[19] NATS-DANO-ADD Supplementary code used with NATS-DANO.
13[19][18] ESC 2/8 4/9[19] JIS C 6220:1969-jp Katakana, used as a supplementary code with ISO-646-JP.
103 ESC 2/8 7/6 CCITT T.61 Recommendation, Supplementary Set Supplementary code used with T.61.
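As a minimal sketch of how such a supplementary set was used, the following Python decoder combines ISO 646-JP (ISO-IR-14) with the JIS C 6220 Katakana set (ISO-IR-13), selected either by shift out/shift in or by a set eighth bit. It assumes the conventional Unicode mapping of this Katakana set (0x21–0x5F to the halfwidth forms U+FF61–U+FF9F, as used for its successor JIS X 0201) and takes 0x5C as ¥ and 0x7E as an overline in the Roman set; the function names are hypothetical.

```python
SO, SI = 0x0E, 0x0F  # shift out, shift in

# ISO 646-JP differs from ASCII at these two code points.
JIS_ROMAN = {0x5C: "\u00a5", 0x7E: "\u203e"}  # YEN SIGN, OVERLINE


def katakana(code: int) -> str:
    """Map a 7-bit Katakana-set code (0x21-0x5F) to halfwidth Katakana."""
    if not 0x21 <= code <= 0x5F:
        raise ValueError(f"0x{code:02X} is not assigned in the Katakana set")
    return chr(0xFF61 + (code - 0x21))


def decode_jis(data: bytes) -> str:
    """Decode ISO 646-JP plus Katakana selected by SO/SI or the eighth bit."""
    shifted = False
    out = []
    for byte in data:
        if byte == SO:
            shifted = True
        elif byte == SI:
            shifted = False
        elif byte >= 0x80:
            # 8-bit environment: the high bit selects the Katakana set.
            out.append(katakana(byte & 0x7F))
        elif shifted and 0x21 <= byte <= 0x7E:
            # Shifted out: graphic codes come from the Katakana set.
            out.append(katakana(byte))
        else:
            # Controls, space and unshifted graphics use ISO 646-JP.
            out.append(JIS_ROMAN.get(byte, chr(byte)))
    return "".join(out)


seven_bit = b"\\100 \x0e6@6E\x0f"  # SO ... SI around four Katakana codes
eight_bit = b"\\100 " + bytes([0xB6, 0xC0, 0xB6, 0xC5])
print(decode_jis(seven_bit))  # ¥100 ｶﾀｶﾅ
print(decode_jis(eight_bit))  # ¥100 ｶﾀｶﾅ
```

Both inputs decode to the same text, illustrating that the shift mechanism and the eighth bit are two ways of reaching the same supplementary set.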

Variant comparison chart

The specifics of the changes for some of these variants are given in the following table. Character assignments unchanged across all listed variants (i.e. which remain the same as ASCII) are not shown.

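For ease of comparison, the variants detailed include national variants of ISO/IEC 646, DEC's closely related National Replacement Character Set (NRCS) variants, World System Teletext variants, and the International Reference Versions (the 1991 IRV being identical to US-ASCII, which is a subset of ISO/IEC 10646 and Unicode).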

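Several characters could be used as diacritics by combining them with a preceding character through the backspace character. This usage is a legacy of the teletype era, when use of backspace would overstamp a glyph, and may be considered deprecated.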
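The overstrike convention can be carried over to Unicode by turning a letter, backspace and accent sequence into the letter followed by a combining mark, then normalizing to NFC. This is a minimal Python sketch; which characters acted as diacritics varied by region, so the `COMBINING` table is an assumption, and only the letter-first order is handled.

```python
import unicodedata

BS = "\b"

# Assumed mapping from spacing accent characters to Unicode combining marks.
COMBINING = {
    "^": "\u0302",  # circumflex
    "`": "\u0300",  # grave
    "'": "\u0301",  # acute (apostrophe used as acute)
    "~": "\u0303",  # tilde
    '"': "\u0308",  # diaeresis (quotation mark used as umlaut)
    ",": "\u0327",  # cedilla
}


def compose_overstrikes(text: str) -> str:
    """Replace 'letter BS accent' sequences with precomposed characters."""
    out = []
    i = 0
    while i < len(text):
        if (text[i] == BS and out
                and i + 1 < len(text) and text[i + 1] in COMBINING):
            out.append(COMBINING[text[i + 1]])  # attach to previous character
            i += 2
        else:
            out.append(text[i])
            i += 1
    # NFC turns base letter + combining mark into a precomposed letter.
    return unicodedata.normalize("NFC", "".join(out))


print(compose_overstrikes("e\b^te"))     # ête
print(compose_overstrikes("garc\b,on"))  # garçon
```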

Later, when wider character sets gained more acceptance, ISO/IEC 8859, vendor-specific character sets and eventually Unicode became the preferred methods of coding most of these variants.

Variant Code Code Chart Characters for each ISO 646 / NRCS compatible or derived charset
US / IRV (1991) ISO-IR-006[51] ! " # $ & : ? @ [ \ ] ^ _ ` { | } ~
Older International Reference Versions
IRV (1973) ISO-IR-002[34] ! " # ¤ & : ? @ [ \ ] ^ _ ` { | }
IRV (1983) CP01009[52] ! " # ¤ & : ? @ [ \ ] ^ _ ` { | } ~
Invariant and other IRV subsets
INV ISO-IR-170[53] ! "     & : ?           _          
INV (NRCS)[a] --- ! "   $ & : ?                      
INV (Teletext)[a] ETS WST[54] ! "     & : ?                      
INIS Subset[a] ISO-IR-049[55] $ : [ ] |
T.61 ISO-IR-102[56] ! " # ¤ & : ? @ [   ]   _     |    
East Asian
JP ISO-IR-014[57] ! " # $ & : ? @ [ ¥ ] ^ _ ` { | }
JP-OCR-B ISO-IR-092[58] ! " # $ & : ? @ [ ¥ ] ^ _   { | }  
KR (KS X 1003)[59] ! " # $ & : ? @ [ ] ^ _ ` { | }
CN ISO-IR-057[16] ! " # ¥ & : ? @ [ \ ] ^ _ ` { | }
TW (CNS 5205)[59] ! " # $ & : ? @ [ \ ] ^ _ ` { | }
British and Irish
GB ISO-IR-004[60] ! " £ $ & : ? @ [ \ ] ^ _ ` { | }
GB (NRCS) CP01101[61] ! " £ $ & : ? @ [ \ ] ^ _ ` { | } ~
Viewdata[b][c] ISO-IR-047[46] ! " £ $ & : ? @ ½ ¼ ¾ ÷
IE ISO-IR-207[62] ! " £ $ & : ? Ó É Í Ú Á _ ó é í ú á
Italophone or Francophone
IT[d] ISO-IR-015[63] ! " £ $ & : ? § ° ç é ^ _ ù à ò è ì
IT (Teletext)[c] ETS WST[64] ! " £ $ & : ? é ° ç ù à ò è ì
FR (1983) ISO-IR-069[65] ! " £ $ & : ? à ° ç § ^ _ µ é ù è ¨
FR (1973)[d] ISO-IR-025[66] ! " £ $ & : ? à ° ç § ^ _ ` é ù è ¨
FR Teletext[c] ETS WST[64] ! " é ï & : ? à ë ê ù î è â ô û ç
CA[d] ISO-IR-121[67] ! " # $ & : ? à â ç ê î _ ô é ù è û
CA2 ISO-IR-122[68] ! " # $ & : ? à â ç ê É _ ô é ù è û
Francophone-Germanophone
swi (NRCS)[c] CP01021[69] ! " ù $ & : ? à é ç ê î è ô ä ö ü û
Germanophone
DE[d][e] ISO-IR-021[70] ! " # $ & : ? § Ä Ö Ü ^ _ ` ä ö ü ß
Nordic (Eastern) and Baltic
FI / SE ISO-IR-010[71] ! " # ¤ & : ? @ Ä Ö Å ^ _ ` ä ö å
SE2[f] ISO-IR-011[72] ! " # ¤ & : ? É Ä Ö Å Ü _ é ä ö å ü
SE (NRCS) CP01106[73] ! " # $ & : ? É Ä Ö Å Ü _ é ä ö å ü
FI (NRCS) CP01103[74] ! " # $ & : ? @ Ä Ö Å Ü _ é ä ö å ü
SEFI (NATS)[g] ISO-IR-008-1[75] ! " # $ & : ?   Ä Ö Å _ ä ö å
EE (Teletext)[c] ETS WST[64] ! " # õ & : ? Š Ä Ö Ž Ü Õ š ä ö ž ü
LV / LT (Teletext)[c] ETS WST[64] ! " # $ & : ? Š ė ę Ž č ū š ą ų ž į
Nordic (Western)
DK CP01017[76] ! " # ¤ & : ? @ Æ Ø Å Ü _ ` æ ø å ü
DK/NO (NRCS) CP01105[77] ! " # $ & : ? Ä Æ Ø Å Ü _ ä æ ø å ü
DK/NO-alt (NRCS) CP01107[78] ! " # $ & : ? @ Æ Ø Å ^ _ ` æ ø å ~
NO ISO-IR-060[79] ! " # $ & : ? @ Æ Ø Å ^ _ ` æ ø å
NO2 ISO-IR-061[15] ! " § $ & : ? @ Æ Ø Å ^ _ ` æ ø å |
DANO (NATS)[g][h] ISO-IR-009-1[20] ! « » $ & : ?   Æ Ø Å _ æ ø å
IS [80] ! " # ¤ & : ? Ð Þ \ Æ Ö _ ð þ | æ ö
Hispanophone
ES[d] ISO-IR-017[81] ! " £ $ & : ? § ¡ Ñ ¿ ^ _ ` ° ñ ç ~
ES2 ISO-IR-085[82] ! " # $ & : ? · ¡ Ñ Ç ¿ _ ` ´ ñ ç ¨
CU ISO-IR-151[83] ! " # ¤ & : ? @ ¡ Ñ ] ¿ _ ` ´ ñ [ ¨
Hispanophone-Lusophone
ES/PT Teletext[c] ETS WST[64] ! " ç $ & : ? ¡ á é í ó ú ¿ ü ñ è à
Lusophone
PT ISO-IR-016[84] ! " # $ & : ? § Ã Ç Õ ^ _ ` ã ç õ °
PT2 ISO-IR-084[85] ! " # $ & : ? ´ Ã Ç Õ ^ _ ` ã ç õ ~
PT (NRCS) --- ! " # $ & : ? @ Ã Ç Õ ^ _ ` ã ç õ ~
Greek
Latin-GR mixed[c] ISO-IR-027[48] Ξ " Γ ¤ & Ψ Π Δ Ω Θ Φ Λ Σ ` { | }
ISO-IR-088 (GR / ELOT 927), ISO-IR-018 and ISO-IR-019 replace Roman letters with Greek letters and are detailed in a separate chart.
Slavic (Latin script)
YU ISO-IR-141[86] ! " # $ & : ? Ž Š Đ Ć Č _ ž š đ ć č
YU Teletext[c] ETS WST[64] ! " # Ë & : ? Č Ć Ž Đ Š ë č ć ž đ š
YU-alt Teletext[c] ETS WST[64] ! " # $ & : ? Č Ć Ž Đ Š ë č ć ž đ š
CS/CZ/SK (Teletext)[c] ETS WST[64] ! " # ů & : ? č ť ž ý í ř é á ě ú š
PL BN-74/3101-01[80] ! " # & : ? ę ź \ ń ś _ ą ó ł ż ć
PL Teletext[c] ETS WST[64] ! " # ń & : ? ą Ƶ Ś Ł ć ó ę ż ś ł ź
Adaptations for the Cyrillic script replace Roman letters and are detailed in a separate chart
Other
NL CP01019[87] ! " # $ & : ? @ [ \ ] ^ _ ` { | }
NL NRCS CP01102[88] ! " £ $ & : ? ¾ ij ½ | ^ _ ` ¨ ƒ ¼ ´
HU ISO-IR-086[89] ! " # ¤ & : ? Á É Ö Ü ^ _ á é ö ü ˝
MT [80] ! " # $ & : ? @ ġ ż ħ ^ _ ċ Ġ Ż Ħ Ċ
RO (Teletext)[c] ETS WST[64] ! " # ¤ & : ? Ţ Â Ş Ă Î ı ţ â ş ă î
TR (Teletext)[c] ETS WST[64] ! " TL ğ & : ? İ Ş Ö Ç Ü Ğ ı ş ö ç ü
  1. ^ a b c Is a subset of one of the International Reference Versions of ISO 646, but does not include all characters which are present in the invariant set. Included for comparison.
  2. ^ Also UK Teletext.
  3. ^ a b c d e f g h i j k l m n Does not completely conform to the invariant set, but is a closely related derivative of ISO 646. Included here for comparison.
  4. ^ a b c d e ISO 646 variant identical to NRCS variant.
  5. ^ Also World System Teletext (DE)
  6. ^ Also World System Teletext (SE/FI/HU)
  7. ^ Private Use Area, respectively (although it also lists PUA mappings for several other characters which now have UCS code points). Unicode contains a number of space characters which might approximately correspond.
  8. ^ Conformance to the ISO 646 invariant set is questionable, but it is a closely related derivative of ISO 646. Included here for comparison.

Related encoding families

National Replacement Character Set

The National Replacement Character Set (NRCS) is a family of 7-bit encodings introduced in 1983 by DEC with the VT200 series of computer terminals. It is closely related to ISO/IEC 646, being based on a similar invariant subset of ASCII, differing in retaining $ as invariant but not _ (although most NRCS variants retain the _, and hence comply with the ISO/IEC 646 invariant set). Most NRCS variants are closely related to corresponding national ISO/IEC 646 variants where they exist, with the exception of the Dutch variant.

World System Teletext

The European telecommunications standard ETS 300 706, "Enhanced Teletext specification", defines Latin, Greek, Cyrillic, Arabic and Hebrew code sets with several national variants for both Latin and Cyrillic.[64] Like NRCS and ISO/IEC 646, the Latin variants of the family of encodings known as the G0 set are based on a similar invariant subset of ASCII, but they retain neither $ nor _ as invariant. Unlike NRCS variants, they often differ considerably from the corresponding national ISO/IEC 646 variants.

HP

HP has code page 1054, which adds the medium shade (▒, U+2592) at 0x7F.[90] Code page 1052 replaces a few of the ASCII characters in code page 1054.[91]

Code page 1052
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x  
SP
 
!
# $ % &
(
)
* +
,
- . /
3x
0
1
2
3
4
5
6
7
8
9
: ;
=
¢
?
4x
@
A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z
[
®
]
©
_
6x
°
a b c d e f g h i j k l m n o
7x p q r s t u v w x y z §
  Differences from ASCII

Derivatives for other alphabets

Some 7-bit character sets for non-Latin alphabets are derived from the ISO/IEC 646 standard. They are not themselves ISO/IEC 646 variants, because they do not respect its invariant code points (often replacing the letters of at least one case): the national code points provide too little encoding space for these alphabets. Examples include:

  • 7-bit Turkmen (ISO-IR-230).[92]
  • 7-bit Greek.
    • In ELOT 927 (ISO-IR-088),[47] the Greek alphabet is mapped in alphabetical order (except for the final-sigma) to positions 0x61–0x71 and 0x73–0x79, on top of the Latin lowercase letters.
    • ISO-IR-018[93] maps the Greek alphabet over both letter cases using a different scheme (not in alphabetical order, but trying where possible to match Greek letters over Roman letters which correspond in some sense), and ISO-IR-019[94] maps the Greek uppercase alphabet over the Latin lowercase letters using the same scheme as ISO-IR-018.
    • The lower half of the Symbol font character encoding[95] uses its own scheme for mapping Greek letters of both cases over the ASCII Roman letters, also trying to map Greek letters over Roman letters which correspond in some sense, but making different decisions in this regard (see chart below). It also replaces invariant code points 0x22 and 0x27 and five national code points with mathematical symbols. Although not intended for use in typesetting Greek prose, it is sometimes used for that purpose.
    • ISO-IR-027 visually unifies Greek capitals with their Latin homoglyphs and encodes only the remaining Greek capitals; while it is explicitly based on ISO/IEC 646, some of these are mapped to code points which are invariant in ISO/IEC 646 (0x21, 0x3A and 0x3F), and it is therefore not a true ISO/IEC 646 variant.
    • The World System Teletext encoding for Greek uses yet another scheme of mapping Greek letters in alphabetical order over the ASCII letters of both cases, notably including several letters with diacritics.[96]
  • 7-bit Cyrillic
  • 7-bit Hebrew, SI 960. The Hebrew alphabet is mapped to positions 0x60–0x7A, on top of the lowercase Latin letters (and grave accent for aleph). 7-bit Hebrew was always stored in visual order. This mapping with the high bit set, i.e. with the Hebrew letters in 0xE0–0xFA, is ISO/IEC 8859-8. The World System Teletext encoding for Hebrew uses the same letter mappings, but uses BS_Viewdata as its base encoding (whereas SI 960 uses US-ASCII) and includes a shekel sign at 0x7B.
  • 7-bit Arabic, ASMO 449 (ISO-IR-089).[100] The Arabic alphabet is mapped to positions 0x41–0x5A and 0x60–0x6A, on top of both uppercase and lowercase Latin letters.
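Because SI 960 places the Hebrew letters at 0x60–0x7A and ISO/IEC 8859-8 places them at 0xE0–0xFA, converting between the two is a matter of setting the high bit. A minimal Python sketch, relying on Python's built-in `iso8859_8` codec; the function names are hypothetical, and the data stays in the visual order in which 7-bit Hebrew was stored.

```python
def si960_to_iso8859_8(data: bytes) -> bytes:
    """Set the high bit on the Hebrew letters (0x60-0x7A); keep other bytes."""
    return bytes(b | 0x80 if 0x60 <= b <= 0x7A else b for b in data)


def decode_si960(data: bytes) -> str:
    """Decode SI 960 text by way of ISO/IEC 8859-8."""
    return si960_to_iso8859_8(data).decode("iso8859_8")


visual = decode_si960(b"mely")  # the word shalom, stored in visual order
print([hex(ord(c)) for c in visual])  # ['0x5dd', '0x5d5', '0x5dc', '0x5e9']
```

Reversing a single line of purely Hebrew text recovers logical order here, but text mixing Hebrew with digits or Latin letters needs a proper bidirectional algorithm rather than a simple reversal.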

A comparison of some of these encodings is below. Only one case is shown, except in instances where the cases are mapped to different letters. In such instances, the mapping with the smallest code is shown first. Possible transcriptions are given for some letters; where this is omitted, the letter can be considered to correspond to the Roman one which it is mapped over.

English
(ASCII)
Cyrillic alphabets Greek alphabet Hebrew
Semi-transliterative Naturally ordered
Russian
(KOI-7)
Russian,
Bulgarian
(WST
RU/BG
)
Ukrainian
(WST UKR)
Serbian
(SRPSCII)
Macedonian
(MAKSCII)
Serbian,
Macedonian[a]
(WST SRP)
Greek
(
Symbol
)
Greek
(IR-18[93])
Greek
(ELOT 927)
Greek
(WST EL)
Hebrew
(SI 960)
@
`
Ю (ju/yu) Ю (ju/yu) Ю (ju/yu) Ж (ž) Ж (ž) Ч (č)
´
`
@
`
ΐ
ΰ
א (ʾ/ʔ)
A А А (a/á) А А А А Α Α Α Α ב (b)
B Б Б Б Б Б Б Β Β Β Β ג (g)
C Ц (c/ts) Ц (c/ts) Ц (c/ts) Ц (c/ts) Ц (c/ts) Ц (c/ts) Χ (ch/kh) Ψ (ps) Γ (g) Γ (g) ד (d)
D Д Д Д Д Д Д Δ Δ Δ Δ ה (h)
E Е (je/ye) Е (je/ye) Е (e) Е (e) Е (e) Е (e) Ε Ε Ε Ε ו‬ (w)
F Ф Ф Ф Ф Ф Ф Φ (ph/f) Φ (ph/f) Ζ (z) Ζ (z) ז (z)
G Г Г Г Г Г Γ Γ Γ Η (ē) Η (ē) ח (ch/kh)
H Х (h/kh/ch) Х (h/kh/ch) Х (h/kh/ch) Х (h/kh/ch) Х (h/kh/ch) Х (h/kh/ch) Η (ē) Η (ē) Θ (th) Θ (th) ט (tt)
I И И И (y) И И И Ι Ι Ι Ι י (j/y)
J Й (j/y) Й (j/y) Й (j/y) Ј (j/y) Ј (j/y) Ј (j/y) ϑ (th)
ϕ (ph/f)
Ξ (x/ks)   Κ (k) ך (k final)
K К К К К К К Κ Κ Κ Λ (l) כ
L Л Л Л Л Л Л Λ Λ Λ Μ (m) ל
M М М М М М М Μ Μ Μ Ν (n) ם (m final)
N Н Н Н Н Н Н Ν Ν Ν Ξ (x/ks) מ (m)
O О О О О О О Ο Ο Ξ (x/ks) Ο ן (n final)
P П П П П П П Π Π Ο (o) Π נ (n)
Q Я (ja/ya) Я (ja/ya) Я (ja/ya) Љ (lj/ly) Љ (lj/ly) Ќ (Ḱ/kj) Θ (th) ͺ (i) Π (p) Ρ (r) ס (s)
R Р Р Р Р Р Р Ρ Ρ Ρ ʹ
ς (s final)
ע (ʽ/ŋ)
S С С С С С С Σ Σ Σ Σ ף (p final)
T Т Т Т Т Т Т Τ Τ Τ Τ פ (p)
U У У У У У У Υ Θ (th) Υ Υ ץ (ṣ/ts final)
V Ж (ž) Ж (ž) Ж (ž) В В В ς (s final)
ϖ (p)
Ω (ō) Φ (f/ph) Φ (f/ph) צ (ṣ/ts)
W В (v) В (v) В (v) Њ (nj/ny/ñ) Њ (nj/ny/ñ) Ѓ (ǵ/gj) Ω (ō) ς (s final) ς (s final) Χ (ch/kh) ק (q)
X Ь (’) Ь (’) Ь (’) Џ (dž) Џ (dž) Љ (lj/ly) Ξ Χ (ch/kh) Χ (ch/kh) Ψ (ps) ר (r)
Y Ы (y/ı) Ъ (″/ǎ/ŭ) І (i) Ѕ (dz) Ѕ (dz) Њ (nj/ny/ñ) Ψ (ps) Υ (u) Ψ (ps) Ω (ō) ש (š/sh)
Z З З З З З З Ζ Ζ Ω (ō) Ϊ ת (t)
[
{
Ш (š/sh) Ш (š/sh) Ш (š/sh) Ш (š/sh) Ш (š/sh) Ћ (ć) [
{

[
{
Ϋ [
{
\
|
Э (e) Э (e) Є (je/ye) Ђ (đ/dj) Ѓ (ǵ/gj) Ж (ž)
|
᾿
῾ (h)
\
|
ά
ό
\
|
]
}
Щ (šč) Щ (šč) Щ (šč) Ћ (ć) Ќ (Ḱ/kj) Ђ (đ/dj) ]
}

]
}
έ
ύ
]
}
^
~
Ч (č) Ч (č) Ч (č) Ч (č) Ч (č) Ш (š/sh)
~
˜
¨
^
ή
ώ
^
_ Ъ (″) Ы (y/ı) Ї (ji/yi) _ _ Џ (dž) _ _ _ ί _

See also

Footnotes

  1. ^ Ѕ). A subset of Roman letters, mostly those without homoglyphs in the G0 set, are included in the G1 set (15.6.7 Table 41), including S/s at 0x6B/7B. Croatian is written in Latin script.

References

  1. ^ Mullendore, Ralph Elvin (1964) [1963]. Ptak, John F. (ed.). "On the Early Development of ASCII - The History of ASCII". JF Ptak Science Books (published March 2012). Archived from the original on 2016-05-26. Retrieved 2016-05-26.
  2. ^ 6 and 7 Bit Coded Character Sets for Information Processing Interchange (draft), International Organization for Standardization, July 1964 (NB. 21 pages. With cover letter for the members of the X3.2 and Task Groups from Eric Clamons.)
  3. ^ (PDF) from the original on May 26, 2016. Retrieved August 25, 2019.
  4. ^ European Computer Manufacturers Association (Ecma). March 1985. Archived (PDF) from the original on 2016-05-29. Retrieved 2016-05-29. The Technical Committee TC1 of ECMA met for the first time in December 1960 to prepare standard codes for Input/Output purposes. On [30] April 1965, Standard ECMA-6 was adopted by the General Assembly of ECMA.
  5. ^ NISO Circulation Interchange Protocol. Colorado Department of Education, USA: NCIP Standing Committee (NCIP-SC). Archived from the original on 2013-12-24. Retrieved 2016-05-30.
  6. ^ Demchenko, Yuri (2000) [1997]. "International Standardization of 7-Bit Codes, ISO 646". TERENA. 4. Archived from the original on 2016-06-17. Retrieved 2012-08-13.
  7. ^ European Computer Manufacturers Association (Ecma). August 1973. Archived (PDF) from the original on 2016-05-29. Retrieved 2016-05-29.
  8. ^ "Information processing -- ISO 7-bit coded character set for information interchange". 1983-07-01. ISO 646:1983. Archived from the original on 2016-05-30. Retrieved 2016-05-30.
  9. ^ from the original on 2016-06-13. Retrieved 2016-06-13.
  10. ^ The International Telegraph and Telephone Consultative Committee (CCITT) - Series T: Terminal Equipment and Protocols for Telematic Services, 1993-04-16 [1988-11-25], E 33116, archived from the original on 2017-03-19, retrieved 2017-03-18.
  11. ^ European Computer Manufacturers Association (Ecma). August 1997 [December 1991]. Archived (PDF) from the original on 2016-05-29. Retrieved 2016-05-29.
  12. ^ "Information technology -- ISO 7-bit coded character set for information interchange" (3rd ed.). 1991-12-16. ISO/IEC 646:1991. Archived from the original on 2016-05-30. Retrieved 2016-05-30.
  13. ^ The International Telegraph and Telephone Consultative Committee (CCITT) - Terminal Equipment and Protocols for Telematic Services, 1993-04-16 [1992-09-18], E 3177, archived from the original on 2014-12-19, retrieved 2017-03-18.
  14. ^ ECMA (1991). "7-Bit coded Character Set" (PDF). ECMA-6.
  15. ^ ISO-IR-61.
  16. ^ ISO-IR-57.
  17. ^ "SBCS code page information - CPGID: 01020 / Name: Canadian (French) Variant". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1992-10-01. Archived from the original on 2016-06-17. Retrieved 2016-06-17.
  18. ^ Hewlett-Packard Company, LP. June 2003. HP part-number 502-0378. Archived from the original (PDF) on 2016-08-10. Retrieved 2016-08-10.
  19. ^ Bemer, Robert William (July 1978). "Inside ASCII - Part III". Interface Age. 3 (7). Portland, OR, USA: dilithium Press: 80–87.
  20. ^ ISO-IR-9-1.
  21. ^ "SBCS code page information - CPGID: 01011 / Name: 7-Bit Germany F.R." IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1987-08-01. Archived from the original on 2016-06-17. Retrieved 2016-06-17.
  22. ^ a b c d e "Code Page Identifiers". Microsoft Developer Network. Microsoft. 2014. Archived from the original on 2016-06-19. Retrieved 2016-06-19.
  23. ^ a b c d e "Web Encodings - Internet Explorer - Encodings". WHATWG Wiki. 2012-10-23. Archived from the original on 2016-06-20. Retrieved 2016-06-20.
  24. ^ Foller, Antonin (2014) [2011]. "German (IA5) encoding - Windows charsets". WUtils.com - Online web utility and help. Motobit Software. Archived from the original on 2016-06-20. Retrieved 2016-06-20.
  25. ^ Danish Standard DS 2089: Application of ISO 7-bit coded character set. February 1974. UDC 681.3:003.62.
  26. .
  27. ^ "SBCS code page information - CPGID: 01017 / Name: 7-Bit Denmark". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1987-08-01. Archived from the original on 2016-06-17. Retrieved 2016-06-17.
  28. ^ "SBCS code page information - CPGID: 01023 / Name: Spain Variant". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1992-10-01. Archived from the original on 2016-06-17. Retrieved 2016-06-17.
  29. ^ "SBCS code page information - CPGID: 01014 / Name: 7-Bit Spain". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1987-10-01. Archived from the original on 2016-06-17. Retrieved 2016-06-17.
  30. ^ a b "SBCS code page information - CPGID: 01018 / Name: 7-Bit Finland/Sweden". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1987-08-01. Archived from the original on 2016-06-17. Retrieved 2016-06-17.
  31. ^ "SBCS code page information - CPGID: 01010 / Name: 7-Bit France". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1987-08-01. Archived from the original on 2016-06-17. Retrieved 2016-06-17.
  32. ^ "SBCS code page information - CPGID: 01104 / Name: French NRC Set". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1987-08-01. Archived from the original on 2016-06-21. Retrieved 2016-06-21.
  33. ^ "SBCS code page information - CPGID: 01013 / Name: 7-Bit United Kingdom". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1987-08-01. Archived from the original on 2016-06-17. Retrieved 2016-06-17.
  34. ^ ISO-IR-2.
  35. ^ "SBCS code page information - CPGID: 01009 / Name: ISO IRV". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1990-04-01. Archived from the original on 2016-06-17. Retrieved 2016-06-17.
  36. ^ Foller, Antonin (2014) [2011]. "Western European (IA5) encoding - Windows charsets". WUtils.com - Online web utility and help. Motobit Software. Archived from the original on 2016-06-20. Retrieved 2016-06-20.
  37. ^ "SBCS code page information - CPGID: 01012 / Name: 7-Bit Italy". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1987-08-01. Archived from the original on 2016-06-17. Retrieved 2016-06-17.
  38. ^ "SBCS code page information - CPGID: 00895 / Name: Japan 7-Bit Latin". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1986-10-01. Archived from the original on 2016-06-18. Retrieved 2016-06-18.
  39. ^ "SBCS code page information - CPGID: 01019 / Name: 7-Bit Netherlands". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1987-08-01. Archived from the original on 2016-06-17. Retrieved 2016-06-17.
  40. ^ "SBCS code page information - CPGID: 01016 / Name: 7-Bit Norway". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1987-08-01. Archived from the original on 2016-06-17. Retrieved 2016-06-17.
  41. ^ Foller, Antonin (2014) [2011]. "Norwegian (IA5) encoding - Windows charsets". WUtils.com - Online web utility and help. Motobit Software. Archived from the original on 2016-06-20. Retrieved 2016-06-20.
  42. ^ "SBCS code page information - CPGID: 01015 / Name: 7-Bit Portugal". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1987-08-01. Archived from the original on 2016-06-17. Retrieved 2016-06-17.
  43. ^ Foller, Antonin (2014) [2011]. "Swedish (IA5) encoding - Windows charsets". WUtils.com - Online web utility and help. Motobit Software. Archived from the original on 2016-06-20. Retrieved 2016-06-20.
  44. ^ "SBCS code page information - CPGID: 00367 / Name: ASCII". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1978-01-01. Archived from the original on 2016-06-17. Retrieved 2016-06-17.
  45. ^ Foller, Antonin (2014) [2011]. "US-ASCII encoding - Windows charsets". WUtils.com - Online web utility and help. Motobit Software. Archived from the original on 2016-06-20. Retrieved 2016-06-20.
  46. ^ ISO-IR-47.
  47. ^ ISO-IR-88.
  48. ^ ISO-IR-27.
  49. ^ "SBCS code page information - CPGID: 01021 / Name: Switzerland Variant". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. 1. IBM. 1992-10-01. Archived from the original on 2016-06-17. Retrieved 2016-06-17.
  50. ISO-IR-7.
  51. ISO-IR-6.
  52. ^ IBM (1990). "Code Page 01009" (PDF). REGISTRY: Graphic Character Sets and Code Pages.
  53. ISO-IR-170.
  54. ^ "15.6.1 Latin G0 Set", ETS 300 706: Enhanced Teletext specification (PDF), European Telecommunications Standards Institute (ETSI), p. 114
  55. ISO-IR-49.
  56. ISO-IR-102.
  57. ISO-IR-14.
  58. ISO-IR-92.
  59. ^ .
  60. ISO-IR-4.
  61. ^ IBM (1992). "Code Page 01101" (PDF). REGISTRY: Graphic Character Sets and Code Pages.
  62. ISO-IR-207.
  63. ISO-IR-15.
  64. ^ a b c d e f g h i j k l "15.6.2 Latin National Option Sub-Sets, Table 36", ETS 300 706: Enhanced Teletext specification (PDF), European Telecommunications Standards Institute (ETSI), p. 115
  65. ISO-IR-69.
  66. ISO-IR-25.
  67. ISO-IR-121.
  68. ISO-IR-122.
  69. ^ IBM (1992). "Code Page 01021" (PDF). REGISTRY: Graphic Character Sets and Code Pages.
  70. ISO-IR-21.
  71. ISO-IR-10.
  72. ISO-IR-11.
  73. ^ IBM (1992). "Code Page 01106" (PDF). REGISTRY: Graphic Character Sets and Code Pages.
  74. ^ IBM (1992). "Code Page 01103" (PDF). REGISTRY: Graphic Character Sets and Code Pages.
  75. ^ ISO-IR-8-1.
  76. ^ IBM (1987). "Code Page 01017" (PDF). REGISTRY: Graphic Character Sets and Code Pages.
  77. ^ IBM (1992). "Code Page 01105" (PDF). REGISTRY: Graphic Character Sets and Code Pages.
  78. ^ IBM (1992). "Code Page 01107" (PDF). REGISTRY: Graphic Character Sets and Code Pages.
  79. ISO-IR-60.
  80. ^ ISBN 978-9986-680-47-5. (Note: the chart given sometimes mixes up letter cases, e.g. ġ and Ġ both appearing as Ġ in the Maltese row, or Ä and ä both appearing as Ä in the Swedish rows.)
  81. ISO-IR-17.
  82. ISO-IR-85.
  83. ISO-IR-151.
  84. ISO-IR-16.
  85. ISO-IR-84.
  86. ISO-IR-141.
  87. ^ IBM (1987). "Code Page 01019" (PDF). REGISTRY: Graphic Character Sets and Code Pages.
  88. ^ IBM (1992). "Code Page 01102" (PDF). REGISTRY: Graphic Character Sets and Code Pages.
  89. ISO-IR-86.
  90. ^ "Code Page 1054" (PDF). Archived from the original (PDF) on 2013-01-21.
  91. ^ "Code Page 1052" (PDF). Archived from the original (PDF) on 2013-01-21.
  92. ISO-IR-230.
  93. ^ ISO-IR-18.
  94. ISO-IR-19.
  95. ^ "Map (external version) from Mac OS Symbol character set to Unicode 4.0 and later".
  96. ^ "15.6.8: Greek G0 Set", ETS 300 706: Enhanced Teletext specification (PDF), European Telecommunications Standards Institute (ETSI), p. 121
  97. ^ "15.6.5: Cyrillic G0 Set - Option 2 - Russian/Bulgarian", ETS 300 706: Enhanced Teletext specification (PDF), European Telecommunications Standards Institute (ETSI), p. 118
  98. ^ "15.6.6: Cyrillic G0 Set - Option 3 - Ukrainian", ETS 300 706: Enhanced Teletext specification (PDF), European Telecommunications Standards Institute (ETSI), p. 119
  99. ^ "15.6.4: Cyrillic G0 Set - Option 1 - Serbian/Croatian", ETS 300 706: Enhanced Teletext specification (PDF), European Telecommunications Standards Institute (ETSI), p. 117
  100. ISO-IR-89.

Further reading

  • Fischer, Eric, ed. (1975) [1972]. Source documents on the history of character codes, 1972–1975 (Compilation). Archived from the original on 2020-06-07. Retrieved 2020-06-07. Includes Bemer, Robert William (1972). "A view of the history of the ISO character set". Honeywell Computer Journal. 6 (4). Phoenix, Arizona, USA: Honeywell Information Systems: 274–286, 287–291. (13+5 pages), and many other documents and correspondence.

External links