T.51/ISO/IEC 6937
| Field | Value |
|---|---|
| Title | Latin based coded character sets for telematic services |
| Status | In force |
| Year started | 1984 |
| Latest version | (09/92) September 1992 |
| Organization | ITU-T |
| Committee | Study Group VIII |
| Related standards | T.61, ETS 300 706, ISO/IEC 10367, ISO/IEC 2022, ISO 5426 |
| Domain | encoding |
| License | Freely available |
| Website | https://www.itu.int/rec/T-REC-T.51 |
| Based on | ITU T.61 |
T.51 / ISO/IEC 6937:2001, Information technology – Coded graphic character set for text communication – Latin alphabet, is a multibyte extension of ASCII.
ISO/IEC 6937's architects were Hugh McGregor Ross, Peter Fenwick, Bernard Marti and Loek Zeckendorf.
ISO 6937/2 defines 327 characters found in modern European languages using the Latin alphabet.
IANA has registered the charset names ISO_6937-2-25 and ISO_6937-2-add for two older versions of this standard (plus control codes). In practice, however, this character encoding is unused on the Internet.
Single byte characters
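The primary set (first half) originally followed the 1983 International Reference Version of ISO 646, which has the universal currency sign (¤) at 0x24 in place of the dollar sign.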
The supplementary set (second half) contains a selection of spacing and non-spacing graphic characters, additional symbols and some locations reserved for future standardisation.
Both of these are
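The ISO/IEC 2022 escape sequence to designate the supplementary set of ISO/IEC 6937 as the G2 set is ESC . R (hex 1B 2E 52).[2][5][6] The older ISO 6937/2:1983 supplementary set is registered as a 94-character set, and is designated to G2 with ESC * l (hex 1B 2A 6C).[5][7] The two sequences differ in their intermediate byte because ISO/IEC 2022 uses 0x2E to designate a 96-character set to G2 and 0x2A to designate a 94-character set to G2.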
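Designation sequences of this shape can be checked mechanically. The sketch below is a minimal Python illustration, assuming only the three-byte ISO/IEC 2022 form for single-byte graphic sets (ESC, one intermediate byte, one final byte); the function name `parse_designation` is hypothetical, and multi-byte sets and other escape forms are deliberately not handled.

```python
# ISO/IEC 2022 intermediate bytes for designating a single-byte graphic set:
# 0x28-0x2B designate a 94-character set to G0-G3,
# 0x2D-0x2F designate a 96-character set to G1-G3.
DESIGNATORS = {
    0x28: ("G0", 94), 0x29: ("G1", 94), 0x2A: ("G2", 94), 0x2B: ("G3", 94),
    0x2D: ("G1", 96), 0x2E: ("G2", 96), 0x2F: ("G3", 96),
}


def parse_designation(seq: bytes) -> tuple[str, int, str]:
    """Return (register, set size, final character) for ESC I F."""
    if len(seq) != 3 or seq[0] != 0x1B:
        raise ValueError("expected ESC, one intermediate byte and one final byte")
    intermediate, final = seq[1], seq[2]
    if intermediate not in DESIGNATORS:
        raise ValueError(f"unsupported intermediate byte 0x{intermediate:02X}")
    if not 0x30 <= final <= 0x7E:
        raise ValueError(f"invalid final byte 0x{final:02X}")
    register, size = DESIGNATORS[intermediate]
    return register, size, chr(final)


print(parse_designation(b"\x1b.R"))  # ('G2', 96, 'R')
print(parse_designation(b"\x1b*l"))  # ('G2', 94, 'l')
```

The table-driven design keeps the 94/96 distinction in one place, which is exactly the difference between the current and the 1983 designations.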
Two byte characters
Accented letters that are not allocated single codes in the primary or supplementary set are coded using two bytes. The first byte, the "non-spacing diacritical mark", is followed by a letter from the base set, e.g.:
small e with acute accent (é) = [Acute]+e (bytes 0xC2 0x65)
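The two-byte scheme can be sketched as an encoder that decomposes each character and emits the mark before the letter. This is a minimal illustration under stated assumptions: input is limited to ASCII plus precomposed letters that decompose (Unicode NFD) into one ASCII letter and exactly one of the 13 T.51 marks (0xC1–0xCF, excluding 0xC9 and 0xCC). The names `LEAD_BYTE` and `encode_t51` are hypothetical. The sketch does not check the letter against the permitted combinations, and it rejects single-byte supplementary characters such as ß or æ. It does include the T.51 special case in which ģ is coded as 0xC2 followed by g.

```python
import unicodedata

# Unicode combining character (as produced by NFD) -> T.51 lead byte.
LEAD_BYTE = {
    "\u0300": 0xC1,  # grave
    "\u0301": 0xC2,  # acute
    "\u0302": 0xC3,  # circumflex
    "\u0303": 0xC4,  # tilde
    "\u0304": 0xC5,  # macron
    "\u0306": 0xC6,  # breve
    "\u0307": 0xC7,  # dot above
    "\u0308": 0xC8,  # umlaut / diaeresis
    "\u030A": 0xCA,  # ring
    "\u0327": 0xCB,  # cedilla
    "\u030B": 0xCD,  # double acute
    "\u0328": 0xCE,  # ogonek
    "\u030C": 0xCF,  # caron
}


def encode_t51(text: str) -> bytes:
    """Encode ASCII and singly-accented Latin letters as T.51 bytes."""
    out = bytearray()
    for ch in text:
        if ch == "\u0123":  # T.51 codes g with cedilla (ģ) as 0xC2 + g
            out += b"\xc2g"
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        base, marks = decomposed[0], decomposed[1:]
        if ord(base) > 0x7F or len(marks) > 1:
            raise ValueError(f"not handled by this sketch: {ch!r}")
        if marks:
            lead = LEAD_BYTE.get(marks)
            if lead is None:
                raise ValueError(f"no T.51 lead byte for {ch!r}")
            out.append(lead)      # the diacritic comes first ...
        out.append(ord(base))     # ... followed by the base letter
    return bytes(out)


print(encode_t51("é").hex(" "))       # c2 65
print(encode_t51("Dvořák").hex(" "))  # 44 76 6f cf 72 c2 61 6b
```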
The ITU T.51 standard allocates column 4 of the supplementary set (i.e.
This repertoire is also appended to the ITU version of the specification as Annex A, although the ITU version does not reference it from the main text. It is described as a "unified superset" of the Latin-script character repertoires.
This system also differs from the Unicode combining character system in that the diacritic code precedes the letter (as opposed to following it), making it more similar to ANSEL.
One anomaly is that the lowercase Latin Small Letter G with Cedilla (ģ) is coded as if it carried an acute accent, that is, with a 0xC2 lead byte. Because the descender of g leaves no room for a cedilla, the lowercase letter is usually written with a turned comma above instead: Ģ ģ.
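Decoding has to undo both points made above: the mark must be moved after the letter before Unicode composition, and 0xC2 followed by g must become ģ rather than a g with acute. The following is a minimal sketch, assuming input that contains only ASCII bytes and two-byte diacritic sequences (other high bytes are rejected); the lead-byte mapping and the name `decode_t51_letters` are assumptions for illustration, not a complete ISO/IEC 6937 codec.

```python
import unicodedata

# T.51 lead byte -> Unicode combining character for the 13 marks
# (0xC9 and 0xCC are not among them and are therefore omitted).
COMBINING = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC5: "\u0304",  # macron
    0xC6: "\u0306",  # breve
    0xC7: "\u0307",  # dot above
    0xC8: "\u0308",  # umlaut / diaeresis
    0xCA: "\u030A",  # ring
    0xCB: "\u0327",  # cedilla
    0xCD: "\u030B",  # double acute
    0xCE: "\u0328",  # ogonek
    0xCF: "\u030C",  # caron
}


def decode_t51_letters(data: bytes) -> str:
    """Decode ASCII plus two-byte accented letters; other bytes are rejected."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in COMBINING:
            if i + 1 >= len(data):
                raise ValueError("lead byte at end of input")
            base = chr(data[i + 1])
            if not (base.isascii() and base.isalpha()):
                raise ValueError(f"lead byte 0x{b:02X} not followed by a letter")
            if b == 0xC2 and base == "g":
                out.append("\u0123")  # anomaly: 0xC2 + g is ģ, not g with acute
            else:
                # T.51 puts the mark first; Unicode expects it after the letter.
                out.append(unicodedata.normalize("NFC", base + COMBINING[b]))
            i += 2
        elif b < 0x80:
            out.append(chr(b))
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02X} not handled by this sketch")
    return "".join(out)


print(decode_t51_letters(b"Dvo\xcfr\xc2ak"))  # Dvořák
```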
In total, 13 diacritical marks can be followed by selected characters from the primary set:
Accent | Code | Second character | Result |
---|---|---|---|
Grave | 0xC1 | AEIOUaeiou | ÀÈÌÒÙàèìòù |
Acute | 0xC2 | ACEILNORSUYZacegilnorsuyz | ÁĆÉÍĹŃÓŔŚÚÝŹáćéģíĺńóŕśúýź |
Circumflex | 0xC3 | ACEGHIJOSUWYaceghijosuwy | ÂĈÊĜĤÎĴÔŜÛŴŶâĉêĝĥîĵôŝûŵŷ |
Tilde | 0xC4 | AINOUainou | ÃĨÑÕŨãĩñõũ |
Macron | 0xC5 | AEIOUaeiou | ĀĒĪŌŪāēīōū |
Breve | 0xC6 | AGUagu | ĂĞŬăğŭ |
Dot | 0xC7 | CEGIZcegz | ĊĖĠİŻċėġż |
Umlaut or diæresis | 0xC8 | AEIOUYaeiouy | ÄËÏÖÜŸäëïöüÿ |
Ring | 0xCA | AUau | ÅŮåů |
Cedilla | 0xCB | CGKLNRSTcklnrst | ÇĢĶĻŅŖŞŢçķļņŗşţ |
Double Acute | 0xCD | OUou | ŐŰőű |
Ogonek | 0xCE | AEIUaeiu | ĄĘĮŲąęįų |
Caron | 0xCF | CDELNRSTZcdelnrstz | ČĎĚĽŇŘŠŤŽčďěľňřšťž |
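The table above can serve directly as a repertoire check. The sketch below transcribes it into a dictionary and flags lead bytes whose following byte is not a permitted letter. The names `ALLOWED`, `is_valid_pair` and `find_invalid_pairs` are hypothetical. The scan assumes that every byte in 0xC1–0xCF listed here is a lead byte and skips the byte that follows it. Supplementary-set bytes and the 0xCC mark are outside its scope.

```python
# Letters each diacritic lead byte may precede, transcribed from the table.
ALLOWED = {
    0xC1: "AEIOUaeiou",                 # grave
    0xC2: "ACEILNORSUYZacegilnorsuyz",  # acute (0xC2 + g encodes ģ)
    0xC3: "ACEGHIJOSUWYaceghijosuwy",   # circumflex
    0xC4: "AINOUainou",                 # tilde
    0xC5: "AEIOUaeiou",                 # macron
    0xC6: "AGUagu",                     # breve
    0xC7: "CEGIZcegz",                  # dot
    0xC8: "AEIOUYaeiouy",               # umlaut or diaeresis
    0xCA: "AUau",                       # ring
    0xCB: "CGKLNRSTcklnrst",            # cedilla
    0xCD: "OUou",                       # double acute
    0xCE: "AEIUaeiu",                   # ogonek
    0xCF: "CDELNRSTZcdelnrstz",         # caron
}


def is_valid_pair(lead: int, letter: int) -> bool:
    """True if lead byte + letter byte is one of the listed combinations."""
    return chr(letter) in ALLOWED.get(lead, "")


def find_invalid_pairs(data: bytes) -> list[int]:
    """Offsets of lead bytes whose following byte is missing or not permitted."""
    bad = []
    i = 0
    while i < len(data):
        if data[i] in ALLOWED:
            if i + 1 >= len(data) or not is_valid_pair(data[i], data[i + 1]):
                bad.append(i)
            i += 2  # a lead byte always consumes the next byte
        else:
            i += 1
    return bad


print(len(ALLOWED), sum(len(v) for v in ALLOWED.values()))  # 13 155
print(find_invalid_pairs(b"\xc2e\xc7i"))  # [2]: no dot-above form of lowercase i
```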
Codepage layout
The combining characters in the U+0300–U+036F range given below for the codes 0xC1–0xCF are subject to the caveats above (the mark precedes the letter, and 0xC2 followed by g denotes ģ), so these codes cannot simply be mapped to the code points listed. Also, Unicode splits 0xE2 into two characters, uppercase D with stroke (Đ, U+0110) and uppercase Eth (Ð, U+00D0), whose lowercase forms usually look different and have separate codes (0xF2 and 0xF3).
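The loss caused by 0xE2 can be shown in a few lines. This sketch assumes a decoder that maps 0xE2 to U+0110 (Đ). That choice is arbitrary, and the code labels it as such. The dictionaries cover only these three codes, and `lowercase_code` is a hypothetical helper.

```python
# Unicode -> T.51: both capital letters collapse onto the single code 0xE2,
# while their lowercase forms keep separate codes.
TO_T51 = {
    "\u0110": 0xE2,  # Đ  LATIN CAPITAL LETTER D WITH STROKE
    "\u00D0": 0xE2,  # Ð  LATIN CAPITAL LETTER ETH
    "\u0111": 0xF2,  # đ  LATIN SMALL LETTER D WITH STROKE
    "\u00F0": 0xF3,  # ð  LATIN SMALL LETTER ETH
}

# T.51 -> Unicode: a decoder must pick one capital for 0xE2.
# Choosing U+0110 is an arbitrary assumption of this sketch.
FROM_T51 = {0xE2: "\u0110", 0xF2: "\u0111", 0xF3: "\u00F0"}


def lowercase_code(code: int) -> int:
    """Lower-case a T.51 code by decoding, lower-casing and re-encoding."""
    return TO_T51[FROM_T51[code].lower()]


eth = TO_T51["\u00D0"]           # a capital Eth is stored as 0xE2
print(hex(lowercase_code(eth)))  # 0xf2 (đ), although the text meant ð (0xf3)
```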
The older 1988 edition of ITU T.51 defined two versions of the supplementary set, with the first version lacking the
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0x | | | | | | | | | | | | | | | | |
| 1x | | | | | | | | | | | | | | | | |
| 2x | SP | ! | " | # | ¤[a] | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | |
| 8x | | | | | | | | | | | | | | | | |
| 9x | | | | | | | | | | | | | | | | |
| Ax | NBSP | ¡ | ¢ | £ | $[b] | ¥ | #[b] | § | ¤ | ‘ | “ | « | ← | ↑ | → | ↓ |
| Bx | ° | ± | ² | ³ | × | µ | ¶ | · | ÷ | ’ | ” | » | ¼ | ½ | ¾ | ¿ |
| Cx | | ◌̀ | ◌́ | ◌̂ | ◌̃ | ◌̄ | ◌̆ | ◌̇ | ◌̈ | | ◌̊ | ◌̧ | ◌̲[c] | ◌̋ | ◌̨ | ◌̌ |
| Dx | ― | ¹ | ® | © | ™ | ♪ | ¬ | ¦ | | | | | ⅛ | ⅜ | ⅝ | ⅞ |
| Ex | Ω | Æ | Ð | ª | Ħ | [d] | Ĳ | Ŀ | Ł | Ø | Œ | º | Þ | Ŧ | Ŋ | ŉ |
| Fx | ĸ | æ | đ | ð | ħ | ı | ĳ | ŀ | ł | ø | œ | ß | þ | ŧ | ŋ | SHY |
Videotex version
The versions of the supplementary set used by the ITU T.101 standard for Videotex are based on the first supplementary set of the 1988 edition of T.51.
The default G2 set for Data Syntax 2 adds a
The supplementary set for Data Syntax 3 adds non-spacing marks for a "vector overbar" and solidus and several semigraphic characters.[11]
ETS 300 706 version
The ETS 300 706 standard for Enhanced Teletext defines a Latin G2 set that closely follows the T.51 supplementary set:
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ax | SP | ¡ | ¢ | £ | $ | ¥ | # | § | ¤ | ‘ | “ | « | ← | ↑ | → | ↓ |
| Bx | ° | ± | ² | ³ | × | µ | ¶ | · | ÷ | ’ | ” | » | ¼ | ½ | ¾ | ¿ |
| Cx | | ◌̀ | ◌́ | ◌̂ | ◌̃ | ◌̄ | ◌̆ | ◌̇ | ◌̈ | ◌̣ | ◌̊ | ◌̧ | ◌̲ | ◌̋ | ◌̨ | ◌̌ |
| Dx | ― | ¹ | ® | © | ™ | ♪ | ₠ | ‰ | α | | | | ⅛ | ⅜ | ⅝ | ⅞ |
| Ex | Ω | Æ | Ð | ª | Ħ | | Ĳ | Ŀ | Ł | Ø | Œ | º | Þ | Ŧ | Ŋ | ŉ |
| Fx | ĸ | æ | đ | ð | ħ | ı | ĳ | ŀ | ł | ø | œ | ß | þ | ŧ | ŋ | ■ |
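Because most cells are shared between the ISO/IEC 6937 supplementary set and the ETS 300 706 Latin G2 set, one implementation option is to reuse an ISO/IEC 6937 lookup table for Teletext and layer only the differing cells on top. The sketch below uses Python's `collections.ChainMap` for that layering. The cell positions follow the Ax–Fx labels of the two code tables in this article. Both dictionaries are deliberately partial, and their names are hypothetical.

```python
from collections import ChainMap

# Partial ISO/IEC 6937 supplementary-set table (code -> character).
# Only a few cells are included; unassigned cells such as 0xC9 and 0xD8 are absent.
ISO_6937_SUPPLEMENTARY = {
    0xA0: "\u00A0",  # NBSP
    0xA1: "\u00A1",  # ¡
    0xD6: "\u00AC",  # ¬
    0xD7: "\u00A6",  # ¦
    0xFF: "\u00AD",  # SHY
}

# Cells where the ETS 300 706 Latin G2 set differs from ISO/IEC 6937.
ETS_300_706_OVERRIDES = {
    0xA0: " ",       # SP instead of NBSP
    0xC9: "\u0323",  # non-spacing dot below
    0xD6: "\u20A0",  # ₠
    0xD7: "\u2030",  # ‰
    0xD8: "\u03B1",  # α
    0xFF: "\u25A0",  # ■
}

# Lookups try the overrides first, then fall back to the shared ISO cells.
ETS_LATIN_G2 = ChainMap(ETS_300_706_OVERRIDES, ISO_6937_SUPPLEMENTARY)

print(ETS_LATIN_G2[0xA1])  # ¡ (shared cell)
print(ETS_LATIN_G2[0xD6])  # ₠ (overridden cell)
```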
See also
Footnotes
- Continued use for ¤ permitted for existing CCITT services only.[2]
- Permitted for existing CCITT services only, otherwise the ASCII representation should be used.[2]
- ANSI escape sequences, although it does mention that it should be correctly interpreted when received by applicable systems.[2] Previous editions of the ISO/IEC version of the standard also allowed combining this code with any character in the defined repertoire,[7] whereas more recent revisions do not include this code.[5]
- An early draft placed ȷ in this position.
References
- "T.51 : Latin based coded character sets for telematic services". www.itu.int. Archived from the original on 2019-10-08. Retrieved 2019-11-14.
- CCITT (1992-09-18). Latin based coded character sets for telematic services (1992 ed.). Recommendation T.51.
- ITU-T (1995-08-11). Recommendation T.51 (1992) Amendment 1.
- ISO-IR-106.
- ISO/IEC JTC 1/SC 2/WG 3 (1998-04-15). WD 6937, Coded graphic character set for text communication - Latin alphabet (PDF). JTC1/SC2/N454.
- ISBN 978-1-4200-4067-8.
- CCITT (1988). Coded character sets for telematic services (1988 ed.). Recommendation T.51.
- ISO-IR-70.
- ISO-IR-128.
- ETSI (1997). "15.6.3 Latin G2 Set". Enhanced Teletext specification (PDF). p. 116. ETS 300 706.
External links
- ITU Recommendation T.51
- ISO pages: ISO 6937-1:1983, ISO 6937-2:1983, ISO 6937-2:1983/Add 1:1989, ISO/IEC 6937:1994, ISO/IEC 6937:2001
- WD 6937, Coded graphic character set for text communication - Latin alphabet (Revision of ISO/IEC 6937:1994) (ISO/IEC 6937:1994 draft)
- ISO-IR-156 (ISO-IR registration of right-hand part)