Indian Script Code for Information Interchange

Source: Wikipedia, the free encyclopedia.

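Indian Script Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India. It encodes the main Indic scripts and a Roman transliteration. The supported scripts are Assamese, Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Odia, Tamil and Telugu. ISCII does not encode the Persian-based writing systems used in India, but its writing-system switching codes nonetheless provide for Kashmiri, Sindhi, Urdu, Persian, Pashto and Arabic. The Persian-based writing systems were subsequently encoded in the PASCII encoding.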

ISCII has not been widely used outside certain government institutions. A variant without the ATR mechanism, Mac OS Devanagari, was used on classic Mac OS.[1] ISCII has now been rendered largely obsolete by Unicode. Unicode uses a separate block for each Indic writing system and largely preserves the ISCII layout within each block.

Background

The Brahmi-derived writing systems have a similar structure. ISCII therefore encodes letters with the same phonetic value at the same code point, overlaying the various scripts. For example, the ISCII bytes 0xB3 0xDB represent [ki]. This is rendered as കി in Malayalam, कि in Devanagari, ਕਿ in Gurmukhi and கி in Tamil. The writing system can be selected in rich text by markup, or in plain text by means of the ATR code described below.
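Unicode largely preserves the ISCII layout within each script block. The overlay can therefore be sketched in Python in two steps: map an ISCII byte to its Devanagari code point, then shift it into the target script's block. This is a minimal sketch, not a real converter. It assumes a hypothetical two-entry mapping table and a hypothetical helper name `decode_iscii`; real converters need the full table and per-script exceptions.

```python
# Minimal sketch: only two ISCII bytes are mapped (a hypothetical subset of the full table).
ISCII_TO_DEVANAGARI = {
    0xB3: 0x0915,  # KA
    0xDB: 0x093F,  # vowel sign I
}

# First code point of each script's Unicode block.
BLOCK_BASE = {
    "Devanagari": 0x0900,
    "Gurmukhi": 0x0A00,
    "Tamil": 0x0B80,
    "Malayalam": 0x0D00,
}


def decode_iscii(data: bytes, script: str) -> str:
    """Decode ISCII bytes, assuming the target block mirrors the Devanagari layout."""
    chars = []
    for b in data:
        if b < 0x80:
            chars.append(chr(b))  # lower half is plain ASCII
        else:
            offset = ISCII_TO_DEVANAGARI[b] - BLOCK_BASE["Devanagari"]
            chars.append(chr(BLOCK_BASE[script] + offset))
    return "".join(chars)


for name in BLOCK_BASE:
    print(name, decode_iscii(b"\xb3\xdb", name))
```

The same two bytes come out as कि, ਕਿ, கி and കി. Only the block base changes between scripts.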

One motivation for using a single encoding is that it should allow easy transliteration from one writing system to another. In practice, however, the scripts differ enough that this is not practical. For example, not every script has a letter for every ISCII code point.
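A naive offset-based transliteration shows where this breaks down. The sketch below moves Devanagari text into the Tamil block by adding the distance between the two block starts; the function name `shift_script` is hypothetical. Ka lands on TAMIL LETTER KA. Kha, however, lands on U+0B96, which is unassigned because Tamil has no separate letter for kha.

```python
import unicodedata

DEVANAGARI_BASE = 0x0900
TAMIL_BASE = 0x0B80
BLOCK_SIZE = 0x80  # each Indic block spans 128 code points


def shift_script(text: str, src_base: int, dst_base: int) -> str:
    """Naively transliterate by moving every in-block character to the same offset in another block."""
    out = []
    for ch in text:
        cp = ord(ch)
        if src_base <= cp < src_base + BLOCK_SIZE:
            cp = cp - src_base + dst_base
        out.append(chr(cp))
    return "".join(out)


for letter in ("\u0915", "\u0916"):  # DEVANAGARI LETTER KA, KHA
    shifted = shift_script(letter, DEVANAGARI_BASE, TAMIL_BASE)
    # unicodedata.name returns the default for unassigned code points
    print(letter, "->", unicodedata.name(shifted, "<unassigned>"))
```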

ISCII is an 8-bit encoding. The lower 128 code points are plain ASCII; the upper 128 code points are ISCII-specific. In addition to the code points representing characters, ISCII uses a code point with the mnemonic ATR. This code point indicates that the following byte carries one of two kinds of information. One set of values changes the writing system until the next writing-system indicator or the end of the line. Another set of values selects display modes such as bold and italic. ISCII does not provide a means of indicating the default writing system.
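At the byte level, a reader has to check each byte's high bit and treat ATR as the start of a two-byte sequence. The following Python sketch splits an ISCII byte string into ASCII bytes, ISCII bytes and ATR sequences. The function and token names are illustrative, not from any standard library. It assumes ATR is 0xEF, as given under the ATR character below.

```python
ATR = 0xEF  # ATR lead byte


def tokenize(data: bytes) -> list[tuple[str, int]]:
    """Split ISCII bytes into ('ascii', b), ('iscii', b) and ('atr', value) tokens."""
    tokens = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == ATR:
            if i + 1 >= len(data):
                raise ValueError("ATR must be followed by a value byte")
            tokens.append(("atr", data[i + 1]))  # ATR consumes the next byte
            i += 2
        elif b < 0x80:
            tokens.append(("ascii", b))  # lower 128 code points: plain ASCII
            i += 1
        else:
            tokens.append(("iscii", b))  # upper 128 code points: ISCII-specific
            i += 1
    return tokens


print(tokenize(b"A\xef\x42\xb3\xdb"))
```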

Codepage layout

The following table shows the ISCII character set for Devanagari. The other writing systems use the equivalent form of each character at the same code point.

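ISCII Devanagari
      _0   _1   _2   _3   _4   _5   _6   _7   _8   _9   _A   _B   _C   _D   _E   _F
0x    NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
1x    DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
2x    SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3x    0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4x    @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5x    P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6x    `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7x    p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
8x    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·
9x    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·
Ax    ·    ँ    ं    ः    अ    आ    इ    ई    उ    ऊ    ऋ    ऎ    ए    ऐ    ऍ    ऒ
Bx    ओ    औ    ऑ    क    ख    ग    घ    ङ    च    छ    ज    झ    ञ    ट    ठ    ड
Cx    ढ    ण    त    थ    द    ध    न    ऩ    प    फ    ब    भ    म    य    य़    र
Dx    ऱ    ल    ळ    ऴ    व    श    ष    स    ह    INV  ा    ि    ी    ु    ू    ृ
Ex    ॆ    े    ै    ॅ    ॊ    ो    ौ    ॉ    ्    ़    ।    ·    ·    ·    ·    ATR
Fx    EXT  ०    १    २    ३    ४    ५    ६    ७    ८    ९    ·    ·    ·    ·    ·
·  Undefined
ATR, EXT  Lead byte (followed by a second byte)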

Special code points

INV character—code point D9 (217)
The INV (invisible consonant) character is used as a pseudo-consonant to display combining elements in isolation. For example, क (ka) + ् (halant) + INV = क्‍ (half ka). The Unicode equivalent is U+200D ZERO WIDTH JOINER.
Apple instead maps the ISCII INV character to U+200E LEFT-TO-RIGHT MARK, so as to guarantee round-tripping.[1]
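One way to see the round-tripping problem is to convert to Unicode under each mapping. With the ZWJ mapping, INV after a halant produces the same Unicode as the halant + nukta sequence, which is described under the halant character below. The two ISCII inputs therefore cannot be told apart when converting back. Mapping INV to U+200E keeps them distinct. This Python sketch handles only these three bytes, and the name `to_unicode` is hypothetical.

```python
HALANT, NUKTA, INV = 0xE8, 0xE9, 0xD9
ZWJ, LRM = "\u200d", "\u200e"


def to_unicode(data: bytes, inv_char: str) -> str:
    """Convert halant, halant + nukta and INV; inv_char is the chosen mapping for INV."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == HALANT and data[i + 1:i + 2] == bytes([NUKTA]):
            out.append("\u094d" + ZWJ)  # halant + nukta -> virama + ZWJ
            i += 2
            continue
        if b == HALANT:
            out.append("\u094d")  # virama
        elif b == INV:
            out.append(inv_char)
        else:
            raise ValueError(f"byte {b:#04x} is outside this sketch")
        i += 1
    return "".join(out)


for name, inv_char in (("ZWJ", ZWJ), ("LRM", LRM)):
    same = to_unicode(b"\xe8\xe9", inv_char) == to_unicode(b"\xe8\xd9", inv_char)
    print(f"INV -> {name}: halant+nukta and halant+INV collide: {same}")
```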
ATR character—code point EF (239)
The ATR (attribute) character, followed by a byte code, switches to a different font attribute (such as bold) or to a different ISCII script or PASCII locale (such as Bengali). The switch lasts until the next ATR sequence or the end of the line. This has no direct Unicode equivalent, as font attributes are not part of Unicode and each script has a distinct set of code points.
Presentational attributes
ATR + byte   Mnemonic   Formatting option
0x30   BLD   Bold
0x31   ITA   Italics
0x32   UL    Underlining
0x33   EXP   Expanded
0x34   HLT   Highlight
0x35   OTL   Outline
0x36   SHD   Shadow
0x37   TOP   Top half of character (used with LOW to create double-height characters)
0x38   LOW   Bottom half of character (used with TOP to create double-height characters)
0x39   DBL   Entire row double-width and double-height
Shifts to ISCII scripts
ATR + byte   Mnemonic   ISCII script
0x40   DEF   Default script (i.e. the script that is restored after a line break)
0x41   RMN   Romanised transliteration
0x42   DEV   Devanagari
0x43   BNG   Bengali script
0x44   TML   Tamil script
0x45   TLG   Telugu script
0x46   ASM   Assamese script
0x47   ORI   Odia script
0x48   KND   Kannada script
0x49   MLM   Malayalam script
0x4A   GJR   Gujarati script
0x4B   PNJ   Gurmukhī
Shifts to PASCII
ATR + byte   Mnemonic   PASCII locale
0x71   ARB   Arabic alphabet
0x72   PES   Persian alphabet
0x73   URD   Urdu alphabet
0x74   SND   Sindhi alphabet
0x75   KSM   Kashmiri alphabet
0x76   PST   Pashto alphabet
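Combining the script-shift codes, the sketch below labels each byte with the script in effect. ISCII does not record the default script, so the caller supplies it through a `default` parameter. That parameter, the function name `label_scripts` and the use of mnemonic strings are all assumptions of this sketch. Presentational attribute values are skipped, and any shift lasts only until the end of the line.

```python
ATR, DEF, LF = 0xEF, 0x40, 0x0A

# ATR values taken from the script and locale tables above.
SHIFTS = {
    0x41: "RMN", 0x42: "DEV", 0x43: "BNG", 0x44: "TML", 0x45: "TLG",
    0x46: "ASM", 0x47: "ORI", 0x48: "KND", 0x49: "MLM", 0x4A: "GJR",
    0x4B: "PNJ",
    0x71: "ARB", 0x72: "PES", 0x73: "URD", 0x74: "SND", 0x75: "KSM",
    0x76: "PST",
}


def label_scripts(data: bytes, default: str = "DEV") -> list[tuple[str, int]]:
    """Pair every non-ATR byte with the mnemonic of the script in effect."""
    current = default
    labelled = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == ATR:
            if i + 1 >= len(data):
                raise ValueError("ATR must be followed by a value byte")
            code = data[i + 1]
            if code == DEF:
                current = default
            elif code in SHIFTS:
                current = SHIFTS[code]
            # Other values (e.g. 0x30-0x39 presentational attributes) leave the script unchanged.
            i += 2
            continue
        labelled.append((current, b))
        if b == LF:
            current = default  # a shift ends at the end of the line
        i += 1
    return labelled


print(label_scripts(b"\xb3\xef\x43\xb3\xef\x30\xb3\n\xb3"))
```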
EXT character—code point F0 (240)
The EXT (extensions for Vedic) character followed by a byte code indicates a Vedic accent. This has no direct Unicode equivalent, as Vedic accents are assigned to distinct code points.
Halant character ्—code point E8 (232)
The halant character removes the implicit vowel from a consonant and is used between consonants to represent conjunct consonants. For example, क (ka) + ् (halant) + त (ta) = क्त (kta). The sequence ् (halant) + ् (halant) displays a conjunct with an explicit halant, for example क (ka) + ् (halant) + ् (halant) + त (ta) = क्‌त. The sequence ् (halant) + ़ (nukta) displays a conjunct with half consonants, if available, for example क (ka) + ् (halant) + ़ (nukta) + त (ta) = क्‍त.
Correspondences between ISCII and Unicode halant/virama behaviour
ISCII sequence   ISCII bytes   Unicode sequence   Unicode code points
single halant   E8   halant   U+094D
halant + halant   E8 E8   halant + ZWNJ   U+094D U+200C
halant + nukta   E8 E9   halant + ZWJ   U+094D U+200D
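The correspondences in the table can be applied with a one-byte lookahead after each halant. This is a minimal Python sketch under stated assumptions. Only two consonants are needed: ka at B3 and ta at C2. The name `convert_halants` is hypothetical.

```python
HALANT, NUKTA = 0xE8, 0xE9
CONSONANTS = {0xB3: "\u0915", 0xC2: "\u0924"}  # KA, TA (subset only)
VIRAMA, ZWNJ, ZWJ = "\u094d", "\u200c", "\u200d"


def convert_halants(data: bytes) -> str:
    """Map ISCII halant sequences to Unicode following the correspondence table."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        nxt = data[i + 1] if i + 1 < len(data) else None
        if b == HALANT:
            if nxt == HALANT:
                out.append(VIRAMA + ZWNJ)  # explicit halant
                i += 2
            elif nxt == NUKTA:
                out.append(VIRAMA + ZWJ)  # request the half form
                i += 2
            else:
                out.append(VIRAMA)  # ordinary conjunct
                i += 1
        else:
            out.append(CONSONANTS[b])  # KeyError for bytes outside the subset
            i += 1
    return "".join(out)


for seq in (b"\xb3\xe8\xc2", b"\xb3\xe8\xe8\xc2", b"\xb3\xe8\xe9\xc2"):
    print(seq.hex(" "), [f"U+{ord(c):04X}" for c in convert_halants(seq)])
```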
Nukta character ़—code point E9 (233)
A nukta character placed after another ISCII character is used for a number of rarer characters that do not exist in the main ISCII set. For example, क (ka) + ़ (nukta) = क़ (qa). These characters have precomposed forms in Unicode, as shown in the following table.
Single Unicode characters corresponding to ISCII nukta sequences
ISCII code point   Original character   Character with nukta   Unicode code point
A1 (161)   ँ   ॐ   U+0950
A6 (166)   इ   ऌ   U+090C
A7 (167)   ई   ॡ   U+0961
AA (170)   ऋ   ॠ   U+0960
B3 (179)   क   क़   U+0958
B4 (180)   ख   ख़   U+0959
B5 (181)   ग   ग़   U+095A
BA (186)   ज   ज़   U+095B
BF (191)   ड   ड़   U+095C
C0 (192)   ढ   ढ़   U+095D
C9 (201)   फ   फ़   U+095E
DB (219)   ि   ॢ   U+0962
DC (220)   ी   ॣ   U+0963
DF (223)   ृ   ॄ   U+0944
EA (234)   ।   ऽ   U+093D
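To decode these sequences, a converter has to check whether a nukta follows before it emits a character. The Python sketch below uses a five-row subset of the table; the subset and the name `decode_with_nukta` are hypothetical. It also shows a caveat: Unicode normalization does not produce the single code points U+0958 to U+095F. These are composition exclusions, so NFC decomposes them back into consonant + nukta.

```python
import unicodedata

NUKTA = 0xE9

# ISCII byte -> (plain character, character used after a nukta); a subset of the table above.
NUKTA_FORMS = {
    0xA1: ("\u0901", "\u0950"),  # candrabindu -> OM
    0xB3: ("\u0915", "\u0958"),  # KA -> QA
    0xB4: ("\u0916", "\u0959"),  # KHA -> KHHA
    0xBA: ("\u091c", "\u095b"),  # JA -> ZA
    0xEA: ("\u0964", "\u093d"),  # danda -> avagraha
}


def decode_with_nukta(data: bytes) -> str:
    """Decode bytes from NUKTA_FORMS, folding a following nukta into one code point."""
    out = []
    i = 0
    while i < len(data):
        plain, with_nukta = NUKTA_FORMS[data[i]]  # KeyError outside the subset
        if data[i + 1:i + 2] == bytes([NUKTA]):
            out.append(with_nukta)
            i += 2
        else:
            out.append(plain)
            i += 1
    return "".join(out)


qa = decode_with_nukta(b"\xb3\xe9")
print(f"U+{ord(qa):04X}")
# U+0958 is a composition exclusion, so NFC turns it back into ka + nukta.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", qa)])
```

A converter that must emit NFC text would therefore output the two-code-point form instead of U+0958.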

Code pages for ISCII conversion

To convert from Unicode (UTF-8) to an ISCII encoding, the following Windows code pages may be used:

  • 57002: Devanagari (Hindi, Marathi, Sanskrit, Konkani)
  • 57003: Bengali
  • 57004: Tamil
  • 57005: Telugu
  • 57006: Assamese
  • 57007: Odia
  • 57008: Kannada
  • 57009: Malayalam
  • 57010: Gujarati
  • 57011: Punjabi (Gurmukhi)

Code points for all languages

References

External links