8-bit clean

Source: Wikipedia, the free encyclopedia.

8-bit clean is an attribute of computer systems, communication channels, and other devices and software that process 8-bit character encodings without treating any byte as an in-band control code.

History

Until the early 1990s, many programs and data transmission channels were character-oriented and treated some characters, e.g.,

flag bit, or metadata control bit. 7-bit systems and data links cannot directly handle the more complex character codes that are commonplace in non-English-speaking countries with larger alphabets.

Binary files of octets cannot be transmitted directly through 7-bit data channels. To work around this, binary-to-text encodings were devised that use only 7-bit ASCII characters. These include uuencoding, Ascii85, SREC, BinHex, Kermit, and MIME's Base64. EBCDIC-based systems cannot handle all characters used in uuencoded data; Base64 does not have this problem.
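As an illustration of the workaround described above, the following minimal Python sketch encodes an arbitrary binary payload with Base64 and checks that the encoded form contains only bytes with the high bit clear. The helper name is_seven_bit is hypothetical and introduced only for this example.

```python
import base64


def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte has its high bit clear (value below 128)."""
    return all(b < 0x80 for b in data)


# Arbitrary binary payload containing every possible octet value.
payload = bytes(range(256))

# Base64 output uses only ASCII letters, digits, '+', '/' and '='.
encoded = base64.b64encode(payload)
decoded = base64.b64decode(encoded)

print(is_seven_bit(payload))  # False: half of the octets have the high bit set
print(is_seven_bit(encoded))  # True: safe for a 7-bit channel
print(decoded == payload)     # True: the original octets are recovered
```

The cost of this safety is size: every 3 input octets become 4 output characters, so the 256-byte payload grows to 344 bytes.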

SMTP and NNTP 8-bit cleanness

Historically, messages were transferred over various media, some of which supported only 7-bit data, so in the 20th century an 8-bit message was likely to be garbled in transit. Some implementations, however, ignored the formal discouragement of 8-bit data and passed bytes with the high bit set through unchanged. Such implementations are said to be 8-bit clean. In general, a communications protocol is said to be 8-bit clean if it correctly passes through the high bit of each byte in the communication process.
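The difference between a 7-bit link and an 8-bit clean one can be sketched in Python by modelling each channel as a function from bytes to bytes. Both channel functions and the is_8bit_clean probe below are hypothetical stand-ins for illustration, not part of any real protocol implementation.

```python
def seven_bit_channel(data: bytes) -> bytes:
    """Hypothetical lossy link that clears the high bit of every byte."""
    return bytes(b & 0x7F for b in data)


def eight_bit_clean_channel(data: bytes) -> bytes:
    """Hypothetical link that passes every byte through unchanged."""
    return bytes(data)


def is_8bit_clean(channel) -> bool:
    """Probe a channel with all 256 octet values and compare the result."""
    probe = bytes(range(256))
    return channel(probe) == probe


# "cafe" with e-acute, encoded in ISO 8859-1: the last byte is 0xE9.
message = b"caf\xe9"

print(seven_bit_channel(message))              # b'cafi' (0xE9 & 0x7F == 0x69)
print(eight_bit_clean_channel(message))        # b'caf\xe9'
print(is_8bit_clean(seven_bit_channel))        # False
print(is_8bit_clean(eight_bit_clean_channel))  # True
```

The garbling is silent: the 7-bit channel delivers a valid-looking ASCII string, which is why such damage was hard to detect at the receiving end.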

Many early

RFC 1056, were designed to work over such "7-bit" communication links. They specifically require the use of the ASCII character set "transmitted as an 8-bit byte with the high-order bit cleared to zero", and some of these[1] explicitly restrict all data to 7-bit characters.

For the first few decades of email networks (1971 to the early 1990s), most email messages were plain text in the 7-bit US-ASCII character set.[2]

The

RFC 780, limits Internet Mail to lines (1000 characters or fewer) of 7-bit US-ASCII characters.[3][4][5][6]

Later, the format of email messages was redefined to support messages that are not entirely US-ASCII text (text messages in character sets other than US-ASCII, and non-text messages such as audio and images).[6] The header field Content-Transfer-Encoding: binary[a] requires an 8-bit clean transport.
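For contrast with the binary transfer encoding, the following Python sketch uses the standard library's email package to build a message whose body contains non-ASCII text, using the quoted-printable and base64 transfer encodings. Both produce a serialized message made only of 7-bit bytes, so neither needs an 8-bit clean transport. The helper build_message and the sample text are assumptions made for this example.

```python
from email.message import EmailMessage


def build_message(text: str, cte: str) -> EmailMessage:
    """Build a text/plain message using the requested transfer encoding."""
    msg = EmailMessage()
    msg.set_content(text, charset="utf-8", cte=cte)
    return msg


# Sample German text containing u-umlaut and sharp s (non-ASCII characters).
text = "Gr\u00fc\u00dfe"

for cte in ("quoted-printable", "base64"):
    msg = build_message(text, cte)
    wire = msg.as_bytes()  # the message as it would be sent over the link
    seven_bit = all(b < 0x80 for b in wire)
    print(msg["Content-Transfer-Encoding"], seven_bit)
```

Quoted-printable keeps mostly-ASCII text readable on the wire, while base64 is the more compact choice for data that is largely non-ASCII.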

MIME encoding of non-ASCII data.

The Internet community generally adds features by extension, allowing communication in both directions between upgraded machines and not-yet-upgraded machines, rather than declaring formerly standards-compliant legacy software to be "broken" and insisting that all software worldwide be upgraded to the latest standard. In the mid-1990s, people

RFC 6152. This "just-send-8" attitude does not cause problems in practice, since virtually all modern email servers are 8-bit clean.[14]

Notes

  1. CRLF has special significance.

References