Ascii85
Ascii85, also called Base85, is a form of binary-to-text encoding.
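Its main modern uses are in Adobe's PostScript and Portable Document Format (PDF) file formats, and in the patch encoding for binary files used by Git.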
Overview
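The basic need for a binary-to-text encoding comes from a need to communicate arbitrary binary data over preexisting communications protocols that were designed to carry only human-readable text.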
Eighty-five is the minimum integral value of n such that n⁵ ≥ 256⁴, so any sequence of 4 bytes can be encoded as 5 symbols, as long as at least 85 distinct symbols are available. (Five radix-85 digits can represent the integers from 0 to 4,437,053,124 inclusive, which suffices to represent all 4,294,967,296 possible 4-byte sequences.)
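The bound can be checked directly. The following Python sketch searches for the smallest radix whose 5-digit numbers cover every 4-byte value; the names GROUP_VALUES and smallest_radix are illustrative and do not come from any specification.

```python
# Number of distinct 4-byte sequences that one group must be able to represent.
GROUP_VALUES = 256 ** 4          # 4,294,967,296

# Smallest radix n for which five radix-n digits cover all of those values.
smallest_radix = next(n for n in range(2, 257) if n ** 5 >= GROUP_VALUES)

print(smallest_radix)            # 85
print(84 ** 5 >= GROUP_VALUES)   # False: 84 symbols are not enough
print(85 ** 5 - 1)               # 4437053124, the largest 5-digit radix-85 value
```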
Encoding
When encoding, each group of 4 bytes is taken as a 32-bit binary number, most significant byte first (Ascii85 uses a big-endian convention). This is converted, by repeatedly dividing by 85 and taking the remainder, into 5 radix-85 digits. Then each digit (again, most significant first) is encoded as an ASCII printable character by adding 33 to it, giving the ASCII characters 33 (!) through 117 (u).
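The steps for one full group can be sketched in Python as follows. This is a minimal illustration of the arithmetic only: the function name encode_group is hypothetical, and the sketch deliberately leaves out short final groups and any special-case abbreviations.

```python
def encode_group(group: bytes) -> str:
    """Encode exactly 4 bytes as 5 Ascii85 characters (no special cases)."""
    if len(group) != 4:
        raise ValueError("this sketch only handles full 4-byte groups")

    # Interpret the 4 bytes as one 32-bit number, most significant byte first.
    value = int.from_bytes(group, "big")

    # Repeated division by 85 yields the digits least significant first.
    digits = []
    for _ in range(5):
        value, remainder = divmod(value, 85)
        digits.append(remainder)

    # Emit the most significant digit first, offset by 33 into '!'..'u'.
    return "".join(chr(digit + 33) for digit in reversed(digits))


print(encode_group(b"Man "))              # 9jqo^
print(encode_group(b"\xff\xff\xff\xff"))  # s8W-!
```

The second call shows the largest value a valid group can hold, 2³² − 1.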
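Because all-zero data is quite common, an exception is made for the sake of data compression, and an all-zero group is encoded as a single character z instead of !!!!!.
Groups of characters that decode to a value greater than 2³² − 1 (encoded as s8W-!) cause a decoding error, as do z characters in the middle of a group.
Limitations
The original specification only allows a stream that is a multiple of 4 bytes to be encoded.
Encoded data may contain characters that have special meaning in many programming languages and in some text-based protocols, such as the left angle bracket <, the backslash \, and the single and double quotes ' and ".
History
btoa version
The original btoa program always encoded full groups (padding the source as necessary), with a prefix line of "xbtoa Begin" and a suffix line of "xbtoa End", followed by the original file length (in decimal and hexadecimal) and three 32-bit checksums. The decoder needs to use the file length to see how much of the group was padding. The initial proposal for btoa encoding used an encoding alphabet starting at the ASCII space character through "t" inclusive, but this was replaced with an encoding alphabet of "!" to "u" to avoid "problems with some mailers (stripping off trailing blanks)".[3] This program also introduced the special "z" short form for an all-zero group.
ZMODEM version
"ZMODEM Pack-7 encoding" encodes groups of 4 octets into groups of 5 printable ASCII characters in a similar, or possibly the same, way as Ascii85 does. When a ZMODEM program sends pre-compressed 8-bit data files over 7-bit data channels, it uses "ZMODEM Pack-7 encoding".[4]
Adobe version
Adobe adopted the basic btoa encoding, but with slight changes, and gave it the name Ascii85. The characters used are the ASCII characters 33 (!) through 117 (u) inclusive, representing the base-85 digits 0 through 84, together with the letter z as a special case for a 32-bit zero value. If the last block of source bytes contains fewer than 4 bytes, the block is padded with up to 3 null bytes before encoding. After encoding, as many characters as bytes of padding were added are removed from the end of the output.
The reverse is applied when decoding: the last block is padded to 5 characters with the Ascii85 character u, and as many bytes as characters of padding were added are omitted from the end of the output.
The padding is not arbitrary. Converting from binary to base 64 only regroups bits and does not change them or their order (a high bit in binary does not affect the low bits in the base64 representation). In converting a binary number to base85 (85 is not a power of two), high bits do affect the low-order base85 digits and conversely.
Padding the binary low (with zero bits) while encoding and padding the base85 value high (with u characters) while decoding ensures that the high-order bits are preserved: the zero padding in the binary leaves enough room that the small addition is absorbed without a carry into the high bits.
In Ascii85-encoded blocks, whitespace and line-break characters may be present anywhere, including in the middle of a 5-character block, but they must be silently ignored. Adobe's specification does not support the "y" exception, a btoa short form for a group of four ASCII space characters.
Example for Ascii85
A quote from Thomas Hobbes's Leviathan:
Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.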
If this is initially encoded using US-ASCII, it can be reencoded in Ascii85 as follows:
9jqo^BlbD-BleB1DJ+*+F(f,q/0JhKF<GL>Cj@.4Gp$d7F!,L7@<6@)/0JDEF<G%<+EV:2F!,O<
DJ+*.@<*K0@<6L(Df-\0Ec5e;DffZ(EZee.Bl.9pF"AGXBPCsi+DGm>@3BB/F*&OCAfu2/AKYi(
DIb:@FD,*)+C]U=@3BN#EcYf8ATD3s@q?d$AftVqCh[NqF<G:8+EV:.+Cf>-FD5W8ARlolDIal(
DId<j@<?3r@:F%a+D58'ATD4$Bl@l3De:,-DJs`8ARoFb/0JMK@qB4^F!,R<AKZ&-DfTqBG%G>u
D.RTpAKYo'+CT/5+Cei#DII?(E,9)oF*2M7/c
Since the final group holds only one byte, three bytes of padding had to be added, and the three final characters 'YkO' are omitted from the output. Decoding is done inversely, except that the last 5-tuple is padded with 'u' characters.
Since the input had to be padded with three 'u' characters, the last three bytes of the output are ignored, leaving the original final period. The input sentence does not contain 4 consecutive zero bytes, so the example does not show the use of the 'z' abbreviation.
Compatibility
The Ascii85 encoding is compatible with 7-bit and 8-bit MIME, while having less overhead than Base64. One potential compatibility issue of Ascii85 is that some of the characters it uses are significant in markup languages such as SGML. To include Ascii85 data in these documents, it may be necessary to escape the quote, angle brackets, and ampersands.
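The Leviathan example above can be reproduced with Python's standard base64 module, whose a85encode and a85decode functions implement this style of encoding. The sketch below is illustrative only: QUOTE and the other variable names are chosen here, and the second half repeats the padding step by hand for the final one-byte group.

```python
import base64

# The Leviathan quote from the example, as US-ASCII bytes (269 bytes).
QUOTE = (
    b"Man is distinguished, not only by his reason, but by this singular "
    b"passion from other animals, which is a lust of the mind, that by a "
    b"perseverance of delight in the continued and indefatigable generation "
    b"of knowledge, exceeds the short vehemence of any carnal pleasure."
)

encoded = base64.a85encode(QUOTE).decode("ascii")
print(len(QUOTE), len(encoded))      # 269 337
print(encoded[:5], encoded[-2:])     # 9jqo^ /c

# The final group holds a single byte, so 3 zero bytes are appended before
# encoding and 3 characters are dropped from the encoded group afterwards.
tail = QUOTE[-(len(QUOTE) % 4):]                 # b"."
pad = 4 - len(tail)                              # 3
full_group = base64.a85encode(tail + b"\0" * pad).decode("ascii")
print(full_group, full_group[:5 - pad])          # /cYkO /c

# Decoding pads the short group with 'u' and discards the same number of bytes.
restored = base64.a85decode(full_group[:5 - pad] + "u" * pad)[:4 - pad]
print(restored)                                  # b'.'
```

Encoding 269 bytes produces 67 full groups of 5 characters plus 2 characters for the final byte, 337 characters in total.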
RFC 1924 version
RFC 1924, "A Compact Representation of IPv6 Addresses" by Robert Elz, suggests a base-85 encoding of IPv6 addresses. This differs from the scheme used above in that it proposes a different set of 85 ASCII characters, and proposes to do all arithmetic on the 128-bit number, converting it to a single 20-digit base-85 number (internal whitespace not allowed), rather than breaking it into four 32-bit groups.
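The whole-number conversion can be sketched in Python as below. This is a hedged illustration rather than a reference implementation: the function names ipv6_to_base85 and base85_to_ipv6 are hypothetical, the alphabet constant is written out from the character set proposed in RFC 1924, and address parsing is delegated to the standard ipaddress module.

```python
import ipaddress

# The 85-character alphabet proposed in RFC 1924, in digit order.
RFC1924_ALPHABET = (
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "!#$%&()*+-;<=>?@^_`{|}~"
)


def ipv6_to_base85(address: str) -> str:
    """Encode an IPv6 address as one 20-digit base-85 number."""
    value = int(ipaddress.IPv6Address(address))   # the full 128-bit integer
    digits = []
    for _ in range(20):                           # 85**20 exceeds 2**128
        value, remainder = divmod(value, 85)
        digits.append(RFC1924_ALPHABET[remainder])
    return "".join(reversed(digits))              # most significant digit first


def base85_to_ipv6(text: str) -> str:
    """Decode a 20-digit base-85 string back to a compressed IPv6 address."""
    if len(text) != 20:
        raise ValueError("expected exactly 20 base-85 digits")
    value = 0
    for ch in text:
        value = value * 85 + RFC1924_ALPHABET.index(ch)
    return str(ipaddress.IPv6Address(value))      # rejects values over 2**128 - 1


compact = ipv6_to_base85("1080:0:0:0:8:800:200C:417A")
print(len(compact))                 # 20
print(base85_to_ipv6(compact))      # 1080::8:800:200c:417a
```

Because the 128 bits are treated as a single number, the result does not match what the 4-byte grouping described earlier would produce for the same 16 bytes.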
The proposed character set is, in order, 0 to 9, A to Z, a to z, and then the 23 characters !#$%&()*+-;<=>?@^_`{|}~. This character set excludes the characters "',./:[\] .