Ascii85

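Ascii85, also called Base85, is a form of binary-to-text encoding developed for the btoa utility. It uses five ASCII characters to represent four bytes of binary data, so the encoded size is 1⁄4 larger than the original (assuming eight bits per ASCII character). This makes it more efficient than uuencode or Base64, which use four characters to represent three bytes of data (a 1⁄3 increase under the same assumption).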

Its main modern uses are in Adobe's PostScript and Portable Document Format file formats, and in the patch encoding for binary files used by Git.^[1]

Overview

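The basic need for a binary-to-text encoding comes from a need to communicate arbitrary binary data over preexisting communication protocols that were designed to carry only human-readable text. Such protocols may be only 7-bit safe (and may not tolerate certain ASCII control codes), may require line breaks at certain maximum intervals, and may not preserve whitespace. Thus, only the 94 printable ASCII characters are "safe" to use to convey data.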

Eighty-five is the minimum integer value of n such that n⁵ ≥ 256⁴ = 2³², so any sequence of 4 bytes can be encoded as 5 symbols, as long as at least 85 distinct symbols are available. (Five radix-85 digits can represent the integers from 0 to 4,437,053,124 inclusive, which suffice to represent all 4,294,967,296 possible 4-byte sequences.)
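The bound is easy to confirm by brute force. The following is a small Python sketch; the function smallest_radix is an illustrative name, not part of any specification. It searches for the smallest radix whose five-digit range covers every 4-byte value, and it shows the Base64 case (three bytes in four digits) for comparison:

```python
def smallest_radix(num_bytes: int = 4, num_digits: int = 5) -> int:
    """Return the smallest n with n**num_digits >= 256**num_bytes."""
    target = 256 ** num_bytes
    n = 2
    while n ** num_digits < target:
        n += 1
    return n


print(smallest_radix())          # 85
print(85 ** 5 - 1)               # 4437053124, the largest five-digit base-85 value
print(84 ** 5 < 2 ** 32)         # True: radix 84 falls short
print(smallest_radix(3, 4))      # 64, the Base64 case (64**4 == 256**3)
```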

Characters used by the encoded text are !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstu and additionally z to mark a sequence of four zero bytes.

Encoding

When encoding, each group of 4 bytes is taken as a 32-bit binary number, with the most significant byte first (Ascii85 uses a big-endian convention). This number is converted into 5 radix-85 digits by repeatedly dividing by 85 and taking the remainder. Each digit (again, most significant first) is then encoded as a printable ASCII character by adding 33 to it, which yields the ASCII characters 33 (!) through 117 (u).
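Here is a minimal Python sketch of this step for one full group. The function name encode_group is illustrative, and the sketch deliberately omits the z shortcut described next:

```python
def encode_group(group: bytes) -> str:
    """Encode exactly four bytes as five Ascii85 characters (no 'z' shortcut)."""
    if len(group) != 4:
        raise ValueError("expected exactly 4 bytes")
    value = int.from_bytes(group, "big")  # first byte is the most significant
    digits = []
    for _ in range(5):
        value, remainder = divmod(value, 85)
        digits.append(remainder)           # least significant digit comes out first
    # Emit the most significant digit first, offset by 33 so 0 maps to '!'.
    return "".join(chr(d + 33) for d in reversed(digits))


print(encode_group(b"Man "))  # 9jqo^
print(encode_group(b"sure"))  # F*2M7
```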

Because all-zero data is quite common, an exception is made for the sake of data compression, and an all-zero group is encoded as a single character z instead of !!!!!.

Groups of characters that decode to a value greater than 2³² − 1 (encoded as s8W-!) will cause a decoding error, as will z characters in the middle of a group. White space between the characters is ignored and may occur anywhere to accommodate line-length limitations.
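A matching check for one group might look like the following Python sketch. The name decode_group is illustrative, and the sketch assumes that whitespace removal and z expansion happen before a group reaches it:

```python
def decode_group(chars: str) -> bytes:
    """Decode five Ascii85 characters ('!'..'u') into four bytes."""
    if len(chars) != 5:
        raise ValueError("expected exactly 5 characters")
    value = 0
    for c in chars:
        digit = ord(c) - 33
        if not 0 <= digit <= 84:
            # Also rejects 'z', which is only valid in place of a whole group.
            raise ValueError(f"invalid character {c!r} inside a group")
        value = value * 85 + digit
    if value > 0xFFFFFFFF:
        # Anything above "s8W-!" (2**32 - 1) cannot come from four bytes.
        raise ValueError(f"group {chars!r} exceeds 2**32 - 1")
    return value.to_bytes(4, "big")


print(decode_group("9jqo^"))  # b'Man '
```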

Limitations

The original specification only allows a stream that is a multiple of 4 bytes to be encoded.

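Encoded data may contain characters that have special meaning in many programming languages and in some text-based protocols, such as the left angle bracket <, the backslash \, and the single and double quotes ' and ". Other base-85 encodings, such as Z85 and RFC 1924, are designed to be safe in source code.^[2]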

History

btoa version

The original btoa program always encoded full groups (padding the source as necessary). It wrote a prefix line of "xbtoa Begin" and a suffix line of "xbtoa End", followed by the original file length (in decimal and hexadecimal) and three 32-bit checksums. The decoder needs the file length to determine how much of the final group was padding. The initial proposal for btoa encoding used an encoding alphabet running from the ASCII space character through "t" inclusive. This was replaced with an alphabet running from "!" to "u" to avoid "problems with some mailers (stripping off trailing blanks)".^[3] This program also introduced the special "z" short form for an all-zero group. Version 4.2 added a "y" exception for a group of all ASCII space characters (0x20202020).

ZMODEM version

"ZMODEM Pack-7 encoding" encodes groups of 4 octets into groups of 5 printable ASCII characters. It works in a way similar, or possibly identical, to Ascii85. ZMODEM programs use it when sending pre-compressed 8-bit data files over 7-bit data channels.^[4]

Adobe version

Adobe adopted the basic btoa encoding, but with slight changes, and gave it the name Ascii85. The characters used are the ASCII characters 33 (!) through 117 (u) inclusive (to represent the base-85 digits 0 through 84), together with the letter z (as a special case to represent a 32-bit 0 value), and white space is ignored. Adobe uses the delimiter "~>" to mark the end of an Ascii85-encoded string, and the string may be prefixed by "<~".^[5] Adobe represents the length by truncating the final group: If the last block of source bytes contains fewer than 4 bytes, the block is padded with up to 3 null bytes before encoding. After encoding, as many bytes as were added as padding are removed from the end of the output.
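The following is a minimal Python sketch of these Adobe rules. It assumes the caller wants the optional <~ and ~> delimiters and no line wrapping, and a85_encode is an illustrative name. Python's standard library also provides base64.a85encode, whose adobe=True option adds the same delimiters.

```python
def a85_encode(data: bytes, delimiters: bool = True) -> str:
    """Encode bytes as Adobe-style Ascii85 (no line wrapping)."""
    out = []
    for i in range(0, len(data), 4):
        chunk = data[i:i + 4]
        padding = 4 - len(chunk)            # 0 for full groups, 1..3 for the last one
        if padding == 0 and chunk == b"\0\0\0\0":
            out.append("z")                 # shortcut only for a full all-zero group
            continue
        value = int.from_bytes(chunk + b"\0" * padding, "big")
        digits = []
        for _ in range(5):
            value, remainder = divmod(value, 85)
            digits.append(chr(remainder + 33))
        group = "".join(reversed(digits))
        out.append(group[:5 - padding])     # drop one character per padding byte
    body = "".join(out)
    return f"<~{body}~>" if delimiters else body


print(a85_encode(b"Man sure."))  # <~9jqo^F*2M7/c~>
```

A final partial group of zero bytes is not abbreviated. For example, a single 0x00 byte becomes "!!" rather than "z".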

The reverse is applied when decoding: The last block is padded to 5 bytes with the Ascii85 character u, and as many bytes as were added as padding are omitted from the end of the output (see example).
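A matching decoder can be sketched in Python as follows. The names a85_decode and _decode5 are illustrative. The sketch assumes that a one-character final group is an error, since the encoder above always emits at least two characters for a partial group.

```python
def _decode5(group: str) -> bytes:
    """Turn five characters in '!'..'u' into four bytes, rejecting overflow."""
    value = 0
    for c in group:
        digit = ord(c) - 33
        if not 0 <= digit <= 84:
            raise ValueError(f"invalid Ascii85 character {c!r}")
        value = value * 85 + digit
    if value > 0xFFFFFFFF:
        raise ValueError(f"group {group!r} exceeds 2**32 - 1")
    return value.to_bytes(4, "big")


def a85_decode(text: str) -> bytes:
    """Decode Adobe-style Ascii85, ignoring whitespace and optional <~ ~>."""
    text = "".join(text.split())            # whitespace may appear anywhere
    if text.startswith("<~"):
        text = text[2:]
    if text.endswith("~>"):
        text = text[:-2]
    out = bytearray()
    group = ""
    for c in text:
        if c == "z":
            if group:
                raise ValueError("'z' in the middle of a group")
            out += b"\0\0\0\0"
        else:
            group += c
            if len(group) == 5:
                out += _decode5(group)
                group = ""
    if group:
        if len(group) == 1:
            raise ValueError("a final group needs at least 2 characters")
        padding = 5 - len(group)
        # Pad with 'u' (digit 84), then drop one output byte per padding character.
        out += _decode5(group + "u" * padding)[:4 - padding]
    return bytes(out)


print(a85_decode("<~9jqo^F*2M7/c~>"))  # b'Man sure.'
```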

The padding is not arbitrary. Converting from binary to Base64 only regroups bits and does not change them or their order, so a high bit in binary does not affect the low bits of the Base64 representation. Converting a binary number to base 85 is different: because 85 is not a power of two, high bits affect the low-order base-85 digits and vice versa. The scheme pads the binary value low (with zero bits) while encoding, and pads the base-85 value high (with u characters) while decoding. This ensures that the high-order bits are preserved. The zero padding in the binary leaves enough room that the small amount added by the u padding stays within the discarded low-order bytes and never carries into the high-order bytes.
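The argument can be checked exhaustively for final groups of one and two bytes. The following Python sketch compares padding with u against padding with ! (digit 0) when decoding. The helper names encode_tail and decode_tail are illustrative.

```python
def encode_tail(tail: bytes) -> str:
    """Encode a final group of 1-3 bytes: zero-pad to 4 bytes, then truncate."""
    padding = 4 - len(tail)
    value = int.from_bytes(tail + b"\0" * padding, "big")
    chars = ""
    for _ in range(5):
        value, remainder = divmod(value, 85)
        chars = chr(remainder + 33) + chars
    return chars[:5 - padding]


def decode_tail(chars: str, pad_char: str) -> bytes:
    """Pad a truncated group back to five characters with pad_char and decode it."""
    padding = 5 - len(chars)
    value = 0
    for c in chars + pad_char * padding:
        value = value * 85 + (ord(c) - 33)
    return value.to_bytes(4, "big")[:4 - padding]


tails = [bytes([a]) for a in range(256)]
tails += [bytes([a, b]) for a in range(256) for b in range(256)]

u_ok = all(decode_tail(encode_tail(t), "u") == t for t in tails)
bang_failures = [t for t in tails if decode_tail(encode_tail(t), "!") != t]
print(u_ok)              # True
print(bang_failures[0])  # b'\x01'
```

Padding with ! can only make the decoded value smaller than the original. For the single byte 0x01 this borrows from the high byte and yields 0x00.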

In Ascii85-encoded blocks, whitespace and line-break characters may be present anywhere, including in the middle of a 5-character block, and they must be silently ignored.

Adobe's specification does not support btoa's y exception.
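Python's standard base64 module exposes both behaviours. a85encode always applies the z shortcut for a full all-zero group, and its foldspaces flag enables the btoa-style y form. Output produced with foldspaces should therefore not be fed to an Adobe-style decoder.

```python
import base64

# Default (Adobe-compatible): an all-zero group becomes "z"; spaces are encoded normally.
print(base64.a85encode(b"\0\0\0\0"))               # b'z'
print(base64.a85encode(b"    "))                   # b'+<VdL'

# btoa-style extension: foldspaces=True emits "y" for the group 0x20202020.
print(base64.a85encode(b"    ", foldspaces=True))  # b'y'
print(base64.a85decode(b"y", foldspaces=True))     # b'    '
```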

Example for Ascii85

Take this quote from Thomas Hobbes's Leviathan:

Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.

Assuming that the 269-character quote is provided in US-ASCII or a 100% compatible encoding to start with, it can then be re-encoded in Ascii85 as the following 337 characters (count and output shown without the "<~" and "~>" prefix and suffix):^[a]

9jqo^BlbD-BleB1DJ+*+F(f,q/0JhKF<GL>Cj@.4Gp$d7F!,L7@<6@)/0JDEF<G%<+EV:2F!,O<
DJ+*.@<*K0@<6L(Df-\0Ec5e;DffZ(EZee.Bl.9pF"AGXBPCsi+DGm>@3BB/F*&OCAfu2/AKYi(
DIb:@FD,*)+C]U=@3BN#EcYf8ATD3s@q?d$AftVqCh[NqF<G:8+EV:.+Cf>-FD5W8ARlolDIal(
DId<j@<?3r@:F%a+D58'ATD4$Bl@l3De:,-DJs`8ARoFb/0JMK@qB4^F!,R<AKZ&-DfTqBG%G>u
D.RTpAKYo'+CT/5+Cei#DII?(E,9)oF*2M7/c
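The example can be reproduced with Python's standard library. This sketch assumes Python 3 and relies on base64.a85encode. Without adobe=True that function omits the <~ and ~> delimiters, and by default it truncates the final partial group as described above.

```python
import base64

quote = (
    "Man is distinguished, not only by his reason, but by this singular passion "
    "from other animals, which is a lust of the mind, that by a perseverance of "
    "delight in the continued and indefatigable generation of knowledge, exceeds "
    "the short vehemence of any carnal pleasure."
)

data = quote.encode("ascii")
encoded = base64.a85encode(data)   # no <~ ~> delimiters, no wrapping
print(len(data), len(encoded))     # 269 337

# wrapcol=75 inserts a newline every 75 characters, matching the layout above.
print(base64.a85encode(data, wrapcol=75).decode())
```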

For a detailed look at the re-encoding, this is the beginning of the Hobbes quote:

Text content    M            a            n            (space)      ...
ASCII           77 (0x4d)    97 (0x61)    110 (0x6e)   32 (0x20)    ...
Bit pattern     01001101     01100001     01101110     00100000     ...
32-bit value    1,298,230,816 = 24×85⁴ + 73×85³ + 80×85² + 78×85 + 61    ...
Base 85 (+33)   24 (57)    73 (106)    80 (113)    78 (111)    61 (94)    ...
ASCII           9          j           q           o           ^          ...

...and the following is the end of the quote (penultimate 4-tuple):

Text content    s            u            r            e
ASCII           115 (0x73)   117 (0x75)   114 (0x72)   101 (0x65)
Bit pattern     01110011     01110101     01110010     01100101
32-bit value    1,937,076,837 = 37×85⁴ + 9×85³ + 17×85² + 44×85 + 22
Base 85 (+33)   37 (70)    9 (42)     17 (50)     44 (77)     22 (55)
ASCII           F          *          2           M           7

The final 4-tuple, however, is incomplete: it contains only the period, so it must be padded with three zero bytes:

Text content    .            \0           \0           \0
ASCII           46 (0x2e)    0 (0x00)     0 (0x00)     0 (0x00)
Bit pattern     00101110     00000000     00000000     00000000
32-bit value    771,751,936 = 14×85⁴ + 66×85³ + 56×85² + 74×85 + 46
Base 85 (+33)   14 (47)    66 (99)    56 (89)     74 (107)    46 (79)
ASCII           /          c          Y           k           O

Since three bytes of padding had to be added, the three final characters 'YkO' are omitted from the output.

Decoding is done inversely, except that the last 5-tuple is padded with 'u' characters:

ASCII           /          c          u           u           u
Base 85 (+33)   14 (47)    66 (99)    84 (117)    84 (117)    84 (117)
32-bit value    771,955,124 = 14×85⁴ + 66×85³ + 84×85² + 84×85 + 84
Bit pattern     00101110     00000011     00011001     10110100
ASCII           46           3            25           180
Text content    .            [ETX]        [EM]         ´ (Extended ASCII)

Since the input had to be padded with three 'u' characters, the last three bytes of the output are ignored, and we end up with the original period.

The input sentence does not contain 4 consecutive zero bytes, so the example does not show the use of the 'z' abbreviation.

Compatibility

The Ascii85 encoding is compatible with 7-bit and 8-bit MIME, while having less overhead than Base64.

One potential compatibility issue of Ascii85 is that some of the characters it uses are significant in markup languages such as SGML and XML. To include Ascii85 data in these documents, it may be necessary to escape the quotes, angle brackets, and ampersands.
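In Python, one option is the standard library's html.escape. With its default quote=True, it replaces &, <, > and both quote characters with references that XML parsers also accept. The following is a sketch for element or attribute text, using a slice of the example output above that happens to contain ", > and &:

```python
from html import escape

fragment = '9pF"AGXBPCsi+DGm>@3BB/F*&OCAfu2/'   # slice of the Hobbes example output
print(escape(fragment))            # 9pF&quot;AGXBPCsi+DGm&gt;@3BB/F*&amp;OCAfu2/
print(escape("<~9jqo^F*2M7/c~>"))  # &lt;~9jqo^F*2M7/c~&gt;
```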

RFC 1924 version

RFC 1924, "A Compact Representation of IPv6 Addresses" by Robert Elz, was published on April 1, 1996. As an April Fools' Day joke, it suggests a base-85 encoding of IPv6 addresses. This differs from the scheme used above in two ways. Elz proposes a different set of 85 ASCII characters. He also proposes doing all arithmetic on the 128-bit number, converting it to a single 20-digit base-85 number (internal whitespace not allowed), rather than breaking it into four 32-bit groups.

The proposed character set is, in order, 0–9, A–Z, a–z, and then the 23 characters !#$%&()*+-;<=>?@^_`{|}~. The highest possible representable address, 2¹²⁸−1 = 74×85¹⁹ + 53×85¹⁸ + 5×85¹⁷ + ..., would be encoded as =r54lj&NUUO~Hi%c2ym0.
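A minimal Python sketch of this single-number conversion follows. It assumes the alphabet order given above, and the names rfc1924_encode and rfc1924_decode are illustrative. The standard ipaddress module is used only to turn an address into its 128-bit integer.

```python
import ipaddress

RFC1924_ALPHABET = (
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "!#$%&()*+-;<=>?@^_`{|}~"
)


def rfc1924_encode(address: int) -> str:
    """Encode a 128-bit integer as one 20-digit base-85 number."""
    if not 0 <= address < 2 ** 128:
        raise ValueError("address must fit in 128 bits")
    digits = []
    for _ in range(20):
        address, remainder = divmod(address, 85)
        digits.append(RFC1924_ALPHABET[remainder])
    return "".join(reversed(digits))  # most significant digit first


def rfc1924_decode(text: str) -> int:
    """Inverse of rfc1924_encode; raises ValueError on bad input."""
    if len(text) != 20:
        raise ValueError("expected exactly 20 characters")
    value = 0
    for c in text:
        value = value * 85 + RFC1924_ALPHABET.index(c)  # ValueError if not in alphabet
    if value >= 2 ** 128:
        raise ValueError("value exceeds 128 bits")
    return value


max_address = int(ipaddress.IPv6Address("ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff"))
print(rfc1924_encode(max_address))  # compare with the string given in the text above
```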

This character set excludes the characters "',./:[\] , making it suitable for use in JSON strings (where " and \ would require escaping). However, for SGML-based protocols, notably including XML, string escapes may still be required (to accommodate <, > and &).

Notes

^ This output is also limited to 75-character lines for technical/formatting reasons. Selecting and copying it will include line breaks, which will increase the character count; however the actual Ascii85 decoder ignores line breaks.

References

^ Hamano, Junio C (May 5, 2006). "[PATCH] binary patch". git. Archived from the original on 2020-07-26.

^ "32/Z85" on ZeroMQ RFC

^ Orost, Joe (Mar 26, 1991). "Re: COMPRESSING of binary data into mailable ASCII Re: Encoding of binary data into mailable ASCII". Google Groups. Retrieved 11 April 2015.

^ Forsberg, Chuck. "Recent Developments in ZMODEM". omen.com. Archived from the original on 2015-09-24. Retrieved 2013-05-14. "ZMODEM Pack-7 packs 4 bytes into 5 printing characters."

^ https://github.com/lindig/ascii85/blob/master/ascii85enc.pod

External links

basE91

PostScript Language Reference (Adobe) - see ASCII85Encode Filter


Retrieved from "https://en.wikipedia.org/w/index.php?title=Ascii85&oldid=1296446606"

This page is based on the copyrighted Wikipedia article: Ascii85. The article is available under the CC BY-SA 3.0 license; additional terms may apply.