Entropy coding

In

source coding theorem, which states that any lossless data compression method must have an expected code length greater than or equal to the entropy of the source.^[1]

More precisely, the source coding theorem states that for any source distribution, the expected code length satisfies $\operatorname {E} _{x\sim P}[\ell (d(x))]\geq \operatorname {E} _{x\sim P}[-\log _{b}(P(x))]$ , where $\ell$ is the number of symbols in a code word, $d$ is the coding function, $b$ is the number of symbols used to make output codes and $P$ is the probability of the source symbol. An entropy coding attempts to approach this lower bound.

Two of the most common entropy coding techniques are Huffman coding and arithmetic coding.^[2] If the approximate entropy characteristics of a data stream are known in advance (especially for signal compression), a simpler static code may be useful. These static codes include

Rice coding

).

Since 2014, data compressors have started using the asymmetric numeral systems family of entropy coding techniques, which allows combination of the compression ratio of arithmetic coding with a processing cost similar to Huffman coding.

Entropy as a measure of similarity

Besides using entropy coding as a way to compress digital data, an entropy encoder can also be used to measure the amount of similarity between streams of data and already existing classes of data. This is done by generating an entropy coder/compressor for each class of data; unknown data is then classified by feeding the uncompressed data to each compressor and seeing which compressor yields the highest compression. The coder with the best compression is probably the coder trained on the data that was most similar to the unknown data.

References

S2CID 20260346
.

ISSN 0096-8390
.

External links

Information Theory, Inference, and Learning Algorithms, by
David MacKay (2003), gives an introduction to Shannon theory and data compression, including the Huffman coding and arithmetic coding
.

Source Coding, by T. Wiegand and H. Schwarz (2011).

v
t
e
Data compression methods
Lossless
Entropy type

Adaptive coding

Arithmetic

Asymmetric numeral systems

Golomb

Huffman
Adaptive

Canonical

Modified

Range

Shannon

Shannon–Fano

Shannon–Fano–Elias

Tunstall

Unary

Universal
Exp-Golomb

Fibonacci

Gamma

Levenshtein

Dictionary type

Byte pair encoding

Lempel–Ziv
842

LZ4

LZJB

LZO

LZRW

LZSS

LZW

LZWL

Snappy

Other types

BWT

CTW

CM

Delta
Incremental

DMC

DPCM

Grammar
Re-Pair

Sequitur

LDCT

MTF

PAQ

PPM

RLE

Hybrid

LZ77 + Huffman
Deflate

LZX

LZS

LZ77 + ANS
LZFSE

LZ77 + Huffman + ANS
Zstandard

LZ77 + Huffman + context
Brotli

LZSS + Huffman
LHA/LZH

LZ77 + Range
LZMA

LZHAM

RLE + BWT + MTF + Huffman
bzip2

Lossy
Transform type

Discrete cosine transform
DCT

MDCT

DST

FFT

Wavelet
Daubechies

DWT

SPIHT

Predictive type

DPCM
ADPCM

LPC
ACELP

CELP

LAR

LSP

WLPC

Motion
Compensation

Estimation

Vector

Psychoacoustic

Audio
Concepts

Bit rate
ABR

CBR

VBR

Companding

Convolution

Dynamic range

Latency

Nyquist–Shannon theorem

Sampling

Silence compression

Sound quality

Speech coding

Sub-band coding

Codec parts

A-law

μ-law

DPCM
ADPCM

DM

FT
FFT

LPC
ACELP

CELP

LAR

LSP

WLPC

MDCT

Psychoacoustic model

Image
Concepts

Chroma subsampling

Coding tree unit

Color space

Compression artifact

Image resolution

Macroblock

Pixel

PSNR

Quantization

Standard test image

Texture compression

Methods

Chain code

DCT

Deflate

Fractal

KLT

LP

RLE

Wavelet
Daubechies

DWT

EZW

SPIHT

Video
Concepts

Bit rate
ABR

CBR

VBR

Display resolution

Frame

Frame rate

Frame types

Interlace

Video characteristics

Video quality

Codec parts

DCT

DPCM

Deblocking filter

Lapped transform

Motion
Compensation

Estimation

Vector

Wavelet
Daubechies

DWT

Theory

Compressed data structures
Compressed suffix array

FM-index

Entropy

Information theory
Timeline

Kolmogorov complexity

Prefix code

Quantization

Rate–distortion

Redundancy

Symmetry

Smallest grammar problem

Community

Hutter Prize

People

Mark Adler

Authority control databases: National
Germany

Retrieved from "https://en.wikipedia.org/w/index.php?title=Entropy_coding&oldid=1285712490"

[1] S2CID 20260346
.

[Huffman_1952_pp._1098–1101-2] ISSN 0096-8390
.

[1]

[2]

Entropy as a measure of similarity

See also

References

External links