Linear predictive coding

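Linear predictive coding (LPC) is a method used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital signal of speech in compressed form, using the information of a linear predictive model.^[1]^[2]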

LPC is the most widely used method in speech coding and speech synthesis. It is a powerful speech analysis technique, and a useful method for encoding good quality speech at a low bit rate.

Overview

LPC starts with the assumption that a speech signal is produced by a buzzer at the end of a tube (for voiced sounds), with occasional added hissing and popping sounds (for voiceless sounds such as sibilants and plosives). Although apparently crude, this source–filter model is actually a close approximation of the reality of speech production. The glottis (the space between the vocal folds) produces the buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat and mouth) forms the tube, which is characterized by its resonances; these resonances give rise to formants, or enhanced frequency bands in the sound produced. Hisses and pops are generated by the action of the tongue, lips and throat during sibilants and plosives.

LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the remaining signal after the subtraction of the filtered modeled signal is called the residue.
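The analysis step can be sketched in code. The following is a minimal illustration rather than the method of any particular codec: it assumes the autocorrelation method with the Levinson-Durbin recursion, which is one common way to estimate the filter (the description above does not prescribe an algorithm). The function names autocorrelation, levinson_durbin and inverse_filter are hypothetical. The list a = [1, a1, ..., ap] defines the inverse filter, with the residue computed as e[n] = x[n] + a1 x[n-1] + ... + ap x[n-p]. Estimating the pitch and intensity of the buzz is not covered here.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], the autocorrelation of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for the inverse filter A(z).

    Returns (a, err): a = [1, a1, ..., ap] and err is the remaining
    prediction-error energy. Assumes r[0] > 0 (a non-silent frame).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                  # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err


def inverse_filter(frame, a):
    """Apply A(z) to the frame; the output is the residue."""
    p = len(a) - 1
    return [sum(a[j] * frame[n - j] for j in range(p + 1) if n - j >= 0)
            for n in range(len(frame))]


# Toy input (an assumption for illustration): a decaying exponential.
frame = [0.9 ** n for n in range(200)]
a, err = levinson_durbin(autocorrelation(frame, 2), 2)
residue = inverse_filter(frame, a)
```

For this toy frame, a1 comes out close to -0.9 and a2 close to 0, and the residue is close to zero after its first sample, which is what one would expect when the model fits the signal well.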

The numbers which describe the intensity and frequency of the buzz, the formants, and the residue signal, can be stored or transmitted somewhere else. LPC synthesizes the speech signal by reversing the process: use the buzz parameters and the residue to create a source signal, use the formants to create a filter (which represents the tube), and run the source through the filter, resulting in speech.
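The synthesis direction can be sketched in the same spirit. This is a simplified illustration under stated assumptions: the source is built from the buzz parameters only (an impulse train for voiced frames, noise for voiceless ones) and the residue is ignored; the filter is given as a list a = [1, a1, ..., ap] describing the all-pole filter 1/A(z). The names make_excitation and synthesis_filter are hypothetical.

```python
import random


def make_excitation(n_samples, pitch_period, gain, voiced=True, seed=0):
    """Buzz (impulse train) for voiced frames, hiss (noise) otherwise.

    pitch_period is in samples; gain stands in for the intensity.
    """
    if voiced:
        return [gain if n % pitch_period == 0 else 0.0
                for n in range(n_samples)]
    rng = random.Random(seed)
    return [gain * rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def synthesis_filter(source, a):
    """All-pole filter 1/A(z): y[n] = source[n] - sum_j a[j] * y[n-j]."""
    p = len(a) - 1
    out = []
    for n, s in enumerate(source):
        acc = s
        for j in range(1, min(p, n) + 1):
            acc -= a[j] * out[n - j]
        out.append(acc)
    return out


# One voiced frame: 160 samples, one pulse every 80 samples,
# passed through a first-order "tube" (toy values, not real speech data).
speech = synthesis_filter(make_excitation(160, 80, 1.0), [1.0, -0.9])
```

A practical synthesizer would also interpolate parameters between frames and carry the filter state across frame boundaries; both are omitted here for brevity.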

Because speech signals vary with time, this process is done on short chunks of the speech signal, which are called frames; generally, 30 to 50 frames per second yield intelligible speech with good compression.
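As a small illustration of framing, the sketch below cuts a signal into non-overlapping frames. The 8000 Hz sample rate is an assumption chosen for the example, and the function name split_into_frames is hypothetical; practical coders often use overlapping, windowed frames instead.

```python
def split_into_frames(signal, sample_rate, frames_per_second):
    """Cut a signal into consecutive, non-overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = sample_rate // frames_per_second
    if frame_len < 1:
        raise ValueError("frames_per_second exceeds sample_rate")
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


# One second of placeholder samples at an assumed 8000 Hz sample rate.
one_second = [0.0] * 8000
frames = split_into_frames(one_second, 8000, 50)   # 160 samples per frame
```

At 50 frames per second each frame covers 20 ms; at 30 frames per second each frame covers roughly 33 ms.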

Early history

Linear prediction (signal estimation) goes back to at least the 1940s when Norbert Wiener developed a mathematical theory for calculating the best filters and predictors for detecting signals hidden in noise.^[3]^[4] Soon after Claude Shannon established a general theory of coding, work on predictive coding was done by C. Chapin Cutler,^[5] Bernard M. Oliver^[6] and Henry C. Harrison.^[7] Peter Elias in 1955 published two papers on predictive coding of signals.^[8]^[9]

Linear predictors were applied to speech analysis independently by Fumitada Itakura of Nagoya University and Shuzo Saito of Nippon Telegraph and Telephone in 1966 and in 1967 by Bishnu S. Atal, Manfred R. Schroeder and John Burg. Itakura and Saito described a statistical approach based on maximum likelihood estimation; Atal and Schroeder described an adaptive linear predictor approach; Burg outlined an approach based on the principle of maximum entropy.^[4]^[10]^[11]^[12]

In 1969, Itakura and Saito introduced a method based on partial correlation (PARCOR).

audio compression format, introduced in 1993.^[14] Code-excited linear prediction (CELP) was developed by Schroeder and Atal in 1985.^[16]

LPC is the basis for voice-over-IP (VoIP) technology.

BBN Technologies started the first developments in packetized speech, which would eventually lead to voice-over-IP technology. In 1973, according to Lincoln Laboratory informal history, the first real-time 2400 bit/s LPC was implemented by Ed Hofstetter. In 1974, the first real-time two-way LPC packet speech communication was accomplished over the ARPANET at 3500 bit/s between Culler-Harrison and Lincoln Laboratory. In 1976, the first LPC conference took place over the ARPANET using the Network Voice Protocol, between Culler-Harrison, ISI, SRI, and LL at 3500 bit/s.^[citation needed]^[clarification needed]

LPC coefficient representations

LPC is frequently used for transmitting spectral envelope information, and as such it has to be tolerant of transmission errors. Transmission of the filter coefficients directly (see linear prediction for a definition of coefficients) is undesirable, since they are very sensitive to errors. In other words, a very small error can distort the whole spectrum, or worse, a small error might make the prediction filter unstable.
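The instability risk can be made concrete with a second-order example. The sketch below is illustrative only: the coefficient values are made up for the demonstration, and the helper names poles_second_order and is_stable are hypothetical. It uses the convention A(z) = 1 + a1 z^-1 + a2 z^-2, so the synthesis filter 1/A(z) is stable when both roots of z^2 + a1 z + a2 lie inside the unit circle.

```python
import cmath


def poles_second_order(a1, a2):
    """Poles of 1/A(z) for A(z) = 1 + a1 z^-1 + a2 z^-2."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return ((-a1 + disc) / 2.0, (-a1 - disc) / 2.0)


def is_stable(poles):
    """A filter is stable when every pole lies inside the unit circle."""
    return all(abs(p) < 1.0 for p in poles)


# Toy coefficients with poles just inside the unit circle.
original = poles_second_order(-1.98, 0.99)
# The same filter after an error of 0.02 in the second coefficient.
corrupted = poles_second_order(-1.98, 1.01)

print(is_stable(original), is_stable(corrupted))
```

In this example the pole magnitude moves from about 0.995 to about 1.005, so an error of 0.02 in one coefficient is enough to turn a stable filter into an unstable one.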

There are more advanced representations such as log area ratios (LAR), line spectral pairs (LSP) decomposition and reflection coefficients. Of these, LSP decomposition in particular has gained popularity, since it ensures the stability of the predictor and keeps spectral errors local for small coefficient deviations.
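To illustrate two of these representations, the sketch below converts direct-form coefficients to reflection coefficients with the step-down (backward Levinson) recursion, and then to log area ratios. It is a minimal example under stated assumptions: the coefficient list is a = [1, a1, ..., ap], the function names are hypothetical, and the sign convention chosen for the log area ratio is only one of several in use. The filter is stable exactly when every reflection coefficient has magnitude below 1, which makes stability easy to check and to enforce after quantization. LSP decomposition is not shown.

```python
import math


def lpc_to_reflection(a):
    """Step-down recursion: a = [1, a1, ..., ap] -> [k1, ..., kp].

    Raises ValueError if some |k| >= 1, i.e. the filter is not stable.
    """
    cur = list(a)
    ks = []
    for i in range(len(a) - 1, 0, -1):
        k = cur[i]
        if abs(k) >= 1.0:
            raise ValueError("filter is not stable")
        ks.append(k)
        denom = 1.0 - k * k
        # Recover the coefficients of the order (i - 1) predictor.
        cur = [1.0] + [(cur[j] - k * cur[i - j]) / denom
                       for j in range(1, i)]
    ks.reverse()
    return ks


def log_area_ratio(k):
    """One common LAR convention: log((1 + k) / (1 - k)), for |k| < 1."""
    return math.log((1.0 + k) / (1.0 - k))


ks = lpc_to_reflection([1.0, -1.98, 0.99])   # toy coefficients
lars = [log_area_ratio(k) for k in ks]
```

Reflection coefficients close to ±1 map to large log area ratios, which spreads out the region where the spectrum is most sensitive to quantization error.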

Applications

LPC is the most widely used method in speech coding and speech synthesis. It is also used for secure wireless, where voice must be digitized, encrypted and sent over a narrow voice channel; an early example of this is the US government's Navajo I.

LPC synthesis can be used to construct vocoders where musical instruments are used as an excitation signal to the time-varying filter estimated from a singer's speech. This is somewhat popular in electronic music.

Paul Lansky made the computer music piece notjustmoreidlechatter using linear predictive coding.^[18]

A 10th-order LPC was used in the popular 1980s Speak & Spell educational toy.

LPC predictors are used in lossless audio codecs such as MPEG-4 ALS and FLAC, as well as in the SILK audio codec.

LPC has received some attention as a tool for use in the tonal analysis of violins and other stringed musical instruments.^[19]

References

[1] ISBN 978-0-8247-4040-5.

[2] ISBN 978-0-387-77591-3.

[3] S2CID 15601493.

[4] Y. Sasahira; S. Hashimoto (1995). "Voice pitch changing by Linear Predictive Coding Method to keep the Singer's Personal Timbre" (PDF).

[5] US 2605361, C. C. Cutler, "Differential quantization of communication signals", published 1952-07-29.

[6] B. M. Oliver (1952). "Efficient coding". 31 (4). Nokia Bell Labs: 724–750.

[7] H. C. Harrison (1952). "Experiments with linear prediction in television". Bell System Technical Journal. 31: 764–783.

[8] P. Elias (1955). "Predictive coding I". IRE Trans. Inform. Theory. IT-1 (1): 16–24.

[9] P. Elias (1955). "Predictive coding II". IRE Trans. Inform. Theory. IT-1 (1): 24–33.

[10] S. Saito; F. Itakura (Jan 1967). "Theoretical consideration of the statistical optimum recognition of the spectral density of speech". J. Acoust. Soc. Japan.

[11] B.S. Atal; M.R. Schroeder (1967). "Predictive coding of speech". Conf. Communications and Proc.

[12] J.P. Burg (1967). "Maximum Entropy Spectral Analysis". Proceedings of 37th Meeting, Society of Exploration Geophysics, Oklahoma City.

[13] ISSN 1932-8346. Archived (PDF) from the original on 2022-10-09.

[14] ISBN 9783319056609.

[15] doi:10.1109/ICASSP.1978.1170564.

[16] S2CID 14803427.

[17] S2CID 212485331. Archived from the original (PDF) on 2019-10-18. Retrieved 18 October 2019.

[18] Lansky, Paul. "More Than Idle Chatter". Archived from the original on 2017-12-24. Retrieved 2024-06-02.

[19] Tai, Hwan-Ching; Chung, Dai-Ting (June 14, 2012). "Stradivari Violins Exhibit Formant Frequencies Resembling Vowels Produced by Females". Savart Journal. 1 (2).

Further reading

O'Shaughnessy, D. (1988). "Linear predictive coding". IEEE Potentials. 7 (1): 29–32. S2CID 12786562.

ISBN 978-3-540-13938-6.

El-Jaroudi, Amro (2003). "Linear Predictive Coding". Wiley Encyclopedia of Telecommunications. ISBN 978-0471219286.

External links

real-time LPC analysis/synthesis learning software

30 years later Dr Richard Wiggins Talks Speak & Spell development

Robert M. Gray, IEEE Signal Processing Society, Distinguished Lecturer Program


