Harmonic Vector Excitation Coding

Source: Wikipedia, the free encyclopedia.

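Harmonic Vector Excitation Coding, abbreviated as HVXC, is a speech coding algorithm specified in the MPEG-4 Part 3 (MPEG-4 Audio) standard for very low bit rate speech coding. HVXC supports bit rates of 2 and 4 kbit/s at a sampling frequency of 8 kHz. It also operates at lower bit rates, such as 1.2 to 1.7 kbit/s, using a variable bit rate technique.[1] The total algorithmic delay for the encoder and decoder is 36 ms.[2]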

It was published as subpart 2 of ISO/IEC 14496-3:1999 (MPEG-4 Audio) in 1999.[3] An extended version of HVXC was published in MPEG-4 Audio Version 2 (ISO/IEC 14496-3:1999/Amd 1:2000).[4][5]

The MPEG-4 Natural Speech Coding Tool Set uses two algorithms: HVXC and CELP (Code Excited Linear Prediction). HVXC is used at low bit rates of 2 or 4 kbit/s. CELP covers 3.85 kbit/s and bit rates above 4 kbit/s.[6]

Technology

Linear Predictive Coding

HVXC uses linear predictive coding (LPC). The LPC residual signal is classified as either voiced or unvoiced. In the case of voiced speech, the residual is coded in a parametric representation (operating as a vocoder), while in the case of unvoiced speech, the residual waveform is quantized (thus operating as a hybrid speech codec).
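
As a rough illustration of how an LPC residual can be obtained, the following sketch estimates predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion, then inverse-filters the signal. The function names, the absence of windowing, and the predictor order are assumptions made for this example; they are not taken from the HVXC specification.

```python
def autocorrelation(x, max_lag):
    """Return the autocorrelation values r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns a[0..order] with a[0] = 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep the coefficients found so far
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this stage
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lpc_residual(x, a):
    """Inverse-filter x with A(z) = sum_j a[j] z^-j to get the prediction residual."""
    return [
        sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
        for n in range(len(x))
    ]
```

For a decaying exponential such as `[0.9**n for n in range(100)]`, a first-order predictor recovers a coefficient close to -0.9, and the residual carries far less energy than the input.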

Voiced (Harmonic) Residual Coding

In voiced segments, the residual signal is represented by two parameters: the pitch period and the spectral envelope.

To obtain the spectral envelope, the residual signal is transformed into the DFT domain.[2] The DFT spectrum is segmented into bands, one band per harmonic. The frequency band for the m-th harmonic consists of the DFT coefficients from (m-1/2)ω0 to (m+1/2)ω0, ω0 being the pitch frequency.[2] The amplitude value for the m-th harmonic is chosen to optimally represent these DFT coefficients.[2] Phase information is discarded in this process. The spectral envelope is then coded using variable-dimension weighted vector quantization. This process is also referred to as Harmonic VQ.
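
The band segmentation can be sketched as follows. This is a minimal illustration, not the procedure from the standard: the function names, the 256-point DFT size, and the use of the root-mean-square magnitude as the per-band amplitude are all assumptions. The text above says only that the amplitude is chosen to optimally represent the coefficients in the band.

```python
import cmath
import math


def dft_magnitudes(x, n_fft):
    """Magnitudes of DFT bins 0..n_fft//2 of x, zero-padded or truncated to n_fft."""
    frame = (list(x) + [0.0] * n_fft)[:n_fft]
    mags = []
    for k in range(n_fft // 2 + 1):
        s = sum(
            frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
            for n in range(n_fft)
        )
        mags.append(abs(s))  # phase is discarded here
    return mags


def harmonic_amplitudes(residual, pitch_period, n_fft=256):
    """One amplitude per harmonic band [(m - 1/2) w0, (m + 1/2) w0)."""
    if not 2 <= pitch_period <= n_fft:
        raise ValueError("pitch_period must lie between 2 and n_fft samples")
    w0 = n_fft / pitch_period  # pitch frequency expressed in DFT bins
    mags = dft_magnitudes(residual, n_fft)
    amps = []
    m = 1
    while (m + 0.5) * w0 <= n_fft / 2:
        lo = math.ceil((m - 0.5) * w0)
        hi = math.ceil((m + 0.5) * w0)  # exclusive upper bin
        band = mags[lo:hi]
        # Assumed stand-in for the optimal fit: RMS magnitude over the band.
        amps.append(math.sqrt(sum(v * v for v in band) / len(band)))
        m += 1
    return amps
```

The number of harmonics depends on the pitch period, so the returned vector has a variable length. That is why a variable-dimension vector quantizer is needed for the envelope.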

To make speech with a mixture of voiced and unvoiced excitation sound more natural and smooth, three modes of voiced speech (Mixed Voiced-1, Mixed Voiced-2, Full Voiced) are distinguished.[2] The degree of voicing is determined by the value of the normalized autocorrelation function at a shift of one pitch period. Depending on the chosen mode, the decoder adds a different amount of band-pass Gaussian noise to the synthesized harmonic signal.
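
A simple way to picture the mode decision is to threshold the normalized autocorrelation at the pitch lag. In the sketch below, the two threshold values and the function names are hypothetical. The ordering of the modes from least to most voiced (Mixed Voiced-1, Mixed Voiced-2, Full Voiced) is also an assumption of this example.

```python
import math


def normalized_autocorrelation(x, lag):
    """Normalized autocorrelation of x at the given lag, in the range [-1, 1]."""
    idx = range(lag, len(x))
    num = sum(x[n] * x[n - lag] for n in idx)
    den = math.sqrt(sum(x[n] ** 2 for n in idx) * sum(x[n - lag] ** 2 for n in idx))
    return num / den if den > 0.0 else 0.0


def voicing_mode(x, pitch_period, thresholds=(0.6, 0.8)):
    """Map the degree of voicing to one of the three voiced modes.

    The thresholds are illustrative placeholders, not values from the standard.
    """
    r = normalized_autocorrelation(x, pitch_period)
    if r < thresholds[0]:
        return "Mixed Voiced-1"
    if r < thresholds[1]:
        return "Mixed Voiced-2"
    return "Full Voiced"
```

A strictly periodic signal evaluated at its true period gives a value near 1 and therefore falls into the Full Voiced mode under these thresholds.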

Voiceless (VXC) Residual Coding

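Unvoiced segments are encoded according to the CELP scheme, which in this context is referred to as vector excitation coding (VXC). Other CELP codecs additionally use a dynamic codebook to perform long-term prediction of voiced segments. However, since HVXC does not use CELP for voiced segments, the dynamic codebook is omitted from the design.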
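
The following sketch shows a shape-gain search over a fixed random codebook, which is the basic idea behind coding an excitation vector without a dynamic codebook. It is heavily simplified: a real CELP-style encoder searches through the synthesis filter with perceptual weighting, which is omitted here. The codebook size, vector dimension, and function names are assumptions for illustration only.

```python
import random


def make_stochastic_codebook(size, dim, seed=0):
    """Build a fixed codebook of Gaussian random vectors (illustrative only)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(size)]


def search_codebook(target, codebook):
    """Return (index, gain) minimizing ||target - gain * codebook[index]||^2."""
    best_index, best_gain, best_score = 0, 0.0, -1.0
    for i, code in enumerate(codebook):
        energy = sum(v * v for v in code)
        if energy == 0.0:
            continue  # skip an all-zero vector
        corr = sum(t * v for t, v in zip(target, code))
        # With the optimal gain corr / energy, the error drops by corr^2 / energy.
        score = corr * corr / energy
        if score > best_score:
            best_index, best_gain, best_score = i, corr / energy, score
    return best_index, best_gain
```

Only the index and the quantized gain would need to be transmitted, since the decoder holds the same fixed codebook.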

References

  1. ISO/IEC (2009-09-01), ISO/IEC 14496-3:2009 - Information technology -- Coding of audio-visual objects -- Part 3: Audio (PDF), IEC, retrieved 2009-10-07
  2. Masayuki Nishiguchi (2006-04-17), Harmonic vector excitation coding of speech (PDF), Acoustical Science and Technology, retrieved 2009-10-09
  3. ISO (1999), ISO/IEC 14496-3:1999 - Information technology -- Coding of audio-visual objects -- Part 3: Audio, ISO, retrieved 2009-10-09
  4. ISO (2000), ISO/IEC 14496-3:1999/Amd 1:2000 - Audio extensions, ISO, retrieved 2009-10-07
  5. ISO/IEC JTC 1/SC 29/WG 11 (July 1999), ISO/IEC 14496-3:/Amd.1 - Final Committee Draft - MPEG-4 Audio Version 2 (PDF), archived from the original (PDF) on 2012-08-01, retrieved 2009-10-07
  6. Karlheinz Brandenburg; Oliver Kunz; Akihiko Sugiyama, MPEG-4 Natural Audio Coding - Natural Speech Coding Tools (PDF), retrieved 2013-03-25