Binaural unmasking
Binaural unmasking is a phenomenon of auditory perception discovered by Ira Hirsh.[1] In binaural unmasking, the brain combines information from the two ears in order to improve signal detection and identification in noise. The phenomenon is most commonly observed when the interaural phase of the signal differs from the interaural phase of the noise. When such a difference is present, there is an improvement in the masked detection threshold compared to a reference situation in which the interaural phases are the same, or in which the stimulus has been presented monaurally; those two reference cases usually give very similar thresholds. The size of the improvement is known as the "binaural masking level difference" (BMLD), or simply as the "masking level difference".
Binaural unmasking is most effective at low frequencies. The BMLD for pure tones in broadband noise reaches a maximum value of about 15 dB.
Improved identification of speech in noise is also observed when the speech and noise differ in interaural configuration.
Labelling system
A systematic labelling system for different stimulus configurations, first used by Jeffress,[5] has been adopted by most authors in the area. The condition names are written NxSy, where x is the interaural configuration of the noise and y is the interaural configuration of the signal. Some common values for x and y include the following (a code sketch after the list illustrates two of these conditions):
- 0 means that the signal or noise is identical at the two ears
- π means that the signal or noise has an interaural phase difference of π radians
- τ means that the signal or noise has an interaural time difference, where the exact value of the time difference, τ, is specified elsewhere.
- ρ means that the noise has an interaural correlation of less than one, the exact correlation being specified elsewhere.
- u means that the signal or noise is uncorrelated across the two ears.
- m means that the signal or noise is monaural.
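As a concrete illustration, the sketch below builds left- and right-ear waveforms for two of these conditions, N0S0 and N0Sπ. It is a minimal Python example assuming NumPy; the sample rate, duration, tone frequency, levels, and the helper name `make_trial` are illustrative choices, not part of any standard.

```python
import numpy as np

FS = 44100        # sample rate (Hz); illustrative choice
DUR = 0.5         # stimulus duration (s)
F_SIG = 500.0     # tone frequency (Hz); BMLDs are largest at low frequencies

def make_trial(signal_phase_shift=0.0, noise_identical=True, rng=None):
    """Build left/right waveforms for a simple NxSy condition.

    signal_phase_shift: interaural phase of the tone in radians
        (0 -> S0, np.pi -> Spi).
    noise_identical: True -> N0 (same noise at both ears),
        False -> Nu (independent noise at each ear).
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(int(FS * DUR)) / FS

    noise_l = rng.standard_normal(t.size)
    noise_r = noise_l if noise_identical else rng.standard_normal(t.size)

    tone_l = 0.1 * np.sin(2 * np.pi * F_SIG * t)
    tone_r = 0.1 * np.sin(2 * np.pi * F_SIG * t + signal_phase_shift)

    return noise_l + tone_l, noise_r + tone_r

# N0S0: diotic noise and diotic tone (the reference condition)
left_ref, right_ref = make_trial(signal_phase_shift=0.0)
# N0Spi: diotic noise, tone inverted at one ear (large BMLD expected)
left_pi, right_pi = make_trial(signal_phase_shift=np.pi)
```

In a detection experiment, the tone level would be varied adaptively to find the threshold in each condition; the threshold difference between N0S0 and N0Sπ is the BMLD.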
Theories
Binaural unmasking has two main explanatory frameworks: one based on interaural cross-correlation[6] and one based on interaural subtraction.[7]
The cross-correlation account relies on the existence of a coincidence-detection network in the midbrain, similar to that proposed by Lloyd A. Jeffress[8] to account for sensitivity to interaural time differences in sound localization. Each coincidence detector receives a stream of action potentials from the two ears via a network of axons that introduce differential transmission delays. Detection is thought to occur when the presence of a signal reduces the response rate of the most active coincidence detector. Cross-correlation of the signals at the two ears is often used as a mathematical surrogate for modelling such an array of coincidence-detecting neurons; the reduced response rate translates into a reduction in the cross-correlation maximum.
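The decorrelation that an Sπ signal introduces into N0 noise can be seen directly in such a cross-correlation surrogate. The following sketch (Python with NumPy; the lag range, signal level, and the function name `xcorr_peak` are illustrative assumptions) computes the peak normalized cross-correlation over a plausible range of internal delays, standing in for the most active coincidence detector:

```python
import numpy as np

FS = 44100
rng = np.random.default_rng(0)
t = np.arange(int(FS * 0.5)) / FS

def xcorr_peak(left, right, max_lag):
    """Peak normalized cross-correlation over interaural lags of
    +/- max_lag samples: a stand-in for the most active detector."""
    left = (left - left.mean()) / left.std()
    right = (right - right.mean()) / right.std()
    best = -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.mean(left[:len(left) - lag] * right[lag:])
        else:
            c = np.mean(left[-lag:] * right[:lag])
        best = max(best, c)
    return best

noise = rng.standard_normal(t.size)         # N0: identical noise at both ears
tone = 0.2 * np.sin(2 * np.pi * 500 * t)    # 500 Hz target

max_lag = int(FS * 0.001)                   # +/- 1 ms of internal delay
print(xcorr_peak(noise, noise, max_lag))                # noise alone: peak ~ 1.0
print(xcorr_peak(noise + tone, noise - tone, max_lag))  # N0Spi: peak drops below 1
```

Adding the antiphasic tone pulls the correlation peak below the value of 1.0 obtained for the noise alone, mirroring the reduced response rate described above.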
The subtractive account is known as "equalization-cancellation" or "EC" theory. In this account, the waveforms at the two ears (or their internal representations) are temporally aligned (equalized) by the brain before being subtracted one from the other. In effect, the coincidence detectors are replaced with neurons that are excited by action potentials from one ear but inhibited by action potentials from the other. However, EC theory is not generally framed in such explicit neurological terms, and no suitable neural substrate has been identified in the brain. Nonetheless, EC theory has proved a very popular modelling framework, and has fared well in direct comparisons with cross-correlation models in psychoacoustic experiments.[9]
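A correspondingly minimal sketch of the EC idea follows, under two simplifying assumptions: the noise is N0 (so the equalization step reduces to the identity) and internal noise of an arbitrarily chosen level limits how completely the masker can be cancelled.

```python
import numpy as np

FS = 44100
rng = np.random.default_rng(1)
t = np.arange(int(FS * 0.5)) / FS

noise = rng.standard_normal(t.size)          # N0 noise, identical at the two ears
tone = 0.05 * np.sin(2 * np.pi * 500 * t)    # weak 500 Hz target, presented Spi

left = noise + tone
right = noise - tone

# Equalization: bring the noise at the two ears into alignment. For N0
# noise no interaural delay or gain adjustment is needed, so the step is
# the identity here; a fuller model searches over internal delays/gains.
# Internal noise models the imprecision of that alignment and keeps the
# predicted BMLD finite (its level here is an arbitrary choice).
internal_l = 0.05 * rng.standard_normal(t.size)
internal_r = 0.05 * rng.standard_normal(t.size)

# Cancellation: subtracting the ears removes the common noise while the
# antiphasic signal adds constructively (left - right = 2 * tone).
residual = (left + internal_l) - (right + internal_r)

def power_db(x):
    return 10 * np.log10(np.mean(x ** 2))

print("SNR at one ear (dB):", power_db(tone) - power_db(noise))
print("SNR after EC (dB):  ", power_db(2 * tone) - power_db(internal_l - internal_r))
```

Comparing the two printed signal-to-noise ratios shows the unmasking effect; in published EC models the internal-noise parameters are fitted to psychoacoustic data rather than chosen freely as here.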
Perceptual cues
The ear filters incoming sound into different frequencies: a given place in the cochlea, and a given auditory-nerve fibre, respond only to a limited range of frequencies. Consequently, researchers have examined the cues that are generated by mixtures of signal and noise at the two ears within a narrow frequency band around the signal. When a signal and a narrowband noise are added, the resulting stimulus has interaural time and level differences that fluctuate from moment to moment.
Experiments have examined which of these cues the auditory system can best detect. They have shown that, at low frequencies (specifically 500 Hz), the auditory system is most sensitive to interaural time differences.[10] At higher frequencies, however, there appears to be a transition to the use of interaural level differences.[11]
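These fluctuating cues can be made explicit with analytic-signal processing. The sketch below (Python with NumPy and SciPy; the noise bandwidth, tone level, and variable names are illustrative assumptions) adds an antiphasic 500 Hz tone to diotic narrowband noise and extracts the moment-to-moment interaural level and time differences via the Hilbert transform:

```python
import numpy as np
from scipy.signal import hilbert

FS = 44100
F_C = 500.0                 # centre frequency (Hz)
rng = np.random.default_rng(2)
n = int(FS * 0.5)
t = np.arange(n) / FS

# 50 Hz wide Gaussian noise band centred on F_C, built in the frequency
# domain (the bandwidth is an illustrative choice).
freqs = np.fft.fftfreq(n, 1 / FS)
band = np.abs(freqs - F_C) < 25
spec = np.zeros(n, dtype=complex)
spec[band] = rng.standard_normal(band.sum()) + 1j * rng.standard_normal(band.sum())
noise = np.real(np.fft.ifft(spec))
noise /= noise.std()

tone = 0.3 * np.sin(2 * np.pi * F_C * t)
left, right = noise + tone, noise - tone    # N0Spi mixture

# Analytic signals give the instantaneous envelope and phase at each ear.
al, ar = hilbert(left), hilbert(right)
ild_db = 20 * np.log10(np.abs(al) / np.abs(ar))     # interaural level cue
ipd = np.angle(al * np.conj(ar))                    # interaural phase cue
itd_us = 1e6 * ipd / (2 * np.pi * F_C)              # converted to time (us)

print("ILD fluctuation (dB, sd):", ild_db.std())
print("ITD fluctuation (us, sd):", itd_us.std())
```

Without the tone, the two ears receive identical waveforms and both cues are identically zero; the fluctuations appear only when the antiphasic signal is added.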
Practical implications
In everyday life, speech is more easily understood in noise when the speech and noise come from different directions, a phenomenon known as "spatial release from masking". In this situation, the speech and noise have distinct interaural time and level differences, which the binaural system can exploit to improve intelligibility.