Audio-to-video synchronization
Audio-to-video synchronization (AV synchronization, also known as
In industry terminology, lip-sync error is expressed as the amount of time by which the audio departs from perfect synchronization with the video: a positive number indicates that the audio leads the video, and a negative number indicates that the audio lags the video.[1] This terminology and the standardized numeric lip-sync error are used in the professional broadcast industry, as evidenced by professional papers,[2] standards such as ITU-R BT.1359-1, and the other references below.
Digital or analog
Sources of error
There are several ways in which audio and video can become incorrectly synchronized.
During creation, internal AV-sync errors arise from different signal processing delays between image and sound in the video camera and microphone; this delay is normally fixed. External AV-sync errors can occur if a microphone is placed far from the sound source: the audio will be out of sync because the speed of sound is much lower than the speed of light. If the sound source is 340 meters from the microphone, the sound arrives approximately 1 second later than the light, and the delay increases with distance. During mixing of video clips, either the audio or the video normally needs to be delayed so that they are synchronized; this delay is static but can vary from clip to clip. Video editing effects can delay the video, causing it to lag the audio.
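The distance example above can be worked through numerically. The following is a minimal sketch, assuming a round speed of sound of 340 m/s (matching the figure in the text) and treating the light's travel time as negligible; the function names and the 25 Hz default frame rate are illustrative choices, not part of any standard.

```python
# Assumed round figure for the speed of sound in air, matching the text's example.
SPEED_OF_SOUND_M_PER_S = 340.0


def acoustic_delay_ms(distance_m: float,
                      speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Return how long sound takes to cover distance_m, in milliseconds.

    The travel time of light over the same distance is ignored.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return distance_m / speed_m_per_s * 1000.0


def delay_in_frames(delay_ms: float, frame_rate_hz: float = 25.0) -> float:
    """Express a delay as a number of video frames (frame rate is an assumption)."""
    return delay_ms * frame_rate_hz / 1000.0


if __name__ == "__main__":
    for distance in (3.4, 34.0, 340.0):
        delay = acoustic_delay_ms(distance)
        print(f"{distance:6.1f} m: {delay:7.1f} ms, "
              f"{delay_in_frames(delay):5.2f} frames at 25 Hz")
```

Under these assumptions, a microphone 34 meters from the source already lags the picture by about 100 ms, or two and a half frames at 25 frames per second.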
Transmission (
Processing circuits
Some transmission protocols, such as RTP, require an out-of-band method for synchronizing media streams. In some RTP systems, each media stream has its own timestamp, using an independent clock rate and a per-stream randomized starting value. An RTCP Sender Report (SR) may be needed for each stream in order to synchronize the streams.[3]
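As an illustration of why a Sender Report is needed, the sketch below maps per-stream RTP timestamps onto a shared wallclock. It relies on the fact that an SR pairs a wallclock (NTP) time with the RTP timestamp of the same instant. The `SenderReport` class, its field names, and the use of floating-point seconds instead of the 64-bit NTP format are simplifying assumptions for this example, not an actual RTP library API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SenderReport:
    """Hypothetical, simplified view of one RTCP Sender Report."""
    ntp_seconds: float    # wallclock time of the report, as float seconds
    rtp_timestamp: int    # RTP timestamp corresponding to that same instant
    clock_rate_hz: int    # RTP clock rate of this stream (e.g. 90000 for video)


def rtp_to_wallclock(rtp_timestamp: int, sr: SenderReport) -> float:
    """Convert an RTP timestamp to wallclock seconds using the stream's SR."""
    # RTP timestamps are 32-bit and wrap around; take a signed 32-bit difference.
    diff = (rtp_timestamp - sr.rtp_timestamp) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return sr.ntp_seconds + diff / sr.clock_rate_hz


def capture_time_difference_ms(audio_rtp: int, audio_sr: SenderReport,
                               video_rtp: int, video_sr: SenderReport) -> float:
    """Wallclock capture time of the audio unit minus that of the video unit."""
    audio_time = rtp_to_wallclock(audio_rtp, audio_sr)
    video_time = rtp_to_wallclock(video_rtp, video_sr)
    return (audio_time - video_time) * 1000.0


if __name__ == "__main__":
    audio_sr = SenderReport(ntp_seconds=1000.0, rtp_timestamp=160000,
                            clock_rate_hz=48000)
    video_sr = SenderReport(ntp_seconds=1000.5, rtp_timestamp=900000,
                            clock_rate_hz=90000)
    # One second after the audio SR and half a second after the video SR:
    # both units map to the same wallclock instant.
    print(capture_time_difference_ms(208000, audio_sr, 945000, video_sr))
```

Because each stream's timestamps start at a random value and tick at a different rate, the raw RTP timestamps of the two streams cannot be compared directly; only after mapping through each stream's own report do they share a timeline.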
Effect of no explicit AV-sync timing
When a digital or analog audio-video stream does not have some sort of explicit AV-sync timing these effects will cause the stream to become out of sync:
- In film movies these timing errors are most commonly caused by worn films skipping over the movie projector sprockets because the film has torn sprocket holes.
- Errors can also be caused by the projectionist misthreading the film in the projector, although this is rare with competent projectionists.
- AV-sync is commonly corrected and maintained with an audio synchronizer. Television industry standards organizations have established acceptable amounts of audio and video timing error and suggested practices related to maintaining acceptable timing.[4][1]
- AV-sync errors are becoming a significant problem in the plasma displays.
- In the television field, audio-video sync problems are commonly caused when significant amounts of video processing are performed on the video part of the television program.
- Typical sources of significant video delays in the television field include video synchronizers and video compression encoders and decoders. Particularly troublesome encoders and decoders are used in MPEG compression systems utilized for broadcasting digital television and storing television programs on consumer and professional recording and playback devices.
- A source of significant video delay is found in pixelated television displays (LCD, DLP and plasma), which use complex video signal processing to convert the resolution of the incoming video signal to the native resolution of the pixelated display, for example converting standard-definition video to be displayed on a high-definition display. Lip-flap may exceed 200 ms at times.
- In broadcast television, it is not unusual for lip-sync error to vary by over 100 ms (several video frames) from time to time.
- The EBU Recommendation R37 "The relative timing of the sound and vision components of a television signal" states that end-to-end audio/video sync should be within +40 ms and -60 ms (audio before/after video, respectively) and that each stage should be within +5 ms and -15 ms.[5]
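The R37 figures quoted above can be turned into a simple tolerance check. The sketch below is illustrative only: it uses the sign convention described earlier (positive means the audio leads the video), treats the limits as inclusive, and assumes that per-stage errors simply add up to the end-to-end error. The function and constant names are invented for this example.

```python
from typing import Dict, Sequence, Tuple

# Limits in milliseconds as (lowest allowed, highest allowed); positive = audio leads.
END_TO_END_LIMITS_MS: Tuple[float, float] = (-60.0, 40.0)
PER_STAGE_LIMITS_MS: Tuple[float, float] = (-15.0, 5.0)


def within_limits(error_ms: float, limits: Tuple[float, float]) -> bool:
    """Return True if error_ms lies inside the (inclusive, by assumption) limits."""
    low, high = limits
    return low <= error_ms <= high


def check_chain(stage_errors_ms: Sequence[float]) -> Dict[str, object]:
    """Check each stage and the summed end-to-end error against the limits.

    Assumes stage errors accumulate by simple addition.
    """
    total = sum(stage_errors_ms)
    return {
        "total_ms": total,
        "stages_ok": all(within_limits(e, PER_STAGE_LIMITS_MS)
                         for e in stage_errors_ms),
        "end_to_end_ok": within_limits(total, END_TO_END_LIMITS_MS),
    }


if __name__ == "__main__":
    # Five stages, each at the per-stage lag limit, together exceed the
    # end-to-end lag limit.
    print(check_chain([-15.0] * 5))
```

The example run shows why both limits matter: a chain in which every stage individually complies can still exceed the end-to-end tolerance once the delays accumulate.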
Viewer experience of incorrectly synchronized AV-sync
The result typically leaves a filmed or televised character moving his or her mouth when there is no spoken dialog to accompany it, hence the term lip flap or lip-sync error. The resulting audio-video sync error can be annoying to the viewer and may even cause the viewer to not enjoy the program, decrease the effectiveness of the program or lead to a negative perception of the speaker on the part of the viewer.[6] The potential loss of effectiveness is of particular concern for product commercials and political candidates. Television industry standards organizations, such as the Advanced Television Systems Committee, have become involved in setting standards for audio-video sync errors.[4]
Because of these annoyances, AV-sync error is a concern to the television programming industry, including television stations, networks, advertisers and program production companies. Unfortunately, the advent of high-definition flat-panel display technologies (LCD, DLP and plasma), which can delay video more than audio, has moved the problem into the viewer's home and beyond the control of the television programming industry alone. Consumer product companies now offer audio-delay adjustments to compensate for video-delay changes in TVs and A/V receivers, and several companies manufacture dedicated digital audio delays made exclusively for lip-sync error correction.
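The audio-delay adjustment mentioned above amounts to holding the audio back by the extra time the display needs to process the video. The following is a minimal sketch of that idea on a block of samples, assuming a 48 kHz sample rate by default and a delay already known in milliseconds; the function names are hypothetical, and real products typically do this in a streaming buffer rather than on a whole list.

```python
from typing import List, Sequence


def delay_samples(delay_ms: float, sample_rate_hz: int = 48000) -> int:
    """Convert a delay in milliseconds to a whole number of audio samples."""
    if delay_ms < 0:
        raise ValueError("this sketch can only delay audio, not advance it")
    return round(delay_ms * sample_rate_hz / 1000)


def delay_audio(samples: Sequence[float], delay_ms: float,
                sample_rate_hz: int = 48000) -> List[float]:
    """Return the samples with leading silence equal to the requested delay."""
    padding = [0.0] * delay_samples(delay_ms, sample_rate_hz)
    return padding + list(samples)


if __name__ == "__main__":
    # A display that delays video by 100 ms needs the audio held back by
    # the same amount, which is 4800 samples at 48 kHz.
    print(delay_samples(100.0))
```

Only delaying the audio is possible in this arrangement, which is why such adjustments help when the display delays the video more than the audio but cannot correct audio that already lags.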
Recommendations
For television applications, the
The
SMPTE ST2064
Timestamps
The
See also
- Audio synchronizer
- Clapperboard
- Dubbing (filmmaking)
- Input lag
- Lip sync
References
- "ITU-R BT.1359-1, Relative Timing of Sound and Vision for Broadcasting" (PDF). ITU. 1998. Retrieved 30 May 2015.
- Patrick Waddell; Graham Jones; Adam Goldberg. "Audio/Video Standards and Solutions A Status Report" (PDF). ATSC. Archived from the original (PDF) on 17 February 2016. Retrieved 4 April 2012.
- RFC 3550
- ATSC, 2003-06-26, archived from the original on 2012-03-21
- "The relative timing of the sound and vision components of a television signal" (PDF).
- Byron Reeves; David Voelker (October 1993). "Effects of Audio-Video Asynchrony on Viewer's Memory, Evaluation of Content and Detection Ability" (PDF). Archived from the original (PDF) on 2 October 2008. Retrieved 2008-10-19.
- SMPTE.
Appropriate A/V sync limits have been established and the range that is considered acceptable for film is +/- 22 ms. The range for video, according to the ATSC, is up to 15 ms lead time and about 45 ms lag time
- Consumer Electronics Association. "CEA-CEB20 R-2013: A/V Synchronization Processing Recommended Practice". Archived from the original on 2015-05-30.
- SMPTE, 2015
- SMPTE, 10 December 2013, archived from the original on 2021-12-15
- SMPTE, 10 December 2013, archived from the original (PDF) on 2016-08-26, retrieved 2016-06-09
- "MPEG-2 Systems FAQ: 19. Where are the PTSs and DTSs inserted?". Archived from the original on 2008-07-26. Retrieved 2007-12-27.
- Arpi (7 May 2003). "MPlayer-G2-dev: mpeg container's timing (PTS values)".
- "birds-eye.net: DTS - Decode Time Stamp".
- "SVCD2DVD: Author and burn DVDs: AVI to DVD, DivX to DVD, Xvid to DVD, MPEG to DVD, SVCD to DVD, VCD to DVD, PAL to NTSC conversion, HDTV2DVD, HDTV to DVD, BLURAY". www.svcd2dvd.com.
- RFC 7273
- RFC 7272
Further reading
- Cugnini, Aldo (Sep 1, 2007). "Managing lip sync". TV Technology, originally from Broadcast Engineering. Archived from the original on October 8, 2015. Retrieved 2008-10-19.
- R.A. Salmon; Andrew Mason (January 2009). "Factors affecting perception of audio-video synchronisation in television". BBC Research & Development. Retrieved 2013-06-02.
- Sieranoja, S.; Sahidullah, Md; Kinnunen, T.; Komulainen, J.; Hadid, A. (July 2018). "Audiovisual Synchrony Detection with Optimized Audio Features" (PDF). 2018 IEEE 3rd International Conference on Signal and Image Processing (ICSIP). pp. 377–381. S2CID 51682024.