Motion estimation
In computer vision and image processing, motion estimation is the process of determining motion vectors that describe the transformation from one 2D image to another, usually from adjacent frames in a video sequence. It is an ill-posed problem, as the motion occurs in three dimensions while the images are only a projection of the 3D scene onto a 2D plane. The motion vectors may relate to the whole image (global motion estimation) or to specific parts, such as rectangular blocks, arbitrarily shaped patches or even individual pixels.
Related terms
More often than not, the terms motion estimation and optical flow are used interchangeably. Motion estimation is also related in concept to image registration and stereo correspondence; all of these terms refer to the process of finding corresponding points between two images or video frames.
Each motion vector is used to represent a macroblock in a picture based on the position of this macroblock (or a similar one) in another picture, called the reference picture.
The H.264/MPEG-4 AVC standard, for example, defines the term as follows:
motion vector: a two-dimensional vector used for inter prediction that provides an offset from the coordinates in the decoded picture to the coordinates in a reference picture.[2][3]
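The definition above can be illustrated with a minimal sketch: a motion vector is simply an offset applied to a block's coordinates to locate its predictor in the reference picture. The array contents, block size and the helper name `predict_block` are illustrative, not from any standard.

```python
import numpy as np

def predict_block(reference, top_left, size, motion_vector):
    """Predict a block by copying pixels from the reference picture,
    offset by the motion vector (dy, dx)."""
    y, x = top_left
    dy, dx = motion_vector
    h, w = size
    return reference[y + dy : y + dy + h, x + dx : x + dx + w]

# Toy reference picture: a gradient, so shifted blocks are distinguishable.
reference = np.arange(64, dtype=np.uint8).reshape(8, 8)

# A motion vector of (0, 2) predicts the block at (2, 2) from the pixels
# two columns to the right in the reference picture.
pred = predict_block(reference, top_left=(2, 2), size=(2, 2), motion_vector=(0, 2))
```

In a real codec the offset is typically expressed at sub-pixel precision and requires interpolation; this sketch keeps it at integer precision for clarity.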
Algorithms
The methods for finding motion vectors can be categorised into pixel-based ("direct") methods and feature-based ("indirect") methods. A well-known debate between the two camps produced a pair of papers, one from each side, that attempted to settle the question.[4][5]
Direct methods
- Block-matching algorithm
- Phase correlation and frequency domain methods
- Pixel recursive algorithms
- Optical flow
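As a rough sketch of the first item above, a block-matching algorithm can exhaustively test every offset in a small search window and keep the one that minimises a matching cost. This toy version assumes the sum of absolute differences (SAD) as the cost and synthetic frames; the function name and window size are illustrative.

```python
import numpy as np

def block_match(current, reference, top_left, block=4, search=2):
    """Exhaustive block matching: find the motion vector (dy, dx) within
    +/-search pixels that minimises the sum of absolute differences (SAD)."""
    y, x = top_left
    cur = current[y : y + block, x : x + block].astype(np.int32)
    best_vec, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = y + dy, x + dx
            # Skip candidates that fall outside the reference frame.
            if (ry < 0 or rx < 0 or
                    ry + block > reference.shape[0] or
                    rx + block > reference.shape[1]):
                continue
            cand = reference[ry : ry + block, rx : rx + block].astype(np.int32)
            sad = int(np.abs(cur - cand).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_vec = sad, (dy, dx)
    return best_vec, best_sad

# Reference frame with a bright square; in the current frame it has
# moved down 1 pixel and right 2 pixels.
reference = np.zeros((16, 16), dtype=np.uint8)
reference[4:8, 4:8] = 200
current = np.zeros((16, 16), dtype=np.uint8)
current[5:9, 6:10] = 200

# The vector points from the current block back to its reference position.
vec, sad = block_match(current, reference, top_left=(5, 6))
```

Production encoders replace the exhaustive search with faster patterns (three-step, diamond search, etc.), but the matching criterion is the same idea.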
Indirect methods
Indirect methods use features, such as corner detection, and match corresponding features between frames, usually with a statistical function applied over a local or global area. The purpose of the statistical function is to remove matches that do not correspond to the actual motion.
Statistical functions that have been successfully used include RANSAC.
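RANSAC is a common choice for such a function. A minimal sketch, assuming a pure global-translation motion model and synthetic feature correspondences (the function name, parameters and data are illustrative): a candidate motion is hypothesised from a random sample, and matches that do not fit it are rejected as outliers.

```python
import numpy as np

def ransac_translation(src, dst, iters=100, tol=1.0, seed=0):
    """Minimal RANSAC: estimate a global 2D translation from point
    correspondences while rejecting matches that do not fit the motion."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        i = rng.integers(len(src))      # a translation needs one sample
        t = dst[i] - src[i]             # candidate motion hypothesis
        err = np.linalg.norm(src + t - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers for the final estimate.
    t = (dst[best_inliers] - src[best_inliers]).mean(axis=0)
    return t, best_inliers

# Eight correct matches translated by (3, -2), plus two gross outliers.
src = np.array([[0, 0], [1, 0], [2, 1], [3, 3], [4, 1],
                [5, 2], [6, 0], [7, 4], [1, 1], [2, 2]], dtype=float)
dst = src + np.array([3.0, -2.0])
dst[8] = [50.0, 50.0]    # mismatched features (outliers)
dst[9] = [-40.0, 10.0]

t, inliers = ransac_translation(src, dst)
```

With richer motion models (affine, homography) the only changes are the sample size per iteration and the model-fitting step.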
Additional note on the categorization
It can be argued that almost all methods require some definition of the matching criterion. The difference lies only in whether one first summarises over a local image region and then compares the summaries (as in feature-based methods), or first compares each pixel (for example, by squaring the difference) and then summarises over a local image region (block-based and filter-based motion). An emerging type of matching criterion first summarises a local image region for every pixel location (through some feature transform such as the Laplacian), then compares each summarised pixel, and finally summarises again over a local image region.[6] Some matching criteria can exclude points that do not actually correspond to each other despite producing a good matching score; others cannot, but they are still matching criteria.
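The orderings described above can be illustrated with a small sketch, assuming a 4-neighbour Laplacian as the per-pixel feature transform and the sum of squared differences (SSD) as the comparison; the function names and toy blocks are illustrative.

```python
import numpy as np

def laplacian(img):
    """Summarise each pixel's 3x3 neighbourhood with a 4-neighbour
    Laplacian stencil; borders are dropped for simplicity."""
    f = img.astype(np.int32)
    return (f[1:-1, :-2] + f[1:-1, 2:] + f[:-2, 1:-1] + f[2:, 1:-1]
            - 4 * f[1:-1, 1:-1])

def ssd(a, b):
    """Compare each (possibly pre-filtered) pixel first, then summarise
    over the region: sum of squared differences."""
    return int(((a.astype(np.int64) - b) ** 2).sum())

# Two blocks with identical structure but a constant brightness offset.
a = np.arange(25, dtype=np.uint8).reshape(5, 5)
b = a + 10

# Raw SSD penalises the brightness change ...
raw = ssd(a, b)
# ... while SSD on the Laplacian-transformed blocks ignores it, because
# the filter's coefficients sum to zero and cancel any constant offset.
filtered = ssd(laplacian(a), laplacian(b))
```

This is one reason the transform-then-compare ordering is attractive: it buys invariance to brightness changes before the matching score is ever computed.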
Affine motion estimation
Affine motion estimation is a technique used in computer vision and image processing to estimate the motion between two images or frames. It assumes that the motion can be modeled as an affine transformation, that is, a linear transformation (covering rotation, scaling and shear) followed by a translation.
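Under that model, the six affine parameters can be recovered from point correspondences by linear least squares, since each correspondence contributes two linear equations. A minimal sketch with synthetic points (the name `fit_affine` is illustrative):

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares fit of an affine transform dst ~= src @ A.T + t
    from point correspondences (needs at least 3 non-collinear points)."""
    n = len(src)
    # Each correspondence [x, y] gives two equations in the 6 parameters.
    M = np.hstack([src, np.ones((n, 1))])              # n x 3
    params, *_ = np.linalg.lstsq(M, dst, rcond=None)   # 3 x 2
    A = params[:2].T                                   # 2 x 2 linear part
    t = params[2]                                      # translation
    return A, t

# Points moved by a known rotation + scaling + translation.
theta, scale = np.deg2rad(30), 1.5
A_true = scale * np.array([[np.cos(theta), -np.sin(theta)],
                           [np.sin(theta),  np.cos(theta)]])
t_true = np.array([2.0, -1.0])

src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
dst = src @ A_true.T + t_true

A_est, t_est = fit_affine(src, dst)
```

With noisy, outlier-contaminated correspondences this least-squares step is usually wrapped inside a robust estimator such as RANSAC, as described in the previous section.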
Applications
Video coding
Applying the motion vectors to an image to synthesize the transformation to the next image is called motion compensation.
As a way of exploiting temporal redundancy, motion estimation and compensation are key parts of video compression; almost all modern video coding standards use block-based motion estimation and compensation.
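A toy sketch of how compensation fits into coding, assuming per-block translational vectors have already been found by estimation (the frame contents and helper name are illustrative): the decoder rebuilds a prediction from the reference picture, so the encoder only needs to transmit the vectors plus the residual.

```python
import numpy as np

def motion_compensate(reference, vectors, block=4):
    """Build a predicted frame by copying each block from the reference
    picture at the position given by its motion vector (dy, dx)."""
    h, w = reference.shape
    pred = np.zeros_like(reference)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            dy, dx = vectors[by // block][bx // block]
            pred[by:by + block, bx:bx + block] = \
                reference[by + dy:by + dy + block, bx + dx:bx + dx + block]
    return pred

# Reference frame, and a current frame whose content is the reference
# cyclically shifted right by 4 pixels (a toy construction).
reference = np.arange(64, dtype=np.int32).reshape(8, 8)
current = np.roll(reference, 4, axis=1)

# One vector per 4x4 block: each block points back to where its
# content sits in the reference picture.
vectors = [[(0, 4), (0, -4)],
           [(0, 4), (0, -4)]]

pred = motion_compensate(reference, vectors)
residual = current - pred    # the encoder only needs to code this
```

Here the prediction is exact, so the residual is all zeros; in real video the residual carries whatever the translational block model cannot explain.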
3D reconstruction
In simultaneous localization and mapping, a 3D model of a scene is reconstructed using images from a moving camera.[9]
See also
- Moving object detection
- Graphics processing unit
- Vision processing unit
- Scale-invariant feature transform
References
- ISBN 978-1-59454-357-9.
- ^ Latest working draft of H.264/MPEG-4 AVC Archived 2004-07-23 at the Wayback Machine. Retrieved on 2008-02-29.
- ^ "Latest working draft of H.264/MPEG-4 AVC on hhi.fraunhofer.de" (PDF).[permanent dead link]
- ^ Philip H.S. Torr and Andrew Zisserman: Feature Based Methods for Structure and Motion Estimation, ICCV Workshop on Vision Algorithms, pages 278-294, 1999.
- ^ Michal Irani and P. Anandan: About Direct Methods, ICCV Workshop on Vision Algorithms, pages 267-277, 1999.
- ^ Rui Xu, David Taubman and Aous Thabit Naman: Motion Estimation Based on Mutual Information and Adaptive Multi-scale Thresholding, IEEE Transactions on Image Processing, vol. 25, no. 3, pp. 1095-1108, March 2016.
- ISBN 978-1-4615-6241-2.
- ISBN 9780240806174.
- ^ Kerl, Christian, Jürgen Sturm, and Daniel Cremers. "Dense visual SLAM for RGB-D cameras." 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2013.