Motion estimation
In computer vision and image processing, motion estimation is the process of determining motion vectors that describe the transformation from one 2D image to another, usually from adjacent frames in a video sequence. It is an ill-posed problem, as the motion occurs in three dimensions while the images are only a projection of the 3D scene onto a 2D plane. The motion vectors may relate to the whole image (global motion estimation) or to specific parts, such as rectangular blocks, arbitrarily shaped patches or even individual pixels.
Related terms
More often than not, the terms motion estimation and optical flow are used interchangeably. Motion estimation is also related in concept to image registration and stereo correspondence; all of these terms refer to the process of finding corresponding points between two images or video frames.
Each motion vector is used to represent a macroblock in a picture based on the position of this macroblock (or a similar one) in another picture, called the reference picture.
The H.264/MPEG-4 AVC standard, for example, defines the term as follows:
motion vector: a two-dimensional vector used for inter prediction that provides an offset from the coordinates in the decoded picture to the coordinates in a reference picture.[2][3]
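The definition above can be illustrated with a minimal sketch: a motion vector is simply an offset applied to a block's coordinates to locate its predictor in the reference picture. The array contents, block size and the helper name `predict_block` are illustrative, not from any standard.

```python
import numpy as np

def predict_block(reference, top_left, size, motion_vector):
    """Predict a block by copying pixels from the reference picture,
    offset by the motion vector (dy, dx)."""
    y, x = top_left
    dy, dx = motion_vector
    h, w = size
    return reference[y + dy : y + dy + h, x + dx : x + dx + w]

# Toy reference picture: a gradient, so shifted blocks are distinguishable.
reference = np.arange(64, dtype=np.uint8).reshape(8, 8)

# A motion vector of (0, 2) predicts the block at (2, 2) from the pixels
# two columns to the right in the reference picture.
pred = predict_block(reference, top_left=(2, 2), size=(2, 2), motion_vector=(0, 2))
```

In a real codec the offset is typically expressed at sub-pixel precision and requires interpolation; this sketch keeps it at integer precision for clarity.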
Algorithms
The methods for finding motion vectors can be categorised into pixel-based ("direct") methods and feature-based ("indirect") methods. A well-known debate between the two camps produced a pair of papers, one from each side, that attempted to settle the question.[4][5]
Direct methods
- Block-matching algorithm
- Phase correlation and frequency domain methods
- Pixel recursive algorithms
- Optical flow
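As a rough sketch of the first item above, a block-matching algorithm can exhaustively test every offset in a small search window and keep the one that minimises a matching cost. This toy version assumes the sum of absolute differences (SAD) as the cost and synthetic frames; the function name and window size are illustrative.

```python
import numpy as np

def block_match(current, reference, top_left, block=4, search=2):
    """Exhaustive block matching: find the motion vector (dy, dx) within
    +/-search pixels that minimises the sum of absolute differences (SAD)."""
    y, x = top_left
    cur = current[y : y + block, x : x + block].astype(np.int32)
    best_vec, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = y + dy, x + dx
            # Skip candidates that fall outside the reference frame.
            if (ry < 0 or rx < 0 or
                    ry + block > reference.shape[0] or
                    rx + block > reference.shape[1]):
                continue
            cand = reference[ry : ry + block, rx : rx + block].astype(np.int32)
            sad = int(np.abs(cur - cand).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_vec = sad, (dy, dx)
    return best_vec, best_sad

# Reference frame with a bright square; in the current frame it has
# moved down 1 pixel and right 2 pixels.
reference = np.zeros((16, 16), dtype=np.uint8)
reference[4:8, 4:8] = 200
current = np.zeros((16, 16), dtype=np.uint8)
current[5:9, 6:10] = 200

# The vector points from the current block back to its reference position.
vec, sad = block_match(current, reference, top_left=(5, 6))
```

Production encoders replace the exhaustive search with faster patterns (three-step, diamond search, etc.), but the matching criterion is the same idea.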
Indirect methods
Indirect methods use features, such as corner detection, and match corresponding features between frames, usually with a statistical function applied over a local or global area. The purpose of the statistical function is to remove matches that do not correspond to the actual motion.
Statistical functions that have been successfully used include RANSAC.
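RANSAC is a common choice for such a function. A minimal sketch, assuming a pure global-translation motion model and synthetic feature correspondences (the function name, parameters and data are illustrative): a candidate motion is hypothesised from a random sample, and matches that do not fit it are rejected as outliers.

```python
import numpy as np

def ransac_translation(src, dst, iters=100, tol=1.0, seed=0):
    """Minimal RANSAC: estimate a global 2D translation from point
    correspondences while rejecting matches that do not fit the motion."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        i = rng.integers(len(src))      # a translation needs one sample
        t = dst[i] - src[i]             # candidate motion hypothesis
        err = np.linalg.norm(src + t - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers for the final estimate.
    t = (dst[best_inliers] - src[best_inliers]).mean(axis=0)
    return t, best_inliers

# Eight correct matches translated by (3, -2), plus two gross outliers.
src = np.array([[0, 0], [1, 0], [2, 1], [3, 3], [4, 1],
                [5, 2], [6, 0], [7, 4], [1, 1], [2, 2]], dtype=float)
dst = src + np.array([3.0, -2.0])
dst[8] = [50.0, 50.0]    # mismatched features (outliers)
dst[9] = [-40.0, 10.0]

t, inliers = ransac_translation(src, dst)
```

With richer motion models (affine, homography) the only changes are the sample size per iteration and the model-fitting step.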
Additional note on the categorization
It can be argued that almost all methods require some definition of the matching criterion. The difference lies only in whether one first summarises over a local image region and then compares the summaries (as in feature-based methods), or first compares each pixel (for example, by squaring the difference) and then summarises over a local image region (block-based and filter-based motion). An emerging type of matching criterion first summarises a local image region for every pixel location (through some feature transform such as the Laplacian), then compares each summarised pixel, and finally summarises again over a local image region.[6] Some matching criteria can exclude points that do not actually correspond to each other despite producing a good matching score; others cannot, but they are still matching criteria.
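The orderings described above can be illustrated with a small sketch, assuming a 4-neighbour Laplacian as the per-pixel feature transform and the sum of squared differences (SSD) as the comparison; the function names and toy blocks are illustrative.

```python
import numpy as np

def laplacian(img):
    """Summarise each pixel's 3x3 neighbourhood with a 4-neighbour
    Laplacian stencil; borders are dropped for simplicity."""
    f = img.astype(np.int32)
    return (f[1:-1, :-2] + f[1:-1, 2:] + f[:-2, 1:-1] + f[2:, 1:-1]
            - 4 * f[1:-1, 1:-1])

def ssd(a, b):
    """Compare each (possibly pre-filtered) pixel first, then summarise
    over the region: sum of squared differences."""
    return int(((a.astype(np.int64) - b) ** 2).sum())

# Two blocks with identical structure but a constant brightness offset.
a = np.arange(25, dtype=np.uint8).reshape(5, 5)
b = a + 10

# Raw SSD penalises the brightness change ...
raw = ssd(a, b)
# ... while SSD on the Laplacian-transformed blocks ignores it, because
# the filter's coefficients sum to zero and cancel any constant offset.
filtered = ssd(laplacian(a), laplacian(b))
```

This is one reason the transform-then-compare ordering is attractive: it buys invariance to brightness changes before the matching score is ever computed.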
Affine motion estimation
Affine motion estimation is a technique used in computer vision and image processing to estimate the motion between two images or frames. It assumes that the motion can be modeled as an affine transformation, that is, a linear transformation (covering rotation, scaling and shear) followed by a translation.
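Under that model, the six affine parameters can be recovered from point correspondences by linear least squares, since each correspondence contributes two linear equations. A minimal sketch with synthetic points (the name `fit_affine` is illustrative):

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares fit of an affine transform dst ~= src @ A.T + t
    from point correspondences (needs at least 3 non-collinear points)."""
    n = len(src)
    # Each correspondence [x, y] gives two equations in the 6 parameters.
    M = np.hstack([src, np.ones((n, 1))])              # n x 3
    params, *_ = np.linalg.lstsq(M, dst, rcond=None)   # 3 x 2
    A = params[:2].T                                   # 2 x 2 linear part
    t = params[2]                                      # translation
    return A, t

# Points moved by a known rotation + scaling + translation.
theta, scale = np.deg2rad(30), 1.5
A_true = scale * np.array([[np.cos(theta), -np.sin(theta)],
                           [np.sin(theta),  np.cos(theta)]])
t_true = np.array([2.0, -1.0])

src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
dst = src @ A_true.T + t_true

A_est, t_est = fit_affine(src, dst)
```

With noisy, outlier-contaminated correspondences this least-squares step is usually wrapped inside a robust estimator such as RANSAC, as described in the previous section.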
Applications
Video coding
Applying the motion vectors to an image to synthesize the transformation to the next image is called motion compensation.
As a way of exploiting temporal redundancy, motion estimation and compensation are key parts of video compression; almost all modern video coding standards use block-based motion estimation and compensation.
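A toy sketch of how compensation fits into coding, assuming per-block translational vectors have already been found by estimation (the frame contents and helper name are illustrative): the decoder rebuilds a prediction from the reference picture, so the encoder only needs to transmit the vectors plus the residual.

```python
import numpy as np

def motion_compensate(reference, vectors, block=4):
    """Build a predicted frame by copying each block from the reference
    picture at the position given by its motion vector (dy, dx)."""
    h, w = reference.shape
    pred = np.zeros_like(reference)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            dy, dx = vectors[by // block][bx // block]
            pred[by:by + block, bx:bx + block] = \
                reference[by + dy:by + dy + block, bx + dx:bx + dx + block]
    return pred

# Reference frame, and a current frame whose content is the reference
# cyclically shifted right by 4 pixels (a toy construction).
reference = np.arange(64, dtype=np.int32).reshape(8, 8)
current = np.roll(reference, 4, axis=1)

# One vector per 4x4 block: each block points back to where its
# content sits in the reference picture.
vectors = [[(0, 4), (0, -4)],
           [(0, 4), (0, -4)]]

pred = motion_compensate(reference, vectors)
residual = current - pred    # the encoder only needs to code this
```

Here the prediction is exact, so the residual is all zeros; in real video the residual carries whatever the translational block model cannot explain.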
3D reconstruction
In simultaneous localization and mapping, a 3D model of a scene is reconstructed using images from a moving camera.[9]
See also
- Moving object detection
- Graphics processing unit
- Vision processing unit
- Scale-invariant feature transform
References
- ISBN 978-1-59454-357-9.
- ^ Latest working draft of H.264/MPEG-4 AVC Archived 2004-07-23 at the Wayback Machine. Retrieved on 2008-02-29.
- ^ "Latest working draft of H.264/MPEG-4 AVC on hhi.fraunhofer.de" (PDF).[permanent dead link]
- ^ Philip H.S. Torr and Andrew Zisserman: Feature Based Methods for Structure and Motion Estimation, ICCV Workshop on Vision Algorithms, pages 278-294, 1999.
- ^ Michal Irani and P. Anandan: About Direct Methods, ICCV Workshop on Vision Algorithms, pages 267-277, 1999.
- ^ Rui Xu, David Taubman and Aous Thabit Naman: Motion Estimation Based on Mutual Information and Adaptive Multi-scale Thresholding, IEEE Transactions on Image Processing, vol. 25, no. 3, pp. 1095-1108, March 2016.
- ISBN 978-1-4615-6241-2.
- ISBN 9780240806174.
- ^ Kerl, Christian, Jürgen Sturm, and Daniel Cremers. "Dense visual SLAM for RGB-D cameras." 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2013.