Visual information fidelity

Visual information fidelity (VIF) is a full-reference image quality assessment index. It is used in the VMAF video quality monitoring system, which controls the picture quality of all encoded videos streamed by Netflix.

Model overview

The VIF index is a quality assessment (QA) method that does not rely on any human visual system (HVS) or viewing geometry parameter, nor on any constants requiring optimization, and yet is competitive with state-of-the-art QA methods.^[3]

Specifically, the reference image is modeled as being the output of a stochastic 'natural' source that passes through the HVS channel and is processed later by the brain. The information content of the reference image is quantified as being the mutual information between the input and output of the HVS channel. This is the information that the brain could ideally extract from the output of the HVS. The same measure is then quantified in the presence of an image distortion channel that distorts the output of the natural source before it passes through the HVS channel, thereby measuring the information that the brain could ideally extract from the test image. This is shown pictorially in Figure 1. The two information measures are then combined to form a visual information fidelity measure that relates visual quality to relative image information.

System model

Source model

A Gaussian scale mixture (GSM) is used to statistically model the wavelet coefficients of a steerable pyramid decomposition of an image.^[4]

The model is described below for a given subband of the multi-scale multi-orientation decomposition and can be extended to other subbands similarly. Let the wavelet coefficients in a given subband be ${\mathcal {C}}=\{{\bar {C}}_{i}:i\in {\mathcal {I}}\}$, where ${\mathcal {I}}$ denotes the set of spatial indices across the subband and each ${\bar {C}}_{i}$ is an $M$-dimensional vector. The subband is partitioned into non-overlapping blocks of $M$ coefficients each, where each block corresponds to ${\bar {C}}_{i}$. According to the GSM model,

{\mathcal {C}}={\mathcal {S}}\cdot {\mathcal {U}}=\{S_{i}{\bar {U}}_{i}:i\in {\mathcal {I}}\},

where $S_{i}$ is a positive scalar and ${\bar {U}}_{i}$ is a Gaussian vector with mean zero and covariance $\mathbf {C} _{U}$. Further, the non-overlapping blocks are assumed to be independent of each other, and the random field ${\mathcal {S}}$ is assumed to be independent of ${\mathcal {U}}$.
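The GSM source model can be illustrated by drawing a few synthetic blocks. The following is a minimal sketch, not part of the original method description: the function name `sample_gsm_block`, the 2x2 covariance used for $\mathbf {C} _{U}$, and the log-normal distribution of $S_{i}$ are all assumptions made for illustration, since the model only requires $S_{i}$ to be positive.

```python
import math
import random

def sample_gsm_block(rng, chol_u, log_sigma=0.5):
    """Draw one GSM block C_i = S_i * U_i.

    chol_u is a lower-triangular Cholesky factor L with L L^T = C_U.
    S_i is drawn log-normally here; that prior is an assumption of this
    sketch, because the model only requires S_i > 0.
    """
    m = len(chol_u)
    z = [rng.gauss(0.0, 1.0) for _ in range(m)]
    # U_i = L z is a zero-mean Gaussian vector with covariance C_U.
    u = [sum(chol_u[r][k] * z[k] for k in range(r + 1)) for r in range(m)]
    # S_i is sampled independently of U_i, as the model assumes.
    s = math.exp(rng.gauss(0.0, log_sigma))
    return s, [s * x for x in u]

rng = random.Random(0)
# Hypothetical covariance C_U = [[1.0, 0.5], [0.5, 1.0]] and its Cholesky factor.
chol_u = [[1.0, 0.0], [0.5, math.sqrt(0.75)]]
blocks = [sample_gsm_block(rng, chol_u) for _ in range(5)]
for s, c in blocks:
    print(f"S_i = {s:.3f}, C_i = [{c[0]:+.3f}, {c[1]:+.3f}]")
```

Each block is sampled independently, which mirrors the assumption that non-overlapping blocks are independent of each other.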

Distortion model

The distortion process is modeled using a combination of signal attenuation and additive noise in the wavelet domain. Mathematically, if ${\mathcal {D}}=\{{\bar {D}}_{i}:i\in {\mathcal {I}}\}$ denotes the random field from a given subband of the distorted image, ${\mathcal {G}}=\{g_{i}:i\in {\mathcal {I}}\}$ is a deterministic scalar field, and ${\mathcal {V}}=\{{\bar {V}}_{i}:i\in {\mathcal {I}}\}$ is a random field in which each ${\bar {V}}_{i}$ is a zero-mean Gaussian vector with covariance $\mathbf {C} _{V}=\sigma _{v}^{2}\mathbf {I}$, then

{\mathcal {D}}={\mathcal {G}}{\mathcal {C}}+{\mathcal {V}}.

Further, ${\mathcal {V}}$ is modeled to be independent of ${\mathcal {S}}$ and ${\mathcal {U}}$.
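The attenuation-plus-noise channel can be simulated and its parameters recovered from paired coefficients. The sketch below is a simplified illustration: it treats scalar coefficients ($M=1$), holds the gain constant over the whole set rather than letting $g_{i}$ vary, and estimates the gain and noise variance by least-squares regression. The function names `distort` and `estimate_channel` and all numeric values are hypothetical.

```python
import random

def distort(coeffs, gain, sigma_v, rng):
    """Apply D_i = g * C_i + V_i with V_i ~ Normal(0, sigma_v^2)."""
    return [gain * c + rng.gauss(0.0, sigma_v) for c in coeffs]

def estimate_channel(ref, dist):
    """Least-squares estimates of g and sigma_v^2 from paired coefficients."""
    n = len(ref)
    mean_r = sum(ref) / n
    mean_d = sum(dist) / n
    var_r = sum((r - mean_r) ** 2 for r in ref) / n
    var_d = sum((d - mean_d) ** 2 for d in dist) / n
    cov = sum((r - mean_r) * (d - mean_d) for r, d in zip(ref, dist)) / n
    g = cov / var_r
    # Residual variance of the regression; clipped so it never goes negative.
    sigma_v_sq = max(var_d - g * cov, 0.0)
    return g, sigma_v_sq

rng = random.Random(0)
# Hypothetical reference coefficients and channel parameters.
reference = [rng.gauss(0.0, 2.0) for _ in range(20000)]
distorted = distort(reference, gain=0.7, sigma_v=0.5, rng=rng)
g_hat, sigma_v_sq_hat = estimate_channel(reference, distorted)
print(f"estimated g = {g_hat:.3f}, estimated sigma_v^2 = {sigma_v_sq_hat:.3f}")
```

With enough samples the estimates should land close to the simulated gain of 0.7 and noise variance of 0.25. A practical implementation would likely estimate these quantities over local windows so that the gain can vary across the subband.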

HVS model

The duality of HVS models and natural scene statistics (NSS) implies that several aspects of the HVS have already been accounted for in the source model. Here, the HVS is additionally modeled based on the hypothesis that the uncertainty in the perception of visual signals can be represented as visual noise in the HVS model. In particular, the HVS noise in a given subband of the wavelet decomposition is modeled as additive white Gaussian noise. Let ${\mathcal {N}}=\{{\bar {N}}_{i}:i\in {\mathcal {I}}\}$ and ${\mathcal {N}}'=\{{\bar {N}}_{i}':i\in {\mathcal {I}}\}$ be random fields, where ${\bar {N}}_{i}$ and ${\bar {N}}_{i}'$ are zero-mean Gaussian vectors with covariances $\mathbf {C} _{N}$ and $\mathbf {C} _{N}'$, respectively; both are taken to be $\sigma _{n}^{2}\mathbf {I}$ in the information measures used for the VIF index. Further, let ${\mathcal {E}}$ and ${\mathcal {F}}$ denote the visual signal at the output of the HVS for the reference and the test image, respectively. Mathematically, we have ${\mathcal {E}}={\mathcal {C}}+{\mathcal {N}}$ and ${\mathcal {F}}={\mathcal {D}}+{\mathcal {N}}'$. Note that ${\mathcal {N}}$ and ${\mathcal {N}}'$ are random fields that are independent of ${\mathcal {S}}$, ${\mathcal {U}}$ and ${\mathcal {V}}$.
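Combining the distortion model ${\mathcal {D}}={\mathcal {G}}{\mathcal {C}}+{\mathcal {V}}$ with the HVS model shows where the noise terms in the two channels come from. The following worked step is a sketch that assumes the white HVS noise has covariance $\sigma _{n}^{2}\mathbf {I}$ in both channels; the notation $\operatorname {Normal}$ for a Gaussian distribution is introduced here for illustration.

```latex
\begin{aligned}
{\mathcal {F}} &= {\mathcal {D}}+{\mathcal {N}}'
  = {\mathcal {G}}{\mathcal {C}}+{\mathcal {V}}+{\mathcal {N}}', \\
\operatorname {Cov}({\bar {V}}_{i}+{\bar {N}}_{i}')
  &= \mathbf {C} _{V}+\mathbf {C} _{N}'
  = (\sigma _{v}^{2}+\sigma _{n}^{2})\mathbf {I}, \\
{\bar {E}}_{i}\mid S_{i}=s_{i}
  &\sim \operatorname {Normal}\left(\mathbf {0},\;
     s_{i}^{2}\mathbf {C} _{U}+\sigma _{n}^{2}\mathbf {I}\right), \\
{\bar {F}}_{i}\mid S_{i}=s_{i}
  &\sim \operatorname {Normal}\left(\mathbf {0},\;
     g_{i}^{2}s_{i}^{2}\mathbf {C} _{U}
     +(\sigma _{v}^{2}+\sigma _{n}^{2})\mathbf {I}\right).
\end{aligned}
```

The second line uses the independence of ${\mathcal {V}}$ and ${\mathcal {N}}'$, so that their covariances add. The test channel therefore sees an attenuated signal together with the combined distortion and HVS noise, whereas the reference channel sees the unattenuated signal with HVS noise only.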

VIF index

Let ${\bar {C}}^{N}=({\bar {C}}_{1},{\bar {C}}_{2},\ldots ,{\bar {C}}_{N})$ denote the vector of all blocks from a given subband. Let $S^{N},{\bar {D}}^{N},{\bar {E}}^{N}$ and ${\bar {F}}^{N}$ be similarly defined. Let $s^{N}$ denote the maximum likelihood estimate of $S^{N}$ given ${\bar {C}}^{N}$ and $\mathbf {C} _{U}$. The amount of information extracted from the reference is obtained as

I({\bar {C}}^{N};{\bar {E}}^{N}\mid S^{N}=s^{N})={\frac {1}{2}}\sum _{i=1}^{N}\log _{2}\left({\frac {|s_{i}^{2}\mathbf {C} _{U}+\sigma _{n}^{2}\mathbf {I} |}{|\sigma _{n}^{2}\mathbf {I} |}}\right),

while the amount of information extracted from the test image is given as

I({\bar {C}}^{N};{\bar {F}}^{N}\mid S^{N}=s^{N})={\frac {1}{2}}\sum _{i=1}^{N}\log _{2}\left({\frac {|g_{i}^{2}s_{i}^{2}\mathbf {C} _{U}+(\sigma _{v}^{2}+\sigma _{n}^{2})\mathbf {I} |}{|(\sigma _{v}^{2}+\sigma _{n}^{2})\mathbf {I} |}}\right).
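Both expressions depend on the estimates $s_{i}$. The text does not spell out the estimator, so the following is a sketch of the standard maximum likelihood derivation under the GSM model, in which ${\bar {C}}_{i}$ given $S_{i}=s_{i}$ is zero-mean Gaussian with covariance $s_{i}^{2}\mathbf {C} _{U}$. The symbol $\ell$ for the log-likelihood is introduced here for illustration.

```latex
\begin{aligned}
\ell (s_{i}^{2}) &= -{\frac {M}{2}}\ln s_{i}^{2}
  -{\frac {1}{2s_{i}^{2}}}\,{\bar {C}}_{i}^{T}\mathbf {C} _{U}^{-1}{\bar {C}}_{i}
  +{\text{const}}, \\
{\frac {\partial \ell }{\partial s_{i}^{2}}} &= -{\frac {M}{2s_{i}^{2}}}
  +{\frac {1}{2s_{i}^{4}}}\,{\bar {C}}_{i}^{T}\mathbf {C} _{U}^{-1}{\bar {C}}_{i}=0
  \quad \Longrightarrow \quad
  s_{i}^{2}={\frac {{\bar {C}}_{i}^{T}\mathbf {C} _{U}^{-1}{\bar {C}}_{i}}{M}}.
\end{aligned}
```

The first line uses $|s_{i}^{2}\mathbf {C} _{U}|=s_{i}^{2M}|\mathbf {C} _{U}|$ for an $M\times M$ covariance, and the constant collects the terms that do not depend on $s_{i}$.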

Denoting the $N$ blocks in subband $j$ of the wavelet decomposition by ${\bar {C}}^{N,j}$, and similarly for the other variables, the VIF index is defined as

{\textrm {VIF}}={\frac {\sum _{j\in {\textrm {subbands}}}I({\bar {C}}^{N,j};{\bar {F}}^{N,j}\mid S^{N,j}=s^{N,j})}{\sum _{j\in {\textrm {subbands}}}I({\bar {C}}^{N,j};{\bar {E}}^{N,j}\mid S^{N,j}=s^{N,j})}}.
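The ratio can be made concrete in the scalar case $M=1$, where $\mathbf {C} _{U}$ reduces to a variance $\sigma _{u}^{2}$ and the two information terms become ${\frac {1}{2}}\sum _{i}\log _{2}(1+s_{i}^{2}\sigma _{u}^{2}/\sigma _{n}^{2})$ and ${\frac {1}{2}}\sum _{i}\log _{2}(1+g_{i}^{2}s_{i}^{2}\sigma _{u}^{2}/(\sigma _{v}^{2}+\sigma _{n}^{2}))$. The following is a minimal sketch under that simplification. The function names, the dictionary layout, and every parameter value (including $\sigma _{n}^{2}=1$) are hypothetical and chosen only for illustration.

```python
import math

def subband_information(s_sq, gains, sigma_u_sq, sigma_v_sq, sigma_n_sq):
    """Return (test_info, reference_info) in bits for one subband, M = 1."""
    ref = 0.0
    test = 0.0
    for s, g in zip(s_sq, gains):
        ref += 0.5 * math.log2(1.0 + s * sigma_u_sq / sigma_n_sq)
        test += 0.5 * math.log2(
            1.0 + g * g * s * sigma_u_sq / (sigma_v_sq + sigma_n_sq)
        )
    return test, ref

def vif_index(subbands, sigma_n_sq=1.0):
    """Ratio of summed test information to summed reference information."""
    num = 0.0
    den = 0.0
    for band in subbands:
        test, ref = subband_information(
            band["s_sq"], band["gains"], band["sigma_u_sq"],
            band["sigma_v_sq"], sigma_n_sq,
        )
        num += test
        den += ref
    return num / den

# Hypothetical parameters for two subbands (illustrative values only).
undistorted = [
    {"s_sq": [1.0, 4.0], "gains": [1.0, 1.0], "sigma_u_sq": 1.0, "sigma_v_sq": 0.0},
    {"s_sq": [9.0, 0.5], "gains": [1.0, 1.0], "sigma_u_sq": 2.0, "sigma_v_sq": 0.0},
]
# Same subbands after attenuation (g = 0.6) and added noise (sigma_v^2 = 0.5).
blurred_noisy = [
    {**band, "gains": [0.6, 0.6], "sigma_v_sq": 0.5} for band in undistorted
]
print(vif_index(undistorted))    # 1.0: test and reference terms coincide
print(vif_index(blurred_noisy))  # below 1.0: attenuation and noise lose information
```

In this sketch the index equals 1 when the test image is identical to the reference and falls toward 0 as attenuation and noise increase. A gain above 1 with no added noise would push the ratio above 1, since the numerator then exceeds the denominator.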

Performance

The Spearman rank-order correlation coefficient (SROCC) between the VIF index scores of distorted images in the LIVE Image Quality Assessment Database and the corresponding human opinion scores has been evaluated at 0.96.
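The SROCC is the Pearson correlation of the rank-transformed scores, so it measures how well a quality index preserves the ordering of human judgments. The sketch below shows the computation on toy data. The function names and the five score pairs are hypothetical and are not taken from the LIVE database.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied entries their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def srocc(scores, opinions):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(scores), average_ranks(opinions))

# Hypothetical toy data, not taken from the LIVE database.
vif_scores = [0.21, 0.35, 0.50, 0.74, 0.90]
opinion_scores = [30.0, 42.0, 41.0, 66.0, 80.0]
print(round(srocc(vif_scores, opinion_scores), 3))  # 0.9
```

Only one adjacent pair is out of order in the toy data, which is why the coefficient is high but below 1.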

References

[1] PMID 16479813.

[2] S2CID 207761262.

[3] Sheikh, Hamid R. "Image Information and Visual Quality". University of Texas. Retrieved 15 April 2024.

[4] S2CID 1099364.

External links

Laboratory for Image and Video Engineering at the University of Texas

An implementation of the VIF index

LIVE Image Quality Assessment Database
