Markov random field

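In the domain of physics and probability, a Markov random field (MRF), Markov network or undirected graphical model is a set of random variables having a Markov property described by an undirected graph. In other words, a random field is said to be a Markov random field if it satisfies Markov properties. The concept originates from the Sherrington–Kirkpatrick model.^[1]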

A Markov network or MRF is similar to a Bayesian network in its representation of dependencies; the difference is that Bayesian networks are directed and acyclic, whereas Markov networks are undirected and may be cyclic. Thus, a Markov network can represent certain dependencies that a Bayesian network cannot (such as cyclic dependencies); on the other hand, it cannot represent certain dependencies that a Bayesian network can (such as induced dependencies). The underlying graph of a Markov random field may be finite or infinite.

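When the joint probability density of the random variables is strictly positive, the field is also referred to as a Gibbs random field because, according to the Hammersley–Clifford theorem, it can then be represented by a Gibbs measure. In the domain of artificial intelligence, Markov random fields are used to model tasks in image processing and computer vision.^[3]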

Definition

Given an undirected graph $G=(V,E)$ , a set of random variables $X=(X_{v})_{v\in V}$ indexed by $V$ form a Markov random field with respect to $G$ if they satisfy the local Markov properties:

Pairwise Markov property: Any two non-adjacent variables are conditionally independent given all other variables:

X_{u}\perp \!\!\!\perp X_{v}\mid X_{V\smallsetminus \{u,v\}}

Local Markov property: A variable is conditionally independent of all other variables given its neighbors:

X_{v}\perp \!\!\!\perp X_{V\smallsetminus \operatorname {N} [v]}\mid X_{\operatorname {N} (v)}

where $\operatorname {N} (v)$ is the set of neighbors of $v$, and $\operatorname {N} [v]=\{v\}\cup \operatorname {N} (v)$ is the closed neighbourhood of $v$.

Global Markov property: Any two subsets of variables are conditionally independent given a separating subset:

X_{A}\perp \!\!\!\perp X_{B}\mid X_{S}

where every path from a node in $A$ to a node in $B$ passes through $S$.
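The separation condition in the global Markov property is purely graph-theoretic, so it can be checked with a breadth-first search that is not allowed to enter the separating set. The following is a minimal sketch under the assumption that the graph is stored as an adjacency dictionary; the function name `separates` and the example graph are illustrative only.

```python
from collections import deque

def separates(adj, A, B, S):
    """Return True if every path from A to B in the undirected graph passes through S."""
    A, B, S = set(A), set(B), set(S)
    # Start from the nodes of A that are not themselves in S.
    frontier = deque(A - S)
    seen = set(frontier)
    while frontier:
        u = frontier.popleft()
        if u in B:
            return False  # reached B without crossing S
        for w in adj[u]:
            if w not in S and w not in seen:
                seen.add(w)
                frontier.append(w)
    return True

# Hypothetical example: the 4-cycle 0 - 1 - 2 - 3 - 0.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(separates(cycle, {0}, {2}, {1, 3}))  # both routes are blocked
print(separates(cycle, {0}, {2}, {1}))     # the route via node 3 stays open
```

In the 4-cycle, nodes 0 and 2 are separated by the set {1, 3} but not by {1} alone, so only the first conditioning set would license a conditional independence statement.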

The Global Markov property is stronger than the Local Markov property, which in turn is stronger than the Pairwise one.^[4] However, the above three Markov properties are equivalent for positive distributions^[5] (those that assign only nonzero probabilities to the associated variables).

The relation between the three Markov properties is particularly clear in the following formulation:

Pairwise: For any $i,j\in V$ not equal or adjacent, $X_{i}\perp \!\!\!\perp X_{j}|X_{V\smallsetminus \{i,j\}}$ .
Local: For any $i\in V$ and $J\subset V$ not containing or adjacent to $i$ , $X_{i}\perp \!\!\!\perp X_{J}|X_{V\smallsetminus (\{i\}\cup J)}$ .
Global: For any $I,J\subset V$ not intersecting or adjacent, $X_{I}\perp \!\!\!\perp X_{J}|X_{V\smallsetminus (I\cup J)}$ .

Clique factorization

As the Markov property of an arbitrary probability distribution can be difficult to establish, a commonly used class of Markov random fields consists of those that can be factorized according to the cliques of the graph.

Given a set of random variables $X=(X_{v})_{v\in V}$ , let $P(X=x)$ be the probability of a particular field configuration $x$ in $X$ —that is, $P(X=x)$ is the probability of finding that the random variables $X$ take on the particular value $x$ . Because $X$ is a set, the probability of $x$ should be understood to be taken with respect to a joint distribution of the $X_{v}$ .

If this joint density can be factorized over the cliques of $G$ as

P(X=x)=\prod _{C\in \operatorname {cl} (G)}\varphi _{C}(x_{C})

then $X$ forms a Markov random field with respect to $G$ . Here, $\operatorname {cl} (G)$ is the set of cliques of $G$ . The definition is equivalent if only maximal cliques are used. The functions $\varphi _{C}$ are sometimes referred to as factor potentials or clique potentials. Note, however, conflicting terminology is in use: the word potential is often applied to the logarithm of $\varphi _{C}$ . This is because, in statistical mechanics, $\log(\varphi _{C})$ has a direct interpretation as the potential energy of a configuration $x_{C}$ .
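As a small illustration of the factorization, the sketch below builds a joint distribution on the chain 0 - 1 - 2 from two pairwise clique potentials and checks numerically that $X_{0}$ and $X_{2}$ are conditionally independent given $X_{1}$. The potential tables are made-up values, and the product is normalized explicitly, which amounts to absorbing a constant into one of the potentials.

```python
import itertools

# Hypothetical pairwise potentials on the chain 0 - 1 - 2 (binary variables).
phi_01 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
phi_12 = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}

states = list(itertools.product([0, 1], repeat=3))
# Product of clique potentials, then explicit normalization.
unnorm = {x: phi_01[x[0], x[1]] * phi_12[x[1], x[2]] for x in states}
Z = sum(unnorm.values())
P = {x: p / Z for x, p in unnorm.items()}

def marginal(fixed):
    """Probability that the variables in `fixed` (index -> value) take the given values."""
    return sum(p for x, p in P.items() if all(x[i] == v for i, v in fixed.items()))

# Compare P(x0, x2 | x1) with P(x0 | x1) * P(x2 | x1) for every assignment.
max_gap = 0.0
for x0, x1, x2 in states:
    p1 = marginal({1: x1})
    joint = marginal({0: x0, 1: x1, 2: x2}) / p1
    prod = (marginal({0: x0, 1: x1}) / p1) * (marginal({1: x1, 2: x2}) / p1)
    max_gap = max(max_gap, abs(joint - prod))
print(max_gap)  # zero up to floating-point error
```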

Some MRFs do not factorize: a simple example can be constructed on a cycle of 4 nodes with some infinite energies, i.e. configurations of zero probability,^[6] even if one, more appropriately, allows the infinite energies to act on the complete graph on $V$.^[7]

MRFs factorize if at least one of the following conditions is fulfilled:

the density is positive (by the Hammersley–Clifford theorem)
the graph is chordal (by equivalence to a Bayesian network)

When such a factorization does exist, it is possible to construct a factor graph for the network.

Exponential family

Any positive Markov random field can be written as an exponential family in canonical form with feature functions $f_{k}$ such that the full joint distribution can be written as

P(X=x)={\frac {1}{Z}}\exp \left(\sum _{k}w_{k}^{\top }f_{k}(x_{\{k\}})\right)

where the notation

w_{k}^{\top }f_{k}(x_{\{k\}})=\sum _{i=1}^{N_{k}}w_{k,i}\cdot f_{k,i}(x_{\{k\}})

is simply a dot product over field configurations, and Z is the partition function:

Z=\sum _{x\in {\mathcal {X}}}\exp \left(\sum _{k}w_{k}^{\top }f_{k}(x_{\{k\}})\right).

Here, ${\mathcal {X}}$ denotes the set of all possible assignments of values to all the network's random variables. Usually, the feature functions $f_{k,i}$ are defined such that they are indicators of the clique's configuration, i.e. $f_{k,i}(x_{\{k\}})=1$ if $x_{\{k\}}$ corresponds to the $i$-th possible configuration of the $k$-th clique and 0 otherwise. This model is equivalent to the clique factorization model given above, if $N_{k}=|\operatorname {dom} (C_{k})|$ is the number of possible configurations of the clique, and the weight of a feature $f_{k,i}$ corresponds to the logarithm of the corresponding clique factor, i.e. $w_{k,i}=\log \varphi (c_{k,i})$, where $c_{k,i}$ is the $i$-th possible configuration of the $k$-th clique, i.e. the $i$-th value in the domain of the clique $C_{k}$.
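The equivalence with the clique factorization can be checked numerically on a toy model. The sketch below assumes two made-up positive potential tables on the chain 0 - 1 - 2, uses one indicator feature per clique configuration with weight $\log \varphi$, and compares the resulting exponential-family distribution with the normalized product of potentials.

```python
import itertools
import math

# Hypothetical positive clique potentials on the chain 0 - 1 - 2 (binary variables).
cliques = [(0, 1), (1, 2)]
phi = [
    {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0},
    {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0},
]

# One indicator feature per clique configuration; its weight is log(phi).
configs = [sorted(table) for table in phi]
weights = [[math.log(table[c]) for c in cfg] for table, cfg in zip(phi, configs)]

def score(x):
    """Sum over cliques of the dot product between weights and indicator features."""
    total = 0.0
    for clique, cfg, w in zip(cliques, configs, weights):
        x_k = tuple(x[v] for v in clique)
        features = [1.0 if x_k == c else 0.0 for c in cfg]
        total += sum(wi * fi for wi, fi in zip(w, features))
    return total

states = list(itertools.product([0, 1], repeat=3))
Z = sum(math.exp(score(x)) for x in states)
p_exp = {x: math.exp(score(x)) / Z for x in states}

# Normalized product of clique potentials, for comparison.
unnorm = {x: phi[0][x[0], x[1]] * phi[1][x[1], x[2]] for x in states}
Z_prod = sum(unnorm.values())
p_prod = {x: u / Z_prod for x, u in unnorm.items()}

max_diff = max(abs(p_exp[x] - p_prod[x]) for x in states)
print(max_diff)  # the two parameterizations agree up to rounding
```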

The probability $P$ is often called the Gibbs measure. This expression of a Markov field as a logistic model is only possible if all clique factors are non-zero, i.e. if none of the elements of ${\mathcal {X}}$ are assigned a probability of 0. This allows techniques from matrix algebra to be applied, e.g. that the log of the determinant of a matrix is the trace of its matrix logarithm, with the matrix representation of a graph arising from the graph's incidence matrix.

The importance of the partition function $Z$ is that many concepts from statistical mechanics, such as entropy, directly generalize to the case of Markov networks, and an intuitive understanding can thereby be gained. In addition, the partition function allows variational methods to be applied to the solution of the problem: one can attach a driving force to one or more of the random variables, and explore the reaction of the network in response to this perturbation. Thus, for example, one may add a driving term $J_{v}$, for each vertex $v$ of the graph, to the partition function to get:

Z[J]=\sum _{x\in {\mathcal {X}}}\exp \left(\sum _{k}w_{k}^{\top }f_{k}(x_{\{k\}})+\sum _{v}J_{v}x_{v}\right)

Formally differentiating with respect to $J_{v}$ gives the expectation value of the random variable $X_{v}$ associated with the vertex $v$:

E[X_{v}]={\frac {1}{Z}}\left.{\frac {\partial Z[J]}{\partial J_{v}}}\right|_{J_{v}=0}.

Correlation functions are computed likewise; the two-point correlation is:

C[X_{u},X_{v}]={\frac {1}{Z}}\left.{\frac {\partial ^{2}Z[J]}{\partial J_{u}\,\partial J_{v}}}\right|_{J_{u}=0,J_{v}=0}.
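The identity for the expectation value can be verified numerically on a model small enough to enumerate. The sketch below assumes a hypothetical three-node chain with spins in {-1, +1}, made-up coupling weights and a bias on node 0; it compares the directly computed expectation with a central finite difference of $Z[J]$ at $J=0$.

```python
import itertools
import math

# Hypothetical model: chain 0 - 1 - 2 with spins in {-1, +1}, coupling weights on
# the edges and a bias on node 0 (so that the expectations are not trivially zero).
edges = {(0, 1): 0.8, (1, 2): -0.5}
bias = {0: 0.3}
states = list(itertools.product([-1, 1], repeat=3))

def score(x):
    s = sum(w * x[u] * x[v] for (u, v), w in edges.items())
    return s + sum(b * x[v] for v, b in bias.items())

def Z(J):
    """Partition function with the driving term sum_v J_v x_v added to the exponent."""
    return sum(math.exp(score(x) + sum(j * xv for j, xv in zip(J, x))) for x in states)

def expectation_direct(v):
    z0 = Z([0.0, 0.0, 0.0])
    return sum(x[v] * math.exp(score(x)) for x in states) / z0

def expectation_from_Z(v, h=1e-5):
    """Central finite difference of Z[J] in J_v at J = 0, divided by Z."""
    plus = [0.0, 0.0, 0.0]
    minus = [0.0, 0.0, 0.0]
    plus[v], minus[v] = h, -h
    return (Z(plus) - Z(minus)) / (2 * h) / Z([0.0, 0.0, 0.0])

for v in range(3):
    print(v, expectation_direct(v), expectation_from_Z(v))
```

The two columns agree to within the finite-difference error, which is what the formula predicts; for larger networks the sum defining $Z$ has exponentially many terms and cannot be enumerated this way.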

Unfortunately, though the negative log-likelihood of a logistic Markov network is convex, evaluating the likelihood or gradient of the likelihood of a model requires inference in the model, which is generally computationally infeasible (see 'Inference' below).

Examples

Gaussian

A multivariate normal distribution forms a Markov random field with respect to a graph $G=(V,E)$ if the missing edges correspond to zeros on the precision matrix (the inverse covariance matrix):^[8]

X=(X_{v})_{v\in V}\sim {\mathcal {N}}({\boldsymbol {\mu }},\Sigma )

such that

(\Sigma ^{-1})_{uv}=0\quad {\text{iff}}\quad \{u,v\}\notin E.
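A short numerical check may make the role of the precision matrix concrete. The sketch below assumes a hypothetical precision matrix for the chain 0 - 1 - 2, whose (0, 2) entry is zero because the edge {0, 2} is absent. It shows that $X_{0}$ and $X_{2}$ are marginally correlated but have zero conditional covariance given $X_{1}$. The helper `inverse` is a plain Gauss-Jordan routine written for this example, not a library function.

```python
def inverse(m):
    """Gauss-Jordan inverse of a small square matrix given as a list of rows."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Hypothetical precision matrix for the chain 0 - 1 - 2 (no edge between 0 and 2).
precision = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]
cov = inverse(precision)

# Marginal covariance of X_0 and X_2: nonzero, the covariance matrix is dense.
marginal_cov_02 = cov[0][2]

# Conditional covariance of X_0 and X_2 given X_1 (Schur complement, scalar case).
cond_cov_02 = cov[0][2] - cov[0][1] * cov[1][2] / cov[1][1]

print(marginal_cov_02, cond_cov_02)
```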

Inference

As in a Bayesian network, one may calculate the conditional distribution of a set of nodes $V'=\{v_{1},\ldots ,v_{i}\}$ given values to another set of nodes $W'=\{w_{1},\ldots ,w_{j}\}$ in the Markov random field by summing over all possible assignments to $u\notin V',W'$; this is called exact inference.

For decomposable models (those whose graph is chordal), a closed form exists for the MLE, which makes it possible to discover a consistent structure for hundreds of variables.^[11]
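Computing a conditional distribution by summing over all assignments to the remaining nodes can be sketched directly for a tiny network. The example below assumes a hypothetical 4-cycle of binary variables with a made-up pairwise potential that favours agreement between neighbours; the function `conditional` is illustrative and enumerates all $2^{n}$ assignments, which is why this approach does not scale.

```python
import itertools

# Hypothetical pairwise model on the 4-cycle 0 - 1 - 2 - 3 - 0 (binary variables).
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

def phi(a, b):
    """Made-up edge potential that favours equal values on neighbouring nodes."""
    return 2.0 if a == b else 1.0

def unnormalized(x):
    p = 1.0
    for u, v in edges:
        p *= phi(x[u], x[v])
    return p

def conditional(query, evidence, n=4):
    """P(query | evidence) by brute-force summation; both are dicts node -> value."""
    num = den = 0.0
    for x in itertools.product([0, 1], repeat=n):
        if any(x[i] != val for i, val in evidence.items()):
            continue  # inconsistent with the observed values
        w = unnormalized(x)
        den += w
        if all(x[i] == val for i, val in query.items()):
            num += w
    return num / den

print(conditional({0: 1}, {2: 1}))
```

Because the normalizing constant cancels in the ratio, the partition function never has to be computed separately for a conditional query.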

Conditional random fields

One notable variant of a Markov random field is a conditional random field, in which each random variable may also be conditioned upon a set of global observations $o$ . In this model, each function $\varphi _{k}$ is a mapping from all assignments to both the clique k and the observations $o$ to the nonnegative real numbers. This form of the Markov network may be more appropriate for producing discriminative classifiers, which do not model the distribution over the observations. CRFs were proposed by John D. Lafferty, Andrew McCallum and Fernando C.N. Pereira in 2001.^[12]
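The dependence of the potentials on the observations can be illustrated with a toy two-label chain. Everything in the sketch below is a hypothetical choice made for illustration (the function name `crf_conditional`, the form of the node and edge scores, and the observation values); the point is only that the normalizer is computed over labelings for a fixed observation, so the observations themselves are never modeled.

```python
import itertools
import math

def crf_conditional(o, coupling=0.5):
    """P(y | o) for a two-label chain with binary labels; potentials are illustrative."""
    def score(y):
        # Node scores depend on the observation; the edge score does not.
        node = sum(o_i if y_i == 1 else -o_i for y_i, o_i in zip(y, o))
        edge = coupling if y[0] == y[1] else 0.0
        return node + edge

    labelings = list(itertools.product([0, 1], repeat=2))
    z_o = sum(math.exp(score(y)) for y in labelings)  # normalizer depends on o
    return {y: math.exp(score(y)) / z_o for y in labelings}

dist = crf_conditional([1.2, -0.4])
best = max(dist, key=dist.get)
print(best, sum(dist.values()))
```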

Varied applications

Markov random fields find application in a variety of fields and tasks, including super-resolution, stereo matching and information retrieval. They can be used to solve various computer vision problems that can be posed as energy minimization problems, or problems in which different regions have to be distinguished using a set of discriminating features within a Markov random field framework in order to predict the category of each region.^[16]

Markov random fields generalize the Ising model and have since been used widely in combinatorial optimization and networks.

References

[1] doi:10.1103/PhysRevLett.35.1792.

[2] MR 0620955. Archived from the original (PDF) on 2017-08-10. Retrieved 2012-04-09.

[3] ISBN 9781848002791.

[4] ISBN 978-0198522195.

[5] ISBN 9780262013192.

[6] S2CID 121299906.

[7] doi:10.2140/memocs.2016.4.407.

[8] ISBN 978-1-58488-432-3.

[9] S2CID 11312524.

[10] Duchi, John C.; Tarlow, Daniel; Elidan, Gal; Koller, Daphne (2006), "Using Combinatorial Optimization within Max-Product Belief Propagation", in Schölkopf, Bernhard; Platt, John C.; Hoffman, Thomas (eds.), Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 4-7, 2006, Advances in Neural Information Processing Systems, vol. 19, MIT Press, pp. 369–376.

[11] Petitjean, F.; Webb, G.I.; Nicholson, A.E. (2013). Scaling log-linear analysis to high-dimensional data (PDF). International Conference on Data Mining. Dallas, TX, USA: IEEE.

[12] "Two classic paper prizes for papers that appeared at ICML 2013". ICML. 2013. Retrieved 15 December 2014.

[13] ISBN 978-0-8218-5001-5.

[14] PMID 28145456.

[15] doi:10.1145/1076034.1076115.

[16] CiteSeerX 10.1.1.649.303.

