Markov random field
In the domain of
A Markov network or MRF is similar to a Bayesian network in its representation of dependencies; the differences being that Bayesian networks are directed and acyclic, whereas Markov networks are undirected and may be cyclic. Thus, a Markov network can represent certain dependencies that a Bayesian network cannot (such as cyclic dependencies [further explanation needed]); on the other hand, it can't represent certain dependencies that a Bayesian network can (such as induced dependencies [further explanation needed]). The underlying graph of a Markov random field may be finite or infinite.
When the
Definition
Given an undirected graph , a set of random variables indexed by form a Markov random field with respect to if they satisfy the local Markov properties:
- Pairwise Markov property: Any two non-adjacent variables are conditionally independent given all other variables:
- Local Markov property: A variable is conditionally independent of all other variables given its neighbors:
- where is the set of neighbors of , and is the closed neighbourhoodof .
- Global Markov property: Any two subsets of variables are conditionally independent given a separating subset:
- where every path from a node in to a node in passes through .
The Global Markov property is stronger than the Local Markov property, which in turn is stronger than the Pairwise one.[4] However, the above three Markov properties are equivalent for positive distributions[5] (those that assign only nonzero probabilities to the associated variables).
The relation between the three Markov properties is particularly clear in the following formulation:
- Pairwise: For any not equal or adjacent, .
- Local: For any and not containing or adjacent to , .
- Global: For any not intersecting or adjacent, .
Clique factorization
As the Markov property of an arbitrary probability distribution can be difficult to establish, a commonly used class of Markov random fields are those that can be factorized according to the cliques of the graph.
Given a set of random variables , let be the probability of a particular field configuration in —that is, is the probability of finding that the random variables take on the particular value . Because is a set, the probability of should be understood to be taken with respect to a joint distribution of the .
If this joint density can be factorized over the cliques of as
then forms a Markov random field with respect to . Here, is the set of cliques of . The definition is equivalent if only maximal cliques are used. The functions are sometimes referred to as factor potentials or clique potentials. Note, however, conflicting terminology is in use: the word potential is often applied to the logarithm of . This is because, in statistical mechanics, has a direct interpretation as the potential energy of a configuration .
Some MRF's do not factorize: a simple example can be constructed on a cycle of 4 nodes with some infinite energies, i.e. configurations of zero probabilities,[6] even if one, more appropriately, allows the infinite energies to act on the complete graph on .[7]
MRF's factorize if at least one of the following conditions is fulfilled:
- the density is positive (by the Hammersley–Clifford theorem)
- the graph is chordal (by equivalence to a Bayesian network)
When such a factorization does exist, it is possible to construct a factor graph for the network.
Exponential family
Any positive Markov random field can be written as exponential family in canonical form with feature functions such that the full-joint distribution can be written as
where the notation
is simply a dot product over field configurations, and Z is the partition function:
Here, denotes the set of all possible assignments of values to all the network's random variables. Usually, the feature functions are defined such that they are indicators of the clique's configuration, i.e. if corresponds to the i-th possible configuration of the k-th clique and 0 otherwise. This model is equivalent to the clique factorization model given above, if is the cardinality of the clique, and the weight of a feature corresponds to the logarithm of the corresponding clique factor, i.e. , where is the i-th possible configuration of the k-th clique, i.e. the i-th value in the domain of the clique .
The probability P is often called the Gibbs measure. This expression of a Markov field as a logistic model is only possible if all clique factors are non-zero, i.e. if none of the elements of are assigned a probability of 0. This allows techniques from matrix algebra to be applied, e.g. that the trace of a matrix is log of the determinant, with the matrix representation of a graph arising from the graph's incidence matrix.
The importance of the partition function Z is that many concepts from
Formally differentiating with respect to Jv gives the
Correlation functions are computed likewise; the two-point correlation is:
Unfortunately, though the likelihood of a logistic Markov network is convex, evaluating the likelihood or gradient of the likelihood of a model requires inference in the model, which is generally computationally infeasible (see 'Inference' below).
Examples
Gaussian
A multivariate normal distribution forms a Markov random field with respect to a graph if the missing edges correspond to zeros on the
such that
Inference
As in a
Conditional random fields
One notable variant of a Markov random field is a conditional random field, in which each random variable may also be conditioned upon a set of global observations . In this model, each function is a mapping from all assignments to both the clique k and the observations to the nonnegative real numbers. This form of the Markov network may be more appropriate for producing discriminative classifiers, which do not model the distribution over the observations. CRFs were proposed by John D. Lafferty, Andrew McCallum and Fernando C.N. Pereira in 2001.[12]
Varied applications
Markov random fields find application in a variety of fields, ranging from
See also
References
- MR 0620955. Archived from the original(PDF) on 2017-08-10. Retrieved 2012-04-09.
- ISBN 9781848002791.
- ISBN 978-0198522195.
- ISBN 9780262013192.
- S2CID 121299906.
- .
- ISBN 978-1-58488-432-3.
- S2CID 11312524.
- ^ Duchi, John C.; Tarlow, Daniel; Elidan, Gal; Koller, Daphne (2006), "Using Combinatorial Optimization within Max-Product Belief Propagation", in Schölkopf, Bernhard; Platt, John C.; Hoffman, Thomas (eds.), Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 4-7, 2006, Advances in Neural Information Processing Systems, vol. 19, MIT Press, pp. 369–376.
- ^ Petitjean, F.; Webb, G.I.; Nicholson, A.E. (2013). Scaling log-linear analysis to high-dimensional data (PDF). International Conference on Data Mining. Dallas, TX, USA: IEEE.
- ^ "Two classic paper prizes for papers that appeared at ICML 2013". ICML. 2013. Retrieved 15 December 2014.
- ISBN 978-0-8218-5001-5.
- PMID 28145456.
- .
- CiteSeerX 10.1.1.649.303.