Markov chain Monte Carlo
Part of a series on |
Bayesian statistics |
---|
Posterior = Likelihood × Prior ÷ Evidence |
Background |
Model building |
Posterior approximation |
Estimators |
|
Evidence approximation |
Model evaluation |
|
In statistics, Markov chain Monte Carlo (MCMC) is a class of algorithms used to draw samples from a probability distribution. Given a probability distribution, one can construct a Markov chain whose elements' distribution approximates it – that is, the Markov chain's equilibrium distribution matches the target distribution. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution.
Markov chain Monte Carlo methods are used to study probability distributions that are too complex or too highly
Applications
MCMC methods are primarily used for calculating numerical approximations of multi-dimensional integrals, for example in Bayesian statistics, computational physics,[1] computational biology[2] and computational linguistics.[3][4]
In Bayesian statistics, Markov chain Monte Carlo methods are typically used to calculate moments and credible intervals of posterior probability distributions. The use of MCMC methods makes it possible to compute large hierarchical models that require integrations over hundreds to thousands of unknown parameters.[5]
In rare event sampling, they are also used for generating samples that gradually populate the rare failure region.[citation needed]
General explanation
Markov chain Monte Carlo methods create samples from a continuous
Practically, an
Random walk Monte Carlo methods are a kind of random
These algorithms create
Reducing correlation
While MCMC methods were created to address multi-dimensional problems better than generic Monte Carlo algorithms, when the number of dimensions rises they too tend to suffer the curse of dimensionality: regions of higher probability tend to stretch and get lost in an increasing volume of space that contributes little to the integral. One way to address this problem could be shortening the steps of the walker, so that it does not continuously try to exit the highest probability region, though this way the process would be highly autocorrelated and expensive (i.e. many steps would be required for an accurate result). More sophisticated methods such as Hamiltonian Monte Carlo and the Wang and Landau algorithm use various ways of reducing this autocorrelation, while managing to keep the process in the regions that give a higher contribution to the integral. These algorithms usually rely on a more complicated theory and are harder to implement, but they usually converge faster.
Examples
Random walk
- Metropolis–Hastings algorithm: This method generates a Markov chain using a proposal density for new steps and a method for rejecting some of the proposed moves. It is actually a general framework which includes as special cases the very first and simpler MCMC (Metropolis algorithm) and many more recent alternatives listed below.
- conditional distribution given other coordinates. Gibbs sampling can be viewed as a special case of Metropolis–Hastings algorithm with acceptance rate uniformly equal to 1. When drawing from the full conditional distributions is not straightforward other samplers-within-Gibbs are used (e.g., see [7][8]). Gibbs sampling is popular partly because it does not require any 'tuning'. Algorithm structure of the Gibbs sampling highly resembles that of the coordinate ascent variational inference in that both algorithms utilize the full-conditional distributions in the updating procedure.[9]
- Metropolis-adjusted Langevin algorithm and other methods that rely on the gradient (and possibly second derivative) of the log target density to propose steps that are more likely to be in the direction of higher probability density.[10]
- Hamiltonian dynamics, so the potential energy function is the target density. The momentum samples are discarded after sampling. The result of hybrid Monte Carlo is that proposals move across the sample space in larger steps; they are therefore less correlated and converge to the target distribution more rapidly.
- Pseudo-marginal Metropolis–Hastings: This method replaces the evaluation of the density of the target distribution with an unbiased estimate and is useful when the target density is not available analytically, e.g. latent variable models.
- Slice sampling: This method depends on the principle that one can sample from a distribution by sampling uniformly from the region under the plot of its density function. It alternates uniform sampling in the vertical direction with uniform sampling from the horizontal 'slice' defined by the current vertical position.
- Multiple-try Metropolis: This method is a variation of the Metropolis–Hastings algorithm that allows multiple trials at each point. By making it possible to take larger steps at each iteration, it helps address the curse of dimensionality.
- nonparametric Bayesian models such as those involving the Dirichlet process or Chinese restaurant process, where the number of mixing components/clusters/etc. is automatically inferred from the data.
Interacting particle methods
Interacting MCMC methodologies are a class of
Quasi-Monte Carlo
The quasi-Monte Carlo method is an analog to the normal Monte Carlo method that uses low-discrepancy sequences instead of random numbers.[16][17] It yields an integration error that decays faster than that of true random sampling, as quantified by the Koksma–Hlawka inequality. Empirically it allows the reduction of both estimation error and convergence time by an order of magnitude.[citation needed] Markov chain quasi-Monte Carlo methods[18][19] such as the Array–RQMC method combine randomized quasi–Monte Carlo and Markov chain simulation by simulating chains simultaneously in a way that better approximates the true distribution of the chain than with ordinary MCMC.[20] In empirical experiments, the variance of the average of a function of the state sometimes converges at rate or even faster, instead of the Monte Carlo rate.[21]
Convergence
Usually it is not hard to construct a Markov chain with the desired properties. The more difficult problem is to determine how many steps are needed to converge to the stationary distribution within an acceptable error.[22] A good chain will have rapid mixing: the stationary distribution is reached quickly starting from an arbitrary position. A standard empirical method to assess convergence is to run several independent simulated Markov chains and check that the ratio of inter-chain to intra-chain variances for all the parameters sampled is close to 1.[22][23]
Typically, Markov chain Monte Carlo sampling can only approximate the target distribution, as there is always some residual effect of the starting position. More sophisticated Markov chain Monte Carlo-based algorithms such as
Many random walk Monte Carlo methods move around the equilibrium distribution in relatively small steps, with no tendency for the steps to proceed in the same direction. These methods are easy to implement and analyze, but unfortunately it can take a long time for the walker to explore all of the space. The walker will often double back and cover ground already covered.
Further consideration of convergence is at Markov chain central limit theorem. See [24] for a discussion of the theory related to convergence and stationarity of the Metropolis–Hastings algorithm.
Software
Several software programs provide MCMC sampling capabilities, for example:
- ParaMonte parallel Monte Carlo software available in multiple programming languages including C, C++, Fortran, MATLAB, and Python.
- Packages that use dialects of the BUGS model language:
- MCSim
- Julia language with packages like
- Turing.jl
- DynamicHMC.jl
- AffineInvariantMCMC.jl
- Gen.jl
- and the ones in StanJulia repository.
- Python (programming language) with the packages:
- R (programming language) with the packages adaptMCMC, atmcmc, BRugs, mcmc, MCMCpack, ramcmc, rjags, rstan, etc.
- Stan
- TensorFlow Probability (probabilistic programming library built on TensorFlow)
- Korali high-performance framework for Bayesian UQ, optimization, and reinforcement learning.
- MacMCMC — Full-featured application (freeware) for MacOS, with advanced functionality, available at causaScientia
See also
- Coupling from the past
- Integrated nested Laplace approximations
- Markov chain central limit theorem
- Metropolis-adjusted Langevin algorithm
References
Citations
- S2CID 170078861.
- PMID 27429455.
- ^ See Gill 2008.
- ^ See Robert & Casella 2004.
- ISBN 978-1-4398-1917-3.
- S2CID 5837272.
- JSTOR 2347565.
- JSTOR 2986138.
- S2CID 220935477.
- ^ See Stramer 1999.
- ^ See Green 1995.
- ^ Del Moral, Pierre (2013). Mean field simulation for Monte Carlo integration. Chapman & Hall/CRC Press. p. 626.
- ^ Del Moral, Pierre (2004). Feynman–Kac formulae. Genealogical and interacting particle approximations. Springer. p. 575.
- ISBN 978-3-540-67314-9.
- S2CID 12074789.
- ^ Papageorgiou, Anargyros; Traub, J. F. (1996). "Beating Monte Carlo". Risk. 9 (6): 63–65.
- .
- .
- ProQuest 304808879.
- .
- .
- ^ .
- .
- S2CID 58672766.
- S2CID 88518555.
Sources
- Christophe Andrieu, Nando De Freitas, Arnaud Doucet and Michael I. Jordan An Introduction to MCMC for Machine Learning, 2003
- Asmussen, Søren; Glynn, Peter W. (2007). Stochastic Simulation: Algorithms and Analysis. Stochastic Modelling and Applied Probability. Vol. 57. Springer.
- Atzberger, P. "An Introduction to Monte-Carlo Methods" (PDF).
- Berg, Bernd A. (2004). Markov Chain Monte Carlo Simulations and Their Statistical Analysis. World Scientific.
- Bolstad, William M. (2010). Understanding Computational Bayesian Statistics. Wiley. ISBN 978-0-470-04609-8.
- Carlin, Brad; Chib, Siddhartha (1995). "Bayesian Model Choice via Markov Chain Monte Carlo Methods". Journal of the Royal Statistical Society, Series B, 57(3), 473–484.
- Casella, George; George, Edward I. (1992). "Explaining the Gibbs sampler". JSTOR 2685208.
- JSTOR 2684568.
- Gelfand, A.E.; Smith, A.F.M. (1990). "Sampling-Based Approaches to Calculating Marginal Densities". .
- Chapman and Hall. (See Chapter 11.)
- Geman, S.; S2CID 5837272.
- Gilks, W.R.; Richardson, S.; Chapman and Hall/CRC.
- Gill, Jeff (2008). Bayesian methods: a social and behavioral sciences approach (2nd ed.). ISBN 978-1-58488-562-7.
- Green, P.J. (1995). "Reversible-jump Markov chain Monte Carlo computation and Bayesian model determination". .
- Neal, Radford M. (2003). "Slice Sampling". JSTOR 3448413.
- Neal, Radford M. (1993). "Probabilistic Inference Using Markov Chain Monte Carlo Methods".
- Robert, Christian P.; Casella, G. (2004). Monte Carlo Statistical Methods (2nd ed.). Springer. ISBN 978-0-387-21239-5.
- Rubinstein, R.Y.; ISBN 978-0-470-17794-5.
- Smith, R.L. (1984). "Efficient Monte Carlo Procedures for Generating Points Uniformly Distributed Over Bounded Regions". hdl:2027.42/7681.
- Spall, J.C. (April 2003). "Estimation via Markov Chain Monte Carlo". .
- Stramer, O.; Tweedie, R. (1999). "Langevin-Type Models II: Self-Targeting Candidates for MCMC Algorithms". Methodology and Computing in Applied Probability. 1 (3): 307–328. S2CID 1512689.
Further reading
- . S 0273-0979(08)01238-X.
- ISBN 978-0-521-88068-8
- Richey, Matthew (May 2010). "The Evolution of Markov Chain Monte Carlo Methods" (PDF). S2CID 13630404.