Energy-based model
An energy-based model (EBM) (also called Canonical Ensemble Learning (CEL) or Learning via Canonical Ensemble (LCE)) is an application of canonical ensemble formulation from statistical physics for learning from data.
EBMs provide a unified framework for many probabilistic and non-probabilistic approaches to such learning, particularly for training graphical and other structured models.[citation needed]
An EBM learns the characteristics of a target dataset and generates a similar but larger dataset. EBMs detect the latent variables of a dataset and generate new datasets with a similar distribution.
Energy-based generative neural networks
Boltzmann machines are a special form of energy-based models with a specific parametrization of the energy.[3]
Description
For a given input $x$, the model describes an energy $E_\theta(x)$ such that the Boltzmann distribution $P_\theta(x) = \exp(-\beta E_\theta(x)) / Z(\theta)$ is a probability (density) and typically $\beta = 1$.
Since the normalization constant $Z(\theta) = \int \exp(-\beta E_\theta(x)) \, dx$, also known as the partition function, depends on the Boltzmann factors of all possible inputs $x$, it cannot be easily computed or reliably estimated during training simply using standard maximum likelihood estimation.
However, for maximizing the likelihood during training, the gradient of the log-likelihood of a single training example $x$ is given by:

$$\partial_\theta \log P_\theta(x) = \mathbb{E}_{x' \sim P_\theta}\left[\partial_\theta E_\theta(x')\right] - \partial_\theta E_\theta(x)$$

The expectation in the above formula for the gradient can be approximately estimated by drawing samples $x'$ from the distribution $P_\theta$ using Markov chain Monte Carlo (MCMC).[4]
Early energy-based models, such as the 2003 Boltzmann machine by Hinton, estimated this expectation using block Gibbs sampling. Newer approaches make use of more efficient stochastic gradient Langevin dynamics (LD), drawing samples using

$$x_0' \sim P_0, \qquad x_{i+1}' = x_i' - \frac{\alpha}{2} \frac{\partial E_\theta(x_i')}{\partial x_i'} + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \alpha).$$

A replay buffer of past values $x_i'$ is used with LD to initialize the optimization module.
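As a concrete illustration, the following is a minimal sketch of a Langevin-dynamics sampler for a toy one-dimensional energy $E(x) = x^2/2$, whose Boltzmann density is the standard normal. The analytic gradient stands in for backpropagation through a neural energy network; all names and parameter values are illustrative assumptions.

```python
import math
import random

def grad_energy(x):
    # Analytic gradient of the toy energy E(x) = x^2 / 2 (standard normal),
    # standing in for backprop through a neural energy network.
    return x

def langevin_sample(x0, step=0.1, n_steps=1000, rng=random):
    """Draw one sample via (unadjusted) Langevin dynamics:
    x_{i+1} = x_i - (step/2) * dE/dx + eps,  eps ~ N(0, step)."""
    x = x0
    for _ in range(n_steps):
        noise = rng.gauss(0.0, math.sqrt(step))
        x = x - 0.5 * step * grad_energy(x) + noise
    return x

random.seed(0)
samples = [langevin_sample(random.uniform(-3, 3)) for _ in range(500)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The resulting sample mean and variance should approximate those of the target standard normal; in practice a replay buffer would seed `x0` with past chain states rather than fresh uniform draws.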
The parameters of the neural network are, therefore, trained in a generative manner by MCMC-based maximum likelihood estimation:[6] the learning process follows an "analysis by synthesis" scheme, where within each learning iteration, the algorithm samples the synthesized examples from the current model by a gradient-based MCMC method, e.g., Langevin dynamics or Hamiltonian Monte Carlo, and then updates the model parameters based on the difference between the training examples and the synthesized ones.
In the end, the model learns a function that associates low energies to correct values, and higher energies to incorrect values.
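A minimal sketch of this contrastive training loop, using a hypothetical toy energy $E_\theta(x) = (x - \theta)^2/2$: for this energy the model density is exactly $\mathcal{N}(\theta, 1)$, so negative samples can be drawn directly in place of MCMC. All names, the toy energy, and the exact-sampling shortcut are illustrative assumptions.

```python
import random

def grad_energy_theta(x, theta):
    # dE/dtheta for the toy energy E_theta(x) = (x - theta)^2 / 2
    return -(x - theta)

def model_sample(theta, rng):
    # For this toy energy the model density is exactly N(theta, 1),
    # so we sample directly instead of running MCMC.
    return rng.gauss(theta, 1.0)

rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(200)]  # training set centered at 2

theta, lr = 0.0, 0.05
for _ in range(2000):
    x_pos = rng.choice(data)              # observed training example
    x_neg = model_sample(theta, rng)      # synthesized example from the model
    # Log-likelihood gradient: E_{x'~P_theta}[dE/dtheta(x')] - dE/dtheta(x),
    # each expectation estimated with a single sample
    grad = grad_energy_theta(x_neg, theta) - grad_energy_theta(x_pos, theta)
    theta += lr * grad                    # gradient ascent on log-likelihood
```

The update pushes energy down on training examples and up on synthesized ones, so `theta` drifts toward the data mean near 2.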
After training, given a converged energy model $E_\theta$, the Metropolis–Hastings algorithm can be used to draw new samples. With a symmetric proposal, the acceptance probability is given by:

$$P_{\text{acc}}(x_i \to x^*) = \min\left(1, \frac{P_\theta(x^*)}{P_\theta(x_i)}\right) = \min\left(1, \exp\big(E_\theta(x_i) - E_\theta(x^*)\big)\right),$$

in which the partition function cancels, so only energy differences are needed.
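A minimal sketch of Metropolis–Hastings sampling from a trained energy function, again using a toy energy $E(x) = x^2/2$ and a symmetric Gaussian proposal (all names and parameter values are illustrative):

```python
import math
import random

def energy(x):
    # Toy energy E(x) = x^2 / 2; its Boltzmann density is the standard normal
    return 0.5 * x * x

def metropolis_hastings(n_samples, step=1.0, burn_in=500, seed=0):
    rng = random.Random(seed)
    x, out = 0.0, []
    for i in range(n_samples + burn_in):
        x_star = x + rng.gauss(0.0, step)   # symmetric Gaussian proposal
        # Acceptance ratio P(x*)/P(x) = exp(E(x) - E(x*)); Z(theta) cancels
        if rng.random() < math.exp(min(0.0, energy(x) - energy(x_star))):
            x = x_star
        if i >= burn_in:
            out.append(x)
    return out

samples = metropolis_hastings(5000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Because only energy differences enter the acceptance test, the intractable partition function never needs to be evaluated.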
History
The term "energy-based models" was first coined in a 2003 JMLR paper by Teh, Welling, Osindero and Hinton, in which the authors defined a generalization of independent component analysis to the overcomplete setting using EBMs.
Characteristics
EBMs demonstrate useful properties:
- Simplicity and stability–The EBM is the only object that needs to be designed and trained. Separate networks need not be trained to ensure balance.
- Adaptive computation time–An EBM can generate sharp, diverse samples or (more quickly) coarse, less diverse samples. Given infinite time, this procedure produces true samples.[7]
- Flexibility–In flow-based models, the generator learns a map from a continuous space to a (possibly) discontinuous space containing different data modes. EBMs can learn to assign low energies to disjoint regions (multiple modes).
- Adaptive generation–EBM generators are implicitly defined by the probability distribution, and automatically adapt as the distribution changes (without training), allowing EBMs to address domains where generator training is impractical, as well as minimizing mode collapse and avoiding spurious modes from out-of-distribution samples.[4]
- Compositionality–Individual models are unnormalized probability distributions, allowing models to be combined through product of experts or other hierarchical techniques.
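Compositionality can be illustrated directly: since each expert contributes an unnormalized density $\exp(-E)$, a product of experts corresponds to summing energies. A toy sketch with two hypothetical quadratic experts:

```python
def energy_a(x):
    # Expert A: prefers x near -1 (E = (x + 1)^2 / 2)
    return 0.5 * (x + 1.0) ** 2

def energy_b(x):
    # Expert B: prefers x near +1 (E = (x - 1)^2 / 2)
    return 0.5 * (x - 1.0) ** 2

def combined_energy(x):
    # Product of experts: multiplying unnormalized densities
    # exp(-E_a) * exp(-E_b) is the same as adding energies.
    return energy_a(x) + energy_b(x)

# The combined (unnormalized) density peaks where the summed energy is lowest;
# for two symmetric quadratic experts that is the midpoint x = 0.
xs = [i / 100.0 for i in range(-300, 301)]
best = min(xs, key=combined_energy)
```

No renormalization is needed to combine the experts, which is exactly what makes unnormalized energies convenient for hierarchical composition.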
Experimental results
On image datasets such as CIFAR-10 and ImageNet 32x32, an EBM model generated high-quality images relatively quickly. It supported combining features learned from one type of image for generating other types of images. It was able to generalize using out-of-distribution datasets, outperforming flow-based and autoregressive models. On classification, EBMs were relatively resistant to adversarial perturbations, behaving better than models explicitly trained against such perturbations.
Applications
Target applications include natural language processing, robotics and computer vision.
The first energy-based generative neural network is the generative ConvNet proposed in 2016 for image patterns, where the neural network is a convolutional neural network.[10][11] The model has been generalized to various domains to learn distributions of videos[7][2] and 3D voxels.[12] Subsequent variants have made them more effective.[13][14][15][16][17][18] They have proven useful for data generation (e.g., image synthesis, video synthesis,[7] 3D shape synthesis,[4] etc.), data recovery (e.g., recovering videos with missing pixels or image frames,[7] 3D super-resolution,[4] etc.), and data reconstruction (e.g., image reconstruction and linear interpolation[14]).
Alternatives
EBMs compete with techniques such as variational autoencoders (VAEs), generative adversarial networks (GANs), and normalizing flows.
Extensions
Joint energy-based models
![](http://upload.wikimedia.org/wikipedia/commons/thumb/1/10/Joint_Energy_Based_Model.png/220px-Joint_Energy_Based_Model.png)
Joint energy-based models (JEM), proposed in 2020 by Grathwohl et al., allow any classifier with softmax output to be interpreted as an energy-based model. The key observation is that such a classifier is trained to predict the conditional probability

$$p_\theta(y \mid x) = \frac{e^{f_\theta(x)[y]}}{\sum_{y'} e^{f_\theta(x)[y']}},$$

where $f_\theta(x)[y]$ is the $y$-th index of the logits $f_\theta(x)$ corresponding to class $y$. Without any change to the logits it was proposed to reinterpret the logits to describe a joint probability density:

$$p_\theta(x, y) = \frac{e^{f_\theta(x)[y]}}{Z(\theta)},$$

with unknown partition function $Z(\theta)$ and energy $E_\theta(x, y) = -f_\theta(x)[y]$. By marginalization, we obtain the unnormalized density

$$p_\theta(x) = \sum_y p_\theta(x, y) = \frac{\sum_y e^{f_\theta(x)[y]}}{Z(\theta)},$$

therefore

$$E_\theta(x) = -\log \sum_y e^{f_\theta(x)[y]},$$

so that any classifier can be used to define an energy function $E_\theta(x)$.
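The derivation above can be sketched numerically. The logits below are hypothetical values for a single input to a three-class classifier; note that the softmax predictions are unchanged by the energy reinterpretation.

```python
import math

def logsumexp(logits):
    # Numerically stable log(sum(exp(l))) over a list of logits
    m = max(logits)
    return m + math.log(sum(math.exp(l - m) for l in logits))

def jem_energy(logits):
    # E_theta(x) = -log sum_y exp(f_theta(x)[y]): lower energy means the
    # classifier assigns more total unnormalized mass to the input.
    return -logsumexp(logits)

def softmax(logits):
    z = logsumexp(logits)
    return [math.exp(l - z) for l in logits]

# Hypothetical logits f_theta(x) for one input from a 3-class classifier
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)        # classifier predictions, unchanged
e = jem_energy(logits)         # energy of the input under the JEM view
```

Here the unknown $Z(\theta)$ never appears: it cancels in the softmax and is simply dropped from the unnormalized marginal, which is why the classifier needs no modification.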
See also
- Empirical likelihood
- Posterior predictive distribution
- Contrastive learning
Literature
- Implicit Generation and Generalization in Energy-Based Models Yilun Du, Igor Mordatch https://arxiv.org/abs/1903.08689
- Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One, Will Grathwohl, Kuan-Chieh Wang, Jörn-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, Kevin Swersky https://arxiv.org/abs/1912.03263
References
- Bibcode:2016arXiv160203264X.
- ^ S2CID 201098397.
- ^ Learning Deep Architectures for AI, Yoshua Bengio, Page 54, https://books.google.de/books?id=cq5ewg7FniMC&pg=PA54
- ^ arXiv:1903.08689 [cs.LG].
- ^ Grathwohl, Will, et al. "Your classifier is secretly an energy based model and you should treat it like one." arXiv preprint arXiv:1912.03263 (2019).
- ^ Barbu, Adrian; Zhu, Song-Chun (2020). Monte Carlo Methods. Springer.
- ^ S2CID 763074.
- ISSN 2380-288X.
- ^ Teh, Yee Whye; Welling, Max; Osindero, Simon; Hinton, Geoffrey E. (December 2003). "Energy-Based Models for Sparse Overcomplete Representations". JMLR. 4 (Dec): 1235–1260.
- S2CID 14542261.
- ^ Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey (2012). "ImageNet classification with deep convolutional neural networks" (PDF). NIPS.
- S2CID 4564025.
- S2CID 4566195.
- ^ OCLC 1106340764.
- ISSN 2374-3468.
- S2CID 7759006.
- S2CID 9212174.
- S2CID 57189202.
External links
- "CIAR NCAP Summer School". www.cs.toronto.edu. Retrieved 2019-12-27.
- Dayan, Peter; Hinton, Geoffrey; Neal, Radford; Zemel, Richard S. (1999), "Helmholtz Machine", Unsupervised Learning, The MIT Press, ISBN 978-0-262-28803-3
- Hinton, Geoffrey E. (August 2002). "Training Products of Experts by Minimizing Contrastive Divergence". Neural Computation. 14 (8): 1771–1800. S2CID 207596505.
- Salakhutdinov, Ruslan; Hinton, Geoffrey (2009-04-15). "Deep Boltzmann Machines". Artificial Intelligence and Statistics: 448–455.