Self-supervised learning
Self-supervised learning (SSL) is a paradigm in machine learning where a model is trained on a task using the data itself to generate supervisory signals, rather than relying on external labels provided by humans. In the context of neural networks, self-supervised learning aims to leverage inherent structures or relationships within the input data to create meaningful training signals. SSL tasks are designed so that solving them requires capturing essential features or relationships in the data. The input data is typically augmented or transformed in a way that creates pairs of related samples: one sample serves as the input, and the other is used to formulate the supervisory signal. This augmentation can involve introducing noise, cropping, rotation, or other transformations. Self-supervised learning more closely imitates the way humans learn to classify objects.[1]
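As an illustration, the pair-creation step can be sketched as follows. This is a minimal sketch using torchvision image augmentations; the specific transformations and parameters are chosen for illustration and are not prescribed by any particular method.

```python
from torchvision import transforms

# Illustrative augmentation pipeline: each call applies a different random
# transformation, so two calls on the same image yield a related pair.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.4, contrast=0.4),
    transforms.ToTensor(),
])

def make_pair(image):
    """Return two independently augmented views of one image: one serves as
    the model input, the other defines the supervisory signal."""
    return augment(image), augment(image)
```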
The typical SSL method is based on an artificial neural network or other model such as a decision list.[2] The model learns in two steps. First, the task is solved based on an auxiliary or pretext classification task using pseudo-labels, which help to initialize the model parameters.[3][4] Second, the actual task is performed with supervised or unsupervised learning.[5][6][7]
Self-supervised learning has produced promising results in recent years and has found practical application in audio processing; it is used by Facebook and others for speech recognition.[8]
Types
Autoassociative self-supervised learning
Autoassociative self-supervised learning is a specific category of self-supervised learning where a neural network is trained to reproduce or reconstruct its own input data.[9] In other words, the model is tasked with learning a representation of the data that captures its essential features or structure, allowing it to regenerate the original input.
The term "autoassociative" comes from the fact that the model is essentially associating the input data with itself. This is often achieved using
The training process involves presenting the model with input data and requiring it to reconstruct the same data as closely as possible. The loss function used during training typically penalizes the difference between the original input and the reconstructed output. By minimizing this reconstruction error, the autoencoder learns a meaningful representation of the data in its latent space.
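A minimal sketch of such a training step, assuming a fully connected encoder and decoder and a mean-squared-error reconstruction loss (the architecture, dimensions, and data are illustrative):

```python
import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder maps the input to a lower-dimensional latent code.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder reconstructs the input from the latent code.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()        # penalizes input/reconstruction differences

x = torch.randn(64, 784)      # stand-in batch; note that no labels appear
loss = loss_fn(model(x), x)   # the input itself is the target
loss.backward()
optimizer.step()
```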
Contrastive self-supervised learning
For a binary classification task, training data can be divided into positive examples and negative examples. Positive examples are those that match the target; negative examples are those that do not. For instance, if a model is being trained to identify birds, the positive training data are those pictures that contain birds. Contrastive self-supervised learning uses both positive and negative examples. The loss function in contrastive learning minimizes the distance between positive sample pairs while maximizing the distance between negative sample pairs.
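One common instantiation of such a loss is the InfoNCE objective (the NT-Xent loss popularized by SimCLR). The following is a simplified sketch in which each row of one view's embeddings is the positive for the corresponding row of the other view and all remaining rows act as negatives; the temperature value is illustrative.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Simplified InfoNCE loss: z1[i] and z2[i] form a positive pair, and
    every other row of z2 serves as a negative for z1[i]."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature    # scaled cosine similarities
    targets = torch.arange(z1.size(0))  # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Usage: embeddings of two augmented views produced by the same encoder.
z1, z2 = torch.randn(64, 128), torch.randn(64, 128)
loss = info_nce(z1, z2)  # low when pairs align and non-pairs are pushed apart
```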
Non-contrastive self-supervised learning
Non-contrastive self-supervised learning (NCSSL) uses only positive examples. Counterintuitively, NCSSL converges on a useful local minimum rather than collapsing to a trivial solution with zero loss; in the binary classification example, such a collapse would amount to trivially classifying every example as positive. Effective NCSSL requires an extra predictor on the online side that does not back-propagate on the target side.[10]
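A minimal sketch of this asymmetry in the style of BYOL, with single linear layers as stand-in networks: the online branch carries the extra predictor, while the target branch is evaluated without gradients.

```python
import torch
from torch import nn
import torch.nn.functional as F

encoder = nn.Linear(784, 128)         # online encoder (stand-in network)
predictor = nn.Linear(128, 128)       # extra predictor on the online side
target_encoder = nn.Linear(784, 128)  # target network; in BYOL this is an
                                      # exponential moving average of encoder

x1, x2 = torch.randn(64, 784), torch.randn(64, 784)  # two augmented views

online = predictor(encoder(x1))
with torch.no_grad():                 # no back-propagation on the target side
    target = target_encoder(x2)

# Negative cosine similarity pulls the prediction toward the target view.
loss = -F.cosine_similarity(online, target, dim=1).mean()
loss.backward()                       # gradients reach only encoder + predictor
```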
Comparison with other forms of machine learning
SSL belongs to supervised learning methods insofar as the goal is to generate a classified output from the input. At the same time, however, it does not require the explicit use of labeled input-output pairs. Instead, correlations, metadata embedded in the data, or domain knowledge present in the input are implicitly and autonomously extracted from the data. These supervisory signals, generated from the data, can then be used for training.[1]
SSL is similar to unsupervised learning in that it does not require labels in the sample data. Unlike unsupervised learning, however, learning is not driven by inherent data structures alone: SSL derives explicit supervisory signals from the data.
In transfer learning a model designed for one task is reused on a different task.[11]
Training an autoencoder intrinsically constitutes a self-supervised process, because the output pattern needs to become an optimal reconstruction of the input pattern itself. However, in current jargon, the term 'self-supervised' has become associated with classification tasks that are based on a pretext-task training setup. This involves the (human) design of such pretext task(s), unlike the case of fully self-contained autoencoder training.[9]
In reinforcement learning, self-supervised learning from a combination of losses can create abstract representations in which only the most important information about the state is kept in a compressed form.[12]
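A sketch of such a combined objective, assuming (illustratively) an encoder that compresses observations into a small latent state plus auxiliary heads that predict the reward and the next latent state. The specific heads and losses here are assumptions for illustration, not the cited method's exact design.

```python
import torch
from torch import nn
import torch.nn.functional as F

encoder = nn.Linear(100, 8)        # compresses observations to a small state
reward_head = nn.Linear(8 + 1, 1)  # predicts reward from state and action
dynamics = nn.Linear(8 + 1, 8)     # predicts the next latent state

obs, next_obs = torch.randn(32, 100), torch.randn(32, 100)
action = torch.randn(32, 1)
reward = torch.randn(32, 1)

s, s_next = encoder(obs), encoder(next_obs)
sa = torch.cat([s, action], dim=1)

# Combination of self-supervised losses: every term is derived from the
# agent's own experience rather than from external labels.
loss = (F.mse_loss(reward_head(sa), reward)           # reward prediction
        + F.mse_loss(dynamics(sa), s_next.detach()))  # transition prediction
loss.backward()
```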
Examples
Self-supervised learning is particularly suitable for speech recognition. For example, Facebook developed wav2vec, a self-supervised algorithm, to perform speech recognition using two deep convolutional neural networks that build on each other.[8]
Google's Bidirectional Encoder Representations from Transformers (BERT) model is used to better understand the context of search queries.[13]
OpenAI's GPT-3 is an autoregressive language model that can be used in language processing. It can be used to translate texts or answer questions, among other things.[14]
Bootstrap Your Own Latent (BYOL) is an NCSSL method that produced excellent results on ImageNet and on transfer and semi-supervised benchmarks.[15]
The Yarowsky algorithm is an example of self-supervised learning in natural language processing. From a small number of labeled examples, it learns to predict which word sense of a polysemous word is being used at a given point in the text.[2]
DirectPred is an NCSSL method that directly sets the predictor weights instead of learning them via gradient updates.[10]
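A heavily simplified sketch of this idea: the predictor's weight matrix is computed from an eigendecomposition of an estimate of the correlation matrix of the online representations, rather than learned by gradient descent. The exact spectral rescaling and regularization below only loosely follow the published method and should be read as assumptions.

```python
import torch
from torch import nn

def set_predictor_weights(predictor, corr, eps=0.01):
    """Assign the predictor's weight matrix directly from the (symmetric)
    correlation matrix of online representations, instead of learning it."""
    eigvals, eigvecs = torch.linalg.eigh(corr)
    eigvals = eigvals.clamp(min=0.0)
    # Rescale the spectrum (square root plus a small regularizer) and
    # rebuild the weight matrix in the same eigenbasis.
    p = eigvals.sqrt() + eps * eigvals.max().sqrt()
    with torch.no_grad():
        predictor.weight.copy_(eigvecs @ torch.diag(p) @ eigvecs.T)

predictor = nn.Linear(128, 128, bias=False)
z = torch.randn(64, 128)        # stand-in online representations
corr = (z.T @ z) / z.size(0)    # (in practice, a moving-average estimate)
set_predictor_weights(predictor, corr)
```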
Self-GenomeNet is an example of self-supervised learning in genomics.[16]
References
- ^ a b Bouchard, Louis (25 November 2020). "What is Self-Supervised Learning? | Will machines ever be able to learn like humans?". Medium. Retrieved 9 June 2021.
- ^ Yarowsky, David (1995). "Unsupervised Word Sense Disambiguation Rivaling Supervised Methods". Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics. Cambridge, MA: Association for Computational Linguistics: 189–196. Retrieved 1 November 2022.
- ^ Doersch, Carl; Zisserman, Andrew (October 2017). "Multi-task Self-Supervised Visual Learning". 2017 IEEE International Conference on Computer Vision (ICCV). pp. 2070–2079. S2CID 473729.
- ^ S2CID 167209887.
- ^ Doersch, Carl; Gupta, Abhinav; Efros, Alexei A. (December 2015). "Unsupervised Visual Representation Learning by Context Prediction". 2015 IEEE International Conference on Computer Vision (ICCV). pp. 1422–1430. S2CID 9062671.
- ^ Zheng, Xin; Wang, Yong; Wang, Guoyou; Liu, Jianguo (1 April 2018). "Fast and robust segmentation of white blood cell images by self-supervised learning". Micron. 107: 55–71. S2CID 3796689.
- S2CID 186206588.
- ^ a b "Wav2vec: State-of-the-art speech recognition through self-supervision". ai.facebook.com. Retrieved 9 June 2021.
- ^ .
- ^ a b c d "Demystifying a key self-supervised learning technique: Non-contrastive learning". ai.facebook.com. Retrieved 5 October 2021.
- S2CID 6517610.
- arXiv:1809.04506.
- ^ "Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing". Google AI Blog. 2 November 2018. Retrieved 9 June 2021.
- S2CID 222291675.
- arXiv:2006.07733 [cs.LG].
- PMC 10495322.
Further reading
- Balestriero, Randall; Ibrahim, Mark; Sobal, Vlad; Morcos, Ari; Shekhar, Shashank; Goldstein, Tom; Bordes, Florian; Bardes, Adrien; Mialon, Gregoire; Tian, Yuandong; Schwarzschild, Avi; Wilson, Andrew Gordon; Geiping, Jonas; Garrido, Quentin; Fernandez, Pierre (24 April 2023). "A Cookbook of Self-Supervised Learning". arXiv:2304.12210 [cs.LG].