Double descent

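In statistics and machine learning, double descent is the phenomenon in which a model with a small number of parameters and a model with an extremely large number of parameters both have a small error, but a model whose number of parameters is about the same as the number of data points used to train the model will have a large error.^[2]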

History

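Early observations of double descent in specific models date back to 1989, while double descent as a broader phenomenon shared by many models gained popularity around 2019. The latter development was prompted by a perceived contradiction between the conventional wisdom that too many parameters in a model result in significant overfitting error (an extrapolation of the bias-variance tradeoff),^[7] and the empirical observations in the 2010s that some modern machine learning models tend to perform better as they are made larger.^[5]^[8]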

Theoretical models

Nakkiran^[9] shows that double descent occurs in linear regression with isotropic Gaussian covariates and isotropic Gaussian noise.
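
The linear regression setting can be simulated in a few lines. The following is a minimal sketch, not the cited paper's own experiment: it assumes a minimum-norm least-squares estimator (computed with the pseudoinverse), a unit-norm true signal, and a noise standard deviation of 0.5. The function names `min_norm_risk` and `sweep` and all parameter values are illustrative choices.

```python
import numpy as np


def min_norm_risk(n_train, d, noise_std=0.5, trials=20, seed=0):
    """Average excess test risk of minimum-norm least squares.

    Covariates are isotropic Gaussian, x ~ N(0, I_d), and labels are
    y = <x, beta> + noise, with Gaussian noise of standard deviation noise_std.
    """
    rng = np.random.default_rng(seed)
    risks = []
    for _ in range(trials):
        beta = rng.standard_normal(d)
        beta /= np.linalg.norm(beta)          # true signal with unit norm
        X = rng.standard_normal((n_train, d))
        y = X @ beta + noise_std * rng.standard_normal(n_train)
        beta_hat = np.linalg.pinv(X) @ y      # minimum-norm least-squares fit
        # For a fresh x ~ N(0, I_d), the expected squared prediction error
        # (excluding label noise) equals ||beta_hat - beta||^2.
        risks.append(float(np.sum((beta_hat - beta) ** 2)))
    return float(np.mean(risks))


def sweep(d=30, sample_sizes=(5, 15, 30, 60, 120)):
    """Excess risk for several training-set sizes at a fixed dimension d."""
    return {n: min_norm_risk(n, d) for n in sample_sizes}


if __name__ == "__main__":
    for n, risk in sweep().items():
        print(f"n = {n:4d}  excess risk = {risk:.3f}")
```

Under these assumptions, the risk should peak when the number of training points is close to the number of parameters (here n = d = 30) and fall off on either side, which is the shape described above.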

A model of double descent in the thermodynamic limit has been analyzed using the replica method, and the result has been confirmed numerically.^[10]

Empirical examples

The scaling behavior of double descent has been found to follow the functional form of a broken neural scaling law.^[11]
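
As a rough illustration, the broken neural scaling law of the cited paper^[11] is usually written as y = a + b·x^(−c0) · ∏ᵢ (1 + (x/dᵢ)^(1/fᵢ))^(−cᵢ·fᵢ), where x is the quantity being scaled and each break i has a location dᵢ, a change of slope cᵢ, and a sharpness fᵢ. The sketch below evaluates that form; the function name `bnsl` and the parameter values are hypothetical, chosen only to produce a descent, an ascent, and a second descent, and are not fitted to any data.

```python
import numpy as np


def bnsl(x, a, b, c0, breaks):
    """Evaluate y = a + b * x**(-c0) * prod_i (1 + (x / d_i)**(1 / f_i))**(-c_i * f_i).

    breaks is a sequence of (c_i, d_i, f_i) tuples, one per break.
    """
    x = np.asarray(x, dtype=float)
    y = b * x ** (-c0)
    for c_i, d_i, f_i in breaks:
        y = y * (1.0 + (x / d_i) ** (1.0 / f_i)) ** (-c_i * f_i)
    return a + y


if __name__ == "__main__":
    # Hypothetical parameters: the log-log slope is -0.5 before x = 1e2,
    # +0.5 between 1e2 and 1e3, and -1.0 after 1e3.
    breaks = [(-1.0, 1e2, 0.1), (1.5, 1e3, 0.1)]
    xs = np.logspace(1, 5, 9)
    for x, y in zip(xs, bnsl(xs, a=0.0, b=1.0, c0=0.5, breaks=breaks)):
        print(f"x = {x:10.1f}  y = {y:.5f}")
```

A negative cᵢ makes the curve turn upward after the corresponding break, which is how this form can represent the non-monotonic shape of double descent.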

References

[1] arXiv:2303.14151v1 [cs.LG].

[2] "Deep Double Descent". OpenAI. 2019-12-05. Retrieved 2022-08-12.

[3] ISSN 0295-5075.

[4] PMID 32371495.

[5] PMID 31341078.

[6] ISSN 0162-8828.

[7] Eric (2023-01-10). "The bias-variance tradeoff is not a statistical concept". Eric J. Wang. Retrieved 2024-01-05.

[8] S2CID 207808916.

[9] Nakkiran, Preetum (2019-12-16). "More Data Can Hurt for Linear Regression: Sample-wise Double Descent". arXiv.org. Retrieved 2024-04-18.

[10] PMC 7685244.

[11] Caballero, Ethan; Gupta, Kshitij; Rish, Irina; Krueger, David (2022). "Broken Neural Scaling Laws". International Conference on Learning Representations (ICLR), 2023.

Further reading

Mikhail Belkin; Daniel Hsu; Ji Xu (2020). "Two Models of Double Descent for Weak Features". doi:10.1137/20M1336072.

Mount, John (3 April 2024). "The m = n Machine Learning Anomaly".

Preetum Nakkiran; Gal Kaplun; Yamini Bansal; Tristan Yang; Boaz Barak; Ilya Sutskever (29 December 2021). "Deep double descent: where bigger models and more data hurt". S2CID 207808916.

Song Mei; Andrea Montanari (April 2022). "The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve". S2CID 199668852.

Xiangyu Chang; Yingcong Li; Samet Oymak; Christos Thrampoulidis (2021). "Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks". Proceedings of the AAAI Conference on Artificial Intelligence. 35 (8). arXiv:2012.08749.

External links

Brent Werness; Jared Wilber. "Double Descent: Part 1: A Visual Introduction".

Brent Werness; Jared Wilber. "Double Descent: Part 2: A Mathematical Explanation".

Understanding "Deep Double Descent" at evhub.
