Double descent

Source: Wikipedia, the free encyclopedia.
An example of the double descent phenomenon in a two-layer neural network: When the ratio of parameters to data points is increased, the test error falls first, then rises, then falls again.[1] The vertical line marks the boundary between the underparameterized regime (more data points than parameters) and the overparameterized regime (more parameters than data points).

In statistics and machine learning, double descent is the phenomenon where a model with a small number of parameters and a model with an extremely large number of parameters both have a small test error, but a model whose number of parameters is about the same as the number of data points used to train the model will have a large error.[2]

History

Early observations of double descent in specific models date back to 1989.[3][4] The term "double descent" was coined by Belkin et al. in 2019,[5] when the phenomenon gained popularity as a broader concept exhibited by many models.[6] This development was prompted by a perceived contradiction between the conventional wisdom that too many parameters cause significant overfitting (an extrapolation of the bias-variance tradeoff),[7] and the empirical observations in the 2010s that some modern machine learning models tend to perform better with larger models.[5][8]

Theoretical models

Nakkiran (2019) shows that double descent occurs in linear regression with isotropic Gaussian covariates and isotropic Gaussian noise.[9]
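The effect can be reproduced numerically. The following minimal sketch (an illustration only; the dimension, noise level, sample sizes, and number of trials are arbitrary choices, not taken from the cited paper) fits minimum-norm least squares to data with isotropic Gaussian covariates and Gaussian noise and prints the test error as the number of training samples n passes through the number of parameters d; the error typically peaks near the interpolation threshold n ≈ d and falls again on either side.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 100                                   # number of parameters (features)
    beta = rng.normal(size=d) / np.sqrt(d)    # ground-truth weights
    sigma = 0.5                               # noise standard deviation

    def test_error(n, n_test=2000, trials=20):
        """Average squared test error of minimum-norm least squares trained on n samples."""
        errs = []
        for _ in range(trials):
            X = rng.normal(size=(n, d))                   # isotropic Gaussian covariates
            y = X @ beta + sigma * rng.normal(size=n)     # noisy targets
            w = np.linalg.pinv(X) @ y                     # minimum-norm least-squares fit
            X_test = rng.normal(size=(n_test, d))
            y_test = X_test @ beta + sigma * rng.normal(size=n_test)
            errs.append(np.mean((X_test @ w - y_test) ** 2))
        return np.mean(errs)

    # Test error rises sharply near n = d (the interpolation threshold), then falls again.
    for n in [20, 50, 80, 95, 100, 105, 120, 200, 400]:
        print(f"n = {n:4d}   test MSE = {test_error(n):.3f}")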

A model of double descent in the thermodynamic limit has been analyzed using the replica method, and the result has been confirmed numerically.[10]

Empirical examples

The scaling behavior of double descent has been found to follow a broken neural scaling law[11] functional form.
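A broken neural scaling law expresses the test metric as a smoothly broken power law in the quantity being scaled (such as model size, data, or compute). The sketch below assumes the product-of-breaks parameterization reported in the cited paper, with illustrative, unfitted parameter values; with two breaks whose exponents flip sign, the curve falls, rises, and falls again, which is the double-descent shape.

    import numpy as np

    def bnsl(x, a, b, c0, breaks):
        """Broken neural scaling law: a smoothly broken power law in x.

        breaks is a list of (d_i, c_i, f_i) tuples; each break shifts the local
        power-law exponent by c_i around location d_i with sharpness f_i.
        """
        y = b * x ** (-c0)
        for d_i, c_i, f_i in breaks:
            y = y * (1.0 + (x / d_i) ** (1.0 / f_i)) ** (-c_i * f_i)
        return a + y

    # Illustrative (not fitted) parameters: the effective exponent flips sign at the
    # first break and back at the second, so the curve falls, rises, then falls again.
    x = np.logspace(0, 4, 9)
    y = bnsl(x, a=0.1, b=1.0, c0=0.5, breaks=[(30.0, -1.0, 0.5), (300.0, 1.0, 0.5)])
    for xi, yi in zip(x, y):
        print(f"x = {xi:8.1f}   y = {yi:.3f}")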

References

  1. .
  2. "Deep Double Descent". OpenAI. 2019-12-05. Retrieved 2022-08-12.
  3. ISSN 0295-5075.
  4. .
  5. .
  6. .
  7. Wang, Eric (2023-01-10). "The bias-variance tradeoff is not a statistical concept". Eric J. Wang. Retrieved 2024-01-05.
  8. S2CID 207808916.
  9. Nakkiran, Preetum (2019-12-16). "More Data Can Hurt for Linear Regression: Sample-wise Double Descent". arXiv. Retrieved 2024-04-18.
  10. PMC 7685244.
  11. Caballero, Ethan; Gupta, Kshitij; Rish, Irina; Krueger, David (2022). "Broken Neural Scaling Laws". International Conference on Learning Representations (ICLR), 2023.
