Grokking (machine learning)

In machine learning, grokking, or delayed generalization, is a phenomenon in which a neural network begins to generalize to unseen data abruptly, many training iterations after it has already fit (effectively memorized) its training data, rather than generalizing gradually as training progresses.[1][2][3][4]
Transition
Grokking was introduced in January 2022 by OpenAI researchers investigating how neural networks perform calculations. The term is derived from the word grok, coined by Robert Heinlein in his novel Stranger in a Strange Land.[1]
Grokking can be understood as a phase transition during the training process.[5] Although grokking has been thought of as largely a phenomenon of relatively shallow models, it has also been observed in deep neural networks and in non-neural models, and it is the subject of active research.[6][7][8][9]
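The phenomenon can be reproduced in small-scale experiments. The following is a minimal sketch, in PyTorch, of the kind of setup in which grokking was first reported (learning modular addition from half of the addition table); the architecture and hyperparameters here are illustrative assumptions, not those of the original study:

```python
import torch
import torch.nn as nn

P = 97  # modulus for the a + b (mod P) task
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))  # all (a, b)
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
split = len(pairs) // 2           # train on half the table, test on the rest
train_idx, test_idx = perm[:split], perm[split:]

embed = nn.Embedding(P, 64)       # learned embeddings for the two operands
net = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, P))
params = list(embed.parameters()) + list(net.parameters())
# Weight decay matters: grokking is typically reported under regularization.
opt = torch.optim.AdamW(params, lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        x = embed(pairs[idx]).flatten(1)          # (n, 2, 64) -> (n, 128)
        return (net(x).argmax(-1) == labels[idx]).float().mean().item()

for step in range(50_000):        # run far past perfect training accuracy
    x = embed(pairs[train_idx]).flatten(1)
    loss = loss_fn(net(x), labels[train_idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        # Train accuracy saturates early; test accuracy may jump much later.
        print(step, accuracy(train_idx), accuracy(test_idx))
```

In runs of this kind, training accuracy typically reaches 100% early while test accuracy remains near chance for a long period before rising sharply; the timing of the jump depends strongly on the regularization strength and the train/test split.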
One potential explanation is that the weight decay term of the training objective (a regularization component of the loss function that penalizes large parameter values) slightly favors the general solution, which involves lower weight values but is harder to find than the memorizing solution.
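In symbols, with weight decay coefficient λ > 0, the optimizer minimizes a regularized objective of the standard form below (the notation is generic, not taken from a specific cited paper); among parameter settings θ that fit the training data equally well, the penalty term slowly favors those of smaller norm:

```latex
\min_{\theta}\; \frac{1}{n}\sum_{i=1}^{n} \ell\!\left(f_\theta(x_i),\, y_i\right) \;+\; \lambda\,\lVert\theta\rVert_2^2
```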
Recent theories[10][11] have hypothesized that grokking occurs when neural networks transition from a "lazy training"[12] regime, in which the weights do not deviate far from their initialization, to a "rich" regime, in which the weights abruptly begin to move in task-relevant directions. Follow-up empirical and theoretical work[13] has accumulated evidence in support of this perspective. It also offers a unifying view of earlier work, since the transition from lazy to rich training dynamics is known to arise from properties of adaptive optimizers,[14] weight decay,[15] initial parameter weight norm,[8] and more.
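One simple diagnostic consistent with this description is to track how far the weights travel from their initialization during training. The cited works use more refined measures (such as kernel or representation alignment), so the sketch below, which assumes PyTorch and a model like `net` in the earlier example, is only a rough illustration:

```python
import torch

def flat_params(model):
    """Concatenate all parameters of a model into a single vector."""
    return torch.cat([p.detach().flatten() for p in model.parameters()])

theta_0 = flat_params(net).clone()   # snapshot taken at initialization

def relative_distance_from_init(model):
    # Stays small in the lazy (near-linearized) regime; a sharp rise
    # signals the rich regime, where weights move in task-relevant
    # directions.
    theta_t = flat_params(model)
    return ((theta_t - theta_0).norm() / theta_0.norm()).item()
```

Logging this quantity alongside train and test accuracy makes it possible to check whether the jump in generalization coincides with the weights leaving the neighborhood of their initialization.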
See also
- Deep double descent
References
1. Ananthaswamy, Anil (2024-04-12). "How Do Machines 'Grok' Data?". Quanta Magazine. Retrieved 2025-01-21.
2. Pearce, Adam; Ghandeharioun, Asma; Hussein, Nada; Thain, Nithum; Wattenberg, Martin; Dixon, Lucas. "Do Machine Learning Models Memorize or Generalize?". pair.withgoogle.com. Retrieved 2024-06-04.
3. Power, Alethea; Burda, Yuri; Edwards, Harri; Babuschkin, Igor; Misra, Vedant (2022). "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets". arXiv:2201.02177 [cs.LG].
4. arXiv:2310.19470 [cs.LG].
5. Liu, Ziming; Kitouni, Ouail; Nolte, Niklas; Michaud, Eric J.; Tegmark, Max; Williams, Mike (2022). "Towards Understanding Grokking: An Effective Theory of Representation Learning". arXiv:2205.10343.
6. arXiv:2405.19454 [cs.LG].
7. arXiv:2310.17247 [cs.LG].
8. Liu, Ziming; Michaud, Eric J.; Tegmark, Max (2022). "Omnigrok: Grokking Beyond Algorithmic Data". arXiv:2210.01117.
9. ISBN 978-1-7281-8671-9.
10. Kumar, Tanishq; Bordelon, Blake; Gershman, Samuel J.; Pehlevan, Cengiz (2023). "Grokking as the Transition from Lazy to Rich Training Dynamics". arXiv:2310.06110. Retrieved 2025-02-17.
11. arXiv:2311.18817. Retrieved 2025-02-17.
12. Chizat, Lénaïc; Oyallon, Edouard; Bach, Francis (2018). "On Lazy Training in Differentiable Programming". arXiv:1812.07956. Retrieved 2025-02-17.
13. arXiv:2407.12332. Retrieved 2025-02-17.
14. Thilak, Vimal; Littwin, Etai; Zhai, Shuangfei; Saremi, Omid; Paiss, Roni; Susskind, Joshua (2022). "The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon". arXiv:2206.04817. Retrieved 2025-02-17.
15. Varma, Vikrant; Shah, Rohin; Kenton, Zachary; Kramár, János; Kumar, Ramana (2023). "Explaining Grokking Through Circuit Efficiency". arXiv:2309.02390. Retrieved 2025-02-17.