Talk:Regularization (mathematics)



Merging

Regularization has two articles, one of which is termed "regularization (machine learning)".

There is not very much difference between the two concepts. Presently "mathematics" has a small description of inverse problems, while "machine learning" leans toward a statistical description. I propose that the article should be called "regularization (mathematics)"; "machine learning" is a narrow field. Other names or possible redirects would be ...

fnielsen (talk) 13:08, 16 April 2008 (UTC)[reply]

I agree with the merge. While there may be arguments for separate pages at some point in the future, at the moment they are both little beyond stubs, and have a large degree of overlap. As for the name, "regularization (mathematics)" is probably sufficient for now, but the names you suggest sound good should they be split again in future. - 3mta3 (talk) 14:49, 16 April 2008 (UTC)[reply]
What happened? The second sections of both articles (with the table) are almost identical. Reg(ML) should be made a section of Reg(math). --88.73.36.251 (talk) 11:22, 5 September 2011 (UTC)[reply]

The origin of 'regularization' is inverse problem theory. Regularization in machine learning is the application of inverse problem theory to machine learning. Why is the first example not a general one instead of some machine-learning problem? — Preceding unsigned comment added by 46.76.184.231 (talk) 15:24, 19 October 2018 (UTC)[reply]

Cross validation as an alternative to regularization

I'm no statistics expert, but I sometimes dabble in machine learning. I was wondering what the following means:

Alternative methods of controlling overfitting not involving regularization include cross-validation.

This strikes me as a bit odd, because one of the main things I use cross-validation for is to find the right value of a regularization parameter. Can anyone with more experience in statistics provide a clue here? Qwertyus (talk) 13:03, 23 February 2012 (UTC)[reply]
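
For context on how the two fit together: cross-validation is indeed routinely used to choose the regularization strength, exactly as Qwertyus describes. A minimal sketch, assuming scikit-learn and NumPy are available; the data and the alpha grid are made up for illustration:

    # Cross-validation used to *select* an L2 regularization strength.
    import numpy as np
    from sklearn.linear_model import RidgeCV

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))           # 100 samples, 10 features
    w_true = rng.normal(size=10)
    y = X @ w_true + 0.5 * rng.normal(size=100)

    # RidgeCV fits a ridge regression for each candidate alpha (the L2
    # penalty weight) using leave-one-out cross-validation and keeps the
    # alpha with the lowest cross-validation error.
    model = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0]).fit(X, y)
    print("selected regularization strength:", model.alpha_)

The quoted sentence presumably refers to a different use of cross-validation (e.g. early stopping, as the reply below describes), not this parameter-selection use.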

Cross-validation is used in several forms of machine learning as a regularization technique. For example, when training a neural network the data is often partitioned into three sets: training, probe, and test. An optimization problem is constructed based solely on the error between the network's output and the training data, and this error is minimized through many iterated steps. The issue is that neural networks almost always overfit the training data. So the probe data (i.e., cross-validation data) is evaluated during the optimization routine, even though it is not used directly in the aforementioned optimization problem. The neural network returned is the one with the minimal probe error, not the one with the minimal training error. This serves to mitigate overtraining. The test data is then used to estimate the actual accuracy of the neural network.
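
To make the probe/early-stopping idea above concrete, here is a minimal sketch in plain NumPy, with synthetic data and hypothetical variable names (not taken from the article): the optimizer only ever differentiates the training error, but the parameters returned are those with the lowest probe error.

    import numpy as np

    rng = np.random.default_rng(1)
    X_train, X_probe = rng.normal(size=(80, 20)), rng.normal(size=(20, 20))
    w_true = rng.normal(size=20)
    y_train = X_train @ w_true + rng.normal(size=80)
    y_probe = X_probe @ w_true + rng.normal(size=20)

    w = np.zeros(20)
    best_w, best_probe_err = w.copy(), np.inf
    lr = 0.01
    for step in range(1000):
        # Gradient step on the *training* error only.
        grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
        w -= lr * grad
        # Probe error is monitored but never differentiated.
        probe_err = np.mean((X_probe @ w - y_probe) ** 2)
        if probe_err < best_probe_err:
            best_probe_err, best_w = probe_err, w.copy()
    # best_w holds the parameters at the probe-error minimum, not the
    # final (possibly overtrained) w -- early stopping in miniature.
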
One final note: typically cross-validation data is used for only a single purpose. The more use you make of the cross-validation data, the less well the cross-validation error will correspond to error on actual unseen data. This is easily demonstrated with a thought experiment (although I am sure there is a statistical proof somewhere too). Suppose you have a large number of hyper-parameters to tune (regularization parameters, step size, network configuration, whether momentum is used, etc.): millions of degrees of freedom in the hyper-parameters, but not many degrees of freedom contributed by the cross-validation data. Just by chance, something will work. In general, the smaller the cross-validation set is relative to the number of hyper-parameters, the more likely there will be a configuration that minimizes the cross-validation error by chance but not the true error. This can be seen as overfitting the hyper-parameters at a meta-optimization level. — Preceding unsigned comment added by 150.135.222.130 (talk) 21:14, 31 January 2013 (UTC)[reply]
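
That thought experiment is easy to simulate. A sketch, under the assumption that every candidate hyper-parameter configuration is equally good, so all validation-set differences are pure noise:

    import numpy as np

    rng = np.random.default_rng(2)
    n_configs, n_val = 10_000, 50   # many configurations, small CV set

    # Every configuration has the same true error rate (0.30); the
    # observed validation error is a noisy estimate from n_val samples.
    true_error = 0.30
    val_errors = rng.binomial(n_val, true_error, size=n_configs) / n_val

    best = val_errors.min()
    print(f"best validation error: {best:.2f} "
          f"(true error of every config: {true_error})")
    # With 10,000 configurations and only 50 validation points, the
    # minimum lands far below 0.30 purely by chance: the hyper-parameters
    # have overfit the validation set.
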
I'm sorry, but the above comment is incorrect. Cross-validation is not a regularisation technique at all. As stated in the corresponding Wikipedia article, it's a model validation technique. Maltimore (talk) 12:58, 30 August 2019 (UTC)[reply]

The hat symbols

The use of hats seems to be inconsistent and is never clearly explained. In the case of Tikhonov regularized least squares it even seems to be unnecessary. — Preceding unsigned comment added by Tomas.krehlik (talkcontribs) 10:48, 2 February 2016 (UTC)[reply]

I agree. Unless someone objects, I will change this soon. Maltimore (talk) 12:54, 30 August 2019 (UTC)[reply]
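
For reference when fixing this: the usual statistics convention, which the article could state once, is that a hat marks an estimated quantity, e.g.

    \hat{w} = \arg\min_{w} \sum_{i=1}^{n} (y_i - w^\top x_i)^2 + \lambda \lVert w \rVert_2^2

where w ranges over candidate parameter vectors and \hat{w} is the particular minimizer the regularized fit returns. Whether that matches what the original article author intended is another question.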

Explain X and Y

The article never explains what variables X and Y (input and output) mean. — Preceding unsigned comment added by 169.234.127.96 (talk) 19:39, 17 September 2014 (UTC)[reply]
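
For reference, the standard reading, which the article should spell out rather than assume, is that the training data are pairs

    \{(x_i, y_i)\}_{i=1}^{n}, \qquad x_i \in X, \; y_i \in Y,

with X the input (feature) space and Y the output (label) space.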

Much more explanation needed

This article is littered with symbols that are not explained. If you don't know the meaning of a symbol you are flummoxed; if you do know the meaning, then you don't need to read the article (except perhaps as an aide-mémoire). My personal flummox is the parallel lines and 2s, $\lVert w \rVert_2^2$: what does it mean? I guess it's a norm (magnitude of a vector, matrix, or function) and the top 2 means the components/kernel are squared. But what about the lower 2? I went to the article on Tikhonov regularization, but that wasn't much help, as it describes a norm having a matrix Q in place of the lower, scalar, 2. I get that we would probably not explain standard notation (although we might), but IMHO this norm notation is sufficiently obscure as to require an explanation. Aredgers (talk) 13:33, 23 November 2017 (UTC)[reply]

I think the $\lVert \cdot \rVert_2^2$ notation is quite common. The lower 2 indeed means that you take the L2 norm, and the upper 2 is no special notation; it just means that you square the result. Maltimore (talk) 12:58, 30 August 2019 (UTC)[reply]
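
Spelled out, for a vector w = (w_1, ..., w_d):

    \lVert w \rVert_2 = \sqrt{ \sum_{i=1}^{d} w_i^2 }, \qquad \lVert w \rVert_2^2 = \sum_{i=1}^{d} w_i^2,

so the subscript names the norm (here the Euclidean, or L2, norm) and the superscript is an ordinary square. An explanation along these lines could simply be added to the article.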

This article needs improvement. For example: "It (sic, Tikhonov regularization) is also known as ridge regression." Not quite: ridge regression is a special case of Tikhonov regularization, with other minor differences---see https://stats.stackexchange.com/q/234280/99274 Also, machine-learning regularization is a subset of regularization in statistics, not the other way around. Also, regularization (mathematics) can mean something else entirely; for example, the regularized lower incomplete gamma function is $P(a,x) = \gamma(a,x)/\Gamma(a)$, so called because dividing by $\Gamma(a)$ normalizes its range to $[0,1]$. CarlWesolowski (talk) 15:30, 13 April 2020 (UTC)[reply]
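
On the gamma-function sense of "regularized": this is easy to check numerically, since SciPy's gammainc is precisely the regularized lower incomplete gamma function. A quick illustration, assuming SciPy is available:

    # scipy.special.gammainc(a, x) computes gamma(a, x) / Gamma(a), the
    # *regularized* lower incomplete gamma function; "regularized" here
    # means normalized so that all values lie in [0, 1].
    from scipy.special import gammainc

    for x in (0.0, 1.0, 5.0, 50.0):
        print(x, gammainc(2.0, x))   # rises from 0 toward 1 as x grows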

Machine learning bias

Regularization is a term broadly used in many sub-fields of pure and applied mathematics, data science, etc. In its current form, and probably because of the previous merge, it is strongly biased toward machine learning. The very first section starts with "Empirical learning of classifiers (...)", which is incredibly narrow. — Preceding unsigned comment added by Pthibault (talkcontribs) 09:48, 30 November 2020 (UTC)[reply]

Agreed. A way to write this section that is both more neutral and more general is to talk about a generic interpolation problem: given a collection of samples, find an interpolating function. This problem dates back hundreds of years, long before the dawn of machine learning, and is central to many sciences beyond computer science. (I fear, however, that if we try to edit the article in this way, we will be bludgeoned by the current horde of machine learners...) — Preceding unsigned comment added by 74.109.244.65 (talkcontribs) 18 May 2021 (UTC)

  • I believe that by 2023 a separate article about regularization in ML is long overdue, as it now has many more applications and much greater importance. That would remove the bias, as this article would retain only a short overview. Ain92 (talk) 10:21, 11 March 2023 (UTC)[reply]

Etymology

Is there any information on the etymology of the term? I always assumed that the term goes back to Tikhonov regularization, which takes a singular or near-singular matrix and makes it "more regular", i.e. increases the absolute value of its determinant. — Preceding unsigned comment added by 2001:16B8:2E8A:8800:B45F:5236:B1C9:CEB7 (talk) 07:40, 24 February 2022 (UTC)[reply]
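
The intuition at least checks out for Tikhonov/ridge: if A^T A has eigenvalues \sigma_i^2 \ge 0 (zero when A is singular), then for \lambda > 0

    \det(A^\top A + \lambda I) = \prod_i (\sigma_i^2 + \lambda) > 0,

so the regularized matrix is always invertible, i.e. "regular". Whether that is actually the historical origin of the term would still need a source.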

Unclear comment on Kratsios reference

Note 1, which gives a paper by Kratsios and Hyndman as a reference for the claim that regularization is used in finance, includes the statement "Term structure models can be regularized to remove arbitrage opportunities [sic?]." This is a meaningful and correct statement in the context of the original study (although certainly confusing to the general public). However, it doesn't seem to appear, at least verbatim, in that paper or in other papers by Kratsios, although it's certainly a sentence that could appear in this part of the mathematical-finance literature. (Arbitrage opportunities are an unwanted property of solutions to mathematical-finance problems; regularization can be used to restrict solutions to those without arbitrage opportunities.) I would delete it, but I hesitate because I don't know the intention of the original author of this text ...

Update this article by an AI?

I'd like to propose that we try to update this article by making use of an AI. MorgsTheBot (talk) 15:13, 1 April 2023 (UTC)[reply]

Suggested improvements from ChatGPT:
The page on "Regularization (mathematics)" on Wikipedia is quite comprehensive and informative. However, here are a few suggestions for improvement:
The page could benefit from more examples and applications of regularization in different areas of mathematics and science.
The introduction could be rewritten to provide a more accessible explanation of what regularization is and its main purposes.
Some of the mathematical notation and terminology could be explained in more detail or accompanied by more intuitive explanations.
The page could include more references to recent research and developments in the field of regularization.
The section on "Tikhonov regularization" could be expanded to provide more information on its practical applications and its relationship to other types of regularization.
The section on "Bayesian regularization" could benefit from more information on its strengths and weaknesses compared to other regularization methods.
The section on "Sparsity regularization" could be expanded to provide more information on its applications in machine learning and data science.
Overall, the page is a great resource for those interested in regularization in mathematics, but incorporating these suggestions could make it even more comprehensive and useful. MorgsTheBot (talk) 16:49, 1 April 2023 (UTC)[reply]
Adding citations to the article would have the most significant impact on the quality of the article, as it helps to establish the credibility of the information presented and allows readers to verify the information themselves. In addition, it helps to prevent errors and misinformation from being propagated.
Ok great, can you suggest a few articles we could cite to improve this article?
Sure! Here are a few articles that might be helpful to cite:
"A Review of Regularization Techniques in Machine Learning," by Ahmed A. Ahmed and A. M. Riad. This article provides a comprehensive overview of regularization techniques used in machine learning, including L1, L2, and elastic net regularization.
"An Introduction to Regularization," by Michael K. Tippett. This article provides a clear introduction to regularization and its use in statistics and machine learning.
"Regularization and Variable Selection via the Elastic Net," by Hui Zou and Trevor Hastie. This article presents the elastic net regularization technique, which combines L1 and L2 regularization and is commonly used in machine learning.
These articles should help provide a more detailed understanding of regularization and its use in mathematics and machine learning. MorgsTheBot (talk) 16:54, 1 April 2023 (UTC)[reply]
To improve the quality of this Wikipedia article, I would recommend the following changes:
Add an introductory section: Before diving into specific regularization techniques, provide a brief introduction to the concept of regularization, its purpose in machine learning, and the types of problems it addresses.
Add more visual aids: Include diagrams and illustrations to help visualize concepts and techniques, such as the effect of different regularization methods on the solution space or how proximal operators work.
Provide clearer explanations: Some sections may be difficult for readers to understand, especially those without a strong mathematical background. Simplify explanations and provide more intuitive examples to make the content more accessible.
Improve the organization: Reorganize the article to present the regularization techniques in a more logical order, possibly grouping them by application area or type of technique (e.g., L1-norm, L2-norm, group sparsity, multitask learning, etc.).
Add more context and motivation: For each regularization technique, provide more context and motivation by explaining why it was developed, the problem it addresses, and its advantages and disadvantages compared to other methods.
Expand on applications and examples: Include more real-world examples and applications of each regularization method, such as the use of Lasso for feature selection or ridge regression for multicollinearity issues.
Update references and citations: Ensure that all references and citations are up-to-date and properly formatted. This could involve adding more recent research papers or replacing older references with newer, more relevant sources.
Proofread and edit for clarity: Carefully proofread the article to correct any grammatical errors or unclear phrasing. Ensure that mathematical notation and symbols are used consistently throughout the text.
Add further reading and external resources: Include a "Further Reading" or "External Resources" section to direct readers to additional resources, such as textbooks, online courses, or video lectures on regularization techniques.
Encourage collaboration and feedback: Invite experts in the field to review the article and provide suggestions for improvement. Additionally, engage with the broader community by encouraging readers to contribute their knowledge and expertise to enhance the article's content. MorgsTheBot (talk) 18:09, 1 April 2023 (UTC)[reply]
Here are a few experts in the field of machine learning and regularization techniques who you could invite to collaborate on the Wikipedia article:
Prof. Trevor Hastie: A prominent researcher in machine learning, Prof. Hastie has co-authored influential books like "The Elements of Statistical Learning" and "Introduction to Statistical Learning." He is a Professor of Statistics at Stanford University.
Prof. Robert Tibshirani: Prof. Tibshirani, also a Professor of Statistics at Stanford University, is well-known for his work on Lasso and other regularization techniques. He co-authored "The Elements of Statistical Learning" and "Introduction to Statistical Learning" with Prof. Hastie.
Prof. Yann LeCun: As one of the pioneers in deep learning, Prof. LeCun has made significant contributions to the field of machine learning. He is a Professor of Computer Science at New York University and the Chief AI Scientist at Facebook.
Prof. Yoshua Bengio: Another pioneer in deep learning, Prof. Bengio has conducted extensive research in machine learning and neural networks. He is a Professor at the University of Montreal and the Co-founder of MILA (Montreal Institute for Learning Algorithms).
Dr. Francis Bach: Dr. Bach is a researcher at INRIA, the French National Institute for Research in Digital Science and Technology. He specializes in machine learning, optimization, and statistical signal processing. His work on group sparsity and other regularization methods is highly regarded.
Keep in mind that these experts are very busy and may not have the time to collaborate directly on a Wikipedia article. However, you could still reach out to them for their insights, suggestions, or recommendations of other experts in the field who might be interested in collaborating. Additionally, consider engaging with the broader machine learning community through forums, mailing lists, or conferences to find experts who are willing to contribute to the article. MorgsTheBot (talk) 18:14, 1 April 2023 (UTC)[reply]