XGBoost
Developer(s) | The XGBoost Contributors |
---|---|
Initial release | March 27, 2014 |
Stable release | 2.0.3[1] / 19 December 2023 |
Repository | github.com/dmlc/xgboost |
Written in | C++ |
License | Apache License 2.0 |
Website | xgboost.ai |
XGBoost (eXtreme Gradient Boosting) is an open-source software library[2] which provides a regularized gradient boosting framework for C++, Java, Python,[3] R,[4] Julia,[5] Perl,[6] and Scala. It works on Linux, Microsoft Windows,[7] and macOS.[8] It runs on a single machine, as well as on distributed processing frameworks such as Apache Hadoop, Apache Spark, Apache Flink, and Dask.[9][10]
XGBoost gained much popularity and attention in the mid-2010s as the algorithm of choice for many winning teams of machine learning competitions.[11]
History
XGBoost initially started as a research project by Tianqi Chen[12] as part of the Distributed (Deep) Machine Learning Community (DMLC) group. It became well known in machine learning competition circles after its use in the winning solution of the Higgs Machine Learning Challenge.[11]
It was soon integrated with a number of other packages, making it easier to use in their respective communities. It has since been integrated with scikit-learn for Python users and with the caret package for R users. It can also be integrated into data flow frameworks such as Apache Spark, Apache Hadoop, and Apache Flink using the abstracted Rabit[13] and XGBoost4J.[14] XGBoost is also available on OpenCL for FPGAs.[15] An efficient, scalable implementation of XGBoost has been published by Tianqi Chen and Carlos Guestrin.[16]
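As an illustration of the scikit-learn integration, the Python package ships an estimator-style wrapper that plugs into the usual fit/predict workflow. A minimal sketch, assuming the xgboost and scikit-learn packages are installed (the dataset and settings are chosen only for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier  # scikit-learn compatible wrapper

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# fit/predict/score work like any other scikit-learn estimator
model = XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # mean accuracy on held-out data
```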
While the XGBoost model often achieves higher accuracy than a single decision tree, it sacrifices the intrinsic interpretability of decision trees. For example, following the path that a decision tree takes to make its decision is trivial and self-explanatory, but following the paths of hundreds or thousands of trees is much harder.
Features
Salient features of XGBoost which make it different from other gradient boosting algorithms (several surface as tunable parameters in the sketch after this list) include:[17][18][19][16]
- Clever penalization of trees
- A proportional shrinking of leaf nodes
- Newton Boosting
- Extra randomization parameter
- Implementation on single, distributed systems and out-of-core computation
- Automatic feature selection[citation needed]
- Theoretically justified weighted quantile sketching for efficient computation
- Parallel tree structure boosting with sparsity
- Efficient cacheable block structure for decision tree training
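Several of these features surface directly as hyperparameters in the library's interfaces. A minimal illustrative sketch using the Python package (the values are arbitrary examples, not recommendations):

```python
import xgboost as xgb

model = xgb.XGBRegressor(
    reg_lambda=1.0,        # L2 penalization of leaf weights (tree penalization)
    gamma=0.1,             # minimum loss reduction required to split (tree penalization)
    learning_rate=0.1,     # proportional shrinkage of each tree's leaf contributions
    subsample=0.8,         # extra randomization: row subsampling per tree
    colsample_bytree=0.8,  # extra randomization: column subsampling per tree
    tree_method="hist",    # histogram-based split finding (cf. quantile sketching)
    n_estimators=200,
)
```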
The algorithm
XGBoost works as Newton-Raphson in function space, unlike gradient boosting, which works as gradient descent in function space. A second-order Taylor approximation is used in the loss function to make the connection to the Newton-Raphson method.
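In symbols, a sketch of the standard second-order expansion: writing $\hat g_m(x_i)$ and $\hat h_m(x_i)$ for the first and second derivatives of the loss at the current model (as defined in the algorithm below), the per-round objective is approximated as

$$L\left(y_i, \hat f_{(m-1)}(x_i) + \phi(x_i)\right) \approx L\left(y_i, \hat f_{(m-1)}(x_i)\right) + \hat g_m(x_i)\,\phi(x_i) + \frac{1}{2}\,\hat h_m(x_i)\,\phi(x_i)^2,$$

whose pointwise minimizer is $\phi(x_i) = -\hat g_m(x_i)/\hat h_m(x_i)$, i.e. a Newton step; this is the weighted least-squares target used in the fitting step of the algorithm.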
A generic unregularized XGBoost algorithm is:
Input: training set $\{(x_i, y_i)\}_{i=1}^N$, a differentiable loss function $L(y, F(x))$, a number of weak learners $M$ and a learning rate $\alpha$.
Algorithm:
- Initialize model with a constant value:
$$\hat f_{(0)}(x) = \underset{\theta}{\arg\min} \sum_{i=1}^N L(y_i, \theta).$$
- For $m = 1$ to $M$:
  - Compute the 'gradients' and 'hessians':
$$\hat g_m(x_i) = \left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x) = \hat f_{(m-1)}(x)},$$
$$\hat h_m(x_i) = \left[\frac{\partial^2 L(y_i, f(x_i))}{\partial f(x_i)^2}\right]_{f(x) = \hat f_{(m-1)}(x)}.$$
  - Fit a base learner (or weak learner, e.g. tree) using the training set $\left\{\left(x_i, -\frac{\hat g_m(x_i)}{\hat h_m(x_i)}\right)\right\}_{i=1}^N$ by solving the optimization problem below:
$$\hat\phi_m = \underset{\phi \in \Phi}{\arg\min} \sum_{i=1}^N \frac{1}{2} \hat h_m(x_i) \left[-\frac{\hat g_m(x_i)}{\hat h_m(x_i)} - \phi(x_i)\right]^2,$$
$$\hat f_m(x) = \alpha \hat\phi_m(x).$$
  - Update the model:
$$\hat f_{(m)}(x) = \hat f_{(m-1)}(x) + \hat f_m(x).$$
- Output:
$$\hat f(x) = \hat f_{(M)}(x) = \sum_{m=0}^{M} \hat f_m(x).$$
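The loop above translates almost line-for-line into code. Below is a minimal illustrative sketch (not the XGBoost library itself) using NumPy and scikit-learn regression trees as base learners, specialized to squared-error loss $L(y, f) = \tfrac{1}{2}(y - f)^2$, for which the hessians are constant; the helper names newton_boost and newton_predict are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def newton_boost(X, y, M=100, alpha=0.1, max_depth=3):
    """Generic unregularized Newton boosting with squared-error loss
    L(y, f) = (y - f)^2 / 2, so g = f - y and h = 1."""
    theta0 = y.mean()                    # constant model: argmin_theta sum_i L(y_i, theta)
    pred = np.full(y.shape, theta0, dtype=float)
    learners = []
    for _ in range(M):
        g = pred - y                     # gradients at the current model
        h = np.ones_like(pred)           # hessians (constant for squared error)
        # Fit the base learner to -g/h with sample weights h: this is exactly
        # the weighted least-squares problem in the per-round fitting step.
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, -g / h, sample_weight=h)
        pred += alpha * tree.predict(X)  # shrunken update of the model
        learners.append(tree)
    return theta0, learners

def newton_predict(theta0, learners, X, alpha=0.1):
    """Sum the constant model and all shrunken base learners."""
    return theta0 + alpha * sum(t.predict(X) for t in learners)
```

With a general loss, only the two lines computing g and h change, which is the sense in which the algorithm works as Newton-Raphson in function space.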
Awards
- John Chambers Award (2016)[20]
- High Energy Physics meets Machine Learning award (HEP meets ML) (2016)[21]
References
- ^ "Release 2.0.3". 19 December 2023. Retrieved 19 December 2023.
- ^ "GitHub project webpage". GitHub. June 2022. Archived from the original on 2021-04-01. Retrieved 2016-04-05.
- ^ "Python Package Index PYPI: xgboost". Archived from the original on 2017-08-23. Retrieved 2016-08-01.
- ^ "CRAN package xgboost". Archived from the original on 2018-10-26. Retrieved 2016-08-01.
- ^ "Julia package listing xgboost". Archived from the original on 2016-08-18. Retrieved 2016-08-01.
- ^ "CPAN module AI::XGBoost". Archived from the original on 2020-03-28. Retrieved 2020-02-09.
- ^ "Installing XGBoost for Anaconda in Windows". IBM. Archived from the original on 2018-05-08. Retrieved 2016-08-01.
- ^ "Installing XGBoost on Mac OSX". IBM. Archived from the original on 2018-05-08. Retrieved 2016-08-01.
- ^ "Dask Homepage". Archived from the original on 2022-09-14. Retrieved 2021-07-15.
- ^ "Distributed XGBoost with Dask — xgboost 1.5.0-dev documentation". xgboost.readthedocs.io. Archived from the original on 2022-06-04. Retrieved 2021-07-15.
- ^ a b "XGBoost - ML winning solutions (incomplete list)". GitHub. Archived from the original on 2017-08-24. Retrieved 2016-08-01.
- ^ "Story and Lessons behind the evolution of XGBoost". Archived from the original on 2016-08-07. Retrieved 2016-08-01.
- ^ "Rabit - Reliable Allreduce and Broadcast Interface". GitHub. Archived from the original on 2018-06-11. Retrieved 2016-08-01.
- ^ "XGBoost4J". Archived from the original on 2018-05-08. Retrieved 2016-08-01.
- ^ "XGBoost on FPGAs". GitHub. Archived from the original on 2020-09-13. Retrieved 2019-08-01.
- ^ Chen, Tianqi; Guestrin, Carlos (2016). "XGBoost: A Scalable Tree Boosting System". Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 785–794. arXiv:1603.02754. doi:10.1145/2939672.2939785. S2CID 4650265.
- ^ Gandhi, Rohith (2019-05-24). "Gradient Boosting and XGBoost". Medium. Archived from the original on 2020-03-28. Retrieved 2020-01-04.
- ^ "Boosting algorithm: XGBoost". Towards Data Science. 2017-05-14. Archived from the original on 2022-04-06. Retrieved 2020-01-04.
- ^ "Tree Boosting With XGBoost – Why Does XGBoost Win "Every" Machine Learning Competition?". Synced. 2017-10-22. Archived from the original on 2020-03-28. Retrieved 2020-01-04.
- ^ "John Chambers Award Previous Winners". Archived from the original on 2017-07-31. Retrieved 2016-08-01.
- ^ "HEP meets ML Award". Archived from the original on 2018-05-08. Retrieved 2016-08-01.