* Bugfix 1: Fix segfault in multithreaded ApplySplitSparseData() When there are more threads than rows in rowset, some threads end up with empty ranges, causing them to crash. (iend - 1 needs to be accessible as part of algorithm) Fix: run only those threads with nonempty ranges. * Add regression test for Bugfix 1 * Moving python_omp_test to existing python test group Turns out you don't need to set "OMP_NUM_THREADS" to enable multithreading. Just add nthread parameter. * Bugfix 2: Fix corner case of ApplySplitSparseData() for categorical feature When split value is less than all cut points, split_cond is set incorrectly. Fix: set split_cond = -1 to indicate this scenario * Bugfix 3: Initialize data layout indicator before using it data_layout_ is accessed before being set; this variable determines whether feature 0 is included in feat_set. Fix: re-order code in InitData() to initialize data_layout_ first * Adding regression test for Bugfix 2 Unfortunately, no regression test for Bugfix 3, as there is no way to deterministically assign value to an uninitialized variable.
eXtreme Gradient Boosting
Documentation | Resources | Installation | Release Notes | RoadMap
XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides a parallel tree boosting (also known as GBDT, GBM) that solve many data science problems in a fast and accurate way. The same code runs on major distributed environment (Hadoop, SGE, MPI) and can solve problems beyond billions of examples.
What's New
- XGBoost4J: Portable Distributed XGboost in Spark, Flink and Dataflow, see JVM-Package
- Story and Lessons Behind the Evolution of XGBoost
- Tutorial: Distributed XGBoost on AWS with YARN
- XGBoost brick Release
Ask a Question
- For reporting bugs please use the xgboost/issues page.
- For generic questions or to share your experience using XGBoost please use the XGBoost User Group
Help to Make XGBoost Better
XGBoost has been developed and used by a group of active community members. Your help is very valuable to make the package better for everyone.
- Check out call for contributions and Roadmap to see what can be improved, or open an issue if you want something.
- Contribute to the documents and examples to share your experience with other users.
- Add your stories and experience to Awesome XGBoost.
- Please add your name to CONTRIBUTORS.md and after your patch has been merged.
- Please also update NEWS.md on changes and improvements in API and docs.
License
© Contributors, 2016. Licensed under an Apache-2 license.
Reference
- Tianqi Chen and Carlos Guestrin. XGBoost: A Scalable Tree Boosting System. In 22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016
- XGBoost originates from research project at University of Washington, see also the Project Page at UW.
Description
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
Languages
C++
45.5%
Python
20.3%
Cuda
15.2%
R
6.8%
Scala
6.4%
Other
5.6%