eXtreme Gradient Boosting
Documentation | Resources | Installation | Release Notes | RoadMap
XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems quickly and accurately. The same code runs on major distributed environments (Hadoop, SGE, MPI) and can solve problems beyond billions of examples.
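For readers new to the Gradient Boosting framework, here is a toy sketch of its core loop. Each round fits a new tree to the negative gradient of the loss at the current predictions, and adds that tree's output scaled by a learning rate (shrinkage).

The sketch makes several simplifying assumptions: squared-error loss, a single feature, and depth-1 trees (stumps). The names `fit_stump`, `boost` and `predict` are hypothetical and are not part of the XGBoost API. It also deliberately omits what distinguishes XGBoost itself, such as second-order gradient statistics, the regularized objective, sparsity-aware split finding, and parallel/distributed training.

```python
# Toy gradient boosting for 1-D regression with squared-error loss.
# Illustrative only: this is NOT XGBoost's implementation.
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left_value, right_value)


def fit_stump(xs: List[float], targets: List[float]) -> Stump:
    """Pick the split on x that minimizes squared error against the targets."""
    assert xs, "need at least one training example"
    best: Stump = (xs[0], 0.0, 0.0)
    best_err = float("inf")
    for threshold in sorted(set(xs)):
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_val = sum(left) / len(left) if left else 0.0
        right_val = sum(right) / len(right) if right else 0.0
        err = sum((t - left_val) ** 2 for t in left) + sum(
            (t - right_val) ** 2 for t in right
        )
        if err < best_err:
            best_err, best = err, (threshold, left_val, right_val)
    return best


def predict_stump(stump: Stump, x: float) -> float:
    threshold, left_val, right_val = stump
    return left_val if x <= threshold else right_val


def boost(xs: List[float], ys: List[float], n_rounds: int = 50,
          learning_rate: float = 0.3):
    base = sum(ys) / len(ys)  # start from a constant prediction (the mean)
    preds = [base] * len(xs)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # For loss 0.5 * (y - f)^2, the negative gradient w.r.t. f is y - f.
        neg_grad = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, neg_grad)
        stumps.append(stump)
        # Add the new tree's output, shrunk by the learning rate.
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, x: float) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


if __name__ == "__main__":
    model = boost([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])
    print(round(predict(model, 2), 3), round(predict(model, 5), 3))
```

On this step-shaped toy data, every round shrinks the remaining residual by a factor of `1 - learning_rate`. The predictions therefore converge toward 1 and 5.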
What's New
- XGBoost4J: Portable Distributed XGBoost in Spark, Flink and Dataflow, see JVM-Package
- Story and Lessons Behind the Evolution of XGBoost
- Tutorial: Distributed XGBoost on AWS with YARN
- XGBoost brick Release
Ask a Question
- To report bugs, please use the xgboost/issues page.
- For general questions, or to share your experience using XGBoost, please use the XGBoost User Group.
Help to Make XGBoost Better
XGBoost has been developed and used by a group of active community members. Your help is very valuable to make the package better for everyone.
- Check out call for contributions and Roadmap to see what can be improved, or open an issue if you want something.
- Contribute to the documentation and examples to share your experience with other users.
- Add your stories and experience to Awesome XGBoost.
- Please add your name to CONTRIBUTORS.md after your patch has been merged.
- Please also update NEWS.md to reflect changes and improvements in the API and docs.
License
© Contributors, 2016. Licensed under an Apache-2 license.
Reference
- Tianqi Chen and Carlos Guestrin. XGBoost: A Scalable Tree Boosting System. In 22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016
- XGBoost originates from a research project at the University of Washington; see also the Project Page at UW.
Description
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on a single machine, Hadoop, Spark, Dask, Flink and DataFlow.