
xgboost: eXtreme Gradient Boosting

An optimized general purpose gradient boosting (tree) library.

Contributors: https://github.com/tqchen/xgboost/graphs/contributors

Tutorial and Documentation: https://github.com/tqchen/xgboost/wiki

Questions and Issues: https://github.com/tqchen/xgboost/issues

xgboost-unity

experimental branch: a refactor of xgboost with cleaner code and more flexibility

Build

  • Simply type make
  • If your compiler does not come with OpenMP support, the build will issue a warning that the code will be compiled in single-thread mode, and you will get a single-thread xgboost
    • You may get an error that -lgomp is not found; remove the -fopenmp flag in the Makefile to get a single-thread xgboost, or upgrade your compiler to build the multi-threaded version
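The single-thread fallback amounts to dropping the OpenMP flags from the build. A hypothetical sketch of the relevant Makefile lines (the variable names and exact flags in the real Makefile may differ):

```make
# hypothetical fragment -- multi-threaded build with OpenMP
CFLAGS = -O3 -fopenmp
# single-thread fallback: remove -fopenmp (and any -lgomp link flag)
# CFLAGS = -O3
```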

Project Logical Layout

  • Dependency order: learner->gbm->tree
  • tree contains implementations of the tree construction algorithms.
  • gbm is the gradient boosting interface; it takes trees and other base learners to do boosting.
    • gbm only takes gradients as sufficient statistics; it does not compute the gradient.
  • learner is the learning module that computes the gradient for a specific objective and passes it to gbm

File Naming Convention

  • The project is templatized to make it easy to adjust the input data structure.
  • .h files contain data structures and interfaces, which are all you need to use the functions in that layer.
  • -inl.hpp files are implementations of the interfaces, like .cpp files in most projects.
    • You only need to understand the interface file to understand the usage of that layer