xgboost: eXtreme Gradient Boosting

An optimized, general-purpose gradient boosting (tree) library.

Contributors:

  • Tianqi Chen, project creator
  • Kailong Chen, contributed the regression module
  • Bing Xu, contributed the Python interface and the Higgs example

Tutorial and Documentation: https://github.com/tqchen/xgboost/wiki

Features

  • Sparse feature format:
    • The sparse feature format allows easy handling of missing values and improves computation efficiency (see the first sketch after this list).
  • Push the limit on a single machine:
    • Efficient implementation that optimizes both memory and computation.
  • Speed: XGBoost is very fast.
    • In demo/higgs/speedtest.py, on the Kaggle Higgs dataset, it is faster than sklearn.ensemble.GradientBoostingClassifier (on our machine, about 20 times faster using 4 threads).
  • The gradient boosting algorithm is laid out to support user-defined objectives (see the second sketch after this list).
  • Python interface that works with numpy arrays and scipy.sparse matrices.
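
As an illustration of the sparse feature format and the Python interface, here is a minimal sketch of training from a scipy.sparse matrix: entries not stored in the matrix are simply treated as missing. The parameter values are illustrative assumptions, not recommendations.

    import numpy as np
    import scipy.sparse as sp
    import xgboost as xgb

    # Only nonzero entries are stored in the CSR matrix; unstored entries
    # count as missing, so no special encoding of missing values is needed.
    X = sp.csr_matrix(np.array([[1.0, 0.0, 2.0],
                                [0.0, 3.0, 0.0],
                                [4.0, 0.0, 5.0]]))
    y = np.array([1, 0, 1])

    dtrain = xgb.DMatrix(X, label=y)
    param = {'objective': 'binary:logistic', 'eta': 0.3, 'max_depth': 3}
    bst = xgb.train(param, dtrain, 10)   # third argument: number of boosting rounds
    preds = bst.predict(dtrain)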

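The user-defined objective hook mentioned above only needs a function that returns the gradient and second-order gradient (hessian) of the loss for each prediction. A minimal sketch, assuming the train call accepts such a function through its obj argument, that plugs in a hand-written logistic loss:

    import numpy as np
    import xgboost as xgb

    def logregobj(preds, dtrain):
        # Gradient and hessian of the logistic loss, per instance.
        labels = dtrain.get_label()
        preds = 1.0 / (1.0 + np.exp(-preds))  # raw margin -> probability
        grad = preds - labels
        hess = preds * (1.0 - preds)
        return grad, hess

    X = np.random.rand(20, 4)
    y = (X[:, 0] > 0.5).astype(float)
    dtrain = xgb.DMatrix(X, label=y)

    # The booster only needs these gradient statistics to fit the next tree.
    bst = xgb.train({'max_depth': 2, 'eta': 0.1}, dtrain, 10, obj=logregobj)
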
Supported key components

  • Gradient boosting models:
    • regression tree (GBRT)
    • linear model/lasso
  • Objectives to support tasks:
    • regression
    • classification
  • OpenMP implementation (a parameter sketch follows this list)
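
Since the tree and linear boosters share one front end, choosing between them, and between regression and classification, comes down to parameters. A sketch, assuming the parameter names from the project documentation of this era (booster, objective, nthread, alpha):

    import numpy as np
    import xgboost as xgb

    X = np.random.rand(50, 5)
    y = np.random.rand(50)
    dtrain = xgb.DMatrix(X, label=y)

    # Tree booster (GBRT) with the linear regression objective;
    # nthread sets the OpenMP thread count.
    tree_param = {'booster': 'gbtree', 'objective': 'reg:linear',
                  'max_depth': 4, 'nthread': 4}
    bst_tree = xgb.train(tree_param, dtrain, 10)

    # Linear booster with L1 regularization (the lasso-style model).
    linear_param = {'booster': 'gblinear', 'objective': 'reg:linear',
                    'alpha': 0.1, 'nthread': 4}
    bst_linear = xgb.train(linear_param, dtrain, 10)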

Planned components

  • More objectives to support tasks:
    • ranking
    • matrix factorization
    • structured prediction

File extension convention

  • .h files are interfaces, utilities, and data structures, with detailed comments;
  • .cpp files are implementations that will be compiled, with fewer comments;
  • .hpp files are implementations that will be included by .cpp files, with fewer comments.