xgboost: eXtreme Gradient Boosting
=======

An optimized general purpose gradient boosting (tree) library.

Contributors:
* Tianqi Chen, project creator
* Kailong Chen, contributed the regression module
* Bing Xu, contributed the Python interface and the Higgs example

Tutorial and Documentation: https://github.com/tqchen/xgboost/wiki

Features
=======
* Sparse feature format:
  - The sparse feature format allows easy handling of missing values and improves computation efficiency.
* Push the limit on single machine:
  - Efficient implementation that optimizes memory and computation.
* Speed: XGBoost is very fast.
  - In [demo/kaggle-higgs/speedtest.py](demo/kaggle-higgs/speedtest.py), on the Kaggle Higgs data it is faster than sklearn.ensemble.GradientBoostingClassifier (about 20 times faster on our machine using 4 threads).
* Layout of the gradient boosting algorithm supports user defined objectives.
* Python interface that works with numpy and scipy.sparse matrices; both points are illustrated in the sketch after this list.
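
The last two bullets can be made concrete with one short sketch. This is a minimal illustration rather than code from this repository: it assumes the Python wrapper exposes `xgboost.DMatrix` and `xgboost.train`, that `DMatrix` accepts a `scipy.sparse` CSR matrix (unstored entries are handled as missing by the sparse feature format), and that a user defined objective can be passed to `train` as a callback returning the gradient and hessian.

```python
import numpy as np
import scipy.sparse as sp
import xgboost as xgb

# Toy binary classification data in a sparse matrix; entries that are not
# stored are treated as missing values by the sparse feature format.
X = sp.csr_matrix(np.array([[1.0, 0.0, 2.0],
                            [0.0, 3.0, 0.0],
                            [4.0, 0.0, 1.0],
                            [0.0, 5.0, 6.0]]))
y = np.array([1, 0, 1, 0])
dtrain = xgb.DMatrix(X, label=y)

def logregobj(preds, dtrain):
    """User defined logistic loss: return per-example gradient and hessian."""
    labels = dtrain.get_label()
    preds = 1.0 / (1.0 + np.exp(-preds))  # raw margin -> probability
    grad = preds - labels
    hess = preds * (1.0 - preds)
    return grad, hess

params = {"max_depth": 3, "eta": 0.1, "nthread": 4}  # nthread uses the OpenMP implementation
bst = xgb.train(params, dtrain, num_boost_round=10, obj=logregobj)

# With a custom objective the booster outputs raw margin scores,
# so apply the sigmoid yourself to obtain probabilities.
prob = 1.0 / (1.0 + np.exp(-bst.predict(dtrain)))
```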

Supported key components
=======
* Gradient boosting models (see the parameter sketch after this list):
  - regression tree (GBRT)
  - linear model/lasso
* Objectives to support tasks:
  - regression
  - classification
* OpenMP implementation
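
Under the same assumptions as the sketch above, switching between these components is purely a matter of parameters. The parameter names used here (`gbtree`, `gblinear`, `reg:linear`, `binary:logistic`, `alpha`) are the conventional xgboost names and may differ slightly in this code base.

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(100, 5)
y_reg = X[:, 0] * 2.0 + np.random.randn(100) * 0.1   # regression target
y_clf = (X[:, 1] > 0.5).astype(int)                  # classification target

# Gradient boosted regression trees (GBRT) for a regression task.
tree_params = {"booster": "gbtree",
               "objective": "reg:linear",  # squared error regression (reg:squarederror in later releases)
               "max_depth": 4, "eta": 0.1}
bst_tree = xgb.train(tree_params, xgb.DMatrix(X, label=y_reg), num_boost_round=20)

# Boosted linear model with L1 (lasso) regularization for a classification task.
linear_params = {"booster": "gblinear",
                 "objective": "binary:logistic",
                 "alpha": 0.1}  # alpha = L1 regularization weight
bst_linear = xgb.train(linear_params, xgb.DMatrix(X, label=y_clf), num_boost_round=20)
```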

Planned components
=======
* More objectives to support tasks:
  - ranking
  - matrix factorization
  - structured prediction

File extension convention
=======
* .h files are interfaces, utils and data structures, with detailed comments
* .cpp files are implementations that will be compiled, with fewer comments
* .hpp files are implementations that will be included by .cpp files, with fewer comments