xgboost: eXtreme Gradient Boosting
======
An optimized general purpose gradient boosting (tree) library.
This is a fork of XGBoost from https://github.com/tqchen/xgboost

Contributors: https://github.com/tqchen/xgboost/graphs/contributors

Tutorial and Documentation: https://github.com/tqchen/xgboost/wiki

Questions and Issues: [https://github.com/tqchen/xgboost/issues](https://github.com/tqchen/xgboost/issues?q=is%3Aissue+label%3Aquestion)

Notes on the Code: [Code Guide](src)

In the main repo you already find two Windows projects for porting the executable and the Python library. Here you additionally have:

1) A C# DLL wrapper, i.e. the passage from unmanaged to managed code, in https://github.com/giuliohome/xgboost/tree/master/windows/xgboost_sharp_wrapper

2) The C# Higgs Kaggle demo, replacing the Python one (you will actually get a higher score with the C# version, due to some changes I have made), in https://github.com/giuliohome/xgboost/tree/master/windows/kaggle_higgs_demo

   Start the demo from the root folder like this:

   bin\x64\Debug\kaggle_higgs_demo.exe training_path.csv test_path.csv sharp_pred.csv

3) A 5-fold CV implementation in C# for the demo: you see the inline CV AMS while training (computed, of course, on a completely separate set). The AMS metric is sketched below.
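For reference, the AMS (Approximate Median Significance) reported during cross-validation is the official metric of the Kaggle Higgs challenge. Here is a minimal Python sketch of that formula (the function name and arguments are mine; the actual C# implementation lives in the demo project linked above):

```python
import math

def ams(s, b, b_reg=10.0):
    """Approximate Median Significance, as defined by the Kaggle Higgs challenge.

    s, b  -- weighted sums of the selected signal and background events
    b_reg -- constant regularization term (10 in the challenge definition)
    """
    return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))
```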
Features
======
* Sparse feature format:
  - The sparse feature format allows easy handling of missing values and improves computation efficiency.
* Push the limit on a single machine:
  - Efficient implementation that optimizes memory and computation.
* Speed: XGBoost is very fast.
  - In [demo/higgs/speedtest.py](demo/kaggle-higgs/speedtest.py), on the Kaggle Higgs data, it is faster than sklearn.ensemble.GradientBoostingClassifier (on our machine, 20 times faster using 4 threads).
* The layout of the gradient boosting algorithm supports user-defined objectives (see the sketch after this list).
* Python interface, works with numpy and scipy.sparse matrices.
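The last two bullets can be seen together from the Python side. Below is a minimal sketch, not taken from this repo: the toy data, the parameters, and the logistic objective are illustrative only. It feeds a scipy.sparse matrix to xgboost and trains with a user-defined objective:

```python
import numpy as np
import scipy.sparse as sp
import xgboost as xgb

# Sparse input: entries that are not stored in the CSR matrix are
# treated by xgboost as missing values.
X = sp.csr_matrix(np.array([[1.0, 0.0, 2.0],
                            [0.0, 3.0, 0.0],
                            [4.0, 0.0, 0.0],
                            [0.0, 5.0, 6.0]]))
y = np.array([1.0, 0.0, 1.0, 0.0])
dtrain = xgb.DMatrix(X, label=y)

# User-defined objective: return the gradient and hessian of the loss
# with respect to the raw (margin) predictions -- here, logistic loss.
def logregobj(preds, dtrain):
    labels = dtrain.get_label()
    preds = 1.0 / (1.0 + np.exp(-preds))  # sigmoid of the raw margin
    grad = preds - labels
    hess = preds * (1.0 - preds)
    return grad, hess

param = {'max_depth': 2, 'eta': 1.0, 'silent': 1}
watchlist = [(dtrain, 'train')]
bst = xgb.train(param, dtrain, 10, watchlist, logregobj)
```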
Build
======
* Simply type `make`.
* If your compiler does not come with OpenMP support, it will emit a warning telling you that the code will be compiled in single-thread mode, and you will get a single-thread xgboost.
* You may get an error saying that -lgomp is not found.
  - You can type ```make no_omp=1```, which will get you a single-thread xgboost (see the command summary after this list).
  - Alternatively, you can upgrade your compiler to build the multi-thread version.
* Possible way to build with Visual Studio (not tested):
  - In principle, you can put src/xgboost.cpp and src/io/io.cpp into the project and build xgboost.
  - For the Python module, you need python/xgboost_wrapper.cpp and src/io/io.cpp to build a DLL.
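The build options above, summarized as commands. The MSVC line is only a sketch of the untested Visual Studio route: the cl flags and any include paths are assumptions, not taken from this repo.

```sh
# Default build: multi-threaded if the compiler supports OpenMP.
make

# Fallback if linking fails with "-lgomp is not found": single-thread build.
make no_omp=1

# Untested MSVC sketch for the Python DLL (flags are assumptions):
# /LD builds a DLL, /EHsc enables C++ exceptions, /openmp enables OpenMP.
cl /LD /EHsc /openmp python/xgboost_wrapper.cpp src/io/io.cpp
```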
Version
======
* This version is named xgboost-unity; the code has been refactored from 0.2x to be cleaner and more flexible.
* This version of xgboost is not compatible with 0.2x, due to the huge amount of changes in the code structure.
  - This means the model and buffer files of the previous version cannot be loaded in xgboost-unity.
* For the legacy 0.2x code, refer to [this release](https://github.com/tqchen/xgboost/releases/tag/v0.22).
* Change log in [CHANGES.md](CHANGES.md)
XGBoost in Graphlab Create
======
* XGBoost is adopted as part of the boosted tree toolkit in GraphLab Create (GLC). GraphLab Create is a powerful Python toolkit that lets you do data manipulation, graph processing, hyper-parameter search, and visualization of big data in one framework. Try GraphLab Create at http://graphlab.com/products/create/quick-start-guide.html
* A nice blog post by Jay Gu on using the GLC boosted tree to solve the Kaggle bike sharing challenge: http://blog.graphlab.com/using-gradient-boosted-trees-to-predict-bike-sharing-demand