Hendrik/xgboost

tqchen 62a108a7c2 chg of hadoop script

2015-01-11 21:02:38 -08:00

mushroom.hadoop.conf

hadoop version conf

2015-01-11 15:44:16 +08:00

README.md

chg of hadoop script

2015-01-11 21:02:38 -08:00

run_mushroom.sh

chg of hadoop script

2015-01-11 21:02:38 -08:00

README.md

Distributed XGBoost: Hadoop Version

To run the Hadoop version: bash run_binary_classification.sh <n_hadoop_workers> <n_thread_per_worker> <path_in_HDFS>
- This is the Hadoop version of the binary classification example in the demo folder.

How to Use

Check whether the environment variable $HADOOP_HOME is set (e.g. run echo $HADOOP_HOME). If it is not set, configure the path to hadoop-streaming.jar in rabit_hadoop.py.
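The check above can be wrapped in a small shell function, shown here as a minimal sketch. The function name check_hadoop_home is hypothetical and is not part of this repository; it only reports whether the variable is set and does not try to locate the streaming jar.

```shell
# Hypothetical helper (not part of the repository): report whether
# HADOOP_HOME is set before launching the job.
check_hadoop_home() {
  if [ -z "${HADOOP_HOME:-}" ]; then
    # Variable is unset or empty: point the user at the manual fallback.
    echo "HADOOP_HOME is not set; set the hadoop-streaming.jar path in rabit_hadoop.py" >&2
    return 1
  fi
  echo "HADOOP_HOME=${HADOOP_HOME}"
}
```

Calling check_hadoop_home before the run script gives an early, readable failure instead of an error from deep inside the job submission.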

Notes

The code has been tested on MapReduce 1 (MRv1) and YARN; running on MapReduce 2 (MRv2, YARN) is recommended.
The code is multi-threaded, so run one xgboost process per node/worker and set <n_thread_per_worker> to the number of cores on each machine.
- You will need YARN to specify the number of cores for each worker.
The Hadoop version saves the final model to HDFS.
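
As a rough illustration of the one-process-per-node advice, the sketch below assembles the launch command with the thread count defaulting to the number of online cores. The function name build_launch_cmd, the worker count, and the HDFS path in the usage comment are hypothetical, and the default assumes the machine you launch from has the same core count as the worker nodes, which may not hold on your cluster.

```shell
# Hypothetical helper (not part of the repository): print the launch command.
# Arguments: <n_hadoop_workers> <path_in_HDFS> [n_thread_per_worker]
# If the thread count is omitted, fall back to the number of online cores
# on the local machine (assumed to match the worker nodes).
build_launch_cmd() {
  n_workers="$1"
  hdfs_path="$2"
  n_threads="${3:-$(getconf _NPROCESSORS_ONLN)}"
  # Emit the arguments in the order the run script expects.
  echo "bash run_binary_classification.sh ${n_workers} ${n_threads} ${hdfs_path}"
}

# Example (placeholder values): 4 workers, 8 threads each
# build_launch_cmd 4 /user/demo/xgboost 8
```

The function only prints the command, so it can be inspected before anything is submitted to the cluster.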