Hendrik/xgboost

tqchen c8396ca24e add mock exec

2014-12-21 18:47:56 -08:00

mushroom-col-mpi.sh

change allreduce lib to rabit library, xgboost now run with rabit

2014-12-20 00:17:09 -08:00

mushroom-col-python.sh

change allreduce lib to rabit library, xgboost now run with rabit

2014-12-20 00:17:09 -08:00

mushroom-col-rabit-mock.sh

add mock exec

2014-12-21 18:47:56 -08:00

mushroom-col-rabit.sh

change allreduce lib to rabit library, xgboost now run with rabit

2014-12-20 00:17:09 -08:00

mushroom-col.conf

check in conf

2014-11-23 17:35:21 -08:00

mushroom-col.py

change allreduce lib to rabit library, xgboost now run with rabit

2014-12-20 00:17:09 -08:00

README.md

pas mock, need to fix rabit lib for not initialization

2014-12-21 00:14:00 -08:00

splitsvm.py

check multinode

2014-11-19 11:22:17 -08:00

README.md

Distributed XGBoost: Column Split Version

Run bash mushroom-col-rabit.sh <n-process>
- mushroom-col-rabit.sh starts an xgboost job using rabit's allreduce.
Run bash mushroom-col-rabit-mock.sh <n-process>
- mushroom-col-rabit-mock.sh starts an xgboost job using rabit's allreduce, inserts a suicide signal at a certain point, and tests recovery.
Run bash mushroom-col-mpi.sh <n-mpi-process>
- mushroom-col-mpi.sh starts an xgboost-mpi job.

How to Use

First, split the data by column.
In the config, specify the data file name with a wildcard %d, where %d is the rank of the node; each node loads its own part of the data.
Enable column split mode by setting dsplit=col.
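
The first step, splitting the data by column, can be sketched as below. This is a minimal illustration and is not meant to reproduce splitsvm.py. It assumes libsvm-format rows ("label index:value ..."), and the modulo assignment of features to shards and the output pattern train.col%d are hypothetical choices made only for the example.

```python
def split_libsvm_by_column(lines, nparts):
    """Split libsvm-format rows into nparts column shards.

    Feature index i goes to shard i % nparts (an assumed, arbitrary rule).
    Every shard keeps the label and has exactly one row per input row.
    """
    shards = [[] for _ in range(nparts)]
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        label, features = tokens[0], tokens[1:]
        per_shard = [[label] for _ in range(nparts)]
        for feat in features:
            index = int(feat.split(":", 1)[0])
            per_shard[index % nparts].append(feat)
        for rank in range(nparts):
            shards[rank].append(" ".join(per_shard[rank]))
    return shards


def write_shards(shards, pattern):
    """Write shard k to pattern % k, so that %d is the rank of the node."""
    for rank, rows in enumerate(shards):
        with open(pattern % rank, "w") as out:
            out.write("\n".join(rows) + "\n")


rows = ["1 0:1 3:1 4:1", "0 1:1 2:1"]
shards = split_libsvm_by_column(rows, 2)
# write_shards(shards, "train.col%d") would create train.col0 and train.col1
```

Each shard keeps every row and its label, so the node with rank k can load its own file and still see all training instances, restricted to its columns.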

Notes

The code is multi-threaded, so run one process per node.
The code works correctly as long as the union of the column subsets covers all the columns of interest.
- The column subsets can overlap with each other.
It uses exactly the same algorithm as the single-node version to examine all potential split points.
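
To illustrate why overlapping column subsets are harmless, here is a toy sketch, not XGBoost's actual implementation. Each worker scores every split point on its own columns, and the best candidate is then selected across workers. The gain formula is a simplified stand-in, the function names are hypothetical, and the plain Python max takes the place of an allreduce.

```python
def best_split(columns, labels):
    """Score every split point of every column; return (gain, fid, threshold).

    columns maps a feature id to a list of values, one per row.
    The gain (sum of squared label sums over side sizes) is a simplification.
    """
    n = len(labels)
    total = sum(labels)
    best = (float("-inf"), None, None)
    for fid in sorted(columns):
        pairs = sorted(zip(columns[fid], labels))
        left_sum = 0.0
        for i in range(1, n):
            left_sum += pairs[i - 1][1]
            lo, hi = pairs[i - 1][0], pairs[i][0]
            if lo == hi:
                continue  # no threshold fits between equal values
            gain = left_sum ** 2 / i + (total - left_sum) ** 2 / (n - i)
            if gain > best[0]:
                best = (gain, fid, (lo + hi) / 2)
    return best


def reduce_best(local_bests):
    """Pick the global best; ties on gain go to the lower feature id."""
    valid = [b for b in local_bests if b[1] is not None]
    return max(valid, key=lambda b: (b[0], -b[1]))


labels = [1.0, 1.0, 0.0, 0.0]
data = {0: [1, 0, 1, 0], 1: [5, 6, 1, 2], 2: [0, 0, 0, 1]}

# Two workers whose column subsets overlap on feature 2.
subsets = [[0, 2], [1, 2]]
local = [best_split({f: data[f] for f in s}, labels) for s in subsets]

distributed = reduce_best(local)
single = best_split(data, labels)
```

Because every column appears in at least one subset, every candidate split is scored by some worker, so the selected split matches the single-node result. A column held by two workers just produces the same candidate twice.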