Distributed XGBoost: Column Split Version

  • Run bash mushroom-col.sh <n-mpi-process>
    • mushroom-col.sh starts an xgboost-mpi job
  • Run bash mushroom-col-tcp.sh <n-process>
    • mushroom-col-tcp.sh starts an xgboost job using xgboost's built-in allreduce
  • Run bash mushroom-col-python.sh <n-process>
    • mushroom-col-python.sh starts an xgboost Python job using xgboost's built-in allreduce
    • see mushroom-col.py

How to Use

  • First, split the data by column.
  • In the config, specify the data file name with a %d wildcard, where %d stands for the rank of the node; each node loads its own part of the data.
  • Enable column split mode by setting dsplit=col.
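
The splitting step can be sketched in Python as follows. This is a minimal illustration, not the script shipped with the demo. It assumes the data is in LibSVM text format (label followed by index:value pairs), that every part keeps the label and the original row order, and that features are assigned to parts by index modulo the number of parts. The function names and the file pattern in the usage note are hypothetical.

```python
def split_by_column(lines, n_parts):
    """Split LibSVM-format rows into n_parts column subsets.

    Assumption: feature index i goes to part i % n_parts, and every
    part keeps the label of each row so that rows stay aligned.
    """
    parts = [[] for _ in range(n_parts)]
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        label, feats = tokens[0], tokens[1:]
        # One bucket per part, each starting with the row label.
        buckets = [[label] for _ in range(n_parts)]
        for feat in feats:
            index = int(feat.split(":", 1)[0])
            buckets[index % n_parts].append(feat)
        for rank in range(n_parts):
            parts[rank].append(" ".join(buckets[rank]))
    return parts


def write_parts(parts, pattern):
    """Write part k to pattern % k, matching the %d wildcard in the config."""
    for rank, rows in enumerate(parts):
        with open(pattern % rank, "w") as out:
            out.write("\n".join(rows) + "\n")
```

For example, write_parts(split_by_column(open("train.txt"), 3), "train.col%d") would produce train.col0, train.col1 and train.col2, and the config would then name the data file as train.col%d (both file names are placeholders).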

Notes

  • The code is multi-threaded, so run only one xgboost-mpi process per node.
  • The code works correctly as long as the union of the column subsets contains all the columns of interest.
    • The column subsets may overlap.
  • It uses exactly the same algorithm as the single-node version and examines all potential split points.
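
The coverage condition in the notes can be checked before launching a job. The sketch below assumes the per-node parts are LibSVM-format rows already loaded as lists of strings; the helper names are hypothetical and not part of xgboost. It reports which wanted columns appear in no part, and overlapping subsets are accepted.

```python
def feature_indices(rows):
    """Collect the feature indices that occur in LibSVM-format rows."""
    found = set()
    for row in rows:
        for feat in row.split()[1:]:  # first token is the label
            found.add(int(feat.split(":", 1)[0]))
    return found


def missing_columns(parts, wanted):
    """Return the wanted column indices that no part contains.

    Overlap between parts is fine; only the union matters.
    """
    covered = set()
    for rows in parts:
        covered |= feature_indices(rows)
    return set(wanted) - covered
```

An empty result means the union of the column subsets covers every column of interest. Note that a column whose values are all absent from the sparse data would also be reported as missing.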