Distributed XGBoost: Row Split Version

  • You may also want to check out the Hadoop example
  • Machine Rabit: run bash machine-row-rabit.sh <n-mpi-process>
    • machine-col-rabit.sh starts an xgboost job using rabit

How to Use

  • First split the data by rows
  • In the config, specify the data file name with a %d wildcard, where %d is replaced by the rank of the node; each node then loads only its own part of the data
  • Enable row split mode by setting dsplit=row
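
The first two steps can be sketched in Python as follows. This is only an illustration, not a script shipped with xgboost: the helper name split_by_rows, the round-robin assignment of rows, and the file name pattern train.txt.row%d are all assumptions made for the example.

```python
import os
import tempfile


def split_by_rows(input_path, num_nodes, pattern):
    """Distribute the lines of input_path across num_nodes files, round-robin.

    pattern must contain exactly one %d, which is replaced by the node rank
    (0 .. num_nodes - 1). Returns the output paths, indexed by rank.
    Hypothetical helper for illustration; not part of xgboost.
    """
    paths = [pattern % rank for rank in range(num_nodes)]
    outputs = [open(path, "w") for path in paths]
    try:
        with open(input_path) as src:
            for i, line in enumerate(src):
                # Row i goes to node i mod num_nodes, so parts stay balanced.
                outputs[i % num_nodes].write(line)
    finally:
        for out in outputs:
            out.close()
    return paths


if __name__ == "__main__":
    # Tiny self-contained demo on a throwaway file with five rows.
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, "train.txt")
    with open(source, "w") as f:
        for i in range(5):
            f.write("1 %d:1\n" % i)
    for path in split_by_rows(source, 2, os.path.join(workdir, "train.txt.row%d")):
        with open(path) as part:
            print(os.path.basename(path), len(part.readlines()), "rows")
```

With files produced this way, the data file entry in the config would use the same pattern (here train.txt.row%d), so that the node with rank 0 loads train.txt.row0, the node with rank 1 loads train.txt.row1, and so on.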

Notes

  • The code is multi-threaded, so run only one xgboost-mpi process per node
  • The row-based solver splits the data by row, so each node works on a subset of the rows. It uses an approximate histogram count algorithm and examines only a subset of the potential split points rather than all of them.
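
To make the last note more concrete, here is a rough Python sketch of the general idea behind histogram-based split finding over row partitions. It is not xgboost's actual implementation. The function names, the way candidate thresholds are picked (simple quantiles over all values, whereas a real distributed solver would need to agree on candidates without gathering the data), and the use of the usual second-order gain with a regularization term reg_lambda are assumptions made for illustration. Summing the per-node histograms stands in for the reduction that the communication layer would perform.

```python
import bisect


def candidate_splits(values, num_bins):
    """Pick up to num_bins - 1 distinct thresholds at evenly spaced quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    candidates = []
    for k in range(1, num_bins):
        v = ordered[(k * n) // num_bins]
        if not candidates or v != candidates[-1]:
            candidates.append(v)
    return candidates


def local_histogram(rows, candidates):
    """Sum gradients and hessians per bucket for one node's rows.

    rows is a list of (feature_value, gradient, hessian) tuples. Bucket b
    collects rows with candidates[b-1] <= value < candidates[b].
    """
    hist = [[0.0, 0.0] for _ in range(len(candidates) + 1)]
    for x, g, h in rows:
        b = bisect.bisect_right(candidates, x)
        hist[b][0] += g
        hist[b][1] += h
    return hist


def merge_histograms(hists):
    """Element-wise sum of per-node histograms (stand-in for an allreduce)."""
    merged = [[0.0, 0.0] for _ in hists[0]]
    for hist in hists:
        for b, (g, h) in enumerate(hist):
            merged[b][0] += g
            merged[b][1] += h
    return merged


def best_split(hist, candidates, reg_lambda=1.0):
    """Scan only the candidate thresholds and return (threshold, gain)."""
    total_g = sum(g for g, _ in hist)
    total_h = sum(h for _, h in hist)

    def score(g, h):
        return g * g / (h + reg_lambda)

    best_thr, best_gain = None, 0.0
    left_g = left_h = 0.0
    for b, thr in enumerate(candidates):
        # Rows with value < thr are exactly buckets 0..b.
        left_g += hist[b][0]
        left_h += hist[b][1]
        gain = (score(left_g, left_h)
                + score(total_g - left_g, total_h - left_h)
                - score(total_g, total_h))
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr, best_gain


if __name__ == "__main__":
    # Two nodes, each holding a different subset of rows for one feature.
    node_rows = [
        [(1.0, -1.0, 1.0), (2.0, -1.0, 1.0)],
        [(3.0, 1.0, 1.0), (4.0, 1.0, 1.0)],
    ]
    all_values = [x for rows in node_rows for x, _, _ in rows]
    cands = candidate_splits(all_values, num_bins=4)
    merged = merge_histograms([local_histogram(r, cands) for r in node_rows])
    print(best_split(merged, cands))
```

The point of the sketch is that each node only touches its own rows and only exchanges fixed-size histograms, and the split search visits the candidate thresholds rather than every distinct feature value.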