2014-12-04 19:02:11 -08:00


rabit: Reliable Allreduce and Broadcast Interface

rabit is a lightweight library that provides a fault-tolerant interface for Allreduce and Broadcast. It is designed to make distributed machine learning programs easy to implement, since many of them fall naturally under the Allreduce abstraction.
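To make the two collectives concrete, here is a minimal in-process sketch of what Allreduce and Broadcast compute. It is not the rabit API and does no networking: "nodes" are simulated as entries of a vector, and the names `Buffers`, `SimulatedAllreduce` and `SimulatedBroadcast` are hypothetical. It assumes every node's buffer has the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a cluster: buffers[i] is the local buffer of
// node i. Illustrates semantics only; this is not the rabit API.
using Buffers = std::vector<std::vector<double>>;

// Allreduce: combine all nodes' buffers element-wise with `op`, then give
// every node a copy of the combined result. Assumes equal buffer lengths.
void SimulatedAllreduce(Buffers& buffers,
                        const std::function<double(double, double)>& op) {
  if (buffers.empty()) return;
  std::vector<double> result = buffers[0];
  for (std::size_t node = 1; node < buffers.size(); ++node) {
    for (std::size_t k = 0; k < result.size(); ++k) {
      result[k] = op(result[k], buffers[node][k]);
    }
  }
  for (auto& buf : buffers) buf = result;
}

// Broadcast: copy the root node's buffer to every node.
void SimulatedBroadcast(Buffers& buffers, std::size_t root) {
  const std::vector<double> value = buffers.at(root);
  for (auto& buf : buffers) buf = value;
}
```

For example, summing `{1, 2}`, `{3, 4}` and `{5, 6}` across three nodes leaves every node holding `{9, 12}`. A real implementation would exchange these buffers over the network (for instance along a tree or ring of nodes) instead of looping over them in one process.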

Features

  • Portable library
    • Rabit is a library rather than a framework; a program only needs to link against the library to run.
  • Flexibility in programming
    • Programs can call rabit functions in any order, as opposed to frameworks, which call user-supplied callbacks themselves (the inversion-of-control principle).
    • Programs persist across all iterations; they are restarted only when they fail and need to recover.
  • Fault tolerance
    • Rabit programs can recover the model and results using synchronous function calls.
  • MPI compatible
    • Code that uses the rabit interface also compiles with existing MPI compilers.
    • Users can switch to MPI Allreduce without modifying their code.
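The persistence and fault-tolerance features above imply a checkpoint-and-resume structure: a long-running program saves its model after each iteration and, after a failure, reloads the latest checkpoint and continues from there. The sketch below shows that pattern under stated assumptions. The checkpoint lives in a hypothetical in-memory `CheckpointStore`, the "crash" is simulated by returning early, and `RunTraining` with its toy update is purely illustrative; none of these names are rabit functions.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical checkpoint: how many iterations finished, plus the model.
struct Checkpoint {
  int version = 0;            // number of completed iterations
  std::vector<double> model;  // replicated global model
};

// Hypothetical in-memory store. In a real system the checkpoint must
// survive the failed process; here it simply outlives the simulated crash.
struct CheckpointStore {
  std::optional<Checkpoint> latest;
  void Save(int version, const std::vector<double>& model) {
    latest = Checkpoint{version, model};
  }
};

// Runs iterations [start, total_iters) of a toy update, where `start` comes
// from the latest checkpoint if one exists. `crash_after` (demo only) stops
// the loop before that iteration to mimic a failure; -1 means no crash.
std::vector<double> RunTraining(CheckpointStore& store, int total_iters,
                                int crash_after) {
  int start = 0;
  std::vector<double> model(2, 0.0);
  if (store.latest) {  // recovery path: reload model and iteration count
    start = store.latest->version;
    model = store.latest->model;
  }
  for (int iter = start; iter < total_iters; ++iter) {
    if (iter == crash_after) return model;          // simulated failure
    for (double& w : model) w += 1.0 / (iter + 1);  // toy model update
    store.Save(iter + 1, model);                    // checkpoint each iteration
  }
  return model;
}
```

Running five iterations with a simulated crash before the fourth, then calling `RunTraining` again on the same store, yields exactly the same model as an uninterrupted five-iteration run, because the restarted program resumes from version 3 instead of starting over.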

Design Notes

  • Rabit is designed for algorithms that replicate the same global model across nodes, while each node operates on a local partition of the data.
  • Global statistics are collected using Allreduce.
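As an illustration of this design, the sketch below computes a global mean when each node sees only its own data partition. Every node reduces its partition to local statistics (sum and count), a sum-Allreduce combines them, and every node then derives the same global value, which is what keeps a replicated model identical everywhere. The Allreduce is simulated in-process, and the names `Partition`, `LocalStats`, `AllreduceSum` and `GlobalMeanPerNode` are hypothetical; it assumes at least one data point exists across all partitions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical setup: each "node" holds only its own partition of the data.
using Partition = std::vector<double>;

// Statistics a node can compute locally: {sum of values, number of values}.
std::array<double, 2> LocalStats(const Partition& part) {
  double sum = 0.0;
  for (double x : part) sum += x;
  return {sum, static_cast<double>(part.size())};
}

// Simulated sum-Allreduce: every node ends up with the element-wise total.
void AllreduceSum(std::vector<std::array<double, 2>>& stats) {
  std::array<double, 2> total = {0.0, 0.0};
  for (const auto& s : stats) {
    total[0] += s[0];
    total[1] += s[1];
  }
  for (auto& s : stats) s = total;
}

// Returns the global mean as computed on each node. All entries are equal,
// so a model updated from them stays replicated. Assumes total count > 0.
std::vector<double> GlobalMeanPerNode(const std::vector<Partition>& parts) {
  std::vector<std::array<double, 2>> stats;
  for (const auto& p : parts) stats.push_back(LocalStats(p));
  AllreduceSum(stats);
  std::vector<double> means;
  for (const auto& s : stats) means.push_back(s[0] / s[1]);
  return means;
}
```

With partitions `{1, 2, 3}`, `{4}` and `{5, 6}`, the reduced statistics are a sum of 21 over 6 values, so every node computes a mean of 3.5. Only two numbers per node cross the network, regardless of partition size.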

Design Goals

  • rabit should run fast
  • rabit should be lightweight
  • rabit should safely dig burrows to avoid disasters