xgboost/README.md
2014-12-04 09:09:56 -08:00

1.3 KiB

rabit: Robust Allreduce and Broadcast Interface

rabit is a light weight library that provides a fault tolerant interface of Allreduce and Broadcast. It is designed to support easy implementation of distributed machine learning programs, many of which sits naturally under Allreduce abstraction.

Contributors: https://github.com/tqchen/rabit/graphs/contributors

Features

  • Portable library
    • Rabit is a library instead of framework, program only need to link the library to run, without restricting to a single framework.
  • Flexibility in programming
    • Programs call rabit functions in any sequence, as opposed to defines limited functions and being called.
    • Program persist over all the iterations, unless it fails and recover
  • Fault tolerance
    • Rabit program can recover model and results of syncrhonization functions calls
  • MPI compatible
    • Codes using rabit interface naturally compiles with existing MPI compiler
    • User can fall back to use MPI Allreduce if they like with no code modification

Design Note

  • Rabit is designed for algorithms that replicate same global model across nodes, while each node operating on local parition of data.
  • The global statistics collection is done by using Allreduce

Design Goal

  • rabit should run fast
  • rabit is light weight
  • rabit dig safe burrows to avoid disasters