rabit: Robust Allreduce and Broadcast Interface
rabit is a lightweight library that provides a fault-tolerant interface for Allreduce and Broadcast. It is designed to make distributed machine learning programs easy to implement, many of which fit naturally under the Allreduce abstraction.
Contributors: https://github.com/tqchen/rabit/graphs/contributors
Design Note
- Rabit is designed for algorithms that replicate the same global model across nodes, while each node operates on a local partition of the data.
- Global statistics are collected using Allreduce.
Design Goal
- rabit should run fast
- rabit is lightweight
- rabit digs safe burrows to avoid disasters
Features
- Portable library
- Rabit is a library rather than a framework: a program only needs to link against it to run, without being restricted to a single framework.
- Flexibility in programming
- Programs can call rabit functions in any sequence, as opposed to defining a limited set of callbacks to be invoked by a framework.
- Programs persist across all iterations, unless they fail and recover.
- Fault tolerance
- Rabit programs can recover the model and the results of synchronization function calls.
- MPI compatible
- Code written against the rabit interface compiles naturally with existing MPI compilers.
- Users can fall back to MPI Allreduce if they prefer, with no code modification.