rabit: Robust Allreduce and Broadcast Interface
rabit is a lightweight library that provides a fault-tolerant interface for Allreduce and Broadcast. It is designed to support easy implementation of distributed machine learning programs, many of which fit naturally under the Allreduce abstraction.
Interface: rabit.h
Features
- Portable library
  - Rabit is a library rather than a framework. A program only needs to link against the library to run, and is not restricted to a single framework.
- Flexibility in programming
  - Programs call rabit functions in any sequence, rather than defining a limited set of functions for a framework to call.
  - The program persists across all iterations, unless it fails and recovers.
- Fault tolerance
  - A rabit program can recover its model and the results of synchronization function calls.
- MPI compatible
  - Code using the rabit interface compiles naturally with existing MPI compilers.
  - Users can fall back to MPI Allreduce if they like, with no code modification.
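
The persistence and recovery behaviour listed above can be pictured with a small in-process sketch. It does not use rabit at all: `ToyModel`, `CheckpointStore`, `RunWorker` and `RunWithRecovery` are hypothetical names invented for this illustration, and the "failure" is just a thrown exception. It only shows the general idea of a long-running loop that resumes from its last saved model after a crash, not how rabit implements recovery.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical toy model: a running total plus the next iteration to run.
struct ToyModel {
  int next_iter = 0;
  double total = 0.0;
};

// Hypothetical in-memory stand-in for a checkpoint store (not a rabit type).
struct CheckpointStore {
  bool has_checkpoint = false;
  ToyModel saved;
  void Save(const ToyModel &m) {
    saved = m;
    has_checkpoint = true;
  }
  // Returns the last saved model, or a fresh model if nothing was saved.
  ToyModel Load() const { return has_checkpoint ? saved : ToyModel(); }
};

// Runs iterations from the checkpointed position up to num_iters.
// If fail_at >= 0, throws when that iteration is reached, imitating a
// worker crash before the iteration's checkpoint is written.
inline void RunWorker(CheckpointStore *store, int num_iters, int fail_at) {
  assert(store != nullptr);
  ToyModel model = store->Load();
  for (int it = model.next_iter; it < num_iters; ++it) {
    if (it == fail_at) throw std::runtime_error("simulated worker failure");
    model.total += static_cast<double>(it + 1);  // stand-in for real work
    model.next_iter = it + 1;
    store->Save(model);  // checkpoint after every completed iteration
  }
}

// Starts the worker, and restarts it once if it fails.
inline ToyModel RunWithRecovery(int num_iters, int fail_at) {
  CheckpointStore store;
  try {
    RunWorker(&store, num_iters, fail_at);
  } catch (const std::runtime_error &) {
    RunWorker(&store, num_iters, -1);  // restart resumes from the checkpoint
  }
  return store.Load();
}
```

Calling `RunWithRecovery(5, 3)` and `RunWithRecovery(5, -1)` should leave the toy model in the same final state, since the restarted worker repeats only the iterations that had not been checkpointed.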
Design Note
- Rabit is designed for algorithms that replicate the same global model across nodes, while each node operates on a local partition of the data.
- Global statistics are collected using Allreduce.
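
As a rough illustration of this design, the sketch below simulates the pattern inside a single process, without linking rabit. `SimulatedAllreduceSum` and `GlobalMeanPerWorker` are hypothetical helpers written for this example; the first is a plain stand-in for a sum Allreduce, and the "model" is simply a global mean. Each simulated worker computes statistics on its own partition, the statistics are summed across workers, and every worker ends up with an identical replica.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-process stand-in for a sum Allreduce (not the rabit implementation):
// after the call, every worker's buffer holds the element-wise sum of all
// workers' buffers.
inline void SimulatedAllreduceSum(std::vector<std::vector<double>> *buffers) {
  assert(buffers != nullptr);
  if (buffers->empty()) return;
  const std::size_t n = buffers->front().size();
  std::vector<double> total(n, 0.0);
  for (const auto &buf : *buffers) {
    assert(buf.size() == n);  // all workers must reduce the same shape
    for (std::size_t i = 0; i < n; ++i) total[i] += buf[i];
  }
  for (auto &buf : *buffers) buf = total;
}

// Each worker computes {sum, count} on its local partition, the statistics
// are combined across workers, and each worker derives the same global mean.
// Returns one mean per worker so the replicas can be compared.
inline std::vector<double> GlobalMeanPerWorker(
    const std::vector<std::vector<double>> &partitions) {
  std::vector<std::vector<double>> stats;
  for (const auto &part : partitions) {
    double sum = 0.0;
    for (double x : part) sum += x;
    stats.push_back({sum, static_cast<double>(part.size())});
  }
  SimulatedAllreduceSum(&stats);
  std::vector<double> means;
  for (const auto &s : stats) {
    means.push_back(s[1] > 0.0 ? s[0] / s[1] : 0.0);
  }
  return means;
}
```

In a real rabit program the combine step would be the Allreduce call declared in rabit.h, with each worker running as its own process; the local-statistics and model-update steps keep the same shape.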
Design Goal
- rabit should run fast
- rabit is lightweight
- rabit digs safe burrows to avoid disasters