rabit: Reliable Allreduce and Broadcast Interface
rabit is a lightweight library that provides a fault-tolerant interface for Allreduce and Broadcast. It is designed to make distributed machine learning programs easy to implement, since many of them fall naturally under the Allreduce abstraction.
- Tutorial
- API Documentation
- You can also directly read the interface header
Features
- Portable library
  - Rabit is a library rather than a framework; a program only needs to link against the library to run.
- Flexibility in programming
  - Programs can call rabit functions in any order, unlike frameworks where the framework supplies and invokes callbacks (the inversion-of-control principle).
  - Programs persist over all the iterations, unless they fail and recover.
- Fault tolerance
  - Rabit programs can recover the model and results using synchronous function calls.
- MPI compatible
  - Code that uses the rabit interface also compiles with existing MPI compilers.
  - Users can use MPI Allreduce with no code modification.
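The fault-tolerance point above (a program persists across iterations and, after a failure, recovers its model through ordinary function calls) can be illustrated with a small stand-alone sketch. It does not call rabit itself: Model, CheckpointStore, RunWorker and RunWithRecovery are hypothetical names made up for this illustration, and the in-memory store only stands in for whatever recovery mechanism the library provides.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical model state: completed-iteration counter plus one weight.
struct Model {
  int version = 0;      // number of iterations already completed
  double weight = 0.0;  // stand-in for the replicated global model
};

// Hypothetical in-memory checkpoint store (an assumption of this sketch,
// not rabit's API). It survives the simulated crash of a worker.
struct CheckpointStore {
  bool has_checkpoint = false;
  Model saved;

  // Restores the last saved model, if any, and returns how many
  // iterations were already completed (0 on a fresh start).
  int Load(Model* model) const {
    if (!has_checkpoint) return 0;
    *model = saved;
    return saved.version;
  }

  void Save(const Model& model) {
    saved = model;
    has_checkpoint = true;
  }
};

// Runs the remaining iterations. Throws when it reaches iteration fail_at
// to mimic a node failure; a negative fail_at disables the failure.
Model RunWorker(CheckpointStore* store, int max_iter, int fail_at) {
  Model model;
  const int start = store->Load(&model);
  for (int iter = start; iter < max_iter; ++iter) {
    if (iter == fail_at) throw std::runtime_error("simulated node failure");
    model.weight += 0.5 * (iter + 1);  // placeholder for the real update
    model.version = iter + 1;
    store->Save(model);                // synchronous checkpoint per iteration
  }
  return model;
}

// Starts a worker and, if it fails, restarts it once from the checkpoint.
Model RunWithRecovery(int max_iter, int fail_at) {
  assert(max_iter >= 0);
  CheckpointStore store;
  try {
    return RunWorker(&store, max_iter, fail_at);
  } catch (const std::runtime_error&) {
    return RunWorker(&store, max_iter, -1);  // resume, no further failure
  }
}
```

Calling RunWithRecovery(5, 3) crashes the worker at the fourth iteration and restarts it from the last checkpoint; the resulting model matches the failure-free run RunWithRecovery(5, -1). The main loop is plain sequential code rather than a set of callbacks, which is the programming style the list above describes.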
Use Rabit
- Typing make in the root folder compiles the rabit library into the lib folder.
- Add lib to the compiler's library path and include to its include path.
Design Notes
- Rabit is designed for algorithms that replicate the same global model across nodes, while each node operates on a local partition of the data.
- Global statistics are collected using Allreduce.
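The pattern in these notes can be sketched without a cluster. The example below simulates it in a single process: each "node" computes a statistic over its own data partition, and a sum reduction leaves every node holding the same global result. SimulatedAllreduceSum and GlobalMeanPerNode are hypothetical helpers written for this illustration; they are not part of rabit, and a real program would call the library's Allreduce across processes instead.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical in-process stand-in for a sum Allreduce: every node
// contributes a buffer of the same length and ends up holding the
// element-wise sum over all nodes.
void SimulatedAllreduceSum(std::vector<std::vector<double>>* node_buffers) {
  assert(!node_buffers->empty());
  const std::size_t len = node_buffers->front().size();
  std::vector<double> total(len, 0.0);
  for (const auto& buf : *node_buffers) {
    assert(buf.size() == len);
    for (std::size_t i = 0; i < len; ++i) total[i] += buf[i];
  }
  for (auto& buf : *node_buffers) buf = total;  // everyone gets the result
}

// Each node reduces its local partition to {sum, count}; after the
// simulated Allreduce every node can compute the same global mean.
// Assumes at least one partition and at least one value overall.
std::vector<double> GlobalMeanPerNode(
    const std::vector<std::vector<double>>& partitions) {
  std::vector<std::vector<double>> stats;
  for (const auto& part : partitions) {
    const double local_sum = std::accumulate(part.begin(), part.end(), 0.0);
    stats.push_back({local_sum, static_cast<double>(part.size())});
  }
  SimulatedAllreduceSum(&stats);
  std::vector<double> means;
  for (const auto& s : stats) means.push_back(s[0] / s[1]);
  return means;
}
```

With the partitions {1, 2, 3}, {4, 5} and {6}, every simulated node ends up with the global mean 3.5 even though none of them saw the full data set. Only the two-element statistic is exchanged, not the data itself.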
Design Goals
- rabit should run fast
- rabit should be lightweight
- rabit should safely dig burrows to avoid disasters