rabit: Reliable Allreduce and Broadcast Interface
rabit is a lightweight library that provides a fault-tolerant interface to Allreduce and Broadcast. It is designed to make it easy to implement distributed machine learning programs, many of which fit naturally under the Allreduce abstraction.
- See the package interface file
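To make the two primitives concrete, here is a minimal single-process sketch of their semantics, assuming each node's data is modeled as one entry in a Python list. The functions `allreduce` and `broadcast` are hypothetical illustrations, not rabit's actual API, and the sketch ignores networking and fault tolerance entirely.

```python
import operator
from typing import Callable, List, TypeVar

T = TypeVar("T")


def allreduce(buffers: List[List[float]],
              op: Callable[[float, float], float]) -> List[List[float]]:
    """Simulated Allreduce: buffers[i] is node i's local buffer. Every node
    receives the element-wise reduction of all buffers under `op`."""
    length = len(buffers[0])
    if any(len(b) != length for b in buffers):
        raise ValueError("all nodes must contribute buffers of the same length")
    reduced = list(buffers[0])
    for buf in buffers[1:]:
        reduced = [op(a, b) for a, b in zip(reduced, buf)]
    # Each node gets its own copy of the same reduced result.
    return [list(reduced) for _ in buffers]


def broadcast(values: List[T], root: int) -> List[T]:
    """Simulated Broadcast: every node ends up with the root node's value."""
    return [values[root] for _ in values]


if __name__ == "__main__":
    nodes = [[1.0, 5.0], [3.0, 2.0], [2.0, 4.0]]
    print(allreduce(nodes, operator.add))  # [[6.0, 11.0], [6.0, 11.0], [6.0, 11.0]]
    print(allreduce(nodes, max))           # [[3.0, 5.0], [3.0, 5.0], [3.0, 5.0]]
    print(broadcast(["model-v0", None, None], root=0))  # ['model-v0', 'model-v0', 'model-v0']
```

The key property is that after either call every node holds an identical result, which is what lets each node keep an identical copy of the model.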
Features
- Portable library
  - Rabit is a library rather than a framework: programs only need to link against the library to run, so they are not tied to a single framework.
- Flexibility in programming
  - Programs call rabit functions in any order they like, as opposed to frameworks that define a limited set of functions and call them on the user's behalf.
  - The program persists across all iterations, unless it fails and recovers.
- Fault tolerance
  - A rabit program can recover the model and the results of synchronization function calls.
- MPI compatible
  - Code using the rabit interface compiles with existing MPI compilers.
  - Users can fall back to MPI Allreduce, if they like, without modifying their code.
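The persistence and fault-tolerance features can be illustrated with a checkpoint-and-resume loop. The sketch below is a hypothetical single-process model: `CheckpointStore` and `train` are invented names, the store is plain memory rather than replicas held by other nodes, and it does not reproduce rabit's actual recovery protocol. It only shows the idea that a restarted program reloads the last saved model and version, then continues from the next iteration instead of starting over.

```python
import copy
from typing import List, Optional, Tuple


class CheckpointStore:
    """Hypothetical in-memory checkpoint store standing in for whatever
    a real fault-tolerant system would use to keep the model safe."""

    def __init__(self) -> None:
        self._version = 0
        self._model: Optional[List[float]] = None

    def load(self) -> Tuple[int, Optional[List[float]]]:
        # Returns (version, model); version 0 means "no checkpoint yet".
        return self._version, copy.deepcopy(self._model)

    def save(self, model: List[float]) -> None:
        self._model = copy.deepcopy(model)
        self._version += 1


def train(store: CheckpointStore, num_iters: int,
          fail_at: Optional[int] = None) -> List[float]:
    version, model = store.load()
    if model is None:
        model = [0.0]  # fresh start: no checkpoint exists
    # Resume from the first iteration not covered by the last checkpoint.
    for it in range(version, num_iters):
        if it == fail_at:
            raise RuntimeError(f"simulated failure at iteration {it}")
        model = [w + 1.0 for w in model]  # stand-in for one Allreduce-based update
        store.save(model)  # checkpoint after every completed iteration
    return model


if __name__ == "__main__":
    store = CheckpointStore()
    try:
        train(store, num_iters=5, fail_at=3)
    except RuntimeError as err:
        print(err)  # simulated failure at iteration 3
    print(train(store, num_iters=5))  # resumes at iteration 3 -> [5.0]
```

Checkpointing after every iteration bounds the lost work to at most one iteration. In a distributed setting the nodes would also have to agree on which version to resume from; this sketch does not model that.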
Design Note
- Rabit is designed for algorithms that replicate the same global model across nodes, while each node operates on a local partition of the data.
- Global statistics are collected using Allreduce.
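The design note describes a common pattern: every node holds a replica of the model, computes statistics on its own partition, and combines them with Allreduce. A minimal sketch of that pattern, assuming the global statistic of interest is a mean and simulating a sum-Allreduce in a single process (`allreduce_sum` and `global_mean` are hypothetical names, not part of rabit):

```python
from typing import List


def allreduce_sum(buffers: List[List[float]]) -> List[List[float]]:
    # Simulated sum-Allreduce: every node receives the element-wise total.
    total = [sum(col) for col in zip(*buffers)]
    return [list(total) for _ in buffers]


def global_mean(partitions: List[List[float]]) -> List[float]:
    # Step 1: each node summarizes only its local partition as (sum, count).
    local_stats = [[sum(part), float(len(part))] for part in partitions]
    # Step 2: Allreduce combines the statistics; all nodes see the same totals.
    reduced = allreduce_sum(local_stats)
    # Step 3: each node derives the same global value from the shared totals,
    # so the replicated model stays identical across nodes.
    return [s / c for s, c in reduced]


if __name__ == "__main__":
    partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
    print(global_mean(partitions))  # [3.5, 3.5, 3.5]
```

Reducing the pair (sum, count) rather than per-node means matters when partitions differ in size: averaging the three local means here (2.0, 4.5, 6.0) would give about 4.17 instead of the correct 3.5.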
Design Goal
- rabit should run fast
- rabit is lightweight
- rabit digs safe burrows to avoid disasters