
rabit: Robust Allreduce and Broadcast Interface

rabit is a lightweight library that provides a fault-tolerant interface for Allreduce and Broadcast. It is designed to support easy implementation of distributed machine learning programs, many of which fit naturally under the Allreduce abstraction.
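
To make the two collectives concrete, the following sketch simulates them in a single process. It assumes each "node" is simply one entry of a vector. The names `SimulatedAllreduceSum` and `SimulatedBroadcast` are hypothetical and are not part of rabit.h. The sketch only shows the semantics: after Allreduce every node holds the same element-wise sum, and after Broadcast every node holds the root's value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-process stand-ins for the two collectives: each "node" is
// one entry of the outer vector. Illustration only, NOT the rabit API.

// Allreduce (sum): every node contributes a buffer of the same length, and
// afterwards every node holds the element-wise sum of all buffers.
void SimulatedAllreduceSum(std::vector<std::vector<double>>& node_buffers) {
  if (node_buffers.empty()) return;
  const std::size_t n = node_buffers[0].size();
  std::vector<double> total(n, 0.0);
  for (const auto& buf : node_buffers) {
    assert(buf.size() == n && "all nodes must pass buffers of equal length");
    for (std::size_t i = 0; i < n; ++i) total[i] += buf[i];
  }
  for (auto& buf : node_buffers) buf = total;  // every node gets the same result
}

// Broadcast: the value held by the root node is copied to every node.
void SimulatedBroadcast(std::vector<std::string>& node_values, std::size_t root) {
  const std::string value = node_values.at(root);
  for (auto& v : node_values) v = value;
}
```

For example, three nodes holding {1, 2}, {3, 4} and {5, 6} would each hold {9, 12} after `SimulatedAllreduceSum`. A real implementation exchanges these buffers over the network rather than looping over them in memory.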

Interface: rabit.h

Features

  • Portable library
    • Rabit is a library rather than a framework: a program only needs to link against the library to run, and is not restricted to a single framework.
  • Flexibility in programming
    • Programs call rabit functions in any sequence, as opposed to defining a limited set of functions that a framework then calls.
    • The program persists across all iterations, unless it fails and recovers.
  • Fault tolerance
    • Rabit programs can recover the model and the results of synchronization function calls
  • MPI compatible
    • Code using the rabit interface compiles naturally with existing MPI compilers
    • Users can fall back to MPI Allreduce if they like, with no code modification
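
One way to picture how the results of synchronization calls could be recovered is the following hypothetical sketch. It is not a description of rabit's internal protocol. It assumes that every node issues the same sequence of synchronization calls, so each call can be identified by its position in program order, and that surviving nodes keep a log of results. A restarted node re-runs its program from the start, and calls whose results are already logged are served from the log instead of being recomputed. `ResultLog` and `ReplayCalls` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical sketch of result recovery, NOT rabit's API or protocol.
// Assumption: every node issues the same sequence of synchronization calls,
// so a call can be identified by its sequence number in program order.
class ResultLog {
 public:
  // Remembers the result of synchronization call number `seq`.
  void Record(std::size_t seq, const std::vector<double>& result) {
    results_[seq] = result;
  }
  // Returns true and copies the result into `out` if call `seq` is logged.
  bool Lookup(std::size_t seq, std::vector<double>* out) const {
    auto it = results_.find(seq);
    if (it == results_.end()) return false;
    *out = it->second;
    return true;
  }

 private:
  std::map<std::size_t, std::vector<double>> results_;
};

// A restarted node replays calls 0 .. num_calls-1 from the beginning.
// `fresh_compute` stands in for running a real collective and is invoked only
// for calls missing from the log. Returns how many results were recovered.
template <typename ComputeFn>
std::size_t ReplayCalls(ResultLog& result_log, std::size_t num_calls,
                        ComputeFn fresh_compute,
                        std::vector<std::vector<double>>* results) {
  assert(results != nullptr);
  results->clear();
  std::size_t recovered = 0;
  for (std::size_t seq = 0; seq < num_calls; ++seq) {
    std::vector<double> r;
    if (result_log.Lookup(seq, &r)) {
      ++recovered;             // served from the log, no recomputation
    } else {
      r = fresh_compute(seq);  // new call: run it and log the result
      result_log.Record(seq, r);
    }
    results->push_back(r);
  }
  return recovered;
}
```

For example, if calls 0 and 1 are already logged, replaying three calls serves two of them from the log and performs only call 2 fresh. Because the node replays the same calls in the same order, the program can simply keep iterating after recovery.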

Design Note

  • Rabit is designed for algorithms that replicate the same global model across nodes, while each node operates on a local partition of the data.
  • Global statistics are collected using Allreduce
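
This design pattern can be sketched with a toy model, y = w * x, fitted by gradient descent on squared loss. The model is chosen only for illustration. Each node keeps an identical copy of `w` and its own data partition, and computes local statistics. A simulated Allreduce then combines those statistics. Because every node applies the same update to the same global statistics, the model copies stay identical. `Node` and `TrainStep` are hypothetical names, not rabit code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical simulation of the design pattern, NOT rabit code: every node
// holds an identical copy of the global model (one weight `w` for y = w * x)
// plus its own local partition of the data.
struct Node {
  double w = 0.0;                                // replicated global model
  std::vector<std::pair<double, double>> local;  // local (x, y) partition
};

// One gradient-descent step on the loss 0.5 * (w * x - y)^2, averaged over
// all examples on all nodes.
void TrainStep(std::vector<Node>& nodes, double learning_rate) {
  assert(learning_rate > 0.0);
  // 1. Each node computes its local statistics {gradient sum, count}.
  std::vector<std::vector<double>> local_stats;
  for (const Node& node : nodes) {
    double grad = 0.0;
    for (const auto& xy : node.local) {
      grad += (node.w * xy.first - xy.second) * xy.first;
    }
    local_stats.push_back({grad, static_cast<double>(node.local.size())});
  }
  // 2. Simulated Allreduce (sum): all nodes obtain the same global statistics.
  double global_grad = 0.0;
  double global_count = 0.0;
  for (const auto& s : local_stats) {
    global_grad += s[0];
    global_count += s[1];
  }
  if (global_count == 0.0) return;  // no data anywhere: nothing to update
  // 3. Every node applies the identical update, so the replicas stay in sync.
  for (Node& node : nodes) {
    node.w -= learning_rate * global_grad / global_count;
  }
}
```

With two nodes holding the points (1, 2) and (2, 4), one call to `TrainStep` with a learning rate of 0.25 moves both copies of `w` from 0 to 1.25. Repeated calls converge toward 2.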

Design Goal

  • rabit should run fast
  • rabit is lightweight
  • rabit digs safe burrows to avoid disasters