
FelixYBW e6cd74ead3 Set a minimal reducer size and parent_down size (#139 )

* Set a minimal reducer message size. Receive the same data size from the parent each time.

* When a parent reads from a child, check that it receives at least the minimal reduce size.
 Fix bug. Rewrite the minimal reducer size check to make sure each chunk is 1~N times the minimal reduce size.

 Assume the minimal reduce size is X. The logic is:
 1: each child uploads its total_size-byte message
 2: each parent receives at least X bytes, up to total_size
 3: the parent reduces X, NxX or total_size bytes
 4: the parent sends X, NxX or total_size bytes to its parent
 5: the parent's parent receives at least X bytes, up to total_size, then reduces X, NxX or total_size bytes
 6: the parent's parent sends X, NxX or total_size bytes to its children
 7: the parent receives X, NxX or total_size bytes and sends them to its children
 8: each child receives X, NxX or total_size bytes

 During the whole process, each transfer is a (1~N)xX-byte message, or up to total_size.

 If X is larger than total_size, allreduce always reduces the whole message and passes it down.

* Follow style check rule

* fix the cpplint check

* fix allreduce_base header seq

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>

2020-07-25 12:46:45 -07:00
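A minimal sketch of the size rule described in the commit message above, assuming a parent tracks how many bytes of a child's message have arrived so far. The function name ReduciblePrefix and its parameters are hypothetical illustrations, not rabit's actual code: it returns how much of the message prefix may be reduced now, which is always a multiple of the minimal reduce size X, or total_size once the whole message has arrived.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not taken from rabit's source).
//   received   - bytes of the child's message received so far (0..total_size)
//   min_size   - the minimal reduce size X, in bytes (must be > 0)
//   total_size - size of the whole message, in bytes
// Returns the length of the prefix that may be reduced right now:
//   0 while fewer than X bytes have arrived (keep waiting),
//   the largest multiple of X not exceeding `received` otherwise,
//   or total_size once the whole message is present.
std::size_t ReduciblePrefix(std::size_t received, std::size_t min_size,
                            std::size_t total_size) {
  assert(min_size > 0);
  assert(received <= total_size);
  if (received == total_size) {
    // Whole message is here: the final chunk may be shorter than X.
    return total_size;
  }
  // Round down to a multiple of X so every reduced chunk is (1~N)xX bytes.
  return (received / min_size) * min_size;
}
```

For example, with X = 4 and total_size = 10, a parent holding 9 bytes may reduce up to byte 8; if it has already reduced 4 bytes, the new chunk is 8 - 4 = 4 bytes, one multiple of X. If X exceeds total_size, the function returns 0 until the whole message is in, which matches the last point of the commit message.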

cmake

Clean up cmake script and code includes (#106 )

2019-09-26 02:29:04 -04:00

doc

exit when allreduce/broadcast error cause timeout (#112 )

2019-10-11 03:39:39 -04:00

guide

Fixed print statements and xrange to be compatibile with Python 2 and 3. (#55 )

2018-02-26 12:19:04 -08:00

include/rabit

De-duplicate macro _CRT_SECURE_NO_WARNINGS / _CRT_SECURE_NO_DEPRECATE (#136 )

2020-06-28 09:51:50 -07:00

lib

update doc

2015-01-03 05:20:18 -08:00

python

support bootstrap allreduce/broadcast (#98 )

2019-08-27 18:12:33 -07:00

scripts

De-duplicate macro _CRT_SECURE_NO_WARNINGS / _CRT_SECURE_NO_DEPRECATE (#136 )

2020-06-28 09:51:50 -07:00

src

Set a minimal reducer size and parent_down size (#139 )

2020-07-25 12:46:45 -07:00

test

De-duplicate macro _CRT_SECURE_NO_WARNINGS / _CRT_SECURE_NO_DEPRECATE (#136 )

2020-06-28 09:51:50 -07:00

.gitignore

Add SeekEnd to MemoryFixSizeBuffer. (#109 )

2019-10-13 00:09:25 -04:00

.travis.yml

Add RABIT_DLL tag to definitions of rabit APIs. (#140 )

2020-05-19 18:20:31 +08:00

CMakeLists.txt

Fix cmake variable. (#126 )

2019-11-05 01:27:08 -05:00

LICENSE

license

2015-02-11 14:49:51 -08:00

Makefile

support bootstrap allreduce/broadcast (#98 )

2019-08-27 18:12:33 -07:00

README.md

remove is_bootstrap parameter (#102 )

2019-09-10 11:45:50 -07:00


Rabit: Reliable Allreduce and Broadcast Interface

rabit is a lightweight library that provides a fault-tolerant interface for Allreduce and Broadcast. It is designed to support easy implementation of distributed machine learning programs, many of which fall naturally under the Allreduce abstraction. The goal of rabit is to support portable, scalable and reliable distributed machine learning programs.
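To make the Allreduce abstraction concrete, here is a single-process illustration of what a sum-Allreduce computes: every worker contributes a buffer of the same length, and every worker ends up holding the element-wise sum. The function SimulatedAllreduceSum is hypothetical and models only the result, not rabit's network protocol or its fault tolerance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustration only: computes the *result* of an element-wise Sum Allreduce
// across several workers inside one process. rabit performs the reduction
// over the network (its C++ header exposes a call of the form
// rabit::Allreduce<rabit::op::Sum>(buf, count)); this toy version only shows
// what every worker holds afterwards.
void SimulatedAllreduceSum(std::vector<std::vector<double>>* worker_buffers) {
  assert(!worker_buffers->empty());
  const std::size_t count = worker_buffers->front().size();
  std::vector<double> total(count, 0.0);
  // Reduce: add every worker's buffer element by element.
  for (const auto& buf : *worker_buffers) {
    assert(buf.size() == count);  // all workers contribute equal-length buffers
    for (std::size_t i = 0; i < count; ++i) total[i] += buf[i];
  }
  // Broadcast: every worker receives the same reduced result.
  for (auto& buf : *worker_buffers) buf = total;
}
```

Because every worker ends with identical data, the next iteration can proceed on all workers without a separate broadcast step, which is why iterative learners fit the abstraction well.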

Tutorial
API Documentation
You can also directly read the interface header
XGBoost
- Rabit is one of the backbone libraries that support distributed XGBoost

Features

All these features come from facts about small rabbits :)

Portable: rabit is lightweight and runs everywhere
- Rabit is a library instead of a framework; a program only needs to link the library to run
- Rabit only relies on a mechanism to start programs, which is provided by most frameworks
- You can run rabit programs on many platforms, including Yarn (Hadoop) and MPI, using the same code
Scalable and Flexible: rabit runs fast
- Rabit programs use Allreduce to communicate and do not pay the between-iteration cost of the MapReduce abstraction.
- Programs can call rabit functions in any order, as opposed to frameworks that register callbacks and call them on the program's behalf (the inversion of control principle).
- Programs persist over all the iterations, unless they fail and recover.
Reliable: rabit digs burrows to avoid disasters
- Rabit programs can recover the model and results using synchronous function calls.
- Rabit programs can set rabit_boostrap_cache=1 to support allreduce/broadcast operations before LoadCheckPoint, following the call sequence rabit::Init(); -> rabit::Allreduce(); -> rabit::LoadCheckPoint(); -> for () { rabit::Allreduce(); rabit::CheckPoint(); } -> rabit::Finalize();
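The checkpoint-and-recover loop from the Reliable bullets can be illustrated with a toy in-process model. ToyCheckpointStore and RunTraining are hypothetical names; the sketch assumes, as rabit's interface header documents for LoadCheckPoint, that loading returns the version of the last saved checkpoint (0 when nothing was saved) and that each checkpoint increases the version by one, so a restarted worker resumes from that iteration instead of from scratch. It does not model the bootstrap cache or any network communication.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for rabit's checkpoint storage (hypothetical, in-process only).
struct ToyCheckpointStore {
  int version = 0;            // number of completed, checkpointed iterations
  std::vector<double> saved;  // model as of the last checkpoint

  // Like LoadCheckPoint: restores the model if a checkpoint exists and returns
  // its version; returns 0 and leaves the model untouched otherwise.
  int Load(std::vector<double>* model) const {
    if (version > 0) *model = saved;
    return version;
  }
  // Like CheckPoint: saves the model and bumps the version by one.
  void Save(const std::vector<double>& model) {
    saved = model;
    ++version;
  }
};

// Runs iterations [start, max_iter), where start is the restored version.
// If fail_at >= 0, the worker "crashes" just before running that iteration.
// Returns true if all iterations completed.
bool RunTraining(ToyCheckpointStore* store, int max_iter, int fail_at,
                 std::vector<double>* model) {
  model->assign(2, 0.0);           // fresh initial model
  int start = store->Load(model);  // resume from the last checkpoint, if any
  for (int iter = start; iter < max_iter; ++iter) {
    if (iter == fail_at) return false;       // simulated crash
    for (double& w : *model) w += iter + 1;  // stand-in for Allreduce + update
    store->Save(*model);                     // checkpoint after each iteration
  }
  return true;
}
```

Running RunTraining with a crash at iteration 3 leaves the store at version 3; calling it again resumes at iteration 3 and produces the same model as an uninterrupted run.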

Use Rabit

Typing make in the root folder compiles the rabit library into the lib folder
Add lib to the library path and include to the include path of the compiler
Languages: You can use rabit in C++ and Python
- It is also possible to port the library to other languages

Contributing

Rabit is an open-source library, and contributions are welcome, including:

The rabit core library.
Customized tracker scripts for new platforms and interfaces for new languages.
Tutorials and examples for the library.

Languages

C++ 45.5%, Python 20.3%, Cuda 15.2%, R 6.8%, Scala 6.4%, Other 5.6%