adding changes suggested by Tianqi

nachocano 2015-01-15 01:21:40 -08:00
parent bddfa2fc24
commit 86e61ad6a5


@@ -58,6 +58,24 @@ Rabit provides different reduction operators, for example, if you change ```op:
the reduction operation will be a summation, and the result will become ```a = {1, 3, 5}```.
You can also run the example with a different number of processes by changing the value of -n.
To make the library available to a wider range of developers, we provide a Python wrapper for our C++ code. Developers can now write rabit applications in Python! The Python version of the same example can be found in [basic.py](basic.py):
```python
import numpy as np
import rabit

rabit.init()
n = 3
rank = rabit.get_rank()
a = np.zeros(n)
for i in range(n):
    a[i] = rank + i
print('@node[%d] before-allreduce: a=%s' % (rank, str(a)))
# element-wise maximum across all workers; every worker receives the result
a = rabit.allreduce(a, rabit.MAX)
print('@node[%d] after-allreduce: a=%s' % (rank, str(a)))
rabit.finalize()
```
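To check the numbers quoted above without launching any workers, the sketch below simulates an Allreduce inside a single Python process. It assumes two workers (the setting in which summation yields `a = {1, 3, 5}`): rank 0 starts with `[0, 1, 2]` and rank 1 with `[1, 2, 3]`. `simulate_allreduce`, `local_array`, and the `OPS` table are hypothetical helpers for illustration only, not part of the rabit API; in a real program the call is `rabit.allreduce`, as in basic.py.

```python
from functools import reduce

# Element-wise operators (hypothetical table; rabit itself exposes constants
# such as rabit.MAX, as used in basic.py).
OPS = {
    "max": max,
    "min": min,
    "sum": lambda x, y: x + y,
}


def local_array(rank, n=3):
    """The array each worker builds before the call: a[i] = rank + i."""
    return [rank + i for i in range(n)]


def simulate_allreduce(arrays, op):
    """Single-process stand-in for Allreduce (hypothetical helper).

    arrays[r] is worker r's local array. After the call every worker holds
    the same element-wise reduction, so one copy is returned per worker.
    """
    combine = OPS[op]
    # zip(*arrays) groups position i of every worker's array together.
    reduced = [reduce(combine, column) for column in zip(*arrays)]
    return [list(reduced) for _ in arrays]


if __name__ == "__main__":
    world_size = 2  # two workers, matching the summation result quoted above
    before = [local_array(rank) for rank in range(world_size)]
    for op in ("max", "sum"):
        after = simulate_allreduce(before, op)
        for rank, a in enumerate(after):
            print("@node[%d] %s-allreduce: a=%s" % (rank, op, a))
```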
Broadcast is another method provided by rabit besides Allreduce. This function allows one node to broadcast its
local data to all other nodes. The following code in [broadcast.cc](broadcast.cc) broadcasts a string from
node 0 to all other nodes.
@@ -85,6 +103,22 @@ The following command starts the program with three worker processes.
```
Besides strings, rabit also allows you to broadcast constant-size arrays and vectors.
The Python counterpart can be found in [broadcast.py](broadcast.py). Here is a snippet that shows how simple it is to use the wrapper:
```python
import rabit

rabit.init()
rank = rabit.get_rank()
s = None
if rank == 0:
    # only the root (rank 0) provides the data; the other ranks pass None
    s = {'hello world': 100, 2: 3}
print('@node[%d] before-broadcast: s="%s"' % (rank, str(s)))
s = rabit.broadcast(s, 0)
print('@node[%d] after-broadcast: s="%s"' % (rank, str(s)))
rabit.finalize()
```
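The dict in this snippet has no fixed size, so it must be serialized before it can be sent. One plausible scheme is for the root to serialize the object, broadcast the byte length first so that receivers know how much to allocate, and then broadcast the payload. The sketch below runs in a single process and assumes `pickle` as the serializer; `simulate_broadcast` is a hypothetical helper, not the wrapper's actual implementation.

```python
import pickle
import struct


def simulate_broadcast(local_objects, root):
    """Single-process stand-in for broadcasting a Python object (hypothetical).

    local_objects[r] is what worker r passes in; only the root's value is
    used. Returns the object each worker holds after the call.
    """
    payload = pickle.dumps(local_objects[root], protocol=pickle.HIGHEST_PROTOCOL)
    # Phase 1: the size travels as a fixed-size 8-byte integer, so every
    # receiver can learn how many bytes to expect before the payload arrives.
    header = struct.pack("<Q", len(payload))
    results = []
    for rank, obj in enumerate(local_objects):
        if rank == root:
            results.append(obj)  # the root keeps its original object
            continue
        (size,) = struct.unpack("<Q", header)
        # Phase 2: receive exactly `size` bytes and rebuild a private copy.
        received = payload[:size]
        results.append(pickle.loads(received))
    return results


if __name__ == "__main__":
    inputs = [None, None, None]  # three workers; only rank 0 has data
    inputs[0] = {'hello world': 100, 2: 3}
    for rank, s in enumerate(simulate_broadcast(inputs, root=0)):
        print('@node[%d] after-broadcast: s="%s"' % (rank, s))
```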
Common Use Case
=====
Many distributed machine learning algorithms involve splitting the data into different nodes,
@@ -266,13 +300,4 @@ recovered node fetches its latest checkpoint and the results of
Allreduce/Broadcast calls after the checkpoint from some alive nodes.
This is only a conceptual introduction to rabit's fault-tolerance model. The actual implementation is more sophisticated
and can handle more complicated cases, such as the failure of multiple nodes or a node failing during the recovery phase.
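The recovery idea described above can be illustrated with a toy, single-process model: a surviving node keeps its latest checkpoint together with the results of every collective call made since then, and a restarted node copies both and replays the cached results instead of re-running those calls. `AliveNode`, `RecoveringNode`, and the `compute` callback (a stand-in for actually running the collective with the other workers) are hypothetical names. The sketch fetches from a single alive node and ignores the harder cases mentioned above.

```python
class AliveNode:
    """State kept by a surviving worker (hypothetical toy model)."""

    def __init__(self):
        self.version = 0        # number of checkpoints taken so far
        self.checkpoint = None  # model saved at the latest checkpoint
        self.results = []       # results of collective calls since that checkpoint

    def take_checkpoint(self, model):
        self.version += 1
        self.checkpoint = model
        self.results = []       # results older than the checkpoint are no longer needed

    def record(self, result):
        self.results.append(result)


class RecoveringNode:
    """A restarted worker that catches up from an alive peer."""

    def __init__(self, peer):
        # Fetch the latest checkpoint and the cached results from the peer.
        self.version = peer.version
        self.model = peer.checkpoint
        self._cached = list(peer.results)
        self._seq = 0           # index of the next collective call

    def collective(self, compute):
        # Calls that already completed before the failure are answered from
        # the cache; only later calls actually run `compute`.
        if self._seq < len(self._cached):
            result = self._cached[self._seq]
        else:
            result = compute()
        self._seq += 1
        return result


if __name__ == "__main__":
    alive = AliveNode()
    alive.take_checkpoint([0.0, 0.0])
    alive.record([1.0, 2.0])    # two Allreduce calls finish after the checkpoint,
    alive.record([3.0, 4.0])    # then another worker fails and restarts

    node = RecoveringNode(alive)
    print(node.version, node.model)
    print(node.collective(lambda: "recomputed"))  # served from the cache
    print(node.collective(lambda: "recomputed"))  # served from the cache
    print(node.collective(lambda: "recomputed"))  # first call past the cache
```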
Python Wrapper
=====
To make the library available to a wider range of developers, we provide a Python wrapper for our C++ code.
Developers can now write rabit applications in Python! We provide a couple of examples:
* [./basic.py](./basic.py): the Python counterpart of [./basic.cc](./basic.cc), explained above.
* [./broadcast.py](./broadcast.py): the Python counterpart of [./broadcast.cc](./broadcast.cc), explained above.