update docs

tqchen 2015-03-09 13:00:34 -07:00
parent a607047aa1
commit 6a1167611c
7 changed files with 34 additions and 13 deletions


@@ -13,7 +13,7 @@ All these features come from the facts about small rabbit:)
* Portable: rabit is lightweight and runs everywhere
  - Rabit is a library instead of a framework; a program only needs to link the library to run
  - Rabit only relies on a mechanism to start programs, which is provided by most frameworks
-  - You can run rabit programs on many platforms, including Hadoop, MPI using the same code
+  - You can run rabit programs on many platforms, including Yarn(Hadoop) and MPI, using the same code
* Scalable and Flexible: rabit runs fast
  * Rabit programs use Allreduce to communicate, and do not suffer the cost between iterations of the MapReduce abstraction (see the sketch below).
  - Programs can call rabit functions in any order, as opposed to frameworks where callbacks are offered and called by the framework, i.e. the inversion of control principle.
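The bullets above lean on the Allreduce pattern; as a quick illustration of its semantics (every worker contributes a vector, and every worker gets back the same reduced result), here is a minimal local simulation. It is a sketch only and does not use rabit's actual API.
```python
# Local simulation of Allreduce semantics (illustration only, not rabit's API).
# Real rabit workers are separate processes that exchange data over the network;
# here n workers are modeled as a list of local vectors.

def allreduce_sum(worker_values):
    """Each worker contributes a vector; all workers receive the element-wise sum."""
    reduced = [sum(column) for column in zip(*worker_values)]
    # every worker ends up holding an identical copy of the reduced result
    return [list(reduced) for _ in worker_values]

workers = [[1, 2], [3, 4], [5, 6]]  # the local vector on each of 3 workers
print(allreduce_sum(workers))       # every worker now sees [9, 12]
```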


@@ -341,12 +341,11 @@ Rabit is a portable library that can run on multiple platforms.
* This script will restart the program when it exits with -2, so it can be used for [mock test](#link-against-mock-test-library)
#### Running Rabit on Hadoop
-* You can use [../tracker/rabit_hadoop.py](../tracker/rabit_hadoop.py) to run rabit programs on hadoop
-* This will start n rabit programs as mappers of MapReduce
-* Each program can read its portion of data from stdin
-* Yarn (Hadoop 2.0 or higher) is highly recommended, since Yarn allows specifying the number of cpus and the memory of each mapper:
+* You can use [../tracker/rabit_yarn.py](../tracker/rabit_yarn.py) to run rabit programs as Yarn applications
+* This will start rabit programs as Yarn applications
  - This allows multi-threading programs in each node, which can be more efficient
  - An easy multi-threading solution could be to use OpenMP with rabit code
+* It is also possible to run rabit programs via hadoop streaming; however, YARN is highly recommended.
#### Running Rabit using MPI
* You can submit rabit programs to an MPI cluster using [../tracker/rabit_mpi.py](../tracker/rabit_mpi.py).
@@ -358,15 +357,15 @@ tracker scripts, such as [../tracker/rabit_hadoop.py](../tracker/rabit_hadoop.py
You will need to implement a platform-dependent submission function with the following definition
```python
-def fun_submit(nworkers, worker_args):
+def fun_submit(nworkers, worker_args, worker_envs):
    """
    customized submit script that submits nslave jobs;
    each must contain args as a parameter
    note this can be a lambda closure
    Parameters
      nworkers number of worker processes to start
-     worker_args tracker information which must be passed to the arguments
-       this usually includes the parameters of master_uri and port, etc.
+     worker_args additional arguments that need to be passed to the worker
+     worker_envs environment variables that need to be set for the worker
    """
```
The submission function should start nworkers processes on the platform, and append worker_args to the end of the other arguments.
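For concreteness, here is a minimal sketch of such a submission function that simply forks local worker processes. Only the fun_submit signature and its contract (start nworkers processes, append worker_args, set worker_envs) come from the text above; the worker command is a hypothetical placeholder, and a real submitter would hand the job to Yarn, MPI, or another platform instead.
```python
import os
import subprocess

def fun_submit(nworkers, worker_args, worker_envs):
    # './my_rabit_program' is a hypothetical placeholder for the actual worker binary
    worker_cmd = ['./my_rabit_program']
    procs = []
    for _ in range(nworkers):
        env = os.environ.copy()
        # set the environment variables required by the tracker
        env.update({k: str(v) for k, v in worker_envs.items()})
        # append worker_args to the end of the program's own arguments
        procs.append(subprocess.Popen(worker_cmd + list(worker_args), env=env))
    for proc in procs:
        proc.wait()
```
Such a function is then handed to ```tracker.submit``` as described in the next hunk; the exact wiring of that call is platform-specific and not shown here.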
@@ -374,7 +373,7 @@ Then you can simply call ```tracker.submit``` with fun_submit to submit jobs to
Note that the current rabit tracker does not restart a worker when it dies; restarting a node is handled by the platform, otherwise the fail-restart logic has to be written into the custom script.
* Fail-restart is usually provided by most platforms.
-  * For example, mapreduce will restart a mapper when it fails
+  - rabit-yarn provides such functionality in YARN

Fault Tolerance
=====


@@ -5,6 +5,7 @@ It also contains links to the Machine Learning packages that use rabit.
* Contributions of toolkits, examples, and benchmarks are more than welcome!
Toolkits
====
* [KMeans Clustering](kmeans)
@@ -14,5 +15,3 @@
    10 times faster than existing packages
  - Rabit carries xgboost to a distributed environment, inheriting all the benefits of the xgboost
    single node version, and scaling it to even larger problems

tracker/README.md (new file)

@@ -0,0 +1,12 @@
Trackers
=====
This folder contains tracker scripts that can be used to submit rabit jobs to different platforms;
the example guidelines are in the scripts themselves.
***Supported Platforms***
* Local demo: [rabit_demo.py](rabit_demo.py)
* MPI: [rabit_mpi.py](rabit_mpi.py)
* Yarn (Hadoop): [rabit_yarn.py](rabit_yarn.py)
  - It is also possible to submit via hadoop streaming with rabit_hadoop_streaming.py
  - However, it is highly recommended to use rabit_yarn.py, because it allocates resources more precisely and better fits machine learning scenarios


@@ -1,7 +1,11 @@
#!/usr/bin/python
"""
+Deprecated
This is a script to submit rabit jobs using hadoop streaming.
It will submit the rabit process as mappers of MapReduce.
+This script is deprecated; it is highly recommended to use rabit_yarn.py instead
"""
import argparse
import sys
@@ -91,6 +95,8 @@ out = out.split('\n')[0].split()
assert out[0] == 'Hadoop', 'cannot parse hadoop version string'
hadoop_version = out[1].split('.')
use_yarn = int(hadoop_version[0]) >= 2
+if use_yarn:
+    warnings.warn('It is highly recommended to use rabit_yarn.py to submit jobs to yarn instead', stacklevel=2)
print 'Current Hadoop Version is %s' % out[1]


@@ -1,7 +1,7 @@
#!/usr/bin/python
"""
-This is a script to submit rabit job using hadoop streaming.
-It will submit the rabit process as mappers of MapReduce.
+This is a script to submit rabit jobs via Yarn;
+rabit will run as a Yarn application.
"""
import argparse
import sys

yarn/README.md (new file)

@@ -0,0 +1,5 @@
rabit-yarn
=====
* This folder contains the application code that allows rabit to run on Yarn.
* You can use [../tracker/rabit_yarn.py](../tracker/rabit_yarn.py) to submit the job
  - run ```./build.sh``` to build the jar before using the script (see the sketch after this list)
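A sketch of the implied workflow, assuming you start inside the yarn folder; the ```--help``` flag is assumed only because the script is argparse-based, which provides it automatically:
```
./build.sh                        # build the rabit-yarn application jar first
../tracker/rabit_yarn.py --help   # then inspect the submission options
```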