update docs

parent a607047aa1
commit 6a1167611c

@@ -13,7 +13,7 @@ All these features come from the facts about the small rabbit :)
 * Portable: rabit is lightweight and runs everywhere
   - Rabit is a library instead of a framework; a program only needs to link the library to run
   - Rabit only relies on a mechanism to start programs, which is provided by most frameworks
-  - You can run rabit programs on many platforms, including Hadoop, MPI using the same code
+  - You can run rabit programs on many platforms, including Yarn(Hadoop) and MPI, using the same code
 * Scalable and Flexible: rabit runs fast
   - Rabit programs use Allreduce to communicate, and do not suffer the per-iteration cost of the MapReduce abstraction.
   - Programs can call rabit functions in any order, as opposed to frameworks where callbacks are offered and called by the framework, i.e. the inversion of control principle.
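To make the Allreduce point above concrete, here is a toy single-process simulation of an Allreduce with a sum reducer (illustrative only; this is not rabit's API — in rabit every worker would call Allreduce on its own data and receive the reduced result):

```python
def allreduce_sum(worker_values):
    """Toy simulation of Allreduce with a sum reducer: every worker
    contributes one vector, and every worker receives back the same
    element-wise sum of all contributions."""
    total = [sum(col) for col in zip(*worker_values)]
    return [list(total) for _ in worker_values]

# three "workers", each holding a length-2 vector; after the call
# all of them hold the identical reduced vector
results = allreduce_sum([[1, 2], [3, 4], [5, 6]])
```

Unlike MapReduce, the reduced result stays in every worker's memory, so iterative algorithms can keep calling Allreduce without re-reading data between iterations.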

@@ -341,12 +341,11 @@ Rabit is a portable library that can run on multiple platforms.
 * This script will restart the program when it exits with -2, so it can be used for [mock test](#link-against-mock-test-library)
 
 #### Running Rabit on Hadoop
-* You can use [../tracker/rabit_hadoop.py](../tracker/rabit_hadoop.py) to run rabit programs on hadoop
+* You can use [../tracker/rabit_yarn.py](../tracker/rabit_yarn.py) to run rabit programs as a Yarn application
-* This will start n rabit programs as mappers of MapReduce
+* This will start rabit programs as Yarn applications
-* Each program can read its portion of data from stdin
-* Yarn (Hadoop 2.0 or higher) is highly recommended, since Yarn allows specifying the number of cpus and the memory of each mapper:
   - This allows multi-threaded programs in each node, which can be more efficient
   - An easy multi-threading solution could be to use OpenMP with rabit code
+* It is also possible to run rabit programs via hadoop streaming; however, YARN is highly recommended.
 
 #### Running Rabit using MPI
 * You can submit rabit programs to an MPI cluster using [../tracker/rabit_mpi.py](../tracker/rabit_mpi.py).

@@ -358,15 +357,15 @@ tracker scripts, such as [../tracker/rabit_hadoop.py](../tracker/rabit_hadoop.py
 
 You will need to implement a platform dependent submission function with the following definition
 ```python
-def fun_submit(nworkers, worker_args):
+def fun_submit(nworkers, worker_args, worker_envs):
     """
     customized submit script that submits nworkers jobs,
     each of which must take args as a parameter
     note this can be a lambda closure
     Parameters
       nworkers    number of worker processes to start
-      worker_args tracker information which must be passed to the arguments
+      worker_args additional arguments that need to be passed to the worker
-      this usually includes the parameters of master_uri and port, etc.
+      worker_envs environment variables that need to be set for the worker
     """
 ```
 The submission function should start nworkers processes on the platform, appending worker_args to the end of the other arguments.
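As a concrete illustration, a local-machine submitter matching the signature above could look like the following sketch. The worker command, the closure helper `make_submit`, and the environment-variable name used in the demo are assumptions for illustration, not part of rabit's tracker API:

```python
import os
import subprocess
import sys

def make_submit(worker_cmd):
    """Build a platform-specific fun_submit as a closure over a fixed
    worker command (worker_cmd is a hypothetical example, e.g. ['./my_worker'])."""
    def fun_submit(nworkers, worker_args, worker_envs):
        env = os.environ.copy()
        # expose tracker-provided environment variables to every worker
        env.update({k: str(v) for k, v in worker_envs.items()})
        # start nworkers processes, appending worker_args after the command
        procs = [subprocess.Popen(list(worker_cmd) + list(worker_args), env=env)
                 for _ in range(nworkers)]
        # a real submitter would hand the processes to the platform;
        # waiting here keeps the sketch simple
        return [p.wait() for p in procs]
    return fun_submit

# usage sketch: launch two trivial Python workers
submit = make_submit([sys.executable, '-c', 'pass'])
exit_codes = submit(2, [], {'RABIT_DEMO_ENV': '1'})
```

A lambda closure, as the docstring suggests, works the same way: the platform-specific details (queue name, resource limits) are captured in the closure, while the tracker only sees the `(nworkers, worker_args, worker_envs)` interface.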

@@ -374,7 +373,7 @@ Then you can simply call ```tracker.submit``` with fun_submit to submit jobs to
 
 Note that the current rabit tracker does not restart a worker when it dies; restarting a node is done by the platform, otherwise we would have to write the fail-restart logic in the custom script.
 * Fail-restart is usually provided by most platforms.
-  * For example, mapreduce will restart a mapper when it fails
+  - rabit-yarn provides such functionality in YARN
 
 Fault Tolerance
 =====

@@ -5,6 +5,7 @@ It also contains links to the Machine Learning packages that use rabit.
 
 * Contribution of toolkits, examples, and benchmarks is more than welcome!
 
+
 Toolkits
 ====
 * [KMeans Clustering](kmeans)

@@ -14,5 +15,3 @@ Toolkits
 10 times faster than existing packages
   - Rabit carries xgboost to distributed environments, inheriting all the benefits of xgboost
 single node version, and scale it to even larger problems
-
-

tracker/README.md (new file, +12 lines)
@@ -0,0 +1,12 @@
+Trackers
+=====
+This folder contains tracker scripts that can be used to submit rabit jobs to different platforms;
+the usage guidelines are in the scripts themselves
+
+***Supported Platforms***
+* Local demo: [rabit_demo.py](rabit_demo.py)
+* MPI: [rabit_mpi.py](rabit_mpi.py)
+* Yarn (Hadoop): [rabit_yarn.py](rabit_yarn.py)
+  - It is also possible to submit via hadoop streaming with rabit_hadoop_streaming.py
+  - However, it is highly recommended to use rabit_yarn.py, because it allocates resources more precisely and fits machine learning scenarios
+

@@ -1,7 +1,11 @@
 #!/usr/bin/python
 """
+Deprecated
+
 This is a script to submit rabit jobs using hadoop streaming.
 It will submit the rabit processes as mappers of MapReduce.
+
+This script is deprecated; it is highly recommended to use rabit_yarn.py instead
 """
 import argparse
 import sys

@@ -91,6 +95,8 @@ out = out.split('\n')[0].split()
 assert out[0] == 'Hadoop', 'cannot parse hadoop version string'
 hadoop_version = out[1].split('.')
 use_yarn = int(hadoop_version[0]) >= 2
+if use_yarn:
+    warnings.warn('It is highly recommended to use rabit_yarn.py to submit jobs to yarn instead', stacklevel = 2)
 
 print 'Current Hadoop Version is %s' % out[1]
 
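The version check in this script boils down to the following self-contained helper (the function name `parse_hadoop_version` is illustrative, not part of the script):

```python
def parse_hadoop_version(version_output):
    """Parse the first line of `hadoop version` output, e.g. 'Hadoop 2.6.0',
    into a (major, minor) tuple; raises AssertionError on unexpected input."""
    fields = version_output.split('\n')[0].split()
    assert fields[0] == 'Hadoop', 'cannot parse hadoop version string'
    major, minor = (int(x) for x in fields[1].split('.')[:2])
    return major, minor

major, minor = parse_hadoop_version('Hadoop 2.6.0')
use_yarn = major >= 2  # Yarn ships with Hadoop 2.0 or higher
```

This is why the deprecation warning triggers only on Hadoop 2.x clusters: those are exactly the installations where rabit_yarn.py is available as the better option.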

@@ -1,7 +1,7 @@
 #!/usr/bin/python
 """
-This is a script to submit rabit job using hadoop streaming.
-It will submit the rabit process as mappers of MapReduce.
+This is a script to submit rabit jobs via Yarn;
+rabit will run as a Yarn application
 """
 import argparse
 import sys

yarn/README.md (new file, +5 lines)
@@ -0,0 +1,5 @@
+rabit-yarn
+=====
+* This folder contains the application code that allows rabit to run on Yarn.
+* You can use [../tracker/rabit_yarn.py](../tracker/rabit_yarn.py) to submit the job
+  - run ```./build.sh``` to build the jar before using the script