diff --git a/README.md b/README.md
index 752c27d43..a8ea1de9e 100644
--- a/README.md
+++ b/README.md
@@ -13,7 +13,7 @@ All these features comes from the facts about small rabbit:)
 * Portable: rabit is light weight and runs everywhere
   - Rabit is a library instead of a framework, a program only needs to link the library to run
   - Rabit only replies on a mechanism to start program, which was provided by most framework
-  - You can run rabit programs on many platforms, including Hadoop, MPI using the same code
+  - You can run rabit programs on many platforms, including Yarn (Hadoop) and MPI, using the same code
 * Scalable and Flexible: rabit runs fast
   * Rabit program use Allreduce to communicate, and do not suffer the cost between iterations of MapReduce abstraction.
   - Programs can call rabit functions in any order, as opposed to frameworks where callbacks are offered and called by the framework, i.e. inversion of control principle.
diff --git a/guide/README.md b/guide/README.md
index 41c4b9982..e4ee14ed7 100644
--- a/guide/README.md
+++ b/guide/README.md
@@ -341,12 +341,11 @@ Rabit is a portable library that can run on multiple platforms.
 * This script will restart the program when it exits with -2, so it can be used for [mock test](#link-against-mock-test-library)

 #### Running Rabit on Hadoop
-* You can use [../tracker/rabit_hadoop.py](../tracker/rabit_hadoop.py) to run rabit programs on hadoop
-* This will start n rabit programs as mappers of MapReduce
-* Each program can read its portion of data from stdin
-* Yarn(Hadoop 2.0 or higher) is highly recommended, since Yarn allows specifying number of cpus and memory of each mapper:
+* You can use [../tracker/rabit_yarn.py](../tracker/rabit_yarn.py) to run rabit programs as a Yarn application
+* This will start rabit programs as Yarn applications
   - This allows multi-threading programs in each node, which can be more efficient
   - An easy multi-threading solution could be to use OpenMP with rabit code
+* It is also possible to run rabit programs via Hadoop Streaming; however, YARN is highly recommended.

 #### Running Rabit using MPI
 * You can submit rabit programs to an MPI cluster using [../tracker/rabit_mpi.py](../tracker/rabit_mpi.py).
@@ -358,15 +357,15 @@ tracker scripts, such as [../tracker/rabit_hadoop.py](../tracker/rabit_hadoop.py
 You will need to implement a platform dependent submission function with the following definition
 ```python
-def fun_submit(nworkers, worker_args):
+def fun_submit(nworkers, worker_args, worker_envs):
     """
       customized submit script, that submits nslave jobs, each must contain args as parameter
      note this can be a lambda closure

       Parameters
          nworkers number of worker processes to start
-         worker_args tracker information which must be passed to the arguments
-             this usually includes the parameters of master_uri and port, etc.
+         worker_args additional arguments that need to be passed to the worker
+         worker_envs environment variables that need to be set for the worker
     """
 ```

 The submission function should start nworkers processes in the platform, and append worker_args to the end of the other arguments.
@@ -374,7 +373,7 @@ Then you can simply call ```tracker.submit``` with fun_submit to submit jobs to
 Note that the current rabit tracker does not restart a worker when it dies, the restart of a node is done by the platform, otherwise we should write the fail-restart logic in the custom script.
 * Fail-restart is usually provided by most platforms.
-* For example, mapreduce will restart a mapper when it fails
+  - rabit-yarn provides such functionality in YARN

 Fault Tolerance
 =====
diff --git a/rabit-learn/README.md b/rabit-learn/README.md
index 64e39b3e2..dc6b791ac 100644
--- a/rabit-learn/README.md
+++ b/rabit-learn/README.md
@@ -5,6 +5,7 @@ It also contain links to the Machine Learning packages that uses rabit.
 * Contribution of toolkits, examples, benchmarks is more than welcomed!
+
 Toolkits
 ====
 * [KMeans Clustering](kmeans)
@@ -14,5 +15,3 @@ Toolkits
     10 times faster than existing packages
   - Rabit carries xgboost to distributed enviroment, inheritating all the benefits of
     xgboost single node version, and scale it to even larger problems
-
-
diff --git a/tracker/README.md b/tracker/README.md
new file mode 100644
index 000000000..accf4dbc0
--- /dev/null
+++ b/tracker/README.md
@@ -0,0 +1,12 @@
+Trackers
+=====
+This folder contains tracker scripts that can be used to submit rabit jobs to different platforms;
+the usage guidelines are in the scripts themselves.
+
+***Supported Platforms***
+* Local demo: [rabit_demo.py](rabit_demo.py)
+* MPI: [rabit_mpi.py](rabit_mpi.py)
+* Yarn (Hadoop): [rabit_yarn.py](rabit_yarn.py)
+  - It is also possible to submit via Hadoop Streaming with rabit_hadoop_streaming.py
+  - However, it is highly recommended to use rabit_yarn.py, because it allocates resources more precisely and fits machine learning scenarios better
+
diff --git a/tracker/rabit_hadoop_streaming.py b/tracker/rabit_hadoop_streaming.py
index d2b47adf9..2587a6872 100755
--- a/tracker/rabit_hadoop_streaming.py
+++ b/tracker/rabit_hadoop_streaming.py
@@ -1,7 +1,12 @@
 #!/usr/bin/python
 """
+Deprecated
+
 This is a script to submit rabit job using hadoop streaming.
 It will submit the rabit process as mappers of MapReduce.
+
+This script is deprecated; it is highly recommended to use rabit_yarn.py instead.
 """
 import argparse
 import sys
+import warnings
@@ -91,6 +96,8 @@
 out = out.split('\n')[0].split()
 assert out[0] == 'Hadoop', 'cannot parse hadoop version string'
 hadoop_version = out[1].split('.')
 use_yarn = int(hadoop_version[0]) >= 2
+if use_yarn:
+    warnings.warn('It is highly recommended to use rabit_yarn.py to submit jobs to yarn instead', stacklevel=2)
 print 'Current Hadoop Version is %s' % out[1]
diff --git a/tracker/rabit_yarn.py b/tracker/rabit_yarn.py
index ae391fffd..3a4937278 100755
--- a/tracker/rabit_yarn.py
+++ b/tracker/rabit_yarn.py
@@ -1,7 +1,7 @@
 #!/usr/bin/python
 """
-This is a script to submit rabit job using hadoop streaming.
-It will submit the rabit process as mappers of MapReduce.
+This is a script to submit rabit jobs via Yarn.
+Rabit will run as a Yarn application.
 """
 import argparse
 import sys
diff --git a/yarn/README.md b/yarn/README.md
new file mode 100644
index 000000000..a1f924fd9
--- /dev/null
+++ b/yarn/README.md
@@ -0,0 +1,5 @@
+rabit-yarn
+=====
+* This folder contains the application code that allows rabit to run on Yarn.
+* You can use [../tracker/rabit_yarn.py](../tracker/rabit_yarn.py) to submit the job
+  - Run ```./build.sh``` to build the jar before using the script
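The guide change above only documents the contract of the platform dependent submission function, `fun_submit(nworkers, worker_args, worker_envs)`. The sketch below illustrates one way such a function could be written: a minimal local-process submitter, assuming `worker_envs` is a dict of tracker settings and that the worker command (`worker_cmd` here) is supplied by the caller; both `make_local_submit` and `worker_cmd` are names invented for this sketch, not part of rabit's tracker API, and a real script such as rabit_yarn.py or rabit_mpi.py would hand the work to its platform scheduler instead.

```python
import os
import subprocess


def make_local_submit(worker_cmd):
    """Return a fun_submit closure that launches workers as local subprocesses.

    worker_cmd is a hypothetical command list such as ['./my_rabit_program', 'arg1'];
    a platform-specific tracker script would submit through its own scheduler instead.
    """
    def fun_submit(nworkers, worker_args, worker_envs):
        procs = []
        for _ in range(nworkers):
            env = os.environ.copy()
            # expose the tracker coordinates to each worker as environment variables
            env.update({str(k): str(v) for k, v in worker_envs.items()})
            # append worker_args after the program's own arguments, as the guide requires
            procs.append(subprocess.Popen(list(worker_cmd) + list(worker_args), env=env))
        # wait for all workers; a cluster submitter would poll its scheduler instead
        for p in procs:
            p.wait()
    return fun_submit
```

The closure form mirrors the docstring's note that fun_submit "can be a lambda closure": the worker command is captured once, and the resulting function can then be passed to ```tracker.submit``` as described in the guide.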