[GPU-Plugin] Multi-GPU for grow_gpu_hist histogram method using NVIDIA NCCL. (#2395)

2017-06-11 13:06:08 -04:00
parent e24f25e0c6
commit 41efe32aa5
19 changed files with 2009 additions and 682 deletions
--- a/plugin/updater_gpu/README.md
+++ b/plugin/updater_gpu/README.md
@@ -17,8 +17,11 @@ colsample_bytree | &#10004; | &#10004;|
 colsample_bylevel | &#10004; | &#10004; |
 max_bin | &#10006; | &#10004; |
 gpu_id | &#10004; | &#10004; | 
+n_gpus | &#10006; | &#10004; | 

-All algorithms currently use only a single GPU. The device ordinal can be selected using the 'gpu_id' parameter, which defaults to 0.
+The device ordinal can be selected using the 'gpu_id' parameter, which defaults to 0.
+
+Multiple GPUs can be used with the grow_gpu_hist updater by setting the n_gpus parameter, which defaults to -1 (use all visible GPUs). If gpu_id is non-zero, the GPU device order is (gpu_id + i) % n_visible_devices for i = 0 to n_gpus - 1. As with GPU vs. CPU, multi-GPU training is not always faster than a single GPU, because PCI bus bandwidth can limit performance. For example, when the histogram data exchanged per round (roughly n_features * n_bins * 2^depth bins) divided by the time of each round (iteration) becomes comparable to the real PCIe x16 bus bandwidth of roughly 4 GB/s to 10 GB/s, AllReduce dominates the run time and additional GPUs stop improving performance. CPU overhead between GPU calls can also limit the usefulness of multiple GPUs.
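The device ordering and the bandwidth rule of thumb above come down to simple arithmetic. The bash sketch below is illustrative only. `device_order` and `histogram_rate` are hypothetical helpers, not part of xgboost. The figure of 8 bytes per histogram bin (one gradient/hessian pair of 4-byte floats) is an assumption; this README does not state it.

```bash
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical and not part of xgboost.

# Print the device ordinals given by (gpu_id + i) % n_visible_devices
# for i = 0 .. n_gpus-1. n_gpus = -1 means "use all visible GPUs".
# No validation is done (e.g. n_gpus larger than n_visible_devices).
device_order() {
  local gpu_id=$1 n_gpus=$2 n_visible=$3
  if (( n_gpus < 0 )); then
    n_gpus=$n_visible
  fi
  local i
  local -a out=()
  for (( i = 0; i < n_gpus; i++ )); do
    out+=( $(( (gpu_id + i) % n_visible )) )
  done
  echo "${out[@]}"
}

# Rough histogram traffic rate in bytes/second:
#   n_features * n_bins * 2^depth * bytes_per_bin / time_per_round
# Assumption: each bin holds one gradient pair of two 4-byte floats (8 bytes).
# Bash only does integer math, so the round time is given in milliseconds.
histogram_rate() {
  local n_features=$1 n_bins=$2 depth=$3 round_ms=$4
  local bytes_per_bin=8
  local bytes=$(( n_features * n_bins * (1 << depth) * bytes_per_bin ))
  echo $(( bytes * 1000 / round_ms ))
}

device_order 2 3 4            # prints: 2 3 0
device_order 1 -1 4           # prints: 1 2 3 0
histogram_rate 1000 256 6 50  # prints: 2621440000 (about 2.6 GB/s)
```

Take a hypothetical case of 1000 features, 256 bins, depth 6, and 50 ms per round. The estimate is about 2.6 GB/s, already the same order of magnitude as the 4 GB/s to 10 GB/s PCIe range quoted above. In that case AllReduce would likely take a large share of each round.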

 This plugin currently works with the CLI version and python version.

@@ -54,29 +57,38 @@ $ python -m nose test/python/
 ## Dependencies
 A CUDA-capable GPU with compute capability 3.5 or higher is required (the algorithm depends on shuffle and vote instructions introduced in Kepler).

-Building the plug-in requires CUDA Toolkit 7.5 or later.
+Building the plug-in requires CUDA Toolkit 7.5 or later (https://developer.nvidia.com/cuda-downloads).

+Submodule: the plugin also depends on CUB 1.6.4 (https://nvlabs.github.io/cub/). CUB is a header-only CUDA library that provides sort/reduce/scan primitives.
+
+Submodule: NVIDIA NCCL (https://github.com/NVIDIA/nccl). Windows builds use the port at git@github.com:h2oai/nccl.git.

 ## Build

-### Using cmake
-To use the plugin xgboost must be built by specifying the option PLUGIN_UPDATER_GPU=ON. CMake will prepare a build system depending on which platform you are on.
 On Linux, from the xgboost directory:
 ```bash
 $ mkdir build
 $ cd build
 $ cmake .. -DPLUGIN_UPDATER_GPU=ON
-$ make
+$ make -j
 ```
-If 'make' fails try invoking make again. There can sometimes be problems with the order items are built.
-
-On Windows you may also need to specify your generator as 64 bit, so the cmake command becomes:
+On Windows, check which generators your cmake installation offers and choose a Visual Studio generator with [arch] replaced by Win64. The available generators are listed by:
 ```bash
-$ cmake .. -G"Visual Studio 12 2013 Win64" -DPLUGIN_UPDATER_GPU=ON
+$ cmake --help
 ```
-You may also  be able to use a later version of visual studio depending on whether the CUDA toolkit supports it.
-cmake will generate an xgboost.sln solution file in the build directory. Build this solution in release mode. This is also a good time to check it is being built as x64. If not make sure the cmake generator is set correctly.
+Then run cmake as:
+```bash
+$ mkdir build
+$ cd build
+$ cmake .. -G"Visual Studio 14 2015 Win64" -DPLUGIN_UPDATER_GPU=ON
+```
+CMake will generate an xgboost.sln solution file in the build directory. Build this solution in Release mode as an x64 build.
+
+Visual Studio Community 2015, which the CUDA Toolkit supports (http://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/#axzz4isREr2nS), can be downloaded from https://my.visualstudio.com/Downloads?q=Visual%20Studio%20Community%202015 . You may also be able to use a later version of Visual Studio if the CUDA Toolkit supports it. Note that MinGW cannot be used with CUDA.
+
+### For Developers!

 ### Using make
 The usual 'make' flow can also be used to build the GPU-enabled tree construction plugins. It is currently only tested on Linux. From the xgboost directory:
@@ -84,9 +96,6 @@ Now, it also supports the usual 'make' flow to build gpu-enabled tree constructi
 # make sure CUDA SDK bin directory is in the 'PATH' env variable
 $ make PLUGIN_UPDATER_GPU=ON
 ```
-
-### For Developers!
-
 Some of the GPU plugin code base has googletest unit tests in 'tests/'.
 These can be run along with the other unit tests in '<xgboostRoot>/tests/cpp' using:
 ```bash
@@ -98,10 +107,17 @@ $ make PLUGIN_UPDATER_GPU=ON GTEST_PATH=${CACHE_PREFIX} test
 ```

 ## Changelog
+##### 2017/6/5
+* Multi-GPU support for histogram method using NVIDIA NCCL.
+
 ##### 2017/5/31
 * Faster version of the grow_gpu plugin
 * Added support for building gpu plugin through 'make' flow too

+##### 2017/5/19
+* Further performance enhancements for histogram method.
+
 ##### 2017/5/5
 * Histogram performance improvements
 * Fix gcc build issues 
@@ -115,10 +131,19 @@ $ make PLUGIN_UPDATER_GPU=ON GTEST_PATH=${CACHE_PREFIX} test
 [Mitchell, Rory, and Eibe Frank. Accelerating the XGBoost algorithm using GPU computing. No. e2911v1. PeerJ Preprints, 2017.](https://peerj.com/preprints/2911/)

 ## Author
-Rory Mitchell 
-
-Please report bugs to the xgboost/issues page. You can tag me with @RAMitchell.
-
-Otherwise I can be contacted at r.a.mitchell.nz at gmail.
+Rory Mitchell,
+Jonathan C. McKinney,
+Shankara Rao Thejaswi Nanditale,
+Vinay Deshpande,
+and the rest of the H2O.ai and NVIDIA team.

+Please report bugs to the xgboost/issues page.