[GPU-Plugin] Add GPU accelerated prediction (#2593)

* [GPU-Plugin] Add GPU accelerated prediction

* Improve allocation message

* Update documentation

* Resolve linker error for predictor

* Add unit tests
Rory Mitchell
2017-08-16 12:31:59 +12:00
committed by GitHub
parent 71e5e622b1
commit ef23e424f1
25 changed files with 876 additions and 203 deletions


@@ -1,5 +1,5 @@
# CUDA Accelerated Tree Construction Algorithms
This plugin adds GPU accelerated tree construction algorithms to XGBoost.
This plugin adds GPU accelerated tree construction and prediction algorithms to XGBoost.
## Usage
Specify the 'tree_method' parameter as one of the following algorithms.
@@ -18,6 +18,9 @@ colsample_bylevel | ✔ | ✔ |
max_bin | ✖ | ✔ |
gpu_id | ✔ | ✔ |
n_gpus | ✖ | ✔ |
predictor | ✔ | ✔ |
GPU accelerated prediction is enabled by default for the above 'tree_method' values, but can be switched to CPU prediction by setting 'predictor':'cpu_predictor'. This can be useful if you want to conserve GPU memory. Likewise, when using CPU algorithms, GPU accelerated prediction can be enabled by setting 'predictor':'gpu_predictor'.
The device ordinal can be selected using the 'gpu_id' parameter, which defaults to 0.
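As a minimal sketch of the parameter combinations described above, the snippet below builds the parameter dictionaries only. The helper name `make_params` is hypothetical and not part of XGBoost; the keys 'tree_method', 'predictor' and 'gpu_id' and their values are taken from this README.

```python
def make_params(tree_method, predictor=None, gpu_id=None):
    """Build a parameter dict using the options documented above.

    `make_params` is an illustrative helper, not an XGBoost function.
    """
    params = {"tree_method": tree_method}
    if predictor is not None:
        # Override the default predictor for the chosen tree_method.
        params["predictor"] = predictor
    if gpu_id is not None:
        # Device ordinal; the README states it defaults to 0 when omitted.
        params["gpu_id"] = gpu_id
    return params


# GPU training on device 1, CPU prediction to conserve GPU memory.
gpu_train_cpu_predict = make_params("gpu_hist", predictor="cpu_predictor", gpu_id=1)

# CPU training with GPU accelerated prediction.
cpu_train_gpu_predict = make_params("hist", predictor="gpu_predictor")

print(gpu_train_cpu_predict)
print(cpu_train_gpu_predict)
```

Either dictionary would then be passed as the parameter argument when training, for example as the first argument of `xgboost.train` in the Python package, assuming XGBoost was built with the GPU plugin enabled.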
@@ -37,48 +40,31 @@ To run benchmarks on synthetic data for binary classification:
$ python benchmark/benchmark.py
```
Training time on 1000000 rows x 50 columns with 500 boosting iterations on i7-6700K CPU @ 4.00GHz and Pascal Titan X.
Training time on 1,000,000 rows x 50 columns with 500 boosting iterations and 0.25/0.75 test/train split on i7-6700K CPU @ 4.00GHz and Pascal Titan X.
| tree_method | Time (s) |
| --- | --- |
| gpu_hist | 11.09 |
| hist (histogram XGBoost - CPU) | 41.75 |
| gpu_exact | 193.90 |
| exact (standard XGBoost - CPU) | 720.12 |
| gpu_hist | 13.87 |
| hist | 63.55 |
| gpu_exact | 161.08 |
| exact | 1082.20 |
[See here](http://dmlc.ml/2016/12/14/GPU-accelerated-xgboost.html) for additional performance benchmarks of the 'gpu_exact' tree_method.
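To make the table easier to read as relative speedups, the following sketch divides each CPU time by the matching GPU time. It assumes the second set of four rows (13.87, 63.55, 161.08, 1082.20) holds the updated timings from this commit; the variable and function names are illustrative only.

```python
# Timings in seconds, copied from the table above (assumed to be the updated rows).
timings = {
    "gpu_hist": 13.87,
    "hist": 63.55,
    "gpu_exact": 161.08,
    "exact": 1082.20,
}


def speedup(cpu_method, gpu_method):
    """Return how many times faster the GPU method ran than the CPU method."""
    return timings[cpu_method] / timings[gpu_method]


hist_speedup = speedup("hist", "gpu_hist")
exact_speedup = speedup("exact", "gpu_exact")

print(f"gpu_hist vs hist:   {hist_speedup:.2f}x")
print(f"gpu_exact vs exact: {exact_speedup:.2f}x")
```

These ratios describe this one synthetic benchmark on the listed hardware and should not be read as general guarantees.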
## Test
To run tests:
To run python tests:
```bash
$ python -m nose test/python/
```
Google tests can be enabled by specifying -DGOOGLE_TEST=ON when building with cmake.
## Dependencies
A CUDA capable GPU with compute capability 3.5 or higher (the algorithm depends on shuffle and vote instructions introduced in Kepler).
A CUDA capable GPU with compute capability 3.5 or higher.
Building the plug-in requires CUDA Toolkit 7.5 or later (https://developer.nvidia.com/cuda-downloads)
submodule: The plugin also depends on CUB 1.6.4 - https://nvlabs.github.io/cub/ . CUB is a header only cuda library which provides sort/reduce/scan primitives.
submodule: NVIDIA NCCL from https://github.com/NVIDIA/nccl with windows port allowed by git@github.com:h2oai/nccl.git
## Download full repo + full submodules for your choice (or empty) path <mypath>
git clone --recursive https://github.com/dmlc/xgboost.git <mypath>
## Download with shallow submodules for much quicker download:
git 2.9.0+ (assumes only HEAD is used for all submodules, which is currently not true for dmlc-core and rabit)
git clone --recursive --shallow-submodules https://github.com/dmlc/xgboost.git <mypath>
git 2.9.0-: (only cub is shallow, as largest repo)
git clone https://github.com/dmlc/xgboost.git <mypath>
cd <mypath>
bash plugin/updater/gpu/gitshallow_submodules.sh
## Build
From the command line on Linux starting from the xgboost directory:
@@ -110,14 +96,11 @@ On some systems, nccl libraries are specific to a particular system (IBM Power o
### For Developers!
If you want to build only for specific GPUs, e.g. GP100 and GP102,
whose compute capabilities are 60 and 61 respectively:
```bash
$ cmake .. -DPLUGIN_UPDATER_GPU=ON -DGPU_COMPUTE_VER="60;61"
```
By default, the versions will include support for all GPUs in Maxwell and Pascal architectures.
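The cmake flow takes the version list separated by semicolons ("60;61"), while the make flow takes it separated by spaces ("60 61"). The sketch below normalises both forms and shows the kind of `nvcc` `-gencode` flags such a list commonly expands to. This is an illustration under that assumption: the plugin's build scripts may generate different flags, and the function names here are hypothetical.

```python
def parse_compute_versions(spec):
    """Split a GPU_COMPUTE_VER value given in cmake ("60;61") or make ("60 61") form."""
    return spec.replace(";", " ").split()


def gencode_flags(spec):
    """Return one nvcc -gencode flag per requested compute version (illustrative)."""
    return [
        f"-gencode arch=compute_{ver},code=sm_{ver}"
        for ver in parse_compute_versions(spec)
    ]


cmake_flags = gencode_flags("60;61")
make_flags = gencode_flags("60 61")

for flag in cmake_flags:
    print(flag)
```

Restricting the list to the GPUs you actually target generally shortens compile time and reduces binary size, since device code is generated for fewer architectures.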
### Using make
The usual 'make' flow is also supported for building GPU-enabled tree construction plugins. It is currently only tested on Linux. From the xgboost directory:
@@ -131,19 +114,10 @@ Similar to cmake, if you want to build only for a specific GPU(s):
$ make -j PLUGIN_UPDATER_GPU=ON GPU_COMPUTE_VER="60 61"
```
### For Developers!
Some of the code inside the GPU plugin has googletest unit tests inside 'tests/'.
They can be enabled and run along with the other unit tests inside '<xgboostRoot>/tests/cpp' using:
```bash
# make sure CUDA SDK bin directory is in the 'PATH' env variable
# below 2 commands need only be executed once
$ source ./dmlc-core/scripts/travis/travis_setup_env.sh
$ make -f dmlc-core/scripts/packages.mk gtest
$ make PLUGIN_UPDATER_GPU=ON GTEST_PATH=${CACHE_PREFIX} test
```
## Changelog
##### 2017/8/14
* Added GPU accelerated prediction. Considerably improved performance when using test/eval sets.
##### 2017/7/10
* Memory performance improved 4x for gpu_hist