RMM integration plugin (#5873)
* [CI] Add RMM as an optional dependency
* Replace caching allocator with pool allocator from RMM
* Revert "Replace caching allocator with pool allocator from RMM" (reverts commit e15845d4e72e890c2babe31a988b26503a7d9038)
* Use rmm::mr::get_default_resource()
* Try setting default resource (doesn't work yet)
* Allocate pool_mr on the heap
* Prevent leaking pool_mr handle
* Move EXPECT_DEATH() into a separate test suite suffixed DeathTest
* Turn off death tests for RMM
* Address reviewer's feedback
* Prevent leaking of cuda_mr
* Fix Jenkinsfile syntax
* Remove unnecessary function in Jenkinsfile
* [CI] Install NCCL into RMM container
* Run Python tests
* Try building with RMM, CUDA 10.0
* Do not use RMM for CUDA 10.0 target
* Actually test for test_rmm flag
* Fix TestPythonGPU
* Use CNMeM allocator, since pool allocator doesn't yet support multi-GPU
* Use 10.0 container to build RMM-enabled XGBoost
* Revert "Use 10.0 container to build RMM-enabled XGBoost" (reverts commit 789021fa31112e25b683aef39fff375403060141)
* Fix Jenkinsfile
* [CI] Assign larger /dev/shm to NCCL
* Use 10.2 artifact to run multi-GPU Python tests
* Add CUDA 10.0 -> 11.0 cross-version test; remove CUDA 10.0 target
* Rename Conda env rmm_test -> gpu_test
* Use env var to opt into CNMeM pool for C++ tests
* Use identical CUDA version for RMM builds and tests
* Use pytest fixtures to enable the RMM pool in Python tests
* Move RMM to plugin/CMakeLists.txt; use PLUGIN_RMM
* Use per-device MR; use command arg in gtest
* Set CMake prefix path to use Conda env
* Use 0.15 nightly version of RMM
* Remove unnecessary header
* Fix a unit test when cudf is missing
* Add RMM demos
* Remove print()
* Use HostDeviceVector in GPU predictor
* Simplify pytest setup; use LocalCUDACluster fixture
* Address reviewers' comments

Co-authored-by: Hyunsu Cho <chohyu01@cs.wasshington.edu>
parent c3ea3b7e37 · commit 9adb812a0a
demo/rmm_plugin/README.md (new file, 31 lines)
@@ -0,0 +1,31 @@
Using XGBoost with RAPIDS Memory Manager (RMM) plugin (EXPERIMENTAL)
====================================================================
The [RAPIDS Memory Manager (RMM)](https://github.com/rapidsai/rmm) library provides a collection of
efficient memory allocators for NVIDIA GPUs. It is now possible to use XGBoost with memory
allocators provided by RMM, by enabling the RMM integration plugin.

The demos in this directory highlight one RMM allocator in particular: **the pool sub-allocator**.
This allocator addresses the slow speed of `cudaMalloc()` by allocating a large chunk of memory
upfront. Subsequent allocations draw from the pool of already-allocated memory and thus avoid
the overhead of calling `cudaMalloc()` directly. See
[these GTC talk slides](https://on-demand.gputechconf.com/gtc/2015/presentation/S5530-Stephen-Jones.pdf)
for more details.

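The idea behind the pool sub-allocator can be illustrated with a toy version in plain Python (a conceptual sketch only: RMM's real implementation is in C++ and handles alignment, streams, and deallocation, none of which is modeled here):

```python
class ToyPool:
    """Hand out slices of one big upfront allocation (cf. calling cudaMalloc once)."""

    def __init__(self, capacity):
        self.buffer = bytearray(capacity)  # one big upfront allocation
        self.offset = 0

    def allocate(self, size):
        # Sub-allocate from the pool: just bump an offset, no new system allocation.
        if self.offset + size > len(self.buffer):
            raise MemoryError('pool exhausted')
        view = memoryview(self.buffer)[self.offset:self.offset + size]
        self.offset += size
        return view


pool = ToyPool(1 << 20)      # reserve 1 MiB upfront
a = pool.allocate(256)       # fast: no system allocator involved
b = pool.allocate(1024)
print(len(a), len(b), pool.offset)  # 256 1024 1280
```

Each `allocate()` call is a pointer bump, which is the property that makes the real pool sub-allocator so much cheaper than repeated `cudaMalloc()` calls.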
Before running the demos, ensure that XGBoost is compiled with the RMM plugin enabled. To do this,
run CMake with the option `-DPLUGIN_RMM=ON` (`-DUSE_CUDA=ON` is also required):
```
cmake .. -DUSE_CUDA=ON -DUSE_NCCL=ON -DPLUGIN_RMM=ON
make -j4
```
CMake will attempt to locate the RMM library in your build environment. You may choose to build
RMM from source, or install it using the Conda package manager. If CMake cannot find RMM, specify
its location via `CMAKE_PREFIX_PATH`:
```
# If using Conda:
cmake .. -DUSE_CUDA=ON -DUSE_NCCL=ON -DPLUGIN_RMM=ON -DCMAKE_PREFIX_PATH=$CONDA_PREFIX
# If RMM is installed in a custom location:
cmake .. -DUSE_CUDA=ON -DUSE_NCCL=ON -DPLUGIN_RMM=ON -DCMAKE_PREFIX_PATH=/path/to/rmm
```

* [Using RMM with a single GPU](./rmm_singlegpu.py)
* [Using RMM with a local Dask cluster consisting of multiple GPUs](./rmm_mgpu_with_dask.py)
demo/rmm_plugin/rmm_mgpu_with_dask.py (new file, 27 lines)
@@ -0,0 +1,27 @@
import dask.array
import xgboost as xgb
from sklearn.datasets import make_classification
from dask.distributed import Client
from dask_cuda import LocalCUDACluster


def main(client):
    X, y = make_classification(n_samples=10000, n_informative=5, n_classes=3)
    X = dask.array.from_array(X)
    y = dask.array.from_array(y)
    dtrain = xgb.dask.DaskDMatrix(client, X, label=y)

    params = {'max_depth': 8, 'eta': 0.01, 'objective': 'multi:softprob',
              'num_class': 3, 'tree_method': 'gpu_hist'}
    output = xgb.dask.train(client, params, dtrain, num_boost_round=100,
                            evals=[(dtrain, 'train')])
    bst = output['booster']
    history = output['history']
    for i, e in enumerate(history['train']['merror']):
        print(f'[{i}] train-merror: {e}')


if __name__ == '__main__':
    # To use the RMM pool allocator with a GPU Dask cluster, pass the
    # rmm_pool_size option to the LocalCUDACluster constructor.
    with LocalCUDACluster(rmm_pool_size='2GB') as cluster:
        with Client(cluster) as client:
            main(client)
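The `rmm_pool_size='2GB'` string above is parsed by Dask's byte-size parser, where decimal units (`GB`) and binary units (`GiB`) differ. A minimal re-implementation sketch of that parsing (the `parse_bytes` helper here is illustrative, not part of the dask_cuda API; its semantics are assumed to match `dask.utils.parse_bytes`):

```python
import re

# Decimal units for 'kB'/'MB'/'GB'/'TB', binary units for 'KiB'/'MiB'/'GiB'/'TiB'.
_UNITS = {'': 1, 'K': 10**3, 'M': 10**6, 'G': 10**9, 'T': 10**12,
          'KI': 2**10, 'MI': 2**20, 'GI': 2**30, 'TI': 2**40}


def parse_bytes(size):
    """Convert strings like '2GB' or '512MiB' to an integer byte count."""
    m = re.fullmatch(r'(\d+(?:\.\d+)?)\s*([A-Za-z]*)', size.strip())
    if not m:
        raise ValueError(f'cannot parse size: {size!r}')
    value, unit = float(m.group(1)), m.group(2).upper()
    unit = unit[:-1] if unit.endswith('B') else unit  # drop trailing 'B'
    return int(value * _UNITS[unit])


print(parse_bytes('2GB'))   # 2000000000
print(parse_bytes('2GiB'))  # 2147483648
```

In other words, `'2GB'` reserves 2 * 10^9 bytes per worker, not 2 * 2^30; use `'2GiB'` for the latter.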
demo/rmm_plugin/rmm_singlegpu.py (new file, 14 lines)
@@ -0,0 +1,14 @@
import rmm
import xgboost as xgb
from sklearn.datasets import make_classification

# Initialize the RMM pool allocator
rmm.reinitialize(pool_allocator=True)

X, y = make_classification(n_samples=10000, n_informative=5, n_classes=3)
dtrain = xgb.DMatrix(X, label=y)

params = {'max_depth': 8, 'eta': 0.01, 'objective': 'multi:softprob',
          'num_class': 3, 'tree_method': 'gpu_hist'}
# XGBoost will automatically use the RMM pool allocator
bst = xgb.train(params, dtrain, num_boost_round=100, evals=[(dtrain, 'train')])
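Per the commit notes above (`rmm::mr::get_default_resource()`, later a per-device MR), the plugin routes XGBoost's device allocations through whatever memory resource RMM currently has installed for each device, which is why `rmm.reinitialize()` alone is enough to switch allocators. That registry pattern can be sketched in plain Python (a toy model only; the real API is RMM's C++ `set_current_device_resource`, and none of these classes are part of RMM):

```python
class CudaMemoryResource:          # stands in for plain cudaMalloc
    name = 'cuda'


class PoolMemoryResource:          # stands in for the pool sub-allocator
    def __init__(self, upstream):
        self.upstream = upstream   # pool draws its big chunk from upstream
        self.name = f'pool(over {upstream.name})'


_current = {}                      # device id -> installed memory resource


def set_current_device_resource(device, mr):
    _current[device] = mr


def get_current_device_resource(device):
    # Fall back to the plain CUDA allocator if nothing was installed.
    return _current.setdefault(device, CudaMemoryResource())


# The plugin simply allocates through whatever is installed per device:
set_current_device_resource(0, PoolMemoryResource(CudaMemoryResource()))
print(get_current_device_resource(0).name)   # pool(over cuda)
print(get_current_device_resource(1).name)   # cuda
```

Because the lookup happens per device, each GPU in a multi-GPU setup can carry its own pool, which is what the "Use per-device MR" commit above enables.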