Document for device ordinal. (#9398)

- Rewrite GPU demos. The notebook is converted to a script to avoid committing additional PNG plots.
- Add GPU demos to the Sphinx gallery.
- Add RMM demos to the Sphinx gallery.
- Test for firing threads with different device ordinals.
2023-07-22 15:26:29 +08:00
parent 22b0a55a04
commit 275da176ba
32 changed files with 351 additions and 398 deletions
--- a/demo/rmm_plugin/README.md
+++ b/demo/rmm_plugin/README.md
@@ -1,47 +0,0 @@
-Using XGBoost with RAPIDS Memory Manager (RMM) plugin (EXPERIMENTAL)
-====================================================================
-[RAPIDS Memory Manager (RMM)](https://github.com/rapidsai/rmm) library provides a collection of
-efficient memory allocators for NVIDIA GPUs. It is now possible to use XGBoost with memory
-allocators provided by RMM, by enabling the RMM integration plugin.
-
-The demos in this directory highlights one RMM allocator in particular: **the pool sub-allocator**.
-This allocator addresses the slow speed of `cudaMalloc()` by allocating a large chunk of memory
-upfront. Subsequent allocations will draw from the pool of already allocated memory and thus avoid
-the overhead of calling `cudaMalloc()` directly. See
-[this GTC talk slides](https://on-demand.gputechconf.com/gtc/2015/presentation/S5530-Stephen-Jones.pdf)
-for more details.
-
-Before running the demos, ensure that XGBoost is compiled with the RMM plugin enabled. To do this,
-run CMake with option `-DPLUGIN_RMM=ON` (`-DUSE_CUDA=ON` also required):
-```
-cmake .. -DUSE_CUDA=ON -DUSE_NCCL=ON -DPLUGIN_RMM=ON
-make -j4
-```
-CMake will attempt to locate the RMM library in your build environment. You may choose to build
-RMM from the source, or install it using the Conda package manager. If CMake cannot find RMM, you
-should specify the location of RMM with the CMake prefix:
-```
-# If using Conda:
-cmake .. -DUSE_CUDA=ON -DUSE_NCCL=ON -DPLUGIN_RMM=ON -DCMAKE_PREFIX_PATH=$CONDA_PREFIX
-# If using RMM installed with a custom location
-cmake .. -DUSE_CUDA=ON -DUSE_NCCL=ON -DPLUGIN_RMM=ON -DCMAKE_PREFIX_PATH=/path/to/rmm
-```
-
-# Informing XGBoost about RMM pool
-
-When XGBoost is compiled with RMM, most of the large size allocation will go through RMM
-allocators, but some small allocations in performance critical areas are using a different
-caching allocator so that we can have better control over memory allocation behavior.
-Users can override this behavior and force the use of rmm for all allocations by setting
-the global configuration ``use_rmm``:
-
-``` python
-with xgb.config_context(use_rmm=True):
-    clf = xgb.XGBClassifier(tree_method="gpu_hist")
-```
-
-Depending on the choice of memory pool size or type of allocator, this may have negative
-performance impact.
-
-* [Using RMM with a single GPU](./rmm_singlegpu.py)
-* [Using RMM with a local Dask cluster consisting of multiple GPUs](./rmm_mgpu_with_dask.py)
--- a/demo/rmm_plugin/README.rst
+++ b/demo/rmm_plugin/README.rst
@@ -0,0 +1,51 @@
+Using XGBoost with RAPIDS Memory Manager (RMM) plugin (EXPERIMENTAL)
+====================================================================
+
+`RAPIDS Memory Manager (RMM) <https://github.com/rapidsai/rmm>`__ library provides a
+collection of efficient memory allocators for NVIDIA GPUs. It is now possible to use
+XGBoost with memory allocators provided by RMM, by enabling the RMM integration plugin.
+
+The demos in this directory highlight one RMM allocator in particular: **the pool
+sub-allocator**.  This allocator addresses the slow speed of ``cudaMalloc()`` by
+allocating a large chunk of memory upfront. Subsequent allocations will draw from the pool
+of already allocated memory and thus avoid the overhead of calling ``cudaMalloc()``
+directly. See `these GTC talk slides
+<https://on-demand.gputechconf.com/gtc/2015/presentation/S5530-Stephen-Jones.pdf>`_ for
+more details.
+
+Before running the demos, ensure that XGBoost is compiled with the RMM plugin enabled. To do this,
+run CMake with option ``-DPLUGIN_RMM=ON`` (``-DUSE_CUDA=ON`` also required):
+
+.. code-block:: sh
+
+  cmake .. -DUSE_CUDA=ON -DUSE_NCCL=ON -DPLUGIN_RMM=ON
+  make -j$(nproc)
+
+CMake will attempt to locate the RMM library in your build environment. You may choose to build
+RMM from source, or install it using the Conda package manager. If CMake cannot find RMM, you
+should specify the location of RMM with ``CMAKE_PREFIX_PATH``:
+
+.. code-block:: sh
+
+  # If using Conda:
+  cmake .. -DUSE_CUDA=ON -DUSE_NCCL=ON -DPLUGIN_RMM=ON -DCMAKE_PREFIX_PATH=$CONDA_PREFIX
+  # If using RMM installed in a custom location:
+  cmake .. -DUSE_CUDA=ON -DUSE_NCCL=ON -DPLUGIN_RMM=ON -DCMAKE_PREFIX_PATH=/path/to/rmm
+
+********************************
+Informing XGBoost about RMM pool
+********************************
+
+When XGBoost is compiled with RMM, most large allocations go through RMM allocators,
+but some small allocations in performance-critical areas use a different caching
+allocator so that we have better control over memory allocation behavior. Users can
+override this behavior and force the use of RMM for all allocations by setting the
+global configuration ``use_rmm``:
+
+.. code-block:: python
+
+  with xgb.config_context(use_rmm=True):
+    clf = xgb.XGBClassifier(tree_method="hist", device="cuda")
+
+Depending on the choice of memory pool size or type of allocator, this may have a negative
+performance impact.
--- a/demo/rmm_plugin/rmm_mgpu_with_dask.py
+++ b/demo/rmm_plugin/rmm_mgpu_with_dask.py
@@ -1,3 +1,7 @@
+"""
+Using RMM with Dask
+===================
+"""
 import dask
 from dask.distributed import Client
 from dask_cuda import LocalCUDACluster
@@ -11,25 +15,33 @@ def main(client):
    # xgb.set_config(use_rmm=True)

    X, y = make_classification(n_samples=10000, n_informative=5, n_classes=3)
-    # In pratice one should prefer loading the data with dask collections instead of using
-    # `from_array`.
+    # In practice one should prefer loading the data with dask collections instead of
+    # using `from_array`.
    X = dask.array.from_array(X)
    y = dask.array.from_array(y)
    dtrain = xgb.dask.DaskDMatrix(client, X, label=y)

-    params = {'max_depth': 8, 'eta': 0.01, 'objective': 'multi:softprob', 'num_class': 3,
-              'tree_method': 'gpu_hist', 'eval_metric': 'merror'}
-    output = xgb.dask.train(client, params, dtrain, num_boost_round=100,
-                            evals=[(dtrain, 'train')])
-    bst = output['booster']
-    history = output['history']
-    for i, e in enumerate(history['train']['merror']):
-        print(f'[{i}] train-merror: {e}')
+    params = {
+        "max_depth": 8,
+        "eta": 0.01,
+        "objective": "multi:softprob",
+        "num_class": 3,
+        "tree_method": "hist",
+        "eval_metric": "merror",
+        "device": "cuda",
+    }
+    output = xgb.dask.train(
+        client, params, dtrain, num_boost_round=100, evals=[(dtrain, "train")]
+    )
+    bst = output["booster"]
+    history = output["history"]
+    for i, e in enumerate(history["train"]["merror"]):
+        print(f"[{i}] train-merror: {e}")


-if __name__ == '__main__':
-    # To use RMM pool allocator with a GPU Dask cluster, just add rmm_pool_size option to
-    # LocalCUDACluster constructor.
-    with LocalCUDACluster(rmm_pool_size='2GB') as cluster:
+if __name__ == "__main__":
+    # To use RMM pool allocator with a GPU Dask cluster, just add rmm_pool_size option
+    # to LocalCUDACluster constructor.
+    with LocalCUDACluster(rmm_pool_size="2GB") as cluster:
        with Client(cluster) as client:
            main(client)
--- a/demo/rmm_plugin/rmm_singlegpu.py
+++ b/demo/rmm_plugin/rmm_singlegpu.py
@@ -1,3 +1,7 @@
+"""
+Using RMM on a single node device
+=================================
+"""
 import rmm
 from sklearn.datasets import make_classification

@@ -16,7 +20,8 @@ params = {
    "eta": 0.01,
    "objective": "multi:softprob",
    "num_class": 3,
-    "tree_method": "gpu_hist",
+    "tree_method": "hist",
+    "device": "cuda",
 }
 # XGBoost will automatically use the RMM pool allocator
 bst = xgb.train(params, dtrain, num_boost_round=100, evals=[(dtrain, "train")])
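
The hunks above replace `"tree_method": "gpu_hist"` with `"tree_method": "hist"` plus `"device": "cuda"`. As a rough illustration of that migration, the sketch below rewrites a parameter dictionary in the same way. The helper name `to_device_params` and its `ordinal` argument are hypothetical and are not part of XGBoost or of this commit; the `cuda:<ordinal>` form is assumed to follow the device-ordinal syntax that this change documents.

```python
from typing import Any, Dict, Optional


def to_device_params(
    params: Dict[str, Any], ordinal: Optional[int] = None
) -> Dict[str, Any]:
    """Return a copy of ``params`` with ``gpu_hist`` expressed via ``device``.

    Hypothetical helper for illustration only; it is not an XGBoost API.
    """
    new_params = dict(params)  # shallow copy, the caller's dict is left untouched
    if new_params.get("tree_method") == "gpu_hist":
        new_params["tree_method"] = "hist"
        # Assumed syntax: "cuda" selects the default GPU, "cuda:<ordinal>" a specific one.
        device = "cuda" if ordinal is None else f"cuda:{ordinal}"
        # Keep an explicit "device" entry if the caller already provided one.
        new_params.setdefault("device", device)
    return new_params


old_params = {"max_depth": 8, "eta": 0.01, "tree_method": "gpu_hist"}
print(to_device_params(old_params))
print(to_device_params(old_params, ordinal=1))
```

Parameter dictionaries that do not use `gpu_hist` are returned unchanged, so the helper can be applied to CPU configurations as well.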