diff --git a/demo/rmm_plugin/README.rst b/demo/rmm_plugin/README.rst
index 28b816eb2..809d7aebd 100644
--- a/demo/rmm_plugin/README.rst
+++ b/demo/rmm_plugin/README.rst
@@ -58,4 +58,20 @@
 Since with RMM the memory pool is pre-allocated on a specific device, changing the
 device ordinal in XGBoost can result in memory error ``cudaErrorIllegalAddress``. Use the
 ``CUDA_VISIBLE_DEVICES`` environment variable instead of the ``device="cuda:1"``
 parameter for selecting device. For distributed training, the distributed computing frameworks like
-``dask-cuda`` are responsible for device management.
\ No newline at end of file
+``dask-cuda`` are responsible for device management.
+
+************************
+Memory Over-Subscription
+************************
+
+.. warning::
+
+   This feature is still experimental and is under active development.
+
+The newer NVIDIA platforms like `Grace-Hopper
+`__ use `NVLink-C2C
+`__, which allows the CPU and GPU to
+have a coherent memory model. Users can use the ``SamHeadroomMemoryResource`` from recent
+RMM releases to store data in system memory. This lets XGBoost draw on host memory for GPU
+computation, but it may reduce performance due to slower CPU memory speed and page
+migration overhead.
\ No newline at end of file
diff --git a/doc/gpu/index.rst b/doc/gpu/index.rst
index 468362302..13e8c9e14 100644
--- a/doc/gpu/index.rst
+++ b/doc/gpu/index.rst
@@ -50,6 +50,11 @@ Multi-node Multi-GPU Training
 XGBoost supports fully distributed GPU training using `Dask `_, ``Spark`` and
 ``PySpark``. For getting started with Dask, see our tutorial :doc:`/tutorials/dask` and
 worked examples :doc:`/python/dask-examples/index`, as well as the Python documentation
 :ref:`dask_api` for a complete reference. For usage with ``Spark`` using Scala, see
 :doc:`/jvm/xgboost4j_spark_gpu_tutorial`. Lastly, for distributed GPU training with
 ``PySpark``, see :doc:`/tutorials/spark_estimator`.
+RMM integration
+===============
+
+XGBoost provides optional support for RMM integration. See :doc:`/python/rmm-examples/index` for more information.
+
 Memory usage
 ============
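As a concrete illustration of the RMM integration the ``doc/gpu/index.rst`` change points to, the snippet below is a minimal sketch (not part of the patch) of routing XGBoost's device allocations through an RMM memory pool; the 2 GiB pool size and the toy dataset are illustrative assumptions, and a CUDA device with RMM, CuPy, and XGBoost installed is required:

```python
import cupy as cp
import rmm
import xgboost as xgb
from rmm.allocators.cupy import rmm_cupy_allocator

# Back all RMM allocations with a pre-allocated memory pool; the 2 GiB
# initial size is illustrative and should be tuned to the device.
rmm.reinitialize(pool_allocator=True, initial_pool_size=2 * 1024**3)

# Route CuPy allocations through RMM so the input data and XGBoost's
# internal buffers share the same pool.
cp.cuda.set_allocator(rmm_cupy_allocator)

# `use_rmm=True` asks XGBoost to allocate through RMM as well.
with xgb.config_context(use_rmm=True):
    X = cp.random.rand(4096, 16)
    y = cp.random.rand(4096)
    booster = xgb.train(
        {"device": "cuda", "tree_method": "hist"},
        xgb.QuantileDMatrix(X, label=y),
        num_boost_round=10,
    )
```

Without ``use_rmm=True``, XGBoost falls back to its own CUDA allocations, which would compete with the RMM pool for device memory.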
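The over-subscription section added to the README could be paired with a usage sketch. The following is a hypothetical example, assuming a platform with coherent host/device memory (e.g. Grace-Hopper) and an RMM release that ships ``SamHeadroomMemoryResource``; the 4 GiB headroom value is illustrative only:

```python
import numpy as np
import rmm
import xgboost as xgb

# Keep 4 GiB of device memory free as headroom; data is placed in
# system-allocated memory, which can migrate between host and device.
mr = rmm.mr.SamHeadroomMemoryResource(4 * 1024**3)
rmm.mr.set_current_device_resource(mr)

# With `use_rmm=True`, XGBoost allocates through the resource set above,
# allowing datasets larger than device memory at some performance cost.
with xgb.config_context(use_rmm=True):
    X = np.random.rand(4096, 16)
    y = np.random.rand(4096)
    booster = xgb.train(
        {"device": "cuda", "tree_method": "hist"},
        xgb.DMatrix(X, label=y),
        num_boost_round=10,
    )
```

As the README warns, page migration makes this slower than fitting entirely in device memory, so the headroom should be chosen to keep hot data on the GPU.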