[doc] Brief note about RMM SAM allocator. [skip ci] (#10712)

Jiaming Yuan 2024-08-17 04:21:39 +08:00 committed by GitHub
parent ec3f327c20
commit fd365c147e
2 changed files with 22 additions and 1 deletion


@@ -58,4 +58,20 @@ Since with RMM the memory pool is pre-allocated on a specific device, changing the
device ordinal in XGBoost can result in memory error ``cudaErrorIllegalAddress``. Use the
``CUDA_VISIBLE_DEVICES`` environment variable instead of the ``device="cuda:1"`` parameter
for selecting the device. For distributed training, distributed computing frameworks like
``dask-cuda`` are responsible for device management.
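
As an illustration, a minimal sketch of pinning XGBoost to the second GPU via the
environment variable (the estimator settings here are placeholders):

.. code-block:: python

    import os

    # Remap physical GPU 1 to ordinal 0 before any CUDA context is created.
    # Setting this in the shell before launching Python works equally well.
    os.environ["CUDA_VISIBLE_DEVICES"] = "1"

    import xgboost as xgb

    # The RMM pool and XGBoost now both target the same (remapped) device 0.
    clf = xgb.XGBClassifier(device="cuda", tree_method="hist")
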
************************
Memory Over-Subscription
************************

.. warning::

   This feature is still experimental and is under active development.

The newer NVIDIA platforms like `Grace-Hopper
<https://www.nvidia.com/en-us/data-center/grace-hopper-superchip/>`__ use `NVLink-C2C
<https://www.nvidia.com/en-us/data-center/nvlink-c2c/>`__, which allows the CPU and GPU to
share a coherent memory model. Users can use the ``SamHeadroomMemoryResource`` in the
latest RMM to store data in system memory. This helps XGBoost draw on host memory for GPU
computation, but it may reduce performance due to the lower bandwidth of host memory and
the overhead of page migration.
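
A minimal sketch of opting in, assuming an RMM release that ships
``rmm.mr.SamHeadroomMemoryResource`` (the ``headroom`` argument, the amount of device
memory in bytes to keep free for other consumers, is an assumption based on the RMM API):

.. code-block:: python

    import rmm
    import xgboost as xgb

    # Keep roughly 4 GiB of device memory free; allocations that do not fit
    # can be backed by system memory over the coherent NVLink-C2C link.
    mr = rmm.mr.SamHeadroomMemoryResource(headroom=4 * 1024**3)
    rmm.mr.set_current_device_resource(mr)

    # Route XGBoost's device allocations through RMM as well.
    xgb.set_config(use_rmm=True)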


@@ -50,6 +50,11 @@ Multi-node Multi-GPU Training
XGBoost supports fully distributed GPU training using `Dask <https://dask.org/>`_, ``Spark``, and ``PySpark``. To get started with Dask, see our tutorial :doc:`/tutorials/dask` and the worked examples :doc:`/python/dask-examples/index`; the Python documentation :ref:`dask_api` provides a complete reference. For usage with ``Spark`` from Scala, see :doc:`/jvm/xgboost4j_spark_gpu_tutorial`. Lastly, for distributed GPU training with ``PySpark``, see :doc:`/tutorials/spark_estimator`.
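
As an illustration, a minimal sketch of multi-GPU training on a single node with
``dask-cuda`` (the synthetic data, cluster sizing, and parameters are placeholders):

.. code-block:: python

    from dask import array as da
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster
    from xgboost import dask as dxgb

    with LocalCUDACluster() as cluster, Client(cluster) as client:
        # Synthetic data as a stand-in for a real dataset.
        X = da.random.random((100_000, 20), chunks=(10_000, 20))
        y = da.random.random(100_000, chunks=10_000)

        dtrain = dxgb.DaskQuantileDMatrix(client, X, y)
        output = dxgb.train(
            client,
            {"device": "cuda", "tree_method": "hist"},
            dtrain,
            num_boost_round=10,
        )
        booster = output["booster"]
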
RMM integration
===============
XGBoost provides optional support for RMM integration. See :doc:`/python/rmm-examples/index` for more information.
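
A minimal sketch of enabling it with RMM's pool allocator (the pool size is illustrative):

.. code-block:: python

    import rmm
    import xgboost as xgb

    # Pre-allocate a 2 GiB memory pool on the current device.
    rmm.reinitialize(pool_allocator=True, initial_pool_size=2 * 1024**3)

    # Route XGBoost's device allocations through RMM.
    xgb.set_config(use_rmm=True)
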
Memory usage
============