[doc] Brief note about RMM SAM allocator. [skip ci] (#10712)
commit fd365c147e (parent ec3f327c20)

@@ -58,4 +58,20 @@ Since with RMM the memory pool is pre-allocated on a specific device, changing the
device ordinal in XGBoost can result in the memory error ``cudaErrorIllegalAddress``. Use the
``CUDA_VISIBLE_DEVICES`` environment variable instead of the ``device="cuda:1"`` parameter
to select the device. For distributed training, distributed computing frameworks like
``dask-cuda`` are responsible for device management.
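
To make this concrete, here is a minimal sketch (assuming an RMM-enabled XGBoost build;
the data shapes and training parameters are illustrative, not requirements). The process
is pinned to the second physical GPU through ``CUDA_VISIBLE_DEVICES`` before RMM creates
its pool, and a plain ``device="cuda"`` is passed to XGBoost:

.. code-block:: python

    import os

    # Pin this process to the second physical GPU *before* RMM allocates its
    # pool; inside the process the only visible device then has ordinal 0.
    os.environ["CUDA_VISIBLE_DEVICES"] = "1"

    import numpy as np
    import rmm
    import xgboost as xgb

    rmm.reinitialize(pool_allocator=True)  # pool is created on the visible device
    xgb.set_config(use_rmm=True)           # route XGBoost allocations through RMM

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1024, 16))
    y = X.sum(axis=1)
    dtrain = xgb.QuantileDMatrix(X, label=y)

    # Note: plain "cuda", not "cuda:1" -- the visible device is used.
    booster = xgb.train({"device": "cuda", "tree_method": "hist"}, dtrain)
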
************************
Memory Over-Subscription
************************

.. warning::

  This feature is still experimental and is under active development.

The newer NVIDIA platforms like `Grace-Hopper
<https://www.nvidia.com/en-us/data-center/grace-hopper-superchip/>`__ use `NVLink-C2C
<https://www.nvidia.com/en-us/data-center/nvlink-c2c/>`__, which allows the CPU and GPU to
have a coherent memory model. Users can use the ``SamHeadroomMemoryResource`` in the latest
RMM to utilize system memory for storing data. This can help XGBoost utilize memory from
the host for GPU computation, but it may reduce performance due to slower CPU memory speed
and page migration overhead.
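
As a minimal sketch, assuming a recent RMM release that ships
``SamHeadroomMemoryResource`` and an RMM-enabled XGBoost build (the 4 GiB headroom value
is an illustrative choice), the resource can be installed as the current device resource
before XGBoost is configured to use RMM:

.. code-block:: python

    import rmm
    import xgboost as xgb

    # Keep ~4 GiB of device memory as headroom for non-RMM allocations; further
    # allocations may be served from coherent system memory (illustrative value).
    headroom = 4 * 1024**3
    mr = rmm.mr.SamHeadroomMemoryResource(headroom)
    rmm.mr.set_current_device_resource(mr)

    xgb.set_config(use_rmm=True)  # route XGBoost allocations through RMM
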
@@ -50,6 +50,11 @@ Multi-node Multi-GPU Training

XGBoost supports fully distributed GPU training using `Dask <https://dask.org/>`_, ``Spark`` and ``PySpark``. To get started with Dask, see our tutorial :doc:`/tutorials/dask` and the worked examples :doc:`/python/dask-examples/index`, as well as the Python documentation :ref:`dask_api` for a complete reference. For usage with ``Spark`` using Scala, see :doc:`/jvm/xgboost4j_spark_gpu_tutorial`. Lastly, for distributed GPU training with ``PySpark``, see :doc:`/tutorials/spark_estimator`.
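
For orientation, a minimal Dask sketch might look as follows (cluster size, data shapes,
and parameters are illustrative assumptions; ``dask-cuda`` assigns one GPU per worker):

.. code-block:: python

    import dask.array as da
    import xgboost as xgb
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster

    # One worker per local GPU; dask-cuda handles device assignment.
    with LocalCUDACluster() as cluster, Client(cluster) as client:
        X = da.random.random((100_000, 16), chunks=(10_000, 16))
        y = X.sum(axis=1)
        dtrain = xgb.dask.DaskQuantileDMatrix(client, X, label=y)
        output = xgb.dask.train(
            client,
            {"device": "cuda", "tree_method": "hist"},
            dtrain,
            num_boost_round=10,
        )
        booster = output["booster"]
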
RMM integration
===============

XGBoost provides optional support for RMM integration. See :doc:`/python/rmm-examples/index` for more info.
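
As a sketch of one way to enable the integration, assuming an RMM-enabled build (the
2 GiB pool size is illustrative), the global ``use_rmm`` flag can also be set temporarily
with ``config_context`` instead of globally with ``set_config``:

.. code-block:: python

    import rmm
    import xgboost as xgb

    # Pooled RMM resource; the 2 GiB initial size is an illustrative choice.
    rmm.reinitialize(pool_allocator=True, initial_pool_size=2 * 1024**3)

    # Scope the global ``use_rmm`` flag instead of setting it permanently.
    with xgb.config_context(use_rmm=True):
        ...  # construct DMatrix objects and train here
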
Memory usage
============