Document for device ordinal. (#9398)

- Rewrite GPU demos; the notebook is converted to a script to avoid committing additional PNG plots.
- Add GPU demos into the sphinx gallery.
- Add RMM demos into the sphinx gallery.
- Test for firing threads with different device ordinals.
Jiaming Yuan
2023-07-22 15:26:29 +08:00 (committed by GitHub)
parent 22b0a55a04
commit 275da176ba
32 changed files with 351 additions and 398 deletions

doc/.gitignore

@@ -6,3 +6,5 @@ doxygen
parser.py
*.pyc
web-data
# generated by doxygen
tmp


@@ -19,7 +19,6 @@ import sys
import tarfile
import urllib.request
import warnings
from subprocess import call
from urllib.error import HTTPError
from sh.contrib import git
@@ -148,12 +147,20 @@ extensions = [
sphinx_gallery_conf = {
# path to your example scripts
"examples_dirs": [
"../demo/guide-python",
"../demo/dask",
"../demo/aft_survival",
"../demo/gpu_acceleration",
"../demo/rmm_plugin"
],
# path to where to save gallery generated output
"gallery_dirs": [
"python/examples",
"python/dask-examples",
"python/survival-examples",
"python/gpu-examples",
"python/rmm-examples",
],
"matplotlib_animations": True,
}


@@ -23,20 +23,19 @@ The GPU algorithms currently work with CLI, Python, R, and JVM packages. See :do
:caption: Python example
params = dict()
params["device"] = "cuda"
params["tree_method"] = "hist"
Xy = xgboost.QuantileDMatrix(X, y)
xgboost.train(params, Xy)
.. code-block:: python
:caption: With the Scikit-Learn interface
XGBRegressor(tree_method="hist", device="cuda")
GPU-Accelerated SHAP values
=============================
XGBoost makes use of `GPUTreeShap <https://github.com/rapidsai/gputreeshap>`_ as a backend for computing SHAP values when the GPU is used.
.. code-block:: python
@@ -44,12 +43,12 @@ XGBoost makes use of `GPUTreeShap <https://github.com/rapidsai/gputreeshap>`_ as
shap_values = booster.predict(dtrain, pred_contribs=True)
shap_interaction_values = booster.predict(dtrain, pred_interactions=True)
See :ref:`sphx_glr_python_gpu-examples_tree_shap.py` for a worked example.
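As a rough illustration of the shape of the ``pred_contribs`` output (an assumption-based sketch in plain Python, not using XGBoost itself): each row holds one contribution per feature plus a final bias column, and the row sum recovers the model's raw margin prediction for that row.

```python
# Hypothetical SHAP contributions for two rows: three feature columns
# plus the bias term in the last column, mirroring the layout returned
# by ``predict(..., pred_contribs=True)``. The numbers are made up.
contribs = [
    [0.2, -0.1, 0.4, 0.5],   # row 0: features f0..f2, then bias
    [-0.3, 0.0, 0.1, 0.5],   # row 1
]
# Summing each row recovers the raw margin prediction for that row.
margins = [round(sum(row), 6) for row in contribs]
print(margins)  # [1.0, 0.3]
```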
Multi-node Multi-GPU Training
=============================
XGBoost supports fully distributed GPU training using `Dask <https://dask.org/>`_, ``Spark`` and ``PySpark``. To get started with Dask, see our tutorial :doc:`/tutorials/dask` and the worked examples :doc:`/python/dask-examples/index`; see also the Python documentation :ref:`dask_api` for a complete reference. For usage with ``Spark`` using Scala, see :doc:`/jvm/xgboost4j_spark_gpu_tutorial`. Lastly, for distributed GPU training with ``PySpark``, see :doc:`/tutorials/spark_estimator`.
Memory usage
@@ -67,7 +66,8 @@ If you are getting out-of-memory errors on a big dataset, try the or :py:class:`
CPU-GPU Interoperability
========================
The model can be used on any device regardless of the one used to train it. For instance, a model trained using GPU can still work on a CPU-only machine and vice versa. For more information about model serialization, see :doc:`/tutorials/saving_model`.
Developer notes


@@ -189,7 +189,7 @@ This will check out the latest stable version from the Maven Central.
For the latest release version number, please check `release page <https://github.com/dmlc/xgboost/releases>`_.
To enable the GPU algorithm (``device='cuda'``), use artifacts ``xgboost4j-gpu_2.12`` and ``xgboost4j-spark-gpu_2.12`` instead (note the ``gpu`` suffix).
.. note:: Windows not supported in the JVM package
@@ -325,4 +325,4 @@ The SNAPSHOT JARs are hosted by the XGBoost project. Every commit in the ``maste
You can browse the file listing of the Maven repository at https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/list.html.
To enable the GPU algorithm (``device='cuda'``), use artifacts ``xgboost4j-gpu_2.12`` and ``xgboost4j-spark-gpu_2.12`` instead (note the ``gpu`` suffix).


@@ -34,27 +34,6 @@ General Parameters
- Which booster to use. Can be ``gbtree``, ``gblinear`` or ``dart``; ``gbtree`` and ``dart`` use tree based models while ``gblinear`` uses linear functions.
* ``device`` [default= ``cpu``]
.. versionadded:: 2.0.0
@@ -67,6 +46,29 @@ General Parameters
+ ``gpu``: Default GPU device selection from the list of available and supported devices. Only ``cuda`` devices are supported currently.
+ ``gpu:<ordinal>``: Default GPU device selection from the list of available and supported devices. Only ``cuda`` devices are supported currently.
For more information about GPU acceleration, see :doc:`/gpu/index`.
* ``verbosity`` [default=1]
- Verbosity of printing messages. Valid values are 0 (silent), 1 (warning), 2 (info), 3
(debug). Sometimes XGBoost tries to change configurations based on heuristics, which
is displayed as warning message. If there's unexpected behaviour, please try to
increase value of verbosity.
* ``validate_parameters`` [default to ``false``, except for Python, R and CLI interface]
- When set to True, XGBoost will perform validation of input parameters to check whether
a parameter is used or not. A warning is emitted when there is an unknown parameter.
* ``nthread`` [default to maximum number of threads available if not set]
- Number of parallel threads used to run XGBoost. When choosing it, please keep thread
contention and hyperthreading in mind.
* ``disable_default_eval_metric`` [default= ``false``]
- Flag to disable default metric. Set to 1 or ``true`` to disable.
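The accepted forms of the ``device`` parameter described above (``cpu``, ``cuda``, ``cuda:<ordinal>``, ``gpu``, ``gpu:<ordinal>``) can be sketched with a small hypothetical parser; this is an illustration only, not XGBoost's actual implementation, which lives in C++:

```python
def parse_device(device):
    # Split a device string such as "cuda:1" into (type, ordinal).
    # Hypothetical helper for illustration only.
    kind, sep, ordinal = device.partition(":")
    if kind not in {"cpu", "cuda", "gpu"}:
        raise ValueError(f"unknown device type: {kind!r}")
    if not sep:
        # No ordinal given, e.g. "cuda": let the library pick a default.
        return kind, None
    return kind, int(ordinal)

print(parse_device("cuda:1"))  # ('cuda', 1)
```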
Parameters for Tree Booster
===========================
* ``eta`` [default=0.3, alias: ``learning_rate``]
@@ -160,7 +162,7 @@ Parameters for Tree Booster
- ``grow_colmaker``: non-distributed column-based construction of trees.
- ``grow_histmaker``: distributed tree construction with row-based data splitting based on global proposal of histogram counting.
- ``grow_quantile_histmaker``: Grow tree using quantized histogram.
- ``grow_gpu_hist``: Grow tree with GPU. Enabled when ``tree_method`` is set to ``hist`` along with ``device=cuda``.
- ``sync``: synchronizes trees in all distributed nodes.
- ``refresh``: refreshes tree's statistics and/or leaf values based on the current data. Note that no random subsampling of data rows is performed.
- ``prune``: prunes the splits where loss < min_split_loss (or gamma) and nodes that have depth greater than ``max_depth``.


@@ -1,3 +1,5 @@
examples
dask-examples
survival-examples
gpu-examples
rmm-examples


@@ -17,3 +17,5 @@ Contents
examples/index
dask-examples/index
survival-examples/index
gpu-examples/index
rmm-examples/index


@@ -124,7 +124,7 @@ Following table summarizes some differences in supported features between 4 tree
`T` means supported while `F` means unsupported.
+------------------+-----------+---------------------+---------------------+------------------------+
| | Exact | Approx | Hist | Hist (GPU) |
+==================+===========+=====================+=====================+========================+
| grow_policy | Depthwise | depthwise/lossguide | depthwise/lossguide | depthwise/lossguide |
+------------------+-----------+---------------------+---------------------+------------------------+
@@ -141,5 +141,5 @@ Following table summarizes some differences in supported features between 4 tree
Features/parameters that are not mentioned here are universally supported for all 4 tree
methods (for instance, column sampling and constraints). The `P` in external memory means
partially supported. Please note that both categorical data and external memory are
experimental.


@@ -35,8 +35,8 @@ parameter ``enable_categorical``:
.. code:: python
# Supported tree methods are `approx` and `hist`.
clf = xgb.XGBClassifier(tree_method="hist", enable_categorical=True, device="cuda")
# X is the dataframe we created in previous snippet
clf.fit(X, y)
# Must use JSON/UBJSON for serialization, otherwise the information is lost.
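The categorical support relies on the dataframe's category encoding: each distinct category is mapped to an integer code. A rough pure-Python illustration of such a coding (simplified and hypothetical, not how pandas or XGBoost actually implement it):

```python
def encode_categories(values):
    # Map each distinct category to an integer code in order of first
    # appearance (simplified; real category encodings may order differently).
    codes = {}
    encoded = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes)
        encoded.append(codes[v])
    return encoded, codes

encoded, mapping = encode_categories(["a", "b", "a", "c"])
print(encoded)  # [0, 1, 0, 2]
```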


@@ -81,7 +81,7 @@ constructor.
it = Iterator(["file_0.svm", "file_1.svm", "file_2.svm"])
Xy = xgboost.DMatrix(it)
# The ``approx`` tree method also works, but with lower performance. The GPU
# implementation is different from the CPU one, as noted in the following sections.
booster = xgboost.train({"tree_method": "hist"}, Xy)
@@ -118,15 +118,15 @@ to reduce the overhead of file reading.
GPU Version (GPU Hist tree method)
**********************************
External memory is supported by GPU algorithms (i.e. when ``device`` is set to
``cuda``). However, the algorithm used for GPU is different from the one used for
CPU. When training on a CPU, the tree method iterates through all batches from external
memory for each step of the tree construction algorithm. On the other hand, the GPU
algorithm uses a hybrid approach. It iterates through the data at the beginning of
each iteration and concatenates all batches into one in GPU memory for performance
reasons. To reduce overall memory usage, users can utilize subsampling. The GPU hist tree
method supports `gradient-based sampling`, enabling users to set a low sampling rate
without compromising accuracy.
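The hybrid approach described above can be sketched in plain Python: batches are pulled from the external-memory iterator once and concatenated into a single in-memory block (a simplified illustration, not the actual CUDA implementation):

```python
def concatenate_batches(batch_iter):
    # Pull every batch from the external-memory iterator and concatenate
    # them into one flat list, mimicking the GPU algorithm's single
    # concatenated copy kept in device memory.
    data = []
    for batch in batch_iter:
        data.extend(batch)
    return data

batches = iter([[1, 2], [3, 4], [5]])
print(concatenate_batches(batches))  # [1, 2, 3, 4, 5]
```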
.. code-block:: python


@@ -83,13 +83,14 @@ Some other examples:
- ``(0,-1)``: No constraint on the first predictor and a decreasing constraint on the second.
.. note::
**Note for the 'hist' tree construction algorithm**. If ``tree_method`` is set to
either ``hist`` or ``approx``, enabling monotonic constraints may produce unnecessarily
shallow trees. This is because the ``hist`` method reduces the number of candidate
splits to be considered at each split. Monotonic constraints may wipe out all available
split candidates, in which case no split is made. To reduce the effect, you may want to
increase the ``max_bin`` parameter to consider more split candidates.
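The effect described in the note can be sketched in plain Python: a split candidate survives only if its resulting leaf values respect the requested direction, so with few candidates (a small ``max_bin``) a constraint can eliminate all of them and no split is made. This is a hypothetical simplified filter, not XGBoost's actual split-finding code:

```python
def feasible_splits(candidates, direction):
    # Keep a (left_leaf, right_leaf) candidate only when it respects the
    # constraint: +1 requires right >= left, -1 requires right <= left.
    if direction == 1:
        return [c for c in candidates if c[1] >= c[0]]
    return [c for c in candidates if c[1] <= c[0]]

# An increasing constraint wipes out every candidate here, so no split
# is made and the tree stays shallow.
candidates = [(0.5, 0.2), (0.8, 0.1)]
print(feasible_splits(candidates, 1))  # []
```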
*******************


@@ -38,10 +38,6 @@ There are in general two ways that you can control overfitting in XGBoost:
- This includes ``subsample`` and ``colsample_bytree``.
- You can also reduce stepsize ``eta``. Remember to increase ``num_round`` when you do so.
***************************
Faster training performance
***************************
There's a parameter called ``tree_method``; set it to ``hist`` or ``gpu_hist`` for faster computation.
*************************
Handle Imbalanced Dataset


@@ -50,13 +50,14 @@ Here is a sample parameter dictionary for training a random forest on a GPU usin
xgboost::
params = {
"colsample_bynode": 0.8,
"learning_rate": 1,
"max_depth": 5,
"num_parallel_tree": 100,
"objective": "binary:logistic",
"subsample": 0.8,
"tree_method": "hist",
"device": "cuda",
}
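As a quick sanity check of the configuration above: a random forest keeps ``learning_rate`` at 1 (no shrinkage) and grows all of its trees in parallel via ``num_parallel_tree``. A hypothetical pure-Python validation helper, not part of XGBoost:

```python
params = {
    "colsample_bynode": 0.8,
    "learning_rate": 1,
    "max_depth": 5,
    "num_parallel_tree": 100,
    "objective": "binary:logistic",
    "subsample": 0.8,
    "tree_method": "hist",
    "device": "cuda",
}

def is_random_forest_config(p):
    # A random forest uses no shrinkage and more than one parallel tree.
    return p.get("learning_rate") == 1 and p.get("num_parallel_tree", 1) > 1

print(is_random_forest_config(params))  # True
```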
A random forest model can then be trained as follows::


@@ -174,7 +174,7 @@ Will print out something similar to (not actual output as it's too long for demo
"gbtree_train_param": {
"num_parallel_tree": "1",
"process_type": "default",
"tree_method": "hist",
"updater": "grow_gpu_hist",
"updater_seq": "grow_gpu_hist"
},