Document for device ordinal. (#9398)
- Rewrite GPU demos. Notebooks are converted to scripts to avoid committing additional PNG plots.
- Add GPU demos to the Sphinx gallery.
- Add RMM demos to the Sphinx gallery.
- Add a test for firing threads with different device ordinals.
doc/.gitignore (vendored): 2 changes
@@ -6,3 +6,5 @@ doxygen
 parser.py
 *.pyc
 web-data
+# generated by doxygen
+tmp
doc/conf.py: 11 changes
@@ -19,7 +19,6 @@ import sys
 import tarfile
 import urllib.request
 import warnings
 from subprocess import call
 from urllib.error import HTTPError

 from sh.contrib import git
@@ -148,12 +147,20 @@ extensions = [

 sphinx_gallery_conf = {
     # path to your example scripts
-    "examples_dirs": ["../demo/guide-python", "../demo/dask", "../demo/aft_survival"],
+    "examples_dirs": [
+        "../demo/guide-python",
+        "../demo/dask",
+        "../demo/aft_survival",
+        "../demo/gpu_acceleration",
+        "../demo/rmm_plugin"
+    ],
     # path to where to save gallery generated output
     "gallery_dirs": [
         "python/examples",
         "python/dask-examples",
         "python/survival-examples",
+        "python/gpu-examples",
+        "python/rmm-examples",
     ],
     "matplotlib_animations": True,
 }
@@ -23,20 +23,19 @@ The GPU algorithms currently work with CLI, Python, R, and JVM packages. See :do
   :caption: Python example

   params = dict()
-  params["device"] = "cuda:0"
+  params["device"] = "cuda"
   params["tree_method"] = "hist"
   Xy = xgboost.QuantileDMatrix(X, y)
   xgboost.train(params, Xy)

 .. code-block:: python
-  :caption: With Scikit-Learn interface
+  :caption: With the Scikit-Learn interface

   XGBRegressor(tree_method="hist", device="cuda")
GPU-Accelerated SHAP values
===========================
-XGBoost makes use of `GPUTreeShap <https://github.com/rapidsai/gputreeshap>`_ as a backend for computing shap values when the GPU predictor is selected.
+XGBoost makes use of `GPUTreeShap <https://github.com/rapidsai/gputreeshap>`_ as a backend for computing SHAP values when the GPU is used.

 .. code-block:: python

@@ -44,12 +43,12 @@ XGBoost makes use of `GPUTreeShap <https://github.com/rapidsai/gputreeshap>`_ as
   shap_values = booster.predict(dtrain, pred_contribs=True)
   shap_interaction_values = booster.predict(dtrain, pred_interactions=True)

-See examples `here <https://github.com/dmlc/xgboost/tree/master/demo/gpu_acceleration>`__.
+See :ref:`sphx_glr_python_gpu-examples_tree_shap.py` for a worked example.
Multi-node Multi-GPU Training
=============================

-XGBoost supports fully distributed GPU training using `Dask <https://dask.org/>`_, ``Spark`` and ``PySpark``. For getting started with Dask see our tutorial :doc:`/tutorials/dask` and worked examples `here <https://github.com/dmlc/xgboost/tree/master/demo/dask>`__, also Python documentation :ref:`dask_api` for complete reference. For usage with ``Spark`` using Scala see :doc:`/jvm/xgboost4j_spark_gpu_tutorial`. Lastly for distributed GPU training with ``PySpark``, see :doc:`/tutorials/spark_estimator`.
+XGBoost supports fully distributed GPU training using `Dask <https://dask.org/>`_, ``Spark``, and ``PySpark``. To get started with Dask, see our tutorial :doc:`/tutorials/dask` and the worked examples :doc:`/python/dask-examples/index`; see the Python documentation :ref:`dask_api` for a complete reference. For usage with ``Spark`` using Scala, see :doc:`/jvm/xgboost4j_spark_gpu_tutorial`. Lastly, for distributed GPU training with ``PySpark``, see :doc:`/tutorials/spark_estimator`.

Memory usage
============

@@ -67,7 +66,8 @@ If you are getting out-of-memory errors on a big dataset, try the or :py:class:`

CPU-GPU Interoperability
========================
-XGBoost models trained on GPUs can be used on CPU-only systems to generate predictions. For information about how to save and load an XGBoost model, see :doc:`/tutorials/saving_model`.
+The model can be used on any device regardless of the one used to train it. For instance, a model trained using GPU can still work on a CPU-only machine and vice versa. For more information about model serialization, see :doc:`/tutorials/saving_model`.

Developer notes
@@ -189,7 +189,7 @@ This will check out the latest stable version from the Maven Central.

 For the latest release version number, please check `release page <https://github.com/dmlc/xgboost/releases>`_.

-To enable the GPU algorithm (``tree_method='gpu_hist'``), use artifacts ``xgboost4j-gpu_2.12`` and ``xgboost4j-spark-gpu_2.12`` instead (note the ``gpu`` suffix).
+To enable the GPU algorithm (``device='cuda'``), use artifacts ``xgboost4j-gpu_2.12`` and ``xgboost4j-spark-gpu_2.12`` instead (note the ``gpu`` suffix).

 .. note:: Windows not supported in the JVM package

@@ -325,4 +325,4 @@ The SNAPSHOT JARs are hosted by the XGBoost project. Every commit in the ``maste

 You can browse the file listing of the Maven repository at https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/list.html.

-To enable the GPU algorithm (``tree_method='gpu_hist'``), use artifacts ``xgboost4j-gpu_2.12`` and ``xgboost4j-spark-gpu_2.12`` instead (note the ``gpu`` suffix).
+To enable the GPU algorithm (``device='cuda'``), use artifacts ``xgboost4j-gpu_2.12`` and ``xgboost4j-spark-gpu_2.12`` instead (note the ``gpu`` suffix).
@@ -34,27 +34,6 @@ General Parameters

 - Which booster to use. Can be ``gbtree``, ``gblinear`` or ``dart``; ``gbtree`` and ``dart`` use tree based models while ``gblinear`` uses linear functions.

-* ``verbosity`` [default=1]
-
-  - Verbosity of printing messages. Valid values are 0 (silent), 1 (warning), 2 (info), 3
-    (debug). Sometimes XGBoost tries to change configurations based on heuristics, which
-    is displayed as warning message. If there's unexpected behaviour, please try to
-    increase value of verbosity.
-
-* ``validate_parameters`` [default to ``false``, except for Python, R and CLI interface]
-
-  - When set to True, XGBoost will perform validation of input parameters to check whether
-    a parameter is used or not.
-
-* ``nthread`` [default to maximum number of threads available if not set]
-
-  - Number of parallel threads used to run XGBoost. When choosing it, please keep thread
-    contention and hyperthreading in mind.
-
-* ``disable_default_eval_metric`` [default= ``false``]
-
-  - Flag to disable default metric. Set to 1 or ``true`` to disable.

 * ``device`` [default= ``cpu``]

   .. versionadded:: 2.0.0
@@ -67,6 +46,29 @@ General Parameters

+  - ``gpu``: Default GPU device selection from the list of available and supported devices. Only ``cuda`` devices are supported currently.
+  - ``gpu:<ordinal>``: Default GPU device selection from the list of available and supported devices. Only ``cuda`` devices are supported currently.

   For more information about GPU acceleration, see :doc:`/gpu/index`.
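The accepted ``device`` strings above can be summarized with a small validation helper. This is an illustrative sketch only; the helper is hypothetical and not part of the XGBoost API:

```python
# Hypothetical helper mirroring the documented ``device`` values.
def is_valid_device(device: str) -> bool:
    """Check whether ``device`` matches one of the documented forms."""
    if device in ("cpu", "cuda", "gpu"):
        return True
    # ``cuda:<ordinal>`` and ``gpu:<ordinal>`` select a specific device by ordinal.
    for prefix in ("cuda:", "gpu:"):
        if device.startswith(prefix) and device[len(prefix):].isdigit():
            return True
    return False

print(is_valid_device("cuda:0"))  # True: the first CUDA device
print(is_valid_device("tpu"))     # False: not a documented value
```

The ordinal is a plain non-negative integer, so anything after the colon that is not all digits is rejected.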
+* ``verbosity`` [default=1]
+
+  - Verbosity of printing messages. Valid values are 0 (silent), 1 (warning), 2 (info), 3
+    (debug). Sometimes XGBoost tries to change configurations based on heuristics, which
+    is displayed as a warning message. If there's unexpected behaviour, please try to
+    increase the value of verbosity.
+
+* ``validate_parameters`` [default to ``false``, except for Python, R and CLI interface]
+
+  - When set to True, XGBoost will perform validation of input parameters to check whether
+    a parameter is used or not. A warning is emitted when there's an unknown parameter.
+
+* ``nthread`` [default to maximum number of threads available if not set]
+
+  - Number of parallel threads used to run XGBoost. When choosing it, please keep thread
+    contention and hyperthreading in mind.
+
+* ``disable_default_eval_metric`` [default= ``false``]
+
+  - Flag to disable default metric. Set to 1 or ``true`` to disable.

 Parameters for Tree Booster
 ===========================
 * ``eta`` [default=0.3, alias: ``learning_rate``]
@@ -160,7 +162,7 @@ Parameters for Tree Booster

   - ``grow_colmaker``: non-distributed column-based construction of trees.
   - ``grow_histmaker``: distributed tree construction with row-based data splitting based on global proposal of histogram counting.
   - ``grow_quantile_histmaker``: Grow tree using quantized histogram.
-  - ``grow_gpu_hist``: Grow tree with GPU. Same as setting ``tree_method`` to ``hist`` and use ``device=cuda``.
+  - ``grow_gpu_hist``: Grow tree with GPU. Enabled when ``tree_method`` is set to ``hist`` along with ``device=cuda``.
   - ``sync``: synchronizes trees in all distributed nodes.
   - ``refresh``: refreshes tree's statistics and/or leaf values based on the current data. Note that no random subsampling of data rows is performed.
   - ``prune``: prunes the splits where loss < min_split_loss (or gamma) and nodes that have depth greater than ``max_depth``.
doc/python/.gitignore (vendored): 4 changes
@@ -1,3 +1,5 @@
 examples
 dask-examples
 survival-examples
+gpu-examples
+rmm-examples
@@ -17,3 +17,5 @@ Contents

   examples/index
   dask-examples/index
   survival-examples/index
+  gpu-examples/index
+  rmm-examples/index
@@ -124,7 +124,7 @@ Following table summarizes some differences in supported features between 4 tree
 `T` means supported while `F` means unsupported.

 +------------------+-----------+---------------------+---------------------+------------------------+
-|                  | Exact     | Approx              | Hist                | GPU Hist               |
+|                  | Exact     | Approx              | Hist                | Hist (GPU)             |
 +==================+===========+=====================+=====================+========================+
 | grow_policy      | Depthwise | depthwise/lossguide | depthwise/lossguide | depthwise/lossguide    |
 +------------------+-----------+---------------------+---------------------+------------------------+

@@ -141,5 +141,5 @@ Following table summarizes some differences in supported features between 4 tree

 Features/parameters that are not mentioned here are universally supported for all 4 tree
 methods (for instance, column sampling and constraints). The `P` in external memory means
-partially supported. Please note that both categorical data and external memory are
+special handling. Please note that both categorical data and external memory are
 experimental.
@@ -35,8 +35,8 @@ parameter ``enable_categorical``:

 .. code:: python

-  # Supported tree methods are `gpu_hist`, `approx`, and `hist`.
-  clf = xgb.XGBClassifier(tree_method="gpu_hist", enable_categorical=True)
+  # Supported tree methods are `approx` and `hist`.
+  clf = xgb.XGBClassifier(tree_method="hist", enable_categorical=True, device="cuda")
   # X is the dataframe we created in previous snippet
   clf.fit(X, y)
   # Must use JSON/UBJSON for serialization, otherwise the information is lost.
@@ -81,7 +81,7 @@ constructor.

   it = Iterator(["file_0.svm", "file_1.svm", "file_2.svm"])
   Xy = xgboost.DMatrix(it)

-  # Other tree methods including ``hist`` and ``gpu_hist`` also work, but has some caveats
+  # The ``approx`` tree method also works, but with lower performance. The GPU implementation differs from the CPU one,
   # as noted in following sections.
   booster = xgboost.train({"tree_method": "hist"}, Xy)
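The ``Iterator`` above follows the next/reset protocol used by XGBoost's external-memory iterators. Below is a library-free sketch of that contract; the class name and callback shape are illustrative only, and a real iterator must subclass ``xgboost.DataIter`` and load actual data batches:

```python
class FileIterator:
    """Minimal sketch of the next/reset contract of an external-memory iterator."""

    def __init__(self, files):
        self._files = files
        self._it = 0

    def next(self, input_data):
        # Return 0 when the data source is exhausted, 1 otherwise.
        if self._it == len(self._files):
            return 0
        input_data(self._files[self._it])  # hand one batch to the callback
        self._it += 1
        return 1

    def reset(self):
        # Called before each new pass over the batches.
        self._it = 0


batches = []
it = FileIterator(["file_0.svm", "file_1.svm", "file_2.svm"])
while it.next(batches.append):
    pass
print(batches)  # ['file_0.svm', 'file_1.svm', 'file_2.svm']
```

XGBoost drives the iterator the same way: it repeatedly calls ``next`` with a callback that consumes one batch, and calls ``reset`` when it needs another pass over the data.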
@@ -118,15 +118,15 @@ to reduce the overhead of file reading.

 GPU Version (GPU Hist tree method)
 **********************************

-External memory is supported by GPU algorithms (i.e. when ``tree_method`` is set to
-``gpu_hist``). However, the algorithm used for GPU is different from the one used for
+External memory is supported by GPU algorithms (i.e. when ``device`` is set to
+``cuda``). However, the algorithm used for GPU is different from the one used for
 CPU. When training on a CPU, the tree method iterates through all batches from external
 memory for each step of the tree construction algorithm. On the other hand, the GPU
 algorithm uses a hybrid approach. It iterates through the data during the beginning of
-each iteration and concatenates all batches into one in GPU memory. To reduce overall
-memory usage, users can utilize subsampling. The GPU hist tree method supports
-`gradient-based sampling`, enabling users to set a low sampling rate without compromising
-accuracy.
+each iteration and concatenates all batches into one in GPU memory for performance
+reasons. To reduce overall memory usage, users can utilize subsampling. The GPU hist tree
+method supports `gradient-based sampling`, enabling users to set a low sampling rate
+without compromising accuracy.

 .. code-block:: python
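Expressed as a parameter dictionary, gradient-based sampling pairs a low ``subsample`` rate with ``sampling_method`` (a sketch; it assumes a CUDA device is available, and the exact subsample value is illustrative):

```python
# Sketch of GPU external-memory training parameters with gradient-based sampling.
params = {
    "tree_method": "hist",
    "device": "cuda",
    # Gradient-based sampling preserves accuracy even at low sampling rates.
    "sampling_method": "gradient_based",
    "subsample": 0.2,
}
```

A uniform ``subsample`` this low would normally hurt accuracy; gradient-based sampling compensates by preferentially keeping rows with large gradients.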
@@ -83,13 +83,14 @@ Some other examples:

 - ``(0,-1)``: No constraint on the first predictor and a decreasing constraint on the second.

-**Note for the 'hist' tree construction algorithm**.
-If ``tree_method`` is set to either ``hist``, ``approx`` or ``gpu_hist``, enabling
-monotonic constraints may produce unnecessarily shallow trees. This is because the
-``hist`` method reduces the number of candidate splits to be considered at each
-split. Monotonic constraints may wipe out all available split candidates, in which case no
-split is made. To reduce the effect, you may want to increase the ``max_bin`` parameter to
-consider more split candidates.
+.. note::
+
+   **Note for the 'hist' tree construction algorithm**. If ``tree_method`` is set to
+   either ``hist`` or ``approx``, enabling monotonic constraints may produce unnecessarily
+   shallow trees. This is because the ``hist`` method reduces the number of candidate
+   splits to be considered at each split. Monotonic constraints may wipe out all available
+   split candidates, in which case no split is made. To reduce the effect, you may want to
+   increase the ``max_bin`` parameter to consider more split candidates.
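As a concrete parameter dictionary for the two-feature example above (a sketch; the ``max_bin`` value is illustrative):

```python
# Sketch of hist-method parameters combining a monotonic constraint
# with a larger bin count to keep more candidate splits.
params = {
    "tree_method": "hist",
    # No constraint on the first predictor, decreasing constraint on the second.
    "monotone_constraints": "(0,-1)",
    # More bins give the constrained learner more candidate splits to choose from.
    "max_bin": 512,
}
```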
*******************

@@ -38,10 +38,6 @@ There are in general two ways that you can control overfitting in XGBoost:

 - This includes ``subsample`` and ``colsample_bytree``.
 - You can also reduce stepsize ``eta``. Remember to increase ``num_round`` when you do so.

-***************************
-Faster training performance
-***************************
-There's a parameter called ``tree_method``, set it to ``hist`` or ``gpu_hist`` for faster computation.

 *************************
 Handle Imbalanced Dataset
@@ -50,13 +50,14 @@ Here is a sample parameter dictionary for training a random forest on a GPU usin
 xgboost::

   params = {
-    'colsample_bynode': 0.8,
-    'learning_rate': 1,
-    'max_depth': 5,
-    'num_parallel_tree': 100,
-    'objective': 'binary:logistic',
-    'subsample': 0.8,
-    'tree_method': 'gpu_hist'
+    "colsample_bynode": 0.8,
+    "learning_rate": 1,
+    "max_depth": 5,
+    "num_parallel_tree": 100,
+    "objective": "binary:logistic",
+    "subsample": 0.8,
+    "tree_method": "hist",
+    "device": "cuda",
   }

 A random forest model can then be trained as follows::
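Putting the updated dictionary together with the training call (a sketch; it assumes ``xgboost`` is installed, a CUDA device is present, and ``dtrain`` is an existing ``DMatrix``, so the training line is shown commented out):

```python
params = {
    "colsample_bynode": 0.8,
    "learning_rate": 1,
    "max_depth": 5,
    "num_parallel_tree": 100,
    "objective": "binary:logistic",
    "subsample": 0.8,
    "tree_method": "hist",
    "device": "cuda",
}

# A random forest grows all of its trees in a single boosting round:
# booster = xgboost.train(params, dtrain, num_boost_round=1)
```

With ``num_parallel_tree`` set to 100 and one boosting round, the result is a forest of 100 independently sampled trees rather than a boosted sequence.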
@@ -174,7 +174,7 @@ Will print out something similar to (not actual output as it's too long for demo

     "gbtree_train_param": {
       "num_parallel_tree": "1",
       "process_type": "default",
-      "tree_method": "gpu_hist",
+      "tree_method": "hist",
       "updater": "grow_gpu_hist",
       "updater_seq": "grow_gpu_hist"
     },
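Since the saved configuration is plain JSON, it can be inspected programmatically. A sketch using the fragment above (the full ``booster.save_config()`` output contains many more fields):

```python
import json

# A fragment of the configuration JSON shown above.
config_fragment = """
{
  "gbtree_train_param": {
    "num_parallel_tree": "1",
    "process_type": "default",
    "tree_method": "hist",
    "updater": "grow_gpu_hist",
    "updater_seq": "grow_gpu_hist"
  }
}
"""

config = json.loads(config_fragment)
# Note: values are serialized as strings in the configuration JSON.
print(config["gbtree_train_param"]["tree_method"])  # hist
```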