Define the new device parameter. (#9362)

This commit is contained in:
Jiaming Yuan
2023-07-13 19:30:25 +08:00
committed by GitHub
parent 2d0cd2817e
commit 04aff3af8e
63 changed files with 827 additions and 477 deletions


@@ -22,7 +22,8 @@ Supported parameters
GPU accelerated prediction is enabled by default for the above mentioned ``tree_method`` parameters but can be switched to CPU prediction by setting ``predictor`` to ``cpu_predictor``. This could be useful if you want to conserve GPU memory. Likewise when using CPU algorithms, GPU accelerated prediction can be enabled by setting ``predictor`` to ``gpu_predictor``.
The device ordinal (which GPU to use if you have many of them) can be selected using the
``gpu_id`` parameter, which defaults to 0 (the first device reported by CUDA runtime).
``device`` parameter, which defaults to 0 when ``cuda`` is specified (the first device reported by the
CUDA runtime).
The GPU algorithms currently work with CLI, Python, R, and JVM packages. See :doc:`/install` for details.
@@ -30,13 +31,13 @@ The GPU algorithms currently work with CLI, Python, R, and JVM packages. See :do
.. code-block:: python
:caption: Python example
param['gpu_id'] = 0
param["device"] = "cuda:0"
param['tree_method'] = 'gpu_hist'
.. code-block:: python
:caption: With Scikit-Learn interface
XGBRegressor(tree_method='gpu_hist', gpu_id=0)
XGBRegressor(tree_method='gpu_hist', device="cuda")
GPU-Accelerated SHAP values
@@ -45,7 +46,7 @@ XGBoost makes use of `GPUTreeShap <https://github.com/rapidsai/gputreeshap>`_ as
.. code-block:: python
model.set_param({"gpu_id": "0", "tree_method": "gpu_hist"})
model.set_param({"device": "cuda:0", "tree_method": "gpu_hist"})
shap_values = model.predict(dtrain, pred_contribs=True)
shap_interaction_values = model.predict(dtrain, pred_interactions=True)
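The renames in these hunks (``gpu_id`` to ``device``, and elsewhere in this commit ``gpu_hist`` to ``hist`` plus ``device=cuda``) can be sketched as a small parameter migration. ``migrate_gpu_params`` is a hypothetical helper written for illustration; it is not part of XGBoost, and treating a negative ``gpu_id`` as "no GPU requested" is an assumption.

```python
def migrate_gpu_params(params):
    """Return a copy of ``params`` that uses ``device`` instead of legacy GPU settings.

    Hypothetical helper for illustration only; not an XGBoost API.
    """
    new = dict(params)
    gpu_id = new.pop("gpu_id", None)
    # Assumption: a negative ``gpu_id`` means no GPU was requested.
    if gpu_id is not None and int(gpu_id) < 0:
        gpu_id = None
    uses_gpu_hist = new.get("tree_method") == "gpu_hist"
    if uses_gpu_hist:
        # ``gpu_hist`` corresponds to ``hist`` running on a CUDA device.
        new["tree_method"] = "hist"
    if "device" not in new and (uses_gpu_hist or gpu_id is not None):
        new["device"] = "cuda" if gpu_id is None else f"cuda:{int(gpu_id)}"
    return new


legacy = {"gpu_id": "0", "tree_method": "gpu_hist"}
print(migrate_gpu_params(legacy))
```

An explicit ``device`` entry is left untouched, so the helper can be applied to dictionaries that were already updated.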


@@ -3,10 +3,10 @@ Installation Guide
##################
XGBoost provides binary packages for some language bindings. The binary packages support
the GPU algorithm (``gpu_hist``) on machines with NVIDIA GPUs. Please note that **training
with multiple GPUs is only supported for Linux platform**. See :doc:`gpu/index`. Also we
have both stable releases and nightly builds, see below for how to install them. For
building from source, visit :doc:`this page </build>`.
the GPU algorithm (``device=cuda:0``) on machines with NVIDIA GPUs. Please note that
**training with multiple GPUs is only supported on the Linux platform**. See
:doc:`gpu/index`. We provide both stable releases and nightly builds; see below for how
to install them. For building from source, visit :doc:`this page </build>`.
.. contents:: Contents


@@ -59,6 +59,18 @@ General Parameters
- Feature dimension used in boosting, set to maximum dimension of the feature
* ``device`` [default= ``cpu``]
.. versionadded:: 2.0.0
- Device for XGBoost to run on. Users can set it to one of the following values:
+ ``cpu``: Use CPU.
+ ``cuda``: Use a GPU (CUDA device).
+ ``cuda:<ordinal>``: ``<ordinal>`` is an integer that specifies the ordinal of the GPU (which GPU to use if you have more than one device).
+ ``gpu``: Default GPU device selection from the list of available and supported devices. Only ``cuda`` devices are supported currently.
+ ``gpu:<ordinal>``: Same as ``gpu``, with ``<ordinal>`` specifying the ordinal of the GPU. Only ``cuda`` devices are supported currently.
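The accepted forms of the ``device`` string listed above can be illustrated with a small parser. This is a minimal sketch, not XGBoost's own parsing code: ``parse_device`` is a hypothetical name, and both rejecting an ordinal on ``cpu`` and returning ``None`` when no ordinal is given are assumptions.

```python
def parse_device(spec):
    """Split a device string into ``(kind, ordinal)``.

    Hypothetical helper for illustration only; not XGBoost's parser.
    ``ordinal`` is ``None`` when the string does not name one.
    """
    kind, sep, ordinal = spec.partition(":")
    if kind not in ("cpu", "cuda", "gpu"):
        raise ValueError(f"unknown device: {spec!r}")
    if not sep:
        return kind, None
    # Assumption: only GPU devices take an ordinal, and it must be a
    # non-negative integer.
    if kind == "cpu" or not ordinal.isdigit():
        raise ValueError(f"invalid device ordinal in {spec!r}")
    return kind, int(ordinal)


for spec in ("cpu", "cuda", "cuda:1", "gpu:0"):
    print(spec, "->", parse_device(spec))
```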
Parameters for Tree Booster
===========================
* ``eta`` [default=0.3, alias: ``learning_rate``]
@@ -99,7 +111,7 @@ Parameters for Tree Booster
- ``gradient_based``: the selection probability for each training instance is proportional to the
*regularized absolute value* of gradients (more specifically, :math:`\sqrt{g^2+\lambda h^2}`).
``subsample`` may be set to as low as 0.1 without loss of model accuracy. Note that this
sampling method is only supported when ``tree_method`` is set to ``gpu_hist``; other tree
sampling method is only supported when ``tree_method`` is set to ``hist`` and the device is ``cuda``; other tree
methods only support ``uniform`` sampling.
* ``colsample_bytree``, ``colsample_bylevel``, ``colsample_bynode`` [default=1]
@@ -131,26 +143,15 @@ Parameters for Tree Booster
* ``tree_method`` string [default= ``auto``]
- The tree construction algorithm used in XGBoost. See description in the `reference paper <http://arxiv.org/abs/1603.02754>`_ and :doc:`treemethod`.
- XGBoost supports ``approx``, ``hist`` and ``gpu_hist`` for distributed training. Experimental support for external memory is available for ``approx`` and ``gpu_hist``.
- Choices: ``auto``, ``exact``, ``approx``, ``hist``, ``gpu_hist``, this is a
combination of commonly used updaters. For other updaters like ``refresh``, set the
parameter ``updater`` directly.
- Choices: ``auto``, ``exact``, ``approx``, ``hist``. Each is a combination of commonly
used updaters. For other updaters like ``refresh``, set the parameter ``updater``
directly.
- ``auto``: Use heuristic to choose the fastest method.
- For small dataset, exact greedy (``exact``) will be used.
- For larger dataset, approximate algorithm (``approx``) will be chosen. It's
recommended to try ``hist`` and ``gpu_hist`` for higher performance with large
dataset.
(``gpu_hist``)has support for ``external memory``.
- Because old behavior is always use exact greedy in single machine, user will get a
message when approximate algorithm is chosen to notify this choice.
- ``auto``: Same as the ``hist`` tree method.
- ``exact``: Exact greedy algorithm. Enumerates all split candidates.
- ``approx``: Approximate greedy algorithm using quantile sketch and gradient histogram.
- ``hist``: Faster histogram optimized approximate greedy algorithm.
- ``gpu_hist``: GPU implementation of ``hist`` algorithm.
* ``scale_pos_weight`` [default=1]
@@ -163,7 +164,7 @@ Parameters for Tree Booster
- ``grow_colmaker``: non-distributed column-based construction of trees.
- ``grow_histmaker``: distributed tree construction with row-based data splitting based on global proposal of histogram counting.
- ``grow_quantile_histmaker``: Grow tree using quantized histogram.
- ``grow_gpu_hist``: Grow tree with GPU.
- ``grow_gpu_hist``: Grow tree with GPU. Same as setting ``tree_method`` to ``hist`` and ``device`` to ``cuda``.
- ``sync``: synchronizes trees in all distributed nodes.
- ``refresh``: refreshes tree's statistics and/or leaf values based on the current data. Note that no random subsampling of data rows is performed.
- ``prune``: prunes the splits where loss < min_split_loss (or gamma) and nodes that have depth greater than ``max_depth``.
@@ -183,7 +184,7 @@ Parameters for Tree Booster
* ``grow_policy`` [default= ``depthwise``]
- Controls a way new nodes are added to the tree.
- Currently supported only if ``tree_method`` is set to ``hist``, ``approx`` or ``gpu_hist``.
- Currently supported only if ``tree_method`` is set to ``hist`` or ``approx``.
- Choices: ``depthwise``, ``lossguide``
- ``depthwise``: split at nodes closest to the root.
@@ -195,7 +196,7 @@ Parameters for Tree Booster
* ``max_bin``, [default=256]
- Only used if ``tree_method`` is set to ``hist``, ``approx`` or ``gpu_hist``.
- Only used if ``tree_method`` is set to ``hist`` or ``approx``.
- Maximum number of discrete bins to bucket continuous features.
- Increasing this number improves the optimality of splits at the cost of higher computation time.


@@ -3,14 +3,14 @@ Tree Methods
############
For training boosted tree models, there are 2 parameters used for choosing algorithms,
namely ``updater`` and ``tree_method``. XGBoost has 4 builtin tree methods, namely
``exact``, ``approx``, ``hist`` and ``gpu_hist``. Along with these tree methods, there
are also some free standing updaters including ``refresh``,
``prune`` and ``sync``. The parameter ``updater`` is more primitive than ``tree_method``
as the latter is just a pre-configuration of the former. The difference is mostly due to
historical reasons that each updater requires some specific configurations and might has
missing features. As we are moving forward, the gap between them is becoming more and
more irrelevant. We will collectively document them under tree methods.
namely ``updater`` and ``tree_method``. XGBoost has 3 builtin tree methods, namely
``exact``, ``approx`` and ``hist``. Along with these tree methods, there are also some
free standing updaters including ``refresh``, ``prune`` and ``sync``. The parameter
``updater`` is more primitive than ``tree_method`` as the latter is just a
pre-configuration of the former. The difference is mostly historical: each updater
requires some specific configuration and might have missing features. As we
are moving forward, the gap between them is becoming more and more irrelevant. We will
collectively document them under tree methods.
**************
Exact Solution
@@ -19,23 +19,23 @@ Exact Solution
Exact means XGBoost considers all candidates from data for tree splitting, but underlying
the objective is still interpreted as a Taylor expansion.
1. ``exact``: Vanilla gradient boosting tree algorithm described in `reference paper
<http://arxiv.org/abs/1603.02754>`_. During each split finding procedure, it iterates
over all entries of input data. It's more accurate (among other greedy methods) but
slow in computation performance. Also it doesn't support distributed training as
XGBoost employs row spliting data distribution while ``exact`` tree method works on a
sorted column format. This tree method can be used with parameter ``tree_method`` set
to ``exact``.
1. ``exact``: The vanilla gradient boosting tree algorithm described in the `reference paper
<http://arxiv.org/abs/1603.02754>`_. During split-finding, it iterates over all
entries of input data. It's more accurate (among other greedy methods) but
computationally slower than the other tree methods. Furthermore, its feature
set is limited. Features like distributed training and external memory that require
approximated quantiles are not supported. This tree method can be used with the
parameter ``tree_method`` set to ``exact``.
**********************
Approximated Solutions
**********************
As ``exact`` tree method is slow in performance and not scalable, we often employ
approximated training algorithms. These algorithms build a gradient histogram for each
node and iterate through the histogram instead of real dataset. Here we introduce the
implementations in XGBoost below.
As the ``exact`` tree method is computationally slow and difficult to scale, we
often employ approximated training algorithms. These algorithms build a gradient
histogram for each node and iterate through the histogram instead of the real dataset. Here
we introduce the implementations in XGBoost.
1. ``approx`` tree method: An approximation tree method described in `reference paper
<http://arxiv.org/abs/1603.02754>`_. It runs sketching before building each tree
@@ -48,22 +48,18 @@ implementations in XGBoost below.
this global sketch. This is the fastest algorithm as it runs sketching only once. The
algorithm can be accessed by setting ``tree_method`` to ``hist``.
3. ``gpu_hist`` tree method: The ``gpu_hist`` tree method is a GPU implementation of
``hist``, with additional support for gradient based sampling. The algorithm can be
accessed by setting ``tree_method`` to ``gpu_hist``.
************
Implications
************
Some objectives like ``reg:squarederror`` have constant hessian. In this case, ``hist``
or ``gpu_hist`` should be preferred as weighted sketching doesn't make sense with constant
Some objectives like ``reg:squarederror`` have a constant hessian. In this case,
``hist`` should be preferred as weighted sketching doesn't make sense with constant
weights. When using non-constant hessian objectives, sometimes ``approx`` yields better
accuracy, but with slower computation performance. Most of the time using ``(gpu)_hist``
with higher ``max_bin`` can achieve similar or even superior accuracy while maintaining
good performance. However, as xgboost is largely driven by community effort, the actual
implementations have some differences than pure math description. Result might have
slight differences than expectation, which we are currently trying to overcome.
accuracy, but with slower computation performance. Most of the time using ``hist`` with
higher ``max_bin`` can achieve similar or even superior accuracy while maintaining good
performance. However, as XGBoost is largely driven by community effort, the actual
implementations differ somewhat from the pure mathematical description. Results might
differ slightly from expectations, which we are currently trying to overcome.
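As a rough sketch of the recommendation above, a parameter dictionary for a constant-hessian objective might combine ``hist`` with a larger ``max_bin``. The value 512 is an illustrative assumption, not a tuned or recommended setting.

```python
# Illustrative values only; tune ``max_bin`` for your own data.
params = {
    "objective": "reg:squarederror",  # constant hessian
    "tree_method": "hist",
    "device": "cuda",  # or "cpu"
    "max_bin": 512,  # the documented default is 256
}

# With XGBoost installed, such a dictionary is passed to training, e.g.:
#   booster = xgboost.train(params, dtrain, num_boost_round=100)
print(params)
```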
**************
Other Updaters
@@ -106,8 +102,8 @@ solely for the interest of documentation.
histogram creation step and uses sketching values directly during split evaluation. It
was never tested and contained some unknown bugs, we decided to remove it and focus our
resources on more promising algorithms instead. For accuracy, most of the time
``approx``, ``hist`` and ``gpu_hist`` are enough with some parameters tuning, so
removing them don't have any real practical impact.
``approx`` and ``hist`` are enough with some parameter tuning, so removing them doesn't
have any real practical impact.
3. ``grow_local_histmaker`` updater: An approximation tree method described in `reference
paper <http://arxiv.org/abs/1603.02754>`_. This updater was rarely used in practice so


@@ -149,7 +149,7 @@ Also for inplace prediction:
.. code-block:: python
# where X is a dask DataFrame or dask Array backed by cupy or cuDF.
booster.set_param({"gpu_id": "0"})
booster.set_param({"device": "cuda:0"})
prediction = xgb.dask.inplace_predict(client, booster, X)
When input is ``da.Array`` object, output is always ``da.Array``. However, if the input


@@ -163,7 +163,7 @@ Will print out something similar to (not actual output as it's too long for demo
{
"Learner": {
"generic_parameter": {
"gpu_id": "0",
"device": "cuda:0",
"gpu_page_size": "0",
"n_jobs": "0",
"random_state": "0",