Define the new device parameter. (#9362)

2023-07-13 19:30:25 +08:00
parent 2d0cd2817e
commit 04aff3af8e
63 changed files with 827 additions and 477 deletions
--- a/doc/gpu/index.rst
+++ b/doc/gpu/index.rst
@@ -22,7 +22,8 @@ Supported parameters
 GPU accelerated prediction is enabled by default for the above mentioned ``tree_method`` parameters but can be switched to CPU prediction by setting ``predictor`` to ``cpu_predictor``. This could be useful if you want to conserve GPU memory. Likewise when using CPU algorithms, GPU accelerated prediction can be enabled by setting ``predictor`` to ``gpu_predictor``.

 The device ordinal (which GPU to use if you have many of them) can be selected using the
-``gpu_id`` parameter, which defaults to 0 (the first device reported by CUDA runtime).
+``device`` parameter. The ordinal defaults to 0 (the first device reported by the CUDA
+runtime) when ``cuda`` is specified.


 The GPU algorithms currently work with CLI, Python, R, and JVM packages. See :doc:`/install` for details.
@@ -30,13 +31,13 @@ The GPU algorithms currently work with CLI, Python, R, and JVM packages. See :do
 .. code-block:: python
  :caption: Python example

-  param['gpu_id'] = 0
+  param["device"] = "cuda:0"
  param['tree_method'] = 'gpu_hist'

 .. code-block:: python
  :caption: With Scikit-Learn interface

-  XGBRegressor(tree_method='gpu_hist', gpu_id=0)
+  XGBRegressor(tree_method='gpu_hist', device="cuda")


 GPU-Accelerated SHAP values
@@ -45,7 +46,7 @@ XGBoost makes use of `GPUTreeShap <https://github.com/rapidsai/gputreeshap>`_ as

 .. code-block:: python

-  model.set_param({"gpu_id": "0", "tree_method": "gpu_hist"})
+  model.set_param({"device": "cuda:0", "tree_method": "gpu_hist"})
  shap_values = model.predict(dtrain, pred_contribs=True)
  shap_interaction_values = model.predict(dtrain, pred_interactions=True)

--- a/doc/install.rst
+++ b/doc/install.rst
@@ -3,10 +3,10 @@ Installation Guide
 ##################

 XGBoost provides binary packages for some language bindings.  The binary packages support
-the GPU algorithm (``gpu_hist``) on machines with NVIDIA GPUs. Please note that **training
-with multiple GPUs is only supported for Linux platform**. See :doc:`gpu/index`.  Also we
-have both stable releases and nightly builds, see below for how to install them.  For
-building from source, visit :doc:`this page </build>`.
+the GPU algorithm (``device=cuda:0``) on machines with NVIDIA GPUs. Please note that
+**training with multiple GPUs is only supported on the Linux platform**. See
+:doc:`gpu/index`.  We also have both stable releases and nightly builds; see below for how
+to install them.  For building from source, visit :doc:`this page </build>`.

 .. contents:: Contents

--- a/doc/parameter.rst
+++ b/doc/parameter.rst
@@ -59,6 +59,18 @@ General Parameters

  - Feature dimension used in boosting, set to maximum dimension of the feature

+* ``device`` [default= ``cpu``]
+
+  .. versionadded:: 2.0.0
+
+  - The device on which XGBoost runs. Users can set it to one of the following values:
+
+    + ``cpu``: Use CPU.
+    + ``cuda``: Use a GPU (CUDA device).
+    + ``cuda:<ordinal>``: ``<ordinal>`` is an integer that specifies the ordinal of the GPU (which GPU to use if you have more than one device).
+    + ``gpu``: Default GPU device selection from the list of available and supported devices. Only ``cuda`` devices are supported currently.
+    + ``gpu:<ordinal>``: Same as ``gpu``, with ``<ordinal>`` specifying the ordinal of the GPU. Only ``cuda`` devices are supported currently.
+
 Parameters for Tree Booster
 ===========================
 * ``eta`` [default=0.3, alias: ``learning_rate``]
@@ -99,7 +111,7 @@ Parameters for Tree Booster
  - ``gradient_based``: the selection probability for each training instance is proportional to the
    *regularized absolute value* of gradients (more specifically, :math:`\sqrt{g^2+\lambda h^2}`).
    ``subsample`` may be set to as low as 0.1 without loss of model accuracy. Note that this
-    sampling method is only supported when ``tree_method`` is set to ``gpu_hist``; other tree
+    sampling method is only supported when ``tree_method`` is set to ``hist`` and the device is ``cuda``; other tree
    methods only support ``uniform`` sampling.

 * ``colsample_bytree``, ``colsample_bylevel``, ``colsample_bynode`` [default=1]
@@ -131,26 +143,15 @@ Parameters for Tree Booster
 * ``tree_method`` string [default= ``auto``]

  - The tree construction algorithm used in XGBoost. See description in the `reference paper <http://arxiv.org/abs/1603.02754>`_ and :doc:`treemethod`.
-  - XGBoost supports  ``approx``, ``hist`` and ``gpu_hist`` for distributed training.  Experimental support for external memory is available for ``approx`` and ``gpu_hist``.

-  - Choices: ``auto``, ``exact``, ``approx``, ``hist``, ``gpu_hist``, this is a
-    combination of commonly used updaters.  For other updaters like ``refresh``, set the
-    parameter ``updater`` directly.
+  - Choices: ``auto``, ``exact``, ``approx``, ``hist``.  Each is a combination of commonly
+    used updaters.  For other updaters like ``refresh``, set the parameter ``updater``
+    directly.

-    - ``auto``: Use heuristic to choose the fastest method.
-
-      - For small dataset, exact greedy (``exact``) will be used.
-      - For larger dataset, approximate algorithm (``approx``) will be chosen.  It's
-        recommended to try ``hist`` and ``gpu_hist`` for higher performance with large
-        dataset.
-        (``gpu_hist``)has support for ``external memory``.
-
-      - Because old behavior is always use exact greedy in single machine, user will get a
-        message when approximate algorithm is chosen to notify this choice.
+    - ``auto``: Same as the ``hist`` tree method.
    - ``exact``: Exact greedy algorithm.  Enumerates all split candidates.
    - ``approx``: Approximate greedy algorithm using quantile sketch and gradient histogram.
    - ``hist``: Faster histogram optimized approximate greedy algorithm.
-    - ``gpu_hist``: GPU implementation of ``hist`` algorithm.

 * ``scale_pos_weight`` [default=1]

@@ -163,7 +164,7 @@ Parameters for Tree Booster
    - ``grow_colmaker``: non-distributed column-based construction of trees.
    - ``grow_histmaker``: distributed tree construction with row-based data splitting based on global proposal of histogram counting.
    - ``grow_quantile_histmaker``: Grow tree using quantized histogram.
-    - ``grow_gpu_hist``: Grow tree with GPU.
+    - ``grow_gpu_hist``: Grow tree with GPU. Same as setting ``tree_method`` to ``hist`` and using ``device=cuda``.
    - ``sync``: synchronizes trees in all distributed nodes.
    - ``refresh``: refreshes tree's statistics and/or leaf values based on the current data. Note that no random subsampling of data rows is performed.
    - ``prune``: prunes the splits where loss < min_split_loss (or gamma) and nodes that have depth greater than ``max_depth``.
@@ -183,7 +184,7 @@ Parameters for Tree Booster
 * ``grow_policy`` [default= ``depthwise``]

  - Controls a way new nodes are added to the tree.
-  - Currently supported only if ``tree_method`` is set to ``hist``, ``approx`` or ``gpu_hist``.
+  - Currently supported only if ``tree_method`` is set to ``hist`` or ``approx``.
  - Choices: ``depthwise``, ``lossguide``

    - ``depthwise``: split at nodes closest to the root.
@@ -195,7 +196,7 @@ Parameters for Tree Booster

 * ``max_bin``, [default=256]

-  - Only used if ``tree_method`` is set to ``hist``, ``approx`` or ``gpu_hist``.
+  - Only used if ``tree_method`` is set to ``hist`` or ``approx``.
  - Maximum number of discrete bins to bucket continuous features.
  - Increasing this number improves the optimality of splits at the cost of higher computation time.

--- a/doc/treemethod.rst
+++ b/doc/treemethod.rst
@@ -3,14 +3,14 @@ Tree Methods
 ############

 For training boosted tree models, there are 2 parameters used for choosing algorithms,
-namely ``updater`` and ``tree_method``.  XGBoost has 4 builtin tree methods, namely
-``exact``, ``approx``, ``hist`` and ``gpu_hist``.  Along with these tree methods, there
-are also some free standing updaters including ``refresh``,
-``prune`` and ``sync``.  The parameter ``updater`` is more primitive than ``tree_method``
-as the latter is just a pre-configuration of the former.  The difference is mostly due to
-historical reasons that each updater requires some specific configurations and might has
-missing features.  As we are moving forward, the gap between them is becoming more and
-more irrelevant.  We will collectively document them under tree methods.
+namely ``updater`` and ``tree_method``.  XGBoost has 3 builtin tree methods, namely
+``exact``, ``approx`` and ``hist``.  Along with these tree methods, there are also some
+free standing updaters including ``refresh``, ``prune`` and ``sync``.  The parameter
+``updater`` is more primitive than ``tree_method`` as the latter is just a
+pre-configuration of the former.  The difference is mostly historical: each updater
+requires some specific configurations and might have missing features.  As we move
+forward, the gap between them is becoming more and more irrelevant.  We will
+collectively document them under tree methods.

 **************
 Exact Solution
@@ -19,23 +19,23 @@ Exact Solution
 Exact means XGBoost considers all candidates from data for tree splitting, but underlying
 the objective is still interpreted as a Taylor expansion.

-1. ``exact``: Vanilla gradient boosting tree algorithm described in `reference paper
-   <http://arxiv.org/abs/1603.02754>`_.  During each split finding procedure, it iterates
-   over all entries of input data.  It's more accurate (among other greedy methods) but
-   slow in computation performance.  Also it doesn't support distributed training as
-   XGBoost employs row spliting data distribution while ``exact`` tree method works on a
-   sorted column format.  This tree method can be used with parameter ``tree_method`` set
-   to ``exact``.
+1. ``exact``: The vanilla gradient boosting tree algorithm described in `reference paper
+   <http://arxiv.org/abs/1603.02754>`_.  During split-finding, it iterates over all
+   entries of the input data.  It's more accurate than the other greedy methods but
+   computationally slower.  Furthermore, its feature set is limited: features like
+   distributed training and external memory, which require approximated quantiles, are
+   not supported.  This tree method can be used with the parameter ``tree_method`` set
+   to ``exact``.


 **********************
 Approximated Solutions
 **********************

-As ``exact`` tree method is slow in performance and not scalable, we often employ
-approximated training algorithms.  These algorithms build a gradient histogram for each
-node and iterate through the histogram instead of real dataset.  Here we introduce the
-implementations in XGBoost below.
+As the ``exact`` tree method is computationally slow and difficult to scale, we often
+employ approximated training algorithms.  These algorithms build a gradient histogram
+for each node and iterate through the histogram instead of the real dataset.  Here we
+introduce the implementations in XGBoost.

 1. ``approx`` tree method: An approximation tree method described in `reference paper
   <http://arxiv.org/abs/1603.02754>`_.  It runs sketching before building each tree
@@ -48,22 +48,18 @@ implementations in XGBoost below.
   this global sketch.  This is the fastest algorithm as it runs sketching only once.  The
   algorithm can be accessed by setting ``tree_method`` to ``hist``.

-3. ``gpu_hist`` tree method: The ``gpu_hist`` tree method is a GPU implementation of
-   ``hist``, with additional support for gradient based sampling.  The algorithm can be
-   accessed by setting ``tree_method`` to ``gpu_hist``.
-
 ************
 Implications
 ************

-Some objectives like ``reg:squarederror`` have constant hessian.  In this case, ``hist``
-or ``gpu_hist`` should be preferred as weighted sketching doesn't make sense with constant
+Some objectives like ``reg:squarederror`` have a constant hessian.  In this case,
+``hist`` should be preferred as weighted sketching doesn't make sense with constant
 weights.  When using non-constant hessian objectives, sometimes ``approx`` yields better
-accuracy, but with slower computation performance.  Most of the time using ``(gpu)_hist``
-with higher ``max_bin`` can achieve similar or even superior accuracy while maintaining
-good performance.  However, as xgboost is largely driven by community effort, the actual
-implementations have some differences than pure math description.  Result might have
-slight differences than expectation, which we are currently trying to overcome.
+accuracy, but with slower computation performance.  Most of the time using ``hist`` with
+a higher ``max_bin`` can achieve similar or even superior accuracy while maintaining good
+performance.  However, as XGBoost is largely driven by community effort, the actual
+implementations differ somewhat from the pure math description.  Results might be
+slightly different from expectations, which we are currently trying to overcome.

 **************
 Other Updaters
@@ -106,8 +102,8 @@ solely for the interest of documentation.
   histogram creation step and uses sketching values directly during split evaluation.  It
   was never tested and contained some unknown bugs, we decided to remove it and focus our
   resources on more promising algorithms instead.  For accuracy, most of the time
-   ``approx``, ``hist`` and ``gpu_hist`` are enough with some parameters tuning, so
-   removing them don't have any real practical impact.
+   ``approx`` and ``hist`` are enough with some parameter tuning, so removing it doesn't
+   have any real practical impact.

 3. ``grow_local_histmaker`` updater: An approximation tree method described in `reference
   paper <http://arxiv.org/abs/1603.02754>`_.  This updater was rarely used in practice so
--- a/doc/tutorials/dask.rst
+++ b/doc/tutorials/dask.rst
@@ -149,7 +149,7 @@ Also for inplace prediction:
 .. code-block:: python

  # where X is a dask DataFrame or dask Array backed by cupy or cuDF.
-  booster.set_param({"gpu_id": "0"})
+  booster.set_param({"device": "cuda:0"})
  prediction = xgb.dask.inplace_predict(client, booster, X)

 When input is ``da.Array`` object, output is always ``da.Array``.  However, if the input
--- a/doc/tutorials/saving_model.rst
+++ b/doc/tutorials/saving_model.rst
@@ -163,7 +163,7 @@ Will print out something similar to (not actual output as it's too long for demo
    {
      "Learner": {
        "generic_parameter": {
-          "gpu_id": "0",
+          "device": "cuda:0",
          "gpu_page_size": "0",
          "n_jobs": "0",
          "random_state": "0",