Document for device ordinal. (#9398)

- Rewrite GPU demos. The notebook is converted to a script to avoid committing additional PNG plots.
- Add GPU demos into the sphinx gallery.
- Add RMM demos into the sphinx gallery.
- Test for firing threads with different device ordinals.
Jiaming Yuan
2023-07-22 15:26:29 +08:00
committed by GitHub
parent 22b0a55a04
commit 275da176ba
32 changed files with 351 additions and 398 deletions

@@ -35,8 +35,8 @@ parameter ``enable_categorical``:
 .. code:: python

-  # Supported tree methods are `gpu_hist`, `approx`, and `hist`.
-  clf = xgb.XGBClassifier(tree_method="gpu_hist", enable_categorical=True)
+  # Supported tree methods are `approx` and `hist`.
+  clf = xgb.XGBClassifier(tree_method="hist", enable_categorical=True, device="cuda")
   # X is the dataframe we created in previous snippet
   clf.fit(X, y)
   # Must use JSON/UBJSON for serialization, otherwise the information is lost.
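As a hedged aside, here is a self-contained sketch of the updated usage; the toy dataframe ``X`` and target ``y`` are invented for illustration, while the tutorial builds its own ``X`` in an earlier snippet:

.. code-block:: python

    import pandas as pd
    import xgboost as xgb

    # Hypothetical stand-in for the dataframe built earlier in the tutorial.
    X = pd.DataFrame({"f0": pd.Categorical(["a", "b", "a", "c"])})
    y = [0, 1, 0, 1]

    # The GPU is now selected via ``device`` instead of ``tree_method="gpu_hist"``;
    # ``enable_categorical`` works with the ``hist`` and ``approx`` tree methods.
    clf = xgb.XGBClassifier(tree_method="hist", enable_categorical=True, device="cuda")
    clf.fit(X, y)

    # Save with JSON/UBJSON so the categorical information is preserved.
    clf.save_model("categorical-model.json")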

@@ -81,7 +81,7 @@ constructor.
   it = Iterator(["file_0.svm", "file_1.svm", "file_2.svm"])
   Xy = xgboost.DMatrix(it)

-  # Other tree methods including ``hist`` and ``gpu_hist`` also work, but has some caveats
+  # The ``approx`` tree method also works, but with lower performance, and the GPU implementation differs from the CPU one,
   # as noted in following sections.
   booster = xgboost.train({"tree_method": "hist"}, Xy)
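For readers without the full page, the ``Iterator`` referenced above is a user-defined subclass of ``xgboost.DataIter``. A minimal sketch, assuming the batches live in scikit-learn SVM files:

.. code-block:: python

    import os
    from typing import Callable, List

    import xgboost
    from sklearn.datasets import load_svmlight_file


    class Iterator(xgboost.DataIter):
        def __init__(self, svm_file_paths: List[str]) -> None:
            self._file_paths = svm_file_paths
            self._it = 0
            # XGBoost writes cache files under the current directory with this prefix.
            super().__init__(cache_prefix=os.path.join(".", "cache"))

        def next(self, input_data: Callable) -> int:
            """Called by XGBoost during DMatrix construction; feeds one batch per call."""
            if self._it == len(self._file_paths):
                return 0  # 0 signals the end of iteration
            X, y = load_svmlight_file(self._file_paths[self._it])
            input_data(data=X, label=y)
            self._it += 1
            return 1  # more batches remain

        def reset(self) -> None:
            """Rewind the iterator to the first batch."""
            self._it = 0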
@@ -118,15 +118,15 @@ to reduce the overhead of file reading.
 GPU Version (GPU Hist tree method)
 **********************************

-External memory is supported by GPU algorithms (i.e. when ``tree_method`` is set to
-``gpu_hist``). However, the algorithm used for GPU is different from the one used for
+External memory is supported by GPU algorithms (i.e. when ``device`` is set to
+``cuda``). However, the algorithm used for GPU is different from the one used for
 CPU. When training on a CPU, the tree method iterates through all batches from external
 memory for each step of the tree construction algorithm. On the other hand, the GPU
 algorithm uses a hybrid approach. It iterates through the data during the beginning of
-each iteration and concatenates all batches into one in GPU memory. To reduce overall
-memory usage, users can utilize subsampling. The GPU hist tree method supports
-`gradient-based sampling`, enabling users to set a low sampling rate without compromising
-accuracy.
+each iteration and concatenates all batches into one in GPU memory for performance
+reasons. To reduce overall memory usage, users can utilize subsampling. The GPU hist tree
+method supports `gradient-based sampling`, enabling users to set a low sampling rate
+without compromising accuracy.

 .. code-block:: python
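On the subsampling point above, a sketch of enabling gradient-based sampling with external memory; the parameter names are real, and ``Xy`` is assumed to be the iterator-backed ``DMatrix`` from the earlier snippet:

.. code-block:: python

    import xgboost

    params = {
        "device": "cuda",
        "tree_method": "hist",
        # Gradient-based sampling allows an aggressively low sampling rate
        # without compromising accuracy, reducing peak GPU memory usage.
        "sampling_method": "gradient_based",
        "subsample": 0.2,
    }
    booster = xgboost.train(params, Xy)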

@@ -83,13 +83,14 @@ Some other examples:
 - ``(0,-1)``: No constraint on the first predictor and a decreasing constraint on the second.

-**Note for the 'hist' tree construction algorithm**.
-If ``tree_method`` is set to either ``hist``, ``approx`` or ``gpu_hist``, enabling
-monotonic constraints may produce unnecessarily shallow trees. This is because the
-``hist`` method reduces the number of candidate splits to be considered at each
-split. Monotonic constraints may wipe out all available split candidates, in which case no
-split is made. To reduce the effect, you may want to increase the ``max_bin`` parameter to
-consider more split candidates.
+.. note::
+
+   **Note for the 'hist' tree construction algorithm**. If ``tree_method`` is set to
+   either ``hist`` or ``approx``, enabling monotonic constraints may produce unnecessarily
+   shallow trees. This is because the ``hist`` method reduces the number of candidate
+   splits to be considered at each split. Monotonic constraints may wipe out all available
+   split candidates, in which case no split is made. To reduce the effect, you may want to
+   increase the ``max_bin`` parameter to consider more split candidates.

 *******************
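A small self-contained sketch of the workaround in the note; the synthetic data is invented here, while ``monotone_constraints`` and ``max_bin`` are the real parameter names:

.. code-block:: python

    import numpy as np
    import xgboost

    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 2))
    y = X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=256)
    dtrain = xgboost.DMatrix(X, label=y)

    params = {
        "tree_method": "hist",
        # Increasing constraint on the first predictor, decreasing on the second.
        "monotone_constraints": "(1,-1)",
        # Consider more split candidates so the constraints are less likely to
        # eliminate every candidate at a node.
        "max_bin": 512,
    }
    booster = xgboost.train(params, dtrain, num_boost_round=10)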

@@ -38,10 +38,6 @@ There are in general two ways that you can control overfitting in XGBoost:
   - This includes ``subsample`` and ``colsample_bytree``.
 - You can also reduce stepsize ``eta``. Remember to increase ``num_round`` when you do so.

-***************************
-Faster training performance
-***************************
-There's a parameter called ``tree_method``, set it to ``hist`` or ``gpu_hist`` for faster computation.
 *************************
 Handle Imbalanced Dataset
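For illustration only, the two overfitting controls discussed above might be combined as follows; the values and synthetic data are arbitrary placeholders rather than tuned recommendations:

.. code-block:: python

    import numpy as np
    import xgboost

    rng = np.random.default_rng(0)
    dtrain = xgboost.DMatrix(rng.normal(size=(512, 8)), label=rng.integers(0, 2, size=512))

    params = {
        "objective": "binary:logistic",
        # Add randomness to make training robust to noise.
        "subsample": 0.8,
        "colsample_bytree": 0.8,
        # A smaller step size...
        "eta": 0.05,
    }
    # ...usually needs more boosting rounds to compensate.
    booster = xgboost.train(params, dtrain, num_boost_round=500)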

@@ -50,13 +50,14 @@ Here is a sample parameter dictionary for training a random forest on a GPU using
 xgboost::

   params = {
-    'colsample_bynode': 0.8,
-    'learning_rate': 1,
-    'max_depth': 5,
-    'num_parallel_tree': 100,
-    'objective': 'binary:logistic',
-    'subsample': 0.8,
-    'tree_method': 'gpu_hist'
+    "colsample_bynode": 0.8,
+    "learning_rate": 1,
+    "max_depth": 5,
+    "num_parallel_tree": 100,
+    "objective": "binary:logistic",
+    "subsample": 0.8,
+    "tree_method": "hist",
+    "device": "cuda",
   }

 A random forest model can then be trained as follows::
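To round out the snippet, the dictionary is passed to ``xgboost.train`` with a single boosting round, since the forest comes from ``num_parallel_tree`` rather than from boosting. A sketch with hypothetical synthetic data:

.. code-block:: python

    import numpy as np
    import xgboost

    rng = np.random.default_rng(0)
    dtrain = xgboost.DMatrix(rng.normal(size=(512, 10)), label=rng.integers(0, 2, size=512))

    # The parameter dictionary shown above.
    params = {
        "colsample_bynode": 0.8,
        "learning_rate": 1,
        "max_depth": 5,
        "num_parallel_tree": 100,
        "objective": "binary:logistic",
        "subsample": 0.8,
        "tree_method": "hist",
        "device": "cuda",
    }

    # One boosting round: the 100 trees from ``num_parallel_tree`` form the forest.
    bst = xgboost.train(params, dtrain, num_boost_round=1)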

@@ -174,7 +174,7 @@ Will print out something similar to (not actual output as it's too long for demonstration)
       "gbtree_train_param": {
         "num_parallel_tree": "1",
         "process_type": "default",
-        "tree_method": "gpu_hist",
+        "tree_method": "hist",
         "updater": "grow_gpu_hist",
         "updater_seq": "grow_gpu_hist"
       },
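The dump above comes from the booster's internal configuration. A sketch of how one might inspect it via ``Booster.save_config``; the nested key path is inferred from the snippet and may differ across versions:

.. code-block:: python

    import json

    # ``booster`` is any trained xgboost.Booster.
    config = json.loads(booster.save_config())
    print(json.dumps(config, indent=2))

    # Drill down to the field shown in the snippet above.
    print(config["learner"]["gradient_booster"]["gbtree_train_param"]["tree_method"])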