Document for device ordinal. (#9398)

- Rewrite GPU demos. The notebook is converted to a script to avoid committing additional PNG plots.
- Add GPU demos into the sphinx gallery.
- Add RMM demos into the sphinx gallery.
- Test for firing threads with different device ordinals.
Jiaming Yuan
2023-07-22 15:26:29 +08:00
committed by GitHub
parent 22b0a55a04
commit 275da176ba
32 changed files with 351 additions and 398 deletions

@@ -35,8 +35,8 @@ parameter ``enable_categorical``:
 .. code:: python

-  # Supported tree methods are `gpu_hist`, `approx`, and `hist`.
-  clf = xgb.XGBClassifier(tree_method="gpu_hist", enable_categorical=True)
+  # Supported tree methods are `approx` and `hist`.
+  clf = xgb.XGBClassifier(tree_method="hist", enable_categorical=True, device="cuda")
   # X is the dataframe we created in previous snippet
   clf.fit(X, y)
   # Must use JSON/UBJSON for serialization, otherwise the information is lost.
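As a hedged aside, here is a self-contained sketch of the updated usage; the toy dataframe ``X`` and target ``y`` are invented for illustration, while the tutorial builds its own ``X`` in an earlier snippet:

.. code-block:: python

    import pandas as pd
    import xgboost as xgb

    # Hypothetical stand-in for the dataframe built earlier in the tutorial.
    X = pd.DataFrame({"f0": pd.Categorical(["a", "b", "a", "c"])})
    y = [0, 1, 0, 1]

    # The GPU is now selected via ``device`` instead of ``tree_method="gpu_hist"``;
    # ``enable_categorical`` works with the ``hist`` and ``approx`` tree methods.
    clf = xgb.XGBClassifier(tree_method="hist", enable_categorical=True, device="cuda")
    clf.fit(X, y)

    # Save with JSON/UBJSON so the categorical information is preserved.
    clf.save_model("categorical-model.json")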

@@ -81,7 +81,7 @@ constructor.
   it = Iterator(["file_0.svm", "file_1.svm", "file_2.svm"])
   Xy = xgboost.DMatrix(it)

-  # Other tree methods including ``hist`` and ``gpu_hist`` also work, but has some caveats
+  # The ``approx`` tree method also works, but with lower performance, and the GPU implementation differs from the CPU one,
   # as noted in following sections.
   booster = xgboost.train({"tree_method": "hist"}, Xy)
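For readers without the full page, the ``Iterator`` referenced above is a user-defined subclass of ``xgboost.DataIter``. A minimal sketch, assuming the batches live in scikit-learn SVM files:

.. code-block:: python

    import os
    from typing import Callable, List

    import xgboost
    from sklearn.datasets import load_svmlight_file


    class Iterator(xgboost.DataIter):
        def __init__(self, svm_file_paths: List[str]) -> None:
            self._file_paths = svm_file_paths
            self._it = 0
            # XGBoost writes cache files under the current directory with this prefix.
            super().__init__(cache_prefix=os.path.join(".", "cache"))

        def next(self, input_data: Callable) -> int:
            """Called by XGBoost during DMatrix construction; feeds one batch per call."""
            if self._it == len(self._file_paths):
                return 0  # 0 signals the end of iteration
            X, y = load_svmlight_file(self._file_paths[self._it])
            input_data(data=X, label=y)
            self._it += 1
            return 1  # more batches remain

        def reset(self) -> None:
            """Rewind the iterator to the first batch."""
            self._it = 0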
@@ -118,15 +118,15 @@ to reduce the overhead of file reading.
 GPU Version (GPU Hist tree method)
 **********************************

-External memory is supported by GPU algorithms (i.e. when ``tree_method`` is set to
-``gpu_hist``). However, the algorithm used for GPU is different from the one used for
+External memory is supported by GPU algorithms (i.e. when ``device`` is set to
+``cuda``). However, the algorithm used for GPU is different from the one used for
 CPU. When training on a CPU, the tree method iterates through all batches from external
 memory for each step of the tree construction algorithm. On the other hand, the GPU
 algorithm uses a hybrid approach. It iterates through the data during the beginning of
-each iteration and concatenates all batches into one in GPU memory. To reduce overall
-memory usage, users can utilize subsampling. The GPU hist tree method supports
-`gradient-based sampling`, enabling users to set a low sampling rate without compromising
-accuracy.
+each iteration and concatenates all batches into one in GPU memory for performance
+reasons. To reduce overall memory usage, users can utilize subsampling. The GPU hist tree
+method supports `gradient-based sampling`, enabling users to set a low sampling rate
+without compromising accuracy.

 .. code-block:: python
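On the subsampling point above, a sketch of enabling gradient-based sampling with external memory; the parameter names are real, and ``Xy`` is assumed to be the iterator-backed ``DMatrix`` from the earlier snippet:

.. code-block:: python

    import xgboost

    params = {
        "device": "cuda",
        "tree_method": "hist",
        # Gradient-based sampling allows an aggressively low sampling rate
        # without compromising accuracy, reducing peak GPU memory usage.
        "sampling_method": "gradient_based",
        "subsample": 0.2,
    }
    booster = xgboost.train(params, Xy)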

@@ -83,13 +83,14 @@ Some other examples:
 - ``(0,-1)``: No constraint on the first predictor and a decreasing constraint on the second.

-**Note for the 'hist' tree construction algorithm**.
-If ``tree_method`` is set to either ``hist``, ``approx`` or ``gpu_hist``, enabling
-monotonic constraints may produce unnecessarily shallow trees. This is because the
-``hist`` method reduces the number of candidate splits to be considered at each
-split. Monotonic constraints may wipe out all available split candidates, in which case no
-split is made. To reduce the effect, you may want to increase the ``max_bin`` parameter to
-consider more split candidates.
+.. note::
+
+   **Note for the 'hist' tree construction algorithm**. If ``tree_method`` is set to
+   either ``hist`` or ``approx``, enabling monotonic constraints may produce unnecessarily
+   shallow trees. This is because the ``hist`` method reduces the number of candidate
+   splits to be considered at each split. Monotonic constraints may wipe out all available
+   split candidates, in which case no split is made. To reduce the effect, you may want to
+   increase the ``max_bin`` parameter to consider more split candidates.

 *******************
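A small self-contained sketch of the workaround in the note; the synthetic data is invented here, while ``monotone_constraints`` and ``max_bin`` are the real parameter names:

.. code-block:: python

    import numpy as np
    import xgboost

    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 2))
    y = X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=256)
    dtrain = xgboost.DMatrix(X, label=y)

    params = {
        "tree_method": "hist",
        # Increasing constraint on the first predictor, decreasing on the second.
        "monotone_constraints": "(1,-1)",
        # Consider more split candidates so the constraints are less likely to
        # eliminate every candidate at a node.
        "max_bin": 512,
    }
    booster = xgboost.train(params, dtrain, num_boost_round=10)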

@@ -38,10 +38,6 @@ There are in general two ways that you can control overfitting in XGBoost:
   - This includes ``subsample`` and ``colsample_bytree``.
 - You can also reduce stepsize ``eta``. Remember to increase ``num_round`` when you do so.

-***************************
-Faster training performance
-***************************
-There's a parameter called ``tree_method``, set it to ``hist`` or ``gpu_hist`` for faster computation.
 *************************
 Handle Imbalanced Dataset
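For illustration only, the two overfitting controls discussed above might be combined as follows; the values and synthetic data are arbitrary placeholders rather than tuned recommendations:

.. code-block:: python

    import numpy as np
    import xgboost

    rng = np.random.default_rng(0)
    dtrain = xgboost.DMatrix(rng.normal(size=(512, 8)), label=rng.integers(0, 2, size=512))

    params = {
        "objective": "binary:logistic",
        # Add randomness to make training robust to noise.
        "subsample": 0.8,
        "colsample_bytree": 0.8,
        # A smaller step size...
        "eta": 0.05,
    }
    # ...usually needs more boosting rounds to compensate.
    booster = xgboost.train(params, dtrain, num_boost_round=500)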

@@ -50,13 +50,14 @@ Here is a sample parameter dictionary for training a random forest on a GPU using
 xgboost::

   params = {
-    'colsample_bynode': 0.8,
-    'learning_rate': 1,
-    'max_depth': 5,
-    'num_parallel_tree': 100,
-    'objective': 'binary:logistic',
-    'subsample': 0.8,
-    'tree_method': 'gpu_hist'
+    "colsample_bynode": 0.8,
+    "learning_rate": 1,
+    "max_depth": 5,
+    "num_parallel_tree": 100,
+    "objective": "binary:logistic",
+    "subsample": 0.8,
+    "tree_method": "hist",
+    "device": "cuda",
   }

 A random forest model can then be trained as follows::
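To round out the snippet, the dictionary is passed to ``xgboost.train`` with a single boosting round, since the forest comes from ``num_parallel_tree`` rather than from boosting. A sketch with hypothetical synthetic data:

.. code-block:: python

    import numpy as np
    import xgboost

    rng = np.random.default_rng(0)
    dtrain = xgboost.DMatrix(rng.normal(size=(512, 10)), label=rng.integers(0, 2, size=512))

    # The parameter dictionary shown above.
    params = {
        "colsample_bynode": 0.8,
        "learning_rate": 1,
        "max_depth": 5,
        "num_parallel_tree": 100,
        "objective": "binary:logistic",
        "subsample": 0.8,
        "tree_method": "hist",
        "device": "cuda",
    }

    # One boosting round: the 100 trees from ``num_parallel_tree`` form the forest.
    bst = xgboost.train(params, dtrain, num_boost_round=1)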

@@ -174,7 +174,7 @@ Will print out something similar to (not actual output as it's too long for demonstration)
       "gbtree_train_param": {
         "num_parallel_tree": "1",
         "process_type": "default",
-        "tree_method": "gpu_hist",
+        "tree_method": "hist",
         "updater": "grow_gpu_hist",
         "updater_seq": "grow_gpu_hist"
       },
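The dump above comes from the booster's internal configuration. A sketch of how one might inspect it via ``Booster.save_config``; the nested key path is inferred from the snippet and may differ across versions:

.. code-block:: python

    import json

    # ``booster`` is any trained xgboost.Booster.
    config = json.loads(booster.save_config())
    print(json.dumps(config, indent=2))

    # Drill down to the field shown in the snippet above.
    print(config["learner"]["gradient_booster"]["gbtree_train_param"]["tree_method"])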