[breaking] Remove the predictor param, allow fallback to prediction using DMatrix. (#9129)

- A `DeviceOrd` struct is implemented to indicate the device. It will eventually replace the `gpu_id` parameter.
- The `predictor` parameter is removed.
- Fallback to `DMatrix` when `inplace_predict` is not available.
- The heuristic for choosing a predictor is only used during training.
Jiaming Yuan
2023-07-03 19:23:54 +08:00
committed by GitHub
parent 3a0f787703
commit 39390cc2ee
54 changed files with 1049 additions and 778 deletions
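The fallback described in the summary above (use `DMatrix`-based prediction when `inplace_predict` cannot handle the input) can be sketched roughly as follows. This is a hypothetical illustration: `FakeBooster`, `FakeDMatrix`, and `predict_any` are stand-in names, not XGBoost's actual implementation.

```python
class FakeDMatrix:
    """Stand-in for xgboost.DMatrix: normalizes the input to a list."""

    def __init__(self, data):
        self.data = list(data)


class FakeBooster:
    """Stand-in booster whose in-place path only supports lists."""

    def inplace_predict(self, data):
        if not isinstance(data, list):
            raise TypeError("data type not supported for in-place prediction")
        return [x * 2.0 for x in data]

    def predict(self, dmatrix):
        # DMatrix-based prediction handles any normalized input.
        return [x * 2.0 for x in dmatrix.data]


def predict_any(booster, data):
    """Prefer in-place prediction; fall back to DMatrix when unsupported."""
    try:
        return booster.inplace_predict(data)
    except TypeError:
        return booster.predict(FakeDMatrix(data))
```

With a supported input the in-place path is taken; with an unsupported type (here, a tuple) the `DMatrix` path is used, and both routes return the same result.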


@@ -45,7 +45,7 @@ XGBoost makes use of `GPUTreeShap <https://github.com/rapidsai/gputreeshap>`_ as
.. code-block:: python
-model.set_param({"predictor": "gpu_predictor"})
+model.set_param({"gpu_id": "0", "tree_method": "gpu_hist"})
shap_values = model.predict(dtrain, pred_contribs=True)
shap_interaction_values = model.predict(dtrain, pred_interactions=True)
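The defining property of `pred_contribs=True` is additivity: per-feature contributions plus a bias term sum to the raw prediction. A minimal pure-Python illustration with a hand-built decision stump; the stump and helper below are hypothetical stand-ins, not XGBoost or GPUTreeShap code.

```python
def stump_predict(x, threshold=0.5, left=-1.0, right=2.0):
    """A one-split decision stump on feature 0 (toy stand-in model)."""
    return left if x[0] < threshold else right


def stump_contribs(x, background, threshold=0.5, left=-1.0, right=2.0):
    """Exact SHAP values for a single-feature stump.

    The bias is the average prediction over the background data; for a
    stump, the entire difference from the bias is attributed to feature 0.
    """
    bias = sum(stump_predict(b, threshold, left, right)
               for b in background) / len(background)
    contrib_f0 = stump_predict(x, threshold, left, right) - bias
    return [contrib_f0, bias]  # [feature contribution, bias term]


background = [[0.1], [0.2], [0.8], [0.9]]  # baseline dataset
x = [0.7]
contribs = stump_contribs(x, background)
# Additivity: contributions + bias reproduce the raw prediction.
assert abs(sum(contribs) - stump_predict(x)) < 1e-12
```

The same invariant holds for the real `pred_contribs` output: summing the contribution matrix along the feature axis recovers the margin predictions.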


@@ -199,18 +199,6 @@ Parameters for Tree Booster
- Maximum number of discrete bins to bucket continuous features.
- Increasing this number improves the optimality of splits at the cost of higher computation time.
-* ``predictor``, [default= ``auto``]
-  - The type of predictor algorithm to use. Provides the same results but allows the use of GPU or CPU.
-  - ``auto``: Configure predictor based on heuristics.
-  - ``cpu_predictor``: Multicore CPU prediction algorithm.
-  - ``gpu_predictor``: Prediction using GPU. Used when ``tree_method`` is ``gpu_hist``.
-    When ``predictor`` is set to default value ``auto``, the ``gpu_hist`` tree method is
-    able to provide GPU based prediction without copying training data to GPU memory.
-    If ``gpu_predictor`` is explicitly specified, then all data is copied into GPU, only
-    recommended for performing prediction tasks.
* ``num_parallel_tree``, [default=1]
- Number of parallel trees constructed during each iteration. This option is used to support boosted random forest.
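The removed ``auto`` heuristic described in the deleted bullet above can be pictured roughly as follows. `choose_predictor` is a simplified, hypothetical sketch of the documented behavior; the real logic lives in XGBoost's C++ core and also considers where the data resides.

```python
def choose_predictor(tree_method):
    """Simplified sketch of the old ``auto`` heuristic: training with
    ``gpu_hist`` implies GPU-based prediction (avoiding a copy of the
    training data to GPU memory); everything else predicts on the CPU.
    """
    return "gpu_predictor" if tree_method == "gpu_hist" else "cpu_predictor"
```

After this commit, that decision is only made internally during training; at prediction time the choice follows the device of the input data instead.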


@@ -87,15 +87,6 @@ with the native Python interface :py:meth:`xgboost.Booster.predict` and
behavior. Also the ``save_best`` parameter from :py:obj:`xgboost.callback.EarlyStopping`
might be useful.
-*********
-Predictor
-*********
-There are 2 predictors in XGBoost (3 if you have the one-api plugin enabled), namely
-``cpu_predictor`` and ``gpu_predictor``. The default option is ``auto`` so that XGBoost
-can employ some heuristics for saving GPU memory during training. They might have slight
-different outputs due to floating point errors.
***********
Base Margin
***********
@@ -134,15 +125,6 @@ it. Be aware that the output of in-place prediction depends on input data type,
when the input is GPU data, the output is :py:obj:`cupy.ndarray`; otherwise a
:py:obj:`numpy.ndarray` is returned.
-****************
-Categorical Data
-****************
-Other than users performing encoding, XGBoost has experimental support for categorical
-data using ``gpu_hist`` and ``gpu_predictor``. No special operation needs to be done on
-input test data since the information about categories is encoded into the model during
-training.
*************
Thread Safety
*************
@@ -159,7 +141,6 @@ instance we might accidentally call ``clf.set_params()`` inside a predict functi
def predict_fn(clf: xgb.XGBClassifier, X):
X = preprocess(X)
-clf.set_params(predictor="gpu_predictor") # NOT safe!
clf.set_params(n_jobs=1) # NOT safe!
return clf.predict_proba(X, iteration_range=(0, 10))
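A thread-safe alternative to the pattern above is to avoid mutating shared estimator state entirely and pass per-call options instead. The sketch below uses a hypothetical stand-in classifier (`FakeClassifier` is not an XGBoost class) to make the contrast concrete.

```python
import threading


class FakeClassifier:
    """Stand-in estimator with mutable shared params, like an sklearn
    wrapper. Mutating `params` from several threads is the hazard the
    docs above warn about."""

    def __init__(self):
        self.params = {"n_jobs": 4}

    def predict_proba(self, X, n_jobs=None):
        # Thread-safe style: honor a per-call override instead of
        # requiring callers to mutate shared state first.
        _ = n_jobs if n_jobs is not None else self.params["n_jobs"]
        return [[0.5, 0.5] for _ in X]  # dummy probabilities


clf = FakeClassifier()


def predict_fn(clf, X):
    # Safe: no clf.set_params(...) here; options are passed per call.
    return clf.predict_proba(X, n_jobs=1)


results = []
threads = [
    threading.Thread(target=lambda: results.append(predict_fn(clf, [[0.0]])))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because each call reads its options from arguments rather than from `clf.params`, concurrent callers cannot observe each other's half-applied settings.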


@@ -148,8 +148,8 @@ Also for inplace prediction:
.. code-block:: python
-booster.set_param({'predictor': 'gpu_predictor'})
-# where X is a dask DataFrame or dask Array containing cupy or cuDF backed data.
+# where X is a dask DataFrame or dask Array backed by cupy or cuDF.
+booster.set_param({"gpu_id": "0"})
prediction = xgb.dask.inplace_predict(client, booster, X)
When the input is a ``da.Array`` object, the output is always ``da.Array``. However, if the input


@@ -173,7 +173,6 @@ Will print out something similar to (not actual output as it's too long for demo
"gradient_booster": {
"gbtree_train_param": {
"num_parallel_tree": "1",
-"predictor": "gpu_predictor",
"process_type": "default",
"tree_method": "gpu_hist",
"updater": "grow_gpu_hist",
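After this change the dumped configuration no longer contains a ``predictor`` entry under ``gbtree_train_param``. A quick way to verify that against a saved configuration; the JSON string below is an abridged, hand-written stand-in for real `save_config` output, not actual output.

```python
import json

# Abridged stand-in for the configuration dump shown above, after the
# predictor parameter was removed.
config_text = """
{
  "gradient_booster": {
    "gbtree_train_param": {
      "num_parallel_tree": "1",
      "process_type": "default",
      "tree_method": "gpu_hist",
      "updater": "grow_gpu_hist"
    }
  }
}
"""

config = json.loads(config_text)
train_param = config["gradient_booster"]["gbtree_train_param"]
# The key removed by this commit should be absent from new dumps.
assert "predictor" not in train_param
assert train_param["tree_method"] == "gpu_hist"
```

The same check against a model saved by an older version would find the key present, which is one way to spot configs that predate this commit.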