Update documents. (#6856)

* Add early stopping section to prediction doc.
* Remove best_ntree_limit.
* Better doxygen output.
Jiaming Yuan 2021-04-16 12:41:03 +08:00 committed by GitHub
parent d31a57cf5f
commit a5d7094a45
6 changed files with 34 additions and 16 deletions


@@ -67,6 +67,18 @@ the 3-class classification dataset, and want to use the first 2 iterations of trees for
 prediction, you need to provide ``iteration_range=(0, 2)``. Then the first :math:`2
 \times 3 \times 4` trees will be used in this prediction.
 
+**************
+Early Stopping
+**************
+
+When a model is trained with early stopping, the native Python interface and the
+sklearn/R interfaces behave inconsistently.  By default, the R and sklearn interfaces
+automatically use ``best_iteration``, so predictions come from the best model.  With
+the native Python interface, however, :py:meth:`xgboost.Booster.predict` and
+:py:meth:`xgboost.Booster.inplace_predict` use the full model.  Users can pass the
+``best_iteration`` attribute through the ``iteration_range`` parameter to achieve the
+same behavior.  The ``save_best`` parameter of :py:obj:`xgboost.callback.EarlyStopping`
+might also be useful.
 
 *********
 Predictor


@@ -183,7 +183,7 @@ Early stopping requires at least one set in ``evals``. If there's more than one,
 The model will train until the validation score stops improving. Validation error needs to decrease at least every ``early_stopping_rounds`` to continue training.
-If early stopping occurs, the model will have three additional fields: ``bst.best_score``, ``bst.best_iteration`` and ``bst.best_ntree_limit``. Note that :py:meth:`xgboost.train` will return a model from the last iteration, not the best one.
+If early stopping occurs, the model will have two additional fields: ``bst.best_score`` and ``bst.best_iteration``. Note that :py:meth:`xgboost.train` will return a model from the last iteration, not the best one.
 This works with both metrics to minimize (RMSE, log loss, etc.) and to maximize (MAP, NDCG, AUC). Note that if you specify more than one evaluation metric the last one in ``param['eval_metric']`` is used for early stopping.
@@ -198,11 +198,11 @@ A model that has been trained or loaded can perform predictions on data sets.
   dtest = xgb.DMatrix(data)
   ypred = bst.predict(dtest)
 
-If early stopping is enabled during training, you can get predictions from the best iteration with ``bst.best_ntree_limit``:
+If early stopping is enabled during training, you can get predictions from the best iteration with ``bst.best_iteration``:
 
 .. code-block:: python
 
-  ypred = bst.predict(dtest, ntree_limit=bst.best_ntree_limit)
+  ypred = bst.predict(dtest, iteration_range=(0, bst.best_iteration + 1))
 Plotting
 --------


@@ -744,13 +744,13 @@ XGB_DLL int XGBoosterPredict(BoosterHandle handle,
  * following available fields in the JSON object:
  *
  *   "type": [0, 6]
- *     0: normal prediction
- *     1: output margin
- *     2: predict contribution
- *     3: predict approximated contribution
- *     4: predict feature interaction
- *     5: predict approximated feature interaction
- *     6: predict leaf
+ *     - 0: normal prediction
+ *     - 1: output margin
+ *     - 2: predict contribution
+ *     - 3: predict approximated contribution
+ *     - 4: predict feature interaction
+ *     - 5: predict approximated feature interaction
+ *     - 6: predict leaf
  *   "training": bool
  *     Whether the prediction function is used as part of a training loop.  **Not used
  *     for inplace prediction**.
@@ -773,7 +773,8 @@ XGB_DLL int XGBoosterPredict(BoosterHandle handle,
  *   disregarding the use of multi-class model, and leaf prediction will output 4-dim
  *   array representing: (n_samples, n_iterations, n_classes, n_trees_in_forest)
  *
- * Run a normal prediction with strict output shape, 2 dim for softprob , 1 dim for others.
+ * Example JSON input for running a normal prediction with strict output shape, 2 dim
+ * for softprob, 1 dim for others.
  * \code
  * {
  *   "type": 0,


@@ -1683,7 +1683,9 @@ class Booster(object):
         iteration_range: Tuple[int, int] = (0, 0),
         strict_shape: bool = False,
     ) -> np.ndarray:
-        """Predict with data.
+        """Predict with data.  The full model will be used unless `iteration_range` is
+        specified, meaning the user has to either slice the model or use the
+        ``best_iteration`` attribute to get predictions from the best model returned
+        by early stopping.
 
         .. note::


@@ -794,8 +794,8 @@ class XGBModel(XGBModelBase):
         base_margin: Optional[array_like] = None,
         iteration_range: Optional[Tuple[int, int]] = None,
     ) -> np.ndarray:
-        """
-        Predict with `X`.
+        """Predict with `X`.  If the model is trained with early stopping, then
+        ``best_iteration`` is used automatically.
 
         .. note:: This function is only thread safe for `gbtree` and `dart`.
@@ -819,6 +819,7 @@ class XGBModel(XGBModelBase):
             used in this prediction.
 
             .. versionadded:: 1.4.0
 
         Returns
         -------
         prediction
@@ -860,7 +861,8 @@ class XGBModel(XGBModelBase):
         ntree_limit: int = 0,
         iteration_range: Optional[Tuple[int, int]] = None
     ) -> np.ndarray:
-        """Return the predicted leaf every tree for each sample.
+        """Return the predicted leaf for every tree for each sample.  If the model is
+        trained with early stopping, then ``best_iteration`` is used automatically.
 
         Parameters
         ----------
@@ -879,6 +881,7 @@ class XGBModel(XGBModelBase):
         For each datapoint x in X and for each tree, return the index of the
         leaf x ends up in.  Leaves are numbered within
         ``[0; 2**(self.max_depth+1))``, possibly with gaps in the numbering.
         """
         iteration_range = _convert_ntree_limit(
             self.get_booster(), ntree_limit, iteration_range