Mark Scikit-Learn RF interface as experimental in doc. (#4258)

* Mark Scikit-Learn RF interface as experimental in doc.
Jiaming Yuan 2019-03-16 00:45:32 +08:00 committed by GitHub
parent 5465b73e7c
commit 7b1b11390a
2 changed files with 27 additions and 10 deletions

doc/tutorials/rf.rst

@@ -4,12 +4,17 @@ Random Forests in XGBoost
 XGBoost is normally used to train gradient-boosted decision trees and other gradient
 boosted models. Random forests use the same model representation and inference, as
-gradient-boosted decision trees, but a different training algorithm. There are XGBoost
-parameters that enable training a forest in a random forest fashion.
+gradient-boosted decision trees, but a different training algorithm. One can use XGBoost
+to train a standalone random forest or use random forest as a base model for gradient
+boosting. Here we focus on training standalone random forest.
+
+We have native APIs for training random forests since the early days, and a new
+Scikit-Learn wrapper after 0.82 (not included in 0.82). Please note that the new
+Scikit-Learn wrapper is still **experimental**, which means we might change the interface
+whenever needed.
 
 ****************
-With XGBoost API
+Standalone Random Forest With XGBoost API
 ****************
 
 The following parameters must be set to enable random forest training.
@@ -22,8 +27,9 @@ The following parameters must be set to enable random forest training.
   selection of columns. Normally, ``colsample_bynode`` would be set to a value less than 1
   to randomly sample columns at each tree split.
 * ``num_parallel_tree`` should be set to the size of the forest being trained.
-* ``num_boost_round`` should be set to 1. Note that this is a keyword argument to
-  ``train()``, and is not part of the parameter dictionary.
+* ``num_boost_round`` should be set to 1 to prevent XGBoost from boosting multiple random
+  forests. Note that this is a keyword argument to ``train()``, and is not part of the
+  parameter dictionary.
 * ``eta`` (alias: ``learning_rate``) must be set to 1 when training random forest
   regression.
 * ``random_state`` can be used to seed the random number generator.
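
A minimal sketch of how the parameters above fit together with the native ``train()`` API (the tutorial's own example, referenced by the next hunk, is not shown in this diff). The dataset, forest size, and objective are illustrative assumptions, not part of the commit:

    import numpy as np
    import xgboost as xgb

    # Illustrative data; any numeric feature matrix and target will do.
    rng = np.random.RandomState(42)
    X, y = rng.rand(100, 10), rng.rand(100)

    dtrain = xgb.DMatrix(X, label=y)
    params = {
        "colsample_bynode": 0.8,   # randomly sample columns at each split
        "subsample": 0.8,          # randomly sample rows for each tree
        "num_parallel_tree": 100,  # size of the forest
        "eta": 1,                  # required for random forest regression
        "objective": "reg:squarederror",  # assumed objective for this sketch
        "seed": 42,                # native-API counterpart of random_state
    }
    # num_boost_round=1 trains a single forest rather than boosting several.
    bst = xgb.train(params, dtrain, num_boost_round=1)
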
@@ -59,7 +65,7 @@ A random forest model can then be trained as follows::
 **************************
-With Scikit-Learn-Like API
+Standalone Random Forest With Scikit-Learn-Like API
 **************************
 
 ``XGBRFClassifier`` and ``XGBRFRegressor`` are SKL-like classes that provide random forest
@@ -73,6 +79,17 @@ some of the parameters adjusted accordingly. In particular:
 * ``colsample_bynode`` and ``subsample`` are set to 0.8 by default
 * ``booster`` is always ``gbtree``
 
+For a simple example, you can train a random forest regressor with::
+
+  from sklearn.model_selection import KFold
+
+  # Your code ...
+
+  kf = KFold(n_splits=2)
+  for train_index, test_index in kf.split(X, y):
+      xgb_model = xgb.XGBRFRegressor(random_state=42).fit(
+          X[train_index], y[train_index])
+
 Note that these classes have a smaller selection of parameters compared to using
 ``train()``. In particular, it is impossible to combine random forests with gradient
 boosting using this API.
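
The example added above is deliberately skeletal: ``# Your code ...`` is a placeholder left as-is, and ``X``, ``y`` are assumed to exist. A self-contained variant, with synthetic data substituted purely for illustration:

    import numpy as np
    import xgboost as xgb
    from sklearn.model_selection import KFold

    # Synthetic regression data stands in for the "# Your code ..." step.
    rng = np.random.RandomState(42)
    X, y = rng.rand(200, 5), rng.rand(200)

    kf = KFold(n_splits=2)
    for train_index, test_index in kf.split(X, y):
        xgb_model = xgb.XGBRFRegressor(random_state=42).fit(
            X[train_index], y[train_index])
        # R^2 on the held-out fold, via the standard scikit-learn scoring API.
        print(xgb_model.score(X[test_index], y[test_index]))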

python-package/xgboost/sklearn.py

@@ -884,7 +884,7 @@ class XGBClassifier(XGBModel, XGBClassifierBase):
 class XGBRFClassifier(XGBClassifier):
     # pylint: disable=missing-docstring
-    __doc__ = "Implementation of the scikit-learn API "\
+    __doc__ = "Experimental implementation of the scikit-learn API "\
         + "for XGBoost random forest classification.\n\n"\
         + '\n'.join(XGBModel.__doc__.split('\n')[2:])
@@ -923,7 +923,7 @@ class XGBRegressor(XGBModel, XGBRegressorBase):
 class XGBRFRegressor(XGBRegressor):
     # pylint: disable=missing-docstring
-    __doc__ = "Implementation of the scikit-learn API "\
+    __doc__ = "Experimental implementation of the scikit-learn API "\
         + "for XGBoost random forest regression.\n\n"\
         + '\n'.join(XGBModel.__doc__.split('\n')[2:])
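
The visible effect of both docstring edits can be checked from an interpreter. A small sketch, assuming a build that includes this commit (later releases may construct these docstrings differently):

    import xgboost as xgb

    # The first docstring line now carries the experimental marker.
    print(xgb.XGBRFRegressor.__doc__.splitlines()[0])
    # Experimental implementation of the scikit-learn API for XGBoost random forest regression.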