Support slicing tree model (#6302)
This PR is meant to end the confusion around best_ntree_limit and to unify model slicing. Because we have multi-class models and random forests, asking users to understand how to set ntree_limit is difficult and error-prone.

* Implement the save_best option in early stopping.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
@@ -7,9 +7,9 @@ package. In XGBoost 1.3, a new callback interface is designed for Python packag
provides the flexibility of designing various extensions for training. Also, XGBoost has a
number of pre-defined callbacks for supporting early stopping, checkpoints, etc.

#######################
Using builtin callbacks
#######################
-----------------------

By default, training methods in XGBoost have parameters like ``early_stopping_rounds`` and
``verbose``/``verbose_eval``; when these are specified, the training procedure will define the
@@ -50,9 +50,9 @@ this callback function directly into XGBoost:
dump = booster.get_dump(dump_format='json')
assert len(early_stop.stopping_history['Valid']['CustomErr']) == len(dump)

##########################
Defining your own callback
##########################
--------------------------

XGBoost provides a callback interface class, ``xgboost.callback.TrainingCallback``.
User-defined callbacks should inherit this class and override the corresponding methods. There's a

@@ -12,4 +12,5 @@ Contents
python_intro
python_api
callbacks
model
Python examples <https://github.com/dmlc/xgboost/tree/master/demo/guide-python>

38
doc/python/model.rst
Normal file
@@ -0,0 +1,38 @@
#####
Model
#####

Slice tree model
----------------

When ``booster`` is set to ``gbtree`` or ``dart``, XGBoost builds a tree model, which is a
list of trees and can be sliced into multiple sub-models.

.. code-block:: python

    import xgboost as xgb
    from sklearn.datasets import make_classification

    num_classes = 3
    X, y = make_classification(n_samples=1000, n_informative=5,
                               n_classes=num_classes)
    dtrain = xgb.DMatrix(data=X, label=y)
    num_parallel_tree = 4
    num_boost_round = 16
    # total number of built trees is num_parallel_tree * num_classes * num_boost_round

    # We build a boosted random forest for classification here.
    booster = xgb.train({
        'objective': 'multi:softprob', 'num_parallel_tree': num_parallel_tree,
        'subsample': 0.5, 'num_class': num_classes},
        num_boost_round=num_boost_round, dtrain=dtrain)

    # This is the sliced model, containing boosting rounds (layers) [3, 7).
    # A step is also supported, with some limitations: e.g. a negative step is invalid.
    sliced: xgb.Booster = booster[3:7]

    # Iterating over a booster yields one sub-model per boosting round (tree layer).
    layers = [layer for layer in booster]
    assert len(layers) == num_boost_round

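One way to see what slicing selects is to count the trees before and after slicing. The sketch below illustrates this under a few assumptions and is not part of this PR. It assumes that each boosting round (layer) of a multi-class random forest holds ``num_parallel_tree * num_class`` trees, as the comment in the example above states. It also assumes that ``Booster.get_dump()`` returns one entry per tree. The data is random and purely synthetic. If these assumptions hold, the full model should contain 4 * 3 * 16 = 192 trees, and ``booster[3:7]`` should contain 4 * 12 = 48.

.. code-block:: python

    import numpy as np
    import xgboost as xgb

    num_classes = 3
    num_parallel_tree = 4
    num_boost_round = 16

    # Synthetic data (illustrative only): 300 rows, 10 features, labels in {0, 1, 2}.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 10))
    y = rng.integers(0, num_classes, size=300)
    dtrain = xgb.DMatrix(data=X, label=y)

    booster = xgb.train(
        {'objective': 'multi:softprob', 'num_class': num_classes,
         'num_parallel_tree': num_parallel_tree, 'subsample': 0.5},
        dtrain=dtrain, num_boost_round=num_boost_round)

    # Assumption: each boosting round (layer) holds one forest per class.
    trees_per_layer = num_parallel_tree * num_classes

    # get_dump() returns one text dump per tree in the model.
    total_trees = len(booster.get_dump())

    # booster[3:7] copies layers 3, 4, 5 and 6 into a new Booster.
    sliced = booster[3:7]
    sliced_trees = len(sliced.get_dump())

    # Slicing does not modify the original booster.
    assert len(booster.get_dump()) == total_trees
    print(total_trees, sliced_trees, trees_per_layer)
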
The sliced model is a copy of the selected trees, which means the original model is left
unchanged by slicing. This feature is the basis of the ``save_best`` option in the early
stopping callback.
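
The following rough sketch shows how ``save_best`` might be used. It trains the same model twice with the ``xgboost.callback.EarlyStopping`` callback: once with ``save_best=False`` and once with ``save_best=True``. It assumes the callback accepts ``rounds`` and ``save_best`` keyword arguments. The synthetic data, the round counts and the helper ``fit`` are hypothetical choices made for illustration. With ``save_best=True``, the returned booster is expected to be a slice that ends at the best iteration. It should therefore never have more layers than the model trained without that option.

.. code-block:: python

    import numpy as np
    import xgboost as xgb

    # Synthetic regression data (illustrative only): a noisy linear target.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 8))
    y = X[:, 0] + rng.normal(scale=2.0, size=1000)
    dtrain = xgb.DMatrix(X[:800], label=y[:800])
    dvalid = xgb.DMatrix(X[800:], label=y[800:])

    params = {'objective': 'reg:squarederror', 'eval_metric': 'rmse'}
    num_boost_round = 200


    def fit(save_best: bool) -> xgb.Booster:
        # Hypothetical helper for this sketch. Assumes EarlyStopping(rounds=..., save_best=...).
        # Training stops once the validation RMSE has not improved for 10 rounds.
        early_stop = xgb.callback.EarlyStopping(rounds=10, save_best=save_best)
        return xgb.train(params, dtrain, num_boost_round=num_boost_round,
                         evals=[(dvalid, 'valid')], callbacks=[early_stop],
                         verbose_eval=False)


    # Without save_best, the returned model also keeps the rounds trained after
    # the best one; with save_best, it is sliced to end at the best iteration.
    full = fit(save_best=False)
    best = fit(save_best=True)
    print(len(list(full)), len(list(best)))

Because the best model is produced by slicing, it is a copy. The rounds trained after the best iteration are simply not included in it.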