Support slicing tree model (#6302)

This PR is meant to end the confusion around best_ntree_limit and to unify model slicing. With multi-class models and random forests, asking users to understand how to set ntree_limit is difficult and error-prone.

* Implement the save_best option in early stopping.
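As a rough sketch of the unified interface (the dataset and parameters here are illustrative assumptions), ``save_best`` keeps only the trees built up to the best iteration by slicing the booster, so ``ntree_limit`` is no longer needed at prediction time:

.. code-block:: python

    import numpy as np
    import xgboost as xgb

    # Illustrative toy data; any binary classification DMatrix works here.
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(256, 8)), rng.integers(0, 2, size=256)
    dtrain = xgb.DMatrix(X[:192], label=y[:192])
    dvalid = xgb.DMatrix(X[192:], label=y[192:])

    # With save_best=True, the returned booster is sliced down to the trees
    # built up to the best iteration.
    early_stop = xgb.callback.EarlyStopping(rounds=3, save_best=True)
    booster = xgb.train({"objective": "binary:logistic"}, dtrain,
                        num_boost_round=100, evals=[(dvalid, "Valid")],
                        callbacks=[early_stop])
    preds = booster.predict(dvalid)  # uses only the saved best trees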

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
Jiaming Yuan
2020-11-03 02:27:39 -05:00
committed by GitHub
parent 29745c6df2
commit 2cc9662005
19 changed files with 550 additions and 37 deletions

doc/python/callbacks.rst

@@ -7,9 +7,9 @@
 package. In XGBoost 1.3, a new callback interface is designed for the Python package,
 which provides the flexibility of designing various extensions for training. Also,
 XGBoost has a number of pre-defined callbacks for supporting early stopping,
 checkpoints, etc.

-#######################
 Using builtin callbacks
-#######################
+-----------------------

 By default, training methods in XGBoost have parameters like ``early_stopping_rounds`` and
 ``verbose``/``verbose_eval``; when specified, the training procedure will define the
@@ -50,9 +50,9 @@ this callback function directly into XGBoost:
     dump = booster.get_dump(dump_format='json')
     assert len(early_stop.stopping_history['Valid']['CustomErr']) == len(dump)

-##########################
 Defining your own callback
-##########################
+--------------------------

 XGBoost provides a callback interface class, ``xgboost.callback.TrainingCallback``.
 User-defined callbacks should inherit this class and override the corresponding methods. There's a
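For illustration, a minimal user-defined callback following this interface might look like the sketch below. It assumes the 1.3 ``TrainingCallback`` method signatures, where ``after_iteration`` returns ``True`` to stop training; the class name is hypothetical.

.. code-block:: python

    import xgboost as xgb

    class IterationLogger(xgb.callback.TrainingCallback):
        # Hypothetical example: print the evaluation history after every iteration.
        def after_iteration(self, model, epoch, evals_log):
            print(f"finished iteration {epoch}, history: {evals_log}")
            return False  # returning True would stop training early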

doc/python/index.rst

@@ -12,4 +12,5 @@ Contents
   python_intro
   python_api
   callbacks
+  model
   Python examples <https://github.com/dmlc/xgboost/tree/master/demo/guide-python>

doc/python/model.rst (new file, 38 lines)

@@ -0,0 +1,38 @@
#####
Model
#####

Slice tree model
----------------

When ``booster`` is set to ``gbtree`` or ``dart``, XGBoost builds a tree model, which is a
list of trees and can be sliced into multiple sub-models.
.. code-block:: python

    import xgboost as xgb
    from sklearn.datasets import make_classification

    num_classes = 3
    X, y = make_classification(n_samples=1000, n_informative=5,
                               n_classes=num_classes)
    dtrain = xgb.DMatrix(data=X, label=y)

    num_parallel_tree = 4
    num_boost_round = 16
    # Total number of built trees is num_parallel_tree * num_classes * num_boost_round.
    # We build a boosted random forest for classification here.
    booster = xgb.train({'num_parallel_tree': num_parallel_tree,
                         'subsample': 0.5, 'num_class': num_classes},
                        num_boost_round=num_boost_round, dtrain=dtrain)

    # This is the sliced model, containing the [3, 7) tree layers.
    # A step is also supported, with some limitations: a negative step is invalid.
    sliced: xgb.Booster = booster[3:7]
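    # A hypothetical illustration of the step support mentioned above:
    # select every other tree layer in [3, 7) (positive steps only).
    every_other: xgb.Booster = booster[3:7:2]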
    # Access individual tree layers.
    trees = [_ for _ in booster]
    assert len(trees) == num_boost_round
The sliced model contains a copy of the selected trees, which means the original model
is not modified during slicing. This feature is the basis of the ``save_best`` option in
the early stopping callback.
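As a brief follow-up sketch (continuing the example above, with the slice bound chosen purely for illustration), predicting with a sliced booster uses only the selected trees, replacing the old ``ntree_limit`` argument to ``predict``:

.. code-block:: python

    # Keep only the first four tree layers and predict with them; the slice
    # takes the place of passing ntree_limit to predict().
    head: xgb.Booster = booster[:4]
    head_preds = head.predict(dtrain)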