Revert ntree limit fix (#6616)

The old (pre-fix) best_ntree_limit ignores the num_class parameter, which is incorrect. Previously we worked around this in the C++ layer to avoid possible breaking changes in other language bindings, but the Python interpretation stayed incorrect. The PR fixed that in Python to account for num_class but didn't remove the old workaround, so the tree calculation in the predictor is incorrect; see PredictBatch in CPUPredictor.
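
The double counting described above can be sketched in Python. This is only an illustration, not code from the repository: `python_limit` and `predictor_tree_count` are hypothetical helpers, and the second one assumes the C++ workaround amounts to multiplying the limit by the number of output groups, which the message implies but does not spell out.

```python
def python_limit(best_iteration: int, num_parallel_tree: int = 1,
                 num_class: int = 0, include_num_class: bool = False) -> int:
    """Hypothetical helper: the value the Python layer stores as best_ntree_limit.

    include_num_class=True models the reverted fix, False models this commit.
    """
    limit = (best_iteration + 1) * num_parallel_tree
    if include_num_class:
        num_groups = 1 if num_class == 0 else num_class
        limit *= num_groups
    return limit


def predictor_tree_count(ntree_limit: int, num_class: int = 0) -> int:
    """Hypothetical model of the C++ workaround (assumption: it scales the
    incoming limit by the number of output groups)."""
    num_groups = 1 if num_class == 0 else num_class
    return ntree_limit * num_groups


# Example: early stopping at iteration 4 (5 rounds), 2 parallel trees, 3 classes.
total_trees = 5 * 2 * 3  # trees that actually belong to the best 5 rounds

after_revert = predictor_tree_count(python_limit(4, 2, 3), num_class=3)
with_reverted_fix = predictor_tree_count(
    python_limit(4, 2, 3, include_num_class=True), num_class=3
)

print(total_trees, after_revert, with_reverted_fix)
```

Under these assumptions, leaving num_class out of the Python value gives a count that matches the trees in the best rounds, while applying num_class in both layers counts it twice.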
2021-01-19 23:51:16 +08:00
parent d132933550
commit d6d72de339
6 changed files with 32 additions and 21 deletions
--- a/python-package/xgboost/training.py
+++ b/python-package/xgboost/training.py
@@ -109,22 +109,19 @@ def _train_internal(params, dtrain,
    else:
        raise ValueError(f'Unknown booster: {booster}')

-    num_groups = int(config['learner']['learner_model_param']['num_class'])
-    num_groups = 1 if num_groups == 0 else num_groups
    if bst.attr('best_score') is not None:
        bst.best_score = float(bst.attr('best_score'))
        bst.best_iteration = int(bst.attr('best_iteration'))
+        # num_class is handled internally
        bst.set_attr(
-            best_ntree_limit=str(
-                (bst.best_iteration + 1) * num_parallel_tree * num_groups
-            )
+            best_ntree_limit=str((bst.best_iteration + 1) * num_parallel_tree)
        )
        bst.best_ntree_limit = int(bst.attr("best_ntree_limit"))
    else:
        # Due to compatibility with version older than 1.4, these attributes are added
        # to Python object even if early stopping is not used.
        bst.best_iteration = bst.num_boosted_rounds() - 1
-        bst.best_ntree_limit = (bst.best_iteration + 1) * num_parallel_tree * num_groups
+        bst.best_ntree_limit = (bst.best_iteration + 1) * num_parallel_tree

    # Copy to serialise and unserialise booster to reset state and free
    # training memory
@@ -165,9 +162,10 @@ def train(params, dtrain, num_boost_round=10, evals=(), obj=None, feval=None,
        If there's more than one metric in the **eval_metric** parameter given in
        **params**, the last metric will be used for early stopping.
        If early stopping occurs, the model will have three additional fields:
-        ``bst.best_score``, ``bst.best_iteration`` and ``bst.best_ntree_limit``.  (Use
+        ``bst.best_score``, ``bst.best_iteration`` and ``bst.best_ntree_limit``.  Use
        ``bst.best_ntree_limit`` to get the correct value if ``num_parallel_tree`` and/or
-        ``num_class`` appears in the parameters)
+        ``num_class`` appears in the parameters.  ``best_ntree_limit`` is the result of
+        ``num_parallel_tree * (best_iteration + 1)``.
    evals_result: dict
        This dictionary stores the evaluation results of all the items in watchlist.