********************
Optimal Partitioning
********************

.. versionadded:: 1.6

Optimal partitioning is a technique for partitioning the categorical predictors for each
node split. The proof of optimality for numerical output was first introduced by `[1]
<#references>`__. The algorithm was used in decision trees `[2] <#references>`__; later,
LightGBM `[3] <#references>`__ brought it to the context of gradient boosting trees, and
it is now also adopted in XGBoost as an optional feature for handling categorical
splits. More specifically, the proof by Fisher `[1] <#references>`__ states that, when
trying to partition a set of discrete values into groups based on the distances between a
measure of these values, one only needs to look at sorted partitions instead of
enumerating all possible permutations. In the context of decision trees, the discrete
values are categories, and the measure is the output leaf value. Intuitively, we want to
group the categories that output similar leaf values. During split finding, we first sort
the gradient histogram to prepare the contiguous partitions, then enumerate the splits
according to these sorted values.
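The following is a minimal NumPy sketch of this sorted-partition search for a single
categorical feature, not XGBoost's actual implementation: the function name, the
regularization parameter ``lam``, and the use of the second-order leaf score
:math:`G^2 / (H + \lambda)` are illustrative assumptions.

.. code-block:: python

    import numpy as np

    def best_categorical_split(grad_sum, hess_sum, lam=1.0):
        """Sketch of sorted-partition split finding for one categorical feature.

        ``grad_sum[k]`` and ``hess_sum[k]`` are the summed gradients and hessians
        of the rows whose category is ``k`` (one histogram bin per category).
        """
        def score(g, h):
            # Second-order leaf score: G^2 / (H + lambda).
            return g * g / (h + lam)

        # Sort categories by (a regularized version of) the leaf value each one
        # would output.  Per Fisher's result, only contiguous splits of this
        # ordering need to be enumerated, not all subsets.
        order = np.argsort(grad_sum / (hess_sum + lam))
        g_sorted, h_sorted = grad_sum[order], hess_sum[order]

        g_total, h_total = g_sorted.sum(), h_sorted.sum()
        parent = score(g_total, h_total)

        best_gain, best_split = -np.inf, None
        g_left = h_left = 0.0
        # Scan every contiguous prefix of the sorted categories as the left child.
        for i in range(len(order) - 1):
            g_left += g_sorted[i]
            h_left += h_sorted[i]
            gain = (score(g_left, h_left)
                    + score(g_total - g_left, h_total - h_left)
                    - parent)
            if gain > best_gain:
                best_gain, best_split = gain, set(order[: i + 1])
        return best_gain, best_split  # categories in ``best_split`` go left

With :math:`n` categories this scans :math:`n - 1` candidate splits after one sort,
instead of the :math:`2^{n-1} - 1` subsets an exhaustive search would enumerate.
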
One of the related parameters for XGBoost is ``max_cat_to_one_hot``, which controls
whether one-hot encoding or partitioning should be used for each feature; see
:doc:`/parameter` for details.
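
As a usage illustration, the sketch below trains on a pandas categorical column through
the scikit-learn interface. The threshold value ``4`` and the synthetic data are
arbitrary, and ``tree_method="hist"`` is assumed since categorical support targets the
histogram-based tree methods.

.. code-block:: python

    import numpy as np
    import pandas as pd
    import xgboost as xgb

    rng = np.random.default_rng(7)
    X = pd.DataFrame({
        "cat": pd.Categorical(rng.choice(list("abcdefgh"), size=256)),
        "num": rng.normal(size=256),
    })
    y = rng.normal(size=256)

    # Features with at most 4 categories would be one-hot encoded; "cat" has 8,
    # so it is split with optimal partitioning instead.
    reg = xgb.XGBRegressor(
        tree_method="hist",
        enable_categorical=True,
        max_cat_to_one_hot=4,
    )
    reg.fit(X, y)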