Implement max_cat_threshold for CPU. (#7957)

Jiaming Yuan
2022-06-04 11:02:46 +08:00
committed by GitHub
parent 78694405a6
commit b90c6d25e8
8 changed files with 177 additions and 20 deletions


@@ -235,14 +235,19 @@ Parameters for Tree Booster
     list is a group of indices of features that are allowed to interact with each other.
     See :doc:`/tutorials/feature_interaction_constraint` for more information.

-Additional parameters for ``hist``, ``gpu_hist`` and ``approx`` tree method
-===========================================================================
+.. _cat-param:
+
+Parameters for Categorical Feature
+==================================
+
+These parameters are only used for training with categorical data. See
+:doc:`/tutorials/categorical` for more information.

 * ``max_cat_to_onehot``

   .. versionadded:: 1.6

-  .. note:: The support for this parameter is experimental.
+  .. note:: This parameter is experimental. ``exact`` tree method is not supported yet.

   - A threshold for deciding whether XGBoost should use one-hot encoding based split for
     categorical data. When the number of categories is less than the threshold, one-hot
@@ -250,6 +255,16 @@ Additional parameters for ``hist``, ``gpu_hist`` and ``approx`` tree method
     Only relevant for regression and binary classification. Also, ``exact`` tree method is
     not supported.

+* ``max_cat_threshold``
+
+  .. versionadded:: 2.0
+
+  .. note:: This parameter is experimental. ``exact`` and ``gpu_hist`` tree methods are
+    not supported yet.
+
+  - Maximum number of categories considered for each split. Used only by partition-based
+    splits, to prevent over-fitting.
+
 Additional parameters for Dart Booster (``booster=dart``)
 =========================================================
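The two categorical parameters documented above interact during split finding: ``max_cat_to_onehot`` chooses between one-hot and partition-based splits, and ``max_cat_threshold`` bounds the partition-based candidates. The following is a minimal pure-Python sketch of that interplay, not XGBoost's C++ implementation. It assumes the categories are already sorted by a per-node statistic, and it assumes ``max_cat_threshold`` caps how many categories are placed in the left child during a single prefix scan; the function ``candidate_left_sets`` is hypothetical.

```python
from typing import List, Sequence, Set


def candidate_left_sets(
    sorted_categories: Sequence[int],
    max_cat_to_onehot: int,
    max_cat_threshold: int,
) -> List[Set[int]]:
    """List the category sets that a split could send to the left child.

    Hypothetical helper for illustration. ``sorted_categories`` is assumed
    to be pre-sorted by a per-category statistic such as the leaf weight.
    """
    n = len(sorted_categories)
    if n < max_cat_to_onehot:
        # One-hot style: each candidate isolates exactly one category.
        return [{c} for c in sorted_categories]
    # Partition style: contiguous prefixes of the sorted order. The left set
    # holds at most max_cat_threshold categories (an assumed cap), and at
    # least one category must remain for the right child.
    largest = min(max_cat_threshold, n - 1)
    return [set(sorted_categories[:k]) for k in range(1, largest + 1)]


print(candidate_left_sets([3, 0, 2, 1], max_cat_to_onehot=8, max_cat_threshold=64))
print(candidate_left_sets(list(range(10)), max_cat_to_onehot=4, max_cat_threshold=3))
```

With four categories and ``max_cat_to_onehot=8`` the sketch produces four one-vs-rest candidates, while ten categories with ``max_cat_to_onehot=4`` and ``max_cat_threshold=3`` yield only the three shortest prefixes ``{0}``, ``{0, 1}`` and ``{0, 1, 2}``.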


@@ -85,7 +85,7 @@ group the categories that output similar leaf values. During split finding, we f
 the gradient histogram to prepare the contiguous partitions, then enumerate the splits
 according to these sorted values. One of the related parameters for XGBoost is
 ``max_cat_to_onehot``, which controls whether one-hot encoding or partitioning should be
-used for each feature, see :doc:`/parameter` for details.
+used for each feature, see :ref:`cat-param` for details.

 **********************
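The sort-then-scan procedure described in the tutorial text (group categories with similar leaf values, sort, then enumerate contiguous splits) can be illustrated with per-category gradient sums. This is a simplified pure-Python sketch, not the histogram-based C++ code in XGBoost: it assumes each category is ranked by its optimal leaf weight ``-G / (H + lambda)`` and scores splits with the usual second-order gain, omitting the 1/2 factor and the ``gamma`` penalty. The function ``best_partition`` and its arguments are hypothetical.

```python
from typing import Dict, List, Tuple


def best_partition(
    grad: Dict[int, float],
    hess: Dict[int, float],
    reg_lambda: float = 1.0,
) -> Tuple[List[int], float]:
    """Sort categories by leaf weight, then scan the contiguous splits.

    ``grad`` and ``hess`` map each category to its summed gradient and
    hessian in the current node. Returns the categories sent to the left
    child and the gain of that split (no 1/2 factor, no gamma). With fewer
    than two categories there is no split, so ``([], -inf)`` is returned.
    """

    def score(g: float, h: float) -> float:
        # Structure score of a leaf holding gradient sum g and hessian sum h.
        return g * g / (h + reg_lambda)

    # Order categories by their optimal leaf weight -G / (H + lambda).
    order = sorted(grad, key=lambda c: -grad[c] / (hess[c] + reg_lambda))
    g_total = sum(grad.values())
    h_total = sum(hess.values())
    parent = score(g_total, h_total)

    best_k, best_gain = 0, float("-inf")
    g_left = h_left = 0.0
    # Splitting after position k sends order[:k] left and the rest right.
    for k in range(1, len(order)):
        c = order[k - 1]
        g_left += grad[c]
        h_left += hess[c]
        gain = (
            score(g_left, h_left)
            + score(g_total - g_left, h_total - h_left)
            - parent
        )
        if gain > best_gain:
            best_k, best_gain = k, gain
    return order[:best_k], best_gain


left, gain = best_partition({0: -4.0, 1: 3.0, 2: -3.0, 3: 4.0},
                            {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0})
print(left, gain)
```

With these gradients and unit hessians the categories sort as ``[3, 1, 2, 0]``, and the best of the three contiguous splits sends ``[3, 1]`` to the left child with a gain of 98/3. Because only contiguous splits of the sorted order are scanned, the search costs one pass over the categories instead of enumerating every subset.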