Support optimal partitioning for GPU hist. (#7652)

* Implement `MaxCategory` in quantile.
* Implement partition-based split for GPU evaluation. Currently, it's based on the existing evaluation function.
* Extract an evaluator from GPU Hist to store the needed states.
* Added some CUDA stream/event utilities.
* Update document with references.
* Fixed a bug in approx evaluator where the number of data points is less than the number of categories.
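
For readers unfamiliar with the "partition-based split" mentioned above, the following is a minimal CPU-side sketch of the general idea, not the CUDA evaluator added in this commit. It assumes a simplified gain formula (`G^2 / (H + lam)` per child, with no 1/2 factor and no `gamma` penalty). The function name `best_partition` and its arguments are hypothetical, chosen for illustration only. Categories are sorted by their gradient-to-hessian ratio and every prefix of that ordering is scored as a candidate left child.

```python
from typing import Dict, List, Tuple


def best_partition(
    grad: Dict[str, float], hess: Dict[str, float], lam: float = 1.0
) -> Tuple[List[str], float]:
    """Return (categories sent left, gain) for the best sorted-prefix split.

    Hypothetical helper: `grad` and `hess` map each category to its summed
    gradient and hessian; `lam` is an assumed L2 regularisation term.
    """
    # Order categories by their per-category gradient/hessian ratio.
    cats = sorted(grad, key=lambda c: grad[c] / (hess[c] + lam))

    total_g = sum(grad.values())
    total_h = sum(hess.values())

    def score(g: float, h: float) -> float:
        # Simplified structure score for a node (assumption, see lead-in).
        return g * g / (h + lam)

    parent = score(total_g, total_h)
    best_gain, best_left = 0.0, []
    left_g = left_h = 0.0

    # Scan prefixes of the sorted order; the last category must stay on the
    # right so that both children are non-empty.
    for i, c in enumerate(cats[:-1]):
        left_g += grad[c]
        left_h += hess[c]
        gain = (
            score(left_g, left_h)
            + score(total_g - left_g, total_h - left_h)
            - parent
        )
        if gain > best_gain:
            best_gain, best_left = gain, cats[: i + 1]
    return best_left, best_gain
```

Scanning sorted prefixes keeps the search linear in the number of categories after the sort, instead of enumerating every subset.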
@@ -581,10 +581,10 @@ class DMatrix: # pylint: disable=too-many-instance-attributes

 .. versionadded:: 1.3.0

-Experimental support of specializing for categorical features. Do not set to
-True unless you are interested in development. Currently it's only available
-for `gpu_hist` tree method with 1 vs rest (one hot) categorical split. Also,
-JSON serialization format is required.
+Experimental support of specializing for categorical features. Do not set
+to True unless you are interested in development. Currently it's only
+available for `gpu_hist` and `approx` tree methods. Also, JSON/UBJSON
+serialization format is required. (XGBoost 1.6 for approx)

 """
 if group is not None and qid is not None:

@@ -207,7 +207,9 @@ __model_doc = f'''
 .. versionadded:: 1.5.0

 Experimental support for categorical data. Do not set to true unless you are
-interested in development. Only valid when `gpu_hist` and dataframe are used.
+interested in development. Only valid when `gpu_hist` or `approx` is used along
+with dataframe as input. Also, JSON/UBJSON serialization format is
+required. (XGBoost 1.6 for approx)

 max_cat_to_onehot : Optional[int]

@@ -216,10 +218,11 @@ __model_doc = f'''
 .. note:: This parameter is experimental

 A threshold for deciding whether XGBoost should use one-hot encoding based split
-for categorical data. When number of categories is lesser than the threshold then
-one-hot encoding is chosen, otherwise the categories will be partitioned into
-children nodes. Only relevant for regression and binary classification and
-`approx` tree method.
+for categorical data. When number of categories is lesser than the threshold
+then one-hot encoding is chosen, otherwise the categories will be partitioned
+into children nodes. Only relevant for regression and binary
+classification. Also, ``approx`` or ``gpu_hist`` tree method is required. See
+:doc:`Categorical Data </tutorials/categorical>` for details.

 eval_metric : Optional[Union[str, List[str], Callable]]

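
The `max_cat_to_onehot` rule described in the docstring above can be restated as a tiny sketch. It only mirrors the documented wording ("lesser than the threshold" selects one-hot), so the strict `<` comparison is taken from the docstring rather than from the library's source. The function name `choose_split_kind` and the returned labels are hypothetical.

```python
def choose_split_kind(n_categories: int, max_cat_to_onehot: int) -> str:
    """Pick the categorical split strategy for one feature.

    Hypothetical helper restating the docstring: fewer categories than the
    threshold means a one-hot (one vs. rest) split, otherwise the categories
    are partitioned into the two child nodes.
    """
    if n_categories < max_cat_to_onehot:
        return "onehot"
    return "partition"


# Example: with a threshold of 4, a 3-category feature uses one-hot splits,
# while features with 4 or more categories use partition-based splits.
kinds = [choose_split_kind(n, 4) for n in (3, 4, 10)]
```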