Clarify the behavior of invalid categorical value handling. (#7529)
:class:`dask.Array <dask.Array>` can also be used as categorical data.

*************
Miscellaneous
*************

By default, XGBoost assumes input categories are integers starting from 0 up to the number
of categories, that is, values in :math:`[0, n\_categories)`. However, users might provide
inputs with invalid values, either by mistake or because values are missing. An invalid
value can be negative, a floating point value that cannot be represented by a 32-bit
integer, or a value larger than the actual number of unique categories. Such values are
validated during training, but during prediction they are treated the same as missing
values for performance reasons. Lastly, missing values in categorical features are handled
the same way as missing values in numerical features.
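
The rule above can be illustrated with a small pure-Python sketch. It is not XGBoost's
implementation: the helpers ``is_valid_category`` and ``prepare_categories`` are
hypothetical, and the sketch assumes a category code is valid exactly when it is
non-negative, fits in a 32-bit signed integer, and is below ``n_categories``. Invalid codes
raise an error when ``training=True`` and are replaced with NaN (treated as missing)
otherwise, mirroring the training/prediction difference described above.

.. code:: python

  import math
  from typing import List

  # Largest value a 32-bit signed integer can hold.
  INT32_MAX = 2**31 - 1


  def is_valid_category(value: float, n_categories: int) -> bool:
      """Hypothetical check for a category code (not XGBoost's own code)."""
      if math.isnan(value):
          return False  # missing value, not a category code
      if value < 0 or value > INT32_MAX:
          return False  # negative, or too large for a 32-bit integer (includes inf)
      return value < n_categories  # must fall in [0, n_categories)


  def prepare_categories(
      values: List[float], n_categories: int, training: bool
  ) -> List[float]:
      """Reject invalid codes in training; mask them as missing (NaN) otherwise."""
      invalid = [
          v for v in values
          if not math.isnan(v) and not is_valid_category(v, n_categories)
      ]
      if training and invalid:
          # Training path: report the offending values instead of silently masking.
          raise ValueError(f"invalid categorical values: {invalid}")
      # Prediction path: invalid codes become NaN, i.e. they are treated as missing.
      return [v if is_valid_category(v, n_categories) else math.nan for v in values]

For example, ``prepare_categories([0.0, 2.0, -1.0, 5.0, math.nan], 3, training=False)``
returns ``[0.0, 2.0, nan, nan, nan]``: ``-1.0`` is negative and ``5.0`` falls outside
``[0, 3)``, so both are masked, while the same call with ``training=True`` raises
``ValueError``.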

**********
Next Steps
**********