Clarify the behavior of invalid categorical value handling. (#7529)

This commit is contained in:
Jiaming Yuan
2022-01-13 16:11:52 +08:00
committed by GitHub
parent 20c0d60ac7
commit e5e47c3c99
7 changed files with 88 additions and 25 deletions

View File

@@ -108,6 +108,18 @@ feature it's specified as ``"c"``. The Dask module in XGBoost has the same inte
:class:`dask.Array <dask.Array>` can also be used as categorical data.
*************
Miscellaneous
*************
By default, XGBoost assumes input categories are integers starting from 0 till the number
of categories :math:`[0, n_categories)`. However, user might provide inputs with invalid
values due to mistakes or missing values. It can be negative value, floating point value
that can not be represented by 32-bit integer, or values that are larger than actual
number of unique categories. During training this is validated but for prediction it's
treated as the same as missing value for performance reasons. Lastly, missing values are
treated as the same as numerical features.
**********
Next Steps
**********