Update document for multi output and categorical. (#7574)

* Group together categorical related parameters.
* Update documents about multioutput and categorical.
Jiaming Yuan
2022-01-19 04:35:17 +08:00
committed by GitHub
parent dac9eb13bd
commit b4ec1682c6
5 changed files with 27 additions and 22 deletions

@@ -113,7 +113,7 @@ Miscellaneous
 *************
 By default, XGBoost assumes input categories are integers starting from 0 till the number
-of categories :math:`[0, n_categories)`. However, user might provide inputs with invalid
+of categories :math:`[0, n\_categories)`. However, user might provide inputs with invalid
 values due to mistakes or missing values. It can be negative value, integer values that
 can not be accurately represented by 32-bit floating point, or values that are larger than
 actual number of unique categories. During training this is validated but for prediction
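The valid-range rule in the hunk above can be sketched as a tiny encoder. This is a hedged illustration only; `encode_categories` is a hypothetical helper, not part of XGBoost:

```python
# Minimal sketch (hypothetical helper, not XGBoost API): map raw category
# values to integer codes in [0, n_categories), the layout the document
# says XGBoost assumes for categorical inputs.
def encode_categories(values, categories):
    index = {c: i for i, c in enumerate(categories)}
    # Unknown or missing values get -1, an example of the invalid
    # (negative) codes the document warns about.
    return [index.get(v, -1) for v in values]

codes = encode_categories(["red", "green", None, "red"], ["green", "red"])
print(codes)  # [1, 0, -1, 1] -- the -1 marks the missing value
```

Codes produced this way stay accurately representable as 32-bit floats as long as the number of unique categories is small, which is the representability concern the paragraph raises.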

@@ -12,14 +12,15 @@ terminologies related to different multi-output models please refer to the `scik
 user guide <https://scikit-learn.org/stable/modules/multiclass.html>`_.
 Internally, XGBoost builds one model for each target similar to sklearn meta estimators,
-with the added benefit of reusing data and custom objective support. For a worked example
-of regression, see :ref:`sphx_glr_python_examples_multioutput_regression.py`. For
-multi-label classification, the binary relevance strategy is used. Input ``y`` should be
-of shape ``(n_samples, n_classes)`` with each column having a value of 0 or 1 to specify
-whether the sample is labeled as positive for respective class. Given a sample with 3
-output classes and 2 labels, the corresponding `y` should be encoded as ``[1, 0, 1]`` with
-the second class labeled as negative and the rest labeled as positive. At the moment
-XGBoost supports only dense matrix for labels.
+with the added benefit of reusing data and other integrated features like SHAP. For a
+worked example of regression, see
+:ref:`sphx_glr_python_examples_multioutput_regression.py`. For multi-label classification,
+the binary relevance strategy is used. Input ``y`` should be of shape ``(n_samples,
+n_classes)`` with each column having a value of 0 or 1 to specify whether the sample is
+labeled as positive for respective class. Given a sample with 3 output classes and 2
+labels, the corresponding `y` should be encoded as ``[1, 0, 1]`` with the second class
+labeled as negative and the rest labeled as positive. At the moment XGBoost supports only
+dense matrix for labels.
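The label layout described above can be illustrated with a small dense target matrix. This is a sketch; the wrapper usage in the final comment is an assumption based on the sklearn-style interface, not a claim about this commit:

```python
import numpy as np

# Multi-label targets for 4 samples and 3 classes: one column per class,
# 1 marking the sample as positive for that class (binary relevance).
y = np.array([
    [1, 0, 1],  # the document's example: classes 0 and 2 positive, class 1 negative
    [0, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
])

print(y.shape)  # (4, 3), i.e. (n_samples, n_classes)
# XGBoost requires a dense matrix here; a scipy sparse y would not work.
# With the sklearn wrapper, such a y could presumably be passed directly:
#   xgboost.XGBClassifier(tree_method="hist").fit(X, y)
```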
.. code-block:: python