[doc] Small improvements for categorical data document. (#7330)
@@ -58,10 +58,12 @@ can plot the model and calculate the global feature importance:
 The ``scikit-learn`` interface from dask is similar to single node version. The basic
 idea is create dataframe with category feature type, and tell XGBoost to use ``gpu_hist``
 with parameter ``enable_categorical``. See `this demo
-<https://github.com/dmlc/xgboost/blob/master/demo/guide-python/categorical.py>`_ for a
-worked example using categorical data with ``scikit-learn`` interface. For using it with
-the Kaggle tutorial dataset, see `this demo
-<https://github.com/dmlc/xgboost/blob/master/demo/guide-python/cat_in_the_dat.py>`_
+<https://github.com/dmlc/xgboost/blob/master/demo/guide-python/categorical.py>`__ for a
+worked example of using categorical data with ``scikit-learn`` interface. A comparison
+between using one-hot encoded data and XGBoost's categorical data support can be found
+`here
+<https://github.com/dmlc/xgboost/blob/master/demo/guide-python/cat_in_the_dat.py>`__.
+


 **********************
@@ -70,10 +72,10 @@ Using native interface

 The ``scikit-learn`` interface is user friendly, but lacks some features that are only
 available in native interface. For instance users cannot compute SHAP value directly or
-use quantized ``DMatrix``. Also native interface supports data types other than
-dataframe, like ``numpy/cupy array``. To use the native interface with categorical data,
-we need to pass the similar parameter to ``DMatrix`` and the ``train`` function. For
-dataframe input:
+use quantized :class:`DMatrix <xgboost.DMatrix>`. Also native interface supports data
+types other than dataframe, like ``numpy/cupy array``. To use the native interface with
+categorical data, we need to pass the similar parameter to :class:`DMatrix
+<xgboost.DMatrix>` and the :func:`train <xgboost.train>` function. For dataframe input:

 .. code:: python

@@ -106,7 +108,7 @@ types by using the ``feature_types`` parameter in :class:`DMatrix <xgboost.DMatr

 For numerical data, the feature type can be ``"q"`` or ``"float"``, while for categorical
 feature it's specified as ``"c"``. The Dask module in XGBoost has the same interface so
-``dask.Array`` can also be used as categorical data.
+:class:`dask.Array <dask.Array>` can also be used as categorical data.


 **********