[R] Add data iterator, quantile dmatrix, external memory, and missing feature_types (#9913)

david-cortes
2024-01-30 12:26:44 +01:00
committed by GitHub
parent d9f4ab557a
commit 3abbbe41ac
13 changed files with 1754 additions and 104 deletions

@@ -798,9 +798,23 @@ class DMatrix: # pylint: disable=too-many-instance-attributes,too-many-public-m
Set names for features.
feature_types :
- Set types for features. When `enable_categorical` is set to `True`, string
- "c" represents categorical data type while "q" represents numerical feature
- type. For categorical features, the input is assumed to be preprocessed and
+ Set types for features. If `data` is a DataFrame type and passing
+ `enable_categorical=True`, the types will be deduced automatically
+ from the column types.
+ Otherwise, one can pass a list-like input with the same length as number
+ of columns in `data`, with the following possible values:
+ - "c", which represents categorical columns.
+ - "q", which represents numeric columns.
+ - "int", which represents integer columns.
+ - "i", which represents boolean columns.
+ Note that, while categorical types are treated differently from
+ the rest for model fitting purposes, the other types do not influence
+ the generated model, but have effects in other functionalities such as
+ feature importances.
+ For categorical features, the input is assumed to be preprocessed and
encoded by the users. The encoding can be done via
:py:class:`sklearn.preprocessing.OrdinalEncoder` or pandas dataframe
`.cat.codes` method. This is useful when users want to specify categorical