[R] Add data iterator, quantile dmatrix, external memory, and missing feature_types (#9913)

2024-01-30 12:26:44 +01:00
parent d9f4ab557a
commit 3abbbe41ac
13 changed files with 1754 additions and 104 deletions
--- a/python-package/xgboost/core.py
+++ b/python-package/xgboost/core.py
@@ -798,9 +798,23 @@ class DMatrix:  # pylint: disable=too-many-instance-attributes,too-many-public-m
            Set names for features.
        feature_types :

-            Set types for features.  When `enable_categorical` is set to `True`, string
-            "c" represents categorical data type while "q" represents numerical feature
-            type. For categorical features, the input is assumed to be preprocessed and
+            Set types for features. If `data` is a DataFrame and
+            `enable_categorical=True`, the types are deduced from the column dtypes.
+
+            Otherwise, pass a list-like input with one entry per column of `data`,
+            where each entry is one of the following values:
+
+            - "c", which represents categorical columns.
+            - "q", which represents numeric columns.
+            - "int", which represents integer columns.
+            - "i", which represents boolean columns.
+
+            Note that while categorical types are treated differently from the rest
+            for model fitting purposes, the other types do not influence the
+            generated model, but they do affect other functionality such as
+            feature importances.
+
+            For categorical features, the input is assumed to be preprocessed and
            encoded by the users. The encoding can be done via
            :py:class:`sklearn.preprocessing.OrdinalEncoder` or pandas dataframe
            `.cat.codes` method. This is useful when users want to specify categorical
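The `feature_types` docstring above describes the list-like contract: one entry per column of `data`, with each entry being one of the documented codes. It also says that only categorical columns change how the model is fitted. The sketch below illustrates that contract with a hypothetical helper, `check_feature_types`, which is not part of xgboost. It accepts only the four codes listed in this docstring, and xgboost's own validation may differ. It returns the positions of the "c" columns. In real code, the validated list is what would be passed as the `feature_types` argument of `DMatrix`.

```python
from typing import List, Sequence

# Codes documented in the DMatrix `feature_types` docstring.
# Hypothetical mapping for illustration; xgboost's own checks may accept more.
DOCUMENTED_FEATURE_TYPES = {
    "c": "categorical",
    "q": "numeric",
    "int": "integer",
    "i": "boolean",
}


def check_feature_types(feature_types: Sequence[str], n_columns: int) -> List[int]:
    """Validate a feature_types list and return the indices of categorical columns.

    Hypothetical helper (not an xgboost API): it only mirrors the documented
    contract of one entry per column, each entry being a documented code.
    """
    # The list must have the same length as the number of columns in `data`.
    if len(feature_types) != n_columns:
        raise ValueError(
            f"feature_types has {len(feature_types)} entries, expected {n_columns}"
        )
    # Every entry must be one of the documented codes.
    for position, code in enumerate(feature_types):
        if code not in DOCUMENTED_FEATURE_TYPES:
            raise ValueError(f"unknown feature type {code!r} at column {position}")
    # Only categorical ("c") columns change how the model is fitted.
    return [i for i, code in enumerate(feature_types) if code == "c"]


if __name__ == "__main__":
    types = ["q", "c", "int", "i"]
    print(check_feature_types(types, n_columns=4))  # prints [1]
```

Running the module prints `[1]`, because only the second column is marked "c". The other three codes pass validation but, per the docstring, would not alter the fitted model.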