Support dataframe data format in native XGBoost. (#9828)

- Implement a columnar adapter.
- Refactor Python pandas handling code to avoid converting into a single numpy array.
- Add support in R for transforming columns.
- Support R data.frame and factor type.
2023-12-12 09:56:31 +08:00
parent b3700bbb3f
commit faf0f2df10
21 changed files with 718 additions and 221 deletions
--- a/demo/guide-python/cat_in_the_dat.py
+++ b/demo/guide-python/cat_in_the_dat.py
@@ -78,6 +78,10 @@ def categorical_model(X: pd.DataFrame, y: pd.Series, output_dir: str) -> None:
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=1994, test_size=0.2
    )
+    # Be aware that the encoding for X_train and X_test is the same here. In practice,
+    # we should use an encoder such as sklearn's OrdinalEncoder to obtain the
+    # categorical values.
+
    # Specify `enable_categorical` to True.
    clf = xgb.XGBClassifier(
        **params,