Support dataframe data format in native XGBoost. (#9828)

- Implement a columnar adapter.
- Refactor Python pandas handling code to avoid converting into a single numpy array.
- Add support in R for transforming columns.
- Support R data.frame and factor type.
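A columnar adapter reads each column of a data frame as its own buffer. This avoids first converting the whole frame into a single numpy array. Below is a minimal Python sketch of the idea. `ColumnarAdapter`, `row`, and `entries` are hypothetical names, not XGBoost's actual C++ adapter interface. The sketch also makes one assumption of its own: NaN always counts as missing, in addition to the `missing` sentinel.

```python
import math
from typing import Iterator, List, Sequence, Tuple


class ColumnarAdapter:
    """Hypothetical adapter that reads a column-major table in place.

    Each column stays a separate sequence (as with DataFrame columns), so the
    table is never stacked into one dense 2-D array.
    """

    def __init__(self, columns: List[Sequence[float]], missing: float = math.nan):
        if any(len(c) != len(columns[0]) for c in columns):
            raise ValueError("all columns must have the same length")
        self.columns = columns
        self.missing = missing

    @property
    def num_rows(self) -> int:
        return len(self.columns[0]) if self.columns else 0

    @property
    def num_cols(self) -> int:
        return len(self.columns)

    def _is_missing(self, value: float) -> bool:
        # Sketch choice: NaN always counts as missing, plus the user sentinel.
        return math.isnan(value) or value == self.missing

    def row(self, i: int) -> Iterator[Tuple[int, float]]:
        """Yield (column index, value) pairs for row i, skipping missing cells."""
        for j, column in enumerate(self.columns):
            value = float(column[i])  # columns may hold ints or floats
            if not self._is_missing(value):
                yield j, value

    def entries(self) -> Iterator[Tuple[int, int, float]]:
        """Yield (row, column, value) triplets in row-major order."""
        for i in range(self.num_rows):
            for j, value in self.row(i):
                yield i, j, value
```

Because the columns are never concatenated, an integer column and a float column can sit side by side without being upcast into one shared dtype first. The adapter converts each value only at the moment it is read.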
2023-12-12 09:56:31 +08:00
parent b3700bbb3f
commit faf0f2df10
21 changed files with 718 additions and 221 deletions
--- a/R-package/man/xgb.DMatrix.Rd
+++ b/R-package/man/xgb.DMatrix.Rd
@@ -17,7 +17,8 @@ xgb.DMatrix(
  qid = NULL,
  label_lower_bound = NULL,
  label_upper_bound = NULL,
-  feature_weights = NULL
+  feature_weights = NULL,
+  enable_categorical = FALSE
 )
 }
 \arguments{
@@ -42,7 +43,8 @@ It is useful when a 0 or some other extreme value represents missing values in d

 \item{silent}{whether to suppress printing an informational message after loading from a file.}

-\item{feature_names}{Set names for features.}
+\item{feature_names}{Set names for features. Overrides column names in data
+frame and matrix.}

 \item{nthread}{Number of threads used for creating DMatrix.}

@@ -55,6 +57,9 @@ It is useful when a 0 or some other extreme value represents missing values in d
 \item{label_upper_bound}{Upper bound for survival training.}

 \item{feature_weights}{Set feature weights for column sampling.}
+
+\item{enable_categorical}{Experimental support of specializing for
+categorical features. JSON/UBJSON serialization format is required.}
 }
 \description{
 Construct xgb.DMatrix object from either a dense matrix, a sparse matrix, or a local file.
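The new `enable_categorical` argument, together with data.frame and factor support, implies that factor columns reach the booster as category codes rather than as arbitrary numbers. Below is a minimal Python sketch of that encoding. `r_factor` and `to_category_codes` are hypothetical helpers, not part of the xgboost R or Python API.

The sketch rests on two assumptions:

- It mimics R's `factor()`: levels are sorted and codes are 1-based.
- The booster expects 0-based non-negative codes, with NaN marking missing entries.

```python
import math
from typing import List, Optional, Sequence, Tuple


def r_factor(values: Sequence[Optional[str]]) -> Tuple[List[Optional[int]], List[str]]:
    """Hypothetical helper mimicking R's factor(): sorted levels, 1-based codes.

    None plays the role of NA. R sorts levels with locale collation; this
    sketch uses Python's default string ordering instead.
    """
    levels = sorted({v for v in values if v is not None})
    position = {level: k + 1 for k, level in enumerate(levels)}  # 1-based, like R
    codes = [position[v] if v is not None else None for v in values]
    return codes, levels


def to_category_codes(codes: Sequence[Optional[int]]) -> List[float]:
    """Hypothetical conversion of R factor codes into 0-based category codes.

    Assumption: the booster expects non-negative 0-based codes with NaN for
    missing entries, so each R code is shifted down by one.
    """
    return [float(c - 1) if c is not None else math.nan for c in codes]
```

For example, `r_factor(["b", "a", None, "b"])` gives levels `["a", "b"]` and codes `[2, 1, None, 2]`, which `to_category_codes` maps to `[1.0, 0.0, nan, 1.0]`. The levels vector is kept separately, so the codes can later be mapped back to category names.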