[R] switch to URI reader (#10024)
@@ -19,7 +19,8 @@ xgb.DMatrix(
 qid = NULL,
 label_lower_bound = NULL,
 label_upper_bound = NULL,
-feature_weights = NULL
+feature_weights = NULL,
+data_split_mode = "row"
 )

 xgb.QuantileDMatrix(
@@ -60,10 +61,27 @@ Other column types are not supported.
 'xgb.QuantileDMatrix'.
 \item Single-row CSR matrices, as class \code{dsparseVector} from package \code{Matrix}, which is interpreted
 as a single row (only when making predictions from a fitted model).
-\item Text files in SVMLight / LibSVM formats, passed as a path to the file. These are \bold{not}
-supported for xgb.QuantileDMatrix'.
-\item Binary files generated by \link{xgb.DMatrix.save}, passed as a path to the file. These are
-\bold{not} supported for xgb.QuantileDMatrix'.
+\item Text files in a supported format, passed as a \code{character} variable containing the URI path to
+the file, with an optional format specifier.
+
+These are \bold{not} supported for \code{xgb.QuantileDMatrix}. Supported formats are:\itemize{
+\item XGBoost's own binary format for DMatrices, as produced by \link{xgb.DMatrix.save}.
+\item SVMLight (a.k.a. LibSVM) format for CSR matrices. This format can be signaled by suffix
+\code{?format=libsvm} at the end of the file path. It will be the default format if not
+otherwise specified.
+\item CSV files (comma-separated values). This format can be specified by adding suffix
+\code{?format=csv} at the end of the file path. It will \bold{not} be auto-deduced from file extensions.
+}
+
+Be aware that the format of the file will not be auto-deduced - for example, if a file is named 'file.csv',
+it will not look at the extension or file contents to determine that it is a comma-separated value.
+Instead, the format must be specified following the URI format, so the input to \code{data} should be passed
+like this: \code{"file.csv?format=csv"} (or \code{"file.csv?format=csv&label_column=0"} if the first column
+corresponds to the labels).
+
+For more information about passing text files as input, see the articles
+\href{https://xgboost.readthedocs.io/en/stable/tutorials/input_format.html}{Text Input Format of DMatrix} and
+\href{https://xgboost.readthedocs.io/en/stable/python/python_intro.html#python-data-interface}{Data Interface}.
 }}

 \item{label}{Label of the training data. For classification problems, should be passed encoded as
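The URI convention documented in this hunk can be sketched in R as follows. This is a minimal illustration, not code from the commit: `make_dmatrix_uri` is a hypothetical helper introduced here, the CSV contents are made up, and the call to `xgb.DMatrix` is skipped when the xgboost package is not installed. The file is written without a header row, on the assumption that the CSV reader expects numeric values only.

```r
# Hypothetical helper (not part of xgboost): append URI query parameters
# such as "format" and "label_column" to a file path.
make_dmatrix_uri <- function(path, format = NULL, label_column = NULL) {
  params <- character(0)
  if (!is.null(format)) {
    params <- c(params, paste0("format=", format))
  }
  if (!is.null(label_column)) {
    params <- c(params, paste0("label_column=", label_column))
  }
  if (length(params) == 0) {
    return(path)
  }
  paste0(path, "?", paste(params, collapse = "&"))
}

# Write a tiny, made-up CSV whose first column holds the labels.
csv_path <- tempfile(fileext = ".csv")
write.table(
  data.frame(label = c(0, 1, 0, 1), x1 = c(1.5, 2.5, 3.5, 4.5), x2 = c(0, 1, 1, 0)),
  file = csv_path, sep = ",", row.names = FALSE, col.names = FALSE
)

# The format is never deduced from the ".csv" extension, so state it explicitly.
uri <- make_dmatrix_uri(csv_path, format = "csv", label_column = 0)

# Only attempt to build the DMatrix if xgboost is available.
if (requireNamespace("xgboost", quietly = TRUE)) {
  dm <- xgboost::xgb.DMatrix(uri)
  print(dim(dm))
}
```

Building the URI in a helper keeps the query-string syntax in one place; passing the bare path instead would fall back to the LibSVM default described in the documentation.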
@@ -129,6 +147,14 @@ not be saved, so make sure that \code{factor} columns passed to \code{predict} h

 \item{feature_weights}{Set feature weights for column sampling.}

+\item{data_split_mode}{When passing a URI (as R \code{character}) as input, this signals
+whether to split by row or column. Allowed values are \code{"row"} and \code{"col"}.
+
+In distributed mode, the file is split accordingly; otherwise this is only an indicator on
+how the file was split beforehand. Defaults to \code{"row"}.
+
+This is not used when \code{data} is not a URI.}
+
 \item{ref}{The training dataset that provides quantile information, needed when creating
 validation/test dataset with \code{xgb.QuantileDMatrix}. Supplying the training DMatrix
 as a reference means that the same quantisation applied to the training data is
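A rough sketch of how a caller might use the new `data_split_mode` argument is given below. Both `check_split_mode` and `load_from_uri` are hypothetical wrappers introduced for illustration, not functions from xgboost. The wrapper forwards the argument only when the installed `xgb.DMatrix` lists it among its formals, since versions predating this change would not accept it.

```r
# Hypothetical validator (not part of xgboost): accept only the two values
# the documentation lists, "row" and "col".
check_split_mode <- function(mode = c("row", "col")) {
  match.arg(mode)
}

# Hypothetical wrapper: build a DMatrix from a URI, passing data_split_mode
# only if the installed xgb.DMatrix() has that argument.
load_from_uri <- function(uri, data_split_mode = "row") {
  stopifnot(is.character(uri), length(uri) == 1)
  mode <- check_split_mode(data_split_mode)
  if (!requireNamespace("xgboost", quietly = TRUE)) {
    return(NULL)  # xgboost not installed; nothing to load
  }
  args <- list(data = uri)
  if ("data_split_mode" %in% names(formals(xgboost::xgb.DMatrix))) {
    args$data_split_mode <- mode
  }
  do.call(xgboost::xgb.DMatrix, args)
}
```

A call such as `load_from_uri("train.csv?format=csv&label_column=0", data_split_mode = "col")` would then request a column-wise split; the file name here is only a placeholder.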