% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.DMatrix.R
\name{xgb.DataBatch}
\alias{xgb.DataBatch}
\title{Structure for Data Batches}
\usage{
xgb.DataBatch(
  data,
  label = NULL,
  weight = NULL,
  base_margin = NULL,
  feature_names = colnames(data),
  feature_types = NULL,
  group = NULL,
  qid = NULL,
  label_lower_bound = NULL,
  label_upper_bound = NULL,
  feature_weights = NULL
)
}
\arguments{
\item{data}{Data belonging to this batch.

Note that not all of the input types supported by \code{\link[=xgb.DMatrix]{xgb.DMatrix()}} can be
passed here. Supported types are:
\itemize{
\item \code{matrix}, with types \code{numeric}, \code{integer}, and \code{logical}. Note that for types
\code{integer} and \code{logical}, missing values might not be automatically recognized
as such. See the documentation for parameter \code{missing} in \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}}
for details.
\item \code{data.frame}, with the same types as supported by \code{xgb.DMatrix} and the same
conversions applied to it. See the documentation for parameter \code{data} in
\code{\link[=xgb.DMatrix]{xgb.DMatrix()}} for details.
\item CSR matrices, as class \code{dgRMatrix} from package "Matrix".
}}

\item{label}{Label of the training data. For classification problems, labels should be encoded as
integers numbered from zero.}

\item{weight}{Weight for each instance.

Note that, for ranking tasks, one weight is assigned to each group (not to each data
point). This is because only the relative ordering of data points within each group
matters, so it does not make sense to assign weights to individual data points.}

\item{base_margin}{Base margin used for boosting from an existing model.

In the case of multi-output models, one can also pass a multi-dimensional \code{base_margin}.}

\item{feature_names}{Names for the features. Overrides the column names of a data frame or matrix.

Note: columns are not referenced by name when calling \code{predict}, so the column order there
must be the same as in the DMatrix construction, regardless of the column names.}

\item{feature_types}{Set types for features.

If \code{data} is a \code{data.frame} and \code{feature_types} is not supplied,
feature types will be deduced automatically from the column types.

Otherwise, one can pass a character vector with the same length as the number of columns in \code{data},
with the following possible values:
\itemize{
\item "c", which represents categorical columns.
\item "q", which represents numeric columns.
\item "int", which represents integer columns.
\item "i", which represents logical (boolean) columns.
}

Note that, while categorical types are treated differently from the rest for model fitting
purposes, the other types do not influence the generated model, but do affect other
functionality such as feature importances.

\strong{Important}: Categorical features, if specified manually through \code{feature_types}, must
be encoded as integers numbered from zero, and the same encoding needs to be
applied when passing data to \code{\link[=predict]{predict()}}. Even when \code{factor} types are passed, the encoding is
not saved, so make sure that \code{factor} columns passed to \code{predict} have the same \code{levels}.}

\item{group}{Group sizes for all ranking groups.}

\item{qid}{Query ID for data samples, used for ranking.}

\item{label_lower_bound}{Lower bound for survival training.}

\item{label_upper_bound}{Upper bound for survival training.}

\item{feature_weights}{Set feature weights for column sampling.}
}
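\section{Sketch: zero-based encoding for categorical features}{
The zero-based integer encoding that \code{feature_types} requires for manually specified
categorical columns can be illustrated with base R alone. This is a minimal sketch, not
the package's own procedure: the variable names \code{colour}, \code{lvls}, \code{codes} and
\code{x_batch} are made up for illustration, and fixing the levels explicitly is only one
possible way of keeping the encoding identical between training and prediction.

\preformatted{
# Hypothetical categorical column, as it might appear in one batch.
colour <- c("red", "green", "red", "blue")

# Fix the level order once and reuse it for every batch and at
# prediction time, so that each category always maps to the same code.
lvls <- c("blue", "green", "red")

# factor() yields 1-based codes; subtract 1 to start numbering at zero.
codes <- as.integer(factor(colour, levels = lvls)) - 1L

# One-column matrix of codes; such a column would be declared with
# feature_types = "c" (assumption: the batch has only this column).
x_batch <- matrix(codes, ncol = 1, dimnames = list(NULL, "colour"))
}
}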
\value{
An object of class \code{xgb.DataBatch}, which is just a list containing the
data and parameters passed here. It does \strong{not} inherit from \code{xgb.DMatrix}.
}
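\section{Sketch: inspecting the returned object}{
The statement that the result is a plain list rather than a DMatrix can be checked
directly. The snippet below is a minimal sketch that assumes the xgboost package
(in a version that exports \code{xgb.DataBatch}) is installed; the toy matrix and the
names \code{x}, \code{y} and \code{batch} are made up for illustration. The function is
called outside of an iterator callback here only to look at the object.

\preformatted{
library(xgboost)

# Toy batch: 4 rows, 2 numeric features, plus one label per row.
x <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 4,
            dimnames = list(NULL, c("a", "b")))
y <- c(0, 1, 0, 1)

batch <- xgb.DataBatch(data = x, label = y)

inherits(batch, "xgb.DataBatch")  # class assigned by the constructor
is.list(batch)                    # the object is a list underneath
inherits(batch, "xgb.DMatrix")    # it is not a DMatrix
}
}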
\description{
Helper function to supply data in batches from a data iterator when
constructing a DMatrix from external memory through \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}}
or through \code{\link[=xgb.QuantileDMatrix.from_iterator]{xgb.QuantileDMatrix.from_iterator()}}.

This function is \strong{only} meant to be called inside a callback function (which
is passed as an argument to \code{\link[=xgb.DataIter]{xgb.DataIter()}} to construct a data iterator)
when constructing a DMatrix through external memory. Otherwise, one should call
\code{\link[=xgb.DMatrix]{xgb.DMatrix()}} or \code{\link[=xgb.QuantileDMatrix]{xgb.QuantileDMatrix()}}.

The object that results from calling this function directly is \strong{not} like
an \code{xgb.DMatrix}: it cannot be used to train a model or to get predictions. Its only
use is to supply data to an iterator, from which a DMatrix is then constructed.

For more information and for example usage, see the documentation for \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}}.
}
\seealso{
\code{\link[=xgb.DataIter]{xgb.DataIter()}}, \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}}.
}
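\section{Sketch: supplying batches from an iterator callback}{
The intended usage described above (calling \code{xgb.DataBatch()} inside the callback of a
data iterator) can be sketched as follows. This is an illustrative outline modelled on
the workflow documented for \code{xgb.ExternalDMatrix()}, not a definitive recipe: the
argument names \code{env}, \code{f_next} and \code{f_reset} of \code{xgb.DataIter()} and the
second (cache location) argument of \code{xgb.ExternalDMatrix()} are assumptions not stated
on this page, and the names \code{iterator_env}, \code{iterator_next} and
\code{iterator_reset} are made up. The data is already in memory here, so batching brings
no benefit; it only stands in for data that would be loaded piece by piece.

\preformatted{
library(xgboost)
data(mtcars)

# State shared between calls to the callbacks; the user keeps track
# of the current batch number in this environment.
iterator_env <- as.environment(
  list(
    iter = 0,
    x = as.matrix(mtcars[, -1]),
    y = mtcars[, 1]
  )
)

# Returns the next batch, or NULL once both batches were consumed.
iterator_next <- function(iterator_env) {
  curr_iter <- iterator_env[["iter"]]
  if (curr_iter >= 2) {
    return(NULL)
  }
  rows <- if (curr_iter == 0) 1:16 else 17:32
  iterator_env[["iter"]] <- curr_iter + 1
  xgb.DataBatch(
    data = iterator_env[["x"]][rows, ],
    label = iterator_env[["y"]][rows]
  )
}

# Moves the iterator back to its first batch.
iterator_reset <- function(iterator_env) {
  iterator_env[["iter"]] <- 0
}

data_iterator <- xgb.DataIter(
  env = iterator_env,
  f_next = iterator_next,
  f_reset = iterator_reset
)

# The DMatrix is built from the batches produced by the iterator
# (assumption: the second argument is the on-disk cache location).
dm <- xgb.ExternalDMatrix(data_iterator, tempdir(), nthread = 1)
}

Keeping the batch counter in an environment, rather than in a local variable, lets the
reset callback rewind the iterator, since environments are modified in place.
}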