[R] remove 'reshape' argument, let shapes be handled by core cpp library (#10330)
@@ -13,10 +13,10 @@
 predcontrib = FALSE,
 approxcontrib = FALSE,
 predinteraction = FALSE,
-reshape = FALSE,
 training = FALSE,
 iterationrange = NULL,
 strict_shape = FALSE,
+avoid_transpose = FALSE,
 validate_features = FALSE,
 base_margin = NULL,
 ...
@@ -66,10 +66,6 @@ logistic regression would return log-odds instead of probabilities.}
 
 \item{predinteraction}{Whether to return contributions of feature interactions to individual predictions (see Details).}
 
-\item{reshape}{Whether to reshape the vector of predictions to matrix form when there are several
-prediction outputs per case. No effect if \code{predleaf}, \code{predcontrib},
-or \code{predinteraction} is \code{TRUE}.}
-
 \item{training}{Whether the prediction result is used for training. For dart booster,
 training predicting will perform dropout.}
 
@@ -86,8 +82,27 @@ base-1 indexing, and inclusive of both ends).
 If passing "all", will use all of the rounds regardless of whether the model had early stopping or not.
 }\if{html}{\out{</div>}}}
 
-\item{strict_shape}{Default is \code{FALSE}. When set to \code{TRUE}, the output
-type and shape of predictions are invariant to the model type.}
+\item{strict_shape}{Whether to always return an array with the same dimensions for the given prediction mode
+regardless of the model type - meaning that, for example, both a multi-class and a binary classification
+model would generate output arrays with the same number of dimensions, with the 'class' dimension having
+size equal to '1' for the binary model.
+
+\if{html}{\out{<div class="sourceCode">}}\preformatted{ If passing `FALSE` (the default), dimensions will be simplified according to the model type, so that a
+binary classification model for example would not have a redundant dimension for 'class'.
+
+See documentation for the return type for the exact shape of the output arrays for each prediction mode.
+}\if{html}{\out{</div>}}}
+
+\item{avoid_transpose}{Whether to output the resulting predictions in the same memory layout in which they
+are generated by the core XGBoost library, without transposing them to match the expected output shape.
+
+\if{html}{\out{<div class="sourceCode">}}\preformatted{ Internally, XGBoost uses row-major order for the predictions it generates, while R arrays use column-major
+order, hence the result needs to be transposed in order to have the expected shape when represented as
+an R array or matrix, which might be a slow operation.
+
+If passing `TRUE`, then the result will have dimensions in reverse order - for example, rows
+will be the last dimension instead of the first dimension.
+}\if{html}{\out{</div>}}}
 
 \item{validate_features}{When \code{TRUE}, validate that the Booster's and newdata's feature_names
 match (only applicable when both \code{object} and \code{newdata} have feature names).
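The transposition that `avoid_transpose` skips can be illustrated in base R, without XGBoost. This is a minimal sketch. `buf` is a made-up buffer standing in for a row-major prediction output of 3 rows and 2 groups, and the names `buf`, `reversed` and `conventional` are hypothetical. Because R fills arrays column by column, giving such a buffer the reversed dimensions reads it correctly without reordering anything. Producing the `[nrows, ngroups]` shape, however, needs `t()` (or `aperm()` when there are more dimensions), and that copies the data.

```r
# Made-up row-major buffer for 3 rows x 2 groups, stored row by row:
# row 1 = (0.1, 0.9), row 2 = (0.2, 0.8), row 3 = (0.3, 0.7).
nrows <- 3
ngroups <- 2
buf <- c(0.1, 0.9, 0.2, 0.8, 0.3, 0.7)

# R fills arrays column by column, so giving the buffer the reversed
# dimensions [ngroups, nrows] reads it without reordering any values.
# This mirrors the layout that avoid_transpose = TRUE describes.
reversed <- array(buf, dim = c(ngroups, nrows))

# Getting the conventional [nrows, ngroups] shape requires a transpose,
# which copies the values into a new order.
conventional <- t(reversed)

# For outputs with more than two dimensions, aperm() with the reversed
# permutation plays the same role as t().
conventional_nd <- aperm(reversed, rev(seq_along(dim(reversed))))

print(dim(reversed))      # 2 3
print(dim(conventional))  # 3 2
print(conventional[1, ])  # 0.1 0.9 (the values of row 1)
```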
@@ -116,32 +131,46 @@ match (only applicable when both \code{object} and \code{newdata} have feature n
 \item{...}{Not used.}
 }
 \value{
-The return type depends on \code{strict_shape}. If \code{FALSE} (default):
-\itemize{
-\item For regression or binary classification: A vector of length \code{nrows(newdata)}.
-\item For multiclass classification: A vector of length \code{num_class * nrows(newdata)} or
-a \verb{(nrows(newdata), num_class)} matrix, depending on the \code{reshape} value.
-\item When \code{predleaf = TRUE}: A matrix with one column per tree.
-\item When \code{predcontrib = TRUE}: When not multiclass, a matrix with
-\code{ num_features + 1} columns. The last "+ 1" column corresponds to the baseline value.
-In the multiclass case, a list of \code{num_class} such matrices.
-The contribution values are on the scale of untransformed margin
-(e.g., for binary classification, the values are log-odds deviations from the baseline).
-\item When \code{predinteraction = TRUE}: When not multiclass, the output is a 3d array of
-dimension \code{c(nrow, num_features + 1, num_features + 1)}. The off-diagonal (in the last two dimensions)
-elements represent different feature interaction contributions. The array is symmetric WRT the last
-two dimensions. The "+ 1" columns corresponds to the baselines. Summing this array along the last dimension should
-produce practically the same result as \code{predcontrib = TRUE}.
-In the multiclass case, a list of \code{num_class} such arrays.
+A numeric vector or array, with corresponding dimensions depending on the prediction mode and on
+parameter \code{strict_shape} as follows:
+
+If passing \code{strict_shape=FALSE}:\itemize{
+\item For regression or binary classification: a vector of length \code{nrows}.
+\item For multi-class and multi-target objectives: a matrix of dimensions \verb{[nrows, ngroups]}.
+
+Note that objective variant \code{multi:softmax} defaults to predicting the most likely class (a vector of length
+\code{nrows}) instead of per-class probabilities.
+\item For \code{predleaf}: a matrix with one column per tree.
+
+For multi-class / multi-target, they will be arranged so that columns in the output will have
+the leaves from one group followed by leaves of the other group (e.g. order will be \code{group1:feat1},
+\code{group1:feat2}, ..., \code{group2:feat1}, \code{group2:feat2}, ...).
+\item For \code{predcontrib}: when not multi-class / multi-target, a matrix with dimensions
+\verb{[nrows, nfeats+1]}. The last "+ 1" column corresponds to the baseline value.
+
+For multi-class and multi-target objectives, will be an array with dimensions \verb{[nrows, ngroups, nfeats+1]}.
+
+The contribution values are on the scale of untransformed margin (e.g., for binary classification,
+the values are log-odds deviations from the baseline).
+\item For \code{predinteraction}: when not multi-class / multi-target, the output is a 3D array of
+dimensions \verb{[nrows, nfeats+1, nfeats+1]}. The off-diagonal (in the last two dimensions)
+elements represent different feature interaction contributions. The array is symmetric w.r.t. the last
+two dimensions. The "+ 1" columns correspond to the baselines. Summing this array along the last
+dimension should produce practically the same result as \code{predcontrib = TRUE}.
+
+For multi-class and multi-target, will be a 4D array with dimensions \verb{[nrows, ngroups, nfeats+1, nfeats+1]}.
 }
 
-When \code{strict_shape = TRUE}, the output is always an array:
-\itemize{
-\item For normal predictions, the output has dimension \verb{(num_class, nrow(newdata))}.
-\item For \code{predcontrib = TRUE}, the dimension is \verb{(ncol(newdata) + 1, num_class, nrow(newdata))}.
-\item For \code{predinteraction = TRUE}, the dimension is \verb{(ncol(newdata) + 1, ncol(newdata) + 1, num_class, nrow(newdata))}.
-\item For \code{predleaf = TRUE}, the dimension is \verb{(n_trees_in_forest, num_class, n_iterations, nrow(newdata))}.
+If passing \code{strict_shape=TRUE}, the result is always an array:\itemize{
+\item For normal predictions, the dimension is \verb{[nrows, ngroups]}.
+\item For \code{predcontrib=TRUE}, the dimension is \verb{[nrows, ngroups, nfeats+1]}.
+\item For \code{predinteraction=TRUE}, the dimension is \verb{[nrows, ngroups, nfeats+1, nfeats+1]}.
+\item For \code{predleaf=TRUE}, the dimension is \verb{[nrows, niter, ngroups, num_parallel_tree]}.
 }
+
+If passing \code{avoid_transpose=TRUE}, then the dimensions in all cases will be in reverse order - for
+example, for \code{predinteraction}, they will be \verb{[nfeats+1, nfeats+1, ngroups, nrows]}
+instead of \verb{[nrows, ngroups, nfeats+1, nfeats+1]}.
 }
 \description{
 Predict values on data based on xgboost model.
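For a binary model with `strict_shape=FALSE`, the return-value documentation says two things about the output. Contributions are on the untransformed margin scale, and summing the interaction array along its last dimension gives back the contributions. Both can be checked with a short script. This is a sketch that assumes the xgboost R package and its bundled `agaricus.train` dataset are installed. The model parameters are arbitrary. The comparison against `outputmargin = TRUE` relies on the standard SHAP property that the per-feature contributions plus the baseline column add up to the raw margin.

```r
library(xgboost)

# Small binary classification model on the bundled mushroom dataset.
data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
bst <- xgb.train(
  params = list(objective = "binary:logistic", max_depth = 2, nthread = 1),
  data = dtrain,
  nrounds = 2
)

x <- agaricus.train$data[1:5, ]

# Binary model with strict_shape = FALSE:
# contrib is [nrows, nfeats+1], inter is [nrows, nfeats+1, nfeats+1].
contrib <- predict(bst, x, predcontrib = TRUE)
inter <- predict(bst, x, predinteraction = TRUE)
margin <- predict(bst, x, outputmargin = TRUE)
print(dim(contrib))
print(dim(inter))

# Contributions (including the baseline column) add up to the raw margin.
contrib_gap <- max(abs(rowSums(contrib) - margin))

# Summing interactions over the last dimension recovers the contributions.
inter_gap <- max(abs(apply(inter, c(1, 2), sum) - contrib))

stopifnot(contrib_gap < 1e-4, inter_gap < 1e-4)
```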
@@ -241,8 +270,6 @@ bst <- xgb.train(
 # predict for softmax returns num_class probability numbers per case:
 pred <- predict(bst, as.matrix(iris[, -5]))
 str(pred)
-# reshape it to a num_class-columns matrix
-pred <- matrix(pred, ncol = num_class, byrow = TRUE)
 # convert the probabilities to softmax labels
 pred_labels <- max.col(pred) - 1
 # the following should result in the same error as seen in the last iteration
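The two removed example lines rebuilt a case-by-class matrix from a flat vector of predictions. The base-R sketch below runs the manual reshape together with the label extraction that the example keeps. It uses made-up probabilities for 2 cases and 3 classes. The values in `flat` are hypothetical and are laid out so that each case's class probabilities are contiguous, which is what the removed `byrow = TRUE` call assumed.

```r
# Hypothetical flat output: 2 cases x 3 classes, one case after another.
num_class <- 3
flat <- c(0.7, 0.2, 0.1,   # case 1
          0.1, 0.8, 0.1)   # case 2

# What the removed lines did: fill a [ncases, num_class] matrix row by row.
pred <- matrix(flat, ncol = num_class, byrow = TRUE)

# The step the example keeps: the column of the largest probability in each
# row, shifted to zero-based class labels.
pred_labels <- max.col(pred) - 1
print(pred_labels)  # 0 1
```

Under the new return-value documentation, `predict()` already returns an `[nrows, ngroups]` matrix for multi-class objectives, so only the `max.col()` line is still needed. `max.col()` breaks ties at random by default. Passing `ties.method = "first"` makes the labels deterministic when two classes share the top probability.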