% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/callbacks.R
\name{xgb.Callback}
\alias{xgb.Callback}
\title{XGBoost Callback Constructor}
\usage{
xgb.Callback(
  cb_name = "custom_callback",
  env = new.env(),
  f_before_training = function(env, model, data, evals, begin_iteration, end_iteration)
    NULL,
  f_before_iter = function(env, model, data, evals, iteration) NULL,
  f_after_iter = function(env, model, data, evals, iteration, iter_feval) NULL,
  f_after_training = function(env, model, data, evals, iteration, final_feval,
    prev_cb_res) NULL
)
}
\arguments{
\item{cb_name}{Name for the callback.

If the callback produces some non-NULL result (from executing the function passed under
\code{f_after_training}), that result will be added as an R attribute to the resulting booster
(or as a named element in the result of CV), with the attribute name specified here.

Names of callbacks must be unique - i.e. there cannot be two callbacks with the same name.}

\item{env}{An environment object that will be passed to the different functions in the callback.
Note that this environment will not be shared with other callbacks.}

\item{f_before_training}{A function that will be executed before the training has started.

If passing \code{NULL} for this or for the other function inputs, then no function will be executed.

If passing a function, it will be called with parameters supplied as non-named arguments
matching the function signatures that are shown in the default value for each function argument.}

\item{f_before_iter}{A function that will be executed before each boosting round.

This function can signal that training should stop by returning
a value that evaluates to \code{TRUE} - i.e. if the output from the function provided here at
a given round is \code{TRUE}, then training will be stopped before the current iteration happens.

Return values of \code{NULL} will be interpreted as \code{FALSE}.}

\item{f_after_iter}{A function that will be executed after each boosting round.

This function can signal that training should stop by returning
a value that evaluates to \code{TRUE} - i.e. if the output from the function provided here at
a given round is \code{TRUE}, then training will be stopped at that round.

Return values of \code{NULL} will be interpreted as \code{FALSE}.}

\item{f_after_training}{A function that will be executed after training is finished.

This function can optionally output something non-NULL. In the case of \link{xgb.train},
that output will become one of the R attributes of the booster, under the name supplied
for parameter \code{cb_name} (assuming one passes \code{keep_extra_attributes=TRUE} to
\link{xgb.train}). In the case of \link{xgb.cv}, it will become one of the named elements
of the result.}
}
\value{
An \code{xgb.Callback} object, which can be passed to \link{xgb.train} or \link{xgb.cv}.
}
\description{
Constructor for defining the structure of callback functions that can be executed
at different stages of model training (before / after training, before / after each boosting
iteration).
}
\details{
Arguments that will be passed to the supplied functions are as follows:\itemize{

\item env The same environment that is passed under argument \code{env}.

It may be modified by the functions in order to e.g. keep track of what happens
across iterations or similar.

This environment is only used by the functions supplied to the callback, and will
not be kept after the model fitting function terminates (see parameter \code{f_after_training}).

\item model The booster object when using \link{xgb.train}, or the folds when using
\link{xgb.cv}.

For \link{xgb.cv}, folds are a list with a structure as follows:\itemize{
\item \code{dtrain}: The training data for the fold (as an \code{xgb.DMatrix} object).
\item \code{bst}: The \code{xgb.Booster} object for the fold.
\item \code{evals}: A list containing two DMatrices, with names \code{train} and \code{test}
(\code{test} is the held-out data for the fold).
\item \code{index}: The indices of the held-out data for that fold (base-1 indexing),
from which the \code{test} entry in \code{evals} was obtained.
}

This object should \bold{not} be modified in-place in ways that conflict with the
training (e.g. resetting the parameters for a training update in a way that resets
the number of rounds to zero in order to overwrite rounds).

Note that any R attributes that are assigned to the booster during the callback functions
will not be kept thereafter, as the booster object variable is not re-assigned during
training. It is however possible to set C-level attributes of the booster through
\link{xgb.attr} or \link{xgb.attributes}, which should remain available for the rest
of the iterations and after the training is done.

For keeping variables across iterations, it's recommended to use \code{env} instead.
\item data The data to which the model is being fit, as an \code{xgb.DMatrix} object.

Note that, for \link{xgb.cv}, this will be the full data, while data for the specific
folds can be found in the \code{model} object.

\item evals The evaluation data, as passed under argument \code{evals} to
\link{xgb.train}.

For \link{xgb.cv}, this will always be \code{NULL}.

\item begin_iteration Index of the first boosting iteration that will be executed
(base-1 indexing).

This will typically be '1', but when using training continuation, depending on the
parameters for updates, boosting rounds will be continued from where the previous
model ended, in which case this will be larger than 1.

\item end_iteration Index of the last boosting iteration that will be executed
(base-1 indexing, inclusive of this end).

It should match argument \code{nrounds} passed to \link{xgb.train} or \link{xgb.cv}.

Note that boosting might be interrupted before reaching this last iteration, for
example by using the early stopping callback \link{xgb.cb.early.stop}.

\item iteration Index of the iteration number that is being executed (first iteration
will be the same as parameter \code{begin_iteration}, then next one will add +1, and so on).

\item iter_feval Evaluation metrics for \code{evals} that were supplied, either
determined by the objective, or by parameter \code{feval}.

For \link{xgb.train}, this will be a named vector with one entry per element in
\code{evals}, where the names are determined as 'evals name' + '-' + 'metric name' - for
example, if \code{evals} contains an entry named "tr" and the metric is "rmse",
this will be a one-element vector with name "tr-rmse".

For \link{xgb.cv}, this will be a 2d matrix with dimensions \verb{[length(evals), nfolds]},
where the row names will follow the same naming logic as the one-dimensional vector
that is passed in \link{xgb.train}.

Note that, internally, the built-in callbacks such as \link{xgb.cb.print.evaluation} summarize
this table by calculating the row-wise means and standard deviations.

\item final_feval The evaluation results after the last boosting round is executed
(same format as \code{iter_feval}, and will be the exact same input as passed under
\code{iter_feval} to the last round that is executed during model fitting).

\item prev_cb_res Result from a previous run of a callback sharing the same name
(as given by parameter \code{cb_name}) when conducting training continuation, if there
was any in the booster R attributes.

Sometimes, one might want to append the new results to the previous ones, and this will
be done automatically by the built-in callbacks such as \link{xgb.cb.evaluation.log},
which will append the new rows to the previous table.

If no such previous callback result is available (which will always be the case when fitting
a model from scratch instead of updating an existing model), this will be \code{NULL}.

For \link{xgb.cv}, which doesn't support training continuation, this will always be \code{NULL}.
}
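
As a rough sketch of how these arguments look under \link{xgb.cv}, the hypothetical
callback below records, for each boosting round, the standard deviation of every metric
across folds. It relies on \code{iter_feval} being a matrix with one column per fold, as
described above. The callback name \code{"cv_fold_spread"} and all variable names are
illustrative assumptions, not part of the package:

\preformatted{library(xgboost)

cv_spread_callback <- xgb.Callback(
  cb_name = "cv_fold_spread",
  f_before_training = function(env, model, data, evals,
                               begin_iteration, end_iteration) {
    # For xgb.cv, 'model' is the list of folds and 'evals' is NULL
    env$nfolds <- length(model)
    env$spread <- list()
  },
  f_after_iter = function(env, model, data, evals, iteration, iter_feval) {
    # 'iter_feval' is a matrix: one row per metric, one column per fold
    env$spread[[length(env$spread) + 1]] <- apply(iter_feval, 1, sd)
    return(FALSE)
  },
  f_after_training = function(env, model, data, evals, iteration,
                              final_feval, prev_cb_res) {
    # One row per boosting round, one column per metric
    return(do.call(rbind, env$spread))
  }
)

data(mtcars)
dm <- xgb.DMatrix(as.matrix(mtcars[, -1]), label = mtcars$mpg, nthread = 1)
cv_res <- xgb.cv(
  data = dm,
  params = list(objective = "reg:squarederror", nthread = 1),
  nrounds = 3,
  nfold = 4,
  callbacks = list(cv_spread_callback),
  verbose = FALSE
)

# The callback output is stored as a named element of the xgb.cv result
cv_res$cv_fold_spread
}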

The following names (\code{cb_name} values) are reserved for internal callbacks:\itemize{
\item print_evaluation
\item evaluation_log
\item reset_parameters
\item early_stop
\item save_model
\item cv_predict
\item gblinear_history
}

The following names are reserved for other non-callback attributes:\itemize{
\item names
\item class
\item call
\item params
\item niter
\item nfeatures
\item folds
}

When using the built-in early stopping callback (\link{xgb.cb.early.stop}), said callback
will always be executed before the others, as it sets some booster C-level attributes
that other callbacks might also use. Otherwise, the order of execution will match
the order in which the callbacks are passed to the model fitting function.
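
The ordering rule can be checked with a small sketch. The two callbacks below (named
\code{"first"} and \code{"second"} purely for illustration) each append their name to a
log kept in an outer environment. Both \code{order_log} and \code{make_named_callback}
are helpers introduced here for the example and are not part of the package:

\preformatted{library(xgboost)

# Hypothetical helper: an environment reached through a closure, since the
# 'env' of one callback is not shared with other callbacks
order_log <- new.env()
order_log$seq <- character()

make_named_callback <- function(name) {
  xgb.Callback(
    cb_name = name,
    f_before_training = function(env, model, data, evals,
                                 begin_iteration, end_iteration) {
      order_log$seq <- c(order_log$seq, name)
    }
  )
}

data(mtcars)
dm <- xgb.DMatrix(as.matrix(mtcars[, -1]), label = mtcars$mpg, nthread = 1)
model <- xgb.train(
  data = dm,
  params = list(objective = "reg:squarederror", nthread = 1),
  nrounds = 2,
  callbacks = list(make_named_callback("first"), make_named_callback("second"))
)

# Expected to list the names in the order in which the callbacks were passed
order_log$seq
}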
}
\examples{
# Example constructing a custom callback that calculates
# squared error on the training data (no separate test set),
# and outputs the per-iteration results.
ssq_callback <- xgb.Callback(
  cb_name = "ssq",
  f_before_training = function(env, model, data, evals,
                               begin_iteration, end_iteration) {
    # A vector to keep track of a number at each iteration
    env$logs <- rep(NA_real_, end_iteration - begin_iteration + 1)
  },
  f_after_iter = function(env, model, data, evals, iteration, iter_feval) {
    # This calculates the sum of squared errors on the training data.
    # Note that this can be better done by passing an 'evals' entry,
    # but this demonstrates a way in which callbacks can be structured.
    pred <- predict(model, data)
    err <- pred - getinfo(data, "label")
    sq_err <- sum(err^2)
    env$logs[iteration] <- sq_err
    cat(
      sprintf(
        "Squared error at iteration \%d: \%.2f\n",
        iteration, sq_err
      )
    )

    # A return value of 'TRUE' here would signal to finalize the training
    return(FALSE)
  },
  f_after_training = function(env, model, data, evals, iteration,
                              final_feval, prev_cb_res) {
    return(env$logs)
  }
)

data(mtcars)
y <- mtcars$mpg
x <- as.matrix(mtcars[, -1])
dm <- xgb.DMatrix(x, label = y, nthread = 1)
model <- xgb.train(
  data = dm,
  params = list(objective = "reg:squarederror", nthread = 1),
  nrounds = 5,
  callbacks = list(ssq_callback),
  keep_extra_attributes = TRUE
)

# Result from 'f_after_training' will be available as an attribute
attributes(model)$ssq
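
# A further sketch: stopping training from within 'f_after_iter'.
# It assumes an 'evals' entry named "tr" and the "rmse" metric, so that
# 'iter_feval' has an element named "tr-rmse". The callback name
# "stop_at_rmse" and the threshold of 3 are arbitrary, illustrative choices.
stop_callback <- xgb.Callback(
  cb_name = "stop_at_rmse",
  f_after_iter = function(env, model, data, evals, iteration, iter_feval) {
    # Remember the last round that was executed
    env$last_iteration <- iteration
    # Returning TRUE signals that training should stop at this round
    return(iter_feval[["tr-rmse"]] < 3)
  },
  f_after_training = function(env, model, data, evals, iteration,
                              final_feval, prev_cb_res) {
    return(env$last_iteration)
  }
)

dm_stop <- xgb.DMatrix(as.matrix(mtcars[, -1]), label = mtcars$mpg, nthread = 1)
model_stop <- xgb.train(
  data = dm_stop,
  params = list(
    objective = "reg:squarederror",
    eval_metric = "rmse",
    nthread = 1
  ),
  nrounds = 50,
  evals = list(tr = dm_stop),
  callbacks = list(stop_callback),
  keep_extra_attributes = TRUE,
  verbose = 0
)

# Last round that was run (at most 50, fewer if the threshold was reached)
attributes(model_stop)$stop_at_rmse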
}
\seealso{
Built-in callbacks:\itemize{
\item \link{xgb.cb.print.evaluation}
\item \link{xgb.cb.evaluation.log}
\item \link{xgb.cb.reset.parameters}
\item \link{xgb.cb.early.stop}
\item \link{xgb.cb.save.model}
\item \link{xgb.cb.cv.predict}
\item \link{xgb.cb.gblinear.history}
}
}