xgboost/R-package/man/xgb.Callback.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/callbacks.R
\name{xgb.Callback}
\alias{xgb.Callback}
\title{XGBoost Callback Constructor}
\usage{
xgb.Callback(
  cb_name = "custom_callback",
  env = new.env(),
  f_before_training = function(env, model, data, evals, begin_iteration, end_iteration)
    NULL,
  f_before_iter = function(env, model, data, evals, iteration) NULL,
  f_after_iter = function(env, model, data, evals, iteration, iter_feval) NULL,
  f_after_training = function(env, model, data, evals, iteration, final_feval,
    prev_cb_res) NULL
)
}
\arguments{
\item{cb_name}{Name for the callback.

If the callback produces some non-NULL result (from executing the function passed under
\code{f_after_training}), that result will be added as an R attribute to the resulting booster
(or as a named element in the result of CV), with the attribute name specified here.

Names of callbacks must be unique - i.e. there cannot be two callbacks with the same name.}

\item{env}{An environment object that will be passed to the different functions in the callback.
Note that this environment will not be shared with other callbacks.}

\item{f_before_training}{A function that will be executed before the training has started.

If passing \code{NULL} for this or for the other function inputs, then no function will be executed.

If passing a function, it will be called with parameters supplied as non-named arguments
matching the function signatures that are shown in the default value for each function argument.}

\item{f_before_iter}{A function that will be executed before each boosting round.

This function can signal whether the training should be finalized or not, by outputting
a value that evaluates to \code{TRUE} - i.e. if the output from the function provided here at
a given round is \code{TRUE}, then training will be stopped before the current iteration happens.

Return values of \code{NULL} will be interpreted as \code{FALSE}.}

\item{f_after_iter}{A function that will be executed after each boosting round.

This function can signal whether the training should be finalized or not, by outputting
a value that evaluates to \code{TRUE} - i.e. if the output from the function provided here at
a given round is \code{TRUE}, then training will be stopped at that round.

Return values of \code{NULL} will be interpreted as \code{FALSE}.}

\item{f_after_training}{A function that will be executed after training is finished.

This function can optionally output something non-NULL, which will become part of the R
attributes of the booster (assuming one passes \code{keep_extra_attributes=TRUE} to \link{xgb.train})
under the name supplied for parameter \code{cb_name} imn the case of \link{xgb.train}; or a part
of the named elements in the result of \link{xgb.cv}.}
}
\value{
An \code{xgb.Callback} object, which can be passed to \link{xgb.train} or \link{xgb.cv}.
}
\description{
Constructor for defining the structure of callback functions that can be executed
at different stages of model training (before / after training, before / after each boosting
iteration).
}
\details{
Arguments that will be passed to the supplied functions are as follows:\itemize{

\item env The same environment that is passed under argument \code{env}.

It may be modified by the functions in order to e.g. keep tracking of what happens
across iterations or similar.

This environment is only used by the functions supplied to the callback, and will
not be kept after the model fitting function terminates (see parameter \code{f_after_training}).

\item model The booster object when using \link{xgb.train}, or the folds when using
\link{xgb.cv}.

For \link{xgb.cv}, folds are a list with a structure as follows:\itemize{
\item \code{dtrain}: The training data for the fold (as an \code{xgb.DMatrix} object).
\item \code{bst}: Rhe \code{xgb.Booster} object for the fold.
\item \code{evals}: A list containing two DMatrices, with names \code{train} and \code{test}
(\code{test} is the held-out data for the fold).
\item \code{index}: The indices of the hold-out data for that fold (base-1 indexing),
from which the \code{test} entry in \code{evals} was obtained.
}

This object should \bold{not} be in-place modified in ways that conflict with the
training (e.g. resetting the parameters for a training update in a way that resets
the number of rounds to zero in order to overwrite rounds).

Note that any R attributes that are assigned to the booster during the callback functions,
will not be kept thereafter as the booster object variable is not re-assigned during
training. It is however possible to set C-level attributes of the booster through
\link{xgb.attr} or \link{xgb.attributes}, which should remain available for the rest
of the iterations and after the training is done.

For keeping variables across iterations, it's recommended to use \code{env} instead.
\item data The data to which the model is being fit, as an \code{xgb.DMatrix} object.

Note that, for \link{xgb.cv}, this will be the full data, while data for the specific
folds can be found in the \code{model} object.

\item evals The evaluation data, as passed under argument \code{evals} to
\link{xgb.train}.

For \link{xgb.cv}, this will always be \code{NULL}.

\item begin_iteration Index of the first boosting iteration that will be executed
(base-1 indexing).

This will typically be '1', but when using training continuation, depending on the
parameters for updates, boosting rounds will be continued from where the previous
model ended, in which case this will be larger than 1.

\item end_iteration Index of the last boostign iteration that will be executed
(base-1 indexing, inclusive of this end).

It should match with argument \code{nrounds} passed to \link{xgb.train} or \link{xgb.cv}.

Note that boosting might be interrupted before reaching this last iteration, for
example by using the early stopping callback \link{xgb.cb.early.stop}.

\item iteration Index of the iteration number that is being executed (first iteration
will be the same as parameter \code{begin_iteration}, then next one will add +1, and so on).

\item iter_feval Evaluation metrics for \code{evals} that were supplied, either
determined by the objective, or by parameter \code{feval}.

For \link{xgb.train}, this will be a named vector with one entry per element in
\code{evals}, where the names are determined as 'evals name' + '-' + 'metric name' - for
example, if \code{evals} contains an entry named "tr" and the metric is "rmse",
this will be a one-element vector with name "tr-rmse".

For \link{xgb.cv}, this will be a 2d matrix with dimensions \verb{[length(evals), nfolds]},
where the row names will follow the same naming logic as the one-dimensional vector
that is passed in \link{xgb.train}.

Note that, internally, the built-in callbacks such as \link{xgb.cb.print.evaluation} summarize
this table by calculating the row-wise means and standard deviations.

\item final_feval The evaluation results after the last boosting round is executed
(same format as \code{iter_feval}, and will be the exact same input as passed under
\code{iter_feval} to the last round that is executed during model fitting).

\item prev_cb_res Result from a previous run of a callback sharing the same name
(as given by parameter \code{cb_name}) when conducting training continuation, if there
was any in the booster R attributes.

Some times, one might want to append the new results to the previous one, and this will
be done automatically by the built-in callbacks such as \link{xgb.cb.evaluation.log},
which will append the new rows to the previous table.

If no such previous callback result is available (which it never will when fitting
a model from start instead of updating an existing model), this will be \code{NULL}.

For \link{xgb.cv}, which doesn't support training continuation, this will always be \code{NULL}.
}

The following names (\code{cb_name} values) are reserved for internal callbacks:\itemize{
\item print_evaluation
\item evaluation_log
\item reset_parameters
\item early_stop
\item save_model
\item cv_predict
\item gblinear_history
}

The following names are reserved for other non-callback attributes:\itemize{
\item names
\item class
\item call
\item params
\item niter
\item nfeatures
\item folds
}

When using the built-in early stopping callback (\link{xgb.cb.early.stop}), said callback
will always be executed before the others, as it sets some booster C-level attributes
that other callbacks might also use. Otherwise, the order of execution will match with
the order in which the callbacks are passed to the model fitting function.
}
\examples{
# Example constructing a custom callback that calculates
# squared error on the training data (no separate test set),
# and outputs the per-iteration results.
ssq_callback <- xgb.Callback(
  cb_name = "ssq",
  f_before_training = function(env, model, data, evals,
                               begin_iteration, end_iteration) {
    # A vector to keep track of a number at each iteration
    env$logs <- rep(NA_real_, end_iteration - begin_iteration + 1)
  },
  f_after_iter = function(env, model, data, evals, iteration, iter_feval) {
    # This calculates the sum of squared errors on the training data.
    # Note that this can be better done by passing an 'evals' entry,
    # but this demonstrates a way in which callbacks can be structured.
    pred <- predict(model, data)
    err <- pred - getinfo(data, "label")
    sq_err <- sum(err^2)
    env$logs[iteration] <- sq_err
    cat(
      sprintf(
        "Squared error at iteration \%d: \%.2f\n",
        iteration, sq_err
      )
    )

    # A return value of 'TRUE' here would signal to finalize the training
    return(FALSE)
  },
  f_after_training = function(env, model, data, evals, iteration,
                              final_feval, prev_cb_res) {
    return(env$logs)
  }
)

data(mtcars)
y <- mtcars$mpg
x <- as.matrix(mtcars[, -1])
dm <- xgb.DMatrix(x, label = y, nthread = 1)
model <- xgb.train(
  data = dm,
  params = list(objective = "reg:squarederror", nthread = 1),
  nrounds = 5,
  callbacks = list(ssq_callback),
  keep_extra_attributes = TRUE
)

# Result from 'f_after_iter' will be available as an attribute
attributes(model)$ssq
}
\seealso{
Built-in callbacks:\itemize{
\item \link{xgb.cb.print.evaluation}
\item \link{xgb.cb.evaluation.log}
\item \link{xgb.cb.reset.parameters}
\item \link{xgb.cb.early.stop}
\item \link{xgb.cb.save.model}
\item \link{xgb.cb.cv.predict}
\item \link{xgb.cb.gblinear.history}
}
}