[R] Refactor callback structure and attributes (#9957)
@@ -1,37 +0,0 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/callbacks.R
|
||||
\name{callbacks}
|
||||
\alias{callbacks}
|
||||
\title{Callback closures for booster training.}
|
||||
\description{
|
||||
These are used to perform various service tasks either during boosting iterations or at the end.
|
||||
This approach helps to modularize many of such tasks without bloating the main training methods,
|
||||
and it offers .
|
||||
}
|
||||
\details{
|
||||
By default, a callback function is run after each boosting iteration.
|
||||
An R-attribute \code{is_pre_iteration} could be set for a callback to define a pre-iteration function.
|
||||
|
||||
When a callback function has \code{finalize} parameter, its finalizer part will also be run after
|
||||
the boosting is completed.
|
||||
|
||||
WARNING: side-effects!!! Be aware that these callback functions access and modify things in
|
||||
the environment from which they are called, which is a fairly uncommon thing to do in R.
|
||||
|
||||
To write a custom callback closure, make sure you first understand the main concepts about R environments.
|
||||
Check either R documentation on \code{\link[base]{environment}} or the
|
||||
\href{http://adv-r.had.co.nz/Environments.html}{Environments chapter} from the "Advanced R"
|
||||
book by Hadley Wickham. Further, the best option is to read the code of some of the existing callbacks -
|
||||
choose ones that do something similar to what you want to achieve. Also, you would need to get familiar
|
||||
with the objects available inside of the \code{xgb.train} and \code{xgb.cv} internal environments.
|
||||
}
|
||||
\seealso{
|
||||
\code{\link{cb.print.evaluation}},
|
||||
\code{\link{cb.evaluation.log}},
|
||||
\code{\link{cb.reset.parameters}},
|
||||
\code{\link{cb.early.stop}},
|
||||
\code{\link{cb.save.model}},
|
||||
\code{\link{cb.cv.predict}},
|
||||
\code{\link{xgb.train}},
|
||||
\code{\link{xgb.cv}}
|
||||
}
|
||||
@@ -1,62 +0,0 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/callbacks.R
|
||||
\name{cb.early.stop}
|
||||
\alias{cb.early.stop}
|
||||
\title{Callback closure to activate the early stopping.}
|
||||
\usage{
|
||||
cb.early.stop(
|
||||
stopping_rounds,
|
||||
maximize = FALSE,
|
||||
metric_name = NULL,
|
||||
verbose = TRUE
|
||||
)
|
||||
}
|
||||
\arguments{
|
||||
\item{stopping_rounds}{The number of rounds with no improvement in
|
||||
the evaluation metric in order to stop the training.}
|
||||
|
||||
\item{maximize}{whether to maximize the evaluation metric}
|
||||
|
||||
\item{metric_name}{the name of an evaluation column to use as a criteria for early
|
||||
stopping. If not set, the last column would be used.
|
||||
Let's say the test data in \code{watchlist} was labelled as \code{dtest},
|
||||
and one wants to use the AUC in test data for early stopping regardless of where
|
||||
it is in the \code{watchlist}, then one of the following would need to be set:
|
||||
\code{metric_name='dtest-auc'} or \code{metric_name='dtest_auc'}.
|
||||
All dash '-' characters in metric names are considered equivalent to '_'.}
|
||||
|
||||
\item{verbose}{whether to print the early stopping information.}
|
||||
}
|
||||
\description{
|
||||
Callback closure to activate the early stopping.
|
||||
}
|
||||
\details{
|
||||
This callback function determines the condition for early stopping
|
||||
by setting the \code{stop_condition = TRUE} flag in its calling frame.
|
||||
|
||||
The following additional fields are assigned to the model's R object:
|
||||
\itemize{
|
||||
\item \code{best_score} the evaluation score at the best iteration
|
||||
\item \code{best_iteration} at which boosting iteration the best score has occurred (1-based index)
|
||||
}
|
||||
The same values are also stored as xgb-attributes:
|
||||
\itemize{
|
||||
\item \code{best_iteration} is stored as a 0-based iteration index (for interoperability of binary models)
|
||||
\item \code{best_msg} message string is also stored.
|
||||
}
|
||||
|
||||
At least one data element is required in the evaluation watchlist for early stopping to work.
|
||||
|
||||
Callback function expects the following values to be set in its calling frame:
|
||||
\code{stop_condition},
|
||||
\code{bst_evaluation},
|
||||
\code{rank},
|
||||
\code{bst} (or \code{bst_folds} and \code{basket}),
|
||||
\code{iteration},
|
||||
\code{begin_iteration},
|
||||
\code{end_iteration},
|
||||
}
|
||||
\seealso{
|
||||
\code{\link{callbacks}},
|
||||
\code{\link{xgb.attr}}
|
||||
}
|
||||
@@ -1,31 +0,0 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/callbacks.R
|
||||
\name{cb.evaluation.log}
|
||||
\alias{cb.evaluation.log}
|
||||
\title{Callback closure for logging the evaluation history}
|
||||
\usage{
|
||||
cb.evaluation.log()
|
||||
}
|
||||
\description{
|
||||
Callback closure for logging the evaluation history
|
||||
}
|
||||
\details{
|
||||
This callback function appends the current iteration evaluation results \code{bst_evaluation}
|
||||
available in the calling parent frame to the \code{evaluation_log} list in a calling frame.
|
||||
|
||||
The finalizer callback (called with \code{finalize = TRUE} in the end) converts
|
||||
the \code{evaluation_log} list into a final data.table.
|
||||
|
||||
The iteration evaluation result \code{bst_evaluation} must be a named numeric vector.
|
||||
|
||||
Note: in the column names of the final data.table, the dash '-' character is replaced with
|
||||
the underscore '_' in order to make the column names more like regular R identifiers.
|
||||
|
||||
Callback function expects the following values to be set in its calling frame:
|
||||
\code{evaluation_log},
|
||||
\code{bst_evaluation},
|
||||
\code{iteration}.
|
||||
}
|
||||
\seealso{
|
||||
\code{\link{callbacks}}
|
||||
}
|
||||
@@ -1,29 +0,0 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/callbacks.R
|
||||
\name{cb.print.evaluation}
|
||||
\alias{cb.print.evaluation}
|
||||
\title{Callback closure for printing the result of evaluation}
|
||||
\usage{
|
||||
cb.print.evaluation(period = 1, showsd = TRUE)
|
||||
}
|
||||
\arguments{
|
||||
\item{period}{results would be printed every number of periods}
|
||||
|
||||
\item{showsd}{whether standard deviations should be printed (when available)}
|
||||
}
|
||||
\description{
|
||||
Callback closure for printing the result of evaluation
|
||||
}
|
||||
\details{
|
||||
The callback function prints the result of evaluation at every \code{period} iterations.
|
||||
The initial and the last iteration's evaluations are always printed.
|
||||
|
||||
Callback function expects the following values to be set in its calling frame:
|
||||
\code{bst_evaluation} (also \code{bst_evaluation_err} when available),
|
||||
\code{iteration},
|
||||
\code{begin_iteration},
|
||||
\code{end_iteration}.
|
||||
}
|
||||
\seealso{
|
||||
\code{\link{callbacks}}
|
||||
}
|
||||
@@ -1,40 +0,0 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/callbacks.R
|
||||
\name{cb.save.model}
|
||||
\alias{cb.save.model}
|
||||
\title{Callback closure for saving a model file.}
|
||||
\usage{
|
||||
cb.save.model(save_period = 0, save_name = "xgboost.ubj")
|
||||
}
|
||||
\arguments{
|
||||
\item{save_period}{save the model to disk after every
|
||||
\code{save_period} iterations; 0 means save the model at the end.}
|
||||
|
||||
\item{save_name}{the name or path for the saved model file.
|
||||
|
||||
\if{html}{\out{<div class="sourceCode">}}\preformatted{ Note that the format of the model being saved is determined by the file
|
||||
extension specified here (see \link{xgb.save} for details about how it works).
|
||||
|
||||
It can contain a \code{\link[base]{sprintf}} formatting specifier
|
||||
to include the integer iteration number in the file name.
|
||||
E.g., with \code{save_name} = 'xgboost_\%04d.ubj',
|
||||
the file saved at iteration 50 would be named "xgboost_0050.ubj".
|
||||
}\if{html}{\out{</div>}}}
|
||||
}
|
||||
\description{
|
||||
Callback closure for saving a model file.
|
||||
}
|
||||
\details{
|
||||
This callback function saves an xgb-model file, either periodically every \code{save_period} iterations or at the end.
|
||||
|
||||
Callback function expects the following values to be set in its calling frame:
|
||||
\code{bst},
|
||||
\code{iteration},
|
||||
\code{begin_iteration},
|
||||
\code{end_iteration}.
|
||||
}
|
||||
\seealso{
|
||||
\link{xgb.save}
|
||||
|
||||
\code{\link{callbacks}}
|
||||
}
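As a minimal sketch of the \code{save_name} formatting rule described above: the helper below is hypothetical (it is not part of xgboost) and only shows how a \code{sprintf} specifier in the template would expand for a given iteration number.

```r
# Hypothetical helper, not an xgboost function: fill the iteration number
# into a save_name template that contains a sprintf integer specifier.
expand_save_name <- function(save_name, iteration) {
  sprintf(save_name, as.integer(iteration))
}

# Template taken from the documentation text above (unescaped '%').
name_50 <- expand_save_name("xgboost_%04d.ubj", 50)
print(name_50)
```

The \code{\%04d} specifier zero-pads the iteration number to four digits, which keeps the saved files in iteration order when sorted by name.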
|
||||
248
R-package/man/xgb.Callback.Rd
Normal file
@@ -0,0 +1,248 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/callbacks.R
|
||||
\name{xgb.Callback}
|
||||
\alias{xgb.Callback}
|
||||
\title{XGBoost Callback Constructor}
|
||||
\usage{
|
||||
xgb.Callback(
|
||||
cb_name = "custom_callback",
|
||||
env = new.env(),
|
||||
f_before_training = function(env, model, data, watchlist, begin_iteration,
|
||||
end_iteration) NULL,
|
||||
f_before_iter = function(env, model, data, watchlist, iteration) NULL,
|
||||
f_after_iter = function(env, model, data, watchlist, iteration, iter_feval) NULL,
|
||||
f_after_training = function(env, model, data, watchlist, iteration, final_feval,
|
||||
prev_cb_res) NULL
|
||||
)
|
||||
}
|
||||
\arguments{
|
||||
\item{cb_name}{Name for the callback.
|
||||
|
||||
If the callback produces some non-NULL result (from executing the function passed under
|
||||
\code{f_after_training}), that result will be added as an R attribute to the resulting booster
|
||||
(or as a named element in the result of CV), with the attribute name specified here.
|
||||
|
||||
Names of callbacks must be unique - i.e. there cannot be two callbacks with the same name.}
|
||||
|
||||
\item{env}{An environment object that will be passed to the different functions in the callback.
|
||||
Note that this environment will not be shared with other callbacks.}
|
||||
|
||||
\item{f_before_training}{A function that will be executed before the training has started.
|
||||
|
||||
If passing \code{NULL} for this or for the other function inputs, then no function will be executed.
|
||||
|
||||
If passing a function, it will be called with parameters supplied as non-named arguments
|
||||
matching the function signatures that are shown in the default value for each function argument.}
|
||||
|
||||
\item{f_before_iter}{A function that will be executed before each boosting round.
|
||||
|
||||
This function can signal whether the training should be finalized or not, by outputting
|
||||
a value that evaluates to \code{TRUE} - i.e. if the output from the function provided here at
|
||||
a given round is \code{TRUE}, then training will be stopped before the current iteration happens.
|
||||
|
||||
Return values of \code{NULL} will be interpreted as \code{FALSE}.}
|
||||
|
||||
\item{f_after_iter}{A function that will be executed after each boosting round.
|
||||
|
||||
This function can signal whether the training should be finalized or not, by outputting
|
||||
a value that evaluates to \code{TRUE} - i.e. if the output from the function provided here at
|
||||
a given round is \code{TRUE}, then training will be stopped at that round.
|
||||
|
||||
Return values of \code{NULL} will be interpreted as \code{FALSE}.}
|
||||
|
||||
\item{f_after_training}{A function that will be executed after training is finished.
|
||||
|
||||
This function can optionally output something non-NULL, which will become part of the R
|
||||
attributes of the booster (assuming one passes \code{keep_extra_attributes=TRUE} to \link{xgb.train})
|
||||
under the name supplied for parameter \code{cb_name} in the case of \link{xgb.train}; or a part
|
||||
of the named elements in the result of \link{xgb.cv}.}
|
||||
}
|
||||
\value{
|
||||
An \code{xgb.Callback} object, which can be passed to \link{xgb.train} or \link{xgb.cv}.
|
||||
}
|
||||
\description{
|
||||
Constructor for defining the structure of callback functions that can be executed
|
||||
at different stages of model training (before / after training, before / after each boosting
|
||||
iteration).
|
||||
}
|
||||
\details{
|
||||
Arguments that will be passed to the supplied functions are as follows:\itemize{
|
||||
|
||||
\item env The same environment that is passed under argument \code{env}.
|
||||
|
||||
It may be modified by the functions in order to e.g. keep track of what happens
|
||||
across iterations or similar.
|
||||
|
||||
This environment is only used by the functions supplied to the callback, and will
|
||||
not be kept after the model fitting function terminates (see parameter \code{f_after_training}).
|
||||
|
||||
\item model The booster object when using \link{xgb.train}, or the folds when using
|
||||
\link{xgb.cv}.
|
||||
|
||||
For \link{xgb.cv}, folds are a list with a structure as follows:\itemize{
|
||||
\item \code{dtrain}: The training data for the fold (as an \code{xgb.DMatrix} object).
|
||||
\item \code{bst}: The \code{xgb.Booster} object for the fold.
|
||||
\item \code{watchlist}: A list with two DMatrices, with names \code{train} and \code{test}
|
||||
(\code{test} is the held-out data for the fold).
|
||||
\item \code{index}: The indices of the hold-out data for that fold (base-1 indexing),
|
||||
from which the \code{test} entry in the watchlist was obtained.
|
||||
}
|
||||
|
||||
This object should \bold{not} be in-place modified in ways that conflict with the
|
||||
training (e.g. resetting the parameters for a training update in a way that resets
|
||||
the number of rounds to zero in order to overwrite rounds).
|
||||
|
||||
Note that any R attributes that are assigned to the booster during the callback functions,
|
||||
will not be kept thereafter as the booster object variable is not re-assigned during
|
||||
training. It is however possible to set C-level attributes of the booster through
|
||||
\link{xgb.attr} or \link{xgb.attributes}, which should remain available for the rest
|
||||
of the iterations and after the training is done.
|
||||
|
||||
For keeping variables across iterations, it's recommended to use \code{env} instead.
|
||||
\item data The data to which the model is being fit, as an \code{xgb.DMatrix} object.
|
||||
|
||||
Note that, for \link{xgb.cv}, this will be the full data, while data for the specific
|
||||
folds can be found in the \code{model} object.
|
||||
|
||||
\item watchlist The evaluation watchlist, as passed under argument \code{watchlist} to
|
||||
\link{xgb.train}.
|
||||
|
||||
For \link{xgb.cv}, this will always be \code{NULL}.
|
||||
|
||||
\item begin_iteration Index of the first boosting iteration that will be executed
|
||||
(base-1 indexing).
|
||||
|
||||
This will typically be '1', but when using training continuation, depending on the
|
||||
parameters for updates, boosting rounds will be continued from where the previous
|
||||
model ended, in which case this will be larger than 1.
|
||||
|
||||
\item end_iteration Index of the last boosting iteration that will be executed
|
||||
(base-1 indexing, inclusive of this end).
|
||||
|
||||
It should match with argument \code{nrounds} passed to \link{xgb.train} or \link{xgb.cv}.
|
||||
|
||||
Note that boosting might be interrupted before reaching this last iteration, for
|
||||
example by using the early stopping callback \link{xgb.cb.early.stop}.
|
||||
|
||||
\item iteration Index of the iteration number that is being executed (first iteration
|
||||
will be the same as parameter \code{begin_iteration}, then next one will add +1, and so on).
|
||||
|
||||
\item iter_feval Evaluation metrics for the \code{watchlist} that was supplied, either
|
||||
determined by the objective, or by parameter \code{feval}.
|
||||
|
||||
For \link{xgb.train}, this will be a named vector with one entry per element in
|
||||
\code{watchlist}, where the names are determined as 'watchlist name' + '-' + 'metric name' - for
|
||||
example, if \code{watchlist} contains an entry named "tr" and the metric is "rmse",
|
||||
this will be a one-element vector with name "tr-rmse".
|
||||
|
||||
For \link{xgb.cv}, this will be a 2d matrix with dimensions \verb{[length(watchlist), nfolds]},
|
||||
where the row names will follow the same naming logic as the one-dimensional vector
|
||||
that is passed in \link{xgb.train}.
|
||||
|
||||
Note that, internally, the built-in callbacks such as \link{xgb.cb.print.evaluation} summarize
|
||||
this table by calculating the row-wise means and standard deviations.
|
||||
|
||||
\item final_feval The evaluation results after the last boosting round is executed
|
||||
(same format as \code{iter_feval}, and will be the exact same input as passed under
|
||||
\code{iter_feval} to the last round that is executed during model fitting).
|
||||
|
||||
\item prev_cb_res Result from a previous run of a callback sharing the same name
|
||||
(as given by parameter \code{cb_name}) when conducting training continuation, if there
|
||||
was any in the booster R attributes.
|
||||
|
||||
Sometimes, one might want to append the new results to the previous one, and this will
|
||||
be done automatically by the built-in callbacks such as \link{xgb.cb.evaluation.log},
|
||||
which will append the new rows to the previous table.
|
||||
|
||||
If no such previous callback result is available (which it never will when fitting
|
||||
a model from start instead of updating an existing model), this will be \code{NULL}.
|
||||
|
||||
For \link{xgb.cv}, which doesn't support training continuation, this will always be \code{NULL}.
|
||||
}
|
||||
|
||||
The following names (\code{cb_name} values) are reserved for internal callbacks:\itemize{
|
||||
\item print_evaluation
|
||||
\item evaluation_log
|
||||
\item reset_parameters
|
||||
\item early_stop
|
||||
\item save_model
|
||||
\item cv_predict
|
||||
\item gblinear_history
|
||||
}
|
||||
|
||||
The following names are reserved for other non-callback attributes:\itemize{
|
||||
\item names
|
||||
\item class
|
||||
\item call
|
||||
\item params
|
||||
\item niter
|
||||
\item nfeatures
|
||||
\item folds
|
||||
}
|
||||
|
||||
When using the built-in early stopping callback (\link{xgb.cb.early.stop}), said callback
|
||||
will always be executed before the others, as it sets some booster C-level attributes
|
||||
that other callbacks might also use. Otherwise, the order of execution will match with
|
||||
the order in which the callbacks are passed to the model fitting function.
|
||||
}
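To illustrate the \code{iter_feval} format described above for \link{xgb.cv}, here is a minimal sketch in base R. The matrix values and fold count are made up for illustration, and using R's \code{sd} (the sample standard deviation) is an assumption: the text above only says that the built-in callbacks compute row-wise means and standard deviations.

```r
# Made-up per-fold metrics for one iteration: rows are evaluation entries,
# columns are folds (3 folds here).
iter_feval <- matrix(
  c(0.50, 0.52, 0.48,
    0.61, 0.59, 0.60),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("train-rmse", "test-rmse"), NULL)
)

# Row-wise summaries across folds. sd() is the sample standard deviation;
# whether the package uses this exact formula is an assumption.
row_means <- rowMeans(iter_feval)
row_sds <- apply(iter_feval, 1, sd)

print(cbind(mean = row_means, sd = row_sds))
```

A custom callback that receives such a matrix could apply the same kind of summary before deciding whether to return \code{TRUE} and stop the training.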
|
||||
\examples{
|
||||
# Example constructing a custom callback that calculates
|
||||
# squared error on the training data, without a watchlist,
|
||||
# and outputs the per-iteration results.
|
||||
ssq_callback <- xgb.Callback(
|
||||
cb_name = "ssq",
|
||||
f_before_training = function(env, model, data, watchlist,
|
||||
begin_iteration, end_iteration) {
|
||||
# A vector to keep track of a number at each iteration
|
||||
env$logs <- rep(NA_real_, end_iteration - begin_iteration + 1)
|
||||
},
|
||||
f_after_iter = function(env, model, data, watchlist, iteration, iter_feval) {
|
||||
# This calculates the sum of squared errors on the training data.
|
||||
# Note that this can be better done by passing a 'watchlist' entry,
|
||||
# but this demonstrates a way in which callbacks can be structured.
|
||||
pred <- predict(model, data)
|
||||
err <- pred - getinfo(data, "label")
|
||||
sq_err <- sum(err^2)
|
||||
env$logs[iteration] <- sq_err
|
||||
cat(
|
||||
sprintf(
|
||||
"Squared error at iteration \%d: \%.2f\n",
|
||||
iteration, sq_err
|
||||
)
|
||||
)
|
||||
|
||||
# A return value of 'TRUE' here would signal to finalize the training
|
||||
return(FALSE)
|
||||
},
|
||||
f_after_training = function(env, model, data, watchlist, iteration,
|
||||
final_feval, prev_cb_res) {
|
||||
return(env$logs)
|
||||
}
|
||||
)
|
||||
|
||||
data(mtcars)
|
||||
y <- mtcars$mpg
|
||||
x <- as.matrix(mtcars[, -1])
|
||||
dm <- xgb.DMatrix(x, label = y, nthread = 1)
|
||||
model <- xgb.train(
|
||||
data = dm,
|
||||
params = list(objective = "reg:squarederror", nthread = 1),
|
||||
nrounds = 5,
|
||||
callbacks = list(ssq_callback),
|
||||
keep_extra_attributes = TRUE
|
||||
)
|
||||
|
||||
# Result from 'f_after_training' will be available as an attribute
|
||||
attributes(model)$ssq
|
||||
}
|
||||
\seealso{
|
||||
Built-in callbacks:\itemize{
|
||||
\item \link{xgb.cb.print.evaluation}
|
||||
\item \link{xgb.cb.evaluation.log}
|
||||
\item \link{xgb.cb.reset.parameters}
|
||||
\item \link{xgb.cb.early.stop}
|
||||
\item \link{xgb.cb.save.model}
|
||||
\item \link{xgb.cb.cv.predict}
|
||||
\item \link{xgb.cb.gblinear.history}
|
||||
}
|
||||
}
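The order in which the four callback functions run can be sketched without xgboost at all. The driver below is hypothetical and is not the package's internal training loop; the function signatures are deliberately shortened (the real ones also receive \code{model}, \code{data}, \code{watchlist} and the evaluation results). It only mimics the documented behaviour: a \code{TRUE} return stops the loop, and \code{NULL} is treated as \code{FALSE}.

```r
# Hypothetical driver, NOT xgboost's internal loop. It calls the four
# functions of a single callback in the documented order.
run_with_callback <- function(cb, begin_iteration, end_iteration) {
  env <- new.env()
  cb$f_before_training(env, begin_iteration, end_iteration)
  last_iteration <- begin_iteration - 1L
  for (iteration in seq(begin_iteration, end_iteration)) {
    # isTRUE(NULL) is FALSE, so a NULL return does not stop the loop
    if (isTRUE(cb$f_before_iter(env, iteration))) break
    last_iteration <- iteration  # the boosting round would happen here
    if (isTRUE(cb$f_after_iter(env, iteration))) break
  }
  cb$f_after_training(env, last_iteration)
}

# A toy callback that records iterations in 'env' and stops at round 3.
stop_at_three <- list(
  f_before_training = function(env, begin_iteration, end_iteration) {
    env$seen <- integer(0)
    NULL
  },
  f_before_iter = function(env, iteration) NULL,
  f_after_iter = function(env, iteration) {
    env$seen <- c(env$seen, iteration)
    iteration >= 3L
  },
  f_after_training = function(env, iteration) env$seen
)

result <- run_with_callback(stop_at_three, 1L, 10L)
print(result)
```

Keeping the state in \code{env} rather than in the model object follows the recommendation given in the details section for variables that must persist across iterations.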
|
||||
@@ -1,16 +1,27 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/callbacks.R
|
||||
\name{cb.cv.predict}
|
||||
\alias{cb.cv.predict}
|
||||
\title{Callback closure for returning cross-validation based predictions.}
|
||||
\name{xgb.cb.cv.predict}
|
||||
\alias{xgb.cb.cv.predict}
|
||||
\title{Callback for returning cross-validation based predictions.}
|
||||
\usage{
|
||||
cb.cv.predict(save_models = FALSE)
|
||||
xgb.cb.cv.predict(save_models = FALSE, outputmargin = FALSE)
|
||||
}
|
||||
\arguments{
|
||||
\item{save_models}{a flag for whether to save the folds' models.}
|
||||
\item{save_models}{A flag for whether to save the folds' models.}
|
||||
|
||||
\item{outputmargin}{Whether to save margin predictions (same effect as passing this
|
||||
parameter to \link{predict.xgb.Booster}).}
|
||||
}
|
||||
\value{
|
||||
Predictions are returned inside of the \code{pred} element, which is either a vector or a matrix,
|
||||
An \code{xgb.Callback} object, which can be passed to \link{xgb.cv},
|
||||
but \bold{not} to \link{xgb.train}.
|
||||
}
|
||||
\description{
|
||||
This callback function saves predictions for all of the test folds,
|
||||
and can also save the folds' models.
|
||||
}
|
||||
\details{
|
||||
Predictions are saved inside of the \code{pred} element, which is either a vector or a matrix,
|
||||
depending on the number of prediction outputs per data row. The order of predictions corresponds
|
||||
to the order of rows in the original dataset. Note that when a custom \code{folds} list is
|
||||
provided in \code{xgb.cv}, the predictions would only be returned properly when this list is a
|
||||
@@ -19,23 +30,3 @@ meaningful when user-provided folds have overlapping indices as in, e.g., random
|
||||
When some of the indices in the training dataset are not included into user-provided \code{folds},
|
||||
their prediction value would be \code{NA}.
|
||||
}
|
||||
\description{
|
||||
Callback closure for returning cross-validation based predictions.
|
||||
}
|
||||
\details{
|
||||
This callback function saves predictions for all of the test folds,
|
||||
and also allows to save the folds' models.
|
||||
|
||||
It is a "finalizer" callback and it uses early stopping information whenever it is available,
|
||||
thus it must be run after the early stopping callback if the early stopping is used.
|
||||
|
||||
Callback function expects the following values to be set in its calling frame:
|
||||
\code{bst_folds},
|
||||
\code{basket},
|
||||
\code{data},
|
||||
\code{end_iteration},
|
||||
\code{params},
|
||||
}
|
||||
\seealso{
|
||||
\code{\link{callbacks}}
|
||||
}
|
||||
55
R-package/man/xgb.cb.early.stop.Rd
Normal file
@@ -0,0 +1,55 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/callbacks.R
|
||||
\name{xgb.cb.early.stop}
|
||||
\alias{xgb.cb.early.stop}
|
||||
\title{Callback to activate early stopping}
|
||||
\usage{
|
||||
xgb.cb.early.stop(
|
||||
stopping_rounds,
|
||||
maximize = FALSE,
|
||||
metric_name = NULL,
|
||||
verbose = TRUE,
|
||||
keep_all_iter = TRUE
|
||||
)
|
||||
}
|
||||
\arguments{
|
||||
\item{stopping_rounds}{The number of rounds with no improvement in
|
||||
the evaluation metric in order to stop the training.}
|
||||
|
||||
\item{maximize}{Whether to maximize the evaluation metric.}
|
||||
|
||||
\item{metric_name}{The name of an evaluation column to use as a criterion for early
|
||||
stopping. If not set, the last column would be used.
|
||||
Let's say the test data in \code{watchlist} was labelled as \code{dtest},
|
||||
and one wants to use the AUC in test data for early stopping regardless of where
|
||||
it is in the \code{watchlist}, then one of the following would need to be set:
|
||||
\code{metric_name='dtest-auc'} or \code{metric_name='dtest_auc'}.
|
||||
All dash '-' characters in metric names are considered equivalent to '_'.}
|
||||
|
||||
\item{verbose}{Whether to print the early stopping information.}
|
||||
|
||||
\item{keep_all_iter}{Whether to keep all of the boosting rounds that were produced
|
||||
in the resulting object. If passing \code{FALSE}, will only keep the boosting rounds
|
||||
up to the detected best iteration, discarding the ones that come after.}
|
||||
}
|
||||
\value{
|
||||
An \code{xgb.Callback} object, which can be passed to \link{xgb.train} or \link{xgb.cv}.
|
||||
}
|
||||
\description{
|
||||
This callback function determines the condition for early stopping.
|
||||
|
||||
The following attributes are assigned to the booster's object:
|
||||
\itemize{
|
||||
\item \code{best_score} the evaluation score at the best iteration
|
||||
\item \code{best_iteration} at which boosting iteration the best score has occurred
|
||||
(0-based index for interoperability of binary models)
|
||||
}
|
||||
|
||||
The same values are also stored as R attributes as a result of the callback, plus an additional
|
||||
attribute \code{stopped_by_max_rounds} which indicates whether an early stopping by the \code{stopping_rounds}
|
||||
condition occurred. Note that the \code{best_iteration} that is stored under R attributes will follow
|
||||
base-1 indexing, so it will be larger by '1' than the C-level 'best_iteration' that is accessed
|
||||
through \link{xgb.attr} or \link{xgb.attributes}.
|
||||
|
||||
At least one data element is required in the evaluation watchlist for early stopping to work.
|
||||
}
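The difference between the two \code{best_iteration} values can be shown with a small sketch. The named character vector below is a stand-in with a made-up value; it assumes that C-level attributes come back as strings, which is how \link{xgb.attr} is understood to return them.

```r
# Stand-in for the C-level attributes of a booster (made-up value).
# Assumption: C-level attributes are stored and returned as strings.
c_level_attrs <- c(best_iteration = "41")

# The C-level index is 0-based; the R attribute follows base-1 indexing.
c_best_iteration <- as.integer(c_level_attrs[["best_iteration"]])
r_best_iteration <- c_best_iteration + 1L

print(c(c_level = c_best_iteration, r_level = r_best_iteration))
```

When using the value to index R objects, such as rows of an evaluation log, the base-1 R attribute is the one to use.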
|
||||
24
R-package/man/xgb.cb.evaluation.log.Rd
Normal file
@@ -0,0 +1,24 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/callbacks.R
|
||||
\name{xgb.cb.evaluation.log}
|
||||
\alias{xgb.cb.evaluation.log}
|
||||
\title{Callback for logging the evaluation history}
|
||||
\usage{
|
||||
xgb.cb.evaluation.log()
|
||||
}
|
||||
\value{
|
||||
An \code{xgb.Callback} object, which can be passed to \link{xgb.train} or \link{xgb.cv}.
|
||||
}
|
||||
\description{
|
||||
Callback for logging the evaluation history
|
||||
}
|
||||
\details{
|
||||
This callback creates a table with per-iteration evaluation metrics (see parameters
|
||||
\code{watchlist} and \code{feval} in \link{xgb.train}).
|
||||
|
||||
Note: in the column names of the final data.table, the dash '-' character is replaced with
|
||||
the underscore '_' in order to make the column names more like regular R identifiers.
|
||||
}
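The column renaming mentioned in the note can be reproduced in base R. This is only a sketch of the stated rule (dash replaced with underscore); the metric names are made up and the package's own implementation is not shown here.

```r
# Made-up evaluation names in the 'watchlist name' + '-' + 'metric' form.
metric_names <- c("train-rmse", "dtest-auc")

# Replace every dash with an underscore, as the note describes.
column_names <- gsub("-", "_", metric_names, fixed = TRUE)
print(column_names)

# Both spellings of a metric name map to the same column after normalizing.
same_column <- gsub("-", "_", "dtest-auc", fixed = TRUE) == "dtest_auc"
print(same_column)
```

The resulting names are syntactically valid R identifiers, so columns can be accessed as \code{log$dtest_auc} without backticks.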
|
||||
\seealso{
|
||||
\link{xgb.cb.print.evaluation}
|
||||
}
|
||||
@@ -1,37 +1,48 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/callbacks.R
|
||||
\name{cb.gblinear.history}
|
||||
\alias{cb.gblinear.history}
|
||||
\title{Callback closure for collecting the model coefficients history of a gblinear booster
|
||||
during its training.}
|
||||
\name{xgb.cb.gblinear.history}
|
||||
\alias{xgb.cb.gblinear.history}
|
||||
\title{Callback for collecting coefficients history of a gblinear booster}
|
||||
\usage{
|
||||
cb.gblinear.history(sparse = FALSE)
|
||||
xgb.cb.gblinear.history(sparse = FALSE)
|
||||
}
|
||||
\arguments{
|
||||
\item{sparse}{when set to FALSE/TRUE, a dense/sparse matrix is used to store the result.
|
||||
\item{sparse}{when set to \code{FALSE}/\code{TRUE}, a dense/sparse matrix is used to store the result.
|
||||
Sparse format is useful when one expects only a subset of coefficients to be non-zero,
|
||||
when using the "thrifty" feature selector with fairly small number of top features
|
||||
selected per iteration.}
|
||||
}
|
||||
\value{
|
||||
Results are stored in the \code{coefs} element of the closure.
|
||||
The \code{\link{xgb.gblinear.history}} convenience function provides an easy
|
||||
way to access it.
|
||||
With \code{xgb.train}, it is either a dense or a sparse matrix.
|
||||
While with \code{xgb.cv}, it is a list (an element per each fold) of such
|
||||
matrices.
|
||||
An \code{xgb.Callback} object, which can be passed to \link{xgb.train} or \link{xgb.cv}.
|
||||
}
|
||||
\description{
|
||||
Callback closure for collecting the model coefficients history of a gblinear booster
|
||||
during its training.
|
||||
Callback for collecting coefficients history of a gblinear booster
|
||||
}
|
||||
\details{
|
||||
To keep things fast and simple, gblinear booster does not internally store the history of linear
|
||||
model coefficients at each boosting iteration. This callback provides a workaround for storing
|
||||
the coefficients' path, by extracting them after each training iteration.
|
||||
|
||||
Callback function expects the following values to be set in its calling frame:
|
||||
\code{bst} (or \code{bst_folds}).
|
||||
This callback will construct a matrix where rows are boosting iterations and columns are
|
||||
feature coefficients (same order as when calling \link{coef.xgb.Booster}, with the intercept
|
||||
corresponding to the first column).
|
||||
|
||||
When there is more than one coefficient per feature (e.g. multi-class classification),
|
||||
the result will be reshaped into a vector where coefficients are arranged first by features and
|
||||
then by class (e.g. first 1 through N coefficients will be for the first class, then
|
||||
coefficients N+1 through 2N for the second class, and so on).
|
||||
|
||||
If the result has only one coefficient per feature in the data, then the resulting matrix
|
||||
will have column names matching with the feature names, otherwise (when there's more than
|
||||
one coefficient per feature) the names will be composed as 'column name' + ':' + 'class index'
|
||||
(so e.g. column 'c1' for class '0' will be named 'c1:0').
|
||||
|
||||
With \code{xgb.train}, the output is either a dense or a sparse matrix.
|
||||
With \code{xgb.cv}, it is a list (one element per fold) of such
|
||||
matrices.
|
||||
|
||||
The \link{xgb.gblinear.history} function provides an easy way to retrieve the
|
||||
outputs from this callback.
|
||||
}
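The multi-class column naming scheme described above can be sketched in base R. The feature names and class count are made up, and the intercept column is left out for brevity; this only illustrates the stated 'column name' + ':' + 'class index' rule and the features-then-class ordering, not the package's actual code.

```r
# Made-up inputs: two features and three classes.
feature_names <- c("c1", "c2")
n_class <- 3L

# All features for class 0 come first, then all features for class 1, etc.
class_index <- rep(seq_len(n_class) - 1L, each = length(feature_names))
column_names <- paste0(rep(feature_names, times = n_class), ":", class_index)
print(column_names)
```

With this layout, the coefficients of a single class occupy one contiguous block of columns, which is what makes per-class extraction by \code{class_index} straightforward.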
|
||||
\examples{
|
||||
#### Binary classification:
|
||||
@@ -52,7 +63,7 @@ param <- list(booster = "gblinear", objective = "reg:logistic", eval_metric = "a
|
||||
# rate does not break the convergence, but allows us to illustrate the typical pattern of
|
||||
# "stochastic explosion" behaviour of this lock-free algorithm at early boosting iterations.
|
||||
bst <- xgb.train(param, dtrain, list(tr=dtrain), nrounds = 200, eta = 1.,
|
||||
callbacks = list(cb.gblinear.history()))
|
||||
callbacks = list(xgb.cb.gblinear.history()))
|
||||
# Extract the coefficients' path and plot them vs boosting iteration number:
|
||||
coef_path <- xgb.gblinear.history(bst)
|
||||
matplot(coef_path, type = 'l')
|
||||
@@ -61,7 +72,7 @@ matplot(coef_path, type = 'l')
|
||||
# Will try the classical componentwise boosting which selects a single best feature per round:
|
||||
bst <- xgb.train(param, dtrain, list(tr=dtrain), nrounds = 200, eta = 0.8,
|
||||
updater = 'coord_descent', feature_selector = 'thrifty', top_k = 1,
|
||||
callbacks = list(cb.gblinear.history()))
|
||||
callbacks = list(xgb.cb.gblinear.history()))
|
||||
matplot(xgb.gblinear.history(bst), type = 'l')
|
||||
# Componentwise boosting is known to have similar effect to Lasso regularization.
|
||||
# Try experimenting with various values of top_k, eta, nrounds,
|
||||
@@ -69,7 +80,7 @@ matplot(xgb.gblinear.history(bst), type = 'l')
|
||||
|
||||
# For xgb.cv:
|
||||
bst <- xgb.cv(param, dtrain, nfold = 5, nrounds = 100, eta = 0.8,
|
||||
callbacks = list(cb.gblinear.history()))
|
||||
callbacks = list(xgb.cb.gblinear.history()))
|
||||
# coefficients in the CV fold #3
|
||||
matplot(xgb.gblinear.history(bst)[[3]], type = 'l')
|
||||
|
||||
@@ -82,7 +93,7 @@ param <- list(booster = "gblinear", objective = "multi:softprob", num_class = 3,
|
||||
# For the default linear updater 'shotgun' it sometimes is helpful
|
||||
# to use smaller eta to reduce instability
|
||||
bst <- xgb.train(param, dtrain, list(tr=dtrain), nrounds = 50, eta = 0.5,
|
||||
callbacks = list(cb.gblinear.history()))
|
||||
callbacks = list(xgb.cb.gblinear.history()))
|
||||
# Will plot the coefficient paths separately for each class:
|
||||
matplot(xgb.gblinear.history(bst, class_index = 0), type = 'l')
|
||||
matplot(xgb.gblinear.history(bst, class_index = 1), type = 'l')
|
||||
@@ -90,11 +101,11 @@ matplot(xgb.gblinear.history(bst, class_index = 2), type = 'l')
|
||||
|
||||
# CV:
|
||||
bst <- xgb.cv(param, dtrain, nfold = 5, nrounds = 70, eta = 0.5,
|
||||
callbacks = list(cb.gblinear.history(FALSE)))
|
||||
callbacks = list(xgb.cb.gblinear.history(FALSE)))
|
||||
# 1st fold of 1st class
|
||||
matplot(xgb.gblinear.history(bst, class_index = 0)[[1]], type = 'l')
|
||||
|
||||
}
|
||||
\seealso{
|
||||
\code{\link{callbacks}}, \code{\link{xgb.gblinear.history}}.
|
||||
\link{xgb.gblinear.history}, \link{coef.xgb.Booster}.
|
||||
}
|
||||
25
R-package/man/xgb.cb.print.evaluation.Rd
Normal file
@@ -0,0 +1,25 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/callbacks.R
|
||||
\name{xgb.cb.print.evaluation}
|
||||
\alias{xgb.cb.print.evaluation}
|
||||
\title{Callback for printing the result of evaluation}
|
||||
\usage{
|
||||
xgb.cb.print.evaluation(period = 1, showsd = TRUE)
|
||||
}
|
||||
\arguments{
|
||||
\item{period}{results are printed once every \code{period} iterations}
|
||||
|
||||
\item{showsd}{whether standard deviations should be printed (when available)}
|
||||
}
|
||||
\value{
|
||||
An \code{xgb.Callback} object, which can be passed to \link{xgb.train} or \link{xgb.cv}.
|
||||
}
|
||||
\description{
|
||||
The callback function prints the result of evaluation at every \code{period} iterations.
|
||||
The initial and the last iteration's evaluations are always printed.
|
||||
|
||||
Does not leave any attribute in the booster (see \link{xgb.cb.evaluation.log} for that).
|
||||
}
|
||||
\seealso{
|
||||
\link{xgb.Callback}
|
||||
}
|
||||
@@ -1,10 +1,10 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/callbacks.R
|
||||
\name{cb.reset.parameters}
|
||||
\alias{cb.reset.parameters}
|
||||
\title{Callback closure for resetting the booster's parameters at each iteration.}
|
||||
\name{xgb.cb.reset.parameters}
|
||||
\alias{xgb.cb.reset.parameters}
|
||||
\title{Callback for resetting the booster's parameters at each iteration.}
|
||||
\usage{
|
||||
cb.reset.parameters(new_params)
|
||||
xgb.cb.reset.parameters(new_params)
|
||||
}
|
||||
\arguments{
|
||||
\item{new_params}{a list where each element corresponds to a parameter that needs to be reset.
|
||||
@@ -14,23 +14,16 @@ or a function of two parameters \code{learning_rates(iteration, nrounds)}
|
||||
which returns a new parameter value by using the current iteration number
|
||||
and the total number of boosting rounds.}
|
||||
}
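As a rough illustration of the function form of a `new_params` element, the sketch below defines a schedule with the two-argument signature `(iteration, nrounds)` described above and evaluates it in plain R. The name `linear_decay`, the start and end values, and the assumption that `iteration` counts from 1 are all hypothetical; the callback itself is not invoked here.

```r
# Hypothetical schedule: move eta linearly from 0.5 down to 0.1.
# Assumes `iteration` starts at 1 (an assumption, not confirmed by the docs).
linear_decay <- function(iteration, nrounds) {
  start <- 0.5
  end <- 0.1
  start + (end - start) * (iteration - 1) / max(nrounds - 1, 1)
}

# A list in the shape described for `new_params`: one element per parameter.
new_params <- list(eta = linear_decay)

# Evaluate the schedule for a 5-round run to inspect the values it would yield.
etas <- vapply(1:5, new_params$eta, numeric(1), nrounds = 5)
print(etas)
```

A list built this way would be what gets passed as `new_params`; a plain numeric vector with one value per round, as in `list(eta = c(0.5, 0.1))`, is the other accepted form.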
|
||||
\value{
|
||||
An \code{xgb.Callback} object, which can be passed to \link{xgb.train} or \link{xgb.cv}.
|
||||
}
|
||||
\description{
|
||||
Callback closure for resetting the booster's parameters at each iteration.
|
||||
Callback for resetting the booster's parameters at each iteration.
|
||||
}
|
||||
\details{
|
||||
This is a "pre-iteration" callback function used to reset booster's parameters
|
||||
at the beginning of each iteration.
|
||||
|
||||
Note that when training is resumed from some previous model, and a function is used to
|
||||
reset a parameter value, the \code{nrounds} argument in this function would be the
|
||||
number of boosting rounds in the current training.
|
||||
|
||||
Callback function expects the following values to be set in its calling frame:
|
||||
\code{bst} or \code{bst_folds},
|
||||
\code{iteration},
|
||||
\code{begin_iteration},
|
||||
\code{end_iteration}.
|
||||
}
|
||||
\seealso{
|
||||
\code{\link{callbacks}}
|
||||
Does not leave any attribute in the booster.
|
||||
}
|
||||
28
R-package/man/xgb.cb.save.model.Rd
Normal file
@@ -0,0 +1,28 @@
|
||||
% Generated by roxygen2: do not edit by hand
|
||||
% Please edit documentation in R/callbacks.R
|
||||
\name{xgb.cb.save.model}
|
||||
\alias{xgb.cb.save.model}
|
||||
\title{Callback for saving a model file.}
|
||||
\usage{
|
||||
xgb.cb.save.model(save_period = 0, save_name = "xgboost.ubj")
|
||||
}
|
||||
\arguments{
|
||||
\item{save_period}{Save the model to disk after every
|
||||
\code{save_period} iterations; 0 means save the model at the end.}
|
||||
|
||||
\item{save_name}{The name or path for the saved model file.
|
||||
It can contain a \code{\link[base]{sprintf}} formatting specifier
|
||||
to include the integer iteration number in the file name.
|
||||
E.g., with \code{save_name} = 'xgboost_\%04d.model',
|
||||
the file saved at iteration 50 would be named "xgboost_0050.model".}
|
||||
}
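The effect of a formatting specifier in `save_name` can be checked with base R's `sprintf` directly. This is only a sketch of the string formatting; how the callback applies it internally is assumed rather than shown. Note that the backslash in `\%04d` above is Rd escaping, so the actual R string contains `%04d`.

```r
# Pattern with an integer specifier, zero-padded to four digits.
save_name <- "xgboost_%04d.model"
iteration <- 50L

# Format the iteration number into the file name.
file_name <- sprintf(save_name, iteration)
print(file_name)
```

This yields the file name quoted in the argument description for iteration 50.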
|
||||
\value{
|
||||
An \code{xgb.Callback} object, which can be passed to \link{xgb.train},
|
||||
but \bold{not} to \link{xgb.cv}.
|
||||
}
|
||||
\description{
|
||||
This callback function saves an xgb-model file, either periodically
|
||||
every \code{save_period} iterations or at the end.
|
||||
|
||||
Does not leave any attribute in the booster.
|
||||
}
|
||||
@@ -59,7 +59,7 @@ that NA values should be considered as 'missing' by the algorithm.
|
||||
Sometimes, 0 or other extreme value might be used to represent missing values.}
|
||||
|
||||
\item{prediction}{A logical value indicating whether to return the test fold predictions
|
||||
from each CV model. This parameter engages the \code{\link{cb.cv.predict}} callback.}
|
||||
from each CV model. This parameter engages the \code{\link{xgb.cb.cv.predict}} callback.}
|
||||
|
||||
\item{showsd}{\code{boolean}, whether to show standard deviation of cross validation}
|
||||
|
||||
@@ -98,20 +98,20 @@ the \code{nfold} and \code{stratified} parameters are ignored.}
|
||||
|
||||
\item{print_every_n}{Print each n-th iteration evaluation messages when \code{verbose>0}.
|
||||
Default is 1 which means all messages are printed. This parameter is passed to the
|
||||
\code{\link{cb.print.evaluation}} callback.}
|
||||
\code{\link{xgb.cb.print.evaluation}} callback.}
|
||||
|
||||
\item{early_stopping_rounds}{If \code{NULL}, the early stopping function is not triggered.
|
||||
If set to an integer \code{k}, training with a validation set will stop if the performance
|
||||
doesn't improve for \code{k} rounds.
|
||||
Setting this parameter engages the \code{\link{cb.early.stop}} callback.}
|
||||
Setting this parameter engages the \code{\link{xgb.cb.early.stop}} callback.}
|
||||
|
||||
\item{maximize}{If \code{feval} and \code{early_stopping_rounds} are set,
|
||||
then this parameter must be set as well.
|
||||
When it is \code{TRUE}, it means the larger the evaluation score the better.
|
||||
This parameter is passed to the \code{\link{cb.early.stop}} callback.}
|
||||
This parameter is passed to the \code{\link{xgb.cb.early.stop}} callback.}
|
||||
|
||||
\item{callbacks}{a list of callback functions to perform various tasks during boosting.
|
||||
See \code{\link{callbacks}}. Some of the callbacks are automatically created depending on the
|
||||
See \code{\link{xgb.Callback}}. Some of the callbacks are automatically created depending on the
|
||||
parameters' values. User can provide either existing or their own callback methods in order
|
||||
to customize the training process.}
|
||||
|
||||
@@ -122,24 +122,24 @@ An object of class \code{xgb.cv.synchronous} with the following elements:
|
||||
\itemize{
|
||||
\item \code{call} a function call.
|
||||
\item \code{params} parameters that were passed to the xgboost library. Note that it does not
|
||||
capture parameters changed by the \code{\link{cb.reset.parameters}} callback.
|
||||
\item \code{callbacks} callback functions that were either automatically assigned or
|
||||
explicitly passed.
|
||||
capture parameters changed by the \code{\link{xgb.cb.reset.parameters}} callback.
|
||||
\item \code{evaluation_log} evaluation history stored as a \code{data.table} with the
|
||||
first column corresponding to iteration number and the rest corresponding to the
|
||||
CV-based evaluation means and standard deviations for the training and test CV-sets.
|
||||
It is created by the \code{\link{cb.evaluation.log}} callback.
|
||||
It is created by the \code{\link{xgb.cb.evaluation.log}} callback.
|
||||
\item \code{niter} number of boosting iterations.
|
||||
\item \code{nfeatures} number of features in training data.
|
||||
\item \code{folds} the list of CV folds' indices - either those passed through the \code{folds}
|
||||
parameter or randomly generated.
|
||||
\item \code{best_iteration} iteration number with the best evaluation metric value
|
||||
(only available with early stopping).
|
||||
\item \code{pred} CV prediction values available when \code{prediction} is set.
|
||||
It is either vector or matrix (see \code{\link{cb.cv.predict}}).
|
||||
\item \code{models} a list of the CV folds' models. It is only available with the explicit
|
||||
setting of the \code{cb.cv.predict(save_models = TRUE)} callback.
|
||||
}
|
||||
|
||||
Plus other potential elements that are the result of callbacks, such as a list \code{cv_predict} with
|
||||
a sub-element \code{pred} when passing \code{prediction = TRUE}, which is added by the \link{xgb.cb.cv.predict}
|
||||
callback (note that one can also pass it manually under \code{callbacks} with different settings,
|
||||
such as saving also the models created during cross validation); or a list \code{early_stop} which
|
||||
will contain elements such as \code{best_iteration} when using the early stopping callback (\link{xgb.cb.early.stop}).
|
||||
}
|
||||
\description{
|
||||
The cross validation function of xgboost
|
||||
|
||||
@@ -8,7 +8,7 @@ xgb.gblinear.history(model, class_index = NULL)
|
||||
}
|
||||
\arguments{
|
||||
\item{model}{either an \code{xgb.Booster} or a result of \code{xgb.cv()}, trained
|
||||
using the \code{cb.gblinear.history()} callback, but \bold{not} a booster
|
||||
using the \link{xgb.cb.gblinear.history} callback, but \bold{not} a booster
|
||||
loaded from \link{xgb.load} or \link{xgb.load.raw}.}
|
||||
|
||||
\item{class_index}{zero-based class index to extract the coefficients for only that
|
||||
@@ -16,23 +16,31 @@ specific class in a multinomial multiclass model. When it is NULL, all the
|
||||
coefficients are returned. Has no effect in non-multiclass models.}
|
||||
}
|
||||
\value{
|
||||
For an \code{xgb.train} result, a matrix (either dense or sparse) with the columns
|
||||
corresponding to iteration's coefficients (in the order as \code{xgb.dump()} would
|
||||
return) and the rows corresponding to boosting iterations.
|
||||
For an \link{xgb.train} result, a matrix (either dense or sparse) with the columns
|
||||
corresponding to iteration's coefficients and the rows corresponding to boosting iterations.
|
||||
|
||||
For an \code{xgb.cv} result, a list of such matrices is returned with the elements
|
||||
For an \link{xgb.cv} result, a list of such matrices is returned with the elements
|
||||
corresponding to CV folds.
|
||||
|
||||
When there is more than one coefficient per feature (e.g. multi-class classification)
|
||||
and \code{class_index} is not provided,
|
||||
the result will be reshaped into a vector where coefficients are arranged first by features and
|
||||
then by class (e.g. first 1 through N coefficients will be for the first class, then
|
||||
coefficients N+1 through 2N for the second class, and so on).
|
||||
}
|
||||
\description{
|
||||
A helper function to extract the matrix of linear coefficients' history
|
||||
from a gblinear model created while using the \code{cb.gblinear.history()}
|
||||
callback.
|
||||
from a gblinear model created while using the \link{xgb.cb.gblinear.history}
|
||||
callback (which must be added manually as by default it's not used).
|
||||
}
|
||||
\details{
|
||||
Note that this is an R-specific function that relies on R attributes that
|
||||
are not saved when using xgboost's own serialization functions like \link{xgb.load}
|
||||
or \link{xgb.load.raw}.
|
||||
|
||||
In order for a serialized model to be accepted by tgis function, one must use R
|
||||
In order for a serialized model to be accepted by this function, one must use R
|
||||
serializers such as \link{saveRDS}.
|
||||
}
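The point about R serializers can be demonstrated without xgboost. In the sketch below a plain list stands in for a booster and `coef_history` is a made-up attribute name; it only shows that `saveRDS` and `readRDS` preserve R attributes, which is the property this function relies on.

```r
# Stand-in for a booster: a plain list carrying an R attribute
# (hypothetical attribute name, for illustration only).
model <- list(dummy = 1)
attr(model, "coef_history") <- matrix(1:4, nrow = 2)

# Round trip through R's own serializer.
path <- tempfile(fileext = ".rds")
saveRDS(model, path)
restored <- readRDS(path)

# The attributes of the restored object match those of the original.
same_attrs <- identical(attributes(restored), attributes(model))
print(same_attrs)
unlink(path)
```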
|
||||
\seealso{
|
||||
\link{xgb.cb.gblinear.history}, \link{coef.xgb.Booster}.
|
||||
}
|
||||
|
||||
@@ -17,7 +17,7 @@ Load xgboost model from the binary model file.
|
||||
}
|
||||
\details{
|
||||
The input file is expected to contain a model saved in an xgboost model format
|
||||
using either \code{\link{xgb.save}} or \code{\link{cb.save.model}} in R, or using some
|
||||
using either \code{\link{xgb.save}} or \code{\link{xgb.cb.save.model}} in R, or using some
|
||||
appropriate methods from other xgboost interfaces. E.g., a model trained in Python and
|
||||
saved from there in xgboost format, could be loaded from R.
|
||||
|
||||
|
||||
@@ -162,7 +162,7 @@ List is provided in detail section.}
|
||||
Metrics specified in either \code{eval_metric} or \code{feval} will be computed for each
|
||||
of these datasets during each boosting iteration, and stored in the end as a field named
|
||||
\code{evaluation_log} in the resulting object. When either \code{verbose>=1} or
|
||||
\code{\link{cb.print.evaluation}} callback is engaged, the performance results are continuously
|
||||
\code{\link{xgb.cb.print.evaluation}} callback is engaged, the performance results are continuously
|
||||
printed out during the training.
|
||||
E.g., specifying \code{watchlist=list(validation1=mat1, validation2=mat2)} allows tracking
|
||||
the performance of each round's model on mat1 and mat2.}
|
||||
@@ -177,24 +177,24 @@ prediction and dtrain.}
|
||||
\item{verbose}{If 0, xgboost will stay silent. If 1, it will print information about performance.
|
||||
If 2, some additional information will be printed out.
|
||||
Note that setting \code{verbose > 0} automatically engages the
|
||||
\code{cb.print.evaluation(period=1)} callback function.}
|
||||
\code{xgb.cb.print.evaluation(period=1)} callback function.}
|
||||
|
||||
\item{print_every_n}{Print each n-th iteration evaluation messages when \code{verbose>0}.
|
||||
Default is 1 which means all messages are printed. This parameter is passed to the
|
||||
\code{\link{cb.print.evaluation}} callback.}
|
||||
\code{\link{xgb.cb.print.evaluation}} callback.}
|
||||
|
||||
\item{early_stopping_rounds}{If \code{NULL}, the early stopping function is not triggered.
|
||||
If set to an integer \code{k}, training with a validation set will stop if the performance
|
||||
doesn't improve for \code{k} rounds.
|
||||
Setting this parameter engages the \code{\link{cb.early.stop}} callback.}
|
||||
Setting this parameter engages the \code{\link{xgb.cb.early.stop}} callback.}
|
||||
|
||||
\item{maximize}{If \code{feval} and \code{early_stopping_rounds} are set,
|
||||
then this parameter must be set as well.
|
||||
When it is \code{TRUE}, it means the larger the evaluation score the better.
|
||||
This parameter is passed to the \code{\link{cb.early.stop}} callback.}
|
||||
This parameter is passed to the \code{\link{xgb.cb.early.stop}} callback.}
|
||||
|
||||
\item{save_period}{when it is non-NULL, model is saved to disk after every \code{save_period} rounds,
|
||||
0 means save at the end. The saving is handled by the \code{\link{cb.save.model}} callback.}
|
||||
0 means save at the end. The saving is handled by the \code{\link{xgb.cb.save.model}} callback.}
|
||||
|
||||
\item{save_name}{the name or path for periodically saved model file.}
|
||||
|
||||
@@ -203,12 +203,13 @@ Could be either an object of class \code{xgb.Booster}, or its raw data, or the n
|
||||
file with a previously saved model.}
|
||||
|
||||
\item{callbacks}{a list of callback functions to perform various tasks during boosting.
|
||||
See \code{\link{callbacks}}. Some of the callbacks are automatically created depending on the
|
||||
See \code{\link{xgb.Callback}}. Some of the callbacks are automatically created depending on the
|
||||
parameters' values. User can provide either existing or their own callback methods in order
|
||||
to customize the training process.
|
||||
|
||||
\if{html}{\out{<div class="sourceCode">}}\preformatted{ Note that some callbacks might try to set an evaluation log - be aware that these evaluation logs
|
||||
are kept as R attributes, and thus do not get saved when using non-R serializers like
|
||||
\if{html}{\out{<div class="sourceCode">}}\preformatted{ Note that some callbacks might try to leave attributes in the resulting model object,
|
||||
such as an evaluation log (a `data.table` object) - be aware that these objects are kept
|
||||
as R attributes, and thus do not get saved when using XGBoost's own serializers like
|
||||
\link{xgb.save} (but are kept when using R serializers like \link{saveRDS}).
|
||||
}\if{html}{\out{</div>}}}
|
||||
|
||||
@@ -269,18 +270,19 @@ Different threshold (e.g., 0.) could be specified as "error@0."
|
||||
|
||||
The following callbacks are automatically created when certain parameters are set:
|
||||
\itemize{
|
||||
\item \code{cb.print.evaluation} is turned on when \code{verbose > 0};
|
||||
\item \code{xgb.cb.print.evaluation} is turned on when \code{verbose > 0};
|
||||
and the \code{print_every_n} parameter is passed to it.
|
||||
\item \code{cb.evaluation.log} is on when \code{watchlist} is present.
|
||||
\item \code{cb.early.stop}: when \code{early_stopping_rounds} is set.
|
||||
\item \code{cb.save.model}: when \code{save_period > 0} is set.
|
||||
\item \code{xgb.cb.evaluation.log} is on when \code{watchlist} is present.
|
||||
\item \code{xgb.cb.early.stop}: when \code{early_stopping_rounds} is set.
|
||||
\item \code{xgb.cb.save.model}: when \code{save_period > 0} is set.
|
||||
}
|
||||
|
||||
Note that objects of type \code{xgb.Booster} as returned by this function behave a bit differently
|
||||
from typical R objects (it's an 'altrep' list class), and it makes a separation between
|
||||
internal booster attributes (restricted to jsonifyable data), accessed through \link{xgb.attr}
|
||||
and shared between interfaces through serialization functions like \link{xgb.save}; and
|
||||
R-specific attributes, accessed through \link{attributes} and \link{attr}, which are otherwise
|
||||
R-specific attributes (typically the result from a callback), accessed through \link{attributes}
|
||||
and \link{attr}, which are otherwise
|
||||
only used in the R interface, only kept when using R's serializers like \link{saveRDS}, and
|
||||
not anyhow used by functions like \link{predict.xgb.Booster}.
|
||||
|
||||
@@ -348,7 +350,7 @@ param <- list(max_depth = 2, eta = 1, nthread = nthread,
|
||||
objective = "binary:logistic", eval_metric = "auc")
|
||||
my_etas <- list(eta = c(0.5, 0.1))
|
||||
bst <- xgb.train(param, dtrain, nrounds = 2, watchlist, verbose = 0,
|
||||
callbacks = list(cb.reset.parameters(my_etas)))
|
||||
callbacks = list(xgb.cb.reset.parameters(my_etas)))
|
||||
|
||||
## Early stopping:
|
||||
bst <- xgb.train(param, dtrain, nrounds = 25, watchlist,
|
||||
@@ -366,7 +368,7 @@ Tianqi Chen and Carlos Guestrin, "XGBoost: A Scalable Tree Boosting System",
|
||||
22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016, \url{https://arxiv.org/abs/1603.02754}
|
||||
}
|
||||
\seealso{
|
||||
\code{\link{callbacks}},
|
||||
\code{\link{xgb.Callback}},
|
||||
\code{\link{predict.xgb.Booster}},
|
||||
\code{\link{xgb.cv}}
|
||||
}
|
||||
|
||||