[R] Work on Roxygen documentation (#10674)
parent 5db0803eb2
commit b949a4bf7b
@@ -1,9 +1,120 @@
 .reserved_cb_names <- c("names", "class", "call", "params", "niter", "nfeatures", "folds")
 
-#' @title XGBoost Callback Constructor
-#' @description Constructor for defining the structure of callback functions that can be executed
+#' XGBoost Callback Constructor
+#'
+#' Constructor for defining the structure of callback functions that can be executed
 #' at different stages of model training (before / after training, before / after each boosting
 #' iteration).
+#'
+#' @details
+#' Arguments that will be passed to the supplied functions are as follows:
+#' - env The same environment that is passed under argument `env`.
+#'
+#'   It may be modified by the functions in order to e.g. keep track of what happens
+#'   across iterations or similar.
+#'
+#'   This environment is only used by the functions supplied to the callback, and will
+#'   not be kept after the model fitting function terminates (see parameter `f_after_training`).
+#' - model The booster object when using [xgb.train()], or the folds when using [xgb.cv()].
+#'
+#'   For [xgb.cv()], folds are a list with a structure as follows:
+#'   - `dtrain`: The training data for the fold (as an `xgb.DMatrix` object).
+#'   - `bst`: The `xgb.Booster` object for the fold.
+#'   - `evals`: A list containing two DMatrices, with names `train` and `test`
+#'     (`test` is the held-out data for the fold).
+#'   - `index`: The indices of the hold-out data for that fold (base-1 indexing),
+#'     from which the `test` entry in `evals` was obtained.
+#'
+#'   This object should **not** be modified in-place in ways that conflict with the
+#'   training (e.g. resetting the parameters for a training update in a way that resets
+#'   the number of rounds to zero in order to overwrite rounds).
+#'
+#'   Note that any R attributes that are assigned to the booster during the callback functions
+#'   will not be kept thereafter, as the booster object variable is not re-assigned during
+#'   training. It is however possible to set C-level attributes of the booster through
+#'   [xgb.attr()] or [xgb.attributes()], which should remain available for the rest
+#'   of the iterations and after the training is done.
+#'
+#'   For keeping variables across iterations, it's recommended to use `env` instead.
+#' - data The data to which the model is being fit, as an `xgb.DMatrix` object.
+#'
+#'   Note that, for [xgb.cv()], this will be the full data, while data for the specific
+#'   folds can be found in the `model` object.
+#' - evals The evaluation data, as passed under argument `evals` to [xgb.train()].
+#'
+#'   For [xgb.cv()], this will always be `NULL`.
+#' - begin_iteration Index of the first boosting iteration that will be executed (base-1 indexing).
+#'
+#'   This will typically be '1', but when using training continuation, depending on the
+#'   parameters for updates, boosting rounds will be continued from where the previous
+#'   model ended, in which case this will be larger than 1.
+#' - end_iteration Index of the last boosting iteration that will be executed
+#'   (base-1 indexing, inclusive of this end).
+#'
+#'   It should match with argument `nrounds` passed to [xgb.train()] or [xgb.cv()].
+#'
+#'   Note that boosting might be interrupted before reaching this last iteration, for
+#'   example by using the early stopping callback [xgb.cb.early.stop].
+#' - iteration Index of the iteration number that is being executed (first iteration
+#'   will be the same as parameter `begin_iteration`, then next one will add +1, and so on).
+#' - iter_feval Evaluation metrics for `evals` that were supplied, either
+#'   determined by the objective, or by parameter `feval`.
+#'
+#'   For [xgb.train()], this will be a named vector with one entry per element in
+#'   `evals`, where the names are determined as 'evals name' + '-' + 'metric name' - for
+#'   example, if `evals` contains an entry named "tr" and the metric is "rmse",
+#'   this will be a one-element vector with name "tr-rmse".
+#'
+#'   For [xgb.cv()], this will be a 2d matrix with dimensions `[length(evals), nfolds]`,
+#'   where the row names will follow the same naming logic as the one-dimensional vector
+#'   that is passed in [xgb.train()].
+#'
+#'   Note that, internally, the built-in callbacks such as [xgb.cb.print.evaluation] summarize
+#'   this table by calculating the row-wise means and standard deviations.
+#' - final_feval The evaluation results after the last boosting round is executed
+#'   (same format as `iter_feval`, and will be the exact same input as passed under
+#'   `iter_feval` to the last round that is executed during model fitting).
+#' - prev_cb_res Result from a previous run of a callback sharing the same name
+#'   (as given by parameter `cb_name`) when conducting training continuation, if there
+#'   was any in the booster R attributes.
+#'
+#'   Sometimes, one might want to append the new results to the previous one, and this will
+#'   be done automatically by the built-in callbacks such as [xgb.cb.evaluation.log],
+#'   which will append the new rows to the previous table.
+#'
+#'   If no such previous callback result is available (which it never will be when fitting
+#'   a model from start instead of updating an existing model), this will be `NULL`.
+#'
+#'   For [xgb.cv()], which doesn't support training continuation, this will always be `NULL`.
+#'
+#' The following names (`cb_name` values) are reserved for internal callbacks:
+#' - print_evaluation
+#' - evaluation_log
+#' - reset_parameters
+#' - early_stop
+#' - save_model
+#' - cv_predict
+#' - gblinear_history
+#'
+#' The following names are reserved for other non-callback attributes:
+#' - names
+#' - class
+#' - call
+#' - params
+#' - niter
+#' - nfeatures
+#' - folds
+#'
+#' When using the built-in early stopping callback ([xgb.cb.early.stop]), said callback
+#' will always be executed before the others, as it sets some booster C-level attributes
+#' that other callbacks might also use. Otherwise, the order of execution will match with
+#' the order in which the callbacks are passed to the model fitting function.
+#'
 #' @param cb_name Name for the callback.
 #'
 #' If the callback produces some non-NULL result (from executing the function passed under
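The 'evals name' + '-' + 'metric name' naming scheme described for `iter_feval` can be sketched in base R; the evaluation-set names and metric below are illustrative, not taken from the package:

```r
# Sketch of the 'evals name' + '-' + 'metric name' convention for iter_feval.
# The evaluation-set names and the metric here are made up for illustration.
evals_names <- c("tr", "dtest")
metric <- "rmse"
feval_names <- paste(evals_names, metric, sep = "-")
stopifnot(identical(feval_names, c("tr-rmse", "dtest-rmse")))
```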
@@ -36,137 +147,20 @@
 #' @param f_after_training A function that will be executed after training is finished.
 #'
 #' This function can optionally output something non-NULL, which will become part of the R
-#' attributes of the booster (assuming one passes `keep_extra_attributes=TRUE` to \link{xgb.train})
-#' under the name supplied for parameter `cb_name` imn the case of \link{xgb.train}; or a part
-#' of the named elements in the result of \link{xgb.cv}.
-#' @return An `xgb.Callback` object, which can be passed to \link{xgb.train} or \link{xgb.cv}.
-#' @details Arguments that will be passed to the supplied functions are as follows:\itemize{
+#' attributes of the booster (assuming one passes `keep_extra_attributes=TRUE` to [xgb.train()])
+#' under the name supplied for parameter `cb_name` in the case of [xgb.train()]; or a part
+#' of the named elements in the result of [xgb.cv()].
+#' @return An `xgb.Callback` object, which can be passed to [xgb.train()] or [xgb.cv()].
 #'
-#' \item env The same environment that is passed under argument `env`.
-#'
-#' It may be modified by the functions in order to e.g. keep tracking of what happens
-#' across iterations or similar.
-#'
-#' This environment is only used by the functions supplied to the callback, and will
-#' not be kept after the model fitting function terminates (see parameter `f_after_training`).
-#'
-#' \item model The booster object when using \link{xgb.train}, or the folds when using
-#' \link{xgb.cv}.
-#'
-#' For \link{xgb.cv}, folds are a list with a structure as follows:\itemize{
-#' \item `dtrain`: The training data for the fold (as an `xgb.DMatrix` object).
-#' \item `bst`: Rhe `xgb.Booster` object for the fold.
-#' \item `evals`: A list containing two DMatrices, with names `train` and `test`
-#' (`test` is the held-out data for the fold).
-#' \item `index`: The indices of the hold-out data for that fold (base-1 indexing),
-#' from which the `test` entry in `evals` was obtained.
-#' }
-#'
-#' This object should \bold{not} be in-place modified in ways that conflict with the
-#' training (e.g. resetting the parameters for a training update in a way that resets
-#' the number of rounds to zero in order to overwrite rounds).
-#'
-#' Note that any R attributes that are assigned to the booster during the callback functions,
-#' will not be kept thereafter as the booster object variable is not re-assigned during
-#' training. It is however possible to set C-level attributes of the booster through
-#' \link{xgb.attr} or \link{xgb.attributes}, which should remain available for the rest
-#' of the iterations and after the training is done.
-#'
-#' For keeping variables across iterations, it's recommended to use `env` instead.
-#' \item data The data to which the model is being fit, as an `xgb.DMatrix` object.
-#'
-#' Note that, for \link{xgb.cv}, this will be the full data, while data for the specific
-#' folds can be found in the `model` object.
-#'
-#' \item evals The evaluation data, as passed under argument `evals` to
-#' \link{xgb.train}.
-#'
-#' For \link{xgb.cv}, this will always be `NULL`.
-#'
-#' \item begin_iteration Index of the first boosting iteration that will be executed
-#' (base-1 indexing).
-#'
-#' This will typically be '1', but when using training continuation, depending on the
-#' parameters for updates, boosting rounds will be continued from where the previous
-#' model ended, in which case this will be larger than 1.
-#'
-#' \item end_iteration Index of the last boostign iteration that will be executed
-#' (base-1 indexing, inclusive of this end).
-#'
-#' It should match with argument `nrounds` passed to \link{xgb.train} or \link{xgb.cv}.
-#'
-#' Note that boosting might be interrupted before reaching this last iteration, for
-#' example by using the early stopping callback \link{xgb.cb.early.stop}.
-#'
-#' \item iteration Index of the iteration number that is being executed (first iteration
-#' will be the same as parameter `begin_iteration`, then next one will add +1, and so on).
-#'
-#' \item iter_feval Evaluation metrics for `evals` that were supplied, either
-#' determined by the objective, or by parameter `feval`.
-#'
-#' For \link{xgb.train}, this will be a named vector with one entry per element in
-#' `evals`, where the names are determined as 'evals name' + '-' + 'metric name' - for
-#' example, if `evals` contains an entry named "tr" and the metric is "rmse",
-#' this will be a one-element vector with name "tr-rmse".
-#'
-#' For \link{xgb.cv}, this will be a 2d matrix with dimensions `[length(evals), nfolds]`,
-#' where the row names will follow the same naming logic as the one-dimensional vector
-#' that is passed in \link{xgb.train}.
-#'
-#' Note that, internally, the built-in callbacks such as \link{xgb.cb.print.evaluation} summarize
-#' this table by calculating the row-wise means and standard deviations.
-#'
-#' \item final_feval The evaluation results after the last boosting round is executed
-#' (same format as `iter_feval`, and will be the exact same input as passed under
-#' `iter_feval` to the last round that is executed during model fitting).
-#'
-#' \item prev_cb_res Result from a previous run of a callback sharing the same name
-#' (as given by parameter `cb_name`) when conducting training continuation, if there
-#' was any in the booster R attributes.
-#'
-#' Some times, one might want to append the new results to the previous one, and this will
-#' be done automatically by the built-in callbacks such as \link{xgb.cb.evaluation.log},
-#' which will append the new rows to the previous table.
-#'
-#' If no such previous callback result is available (which it never will when fitting
-#' a model from start instead of updating an existing model), this will be `NULL`.
-#'
-#' For \link{xgb.cv}, which doesn't support training continuation, this will always be `NULL`.
-#' }
-#'
-#' The following names (`cb_name` values) are reserved for internal callbacks:\itemize{
-#' \item print_evaluation
-#' \item evaluation_log
-#' \item reset_parameters
-#' \item early_stop
-#' \item save_model
-#' \item cv_predict
-#' \item gblinear_history
-#' }
-#'
-#' The following names are reserved for other non-callback attributes:\itemize{
-#' \item names
-#' \item class
-#' \item call
-#' \item params
-#' \item niter
-#' \item nfeatures
-#' \item folds
-#' }
-#'
-#' When using the built-in early stopping callback (\link{xgb.cb.early.stop}), said callback
-#' will always be executed before the others, as it sets some booster C-level attributes
-#' that other callbacks might also use. Otherwise, the order of execution will match with
-#' the order in which the callbacks are passed to the model fitting function.
-#' @seealso Built-in callbacks:\itemize{
-#' \item \link{xgb.cb.print.evaluation}
-#' \item \link{xgb.cb.evaluation.log}
-#' \item \link{xgb.cb.reset.parameters}
-#' \item \link{xgb.cb.early.stop}
-#' \item \link{xgb.cb.save.model}
-#' \item \link{xgb.cb.cv.predict}
-#' \item \link{xgb.cb.gblinear.history}
-#' }
+#' @seealso Built-in callbacks:
+#' - [xgb.cb.print.evaluation]
+#' - [xgb.cb.evaluation.log]
+#' - [xgb.cb.reset.parameters]
+#' - [xgb.cb.early.stop]
+#' - [xgb.cb.save.model]
+#' - [xgb.cb.cv.predict]
+#' - [xgb.cb.gblinear.history]
 #' @examples
 #' # Example constructing a custom callback that calculates
 #' # squared error on the training data (no separate test set),
@@ -203,8 +197,10 @@
 #' )
 #'
 #' data(mtcars)
+#'
 #' y <- mtcars$mpg
 #' x <- as.matrix(mtcars[, -1])
+#'
 #' dm <- xgb.DMatrix(x, label = y, nthread = 1)
 #' model <- xgb.train(
 #'   data = dm,
@@ -407,16 +403,18 @@ xgb.Callback <- function(
   return(paste0(iter, res))
 }
 
-#' @title Callback for printing the result of evaluation
-#' @param period results would be printed every number of periods
-#' @param showsd whether standard deviations should be printed (when available)
-#' @return An `xgb.Callback` object, which can be passed to \link{xgb.train} or \link{xgb.cv}.
+#' Callback for printing the result of evaluation
+#'
 #' @description
-#' The callback function prints the result of evaluation at every \code{period} iterations.
+#' The callback function prints the result of evaluation at every `period` iterations.
 #' The initial and the last iteration's evaluations are always printed.
 #'
-#' Does not leave any attribute in the booster (see \link{xgb.cb.evaluation.log} for that).
-#' @seealso \link{xgb.Callback}
+#' Does not leave any attribute in the booster (see [xgb.cb.evaluation.log] for that).
+#'
+#' @param period Results will be printed every `period` iterations.
+#' @param showsd Whether standard deviations should be printed (when available).
+#' @return An `xgb.Callback` object, which can be passed to [xgb.train()] or [xgb.cv()].
+#' @seealso [xgb.Callback]
 #' @export
 xgb.cb.print.evaluation <- function(period = 1, showsd = TRUE) {
   if (length(period) != 1 || period != floor(period) || period < 1) {
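As a rough sketch of the printing rule documented above (every `period`-th iteration, with the initial and last always included); the selection logic below is illustrative only, not the package's actual implementation:

```r
# Illustrative selection of which iterations would be printed for
# period = 2 over rounds 1..5; not the package's actual implementation.
period <- 2
begin_iteration <- 1
end_iteration <- 5
printed <- union(seq(begin_iteration, end_iteration, by = period), end_iteration)
stopifnot(begin_iteration %in% printed, end_iteration %in% printed)
```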
@@ -450,14 +448,16 @@ xgb.cb.print.evaluation <- function(period = 1, showsd = TRUE) {
 )
 }
 
-#' @title Callback for logging the evaluation history
-#' @return An `xgb.Callback` object, which can be passed to \link{xgb.train} or \link{xgb.cv}.
+#' Callback for logging the evaluation history
+#'
 #' @details This callback creates a table with per-iteration evaluation metrics (see parameters
-#' `evals` and `feval` in \link{xgb.train}).
-#' @details
+#' `evals` and `feval` in [xgb.train()]).
+#'
 #' Note: in the column names of the final data.table, the dash '-' character is replaced with
 #' the underscore '_' in order to make the column names more like regular R identifiers.
-#' @seealso \link{xgb.cb.print.evaluation}
+#'
+#' @return An `xgb.Callback` object, which can be passed to [xgb.train()] or [xgb.cv()].
+#' @seealso [xgb.cb.print.evaluation]
 #' @export
 xgb.cb.evaluation.log <- function() {
   xgb.Callback(
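The dash-to-underscore renaming described for the evaluation log's column names amounts to a simple substitution; the column names below are illustrative:

```r
# The dash-to-underscore column renaming described above, in base R.
# The input column names are made up for illustration.
cols_in <- c("iter", "tr-rmse", "dtest-auc")
cols_out <- gsub("-", "_", cols_in, fixed = TRUE)
stopifnot(identical(cols_out, c("iter", "tr_rmse", "dtest_auc")))
```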
@@ -517,20 +517,22 @@ xgb.cb.evaluation.log <- function() {
 )
 }
 
-#' @title Callback for resetting the booster's parameters at each iteration.
-#' @param new_params a list where each element corresponds to a parameter that needs to be reset.
-#' Each element's value must be either a vector of values of length \code{nrounds}
-#' to be set at each iteration,
-#' or a function of two parameters \code{learning_rates(iteration, nrounds)}
-#' which returns a new parameter value by using the current iteration number
-#' and the total number of boosting rounds.
-#' @return An `xgb.Callback` object, which can be passed to \link{xgb.train} or \link{xgb.cv}.
+#' Callback for resetting booster parameters at each iteration
+#'
 #' @details
 #' Note that when training is resumed from some previous model, and a function is used to
-#' reset a parameter value, the \code{nrounds} argument in this function would be the
+#' reset a parameter value, the `nrounds` argument in this function would be
 #' the number of boosting rounds in the current training.
 #'
 #' Does not leave any attribute in the booster.
+#'
+#' @param new_params List where each element corresponds to a parameter that needs to be reset.
+#'   Each element's value must be either a vector of values of length `nrounds`
+#'   to be set at each iteration,
+#'   or a function of two parameters `learning_rates(iteration, nrounds)`
+#'   which returns a new parameter value by using the current iteration number
+#'   and the total number of boosting rounds.
+#' @return An `xgb.Callback` object, which can be passed to [xgb.train()] or [xgb.cv()].
 #' @export
 xgb.cb.reset.parameters <- function(new_params) {
   stopifnot(is.list(new_params))
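An element of `new_params` may be a function of the form `learning_rates(iteration, nrounds)` as described above; a minimal sketch of such a schedule (the decay formula itself is made up for illustration):

```r
# Illustrative schedule of the form accepted by xgb.cb.reset.parameters():
# a function(iteration, nrounds) returning the parameter value for that round.
learning_rates <- function(iteration, nrounds) {
  0.3 * (1 - (iteration - 1) / nrounds)  # linear decay; formula is illustrative
}
stopifnot(isTRUE(all.equal(learning_rates(1, 10), 0.3)))
stopifnot(learning_rates(10, 10) < learning_rates(1, 10))  # decays over rounds
```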
@@ -583,39 +585,39 @@ xgb.cb.reset.parameters <- function(new_params) {
 )
 }
 
-#' @title Callback to activate early stopping
-#' @param stopping_rounds The number of rounds with no improvement in
-#' the evaluation metric in order to stop the training.
-#' @param maximize Whether to maximize the evaluation metric.
-#' @param metric_name The name of an evaluation column to use as a criteria for early
-#' stopping. If not set, the last column would be used.
-#' Let's say the test data in \code{evals} was labelled as \code{dtest},
-#' and one wants to use the AUC in test data for early stopping regardless of where
-#' it is in the \code{evals}, then one of the following would need to be set:
-#' \code{metric_name='dtest-auc'} or \code{metric_name='dtest_auc'}.
-#' All dash '-' characters in metric names are considered equivalent to '_'.
-#' @param verbose Whether to print the early stopping information.
-#' @param keep_all_iter Whether to keep all of the boosting rounds that were produced
-#' in the resulting object. If passing `FALSE`, will only keep the boosting rounds
-#' up to the detected best iteration, discarding the ones that come after.
-#' @return An `xgb.Callback` object, which can be passed to \link{xgb.train} or \link{xgb.cv}.
+#' Callback to activate early stopping
+#'
 #' @description
 #' This callback function determines the condition for early stopping.
 #'
 #' The following attributes are assigned to the booster's object:
-#' \itemize{
-#' \item \code{best_score} the evaluation score at the best iteration
-#' \item \code{best_iteration} at which boosting iteration the best score has occurred
+#' - `best_score` the evaluation score at the best iteration
+#' - `best_iteration` at which boosting iteration the best score has occurred
 #'   (0-based index for interoperability of binary models)
-#' }
 #'
 #' The same values are also stored as R attributes as a result of the callback, plus an additional
 #' attribute `stopped_by_max_rounds` which indicates whether an early stopping by the `stopping_rounds`
 #' condition occurred. Note that the `best_iteration` that is stored under R attributes will follow
 #' base-1 indexing, so it will be larger by '1' than the C-level 'best_iteration' that is accessed
-#' through \link{xgb.attr} or \link{xgb.attributes}.
+#' through [xgb.attr()] or [xgb.attributes()].
 #'
 #' At least one dataset is required in `evals` for early stopping to work.
+#'
+#' @param stopping_rounds The number of rounds with no improvement in
+#'   the evaluation metric in order to stop the training.
+#' @param maximize Whether to maximize the evaluation metric.
+#' @param metric_name The name of an evaluation column to use as a criterion for early
+#'   stopping. If not set, the last column would be used.
+#'   Let's say the test data in `evals` was labelled as `dtest`,
+#'   and one wants to use the AUC in test data for early stopping regardless of where
+#'   it is in the `evals`, then one of the following would need to be set:
+#'   `metric_name = 'dtest-auc'` or `metric_name = 'dtest_auc'`.
+#'   All dash '-' characters in metric names are considered equivalent to '_'.
+#' @param verbose Whether to print the early stopping information.
+#' @param keep_all_iter Whether to keep all of the boosting rounds that were produced
+#'   in the resulting object. If passing `FALSE`, will only keep the boosting rounds
+#'   up to the detected best iteration, discarding the ones that come after.
+#' @return An `xgb.Callback` object, which can be passed to [xgb.train()] or [xgb.cv()].
 #' @export
 xgb.cb.early.stop <- function(
   stopping_rounds,
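The documented equivalence of '-' and '_' in `metric_name` (e.g. `'dtest-auc'` vs. `'dtest_auc'`) can be sketched as a simple normalization; the helper name below is hypothetical, not a package function:

```r
# Minimal sketch of the '-'/'_' equivalence described for metric_name.
# normalize_metric_name is a hypothetical helper, not part of the package.
normalize_metric_name <- function(x) gsub("-", "_", x, fixed = TRUE)
stopifnot(normalize_metric_name("dtest-auc") == normalize_metric_name("dtest_auc"))
```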
@@ -771,21 +773,22 @@ xgb.cb.early.stop <- function(
   xgb.save(model, save_name)
 }
 
-#' @title Callback for saving a model file.
-#' @param save_period Save the model to disk after every
-#' \code{save_period} iterations; 0 means save the model at the end.
-#' @param save_name The name or path for the saved model file.
-#' It can contain a \code{\link[base]{sprintf}} formatting specifier
-#' to include the integer iteration number in the file name.
-#' E.g., with \code{save_name} = 'xgboost_%04d.model',
-#' the file saved at iteration 50 would be named "xgboost_0050.model".
-#' @return An `xgb.Callback` object, which can be passed to \link{xgb.train},
-#' but \bold{not} to \link{xgb.cv}.
+#' Callback for saving a model file
+#'
 #' @description
-#' This callback function allows to save an xgb-model file, either periodically
-#' after each \code{save_period}'s or at the end.
+#' This callback function allows saving an xgb-model file, either periodically
+#' after every `save_period` iterations or at the end.
 #'
 #' Does not leave any attribute in the booster.
+#'
+#' @param save_period Save the model to disk after every `save_period` iterations;
+#'   0 means save the model at the end.
+#' @param save_name The name or path for the saved model file.
+#'   It can contain a [sprintf()] formatting specifier to include the integer
+#'   iteration number in the file name. E.g., with `save_name = 'xgboost_%04d.model'`,
+#'   the file saved at iteration 50 would be named "xgboost_0050.model".
+#' @return An `xgb.Callback` object, which can be passed to [xgb.train()],
+#'   but **not** to [xgb.cv()].
 #' @export
 xgb.cb.save.model <- function(save_period = 0, save_name = "xgboost.ubj") {
   if (save_period < 0) {
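The sprintf-based file naming described for `save_name` can be checked directly in base R, using the example values from the documentation text:

```r
# The sprintf()-style file naming described for save_name, using the
# example from the documentation: '%04d' zero-pads the iteration number.
save_name <- "xgboost_%04d.model"
stopifnot(sprintf(save_name, 50) == "xgboost_0050.model")
```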
@ -817,24 +820,26 @@ xgb.cb.save.model <- function(save_period = 0, save_name = "xgboost.ubj") {
|
|||||||
)
|
)
|
||||||
}
|
}
|
||||||
|
|
||||||
#' @title Callback for returning cross-validation based predictions.
|
#' Callback for returning cross-validation based predictions
|
||||||
#' @param save_models A flag for whether to save the folds' models.
|
#'
|
||||||
#' @param outputmargin Whether to save margin predictions (same effect as passing this
|
|
||||||
#' parameter to \link{predict.xgb.Booster}).
|
|
||||||
#' @return An `xgb.Callback` object, which can be passed to \link{xgb.cv},
|
|
||||||
#' but \bold{not} to \link{xgb.train}.
|
|
||||||
#' @description
|
|
||||||
#' This callback function saves predictions for all of the test folds,
|
#' This callback function saves predictions for all of the test folds,
|
||||||
#' and also allows to save the folds' models.
|
#' and also allows to save the folds' models.
|
||||||
|
#'
|
||||||
#' @details
|
#' @details
|
||||||
#' Predictions are saved inside of the \code{pred} element, which is either a vector or a matrix,
|
#' Predictions are saved inside of the `pred` element, which is either a vector or a matrix,
|
||||||
#' depending on the number of prediction outputs per data row. The order of predictions corresponds
|
#' depending on the number of prediction outputs per data row. The order of predictions corresponds
|
||||||
#' to the order of rows in the original dataset. Note that when a custom \code{folds} list is
|
#' to the order of rows in the original dataset. Note that when a custom `folds` list is
|
||||||
#' provided in \code{xgb.cv}, the predictions would only be returned properly when this list is a
|
#' provided in [xgb.cv()], the predictions would only be returned properly when this list is a
|
||||||
#' non-overlapping list of k sets of indices, as in a standard k-fold CV. The predictions would not be
|
#' non-overlapping list of k sets of indices, as in a standard k-fold CV. The predictions would not be
|
||||||
#' meaningful when user-provided folds have overlapping indices as in, e.g., random sampling splits.
|
#' meaningful when user-provided folds have overlapping indices as in, e.g., random sampling splits.
|
||||||
#' When some of the indices in the training dataset are not included into user-provided \code{folds},
|
#' When some of the indices in the training dataset are not included in the user-provided `folds`,
|
||||||
#' their prediction value would be \code{NA}.
|
#' their prediction value would be `NA`.
|
||||||
|
#'
|
||||||
|
#' @param save_models A flag for whether to save the folds' models.
|
||||||
|
#' @param outputmargin Whether to save margin predictions (same effect as passing this
|
||||||
|
#' parameter to [predict.xgb.Booster]).
|
||||||
|
#' @return An `xgb.Callback` object, which can be passed to [xgb.cv()],
|
||||||
|
#' but **not** to [xgb.train()].
|
||||||
#' @export
|
#' @export
|
||||||
xgb.cb.cv.predict <- function(save_models = FALSE, outputmargin = FALSE) {
|
xgb.cb.cv.predict <- function(save_models = FALSE, outputmargin = FALSE) {
|
||||||
xgb.Callback(
|
xgb.Callback(
|
||||||
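The rewritten docs above describe where `xgb.cb.cv.predict()` stores its output. A minimal sketch of using it (assuming the `agaricus.train` data bundled with the package, the current `xgb.cv()` signature, and that the `pred` element is attached to the returned object as the docs describe) might look like:

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

res <- xgb.cv(
  params = list(objective = "binary:logistic", nthread = 2),
  data = dtrain,
  nrounds = 5,
  nfold = 3,
  callbacks = list(xgb.cb.cv.predict(save_models = FALSE))
)

# 'pred' holds one out-of-fold prediction per row of dtrain,
# in the same order as the original dataset.
head(res$pred)
```

With a custom, overlapping `folds` list the values in `pred` would not be meaningful, and rows absent from every fold come back as `NA`.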
@ -903,19 +908,15 @@ xgb.cb.cv.predict <- function(save_models = FALSE, outputmargin = FALSE) {
|
|||||||
return(coefs)
|
return(coefs)
|
||||||
}
|
}
|
||||||
|
|
||||||
#' @title Callback for collecting coefficients history of a gblinear booster
|
#' Callback for collecting coefficients history of a gblinear booster
|
||||||
#' @param sparse when set to `FALSE`/`TRUE`, a dense/sparse matrix is used to store the result.
|
#'
|
||||||
#' Sparse format is useful when one expects only a subset of coefficients to be non-zero,
|
|
||||||
#' when using the "thrifty" feature selector with fairly small number of top features
|
|
||||||
#' selected per iteration.
|
|
||||||
#' @return An `xgb.Callback` object, which can be passed to \link{xgb.train} or \link{xgb.cv}.
|
|
||||||
#' @details
|
#' @details
|
||||||
#' To keep things fast and simple, gblinear booster does not internally store the history of linear
|
#' To keep things fast and simple, gblinear booster does not internally store the history of linear
|
||||||
#' model coefficients at each boosting iteration. This callback provides a workaround for storing
|
#' model coefficients at each boosting iteration. This callback provides a workaround for storing
|
||||||
#' the coefficients' path, by extracting them after each training iteration.
|
#' the coefficients' path, by extracting them after each training iteration.
|
||||||
#'
|
#'
|
||||||
#' This callback will construct a matrix where rows are boosting iterations and columns are
|
#' This callback will construct a matrix where rows are boosting iterations and columns are
|
||||||
#' feature coefficients (same order as when calling \link{coef.xgb.Booster}, with the intercept
|
#' feature coefficients (same order as when calling [coef.xgb.Booster], with the intercept
|
||||||
#' corresponding to the first column).
|
#' corresponding to the first column).
|
||||||
#'
|
#'
|
||||||
#' When there is more than one coefficient per feature (e.g. multi-class classification),
|
#' When there is more than one coefficient per feature (e.g. multi-class classification),
|
||||||
@ -928,13 +929,18 @@ xgb.cb.cv.predict <- function(save_models = FALSE, outputmargin = FALSE) {
|
|||||||
#' one coefficient per feature) the names will be composed as 'column name' + ':' + 'class index'
|
#' one coefficient per feature) the names will be composed as 'column name' + ':' + 'class index'
|
||||||
#' (so e.g. column 'c1' for class '0' will be named 'c1:0').
|
#' (so e.g. column 'c1' for class '0' will be named 'c1:0').
|
||||||
#'
|
#'
|
||||||
#' With \code{xgb.train}, the output is either a dense or a sparse matrix.
|
#' With [xgb.train()], the output is either a dense or a sparse matrix.
|
||||||
#' With with \code{xgb.cv}, it is a list (one element per each fold) of such
|
#' With [xgb.cv()], it is a list (one element per fold) of such matrices.
|
||||||
#' matrices.
|
|
||||||
#'
|
#'
|
||||||
#' Function \link{xgb.gblinear.history} function provides an easy way to retrieve the
|
#' Function [xgb.gblinear.history] provides an easy way to retrieve the
|
||||||
#' outputs from this callback.
|
#' outputs from this callback.
|
||||||
#' @seealso \link{xgb.gblinear.history}, \link{coef.xgb.Booster}.
|
#'
|
||||||
|
#' @param sparse When set to `FALSE`/`TRUE`, a dense/sparse matrix is used to store the result.
|
||||||
|
#' Sparse format is useful when one expects only a subset of coefficients to be non-zero,
|
||||||
|
#' when using the "thrifty" feature selector with fairly small number of top features
|
||||||
|
#' selected per iteration.
|
||||||
|
#' @return An `xgb.Callback` object, which can be passed to [xgb.train()] or [xgb.cv()].
|
||||||
|
#' @seealso [xgb.gblinear.history], [coef.xgb.Booster].
|
||||||
#' @examples
|
#' @examples
|
||||||
#' #### Binary classification:
|
#' #### Binary classification:
|
||||||
#'
|
#'
|
||||||
@ -946,55 +952,107 @@ xgb.cb.cv.predict <- function(save_models = FALSE, outputmargin = FALSE) {
|
|||||||
#' # without considering the 2nd order interactions:
|
#' # without considering the 2nd order interactions:
|
||||||
#' x <- model.matrix(Species ~ .^2, iris)[, -1]
|
#' x <- model.matrix(Species ~ .^2, iris)[, -1]
|
||||||
#' colnames(x)
|
#' colnames(x)
|
||||||
#' dtrain <- xgb.DMatrix(scale(x), label = 1*(iris$Species == "versicolor"), nthread = nthread)
|
#' dtrain <- xgb.DMatrix(
|
||||||
#' param <- list(booster = "gblinear", objective = "reg:logistic", eval_metric = "auc",
|
#' scale(x),
|
||||||
#' lambda = 0.0003, alpha = 0.0003, nthread = nthread)
|
#' label = 1 * (iris$Species == "versicolor"),
|
||||||
|
#' nthread = nthread
|
||||||
|
#' )
|
||||||
|
#' param <- list(
|
||||||
|
#' booster = "gblinear",
|
||||||
|
#' objective = "reg:logistic",
|
||||||
|
#' eval_metric = "auc",
|
||||||
|
#' lambda = 0.0003,
|
||||||
|
#' alpha = 0.0003,
|
||||||
|
#' nthread = nthread
|
||||||
|
#' )
|
||||||
|
#'
|
||||||
#' # For 'shotgun', which is a default linear updater, using high eta values may result in
|
#' # For 'shotgun', which is a default linear updater, using high eta values may result in
|
||||||
#' # unstable behaviour in some datasets. With this simple dataset, however, the high learning
|
#' # unstable behaviour in some datasets. With this simple dataset, however, the high learning
|
||||||
#' # rate does not break the convergence, but allows us to illustrate the typical pattern of
|
#' # rate does not break the convergence, but allows us to illustrate the typical pattern of
|
||||||
#' # "stochastic explosion" behaviour of this lock-free algorithm at early boosting iterations.
|
#' # "stochastic explosion" behaviour of this lock-free algorithm at early boosting iterations.
|
||||||
#' bst <- xgb.train(param, dtrain, list(tr=dtrain), nrounds = 200, eta = 1.,
|
#' bst <- xgb.train(
|
||||||
#' callbacks = list(xgb.cb.gblinear.history()))
|
#' param,
|
||||||
|
#' dtrain,
|
||||||
|
#' list(tr = dtrain),
|
||||||
|
#' nrounds = 200,
|
||||||
|
#' eta = 1.,
|
||||||
|
#' callbacks = list(xgb.cb.gblinear.history())
|
||||||
|
#' )
|
||||||
|
#'
|
||||||
#' # Extract the coefficients' path and plot them vs boosting iteration number:
|
#' # Extract the coefficients' path and plot them vs boosting iteration number:
|
||||||
#' coef_path <- xgb.gblinear.history(bst)
|
#' coef_path <- xgb.gblinear.history(bst)
|
||||||
#' matplot(coef_path, type = 'l')
|
#' matplot(coef_path, type = "l")
|
||||||
#'
|
#'
|
||||||
#' # With the deterministic coordinate descent updater, it is safer to use higher learning rates.
|
#' # With the deterministic coordinate descent updater, it is safer to use higher learning rates.
|
||||||
#' # Will try the classical componentwise boosting which selects a single best feature per round:
|
#' # Will try the classical componentwise boosting which selects a single best feature per round:
|
||||||
#' bst <- xgb.train(param, dtrain, list(tr=dtrain), nrounds = 200, eta = 0.8,
|
#' bst <- xgb.train(
|
||||||
#' updater = 'coord_descent', feature_selector = 'thrifty', top_k = 1,
|
#' param,
|
||||||
#' callbacks = list(xgb.cb.gblinear.history()))
|
#' dtrain,
|
||||||
#' matplot(xgb.gblinear.history(bst), type = 'l')
|
#' list(tr = dtrain),
|
||||||
|
#' nrounds = 200,
|
||||||
|
#' eta = 0.8,
|
||||||
|
#' updater = "coord_descent",
|
||||||
|
#' feature_selector = "thrifty",
|
||||||
|
#' top_k = 1,
|
||||||
|
#' callbacks = list(xgb.cb.gblinear.history())
|
||||||
|
#' )
|
||||||
|
#' matplot(xgb.gblinear.history(bst), type = "l")
|
||||||
#' # Componentwise boosting is known to have similar effect to Lasso regularization.
|
#' # Componentwise boosting is known to have similar effect to Lasso regularization.
|
||||||
#' # Try experimenting with various values of top_k, eta, nrounds,
|
#' # Try experimenting with various values of top_k, eta, nrounds,
|
||||||
#' # as well as different feature_selectors.
|
#' # as well as different feature_selectors.
|
||||||
#'
|
#'
|
||||||
#' # For xgb.cv:
|
#' # For xgb.cv:
|
||||||
#' bst <- xgb.cv(param, dtrain, nfold = 5, nrounds = 100, eta = 0.8,
|
#' bst <- xgb.cv(
|
||||||
#' callbacks = list(xgb.cb.gblinear.history()))
|
#' param,
|
||||||
|
#' dtrain,
|
||||||
|
#' nfold = 5,
|
||||||
|
#' nrounds = 100,
|
||||||
|
#' eta = 0.8,
|
||||||
|
#' callbacks = list(xgb.cb.gblinear.history())
|
||||||
|
#' )
|
||||||
#' # coefficients in the CV fold #3
|
#' # coefficients in the CV fold #3
|
||||||
#' matplot(xgb.gblinear.history(bst)[[3]], type = 'l')
|
#' matplot(xgb.gblinear.history(bst)[[3]], type = "l")
|
||||||
#'
|
#'
|
||||||
#'
|
#'
|
||||||
#' #### Multiclass classification:
|
#' #### Multiclass classification:
|
||||||
#' #
|
|
||||||
#' dtrain <- xgb.DMatrix(scale(x), label = as.numeric(iris$Species) - 1, nthread = nthread)
|
#' dtrain <- xgb.DMatrix(scale(x), label = as.numeric(iris$Species) - 1, nthread = nthread)
|
||||||
#' param <- list(booster = "gblinear", objective = "multi:softprob", num_class = 3,
|
#'
|
||||||
#' lambda = 0.0003, alpha = 0.0003, nthread = nthread)
|
#' param <- list(
|
||||||
|
#' booster = "gblinear",
|
||||||
|
#' objective = "multi:softprob",
|
||||||
|
#' num_class = 3,
|
||||||
|
#' lambda = 0.0003,
|
||||||
|
#' alpha = 0.0003,
|
||||||
|
#' nthread = nthread
|
||||||
|
#' )
|
||||||
|
#'
|
||||||
#' # For the default linear updater 'shotgun' it sometimes is helpful
|
#' # For the default linear updater 'shotgun' it sometimes is helpful
|
||||||
#' # to use smaller eta to reduce instability
|
#' # to use smaller eta to reduce instability
|
||||||
#' bst <- xgb.train(param, dtrain, list(tr=dtrain), nrounds = 50, eta = 0.5,
|
#' bst <- xgb.train(
|
||||||
#' callbacks = list(xgb.cb.gblinear.history()))
|
#' param,
|
||||||
|
#' dtrain,
|
||||||
|
#' list(tr = dtrain),
|
||||||
|
#' nrounds = 50,
|
||||||
|
#' eta = 0.5,
|
||||||
|
#' callbacks = list(xgb.cb.gblinear.history())
|
||||||
|
#' )
|
||||||
|
#'
|
||||||
#' # Will plot the coefficient paths separately for each class:
|
#' # Will plot the coefficient paths separately for each class:
|
||||||
#' matplot(xgb.gblinear.history(bst, class_index = 0), type = 'l')
|
#' matplot(xgb.gblinear.history(bst, class_index = 0), type = "l")
|
||||||
#' matplot(xgb.gblinear.history(bst, class_index = 1), type = 'l')
|
#' matplot(xgb.gblinear.history(bst, class_index = 1), type = "l")
|
||||||
#' matplot(xgb.gblinear.history(bst, class_index = 2), type = 'l')
|
#' matplot(xgb.gblinear.history(bst, class_index = 2), type = "l")
|
||||||
#'
|
#'
|
||||||
#' # CV:
|
#' # CV:
|
||||||
#' bst <- xgb.cv(param, dtrain, nfold = 5, nrounds = 70, eta = 0.5,
|
#' bst <- xgb.cv(
|
||||||
#' callbacks = list(xgb.cb.gblinear.history(FALSE)))
|
#' param,
|
||||||
|
#' dtrain,
|
||||||
|
#' nfold = 5,
|
||||||
|
#' nrounds = 70,
|
||||||
|
#' eta = 0.5,
|
||||||
|
#' callbacks = list(xgb.cb.gblinear.history(FALSE))
|
||||||
|
#' )
|
||||||
#' # 1st fold of 1st class
|
#' # 1st fold of 1st class
|
||||||
#' matplot(xgb.gblinear.history(bst, class_index = 0)[[1]], type = 'l')
|
#' matplot(xgb.gblinear.history(bst, class_index = 0)[[1]], type = "l")
|
||||||
#'
|
#'
|
||||||
#' @export
|
#' @export
|
||||||
xgb.cb.gblinear.history <- function(sparse = FALSE) {
|
xgb.cb.gblinear.history <- function(sparse = FALSE) {
|
||||||
@ -1097,28 +1155,31 @@ xgb.cb.gblinear.history <- function(sparse = FALSE) {
|
|||||||
)
|
)
|
||||||
}
|
}
|
||||||
|
|
||||||
#' @title Extract gblinear coefficients history.
|
#' Extract gblinear coefficients history
|
||||||
#' @description A helper function to extract the matrix of linear coefficients' history
|
#'
|
||||||
#' from a gblinear model created while using the \link{xgb.cb.gblinear.history}
|
#' A helper function to extract the matrix of linear coefficients' history
|
||||||
#' callback (which must be added manually as by default it's not used).
|
#' from a gblinear model created while using the [xgb.cb.gblinear.history]
|
||||||
#' @details Note that this is an R-specific function that relies on R attributes that
|
#' callback (which must be added manually, as it is not used by default).
|
||||||
#' are not saved when using xgboost's own serialization functions like \link{xgb.load}
|
#'
|
||||||
#' or \link{xgb.load.raw}.
|
#' @details
|
||||||
|
#' Note that this is an R-specific function that relies on R attributes that
|
||||||
|
#' are not saved when using XGBoost's own serialization functions like [xgb.load()]
|
||||||
|
#' or [xgb.load.raw()].
|
||||||
#'
|
#'
|
||||||
#' In order for a serialized model to be accepted by this function, one must use R
|
#' In order for a serialized model to be accepted by this function, one must use R
|
||||||
#' serializers such as \link{saveRDS}.
|
#' serializers such as [saveRDS()].
|
||||||
#' @param model either an \code{xgb.Booster} or a result of \code{xgb.cv()}, trained
|
#' @param model Either an `xgb.Booster` or a result of [xgb.cv()], trained
|
||||||
#' using the \link{xgb.cb.gblinear.history} callback, but \bold{not} a booster
|
#' using the [xgb.cb.gblinear.history] callback, but **not** a booster
|
||||||
#' loaded from \link{xgb.load} or \link{xgb.load.raw}.
|
#' loaded from [xgb.load()] or [xgb.load.raw()].
|
||||||
#' @param class_index zero-based class index to extract the coefficients for only that
|
#' @param class_index Zero-based class index to extract the coefficients for only that
|
||||||
#' specific class in a multinomial multiclass model. When it is NULL, all the
|
#' specific class in a multinomial multiclass model. When it is `NULL`, all the
|
||||||
#' coefficients are returned. Has no effect in non-multiclass models.
|
#' coefficients are returned. Has no effect in non-multiclass models.
|
||||||
#'
|
#'
|
||||||
#' @return
|
#' @return
|
||||||
#' For an \link{xgb.train} result, a matrix (either dense or sparse) with the columns
|
#' For an [xgb.train()] result, a matrix (either dense or sparse) with the columns
|
||||||
#' corresponding to iteration's coefficients and the rows corresponding to boosting iterations.
|
#' corresponding to the features' coefficients and the rows corresponding to boosting iterations.
|
||||||
#'
|
#'
|
||||||
#' For an \link{xgb.cv} result, a list of such matrices is returned with the elements
|
#' For an [xgb.cv()] result, a list of such matrices is returned with the elements
|
||||||
#' corresponding to CV folds.
|
#' corresponding to CV folds.
|
||||||
#'
|
#'
|
||||||
#' When there is more than one coefficient per feature (e.g. multi-class classification)
|
#' When there is more than one coefficient per feature (e.g. multi-class classification)
|
||||||
@ -1126,7 +1187,7 @@ xgb.cb.gblinear.history <- function(sparse = FALSE) {
|
|||||||
#' the result will be reshaped into a vector where coefficients are arranged first by features and
|
#' the result will be reshaped into a vector where coefficients are arranged first by features and
|
||||||
#' then by class (e.g. first 1 through N coefficients will be for the first class, then
|
#' then by class (e.g. first 1 through N coefficients will be for the first class, then
|
||||||
#' coefficients N+1 through 2N for the second class, and so on).
|
#' coefficients N+1 through 2N for the second class, and so on).
|
||||||
#' @seealso \link{xgb.cb.gblinear.history}, \link{coef.xgb.Booster}.
|
#' @seealso [xgb.cb.gblinear.history], [coef.xgb.Booster].
|
||||||
#' @export
|
#' @export
|
||||||
xgb.gblinear.history <- function(model, class_index = NULL) {
|
xgb.gblinear.history <- function(model, class_index = NULL) {
|
||||||
|
|
||||||
|
|||||||
@ -410,7 +410,7 @@ xgb.createFolds <- function(y, k) {
|
|||||||
#' At this time, some of the parameter names were changed in order to make the code style more uniform.
|
#' At this time, some of the parameter names were changed in order to make the code style more uniform.
|
||||||
#' The deprecated parameters would be removed in the next release.
|
#' The deprecated parameters would be removed in the next release.
|
||||||
#'
|
#'
|
||||||
#' To see all the current deprecated and new parameters, check the \code{xgboost:::depr_par_lut} table.
|
#' To see all the current deprecated and new parameters, check the `xgboost:::depr_par_lut` table.
|
||||||
#'
|
#'
|
||||||
#' A deprecation warning is shown when any of the deprecated parameters is used in a call.
|
#' A deprecation warning is shown when any of the deprecated parameters is used in a call.
|
||||||
#' An additional warning is shown when there was a partial match to a deprecated parameter
|
#' An additional warning is shown when there was a partial match to a deprecated parameter
|
||||||
@ -419,70 +419,79 @@ xgb.createFolds <- function(y, k) {
|
|||||||
#' @name xgboost-deprecated
|
#' @name xgboost-deprecated
|
||||||
NULL
|
NULL
|
||||||
|
|
||||||
#' @title Model Serialization and Compatibility
|
#' Model Serialization and Compatibility
|
||||||
#' @description
|
|
||||||
#'
|
#'
|
||||||
|
#' @description
|
||||||
#' When it comes to serializing XGBoost models, it's possible to use R serializers such as
|
#' When it comes to serializing XGBoost models, it's possible to use R serializers such as
|
||||||
#' \link{save} or \link{saveRDS} to serialize an XGBoost R model, but XGBoost also provides
|
#' [save()] or [saveRDS()] to serialize an XGBoost R model, but XGBoost also provides
|
||||||
#' its own serializers with better compatibility guarantees, which allow loading
|
#' its own serializers with better compatibility guarantees, which allow loading
|
||||||
#' said models in other language bindings of XGBoost.
|
#' said models in other language bindings of XGBoost.
|
||||||
#'
|
#'
|
||||||
#' Note that an `xgb.Booster` object, outside of its core components, might also keep:\itemize{
|
#' Note that an `xgb.Booster` object, outside of its core components, might also keep:
|
||||||
#' \item Additional model configuration (accessible through \link{xgb.config}),
|
#' - Additional model configuration (accessible through [xgb.config()]), which includes
|
||||||
#' which includes model fitting parameters like `max_depth` and runtime parameters like `nthread`.
|
#' model fitting parameters like `max_depth` and runtime parameters like `nthread`.
|
||||||
#' These are not necessarily useful for prediction/importance/plotting.
|
#' These are not necessarily useful for prediction/importance/plotting.
|
||||||
#' \item Additional R-specific attributes - e.g. results of callbacks, such as evaluation logs,
|
#' - Additional R-specific attributes - e.g. results of callbacks, such as evaluation logs,
|
||||||
#' which are kept as a `data.table` object, accessible through `attributes(model)$evaluation_log`
|
#' which are kept as a `data.table` object, accessible through
|
||||||
#' if present.
|
#' `attributes(model)$evaluation_log` if present.
|
||||||
#' }
|
|
||||||
#'
|
#'
|
||||||
#' The first one (configurations) does not have the same compatibility guarantees as
|
#' The first one (configurations) does not have the same compatibility guarantees as
|
||||||
#' the model itself, including attributes that are set and accessed through \link{xgb.attributes} - that is, such configuration
|
#' the model itself, including attributes that are set and accessed through
|
||||||
#' might be lost after loading the booster in a different XGBoost version, regardless of the
|
#' [xgb.attributes()] - that is, such configuration might be lost after loading the
|
||||||
#' serializer that was used. These are saved when using \link{saveRDS}, but will be discarded
|
#' booster in a different XGBoost version, regardless of the serializer that was used.
|
||||||
#' if loaded into an incompatible XGBoost version. They are not saved when using XGBoost's
|
#' These are saved when using [saveRDS()], but will be discarded if loaded into an
|
||||||
#' serializers from its public interface including \link{xgb.save} and \link{xgb.save.raw}.
|
#' incompatible XGBoost version. They are not saved when using XGBoost's
|
||||||
|
#' serializers from its public interface including [xgb.save()] and [xgb.save.raw()].
|
||||||
#'
|
#'
|
||||||
#' The second ones (R attributes) are not part of the standard XGBoost model structure, and thus are
|
#' The second ones (R attributes) are not part of the standard XGBoost model structure,
|
||||||
#' not saved when using XGBoost's own serializers. These attributes are only used for informational
|
#' and thus are not saved when using XGBoost's own serializers. These attributes are
|
||||||
#' purposes, such as keeping track of evaluation metrics as the model was fit, or saving the R
|
#' only used for informational purposes, such as keeping track of evaluation metrics as
|
||||||
#' call that produced the model, but are otherwise not used for prediction / importance / plotting / etc.
|
#' the model was fit, or saving the R call that produced the model, but are otherwise
|
||||||
|
#' not used for prediction / importance / plotting / etc.
|
||||||
#' These R attributes are only preserved when using R's serializers.
|
#' These R attributes are only preserved when using R's serializers.
|
||||||
#'
|
#'
|
||||||
#' Note that XGBoost models in R starting from version `2.1.0` and onwards, and XGBoost models
|
#' Note that XGBoost models in R starting from version `2.1.0` and onwards, and
|
||||||
#' before version `2.1.0`; have a very different R object structure and are incompatible with
|
#' XGBoost models before version `2.1.0`, have a very different R object structure and
|
||||||
#' each other. Hence, models that were saved with R serializers live `saveRDS` or `save` before
|
#' are incompatible with each other. Hence, models that were saved with R serializers
|
||||||
#' version `2.1.0` will not work with latter `xgboost` versions and vice versa. Be aware that
|
#' like [saveRDS()] or [save()] before version `2.1.0` will not work with later
|
||||||
#' the structure of R model objects could in theory change again in the future, so XGBoost's serializers
|
#' `xgboost` versions and vice versa. Be aware that the structure of R model objects
|
||||||
|
#' could in theory change again in the future, so XGBoost's serializers
|
||||||
#' should be preferred for long-term storage.
|
#' should be preferred for long-term storage.
|
||||||
#'
|
#'
|
||||||
#' Furthermore, note that using the package `qs` for serialization will require version 0.26 or
|
#' Furthermore, note that using the package `qs` for serialization will require
|
||||||
#' higher of said package, and will have the same compatibility restrictions as R serializers.
|
#' version 0.26 or higher of said package, and will have the same compatibility
|
||||||
|
#' restrictions as R serializers.
|
||||||
#'
|
#'
|
||||||
#' @details
|
#' @details
|
||||||
#' Use \code{\link{xgb.save}} to save the XGBoost model as a stand-alone file. You may opt into
|
#' Use [xgb.save()] to save the XGBoost model as a stand-alone file. You may opt into
|
||||||
#' the JSON format by specifying the JSON extension. To read the model back, use
|
#' the JSON format by specifying the JSON extension. To read the model back, use
|
||||||
#' \code{\link{xgb.load}}.
|
#' [xgb.load()].
|
||||||
#'
|
#'
|
||||||
#' Use \code{\link{xgb.save.raw}} to save the XGBoost model as a sequence (vector) of raw bytes
|
#' Use [xgb.save.raw()] to save the XGBoost model as a sequence (vector) of raw bytes
|
||||||
#' in a future-proof manner. Future releases of XGBoost will be able to read the raw bytes and
|
#' in a future-proof manner. Future releases of XGBoost will be able to read the raw bytes and
|
||||||
#' re-construct the corresponding model. To read the model back, use \code{\link{xgb.load.raw}}.
|
#' re-construct the corresponding model. To read the model back, use [xgb.load.raw()].
|
||||||
#' The \code{\link{xgb.save.raw}} function is useful if you'd like to persist the XGBoost model
|
#' The [xgb.save.raw()] function is useful if you would like to persist the XGBoost model
|
||||||
#' as part of another R object.
|
#' as part of another R object.
|
||||||
#'
|
#'
|
||||||
#' Use \link{saveRDS} if you require the R-specific attributes that a booster might have, such
|
#' Use [saveRDS()] if you require the R-specific attributes that a booster might have, such
|
||||||
#' as evaluation logs, but note that future compatibility of such objects is outside XGBoost's
|
#' as evaluation logs, but note that future compatibility of such objects is outside XGBoost's
|
||||||
#' control as it relies on R's serialization format (see e.g. the details section in
|
#' control as it relies on R's serialization format (see e.g. the details section in
|
||||||
#' \link{serialize} and \link{save} from base R).
|
#' [serialize()] and [save()] from base R).
|
||||||
#'
|
#'
|
||||||
#' For more details and explanation about model persistence and archival, consult the page
|
#' For more details and explanation about model persistence and archival, consult the page
|
||||||
#' \url{https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html}.
|
#' \url{https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html}.
|
||||||
#'
|
#'
|
||||||
#' @examples
|
#' @examples
|
||||||
#' data(agaricus.train, package='xgboost')
|
#' data(agaricus.train, package = "xgboost")
|
||||||
#' bst <- xgb.train(data = xgb.DMatrix(agaricus.train$data, label = agaricus.train$label),
|
#'
|
||||||
#' max_depth = 2, eta = 1, nthread = 2, nrounds = 2,
|
#' bst <- xgb.train(
|
||||||
#' objective = "binary:logistic")
|
#' data = xgb.DMatrix(agaricus.train$data, label = agaricus.train$label),
|
||||||
|
#' max_depth = 2,
|
||||||
|
#' eta = 1,
|
||||||
|
#' nthread = 2,
|
||||||
|
#' nrounds = 2,
|
||||||
|
#' objective = "binary:logistic"
|
||||||
|
#' )
|
||||||
#'
|
#'
|
||||||
#' # Save as a stand-alone file; load it with xgb.load()
|
#' # Save as a stand-alone file; load it with xgb.load()
|
||||||
#' fname <- file.path(tempdir(), "xgb_model.ubj")
|
#' fname <- file.path(tempdir(), "xgb_model.ubj")
|
||||||
|
|||||||
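The serialization guidance above can be condensed into a short sketch (a hypothetical session; the file names are illustrative, and the `agaricus.train` data bundled with the package is assumed):

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
bst <- xgb.train(
  params = list(objective = "binary:logistic", nthread = 2),
  data = xgb.DMatrix(agaricus.train$data, label = agaricus.train$label),
  nrounds = 2
)

# XGBoost's own serializer: portable across versions and language
# bindings, but drops R-only attributes such as evaluation logs.
xgb.save(bst, file.path(tempdir(), "model.ubj"))
bst2 <- xgb.load(file.path(tempdir(), "model.ubj"))

# R's serializer: keeps the R attributes, but ties the file to this
# version of the R package's object structure.
saveRDS(bst, file.path(tempdir(), "model.rds"))
bst3 <- readRDS(file.path(tempdir(), "model.rds"))
```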
@ -1,4 +1,4 @@
|
|||||||
# Construct an internal xgboost Booster and get its current number of rounds.
|
# Construct an internal XGBoost Booster and get its current number of rounds.
|
||||||
# internal utility function
|
# internal utility function
|
||||||
# Note: the number of rounds in the C booster gets reset to zero when changing
|
# Note: the number of rounds in the C booster gets reset to zero when changing
|
||||||
# key booster parameters like 'process_type=update', but in some cases, when
|
# key booster parameters like 'process_type=update', but in some cases, when
|
||||||
@ -64,7 +64,7 @@ xgb.get.handle <- function(object) {
|
|||||||
if (inherits(object, "xgb.Booster")) {
|
if (inherits(object, "xgb.Booster")) {
|
||||||
handle <- object$ptr
|
handle <- object$ptr
|
||||||
if (is.null(handle) || !inherits(handle, "externalptr")) {
|
if (is.null(handle) || !inherits(handle, "externalptr")) {
|
||||||
stop("'xgb.Booster' object is corrupted or is from an incompatible xgboost version.")
|
stop("'xgb.Booster' object is corrupted or is from an incompatible XGBoost version.")
|
||||||
}
|
}
|
||||||
} else {
|
} else {
|
||||||
stop("argument must be an 'xgb.Booster' object.")
|
stop("argument must be an 'xgb.Booster' object.")
|
||||||
@ -77,37 +77,38 @@ xgb.get.handle <- function(object) {
|
|||||||
|
|
||||||
#' Predict method for XGBoost model
|
#' Predict method for XGBoost model
|
||||||
#'
|
#'
|
||||||
#' Predict values on data based on xgboost model.
|
#' Predict values on data based on an XGBoost model.
|
||||||
#'
|
#'
|
||||||
#' @param object Object of class `xgb.Booster`.
|
#' @param object Object of class `xgb.Booster`.
|
||||||
#' @param newdata Takes `data.frame`, `matrix`, `dgCMatrix`, `dgRMatrix`, `dsparseVector`,
|
#' @param newdata Takes `data.frame`, `matrix`, `dgCMatrix`, `dgRMatrix`, `dsparseVector`,
|
||||||
#' local data file, or `xgb.DMatrix`.
|
#' local data file, or `xgb.DMatrix`.
|
||||||
#'
|
#'
|
||||||
#' For single-row predictions on sparse data, it's recommended to use CSR format. If passing
|
#' For single-row predictions on sparse data, it is recommended to use CSR format. If passing
|
||||||
#' a sparse vector, it will take it as a row vector.
|
#' a sparse vector, it will be interpreted as a row vector.
|
||||||
#'
|
#'
|
||||||
#' Note that, for repeated predictions on the same data, one might want to create a DMatrix to
|
#' Note that, for repeated predictions on the same data, one might want to create a DMatrix to
|
||||||
#' pass here instead of passing R types like matrices or data frames, as predictions will be
|
#' pass here instead of passing R types like matrices or data frames, as predictions will be
|
||||||
#' faster on DMatrix.
|
#' faster on DMatrix.
|
||||||
#'
|
#'
|
||||||
#' If `newdata` is a `data.frame`, be aware that:\itemize{
|
 #' If `newdata` is a `data.frame`, be aware that:
-#' \item Columns will be converted to numeric if they aren't already, which could potentially make
+#' - Columns will be converted to numeric if they aren't already, which could potentially make
 #' the operation slower than in an equivalent `matrix` object.
-#' \item The order of the columns must match with that of the data from which the model was fitted
+#' - The order of the columns must match with that of the data from which the model was fitted
 #' (i.e. columns will not be referenced by their names, just by their order in the data).
-#' \item If the model was fitted to data with categorical columns, these columns must be of
+#' - If the model was fitted to data with categorical columns, these columns must be of
 #' `factor` type here, and must use the same encoding (i.e. have the same levels).
-#' \item If `newdata` contains any `factor` columns, they will be converted to base-0
+#' - If `newdata` contains any `factor` columns, they will be converted to base-0
 #' encoding (same as during DMatrix creation) - hence, one should not pass a `factor`
 #' under a column which during training had a different type.
-#' }
-#' @param missing Float value that represents missing values in data (e.g., 0 or some other extreme value).
+#' @param missing Float value that represents missing values in data
+#' (e.g., 0 or some other extreme value).
 #'
-#' This parameter is not used when `newdata` is an `xgb.DMatrix` - in such cases, should pass
-#' this as an argument to the DMatrix constructor instead.
+#' This parameter is not used when `newdata` is an `xgb.DMatrix` - in such cases,
+#' should pass this as an argument to the DMatrix constructor instead.
-#' @param outputmargin Whether the prediction should be returned in the form of original untransformed
-#' sum of predictions from boosting iterations' results. E.g., setting `outputmargin=TRUE` for
-#' logistic regression would return log-odds instead of probabilities.
+#' @param outputmargin Whether the prediction should be returned in the form of
+#' original untransformed sum of predictions from boosting iterations' results.
+#' E.g., setting `outputmargin = TRUE` for logistic regression would return log-odds
+#' instead of probabilities.
 #' @param predleaf Whether to predict per-tree leaf indices.
 #' @param predcontrib Whether to return feature contributions to individual predictions (see Details).
 #' @param approxcontrib Whether to use a fast approximation for feature contributions (see Details).
@@ -147,17 +148,16 @@ xgb.get.handle <- function(object) {
 #'
 #' Note that, if `newdata` is an `xgb.DMatrix` object, this argument will
 #' be ignored as it needs to be added to the DMatrix instead (e.g. by passing it as
-#' an argument in its constructor, or by calling \link{setinfo.xgb.DMatrix}).
+#' an argument in its constructor, or by calling [setinfo.xgb.DMatrix()]).
-#'
-#' @param validate_features When `TRUE`, validate that the Booster's and newdata's feature_names
-#' match (only applicable when both `object` and `newdata` have feature names).
+#' @param validate_features When `TRUE`, validate that the Booster's and newdata's
+#' feature_names match (only applicable when both `object` and `newdata` have feature names).
 #'
 #' If the column names differ and `newdata` is not an `xgb.DMatrix`, will try to reorder
 #' the columns in `newdata` to match with the booster's.
 #'
-#' If the booster has feature types and `newdata` is either an `xgb.DMatrix` or `data.frame`,
-#' will additionally verify that categorical columns are of the correct type in `newdata`,
-#' throwing an error if they do not match.
+#' If the booster has feature types and `newdata` is either an `xgb.DMatrix` or
+#' `data.frame`, will additionally verify that categorical columns are of the
+#' correct type in `newdata`, throwing an error if they do not match.
 #'
 #' If passing `FALSE`, it is assumed that the feature names and types are the same,
 #' and come in the same order as in the training data.
@@ -167,7 +167,6 @@ xgb.get.handle <- function(object) {
 #' @param ... Not used.
 #'
 #' @details
-#'
 #' Note that `iterationrange` would currently do nothing for predictions from "gblinear",
 #' since "gblinear" doesn't keep its boosting history.
 #'
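A minimal sketch of how the `outputmargin` and `predcontrib` arguments documented in this hunk interact, assuming the xgboost R package is installed (the dataset and parameter names mirror the package's own examples; tolerances are illustrative):

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 1))
bst <- xgb.train(
  params = list(objective = "binary:logistic", nthread = 1),
  data = dtrain, nrounds = 2
)

p_prob   <- predict(bst, dtrain)                      # transformed probabilities
p_margin <- predict(bst, dtrain, outputmargin = TRUE) # raw untransformed log-odds
# For logistic regression, the probability is the sigmoid of the margin:
stopifnot(all(abs(p_prob - 1 / (1 + exp(-p_margin))) < 1e-5))

# Feature contributions: one column per feature plus a bias column;
# each row sums (up to float rounding) to the margin prediction:
contrib <- predict(bst, dtrain, predcontrib = TRUE)
stopifnot(all(abs(rowSums(contrib) - p_margin) < 1e-3))
```

The sigmoid identity is what "original untransformed sum of predictions" means in practice for the `binary:logistic` objective.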
@@ -589,11 +588,33 @@ validate.features <- function(bst, newdata) {
 }


-#' @title Accessors for serializable attributes of a model
+#' Accessors for serializable attributes of a model
 #'
-#' @description These methods allow to manipulate the key-value attribute strings of an xgboost model.
+#' These methods allow manipulating the key-value attribute strings of an XGBoost model.
 #'
-#' @param object Object of class `xgb.Booster`. \bold{Will be modified in-place} when assigning to it.
+#' @details
+#' The primary purpose of XGBoost model attributes is to store some meta data about the model.
+#' Note that they are a separate concept from the object attributes in R.
+#' Specifically, they refer to key-value strings that can be attached to an XGBoost model,
+#' stored together with the model's binary representation, and accessed later
+#' (from R or any other interface).
+#' In contrast, any R attribute assigned to an R object of `xgb.Booster` class
+#' would not be saved by [xgb.save()] because an XGBoost model is an external memory object
+#' and its serialization is handled externally.
+#' Also, setting an attribute that has the same name as one of XGBoost's parameters wouldn't
+#' change the value of that parameter for a model.
+#' Use [xgb.parameters<-()] to set or change model parameters.
+#'
+#' The [xgb.attributes<-()] setter either updates the existing or adds one or several attributes,
+#' but it doesn't delete the other existing attributes.
+#'
+#' Important: since this modifies the booster's C object, semantics for assignment here
+#' will differ from R's, as any object reference to the same booster will be modified
+#' too, while assignment of R attributes through `attributes(model)$<attr> <- <value>`
+#' will follow the usual copy-on-write R semantics (see [xgb.copy.Booster()] for an
+#' example of these behaviors).
+#'
+#' @param object Object of class `xgb.Booster`. **Will be modified in-place** when assigning to it.
 #' @param name A non-empty character string specifying which attribute is to be accessed.
 #' @param value For `xgb.attr<-`, a value of an attribute; for `xgb.attributes<-`,
 #' it is a list (or an object coercible to a list) with the names of attributes to set
@@ -601,29 +622,6 @@ validate.features <- function(bst, newdata) {
 #' Non-character values are converted to character.
 #' When an attribute value is not a scalar, only the first index is used.
 #' Use `NULL` to remove an attribute.
-#'
-#' @details
-#' The primary purpose of xgboost model attributes is to store some meta data about the model.
-#' Note that they are a separate concept from the object attributes in R.
-#' Specifically, they refer to key-value strings that can be attached to an xgboost model,
-#' stored together with the model's binary representation, and accessed later
-#' (from R or any other interface).
-#' In contrast, any R attribute assigned to an R object of `xgb.Booster` class
-#' would not be saved by [xgb.save()] because an xgboost model is an external memory object
-#' and its serialization is handled externally.
-#' Also, setting an attribute that has the same name as one of xgboost's parameters wouldn't
-#' change the value of that parameter for a model.
-#' Use [xgb.parameters<-()] to set or change model parameters.
-#'
-#' The `xgb.attributes<-` setter either updates the existing or adds one or several attributes,
-#' but it doesn't delete the other existing attributes.
-#'
-#' Important: since this modifies the booster's C object, semantics for assignment here
-#' will differ from R's, as any object reference to the same booster will be modified
-#' too, while assignment of R attributes through `attributes(model)$<attr> <- <value>`
-#' will follow the usual copy-on-write R semantics (see \link{xgb.copy.Booster} for an
-#' example of these behaviors).
-#'
 #' @return
 #' - `xgb.attr()` returns either a string value of an attribute
 #' or `NULL` if an attribute wasn't stored in a model.
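A short, hedged illustration (assuming the xgboost package is installed; the attribute name `my_note` is made up) of the in-place semantics the documentation above describes: a C-level attribute set through one R variable is visible through every other reference to the same booster.

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 1))
bst <- xgb.train(params = list(nthread = 1), data = dtrain, nrounds = 1)

alias <- bst                      # plain R assignment: same C object underneath
xgb.attr(bst, "my_note") <- "v1"  # modifies the shared C object in-place
stopifnot(identical(xgb.attr(alias, "my_note"), "v1"))

xgb.attr(bst, "my_note") <- NULL  # NULL removes the attribute
stopifnot(is.null(xgb.attr(alias, "my_note")))
```

A plain R attribute set via `attr(bst, "x") <- ...` would instead follow copy-on-write and not propagate to `alias`.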
@@ -720,15 +718,18 @@ xgb.attributes <- function(object) {
   return(object)
 }

-#' @title Accessors for model parameters as JSON string
+#' Accessors for model parameters as JSON string
-#' @details Note that assignment is performed in-place on the booster C object, which unlike assignment
+#'
+#' @details
+#' Note that assignment is performed in-place on the booster C object, which unlike assignment
 #' of R attributes, doesn't follow typical copy-on-write semantics for assignment - i.e. all references
 #' to the same booster will also get updated.
 #'
-#' See \link{xgb.copy.Booster} for an example of this behavior.
+#' See [xgb.copy.Booster()] for an example of this behavior.
-#' @param object Object of class `xgb.Booster`. \bold{Will be modified in-place} when assigning to it.
-#' @param value An R list.
-#' @return `xgb.config` will return the parameters as an R list.
+#'
+#' @param object Object of class `xgb.Booster`. **Will be modified in-place** when assigning to it.
+#' @param value A list.
+#' @return Parameters as a list.
 #' @examples
 #' data(agaricus.train, package = "xgboost")
 #'
@@ -767,23 +768,27 @@ xgb.config <- function(object) {
   return(object)
 }

-#' @title Accessors for model parameters
+#' Accessors for model parameters
-#' @description Only the setter for xgboost parameters is currently implemented.
+#'
+#' Only the setter for XGBoost parameters is currently implemented.
-#' @details Just like \link{xgb.attr}, this function will make in-place modifications
+#'
+#' @details
+#' Just like [xgb.attr()], this function will make in-place modifications
 #' on the booster object which do not follow typical R assignment semantics - that is,
 #' all references to the same booster will also be updated, unlike assignment of R
 #' attributes which follow copy-on-write semantics.
 #'
-#' See \link{xgb.copy.Booster} for an example of this behavior.
+#' See [xgb.copy.Booster()] for an example of this behavior.
 #'
 #' Be aware that setting parameters of a fitted booster related to training continuation / updates
 #' will reset its number of rounds indicator to zero.
-#' @param object Object of class `xgb.Booster`. \bold{Will be modified in-place}.
+#' @param object Object of class `xgb.Booster`. **Will be modified in-place**.
 #' @param value A list (or an object coercible to a list) with the names of parameters to set
 #' and the elements corresponding to parameter values.
 #' @return The same booster `object`, which gets modified in-place.
 #' @examples
 #' data(agaricus.train, package = "xgboost")
+#'
 #' train <- agaricus.train
 #'
 #' bst <- xgb.train(
@@ -859,11 +864,12 @@ setinfo.xgb.Booster <- function(object, name, info) {
   return(TRUE)
 }

-#' @title Get number of boosting in a fitted booster
+#' Get number of boosting rounds in a fitted booster
+#'
 #' @param model,x A fitted `xgb.Booster` model.
-#' @return The number of rounds saved in the model, as an integer.
+#' @return The number of rounds saved in the model as an integer.
 #' @details Note that setting booster parameters related to training
-#' continuation / updates through \link{xgb.parameters<-} will reset the
+#' continuation / updates through [xgb.parameters<-()] will reset the
 #' number of rounds to zero.
 #' @export
 #' @rdname xgb.get.num.boosted.rounds
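A quick sketch (assuming the xgboost package is installed) of the rounds accessor documented in this hunk and its `length()` alias, which the surrounding diff shows dispatching to the same function:

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 1))
bst <- xgb.train(params = list(nthread = 1), data = dtrain, nrounds = 5)

stopifnot(xgb.get.num.boosted.rounds(bst) == 5L)
stopifnot(length(bst) == 5L)  # length.xgb.Booster delegates to the accessor
```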
@@ -877,16 +883,19 @@ length.xgb.Booster <- function(x) {
   return(xgb.get.num.boosted.rounds(x))
 }

-#' @title Slice Booster by Rounds
+#' Slice Booster by Rounds
-#' @description Creates a new booster including only a selected range of rounds / iterations
+#'
+#' Creates a new booster including only a selected range of rounds / iterations
 #' from an existing booster, as given by the sequence `seq(start, end, step)`.
-#' @details Note that any R attributes that the booster might have, will not be copied into
+#'
+#' @details
+#' Note that any R attributes that the booster might have, will not be copied into
 #' the resulting object.
+#'
 #' @param model,x A fitted `xgb.Booster` object, which is to be sliced by taking only a subset
 #' of its rounds / iterations.
-#' @param start Start of the slice (base-1 and inclusive, like R's \link{seq}).
+#' @param start Start of the slice (base-1 and inclusive, like R's [seq()]).
-#' @param end End of the slice (base-1 and inclusive, like R's \link{seq}).
+#' @param end End of the slice (base-1 and inclusive, like R's [seq()]).
-#'
 #' Passing a value of zero here is equivalent to passing the full number of rounds in the
 #' booster object.
 #' @param step Step size of the slice. Passing '1' will take every round in the sequence defined by
@@ -894,8 +903,10 @@ length.xgb.Booster <- function(x) {
 #' @return A sliced booster object containing only the requested rounds.
 #' @examples
 #' data(mtcars)
+#'
 #' y <- mtcars$mpg
 #' x <- as.matrix(mtcars[, -1])
+#'
 #' dm <- xgb.DMatrix(x, label = y, nthread = 1)
 #' model <- xgb.train(data = dm, params = list(nthread = 1), nrounds = 5)
 #' model_slice <- xgb.slice.Booster(model, 1, 3)
@@ -948,10 +959,12 @@ xgb.slice.Booster <- function(model, start, end = xgb.get.num.boosted.rounds(mod
   return(xgb.slice.Booster(x, i[1L], i[length(i)], steps[1L]))
 }

-#' @title Get Features Names from Booster
+#' Get Feature Names from Booster
-#' @description Returns the feature / variable / column names from a fitted
-#' booster object, which are set automatically during the call to \link{xgb.train}
-#' from the DMatrix names, or which can be set manually through \link{setinfo}.
+#'
+#' @description
+#' Returns the feature / variable / column names from a fitted
+#' booster object, which are set automatically during the call to [xgb.train()]
+#' from the DMatrix names, or which can be set manually through [setinfo()].
 #'
 #' If the object doesn't have feature names, will return `NULL`.
 #'
@@ -1002,23 +1015,25 @@ xgb.best_iteration <- function(bst) {
   return(out)
 }

-#' @title Extract coefficients from linear booster
+#' Extract coefficients from linear booster
-#' @description Extracts the coefficients from a 'gblinear' booster object,
-#' as produced by \code{xgb.train} when using parameter `booster="gblinear"`.
+#'
+#' @description
+#' Extracts the coefficients from a 'gblinear' booster object,
+#' as produced by [xgb.train()] when using parameter `booster="gblinear"`.
 #'
 #' Note: this function will error out if passing a booster model
 #' which is not of "gblinear" type.
+#'
 #' @param object A fitted booster of 'gblinear' type.
 #' @param ... Not used.
-#' @return The extracted coefficients:\itemize{
+#' @return The extracted coefficients:
-#' \item If there's only one coefficient per column in the data, will be returned as a
+#' - If there is only one coefficient per column in the data, will be returned as a
 #' vector, potentially containing the feature names if available, with the intercept
 #' as first column.
-#' \item If there's more than one coefficient per column in the data (e.g. when using
+#' - If there is more than one coefficient per column in the data (e.g. when using
 #' `objective="multi:softmax"`), will be returned as a matrix with dimensions equal
 #' to `[num_features, num_cols]`, with the intercepts as first row. Note that the column
 #' (classes in multi-class classification) dimension will not be named.
-#' }
 #'
 #' The intercept returned here will include the 'base_score' parameter (unlike the 'bias'
 #' or the last coefficient in the model dump, which doesn't have 'base_score' added to it),
@@ -1027,12 +1042,15 @@ xgb.best_iteration <- function(bst) {
 #'
 #' Be aware that the coefficients are obtained by first converting them to strings and
 #' back, so there will always be some very small loss of precision compared to the actual
-#' coefficients as used by \link{predict.xgb.Booster}.
+#' coefficients as used by [predict.xgb.Booster()].
 #' @examples
 #' library(xgboost)
+#'
 #' data(mtcars)
+#'
 #' y <- mtcars[, 1]
 #' x <- as.matrix(mtcars[, -1])
+#'
 #' dm <- xgb.DMatrix(data = x, label = y, nthread = 1)
 #' params <- list(booster = "gblinear", nthread = 1)
 #' model <- xgb.train(data = dm, params = params, nrounds = 2)
@@ -1088,19 +1106,25 @@ coef.xgb.Booster <- function(object, ...) {
   return(out)
 }

-#' @title Deep-copies a Booster Object
+#' Deep-copies a Booster Object
-#' @description Creates a deep copy of an 'xgb.Booster' object, such that the
+#'
+#' Creates a deep copy of an 'xgb.Booster' object, such that the
 #' C object pointer contained will be a different object, and hence functions
-#' like \link{xgb.attr} will not affect the object from which it was copied.
+#' like [xgb.attr()] will not affect the object from which it was copied.
+#'
 #' @param model An 'xgb.Booster' object.
 #' @return A deep copy of `model` - it will be identical in every way, but C-level
 #' functions called on that copy will not affect the `model` variable.
 #' @examples
 #' library(xgboost)
+#'
 #' data(mtcars)
+#'
 #' y <- mtcars$mpg
 #' x <- mtcars[, -1]
+#'
 #' dm <- xgb.DMatrix(x, label = y, nthread = 1)
+#'
 #' model <- xgb.train(
 #'   data = dm,
 #'   params = list(nthread = 1),
@@ -1135,29 +1159,35 @@ xgb.copy.Booster <- function(model) {
   return(.Call(XGDuplicate_R, model))
 }

-#' @title Check if two boosters share the same C object
+#' Check if two boosters share the same C object
-#' @description Checks whether two booster objects refer to the same underlying C object.
-#' @details As booster objects (as returned by e.g. \link{xgb.train}) contain an R 'externalptr'
+#'
+#' Checks whether two booster objects refer to the same underlying C object.
+#'
+#' @details
+#' As booster objects (as returned by e.g. [xgb.train()]) contain an R 'externalptr'
 #' object, they don't follow typical copy-on-write semantics of other R objects - that is, if
 #' one assigns a booster to a different variable and modifies that new variable through in-place
-#' methods like \link{xgb.attr<-}, the modification will be applied to both the old and the new
+#' methods like [xgb.attr<-()], the modification will be applied to both the old and the new
 #' variable, unlike typical R assignments which would only modify the latter.
 #'
 #' This function allows checking whether two booster objects share the same 'externalptr',
 #' regardless of the R attributes that they might have.
 #'
 #' In order to duplicate a booster in such a way that the copy wouldn't share the same
-#' 'externalptr', one can use function \link{xgb.copy.Booster}.
+#' 'externalptr', one can use function [xgb.copy.Booster()].
 #' @param obj1 Booster model to compare with `obj2`.
 #' @param obj2 Booster model to compare with `obj1`.
-#' @return Either `TRUE` or `FALSE` according to whether the two boosters share
-#' the underlying C object.
+#' @return Either `TRUE` or `FALSE` according to whether the two boosters share the
+#' underlying C object.
-#' @seealso \link{xgb.copy.Booster}
+#' @seealso [xgb.copy.Booster()]
 #' @examples
 #' library(xgboost)
+#'
 #' data(mtcars)
+#'
 #' y <- mtcars$mpg
 #' x <- as.matrix(mtcars[, -1])
+#'
 #' model <- xgb.train(
 #'   params = list(nthread = 1),
 #'   data = xgb.DMatrix(x, label = y, nthread = 1),
@@ -1210,10 +1240,10 @@ xgb.is.same.Booster <- function(obj1, obj2) {
 #' attr(bst, "myattr") <- "memo"
 #'
 #' print(bst)
-#'
+#' @method print xgb.Booster
 #' @export
 print.xgb.Booster <- function(x, ...) {
-  # this lets it error out when the object comes from an earlier R xgboost version
+  # this lets it error out when the object comes from an earlier R XGBoost version
   handle <- xgb.get.handle(x)
   cat('##### xgb.Booster\n')
@ -1,20 +1,15 @@
|
|||||||
#' Create new features from a previously learned model
|
#' Create new features from a previously learned model
|
||||||
#'
|
#'
|
||||||
#' May improve the learning by adding new features to the training data based on the decision trees from a previously learned model.
|
#' May improve the learning by adding new features to the training data based on the
|
||||||
#'
|
#' decision trees from a previously learned model.
|
||||||
#' @param model decision tree boosting model learned on the original data
|
|
||||||
#' @param data original data (usually provided as a \code{dgCMatrix} matrix)
|
|
||||||
#' @param ... currently not used
|
|
||||||
#'
|
|
||||||
#' @return \code{dgCMatrix} matrix including both the original data and the new features.
|
|
||||||
#'
|
#'
|
||||||
#' @details
|
#' @details
|
||||||
#' This is the function inspired from the paragraph 3.1 of the paper:
|
#' This is the function inspired from the paragraph 3.1 of the paper:
|
||||||
#'
|
#'
|
||||||
#' \strong{Practical Lessons from Predicting Clicks on Ads at Facebook}
|
#' **Practical Lessons from Predicting Clicks on Ads at Facebook**
|
||||||
#'
|
#'
|
||||||
#' \emph{(Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yan, xin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers,
|
#' *(Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yan, xin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers,
|
||||||
#' Joaquin Quinonero Candela)}
|
#' Joaquin Quinonero Candela)*
|
||||||
#'
|
#'
|
||||||
#' International Workshop on Data Mining for Online Advertising (ADKDD) - August 24, 2014
|
#' International Workshop on Data Mining for Online Advertising (ADKDD) - August 24, 2014
|
||||||
#'
|
#'
|
||||||
@ -33,11 +28,11 @@
|
|||||||
#' where the first subtree has 3 leafs and the second 2 leafs. If an
|
#' where the first subtree has 3 leafs and the second 2 leafs. If an
|
||||||
#' instance ends up in leaf 2 in the first subtree and leaf 1 in
|
#' instance ends up in leaf 2 in the first subtree and leaf 1 in
|
||||||
#' second subtree, the overall input to the linear classifier will
|
#' second subtree, the overall input to the linear classifier will
|
||||||
-#' be the binary vector \code{[0, 1, 0, 1, 0]}, where the first 3 entries
+#' be the binary vector `[0, 1, 0, 1, 0]`, where the first 3 entries
 #' correspond to the leaves of the first subtree and last 2 to
 #' those of the second subtree.
 #'
-#' [...]
+#' ...
 #'
 #' We can understand boosted decision tree
 #' based transformation as a supervised feature encoding that
@@ -45,16 +40,23 @@
 #' vector. A traversal from root node to a leaf node represents
 #' a rule on certain features."
 #'
+#' @param model Decision tree boosting model learned on the original data.
+#' @param data Original data (usually provided as a `dgCMatrix` matrix).
+#' @param ... Currently not used.
+#'
+#' @return A `dgCMatrix` matrix including both the original data and the new features.
+#'
 #' @examples
-#' data(agaricus.train, package='xgboost')
-#' data(agaricus.test, package='xgboost')
+#' data(agaricus.train, package = "xgboost")
+#' data(agaricus.test, package = "xgboost")
+#'
 #' dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
 #' dtest <- with(agaricus.test, xgb.DMatrix(data, label = label, nthread = 2))
 #'
 #' param <- list(max_depth = 2, eta = 1, objective = 'binary:logistic')
 #' nrounds = 4
 #'
-#' bst = xgb.train(params = param, data = dtrain, nrounds = nrounds, nthread = 2)
+#' bst <- xgb.train(params = param, data = dtrain, nrounds = nrounds, nthread = 2)
 #'
 #' # Model accuracy without new features
 #' accuracy.before <- sum((predict(bst, agaricus.test$data) >= 0.5) == agaricus.test$label) /
@@ -1,36 +1,44 @@
-#' Dump an xgboost model in text format.
+#' Dump an XGBoost model in text format.
 #'
-#' Dump an xgboost model in text format.
+#' Dump an XGBoost model in text format.
 #'
-#' @param model the model object.
-#' @param fname the name of the text file where to save the model text dump.
-#' If not provided or set to \code{NULL}, the model is returned as a \code{character} vector.
-#' @param fmap feature map file representing feature types.
-#' See demo/ for walkthrough example in R, and
-#' \url{https://github.com/dmlc/xgboost/blob/master/demo/data/featmap.txt}
-#' for example Format.
-#' @param with_stats whether to dump some additional statistics about the splits.
+#' @param model The model object.
+#' @param fname The name of the text file where to save the model text dump.
+#' If not provided or set to `NULL`, the model is returned as a character vector.
+#' @param fmap Feature map file representing feature types. See demo/ for a walkthrough
+#' example in R, and \url{https://github.com/dmlc/xgboost/blob/master/demo/data/featmap.txt}
+#' to see an example of the value.
+#' @param with_stats Whether to dump some additional statistics about the splits.
 #' When this option is on, the model dump contains two additional values:
 #' gain is the approximate loss function gain we get in each split;
 #' cover is the sum of second order gradient in each node.
-#' @param dump_format either 'text', 'json', or 'dot' (graphviz) format could be specified.
+#' @param dump_format Either 'text', 'json', or 'dot' (graphviz) format could be specified.
 #'
 #' Format 'dot' for a single tree can be passed directly to packages that consume this format
-#' for graph visualization, such as function [DiagrammeR::grViz()]
-#' @param ... currently not used
+#' for graph visualization, such as function `DiagrammeR::grViz()`
+#' @param ... Currently not used
 #'
 #' @return
-#' If fname is not provided or set to \code{NULL} the function will return the model
-#' as a \code{character} vector. Otherwise it will return \code{TRUE}.
+#' If fname is not provided or set to `NULL` the function will return the model
+#' as a character vector. Otherwise it will return `TRUE`.
 #'
 #' @examples
 #' \dontshow{RhpcBLASctl::omp_set_num_threads(1)}
-#' data(agaricus.train, package='xgboost')
-#' data(agaricus.test, package='xgboost')
+#' data(agaricus.train, package = "xgboost")
+#' data(agaricus.test, package = "xgboost")
+#'
 #' train <- agaricus.train
 #' test <- agaricus.test
-#' bst <- xgb.train(data = xgb.DMatrix(train$data, label = train$label), max_depth = 2,
-#' eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
+#'
+#' bst <- xgb.train(
+#'   data = xgb.DMatrix(train$data, label = train$label),
+#'   max_depth = 2,
+#'   eta = 1,
+#'   nthread = 2,
+#'   nrounds = 2,
+#'   objective = "binary:logistic"
+#' )
+#'
 #' # save the model in file 'xgb.model.dump'
 #' dump_path = file.path(tempdir(), 'model.dump')
 #' xgb.dump(bst, dump_path, with_stats = TRUE)
@@ -39,7 +47,7 @@
 #' print(xgb.dump(bst, with_stats = TRUE))
 #'
 #' # print in JSON format:
-#' cat(xgb.dump(bst, with_stats = TRUE, dump_format='json'))
+#' cat(xgb.dump(bst, with_stats = TRUE, dump_format = "json"))
 #'
 #' # plot first tree leveraging the 'dot' format
 #' if (requireNamespace('DiagrammeR', quietly = TRUE)) {
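As a rough sketch of how the "json" dump format documented above can be consumed downstream (continuing from a fitted booster `bst` as in the examples; the use of jsonlite here is an assumption, any JSON parser would work):

```r
library(xgboost)

# The "json" dump is returned as a character vector of lines that can be
# collapsed and parsed into an R list, one element per boosted tree.
dumped <- xgb.dump(bst, with_stats = TRUE, dump_format = "json")
trees <- jsonlite::fromJSON(paste(dumped, collapse = "\n"),
                            simplifyDataFrame = FALSE)
length(trees)  # number of boosted trees
```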
@@ -1,6 +1,5 @@
 # ggplot backend for the xgboost plotting facilities
-

 #' @rdname xgb.plot.importance
 #' @export
 xgb.ggplot.importance <- function(importance_matrix = NULL, top_n = NULL, measure = NULL,
@@ -135,8 +134,7 @@ xgb.ggplot.shap.summary <- function(data, shap_contrib = NULL, features = NULL,
 #' @param data_list The result of `xgb.shap.data()`.
 #' @param normalize Whether to standardize feature values to mean 0 and
 #' standard deviation 1. This is useful for comparing multiple features on the same
-#' plot. Default is \code{FALSE}.
-#'
+#' plot. Default is `FALSE`.
 #' @return A `data.table` containing the observation ID, the feature name, the
 #' feature value (normalized if specified), and the SHAP contribution value.
 #' @noRd
@@ -167,7 +165,6 @@ prepare.ggplot.shap.data <- function(data_list, normalize = FALSE) {
 #' Useful to compare multiple features on the same plot.
 #'
 #' @param x Numeric vector.
-#'
 #' @return Numeric vector with mean 0 and standard deviation 1.
 #' @noRd
 #' @keywords internal
@@ -2,6 +2,13 @@
 #'
 #' Creates a `data.table` of feature importances.
 #'
+#' @details
+#' This function works for both linear and tree models.
+#'
+#' For linear models, the importance is the absolute magnitude of linear coefficients.
+#' To obtain a meaningful ranking by importance for linear models, the features need to
+#' be on the same scale (which is also recommended when using L1 or L2 regularization).
+#'
 #' @param feature_names Character vector used to overwrite the feature names
 #' of the model. The default is `NULL` (use original feature names).
 #' @param model Object of class `xgb.Booster`.
@@ -14,15 +21,6 @@
 #' @param data Deprecated.
 #' @param label Deprecated.
 #' @param target Deprecated.
-#'
-#' @details
-#'
-#' This function works for both linear and tree models.
-#'
-#' For linear models, the importance is the absolute magnitude of linear coefficients.
-#' To obtain a meaningful ranking by importance for linear models, the features need to
-#' be on the same scale (which is also recommended when using L1 or L2 regularization).
-#'
 #' @return A `data.table` with the following columns:
 #'
 #' For a tree model:
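The note above about putting features on the same scale for linear models can be illustrated with a small sketch (assumes the xgboost package; `booster = "gblinear"` and `objective = "reg:squarederror"` are standard parameter values, and the built-in `mtcars` data is used purely for illustration):

```r
library(xgboost)

# Standardize predictors so that |coefficient| becomes a comparable
# importance measure across features.
x <- scale(as.matrix(mtcars[, -1]))
dtrain <- xgb.DMatrix(x, label = mtcars$mpg, nthread = 1)
bst_lin <- xgb.train(
  params = list(booster = "gblinear", objective = "reg:squarederror"),
  data = dtrain, nrounds = 20, nthread = 1
)

# For gblinear models the importance table is ranked by coefficient magnitude
xgb.importance(model = bst_lin)
```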
@@ -1,28 +1,27 @@
-#' Load xgboost model from binary file
+#' Load XGBoost model from binary file
 #'
-#' Load xgboost model from the binary model file.
+#' Load XGBoost model from binary model file.
 #'
-#' @param modelfile the name of the binary input file.
+#' @param modelfile The name of the binary input file.
 #'
 #' @details
-#' The input file is expected to contain a model saved in an xgboost model format
-#' using either \code{\link{xgb.save}} or \code{\link{xgb.cb.save.model}} in R, or using some
-#' appropriate methods from other xgboost interfaces. E.g., a model trained in Python and
-#' saved from there in xgboost format, could be loaded from R.
+#' The input file is expected to contain a model saved in an XGBoost model format
+#' using either [xgb.save()] in R, or using some
+#' appropriate methods from other XGBoost interfaces. E.g., a model trained in Python and
+#' saved from there in XGBoost format, could be loaded from R.
 #'
-#' Note: a model saved as an R-object, has to be loaded using corresponding R-methods,
-#' not \code{xgb.load}.
+#' Note: a model saved as an R object has to be loaded using corresponding R-methods,
+#' not by [xgb.load()].
 #'
 #' @return
-#' An object of \code{xgb.Booster} class.
+#' An object of `xgb.Booster` class.
 #'
-#' @seealso
-#' \code{\link{xgb.save}}
+#' @seealso [xgb.save()]
 #'
 #' @examples
 #' \dontshow{RhpcBLASctl::omp_set_num_threads(1)}
-#' data(agaricus.train, package='xgboost')
-#' data(agaricus.test, package='xgboost')
+#' data(agaricus.train, package = "xgboost")
+#' data(agaricus.test, package = "xgboost")
 #'
 #' ## Keep the number of threads to 1 for examples
 #' nthread <- 1
@@ -30,6 +29,7 @@
 #'
 #' train <- agaricus.train
 #' test <- agaricus.test
+#'
 #' bst <- xgb.train(
 #'   data = xgb.DMatrix(train$data, label = train$label),
 #'   max_depth = 2,
@@ -1,8 +1,8 @@
-#' Load serialised xgboost model from R's raw vector
+#' Load serialised XGBoost model from R's raw vector
 #'
-#' User can generate raw memory buffer by calling xgb.save.raw
+#' User can generate raw memory buffer by calling [xgb.save.raw()].
 #'
-#' @param buffer the buffer returned by xgb.save.raw
+#' @param buffer The buffer returned by [xgb.save.raw()].
 #' @export
 xgb.load.raw <- function(buffer) {
   cachelist <- list()
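A roundtrip sketch for the raw-vector serialization documented above (assumes a fitted booster `bst` and its training DMatrix `dtrain`, as in the other examples in this commit):

```r
library(xgboost)

raw_model <- xgb.save.raw(bst)    # serialize the booster to an R raw vector
stopifnot(is.raw(raw_model))
bst2 <- xgb.load.raw(raw_model)   # reconstruct the booster in memory

# The reloaded booster should predict identically to the original
all.equal(predict(bst, dtrain), predict(bst2, dtrain))
```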
@@ -2,13 +2,12 @@
 #'
 #' Parse a boosted tree model text dump into a `data.table` structure.
 #'
-#' @param model Object of class `xgb.Booster`. If it contains feature names (they can be set through
-#' \link{setinfo}), they will be used in the output from this function.
+#' @param model Object of class `xgb.Booster`. If it contains feature names (they can
+#' be set through [setinfo()]), they will be used in the output from this function.
 #' @param text Character vector previously generated by the function [xgb.dump()]
 #' (called with parameter `with_stats = TRUE`). `text` takes precedence over `model`.
-#' @param trees An integer vector of tree indices that should be used.
-#' The default (`NULL`) uses all trees.
-#' Useful, e.g., in multiclass classification to get only
+#' @param trees An integer vector of tree indices that should be used. The default
+#' (`NULL`) uses all trees. Useful, e.g., in multiclass classification to get only
 #' the trees of one class. *Important*: the tree index in XGBoost models
 #' is zero-based (e.g., use `trees = 0:4` for the first five trees).
 #' @param use_int_id A logical flag indicating whether nodes in columns "Yes", "No", and
@@ -195,7 +194,7 @@ xgb.model.dt.tree <- function(model = NULL, text = NULL,
   td[order(Tree, Node)]
 }

-# Avoid error messages during CRAN check.
+# Avoid notes during CRAN check.
 # The reason is that these variables are never declared
 # They are mainly column names inferred by Data.table...
 globalVariables(c("Tree", "Node", "ID", "Feature", "t", "isLeaf", ".SD", ".SDcols"))
@@ -4,7 +4,8 @@
 #' - `xgb.plot.deepness()` uses base R graphics, while
 #' - `xgb.ggplot.deepness()` uses "ggplot2".
 #'
-#' @param model Either an `xgb.Booster` model, or the "data.table" returned by [xgb.model.dt.tree()].
+#' @param model Either an `xgb.Booster` model, or the "data.table" returned
+#' by [xgb.model.dt.tree()].
 #' @param which Which distribution to plot (see details).
 #' @param plot Should the plot be shown? Default is `TRUE`.
 #' @param ... Other parameters passed to [graphics::barplot()] or [graphics::plot()].
@@ -4,6 +4,21 @@
 #' - `xgb.plot.importance()` uses base R graphics, while
 #' - `xgb.ggplot.importance()` uses "ggplot".
 #'
+#' @details
+#' The graph represents each feature as a horizontal bar of length proportional to the
+#' importance of a feature. Features are sorted by decreasing importance.
+#' It works for both "gblinear" and "gbtree" models.
+#'
+#' When `rel_to_first = FALSE`, the values would be plotted as in `importance_matrix`.
+#' For a "gbtree" model, that would mean being normalized to the total of 1
+#' ("what is feature's importance contribution relative to the whole model?").
+#' For linear models, `rel_to_first = FALSE` would show actual values of the coefficients.
+#' Setting `rel_to_first = TRUE` allows to see the picture from the perspective of
+#' "what is feature's importance contribution relative to the most important feature?"
+#'
+#' The "ggplot" backend performs 1-D clustering of the importance values,
+#' with bar colors corresponding to different clusters having similar importance values.
+#'
 #' @param importance_matrix A `data.table` as returned by [xgb.importance()].
 #' @param top_n Maximal number of top features to include into the plot.
 #' @param measure The name of importance measure to plot.
@@ -19,22 +34,6 @@
 #' @param ... Other parameters passed to [graphics::barplot()]
 #' (except `horiz`, `border`, `cex.names`, `names.arg`, and `las`).
 #' Only used in `xgb.plot.importance()`.
-#'
-#' @details
-#' The graph represents each feature as a horizontal bar of length proportional to the importance of a feature.
-#' Features are sorted by decreasing importance.
-#' It works for both "gblinear" and "gbtree" models.
-#'
-#' When `rel_to_first = FALSE`, the values would be plotted as in `importance_matrix`.
-#' For a "gbtree" model, that would mean being normalized to the total of 1
-#' ("what is feature's importance contribution relative to the whole model?").
-#' For linear models, `rel_to_first = FALSE` would show actual values of the coefficients.
-#' Setting `rel_to_first = TRUE` allows to see the picture from the perspective of
-#' "what is feature's importance contribution relative to the most important feature?"
-#'
-#' The "ggplot" backend performs 1-D clustering of the importance values,
-#' with bar colors corresponding to different clusters having similar importance values.
-#'
 #' @return
 #' The return value depends on the function:
 #' - `xgb.plot.importance()`: Invisibly, a "data.table" with `n_top` features sorted
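A minimal sketch of the `rel_to_first` behaviour documented above (assuming a fitted "gbtree" booster `bst` as in the other examples):

```r
library(xgboost)

imp <- xgb.importance(model = bst)

# Gain values plotted as-is (normalized to a total of 1 for a "gbtree" model)
xgb.plot.importance(imp, rel_to_first = FALSE)

# Same bars rescaled so the most important feature equals 1
xgb.plot.importance(imp, rel_to_first = TRUE, xlab = "Relative importance")
```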
@@ -2,12 +2,7 @@
 #'
 #' Visualization of the ensemble of trees as a single collective unit.
 #'
-#' @inheritParams xgb.plot.tree
-#' @param features_keep Number of features to keep in each position of the multi trees,
-#' by default 5.
-#'
 #' @details
-#'
 #' This function tries to capture the complexity of a gradient boosted tree model
 #' in a cohesive way by compressing an ensemble of trees into a single tree-graph representation.
 #' The goal is to improve the interpretability of a model generally seen as black box.
@@ -25,6 +20,9 @@
 #' This function is inspired by this blog post:
 #' <https://wellecks.wordpress.com/2015/02/21/peering-into-the-black-box-visualizing-lambdamart/>
 #'
+#' @inheritParams xgb.plot.tree
+#' @param features_keep Number of features to keep in each position of the multi trees,
+#' by default 5.
 #' @inherit xgb.plot.tree return
 #'
 #' @examples
@@ -5,11 +5,10 @@
 #' @param data The data to explain as a `matrix` or `dgCMatrix`.
 #' @param shap_contrib Matrix of SHAP contributions of `data`.
 #' The default (`NULL`) computes it from `model` and `data`.
-#' @param features Vector of column indices or feature names to plot.
-#' When `NULL` (default), the `top_n` most important features are selected
-#' by [xgb.importance()].
+#' @param features Vector of column indices or feature names to plot. When `NULL`
+#' (default), the `top_n` most important features are selected by [xgb.importance()].
 #' @param top_n How many of the most important features (<= 100) should be selected?
-#' By default 1 for SHAP dependence and 10 for SHAP summary).
+#' By default 1 for SHAP dependence and 10 for SHAP summary.
 #' Only used when `features = NULL`.
 #' @param model An `xgb.Booster` model. Only required when `shap_contrib = NULL` or
 #' `features = NULL`.
@@ -120,6 +119,7 @@
 #' )
 #' trees0 <- seq(from = 0, by = nclass, length.out = nrounds)
 #' col <- rgb(0, 0, 1, 0.5)
+#'
 #' xgb.plot.shap(
 #'   x,
 #'   model = mbst,
@@ -2,36 +2,7 @@
 #'
 #' Read a tree model text dump and plot the model.
 #'
-#' @param model Object of class `xgb.Booster`. If it contains feature names (they can be set through
-#' \link{setinfo}), they will be used in the output from this function.
-#' @param trees An integer vector of tree indices that should be used.
-#' The default (`NULL`) uses all trees.
-#' Useful, e.g., in multiclass classification to get only
-#' the trees of one class. *Important*: the tree index in XGBoost models
-#' is zero-based (e.g., use `trees = 0:2` for the first three trees).
-#' @param plot_width,plot_height Width and height of the graph in pixels.
-#' The values are passed to [DiagrammeR::render_graph()].
-#' @param render Should the graph be rendered or not? The default is `TRUE`.
-#' @param show_node_id a logical flag for whether to show node id's in the graph.
-#' @param style Style to use for the plot. Options are:\itemize{
-#' \item `"xgboost"`: will use the plot style defined in the core XGBoost library,
-#' which is shared between different interfaces through the 'dot' format. This
-#' style was not available before version 2.1.0 in R. It always plots the trees
-#' vertically (from top to bottom).
-#' \item `"R"`: will use the style defined from XGBoost's R interface, which predates
-#' the introducition of the standardized style from the core library. It might plot
-#' the trees horizontally (from left to right).
-#' }
-#'
-#' Note that `style="xgboost"` is only supported when all of the following conditions are met:\itemize{
-#' \item Only a single tree is being plotted.
-#' \item Node IDs are not added to the graph.
-#' \item The graph is being returned as `htmlwidget` (`render=TRUE`).
-#' }
-#' @param ... currently not used.
-#'
 #' @details
-#'
 #' When using `style="xgboost"`, the content of each node is visualized as follows:
 #' - For non-terminal nodes, it will display the split condition (number or name if
 #' available, and the condition that would decide to which node to go next).
@@ -56,6 +27,31 @@
 #'
 #' This function uses [GraphViz](https://www.graphviz.org/) as DiagrammeR backend.
 #'
+#' @param model Object of class `xgb.Booster`. If it contains feature names (they can be set through
+#' \link{setinfo}), they will be used in the output from this function.
+#' @param trees An integer vector of tree indices that should be used.
+#' The default (`NULL`) uses all trees.
+#' Useful, e.g., in multiclass classification to get only
+#' the trees of one class. *Important*: the tree index in XGBoost models
+#' is zero-based (e.g., use `trees = 0:2` for the first three trees).
+#' @param plot_width,plot_height Width and height of the graph in pixels.
+#' The values are passed to `DiagrammeR::render_graph()`.
+#' @param render Should the graph be rendered or not? The default is `TRUE`.
+#' @param show_node_id a logical flag for whether to show node id's in the graph.
+#' @param style Style to use for the plot:
+#' - `"xgboost"`: will use the plot style defined in the core XGBoost library,
+#' which is shared between different interfaces through the 'dot' format. This
+#' style was not available before version 2.1.0 in R. It always plots the trees
+#' vertically (from top to bottom).
+#' - `"R"`: will use the style defined from XGBoost's R interface, which predates
+#' the introduction of the standardized style from the core library. It might plot
+#' the trees horizontally (from left to right).
+#'
+#' Note that `style="xgboost"` is only supported when all of the following conditions are met:
+#' - Only a single tree is being plotted.
+#' - Node IDs are not added to the graph.
+#' - The graph is being returned as `htmlwidget` (`render=TRUE`).
+#' @param ... Currently not used.
 #' @return
 #' The value depends on the `render` parameter:
 #' - If `render = TRUE` (default): Rendered graph object which is an htmlwidget of
@@ -63,7 +59,7 @@
 #' running from the command line.
 #' - If `render = FALSE`: Graph object which is of DiagrammeR's class `dgr_graph`.
 #' This could be useful if one wants to modify some of the graph attributes
-#' before rendering the graph with [DiagrammeR::render_graph()].
+#' before rendering the graph with `DiagrammeR::render_graph()`.
 #'
 #' @examples
 #' data(agaricus.train, package = "xgboost")
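The `style` options documented above can be sketched as follows (assumes a fitted booster `bst` and that DiagrammeR is installed; recall the documented constraint that `style = "xgboost"` needs a single tree, no node IDs, and `render = TRUE`):

```r
library(xgboost)

# Core-library style: one tree, plotted top-to-bottom as an htmlwidget
xgb.plot.tree(model = bst, trees = 0, style = "xgboost")

# Legacy R-interface style: may plot several trees, possibly left-to-right
xgb.plot.tree(model = bst, trees = 0:1, style = "R")
```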
@@ -1,43 +1,39 @@
-#' Save xgboost model to binary file
+#' Save XGBoost model to binary file
 #'
-#' Save xgboost model to a file in binary or JSON format.
+#' Save XGBoost model to a file in binary or JSON format.
 #'
 #' @param model Model object of \code{xgb.Booster} class.
-#' @param fname Name of the file to write.
-#'
-#' Note that the extension of this file name determined the serialization format to use:\itemize{
-#' \item Extension ".ubj" will use the universal binary JSON format (recommended).
+#' @param fname Name of the file to write. Its extension determines the serialization format:
+#' - ".ubj": Use the universal binary JSON format (recommended).
 #' This format uses binary types for e.g. floating point numbers, thereby preventing any loss
 #' of precision when converting to a human-readable JSON text or similar.
-#' \item Extension ".json" will use plain JSON, which is a human-readable format.
-#' \item Extension ".deprecated" will use a \bold{deprecated} binary format. This format will
+#' - ".json": Use plain JSON, which is a human-readable format.
+#' - ".deprecated": Use **deprecated** binary format. This format will
 #' not be able to save attributes introduced after v1 of XGBoost, such as the "best_iteration"
 #' attribute that boosters might keep, nor feature names or user-specified attributes.
-#' \item If the format is not specified by passing one of the file extensions above, will
+#' - If the format is not specified by passing one of the file extensions above, will
 #' default to UBJ.
-#' }
 #'
 #' @details
-#' This methods allows to save a model in an xgboost-internal binary or text format which is universal
-#' among the various xgboost interfaces. In R, the saved model file could be read-in later
-#' using either the \code{\link{xgb.load}} function or the \code{xgb_model} parameter
-#' of \code{\link{xgb.train}}.
 #'
-#' Note: a model can also be saved as an R-object (e.g., by using \code{\link[base]{readRDS}}
-#' or \code{\link[base]{save}}). However, it would then only be compatible with R, and
-#' corresponding R-methods would need to be used to load it. Moreover, persisting the model with
-#' \code{\link[base]{readRDS}} or \code{\link[base]{save}}) might cause compatibility problems in
-#' future versions of XGBoost. Consult \code{\link{a-compatibility-note-for-saveRDS-save}} to learn
-#' how to persist models in a future-proof way, i.e. to make the model accessible in future
+#' This method allows saving a model in an XGBoost-internal binary or text format which is universal
+#' among the various xgboost interfaces. In R, the saved model file could be read later
+#' using either the [xgb.load()] function or the `xgb_model` parameter of [xgb.train()].
+#'
+#' Note: a model can also be saved as an R object (e.g., by using [readRDS()]
+#' or [save()]). However, it would then only be compatible with R, and
+#' corresponding R methods would need to be used to load it. Moreover, persisting the model with
+#' [readRDS()] or [save()] might cause compatibility problems in
+#' future versions of XGBoost. Consult [a-compatibility-note-for-saveRDS-save] to learn
+#' how to persist models in a future-proof way, i.e., to make the model accessible in future
 #' releases of XGBoost.
 #'
-#' @seealso
-#' \code{\link{xgb.load}}
+#' @seealso [xgb.load()]
 #'
 #' @examples
 #' \dontshow{RhpcBLASctl::omp_set_num_threads(1)}
-#' data(agaricus.train, package='xgboost')
-#' data(agaricus.test, package='xgboost')
+#' data(agaricus.train, package = "xgboost")
+#' data(agaricus.test, package = "xgboost")
 #'
 #' ## Keep the number of threads to 1 for examples
 #' nthread <- 1
@@ -45,6 +41,7 @@
 #'
 #' train <- agaricus.train
 #' test <- agaricus.test
+#'
 #' bst <- xgb.train(
 #'   data = xgb.DMatrix(train$data, label = train$label),
 #'   max_depth = 2,
@@ -53,6 +50,7 @@
 #'   nrounds = 2,
 #'   objective = "binary:logistic"
 #' )
+#'
 #' fname <- file.path(tempdir(), "xgb.ubj")
 #' xgb.save(bst, fname)
|
||||||
#' bst <- xgb.load(fname)
|
#' bst <- xgb.load(fname)
|
||||||
|
|||||||
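The extension-driven format selection documented above can be exercised directly. A minimal sketch, assuming the `xgboost` R package (2.1 or later) and its bundled `agaricus.train` dataset are available:

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
bst <- xgb.train(
  data = xgb.DMatrix(agaricus.train$data, label = agaricus.train$label),
  max_depth = 2,
  nthread = 1,
  nrounds = 2,
  objective = "binary:logistic"
)

# The file extension picks the serialization format:
# ".json" writes plain JSON text, ".ubj" writes Universal Binary JSON.
fjson <- file.path(tempdir(), "model.json")
fubj <- file.path(tempdir(), "model.ubj")
xgb.save(bst, fjson)
xgb.save(bst, fubj)

# Both formats round-trip back into an equivalent booster.
b1 <- xgb.load(fjson)
b2 <- xgb.load(fubj)
```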
@ -1,29 +1,34 @@
-#' Save xgboost model to R's raw vector,
-#' user can call xgb.load.raw to load the model back from raw vector
+#' Save XGBoost model to R's raw vector
 #'
-#' Save xgboost model from xgboost or xgb.train
+#' Save XGBoost model from [xgboost()] or [xgb.train()].
+#' Call [xgb.load.raw()] to load the model back from raw vector.
 #'
-#' @param model the model object.
-#' @param raw_format The format for encoding the booster. Available options are
-#' \itemize{
-#' \item \code{json}: Encode the booster into JSON text document.
-#' \item \code{ubj}: Encode the booster into Universal Binary JSON.
-#' \item \code{deprecated}: Encode the booster into old customized binary format.
-#' }
+#' @param model The model object.
+#' @param raw_format The format for encoding the booster:
+#' - "json": Encode the booster into JSON text document.
+#' - "ubj": Encode the booster into Universal Binary JSON.
+#' - "deprecated": Encode the booster into old customized binary format.
 #'
 #' @examples
 #' \dontshow{RhpcBLASctl::omp_set_num_threads(1)}
-#' data(agaricus.train, package='xgboost')
-#' data(agaricus.test, package='xgboost')
+#' data(agaricus.train, package = "xgboost")
+#' data(agaricus.test, package = "xgboost")
 #'
-#' ## Keep the number of threads to 2 for examples
-#' nthread <- 2
+#' ## Keep the number of threads to 1 for examples
+#' nthread <- 1
 #' data.table::setDTthreads(nthread)
 #'
 #' train <- agaricus.train
 #' test <- agaricus.test
-#' bst <- xgb.train(data = xgb.DMatrix(train$data, label = train$label), max_depth = 2,
-#'                  eta = 1, nthread = nthread, nrounds = 2,objective = "binary:logistic")
+#'
+#' bst <- xgb.train(
+#'   data = xgb.DMatrix(train$data, label = train$label),
+#'   max_depth = 2,
+#'   eta = 1,
+#'   nthread = nthread,
+#'   nrounds = 2,
+#'   objective = "binary:logistic"
+#' )
 #'
 #' raw <- xgb.save.raw(bst)
 #' bst <- xgb.load.raw(raw)
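The `raw_format` options listed above can be compared side by side. A sketch, assuming the same `agaricus` setup as the documentation's own example:

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
bst <- xgb.train(
  data = xgb.DMatrix(agaricus.train$data, label = agaricus.train$label),
  max_depth = 2,
  nthread = 1,
  nrounds = 2,
  objective = "binary:logistic"
)

# Encode the same booster in two of the supported formats.
raw_json <- xgb.save.raw(bst, raw_format = "json")
raw_ubj <- xgb.save.raw(bst, raw_format = "ubj")

# Either payload is a plain raw vector, so it can be stored inside
# another R object and decoded later with xgb.load.raw().
bst2 <- xgb.load.raw(raw_ubj)
```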
@ -944,6 +944,7 @@ xgboost <- function(
   return(model)
 }
 
+#' @method print xgboost
 #' @export
 print.xgboost <- function(x, ...) {
   cat("XGBoost model object\n")
@ -5,66 +5,77 @@
 \title{Model Serialization and Compatibility}
 \description{
 When it comes to serializing XGBoost models, it's possible to use R serializers such as
-\link{save} or \link{saveRDS} to serialize an XGBoost R model, but XGBoost also provides
+\code{\link[=save]{save()}} or \code{\link[=saveRDS]{saveRDS()}} to serialize an XGBoost R model, but XGBoost also provides
 its own serializers with better compatibility guarantees, which allow loading
 said models in other language bindings of XGBoost.
 
-Note that an \code{xgb.Booster} object, outside of its core components, might also keep:\itemize{
-\item Additional model configuration (accessible through \link{xgb.config}),
-which includes model fitting parameters like \code{max_depth} and runtime parameters like \code{nthread}.
+Note that an \code{xgb.Booster} object, outside of its core components, might also keep:
+\itemize{
+\item Additional model configuration (accessible through \code{\link[=xgb.config]{xgb.config()}}), which includes
+model fitting parameters like \code{max_depth} and runtime parameters like \code{nthread}.
 These are not necessarily useful for prediction/importance/plotting.
-\item Additional R-specific attributes - e.g. results of callbacks, such as evaluation logs,
-which are kept as a \code{data.table} object, accessible through \code{attributes(model)$evaluation_log}
-if present.
+\item Additional R-specific attributes - e.g. results of callbacks, such as evaluation logs,
+which are kept as a \code{data.table} object, accessible through
+\code{attributes(model)$evaluation_log} if present.
 }
 
 The first one (configurations) does not have the same compatibility guarantees as
-the model itself, including attributes that are set and accessed through \link{xgb.attributes} - that is, such configuration
-might be lost after loading the booster in a different XGBoost version, regardless of the
-serializer that was used. These are saved when using \link{saveRDS}, but will be discarded
-if loaded into an incompatible XGBoost version. They are not saved when using XGBoost's
-serializers from its public interface including \link{xgb.save} and \link{xgb.save.raw}.
+the model itself, including attributes that are set and accessed through
+\code{\link[=xgb.attributes]{xgb.attributes()}} - that is, such configuration might be lost after loading the
+booster in a different XGBoost version, regardless of the serializer that was used.
+These are saved when using \code{\link[=saveRDS]{saveRDS()}}, but will be discarded if loaded into an
+incompatible XGBoost version. They are not saved when using XGBoost's
+serializers from its public interface including \code{\link[=xgb.save]{xgb.save()}} and \code{\link[=xgb.save.raw]{xgb.save.raw()}}.
 
-The second ones (R attributes) are not part of the standard XGBoost model structure, and thus are
-not saved when using XGBoost's own serializers. These attributes are only used for informational
-purposes, such as keeping track of evaluation metrics as the model was fit, or saving the R
-call that produced the model, but are otherwise not used for prediction / importance / plotting / etc.
+The second ones (R attributes) are not part of the standard XGBoost model structure,
+and thus are not saved when using XGBoost's own serializers. These attributes are
+only used for informational purposes, such as keeping track of evaluation metrics as
+the model was fit, or saving the R call that produced the model, but are otherwise
+not used for prediction / importance / plotting / etc.
 These R attributes are only preserved when using R's serializers.
 
-Note that XGBoost models in R starting from version \verb{2.1.0} and onwards, and XGBoost models
-before version \verb{2.1.0}; have a very different R object structure and are incompatible with
-each other. Hence, models that were saved with R serializers live \code{saveRDS} or \code{save} before
-version \verb{2.1.0} will not work with latter \code{xgboost} versions and vice versa. Be aware that
-the structure of R model objects could in theory change again in the future, so XGBoost's serializers
+Note that XGBoost models in R starting from version \verb{2.1.0} and onwards, and
+XGBoost models before version \verb{2.1.0}, have a very different R object structure and
+are incompatible with each other. Hence, models that were saved with R serializers
+like \code{\link[=saveRDS]{saveRDS()}} or \code{\link[=save]{save()}} before version \verb{2.1.0} will not work with later
+\code{xgboost} versions and vice versa. Be aware that the structure of R model objects
+could in theory change again in the future, so XGBoost's serializers
 should be preferred for long-term storage.
 
-Furthermore, note that using the package \code{qs} for serialization will require version 0.26 or
-higher of said package, and will have the same compatibility restrictions as R serializers.
+Furthermore, note that using the package \code{qs} for serialization will require
+version 0.26 or higher of said package, and will have the same compatibility
+restrictions as R serializers.
 }
 \details{
-Use \code{\link{xgb.save}} to save the XGBoost model as a stand-alone file. You may opt into
+Use \code{\link[=xgb.save]{xgb.save()}} to save the XGBoost model as a stand-alone file. You may opt into
 the JSON format by specifying the JSON extension. To read the model back, use
-\code{\link{xgb.load}}.
+\code{\link[=xgb.load]{xgb.load()}}.
 
-Use \code{\link{xgb.save.raw}} to save the XGBoost model as a sequence (vector) of raw bytes
+Use \code{\link[=xgb.save.raw]{xgb.save.raw()}} to save the XGBoost model as a sequence (vector) of raw bytes
 in a future-proof manner. Future releases of XGBoost will be able to read the raw bytes and
-re-construct the corresponding model. To read the model back, use \code{\link{xgb.load.raw}}.
-The \code{\link{xgb.save.raw}} function is useful if you'd like to persist the XGBoost model
+re-construct the corresponding model. To read the model back, use \code{\link[=xgb.load.raw]{xgb.load.raw()}}.
+The \code{\link[=xgb.save.raw]{xgb.save.raw()}} function is useful if you would like to persist the XGBoost model
 as part of another R object.
 
-Use \link{saveRDS} if you require the R-specific attributes that a booster might have, such
+Use \code{\link[=saveRDS]{saveRDS()}} if you require the R-specific attributes that a booster might have, such
 as evaluation logs, but note that future compatibility of such objects is outside XGBoost's
 control as it relies on R's serialization format (see e.g. the details section in
-\link{serialize} and \link{save} from base R).
+\link{serialize} and \code{\link[=save]{save()}} from base R).
 
 For more details and explanation about model persistence and archival, consult the page
 \url{https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html}.
 }
 \examples{
-data(agaricus.train, package='xgboost')
-bst <- xgb.train(data = xgb.DMatrix(agaricus.train$data, label = agaricus.train$label),
-                 max_depth = 2, eta = 1, nthread = 2, nrounds = 2,
-                 objective = "binary:logistic")
+data(agaricus.train, package = "xgboost")
+bst <- xgb.train(
  data = xgb.DMatrix(agaricus.train$data, label = agaricus.train$label),
  max_depth = 2,
  eta = 1,
  nthread = 2,
  nrounds = 2,
  objective = "binary:logistic"
)
 
 # Save as a stand-alone file; load it with xgb.load()
 fname <- file.path(tempdir(), "xgb_model.ubj")
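The split described above, with R attributes surviving only R's serializers, can be demonstrated concretely. A sketch assuming xgboost 2.1 or later; the attribute name `"note"` is purely illustrative:

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
bst <- xgb.train(
  data = xgb.DMatrix(agaricus.train$data, label = agaricus.train$label),
  max_depth = 2,
  nthread = 1,
  nrounds = 2,
  objective = "binary:logistic"
)
attr(bst, "note") <- "fit on agaricus"  # an R-only attribute (illustrative name)

# R's serializer keeps R attributes...
rds <- file.path(tempdir(), "bst.rds")
saveRDS(bst, rds)
restored <- readRDS(rds)
stopifnot(identical(attr(restored, "note"), "fit on agaricus"))

# ...while XGBoost's own serializer stores only the core model,
# so the attribute is gone after a save/load round trip.
ubj <- file.path(tempdir(), "bst.ubj")
xgb.save(bst, ubj)
reloaded <- xgb.load(ubj)
stopifnot(is.null(attr(reloaded, "note")))
```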
@ -12,11 +12,12 @@
 \item{...}{Not used.}
 }
 \value{
-The extracted coefficients:\itemize{
-\item If there's only one coefficient per column in the data, will be returned as a
+The extracted coefficients:
+\itemize{
+\item If there is only one coefficient per column in the data, will be returned as a
 vector, potentially containing the feature names if available, with the intercept
 as first column.
-\item If there's more than one coefficient per column in the data (e.g. when using
+\item If there is more than one coefficient per column in the data (e.g. when using
 \code{objective="multi:softmax"}), will be returned as a matrix with dimensions equal
 to \verb{[num_features, num_cols]}, with the intercepts as first row. Note that the column
 (classes in multi-class classification) dimension will not be named.
@ -33,16 +34,19 @@ coefficients as used by \link{predict.xgb.Booster}.
 }
 \description{
 Extracts the coefficients from a 'gblinear' booster object,
-as produced by \code{xgb.train} when using parameter \code{booster="gblinear"}.
+as produced by \code{\link[=xgb.train]{xgb.train()}} when using parameter \code{booster="gblinear"}.
 
 Note: this function will error out if passing a booster model
 which is not of "gblinear" type.
 }
 \examples{
 library(xgboost)
 
 data(mtcars)
 
 y <- mtcars[, 1]
 x <- as.matrix(mtcars[, -1])
 
 dm <- xgb.DMatrix(data = x, label = y, nthread = 1)
 params <- list(booster = "gblinear", nthread = 1)
 model <- xgb.train(data = dm, params = params, nrounds = 2)
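The example above stops just short of calling the function; the single-output return shape described in `\value` can be sketched as follows (a regression, so one coefficient per column):

```r
library(xgboost)

data(mtcars)
y <- mtcars[, 1]
x <- as.matrix(mtcars[, -1])

dm <- xgb.DMatrix(data = x, label = y, nthread = 1)
params <- list(booster = "gblinear", nthread = 1)
model <- xgb.train(data = dm, params = params, nrounds = 2)

# Single-output model: coef() returns a vector with the intercept
# followed by one coefficient per feature column.
co <- coef(model)
stopifnot(length(co) == ncol(x) + 1)
```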
@ -28,35 +28,36 @@
 \item{newdata}{Takes \code{data.frame}, \code{matrix}, \code{dgCMatrix}, \code{dgRMatrix}, \code{dsparseVector},
 local data file, or \code{xgb.DMatrix}.
 
-\if{html}{\out{<div class="sourceCode">}}\preformatted{ For single-row predictions on sparse data, it's recommended to use CSR format. If passing
+For single-row predictions on sparse data, it is recommended to use CSR format. If passing
 a sparse vector, it will take it as a row vector.
 
 Note that, for repeated predictions on the same data, one might want to create a DMatrix to
 pass here instead of passing R types like matrices or data frames, as predictions will be
 faster on DMatrix.
 
-If `newdata` is a `data.frame`, be aware that:\\itemize\{
-\\item Columns will be converted to numeric if they aren't already, which could potentially make
-the operation slower than in an equivalent `matrix` object.
-\\item The order of the columns must match with that of the data from which the model was fitted
+If \code{newdata} is a \code{data.frame}, be aware that:
+\itemize{
+\item Columns will be converted to numeric if they aren't already, which could potentially make
+the operation slower than in an equivalent \code{matrix} object.
+\item The order of the columns must match with that of the data from which the model was fitted
 (i.e. columns will not be referenced by their names, just by their order in the data).
-\\item If the model was fitted to data with categorical columns, these columns must be of
-`factor` type here, and must use the same encoding (i.e. have the same levels).
-\\item If `newdata` contains any `factor` columns, they will be converted to base-0
-encoding (same as during DMatrix creation) - hence, one should not pass a `factor`
+\item If the model was fitted to data with categorical columns, these columns must be of
+\code{factor} type here, and must use the same encoding (i.e. have the same levels).
+\item If \code{newdata} contains any \code{factor} columns, they will be converted to base-0
+encoding (same as during DMatrix creation) - hence, one should not pass a \code{factor}
 under a column which during training had a different type.
-\}
-}\if{html}{\out{</div>}}}
+}}
 
-\item{missing}{Float value that represents missing values in data (e.g., 0 or some other extreme value).
+\item{missing}{Float value that represents missing values in data
+(e.g., 0 or some other extreme value).
 
-\if{html}{\out{<div class="sourceCode">}}\preformatted{ This parameter is not used when `newdata` is an `xgb.DMatrix` - in such cases, should pass
-this as an argument to the DMatrix constructor instead.
-}\if{html}{\out{</div>}}}
+This parameter is not used when \code{newdata} is an \code{xgb.DMatrix} - in such cases,
+should pass this as an argument to the DMatrix constructor instead.}
 
-\item{outputmargin}{Whether the prediction should be returned in the form of original untransformed
-sum of predictions from boosting iterations' results. E.g., setting \code{outputmargin=TRUE} for
-logistic regression would return log-odds instead of probabilities.}
+\item{outputmargin}{Whether the prediction should be returned in the form of
+original untransformed sum of predictions from boosting iterations' results.
+E.g., setting \code{outputmargin = TRUE} for logistic regression would return log-odds
+instead of probabilities.}
 
 \item{predleaf}{Whether to predict per-tree leaf indices.}
 
@ -73,60 +74,55 @@ training predicting will perform dropout.}
 a two-dimensional vector with the start and end numbers in the sequence (same format as R's \code{seq} - i.e.
 base-1 indexing, and inclusive of both ends).
 
-\if{html}{\out{<div class="sourceCode">}}\preformatted{ For example, passing `c(1,20)` will predict using the first twenty iterations, while passing `c(1,1)` will
+For example, passing \code{c(1,20)} will predict using the first twenty iterations, while passing \code{c(1,1)} will
 predict using only the first one.
 
-If passing `NULL`, will either stop at the best iteration if the model used early stopping, or use all
+If passing \code{NULL}, will either stop at the best iteration if the model used early stopping, or use all
 of the iterations (rounds) otherwise.
 
-If passing "all", will use all of the rounds regardless of whether the model had early stopping or not.
-}\if{html}{\out{</div>}}}
+If passing "all", will use all of the rounds regardless of whether the model had early stopping or not.}
 
 \item{strict_shape}{Whether to always return an array with the same dimensions for the given prediction mode
 regardless of the model type - meaning that, for example, both a multi-class and a binary classification
 model would generate output arrays with the same number of dimensions, with the 'class' dimension having
 size equal to '1' for the binary model.
 
-\if{html}{\out{<div class="sourceCode">}}\preformatted{ If passing `FALSE` (the default), dimensions will be simplified according to the model type, so that a
+If passing \code{FALSE} (the default), dimensions will be simplified according to the model type, so that a
 binary classification model for example would not have a redundant dimension for 'class'.
 
-See documentation for the return type for the exact shape of the output arrays for each prediction mode.
-}\if{html}{\out{</div>}}}
+See documentation for the return type for the exact shape of the output arrays for each prediction mode.}
 
 \item{avoid_transpose}{Whether to output the resulting predictions in the same memory layout in which they
 are generated by the core XGBoost library, without transposing them to match the expected output shape.
 
-\if{html}{\out{<div class="sourceCode">}}\preformatted{ Internally, XGBoost uses row-major order for the predictions it generates, while R arrays use column-major
+Internally, XGBoost uses row-major order for the predictions it generates, while R arrays use column-major
 order, hence the result needs to be transposed in order to have the expected shape when represented as
 an R array or matrix, which might be a slow operation.
 
-If passing `TRUE`, then the result will have dimensions in reverse order - for example, rows
-will be the last dimensions instead of the first dimension.
-}\if{html}{\out{</div>}}}
+If passing \code{TRUE}, then the result will have dimensions in reverse order - for example, rows
+will be the last dimension instead of the first dimension.}
 
-\item{validate_features}{When \code{TRUE}, validate that the Booster's and newdata's feature_names
-match (only applicable when both \code{object} and \code{newdata} have feature names).
+\item{validate_features}{When \code{TRUE}, validate that the Booster's and newdata's
+feature_names match (only applicable when both \code{object} and \code{newdata} have feature names).
 
-\if{html}{\out{<div class="sourceCode">}}\preformatted{ If the column names differ and `newdata` is not an `xgb.DMatrix`, will try to reorder
-the columns in `newdata` to match with the booster's.
+If the column names differ and \code{newdata} is not an \code{xgb.DMatrix}, will try to reorder
+the columns in \code{newdata} to match with the booster's.
 
-If the booster has feature types and `newdata` is either an `xgb.DMatrix` or `data.frame`,
-will additionally verify that categorical columns are of the correct type in `newdata`,
-throwing an error if they do not match.
+If the booster has feature types and \code{newdata} is either an \code{xgb.DMatrix} or
+\code{data.frame}, will additionally verify that categorical columns are of the
+correct type in \code{newdata}, throwing an error if they do not match.
 
-If passing `FALSE`, it is assumed that the feature names and types are the same,
+If passing \code{FALSE}, it is assumed that the feature names and types are the same,
 and come in the same order as in the training data.
 
 Note that this check might add some sizable latency to the predictions, so it's
-recommended to disable it for performance-sensitive applications.
-}\if{html}{\out{</div>}}}
+recommended to disable it for performance-sensitive applications.}
 
 \item{base_margin}{Base margin used for boosting from existing model.
 
-\if{html}{\out{<div class="sourceCode">}}\preformatted{ Note that, if `newdata` is an `xgb.DMatrix` object, this argument will
+Note that, if \code{newdata} is an \code{xgb.DMatrix} object, this argument will
 be ignored as it needs to be added to the DMatrix instead (e.g. by passing it as
-an argument in its constructor, or by calling \link{setinfo.xgb.DMatrix}).
-}\if{html}{\out{</div>}}}
+an argument in its constructor, or by calling \code{\link[=setinfo.xgb.DMatrix]{setinfo.xgb.DMatrix()}}).}
 
 \item{...}{Not used.}
 }
@ -173,7 +169,7 @@ example, for \code{predinteraction}, they will be \verb{[nfeats+1, nfeats+1, ngr
 instead of \verb{[nrows, ngroups, nfeats+1, nfeats+1]}.
 }
 \description{
-Predict values on data based on xgboost model.
+Predict values on data based on XGBoost model.
 }
 \details{
 Note that \code{iterationrange} would currently do nothing for predictions from "gblinear",
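The `iterationrange` and `strict_shape` parameters described above can be sketched together; this assumes the `agaricus` data shipped with the package:

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")
bst <- xgb.train(
  data = xgb.DMatrix(agaricus.train$data, label = agaricus.train$label),
  max_depth = 2,
  nthread = 1,
  nrounds = 4,
  objective = "binary:logistic"
)

# Use only the first two boosting rounds (base-1, inclusive ends).
p2 <- predict(bst, agaricus.test$data, iterationrange = c(1, 2))

# Default: all rounds (or the best iteration under early stopping).
pall <- predict(bst, agaricus.test$data)

# strict_shape keeps an explicit 'class' dimension even for a
# binary model, instead of simplifying down to a plain vector.
ps <- predict(bst, agaricus.test$data, strict_shape = TRUE)
```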
@ -33,5 +33,4 @@ bst <- xgb.train(
 attr(bst, "myattr") <- "memo"
 
 print(bst)
-
 }
@ -13,8 +13,8 @@
 }
 \description{
 Returns the feature / variable / column names from a fitted
-booster object, which are set automatically during the call to \link{xgb.train}
-from the DMatrix names, or which can be set manually through \link{setinfo}.
+booster object, which are set automatically during the call to \code{\link[=xgb.train]{xgb.train()}}
+from the DMatrix names, or which can be set manually through \code{\link[=setinfo]{setinfo()}}.
 
 If the object doesn't have feature names, will return \code{NULL}.
 
@ -53,12 +53,12 @@ Return values of \code{NULL} will be interpreted as \code{FALSE}.}
|
|||||||
\item{f_after_training}{A function that will be executed after training is finished.
|
\item{f_after_training}{A function that will be executed after training is finished.
|
||||||
|
|
||||||
This function can optionally output something non-NULL, which will become part of the R
|
This function can optionally output something non-NULL, which will become part of the R
|
||||||
attributes of the booster (assuming one passes \code{keep_extra_attributes=TRUE} to \link{xgb.train})
|
attributes of the booster (assuming one passes \code{keep_extra_attributes=TRUE} to \code{\link[=xgb.train]{xgb.train()}})
|
||||||
under the name supplied for parameter \code{cb_name} imn the case of \link{xgb.train}; or a part
|
under the name supplied for parameter \code{cb_name} imn the case of \code{\link[=xgb.train]{xgb.train()}}; or a part
|
||||||
of the named elements in the result of \link{xgb.cv}.}
|
of the named elements in the result of \code{\link[=xgb.cv]{xgb.cv()}}.}
|
||||||
}
|
}
|
||||||
\value{
|
\value{
|
||||||
An \code{xgb.Callback} object, which can be passed to \link{xgb.train} or \link{xgb.cv}.
|
An \code{xgb.Callback} object, which can be passed to \code{\link[=xgb.train]{xgb.train()}} or \code{\link[=xgb.cv]{xgb.cv()}}.
|
||||||
}
|
}
|
||||||
\description{
|
\description{
|
||||||
Constructor for defining the structure of callback functions that can be executed
|
Constructor for defining the structure of callback functions that can be executed
|
||||||
@@ -66,8 +66,8 @@ at different stages of model training (before / after training, before / after e
 iteration).
 }
 \details{
-Arguments that will be passed to the supplied functions are as follows:\itemize{
+Arguments that will be passed to the supplied functions are as follows:
+\itemize{
 \item env The same environment that is passed under argument \code{env}.

 It may be modified by the functions in order to e.g. keep track of what happens
@@ -75,11 +75,10 @@ across iterations or similar.

 This environment is only used by the functions supplied to the callback, and will
 not be kept after the model fitting function terminates (see parameter \code{f_after_training}).
+\item model The booster object when using \code{\link[=xgb.train]{xgb.train()}}, or the folds when using \code{\link[=xgb.cv]{xgb.cv()}}.

-\item model The booster object when using \link{xgb.train}, or the folds when using
-\link{xgb.cv}.
-
-For \link{xgb.cv}, folds are a list with a structure as follows:\itemize{
+For \code{\link[=xgb.cv]{xgb.cv()}}, folds are a list with a structure as follows:
+\itemize{
 \item \code{dtrain}: The training data for the fold (as an \code{xgb.DMatrix} object).
 \item \code{bst}: The \code{xgb.Booster} object for the fold.
 \item \code{evals}: A list containing two DMatrices, with names \code{train} and \code{test}
@@ -88,64 +87,55 @@ For \link{xgb.cv}, folds are a list with a structure as follows:\itemize{
 from which the \code{test} entry in \code{evals} was obtained.
 }

-This object should \bold{not} be in-place modified in ways that conflict with the
+This object should \strong{not} be in-place modified in ways that conflict with the
 training (e.g. resetting the parameters for a training update in a way that resets
 the number of rounds to zero in order to overwrite rounds).

 Note that any R attributes that are assigned to the booster during the callback functions,
 will not be kept thereafter as the booster object variable is not re-assigned during
 training. It is however possible to set C-level attributes of the booster through
-\link{xgb.attr} or \link{xgb.attributes}, which should remain available for the rest
+\code{\link[=xgb.attr]{xgb.attr()}} or \code{\link[=xgb.attributes]{xgb.attributes()}}, which should remain available for the rest
 of the iterations and after the training is done.

 For keeping variables across iterations, it's recommended to use \code{env} instead.
 \item data The data to which the model is being fit, as an \code{xgb.DMatrix} object.

-Note that, for \link{xgb.cv}, this will be the full data, while data for the specific
+Note that, for \code{\link[=xgb.cv]{xgb.cv()}}, this will be the full data, while data for the specific
 folds can be found in the \code{model} object.
+\item evals The evaluation data, as passed under argument \code{evals} to \code{\link[=xgb.train]{xgb.train()}}.

-\item evals The evaluation data, as passed under argument \code{evals} to
-\link{xgb.train}.
-
-For \link{xgb.cv}, this will always be \code{NULL}.
-
-\item begin_iteration Index of the first boosting iteration that will be executed
-(base-1 indexing).
+For \code{\link[=xgb.cv]{xgb.cv()}}, this will always be \code{NULL}.
+\item begin_iteration Index of the first boosting iteration that will be executed (base-1 indexing).

 This will typically be '1', but when using training continuation, depending on the
 parameters for updates, boosting rounds will be continued from where the previous
 model ended, in which case this will be larger than 1.

 \item end_iteration Index of the last boosting iteration that will be executed
 (base-1 indexing, inclusive of this end).

-It should match with argument \code{nrounds} passed to \link{xgb.train} or \link{xgb.cv}.
+It should match with argument \code{nrounds} passed to \code{\link[=xgb.train]{xgb.train()}} or \code{\link[=xgb.cv]{xgb.cv()}}.

 Note that boosting might be interrupted before reaching this last iteration, for
 example by using the early stopping callback \link{xgb.cb.early.stop}.

 \item iteration Index of the iteration number that is being executed (first iteration
 will be the same as parameter \code{begin_iteration}, then next one will add +1, and so on).

 \item iter_feval Evaluation metrics for \code{evals} that were supplied, either
 determined by the objective, or by parameter \code{feval}.

-For \link{xgb.train}, this will be a named vector with one entry per element in
+For \code{\link[=xgb.train]{xgb.train()}}, this will be a named vector with one entry per element in
 \code{evals}, where the names are determined as 'evals name' + '-' + 'metric name' - for
 example, if \code{evals} contains an entry named "tr" and the metric is "rmse",
 this will be a one-element vector with name "tr-rmse".

-For \link{xgb.cv}, this will be a 2d matrix with dimensions \verb{[length(evals), nfolds]},
+For \code{\link[=xgb.cv]{xgb.cv()}}, this will be a 2d matrix with dimensions \verb{[length(evals), nfolds]},
 where the row names will follow the same naming logic as the one-dimensional vector
-that is passed in \link{xgb.train}.
+that is passed in \code{\link[=xgb.train]{xgb.train()}}.

 Note that, internally, the built-in callbacks such as \link{xgb.cb.print.evaluation} summarize
 this table by calculating the row-wise means and standard deviations.

 \item final_feval The evaluation results after the last boosting round is executed
 (same format as \code{iter_feval}, and will be the exact same input as passed under
 \code{iter_feval} to the last round that is executed during model fitting).

 \item prev_cb_res Result from a previous run of a callback sharing the same name
 (as given by parameter \code{cb_name}) when conducting training continuation, if there
 was any in the booster R attributes.
@@ -157,10 +147,11 @@ which will append the new rows to the previous table.
 If no such previous callback result is available (which it never will when fitting
 a model from start instead of updating an existing model), this will be \code{NULL}.

-For \link{xgb.cv}, which doesn't support training continuation, this will always be \code{NULL}.
+For \code{\link[=xgb.cv]{xgb.cv()}}, which doesn't support training continuation, this will always be \code{NULL}.
 }

-The following names (\code{cb_name} values) are reserved for internal callbacks:\itemize{
+The following names (\code{cb_name} values) are reserved for internal callbacks:
+\itemize{
 \item print_evaluation
 \item evaluation_log
 \item reset_parameters
@@ -170,7 +161,8 @@ The following names (\code{cb_name} values) are reserved for internal callbacks:
 \item gblinear_history
 }

-The following names are reserved for other non-callback attributes:\itemize{
+The following names are reserved for other non-callback attributes:
+\itemize{
 \item names
 \item class
 \item call
@@ -221,8 +213,10 @@ ssq_callback <- xgb.Callback(
 )

 data(mtcars)

 y <- mtcars$mpg
 x <- as.matrix(mtcars[, -1])

 dm <- xgb.DMatrix(x, label = y, nthread = 1)
 model <- xgb.train(
   data = dm,
@@ -236,7 +230,8 @@ model <- xgb.train(
 attributes(model)$ssq
 }
 \seealso{
-Built-in callbacks:\itemize{
+Built-in callbacks:
+\itemize{
 \item \link{xgb.cb.print.evaluation}
 \item \link{xgb.cb.evaluation.log}
 \item \link{xgb.cb.reset.parameters}
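As a usage sketch of the constructor documented above: the following hypothetical callback records one metric per iteration in `env` and returns the collected history from `f_after_training`, so it ends up as an R attribute of the booster. Argument names are taken from the documentation above; the metric indexing and field choices are illustrative assumptions, not the package's canonical example.

```r
library(xgboost)

# Hypothetical custom callback: track the first evaluation metric across iterations.
rmse_tracker <- xgb.Callback(
  cb_name = "rmse_tracker",         # result stored under this name in R attributes
  env = new.env(),                  # scratch space kept only during fitting
  f_after_iter = function(env, model, data, evals, iteration, iter_feval) {
    env$history <- c(env$history, iter_feval[1])  # accumulate per-iteration metric
    FALSE                           # FALSE / NULL means "do not stop training"
  },
  f_after_training = function(env, model, data, evals, iteration,
                              final_feval, prev_cb_res) {
    env$history                     # becomes attributes(model)$rmse_tracker
  }
)
```

It would then be passed as `callbacks = list(rmse_tracker)` to `xgb.train()` or `xgb.cv()`, mirroring the `ssq_callback` example in the hunk above.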
@@ -96,8 +96,7 @@ so it doesn't make sense to assign weights to individual data points.}

 \item{base_margin}{Base margin used for boosting from existing model.

-\if{html}{\out{<div class="sourceCode">}}\preformatted{ In the case of multi-output models, one can also pass multi-dimensional base_margin.
-}\if{html}{\out{</div>}}}
+In the case of multi-output models, one can also pass multi-dimensional base_margin.}

 \item{missing}{A float value to represent missing values in data (not used when creating DMatrix
 from text files).
@@ -109,9 +108,8 @@ values in data.}
 \item{feature_names}{Set names for features. Overrides column names in data
 frame and matrix.

-\if{html}{\out{<div class="sourceCode">}}\preformatted{ Note: columns are not referenced by name when calling `predict`, so the column order there
-must be the same as in the DMatrix construction, regardless of the column names.
-}\if{html}{\out{</div>}}}
+Note: columns are not referenced by name when calling \code{predict}, so the column order there
+must be the same as in the DMatrix construction, regardless of the column names.}

 \item{feature_types}{Set types for features.

@@ -45,15 +45,13 @@ so it doesn't make sense to assign weights to individual data points.}

 \item{base_margin}{Base margin used for boosting from existing model.

-\if{html}{\out{<div class="sourceCode">}}\preformatted{ In the case of multi-output models, one can also pass multi-dimensional base_margin.
-}\if{html}{\out{</div>}}}
+In the case of multi-output models, one can also pass multi-dimensional base_margin.}

 \item{feature_names}{Set names for features. Overrides column names in data
 frame and matrix.

-\if{html}{\out{<div class="sourceCode">}}\preformatted{ Note: columns are not referenced by name when calling `predict`, so the column order there
-must be the same as in the DMatrix construction, regardless of the column names.
-}\if{html}{\out{</div>}}}
+Note: columns are not referenced by name when calling \code{predict}, so the column order there
+must be the same as in the DMatrix construction, regardless of the column names.}

 \item{feature_types}{Set types for features.

@@ -16,7 +16,7 @@ xgb.attributes(object)
 xgb.attributes(object) <- value
 }
 \arguments{
-\item{object}{Object of class \code{xgb.Booster}. \bold{Will be modified in-place} when assigning to it.}
+\item{object}{Object of class \code{xgb.Booster}. \strong{Will be modified in-place} when assigning to it.}

 \item{name}{A non-empty character string specifying which attribute is to be accessed.}

@@ -36,28 +36,28 @@ or \code{NULL} if a model has no stored attributes.
 }
 }
 \description{
-These methods allow to manipulate the key-value attribute strings of an xgboost model.
+These methods allow manipulating the key-value attribute strings of an XGBoost model.
 }
 \details{
-The primary purpose of xgboost model attributes is to store some meta data about the model.
+The primary purpose of XGBoost model attributes is to store some meta data about the model.
 Note that they are a separate concept from the object attributes in R.
-Specifically, they refer to key-value strings that can be attached to an xgboost model,
+Specifically, they refer to key-value strings that can be attached to an XGBoost model,
 stored together with the model's binary representation, and accessed later
 (from R or any other interface).
 In contrast, any R attribute assigned to an R object of \code{xgb.Booster} class
-would not be saved by \code{\link[=xgb.save]{xgb.save()}} because an xgboost model is an external memory object
+would not be saved by \code{\link[=xgb.save]{xgb.save()}} because an XGBoost model is an external memory object
 and its serialization is handled externally.
-Also, setting an attribute that has the same name as one of xgboost's parameters wouldn't
+Also, setting an attribute that has the same name as one of XGBoost's parameters wouldn't
 change the value of that parameter for a model.
 Use \code{\link[=xgb.parameters<-]{xgb.parameters<-()}} to set or change model parameters.

-The \verb{xgb.attributes<-} setter either updates the existing or adds one or several attributes,
+The \code{\link[=xgb.attributes<-]{xgb.attributes<-()}} setter either updates the existing or adds one or several attributes,
 but it doesn't delete the other existing attributes.

 Important: since this modifies the booster's C object, semantics for assignment here
 will differ from R's, as any object reference to the same booster will be modified
 too, while assignment of R attributes through \verb{attributes(model)$<attr> <- <value>}
-will follow the usual copy-on-write R semantics (see \link{xgb.copy.Booster} for an
+will follow the usual copy-on-write R semantics (see \code{\link[=xgb.copy.Booster]{xgb.copy.Booster()}} for an
 example of these behaviors).
 }
 \examples{
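The setter semantics described in this hunk can be sketched as follows; the training call is illustrative only (dataset, parameters, and attribute names are assumptions for the sake of the example), while `xgb.attr()` and `xgb.attributes()` are the documented accessors.

```r
library(xgboost)

# Fit a small throwaway model (illustrative data and settings):
data(agaricus.train, package = "xgboost")
bst <- xgboost(
  data = agaricus.train$data,
  label = agaricus.train$label,
  nrounds = 2,
  nthread = 1
)

# Set a single C-level attribute; this modifies the booster in-place:
xgb.attr(bst, "note") <- "trained on agaricus"
print(xgb.attr(bst, "note"))

# Add/update several attributes at once; existing ones are kept, not deleted:
xgb.attributes(bst) <- list(run_id = "42", stage = "dev")
print(xgb.attributes(bst))  # "note" should still be present
```

Because these attributes live in the booster's C object, they survive `xgb.save()` and are visible from other XGBoost interfaces, unlike plain R attributes assigned via `attributes(bst)$x <- ...`.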
@@ -2,7 +2,7 @@
 % Please edit documentation in R/callbacks.R
 \name{xgb.cb.cv.predict}
 \alias{xgb.cb.cv.predict}
-\title{Callback for returning cross-validation based predictions.}
+\title{Callback for returning cross-validation based predictions}
 \usage{
 xgb.cb.cv.predict(save_models = FALSE, outputmargin = FALSE)
 }
@@ -13,8 +13,8 @@ xgb.cb.cv.predict(save_models = FALSE, outputmargin = FALSE)
 parameter to \link{predict.xgb.Booster}).}
 }
 \value{
-An \code{xgb.Callback} object, which can be passed to \link{xgb.cv},
-but \bold{not} to \link{xgb.train}.
+An \code{xgb.Callback} object, which can be passed to \code{\link[=xgb.cv]{xgb.cv()}},
+but \strong{not} to \code{\link[=xgb.train]{xgb.train()}}.
 }
 \description{
 This callback function saves predictions for all of the test folds,
@@ -24,7 +24,7 @@ and also allows to save the folds' models.
 Predictions are saved inside of the \code{pred} element, which is either a vector or a matrix,
 depending on the number of prediction outputs per data row. The order of predictions corresponds
 to the order of rows in the original dataset. Note that when a custom \code{folds} list is
-provided in \code{xgb.cv}, the predictions would only be returned properly when this list is a
+provided in \code{\link[=xgb.cv]{xgb.cv()}}, the predictions would only be returned properly when this list is a
 non-overlapping list of k sets of indices, as in a standard k-fold CV. The predictions would not be
 meaningful when user-provided folds have overlapping indices as in, e.g., random sampling splits.
 When some of the indices in the training dataset are not included into user-provided \code{folds},
@@ -33,7 +33,7 @@ in the resulting object. If passing \code{FALSE}, will only keep the boosting ro
 up to the detected best iteration, discarding the ones that come after.}
 }
 \value{
-An \code{xgb.Callback} object, which can be passed to \link{xgb.train} or \link{xgb.cv}.
+An \code{xgb.Callback} object, which can be passed to \code{\link[=xgb.train]{xgb.train()}} or \code{\link[=xgb.cv]{xgb.cv()}}.
 }
 \description{
 This callback function determines the condition for early stopping.
@@ -49,7 +49,7 @@ The same values are also stored as R attributes as a result of the callback, plu
 attribute \code{stopped_by_max_rounds} which indicates whether an early stopping by the \code{stopping_rounds}
 condition occurred. Note that the \code{best_iteration} that is stored under R attributes will follow
 base-1 indexing, so it will be larger by '1' than the C-level 'best_iteration' that is accessed
-through \link{xgb.attr} or \link{xgb.attributes}.
+through \code{\link[=xgb.attr]{xgb.attr()}} or \code{\link[=xgb.attributes]{xgb.attributes()}}.

 At least one dataset is required in \code{evals} for early stopping to work.
 }
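A hedged sketch of wiring up the early-stopping callback described in this hunk: `dtrain` and `dvalid` are assumed to be pre-built `xgb.DMatrix` objects, and the objective/parameter values are illustrative; `stopping_rounds` and the non-empty `evals` requirement come from the documentation above.

```r
library(xgboost)

# Assumed inputs (not from the diff): dtrain / dvalid as xgb.DMatrix objects.
bst <- xgb.train(
  params = list(objective = "reg:squarederror", nthread = 1),
  data = dtrain,
  evals = list(valid = dvalid),  # at least one evaluation dataset is required
  nrounds = 500,
  callbacks = list(xgb.cb.early.stop(stopping_rounds = 5))
)

# Best iteration stored as an R attribute by the callback (base-1 indexing,
# i.e. one larger than the C-level 'best_iteration'):
attributes(bst)$best_iteration
```

The same callback cannot decide a "best" fold-averaged round without evaluation data, which is why the docs insist on a non-empty `evals`.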
@@ -7,14 +7,14 @@
 xgb.cb.evaluation.log()
 }
 \value{
-An \code{xgb.Callback} object, which can be passed to \link{xgb.train} or \link{xgb.cv}.
+An \code{xgb.Callback} object, which can be passed to \code{\link[=xgb.train]{xgb.train()}} or \code{\link[=xgb.cv]{xgb.cv()}}.
 }
 \description{
 Callback for logging the evaluation history
 }
 \details{
 This callback creates a table with per-iteration evaluation metrics (see parameters
-\code{evals} and \code{feval} in \link{xgb.train}).
+\code{evals} and \code{feval} in \code{\link[=xgb.train]{xgb.train()}}).

 Note: in the column names of the final data.table, the dash '-' character is replaced with
 the underscore '_' in order to make the column names more like regular R identifiers.
@@ -7,13 +7,13 @@
 xgb.cb.gblinear.history(sparse = FALSE)
 }
 \arguments{
-\item{sparse}{when set to \code{FALSE}/\code{TRUE}, a dense/sparse matrix is used to store the result.
+\item{sparse}{When set to \code{FALSE}/\code{TRUE}, a dense/sparse matrix is used to store the result.
 Sparse format is useful when one expects only a subset of coefficients to be non-zero,
 when using the "thrifty" feature selector with a fairly small number of top features
 selected per iteration.}
 }
 \value{
-An \code{xgb.Callback} object, which can be passed to \link{xgb.train} or \link{xgb.cv}.
+An \code{xgb.Callback} object, which can be passed to \code{\link[=xgb.train]{xgb.train()}} or \code{\link[=xgb.cv]{xgb.cv()}}.
 }
 \description{
 Callback for collecting coefficients history of a gblinear booster
@@ -37,11 +37,10 @@ will have column names matching with the feature names, otherwise (when there's
 one coefficient per feature) the names will be composed as 'column name' + ':' + 'class index'
 (so e.g. column 'c1' for class '0' will be named 'c1:0').

-With \code{xgb.train}, the output is either a dense or a sparse matrix.
-With with \code{xgb.cv}, it is a list (one element per each fold) of such
-matrices.
+With \code{\link[=xgb.train]{xgb.train()}}, the output is either a dense or a sparse matrix.
+With \code{\link[=xgb.cv]{xgb.cv()}}, it is a list (one element per fold) of such matrices.

-Function \link{xgb.gblinear.history} function provides an easy way to retrieve the
+Function \link{xgb.gblinear.history} provides an easy way to retrieve the
 outputs from this callback.
 }
 \examples{
@ -55,55 +54,107 @@ data.table::setDTthreads(nthread)
|
|||||||
# without considering the 2nd order interactions:
|
# without considering the 2nd order interactions:
|
||||||
x <- model.matrix(Species ~ .^2, iris)[, -1]
|
x <- model.matrix(Species ~ .^2, iris)[, -1]
|
||||||
colnames(x)
|
colnames(x)
|
||||||
dtrain <- xgb.DMatrix(scale(x), label = 1*(iris$Species == "versicolor"), nthread = nthread)
|
dtrain <- xgb.DMatrix(
|
||||||
param <- list(booster = "gblinear", objective = "reg:logistic", eval_metric = "auc",
|
scale(x),
|
||||||
lambda = 0.0003, alpha = 0.0003, nthread = nthread)
|
label = 1 * (iris$Species == "versicolor"),
|
||||||
|
nthread = nthread
|
||||||
|
)
|
||||||
|
param <- list(
|
||||||
|
booster = "gblinear",
|
||||||
|
objective = "reg:logistic",
|
||||||
|
eval_metric = "auc",
|
||||||
|
lambda = 0.0003,
|
||||||
|
alpha = 0.0003,
|
||||||
|
nthread = nthread
|
||||||
|
)
|
||||||
|
|
||||||
# For 'shotgun', which is a default linear updater, using high eta values may result in
|
# For 'shotgun', which is a default linear updater, using high eta values may result in
|
||||||
# unstable behaviour in some datasets. With this simple dataset, however, the high learning
|
# unstable behaviour in some datasets. With this simple dataset, however, the high learning
|
||||||
# rate does not break the convergence, but allows us to illustrate the typical pattern of
|
# rate does not break the convergence, but allows us to illustrate the typical pattern of
|
||||||
# "stochastic explosion" behaviour of this lock-free algorithm at early boosting iterations.
|
# "stochastic explosion" behaviour of this lock-free algorithm at early boosting iterations.
|
||||||
bst <- xgb.train(param, dtrain, list(tr=dtrain), nrounds = 200, eta = 1.,
|
bst <- xgb.train(
|
||||||
callbacks = list(xgb.cb.gblinear.history()))
|
param,
|
||||||
|
dtrain,
|
||||||
|
list(tr = dtrain),
|
||||||
|
nrounds = 200,
|
||||||
|
eta = 1.,
|
||||||
|
callbacks = list(xgb.cb.gblinear.history())
|
||||||
|
)
|
||||||
|
|
||||||
# Extract the coefficients' path and plot them vs boosting iteration number:
|
# Extract the coefficients' path and plot them vs boosting iteration number:
|
||||||
coef_path <- xgb.gblinear.history(bst)
|
coef_path <- xgb.gblinear.history(bst)
|
||||||
matplot(coef_path, type = 'l')
|
matplot(coef_path, type = "l")
|
||||||
|
|
||||||
# With the deterministic coordinate descent updater, it is safer to use higher learning rates.
|
# With the deterministic coordinate descent updater, it is safer to use higher learning rates.
|
||||||
# Will try the classical componentwise boosting which selects a single best feature per round:
|
# Will try the classical componentwise boosting which selects a single best feature per round:
|
||||||
-bst <- xgb.train(param, dtrain, list(tr=dtrain), nrounds = 200, eta = 0.8,
-               updater = 'coord_descent', feature_selector = 'thrifty', top_k = 1,
-               callbacks = list(xgb.cb.gblinear.history()))
-matplot(xgb.gblinear.history(bst), type = 'l')
+bst <- xgb.train(
+  param,
+  dtrain,
+  list(tr = dtrain),
+  nrounds = 200,
+  eta = 0.8,
+  updater = "coord_descent",
+  feature_selector = "thrifty",
+  top_k = 1,
+  callbacks = list(xgb.cb.gblinear.history())
+)
+matplot(xgb.gblinear.history(bst), type = "l")
 # Componentwise boosting is known to have similar effect to Lasso regularization.
 # Try experimenting with various values of top_k, eta, nrounds,
 # as well as different feature_selectors.

 # For xgb.cv:
-bst <- xgb.cv(param, dtrain, nfold = 5, nrounds = 100, eta = 0.8,
-              callbacks = list(xgb.cb.gblinear.history()))
+bst <- xgb.cv(
+  param,
+  dtrain,
+  nfold = 5,
+  nrounds = 100,
+  eta = 0.8,
+  callbacks = list(xgb.cb.gblinear.history())
+)
 # coefficients in the CV fold #3
-matplot(xgb.gblinear.history(bst)[[3]], type = 'l')
+matplot(xgb.gblinear.history(bst)[[3]], type = "l")


 #### Multiclass classification:
-#
+
 dtrain <- xgb.DMatrix(scale(x), label = as.numeric(iris$Species) - 1, nthread = nthread)
-param <- list(booster = "gblinear", objective = "multi:softprob", num_class = 3,
-              lambda = 0.0003, alpha = 0.0003, nthread = nthread)
+param <- list(
+  booster = "gblinear",
+  objective = "multi:softprob",
+  num_class = 3,
+  lambda = 0.0003,
+  alpha = 0.0003,
+  nthread = nthread
+)

 # For the default linear updater 'shotgun' it sometimes is helpful
 # to use smaller eta to reduce instability
-bst <- xgb.train(param, dtrain, list(tr=dtrain), nrounds = 50, eta = 0.5,
-               callbacks = list(xgb.cb.gblinear.history()))
+bst <- xgb.train(
+  param,
+  dtrain,
+  list(tr = dtrain),
+  nrounds = 50,
+  eta = 0.5,
+  callbacks = list(xgb.cb.gblinear.history())
+)

 # Will plot the coefficient paths separately for each class:
-matplot(xgb.gblinear.history(bst, class_index = 0), type = 'l')
-matplot(xgb.gblinear.history(bst, class_index = 1), type = 'l')
-matplot(xgb.gblinear.history(bst, class_index = 2), type = 'l')
+matplot(xgb.gblinear.history(bst, class_index = 0), type = "l")
+matplot(xgb.gblinear.history(bst, class_index = 1), type = "l")
+matplot(xgb.gblinear.history(bst, class_index = 2), type = "l")

 # CV:
-bst <- xgb.cv(param, dtrain, nfold = 5, nrounds = 70, eta = 0.5,
-              callbacks = list(xgb.cb.gblinear.history(FALSE)))
+bst <- xgb.cv(
+  param,
+  dtrain,
+  nfold = 5,
+  nrounds = 70,
+  eta = 0.5,
+  callbacks = list(xgb.cb.gblinear.history(FALSE))
+)
 # 1st fold of 1st class
-matplot(xgb.gblinear.history(bst, class_index = 0)[[1]], type = 'l')
+matplot(xgb.gblinear.history(bst, class_index = 0)[[1]], type = "l")

 }
 \seealso{
@@ -7,12 +7,12 @@
 xgb.cb.print.evaluation(period = 1, showsd = TRUE)
 }
 \arguments{
-\item{period}{results would be printed every number of periods}
+\item{period}{Results would be printed every number of periods.}

-\item{showsd}{whether standard deviations should be printed (when available)}
+\item{showsd}{Whether standard deviations should be printed (when available).}
 }
 \value{
-An \code{xgb.Callback} object, which can be passed to \link{xgb.train} or \link{xgb.cv}.
+An \code{xgb.Callback} object, which can be passed to \code{\link[=xgb.train]{xgb.train()}} or \code{\link[=xgb.cv]{xgb.cv()}}.
 }
 \description{
 The callback function prints the result of evaluation at every \code{period} iterations.
@@ -2,12 +2,12 @@
 % Please edit documentation in R/callbacks.R
 \name{xgb.cb.reset.parameters}
 \alias{xgb.cb.reset.parameters}
-\title{Callback for resetting the booster's parameters at each iteration.}
+\title{Callback for resetting booster parameters at each iteration}
 \usage{
 xgb.cb.reset.parameters(new_params)
 }
 \arguments{
-\item{new_params}{a list where each element corresponds to a parameter that needs to be reset.
+\item{new_params}{List of parameters needed to be reset.
 Each element's value must be either a vector of values of length \code{nrounds}
 to be set at each iteration,
 or a function of two parameters \code{learning_rates(iteration, nrounds)}
@@ -15,10 +15,10 @@ which returns a new parameter value by using the current iteration number
 and the total number of boosting rounds.}
 }
 \value{
-An \code{xgb.Callback} object, which can be passed to \link{xgb.train} or \link{xgb.cv}.
+An \code{xgb.Callback} object, which can be passed to \code{\link[=xgb.train]{xgb.train()}} or \code{\link[=xgb.cv]{xgb.cv()}}.
 }
 \description{
-Callback for resetting the booster's parameters at each iteration.
+Callback for resetting booster parameters at each iteration
 }
 \details{
 Note that when training is resumed from some previous model, and a function is used to
@@ -2,23 +2,22 @@
 % Please edit documentation in R/callbacks.R
 \name{xgb.cb.save.model}
 \alias{xgb.cb.save.model}
-\title{Callback for saving a model file.}
+\title{Callback for saving a model file}
 \usage{
 xgb.cb.save.model(save_period = 0, save_name = "xgboost.ubj")
 }
 \arguments{
-\item{save_period}{Save the model to disk after every
-\code{save_period} iterations; 0 means save the model at the end.}
+\item{save_period}{Save the model to disk after every \code{save_period} iterations;
+0 means save the model at the end.}

 \item{save_name}{The name or path for the saved model file.
-It can contain a \code{\link[base]{sprintf}} formatting specifier
-to include the integer iteration number in the file name.
-E.g., with \code{save_name} = 'xgboost_\%04d.model',
+It can contain a \code{\link[=sprintf]{sprintf()}} formatting specifier to include the integer
+iteration number in the file name. E.g., with \code{save_name = 'xgboost_\%04d.model'},
 the file saved at iteration 50 would be named "xgboost_0050.model".}
 }
 \value{
-An \code{xgb.Callback} object, which can be passed to \link{xgb.train},
-but \bold{not} to \link{xgb.cv}.
+An \code{xgb.Callback} object, which can be passed to \code{\link[=xgb.train]{xgb.train()}},
+but \strong{not} to \code{\link[=xgb.cv]{xgb.cv()}}.
 }
 \description{
 This callback function allows to save an xgb-model file, either periodically
@@ -10,12 +10,12 @@ xgb.config(object)
 xgb.config(object) <- value
 }
 \arguments{
-\item{object}{Object of class \code{xgb.Booster}. \bold{Will be modified in-place} when assigning to it.}
+\item{object}{Object of class \code{xgb.Booster}. \strong{Will be modified in-place} when assigning to it.}

-\item{value}{An R list.}
+\item{value}{A list.}
 }
 \value{
-\code{xgb.config} will return the parameters as an R list.
+Parameters as a list.
 }
 \description{
 Accessors for model parameters as JSON string
@@ -25,7 +25,7 @@ Note that assignment is performed in-place on the booster C object, which unlike
 of R attributes, doesn't follow typical copy-on-write semantics for assignment - i.e. all references
 to the same booster will also get updated.

-See \link{xgb.copy.Booster} for an example of this behavior.
+See \code{\link[=xgb.copy.Booster]{xgb.copy.Booster()}} for an example of this behavior.
 }
 \examples{
 data(agaricus.train, package = "xgboost")
@@ -16,14 +16,18 @@ functions called on that copy will not affect the \code{model} variable.
 \description{
 Creates a deep copy of an 'xgb.Booster' object, such that the
 C object pointer contained will be a different object, and hence functions
-like \link{xgb.attr} will not affect the object from which it was copied.
+like \code{\link[=xgb.attr]{xgb.attr()}} will not affect the object from which it was copied.
 }
 \examples{
 library(xgboost)

 data(mtcars)

 y <- mtcars$mpg
 x <- mtcars[, -1]

 dm <- xgb.DMatrix(x, label = y, nthread = 1)

 model <- xgb.train(
   data = dm,
   params = list(nthread = 1),
@@ -7,17 +7,18 @@
 xgb.create.features(model, data, ...)
 }
 \arguments{
-\item{model}{decision tree boosting model learned on the original data}
+\item{model}{Decision tree boosting model learned on the original data.}

-\item{data}{original data (usually provided as a \code{dgCMatrix} matrix)}
+\item{data}{Original data (usually provided as a \code{dgCMatrix} matrix).}

-\item{...}{currently not used}
+\item{...}{Currently not used.}
 }
 \value{
-\code{dgCMatrix} matrix including both the original data and the new features.
+A \code{dgCMatrix} matrix including both the original data and the new features.
 }
 \description{
-May improve the learning by adding new features to the training data based on the decision trees from a previously learned model.
+May improve the learning by adding new features to the training data based on the
+decision trees from a previously learned model.
 }
 \details{
 This is the function inspired from the paragraph 3.1 of the paper:
@@ -44,11 +45,11 @@ For example, consider the boosted tree model in Figure 1 with 2 subtrees,
 where the first subtree has 3 leafs and the second 2 leafs. If an
 instance ends up in leaf 2 in the first subtree and leaf 1 in
 second subtree, the overall input to the linear classifier will
-be the binary vector \code{[0, 1, 0, 1, 0]}, where the first 3 entries
+be the binary vector \verb{[0, 1, 0, 1, 0]}, where the first 3 entries
 correspond to the leaves of the first subtree and last 2 to
 those of the second subtree.

-\link{...}
+...

 We can understand boosted decision tree
 based transformation as a supervised feature encoding that
@@ -57,15 +58,16 @@ vector. A traversal from root node to a leaf node represents
 a rule on certain features."
 }
 \examples{
-data(agaricus.train, package='xgboost')
-data(agaricus.test, package='xgboost')
+data(agaricus.train, package = "xgboost")
+data(agaricus.test, package = "xgboost")

 dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
 dtest <- with(agaricus.test, xgb.DMatrix(data, label = label, nthread = 2))

 param <- list(max_depth = 2, eta = 1, objective = 'binary:logistic')
 nrounds = 4

-bst = xgb.train(params = param, data = dtrain, nrounds = nrounds, nthread = 2)
+bst <- xgb.train(params = param, data = dtrain, nrounds = nrounds, nthread = 2)

 # Model accuracy without new features
 accuracy.before <- sum((predict(bst, agaricus.test$data) >= 0.5) == agaricus.test$label) /
@@ -2,7 +2,7 @@
 % Please edit documentation in R/xgb.dump.R
 \name{xgb.dump}
 \alias{xgb.dump}
-\title{Dump an xgboost model in text format.}
+\title{Dump an XGBoost model in text format.}
 \usage{
 xgb.dump(
   model,
@@ -14,43 +14,51 @@ xgb.dump(
 )
 }
 \arguments{
-\item{model}{the model object.}
+\item{model}{The model object.}

-\item{fname}{the name of the text file where to save the model text dump.
-If not provided or set to \code{NULL}, the model is returned as a \code{character} vector.}
+\item{fname}{The name of the text file where to save the model text dump.
+If not provided or set to \code{NULL}, the model is returned as a character vector.}

-\item{fmap}{feature map file representing feature types.
-See demo/ for walkthrough example in R, and
-\url{https://github.com/dmlc/xgboost/blob/master/demo/data/featmap.txt}
-for example Format.}
+\item{fmap}{Feature map file representing feature types. See demo/ for a walkthrough
+example in R, and \url{https://github.com/dmlc/xgboost/blob/master/demo/data/featmap.txt}
+to see an example of the value.}

-\item{with_stats}{whether to dump some additional statistics about the splits.
+\item{with_stats}{Whether to dump some additional statistics about the splits.
 When this option is on, the model dump contains two additional values:
 gain is the approximate loss function gain we get in each split;
 cover is the sum of second order gradient in each node.}

-\item{dump_format}{either 'text', 'json', or 'dot' (graphviz) format could be specified.
+\item{dump_format}{Either 'text', 'json', or 'dot' (graphviz) format could be specified.

 Format 'dot' for a single tree can be passed directly to packages that consume this format
-for graph visualization, such as function \code{\link[DiagrammeR:grViz]{DiagrammeR::grViz()}}}
+for graph visualization, such as function \code{DiagrammeR::grViz()}}

-\item{...}{currently not used}
+\item{...}{Currently not used}
 }
 \value{
 If fname is not provided or set to \code{NULL} the function will return the model
-as a \code{character} vector. Otherwise it will return \code{TRUE}.
+as a character vector. Otherwise it will return \code{TRUE}.
 }
 \description{
-Dump an xgboost model in text format.
+Dump an XGBoost model in text format.
 }
 \examples{
 \dontshow{RhpcBLASctl::omp_set_num_threads(1)}
-data(agaricus.train, package='xgboost')
-data(agaricus.test, package='xgboost')
+data(agaricus.train, package = "xgboost")
+data(agaricus.test, package = "xgboost")

 train <- agaricus.train
 test <- agaricus.test
-bst <- xgb.train(data = xgb.DMatrix(train$data, label = train$label), max_depth = 2,
-                 eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
+bst <- xgb.train(
+  data = xgb.DMatrix(train$data, label = train$label),
+  max_depth = 2,
+  eta = 1,
+  nthread = 2,
+  nrounds = 2,
+  objective = "binary:logistic"
+)

 # save the model in file 'xgb.model.dump'
 dump_path = file.path(tempdir(), 'model.dump')
 xgb.dump(bst, dump_path, with_stats = TRUE)
@@ -59,7 +67,7 @@ xgb.dump(bst, dump_path, with_stats = TRUE)
 print(xgb.dump(bst, with_stats = TRUE))

 # print in JSON format:
-cat(xgb.dump(bst, with_stats = TRUE, dump_format='json'))
+cat(xgb.dump(bst, with_stats = TRUE, dump_format = "json"))

 # plot first tree leveraging the 'dot' format
 if (requireNamespace('DiagrammeR', quietly = TRUE)) {
@@ -2,24 +2,24 @@
 % Please edit documentation in R/callbacks.R
 \name{xgb.gblinear.history}
 \alias{xgb.gblinear.history}
-\title{Extract gblinear coefficients history.}
+\title{Extract gblinear coefficients history}
 \usage{
 xgb.gblinear.history(model, class_index = NULL)
 }
 \arguments{
-\item{model}{either an \code{xgb.Booster} or a result of \code{xgb.cv()}, trained
-using the \link{xgb.cb.gblinear.history} callback, but \bold{not} a booster
-loaded from \link{xgb.load} or \link{xgb.load.raw}.}
+\item{model}{Either an \code{xgb.Booster} or a result of \code{\link[=xgb.cv]{xgb.cv()}}, trained
+using the \link{xgb.cb.gblinear.history} callback, but \strong{not} a booster
+loaded from \code{\link[=xgb.load]{xgb.load()}} or \code{\link[=xgb.load.raw]{xgb.load.raw()}}.}

 \item{class_index}{zero-based class index to extract the coefficients for only that
-specific class in a multinomial multiclass model. When it is NULL, all the
+specific class in a multinomial multiclass model. When it is \code{NULL}, all the
 coefficients are returned. Has no effect in non-multiclass models.}
 }
 \value{
-For an \link{xgb.train} result, a matrix (either dense or sparse) with the columns
+For an \code{\link[=xgb.train]{xgb.train()}} result, a matrix (either dense or sparse) with the columns
 corresponding to iteration's coefficients and the rows corresponding to boosting iterations.

-For an \link{xgb.cv} result, a list of such matrices is returned with the elements
+For an \code{\link[=xgb.cv]{xgb.cv()}} result, a list of such matrices is returned with the elements
 corresponding to CV folds.

 When there is more than one coefficient per feature (e.g. multi-class classification)
@@ -31,15 +31,15 @@ coefficients N+1 through 2N for the second class, and so on).
 \description{
 A helper function to extract the matrix of linear coefficients' history
 from a gblinear model created while using the \link{xgb.cb.gblinear.history}
-callback (which must be added manually as by default it's not used).
+callback (which must be added manually as by default it is not used).
 }
 \details{
 Note that this is an R-specific function that relies on R attributes that
-are not saved when using xgboost's own serialization functions like \link{xgb.load}
-or \link{xgb.load.raw}.
+are not saved when using XGBoost's own serialization functions like \code{\link[=xgb.load]{xgb.load()}}
+or \code{\link[=xgb.load.raw]{xgb.load.raw()}}.

 In order for a serialized model to be accepted by this function, one must use R
-serializers such as \link{saveRDS}.
+serializers such as \code{\link[=saveRDS]{saveRDS()}}.
 }
 \seealso{
 \link{xgb.cb.gblinear.history}, \link{coef.xgb.Booster}.
@@ -13,13 +13,13 @@ xgb.get.num.boosted.rounds(model)
 \item{model, x}{A fitted \code{xgb.Booster} model.}
 }
 \value{
-The number of rounds saved in the model, as an integer.
+The number of rounds saved in the model as an integer.
 }
 \description{
 Get number of boosting in a fitted booster
 }
 \details{
 Note that setting booster parameters related to training
-continuation / updates through \link{xgb.parameters<-} will reset the
+continuation / updates through \code{\link[=xgb.parameters<-]{xgb.parameters<-()}} will reset the
 number of rounds to zero.
 }
@@ -12,30 +12,33 @@ xgb.is.same.Booster(obj1, obj2)
 \item{obj2}{Booster model to compare with \code{obj1}.}
 }
 \value{
-Either \code{TRUE} or \code{FALSE} according to whether the two boosters share
-the underlying C object.
+Either \code{TRUE} or \code{FALSE} according to whether the two boosters share the
+underlying C object.
 }
 \description{
 Checks whether two booster objects refer to the same underlying C object.
 }
 \details{
-As booster objects (as returned by e.g. \link{xgb.train}) contain an R 'externalptr'
+As booster objects (as returned by e.g. \code{\link[=xgb.train]{xgb.train()}}) contain an R 'externalptr'
 object, they don't follow typical copy-on-write semantics of other R objects - that is, if
 one assigns a booster to a different variable and modifies that new variable through in-place
-methods like \link{xgb.attr<-}, the modification will be applied to both the old and the new
+methods like \code{\link[=xgb.attr<-]{xgb.attr<-()}}, the modification will be applied to both the old and the new
 variable, unlike typical R assignments which would only modify the latter.

 This function allows checking whether two booster objects share the same 'externalptr',
 regardless of the R attributes that they might have.

 In order to duplicate a booster in such a way that the copy wouldn't share the same
-'externalptr', one can use function \link{xgb.copy.Booster}.
+'externalptr', one can use function \code{\link[=xgb.copy.Booster]{xgb.copy.Booster()}}.
 }
 \examples{
 library(xgboost)

 data(mtcars)

 y <- mtcars$mpg
 x <- as.matrix(mtcars[, -1])

 model <- xgb.train(
   params = list(nthread = 1),
   data = xgb.DMatrix(x, label = y, nthread = 1),
@@ -55,5 +58,5 @@ xgb.attr(model, "my_attr") # gets modified
 xgb.attr(model_deep_copy, "my_attr") # doesn't get modified
 }
 \seealso{
-\link{xgb.copy.Booster}
+\code{\link[=xgb.copy.Booster]{xgb.copy.Booster()}}
 }
@@ -2,32 +2,32 @@
 % Please edit documentation in R/xgb.load.R
 \name{xgb.load}
 \alias{xgb.load}
-\title{Load xgboost model from binary file}
+\title{Load XGBoost model from binary file}
 \usage{
 xgb.load(modelfile)
 }
 \arguments{
-\item{modelfile}{the name of the binary input file.}
+\item{modelfile}{The name of the binary input file.}
 }
 \value{
 An object of \code{xgb.Booster} class.
 }
 \description{
-Load xgboost model from the binary model file.
+Load XGBoost model from binary model file.
 }
 \details{
-The input file is expected to contain a model saved in an xgboost model format
-using either \code{\link{xgb.save}} or \code{\link{xgb.cb.save.model}} in R, or using some
-appropriate methods from other xgboost interfaces. E.g., a model trained in Python and
-saved from there in xgboost format, could be loaded from R.
+The input file is expected to contain a model saved in an XGBoost model format
+using either \code{\link[=xgb.save]{xgb.save()}} in R, or using some
+appropriate methods from other XGBoost interfaces. E.g., a model trained in Python and
+saved from there in XGBoost format, could be loaded from R.

-Note: a model saved as an R-object, has to be loaded using corresponding R-methods,
-not \code{xgb.load}.
+Note: a model saved as an R object has to be loaded using corresponding R-methods,
+not by \code{\link[=xgb.load]{xgb.load()}}.
 }
 \examples{
 \dontshow{RhpcBLASctl::omp_set_num_threads(1)}
-data(agaricus.train, package='xgboost')
-data(agaricus.test, package='xgboost')
+data(agaricus.train, package = "xgboost")
+data(agaricus.test, package = "xgboost")

 ## Keep the number of threads to 1 for examples
 nthread <- 1
@@ -35,6 +35,7 @@ data.table::setDTthreads(nthread)

 train <- agaricus.train
 test <- agaricus.test

 bst <- xgb.train(
   data = xgb.DMatrix(train$data, label = train$label),
   max_depth = 2,
@@ -49,5 +50,5 @@ xgb.save(bst, fname)
 bst <- xgb.load(fname)
 }
 \seealso{
-\code{\link{xgb.save}}
+\code{\link[=xgb.save]{xgb.save()}}
 }
|||||||
@ -2,13 +2,13 @@
|
|||||||
% Please edit documentation in R/xgb.load.raw.R
|
% Please edit documentation in R/xgb.load.raw.R
|
||||||
\name{xgb.load.raw}
|
\name{xgb.load.raw}
|
||||||
\alias{xgb.load.raw}
|
\alias{xgb.load.raw}
|
||||||
\title{Load serialised xgboost model from R's raw vector}
|
\title{Load serialised XGBoost model from R's raw vector}
|
||||||
\usage{
|
\usage{
|
||||||
xgb.load.raw(buffer)
|
xgb.load.raw(buffer)
|
||||||
}
|
}
|
||||||
\arguments{
|
\arguments{
|
||||||
\item{buffer}{the buffer returned by xgb.save.raw}
|
\item{buffer}{The buffer returned by \code{\link[=xgb.save.raw]{xgb.save.raw()}}.}
|
||||||
}
|
}
|
||||||
\description{
|
\description{
|
||||||
User can generate raw memory buffer by calling xgb.save.raw
|
User can generate raw memory buffer by calling \code{\link[=xgb.save.raw]{xgb.save.raw()}}.
|
||||||
}
|
}
|
||||||
|
|||||||
@ -13,15 +13,14 @@ xgb.model.dt.tree(
|
|||||||
)
|
)
|
||||||
}
|
}
|
||||||
\arguments{
|
\arguments{
|
||||||
\item{model}{Object of class \code{xgb.Booster}. If it contains feature names (they can be set through
|
\item{model}{Object of class \code{xgb.Booster}. If it contains feature names (they can
|
||||||
\link{setinfo}), they will be used in the output from this function.}
|
be set through \code{\link[=setinfo]{setinfo()}}), they will be used in the output from this function.}
|
||||||
|
|
||||||
\item{text}{Character vector previously generated by the function \code{\link[=xgb.dump]{xgb.dump()}}
|
\item{text}{Character vector previously generated by the function \code{\link[=xgb.dump]{xgb.dump()}}
|
||||||
(called with parameter \code{with_stats = TRUE}). \code{text} takes precedence over \code{model}.}
|
(called with parameter \code{with_stats = TRUE}). \code{text} takes precedence over \code{model}.}
|
||||||
|
|
||||||
\item{trees}{An integer vector of tree indices that should be used.
|
\item{trees}{An integer vector of tree indices that should be used. The default
|
||||||
The default (\code{NULL}) uses all trees.
|
(\code{NULL}) uses all trees. Useful, e.g., in multiclass classification to get only
|
||||||
Useful, e.g., in multiclass classification to get only
|
|
||||||
the trees of one class. \emph{Important}: the tree index in XGBoost models
|
the trees of one class. \emph{Important}: the tree index in XGBoost models
|
||||||
is zero-based (e.g., use \code{trees = 0:4} for the first five trees).}
|
is zero-based (e.g., use \code{trees = 0:4} for the first five trees).}
|
||||||
|
|
||||||
|
|||||||
@@ -7,7 +7,7 @@
 xgb.parameters(object) <- value
 }
 \arguments{
-\item{object}{Object of class \code{xgb.Booster}. \bold{Will be modified in-place}.}
+\item{object}{Object of class \code{xgb.Booster}. \strong{Will be modified in-place}.}
 
 \item{value}{A list (or an object coercible to a list) with the names of parameters to set
 and the elements corresponding to parameter values.}
@@ -16,21 +16,22 @@ and the elements corresponding to parameter values.}
 The same booster \code{object}, which gets modified in-place.
 }
 \description{
-Only the setter for xgboost parameters is currently implemented.
+Only the setter for XGBoost parameters is currently implemented.
 }
 \details{
-Just like \link{xgb.attr}, this function will make in-place modifications
+Just like \code{\link[=xgb.attr]{xgb.attr()}}, this function will make in-place modifications
 on the booster object which do not follow typical R assignment semantics - that is,
 all references to the same booster will also be updated, unlike assingment of R
 attributes which follow copy-on-write semantics.
 
-See \link{xgb.copy.Booster} for an example of this behavior.
+See \code{\link[=xgb.copy.Booster]{xgb.copy.Booster()}} for an example of this behavior.
 
 Be aware that setting parameters of a fitted booster related to training continuation / updates
 will reset its number of rounds indicator to zero.
 }
 \examples{
 data(agaricus.train, package = "xgboost")
 
 train <- agaricus.train
+
 bst <- xgb.train(
@@ -18,7 +18,8 @@ xgb.plot.deepness(
 )
 }
 \arguments{
-\item{model}{Either an \code{xgb.Booster} model, or the "data.table" returned by \code{\link[=xgb.model.dt.tree]{xgb.model.dt.tree()}}.}
+\item{model}{Either an \code{xgb.Booster} model, or the "data.table" returned
+by \code{\link[=xgb.model.dt.tree]{xgb.model.dt.tree()}}.}
 
 \item{which}{Which distribution to plot (see details).}
 
@@ -67,8 +67,8 @@ Represents previously calculated feature importance as a bar graph.
 }
 }
 \details{
-The graph represents each feature as a horizontal bar of length proportional to the importance of a feature.
-Features are sorted by decreasing importance.
+The graph represents each feature as a horizontal bar of length proportional to the
+importance of a feature. Features are sorted by decreasing importance.
 It works for both "gblinear" and "gbtree" models.
 
 When \code{rel_to_first = FALSE}, the values would be plotted as in \code{importance_matrix}.
@@ -21,11 +21,11 @@ xgb.plot.multi.trees(
 by default 5.}
 
 \item{plot_width, plot_height}{Width and height of the graph in pixels.
-The values are passed to \code{\link[DiagrammeR:render_graph]{DiagrammeR::render_graph()}}.}
+The values are passed to \code{DiagrammeR::render_graph()}.}
 
 \item{render}{Should the graph be rendered or not? The default is \code{TRUE}.}
 
-\item{...}{currently not used.}
+\item{...}{Currently not used.}
 }
 \value{
 The value depends on the \code{render} parameter:
@@ -35,7 +35,7 @@ class \code{grViz}. Similar to "ggplot" objects, it needs to be printed when not
 running from the command line.
 \item If \code{render = FALSE}: Graph object which is of DiagrammeR's class \code{dgr_graph}.
 This could be useful if one wants to modify some of the graph attributes
-before rendering the graph with \code{\link[DiagrammeR:render_graph]{DiagrammeR::render_graph()}}.
+before rendering the graph with \code{DiagrammeR::render_graph()}.
 }
 }
 \description{
@@ -38,12 +38,11 @@ xgb.plot.shap(
 \item{shap_contrib}{Matrix of SHAP contributions of \code{data}.
 The default (\code{NULL}) computes it from \code{model} and \code{data}.}
 
-\item{features}{Vector of column indices or feature names to plot.
-When \code{NULL} (default), the \code{top_n} most important features are selected
-by \code{\link[=xgb.importance]{xgb.importance()}}.}
+\item{features}{Vector of column indices or feature names to plot. When \code{NULL}
+(default), the \code{top_n} most important features are selected by \code{\link[=xgb.importance]{xgb.importance()}}.}
 
 \item{top_n}{How many of the most important features (<= 100) should be selected?
-By default 1 for SHAP dependence and 10 for SHAP summary).
+By default 1 for SHAP dependence and 10 for SHAP summary.
 Only used when \code{features = NULL}.}
 
 \item{model}{An \code{xgb.Booster} model. Only required when \code{shap_contrib = NULL} or
@@ -173,6 +172,7 @@ mbst <- xgb.train(
 )
 trees0 <- seq(from = 0, by = nclass, length.out = nrounds)
 col <- rgb(0, 0, 1, 0.5)
+
 xgb.plot.shap(
 x,
 model = mbst,
@@ -35,12 +35,11 @@ xgb.plot.shap.summary(
 \item{shap_contrib}{Matrix of SHAP contributions of \code{data}.
 The default (\code{NULL}) computes it from \code{model} and \code{data}.}
 
-\item{features}{Vector of column indices or feature names to plot.
-When \code{NULL} (default), the \code{top_n} most important features are selected
-by \code{\link[=xgb.importance]{xgb.importance()}}.}
+\item{features}{Vector of column indices or feature names to plot. When \code{NULL}
+(default), the \code{top_n} most important features are selected by \code{\link[=xgb.importance]{xgb.importance()}}.}
 
 \item{top_n}{How many of the most important features (<= 100) should be selected?
-By default 1 for SHAP dependence and 10 for SHAP summary).
+By default 1 for SHAP dependence and 10 for SHAP summary.
 Only used when \code{features = NULL}.}
 
 \item{model}{An \code{xgb.Booster} model. Only required when \code{shap_contrib = NULL} or
@@ -26,13 +26,14 @@ the trees of one class. \emph{Important}: the tree index in XGBoost models
 is zero-based (e.g., use \code{trees = 0:2} for the first three trees).}
 
 \item{plot_width, plot_height}{Width and height of the graph in pixels.
-The values are passed to \code{\link[DiagrammeR:render_graph]{DiagrammeR::render_graph()}}.}
+The values are passed to \code{DiagrammeR::render_graph()}.}
 
 \item{render}{Should the graph be rendered or not? The default is \code{TRUE}.}
 
 \item{show_node_id}{a logical flag for whether to show node id's in the graph.}
 
-\item{style}{Style to use for the plot. Options are:\itemize{
+\item{style}{Style to use for the plot:
+\itemize{
 \item \code{"xgboost"}: will use the plot style defined in the core XGBoost library,
 which is shared between different interfaces through the 'dot' format. This
 style was not available before version 2.1.0 in R. It always plots the trees
@@ -42,13 +43,14 @@ the introducition of the standardized style from the core library. It might plot
 the trees horizontally (from left to right).
 }
 
-Note that \code{style="xgboost"} is only supported when all of the following conditions are met:\itemize{
+Note that \code{style="xgboost"} is only supported when all of the following conditions are met:
+\itemize{
 \item Only a single tree is being plotted.
 \item Node IDs are not added to the graph.
 \item The graph is being returned as \code{htmlwidget} (\code{render=TRUE}).
 }}
 
-\item{...}{currently not used.}
+\item{...}{Currently not used.}
 }
 \value{
 The value depends on the \code{render} parameter:
@@ -58,7 +60,7 @@ class \code{grViz}. Similar to "ggplot" objects, it needs to be printed when not
 running from the command line.
 \item If \code{render = FALSE}: Graph object which is of DiagrammeR's class \code{dgr_graph}.
 This could be useful if one wants to modify some of the graph attributes
-before rendering the graph with \code{\link[DiagrammeR:render_graph]{DiagrammeR::render_graph()}}.
+before rendering the graph with \code{DiagrammeR::render_graph()}.
 }
 }
 \description{
@@ -2,21 +2,20 @@
 % Please edit documentation in R/xgb.save.R
 \name{xgb.save}
 \alias{xgb.save}
-\title{Save xgboost model to binary file}
+\title{Save XGBoost model to binary file}
 \usage{
 xgb.save(model, fname)
 }
 \arguments{
 \item{model}{Model object of \code{xgb.Booster} class.}
 
-\item{fname}{Name of the file to write.
-
-Note that the extension of this file name determined the serialization format to use:\itemize{
-\item Extension ".ubj" will use the universal binary JSON format (recommended).
+\item{fname}{Name of the file to write. Its extension determines the serialization format:
+\itemize{
+\item ".ubj": Use the universal binary JSON format (recommended).
 This format uses binary types for e.g. floating point numbers, thereby preventing any loss
 of precision when converting to a human-readable JSON text or similar.
-\item Extension ".json" will use plain JSON, which is a human-readable format.
-\item Extension ".deprecated" will use a \bold{deprecated} binary format. This format will
+\item ".json": Use plain JSON, which is a human-readable format.
+\item ".deprecated": Use \strong{deprecated} binary format. This format will
 not be able to save attributes introduced after v1 of XGBoost, such as the "best_iteration"
 attribute that boosters might keep, nor feature names or user-specifiec attributes.
 \item If the format is not specified by passing one of the file extensions above, will
@@ -24,26 +23,25 @@ default to UBJ.
 }}
 }
 \description{
-Save xgboost model to a file in binary or JSON format.
+Save XGBoost model to a file in binary or JSON format.
 }
 \details{
-This methods allows to save a model in an xgboost-internal binary or text format which is universal
-among the various xgboost interfaces. In R, the saved model file could be read-in later
-using either the \code{\link{xgb.load}} function or the \code{xgb_model} parameter
-of \code{\link{xgb.train}}.
+This methods allows to save a model in an XGBoost-internal binary or text format which is universal
+among the various xgboost interfaces. In R, the saved model file could be read later
+using either the \code{\link[=xgb.load]{xgb.load()}} function or the \code{xgb_model} parameter of \code{\link[=xgb.train]{xgb.train()}}.
 
-Note: a model can also be saved as an R-object (e.g., by using \code{\link[base]{readRDS}}
-or \code{\link[base]{save}}). However, it would then only be compatible with R, and
-corresponding R-methods would need to be used to load it. Moreover, persisting the model with
-\code{\link[base]{readRDS}} or \code{\link[base]{save}}) might cause compatibility problems in
-future versions of XGBoost. Consult \code{\link{a-compatibility-note-for-saveRDS-save}} to learn
-how to persist models in a future-proof way, i.e. to make the model accessible in future
+Note: a model can also be saved as an R object (e.g., by using \code{\link[=readRDS]{readRDS()}}
+or \code{\link[=save]{save()}}). However, it would then only be compatible with R, and
+corresponding R methods would need to be used to load it. Moreover, persisting the model with
+\code{\link[=readRDS]{readRDS()}} or \code{\link[=save]{save()}} might cause compatibility problems in
+future versions of XGBoost. Consult \link{a-compatibility-note-for-saveRDS-save} to learn
+how to persist models in a future-proof way, i.e., to make the model accessible in future
 releases of XGBoost.
 }
 \examples{
 \dontshow{RhpcBLASctl::omp_set_num_threads(1)}
-data(agaricus.train, package='xgboost')
-data(agaricus.test, package='xgboost')
+data(agaricus.train, package = "xgboost")
+data(agaricus.test, package = "xgboost")
 
 ## Keep the number of threads to 1 for examples
 nthread <- 1
@@ -51,6 +49,7 @@ data.table::setDTthreads(nthread)
 
 train <- agaricus.train
 test <- agaricus.test
+
 bst <- xgb.train(
 data = xgb.DMatrix(train$data, label = train$label),
 max_depth = 2,
@@ -59,10 +58,11 @@ bst <- xgb.train(
 nrounds = 2,
 objective = "binary:logistic"
 )
+
 fname <- file.path(tempdir(), "xgb.ubj")
 xgb.save(bst, fname)
 bst <- xgb.load(fname)
 }
 \seealso{
-\code{\link{xgb.load}}
+\code{\link[=xgb.load]{xgb.load()}}
 }
@@ -2,37 +2,44 @@
 % Please edit documentation in R/xgb.save.raw.R
 \name{xgb.save.raw}
 \alias{xgb.save.raw}
-\title{Save xgboost model to R's raw vector,
-user can call xgb.load.raw to load the model back from raw vector}
+\title{Save XGBoost model to R's raw vector}
 \usage{
 xgb.save.raw(model, raw_format = "ubj")
 }
 \arguments{
-\item{model}{the model object.}
+\item{model}{The model object.}
 
-\item{raw_format}{The format for encoding the booster. Available options are
+\item{raw_format}{The format for encoding the booster:
 \itemize{
-\item \code{json}: Encode the booster into JSON text document.
-\item \code{ubj}: Encode the booster into Universal Binary JSON.
-\item \code{deprecated}: Encode the booster into old customized binary format.
+\item "json": Encode the booster into JSON text document.
+\item "ubj": Encode the booster into Universal Binary JSON.
+\item "deprecated": Encode the booster into old customized binary format.
 }}
 }
 \description{
-Save xgboost model from xgboost or xgb.train
+Save XGBoost model from \code{\link[=xgboost]{xgboost()}} or \code{\link[=xgb.train]{xgb.train()}}.
+
+Call \code{\link[=xgb.load.raw]{xgb.load.raw()}} to load the model back from raw vector.
 }
 \examples{
 \dontshow{RhpcBLASctl::omp_set_num_threads(1)}
-data(agaricus.train, package='xgboost')
-data(agaricus.test, package='xgboost')
+data(agaricus.train, package = "xgboost")
+data(agaricus.test, package = "xgboost")
 
-## Keep the number of threads to 2 for examples
-nthread <- 2
+## Keep the number of threads to 1 for examples
+nthread <- 1
 data.table::setDTthreads(nthread)
 
 train <- agaricus.train
 test <- agaricus.test
-bst <- xgb.train(data = xgb.DMatrix(train$data, label = train$label), max_depth = 2,
-eta = 1, nthread = nthread, nrounds = 2,objective = "binary:logistic")
+bst <- xgb.train(
+data = xgb.DMatrix(train$data, label = train$label),
+max_depth = 2,
+eta = 1,
+nthread = nthread,
+nrounds = 2,
+objective = "binary:logistic"
+)
 
 raw <- xgb.save.raw(bst)
 bst <- xgb.load.raw(raw)
@@ -18,10 +18,9 @@ xgb.slice.Booster(
 \item{model, x}{A fitted \code{xgb.Booster} object, which is to be sliced by taking only a subset
 of its rounds / iterations.}
 
-\item{start}{Start of the slice (base-1 and inclusive, like R's \link{seq}).}
+\item{start}{Start of the slice (base-1 and inclusive, like R's \code{\link[=seq]{seq()}}).}
 
-\item{end}{End of the slice (base-1 and inclusive, like R's \link{seq}).
-
+\item{end}{End of the slice (base-1 and inclusive, like R's \code{\link[=seq]{seq()}}).
 Passing a value of zero here is equivalent to passing the full number of rounds in the
 booster object.}
 
@@ -43,8 +42,10 @@ the resulting object.
 }
 \examples{
 data(mtcars)
+
 y <- mtcars$mpg
 x <- as.matrix(mtcars[, -1])
+
 dm <- xgb.DMatrix(x, label = y, nthread = 1)
 model <- xgb.train(data = dm, params = list(nthread = 1), nrounds = 5)
 model_slice <- xgb.slice.Booster(model, 1, 3)