[R] Finalizes switch to markdown doc (#10733)
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
parent 479ae8081b
commit 074cad2343
@@ -56,7 +56,7 @@
 #' It should match with argument `nrounds` passed to [xgb.train()] or [xgb.cv()].
 #'
 #' Note that boosting might be interrupted before reaching this last iteration, for
-#' example by using the early stopping callback \link{xgb.cb.early.stop}.
+#' example by using the early stopping callback [xgb.cb.early.stop()].
 #' - iteration Index of the iteration number that is being executed (first iteration
 #' will be the same as parameter `begin_iteration`, then next one will add +1, and so on).
 #'
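To make the relationship between `begin_iteration`, `end_iteration`, and `iteration` in the hunk above concrete, here is a minimal base-R sketch of a boosting-style loop that stops early. It does not call xgboost: the `scores` vector, the `patience` rule, and every name other than the three documented ones are hypothetical, and the stopping rule is a generic one rather than the logic of `xgb.cb.early.stop()`.

```r
# Hypothetical values; in xgboost, end_iteration should match `nrounds`.
begin_iteration <- 1L
end_iteration <- 10L

# Made-up validation errors, one per iteration (lower is better).
scores <- c(0.50, 0.42, 0.40, 0.41, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48)
patience <- 2L  # stop after this many iterations without improvement

best <- Inf
since_best <- 0L
last_run <- NA_integer_

for (iteration in seq(begin_iteration, end_iteration)) {
  last_run <- iteration  # first value equals begin_iteration, then +1 each time
  if (scores[iteration] < best) {
    best <- scores[iteration]
    since_best <- 0L
  } else {
    since_best <- since_best + 1L
  }
  if (since_best >= patience) break  # interrupted before end_iteration
}
```

With these made-up scores the loop ends before `end_iteration`, which is the situation the documentation warns about.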
@@ -222,12 +222,11 @@ xgb.get.handle <- function(object) {
 #' For multi-class and multi-target, will be a 4D array with dimensions `[nrows, ngroups, nfeats+1, nfeats+1]`
 #' }
 #'
-#' If passing `strict_shape=FALSE`, the result is always an array:\itemize{
-#' \item For normal predictions, the dimension is `[nrows, ngroups]`.
-#' \item For `predcontrib=TRUE`, the dimension is `[nrows, ngroups, nfeats+1]`.
-#' \item For `predinteraction=TRUE`, the dimension is `[nrows, ngroups, nfeats+1, nfeats+1]`.
-#' \item For `predleaf=TRUE`, the dimension is `[nrows, niter, ngroups, num_parallel_tree]`.
-#' }
+#' If passing `strict_shape=FALSE`, the result is always an array:
+#' - For normal predictions, the dimension is `[nrows, ngroups]`.
+#' - For `predcontrib=TRUE`, the dimension is `[nrows, ngroups, nfeats+1]`.
+#' - For `predinteraction=TRUE`, the dimension is `[nrows, ngroups, nfeats+1, nfeats+1]`.
+#' - For `predleaf=TRUE`, the dimension is `[nrows, niter, ngroups, num_parallel_tree]`.
 #'
 #' If passing `avoid_transpose=TRUE`, then the dimensions in all cases will be in reverse order - for
 #' example, for `predinteraction`, they will be `[nfeats+1, nfeats+1, ngroups, nrows]`
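The "reverse order" of dimensions described for `avoid_transpose=TRUE` can be illustrated with plain arrays. The sketch below uses only base R and made-up sizes (`nrows`, `ngroups`, `nfeats` are illustrative values); it shows how indices map between the two layouts and says nothing about how xgboost stores predictions internally.

```r
# Illustrative sizes only.
nrows <- 5L
ngroups <- 3L
nfeats <- 2L

# Shape documented for predinteraction: [nrows, ngroups, nfeats+1, nfeats+1]
dims <- c(nrows, ngroups, nfeats + 1L, nfeats + 1L)
arr <- array(seq_len(prod(dims)), dim = dims)

# aperm() with no `perm` argument reverses the dimension order,
# giving [nfeats+1, nfeats+1, ngroups, nrows].
rev_arr <- aperm(arr)

# The same element is reached by reversing the index order.
same_value <- arr[2, 3, 1, 2] == rev_arr[2, 1, 3, 2]
```

Code that consumes the reversed layout therefore has to reverse its indexing as well, for example reading `rev_arr[j, i, group, row]` instead of `arr[row, group, i, j]`.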
@@ -623,7 +622,7 @@ validate.features <- function(bst, newdata) {
 #' change the value of that parameter for a model.
 #' Use [xgb.parameters<-()] to set or change model parameters.
 #'
-#' The [xgb.attributes<-()] setter either updates the existing or adds one or several attributes,
+#' The `xgb.attributes<-` setter either updates the existing or adds one or several attributes,
 #' but it doesn't delete the other existing attributes.
 #'
 #' Important: since this modifies the booster's C object, semantics for assignment here
@@ -1,9 +1,9 @@
 #' Construct xgb.DMatrix object
 #'
 #' Construct an 'xgb.DMatrix' object from a given data source, which can then be passed to functions
-#' such as \link{xgb.train} or \link{predict.xgb.Booster}.
+#' such as [xgb.train()] or [predict()].
 #'
-#' Function 'xgb.QuantileDMatrix' will construct a DMatrix with quantization for the histogram
+#' Function `xgb.QuantileDMatrix()` will construct a DMatrix with quantization for the histogram
 #' method already applied to it, which can be used to reduce memory usage (compared to using a
 #' regular DMatrix first and then creating a quantization out of it) when using the histogram
 #' method (`tree_method = "hist"`, which is the default algorithm), but is not usable for the
@@ -24,20 +24,20 @@
 #'
 #' Other column types are not supported.
 #' \item CSR matrices, as class `dgRMatrix` from package `Matrix`.
-#' \item CSC matrices, as class `dgCMatrix` from package `Matrix`. These are \bold{not} supported for
+#' \item CSC matrices, as class `dgCMatrix` from package `Matrix`. These are **not** supported for
 #' 'xgb.QuantileDMatrix'.
 #' \item Single-row CSR matrices, as class `dsparseVector` from package `Matrix`, which is interpreted
 #' as a single row (only when making predictions from a fitted model).
 #' \item Text files in a supported format, passed as a `character` variable containing the URI path to
 #' the file, with an optional format specifier.
 #'
-#' These are \bold{not} supported for `xgb.QuantileDMatrix`. Supported formats are:\itemize{
-#' \item XGBoost's own binary format for DMatrices, as produced by \link{xgb.DMatrix.save}.
+#' These are **not** supported for `xgb.QuantileDMatrix`. Supported formats are:\itemize{
+#' \item XGBoost's own binary format for DMatrices, as produced by [xgb.DMatrix.save()].
 #' \item SVMLight (a.k.a. LibSVM) format for CSR matrices. This format can be signaled by suffix
 #' `?format=libsvm` at the end of the file path. It will be the default format if not
 #' otherwise specified.
 #' \item CSV files (comma-separated values). This format can be specified by adding suffix
-#' `?format=csv` at the end of the file path. It will \bold{not} be auto-deduced from file extensions.
+#' `?format=csv` at the end of the file path. It will **not** be auto-deduced from file extensions.
 #' }
 #'
 #' Be aware that the format of the file will not be auto-deduced - for example, if a file is named 'file.csv',
@@ -62,35 +62,32 @@
 #'
 #' In the case of multi-output models, one can also pass multi-dimensional base_margin.
 #' @param missing A float value to represent missing values in data (not used when creating DMatrix
-#' from text files).
-#' It is useful to change when a zero, infinite, or some other extreme value represents missing
-#' values in data.
+#' from text files). It is useful to change when a zero, infinite, or some other
+#' extreme value represents missing values in data.
 #' @param silent whether to suppress printing an informational message after loading from a file.
-#' @param feature_names Set names for features. Overrides column names in data
-#' frame and matrix.
+#' @param feature_names Set names for features. Overrides column names in data frame and matrix.
 #'
 #' Note: columns are not referenced by name when calling `predict`, so the column order there
 #' must be the same as in the DMatrix construction, regardless of the column names.
 #' @param feature_types Set types for features.
 #'
-#' If `data` is a `data.frame` and `feature_types` is not supplied, feature types will be deduced
-#' automatically from the column types.
+#' If `data` is a `data.frame` and `feature_types` is not supplied,
+#' feature types will be deduced automatically from the column types.
 #'
 #' Otherwise, one can pass a character vector with the same length as number of columns in `data`,
-#' with the following possible values:\itemize{
-#' \item "c", which represents categorical columns.
-#' \item "q", which represents numeric columns.
-#' \item "int", which represents integer columns.
-#' \item "i", which represents logical (boolean) columns.
-#' }
+#' with the following possible values:
+#' - "c", which represents categorical columns.
+#' - "q", which represents numeric columns.
+#' - "int", which represents integer columns.
+#' - "i", which represents logical (boolean) columns.
 #'
 #' Note that, while categorical types are treated differently from the rest for model fitting
 #' purposes, the other types do not influence the generated model, but have effects in other
 #' functionalities such as feature importances.
 #'
-#' \bold{Important}: categorical features, if specified manually through `feature_types`, must
+#' **Important**: Categorical features, if specified manually through `feature_types`, must
 #' be encoded as integers with numeration starting at zero, and the same encoding needs to be
-#' applied when passing data to `predict`. Even if passing `factor` types, the encoding will
+#' applied when passing data to [predict()]. Even if passing `factor` types, the encoding will
 #' not be saved, so make sure that `factor` columns passed to `predict` have the same `levels`.
 #' @param nthread Number of threads used for creating DMatrix.
 #' @param group Group size for all ranking groups.
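The note on categorical features above (zero-based integer codes, identical encoding at prediction time) is easy to get wrong, so a small base-R sketch may help. It does not call xgboost, and the colour values and variable names are made up. It only shows how to derive zero-based codes from a `factor` and why the `levels` must be fixed explicitly.

```r
# Levels fixed once, at training time (hypothetical categories).
train_levels <- c("blue", "green", "red")

train_col <- factor(c("red", "blue", "green", "red"), levels = train_levels)
# as.integer() on a factor is 1-based, so subtract 1 for zero-based codes.
train_codes <- as.integer(train_col) - 1L

# New data: reuse the SAME levels so each category keeps its code.
new_col <- factor(c("green", "red"), levels = train_levels)
new_codes <- as.integer(new_col) - 1L

# Pitfall: letting factor() infer levels from the new data alone
# silently renumbers the categories.
bad_codes <- as.integer(factor(c("green", "red"))) - 1L

# A manually specified type vector would mark this column as categorical.
feature_types <- c("c")
```

Here "green" maps to 1 under the training levels but to 0 when the levels are inferred from the new data, which is exactly the mismatch the documentation warns about.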
@@ -109,13 +106,14 @@
 #' subclass 'xgb.QuantileDMatrix'.
 #'
 #' @details
-#' Note that DMatrix objects are not serializable through R functions such as \code{saveRDS} or \code{save}.
+#' Note that DMatrix objects are not serializable through R functions such as [saveRDS()] or [save()].
 #' If a DMatrix gets serialized and then de-serialized (for example, when saving data in an R session or caching
 #' chunks in an Rmd file), the resulting object will not be usable anymore and will need to be reconstructed
 #' from the original source of data.
 #'
 #' @examples
-#' data(agaricus.train, package='xgboost')
+#' data(agaricus.train, package = "xgboost")
+#'
 #' ## Keep the number of threads to 1 for examples
 #' nthread <- 1
 #' data.table::setDTthreads(nthread)
@@ -318,7 +316,7 @@ xgb.DMatrix <- function(
 }
 
 #' @param ref The training dataset that provides quantile information, needed when creating
-#' validation/test dataset with `xgb.QuantileDMatrix`. Supplying the training DMatrix
+#' validation/test dataset with [xgb.QuantileDMatrix()]. Supplying the training DMatrix
 #' as a reference means that the same quantisation applied to the training data is
 #' applied to the validation/test data
 #' @param max_bin The number of histogram bins, should be consistent with the training parameter
@@ -411,31 +409,33 @@ xgb.QuantileDMatrix <- function(
   return(dmat)
 }
 
-#' @title XGBoost Data Iterator
-#' @description Interface to create a custom data iterator in order to construct a DMatrix
+#' XGBoost Data Iterator
+#'
+#' @description
+#' Interface to create a custom data iterator in order to construct a DMatrix
 #' from external memory.
 #'
 #' This function is responsible for generating an R object structure containing callback
 #' functions and an environment shared with them.
 #'
-#' The output structure from this function is then meant to be passed to \link{xgb.ExternalDMatrix},
+#' The output structure from this function is then meant to be passed to [xgb.ExternalDMatrix()],
 #' which will consume the data and create a DMatrix from it by executing the callback functions.
 #'
-#' For more information, and for a usage example, see the documentation for \link{xgb.ExternalDMatrix}.
+#' For more information, and for a usage example, see the documentation for [xgb.ExternalDMatrix()].
+#'
 #' @param env An R environment to pass to the callback functions supplied here, which can be
 #' used to keep track of variables to determine how to handle the batches.
 #'
 #' For example, one might want to keep track of an iteration number in this environment in order
 #' to know which part of the data to pass next.
-#' @param f_next `function(env)` which is responsible for:\itemize{
-#' \item Accessing or retrieving the next batch of data in the iterator.
-#' \item Supplying this data by calling function \link{xgb.DataBatch} on it and returning the result.
-#' \item Keeping track of where in the iterator batch it is or will go next, which can for example
+#' @param f_next `function(env)` which is responsible for:
+#' - Accessing or retrieving the next batch of data in the iterator.
+#' - Supplying this data by calling function [xgb.DataBatch()] on it and returning the result.
+#' - Keeping track of where in the iterator batch it is or will go next, which can for example
 #' be done by modifying variables in the `env` variable that is passed here.
-#' \item Signaling whether there are more batches to be consumed or not, by returning `NULL`
+#' - Signaling whether there are more batches to be consumed or not, by returning `NULL`
 #' when the stream of data ends (all batches in the iterator have been consumed), or the result from
-#' calling \link{xgb.DataBatch} when there are more batches in the line to be consumed.
-#' }
+#' calling [xgb.DataBatch()] when there are more batches in the line to be consumed.
 #' @param f_reset `function(env)` which is responsible for resetting the data iterator
 #' (i.e. taking it back to the first batch, called before and after the sequence of batches
 #' has been consumed).
@@ -443,8 +443,8 @@ xgb.QuantileDMatrix <- function(
 #' Note that, after resetting the iterator, the batches will be accessed again, so the same data
 #' (and in the same order) must be passed in subsequent iterations.
 #' @return An `xgb.DataIter` object, containing the same inputs supplied here, which can then
-#' be passed to \link{xgb.ExternalDMatrix}.
-#' @seealso \link{xgb.ExternalDMatrix}, \link{xgb.DataBatch}.
+#' be passed to [xgb.ExternalDMatrix()].
+#' @seealso [xgb.ExternalDMatrix()], [xgb.DataBatch()].
 #' @export
 xgb.DataIter <- function(env = new.env(), f_next, f_reset) {
   if (!is.function(f_next)) {
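The contract documented for `f_next` and `f_reset` (return a batch or `NULL`, track position in `env`, reset before and after a pass) can be exercised without xgboost. The sketch below is a base-R mock under stated assumptions: `make_batch()` is a hypothetical stand-in for `xgb.DataBatch()`, `consume()` is a hypothetical stand-in for whatever consumes the iterator, and the two-batch split of `mtcars` is arbitrary.

```r
# Hypothetical stand-in for xgb.DataBatch(): it only records the inputs.
make_batch <- function(data, label) {
  structure(list(data = data, label = label), class = "toy_batch")
}

# Shared state: the iteration counter lives in the environment.
iterator_env <- as.environment(
  list(iter = 0L, x = as.matrix(mtcars[, -1]), y = mtcars[, 1])
)

f_next <- function(env) {
  curr <- env[["iter"]]
  if (curr >= 2L) {
    return(NULL)  # signals that all batches have been consumed
  }
  rows <- if (curr == 0L) 1:16 else 17:32
  env[["iter"]] <- curr + 1L  # remember where to go next
  make_batch(env[["x"]][rows, , drop = FALSE], env[["y"]][rows])
}

f_reset <- function(env) {
  env[["iter"]] <- 0L  # back to the first batch
  invisible(NULL)
}

# Hypothetical consumer: resets, drains all batches, resets again.
consume <- function(env) {
  f_reset(env)
  n <- 0L
  while (!is.null(batch <- f_next(env))) {
    n <- n + nrow(batch$data)
  }
  f_reset(env)
  n
}

first_pass <- consume(iterator_env)
second_pass <- consume(iterator_env)  # same data, same order, after reset
```

In real use, `f_next` would return the result of `xgb.DataBatch()` instead of `make_batch()`, and the three pieces would be bundled with `xgb.DataIter(env = iterator_env, f_next = f_next, f_reset = f_reset)`, matching the signature shown in the hunk above.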
@@ -508,38 +508,39 @@ xgb.DataIter <- function(env = new.env(), f_next, f_reset) {
   return(out)
 }
 
-#' @title Structure for Data Batches
-#' @description Helper function to supply data in batches of a data iterator when
-#' constructing a DMatrix from external memory through \link{xgb.ExternalDMatrix}
-#' or through \link{xgb.QuantileDMatrix.from_iterator}.
+#' Structure for Data Batches
 #'
-#' This function is \bold{only} meant to be called inside of a callback function (which
-#' is passed as argument to function \link{xgb.DataIter} to construct a data iterator)
+#' @description
+#' Helper function to supply data in batches of a data iterator when
+#' constructing a DMatrix from external memory through [xgb.ExternalDMatrix()]
+#' or through [xgb.QuantileDMatrix.from_iterator()].
+#'
+#' This function is **only** meant to be called inside of a callback function (which
+#' is passed as argument to function [xgb.DataIter()] to construct a data iterator)
 #' when constructing a DMatrix through external memory - otherwise, one should call
-#' \link{xgb.DMatrix} or \link{xgb.QuantileDMatrix}.
+#' [xgb.DMatrix()] or [xgb.QuantileDMatrix()].
 #'
-#' The object that results from calling this function directly is \bold{not} like
+#' The object that results from calling this function directly is **not** like
 #' an `xgb.DMatrix` - i.e. cannot be used to train a model, nor to get predictions - only
 #' possible usage is to supply data to an iterator, from which a DMatrix is then constructed.
 #'
-#' For more information and for example usage, see the documentation for \link{xgb.ExternalDMatrix}.
+#' For more information and for example usage, see the documentation for [xgb.ExternalDMatrix()].
 #' @inheritParams xgb.DMatrix
 #' @param data Batch of data belonging to this batch.
 #'
-#' Note that not all of the input types supported by \link{xgb.DMatrix} are possible
-#' to pass here. Supported types are:\itemize{
-#' \item `matrix`, with types `numeric`, `integer`, and `logical`. Note that for types
+#' Note that not all of the input types supported by [xgb.DMatrix()] are possible
+#' to pass here. Supported types are:
+#' - `matrix`, with types `numeric`, `integer`, and `logical`. Note that for types
 #' `integer` and `logical`, missing values might not be automatically recognized as
-#' such - see the documentation for parameter `missing` in \link{xgb.ExternalDMatrix}
+#' such - see the documentation for parameter `missing` in [xgb.ExternalDMatrix()]
 #' for details on this.
-#' \item `data.frame`, with the same types as supported by 'xgb.DMatrix' and same
+#' - `data.frame`, with the same types as supported by 'xgb.DMatrix' and same
 #' conversions applied to it. See the documentation for parameter `data` in
-#' \link{xgb.DMatrix} for details on it.
-#' \item CSR matrices, as class `dgRMatrix` from package `Matrix`.
-#' }
+#' [xgb.DMatrix()] for details on it.
+#' - CSR matrices, as class `dgRMatrix` from package "Matrix".
 #' @return An object of class `xgb.DataBatch`, which is just a list containing the
-#' data and parameters passed here. It does \bold{not} inherit from `xgb.DMatrix`.
-#' @seealso \link{xgb.DataIter}, \link{xgb.ExternalDMatrix}.
+#' data and parameters passed here. It does **not** inherit from `xgb.DMatrix`.
+#' @seealso [xgb.DataIter()], [xgb.ExternalDMatrix()].
 #' @export
 xgb.DataBatch <- function(
   data,
@@ -616,42 +617,43 @@ xgb.ProxyDMatrix <- function(proxy_handle, data_iterator) {
   return(1L)
 }
 
-#' @title DMatrix from External Data
-#' @description Create a special type of xgboost 'DMatrix' object from external data
-#' supplied by an \link{xgb.DataIter} object, potentially passed in batches from a
+#' DMatrix from External Data
+#'
+#' @description
+#' Create a special type of XGBoost 'DMatrix' object from external data
+#' supplied by an [xgb.DataIter()] object, potentially passed in batches from a
 #' bigger set that might not fit entirely in memory.
 #'
 #' The data supplied by the iterator is accessed on-demand as needed, multiple times,
-#' without being concatenated, but note that fields like 'label' \bold{will} be
+#' without being concatenated, but note that fields like 'label' **will** be
 #' concatenated from multiple calls to the data iterator.
 #'
 #' For more information, see the guide 'Using XGBoost External Memory Version':
 #' \url{https://xgboost.readthedocs.io/en/stable/tutorials/external_memory.html}
 #' @inheritParams xgb.DMatrix
-#' @param data_iterator A data iterator structure as returned by \link{xgb.DataIter},
+#' @param data_iterator A data iterator structure as returned by [xgb.DataIter()],
 #' which includes an environment shared between function calls, and functions to access
 #' the data in batches on-demand.
 #' @param cache_prefix The path of cache file, caller must initialize all the directories in this path.
 #' @param missing A float value to represent missing values in data.
 #'
-#' Note that, while functions like \link{xgb.DMatrix} can take a generic `NA` and interpret it
+#' Note that, while functions like [xgb.DMatrix()] can take a generic `NA` and interpret it
 #' correctly for different types like `numeric` and `integer`, if an `NA` value is passed here,
 #' it will not be adapted for different input types.
 #'
 #' For example, in R `integer` types, missing values are represented by integer number `-2147483648`
 #' (since machine 'integer' types do not have an inherent 'NA' value) - hence, if one passes `NA`,
-#' which is interpreted as a floating-point NaN by 'xgb.ExternalDMatrix' and by
-#' 'xgb.QuantileDMatrix.from_iterator', these integer missing values will not be treated as missing.
+#' which is interpreted as a floating-point NaN by [xgb.ExternalDMatrix()] and by
+#' [xgb.QuantileDMatrix.from_iterator()], these integer missing values will not be treated as missing.
 #' This should not pose any problem for `numeric` types, since they do have an inherent NaN value.
 #' @return An 'xgb.DMatrix' object, with subclass 'xgb.ExternalDMatrix', in which the data is not
 #' held internally but accessed through the iterator when needed.
-#' @seealso \link{xgb.DataIter}, \link{xgb.DataBatch}, \link{xgb.QuantileDMatrix.from_iterator}
+#' @seealso [xgb.DataIter()], [xgb.DataBatch()], [xgb.QuantileDMatrix.from_iterator()]
 #' @examples
-#' library(xgboost)
 #' data(mtcars)
 #'
-#' # this custom environment will be passed to the iterator
-#' # functions at each call. It's up to the user to keep
+#' # This custom environment will be passed to the iterator
+#' # functions at each call. It is up to the user to keep
 #' # track of the iteration number in this environment.
 #' iterator_env <- as.environment(
 #'   list(
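The remark about integer `NA` values in the `missing` parameter can be checked in base R. The sketch below does not call xgboost. It shows that R's integer type has no room below `-.Machine$integer.max` (the documentation above states that the next value down is the one used for `NA`), and then shows one possible workaround, offered here as an assumption rather than as the package's recommendation: convert integer batches to `numeric` before handing them over, so that missing entries become floating-point missing values.

```r
# A small integer matrix with one missing entry (made-up data).
x_int <- matrix(c(1L, NA_integer_, 3L, 4L), nrow = 2)

# The largest integer R can represent; the smallest usable one is its negative.
int_max <- .Machine$integer.max

# Stepping one below the smallest usable integer overflows to NA (with a
# warning, suppressed here), consistent with that slot being reserved.
overflow <- suppressWarnings(-int_max - 1L)

# Possible workaround (assumption): store the batch as double, so the
# missing entry is a floating-point NA instead of an integer sentinel.
x_num <- x_int
storage.mode(x_num) <- "double"
```

After the conversion, `x_num` keeps its shape and its missing entry, but the entry is now of the `numeric` kind that the documentation says poses no problem.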
@@ -758,25 +760,27 @@ xgb.ExternalDMatrix <- function(
 }
 
 
-#' @title QuantileDMatrix from External Data
-#' @description Create an `xgb.QuantileDMatrix` object (exact same class as would be returned by
-#' calling function \link{xgb.QuantileDMatrix}, with the same advantages and limitations) from
-#' external data supplied by an \link{xgb.DataIter} object, potentially passed in batches from
-#' a bigger set that might not fit entirely in memory, same way as \link{xgb.ExternalDMatrix}.
+#' QuantileDMatrix from External Data
+#'
+#' @description
+#' Create an `xgb.QuantileDMatrix` object (exact same class as would be returned by
+#' calling function [xgb.QuantileDMatrix()], with the same advantages and limitations) from
+#' external data supplied by [xgb.DataIter()], potentially passed in batches from
+#' a bigger set that might not fit entirely in memory, same way as [xgb.ExternalDMatrix()].
 #'
 #' Note that, while external data will only be loaded through the iterator (thus the full data
 #' might not be held entirely in-memory), the quantized representation of the data will get
 #' created in-memory, being concatenated from multiple calls to the data iterator. The quantized
 #' version is typically lighter than the original data, so there might be cases in which this
-#' representation could potentially fit in memory even if the full data doesn't.
+#' representation could potentially fit in memory even if the full data does not.
 #'
 #' For more information, see the guide 'Using XGBoost External Memory Version':
 #' \url{https://xgboost.readthedocs.io/en/stable/tutorials/external_memory.html}
 #' @inheritParams xgb.ExternalDMatrix
 #' @inheritParams xgb.QuantileDMatrix
 #' @return An 'xgb.DMatrix' object, with subclass 'xgb.QuantileDMatrix'.
-#' @seealso \link{xgb.DataIter}, \link{xgb.DataBatch}, \link{xgb.ExternalDMatrix},
-#' \link{xgb.QuantileDMatrix}
+#' @seealso [xgb.DataIter()], [xgb.DataBatch()], [xgb.ExternalDMatrix()],
+#' [xgb.QuantileDMatrix()]
 #' @export
 xgb.QuantileDMatrix.from_iterator <- function( # nolint
   data_iterator,
@@ -823,18 +827,18 @@ xgb.QuantileDMatrix.from_iterator <- function( # nolint
   return(dmat)
 }
 
-#' @title Check whether DMatrix object has a field
-#' @description Checks whether an xgb.DMatrix object has a given field assigned to
+#' Check whether DMatrix object has a field
+#'
+#' Checks whether an xgb.DMatrix object has a given field assigned to
 #' it, such as weights, labels, etc.
-#' @param object The DMatrix object to check for the given \code{info} field.
-#' @param info The field to check for presence or absence in \code{object}.
-#' @seealso \link{xgb.DMatrix}, \link{getinfo.xgb.DMatrix}, \link{setinfo.xgb.DMatrix}
+#' @param object The DMatrix object to check for the given `info` field.
+#' @param info The field to check for presence or absence in `object`.
+#' @seealso [xgb.DMatrix()], [getinfo.xgb.DMatrix()], [setinfo.xgb.DMatrix()]
 #' @examples
-#' library(xgboost)
 #' x <- matrix(1:10, nrow = 5)
 #' dm <- xgb.DMatrix(x, nthread = 1)
 #'
-#' # 'dm' so far doesn't have any fields set
+#' # 'dm' so far does not have any fields set
 #' xgb.DMatrix.hasinfo(dm, "label")
 #'
 #' # Fields can be added after construction
@@ -855,17 +859,19 @@ xgb.DMatrix.hasinfo <- function(object, info) {
 
 #' Dimensions of xgb.DMatrix
 #'
-#' Returns a vector of numbers of rows and of columns in an \code{xgb.DMatrix}.
-#' @param x Object of class \code{xgb.DMatrix}
+#' Returns a vector of numbers of rows and of columns in an `xgb.DMatrix`.
+#'
+#' @param x Object of class `xgb.DMatrix`
 #'
 #' @details
-#' Note: since \code{nrow} and \code{ncol} internally use \code{dim}, they can also
-#' be directly used with an \code{xgb.DMatrix} object.
+#' Note: since [nrow()] and [ncol()] internally use [dim()], they can also
+#' be directly used with an `xgb.DMatrix` object.
 #'
 #' @examples
-#' data(agaricus.train, package='xgboost')
+#' data(agaricus.train, package = "xgboost")
+#'
 #' train <- agaricus.train
-#' dtrain <- xgb.DMatrix(train$data, label=train$label, nthread = 2)
+#' dtrain <- xgb.DMatrix(train$data, label = train$label, nthread = 2)
 #'
 #' stopifnot(nrow(dtrain) == nrow(train$data))
 #' stopifnot(ncol(dtrain) == ncol(train$data))
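The remark that `nrow()` and `ncol()` work because they call `dim()` internally is a general property of S3 dispatch in R, and it can be reproduced with a toy class. The sketch below is base R only. The class name `toy_dmatrix` and its fields are hypothetical and stand in for an object whose data lives outside of R.

```r
# Toy constructor: the "data" is represented only by its shape.
new_toy <- function(n_rows, n_cols) {
  structure(list(n_rows = n_rows, n_cols = n_cols), class = "toy_dmatrix")
}

# Defining a dim() method is enough: nrow() and ncol() are implemented
# as dim(x)[1L] and dim(x)[2L], so they dispatch to this method too.
dim.toy_dmatrix <- function(x) {
  c(x$n_rows, x$n_cols)
}

obj <- new_toy(5L, 2L)
dims <- dim(obj)
n_r <- nrow(obj)
n_c <- ncol(obj)
```

No separate `nrow` or `ncol` methods are needed, which is why a single `dim.xgb.DMatrix` method is sufficient in the package.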
@@ -877,27 +883,28 @@ dim.xgb.DMatrix <- function(x) {
 }
 
 
-#' Handling of column names of \code{xgb.DMatrix}
+#' Handling of column names of `xgb.DMatrix`
 #'
-#' Only column names are supported for \code{xgb.DMatrix}, thus setting of
-#' row names would have no effect and returned row names would be NULL.
+#' Only column names are supported for `xgb.DMatrix`, thus setting of
+#' row names would have no effect and returned row names would be `NULL`.
 #'
-#' @param x object of class \code{xgb.DMatrix}
-#' @param value a list of two elements: the first one is ignored
+#' @param x Object of class `xgb.DMatrix`.
+#' @param value A list of two elements: the first one is ignored
 #' and the second one is column names
 #'
 #' @details
-#' Generic \code{dimnames} methods are used by \code{colnames}.
-#' Since row names are irrelevant, it is recommended to use \code{colnames} directly.
+#' Generic [dimnames()] methods are used by [colnames()].
+#' Since row names are irrelevant, it is recommended to use [colnames()] directly.
 #'
 #' @examples
-#' data(agaricus.train, package='xgboost')
+#' data(agaricus.train, package = "xgboost")
+#'
 #' train <- agaricus.train
-#' dtrain <- xgb.DMatrix(train$data, label=train$label, nthread = 2)
+#' dtrain <- xgb.DMatrix(train$data, label = train$label, nthread = 2)
 #' dimnames(dtrain)
 #' colnames(dtrain)
 #' colnames(dtrain) <- make.names(1:ncol(train$data))
-#' print(dtrain, verbose=TRUE)
+#' print(dtrain, verbose = TRUE)
 #'
 #' @rdname dimnames.xgb.DMatrix
 #' @export
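The statement that `colnames()` relies on the generic `dimnames()` methods can likewise be reproduced with a toy S3 class. The sketch below is base R only. The class `toy_named` and its `cols` field are hypothetical, and the methods mirror the documented behaviour (row names are always `NULL`, and the first element of `value` is ignored) without claiming to match the package's implementation.

```r
# Toy object that only stores column names.
obj <- structure(list(cols = c("a", "b")), class = "toy_named")

# Getter: row names are never available, so the first element is NULL.
dimnames.toy_named <- function(x) {
  list(NULL, x$cols)
}

# Setter: `value` is a list of two elements; the first one is ignored.
`dimnames<-.toy_named` <- function(x, value) {
  x$cols <- value[[2L]]
  x
}

before <- colnames(obj)          # goes through dimnames.toy_named
colnames(obj) <- c("f1", "f2")   # goes through `dimnames<-.toy_named`
after <- colnames(obj)
rows <- rownames(obj)            # always NULL for this class
```

Because `colnames()` and `colnames<-` are written in terms of `dimnames()`, defining the two `dimnames` methods is all that is required.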
@@ -926,47 +933,45 @@ dimnames.xgb.DMatrix <- function(x) {
 }
 
 
-#' @title Get or set information of xgb.DMatrix and xgb.Booster objects
-#' @param object Object of class \code{xgb.DMatrix} of `xgb.Booster`.
-#' @param name the name of the information field to get (see details)
-#' @return For `getinfo`, will return the requested field. For `setinfo`, will always return value `TRUE`
-#' if it succeeds.
-#' @details
-#' The \code{name} field can be one of the following for `xgb.DMatrix`:
+#' Get or set information of xgb.DMatrix and xgb.Booster objects
 #'
-#' \itemize{
-#' \item \code{label}
-#' \item \code{weight}
-#' \item \code{base_margin}
-#' \item \code{label_lower_bound}
-#' \item \code{label_upper_bound}
-#' \item \code{group}
-#' \item \code{feature_type}
-#' \item \code{feature_name}
-#' \item \code{nrow}
-#' }
-#' See the documentation for \link{xgb.DMatrix} for more information about these fields.
+#' @param object Object of class `xgb.DMatrix` or `xgb.Booster`.
+#' @param name The name of the information field to get (see details).
+#' @return For `getinfo()`, will return the requested field. For `setinfo()`,
+#' will always return value `TRUE` if it succeeds.
+#' @details
+#' The `name` field can be one of the following for `xgb.DMatrix`:
+#' - label
+#' - weight
+#' - base_margin
+#' - label_lower_bound
+#' - label_upper_bound
+#' - group
+#' - feature_type
+#' - feature_name
+#' - nrow
+#'
+#' See the documentation for [xgb.DMatrix()] for more information about these fields.
 #'
 #' For `xgb.Booster`, can be one of the following:
-#' \itemize{
-#' \item \code{feature_type}
-#' \item \code{feature_name}
-#' }
+#' - `feature_type`
+#' - `feature_name`
 #'
-#' Note that, while 'qid' cannot be retrieved, it's possible to get the equivalent 'group'
+#' Note that, while 'qid' cannot be retrieved, it is possible to get the equivalent 'group'
 #' for a DMatrix that had 'qid' assigned.
 #'
-#' \bold{Important}: when calling `setinfo`, the objects are modified in-place. See
-#' \link{xgb.copy.Booster} for an idea of how this in-place assignment works.
+#' **Important**: when calling [setinfo()], the objects are modified in-place. See
+#' [xgb.copy.Booster()] for an idea of how this in-place assignment works.
 #' @examples
-#' data(agaricus.train, package='xgboost')
+#' data(agaricus.train, package = "xgboost")
+#'
 #' dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
 #'
-#' labels <- getinfo(dtrain, 'label')
-#' setinfo(dtrain, 'label', 1-labels)
+#' labels <- getinfo(dtrain, "label")
+#' setinfo(dtrain, "label", 1 - labels)
 #'
-#' labels2 <- getinfo(dtrain, 'label')
-#' stopifnot(all(labels2 == 1-labels))
+#' labels2 <- getinfo(dtrain, "label")
+#' stopifnot(all(labels2 == 1 - labels))
 #' @rdname getinfo
 #' @export
 getinfo <- function(object, name) UseMethod("getinfo")
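The warning that `setinfo()` modifies objects in-place runs against R's usual copy-on-modify behaviour, so a contrast may be useful. The sketch below is base R only and uses an environment as a hypothetical stand-in for an object with reference semantics. An `xgb.DMatrix` wraps a handle to data held outside R rather than an environment, so this illustrates the semantics only, not the mechanism.

```r
# Ordinary R list: the function receives a copy, the caller's object is untouched
# unless the result is assigned back.
set_label_copy <- function(obj, value) {
  obj$label <- value
  obj
}
lst <- list(label = c(0, 1))
set_label_copy(lst, c(1, 0))  # result discarded on purpose

# Reference-like object: the change is visible to the caller without
# re-assignment, and the setter just reports success, as setinfo() does.
set_label_inplace <- function(obj, value) {
  obj$label <- value
  invisible(TRUE)
}
env <- new.env()
env$label <- c(0, 1)
ok <- set_label_inplace(env, c(1, 0))
```

After these calls `lst$label` is still `c(0, 1)`, while `env$label` has become `c(1, 0)`. The second behaviour is the one to expect from `setinfo()`.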
@ -1011,28 +1016,29 @@ getinfo.xgb.DMatrix <- function(object, name) {
|
||||
}
|
||||
|
||||
#' @rdname getinfo
|
||||
#' @param info the specific field of information to set
|
||||
#' @param info The specific field of information to set.
|
||||
#'
|
||||
#' @details
|
||||
#' See the documentation for \link{xgb.DMatrix} for possible fields that can be set
|
||||
#' See the documentation for [xgb.DMatrix()] for possible fields that can be set
|
||||
#' (which correspond to arguments in that function).
|
||||
#'
|
||||
#' Note that the following fields are allowed in the construction of an \code{xgb.DMatrix}
|
||||
#' but \bold{aren't} allowed here:\itemize{
|
||||
#' \item data
|
||||
#' \item missing
|
||||
#' \item silent
|
||||
#' \item nthread
|
||||
#' }
|
||||
#' Note that the following fields are allowed in the construction of an `xgb.DMatrix`
|
||||
#' but **are not** allowed here:
|
||||
#' - data
|
||||
#' - missing
|
||||
#' - silent
|
||||
#' - nthread
|
||||
#'
|
||||
#' @examples
|
||||
#' data(agaricus.train, package='xgboost')
|
||||
#' data(agaricus.train, package = "xgboost")
|
||||
#'
|
||||
#' dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||
#'
|
||||
#' labels <- getinfo(dtrain, 'label')
|
||||
#' setinfo(dtrain, 'label', 1-labels)
|
||||
#' labels2 <- getinfo(dtrain, 'label')
|
||||
#' stopifnot(all.equal(labels2, 1-labels))
|
||||
#' labels <- getinfo(dtrain, "label")
|
||||
#' setinfo(dtrain, "label", 1 - labels)
|
||||
#'
|
||||
#' labels2 <- getinfo(dtrain, "label")
|
||||
#' stopifnot(all.equal(labels2, 1 - labels))
|
||||
#' @export
|
||||
setinfo <- function(object, name, info) UseMethod("setinfo")
|
||||
|
||||
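The in-place behaviour noted for `setinfo()` above can be illustrated with a short sketch. It reuses the `agaricus.train` data from the examples; the variable name `alias` is only illustrative, and the sketch assumes (as the documentation states) that assigning a DMatrix to a second name does not copy the underlying object.

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))

labels <- getinfo(dtrain, "label")

# `alias` is a second name for the same underlying DMatrix (illustrative name).
alias <- dtrain
setinfo(alias, "label", 1 - labels)

# The change made through `alias` is also visible through `dtrain`.
labels_after <- getinfo(dtrain, "label")
stopifnot(isTRUE(all.equal(labels_after, 1 - labels)))
```

If independent copies are needed, the object has to be rebuilt rather than re-assigned, since plain assignment only creates another reference.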
@ -1117,9 +1123,11 @@ setinfo.xgb.DMatrix <- function(object, name, info) {
|
||||
stop("setinfo: unknown info name ", name)
|
||||
}
|
||||
|
||||
#' @title Get Quantile Cuts from DMatrix
|
||||
#' @description Get the quantile cuts (a.k.a. borders) from an `xgb.DMatrix`
|
||||
#' that has been quantized for the histogram method (`tree_method="hist"`).
|
||||
#' Get Quantile Cuts from DMatrix
|
||||
#'
|
||||
#' @description
|
||||
#' Get the quantile cuts (a.k.a. borders) from an `xgb.DMatrix`
|
||||
#' that has been quantized for the histogram method (`tree_method = "hist"`).
|
||||
#'
|
||||
#' These cuts are used in order to assign observations to bins - i.e. these are ordered
|
||||
#' boundaries which are used to determine assignment condition `border_low < x < border_high`.
|
||||
@ -1130,19 +1138,18 @@ setinfo.xgb.DMatrix <- function(object, name, info) {
|
||||
#' which will be output in sorted order from lowest to highest.
|
||||
#'
|
||||
#' Different columns can have different numbers of bins according to their range.
|
||||
#' @param dmat An `xgb.DMatrix` object, as returned by \link{xgb.DMatrix}.
|
||||
#' @param output Output format for the quantile cuts. Possible options are:\itemize{
|
||||
#' \item `"list"` will return the output as a list with one entry per column, where
|
||||
#' @param dmat An `xgb.DMatrix` object, as returned by [xgb.DMatrix()].
|
||||
#' @param output Output format for the quantile cuts. Possible options are:
|
||||
#' - `"list"` will return the output as a list with one entry per column, where
|
||||
#' each column will have a numeric vector with the cuts. The list will be named if
|
||||
#' `dmat` has column names assigned to it.
|
||||
#' \item `"arrays"` will return a list with entries `indptr` (base-0 indexing) and
|
||||
#' - `"arrays"` will return a list with entries `indptr` (base-0 indexing) and
|
||||
#' `data`. Here, the cuts for column 'i' are obtained by slicing 'data' from entries
|
||||
#' `indptr[i]+1` to `indptr[i+1]`.
|
||||
#' }
|
||||
#' `indptr[i]+1` to `indptr[i+1]`.
|
||||
#' @return The quantile cuts, in the format specified by parameter `output`.
|
||||
#' @examples
|
||||
#' library(xgboost)
|
||||
#' data(mtcars)
|
||||
#'
|
||||
#' y <- mtcars$mpg
|
||||
#' x <- as.matrix(mtcars[, -1])
|
||||
#' dm <- xgb.DMatrix(x, label = y, nthread = 1)
|
||||
@ -1150,11 +1157,7 @@ setinfo.xgb.DMatrix <- function(object, name, info) {
|
||||
#' # DMatrix is not quantized right away, but will be once a hist model is generated
|
||||
#' model <- xgb.train(
|
||||
#' data = dm,
|
||||
#' params = list(
|
||||
#' tree_method = "hist",
|
||||
#' max_bin = 8,
|
||||
#' nthread = 1
|
||||
#' ),
|
||||
#' params = list(tree_method = "hist", max_bin = 8, nthread = 1),
|
||||
#' nrounds = 3
|
||||
#' )
|
||||
#'
|
||||
@ -1189,17 +1192,19 @@ xgb.get.DMatrix.qcut <- function(dmat, output = c("list", "arrays")) { # nolint
|
||||
}
|
||||
}
|
||||
|
||||
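The `"arrays"` layout described for `xgb.get.DMatrix.qcut()` can be unpacked with plain R indexing. The sketch below uses made-up values for `indptr` and `data` (they are not output from a real DMatrix) and a hypothetical helper `cuts_for_column()` to show the slicing rule `indptr[i]+1` to `indptr[i+1]`.

```r
# Made-up cuts for 3 columns, laid out as in the "arrays" format.
cuts <- list(
  indptr = c(0L, 3L, 5L, 9L),  # base-0 offsets, one more entry than columns
  data = c(1.5, 2.5, 4.0, 0.1, 0.9, 10, 20, 30, 40)
)

# Hypothetical helper: extract the cuts of column `i` (base-1 column index).
cuts_for_column <- function(cuts, i) {
  start <- cuts$indptr[i] + 1L
  end <- cuts$indptr[i + 1L]
  # seq.int() with length.out also handles a column that has no cuts.
  cuts$data[seq.int(start, length.out = end - start + 1L)]
}

n_cols <- length(cuts$indptr) - 1L
as_list <- lapply(seq_len(n_cols), cuts_for_column, cuts = cuts)
print(as_list)
```

The resulting list has the same shape as the `"list"` output format: one numeric vector of cuts per column.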
#' @title Get Number of Non-Missing Entries in DMatrix
|
||||
#' @param dmat An `xgb.DMatrix` object, as returned by \link{xgb.DMatrix}.
|
||||
#' @return The number of non-missing entries in the DMatrix
|
||||
#' Get Number of Non-Missing Entries in DMatrix
|
||||
#'
|
||||
#' @param dmat An `xgb.DMatrix` object, as returned by [xgb.DMatrix()].
|
||||
#' @return The number of non-missing entries in the DMatrix.
|
||||
#' @export
|
||||
xgb.get.DMatrix.num.non.missing <- function(dmat) { # nolint
|
||||
stopifnot(inherits(dmat, "xgb.DMatrix"))
|
||||
return(.Call(XGDMatrixNumNonMissing_R, dmat))
|
||||
}
|
||||
|
||||
#' @title Get DMatrix Data
|
||||
#' @param dmat An `xgb.DMatrix` object, as returned by \link{xgb.DMatrix}.
|
||||
#' Get DMatrix Data
|
||||
#'
|
||||
#' @param dmat An `xgb.DMatrix` object, as returned by [xgb.DMatrix()].
|
||||
#' @return The data held in the DMatrix, as a sparse CSR matrix (class `dgRMatrix`
|
||||
#' from package `Matrix`). If it had feature names, these will be added as column names
|
||||
#' in the output.
|
||||
@ -1223,27 +1228,27 @@ xgb.get.DMatrix.data <- function(dmat) {
|
||||
return(out)
|
||||
}
|
||||
|
||||
#' Get a new DMatrix containing the specified rows of
|
||||
#' original xgb.DMatrix object
|
||||
#' Slice DMatrix
|
||||
#'
|
||||
#' Get a new DMatrix containing the specified rows of
|
||||
#' original xgb.DMatrix object
|
||||
#' Get a new DMatrix containing the specified rows of the original `xgb.DMatrix` object.
|
||||
#'
|
||||
#' @param object Object of class "xgb.DMatrix".
|
||||
#' @param object Object of class `xgb.DMatrix`.
|
||||
#' @param idxset An integer vector of indices of rows needed (base-1 indexing).
|
||||
#' @param allow_groups Whether to allow slicing an `xgb.DMatrix` with `group` (or
|
||||
#' equivalently `qid`) field. Note that in such case, the result will not have
|
||||
#' the groups anymore - they need to be set manually through `setinfo`.
|
||||
#' @param colset currently not used (columns subsetting is not available)
|
||||
#' the groups anymore - they need to be set manually through [setinfo()].
|
||||
#' @param colset Currently not used (column subsetting is not available).
|
||||
#'
|
||||
#' @examples
|
||||
#' data(agaricus.train, package='xgboost')
|
||||
#' data(agaricus.train, package = "xgboost")
|
||||
#'
|
||||
#' dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||
#'
|
||||
#' dsub <- xgb.slice.DMatrix(dtrain, 1:42)
|
||||
#' labels1 <- getinfo(dsub, 'label')
|
||||
#' labels1 <- getinfo(dsub, "label")
|
||||
#'
|
||||
#' dsub <- dtrain[1:42, ]
|
||||
#' labels2 <- getinfo(dsub, 'label')
|
||||
#' labels2 <- getinfo(dsub, "label")
|
||||
#' all.equal(labels1, labels2)
|
||||
#'
|
||||
#' @rdname xgb.slice.DMatrix
|
||||
@ -1292,16 +1297,17 @@ xgb.slice.DMatrix <- function(object, idxset, allow_groups = FALSE) {
|
||||
#' Print information about xgb.DMatrix.
|
||||
#' Currently it displays dimensions and presence of info-fields and colnames.
|
||||
#'
|
||||
#' @param x an xgb.DMatrix object
|
||||
#' @param verbose whether to print colnames (when present)
|
||||
#' @param ... not currently used
|
||||
#' @param x An xgb.DMatrix object.
|
||||
#' @param verbose Whether to print colnames (when present).
|
||||
#' @param ... Not currently used.
|
||||
#'
|
||||
#' @examples
|
||||
#' data(agaricus.train, package='xgboost')
|
||||
#' dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||
#' data(agaricus.train, package = "xgboost")
|
||||
#'
|
||||
#' dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||
#' dtrain
|
||||
#' print(dtrain, verbose=TRUE)
|
||||
#'
|
||||
#' print(dtrain, verbose = TRUE)
|
||||
#'
|
||||
#' @method print xgb.DMatrix
|
||||
#' @export
|
||||
|
||||
@ -2,12 +2,13 @@
|
||||
#'
|
||||
#' Save xgb.DMatrix object to binary file
|
||||
#'
|
||||
#' @param dmatrix the \code{xgb.DMatrix} object
|
||||
#' @param dmatrix the `xgb.DMatrix` object
|
||||
#' @param fname the name of the file to write.
|
||||
#'
|
||||
#' @examples
|
||||
#' \dontshow{RhpcBLASctl::omp_set_num_threads(1)}
|
||||
#' data(agaricus.train, package='xgboost')
|
||||
#' data(agaricus.train, package = "xgboost")
|
||||
#'
|
||||
#' dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||
#' fname <- file.path(tempdir(), "xgb.DMatrix.data")
|
||||
#' xgb.DMatrix.save(dtrain, fname)
|
||||
|
||||
@ -1,24 +1,26 @@
|
||||
#' Set and get global configuration
|
||||
#'
|
||||
#' Global configuration consists of a collection of parameters that can be applied in the global
|
||||
#' scope. See \url{https://xgboost.readthedocs.io/en/stable/parameter.html} for the full list of
|
||||
#' parameters supported in the global configuration. Use \code{xgb.set.config} to update the
|
||||
#' values of one or more global-scope parameters. Use \code{xgb.get.config} to fetch the current
|
||||
#' parameters supported in the global configuration. Use `xgb.set.config()` to update the
|
||||
#' values of one or more global-scope parameters. Use `xgb.get.config()` to fetch the current
|
||||
#' values of all global-scope parameters (listed in
|
||||
#' \url{https://xgboost.readthedocs.io/en/stable/parameter.html}).
|
||||
#'
|
||||
#' @details
|
||||
#' Note that serialization-related functions might use a globally-configured number of threads,
|
||||
#' which is managed by the system's OpenMP (OMP) configuration instead. Typically, XGBoost methods
|
||||
#' accept an `nthreads` parameter, but some methods like `readRDS` might get executed before such
|
||||
#' accept an `nthread` parameter, but some methods like [readRDS()] might get executed before such
|
||||
#' parameter can be supplied.
|
||||
#'
|
||||
#' The number of OMP threads can in turn be configured for example through an environment variable
|
||||
#' `OMP_NUM_THREADS` (needs to be set before R is started), or through `RhpcBLASctl::omp_set_num_threads`.
|
||||
#' @rdname xgbConfig
|
||||
#' @title Set and get global configuration
|
||||
#' @name xgb.set.config, xgb.get.config
|
||||
#' @export xgb.set.config xgb.get.config
|
||||
#' @param ... List of parameters to be set, as keyword arguments
|
||||
#' @return
|
||||
#' \code{xgb.set.config} returns \code{TRUE} to signal success. \code{xgb.get.config} returns
|
||||
#' `xgb.set.config()` returns `TRUE` to signal success. `xgb.get.config()` returns
|
||||
#' a list containing all global-scope parameters and their values.
|
||||
#'
|
||||
#' @examples
|
||||
|
||||
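As a minimal sketch of the set/get pair documented above, the following changes the global `verbosity` parameter and then restores it. The variable names `old_verbosity` and `current` are illustrative only.

```r
library(xgboost)

# Remember the current verbosity so it can be restored afterwards.
old_verbosity <- xgb.get.config()$verbosity

# Update one global-scope parameter, passed as a keyword argument.
xgb.set.config(verbosity = 0)
current <- xgb.get.config()
print(current$verbosity)

# Restore the previous setting.
xgb.set.config(verbosity = old_verbosity)
```

Restoring the old value matters because the configuration is global: it stays in effect for every later XGBoost call in the same R session.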
@ -2,56 +2,50 @@
|
||||
#'
|
||||
#' The cross validation function of xgboost.
|
||||
#'
|
||||
#' @param params the list of parameters. The complete list of parameters is
|
||||
#' available in the \href{http://xgboost.readthedocs.io/en/latest/parameter.html}{online documentation}. Below
|
||||
#' is a shorter summary:
|
||||
#' \itemize{
|
||||
#' \item \code{objective} objective function, common ones are
|
||||
#' \itemize{
|
||||
#' \item \code{reg:squarederror} Regression with squared loss.
|
||||
#' \item \code{binary:logistic} logistic regression for classification.
|
||||
#' \item See \code{\link[=xgb.train]{xgb.train}()} for complete list of objectives.
|
||||
#' }
|
||||
#' \item \code{eta} step size of each boosting step
|
||||
#' \item \code{max_depth} maximum depth of the tree
|
||||
#' \item \code{nthread} number of thread used in training, if not set, all threads are used
|
||||
#' }
|
||||
#' @param params The list of parameters. The complete list of parameters is available in the
|
||||
#' [online documentation](http://xgboost.readthedocs.io/en/latest/parameter.html).
|
||||
#' Below is a shorter summary:
|
||||
#' - `objective`: Objective function, common ones are
|
||||
#' - `reg:squarederror`: Regression with squared loss.
|
||||
#' - `binary:logistic`: Logistic regression for classification.
|
||||
#'
|
||||
#' See \code{\link{xgb.train}} for further details.
|
||||
#' See also demo/ for walkthrough example in R.
|
||||
#' See [xgb.train()] for complete list of objectives.
|
||||
#' - `eta`: Step size of each boosting step
|
||||
#' - `max_depth`: Maximum depth of the tree
|
||||
#' - `nthread`: Number of threads used in training. If not set, all threads are used
|
||||
#'
|
||||
#' See [xgb.train()] for further details.
|
||||
#' See also the `demo/` directory for a walkthrough example in R.
|
||||
#'
|
||||
#' Note that, while `params` accepts a `seed` entry and will use such parameter for model training if
|
||||
#' supplied, this seed is not used for creation of train-test splits, which instead rely on R's own RNG
|
||||
#' system - thus, for reproducible results, one needs to call the `set.seed` function beforehand.
|
||||
#' system - thus, for reproducible results, one needs to call the [set.seed()] function beforehand.
|
||||
#' @param data An `xgb.DMatrix` object, with corresponding fields like `label` or bounds as required
|
||||
#' for model training by the objective.
|
||||
#'
|
||||
#' Note that only the basic `xgb.DMatrix` class is supported - variants such as `xgb.QuantileDMatrix`
|
||||
#' or `xgb.ExternalDMatrix` are not supported here.
|
||||
#' @param nrounds the max number of iterations
|
||||
#' @param nfold the original dataset is randomly partitioned into \code{nfold} equal size subsamples.
|
||||
#' @param nrounds The max number of iterations.
|
||||
#' @param nfold The original dataset is randomly partitioned into `nfold` equal size subsamples.
|
||||
#' @param prediction A logical value indicating whether to return the test fold predictions
|
||||
#' from each CV model. This parameter engages the \code{\link{xgb.cb.cv.predict}} callback.
|
||||
#' @param showsd \code{boolean}, whether to show standard deviation of cross validation
|
||||
#' @param metrics, list of evaluation metrics to be used in cross validation,
|
||||
#' from each CV model. This parameter engages the [xgb.cb.cv.predict()] callback.
|
||||
#' @param showsd Logical value indicating whether to show the standard deviation of cross validation.
|
||||
#' @param metrics List of evaluation metrics to be used in cross validation.
|
||||
#' When not specified, the evaluation metric is chosen according to the objective function.
|
||||
#' Possible options are:
|
||||
#' \itemize{
|
||||
#' \item \code{error} binary classification error rate
|
||||
#' \item \code{rmse} Rooted mean square error
|
||||
#' \item \code{logloss} negative log-likelihood function
|
||||
#' \item \code{mae} Mean absolute error
|
||||
#' \item \code{mape} Mean absolute percentage error
|
||||
#' \item \code{auc} Area under curve
|
||||
#' \item \code{aucpr} Area under PR curve
|
||||
#' \item \code{merror} Exact matching error, used to evaluate multi-class classification
|
||||
#' }
|
||||
#' @param obj customized objective function. Returns gradient and second order
|
||||
#' - `error`: Binary classification error rate
|
||||
#' - `rmse`: Root mean square error
|
||||
#' - `logloss`: Negative log-likelihood function
|
||||
#' - `mae`: Mean absolute error
|
||||
#' - `mape`: Mean absolute percentage error
|
||||
#' - `auc`: Area under curve
|
||||
#' - `aucpr`: Area under PR curve
|
||||
#' - `merror`: Exact matching error used to evaluate multi-class classification
|
||||
#' @param obj Customized objective function. Returns gradient and second order
|
||||
#' gradient with given prediction and dtrain.
|
||||
#' @param feval customized evaluation function. Returns
|
||||
#' \code{list(metric='metric-name', value='metric-value')} with given
|
||||
#' prediction and dtrain.
|
||||
#' @param stratified A \code{boolean} indicating whether sampling of folds should be stratified
|
||||
#' @param feval Customized evaluation function. Returns
|
||||
#' `list(metric='metric-name', value='metric-value')` with given prediction and dtrain.
|
||||
#' @param stratified Logical flag indicating whether sampling of folds should be stratified
|
||||
#' by the values of outcome labels. For real-valued labels in regression objectives,
|
||||
#' stratification will be done by discretizing the labels into up to 5 buckets beforehand.
|
||||
#'
|
||||
@ -62,81 +56,87 @@
|
||||
#' This parameter is ignored when `data` has a `group` field - in such case, the splitting
|
||||
#' will be based on whole groups (note that this might make the folds have different sizes).
|
||||
#'
|
||||
#' Value `TRUE` here is \bold{not} supported for custom objectives.
|
||||
#' @param folds \code{list} provides a possibility to use a list of pre-defined CV folds
|
||||
#' (each element must be a vector of test fold's indices). When folds are supplied,
|
||||
#' the \code{nfold} and \code{stratified} parameters are ignored.
|
||||
#' Value `TRUE` here is **not** supported for custom objectives.
|
||||
#' @param folds List with pre-defined CV folds (each element must be a vector of test fold's indices).
|
||||
#' When folds are supplied, the `nfold` and `stratified` parameters are ignored.
|
||||
#'
|
||||
#' If `data` has a `group` field and the objective requires this field, each fold (list element)
|
||||
#' must additionally have two attributes (retrievable through \link{attributes}) named `group_test`
|
||||
#' and `group_train`, which should hold the `group` to assign through \link{setinfo.xgb.DMatrix} to
|
||||
#' must additionally have two attributes (retrievable through `attributes`) named `group_test`
|
||||
#' and `group_train`, which should hold the `group` to assign through [setinfo.xgb.DMatrix()] to
|
||||
#' the resulting DMatrices.
|
||||
#' @param train_folds \code{list} list specifying which indicies to use for training. If \code{NULL}
|
||||
#' (the default) all indices not specified in \code{folds} will be used for training.
|
||||
#' @param train_folds List specifying which indices to use for training. If `NULL`
|
||||
#' (the default) all indices not specified in `folds` will be used for training.
|
||||
#'
|
||||
#' This is not supported when `data` has `group` field.
|
||||
#' @param verbose \code{boolean}, print the statistics during the process
|
||||
#' @param print_every_n Print each n-th iteration evaluation messages when \code{verbose>0}.
|
||||
#' @param verbose Logical flag. Should statistics be printed during the process?
|
||||
#' @param print_every_n Print each nth iteration evaluation messages when `verbose > 0`.
|
||||
#' Default is 1 which means all messages are printed. This parameter is passed to the
|
||||
#' \code{\link{xgb.cb.print.evaluation}} callback.
|
||||
#' @param early_stopping_rounds If \code{NULL}, the early stopping function is not triggered.
|
||||
#' If set to an integer \code{k}, training with a validation set will stop if the performance
|
||||
#' doesn't improve for \code{k} rounds.
|
||||
#' Setting this parameter engages the \code{\link{xgb.cb.early.stop}} callback.
|
||||
#' @param maximize If \code{feval} and \code{early_stopping_rounds} are set,
|
||||
#' [xgb.cb.print.evaluation()] callback.
|
||||
#' @param early_stopping_rounds If `NULL`, the early stopping function is not triggered.
|
||||
#' If set to an integer `k`, training with a validation set will stop if the performance
|
||||
#' doesn't improve for `k` rounds.
|
||||
#' Setting this parameter engages the [xgb.cb.early.stop()] callback.
|
||||
#' @param maximize If `feval` and `early_stopping_rounds` are set,
|
||||
#' then this parameter must be set as well.
|
||||
#' When it is \code{TRUE}, it means the larger the evaluation score the better.
|
||||
#' This parameter is passed to the \code{\link{xgb.cb.early.stop}} callback.
|
||||
#' @param callbacks a list of callback functions to perform various task during boosting.
|
||||
#' See \code{\link{xgb.Callback}}. Some of the callbacks are automatically created depending on the
|
||||
#' When it is `TRUE`, it means the larger the evaluation score the better.
|
||||
#' This parameter is passed to the [xgb.cb.early.stop()] callback.
|
||||
#' @param callbacks A list of callback functions to perform various tasks during boosting.
|
||||
#' See [xgb.Callback()]. Some of the callbacks are automatically created depending on the
|
||||
#' parameters' values. User can provide either existing or their own callback methods in order
|
||||
#' to customize the training process.
|
||||
#' @param ... other parameters to pass to \code{params}.
|
||||
#' @param ... Other parameters to pass to `params`.
|
||||
#'
|
||||
#' @details
|
||||
#' The original sample is randomly partitioned into \code{nfold} equal size subsamples.
|
||||
#' The original sample is randomly partitioned into `nfold` equal size subsamples.
|
||||
#'
|
||||
#' Of the \code{nfold} subsamples, a single subsample is retained as the validation data for testing the model,
|
||||
#' and the remaining \code{nfold - 1} subsamples are used as training data.
|
||||
#' Of the `nfold` subsamples, a single subsample is retained as the validation data for testing the model,
|
||||
#' and the remaining `nfold - 1` subsamples are used as training data.
|
||||
#'
|
||||
#' The cross-validation process is then repeated \code{nrounds} times, with each of the
|
||||
#' \code{nfold} subsamples used exactly once as the validation data.
|
||||
#' The cross-validation process is then repeated `nfold` times, with each of the
|
||||
#' `nfold` subsamples used exactly once as the validation data.
|
||||
#'
|
||||
#' All observations are used for both training and validation.
|
||||
#'
|
||||
#' Adapted from \url{https://en.wikipedia.org/wiki/Cross-validation_\%28statistics\%29}
|
||||
#'
|
||||
#' @return
|
||||
#' An object of class \code{xgb.cv.synchronous} with the following elements:
|
||||
#' \itemize{
|
||||
#' \item \code{call} a function call.
|
||||
#' \item \code{params} parameters that were passed to the xgboost library. Note that it does not
|
||||
#' capture parameters changed by the \code{\link{xgb.cb.reset.parameters}} callback.
|
||||
#' \item \code{evaluation_log} evaluation history stored as a \code{data.table} with the
|
||||
#' An object of class 'xgb.cv.synchronous' with the following elements:
|
||||
#' - `call`: Function call.
|
||||
#' - `params`: Parameters that were passed to the xgboost library. Note that it does not
|
||||
#' capture parameters changed by the [xgb.cb.reset.parameters()] callback.
|
||||
#' - `evaluation_log`: Evaluation history stored as a `data.table` with the
|
||||
#' first column corresponding to iteration number and the rest corresponding to the
|
||||
#' CV-based evaluation means and standard deviations for the training and test CV-sets.
|
||||
#' It is created by the \code{\link{xgb.cb.evaluation.log}} callback.
|
||||
#' \item \code{niter} number of boosting iterations.
|
||||
#' \item \code{nfeatures} number of features in training data.
|
||||
#' \item \code{folds} the list of CV folds' indices - either those passed through the \code{folds}
|
||||
#' It is created by the [xgb.cb.evaluation.log()] callback.
|
||||
#' - `niter`: Number of boosting iterations.
|
||||
#' - `nfeatures`: Number of features in training data.
|
||||
#' - `folds`: The list of CV folds' indices - either those passed through the `folds`
|
||||
#' parameter or randomly generated.
|
||||
#' \item \code{best_iteration} iteration number with the best evaluation metric value
|
||||
#' - `best_iteration`: Iteration number with the best evaluation metric value
|
||||
#' (only available with early stopping).
|
||||
#' }
|
||||
#'
|
||||
#' Plus other potential elements that are the result of callbacks, such as a list `cv_predict` with
|
||||
#' a sub-element `pred` when passing `prediction = TRUE`, which is added by the \link{xgb.cb.cv.predict}
|
||||
#' a sub-element `pred` when passing `prediction = TRUE`, which is added by the [xgb.cb.cv.predict()]
|
||||
#' callback (note that one can also pass it manually under `callbacks` with different settings,
|
||||
#' such as saving also the models created during cross validation); or a list `early_stop` which
|
||||
#' will contain elements such as `best_iteration` when using the early stopping callback (\link{xgb.cb.early.stop}).
|
||||
#' will contain elements such as `best_iteration` when using the early stopping callback ([xgb.cb.early.stop()]).
|
||||
#'
|
||||
#' @examples
|
||||
#' data(agaricus.train, package='xgboost')
|
||||
#' data(agaricus.train, package = "xgboost")
|
||||
#'
|
||||
#' dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||
#' cv <- xgb.cv(data = dtrain, nrounds = 3, nthread = 2, nfold = 5, metrics = list("rmse","auc"),
|
||||
#' max_depth = 3, eta = 1, objective = "binary:logistic")
|
||||
#'
|
||||
#' cv <- xgb.cv(
|
||||
#' data = dtrain,
|
||||
#' nrounds = 3,
|
||||
#' nthread = 2,
|
||||
#' nfold = 5,
|
||||
#' metrics = list("rmse", "auc"),
|
||||
#' max_depth = 3,
|
||||
#' eta = 1,
|
||||
#' objective = "binary:logistic"
|
||||
#' )
|
||||
#' print(cv)
|
||||
#' print(cv, verbose=TRUE)
|
||||
#' print(cv, verbose = TRUE)
|
||||
#'
|
||||
#' @export
|
||||
xgb.cv <- function(params = list(), data, nrounds, nfold,
|
||||
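The partitioning described in the details of `xgb.cv()` can be sketched in base R. This is an illustration of k-fold splitting in general, not the package's internal implementation; the row count `n` is hypothetical. The resulting `folds` list has the shape the `folds` parameter describes: each element is a vector of test-fold indices.

```r
set.seed(123)  # fold assignment relies on R's own RNG

n <- 20L       # hypothetical number of rows
nfold <- 5L

# Shuffle row indices and deal them into `nfold` groups of near-equal size.
shuffled <- sample.int(n)
fold_id <- rep_len(seq_len(nfold), n)
folds <- unname(split(shuffled, fold_id))

# For each fold, the held-out indices form the validation set and
# all remaining indices form the training set.
for (k in seq_along(folds)) {
  test_idx <- folds[[k]]
  train_idx <- setdiff(seq_len(n), test_idx)
  cat(sprintf("fold %d: %d train rows, %d validation rows\n",
              k, length(train_idx), length(test_idx)))
}

# Every row is used for validation exactly once across the folds.
stopifnot(identical(sort(unlist(folds)), seq_len(n)))
```

Calling `set.seed()` first mirrors the note on `params`: the `seed` entry affects model training, while the train-test splits depend on R's RNG state.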
@ -325,23 +325,31 @@ xgb.cv <- function(params = list(), data, nrounds, nfold,
|
||||
|
||||
#' Print xgb.cv result
|
||||
#'
|
||||
#' Prints formatted results of \code{xgb.cv}.
|
||||
#' Prints formatted results of [xgb.cv()].
|
||||
#'
|
||||
#' @param x an \code{xgb.cv.synchronous} object
|
||||
#' @param verbose whether to print detailed data
|
||||
#' @param ... passed to \code{data.table.print}
|
||||
#' @param x An `xgb.cv.synchronous` object.
|
||||
#' @param verbose Whether to print detailed data.
|
||||
#' @param ... Passed to the `print()` method for `data.table` objects.
|
||||
#'
|
||||
#' @details
|
||||
#' When not verbose, it only prints the evaluation results,
|
||||
#' including the best iteration (when available).
|
||||
#'
|
||||
#' @examples
|
||||
#' data(agaricus.train, package='xgboost')
|
||||
#' data(agaricus.train, package = "xgboost")
|
||||
#'
|
||||
#' train <- agaricus.train
|
||||
#' cv <- xgb.cv(data = xgb.DMatrix(train$data, label = train$label), nfold = 5, max_depth = 2,
|
||||
#' eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
|
||||
#' cv <- xgb.cv(
|
||||
#' data = xgb.DMatrix(train$data, label = train$label),
|
||||
#' nfold = 5,
|
||||
#' max_depth = 2,
|
||||
#' eta = 1,
|
||||
#' nthread = 2,
|
||||
#' nrounds = 2,
|
||||
#' objective = "binary:logistic"
|
||||
#' )
|
||||
#' print(cv)
|
||||
#' print(cv, verbose=TRUE)
|
||||
#' print(cv, verbose = TRUE)
|
||||
#'
|
||||
#' @rdname print.xgb.cv
|
||||
#' @method print xgb.cv.synchronous
|
||||
|
||||
@ -28,7 +28,7 @@
|
||||
#' This function uses [GraphViz](https://www.graphviz.org/) as DiagrammeR backend.
|
||||
#'
|
||||
#' @param model Object of class `xgb.Booster`. If it contains feature names (they can be set through
|
||||
#' \link{setinfo}), they will be used in the output from this function.
|
||||
#' [setinfo()], they will be used in the output from this function.
|
||||
#' @param trees An integer vector of tree indices that should be used.
|
||||
#' The default (`NULL`) uses all trees.
|
||||
#' Useful, e.g., in multiclass classification to get only
|
||||
|
||||
@ -2,7 +2,7 @@
|
||||
#'
|
||||
#' Save XGBoost model to a file in binary or JSON format.
|
||||
#'
|
||||
#' @param model Model object of \code{xgb.Booster} class.
|
||||
#' @param model Model object of `xgb.Booster` class.
|
||||
#' @param fname Name of the file to write. Its extension determines the serialization format:
|
||||
#' - ".ubj": Use the universal binary JSON format (recommended).
|
||||
#' This format uses binary types for e.g. floating point numbers, thereby preventing any loss
|
||||
|
||||
@ -1,107 +1,98 @@
|
||||
#' eXtreme Gradient Boosting Training
|
||||
#'
|
||||
#' \code{xgb.train} is an advanced interface for training an xgboost model.
|
||||
#' The \code{xgboost} function is a simpler wrapper for \code{xgb.train}.
|
||||
#' `xgb.train()` is an advanced interface for training an xgboost model.
|
||||
#' The [xgboost()] function is a simpler wrapper for `xgb.train()`.
|
||||
#'
|
||||
#' @param params the list of parameters. The complete list of parameters is
|
||||
#' available in the \href{http://xgboost.readthedocs.io/en/latest/parameter.html}{online documentation}. Below
|
||||
#' is a shorter summary:
|
||||
#' available in the [online documentation](http://xgboost.readthedocs.io/en/latest/parameter.html).
|
||||
#' Below is a shorter summary:
|
||||
#'
|
||||
#' 1. General Parameters
|
||||
#' **1. General Parameters**
|
||||
#'
|
||||
#' \itemize{
|
||||
#' \item \code{booster} which booster to use, can be \code{gbtree} or \code{gblinear}. Default: \code{gbtree}.
|
||||
#' }
|
||||
#' - `booster`: Which booster to use, can be `gbtree` or `gblinear`. Default: `gbtree`.
|
||||
#'
|
||||
#' 2. Booster Parameters
|
||||
#' **2. Booster Parameters**
|
||||
#'
|
||||
#' 2.1. Parameters for Tree Booster
|
||||
#'
|
||||
#' \itemize{
|
||||
#' \item{ \code{eta} control the learning rate: scale the contribution of each tree by a factor of \code{0 < eta < 1}
|
||||
#' **2.1. Parameters for Tree Booster**
|
||||
#' - `eta`: The learning rate: scale the contribution of each tree by a factor of `0 < eta < 1`
|
||||
#' when it is added to the current approximation.
|
||||
#' Used to prevent overfitting by making the boosting process more conservative.
|
||||
#' Lower value for \code{eta} implies larger value for \code{nrounds}: low \code{eta} value means model
|
||||
#' more robust to overfitting but slower to compute. Default: 0.3}
|
||||
#' \item{ \code{gamma} minimum loss reduction required to make a further partition on a leaf node of the tree.
|
||||
#' the larger, the more conservative the algorithm will be.}
|
||||
#' \item \code{max_depth} maximum depth of a tree. Default: 6
|
||||
#' \item{\code{min_child_weight} minimum sum of instance weight (hessian) needed in a child.
|
||||
#' A lower value for `eta` implies a larger value for `nrounds`: a low `eta` value makes the model
|
||||
#' more robust to overfitting but slower to compute. Default: 0.3.
|
||||
#' - `gamma`: Minimum loss reduction required to make a further partition on a leaf node of the tree.
|
||||
#' The larger, the more conservative the algorithm will be.
|
||||
#' - `max_depth`: Maximum depth of a tree. Default: 6.
|
||||
#' - `min_child_weight`: Minimum sum of instance weight (hessian) needed in a child.
|
||||
#' If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight,
|
||||
#' then the building process will give up further partitioning.
|
||||
#' In linear regression mode, this simply corresponds to minimum number of instances needed to be in each node.
|
||||
#' The larger, the more conservative the algorithm will be. Default: 1}
|
||||
#' \item{ \code{subsample} subsample ratio of the training instance.
|
||||
#' The larger, the more conservative the algorithm will be. Default: 1.
|
||||
#' - `subsample`: Subsample ratio of the training instance.
|
||||
#' Setting it to 0.5 means that xgboost randomly samples half of the data instances to grow trees,
|
||||
#' which helps prevent overfitting. It also shortens computation (because there is less data to analyse).
|
||||
#' It is advised to use this parameter with \code{eta} and increase \code{nrounds}. Default: 1}
|
||||
#' \item \code{colsample_bytree} subsample ratio of columns when constructing each tree. Default: 1
|
||||
#' \item \code{lambda} L2 regularization term on weights. Default: 1
|
||||
#' \item \code{alpha} L1 regularization term on weights. (there is no L1 reg on bias because it is not important). Default: 0
|
||||
#' \item{ \code{num_parallel_tree} Experimental parameter. number of trees to grow per round.
|
||||
#' Useful to test Random Forest through XGBoost
|
||||
#' (set \code{colsample_bytree < 1}, \code{subsample < 1} and \code{round = 1}) accordingly.
|
||||
#' Default: 1}
|
||||
#' \item{ \code{monotone_constraints} A numerical vector consists of \code{1}, \code{0} and \code{-1} with its length
|
||||
#' It is advised to use this parameter with `eta` and increase `nrounds`. Default: 1.
|
||||
#' - `colsample_bytree`: Subsample ratio of columns when constructing each tree. Default: 1.
|
||||
#' - `lambda`: L2 regularization term on weights. Default: 1.
|
||||
#' - `alpha`: L1 regularization term on weights. (there is no L1 reg on bias because it is not important). Default: 0.
|
||||
#' - `num_parallel_tree`: Experimental parameter. Number of trees to grow per round.
|
||||
#' Useful to test Random Forest through XGBoost
|
||||
#' (set `colsample_bytree < 1`, `subsample < 1` and `nrounds = 1` accordingly).
|
||||
#' Default: 1.
|
||||
#' - `monotone_constraints`: A numeric vector consisting of `1`, `0` and `-1`, with its length
|
||||
#' equal to the number of features in the training data.
|
||||
#' \code{1} is increasing, \code{-1} is decreasing and \code{0} is no constraint.}
|
||||
#' \item{ \code{interaction_constraints} A list of vectors specifying feature indices of permitted interactions.
|
||||
#' `1` is increasing, `-1` is decreasing and `0` is no constraint.
|
||||
#' - `interaction_constraints`: A list of vectors specifying feature indices of permitted interactions.
|
||||
#' Each item of the list represents one permitted interaction where specified features are allowed to interact with each other.
|
||||
#' Feature index values should start from \code{0} (\code{0} references the first column).
|
||||
#' Leave argument unspecified for no interaction constraints.}
|
||||
#' }
|
||||
#' Feature index values should start from `0` (`0` references the first column).
|
||||
#' Leave argument unspecified for no interaction constraints.
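As a rough illustration of the two constraint parameters described above, the sketch below builds a parameter list for a made-up dataset with five features. The feature count and the particular constraint values are assumptions chosen for the example; only the parameter names and the 0-based indexing come from the documentation.

```r
# Hypothetical feature layout: 5 columns, referenced by 0-based index.
n_features <- 5L

# +1 = increasing, -1 = decreasing, 0 = no constraint; one entry per feature.
monotone_constraints <- c(1, 0, -1, 0, 0)

# Features 0 and 1 may interact with each other; so may features 2, 3 and 4.
interaction_constraints <- list(c(0, 1), c(2, 3, 4))

params <- list(
  max_depth = 2,
  monotone_constraints = monotone_constraints,
  interaction_constraints = interaction_constraints
)

# Sanity checks that mirror the documented requirements.
stopifnot(length(monotone_constraints) == n_features)
stopifnot(all(monotone_constraints %in% c(-1, 0, 1)))
stopifnot(all(unlist(interaction_constraints) %in% 0:(n_features - 1L)))
```

Checking the index range up front catches the common mistake of passing R's 1-based column positions.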
|
||||
#'
|
||||
#' 2.2. Parameters for Linear Booster
|
||||
#' **2.2. Parameters for Linear Booster**
|
||||
#'
|
||||
#' \itemize{
|
||||
#' \item \code{lambda} L2 regularization term on weights. Default: 0
|
||||
#' \item \code{lambda_bias} L2 regularization term on bias. Default: 0
|
||||
#' \item \code{alpha} L1 regularization term on weights. (there is no L1 reg on bias because it is not important). Default: 0
|
||||
#' }
|
||||
#' - `lambda`: L2 regularization term on weights. Default: 0.
|
||||
#' - `lambda_bias`: L2 regularization term on bias. Default: 0.
|
||||
#' - `alpha`: L1 regularization term on weights (there is no L1 regularization on bias because it is not important). Default: 0.
|
||||
#'
|
||||
#' 3. Task Parameters
|
||||
#' **3. Task Parameters**
|
||||
#'
|
||||
#' \itemize{
|
||||
#' \item{ \code{objective} specify the learning task and the corresponding learning objective, users can pass a self-defined function to it.
|
||||
#' The default objective options are below:
|
||||
#' \itemize{
|
||||
#' \item \code{reg:squarederror} Regression with squared loss (Default).
|
||||
#' \item{ \code{reg:squaredlogerror}: regression with squared log loss \eqn{1/2 * (log(pred + 1) - log(label + 1))^2}.
|
||||
#' - `objective`: Specifies the learning task and the corresponding learning objective.
|
||||
#' Users can pass a self-defined function to it. The default objective options are below:
|
||||
#' - `reg:squarederror`: Regression with squared loss (default).
|
||||
#' - `reg:squaredlogerror`: Regression with squared log loss \eqn{1/2 \cdot (\log(pred + 1) - \log(label + 1))^2}.
|
||||
#' All inputs are required to be greater than -1.
|
||||
#' Also, see metric rmsle for possible issue with this objective.}
|
||||
#' \item \code{reg:logistic} logistic regression.
|
||||
#' \item \code{reg:pseudohubererror}: regression with Pseudo Huber loss, a twice differentiable alternative to absolute loss.
|
||||
#' \item \code{binary:logistic} logistic regression for binary classification. Output probability.
|
||||
#' \item \code{binary:logitraw} logistic regression for binary classification, output score before logistic transformation.
|
||||
#' \item \code{binary:hinge}: hinge loss for binary classification. This makes predictions of 0 or 1, rather than producing probabilities.
|
||||
#' \item{ \code{count:poisson}: Poisson regression for count data, output mean of Poisson distribution.
|
||||
#' \code{max_delta_step} is set to 0.7 by default in poisson regression (used to safeguard optimization).}
|
||||
#' \item{ \code{survival:cox}: Cox regression for right censored survival time data (negative values are considered right censored).
|
||||
#' Also, see the `rmsle` metric for possible issues with this objective.
|
||||
#' - `reg:logistic`: Logistic regression.
|
||||
#' - `reg:pseudohubererror`: Regression with Pseudo Huber loss, a twice differentiable alternative to absolute loss.
|
||||
#' - `binary:logistic`: Logistic regression for binary classification. Output probability.
|
||||
#' - `binary:logitraw`: Logistic regression for binary classification, output score before logistic transformation.
|
||||
#' - `binary:hinge`: Hinge loss for binary classification. This makes predictions of 0 or 1, rather than producing probabilities.
|
||||
#' - `count:poisson`: Poisson regression for count data, output mean of Poisson distribution.
|
||||
#' The parameter `max_delta_step` is set to 0.7 by default in Poisson regression
|
||||
#' (used to safeguard optimization).
|
||||
#' - `survival:cox`: Cox regression for right censored survival time data (negative values are considered right censored).
|
||||
#' Note that predictions are returned on the hazard ratio scale (i.e., as HR = exp(marginal_prediction) in the proportional
|
||||
#' hazard function \code{h(t) = h0(t) * HR)}.}
|
||||
#' \item{ \code{survival:aft}: Accelerated failure time model for censored survival time data. See
|
||||
#' \href{https://xgboost.readthedocs.io/en/latest/tutorials/aft_survival_analysis.html}{Survival Analysis with Accelerated Failure Time}
|
||||
#' for details.}
|
||||
#' \item \code{aft_loss_distribution}: Probability Density Function used by \code{survival:aft} and \code{aft-nloglik} metric.
|
||||
#' \item{ \code{multi:softmax} set xgboost to do multiclass classification using the softmax objective.
|
||||
#' Class is represented by a number and should be from 0 to \code{num_class - 1}.}
|
||||
#' \item{ \code{multi:softprob} same as softmax, but prediction outputs a vector of ndata * nclass elements, which can be
|
||||
#' hazard function \eqn{h(t) = h_0(t) \cdot HR}.
|
||||
#' - `survival:aft`: Accelerated failure time model for censored survival time data. See
|
||||
#' [Survival Analysis with Accelerated Failure Time](https://xgboost.readthedocs.io/en/latest/tutorials/aft_survival_analysis.html)
|
||||
#' for details.
|
||||
#' The parameter `aft_loss_distribution` specifies the Probability Density Function
|
||||
#' used by `survival:aft` and the `aft-nloglik` metric.
|
||||
#' - `multi:softmax`: Set XGBoost to do multiclass classification using the softmax objective.
|
||||
#' Class is represented by a number and should be from 0 to `num_class - 1`.
|
||||
#' - `multi:softprob`: Same as softmax, but prediction outputs a vector of ndata * nclass elements, which can be
|
||||
#' further reshaped to an ndata-by-nclass matrix. The result contains predicted probabilities of each data point belonging
|
||||
#' to each class.}
|
||||
#' \item \code{rank:pairwise} set xgboost to do ranking task by minimizing the pairwise loss.
|
||||
#' \item{ \code{rank:ndcg}: Use LambdaMART to perform list-wise ranking where
|
||||
#' \href{https://en.wikipedia.org/wiki/Discounted_cumulative_gain}{Normalized Discounted Cumulative Gain (NDCG)} is maximized.}
|
||||
#' \item{ \code{rank:map}: Use LambdaMART to perform list-wise ranking where
|
||||
#' \href{https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Mean_average_precision}{Mean Average Precision (MAP)}
|
||||
#' is maximized.}
|
||||
#' \item{ \code{reg:gamma}: gamma regression with log-link.
|
||||
#' Output is a mean of gamma distribution.
|
||||
#' to each class.
|
||||
#' - `rank:pairwise`: Set XGBoost to do ranking task by minimizing the pairwise loss.
|
||||
#' - `rank:ndcg`: Use LambdaMART to perform list-wise ranking where
|
||||
#' [Normalized Discounted Cumulative Gain (NDCG)](https://en.wikipedia.org/wiki/Discounted_cumulative_gain) is maximized.
|
||||
#' - `rank:map`: Use LambdaMART to perform list-wise ranking where
|
||||
#' [Mean Average Precision (MAP)](https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Mean_average_precision)
|
||||
#' is maximized.
|
||||
#' - `reg:gamma`: Gamma regression with log-link. Output is the mean of a gamma distribution.
|
||||
#' It might be useful, e.g., for modeling insurance claims severity, or for any outcome that might be
|
||||
#' \href{https://en.wikipedia.org/wiki/Gamma_distribution#Applications}{gamma-distributed}.}
|
||||
#' \item{ \code{reg:tweedie}: Tweedie regression with log-link.
|
||||
#' [gamma-distributed](https://en.wikipedia.org/wiki/Gamma_distribution#Applications).
|
||||
#' - `reg:tweedie`: Tweedie regression with log-link.
|
||||
#' It might be useful, e.g., for modeling total loss in insurance, or for any outcome that might be
|
||||
#' \href{https://en.wikipedia.org/wiki/Tweedie_distribution#Applications}{Tweedie-distributed}.}
|
||||
#' }
|
||||
#' [Tweedie-distributed](https://en.wikipedia.org/wiki/Tweedie_distribution#Applications).
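The reshaping mentioned for `multi:softprob` can be sketched in plain R. This is a minimal illustration that assumes the flat vector stores each observation's class probabilities contiguously (row-major order); the numbers are invented stand-ins for real predictions.

```r
n_data <- 3L
n_class <- 4L

# Stand-in for a flat prediction vector of length n_data * n_class
# (hypothetical values; each group of n_class entries sums to 1).
flat_pred <- c(
  0.7, 0.1, 0.1, 0.1,
  0.2, 0.5, 0.2, 0.1,
  0.1, 0.1, 0.2, 0.6
)

# Assumption: one observation per row, so fill the matrix by row.
prob <- matrix(flat_pred, nrow = n_data, ncol = n_class, byrow = TRUE)

# Most likely class per observation, shifted to the 0-based labels
# that `multi:softmax` uses (0 to num_class - 1).
pred_class <- max.col(prob, ties.method = "first") - 1L
```

If the rows of `prob` do not sum to one, the layout assumption is probably wrong for the version in use.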
|
||||
#'
|
||||
#' For custom objectives, one should pass a function taking as input the current predictions (as a numeric
|
||||
#' vector or matrix) and the training data (as an `xgb.DMatrix` object) that will return a list with elements
|
||||
@ -111,37 +102,35 @@
|
||||
#' the Hessian will be clipped, so one might consider using the expected Hessian (Fisher information) if the
|
||||
#' objective is non-convex.
|
||||
#'
|
||||
#' See the tutorials \href{https://xgboost.readthedocs.io/en/stable/tutorials/custom_metric_obj.html}{
|
||||
#' Custom Objective and Evaluation Metric} and \href{https://xgboost.readthedocs.io/en/stable/tutorials/advanced_custom_obj}{
|
||||
#' Advanced Usage of Custom Objectives} for more information about custom objectives.
|
||||
#' }
|
||||
#' \item \code{base_score} the initial prediction score of all instances, global bias. Default: 0.5
|
||||
#' \item{ \code{eval_metric} evaluation metrics for validation data.
|
||||
#' See the tutorials [Custom Objective and Evaluation Metric](https://xgboost.readthedocs.io/en/stable/tutorials/custom_metric_obj.html)
|
||||
#' and [Advanced Usage of Custom Objectives](https://xgboost.readthedocs.io/en/stable/tutorials/advanced_custom_obj)
|
||||
#' for more information about custom objectives.
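To make the gradient/Hessian contract concrete, here is a minimal sketch of a custom objective for the squared log loss given earlier, 1/2 * (log(pred + 1) - log(label + 1))^2. The helper names `sle_loss`, `sle_grad_hess` and `sle_obj` are hypothetical; only `getinfo(dtrain, "label")` is an existing xgboost call, and it is not executed here.

```r
# Per-observation squared log loss, kept for reference and for checking.
sle_loss <- function(preds, labels) {
  0.5 * (log1p(preds) - log1p(labels))^2
}

# First and second derivatives of the loss with respect to the predictions.
sle_grad_hess <- function(preds, labels) {
  diff <- log1p(preds) - log1p(labels)
  list(
    grad = diff / (preds + 1),
    hess = (1 - diff) / (preds + 1)^2
  )
}

# Wrapper with the (predictions, DMatrix) signature expected for `obj`.
# Assumption: labels are stored in the DMatrix under "label".
sle_obj <- function(preds, dtrain) {
  labels <- xgboost::getinfo(dtrain, "label")
  sle_grad_hess(preds, labels)
}

# Small illustration on made-up numbers.
preds <- c(0.5, 2, 10)
labels <- c(1, 2, 3)
gh <- sle_grad_hess(preds, labels)
```

Note that `hess` turns negative whenever `log1p(preds) - log1p(labels)` exceeds 1, which is the kind of non-convexity the Hessian-clipping remark above refers to.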
|
||||
#'
|
||||
#' - `base_score`: The initial prediction score of all instances, global bias. Default: 0.5.
|
||||
#' - `eval_metric`: Evaluation metrics for validation data.
|
||||
#' Users can pass a self-defined function to it.
|
||||
#' Default: metric will be assigned according to objective
|
||||
#' (rmse for regression, and error for classification, mean average precision for ranking).
|
||||
#' List is provided in detail section.}
|
||||
#' }
|
||||
#'
|
||||
#' @param data training dataset. \code{xgb.train} accepts only an \code{xgb.DMatrix} as the input.
|
||||
#' \code{xgboost}, in addition, also accepts \code{matrix}, \code{dgCMatrix}, or name of a local data file.
|
||||
#' @param nrounds max number of boosting iterations.
|
||||
#' A list is provided in the Details section.
|
||||
#' @param data Training dataset. `xgb.train()` accepts only an `xgb.DMatrix` as the input.
|
||||
#' [xgboost()], in addition, also accepts `matrix`, `dgCMatrix`, or name of a local data file.
|
||||
#' @param nrounds Max number of boosting iterations.
|
||||
#' @param evals Named list of `xgb.DMatrix` datasets to use for evaluating model performance.
|
||||
#' Metrics specified in either \code{eval_metric} or \code{feval} will be computed for each
|
||||
#' Metrics specified in either `eval_metric` or `feval` will be computed for each
|
||||
#' of these datasets during each boosting iteration, and stored in the end as a field named
|
||||
#' \code{evaluation_log} in the resulting object. When either \code{verbose>=1} or
|
||||
#' \code{\link{xgb.cb.print.evaluation}} callback is engaged, the performance results are continuously
|
||||
#' `evaluation_log` in the resulting object. When either `verbose>=1` or
|
||||
#' [xgb.cb.print.evaluation()] callback is engaged, the performance results are continuously
|
||||
#' printed out during the training.
|
||||
#' E.g., specifying \code{evals=list(validation1=mat1, validation2=mat2)} allows to track
|
||||
#' E.g., specifying `evals=list(validation1=mat1, validation2=mat2)` allows tracking
|
||||
#' the performance of each round's model on mat1 and mat2.
|
||||
#' @param obj customized objective function. Should take two arguments: the first one will be the
|
||||
#' @param obj Customized objective function. Should take two arguments: the first one will be the
|
||||
#' current predictions (either a numeric vector or matrix depending on the number of targets / classes),
|
||||
#' and the second one will be the `data` DMatrix object that is used for training.
|
||||
#'
|
||||
#' It should return a list with two elements `grad` and `hess` (in that order), as either
|
||||
#' numeric vectors or numeric matrices depending on the number of targets / classes (same
|
||||
#' dimension as the predictions that are passed as first argument).
|
||||
#' @param feval customized evaluation function. Just like `obj`, should take two arguments, with
|
||||
#' @param feval Customized evaluation function. Just like `obj`, should take two arguments, with
|
||||
#' the first one being the predictions and the second one the `data` DMatrix.
|
||||
#'
|
||||
#' Should return a list with two elements `metric` (name that will be displayed for this metric,
|
||||
@ -153,49 +142,45 @@
|
||||
#' parameter `disable_default_eval_metric = TRUE`.
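A custom evaluation function of the shape described for `feval` might look like the sketch below. The names `error_metric` and `evalerror` are illustrative, and the sketch assumes the predictions arrive as probabilities; with a custom objective they may instead be raw scores, in which case the threshold would need to change.

```r
# Core computation on plain vectors, so it can be checked without a DMatrix.
error_metric <- function(preds, labels, threshold = 0.5) {
  wrong <- (preds > threshold) != labels
  list(metric = "error", value = mean(wrong))
}

# Wrapper with the (predictions, DMatrix) signature expected for `feval`.
# Assumption: labels are stored in the DMatrix under "label".
evalerror <- function(preds, dtrain) {
  labels <- xgboost::getinfo(dtrain, "label")
  error_metric(preds, labels)
}

# Illustration on made-up numbers: two of the four cases are misclassified.
res <- error_metric(c(0.9, 0.2, 0.6, 0.4), c(1, 0, 0, 1))
```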
|
||||
#' @param verbose If 0, xgboost will stay silent. If 1, it will print information about performance.
|
||||
#' If 2, some additional information will be printed out.
|
||||
#' Note that setting \code{verbose > 0} automatically engages the
|
||||
#' \code{xgb.cb.print.evaluation(period=1)} callback function.
|
||||
#' @param print_every_n Print each n-th iteration evaluation messages when \code{verbose>0}.
|
||||
#' Note that setting `verbose > 0` automatically engages the
|
||||
#' `xgb.cb.print.evaluation(period=1)` callback function.
|
||||
#' @param print_every_n Print each nth iteration evaluation messages when `verbose>0`.
|
||||
#' Default is 1 which means all messages are printed. This parameter is passed to the
|
||||
#' \code{\link{xgb.cb.print.evaluation}} callback.
|
||||
#' @param early_stopping_rounds If \code{NULL}, the early stopping function is not triggered.
|
||||
#' If set to an integer \code{k}, training with a validation set will stop if the performance
|
||||
#' doesn't improve for \code{k} rounds.
|
||||
#' Setting this parameter engages the \code{\link{xgb.cb.early.stop}} callback.
|
||||
#' @param maximize If \code{feval} and \code{early_stopping_rounds} are set,
|
||||
#' then this parameter must be set as well.
|
||||
#' When it is \code{TRUE}, it means the larger the evaluation score the better.
|
||||
#' This parameter is passed to the \code{\link{xgb.cb.early.stop}} callback.
|
||||
#' @param save_period when it is non-NULL, model is saved to disk after every \code{save_period} rounds,
|
||||
#' 0 means save at the end. The saving is handled by the \code{\link{xgb.cb.save.model}} callback.
|
||||
#' [xgb.cb.print.evaluation()] callback.
|
||||
#' @param early_stopping_rounds If `NULL`, the early stopping function is not triggered.
|
||||
#' If set to an integer `k`, training with a validation set will stop if the performance
|
||||
#' doesn't improve for `k` rounds. Setting this parameter engages the [xgb.cb.early.stop()] callback.
|
||||
#' @param maximize If `feval` and `early_stopping_rounds` are set, then this parameter must be set as well.
|
||||
#' When it is `TRUE`, it means the larger the evaluation score the better.
|
||||
#' This parameter is passed to the [xgb.cb.early.stop()] callback.
|
||||
#' @param save_period When not `NULL`, model is saved to disk after every `save_period` rounds.
|
||||
#' 0 means save at the end. The saving is handled by the [xgb.cb.save.model()] callback.
|
||||
#' @param save_name The name or path for periodically saved model file.
|
||||
#' @param xgb_model a previously built model to continue the training from.
|
||||
#' Could be either an object of class \code{xgb.Booster}, or its raw data, or the name of a
|
||||
#' @param xgb_model A previously built model to continue the training from.
|
||||
#' Could be either an object of class `xgb.Booster`, or its raw data, or the name of a
|
||||
#' file with a previously saved model.
|
||||
#' @param callbacks a list of callback functions to perform various task during boosting.
|
||||
#' See \code{\link{xgb.Callback}}. Some of the callbacks are automatically created depending on the
|
||||
#' @param callbacks A list of callback functions to perform various tasks during boosting.
|
||||
#' See [xgb.Callback()]. Some of the callbacks are automatically created depending on the
|
||||
#' parameters' values. Users can provide either existing or their own callback methods in order
|
||||
#' to customize the training process.
|
||||
#'
|
||||
#' Note that some callbacks might try to leave attributes in the resulting model object,
|
||||
#' such as an evaluation log (a `data.table` object) - be aware that these objects are kept
|
||||
#' as R attributes, and thus do not get saved when using XGBoost's own serializers like
|
||||
#' \link{xgb.save} (but are kept when using R serializers like \link{saveRDS}).
|
||||
#' @param ... other parameters to pass to \code{params}.
|
||||
#' [xgb.save()] (but are kept when using R serializers like [saveRDS()]).
|
||||
#' @param ... Other parameters to pass to `params`.
|
||||
#'
|
||||
#' @return
|
||||
#' An object of class \code{xgb.Booster}.
|
||||
#' @return An object of class `xgb.Booster`.
|
||||
#'
|
||||
#' @details
|
||||
#' These are the training functions for \code{xgboost}.
|
||||
#' These are the training functions for [xgboost()].
|
||||
#'
|
||||
#' The \code{xgb.train} interface supports advanced features such as \code{evals},
|
||||
#' The `xgb.train()` interface supports advanced features such as `evals`,
|
||||
#' customized objective and evaluation metric functions, therefore it is more flexible
|
||||
#' than the \code{xgboost} interface.
|
||||
#' than the [xgboost()] interface.
|
||||
#'
|
||||
#' Parallelization is automatically enabled if \code{OpenMP} is present.
|
||||
#' Number of threads can also be manually specified via the \code{nthread}
|
||||
#' parameter.
|
||||
#' Parallelization is automatically enabled if OpenMP is present.
|
||||
#' Number of threads can also be manually specified via the `nthread` parameter.
|
||||
#'
|
||||
#' While in other interfaces the random seed defaults to zero, in R, if a parameter `seed`
|
||||
#' is not manually supplied, it will generate a random seed through R's own random number generator,
|
||||
@ -203,64 +188,56 @@
|
||||
#' RNG from R.
|
||||
#'
|
||||
#' The evaluation metric is chosen automatically by XGBoost (according to the objective)
|
||||
#' when the \code{eval_metric} parameter is not provided.
|
||||
#' User may set one or several \code{eval_metric} parameters.
|
||||
#' when the `eval_metric` parameter is not provided.
|
||||
#' Users may set one or several `eval_metric` parameters.
|
||||
#' Note that when using a customized metric, only this single metric can be used.
|
||||
#' The following is the list of built-in metrics for which XGBoost provides optimized implementation:
|
||||
#' \itemize{
|
||||
#' \item \code{rmse} root mean square error. \url{https://en.wikipedia.org/wiki/Root_mean_square_error}
|
||||
#' \item \code{logloss} negative log-likelihood. \url{https://en.wikipedia.org/wiki/Log-likelihood}
|
||||
#' \item \code{mlogloss} multiclass logloss. \url{https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html}
|
||||
#' \item \code{error} Binary classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
|
||||
#' - `rmse`: Root mean square error. \url{https://en.wikipedia.org/wiki/Root_mean_square_error}
|
||||
#' - `logloss`: Negative log-likelihood. \url{https://en.wikipedia.org/wiki/Log-likelihood}
|
||||
#' - `mlogloss`: Multiclass logloss. \url{https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html}
|
||||
#' - `error`: Binary classification error rate. It is calculated as `(# wrong cases) / (# all cases)`.
|
||||
#' By default, it uses the 0.5 threshold for predicted values to define negative and positive instances.
|
||||
#' Different threshold (e.g., 0.) could be specified as "error@0."
|
||||
#' \item \code{merror} Multiclass classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
|
||||
#' \item \code{mae} Mean absolute error
|
||||
#' \item \code{mape} Mean absolute percentage error
|
||||
#' \item{ \code{auc} Area under the curve.
|
||||
#' \url{https://en.wikipedia.org/wiki/Receiver_operating_characteristic#'Area_under_curve} for ranking evaluation.}
|
||||
#' \item \code{aucpr} Area under the PR curve. \url{https://en.wikipedia.org/wiki/Precision_and_recall} for ranking evaluation.
|
||||
#' \item \code{ndcg} Normalized Discounted Cumulative Gain (for ranking task). \url{https://en.wikipedia.org/wiki/NDCG}
|
||||
#' }
|
||||
#' Different threshold (e.g., 0.) could be specified as `error@0`.
|
||||
#' - `merror`: Multiclass classification error rate. It is calculated as `(# wrong cases) / (# all cases)`.
|
||||
#' - `mae`: Mean absolute error.
|
||||
#' - `mape`: Mean absolute percentage error.
|
||||
#' - `auc`: Area under the curve.
|
||||
#' \url{https://en.wikipedia.org/wiki/Receiver_operating_characteristic#'Area_under_curve} for ranking evaluation.
|
||||
#' - `aucpr`: Area under the PR curve. \url{https://en.wikipedia.org/wiki/Precision_and_recall} for ranking evaluation.
|
||||
#' - `ndcg`: Normalized Discounted Cumulative Gain (for ranking task). \url{https://en.wikipedia.org/wiki/NDCG}
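For readers who want to see what a few of these metrics compute, the following plain-R definitions are a rough sketch. They follow the textbook formulas rather than XGBoost's internal implementation; in particular, the probability clipping in `logloss` is an assumption added here to avoid `log(0)`.

```r
# Root mean square error.
rmse <- function(pred, label) sqrt(mean((pred - label)^2))

# Mean absolute error.
mae <- function(pred, label) mean(abs(pred - label))

# Negative log-likelihood for binary labels in {0, 1}.
logloss <- function(prob, label, eps = 1e-15) {
  prob <- pmin(pmax(prob, eps), 1 - eps)  # clipping is an assumption
  -mean(label * log(prob) + (1 - label) * log(1 - prob))
}

# Binary error rate: (# wrong cases) / (# all cases) at a given threshold.
binary_error <- function(prob, label, threshold = 0.5) {
  mean((prob > threshold) != label)
}

# Illustration on made-up numbers.
rmse_val <- rmse(c(1, 2, 3), c(1, 2, 5))
mae_val <- mae(c(1, 2, 3), c(1, 2, 5))
ll_val <- logloss(c(0.5, 0.5), c(0, 1))
err_val <- binary_error(c(0.9, 0.2, 0.6, 0.4), c(1, 0, 0, 1))
```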
|
||||
#'
|
||||
#' The following callbacks are automatically created when certain parameters are set:
|
||||
#' \itemize{
|
||||
#' \item \code{xgb.cb.print.evaluation} is turned on when \code{verbose > 0};
|
||||
#' and the \code{print_every_n} parameter is passed to it.
|
||||
#' \item \code{xgb.cb.evaluation.log} is on when \code{evals} is present.
|
||||
#' \item \code{xgb.cb.early.stop}: when \code{early_stopping_rounds} is set.
|
||||
#' \item \code{xgb.cb.save.model}: when \code{save_period > 0} is set.
|
||||
#' }
|
||||
#' - [xgb.cb.print.evaluation()] is turned on when `verbose > 0`; the `print_every_n`
|
||||
#' parameter is passed to it.
|
||||
#' - [xgb.cb.evaluation.log()] is on when `evals` is present.
|
||||
#' - [xgb.cb.early.stop()]: When `early_stopping_rounds` is set.
|
||||
#' - [xgb.cb.save.model()]: When `save_period > 0` is set.
|
||||
#'
|
||||
#' Note that objects of type `xgb.Booster` as returned by this function behave a bit differently
|
||||
#' from typical R objects (it's an 'altrep' list class), and it makes a separation between
|
||||
#' internal booster attributes (restricted to jsonifyable data), accessed through \link{xgb.attr}
|
||||
#' and shared between interfaces through serialization functions like \link{xgb.save}; and
|
||||
#' R-specific attributes (typically the result from a callback), accessed through \link{attributes}
|
||||
#' and \link{attr}, which are otherwise
|
||||
#' only used in the R interface, only kept when using R's serializers like \link{saveRDS}, and
|
||||
#' not anyhow used by functions like \link{predict.xgb.Booster}.
|
||||
#' internal booster attributes (restricted to jsonifyable data), accessed through [xgb.attr()]
|
||||
#' and shared between interfaces through serialization functions like [xgb.save()]; and
|
||||
#' R-specific attributes (typically the result from a callback), accessed through [attributes()]
|
||||
#' and [attr()], which are otherwise
|
||||
#' only used in the R interface, only kept when using R's serializers like [saveRDS()], and
|
||||
#' not used in any way by functions like `predict.xgb.Booster()`.
|
||||
#'
|
||||
#' Be aware that one such R attribute that is automatically added is `params` - this attribute
|
||||
#' is assigned from the `params` argument to this function, and is only meant to serve as a
|
||||
#' reference for what went into the booster, but is not used in other methods that take a booster
|
||||
#' object - so for example, changing the booster's configuration requires calling `xgb.config<-`
|
||||
#' or 'xgb.parameters<-', while simply modifying `attributes(model)$params$<...>` will have no
|
||||
#' or `xgb.parameters<-`, while simply modifying `attributes(model)$params$<...>` will have no
|
||||
#' effect elsewhere.
|
||||
#'
|
||||
#' @seealso
|
||||
#' \code{\link{xgb.Callback}},
|
||||
#' \code{\link{predict.xgb.Booster}},
|
||||
#' \code{\link{xgb.cv}}
|
||||
#' @seealso [xgb.Callback()], [predict.xgb.Booster()], [xgb.cv()]
|
||||
#'
|
||||
#' @references
|
||||
#'
|
||||
#' Tianqi Chen and Carlos Guestrin, "XGBoost: A Scalable Tree Boosting System",
|
||||
#' 22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016, \url{https://arxiv.org/abs/1603.02754}
|
||||
#'
|
||||
#' @examples
|
||||
#' data(agaricus.train, package='xgboost')
|
||||
#' data(agaricus.test, package='xgboost')
|
||||
#' data(agaricus.train, package = "xgboost")
|
||||
#' data(agaricus.test, package = "xgboost")
|
||||
#'
|
||||
#' ## Keep the number of threads to 1 for examples
|
||||
#' nthread <- 1
|
||||
@ -275,8 +252,13 @@
|
||||
#' evals <- list(train = dtrain, eval = dtest)
|
||||
#'
|
||||
#' ## A simple xgb.train example:
|
||||
#' param <- list(max_depth = 2, eta = 1, nthread = nthread,
|
||||
#' objective = "binary:logistic", eval_metric = "auc")
|
||||
#' param <- list(
|
||||
#' max_depth = 2,
|
||||
#' eta = 1,
|
||||
#' nthread = nthread,
|
||||
#' objective = "binary:logistic",
|
||||
#' eval_metric = "auc"
|
||||
#' )
|
||||
#' bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0)
|
||||
#'
|
||||
#' ## An xgb.train example where custom objective and evaluation metric are
|
||||
@ -296,34 +278,65 @@
|
||||
#'
|
||||
#' # These functions could be used by passing them either:
|
||||
#' # as 'objective' and 'eval_metric' parameters in the params list:
|
||||
#' param <- list(max_depth = 2, eta = 1, nthread = nthread,
|
||||
#' objective = logregobj, eval_metric = evalerror)
|
||||
#' param <- list(
|
||||
#' max_depth = 2,
|
||||
#' eta = 1,
|
||||
#' nthread = nthread,
|
||||
#' objective = logregobj,
|
||||
#' eval_metric = evalerror
|
||||
#' )
|
||||
#' bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0)
|
||||
#'
|
||||
#' # or through the ... arguments:
|
||||
#' param <- list(max_depth = 2, eta = 1, nthread = nthread)
|
||||
#' bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0,
|
||||
#' objective = logregobj, eval_metric = evalerror)
|
||||
#' bst <- xgb.train(
|
||||
#' param,
|
||||
#' dtrain,
|
||||
#' nrounds = 2,
|
||||
#' evals = evals,
|
||||
#' verbose = 0,
|
||||
#' objective = logregobj,
|
||||
#' eval_metric = evalerror
|
||||
#' )
|
||||
#'
|
||||
#' # or as dedicated 'obj' and 'feval' parameters of xgb.train:
|
||||
#' bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals,
|
||||
#' obj = logregobj, feval = evalerror)
|
||||
#' bst <- xgb.train(
|
||||
#' param, dtrain, nrounds = 2, evals = evals, obj = logregobj, feval = evalerror
|
||||
#' )
|
||||
#'
|
||||
#'
|
||||
#' ## An xgb.train example of using variable learning rates at each iteration:
|
||||
#' param <- list(max_depth = 2, eta = 1, nthread = nthread,
|
||||
#' objective = "binary:logistic", eval_metric = "auc")
|
||||
#' param <- list(
|
||||
#' max_depth = 2,
|
||||
#' eta = 1,
|
||||
#' nthread = nthread,
|
||||
#' objective = "binary:logistic",
|
||||
#' eval_metric = "auc"
|
||||
#' )
|
||||
#' my_etas <- list(eta = c(0.5, 0.1))
|
||||
#' bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0,
|
||||
#' callbacks = list(xgb.cb.reset.parameters(my_etas)))
|
||||
#'
|
||||
#' bst <- xgb.train(
|
||||
#' param,
|
||||
#' dtrain,
|
||||
#' nrounds = 2,
|
||||
#' evals = evals,
|
||||
#' verbose = 0,
|
||||
#' callbacks = list(xgb.cb.reset.parameters(my_etas))
|
||||
#' )
|
||||
#'
|
||||
#' ## Early stopping:
|
||||
#' bst <- xgb.train(param, dtrain, nrounds = 25, evals = evals,
|
||||
#' early_stopping_rounds = 3)
|
||||
#' bst <- xgb.train(
|
||||
#' param, dtrain, nrounds = 25, evals = evals, early_stopping_rounds = 3
|
||||
#' )
|
||||
#'
|
||||
#' ## An 'xgboost' interface example:
|
||||
#' bst <- xgboost(x = agaricus.train$data, y = factor(agaricus.train$label),
|
||||
#' params = list(max_depth = 2, eta = 1), nthread = nthread, nrounds = 2)
|
||||
#' bst <- xgboost(
|
||||
#' x = agaricus.train$data,
|
||||
#' y = factor(agaricus.train$label),
|
||||
#' params = list(max_depth = 2, eta = 1),
|
||||
#' nthread = nthread,
|
||||
#' nrounds = 2
|
||||
#' )
|
||||
#' pred <- predict(bst, agaricus.test$data)
|
||||
#'
|
||||
#' @export
|
||||
|
||||
@ -717,54 +717,53 @@ process.x.and.col.args <- function(
|
||||
return(lst_args)
|
||||
}
|
||||
|
||||
#' @noMd
|
||||
#' @export
|
||||
#' @title Fit XGBoost Model
|
||||
#' @description Fits an XGBoost model (boosted decision tree ensemble) to given x/y data.
|
||||
#' Fit XGBoost Model
|
||||
#'
|
||||
#' See the tutorial \href{https://xgboost.readthedocs.io/en/stable/tutorials/model.html}{
|
||||
#' Introduction to Boosted Trees} for a longer explanation of what XGBoost does.
|
||||
#' @export
|
||||
#' @description
|
||||
#' Fits an XGBoost model (boosted decision tree ensemble) to given x/y data.
|
||||
#'
|
||||
#' See the tutorial [Introduction to Boosted Trees](https://xgboost.readthedocs.io/en/stable/tutorials/model.html)
|
||||
#' for a longer explanation of what XGBoost does.
|
||||
#'
|
||||
#' This function is intended to provide a more user-friendly interface for XGBoost that follows
|
||||
#' R's conventions for model fitting and predictions, but which doesn't expose all of the
|
||||
#' possible functionalities of the core XGBoost library.
|
||||
#'
|
||||
#' See \link{xgb.train} for a more flexible low-level alternative which is similar across different
|
||||
#' See [xgb.train()] for a more flexible low-level alternative which is similar across different
|
||||
#' language bindings of XGBoost and which exposes the full library's functionalities.
|
||||
#' @details For package authors using `xgboost` as a dependency, it is highly recommended to use
|
||||
#' \link{xgb.train} in package code instead of `xgboost()`, since it has a more stable interface
|
||||
#'
|
||||
#' @details
|
||||
#' For package authors using 'xgboost' as a dependency, it is highly recommended to use
|
||||
#' [xgb.train()] in package code instead of [xgboost()], since it has a more stable interface
|
||||
#' and performs fewer data conversions and copies along the way.
|
||||
#' @references \itemize{
|
||||
#' \item Chen, Tianqi, and Carlos Guestrin. "Xgboost: A scalable tree boosting system."
|
||||
#' @references
|
||||
#' - Chen, Tianqi, and Carlos Guestrin. "XGBoost: A Scalable Tree Boosting System."
|
||||
#' Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and
|
||||
#' Data Mining. 2016.
|
||||
#' \item \url{https://xgboost.readthedocs.io/en/stable/}
|
||||
#' }
|
||||
#' @param x The features / covariates. Can be passed as:\itemize{
|
||||
#' \item A numeric or integer `matrix`.
|
||||
#' \item A `data.frame`, in which all columns are one of the following types:\itemize{
|
||||
#' \item `numeric`
|
||||
#' \item `integer`
|
||||
#' \item `logical`
|
||||
#' \item `factor`
|
||||
#' }
|
||||
#' - \url{https://xgboost.readthedocs.io/en/stable/}
|
||||
#' @param x The features / covariates. Can be passed as:
|
||||
#' - A numeric or integer `matrix`.
|
||||
#' - A `data.frame`, in which all columns are one of the following types:
|
||||
#' - `numeric`
|
||||
#' - `integer`
|
||||
#' - `logical`
|
||||
#' - `factor`
|
||||
#'
|
||||
#' Columns of `factor` type will be assumed to be categorical, while other column types will
|
||||
#' be assumed to be numeric.
|
||||
#' \item A sparse matrix from the `Matrix` package, either as `dgCMatrix` or `dgRMatrix` class.
|
||||
#' }
|
||||
#' - A sparse matrix from the `Matrix` package, either as `dgCMatrix` or `dgRMatrix` class.
|
||||
#'
|
||||
#' Note that categorical features are only supported for `data.frame` inputs, and are automatically
|
||||
#' determined based on their types. See \link{xgb.train} with \link{xgb.DMatrix} for more flexible
|
||||
#' determined based on their types. See [xgb.train()] with [xgb.DMatrix()] for more flexible
|
||||
#' variants that would allow something like categorical features on sparse matrices.
|
||||
#' @param y The response variable. Allowed values are:\itemize{
|
||||
#' \item A numeric or integer vector (for regression tasks).
|
||||
#' \item A factor or character vector (for binary and multi-class classification tasks).
|
||||
#' \item A logical (boolean) vector (for binary classification tasks).
|
||||
#' \item A numeric or integer matrix or `data.frame` with numeric/integer columns
|
||||
#' @param y The response variable. Allowed values are:
|
||||
#' - A numeric or integer vector (for regression tasks).
|
||||
#' - A factor or character vector (for binary and multi-class classification tasks).
|
||||
#' - A logical (boolean) vector (for binary classification tasks).
|
||||
#' - A numeric or integer matrix or `data.frame` with numeric/integer columns
|
||||
#' (for multi-task regression tasks).
|
||||
#' \item A `Surv` object from the `survival` package (for survival tasks).
|
||||
#' }
|
||||
#' - A `Surv` object from the 'survival' package (for survival tasks).
|
||||
#'
|
||||
#' If `objective` is `NULL`, the right task will be determined automatically based on
|
||||
#' the class of `y`.
|
||||
@ -778,18 +777,17 @@ process.x.and.col.args <- function(
|
||||
#' set as the last level.
|
||||
#' @param objective Optimization objective to minimize based on the supplied data, to be passed
|
||||
#' by name as a string / character (e.g. `reg:absoluteerror`). See the
|
||||
#' \href{https://xgboost.readthedocs.io/en/stable/parameter.html#learning-task-parameters}{
|
||||
#' Learning Task Parameters} page for more detailed information on allowed values.
|
||||
#' [Learning Task Parameters](https://xgboost.readthedocs.io/en/stable/parameter.html#learning-task-parameters)
|
||||
#' page for more detailed information on allowed values.
|
||||
#'
|
||||
#' If `NULL` (the default), will be automatically determined from `y` according to the following
|
||||
#' logic:\itemize{
|
||||
#' \item If `y` is a factor with 2 levels, will use `binary:logistic`.
|
||||
#' \item If `y` is a factor with more than 2 levels, will use `multi:softprob` (number of classes
|
||||
#' logic:
|
||||
#' - If `y` is a factor with 2 levels, will use `binary:logistic`.
|
||||
#' - If `y` is a factor with more than 2 levels, will use `multi:softprob` (number of classes
|
||||
#' will be determined automatically, should not be passed under `params`).
|
||||
#' \item If `y` is a `Surv` object from the `survival` package, will use `survival:aft` (note that
|
||||
#' - If `y` is a `Surv` object from the `survival` package, will use `survival:aft` (note that
|
||||
#' the only types supported are left / right / interval censored).
|
||||
#' \item Otherwise, will use `reg:squarederror`.
|
||||
#' }
|
||||
#' - Otherwise, will use `reg:squarederror`.
|
||||
#'
|
||||
#' If `objective` is not `NULL`, it must match with the type of `y` - e.g. `factor` types of `y`
|
||||
#' can only be used with classification objectives and vice-versa.
|
||||
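The default-objective rules documented for `objective = NULL` above can be sketched in base R. This is only an illustration of the documented decision order, not the package's internal code: the helper name `guess_objective` is hypothetical, and it covers only the four cases the documentation lists.

```r
# Hypothetical helper that mirrors the documented rules for `objective = NULL`.
# It is not part of xgboost; it only restates the decision order in code.
guess_objective <- function(y) {
  if (is.factor(y) && nlevels(y) == 2L) {
    return("binary:logistic")
  }
  if (is.factor(y) && nlevels(y) > 2L) {
    return("multi:softprob")
  }
  # `Surv` objects come from the 'survival' package; checking the class
  # does not require that package to be loaded.
  if (inherits(y, "Surv")) {
    return("survival:aft")
  }
  # Documented fallback for every other type of `y`.
  "reg:squarederror"
}

guess_objective(factor(c("a", "b", "a")))
guess_objective(factor(c("a", "b", "c")))
guess_objective(c(1.5, 2.0, 3.25))
```

Checking the class of `y` first keeps the rule that `factor` responses are only ever paired with classification objectives.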
@ -805,8 +803,7 @@ process.x.and.col.args <- function(
|
||||
#' @param weights Sample weights for each row in `x` and `y`. If `NULL` (the default), each row
|
||||
#' will have the same weight.
|
||||
#'
|
||||
#' If not `NULL`, should be passed as a numeric vector with length matching to the number of
|
||||
#' rows in `x`.
|
||||
#' If not `NULL`, should be passed as a numeric vector with length matching to the number of rows in `x`.
|
||||
#' @param verbosity Verbosity of printing messages. Valid values are 0 (silent), 1 (warning),
|
||||
#' 2 (info), and 3 (debug).
|
||||
#' @param nthreads Number of parallel threads to use. If passing zero, will use all CPU threads.
|
||||
@ -826,8 +823,8 @@ process.x.and.col.args <- function(
|
||||
#' case the columns that are not referred to in `monotone_constraints` will be assumed to have
|
||||
#' a value of zero (no constraint imposed on the model for those features).
|
||||
#'
|
||||
#' See the tutorial \href{https://xgboost.readthedocs.io/en/stable/tutorials/monotonic.html}{
|
||||
#' Monotonic Constraints} for a more detailed explanation.
|
||||
#' See the tutorial [Monotonic Constraints](https://xgboost.readthedocs.io/en/stable/tutorials/monotonic.html)
|
||||
#' for a more detailed explanation.
|
||||
#' @param interaction_constraints Constraints for interaction representing permitted interactions.
|
||||
#' The constraints must be specified in the form of a list of vectors referencing columns in the
|
||||
#' data, e.g. `list(c(1, 2), c(3, 4, 5))` (with these numbers being column indices, numeration
|
||||
@ -836,9 +833,8 @@ process.x.and.col.args <- function(
|
||||
#' columns by names), where each vector is a group of indices of features that are allowed to
|
||||
#' interact with each other.
|
||||
#'
|
||||
#' See the tutorial
|
||||
#' \href{https://xgboost.readthedocs.io/en/stable/tutorials/feature_interaction_constraint.html}{
|
||||
#' Feature Interaction Constraints} for more information.
|
||||
#' See the tutorial [Feature Interaction Constraints](https://xgboost.readthedocs.io/en/stable/tutorials/feature_interaction_constraint.html)
|
||||
#' for more information.
|
||||
#' @param feature_weights Feature weights for column sampling.
|
||||
#'
|
||||
#' Can be passed either as a vector with length matching to columns of `x`, or as a named
|
||||
@ -866,18 +862,19 @@ process.x.and.col.args <- function(
|
||||
#' If `NULL`, will start from zero, but note that for most objectives, an intercept is usually
|
||||
#' added (controllable through parameter `base_score` instead) when `base_margin` is not passed.
|
||||
#' @param ... Other training parameters. See the online documentation
|
||||
#' \href{https://xgboost.readthedocs.io/en/stable/parameter.html}{XGBoost Parameters} for
|
||||
#' [XGBoost Parameters](https://xgboost.readthedocs.io/en/stable/parameter.html) for
|
||||
#' details about possible values and what they do.
|
||||
#'
|
||||
#' Note that not all possible values from the core XGBoost library are allowed as `params` for
|
||||
#' 'xgboost()' - in particular, values which require an already-fitted booster object (such as
|
||||
#' `process_type`) are not accepted here.
|
||||
#' @return A model object, inheriting from both `xgboost` and `xgb.Booster`. Compared to the regular
|
||||
#' `xgb.Booster` model class produced by \link{xgb.train}, this `xgboost` class will have an
|
||||
#' `xgb.Booster` model class produced by [xgb.train()], this `xgboost` class will have an
|
||||
#'
|
||||
#' additional attribute `metadata` containing information which is used for formatting prediction
|
||||
#' outputs, such as class names for classification problems.
|
||||
#'
|
||||
#' @examples
|
||||
#' library(xgboost)
|
||||
#' data(mtcars)
|
||||
#'
|
||||
#' # Fit a small regression model on the mtcars data
|
||||
@ -1006,12 +1003,9 @@ print.xgboost <- function(x, ...) {
|
||||
#' This data set is originally from the Mushroom data set,
|
||||
#' UCI Machine Learning Repository.
|
||||
#'
|
||||
#' This data set includes the following fields:
|
||||
#'
|
||||
#' \itemize{
|
||||
#' \item \code{label} the label for each record
|
||||
#' \item \code{data} a sparse Matrix of \code{dgCMatrix} class, with 126 columns.
|
||||
#' }
|
||||
#' It includes the following fields:
|
||||
#' - `label`: The label for each record.
|
||||
#' - `data`: A sparse Matrix of 'dgCMatrix' class with 126 columns.
|
||||
#'
|
||||
#' @references
|
||||
#' <https://archive.ics.uci.edu/ml/datasets/Mushroom>
|
||||
@ -1033,12 +1027,9 @@ NULL
|
||||
#' This data set is originally from the Mushroom data set,
|
||||
#' UCI Machine Learning Repository.
|
||||
#'
|
||||
#' This data set includes the following fields:
|
||||
#'
|
||||
#' \itemize{
|
||||
#' \item \code{label} the label for each record
|
||||
#' \item \code{data} a sparse Matrix of \code{dgCMatrix} class, with 126 columns.
|
||||
#' }
|
||||
#' It includes the following fields:
|
||||
#' - `label`: The label for each record.
|
||||
#' - `data`: A sparse Matrix of 'dgCMatrix' class with 126 columns.
|
||||
#'
|
||||
#' @references
|
||||
#' <https://archive.ics.uci.edu/ml/datasets/Mushroom>
|
||||
|
||||
@ -16,11 +16,10 @@ This data set is originally from the Mushroom data set,
|
||||
UCI Machine Learning Repository.
|
||||
}
|
||||
\details{
|
||||
This data set includes the following fields:
|
||||
|
||||
It includes the following fields:
|
||||
\itemize{
|
||||
\item \code{label} the label for each record
|
||||
\item \code{data} a sparse Matrix of \code{dgCMatrix} class, with 126 columns.
|
||||
\item \code{label}: The label for each record.
|
||||
\item \code{data}: A sparse Matrix of 'dgCMatrix' class with 126 columns.
|
||||
}
|
||||
}
|
||||
\references{
|
||||
|
||||
@ -16,11 +16,10 @@ This data set is originally from the Mushroom data set,
|
||||
UCI Machine Learning Repository.
|
||||
}
|
||||
\details{
|
||||
This data set includes the following fields:
|
||||
|
||||
It includes the following fields:
|
||||
\itemize{
|
||||
\item \code{label} the label for each record
|
||||
\item \code{data} a sparse Matrix of \code{dgCMatrix} class, with 126 columns.
|
||||
\item \code{label}: The label for each record.
|
||||
\item \code{data}: A sparse Matrix of 'dgCMatrix' class with 126 columns.
|
||||
}
|
||||
}
|
||||
\references{
|
||||
|
||||
@ -13,13 +13,14 @@
|
||||
Returns a vector of numbers of rows and of columns in an \code{xgb.DMatrix}.
|
||||
}
|
||||
\details{
|
||||
Note: since \code{nrow} and \code{ncol} internally use \code{dim}, they can also
|
||||
Note: since \code{\link[=nrow]{nrow()}} and \code{\link[=ncol]{ncol()}} internally use \code{\link[=dim]{dim()}}, they can also
|
||||
be directly used with an \code{xgb.DMatrix} object.
|
||||
}
|
||||
\examples{
|
||||
data(agaricus.train, package='xgboost')
|
||||
data(agaricus.train, package = "xgboost")
|
||||
|
||||
train <- agaricus.train
|
||||
dtrain <- xgb.DMatrix(train$data, label=train$label, nthread = 2)
|
||||
dtrain <- xgb.DMatrix(train$data, label = train$label, nthread = 2)
|
||||
|
||||
stopifnot(nrow(dtrain) == nrow(train$data))
|
||||
stopifnot(ncol(dtrain) == ncol(train$data))
|
||||
|
||||
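The note that `nrow()` and `ncol()` work because they internally call `dim()` can be illustrated without xgboost. In the sketch below, the class `toy_dmatrix` and its fields are hypothetical stand-ins for `xgb.DMatrix`; the only point is that defining an S3 `dim` method is enough for base R's `nrow()` and `ncol()` to work.

```r
# Hypothetical stand-in class: a list that only records its shape.
toy <- structure(list(n_rows = 5L, n_cols = 2L), class = "toy_dmatrix")

# S3 method for dim(); base R's nrow() and ncol() are defined as
# dim(x)[1L] and dim(x)[2L], so they pick this method up automatically.
dim.toy_dmatrix <- function(x) c(x$n_rows, x$n_cols)

dim(toy)
nrow(toy)
ncol(toy)
```

No separate `nrow` or `ncol` methods are needed, which matches the documented behaviour for `xgb.DMatrix` objects.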
@ -10,26 +10,27 @@
|
||||
\method{dimnames}{xgb.DMatrix}(x) <- value
|
||||
}
|
||||
\arguments{
|
||||
\item{x}{object of class \code{xgb.DMatrix}}
|
||||
\item{x}{Object of class \code{xgb.DMatrix}.}
|
||||
|
||||
\item{value}{a list of two elements: the first one is ignored
|
||||
\item{value}{A list of two elements: the first one is ignored
|
||||
and the second one is column names}
|
||||
}
|
||||
\description{
|
||||
Only column names are supported for \code{xgb.DMatrix}, thus setting of
|
||||
row names would have no effect and returned row names would be NULL.
|
||||
row names would have no effect and returned row names would be \code{NULL}.
|
||||
}
|
||||
\details{
|
||||
Generic \code{dimnames} methods are used by \code{colnames}.
|
||||
Since row names are irrelevant, it is recommended to use \code{colnames} directly.
|
||||
Generic \code{\link[=dimnames]{dimnames()}} methods are used by \code{\link[=colnames]{colnames()}}.
|
||||
Since row names are irrelevant, it is recommended to use \code{\link[=colnames]{colnames()}} directly.
|
||||
}
|
||||
\examples{
|
||||
data(agaricus.train, package='xgboost')
|
||||
data(agaricus.train, package = "xgboost")
|
||||
|
||||
train <- agaricus.train
|
||||
dtrain <- xgb.DMatrix(train$data, label=train$label, nthread = 2)
|
||||
dtrain <- xgb.DMatrix(train$data, label = train$label, nthread = 2)
|
||||
dimnames(dtrain)
|
||||
colnames(dtrain)
|
||||
colnames(dtrain) <- make.names(1:ncol(train$data))
|
||||
print(dtrain, verbose=TRUE)
|
||||
print(dtrain, verbose = TRUE)
|
||||
|
||||
}
|
||||
|
||||
@ -22,34 +22,34 @@ setinfo(object, name, info)
|
||||
\method{setinfo}{xgb.DMatrix}(object, name, info)
|
||||
}
|
||||
\arguments{
|
||||
\item{object}{Object of class \code{xgb.DMatrix} of \code{xgb.Booster}.}
|
||||
\item{object}{Object of class \code{xgb.DMatrix} or \code{xgb.Booster}.}
|
||||
|
||||
\item{name}{the name of the information field to get (see details)}
|
||||
\item{name}{The name of the information field to get (see details).}
|
||||
|
||||
\item{info}{the specific field of information to set}
|
||||
\item{info}{The specific field of information to set.}
|
||||
}
|
||||
\value{
|
||||
For \code{getinfo}, will return the requested field. For \code{setinfo}, will always return value \code{TRUE}
|
||||
if it succeeds.
|
||||
For \code{getinfo()}, will return the requested field. For \code{setinfo()},
|
||||
will always return value \code{TRUE} if it succeeds.
|
||||
}
|
||||
\description{
|
||||
Get or set information of xgb.DMatrix and xgb.Booster objects
|
||||
}
|
||||
\details{
|
||||
The \code{name} field can be one of the following for \code{xgb.DMatrix}:
|
||||
|
||||
\itemize{
|
||||
\item \code{label}
|
||||
\item \code{weight}
|
||||
\item \code{base_margin}
|
||||
\item \code{label_lower_bound}
|
||||
\item \code{label_upper_bound}
|
||||
\item \code{group}
|
||||
\item \code{feature_type}
|
||||
\item \code{feature_name}
|
||||
\item \code{nrow}
|
||||
\item label
|
||||
\item weight
|
||||
\item base_margin
|
||||
\item label_lower_bound
|
||||
\item label_upper_bound
|
||||
\item group
|
||||
\item feature_type
|
||||
\item feature_name
|
||||
\item nrow
|
||||
}
|
||||
See the documentation for \link{xgb.DMatrix} for more information about these fields.
|
||||
|
||||
See the documentation for \code{\link[=xgb.DMatrix]{xgb.DMatrix()}} for more information about these fields.
|
||||
|
||||
For \code{xgb.Booster}, can be one of the following:
|
||||
\itemize{
|
||||
@ -57,17 +57,18 @@ For \code{xgb.Booster}, can be one of the following:
|
||||
\item \code{feature_name}
|
||||
}
|
||||
|
||||
Note that, while 'qid' cannot be retrieved, it's possible to get the equivalent 'group'
|
||||
Note that, while 'qid' cannot be retrieved, it is possible to get the equivalent 'group'
|
||||
for a DMatrix that had 'qid' assigned.
|
||||
|
||||
\bold{Important}: when calling \code{setinfo}, the objects are modified in-place. See
|
||||
\link{xgb.copy.Booster} for an idea of how this in-place assignment works.
|
||||
\strong{Important}: when calling \code{\link[=setinfo]{setinfo()}}, the objects are modified in-place. See
|
||||
\code{\link[=xgb.copy.Booster]{xgb.copy.Booster()}} for an idea of how this in-place assignment works.
|
||||
|
||||
See the documentation for \link{xgb.DMatrix} for possible fields that can be set
|
||||
See the documentation for \code{\link[=xgb.DMatrix]{xgb.DMatrix()}} for possible fields that can be set
|
||||
(which correspond to arguments in that function).
|
||||
|
||||
Note that the following fields are allowed in the construction of an \code{xgb.DMatrix}
|
||||
but \bold{aren't} allowed here:\itemize{
|
||||
but \strong{are not} allowed here:
|
||||
\itemize{
|
||||
\item data
|
||||
\item missing
|
||||
\item silent
|
||||
@ -75,19 +76,22 @@ but \bold{aren't} allowed here:\itemize{
|
||||
}
|
||||
}
|
||||
\examples{
|
||||
data(agaricus.train, package='xgboost')
|
||||
data(agaricus.train, package = "xgboost")
|
||||
|
||||
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||
|
||||
labels <- getinfo(dtrain, 'label')
|
||||
setinfo(dtrain, 'label', 1-labels)
|
||||
labels <- getinfo(dtrain, "label")
|
||||
setinfo(dtrain, "label", 1 - labels)
|
||||
|
||||
labels2 <- getinfo(dtrain, "label")
|
||||
stopifnot(all(labels2 == 1 - labels))
|
||||
data(agaricus.train, package = "xgboost")
|
||||
|
||||
labels2 <- getinfo(dtrain, 'label')
|
||||
stopifnot(all(labels2 == 1-labels))
|
||||
data(agaricus.train, package='xgboost')
|
||||
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||
|
||||
labels <- getinfo(dtrain, 'label')
|
||||
setinfo(dtrain, 'label', 1-labels)
|
||||
labels2 <- getinfo(dtrain, 'label')
|
||||
stopifnot(all.equal(labels2, 1-labels))
|
||||
labels <- getinfo(dtrain, "label")
|
||||
setinfo(dtrain, "label", 1 - labels)
|
||||
|
||||
labels2 <- getinfo(dtrain, "label")
|
||||
stopifnot(all.equal(labels2, 1 - labels))
|
||||
}
|
||||
|
||||
@ -157,7 +157,8 @@ dimension should produce practically the same result as \code{predcontrib = TRUE
|
||||
For multi-class and multi-target, will be a 4D array with dimensions \verb{[nrows, ngroups, nfeats+1, nfeats+1]}
|
||||
}
|
||||
|
||||
If passing \code{strict_shape=FALSE}, the result is always an array:\itemize{
|
||||
If passing \code{strict_shape=FALSE}, the result is always an array:
|
||||
\itemize{
|
||||
\item For normal predictions, the dimension is \verb{[nrows, ngroups]}.
|
||||
\item For \code{predcontrib=TRUE}, the dimension is \verb{[nrows, ngroups, nfeats+1]}.
|
||||
\item For \code{predinteraction=TRUE}, the dimension is \verb{[nrows, ngroups, nfeats+1, nfeats+1]}.
|
||||
|
||||
@ -7,21 +7,22 @@
|
||||
\method{print}{xgb.DMatrix}(x, verbose = FALSE, ...)
|
||||
}
|
||||
\arguments{
|
||||
\item{x}{an xgb.DMatrix object}
|
||||
\item{x}{An xgb.DMatrix object.}
|
||||
|
||||
\item{verbose}{whether to print colnames (when present)}
|
||||
\item{verbose}{Whether to print colnames (when present).}
|
||||
|
||||
\item{...}{not currently used}
|
||||
\item{...}{Not currently used.}
|
||||
}
|
||||
\description{
|
||||
Print information about xgb.DMatrix.
|
||||
Currently it displays dimensions and presence of info-fields and colnames.
|
||||
}
|
||||
\examples{
|
||||
data(agaricus.train, package='xgboost')
|
||||
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||
data(agaricus.train, package = "xgboost")
|
||||
|
||||
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||
dtrain
|
||||
print(dtrain, verbose=TRUE)
|
||||
|
||||
print(dtrain, verbose = TRUE)
|
||||
|
||||
}
|
||||
|
||||
@ -7,25 +7,33 @@
|
||||
\method{print}{xgb.cv.synchronous}(x, verbose = FALSE, ...)
|
||||
}
|
||||
\arguments{
|
||||
\item{x}{an \code{xgb.cv.synchronous} object}
|
||||
\item{x}{An \code{xgb.cv.synchronous} object.}
|
||||
|
||||
\item{verbose}{whether to print detailed data}
|
||||
\item{verbose}{Whether to print detailed data.}
|
||||
|
||||
\item{...}{passed to \code{data.table.print}}
|
||||
\item{...}{Passed to \code{data.table.print()}.}
|
||||
}
|
||||
\description{
|
||||
Prints formatted results of \code{xgb.cv}.
|
||||
Prints formatted results of \code{\link[=xgb.cv]{xgb.cv()}}.
|
||||
}
|
||||
\details{
|
||||
When not verbose, it would only print the evaluation results,
|
||||
including the best iteration (when available).
|
||||
}
|
||||
\examples{
|
||||
data(agaricus.train, package='xgboost')
|
||||
data(agaricus.train, package = "xgboost")
|
||||
|
||||
train <- agaricus.train
|
||||
cv <- xgb.cv(data = xgb.DMatrix(train$data, label = train$label), nfold = 5, max_depth = 2,
|
||||
eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
|
||||
cv <- xgb.cv(
|
||||
data = xgb.DMatrix(train$data, label = train$label),
|
||||
nfold = 5,
|
||||
max_depth = 2,
|
||||
eta = 1,
|
||||
nthread = 2,
|
||||
nrounds = 2,
|
||||
objective = "binary:logistic"
|
||||
)
|
||||
print(cv)
|
||||
print(cv, verbose=TRUE)
|
||||
print(cv, verbose = TRUE)
|
||||
|
||||
}
|
||||
|
||||
@ -116,7 +116,7 @@ model ended, in which case this will be larger than 1.
|
||||
It should match with argument \code{nrounds} passed to \code{\link[=xgb.train]{xgb.train()}} or \code{\link[=xgb.cv]{xgb.cv()}}.
|
||||
|
||||
Note that boosting might be interrupted before reaching this last iteration, for
|
||||
example by using the early stopping callback \link{xgb.cb.early.stop}.
|
||||
example by using the early stopping callback \code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}}.
|
||||
\item iteration Index of the iteration number that is being executed (first iteration
|
||||
will be the same as parameter \code{begin_iteration}, then next one will add +1, and so on).
|
||||
\item iter_feval Evaluation metrics for \code{evals} that were supplied, either
|
||||
|
||||
@ -57,20 +57,20 @@ was constructed.
|
||||
|
||||
Other column types are not supported.
|
||||
\item CSR matrices, as class \code{dgRMatrix} from package \code{Matrix}.
|
||||
\item CSC matrices, as class \code{dgCMatrix} from package \code{Matrix}. These are \bold{not} supported for
|
||||
\item CSC matrices, as class \code{dgCMatrix} from package \code{Matrix}. These are \strong{not} supported for
|
||||
'xgb.QuantileDMatrix'.
|
||||
\item Single-row CSR matrices, as class \code{dsparseVector} from package \code{Matrix}, which is interpreted
|
||||
as a single row (only when making predictions from a fitted model).
|
||||
\item Text files in a supported format, passed as a \code{character} variable containing the URI path to
|
||||
the file, with an optional format specifier.
|
||||
|
||||
These are \bold{not} supported for \code{xgb.QuantileDMatrix}. Supported formats are:\itemize{
|
||||
\item XGBoost's own binary format for DMatrices, as produced by \link{xgb.DMatrix.save}.
|
||||
These are \strong{not} supported for \code{xgb.QuantileDMatrix}. Supported formats are:\itemize{
|
||||
\item XGBoost's own binary format for DMatrices, as produced by \code{\link[=xgb.DMatrix.save]{xgb.DMatrix.save()}}.
|
||||
\item SVMLight (a.k.a. LibSVM) format for CSR matrices. This format can be signaled by suffix
|
||||
\code{?format=libsvm} at the end of the file path. It will be the default format if not
|
||||
otherwise specified.
|
||||
\item CSV files (comma-separated values). This format can be specified by adding suffix
|
||||
\code{?format=csv} at the end of the file path. It will \bold{not} be auto-deduced from file extensions.
|
||||
\code{?format=csv} at the end of the file path. It will \strong{not} be auto-deduced from file extensions.
|
||||
}
|
||||
|
||||
Be aware that the format of the file will not be auto-deduced - for example, if a file is named 'file.csv',
|
||||
@ -99,25 +99,24 @@ so it doesn't make sense to assign weights to individual data points.}
|
||||
In the case of multi-output models, one can also pass multi-dimensional base_margin.}
|
||||
|
||||
\item{missing}{A float value that represents missing values in data (not used when creating DMatrix
|
||||
from text files).
|
||||
It is useful to change when a zero, infinite, or some other extreme value represents missing
|
||||
values in data.}
|
||||
from text files). It is useful to change when a zero, infinite, or some other
|
||||
extreme value represents missing values in data.}
|
||||
|
||||
\item{silent}{whether to suppress printing an informational message after loading from a file.}
|
||||
|
||||
\item{feature_names}{Set names for features. Overrides column names in data
|
||||
frame and matrix.
|
||||
\item{feature_names}{Set names for features. Overrides column names in data frame and matrix.
|
||||
|
||||
Note: columns are not referenced by name when calling \code{predict}, so the column order there
|
||||
must be the same as in the DMatrix construction, regardless of the column names.}
|
||||
|
||||
\item{feature_types}{Set types for features.
|
||||
|
||||
If \code{data} is a \code{data.frame} and \code{feature_types} is not supplied, feature types will be deduced
|
||||
automatically from the column types.
|
||||
If \code{data} is a \code{data.frame} and \code{feature_types} is not supplied,
|
||||
feature types will be deduced automatically from the column types.
|
||||
|
||||
Otherwise, one can pass a character vector with the same length as number of columns in \code{data},
|
||||
with the following possible values:\itemize{
|
||||
with the following possible values:
|
||||
\itemize{
|
||||
\item "c", which represents categorical columns.
|
||||
\item "q", which represents numeric columns.
|
||||
\item "int", which represents integer columns.
|
||||
@ -128,9 +127,9 @@ Note that, while categorical types are treated differently from the rest for mod
|
||||
purposes, the other types do not influence the generated model, but have effects in other
|
||||
functionalities such as feature importances.
|
||||
|
||||
\bold{Important}: categorical features, if specified manually through \code{feature_types}, must
|
||||
\strong{Important}: Categorical features, if specified manually through \code{feature_types}, must
|
||||
be encoded as integers with numeration starting at zero, and the same encoding needs to be
|
||||
applied when passing data to \code{predict}. Even if passing \code{factor} types, the encoding will
|
||||
applied when passing data to \code{\link[=predict]{predict()}}. Even if passing \code{factor} types, the encoding will
|
||||
not be saved, so make sure that \code{factor} columns passed to \code{predict} have the same \code{levels}.}
|
||||
|
||||
\item{nthread}{Number of threads used for creating DMatrix.}
|
||||
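The **Important** note on `feature_types` above asks for zero-based integer codes and for identical `levels` at prediction time. A minimal base R sketch of both points follows; the variable names are hypothetical, and the manual encoding is only needed when categorical columns are declared by hand through `feature_types`.

```r
# Training-time categorical column (levels are sorted: blue, green, red).
train_color <- factor(c("red", "green", "blue", "green"))

# New data arrives as plain strings and does not contain every category.
new_color_raw <- c("green", "red")

# Reuse the training levels so each category keeps the same code.
new_color <- factor(new_color_raw, levels = levels(train_color))

# as.integer() on a factor is one-based, so subtract 1 to start at zero.
train_codes <- as.integer(train_color) - 1L
new_codes <- as.integer(new_color) - 1L

train_codes
new_codes
```

Calling `factor(new_color_raw)` without `levels` would renumber the categories from the new data alone, so "green" would receive a different code than it had during training.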
@ -154,7 +153,7 @@ how the file was split beforehand. Default to row.
|
||||
This is not used when \code{data} is not a URI.}
|
||||
|
||||
\item{ref}{The training dataset that provides quantile information, needed when creating
|
||||
validation/test dataset with \code{xgb.QuantileDMatrix}. Supplying the training DMatrix
|
||||
validation/test dataset with \code{\link[=xgb.QuantileDMatrix]{xgb.QuantileDMatrix()}}. Supplying the training DMatrix
|
||||
as a reference means that the same quantisation applied to the training data is
|
||||
applied to the validation/test data.}
|
||||
|
||||
@ -169,23 +168,24 @@ subclass 'xgb.QuantileDMatrix'.
|
||||
}
|
||||
\description{
|
||||
Construct an 'xgb.DMatrix' object from a given data source, which can then be passed to functions
|
||||
such as \link{xgb.train} or \link{predict.xgb.Booster}.
|
||||
such as \code{\link[=xgb.train]{xgb.train()}} or \code{\link[=predict]{predict()}}.
|
||||
}
|
||||
\details{
|
||||
Function 'xgb.QuantileDMatrix' will construct a DMatrix with quantization for the histogram
|
||||
Function \code{xgb.QuantileDMatrix()} will construct a DMatrix with quantization for the histogram
|
||||
method already applied to it, which can be used to reduce memory usage (compared to using a
|
||||
regular DMatrix first and then creating a quantization out of it) when using the histogram
|
||||
method (\code{tree_method = "hist"}, which is the default algorithm), but is not usable for the
|
||||
sorted-indices method (\code{tree_method = "exact"}), nor for the approximate method
|
||||
(\code{tree_method = "approx"}).
|
||||
|
||||
Note that DMatrix objects are not serializable through R functions such as \code{saveRDS} or \code{save}.
|
||||
Note that DMatrix objects are not serializable through R functions such as \code{\link[=saveRDS]{saveRDS()}} or \code{\link[=save]{save()}}.
|
||||
If a DMatrix gets serialized and then de-serialized (for example, when saving data in an R session or caching
|
||||
chunks in an Rmd file), the resulting object will not be usable anymore and will need to be reconstructed
|
||||
from the original source of data.
|
||||
}
|
||||
\examples{
|
||||
data(agaricus.train, package='xgboost')
|
||||
data(agaricus.train, package = "xgboost")
|
||||
|
||||
## Keep the number of threads to 1 for examples
|
||||
nthread <- 1
|
||||
data.table::setDTthreads(nthread)
|
||||
|
||||
@ -16,11 +16,10 @@ Checks whether an xgb.DMatrix object has a given field assigned to
|
||||
it, such as weights, labels, etc.
|
||||
}
|
||||
\examples{
|
||||
library(xgboost)
|
||||
x <- matrix(1:10, nrow = 5)
|
||||
dm <- xgb.DMatrix(x, nthread = 1)
|
||||
|
||||
# 'dm' so far doesn't have any fields set
|
||||
# 'dm' so far does not have any fields set
|
||||
xgb.DMatrix.hasinfo(dm, "label")
|
||||
|
||||
# Fields can be added after construction
|
||||
@ -28,5 +27,5 @@ setinfo(dm, "label", 1:5)
|
||||
xgb.DMatrix.hasinfo(dm, "label")
|
||||
}
|
||||
\seealso{
|
||||
\link{xgb.DMatrix}, \link{getinfo.xgb.DMatrix}, \link{setinfo.xgb.DMatrix}
|
||||
\code{\link[=xgb.DMatrix]{xgb.DMatrix()}}, \code{\link[=getinfo.xgb.DMatrix]{getinfo.xgb.DMatrix()}}, \code{\link[=setinfo.xgb.DMatrix]{setinfo.xgb.DMatrix()}}
|
||||
}
|
||||
|
||||
@ -16,7 +16,8 @@ Save xgb.DMatrix object to binary file
|
||||
}
|
||||
\examples{
|
||||
\dontshow{RhpcBLASctl::omp_set_num_threads(1)}
|
||||
data(agaricus.train, package='xgboost')
|
||||
data(agaricus.train, package = "xgboost")
|
||||
|
||||
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||
fname <- file.path(tempdir(), "xgb.DMatrix.data")
|
||||
xgb.DMatrix.save(dtrain, fname)
|
||||
|
||||
@ -21,16 +21,17 @@ xgb.DataBatch(
|
||||
\arguments{
|
||||
\item{data}{Batch of data belonging to this batch.
|
||||
|
||||
Note that not all of the input types supported by \link{xgb.DMatrix} are possible
|
||||
to pass here. Supported types are:\itemize{
|
||||
Note that not all of the input types supported by \code{\link[=xgb.DMatrix]{xgb.DMatrix()}} are possible
|
||||
to pass here. Supported types are:
|
||||
\itemize{
|
||||
\item \code{matrix}, with types \code{numeric}, \code{integer}, and \code{logical}. Note that for types
|
||||
\code{integer} and \code{logical}, missing values might not be automatically recognized
|
||||
as such - see the documentation for parameter \code{missing} in \link{xgb.ExternalDMatrix}
|
||||
as such - see the documentation for parameter \code{missing} in \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}}
|
||||
for details on this.
|
||||
\item \code{data.frame}, with the same types as supported by 'xgb.DMatrix' and same
|
||||
conversions applied to it. See the documentation for parameter \code{data} in
|
||||
\link{xgb.DMatrix} for details on it.
|
||||
\item CSR matrices, as class \code{dgRMatrix} from package \code{Matrix}.
|
||||
\code{\link[=xgb.DMatrix]{xgb.DMatrix()}} for details on it.
|
||||
\item CSR matrices, as class \code{dgRMatrix} from package "Matrix".
|
||||
}}
|
||||
|
||||
\item{label}{Label of the training data. For classification problems, should be passed encoded as
|
||||
@ -47,19 +48,19 @@ so it doesn't make sense to assign weights to individual data points.}
|
||||
|
||||
In the case of multi-output models, one can also pass multi-dimensional base_margin.}
|
||||
|
||||
\item{feature_names}{Set names for features. Overrides column names in data
|
||||
frame and matrix.
|
||||
\item{feature_names}{Set names for features. Overrides column names in data frame and matrix.
|
||||
|
||||
Note: columns are not referenced by name when calling \code{predict}, so the column order there
|
||||
must be the same as in the DMatrix construction, regardless of the column names.}
|
||||
|
||||
\item{feature_types}{Set types for features.
|
||||
|
||||
If \code{data} is a \code{data.frame} and \code{feature_types} is not supplied, feature types will be deduced
|
||||
automatically from the column types.
|
||||
If \code{data} is a \code{data.frame} and \code{feature_types} is not supplied,
|
||||
feature types will be deduced automatically from the column types.
|
||||
|
||||
Otherwise, one can pass a character vector with the same length as number of columns in \code{data},
|
||||
with the following possible values:\itemize{
|
||||
with the following possible values:
|
||||
\itemize{
|
||||
\item "c", which represents categorical columns.
|
||||
\item "q", which represents numeric columns.
|
||||
\item "int", which represents integer columns.
|
||||
@ -70,9 +71,9 @@ Note that, while categorical types are treated differently from the rest for mod
|
||||
purposes, the other types do not influence the generated model, but have effects in other
|
||||
functionalities such as feature importances.
|
||||
|
||||
\bold{Important}: categorical features, if specified manually through \code{feature_types}, must
|
||||
\strong{Important}: Categorical features, if specified manually through \code{feature_types}, must
|
||||
be encoded as integers with numeration starting at zero, and the same encoding needs to be
|
||||
applied when passing data to \code{predict}. Even if passing \code{factor} types, the encoding will
|
||||
applied when passing data to \code{\link[=predict]{predict()}}. Even if passing \code{factor} types, the encoding will
|
||||
not be saved, so make sure that \code{factor} columns passed to \code{predict} have the same \code{levels}.}
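The encoding requirement above can be illustrated with a minimal base R sketch. The column values and the level set `lvls` are made up for illustration; the only point is that fixing the levels up front yields the same zero-based integer codes for the training data and for the data later passed to `predict`.

```r
# Hypothetical level set, fixed once and reused for every data set.
lvls <- c("low", "mid", "high")

# Training column and a later prediction column, both built with the same levels.
train_col <- factor(c("mid", "low", "high"), levels = lvls)
new_col <- factor(c("high", "mid"), levels = lvls)

# R factor codes start at 1, so subtract 1 to obtain codes starting at zero.
train_codes <- as.integer(train_col) - 1L
new_codes <- as.integer(new_col) - 1L
```

If the levels were instead inferred separately from each column, the same category could receive different codes in the two data sets.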
|
||||
|
||||
\item{group}{Group sizes for all ranking groups.}
|
||||
@ -87,24 +88,24 @@ not be saved, so make sure that \code{factor} columns passed to \code{predict} h
|
||||
}
|
||||
\value{
|
||||
An object of class \code{xgb.DataBatch}, which is just a list containing the
|
||||
data and parameters passed here. It does \bold{not} inherit from \code{xgb.DMatrix}.
|
||||
data and parameters passed here. It does \strong{not} inherit from \code{xgb.DMatrix}.
|
||||
}
|
||||
\description{
|
||||
Helper function to supply data in batches of a data iterator when
|
||||
constructing a DMatrix from external memory through \link{xgb.ExternalDMatrix}
|
||||
or through \link{xgb.QuantileDMatrix.from_iterator}.
|
||||
constructing a DMatrix from external memory through \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}}
|
||||
or through \code{\link[=xgb.QuantileDMatrix.from_iterator]{xgb.QuantileDMatrix.from_iterator()}}.
|
||||
|
||||
This function is \bold{only} meant to be called inside of a callback function (which
|
||||
is passed as argument to function \link{xgb.DataIter} to construct a data iterator)
|
||||
This function is \strong{only} meant to be called inside of a callback function (which
|
||||
is passed as argument to function \code{\link[=xgb.DataIter]{xgb.DataIter()}} to construct a data iterator)
|
||||
when constructing a DMatrix through external memory - otherwise, one should call
|
||||
\link{xgb.DMatrix} or \link{xgb.QuantileDMatrix}.
|
||||
\code{\link[=xgb.DMatrix]{xgb.DMatrix()}} or \code{\link[=xgb.QuantileDMatrix]{xgb.QuantileDMatrix()}}.
|
||||
|
||||
The object that results from calling this function directly is \bold{not} like
|
||||
The object that results from calling this function directly is \strong{not} like
|
||||
an \code{xgb.DMatrix} - i.e. cannot be used to train a model, nor to get predictions - only
|
||||
possible usage is to supply data to an iterator, from which a DMatrix is then constructed.
|
||||
|
||||
For more information and for example usage, see the documentation for \link{xgb.ExternalDMatrix}.
|
||||
For more information and for example usage, see the documentation for \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}}.
|
||||
}
|
||||
\seealso{
|
||||
\link{xgb.DataIter}, \link{xgb.ExternalDMatrix}.
|
||||
\code{\link[=xgb.DataIter]{xgb.DataIter()}}, \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}}.
|
||||
}
|
||||
|
||||
@ -13,14 +13,15 @@ used to keep track of variables to determine how to handle the batches.
|
||||
For example, one might want to keep track of an iteration number in this environment in order
|
||||
to know which part of the data to pass next.}
|
||||
|
||||
\item{f_next}{\verb{function(env)} which is responsible for:\itemize{
|
||||
\item{f_next}{\verb{function(env)} which is responsible for:
|
||||
\itemize{
|
||||
\item Accessing or retrieving the next batch of data in the iterator.
|
||||
\item Supplying this data by calling function \link{xgb.DataBatch} on it and returning the result.
|
||||
\item Supplying this data by calling function \code{\link[=xgb.DataBatch]{xgb.DataBatch()}} on it and returning the result.
|
||||
\item Keeping track of where in the iterator batch it is or will go next, which can for example
|
||||
be done by modifying variables in the \code{env} variable that is passed here.
|
||||
\item Signaling whether there are more batches to be consumed or not, by returning \code{NULL}
|
||||
when the stream of data ends (all batches in the iterator have been consumed), or the result from
|
||||
calling \link{xgb.DataBatch} when there are more batches in the line to be consumed.
|
||||
calling \code{\link[=xgb.DataBatch]{xgb.DataBatch()}} when there are more batches in the line to be consumed.
|
||||
}}
|
||||
|
||||
\item{f_reset}{\verb{function(env)} which is responsible for resetting the data iterator
|
||||
@ -32,7 +33,7 @@ Note that, after resetting the iterator, the batches will be accessed again, so
|
||||
}
|
||||
\value{
|
||||
An \code{xgb.DataIter} object, containing the same inputs supplied here, which can then
|
||||
be passed to \link{xgb.ExternalDMatrix}.
|
||||
be passed to \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}}.
|
||||
}
|
||||
\description{
|
||||
Interface to create a custom data iterator in order to construct a DMatrix
|
||||
@ -41,11 +42,11 @@ from external memory.
|
||||
This function is responsible for generating an R object structure containing callback
|
||||
functions and an environment shared with them.
|
||||
|
||||
The output structure from this function is then meant to be passed to \link{xgb.ExternalDMatrix},
|
||||
The output structure from this function is then meant to be passed to \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}},
|
||||
which will consume the data and create a DMatrix from it by executing the callback functions.
|
||||
|
||||
For more information, and for a usage example, see the documentation for \link{xgb.ExternalDMatrix}.
|
||||
For more information, and for a usage example, see the documentation for \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}}.
|
||||
}
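The callback contract described for `f_next` and `f_reset` can be sketched in base R as follows. This is only an illustration under stated assumptions: `make_batch()` is a hypothetical stand-in for `xgb.DataBatch()` so that the sketch runs without the package, and `consume()` is a made-up loop that mimics how a consumer might drain the iterator; neither is part of XGBoost.

```r
# Hypothetical stand-in for xgb.DataBatch(): just bundles the inputs in a list.
make_batch <- function(data, label) list(data = data, label = label)

# Toy data, split into two batches of three rows each.
x <- matrix(as.numeric(1:12), nrow = 6)
y <- as.numeric(1:6)

# Shared environment that tracks the position in the stream of batches.
iterator_env <- as.environment(list(
  iter = 0L,
  rows = list(1:3, 4:6)
))

# Returns the next batch, or NULL once all batches have been consumed.
f_next <- function(env) {
  if (env$iter >= length(env$rows)) {
    return(NULL)
  }
  env$iter <- env$iter + 1L
  idx <- env$rows[[env$iter]]
  make_batch(x[idx, , drop = FALSE], y[idx])
}

# Rewinds the iterator so that the batches can be accessed again.
f_reset <- function(env) {
  env$iter <- 0L
  invisible(NULL)
}

# Made-up consumer: drains the iterator and counts the rows it saw.
consume <- function(env) {
  n <- 0L
  while (!is.null(batch <- f_next(env))) {
    n <- n + nrow(batch$data)
  }
  n
}

first_pass <- consume(iterator_env)
f_reset(iterator_env)
second_pass <- consume(iterator_env)
```

Because environments have reference semantics in R, the updates to `env$iter` made inside the callbacks persist between calls, which is what allows the position to be tracked.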
|
||||
\seealso{
|
||||
\link{xgb.ExternalDMatrix}, \link{xgb.DataBatch}.
|
||||
\code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}}, \code{\link[=xgb.DataBatch]{xgb.DataBatch()}}.
|
||||
}
|
||||
|
||||
@ -12,7 +12,7 @@ xgb.ExternalDMatrix(
|
||||
)
|
||||
}
|
||||
\arguments{
|
||||
\item{data_iterator}{A data iterator structure as returned by \link{xgb.DataIter},
|
||||
\item{data_iterator}{A data iterator structure as returned by \code{\link[=xgb.DataIter]{xgb.DataIter()}},
|
||||
which includes an environment shared between function calls, and functions to access
|
||||
the data in batches on-demand.}
|
||||
|
||||
@ -20,14 +20,14 @@ the data in batches on-demand.}
|
||||
|
||||
\item{missing}{A float value to represent missing values in data.
|
||||
|
||||
Note that, while functions like \link{xgb.DMatrix} can take a generic \code{NA} and interpret it
|
||||
Note that, while functions like \code{\link[=xgb.DMatrix]{xgb.DMatrix()}} can take a generic \code{NA} and interpret it
|
||||
correctly for different types like \code{numeric} and \code{integer}, if an \code{NA} value is passed here,
|
||||
it will not be adapted for different input types.
|
||||
|
||||
For example, in R \code{integer} types, missing values are represented by integer number \code{-2147483648}
|
||||
(since machine 'integer' types do not have an inherent 'NA' value) - hence, if one passes \code{NA},
|
||||
which is interpreted as a floating-point NaN by 'xgb.ExternalDMatrix' and by
|
||||
'xgb.QuantileDMatrix.from_iterator', these integer missing values will not be treated as missing.
|
||||
which is interpreted as a floating-point NaN by \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}} and by
|
||||
\code{\link[=xgb.QuantileDMatrix.from_iterator]{xgb.QuantileDMatrix.from_iterator()}}, these integer missing values will not be treated as missing.
|
||||
This should not pose any problem for \code{numeric} types, since they do have an inherent NaN value.}
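The difference between integer and numeric missing values can be seen in a short base R sketch. Converting integer batches to double before supplying them is shown here only as one possible workaround, which is an assumption and not something the documentation above prescribes; the matrix values are made up.

```r
# Integer matrix with one missing entry.
m_int <- matrix(c(1L, NA_integer_, 3L, 4L), nrow = 2)

# The smallest machine integer is reserved for NA in R, so reaching it
# through arithmetic overflows to NA (with a warning, suppressed here).
int_min <- suppressWarnings(-.Machine$integer.max - 1L)

# Converting the storage mode to double turns integer NA into a double NA,
# which is a NaN at the floating-point level.
m_dbl <- m_int
storage.mode(m_dbl) <- "double"
```

After the conversion, the missing entry is in the same position but is stored as a floating-point value rather than as the integer sentinel.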
|
||||
|
||||
\item{nthread}{Number of threads used for creating DMatrix.}
|
||||
@ -37,23 +37,22 @@ An 'xgb.DMatrix' object, with subclass 'xgb.ExternalDMatrix', in which the data
|
||||
held internally but accessed through the iterator when needed.
|
||||
}
|
||||
\description{
|
||||
Create a special type of xgboost 'DMatrix' object from external data
|
||||
supplied by an \link{xgb.DataIter} object, potentially passed in batches from a
|
||||
Create a special type of XGBoost 'DMatrix' object from external data
|
||||
supplied by an \code{\link[=xgb.DataIter]{xgb.DataIter()}} object, potentially passed in batches from a
|
||||
bigger set that might not fit entirely in memory.
|
||||
|
||||
The data supplied by the iterator is accessed on-demand as needed, multiple times,
|
||||
without being concatenated, but note that fields like 'label' \bold{will} be
|
||||
without being concatenated, but note that fields like 'label' \strong{will} be
|
||||
concatenated from multiple calls to the data iterator.
|
||||
|
||||
For more information, see the guide 'Using XGBoost External Memory Version':
|
||||
\url{https://xgboost.readthedocs.io/en/stable/tutorials/external_memory.html}
|
||||
}
|
||||
\examples{
|
||||
library(xgboost)
|
||||
data(mtcars)
|
||||
|
||||
# this custom environment will be passed to the iterator
|
||||
# functions at each call. It's up to the user to keep
|
||||
# This custom environment will be passed to the iterator
|
||||
# functions at each call. It is up to the user to keep
|
||||
# track of the iteration number in this environment.
|
||||
iterator_env <- as.environment(
|
||||
list(
|
||||
@ -118,5 +117,5 @@ pred_dm <- predict(model, dm)
|
||||
pred_mat <- predict(model, as.matrix(mtcars[, -1]))
|
||||
}
|
||||
\seealso{
|
||||
\link{xgb.DataIter}, \link{xgb.DataBatch}, \link{xgb.QuantileDMatrix.from_iterator}
|
||||
\code{\link[=xgb.DataIter]{xgb.DataIter()}}, \code{\link[=xgb.DataBatch]{xgb.DataBatch()}}, \code{\link[=xgb.QuantileDMatrix.from_iterator]{xgb.QuantileDMatrix.from_iterator()}}
|
||||
}
|
||||
|
||||
@ -13,26 +13,26 @@ xgb.QuantileDMatrix.from_iterator(
|
||||
)
|
||||
}
|
||||
\arguments{
|
||||
\item{data_iterator}{A data iterator structure as returned by \link{xgb.DataIter},
|
||||
\item{data_iterator}{A data iterator structure as returned by \code{\link[=xgb.DataIter]{xgb.DataIter()}},
|
||||
which includes an environment shared between function calls, and functions to access
|
||||
the data in batches on-demand.}
|
||||
|
||||
\item{missing}{A float value to represent missing values in data.
|
||||
|
||||
Note that, while functions like \link{xgb.DMatrix} can take a generic \code{NA} and interpret it
|
||||
Note that, while functions like \code{\link[=xgb.DMatrix]{xgb.DMatrix()}} can take a generic \code{NA} and interpret it
|
||||
correctly for different types like \code{numeric} and \code{integer}, if an \code{NA} value is passed here,
|
||||
it will not be adapted for different input types.
|
||||
|
||||
For example, in R \code{integer} types, missing values are represented by integer number \code{-2147483648}
|
||||
(since machine 'integer' types do not have an inherent 'NA' value) - hence, if one passes \code{NA},
|
||||
which is interpreted as a floating-point NaN by 'xgb.ExternalDMatrix' and by
|
||||
'xgb.QuantileDMatrix.from_iterator', these integer missing values will not be treated as missing.
|
||||
which is interpreted as a floating-point NaN by \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}} and by
|
||||
\code{\link[=xgb.QuantileDMatrix.from_iterator]{xgb.QuantileDMatrix.from_iterator()}}, these integer missing values will not be treated as missing.
|
||||
This should not pose any problem for \code{numeric} types, since they do have an inherent NaN value.}
|
||||
|
||||
\item{nthread}{Number of threads used for creating DMatrix.}
|
||||
|
||||
\item{ref}{The training dataset that provides quantile information, needed when creating
|
||||
validation/test dataset with \code{xgb.QuantileDMatrix}. Supplying the training DMatrix
|
||||
validation/test dataset with \code{\link[=xgb.QuantileDMatrix]{xgb.QuantileDMatrix()}}. Supplying the training DMatrix
|
||||
as a reference means that the same quantisation applied to the training data is
|
||||
applied to the validation/test data}
|
||||
|
||||
@ -46,20 +46,20 @@ An 'xgb.DMatrix' object, with subclass 'xgb.QuantileDMatrix'.
|
||||
}
|
||||
\description{
|
||||
Create an \code{xgb.QuantileDMatrix} object (exact same class as would be returned by
|
||||
calling function \link{xgb.QuantileDMatrix}, with the same advantages and limitations) from
|
||||
external data supplied by an \link{xgb.DataIter} object, potentially passed in batches from
|
||||
a bigger set that might not fit entirely in memory, same way as \link{xgb.ExternalDMatrix}.
|
||||
calling function \code{\link[=xgb.QuantileDMatrix]{xgb.QuantileDMatrix()}}, with the same advantages and limitations) from
|
||||
external data supplied by \code{\link[=xgb.DataIter]{xgb.DataIter()}}, potentially passed in batches from
|
||||
a bigger set that might not fit entirely in memory, same way as \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}}.
|
||||
|
||||
Note that, while external data will only be loaded through the iterator (thus the full data
|
||||
might not be held entirely in-memory), the quantized representation of the data will get
|
||||
created in-memory, being concatenated from multiple calls to the data iterator. The quantized
|
||||
version is typically lighter than the original data, so there might be cases in which this
|
||||
representation could potentially fit in memory even if the full data doesn't.
|
||||
representation could potentially fit in memory even if the full data does not.
|
||||
|
||||
For more information, see the guide 'Using XGBoost External Memory Version':
|
||||
\url{https://xgboost.readthedocs.io/en/stable/tutorials/external_memory.html}
|
||||
}
|
||||
\seealso{
|
||||
\link{xgb.DataIter}, \link{xgb.DataBatch}, \link{xgb.ExternalDMatrix},
|
||||
\link{xgb.QuantileDMatrix}
|
||||
\code{\link[=xgb.DataIter]{xgb.DataIter()}}, \code{\link[=xgb.DataBatch]{xgb.DataBatch()}}, \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}},
|
||||
\code{\link[=xgb.QuantileDMatrix]{xgb.QuantileDMatrix()}}
|
||||
}
|
||||
|
||||
@ -51,7 +51,7 @@ Also, setting an attribute that has the same name as one of XGBoost's parameters
|
||||
change the value of that parameter for a model.
|
||||
Use \code{\link[=xgb.parameters<-]{xgb.parameters<-()}} to set or change model parameters.
|
||||
|
||||
The \code{\link[=xgb.attributes<-]{xgb.attributes<-()}} setter either updates the existing or adds one or several attributes,
|
||||
The \verb{xgb.attributes<-} setter either updates the existing or adds one or several attributes,
|
||||
but it doesn't delete the other existing attributes.
|
||||
|
||||
Important: since this modifies the booster's C object, semantics for assignment here
|
||||
|
||||
@ -26,141 +26,136 @@ xgb.cv(
|
||||
)
|
||||
}
|
||||
\arguments{
|
||||
\item{params}{the list of parameters. The complete list of parameters is
|
||||
available in the \href{http://xgboost.readthedocs.io/en/latest/parameter.html}{online documentation}. Below
|
||||
is a shorter summary:
|
||||
\item{params}{The list of parameters. The complete list of parameters is available in the
|
||||
\href{http://xgboost.readthedocs.io/en/latest/parameter.html}{online documentation}.
|
||||
Below is a shorter summary:
|
||||
\itemize{
|
||||
\item \code{objective} objective function, common ones are
|
||||
\item \code{objective}: Objective function, common ones are
|
||||
\itemize{
|
||||
\item \code{reg:squarederror} Regression with squared loss.
|
||||
\item \code{binary:logistic} logistic regression for classification.
|
||||
\item See \code{\link[=xgb.train]{xgb.train}()} for complete list of objectives.
|
||||
}
|
||||
\item \code{eta} step size of each boosting step
|
||||
\item \code{max_depth} maximum depth of the tree
|
||||
\item \code{nthread} number of thread used in training, if not set, all threads are used
|
||||
\item \code{reg:squarederror}: Regression with squared loss.
|
||||
\item \code{binary:logistic}: Logistic regression for classification.
|
||||
}
|
||||
|
||||
See \code{\link{xgb.train}} for further details.
|
||||
See also demo/ for walkthrough example in R.
|
||||
See \code{\link[=xgb.train]{xgb.train()}} for complete list of objectives.
|
||||
\item \code{eta}: Step size of each boosting step
|
||||
\item \code{max_depth}: Maximum depth of the tree
|
||||
\item \code{nthread}: Number of threads used in training. If not set, all threads are used
|
||||
}
|
||||
|
||||
See \code{\link[=xgb.train]{xgb.train()}} for further details.
|
||||
See also demo for walkthrough example in R.
|
||||
|
||||
Note that, while \code{params} accepts a \code{seed} entry and will use such parameter for model training if
|
||||
supplied, this seed is not used for creation of train-test splits, which instead rely on R's own RNG
|
||||
system - thus, for reproducible results, one needs to call the \code{set.seed} function beforehand.}
|
||||
system - thus, for reproducible results, one needs to call the \code{\link[=set.seed]{set.seed()}} function beforehand.}
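The effect of R's RNG state on fold creation can be sketched in base R. The helper `make_folds()` below is hypothetical and is not how `xgb.cv` builds its folds internally; it only shows that calling `set.seed()` before a random partition makes the partition repeatable.

```r
# Hypothetical helper: randomly partitions the indices 1..n into nfold groups.
make_folds <- function(n, nfold) {
  idx <- sample.int(n)
  split(idx, rep_len(seq_len(nfold), n))
}

# The same seed before each call gives the same partition.
set.seed(123)
folds_a <- make_folds(20, 5)
set.seed(123)
folds_b <- make_folds(20, 5)

same_folds <- identical(folds_a, folds_b)
```

A list of test-index vectors of this shape is also the form that the `folds` argument expects, so pre-computed folds are another way to make the splits reproducible.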
|
||||
|
||||
\item{data}{An \code{xgb.DMatrix} object, with corresponding fields like \code{label} or bounds as required
|
||||
for model training by the objective.
|
||||
|
||||
\if{html}{\out{<div class="sourceCode">}}\preformatted{ Note that only the basic `xgb.DMatrix` class is supported - variants such as `xgb.QuantileDMatrix`
|
||||
or `xgb.ExternalDMatrix` are not supported here.
|
||||
}\if{html}{\out{</div>}}}
|
||||
Note that only the basic \code{xgb.DMatrix} class is supported - variants such as \code{xgb.QuantileDMatrix}
|
||||
or \code{xgb.ExternalDMatrix} are not supported here.}
|
||||
|
||||
\item{nrounds}{the max number of iterations}
|
||||
\item{nrounds}{The max number of iterations.}
|
||||
|
||||
\item{nfold}{the original dataset is randomly partitioned into \code{nfold} equal size subsamples.}
|
||||
\item{nfold}{The original dataset is randomly partitioned into \code{nfold} equal size subsamples.}
|
||||
|
||||
\item{prediction}{A logical value indicating whether to return the test fold predictions
|
||||
from each CV model. This parameter engages the \code{\link{xgb.cb.cv.predict}} callback.}
|
||||
from each CV model. This parameter engages the \code{\link[=xgb.cb.cv.predict]{xgb.cb.cv.predict()}} callback.}
|
||||
|
||||
\item{showsd}{\code{boolean}, whether to show standard deviation of cross validation}
|
||||
\item{showsd}{Logical value indicating whether to show the standard deviation of cross validation.}
|
||||
|
||||
\item{metrics, }{list of evaluation metrics to be used in cross validation,
|
||||
\item{metrics}{List of evaluation metrics to be used in cross validation,
|
||||
when it is not specified, the evaluation metric is chosen according to objective function.
|
||||
Possible options are:
|
||||
\itemize{
|
||||
\item \code{error} binary classification error rate
|
||||
\item \code{rmse} Rooted mean square error
|
||||
\item \code{logloss} negative log-likelihood function
|
||||
\item \code{mae} Mean absolute error
|
||||
\item \code{mape} Mean absolute percentage error
|
||||
\item \code{auc} Area under curve
|
||||
\item \code{aucpr} Area under PR curve
|
||||
\item \code{merror} Exact matching error, used to evaluate multi-class classification
|
||||
\item \code{error}: Binary classification error rate
|
||||
\item \code{rmse}: Root mean square error
|
||||
\item \code{logloss}: Negative log-likelihood function
|
||||
\item \code{mae}: Mean absolute error
|
||||
\item \code{mape}: Mean absolute percentage error
|
||||
\item \code{auc}: Area under curve
|
||||
\item \code{aucpr}: Area under PR curve
|
||||
\item \code{merror}: Exact matching error used to evaluate multi-class classification
|
||||
}}
|
||||
|
||||
\item{obj}{customized objective function. Returns gradient and second order
|
||||
\item{obj}{Customized objective function. Returns gradient and second order
|
||||
gradient with given prediction and dtrain.}
|
||||
|
||||
\item{feval}{customized evaluation function. Returns
|
||||
\code{list(metric='metric-name', value='metric-value')} with given
|
||||
prediction and dtrain.}
|
||||
\item{feval}{Customized evaluation function. Returns
|
||||
\code{list(metric='metric-name', value='metric-value')} with given prediction and dtrain.}
|
||||
|
||||
\item{stratified}{A \code{boolean} indicating whether sampling of folds should be stratified
|
||||
\item{stratified}{Logical flag indicating whether sampling of folds should be stratified
|
||||
by the values of outcome labels. For real-valued labels in regression objectives,
|
||||
stratification will be done by discretizing the labels into up to 5 buckets beforehand.
|
||||
|
||||
\if{html}{\out{<div class="sourceCode">}}\preformatted{ If passing "auto", will be set to `TRUE` if the objective in `params` is a classification
|
||||
objective (from XGBoost's built-in objectives, doesn't apply to custom ones), and to
|
||||
`FALSE` otherwise.
|
||||
If passing "auto", will be set to \code{TRUE} if the objective in \code{params} is a classification
|
||||
objective (from XGBoost's built-in objectives, doesn't apply to custom ones), and to
|
||||
\code{FALSE} otherwise.
|
||||
|
||||
This parameter is ignored when `data` has a `group` field - in such case, the splitting
|
||||
will be based on whole groups (note that this might make the folds have different sizes).
|
||||
This parameter is ignored when \code{data} has a \code{group} field - in such case, the splitting
|
||||
will be based on whole groups (note that this might make the folds have different sizes).
|
||||
|
||||
Value `TRUE` here is \\bold\{not\} supported for custom objectives.
|
||||
}\if{html}{\out{</div>}}}
|
||||
Value \code{TRUE} here is \strong{not} supported for custom objectives.}
|
||||
|
||||
\item{folds}{\code{list} provides a possibility to use a list of pre-defined CV folds
|
||||
(each element must be a vector of test fold's indices). When folds are supplied,
|
||||
the \code{nfold} and \code{stratified} parameters are ignored.
|
||||
\item{folds}{List with pre-defined CV folds (each element must be a vector of test fold's indices).
|
||||
When folds are supplied, the \code{nfold} and \code{stratified} parameters are ignored.
|
||||
|
||||
\if{html}{\out{<div class="sourceCode">}}\preformatted{ If `data` has a `group` field and the objective requires this field, each fold (list element)
|
||||
must additionally have two attributes (retrievable through \link{attributes}) named `group_test`
|
||||
and `group_train`, which should hold the `group` to assign through \link{setinfo.xgb.DMatrix} to
|
||||
the resulting DMatrices.
|
||||
}\if{html}{\out{</div>}}}
|
||||
If \code{data} has a \code{group} field and the objective requires this field, each fold (list element)
|
||||
must additionally have two attributes (retrievable through \code{attributes}) named \code{group_test}
|
||||
and \code{group_train}, which should hold the \code{group} to assign through \code{\link[=setinfo.xgb.DMatrix]{setinfo.xgb.DMatrix()}} to
|
||||
the resulting DMatrices.}
|
||||
|
||||
\item{train_folds}{\code{list} list specifying which indicies to use for training. If \code{NULL}
|
||||
\item{train_folds}{List specifying which indices to use for training. If \code{NULL}
|
||||
(the default) all indices not specified in \code{folds} will be used for training.
|
||||
|
||||
\if{html}{\out{<div class="sourceCode">}}\preformatted{ This is not supported when `data` has `group` field.
|
||||
}\if{html}{\out{</div>}}}
|
||||
This is not supported when \code{data} has \code{group} field.}
|
||||
|
||||
\item{verbose}{\code{boolean}, print the statistics during the process}
|
||||
\item{verbose}{Logical flag. Should statistics be printed during the process?}
|
||||
|
||||
\item{print_every_n}{Print each n-th iteration evaluation messages when \code{verbose>0}.
|
||||
\item{print_every_n}{Print each nth iteration evaluation messages when \code{verbose > 0}.
|
||||
Default is 1 which means all messages are printed. This parameter is passed to the
|
||||
\code{\link{xgb.cb.print.evaluation}} callback.}
|
||||
\code{\link[=xgb.cb.print.evaluation]{xgb.cb.print.evaluation()}} callback.}
|
||||
|
||||
\item{early_stopping_rounds}{If \code{NULL}, the early stopping function is not triggered.
|
||||
If set to an integer \code{k}, training with a validation set will stop if the performance
|
||||
doesn't improve for \code{k} rounds.
|
||||
Setting this parameter engages the \code{\link{xgb.cb.early.stop}} callback.}
|
||||
Setting this parameter engages the \code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}} callback.}
|
||||
|
||||
\item{maximize}{If \code{feval} and \code{early_stopping_rounds} are set,
|
||||
then this parameter must be set as well.
|
||||
When it is \code{TRUE}, it means the larger the evaluation score the better.
|
||||
This parameter is passed to the \code{\link{xgb.cb.early.stop}} callback.}
|
||||
This parameter is passed to the \code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}} callback.}
|
||||
|
||||
\item{callbacks}{a list of callback functions to perform various task during boosting.
|
||||
See \code{\link{xgb.Callback}}. Some of the callbacks are automatically created depending on the
|
||||
\item{callbacks}{A list of callback functions to perform various tasks during boosting.
|
||||
See \code{\link[=xgb.Callback]{xgb.Callback()}}. Some of the callbacks are automatically created depending on the
|
||||
parameters' values. User can provide either existing or their own callback methods in order
|
||||
to customize the training process.}
|
||||
|
||||
\item{...}{other parameters to pass to \code{params}.}
|
||||
\item{...}{Other parameters to pass to \code{params}.}
|
||||
}
|
||||
\value{
|
||||
An object of class \code{xgb.cv.synchronous} with the following elements:
|
||||
An object of class 'xgb.cv.synchronous' with the following elements:
|
||||
\itemize{
|
||||
\item \code{call} a function call.
|
||||
\item \code{params} parameters that were passed to the xgboost library. Note that it does not
|
||||
capture parameters changed by the \code{\link{xgb.cb.reset.parameters}} callback.
|
||||
\item \code{evaluation_log} evaluation history stored as a \code{data.table} with the
|
||||
\item \code{call}: Function call.
|
||||
\item \code{params}: Parameters that were passed to the xgboost library. Note that it does not
|
||||
capture parameters changed by the \code{\link[=xgb.cb.reset.parameters]{xgb.cb.reset.parameters()}} callback.
|
||||
\item \code{evaluation_log}: Evaluation history stored as a \code{data.table} with the
|
||||
first column corresponding to iteration number and the rest corresponding to the
|
||||
CV-based evaluation means and standard deviations for the training and test CV-sets.
|
||||
It is created by the \code{\link{xgb.cb.evaluation.log}} callback.
|
||||
\item \code{niter} number of boosting iterations.
|
||||
\item \code{nfeatures} number of features in training data.
|
||||
\item \code{folds} the list of CV folds' indices - either those passed through the \code{folds}
|
||||
It is created by the \code{\link[=xgb.cb.evaluation.log]{xgb.cb.evaluation.log()}} callback.
|
||||
\item \code{niter}: Number of boosting iterations.
|
||||
\item \code{nfeatures}: Number of features in training data.
|
||||
\item \code{folds}: The list of CV folds' indices - either those passed through the \code{folds}
|
||||
parameter or randomly generated.
|
||||
\item \code{best_iteration} iteration number with the best evaluation metric value
|
||||
\item \code{best_iteration}: Iteration number with the best evaluation metric value
|
||||
(only available with early stopping).
|
||||
}
|
||||
|
||||
Plus other potential elements that are the result of callbacks, such as a list \code{cv_predict} with
|
||||
a sub-element \code{pred} when passing \code{prediction = TRUE}, which is added by the \link{xgb.cb.cv.predict}
|
||||
a sub-element \code{pred} when passing \code{prediction = TRUE}, which is added by the \code{\link[=xgb.cb.cv.predict]{xgb.cb.cv.predict()}}
|
||||
callback (note that one can also pass it manually under \code{callbacks} with different settings,
|
||||
such as saving also the models created during cross validation); or a list \code{early_stop} which
|
||||
will contain elements such as \code{best_iteration} when using the early stopping callback (\link{xgb.cb.early.stop}).
|
||||
will contain elements such as \code{best_iteration} when using the early stopping callback (\code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}}).
|
||||
}
|
||||
\description{
|
||||
The cross validation function of xgboost.
|
||||
@ -179,11 +174,20 @@ All observations are used for both training and validation.
|
||||
Adapted from \url{https://en.wikipedia.org/wiki/Cross-validation_\%28statistics\%29}
|
||||
}
|
||||
\examples{
|
||||
data(agaricus.train, package='xgboost')
|
||||
data(agaricus.train, package = "xgboost")
|
||||
|
||||
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||
cv <- xgb.cv(data = dtrain, nrounds = 3, nthread = 2, nfold = 5, metrics = list("rmse","auc"),
|
||||
max_depth = 3, eta = 1, objective = "binary:logistic")
|
||||
|
||||
cv <- xgb.cv(
|
||||
data = dtrain,
|
||||
nrounds = 3,
|
||||
nthread = 2,
|
||||
nfold = 5,
|
||||
metrics = list("rmse", "auc"),
|
||||
max_depth = 3,
|
||||
eta = 1, objective = "binary:logistic"
|
||||
)
|
||||
print(cv)
|
||||
print(cv, verbose=TRUE)
|
||||
print(cv, verbose = TRUE)
|
||||
|
||||
}
|
||||
|
||||
@ -7,7 +7,7 @@
|
||||
xgb.get.DMatrix.data(dmat)
|
||||
}
|
||||
\arguments{
|
||||
\item{dmat}{An \code{xgb.DMatrix} object, as returned by \link{xgb.DMatrix}.}
|
||||
\item{dmat}{An \code{xgb.DMatrix} object, as returned by \code{\link[=xgb.DMatrix]{xgb.DMatrix()}}.}
|
||||
}
|
||||
\value{
|
||||
The data held in the DMatrix, as a sparse CSR matrix (class \code{dgRMatrix}
|
||||
|
||||
@ -7,10 +7,10 @@
|
||||
xgb.get.DMatrix.num.non.missing(dmat)
|
||||
}
|
||||
\arguments{
|
||||
\item{dmat}{An \code{xgb.DMatrix} object, as returned by \link{xgb.DMatrix}.}
|
||||
\item{dmat}{An \code{xgb.DMatrix} object, as returned by \code{\link[=xgb.DMatrix]{xgb.DMatrix()}}.}
|
||||
}
|
||||
\value{
|
||||
The number of non-missing entries in the DMatrix
|
||||
The number of non-missing entries in the DMatrix.
|
||||
}
|
||||
\description{
|
||||
Get Number of Non-Missing Entries in DMatrix
|
||||
|
||||
@ -7,15 +7,14 @@
|
||||
xgb.get.DMatrix.qcut(dmat, output = c("list", "arrays"))
|
||||
}
|
||||
\arguments{
|
||||
\item{dmat}{An \code{xgb.DMatrix} object, as returned by \link{xgb.DMatrix}.}
|
||||
\item{dmat}{An \code{xgb.DMatrix} object, as returned by \code{\link[=xgb.DMatrix]{xgb.DMatrix()}}.}
|
||||
|
||||
\item{output}{Output format for the quantile cuts. Possible options are:\itemize{
|
||||
\item \code{"list"} will return the output as a list with one entry per column, where
|
||||
each column will have a numeric vector with the cuts. The list will be named if
|
||||
\code{dmat} has column names assigned to it.
|
||||
\item{output}{Output format for the quantile cuts. Possible options are:
|
||||
\itemize{
|
||||
\item \code{"list"} will return the output as a list with one entry per column, where each column will have a numeric vector with the cuts. The list will be named if \code{dmat} has column names assigned to it.
|
||||
\item \code{"arrays"} will return a list with entries \code{indptr} (base-0 indexing) and
|
||||
\code{data}. Here, the cuts for column 'i' are obtained by slicing 'data' from entries
|
||||
\code{indptr[i]+1} to \code{indptr[i+1]}.
|
||||
\code{indptr[i]+1} to \code{indptr[i+1]}.
|
||||
}}
|
||||
}
|
||||
\value{
|
||||
@ -23,7 +22,7 @@ The quantile cuts, in the format specified by parameter \code{output}.
|
||||
}
|
||||
\description{
|
||||
Get the quantile cuts (a.k.a. borders) from an \code{xgb.DMatrix}
|
||||
that has been quantized for the histogram method (\code{tree_method="hist"}).
|
||||
that has been quantized for the histogram method (\code{tree_method = "hist"}).
|
||||
|
||||
These cuts are used in order to assign observations to bins - i.e. these are ordered
|
||||
boundaries which are used to determine assignment condition \verb{border_low < x < border_high}.
|
||||
@ -36,8 +35,8 @@ which will be output in sorted order from lowest to highest.
|
||||
Different columns can have different numbers of bins according to their range.
|
||||
}
|
||||
\examples{
|
||||
library(xgboost)
|
||||
data(mtcars)
|
||||
|
||||
y <- mtcars$mpg
|
||||
x <- as.matrix(mtcars[, -1])
|
||||
dm <- xgb.DMatrix(x, label = y, nthread = 1)
|
||||
@ -45,11 +44,7 @@ dm <- xgb.DMatrix(x, label = y, nthread = 1)
|
||||
# DMatrix is not quantized right away, but will be once a hist model is generated
|
||||
model <- xgb.train(
|
||||
data = dm,
|
||||
params = list(
|
||||
tree_method = "hist",
|
||||
max_bin = 8,
|
||||
nthread = 1
|
||||
),
|
||||
params = list(tree_method = "hist", max_bin = 8, nthread = 1),
|
||||
nrounds = 3
|
||||
)
|
||||
|
||||
|
||||
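The `indptr` / `data` layout described for `output = "arrays"` can be sketched in plain R without fitting a model. The vectors below are invented for illustration and are not real output of `xgb.get.DMatrix.qcut()`; the helper name `get_cuts` is likewise hypothetical.

```r
# Hypothetical cut arrays for a 3-column DMatrix (values made up for illustration).
indptr <- c(0, 3, 5, 9)        # base-0 offsets, one more entry than there are columns
data <- c(1.5, 2.5, 4.0,       # cuts for column 1
          0.1, 0.9,            # cuts for column 2
          10, 20, 30, 40)      # cuts for column 3

# Per the documentation, cuts for column i run from indptr[i] + 1 to indptr[i + 1]
# (R's own indexing is base-1, hence the "+ 1" on the lower bound).
# This simple helper assumes every column has at least one cut.
get_cuts <- function(i) data[(indptr[i] + 1):indptr[i + 1]]

# Rebuild the per-column list that output = "list" would correspond to.
cuts_list <- lapply(seq_len(length(indptr) - 1), get_cuts)
```

Under these assumptions, `cuts_list` holds one numeric vector per column, which mirrors the shape described for `output = "list"`.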
@ -15,7 +15,7 @@ xgb.plot.multi.trees(
|
||||
}
|
||||
\arguments{
|
||||
\item{model}{Object of class \code{xgb.Booster}. If it contains feature names (they can be set through
|
||||
\link{setinfo}), they will be used in the output from this function.}
|
||||
\code{\link[=setinfo]{setinfo()}}, they will be used in the output from this function.}
|
||||
|
||||
\item{features_keep}{Number of features to keep in each position of the multi trees,
|
||||
by default 5.}
|
||||
|
||||
@ -17,7 +17,7 @@ xgb.plot.tree(
|
||||
}
|
||||
\arguments{
|
||||
\item{model}{Object of class \code{xgb.Booster}. If it contains feature names (they can be set through
|
||||
\link{setinfo}), they will be used in the output from this function.}
|
||||
\code{\link[=setinfo]{setinfo()}}, they will be used in the output from this function.}
|
||||
|
||||
\item{trees}{An integer vector of tree indices that should be used.
|
||||
The default (\code{NULL}) uses all trees.
|
||||
|
||||
@ -3,36 +3,36 @@
|
||||
\name{xgb.slice.DMatrix}
|
||||
\alias{xgb.slice.DMatrix}
|
||||
\alias{[.xgb.DMatrix}
|
||||
\title{Get a new DMatrix containing the specified rows of
|
||||
original xgb.DMatrix object}
|
||||
\title{Slice DMatrix}
|
||||
\usage{
|
||||
xgb.slice.DMatrix(object, idxset, allow_groups = FALSE)
|
||||
|
||||
\method{[}{xgb.DMatrix}(object, idxset, colset = NULL)
|
||||
}
|
||||
\arguments{
|
||||
\item{object}{Object of class "xgb.DMatrix".}
|
||||
\item{object}{Object of class \code{xgb.DMatrix}.}
|
||||
|
||||
\item{idxset}{An integer vector of indices of rows needed (base-1 indexing).}
|
||||
|
||||
\item{allow_groups}{Whether to allow slicing an \code{xgb.DMatrix} with \code{group} (or
|
||||
equivalently \code{qid}) field. Note that in such case, the result will not have
|
||||
the groups anymore - they need to be set manually through \code{setinfo}.}
|
||||
the groups anymore - they need to be set manually through \code{\link[=setinfo]{setinfo()}}.}
|
||||
|
||||
\item{colset}{currently not used (columns subsetting is not available)}
|
||||
\item{colset}{Currently not used (column subsetting is not available).}
|
||||
}
|
||||
\description{
|
||||
Get a new DMatrix containing the specified rows of
|
||||
original xgb.DMatrix object
|
||||
Get a new DMatrix containing the specified rows of original xgb.DMatrix object.
|
||||
}
|
||||
\examples{
|
||||
data(agaricus.train, package='xgboost')
|
||||
data(agaricus.train, package = "xgboost")
|
||||
|
||||
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||
|
||||
dsub <- xgb.slice.DMatrix(dtrain, 1:42)
|
||||
labels1 <- getinfo(dsub, 'label')
|
||||
labels1 <- getinfo(dsub, "label")
|
||||
|
||||
dsub <- dtrain[1:42, ]
|
||||
labels2 <- getinfo(dsub, 'label')
|
||||
labels2 <- getinfo(dsub, "label")
|
||||
all.equal(labels1, labels2)
|
||||
|
||||
}
|
||||
|
||||
@ -24,106 +24,100 @@ xgb.train(
|
||||
}
|
||||
\arguments{
|
||||
\item{params}{the list of parameters. The complete list of parameters is
|
||||
available in the \href{http://xgboost.readthedocs.io/en/latest/parameter.html}{online documentation}. Below
|
||||
is a shorter summary:
|
||||
\enumerate{
|
||||
\item General Parameters
|
||||
}
|
||||
available in the \href{http://xgboost.readthedocs.io/en/latest/parameter.html}{online documentation}.
|
||||
Below is a shorter summary:
|
||||
|
||||
\strong{1. General Parameters}
|
||||
\itemize{
|
||||
\item \code{booster} which booster to use, can be \code{gbtree} or \code{gblinear}. Default: \code{gbtree}.
|
||||
}
|
||||
\enumerate{
|
||||
\item Booster Parameters
|
||||
\item \code{booster}: Which booster to use, can be \code{gbtree} or \code{gblinear}. Default: \code{gbtree}.
|
||||
}
|
||||
|
||||
2.1. Parameters for Tree Booster
|
||||
\strong{2. Booster Parameters}
|
||||
|
||||
\strong{2.1. Parameters for Tree Booster}
|
||||
\itemize{
|
||||
\item{ \code{eta} control the learning rate: scale the contribution of each tree by a factor of \code{0 < eta < 1}
|
||||
\item \code{eta}: The learning rate: scale the contribution of each tree by a factor of \verb{0 < eta < 1}
|
||||
when it is added to the current approximation.
|
||||
Used to prevent overfitting by making the boosting process more conservative.
|
||||
Lower value for \code{eta} implies larger value for \code{nrounds}: low \code{eta} value means model
|
||||
more robust to overfitting but slower to compute. Default: 0.3}
|
||||
\item{ \code{gamma} minimum loss reduction required to make a further partition on a leaf node of the tree.
|
||||
the larger, the more conservative the algorithm will be.}
|
||||
\item \code{max_depth} maximum depth of a tree. Default: 6
|
||||
\item{\code{min_child_weight} minimum sum of instance weight (hessian) needed in a child.
|
||||
more robust to overfitting but slower to compute. Default: 0.3.
|
||||
\item \code{gamma}: Minimum loss reduction required to make a further partition on a leaf node of the tree.
|
||||
The larger, the more conservative the algorithm will be.
|
||||
\item \code{max_depth}: Maximum depth of a tree. Default: 6.
|
||||
\item \code{min_child_weight}: Minimum sum of instance weight (hessian) needed in a child.
|
||||
If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight,
|
||||
then the building process will give up further partitioning.
|
||||
In linear regression mode, this simply corresponds to minimum number of instances needed to be in each node.
|
||||
The larger, the more conservative the algorithm will be. Default: 1}
|
||||
\item{ \code{subsample} subsample ratio of the training instance.
|
||||
The larger, the more conservative the algorithm will be. Default: 1.
|
||||
\item \code{subsample}: Subsample ratio of the training instance.
|
||||
Setting it to 0.5 means that xgboost randomly collects half of the data instances to grow trees,
|
||||
which helps prevent overfitting. It also makes computation shorter (because there is less data to analyse).
|
||||
It is advised to use this parameter with \code{eta} and increase \code{nrounds}. Default: 1}
|
||||
\item \code{colsample_bytree} subsample ratio of columns when constructing each tree. Default: 1
|
||||
\item \code{lambda} L2 regularization term on weights. Default: 1
|
||||
\item \code{alpha} L1 regularization term on weights. (there is no L1 reg on bias because it is not important). Default: 0
|
||||
\item{ \code{num_parallel_tree} Experimental parameter. number of trees to grow per round.
|
||||
Useful to test Random Forest through XGBoost
|
||||
It is advised to use this parameter with \code{eta} and increase \code{nrounds}. Default: 1.
|
||||
\item \code{colsample_bytree}: Subsample ratio of columns when constructing each tree. Default: 1.
|
||||
\item \code{lambda}: L2 regularization term on weights. Default: 1.
|
||||
\item \code{alpha}: L1 regularization term on weights (there is no L1 regularization on bias because it is not important). Default: 0.
|
||||
\item \code{num_parallel_tree}: Experimental parameter. Number of trees to grow per round.
|
||||
Useful to test Random Forest through XGBoost.
|
||||
(set \code{colsample_bytree < 1}, \code{subsample < 1} and \code{nrounds = 1} accordingly).
|
||||
Default: 1}
|
||||
\item{ \code{monotone_constraints} A numerical vector consists of \code{1}, \code{0} and \code{-1} with its length
|
||||
Default: 1.
|
||||
\item \code{monotone_constraints}: A numerical vector consisting of \code{1}, \code{0} and \code{-1}, with its length
|
||||
equal to the number of features in the training data.
|
||||
\code{1} is increasing, \code{-1} is decreasing and \code{0} is no constraint.}
|
||||
\item{ \code{interaction_constraints} A list of vectors specifying feature indices of permitted interactions.
|
||||
\code{1} is increasing, \code{-1} is decreasing and \code{0} is no constraint.
|
||||
\item \code{interaction_constraints}: A list of vectors specifying feature indices of permitted interactions.
|
||||
Each item of the list represents one permitted interaction where specified features are allowed to interact with each other.
|
||||
Feature index values should start from \code{0} (\code{0} references the first column).
|
||||
Leave argument unspecified for no interaction constraints.}
|
||||
Leave argument unspecified for no interaction constraints.
|
||||
}
|
||||
|
||||
2.2. Parameters for Linear Booster
|
||||
|
||||
\strong{2.2. Parameters for Linear Booster}
|
||||
\itemize{
|
||||
\item \code{lambda} L2 regularization term on weights. Default: 0
|
||||
\item \code{lambda_bias} L2 regularization term on bias. Default: 0
|
||||
\item \code{alpha} L1 regularization term on weights. (there is no L1 reg on bias because it is not important). Default: 0
|
||||
}
|
||||
\enumerate{
|
||||
\item Task Parameters
|
||||
\item \code{lambda}: L2 regularization term on weights. Default: 0.
|
||||
\item \code{lambda_bias}: L2 regularization term on bias. Default: 0.
|
||||
\item \code{alpha}: L1 regularization term on weights (there is no L1 regularization on bias because it is not important). Default: 0.
|
||||
}
|
||||
|
||||
\strong{3. Task Parameters}
|
||||
\itemize{
|
||||
\item{ \code{objective} specify the learning task and the corresponding learning objective, users can pass a self-defined function to it.
|
||||
The default objective options are below:
|
||||
\item \code{objective}: Specifies the learning task and the corresponding learning objective.
|
||||
Users can pass a self-defined function to it. The default objective options are below:
|
||||
\itemize{
|
||||
\item \code{reg:squarederror} Regression with squared loss (Default).
|
||||
\item{ \code{reg:squaredlogerror}: regression with squared log loss \eqn{1/2 * (log(pred + 1) - log(label + 1))^2}.
|
||||
\item \code{reg:squarederror}: Regression with squared loss (default).
|
||||
\item \code{reg:squaredlogerror}: Regression with squared log loss \eqn{1/2 \cdot (\log(pred + 1) - \log(label + 1))^2}.
|
||||
All inputs are required to be greater than -1.
|
||||
Also, see metric rmsle for possible issue with this objective.}
|
||||
\item \code{reg:logistic} logistic regression.
|
||||
\item \code{reg:pseudohubererror}: regression with Pseudo Huber loss, a twice differentiable alternative to absolute loss.
|
||||
\item \code{binary:logistic} logistic regression for binary classification. Output probability.
|
||||
\item \code{binary:logitraw} logistic regression for binary classification, output score before logistic transformation.
|
||||
\item \code{binary:hinge}: hinge loss for binary classification. This makes predictions of 0 or 1, rather than producing probabilities.
|
||||
\item{ \code{count:poisson}: Poisson regression for count data, output mean of Poisson distribution.
|
||||
\code{max_delta_step} is set to 0.7 by default in poisson regression (used to safeguard optimization).}
|
||||
\item{ \code{survival:cox}: Cox regression for right censored survival time data (negative values are considered right censored).
|
||||
Also, see metric rmsle for possible issue with this objective.
|
||||
\item \code{reg:logistic}: Logistic regression.
|
||||
\item \code{reg:pseudohubererror}: Regression with Pseudo Huber loss, a twice differentiable alternative to absolute loss.
|
||||
\item \code{binary:logistic}: Logistic regression for binary classification. Output probability.
|
||||
\item \code{binary:logitraw}: Logistic regression for binary classification, output score before logistic transformation.
|
||||
\item \code{binary:hinge}: Hinge loss for binary classification. This makes predictions of 0 or 1, rather than producing probabilities.
|
||||
\item \code{count:poisson}: Poisson regression for count data, output mean of Poisson distribution.
|
||||
The parameter \code{max_delta_step} is set to 0.7 by default in Poisson regression
|
||||
(used to safeguard optimization).
|
||||
\item \code{survival:cox}: Cox regression for right censored survival time data (negative values are considered right censored).
|
||||
Note that predictions are returned on the hazard ratio scale (i.e., as HR = exp(marginal_prediction) in the proportional
|
||||
hazard function \code{h(t) = h0(t) * HR)}.}
|
||||
\item{ \code{survival:aft}: Accelerated failure time model for censored survival time data. See
|
||||
hazard function \eqn{h(t) = h_0(t) \cdot HR}.
|
||||
\item \code{survival:aft}: Accelerated failure time model for censored survival time data. See
|
||||
\href{https://xgboost.readthedocs.io/en/latest/tutorials/aft_survival_analysis.html}{Survival Analysis with Accelerated Failure Time}
|
||||
for details.}
|
||||
\item \code{aft_loss_distribution}: Probability Density Function used by \code{survival:aft} and \code{aft-nloglik} metric.
|
||||
\item{ \code{multi:softmax} set xgboost to do multiclass classification using the softmax objective.
|
||||
Class is represented by a number and should be from 0 to \code{num_class - 1}.}
|
||||
\item{ \code{multi:softprob} same as softmax, but prediction outputs a vector of ndata * nclass elements, which can be
|
||||
for details.
|
||||
The parameter \code{aft_loss_distribution} specifies the Probability Density Function
|
||||
used by \code{survival:aft} and the \code{aft-nloglik} metric.
|
||||
\item \code{multi:softmax}: Set xgboost to do multiclass classification using the softmax objective.
|
||||
Class is represented by a number and should be from 0 to \code{num_class - 1}.
|
||||
\item \code{multi:softprob}: Same as softmax, but prediction outputs a vector of ndata * nclass elements, which can be
|
||||
further reshaped to an ndata-by-nclass matrix. The result contains predicted probabilities of each data point belonging
|
||||
to each class.}
|
||||
\item \code{rank:pairwise} set xgboost to do ranking task by minimizing the pairwise loss.
|
||||
\item{ \code{rank:ndcg}: Use LambdaMART to perform list-wise ranking where
|
||||
\href{https://en.wikipedia.org/wiki/Discounted_cumulative_gain}{Normalized Discounted Cumulative Gain (NDCG)} is maximized.}
|
||||
\item{ \code{rank:map}: Use LambdaMART to perform list-wise ranking where
|
||||
to each class.
|
||||
\item \code{rank:pairwise}: Set XGBoost to do ranking task by minimizing the pairwise loss.
|
||||
\item \code{rank:ndcg}: Use LambdaMART to perform list-wise ranking where
|
||||
\href{https://en.wikipedia.org/wiki/Discounted_cumulative_gain}{Normalized Discounted Cumulative Gain (NDCG)} is maximized.
|
||||
\item \code{rank:map}: Use LambdaMART to perform list-wise ranking where
|
||||
\href{https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Mean_average_precision}{Mean Average Precision (MAP)}
|
||||
is maximized.}
|
||||
\item{ \code{reg:gamma}: gamma regression with log-link.
|
||||
Output is a mean of gamma distribution.
|
||||
is maximized.
|
||||
\item \code{reg:gamma}: Gamma regression with log-link. Output is a mean of gamma distribution.
|
||||
It might be useful, e.g., for modeling insurance claims severity, or for any outcome that might be
|
||||
\href{https://en.wikipedia.org/wiki/Gamma_distribution#Applications}{gamma-distributed}.}
|
||||
\item{ \code{reg:tweedie}: Tweedie regression with log-link.
|
||||
\href{https://en.wikipedia.org/wiki/Gamma_distribution#Applications}{gamma-distributed}.
|
||||
\item \code{reg:tweedie}: Tweedie regression with log-link.
|
||||
It might be useful, e.g., for modeling total loss in insurance, or for any outcome that might be
|
||||
\href{https://en.wikipedia.org/wiki/Tweedie_distribution#Applications}{Tweedie-distributed}.}
|
||||
\href{https://en.wikipedia.org/wiki/Tweedie_distribution#Applications}{Tweedie-distributed}.
|
||||
}
|
||||
|
||||
For custom objectives, one should pass a function taking as input the current predictions (as a numeric
|
||||
@ -134,91 +128,85 @@ For multi-valued custom objectives, should have shape \verb{[nrows, ntargets]}.
|
||||
the Hessian will be clipped, so one might consider using the expected Hessian (Fisher information) if the
|
||||
objective is non-convex.
|
||||
|
||||
See the tutorials \href{https://xgboost.readthedocs.io/en/stable/tutorials/custom_metric_obj.html}{
|
||||
Custom Objective and Evaluation Metric} and \href{https://xgboost.readthedocs.io/en/stable/tutorials/advanced_custom_obj}{
|
||||
Advanced Usage of Custom Objectives} for more information about custom objectives.
|
||||
}
|
||||
\item \code{base_score} the initial prediction score of all instances, global bias. Default: 0.5
|
||||
\item{ \code{eval_metric} evaluation metrics for validation data.
|
||||
See the tutorials \href{https://xgboost.readthedocs.io/en/stable/tutorials/custom_metric_obj.html}{Custom Objective and Evaluation Metric}
|
||||
and \href{https://xgboost.readthedocs.io/en/stable/tutorials/advanced_custom_obj}{Advanced Usage of Custom Objectives}
|
||||
for more information about custom objectives.
|
||||
\item \code{base_score}: The initial prediction score of all instances, global bias. Default: 0.5.
|
||||
\item \code{eval_metric}: Evaluation metrics for validation data.
|
||||
Users can pass a self-defined function to it.
|
||||
Default: metric will be assigned according to objective
|
||||
(rmse for regression, and error for classification, mean average precision for ranking).
|
||||
List is provided in detail section.}
|
||||
List is provided in detail section.
|
||||
}}
|
||||
|
||||
\item{data}{training dataset. \code{xgb.train} accepts only an \code{xgb.DMatrix} as the input.
|
||||
\code{xgboost}, in addition, also accepts \code{matrix}, \code{dgCMatrix}, or name of a local data file.}
|
||||
\item{data}{Training dataset. \code{xgb.train()} accepts only an \code{xgb.DMatrix} as the input.
|
||||
\code{\link[=xgboost]{xgboost()}}, in addition, also accepts \code{matrix}, \code{dgCMatrix}, or name of a local data file.}
|
||||
|
||||
\item{nrounds}{max number of boosting iterations.}
|
||||
\item{nrounds}{Max number of boosting iterations.}
|
||||
|
||||
\item{evals}{Named list of \code{xgb.DMatrix} datasets to use for evaluating model performance.
|
||||
Metrics specified in either \code{eval_metric} or \code{feval} will be computed for each
|
||||
of these datasets during each boosting iteration, and stored in the end as a field named
|
||||
\code{evaluation_log} in the resulting object. When either \code{verbose>=1} or
|
||||
\code{\link{xgb.cb.print.evaluation}} callback is engaged, the performance results are continuously
|
||||
\code{\link[=xgb.cb.print.evaluation]{xgb.cb.print.evaluation()}} callback is engaged, the performance results are continuously
|
||||
printed out during the training.
|
||||
E.g., specifying \code{evals=list(validation1=mat1, validation2=mat2)} allows to track
|
||||
the performance of each round's model on mat1 and mat2.}
|
||||
|
||||
\item{obj}{customized objective function. Should take two arguments: the first one will be the
|
||||
\item{obj}{Customized objective function. Should take two arguments: the first one will be the
|
||||
current predictions (either a numeric vector or matrix depending on the number of targets / classes),
|
||||
and the second one will be the \code{data} DMatrix object that is used for training.
|
||||
|
||||
\if{html}{\out{<div class="sourceCode">}}\preformatted{ It should return a list with two elements `grad` and `hess` (in that order), as either
|
||||
numeric vectors or numeric matrices depending on the number of targets / classes (same
|
||||
dimension as the predictions that are passed as first argument).
|
||||
}\if{html}{\out{</div>}}}
|
||||
It should return a list with two elements \code{grad} and \code{hess} (in that order), as either
|
||||
numeric vectors or numeric matrices depending on the number of targets / classes (same
|
||||
dimension as the predictions that are passed as first argument).}
|
||||
|
||||
\item{feval}{customized evaluation function. Just like \code{obj}, should take two arguments, with
|
||||
\item{feval}{Customized evaluation function. Just like \code{obj}, should take two arguments, with
|
||||
the first one being the predictions and the second one the \code{data} DMatrix.
|
||||
|
||||
\if{html}{\out{<div class="sourceCode">}}\preformatted{ Should return a list with two elements `metric` (name that will be displayed for this metric,
|
||||
should be a string / character), and `value` (the number that the function calculates, should
|
||||
be a numeric scalar).
|
||||
Should return a list with two elements \code{metric} (name that will be displayed for this metric,
|
||||
should be a string / character), and \code{value} (the number that the function calculates, should
|
||||
be a numeric scalar).
|
||||
|
||||
Note that even if passing `feval`, objectives also have an associated default metric that
|
||||
will be evaluated in addition to it. In order to disable the built-in metric, one can pass
|
||||
parameter `disable_default_eval_metric = TRUE`.
|
||||
}\if{html}{\out{</div>}}}
|
||||
Note that even if passing \code{feval}, objectives also have an associated default metric that
|
||||
will be evaluated in addition to it. In order to disable the built-in metric, one can pass
|
||||
parameter \code{disable_default_eval_metric = TRUE}.}
|
||||
|
||||
\item{verbose}{If 0, xgboost will stay silent. If 1, it will print information about performance.
|
||||
If 2, some additional information will be printed out.
|
||||
Note that setting \code{verbose > 0} automatically engages the
|
||||
\code{xgb.cb.print.evaluation(period=1)} callback function.}
|
||||
|
||||
\item{print_every_n}{Print each n-th iteration evaluation messages when \code{verbose>0}.
|
||||
\item{print_every_n}{Print each nth iteration evaluation messages when \code{verbose>0}.
|
||||
Default is 1 which means all messages are printed. This parameter is passed to the
|
||||
\code{\link{xgb.cb.print.evaluation}} callback.}
|
||||
\code{\link[=xgb.cb.print.evaluation]{xgb.cb.print.evaluation()}} callback.}
|
||||
|
||||
\item{early_stopping_rounds}{If \code{NULL}, the early stopping function is not triggered.
|
||||
If set to an integer \code{k}, training with a validation set will stop if the performance
|
||||
doesn't improve for \code{k} rounds.
|
||||
Setting this parameter engages the \code{\link{xgb.cb.early.stop}} callback.}
|
||||
doesn't improve for \code{k} rounds. Setting this parameter engages the \code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}} callback.}
|
||||
|
||||
\item{maximize}{If \code{feval} and \code{early_stopping_rounds} are set,
|
||||
then this parameter must be set as well.
|
||||
\item{maximize}{If \code{feval} and \code{early_stopping_rounds} are set, then this parameter must be set as well.
|
||||
When it is \code{TRUE}, it means the larger the evaluation score the better.
|
||||
This parameter is passed to the \code{\link{xgb.cb.early.stop}} callback.}
|
||||
This parameter is passed to the \code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}} callback.}
|
||||
|
||||
\item{save_period}{when it is non-NULL, model is saved to disk after every \code{save_period} rounds,
|
||||
0 means save at the end. The saving is handled by the \code{\link{xgb.cb.save.model}} callback.}
|
||||
\item{save_period}{When not \code{NULL}, model is saved to disk after every \code{save_period} rounds.
|
||||
0 means save at the end. The saving is handled by the \code{\link[=xgb.cb.save.model]{xgb.cb.save.model()}} callback.}
|
||||
|
||||
\item{save_name}{the name or path for periodically saved model file.}
|
||||
|
||||
\item{xgb_model}{a previously built model to continue the training from.
|
||||
\item{xgb_model}{A previously built model to continue the training from.
|
||||
Could be either an object of class \code{xgb.Booster}, or its raw data, or the name of a
|
||||
file with a previously saved model.}
|
||||
|
||||
\item{callbacks}{a list of callback functions to perform various task during boosting.
|
||||
See \code{\link{xgb.Callback}}. Some of the callbacks are automatically created depending on the
|
||||
\item{callbacks}{A list of callback functions to perform various tasks during boosting.
|
||||
See \code{\link[=xgb.Callback]{xgb.Callback()}}. Some of the callbacks are automatically created depending on the
|
||||
parameters' values. User can provide either existing or their own callback methods in order
|
||||
to customize the training process.
|
||||
|
||||
\if{html}{\out{<div class="sourceCode">}}\preformatted{ Note that some callbacks might try to leave attributes in the resulting model object,
|
||||
such as an evaluation log (a `data.table` object) - be aware that these objects are kept
|
||||
as R attributes, and thus do not get saved when using XGBoost's own serializaters like
|
||||
\link{xgb.save} (but are kept when using R serializers like \link{saveRDS}).
|
||||
}\if{html}{\out{</div>}}}
|
||||
Note that some callbacks might try to leave attributes in the resulting model object,
|
||||
such as an evaluation log (a \code{data.table} object) - be aware that these objects are kept
|
||||
as R attributes, and thus do not get saved when using XGBoost's own serializers like
|
||||
\code{\link[=xgb.save]{xgb.save()}} (but are kept when using R serializers like \code{\link[=saveRDS]{saveRDS()}}).}
|
||||
|
||||
\item{...}{other parameters to pass to \code{params}.}
|
||||
}
|
||||
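The two-argument forms described for `obj` and `feval` can be sketched as below. This is a minimal illustration, not code from the package: the names `sq_err_grad_hess`, `sq_err_obj`, `rmse_value`, `rmse_feval` and the metric label `"custom_rmse"` are hypothetical, and the `params` list assumes a made-up dataset with three features. Only `getinfo()` and the parameter names come from the documentation above.

```r
# Pure helper: gradient and Hessian of 0.5 * (pred - label)^2 with respect to pred.
sq_err_grad_hess <- function(preds, labels) {
  list(grad = preds - labels, hess = rep(1, length(preds)))
}

# Objective in the form described for `obj`: (predictions, training DMatrix)
# in, list with `grad` and `hess` (in that order) out.
sq_err_obj <- function(preds, dtrain) {
  labels <- xgboost::getinfo(dtrain, "label")
  sq_err_grad_hess(preds, labels)
}

# Pure helper: root mean square error.
rmse_value <- function(preds, labels) sqrt(mean((preds - labels)^2))

# Metric in the form described for `feval`: list with a character `metric`
# name and a numeric scalar `value`.
rmse_feval <- function(preds, dtrain) {
  labels <- xgboost::getinfo(dtrain, "label")
  list(metric = "custom_rmse", value = rmse_value(preds, labels))
}

# Example parameter list (assumes three features, purely for illustration).
params <- list(
  booster = "gbtree",
  eta = 0.1,
  max_depth = 3,
  monotone_constraints = c(1, 0, -1),   # increasing, unconstrained, decreasing
  disable_default_eval_metric = TRUE    # evaluate only the custom metric
)
```

Functions shaped like `sq_err_obj` and `rmse_feval` would be passed through the `obj` and `feval` arguments documented above. Since `feval` is user-defined, `maximize` must also be set explicitly when `early_stopping_rounds` is used.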
@ -226,19 +214,18 @@ to customize the training process.
|
||||
An object of class \code{xgb.Booster}.
|
||||
}
|
||||
\description{
|
||||
\code{xgb.train} is an advanced interface for training an xgboost model.
|
||||
The \code{xgboost} function is a simpler wrapper for \code{xgb.train}.
|
||||
\code{xgb.train()} is an advanced interface for training an xgboost model.
|
||||
The \code{\link[=xgboost]{xgboost()}} function is a simpler wrapper for \code{xgb.train()}.
|
||||
}
|
||||
\details{
|
||||
These are the training functions for \code{xgboost}.
|
||||
These are the training functions for \code{\link[=xgboost]{xgboost()}}.
|
||||
|
||||
The \code{xgb.train} interface supports advanced features such as \code{evals},
|
||||
The \code{xgb.train()} interface supports advanced features such as \code{evals},
|
||||
customized objective and evaluation metric functions, therefore it is more flexible
|
||||
than the \code{xgboost} interface.
|
||||
than the \code{\link[=xgboost]{xgboost()}} interface.
|
||||
|
||||
Parallelization is automatically enabled if \code{OpenMP} is present.
|
||||
Number of threads can also be manually specified via the \code{nthread}
|
||||
parameter.
|
||||
Parallelization is automatically enabled if OpenMP is present.
|
||||
Number of threads can also be manually specified via the \code{nthread} parameter.
|
||||
|
||||
While in other interfaces, the default random seed defaults to zero, in R, if a parameter \code{seed}
|
||||
is not manually supplied, it will generate a random seed through R's own random number generator,
|
||||
@ -251,49 +238,49 @@ User may set one or several \code{eval_metric} parameters.
|
||||
Note that when using a customized metric, only this single metric can be used.
|
||||
The following is the list of built-in metrics for which XGBoost provides optimized implementation:
|
||||
\itemize{
|
||||
\item \code{rmse} root mean square error. \url{https://en.wikipedia.org/wiki/Root_mean_square_error}
|
||||
\item \code{logloss} negative log-likelihood. \url{https://en.wikipedia.org/wiki/Log-likelihood}
|
||||
\item \code{mlogloss} multiclass logloss. \url{https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html}
|
||||
\item \code{error} Binary classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
|
||||
\item \code{rmse}: Root mean square error. \url{https://en.wikipedia.org/wiki/Root_mean_square_error}
|
||||
\item \code{logloss}: Negative log-likelihood. \url{https://en.wikipedia.org/wiki/Log-likelihood}
|
||||
\item \code{mlogloss}: Multiclass logloss. \url{https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html}
|
||||
\item \code{error}: Binary classification error rate. It is calculated as \verb{(# wrong cases) / (# all cases)}.
|
||||
By default, it uses the 0.5 threshold for predicted values to define negative and positive instances.
|
||||
Different threshold (e.g., 0.) could be specified as "error@0."
|
||||
\item \code{merror} Multiclass classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
|
||||
\item \code{mae} Mean absolute error
|
||||
\item \code{mape} Mean absolute percentage error
|
||||
\item{ \code{auc} Area under the curve.
|
||||
\url{https://en.wikipedia.org/wiki/Receiver_operating_characteristic#'Area_under_curve} for ranking evaluation.}
|
||||
\item \code{aucpr} Area under the PR curve. \url{https://en.wikipedia.org/wiki/Precision_and_recall} for ranking evaluation.
|
||||
\item \code{ndcg} Normalized Discounted Cumulative Gain (for ranking task). \url{https://en.wikipedia.org/wiki/NDCG}
|
||||
Different threshold (e.g., 0.) could be specified as \verb{error@0}.
|
||||
\item \code{merror}: Multiclass classification error rate. It is calculated as \verb{(# wrong cases) / (# all cases)}.
|
||||
\item \code{mae}: Mean absolute error.
|
||||
\item \code{mape}: Mean absolute percentage error.
|
||||
\item \code{auc}: Area under the curve.
|
||||
\url{https://en.wikipedia.org/wiki/Receiver_operating_characteristic#'Area_under_curve} for ranking evaluation.
|
||||
\item \code{aucpr}: Area under the PR curve. \url{https://en.wikipedia.org/wiki/Precision_and_recall} for ranking evaluation.
|
||||
\item \code{ndcg}: Normalized Discounted Cumulative Gain (for ranking task). \url{https://en.wikipedia.org/wiki/NDCG}
|
||||
}
|
||||
|
||||
The following callbacks are automatically created when certain parameters are set:
|
||||
\itemize{
|
||||
\item \code{xgb.cb.print.evaluation} is turned on when \code{verbose > 0};
|
||||
and the \code{print_every_n} parameter is passed to it.
|
||||
\item \code{xgb.cb.evaluation.log} is on when \code{evals} is present.
|
||||
\item \code{xgb.cb.early.stop}: when \code{early_stopping_rounds} is set.
|
||||
\item \code{xgb.cb.save.model}: when \code{save_period > 0} is set.
|
||||
\item \code{\link[=xgb.cb.print.evaluation]{xgb.cb.print.evaluation()}} is turned on when \code{verbose > 0} and the \code{print_every_n}
|
||||
parameter is passed to it.
|
||||
\item \code{\link[=xgb.cb.evaluation.log]{xgb.cb.evaluation.log()}} is on when \code{evals} is present.
|
||||
\item \code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}}: When \code{early_stopping_rounds} is set.
|
||||
\item \code{\link[=xgb.cb.save.model]{xgb.cb.save.model()}}: When \code{save_period > 0} is set.
|
||||
}
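
As a rough illustration of the list above, the following sketch sets each of the triggering arguments and then inspects the evaluation history. It assumes that the evaluation log callback attaches its result to the booster as an R attribute named `evaluation_log`; that attribute name is an assumption, not something stated in this diff.

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")

dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label, nthread = 1)
dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label, nthread = 1)

bst <- xgb.train(
  params = list(
    max_depth = 2,
    nthread = 1,
    objective = "binary:logistic",
    eval_metric = "auc"
  ),
  data = dtrain,
  nrounds = 10,
  evals = list(train = dtrain, eval = dtest), # turns on the evaluation log callback
  verbose = 1,                                # turns on the printing callback
  print_every_n = 5,                          # forwarded to the printing callback
  early_stopping_rounds = 3                   # turns on the early stopping callback
)

# Assumption: the evaluation log is stored as the R attribute 'evaluation_log'.
eval_log <- attributes(bst)$evaluation_log
print(eval_log)
```

No `callbacks` argument is passed here; the callbacks are created from the other arguments alone.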
|
||||
|
||||
Note that objects of type \code{xgb.Booster} as returned by this function behave a bit differently
|
||||
from typical R objects (it's an 'altrep' list class), and it makes a separation between
|
||||
internal booster attributes (restricted to jsonifyable data), accessed through \link{xgb.attr}
|
||||
and shared between interfaces through serialization functions like \link{xgb.save}; and
|
||||
R-specific attributes (typically the result from a callback), accessed through \link{attributes}
|
||||
and \link{attr}, which are otherwise
|
||||
only used in the R interface, only kept when using R's serializers like \link{saveRDS}, and
|
||||
not anyhow used by functions like \link{predict.xgb.Booster}.
|
||||
internal booster attributes (restricted to jsonifyable data), accessed through \code{\link[=xgb.attr]{xgb.attr()}}
|
||||
and shared between interfaces through serialization functions like \code{\link[=xgb.save]{xgb.save()}}; and
|
||||
R-specific attributes (typically the result from a callback), accessed through \code{\link[=attributes]{attributes()}}
|
||||
and \code{\link[=attr]{attr()}}, which are otherwise
|
||||
only used in the R interface, only kept when using R's serializers like \code{\link[=saveRDS]{saveRDS()}}, and
|
||||
not anyhow used by functions like \code{predict.xgb.Booster()}.
|
||||
|
||||
Be aware that one such R attribute that is automatically added is \code{params} - this attribute
|
||||
is assigned from the \code{params} argument to this function, and is only meant to serve as a
|
||||
reference for what went into the booster, but is not used in other methods that take a booster
|
||||
object - so for example, changing the booster's configuration requires calling \verb{xgb.config<-}
|
||||
or 'xgb.parameters<-', while simply modifying \verb{attributes(model)$params$<...>} will have no
|
||||
or \verb{xgb.parameters<-}, while simply modifying \verb{attributes(model)$params$<...>} will have no
|
||||
effect elsewhere.
|
||||
}
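
The distinction between internal booster attributes and R-specific attributes described above could be sketched as follows. The attribute name `my_note` is made up for illustration, and the example only reads `params` rather than modifying it.

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label, nthread = 1)

bst <- xgb.train(
  params = list(max_depth = 2, nthread = 1, objective = "binary:logistic"),
  data = dtrain,
  nrounds = 2
)

# Internal booster attribute: stored inside the booster itself and kept by
# serialization functions such as xgb.save(). 'my_note' is a hypothetical name.
xgb.attr(bst, "my_note") <- "trained on agaricus"
note <- xgb.attr(bst, "my_note")
print(note)

# R-specific attribute: 'params' is only a record of what was passed in.
# Reading it is fine; editing it would not reconfigure the booster.
print(attributes(bst)$params$max_depth)
```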
|
||||
\examples{
|
||||
data(agaricus.train, package='xgboost')
|
||||
data(agaricus.test, package='xgboost')
|
||||
data(agaricus.train, package = "xgboost")
|
||||
data(agaricus.test, package = "xgboost")
|
||||
|
||||
## Keep the number of threads to 1 for examples
|
||||
nthread <- 1
|
||||
@ -308,8 +295,13 @@ dtest <- with(
|
||||
evals <- list(train = dtrain, eval = dtest)
|
||||
|
||||
## A simple xgb.train example:
|
||||
param <- list(max_depth = 2, eta = 1, nthread = nthread,
|
||||
objective = "binary:logistic", eval_metric = "auc")
|
||||
param <- list(
|
||||
max_depth = 2,
|
||||
eta = 1,
|
||||
nthread = nthread,
|
||||
objective = "binary:logistic",
|
||||
eval_metric = "auc"
|
||||
)
|
||||
bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0)
|
||||
|
||||
## An xgb.train example where custom objective and evaluation metric are
|
||||
@ -329,34 +321,65 @@ evalerror <- function(preds, dtrain) {
|
||||
|
||||
# These functions could be used by passing them either:
|
||||
# as 'objective' and 'eval_metric' parameters in the params list:
|
||||
param <- list(max_depth = 2, eta = 1, nthread = nthread,
|
||||
objective = logregobj, eval_metric = evalerror)
|
||||
param <- list(
|
||||
max_depth = 2,
|
||||
eta = 1,
|
||||
nthread = nthread,
|
||||
objective = logregobj,
|
||||
eval_metric = evalerror
|
||||
)
|
||||
bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0)
|
||||
|
||||
# or through the ... arguments:
|
||||
param <- list(max_depth = 2, eta = 1, nthread = nthread)
|
||||
bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0,
|
||||
objective = logregobj, eval_metric = evalerror)
|
||||
bst <- xgb.train(
|
||||
param,
|
||||
dtrain,
|
||||
nrounds = 2,
|
||||
evals = evals,
|
||||
verbose = 0,
|
||||
objective = logregobj,
|
||||
eval_metric = evalerror
|
||||
)
|
||||
|
||||
# or as dedicated 'obj' and 'feval' parameters of xgb.train:
|
||||
bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals,
|
||||
obj = logregobj, feval = evalerror)
|
||||
bst <- xgb.train(
|
||||
param, dtrain, nrounds = 2, evals = evals, obj = logregobj, feval = evalerror
|
||||
)
|
||||
|
||||
|
||||
## An xgb.train example of using variable learning rates at each iteration:
|
||||
param <- list(max_depth = 2, eta = 1, nthread = nthread,
|
||||
objective = "binary:logistic", eval_metric = "auc")
|
||||
param <- list(
|
||||
max_depth = 2,
|
||||
eta = 1,
|
||||
nthread = nthread,
|
||||
objective = "binary:logistic",
|
||||
eval_metric = "auc"
|
||||
)
|
||||
my_etas <- list(eta = c(0.5, 0.1))
|
||||
bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0,
|
||||
callbacks = list(xgb.cb.reset.parameters(my_etas)))
|
||||
|
||||
bst <- xgb.train(
|
||||
param,
|
||||
dtrain,
|
||||
nrounds = 2,
|
||||
evals = evals,
|
||||
verbose = 0,
|
||||
callbacks = list(xgb.cb.reset.parameters(my_etas))
|
||||
)
|
||||
|
||||
## Early stopping:
|
||||
bst <- xgb.train(param, dtrain, nrounds = 25, evals = evals,
|
||||
early_stopping_rounds = 3)
|
||||
bst <- xgb.train(
|
||||
param, dtrain, nrounds = 25, evals = evals, early_stopping_rounds = 3
|
||||
)
|
||||
|
||||
## An 'xgboost' interface example:
|
||||
bst <- xgboost(x = agaricus.train$data, y = factor(agaricus.train$label),
|
||||
params = list(max_depth = 2, eta = 1), nthread = nthread, nrounds = 2)
|
||||
bst <- xgboost(
|
||||
x = agaricus.train$data,
|
||||
y = factor(agaricus.train$label),
|
||||
params = list(max_depth = 2, eta = 1),
|
||||
nthread = nthread,
|
||||
nrounds = 2
|
||||
)
|
||||
pred <- predict(bst, agaricus.test$data)
|
||||
|
||||
}
|
||||
@ -365,7 +388,5 @@ Tianqi Chen and Carlos Guestrin, "XGBoost: A Scalable Tree Boosting System",
|
||||
22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016, \url{https://arxiv.org/abs/1603.02754}
|
||||
}
|
||||
\seealso{
|
||||
\code{\link{xgb.Callback}},
|
||||
\code{\link{predict.xgb.Booster}},
|
||||
\code{\link{xgb.cv}}
|
||||
\code{\link[=xgb.Callback]{xgb.Callback()}}, \code{\link[=predict.xgb.Booster]{predict.xgb.Booster()}}, \code{\link[=xgb.cv]{xgb.cv()}}
|
||||
}
|
||||
|
||||
@ -14,21 +14,21 @@ xgb.get.config()
|
||||
\item{...}{List of parameters to be set, as keyword arguments}
|
||||
}
|
||||
\value{
|
||||
\code{xgb.set.config} returns \code{TRUE} to signal success. \code{xgb.get.config} returns
|
||||
\code{xgb.set.config()} returns \code{TRUE} to signal success. \code{xgb.get.config()} returns
|
||||
a list containing all global-scope parameters and their values.
|
||||
}
|
||||
\description{
|
||||
Global configuration consists of a collection of parameters that can be applied in the global
|
||||
scope. See \url{https://xgboost.readthedocs.io/en/stable/parameter.html} for the full list of
|
||||
parameters supported in the global configuration. Use \code{xgb.set.config} to update the
|
||||
values of one or more global-scope parameters. Use \code{xgb.get.config} to fetch the current
|
||||
parameters supported in the global configuration. Use \code{xgb.set.config()} to update the
|
||||
values of one or more global-scope parameters. Use \code{xgb.get.config()} to fetch the current
|
||||
values of all global-scope parameters (listed in
|
||||
\url{https://xgboost.readthedocs.io/en/stable/parameter.html}).
|
||||
}
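
A minimal sketch of a get / set / restore round trip, assuming `verbosity` is among the global-scope parameters returned by `xgb.get.config()`:

```r
library(xgboost)

# Remember the current global configuration so it can be restored later.
old_config <- xgb.get.config()

# Update one global-scope parameter, passed as a keyword argument.
ok <- xgb.set.config(verbosity = 0)

# Read the value back from the global configuration.
new_verbosity <- xgb.get.config()$verbosity
print(new_verbosity)

# Restore the previous value.
xgb.set.config(verbosity = old_config$verbosity)
```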
|
||||
\details{
|
||||
Note that serialization-related functions might use a globally-configured number of threads,
|
||||
which is managed by the system's OpenMP (OMP) configuration instead. Typically, XGBoost methods
|
||||
accept an \code{nthreads} parameter, but some methods like \code{readRDS} might get executed before such
|
||||
accept an \code{nthreads} parameter, but some methods like \code{\link[=readRDS]{readRDS()}} might get executed before such
|
||||
parameter can be supplied.
|
||||
|
||||
The number of OMP threads can in turn be configured for example through an environment variable
|
||||
|
||||
@ -21,65 +21,69 @@ xgboost(
|
||||
)
|
||||
}
|
||||
\arguments{
|
||||
\item{x}{The features / covariates. Can be passed as:\itemize{
|
||||
\item A numeric or integer `matrix`.
|
||||
\item A `data.frame`, in which all columns are one of the following types:\itemize{
|
||||
\item `numeric`
|
||||
\item `integer`
|
||||
\item `logical`
|
||||
\item `factor`
|
||||
\item{x}{The features / covariates. Can be passed as:
|
||||
\itemize{
|
||||
\item A numeric or integer \code{matrix}.
|
||||
\item A \code{data.frame}, in which all columns are one of the following types:
|
||||
\itemize{
|
||||
\item \code{numeric}
|
||||
\item \code{integer}
|
||||
\item \code{logical}
|
||||
\item \code{factor}
|
||||
}
|
||||
|
||||
Columns of `factor` type will be assumed to be categorical, while other column types will
|
||||
Columns of \code{factor} type will be assumed to be categorical, while other column types will
|
||||
be assumed to be numeric.
|
||||
\item A sparse matrix from the `Matrix` package, either as `dgCMatrix` or `dgRMatrix` class.
|
||||
\item A sparse matrix from the \code{Matrix} package, either as \code{dgCMatrix} or \code{dgRMatrix} class.
|
||||
}
|
||||
|
||||
Note that categorical features are only supported for `data.frame` inputs, and are automatically
|
||||
determined based on their types. See \link{xgb.train} with \link{xgb.DMatrix} for more flexible
|
||||
Note that categorical features are only supported for \code{data.frame} inputs, and are automatically
|
||||
determined based on their types. See \code{\link[=xgb.train]{xgb.train()}} with \code{\link[=xgb.DMatrix]{xgb.DMatrix()}} for more flexible
|
||||
variants that would allow something like categorical features on sparse matrices.}
|
||||
|
||||
\item{y}{The response variable. Allowed values are:\itemize{
|
||||
\item{y}{The response variable. Allowed values are:
|
||||
\itemize{
|
||||
\item A numeric or integer vector (for regression tasks).
|
||||
\item A factor or character vector (for binary and multi-class classification tasks).
|
||||
\item A logical (boolean) vector (for binary classification tasks).
|
||||
\item A numeric or integer matrix or `data.frame` with numeric/integer columns
|
||||
\item A numeric or integer matrix or \code{data.frame} with numeric/integer columns
|
||||
(for multi-task regression tasks).
|
||||
\item A `Surv` object from the `survival` package (for survival tasks).
|
||||
\item A \code{Surv} object from the 'survival' package (for survival tasks).
|
||||
}
|
||||
|
||||
If `objective` is `NULL`, the right task will be determined automatically based on
|
||||
the class of `y`.
|
||||
If \code{objective} is \code{NULL}, the right task will be determined automatically based on
|
||||
the class of \code{y}.
|
||||
|
||||
If `objective` is not `NULL`, it must match with the type of `y` - e.g. `factor` types of `y`
|
||||
If \code{objective} is not \code{NULL}, it must match with the type of \code{y} - e.g. \code{factor} types of \code{y}
|
||||
can only be used with classification objectives and vice-versa.
|
||||
|
||||
For binary classification, the last factor level of `y` will be used as the "positive"
|
||||
class - that is, the numbers from `predict` will reflect the probabilities of belonging to this
|
||||
class instead of to the first factor level. If `y` is a `logical` vector, then `TRUE` will be
|
||||
For binary classification, the last factor level of \code{y} will be used as the "positive"
|
||||
class - that is, the numbers from \code{predict} will reflect the probabilities of belonging to this
|
||||
class instead of to the first factor level. If \code{y} is a \code{logical} vector, then \code{TRUE} will be
|
||||
set as the last level.}
|
||||
|
||||
\item{objective}{Optimization objective to minimize based on the supplied data, to be passed
|
||||
by name as a string / character (e.g. `reg:absoluteerror`). See the
|
||||
\href{https://xgboost.readthedocs.io/en/stable/parameter.html#learning-task-parameters}{
|
||||
Learning Task Parameters} page for more detailed information on allowed values.
|
||||
by name as a string / character (e.g. \code{reg:absoluteerror}). See the
|
||||
\href{https://xgboost.readthedocs.io/en/stable/parameter.html#learning-task-parameters}{Learning Task Parameters}
|
||||
page for more detailed information on allowed values.
|
||||
|
||||
If `NULL` (the default), will be automatically determined from `y` according to the following
|
||||
logic:\itemize{
|
||||
\item If `y` is a factor with 2 levels, will use `binary:logistic`.
|
||||
\item If `y` is a factor with more than 2 levels, will use `multi:softprob` (number of classes
|
||||
will be determined automatically, should not be passed under `params`).
|
||||
\item If `y` is a `Surv` object from the `survival` package, will use `survival:aft` (note that
|
||||
If \code{NULL} (the default), will be automatically determined from \code{y} according to the following
|
||||
logic:
|
||||
\itemize{
|
||||
\item If \code{y} is a factor with 2 levels, will use \code{binary:logistic}.
|
||||
\item If \code{y} is a factor with more than 2 levels, will use \code{multi:softprob} (number of classes
|
||||
will be determined automatically, should not be passed under \code{params}).
|
||||
\item If \code{y} is a \code{Surv} object from the \code{survival} package, will use \code{survival:aft} (note that
|
||||
the only types supported are left / right / interval censored).
|
||||
\item Otherwise, will use `reg:squarederror`.
|
||||
\item Otherwise, will use \code{reg:squarederror}.
|
||||
}
|
||||
|
||||
If `objective` is not `NULL`, it must match with the type of `y` - e.g. `factor` types of `y`
|
||||
If \code{objective} is not \code{NULL}, it must match with the type of \code{y} - e.g. \code{factor} types of \code{y}
|
||||
can only be used with classification objectives and vice-versa.
|
||||
|
||||
Note that not all possible `objective` values supported by the core XGBoost library are allowed
|
||||
Note that not all possible \code{objective} values supported by the core XGBoost library are allowed
|
||||
here - for example, objectives which are a variation of another but with a different default
|
||||
prediction type (e.g. `multi:softmax` vs. `multi:softprob`) are not allowed, and neither are
|
||||
prediction type (e.g. \code{multi:softmax} vs. \code{multi:softprob}) are not allowed, and neither are
|
||||
ranking objectives, nor custom objectives at the moment.}
|
||||
|
||||
\item{nrounds}{Number of boosting iterations / rounds.
|
||||
@ -87,56 +91,54 @@ ranking objectives, nor custom objectives at the moment.}
|
||||
Note that the number of default boosting rounds here is not automatically tuned, and different
|
||||
problems will have vastly different optimal numbers of boosting rounds.}
|
||||
|
||||
\item{weights}{Sample weights for each row in `x` and `y`. If `NULL` (the default), each row
|
||||
\item{weights}{Sample weights for each row in \code{x} and \code{y}. If \code{NULL} (the default), each row
|
||||
will have the same weight.
|
||||
|
||||
If not `NULL`, should be passed as a numeric vector with length matching to the number of
|
||||
rows in `x`.}
|
||||
If not \code{NULL}, should be passed as a numeric vector with length matching to the number of rows in \code{x}.}
|
||||
|
||||
\item{verbosity}{Verbosity of printing messages. Valid values of 0 (silent), 1 (warning),
|
||||
2 (info), and 3 (debug).}
|
||||
|
||||
\item{nthreads}{Number of parallel threads to use. If passing zero, will use all CPU threads.}
|
||||
|
||||
\item{seed}{Seed to use for random number generation. If passing `NULL`, will draw a random
|
||||
\item{seed}{Seed to use for random number generation. If passing \code{NULL}, will draw a random
|
||||
number using R's PRNG system to use as seed.}
|
||||
|
||||
\item{monotone_constraints}{Optional monotonicity constraints for features.
|
||||
|
||||
Can be passed either as a named list (when `x` has column names), or as a vector. If passed
|
||||
as a vector and `x` has column names, will try to match the elements by name.
|
||||
Can be passed either as a named list (when \code{x} has column names), or as a vector. If passed
|
||||
as a vector and \code{x} has column names, will try to match the elements by name.
|
||||
|
||||
A value of `+1` for a given feature makes the model predictions / scores constrained to be
|
||||
A value of \code{+1} for a given feature makes the model predictions / scores constrained to be
|
||||
a monotonically increasing function of that feature (that is, as the value of the feature
|
||||
increases, the model prediction cannot decrease), while a value of `-1` makes it a monotonically
|
||||
increases, the model prediction cannot decrease), while a value of \code{-1} makes it a monotonically
|
||||
decreasing function. A value of zero imposes no constraint.
|
||||
|
||||
The input for `monotone_constraints` can be a subset of the columns of `x` if named, in which
|
||||
case the columns that are not referred to in `monotone_constraints` will be assumed to have
|
||||
The input for \code{monotone_constraints} can be a subset of the columns of \code{x} if named, in which
|
||||
case the columns that are not referred to in \code{monotone_constraints} will be assumed to have
|
||||
a value of zero (no constraint imposed on the model for those features).
|
||||
|
||||
See the tutorial \href{https://xgboost.readthedocs.io/en/stable/tutorials/monotonic.html}{
|
||||
Monotonic Constraints} for a more detailed explanation.}
|
||||
See the tutorial \href{https://xgboost.readthedocs.io/en/stable/tutorials/monotonic.html}{Monotonic Constraints}
|
||||
for a more detailed explanation.}
|
||||
|
||||
\item{interaction_constraints}{Constraints for interaction representing permitted interactions.
|
||||
The constraints must be specified in the form of a list of vectors referencing columns in the
|
||||
data, e.g. `list(c(1, 2), c(3, 4, 5))` (with these numbers being column indices, numeration
|
||||
data, e.g. \code{list(c(1, 2), c(3, 4, 5))} (with these numbers being column indices, numeration
|
||||
starting at 1 - i.e. the first sublist references the first and second columns) or
|
||||
`list(c("Sepal.Length", "Sepal.Width"), c("Petal.Length", "Petal.Width"))` (references
|
||||
\code{list(c("Sepal.Length", "Sepal.Width"), c("Petal.Length", "Petal.Width"))} (references
|
||||
columns by names), where each vector is a group of indices of features that are allowed to
|
||||
interact with each other.
|
||||
|
||||
See the tutorial
|
||||
\href{https://xgboost.readthedocs.io/en/stable/tutorials/feature_interaction_constraint.html}{
|
||||
Feature Interaction Constraints} for more information.}
|
||||
See the tutorial \href{https://xgboost.readthedocs.io/en/stable/tutorials/feature_interaction_constraint.html}{Feature Interaction Constraints}
|
||||
for more information.}
|
||||
|
||||
\item{feature_weights}{Feature weights for column sampling.
|
||||
|
||||
Can be passed either as a vector with length matching to columns of `x`, or as a named
|
||||
list (only if `x` has column names) with names matching to columns of 'x'. If it is a
|
||||
named vector, will try to match the entries to column names of `x` by name.
|
||||
Can be passed either as a vector with length matching to columns of \code{x}, or as a named
|
||||
list (only if \code{x} has column names) with names matching to columns of 'x'. If it is a
|
||||
named vector, will try to match the entries to column names of \code{x} by name.
|
||||
|
||||
If `NULL` (the default), all columns will have the same weight.}
|
||||
If \code{NULL} (the default), all columns will have the same weight.}
|
||||
|
||||
\item{base_margin}{Base margin used for boosting from existing model.
|
||||
|
||||
@ -145,53 +147,53 @@ here - for example, one can pass the raw scores from a previous model, or some p
|
||||
offset, or similar.
|
||||
|
||||
Should be either a numeric vector or numeric matrix (for multi-class and multi-target objectives)
|
||||
with the same number of rows as `x` and number of columns corresponding to number of optimization
|
||||
targets, and should be in the untransformed scale (for example, for objective `binary:logistic`,
|
||||
it should have log-odds, not probabilities; and for objective `multi:softprob`, should have
|
||||
with the same number of rows as \code{x} and number of columns corresponding to number of optimization
|
||||
targets, and should be in the untransformed scale (for example, for objective \code{binary:logistic},
|
||||
it should have log-odds, not probabilities; and for objective \code{multi:softprob}, should have
|
||||
number of columns matching to number of classes in the data).
|
||||
|
||||
Note that, if it contains more than one column, then columns will not be matched by name to
|
||||
the corresponding `y` - `base_margin` should have the same column order that the model will use
|
||||
(for example, for objective `multi:softprob`, columns of `base_margin` will be matched against
|
||||
`levels(y)` by their position, regardless of what `colnames(base_margin)` returns).
|
||||
the corresponding \code{y} - \code{base_margin} should have the same column order that the model will use
|
||||
(for example, for objective \code{multi:softprob}, columns of \code{base_margin} will be matched against
|
||||
\code{levels(y)} by their position, regardless of what \code{colnames(base_margin)} returns).
|
||||
|
||||
If `NULL`, will start from zero, but note that for most objectives, an intercept is usually
|
||||
added (controllable through parameter `base_score` instead) when `base_margin` is not passed.}
|
||||
If \code{NULL}, will start from zero, but note that for most objectives, an intercept is usually
|
||||
added (controllable through parameter \code{base_score} instead) when \code{base_margin} is not passed.}
|
||||
|
||||
\item{...}{Other training parameters. See the online documentation
|
||||
\href{https://xgboost.readthedocs.io/en/stable/parameter.html}{XGBoost Parameters} for
|
||||
details about possible values and what they do.
|
||||
|
||||
Note that not all possible values from the core XGBoost library are allowed as `params` for
|
||||
Note that not all possible values from the core XGBoost library are allowed as \code{params} for
|
||||
'xgboost()' - in particular, values which require an already-fitted booster object (such as
|
||||
`process_type`) are not accepted here.}
|
||||
\code{process_type}) are not accepted here.}
|
||||
}
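
To make the `monotone_constraints` argument more concrete, here is a small sketch on `mtcars`. It constrains predictions to be non-increasing in `wt` through a named list and leaves `hp` unconstrained. The column choice and the prediction grid are illustrative only, and `max_depth` is assumed to be accepted as an additional training parameter.

```r
library(xgboost)
data(mtcars)

x <- mtcars[, c("wt", "hp")]
y <- mtcars$mpg

model <- xgboost(
  x, y,
  nrounds = 5,
  nthreads = 1,
  max_depth = 2,
  # Named list covering a subset of columns: 'hp' is left out, so it is
  # treated as having a value of zero (no constraint).
  monotone_constraints = list(wt = -1)
)

# Predictions along increasing 'wt', holding 'hp' fixed.
grid <- data.frame(wt = seq(1.5, 5.5, by = 0.5), hp = 150)
pred <- predict(model, grid)
print(pred)
```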
|
||||
\value{
|
||||
A model object, inheriting from both `xgboost` and `xgb.Booster`. Compared to the regular
|
||||
`xgb.Booster` model class produced by \link{xgb.train}, this `xgboost` class will have an
|
||||
additional attribute `metadata` containing information which is used for formatting prediction
|
||||
A model object, inheriting from both \code{xgboost} and \code{xgb.Booster}. Compared to the regular
|
||||
\code{xgb.Booster} model class produced by \code{\link[=xgb.train]{xgb.train()}}, this \code{xgboost} class will have an
|
||||
|
||||
additional attribute \code{metadata} containing information which is used for formatting prediction
|
||||
outputs, such as class names for classification problems.
|
||||
}
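
The handling of factor responses described under `y` and `objective` could be sketched like this. The factor labels are made up for illustration, and `max_depth` is assumed to be accepted as an additional training parameter.

```r
library(xgboost)
data(mtcars)

# Two-level factor response: with 'objective' left as NULL, this should
# select a binary classification objective automatically.
y <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))
x <- mtcars[, setdiff(names(mtcars), "am")]

model <- xgboost(x, y, nrounds = 5, nthreads = 1, max_depth = 2)

# The last factor level is the one treated as the "positive" class.
positive_class <- levels(y)[nlevels(y)]
print(positive_class)

# Predicted values refer to membership in that last level.
prob_positive <- predict(model, x)
print(head(prob_positive))
```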
|
||||
\description{
|
||||
Fits an XGBoost model (boosted decision tree ensemble) to given x/y data.
|
||||
|
||||
See the tutorial \href{https://xgboost.readthedocs.io/en/stable/tutorials/model.html}{
|
||||
Introduction to Boosted Trees} for a longer explanation of what XGBoost does.
|
||||
See the tutorial \href{https://xgboost.readthedocs.io/en/stable/tutorials/model.html}{Introduction to Boosted Trees}
|
||||
for a longer explanation of what XGBoost does.
|
||||
|
||||
This function is intended to provide a more user-friendly interface for XGBoost that follows
|
||||
R's conventions for model fitting and predictions, but which doesn't expose all of the
|
||||
possible functionalities of the core XGBoost library.
|
||||
|
||||
See \link{xgb.train} for a more flexible low-level alternative which is similar across different
|
||||
See \code{\link[=xgb.train]{xgb.train()}} for a more flexible low-level alternative which is similar across different
|
||||
language bindings of XGBoost and which exposes the full library's functionalities.
|
||||
}
|
||||
\details{
|
||||
For package authors using `xgboost` as a dependency, it is highly recommended to use
|
||||
\link{xgb.train} in package code instead of `xgboost()`, since it has a more stable interface
|
||||
For package authors using 'xgboost' as a dependency, it is highly recommended to use
|
||||
\code{\link[=xgb.train]{xgb.train()}} in package code instead of \code{\link[=xgboost]{xgboost()}}, since it has a more stable interface
|
||||
and performs fewer data conversions and copies along the way.
|
||||
}
|
||||
\examples{
|
||||
library(xgboost)
|
||||
data(mtcars)
|
||||
|
||||
# Fit a small regression model on the mtcars data
|
||||
|
||||