[R] Finalizes switch to markdown doc (#10733)
--------- Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
parent 479ae8081b
commit 074cad2343
@ -56,7 +56,7 @@
|
|||||||
#' It should match with argument `nrounds` passed to [xgb.train()] or [xgb.cv()].
|
#' It should match with argument `nrounds` passed to [xgb.train()] or [xgb.cv()].
|
||||||
#'
|
#'
|
||||||
#' Note that boosting might be interrupted before reaching this last iteration, for
|
#' Note that boosting might be interrupted before reaching this last iteration, for
|
||||||
#' example by using the early stopping callback \link{xgb.cb.early.stop}.
|
#' example by using the early stopping callback [xgb.cb.early.stop()].
|
||||||
#' - iteration Index of the iteration number that is being executed (first iteration
|
#' - iteration Index of the iteration number that is being executed (first iteration
|
||||||
#' will be the same as parameter `begin_iteration`, then next one will add +1, and so on).
|
#' will be the same as parameter `begin_iteration`, then next one will add +1, and so on).
|
||||||
#'
|
#'
|
||||||
|
|||||||
@ -222,12 +222,11 @@ xgb.get.handle <- function(object) {
|
|||||||
#' For multi-class and multi-target, will be a 4D array with dimensions `[nrows, ngroups, nfeats+1, nfeats+1]`
|
#' For multi-class and multi-target, will be a 4D array with dimensions `[nrows, ngroups, nfeats+1, nfeats+1]`
|
||||||
#' }
|
#' }
|
||||||
#'
|
#'
|
||||||
#' If passing `strict_shape=FALSE`, the result is always an array:\itemize{
|
#' If passing `strict_shape=FALSE`, the result is always an array:
|
||||||
#' \item For normal predictions, the dimension is `[nrows, ngroups]`.
|
#' - For normal predictions, the dimension is `[nrows, ngroups]`.
|
||||||
#' \item For `predcontrib=TRUE`, the dimension is `[nrows, ngroups, nfeats+1]`.
|
#' - For `predcontrib=TRUE`, the dimension is `[nrows, ngroups, nfeats+1]`.
|
||||||
#' \item For `predinteraction=TRUE`, the dimension is `[nrows, ngroups, nfeats+1, nfeats+1]`.
|
#' - For `predinteraction=TRUE`, the dimension is `[nrows, ngroups, nfeats+1, nfeats+1]`.
|
||||||
#' \item For `predleaf=TRUE`, the dimension is `[nrows, niter, ngroups, num_parallel_tree]`.
|
#' - For `predleaf=TRUE`, the dimension is `[nrows, niter, ngroups, num_parallel_tree]`.
|
||||||
#' }
|
|
||||||
#'
|
#'
|
||||||
#' If passing `avoid_transpose=TRUE`, then the dimensions in all cases will be in reverse order - for
|
#' If passing `avoid_transpose=TRUE`, then the dimensions in all cases will be in reverse order - for
|
||||||
#' example, for `predinteraction`, they will be `[nfeats+1, nfeats+1, ngroups, nrows]`
|
#' example, for `predinteraction`, they will be `[nfeats+1, nfeats+1, ngroups, nrows]`
|
||||||
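To make the dimension ordering described above concrete, here is a minimal base-R sketch. It does not call XGBoost: the sizes and the array name `contrib` are made up for illustration, and `aperm()` is used only to show how a `[nrows, ngroups, nfeats+1]` layout relates to the reversed layout that the text describes for `avoid_transpose=TRUE`.

```r
# Illustrative sizes only; nothing here comes from a fitted model.
nrows <- 6L
ngroups <- 3L
nfeats <- 4L

# Stand-in for a `predcontrib=TRUE` result laid out as [nrows, ngroups, nfeats+1].
contrib <- array(
  seq_len(nrows * ngroups * (nfeats + 1L)),
  dim = c(nrows, ngroups, nfeats + 1L)
)

# Reversing the dimension order gives [nfeats+1, ngroups, nrows].
reversed <- aperm(contrib, c(3L, 2L, 1L))
dim(reversed)

# Element [i, j, k] of the original is element [k, j, i] of the reversed array.
stopifnot(identical(reversed[2L, 3L, 1L], contrib[1L, 3L, 2L]))
```

The same index reversal applies to the 4D cases: only the order of the dimensions changes, not the values.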
@ -623,7 +622,7 @@ validate.features <- function(bst, newdata) {
|
|||||||
#' change the value of that parameter for a model.
|
#' change the value of that parameter for a model.
|
||||||
#' Use [xgb.parameters<-()] to set or change model parameters.
|
#' Use [xgb.parameters<-()] to set or change model parameters.
|
||||||
#'
|
#'
|
||||||
#' The [xgb.attributes<-()] setter either updates the existing or adds one or several attributes,
|
#' The `xgb.attributes<-` setter either updates the existing or adds one or several attributes,
|
||||||
#' but it doesn't delete the other existing attributes.
|
#' but it doesn't delete the other existing attributes.
|
||||||
#'
|
#'
|
||||||
#' Important: since this modifies the booster's C object, semantics for assignment here
|
#' Important: since this modifies the booster's C object, semantics for assignment here
|
||||||
@ -635,11 +634,11 @@ validate.features <- function(bst, newdata) {
|
|||||||
#' @param object Object of class `xgb.Booster`. **Will be modified in-place** when assigning to it.
|
#' @param object Object of class `xgb.Booster`. **Will be modified in-place** when assigning to it.
|
||||||
#' @param name A non-empty character string specifying which attribute is to be accessed.
|
#' @param name A non-empty character string specifying which attribute is to be accessed.
|
||||||
#' @param value For `xgb.attr<-`, a value of an attribute; for `xgb.attributes<-`,
|
#' @param value For `xgb.attr<-`, a value of an attribute; for `xgb.attributes<-`,
|
||||||
#' it is a list (or an object coercible to a list) with the names of attributes to set
|
#' it is a list (or an object coercible to a list) with the names of attributes to set
|
||||||
#' and the elements corresponding to attribute values.
|
#' and the elements corresponding to attribute values.
|
||||||
#' Non-character values are converted to character.
|
#' Non-character values are converted to character.
|
||||||
#' When an attribute value is not a scalar, only the first index is used.
|
#' When an attribute value is not a scalar, only the first index is used.
|
||||||
#' Use `NULL` to remove an attribute.
|
#' Use `NULL` to remove an attribute.
|
||||||
#' @return
|
#' @return
|
||||||
#' - `xgb.attr()` returns either a string value of an attribute
|
#' - `xgb.attr()` returns either a string value of an attribute
|
||||||
#' or `NULL` if an attribute wasn't stored in a model.
|
#' or `NULL` if an attribute wasn't stored in a model.
|
||||||
|
|||||||
@ -1,9 +1,9 @@
|
|||||||
#' Construct xgb.DMatrix object
|
#' Construct xgb.DMatrix object
|
||||||
#'
|
#'
|
||||||
#' Construct an 'xgb.DMatrix' object from a given data source, which can then be passed to functions
|
#' Construct an 'xgb.DMatrix' object from a given data source, which can then be passed to functions
|
||||||
#' such as \link{xgb.train} or \link{predict.xgb.Booster}.
|
#' such as [xgb.train()] or [predict()].
|
||||||
#'
|
#'
|
||||||
#' Function 'xgb.QuantileDMatrix' will construct a DMatrix with quantization for the histogram
|
#' Function `xgb.QuantileDMatrix()` will construct a DMatrix with quantization for the histogram
|
||||||
#' method already applied to it, which can be used to reduce memory usage (compared to using a
|
#' method already applied to it, which can be used to reduce memory usage (compared to using a
|
||||||
#' regular DMatrix first and then creating a quantization out of it) when using the histogram
|
#' regular DMatrix first and then creating a quantization out of it) when using the histogram
|
||||||
#' method (`tree_method = "hist"`, which is the default algorithm), but is not usable for the
|
#' method (`tree_method = "hist"`, which is the default algorithm), but is not usable for the
|
||||||
@ -24,20 +24,20 @@
|
|||||||
#'
|
#'
|
||||||
#' Other column types are not supported.
|
#' Other column types are not supported.
|
||||||
#' \item CSR matrices, as class `dgRMatrix` from package `Matrix`.
|
#' \item CSR matrices, as class `dgRMatrix` from package `Matrix`.
|
||||||
#' \item CSC matrices, as class `dgCMatrix` from package `Matrix`. These are \bold{not} supported for
|
#' \item CSC matrices, as class `dgCMatrix` from package `Matrix`. These are **not** supported for
|
||||||
#' 'xgb.QuantileDMatrix'.
|
#' 'xgb.QuantileDMatrix'.
|
||||||
#' \item Single-row CSR matrices, as class `dsparseVector` from package `Matrix`, which is interpreted
|
#' \item Single-row CSR matrices, as class `dsparseVector` from package `Matrix`, which is interpreted
|
||||||
#' as a single row (only when making predictions from a fitted model).
|
#' as a single row (only when making predictions from a fitted model).
|
||||||
#' \item Text files in a supported format, passed as a `character` variable containing the URI path to
|
#' \item Text files in a supported format, passed as a `character` variable containing the URI path to
|
||||||
#' the file, with an optional format specifier.
|
#' the file, with an optional format specifier.
|
||||||
#'
|
#'
|
||||||
#' These are \bold{not} supported for `xgb.QuantileDMatrix`. Supported formats are:\itemize{
|
#' These are **not** supported for `xgb.QuantileDMatrix`. Supported formats are:\itemize{
|
||||||
#' \item XGBoost's own binary format for DMatrices, as produced by \link{xgb.DMatrix.save}.
|
#' \item XGBoost's own binary format for DMatrices, as produced by [xgb.DMatrix.save()].
|
||||||
#' \item SVMLight (a.k.a. LibSVM) format for CSR matrices. This format can be signaled by suffix
|
#' \item SVMLight (a.k.a. LibSVM) format for CSR matrices. This format can be signaled by suffix
|
||||||
#' `?format=libsvm` at the end of the file path. It will be the default format if not
|
#' `?format=libsvm` at the end of the file path. It will be the default format if not
|
||||||
#' otherwise specified.
|
#' otherwise specified.
|
||||||
#' \item CSV files (comma-separated values). This format can be specified by adding suffix
|
#' \item CSV files (comma-separated values). This format can be specified by adding suffix
|
||||||
#' `?format=csv` at the end of the file path. It will \bold{not} be auto-deduced from file extensions.
|
#' `?format=csv` at the end of the file path. It will **not** be auto-deduced from file extensions.
|
||||||
#' }
|
#' }
|
||||||
#'
|
#'
|
||||||
#' Be aware that the format of the file will not be auto-deduced - for example, if a file is named 'file.csv',
|
#' Be aware that the format of the file will not be auto-deduced - for example, if a file is named 'file.csv',
|
||||||
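As a rough illustration of the format suffixes described in this hunk, the base-R sketch below only builds the URI strings. The file names are hypothetical, and the final call is left as a comment because it assumes the xgboost package is installed.

```r
# Hypothetical file locations, used only to show where the suffix goes.
csv_path <- file.path(tempdir(), "file.csv")
libsvm_path <- file.path(tempdir(), "file.txt")

# The format is signalled by a suffix on the path, not by the file extension.
csv_uri <- paste0(csv_path, "?format=csv")
libsvm_uri <- paste0(libsvm_path, "?format=libsvm")

# With xgboost installed, the URI would then be passed as `data`, e.g.:
# dm <- xgboost::xgb.DMatrix(csv_uri)
```

Per the documentation text, a file named `file.csv` passed without the `?format=csv` suffix would not be read as CSV.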
@ -54,44 +54,41 @@
|
|||||||
#' integers with numeration starting at zero.
|
#' integers with numeration starting at zero.
|
||||||
#' @param weight Weight for each instance.
|
#' @param weight Weight for each instance.
|
||||||
#'
|
#'
|
||||||
#' Note that, for ranking task, weights are per-group. In ranking task, one weight
|
#' Note that, for ranking task, weights are per-group. In ranking task, one weight
|
||||||
#' is assigned to each group (not each data point). This is because we
|
#' is assigned to each group (not each data point). This is because we
|
||||||
#' only care about the relative ordering of data points within each group,
|
#' only care about the relative ordering of data points within each group,
|
||||||
#' so it doesn't make sense to assign weights to individual data points.
|
#' so it doesn't make sense to assign weights to individual data points.
|
||||||
#' @param base_margin Base margin used for boosting from existing model.
|
#' @param base_margin Base margin used for boosting from existing model.
|
||||||
#'
|
#'
|
||||||
#' In the case of multi-output models, one can also pass multi-dimensional base_margin.
|
#' In the case of multi-output models, one can also pass multi-dimensional base_margin.
|
||||||
#' @param missing A float value to represent missing values in data (not used when creating DMatrix
|
#' @param missing A float value to represent missing values in data (not used when creating DMatrix
|
||||||
#' from text files).
|
#' from text files). It is useful to change when a zero, infinite, or some other
|
||||||
#' It is useful to change when a zero, infinite, or some other extreme value represents missing
|
#' extreme value represents missing values in data.
|
||||||
#' values in data.
|
|
||||||
#' @param silent whether to suppress printing an informational message after loading from a file.
|
#' @param silent whether to suppress printing an informational message after loading from a file.
|
||||||
#' @param feature_names Set names for features. Overrides column names in data
|
#' @param feature_names Set names for features. Overrides column names in data frame and matrix.
|
||||||
#' frame and matrix.
|
|
||||||
#'
|
#'
|
||||||
#' Note: columns are not referenced by name when calling `predict`, so the column order there
|
#' Note: columns are not referenced by name when calling `predict`, so the column order there
|
||||||
#' must be the same as in the DMatrix construction, regardless of the column names.
|
#' must be the same as in the DMatrix construction, regardless of the column names.
|
||||||
#' @param feature_types Set types for features.
|
#' @param feature_types Set types for features.
|
||||||
#'
|
#'
|
||||||
#' If `data` is a `data.frame` and `feature_types` is not supplied, feature types will be deduced
|
#' If `data` is a `data.frame` and `feature_types` is not supplied,
|
||||||
#' automatically from the column types.
|
#' feature types will be deduced automatically from the column types.
|
||||||
#'
|
#'
|
||||||
#' Otherwise, one can pass a character vector with the same length as number of columns in `data`,
|
#' Otherwise, one can pass a character vector with the same length as number of columns in `data`,
|
||||||
#' with the following possible values:\itemize{
|
#' with the following possible values:
|
||||||
#' \item "c", which represents categorical columns.
|
#' - "c", which represents categorical columns.
|
||||||
#' \item "q", which represents numeric columns.
|
#' - "q", which represents numeric columns.
|
||||||
#' \item "int", which represents integer columns.
|
#' - "int", which represents integer columns.
|
||||||
#' \item "i", which represents logical (boolean) columns.
|
#' - "i", which represents logical (boolean) columns.
|
||||||
#' }
|
|
||||||
#'
|
#'
|
||||||
#' Note that, while categorical types are treated differently from the rest for model fitting
|
#' Note that, while categorical types are treated differently from the rest for model fitting
|
||||||
#' purposes, the other types do not influence the generated model, but have effects in other
|
#' purposes, the other types do not influence the generated model, but have effects in other
|
||||||
#' functionalities such as feature importances.
|
#' functionalities such as feature importances.
|
||||||
#'
|
#'
|
||||||
#' \bold{Important}: categorical features, if specified manually through `feature_types`, must
|
#' **Important**: Categorical features, if specified manually through `feature_types`, must
|
||||||
#' be encoded as integers with numeration starting at zero, and the same encoding needs to be
|
#' be encoded as integers with numeration starting at zero, and the same encoding needs to be
|
||||||
#' applied when passing data to `predict`. Even if passing `factor` types, the encoding will
|
#' applied when passing data to [predict()]. Even if passing `factor` types, the encoding will
|
||||||
#' not be saved, so make sure that `factor` columns passed to `predict` have the same `levels`.
|
#' not be saved, so make sure that `factor` columns passed to `predict` have the same `levels`.
|
||||||
#' @param nthread Number of threads used for creating DMatrix.
|
#' @param nthread Number of threads used for creating DMatrix.
|
||||||
#' @param group Group sizes for all ranking groups.
|
#' @param group Group sizes for all ranking groups.
|
||||||
#' @param qid Query ID for data samples, used for ranking.
|
#' @param qid Query ID for data samples, used for ranking.
|
||||||
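The zero-based encoding requirement for manually specified categorical features can be sketched in base R as follows. The level names and variables are invented for illustration. The point is that `factor` columns at training and prediction time share the same `levels`, so the integer codes agree.

```r
# Hypothetical category levels, fixed once and reused everywhere.
train_levels <- c("low", "mid", "high")

train_col <- factor(c("mid", "low", "high", "mid"), levels = train_levels)
new_col <- factor(c("high", "low"), levels = train_levels)

# R factor codes start at 1; subtract 1 to get numeration starting at zero.
train_codes <- as.integer(train_col) - 1L  # 1 0 2 1
new_codes <- as.integer(new_col) - 1L      # 2 0
```

If `new_col` were built without `levels = train_levels`, its codes would follow alphabetical order of the values present and would no longer match the training encoding.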
@ -99,23 +96,24 @@
|
|||||||
#' @param label_upper_bound Upper bound for survival training.
|
#' @param label_upper_bound Upper bound for survival training.
|
||||||
#' @param feature_weights Set feature weights for column sampling.
|
#' @param feature_weights Set feature weights for column sampling.
|
||||||
#' @param data_split_mode When passing a URI (as R `character`) as input, this signals
|
#' @param data_split_mode When passing a URI (as R `character`) as input, this signals
|
||||||
#' whether to split by row or column. Allowed values are `"row"` and `"col"`.
|
#' whether to split by row or column. Allowed values are `"row"` and `"col"`.
|
||||||
#'
|
#'
|
||||||
#' In distributed mode, the file is split accordingly; otherwise this is only an indicator on
|
#' In distributed mode, the file is split accordingly; otherwise this is only an indicator on
|
||||||
#' how the file was split beforehand. Default to row.
|
#' how the file was split beforehand. Default to row.
|
||||||
#'
|
#'
|
||||||
#' This is not used when `data` is not a URI.
|
#' This is not used when `data` is not a URI.
|
||||||
#' @return An 'xgb.DMatrix' object. If calling 'xgb.QuantileDMatrix', it will have additional
|
#' @return An 'xgb.DMatrix' object. If calling 'xgb.QuantileDMatrix', it will have additional
|
||||||
#' subclass 'xgb.QuantileDMatrix'.
|
#' subclass 'xgb.QuantileDMatrix'.
|
||||||
#'
|
#'
|
||||||
#' @details
|
#' @details
|
||||||
#' Note that DMatrix objects are not serializable through R functions such as \code{saveRDS} or \code{save}.
|
#' Note that DMatrix objects are not serializable through R functions such as [saveRDS()] or [save()].
|
||||||
#' If a DMatrix gets serialized and then de-serialized (for example, when saving data in an R session or caching
|
#' If a DMatrix gets serialized and then de-serialized (for example, when saving data in an R session or caching
|
||||||
#' chunks in an Rmd file), the resulting object will not be usable anymore and will need to be reconstructed
|
#' chunks in an Rmd file), the resulting object will not be usable anymore and will need to be reconstructed
|
||||||
#' from the original source of data.
|
#' from the original source of data.
|
||||||
#'
|
#'
|
||||||
#' @examples
|
#' @examples
|
||||||
#' data(agaricus.train, package='xgboost')
|
#' data(agaricus.train, package = "xgboost")
|
||||||
|
#'
|
||||||
#' ## Keep the number of threads to 1 for examples
|
#' ## Keep the number of threads to 1 for examples
|
||||||
#' nthread <- 1
|
#' nthread <- 1
|
||||||
#' data.table::setDTthreads(nthread)
|
#' data.table::setDTthreads(nthread)
|
||||||
@ -318,13 +316,13 @@ xgb.DMatrix <- function(
|
|||||||
}
|
}
|
||||||
|
|
||||||
#' @param ref The training dataset that provides quantile information, needed when creating
|
#' @param ref The training dataset that provides quantile information, needed when creating
|
||||||
#' validation/test dataset with `xgb.QuantileDMatrix`. Supplying the training DMatrix
|
#' validation/test dataset with [xgb.QuantileDMatrix()]. Supplying the training DMatrix
|
||||||
#' as a reference means that the same quantisation applied to the training data is
|
#' as a reference means that the same quantisation applied to the training data is
|
||||||
#' applied to the validation/test data
|
#' applied to the validation/test data
|
||||||
#' @param max_bin The number of histogram bins, should be consistent with the training parameter
|
#' @param max_bin The number of histogram bins, should be consistent with the training parameter
|
||||||
#' `max_bin`.
|
#' `max_bin`.
|
||||||
#'
|
#'
|
||||||
#' This is only supported when constructing a QuantileDMatrix.
|
#' This is only supported when constructing a QuantileDMatrix.
|
||||||
#' @export
|
#' @export
|
||||||
#' @rdname xgb.DMatrix
|
#' @rdname xgb.DMatrix
|
||||||
xgb.QuantileDMatrix <- function(
|
xgb.QuantileDMatrix <- function(
|
||||||
@ -411,40 +409,42 @@ xgb.QuantileDMatrix <- function(
|
|||||||
return(dmat)
|
return(dmat)
|
||||||
}
|
}
|
||||||
|
|
||||||
#' @title XGBoost Data Iterator
|
#' XGBoost Data Iterator
|
||||||
#' @description Interface to create a custom data iterator in order to construct a DMatrix
|
#'
|
||||||
|
#' @description
|
||||||
|
#' Interface to create a custom data iterator in order to construct a DMatrix
|
||||||
#' from external memory.
|
#' from external memory.
|
||||||
#'
|
#'
|
||||||
#' This function is responsible for generating an R object structure containing callback
|
#' This function is responsible for generating an R object structure containing callback
|
||||||
#' functions and an environment shared with them.
|
#' functions and an environment shared with them.
|
||||||
#'
|
#'
|
||||||
#' The output structure from this function is then meant to be passed to \link{xgb.ExternalDMatrix},
|
#' The output structure from this function is then meant to be passed to [xgb.ExternalDMatrix()],
|
||||||
#' which will consume the data and create a DMatrix from it by executing the callback functions.
|
#' which will consume the data and create a DMatrix from it by executing the callback functions.
|
||||||
#'
|
#'
|
||||||
#' For more information, and for a usage example, see the documentation for \link{xgb.ExternalDMatrix}.
|
#' For more information, and for a usage example, see the documentation for [xgb.ExternalDMatrix()].
|
||||||
|
#'
|
||||||
#' @param env An R environment to pass to the callback functions supplied here, which can be
|
#' @param env An R environment to pass to the callback functions supplied here, which can be
|
||||||
#' used to keep track of variables to determine how to handle the batches.
|
#' used to keep track of variables to determine how to handle the batches.
|
||||||
#'
|
#'
|
||||||
#' For example, one might want to keep track of an iteration number in this environment in order
|
#' For example, one might want to keep track of an iteration number in this environment in order
|
||||||
#' to know which part of the data to pass next.
|
#' to know which part of the data to pass next.
|
||||||
#' @param f_next `function(env)` which is responsible for:\itemize{
|
#' @param f_next `function(env)` which is responsible for:
|
||||||
#' \item Accessing or retrieving the next batch of data in the iterator.
|
#' - Accessing or retrieving the next batch of data in the iterator.
|
||||||
#' \item Supplying this data by calling function \link{xgb.DataBatch} on it and returning the result.
|
#' - Supplying this data by calling function [xgb.DataBatch()] on it and returning the result.
|
||||||
#' \item Keeping track of where in the iterator batch it is or will go next, which can for example
|
#' - Keeping track of where in the iterator batch it is or will go next, which can for example
|
||||||
#' be done by modifying variables in the `env` variable that is passed here.
|
#' be done by modifying variables in the `env` variable that is passed here.
|
||||||
#' \item Signaling whether there are more batches to be consumed or not, by returning `NULL`
|
#' - Signaling whether there are more batches to be consumed or not, by returning `NULL`
|
||||||
#' when the stream of data ends (all batches in the iterator have been consumed), or the result from
|
#' when the stream of data ends (all batches in the iterator have been consumed), or the result from
|
||||||
#' calling \link{xgb.DataBatch} when there are more batches in the line to be consumed.
|
#' calling [xgb.DataBatch()] when there are more batches in the line to be consumed.
|
||||||
#' }
|
|
||||||
#' @param f_reset `function(env)` which is responsible for resetting the data iterator
|
#' @param f_reset `function(env)` which is responsible for resetting the data iterator
|
||||||
#' (i.e. taking it back to the first batch, called before and after the sequence of batches
|
#' (i.e. taking it back to the first batch, called before and after the sequence of batches
|
||||||
#' has been consumed).
|
#' has been consumed).
|
||||||
#'
|
#'
|
||||||
#' Note that, after resetting the iterator, the batches will be accessed again, so the same data
|
#' Note that, after resetting the iterator, the batches will be accessed again, so the same data
|
||||||
#' (and in the same order) must be passed in subsequent iterations.
|
#' (and in the same order) must be passed in subsequent iterations.
|
||||||
#' @return An `xgb.DataIter` object, containing the same inputs supplied here, which can then
|
#' @return An `xgb.DataIter` object, containing the same inputs supplied here, which can then
|
||||||
#' be passed to \link{xgb.ExternalDMatrix}.
|
#' be passed to [xgb.ExternalDMatrix()].
|
||||||
#' @seealso \link{xgb.ExternalDMatrix}, \link{xgb.DataBatch}.
|
#' @seealso [xgb.ExternalDMatrix()], [xgb.DataBatch()].
|
||||||
#' @export
|
#' @export
|
||||||
xgb.DataIter <- function(env = new.env(), f_next, f_reset) {
|
xgb.DataIter <- function(env = new.env(), f_next, f_reset) {
|
||||||
if (!is.function(f_next)) {
|
if (!is.function(f_next)) {
|
||||||
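The contract described for `f_next` and `f_reset` can be sketched without XGBoost. In the base-R example below, `make_batch()` is a hypothetical stand-in for `xgb.DataBatch()` (a plain list), and the loop at the end plays the role of the consumer. It is an illustration of the protocol under those assumptions, not the package's implementation.

```r
# Stand-in for xgb.DataBatch(): a plain list, so the sketch runs without xgboost.
make_batch <- function(data, label) list(data = data, label = label)

x <- as.matrix(mtcars[, -1])
y <- mtcars[, 1]

# Two batches of row indices; the environment tracks the current position.
batches <- split(seq_len(nrow(x)), rep(1:2, length.out = nrow(x)))
iterator_env <- as.environment(list(iter = 0L, batches = batches))

f_next <- function(env) {
  # Returning NULL signals that all batches have been consumed.
  if (env$iter >= length(env$batches)) {
    return(NULL)
  }
  env$iter <- env$iter + 1L
  idx <- env$batches[[env$iter]]
  make_batch(x[idx, , drop = FALSE], y[idx])
}

f_reset <- function(env) {
  # Go back to the first batch; the same data must be served again, in order.
  env$iter <- 0L
  invisible(NULL)
}

# Drive the iterator by hand: reset, consume until NULL, reset again.
f_reset(iterator_env)
n_rows_seen <- 0L
while (!is.null(batch <- f_next(iterator_env))) {
  n_rows_seen <- n_rows_seen + nrow(batch$data)
}
f_reset(iterator_env)
```

Keeping the position in `env` rather than in a global variable is what lets the same pair of functions be replayed after each reset.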
@ -508,38 +508,39 @@ xgb.DataIter <- function(env = new.env(), f_next, f_reset) {
|
|||||||
return(out)
|
return(out)
|
||||||
}
|
}
|
||||||
|
|
||||||
#' @title Structure for Data Batches
|
#' Structure for Data Batches
|
||||||
#' @description Helper function to supply data in batches of a data iterator when
|
|
||||||
#' constructing a DMatrix from external memory through \link{xgb.ExternalDMatrix}
|
|
||||||
#' or through \link{xgb.QuantileDMatrix.from_iterator}.
|
|
||||||
#'
|
#'
|
||||||
#' This function is \bold{only} meant to be called inside of a callback function (which
|
#' @description
|
||||||
#' is passed as argument to function \link{xgb.DataIter} to construct a data iterator)
|
#' Helper function to supply data in batches of a data iterator when
|
||||||
|
#' constructing a DMatrix from external memory through [xgb.ExternalDMatrix()]
|
||||||
|
#' or through [xgb.QuantileDMatrix.from_iterator()].
|
||||||
|
#'
|
||||||
|
#' This function is **only** meant to be called inside of a callback function (which
|
||||||
|
#' is passed as argument to function [xgb.DataIter()] to construct a data iterator)
|
||||||
#' when constructing a DMatrix through external memory - otherwise, one should call
|
#' when constructing a DMatrix through external memory - otherwise, one should call
|
||||||
#' \link{xgb.DMatrix} or \link{xgb.QuantileDMatrix}.
|
#' [xgb.DMatrix()] or [xgb.QuantileDMatrix()].
|
||||||
#'
|
#'
|
||||||
#' The object that results from calling this function directly is \bold{not} like
|
#' The object that results from calling this function directly is **not** like
|
||||||
#' an `xgb.DMatrix` - i.e. cannot be used to train a model, nor to get predictions - only
|
#' an `xgb.DMatrix` - i.e. cannot be used to train a model, nor to get predictions - only
|
||||||
#' possible usage is to supply data to an iterator, from which a DMatrix is then constructed.
|
#' possible usage is to supply data to an iterator, from which a DMatrix is then constructed.
|
||||||
#'
|
#'
|
||||||
#' For more information and for example usage, see the documentation for \link{xgb.ExternalDMatrix}.
|
#' For more information and for example usage, see the documentation for [xgb.ExternalDMatrix()].
|
||||||
#' @inheritParams xgb.DMatrix
|
#' @inheritParams xgb.DMatrix
|
||||||
#' @param data Batch of data belonging to this batch.
|
#' @param data Batch of data belonging to this batch.
|
||||||
#'
|
#'
|
||||||
#' Note that not all of the input types supported by \link{xgb.DMatrix} are possible
|
#' Note that not all of the input types supported by [xgb.DMatrix()] are possible
|
||||||
#' to pass here. Supported types are:\itemize{
|
#' to pass here. Supported types are:
|
||||||
#' \item `matrix`, with types `numeric`, `integer`, and `logical`. Note that for types
|
#' - `matrix`, with types `numeric`, `integer`, and `logical`. Note that for types
|
||||||
#' `integer` and `logical`, missing values might not be automatically recognized as
|
#' `integer` and `logical`, missing values might not be automatically recognized as
|
||||||
#' such - see the documentation for parameter `missing` in \link{xgb.ExternalDMatrix}
|
#' such - see the documentation for parameter `missing` in [xgb.ExternalDMatrix()]
|
||||||
#' for details on this.
|
#' for details on this.
|
||||||
#' \item `data.frame`, with the same types as supported by 'xgb.DMatrix' and same
|
#' - `data.frame`, with the same types as supported by 'xgb.DMatrix' and same
|
||||||
#' conversions applied to it. See the documentation for parameter `data` in
|
#' conversions applied to it. See the documentation for parameter `data` in
|
||||||
#' \link{xgb.DMatrix} for details on it.
|
#' [xgb.DMatrix()] for details on it.
|
||||||
#' \item CSR matrices, as class `dgRMatrix` from package `Matrix`.
|
#' - CSR matrices, as class `dgRMatrix` from package "Matrix".
|
||||||
#' }
|
|
||||||
#' @return An object of class `xgb.DataBatch`, which is just a list containing the
|
#' @return An object of class `xgb.DataBatch`, which is just a list containing the
|
||||||
#' data and parameters passed here. It does \bold{not} inherit from `xgb.DMatrix`.
|
#' data and parameters passed here. It does **not** inherit from `xgb.DMatrix`.
|
||||||
#' @seealso \link{xgb.DataIter}, \link{xgb.ExternalDMatrix}.
|
#' @seealso [xgb.DataIter()], [xgb.ExternalDMatrix()].
|
||||||
#' @export
|
#' @export
|
||||||
xgb.DataBatch <- function(
|
xgb.DataBatch <- function(
|
||||||
data,
|
data,
|
||||||
@ -616,42 +617,43 @@ xgb.ProxyDMatrix <- function(proxy_handle, data_iterator) {
|
|||||||
return(1L)
|
return(1L)
|
||||||
}
|
}
|
||||||
|
|
||||||
#' @title DMatrix from External Data
|
#' DMatrix from External Data
|
||||||
#' @description Create a special type of xgboost 'DMatrix' object from external data
|
#'
|
||||||
#' supplied by an \link{xgb.DataIter} object, potentially passed in batches from a
|
#' @description
|
||||||
|
#' Create a special type of XGBoost 'DMatrix' object from external data
|
||||||
|
#' supplied by an [xgb.DataIter()] object, potentially passed in batches from a
|
||||||
#' bigger set that might not fit entirely in memory.
|
#' bigger set that might not fit entirely in memory.
|
||||||
#'
|
#'
|
||||||
#' The data supplied by the iterator is accessed on-demand as needed, multiple times,
|
#' The data supplied by the iterator is accessed on-demand as needed, multiple times,
|
||||||
#' without being concatenated, but note that fields like 'label' \bold{will} be
|
#' without being concatenated, but note that fields like 'label' **will** be
|
||||||
#' concatenated from multiple calls to the data iterator.
|
#' concatenated from multiple calls to the data iterator.
|
||||||
#'
|
#'
|
||||||
#' For more information, see the guide 'Using XGBoost External Memory Version':
|
#' For more information, see the guide 'Using XGBoost External Memory Version':
|
||||||
#' \url{https://xgboost.readthedocs.io/en/stable/tutorials/external_memory.html}
|
#' \url{https://xgboost.readthedocs.io/en/stable/tutorials/external_memory.html}
|
||||||
#' @inheritParams xgb.DMatrix
|
#' @inheritParams xgb.DMatrix
|
||||||
#' @param data_iterator A data iterator structure as returned by \link{xgb.DataIter},
|
#' @param data_iterator A data iterator structure as returned by [xgb.DataIter()],
|
||||||
#' which includes an environment shared between function calls, and functions to access
|
#' which includes an environment shared between function calls, and functions to access
|
||||||
#' the data in batches on-demand.
|
#' the data in batches on-demand.
|
||||||
#' @param cache_prefix The path of cache file, caller must initialize all the directories in this path.
|
#' @param cache_prefix The path of cache file, caller must initialize all the directories in this path.
|
||||||
#' @param missing A float value to represent missing values in data.
|
#' @param missing A float value to represent missing values in data.
|
||||||
#'
|
#'
|
||||||
#' Note that, while functions like \link{xgb.DMatrix} can take a generic `NA` and interpret it
|
#' Note that, while functions like [xgb.DMatrix()] can take a generic `NA` and interpret it
|
||||||
#' correctly for different types like `numeric` and `integer`, if an `NA` value is passed here,
|
#' correctly for different types like `numeric` and `integer`, if an `NA` value is passed here,
|
||||||
#' it will not be adapted for different input types.
|
#' it will not be adapted for different input types.
|
||||||
#'
|
#'
|
||||||
#' For example, in R `integer` types, missing values are represented by integer number `-2147483648`
|
#' For example, in R `integer` types, missing values are represented by integer number `-2147483648`
|
||||||
#' (since machine 'integer' types do not have an inherent 'NA' value) - hence, if one passes `NA`,
|
#' (since machine 'integer' types do not have an inherent 'NA' value) - hence, if one passes `NA`,
|
||||||
#' which is interpreted as a floating-point NaN by 'xgb.ExternalDMatrix' and by
|
#' which is interpreted as a floating-point NaN by [xgb.ExternalDMatrix()] and by
|
||||||
#' 'xgb.QuantileDMatrix.from_iterator', these integer missing values will not be treated as missing.
|
#' [xgb.QuantileDMatrix.from_iterator()], these integer missing values will not be treated as missing.
|
||||||
#' This should not pose any problem for `numeric` types, since they do have an inherent NaN value.
|
#' This should not pose any problem for `numeric` types, since they do have an inherent NaN value.
|
||||||
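The integer case described in the `missing` parameter text can be checked in base R, without any xgboost calls. The following is a minimal sketch under that assumption: it shows that the lowest 32-bit value is the slot reserved for `NA_integer_`, and that converting an integer batch to `numeric` yields a floating-point missing value. The variable names are illustrative only.

```r
# Base R only; no xgboost functions are called here.
int_max <- .Machine$integer.max  # 2147483647

# One below -int_max would be -2147483648, the value R reserves for
# NA_integer_, so the subtraction overflows to NA (warning silenced here).
lowest <- suppressWarnings(-int_max - 1L)
is.na(lowest)

# Converting an integer batch to numeric turns the integer NA into a
# floating-point missing value.
batch_int <- c(1L, NA_integer_, 3L)
batch_num <- as.numeric(batch_int)
is.na(batch_num)
```

Converting integer batches to `numeric` before returning them from the iterator is one possible workaround suggested by this behavior; it is an assumption here, not something the documentation above prescribes.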
#' @return An 'xgb.DMatrix' object, with subclass 'xgb.ExternalDMatrix', in which the data is not
|
#' @return An 'xgb.DMatrix' object, with subclass 'xgb.ExternalDMatrix', in which the data is not
|
||||||
#' held internally but accessed through the iterator when needed.
|
#' held internally but accessed through the iterator when needed.
|
||||||
#' @seealso \link{xgb.DataIter}, \link{xgb.DataBatch}, \link{xgb.QuantileDMatrix.from_iterator}
|
#' @seealso [xgb.DataIter()], [xgb.DataBatch()], [xgb.QuantileDMatrix.from_iterator()]
|
||||||
#' @examples
|
#' @examples
|
||||||
#' library(xgboost)
|
|
||||||
#' data(mtcars)
|
#' data(mtcars)
|
||||||
#'
|
#'
|
||||||
#' # this custom environment will be passed to the iterator
|
#' # This custom environment will be passed to the iterator
|
||||||
#' # functions at each call. It's up to the user to keep
|
#' # functions at each call. It is up to the user to keep
|
||||||
#' # track of the iteration number in this environment.
|
#' # track of the iteration number in this environment.
|
||||||
#' iterator_env <- as.environment(
|
#' iterator_env <- as.environment(
|
||||||
#' list(
|
#' list(
|
||||||
@ -758,25 +760,27 @@ xgb.ExternalDMatrix <- function(
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
#' @title QuantileDMatrix from External Data
|
#' QuantileDMatrix from External Data
|
||||||
#' @description Create an `xgb.QuantileDMatrix` object (exact same class as would be returned by
|
#'
|
||||||
#' calling function \link{xgb.QuantileDMatrix}, with the same advantages and limitations) from
|
#' @description
|
||||||
#' external data supplied by an \link{xgb.DataIter} object, potentially passed in batches from
|
#' Create an `xgb.QuantileDMatrix` object (exact same class as would be returned by
|
||||||
#' a bigger set that might not fit entirely in memory, same way as \link{xgb.ExternalDMatrix}.
|
#' calling function [xgb.QuantileDMatrix()], with the same advantages and limitations) from
|
||||||
|
#' external data supplied by [xgb.DataIter()], potentially passed in batches from
|
||||||
|
#' a bigger set that might not fit entirely in memory, same way as [xgb.ExternalDMatrix()].
|
||||||
#'
|
#'
|
||||||
#' Note that, while external data will only be loaded through the iterator (thus the full data
|
#' Note that, while external data will only be loaded through the iterator (thus the full data
|
||||||
#' might not be held entirely in-memory), the quantized representation of the data will get
|
#' might not be held entirely in-memory), the quantized representation of the data will get
|
||||||
#' created in-memory, being concatenated from multiple calls to the data iterator. The quantized
|
#' created in-memory, being concatenated from multiple calls to the data iterator. The quantized
|
||||||
#' version is typically lighter than the original data, so there might be cases in which this
|
#' version is typically lighter than the original data, so there might be cases in which this
|
||||||
#' representation could potentially fit in memory even if the full data doesn't.
|
#' representation could potentially fit in memory even if the full data does not.
|
||||||
#'
|
#'
|
||||||
#' For more information, see the guide 'Using XGBoost External Memory Version':
|
#' For more information, see the guide 'Using XGBoost External Memory Version':
|
||||||
#' \url{https://xgboost.readthedocs.io/en/stable/tutorials/external_memory.html}
|
#' \url{https://xgboost.readthedocs.io/en/stable/tutorials/external_memory.html}
|
||||||
#' @inheritParams xgb.ExternalDMatrix
|
#' @inheritParams xgb.ExternalDMatrix
|
||||||
#' @inheritParams xgb.QuantileDMatrix
|
#' @inheritParams xgb.QuantileDMatrix
|
||||||
#' @return An 'xgb.DMatrix' object, with subclass 'xgb.QuantileDMatrix'.
|
#' @return An 'xgb.DMatrix' object, with subclass 'xgb.QuantileDMatrix'.
|
||||||
#' @seealso \link{xgb.DataIter}, \link{xgb.DataBatch}, \link{xgb.ExternalDMatrix},
|
#' @seealso [xgb.DataIter()], [xgb.DataBatch()], [xgb.ExternalDMatrix()],
|
||||||
#' \link{xgb.QuantileDMatrix}
|
#' [xgb.QuantileDMatrix()]
|
||||||
#' @export
|
#' @export
|
||||||
xgb.QuantileDMatrix.from_iterator <- function( # nolint
|
xgb.QuantileDMatrix.from_iterator <- function( # nolint
|
||||||
data_iterator,
|
data_iterator,
|
||||||
@ -823,18 +827,18 @@ xgb.QuantileDMatrix.from_iterator <- function( # nolint
|
|||||||
return(dmat)
|
return(dmat)
|
||||||
}
|
}
|
||||||
|
|
||||||
#' @title Check whether DMatrix object has a field
|
#' Check whether DMatrix object has a field
|
||||||
#' @description Checks whether an xgb.DMatrix object has a given field assigned to
|
#'
|
||||||
|
#' Checks whether an xgb.DMatrix object has a given field assigned to
|
||||||
#' it, such as weights, labels, etc.
|
#' it, such as weights, labels, etc.
|
||||||
#' @param object The DMatrix object to check for the given \code{info} field.
|
#' @param object The DMatrix object to check for the given `info` field.
|
||||||
#' @param info The field to check for presence or absence in \code{object}.
|
#' @param info The field to check for presence or absence in `object`.
|
||||||
#' @seealso \link{xgb.DMatrix}, \link{getinfo.xgb.DMatrix}, \link{setinfo.xgb.DMatrix}
|
#' @seealso [xgb.DMatrix()], [getinfo.xgb.DMatrix()], [setinfo.xgb.DMatrix()]
|
||||||
#' @examples
|
#' @examples
|
||||||
#' library(xgboost)
|
|
||||||
#' x <- matrix(1:10, nrow = 5)
|
#' x <- matrix(1:10, nrow = 5)
|
||||||
#' dm <- xgb.DMatrix(x, nthread = 1)
|
#' dm <- xgb.DMatrix(x, nthread = 1)
|
||||||
#'
|
#'
|
||||||
#' # 'dm' so far doesn't have any fields set
|
#' # 'dm' so far does not have any fields set
|
||||||
#' xgb.DMatrix.hasinfo(dm, "label")
|
#' xgb.DMatrix.hasinfo(dm, "label")
|
||||||
#'
|
#'
|
||||||
#' # Fields can be added after construction
|
#' # Fields can be added after construction
|
||||||
@ -855,17 +859,19 @@ xgb.DMatrix.hasinfo <- function(object, info) {
|
|||||||
|
|
||||||
#' Dimensions of xgb.DMatrix
|
#' Dimensions of xgb.DMatrix
|
||||||
#'
|
#'
|
||||||
#' Returns a vector of numbers of rows and of columns in an \code{xgb.DMatrix}.
|
#' Returns a vector of numbers of rows and of columns in an `xgb.DMatrix`.
|
||||||
#' @param x Object of class \code{xgb.DMatrix}
|
#'
|
||||||
|
#' @param x Object of class `xgb.DMatrix`
|
||||||
#'
|
#'
|
||||||
#' @details
|
#' @details
|
||||||
#' Note: since \code{nrow} and \code{ncol} internally use \code{dim}, they can also
|
#' Note: since [nrow()] and [ncol()] internally use [dim()], they can also
|
||||||
#' be directly used with an \code{xgb.DMatrix} object.
|
#' be directly used with an `xgb.DMatrix` object.
|
||||||
#'
|
#'
|
||||||
#' @examples
|
#' @examples
|
||||||
#' data(agaricus.train, package='xgboost')
|
#' data(agaricus.train, package = "xgboost")
|
||||||
|
#'
|
||||||
#' train <- agaricus.train
|
#' train <- agaricus.train
|
||||||
#' dtrain <- xgb.DMatrix(train$data, label=train$label, nthread = 2)
|
#' dtrain <- xgb.DMatrix(train$data, label = train$label, nthread = 2)
|
||||||
#'
|
#'
|
||||||
#' stopifnot(nrow(dtrain) == nrow(train$data))
|
#' stopifnot(nrow(dtrain) == nrow(train$data))
|
||||||
#' stopifnot(ncol(dtrain) == ncol(train$data))
|
#' stopifnot(ncol(dtrain) == ncol(train$data))
|
||||||
@ -877,27 +883,28 @@ dim.xgb.DMatrix <- function(x) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
#' Handling of column names of \code{xgb.DMatrix}
|
#' Handling of column names of `xgb.DMatrix`
|
||||||
#'
|
#'
|
||||||
#' Only column names are supported for \code{xgb.DMatrix}, thus setting of
|
#' Only column names are supported for `xgb.DMatrix`, thus setting of
|
||||||
#' row names would have no effect and returned row names would be NULL.
|
#' row names would have no effect and returned row names would be `NULL`.
|
||||||
#'
|
#'
|
||||||
#' @param x object of class \code{xgb.DMatrix}
|
#' @param x Object of class `xgb.DMatrix`.
|
||||||
#' @param value a list of two elements: the first one is ignored
|
#' @param value A list of two elements: the first one is ignored
|
||||||
#' and the second one is column names
|
#' and the second one is column names
|
||||||
#'
|
#'
|
||||||
#' @details
|
#' @details
|
||||||
#' Generic \code{dimnames} methods are used by \code{colnames}.
|
#' Generic [dimnames()] methods are used by [colnames()].
|
||||||
#' Since row names are irrelevant, it is recommended to use \code{colnames} directly.
|
#' Since row names are irrelevant, it is recommended to use [colnames()] directly.
|
||||||
#'
|
#'
|
||||||
#' @examples
|
#' @examples
|
||||||
#' data(agaricus.train, package='xgboost')
|
#' data(agaricus.train, package = "xgboost")
|
||||||
|
#'
|
||||||
#' train <- agaricus.train
|
#' train <- agaricus.train
|
||||||
#' dtrain <- xgb.DMatrix(train$data, label=train$label, nthread = 2)
|
#' dtrain <- xgb.DMatrix(train$data, label = train$label, nthread = 2)
|
||||||
#' dimnames(dtrain)
|
#' dimnames(dtrain)
|
||||||
#' colnames(dtrain)
|
#' colnames(dtrain)
|
||||||
#' colnames(dtrain) <- make.names(1:ncol(train$data))
|
#' colnames(dtrain) <- make.names(1:ncol(train$data))
|
||||||
#' print(dtrain, verbose=TRUE)
|
#' print(dtrain, verbose = TRUE)
|
||||||
#'
|
#'
|
||||||
#' @rdname dimnames.xgb.DMatrix
|
#' @rdname dimnames.xgb.DMatrix
|
||||||
#' @export
|
#' @export
|
||||||
@ -926,47 +933,45 @@ dimnames.xgb.DMatrix <- function(x) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
#' @title Get or set information of xgb.DMatrix and xgb.Booster objects
|
#' Get or set information of xgb.DMatrix and xgb.Booster objects
|
||||||
#' @param object Object of class \code{xgb.DMatrix} of `xgb.Booster`.
|
|
||||||
#' @param name the name of the information field to get (see details)
|
|
||||||
#' @return For `getinfo`, will return the requested field. For `setinfo`, will always return value `TRUE`
|
|
||||||
#' if it succeeds.
|
|
||||||
#' @details
|
|
||||||
#' The \code{name} field can be one of the following for `xgb.DMatrix`:
|
|
||||||
#'
|
#'
|
||||||
#' \itemize{
|
#' @param object Object of class `xgb.DMatrix` or `xgb.Booster`.
|
||||||
#' \item \code{label}
|
#' @param name The name of the information field to get (see details).
|
||||||
#' \item \code{weight}
|
#' @return For `getinfo()`, will return the requested field. For `setinfo()`,
|
||||||
#' \item \code{base_margin}
|
#' will always return value `TRUE` if it succeeds.
|
||||||
#' \item \code{label_lower_bound}
|
#' @details
|
||||||
#' \item \code{label_upper_bound}
|
#' The `name` field can be one of the following for `xgb.DMatrix`:
|
||||||
#' \item \code{group}
|
#' - label
|
||||||
#' \item \code{feature_type}
|
#' - weight
|
||||||
#' \item \code{feature_name}
|
#' - base_margin
|
||||||
#' \item \code{nrow}
|
#' - label_lower_bound
|
||||||
#' }
|
#' - label_upper_bound
|
||||||
#' See the documentation for \link{xgb.DMatrix} for more information about these fields.
|
#' - group
|
||||||
|
#' - feature_type
|
||||||
|
#' - feature_name
|
||||||
|
#' - nrow
|
||||||
|
#'
|
||||||
|
#' See the documentation for [xgb.DMatrix()] for more information about these fields.
|
||||||
#'
|
#'
|
||||||
#' For `xgb.Booster`, can be one of the following:
|
#' For `xgb.Booster`, can be one of the following:
|
||||||
#' \itemize{
|
#' - `feature_type`
|
||||||
#' \item \code{feature_type}
|
#' - `feature_name`
|
||||||
#' \item \code{feature_name}
|
|
||||||
#' }
|
|
||||||
#'
|
#'
|
||||||
#' Note that, while 'qid' cannot be retrieved, it's possible to get the equivalent 'group'
|
#' Note that, while 'qid' cannot be retrieved, it is possible to get the equivalent 'group'
|
||||||
#' for a DMatrix that had 'qid' assigned.
|
#' for a DMatrix that had 'qid' assigned.
|
||||||
#'
|
#'
|
||||||
#' \bold{Important}: when calling `setinfo`, the objects are modified in-place. See
|
#' **Important**: when calling [setinfo()], the objects are modified in-place. See
|
||||||
#' \link{xgb.copy.Booster} for an idea of how this in-place assignment works.
|
#' [xgb.copy.Booster()] for an idea of how this in-place assignment works.
|
||||||
#' @examples
|
#' @examples
|
||||||
#' data(agaricus.train, package='xgboost')
|
#' data(agaricus.train, package = "xgboost")
|
||||||
|
#'
|
||||||
#' dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
#' dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||||
#'
|
#'
|
||||||
#' labels <- getinfo(dtrain, 'label')
|
#' labels <- getinfo(dtrain, "label")
|
||||||
#' setinfo(dtrain, 'label', 1-labels)
|
#' setinfo(dtrain, "label", 1 - labels)
|
||||||
#'
|
#'
|
||||||
#' labels2 <- getinfo(dtrain, 'label')
|
#' labels2 <- getinfo(dtrain, "label")
|
||||||
#' stopifnot(all(labels2 == 1-labels))
|
#' stopifnot(all(labels2 == 1 - labels))
|
||||||
#' @rdname getinfo
|
#' @rdname getinfo
|
||||||
#' @export
|
#' @export
|
||||||
getinfo <- function(object, name) UseMethod("getinfo")
|
getinfo <- function(object, name) UseMethod("getinfo")
|
||||||
@ -1011,28 +1016,29 @@ getinfo.xgb.DMatrix <- function(object, name) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
#' @rdname getinfo
|
#' @rdname getinfo
|
||||||
#' @param info the specific field of information to set
|
#' @param info The specific field of information to set.
|
||||||
#'
|
#'
|
||||||
#' @details
|
#' @details
|
||||||
#' See the documentation for \link{xgb.DMatrix} for possible fields that can be set
|
#' See the documentation for [xgb.DMatrix()] for possible fields that can be set
|
||||||
#' (which correspond to arguments in that function).
|
#' (which correspond to arguments in that function).
|
||||||
#'
|
#'
|
||||||
#' Note that the following fields are allowed in the construction of an \code{xgb.DMatrix}
|
#' Note that the following fields are allowed in the construction of an `xgb.DMatrix`
|
||||||
#' but \bold{aren't} allowed here:\itemize{
|
#' but **are not** allowed here:
|
||||||
#' \item data
|
#' - data
|
||||||
#' \item missing
|
#' - missing
|
||||||
#' \item silent
|
#' - silent
|
||||||
#' \item nthread
|
#' - nthread
|
||||||
#' }
|
|
||||||
#'
|
#'
|
||||||
#' @examples
|
#' @examples
|
||||||
#' data(agaricus.train, package='xgboost')
|
#' data(agaricus.train, package = "xgboost")
|
||||||
|
#'
|
||||||
#' dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
#' dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||||
#'
|
#'
|
||||||
#' labels <- getinfo(dtrain, 'label')
|
#' labels <- getinfo(dtrain, "label")
|
||||||
#' setinfo(dtrain, 'label', 1-labels)
|
#' setinfo(dtrain, "label", 1 - labels)
|
||||||
#' labels2 <- getinfo(dtrain, 'label')
|
#'
|
||||||
#' stopifnot(all.equal(labels2, 1-labels))
|
#' labels2 <- getinfo(dtrain, "label")
|
||||||
|
#' stopifnot(all.equal(labels2, 1 - labels))
|
||||||
#' @export
|
#' @export
|
||||||
setinfo <- function(object, name, info) UseMethod("setinfo")
|
setinfo <- function(object, name, info) UseMethod("setinfo")
|
||||||
|
|
||||||
@ -1117,9 +1123,11 @@ setinfo.xgb.DMatrix <- function(object, name, info) {
|
|||||||
stop("setinfo: unknown info name ", name)
|
stop("setinfo: unknown info name ", name)
|
||||||
}
|
}
|
||||||
|
|
||||||
#' @title Get Quantile Cuts from DMatrix
|
#' Get Quantile Cuts from DMatrix
|
||||||
#' @description Get the quantile cuts (a.k.a. borders) from an `xgb.DMatrix`
|
#'
|
||||||
#' that has been quantized for the histogram method (`tree_method="hist"`).
|
#' @description
|
||||||
|
#' Get the quantile cuts (a.k.a. borders) from an `xgb.DMatrix`
|
||||||
|
#' that has been quantized for the histogram method (`tree_method = "hist"`).
|
||||||
#'
|
#'
|
||||||
#' These cuts are used in order to assign observations to bins - i.e. these are ordered
|
#' These cuts are used in order to assign observations to bins - i.e. these are ordered
|
||||||
#' boundaries which are used to determine assignment condition `border_low < x < border_high`.
|
#' boundaries which are used to determine assignment condition `border_low < x < border_high`.
|
||||||
@ -1130,19 +1138,18 @@ setinfo.xgb.DMatrix <- function(object, name, info) {
|
|||||||
#' which will be output in sorted order from lowest to highest.
|
#' which will be output in sorted order from lowest to highest.
|
||||||
#'
|
#'
|
||||||
#' Different columns can have different numbers of bins according to their range.
|
#' Different columns can have different numbers of bins according to their range.
|
||||||
#' @param dmat An `xgb.DMatrix` object, as returned by \link{xgb.DMatrix}.
|
#' @param dmat An `xgb.DMatrix` object, as returned by [xgb.DMatrix()].
|
||||||
#' @param output Output format for the quantile cuts. Possible options are:\itemize{
|
#' @param output Output format for the quantile cuts. Possible options are:
|
||||||
#' \item `"list"` will return the output as a list with one entry per column, where
|
#' - `"list"` will return the output as a list with one entry per column, where
|
||||||
#' each column will have a numeric vector with the cuts. The list will be named if
|
#' each column will have a numeric vector with the cuts. The list will be named if
|
||||||
#' `dmat` has column names assigned to it.
|
#' `dmat` has column names assigned to it.
|
||||||
#' \item `"arrays"` will return a list with entries `indptr` (base-0 indexing) and
|
#' - `"arrays"` will return a list with entries `indptr` (base-0 indexing) and
|
||||||
#' `data`. Here, the cuts for column 'i' are obtained by slicing 'data' from entries
|
#' `data`. Here, the cuts for column 'i' are obtained by slicing 'data' from entries
|
||||||
#' `indptr[i]+1` to `indptr[i+1]`.
|
#' `indptr[i]+1` to `indptr[i+1]`.
|
||||||
#' }
|
|
||||||
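The slicing rule for the `"arrays"` format can be sketched in base R. The list below is made-up data shaped like the described output (entries `indptr` and `data`), and `cuts_for_column()` is a hypothetical helper, not a function of the package.

```r
# Hypothetical "arrays"-style output for two columns (made-up values).
qcuts <- list(
  indptr = c(0, 3, 5),             # base-0 offsets, length = ncol + 1
  data = c(1.5, 2.5, 4.0, 10, 20)  # cuts of all columns, concatenated
)

# Cuts for column i: entries indptr[i] + 1 through indptr[i + 1] of 'data'.
cuts_for_column <- function(qcuts, i) {
  start <- qcuts$indptr[i]
  end <- qcuts$indptr[i + 1]
  # seq_len() keeps the result empty when a column has no cuts
  qcuts$data[start + seq_len(end - start)]
}

cuts_for_column(qcuts, 1)
cuts_for_column(qcuts, 2)
```

Using `seq_len()` instead of `(start + 1):end` avoids a descending sequence when two consecutive `indptr` entries are equal.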
#' @return The quantile cuts, in the format specified by parameter `output`.
|
#' @return The quantile cuts, in the format specified by parameter `output`.
|
||||||
#' @examples
|
#' @examples
|
||||||
#' library(xgboost)
|
|
||||||
#' data(mtcars)
|
#' data(mtcars)
|
||||||
|
#'
|
||||||
#' y <- mtcars$mpg
|
#' y <- mtcars$mpg
|
||||||
#' x <- as.matrix(mtcars[, -1])
|
#' x <- as.matrix(mtcars[, -1])
|
||||||
#' dm <- xgb.DMatrix(x, label = y, nthread = 1)
|
#' dm <- xgb.DMatrix(x, label = y, nthread = 1)
|
||||||
@ -1150,11 +1157,7 @@ setinfo.xgb.DMatrix <- function(object, name, info) {
|
|||||||
#' # DMatrix is not quantized right away, but will be once a hist model is generated
|
#' # DMatrix is not quantized right away, but will be once a hist model is generated
|
||||||
#' model <- xgb.train(
|
#' model <- xgb.train(
|
||||||
#' data = dm,
|
#' data = dm,
|
||||||
#' params = list(
|
#' params = list(tree_method = "hist", max_bin = 8, nthread = 1),
|
||||||
#' tree_method = "hist",
|
|
||||||
#' max_bin = 8,
|
|
||||||
#' nthread = 1
|
|
||||||
#' ),
|
|
||||||
#' nrounds = 3
|
#' nrounds = 3
|
||||||
#' )
|
#' )
|
||||||
#'
|
#'
|
||||||
@ -1189,17 +1192,19 @@ xgb.get.DMatrix.qcut <- function(dmat, output = c("list", "arrays")) { # nolint
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
#' @title Get Number of Non-Missing Entries in DMatrix
|
#' Get Number of Non-Missing Entries in DMatrix
|
||||||
#' @param dmat An `xgb.DMatrix` object, as returned by \link{xgb.DMatrix}.
|
#'
|
||||||
#' @return The number of non-missing entries in the DMatrix
|
#' @param dmat An `xgb.DMatrix` object, as returned by [xgb.DMatrix()].
|
||||||
|
#' @return The number of non-missing entries in the DMatrix.
|
||||||
#' @export
|
#' @export
|
||||||
xgb.get.DMatrix.num.non.missing <- function(dmat) { # nolint
|
xgb.get.DMatrix.num.non.missing <- function(dmat) { # nolint
|
||||||
stopifnot(inherits(dmat, "xgb.DMatrix"))
|
stopifnot(inherits(dmat, "xgb.DMatrix"))
|
||||||
return(.Call(XGDMatrixNumNonMissing_R, dmat))
|
return(.Call(XGDMatrixNumNonMissing_R, dmat))
|
||||||
}
|
}
|
||||||
|
|
||||||
#' @title Get DMatrix Data
|
#' Get DMatrix Data
|
||||||
#' @param dmat An `xgb.DMatrix` object, as returned by \link{xgb.DMatrix}.
|
#'
|
||||||
|
#' @param dmat An `xgb.DMatrix` object, as returned by [xgb.DMatrix()].
|
||||||
#' @return The data held in the DMatrix, as a sparse CSR matrix (class `dgRMatrix`
|
#' @return The data held in the DMatrix, as a sparse CSR matrix (class `dgRMatrix`
|
||||||
#' from package `Matrix`). If it had feature names, these will be added as column names
|
#' from package `Matrix`). If it had feature names, these will be added as column names
|
||||||
#' in the output.
|
#' in the output.
|
||||||
@ -1223,27 +1228,27 @@ xgb.get.DMatrix.data <- function(dmat) {
|
|||||||
return(out)
|
return(out)
|
||||||
}
|
}
|
||||||
|
|
||||||
#' Get a new DMatrix containing the specified rows of
|
#' Slice DMatrix
|
||||||
#' original xgb.DMatrix object
|
|
||||||
#'
|
#'
|
||||||
#' Get a new DMatrix containing the specified rows of
|
#' Get a new DMatrix containing the specified rows of original xgb.DMatrix object.
|
||||||
#' original xgb.DMatrix object
|
|
||||||
#'
|
#'
|
||||||
#' @param object Object of class "xgb.DMatrix".
|
#' @param object Object of class `xgb.DMatrix`.
|
||||||
#' @param idxset An integer vector of indices of rows needed (base-1 indexing).
|
#' @param idxset An integer vector of indices of rows needed (base-1 indexing).
|
||||||
#' @param allow_groups Whether to allow slicing an `xgb.DMatrix` with `group` (or
|
#' @param allow_groups Whether to allow slicing an `xgb.DMatrix` with `group` (or
|
||||||
#' equivalently `qid`) field. Note that in such case, the result will not have
|
#' equivalently `qid`) field. Note that in such case, the result will not have
|
||||||
#' the groups anymore - they need to be set manually through `setinfo`.
|
#' the groups anymore - they need to be set manually through [setinfo()].
|
||||||
#' @param colset currently not used (columns subsetting is not available)
|
#' @param colset Currently not used (columns subsetting is not available).
|
||||||
#'
|
#'
|
||||||
#' @examples
|
#' @examples
|
||||||
#' data(agaricus.train, package='xgboost')
|
#' data(agaricus.train, package = "xgboost")
|
||||||
|
#'
|
||||||
#' dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
#' dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||||
#'
|
#'
|
||||||
#' dsub <- xgb.slice.DMatrix(dtrain, 1:42)
|
#' dsub <- xgb.slice.DMatrix(dtrain, 1:42)
|
||||||
#' labels1 <- getinfo(dsub, 'label')
|
#' labels1 <- getinfo(dsub, "label")
|
||||||
|
#'
|
||||||
#' dsub <- dtrain[1:42, ]
|
#' dsub <- dtrain[1:42, ]
|
||||||
#' labels2 <- getinfo(dsub, 'label')
|
#' labels2 <- getinfo(dsub, "label")
|
||||||
#' all.equal(labels1, labels2)
|
#' all.equal(labels1, labels2)
|
||||||
#'
|
#'
|
||||||
#' @rdname xgb.slice.DMatrix
|
#' @rdname xgb.slice.DMatrix
|
||||||
@ -1292,16 +1297,17 @@ xgb.slice.DMatrix <- function(object, idxset, allow_groups = FALSE) {
|
|||||||
#' Print information about xgb.DMatrix.
|
#' Print information about xgb.DMatrix.
|
||||||
#' Currently it displays dimensions and presence of info-fields and colnames.
|
#' Currently it displays dimensions and presence of info-fields and colnames.
|
||||||
#'
|
#'
|
||||||
#' @param x an xgb.DMatrix object
|
#' @param x An xgb.DMatrix object.
|
||||||
#' @param verbose whether to print colnames (when present)
|
#' @param verbose Whether to print colnames (when present).
|
||||||
#' @param ... not currently used
|
#' @param ... Not currently used.
|
||||||
#'
|
#'
|
||||||
#' @examples
|
#' @examples
|
||||||
#' data(agaricus.train, package='xgboost')
|
#' data(agaricus.train, package = "xgboost")
|
||||||
#' dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
|
||||||
#'
|
#'
|
||||||
|
#' dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||||
#' dtrain
|
#' dtrain
|
||||||
#' print(dtrain, verbose=TRUE)
|
#'
|
||||||
|
#' print(dtrain, verbose = TRUE)
|
||||||
#'
|
#'
|
||||||
#' @method print xgb.DMatrix
|
#' @method print xgb.DMatrix
|
||||||
#' @export
|
#' @export
|
||||||
|
|||||||
@ -2,12 +2,13 @@
|
|||||||
#'
|
#'
|
||||||
#' Save xgb.DMatrix object to binary file
|
#' Save xgb.DMatrix object to binary file
|
||||||
#'
|
#'
|
||||||
#' @param dmatrix the \code{xgb.DMatrix} object
|
#' @param dmatrix the `xgb.DMatrix` object
|
||||||
#' @param fname the name of the file to write.
|
#' @param fname the name of the file to write.
|
||||||
#'
|
#'
|
||||||
#' @examples
|
#' @examples
|
||||||
#' \dontshow{RhpcBLASctl::omp_set_num_threads(1)}
|
#' \dontshow{RhpcBLASctl::omp_set_num_threads(1)}
|
||||||
#' data(agaricus.train, package='xgboost')
|
#' data(agaricus.train, package = "xgboost")
|
||||||
|
#'
|
||||||
#' dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
#' dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||||
#' fname <- file.path(tempdir(), "xgb.DMatrix.data")
|
#' fname <- file.path(tempdir(), "xgb.DMatrix.data")
|
||||||
#' xgb.DMatrix.save(dtrain, fname)
|
#' xgb.DMatrix.save(dtrain, fname)
|
||||||
|
|||||||
@ -1,24 +1,26 @@
|
|||||||
|
#' Set and get global configuration
|
||||||
|
#'
|
||||||
#' Global configuration consists of a collection of parameters that can be applied in the global
|
#' Global configuration consists of a collection of parameters that can be applied in the global
|
||||||
#' scope. See \url{https://xgboost.readthedocs.io/en/stable/parameter.html} for the full list of
|
#' scope. See \url{https://xgboost.readthedocs.io/en/stable/parameter.html} for the full list of
|
||||||
#' parameters supported in the global configuration. Use \code{xgb.set.config} to update the
|
#' parameters supported in the global configuration. Use `xgb.set.config()` to update the
|
||||||
#' values of one or more global-scope parameters. Use \code{xgb.get.config} to fetch the current
|
#' values of one or more global-scope parameters. Use `xgb.get.config()` to fetch the current
|
||||||
#' values of all global-scope parameters (listed in
|
#' values of all global-scope parameters (listed in
|
||||||
#' \url{https://xgboost.readthedocs.io/en/stable/parameter.html}).
|
#' \url{https://xgboost.readthedocs.io/en/stable/parameter.html}).
|
||||||
|
#'
|
||||||
#' @details
|
#' @details
|
||||||
#' Note that serialization-related functions might use a globally-configured number of threads,
|
#' Note that serialization-related functions might use a globally-configured number of threads,
|
||||||
#' which is managed by the system's OpenMP (OMP) configuration instead. Typically, XGBoost methods
|
#' which is managed by the system's OpenMP (OMP) configuration instead. Typically, XGBoost methods
|
||||||
#' accept an `nthreads` parameter, but some methods like `readRDS` might get executed before such
|
#' accept an `nthreads` parameter, but some methods like [readRDS()] might get executed before such
|
||||||
#' parameter can be supplied.
|
#' parameter can be supplied.
|
||||||
#'
|
#'
|
||||||
#' The number of OMP threads can in turn be configured for example through an environment variable
|
#' The number of OMP threads can in turn be configured for example through an environment variable
|
||||||
#' `OMP_NUM_THREADS` (needs to be set before R is started), or through `RhpcBLASctl::omp_set_num_threads`.
|
#' `OMP_NUM_THREADS` (needs to be set before R is started), or through `RhpcBLASctl::omp_set_num_threads`.
|
||||||
#' @rdname xgbConfig
|
#' @rdname xgbConfig
|
||||||
#' @title Set and get global configuration
|
|
||||||
#' @name xgb.set.config, xgb.get.config
|
#' @name xgb.set.config, xgb.get.config
|
||||||
#' @export xgb.set.config xgb.get.config
|
#' @export xgb.set.config xgb.get.config
|
||||||
#' @param ... List of parameters to be set, as keyword arguments
|
#' @param ... List of parameters to be set, as keyword arguments
|
||||||
#' @return
|
#' @return
|
||||||
#' \code{xgb.set.config} returns \code{TRUE} to signal success. \code{xgb.get.config} returns
|
#' `xgb.set.config()` returns `TRUE` to signal success. `xgb.get.config()` returns
|
||||||
#' a list containing all global-scope parameters and their values.
|
#' a list containing all global-scope parameters and their values.
|
||||||
#'
|
#'
|
||||||
#' @examples
|
#' @examples
|
||||||
|
|||||||
@ -2,141 +2,141 @@
|
|||||||
#'
|
#'
|
||||||
#' The cross validation function of xgboost.
|
#' The cross validation function of xgboost.
|
||||||
#'
|
#'
|
||||||
#' @param params the list of parameters. The complete list of parameters is
|
#' @param params The list of parameters. The complete list of parameters is available in the
|
||||||
#' available in the \href{http://xgboost.readthedocs.io/en/latest/parameter.html}{online documentation}. Below
|
#' [online documentation](http://xgboost.readthedocs.io/en/latest/parameter.html).
|
||||||
#' is a shorter summary:
|
#' Below is a shorter summary:
|
||||||
#' \itemize{
|
#' - `objective`: Objective function, common ones are
|
||||||
#' \item \code{objective} objective function, common ones are
|
#' - `reg:squarederror`: Regression with squared loss.
|
||||||
#' \itemize{
|
#' - `binary:logistic`: Logistic regression for classification.
|
||||||
#' \item \code{reg:squarederror} Regression with squared loss.
|
|
||||||
#' \item \code{binary:logistic} logistic regression for classification.
|
|
||||||
#' \item See \code{\link[=xgb.train]{xgb.train}()} for complete list of objectives.
|
|
||||||
#' }
|
|
||||||
#' \item \code{eta} step size of each boosting step
|
|
||||||
#' \item \code{max_depth} maximum depth of the tree
|
|
||||||
#' \item \code{nthread} number of thread used in training, if not set, all threads are used
|
|
||||||
#' }
|
|
||||||
#'
|
#'
|
||||||
#' See \code{\link{xgb.train}} for further details.
|
#' See [xgb.train()] for complete list of objectives.
|
||||||
#' See also demo/ for walkthrough example in R.
|
#' - `eta`: Step size of each boosting step
|
||||||
|
#' - `max_depth`: Maximum depth of the tree
|
||||||
|
#' - `nthread`: Number of threads used in training. If not set, all threads are used
|
||||||
#'
|
#'
|
||||||
#' Note that, while `params` accepts a `seed` entry and will use such parameter for model training if
|
#' See [xgb.train()] for further details.
|
||||||
#' supplied, this seed is not used for creation of train-test splits, which instead rely on R's own RNG
|
#' See also demo for walkthrough example in R.
|
||||||
#' system - thus, for reproducible results, one needs to call the `set.seed` function beforehand.
|
#'
|
||||||
|
#' Note that, while `params` accepts a `seed` entry and will use such parameter for model training if
|
||||||
|
#' supplied, this seed is not used for creation of train-test splits, which instead rely on R's own RNG
|
||||||
|
#' system - thus, for reproducible results, one needs to call the [set.seed()] function beforehand.
|
||||||
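The note about `set.seed()` can be illustrated with a base R sketch. The fold assignment below is a hypothetical stand-in, not the procedure `xgb.cv()` uses internally; it only shows that a split drawn from R's RNG repeats when the seed is set beforehand.

```r
# Hypothetical fold assignment; not xgb.cv()'s internal code.
make_folds <- function(n, nfold) {
  # shuffle fold labels 1..nfold across n rows using R's RNG
  sample(rep_len(seq_len(nfold), n))
}

set.seed(123)
folds_a <- make_folds(32, 5)

set.seed(123)
folds_b <- make_folds(32, 5)

# Same seed, same split
identical(folds_a, folds_b)
```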
#' @param data An `xgb.DMatrix` object, with corresponding fields like `label` or bounds as required
|
#' @param data An `xgb.DMatrix` object, with corresponding fields like `label` or bounds as required
|
||||||
#' for model training by the objective.
|
#' for model training by the objective.
|
||||||
#'
|
#'
|
||||||
#' Note that only the basic `xgb.DMatrix` class is supported - variants such as `xgb.QuantileDMatrix`
|
#' Note that only the basic `xgb.DMatrix` class is supported - variants such as `xgb.QuantileDMatrix`
|
||||||
#' or `xgb.ExternalDMatrix` are not supported here.
|
#' or `xgb.ExternalDMatrix` are not supported here.
|
||||||
#' @param nrounds the max number of iterations
|
#' @param nrounds The max number of iterations.
|
||||||
#' @param nfold the original dataset is randomly partitioned into \code{nfold} equal size subsamples.
|
#' @param nfold The original dataset is randomly partitioned into `nfold` equal size subsamples.
|
||||||
#' @param prediction A logical value indicating whether to return the test fold predictions
|
#' @param prediction A logical value indicating whether to return the test fold predictions
|
||||||
#' from each CV model. This parameter engages the \code{\link{xgb.cb.cv.predict}} callback.
|
#' from each CV model. This parameter engages the [xgb.cb.cv.predict()] callback.
|
||||||
#' @param showsd \code{boolean}, whether to show standard deviation of cross validation
|
#' @param showsd Logical value indicating whether to show standard deviation of cross validation.
|
||||||
#' @param metrics, list of evaluation metrics to be used in cross validation,
|
#' @param metrics List of evaluation metrics to be used in cross validation,
|
||||||
#' when it is not specified, the evaluation metric is chosen according to objective function.
|
#' when it is not specified, the evaluation metric is chosen according to objective function.
|
||||||
#' Possible options are:
|
#' Possible options are:
|
||||||
#' \itemize{
|
#' - `error`: Binary classification error rate
|
||||||
#' \item \code{error} binary classification error rate
|
#' - `rmse`: Root mean square error
|
||||||
#' \item \code{rmse} Rooted mean square error
|
#' - `logloss`: Negative log-likelihood function
|
||||||
#' \item \code{logloss} negative log-likelihood function
|
#' - `mae`: Mean absolute error
|
||||||
#' \item \code{mae} Mean absolute error
|
#' - `mape`: Mean absolute percentage error
|
||||||
#' \item \code{mape} Mean absolute percentage error
|
#' - `auc`: Area under curve
|
||||||
#' \item \code{auc} Area under curve
|
#' - `aucpr`: Area under PR curve
|
||||||
#' \item \code{aucpr} Area under PR curve
|
#' - `merror`: Exact matching error used to evaluate multi-class classification
|
||||||
#' \item \code{merror} Exact matching error, used to evaluate multi-class classification
|
#' @param obj Customized objective function. Returns gradient and second order
|
||||||
#' }
|
#' gradient with given prediction and dtrain.
|
||||||
#' @param obj customized objective function. Returns gradient and second order
|
#' @param feval Customized evaluation function. Returns
|
||||||
#' gradient with given prediction and dtrain.
|
#' `list(metric='metric-name', value='metric-value')` with given prediction and dtrain.
|
||||||
#' @param feval customized evaluation function. Returns
|
#' @param stratified Logical flag indicating whether sampling of folds should be stratified
|
||||||
#' \code{list(metric='metric-name', value='metric-value')} with given
|
#' by the values of outcome labels. For real-valued labels in regression objectives,
|
||||||
#' prediction and dtrain.
|
#' stratification will be done by discretizing the labels into up to 5 buckets beforehand.
|
||||||
#' @param stratified A \code{boolean} indicating whether sampling of folds should be stratified
|
|
||||||
#' by the values of outcome labels. For real-valued labels in regression objectives,
|
|
||||||
#' stratification will be done by discretizing the labels into up to 5 buckets beforehand.
|
|
||||||
#'
|
#'
|
||||||
#' If passing "auto", will be set to `TRUE` if the objective in `params` is a classification
|
#' If passing "auto", will be set to `TRUE` if the objective in `params` is a classification
|
||||||
#' objective (from XGBoost's built-in objectives, doesn't apply to custom ones), and to
|
#' objective (from XGBoost's built-in objectives, doesn't apply to custom ones), and to
|
||||||
#' `FALSE` otherwise.
|
#' `FALSE` otherwise.
|
||||||
#'
|
#'
|
||||||
#' This parameter is ignored when `data` has a `group` field - in such case, the splitting
|
#' This parameter is ignored when `data` has a `group` field - in such case, the splitting
|
||||||
#' will be based on whole groups (note that this might make the folds have different sizes).
|
#' will be based on whole groups (note that this might make the folds have different sizes).
|
||||||
#'
|
#'
|
||||||
#' Value `TRUE` here is \bold{not} supported for custom objectives.
|
#' Value `TRUE` here is **not** supported for custom objectives.
|
||||||
#' @param folds \code{list} provides a possibility to use a list of pre-defined CV folds
|
#' @param folds List with pre-defined CV folds (each element must be a vector of test fold's indices).
|
||||||
#' (each element must be a vector of test fold's indices). When folds are supplied,
|
#' When folds are supplied, the `nfold` and `stratified` parameters are ignored.
|
||||||
#' the \code{nfold} and \code{stratified} parameters are ignored.
|
|
||||||
#'
|
#'
|
||||||
#' If `data` has a `group` field and the objective requires this field, each fold (list element)
|
#' If `data` has a `group` field and the objective requires this field, each fold (list element)
|
||||||
#' must additionally have two attributes (retrievable through \link{attributes}) named `group_test`
|
#' must additionally have two attributes (retrievable through `attributes`) named `group_test`
|
||||||
#' and `group_train`, which should hold the `group` to assign through \link{setinfo.xgb.DMatrix} to
|
#' and `group_train`, which should hold the `group` to assign through [setinfo.xgb.DMatrix()] to
|
||||||
#' the resulting DMatrices.
|
#' the resulting DMatrices.
|
||||||
#' @param train_folds \code{list} list specifying which indicies to use for training. If \code{NULL}
|
#' @param train_folds List specifying which indices to use for training. If `NULL`
|
||||||
#' (the default) all indices not specified in \code{folds} will be used for training.
|
#' (the default) all indices not specified in `folds` will be used for training.
|
||||||
#'
|
#'
|
||||||
#' This is not supported when `data` has `group` field.
|
#' This is not supported when `data` has `group` field.
|
||||||
#' @param verbose \code{boolean}, print the statistics during the process
|
#' @param verbose Logical flag. Should statistics be printed during the process?
|
||||||
#' @param print_every_n Print each n-th iteration evaluation messages when \code{verbose>0}.
|
#' @param print_every_n Print each nth iteration evaluation messages when `verbose > 0`.
|
||||||
#' Default is 1 which means all messages are printed. This parameter is passed to the
|
#' Default is 1 which means all messages are printed. This parameter is passed to the
|
||||||
#' \code{\link{xgb.cb.print.evaluation}} callback.
|
#' [xgb.cb.print.evaluation()] callback.
|
||||||
#' @param early_stopping_rounds If \code{NULL}, the early stopping function is not triggered.
|
#' @param early_stopping_rounds If `NULL`, the early stopping function is not triggered.
|
||||||
#' If set to an integer \code{k}, training with a validation set will stop if the performance
|
#' If set to an integer `k`, training with a validation set will stop if the performance
|
||||||
#' doesn't improve for \code{k} rounds.
|
#' doesn't improve for `k` rounds.
|
||||||
#' Setting this parameter engages the \code{\link{xgb.cb.early.stop}} callback.
|
#' Setting this parameter engages the [xgb.cb.early.stop()] callback.
|
||||||
#' @param maximize If \code{feval} and \code{early_stopping_rounds} are set,
|
#' @param maximize If `feval` and `early_stopping_rounds` are set,
|
||||||
#' then this parameter must be set as well.
|
#' then this parameter must be set as well.
|
||||||
#' When it is \code{TRUE}, it means the larger the evaluation score the better.
|
#' When it is `TRUE`, it means the larger the evaluation score the better.
|
||||||
#' This parameter is passed to the \code{\link{xgb.cb.early.stop}} callback.
|
#' This parameter is passed to the [xgb.cb.early.stop()] callback.
|
||||||
#' @param callbacks a list of callback functions to perform various task during boosting.
|
#' @param callbacks A list of callback functions to perform various tasks during boosting.
|
||||||
#' See \code{\link{xgb.Callback}}. Some of the callbacks are automatically created depending on the
|
#' See [xgb.Callback()]. Some of the callbacks are automatically created depending on the
|
||||||
#' parameters' values. User can provide either existing or their own callback methods in order
|
#' parameters' values. User can provide either existing or their own callback methods in order
|
||||||
#' to customize the training process.
|
#' to customize the training process.
|
||||||
#' @param ... other parameters to pass to \code{params}.
|
#' @param ... Other parameters to pass to `params`.
|
||||||
#'
|
#'
|
||||||
#' @details
|
#' @details
|
||||||
#' The original sample is randomly partitioned into \code{nfold} equal size subsamples.
|
#' The original sample is randomly partitioned into `nfold` equal size subsamples.
|
||||||
#'
|
#'
|
||||||
#' Of the \code{nfold} subsamples, a single subsample is retained as the validation data for testing the model,
|
#' Of the `nfold` subsamples, a single subsample is retained as the validation data for testing the model,
|
||||||
#' and the remaining \code{nfold - 1} subsamples are used as training data.
|
#' and the remaining `nfold - 1` subsamples are used as training data.
|
||||||
#'
|
#'
|
||||||
#' The cross-validation process is then repeated \code{nrounds} times, with each of the
|
#' The cross-validation process is then repeated `nfold` times, with each of the
|
||||||
#' \code{nfold} subsamples used exactly once as the validation data.
|
#' `nfold` subsamples used exactly once as the validation data.
|
||||||
#'
|
#'
|
||||||
#' All observations are used for both training and validation.
|
#' All observations are used for both training and validation.
|
||||||
#'
|
#'
|
||||||
#' Adapted from \url{https://en.wikipedia.org/wiki/Cross-validation_\%28statistics\%29}
|
#' Adapted from \url{https://en.wikipedia.org/wiki/Cross-validation_\%28statistics\%29}
|
||||||
#'
|
#'
|
||||||
#' @return
|
#' @return
|
||||||
#' An object of class \code{xgb.cv.synchronous} with the following elements:
|
#' An object of class 'xgb.cv.synchronous' with the following elements:
|
||||||
#' \itemize{
|
#' - `call`: Function call.
|
||||||
#' \item \code{call} a function call.
|
#' - `params`: Parameters that were passed to the xgboost library. Note that it does not
|
||||||
#' \item \code{params} parameters that were passed to the xgboost library. Note that it does not
|
#' capture parameters changed by the [xgb.cb.reset.parameters()] callback.
|
||||||
#' capture parameters changed by the \code{\link{xgb.cb.reset.parameters}} callback.
|
#' - `evaluation_log`: Evaluation history stored as a `data.table` with the
|
||||||
#' \item \code{evaluation_log} evaluation history stored as a \code{data.table} with the
|
#' first column corresponding to iteration number and the rest corresponding to the
|
||||||
#' first column corresponding to iteration number and the rest corresponding to the
|
#' CV-based evaluation means and standard deviations for the training and test CV-sets.
|
||||||
#' CV-based evaluation means and standard deviations for the training and test CV-sets.
|
#' It is created by the [xgb.cb.evaluation.log()] callback.
|
||||||
#' It is created by the \code{\link{xgb.cb.evaluation.log}} callback.
|
#' - `niter`: Number of boosting iterations.
|
||||||
#' \item \code{niter} number of boosting iterations.
|
#' - `nfeatures`: Number of features in training data.
|
||||||
#' \item \code{nfeatures} number of features in training data.
|
#' - `folds`: The list of CV folds' indices - either those passed through the `folds`
|
||||||
#' \item \code{folds} the list of CV folds' indices - either those passed through the \code{folds}
|
#' parameter or randomly generated.
|
||||||
#' parameter or randomly generated.
|
#' - `best_iteration`: Iteration number with the best evaluation metric value
|
||||||
#' \item \code{best_iteration} iteration number with the best evaluation metric value
|
#' (only available with early stopping).
|
||||||
#' (only available with early stopping).
|
|
||||||
#' }
|
|
||||||
#'
|
#'
|
||||||
#' Plus other potential elements that are the result of callbacks, such as a list `cv_predict` with
|
#' Plus other potential elements that are the result of callbacks, such as a list `cv_predict` with
|
||||||
#' a sub-element `pred` when passing `prediction = TRUE`, which is added by the \link{xgb.cb.cv.predict}
|
#' a sub-element `pred` when passing `prediction = TRUE`, which is added by the [xgb.cb.cv.predict()]
|
||||||
#' callback (note that one can also pass it manually under `callbacks` with different settings,
|
#' callback (note that one can also pass it manually under `callbacks` with different settings,
|
||||||
#' such as saving also the models created during cross validation); or a list `early_stop` which
|
#' such as saving also the models created during cross validation); or a list `early_stop` which
|
||||||
#' will contain elements such as `best_iteration` when using the early stopping callback (\link{xgb.cb.early.stop}).
|
#' will contain elements such as `best_iteration` when using the early stopping callback ([xgb.cb.early.stop()]).
|
||||||
#'
|
#'
|
||||||
#' @examples
|
#' @examples
|
||||||
#' data(agaricus.train, package='xgboost')
|
#' data(agaricus.train, package = "xgboost")
|
||||||
|
#'
|
||||||
#' dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
#' dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||||
#' cv <- xgb.cv(data = dtrain, nrounds = 3, nthread = 2, nfold = 5, metrics = list("rmse","auc"),
|
#'
|
||||||
#' max_depth = 3, eta = 1, objective = "binary:logistic")
|
#' cv <- xgb.cv(
|
||||||
|
#' data = dtrain,
|
||||||
|
#' nrounds = 3,
|
||||||
|
#' nthread = 2,
|
||||||
|
#' nfold = 5,
|
||||||
|
#' metrics = list("rmse", "auc"),
|
||||||
|
#' max_depth = 3,
|
||||||
|
#' eta = 1, objective = "binary:logistic"
|
||||||
|
#' )
|
||||||
#' print(cv)
|
#' print(cv)
|
||||||
#' print(cv, verbose=TRUE)
|
#' print(cv, verbose = TRUE)
|
||||||
#'
|
#'
|
||||||
#' @export
|
#' @export
|
||||||
xgb.cv <- function(params = list(), data, nrounds, nfold,
|
xgb.cv <- function(params = list(), data, nrounds, nfold,
|
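The random partitioning described under `@details`, together with the note that train-test splits rely on R's own RNG, can be sketched in base R. This is only an illustration: `make_folds` is a hypothetical helper, not part of xgboost, and the package's internal fold generation may differ. It returns a list of test-fold index vectors, the shape that the `folds` parameter describes.

```r
# Hypothetical helper (not part of xgboost): randomly partition row indices
# into `nfold` roughly equal-sized test folds using R's own RNG.
make_folds <- function(nrows, nfold) {
  shuffled <- sample.int(nrows)               # random permutation of 1..nrows
  fold_id <- rep_len(seq_len(nfold), nrows)   # 1, 2, ..., nfold, 1, 2, ...
  unname(split(shuffled, fold_id))            # list of test-index vectors
}

set.seed(123)                  # fix R's RNG so the split is reproducible
folds_a <- make_folds(100, 5)
set.seed(123)
folds_b <- make_folds(100, 5)

identical(folds_a, folds_b)    # same seed, same folds
sort(unlist(folds_a))          # every row lands in exactly one test fold
```

A list built this way could be supplied through the `folds` argument, in which case the documentation above states that `nfold` and `stratified` are ignored.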
||||||
@ -325,23 +325,31 @@ xgb.cv <- function(params = list(), data, nrounds, nfold,
|
|||||||
|
|
||||||
#' Print xgb.cv result
|
#' Print xgb.cv result
|
||||||
#'
|
#'
|
||||||
#' Prints formatted results of \code{xgb.cv}.
|
#' Prints formatted results of [xgb.cv()].
|
||||||
#'
|
#'
|
||||||
#' @param x an \code{xgb.cv.synchronous} object
|
#' @param x An `xgb.cv.synchronous` object.
|
||||||
#' @param verbose whether to print detailed data
|
#' @param verbose Whether to print detailed data.
|
||||||
#' @param ... passed to \code{data.table.print}
|
#' @param ... Passed to `data.table.print()`.
|
||||||
#'
|
#'
|
||||||
#' @details
|
#' @details
|
||||||
#' When not verbose, it would only print the evaluation results,
|
#' When not verbose, it would only print the evaluation results,
|
||||||
#' including the best iteration (when available).
|
#' including the best iteration (when available).
|
||||||
#'
|
#'
|
||||||
#' @examples
|
#' @examples
|
||||||
#' data(agaricus.train, package='xgboost')
|
#' data(agaricus.train, package = "xgboost")
|
||||||
|
#'
|
||||||
#' train <- agaricus.train
|
#' train <- agaricus.train
|
||||||
#' cv <- xgb.cv(data = xgb.DMatrix(train$data, label = train$label), nfold = 5, max_depth = 2,
|
#' cv <- xgb.cv(
|
||||||
#' eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
|
#' data = xgb.DMatrix(train$data, label = train$label),
|
||||||
|
#' nfold = 5,
|
||||||
|
#' max_depth = 2,
|
||||||
|
#' eta = 1,
|
||||||
|
#' nthread = 2,
|
||||||
|
#' nrounds = 2,
|
||||||
|
#' objective = "binary:logistic"
|
||||||
|
#' )
|
||||||
#' print(cv)
|
#' print(cv)
|
||||||
#' print(cv, verbose=TRUE)
|
#' print(cv, verbose = TRUE)
|
||||||
#'
|
#'
|
||||||
#' @rdname print.xgb.cv
|
#' @rdname print.xgb.cv
|
||||||
#' @method print xgb.cv.synchronous
|
#' @method print xgb.cv.synchronous
|
||||||
|
|||||||
@ -28,7 +28,7 @@
|
|||||||
#' This function uses [GraphViz](https://www.graphviz.org/) as DiagrammeR backend.
|
#' This function uses [GraphViz](https://www.graphviz.org/) as DiagrammeR backend.
|
||||||
#'
|
#'
|
||||||
#' @param model Object of class `xgb.Booster`. If it contains feature names (they can be set through
|
#' @param model Object of class `xgb.Booster`. If it contains feature names (they can be set through
|
||||||
#' \link{setinfo}), they will be used in the output from this function.
|
#' [setinfo()]), they will be used in the output from this function.
|
||||||
#' @param trees An integer vector of tree indices that should be used.
|
#' @param trees An integer vector of tree indices that should be used.
|
||||||
#' The default (`NULL`) uses all trees.
|
#' The default (`NULL`) uses all trees.
|
||||||
#' Useful, e.g., in multiclass classification to get only
|
#' Useful, e.g., in multiclass classification to get only
|
||||||
|
|||||||
@ -2,7 +2,7 @@
|
|||||||
#'
|
#'
|
||||||
#' Save XGBoost model to a file in binary or JSON format.
|
#' Save XGBoost model to a file in binary or JSON format.
|
||||||
#'
|
#'
|
||||||
#' @param model Model object of \code{xgb.Booster} class.
|
#' @param model Model object of `xgb.Booster` class.
|
||||||
#' @param fname Name of the file to write. Its extension determines the serialization format:
|
#' @param fname Name of the file to write. Its extension determines the serialization format:
|
||||||
#' - ".ubj": Use the universal binary JSON format (recommended).
|
#' - ".ubj": Use the universal binary JSON format (recommended).
|
||||||
#' This format uses binary types for e.g. floating point numbers, thereby preventing any loss
|
#' This format uses binary types for e.g. floating point numbers, thereby preventing any loss
|
||||||
|
|||||||
@ -1,201 +1,186 @@
|
|||||||
#' eXtreme Gradient Boosting Training
|
#' eXtreme Gradient Boosting Training
|
||||||
#'
|
#'
|
||||||
#' \code{xgb.train} is an advanced interface for training an xgboost model.
|
#' `xgb.train()` is an advanced interface for training an xgboost model.
|
||||||
#' The \code{xgboost} function is a simpler wrapper for \code{xgb.train}.
|
#' The [xgboost()] function is a simpler wrapper for `xgb.train()`.
|
||||||
#'
|
#'
|
||||||
#' @param params the list of parameters. The complete list of parameters is
|
#' @param params the list of parameters. The complete list of parameters is
|
||||||
#' available in the \href{http://xgboost.readthedocs.io/en/latest/parameter.html}{online documentation}. Below
|
#' available in the [online documentation](http://xgboost.readthedocs.io/en/latest/parameter.html).
|
||||||
#' is a shorter summary:
|
#' Below is a shorter summary:
|
||||||
#'
|
#'
|
||||||
#' 1. General Parameters
|
#' **1. General Parameters**
|
||||||
#'
|
#'
|
||||||
#' \itemize{
|
#' - `booster`: Which booster to use, can be `gbtree` or `gblinear`. Default: `gbtree`.
|
||||||
#' \item \code{booster} which booster to use, can be \code{gbtree} or \code{gblinear}. Default: \code{gbtree}.
|
|
||||||
#' }
|
|
||||||
#'
|
#'
|
||||||
#' 2. Booster Parameters
|
#' **2. Booster Parameters**
|
||||||
#'
|
#'
|
||||||
#' 2.1. Parameters for Tree Booster
|
#' **2.1. Parameters for Tree Booster**
|
||||||
|
#' - `eta`: The learning rate: scale the contribution of each tree by a factor of `0 < eta < 1`
|
||||||
|
#' when it is added to the current approximation.
|
||||||
|
#' Used to prevent overfitting by making the boosting process more conservative.
|
||||||
|
#' Lower value for `eta` implies larger value for `nrounds`: low `eta` value means model
|
||||||
|
#' more robust to overfitting but slower to compute. Default: 0.3.
|
||||||
|
#' - `gamma`: Minimum loss reduction required to make a further partition on a leaf node of the tree.
|
||||||
|
#' The larger, the more conservative the algorithm will be.
|
||||||
|
#' - `max_depth`: Maximum depth of a tree. Default: 6.
|
||||||
|
#' - `min_child_weight`: Minimum sum of instance weight (hessian) needed in a child.
|
||||||
|
#' If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight,
|
||||||
|
#' then the building process will give up further partitioning.
|
||||||
|
#' In linear regression mode, this simply corresponds to minimum number of instances needed to be in each node.
|
||||||
|
#' The larger, the more conservative the algorithm will be. Default: 1.
|
||||||
|
#' - `subsample`: Subsample ratio of the training instance.
|
||||||
|
#' Setting it to 0.5 means that xgboost randomly collected half of the data instances to grow trees
|
||||||
|
#' and this will prevent overfitting. It makes computation shorter (because there is less data to analyse).
|
||||||
|
#' It is advised to use this parameter with `eta` and increase `nrounds`. Default: 1.
|
||||||
|
#' - `colsample_bytree`: Subsample ratio of columns when constructing each tree. Default: 1.
|
||||||
|
#' - `lambda`: L2 regularization term on weights. Default: 1.
|
||||||
|
#' - `alpha`: L1 regularization term on weights. (there is no L1 reg on bias because it is not important). Default: 0.
|
||||||
|
#' - `num_parallel_tree`: Experimental parameter. Number of trees to grow per round.
|
||||||
|
#' Useful to test Random Forest through XGBoost.
|
||||||
|
#' (set `colsample_bytree < 1`, `subsample < 1` and `round = 1`) accordingly.
|
||||||
|
#' Default: 1.
|
||||||
|
#' - `monotone_constraints`: A numerical vector consisting of `1`, `0` and `-1`, with its length
|
||||||
|
#' equal to the number of features in the training data.
|
||||||
|
#' `1` is increasing, `-1` is decreasing and `0` is no constraint.
|
||||||
|
#' - `interaction_constraints`: A list of vectors specifying feature indices of permitted interactions.
|
||||||
|
#' Each item of the list represents one permitted interaction where specified features are allowed to interact with each other.
|
||||||
|
#' Feature index values should start from `0` (`0` references the first column).
|
||||||
|
#' Leave argument unspecified for no interaction constraints.
|
||||||
#'
|
#'
|
||||||
#' \itemize{
|
#' **2.2. Parameters for Linear Booster**
|
||||||
#' \item{ \code{eta} control the learning rate: scale the contribution of each tree by a factor of \code{0 < eta < 1}
|
|
||||||
#' when it is added to the current approximation.
|
|
||||||
#' Used to prevent overfitting by making the boosting process more conservative.
|
|
||||||
#' Lower value for \code{eta} implies larger value for \code{nrounds}: low \code{eta} value means model
|
|
||||||
#' more robust to overfitting but slower to compute. Default: 0.3}
|
|
||||||
#' \item{ \code{gamma} minimum loss reduction required to make a further partition on a leaf node of the tree.
|
|
||||||
#' the larger, the more conservative the algorithm will be.}
|
|
||||||
#' \item \code{max_depth} maximum depth of a tree. Default: 6
|
|
||||||
#' \item{\code{min_child_weight} minimum sum of instance weight (hessian) needed in a child.
|
|
||||||
#' If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight,
|
|
||||||
#' then the building process will give up further partitioning.
|
|
||||||
#' In linear regression mode, this simply corresponds to minimum number of instances needed to be in each node.
|
|
||||||
#' The larger, the more conservative the algorithm will be. Default: 1}
|
|
||||||
#' \item{ \code{subsample} subsample ratio of the training instance.
|
|
||||||
#' Setting it to 0.5 means that xgboost randomly collected half of the data instances to grow trees
|
|
||||||
#' and this will prevent overfitting. It makes computation shorter (because less data to analyse).
|
|
||||||
#' It is advised to use this parameter with \code{eta} and increase \code{nrounds}. Default: 1}
|
|
||||||
#' \item \code{colsample_bytree} subsample ratio of columns when constructing each tree. Default: 1
|
|
||||||
#' \item \code{lambda} L2 regularization term on weights. Default: 1
|
|
||||||
#' \item \code{alpha} L1 regularization term on weights. (there is no L1 reg on bias because it is not important). Default: 0
|
|
||||||
#' \item{ \code{num_parallel_tree} Experimental parameter. number of trees to grow per round.
|
|
||||||
#' Useful to test Random Forest through XGBoost
|
|
||||||
#' (set \code{colsample_bytree < 1}, \code{subsample < 1} and \code{round = 1}) accordingly.
|
|
||||||
#' Default: 1}
|
|
||||||
#' \item{ \code{monotone_constraints} A numerical vector consists of \code{1}, \code{0} and \code{-1} with its length
|
|
||||||
#' equals to the number of features in the training data.
|
|
||||||
#' \code{1} is increasing, \code{-1} is decreasing and \code{0} is no constraint.}
|
|
||||||
#' \item{ \code{interaction_constraints} A list of vectors specifying feature indices of permitted interactions.
|
|
||||||
#' Each item of the list represents one permitted interaction where specified features are allowed to interact with each other.
|
|
||||||
#' Feature index values should start from \code{0} (\code{0} references the first column).
|
|
||||||
#' Leave argument unspecified for no interaction constraints.}
|
|
||||||
#' }
|
|
||||||
#'
|
#'
|
||||||
#' 2.2. Parameters for Linear Booster
|
#' - `lambda`: L2 regularization term on weights. Default: 0.
|
||||||
|
#' - `lambda_bias`: L2 regularization term on bias. Default: 0.
|
||||||
|
#' - `alpha`: L1 regularization term on weights. (there is no L1 reg on bias because it is not important). Default: 0.
|
||||||
#'
|
#'
|
||||||
#' \itemize{
|
#' **3. Task Parameters**
|
||||||
#' \item \code{lambda} L2 regularization term on weights. Default: 0
|
|
||||||
#' \item \code{lambda_bias} L2 regularization term on bias. Default: 0
|
|
||||||
#' \item \code{alpha} L1 regularization term on weights. (there is no L1 reg on bias because it is not important). Default: 0
|
|
||||||
#' }
|
|
||||||
#'
|
#'
|
||||||
#' 3. Task Parameters
|
#' - `objective`: Specifies the learning task and the corresponding learning objective.
|
||||||
|
#' Users can pass a self-defined function to it. The default objective options are below:
|
||||||
|
#' - `reg:squarederror`: Regression with squared loss (default).
|
||||||
|
#' - `reg:squaredlogerror`: Regression with squared log loss \eqn{1/2 \cdot (\log(pred + 1) - \log(label + 1))^2}.
|
||||||
|
#' All inputs are required to be greater than -1.
|
||||||
|
#' Also, see metric rmsle for possible issue with this objective.
|
||||||
|
#' - `reg:logistic`: Logistic regression.
|
||||||
|
#' - `reg:pseudohubererror`: Regression with Pseudo Huber loss, a twice differentiable alternative to absolute loss.
|
||||||
|
#' - `binary:logistic`: Logistic regression for binary classification. Output probability.
|
||||||
|
#' - `binary:logitraw`: Logistic regression for binary classification, output score before logistic transformation.
|
||||||
|
#' - `binary:hinge`: Hinge loss for binary classification. This makes predictions of 0 or 1, rather than producing probabilities.
|
||||||
|
#' - `count:poisson`: Poisson regression for count data, output mean of Poisson distribution.
|
||||||
|
#' The parameter `max_delta_step` is set to 0.7 by default in poisson regression
|
||||||
|
#' (used to safeguard optimization).
|
||||||
|
#' - `survival:cox`: Cox regression for right censored survival time data (negative values are considered right censored).
|
||||||
|
#' Note that predictions are returned on the hazard ratio scale (i.e., as HR = exp(marginal_prediction) in the proportional
|
||||||
|
#' hazard function \eqn{h(t) = h_0(t) \cdot HR}).
|
||||||
|
#' - `survival:aft`: Accelerated failure time model for censored survival time data. See
|
||||||
|
#' [Survival Analysis with Accelerated Failure Time](https://xgboost.readthedocs.io/en/latest/tutorials/aft_survival_analysis.html)
|
||||||
|
#' for details.
|
||||||
|
#' The parameter `aft_loss_distribution` specifies the Probability Density Function
|
||||||
|
#' used by `survival:aft` and the `aft-nloglik` metric.
|
||||||
|
#' - `multi:softmax`: Set xgboost to do multiclass classification using the softmax objective.
|
||||||
|
#' Class is represented by a number and should be from 0 to `num_class - 1`.
|
||||||
|
#' - `multi:softprob`: Same as softmax, but prediction outputs a vector of ndata * nclass elements, which can be
|
||||||
|
#' further reshaped to ndata, nclass matrix. The result contains predicted probabilities of each data point belonging
|
||||||
|
#' to each class.
|
||||||
|
#' - `rank:pairwise`: Set XGBoost to do ranking task by minimizing the pairwise loss.
|
||||||
|
#' - `rank:ndcg`: Use LambdaMART to perform list-wise ranking where
|
||||||
|
#' [Normalized Discounted Cumulative Gain (NDCG)](https://en.wikipedia.org/wiki/Discounted_cumulative_gain) is maximized.
|
||||||
|
#' - `rank:map`: Use LambdaMART to perform list-wise ranking where
|
||||||
|
#' [Mean Average Precision (MAP)](https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Mean_average_precision)
|
||||||
|
#' is maximized.
|
||||||
|
#' - `reg:gamma`: Gamma regression with log-link. Output is a mean of gamma distribution.
|
||||||
|
#' It might be useful, e.g., for modeling insurance claims severity, or for any outcome that might be
|
||||||
|
#' [gamma-distributed](https://en.wikipedia.org/wiki/Gamma_distribution#Applications).
|
||||||
|
#' - `reg:tweedie`: Tweedie regression with log-link.
|
||||||
|
#' It might be useful, e.g., for modeling total loss in insurance, or for any outcome that might be
|
||||||
|
#' [Tweedie-distributed](https://en.wikipedia.org/wiki/Tweedie_distribution#Applications).
|
||||||
#'
|
#'
|
||||||
#' \itemize{
|
#' For custom objectives, one should pass a function taking as input the current predictions (as a numeric
|
||||||
#' \item{ \code{objective} specify the learning task and the corresponding learning objective, users can pass a self-defined function to it.
|
#' vector or matrix) and the training data (as an `xgb.DMatrix` object) that will return a list with elements
|
||||||
#' The default objective options are below:
|
#' `grad` and `hess`, which should be numeric vectors or matrices with number of rows matching to the numbers
|
||||||
#' \itemize{
|
#' of rows in the training data (same shape as the predictions that are passed as input to the function).
|
||||||
#' \item \code{reg:squarederror} Regression with squared loss (Default).
|
#' For multi-valued custom objectives, should have shape `[nrows, ntargets]`. Note that negative values of
|
||||||
#' \item{ \code{reg:squaredlogerror}: regression with squared log loss \eqn{1/2 * (log(pred + 1) - log(label + 1))^2}.
|
#' the Hessian will be clipped, so one might consider using the expected Hessian (Fisher information) if the
|
||||||
#' All inputs are required to be greater than -1.
|
#' objective is non-convex.
|
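The custom objective contract described above (a function of the current predictions and the training `xgb.DMatrix` that returns a list with `grad` and `hess`) might look like the following minimal sketch for the logistic loss. The function names are hypothetical, and the sketch assumes the predictions handed to the objective are raw, untransformed scores. Only `getinfo()` is an xgboost call; the derivative computation is plain R.

```r
# Gradient and Hessian of the logistic loss with respect to the raw scores.
# Assumption: `preds` are untransformed scores and `labels` are 0/1.
logistic_grad_hess <- function(preds, labels) {
  prob <- 1 / (1 + exp(-preds))   # raw score -> probability
  list(
    grad = prob - labels,         # first-order derivative
    hess = prob * (1 - prob)      # second-order derivative, never negative
  )
}

# Objective with the (predictions, training data) signature described above.
# getinfo() comes from the xgboost package, so this wrapper only runs when
# that package is loaded and `dtrain` is an xgb.DMatrix.
logistic_obj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  logistic_grad_hess(preds, labels)
}

# Quick check of the pure-R part, no xgboost needed:
res <- logistic_grad_hess(preds = c(0, 0), labels = c(0, 1))
res$grad   # 0.5 -0.5
res$hess   # 0.25 0.25
```

Because this Hessian is never negative, the clipping of negative Hessian values mentioned above does not come into play for this particular loss.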
||||||
#' Also, see metric rmsle for possible issue with this objective.}
|
|
||||||
#' \item \code{reg:logistic} logistic regression.
|
|
||||||
#' \item \code{reg:pseudohubererror}: regression with Pseudo Huber loss, a twice differentiable alternative to absolute loss.
|
|
||||||
#' \item \code{binary:logistic} logistic regression for binary classification. Output probability.
|
|
||||||
#' \item \code{binary:logitraw} logistic regression for binary classification, output score before logistic transformation.
|
|
||||||
#' \item \code{binary:hinge}: hinge loss for binary classification. This makes predictions of 0 or 1, rather than producing probabilities.
|
|
||||||
#' \item{ \code{count:poisson}: Poisson regression for count data, output mean of Poisson distribution.
|
|
||||||
#' \code{max_delta_step} is set to 0.7 by default in poisson regression (used to safeguard optimization).}
|
|
||||||
#' \item{ \code{survival:cox}: Cox regression for right censored survival time data (negative values are considered right censored).
|
|
||||||
#' Note that predictions are returned on the hazard ratio scale (i.e., as HR = exp(marginal_prediction) in the proportional
|
|
||||||
#' hazard function \code{h(t) = h0(t) * HR)}.}
|
|
||||||
#' \item{ \code{survival:aft}: Accelerated failure time model for censored survival time data. See
|
|
||||||
#' \href{https://xgboost.readthedocs.io/en/latest/tutorials/aft_survival_analysis.html}{Survival Analysis with Accelerated Failure Time}
|
|
||||||
#' for details.}
|
|
||||||
#' \item \code{aft_loss_distribution}: Probability Density Function used by \code{survival:aft} and \code{aft-nloglik} metric.
|
|
||||||
#' \item{ \code{multi:softmax} set xgboost to do multiclass classification using the softmax objective.
|
|
||||||
#' Class is represented by a number and should be from 0 to \code{num_class - 1}.}
|
|
||||||
#' \item{ \code{multi:softprob} same as softmax, but prediction outputs a vector of ndata * nclass elements, which can be
|
|
||||||
#' further reshaped to ndata, nclass matrix. The result contains predicted probabilities of each data point belonging
|
|
||||||
#' to each class.}
|
|
||||||
#' \item \code{rank:pairwise} set xgboost to do ranking task by minimizing the pairwise loss.
|
|
||||||
#' \item{ \code{rank:ndcg}: Use LambdaMART to perform list-wise ranking where
|
|
||||||
#' \href{https://en.wikipedia.org/wiki/Discounted_cumulative_gain}{Normalized Discounted Cumulative Gain (NDCG)} is maximized.}
|
|
||||||
#' \item{ \code{rank:map}: Use LambdaMART to perform list-wise ranking where
|
|
||||||
#' \href{https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Mean_average_precision}{Mean Average Precision (MAP)}
|
|
||||||
#' is maximized.}
|
|
||||||
#' \item{ \code{reg:gamma}: gamma regression with log-link.
|
|
||||||
#' Output is a mean of gamma distribution.
|
|
||||||
#' It might be useful, e.g., for modeling insurance claims severity, or for any outcome that might be
|
|
||||||
#' \href{https://en.wikipedia.org/wiki/Gamma_distribution#Applications}{gamma-distributed}.}
|
|
||||||
#' \item{ \code{reg:tweedie}: Tweedie regression with log-link.
|
|
||||||
#' It might be useful, e.g., for modeling total loss in insurance, or for any outcome that might be
|
|
||||||
#' \href{https://en.wikipedia.org/wiki/Tweedie_distribution#Applications}{Tweedie-distributed}.}
|
|
||||||
#' }
|
|
||||||
#'
|
#'
|
||||||
#' For custom objectives, one should pass a function taking as input the current predictions (as a numeric
|
#' See the tutorials [Custom Objective and Evaluation Metric](https://xgboost.readthedocs.io/en/stable/tutorials/custom_metric_obj.html)
|
||||||
#' vector or matrix) and the training data (as an `xgb.DMatrix` object) that will return a list with elements
|
#' and [Advanced Usage of Custom Objectives](https://xgboost.readthedocs.io/en/stable/tutorials/advanced_custom_obj)
|
||||||
#' `grad` and `hess`, which should be numeric vectors or matrices with number of rows matching to the numbers
|
#' for more information about custom objectives.
|
||||||
#' of rows in the training data (same shape as the predictions that are passed as input to the function).
|
|
||||||
#' For multi-valued custom objectives, should have shape `[nrows, ntargets]`. Note that negative values of
|
|
||||||
#' the Hessian will be clipped, so one might consider using the expected Hessian (Fisher information) if the
|
|
||||||
#' objective is non-convex.
|
|
||||||
#'
|
#'
|
||||||
#' See the tutorials \href{https://xgboost.readthedocs.io/en/stable/tutorials/custom_metric_obj.html}{
|
#' - `base_score`: The initial prediction score of all instances, global bias. Default: 0.5.
|
||||||
#' Custom Objective and Evaluation Metric} and \href{https://xgboost.readthedocs.io/en/stable/tutorials/advanced_custom_obj}{
|
#' - `eval_metric`: Evaluation metrics for validation data.
|
||||||
#' Advanced Usage of Custom Objectives} for more information about custom objectives.
|
#' Users can pass a self-defined function to it.
|
||||||
#' }
|
#' Default: metric will be assigned according to objective
|
||||||
#' \item \code{base_score} the initial prediction score of all instances, global bias. Default: 0.5
|
#' (rmse for regression, and error for classification, mean average precision for ranking).
|
||||||
#' \item{ \code{eval_metric} evaluation metrics for validation data.
|
#' List is provided in detail section.
|
||||||
#' Users can pass a self-defined function to it.
|
#' @param data Training dataset. `xgb.train()` accepts only an `xgb.DMatrix` as the input.
|
||||||
#' Default: metric will be assigned according to objective
|
#' [xgboost()], in addition, also accepts `matrix`, `dgCMatrix`, or name of a local data file.
|
||||||
#' (rmse for regression, and error for classification, mean average precision for ranking).
|
#' @param nrounds Max number of boosting iterations.
|
||||||
#' List is provided in detail section.}
|
|
||||||
#' }
|
|
||||||
#'
|
|
||||||
#' @param data training dataset. \code{xgb.train} accepts only an \code{xgb.DMatrix} as the input.
|
|
||||||
#' \code{xgboost}, in addition, also accepts \code{matrix}, \code{dgCMatrix}, or name of a local data file.
|
|
||||||
#' @param nrounds max number of boosting iterations.
|
|
||||||
#' @param evals Named list of `xgb.DMatrix` datasets to use for evaluating model performance.
|
#' @param evals Named list of `xgb.DMatrix` datasets to use for evaluating model performance.
|
||||||
#' Metrics specified in either \code{eval_metric} or \code{feval} will be computed for each
|
#' Metrics specified in either `eval_metric` or `feval` will be computed for each
|
||||||
#' of these datasets during each boosting iteration, and stored in the end as a field named
|
#' of these datasets during each boosting iteration, and stored in the end as a field named
|
||||||
#' \code{evaluation_log} in the resulting object. When either \code{verbose>=1} or
|
#' `evaluation_log` in the resulting object. When either `verbose>=1` or
|
||||||
#' \code{\link{xgb.cb.print.evaluation}} callback is engaged, the performance results are continuously
|
#' [xgb.cb.print.evaluation()] callback is engaged, the performance results are continuously
|
||||||
#' printed out during the training.
|
#' printed out during the training.
|
||||||
#' E.g., specifying \code{evals=list(validation1=mat1, validation2=mat2)} allows tracking
|
#' E.g., specifying `evals=list(validation1=mat1, validation2=mat2)` allows tracking
|
||||||
#' the performance of each round's model on mat1 and mat2.
|
#' the performance of each round's model on mat1 and mat2.
|
||||||
#' @param obj customized objective function. Should take two arguments: the first one will be the
|
#' @param obj Customized objective function. Should take two arguments: the first one will be the
|
||||||
#' current predictions (either a numeric vector or matrix depending on the number of targets / classes),
|
#' current predictions (either a numeric vector or matrix depending on the number of targets / classes),
|
||||||
#' and the second one will be the `data` DMatrix object that is used for training.
|
#' and the second one will be the `data` DMatrix object that is used for training.
|
||||||
#'
|
#'
|
||||||
#' It should return a list with two elements `grad` and `hess` (in that order), as either
|
#' It should return a list with two elements `grad` and `hess` (in that order), as either
|
||||||
#' numeric vectors or numeric matrices depending on the number of targets / classes (same
|
#' numeric vectors or numeric matrices depending on the number of targets / classes (same
|
||||||
#' dimension as the predictions that are passed as first argument).
|
#' dimension as the predictions that are passed as first argument).
|
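The `obj` contract described above (two arguments in, a list with `grad` and `hess` out) can be sketched as follows. This is a minimal illustration for a single-target squared-error loss, not the package's own implementation. The names `sq_err_grad_hess` and `sq_err_obj` are hypothetical. The wrapper is only defined here, and calling it assumes the xgboost package is installed so that `getinfo()` can read labels from the `xgb.DMatrix`.

```r
# Gradient and Hessian of the squared-error loss 0.5 * (pred - label)^2,
# taken with respect to the prediction. Pure base R, no xgboost needed.
sq_err_grad_hess <- function(preds, labels) {
  list(
    grad = preds - labels,        # first derivative
    hess = rep(1, length(preds))  # second derivative is constant
  )
}

# Hypothetical wrapper with the two-argument signature described above.
# It is only defined here; calling it requires xgboost for getinfo().
sq_err_obj <- function(preds, dtrain) {
  labels <- xgboost::getinfo(dtrain, "label")
  sq_err_grad_hess(preds, labels)
}

# Check the helper on toy values.
res <- sq_err_grad_hess(preds = c(0.5, 2, -1), labels = c(1, 2, 0))
str(res)
```

Keeping the arithmetic in a helper that takes plain vectors makes the gradient easy to check without building a DMatrix. The wrapper could then be passed as `obj = sq_err_obj`.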
||||||
#' @param feval customized evaluation function. Just like `obj`, should take two arguments, with
|
#' @param feval Customized evaluation function. Just like `obj`, should take two arguments, with
|
||||||
#' the first one being the predictions and the second one the `data` DMatrix.
|
#' the first one being the predictions and the second one the `data` DMatrix.
|
||||||
#'
|
#'
|
||||||
#' Should return a list with two elements `metric` (name that will be displayed for this metric,
|
#' Should return a list with two elements `metric` (name that will be displayed for this metric,
|
||||||
#' should be a string / character), and `value` (the number that the function calculates, should
|
#' should be a string / character), and `value` (the number that the function calculates, should
|
||||||
#' be a numeric scalar).
|
#' be a numeric scalar).
|
||||||
#'
|
#'
|
||||||
#' Note that even if passing `feval`, objectives also have an associated default metric that
|
#' Note that even if passing `feval`, objectives also have an associated default metric that
|
||||||
#' will be evaluated in addition to it. In order to disable the built-in metric, one can pass
|
#' will be evaluated in addition to it. In order to disable the built-in metric, one can pass
|
||||||
#' parameter `disable_default_eval_metric = TRUE`.
|
#' parameter `disable_default_eval_metric = TRUE`.
|
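A custom evaluation function following the `feval` description above might look like the sketch below. The names `mae_value`, `mae_metric`, and the metric label `"custom_mae"` are made up for illustration. As with a custom objective, the wrapper assumes xgboost is available for `getinfo()` when it is actually called.

```r
# Mean absolute error on plain vectors (base R only).
mae_value <- function(preds, labels) mean(abs(preds - labels))

# Hypothetical feval-style wrapper: takes the predictions and the DMatrix,
# returns a list with a character `metric` and a numeric scalar `value`.
mae_metric <- function(preds, dtrain) {
  labels <- xgboost::getinfo(dtrain, "label")
  list(metric = "custom_mae", value = mae_value(preds, labels))
}

# The same return shape, built by hand on toy data.
out <- list(metric = "custom_mae", value = mae_value(c(0.5, 2, -1), c(1, 2, 0)))
str(out)
```

When a function like this is combined with `early_stopping_rounds`, `maximize = FALSE` would be the natural choice, since a lower error is better.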
||||||
#' @param verbose If 0, xgboost will stay silent. If 1, it will print information about performance.
|
#' @param verbose If 0, xgboost will stay silent. If 1, it will print information about performance.
|
||||||
#' If 2, some additional information will be printed out.
|
#' If 2, some additional information will be printed out.
|
||||||
#' Note that setting \code{verbose > 0} automatically engages the
|
#' Note that setting `verbose > 0` automatically engages the
|
||||||
#' \code{xgb.cb.print.evaluation(period=1)} callback function.
|
#' `xgb.cb.print.evaluation(period=1)` callback function.
|
||||||
#' @param print_every_n Print evaluation messages at every n-th iteration when \code{verbose>0}.
|
#' @param print_every_n Print evaluation messages at every nth iteration when `verbose>0`.
|
||||||
#' Default is 1 which means all messages are printed. This parameter is passed to the
|
#' Default is 1 which means all messages are printed. This parameter is passed to the
|
||||||
#' \code{\link{xgb.cb.print.evaluation}} callback.
|
#' [xgb.cb.print.evaluation()] callback.
|
||||||
#' @param early_stopping_rounds If \code{NULL}, the early stopping function is not triggered.
|
#' @param early_stopping_rounds If `NULL`, the early stopping function is not triggered.
|
||||||
#' If set to an integer \code{k}, training with a validation set will stop if the performance
|
#' If set to an integer `k`, training with a validation set will stop if the performance
|
||||||
#' doesn't improve for \code{k} rounds.
|
#' doesn't improve for `k` rounds. Setting this parameter engages the [xgb.cb.early.stop()] callback.
|
||||||
#' Setting this parameter engages the \code{\link{xgb.cb.early.stop}} callback.
|
#' @param maximize If `feval` and `early_stopping_rounds` are set, then this parameter must be set as well.
|
||||||
#' @param maximize If \code{feval} and \code{early_stopping_rounds} are set,
|
#' When it is `TRUE`, it means the larger the evaluation score the better.
|
||||||
#' then this parameter must be set as well.
|
#' This parameter is passed to the [xgb.cb.early.stop()] callback.
|
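The interaction between `early_stopping_rounds` and `maximize` can be illustrated with a simple patience rule in base R. This is a sketch under stated assumptions, not the logic of the `xgb.cb.early.stop()` callback: the function name `stop_round` is hypothetical, and "improve" is assumed to mean strictly better than the best score seen so far.

```r
# Return the 1-based round at which training would stop under a simple
# patience rule, or length(scores) if the rule never triggers.
# Assumption: a tie with the best score does not count as an improvement.
stop_round <- function(scores, k, maximize = FALSE) {
  best <- if (maximize) -Inf else Inf
  since_best <- 0L
  for (i in seq_along(scores)) {
    improved <- if (maximize) scores[i] > best else scores[i] < best
    if (improved) {
      best <- scores[i]
      since_best <- 0L
    } else {
      since_best <- since_best + 1L
      if (since_best >= k) return(i)
    }
  }
  length(scores)
}

# Toy AUC log: the best value is reached at round 2 and never beaten.
auc_log <- c(0.70, 0.75, 0.74, 0.75, 0.73, 0.72)
stop_round(auc_log, k = 3, maximize = TRUE)
```

The direction flag matters because the same score sequence gives a different stopping round when read as an error to minimize than as a score to maximize.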
||||||
#' When it is \code{TRUE}, it means the larger the evaluation score the better.
|
#' @param save_period When not `NULL`, model is saved to disk after every `save_period` rounds.
|
||||||
#' This parameter is passed to the \code{\link{xgb.cb.early.stop}} callback.
|
#' 0 means save at the end. The saving is handled by the [xgb.cb.save.model()] callback.
|
||||||
#' @param save_period when it is non-NULL, model is saved to disk after every \code{save_period} rounds,
|
|
||||||
#' 0 means save at the end. The saving is handled by the \code{\link{xgb.cb.save.model}} callback.
|
|
||||||
#' @param save_name the name or path for periodically saved model file.
|
#' @param save_name the name or path for periodically saved model file.
|
||||||
#' @param xgb_model a previously built model to continue the training from.
|
#' @param xgb_model A previously built model to continue the training from.
|
||||||
#' Could be either an object of class \code{xgb.Booster}, or its raw data, or the name of a
|
#' Could be either an object of class `xgb.Booster`, or its raw data, or the name of a
|
||||||
#' file with a previously saved model.
|
#' file with a previously saved model.
|
||||||
#' @param callbacks a list of callback functions to perform various tasks during boosting.
|
#' @param callbacks A list of callback functions to perform various tasks during boosting.
|
||||||
#' See \code{\link{xgb.Callback}}. Some of the callbacks are automatically created depending on the
|
#' See [xgb.Callback()]. Some of the callbacks are automatically created depending on the
|
||||||
#' parameters' values. Users can provide either existing or their own callback methods in order
|
#' parameters' values. Users can provide either existing or their own callback methods in order
|
||||||
#' to customize the training process.
|
#' to customize the training process.
|
||||||
#'
|
#'
|
||||||
#' Note that some callbacks might try to leave attributes in the resulting model object,
|
#' Note that some callbacks might try to leave attributes in the resulting model object,
|
||||||
#' such as an evaluation log (a `data.table` object) - be aware that these objects are kept
|
#' such as an evaluation log (a `data.table` object) - be aware that these objects are kept
|
||||||
#' as R attributes, and thus do not get saved when using XGBoost's own serializers like
|
#' as R attributes, and thus do not get saved when using XGBoost's own serializers like
|
||||||
#' \link{xgb.save} (but are kept when using R serializers like \link{saveRDS}).
|
#' [xgb.save()] (but are kept when using R serializers like [saveRDS()]).
|
||||||
#' @param ... other parameters to pass to \code{params}.
|
#' @param ... other parameters to pass to `params`.
|
||||||
#'
|
#'
|
||||||
#' @return
|
#' @return An object of class `xgb.Booster`.
|
||||||
#' An object of class \code{xgb.Booster}.
|
|
||||||
#'
|
#'
|
||||||
#' @details
|
#' @details
|
||||||
#' These are the training functions for \code{xgboost}.
|
#' These are the training functions for [xgboost()].
|
||||||
#'
|
#'
|
||||||
#' The \code{xgb.train} interface supports advanced features such as \code{evals},
|
#' The `xgb.train()` interface supports advanced features such as `evals`,
|
||||||
#' customized objective and evaluation metric functions, therefore it is more flexible
|
#' customized objective and evaluation metric functions, therefore it is more flexible
|
||||||
#' than the \code{xgboost} interface.
|
#' than the [xgboost()] interface.
|
||||||
#'
|
#'
|
||||||
#' Parallelization is automatically enabled if \code{OpenMP} is present.
|
#' Parallelization is automatically enabled if OpenMP is present.
|
||||||
#' Number of threads can also be manually specified via the \code{nthread}
|
#' Number of threads can also be manually specified via the `nthread` parameter.
|
||||||
#' parameter.
|
|
||||||
#'
|
#'
|
||||||
#' While in other interfaces the random seed defaults to zero, in R, if a parameter `seed`
|
#' While in other interfaces the random seed defaults to zero, in R, if a parameter `seed`
|
||||||
#' is not manually supplied, it will generate a random seed through R's own random number generator,
|
#' is not manually supplied, it will generate a random seed through R's own random number generator,
|
||||||
@ -203,64 +188,56 @@
|
|||||||
#' RNG from R.
|
#' RNG from R.
|
||||||
#'
|
#'
|
||||||
#' The evaluation metric is chosen automatically by XGBoost (according to the objective)
|
#' The evaluation metric is chosen automatically by XGBoost (according to the objective)
|
||||||
#' when the \code{eval_metric} parameter is not provided.
|
#' when the `eval_metric` parameter is not provided.
|
||||||
#' Users may set one or several \code{eval_metric} parameters.
|
#' Users may set one or several `eval_metric` parameters.
|
||||||
#' Note that when using a customized metric, only this single metric can be used.
|
#' Note that when using a customized metric, only this single metric can be used.
|
||||||
#' The following is the list of built-in metrics for which XGBoost provides optimized implementation:
|
#' The following is the list of built-in metrics for which XGBoost provides optimized implementation:
|
||||||
#' \itemize{
|
#' - `rmse`: Root mean square error. \url{https://en.wikipedia.org/wiki/Root_mean_square_error}
|
||||||
#' \item \code{rmse} root mean square error. \url{https://en.wikipedia.org/wiki/Root_mean_square_error}
|
#' - `logloss`: Negative log-likelihood. \url{https://en.wikipedia.org/wiki/Log-likelihood}
|
||||||
#' \item \code{logloss} negative log-likelihood. \url{https://en.wikipedia.org/wiki/Log-likelihood}
|
#' - `mlogloss`: Multiclass logloss. \url{https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html}
|
||||||
#' \item \code{mlogloss} multiclass logloss. \url{https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html}
|
#' - `error`: Binary classification error rate. It is calculated as `(# wrong cases) / (# all cases)`.
|
||||||
#' \item \code{error} Binary classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
|
#' By default, it uses the 0.5 threshold for predicted values to define negative and positive instances.
|
||||||
#' By default, it uses the 0.5 threshold for predicted values to define negative and positive instances.
|
#' A different threshold (e.g., 0.) could be specified as `error@0`.
|
||||||
#' A different threshold (e.g., 0.) could be specified as "error@0."
|
#' - `merror`: Multiclass classification error rate. It is calculated as `(# wrong cases) / (# all cases)`.
|
||||||
#' \item \code{merror} Multiclass classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
|
#' - `mae`: Mean absolute error.
|
||||||
#' \item \code{mae} Mean absolute error
|
#' - `mape`: Mean absolute percentage error.
|
||||||
#' \item \code{mape} Mean absolute percentage error
|
#' - `auc`: Area under the curve.
|
||||||
#' \item{ \code{auc} Area under the curve.
|
#' \url{https://en.wikipedia.org/wiki/Receiver_operating_characteristic#'Area_under_curve} for ranking evaluation.
|
||||||
#' \url{https://en.wikipedia.org/wiki/Receiver_operating_characteristic#'Area_under_curve} for ranking evaluation.}
|
#' - `aucpr`: Area under the PR curve. \url{https://en.wikipedia.org/wiki/Precision_and_recall} for ranking evaluation.
|
||||||
#' \item \code{aucpr} Area under the PR curve. \url{https://en.wikipedia.org/wiki/Precision_and_recall} for ranking evaluation.
|
#' - `ndcg`: Normalized Discounted Cumulative Gain (for ranking task). \url{https://en.wikipedia.org/wiki/NDCG}
|
||||||
#' \item \code{ndcg} Normalized Discounted Cumulative Gain (for ranking task). \url{https://en.wikipedia.org/wiki/NDCG}
|
|
||||||
#' }
|
|
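Several of the built-in metrics listed above are easy to reproduce by hand, which helps when checking an evaluation log. The base R sketch below uses made-up toy data and textbook formulas. The helper name `error_at` is hypothetical, and treating predictions strictly greater than the threshold as positive is an assumption about tie handling.

```r
# Toy binary labels and predicted probabilities.
labels <- c(1, 0, 1, 1)
probs  <- c(0.9, 0.4, 0.3, 0.6)

# Root mean square error, negative log-likelihood, mean absolute error.
rmse    <- sqrt(mean((probs - labels)^2))
logloss <- -mean(labels * log(probs) + (1 - labels) * log(1 - probs))
mae     <- mean(abs(probs - labels))

# Classification error: (# wrong cases) / (# all cases), where predictions
# above `threshold` count as positive (the role of `t` in `error@t`).
error_at <- function(preds, labels, threshold = 0.5) {
  mean(as.numeric(preds > threshold) != labels)
}

error_at(probs, labels)                    # default 0.5 threshold
error_at(probs, labels, threshold = 0.35)  # a different cut-off
```

Moving the threshold changes only the error rate. `rmse`, `logloss`, and `mae` are computed from the raw probabilities and do not depend on it.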
||||||
#'
|
#'
|
||||||
#' The following callbacks are automatically created when certain parameters are set:
|
#' The following callbacks are automatically created when certain parameters are set:
|
||||||
#' \itemize{
|
#' - [xgb.cb.print.evaluation()] is turned on when `verbose > 0` and the `print_every_n`
|
||||||
#' \item \code{xgb.cb.print.evaluation} is turned on when \code{verbose > 0};
|
#' parameter is passed to it.
|
||||||
#' and the \code{print_every_n} parameter is passed to it.
|
#' - [xgb.cb.evaluation.log()] is on when `evals` is present.
|
||||||
#' \item \code{xgb.cb.evaluation.log} is on when \code{evals} is present.
|
#' - [xgb.cb.early.stop()]: When `early_stopping_rounds` is set.
|
||||||
#' \item \code{xgb.cb.early.stop}: when \code{early_stopping_rounds} is set.
|
#' - [xgb.cb.save.model()]: When `save_period > 0` is set.
|
||||||
#' \item \code{xgb.cb.save.model}: when \code{save_period > 0} is set.
|
|
||||||
#' }
|
|
||||||
#'
|
#'
|
||||||
#' Note that objects of type `xgb.Booster` as returned by this function behave a bit differently
|
#' Note that objects of type `xgb.Booster` as returned by this function behave a bit differently
|
||||||
#' from typical R objects (it's an 'altrep' list class), and it makes a separation between
|
#' from typical R objects (it's an 'altrep' list class), and it makes a separation between
|
||||||
#' internal booster attributes (restricted to jsonifyable data), accessed through \link{xgb.attr}
|
#' internal booster attributes (restricted to jsonifyable data), accessed through [xgb.attr()]
|
||||||
#' and shared between interfaces through serialization functions like \link{xgb.save}; and
|
#' and shared between interfaces through serialization functions like [xgb.save()]; and
|
||||||
#' R-specific attributes (typically the result from a callback), accessed through \link{attributes}
|
#' R-specific attributes (typically the result from a callback), accessed through [attributes()]
|
||||||
#' and \link{attr}, which are otherwise
|
#' and [attr()], which are otherwise
|
||||||
#' only used in the R interface, only kept when using R's serializers like \link{saveRDS}, and
|
#' only used in the R interface, only kept when using R's serializers like [saveRDS()], and
|
||||||
#' not used in any way by functions like \link{predict.xgb.Booster}.
|
#' not used in any way by functions like `predict.xgb.Booster()`.
|
||||||
#'
|
#'
|
||||||
#' Be aware that one such R attribute that is automatically added is `params` - this attribute
|
#' Be aware that one such R attribute that is automatically added is `params` - this attribute
|
||||||
#' is assigned from the `params` argument to this function, and is only meant to serve as a
|
#' is assigned from the `params` argument to this function, and is only meant to serve as a
|
||||||
#' reference for what went into the booster, but is not used in other methods that take a booster
|
#' reference for what went into the booster, but is not used in other methods that take a booster
|
||||||
#' object - so for example, changing the booster's configuration requires calling `xgb.config<-`
|
#' object - so for example, changing the booster's configuration requires calling `xgb.config<-`
|
||||||
#' or 'xgb.parameters<-', while simply modifying `attributes(model)$params$<...>` will have no
|
#' or `xgb.parameters<-`, while simply modifying `attributes(model)$params$<...>` will have no
|
||||||
#' effect elsewhere.
|
#' effect elsewhere.
|
||||||
#'
|
#'
|
||||||
#' @seealso
|
#' @seealso [xgb.Callback()], [predict.xgb.Booster()], [xgb.cv()]
|
||||||
#' \code{\link{xgb.Callback}},
|
|
||||||
#' \code{\link{predict.xgb.Booster}},
|
|
||||||
#' \code{\link{xgb.cv}}
|
|
||||||
#'
|
#'
|
||||||
#' @references
|
#' @references
|
||||||
#'
|
|
||||||
#' Tianqi Chen and Carlos Guestrin, "XGBoost: A Scalable Tree Boosting System",
|
#' Tianqi Chen and Carlos Guestrin, "XGBoost: A Scalable Tree Boosting System",
|
||||||
#' 22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016, \url{https://arxiv.org/abs/1603.02754}
|
#' 22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016, \url{https://arxiv.org/abs/1603.02754}
|
||||||
#'
|
#'
|
||||||
#' @examples
|
#' @examples
|
||||||
#' data(agaricus.train, package='xgboost')
|
#' data(agaricus.train, package = "xgboost")
|
||||||
#' data(agaricus.test, package='xgboost')
|
#' data(agaricus.test, package = "xgboost")
|
||||||
#'
|
#'
|
||||||
#' ## Keep the number of threads to 1 for examples
|
#' ## Keep the number of threads to 1 for examples
|
||||||
#' nthread <- 1
|
#' nthread <- 1
|
||||||
@ -275,8 +252,13 @@
|
|||||||
#' evals <- list(train = dtrain, eval = dtest)
|
#' evals <- list(train = dtrain, eval = dtest)
|
||||||
#'
|
#'
|
||||||
#' ## A simple xgb.train example:
|
#' ## A simple xgb.train example:
|
||||||
#' param <- list(max_depth = 2, eta = 1, nthread = nthread,
|
#' param <- list(
|
||||||
#' objective = "binary:logistic", eval_metric = "auc")
|
#' max_depth = 2,
|
||||||
|
#' eta = 1,
|
||||||
|
#' nthread = nthread,
|
||||||
|
#' objective = "binary:logistic",
|
||||||
|
#' eval_metric = "auc"
|
||||||
|
#' )
|
||||||
#' bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0)
|
#' bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0)
|
||||||
#'
|
#'
|
||||||
#' ## An xgb.train example where custom objective and evaluation metric are
|
#' ## An xgb.train example where custom objective and evaluation metric are
|
||||||
@ -296,34 +278,65 @@
|
|||||||
#'
|
#'
|
||||||
#' # These functions could be used by passing them either:
|
#' # These functions could be used by passing them either:
|
||||||
#' # as 'objective' and 'eval_metric' parameters in the params list:
|
#' # as 'objective' and 'eval_metric' parameters in the params list:
|
||||||
#' param <- list(max_depth = 2, eta = 1, nthread = nthread,
|
#' param <- list(
|
||||||
#' objective = logregobj, eval_metric = evalerror)
|
#' max_depth = 2,
|
||||||
|
#' eta = 1,
|
||||||
|
#' nthread = nthread,
|
||||||
|
#' objective = logregobj,
|
||||||
|
#' eval_metric = evalerror
|
||||||
|
#' )
|
||||||
#' bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0)
|
#' bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0)
|
||||||
#'
|
#'
|
||||||
#' # or through the ... arguments:
|
#' # or through the ... arguments:
|
||||||
#' param <- list(max_depth = 2, eta = 1, nthread = nthread)
|
#' param <- list(max_depth = 2, eta = 1, nthread = nthread)
|
||||||
#' bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0,
|
#' bst <- xgb.train(
|
||||||
#' objective = logregobj, eval_metric = evalerror)
|
#' param,
|
||||||
|
#' dtrain,
|
||||||
|
#' nrounds = 2,
|
||||||
|
#' evals = evals,
|
||||||
|
#' verbose = 0,
|
||||||
|
#' objective = logregobj,
|
||||||
|
#' eval_metric = evalerror
|
||||||
|
#' )
|
||||||
#'
|
#'
|
||||||
#' # or as dedicated 'obj' and 'feval' parameters of xgb.train:
|
#' # or as dedicated 'obj' and 'feval' parameters of xgb.train:
|
||||||
#' bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals,
|
#' bst <- xgb.train(
|
||||||
#' obj = logregobj, feval = evalerror)
|
#' param, dtrain, nrounds = 2, evals = evals, obj = logregobj, feval = evalerror
|
||||||
|
#' )
|
||||||
#'
|
#'
|
||||||
#'
|
#'
|
||||||
#' ## An xgb.train example of using variable learning rates at each iteration:
|
#' ## An xgb.train example of using variable learning rates at each iteration:
|
||||||
#' param <- list(max_depth = 2, eta = 1, nthread = nthread,
|
#' param <- list(
|
||||||
#' objective = "binary:logistic", eval_metric = "auc")
|
#' max_depth = 2,
|
||||||
|
#' eta = 1,
|
||||||
|
#' nthread = nthread,
|
||||||
|
#' objective = "binary:logistic",
|
||||||
|
#' eval_metric = "auc"
|
||||||
|
#' )
|
||||||
#' my_etas <- list(eta = c(0.5, 0.1))
|
#' my_etas <- list(eta = c(0.5, 0.1))
|
||||||
#' bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0,
|
#'
|
||||||
#' callbacks = list(xgb.cb.reset.parameters(my_etas)))
|
#' bst <- xgb.train(
|
||||||
|
#' param,
|
||||||
|
#' dtrain,
|
||||||
|
#' nrounds = 2,
|
||||||
|
#' evals = evals,
|
||||||
|
#' verbose = 0,
|
||||||
|
#' callbacks = list(xgb.cb.reset.parameters(my_etas))
|
||||||
|
#' )
|
||||||
#'
|
#'
|
||||||
#' ## Early stopping:
|
#' ## Early stopping:
|
||||||
#' bst <- xgb.train(param, dtrain, nrounds = 25, evals = evals,
|
#' bst <- xgb.train(
|
||||||
#' early_stopping_rounds = 3)
|
#' param, dtrain, nrounds = 25, evals = evals, early_stopping_rounds = 3
|
||||||
|
#' )
|
||||||
#'
|
#'
|
||||||
#' ## An 'xgboost' interface example:
|
#' ## An 'xgboost' interface example:
|
||||||
#' bst <- xgboost(x = agaricus.train$data, y = factor(agaricus.train$label),
|
#' bst <- xgboost(
|
||||||
#' params = list(max_depth = 2, eta = 1), nthread = nthread, nrounds = 2)
|
#' x = agaricus.train$data,
|
||||||
|
#' y = factor(agaricus.train$label),
|
||||||
|
#' params = list(max_depth = 2, eta = 1),
|
||||||
|
#' nthread = nthread,
|
||||||
|
#' nrounds = 2
|
||||||
|
#' )
|
||||||
#' pred <- predict(bst, agaricus.test$data)
|
#' pred <- predict(bst, agaricus.test$data)
|
||||||
#'
|
#'
|
||||||
#' @export
|
#' @export
|
||||||
|
|||||||
@ -717,167 +717,164 @@ process.x.and.col.args <- function(
|
|||||||
return(lst_args)
|
return(lst_args)
|
||||||
}
|
}
|
||||||
|
|
||||||
#' @noMd
|
#' Fit XGBoost Model
|
||||||
#' @export
|
|
||||||
#' @title Fit XGBoost Model
|
|
||||||
#' @description Fits an XGBoost model (boosted decision tree ensemble) to given x/y data.
|
|
||||||
#'
|
#'
|
||||||
#' See the tutorial \href{https://xgboost.readthedocs.io/en/stable/tutorials/model.html}{
|
#' @export
|
||||||
#' Introduction to Boosted Trees} for a longer explanation of what XGBoost does.
|
#' @description
|
||||||
|
#' Fits an XGBoost model (boosted decision tree ensemble) to given x/y data.
|
||||||
|
#'
|
||||||
|
#' See the tutorial [Introduction to Boosted Trees](https://xgboost.readthedocs.io/en/stable/tutorials/model.html)
|
||||||
|
#' for a longer explanation of what XGBoost does.
|
||||||
#'
|
#'
|
||||||
#' This function is intended to provide a more user-friendly interface for XGBoost that follows
|
#' This function is intended to provide a more user-friendly interface for XGBoost that follows
|
||||||
#' R's conventions for model fitting and predictions, but which doesn't expose all of the
|
#' R's conventions for model fitting and predictions, but which doesn't expose all of the
|
||||||
#' possible functionalities of the core XGBoost library.
|
#' possible functionalities of the core XGBoost library.
|
||||||
#'
|
#'
|
||||||
#' See \link{xgb.train} for a more flexible low-level alternative which is similar across different
|
#' See [xgb.train()] for a more flexible low-level alternative which is similar across different
|
||||||
#' language bindings of XGBoost and which exposes the full library's functionalities.
|
#' language bindings of XGBoost and which exposes the full library's functionalities.
|
||||||
#' @details For package authors using `xgboost` as a dependency, it is highly recommended to use
|
#'
|
||||||
#' \link{xgb.train} in package code instead of `xgboost()`, since it has a more stable interface
|
#' @details
|
||||||
|
#' For package authors using 'xgboost' as a dependency, it is highly recommended to use
|
||||||
|
#' [xgb.train()] in package code instead of [xgboost()], since it has a more stable interface
|
||||||
#' and performs fewer data conversions and copies along the way.
|
#' and performs fewer data conversions and copies along the way.
|
||||||
#' @references \itemize{
|
#' @references
|
||||||
#' \item Chen, Tianqi, and Carlos Guestrin. "Xgboost: A scalable tree boosting system."
|
#' - Chen, Tianqi, and Carlos Guestrin. "Xgboost: A scalable tree boosting system."
|
||||||
#' Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and
|
#' Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and
|
||||||
#' data mining. 2016.
|
#' data mining. 2016.
|
||||||
#' \item \url{https://xgboost.readthedocs.io/en/stable/}
|
#' - \url{https://xgboost.readthedocs.io/en/stable/}
|
||||||
#' }
|
#' @param x The features / covariates. Can be passed as:
|
||||||
#' @param x The features / covariates. Can be passed as:\itemize{
|
#' - A numeric or integer `matrix`.
|
||||||
#' \item A numeric or integer `matrix`.
|
#' - A `data.frame`, in which all columns are one of the following types:
|
||||||
#' \item A `data.frame`, in which all columns are one of the following types:\itemize{
|
#' - `numeric`
|
||||||
#' \item `numeric`
|
#' - `integer`
|
||||||
#' \item `integer`
|
#' - `logical`
|
||||||
#' \item `logical`
|
#' - `factor`
|
||||||
#' \item `factor`
|
|
||||||
#' }
|
|
||||||
#'
|
#'
|
||||||
#' Columns of `factor` type will be assumed to be categorical, while other column types will
|
#' Columns of `factor` type will be assumed to be categorical, while other column types will
|
||||||
#' be assumed to be numeric.
|
#' be assumed to be numeric.
|
||||||
#' \item A sparse matrix from the `Matrix` package, either as `dgCMatrix` or `dgRMatrix` class.
|
#' - A sparse matrix from the `Matrix` package, either as `dgCMatrix` or `dgRMatrix` class.
|
||||||
#' }
|
|
||||||
#'
|
#'
|
||||||
#' Note that categorical features are only supported for `data.frame` inputs, and are automatically
|
#' Note that categorical features are only supported for `data.frame` inputs, and are automatically
|
||||||
#' determined based on their types. See \link{xgb.train} with \link{xgb.DMatrix} for more flexible
|
#' determined based on their types. See [xgb.train()] with [xgb.DMatrix()] for more flexible
|
||||||
#' variants that would allow something like categorical features on sparse matrices.
|
#' variants that would allow something like categorical features on sparse matrices.
|
||||||
#' @param y The response variable. Allowed values are:\itemize{
|
#' @param y The response variable. Allowed values are:
|
||||||
#' \item A numeric or integer vector (for regression tasks).
|
#' - A numeric or integer vector (for regression tasks).
|
||||||
#' \item A factor or character vector (for binary and multi-class classification tasks).
|
#' - A factor or character vector (for binary and multi-class classification tasks).
|
||||||
#' \item A logical (boolean) vector (for binary classification tasks).
|
#' - A logical (boolean) vector (for binary classification tasks).
|
||||||
#' \item A numeric or integer matrix or `data.frame` with numeric/integer columns
|
#' - A numeric or integer matrix or `data.frame` with numeric/integer columns
|
||||||
#' (for multi-task regression tasks).
|
#' (for multi-task regression tasks).
|
||||||
#' \item A `Surv` object from the `survival` package (for survival tasks).
|
#' - A `Surv` object from the 'survival' package (for survival tasks).
|
||||||
#' }
|
|
||||||
#'
|
#'
|
||||||
#' If `objective` is `NULL`, the right task will be determined automatically based on
|
#' If `objective` is `NULL`, the right task will be determined automatically based on
|
||||||
#' the class of `y`.
|
#' the class of `y`.
|
||||||
#'
|
#'
|
||||||
#' If `objective` is not `NULL`, it must match with the type of `y` - e.g. `factor` types of `y`
|
#' If `objective` is not `NULL`, it must match with the type of `y` - e.g. `factor` types of `y`
|
||||||
#' can only be used with classification objectives and vice-versa.
|
#' can only be used with classification objectives and vice-versa.
|
||||||
#'
|
#'
|
||||||
#' For binary classification, the last factor level of `y` will be used as the "positive"
|
#' For binary classification, the last factor level of `y` will be used as the "positive"
|
||||||
#' class - that is, the numbers from `predict` will reflect the probabilities of belonging to this
|
#' class - that is, the numbers from `predict` will reflect the probabilities of belonging to this
|
||||||
#' class instead of to the first factor level. If `y` is a `logical` vector, then `TRUE` will be
|
#' class instead of to the first factor level. If `y` is a `logical` vector, then `TRUE` will be
|
||||||
#' set as the last level.
|
#' set as the last level.
|
||||||
#' @param objective Optimization objective to minimize based on the supplied data, to be passed
|
#' @param objective Optimization objective to minimize based on the supplied data, to be passed
|
||||||
#' by name as a string / character (e.g. `reg:absoluteerror`). See the
|
#' by name as a string / character (e.g. `reg:absoluteerror`). See the
|
||||||
#' \href{https://xgboost.readthedocs.io/en/stable/parameter.html#learning-task-parameters}{
|
#' [Learning Task Parameters](https://xgboost.readthedocs.io/en/stable/parameter.html#learning-task-parameters)
|
||||||
#' Learning Task Parameters} page for more detailed information on allowed values.
|
#' page for more detailed information on allowed values.
|
||||||
#'
|
#'
|
||||||
#' If `NULL` (the default), will be automatically determined from `y` according to the following
|
#' If `NULL` (the default), will be automatically determined from `y` according to the following
|
||||||
#' logic:\itemize{
|
#' logic:
|
||||||
#' \item If `y` is a factor with 2 levels, will use `binary:logistic`.
|
#' - If `y` is a factor with 2 levels, will use `binary:logistic`.
|
||||||
#' \item If `y` is a factor with more than 2 levels, will use `multi:softprob` (number of classes
|
#' - If `y` is a factor with more than 2 levels, will use `multi:softprob` (number of classes
|
||||||
#' will be determined automatically, should not be passed under `params`).
|
#' will be determined automatically, should not be passed under `params`).
|
||||||
#' \item If `y` is a `Surv` object from the `survival` package, will use `survival:aft` (note that
|
#' - If `y` is a `Surv` object from the `survival` package, will use `survival:aft` (note that
|
||||||
#' the only types supported are left / right / interval censored).
|
#' the only types supported are left / right / interval censored).
|
||||||
#' \item Otherwise, will use `reg:squarederror`.
|
#' - Otherwise, will use `reg:squarederror`.
|
||||||
#' }
|
|
||||||
#'
|
#'
|
||||||
#' If `objective` is not `NULL`, it must match with the type of `y` - e.g. `factor` types of `y`
|
#' If `objective` is not `NULL`, it must match with the type of `y` - e.g. `factor` types of `y`
|
||||||
#' can only be used with classification objectives and vice-versa.
|
#' can only be used with classification objectives and vice-versa.
|
||||||
#'
|
#'
|
||||||
#' Note that not all possible `objective` values supported by the core XGBoost library are allowed
|
#' Note that not all possible `objective` values supported by the core XGBoost library are allowed
|
||||||
#' here - for example, objectives which are a variation of another but with a different default
|
#' here - for example, objectives which are a variation of another but with a different default
|
||||||
#' prediction type (e.g. `multi:softmax` vs. `multi:softprob`) are not allowed, and neither are
|
#' prediction type (e.g. `multi:softmax` vs. `multi:softprob`) are not allowed, and neither are
|
||||||
#' ranking objectives, nor custom objectives at the moment.
|
#' ranking objectives, nor custom objectives at the moment.
|
||||||
#' @param nrounds Number of boosting iterations / rounds.
|
#' @param nrounds Number of boosting iterations / rounds.
|
||||||
#'
|
#'
|
||||||
#' Note that the number of default boosting rounds here is not automatically tuned, and different
|
#' Note that the number of default boosting rounds here is not automatically tuned, and different
|
||||||
#' problems will have vastly different optimal numbers of boosting rounds.
|
#' problems will have vastly different optimal numbers of boosting rounds.
|
||||||
#' @param weights Sample weights for each row in `x` and `y`. If `NULL` (the default), each row
|
#' @param weights Sample weights for each row in `x` and `y`. If `NULL` (the default), each row
|
||||||
#' will have the same weight.
|
#' will have the same weight.
|
||||||
#'
|
#'
|
||||||
#' If not `NULL`, should be passed as a numeric vector with length matching to the number of
|
#' If not `NULL`, should be passed as a numeric vector with length matching to the number of rows in `x`.
|
||||||
#' rows in `x`.
|
|
||||||
#' @param verbosity Verbosity of printing messages. Valid values are 0 (silent), 1 (warning),
|
#' @param verbosity Verbosity of printing messages. Valid values are 0 (silent), 1 (warning),
|
||||||
#' 2 (info), and 3 (debug).
|
#' 2 (info), and 3 (debug).
|
||||||
#' @param nthreads Number of parallel threads to use. If passing zero, will use all CPU threads.
|
#' @param nthreads Number of parallel threads to use. If passing zero, will use all CPU threads.
|
||||||
#' @param seed Seed to use for random number generation. If passing `NULL`, will draw a random
|
#' @param seed Seed to use for random number generation. If passing `NULL`, will draw a random
|
||||||
#' number using R's PRNG system to use as seed.
|
#' number using R's PRNG system to use as seed.
|
||||||
#' @param monotone_constraints Optional monotonicity constraints for features.
|
#' @param monotone_constraints Optional monotonicity constraints for features.
|
||||||
#'
|
#'
|
||||||
#' Can be passed either as a named list (when `x` has column names), or as a vector. If passed
|
#' Can be passed either as a named list (when `x` has column names), or as a vector. If passed
|
||||||
#' as a vector and `x` has column names, will try to match the elements by name.
|
#' as a vector and `x` has column names, will try to match the elements by name.
|
||||||
#'
|
#'
|
||||||
#' A value of `+1` for a given feature makes the model predictions / scores constrained to be
|
#' A value of `+1` for a given feature makes the model predictions / scores constrained to be
|
||||||
#' a monotonically increasing function of that feature (that is, as the value of the feature
|
#' a monotonically increasing function of that feature (that is, as the value of the feature
|
||||||
#' increases, the model prediction cannot decrease), while a value of `-1` makes it a monotonically
|
#' increases, the model prediction cannot decrease), while a value of `-1` makes it a monotonically
|
||||||
#' decreasing function. A value of zero imposes no constraint.
|
#' decreasing function. A value of zero imposes no constraint.
|
||||||
#'
|
#'
|
||||||
#' The input for `monotone_constraints` can be a subset of the columns of `x` if named, in which
|
#' The input for `monotone_constraints` can be a subset of the columns of `x` if named, in which
|
||||||
#' case the columns that are not referred to in `monotone_constraints` will be assumed to have
|
#' case the columns that are not referred to in `monotone_constraints` will be assumed to have
|
||||||
#' a value of zero (no constraint imposed on the model for those features).
|
#' a value of zero (no constraint imposed on the model for those features).
|
||||||
#'
|
#'
|
||||||
#' See the tutorial \href{https://xgboost.readthedocs.io/en/stable/tutorials/monotonic.html}{
|
#' See the tutorial [Monotonic Constraints](https://xgboost.readthedocs.io/en/stable/tutorials/monotonic.html)
|
||||||
#' Monotonic Constraints} for a more detailed explanation.
|
#' for a more detailed explanation.
|
||||||
#' @param interaction_constraints Constraints for interaction representing permitted interactions.
|
#' @param interaction_constraints Constraints for interaction representing permitted interactions.
|
||||||
#' The constraints must be specified in the form of a list of vectors referencing columns in the
|
#' The constraints must be specified in the form of a list of vectors referencing columns in the
|
||||||
#' data, e.g. `list(c(1, 2), c(3, 4, 5))` (with these numbers being column indices, numeration
|
#' data, e.g. `list(c(1, 2), c(3, 4, 5))` (with these numbers being column indices, numeration
|
||||||
#' starting at 1 - i.e. the first sublist references the first and second columns) or
|
#' starting at 1 - i.e. the first sublist references the first and second columns) or
|
||||||
#' `list(c("Sepal.Length", "Sepal.Width"), c("Petal.Length", "Petal.Width"))` (references
|
#' `list(c("Sepal.Length", "Sepal.Width"), c("Petal.Length", "Petal.Width"))` (references
|
||||||
#' columns by names), where each vector is a group of indices of features that are allowed to
|
#' columns by names), where each vector is a group of indices of features that are allowed to
|
||||||
#' interact with each other.
|
#' interact with each other.
|
||||||
#'
|
#'
|
||||||
#' See the tutorial
|
#' See the tutorial [Feature Interaction Constraints](https://xgboost.readthedocs.io/en/stable/tutorials/feature_interaction_constraint.html)
|
||||||
#' \href{https://xgboost.readthedocs.io/en/stable/tutorials/feature_interaction_constraint.html}{
|
#' for more information.
|
||||||
#' Feature Interaction Constraints} for more information.
|
|
||||||
#' @param feature_weights Feature weights for column sampling.
|
#' @param feature_weights Feature weights for column sampling.
|
||||||
#'
|
#'
|
||||||
#' Can be passed either as a vector with length matching to columns of `x`, or as a named
|
#' Can be passed either as a vector with length matching to columns of `x`, or as a named
|
||||||
#' list (only if `x` has column names) with names matching to columns of 'x'. If it is a
|
#' list (only if `x` has column names) with names matching to columns of 'x'. If it is a
|
||||||
#' named vector, will try to match the entries to column names of `x` by name.
|
#' named vector, will try to match the entries to column names of `x` by name.
|
||||||
#'
|
#'
|
||||||
#' If `NULL` (the default), all columns will have the same weight.
|
#' If `NULL` (the default), all columns will have the same weight.
|
||||||
#' @param base_margin Base margin used for boosting from existing model.
|
#' @param base_margin Base margin used for boosting from existing model.
|
||||||
#'
|
#'
|
||||||
#' If passing it, will start the gradient boosting procedure from the scores that are provided
|
#' If passing it, will start the gradient boosting procedure from the scores that are provided
|
||||||
#' here - for example, one can pass the raw scores from a previous model, or some per-observation
|
#' here - for example, one can pass the raw scores from a previous model, or some per-observation
|
||||||
#' offset, or similar.
|
#' offset, or similar.
|
||||||
#'
|
#'
|
||||||
#' Should be either a numeric vector or numeric matrix (for multi-class and multi-target objectives)
|
#' Should be either a numeric vector or numeric matrix (for multi-class and multi-target objectives)
|
||||||
#' with the same number of rows as `x` and number of columns corresponding to number of optimization
|
#' with the same number of rows as `x` and number of columns corresponding to number of optimization
|
||||||
#' targets, and should be in the untransformed scale (for example, for objective `binary:logistic`,
|
#' targets, and should be in the untransformed scale (for example, for objective `binary:logistic`,
|
||||||
#' it should have log-odds, not probabilities; and for objective `multi:softprob`, should have
|
#' it should have log-odds, not probabilities; and for objective `multi:softprob`, should have
|
||||||
#' number of columns matching to number of classes in the data).
|
#' number of columns matching to number of classes in the data).
|
||||||
#'
|
#'
|
||||||
#' Note that, if it contains more than one column, then columns will not be matched by name to
|
#' Note that, if it contains more than one column, then columns will not be matched by name to
|
||||||
#' the corresponding `y` - `base_margin` should have the same column order that the model will use
|
#' the corresponding `y` - `base_margin` should have the same column order that the model will use
|
||||||
#' (for example, for objective `multi:softprob`, columns of `base_margin` will be matched against
|
#' (for example, for objective `multi:softprob`, columns of `base_margin` will be matched against
|
||||||
#' `levels(y)` by their position, regardless of what `colnames(base_margin)` returns).
|
#' `levels(y)` by their position, regardless of what `colnames(base_margin)` returns).
|
||||||
#'
|
#'
|
||||||
#' If `NULL`, will start from zero, but note that for most objectives, an intercept is usually
|
#' If `NULL`, will start from zero, but note that for most objectives, an intercept is usually
|
||||||
#' added (controllable through parameter `base_score` instead) when `base_margin` is not passed.
|
#' added (controllable through parameter `base_score` instead) when `base_margin` is not passed.
|
||||||
#' @param ... Other training parameters. See the online documentation
|
#' @param ... Other training parameters. See the online documentation
|
||||||
#' \href{https://xgboost.readthedocs.io/en/stable/parameter.html}{XGBoost Parameters} for
|
#' [XGBoost Parameters](https://xgboost.readthedocs.io/en/stable/parameter.html) for
|
||||||
#' details about possible values and what they do.
|
#' details about possible values and what they do.
|
||||||
#'
|
#'
|
||||||
#' Note that not all possible values from the core XGBoost library are allowed as `params` for
|
#' Note that not all possible values from the core XGBoost library are allowed as `params` for
|
||||||
#' 'xgboost()' - in particular, values which require an already-fitted booster object (such as
|
#' 'xgboost()' - in particular, values which require an already-fitted booster object (such as
|
||||||
#' `process_type`) are not accepted here.
|
#' `process_type`) are not accepted here.
|
||||||
#' @return A model object, inheriting from both `xgboost` and `xgb.Booster`. Compared to the regular
|
#' @return A model object, inheriting from both `xgboost` and `xgb.Booster`. Compared to the regular
|
||||||
#' `xgb.Booster` model class produced by \link{xgb.train}, this `xgboost` class will have an
|
#' `xgb.Booster` model class produced by [xgb.train()], this `xgboost` class will have an
|
||||||
#' additional attribute `metadata` containing information which is used for formatting prediction
|
#'
|
||||||
#' outputs, such as class names for classification problems.
|
#' additional attribute `metadata` containing information which is used for formatting prediction
|
||||||
|
#' outputs, such as class names for classification problems.
|
||||||
|
#'
|
||||||
#' @examples
|
#' @examples
|
||||||
#' library(xgboost)
|
|
||||||
#' data(mtcars)
|
#' data(mtcars)
|
||||||
#'
|
#'
|
||||||
#' # Fit a small regression model on the mtcars data
|
#' # Fit a small regression model on the mtcars data
|
||||||
@ -1006,12 +1003,9 @@ print.xgboost <- function(x, ...) {
|
|||||||
#' This data set is originally from the Mushroom data set,
|
#' This data set is originally from the Mushroom data set,
|
||||||
#' UCI Machine Learning Repository.
|
#' UCI Machine Learning Repository.
|
||||||
#'
|
#'
|
||||||
#' This data set includes the following fields:
|
#' It includes the following fields:
|
||||||
#'
|
#' - `label`: The label for each record.
|
||||||
#' \itemize{
|
#' - `data`: A sparse Matrix of 'dgCMatrix' class with 126 columns.
|
||||||
#' \item \code{label} the label for each record
|
|
||||||
#' \item \code{data} a sparse Matrix of \code{dgCMatrix} class, with 126 columns.
|
|
||||||
#' }
|
|
||||||
#'
|
#'
|
||||||
#' @references
|
#' @references
|
||||||
#' <https://archive.ics.uci.edu/ml/datasets/Mushroom>
|
#' <https://archive.ics.uci.edu/ml/datasets/Mushroom>
|
||||||
@ -1033,12 +1027,9 @@ NULL
|
|||||||
#' This data set is originally from the Mushroom data set,
|
#' This data set is originally from the Mushroom data set,
|
||||||
#' UCI Machine Learning Repository.
|
#' UCI Machine Learning Repository.
|
||||||
#'
|
#'
|
||||||
#' This data set includes the following fields:
|
#' It includes the following fields:
|
||||||
#'
|
#' - `label`: The label for each record.
|
||||||
#' \itemize{
|
#' - `data`: A sparse Matrix of 'dgCMatrix' class with 126 columns.
|
||||||
#' \item \code{label} the label for each record
|
|
||||||
#' \item \code{data} a sparse Matrix of \code{dgCMatrix} class, with 126 columns.
|
|
||||||
#' }
|
|
||||||
#'
|
#'
|
||||||
#' @references
|
#' @references
|
||||||
#' <https://archive.ics.uci.edu/ml/datasets/Mushroom>
|
#' <https://archive.ics.uci.edu/ml/datasets/Mushroom>
|
||||||
|
|||||||
@ -16,11 +16,10 @@ This data set is originally from the Mushroom data set,
|
|||||||
UCI Machine Learning Repository.
|
UCI Machine Learning Repository.
|
||||||
}
|
}
|
||||||
\details{
|
\details{
|
||||||
This data set includes the following fields:
|
It includes the following fields:
|
||||||
|
|
||||||
\itemize{
|
\itemize{
|
||||||
\item \code{label} the label for each record
|
\item \code{label}: The label for each record.
|
||||||
\item \code{data} a sparse Matrix of \code{dgCMatrix} class, with 126 columns.
|
\item \code{data}: A sparse Matrix of 'dgCMatrix' class with 126 columns.
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
\references{
|
\references{
|
||||||
|
|||||||
@ -16,11 +16,10 @@ This data set is originally from the Mushroom data set,
|
|||||||
UCI Machine Learning Repository.
|
UCI Machine Learning Repository.
|
||||||
}
|
}
|
||||||
\details{
|
\details{
|
||||||
This data set includes the following fields:
|
It includes the following fields:
|
||||||
|
|
||||||
\itemize{
|
\itemize{
|
||||||
\item \code{label} the label for each record
|
\item \code{label}: The label for each record.
|
||||||
\item \code{data} a sparse Matrix of \code{dgCMatrix} class, with 126 columns.
|
\item \code{data}: A sparse Matrix of 'dgCMatrix' class with 126 columns.
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
\references{
|
\references{
|
||||||
|
|||||||
@ -13,13 +13,14 @@
|
|||||||
Returns a vector of numbers of rows and of columns in an \code{xgb.DMatrix}.
|
Returns a vector of numbers of rows and of columns in an \code{xgb.DMatrix}.
|
||||||
}
|
}
|
||||||
\details{
|
\details{
|
||||||
Note: since \code{nrow} and \code{ncol} internally use \code{dim}, they can also
|
Note: since \code{\link[=nrow]{nrow()}} and \code{\link[=ncol]{ncol()}} internally use \code{\link[=dim]{dim()}}, they can also
|
||||||
be directly used with an \code{xgb.DMatrix} object.
|
be directly used with an \code{xgb.DMatrix} object.
|
||||||
}
|
}
|
||||||
\examples{
|
\examples{
|
||||||
data(agaricus.train, package='xgboost')
|
data(agaricus.train, package = "xgboost")
|
||||||
|
|
||||||
train <- agaricus.train
|
train <- agaricus.train
|
||||||
dtrain <- xgb.DMatrix(train$data, label=train$label, nthread = 2)
|
dtrain <- xgb.DMatrix(train$data, label = train$label, nthread = 2)
|
||||||
|
|
||||||
stopifnot(nrow(dtrain) == nrow(train$data))
|
stopifnot(nrow(dtrain) == nrow(train$data))
|
||||||
stopifnot(ncol(dtrain) == ncol(train$data))
|
stopifnot(ncol(dtrain) == ncol(train$data))
|
||||||
|
|||||||
@ -10,26 +10,27 @@
|
|||||||
\method{dimnames}{xgb.DMatrix}(x) <- value
|
\method{dimnames}{xgb.DMatrix}(x) <- value
|
||||||
}
|
}
|
||||||
\arguments{
|
\arguments{
|
||||||
\item{x}{object of class \code{xgb.DMatrix}}
|
\item{x}{Object of class \code{xgb.DMatrix}.}
|
||||||
|
|
||||||
\item{value}{a list of two elements: the first one is ignored
|
\item{value}{A list of two elements: the first one is ignored
|
||||||
and the second one is column names}
|
and the second one is column names}
|
||||||
}
|
}
|
||||||
\description{
|
\description{
|
||||||
Only column names are supported for \code{xgb.DMatrix}, thus setting of
|
Only column names are supported for \code{xgb.DMatrix}, thus setting of
|
||||||
row names would have no effect and returned row names would be NULL.
|
row names would have no effect and returned row names would be \code{NULL}.
|
||||||
}
|
}
|
||||||
\details{
|
\details{
|
||||||
Generic \code{dimnames} methods are used by \code{colnames}.
|
Generic \code{\link[=dimnames]{dimnames()}} methods are used by \code{\link[=colnames]{colnames()}}.
|
||||||
Since row names are irrelevant, it is recommended to use \code{colnames} directly.
|
Since row names are irrelevant, it is recommended to use \code{\link[=colnames]{colnames()}} directly.
|
||||||
}
|
}
|
||||||
\examples{
|
\examples{
|
||||||
data(agaricus.train, package='xgboost')
|
data(agaricus.train, package = "xgboost")
|
||||||
|
|
||||||
train <- agaricus.train
|
train <- agaricus.train
|
||||||
dtrain <- xgb.DMatrix(train$data, label=train$label, nthread = 2)
|
dtrain <- xgb.DMatrix(train$data, label = train$label, nthread = 2)
|
||||||
dimnames(dtrain)
|
dimnames(dtrain)
|
||||||
colnames(dtrain)
|
colnames(dtrain)
|
||||||
colnames(dtrain) <- make.names(1:ncol(train$data))
|
colnames(dtrain) <- make.names(1:ncol(train$data))
|
||||||
print(dtrain, verbose=TRUE)
|
print(dtrain, verbose = TRUE)
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|||||||
@ -22,34 +22,34 @@ setinfo(object, name, info)
|
|||||||
\method{setinfo}{xgb.DMatrix}(object, name, info)
|
\method{setinfo}{xgb.DMatrix}(object, name, info)
|
||||||
}
|
}
|
||||||
\arguments{
|
\arguments{
|
||||||
\item{object}{Object of class \code{xgb.DMatrix} of \code{xgb.Booster}.}
|
\item{object}{Object of class \code{xgb.DMatrix} or \code{xgb.Booster}.}
|
||||||
|
|
||||||
\item{name}{the name of the information field to get (see details)}
|
\item{name}{The name of the information field to get (see details).}
|
||||||
|
|
||||||
\item{info}{the specific field of information to set}
|
\item{info}{The specific field of information to set.}
|
||||||
}
|
}
|
||||||
\value{
|
\value{
|
||||||
For \code{getinfo}, will return the requested field. For \code{setinfo}, will always return value \code{TRUE}
|
For \code{getinfo()}, will return the requested field. For \code{setinfo()},
|
||||||
if it succeeds.
|
will always return value \code{TRUE} if it succeeds.
|
||||||
}
|
}
|
||||||
\description{
|
\description{
|
||||||
Get or set information of xgb.DMatrix and xgb.Booster objects
|
Get or set information of xgb.DMatrix and xgb.Booster objects
|
||||||
}
|
}
|
||||||
\details{
|
\details{
|
||||||
The \code{name} field can be one of the following for \code{xgb.DMatrix}:
|
The \code{name} field can be one of the following for \code{xgb.DMatrix}:
|
||||||
|
|
||||||
\itemize{
|
\itemize{
|
||||||
\item \code{label}
|
\item label
|
||||||
\item \code{weight}
|
\item weight
|
||||||
\item \code{base_margin}
|
\item base_margin
|
||||||
\item \code{label_lower_bound}
|
\item label_lower_bound
|
||||||
\item \code{label_upper_bound}
|
\item label_upper_bound
|
||||||
\item \code{group}
|
\item group
|
||||||
\item \code{feature_type}
|
\item feature_type
|
||||||
\item \code{feature_name}
|
\item feature_name
|
||||||
\item \code{nrow}
|
\item nrow
|
||||||
}
|
}
|
||||||
See the documentation for \link{xgb.DMatrix} for more information about these fields.
|
|
||||||
|
See the documentation for \code{\link[=xgb.DMatrix]{xgb.DMatrix()}} for more information about these fields.
|
||||||
|
|
||||||
For \code{xgb.Booster}, can be one of the following:
|
For \code{xgb.Booster}, can be one of the following:
|
||||||
\itemize{
|
\itemize{
|
||||||
@ -57,17 +57,18 @@ For \code{xgb.Booster}, can be one of the following:
|
|||||||
\item \code{feature_name}
|
\item \code{feature_name}
|
||||||
}
|
}
|
||||||
|
|
||||||
Note that, while 'qid' cannot be retrieved, it's possible to get the equivalent 'group'
|
Note that, while 'qid' cannot be retrieved, it is possible to get the equivalent 'group'
|
||||||
for a DMatrix that had 'qid' assigned.
|
for a DMatrix that had 'qid' assigned.
|
||||||
|
|
||||||
\bold{Important}: when calling \code{setinfo}, the objects are modified in-place. See
|
\strong{Important}: when calling \code{\link[=setinfo]{setinfo()}}, the objects are modified in-place. See
|
||||||
\link{xgb.copy.Booster} for an idea of how this in-place assignment works.
|
\code{\link[=xgb.copy.Booster]{xgb.copy.Booster()}} for an idea of how this in-place assignment works.
|
||||||
|
|
||||||
See the documentation for \link{xgb.DMatrix} for possible fields that can be set
|
See the documentation for \code{\link[=xgb.DMatrix]{xgb.DMatrix()}} for possible fields that can be set
|
||||||
(which correspond to arguments in that function).
|
(which correspond to arguments in that function).
|
||||||
|
|
||||||
Note that the following fields are allowed in the construction of an \code{xgb.DMatrix}
|
Note that the following fields are allowed in the construction of an \code{xgb.DMatrix}
|
||||||
but \bold{aren't} allowed here:\itemize{
|
but \strong{are not} allowed here:
|
||||||
|
\itemize{
|
||||||
\item data
|
\item data
|
||||||
\item missing
|
\item missing
|
||||||
\item silent
|
\item silent
|
||||||
@ -75,19 +76,22 @@ but \bold{aren't} allowed here:\itemize{
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
\examples{
|
\examples{
|
||||||
data(agaricus.train, package='xgboost')
|
data(agaricus.train, package = "xgboost")
|
||||||
|
|
||||||
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||||
|
|
||||||
labels <- getinfo(dtrain, 'label')
|
labels <- getinfo(dtrain, "label")
|
||||||
setinfo(dtrain, 'label', 1-labels)
|
setinfo(dtrain, "label", 1 - labels)
|
||||||
|
|
||||||
|
labels2 <- getinfo(dtrain, "label")
|
||||||
|
stopifnot(all(labels2 == 1 - labels))
|
||||||
|
data(agaricus.train, package = "xgboost")
|
||||||
|
|
||||||
labels2 <- getinfo(dtrain, 'label')
|
|
||||||
stopifnot(all(labels2 == 1-labels))
|
|
||||||
data(agaricus.train, package='xgboost')
|
|
||||||
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||||
|
|
||||||
labels <- getinfo(dtrain, 'label')
|
labels <- getinfo(dtrain, "label")
|
||||||
setinfo(dtrain, 'label', 1-labels)
|
setinfo(dtrain, "label", 1 - labels)
|
||||||
labels2 <- getinfo(dtrain, 'label')
|
|
||||||
stopifnot(all.equal(labels2, 1-labels))
|
labels2 <- getinfo(dtrain, "label")
|
||||||
|
stopifnot(all.equal(labels2, 1 - labels))
|
||||||
}
|
}
|
||||||
|
|||||||
@ -157,7 +157,8 @@ dimension should produce practically the same result as \code{predcontrib = TRUE
|
|||||||
For multi-class and multi-target, will be a 4D array with dimensions \verb{[nrows, ngroups, nfeats+1, nfeats+1]}
|
For multi-class and multi-target, will be a 4D array with dimensions \verb{[nrows, ngroups, nfeats+1, nfeats+1]}
|
||||||
}
|
}
|
||||||
|
|
||||||
If passing \code{strict_shape=FALSE}, the result is always an array:\itemize{
|
If passing \code{strict_shape=FALSE}, the result is always an array:
|
||||||
|
\itemize{
|
||||||
\item For normal predictions, the dimension is \verb{[nrows, ngroups]}.
|
\item For normal predictions, the dimension is \verb{[nrows, ngroups]}.
|
||||||
\item For \code{predcontrib=TRUE}, the dimension is \verb{[nrows, ngroups, nfeats+1]}.
|
\item For \code{predcontrib=TRUE}, the dimension is \verb{[nrows, ngroups, nfeats+1]}.
|
||||||
\item For \code{predinteraction=TRUE}, the dimension is \verb{[nrows, ngroups, nfeats+1, nfeats+1]}.
|
\item For \code{predinteraction=TRUE}, the dimension is \verb{[nrows, ngroups, nfeats+1, nfeats+1]}.
|
||||||
|
|||||||
@ -7,21 +7,22 @@
|
|||||||
\method{print}{xgb.DMatrix}(x, verbose = FALSE, ...)
|
\method{print}{xgb.DMatrix}(x, verbose = FALSE, ...)
|
||||||
}
|
}
|
||||||
\arguments{
|
\arguments{
|
||||||
\item{x}{an xgb.DMatrix object}
|
\item{x}{An xgb.DMatrix object.}
|
||||||
|
|
||||||
\item{verbose}{whether to print colnames (when present)}
|
\item{verbose}{Whether to print colnames (when present).}
|
||||||
|
|
||||||
\item{...}{not currently used}
|
\item{...}{Not currently used.}
|
||||||
}
|
}
|
||||||
\description{
|
\description{
|
||||||
Print information about xgb.DMatrix.
|
Print information about xgb.DMatrix.
|
||||||
Currently it displays dimensions and presence of info-fields and colnames.
|
Currently it displays dimensions and presence of info-fields and colnames.
|
||||||
}
|
}
|
||||||
\examples{
|
\examples{
|
||||||
data(agaricus.train, package='xgboost')
|
data(agaricus.train, package = "xgboost")
|
||||||
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
|
||||||
|
|
||||||
|
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||||
dtrain
|
dtrain
|
||||||
print(dtrain, verbose=TRUE)
|
|
||||||
|
print(dtrain, verbose = TRUE)
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|||||||
@ -7,25 +7,33 @@
|
|||||||
\method{print}{xgb.cv.synchronous}(x, verbose = FALSE, ...)
|
\method{print}{xgb.cv.synchronous}(x, verbose = FALSE, ...)
|
||||||
}
|
}
|
||||||
\arguments{
|
\arguments{
|
||||||
\item{x}{an \code{xgb.cv.synchronous} object}
|
\item{x}{An \code{xgb.cv.synchronous} object.}
|
||||||
|
|
||||||
\item{verbose}{whether to print detailed data}
|
\item{verbose}{Whether to print detailed data.}
|
||||||
|
|
||||||
\item{...}{passed to \code{data.table.print}}
|
\item{...}{Passed to \code{data.table.print()}.}
|
||||||
}
|
}
|
||||||
\description{
|
\description{
|
||||||
Prints formatted results of \code{xgb.cv}.
|
Prints formatted results of \code{\link[=xgb.cv]{xgb.cv()}}.
|
||||||
}
|
}
|
||||||
\details{
|
\details{
|
||||||
When not verbose, it only prints the evaluation results,
|
When not verbose, it only prints the evaluation results,
|
||||||
including the best iteration (when available).
|
including the best iteration (when available).
|
||||||
}
|
}
|
||||||
\examples{
|
\examples{
|
||||||
data(agaricus.train, package='xgboost')
|
data(agaricus.train, package = "xgboost")
|
||||||
|
|
||||||
train <- agaricus.train
|
train <- agaricus.train
|
||||||
cv <- xgb.cv(data = xgb.DMatrix(train$data, label = train$label), nfold = 5, max_depth = 2,
|
cv <- xgb.cv(
|
||||||
eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
|
data = xgb.DMatrix(train$data, label = train$label),
|
||||||
|
nfold = 5,
|
||||||
|
max_depth = 2,
|
||||||
|
eta = 1,
|
||||||
|
nthread = 2,
|
||||||
|
nrounds = 2,
|
||||||
|
objective = "binary:logistic"
|
||||||
|
)
|
||||||
print(cv)
|
print(cv)
|
||||||
print(cv, verbose=TRUE)
|
print(cv, verbose = TRUE)
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|||||||
@ -116,7 +116,7 @@ model ended, in which case this will be larger than 1.
|
|||||||
It should match with argument \code{nrounds} passed to \code{\link[=xgb.train]{xgb.train()}} or \code{\link[=xgb.cv]{xgb.cv()}}.
|
It should match with argument \code{nrounds} passed to \code{\link[=xgb.train]{xgb.train()}} or \code{\link[=xgb.cv]{xgb.cv()}}.
|
||||||
|
|
||||||
Note that boosting might be interrupted before reaching this last iteration, for
|
Note that boosting might be interrupted before reaching this last iteration, for
|
||||||
example by using the early stopping callback \link{xgb.cb.early.stop}.
|
example by using the early stopping callback \code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}}.
|
||||||
\item iteration Index of the iteration number that is being executed (first iteration
|
\item iteration Index of the iteration number that is being executed (first iteration
|
||||||
will be the same as parameter \code{begin_iteration}, then next one will add +1, and so on).
|
will be the same as parameter \code{begin_iteration}, then next one will add +1, and so on).
|
||||||
\item iter_feval Evaluation metrics for \code{evals} that were supplied, either
|
\item iter_feval Evaluation metrics for \code{evals} that were supplied, either
|
||||||
|
|||||||
@ -57,20 +57,20 @@ was constructed.
|
|||||||
|
|
||||||
Other column types are not supported.
|
Other column types are not supported.
|
||||||
\item CSR matrices, as class \code{dgRMatrix} from package \code{Matrix}.
|
\item CSR matrices, as class \code{dgRMatrix} from package \code{Matrix}.
|
||||||
\item CSC matrices, as class \code{dgCMatrix} from package \code{Matrix}. These are \bold{not} supported for
|
\item CSC matrices, as class \code{dgCMatrix} from package \code{Matrix}. These are \strong{not} supported for
|
||||||
'xgb.QuantileDMatrix'.
|
'xgb.QuantileDMatrix'.
|
||||||
\item Single-row CSR matrices, as class \code{dsparseVector} from package \code{Matrix}, which is interpreted
|
\item Single-row CSR matrices, as class \code{dsparseVector} from package \code{Matrix}, which is interpreted
|
||||||
as a single row (only when making predictions from a fitted model).
|
as a single row (only when making predictions from a fitted model).
|
||||||
\item Text files in a supported format, passed as a \code{character} variable containing the URI path to
|
\item Text files in a supported format, passed as a \code{character} variable containing the URI path to
|
||||||
the file, with an optional format specifier.
|
the file, with an optional format specifier.
|
||||||
|
|
||||||
These are \bold{not} supported for \code{xgb.QuantileDMatrix}. Supported formats are:\itemize{
|
These are \strong{not} supported for \code{xgb.QuantileDMatrix}. Supported formats are:\itemize{
|
||||||
\item XGBoost's own binary format for DMatrices, as produced by \link{xgb.DMatrix.save}.
|
\item XGBoost's own binary format for DMatrices, as produced by \code{\link[=xgb.DMatrix.save]{xgb.DMatrix.save()}}.
|
||||||
\item SVMLight (a.k.a. LibSVM) format for CSR matrices. This format can be signaled by suffix
|
\item SVMLight (a.k.a. LibSVM) format for CSR matrices. This format can be signaled by suffix
|
||||||
\code{?format=libsvm} at the end of the file path. It will be the default format if not
|
\code{?format=libsvm} at the end of the file path. It will be the default format if not
|
||||||
otherwise specified.
|
otherwise specified.
|
||||||
\item CSV files (comma-separated values). This format can be specified by adding suffix
|
\item CSV files (comma-separated values). This format can be specified by adding suffix
|
||||||
\code{?format=csv} at the end of the file path. It will \bold{not} be auto-deduced from file extensions.
|
\code{?format=csv} at the end of the file path. It will \strong{not} be auto-deduced from file extensions.
|
||||||
}
|
}
|
||||||
|
|
||||||
Be aware that the format of the file will not be auto-deduced - for example, if a file is named 'file.csv',
|
Be aware that the format of the file will not be auto-deduced - for example, if a file is named 'file.csv',
|
||||||
@ -99,25 +99,24 @@ so it doesn't make sense to assign weights to individual data points.}
|
|||||||
In the case of multi-output models, one can also pass multi-dimensional base_margin.}
|
In the case of multi-output models, one can also pass multi-dimensional base_margin.}
|
||||||
|
|
||||||
\item{missing}{A float value to represent missing values in data (not used when creating DMatrix
|
\item{missing}{A float value to represent missing values in data (not used when creating DMatrix
|
||||||
from text files).
|
from text files). It is useful to change when a zero, infinite, or some other
|
||||||
It is useful to change when a zero, infinite, or some other extreme value represents missing
|
extreme value represents missing values in data.}
|
||||||
values in data.}
|
|
||||||
|
|
||||||
\item{silent}{whether to suppress printing an informational message after loading from a file.}
|
\item{silent}{whether to suppress printing an informational message after loading from a file.}
|
||||||
|
|
||||||
\item{feature_names}{Set names for features. Overrides column names in data
|
\item{feature_names}{Set names for features. Overrides column names in data frame and matrix.
|
||||||
frame and matrix.
|
|
||||||
|
|
||||||
Note: columns are not referenced by name when calling \code{predict}, so the column order there
|
Note: columns are not referenced by name when calling \code{predict}, so the column order there
|
||||||
must be the same as in the DMatrix construction, regardless of the column names.}
|
must be the same as in the DMatrix construction, regardless of the column names.}
|
||||||
|
|
||||||
\item{feature_types}{Set types for features.
|
\item{feature_types}{Set types for features.
|
||||||
|
|
||||||
If \code{data} is a \code{data.frame} and passing \code{feature_types} is not supplied, feature types will be deduced
|
If \code{data} is a \code{data.frame} and passing \code{feature_types} is not supplied,
|
||||||
automatically from the column types.
|
feature types will be deduced automatically from the column types.
|
||||||
|
|
||||||
Otherwise, one can pass a character vector with the same length as number of columns in \code{data},
|
Otherwise, one can pass a character vector with the same length as number of columns in \code{data},
|
||||||
with the following possible values:\itemize{
|
with the following possible values:
|
||||||
|
\itemize{
|
||||||
\item "c", which represents categorical columns.
|
\item "c", which represents categorical columns.
|
||||||
\item "q", which represents numeric columns.
|
\item "q", which represents numeric columns.
|
||||||
\item "int", which represents integer columns.
|
\item "int", which represents integer columns.
|
||||||
@ -128,9 +127,9 @@ Note that, while categorical types are treated differently from the rest for mod
|
|||||||
purposes, the other types do not influence the generated model, but have effects in other
|
purposes, the other types do not influence the generated model, but have effects in other
|
||||||
functionalities such as feature importances.
|
functionalities such as feature importances.
|
||||||
|
|
||||||
\bold{Important}: categorical features, if specified manually through \code{feature_types}, must
|
\strong{Important}: Categorical features, if specified manually through \code{feature_types}, must
|
||||||
be encoded as integers with numeration starting at zero, and the same encoding needs to be
|
be encoded as integers with numeration starting at zero, and the same encoding needs to be
|
||||||
applied when passing data to \code{predict}. Even if passing \code{factor} types, the encoding will
|
applied when passing data to \code{\link[=predict]{predict()}}. Even if passing \code{factor} types, the encoding will
|
||||||
not be saved, so make sure that \code{factor} columns passed to \code{predict} have the same \code{levels}.}
|
not be saved, so make sure that \code{factor} columns passed to \code{predict} have the same \code{levels}.}
|
||||||
|
|
||||||
\item{nthread}{Number of threads used for creating DMatrix.}
|
\item{nthread}{Number of threads used for creating DMatrix.}
|
||||||
@ -154,7 +153,7 @@ how the file was split beforehand. Default to row.
|
|||||||
This is not used when \code{data} is not a URI.}
|
This is not used when \code{data} is not a URI.}
|
||||||
|
|
||||||
\item{ref}{The training dataset that provides quantile information, needed when creating
|
\item{ref}{The training dataset that provides quantile information, needed when creating
|
||||||
validation/test dataset with \code{xgb.QuantileDMatrix}. Supplying the training DMatrix
|
validation/test dataset with \code{\link[=xgb.QuantileDMatrix]{xgb.QuantileDMatrix()}}. Supplying the training DMatrix
|
||||||
as a reference means that the same quantisation applied to the training data is
|
as a reference means that the same quantisation applied to the training data is
|
||||||
applied to the validation/test data}
|
applied to the validation/test data}
|
||||||
|
|
||||||
@ -169,23 +168,24 @@ subclass 'xgb.QuantileDMatrix'.
|
|||||||
}
|
}
|
||||||
\description{
|
\description{
|
||||||
Construct an 'xgb.DMatrix' object from a given data source, which can then be passed to functions
|
Construct an 'xgb.DMatrix' object from a given data source, which can then be passed to functions
|
||||||
such as \link{xgb.train} or \link{predict.xgb.Booster}.
|
such as \code{\link[=xgb.train]{xgb.train()}} or \code{\link[=predict]{predict()}}.
|
||||||
}
|
}
|
||||||
\details{
|
\details{
|
||||||
Function 'xgb.QuantileDMatrix' will construct a DMatrix with quantization for the histogram
|
Function \code{xgb.QuantileDMatrix()} will construct a DMatrix with quantization for the histogram
|
||||||
method already applied to it, which can be used to reduce memory usage (compared to using a
|
method already applied to it, which can be used to reduce memory usage (compared to using a
|
||||||
regular DMatrix first and then creating a quantization out of it) when using the histogram
|
regular DMatrix first and then creating a quantization out of it) when using the histogram
|
||||||
method (\code{tree_method = "hist"}, which is the default algorithm), but is not usable for the
|
method (\code{tree_method = "hist"}, which is the default algorithm), but is not usable for the
|
||||||
sorted-indices method (\code{tree_method = "exact"}), nor for the approximate method
|
sorted-indices method (\code{tree_method = "exact"}), nor for the approximate method
|
||||||
(\code{tree_method = "approx"}).
|
(\code{tree_method = "approx"}).
|
||||||
|
|
||||||
Note that DMatrix objects are not serializable through R functions such as \code{saveRDS} or \code{save}.
|
Note that DMatrix objects are not serializable through R functions such as \code{\link[=saveRDS]{saveRDS()}} or \code{\link[=save]{save()}}.
|
||||||
If a DMatrix gets serialized and then de-serialized (for example, when saving data in an R session or caching
|
If a DMatrix gets serialized and then de-serialized (for example, when saving data in an R session or caching
|
||||||
chunks in an Rmd file), the resulting object will not be usable anymore and will need to be reconstructed
|
chunks in an Rmd file), the resulting object will not be usable anymore and will need to be reconstructed
|
||||||
from the original source of data.
|
from the original source of data.
|
||||||
}
|
}
|
||||||
\examples{
|
\examples{
|
||||||
data(agaricus.train, package='xgboost')
|
data(agaricus.train, package = "xgboost")
|
||||||
|
|
||||||
## Keep the number of threads to 1 for examples
|
## Keep the number of threads to 1 for examples
|
||||||
nthread <- 1
|
nthread <- 1
|
||||||
data.table::setDTthreads(nthread)
|
data.table::setDTthreads(nthread)
|
||||||
|
|||||||
@ -16,11 +16,10 @@ Checks whether an xgb.DMatrix object has a given field assigned to
|
|||||||
it, such as weights, labels, etc.
|
it, such as weights, labels, etc.
|
||||||
}
|
}
|
||||||
\examples{
|
\examples{
|
||||||
library(xgboost)
|
|
||||||
x <- matrix(1:10, nrow = 5)
|
x <- matrix(1:10, nrow = 5)
|
||||||
dm <- xgb.DMatrix(x, nthread = 1)
|
dm <- xgb.DMatrix(x, nthread = 1)
|
||||||
|
|
||||||
# 'dm' so far doesn't have any fields set
|
# 'dm' so far does not have any fields set
|
||||||
xgb.DMatrix.hasinfo(dm, "label")
|
xgb.DMatrix.hasinfo(dm, "label")
|
||||||
|
|
||||||
# Fields can be added after construction
|
# Fields can be added after construction
|
||||||
@ -28,5 +27,5 @@ setinfo(dm, "label", 1:5)
|
|||||||
xgb.DMatrix.hasinfo(dm, "label")
|
xgb.DMatrix.hasinfo(dm, "label")
|
||||||
}
|
}
|
||||||
\seealso{
|
\seealso{
|
||||||
\link{xgb.DMatrix}, \link{getinfo.xgb.DMatrix}, \link{setinfo.xgb.DMatrix}
|
\code{\link[=xgb.DMatrix]{xgb.DMatrix()}}, \code{\link[=getinfo.xgb.DMatrix]{getinfo.xgb.DMatrix()}}, \code{\link[=setinfo.xgb.DMatrix]{setinfo.xgb.DMatrix()}}
|
||||||
}
|
}
|
||||||
|
|||||||
@ -16,7 +16,8 @@ Save xgb.DMatrix object to binary file
|
|||||||
}
|
}
|
||||||
\examples{
|
\examples{
|
||||||
\dontshow{RhpcBLASctl::omp_set_num_threads(1)}
|
\dontshow{RhpcBLASctl::omp_set_num_threads(1)}
|
||||||
data(agaricus.train, package='xgboost')
|
data(agaricus.train, package = "xgboost")
|
||||||
|
|
||||||
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||||
fname <- file.path(tempdir(), "xgb.DMatrix.data")
|
fname <- file.path(tempdir(), "xgb.DMatrix.data")
|
||||||
xgb.DMatrix.save(dtrain, fname)
|
xgb.DMatrix.save(dtrain, fname)
|
||||||
|
|||||||
@ -21,16 +21,17 @@ xgb.DataBatch(
|
|||||||
\arguments{
|
\arguments{
|
||||||
\item{data}{Batch of data belonging to this batch.
|
\item{data}{Batch of data belonging to this batch.
|
||||||
|
|
||||||
Note that not all of the input types supported by \link{xgb.DMatrix} are possible
|
Note that not all of the input types supported by \code{\link[=xgb.DMatrix]{xgb.DMatrix()}} are possible
|
||||||
to pass here. Supported types are:\itemize{
|
to pass here. Supported types are:
|
||||||
|
\itemize{
|
||||||
\item \code{matrix}, with types \code{numeric}, \code{integer}, and \code{logical}. Note that for types
|
\item \code{matrix}, with types \code{numeric}, \code{integer}, and \code{logical}. Note that for types
|
||||||
\code{integer} and \code{logical}, missing values might not be automatically recognized as
|
\code{integer} and \code{logical}, missing values might not be automatically recognized as
|
||||||
such - see the documentation for parameter \code{missing} in \link{xgb.ExternalDMatrix}
|
such - see the documentation for parameter \code{missing} in \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}}
|
||||||
for details on this.
|
for details on this.
|
||||||
\item \code{data.frame}, with the same types as supported by 'xgb.DMatrix' and same
|
\item \code{data.frame}, with the same types as supported by 'xgb.DMatrix' and same
|
||||||
conversions applied to it. See the documentation for parameter \code{data} in
|
conversions applied to it. See the documentation for parameter \code{data} in
|
||||||
\link{xgb.DMatrix} for details on it.
|
\code{\link[=xgb.DMatrix]{xgb.DMatrix()}} for details on it.
|
||||||
\item CSR matrices, as class \code{dgRMatrix} from package \code{Matrix}.
|
\item CSR matrices, as class \code{dgRMatrix} from package "Matrix".
|
||||||
}}
|
}}
|
||||||
|
|
||||||
\item{label}{Label of the training data. For classification problems, should be passed encoded as
|
\item{label}{Label of the training data. For classification problems, should be passed encoded as
|
||||||
@ -47,19 +48,19 @@ so it doesn't make sense to assign weights to individual data points.}
|
|||||||
|
|
||||||
In the case of multi-output models, one can also pass multi-dimensional base_margin.}
|
In the case of multi-output models, one can also pass multi-dimensional base_margin.}
|
||||||
|
|
||||||
\item{feature_names}{Set names for features. Overrides column names in data
|
\item{feature_names}{Set names for features. Overrides column names in data frame and matrix.
|
||||||
frame and matrix.
|
|
||||||
|
|
||||||
Note: columns are not referenced by name when calling \code{predict}, so the column order there
|
Note: columns are not referenced by name when calling \code{predict}, so the column order there
|
||||||
must be the same as in the DMatrix construction, regardless of the column names.}
|
must be the same as in the DMatrix construction, regardless of the column names.}
|
||||||
|
|
||||||
\item{feature_types}{Set types for features.
|
\item{feature_types}{Set types for features.
|
||||||
|
|
||||||
If \code{data} is a \code{data.frame} and passing \code{feature_types} is not supplied, feature types will be deduced
|
If \code{data} is a \code{data.frame} and passing \code{feature_types} is not supplied,
|
||||||
automatically from the column types.
|
feature types will be deduced automatically from the column types.
|
||||||
|
|
||||||
Otherwise, one can pass a character vector with the same length as number of columns in \code{data},
|
Otherwise, one can pass a character vector with the same length as number of columns in \code{data},
|
||||||
with the following possible values:\itemize{
|
with the following possible values:
|
||||||
|
\itemize{
|
||||||
\item "c", which represents categorical columns.
|
\item "c", which represents categorical columns.
|
||||||
\item "q", which represents numeric columns.
|
\item "q", which represents numeric columns.
|
||||||
\item "int", which represents integer columns.
|
\item "int", which represents integer columns.
|
||||||
@ -70,9 +71,9 @@ Note that, while categorical types are treated differently from the rest for mod
|
|||||||
purposes, the other types do not influence the generated model, but have effects in other
|
purposes, the other types do not influence the generated model, but have effects in other
|
||||||
functionalities such as feature importances.
|
functionalities such as feature importances.
|
||||||
|
|
||||||
\bold{Important}: categorical features, if specified manually through \code{feature_types}, must
|
\strong{Important}: Categorical features, if specified manually through \code{feature_types}, must
|
||||||
be encoded as integers with numeration starting at zero, and the same encoding needs to be
|
be encoded as integers with numeration starting at zero, and the same encoding needs to be
|
||||||
applied when passing data to \code{predict}. Even if passing \code{factor} types, the encoding will
|
applied when passing data to \code{\link[=predict]{predict()}}. Even if passing \code{factor} types, the encoding will
|
||||||
not be saved, so make sure that \code{factor} columns passed to \code{predict} have the same \code{levels}.}
|
not be saved, so make sure that \code{factor} columns passed to \code{predict} have the same \code{levels}.}
|
||||||
|
|
||||||
\item{group}{Group sizes for all ranking groups.}
|
\item{group}{Group sizes for all ranking groups.}
|
||||||
@ -87,24 +88,24 @@ not be saved, so make sure that \code{factor} columns passed to \code{predict} h
|
|||||||
}
|
}
|
||||||
\value{
|
\value{
|
||||||
An object of class \code{xgb.DataBatch}, which is just a list containing the
|
An object of class \code{xgb.DataBatch}, which is just a list containing the
|
||||||
data and parameters passed here. It does \bold{not} inherit from \code{xgb.DMatrix}.
|
data and parameters passed here. It does \strong{not} inherit from \code{xgb.DMatrix}.
|
||||||
}
|
}
|
||||||
\description{
|
\description{
|
||||||
Helper function to supply data in batches of a data iterator when
|
Helper function to supply data in batches of a data iterator when
|
||||||
constructing a DMatrix from external memory through \link{xgb.ExternalDMatrix}
|
constructing a DMatrix from external memory through \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}}
|
||||||
or through \link{xgb.QuantileDMatrix.from_iterator}.
|
or through \code{\link[=xgb.QuantileDMatrix.from_iterator]{xgb.QuantileDMatrix.from_iterator()}}.
|
||||||
|
|
||||||
This function is \bold{only} meant to be called inside of a callback function (which
|
This function is \strong{only} meant to be called inside of a callback function (which
|
||||||
is passed as argument to function \link{xgb.DataIter} to construct a data iterator)
|
is passed as argument to function \code{\link[=xgb.DataIter]{xgb.DataIter()}} to construct a data iterator)
|
||||||
when constructing a DMatrix through external memory - otherwise, one should call
|
when constructing a DMatrix through external memory - otherwise, one should call
|
||||||
\link{xgb.DMatrix} or \link{xgb.QuantileDMatrix}.
|
\code{\link[=xgb.DMatrix]{xgb.DMatrix()}} or \code{\link[=xgb.QuantileDMatrix]{xgb.QuantileDMatrix()}}.
|
||||||
|
|
||||||
The object that results from calling this function directly is \bold{not} like
|
The object that results from calling this function directly is \strong{not} like
|
||||||
an \code{xgb.DMatrix} - i.e. cannot be used to train a model, nor to get predictions - only
|
an \code{xgb.DMatrix} - i.e. cannot be used to train a model, nor to get predictions - only
|
||||||
possible usage is to supply data to an iterator, from which a DMatrix is then constructed.
|
possible usage is to supply data to an iterator, from which a DMatrix is then constructed.
|
||||||
|
|
||||||
For more information and for example usage, see the documentation for \link{xgb.ExternalDMatrix}.
|
For more information and for example usage, see the documentation for \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}}.
|
||||||
}
|
}
|
||||||
\seealso{
|
\seealso{
|
||||||
\link{xgb.DataIter}, \link{xgb.ExternalDMatrix}.
|
\code{\link[=xgb.DataIter]{xgb.DataIter()}}, \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}}.
|
||||||
}
|
}
|
||||||
|
|||||||
@ -13,14 +13,15 @@ used to keep track of variables to determine how to handle the batches.
|
|||||||
For example, one might want to keep track of an iteration number in this environment in order
|
For example, one might want to keep track of an iteration number in this environment in order
|
||||||
to know which part of the data to pass next.}
|
to know which part of the data to pass next.}
|
||||||
|
|
||||||
\item{f_next}{\verb{function(env)} which is responsible for:\itemize{
|
\item{f_next}{\verb{function(env)} which is responsible for:
|
||||||
|
\itemize{
|
||||||
\item Accessing or retrieving the next batch of data in the iterator.
|
\item Accessing or retrieving the next batch of data in the iterator.
|
||||||
\item Supplying this data by calling function \link{xgb.DataBatch} on it and returning the result.
|
\item Supplying this data by calling function \code{\link[=xgb.DataBatch]{xgb.DataBatch()}} on it and returning the result.
|
||||||
\item Keeping track of where in the iterator batch it is or will go next, which can for example
|
\item Keeping track of where in the iterator batch it is or will go next, which can for example
|
||||||
be done by modifying variables in the \code{env} variable that is passed here.
|
be done by modifying variables in the \code{env} variable that is passed here.
|
||||||
\item Signaling whether there are more batches to be consumed or not, by returning \code{NULL}
|
\item Signaling whether there are more batches to be consumed or not, by returning \code{NULL}
|
||||||
when the stream of data ends (all batches in the iterator have been consumed), or the result from
|
when the stream of data ends (all batches in the iterator have been consumed), or the result from
|
||||||
calling \link{xgb.DataBatch} when there are more batches in the line to be consumed.
|
calling \code{\link[=xgb.DataBatch]{xgb.DataBatch()}} when there are more batches in the line to be consumed.
|
||||||
}}
|
}}
|
||||||
|
|
||||||
\item{f_reset}{\verb{function(env)} which is responsible for resetting the data iterator
|
\item{f_reset}{\verb{function(env)} which is responsible for resetting the data iterator
|
||||||
@ -32,7 +33,7 @@ Note that, after resetting the iterator, the batches will be accessed again, so
|
|||||||
}
|
}
|
||||||
\value{
|
\value{
|
||||||
An \code{xgb.DataIter} object, containing the same inputs supplied here, which can then
|
An \code{xgb.DataIter} object, containing the same inputs supplied here, which can then
|
||||||
be passed to \link{xgb.ExternalDMatrix}.
|
be passed to \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}}.
|
||||||
}
|
}
|
||||||
\description{
|
\description{
|
||||||
Interface to create a custom data iterator in order to construct a DMatrix
|
Interface to create a custom data iterator in order to construct a DMatrix
|
||||||
@ -41,11 +42,11 @@ from external memory.
|
|||||||
This function is responsible for generating an R object structure containing callback
|
This function is responsible for generating an R object structure containing callback
|
||||||
functions and an environment shared with them.
|
functions and an environment shared with them.
|
||||||
|
|
||||||
The output structure from this function is then meant to be passed to \link{xgb.ExternalDMatrix},
|
The output structure from this function is then meant to be passed to \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}},
|
||||||
which will consume the data and create a DMatrix from it by executing the callback functions.
|
which will consume the data and create a DMatrix from it by executing the callback functions.
|
||||||
|
|
||||||
For more information, and for a usage example, see the documentation for \link{xgb.ExternalDMatrix}.
|
For more information, and for a usage example, see the documentation for \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}}.
|
||||||
}
|
}
|
||||||
\seealso{
|
\seealso{
|
||||||
\link{xgb.ExternalDMatrix}, \link{xgb.DataBatch}.
|
\code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}}, \code{\link[=xgb.DataBatch]{xgb.DataBatch()}}.
|
||||||
}
|
}
|
||||||
|
|||||||
@ -12,7 +12,7 @@ xgb.ExternalDMatrix(
|
|||||||
)
|
)
|
||||||
}
|
}
|
||||||
\arguments{
|
\arguments{
|
||||||
\item{data_iterator}{A data iterator structure as returned by \link{xgb.DataIter},
|
\item{data_iterator}{A data iterator structure as returned by \code{\link[=xgb.DataIter]{xgb.DataIter()}},
|
||||||
which includes an environment shared between function calls, and functions to access
|
which includes an environment shared between function calls, and functions to access
|
||||||
the data in batches on-demand.}
|
the data in batches on-demand.}
|
||||||
|
|
||||||
@ -20,14 +20,14 @@ the data in batches on-demand.}
|
|||||||
|
|
||||||
\item{missing}{A float value to represent missing values in data.
|
\item{missing}{A float value to represent missing values in data.
|
||||||
|
|
||||||
Note that, while functions like \link{xgb.DMatrix} can take a generic \code{NA} and interpret it
|
Note that, while functions like \code{\link[=xgb.DMatrix]{xgb.DMatrix()}} can take a generic \code{NA} and interpret it
|
||||||
correctly for different types like \code{numeric} and \code{integer}, if an \code{NA} value is passed here,
|
correctly for different types like \code{numeric} and \code{integer}, if an \code{NA} value is passed here,
|
||||||
it will not be adapted for different input types.
|
it will not be adapted for different input types.
|
||||||
|
|
||||||
For example, in R \code{integer} types, missing values are represented by integer number \code{-2147483648}
|
For example, in R \code{integer} types, missing values are represented by integer number \code{-2147483648}
|
||||||
(since machine 'integer' types do not have an inherent 'NA' value) - hence, if one passes \code{NA},
|
(since machine 'integer' types do not have an inherent 'NA' value) - hence, if one passes \code{NA},
|
||||||
which is interpreted as a floating-point NaN by 'xgb.ExternalDMatrix' and by
|
which is interpreted as a floating-point NaN by \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}} and by
|
||||||
'xgb.QuantileDMatrix.from_iterator', these integer missing values will not be treated as missing.
|
\code{\link[=xgb.QuantileDMatrix.from_iterator]{xgb.QuantileDMatrix.from_iterator()}}, these integer missing values will not be treated as missing.
|
||||||
This should not pose any problem for \code{numeric} types, since they do have an inherent NaN value.}
|
This should not pose any problem for \code{numeric} types, since they do have an inherent NaN value.}
|
||||||
|
|
||||||
\item{nthread}{Number of threads used for creating DMatrix.}
|
\item{nthread}{Number of threads used for creating DMatrix.}
|
||||||
@ -37,23 +37,22 @@ An 'xgb.DMatrix' object, with subclass 'xgb.ExternalDMatrix', in which the data
|
|||||||
held internally but accessed through the iterator when needed.
|
held internally but accessed through the iterator when needed.
|
||||||
}
|
}
|
||||||
\description{
|
\description{
|
||||||
Create a special type of xgboost 'DMatrix' object from external data
|
Create a special type of XGBoost 'DMatrix' object from external data
|
||||||
supplied by an \link{xgb.DataIter} object, potentially passed in batches from a
|
supplied by an \code{\link[=xgb.DataIter]{xgb.DataIter()}} object, potentially passed in batches from a
|
||||||
bigger set that might not fit entirely in memory.
|
bigger set that might not fit entirely in memory.
|
||||||
|
|
||||||
The data supplied by the iterator is accessed on-demand as needed, multiple times,
|
The data supplied by the iterator is accessed on-demand as needed, multiple times,
|
||||||
without being concatenated, but note that fields like 'label' \bold{will} be
|
without being concatenated, but note that fields like 'label' \strong{will} be
|
||||||
concatenated from multiple calls to the data iterator.
|
concatenated from multiple calls to the data iterator.
|
||||||
|
|
||||||
For more information, see the guide 'Using XGBoost External Memory Version':
|
For more information, see the guide 'Using XGBoost External Memory Version':
|
||||||
\url{https://xgboost.readthedocs.io/en/stable/tutorials/external_memory.html}
|
\url{https://xgboost.readthedocs.io/en/stable/tutorials/external_memory.html}
|
||||||
}
|
}
|
||||||
\examples{
|
\examples{
|
||||||
library(xgboost)
|
|
||||||
data(mtcars)
|
data(mtcars)
|
||||||
|
|
||||||
# this custom environment will be passed to the iterator
|
# This custom environment will be passed to the iterator
|
||||||
# functions at each call. It's up to the user to keep
|
# functions at each call. It is up to the user to keep
|
||||||
# track of the iteration number in this environment.
|
# track of the iteration number in this environment.
|
||||||
iterator_env <- as.environment(
|
iterator_env <- as.environment(
|
||||||
list(
|
list(
|
||||||
@ -118,5 +117,5 @@ pred_dm <- predict(model, dm)
|
|||||||
pred_mat <- predict(model, as.matrix(mtcars[, -1]))
|
pred_mat <- predict(model, as.matrix(mtcars[, -1]))
|
||||||
}
|
}
|
||||||
\seealso{
|
\seealso{
|
||||||
\link{xgb.DataIter}, \link{xgb.DataBatch}, \link{xgb.QuantileDMatrix.from_iterator}
|
\code{\link[=xgb.DataIter]{xgb.DataIter()}}, \code{\link[=xgb.DataBatch]{xgb.DataBatch()}}, \code{\link[=xgb.QuantileDMatrix.from_iterator]{xgb.QuantileDMatrix.from_iterator()}}
|
||||||
}
|
}
|
||||||
|
|||||||
@ -13,26 +13,26 @@ xgb.QuantileDMatrix.from_iterator(
|
|||||||
)
|
)
|
||||||
}
|
}
|
||||||
\arguments{
|
\arguments{
|
||||||
\item{data_iterator}{A data iterator structure as returned by \link{xgb.DataIter},
|
\item{data_iterator}{A data iterator structure as returned by \code{\link[=xgb.DataIter]{xgb.DataIter()}},
|
||||||
which includes an environment shared between function calls, and functions to access
|
which includes an environment shared between function calls, and functions to access
|
||||||
the data in batches on-demand.}
|
the data in batches on-demand.}
|
||||||
|
|
||||||
\item{missing}{A float value to represent missing values in data.
|
\item{missing}{A float value to represent missing values in data.
|
||||||
|
|
||||||
Note that, while functions like \link{xgb.DMatrix} can take a generic \code{NA} and interpret it
|
Note that, while functions like \code{\link[=xgb.DMatrix]{xgb.DMatrix()}} can take a generic \code{NA} and interpret it
|
||||||
correctly for different types like \code{numeric} and \code{integer}, if an \code{NA} value is passed here,
|
correctly for different types like \code{numeric} and \code{integer}, if an \code{NA} value is passed here,
|
||||||
it will not be adapted for different input types.
|
it will not be adapted for different input types.
|
||||||
|
|
||||||
For example, in R \code{integer} types, missing values are represented by integer number \code{-2147483648}
|
For example, in R \code{integer} types, missing values are represented by integer number \code{-2147483648}
|
||||||
(since machine 'integer' types do not have an inherent 'NA' value) - hence, if one passes \code{NA},
|
(since machine 'integer' types do not have an inherent 'NA' value) - hence, if one passes \code{NA},
|
||||||
which is interpreted as a floating-point NaN by 'xgb.ExternalDMatrix' and by
|
which is interpreted as a floating-point NaN by \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}} and by
|
||||||
'xgb.QuantileDMatrix.from_iterator', these integer missing values will not be treated as missing.
|
\code{\link[=xgb.QuantileDMatrix.from_iterator]{xgb.QuantileDMatrix.from_iterator()}}, these integer missing values will not be treated as missing.
|
||||||
This should not pose any problem for \code{numeric} types, since they do have an inherent NaN value.}
|
This should not pose any problem for \code{numeric} types, since they do have an inherent NaN value.}
|
||||||
|
|
||||||
\item{nthread}{Number of threads used for creating DMatrix.}
|
\item{nthread}{Number of threads used for creating DMatrix.}
|
||||||
|
|
||||||
\item{ref}{The training dataset that provides quantile information, needed when creating
|
\item{ref}{The training dataset that provides quantile information, needed when creating
|
||||||
validation/test dataset with \code{xgb.QuantileDMatrix}. Supplying the training DMatrix
|
validation/test dataset with \code{\link[=xgb.QuantileDMatrix]{xgb.QuantileDMatrix()}}. Supplying the training DMatrix
|
||||||
as a reference means that the same quantisation applied to the training data is
|
as a reference means that the same quantisation applied to the training data is
|
||||||
applied to the validation/test data}
|
applied to the validation/test data}
|
||||||
|
|
||||||
@ -46,20 +46,20 @@ An 'xgb.DMatrix' object, with subclass 'xgb.QuantileDMatrix'.
|
|||||||
}
|
}
|
||||||
\description{
|
\description{
|
||||||
Create an \code{xgb.QuantileDMatrix} object (exact same class as would be returned by
|
Create an \code{xgb.QuantileDMatrix} object (exact same class as would be returned by
|
||||||
calling function \link{xgb.QuantileDMatrix}, with the same advantages and limitations) from
|
calling function \code{\link[=xgb.QuantileDMatrix]{xgb.QuantileDMatrix()}}, with the same advantages and limitations) from
|
||||||
external data supplied by an \link{xgb.DataIter} object, potentially passed in batches from
|
external data supplied by \code{\link[=xgb.DataIter]{xgb.DataIter()}}, potentially passed in batches from
|
||||||
a bigger set that might not fit entirely in memory, same way as \link{xgb.ExternalDMatrix}.
|
a bigger set that might not fit entirely in memory, same way as \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}}.
|
||||||
|
|
||||||
Note that, while external data will only be loaded through the iterator (thus the full data
|
Note that, while external data will only be loaded through the iterator (thus the full data
|
||||||
might not be held entirely in-memory), the quantized representation of the data will get
|
might not be held entirely in-memory), the quantized representation of the data will get
|
||||||
created in-memory, being concatenated from multiple calls to the data iterator. The quantized
|
created in-memory, being concatenated from multiple calls to the data iterator. The quantized
|
||||||
version is typically lighter than the original data, so there might be cases in which this
|
version is typically lighter than the original data, so there might be cases in which this
|
||||||
representation could potentially fit in memory even if the full data doesn't.
|
representation could potentially fit in memory even if the full data does not.
|
||||||
|
|
||||||
For more information, see the guide 'Using XGBoost External Memory Version':
|
For more information, see the guide 'Using XGBoost External Memory Version':
|
||||||
\url{https://xgboost.readthedocs.io/en/stable/tutorials/external_memory.html}
|
\url{https://xgboost.readthedocs.io/en/stable/tutorials/external_memory.html}
|
||||||
}
|
}
|
||||||
\seealso{
|
\seealso{
|
||||||
\link{xgb.DataIter}, \link{xgb.DataBatch}, \link{xgb.ExternalDMatrix},
|
\code{\link[=xgb.DataIter]{xgb.DataIter()}}, \code{\link[=xgb.DataBatch]{xgb.DataBatch()}}, \code{\link[=xgb.ExternalDMatrix]{xgb.ExternalDMatrix()}},
|
||||||
\link{xgb.QuantileDMatrix}
|
\code{\link[=xgb.QuantileDMatrix]{xgb.QuantileDMatrix()}}
|
||||||
}
|
}
|
||||||
|
|||||||
@ -51,7 +51,7 @@ Also, setting an attribute that has the same name as one of XGBoost's parameters
|
|||||||
change the value of that parameter for a model.
|
change the value of that parameter for a model.
|
||||||
Use \code{\link[=xgb.parameters<-]{xgb.parameters<-()}} to set or change model parameters.
|
Use \code{\link[=xgb.parameters<-]{xgb.parameters<-()}} to set or change model parameters.
|
||||||
|
|
||||||
The \code{\link[=xgb.attributes<-]{xgb.attributes<-()}} setter either updates the existing or adds one or several attributes,
|
The \verb{xgb.attributes<-} setter either updates the existing or adds one or several attributes,
|
||||||
but it doesn't delete the other existing attributes.
|
but it doesn't delete the other existing attributes.
|
||||||
|
|
||||||
Important: since this modifies the booster's C object, semantics for assignment here
|
Important: since this modifies the booster's C object, semantics for assignment here
|
||||||
|
|||||||
@ -26,141 +26,136 @@ xgb.cv(
|
|||||||
)
|
)
|
||||||
}
|
}
|
||||||
\arguments{
|
\arguments{
|
||||||
\item{params}{the list of parameters. The complete list of parameters is
|
\item{params}{The list of parameters. The complete list of parameters is available in the
|
||||||
available in the \href{http://xgboost.readthedocs.io/en/latest/parameter.html}{online documentation}. Below
|
\href{http://xgboost.readthedocs.io/en/latest/parameter.html}{online documentation}.
|
||||||
is a shorter summary:
|
Below is a shorter summary:
|
||||||
\itemize{
|
\itemize{
|
||||||
\item \code{objective} objective function, common ones are
|
\item \code{objective}: Objective function, common ones are
|
||||||
\itemize{
|
\itemize{
|
||||||
\item \code{reg:squarederror} Regression with squared loss.
|
\item \code{reg:squarederror}: Regression with squared loss.
|
||||||
\item \code{binary:logistic} logistic regression for classification.
|
\item \code{binary:logistic}: Logistic regression for classification.
|
||||||
\item See \code{\link[=xgb.train]{xgb.train}()} for complete list of objectives.
|
|
||||||
}
|
|
||||||
\item \code{eta} step size of each boosting step
|
|
||||||
\item \code{max_depth} maximum depth of the tree
|
|
||||||
\item \code{nthread} number of thread used in training, if not set, all threads are used
|
|
||||||
}
|
}
|
||||||
|
|
||||||
See \code{\link{xgb.train}} for further details.
|
See \code{\link[=xgb.train]{xgb.train()}} for complete list of objectives.
|
||||||
See also demo/ for walkthrough example in R.
|
\item \code{eta}: Step size of each boosting step
|
||||||
|
\item \code{max_depth}: Maximum depth of the tree
|
||||||
|
\item \code{nthread}: Number of threads used in training. If not set, all threads are used
|
||||||
|
}
|
||||||
|
|
||||||
|
See \code{\link[=xgb.train]{xgb.train()}} for further details.
|
||||||
|
See also the demo directory for a walkthrough example in R.
|
||||||
|
|
||||||
Note that, while \code{params} accepts a \code{seed} entry and will use such parameter for model training if
|
Note that, while \code{params} accepts a \code{seed} entry and will use such parameter for model training if
|
||||||
supplied, this seed is not used for creation of train-test splits, which instead rely on R's own RNG
|
supplied, this seed is not used for creation of train-test splits, which instead rely on R's own RNG
|
||||||
system - thus, for reproducible results, one needs to call the \code{set.seed} function beforehand.}
|
system - thus, for reproducible results, one needs to call the \code{\link[=set.seed]{set.seed()}} function beforehand.}
|
||||||
|
|
||||||
\item{data}{An \code{xgb.DMatrix} object, with corresponding fields like \code{label} or bounds as required
|
\item{data}{An \code{xgb.DMatrix} object, with corresponding fields like \code{label} or bounds as required
|
||||||
for model training by the objective.
|
for model training by the objective.
|
||||||
|
|
||||||
\if{html}{\out{<div class="sourceCode">}}\preformatted{ Note that only the basic `xgb.DMatrix` class is supported - variants such as `xgb.QuantileDMatrix`
|
Note that only the basic \code{xgb.DMatrix} class is supported - variants such as \code{xgb.QuantileDMatrix}
|
||||||
or `xgb.ExternalDMatrix` are not supported here.
|
or \code{xgb.ExternalDMatrix} are not supported here.}
|
||||||
}\if{html}{\out{</div>}}}
|
|
||||||
|
|
||||||
\item{nrounds}{the max number of iterations}
|
\item{nrounds}{The max number of iterations.}
|
||||||
|
|
||||||
\item{nfold}{the original dataset is randomly partitioned into \code{nfold} equal size subsamples.}
|
\item{nfold}{The original dataset is randomly partitioned into \code{nfold} equal size subsamples.}
|
||||||
|
|
||||||
\item{prediction}{A logical value indicating whether to return the test fold predictions
|
\item{prediction}{A logical value indicating whether to return the test fold predictions
|
||||||
from each CV model. This parameter engages the \code{\link{xgb.cb.cv.predict}} callback.}
|
from each CV model. This parameter engages the \code{\link[=xgb.cb.cv.predict]{xgb.cb.cv.predict()}} callback.}
|
||||||
|
|
||||||
\item{showsd}{\code{boolean}, whether to show standard deviation of cross validation}
|
\item{showsd}{Logical value indicating whether to show the standard deviation of cross validation.}
|
||||||
|
|
||||||
\item{metrics, }{list of evaluation metrics to be used in cross validation,
|
\item{metrics}{List of evaluation metrics to be used in cross validation,
|
||||||
when it is not specified, the evaluation metric is chosen according to objective function.
|
when it is not specified, the evaluation metric is chosen according to objective function.
|
||||||
Possible options are:
|
Possible options are:
|
||||||
\itemize{
|
\itemize{
|
||||||
\item \code{error} binary classification error rate
|
\item \code{error}: Binary classification error rate
|
||||||
\item \code{rmse} Rooted mean square error
|
\item \code{rmse}: Root mean square error
|
||||||
\item \code{logloss} negative log-likelihood function
|
\item \code{logloss}: Negative log-likelihood function
|
||||||
\item \code{mae} Mean absolute error
|
\item \code{mae}: Mean absolute error
|
||||||
\item \code{mape} Mean absolute percentage error
|
\item \code{mape}: Mean absolute percentage error
|
||||||
\item \code{auc} Area under curve
|
\item \code{auc}: Area under curve
|
||||||
\item \code{aucpr} Area under PR curve
|
\item \code{aucpr}: Area under PR curve
|
||||||
\item \code{merror} Exact matching error, used to evaluate multi-class classification
|
\item \code{merror}: Exact matching error used to evaluate multi-class classification
|
||||||
}}
|
}}
|
||||||
|
|
||||||
\item{obj}{customized objective function. Returns gradient and second order
|
\item{obj}{Customized objective function. Returns gradient and second order
|
||||||
gradient with given prediction and dtrain.}
|
gradient with given prediction and dtrain.}
|
||||||
|
|
||||||
\item{feval}{customized evaluation function. Returns
|
\item{feval}{Customized evaluation function. Returns
|
||||||
\code{list(metric='metric-name', value='metric-value')} with given
|
\code{list(metric='metric-name', value='metric-value')} with given prediction and dtrain.}
|
||||||
prediction and dtrain.}
|
|
||||||
|
|
||||||
\item{stratified}{A \code{boolean} indicating whether sampling of folds should be stratified
|
\item{stratified}{Logical flag indicating whether sampling of folds should be stratified
|
||||||
by the values of outcome labels. For real-valued labels in regression objectives,
|
by the values of outcome labels. For real-valued labels in regression objectives,
|
||||||
stratification will be done by discretizing the labels into up to 5 buckets beforehand.
|
stratification will be done by discretizing the labels into up to 5 buckets beforehand.
|
||||||
|
|
||||||
\if{html}{\out{<div class="sourceCode">}}\preformatted{ If passing "auto", will be set to `TRUE` if the objective in `params` is a classification
|
If passing "auto", will be set to \code{TRUE} if the objective in \code{params} is a classification
|
||||||
objective (from XGBoost's built-in objectives, doesn't apply to custom ones), and to
|
objective (from XGBoost's built-in objectives, doesn't apply to custom ones), and to
|
||||||
`FALSE` otherwise.
|
\code{FALSE} otherwise.
|
||||||
|
|
||||||
This parameter is ignored when `data` has a `group` field - in such case, the splitting
|
This parameter is ignored when \code{data} has a \code{group} field - in such case, the splitting
|
||||||
will be based on whole groups (note that this might make the folds have different sizes).
|
will be based on whole groups (note that this might make the folds have different sizes).
|
||||||
|
|
||||||
Value `TRUE` here is \\bold\{not\} supported for custom objectives.
|
Value \code{TRUE} here is \strong{not} supported for custom objectives.}
|
||||||
}\if{html}{\out{</div>}}}
|
|
||||||
|
|
||||||
\item{folds}{\code{list} provides a possibility to use a list of pre-defined CV folds
|
\item{folds}{List with pre-defined CV folds (each element must be a vector of test fold's indices).
|
||||||
(each element must be a vector of test fold's indices). When folds are supplied,
|
When folds are supplied, the \code{nfold} and \code{stratified} parameters are ignored.
|
||||||
the \code{nfold} and \code{stratified} parameters are ignored.
|
|
||||||
|
|
||||||
\if{html}{\out{<div class="sourceCode">}}\preformatted{ If `data` has a `group` field and the objective requires this field, each fold (list element)
|
If \code{data} has a \code{group} field and the objective requires this field, each fold (list element)
|
||||||
must additionally have two attributes (retrievable through \link{attributes}) named `group_test`
|
must additionally have two attributes (retrievable through \code{attributes}) named \code{group_test}
|
||||||
and `group_train`, which should hold the `group` to assign through \link{setinfo.xgb.DMatrix} to
|
and \code{group_train}, which should hold the \code{group} to assign through \code{\link[=setinfo.xgb.DMatrix]{setinfo.xgb.DMatrix()}} to
|
||||||
the resulting DMatrices.
|
the resulting DMatrices.}
|
||||||
}\if{html}{\out{</div>}}}
|
|
||||||
|
|
||||||
\item{train_folds}{\code{list} list specifying which indicies to use for training. If \code{NULL}
|
\item{train_folds}{List specifying which indices to use for training. If \code{NULL}
|
||||||
(the default) all indices not specified in \code{folds} will be used for training.
|
(the default) all indices not specified in \code{folds} will be used for training.
|
||||||
|
|
||||||
\if{html}{\out{<div class="sourceCode">}}\preformatted{ This is not supported when `data` has `group` field.
|
This is not supported when \code{data} has \code{group} field.}
|
||||||
}\if{html}{\out{</div>}}}
|
|
||||||
|
|
||||||
\item{verbose}{\code{boolean}, print the statistics during the process}
|
\item{verbose}{Logical flag. Should statistics be printed during the process?}
|
||||||
|
|
||||||
\item{print_every_n}{Print each n-th iteration evaluation messages when \code{verbose>0}.
|
\item{print_every_n}{Print each nth iteration evaluation messages when \code{verbose > 0}.
|
||||||
Default is 1 which means all messages are printed. This parameter is passed to the
|
Default is 1 which means all messages are printed. This parameter is passed to the
|
||||||
\code{\link{xgb.cb.print.evaluation}} callback.}
|
\code{\link[=xgb.cb.print.evaluation]{xgb.cb.print.evaluation()}} callback.}
|
||||||
|
|
||||||
\item{early_stopping_rounds}{If \code{NULL}, the early stopping function is not triggered.
|
\item{early_stopping_rounds}{If \code{NULL}, the early stopping function is not triggered.
|
||||||
If set to an integer \code{k}, training with a validation set will stop if the performance
|
If set to an integer \code{k}, training with a validation set will stop if the performance
|
||||||
doesn't improve for \code{k} rounds.
|
doesn't improve for \code{k} rounds.
|
||||||
Setting this parameter engages the \code{\link{xgb.cb.early.stop}} callback.}
|
Setting this parameter engages the \code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}} callback.}
|
||||||
|
|
||||||
\item{maximize}{If \code{feval} and \code{early_stopping_rounds} are set,
|
\item{maximize}{If \code{feval} and \code{early_stopping_rounds} are set,
|
||||||
then this parameter must be set as well.
|
then this parameter must be set as well.
|
||||||
When it is \code{TRUE}, it means the larger the evaluation score the better.
|
When it is \code{TRUE}, it means the larger the evaluation score the better.
|
||||||
This parameter is passed to the \code{\link{xgb.cb.early.stop}} callback.}
|
This parameter is passed to the \code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}} callback.}
|
||||||
|
|
||||||
\item{callbacks}{a list of callback functions to perform various tasks during boosting.
|
\item{callbacks}{A list of callback functions to perform various tasks during boosting.
|
||||||
See \code{\link{xgb.Callback}}. Some of the callbacks are automatically created depending on the
|
See \code{\link[=xgb.Callback]{xgb.Callback()}}. Some of the callbacks are automatically created depending on the
|
||||||
parameters' values. User can provide either existing or their own callback methods in order
|
parameters' values. User can provide either existing or their own callback methods in order
|
||||||
to customize the training process.}
|
to customize the training process.}
|
||||||
|
|
||||||
\item{...}{other parameters to pass to \code{params}.}
|
\item{...}{Other parameters to pass to \code{params}.}
|
||||||
}
|
}
|
||||||
\value{
|
\value{
|
||||||
An object of class \code{xgb.cv.synchronous} with the following elements:
|
An object of class 'xgb.cv.synchronous' with the following elements:
|
||||||
\itemize{
|
\itemize{
|
||||||
\item \code{call} a function call.
|
\item \code{call}: Function call.
|
||||||
\item \code{params} parameters that were passed to the xgboost library. Note that it does not
|
\item \code{params}: Parameters that were passed to the xgboost library. Note that it does not
|
||||||
capture parameters changed by the \code{\link{xgb.cb.reset.parameters}} callback.
|
capture parameters changed by the \code{\link[=xgb.cb.reset.parameters]{xgb.cb.reset.parameters()}} callback.
|
||||||
\item \code{evaluation_log} evaluation history stored as a \code{data.table} with the
|
\item \code{evaluation_log}: Evaluation history stored as a \code{data.table} with the
|
||||||
first column corresponding to iteration number and the rest corresponding to the
|
first column corresponding to iteration number and the rest corresponding to the
|
||||||
CV-based evaluation means and standard deviations for the training and test CV-sets.
|
CV-based evaluation means and standard deviations for the training and test CV-sets.
|
||||||
It is created by the \code{\link{xgb.cb.evaluation.log}} callback.
|
It is created by the \code{\link[=xgb.cb.evaluation.log]{xgb.cb.evaluation.log()}} callback.
|
||||||
\item \code{niter} number of boosting iterations.
|
\item \code{niter}: Number of boosting iterations.
|
||||||
\item \code{nfeatures} number of features in training data.
|
\item \code{nfeatures}: Number of features in training data.
|
||||||
\item \code{folds} the list of CV folds' indices - either those passed through the \code{folds}
|
\item \code{folds}: The list of CV folds' indices - either those passed through the \code{folds}
|
||||||
parameter or randomly generated.
|
parameter or randomly generated.
|
||||||
\item \code{best_iteration} iteration number with the best evaluation metric value
|
\item \code{best_iteration}: Iteration number with the best evaluation metric value
|
||||||
(only available with early stopping).
|
(only available with early stopping).
|
||||||
}
|
}
|
||||||
|
|
||||||
Plus other potential elements that are the result of callbacks, such as a list \code{cv_predict} with
|
Plus other potential elements that are the result of callbacks, such as a list \code{cv_predict} with
|
||||||
a sub-element \code{pred} when passing \code{prediction = TRUE}, which is added by the \link{xgb.cb.cv.predict}
|
a sub-element \code{pred} when passing \code{prediction = TRUE}, which is added by the \code{\link[=xgb.cb.cv.predict]{xgb.cb.cv.predict()}}
|
||||||
callback (note that one can also pass it manually under \code{callbacks} with different settings,
|
callback (note that one can also pass it manually under \code{callbacks} with different settings,
|
||||||
such as saving also the models created during cross validation); or a list \code{early_stop} which
|
such as saving also the models created during cross validation); or a list \code{early_stop} which
|
||||||
will contain elements such as \code{best_iteration} when using the early stopping callback (\link{xgb.cb.early.stop}).
|
will contain elements such as \code{best_iteration} when using the early stopping callback (\code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}}).
|
||||||
}
|
}
|
||||||
\description{
|
\description{
|
||||||
The cross validation function of xgboost.
|
The cross validation function of xgboost.
|
||||||
@ -179,11 +174,20 @@ All observations are used for both training and validation.
|
|||||||
Adapted from \url{https://en.wikipedia.org/wiki/Cross-validation_\%28statistics\%29}
|
Adapted from \url{https://en.wikipedia.org/wiki/Cross-validation_\%28statistics\%29}
|
||||||
}
|
}
|
||||||
\examples{
|
\examples{
|
||||||
data(agaricus.train, package='xgboost')
|
data(agaricus.train, package = "xgboost")
|
||||||
|
|
||||||
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||||
cv <- xgb.cv(data = dtrain, nrounds = 3, nthread = 2, nfold = 5, metrics = list("rmse","auc"),
|
|
||||||
max_depth = 3, eta = 1, objective = "binary:logistic")
|
cv <- xgb.cv(
|
||||||
|
data = dtrain,
|
||||||
|
nrounds = 3,
|
||||||
|
nthread = 2,
|
||||||
|
nfold = 5,
|
||||||
|
metrics = list("rmse", "auc"),
|
||||||
|
max_depth = 3,
|
||||||
|
eta = 1, objective = "binary:logistic"
|
||||||
|
)
|
||||||
print(cv)
|
print(cv)
|
||||||
print(cv, verbose=TRUE)
|
print(cv, verbose = TRUE)
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|||||||
@ -7,7 +7,7 @@
|
|||||||
xgb.get.DMatrix.data(dmat)
|
xgb.get.DMatrix.data(dmat)
|
||||||
}
|
}
|
||||||
\arguments{
|
\arguments{
|
||||||
\item{dmat}{An \code{xgb.DMatrix} object, as returned by \link{xgb.DMatrix}.}
|
\item{dmat}{An \code{xgb.DMatrix} object, as returned by \code{\link[=xgb.DMatrix]{xgb.DMatrix()}}.}
|
||||||
}
|
}
|
||||||
\value{
|
\value{
|
||||||
The data held in the DMatrix, as a sparse CSR matrix (class \code{dgRMatrix}
|
The data held in the DMatrix, as a sparse CSR matrix (class \code{dgRMatrix}
|
||||||
|
|||||||
@ -7,10 +7,10 @@
|
|||||||
xgb.get.DMatrix.num.non.missing(dmat)
|
xgb.get.DMatrix.num.non.missing(dmat)
|
||||||
}
|
}
|
||||||
\arguments{
|
\arguments{
|
||||||
\item{dmat}{An \code{xgb.DMatrix} object, as returned by \link{xgb.DMatrix}.}
|
\item{dmat}{An \code{xgb.DMatrix} object, as returned by \code{\link[=xgb.DMatrix]{xgb.DMatrix()}}.}
|
||||||
}
|
}
|
||||||
\value{
|
\value{
|
||||||
The number of non-missing entries in the DMatrix
|
The number of non-missing entries in the DMatrix.
|
||||||
}
|
}
|
||||||
\description{
|
\description{
|
||||||
Get Number of Non-Missing Entries in DMatrix
|
Get Number of Non-Missing Entries in DMatrix
|
||||||
|
|||||||
@ -7,15 +7,14 @@
|
|||||||
xgb.get.DMatrix.qcut(dmat, output = c("list", "arrays"))
|
xgb.get.DMatrix.qcut(dmat, output = c("list", "arrays"))
|
||||||
}
|
}
|
||||||
\arguments{
|
\arguments{
|
||||||
\item{dmat}{An \code{xgb.DMatrix} object, as returned by \link{xgb.DMatrix}.}
|
\item{dmat}{An \code{xgb.DMatrix} object, as returned by \code{\link[=xgb.DMatrix]{xgb.DMatrix()}}.}
|
||||||
|
|
||||||
\item{output}{Output format for the quantile cuts. Possible options are:\itemize{
|
\item{output}{Output format for the quantile cuts. Possible options are:
|
||||||
\item \code{"list"} will return the output as a list with one entry per column, where
|
\itemize{
|
||||||
each column will have a numeric vector with the cuts. The list will be named if
|
\item \code{"list"} will return the output as a list with one entry per column, where each column will have a numeric vector with the cuts. The list will be named if \code{dmat} has column names assigned to it.
|
||||||
\code{dmat} has column names assigned to it.
|
|
||||||
\item \code{"arrays"} will return a list with entries \code{indptr} (base-0 indexing) and
|
\item \code{"arrays"} will return a list with entries \code{indptr} (base-0 indexing) and
|
||||||
\code{data}. Here, the cuts for column 'i' are obtained by slicing 'data' from entries
|
\code{data}. Here, the cuts for column 'i' are obtained by slicing 'data' from entries
|
||||||
\code{indptr[i]+1} to \code{indptr[i+1]}.
|
\code{indptr[i]+1} to \code{indptr[i+1]}.
|
||||||
}}
|
}}
|
||||||
}
|
}
|
||||||
\value{
|
\value{
|
||||||
@ -23,7 +22,7 @@ The quantile cuts, in the format specified by parameter \code{output}.
|
|||||||
}
|
}
|
||||||
\description{
|
\description{
|
||||||
Get the quantile cuts (a.k.a. borders) from an \code{xgb.DMatrix}
|
Get the quantile cuts (a.k.a. borders) from an \code{xgb.DMatrix}
|
||||||
that has been quantized for the histogram method (\code{tree_method="hist"}).
|
that has been quantized for the histogram method (\code{tree_method = "hist"}).
|
||||||
|
|
||||||
These cuts are used in order to assign observations to bins - i.e. these are ordered
|
These cuts are used in order to assign observations to bins - i.e. these are ordered
|
||||||
boundaries which are used to determine assignment condition \verb{border_low < x < border_high}.
|
boundaries which are used to determine assignment condition \verb{border_low < x < border_high}.
|
||||||
@ -36,8 +35,8 @@ which will be output in sorted order from lowest to highest.
|
|||||||
Different columns can have different numbers of bins according to their range.
|
Different columns can have different numbers of bins according to their range.
|
||||||
}
|
}
|
||||||
\examples{
|
\examples{
|
||||||
library(xgboost)
|
|
||||||
data(mtcars)
|
data(mtcars)
|
||||||
|
|
||||||
y <- mtcars$mpg
|
y <- mtcars$mpg
|
||||||
x <- as.matrix(mtcars[, -1])
|
x <- as.matrix(mtcars[, -1])
|
||||||
dm <- xgb.DMatrix(x, label = y, nthread = 1)
|
dm <- xgb.DMatrix(x, label = y, nthread = 1)
|
||||||
@ -45,11 +44,7 @@ dm <- xgb.DMatrix(x, label = y, nthread = 1)
|
|||||||
# DMatrix is not quantized right away, but will be once a hist model is generated
|
# DMatrix is not quantized right away, but will be once a hist model is generated
|
||||||
model <- xgb.train(
|
model <- xgb.train(
|
||||||
data = dm,
|
data = dm,
|
||||||
params = list(
|
params = list(tree_method = "hist", max_bin = 8, nthread = 1),
|
||||||
tree_method = "hist",
|
|
||||||
max_bin = 8,
|
|
||||||
nthread = 1
|
|
||||||
),
|
|
||||||
nrounds = 3
|
nrounds = 3
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|||||||
@ -15,7 +15,7 @@ xgb.plot.multi.trees(
|
|||||||
}
|
}
|
||||||
\arguments{
|
\arguments{
|
||||||
\item{model}{Object of class \code{xgb.Booster}. If it contains feature names (they can be set through
|
\item{model}{Object of class \code{xgb.Booster}. If it contains feature names (they can be set through
|
||||||
\link{setinfo}), they will be used in the output from this function.}
|
\code{\link[=setinfo]{setinfo()}}), they will be used in the output from this function.}
|
||||||
|
|
||||||
\item{features_keep}{Number of features to keep in each position of the multi trees,
|
\item{features_keep}{Number of features to keep in each position of the multi trees,
|
||||||
by default 5.}
|
by default 5.}
|
||||||
|
|||||||
@ -17,7 +17,7 @@ xgb.plot.tree(
|
|||||||
}
|
}
|
||||||
\arguments{
|
\arguments{
|
||||||
\item{model}{Object of class \code{xgb.Booster}. If it contains feature names (they can be set through
|
\item{model}{Object of class \code{xgb.Booster}. If it contains feature names (they can be set through
|
||||||
\link{setinfo}), they will be used in the output from this function.}
|
\code{\link[=setinfo]{setinfo()}}), they will be used in the output from this function.}
|
||||||
|
|
||||||
\item{trees}{An integer vector of tree indices that should be used.
|
\item{trees}{An integer vector of tree indices that should be used.
|
||||||
The default (\code{NULL}) uses all trees.
|
The default (\code{NULL}) uses all trees.
|
||||||
|
|||||||
@ -3,36 +3,36 @@
|
|||||||
\name{xgb.slice.DMatrix}
|
\name{xgb.slice.DMatrix}
|
||||||
\alias{xgb.slice.DMatrix}
|
\alias{xgb.slice.DMatrix}
|
||||||
\alias{[.xgb.DMatrix}
|
\alias{[.xgb.DMatrix}
|
||||||
\title{Get a new DMatrix containing the specified rows of
|
\title{Slice DMatrix}
|
||||||
original xgb.DMatrix object}
|
|
||||||
\usage{
|
\usage{
|
||||||
xgb.slice.DMatrix(object, idxset, allow_groups = FALSE)
|
xgb.slice.DMatrix(object, idxset, allow_groups = FALSE)
|
||||||
|
|
||||||
\method{[}{xgb.DMatrix}(object, idxset, colset = NULL)
|
\method{[}{xgb.DMatrix}(object, idxset, colset = NULL)
|
||||||
}
|
}
|
||||||
\arguments{
|
\arguments{
|
||||||
\item{object}{Object of class "xgb.DMatrix".}
|
\item{object}{Object of class \code{xgb.DMatrix}.}
|
||||||
|
|
||||||
\item{idxset}{An integer vector of indices of rows needed (base-1 indexing).}
|
\item{idxset}{An integer vector of indices of rows needed (base-1 indexing).}
|
||||||
|
|
||||||
\item{allow_groups}{Whether to allow slicing an \code{xgb.DMatrix} with \code{group} (or
|
\item{allow_groups}{Whether to allow slicing an \code{xgb.DMatrix} with \code{group} (or
|
||||||
equivalently \code{qid}) field. Note that in such case, the result will not have
|
equivalently \code{qid}) field. Note that in such case, the result will not have
|
||||||
the groups anymore - they need to be set manually through \code{setinfo}.}
|
the groups anymore - they need to be set manually through \code{\link[=setinfo]{setinfo()}}.}
|
||||||
|
|
||||||
\item{colset}{currently not used (columns subsetting is not available)}
|
\item{colset}{Currently not used (columns subsetting is not available).}
|
||||||
}
|
}
|
||||||
\description{
|
\description{
|
||||||
Get a new DMatrix containing the specified rows of
|
Get a new DMatrix containing the specified rows of original xgb.DMatrix object.
|
||||||
original xgb.DMatrix object
|
|
||||||
}
|
}
|
||||||
\examples{
|
\examples{
|
||||||
data(agaricus.train, package='xgboost')
|
data(agaricus.train, package = "xgboost")
|
||||||
|
|
||||||
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
|
||||||
|
|
||||||
dsub <- xgb.slice.DMatrix(dtrain, 1:42)
|
dsub <- xgb.slice.DMatrix(dtrain, 1:42)
|
||||||
labels1 <- getinfo(dsub, 'label')
|
labels1 <- getinfo(dsub, "label")
|
||||||
|
|
||||||
dsub <- dtrain[1:42, ]
|
dsub <- dtrain[1:42, ]
|
||||||
labels2 <- getinfo(dsub, 'label')
|
labels2 <- getinfo(dsub, "label")
|
||||||
all.equal(labels1, labels2)
|
all.equal(labels1, labels2)
|
||||||
|
|
||||||
}
|
}
|
||||||
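A minimal sketch of the `allow_groups` behaviour documented in this file, assuming the `xgb.slice.DMatrix()` signature shown in this diff. The two-group split is a hypothetical grouping invented for illustration; it is not part of the original example.

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))

# Hypothetical grouping for illustration only: split all rows into two query groups.
n <- nrow(dtrain)
setinfo(dtrain, "group", c(1000L, n - 1000L))

# Slicing a DMatrix that carries group information has to be allowed explicitly.
dsub <- xgb.slice.DMatrix(dtrain, 1:42, allow_groups = TRUE)

# The slice no longer carries groups, so they are set again by hand.
setinfo(dsub, "group", 42L)
```

Here the 42 retained rows are treated as a single group; a real ranking dataset would need group sizes recomputed from the rows that were kept.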
|
|||||||
@ -24,106 +24,100 @@ xgb.train(
|
|||||||
}
|
}
|
||||||
\arguments{
|
\arguments{
|
||||||
\item{params}{the list of parameters. The complete list of parameters is
|
\item{params}{the list of parameters. The complete list of parameters is
|
||||||
available in the \href{http://xgboost.readthedocs.io/en/latest/parameter.html}{online documentation}. Below
|
available in the \href{http://xgboost.readthedocs.io/en/latest/parameter.html}{online documentation}.
|
||||||
is a shorter summary:
|
Below is a shorter summary:
|
||||||
\enumerate{
|
|
||||||
\item General Parameters
|
|
||||||
}
|
|
||||||
|
|
||||||
|
\strong{1. General Parameters}
|
||||||
\itemize{
|
\itemize{
|
||||||
\item \code{booster} which booster to use, can be \code{gbtree} or \code{gblinear}. Default: \code{gbtree}.
|
\item \code{booster}: Which booster to use, can be \code{gbtree} or \code{gblinear}. Default: \code{gbtree}.
|
||||||
}
|
|
||||||
\enumerate{
|
|
||||||
\item Booster Parameters
|
|
||||||
}
|
}
|
||||||
|
|
||||||
2.1. Parameters for Tree Booster
|
\strong{2. Booster Parameters}
|
||||||
|
|
||||||
|
\strong{2.1. Parameters for Tree Booster}
|
||||||
\itemize{
|
\itemize{
|
||||||
\item{ \code{eta} control the learning rate: scale the contribution of each tree by a factor of \code{0 < eta < 1}
|
\item \code{eta}: The learning rate: scale the contribution of each tree by a factor of \verb{0 < eta < 1}
|
||||||
when it is added to the current approximation.
|
when it is added to the current approximation.
|
||||||
Used to prevent overfitting by making the boosting process more conservative.
|
Used to prevent overfitting by making the boosting process more conservative.
|
||||||
Lower value for \code{eta} implies larger value for \code{nrounds}: low \code{eta} value means model
|
Lower value for \code{eta} implies larger value for \code{nrounds}: low \code{eta} value means model
|
||||||
more robust to overfitting but slower to compute. Default: 0.3}
|
more robust to overfitting but slower to compute. Default: 0.3.
|
||||||
\item{ \code{gamma} minimum loss reduction required to make a further partition on a leaf node of the tree.
|
\item \code{gamma}: Minimum loss reduction required to make a further partition on a leaf node of the tree.
|
||||||
The larger, the more conservative the algorithm will be.}
|
The larger, the more conservative the algorithm will be.
|
||||||
\item \code{max_depth} maximum depth of a tree. Default: 6
|
\item \code{max_depth}: Maximum depth of a tree. Default: 6.
|
||||||
\item{\code{min_child_weight} minimum sum of instance weight (hessian) needed in a child.
|
\item \code{min_child_weight}: Minimum sum of instance weight (hessian) needed in a child.
|
||||||
If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight,
|
If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight,
|
||||||
then the building process will give up further partitioning.
|
then the building process will give up further partitioning.
|
||||||
In linear regression mode, this simply corresponds to minimum number of instances needed to be in each node.
|
In linear regression mode, this simply corresponds to minimum number of instances needed to be in each node.
|
||||||
The larger, the more conservative the algorithm will be. Default: 1}
|
The larger, the more conservative the algorithm will be. Default: 1.
|
||||||
\item{ \code{subsample} subsample ratio of the training instance.
|
\item \code{subsample}: Subsample ratio of the training instance.
|
||||||
Setting it to 0.5 means that xgboost randomly collected half of the data instances to grow trees
|
Setting it to 0.5 means that xgboost randomly collected half of the data instances to grow trees
|
||||||
and this will prevent overfitting. It makes computation shorter (because less data to analyse).
|
and this will prevent overfitting. It makes computation shorter (because less data to analyse).
|
||||||
It is advised to use this parameter with \code{eta} and increase \code{nrounds}. Default: 1}
|
It is advised to use this parameter with \code{eta} and increase \code{nrounds}. Default: 1.
|
||||||
\item \code{colsample_bytree} subsample ratio of columns when constructing each tree. Default: 1
|
\item \code{colsample_bytree}: Subsample ratio of columns when constructing each tree. Default: 1.
|
||||||
\item \code{lambda} L2 regularization term on weights. Default: 1
|
\item \code{lambda}: L2 regularization term on weights. Default: 1.
|
||||||
\item \code{alpha} L1 regularization term on weights. (there is no L1 reg on bias because it is not important). Default: 0
|
\item \code{alpha}: L1 regularization term on weights. (there is no L1 reg on bias because it is not important). Default: 0.
|
||||||
\item{ \code{num_parallel_tree} Experimental parameter. number of trees to grow per round.
|
\item \code{num_parallel_tree}: Experimental parameter. number of trees to grow per round.
|
||||||
Useful to test Random Forest through XGBoost
|
Useful to test Random Forest through XGBoost.
|
||||||
(set \code{colsample_bytree < 1}, \code{subsample < 1} and \code{nrounds = 1}) accordingly.
|
(set \code{colsample_bytree < 1}, \code{subsample < 1} and \code{nrounds = 1}) accordingly.
|
||||||
Default: 1}
|
Default: 1.
|
||||||
\item{ \code{monotone_constraints} A numerical vector consists of \code{1}, \code{0} and \code{-1} with its length
|
\item \code{monotone_constraints}: A numerical vector consists of \code{1}, \code{0} and \code{-1} with its length
|
||||||
equals to the number of features in the training data.
|
equals to the number of features in the training data.
|
||||||
\code{1} is increasing, \code{-1} is decreasing and \code{0} is no constraint.}
|
\code{1} is increasing, \code{-1} is decreasing and \code{0} is no constraint.
|
||||||
\item{ \code{interaction_constraints} A list of vectors specifying feature indices of permitted interactions.
|
\item \code{interaction_constraints}: A list of vectors specifying feature indices of permitted interactions.
|
||||||
Each item of the list represents one permitted interaction where specified features are allowed to interact with each other.
|
Each item of the list represents one permitted interaction where specified features are allowed to interact with each other.
|
||||||
Feature index values should start from \code{0} (\code{0} references the first column).
|
Feature index values should start from \code{0} (\code{0} references the first column).
|
||||||
Leave argument unspecified for no interaction constraints.}
|
Leave argument unspecified for no interaction constraints.
|
||||||
}
|
}
|
||||||
|
|
||||||
2.2. Parameters for Linear Booster
|
\strong{2.2. Parameters for Linear Booster}
|
||||||
|
|
||||||
\itemize{
|
\itemize{
|
||||||
\item \code{lambda} L2 regularization term on weights. Default: 0
|
\item \code{lambda}: L2 regularization term on weights. Default: 0.
|
||||||
\item \code{lambda_bias} L2 regularization term on bias. Default: 0
|
\item \code{lambda_bias}: L2 regularization term on bias. Default: 0.
|
||||||
\item \code{alpha} L1 regularization term on weights. (there is no L1 reg on bias because it is not important). Default: 0
|
\item \code{alpha}: L1 regularization term on weights. (there is no L1 reg on bias because it is not important). Default: 0.
|
||||||
}
|
|
||||||
\enumerate{
|
|
||||||
\item Task Parameters
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
\strong{3. Task Parameters}
|
||||||
\itemize{
|
\itemize{
|
||||||
\item{ \code{objective} specify the learning task and the corresponding learning objective, users can pass a self-defined function to it.
|
\item \code{objective}: Specifies the learning task and the corresponding learning objective.
|
||||||
The default objective options are below:
|
users can pass a self-defined function to it. The default objective options are below:
|
||||||
\itemize{
|
\itemize{
|
||||||
\item \code{reg:squarederror} Regression with squared loss (Default).
|
\item \code{reg:squarederror}: Regression with squared loss (default).
|
||||||
\item{ \code{reg:squaredlogerror}: regression with squared log loss \eqn{1/2 * (log(pred + 1) - log(label + 1))^2}.
|
\item \code{reg:squaredlogerror}: Regression with squared log loss \eqn{1/2 \cdot (\log(pred + 1) - \log(label + 1))^2}.
|
||||||
All inputs are required to be greater than -1.
|
All inputs are required to be greater than -1.
|
||||||
Also, see metric rmsle for possible issue with this objective.}
|
Also, see metric rmsle for possible issue with this objective.
|
||||||
\item \code{reg:logistic} logistic regression.
|
\item \code{reg:logistic}: Logistic regression.
|
||||||
\item \code{reg:pseudohubererror}: regression with Pseudo Huber loss, a twice differentiable alternative to absolute loss.
|
\item \code{reg:pseudohubererror}: Regression with Pseudo Huber loss, a twice differentiable alternative to absolute loss.
|
||||||
\item \code{binary:logistic} logistic regression for binary classification. Output probability.
|
\item \code{binary:logistic}: Logistic regression for binary classification. Output probability.
|
||||||
\item \code{binary:logitraw} logistic regression for binary classification, output score before logistic transformation.
|
\item \code{binary:logitraw}: Logistic regression for binary classification, output score before logistic transformation.
|
||||||
\item \code{binary:hinge}: hinge loss for binary classification. This makes predictions of 0 or 1, rather than producing probabilities.
|
\item \code{binary:hinge}: Hinge loss for binary classification. This makes predictions of 0 or 1, rather than producing probabilities.
|
||||||
\item{ \code{count:poisson}: Poisson regression for count data, output mean of Poisson distribution.
|
\item \code{count:poisson}: Poisson regression for count data, output mean of Poisson distribution.
|
||||||
\code{max_delta_step} is set to 0.7 by default in poisson regression (used to safeguard optimization).}
|
The parameter \code{max_delta_step} is set to 0.7 by default in poisson regression
|
||||||
\item{ \code{survival:cox}: Cox regression for right censored survival time data (negative values are considered right censored).
|
(used to safeguard optimization).
|
||||||
|
\item \code{survival:cox}: Cox regression for right censored survival time data (negative values are considered right censored).
|
||||||
Note that predictions are returned on the hazard ratio scale (i.e., as HR = exp(marginal_prediction) in the proportional
|
Note that predictions are returned on the hazard ratio scale (i.e., as HR = exp(marginal_prediction) in the proportional
|
||||||
hazard function \code{h(t) = h0(t) * HR)}.}
|
hazard function \eqn{h(t) = h_0(t) \cdot HR}.
|
||||||
\item{ \code{survival:aft}: Accelerated failure time model for censored survival time data. See
|
\item \code{survival:aft}: Accelerated failure time model for censored survival time data. See
|
||||||
\href{https://xgboost.readthedocs.io/en/latest/tutorials/aft_survival_analysis.html}{Survival Analysis with Accelerated Failure Time}
|
\href{https://xgboost.readthedocs.io/en/latest/tutorials/aft_survival_analysis.html}{Survival Analysis with Accelerated Failure Time}
|
||||||
for details.}
|
for details.
|
||||||
\item \code{aft_loss_distribution}: Probability Density Function used by \code{survival:aft} and \code{aft-nloglik} metric.
|
The parameter \code{aft_loss_distribution} specifies the Probability Density Function
|
||||||
\item{ \code{multi:softmax} set xgboost to do multiclass classification using the softmax objective.
|
used by \code{survival:aft} and the \code{aft-nloglik} metric.
|
||||||
Class is represented by a number and should be from 0 to \code{num_class - 1}.}
|
\item \code{multi:softmax}: Set xgboost to do multiclass classification using the softmax objective.
|
||||||
\item{ \code{multi:softprob} same as softmax, but prediction outputs a vector of ndata * nclass elements, which can be
|
Class is represented by a number and should be from 0 to \code{num_class - 1}.
|
||||||
|
\item \code{multi:softprob}: Same as softmax, but prediction outputs a vector of ndata * nclass elements, which can be
|
||||||
further reshaped to ndata, nclass matrix. The result contains predicted probabilities of each data point belonging
|
further reshaped to ndata, nclass matrix. The result contains predicted probabilities of each data point belonging
|
||||||
to each class.}
|
to each class.
|
||||||
\item \code{rank:pairwise} set xgboost to do ranking task by minimizing the pairwise loss.
|
\item \code{rank:pairwise}: Set XGBoost to do ranking task by minimizing the pairwise loss.
|
||||||
\item{ \code{rank:ndcg}: Use LambdaMART to perform list-wise ranking where
|
\item \code{rank:ndcg}: Use LambdaMART to perform list-wise ranking where
|
||||||
\href{https://en.wikipedia.org/wiki/Discounted_cumulative_gain}{Normalized Discounted Cumulative Gain (NDCG)} is maximized.}
|
\href{https://en.wikipedia.org/wiki/Discounted_cumulative_gain}{Normalized Discounted Cumulative Gain (NDCG)} is maximized.
|
||||||
\item{ \code{rank:map}: Use LambdaMART to perform list-wise ranking where
|
\item \code{rank:map}: Use LambdaMART to perform list-wise ranking where
|
||||||
\href{https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Mean_average_precision}{Mean Average Precision (MAP)}
|
\href{https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Mean_average_precision}{Mean Average Precision (MAP)}
|
||||||
is maximized.}
|
is maximized.
|
||||||
\item{ \code{reg:gamma}: gamma regression with log-link.
|
\item \code{reg:gamma}: Gamma regression with log-link. Output is a mean of gamma distribution.
|
||||||
Output is a mean of gamma distribution.
|
|
||||||
It might be useful, e.g., for modeling insurance claims severity, or for any outcome that might be
|
It might be useful, e.g., for modeling insurance claims severity, or for any outcome that might be
|
||||||
\href{https://en.wikipedia.org/wiki/Gamma_distribution#Applications}{gamma-distributed}.}
|
\href{https://en.wikipedia.org/wiki/Gamma_distribution#Applications}{gamma-distributed}.
|
||||||
\item{ \code{reg:tweedie}: Tweedie regression with log-link.
|
\item \code{reg:tweedie}: Tweedie regression with log-link.
|
||||||
It might be useful, e.g., for modeling total loss in insurance, or for any outcome that might be
|
It might be useful, e.g., for modeling total loss in insurance, or for any outcome that might be
|
||||||
\href{https://en.wikipedia.org/wiki/Tweedie_distribution#Applications}{Tweedie-distributed}.}
|
\href{https://en.wikipedia.org/wiki/Tweedie_distribution#Applications}{Tweedie-distributed}.
|
||||||
}
|
}
|
||||||
|
|
||||||
For custom objectives, one should pass a function taking as input the current predictions (as a numeric
|
For custom objectives, one should pass a function taking as input the current predictions (as a numeric
|
||||||
@ -134,91 +128,85 @@ For multi-valued custom objectives, should have shape \verb{[nrows, ntargets]}.
|
|||||||
the Hessian will be clipped, so one might consider using the expected Hessian (Fisher information) if the
|
the Hessian will be clipped, so one might consider using the expected Hessian (Fisher information) if the
|
||||||
objective is non-convex.
|
objective is non-convex.
|
||||||
|
|
||||||
See the tutorials \href{https://xgboost.readthedocs.io/en/stable/tutorials/custom_metric_obj.html}{
|
See the tutorials \href{https://xgboost.readthedocs.io/en/stable/tutorials/custom_metric_obj.html}{Custom Objective and Evaluation Metric}
|
||||||
Custom Objective and Evaluation Metric} and \href{https://xgboost.readthedocs.io/en/stable/tutorials/advanced_custom_obj}{
|
and \href{https://xgboost.readthedocs.io/en/stable/tutorials/advanced_custom_obj}{Advanced Usage of Custom Objectives}
|
||||||
Advanced Usage of Custom Objectives} for more information about custom objectives.
|
for more information about custom objectives.
|
||||||
}
|
\item \code{base_score}: The initial prediction score of all instances, global bias. Default: 0.5.
|
||||||
\item \code{base_score} the initial prediction score of all instances, global bias. Default: 0.5
|
\item \code{eval_metric}: Evaluation metrics for validation data.
|
||||||
\item{ \code{eval_metric} evaluation metrics for validation data.
|
|
||||||
Users can pass a self-defined function to it.
|
Users can pass a self-defined function to it.
|
||||||
Default: metric will be assigned according to objective
|
Default: metric will be assigned according to objective
|
||||||
(rmse for regression, and error for classification, mean average precision for ranking).
|
(rmse for regression, and error for classification, mean average precision for ranking).
|
||||||
List is provided in detail section.}
|
List is provided in detail section.
|
||||||
}}
|
}}
|
||||||
|
|
||||||
\item{data}{training dataset. \code{xgb.train} accepts only an \code{xgb.DMatrix} as the input.
|
\item{data}{Training dataset. \code{xgb.train()} accepts only an \code{xgb.DMatrix} as the input.
|
||||||
\code{xgboost}, in addition, also accepts \code{matrix}, \code{dgCMatrix}, or name of a local data file.}
|
\code{\link[=xgboost]{xgboost()}}, in addition, also accepts \code{matrix}, \code{dgCMatrix}, or name of a local data file.}
|
||||||
|
|
||||||
\item{nrounds}{max number of boosting iterations.}
|
\item{nrounds}{Max number of boosting iterations.}
|
||||||
|
|
||||||
\item{evals}{Named list of \code{xgb.DMatrix} datasets to use for evaluating model performance.
|
\item{evals}{Named list of \code{xgb.DMatrix} datasets to use for evaluating model performance.
|
||||||
Metrics specified in either \code{eval_metric} or \code{feval} will be computed for each
|
Metrics specified in either \code{eval_metric} or \code{feval} will be computed for each
|
||||||
of these datasets during each boosting iteration, and stored in the end as a field named
|
of these datasets during each boosting iteration, and stored in the end as a field named
|
||||||
\code{evaluation_log} in the resulting object. When either \code{verbose>=1} or
|
\code{evaluation_log} in the resulting object. When either \code{verbose>=1} or
|
||||||
\code{\link{xgb.cb.print.evaluation}} callback is engaged, the performance results are continuously
|
\code{\link[=xgb.cb.print.evaluation]{xgb.cb.print.evaluation()}} callback is engaged, the performance results are continuously
|
||||||
printed out during the training.
|
printed out during the training.
|
||||||
E.g., specifying \code{evals=list(validation1=mat1, validation2=mat2)} allows to track
|
E.g., specifying \code{evals=list(validation1=mat1, validation2=mat2)} allows to track
|
||||||
the performance of each round's model on mat1 and mat2.}
|
the performance of each round's model on mat1 and mat2.}
|
||||||
|
|
||||||
\item{obj}{customized objective function. Should take two arguments: the first one will be the
|
\item{obj}{Customized objective function. Should take two arguments: the first one will be the
|
||||||
current predictions (either a numeric vector or matrix depending on the number of targets / classes),
|
current predictions (either a numeric vector or matrix depending on the number of targets / classes),
|
||||||
and the second one will be the \code{data} DMatrix object that is used for training.
|
and the second one will be the \code{data} DMatrix object that is used for training.
|
||||||
|
|
||||||
\if{html}{\out{<div class="sourceCode">}}\preformatted{ It should return a list with two elements `grad` and `hess` (in that order), as either
|
It should return a list with two elements \code{grad} and \code{hess} (in that order), as either
|
||||||
numeric vectors or numeric matrices depending on the number of targets / classes (same
|
numeric vectors or numeric matrices depending on the number of targets / classes (same
|
||||||
dimension as the predictions that are passed as first argument).
|
dimension as the predictions that are passed as first argument).}
|
||||||
}\if{html}{\out{</div>}}}
|
|
||||||
|
|
||||||
\item{feval}{customized evaluation function. Just like \code{obj}, should take two arguments, with
|
\item{feval}{Customized evaluation function. Just like \code{obj}, should take two arguments, with
|
||||||
the first one being the predictions and the second one the \code{data} DMatrix.
|
the first one being the predictions and the second one the \code{data} DMatrix.
|
||||||
|
|
||||||
\if{html}{\out{<div class="sourceCode">}}\preformatted{ Should return a list with two elements `metric` (name that will be displayed for this metric,
|
Should return a list with two elements \code{metric} (name that will be displayed for this metric,
|
||||||
should be a string / character), and `value` (the number that the function calculates, should
|
should be a string / character), and \code{value} (the number that the function calculates, should
|
||||||
be a numeric scalar).
|
be a numeric scalar).
|
||||||
|
|
||||||
Note that even if passing `feval`, objectives also have an associated default metric that
|
Note that even if passing \code{feval}, objectives also have an associated default metric that
|
||||||
will be evaluated in addition to it. In order to disable the built-in metric, one can pass
|
will be evaluated in addition to it. In order to disable the built-in metric, one can pass
|
||||||
parameter `disable_default_eval_metric = TRUE`.
|
parameter \code{disable_default_eval_metric = TRUE}.}
|
||||||
}\if{html}{\out{</div>}}}
|
|
||||||
|
|
||||||
\item{verbose}{If 0, xgboost will stay silent. If 1, it will print information about performance.
|
\item{verbose}{If 0, xgboost will stay silent. If 1, it will print information about performance.
|
||||||
If 2, some additional information will be printed out.
|
If 2, some additional information will be printed out.
|
||||||
Note that setting \code{verbose > 0} automatically engages the
|
Note that setting \code{verbose > 0} automatically engages the
|
||||||
\code{xgb.cb.print.evaluation(period=1)} callback function.}
|
\code{xgb.cb.print.evaluation(period=1)} callback function.}
|
||||||
|
|
||||||
\item{print_every_n}{Print each n-th iteration evaluation messages when \code{verbose>0}.
|
\item{print_every_n}{Print each nth iteration evaluation messages when \code{verbose>0}.
|
||||||
Default is 1 which means all messages are printed. This parameter is passed to the
|
Default is 1 which means all messages are printed. This parameter is passed to the
|
||||||
\code{\link{xgb.cb.print.evaluation}} callback.}
|
\code{\link[=xgb.cb.print.evaluation]{xgb.cb.print.evaluation()}} callback.}
|
||||||
|
|
||||||
\item{early_stopping_rounds}{If \code{NULL}, the early stopping function is not triggered.
|
\item{early_stopping_rounds}{If \code{NULL}, the early stopping function is not triggered.
|
||||||
If set to an integer \code{k}, training with a validation set will stop if the performance
|
If set to an integer \code{k}, training with a validation set will stop if the performance
|
||||||
doesn't improve for \code{k} rounds.
|
doesn't improve for \code{k} rounds. Setting this parameter engages the \code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}} callback.}
|
||||||
Setting this parameter engages the \code{\link{xgb.cb.early.stop}} callback.}
|
|
||||||
|
|
||||||
\item{maximize}{If \code{feval} and \code{early_stopping_rounds} are set,
|
\item{maximize}{If \code{feval} and \code{early_stopping_rounds} are set, then this parameter must be set as well.
|
||||||
then this parameter must be set as well.
|
|
||||||
When it is \code{TRUE}, it means the larger the evaluation score the better.
|
When it is \code{TRUE}, it means the larger the evaluation score the better.
|
||||||
This parameter is passed to the \code{\link{xgb.cb.early.stop}} callback.}
|
This parameter is passed to the \code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}} callback.}
|
||||||
|
|
||||||
\item{save_period}{when it is non-NULL, model is saved to disk after every \code{save_period} rounds,
|
\item{save_period}{When not \code{NULL}, model is saved to disk after every \code{save_period} rounds.
|
||||||
0 means save at the end. The saving is handled by the \code{\link{xgb.cb.save.model}} callback.}
|
0 means save at the end. The saving is handled by the \code{\link[=xgb.cb.save.model]{xgb.cb.save.model()}} callback.}
|
||||||
|
|
||||||
\item{save_name}{the name or path for periodically saved model file.}
|
\item{save_name}{the name or path for periodically saved model file.}
|
||||||
|
|
||||||
\item{xgb_model}{a previously built model to continue the training from.
|
\item{xgb_model}{A previously built model to continue the training from.
|
||||||
Could be either an object of class \code{xgb.Booster}, or its raw data, or the name of a
|
Could be either an object of class \code{xgb.Booster}, or its raw data, or the name of a
|
||||||
file with a previously saved model.}
|
file with a previously saved model.}
|
||||||
|
|
||||||
\item{callbacks}{a list of callback functions to perform various task during boosting.
|
\item{callbacks}{A list of callback functions to perform various task during boosting.
|
||||||
See \code{\link{xgb.Callback}}. Some of the callbacks are automatically created depending on the
|
See \code{\link[=xgb.Callback]{xgb.Callback()}}. Some of the callbacks are automatically created depending on the
|
||||||
parameters' values. User can provide either existing or their own callback methods in order
|
parameters' values. User can provide either existing or their own callback methods in order
|
||||||
to customize the training process.
|
to customize the training process.
|
||||||
|
|
||||||
\if{html}{\out{<div class="sourceCode">}}\preformatted{ Note that some callbacks might try to leave attributes in the resulting model object,
|
Note that some callbacks might try to leave attributes in the resulting model object,
|
||||||
such as an evaluation log (a `data.table` object) - be aware that these objects are kept
|
such as an evaluation log (a \code{data.table} object) - be aware that these objects are kept
|
||||||
as R attributes, and thus do not get saved when using XGBoost's own serializers like
|
as R attributes, and thus do not get saved when using XGBoost's own serializers like
|
||||||
\link{xgb.save} (but are kept when using R serializers like \link{saveRDS}).
|
\code{\link[=xgb.save]{xgb.save()}} (but are kept when using R serializers like \code{\link[=saveRDS]{saveRDS()}}).}
|
||||||
}\if{html}{\out{</div>}}}
|
|
||||||
|
|
||||||
\item{...}{other parameters to pass to \code{params}.}
|
\item{...}{other parameters to pass to \code{params}.}
|
||||||
}
|
}
|
||||||
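The `obj` and `feval` contracts described in the arguments above can be sketched as follows. This is an illustrative example under stated assumptions, not the package's own code: the helper names (`logistic_grad_hess`, `error_rate`, `custom_obj`, `custom_metric`) are hypothetical, and the metric assumes that predictions arrive as raw margins, which is the usual case when a custom objective is supplied.

```r
# Pure helpers on plain vectors, so the math can be checked without a DMatrix.
logistic_grad_hess <- function(margin, label) {
  p <- 1 / (1 + exp(-margin))       # raw margin -> probability
  list(
    grad = p - label,               # first derivative of the log loss w.r.t. the margin
    hess = p * (1 - p)              # second derivative of the log loss w.r.t. the margin
  )
}

error_rate <- function(margin, label) {
  # margin > 0 is equivalent to a predicted probability above 0.5
  mean(as.numeric(margin > 0) != label)
}

# Wrappers with the (predictions, DMatrix) signature described for `obj` and `feval`.
custom_obj <- function(preds, dtrain) {
  logistic_grad_hess(preds, xgboost::getinfo(dtrain, "label"))
}

custom_metric <- function(preds, dtrain) {
  list(
    metric = "custom_error",
    value = error_rate(preds, xgboost::getinfo(dtrain, "label"))
  )
}

# Quick check of the helpers on hand-made values.
str(logistic_grad_hess(c(0, 2), c(1, 0)))
```

With the argument names shown in this diff, these would be passed as `obj = custom_obj` and `feval = custom_metric` to `xgb.train()`, together with an `evals` list so that the metric is actually computed.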
@ -226,19 +214,18 @@ to customize the training process.
|
|||||||
An object of class \code{xgb.Booster}.
|
An object of class \code{xgb.Booster}.
|
||||||
}
|
}
|
||||||
\description{
|
\description{
|
||||||
\code{xgb.train} is an advanced interface for training an xgboost model.
|
\code{xgb.train()} is an advanced interface for training an xgboost model.
|
||||||
The \code{xgboost} function is a simpler wrapper for \code{xgb.train}.
|
The \code{\link[=xgboost]{xgboost()}} function is a simpler wrapper for \code{xgb.train()}.
|
||||||
}
|
}
|
||||||
\details{
|
\details{
|
||||||
These are the training functions for \code{xgboost}.
|
These are the training functions for \code{\link[=xgboost]{xgboost()}}.
|
||||||
|
|
||||||
The \code{xgb.train} interface supports advanced features such as \code{evals},
|
The \code{xgb.train()} interface supports advanced features such as \code{evals},
|
||||||
customized objective and evaluation metric functions, therefore it is more flexible
|
customized objective and evaluation metric functions, therefore it is more flexible
|
||||||
than the \code{xgboost} interface.
|
than the \code{\link[=xgboost]{xgboost()}} interface.
|
||||||
|
|
||||||
Parallelization is automatically enabled if \code{OpenMP} is present.
|
Parallelization is automatically enabled if OpenMP is present.
|
||||||
Number of threads can also be manually specified via the \code{nthread}
|
Number of threads can also be manually specified via the \code{nthread} parameter.
|
||||||
parameter.
|
|
||||||
|
|
||||||
While in other interfaces, the default random seed defaults to zero, in R, if a parameter \code{seed}
|
While in other interfaces, the default random seed defaults to zero, in R, if a parameter \code{seed}
|
||||||
is not manually supplied, it will generate a random seed through R's own random number generator,
|
is not manually supplied, it will generate a random seed through R's own random number generator,
|
||||||
@ -251,49 +238,49 @@ User may set one or several \code{eval_metric} parameters.
|
|||||||
Note that when using a customized metric, only this single metric can be used.
|
Note that when using a customized metric, only this single metric can be used.
|
||||||
The following is the list of built-in metrics for which XGBoost provides optimized implementation:
|
The following is the list of built-in metrics for which XGBoost provides optimized implementation:
|
||||||
\itemize{
|
\itemize{
|
||||||
\item \code{rmse} root mean square error. \url{https://en.wikipedia.org/wiki/Root_mean_square_error}
|
\item \code{rmse}: Root mean square error. \url{https://en.wikipedia.org/wiki/Root_mean_square_error}
|
||||||
\item \code{logloss} negative log-likelihood. \url{https://en.wikipedia.org/wiki/Log-likelihood}
|
\item \code{logloss}: Negative log-likelihood. \url{https://en.wikipedia.org/wiki/Log-likelihood}
|
||||||
\item \code{mlogloss} multiclass logloss. \url{https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html}
|
\item \code{mlogloss}: Multiclass logloss. \url{https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html}
|
||||||
\item \code{error} Binary classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
|
\item \code{error}: Binary classification error rate. It is calculated as \verb{(# wrong cases) / (# all cases)}.
|
||||||
By default, it uses the 0.5 threshold for predicted values to define negative and positive instances.
|
By default, it uses the 0.5 threshold for predicted values to define negative and positive instances.
|
||||||
Different threshold (e.g., 0.) could be specified as "error@0."
|
Different threshold (e.g., 0.) could be specified as \verb{error@0}.
|
||||||
\item \code{merror} Multiclass classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
|
\item \code{merror}: Multiclass classification error rate. It is calculated as \verb{(# wrong cases) / (# all cases)}.
|
||||||
\item \code{mae} Mean absolute error
|
\item \code{mae}: Mean absolute error.
|
||||||
\item \code{mape} Mean absolute percentage error
|
\item \code{mape}: Mean absolute percentage error.
|
||||||
\item{ \code{auc} Area under the curve.
|
\item \code{auc}: Area under the curve.
|
||||||
\url{https://en.wikipedia.org/wiki/Receiver_operating_characteristic#'Area_under_curve} for ranking evaluation.}
|
\url{https://en.wikipedia.org/wiki/Receiver_operating_characteristic#'Area_under_curve} for ranking evaluation.
|
||||||
\item \code{aucpr} Area under the PR curve. \url{https://en.wikipedia.org/wiki/Precision_and_recall} for ranking evaluation.
|
\item \code{aucpr}: Area under the PR curve. \url{https://en.wikipedia.org/wiki/Precision_and_recall} for ranking evaluation.
|
||||||
\item \code{ndcg} Normalized Discounted Cumulative Gain (for ranking task). \url{https://en.wikipedia.org/wiki/NDCG}
|
\item \code{ndcg}: Normalized Discounted Cumulative Gain (for ranking task). \url{https://en.wikipedia.org/wiki/NDCG}
|
||||||
}
|
}
|
||||||
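As a rough illustration of the `error` metric described in the list above, the following base-R sketch computes `(# wrong cases) / (# all cases)` at a chosen threshold. The helper name `error_at` and the toy vectors are hypothetical, and treating predictions strictly above the threshold as positive is an assumption about the convention, not a copy of XGBoost's internal code.

```r
# Hypothetical helper (not part of xgboost): binary classification error
# at a given probability threshold.
error_at <- function(pred, label, threshold = 0.5) {
  # Assumption: predictions strictly above the threshold count as positive.
  predicted_positive <- pred > threshold
  actual_positive <- label == 1
  # Fraction of cases where the predicted class differs from the label.
  mean(predicted_positive != actual_positive)
}

pred  <- c(0.1, 0.4, 0.6, 0.9)
label <- c(0, 1, 1, 1)

err_default <- error_at(pred, label)       # default 0.5 threshold
err_strict  <- error_at(pred, label, 0.7)  # a stricter threshold
```

With these toy values, raising the threshold turns one more true positive into a predicted negative, so the error rate goes up.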
|
|
||||||
The following callbacks are automatically created when certain parameters are set:
|
The following callbacks are automatically created when certain parameters are set:
|
||||||
\itemize{
|
\itemize{
|
||||||
\item \code{xgb.cb.print.evaluation} is turned on when \code{verbose > 0};
|
\item \code{\link[=xgb.cb.print.evaluation]{xgb.cb.print.evaluation()}} is turned on when \code{verbose > 0} and the \code{print_every_n}
|
||||||
and the \code{print_every_n} parameter is passed to it.
|
parameter is passed to it.
|
||||||
\item \code{xgb.cb.evaluation.log} is on when \code{evals} is present.
|
\item \code{\link[=xgb.cb.evaluation.log]{xgb.cb.evaluation.log()}} is on when \code{evals} is present.
|
||||||
\item \code{xgb.cb.early.stop}: when \code{early_stopping_rounds} is set.
|
\item \code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}}: When \code{early_stopping_rounds} is set.
|
||||||
\item \code{xgb.cb.save.model}: when \code{save_period > 0} is set.
|
\item \code{\link[=xgb.cb.save.model]{xgb.cb.save.model()}}: When \code{save_period > 0} is set.
|
||||||
}
|
}
|
||||||
|
|
||||||
Note that objects of type \code{xgb.Booster} as returned by this function behave a bit differently
|
Note that objects of type \code{xgb.Booster} as returned by this function behave a bit differently
|
||||||
from typical R objects (it's an 'altrep' list class), and it makes a separation between
|
from typical R objects (it's an 'altrep' list class), and it makes a separation between
|
||||||
internal booster attributes (restricted to jsonifyable data), accessed through \link{xgb.attr}
|
internal booster attributes (restricted to jsonifyable data), accessed through \code{\link[=xgb.attr]{xgb.attr()}}
|
||||||
and shared between interfaces through serialization functions like \link{xgb.save}; and
|
and shared between interfaces through serialization functions like \code{\link[=xgb.save]{xgb.save()}}; and
|
||||||
R-specific attributes (typically the result from a callback), accessed through \link{attributes}
|
R-specific attributes (typically the result from a callback), accessed through \code{\link[=attributes]{attributes()}}
|
||||||
and \link{attr}, which are otherwise
|
and \code{\link[=attr]{attr()}}, which are otherwise
|
||||||
only used in the R interface, only kept when using R's serializers like \link{saveRDS}, and
|
only used in the R interface, only kept when using R's serializers like \code{\link[=saveRDS]{saveRDS()}}, and
|
||||||
not anyhow used by functions like \link{predict.xgb.Booster}.
|
not anyhow used by functions like \code{predict.xgb.Booster()}.
|
||||||
|
|
||||||
Be aware that one such R attribute that is automatically added is \code{params} - this attribute
|
Be aware that one such R attribute that is automatically added is \code{params} - this attribute
|
||||||
is assigned from the \code{params} argument to this function, and is only meant to serve as a
|
is assigned from the \code{params} argument to this function, and is only meant to serve as a
|
||||||
reference for what went into the booster, but is not used in other methods that take a booster
|
reference for what went into the booster, but is not used in other methods that take a booster
|
||||||
object - so for example, changing the booster's configuration requires calling \verb{xgb.config<-}
|
object - so for example, changing the booster's configuration requires calling \verb{xgb.config<-}
|
||||||
or 'xgb.parameters<-', while simply modifying \verb{attributes(model)$params$<...>} will have no
|
or \verb{xgb.parameters<-}, while simply modifying \verb{attributes(model)$params$<...>} will have no
|
||||||
effect elsewhere.
|
effect elsewhere.
|
||||||
}
|
}
|
||||||
\examples{
|
\examples{
|
||||||
data(agaricus.train, package='xgboost')
|
data(agaricus.train, package = "xgboost")
|
||||||
data(agaricus.test, package='xgboost')
|
data(agaricus.test, package = "xgboost")
|
||||||
|
|
||||||
## Keep the number of threads to 1 for examples
|
## Keep the number of threads to 1 for examples
|
||||||
nthread <- 1
|
nthread <- 1
|
||||||
@ -308,8 +295,13 @@ dtest <- with(
|
|||||||
evals <- list(train = dtrain, eval = dtest)
|
evals <- list(train = dtrain, eval = dtest)
|
||||||
|
|
||||||
## A simple xgb.train example:
|
## A simple xgb.train example:
|
||||||
param <- list(max_depth = 2, eta = 1, nthread = nthread,
|
param <- list(
|
||||||
objective = "binary:logistic", eval_metric = "auc")
|
max_depth = 2,
|
||||||
|
eta = 1,
|
||||||
|
nthread = nthread,
|
||||||
|
objective = "binary:logistic",
|
||||||
|
eval_metric = "auc"
|
||||||
|
)
|
||||||
bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0)
|
bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0)
|
||||||
|
|
||||||
## An xgb.train example where custom objective and evaluation metric are
|
## An xgb.train example where custom objective and evaluation metric are
|
||||||
@ -329,34 +321,65 @@ evalerror <- function(preds, dtrain) {
|
|||||||
|
|
||||||
# These functions could be used by passing them either:
|
# These functions could be used by passing them either:
|
||||||
# as 'objective' and 'eval_metric' parameters in the params list:
|
# as 'objective' and 'eval_metric' parameters in the params list:
|
||||||
param <- list(max_depth = 2, eta = 1, nthread = nthread,
|
param <- list(
|
||||||
objective = logregobj, eval_metric = evalerror)
|
max_depth = 2,
|
||||||
|
eta = 1,
|
||||||
|
nthread = nthread,
|
||||||
|
objective = logregobj,
|
||||||
|
eval_metric = evalerror
|
||||||
|
)
|
||||||
bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0)
|
bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0)
|
||||||
|
|
||||||
# or through the ... arguments:
|
# or through the ... arguments:
|
||||||
param <- list(max_depth = 2, eta = 1, nthread = nthread)
|
param <- list(max_depth = 2, eta = 1, nthread = nthread)
|
||||||
bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0,
|
bst <- xgb.train(
|
||||||
objective = logregobj, eval_metric = evalerror)
|
param,
|
||||||
|
dtrain,
|
||||||
|
nrounds = 2,
|
||||||
|
evals = evals,
|
||||||
|
verbose = 0,
|
||||||
|
objective = logregobj,
|
||||||
|
eval_metric = evalerror
|
||||||
|
)
|
||||||
|
|
||||||
# or as dedicated 'obj' and 'feval' parameters of xgb.train:
|
# or as dedicated 'obj' and 'feval' parameters of xgb.train:
|
||||||
bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals,
|
bst <- xgb.train(
|
||||||
obj = logregobj, feval = evalerror)
|
param, dtrain, nrounds = 2, evals = evals, obj = logregobj, feval = evalerror
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
## An xgb.train example of using variable learning rates at each iteration:
|
## An xgb.train example of using variable learning rates at each iteration:
|
||||||
param <- list(max_depth = 2, eta = 1, nthread = nthread,
|
param <- list(
|
||||||
objective = "binary:logistic", eval_metric = "auc")
|
max_depth = 2,
|
||||||
|
eta = 1,
|
||||||
|
nthread = nthread,
|
||||||
|
objective = "binary:logistic",
|
||||||
|
eval_metric = "auc"
|
||||||
|
)
|
||||||
my_etas <- list(eta = c(0.5, 0.1))
|
my_etas <- list(eta = c(0.5, 0.1))
|
||||||
bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0,
|
|
||||||
callbacks = list(xgb.cb.reset.parameters(my_etas)))
|
bst <- xgb.train(
|
||||||
|
param,
|
||||||
|
dtrain,
|
||||||
|
nrounds = 2,
|
||||||
|
evals = evals,
|
||||||
|
verbose = 0,
|
||||||
|
callbacks = list(xgb.cb.reset.parameters(my_etas))
|
||||||
|
)
|
||||||
|
|
||||||
## Early stopping:
|
## Early stopping:
|
||||||
bst <- xgb.train(param, dtrain, nrounds = 25, evals = evals,
|
bst <- xgb.train(
|
||||||
early_stopping_rounds = 3)
|
param, dtrain, nrounds = 25, evals = evals, early_stopping_rounds = 3
|
||||||
|
)
|
||||||
|
|
||||||
## An 'xgboost' interface example:
|
## An 'xgboost' interface example:
|
||||||
bst <- xgboost(x = agaricus.train$data, y = factor(agaricus.train$label),
|
bst <- xgboost(
|
||||||
params = list(max_depth = 2, eta = 1), nthread = nthread, nrounds = 2)
|
x = agaricus.train$data,
|
||||||
|
y = factor(agaricus.train$label),
|
||||||
|
params = list(max_depth = 2, eta = 1),
|
||||||
|
nthread = nthread,
|
||||||
|
nrounds = 2
|
||||||
|
)
|
||||||
pred <- predict(bst, agaricus.test$data)
|
pred <- predict(bst, agaricus.test$data)
|
||||||
|
|
||||||
}
|
}
|
||||||
@ -365,7 +388,5 @@ Tianqi Chen and Carlos Guestrin, "XGBoost: A Scalable Tree Boosting System",
|
|||||||
22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016, \url{https://arxiv.org/abs/1603.02754}
|
22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016, \url{https://arxiv.org/abs/1603.02754}
|
||||||
}
|
}
|
||||||
\seealso{
|
\seealso{
|
||||||
\code{\link{xgb.Callback}},
|
\code{\link[=xgb.Callback]{xgb.Callback()}}, \code{\link[=predict.xgb.Booster]{predict.xgb.Booster()}}, \code{\link[=xgb.cv]{xgb.cv()}}
|
||||||
\code{\link{predict.xgb.Booster}},
|
|
||||||
\code{\link{xgb.cv}}
|
|
||||||
}
|
}
|
||||||
|
|||||||
@ -14,21 +14,21 @@ xgb.get.config()
|
|||||||
\item{...}{List of parameters to be set, as keyword arguments}
|
\item{...}{List of parameters to be set, as keyword arguments}
|
||||||
}
|
}
|
||||||
\value{
|
\value{
|
||||||
\code{xgb.set.config} returns \code{TRUE} to signal success. \code{xgb.get.config} returns
|
\code{xgb.set.config()} returns \code{TRUE} to signal success. \code{xgb.get.config()} returns
|
||||||
a list containing all global-scope parameters and their values.
|
a list containing all global-scope parameters and their values.
|
||||||
}
|
}
|
||||||
\description{
|
\description{
|
||||||
Global configuration consists of a collection of parameters that can be applied in the global
|
Global configuration consists of a collection of parameters that can be applied in the global
|
||||||
scope. See \url{https://xgboost.readthedocs.io/en/stable/parameter.html} for the full list of
|
scope. See \url{https://xgboost.readthedocs.io/en/stable/parameter.html} for the full list of
|
||||||
parameters supported in the global configuration. Use \code{xgb.set.config} to update the
|
parameters supported in the global configuration. Use \code{xgb.set.config()} to update the
|
||||||
values of one or more global-scope parameters. Use \code{xgb.get.config} to fetch the current
|
values of one or more global-scope parameters. Use \code{xgb.get.config()} to fetch the current
|
||||||
values of all global-scope parameters (listed in
|
values of all global-scope parameters (listed in
|
||||||
\url{https://xgboost.readthedocs.io/en/stable/parameter.html}).
|
\url{https://xgboost.readthedocs.io/en/stable/parameter.html}).
|
||||||
}
|
}
|
||||||
\details{
|
\details{
|
||||||
Note that serialization-related functions might use a globally-configured number of threads,
|
Note that serialization-related functions might use a globally-configured number of threads,
|
||||||
which is managed by the system's OpenMP (OMP) configuration instead. Typically, XGBoost methods
|
which is managed by the system's OpenMP (OMP) configuration instead. Typically, XGBoost methods
|
||||||
accept an \code{nthreads} parameter, but some methods like \code{readRDS} might get executed before such
|
accept an \code{nthreads} parameter, but some methods like \code{\link[=readRDS]{readRDS()}} might get executed before such
|
||||||
parameter can be supplied.
|
parameter can be supplied.
|
||||||
|
|
||||||
The number of OMP threads can in turn be configured for example through an environment variable
|
The number of OMP threads can in turn be configured for example through an environment variable
|
||||||
|
|||||||
@ -21,65 +21,69 @@ xgboost(
|
|||||||
)
|
)
|
||||||
}
|
}
|
||||||
\arguments{
|
\arguments{
|
||||||
\item{x}{The features / covariates. Can be passed as:\itemize{
|
\item{x}{The features / covariates. Can be passed as:
|
||||||
\item A numeric or integer `matrix`.
|
\itemize{
|
||||||
\item A `data.frame`, in which all columns are one of the following types:\itemize{
|
\item A numeric or integer \code{matrix}.
|
||||||
\item `numeric`
|
\item A \code{data.frame}, in which all columns are one of the following types:
|
||||||
\item `integer`
|
\itemize{
|
||||||
\item `logical`
|
\item \code{numeric}
|
||||||
\item `factor`
|
\item \code{integer}
|
||||||
|
\item \code{logical}
|
||||||
|
\item \code{factor}
|
||||||
}
|
}
|
||||||
|
|
||||||
Columns of `factor` type will be assumed to be categorical, while other column types will
|
Columns of \code{factor} type will be assumed to be categorical, while other column types will
|
||||||
be assumed to be numeric.
|
be assumed to be numeric.
|
||||||
\item A sparse matrix from the `Matrix` package, either as `dgCMatrix` or `dgRMatrix` class.
|
\item A sparse matrix from the \code{Matrix} package, either as \code{dgCMatrix} or \code{dgRMatrix} class.
|
||||||
}
|
}
|
||||||
|
|
||||||
Note that categorical features are only supported for `data.frame` inputs, and are automatically
|
Note that categorical features are only supported for \code{data.frame} inputs, and are automatically
|
||||||
determined based on their types. See \link{xgb.train} with \link{xgb.DMatrix} for more flexible
|
determined based on their types. See \code{\link[=xgb.train]{xgb.train()}} with \code{\link[=xgb.DMatrix]{xgb.DMatrix()}} for more flexible
|
||||||
variants that would allow something like categorical features on sparse matrices.}
|
variants that would allow something like categorical features on sparse matrices.}
|
||||||
|
|
||||||
\item{y}{The response variable. Allowed values are:\itemize{
|
\item{y}{The response variable. Allowed values are:
|
||||||
|
\itemize{
|
||||||
\item A numeric or integer vector (for regression tasks).
|
\item A numeric or integer vector (for regression tasks).
|
||||||
\item A factor or character vector (for binary and multi-class classification tasks).
|
\item A factor or character vector (for binary and multi-class classification tasks).
|
||||||
\item A logical (boolean) vector (for binary classification tasks).
|
\item A logical (boolean) vector (for binary classification tasks).
|
||||||
\item A numeric or integer matrix or `data.frame` with numeric/integer columns
|
\item A numeric or integer matrix or \code{data.frame} with numeric/integer columns
|
||||||
(for multi-task regression tasks).
|
(for multi-task regression tasks).
|
||||||
\item A `Surv` object from the `survival` package (for survival tasks).
|
\item A \code{Surv} object from the 'survival' package (for survival tasks).
|
||||||
}
|
}
|
||||||
|
|
||||||
If `objective` is `NULL`, the right task will be determined automatically based on
|
If \code{objective} is \code{NULL}, the right task will be determined automatically based on
|
||||||
the class of `y`.
|
the class of \code{y}.
|
||||||
|
|
||||||
If `objective` is not `NULL`, it must match with the type of `y` - e.g. `factor` types of `y`
|
If \code{objective} is not \code{NULL}, it must match with the type of \code{y} - e.g. \code{factor} types of \code{y}
|
||||||
can only be used with classification objectives and vice-versa.
|
can only be used with classification objectives and vice-versa.
|
||||||
|
|
||||||
For binary classification, the last factor level of `y` will be used as the "positive"
|
For binary classification, the last factor level of \code{y} will be used as the "positive"
|
||||||
class - that is, the numbers from `predict` will reflect the probabilities of belonging to this
|
class - that is, the numbers from \code{predict} will reflect the probabilities of belonging to this
|
||||||
class instead of to the first factor level. If `y` is a `logical` vector, then `TRUE` will be
|
class instead of to the first factor level. If \code{y} is a \code{logical} vector, then \code{TRUE} will be
|
||||||
set as the last level.}
|
set as the last level.}
|
||||||
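The "last factor level is the positive class" rule for `y` can be checked in base R before fitting. This is only a sketch with made-up labels. It shows how factor levels are ordered, and does not call into xgboost itself.

```r
# Toy labels; the level order is set explicitly so that "yes" is last.
y_factor <- factor(c("no", "yes", "yes", "no"), levels = c("no", "yes"))

# The last level is the one treated as the "positive" class.
positive_class <- levels(y_factor)[nlevels(y_factor)]

# Converting a logical vector to a factor in base R orders the levels
# as "FALSE", "TRUE", so TRUE ends up as the last level.
y_logical <- c(TRUE, FALSE, TRUE)
logical_levels <- levels(factor(y_logical))
```

If the class of interest is not the last level, `relevel()` or an explicit `levels =` argument to `factor()` can reorder it before the data is passed on.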
|
|
||||||
\item{objective}{Optimization objective to minimize based on the supplied data, to be passed
|
\item{objective}{Optimization objective to minimize based on the supplied data, to be passed
|
||||||
by name as a string / character (e.g. `reg:absoluteerror`). See the
|
by name as a string / character (e.g. \code{reg:absoluteerror}). See the
|
||||||
\href{https://xgboost.readthedocs.io/en/stable/parameter.html#learning-task-parameters}{
|
\href{https://xgboost.readthedocs.io/en/stable/parameter.html#learning-task-parameters}{Learning Task Parameters}
|
||||||
Learning Task Parameters} page for more detailed information on allowed values.
|
page for more detailed information on allowed values.
|
||||||
|
|
||||||
If `NULL` (the default), will be automatically determined from `y` according to the following
|
If \code{NULL} (the default), will be automatically determined from \code{y} according to the following
|
||||||
logic:\itemize{
|
logic:
|
||||||
\item If `y` is a factor with 2 levels, will use `binary:logistic`.
|
\itemize{
|
||||||
\item If `y` is a factor with more than 2 levels, will use `multi:softprob` (number of classes
|
\item If \code{y} is a factor with 2 levels, will use \code{binary:logistic}.
|
||||||
will be determined automatically, should not be passed under `params`).
|
\item If \code{y} is a factor with more than 2 levels, will use \code{multi:softprob} (number of classes
|
||||||
\item If `y` is a `Surv` object from the `survival` package, will use `survival:aft` (note that
|
will be determined automatically, should not be passed under \code{params}).
|
||||||
|
\item If \code{y} is a \code{Surv} object from the \code{survival} package, will use \code{survival:aft} (note that
|
||||||
the only types supported are left / right / interval censored).
|
the only types supported are left / right / interval censored).
|
||||||
\item Otherwise, will use `reg:squarederror`.
|
\item Otherwise, will use \code{reg:squarederror}.
|
||||||
}
|
}
|
||||||
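The selection logic listed above can be sketched as a small base-R function. The name `guess_objective` is hypothetical and this is not the package's implementation. It assumes that character or logical responses have already been converted to a factor, and the error for factors with fewer than two levels is an added assumption.

```r
# Hypothetical sketch of the documented defaulting rules for `objective`.
guess_objective <- function(y) {
  if (is.factor(y)) {
    if (nlevels(y) == 2L) {
      return("binary:logistic")
    }
    if (nlevels(y) > 2L) {
      return("multi:softprob")
    }
    # Assumption: fewer than two levels is treated as an error here.
    stop("a factor response needs at least 2 levels")
  }
  if (inherits(y, "Surv")) {
    return("survival:aft")
  }
  "reg:squarederror"
}

obj_binary <- guess_objective(factor(c("a", "b", "a")))
obj_multi  <- guess_objective(iris$Species)
obj_reg    <- guess_objective(mtcars$mpg)
```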
|
|
||||||
If `objective` is not `NULL`, it must match with the type of `y` - e.g. `factor` types of `y`
|
If \code{objective} is not \code{NULL}, it must match with the type of \code{y} - e.g. \code{factor} types of \code{y}
|
||||||
can only be used with classification objectives and vice-versa.
|
can only be used with classification objectives and vice-versa.
|
||||||
|
|
||||||
Note that not all possible `objective` values supported by the core XGBoost library are allowed
|
Note that not all possible \code{objective} values supported by the core XGBoost library are allowed
|
||||||
here - for example, objectives which are a variation of another but with a different default
|
here - for example, objectives which are a variation of another but with a different default
|
||||||
prediction type (e.g. `multi:softmax` vs. `multi:softprob`) are not allowed, and neither are
|
prediction type (e.g. \code{multi:softmax} vs. \code{multi:softprob}) are not allowed, and neither are
|
||||||
ranking objectives, nor custom objectives at the moment.}
|
ranking objectives, nor custom objectives at the moment.}
|
||||||
|
|
||||||
\item{nrounds}{Number of boosting iterations / rounds.
|
\item{nrounds}{Number of boosting iterations / rounds.
|
||||||
@ -87,56 +91,54 @@ ranking objectives, nor custom objectives at the moment.}
|
|||||||
Note that the number of default boosting rounds here is not automatically tuned, and different
|
Note that the number of default boosting rounds here is not automatically tuned, and different
|
||||||
problems will have vastly different optimal numbers of boosting rounds.}
|
problems will have vastly different optimal numbers of boosting rounds.}
|
||||||
|
|
||||||
\item{weights}{Sample weights for each row in `x` and `y`. If `NULL` (the default), each row
|
\item{weights}{Sample weights for each row in \code{x} and \code{y}. If \code{NULL} (the default), each row
|
||||||
will have the same weight.
|
will have the same weight.
|
||||||
|
|
||||||
If not `NULL`, should be passed as a numeric vector with length matching to the number of
|
If not \code{NULL}, should be passed as a numeric vector with length matching to the number of rows in \code{x}.}
|
||||||
rows in `x`.}
|
|
||||||
|
|
||||||
\item{verbosity}{Verbosity of printing messages. Valid values of 0 (silent), 1 (warning),
|
\item{verbosity}{Verbosity of printing messages. Valid values of 0 (silent), 1 (warning),
|
||||||
2 (info), and 3 (debug).}
|
2 (info), and 3 (debug).}
|
||||||
|
|
||||||
\item{nthreads}{Number of parallel threads to use. If passing zero, will use all CPU threads.}
|
\item{nthreads}{Number of parallel threads to use. If passing zero, will use all CPU threads.}
|
||||||
|
|
||||||
\item{seed}{Seed to use for random number generation. If passing `NULL`, will draw a random
|
\item{seed}{Seed to use for random number generation. If passing \code{NULL}, will draw a random
|
||||||
number using R's PRNG system to use as seed.}
|
number using R's PRNG system to use as seed.}
|
||||||
|
|
||||||
\item{monotone_constraints}{Optional monotonicity constraints for features.
|
\item{monotone_constraints}{Optional monotonicity constraints for features.
|
||||||
|
|
||||||
Can be passed either as a named list (when `x` has column names), or as a vector. If passed
|
Can be passed either as a named list (when \code{x} has column names), or as a vector. If passed
|
||||||
as a vector and `x` has column names, will try to match the elements by name.
|
as a vector and \code{x} has column names, will try to match the elements by name.
|
||||||
|
|
||||||
A value of `+1` for a given feature makes the model predictions / scores constrained to be
|
A value of \code{+1} for a given feature makes the model predictions / scores constrained to be
|
||||||
a monotonically increasing function of that feature (that is, as the value of the feature
|
a monotonically increasing function of that feature (that is, as the value of the feature
|
||||||
increases, the model prediction cannot decrease), while a value of `-1` makes it a monotonically
|
increases, the model prediction cannot decrease), while a value of \code{-1} makes it a monotonically
|
||||||
decreasing function. A value of zero imposes no constraint.
|
decreasing function. A value of zero imposes no constraint.
|
||||||
|
|
||||||
The input for `monotone_constraints` can be a subset of the columns of `x` if named, in which
|
The input for \code{monotone_constraints} can be a subset of the columns of \code{x} if named, in which
|
||||||
case the columns that are not referred to in `monotone_constraints` will be assumed to have
|
case the columns that are not referred to in \code{monotone_constraints} will be assumed to have
|
||||||
a value of zero (no constraint imposed on the model for those features).
|
a value of zero (no constraint imposed on the model for those features).
|
||||||
|
|
||||||
See the tutorial \href{https://xgboost.readthedocs.io/en/stable/tutorials/monotonic.html}{
|
See the tutorial \href{https://xgboost.readthedocs.io/en/stable/tutorials/monotonic.html}{Monotonic Constraints}
|
||||||
Monotonic Constraints} for a more detailed explanation.}
|
for a more detailed explanation.}
|
||||||
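To make the "subset of columns" behaviour concrete, here is a base-R sketch that expands a named list of constraints into one value per feature, filling unmentioned columns with zero. The helper `expand_monotone` and the column names are hypothetical. The real matching is done inside the package and may differ in detail.

```r
# Hypothetical helper: expand a named subset of monotonicity constraints
# into a full per-feature vector, defaulting to 0 (no constraint).
expand_monotone <- function(constraints, feature_names) {
  unknown <- setdiff(names(constraints), feature_names)
  if (length(unknown) > 0) {
    stop("unknown columns: ", paste(unknown, collapse = ", "))
  }
  full <- stats::setNames(integer(length(feature_names)), feature_names)
  full[names(constraints)] <- as.integer(unlist(constraints))
  full
}

# Only 'wt' (decreasing) and 'hp' (increasing) are constrained;
# 'cyl' is left at zero.
full_constraints <- expand_monotone(
  list(wt = -1, hp = 1),
  feature_names = c("cyl", "hp", "wt")
)
```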
|
|
||||||
\item{interaction_constraints}{Constraints for interaction representing permitted interactions.
|
\item{interaction_constraints}{Constraints for interaction representing permitted interactions.
|
||||||
The constraints must be specified in the form of a list of vectors referencing columns in the
|
The constraints must be specified in the form of a list of vectors referencing columns in the
|
||||||
data, e.g. `list(c(1, 2), c(3, 4, 5))` (with these numbers being column indices, numeration
|
data, e.g. \code{list(c(1, 2), c(3, 4, 5))} (with these numbers being column indices, numeration
|
||||||
starting at 1 - i.e. the first sublist references the first and second columns) or
|
starting at 1 - i.e. the first sublist references the first and second columns) or
|
||||||
`list(c("Sepal.Length", "Sepal.Width"), c("Petal.Length", "Petal.Width"))` (references
|
\code{list(c("Sepal.Length", "Sepal.Width"), c("Petal.Length", "Petal.Width"))} (references
|
||||||
columns by names), where each vector is a group of indices of features that are allowed to
|
columns by names), where each vector is a group of indices of features that are allowed to
|
||||||
interact with each other.
|
interact with each other.
|
||||||
|
|
||||||
See the tutorial
|
See the tutorial \href{https://xgboost.readthedocs.io/en/stable/tutorials/feature_interaction_constraint.html}{Feature Interaction Constraints}
|
||||||
\href{https://xgboost.readthedocs.io/en/stable/tutorials/feature_interaction_constraint.html}{
|
for more information.}
|
||||||
Feature Interaction Constraints} for more information.}
|
|
||||||
|
|
||||||
\item{feature_weights}{Feature weights for column sampling.
|
\item{feature_weights}{Feature weights for column sampling.
|
||||||
|
|
||||||
Can be passed either as a vector with length matching to columns of `x`, or as a named
|
Can be passed either as a vector with length matching to columns of \code{x}, or as a named
|
||||||
list (only if `x` has column names) with names matching to columns of 'x'. If it is a
|
list (only if \code{x} has column names) with names matching to columns of 'x'. If it is a
|
||||||
named vector, will try to match the entries to column names of `x` by name.
|
named vector, will try to match the entries to column names of \code{x} by name.
|
||||||
|
|
||||||
If `NULL` (the default), all columns will have the same weight.}
|
If \code{NULL} (the default), all columns will have the same weight.}
|
||||||
|
|
||||||
\item{base_margin}{Base margin used for boosting from existing model.
|
\item{base_margin}{Base margin used for boosting from existing model.
|
||||||
|
|
||||||
@ -145,53 +147,53 @@ here - for example, one can pass the raw scores from a previous model, or some p
|
|||||||
offset, or similar.
|
offset, or similar.
|
||||||
|
|
||||||
Should be either a numeric vector or numeric matrix (for multi-class and multi-target objectives)
|
Should be either a numeric vector or numeric matrix (for multi-class and multi-target objectives)
|
||||||
with the same number of rows as `x` and number of columns corresponding to number of optimization
|
with the same number of rows as \code{x} and number of columns corresponding to number of optimization
|
||||||
targets, and should be in the untransformed scale (for example, for objective `binary:logistic`,
|
targets, and should be in the untransformed scale (for example, for objective \code{binary:logistic},
|
||||||
it should have log-odds, not probabilities; and for objective `multi:softprob`, should have
|
it should have log-odds, not probabilities; and for objective \code{multi:softprob}, should have
|
||||||
number of columns matching to number of classes in the data).
|
number of columns matching to number of classes in the data).
|
||||||
|
|
||||||
Note that, if it contains more than one column, then columns will not be matched by name to
|
Note that, if it contains more than one column, then columns will not be matched by name to
|
||||||
the corresponding `y` - `base_margin` should have the same column order that the model will use
|
the corresponding \code{y} - \code{base_margin} should have the same column order that the model will use
|
||||||
(for example, for objective `multi:softprob`, columns of `base_margin` will be matched against
|
(for example, for objective \code{multi:softprob}, columns of \code{base_margin} will be matched against
|
||||||
`levels(y)` by their position, regardless of what `colnames(base_margin)` returns).
|
\code{levels(y)} by their position, regardless of what \code{colnames(base_margin)} returns).
|
||||||
|
|
||||||
If `NULL`, will start from zero, but note that for most objectives, an intercept is usually
|
If \code{NULL}, will start from zero, but note that for most objectives, an intercept is usually
|
||||||
added (controllable through parameter `base_score` instead) when `base_margin` is not passed.}
|
added (controllable through parameter \code{base_score} instead) when \code{base_margin} is not passed.}
|
||||||
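For `binary:logistic`, the untransformed scale mentioned above is the log-odds. A minimal base-R sketch of the conversion, using made-up prior probabilities, could look like this:

```r
# Made-up prior probabilities for three rows.
prior_prob <- c(0.2, 0.5, 0.9)

# qlogis() maps probabilities to log-odds, i.e. log(p / (1 - p)),
# which is the scale expected for a binary:logistic base margin.
base_margin <- qlogis(prior_prob)

# plogis() is the inverse and recovers the probabilities.
recovered <- plogis(base_margin)
```

A probability of 0.5 maps to a margin of zero, which is why a zero base margin corresponds to "no prior information" on this scale.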
|
|
||||||
\item{...}{Other training parameters. See the online documentation
|
\item{...}{Other training parameters. See the online documentation
|
||||||
\href{https://xgboost.readthedocs.io/en/stable/parameter.html}{XGBoost Parameters} for
|
\href{https://xgboost.readthedocs.io/en/stable/parameter.html}{XGBoost Parameters} for
|
||||||
details about possible values and what they do.
|
details about possible values and what they do.
|
||||||
|
|
||||||
Note that not all possible values from the core XGBoost library are allowed as `params` for
|
Note that not all possible values from the core XGBoost library are allowed as \code{params} for
|
||||||
'xgboost()' - in particular, values which require an already-fitted booster object (such as
|
'xgboost()' - in particular, values which require an already-fitted booster object (such as
|
||||||
`process_type`) are not accepted here.}
|
\code{process_type}) are not accepted here.}
|
||||||
}
|
}
|
||||||
\value{
|
\value{
|
||||||
A model object, inheriting from both `xgboost` and `xgb.Booster`. Compared to the regular
|
A model object, inheriting from both \code{xgboost} and \code{xgb.Booster}. Compared to the regular
|
||||||
`xgb.Booster` model class produced by \link{xgb.train}, this `xgboost` class will have an
|
\code{xgb.Booster} model class produced by \code{\link[=xgb.train]{xgb.train()}}, this \code{xgboost} class will have an
|
||||||
additional attribute `metadata` containing information which is used for formatting prediction
|
|
||||||
|
additional attribute \code{metadata} containing information which is used for formatting prediction
|
||||||
outputs, such as class names for classification problems.
|
outputs, such as class names for classification problems.
|
||||||
}
|
}
|
||||||
\description{
|
\description{
|
||||||
Fits an XGBoost model (boosted decision tree ensemble) to given x/y data.
|
Fits an XGBoost model (boosted decision tree ensemble) to given x/y data.
|
||||||
|
|
||||||
See the tutorial \href{https://xgboost.readthedocs.io/en/stable/tutorials/model.html}{
|
See the tutorial \href{https://xgboost.readthedocs.io/en/stable/tutorials/model.html}{Introduction to Boosted Trees}
|
||||||
Introduction to Boosted Trees} for a longer explanation of what XGBoost does.
|
for a longer explanation of what XGBoost does.
|
||||||
|
|
||||||
This function is intended to provide a more user-friendly interface for XGBoost that follows
|
This function is intended to provide a more user-friendly interface for XGBoost that follows
|
||||||
R's conventions for model fitting and predictions, but which doesn't expose all of the
|
R's conventions for model fitting and predictions, but which doesn't expose all of the
|
||||||
possible functionalities of the core XGBoost library.
|
possible functionalities of the core XGBoost library.
|
||||||
|
|
||||||
See \link{xgb.train} for a more flexible low-level alternative which is similar across different
|
See \code{\link[=xgb.train]{xgb.train()}} for a more flexible low-level alternative which is similar across different
|
||||||
language bindings of XGBoost and which exposes the full library's functionalities.
|
language bindings of XGBoost and which exposes the full library's functionalities.
|
||||||
}
|
}
|
||||||
\details{
|
\details{
|
||||||
For package authors using `xgboost` as a dependency, it is highly recommended to use
|
For package authors using 'xgboost' as a dependency, it is highly recommended to use
|
||||||
\link{xgb.train} in package code instead of `xgboost()`, since it has a more stable interface
|
\code{\link[=xgb.train]{xgb.train()}} in package code instead of \code{\link[=xgboost]{xgboost()}}, since it has a more stable interface
|
||||||
and performs fewer data conversions and copies along the way.
|
and performs fewer data conversions and copies along the way.
|
||||||
}
|
}
|
||||||
\examples{
|
\examples{
|
||||||
library(xgboost)
|
|
||||||
data(mtcars)
|
data(mtcars)
|
||||||
|
|
||||||
# Fit a small regression model on the mtcars data
|
# Fit a small regression model on the mtcars data
|
||||||
|
|||||||