fixed typos in R package docs (#4345)

* fixed typos in R package docs
* updated verbosity parameter in xgb.train docs

parent 65db8d0626
commit 5e97de6a41
@@ -14,7 +14,7 @@
 #' WARNING: side-effects!!! Be aware that these callback functions access and modify things in
 #' the environment from which they are called from, which is a fairly uncommon thing to do in R.
 #'
-#' To write a custom callback closure, make sure you first understand the main concepts about R envoronments.
+#' To write a custom callback closure, make sure you first understand the main concepts about R environments.
 #' Check either R documentation on \code{\link[base]{environment}} or the
 #' \href{http://adv-r.had.co.nz/Environments.html}{Environments chapter} from the "Advanced R"
 #' book by Hadley Wickham. Further, the best option is to read the code of some of the existing callbacks -
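For context on the pattern this docstring warns about, here is a minimal sketch of a custom callback closure. It follows the structure of the package's built-in callbacks; the `env = parent.frame()` argument and the `iteration` variable read from that environment are assumptions based on how the shipped callbacks (e.g. `cb.print.evaluation`) are written.

```r
# Minimal sketch of a custom callback closure.
cb.say.hello <- function(period = 1) {
  callback <- function(env = parent.frame()) {
    # The callback runs inside the training loop and reads variables from
    # that loop's environment -- the "side-effects" the warning refers to.
    if (env$iteration %% period == 0)
      cat("hello from iteration", env$iteration, "\n")
  }
  attr(callback, 'call') <- match.call()
  attr(callback, 'name') <- 'cb.say.hello'
  callback
}
```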
@@ -154,7 +154,7 @@ cb.evaluation.log <- function() {
 callback
 }

-#' Callback closure for restetting the booster's parameters at each iteration.
+#' Callback closure for resetting the booster's parameters at each iteration.
 #'
 #' @param new_params a list where each element corresponds to a parameter that needs to be reset.
 #' Each element's value must be either a vector of values of length \code{nrounds}
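As a usage sketch: the vector form below mirrors the `my_etas` example that appears later in this diff; the function form is an assumption based on the parameter description ("a function of two parameters" receiving the current iteration and `nrounds`), and `param`, `dtrain`, `watchlist` are taken from the xgb.train examples.

```r
# Vector form: one eta value per boosting round.
my_etas <- list(eta = c(0.5, 0.1))
bst <- xgb.train(param, dtrain, nrounds = 2, watchlist,
                 callbacks = list(cb.reset.parameters(my_etas)))

# Function form: compute a new value from the iteration number and nrounds.
decaying_eta <- list(eta = function(iter, nrounds) 0.5 * 0.9^(iter - 1))
```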
@@ -470,7 +470,7 @@ cb.save.model <- function(save_period = 0, save_name = "xgboost.model") {
 #' to the order of rows in the original dataset. Note that when a custom \code{folds} list is
 #' provided in \code{xgb.cv}, the predictions would only be returned properly when this list is a
 #' non-overlapping list of k sets of indices, as in a standard k-fold CV. The predictions would not be
-#' meaningful when user-profided folds have overlapping indices as in, e.g., random sampling splits.
+#' meaningful when user-provided folds have overlapping indices as in, e.g., random sampling splits.
 #' When some of the indices in the training dataset are not included into user-provided \code{folds},
 #' their prediction value would be \code{NA}.
 #'
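A short sketch of how these out-of-fold predictions are obtained: `prediction = TRUE` engages the `cb.cv.predict` callback described above, and the agaricus data come from the package's own examples.

```r
library(xgboost)
data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

cv <- xgb.cv(params = list(objective = "binary:logistic", max_depth = 2,
                           eta = 1, nthread = 2),
             data = dtrain, nrounds = 3, nfold = 5, prediction = TRUE)
head(cv$pred)  # out-of-fold predictions, in the row order of dtrain
```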
@@ -681,7 +681,7 @@ cb.gblinear.history <- function(sparse=FALSE) {
 #' using the \code{cb.gblinear.history()} callback.
 #' @param class_index zero-based class index to extract the coefficients for only that
 #' specific class in a multinomial multiclass model. When it is NULL, all the
-#' coeffients are returned. Has no effect in non-multiclass models.
+#' coefficients are returned. Has no effect in non-multiclass models.
 #'
 #' @return
 #' For an \code{xgb.train} result, a matrix (either dense or sparse) with the columns
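A sketch of the intended workflow, assuming `dtrain` is a binary-classification DMatrix as in the package examples:

```r
param <- list(booster = "gblinear", objective = "binary:logistic",
              eta = 0.5, nthread = 2)
bst <- xgb.train(param, dtrain, nrounds = 10,
                 callbacks = list(cb.gblinear.history()))

coef_path <- xgb.gblinear.history(bst)  # one row of coefficients per iteration
# For a multiclass gblinear model, class_index selects a single class:
# coef_path0 <- xgb.gblinear.history(bst, class_index = 0)
```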
@@ -81,7 +81,7 @@ xgb.get.handle <- function(object) {
 #' its handle (pointer) to an internal xgboost model would be invalid. The majority of xgboost methods
 #' should still work for such a model object since those methods would be using
 #' \code{xgb.Booster.complete} internally. However, one might find it to be more efficient to call the
-#' \code{xgb.Booster.complete} function explicitely once after loading a model as an R-object.
+#' \code{xgb.Booster.complete} function explicitly once after loading a model as an R-object.
 #' That would prevent further repeated implicit reconstruction of an internal booster model.
 #'
 #' @return
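The scenario this docstring describes, as a sketch (assuming a trained booster `bst`):

```r
# Serialize the booster as a plain R object, then reload it.
fname <- tempfile(fileext = ".rds")
saveRDS(bst, fname)
bst2 <- readRDS(fname)  # the internal handle inside bst2 is now invalid

# Reconstruct the internal booster once, up front, instead of letting every
# subsequent method call do it implicitly.
bst2 <- xgb.Booster.complete(bst2)
```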
@@ -162,7 +162,7 @@ xgb.Booster.complete <- function(object, saveraw = TRUE) {
 #'
 #' With \code{predinteraction = TRUE}, SHAP values of contributions of interaction of each pair of features
 #' are computed. Note that this operation might be rather expensive in terms of compute and memory.
-#' Since it quadratically depends on the number of features, it is recommended to perfom selection
+#' Since it quadratically depends on the number of features, it is recommended to perform selection
 #' of the most important features first. See below about the format of the returned results.
 #'
 #' @return
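A sketch of the call, assuming `bst` and `dtest` from the package examples; the commented dimensions are an assumption based on the return-value description of `predict.xgb.Booster`.

```r
shap_int <- predict(bst, dtest, predinteraction = TRUE)
# Assumed shape: nrow(dtest) x (nfeatures + 1) x (nfeatures + 1),
# where the extra slot holds the bias contribution.
dim(shap_int)
```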
@@ -104,7 +104,7 @@ dim.xgb.DMatrix <- function(x) {
 #' Handling of column names of \code{xgb.DMatrix}
 #'
 #' Only column names are supported for \code{xgb.DMatrix}, thus setting of
-#' row names would have no effect and returnten row names would be NULL.
+#' row names would have no effect and returned row names would be NULL.
 #'
 #' @param x object of class \code{xgb.DMatrix}
 #' @param value a list of two elements: the first one is ignored
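The behavior described above, as a sketch:

```r
# Only the second element (column names) of the assigned list is used.
dimnames(dtrain) <- list(NULL, paste0("f", seq_len(ncol(dtrain))))
colnames(dtrain)  # the new column names
rownames(dtrain)  # NULL -- row names are not supported
```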
@@ -266,10 +266,10 @@ setinfo.xgb.DMatrix <- function(object, name, info, ...) {


 #' Get a new DMatrix containing the specified rows of
-#' orginal xgb.DMatrix object
+#' original xgb.DMatrix object
 #'
 #' Get a new DMatrix containing the specified rows of
-#' orginal xgb.DMatrix object
+#' original xgb.DMatrix object
 #'
 #' @param object Object of class "xgb.DMatrix"
 #' @param idxset a integer vector of indices of rows needed
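Usage sketch (the `[` alias is documented in the slice.xgb.DMatrix.Rd hunk further down):

```r
dsub <- slice(dtrain, 1:42)   # first 42 rows as a new xgb.DMatrix
dsub <- dtrain[1:42, ]        # the same thing via the `[` alias
```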
@@ -39,7 +39,7 @@
 #' }
 #' @param obj customized objective function. Returns gradient and second order
 #' gradient with given prediction and dtrain.
-#' @param feval custimized evaluation function. Returns
+#' @param feval customized evaluation function. Returns
 #' \code{list(metric='metric-name', value='metric-value')} with given
 #' prediction and dtrain.
 #' @param stratified a \code{boolean} indicating whether sampling of folds should be stratified
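The shapes these two parameters expect — a gradient/hessian pair for `obj`, a metric name/value pair for `feval` — look like this; it is essentially the `logregobj`/`evalerror` pair from the xgb.train examples further down in this diff.

```r
logregobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  preds <- 1 / (1 + exp(-preds))   # preds are raw margin scores here
  grad <- preds - labels
  hess <- preds * (1 - preds)
  list(grad = grad, hess = hess)
}

evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- mean(as.numeric(preds > 0) != labels)
  list(metric = "error", value = err)
}
```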
@@ -84,7 +84,7 @@
 #' capture parameters changed by the \code{\link{cb.reset.parameters}} callback.
 #' \item \code{callbacks} callback functions that were either automatically assigned or
 #' explicitly passed.
-#' \item \code{evaluation_log} evaluation history storead as a \code{data.table} with the
+#' \item \code{evaluation_log} evaluation history stored as a \code{data.table} with the
 #' first column corresponding to iteration number and the rest corresponding to the
 #' CV-based evaluation means and standard deviations for the training and test CV-sets.
 #' It is created by the \code{\link{cb.evaluation.log}} callback.
@@ -30,8 +30,8 @@
 #' Setting \code{rel_to_first = TRUE} allows to see the picture from the perspective of
 #' "what is feature's importance contribution relative to the most important feature?"
 #'
-#' The ggplot-backend method also performs 1-D custering of the importance values,
-#' with bar colors coresponding to different clusters that have somewhat similar importance values.
+#' The ggplot-backend method also performs 1-D clustering of the importance values,
+#' with bar colors corresponding to different clusters that have somewhat similar importance values.
 #'
 #' @return
 #' The \code{xgb.plot.importance} function creates a \code{barplot} (when \code{plot=TRUE})
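A sketch of both backends, assuming a trained `bst` as in the examples:

```r
imp <- xgb.importance(model = bst)
# barplot backend; extra arguments such as xlab are passed through
xgb.plot.importance(imp, rel_to_first = TRUE, xlab = "Relative importance")
# ggplot backend, which adds the 1-D cluster coloring described above
xgb.ggplot.importance(imp, rel_to_first = TRUE)
```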
@@ -31,7 +31,7 @@
 #' @param plot_loess whether to plot loess-smoothed curves. The smoothing is only done for features with
 #' more than 5 distinct values.
 #' @param col_loess a color to use for the loess curves.
-#' @param span_loess the \code{span} paramerer in \code{\link[stats]{loess}}'s call.
+#' @param span_loess the \code{span} parameter in \code{\link[stats]{loess}}'s call.
 #' @param which whether to do univariate or bivariate plotting. NOTE: only 1D is implemented so far.
 #' @param plot whether a plot should be drawn. If FALSE, only a lits of matrices is returned.
 #' @param ... other parameters passed to \code{plot}.
@@ -68,7 +68,7 @@
 #' the performance of each round's model on mat1 and mat2.
 #' @param obj customized objective function. Returns gradient and second order
 #' gradient with given prediction and dtrain.
-#' @param feval custimized evaluation function. Returns
+#' @param feval customized evaluation function. Returns
 #' \code{list(metric='metric-name', value='metric-value')} with given
 #' prediction and dtrain.
 #' @param verbose If 0, xgboost will stay silent. If 1, it will print information about performance.
@@ -118,7 +118,7 @@
 #' when the \code{eval_metric} parameter is not provided.
 #' User may set one or several \code{eval_metric} parameters.
 #' Note that when using a customized metric, only this single metric can be used.
-#' The folloiwing is the list of built-in metrics for which Xgboost provides optimized implementation:
+#' The following is the list of built-in metrics for which Xgboost provides optimized implementation:
 #' \itemize{
 #' \item \code{rmse} root mean square error. \url{http://en.wikipedia.org/wiki/Root_mean_square_error}
 #' \item \code{logloss} negative log-likelihood. \url{http://en.wikipedia.org/wiki/Log-likelihood}
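Setting several `eval_metric` entries simply repeats the name in the params list; the same pattern appears in the vignette hunk near the end of this diff.

```r
param <- list(max_depth = 2, eta = 1, nthread = 2,
              objective = "binary:logistic",
              eval_metric = "error", eval_metric = "logloss")
bst <- xgb.train(param, dtrain, nrounds = 2, watchlist)
```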
@@ -147,14 +147,14 @@
 #' \item \code{handle} a handle (pointer) to the xgboost model in memory.
 #' \item \code{raw} a cached memory dump of the xgboost model saved as R's \code{raw} type.
 #' \item \code{niter} number of boosting iterations.
-#' \item \code{evaluation_log} evaluation history storead as a \code{data.table} with the
+#' \item \code{evaluation_log} evaluation history stored as a \code{data.table} with the
 #' first column corresponding to iteration number and the rest corresponding to evaluation
 #' metrics' values. It is created by the \code{\link{cb.evaluation.log}} callback.
 #' \item \code{call} a function call.
 #' \item \code{params} parameters that were passed to the xgboost library. Note that it does not
 #' capture parameters changed by the \code{\link{cb.reset.parameters}} callback.
 #' \item \code{callbacks} callback functions that were either automatically assigned or
-#' explicitely passed.
+#' explicitly passed.
 #' \item \code{best_iteration} iteration number with the best evaluation metric value
 #' (only available with early stopping).
 #' \item \code{best_ntreelimit} the \code{ntreelimit} value corresponding to the best iteration,
@@ -163,7 +163,7 @@
 #' \item \code{best_score} the best evaluation metric value during early stopping.
 #' (only available with early stopping).
 #' \item \code{feature_names} names of the training dataset features
-#' (only when comun names were defined in training data).
+#' (only when column names were defined in training data).
 #' \item \code{nfeatures} number of features in training data.
 #' }
 #'
@@ -186,7 +186,7 @@
 #' watchlist <- list(train = dtrain, eval = dtest)
 #'
 #' ## A simple xgb.train example:
-#' param <- list(max_depth = 2, eta = 1, silent = 1, nthread = 2,
+#' param <- list(max_depth = 2, eta = 1, verbose = 0, nthread = 2,
 #' objective = "binary:logistic", eval_metric = "auc")
 #' bst <- xgb.train(param, dtrain, nrounds = 2, watchlist)
 #'
@@ -207,12 +207,12 @@
 #'
 #' # These functions could be used by passing them either:
 #' # as 'objective' and 'eval_metric' parameters in the params list:
-#' param <- list(max_depth = 2, eta = 1, silent = 1, nthread = 2,
+#' param <- list(max_depth = 2, eta = 1, verbose = 0, nthread = 2,
 #' objective = logregobj, eval_metric = evalerror)
 #' bst <- xgb.train(param, dtrain, nrounds = 2, watchlist)
 #'
 #' # or through the ... arguments:
-#' param <- list(max_depth = 2, eta = 1, silent = 1, nthread = 2)
+#' param <- list(max_depth = 2, eta = 1, verbose = 0, nthread = 2)
 #' bst <- xgb.train(param, dtrain, nrounds = 2, watchlist,
 #' objective = logregobj, eval_metric = evalerror)
 #'
@@ -222,7 +222,7 @@
 #'
 #'
 #' ## An xgb.train example of using variable learning rates at each iteration:
-#' param <- list(max_depth = 2, eta = 1, silent = 1, nthread = 2,
+#' param <- list(max_depth = 2, eta = 1, verbose = 0, nthread = 2,
 #' objective = "binary:logistic", eval_metric = "auc")
 #' my_etas <- list(eta = c(0.5, 0.1))
 #' bst <- xgb.train(param, dtrain, nrounds = 2, watchlist,
@@ -18,7 +18,7 @@ the boosting is completed.
 WARNING: side-effects!!! Be aware that these callback functions access and modify things in
 the environment from which they are called from, which is a fairly uncommon thing to do in R.

-To write a custom callback closure, make sure you first understand the main concepts about R envoronments.
+To write a custom callback closure, make sure you first understand the main concepts about R environments.
 Check either R documentation on \code{\link[base]{environment}} or the
 \href{http://adv-r.had.co.nz/Environments.html}{Environments chapter} from the "Advanced R"
 book by Hadley Wickham. Further, the best option is to read the code of some of the existing callbacks -
@@ -15,7 +15,7 @@ depending on the number of prediction outputs per data row. The order of predict
 to the order of rows in the original dataset. Note that when a custom \code{folds} list is
 provided in \code{xgb.cv}, the predictions would only be returned properly when this list is a
 non-overlapping list of k sets of indices, as in a standard k-fold CV. The predictions would not be
-meaningful when user-profided folds have overlapping indices as in, e.g., random sampling splits.
+meaningful when user-provided folds have overlapping indices as in, e.g., random sampling splits.
 When some of the indices in the training dataset are not included into user-provided \code{folds},
 their prediction value would be \code{NA}.
 }
@@ -2,7 +2,7 @@
 % Please edit documentation in R/callbacks.R
 \name{cb.reset.parameters}
 \alias{cb.reset.parameters}
-\title{Callback closure for restetting the booster's parameters at each iteration.}
+\title{Callback closure for resetting the booster's parameters at each iteration.}
 \usage{
 cb.reset.parameters(new_params)
 }
@@ -15,7 +15,7 @@ which returns a new parameter value by using the current iteration number
 and the total number of boosting rounds.}
 }
 \description{
-Callback closure for restetting the booster's parameters at each iteration.
+Callback closure for resetting the booster's parameters at each iteration.
 }
 \details{
 This is a "pre-iteration" callback function used to reset booster's parameters
@@ -17,7 +17,7 @@ and the second one is column names}
 }
 \description{
 Only column names are supported for \code{xgb.DMatrix}, thus setting of
-row names would have no effect and returnten row names would be NULL.
+row names would have no effect and returned row names would be NULL.
 }
 \details{
 Generic \code{dimnames} methods are used by \code{colnames}.
@@ -91,7 +91,7 @@ in \url{http://blog.datadive.net/interpreting-random-forests/}.

 With \code{predinteraction = TRUE}, SHAP values of contributions of interaction of each pair of features
 are computed. Note that this operation might be rather expensive in terms of compute and memory.
-Since it quadratically depends on the number of features, it is recommended to perfom selection
+Since it quadratically depends on the number of features, it is recommended to perform selection
 of the most important features first. See below about the format of the returned results.
 }
 \examples{
@@ -5,7 +5,7 @@
 \alias{slice.xgb.DMatrix}
 \alias{[.xgb.DMatrix}
 \title{Get a new DMatrix containing the specified rows of
-orginal xgb.DMatrix object}
+original xgb.DMatrix object}
 \usage{
 slice(object, ...)

@@ -24,7 +24,7 @@ slice(object, ...)
 }
 \description{
 Get a new DMatrix containing the specified rows of
-orginal xgb.DMatrix object
+original xgb.DMatrix object
 }
 \examples{
 data(agaricus.train, package='xgboost')
@@ -28,7 +28,7 @@ E.g., when an \code{xgb.Booster} model is saved as an R object and then is loade
 its handle (pointer) to an internal xgboost model would be invalid. The majority of xgboost methods
 should still work for such a model object since those methods would be using
 \code{xgb.Booster.complete} internally. However, one might find it to be more efficient to call the
-\code{xgb.Booster.complete} function explicitely once after loading a model as an R-object.
+\code{xgb.Booster.complete} function explicitly once after loading a model as an R-object.
 That would prevent further repeated implicit reconstruction of an internal booster model.
 }
 \examples{
@@ -16,7 +16,7 @@ xgb.cv(params = list(), data, nrounds, nfold, label = NULL,
 \itemize{
 \item \code{objective} objective function, common ones are
 \itemize{
-\item \code{reg:squarederror} Regression with squared loss.
+\item \code{reg:squarederror} Regression with squared loss
 \item \code{binary:logistic} logistic regression for classification
 }
 \item \code{eta} step size of each boosting step
@@ -59,7 +59,7 @@ from each CV model. This parameter engages the \code{\link{cb.cv.predict}} callb
 \item{obj}{customized objective function. Returns gradient and second order
 gradient with given prediction and dtrain.}

-\item{feval}{custimized evaluation function. Returns
+\item{feval}{customized evaluation function. Returns
 \code{list(metric='metric-name', value='metric-value')} with given
 prediction and dtrain.}

@@ -101,7 +101,7 @@ An object of class \code{xgb.cv.synchronous} with the following elements:
 capture parameters changed by the \code{\link{cb.reset.parameters}} callback.
 \item \code{callbacks} callback functions that were either automatically assigned or
 explicitly passed.
-\item \code{evaluation_log} evaluation history storead as a \code{data.table} with the
+\item \code{evaluation_log} evaluation history stored as a \code{data.table} with the
 first column corresponding to iteration number and the rest corresponding to the
 CV-based evaluation means and standard deviations for the training and test CV-sets.
 It is created by the \code{\link{cb.evaluation.log}} callback.
@@ -12,7 +12,7 @@ using the \code{cb.gblinear.history()} callback.}

 \item{class_index}{zero-based class index to extract the coefficients for only that
 specific class in a multinomial multiclass model. When it is NULL, all the
-coeffients are returned. Has no effect in non-multiclass models.}
+coefficients are returned. Has no effect in non-multiclass models.}
 }
 \value{
 For an \code{xgb.train} result, a matrix (either dense or sparse) with the columns
@@ -59,8 +59,8 @@ For linear models, \code{rel_to_first = FALSE} would show actual values of the c
 Setting \code{rel_to_first = TRUE} allows to see the picture from the perspective of
 "what is feature's importance contribution relative to the most important feature?"

-The ggplot-backend method also performs 1-D custering of the importance values,
-with bar colors coresponding to different clusters that have somewhat similar importance values.
+The ggplot-backend method also performs 1-D clustering of the importance values,
+with bar colors corresponding to different clusters that have somewhat similar importance values.
 }
 \examples{
 data(agaricus.train)
@@ -63,7 +63,7 @@ more than 5 distinct values.}

 \item{col_loess}{a color to use for the loess curves.}

-\item{span_loess}{the \code{span} paramerer in \code{\link[stats]{loess}}'s call.}
+\item{span_loess}{the \code{span} parameter in \code{\link[stats]{loess}}'s call.}

 \item{which}{whether to do univariate or bivariate plotting. NOTE: only 1D is implemented so far.}

@@ -41,6 +41,7 @@ xgboost(data = NULL, label = NULL, missing = NA, weight = NULL,
 \item \code{colsample_bytree} subsample ratio of columns when constructing each tree. Default: 1
 \item \code{num_parallel_tree} Experimental parameter. number of trees to grow per round. Useful to test Random Forest through Xgboost (set \code{colsample_bytree < 1}, \code{subsample < 1} and \code{round = 1}) accordingly. Default: 1
 \item \code{monotone_constraints} A numerical vector consists of \code{1}, \code{0} and \code{-1} with its length equals to the number of features in the training data. \code{1} is increasing, \code{-1} is decreasing and \code{0} is no constraint.
+\item \code{interaction_constraints} A list of vectors specifying feature indices of permitted interactions. Each item of the list represents one permitted interaction where specified features are allowed to interact with each other. Feature index values should start from \code{0} (\code{0} references the first column). Leave argument unspecified for no interaction constraints.
 }

 2.2. Parameter for Linear Booster
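A sketch of the newly documented parameter; the feature indices are zero-based as described above, and the grouping chosen here is purely illustrative.

```r
param <- list(max_depth = 2, eta = 1, nthread = 2,
              objective = "binary:logistic",
              # features 0 and 1 may interact; features 2, 3 and 4 may
              # interact; no interactions across the two groups
              interaction_constraints = list(c(0, 1), c(2, 3, 4)))
```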
@@ -86,7 +87,7 @@ the performance of each round's model on mat1 and mat2.}
 \item{obj}{customized objective function. Returns gradient and second order
 gradient with given prediction and dtrain.}

-\item{feval}{custimized evaluation function. Returns
+\item{feval}{customized evaluation function. Returns
 \code{list(metric='metric-name', value='metric-value')} with given
 prediction and dtrain.}

@@ -140,14 +141,14 @@ An object of class \code{xgb.Booster} with the following elements:
 \item \code{handle} a handle (pointer) to the xgboost model in memory.
 \item \code{raw} a cached memory dump of the xgboost model saved as R's \code{raw} type.
 \item \code{niter} number of boosting iterations.
-\item \code{evaluation_log} evaluation history storead as a \code{data.table} with the
+\item \code{evaluation_log} evaluation history stored as a \code{data.table} with the
 first column corresponding to iteration number and the rest corresponding to evaluation
 metrics' values. It is created by the \code{\link{cb.evaluation.log}} callback.
 \item \code{call} a function call.
 \item \code{params} parameters that were passed to the xgboost library. Note that it does not
 capture parameters changed by the \code{\link{cb.reset.parameters}} callback.
 \item \code{callbacks} callback functions that were either automatically assigned or
-explicitely passed.
+explicitly passed.
 \item \code{best_iteration} iteration number with the best evaluation metric value
 (only available with early stopping).
 \item \code{best_ntreelimit} the \code{ntreelimit} value corresponding to the best iteration,
@@ -156,7 +157,7 @@ An object of class \code{xgb.Booster} with the following elements:
 \item \code{best_score} the best evaluation metric value during early stopping.
 (only available with early stopping).
 \item \code{feature_names} names of the training dataset features
-(only when comun names were defined in training data).
+(only when column names were defined in training data).
 \item \code{nfeatures} number of features in training data.
 }
 }
@@ -178,7 +179,7 @@ The evaluation metric is chosen automatically by Xgboost (according to the objec
 when the \code{eval_metric} parameter is not provided.
 User may set one or several \code{eval_metric} parameters.
 Note that when using a customized metric, only this single metric can be used.
-The folloiwing is the list of built-in metrics for which Xgboost provides optimized implementation:
+The following is the list of built-in metrics for which Xgboost provides optimized implementation:
 \itemize{
 \item \code{rmse} root mean square error. \url{http://en.wikipedia.org/wiki/Root_mean_square_error}
 \item \code{logloss} negative log-likelihood. \url{http://en.wikipedia.org/wiki/Log-likelihood}
@@ -210,7 +211,7 @@ dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
 watchlist <- list(train = dtrain, eval = dtest)

 ## A simple xgb.train example:
-param <- list(max_depth = 2, eta = 1, verbosity = 0, nthread = 2,
+param <- list(max_depth = 2, eta = 1, verbose = 0, nthread = 2,
 objective = "binary:logistic", eval_metric = "auc")
 bst <- xgb.train(param, dtrain, nrounds = 2, watchlist)

@@ -231,12 +232,12 @@ evalerror <- function(preds, dtrain) {

 # These functions could be used by passing them either:
 # as 'objective' and 'eval_metric' parameters in the params list:
-param <- list(max_depth = 2, eta = 1, verbosity = 0, nthread = 2,
+param <- list(max_depth = 2, eta = 1, verbose = 0, nthread = 2,
 objective = logregobj, eval_metric = evalerror)
 bst <- xgb.train(param, dtrain, nrounds = 2, watchlist)

 # or through the ... arguments:
-param <- list(max_depth = 2, eta = 1, verbosity = 0, nthread = 2)
+param <- list(max_depth = 2, eta = 1, verbose = 0, nthread = 2)
 bst <- xgb.train(param, dtrain, nrounds = 2, watchlist,
 objective = logregobj, eval_metric = evalerror)

@@ -246,7 +247,7 @@ bst <- xgb.train(param, dtrain, nrounds = 2, watchlist,


 ## An xgb.train example of using variable learning rates at each iteration:
-param <- list(max_depth = 2, eta = 1, verbosity = 0, nthread = 2,
+param <- list(max_depth = 2, eta = 1, verbose = 0, nthread = 2,
 objective = "binary:logistic", eval_metric = "auc")
 my_etas <- list(eta = c(0.5, 0.1))
 bst <- xgb.train(param, dtrain, nrounds = 2, watchlist,
@@ -138,7 +138,7 @@ levels(df[,Treatment])

 Next step, we will transform the categorical data to dummy variables.
 Several encoding methods exist, e.g., [one-hot encoding](http://en.wikipedia.org/wiki/One-hot) is a common approach.
-We will use the [dummy contrast coding](http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm#dummy) which is popular because it producess "full rank" encoding (also see [this blog post by Max Kuhn](http://appliedpredictivemodeling.com/blog/2013/10/23/the-basics-of-encoding-categorical-data-for-predictive-models)).
+We will use the [dummy contrast coding](http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm#dummy) which is popular because it produces "full rank" encoding (also see [this blog post by Max Kuhn](http://appliedpredictivemodeling.com/blog/2013/10/23/the-basics-of-encoding-categorical-data-for-predictive-models)).

 The purpose is to transform each value of each *categorical* feature into a *binary* feature `{0, 1}`.

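In the vignette this step is realized with `Matrix::sparse.model.matrix`; a sketch, assuming the vignette's `df` data with outcome column `Improved`:

```r
library(Matrix)
# Dummy-contrast-code every categorical column; the -1 drops the intercept so
# each level of each factor becomes its own 0/1 column.
sparse_matrix <- sparse.model.matrix(Improved ~ . - 1, data = df)
head(sparse_matrix)
```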
@@ -268,7 +268,7 @@ c2 <- chisq.test(df$Age, output_vector)
 print(c2)
 ```

-Pearson correlation between Age and illness disapearing is **`r round(c2$statistic, 2 )`**.
+Pearson correlation between Age and illness disappearing is **`r round(c2$statistic, 2 )`**.

 ```{r, warning=FALSE, message=FALSE}
 c2 <- chisq.test(df$AgeDiscret, output_vector)
@@ -313,7 +313,7 @@ Until now, all the learnings we have performed were based on boosting trees. **X
 bst <- xgb.train(data=dtrain, booster = "gblinear", max_depth=2, nthread = 2, nrounds=2, watchlist=watchlist, eval_metric = "error", eval_metric = "logloss", objective = "binary:logistic")
 ```

-In this specific case, *linear boosting* gets sligtly better performance metrics than decision trees based algorithm.
+In this specific case, *linear boosting* gets slightly better performance metrics than decision trees based algorithm.

 In simple cases, it will happen because there is nothing better than a linear algorithm to catch a linear link. However, decision trees are much better to catch a non linear link between predictors and outcome. Because there is no silver bullet, we advise you to check both algorithms with your own datasets to have an idea of what to use.
