% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.train.R
\name{xgb.train}
\alias{xgb.train}
\title{eXtreme Gradient Boosting Training}
\usage{
xgb.train(
  params = list(),
  data,
  nrounds,
  evals = list(),
  obj = NULL,
  feval = NULL,
  verbose = 1,
  print_every_n = 1L,
  early_stopping_rounds = NULL,
  maximize = NULL,
  save_period = NULL,
  save_name = "xgboost.model",
  xgb_model = NULL,
  callbacks = list(),
  ...
)
}
\arguments{
\item{params}{The list of parameters. The complete list of parameters is
available in the \href{https://xgboost.readthedocs.io/en/latest/parameter.html}{online documentation}.
Below is a shorter summary:

\strong{1. General Parameters}
\itemize{
\item \code{booster}: Which booster to use; can be \code{gbtree} or \code{gblinear}. Default: \code{gbtree}.
}

\strong{2. Booster Parameters}

\strong{2.1. Parameters for Tree Booster}
\itemize{
\item \code{eta}: The learning rate: scales the contribution of each tree by a factor of \verb{0 < eta < 1}
when it is added to the current approximation.
Used to prevent overfitting by making the boosting process more conservative.
A lower value for \code{eta} implies a larger value for \code{nrounds}: a low \code{eta} makes the model
more robust to overfitting, but slower to compute. Default: 0.3.
\item \code{gamma}: Minimum loss reduction required to make a further partition on a leaf node of the tree.
The larger, the more conservative the algorithm will be. Default: 0.
\item \code{max_depth}: Maximum depth of a tree. Default: 6.
\item \code{min_child_weight}: Minimum sum of instance weight (hessian) needed in a child.
If the tree partition step results in a leaf node with the sum of instance weight less than \code{min_child_weight},
then the building process will give up further partitioning.
In linear regression mode, this simply corresponds to the minimum number of instances needed in each node.
The larger, the more conservative the algorithm will be. Default: 1.
\item \code{subsample}: Subsample ratio of the training instances.
Setting it to 0.5 means that xgboost randomly samples half of the data instances to grow trees,
which helps prevent overfitting. It also makes computation shorter (there is less data to analyse).
It is advised to use this parameter together with \code{eta}, while increasing \code{nrounds}. Default: 1.
\item \code{colsample_bytree}: Subsample ratio of columns when constructing each tree. Default: 1.
\item \code{lambda}: L2 regularization term on weights. Default: 1.
\item \code{alpha}: L1 regularization term on weights (there is no L1 regularization on the bias because it is not important). Default: 0.
\item \code{num_parallel_tree}: Experimental parameter. Number of trees to grow per round.
Useful for testing Random Forest through XGBoost
(set \code{colsample_bytree < 1}, \code{subsample < 1} and \code{nrounds = 1} accordingly).
Default: 1.
\item \code{monotone_constraints}: A numeric vector consisting of \code{1}, \code{0} and \code{-1}, with length
equal to the number of features in the training data.
\code{1} enforces an increasing constraint, \code{-1} a decreasing constraint, and \code{0} no constraint.
\item \code{interaction_constraints}: A list of vectors specifying feature indices of permitted interactions.
Each item of the list represents one permitted interaction where the specified features are allowed to interact with each other.
Feature index values should start from \code{0} (\code{0} references the first column).
Leave the argument unspecified for no interaction constraints.
A short sketch of how both constraint types can be passed follows this list.
}
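
Below is a minimal sketch of how the two constraint parameters above might be
specified, assuming a hypothetical dataset with three features (the values are
purely illustrative):

\preformatted{params <- list(
  # increasing constraint on feature 1, decreasing on feature 2, none on feature 3
  monotone_constraints = c(1, -1, 0),
  # features 1 and 2 (0-based indices 0 and 1) may interact with each other;
  # feature 3 (index 2) may not interact with the other two
  interaction_constraints = list(c(0, 1), c(2))
)
}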

\strong{2.2. Parameters for Linear Booster}
\itemize{
\item \code{lambda}: L2 regularization term on weights. Default: 0.
\item \code{lambda_bias}: L2 regularization term on bias. Default: 0.
\item \code{alpha}: L1 regularization term on weights (there is no L1 regularization on the bias because it is not important). Default: 0.
}

\strong{3. Task Parameters}
\itemize{
\item \code{objective}: Specifies the learning task and the corresponding learning objective.
Users can pass a self-defined function to it. The default objective options are below:
\itemize{
\item \code{reg:squarederror}: Regression with squared loss (default).
\item \code{reg:squaredlogerror}: Regression with squared log loss \eqn{1/2 \cdot (\log(pred + 1) - \log(label + 1))^2}.
All inputs are required to be greater than -1.
Also see the \code{rmsle} metric for a possible issue with this objective.
\item \code{reg:logistic}: Logistic regression.
\item \code{reg:pseudohubererror}: Regression with Pseudo Huber loss, a twice differentiable alternative to absolute loss.
\item \code{binary:logistic}: Logistic regression for binary classification. Outputs probabilities.
\item \code{binary:logitraw}: Logistic regression for binary classification; outputs the score before the logistic transformation.
\item \code{binary:hinge}: Hinge loss for binary classification. This makes predictions of 0 or 1, rather than producing probabilities.
\item \code{count:poisson}: Poisson regression for count data; outputs the mean of the Poisson distribution.
The parameter \code{max_delta_step} is set to 0.7 by default in Poisson regression
(used to safeguard optimization).
\item \code{survival:cox}: Cox regression for right-censored survival time data (negative values are considered right-censored).
Note that predictions are returned on the hazard ratio scale (i.e., as HR = exp(marginal_prediction) in the proportional
hazard function \eqn{h(t) = h_0(t) \cdot HR}).
\item \code{survival:aft}: Accelerated failure time model for censored survival time data. See
\href{https://xgboost.readthedocs.io/en/latest/tutorials/aft_survival_analysis.html}{Survival Analysis with Accelerated Failure Time}
for details.
The parameter \code{aft_loss_distribution} specifies the probability density function
used by \code{survival:aft} and the \code{aft-nloglik} metric.
\item \code{multi:softmax}: Set xgboost to do multiclass classification using the softmax objective.
Classes are represented by numbers and should be from 0 to \code{num_class - 1}.
\item \code{multi:softprob}: Same as softmax, but the prediction outputs a vector of \code{ndata * nclass} elements, which can be
further reshaped into an \code{ndata} by \code{nclass} matrix. The result contains the predicted probabilities of each data point belonging
to each class.
\item \code{rank:pairwise}: Set XGBoost to do a ranking task by minimizing the pairwise loss.
\item \code{rank:ndcg}: Use LambdaMART to perform list-wise ranking where
\href{https://en.wikipedia.org/wiki/Discounted_cumulative_gain}{Normalized Discounted Cumulative Gain (NDCG)} is maximized.
\item \code{rank:map}: Use LambdaMART to perform list-wise ranking where
\href{https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Mean_average_precision}{Mean Average Precision (MAP)}
is maximized.
\item \code{reg:gamma}: Gamma regression with log-link. The output is the mean of the gamma distribution.
It might be useful, e.g., for modeling insurance claims severity, or for any outcome that might be
\href{https://en.wikipedia.org/wiki/Gamma_distribution#Applications}{gamma-distributed}.
\item \code{reg:tweedie}: Tweedie regression with log-link.
It might be useful, e.g., for modeling total loss in insurance, or for any outcome that might be
\href{https://en.wikipedia.org/wiki/Tweedie_distribution#Applications}{Tweedie-distributed}.
}

For custom objectives, one should pass a function taking as input the current predictions (as a numeric
vector or matrix) and the training data (as an \code{xgb.DMatrix} object), and returning a list with elements
\code{grad} and \code{hess}, which should be numeric vectors or matrices with the number of rows matching the number
of rows in the training data (i.e., the same shape as the predictions that are passed as input to the function).
For multi-valued custom objectives, these should have shape \verb{[nrows, ntargets]}. Note that negative values of
the Hessian will be clipped, so one might consider using the expected Hessian (Fisher information) if the
objective is non-convex.

See the tutorials \href{https://xgboost.readthedocs.io/en/stable/tutorials/custom_metric_obj.html}{Custom Objective and Evaluation Metric}
and \href{https://xgboost.readthedocs.io/en/stable/tutorials/advanced_custom_obj}{Advanced Usage of Custom Objectives}
for more information about custom objectives.
\item \code{base_score}: The initial prediction score of all instances, global bias. Default: 0.5.
\item \code{eval_metric}: Evaluation metrics for validation data.
Users can pass a self-defined function to it.
Default: a metric will be assigned according to the objective
(\code{rmse} for regression, \code{error} for classification, and mean average precision for ranking).
The full list is provided in the Details section.
}}

\item{data}{Training dataset. \code{xgb.train()} accepts only an \code{xgb.DMatrix} as the input.
\code{\link[=xgboost]{xgboost()}}, in addition, also accepts \code{matrix}, \code{dgCMatrix}, or the name of a local data file.}

\item{nrounds}{Max number of boosting iterations.}

\item{evals}{Named list of \code{xgb.DMatrix} datasets to use for evaluating model performance.
Metrics specified in either \code{eval_metric} or \code{feval} will be computed for each
of these datasets during each boosting iteration, and stored in the end as a field named
\code{evaluation_log} in the resulting object. When either \code{verbose >= 1} or the
\code{\link[=xgb.cb.print.evaluation]{xgb.cb.print.evaluation()}} callback is engaged, the performance results are continuously
printed out during the training.
E.g., specifying \code{evals = list(validation1 = mat1, validation2 = mat2)} makes it possible to track
the performance of each round's model on \code{mat1} and \code{mat2}.}

\item{obj}{Customized objective function. Should take two arguments: the first one will be the
current predictions (either a numeric vector or matrix depending on the number of targets / classes),
and the second one will be the \code{data} DMatrix object that is used for training.

It should return a list with two elements \code{grad} and \code{hess} (in that order), as either
numeric vectors or numeric matrices depending on the number of targets / classes (same
dimension as the predictions that are passed as the first argument).}

\item{feval}{Customized evaluation function. Just like \code{obj}, should take two arguments, with
the first one being the predictions and the second one the \code{data} DMatrix.

Should return a list with two elements: \code{metric} (the name that will be displayed for this metric,
should be a string / character), and \code{value} (the number that the function calculates, should
be a numeric scalar).

Note that even if passing \code{feval}, objectives also have an associated default metric that
will be evaluated in addition to it. In order to disable the built-in metric, one can pass the
parameter \code{disable_default_eval_metric = TRUE}.}

\item{verbose}{If 0, xgboost will stay silent. If 1, it will print information about performance.
If 2, some additional information will be printed out.
Note that setting \code{verbose > 0} automatically engages the
\code{xgb.cb.print.evaluation(period=1)} callback function.}

\item{print_every_n}{Print evaluation messages every nth iteration when \code{verbose > 0}.
Default is 1, which means all messages are printed. This parameter is passed to the
\code{\link[=xgb.cb.print.evaluation]{xgb.cb.print.evaluation()}} callback.}

\item{early_stopping_rounds}{If \code{NULL}, the early stopping function is not triggered.
If set to an integer \code{k}, training with a validation set will stop if the performance
doesn't improve for \code{k} rounds. Setting this parameter engages the \code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}} callback.}

\item{maximize}{If \code{feval} and \code{early_stopping_rounds} are set, then this parameter must be set as well.
When it is \code{TRUE}, it means the larger the evaluation score, the better.
This parameter is passed to the \code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}} callback.}

\item{save_period}{When not \code{NULL}, the model is saved to disk after every \code{save_period} rounds.
0 means save at the end. The saving is handled by the \code{\link[=xgb.cb.save.model]{xgb.cb.save.model()}} callback.}

\item{save_name}{The name or path for the periodically saved model file.}

\item{xgb_model}{A previously built model to continue the training from.
Could be either an object of class \code{xgb.Booster}, or its raw data, or the name of a
file with a previously saved model.}

\item{callbacks}{A list of callback functions to perform various tasks during boosting.
See \code{\link[=xgb.Callback]{xgb.Callback()}}. Some of the callbacks are automatically created depending on the
parameters' values. Users can provide either existing or their own callback methods in order
to customize the training process.

Note that some callbacks might try to leave attributes in the resulting model object,
such as an evaluation log (a \code{data.table} object); be aware that these objects are kept
as R attributes, and thus do not get saved when using XGBoost's own serializers like
\code{\link[=xgb.save]{xgb.save()}} (but are kept when using R serializers like \code{\link[=saveRDS]{saveRDS()}}).}

\item{...}{Other parameters to pass to \code{params}.}
}
\value{
An object of class \code{xgb.Booster}.
}
\description{
\code{xgb.train()} is an advanced interface for training an xgboost model.
The \code{\link[=xgboost]{xgboost()}} function is a simpler wrapper for \code{xgb.train()}.
}
\details{
These are the training functions for \code{\link[=xgboost]{xgboost()}}.

The \code{xgb.train()} interface supports advanced features such as \code{evals},
customized objective and evaluation metric functions, and is therefore more flexible
than the \code{\link[=xgboost]{xgboost()}} interface.

Parallelization is automatically enabled if OpenMP is present.
The number of threads can also be manually specified via the \code{nthread} parameter.

While in other interfaces the random seed defaults to zero, in R, if the parameter \code{seed}
is not manually supplied, a random seed is generated through R's own random number generator,
whose state is controllable through \code{set.seed()}. If \code{seed} is passed, it overrides
R's RNG.
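
As a minimal reproducibility sketch (assuming a parameter list \code{params} and a
training \code{xgb.DMatrix} \code{dtrain}; the values are illustrative):

\preformatted{set.seed(123)  # seeds R's RNG, which in turn seeds XGBoost
bst1 <- xgb.train(params, dtrain, nrounds = 10)

# alternatively, pass 'seed' through the parameters to bypass R's RNG:
bst2 <- xgb.train(c(params, list(seed = 123)), dtrain, nrounds = 10)
}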

The evaluation metric is chosen automatically by XGBoost (according to the objective)
when the \code{eval_metric} parameter is not provided.
Users may set one or several \code{eval_metric} parameters.
Note that when using a customized metric, only this single metric can be used.
The following is the list of built-in metrics for which XGBoost provides an optimized implementation:
\itemize{
\item \code{rmse}: Root mean square error. \url{https://en.wikipedia.org/wiki/Root_mean_square_error}
\item \code{logloss}: Negative log-likelihood. \url{https://en.wikipedia.org/wiki/Log-likelihood}
\item \code{mlogloss}: Multiclass logloss. \url{https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html}
\item \code{error}: Binary classification error rate. It is calculated as \verb{(# wrong cases) / (# all cases)}.
By default, it uses the 0.5 threshold for predicted values to define negative and positive instances.
A different threshold \code{t} can be specified as \verb{error@t} (see the sketch after this list).
\item \code{merror}: Multiclass classification error rate. It is calculated as \verb{(# wrong cases) / (# all cases)}.
\item \code{mae}: Mean absolute error.
\item \code{mape}: Mean absolute percentage error.
\item \code{auc}: Area under the curve.
\url{https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_curve} for ranking evaluation.
\item \code{aucpr}: Area under the PR curve. \url{https://en.wikipedia.org/wiki/Precision_and_recall} for ranking evaluation.
\item \code{ndcg}: Normalized Discounted Cumulative Gain (for ranking task). \url{https://en.wikipedia.org/wiki/NDCG}
}
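
A minimal sketch of requesting several built-in metrics at once, including a
thresholded error metric, is to repeat \code{eval_metric} in the parameter list
(the objective and threshold are illustrative):

\preformatted{params <- list(
  objective = "binary:logistic",
  eval_metric = "auc",
  eval_metric = "error@0.7"  # classification error at a 0.7 probability threshold
)
}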

The following callbacks are automatically created when certain parameters are set:
\itemize{
\item \code{\link[=xgb.cb.print.evaluation]{xgb.cb.print.evaluation()}} is turned on when \code{verbose > 0} and the \code{print_every_n}
parameter is passed to it.
\item \code{\link[=xgb.cb.evaluation.log]{xgb.cb.evaluation.log()}} is on when \code{evals} is present.
\item \code{\link[=xgb.cb.early.stop]{xgb.cb.early.stop()}}: When \code{early_stopping_rounds} is set.
\item \code{\link[=xgb.cb.save.model]{xgb.cb.save.model()}}: When \code{save_period > 0} is set.
}

Note that objects of type \code{xgb.Booster} as returned by this function behave a bit differently
from typical R objects (they use an 'altrep' list class), and they make a separation between
internal booster attributes (restricted to jsonifyable data), which are accessed through \code{\link[=xgb.attr]{xgb.attr()}}
and shared between interfaces through serialization functions like \code{\link[=xgb.save]{xgb.save()}}; and
R-specific attributes (typically the result of a callback), which are accessed through \code{\link[=attributes]{attributes()}}
and \code{\link[=attr]{attr()}}, are only used in the R interface,
are only kept when using R's serializers like \code{\link[=saveRDS]{saveRDS()}}, and
are not used by functions like \code{predict.xgb.Booster()}.
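
A minimal sketch of the distinction (assuming a booster \code{bst} trained with a
non-empty \code{evals}; attribute and file names are illustrative):

\preformatted{xgb.attr(bst, "my_note") <- "fit on agaricus"  # internal booster attribute
xgb.attr(bst, "my_note")                       # shared across interfaces and kept by xgb.save()
attributes(bst)$evaluation_log                 # R-specific attribute left by a callback
saveRDS(bst, "bst.rds")                        # keeps the R attributes
xgb.save(bst, "bst.ubj")                       # saves only the booster itself
}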

Be aware that one such R attribute that is automatically added is \code{params}; this attribute
is assigned from the \code{params} argument to this function, and is only meant to serve as a
reference for what went into the booster, but is not used by other methods that take a booster
object. So, for example, changing the booster's configuration requires calling \verb{xgb.config<-}
or \verb{xgb.parameters<-}, while simply modifying \verb{attributes(model)$params$<...>} will have no
effect elsewhere.
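
A minimal sketch of the difference (assuming a trained booster \code{bst}; the value
is illustrative):

\preformatted{# changes the booster's actual configuration:
xgb.parameters(bst) <- list(eta = 0.1)

# only edits the R-side reference copy and has no effect on the booster:
attributes(bst)$params$eta <- 0.1
}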
}
\examples{
data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")

## Keep the number of threads to 1 for examples
nthread <- 1
data.table::setDTthreads(nthread)

dtrain <- with(
  agaricus.train, xgb.DMatrix(data, label = label, nthread = nthread)
)
dtest <- with(
  agaricus.test, xgb.DMatrix(data, label = label, nthread = nthread)
)
evals <- list(train = dtrain, eval = dtest)

## A simple xgb.train example:
param <- list(
  max_depth = 2,
  eta = 1,
  nthread = nthread,
  objective = "binary:logistic",
  eval_metric = "auc"
)
bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0)

## An xgb.train example where custom objective and evaluation metric are
## used:
logregobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  preds <- 1 / (1 + exp(-preds))
  grad <- preds - labels
  hess <- preds * (1 - preds)
  return(list(grad = grad, hess = hess))
}
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- as.numeric(sum(labels != (preds > 0))) / length(labels)
  return(list(metric = "error", value = err))
}

# These functions could be used by passing them either:
# as 'objective' and 'eval_metric' parameters in the params list:
param <- list(
  max_depth = 2,
  eta = 1,
  nthread = nthread,
  objective = logregobj,
  eval_metric = evalerror
)
bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0)

# or through the ... arguments:
param <- list(max_depth = 2, eta = 1, nthread = nthread)
bst <- xgb.train(
  param,
  dtrain,
  nrounds = 2,
  evals = evals,
  verbose = 0,
  objective = logregobj,
  eval_metric = evalerror
)

# or as dedicated 'obj' and 'feval' parameters of xgb.train:
bst <- xgb.train(
  param, dtrain, nrounds = 2, evals = evals, obj = logregobj, feval = evalerror
)


## An xgb.train example of using variable learning rates at each iteration:
param <- list(
  max_depth = 2,
  eta = 1,
  nthread = nthread,
  objective = "binary:logistic",
  eval_metric = "auc"
)
my_etas <- list(eta = c(0.5, 0.1))

bst <- xgb.train(
  param,
  dtrain,
  nrounds = 2,
  evals = evals,
  verbose = 0,
  callbacks = list(xgb.cb.reset.parameters(my_etas))
)

## Early stopping:
bst <- xgb.train(
  param, dtrain, nrounds = 25, evals = evals, early_stopping_rounds = 3
)

## An 'xgboost' interface example:
bst <- xgboost(
  x = agaricus.train$data,
  y = factor(agaricus.train$label),
  params = list(max_depth = 2, eta = 1),
  nthread = nthread,
  nrounds = 2
)
pred <- predict(bst, agaricus.test$data)
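
## Continue training from an existing booster (a minimal sketch; reuses the
## 'param', 'dtrain' and 'evals' objects defined above):
bst <- xgb.train(param, dtrain, nrounds = 2, evals = evals, verbose = 0)
bst_continued <- xgb.train(
  param, dtrain, nrounds = 2, evals = evals, verbose = 0, xgb_model = bst
)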
}
\references{
Tianqi Chen and Carlos Guestrin, "XGBoost: A Scalable Tree Boosting System",
22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016, \url{https://arxiv.org/abs/1603.02754}
}
\seealso{
\code{\link[=xgb.Callback]{xgb.Callback()}}, \code{\link[=predict.xgb.Booster]{predict.xgb.Booster()}}, \code{\link[=xgb.cv]{xgb.cv()}}
}