% Generated by roxygen2 (4.1.1): do not edit by hand
% Please edit documentation in R/xgb.cv.R
\name{xgb.cv}
\alias{xgb.cv}
\title{Cross Validation}
\usage{
xgb.cv(params = list(), data, nrounds, nfold, label = NULL, missing = NA,
  prediction = FALSE, showsd = TRUE, metrics = list(), obj = NULL,
  feval = NULL, stratified = TRUE, folds = NULL, verbose = T,
  print.every.n = 1L, early.stop.round = NULL, maximize = NULL, ...)
}
\arguments{
\item{params}{the list of parameters. Commonly used ones are:
\itemize{
  \item \code{objective} objective function, common ones are
  \itemize{
    \item \code{reg:linear} linear regression
    \item \code{binary:logistic} logistic regression for classification
  }
  \item \code{eta} step size of each boosting step
  \item \code{max.depth} maximum depth of the tree
  \item \code{nthread} number of threads used in training; if not set, all threads are used
}

  See \link{xgb.train} for further details.
  See also demo/ for a walkthrough example in R.}

\item{data}{takes an \code{xgb.DMatrix} or \code{Matrix} as the input.}

\item{nrounds}{the max number of iterations}

\item{nfold}{the original dataset is randomly partitioned into \code{nfold} equal-size subsamples.}

\item{label}{optional field; used when \code{data} is a \code{Matrix}}

\item{missing}{only used when the input is a dense matrix: a float value that represents a missing value. Some data sets use 0 or another extreme value to represent missing values.}

\item{prediction}{a logical value indicating whether to return the prediction vector.}

\item{showsd}{\code{boolean}, whether to show the standard deviation of the cross validation results}

\item{metrics}{list of evaluation metrics to be used in cross validation. When it is not specified, the evaluation metric is chosen according to the objective function. Possible options are:
\itemize{
  \item \code{error} binary classification error rate
  \item \code{rmse} root mean square error
  \item \code{logloss} negative log-likelihood
  \item \code{auc} area under the curve
  \item \code{merror} exact matching error, used to evaluate multi-class classification
}}

\item{obj}{customized objective function. Returns the gradient and second order gradient for a given prediction and \code{dtrain}.}

\item{feval}{customized evaluation function. Returns \code{list(metric='metric-name', value='metric-value')} for a given prediction and \code{dtrain}.}

\item{stratified}{\code{boolean}, whether sampling of folds should be stratified by the values of the labels in \code{data}}

\item{folds}{\code{list} of pre-defined CV folds (each element must be a vector of that fold's indices). If folds are supplied, the \code{nfold} and \code{stratified} parameters are ignored.}

\item{verbose}{\code{boolean}, print the statistics during the process}

\item{print.every.n}{Print every N progress messages when \code{verbose > 0}. Default is 1, which means all messages are printed.}

\item{early.stop.round}{If \code{NULL}, early stopping is not triggered. If set to an integer \code{k}, training with a validation set will stop if the performance keeps getting worse for \code{k} consecutive rounds.}

\item{maximize}{If \code{feval} and \code{early.stop.round} are set, then \code{maximize} must be set as well.
\code{maximize = TRUE} means the larger the evaluation score the better.}

\item{...}{other parameters to pass to \code{params}.}
}
\value{
If \code{prediction = TRUE}, a list with the following elements is returned:
\itemize{
  \item \code{dt} a \code{data.table} with the mean and standard deviation of each statistic for the training set and the test set
  \item \code{pred} an array or matrix (for multiclass classification) with the predictions for each CV fold, from the model trained on the data in all other folds.
}

If \code{prediction = FALSE}, just a \code{data.table} with the mean and standard deviation of each statistic for the training set and the test set is returned.
}
\description{
The cross validation function of xgboost
}
\details{
The original sample is randomly partitioned into \code{nfold} equal-size subsamples.
Of the \code{nfold} subsamples, a single subsample is retained as the validation data
for testing the model, and the remaining \code{nfold - 1} subsamples are used as
training data. The cross-validation process is then repeated \code{nfold} times, with
each of the \code{nfold} subsamples used exactly once as the validation data.
All observations are used for both training and validation.

Adapted from \url{http://en.wikipedia.org/wiki/Cross-validation_\%28statistics\%29#k-fold_cross-validation}
}
\examples{
data(agaricus.train, package='xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
history <- xgb.cv(data = dtrain, nrounds = 3, nthread = 2, nfold = 5,
                  metrics = list("rmse","auc"), max.depth = 3, eta = 1,
                  objective = "binary:logistic")
print(history)
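
## --- Additional sketch (illustrative, not part of the original documentation) ---
## Reuses dtrain from above and exercises the customized objective (obj) and
## evaluation (feval) arguments together with early.stop.round and maximize,
## as described in the argument list. The helper names logregobj and evalerror
## are made up for this example.
logregobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  preds <- 1 / (1 + exp(-preds))       # preds are raw margins when a custom objective is used
  grad <- preds - labels               # first order gradient of the logistic loss
  hess <- preds * (1 - preds)          # second order gradient
  list(grad = grad, hess = hess)
}
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- mean(as.numeric(preds > 0) != labels)   # margin > 0 corresponds to probability > 0.5
  list(metric = "error", value = err)
}
history2 <- xgb.cv(params = list(max.depth = 3, eta = 1, nthread = 2),
                   data = dtrain, nrounds = 10, nfold = 5,
                   obj = logregobj, feval = evalerror,
                   early.stop.round = 3, maximize = FALSE)
print(history2)
}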