[R] various R code maintenance (#1964)

* [R] xgb.save must work when handle in nil but raw exists

* [R] print.xgb.Booster should still print other info when handle is nil

* [R] rename internal function xgb.Booster to xgb.Booster.handle to make its intent clear

* [R] rename xgb.Booster.check to xgb.Booster.complete and make it visible; more docs

* [R] storing evaluation_log should depend only on watchlist, not on verbose

* [R] reduce the excessive chattiness of unit tests

* [R] only disable some tests in windows when it's not 64-bit

* [R] clean-up xgb.DMatrix

* [R] test xgb.DMatrix loading from libsvm text file

* [R] store feature_names in xgb.Booster, use them from utility functions

* [R] remove non-functional co-occurence computation from xgb.importance

* [R] verbose=0 is enough without a callback

* [R] added forgotten xgb.Booster.complete.Rd; cran check fixes

* [R] update installation instructions
This commit is contained in:
Vadim Khotilovich
2017-01-21 13:22:46 -06:00
committed by Tianqi Chen
parent a073a2c3d4
commit 2b5b96d760
27 changed files with 561 additions and 327 deletions

View File

@@ -23,8 +23,7 @@ xgboost(data = NULL, label = NULL, missing = NA, weight = NULL,
1. General Parameters
\itemize{
\item \code{booster} which booster to use, can be \code{gbtree} or \code{gblinear}. Default: \code{gbtree}
\item \code{silent} 0 means printing running messages, 1 means silent mode. Default: 0
\item \code{booster} which booster to use, can be \code{gbtree} or \code{gblinear}. Default: \code{gbtree}.
}
2. Booster Parameters
@@ -68,16 +67,19 @@ xgboost(data = NULL, label = NULL, missing = NA, weight = NULL,
\item \code{eval_metric} evaluation metrics for validation data. Users can pass a self-defined function to it. Default: metric will be assigned according to objective(rmse for regression, and error for classification, mean average precision for ranking). List is provided in detail section.
}}
\item{data}{input dataset. \code{xgb.train} takes only an \code{xgb.DMatrix} as the input.
\code{xgboost}, in addition, also accepts \code{matrix}, \code{dgCMatrix}, or local data file.}
\item{data}{training dataset. \code{xgb.train} accepts only an \code{xgb.DMatrix} as the input.
\code{xgboost}, in addition, also accepts \code{matrix}, \code{dgCMatrix}, or name of a local data file.}
\item{nrounds}{the max number of iterations}
\item{nrounds}{max number of boosting iterations.}
\item{watchlist}{what information should be printed when \code{verbose=1} or
\code{verbose=2}. Watchlist is used to specify validation set monitoring
during training. For example user can specify
watchlist=list(validation1=mat1, validation2=mat2) to watch
the performance of each round's model on mat1 and mat2}
\item{watchlist}{named list of xgb.DMatrix datasets to use for evaluating model performance.
Metrics specified in either \code{eval_metric} or \code{feval} will be computed for each
of these datasets during each boosting iteration, and stored in the end as a field named
\code{evaluation_log} in the resulting object. When either \code{verbose>=1} or
\code{\link{cb.print.evaluation}} callback is engaged, the performance results are continuously
printed out during the training.
E.g., specifying \code{watchlist=list(validation1=mat1, validation2=mat2)} allows to track
the performance of each round's model on mat1 and mat2.}
\item{obj}{customized objective function. Returns gradient and second order
gradient with given prediction and dtrain.}
@@ -86,10 +88,10 @@ gradient with given prediction and dtrain.}
\code{list(metric='metric-name', value='metric-value')} with given
prediction and dtrain.}
\item{verbose}{If 0, xgboost will stay silent. If 1, xgboost will print
information of performance. If 2, xgboost will print some additional information.
Setting \code{verbose > 0} automatically engages the \code{\link{cb.evaluation.log}} and
\code{\link{cb.print.evaluation}} callback functions.}
\item{verbose}{If 0, xgboost will stay silent. If 1, it will print information about performance.
If 2, some additional information will be printed out.
Note that setting \code{verbose > 0} automatically engages the
\code{cb.print.evaluation(period=1)} callback function.}
\item{print_every_n}{Print each n-th iteration evaluation messages when \code{verbose>0}.
Default is 1 which means all messages are printed. This parameter is passed to the
@@ -151,17 +153,20 @@ An object of class \code{xgb.Booster} with the following elements:
(only available with early stopping).
\item \code{best_score} the best evaluation metric value during early stopping.
(only available with early stopping).
\item \code{feature_names} names of the training dataset features
(only when comun names were defined in training data).
}
}
\description{
\code{xgb.train} is an advanced interface for training an xgboost model. The \code{xgboost} function provides a simpler interface.
\code{xgb.train} is an advanced interface for training an xgboost model.
The \code{xgboost} function is a simpler wrapper for \code{xgb.train}.
}
\details{
These are the training functions for \code{xgboost}.
The \code{xgb.train} interface supports advanced features such as \code{watchlist},
customized objective and evaluation metric functions, therefore it is more flexible
than the \code{\link{xgboost}} interface.
than the \code{xgboost} interface.
Parallelization is automatically enabled if \code{OpenMP} is present.
Number of threads can also be manually specified via \code{nthread} parameter.
@@ -187,7 +192,7 @@ The following callbacks are automatically created when certain parameters are se
\itemize{
\item \code{cb.print.evaluation} is turned on when \code{verbose > 0};
and the \code{print_every_n} parameter is passed to it.
\item \code{cb.evaluation.log} is on when \code{verbose > 0} and \code{watchlist} is present.
\item \code{cb.evaluation.log} is on when \code{watchlist} is present.
\item \code{cb.early.stop}: when \code{early_stopping_rounds} is set.
\item \code{cb.save.model}: when \code{save_period > 0} is set.
}
@@ -198,7 +203,7 @@ data(agaricus.test, package='xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
watchlist <- list(eval = dtest, train = dtrain)
watchlist <- list(train = dtrain, eval = dtest)
## A simple xgb.train example:
param <- list(max_depth = 2, eta = 1, silent = 1, nthread = 2,
@@ -237,17 +242,15 @@ bst <- xgb.train(param, dtrain, nrounds = 2, watchlist,
## An xgb.train example of using variable learning rates at each iteration:
param <- list(max_depth = 2, eta = 1, silent = 1, nthread = 2)
param <- list(max_depth = 2, eta = 1, silent = 1, nthread = 2,
objective = "binary:logistic", eval_metric = "auc")
my_etas <- list(eta = c(0.5, 0.1))
bst <- xgb.train(param, dtrain, nrounds = 2, watchlist,
callbacks = list(cb.reset.parameters(my_etas)))
## Explicit use of the cb.evaluation.log callback allows to run
## xgb.train silently but still store the evaluation results:
bst <- xgb.train(param, dtrain, nrounds = 2, watchlist,
verbose = 0, callbacks = list(cb.evaluation.log()))
print(bst$evaluation_log)
## Early stopping:
bst <- xgb.train(param, dtrain, nrounds = 25, watchlist,
early_stopping_rounds = 3)
## An 'xgboost' interface example:
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,