Merge pull request #199 from pommedeterresautee/master
Cross validation documentation improvement
commit 8025b338a8
@@ -25,12 +25,12 @@
 #' \item \code{nthread} number of threads used in training; if not set, all threads are used
 #' }
 #'
-#' See \url{https://github.com/tqchen/xgboost/wiki/Parameters} for
-#' further details. See also demo/ for walkthrough example in R.
-#' @param data takes an \code{xgb.DMatrix} as the input.
+#' See \link{xgb.train} for further details.
+#' See also demo/ for walkthrough example in R.
+#' @param data takes an \code{xgb.DMatrix} or \code{Matrix} as the input.
 #' @param nrounds the max number of iterations
-#' @param nfold number of folds used
-#' @param label option field, when data is Matrix
+#' @param nfold the original dataset is randomly partitioned into \code{nfold} equal-size subsamples.
+#' @param label optional field, used when data is a \code{Matrix}
 #' @param missing only used when the input is a dense matrix; pick a float
 #' value that represents the missing value. Sometimes a dataset uses 0 or another extreme value to represent missing values.
 #' @param prediction A logical value indicating whether to return the prediction vector.
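The practical upshot of the widened \code{data} contract above is that \code{xgb.cv} can now be fed a plain dense matrix together with \code{label} and, where a sentinel value stands in for absent entries, \code{missing}. A minimal sketch of such a call; the toy data, the -999 sentinel, and the parameter values are illustrative and not taken from this commit:

```r
# Hedged sketch: cross-validating straight from a dense matrix, using the
# `label` and `missing` arguments documented above. All values illustrative.
library(xgboost)

x <- matrix(rnorm(400), nrow = 100)   # 100 rows, 4 numeric features
y <- as.numeric(rowSums(x) > 0)       # toy binary label
x[sample(length(x), 20)] <- -999      # pretend -999 encodes "missing"

cv <- xgb.cv(data = x, label = y, missing = -999,
             nfold = 5, nrounds = 3,
             objective = "binary:logistic")
```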
@@ -56,18 +56,21 @@
 #' @return A \code{data.table} with the mean and standard deviation of each statistic for the training set and test set.
 #'
 #' @details
-#' This is the cross validation function for xgboost
-#'
-#' Parallelization is automatically enabled if OpenMP is present.
-#' Number of threads can also be manually specified via "nthread" parameter.
+#' The original sample is randomly partitioned into \code{nfold} equal-size subsamples.
 #'
-#' This function only accepts an \code{xgb.DMatrix} object as the input.
+#' Of the \code{nfold} subsamples, a single subsample is retained as the validation data for testing the model, and the remaining \code{nfold - 1} subsamples are used as training data.
+#'
+#' The cross-validation process is then repeated \code{nrounds} times, with each of the \code{nfold} subsamples used exactly once as the validation data.
+#'
+#' All observations are used for both training and validation.
+#'
+#' Adapted from \url{http://en.wikipedia.org/wiki/Cross-validation_\%28statistics\%29#k-fold_cross-validation}
 #'
 #' @examples
 #' data(agaricus.train, package='xgboost')
 #' dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
 #' history <- xgb.cv(data = dtrain, nround = 3, nthread = 2, nfold = 5, metrics = list("rmse","auc"),
-#'                   "max.depth"=3, "eta"=1, "objective"="binary:logistic")
+#'                   max.depth = 3, eta = 1, objective = "binary:logistic")
 #' print(history)
 #' @export
 #'
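The k-fold procedure spelled out in the new \code{@details} text can be written out in a few lines of plain R. The sketch below only illustrates the partitioning logic described above; it is not xgboost's internal implementation:

```r
# Schematic k-fold split, mirroring the @details text: each observation
# falls in exactly one validation fold, so every row is validated once
# and used for training k - 1 times. Illustration only.
k <- 5
n <- 150
folds <- sample(rep(seq_len(k), length.out = n))  # random equal-size partition

for (i in seq_len(k)) {
  val_idx   <- which(folds == i)   # the one held-out subsample
  train_idx <- which(folds != i)   # the remaining k - 1 subsamples
  # fit on train_idx, evaluate on val_idx
}
```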
@@ -21,16 +21,16 @@ xgb.cv(params = list(), data, nrounds, nfold, label = NULL,
 \item \code{nthread} number of threads used in training; if not set, all threads are used
 }

-See \url{https://github.com/tqchen/xgboost/wiki/Parameters} for
-further details. See also demo/ for walkthrough example in R.}
+See \link{xgb.train} for further details.
+See also demo/ for walkthrough example in R.}

-\item{data}{takes an \code{xgb.DMatrix} as the input.}
+\item{data}{takes an \code{xgb.DMatrix} or \code{Matrix} as the input.}

 \item{nrounds}{the max number of iterations}

-\item{nfold}{number of folds used}
+\item{nfold}{the original dataset is randomly partitioned into \code{nfold} equal-size subsamples.}

-\item{label}{option field, when data is Matrix}
+\item{label}{optional field, used when data is a \code{Matrix}}

 \item{missing}{only used when the input is a dense matrix; pick a float
 value that represents the missing value. Sometimes a dataset uses 0 or another extreme value to represent missing values.}
@@ -68,18 +68,21 @@ A \code{data.table} with the mean and standard deviation of each statistic for the training set and test set.
 The cross validation function of xgboost
 }
 \details{
-This is the cross validation function for xgboost
+The original sample is randomly partitioned into \code{nfold} equal-size subsamples.

-Parallelization is automatically enabled if OpenMP is present.
-Number of threads can also be manually specified via "nthread" parameter.
+Of the \code{nfold} subsamples, a single subsample is retained as the validation data for testing the model, and the remaining \code{nfold - 1} subsamples are used as training data.

-This function only accepts an \code{xgb.DMatrix} object as the input.
+The cross-validation process is then repeated \code{nrounds} times, with each of the \code{nfold} subsamples used exactly once as the validation data.
+
+All observations are used for both training and validation.
+
+Adapted from \url{http://en.wikipedia.org/wiki/Cross-validation_\%28statistics\%29#k-fold_cross-validation}
 }
 \examples{
 data(agaricus.train, package='xgboost')
 dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
 history <- xgb.cv(data = dtrain, nround = 3, nthread = 2, nfold = 5, metrics = list("rmse","auc"),
-                  "max.depth"=3, "eta"=1, "objective"="binary:logistic")
+                  max.depth = 3, eta = 1, objective = "binary:logistic")
 print(history)
 }
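Because \code{xgb.cv} returns a \code{data.table} of per-round statistics, the \code{history} object from the example above can be queried directly, for instance to locate the strongest round. The column naming pattern below (\code{<set>.<metric>.mean} / \code{.std}) is an assumption inferred from the \code{\value} text, not something this commit states:

```r
# Hedged sketch: picking the best round from the cross-validation history.
# Column names follow an assumed <set>.<metric>.mean / .std pattern.
library(xgboost)
data(agaricus.train, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
history <- xgb.cv(data = dtrain, nround = 3, nthread = 2, nfold = 5,
                  metrics = list("rmse", "auc"),
                  max.depth = 3, eta = 1, objective = "binary:logistic")
best <- which.min(history$test.rmse.mean)  # round with lowest held-out RMSE
```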
README.md (12 lines changed)
@@ -1,5 +1,5 @@
-xgboost: eXtreme Gradient Boosting
-======
+XGBoost: eXtreme Gradient Boosting
+==================================
 An optimized general purpose gradient boosting library. The library is parallelized, and also provides an optimized distributed version.
 It implements machine learning algorithms under the gradient boosting framework, including generalized linear models and gradient boosted regression trees (GBDT). XGBoost can also be distributed and scale to even larger data.
@@ -23,7 +23,7 @@ Learning about the model: [Introduction to Boosted Trees](http://homes.cs.washin
 * The model presented is used by xgboost for boosted trees

 What's New
-=====
+==========
 * [Distributed XGBoost now runs on YARN](multi-node/hadoop)!
 * [xgboost user group](https://groups.google.com/forum/#!forum/xgboost-user/) for tracking changes, sharing your experience on xgboost
 * [Distributed XGBoost](multi-node) is now available!!
@@ -37,7 +37,7 @@ What's New
 * Thanks to Tong He, the new [R package](R-package) is available

 Features
-======
+========
 * Sparse feature format:
   - Sparse feature format allows easy handling of missing values, and improves computation efficiency.
 * Push the limit on single machine:
@@ -74,7 +74,7 @@ Build
 Then run ```bash build.sh``` normally.

 Version
-======
+=======
 * This version is xgboost-0.3; the code has been refactored from 0.2x to be cleaner and more flexible
 * This version of xgboost is not compatible with 0.2x, due to a huge amount of changes in code structure
   - This means the model and buffer file of a previous version can not be loaded in xgboost-0.3
@@ -82,6 +82,6 @@ Version
 * Change log in [CHANGES.md](CHANGES.md)

 XGBoost in Graphlab Create
-======
+==========================
 * XGBoost is adopted as part of the boosted tree toolkit in Graphlab Create (GLC). Graphlab Create is a powerful python toolkit that allows you to do data manipulation, graph processing, hyper-parameter search, and visualization of TeraByte-scale data in one framework. Try Graphlab Create at http://graphlab.com/products/create/quick-start-guide.html
 * Nice blogpost by Jay Gu using GLC boosted trees to solve the kaggle bike sharing challenge: http://blog.graphlab.com/using-gradient-boosted-trees-to-predict-bike-sharing-demand