Fix CRAN submission (#6076)
parent 884098ec22
commit 0cd0dad0b5
@@ -2,7 +2,7 @@ Package: xgboost
 Type: Package
 Title: Extreme Gradient Boosting
 Version: 1.2.0.1
-Date: 2020-02-21
+Date: 2020-08-28
 Authors@R: c(
 person("Tianqi", "Chen", role = c("aut"),
 email = "tianqi.tchen@gmail.com"),
@@ -349,6 +349,7 @@ NULL
 #' # Save as a stand-alone file (JSON); load it with xgb.load()
 #' xgb.save(bst, 'xgb.model.json')
 #' bst2 <- xgb.load('xgb.model.json')
+#' if (file.exists('xgb.model.json')) file.remove('xgb.model.json')
 #'
 #' # Save as a raw byte vector; load it with xgb.load.raw()
 #' xgb_bytes <- xgb.save.raw(bst)
@@ -364,6 +365,7 @@ NULL
 #' obj2 <- readRDS('my_object.rds')
 #' # Re-construct xgb.Booster object from the bytes
 #' bst2 <- xgb.load.raw(obj2$xgb_model_bytes)
+#' if (file.exists('my_object.rds')) file.remove('my_object.rds')
 #'
 #' @name a-compatibility-note-for-saveRDS-save
 NULL
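Not part of the commit itself, but as a reader's aid: the roxygen example above is shown piecemeal, so here is a self-contained sketch of the full round trip it documents. The agaricus data and the hyper-parameters are placeholders; only the `xgb.save()`/`xgb.load()`/`xgb.save.raw()`/`xgb.load.raw()` calls mirror the documented pattern.

```r
library(xgboost)

# Toy model, only so there is a booster to persist (parameters are placeholders).
data(agaricus.train, package = "xgboost")
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max_depth = 2, eta = 1, nrounds = 2,
               objective = "binary:logistic")

# 1) Stand-alone JSON file, as in the example above; clean up afterwards.
xgb.save(bst, "xgb.model.json")
bst_json <- xgb.load("xgb.model.json")
if (file.exists("xgb.model.json")) file.remove("xgb.model.json")

# 2) Raw byte vector, suitable for storing inside another R object via saveRDS().
xgb_bytes <- xgb.save.raw(bst)
obj <- list(xgb_model_bytes = xgb_bytes, note = "kept via saveRDS")
saveRDS(obj, "my_object.rds")
obj2 <- readRDS("my_object.rds")
bst_raw <- xgb.load.raw(obj2$xgb_model_bytes)
if (file.exists("my_object.rds")) file.remove("my_object.rds")
```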
@@ -79,7 +79,7 @@
 #'
 #' All observations are used for both training and validation.
 #'
-#' Adapted from \url{http://en.wikipedia.org/wiki/Cross-validation_\%28statistics\%29#k-fold_cross-validation}
+#' Adapted from \url{https://en.wikipedia.org/wiki/Cross-validation_\%28statistics\%29}
 #'
 #' @return
 #' An object of class \code{xgb.cv.synchronous} with the following elements:
@@ -130,16 +130,16 @@
 #' Note that when using a customized metric, only this single metric can be used.
 #' The following is the list of built-in metrics for which Xgboost provides optimized implementation:
 #' \itemize{
-#' \item \code{rmse} root mean square error. \url{http://en.wikipedia.org/wiki/Root_mean_square_error}
-#' \item \code{logloss} negative log-likelihood. \url{http://en.wikipedia.org/wiki/Log-likelihood}
+#' \item \code{rmse} root mean square error. \url{https://en.wikipedia.org/wiki/Root_mean_square_error}
+#' \item \code{logloss} negative log-likelihood. \url{https://en.wikipedia.org/wiki/Log-likelihood}
 #' \item \code{mlogloss} multiclass logloss. \url{https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html}
 #' \item \code{error} Binary classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
 #' By default, it uses the 0.5 threshold for predicted values to define negative and positive instances.
 #' A different threshold (e.g., 0.6) can be specified as "error@0.6".
 #' \item \code{merror} Multiclass classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
-#' \item \code{auc} Area under the curve. \url{http://en.wikipedia.org/wiki/Receiver_operating_characteristic#'Area_under_curve} for ranking evaluation.
+#' \item \code{auc} Area under the curve. \url{https://en.wikipedia.org/wiki/Receiver_operating_characteristic#'Area_under_curve} for ranking evaluation.
 #' \item \code{aucpr} Area under the PR curve. \url{https://en.wikipedia.org/wiki/Precision_and_recall} for ranking evaluation.
-#' \item \code{ndcg} Normalized Discounted Cumulative Gain (for ranking task). \url{http://en.wikipedia.org/wiki/NDCG}
+#' \item \code{ndcg} Normalized Discounted Cumulative Gain (for ranking task). \url{https://en.wikipedia.org/wiki/NDCG}
 #' }
 #'
 #' The following callbacks are automatically created when certain parameters are set:
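Not part of the commit, just for orientation: a hedged sketch of how the built-in metrics listed above are requested in practice, using the agaricus data shipped with the package. The metric names, the 0.6 threshold, and the fold count are arbitrary illustrative choices.

```r
library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

# One or several eval_metric entries may be given; "error@0.6" evaluates the
# binary error rate at a 0.6 decision threshold instead of the default 0.5.
params <- list(objective = "binary:logistic",
               eval_metric = "auc",
               eval_metric = "error@0.6")

cv <- xgb.cv(params = params, data = dtrain,
             nrounds = 5, nfold = 3, verbose = TRUE)
print(cv$evaluation_log)
```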
@@ -43,6 +43,7 @@ bst2 <- xgb.load('xgb.model')
 # Save as a stand-alone file (JSON); load it with xgb.load()
 xgb.save(bst, 'xgb.model.json')
 bst2 <- xgb.load('xgb.model.json')
+if (file.exists('xgb.model.json')) file.remove('xgb.model.json')
 
 # Save as a raw byte vector; load it with xgb.load.raw()
 xgb_bytes <- xgb.save.raw(bst)
@@ -58,5 +59,6 @@ saveRDS(obj, 'my_object.rds')
 obj2 <- readRDS('my_object.rds')
 # Re-construct xgb.Booster object from the bytes
 bst2 <- xgb.load.raw(obj2$xgb_model_bytes)
+if (file.exists('my_object.rds')) file.remove('my_object.rds')
 
 }
@@ -154,7 +154,7 @@ The cross-validation process is then repeated \code{nrounds} times, with each of
 
 All observations are used for both training and validation.
 
-Adapted from \url{http://en.wikipedia.org/wiki/Cross-validation_\%28statistics\%29#k-fold_cross-validation}
+Adapted from \url{https://en.wikipedia.org/wiki/Cross-validation_\%28statistics\%29}
 }
 \examples{
 data(agaricus.train, package='xgboost')
@@ -215,16 +215,16 @@ User may set one or several \code{eval_metric} parameters.
 Note that when using a customized metric, only this single metric can be used.
 The following is the list of built-in metrics for which Xgboost provides optimized implementation:
 \itemize{
-\item \code{rmse} root mean square error. \url{http://en.wikipedia.org/wiki/Root_mean_square_error}
-\item \code{logloss} negative log-likelihood. \url{http://en.wikipedia.org/wiki/Log-likelihood}
+\item \code{rmse} root mean square error. \url{https://en.wikipedia.org/wiki/Root_mean_square_error}
+\item \code{logloss} negative log-likelihood. \url{https://en.wikipedia.org/wiki/Log-likelihood}
 \item \code{mlogloss} multiclass logloss. \url{https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html}
 \item \code{error} Binary classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
 By default, it uses the 0.5 threshold for predicted values to define negative and positive instances.
 A different threshold (e.g., 0.6) can be specified as "error@0.6".
 \item \code{merror} Multiclass classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
-\item \code{auc} Area under the curve. \url{http://en.wikipedia.org/wiki/Receiver_operating_characteristic#'Area_under_curve} for ranking evaluation.
+\item \code{auc} Area under the curve. \url{https://en.wikipedia.org/wiki/Receiver_operating_characteristic#'Area_under_curve} for ranking evaluation.
 \item \code{aucpr} Area under the PR curve. \url{https://en.wikipedia.org/wiki/Precision_and_recall} for ranking evaluation.
-\item \code{ndcg} Normalized Discounted Cumulative Gain (for ranking task). \url{http://en.wikipedia.org/wiki/NDCG}
+\item \code{ndcg} Normalized Discounted Cumulative Gain (for ranking task). \url{https://en.wikipedia.org/wiki/NDCG}
 }
 
 The following callbacks are automatically created when certain parameters are set:
@@ -57,7 +57,7 @@ To answer the question above we will convert *categorical* variables to `numeric`
 
 In this Vignette we will see how to transform a *dense* `data.frame` (*dense* = few zeroes in the matrix) with *categorical* variables to a very *sparse* matrix (*sparse* = lots of zero in the matrix) of `numeric` features.
 
-The method we are going to see is usually called [one-hot encoding](http://en.wikipedia.org/wiki/One-hot).
+The method we are going to see is usually called [one-hot encoding](https://en.wikipedia.org/wiki/One-hot).
 
 The first step is to load `Arthritis` dataset in memory and wrap it with `data.table` package.
 
@@ -66,7 +66,7 @@ data(Arthritis)
 df <- data.table(Arthritis, keep.rownames = FALSE)
 ```
 
-> `data.table` is 100% compliant with **R** `data.frame` but its syntax is more consistent and its performance for large dataset is [best in class](http://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly) (`dplyr` from **R** and `Pandas` from **Python** [included](https://github.com/Rdatatable/data.table/wiki/Benchmarks-%3A-Grouping)). Some parts of **Xgboost** **R** package use `data.table`.
+> `data.table` is 100% compliant with **R** `data.frame` but its syntax is more consistent and its performance for large dataset is [best in class](https://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly) (`dplyr` from **R** and `Pandas` from **Python** [included](https://github.com/Rdatatable/data.table/wiki/Benchmarks-%3A-Grouping)). Some parts of **Xgboost** **R** package use `data.table`.
 
 The first thing we want to do is to have a look to the first few lines of the `data.table`:
 
@@ -137,8 +137,8 @@ levels(df[,Treatment])
 #### Encoding categorical features
 
 Next step, we will transform the categorical data to dummy variables.
-Several encoding methods exist, e.g., [one-hot encoding](http://en.wikipedia.org/wiki/One-hot) is a common approach.
-We will use the [dummy contrast coding](http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm#dummy) which is popular because it produces "full rank" encoding (also see [this blog post by Max Kuhn](http://appliedpredictivemodeling.com/blog/2013/10/23/the-basics-of-encoding-categorical-data-for-predictive-models)).
+Several encoding methods exist, e.g., [one-hot encoding](https://en.wikipedia.org/wiki/One-hot) is a common approach.
+We will use the [dummy contrast coding](https://stats.idre.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/) which is popular because it produces "full rank" encoding (also see [this blog post by Max Kuhn](http://appliedpredictivemodeling.com/blog/2013/10/23/the-basics-of-encoding-categorical-data-for-predictive-models)).
 
 The purpose is to transform each value of each *categorical* feature into a *binary* feature `{0, 1}`.
 
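As an aside on the vignette text above (not part of the commit): a minimal sketch of that encoding step, assuming the Arthritis `data.table` `df` built earlier in the vignette. The formula, the dropped intercept, and the choice of "Marked" as the positive class are illustrative assumptions, not the vignette's exact code.

```r
library(Matrix)

# Dummy/one-hot style encoding: every level of every factor becomes a 0/1 column.
# `Improved ~ . - 1` excludes the response and the intercept (illustrative formula).
sparse_matrix <- sparse.model.matrix(Improved ~ . - 1, data = df)
head(sparse_matrix)

# The response is supplied separately as a numeric 0/1 vector.
output_vector <- as.numeric(df[, Improved] == "Marked")
```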
@@ -176,7 +176,7 @@ bst <- xgboost(data = sparse_matrix, label = output_vector, max_depth = 4,
 
 You can see some `train-error: 0.XXXXX` lines followed by a number. It decreases. Each line shows how well the model explains your data. Lower is better.
 
-A model which fits too well may [overfit](http://en.wikipedia.org/wiki/Overfitting) (meaning it copy/paste too much the past, and won't be that good to predict the future).
+A model which fits too well may [overfit](https://en.wikipedia.org/wiki/Overfitting) (meaning it copy/paste too much the past, and won't be that good to predict the future).
 
 > Here you can see the numbers decrease until line 7 and then increase.
 >
@@ -304,7 +304,7 @@ Linear model may not be that smart in this scenario.
 Special Note: What about Random Forests™?
 -----------------------------------------
 
-As you may know, [Random Forests™](http://en.wikipedia.org/wiki/Random_forest) algorithm is cousin with boosting and both are part of the [ensemble learning](http://en.wikipedia.org/wiki/Ensemble_learning) family.
+As you may know, [Random Forests™](https://en.wikipedia.org/wiki/Random_forest) algorithm is cousin with boosting and both are part of the [ensemble learning](https://en.wikipedia.org/wiki/Ensemble_learning) family.
 
 Both trains several decision trees for one dataset. The *main* difference is that in Random Forests™, trees are independent and in boosting, the tree `N+1` focus its learning on the loss (<=> what has not been well modeled by the tree `N`).
 
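A hedged aside on the comparison above (not part of the commit): XGBoost can also grow a bag of independent, subsampled trees in a single boosting round via `num_parallel_tree`, which makes the contrast with sequential boosting easy to experiment with. The `sparse_matrix`/`output_vector` objects are carried over from the earlier encoding steps, and the parameter values are illustrative, not tuned.

```r
# Random-forest-style run: one round, many independent subsampled trees.
bst_rf <- xgboost(data = sparse_matrix, label = output_vector,
                  nrounds = 1, num_parallel_tree = 100,
                  subsample = 0.8, colsample_bynode = 0.8,
                  max_depth = 4, eta = 1,
                  objective = "binary:logistic")

# Ordinary boosting for comparison: trees are built sequentially, each one
# focusing on what the previous trees modeled poorly.
bst_gbm <- xgboost(data = sparse_matrix, label = output_vector,
                   nrounds = 10, max_depth = 4, eta = 1,
                   objective = "binary:logistic")
```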
@@ -24,7 +24,7 @@
 author = "K. Bache and M. Lichman",
 year = "2013",
 title = "{UCI} Machine Learning Repository",
-url = "http://archive.ics.uci.edu/ml",
+url = "http://archive.ics.uci.edu/ml/",
 institution = "University of California, Irvine, School of Information and Computer Sciences"
 }
 
@@ -68,7 +68,7 @@ The version 0.4-2 is on CRAN, and you can install it by:
 install.packages("xgboost")
 ```
 
-Formerly available versions can be obtained from the CRAN [archive](https://cran.r-project.org/src/contrib/Archive/xgboost)
+Formerly available versions can be obtained from the CRAN [archive](https://cran.r-project.org/src/contrib/Archive/xgboost/)
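Not part of the diff: if one of those archived releases is needed (for example to reproduce older results), a hedged way to fetch it is `remotes::install_version()`. The version string below is only a placeholder; substitute whichever archived release you actually need.

```r
# install.packages("remotes")  # if not already available
remotes::install_version("xgboost", version = "0.90.0.2",
                         repos = "https://cloud.r-project.org")
```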
 
 ## Learning
 