diff --git a/R-package/DESCRIPTION b/R-package/DESCRIPTION
index b5d7585a3..f4a196130 100644
--- a/R-package/DESCRIPTION
+++ b/R-package/DESCRIPTION
@@ -2,7 +2,7 @@ Package: xgboost
 Type: Package
 Title: Extreme Gradient Boosting
 Version: 1.2.0.1
-Date: 2020-02-21
+Date: 2020-08-28
 Authors@R: c(
     person("Tianqi", "Chen", role = c("aut"),
            email = "tianqi.tchen@gmail.com"),
diff --git a/R-package/R/utils.R b/R-package/R/utils.R
index b0c653f17..846cc1f44 100644
--- a/R-package/R/utils.R
+++ b/R-package/R/utils.R
@@ -349,6 +349,7 @@ NULL
 #' # Save as a stand-alone file (JSON); load it with xgb.load()
 #' xgb.save(bst, 'xgb.model.json')
 #' bst2 <- xgb.load('xgb.model.json')
+#' if (file.exists('xgb.model.json')) file.remove('xgb.model.json')
 #'
 #' # Save as a raw byte vector; load it with xgb.load.raw()
 #' xgb_bytes <- xgb.save.raw(bst)
@@ -364,6 +365,7 @@ NULL
 #' obj2 <- readRDS('my_object.rds')
 #' # Re-construct xgb.Booster object from the bytes
 #' bst2 <- xgb.load.raw(obj2$xgb_model_bytes)
+#' if (file.exists('my_object.rds')) file.remove('my_object.rds')
 #'
 #' @name a-compatibility-note-for-saveRDS-save
 NULL
diff --git a/R-package/R/xgb.cv.R b/R-package/R/xgb.cv.R
index fd74d0f6b..fb48ca607 100644
--- a/R-package/R/xgb.cv.R
+++ b/R-package/R/xgb.cv.R
@@ -79,7 +79,7 @@
 #'
 #' All observations are used for both training and validation.
 #'
-#' Adapted from \url{http://en.wikipedia.org/wiki/Cross-validation_\%28statistics\%29#k-fold_cross-validation}
+#' Adapted from \url{https://en.wikipedia.org/wiki/Cross-validation_\%28statistics\%29}
 #'
 #' @return
 #' An object of class \code{xgb.cv.synchronous} with the following elements:
diff --git a/R-package/R/xgb.train.R b/R-package/R/xgb.train.R
index 85021959c..0449ae266 100644
--- a/R-package/R/xgb.train.R
+++ b/R-package/R/xgb.train.R
@@ -130,16 +130,16 @@
 #'   Note that when using a customized metric, only this single metric can be used.
 #'   The following is the list of built-in metrics for which Xgboost provides optimized implementation:
 #'   \itemize{
-#'      \item \code{rmse} root mean square error. \url{http://en.wikipedia.org/wiki/Root_mean_square_error}
-#'      \item \code{logloss} negative log-likelihood. \url{http://en.wikipedia.org/wiki/Log-likelihood}
+#'      \item \code{rmse} root mean square error. \url{https://en.wikipedia.org/wiki/Root_mean_square_error}
+#'      \item \code{logloss} negative log-likelihood. \url{https://en.wikipedia.org/wiki/Log-likelihood}
 #'      \item \code{mlogloss} multiclass logloss. \url{https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html}
 #'      \item \code{error} Binary classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
 #'            By default, it uses the 0.5 threshold for predicted values to define negative and positive instances.
 #'            Different threshold (e.g., 0.) could be specified as "error@0."
 #'      \item \code{merror} Multiclass classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
-#'      \item \code{auc} Area under the curve. \url{http://en.wikipedia.org/wiki/Receiver_operating_characteristic#'Area_under_curve} for ranking evaluation.
+#'      \item \code{auc} Area under the curve. \url{https://en.wikipedia.org/wiki/Receiver_operating_characteristic#'Area_under_curve} for ranking evaluation.
 #'      \item \code{aucpr} Area under the PR curve. \url{https://en.wikipedia.org/wiki/Precision_and_recall} for ranking evaluation.
-#'      \item \code{ndcg} Normalized Discounted Cumulative Gain (for ranking task). \url{http://en.wikipedia.org/wiki/NDCG}
+#'      \item \code{ndcg} Normalized Discounted Cumulative Gain (for ranking task). \url{https://en.wikipedia.org/wiki/NDCG}
 #'   }
 #'
 #' The following callbacks are automatically created when certain parameters are set:
diff --git a/R-package/man/a-compatibility-note-for-saveRDS-save.Rd b/R-package/man/a-compatibility-note-for-saveRDS-save.Rd
index 63b8dfce5..85b52243c 100644
--- a/R-package/man/a-compatibility-note-for-saveRDS-save.Rd
+++ b/R-package/man/a-compatibility-note-for-saveRDS-save.Rd
@@ -43,6 +43,7 @@ bst2 <- xgb.load('xgb.model')
 # Save as a stand-alone file (JSON); load it with xgb.load()
 xgb.save(bst, 'xgb.model.json')
 bst2 <- xgb.load('xgb.model.json')
+if (file.exists('xgb.model.json')) file.remove('xgb.model.json')
 
 # Save as a raw byte vector; load it with xgb.load.raw()
 xgb_bytes <- xgb.save.raw(bst)
@@ -58,5 +59,6 @@ saveRDS(obj, 'my_object.rds')
 obj2 <- readRDS('my_object.rds')
 # Re-construct xgb.Booster object from the bytes
 bst2 <- xgb.load.raw(obj2$xgb_model_bytes)
+if (file.exists('my_object.rds')) file.remove('my_object.rds')
 
 }
diff --git a/R-package/man/xgb.cv.Rd b/R-package/man/xgb.cv.Rd
index 98e70e48c..86a88007b 100644
--- a/R-package/man/xgb.cv.Rd
+++ b/R-package/man/xgb.cv.Rd
@@ -154,7 +154,7 @@ The cross-validation process is then repeated \code{nrounds} times, with each of
 
 All observations are used for both training and validation.
 
-Adapted from \url{http://en.wikipedia.org/wiki/Cross-validation_\%28statistics\%29#k-fold_cross-validation}
+Adapted from \url{https://en.wikipedia.org/wiki/Cross-validation_\%28statistics\%29}
 }
 \examples{
 data(agaricus.train, package='xgboost')
diff --git a/R-package/man/xgb.train.Rd b/R-package/man/xgb.train.Rd
index 1fcd6c0e3..e68962fb6 100644
--- a/R-package/man/xgb.train.Rd
+++ b/R-package/man/xgb.train.Rd
@@ -215,16 +215,16 @@ User may set one or several \code{eval_metric} parameters.
 Note that when using a customized metric, only this single metric can be used.
 The following is the list of built-in metrics for which Xgboost provides optimized implementation:
 \itemize{
-  \item \code{rmse} root mean square error. \url{http://en.wikipedia.org/wiki/Root_mean_square_error}
-  \item \code{logloss} negative log-likelihood. \url{http://en.wikipedia.org/wiki/Log-likelihood}
+  \item \code{rmse} root mean square error. \url{https://en.wikipedia.org/wiki/Root_mean_square_error}
+  \item \code{logloss} negative log-likelihood. \url{https://en.wikipedia.org/wiki/Log-likelihood}
   \item \code{mlogloss} multiclass logloss. \url{https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html}
   \item \code{error} Binary classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
         By default, it uses the 0.5 threshold for predicted values to define negative and positive instances.
         Different threshold (e.g., 0.) could be specified as "error@0."
   \item \code{merror} Multiclass classification error rate. It is calculated as \code{(# wrong cases) / (# all cases)}.
-  \item \code{auc} Area under the curve. \url{http://en.wikipedia.org/wiki/Receiver_operating_characteristic#'Area_under_curve} for ranking evaluation.
+  \item \code{auc} Area under the curve. \url{https://en.wikipedia.org/wiki/Receiver_operating_characteristic#'Area_under_curve} for ranking evaluation.
   \item \code{aucpr} Area under the PR curve. \url{https://en.wikipedia.org/wiki/Precision_and_recall} for ranking evaluation.
-  \item \code{ndcg} Normalized Discounted Cumulative Gain (for ranking task). \url{http://en.wikipedia.org/wiki/NDCG}
+  \item \code{ndcg} Normalized Discounted Cumulative Gain (for ranking task). \url{https://en.wikipedia.org/wiki/NDCG}
 }
 
 The following callbacks are automatically created when certain parameters are set:
diff --git a/R-package/vignettes/discoverYourData.Rmd b/R-package/vignettes/discoverYourData.Rmd
index 8181fcbb9..c41f4f125 100644
--- a/R-package/vignettes/discoverYourData.Rmd
+++ b/R-package/vignettes/discoverYourData.Rmd
@@ -57,7 +57,7 @@ To answer the question above we will convert *categorical* variables to `numeric
 
 In this Vignette we will see how to transform a *dense* `data.frame` (*dense* = few zeroes in the matrix) with *categorical* variables to a very *sparse* matrix (*sparse* = lots of zero in the matrix) of `numeric` features.
 
-The method we are going to see is usually called [one-hot encoding](http://en.wikipedia.org/wiki/One-hot).
+The method we are going to see is usually called [one-hot encoding](https://en.wikipedia.org/wiki/One-hot).
 
 The first step is to load `Arthritis` dataset in memory and wrap it with `data.table` package.
 
@@ -66,7 +66,7 @@ data(Arthritis)
 df <- data.table(Arthritis, keep.rownames = FALSE)
 ```
 
-> `data.table` is 100% compliant with **R** `data.frame` but its syntax is more consistent and its performance for large dataset is [best in class](http://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly) (`dplyr` from **R** and `Pandas` from **Python** [included](https://github.com/Rdatatable/data.table/wiki/Benchmarks-%3A-Grouping)). Some parts of **Xgboost** **R** package use `data.table`.
+> `data.table` is 100% compliant with **R** `data.frame` but its syntax is more consistent and its performance for large dataset is [best in class](https://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly) (`dplyr` from **R** and `Pandas` from **Python** [included](https://github.com/Rdatatable/data.table/wiki/Benchmarks-%3A-Grouping)). Some parts of **Xgboost** **R** package use `data.table`.
 
 The first thing we want to do is to have a look to the first few lines of the `data.table`:
 
@@ -137,8 +137,8 @@ levels(df[,Treatment])
 #### Encoding categorical features
 
 Next step, we will transform the categorical data to dummy variables.
-Several encoding methods exist, e.g., [one-hot encoding](http://en.wikipedia.org/wiki/One-hot) is a common approach.
-We will use the [dummy contrast coding](http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm#dummy) which is popular because it produces "full rank" encoding (also see [this blog post by Max Kuhn](http://appliedpredictivemodeling.com/blog/2013/10/23/the-basics-of-encoding-categorical-data-for-predictive-models)).
+Several encoding methods exist, e.g., [one-hot encoding](https://en.wikipedia.org/wiki/One-hot) is a common approach.
+We will use the [dummy contrast coding](https://stats.idre.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/) which is popular because it produces "full rank" encoding (also see [this blog post by Max Kuhn](http://appliedpredictivemodeling.com/blog/2013/10/23/the-basics-of-encoding-categorical-data-for-predictive-models)).
 
 The purpose is to transform each value of each *categorical* feature into a *binary* feature `{0, 1}`.
 
@@ -176,7 +176,7 @@ bst <- xgboost(data = sparse_matrix, label = output_vector, max_depth = 4,
 
 You can see some `train-error: 0.XXXXX` lines followed by a number. It decreases. Each line shows how well the model explains your data. Lower is better.
 
-A model which fits too well may [overfit](http://en.wikipedia.org/wiki/Overfitting) (meaning it copy/paste too much the past, and won't be that good to predict the future).
+A model which fits too well may [overfit](https://en.wikipedia.org/wiki/Overfitting) (meaning it copy/paste too much the past, and won't be that good to predict the future).
 
 > Here you can see the numbers decrease until line 7 and then increase.
 >
@@ -304,7 +304,7 @@ Linear model may not be that smart in this scenario.
 Special Note: What about Random Forests™?
 -----------------------------------------
 
-As you may know, [Random Forests™](http://en.wikipedia.org/wiki/Random_forest) algorithm is cousin with boosting and both are part of the [ensemble learning](http://en.wikipedia.org/wiki/Ensemble_learning) family.
+As you may know, [Random Forests™](https://en.wikipedia.org/wiki/Random_forest) algorithm is cousin with boosting and both are part of the [ensemble learning](https://en.wikipedia.org/wiki/Ensemble_learning) family.
 
 Both trains several decision trees for one dataset. The *main* difference is that in Random Forests™, trees are independent and in boosting, the tree `N+1` focus its learning on the loss (<=> what has not been well modeled by the tree `N`).
 
diff --git a/R-package/vignettes/xgboost.bib b/R-package/vignettes/xgboost.bib
index f21bdae16..5deb1e13d 100644
--- a/R-package/vignettes/xgboost.bib
+++ b/R-package/vignettes/xgboost.bib
@@ -24,7 +24,7 @@
   author = "K. Bache and M. Lichman",
   year = "2013",
   title = "{UCI} Machine Learning Repository",
-  url = "http://archive.ics.uci.edu/ml",
+  url = "http://archive.ics.uci.edu/ml/",
   institution = "University of California, Irvine, School of Information and Computer Sciences"
 }
 
diff --git a/R-package/vignettes/xgboostPresentation.Rmd b/R-package/vignettes/xgboostPresentation.Rmd
index c2f990e14..ab72c6779 100644
--- a/R-package/vignettes/xgboostPresentation.Rmd
+++ b/R-package/vignettes/xgboostPresentation.Rmd
@@ -68,7 +68,7 @@ The version 0.4-2 is on CRAN, and you can install it by:
 install.packages("xgboost")
 ```
 
-Formerly available versions can be obtained from the CRAN [archive](https://cran.r-project.org/src/contrib/Archive/xgboost)
+Formerly available versions can be obtained from the CRAN [archive](https://cran.r-project.org/src/contrib/Archive/xgboost/)
 
 ## Learning