[R] resolve line_length_linter warnings (#8565)

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
Author: James Lamb
Date: 2022-12-14 07:04:24 -06:00 (committed by GitHub)
parent eac980fbfc
commit 7a07dcf651
10 changed files with 305 additions and 75 deletions
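For context: `line_length_linter` comes from the {lintr} package and flags lines longer than a configured limit. Below is a minimal sketch of how such a check might be run; the 100-character limit and the call shown are illustrative assumptions, not necessarily this project's actual lint setup:

```r
library(lintr)

# Assumed illustration: lint the package sources with a 100-character
# line limit; the project's real configuration may differ.
lint_dir(
  linters = linters_with_defaults(line_length_linter(100L))
)
```

Every line reported over the limit is what the reformatted calls below resolve.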


@@ -327,10 +327,25 @@ train <- agaricus.train
test <- agaricus.test
#Random Forest - 1000 trees
-bst <- xgboost(data = train$data, label = train$label, max_depth = 4, num_parallel_tree = 1000, subsample = 0.5, colsample_bytree =0.5, nrounds = 1, objective = "binary:logistic")
+bst <- xgboost(
+  data = train$data
+  , label = train$label
+  , max_depth = 4
+  , num_parallel_tree = 1000
+  , subsample = 0.5
+  , colsample_bytree = 0.5
+  , nrounds = 1
+  , objective = "binary:logistic"
+)
#Boosting - 3 rounds
-bst <- xgboost(data = train$data, label = train$label, max_depth = 4, nrounds = 3, objective = "binary:logistic")
+bst <- xgboost(
+  data = train$data
+  , label = train$label
+  , max_depth = 4
+  , nrounds = 3
+  , objective = "binary:logistic"
+)
```
> Note that the parameter `nrounds` is set to `1`: with `num_parallel_tree = 1000`, a single boosting round already grows the full forest of 1000 trees, which is what makes this a random forest rather than boosting.


@@ -152,7 +152,15 @@ We will train decision tree model using the following parameters:
* `nrounds = 2`: there will be two passes over the data; the second pass will enhance the model by further reducing the difference between ground truth and prediction.
```{r trainingSparse, message=F, warning=F}
-bstSparse <- xgboost(data = train$data, label = train$label, max_depth = 2, eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
+bstSparse <- xgboost(
+  data = train$data
+  , label = train$label
+  , max_depth = 2
+  , eta = 1
+  , nthread = 2
+  , nrounds = 2
+  , objective = "binary:logistic"
+)
```
> The more complex the relationship between your features and your `label`, the more passes you need.
@@ -164,7 +172,15 @@ bstSparse <- xgboost(data = train$data, label = train$label, max_depth = 2, eta
Alternatively, you can put your dataset in a *dense* matrix, i.e. a basic **R** matrix.
```{r trainingDense, message=F, warning=F}
-bstDense <- xgboost(data = as.matrix(train$data), label = train$label, max_depth = 2, eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
+bstDense <- xgboost(
+  data = as.matrix(train$data)
+  , label = train$label
+  , max_depth = 2
+  , eta = 1
+  , nthread = 2
+  , nrounds = 2
+  , objective = "binary:logistic"
+)
```
##### xgb.DMatrix
@@ -173,7 +189,14 @@ bstDense <- xgboost(data = as.matrix(train$data), label = train$label, max_depth
```{r trainingDmatrix, message=F, warning=F}
dtrain <- xgb.DMatrix(data = train$data, label = train$label)
-bstDMatrix <- xgboost(data = dtrain, max_depth = 2, eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
+bstDMatrix <- xgboost(
+  data = dtrain
+  , max_depth = 2
+  , eta = 1
+  , nthread = 2
+  , nrounds = 2
+  , objective = "binary:logistic"
+)
```
##### Verbose option
@@ -184,17 +207,41 @@ One of the simplest way to see the training progress is to set the `verbose` opt
```{r trainingVerbose0, message=T, warning=F}
# verbose = 0, no message
-bst <- xgboost(data = dtrain, max_depth = 2, eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic", verbose = 0)
+bst <- xgboost(
+  data = dtrain
+  , max_depth = 2
+  , eta = 1
+  , nthread = 2
+  , nrounds = 2
+  , objective = "binary:logistic"
+  , verbose = 0
+)
```
```{r trainingVerbose1, message=T, warning=F}
# verbose = 1, print evaluation metric
-bst <- xgboost(data = dtrain, max_depth = 2, eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic", verbose = 1)
+bst <- xgboost(
+  data = dtrain
+  , max_depth = 2
+  , eta = 1
+  , nthread = 2
+  , nrounds = 2
+  , objective = "binary:logistic"
+  , verbose = 1
+)
```
```{r trainingVerbose2, message=T, warning=F}
# verbose = 2, also print information about tree
-bst <- xgboost(data = dtrain, max_depth = 2, eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic", verbose = 2)
+bst <- xgboost(
+  data = dtrain
+  , max_depth = 2
+  , eta = 1
+  , nthread = 2
+  , nrounds = 2
+  , objective = "binary:logistic"
+  , verbose = 2
+)
```
## Basic prediction using XGBoost
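The body of this section is untouched by the commit and elided from the diff. As a reminder, prediction is a plain `predict()` call on the fitted booster; a minimal sketch reusing the `bst` and `test` objects from above:

```r
# probability of each test observation belonging to class 1
pred <- predict(bst, test$data)
# turn probabilities into hard 0/1 predictions with a 0.5 threshold
prediction <- as.numeric(pred > 0.5)
```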
@@ -287,7 +334,15 @@ For the purpose of this example, we use `watchlist` parameter. It is a list of `
```{r watchlist, message=F, warning=F}
watchlist <- list(train=dtrain, test=dtest)
-bst <- xgb.train(data=dtrain, max_depth=2, eta=1, nthread = 2, nrounds=2, watchlist=watchlist, objective = "binary:logistic")
+bst <- xgb.train(
+  data = dtrain
+  , max_depth = 2
+  , eta = 1
+  , nthread = 2
+  , nrounds = 2
+  , watchlist = watchlist
+  , objective = "binary:logistic"
+)
```
**XGBoost** computes, at each round, the same average error metric as the one seen above (we set `nrounds` to 2, which is why there are two lines). Obviously, the `train-error` number is related to the training dataset (the one the algorithm learns from) and the `test-error` number to the test dataset.
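Both error series are also stored on the returned booster, so they can be inspected after training. A short sketch, assuming the classic R API where watchlist results are kept in the booster's `evaluation_log` data.table:

```r
# per-round metrics recorded from the watchlist
# (columns: iter, train_error, test_error)
print(bst$evaluation_log)
# test error of the final round
tail(bst$evaluation_log$test_error, 1)
```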
@@ -299,7 +354,17 @@ If with your own dataset you have not such results, you should think about how y
For a better understanding of the learning progression, you may want to monitor a specific metric, or even multiple evaluation metrics.
```{r watchlist2, message=F, warning=F}
-bst <- xgb.train(data=dtrain, max_depth=2, eta=1, nthread = 2, nrounds=2, watchlist=watchlist, eval_metric = "error", eval_metric = "logloss", objective = "binary:logistic")
+bst <- xgb.train(
+  data = dtrain
+  , max_depth = 2
+  , eta = 1
+  , nthread = 2
+  , nrounds = 2
+  , watchlist = watchlist
+  , eval_metric = "error"
+  , eval_metric = "logloss"
+  , objective = "binary:logistic"
+)
```
> `eval_metric` allows us to monitor two new metrics for each round, `logloss` and `error`.
@@ -310,7 +375,17 @@ bst <- xgb.train(data=dtrain, max_depth=2, eta=1, nthread = 2, nrounds=2, watchl
Until now, all the learning we have performed was based on boosted trees. **XGBoost** implements a second algorithm, based on linear boosting. The only difference from the previous command is the `booster = "gblinear"` parameter (and the removal of the `eta` parameter).
```{r linearBoosting, message=F, warning=F}
-bst <- xgb.train(data=dtrain, booster = "gblinear", max_depth=2, nthread = 2, nrounds=2, watchlist=watchlist, eval_metric = "error", eval_metric = "logloss", objective = "binary:logistic")
+bst <- xgb.train(
+  data = dtrain
+  , booster = "gblinear"
+  , max_depth = 2
+  , nthread = 2
+  , nrounds = 2
+  , watchlist = watchlist
+  , eval_metric = "error"
+  , eval_metric = "logloss"
+  , objective = "binary:logistic"
+)
```
In this specific case, *linear boosting* gets slightly better performance metrics than the decision-tree-based algorithm.
@@ -328,7 +403,15 @@ Like saving models, `xgb.DMatrix` object (which groups both dataset and outcome)
xgb.DMatrix.save(dtrain, "dtrain.buffer")
# to load it in, simply call xgb.DMatrix
dtrain2 <- xgb.DMatrix("dtrain.buffer")
-bst <- xgb.train(data=dtrain2, max_depth=2, eta=1, nthread = 2, nrounds=2, watchlist=watchlist, objective = "binary:logistic")
+bst <- xgb.train(
+  data = dtrain2
+  , max_depth = 2
+  , eta = 1
+  , nthread = 2
+  , nrounds = 2
+  , watchlist = watchlist
+  , objective = "binary:logistic"
+)
```
```{r DMatrixDel, include=FALSE}