fixed typos in R package docs (#4345)

* fixed typos in R package docs

* updated verbosity parameter in xgb.train docs
James Lamb
2019-04-21 02:54:11 -05:00
committed by Jiaming Yuan
parent 65db8d0626
commit 5e97de6a41
30 changed files with 414 additions and 413 deletions


@@ -138,7 +138,7 @@ levels(df[,Treatment])
Next step, we will transform the categorical data to dummy variables.
Several encoding methods exist, e.g., [one-hot encoding](http://en.wikipedia.org/wiki/One-hot) is a common approach.
-We will use the [dummy contrast coding](http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm#dummy) which is popular because it producess "full rank" encoding (also see [this blog post by Max Kuhn](http://appliedpredictivemodeling.com/blog/2013/10/23/the-basics-of-encoding-categorical-data-for-predictive-models)).
+We will use the [dummy contrast coding](http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm#dummy) which is popular because it produces "full rank" encoding (also see [this blog post by Max Kuhn](http://appliedpredictivemodeling.com/blog/2013/10/23/the-basics-of-encoding-categorical-data-for-predictive-models)).
The purpose is to transform each value of each *categorical* feature into a *binary* feature `{0, 1}`.
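
For readers skimming this hunk, the difference between dummy contrast coding and one-hot encoding can be sketched in base R. This is only an illustration under stated assumptions: the toy data frame is made up (its column names loosely mirror the vignette), and `model.matrix` from base R is used here as a stand-in, not necessarily the call the vignette itself makes.

```r
# Hypothetical toy data: values are invented for illustration only.
df <- data.frame(
  Treatment = factor(c("Placebo", "Treated", "Treated", "Placebo")),
  Sex = factor(c("Female", "Male", "Male", "Female")),
  Age = c(32, 45, 58, 61)
)

# Dummy contrast coding: each factor with k levels yields k - 1 indicator
# columns (the first level is the reference). Dropping the intercept column
# leaves only the encoded features.
dummy <- model.matrix(~ ., data = df)[, -1]

# One-hot-style encoding of a single factor, for comparison: removing the
# intercept from the formula gives one indicator column per level.
one_hot <- model.matrix(~ Treatment - 1, data = df)

print(colnames(dummy))
print(colnames(one_hot))
```

With dummy coding, the reference level (`Placebo`, `Female`) is represented by zeros in the remaining indicator columns, which is what keeps the encoding "full rank" when an intercept is present.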
@@ -268,7 +268,7 @@ c2 <- chisq.test(df$Age, output_vector)
print(c2)
```
-Pearson correlation between Age and illness disapearing is **`r round(c2$statistic, 2 )`**.
+Pearson correlation between Age and illness disappearing is **`r round(c2$statistic, 2 )`**.
```{r, warning=FALSE, message=FALSE}
c2 <- chisq.test(df$AgeDiscret, output_vector)


@@ -313,7 +313,7 @@ Until now, all the learnings we have performed were based on boosting trees. **X
bst <- xgb.train(data=dtrain, booster = "gblinear", max_depth=2, nthread = 2, nrounds=2, watchlist=watchlist, eval_metric = "error", eval_metric = "logloss", objective = "binary:logistic")
```
-In this specific case, *linear boosting* gets sligtly better performance metrics than decision trees based algorithm.
+In this specific case, *linear boosting* gets slightly better performance metrics than decision trees based algorithm.
In simple cases, it will happen because there is nothing better than a linear algorithm to catch a linear link. However, decision trees are much better to catch a non linear link between predictors and outcome. Because there is no silver bullet, we advise you to check both algorithms with your own datasets to have an idea of what to use.