text vignette

2015-02-12 17:36:10 +01:00
parent 7f71cc12f4
commit ba36c495be
2 changed files with 17 additions and 14 deletions
--- a/R-package/vignettes/discoverYourData.Rmd
+++ b/R-package/vignettes/discoverYourData.Rmd
@@ -37,14 +37,14 @@ Sometimes the dataset we have to work on have *categorical* data.

 A *categorical* variable is one which has a fixed number of different values. For example, if for each observation a variable called *Colour* can only take the values *red*, *blue* or *green*, it is a *categorical* variable.

-> In **R**, a *categorical* variable is called a `factor`.
+> In *R*, a *categorical* variable is called a `factor`.
 > Type `?factor` in the console for more information.

 In this demo we will see how to transform a dense dataframe (dense = few zeros in the matrix) with *categorical* variables into a very sparse matrix (sparse = lots of zeros in the matrix) of `numeric` features before analyzing these data in **Xgboost**.

 The method we are going to see is usually called [one hot encoding](http://en.wikipedia.org/wiki/One-hot).

-The first step is to load the Arthritis dataset into memory and wrap it with the `data.table` package (`data.table` is 100% compliant with **R** dataframes, but its syntax is a lot more consistent and its performance is really good).
+The first step is to load the Arthritis dataset into memory and wrap it with the `data.table` package (`data.table` is 100% compliant with *R* dataframes, but its syntax is a lot more consistent and its performance is really good).

 ```{r, results='hide'}
 data(Arthritis)