Sometimes the dataset we have to work on has *categorical* data.

A *categorical* variable has a fixed number of possible values. For example, if for each observation a variable called *Colour* can only take the value *red*, *blue* or *green*, it is a *categorical* variable.

> In *R*, a *categorical* variable is called a `factor`.
>
> Type `?factor` in the console for more information.
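
As a minimal sketch (the `colour` vector below is a hypothetical example, not part of the demo data), the *Colour* variable described above can be stored as a `factor` whose levels are the fixed set of allowed values:

```r
# Hypothetical observations of the Colour variable
colour <- factor(c("red", "blue", "green", "blue", "red"),
                 levels = c("red", "blue", "green"))

levels(colour)      # "red" "blue" "green": the fixed set of possible values
nlevels(colour)     # 3
as.integer(colour)  # 1 2 3 2 1: the integer code stored for each observation

# A value outside the declared levels cannot be represented and becomes NA
factor("yellow", levels = levels(colour))
```

Internally a `factor` is stored as integer codes plus a vector of level labels. The codes only identify categories and carry no meaningful numeric order, which is one reason to re-encode them before modeling.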

In this demo we will see how to transform a *dense* data frame (*dense* = few zeros in the matrix) with *categorical* variables into a very *sparse* matrix (*sparse* = lots of zeros in the matrix) of `numeric` features before analyzing the data with **Xgboost**.
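
To make the *dense*/*sparse* distinction concrete, here is a small base-*R* sketch; the `city` variable and its 10 levels are hypothetical. Expanding a single categorical column into one 0/1 column per level produces a matrix in which most entries are zero:

```r
# Hypothetical categorical column with 10 possible values (levels)
set.seed(1)
city_levels <- paste0("city_", 1:10)
city <- factor(sample(city_levels, 1000, replace = TRUE), levels = city_levels)

# Expand it into one numeric 0/1 column per level (no intercept)
expanded <- model.matrix(~ city - 1)

dim(expanded)        # 1000 10
mean(expanded == 0)  # 0.9: each row has a single 1 and nine 0s
```

With k levels, each row holds a single 1 and k - 1 zeros, so the share of zeros is (k - 1)/k and grows with the number of levels.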

The method we are going to see is usually called [one-hot encoding](http://en.wikipedia.org/wiki/One-hot).
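
One way to obtain a one-hot encoding in base *R* is `model.matrix()` with the intercept removed. This is a sketch on a hypothetical `colours_df` data frame and not necessarily the function used later in this demo; the `Matrix` package also provides `sparse.model.matrix()`, which builds the same kind of encoding stored as a sparse matrix.

```r
# Hypothetical data frame with a single categorical column
colours_df <- data.frame(
  Colour = factor(c("red", "blue", "green", "blue"),
                  levels = c("red", "blue", "green"))
)

# `- 1` removes the intercept, so every level gets its own 0/1 indicator column
one_hot <- model.matrix(~ Colour - 1, data = colours_df)

colnames(one_hot)  # "Colourred" "Colourblue" "Colourgreen"
print(one_hot)     # one row per observation, exactly one 1 per row
```

Keeping the intercept instead (`~ Colour`) would drop the column of the first level and encode it implicitly as all zeros, which is the usual treatment-contrast coding for linear models.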

The first step is to load the `Arthritis` dataset into memory and wrap it with the `data.table` package (`data.table` is 100% compatible with *R* data frames, but its syntax is a lot more consistent and its performance is very good).

```{r, results='hide'}
data(Arthritis)