Merge pull request #193 from pommedeterresautee/master
Vignette text (very biiiiig change)
commit 90ade3bb84
@@ -153,7 +153,7 @@ head(sparse_matrix)
 Create the output `numeric` vector (not as a sparse `Matrix`):
 
 ```{r}
-output_vector = df[,Y:=0][Improved == "Marked",Y:=1][,Y]
+output_vector = df[,Improved] == "Marked"
 ```
 
 1. set `Y` vector to `0`;
@@ -250,7 +250,7 @@ According to the plot above, the most important features in this dataset to pred
 
 * the Age ;
 * having received a placebo or not ;
-* the sex is third but already included in the not interesting feature ;
+* the sex is third but already included in the not interesting features group ;
 * then we see our generated features (AgeDiscret). We can see that their contribution is very low.
 
 Do these results make sense?
@@ -261,21 +261,21 @@ Let's check some **Chi2** between each of these features and the label.
 Higher **Chi2** means better correlation.
 
 ```{r, warning=FALSE, message=FALSE}
-c2 <- chisq.test(df$Age, df$Y)
+c2 <- chisq.test(df$Age, output_vector)
 print(c2)
 ```
 
 Pearson correlation between Age and illness disapearing is **`r round(c2$statistic, 2 )`**.
 
 ```{r, warning=FALSE, message=FALSE}
-c2 <- chisq.test(df$AgeDiscret, df$Y)
+c2 <- chisq.test(df$AgeDiscret, output_vector)
 print(c2)
 ```
 
 Our first simplification of Age gives a Pearson correlation is **`r round(c2$statistic, 2)`**.
 
 ```{r, warning=FALSE, message=FALSE}
-c2 <- chisq.test(df$AgeCat, df$Y)
+c2 <- chisq.test(df$AgeCat, output_vector)
 print(c2)
 ```
 
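Note on the change: the `chisq.test` calls can switch from `df$Y` to `output_vector` because the two vectors encode the same labels, and the new form no longer adds a `Y` column to `df`. The sketch below illustrates this under stated assumptions: `df` is taken to be a `data.table` (as the `:=` syntax in the removed line suggests), the small table is made up and is not the vignette's data, and `old_vector` is a hypothetical name used only for the comparison.

```r
library(data.table)

# Made-up stand-in for the vignette's `df` (assumption: not the real data).
df <- data.table(
  Improved = c("None", "Marked", "Some", "Marked", "None", "Marked"),
  Age      = c(27, 29, 30, 32, 46, 58)
)

# Removed form: adds a 0/1 column `Y` by reference, then extracts it.
# A copy is used here so that `df` itself stays untouched.
old_vector <- copy(df)[, Y := 0][Improved == "Marked", Y := 1][, Y]

# Added form: a logical vector, with no column added to `df`.
output_vector <- df[, Improved] == "Marked"

# Both vectors mark the same rows.
stopifnot(identical(old_vector == 1, output_vector))

# chisq.test() converts both arguments to factors, so the statistic is the same.
c2_old <- suppressWarnings(chisq.test(df$Age, old_vector))
c2_new <- suppressWarnings(chisq.test(df$Age, output_vector))
stopifnot(isTRUE(all.equal(unname(c2_old$statistic), unname(c2_new$statistic))))
```

Since `df$Y` no longer exists after this change, any remaining reference to it elsewhere in the vignette would need the same substitution.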