Merge pull request #193 from pommedeterresautee/master

Vignette text (very biiiiig change)
2015-03-13 14:50:49 -07:00
parent e52de85e59 93a019d174
commit 90ade3bb84
1 changed files with 5 additions and 5 deletions
--- a/R-package/vignettes/discoverYourData.Rmd
+++ b/R-package/vignettes/discoverYourData.Rmd
@@ -153,7 +153,7 @@ head(sparse_matrix)
 Create the output `numeric` vector (not as a sparse `Matrix`):

 ```{r}
-output_vector = df[,Y:=0][Improved == "Marked",Y:=1][,Y]
+output_vector = df[,Improved] == "Marked"
 ```

 1. set `Y` vector to `0`; 
@@ -250,7 +250,7 @@ According to the plot above, the most important features in this dataset to pred

 * the Age ;
 * having received a placebo or not ;
-* the sex is third but already included in the not interesting feature ; 
+* the sex is third, but it is already included in the group of uninteresting features ;
 * then we see our generated features (AgeDiscret). We can see that their contribution is very low.

 Do these results make sense?
@@ -261,21 +261,21 @@ Let's check some **Chi2** between each of these features and the label.
 Higher **Chi2** means better correlation.

 ```{r, warning=FALSE, message=FALSE}
-c2 <- chisq.test(df$Age, df$Y)
+c2 <- chisq.test(df$Age, output_vector)
 print(c2)
 ```

 Pearson correlation between Age and illness disappearing is **`r round(c2$statistic, 2 )`**.

 ```{r, warning=FALSE, message=FALSE}
-c2 <- chisq.test(df$AgeDiscret, df$Y)
+c2 <- chisq.test(df$AgeDiscret, output_vector)
 print(c2)
 ```

 Our first simplification of Age gives a Pearson correlation of **`r round(c2$statistic, 2)`**.

 ```{r, warning=FALSE, message=FALSE}
-c2 <- chisq.test(df$AgeCat, df$Y)
+c2 <- chisq.test(df$AgeCat, output_vector)
 print(c2)
 ```
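
The diff replaces the 0/1 column `Y` with a logical `output_vector` in each `chisq.test` call. A minimal sketch of why this should not change the reported statistic, assuming base R only and hypothetical toy data (`toy` and `y_numeric` are illustrative names, not part of the vignette or its dataset):

```r
# Hypothetical toy data standing in for the vignette's `df`.
set.seed(1)
toy <- data.frame(
  AgeCat   = sample(c("Young", "Old"), 200, replace = TRUE),
  Improved = sample(c("None", "Some", "Marked"), 200, replace = TRUE)
)

# New style from the diff: a logical label vector (TRUE where Improved is "Marked").
output_vector <- toy$Improved == "Marked"

# Old style: the same label expressed as an explicit 0/1 numeric vector.
y_numeric <- as.numeric(output_vector)

# chisq.test() cross-tabulates its two arguments, so both codings
# produce the same 2x2 contingency table and the same statistic.
c2_logical <- chisq.test(toy$AgeCat, output_vector)
c2_numeric <- chisq.test(toy$AgeCat, y_numeric)

print(c2_logical$statistic)
print(c2_numeric$statistic)
```

Because only the labels of the two categories differ (`FALSE`/`TRUE` versus `0`/`1`), the two calls should agree. If a numeric label is needed elsewhere, `as.numeric(output_vector)` recovers the 0/1 coding.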