From 21a4a32655b9dfe25e28cdfe0f23373d360db8a4 Mon Sep 17 00:00:00 2001
From: El Potaeto
Date: Sun, 8 Mar 2015 21:57:31 +0100
Subject: [PATCH 1/2] Vignette text

---
 R-package/vignettes/discoverYourData.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/R-package/vignettes/discoverYourData.Rmd b/R-package/vignettes/discoverYourData.Rmd
index c9060f012..1b463af38 100644
--- a/R-package/vignettes/discoverYourData.Rmd
+++ b/R-package/vignettes/discoverYourData.Rmd
@@ -250,7 +250,7 @@ According to the plot above, the most important features in this dataset to pred
 
 * the Age ;
 * having received a placebo or not ;
-* the sex is third but already included in the not interesting feature ;
+* the sex is third but already included in the not interesting features group ;
 * then we see our generated features (AgeDiscret). We can see that their contribution is very low.
 
 Do these results make sense?

From 93a019d1742ad9201c007e1c56f755a8f5f0c21a Mon Sep 17 00:00:00 2001
From: El Potaeto
Date: Thu, 12 Mar 2015 23:44:08 +0100
Subject: [PATCH 2/2] code simplification

---
 R-package/vignettes/discoverYourData.Rmd | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/R-package/vignettes/discoverYourData.Rmd b/R-package/vignettes/discoverYourData.Rmd
index 1b463af38..78df67d4e 100644
--- a/R-package/vignettes/discoverYourData.Rmd
+++ b/R-package/vignettes/discoverYourData.Rmd
@@ -153,7 +153,7 @@ head(sparse_matrix)
 Create the output `numeric` vector (not as a sparse `Matrix`):
 
 ```{r}
-output_vector = df[,Y:=0][Improved == "Marked",Y:=1][,Y]
+output_vector = df[,Improved] == "Marked"
 ```
 
 1. set `Y` vector to `0`;
@@ -261,21 +261,21 @@ Do these results make sense?
 Let's check some **Chi2** between each of these features and the label. Higher **Chi2** means better correlation.
 
 ```{r, warning=FALSE, message=FALSE}
-c2 <- chisq.test(df$Age, df$Y)
+c2 <- chisq.test(df$Age, output_vector)
 print(c2)
 ```
 
 Pearson correlation between Age and illness disapearing is **`r round(c2$statistic, 2 )`**.
 
 ```{r, warning=FALSE, message=FALSE}
-c2 <- chisq.test(df$AgeDiscret, df$Y)
+c2 <- chisq.test(df$AgeDiscret, output_vector)
 print(c2)
 ```
 
 Our first simplification of Age gives a Pearson correlation is **`r round(c2$statistic, 2)`**.
 
 ```{r, warning=FALSE, message=FALSE}
-c2 <- chisq.test(df$AgeCat, df$Y)
+c2 <- chisq.test(df$AgeCat, output_vector)
 print(c2)
 ```
 