diff --git a/R-package/vignettes/discoverYourData.Rmd b/R-package/vignettes/discoverYourData.Rmd
index 1b463af38..78df67d4e 100644
--- a/R-package/vignettes/discoverYourData.Rmd
+++ b/R-package/vignettes/discoverYourData.Rmd
@@ -153,7 +153,7 @@ head(sparse_matrix)
 Create the output `numeric` vector (not as a sparse `Matrix`):
 
 ```{r}
-output_vector = df[,Y:=0][Improved == "Marked",Y:=1][,Y]
+output_vector = df[,Improved] == "Marked"
 ```
 
 1. set `Y` vector to `0`;
@@ -261,21 +261,21 @@ Let's check some **Chi2** between each of these features and the label.
 Higher **Chi2** means better correlation.
 
 ```{r, warning=FALSE, message=FALSE}
-c2 <- chisq.test(df$Age, df$Y)
+c2 <- chisq.test(df$Age, output_vector)
 print(c2)
 ```
 
 Pearson correlation between Age and illness disapearing is **`r round(c2$statistic, 2 )`**.
 
 ```{r, warning=FALSE, message=FALSE}
-c2 <- chisq.test(df$AgeDiscret, df$Y)
+c2 <- chisq.test(df$AgeDiscret, output_vector)
 print(c2)
 ```
 
 Our first simplification of Age gives a Pearson correlation is **`r round(c2$statistic, 2)`**.
 
 ```{r, warning=FALSE, message=FALSE}
-c2 <- chisq.test(df$AgeCat, df$Y)
+c2 <- chisq.test(df$AgeCat, output_vector)
 print(c2)
 ```
 
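A minimal sketch of what the changed `output_vector` line computes, assuming the `data.table` package (which the vignette's `df[, ...]` and `:=` syntax already relies on) and a small hypothetical table standing in for the vignette's `df`. It contrasts the old chained `:=` assignment, which adds a `Y` column to `df` by reference, with the new direct comparison, which leaves `df` unchanged. Note that the comparison returns a logical vector rather than a numeric one; coercing it with `as.numeric()` recovers the old 0/1 encoding, and `chisq.test()` yields the same statistic for either form because it converts both arguments to factors.

```r
library(data.table)

# Hypothetical toy table standing in for the vignette's `df`
df <- data.table(
  Age      = c(27, 29, 30, 32, 46, 58, 59, 63),
  Improved = factor(c("None", "Marked", "Some", "Marked",
                      "None", "Marked", "Some", "None"),
                    levels = c("None", "Some", "Marked"))
)

# Old approach: add a 0/1 column `Y` by reference (on a copy here, so `df`
# stays untouched), then extract it
old_vector <- copy(df)[, Y := 0][Improved == "Marked", Y := 1][, Y]

# New approach: compare the column directly; nothing is added to `df`
new_vector <- df[, Improved] == "Marked"

class(old_vector)  # "numeric"
class(new_vector)  # "logical"

# Coercing the logical vector gives the same 0/1 encoding as before
stopifnot(identical(old_vector, as.numeric(new_vector)))

# chisq.test() turns both vectors into factors, so the statistic matches;
# warnings about small expected counts are suppressed for this toy data
s_old <- suppressWarnings(chisq.test(df$Age, old_vector))$statistic
s_new <- suppressWarnings(chisq.test(df$Age, new_vector))$statistic
stopifnot(isTRUE(all.equal(s_old, s_new)))
```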