resolving not-CRAN issues
@@ -4,4 +4,5 @@ boost_from_prediction Boosting from existing prediction
 predict_first_ntree Predicting using first n trees
 generalized_linear_model Generalized Linear Model
 cross_validation Cross validation
-create_sparse_matrix
+create_sparse_matrix Create Sparse Matrix
+predict_leaf_indices Predicting the corresponding leaves
@@ -1,7 +1,7 @@
 require(xgboost)
 require(Matrix)
 require(data.table)
-require(vcd) #Available on CRAN. Used for its dataset with categorical values.
+if (!require(vcd)) install.packages('vcd') #Available on CRAN. Used for its dataset with categorical values.
 
 # According to its documentation, Xgboost works only on numbers.
 # Sometimes the dataset we have to work on has categorical data.
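The guarded require() is the actual not-CRAN fix here: the demo installs vcd on the fly instead of assuming it is present. The comments then state the demo's premise, that xgboost consumes only numeric input, so the categorical columns of vcd's Arthritis data must be one-hot encoded into a sparse matrix first. Below is a minimal sketch of that encoding, assuming the Arthritis dataset and the xgb.DMatrix/xgb.train interface; the hyperparameters are illustrative and the demo's exact steps may differ.

if (!require(vcd)) install.packages('vcd')  # the guard added in this commit
require(xgboost)
require(Matrix)

# Sketch only, not the demo itself: one-hot encode the categorical
# columns (Treatment, Sex, ...) of the Arthritis dataset.
data(Arthritis, package = 'vcd')
df <- Arthritis
df$ID <- NULL  # patient identifier, carries no signal

# sparse.model.matrix() expands each factor into binary indicator
# columns; the "-1" drops the intercept column.
sparse_matrix <- sparse.model.matrix(Improved ~ . - 1, data = df)

# Binary label: 1 when the patient's condition improved markedly.
output_vector <- as.numeric(df$Improved == "Marked")

# Train a small boosted-tree classifier on the sparse matrix.
dtrain <- xgb.DMatrix(data = sparse_matrix, label = output_vector)
bst <- xgb.train(params = list(objective = "binary:logistic",
                               max_depth = 4, eta = 1),
                 data = dtrain, nrounds = 10)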
@@ -86,4 +86,4 @@ print(chisq.test(df$AgeCat, df$Y))
 
 # As you can see, in general destroying information by simplifying it won't improve your model. Chi2 just demonstrates that. But in more complex cases, creating a new feature based on an existing one which makes the link with the outcome more obvious may help the algorithm and improve the model. The case studied here is not complex enough to show that. Check the Kaggle forum for some challenging datasets.
 # However, it's almost always worse when you add some arbitrary rules.
-# Moreover, you can notice that even if we have added some not-useful new features highly correlated with other features, the boosting tree algorithm has been able to choose the best one, which in this case is the Age. A linear model may not be that strong in this scenario.
+# Moreover, you can notice that even if we have added some not-useful new features highly correlated with other features, the boosting tree algorithm has been able to choose the best one, which in this case is the Age. A linear model may not be that strong in this scenario.
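The AgeCat and Y columns referenced above come from the demo's feature-engineering section. A minimal sketch of the comparison the comments draw, continuing from the previous snippet; the 30-year cutoff is an assumption made for illustration.

# Derive a deliberately coarse AgeCat bucket from the raw Age column,
# plus the binary outcome Y used throughout the demo.
df$Y <- as.numeric(df$Improved == "Marked")
df$AgeCat <- as.factor(ifelse(df$Age > 30, "Old", "Young"))

# chisq.test() coerces both arguments to factors; a larger X-squared is
# a rough sign of a stronger association with the outcome. Bucketing Age
# throws information away, so AgeCat generally scores no better than Age.
print(chisq.test(df$Age, df$Y))
print(chisq.test(df$AgeCat, df$Y))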