Small rewording of function xgb.importance

pommedeterresautee 2015-12-02 15:48:22 +01:00
parent 6ceb3438be
commit edca27fa32
2 changed files with 16 additions and 10 deletions


@@ -25,14 +25,17 @@
 #' Results are returned for both linear and tree models.
 #'
 #' \code{data.table} is returned by the function.
-#' There are 3 columns :
+#' The columns are:
 #' \itemize{
-#' \item \code{Features} name of the features as provided in \code{feature_names} or already present in the model dump.
-#' \item \code{Gain} contribution of each feature to the model. For boosted tree model, each gain of each feature of each tree is taken into account, then average per feature to give a vision of the entire model. Highest percentage means important feature to predict the \code{label} used for the training ;
-#' \item \code{Cover} metric of the number of observation related to this feature (only available for tree models) ;
-#' \item \code{Weight} percentage representing the relative number of times a feature have been taken into trees. \code{Gain} should be prefered to search the most important feature. For boosted linear model, this column has no meaning.
+#' \item \code{Features} names of the features, as provided in \code{feature_names} or already present in the model dump;
+#' \item \code{Gain} contribution of each feature to the model. For boosted tree models, the gain of each feature in each tree is taken into account, then averaged per feature to give a view of the entire model. A higher percentage means a more important feature for predicting the \code{label} used for the training (only available for tree models);
+#' \item \code{Cover} metric of the number of observations related to this feature (only available for tree models);
+#' \item \code{Weight} percentage representing the relative number of times a feature has been used in the trees.
 #' }
 #'
 #' If you don't provide name, index of the features are used.
 #' They are extracted from the boost dump (made on the C++ side), the index starts at 0 (usual in C++) instead of 1 (usual in R).
 #'
 #' Co-occurence count
 #' ------------------
 #'


@@ -31,14 +31,17 @@ This is the function to understand the model trained (and through your model, yo
 Results are returned for both linear and tree models.
 \code{data.table} is returned by the function.
-There are 3 columns :
+The columns are:
 \itemize{
-\item \code{Features} name of the features as provided in \code{feature_names} or already present in the model dump.
-\item \code{Gain} contribution of each feature to the model. For boosted tree model, each gain of each feature of each tree is taken into account, then average per feature to give a vision of the entire model. Highest percentage means important feature to predict the \code{label} used for the training ;
-\item \code{Cover} metric of the number of observation related to this feature (only available for tree models) ;
-\item \code{Weight} percentage representing the relative number of times a feature have been taken into trees. \code{Gain} should be prefered to search the most important feature. For boosted linear model, this column has no meaning.
+\item \code{Features} names of the features, as provided in \code{feature_names} or already present in the model dump;
+\item \code{Gain} contribution of each feature to the model. For boosted tree models, the gain of each feature in each tree is taken into account, then averaged per feature to give a view of the entire model. A higher percentage means a more important feature for predicting the \code{label} used for the training (only available for tree models);
+\item \code{Cover} metric of the number of observations related to this feature (only available for tree models);
+\item \code{Weight} percentage representing the relative number of times a feature has been used in the trees.
 }
 If you don't provide name, index of the features are used.
 They are extracted from the boost dump (made on the C++ side), the index starts at 0 (usual in C++) instead of 1 (usual in R).
 Co-occurence count
 ------------------
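
As context for the reworded documentation (not part of this commit), a minimal R sketch of how xgb.importance is typically called and how the columns described above show up. It assumes the agaricus demo data bundled with the package; the exact column names returned (e.g. Feature/Frequency rather than Features/Weight) may differ depending on the package version.

# Sketch only: train a small model on the bundled agaricus data and
# inspect the importance table documented in this commit.
library(xgboost)

data(agaricus.train, package = "xgboost")

bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max.depth = 2, eta = 1, nrounds = 2,
               objective = "binary:logistic")

# Returns a data.table with one row per feature; for a tree model the
# columns cover the feature name, Gain, Cover and the relative weight.
imp <- xgb.importance(feature_names = colnames(agaricus.train$data),
                      model = bst)
head(imp)

Passing feature_names is optional: without it, the 0-based feature indices from the C++ model dump are reported instead, as noted in the paragraph above.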