Small rewording function xgb.importance

pommedeterresautee 2015-12-02 15:48:22 +01:00
parent 6ceb3438be
commit edca27fa32
2 changed files with 16 additions and 10 deletions


@@ -25,14 +25,17 @@
 #' Results are returned for both linear and tree models.
 #'
 #' \code{data.table} is returned by the function.
-#' There are 3 columns :
+#' The columns are :
 #' \itemize{
-#' \item \code{Features} name of the features as provided in \code{feature_names} or already present in the model dump.
-#' \item \code{Gain} contribution of each feature to the model. For boosted tree model, each gain of each feature of each tree is taken into account, then average per feature to give a vision of the entire model. Highest percentage means important feature to predict the \code{label} used for the training ;
-#' \item \code{Cover} metric of the number of observation related to this feature (only available for tree models) ;
-#' \item \code{Weight} percentage representing the relative number of times a feature have been taken into trees. \code{Gain} should be prefered to search the most important feature. For boosted linear model, this column has no meaning.
+#' \item \code{Features} name of the features as provided in \code{feature_names} or already present in the model dump;
+#' \item \code{Gain} contribution of each feature to the model. For boosted tree model, each gain of each feature of each tree is taken into account, then average per feature to give a vision of the entire model. Highest percentage means important feature to predict the \code{label} used for the training (only available for tree models);
+#' \item \code{Cover} metric of the number of observation related to this feature (only available for tree models);
+#' \item \code{Weight} percentage representing the relative number of times a feature have been taken into trees.
 #' }
 #'
+#' If you don't provide name, index of the features are used.
+#' They are extracted from the boost dump (made on the C++ side), the index starts at 0 (usual in C++) instead of 1 (usual in R).
+#'
 #' Co-occurence count
 #' ------------------
 #'
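
The paragraph added in this hunk says that, when no names are supplied, features are reported by their 0-based index from the dump. A minimal sketch of mapping such indices back to column names in base R follows. The feature names, the index values and the helper `map_features` are all hypothetical, and the indices are assumed to be integers already (convert with `as.integer()` first if they arrive as strings).

```r
# Hypothetical feature names, in the column order of the training matrix.
feature_names <- c("age", "income", "score")

# Hypothetical 0-based indices, standing in for what the Features column
# would hold when no names were supplied.
zero_based <- c(2L, 0L, 1L)

# Hypothetical helper: shift the 0-based indices by one to get valid
# R positions, then look up the corresponding names.
map_features <- function(idx0, names) {
  stopifnot(all(idx0 >= 0), all(idx0 < length(names)))
  names[idx0 + 1L]
}

map_features(zero_based, feature_names)
```

Passing `feature_names` when the importance table is built avoids this step altogether, since the names then appear directly in the `Features` column.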


@@ -31,14 +31,17 @@ This is the function to understand the model trained (and through your model, yo
 Results are returned for both linear and tree models.
 \code{data.table} is returned by the function.
-There are 3 columns :
+The columns are :
 \itemize{
-\item \code{Features} name of the features as provided in \code{feature_names} or already present in the model dump.
-\item \code{Gain} contribution of each feature to the model. For boosted tree model, each gain of each feature of each tree is taken into account, then average per feature to give a vision of the entire model. Highest percentage means important feature to predict the \code{label} used for the training ;
-\item \code{Cover} metric of the number of observation related to this feature (only available for tree models) ;
-\item \code{Weight} percentage representing the relative number of times a feature have been taken into trees. \code{Gain} should be prefered to search the most important feature. For boosted linear model, this column has no meaning.
+\item \code{Features} name of the features as provided in \code{feature_names} or already present in the model dump;
+\item \code{Gain} contribution of each feature to the model. For boosted tree model, each gain of each feature of each tree is taken into account, then average per feature to give a vision of the entire model. Highest percentage means important feature to predict the \code{label} used for the training (only available for tree models);
+\item \code{Cover} metric of the number of observation related to this feature (only available for tree models);
+\item \code{Weight} percentage representing the relative number of times a feature have been taken into trees.
 }
+If you don't provide name, index of the features are used.
+They are extracted from the boost dump (made on the C++ side), the index starts at 0 (usual in C++) instead of 1 (usual in R).
 Co-occurence count
 ------------------