documentation wording

El Potaeto 2014-12-30 12:32:21 +01:00
parent 3694772bde
commit c754fd4ad0


@@ -9,15 +9,25 @@ xgb.importance(feature_names = NULL, filename_dump = NULL)
\arguments{ \arguments{
\item{feature_names}{names of each feature as a character vector. Can be extracted from a sparse matrix (see example). If model dump already contains feature names, this argument should be \code{NULL}.} \item{feature_names}{names of each feature as a character vector. Can be extracted from a sparse matrix (see example). If model dump already contains feature names, this argument should be \code{NULL}.}
\item{filename_dump}{the path to the text file storing the model.} \item{filename_dump}{the path to the text file storing the model. Model dump must include the gain per feature and per tree (\code{with.stats = T} in function \code{xgb.dump}).}
} }
\description{ \description{
Read a xgboost model in text file format. Read a xgboost model text dump.
Can be tree or linear model (text dump of linear model are only supported in dev version of Xgboost for now). Can be tree or linear model (text dump of linear model are only supported in dev version of \code{Xgboost} for now).
Return a \code{data.table} of the features used in the model with their average gain (and their weight for boosted tree models).
} }
\details{ \details{
Return a data.table of the features with their weight. This is the function to understand the model trained (and through your model, your data).
Results are returned for both linear and tree models.
The function returns a \code{data.table} with 3 columns:
\itemize{
\item \code{Features} names of the features as provided in \code{feature_names} or already present in the model dump.
\item \code{Gain} contribution of each feature to the model. For boosted tree models, the gain of each feature in each tree is taken into account, then averaged per feature to give a view of the entire model. The highest percentage indicates the most important feature for predicting the \code{label} used for training.
\item \code{Weight} percentage representing the relative number of times a feature has been used in the trees. \code{Gain} should be preferred when searching for the most important feature. For boosted linear models, this column has no meaning.
}
} }
\examples{ \examples{
data(agaricus.train, package='xgboost') data(agaricus.train, package='xgboost')