add new function to read model and use it in the plot function

El Potaeto
2015-01-07 17:47:50 +01:00
parent e380e4facf
commit d532f04394
5 changed files with 173 additions and 60 deletions


@@ -0,0 +1,54 @@
% Generated by roxygen2 (4.1.0): do not edit by hand
% Please edit documentation in R/xgb.model.dt.tree.R
\name{xgb.model.dt.tree}
\alias{xgb.model.dt.tree}
\title{Convert tree model dump to data.table}
\usage{
xgb.model.dt.tree(feature_names = NULL, filename_dump = NULL,
n_first_tree = NULL)
}
\arguments{
\item{feature_names}{names of each feature as a character vector. Can be extracted from a sparse matrix (see example). If the model dump already contains feature names, this argument should be \code{NULL}.}
\item{filename_dump}{the path to the text file storing the model. The model dump must include the gain per feature and per tree (parameter \code{with.stats = TRUE} in function \code{xgb.dump}).}
\item{n_first_tree}{limit the conversion to the first n trees. If \code{NULL}, all trees of the model are converted. Conversion can be slow for very large models.}
}
\value{
A \code{data.table} of the features used in the model with their gain, cover and a few other attributes (see Details).
}
\description{
Read a tree model text dump and return a data.table.
}
\details{
General function to convert a text dump of a tree model to a \code{data.table}. The purpose is to help the user explore the model and get a better understanding of it.
The content of the \code{data.table} is organised as follows:
\itemize{
\item \code{ID}: unique identifier of a node;
\item \code{Feature}: feature used in the tree to perform a split. When Leaf is indicated, it is the end of a branch;
\item \code{Split}: value of the chosen feature at which the split is performed;
\item \code{Yes}: ID of the next node in the branch when the split condition is met;
\item \code{No}: ID of the next node in the branch when the split condition is not met;
\item \code{Missing}: ID of the next node in the branch for observations where the feature used for the split is not provided;
\item \code{Quality}: gain related to the split in this specific node;
\item \code{Cover}: metric measuring the number of observations affected by the split;
\item \code{Tree}: ID of the tree. It is included in the main ID.
}
}
\examples{
data(agaricus.train, package='xgboost')
# The dataset is a list with two items: a sparse matrix and labels (labels = outcome column which will be learned).
# Each column of the sparse matrix is a feature in one-hot encoding format.
train <- agaricus.train
bst <- xgboost(data = train$data, label = train$label, max.depth = 2,
eta = 1, nround = 2, objective = "binary:logistic")
xgb.dump(bst, 'xgb.model.dump', with.stats = TRUE)
# agaricus.train$data@Dimnames[[2]] holds the column names of the sparse matrix.
xgb.model.dt.tree(agaricus.train$data@Dimnames[[2]], 'xgb.model.dump')
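# A possible follow-up, sketched under assumptions rather than taken from the
# package: the returned data.table is assumed to carry the columns listed in
# the details section (Feature, Quality), with "Leaf" marking terminal nodes.
# The names dt and gain_by_feature are illustrative only.
library(data.table)
dt <- xgb.model.dt.tree(agaricus.train$data@Dimnames[[2]], 'xgb.model.dump')
# Sum the gain of every split node per feature (as.numeric guards against the
# column having been read as text), then list the features by decreasing gain.
gain_by_feature <- dt[Feature != "Leaf", .(Gain = sum(as.numeric(Quality))), by = Feature]
gain_by_feature[order(-Gain)]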
}


@@ -17,14 +17,13 @@ xgb.plot.tree(feature_names = NULL, filename_dump = NULL,
\item{style}{a \code{character} vector storing a css style to customize the appearance of nodes. Look at the \href{https://github.com/knsv/mermaid/wiki}{Mermaid wiki} for more information.}
}
\value{
A \code{data.table} of the features used in the model with their average gain (and their weight for boosted tree model) in the model.
A \code{DiagrammeR} of the model.
}
\description{
Read a xgboost model text dump.
Read a tree model text dump.
Plotting only works for boosted tree model (not linear model).
}
\details{
Plotting only works for boosted tree model (not linear model).
The content of each node is organised as follows:
\itemize{
@@ -34,7 +33,7 @@ The content of each node is organised that way:
}
Each branch finishes with a leaf. For each leaf, only the \code{cover} is indicated.
It uses Mermaid JS library for that purpose.
It uses \href{https://github.com/knsv/mermaid/}{Mermaid} library for that purpose.
}
\examples{
data(agaricus.train, package='xgboost')