[R] xgb.plot.tree fixes (#1939)

* [R] a few fixes and improvements to xgb.plot.tree
* [R] deprecate n_first_tree replace with trees; fix types in xgb.model.dt.tree
2017-01-06 13:09:51 -06:00
parent d23ea5ca7d
commit d7406e07f3
7 changed files with 225 additions and 116 deletions
--- a/R-package/man/xgb.plot.tree.Rd
+++ b/R-package/man/xgb.plot.tree.Rd
@@ -4,24 +4,39 @@
 \alias{xgb.plot.tree}
 \title{Plot a boosted tree model}
 \usage{
-xgb.plot.tree(feature_names = NULL, model = NULL, n_first_tree = NULL,
-  plot_width = NULL, plot_height = NULL, ...)
+xgb.plot.tree(feature_names = NULL, model = NULL, trees = NULL,
+  plot_width = NULL, plot_height = NULL, render = TRUE,
+  show_node_id = FALSE, ...)
 }
 \arguments{
-\item{feature_names}{names of each feature as a \code{character} vector. Can be extracted from a sparse matrix (see example). If model dump already contains feature names, this argument should be \code{NULL}.}
+\item{feature_names}{names of each feature as a \code{character} vector.}

-\item{model}{generated by the \code{xgb.train} function. Avoid the creation of a dump file.}
+\item{model}{produced by the \code{xgb.train} function.}

-\item{n_first_tree}{limit the plot to the n first trees. If \code{NULL}, all trees of the model are plotted. Performance can be low for huge models.}
+\item{trees}{an integer vector of tree indices that should be visualized.
+If set to \code{NULL}, all trees of the model are included.
+IMPORTANT: tree indices in an xgboost model are zero-based
+(e.g., use \code{trees = 0:2} for the first 3 trees in a model).}

 \item{plot_width}{the width of the diagram in pixels.}

 \item{plot_height}{the height of the diagram in pixels.}

+\item{render}{a logical flag for whether the graph should be rendered (see Value).}
+
+\item{show_node_id}{a logical flag for whether to include node IDs in the graph.}
+
 \item{...}{currently not used.}
 }
 \value{
-A \code{DiagrammeR} of the model.
+When \code{render = TRUE}:
+returns a rendered graph object which is an \code{htmlwidget} of class \code{grViz}.
+Similar to ggplot objects, it must be printed explicitly to be displayed when not running from the command line.
+
+When \code{render = FALSE}:
+silently returns a graph object which is of DiagrammeR's class \code{dgr_graph}.
+This could be useful if one wants to modify some of the graph attributes
+before rendering the graph with \code{\link[DiagrammeR]{render_graph}}.
 }
 \description{
 Read a tree model text dump and plot the model.
@@ -30,20 +45,33 @@ Read a tree model text dump and plot the model.
 The content of each node is organised that way:

 \itemize{
- \item \code{feature} value;
- \item \code{cover}: the sum of second order gradient of training data classified to the leaf, if it is square loss, this simply corresponds to the number of instances in that branch. Deeper in the tree a node is, lower this metric will be;
- \item \code{gain}: metric the importance of the node in the model.
+ \item Feature name.
+ \item \code{Cover}: The sum of second-order gradients of the training data classified to the leaf.
+       If it is square loss, this simply corresponds to the number of instances seen by a split
+       or collected by a leaf during training.
+       The deeper in the tree a node is, the lower this metric will be.
+ \item \code{Gain} (for split nodes): the information gain metric of a split
+       (corresponds to the importance of the node in the model).
+ \item \code{Value} (for leaves): the margin value that the leaf may contribute to prediction.
 } 
+The tree root nodes also indicate the Tree index (0-based).

-The function uses \href{http://www.graphviz.org/}{GraphViz} library for that purpose.
+The "Yes" branches are marked by the "< split_value" label.
+The branches that are also used for missing values are marked in bold
+(as in "carrying extra capacity").
+
+This function uses \href{http://www.graphviz.org/}{GraphViz} as a backend of DiagrammeR.
 }
 \examples{
 data(agaricus.train, package='xgboost')

-bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max_depth = 2, 
+bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max_depth = 3,
               eta = 1, nthread = 2, nrounds = 2,objective = "binary:logistic")
-
+# plot all the trees
 xgb.plot.tree(feature_names = colnames(agaricus.train$data), model = bst)
+# plot only the first tree and include the node ID:
+xgb.plot.tree(feature_names = colnames(agaricus.train$data), model = bst,
+              trees = 0, show_node_id = TRUE)

 }
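
The following is a minimal sketch, not part of the commit, of how a caller might move from the deprecated `n_first_tree` to the zero-based `trees` argument and use `render = FALSE` as described in the Value section above. The helper `first_n_trees` is hypothetical (it is not an xgboost function), and the plotting part assumes the `xgboost()` and `xgb.plot.tree()` signatures shown in this diff; later xgboost releases may differ, so the sketch checks for the arguments before running.

```r
# Hypothetical helper (not part of xgboost): zero-based indices of the
# first n trees, i.e. the trees that n_first_tree = n used to select.
first_n_trees <- function(n) seq_len(n) - 1L

# Same indices as the `trees = 0:2` example in the documentation.
first_n_trees(3)

# Run the plotting part only if the installed xgboost exposes the arguments
# introduced by this commit (an assumption; newer releases may have changed them).
has_api <- requireNamespace("xgboost", quietly = TRUE) &&
  requireNamespace("DiagrammeR", quietly = TRUE) &&
  all(c("trees", "render") %in% names(formals(xgboost::xgb.plot.tree)))

if (has_api) {
  data(agaricus.train, package = "xgboost")
  bst <- xgboost::xgboost(data = agaricus.train$data,
                          label = agaricus.train$label,
                          max_depth = 3, eta = 1, nthread = 2, nrounds = 2,
                          objective = "binary:logistic")

  # render = FALSE returns the DiagrammeR graph object instead of the widget.
  gr <- xgboost::xgb.plot.tree(feature_names = colnames(agaricus.train$data),
                               model = bst,
                               trees = first_n_trees(1),
                               render = FALSE)

  # Graph attributes could be modified here before rendering.
  DiagrammeR::render_graph(gr)
}
```

Keeping the index conversion in one small function makes the off-by-one explicit: R vectors are one-based, while the tree indices passed to `trees` are zero-based.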