[R-package] GPL2 dependency reduction and some fixes (#1401)
* [R] do not remove zero coefficients from gblinear dump
* [R] switch from stringr to stringi
* fix #1399
* [R] separate ggplot backend, add base r graphics, cleanup, more plots, tests
* add missing include in amalgamation - fixes building R package in linux
* add forgotten file
* [R] fix DESCRIPTION
* [R] fix travis check issue and some cleanup
Committed by Tong He
Parent: f6423056c0
Commit: d5c143367d
@@ -1,41 +1,82 @@
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/xgb.plot.importance.R
\name{xgb.plot.importance}
% Please edit documentation in R/xgb.ggplot.R, R/xgb.plot.importance.R
\name{xgb.ggplot.importance}
\alias{xgb.ggplot.importance}
\alias{xgb.plot.importance}
\title{Plot feature importance bar graph}
\title{Plot feature importance as a bar graph}
\usage{
xgb.plot.importance(importance_matrix = NULL, n_clusters = c(1:10), ...)
xgb.ggplot.importance(importance_matrix = NULL, top_n = NULL,
measure = NULL, rel_to_first = FALSE, n_clusters = c(1:10), ...)

xgb.plot.importance(importance_matrix = NULL, top_n = NULL,
measure = NULL, rel_to_first = FALSE, left_margin = 10, cex = NULL,
plot = TRUE, ...)
}
\arguments{
\item{importance_matrix}{a \code{data.table} returned by the \code{xgb.importance} function.}
\item{importance_matrix}{a \code{data.table} returned by \code{\link{xgb.importance}}.}

\item{n_clusters}{a \code{numeric} vector containing the min and the max range of the possible number of clusters of bars.}
\item{top_n}{maximum number of top features to include in the plot.}

\item{...}{currently not used}
\item{measure}{the name of the importance measure to plot.
When \code{NULL}, 'Gain' is used for trees and 'Weight' is used for gblinear.}

\item{rel_to_first}{whether importance values should be represented as relative to the highest ranked feature.
See Details.}

\item{n_clusters}{(ggplot only) a \code{numeric} vector containing the min and the max range
of the possible number of clusters of bars.}

\item{...}{other parameters passed to \code{barplot} (except horiz, border, cex.names, names.arg, and las).}

\item{left_margin}{(base R barplot) adjusts the left margin size to fit feature names.
When it is NULL, the existing \code{par('mar')} is used.}

\item{cex}{(base R barplot) passed as the \code{cex.names} parameter to \code{barplot}.}

\item{plot}{(base R barplot) whether a barplot should be produced.
If FALSE, only a data.table is returned.}
}
\value{
A \code{ggplot2} bar graph representing each feature by a horizontal bar. The longer the bar, the more important the feature. Features are sorted by importance and clustered by importance; each cluster is represented by the color of its bars.
The \code{xgb.plot.importance} function creates a \code{barplot} (when \code{plot=TRUE})
and silently returns a processed data.table with \code{top_n} features sorted by importance.

The \code{xgb.ggplot.importance} function returns a ggplot graph that can be customized afterwards.
E.g., to change the title of the graph, add \code{+ ggtitle("A GRAPH NAME")} to the result.
}
\description{
Reads a data.table containing feature importance details and plots it (for both GLM and tree models).
Represents previously calculated feature importance as a bar graph.
\code{xgb.plot.importance} uses base R graphics, while \code{xgb.ggplot.importance} uses the ggplot backend.
}
\details{
The purpose of this function is to easily represent the importance of each feature of a model.
The function returns a ggplot graph, so each of its characteristics can be overridden to customize it.
In particular, you may want to override the title of the graph. To do so, add \code{+ ggtitle("A GRAPH NAME")} to the value returned by this function.
The graph represents each feature as a horizontal bar of length proportional to the importance of a feature.
Features are shown ranked in decreasing order of importance.
It works for importances from both \code{gblinear} and \code{gbtree} models.

When \code{rel_to_first = FALSE}, the values are plotted as they appear in \code{importance_matrix}.
For a gbtree model, this means the values are normalized to sum to 1
("what is the feature's importance contribution relative to the whole model?").
For linear models, \code{rel_to_first = FALSE} shows the actual values of the coefficients.
Setting \code{rel_to_first = TRUE} shows the picture from the perspective of
"what is the feature's importance contribution relative to the most important feature?"

The ggplot-backend method also performs 1-D clustering of the importance values,
with bar colors corresponding to different clusters that have somewhat similar importance values.
}
\examples{
data(agaricus.train, package='xgboost')
data(agaricus.train)

# Both datasets are lists with two items, a sparse matrix and labels
# (labels = outcome column which will be learned).
# Each column of the sparse Matrix is a feature in one-hot encoding format.

bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max_depth = 2,
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max_depth = 3,
eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")

importance_matrix <- xgb.importance(colnames(agaricus.train$data), model = bst)
xgb.plot.importance(importance_matrix)

xgb.plot.importance(importance_matrix, rel_to_first = TRUE, xlab = "Relative importance")
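
# Sketch of what rel_to_first = TRUE does, using hypothetical toy numbers
# (not taken from bst above). Assumption: each importance value is divided
# by the largest one, so the top-ranked feature is plotted at 1.
toy_gain <- c(0.60, 0.25, 0.15)     # made-up Gain values that sum to 1
toy_rel <- toy_gain / max(toy_gain) # scale relative to the top feature
print(toy_rel)                      # 1.0000000 0.4166667 0.2500000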

(gg <- xgb.ggplot.importance(importance_matrix, measure = "Frequency", rel_to_first = TRUE))
gg + ggplot2::ylab("Frequency")
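
# Sketch of the idea behind the bar color groups, using hypothetical toy
# importance values. Assumption: the ggplot backend may use a dedicated
# 1-D clustering routine; stats::kmeans is only a stand-in here.
toy_imp <- c(0.50, 0.45, 0.04, 0.01)  # two well-separated groups of values
set.seed(1)                           # kmeans uses random starts
km <- stats::kmeans(toy_imp, centers = 2, nstart = 10)
km$cluster  # cluster labels are arbitrary; values 1-2 and 3-4 group together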

}
\seealso{
\code{\link[graphics]{barplot}}.
}