adding feature contributions to R and gblinear (#2295)

* [gblinear] add features contribution prediction; fix DumpModel bug * [gbtree] minor changes to PredContrib * [R] add feature contribution prediction to R * [R] bump up version; update NEWS * [gblinear] fix the base_margin issue; fixes #1969 * [R] list of matrices as output of multiclass feature contributions * [gblinear] make order of DumpModel coefficients consistent: group index changes the fastest
2017-05-21 06:41:51 -05:00
parent e5e721722e
commit b52db87d5c
10 changed files with 255 additions and 60 deletions
--- a/R-package/man/predict.xgb.Booster.Rd
+++ b/R-package/man/predict.xgb.Booster.Rd
@@ -7,7 +7,7 @@
 \usage{
 \method{predict}{xgb.Booster}(object, newdata, missing = NA,
  outputmargin = FALSE, ntreelimit = NULL, predleaf = FALSE,
-  reshape = FALSE, ...)
+  predcontrib = FALSE, reshape = FALSE, ...)

 \method{predict}{xgb.Booster.handle}(object, ...)
 }
@@ -28,6 +28,8 @@ It will use all the trees by default (\code{NULL} value).}

 \item{predleaf}{whether predict leaf index instead.}

+\item{predcontrib}{whether to return feature contributions to individual predictions instead (see Details).}
+
 \item{reshape}{whether to reshape the vector of predictions to a matrix form when there are several 
 prediction outputs per case. This option has no effect when \code{predleaf = TRUE}.}

@@ -41,6 +43,12 @@ the \code{reshape} value.

 When \code{predleaf = TRUE}, the output is a matrix object with the 
 number of columns corresponding to the number of trees.
+
+When \code{predcontrib = TRUE} and it is not a multiclass setting, the output is a matrix object with
+\code{num_features + 1} columns. The last "+ 1" column in a matrix corresponds to bias.
+For a multiclass case, a list of \code{num_class} elements is returned, where each element is
+such a matrix. The contribution values are on the scale of untransformed margin 
+(e.g., for binary classification would mean that the contributions are log-odds deviations from bias).
 }
 \description{
 Predicted values based on either xgboost model or model handle object.
@@ -49,15 +57,22 @@ Predicted values based on either xgboost model or model handle object.
 Note that \code{ntreelimit} is not necessarily equal to the number of boosting iterations
 and it is not necessarily equal to the number of trees in a model.
 E.g., in a random forest-like model, \code{ntreelimit} would limit the number of trees.
-But for multiclass classification, there are multiple trees per iteration, 
-but \code{ntreelimit} limits the number of boosting iterations.
+But for multiclass classification, while there are multiple trees per iteration, 
+\code{ntreelimit} limits the number of boosting iterations.

 Also note that \code{ntreelimit} would currently do nothing for predictions from gblinear, 
-since gblinear doesn't keep its boosting history. 
+since gblinear doesn't keep its boosting history.

 One possible practical applications of the \code{predleaf} option is to use the model 
 as a generator of new features which capture non-linearity and interactions, 
 e.g., as implemented in \code{\link{xgb.create.features}}.
+
+Setting \code{predcontrib = TRUE} allows to calculate contributions of each feature to
+individual predictions. For "gblinear" booster, feature contributions are simply linear terms
+(feature_beta * feature_value). For "gbtree" booster, feature contribution is calculated 
+as a sum of average contribution of that feature's split nodes across all trees to an 
+individual prediction, following the idea explained in 
+\url{http://blog.datadive.net/interpreting-random-forests/}.
 }
 \examples{
 ## binary classification:
@@ -68,11 +83,32 @@ train <- agaricus.train
 test <- agaricus.test

 bst <- xgboost(data = train$data, label = train$label, max_depth = 2, 
-               eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
+               eta = 0.5, nthread = 2, nrounds = 5, objective = "binary:logistic")
 # use all trees by default
 pred <- predict(bst, test$data)
 # use only the 1st tree
-pred <- predict(bst, test$data, ntreelimit = 1)
+pred1 <- predict(bst, test$data, ntreelimit = 1)
+
+# Predicting tree leafs:
+# the result is an nsamples X ntrees matrix
+pred_leaf <- predict(bst, test$data, predleaf = TRUE)
+str(pred_leaf)
+
+# Predicting feature contributions to predictions:
+# the result is an nsamples X (nfeatures + 1) matrix
+pred_contr <- predict(bst, test$data, predcontrib = TRUE)
+str(pred_contr)
+# verify that contributions' sums are equal to log-odds of predictions (up to foat precision):
+summary(rowSums(pred_contr) - qlogis(pred))
+# for the 1st record, let's inspect its features that had non-zero contribution to prediction:
+contr1 <- pred_contr[1,]
+contr1 <- contr1[-length(contr1)]    # drop BIAS
+contr1 <- contr1[contr1 != 0]        # drop non-contributing features
+contr1 <- contr1[order(abs(contr1))] # order by contribution magnitude
+old_mar <- par("mar")
+par(mar = old_mar + c(0,7,0,0))
+barplot(contr1, horiz = TRUE, las = 2, xlab = "contribution to prediction in log-odds")
+par(mar = old_mar)


 ## multiclass classification in iris dataset: