Add prediction of feature contributions (#2003)

* Add prediction of feature contributions

  This implements the idea described at http://blog.datadive.net/interpreting-random-forests/, which tries to give insight into how a prediction is composed of its feature contributions and a bias.

* Support multi-class models
* Calculate learning_rate per-tree instead of using the one from the first tree
* Do not rely on node.base_weight * learning_rate having the same value as the node mean value (aka leaf value, if it were a leaf); instead calculate them (lazily) on-the-fly
* Add simple test for contributions feature
* Check against param.num_nodes instead of checking for non-zero length
* Loop over all roots instead of only the first
2017-05-14 07:58:10 +02:00
parent e62be19c70
commit 6bd1869026
10 changed files with 205 additions and 5 deletions
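The blog post linked in the commit message attributes a tree's output to features by walking the decision path. The bias is the root's mean value, and each split credits its feature with the change in mean value between a node and the child the row goes to. The C++ sketch below shows that idea for a single tree under simplifying assumptions. `Node`, `Tree`, `Split`, `Leaf`, `NewMeanCache`, `NodeMean`, and `AddTreeContributions` are hypothetical names, not XGBoost's `RegTree` API. Leaf values are assumed to already include the learning rate. A node's mean is the cover-weighted (hessian-weighted) average of its children's means, computed lazily and memoized. Missing values are not handled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical, simplified tree node (not XGBoost's RegTree::Node).
struct Node {
  int left = -1;           // index of left child, -1 for a leaf
  int right = -1;          // index of right child, -1 for a leaf
  int split_feature = 0;   // feature tested at an internal node
  double split_value = 0;  // go left when x[split_feature] < split_value
  double leaf_value = 0;   // leaf output (assumed to include the learning rate)
  double cover = 0;        // sum of hessians reaching this node
};

using Tree = std::vector<Node>;  // node 0 is the root

Node Split(int feature, double value, int left, int right, double cover) {
  Node n;
  n.split_feature = feature;
  n.split_value = value;
  n.left = left;
  n.right = right;
  n.cover = cover;
  return n;
}

Node Leaf(double value, double cover) {
  Node n;
  n.leaf_value = value;
  n.cover = cover;
  return n;
}

// One slot per node; NaN marks "mean not computed yet".
std::vector<double> NewMeanCache(const Tree& tree) {
  return std::vector<double>(tree.size(), std::numeric_limits<double>::quiet_NaN());
}

// Mean value of a node: the leaf value for a leaf, otherwise the cover-weighted
// average of the children's means. Memoized, so each node is computed once.
double NodeMean(const Tree& tree, int nid, std::vector<double>* cache) {
  double& slot = (*cache)[nid];  // the cache is never resized, so this stays valid
  if (!std::isnan(slot)) return slot;
  const Node& n = tree[nid];
  if (n.left < 0) {
    slot = n.leaf_value;
  } else {
    assert(n.cover > 0);
    slot = (NodeMean(tree, n.left, cache) * tree[n.left].cover +
            NodeMean(tree, n.right, cache) * tree[n.right].cover) / n.cover;
  }
  return slot;
}

// Adds this tree's contributions for row x into contribs (x.size() + 1 slots,
// the last one being the bias). Assumes every feature of x is present.
void AddTreeContributions(const Tree& tree, const std::vector<double>& x,
                          std::vector<double>* mean_cache,
                          std::vector<double>* contribs) {
  assert(contribs->size() == x.size() + 1);
  const std::size_t bias = x.size();
  int nid = 0;
  (*contribs)[bias] += NodeMean(tree, nid, mean_cache);
  while (tree[nid].left >= 0) {
    const Node& n = tree[nid];
    const int next = x[n.split_feature] < n.split_value ? n.left : n.right;
    // Credit the change in mean value along this edge to the split feature.
    (*contribs)[n.split_feature] +=
        NodeMean(tree, next, mean_cache) - NodeMean(tree, nid, mean_cache);
    nid = next;
  }
}
```

Because the per-edge differences telescope, the bias plus all feature contributions equals the leaf value the row reaches. In this sketch, summing over trees (separately per output group for a multi-class model) and adding the base score to the bias slot would give the margin. The cache is meant to be created once per tree and reused across rows, so each node mean is computed only once.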
--- a/include/xgboost/learner.h
+++ b/include/xgboost/learner.h
@@ -103,12 +103,14 @@ class Learner : public rabit::Serializable {
   * \param ntree_limit limit number of trees used for boosted tree
   *   predictor, when it equals 0, this means we are using all the trees
   * \param pred_leaf whether to only predict the leaf index of each tree in a boosted tree predictor
+   * \param pred_contribs whether to only predict the feature contributions of all trees
   */
  virtual void Predict(DMatrix* data,
                       bool output_margin,
                       std::vector<bst_float> *out_preds,
                       unsigned ntree_limit = 0,
-                       bool pred_leaf = false) const = 0;
+                       bool pred_leaf = false,
+                       bool pred_contribs = false) const = 0;
  /*!
   * \brief Set additional attribute to the Booster.
   *  The property will be saved along the booster.
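Because `pred_contribs` follows two defaulted parameters, enabling it means spelling out `ntree_limit` and `pred_leaf` as well, e.g. `Predict(dmat, false, &out_preds, 0, false, true)`. The result still arrives in the flat `out_preds` vector. The sketch below assumes the row-major layout that XGBoost's Python documentation describes for `pred_contribs`: for each row and each output group, `num_feature + 1` values with the bias in the last slot. Under that layout, one (row, group) block sums to the margin. `ContribIndex` and `GroupMargin` are hypothetical helpers, `float` stands in for `bst_float`, and the layout should be checked against the predictor code before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: position of (row, group, feature) in the flat buffer,
// assuming num_group blocks of (num_feature + 1) values per row, where index
// num_feature within a block holds the bias.
std::size_t ContribIndex(std::size_t row, std::size_t group, std::size_t feature,
                         std::size_t num_group, std::size_t num_feature) {
  assert(group < num_group && feature <= num_feature);
  const std::size_t ncol = num_feature + 1;
  return (row * num_group + group) * ncol + feature;
}

// Hypothetical helper: sums one (row, group) block, bias included. With an
// additive decomposition this reproduces that row's margin for that group.
float GroupMargin(const std::vector<float>& contribs, std::size_t row,
                  std::size_t group, std::size_t num_group, std::size_t num_feature) {
  float sum = 0.0f;
  for (std::size_t f = 0; f <= num_feature; ++f) {
    sum += contribs[ContribIndex(row, group, f, num_group, num_feature)];
  }
  return sum;
}
```

For a binary or regression model `num_group` is 1, so each row is simply `num_feature + 1` consecutive values.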