Add prediction of feature contributions (#2003)

* Add prediction of feature contributions

This implements the idea described at http://blog.datadive.net/interpreting-random-forests/
which tries to give insight into how a prediction is composed of its feature contributions
and a bias.

* Support multi-class models

* Calculate learning_rate per-tree instead of using the one from the first tree

* Do not rely on node.base_weight * learning_rate having the same value as the node mean value (aka leaf value, if it were a leaf); instead calculate them (lazily) on-the-fly

* Add simple test for contributions feature

* Check against param.num_nodes instead of checking for non-zero length

* Loop over all roots instead of only the first
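
For illustration, the decomposition described above (a prediction equals a bias plus one contribution per feature) can be sketched for a single tree as follows. This is a minimal sketch and not the code from this commit. The `Node` layout, the helper names `NodeMean`, `Contributions` and `ExampleTree`, and the use of a per-node `cover` weight to average child values are all assumptions made for the example. Learning rate, multiple trees, multiple roots and multi-class output are left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified node layout (not xgboost's RegTree).
struct Node {
  int left;          // index of the left child, -1 for a leaf
  int right;         // index of the right child, -1 for a leaf
  int feature;       // feature index tested by the split
  float threshold;   // go left when x[feature] < threshold
  float leaf_value;  // output value if this node is a leaf
  float cover;       // weight of the training rows that reach this node
};

// Mean output of the subtree rooted at nid: the leaf value for a leaf,
// otherwise the cover-weighted average of the two child means (assumed rule).
float NodeMean(const std::vector<Node>& tree, int nid) {
  const Node& n = tree[nid];
  if (n.left < 0) return n.leaf_value;
  return (NodeMean(tree, n.left) * tree[n.left].cover +
          NodeMean(tree, n.right) * tree[n.right].cover) / n.cover;
}

// Returns x.size() + 1 values: one contribution per feature, then the bias.
std::vector<float> Contributions(const std::vector<Node>& tree,
                                 const std::vector<float>& x) {
  assert(!tree.empty());
  std::vector<float> out(x.size() + 1, 0.0f);
  int nid = 0;
  float mean = NodeMean(tree, nid);
  out.back() = mean;  // bias: the mean value at the root
  while (tree[nid].left >= 0) {
    const Node& n = tree[nid];
    int next = x[n.feature] < n.threshold ? n.left : n.right;
    float next_mean = NodeMean(tree, next);
    // Credit the change in mean value to the feature used by this split.
    out[n.feature] += next_mean - mean;
    nid = next;
    mean = next_mean;
  }
  return out;
}

// Small example tree: the root splits on feature 0, its right child on feature 1.
std::vector<Node> ExampleTree() {
  return {
      {1, 2, 0, 0.5f, 0.0f, 4.0f},    // node 0: root
      {-1, -1, 0, 0.0f, 1.0f, 2.0f},  // node 1: leaf
      {3, 4, 1, 0.5f, 0.0f, 2.0f},    // node 2: inner node
      {-1, -1, 0, 0.0f, 2.0f, 1.0f},  // node 3: leaf
      {-1, -1, 0, 0.0f, 4.0f, 1.0f},  // node 4: leaf
  };
}
```

Calling `Contributions(ExampleTree(), {1.0f, 1.0f})` walks root, node 2, node 4. The returned contributions and the bias sum to the value of the leaf that was reached, which is the property the feature is meant to expose.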
Maurus Cuelenaere
2017-05-14 07:58:10 +02:00
committed by Vadim Khotilovich
parent e62be19c70
commit 6bd1869026
10 changed files with 205 additions and 5 deletions


@@ -103,12 +103,14 @@ class Learner : public rabit::Serializable {
    * \param ntree_limit limit number of trees used for boosted tree
    *   predictor, when it equals 0, this means we are using all the trees
    * \param pred_leaf whether to only predict the leaf index of each tree in a boosted tree predictor
+   * \param pred_contribs whether to only predict the feature contributions of all trees
    */
   virtual void Predict(DMatrix* data,
                        bool output_margin,
                        std::vector<bst_float> *out_preds,
                        unsigned ntree_limit = 0,
-                       bool pred_leaf = false) const = 0;
+                       bool pred_leaf = false,
+                       bool pred_contribs = false) const = 0;
   /*!
    * \brief Set additional attribute to the Booster.
    *  The property will be saved along the booster.