Add prediction of feature contributions (#2003)

* Add prediction of feature contributions

This implements the idea described at http://blog.datadive.net/interpreting-random-forests/
which gives insight into how a prediction is composed of its feature contributions
and a bias.

* Support multi-class models

* Calculate learning_rate per-tree instead of using the one from the first tree

* Do not rely on node.base_weight * learning_rate having the same value as the node mean value (aka leaf value, if it were a leaf); instead calculate them (lazily) on-the-fly

* Add simple test for contributions feature

* Check against param.num_nodes instead of checking for non-zero length

* Loop over all roots instead of only the first
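The decomposition from the linked blog post can be sketched with a toy regression tree (the node means below are invented purely for illustration): the bias is the mean value at the root, and each feature's contribution is the change in node mean along the sample's decision path, so bias plus all contributions recovers the leaf value.

```python
# Hypothetical tiny regression tree (values invented for illustration):
# root (mean 10.0) --split on feature 0--> child (mean 8.0)
#                  --split on feature 1--> leaf  (mean 6.5)
bias = 10.0                  # mean prediction at the root
contrib_f0 = 8.0 - 10.0      # change in node mean caused by the split on feature 0
contrib_f1 = 6.5 - 8.0       # change in node mean caused by the split on feature 1

prediction = bias + contrib_f0 + contrib_f1
assert prediction == 6.5     # contributions + bias recover the leaf value
```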
Maurus Cuelenaere
2017-05-14 07:58:10 +02:00
committed by Vadim Khotilovich
parent e62be19c70
commit 6bd1869026
10 changed files with 205 additions and 5 deletions

@@ -911,7 +911,8 @@ class Booster(object):
         self._validate_features(data)
         return self.eval_set([(data, name)], iteration)
-    def predict(self, data, output_margin=False, ntree_limit=0, pred_leaf=False):
+    def predict(self, data, output_margin=False, ntree_limit=0, pred_leaf=False,
+                pred_contribs=False):
         """
         Predict with data.
@@ -937,6 +938,12 @@ class Booster(object):
             Note that the leaf index of a tree is unique per tree, so you may find leaf 1
             in both tree 1 and tree 0.
+        pred_contribs : bool
+            When this option is on, the output will be a matrix of (nsample, nfeats+1)
+            with each record indicating the feature contributions of all trees. The sum of
+            all feature contributions is equal to the prediction. Note that the bias is added
+            as the final column, on top of the regular features.
         Returns
         -------
         prediction : numpy array
@@ -946,6 +953,8 @@ class Booster(object):
             option_mask |= 0x01
         if pred_leaf:
             option_mask |= 0x02
+        if pred_contribs:
+            option_mask |= 0x04
         self._validate_features(data)
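The option mask above combines independent bit flags before the call is handed to the native library. A minimal standalone sketch of that combination (the constant names `PRED_MARGIN` etc. are invented here for readability; the C++ side only sees the raw integer):

```python
# Hypothetical names for the bit flags used by predict()'s option_mask.
PRED_MARGIN = 0x01    # output_margin=True
PRED_LEAF = 0x02      # pred_leaf=True
PRED_CONTRIBS = 0x04  # pred_contribs=True (added by this commit)

def build_option_mask(output_margin=False, pred_leaf=False, pred_contribs=False):
    """Combine the prediction options into a single integer bitmask."""
    mask = 0
    if output_margin:
        mask |= PRED_MARGIN
    if pred_leaf:
        mask |= PRED_LEAF
    if pred_contribs:
        mask |= PRED_CONTRIBS
    return mask

assert build_option_mask(pred_contribs=True) == 0x04
assert build_option_mask(output_margin=True, pred_contribs=True) == 0x05
```

Because the options occupy distinct bits, any combination of them can be requested in a single call without ambiguity.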