Add SHAP interaction effects, fix minor bug, and add cox loss (#3043)

* Add interaction effects and cox loss

* Minimize whitespace changes

* Cox loss now no longer needs a pre-sorted dataset.

* Address code review comments

* Remove mem check, rename to pred_interactions, include bias

* Make lint happy

* More lint fixes

* Fix cox loss indexing

* Fix main effects and tests

* Fix lint

* Use half interaction values on the off-diagonals

* Fix lint again
Scott Lundberg
2018-02-07 18:38:01 -08:00
committed by Vadim Khotilovich
parent 077abb35cd
commit d878c36c84
19 changed files with 638 additions and 125 deletions

@@ -65,8 +65,8 @@ Parameters for Tree Booster
- 'exact': Exact greedy algorithm.
- 'approx': Approximate greedy algorithm using sketching and histogram.
- 'hist': Fast histogram optimized approximate greedy algorithm. It uses some performance improvements such as bins caching.
- 'gpu_exact': GPU implementation of exact algorithm.
- 'gpu_hist': GPU implementation of hist algorithm.
- 'gpu_exact': GPU implementation of exact algorithm.
- 'gpu_hist': GPU implementation of hist algorithm.
* sketch_eps, [default=0.03]
- This is only used for approximate greedy algorithm.
- This roughly translates into ```O(1 / sketch_eps)``` number of bins.
@@ -170,6 +170,8 @@ Specify the learning task and the corresponding learning objective. The objectiv
they can only be used when the entire training session uses the same dataset
- "count:poisson" --poisson regression for count data, output mean of poisson distribution
- max_delta_step is set to 0.7 by default in poisson regression (used to safeguard optimization)
- "survival:cox" --Cox regression for right censored survival time data (negative values are considered right censored).
Note that predictions are returned on the hazard ratio scale (i.e., as HR = exp(marginal_prediction) in the proportional hazard function h(t) = h0(t) * HR).
- "multi:softmax" --set XGBoost to do multiclass classification using the softmax objective, you also need to set num_class(number of classes)
- "multi:softprob" --same as softmax, but output a vector of ndata * nclass, which can be further reshaped to ndata, nclass matrix. The result contains predicted probability of each data point belonging to each class.
- "rank:pairwise" --set XGBoost to do ranking task by minimizing the pairwise loss
@@ -197,6 +199,7 @@ Specify the learning task and the corresponding learning objective. The objectiv
training repeatedly
- "poisson-nloglik": negative log-likelihood for Poisson regression
- "gamma-nloglik": negative log-likelihood for gamma regression
- "cox-nloglik": negative partial log-likelihood for Cox proportional hazards regression
- "gamma-deviance": residual deviance for gamma regression
- "tweedie-nloglik": negative log-likelihood for Tweedie regression (at a specified value of the tweedie_variance_power parameter)
* seed [default=0]
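
The "survival:cox" label convention and the "cox-nloglik" metric described above can be illustrated with a small NumPy sketch. This is a textbook Breslow-style negative partial log-likelihood, not the code added by this commit: the function name `cox_neg_partial_loglik` is hypothetical, the result is an unnormalized sum over events, and the library's own metric may scale or handle ties differently.

```python
import numpy as np


def cox_neg_partial_loglik(margin, label):
    """Negative Cox partial log-likelihood (illustrative sketch only).

    margin: raw model outputs on the log-hazard scale.
    label:  survival times; a negative value marks a right-censored
            observation whose censoring time is abs(label).
    """
    margin = np.asarray(margin, dtype=float)
    label = np.asarray(label, dtype=float)

    time = np.abs(label)      # observed or censoring time
    event = label > 0         # True where the event was actually observed
    hazard_ratio = np.exp(margin)  # HR = exp(marginal_prediction)

    total = 0.0
    for i in np.flatnonzero(event):
        # Risk set: every sample still under observation at time[i].
        at_risk = time >= time[i]
        total -= margin[i] - np.log(hazard_ratio[at_risk].sum())
    return total


# Three samples with equal margins; the third is censored at t = 3.
# The events at t = 1 and t = 2 have risk sets of size 3 and 2,
# so the result equals log(3) + log(2).
nll = cox_neg_partial_loglik([0.0, 0.0, 0.0], [1.0, 2.0, -3.0])
print(nll)
```

The loop is quadratic in the number of samples and is written for readability. Because the risk set is selected by comparing times directly, the input does not need to be sorted.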