Add SHAP interaction effects, fix minor bug, and add cox loss (#3043)

* Add interaction effects and cox loss
* Minimize whitespace changes
* Cox loss now no longer needs a pre-sorted dataset.
* Address code review comments
* Remove mem check, rename to pred_interactions, include bias
* Make lint happy
* More lint fixes
* Fix cox loss indexing
* Fix main effects and tests
* Fix lint
* Use half interaction values on the off-diagonals
* Fix lint again
2018-02-07 18:38:01 -08:00
parent 077abb35cd
commit d878c36c84
19 changed files with 638 additions and 125 deletions
--- a/doc/parameter.md
+++ b/doc/parameter.md
@@ -65,8 +65,8 @@ Parameters for Tree Booster
    - 'exact': Exact greedy algorithm.
    - 'approx': Approximate greedy algorithm using sketching and histogram.
    - 'hist': Fast histogram optimized approximate greedy algorithm. It uses some performance improvements such as bins caching.
-	- 'gpu_exact': GPU implementation of exact algorithm. 
-	- 'gpu_hist': GPU implementation of hist algorithm. 
+	- 'gpu_exact': GPU implementation of exact algorithm.
+	- 'gpu_hist': GPU implementation of hist algorithm.
 * sketch_eps, [default=0.03]
  - This is only used for approximate greedy algorithm.
  - This roughly translated into ```O(1 / sketch_eps)``` number of bins.
@@ -170,6 +170,8 @@ Specify the learning task and the corresponding learning objective. The objectiv
    they can only be used when the entire training session uses the same dataset
  - "count:poisson" --poisson regression for count data, output mean of poisson distribution
    - max_delta_step is set to 0.7 by default in poisson regression (used to safeguard optimization)
+  - "survival:cox" --Cox regression for right-censored survival time data (negative values are considered right-censored).
+    Note that predictions are returned on the hazard ratio scale (i.e., as HR = exp(marginal_prediction) in the proportional hazards function h(t) = h0(t) * HR).
  - "multi:softmax" --set XGBoost to do multiclass classification using the softmax objective, you also need to set num_class(number of classes)
  - "multi:softprob" --same as softmax, but output a vector of ndata * nclass, which can be further reshaped to ndata, nclass matrix. The result contains predicted probability of each data point belonging to each class.
  - "rank:pairwise" --set XGBoost to do ranking task by minimizing the pairwise loss
@@ -197,6 +199,7 @@ Specify the learning task and the corresponding learning objective. The objectiv
 training repeatedly
  - "poisson-nloglik": negative log-likelihood for Poisson regression
  - "gamma-nloglik": negative log-likelihood for gamma regression
+  - "cox-nloglik": negative partial log-likelihood for Cox proportional hazards regression
  - "gamma-deviance": residual deviance for gamma regression
  - "tweedie-nloglik": negative log-likelihood for Tweedie regression (at a specified value of the tweedie_variance_power parameter)
 * seed [default=0]
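The `survival:cox` objective and `cox-nloglik` metric documented in this diff can be illustrated with a small pure-Python sketch. It assumes the label convention described above (a negative label `-t` means the subject was right-censored at time `t`), maps raw margins to the hazard ratio scale via `HR = exp(margin)`, and computes the standard Cox negative partial log-likelihood using the Breslow approximation for tied event times, averaged over the number of observed events. The helper names `hazard_ratios` and `cox_nloglik` are hypothetical and are not part of XGBoost's API; this is an illustration of the formula, not necessarily how XGBoost evaluates `cox-nloglik` internally.

```python
import math
from typing import List, Sequence


def hazard_ratios(margins: Sequence[float]) -> List[float]:
    """Map raw margin scores to the hazard ratio scale: HR = exp(margin)."""
    return [math.exp(m) for m in margins]


def cox_nloglik(labels: Sequence[float], hr: Sequence[float]) -> float:
    """Negative Cox partial log-likelihood, averaged over observed events.

    Hypothetical helper (not an XGBoost function).
    labels: survival times; a negative value -t marks a subject that was
            right-censored at time t.
    hr:     predictions on the hazard ratio scale.
    Ties use the Breslow approximation: at event time t the risk set is
    every subject whose |label| >= t. O(n^2) for clarity, not speed.
    """
    n = len(labels)
    total = 0.0
    events = 0
    for i in range(n):
        if labels[i] <= 0:
            # Censored (or zero-time) subjects are not events; they only
            # contribute to the risk sets of earlier event times.
            continue
        t = labels[i]
        risk = sum(hr[j] for j in range(n) if abs(labels[j]) >= t)
        total -= math.log(hr[i]) - math.log(risk)
        events += 1
    if events == 0:
        raise ValueError("no uncensored events in labels")
    return total / events
```

For example, with labels `[1, -2, 3]` and all-zero margins (so every HR is 1), the event at time 1 has a risk set of three subjects and contributes `log(3)`, the event at time 3 has a risk set of one and contributes 0, and the average over the two events is `log(3) / 2`.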