xgboost/demo/data/veterans_lung_cancer.csv
Avinash Barnwal dcf439932a
Add Accelerated Failure Time loss for survival analysis task (#4763)
* [WIP] Add lower and upper bounds on the label for survival analysis

* Update test MetaInfo.SaveLoadBinary to account for extra two fields

* Don't clear qids_ for version 2 of MetaInfo

* Add SetInfo() and GetInfo() method for lower and upper bounds

* changes to aft

* Add parameter class for AFT; use enums to represent distribution and event type

* Add AFT metric

* changes to neg grad to grad

* changes to binomial loss

* changes to overflow

* changes to eps

* changes to code refactoring

* changes to code refactoring

* changes to code refactoring

* Re-factor survival analysis

* Remove aft namespace

* Move function bodies out of AFTNormal and AFTLogistic, to reduce clutter

* Move function bodies out of AFTLoss, to reduce clutter

* Use smart pointer to store AFTDistribution and AFTLoss

* Rename AFTNoiseDistribution enum to AFTDistributionType for clarity

The enum class was not a distribution itself but a distribution type

* Add AFTDistribution::Create() method for convenience

* changes to extreme distribution

* changes to extreme distribution

* changes to extreme

* changes to extreme distribution

* changes to left censored

* deleted cout

* changes to x,mu and sd and code refactoring

* changes to print

* changes to hessian formula in censored and uncensored

* changes to variable names and pow

* changes to Logistic Pdf

* changes to parameter

* Expose lower and upper bound labels to R package

* Use example weights; normalize log likelihood metric

* changes to CHECK

* changes to logistic hessian to standard formula

* changes to logistic formula

* Comply with coding style guideline

* Revert back Rabit submodule

* Revert dmlc-core submodule

* Comply with coding style guideline (clang-tidy)

* Fix an error in AFTLoss::Gradient()

* Add missing files to amalgamation

* Address @RAMitchell's comment: minimize future change in MetaInfo interface

* Fix lint

* Fix compilation error on 32-bit target, when size_t == bst_uint

* Allocate sufficient memory to hold extra label info

* Use OpenMP to speed up

* Fix compilation on Windows

* Address reviewer's feedback

* Add unit tests for probability distributions

* Make Metric subclass of Configurable

* Address reviewer's feedback: Configure() AFT metric

* Add a dummy test for AFT metric configuration

* Complete AFT configuration test; remove debugging print

* Rename AFT parameters

* Clarify test comment

* Add a dummy test for AFT loss for uncensored case

* Fix a bug in AFT loss for uncensored labels

* Complete unit test for AFT loss metric

* Simplify unit tests for AFT metric

* Add unit test to verify aggregate output from AFT metric

* Use EXPECT_* instead of ASSERT_*, so that we run all unit tests

* Use aft_loss_param when serializing AFTObj

This is to be consistent with AFT metric

* Add unit tests for AFT Objective

* Fix OpenMP bug; clarify semantics for shared variables used in OpenMP loops

* Add comments

* Remove AFT prefix from probability distribution; put probability distribution in separate source file

* Add comments

* Define kPI and kEulerMascheroni in probability_distribution.h

* Add probability_distribution.cc to amalgamation

* Remove unnecessary diff

* Address reviewer's feedback: define variables where they're used

* Eliminate all INFs and NANs from AFT loss and gradient

* Add demo

* Add tutorial

* Fix lint

* Use 'survival:aft' to be consistent with 'survival:cox'

* Move sample data to demo/data

* Add visual demo with 1D toy data

* Add Python tests

Co-authored-by: Philip Cho <chohyu01@cs.washington.edu>
2020-03-25 13:52:51 -07:00
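
The commit message above refers to lower and upper label bounds, a choice of noise distribution, and an AFT log-likelihood metric. As a rough illustration of how those pieces fit together, here is a minimal sketch of a per-example AFT negative log likelihood with a normal noise distribution. It is not the code from this commit: the function names, the `sigma` argument, and the `eps` clamp (standing in for the "Eliminate all INFs and NANs" item) are assumptions made for this example.

```python
import math


def normal_pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def aft_nloglik(y_lower, y_upper, pred, sigma=1.0, eps=1e-12):
    """Negative log likelihood of one label interval [y_lower, y_upper].

    pred is the predicted log survival time; sigma scales the noise.
    All names here are hypothetical, chosen for this sketch only.
    """
    if y_lower == y_upper:
        # Uncensored label: use the density of the observed time.
        z = (math.log(y_lower) - pred) / sigma
        density = normal_pdf(z) / (sigma * y_lower)
        return -math.log(max(density, eps))
    # Censored label: use the probability mass inside the interval.
    if math.isinf(y_upper):
        cdf_upper = 1.0
    else:
        cdf_upper = normal_cdf((math.log(y_upper) - pred) / sigma)
    if y_lower <= 0.0:
        cdf_lower = 0.0
    else:
        cdf_lower = normal_cdf((math.log(y_lower) - pred) / sigma)
    return -math.log(max(cdf_upper - cdf_lower, eps))


uncensored = aft_nloglik(1.0, 1.0, pred=0.0)
right_censored = aft_nloglik(1.0, math.inf, pred=0.0)
```

An upper bound of `inf` marks a right-censored record, so only the lower bound contributes to the interval probability in that case.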

5.7 KiB

Survival_label_lower_bound,Survival_label_upper_bound,Age_in_years,Karnofsky_score,Months_from_Diagnosis,Celltype=adeno,Celltype=large,Celltype=smallcell,Celltype=squamous,Prior_therapy=no,Prior_therapy=yes,Treatment=standard,Treatment=test
72.0,72.0,69.0,60.0,7.0,0,0,0,1,1,0,1,0
411.0,411.0,64.0,70.0,5.0,0,0,0,1,0,1,1,0
228.0,228.0,38.0,60.0,3.0,0,0,0,1,1,0,1,0
126.0,126.0,63.0,60.0,9.0,0,0,0,1,0,1,1,0
118.0,118.0,65.0,70.0,11.0,0,0,0,1,0,1,1,0
10.0,10.0,49.0,20.0,5.0,0,0,0,1,1,0,1,0
82.0,82.0,69.0,40.0,10.0,0,0,0,1,0,1,1,0
110.0,110.0,68.0,80.0,29.0,0,0,0,1,1,0,1,0
314.0,314.0,43.0,50.0,18.0,0,0,0,1,1,0,1,0
100.0,inf,70.0,70.0,6.0,0,0,0,1,1,0,1,0
42.0,42.0,81.0,60.0,4.0,0,0,0,1,1,0,1,0
8.0,8.0,63.0,40.0,58.0,0,0,0,1,0,1,1,0
144.0,144.0,63.0,30.0,4.0,0,0,0,1,1,0,1,0
25.0,inf,52.0,80.0,9.0,0,0,0,1,0,1,1,0
11.0,11.0,48.0,70.0,11.0,0,0,0,1,0,1,1,0
30.0,30.0,61.0,60.0,3.0,0,0,1,0,1,0,1,0
384.0,384.0,42.0,60.0,9.0,0,0,1,0,1,0,1,0
4.0,4.0,35.0,40.0,2.0,0,0,1,0,1,0,1,0
54.0,54.0,63.0,80.0,4.0,0,0,1,0,0,1,1,0
13.0,13.0,56.0,60.0,4.0,0,0,1,0,1,0,1,0
123.0,inf,55.0,40.0,3.0,0,0,1,0,1,0,1,0
97.0,inf,67.0,60.0,5.0,0,0,1,0,1,0,1,0
153.0,153.0,63.0,60.0,14.0,0,0,1,0,0,1,1,0
59.0,59.0,65.0,30.0,2.0,0,0,1,0,1,0,1,0
117.0,117.0,46.0,80.0,3.0,0,0,1,0,1,0,1,0
16.0,16.0,53.0,30.0,4.0,0,0,1,0,0,1,1,0
151.0,151.0,69.0,50.0,12.0,0,0,1,0,1,0,1,0
22.0,22.0,68.0,60.0,4.0,0,0,1,0,1,0,1,0
56.0,56.0,43.0,80.0,12.0,0,0,1,0,0,1,1,0
21.0,21.0,55.0,40.0,2.0,0,0,1,0,0,1,1,0
18.0,18.0,42.0,20.0,15.0,0,0,1,0,1,0,1,0
139.0,139.0,64.0,80.0,2.0,0,0,1,0,1,0,1,0
20.0,20.0,65.0,30.0,5.0,0,0,1,0,1,0,1,0
31.0,31.0,65.0,75.0,3.0,0,0,1,0,1,0,1,0
52.0,52.0,55.0,70.0,2.0,0,0,1,0,1,0,1,0
287.0,287.0,66.0,60.0,25.0,0,0,1,0,0,1,1,0
18.0,18.0,60.0,30.0,4.0,0,0,1,0,1,0,1,0
51.0,51.0,67.0,60.0,1.0,0,0,1,0,1,0,1,0
122.0,122.0,53.0,80.0,28.0,0,0,1,0,1,0,1,0
27.0,27.0,62.0,60.0,8.0,0,0,1,0,1,0,1,0
54.0,54.0,67.0,70.0,1.0,0,0,1,0,1,0,1,0
7.0,7.0,72.0,50.0,7.0,0,0,1,0,1,0,1,0
63.0,63.0,48.0,50.0,11.0,0,0,1,0,1,0,1,0
392.0,392.0,68.0,40.0,4.0,0,0,1,0,1,0,1,0
10.0,10.0,67.0,40.0,23.0,0,0,1,0,0,1,1,0
8.0,8.0,61.0,20.0,19.0,1,0,0,0,0,1,1,0
92.0,92.0,60.0,70.0,10.0,1,0,0,0,1,0,1,0
35.0,35.0,62.0,40.0,6.0,1,0,0,0,1,0,1,0
117.0,117.0,38.0,80.0,2.0,1,0,0,0,1,0,1,0
132.0,132.0,50.0,80.0,5.0,1,0,0,0,1,0,1,0
12.0,12.0,63.0,50.0,4.0,1,0,0,0,0,1,1,0
162.0,162.0,64.0,80.0,5.0,1,0,0,0,1,0,1,0
3.0,3.0,43.0,30.0,3.0,1,0,0,0,1,0,1,0
95.0,95.0,34.0,80.0,4.0,1,0,0,0,1,0,1,0
177.0,177.0,66.0,50.0,16.0,0,1,0,0,0,1,1,0
162.0,162.0,62.0,80.0,5.0,0,1,0,0,1,0,1,0
216.0,216.0,52.0,50.0,15.0,0,1,0,0,1,0,1,0
553.0,553.0,47.0,70.0,2.0,0,1,0,0,1,0,1,0
278.0,278.0,63.0,60.0,12.0,0,1,0,0,1,0,1,0
12.0,12.0,68.0,40.0,12.0,0,1,0,0,0,1,1,0
260.0,260.0,45.0,80.0,5.0,0,1,0,0,1,0,1,0
200.0,200.0,41.0,80.0,12.0,0,1,0,0,0,1,1,0
156.0,156.0,66.0,70.0,2.0,0,1,0,0,1,0,1,0
182.0,inf,62.0,90.0,2.0,0,1,0,0,1,0,1,0
143.0,143.0,60.0,90.0,8.0,0,1,0,0,1,0,1,0
105.0,105.0,66.0,80.0,11.0,0,1,0,0,1,0,1,0
103.0,103.0,38.0,80.0,5.0,0,1,0,0,1,0,1,0
250.0,250.0,53.0,70.0,8.0,0,1,0,0,0,1,1,0
100.0,100.0,37.0,60.0,13.0,0,1,0,0,0,1,1,0
999.0,999.0,54.0,90.0,12.0,0,0,0,1,0,1,0,1
112.0,112.0,60.0,80.0,6.0,0,0,0,1,1,0,0,1
87.0,inf,48.0,80.0,3.0,0,0,0,1,1,0,0,1
231.0,inf,52.0,50.0,8.0,0,0,0,1,0,1,0,1
242.0,242.0,70.0,50.0,1.0,0,0,0,1,1,0,0,1
991.0,991.0,50.0,70.0,7.0,0,0,0,1,0,1,0,1
111.0,111.0,62.0,70.0,3.0,0,0,0,1,1,0,0,1
1.0,1.0,65.0,20.0,21.0,0,0,0,1,0,1,0,1
587.0,587.0,58.0,60.0,3.0,0,0,0,1,1,0,0,1
389.0,389.0,62.0,90.0,2.0,0,0,0,1,1,0,0,1
33.0,33.0,64.0,30.0,6.0,0,0,0,1,1,0,0,1
25.0,25.0,63.0,20.0,36.0,0,0,0,1,1,0,0,1
357.0,357.0,58.0,70.0,13.0,0,0,0,1,1,0,0,1
467.0,467.0,64.0,90.0,2.0,0,0,0,1,1,0,0,1
201.0,201.0,52.0,80.0,28.0,0,0,0,1,0,1,0,1
1.0,1.0,35.0,50.0,7.0,0,0,0,1,1,0,0,1
30.0,30.0,63.0,70.0,11.0,0,0,0,1,1,0,0,1
44.0,44.0,70.0,60.0,13.0,0,0,0,1,0,1,0,1
283.0,283.0,51.0,90.0,2.0,0,0,0,1,1,0,0,1
15.0,15.0,40.0,50.0,13.0,0,0,0,1,0,1,0,1
25.0,25.0,69.0,30.0,2.0,0,0,1,0,1,0,0,1
103.0,inf,36.0,70.0,22.0,0,0,1,0,0,1,0,1
21.0,21.0,71.0,20.0,4.0,0,0,1,0,1,0,0,1
13.0,13.0,62.0,30.0,2.0,0,0,1,0,1,0,0,1
87.0,87.0,60.0,60.0,2.0,0,0,1,0,1,0,0,1
2.0,2.0,44.0,40.0,36.0,0,0,1,0,0,1,0,1
20.0,20.0,54.0,30.0,9.0,0,0,1,0,0,1,0,1
7.0,7.0,66.0,20.0,11.0,0,0,1,0,1,0,0,1
24.0,24.0,49.0,60.0,8.0,0,0,1,0,1,0,0,1
99.0,99.0,72.0,70.0,3.0,0,0,1,0,1,0,0,1
8.0,8.0,68.0,80.0,2.0,0,0,1,0,1,0,0,1
99.0,99.0,62.0,85.0,4.0,0,0,1,0,1,0,0,1
61.0,61.0,71.0,70.0,2.0,0,0,1,0,1,0,0,1
25.0,25.0,70.0,70.0,2.0,0,0,1,0,1,0,0,1
95.0,95.0,61.0,70.0,1.0,0,0,1,0,1,0,0,1
80.0,80.0,71.0,50.0,17.0,0,0,1,0,1,0,0,1
51.0,51.0,59.0,30.0,87.0,0,0,1,0,0,1,0,1
29.0,29.0,67.0,40.0,8.0,0,0,1,0,1,0,0,1
24.0,24.0,60.0,40.0,2.0,1,0,0,0,1,0,0,1
18.0,18.0,69.0,40.0,5.0,1,0,0,0,0,1,0,1
83.0,inf,57.0,99.0,3.0,1,0,0,0,1,0,0,1
31.0,31.0,39.0,80.0,3.0,1,0,0,0,1,0,0,1
51.0,51.0,62.0,60.0,5.0,1,0,0,0,1,0,0,1
90.0,90.0,50.0,60.0,22.0,1,0,0,0,0,1,0,1
52.0,52.0,43.0,60.0,3.0,1,0,0,0,1,0,0,1
73.0,73.0,70.0,60.0,3.0,1,0,0,0,1,0,0,1
8.0,8.0,66.0,50.0,5.0,1,0,0,0,1,0,0,1
36.0,36.0,61.0,70.0,8.0,1,0,0,0,1,0,0,1
48.0,48.0,81.0,10.0,4.0,1,0,0,0,1,0,0,1
7.0,7.0,58.0,40.0,4.0,1,0,0,0,1,0,0,1
140.0,140.0,63.0,70.0,3.0,1,0,0,0,1,0,0,1
186.0,186.0,60.0,90.0,3.0,1,0,0,0,1,0,0,1
84.0,84.0,62.0,80.0,4.0,1,0,0,0,0,1,0,1
19.0,19.0,42.0,50.0,10.0,1,0,0,0,1,0,0,1
45.0,45.0,69.0,40.0,3.0,1,0,0,0,1,0,0,1
80.0,80.0,63.0,40.0,4.0,1,0,0,0,1,0,0,1
52.0,52.0,45.0,60.0,4.0,0,1,0,0,1,0,0,1
164.0,164.0,68.0,70.0,15.0,0,1,0,0,0,1,0,1
19.0,19.0,39.0,30.0,4.0,0,1,0,0,0,1,0,1
53.0,53.0,66.0,60.0,12.0,0,1,0,0,1,0,0,1
15.0,15.0,63.0,30.0,5.0,0,1,0,0,1,0,0,1
43.0,43.0,49.0,60.0,11.0,0,1,0,0,0,1,0,1
340.0,340.0,64.0,80.0,10.0,0,1,0,0,0,1,0,1
133.0,133.0,65.0,75.0,1.0,0,1,0,0,1,0,0,1
111.0,111.0,64.0,60.0,5.0,0,1,0,0,1,0,0,1
231.0,231.0,67.0,70.0,18.0,0,1,0,0,0,1,0,1
378.0,378.0,65.0,80.0,4.0,0,1,0,0,1,0,0,1
49.0,49.0,37.0,30.0,3.0,0,1,0,0,1,0,0,1
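
The sketch below shows one way the file could be read using only the Python standard library, assuming it is comma-separated with the thirteen columns named in the header. The embedded sample reproduces the first two records and the first record whose upper bound is `inf`; the helper names `load_survival_rows` and `is_right_censored` are hypothetical, introduced only for this example.

```python
import csv
import io
import math

# Header plus three records taken from the data above.
SAMPLE = """\
Survival_label_lower_bound,Survival_label_upper_bound,Age_in_years,Karnofsky_score,Months_from_Diagnosis,Celltype=adeno,Celltype=large,Celltype=smallcell,Celltype=squamous,Prior_therapy=no,Prior_therapy=yes,Treatment=standard,Treatment=test
72.0,72.0,69.0,60.0,7.0,0,0,0,1,1,0,1,0
411.0,411.0,64.0,70.0,5.0,0,0,0,1,0,1,1,0
100.0,inf,70.0,70.0,6.0,0,0,0,1,1,0,1,0
"""

LABEL_COLUMNS = ("Survival_label_lower_bound", "Survival_label_upper_bound")


def load_survival_rows(text):
    """Split each CSV record into a lower bound, an upper bound, and features."""
    lower, upper, features = [], [], []
    for record in csv.DictReader(io.StringIO(text)):
        lower.append(float(record[LABEL_COLUMNS[0]]))
        # float() accepts the string "inf", so censored rows need no special case.
        upper.append(float(record[LABEL_COLUMNS[1]]))
        features.append(
            [float(value) for name, value in record.items()
             if name not in LABEL_COLUMNS]
        )
    return lower, upper, features


def is_right_censored(lo, hi):
    """A record is right-censored when only its lower bound is known."""
    return math.isinf(hi) and hi > lo


lower, upper, features = load_survival_rows(SAMPLE)
censored = [is_right_censored(lo, hi) for lo, hi in zip(lower, upper)]
```

With the XGBoost Python package, the two bound lists would typically be attached to a `DMatrix` through `set_float_info` under the field names `label_lower_bound` and `label_upper_bound` before training with the `survival:aft` objective; that step is left out here so the sketch runs without XGBoost installed.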