4257 Commits

Author SHA1 Message Date
Rory Mitchell
13b10a6370
Device dmatrix (#5420) 2020-03-28 14:42:21 +13:00
Jiaming Yuan
780de49ddb
Resolve travis failure. (#5445)
* Install dependencies by pip.
2020-03-27 19:37:58 +08:00
Jiaming Yuan
4942da64ae
Refactor tests with data generator. (#5439) 2020-03-27 06:44:44 +08:00
Jiaming Yuan
7146b91d5a
Force compressed buffer to be 4 bytes aligned. (#5441) 2020-03-27 06:43:52 +08:00
Avinash Barnwal
dcf439932a
Add Accelerated Failure Time loss for survival analysis task (#4763)
* [WIP] Add lower and upper bounds on the label for survival analysis

* Update test MetaInfo.SaveLoadBinary to account for extra two fields

* Don't clear qids_ for version 2 of MetaInfo

* Add SetInfo() and GetInfo() method for lower and upper bounds

* changes to aft

* Add parameter class for AFT; use enum's to represent distribution and event type

* Add AFT metric

* changes to neg grad to grad

* changes to binomial loss

* changes to overflow

* changes to eps

* changes to code refactoring

* changes to code refactoring

* changes to code refactoring

* Re-factor survival analysis

* Remove aft namespace

* Move function bodies out of AFTNormal and AFTLogistic, to reduce clutter

* Move function bodies out of AFTLoss, to reduce clutter

* Use smart pointer to store AFTDistribution and AFTLoss

* Rename AFTNoiseDistribution enum to AFTDistributionType for clarity

The enum class was not a distribution itself but a distribution type

* Add AFTDistribution::Create() method for convenience

* changes to extreme distribution

* changes to extreme distribution

* changes to extreme

* changes to extreme distribution

* changes to left censored

* deleted cout

* changes to x,mu and sd and code refactoring

* changes to print

* changes to hessian formula in censored and uncensored

* changes to variable names and pow

* changes to Logistic Pdf

* changes to parameter

* Expose lower and upper bound labels to R package

* Use example weights; normalize log likelihood metric

* changes to CHECK

* changes to logistic hessian to standard formula

* changes to logistic formula

* Comply with coding style guideline

* Revert back Rabit submodule

* Revert dmlc-core submodule

* Comply with coding style guideline (clang-tidy)

* Fix an error in AFTLoss::Gradient()

* Add missing files to amalgamation

* Address @RAMitchell's comment: minimize future change in MetaInfo interface

* Fix lint

* Fix compilation error on 32-bit target, when size_t == bst_uint

* Allocate sufficient memory to hold extra label info

* Use OpenMP to speed up

* Fix compilation on Windows

* Address reviewer's feedback

* Add unit tests for probability distributions

* Make Metric subclass of Configurable

* Address reviewer's feedback: Configure() AFT metric

* Add a dummy test for AFT metric configuration

* Complete AFT configuration test; remove debugging print

* Rename AFT parameters

* Clarify test comment

* Add a dummy test for AFT loss for uncensored case

* Fix a bug in AFT loss for uncensored labels

* Complete unit test for AFT loss metric

* Simplify unit tests for AFT metric

* Add unit test to verify aggregate output from AFT metric

* Use EXPECT_* instead of ASSERT_*, so that we run all unit tests

* Use aft_loss_param when serializing AFTObj

This is to be consistent with AFT metric

* Add unit tests for AFT Objective

* Fix OpenMP bug; clarify semantics for shared variables used in OpenMP loops

* Add comments

* Remove AFT prefix from probability distribution; put probability distribution in separate source file

* Add comments

* Define kPI and kEulerMascheroni in probability_distribution.h

* Add probability_distribution.cc to amalgamation

* Remove unnecessary diff

* Address reviewer's feedback: define variables where they're used

* Eliminate all INFs and NANs from AFT loss and gradient

* Add demo

* Add tutorial

* Fix lint

* Use 'survival:aft' to be consistent with 'survival:cox'

* Move sample data to demo/data

* Add visual demo with 1D toy data

* Add Python tests

Co-authored-by: Philip Cho <chohyu01@cs.washington.edu>
2020-03-25 13:52:51 -07:00
Rory Mitchell
1de36cdf1e
Add link to GPU documentation (#5437) 2020-03-24 09:29:29 +13:00
sriramch
d2231fc840
Ranking metric acceleration on the gpu (#5398) 2020-03-22 19:38:48 +13:00
Jiaming Yuan
cd7d6f7d59
[dask] Fix missing value for scikit-learn interface. (#5435) 2020-03-20 10:56:01 -04:00
James Lamb
4b7e2b7bff
[R-package] fixed uses of class() (#5426)
Thank you a lot. Good catch!
2020-03-20 14:51:20 +01:00
Jiaming Yuan
abca9908ba
Support pandas SparseArray. (#5431) 2020-03-20 21:40:22 +08:00
James Lamb
3cf665d3ec
[R-package] changed FindLibR to take advantage of CMake cache (#5427) 2020-03-20 03:32:15 +08:00
Jiaming Yuan
760d5d0c3c
[dask] Accept other inputs for prediction. (#5428)
* Returns a series when input is dataframe.

* Merge assert client.
2020-03-19 17:05:55 +08:00
Jiaming Yuan
8ca06ab329
[dask] Check non-equal when setting threads. (#5421)
* Check non-equal.

`nthread` can be restored from internal parameter, which is mis-interpreted as
user defined parameter.

* Check None.
2020-03-17 13:07:20 +08:00
Jiaming Yuan
b51124c158
[dask] Enable gridsearching with skl. (#5417) 2020-03-16 04:51:51 +08:00
Jiaming Yuan
761a5dbdfc
[dask] Honor nthreads from dask worker. (#5414) 2020-03-16 04:51:24 +08:00
Jiaming Yuan
21b671aa06
[dask] Order the prediction result. (#5416) 2020-03-15 19:34:04 +08:00
Jiaming Yuan
668e432e2d
[dask] Use DMLC_TASK_ID. (#5415) 2020-03-15 16:47:03 +08:00
Jiaming Yuan
fc88105620
Better error message for updating. (#5418) 2020-03-15 16:46:21 +08:00
Jiaming Yuan
ab7a46a1a4
Check whether current updater can modify a tree. (#5406)
* Check whether current updater can modify a tree.

* Fix tree model JSON IO for pruned trees.
2020-03-14 09:24:08 +08:00
Rory Mitchell
b745b7acce
Fix memory usage of device sketching (#5407) 2020-03-14 13:43:24 +13:00
Jan Borchmann
bb8c8df39d
[dask] passed through verbose for dask fit (#5413) 2020-03-14 06:33:53 +08:00
Jiaming Yuan
45a97ddf32
Split up LearnerImpl. (#5350) 2020-03-12 16:30:23 +08:00
Rory Mitchell
3ad4333b0e
Partial rewrite EllpackPage (#5352) 2020-03-11 10:15:53 +13:00
Darby Payne
7a99f8f27f
Adding static library option (#5397) 2020-03-10 18:22:15 +08:00
Bart Broere
a931589c96
Fix typo (#5399) 2020-03-09 19:41:39 +08:00
Rory Mitchell
a38e7bd19c
Sketching from adapters (#5365)
* Sketching from adapters

* Add weights test
2020-03-07 21:07:58 +13:00
Jiaming Yuan
0dd97c206b
Move thread local entry into Learner. (#5396)
* Move thread local entry into Learner.

This is an attempt to workaround CUDA context issue in static variable, where
the CUDA context can be released before device vector.

* Add PredictionEntry to thread local entry.

This eliminates one copy of prediction vector.

* Don't define CUDA C API in a namespace.
2020-03-07 15:37:39 +08:00
sriramch
1ba6706167
- create a gpu metrics (internal) registry (#5387)
* - create a gpu metrics (internal) registry
  - the objective is to separate the cpu and gpu implementations such that they evolve
    indepedently. to that end, this approach will:
    - preserve the same metrics configuration (from the end user perspective)
    - internally delegate the responsibility to the gpu metrics builder when there is a
      valid device present
    - decouple the gpu metrics builder from the cpu ones to prevent misuse
    - move away from including the cuda file from within the cc file and segregate the code
      via ifdef's
2020-03-07 15:31:35 +13:00
Jiaming Yuan
8d06878bf9
Deterministic GPU histogram. (#5361)
* Use pre-rounding based method to obtain reproducible floating point
  summation.
* GPU Hist for regression and classification are bit-by-bit reproducible.
* Add doc.
* Switch to thrust reduce for `node_sum_gradient`.
2020-03-04 15:13:28 +08:00
Philip Hyunsu Cho
9775da02d9
Add release note for 1.0.0 in NEWS.md (#5329)
* Add release note for 1.0.0

* Fix a small bug in the Python script that compiles the list of contributors

* Clarify governance of CI infrastructure; now PMC is formally in charge

* Address reviewer comment

* Fix typo
2020-03-03 21:35:43 -08:00
sriramch
5dc8e894c9
Fixes and changes to the ranking metrics computed on cpu (#5380)
* - fixes and changes to the ranking metrics computed on cpu
  - auc/aucpr ranking metric accelerated on cpu
  - fixes to the auc/aucpr metrics
2020-03-03 15:56:36 +13:00
Darius Kharazi
71a8b8c65a
Fix simple typo: information.c -> information (#5384)
Closes #5383
2020-03-03 08:50:14 +08:00
Egor Smirnov
1b97eaf7a7
Optimized ApplySplit, BuildHist and UpdatePredictCache functions on CPU (#5244)
* Split up sparse and dense build hist kernels.
* Add `PartitionBuilder`.
2020-02-29 16:11:42 +08:00
sriramch
b81f8cbbc0
Move segment sorter to common (#5378)
- move segment sorter to common
- this is the first of a handful of pr's that splits the larger pr #5326
- it moves this facility to common (from ranking objective class), so that it can be
    used for metric computation
- it also wraps all the bald device pointers into span.
2020-02-29 15:42:07 +08:00
Jiaming Yuan
2ba8c13b69
Revert "Enable rabit test (#5358)" (#5377)
This reverts commit 9a5efffebe9e527ca4dc27c281b06c221596f860.
2020-02-29 04:25:03 +08:00
Chen Qin
9a5efffebe
Enable rabit test (#5358) 2020-02-28 22:29:02 +08:00
Samrat Pandiri
2d76d40dfd
Update dask.rst to correct a spelling mistake (#5371)
Change `signle-node` to `single-node`
2020-02-27 20:46:41 +08:00
Jiaming Yuan
a461a9a90a
Define lazy isinstance for Python compat. (#5364)
* Avoid importing datatable.
* Fix #5363.
2020-02-26 14:23:33 +08:00
Jiaming Yuan
0fd455e162
Restore loading model from buffer. (#5360) 2020-02-26 11:30:13 +08:00
Jiaming Yuan
f2b8cd2922
Add number of columns to native data iterator. (#5202)
* Change native data iter into an adapter.
2020-02-25 23:42:01 +08:00
Jiaming Yuan
e0509b3307
Fix pruner. (#5335)
* Honor the tree depth.
* Prevent pruning pruned node.
2020-02-25 08:32:46 +08:00
Rory Mitchell
b0ed3f0a66
Remove unnecessary DMatrix methods (#5324) 2020-02-25 12:40:39 +13:00
Jiaming Yuan
655cf17b60
Predict on Ellpack. (#5327)
* Unify GPU prediction node.
* Add `PageExists`.
* Dispatch prediction on input data for GPU Predictor.
2020-02-23 06:27:03 +08:00
daiki katsuragawa
70a91ec3ba
Update README.md (#5346) 2020-02-23 02:52:37 +08:00
Philip Hyunsu Cho
cfae247231
Fix a small typo in sklearn.py that broke multiple eval metrics (#5341) 2020-02-22 19:02:37 +08:00
Rong Ou
d6b31df449
update docs for gpu external memory (#5332)
* update docs for gpu external memory

* add hist limitation
2020-02-22 14:57:40 +08:00
Philip Hyunsu Cho
7ac7e8778f
Port patches from 1.0.0 branch (#5336)
* Remove f-string, since it's not supported by Python 3.5 (#5330)

* Remove f-string, since it's not supported by Python 3.5

* Add Python 3.5 to CI, to ensure compatibility

* Remove duplicated matplotlib

* Show deprecation notice for Python 3.5

* Fix lint

* Fix lint

* Fix a unit test that mistook MINOR ver for PATCH ver

* Enforce only major version in JSON model schema

* Bump version to 1.1.0-SNAPSHOT
2020-02-21 13:13:21 -08:00
Philip Hyunsu Cho
8aa8ef1031
Display Sponsor button, link to OpenCollective (#5325) 2020-02-19 01:58:21 -08:00
Rory Mitchell
bc96ceb8b2
Refactor SparsePageSource, delete cache files after use (#5321)
* Refactor sparse page source

* Delete temporary cache files

* Log fatal if cache exists

* Log fatal if multiple threads used with prefetcher
2020-02-19 16:43:41 +13:00
Rory Mitchell
b2b2c4e231
Remove SimpleCSRSource (#5315) 2020-02-18 16:49:17 +13:00