102 Commits

Author SHA1 Message Date
Jiaming Yuan
bb5e18c29c
Fix CUDA async stream. (#8380) 2022-10-22 23:13:28 +08:00
Jiaming Yuan
031d66ec27
Configuration for init estimation. (#8343)
* Configuration for init estimation.

* Check whether the model needs configuration based on const attribute `ModelFitted`
instead of a mutable state.
* Add parameter `boost_from_average` to tell whether the user has specified base score.
* Add tests.
2022-10-18 01:52:24 +08:00
Rong Ou
668b8a0ea4
[Breaking] Switch from rabit to the collective communicator (#8257)
* Switch from rabit to the collective communicator

* fix size_t specialization

* really fix size_t

* try again

* add include

* more include

* fix lint errors

* remove rabit includes

* fix pylint error

* return dict from communicator context

* fix communicator shutdown

* fix dask test

* reset communicator mocklist

* fix distributed tests

* do not save device communicator

* fix jvm gpu tests

* add python test for federated communicator

* Update gputreeshap submodule

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 14:39:01 -08:00
Jiaming Yuan
fffb1fca52
Calculate base_score based on input labels for mae. (#8107)
Fit an intercept as base score for abs loss.
2022-09-20 20:53:54 +08:00
Jiaming Yuan
142a208a90
Fix compiler warnings. (#8022)
- Remove/fix unused parameters
- Remove deprecated code in rabit.
- Update dmlc-core.
2022-06-22 21:29:10 +08:00
Jiaming Yuan
4fcfd9c96e
Fix and cleanup for column matrix. (#7901)
* Fix missed type dispatching for dense columns with missing values.
* Code cleanup to reduce special cases.
* Reduce memory usage.
2022-05-16 21:11:50 +08:00
Philip Hyunsu Cho
4cd14aee5a
Rename misspelled config parameter for pseudo-Huber (#7904) 2022-05-15 06:38:33 -07:00
Jiaming Yuan
11d65fcb21
Extract partial sum into an independent function. (#7889) 2022-05-13 14:30:35 +08:00
Jiaming Yuan
fdf533f2b9
[POC] Experimental support for l1 error. (#7812)
Support adaptive tree, a feature supported by both sklearn and lightgbm.  The tree leaf is recomputed based on residue of labels and predictions after construction.

For l1 error, the optimal value is the median (50 percentile).

This is marked as experimental support for the following reasons:
- The value is not well defined for distributed training, where we might have empty leaves for local workers. Right now I just use the original leaf value for computing the average with other workers, which might cause significant errors.
- Some follow-ups are required, for exact, pruner, and optimization for quantile function. Also, we need to calculate the initial estimation.
2022-04-26 21:41:55 +08:00
Jiaming Yuan
98d6faefd6
Implement slope for Pseduo-Huber. (#7727)
* Add objective and metric.
* Some refactoring for CPU/GPU dispatching using linalg module.
2022-03-14 21:42:38 +08:00
Jiaming Yuan
81210420c6
Remove omp_get_max_threads (#7608)
This is the one last PR for removing omp global variable.

* Add context object to the `DMatrix`.  This bridges `DMatrix` with https://github.com/dmlc/xgboost/issues/7308 .
* Require context to be available at the construction time of booster.
* Add `n_threads` support for R csc DMatrix constructor.
* Remove `omp_get_max_threads` in R glue code.
* Remove threading utilities that rely on omp global variable.
2022-01-28 16:09:22 +08:00
Jiaming Yuan
6967ef7267
Remove omp_get_max_threads in objective. (#7589) 2022-01-24 04:35:49 +08:00
Jiaming Yuan
58a6723eb1
Initial support for multioutput regression. (#7514)
* Add num target model parameter, which is configured from input labels.
* Change elementwise metric and indexing for weights.
* Add demo.
* Add tests.
2021-12-18 09:28:38 +08:00
Jiaming Yuan
5b1161bb64
Convert labels into tensor. (#7456)
* Add a new ctor to tensor for `initilizer_list`.
* Change labels from host device vector to tensor.
* Rename the field from `labels_` to `labels` since it's a public member.
2021-12-17 00:58:35 +08:00
Jiaming Yuan
4100827971
Pass infomation about objective to tree methods. (#7385)
* Define the `ObjInfo` and pass it down to every tree updater.
2021-11-04 01:52:44 +08:00
ShvetsKS
475fd1abec
Reduced span overheads in objective function calculate (#7206)
Co-authored-by: fis <jm.yuan@outlook.com>
2021-09-23 04:43:59 +08:00
Jiaming Yuan
abec3dbf6d
Fix thread safety of softmax prediction. (#7104) 2021-07-16 02:06:55 +08:00
Jiaming Yuan
1c8fdf2218
Remove use of device_idx in dh::LaunchN. (#7063)
It's an unused parameter, removing it can make the CI log more readable.
2021-06-29 11:37:26 +08:00
Andrew Ziem
3e7e426b36
Fix spelling in documents (#6948)
* Update roxygen2 doc.

Co-authored-by: fis <jm.yuan@outlook.com>
2021-05-11 20:44:36 +08:00
Jiaming Yuan
1d90577800
Verify strictly positive labels for gamma regression. (#6778)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-03-25 11:46:52 +08:00
Jiaming Yuan
bcc0277338
Re-implement ROC-AUC. (#6747)
* Re-implement ROC-AUC.

* Binary
* MultiClass
* LTR
* Add documents.

This PR resolves a few issues:
  - Define a value when the dataset is invalid, which can happen if there's an
  empty dataset, or when the dataset contains only positive or negative values.
  - Define ROC-AUC for multi-class classification.
  - Define weighted average value for distributed setting.
  - A correct implementation for learning to rank task.  Previous
  implementation is just binary classification with averaging across groups,
  which doesn't measure ordered learning to rank.
2021-03-20 16:52:40 +08:00
Louis Desreumaux
9b530e5697
Improve OpenMP exception handling (#6680) 2021-02-25 13:56:16 +08:00
Jiaming Yuan
a9ec0ea6da
Align device id in predict transform with predictor. (#6662) 2021-02-02 08:33:29 +08:00
Philip Hyunsu Cho
0f2ed21a9d
[Breaking] Change default evaluation metric for binary:logitraw objective to logloss (#6647) 2021-01-29 00:12:12 +09:00
Philip Hyunsu Cho
ad1a527709
Enable loading model from <1.0.0 trained with objective='binary:logitraw' (#6517)
* Enable loading model from <1.0.0 trained with objective='binary:logitraw'

* Add binary:logitraw in model compatibility testing suite

* Feedback from @trivialfis: Override ProbToMargin() for LogisticRaw

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2020-12-16 16:53:46 -08:00
Igor Moura
d1254808d5
Clean up C++ warnings (#6213) 2020-10-19 23:02:33 +08:00
vcarpani
6bc9747df5
Reduce compile warnings (#6198)
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-10-08 23:14:59 +08:00
Christian Lorentzen
cf4f019ed6
[Breaking] Change default evaluation metric for classification to logloss / mlogloss (#6183)
* Change DefaultEvalMetric of classification from error to logloss

* Change default binary metric in plugin/example/custom_obj.cc

* Set old error metric in python tests

* Set old error metric in R tests

* Fix missed eval metrics and typos in R tests

* Fix setting eval_metric twice in R tests

* Add warning for empty eval_metric for classification

* Fix Dask tests

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-10-02 12:06:47 -07:00
Jiaming Yuan
6f7112a848
Move warning about empty dataset. (#5998) 2020-08-11 14:10:51 +08:00
Philip Hyunsu Cho
71b0528a2f
GPU implementation of AFT survival objective and metric (#5714)
* Add interval accuracy

* De-virtualize AFT functions

* Lint

* Refactor AFT metric using GPU-CPU reducer

* Fix R build

* Fix build on Windows

* Fix copyright header

* Clang-tidy

* Fix crashing demo

* Fix typos in comment; explain GPU ID

* Remove unnecessary #include

* Add C++ test for interval accuracy

* Fix a bug in accuracy metric: use log pred

* Refactor AFT objective using GPU-CPU Transform

* Lint

* Fix lint

* Use Ninja to speed up build

* Use time, not /usr/bin/time

* Add cpu_build worker class, with concurrency = 1

* Use concurrency = 1 only for CUDA build

* concurrency = 1 for clang-tidy

* Address reviewer's feedback

* Update link to AFT paper
2020-07-17 01:18:13 -07:00
Jiaming Yuan
7c2686146e
Dask device dmatrix (#5901)
* Fix softprob with empty dmatrix.
2020-07-17 13:17:43 +08:00
Philip Hyunsu Cho
1d22a9be1c
Revert "Reorder includes. (#5749)" (#5771)
This reverts commit d3a0efbf162f3dceaaf684109e1178c150b32de3.
2020-06-09 10:29:28 -07:00
Jiaming Yuan
d3a0efbf16
Reorder includes. (#5749)
* Reorder includes.

* R.
2020-06-03 17:30:47 +12:00
ShvetsKS
057c762ecd
Fix release degradation (#5720)
* fix release degradation, related to 5666

* less resizes

Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>
2020-05-31 04:37:54 +03:00
Andy Adinets
646def51e0
C++14 for xgboost (#5664) 2020-05-21 12:26:40 +12:00
LionOrCatThatIsTheQuestion
83981a9ce3
Pseudo-huber loss metric added (#5647)
- Add pseudo huber loss objective.
- Add pseudo huber loss metric.

Co-authored-by: Reetz <s02reetz@iavgroup.local>
2020-05-18 21:08:07 +08:00
Jiaming Yuan
c90457f489
Refactor the CLI. (#5574)
* Enable parameter validation.
* Enable JSON.
* Catch `dmlc::Error`.
* Show help message.
2020-04-26 10:56:33 +08:00
Philip Hyunsu Cho
f68155de6c
Fix compilation on Mac OSX High Sierra (10.13) (#5597)
* Fix compilation on Mac OSX High Sierra

* [CI] Build Mac OSX binary wheel using Travis CI
2020-04-25 10:53:03 -07:00
Rory Mitchell
e268fb0093
Use thrust functions instead of custom functions (#5544) 2020-04-16 21:41:16 +12:00
Jiaming Yuan
0012f2ef93
Upgrade clang-tidy on CI. (#5469)
* Correct all clang-tidy errors.
* Upgrade clang-tidy to 10 on CI.

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-04-05 04:42:29 +08:00
Avinash Barnwal
dcf439932a
Add Accelerated Failure Time loss for survival analysis task (#4763)
* [WIP] Add lower and upper bounds on the label for survival analysis

* Update test MetaInfo.SaveLoadBinary to account for extra two fields

* Don't clear qids_ for version 2 of MetaInfo

* Add SetInfo() and GetInfo() method for lower and upper bounds

* changes to aft

* Add parameter class for AFT; use enum's to represent distribution and event type

* Add AFT metric

* changes to neg grad to grad

* changes to binomial loss

* changes to overflow

* changes to eps

* changes to code refactoring

* changes to code refactoring

* changes to code refactoring

* Re-factor survival analysis

* Remove aft namespace

* Move function bodies out of AFTNormal and AFTLogistic, to reduce clutter

* Move function bodies out of AFTLoss, to reduce clutter

* Use smart pointer to store AFTDistribution and AFTLoss

* Rename AFTNoiseDistribution enum to AFTDistributionType for clarity

The enum class was not a distribution itself but a distribution type

* Add AFTDistribution::Create() method for convenience

* changes to extreme distribution

* changes to extreme distribution

* changes to extreme

* changes to extreme distribution

* changes to left censored

* deleted cout

* changes to x,mu and sd and code refactoring

* changes to print

* changes to hessian formula in censored and uncensored

* changes to variable names and pow

* changes to Logistic Pdf

* changes to parameter

* Expose lower and upper bound labels to R package

* Use example weights; normalize log likelihood metric

* changes to CHECK

* changes to logistic hessian to standard formula

* changes to logistic formula

* Comply with coding style guideline

* Revert back Rabit submodule

* Revert dmlc-core submodule

* Comply with coding style guideline (clang-tidy)

* Fix an error in AFTLoss::Gradient()

* Add missing files to amalgamation

* Address @RAMitchell's comment: minimize future change in MetaInfo interface

* Fix lint

* Fix compilation error on 32-bit target, when size_t == bst_uint

* Allocate sufficient memory to hold extra label info

* Use OpenMP to speed up

* Fix compilation on Windows

* Address reviewer's feedback

* Add unit tests for probability distributions

* Make Metric subclass of Configurable

* Address reviewer's feedback: Configure() AFT metric

* Add a dummy test for AFT metric configuration

* Complete AFT configuration test; remove debugging print

* Rename AFT parameters

* Clarify test comment

* Add a dummy test for AFT loss for uncensored case

* Fix a bug in AFT loss for uncensored labels

* Complete unit test for AFT loss metric

* Simplify unit tests for AFT metric

* Add unit test to verify aggregate output from AFT metric

* Use EXPECT_* instead of ASSERT_*, so that we run all unit tests

* Use aft_loss_param when serializing AFTObj

This is to be consistent with AFT metric

* Add unit tests for AFT Objective

* Fix OpenMP bug; clarify semantics for shared variables used in OpenMP loops

* Add comments

* Remove AFT prefix from probability distribution; put probability distribution in separate source file

* Add comments

* Define kPI and kEulerMascheroni in probability_distribution.h

* Add probability_distribution.cc to amalgamation

* Remove unnecessary diff

* Address reviewer's feedback: define variables where they're used

* Eliminate all INFs and NANs from AFT loss and gradient

* Add demo

* Add tutorial

* Fix lint

* Use 'survival:aft' to be consistent with 'survival:cox'

* Move sample data to demo/data

* Add visual demo with 1D toy data

* Add Python tests

Co-authored-by: Philip Cho <chohyu01@cs.washington.edu>
2020-03-25 13:52:51 -07:00
sriramch
b81f8cbbc0
Move segment sorter to common (#5378)
- move segment sorter to common
- this is the first of a handful of pr's that splits the larger pr #5326
- it moves this facility to common (from ranking objective class), so that it can be
    used for metric computation
- it also wraps all the bald device pointers into span.
2020-02-29 15:42:07 +08:00
Jiaming Yuan
f2b8cd2922
Add number of columns to native data iterator. (#5202)
* Change native data iter into an adapter.
2020-02-25 23:42:01 +08:00
Jiaming Yuan
1891cc766d
Fix metainfo from DataFrame. (#5216)
* Fix metainfo from DataFrame.

* Unify helper functions for data and meta.
2020-01-22 16:29:44 +08:00
sriramch
ee81ba8e1f implementation of map ranking algorithm on gpu (#5129)
* - implementation of map ranking algorithm
  - also effected necessary suggestions mentioned in the earlier ranking pr's
  - made some performance improvements to the ndcg algo as well
2019-12-27 12:05:37 +13:00
Jiaming Yuan
c8bdb652c4
Add check for length of weights. (#4872) 2019-12-21 11:30:58 +08:00
sriramch
2abe69d774 - ndcg ltr implementation on gpu (#5004)
* - ndcg ltr implementation on gpu
  - this is a follow-up to the pairwise ltr implementation
2019-11-13 11:21:04 +13:00
Jiaming Yuan
7663de956c
Run training with empty DMatrix. (#4990)
This makes GPU Hist robust in distributed environment as some workers might not
be associated with any data in either training or evaluation.

* Disable rabit mock test for now: See #5012 .

* Disable dask-cudf test at prediction for now: See #5003

* Launch dask job for all workers despite they might not have any data.
* Check 0 rows in elementwise evaluation metrics.

   Using AUC and AUC-PR still throws an error.  See #4663 for a robust fix.

* Add tests for edge cases.
* Add `LaunchKernel` wrapper handling zero sized grid.
* Move some parts of allreducer into a cu file.
* Don't validate feature names when the booster is empty.

* Sync number of columns in DMatrix.

  As num_feature is required to be the same across all workers in data split
  mode.

* Filtering in dask interface now by default syncs all booster that's not
empty, instead of using rank 0.

* Fix Jenkins' GPU tests.

* Install dask-cuda from source in Jenkins' test.

  Now all tests are actually running.

* Restore GPU Hist tree synchronization test.

* Check UUID of running devices.

  The check is only performed on CUDA version >= 10.x, as 9.x doesn't have UUID field.

* Fix CMake policy and project variables.

  Use xgboost_SOURCE_DIR uniformly, add policy for CMake >= 3.13.

* Fix copying data to CPU

* Fix race condition in cpu predictor.

* Fix duplicated DMatrix construction.

* Don't download extra nccl in CI script.
2019-11-06 16:13:13 +08:00
Jiaming Yuan
6ec7e300bd
Fix external memory race in colmaker. (#4980)
* Move `GetColDensity` out of omp parallel block.
2019-10-25 04:11:13 -04:00
Jiaming Yuan
ac457c56a2
Use `UpdateAllowUnknown' for non-model related parameter. (#4961)
* Use `UpdateAllowUnknown' for non-model related parameter.

Model parameter can not pack an additional boolean value due to binary IO
format.  This commit deals only with non-model related parameter configuration.

* Add tidy command line arg for use-dmlc-gtest.
2019-10-23 05:50:12 -04:00