1345 Commits

Author SHA1 Message Date
Jiaming Yuan
ca17f8a5fc
Dispatch thrust versions and upgrade rmm. (#7254)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-09-25 03:43:23 +08:00
ShvetsKS
475fd1abec
Reduced span overheads in objective function calculation (#7206)
Co-authored-by: fis <jm.yuan@outlook.com>
2021-09-23 04:43:59 +08:00
david-cortes
4f93e5586a
Improve wording for warning (#7248)
This warning sounds a bit ungrammatical. Additionally, the second part of the warning is not clear. This PR changes the wording to make it clearer.
2021-09-21 10:48:11 +08:00
Jiaming Yuan
c311a8c1d8
Enable compiling with system cub. (#7232)
- Tested with all CUDA 11.x.
- Workaround cub scan by using discard iterator in AUC.
- Limit the size of Argsort when compiled with CUDA cub.
2021-09-17 14:28:18 +08:00
Jiaming Yuan
22d56cebf1
Encode pandas categorical data automatically. (#7231)
2021-09-17 11:09:55 +08:00
Jiaming Yuan
31c1e13f90
Categorical data support in CPU sketching. (#7221)
2021-09-17 04:37:09 +08:00
Jiaming Yuan
0ed979b096
Support more input types for categorical data. (#7220)
* Support more input types for categorical data.

* Shorten the type name from "categorical" to "c".
* Tests for np/cp array and scipy csr/csc/coo.
* Specify the type for feature info.
2021-09-16 20:39:30 +08:00
Jiaming Yuan
2942dc68e4
Fix mixed types in GPU sketching. (#7228)
2021-09-16 00:10:25 +08:00
Jiaming Yuan
3515931305
Initial support for external memory in gradient index. (#7183)
* Add hessian to batch param in preparation of new approx impl.
* Extract a push method for gradient index matrix.
* Use span instead of vector ref for hessian in sketching.
* Create a binary format for gradient index.
2021-09-13 12:40:56 +08:00
Jiaming Yuan
804b2ac60f
Expose DMatrix API for CUDA columnar and array. (#7217)
* Use JSON encoded configurations.
* Expose them into header file.
2021-09-09 17:55:25 +08:00
Jiaming Yuan
b12e7f7edd
Add noexcept to JSON objects. (#7205)
2021-09-07 13:56:48 +08:00
Jiaming Yuan
3a4f51f39f
Avoid calling CUDA code on CPU for linear model. (#7154)
2021-09-01 10:45:31 +08:00
Jiaming Yuan
ba69244a94
Restore the custom double atomic add. (#7198)
2021-08-28 18:30:42 +08:00
Jiaming Yuan
7a1d67f9cb
[breaking] Use integer atomic for GPU histogram. (#7180)
On GPU we use a rounding factor to truncate the gradient for deterministic results. This PR changes the gradient representation to a fixed-point number whose exponent is aligned with the rounding factor.

* [breaking] Drop non-deterministic histogram.
* Use fixed point for shared memory.

This PR improves the performance of GPU Hist.

Co-authored-by: Andy Adinets <aadinets@nvidia.com>
2021-08-28 05:17:05 +08:00
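The idea behind #7180 can be illustrated outside of CUDA. Floating-point addition is not associative, so the result of accumulating gradients depends on the order in which threads add them. Integer addition is associative, so quantizing each gradient to a fixed-point integer first makes the sum independent of order. The following is only a minimal Python sketch of that principle, not XGBoost's GPU implementation; `to_fixed`, `fixed_point_sum`, and the `SCALE` value are hypothetical names and choices made for illustration.

```python
import math
import random


def to_fixed(value: float, scale: float) -> int:
    """Quantize a float to a fixed-point integer (hypothetical helper)."""
    return round(value * scale)


def fixed_point_sum(values, scale: float) -> float:
    """Accumulate in integers, then convert back to float once at the end."""
    total = 0
    for v in values:
        total += to_fixed(v, scale)  # integer addition is order-independent
    return total / scale


random.seed(0)
# Gradients spanning several orders of magnitude, as a stand-in for real data.
grads = [random.uniform(-1.0, 1.0) * 10 ** random.randint(-6, 3)
         for _ in range(10_000)]
shuffled = grads[:]
random.shuffle(shuffled)

# Assumed scale: a power of two, so value * scale is exact before rounding.
SCALE = float(2 ** 20)

float_a, float_b = sum(grads), sum(shuffled)
fixed_a = fixed_point_sum(grads, SCALE)
fixed_b = fixed_point_sum(shuffled, SCALE)

print("float sums equal:", float_a == float_b)
print("fixed sums equal:", fixed_a == fixed_b)
```

The fixed-point sums agree for any ordering, at the cost of a quantization error of at most `0.5 / SCALE` per element. The float comparison may or may not print `True` depending on the data.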
Jiaming Yuan
e7d7ab6bc3
Better error message for ncclUnhandledCudaError. (#7190)
2021-08-27 10:29:22 +08:00
Jiaming Yuan
ee8d1f5ed8
Fix histogram truncation. (#7181)
* Fix truncation.

* Lint.

* lint.
2021-08-24 18:34:32 -07:00
Jiaming Yuan
bf562bd33c
Remove unused code. (#7175)
2021-08-18 14:02:19 +08:00
Jiaming Yuan
9600ca83f3
Remove synchronization in monitor. (#7164)
* Remove synchronization in monitor.

Calling rabit functions during destruction is flaky.

* Add xgboost prefix to nvtx marker.
2021-08-11 16:33:53 +08:00
Jiaming Yuan
149f209af6
Extract histogram builder from CPU Hist. (#7152)
* Extract the CPU histogram builder.
* Fix tests.
* Reduce number of histograms being built.
2021-08-09 21:15:21 +08:00
Philip Hyunsu Cho
336af4f974
Work around a segfault observed in SparsePage::Push() (#7161)
* Work around a segfault observed in SparsePage::Push()

* Revert "Work around a segfault observed in SparsePage::Push()"

This reverts commit 30934844d00908750a5442082eb4769b1489f6a9.

* Don't call vector::resize() inside OpenMP block

* Set GITHUB_PAT env var to fix R tests

* Use built-in GITHUB_TOKEN
2021-08-08 02:12:30 -07:00
Jiaming Yuan
8a84be37b8
Pass scikit learn estimator checks for regressor. (#7130)
* Check data shape.
* Check labels.
2021-08-03 18:58:20 +08:00
Jiaming Yuan
d080b5a953
Fix model slicing. (#7149)
* Use correct pointer.
* Remove best_iteration/best_score.
2021-08-03 11:51:56 +08:00
Robert Maynard
1a75f43304
Allow compilation with nvcc 11.4 (#7131)
* Use type aliases for discard iterators

* update to include host_vector as thrust 1.12 doesn't bring it in as a side-effect

* cub::DispatchRadixSort requires signed offset types
2021-07-27 20:05:33 +08:00
Taewoo Kim
41e882f80b
Check input value is duplicated when quantile queue is full (#7091)
Co-authored-by: Taewoo Kim <taewoo@layer6.com>
2021-07-23 03:07:01 +08:00
ShvetsKS
caa9e527dd
Remove extra sync for dense data (#7120)
Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>
2021-07-22 19:02:31 +08:00
Jiaming Yuan
e6088366df
Export Python Interface for external memory. (#7070)
* Add Python iterator interface.
* Add tests.
* Add demo.
* Add documents.
* Handle empty dataset.
2021-07-22 15:15:53 +08:00
farfarawayzyt
e64ee6592f
fix typo in src/common/hist.cc BuildHistKernel (#7116)
2021-07-21 19:53:05 +08:00
farfarawayzyt
d7c14496d2
fix typo in arguments of PartitionBuilder::Init (#7113)
Co-authored-by: Yuntian Zhang <zhangyt@lamda.nju.edu.cn>
2021-07-16 15:46:22 +08:00
Jiaming Yuan
bd1f3a38f0
Rewrite sparse dmatrix using callbacks. (#7092)
- Reduce dependency on dmlc parsers and provide an interface for users to load data by themselves.
- Remove use of threaded iterator and IO queue.
- Remove `page_size`.
- Make sure the number of pages in memory is bounded.
- Make sure the cache cannot be violated.
- Provide an interface for internal algorithms to process data asynchronously.
2021-07-16 12:33:31 +08:00
Jiaming Yuan
abec3dbf6d
Fix thread safety of softmax prediction. (#7104)
2021-07-16 02:06:55 +08:00
Jiaming Yuan
345796825f
Optional find dependency in installed cmake config. (#7099)
* Find dependency only when xgboost is built as static library.
* Resolve msvc warning.
* Add test for linking shared library.
2021-07-11 17:20:55 +08:00
Jiaming Yuan
77f6cf2d13
Support hessian in host sketch container. (#7081)
Prepare for migrating approx onto hist's codebase.
2021-07-08 16:33:58 +08:00
Jiaming Yuan
84d359efb8
Support host data in proxy DMatrix. (#7087)
2021-07-08 11:35:48 +08:00
Jiaming Yuan
5d7cdf2e36
[Breaking] Rename Quantile DMatrix C API. (#7082)
The role of ProxyDMatrix has grown beyond what it was designed for. Now it's used by both
QuantileDeviceDMatrix and inplace prediction. After the refactoring of sparse DMatrix it
will also be used for external memory. This PR renames the C API to decouple it from
QuantileDeviceDMatrix.
2021-07-08 11:34:14 +08:00
Jiaming Yuan
c766f143ab
Refactor external memory formats. (#7089)
* Save base_rowid.
* Return write size.
* Remove unused function.
2021-07-08 04:04:51 +08:00
Jiaming Yuan
689eb8f620
Check external memory support for exact tree method. (#7088)
2021-07-08 02:12:57 +08:00
Jiaming Yuan
615ab2b03e
Extract evaluate splits from CPU hist. (#7079)
Besides modularizing the split evaluation function, this PR also removes several functions, including `InitNewNodes` and `BuildNodeStats`, along with some unused variables. Scattered code, such as setting leaf weights, is grouped into the split evaluator, and `NodeEntry` is simplified and made private. Another subtle difference from the original implementation is that the modified code doesn't call `tree[nidx].Parent()` to traverse upward.
2021-07-07 15:16:25 +08:00
Jiaming Yuan
116d711815
Make SimpleDMatrix ctor reusable. (#7075)
2021-07-06 13:38:24 +08:00
Jiaming Yuan
d7e1fa7664
Fix feature names and types in output model slice. (#7078)
2021-07-06 11:47:49 +08:00
Jiaming Yuan
1cd20efe68
Move GHistIndex into DMatrix. (#7064)
2021-07-01 00:44:49 +08:00
Jiaming Yuan
1c8fdf2218
Remove use of device_idx in dh::LaunchN. (#7063)
It's an unused parameter; removing it makes the CI log more readable.
2021-06-29 11:37:26 +08:00
Jiaming Yuan
8fa32fdda2
Implement categorical data support for SHAP. (#7053)
* Add CPU implementation.
* Update GPUTreeSHAP.
* Add GPU implementation by defining custom split condition.
2021-06-25 19:02:46 +08:00
Jiaming Yuan
663136aa08
Implement feature score for linear model. (#7048)
* Add feature score support for linear model.
* Port R interface to the new implementation.
* Add linear model support in Python.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-06-25 14:34:02 +08:00
Jiaming Yuan
bbfffb444d
Fix race condition in CPU shap. (#7050)
2021-06-21 10:03:15 +08:00
Jiaming Yuan
29f8fd6fee
Support categorical split in tree model dump. (#7036)
2021-06-18 16:46:20 +08:00
Jiaming Yuan
7968c0d051
Test on s390x. (#7038)
* Fix && remove unused parameter.
2021-06-18 14:55:08 +08:00
Jiaming Yuan
86715e4cd4
Support categorical data for dask functional interface and DQM. (#7043)
* Support categorical data for dask functional interface and DQM.

* Implement categorical data support for GPU GK-merge.
* Add support for dask functional interface.
* Add support for DQM.

* Get newer cupy.
2021-06-18 13:06:52 +08:00
Jiaming Yuan
7dd29ffd47
Implement feature score in GBTree. (#7041)
* Categorical data support.
* Eliminate text parsing during feature score computation.
2021-06-18 11:53:16 +08:00
Jiaming Yuan
5c2d7a18c9
Parallel model dump for trees. (#7040)
2021-06-15 14:08:26 +08:00
ShvetsKS
2567404ab6
Simplify sparse and dense CPU hist kernels (#7029)
* Simplify sparse and dense kernels
* Extract row partitioner.

Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-06-11 18:26:30 +08:00