76 Commits

Author SHA1 Message Date
Jiaming Yuan
912e341d57
Initial GPU support for the approx tree method. (#9414) 2023-07-31 15:50:28 +08:00
Rong Ou
7579905e18
Retry switching to per-thread default stream (#9416) 2023-07-26 07:09:12 +08:00
Jiaming Yuan
3a9996173e
Revert "Switch to per-thread default stream (#9396)" (#9413)
This reverts commit f7f673b00c15458fb4dd74a2a0d2ba80369c5faf.
2023-07-24 12:03:28 -07:00
Rong Ou
f7f673b00c
Switch to per-thread default stream (#9396) 2023-07-20 08:21:00 +08:00
Jiaming Yuan
ee6809e642
Use mmap for external memory. (#9282)
- Have basic infrastructure for mmap.
- Release file write handle.
2023-06-19 18:52:55 +08:00
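As a rough illustration only (not this commit's actual code): reading a cached page back through a read-only POSIX mmap can look like the sketch below; the struct and function names are hypothetical.

```cpp
// Hypothetical sketch: map an external-memory cache file read-only with POSIX mmap.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstddef>
#include <stdexcept>

struct MmapView {
  void* data{nullptr};
  std::size_t size{0};
};

MmapView MapPageFile(char const* path) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) throw std::runtime_error("failed to open cache file");
  struct stat st;
  if (fstat(fd, &st) != 0) {
    close(fd);
    throw std::runtime_error("fstat failed");
  }
  void* ptr = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  // The mapping remains valid after the descriptor is closed.
  if (ptr == MAP_FAILED) throw std::runtime_error("mmap failed");
  return {ptr, static_cast<std::size_t>(st.st_size)};
}

void UnmapPageFile(MmapView view) { munmap(view.data, view.size); }
```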
ZHAOKAI WANG
2b76061659
remove redundant method in expand_entry (#9283) 2023-06-10 05:18:21 +08:00
Jiaming Yuan
08ce495b5d
Use Booster context in DMatrix. (#8896)
- Pass context from booster to DMatrix.
- Use context instead of integer for `n_threads`.
- Check the consistency of the configuration for `max_bin`.
- Test for all combinations of initialization options.
2023-04-28 21:47:14 +08:00
Jiaming Yuan
4d665b3fb0
Restore clang tidy test. (#8861) 2023-03-03 13:47:04 -08:00
Jiaming Yuan
594371e35b
Fix CPP lint. (#8807) 2023-02-15 20:16:35 +08:00
Jiaming Yuan
70c9b885ef
Extract floating point rounding routines. (#8771) 2023-02-12 04:26:41 +08:00
Rory Mitchell
7214a45e83
Fix different number of features in gpu_hist evaluator. (#8754) 2023-02-06 23:15:16 +08:00
Jiaming Yuan
c6a8754c62
Define CUDA Context. (#8604)
We will transition to a non-default, non-blocking CUDA stream.
2022-12-20 15:15:07 +08:00
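For context, a non-default, non-blocking stream is created with the CUDA runtime API as in the minimal sketch below; this illustrates the stream semantics only, not the context type added by the PR.

```cpp
#include <cuda_runtime.h>

int main() {
  cudaStream_t stream;
  // A non-blocking stream does not implicitly synchronize with the legacy default stream.
  cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);
  // ... enqueue kernels and asynchronous copies on `stream` ...
  cudaStreamSynchronize(stream);
  cudaStreamDestroy(stream);
  return 0;
}
```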
Jiaming Yuan
3e26107a9c
Rename and extract Context. (#8528)
* Rename `GenericParameter` to `Context`.
* Rename header file to reflect the change.
* Rename all references.
2022-12-07 04:58:54 +08:00
Robert Maynard
16f96b6cfb
Work with newer thrust and libcudacxx (#8454)
* Thrust 1.17 removes the experimental/pinned_allocator.

When xgboost is brought into a large project, it can be compiled against Thrust 1.17+, which no longer offers this experimental allocator.

To ensure that xgboost keeps working in all environments going forward, we provide an xgboost-namespaced version of the pinned_allocator that was previously in Thrust.
2022-11-11 04:22:53 +08:00
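A minimal sketch of what such a host-pinned allocator can look like; the class name and details here are illustrative, not XGBoost's actual implementation.

```cpp
#include <cuda_runtime.h>

#include <cstddef>
#include <new>

// Sketch of a std::allocator-compatible host-pinned allocator, similar in
// spirit to the experimental pinned_allocator removed from Thrust 1.17.
template <typename T>
struct PinnedAllocator {
  using value_type = T;

  PinnedAllocator() = default;
  template <typename U>
  PinnedAllocator(PinnedAllocator<U> const&) noexcept {}

  T* allocate(std::size_t n) {
    void* ptr = nullptr;
    if (cudaMallocHost(&ptr, n * sizeof(T)) != cudaSuccess) {
      throw std::bad_alloc{};
    }
    return static_cast<T*>(ptr);
  }
  void deallocate(T* ptr, std::size_t) noexcept { cudaFreeHost(ptr); }
};

template <typename T, typename U>
bool operator==(PinnedAllocator<T> const&, PinnedAllocator<U> const&) { return true; }
template <typename T, typename U>
bool operator!=(PinnedAllocator<T> const&, PinnedAllocator<U> const&) { return false; }
```

Used as `std::vector<float, PinnedAllocator<float>>`, it yields page-locked host memory, which speeds up host-device transfers.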
Rory Mitchell
210915c985
Use integer gradients in gpu_hist split evaluation (#8274) 2022-10-11 12:16:27 +02:00
Rong Ou
668b8a0ea4
[Breaking] Switch from rabit to the collective communicator (#8257)
* Switch from rabit to the collective communicator

* fix size_t specialization

* really fix size_t

* try again

* add include

* more include

* fix lint errors

* remove rabit includes

* fix pylint error

* return dict from communicator context

* fix communicator shutdown

* fix dask test

* reset communicator mocklist

* fix distributed tests

* do not save device communicator

* fix jvm gpu tests

* add python test for federated communicator

* Update gputreeshap submodule

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 14:39:01 -08:00
Rory Mitchell
8f77677193
Use quantised gradients in gpu_hist histograms (#8246) 2022-09-26 17:35:35 +02:00
Jiaming Yuan
b5eb36f1af
Add max_cat_threshold to GPU and handle missing cat values. (#8212) 2022-09-07 00:57:51 +08:00
Philip Hyunsu Cho
56395d120b
Work around MSVC behavior wrt constexpr capture (#8211)
* Work around MSVC behavior wrt constexpr capture

* Fix lint
2022-08-31 11:42:08 -08:00
Rory Mitchell
1703dc330f
Optimise histogram kernels (#8118) 2022-08-18 14:07:26 +02:00
Rory Mitchell
1be09848a7
Refactor split valuation kernel (#8073) 2022-07-21 15:41:50 +02:00
Jiaming Yuan
abaa593aa0
Fix compiler warnings. (#8059)
- Remove unused parameters.
- Avoid comparison of different signedness.
2022-07-14 05:29:56 +08:00
Rory Mitchell
0bdaca25ca
Use single precision in gain calculation, use pointers instead of span. (#8051) 2022-07-12 21:56:27 +02:00
Rory Mitchell
794cbaa60a
Fuse split evaluation kernels (#8026) 2022-07-05 10:24:31 +02:00
Rory Mitchell
bc4f802b17
Batch UpdatePosition using cudaMemcpy (#7964) 2022-06-30 17:52:40 +02:00
Jiaming Yuan
142a208a90
Fix compiler warnings. (#8022)
- Remove/fix unused parameters
- Remove deprecated code in rabit.
- Update dmlc-core.
2022-06-22 21:29:10 +08:00
Jiaming Yuan
1a33b50a0d
Fix compiler warnings. (#7974)
- Remove unused parameters. There are still many warnings that are not yet
addressed. Currently, the warnings in dmlc-core dominate the error log.
- Remove `distributed` parameter from metric.
- Fixes some warnings about signed comparison.
2022-06-06 22:56:25 +08:00
Rory Mitchell
71d3b2e036
Fuse gpu_hist all-reduce calls where possible (#7867) 2022-05-17 13:27:50 +02:00
Rory Mitchell
7ef54e39ec
Small refactor to categoricals (#7858) 2022-05-05 17:47:02 +02:00
Jiaming Yuan
317d7be6ee
Always use partition based categorical splits. (#7857) 2022-05-03 22:30:32 +08:00
Jiaming Yuan
fdf533f2b9
[POC] Experimental support for l1 error. (#7812)
Support adaptive trees, a feature supported by both sklearn and lightgbm. Tree leaves are recomputed based on the residuals between labels and predictions after construction.

For l1 error, the optimal value is the median (50th percentile).

This is marked as experimental support for the following reasons:
- The value is not well defined for distributed training, where local workers might have empty leaves. Right now I just use the original leaf value for computing the average with other workers, which might cause significant errors.
- Some follow-ups are required for the exact tree method, the pruner, and optimization of the quantile function. Also, we need to calculate the initial estimation.
2022-04-26 21:41:55 +08:00
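As a sketch of the idea only (the helper below is hypothetical, not the PR's implementation): for the l1 objective, the leaf value minimizing absolute error over the rows mapped to a leaf is the median of their residuals.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch: the l1-optimal leaf value is the median of the residuals
// (label - prediction) of the rows mapped to that leaf.
double OptimalL1Leaf(std::vector<double> residuals) {
  if (residuals.empty()) {
    return 0.0;  // Empty leaves need special handling in distributed training.
  }
  std::size_t mid = residuals.size() / 2;
  std::nth_element(residuals.begin(), residuals.begin() + mid, residuals.end());
  return residuals[mid];  // For even counts this picks the upper middle element.
}
```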
Jiaming Yuan
1d468e20a4
Optimize GPU evaluation function for categorical data. (#7705)
* Use transform and cache.
2022-02-28 17:46:29 +08:00
Jiaming Yuan
d625dc2047
Work around nvcc error. (#7673) 2022-02-19 01:41:46 +08:00
Jiaming Yuan
0d0abe1845
Support optimal partitioning for GPU hist. (#7652)
* Implement `MaxCategory` in quantile.
* Implement partition-based split for GPU evaluation.  Currently, it's based on the existing evaluation function.
* Extract an evaluator from GPU Hist to store the needed states.
* Added some CUDA stream/event utilities.
* Update document with references.
* Fixed a bug in approx evaluator where the number of data points is less than the number of categories.
2022-02-15 03:03:12 +08:00
Ginko Balboa
29bfa94bb6
Fix external memory with gpu_hist and subsampling combination bug. (#7481)
Instead of accessing data from the `original_page_`, access the data from the first page of the available batch.

fix #7476

Co-authored-by: jiamingy <jm.yuan@outlook.com>
2021-12-24 11:15:35 +08:00
Jiaming Yuan
7f399eac8b
Use double for GPU Hist node sum. (#7507) 2021-12-22 08:41:35 +08:00
Jiaming Yuan
ccdabe4512
Support building gradient index with cat data. (#7371) 2021-11-03 22:37:37 +08:00
Jiaming Yuan
c311a8c1d8
Enable compiling with system cub. (#7232)
- Tested with all CUDA 11.x.
- Work around cub scan by using a discard iterator in AUC.
- Limit the size of Argsort when compiled with CUDA cub.
2021-09-17 14:28:18 +08:00
Jiaming Yuan
3515931305
Initial support for external memory in gradient index. (#7183)
* Add hessian to batch param in preparation of new approx impl.
* Extract a push method for gradient index matrix.
* Use span instead of vector ref for hessian in sketching.
* Create a binary format for gradient index.
2021-09-13 12:40:56 +08:00
Jiaming Yuan
7a1d67f9cb
[breaking] Use integer atomic for GPU histogram. (#7180)
On GPU we use a rounding factor to truncate the gradient for deterministic results. This PR changes the gradient representation to a fixed-point number with the exponent aligned with the rounding factor.

    [breaking] Drop non-deterministic histogram.
    Use fixed point for shared memory.

This PR improves the performance of GPU Hist.

Co-authored-by: Andy Adinets <aadinets@nvidia.com>
2021-08-28 05:17:05 +08:00
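A simplified sketch of the quantisation idea (the scaling constants and names are illustrative, not XGBoost's exact rounding-factor code): gradients are mapped to 64-bit fixed-point integers whose scale is derived from a power-of-two bound on the largest possible gradient sum, so that integer atomic additions are associative and give bit-wise reproducible histograms.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of quantising gradients to 64-bit fixed point.
struct GradientQuantiser {
  double to_fixed_point;
  double to_floating_point;

  GradientQuantiser(double max_abs_gradient, std::size_t n_rows) {
    // Power-of-two bound on the largest possible absolute sum of gradients.
    double bound =
        std::exp2(std::ceil(std::log2(std::fmax(max_abs_gradient * n_rows, 1.0))));
    // Leave headroom below 2^63 for the integer representation.
    to_fixed_point = std::exp2(62) / bound;
    to_floating_point = 1.0 / to_fixed_point;
  }
  std::int64_t ToFixed(double grad) const {
    return static_cast<std::int64_t>(std::llrint(grad * to_fixed_point));
  }
  double ToFloating(std::int64_t fixed) const { return fixed * to_floating_point; }
};
```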
Jiaming Yuan
ee8d1f5ed8
Fix histogram truncation. (#7181)
* Fix truncation.

* Lint.
2021-08-24 18:34:32 -07:00
Jiaming Yuan
bf562bd33c
Remove unused code. (#7175) 2021-08-18 14:02:19 +08:00
Robert Maynard
1a75f43304
Allow compilation with nvcc 11.4 (#7131)
* Use type aliases for discard iterators

* Update to include host_vector, as Thrust 1.12 doesn't bring it in as a side effect

* cub::DispatchRadixSort requires signed offset types
2021-07-27 20:05:33 +08:00
Jiaming Yuan
bd1f3a38f0
Rewrite sparse dmatrix using callbacks. (#7092)
- Reduce dependency on dmlc parsers and provide an interface for users to load data by themselves.
- Remove use of threaded iterator and IO queue.
- Remove `page_size`.
- Make sure the number of pages in memory is bounded.
- Make sure the cache cannot be violated.
- Provide an interface for internal algorithms to process data asynchronously.
2021-07-16 12:33:31 +08:00
Jiaming Yuan
1c8fdf2218
Remove use of device_idx in dh::LaunchN. (#7063)
It's an unused parameter; removing it makes the CI log more readable.
2021-06-29 11:37:26 +08:00
ShvetsKS
57c732655e
Merge lossguide and depthwise strategies for CPU hist (#7007)
* fix java/scala test: max depth is also a valid parameter for lossguide

Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-06-03 01:49:43 +08:00
Andrew Ziem
3e7e426b36
Fix spelling in documents (#6948)
* Update roxygen2 doc.

Co-authored-by: fis <jm.yuan@outlook.com>
2021-05-11 20:44:36 +08:00
Philip Hyunsu Cho
4230dcb614
Re-introduce double buffer in UpdatePosition, to fix perf regression in gpu_hist (#6757)
* Revert "gpu_hist performance tweaks (#5707)"

This reverts commit f779980f7ea7f6f07e86229b8e78144e8a74e6b3.

* Address reviewer's comment

* Fix build error
2021-03-18 13:56:10 -07:00
Igor Moura
d1254808d5
Clean up C++ warnings (#6213) 2020-10-19 23:02:33 +08:00
Jiaming Yuan
bed7ae4083
Loop over thrust::reduce. (#6229)
* Check input chunk size of dqdm.
* Add doc for current limitation.
2020-10-14 10:40:56 +13:00
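The chunking idea behind this commit, as a rough sketch (the chunk size below is a hypothetical placeholder, not the actual limit): reduce the device range in slices and accumulate the partial sums on the host.

```cpp
#include <thrust/device_vector.h>
#include <thrust/reduce.h>

#include <algorithm>
#include <cstddef>

// Sketch: reduce a large device range in fixed-size chunks.
double ChunkedSum(thrust::device_vector<double> const& values) {
  std::size_t const kChunk = std::size_t{1} << 26;  // hypothetical chunk size
  double total = 0.0;
  for (std::size_t begin = 0; begin < values.size(); begin += kChunk) {
    std::size_t end = std::min(begin + kChunk, values.size());
    total += thrust::reduce(values.begin() + begin, values.begin() + end, 0.0);
  }
  return total;
}
```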