xgboost

Author	SHA1	Message	Date
Jiaming Yuan	fedd9674c8	Implement column sampler in CUDA. (#9785 ) - CUDA implementation. - Extract the broadcasting logic, we will need the context parameter after revamping the collective implementation. - Some changes to the event loop for fixing a deadlock in CI. - Move argsort into algorithms.cuh, add support for cuda stream.	2023-11-17 04:29:08 +08:00
Jiaming Yuan	06bdc15e9b	[coll] Pass context to various functions. (#9772 ) * [coll] Pass context to various functions. In the future, the `Context` object would be required for collective operations, this PR passes the context object to some required functions to prepare for swapping out the implementation.	2023-11-08 09:54:05 +08:00
Jiaming Yuan	7a02facc9d	Serialize expand entry for allgather. (#9702 )	2023-10-24 14:33:28 +08:00
Jiaming Yuan	8c676c889d	Remove internal use of gpu_id. (#9568 )	2023-09-20 23:29:51 +08:00
Rong Ou	9bab06cbca	Support column split in gpu hist updater (#9384 )	2023-08-31 18:09:35 +08:00
Jiaming Yuan	942b957eef	Fix GPU categorical split memory allocation. (#9529 )	2023-08-29 10:06:03 +08:00
Rong Ou	6103dca0bb	Support column split in GPU evaluate splits (#9511 )	2023-08-23 16:33:43 +08:00
Jiaming Yuan	912e341d57	Initial GPU support for the approx tree method. (#9414 )	2023-07-31 15:50:28 +08:00
Rong Ou	7579905e18	Retry switching to per-thread default stream (#9416 )	2023-07-26 07:09:12 +08:00
Jiaming Yuan	3a9996173e	Revert "Switch to per-thread default stream (#9396 )" (#9413 ) This reverts commit f7f673b00c15458fb4dd74a2a0d2ba80369c5faf.	2023-07-24 12:03:28 -07:00
Rong Ou	f7f673b00c	Switch to per-thread default stream (#9396 )	2023-07-20 08:21:00 +08:00
Jiaming Yuan	ee6809e642	Use mmap for external memory. (#9282 ) - Have basic infrastructure for mmap. - Release file write handle.	2023-06-19 18:52:55 +08:00
ZHAOKAI WANG	2b76061659	remove redundant method in `expand_entry` (#9283 )	2023-06-10 05:18:21 +08:00
Jiaming Yuan	08ce495b5d	Use Booster context in DMatrix. (#8896 ) - Pass context from booster to DMatrix. - Use context instead of integer for `n_threads`. - Check the consistency configuration for `max_bin`. - Test for all combinations of initialization options.	2023-04-28 21:47:14 +08:00
Jiaming Yuan	4d665b3fb0	Restore clang tidy test. (#8861 )	2023-03-03 13:47:04 -08:00
Jiaming Yuan	594371e35b	Fix CPP lint. (#8807 )	2023-02-15 20:16:35 +08:00
Jiaming Yuan	70c9b885ef	Extract floating point rounding routines. (#8771 )	2023-02-12 04:26:41 +08:00
Rory Mitchell	7214a45e83	Fix different number of features in gpu_hist evaluator. (#8754 )	2023-02-06 23:15:16 +08:00
Jiaming Yuan	c6a8754c62	Define CUDA Context. (#8604 ) We will transition to non-default and non-blocking CUDA stream.	2022-12-20 15:15:07 +08:00
Jiaming Yuan	3e26107a9c	Rename and extract `Context`. (#8528 ) * Rename `GenericParameter` to `Context`. * Rename header file to reflect the change. * Rename all references.	2022-12-07 04:58:54 +08:00
Robert Maynard	16f96b6cfb	Work with newer thrust and libcudacxx (#8454 ) * Thrust 1.17 removes the experimental/pinned_allocator. When xgboost is brought into a large project it can be compiled against Thrust 1.17+ which don't offer this experimental allocator. To ensure that going forward xgboost works in all environments we provide a xgboost namespaced version of the pinned_allocator that previously was in Thrust.	2022-11-11 04:22:53 +08:00
Rory Mitchell	210915c985	Use integer gradients in gpu_hist split evaluation (#8274 )	2022-10-11 12:16:27 +02:00
Rong Ou	668b8a0ea4	[Breaking] Switch from rabit to the collective communicator (#8257 ) * Switch from rabit to the collective communicator * fix size_t specialization * really fix size_t * try again * add include * more include * fix lint errors * remove rabit includes * fix pylint error * return dict from communicator context * fix communicator shutdown * fix dask test * reset communicator mocklist * fix distributed tests * do not save device communicator * fix jvm gpu tests * add python test for federated communicator * Update gputreeshap submodule Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>	2022-10-05 14:39:01 -08:00
Rory Mitchell	8f77677193	Use quantised gradients in gpu_hist histograms (#8246 )	2022-09-26 17:35:35 +02:00
Jiaming Yuan	b5eb36f1af	Add `max_cat_threshold` to GPU and handle missing cat values. (#8212 )	2022-09-07 00:57:51 +08:00
Philip Hyunsu Cho	56395d120b	Work around MSVC behavior wrt constexpr capture (#8211 ) * Work around MSVC behavior wrt constexpr capture * Fix lint	2022-08-31 11:42:08 -08:00
Rory Mitchell	1703dc330f	Optimise histogram kernels (#8118 )	2022-08-18 14:07:26 +02:00
Rory Mitchell	1be09848a7	Refactor split valuation kernel (#8073 )	2022-07-21 15:41:50 +02:00
Jiaming Yuan	abaa593aa0	Fix compiler warnings. (#8059 ) - Remove unused parameters. - Avoid comparison of different signedness.	2022-07-14 05:29:56 +08:00
Rory Mitchell	0bdaca25ca	Use single precision in gain calculation, use pointers instead of span. (#8051 )	2022-07-12 21:56:27 +02:00
Rory Mitchell	794cbaa60a	Fuse split evaluation kernels (#8026 )	2022-07-05 10:24:31 +02:00
Rory Mitchell	bc4f802b17	Batch UpdatePosition using cudaMemcpy (#7964 )	2022-06-30 17:52:40 +02:00
Jiaming Yuan	142a208a90	Fix compiler warnings. (#8022 ) - Remove/fix unused parameters - Remove deprecated code in rabit. - Update dmlc-core.	2022-06-22 21:29:10 +08:00
Jiaming Yuan	1a33b50a0d	Fix compiler warnings. (#7974 ) - Remove unused parameters. There are still many warnings that are not yet addressed. Currently, the warnings in dmlc-core dominate the error log. - Remove `distributed` parameter from metric. - Fixes some warnings about signed comparison.	2022-06-06 22:56:25 +08:00
Rory Mitchell	71d3b2e036	Fuse gpu_hist all-reduce calls where possible (#7867 )	2022-05-17 13:27:50 +02:00
Rory Mitchell	7ef54e39ec	Small refactor to categoricals (#7858 )	2022-05-05 17:47:02 +02:00
Jiaming Yuan	317d7be6ee	Always use partition based categorical splits. (#7857 )	2022-05-03 22:30:32 +08:00
Jiaming Yuan	fdf533f2b9	[POC] Experimental support for l1 error. (#7812 ) Support adaptive tree, a feature supported by both sklearn and lightgbm. The tree leaf is recomputed based on residue of labels and predictions after construction. For l1 error, the optimal value is the median (50 percentile). This is marked as experimental support for the following reasons: - The value is not well defined for distributed training, where we might have empty leaves for local workers. Right now I just use the original leaf value for computing the average with other workers, which might cause significant errors. - Some follow-ups are required, for exact, pruner, and optimization for quantile function. Also, we need to calculate the initial estimation.	2022-04-26 21:41:55 +08:00
Jiaming Yuan	1d468e20a4	Optimize GPU evaluation function for categorical data. (#7705 ) * Use transform and cache.	2022-02-28 17:46:29 +08:00
Jiaming Yuan	d625dc2047	Work around nvcc error. (#7673 )	2022-02-19 01:41:46 +08:00
Jiaming Yuan	0d0abe1845	Support optimal partitioning for GPU hist. (#7652 ) * Implement `MaxCategory` in quantile. * Implement partition-based split for GPU evaluation. Currently, it's based on the existing evaluation function. * Extract an evaluator from GPU Hist to store the needed states. * Added some CUDA stream/event utilities. * Update document with references. * Fixed a bug in approx evaluator where the number of data points is less than the number of categories.	2022-02-15 03:03:12 +08:00
Ginko Balboa	29bfa94bb6	Fix external memory with gpu_hist and subsampling combination bug. (#7481 ) Instead of accessing data from the `original_page_`, access the data from the first page of the available batch. fix #7476 Co-authored-by: jiamingy <jm.yuan@outlook.com>	2021-12-24 11:15:35 +08:00
Jiaming Yuan	7f399eac8b	Use double for GPU Hist node sum. (#7507 )	2021-12-22 08:41:35 +08:00
Jiaming Yuan	ccdabe4512	Support building gradient index with cat data. (#7371 )	2021-11-03 22:37:37 +08:00
Jiaming Yuan	c311a8c1d8	Enable compiling with system cub. (#7232 ) - Tested with all CUDA 11.x. - Workaround cub scan by using discard iterator in AUC. - Limit the size of Argsort when compiled with CUDA cub.	2021-09-17 14:28:18 +08:00
Jiaming Yuan	3515931305	Initial support for external memory in gradient index. (#7183 ) * Add hessian to batch param in preparation of new approx impl. * Extract a push method for gradient index matrix. * Use span instead of vector ref for hessian in sketching. * Create a binary format for gradient index.	2021-09-13 12:40:56 +08:00
Jiaming Yuan	7a1d67f9cb	[breaking] Use integer atomic for GPU histogram. (#7180 ) On GPU we use rouding factor to truncate the gradient for deterministic results. This PR changes the gradient representation to fixed point number with exponent aligned with rounding factor. [breaking] Drop non-deterministic histogram. Use fixed point for shared memory. This PR is to improve the performance of GPU Hist. Co-authored-by: Andy Adinets <aadinets@nvidia.com>	2021-08-28 05:17:05 +08:00
Jiaming Yuan	ee8d1f5ed8	Fix histogram truncation. (#7181 ) * Fix truncation. * Lint. * lint.	2021-08-24 18:34:32 -07:00
Jiaming Yuan	bf562bd33c	Remove unused code. (#7175 )	2021-08-18 14:02:19 +08:00
Robert Maynard	1a75f43304	Allow compilation with nvcc 11.4 (#7131 ) * Use type aliases for discard iterators * update to include host_vector as thrust 1.12 doesn't bring it in as a side-effect * cub::DispatchRadixSort requires signed offset types	2021-07-27 20:05:33 +08:00

1 2

83 Commits