xgboost

Author	SHA1	Message	Date
Jiaming Yuan	24241ed6e3	[EM] Compress dense ellpack. (#10821 ) This helps reduce the memory copying needed for dense data. In addition, it helps reduce memory usage even if external memory is not used. - Decouple the number of symbols needed in the compressor with the number of features when the data is dense. - Remove the fetch call in the `at_end_` iteration. - Reduce synchronization and kernel launches by using the `uvector` and ctx.	2024-09-20 18:20:56 +08:00
Jiaming Yuan	96bbf80457	[EM] Support quantile objectives for GPU-based external memory. (#10820 ) - Improved error message for memory usage. - Support quantile-based objectives for GPU external memory.	2024-09-17 13:27:02 +08:00
Jiaming Yuan	d94f6679fc	[EM] Avoid synchronous calls and unnecessary ATS access. (#10811 ) - Pass context into various functions. - Factor out some CUDA algorithms. - Use ATS only for update position.	2024-09-10 14:33:14 +08:00
Jiaming Yuan	5f7f31d464	[EM] Refactor ellpack construction. (#10810 ) - Remove the calculation of n_symbols in the accessor. - Pack initialization steps into the parameter list. - Pass the context into various ctors. - Specialization for dense data to prepare for further compression.	2024-09-09 14:10:10 +08:00
Jiaming Yuan	e1a2c1bbb3	[EM] Merge GPU partitioning with histogram building. (#10766 ) - Stop concatenating pages if there's no subsampling. - Use a single iteration for histogram build and partitioning.	2024-08-31 03:25:37 +08:00
Jiaming Yuan	98ac153265	Avoid warning from NVCC. (#10757 )	2024-08-30 16:11:31 +08:00
Jiaming Yuan	61dd854a52	[EM] Refactor GPU histogram builder. (#10764 ) - Expose the maximum number of cached nodes to be consistent with the CPU implementation. Also easier for testing. - Extract the subtraction trick for easier testing. - Split up the `GradientQuantiser` to avoid circular dependency.	2024-08-30 02:39:14 +08:00
Jiaming Yuan	4fe67f10b4	[EM] Have one partitioner for each batch. (#10760 ) - Initialize one partitioner for each batch. - Collect partition size during initialization. - Support base ridx in the finalization.	2024-08-29 01:35:17 +08:00
Jiaming Yuan	64afe9873b	Increase timeout in C++ tests from 1 to 5 seconds. (#10756 ) To avoid CI failures on FreeBSD.	2024-08-28 02:27:14 +08:00
Jiaming Yuan	bde1265caf	[EM] Return a full DMatrix instead of an Ellpack from the GPU sampler. (#10753 )	2024-08-28 01:05:11 +08:00
Jiaming Yuan	d6ebcfb032	[EM] Support CPU quantile objective for external memory. (#10751 )	2024-08-27 04:16:57 +08:00
Jiaming Yuan	cb54374550	Update clang-tidy. (#10730 ) - Install cmake using pip. - Fix compile command generation. - Clean up the tidy script and remove the need to load the yaml file. - Fix modernized type traits. - Fix span class. Polymorphism support is dropped	2024-08-22 04:12:18 +08:00
Dmitry Razdoburdin	24d225c1ab	[SYCL] Implement UpdatePredictionCache and connect updater with learner. (#10701 )	2024-08-22 02:07:44 +08:00
Jiaming Yuan	402e7837fb	Fix potential race in feature constraint. (#10719 )	2024-08-21 16:50:31 +08:00
Jiaming Yuan	508ac13243	Check cub errors. (#10721 ) - Make sure cuda error returned by cub scan is caught. - Avoid temporary buffer allocation in thrust device vector.	2024-08-21 02:50:26 +08:00
Jiaming Yuan	8d7fe262d9	[EM] Enable access to the number of batches. (#10691 ) - Expose `NumBatches` in `DMatrix`. - Small cleanup for removing legacy CUDA stream and ~force CUDA context initialization~. - Purge old external memory data generation code.	2024-08-17 02:59:45 +08:00
Jiaming Yuan	abe65e3769	Reduce thread contention in column split histogram test. (#10708 )	2024-08-17 01:00:32 +08:00
Jiaming Yuan	582ea104b5	[EM] Enable prediction cache for GPU. (#10707 ) - Use `UpdatePosition` for all nodes and skip `FinalizePosition` when external memory is used. - Create `encode/decode` for node position, this is just as a refactor. - Reuse code between update position and finalization.	2024-08-15 21:41:59 +08:00
Jiaming Yuan	cc3b56fc37	Cleanup GPU Hist tests. (#10677 ) - Remove GPU Hist gradient sampling test. The same properties are tested in the gradient sampler test suite. - Move basic histogram tests into the histogram test suite. - Remove the header inclusion of the `updater_gpu_hist.cu` in tests.	2024-08-06 11:50:44 +08:00
Jiaming Yuan	77c844cef7	Reduce thread contention in column split tests. (#10658 ) --------- Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2024-08-01 18:36:46 +08:00
Jiaming Yuan	a19bbc9be5	Avoid caching allocator for large allocations. (#10582 )	2024-07-23 03:48:03 +08:00
Jiaming Yuan	6d9fcb771e	Move device histogram storage into `histogram.cuh`. (#10608 )	2024-07-21 14:10:13 +08:00
Jiaming Yuan	292bb677e5	[EM] Support mmap backed ellpack. (#10602 ) - Support resource view in ellpack. - Define the CUDA version of MMAP resource. - Define the CUDA version of malloc resource. - Refactor cuda runtime API wrappers, and add memory access related wrappers. - gather windows macros into a single header.	2024-07-18 08:20:21 +08:00
Jiaming Yuan	a6a8a55ffa	Merge approx tests. (#10583 )	2024-07-16 19:03:48 +08:00
Jiaming Yuan	6c403187ec	Fix column split race condition. (#10572 )	2024-07-12 01:07:12 +08:00
Jiaming Yuan	5f910cd4ff	[EM] Handle base idx in GPU histogram. (#10549 )	2024-07-11 03:26:30 +08:00
Jiaming Yuan	34b154c284	Avoid the use of size_t in the partitioner. (#10541 ) - Avoid the use of size_t in the partitioner. - Use `Span` instead of `Elem` where `node_id` is not needed. - Remove the `const_cast`. - Make sure the constness is not removed in the `Elem` by making it reference only. size_t is implementation-defined, which causes issue when we want to pass pointer or span.	2024-07-11 00:43:08 +08:00
Jiaming Yuan	620b2b155a	Cache GPU histogram kernel configuration. (#10538 )	2024-07-04 15:38:59 +08:00
Jiaming Yuan	e5f1720656	[EM] Avoid writing cut matrix to cache. (#10444 )	2024-06-19 18:03:38 +08:00
Jiaming Yuan	a5a58102e5	Revamp the rabit implementation. (#10112 ) This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features: - Federated learning for both CPU and GPU. - NCCL. - More data types. - A unified interface for all the underlying implementations. - Improved timeout handling for both tracker and workers. - Exhaustive tests with metrics (fixed a couple of bugs along the way). - A reusable tracker for Python and JVM packages.	2024-05-20 11:56:23 +08:00
Jiaming Yuan	230010d9a0	Cleanup set info. (#10139 ) - Use the array interface internally. - Deprecate `XGDMatrixSetDenseInfo`. - Deprecate `XGDMatrixSetUIntInfo`. - Move the handling of `DataType` into the deprecated C function. --------- Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2024-03-26 23:26:24 +08:00
Jiaming Yuan	53fc17578f	Use `std::uint64_t` for row index. (#10120 ) - Use std::uint64_t instead of size_t to avoid implementation-defined type. - Rename to bst_idx_t, to account for other types of indexing. - Small cleanup to the base header.	2024-03-15 18:43:49 +08:00
Jiaming Yuan	2c13f90384	Support graphviz plot for multi-target tree. (#10093 )	2024-03-09 05:35:25 +08:00
Jiaming Yuan	0ce4372bd4	Use UBJSON for serializing splits for vertical data split. (#10059 )	2024-02-25 00:18:23 +08:00
Jiaming Yuan	cacb4b1fdd	Fix gain calculation in multi-target tree. (#9978 )	2024-01-17 13:18:44 +08:00
Jiaming Yuan	a7226c0222	Fix feature names with special characters. (#9923 )	2023-12-28 22:45:13 +08:00
Jiaming Yuan	fedd9674c8	Implement column sampler in CUDA. (#9785 ) - CUDA implementation. - Extract the broadcasting logic, we will need the context parameter after revamping the collective implementation. - Some changes to the event loop for fixing a deadlock in CI. - Move argsort into algorithms.cuh, add support for cuda stream.	2023-11-17 04:29:08 +08:00
Jiaming Yuan	06bdc15e9b	[coll] Pass context to various functions. (#9772 ) In the future, the `Context` object will be required for collective operations. This PR passes the context object to some required functions to prepare for swapping out the implementation.	2023-11-08 09:54:05 +08:00
Jiaming Yuan	7a02facc9d	Serialize expand entry for allgather. (#9702 )	2023-10-24 14:33:28 +08:00
Jiaming Yuan	8c676c889d	Remove internal use of gpu_id. (#9568 )	2023-09-20 23:29:51 +08:00
Rong Ou	66a0832778	Add tests for gpu_approx (#9553 )	2023-09-07 17:21:58 +08:00
Rong Ou	9bab06cbca	Support column split in gpu hist updater (#9384 )	2023-08-31 18:09:35 +08:00
Jiaming Yuan	972730cde0	Use matrix for gradient. (#9508 ) - Use the `linalg::Matrix` for storing gradients. - New API for the custom objective. - Custom objective for multi-class/multi-target is now required to return the correct shape. - Custom objective for Python can accept arrays with any strides. (row-major, column-major)	2023-08-24 05:29:52 +08:00
Rong Ou	6103dca0bb	Support column split in GPU evaluate splits (#9511 )	2023-08-23 16:33:43 +08:00
Jiaming Yuan	1caa93221a	Use `realloc` for histogram cache and expose the cache limit. (#9455 )	2023-08-10 14:05:27 +08:00
Jiaming Yuan	54029a59af	Bound the size of the histogram cache. (#9440 ) - A new histogram collection with a limit in size. - Unify histogram building logic between hist, multi-hist, and approx.	2023-08-08 03:21:26 +08:00
Jiaming Yuan	1332ff787f	Unify the code path between local and distributed training. (#9433 ) This removes the need for a local histogram space during distributed training, which cuts the cache size by half.	2023-08-03 21:46:36 +08:00
Jiaming Yuan	e93a274823	Small cleanup for histogram routines. (#9427 ) - Extract hist train param from GPU hist. - Make histogram const after construction. - Unify parameter names.	2023-08-02 18:28:26 +08:00
Jiaming Yuan	912e341d57	Initial GPU support for the approx tree method. (#9414 )	2023-07-31 15:50:28 +08:00
Rong Ou	7579905e18	Retry switching to per-thread default stream (#9416 )	2023-07-26 07:09:12 +08:00


277 Commits