xgboost

Author	SHA1	Message	Date
Jiaming Yuan	271f4a80e7	Use CUDA virtual memory for pinned memory allocation. (#10850 ) - Add a grow-only virtual memory allocator. - Define a driver API wrapper. Split up the runtime API wrapper.	2024-09-28 04:26:44 +08:00
Jiaming Yuan	e228c1a121	[EM] Make page concatenation optional. (#10826 ) This PR introduces a new parameter `extmem_concat_pages` to make the page concatenation optional for GPU hist. In addition, the document is updated for the new GPU-based external memory.	2024-09-24 06:19:28 +08:00
Jiaming Yuan	24241ed6e3	[EM] Compress dense ellpack. (#10821 ) This helps reduce the memory copying needed for dense data. In addition, it helps reduce memory usage even if external memory is not used. - Decouple the number of symbols needed in the compressor with the number of features when the data is dense. - Remove the fetch call in the `at_end_` iteration. - Reduce synchronization and kernel launches by using the `uvector` and ctx.	2024-09-20 18:20:56 +08:00
Jiaming Yuan	96bbf80457	[EM] Support quantile objectives for GPU-based external memory. (#10820 ) - Improved error message for memory usage. - Support quantile-based objectives for GPU external memory.	2024-09-17 13:27:02 +08:00
Jiaming Yuan	d94f6679fc	[EM] Avoid synchronous calls and unnecessary ATS access. (#10811 ) - Pass context into various functions. - Factor out some CUDA algorithms. - Use ATS only for update position.	2024-09-10 14:33:14 +08:00
Jiaming Yuan	5f7f31d464	[EM] Refactor ellpack construction. (#10810 ) - Remove the calculation of n_symbols in the accessor. - Pack initialization steps into the parameter list. - Pass the context into various ctors. - Specialization for dense data to prepare for further compression.	2024-09-09 14:10:10 +08:00
Jiaming Yuan	e1a2c1bbb3	[EM] Merge GPU partitioning with histogram building. (#10766 ) - Stop concatenating pages if there's no subsampling. - Use a single iteration for histogram build and partitioning.	2024-08-31 03:25:37 +08:00
Jiaming Yuan	61dd854a52	[EM] Refactor GPU histogram builder. (#10764 ) - Expose the maximum number of cached nodes to be consistent with the CPU implementation. Also easier for testing. - Extract the subtraction trick for easier testing. - Split up the `GradientQuantiser` to avoid circular dependency.	2024-08-30 02:39:14 +08:00
Jiaming Yuan	4fe67f10b4	[EM] Have one partitioner for each batch. (#10760 ) - Initialize one partitioner for each batch. - Collect partition size during initialization. - Support base ridx in the finalization.	2024-08-29 01:35:17 +08:00
Jiaming Yuan	bde1265caf	[EM] Return a full DMatrix instead of a Ellpack from the GPU sampler. (#10753 )	2024-08-28 01:05:11 +08:00
Jiaming Yuan	508ac13243	Check cub errors. (#10721 ) - Make sure cuda error returned by cub scan is caught. - Avoid temporary buffer allocation in thrust device vector.	2024-08-21 02:50:26 +08:00
Jiaming Yuan	8d7fe262d9	[EM] Enable access to the number of batches. (#10691 ) - Expose `NumBatches` in `DMatrix`. - Small cleanup for removing legacy CUDA stream and ~force CUDA context initialization~. - Purge old external memory data generation code.	2024-08-17 02:59:45 +08:00
Jiaming Yuan	582ea104b5	[EM] Enable prediction cache for GPU. (#10707 ) - Use `UpdatePosition` for all nodes and skip `FinalizePosition` when external memory is used. - Create `encode/decode` for node position, this is just as a refactor. - Reuse code between update position and finalization.	2024-08-15 21:41:59 +08:00
Jiaming Yuan	cc3b56fc37	Cleanup GPU Hist tests. (#10677 ) - Remove GPU Hist gradient sampling test. The same properties are tested in the gradient sampler test suite. - Move basic histogram tests into the histogram test suite. - Remove the header inclusion of the `updater_gpu_hist.cu` in tests.	2024-08-06 11:50:44 +08:00
Jiaming Yuan	a19bbc9be5	Avoid caching allocator for large allocations. (#10582 )	2024-07-23 03:48:03 +08:00
Jiaming Yuan	6d9fcb771e	Move device histogram storage into `histogram.cuh`. (#10608 )	2024-07-21 14:10:13 +08:00
Jiaming Yuan	292bb677e5	[EM] Support mmap backed ellpack. (#10602 ) - Support resource view in ellpack. - Define the CUDA version of MMAP resource. - Define the CUDA version of malloc resource. - Refactor cuda runtime API wrappers, and add memory access related wrappers. - gather windows macros into a single header.	2024-07-18 08:20:21 +08:00
Jiaming Yuan	5f910cd4ff	[EM] Handle base idx in GPU histogram. (#10549 )	2024-07-11 03:26:30 +08:00
Jiaming Yuan	620b2b155a	Cache GPU histogram kernel configuration. (#10538 )	2024-07-04 15:38:59 +08:00
Jiaming Yuan	a5a58102e5	Revamp the rabit implementation. (#10112 ) This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features: - Federated learning for both CPU and GPU. - NCCL. - More data types. - A unified interface for all the underlying implementations. - Improved timeout handling for both tracker and workers. - Exhausted tests with metrics (fixed a couple of bugs along the way). - A reusable tracker for Python and JVM packages.	2024-05-20 11:56:23 +08:00
Jiaming Yuan	230010d9a0	Cleanup set info. (#10139 ) - Use the array interface internally. - Deprecate `XGDMatrixSetDenseInfo`. - Deprecate `XGDMatrixSetUIntInfo`. - Move the handling of `DataType` into the deprecated C function. --------- Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2024-03-26 23:26:24 +08:00
Jiaming Yuan	53fc17578f	Use `std::uint64_t` for row index. (#10120 ) - Use std::uint64_t instead of size_t to avoid implementation-defined type. - Rename to bst_idx_t, to account for other types of indexing. - Small cleanup to the base header.	2024-03-15 18:43:49 +08:00
Jiaming Yuan	06bdc15e9b	[coll] Pass context to various functions. (#9772 ) In the future, the `Context` object would be required for collective operations; this PR passes the context object to some required functions to prepare for swapping out the implementation.	2023-11-08 09:54:05 +08:00
Jiaming Yuan	7a02facc9d	Serialize expand entry for allgather. (#9702 )	2023-10-24 14:33:28 +08:00
Jiaming Yuan	8c676c889d	Remove internal use of gpu_id. (#9568 )	2023-09-20 23:29:51 +08:00
Rong Ou	9bab06cbca	Support column split in gpu hist updater (#9384 )	2023-08-31 18:09:35 +08:00
Rong Ou	6103dca0bb	Support column split in GPU evaluate splits (#9511 )	2023-08-23 16:33:43 +08:00
Rong Ou	7579905e18	Retry switching to per-thread default stream (#9416 )	2023-07-26 07:09:12 +08:00
Jiaming Yuan	3a9996173e	Revert "Switch to per-thread default stream (#9396 )" (#9413 ) This reverts commit `f7f673b00c`.	2023-07-24 12:03:28 -07:00
Rong Ou	f7f673b00c	Switch to per-thread default stream (#9396 )	2023-07-20 08:21:00 +08:00
Jiaming Yuan	54da4b3185	Cleanup to prepare for using mmap pointer in external memory. (#9317 ) - Update SparseDMatrix comment. - Use a pointer in the bitfield. We will replace the `std::vector<bool>` in `ColumnMatrix` with bitfield. - Clean up the page source. The timer is removed as it's inaccurate once we swap the mmap pointer into the page.	2023-06-22 06:43:11 +08:00
Jiaming Yuan	ee6809e642	Use mmap for external memory. (#9282 ) - Have basic infrastructure for mmap. - Release file write handle.	2023-06-19 18:52:55 +08:00
Jiaming Yuan	08ce495b5d	Use Booster context in DMatrix. (#8896 ) - Pass context from booster to DMatrix. - Use context instead of integer for `n_threads`. - Check the consistency configuration for `max_bin`. - Test for all combinations of initialization options.	2023-04-28 21:47:14 +08:00
Jiaming Yuan	8685556af2	Implement hist evaluator for multi-target tree. (#8908 )	2023-03-15 01:42:51 +08:00
Jiaming Yuan	c6a8754c62	Define CUDA Context. (#8604 ) We will transition to non-default and non-blocking CUDA stream.	2022-12-20 15:15:07 +08:00
Jiaming Yuan	3e26107a9c	Rename and extract `Context`. (#8528 ) * Rename `GenericParameter` to `Context`. * Rename header file to reflect the change. * Rename all references.	2022-12-07 04:58:54 +08:00
Jiaming Yuan	3ef1703553	Allow using string view to find JSON value. (#8332 ) - Allow comparison between string and string view. - Fix compiler warnings.	2022-10-13 17:10:13 +08:00
Rory Mitchell	210915c985	Use integer gradients in gpu_hist split evaluation (#8274 )	2022-10-11 12:16:27 +02:00
Rory Mitchell	8f77677193	Use quantised gradients in gpu_hist histograms (#8246 )	2022-09-26 17:35:35 +02:00
Jiaming Yuan	bc818316f2	Prepare for improving Windows networking compatibility. (#8234 ) * Include dmlc filesystem indirectly as dmlc/filesystem.h includes windows.h, which conflicts with winsock2.h * Define `NOMINMAX` conditionally. * Link the winsock library when mysys32 is used. * Add config file for read the doc.	2022-09-10 15:16:49 +08:00
Jiaming Yuan	b5eb36f1af	Add `max_cat_threshold` to GPU and handle missing cat values. (#8212 )	2022-09-07 00:57:51 +08:00
Rory Mitchell	1be09848a7	Refactor split valuation kernel (#8073 )	2022-07-21 15:41:50 +02:00
Jiaming Yuan	abaa593aa0	Fix compiler warnings. (#8059 ) - Remove unused parameters. - Avoid comparison of different signedness.	2022-07-14 05:29:56 +08:00
Rory Mitchell	794cbaa60a	Fuse split evaluation kernels (#8026 )	2022-07-05 10:24:31 +02:00
Rory Mitchell	bc4f802b17	Batch UpdatePosition using cudaMemcpy (#7964 )	2022-06-30 17:52:40 +02:00
Jiaming Yuan	142a208a90	Fix compiler warnings. (#8022 ) - Remove/fix unused parameters - Remove deprecated code in rabit. - Update dmlc-core.	2022-06-22 21:29:10 +08:00
Jiaming Yuan	9b0eb66b78	Fix GPU driver test. (#8008 ) * Initialize the training parameter.	2022-06-20 19:37:31 +08:00
Rory Mitchell	71d3b2e036	Fuse gpu_hist all-reduce calls where possible (#7867 )	2022-05-17 13:27:50 +02:00
Rory Mitchell	7ef54e39ec	Small refactor to categoricals (#7858 )	2022-05-05 17:47:02 +02:00
Jiaming Yuan	317d7be6ee	Always use partition based categorical splits. (#7857 )	2022-05-03 22:30:32 +08:00

79 Commits