xgboost

Author	SHA1	Message	Date
Jiaming Yuan	8d7fe262d9	[EM] Enable access to the number of batches. (#10691 ) - Expose `NumBatches` in `DMatrix`. - Small cleanup for removing legacy CUDA stream and ~force CUDA context initialization~. - Purge old external memory data generation code.	2024-08-17 02:59:45 +08:00
Jiaming Yuan	abe65e3769	Reduce thread contention in column split histogram test. (#10708 )	2024-08-17 01:00:32 +08:00
Jiaming Yuan	582ea104b5	[EM] Enable prediction cache for GPU. (#10707 ) - Use `UpdatePosition` for all nodes and skip `FinalizePosition` when external memory is used. - Create `encode/decode` for node position; this is just a refactor. - Reuse code between update position and finalization.	2024-08-15 21:41:59 +08:00
Jiaming Yuan	34b154c284	Avoid the use of size_t in the partitioner. (#10541 ) - Use `Span` instead of `Elem` where `node_id` is not needed. - Remove the `const_cast`. - Make sure the constness is not removed in the `Elem` by making it reference only. `size_t` is implementation-defined, which causes issues when we want to pass a pointer or span.	2024-07-11 00:43:08 +08:00
Jiaming Yuan	a5a58102e5	Revamp the rabit implementation. (#10112 ) This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features: - Federated learning for both CPU and GPU. - NCCL. - More data types. - A unified interface for all the underlying implementations. - Improved timeout handling for both tracker and workers. - Exhaustive tests with metrics (fixed a couple of bugs along the way). - A reusable tracker for Python and JVM packages.	2024-05-20 11:56:23 +08:00
Jiaming Yuan	53fc17578f	Use `std::uint64_t` for row index. (#10120 ) - Use std::uint64_t instead of size_t to avoid implementation-defined type. - Rename to bst_idx_t, to account for other types of indexing. - Small cleanup to the base header.	2024-03-15 18:43:49 +08:00
Jiaming Yuan	fedd9674c8	Implement column sampler in CUDA. (#9785 ) - CUDA implementation. - Extract the broadcasting logic; we will need the context parameter after revamping the collective implementation. - Some changes to the event loop for fixing a deadlock in CI. - Move argsort into algorithms.cuh, add support for CUDA stream.	2023-11-17 04:29:08 +08:00
Jiaming Yuan	06bdc15e9b	[coll] Pass context to various functions. (#9772 ) In the future, the `Context` object will be required for collective operations. This PR passes the context object to some required functions to prepare for swapping out the implementation.	2023-11-08 09:54:05 +08:00
Jiaming Yuan	7a02facc9d	Serialize expand entry for allgather. (#9702 )	2023-10-24 14:33:28 +08:00
Jiaming Yuan	8c676c889d	Remove internal use of gpu_id. (#9568 )	2023-09-20 23:29:51 +08:00
Jiaming Yuan	1caa93221a	Use `realloc` for histogram cache and expose the cache limit. (#9455 )	2023-08-10 14:05:27 +08:00
Jiaming Yuan	54029a59af	Bound the size of the histogram cache. (#9440 ) - A new histogram collection with a limit in size. - Unify histogram building logic between hist, multi-hist, and approx.	2023-08-08 03:21:26 +08:00
Jiaming Yuan	1332ff787f	Unify the code path between local and distributed training. (#9433 ) This removes the need for a local histogram space during distributed training, which cuts the cache size by half.	2023-08-03 21:46:36 +08:00
Jiaming Yuan	22b0a55a04	Remove hist builder class. (#9400 ) * Clean up this stateless class. * Add comment to thread block.	2023-07-22 10:43:12 +08:00
Jiaming Yuan	152e2fb072	Unify test helpers for creating ctx. (#9274 )	2023-06-10 03:35:22 +08:00
Rong Ou	5b69534b43	Support column split in multi-target `hist` (#9171 )	2023-05-26 16:56:05 +08:00
Jiaming Yuan	08ce495b5d	Use Booster context in DMatrix. (#8896 ) - Pass context from booster to DMatrix. - Use context instead of integer for `n_threads`. - Check the consistency of the configuration for `max_bin`. - Test for all combinations of initialization options.	2023-04-28 21:47:14 +08:00
Jiaming Yuan	8685556af2	Implement hist evaluator for multi-target tree. (#8908 )	2023-03-15 01:42:51 +08:00
Jiaming Yuan	5ba3509dd3	Define multi expand entry. (#8895 )	2023-03-13 19:31:05 +08:00
Jiaming Yuan	228a46e8ad	Support learning rate for zero-hessian objectives. (#8866 )	2023-03-06 20:33:28 +08:00
Rong Ou	a65ad0bd9c	Support column split in histogram builder (#8811 )	2023-02-17 22:37:01 +08:00
Jiaming Yuan	282b1729da	Specify the number of threads for parallel sort. (#8735 ) - Pass context object into argsort. - Replace macros with inline functions.	2023-02-16 00:20:19 +08:00
Jiaming Yuan	3760cede0f	Consistent use of context to specify number of threads. (#8733 ) - Use context in all tests. - Use context in R. - Use context in C API DMatrix initialization. (0 threads is used as the default.)	2023-01-30 15:25:31 +08:00
Jiaming Yuan	e49e0998c0	Extract CPU sampling routines. (#8697 )	2023-01-19 23:28:18 +08:00
Dmitry Razdoburdin	5bd849f1b5	Unify the partitioner for hist and approx. Co-authored-by: dmitry.razdoburdin <drazdobu@jfldaal005.jf.intel.com> Co-authored-by: jiamingy <jm.yuan@outlook.com>	2022-10-20 02:49:20 +08:00
Rory Mitchell	8f77677193	Use quantised gradients in gpu_hist histograms (#8246 )	2022-09-26 17:35:35 +02:00
Dmitry Razdoburdin	eb7bbee2c9	Optional by-column histogram build. (#8233 ) Co-authored-by: dmitry.razdoburdin <drazdobu@jfldaal005.jf.intel.com>	2022-09-22 05:16:13 +08:00
Jiaming Yuan	b5eb36f1af	Add `max_cat_threshold` to GPU and handle missing cat values. (#8212 )	2022-09-07 00:57:51 +08:00
Jiaming Yuan	4a4e5c7c18	Prepare gradient index for Quantile DMatrix. (#8103 ) - Implement push batch with adapter batch. - Implement `GetFvalue` for prediction.	2022-07-22 17:26:33 +08:00
Jiaming Yuan	142a208a90	Fix compiler warnings. (#8022 ) - Remove/fix unused parameters - Remove deprecated code in rabit. - Update dmlc-core.	2022-06-22 21:29:10 +08:00
Jiaming Yuan	bde4f25794	Handle missing categorical value in CPU evaluator. (#7948 )	2022-05-27 14:15:47 +08:00
Jiaming Yuan	18a38f7ca0	Refactor for GHistIndex. (#7923 ) * Pass sparse page as adapter, which prepares for quantile dmatrix. * Remove old external memory code like `rbegin` and extra `Init` function. * Simplify type dispatch.	2022-05-23 23:04:53 +08:00
Jiaming Yuan	4fcfd9c96e	Fix and cleanup for column matrix. (#7901 ) * Fix missed type dispatching for dense columns with missing values. * Code cleanup to reduce special cases. * Reduce memory usage.	2022-05-16 21:11:50 +08:00
Jiaming Yuan	1b6538b4e5	[breaking] Drop single precision histogram (#7892 ) Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2022-05-13 19:54:55 +08:00
Jiaming Yuan	317d7be6ee	Always use partition based categorical splits. (#7857 )	2022-05-03 22:30:32 +08:00
Jiaming Yuan	4d81c741e9	External memory support for hist (#7531 ) * Generate column matrix from gHistIndex. * Avoid synchronization with the sparse page once the cache is written. * Cleanups: Remove member variables/functions, change the update routine to look like approx and gpu_hist. * Remove pruner.	2022-03-22 00:13:20 +08:00
Jiaming Yuan	83a66b4994	Support categorical data for hist. (#7695 ) * Extract partitioner from hist. * Implement categorical data support by passing the gradient index directly into the partitioner. * Organize/update document. * Remove code for negative hessian.	2022-02-25 03:47:14 +08:00
Jiaming Yuan	6762c45494	Small cleanup to gradient index and hist. (#7668 ) * Code comments. * Const accessor to index. * Remove some weird variables in the `Index` class. * Simplify the `MemStackAllocator`.	2022-02-23 11:37:21 +08:00
Jiaming Yuan	0d0abe1845	Support optimal partitioning for GPU hist. (#7652 ) * Implement `MaxCategory` in quantile. * Implement partition-based split for GPU evaluation. Currently, it's based on the existing evaluation function. * Extract an evaluator from GPU Hist to store the needed states. * Added some CUDA stream/event utilities. * Update document with references. * Fixed a bug in approx evaluator where the number of data points is less than the number of categories.	2022-02-15 03:03:12 +08:00
Jiaming Yuan	2775c2a1ab	Prepare external memory support for hist. (#7638 ) This PR prepares the GHistIndexMatrix to host the column matrix, which is used by the hist tree method, by accepting the `sparse_threshold` parameter. Some cleanups are made to ensure the correct batch param is passed into DMatrix, along with some additional tests for the correctness of SimpleDMatrix.	2022-02-10 16:58:02 +08:00
Jiaming Yuan	5817840858	Remove `omp_get_max_threads` in data. (#7588 )	2022-01-24 02:44:07 +08:00
Jiaming Yuan	9ab73f737e	Extract Sketch Entry from hist maker. (#7503 ) * Add a new sketch container for sorted inputs. * Optimize bin search.	2021-12-18 05:36:56 +08:00
Jiaming Yuan	bf7bb575b4	Test CPU histogram with cat data. (#7465 )	2021-11-27 00:43:28 +08:00
Jiaming Yuan	176110a22d	Support external memory in CPU histogram building. (#7372 )	2021-11-23 01:13:33 +08:00
Jiaming Yuan	d7d1b6e3a6	CPU evaluation for cat data. (#7393 ) * Implementation for one-hot-based splits. * Implementation for partition-based splits. (LightGBM)	2021-11-06 14:41:35 +08:00
Jiaming Yuan	8d7c6366d7	Accept histogram cut instead of gradient index in evaluation. (#7336 )	2021-10-20 18:04:46 +08:00
Jiaming Yuan	8e619010d0	Extract CPUExpandEntry and HistParam. (#7321 ) * Remove kRootNid. * Check for empty hessian.	2021-10-17 14:22:25 +08:00
Jiaming Yuan	149f209af6	Extract histogram builder from CPU Hist. (#7152 ) * Extract the CPU histogram builder. * Fix tests. * Reduce number of histograms being built.	2021-08-09 21:15:21 +08:00
Jiaming Yuan	615ab2b03e	Extract evaluate splits from CPU hist. (#7079 ) Other than modularizing the split evaluation function, this PR also removes some more functions, including `InitNewNodes` and `BuildNodeStats`, along with some other unused variables. Also, scattered code like setting leaf weights is grouped into the split evaluator, and `NodeEntry` is simplified and made private. Another subtle difference from the original implementation is that the modified code doesn't call `tree[nidx].Parent()` to traverse upward.	2021-07-07 15:16:25 +08:00

49 Commits