49 Commits

Author SHA1 Message Date
Jiaming Yuan
8d7fe262d9
[EM] Enable access to the number of batches. (#10691)
- Expose `NumBatches` in `DMatrix`.
- Small cleanup for removing legacy CUDA stream and ~force CUDA context initialization~.
- Purge old external memory data generation code.
2024-08-17 02:59:45 +08:00
Jiaming Yuan
abe65e3769
Reduce thread contention in column split histogram test. (#10708) 2024-08-17 01:00:32 +08:00
Jiaming Yuan
582ea104b5
[EM] Enable prediction cache for GPU. (#10707)
- Use `UpdatePosition` for all nodes and skip `FinalizePosition` when external memory is used.
- Create `encode/decode` for node position, this is just as a refactor.
- Reuse code between update position and finalization.
2024-08-15 21:41:59 +08:00
Jiaming Yuan
34b154c284
Avoid the use of size_t in the partitioner. (#10541)
- Avoid the use of size_t in the partitioner.
- Use `Span` instead of `Elem` where `node_id` is not needed.
- Remove the `const_cast`.
- Make sure the constness is not removed in the `Elem` by making it reference only.

size_t is implementation-defined, which causes issue when we want to pass pointer or span.
2024-07-11 00:43:08 +08:00
Jiaming Yuan
a5a58102e5
Revamp the rabit implementation. (#10112)
This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features:
- Federated learning for both CPU and GPU.
- NCCL.
- More data types.
- A unified interface for all the underlying implementations.
- Improved timeout handling for both tracker and workers.
- Exhausted tests with metrics (fixed a couple of bugs along the way).
- A reusable tracker for Python and JVM packages.
2024-05-20 11:56:23 +08:00
Jiaming Yuan
53fc17578f
Use std::uint64_t for row index. (#10120)
- Use std::uint64_t instead of size_t to avoid implementation-defined type.
- Rename to bst_idx_t, to account for other types of indexing.
- Small cleanup to the base header.
2024-03-15 18:43:49 +08:00
Jiaming Yuan
fedd9674c8
Implement column sampler in CUDA. (#9785)
- CUDA implementation.
- Extract the broadcasting logic, we will need the context parameter after revamping the collective implementation.
- Some changes to the event loop for fixing a deadlock in CI.
- Move argsort into algorithms.cuh, add support for cuda stream.
2023-11-17 04:29:08 +08:00
Jiaming Yuan
06bdc15e9b
[coll] Pass context to various functions. (#9772)
* [coll] Pass context to various functions.

In the future, the `Context` object would be required for collective operations, this PR
passes the context object to some required functions to prepare for swapping out the
implementation.
2023-11-08 09:54:05 +08:00
Jiaming Yuan
7a02facc9d
Serialize expand entry for allgather. (#9702) 2023-10-24 14:33:28 +08:00
Jiaming Yuan
8c676c889d
Remove internal use of gpu_id. (#9568) 2023-09-20 23:29:51 +08:00
Jiaming Yuan
1caa93221a
Use realloc for histogram cache and expose the cache limit. (#9455) 2023-08-10 14:05:27 +08:00
Jiaming Yuan
54029a59af
Bound the size of the histogram cache. (#9440)
- A new histogram collection with a limit in size.
- Unify histogram building logic between hist, multi-hist, and approx.
2023-08-08 03:21:26 +08:00
Jiaming Yuan
1332ff787f
Unify the code path between local and distributed training. (#9433)
This removes the need for a local histogram space during distributed training, which cuts the cache size by half.
2023-08-03 21:46:36 +08:00
Jiaming Yuan
22b0a55a04
Remove hist builder class. (#9400)
* Remove hist build class.

* Cleanup this stateless class.

* Add comment to thread block.
2023-07-22 10:43:12 +08:00
Jiaming Yuan
152e2fb072
Unify test helpers for creating ctx. (#9274) 2023-06-10 03:35:22 +08:00
Rong Ou
5b69534b43
Support column split in multi-target hist (#9171) 2023-05-26 16:56:05 +08:00
Jiaming Yuan
08ce495b5d
Use Booster context in DMatrix. (#8896)
- Pass context from booster to DMatrix.
- Use context instead of integer for `n_threads`.
- Check the consistency configuration for `max_bin`.
- Test for all combinations of initialization options.
2023-04-28 21:47:14 +08:00
Jiaming Yuan
8685556af2
Implement hist evaluator for multi-target tree. (#8908) 2023-03-15 01:42:51 +08:00
Jiaming Yuan
5ba3509dd3
Define multi expand entry. (#8895) 2023-03-13 19:31:05 +08:00
Jiaming Yuan
228a46e8ad
Support learning rate for zero-hessian objectives. (#8866) 2023-03-06 20:33:28 +08:00
Rong Ou
a65ad0bd9c
Support column split in histogram builder (#8811) 2023-02-17 22:37:01 +08:00
Jiaming Yuan
282b1729da
Specify the number of threads for parallel sort. (#8735)
* Specify the number of threads for parallel sort.

- Pass context object into argsort.
- Replace macros with inline functions.
2023-02-16 00:20:19 +08:00
Jiaming Yuan
3760cede0f
Consistent use of context to specify number of threads. (#8733)
- Use context in all tests.
- Use context in R.
- Use context in C API DMatrix initialization. (0 threads is used as dft).
2023-01-30 15:25:31 +08:00
Jiaming Yuan
e49e0998c0
Extract CPU sampling routines. (#8697) 2023-01-19 23:28:18 +08:00
Dmitry Razdoburdin
5bd849f1b5
Unify the partitioner for hist and approx.
Co-authored-by: dmitry.razdoburdin <drazdobu@jfldaal005.jf.intel.com>
Co-authored-by: jiamingy <jm.yuan@outlook.com>
2022-10-20 02:49:20 +08:00
Rory Mitchell
8f77677193
Use quantised gradients in gpu_hist histograms (#8246) 2022-09-26 17:35:35 +02:00
Dmitry Razdoburdin
eb7bbee2c9
Optional by-column histogram build. (#8233)
Co-authored-by: dmitry.razdoburdin <drazdobu@jfldaal005.jf.intel.com>
2022-09-22 05:16:13 +08:00
Jiaming Yuan
b5eb36f1af
Add max_cat_threshold to GPU and handle missing cat values. (#8212) 2022-09-07 00:57:51 +08:00
Jiaming Yuan
4a4e5c7c18
Prepare gradient index for Quantile DMatrix. (#8103)
* Prepare gradient index for Quantile DMatrix.

- Implement push batch with adapter batch.
- Implement `GetFvalue` for prediction.
2022-07-22 17:26:33 +08:00
Jiaming Yuan
142a208a90
Fix compiler warnings. (#8022)
- Remove/fix unused parameters
- Remove deprecated code in rabit.
- Update dmlc-core.
2022-06-22 21:29:10 +08:00
Jiaming Yuan
bde4f25794
Handle missing categorical value in CPU evaluator. (#7948) 2022-05-27 14:15:47 +08:00
Jiaming Yuan
18a38f7ca0
Refactor for GHistIndex. (#7923)
* Pass sparse page as adapter, which prepares for quantile dmatrix.
* Remove old external memory code like `rbegin` and extra `Init` function.
* Simplify type dispatch.
2022-05-23 23:04:53 +08:00
Jiaming Yuan
4fcfd9c96e
Fix and cleanup for column matrix. (#7901)
* Fix missed type dispatching for dense columns with missing values.
* Code cleanup to reduce special cases.
* Reduce memory usage.
2022-05-16 21:11:50 +08:00
Jiaming Yuan
1b6538b4e5
[breaking] Drop single precision histogram (#7892)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2022-05-13 19:54:55 +08:00
Jiaming Yuan
317d7be6ee
Always use partition based categorical splits. (#7857) 2022-05-03 22:30:32 +08:00
Jiaming Yuan
4d81c741e9
External memory support for hist (#7531)
* Generate column matrix from gHistIndex.
* Avoid synchronization with the sparse page once the cache is written.
* Cleanups: Remove member variables/functions, change the update routine to look like approx and gpu_hist.
* Remove pruner.
2022-03-22 00:13:20 +08:00
Jiaming Yuan
83a66b4994
Support categorical data for hist. (#7695)
* Extract partitioner from hist.
* Implement categorical data support by passing the gradient index directly into the partitioner.
* Organize/update document.
* Remove code for negative hessian.
2022-02-25 03:47:14 +08:00
Jiaming Yuan
6762c45494
Small cleanup to gradient index and hist. (#7668)
* Code comments.
* Const accessor to index.
* Remove some weird variables in the `Index` class.
* Simplify the `MemStackAllocator`.
2022-02-23 11:37:21 +08:00
Jiaming Yuan
0d0abe1845
Support optimal partitioning for GPU hist. (#7652)
* Implement `MaxCategory` in quantile.
* Implement partition-based split for GPU evaluation.  Currently, it's based on the existing evaluation function.
* Extract an evaluator from GPU Hist to store the needed states.
* Added some CUDA stream/event utilities.
* Update document with references.
* Fixed a bug in approx evaluator where the number of data points is less than the number of categories.
2022-02-15 03:03:12 +08:00
Jiaming Yuan
2775c2a1ab
Prepare external memory support for hist. (#7638)
This PR prepares the GHistIndexMatrix to host the column matrix which is used by the hist tree method by accepting sparse_threshold parameter.

Some cleanups are made to ensure the correct batch param is being passed into DMatrix along with some additional tests for correctness of SimpleDMatrix.
2022-02-10 16:58:02 +08:00
Jiaming Yuan
5817840858
Remove omp_get_max_threads in data. (#7588) 2022-01-24 02:44:07 +08:00
Jiaming Yuan
9ab73f737e
Extract Sketch Entry from hist maker. (#7503)
* Extract Sketch Entry from hist maker.

* Add a new sketch container for sorted inputs.
* Optimize bin search.
2021-12-18 05:36:56 +08:00
Jiaming Yuan
bf7bb575b4
Test CPU histogram with cat data. (#7465) 2021-11-27 00:43:28 +08:00
Jiaming Yuan
176110a22d
Support external memory in CPU histogram building. (#7372) 2021-11-23 01:13:33 +08:00
Jiaming Yuan
d7d1b6e3a6
CPU evaluation for cat data. (#7393)
* Implementation for one hot based.
* Implementation for partition based. (LightGBM)
2021-11-06 14:41:35 +08:00
Jiaming Yuan
8d7c6366d7
Accept histogram cut instead gradient index in evaluation. (#7336) 2021-10-20 18:04:46 +08:00
Jiaming Yuan
8e619010d0
Extract CPUExpandEntry and HistParam. (#7321)
* Remove kRootNid.
* Check for empty hessian.
2021-10-17 14:22:25 +08:00
Jiaming Yuan
149f209af6
Extract histogram builder from CPU Hist. (#7152)
* Extract the CPU histogram builder.
* Fix tests.
* Reduce number of histograms being built.
2021-08-09 21:15:21 +08:00
Jiaming Yuan
615ab2b03e
Extract evaluate splits from CPU hist. (#7079)
Other than modularizing the split evaluation function, this PR also removes some more functions including `InitNewNodes` and `BuildNodeStats` among some other unused variables.  Also, scattered code like setting leaf weights is grouped into the split evaluator and `NodeEntry` is simplified and made private.  Another subtle difference with the original implementation is that the modified code doesn't call `tree[nidx].Parent()` to traversal upward.
2021-07-07 15:16:25 +08:00