68 Commits

Author SHA1 Message Date
Jiaming Yuan
8d7fe262d9
[EM] Enable access to the number of batches. (#10691)
- Expose `NumBatches` in `DMatrix`.
- Small cleanup for removing legacy CUDA stream and ~force CUDA context initialization~.
- Purge old external memory data generation code.
2024-08-17 02:59:45 +08:00
Jiaming Yuan
582ea104b5
[EM] Enable prediction cache for GPU. (#10707)
- Use `UpdatePosition` for all nodes and skip `FinalizePosition` when external memory is used.
- Create `encode/decode` for node position, this is just as a refactor.
- Reuse code between update position and finalization.
2024-08-15 21:41:59 +08:00
Jiaming Yuan
cc3b56fc37
Cleanup GPU Hist tests. (#10677)
* Cleanup GPU Hist tests.

- Remove GPU Hist gradient sampling test. The same properties are tested in the gradient
  sampler test suite.
- Move basic histogram tests into the histogram test suite.
- Remove the header inclusion of the `updater_gpu_hist.cu` in tests.
2024-08-06 11:50:44 +08:00
Jiaming Yuan
a19bbc9be5
Avoid caching allocator for large allocations. (#10582) 2024-07-23 03:48:03 +08:00
Jiaming Yuan
6d9fcb771e
Move device histogram storage into histogram.cuh. (#10608) 2024-07-21 14:10:13 +08:00
Jiaming Yuan
292bb677e5
[EM] Support mmap backed ellpack. (#10602)
- Support resource view in ellpack.
- Define the CUDA version of MMAP resource.
- Define the CUDA version of malloc resource.
- Refactor cuda runtime API wrappers, and add memory access related wrappers.
- gather windows macros into a single header.
2024-07-18 08:20:21 +08:00
Jiaming Yuan
5f910cd4ff
[EM] Handle base idx in GPU histogram. (#10549) 2024-07-11 03:26:30 +08:00
Jiaming Yuan
620b2b155a
Cache GPU histogram kernel configuration. (#10538) 2024-07-04 15:38:59 +08:00
Jiaming Yuan
a5a58102e5
Revamp the rabit implementation. (#10112)
This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features:
- Federated learning for both CPU and GPU.
- NCCL.
- More data types.
- A unified interface for all the underlying implementations.
- Improved timeout handling for both tracker and workers.
- Exhausted tests with metrics (fixed a couple of bugs along the way).
- A reusable tracker for Python and JVM packages.
2024-05-20 11:56:23 +08:00
Jiaming Yuan
230010d9a0
Cleanup set info. (#10139)
- Use the array interface internally.
- Deprecate `XGDMatrixSetDenseInfo`.
- Deprecate `XGDMatrixSetUIntInfo`.
- Move the handling of `DataType` into the deprecated C function.

---------

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2024-03-26 23:26:24 +08:00
Jiaming Yuan
53fc17578f
Use std::uint64_t for row index. (#10120)
- Use std::uint64_t instead of size_t to avoid implementation-defined type.
- Rename to bst_idx_t, to account for other types of indexing.
- Small cleanup to the base header.
2024-03-15 18:43:49 +08:00
Jiaming Yuan
06bdc15e9b
[coll] Pass context to various functions. (#9772)
* [coll] Pass context to various functions.

In the future, the `Context` object would be required for collective operations, this PR
passes the context object to some required functions to prepare for swapping out the
implementation.
2023-11-08 09:54:05 +08:00
Jiaming Yuan
7a02facc9d
Serialize expand entry for allgather. (#9702) 2023-10-24 14:33:28 +08:00
Jiaming Yuan
8c676c889d
Remove internal use of gpu_id. (#9568) 2023-09-20 23:29:51 +08:00
Rong Ou
9bab06cbca
Support column split in gpu hist updater (#9384) 2023-08-31 18:09:35 +08:00
Rong Ou
6103dca0bb
Support column split in GPU evaluate splits (#9511) 2023-08-23 16:33:43 +08:00
Rong Ou
7579905e18
Retry switching to per-thread default stream (#9416) 2023-07-26 07:09:12 +08:00
Jiaming Yuan
3a9996173e
Revert "Switch to per-thread default stream (#9396)" (#9413)
This reverts commit f7f673b00c15458fb4dd74a2a0d2ba80369c5faf.
2023-07-24 12:03:28 -07:00
Rong Ou
f7f673b00c
Switch to per-thread default stream (#9396) 2023-07-20 08:21:00 +08:00
Jiaming Yuan
54da4b3185
Cleanup to prepare for using mmap pointer in external memory. (#9317)
- Update SparseDMatrix comment.
- Use a pointer in the bitfield. We will replace the `std::vector<bool>` in `ColumnMatrix` with bitfield.
- Clean up the page source. The timer is removed as it's inaccurate once we swap the mmap pointer into the page.
2023-06-22 06:43:11 +08:00
Jiaming Yuan
ee6809e642
Use mmap for external memory. (#9282)
- Have basic infrastructure for mmap.
- Release file write handle.
2023-06-19 18:52:55 +08:00
Jiaming Yuan
08ce495b5d
Use Booster context in DMatrix. (#8896)
- Pass context from booster to DMatrix.
- Use context instead of integer for `n_threads`.
- Check the consistency configuration for `max_bin`.
- Test for all combinations of initialization options.
2023-04-28 21:47:14 +08:00
Jiaming Yuan
8685556af2
Implement hist evaluator for multi-target tree. (#8908) 2023-03-15 01:42:51 +08:00
Jiaming Yuan
c6a8754c62
Define CUDA Context. (#8604)
We will transition to non-default and non-blocking CUDA stream.
2022-12-20 15:15:07 +08:00
Jiaming Yuan
3e26107a9c
Rename and extract Context. (#8528)
* Rename `GenericParameter` to `Context`.
* Rename header file to reflect the change.
* Rename all references.
2022-12-07 04:58:54 +08:00
Jiaming Yuan
3ef1703553
Allow using string view to find JSON value. (#8332)
- Allow comparison between string and string view.
- Fix compiler warnings.
2022-10-13 17:10:13 +08:00
Rory Mitchell
210915c985
Use integer gradients in gpu_hist split evaluation (#8274) 2022-10-11 12:16:27 +02:00
Rory Mitchell
8f77677193
Use quantised gradients in gpu_hist histograms (#8246) 2022-09-26 17:35:35 +02:00
Jiaming Yuan
bc818316f2
Prepare for improving Windows networking compatibility. (#8234)
* Prepare for improving Windows networking compatibility.

* Include dmlc filesystem indirectly as dmlc/filesystem.h includes windows.h, which
  conflicts with winsock2.h
* Define `NOMINMAX` conditionally.
* Link the winsock library when mysys32 is used.
* Add config file for read the doc.
2022-09-10 15:16:49 +08:00
Jiaming Yuan
b5eb36f1af
Add max_cat_threshold to GPU and handle missing cat values. (#8212) 2022-09-07 00:57:51 +08:00
Rory Mitchell
1be09848a7
Refactor split valuation kernel (#8073) 2022-07-21 15:41:50 +02:00
Jiaming Yuan
abaa593aa0
Fix compiler warnings. (#8059)
- Remove unused parameters.
- Avoid comparison of different signedness.
2022-07-14 05:29:56 +08:00
Rory Mitchell
794cbaa60a
Fuse split evaluation kernels (#8026) 2022-07-05 10:24:31 +02:00
Rory Mitchell
bc4f802b17
Batch UpdatePosition using cudaMemcpy (#7964) 2022-06-30 17:52:40 +02:00
Jiaming Yuan
142a208a90
Fix compiler warnings. (#8022)
- Remove/fix unused parameters
- Remove deprecated code in rabit.
- Update dmlc-core.
2022-06-22 21:29:10 +08:00
Jiaming Yuan
9b0eb66b78
Fix GPU driver test. (#8008)
* Initialize the training parameter.
2022-06-20 19:37:31 +08:00
Rory Mitchell
71d3b2e036
Fuse gpu_hist all-reduce calls where possible (#7867) 2022-05-17 13:27:50 +02:00
Rory Mitchell
7ef54e39ec
Small refactor to categoricals (#7858) 2022-05-05 17:47:02 +02:00
Jiaming Yuan
317d7be6ee
Always use partition based categorical splits. (#7857) 2022-05-03 22:30:32 +08:00
Jiaming Yuan
fdf533f2b9
[POC] Experimental support for l1 error. (#7812)
Support adaptive tree, a feature supported by both sklearn and lightgbm.  The tree leaf is recomputed based on residue of labels and predictions after construction.

For l1 error, the optimal value is the median (50 percentile).

This is marked as experimental support for the following reasons:
- The value is not well defined for distributed training, where we might have empty leaves for local workers. Right now I just use the original leaf value for computing the average with other workers, which might cause significant errors.
- Some follow-ups are required, for exact, pruner, and optimization for quantile function. Also, we need to calculate the initial estimation.
2022-04-26 21:41:55 +08:00
Jiaming Yuan
0d0abe1845
Support optimal partitioning for GPU hist. (#7652)
* Implement `MaxCategory` in quantile.
* Implement partition-based split for GPU evaluation.  Currently, it's based on the existing evaluation function.
* Extract an evaluator from GPU Hist to store the needed states.
* Added some CUDA stream/event utilities.
* Update document with references.
* Fixed a bug in approx evaluator where the number of data points is less than the number of categories.
2022-02-15 03:03:12 +08:00
Jiaming Yuan
7f399eac8b
Use double for GPU Hist node sum. (#7507) 2021-12-22 08:41:35 +08:00
Jiaming Yuan
d7d1b6e3a6
CPU evaluation for cat data. (#7393)
* Implementation for one hot based.
* Implementation for partition based. (LightGBM)
2021-11-06 14:41:35 +08:00
Jiaming Yuan
6ede12412c
Update dmlc-core and use data iter for GPU sampling tests. (#7398)
* Update dmlc-core.
* New parquet parser in dmlc-core.
* Use data iter for GPU sampling tests.
2021-11-06 05:12:49 +08:00
Jiaming Yuan
ccdabe4512
Support building gradient index with cat data. (#7371) 2021-11-03 22:37:37 +08:00
Jiaming Yuan
7a1d67f9cb
[breaking] Use integer atomic for GPU histogram. (#7180)
On GPU we use rouding factor to truncate the gradient for deterministic results. This PR changes the gradient representation to fixed point number with exponent aligned with rounding factor.

    [breaking] Drop non-deterministic histogram.
    Use fixed point for shared memory.

This PR is to improve the performance of GPU Hist. 

Co-authored-by: Andy Adinets <aadinets@nvidia.com>
2021-08-28 05:17:05 +08:00
Jiaming Yuan
bd1f3a38f0
Rewrite sparse dmatrix using callbacks. (#7092)
- Reduce dependency on dmlc parsers and provide an interface for users to load data by themselves.
- Remove use of threaded iterator and IO queue.
- Remove `page_size`.
- Make sure the number of pages in memory is bounded.
- Make sure the cache can not be violated.
- Provide an interface for internal algorithms to process data asynchronously.
2021-07-16 12:33:31 +08:00
ShvetsKS
57c732655e
Merge lossgude and depthwise strategies for CPU hist (#7007)
* fix java/scala test: max depth is also valid parameter for lossguide

Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-06-03 01:49:43 +08:00
Jiaming Yuan
444131a2e6
Add categorical data support to GPU Hist. (#6164) 2020-09-29 11:27:25 +08:00
Jiaming Yuan
14afdb4d92
Support categorical data in ellpack. (#6140) 2020-09-24 19:28:57 +08:00