1646 Commits

Author SHA1 Message Date
Jiaming Yuan
cc3b56fc37
Cleanup GPU Hist tests. (#10677)
- Remove GPU Hist gradient sampling test. The same properties are tested in the gradient
  sampler test suite.
- Move basic histogram tests into the histogram test suite.
- Remove the header inclusion of the `updater_gpu_hist.cu` in tests.
2024-08-06 11:50:44 +08:00
Jiaming Yuan
77c844cef7
Reduce thread contention in column split tests. (#10658)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2024-08-01 18:36:46 +08:00
Jiaming Yuan
449be7a402
Quick fix for clang-tidy error. (#10641) 2024-07-26 18:21:16 +08:00
Dmitry Razdoburdin
7720272870
[sycl] add split applications and tests (#10636)
Co-authored-by: Dmitry Razdoburdin <>
2024-07-26 15:25:49 +08:00
Bobby Wang
7949a8d5f4
[jvm-packages] support missing value when constructing dmatrix with iterator (#10628) 2024-07-23 23:25:07 +08:00
Jiaming Yuan
485d90218c
Catch exceptions during file read. (#10623) 2024-07-23 03:48:19 +08:00
Jiaming Yuan
a19bbc9be5
Avoid caching allocator for large allocations. (#10582) 2024-07-23 03:48:03 +08:00
Jiaming Yuan
b2cae34a8e
Fix integer overflow. (#10615) 2024-07-23 02:13:15 +08:00
Jiaming Yuan
6d9fcb771e
Move device histogram storage into histogram.cuh. (#10608) 2024-07-21 14:10:13 +08:00
Jiaming Yuan
cb62f9e73b
[EM] Prevent init with CUDA malloc resource. (#10606) 2024-07-21 05:08:29 +08:00
Jiaming Yuan
292bb677e5
[EM] Support mmap backed ellpack. (#10602)
- Support resource view in ellpack.
- Define the CUDA version of MMAP resource.
- Define the CUDA version of malloc resource.
- Refactor cuda runtime API wrappers, and add memory access related wrappers.
- Gather Windows macros into a single header.
2024-07-18 08:20:21 +08:00
Jiaming Yuan
e9fbce9791
Refactor DeviceUVector. (#10595)
Create a wrapper instead of using inheritance to avoid an inconsistent class interface.
2024-07-18 03:33:01 +08:00
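The wrapper-versus-inheritance trade-off named in the `DeviceUVector` refactor (#10595) can be illustrated with a minimal sketch. The class name `UninitVector`, its methods, and the use of `std::vector` as placeholder storage are assumptions made for illustration; this is not the XGBoost implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper, for illustration only. std::vector is placeholder storage.
// With public inheritance, callers could still reach every base-class method
// (push_back, resize(n, value), ...), even those the derived class never meant to
// support. Holding the storage as a member exposes only the intended interface.
template <typename T>
class UninitVector {
  std::vector<T> data_;  // wrapped storage, not a base class

 public:
  void resize(std::size_t n) { data_.resize(n); }
  std::size_t size() const { return data_.size(); }
  T* data() { return data_.data(); }
  T const* data() const { return data_.data(); }
};

// Small usage example: only the methods declared above are reachable.
inline std::size_t WrapperExample() {
  UninitVector<float> vec;
  vec.resize(4);
  // vec.push_back(1.0f);  // would not compile: not part of the wrapper's interface
  assert(vec.data() != nullptr);
  return vec.size();
}
```

Calling `WrapperExample()` resizes the wrapper and returns its size; any method not listed in the wrapper is rejected at compile time.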
Jiaming Yuan
5a92ffe3ca
Partial fix for CTK 12.5 (#10574) 2024-07-16 17:41:50 +08:00
Jiaming Yuan
5fea9d24f2
Small cleanup for CMake scripts. (#10573)
- Remove rabit.
2024-07-12 05:18:23 +08:00
Jiaming Yuan
6c403187ec
Fix column split race condition. (#10572) 2024-07-12 01:07:12 +08:00
Jiaming Yuan
1ca4bfd20e
Avoid thrust vector initialization. (#10544)
- Add a wrapper for rmm device uvector.
- Split up the `Resize` method for HDV.
2024-07-11 17:29:27 +08:00
Jiaming Yuan
5f910cd4ff
[EM] Handle base idx in GPU histogram. (#10549) 2024-07-11 03:26:30 +08:00
Jiaming Yuan
34b154c284
Avoid the use of size_t in the partitioner. (#10541)
- Use `Span` instead of `Elem` where `node_id` is not needed.
- Remove the `const_cast`.
- Make sure the constness is not removed in the `Elem` by making it reference only.

`size_t` is implementation-defined, which causes issues when we want to pass a pointer or a span.
2024-07-11 00:43:08 +08:00
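The `size_t` concern in the partitioner commit (#10541) can be sketched as follows. `std::size_t` and `std::uint64_t` may or may not be the same type depending on the platform, so a pointer or span of one does not always convert to the other. The helper names below are hypothetical and not taken from XGBoost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical helper: consumes indices through a fixed-width pointer.
inline std::uint64_t SumFixedWidth(std::uint64_t const* ptr, std::size_t n) {
  std::uint64_t sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += ptr[i];
  }
  return sum;
}

// True only on platforms where the two names refer to the same type.
// Where it is false, a `std::size_t*` cannot be passed as a `std::uint64_t*`.
constexpr bool kSizeTIsU64 = std::is_same<std::size_t, std::uint64_t>::value;

inline std::uint64_t SumRowIndices() {
  // Storing fixed-width integers keeps the call below portable. With
  // std::vector<std::size_t>, it would compile only where kSizeTIsU64 is true.
  std::vector<std::uint64_t> ridx{1, 2, 3};
  assert(!ridx.empty());
  return SumFixedWidth(ridx.data(), ridx.size());
}
```

Using a fixed-width element type for stored indices means the same pointer or span type works on every platform, regardless of how `std::size_t` is defined there.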
Jiaming Yuan
baba3e9eb0
Fix empty partition. (#10559) 2024-07-10 13:01:47 +08:00
Dmitry Razdoburdin
513d7a7d84
[sycl] Reorder if-else statements to allow use of CPU branches for SYCL devices (#10543)
* reorder if-else statements for sycl compatibility

* trigger check

Co-authored-by: Dmitry Razdoburdin <>
2024-07-05 16:31:48 +08:00
Jiaming Yuan
620b2b155a
Cache GPU histogram kernel configuration. (#10538) 2024-07-04 15:38:59 +08:00
Jiaming Yuan
628411a654
Enhance the threadpool implementation. (#10531)
- Accept an initialization function.
- Support void return tasks.
2024-07-03 12:13:27 +08:00
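The two thread pool features listed in #10531, an initialization function and tasks that return `void`, can be sketched with standard C++ facilities. `SimplePool`, `Submit`, and `PoolExample` are hypothetical names for illustration; the actual XGBoost class and its signatures may differ.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool, not XGBoost's API.
class SimplePool {
  std::vector<std::thread> workers_;
  std::queue<std::function<void()>> tasks_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool stop_{false};

 public:
  // `init` runs once on every worker thread before it takes any task.
  SimplePool(int n_threads, std::function<void()> init) {
    for (int i = 0; i < n_threads; ++i) {
      workers_.emplace_back([this, init] {
        init();
        while (true) {
          std::function<void()> task;
          {
            std::unique_lock<std::mutex> lock{mu_};
            cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
            if (stop_ && tasks_.empty()) return;
            task = std::move(tasks_.front());
            tasks_.pop();
          }
          task();
        }
      });
    }
  }

  ~SimplePool() {
    {
      std::lock_guard<std::mutex> lock{mu_};
      stop_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();
  }

  // Handles value-returning and void-returning callables alike:
  // std::packaged_task<void()> produces a std::future<void>.
  template <typename Fn>
  auto Submit(Fn&& fn) -> std::future<decltype(fn())> {
    using R = decltype(fn());
    auto task = std::make_shared<std::packaged_task<R()>>(std::forward<Fn>(fn));
    std::future<R> fut = task->get_future();
    {
      std::lock_guard<std::mutex> lock{mu_};
      tasks_.emplace([task] { (*task)(); });
    }
    cv_.notify_one();
    return fut;
  }
};

// Usage example: returns {task result, number of init calls}.
inline std::pair<int, int> PoolExample() {
  std::atomic<int> n_init{0};
  int result = 0;
  {
    SimplePool pool{2, [&n_init] { ++n_init; }};
    auto fut_value = pool.Submit([] { return 40; });
    auto fut_void = pool.Submit([&result] { result += 2; });
    fut_void.get();             // wait for the void task to finish
    result += fut_value.get();  // add the value-returning task's result
  }  // the destructor joins both workers, so both init calls have completed
  assert(n_init.load() == 2);
  return {result, n_init.load()};
}
```

The design choice here is to erase the task type behind `std::function<void()>` and deliver results through `std::future`, which is what lets one `Submit` cover both kinds of task.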
Jiaming Yuan
9cb4c938da
[EM] Move prefetch in reset into the end of the iteration. (#10529) 2024-07-03 03:48:18 +08:00
Jiaming Yuan
d33043a348
[coll] Allow using local host for testing. (#10526)
- Don't try to retrieve the IP address if a host is specified.
- Fix compiler deprecation warning.
2024-07-02 15:34:38 +08:00
Jiaming Yuan
a39fef2c67
[fed] Fixes for the encrypted GRPC backend. (#10503) 2024-07-02 15:15:12 +08:00
Jiaming Yuan
5f0c1e902b
Small cleanup for error message. (#10502)
- The `Fail` function can handle file location automatically.
- Report concatenated error for connection poll.
- Typos.
2024-07-02 13:36:41 +08:00
Philip Hyunsu Cho
09d32f1f2b
Fix build and C++ tests for FreeBSD (#10480) 2024-06-28 01:47:55 -07:00
Jiaming Yuan
e8a962575a
[EM] Allow staging ellpack on host for GPU external memory. (#10488)
- New parameter `on_host`.
- Abstract format creation and stream creation into policy classes.
2024-06-28 04:42:18 +08:00
Jiaming Yuan
2b400b18d5
Small cleanup for rowset collection. (#10401) 2024-06-19 18:06:23 +08:00
Jiaming Yuan
e5f1720656
[EM] Avoid writing cut matrix to cache. (#10444) 2024-06-19 18:03:38 +08:00
Jiaming Yuan
b4cc350ec5
Fix categorical data with external memory. (#10433) 2024-06-18 04:34:54 +08:00
Jiaming Yuan
49e25cfb36
Allow unaligned pointer if the array is empty. (#10418)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2024-06-15 19:10:21 +08:00
Philip Hyunsu Cho
1ace9c66ec
[CI] Fix JVM tests on Windows (#10404) 2024-06-15 00:21:40 -07:00
Jiaming Yuan
0808e50ae8
Sync stream in ellpack format. (#10374) 2024-06-04 12:58:26 +08:00
Jiaming Yuan
4f48647932
Fix typo. (#10353) 2024-06-02 02:07:55 +08:00
Jiaming Yuan
d2d01d977a
Remove unnecessary fetch operations in external memory. (#10342) 2024-05-31 13:16:40 +08:00
Jiaming Yuan
e6eefea5e2
[coll] Move the rabit poll helper. (#10349) 2024-05-31 08:02:21 +08:00
Jiaming Yuan
2de67f0050
[coll] Prevent race during error check. (#10319) 2024-05-28 15:43:16 -07:00
Philip Hyunsu Cho
7ae5c972f9
[CI] Upgrade github workflows to use latest Conda setup action (#10320)
Co-authored-by: Christian Clauss <cclauss@me.com>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2024-05-28 10:23:07 -07:00
Jiaming Yuan
5627af6b21
[coll] Increase timeout limit. (#10332) 2024-05-28 10:20:49 +08:00
Jiaming Yuan
966dc81788
[coll] Keep the tracker alive during initialization error. (#10306) 2024-05-23 11:13:59 +08:00
Jiaming Yuan
d5fcbee44b
Add timeout for distributed tests. (#10315) 2024-05-23 11:11:49 +08:00
Jiaming Yuan
a5a58102e5
Revamp the rabit implementation. (#10112)
This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features:
- Federated learning for both CPU and GPU.
- NCCL.
- More data types.
- A unified interface for all the underlying implementations.
- Improved timeout handling for both tracker and workers.
- Exhaustive tests with metrics (fixed a couple of bugs along the way).
- A reusable tracker for Python and JVM packages.
2024-05-20 11:56:23 +08:00
Jiaming Yuan
835e59e538
Use a thread pool for external memory. (#10288) 2024-05-16 19:32:12 +08:00
Jiaming Yuan
5de57435c7
Be more lenient on floating point error for AUC. (#10264) 2024-05-11 08:48:11 +08:00
Jiaming Yuan
5e64276a9b
Update nvtx. (#10227) 2024-04-29 06:33:46 +08:00
Jiaming Yuan
3fbb221fec
[coll] Implement shutdown for tracker and comm. (#10208)
- Force shutdown the tracker.
- Implement shutdown notice for error handling thread in comm.
2024-04-20 04:08:17 +08:00
Jiaming Yuan
3f64b4fde3
[coll] Add global functions. (#10203) 2024-04-19 03:17:23 +08:00
Jiaming Yuan
4b10200456
[coll] Improve event loop. (#10199)
- Add a test for blocking calls.
- Do not require the queue to be empty after waking up; this frees up the thread to answer blocking calls.
- Handle EOF in read.
- Improve the error message in the result. Allow concatenation of multiple results.
2024-04-18 03:29:52 +08:00
Jiaming Yuan
1022909bbe
Fix global config for external memory. (#10173)
Pass the thread-local configuration between threads.
2024-04-11 01:29:28 +08:00
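The fix described in #10173 concerns thread-local state: a new thread starts with its own default copy of a thread-local object, so the caller's settings must be handed over explicitly. The sketch below shows the general pattern; `Config`, `ThreadLocalConfig`, and `VerbositySeenByWorker` are hypothetical names, not XGBoost's global configuration API.

```cpp
#include <cassert>
#include <thread>

// Hypothetical configuration object, for illustration only.
struct Config {
  int verbosity{1};
};

// Each thread sees its own default-constructed Config.
inline Config& ThreadLocalConfig() {
  thread_local Config config;
  return config;
}

// Returns the verbosity observed inside a worker thread.
inline int VerbositySeenByWorker(bool pass_config) {
  ThreadLocalConfig().verbosity = 3;      // set on the calling thread only
  Config snapshot = ThreadLocalConfig();  // copy to hand over to the worker
  int seen = 0;
  std::thread worker{[&seen, snapshot, pass_config] {
    if (pass_config) {
      ThreadLocalConfig() = snapshot;  // install the caller's config in the worker
    }
    seen = ThreadLocalConfig().verbosity;
  }};
  worker.join();  // join makes the write to `seen` visible to the caller
  assert(seen > 0);
  return seen;
}
```

Without the hand-over the worker observes the default value; with it, the worker observes the value set by the calling thread.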