1675 Commits

Author SHA1 Message Date
Jiaming Yuan
d94f6679fc [EM] Avoid synchronous calls and unnecessary ATS access. (#10811)
- Pass context into various functions.
- Factor out some CUDA algorithms.
- Use ATS only for update position.
2024-09-10 14:33:14 +08:00
Jiaming Yuan
ed5f33df16 [EM] Multi-level quantile sketching for GPU. (#10813) 2024-09-10 13:08:34 +08:00
Jiaming Yuan
5f7f31d464 [EM] Refactor ellpack construction. (#10810)
- Remove the calculation of n_symbols in the accessor.
- Pack initialization steps into the parameter list.
- Pass the context into various ctors.
- Specialization for dense data to prepare for further compression.
2024-09-09 14:10:10 +08:00
Samuel Marks
4503555274 POSIX compliant poll.h and mmap over sys/poll.h and mmap64 (#10767) 2024-09-01 15:47:30 +08:00
Jiaming Yuan
e1a2c1bbb3 [EM] Merge GPU partitioning with histogram building. (#10766)
- Stop concatenating pages if there's no subsampling.
- Use a single iteration for histogram build and partitioning.
2024-08-31 03:25:37 +08:00
Jiaming Yuan
34d4ab455e [EM] Avoid stream sync in quantile sketching. (#10765)
2024-08-30 12:33:24 +08:00
Jiaming Yuan
61dd854a52 [EM] Refactor GPU histogram builder. (#10764)
- Expose the maximum number of cached nodes to be consistent with the CPU implementation; this also makes testing easier.
- Extract the subtraction trick for easier testing.
- Split up the `GradientQuantiser` to avoid circular dependency.
2024-08-30 02:39:14 +08:00
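The "subtraction trick" named in the commit above is commonly understood in histogram-based tree building as follows: a parent node's gradient histogram is the bin-wise sum of its children's histograms, so only one child has to be built from the data and the sibling follows by subtraction. A minimal Python sketch under that assumption; `build_histogram` and `subtract_histogram` are hypothetical names for illustration, not the functions this commit extracts.

```python
def build_histogram(bin_indices, gradients, n_bins):
    """Sum the gradients that fall into each bin for one node's rows."""
    hist = [0.0] * n_bins
    for b, g in zip(bin_indices, gradients):
        hist[b] += g
    return hist


def subtract_histogram(parent, child):
    """Derive the sibling histogram as parent - child, bin by bin."""
    return [p - c for p, c in zip(parent, child)]


# Toy data: six rows, three bins; rows 0-2 go left, rows 3-5 go right.
bins = [0, 1, 2, 0, 1, 2]
grads = [0.5, -1.0, 2.0, 1.5, 0.25, -0.75]

parent = build_histogram(bins, grads, 3)
left = build_histogram(bins[:3], grads[:3], 3)
right = subtract_histogram(parent, left)  # no second pass over the rows
```

Keeping the subtraction in its own function, as the commit describes, means it can be checked against a directly built histogram without running a full training step.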
Jiaming Yuan
34937fea41 [EM] Python wrapper for the ExtMemQuantileDMatrix. (#10762)
Not exposed in the documentation yet.

- Add C API.
- Add Python API.
- Basic CPU tests.
2024-08-29 04:08:25 +08:00
Jiaming Yuan
7510a87466 [EM] Reuse the quantile container. (#10761)
Use the push method to merge the quantiles instead of creating multiple containers. This
reduces memory usage through consistent pruning.
2024-08-29 01:39:55 +08:00
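The idea in the commit above, pushing every batch into one container and pruning after each push, can be sketched as below. This is only an illustration under simplifying assumptions: `QuantileContainer`, its `capacity` parameter, and the evenly spaced pruning rule are hypothetical and unweighted, and are not the sketch structure XGBoost uses.

```python
class QuantileContainer:
    """Toy summary that holds at most `capacity` sorted sample values."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed to be at least 2
        self.values = []

    def push(self, batch):
        """Merge a batch into the summary, then prune back to capacity."""
        merged = sorted(self.values + list(batch))
        self.values = self._prune(merged)

    def _prune(self, merged):
        if len(merged) <= self.capacity:
            return merged
        # Keep evenly spaced entries, always retaining the min and max.
        step = (len(merged) - 1) / (self.capacity - 1)
        return [merged[round(i * step)] for i in range(self.capacity)]


batches = [range(0, 100), range(100, 200), range(200, 300)]

container = QuantileContainer(capacity=16)
for batch in batches:
    container.push(batch)  # summary size is bounded by `capacity` after each push
```

With one container per batch, three summaries would be alive until a final merge; here only a single bounded summary persists between batches.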
Jiaming Yuan
4fe67f10b4 [EM] Have one partitioner for each batch. (#10760)
- Initialize one partitioner for each batch.
- Collect partition size during initialization.
- Support base ridx in the finalization.
2024-08-29 01:35:17 +08:00
Jiaming Yuan
bde1265caf [EM] Return a full DMatrix instead of an Ellpack from the GPU sampler. (#10753) 2024-08-28 01:05:11 +08:00
Jiaming Yuan
d6ebcfb032 [EM] Support CPU quantile objective for external memory. (#10751) 2024-08-27 04:16:57 +08:00
Jiaming Yuan
25966e4ba8 [EM] Pass batch parameter into extmem format. (#10736)
- Allow customization for format reading.
- Customize the number of pre-fetch batches.
2024-08-27 02:37:50 +08:00
Jiaming Yuan
fd0138c91c [coll] Improve column split tests with named threads. (#10735) 2024-08-24 12:43:47 +08:00
Jiaming Yuan
55aef8f546 [EM] Avoid resizing host cache. (#10734)
- Add SAM allocator and resource.
- Use page-based cache instead of stream-based cache.
2024-08-23 06:34:01 +08:00
Jiaming Yuan
142bdc73ec [EM] Support SHAP contribution with QDM. (#10724)
- Add GPU support.
- Add external memory support.
- Update the GPU tree shap.
2024-08-22 05:25:10 +08:00
Jiaming Yuan
cb54374550 Update clang-tidy. (#10730)
- Install cmake using pip.
- Fix compile command generation.
- Clean up the tidy script and remove the need to load the yaml file.
- Fix modernized type traits.
- Fix span class. Polymorphism support is dropped.
2024-08-22 04:12:18 +08:00
Dmitry Razdoburdin
24d225c1ab [SYCL] Implement UpdatePredictionCache and connect updater with learner. (#10701)
---------

Co-authored-by: Dmitry Razdoburdin <>
2024-08-22 02:07:44 +08:00
Jiaming Yuan
9b88495840 [multi] Implement weight feature importance. (#10700) 2024-08-22 02:06:47 +08:00
Jiaming Yuan
402e7837fb Fix potential race in feature constraint. (#10719) 2024-08-21 16:50:31 +08:00
Jiaming Yuan
508ac13243 Check cub errors. (#10721)
- Make sure the CUDA error returned by the cub scan is caught.
- Avoid temporary buffer allocation in thrust device vector.
2024-08-21 02:50:26 +08:00
Jiaming Yuan
ec3f327c20 Add managed memory allocator. (#10711) 2024-08-17 03:02:34 +08:00
Jiaming Yuan
8d7fe262d9 [EM] Enable access to the number of batches. (#10691)
- Expose `NumBatches` in `DMatrix`.
- Small cleanup for removing legacy CUDA stream and ~force CUDA context initialization~.
- Purge old external memory data generation code.
2024-08-17 02:59:45 +08:00
Jiaming Yuan
033a666900 [EM] Log the page size of ellpack. (#10713) 2024-08-17 01:35:47 +08:00
Jiaming Yuan
582ea104b5 [EM] Enable prediction cache for GPU. (#10707)
- Use `UpdatePosition` for all nodes and skip `FinalizePosition` when external memory is used.
- Create `encode/decode` for node position; this is just a refactor.
- Reuse code between update position and finalization.
2024-08-15 21:41:59 +08:00
Jiaming Yuan
2ecc85ffad [EM] Support ExtMemQdm in the GPU predictor. (#10694) 2024-08-13 12:21:11 +08:00
Jiaming Yuan
43704549a2 [coll] Reduce the amount of open files (socket). (#10693)
Reduce the chance of hitting ``Failed to call `socket`: Too many open files``.
2024-08-13 05:23:49 +08:00
Jiaming Yuan
d414fdf2e7 [EM] Add GPU version of the external memory QDM. (#10689) 2024-08-10 10:49:43 +08:00
Jiaming Yuan
7bccc1ea2c [EM] CPU implementation for external memory QDM. (#10682)
- A new DMatrix type.
- Extract common code into a new QDM base class.

Not yet working:
- Not exposed to the interface yet, will wait for the GPU implementation.
- ~No meta info yet, still working on the source.~
- Exporting data to CSR is not supported yet.
2024-08-09 09:38:02 +08:00
Jiaming Yuan
cc3b56fc37 Cleanup GPU Hist tests. (#10677)
- Remove GPU Hist gradient sampling test. The same properties are tested in the gradient
  sampler test suite.
- Move basic histogram tests into the histogram test suite.
- Remove the header inclusion of the `updater_gpu_hist.cu` in tests.
2024-08-06 11:50:44 +08:00
Jiaming Yuan
77c844cef7 Reduce thread contention in column split tests. (#10658)
---------

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2024-08-01 18:36:46 +08:00
Jiaming Yuan
449be7a402 Quick fix for clang-tidy error. (#10641) 2024-07-26 18:21:16 +08:00
Dmitry Razdoburdin
7720272870 [sycl] add split applications and tests (#10636)
Co-authored-by: Dmitry Razdoburdin <>
2024-07-26 15:25:49 +08:00
Bobby Wang
7949a8d5f4 [jvm-packages] support missing value when constructing dmatrix with iterator (#10628) 2024-07-23 23:25:07 +08:00
Jiaming Yuan
485d90218c Catch exceptions during file read. (#10623) 2024-07-23 03:48:19 +08:00
Jiaming Yuan
a19bbc9be5 Avoid caching allocator for large allocations. (#10582) 2024-07-23 03:48:03 +08:00
Jiaming Yuan
b2cae34a8e Fix integer overflow. (#10615) 2024-07-23 02:13:15 +08:00
Jiaming Yuan
6d9fcb771e Move device histogram storage into histogram.cuh. (#10608) 2024-07-21 14:10:13 +08:00
Jiaming Yuan
cb62f9e73b [EM] Prevent init with CUDA malloc resource. (#10606) 2024-07-21 05:08:29 +08:00
Jiaming Yuan
292bb677e5 [EM] Support mmap backed ellpack. (#10602)
- Support resource view in ellpack.
- Define the CUDA version of MMAP resource.
- Define the CUDA version of malloc resource.
- Refactor cuda runtime API wrappers, and add memory access related wrappers.
- Gather Windows macros into a single header.
2024-07-18 08:20:21 +08:00
Jiaming Yuan
e9fbce9791 Refactor DeviceUVector. (#10595)
Create a wrapper instead of using inheritance to avoid an inconsistent class interface.
2024-07-18 03:33:01 +08:00
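The design choice in the commit above, wrapping a container rather than inheriting from it, can be illustrated with a small Python sketch. A subclass exposes every method of its base, including ones that do not fit the intended interface, while a wrapper exposes only what it defines. `DeviceBuffer` and its methods are hypothetical names for illustration and do not mirror the actual `DeviceUVector` class.

```python
class DeviceBuffer:
    """Hypothetical wrapper: holds a list privately and exposes a narrow interface."""

    def __init__(self):
        self._data = []  # wrapped container, not a base class

    def resize(self, n, fill=0):
        """Grow with `fill` or shrink so that exactly `n` elements remain."""
        if n < len(self._data):
            del self._data[n:]
        else:
            self._data.extend([fill] * (n - len(self._data)))

    def size(self):
        return len(self._data)

    def data(self):
        """Return a read-only snapshot of the contents."""
        return tuple(self._data)


buf = DeviceBuffer()
buf.resize(4, fill=1)
buf.resize(2)
```

Had `DeviceBuffer` subclassed `list`, callers could still reach `append` or `insert` and bypass `resize`; with the wrapper those methods simply do not exist on the object.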
Jiaming Yuan
5a92ffe3ca Partial fix for CTK 12.5 (#10574) 2024-07-16 17:41:50 +08:00
Jiaming Yuan
5fea9d24f2 Small cleanup for CMake scripts. (#10573)
- Remove rabit.
2024-07-12 05:18:23 +08:00
Jiaming Yuan
6c403187ec Fix column split race condition. (#10572) 2024-07-12 01:07:12 +08:00
Jiaming Yuan
1ca4bfd20e Avoid thrust vector initialization. (#10544)
- Add a wrapper for rmm device uvector.
- Split up the `Resize` method for HDV.
2024-07-11 17:29:27 +08:00
Jiaming Yuan
5f910cd4ff [EM] Handle base idx in GPU histogram. (#10549) 2024-07-11 03:26:30 +08:00
Jiaming Yuan
34b154c284 Avoid the use of size_t in the partitioner. (#10541)
- Use `Span` instead of `Elem` where `node_id` is not needed.
- Remove the `const_cast`.
- Make sure the constness is not removed in the `Elem` by making it reference only.

`size_t` is implementation-defined, which causes issues when we want to pass a pointer or span.
2024-07-11 00:43:08 +08:00
Jiaming Yuan
baba3e9eb0 Fix empty partition. (#10559) 2024-07-10 13:01:47 +08:00
Dmitry Razdoburdin
513d7a7d84 [sycl] Reorder if-else statements to allow using CPU branches for SYCL devices (#10543)
* reorder if-else statements for sycl compatibility

* trigger check

---------

Co-authored-by: Dmitry Razdoburdin <>
2024-07-05 16:31:48 +08:00
Jiaming Yuan
620b2b155a Cache GPU histogram kernel configuration. (#10538) 2024-07-04 15:38:59 +08:00