699 Commits

Author SHA1 Message Date
Dmitry Razdoburdin
381f1d3dc9
Add support for inference on SYCL devices (#9800)
Co-authored-by: Dmitry Razdoburdin <>
Co-authored-by: Nikolay Petrov <nikolay.a.petrov@intel.com>
Co-authored-by: Alexandra <alexandra.epanchinzeva@intel.com>
2023-12-04 16:15:57 +08:00
Jiaming Yuan
8fe1a2213c
Cleanup code for distributed training. (#9805)
- Merge `GetNcclResult` into nccl stub.
- Split up utilities from the main dask module.
- Let Channel return `Result` to accommodate nccl channel.
- Remove old `use_label_encoder` parameter.
2023-11-25 09:10:56 +08:00
Jiaming Yuan
0715ab3c10
Use dlopen to load NCCL. (#9796)
This PR adds optional support for loading nccl with `dlopen` as an alternative to compile-time linking. This addresses the size bloat of the PyPI binary release.
- Add CMake option to load `nccl` at runtime.
- Add an NCCL stub.

After this change, `nccl` is fetched from PyPI when XGBoost is installed with pip, whether directly by a user or through a `pyproject.toml` dependency. Those who prefer to link `nccl` at compile time can continue to do so without any change.

At the moment, this is Linux-only, since multi-node multi-GPU (MNMG) training is supported only on Linux.
2023-11-22 19:27:31 +08:00
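
The runtime-loading idea in the commit above can be illustrated with a minimal Python sketch. This is not XGBoost's NCCL stub (which lives in the C++ code base); `ctypes.CDLL` is used only because it wraps `dlopen` on Linux. The helper name `load_optional` is hypothetical, and the soname `libnccl.so.2` is an assumption about how NCCL is usually installed.

```python
import ctypes
from typing import Optional


def load_optional(name: str) -> Optional[ctypes.CDLL]:
    """Try to load a shared library at runtime; return None if it is missing.

    Hypothetical helper for illustration. On Linux, ctypes.CDLL calls dlopen.
    """
    try:
        return ctypes.CDLL(name)
    except OSError:
        # The library is not installed or cannot be loaded. The caller decides
        # whether to fall back or to raise a descriptive error.
        return None


if __name__ == "__main__":
    # Assumed soname; adjust for the local installation.
    lib = load_optional("libnccl.so.2")
    print("NCCL available" if lib is not None else "NCCL not found")
```

Loading at runtime moves the failure from link time to first use, so a binary built this way can still be imported on machines without the library.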
Jiaming Yuan
fedd9674c8
Implement column sampler in CUDA. (#9785)
- CUDA implementation.
- Extract the broadcasting logic; we will need the context parameter after revamping the collective implementation.
- Some changes to the event loop for fixing a deadlock in CI.
- Move argsort into algorithms.cuh, add support for cuda stream.
2023-11-17 04:29:08 +08:00
Jiaming Yuan
ada377c57e
[coll] Reduce the scope of lock in the event loop. (#9784) 2023-11-15 14:16:19 +08:00
Jiaming Yuan
6fd4a30667
[coll] Increase timeout for allgather test. (#9777) 2023-11-09 05:26:40 +08:00
Jiaming Yuan
44099f585d
[coll] Add C API for the tracker. (#9773) 2023-11-08 18:17:14 +08:00
Jiaming Yuan
06bdc15e9b
[coll] Pass context to various functions. (#9772)
In the future, the `Context` object will be required for collective operations. This PR
passes the context object to the functions that need it, in preparation for swapping out
the implementation.
2023-11-08 09:54:05 +08:00
Jiaming Yuan
6c0a190f6d
[coll] Add comm group. (#9759)
- Implement `CommGroup` for double dispatching.
- Small cleanup to tracker for handling abort.
2023-11-07 11:12:31 +08:00
Jiaming Yuan
4da4e092b5
[coll] Improvements and fixes for tracker and allreduce. (#9745)
- Allow the tracker to wait.
- Fix allreduce type cast
- Return args from the federated tracker.
2023-11-02 04:06:46 +08:00
Jiaming Yuan
bc995a4865
[coll] Add federated coll. (#9738)
- Define a new data type; the proto file is copied for now.
- Merge client and communicator into `FederatedColl`.
- Define CUDA variant.
- Migrate tests for CPU, add tests for CUDA.
2023-11-01 04:06:46 +08:00
Philip Hyunsu Cho
6b98305db4
[CI] Enable gmock in gtest (#9737) 2023-10-31 20:09:35 +08:00
Jiaming Yuan
80390e6cb6
[coll] Federated comm. (#9732) 2023-10-31 02:39:55 +08:00
Jiaming Yuan
6755179e77
[coll] Add nccl. (#9726) 2023-10-28 16:33:58 +08:00
Dmitry Razdoburdin
f41a08fda8
Add 'sycl' devices to the context (#9691)
Co-authored-by: Dmitry Razdoburdin <>
2023-10-26 22:17:56 +08:00
Jiaming Yuan
7a02facc9d
Serialize expand entry for allgather. (#9702) 2023-10-24 14:33:28 +08:00
Philip Hyunsu Cho
5e6cb63a56
[CI] Set up CI for Mac M1 (#9699) 2023-10-22 23:33:19 -07:00
Jiaming Yuan
b771f58453
[coll] Define interface for bridging. (#9695)
* Define the basic interface that will be shared by nccl, federated and native.
2023-10-20 16:20:48 +08:00
Philip Hyunsu Cho
3b86260b50
Fix build for AppleClang 11 (#9684) (#9693) 2023-10-18 12:27:21 -07:00
Jiaming Yuan
5d1bcde719
[coll] allgatherv. (#9688) 2023-10-19 03:13:50 +08:00
Jiaming Yuan
4c0e4422d0
[coll] allgather. (#9681) 2023-10-18 10:22:18 +08:00
Jiaming Yuan
48ac9b6cbe
[coll] Allreduce. (#9679) 2023-10-17 13:57:14 +08:00
Rong Ou
da6803b75b
Support column-wise data split with in-memory inputs (#9628)
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2023-10-17 12:16:39 +08:00
James Lamb
eb562d3829
[CI] address cmakelint warnings about whitespace (#9674) 2023-10-14 12:46:07 +08:00
Jiaming Yuan
53049b16b8
[coll] Broadcast. (#9659) 2023-10-14 09:34:37 +08:00
Rong Ou
e164d51c43
Improve allgather functions (#9649) 2023-10-12 23:31:43 +08:00
Jiaming Yuan
946ae1c440
[coll] Implement a new tracker and a communicator. (#9650)
The new tracker and communicators communicate through JSON documents. In addition,
communicators are aware of each other.
2023-10-12 12:49:16 +08:00
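
As a rough illustration of exchanging JSON documents over a byte stream, the sketch below frames each document with a 4-byte length prefix. The framing, the function names, and the fields in the usage example are all assumptions made for illustration; the commit message does not describe XGBoost's actual wire format.

```python
import json
import struct


def encode_message(doc: dict) -> bytes:
    """Serialize a dict as UTF-8 JSON with a 4-byte big-endian length prefix."""
    payload = json.dumps(doc).encode("utf-8")
    return struct.pack("!I", len(payload)) + payload


def decode_message(data: bytes) -> dict:
    """Read the length prefix, then parse exactly that many bytes as JSON."""
    (length,) = struct.unpack("!I", data[:4])
    return json.loads(data[4:4 + length].decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical fields a worker might send to a tracker.
    hello = {"cmd": "start", "rank": 0, "world_size": 4}
    assert decode_message(encode_message(hello)) == hello
```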
James Lamb
2e42f33fc1
[CI] standardize else() and endfunction() calls in CMake scripts (#9653) 2023-10-12 11:14:19 +08:00
Rong Ou
0ecb4de963
[breaking] Change DMatrix construction to be distributed (#9623)
* Change column-split DMatrix construction to be distributed

* Remove splitting code for row split
2023-10-10 23:35:57 +08:00
Jiaming Yuan
b14e535e78
[Coll] Implement get host address in libxgboost. (#9644)
- Port `xgboost.tracker.get_host_ip` in C++.
2023-10-10 10:01:14 +08:00
Jiaming Yuan
680d53db43
Extract JSON utils. (#9645) 2023-10-10 07:15:14 +08:00
James Lamb
db8d117f7e
[CI] standardize endif() calls in CMake scripts (#9637) 2023-10-08 11:45:20 +08:00
Rong Ou
3f2093fb81
Test monotone constraints with column split (#9613) 2023-09-28 04:54:53 +08:00
Rong Ou
d6d14d0fb9
Integration tests for interaction constraints with column-wise data split (#9611) 2023-09-27 08:27:43 +08:00
Rong Ou
290b17ffda
Test column sampler with column-wise data split (#9609) 2023-09-26 13:31:23 +08:00
Rong Ou
def77870f3
Test categorical features with column-split gpu quantile (#9595) 2023-09-23 09:55:09 +08:00
Jiaming Yuan
8c676c889d
Remove internal use of gpu_id. (#9568) 2023-09-20 23:29:51 +08:00
Jiaming Yuan
38ac52dd87
Build a simple event loop for collective. (#9593) 2023-09-20 02:09:07 +08:00
Rong Ou
d8c3cc92ae
More support for column split in gpu predictor (#9562) 2023-09-14 08:13:13 +08:00
Jiaming Yuan
300f9ace06
Fix default metric configuration. (#9575) 2023-09-13 13:05:47 -07:00
Jiaming Yuan
b438d684d2
Utilities and cleanups for socket. (#9576)
- Use C++17 `[[nodiscard]]` and nested namespaces.
- Add bind method to socket.
- Remove rabit parameters.
2023-09-14 01:41:42 +08:00
Rong Ou
66a0832778
Add tests for gpu_approx (#9553) 2023-09-07 17:21:58 +08:00
Jiaming Yuan
adea842c83
Fix inplace predict with fallback when base margin is used. (#9536)
- Copy meta info from proxy DMatrix.
- Use `std::call_once` to emit fewer warnings.
2023-09-05 01:04:24 +08:00
Rong Ou
c928dd4ff5
Support vertical federated learning with gpu_hist (#9539) 2023-09-03 11:37:11 +08:00
Rong Ou
9bab06cbca
Support column split in gpu hist updater (#9384) 2023-08-31 18:09:35 +08:00
Jiaming Yuan
ccfc90e4c6
[rabit] Improved connection handling. (#9531)
- Enable timeout.
- Report connection error from the system.
- Handle retry for both tracker connection and peer connection.
2023-08-30 13:00:04 +08:00
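
The timeout-and-retry pattern listed in the commit above can be sketched in Python as follows. This is a generic illustration, not the rabit implementation: the helper name `connect_with_retry` and its parameters are hypothetical, and the connection attempt is passed in as a callable so the same loop could cover both tracker and peer connections.

```python
import socket
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    retries: int = 3,
    backoff_seconds: float = 0.0,
) -> T:
    """Call `connect` up to `retries` times, reporting the last system error."""
    last_error: Optional[OSError] = None
    for _ in range(retries):
        try:
            return connect()
        except OSError as err:  # includes timeouts and refused connections
            last_error = err
            time.sleep(backoff_seconds)
    raise ConnectionError(f"failed after {retries} attempts: {last_error}") from last_error


def open_socket(host: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Example use: a TCP connection with a per-attempt timeout."""
    return connect_with_retry(
        lambda: socket.create_connection((host, port), timeout=timeout)
    )
```

Including the underlying `OSError` in the final message keeps the system's own error text visible to the user.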
Jiaming Yuan
ddf2e68821
Use the new DeviceOrd in the linalg module. (#9527) 2023-08-29 13:37:29 +08:00
Jiaming Yuan
972730cde0
Use matrix for gradient. (#9508)
- Use the `linalg::Matrix` for storing gradients.
- New API for the custom objective.
- Custom objective for multi-class/multi-target is now required to return the correct shape.
- Custom objective for Python can accept arrays with any strides. (row-major, column-major)
2023-08-24 05:29:52 +08:00
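
To make the shape requirement concrete, here is a dependency-free sketch of a multi-class softmax objective that returns gradients and Hessians as `n_samples x n_classes` matrices. Plain nested lists stand in for arrays to keep the example self-contained; the function name and the diagonal Hessian approximation are illustrative assumptions, not XGBoost's own code.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax_grad_hess(scores: Matrix, labels: List[int]) -> Tuple[Matrix, Matrix]:
    """Return (grad, hess), each with one row per sample and one column per class."""
    grad: Matrix = []
    hess: Matrix = []
    for row, label in zip(scores, labels):
        shift = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - shift) for v in row]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Gradient of cross-entropy w.r.t. the raw scores: p_k - 1[k == label].
        grad.append([p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)])
        # Diagonal of the softmax Hessian, floored to stay positive.
        hess.append([max(p * (1.0 - p), 1e-6) for p in probs])
    return grad, hess
```

Both outputs keep the two-dimensional layout rather than being flattened into a single vector of length `n_samples * n_classes`.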
Rong Ou
6103dca0bb
Support column split in GPU evaluate splits (#9511) 2023-08-23 16:33:43 +08:00
Jiaming Yuan
3c09399f29
Fix device dispatch for linear updater. (#9507) 2023-08-23 00:17:35 +08:00