Jiaming Yuan
4da4e092b5
[coll] Improvements and fixes for tracker and allreduce. ( #9745 )
- Allow the tracker to wait.
- Fix allreduce type cast.
- Return args from the federated tracker.
2023-11-02 04:06:46 +08:00
Hui Liu
129bb76941
enable federated
2023-10-31 16:31:56 -07:00
Hui Liu
123af45327
Merge branch 'master'
2023-10-31 15:59:31 -07:00
Jiaming Yuan
bc995a4865
[coll] Add federated coll. ( #9738 )
- Define a new data type, the proto file is copied for now.
- Merge client and communicator into `FederatedColl`.
- Define CUDA variant.
- Migrate tests for CPU, add tests for CUDA.
2023-11-01 04:06:46 +08:00
Hui Liu
8fab17ae8f
rm hip.h files
2023-10-30 21:20:28 -07:00
Hui Liu
9b7aa1a7cd
unify cuda to hip
2023-10-30 17:12:06 -07:00
Hui Liu
4eb371b3f0
unify cuda to hip
2023-10-30 17:10:06 -07:00
Hui Liu
6df27eadc9
rm hip_category from source
2023-10-30 16:34:49 -07:00
Hui Liu
02f5464fa6
enable coll and comm
2023-10-30 15:15:05 -07:00
Hui Liu
b6b5218245
enable RCCL
2023-10-30 14:05:04 -07:00
Hui Liu
d7f1235b7d
Merge branch 'master' into sync-condition-2023Oct11
2023-10-30 13:19:33 -07:00
Hui Liu
1bedd76e94
rm un-necessary code
2023-10-30 13:14:45 -07:00
Jiaming Yuan
80390e6cb6
[coll] Federated comm. ( #9732 )
2023-10-31 02:39:55 +08:00
Jiaming Yuan
6755179e77
[coll] Add nccl. ( #9726 )
2023-10-28 16:33:58 +08:00
Hui Liu
32ae49ab92
temp hack for multi GPUs
2023-10-27 13:00:49 -07:00
Hui Liu
6bbca9a8b7
restore learner
2023-10-27 11:15:06 -07:00
Hui Liu
6762230d9a
namespace to reduce code
2023-10-27 10:51:32 -07:00
Hui Liu
4302200a33
Merge branch 'master' into sync-condition-2023Oct11
2023-10-27 10:09:37 -07:00
Hui Liu
4a4b528d54
add namespace aliases to reduce code
2023-10-27 09:11:55 -07:00
Dmitry Razdoburdin
9c22df9342
Fix mingw hanging on regex in context ( #9729 )
---------
Co-authored-by: Dmitry Razdoburdin <>
2023-10-27 20:01:35 +08:00
Dmitry Razdoburdin
f41a08fda8
Add 'sycl' devices to the context ( #9691 )
Co-authored-by: Dmitry Razdoburdin <>
2023-10-26 22:17:56 +08:00
Hui Liu
cd28b9f997
add back per-thread
2023-10-24 15:17:19 -07:00
Hui Liu
3752b06550
Merge branch 'master' into sync-condition-2023Oct11
2023-10-24 10:46:38 -07:00
Jiaming Yuan
7a02facc9d
Serialize expand entry for allgather. ( #9702 )
2023-10-24 14:33:28 +08:00
Hui Liu
79319dfd4d
format
2023-10-23 22:29:48 -07:00
Hui Liu
558352afc9
fix stream
2023-10-23 21:51:20 -07:00
Hui Liu
643b334919
add nccl_device_communicator.hip
2023-10-23 16:43:03 -07:00
Hui Liu
6ba66463b6
fix uuid and Clear/SetValid
2023-10-23 16:32:26 -07:00
Hui Liu
55994b1ac7
enable ROCm on latest XGBoost
2023-10-23 11:15:04 -07:00
Hui Liu
15421e40d9
enable ROCm on latest XGBoost
2023-10-23 11:07:08 -07:00
Philip Hyunsu Cho
5e6cb63a56
[CI] Set up CI for Mac M1 ( #9699 )
2023-10-22 23:33:19 -07:00
Jiaming Yuan
b771f58453
[coll] Define interface for bridging. ( #9695 )
* Define the basic interface that will be shared by nccl, federated and native.
2023-10-20 16:20:48 +08:00
Philip Hyunsu Cho
3b86260b50
Fix build for AppleClang 11 ( #9684 ) ( #9693 )
2023-10-18 12:27:21 -07:00
Jiaming Yuan
5d1bcde719
[coll] allgatherv. ( #9688 )
2023-10-19 03:13:50 +08:00
Dmitry Razdoburdin
ea9f09716b
Reorder if-else statements to allow using of cpu branches for sycl-devices ( #9682 )
2023-10-18 10:55:33 +08:00
Jiaming Yuan
4c0e4422d0
[coll] allgather. ( #9681 )
2023-10-18 10:22:18 +08:00
Your Name
ffbbc9c968
add cuda to hip wrapper
2023-10-17 12:42:37 -07:00
Jiaming Yuan
48ac9b6cbe
[coll] Allreduce. ( #9679 )
2023-10-17 13:57:14 +08:00
Rong Ou
da6803b75b
Support column-wise data split with in-memory inputs ( #9628 )
---------
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2023-10-17 12:16:39 +08:00
James Lamb
eb562d3829
[CI] address cmakelint warnings about whitespace ( #9674 )
2023-10-14 12:46:07 +08:00
Jiaming Yuan
53049b16b8
[coll] Broadcast. ( #9659 )
2023-10-14 09:34:37 +08:00
Your Name
ea19555474
temp merge, disable 1 line, SetValid
2023-10-12 16:16:44 -07:00
Rong Ou
e164d51c43
Improve allgather functions ( #9649 )
2023-10-12 23:31:43 +08:00
Jiaming Yuan
946ae1c440
[coll] Implement a new tracker and a communicator. ( #9650 )
* [coll] Implement a new tracker and a communicator.
The new tracker and communicators communicate through JSON documents. In addition,
communicators are aware of each other.
2023-10-12 12:49:16 +08:00
Jiaming Yuan
084d89216c
Add support for cgroupv2. ( #9651 )
2023-10-12 09:36:36 +08:00
Rong Ou
0ecb4de963
[breaking] Change DMatrix construction to be distributed ( #9623 )
* Change column-split DMatrix construction to be distributed
* remove splitting code for row split
2023-10-10 23:35:57 +08:00
Jiaming Yuan
b14e535e78
[Coll] Implement get host address in libxgboost. ( #9644 )
- Port `xgboost.tracker.get_host_ip` in C++.
2023-10-10 10:01:14 +08:00
Jiaming Yuan
680d53db43
Extract JSON utils. ( #9645 )
2023-10-10 07:15:14 +08:00
James Lamb
db8d117f7e
[CI] standardize endif() calls in CMake scripts ( #9637 )
2023-10-08 11:45:20 +08:00
Jiaming Yuan
4d7a187cb0
Remove XGBoosterGetModelRaw. ( #9617 )
Deprecated in 1.6.
2023-09-29 02:29:33 +08:00