Jiaming Yuan
48ac9b6cbe
[coll] Allreduce. (#9679)
2023-10-17 13:57:14 +08:00
Rong Ou
da6803b75b
Support column-wise data split with in-memory inputs (#9628)
---------
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2023-10-17 12:16:39 +08:00
Jiaming Yuan
53049b16b8
[coll] Broadcast. (#9659)
2023-10-14 09:34:37 +08:00
Your Name
ea19555474
temp merge, disable 1 line, SetValid
2023-10-12 16:16:44 -07:00
Rong Ou
e164d51c43
Improve allgather functions (#9649)
2023-10-12 23:31:43 +08:00
Jiaming Yuan
946ae1c440
[coll] Implement a new tracker and a communicator. (#9650)
* [coll] Implement a new tracker and a communicator.
The new tracker and communicators communicate through the use of JSON documents. Along
with which, communicators are aware of each other.
2023-10-12 12:49:16 +08:00
Jiaming Yuan
b14e535e78
[Coll] Implement get host address in libxgboost. (#9644)
- Port `xgboost.tracker.get_host_ip` in C++.
2023-10-10 10:01:14 +08:00
Jiaming Yuan
8c676c889d
Remove internal use of gpu_id. (#9568)
2023-09-20 23:29:51 +08:00
Jiaming Yuan
38ac52dd87
Build a simple event loop for collective. (#9593)
2023-09-20 02:09:07 +08:00
Jiaming Yuan
b438d684d2
Utilities and cleanups for socket. (#9576)
- Use c++-17 nodiscard and nested ns.
- Add bind method to socket.
- Remove rabit parameters.
2023-09-14 01:41:42 +08:00
Rong Ou
c928dd4ff5
Support vertical federated learning with gpu_hist (#9539)
2023-09-03 11:37:11 +08:00
Rong Ou
9bab06cbca
Support column split in gpu hist updater (#9384)
2023-08-31 18:09:35 +08:00
Jiaming Yuan
ccfc90e4c6
[rabit] Improved connection handling. (#9531)
- Enable timeout.
- Report connection error from the system.
- Handle retry for both tracker connection and peer connection.
2023-08-30 13:00:04 +08:00
Rong Ou
6103dca0bb
Support column split in GPU evaluate splits (#9511)
2023-08-23 16:33:43 +08:00
Rong Ou
c2b85ab68a
Clean up MGPU C++ tests (#9430)
2023-08-02 14:31:18 +08:00
Rong Ou
7579905e18
Retry switching to per-thread default stream (#9416)
2023-07-26 07:09:12 +08:00
Jiaming Yuan
3a9996173e
Revert "Switch to per-thread default stream (#9396)" (#9413)
This reverts commit f7f673b00c15458fb4dd74a2a0d2ba80369c5faf.
2023-07-24 12:03:28 -07:00
Rong Ou
f7f673b00c
Switch to per-thread default stream (#9396)
2023-07-20 08:21:00 +08:00
Rong Ou
15ca12a77e
Fix NCCL test hang (#9367)
2023-07-07 11:21:35 +08:00
Rong Ou
3a0f787703
Support column split in GPU predictor (#9343)
2023-07-03 04:05:34 +08:00
Rong Ou
f90771eec6
Fix device communicator dependency (#9346)
2023-06-29 10:34:30 +08:00
Jiaming Yuan
54da4b3185
Cleanup to prepare for using mmap pointer in external memory. (#9317)
- Update SparseDMatrix comment.
- Use a pointer in the bitfield. We will replace the `std::vector<bool>` in `ColumnMatrix` with bitfield.
- Clean up the page source. The timer is removed as it's inaccurate once we swap the mmap pointer into the page.
2023-06-22 06:43:11 +08:00
Rong Ou
d8beb517ed
Support bitwise allreduce in NCCL communicator (#9300)
2023-06-17 01:56:50 +08:00
amdsc21
5ca7daaa13
merge latest changes
2023-06-15 21:39:14 +02:00
Rong Ou
e70810be8a
Refactor device communicator to make allreduce more flexible (#9295)
2023-06-14 03:53:03 +08:00
Your Name
42867a4805
sync Jun 1
2023-06-01 15:55:06 -07:00
Jiaming Yuan
03bc6e6427
Remove unused variables. (#9210)
- remove used variables.
- Remove signed comparison warnings.
2023-05-28 05:24:15 +08:00
Rong Ou
5b69534b43
Support column split in multi-target hist (#9171)
2023-05-26 16:56:05 +08:00
Rong Ou
acd363033e
Fix running MGPU gtests (#9200)
2023-05-26 05:26:38 +08:00
amdsc21
b22644fc10
add hip.h
2023-05-20 01:25:33 +02:00
amdsc21
8cad8c693c
sync up May15 2023
2023-05-15 18:59:18 +02:00
Rong Ou
52311dcec9
Fix multi-threaded gtests (#9148)
2023-05-10 19:15:32 +08:00
amdsc21
5446c501af
merge 23Mar01
2023-05-02 00:05:58 +02:00
Rong Ou
a320b402a5
More refactoring to take advantage of collective aggregators (#9081)
2023-04-26 03:36:09 +08:00
Rong Ou
8dbe0510de
More collective aggregators (#9060)
2023-04-22 03:32:05 +08:00
Jiaming Yuan
a7b3dd3176
Fix compiler warnings. (#9055)
2023-04-21 02:26:47 +08:00
Rong Ou
42d100de18
Make sure metrics work with federated learning (#9037)
2023-04-19 15:39:11 +08:00
amdsc21
06d9b998ce
fix CAPI BuildInfo
2023-03-28 00:14:18 +02:00
amdsc21
ee582f03c3
rm device_helpers.hip.h from cuh
2023-03-25 23:35:57 +01:00
amdsc21
f67e7de7ef
finished communicator.cu
2023-03-09 21:02:48 +01:00
amdsc21
a56055225a
fix auc.cu
2023-03-09 20:29:38 +01:00
amdsc21
0fc1f640a9
enable rocm, fix nccl_device_communicator.cuh
2023-03-08 06:18:13 +01:00
amdsc21
762fd9028d
enable rocm, fix device_communicator_adapter.cuh
2023-03-08 06:13:29 +01:00
amdsc21
f2009533e1
rm hip.h
2023-03-08 06:04:01 +01:00
amdsc21
6b7be96373
add HIP flags
2023-03-08 01:22:25 +01:00
amdsc21
ed45aa2816
Merge branch 'master' into dev-hui
2023-03-08 00:39:33 +01:00
amdsc21
c51a1c9aae
rename hip.cc to hip
2023-03-07 05:39:53 +01:00
amdsc21
cafbfce51f
add hip.h
2023-03-07 03:46:26 +01:00
amdsc21
6039a71e6c
add hip structure
2023-03-07 02:17:19 +01:00
Jiaming Yuan
4d665b3fb0
Restore clang tidy test. (#8861)
2023-03-03 13:47:04 -08:00