Rong Ou | c2b85ab68a | Clean up MGPU C++ tests (#9430) | 2023-08-02 14:31:18 +08:00
Rong Ou | 15ca12a77e | Fix NCCL test hang (#9367) | 2023-07-07 11:21:35 +08:00
Rong Ou | f90771eec6 | Fix device communicator dependency (#9346) | 2023-06-29 10:34:30 +08:00
Rong Ou | d8beb517ed | Support bitwise allreduce in NCCL communicator (#9300) | 2023-06-17 01:56:50 +08:00
Rong Ou | e70810be8a | Refactor device communicator to make allreduce more flexible (#9295) | 2023-06-14 03:53:03 +08:00
Jiaming Yuan | ea04d4c46c | [doc] [dask] Troubleshooting NCCL errors. (#8943) | 2023-03-22 22:17:26 +08:00
Rong Ou | a2686543a9 | Common interface for collective communication (#8057) | 2022-09-12 15:21:12 -07:00
* implement broadcast for federated communicator
* implement allreduce
* add communicator factory
* add device adapter
* add device communicator to factory
* add rabit communicator
* add rabit communicator to the factory
* add nccl device communicator
* add synchronize to device communicator
* add back print and getprocessorname
* add python wrapper and c api
* clean up types
* fix non-gpu build
* try to fix ci
* fix std::size_t
* portable string compare ignore case
* c style size_t
* fix lint errors
* cross platform setenv
* fix memory leak
* fix lint errors
* address review feedback
* add python test for rabit communicator
* fix failing gtest
* use json to configure communicators
* fix lint error
* get rid of factories
* fix cpu build
* fix include
* fix python import
* don't export collective.py yet
* skip collective communicator pytest on windows
* add review feedback
* update documentation
* remove mpi communicator type
* fix tests
* shutdown the communicator separately
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>