Jiaming Yuan
3fbb221fec
[coll] Implement shutdown for tracker and comm. (#10208)
- Force the tracker to shut down.
- Implement a shutdown notice for the error-handling thread in comm.
2024-04-20 04:08:17 +08:00
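The following minimal Python sketch illustrates the general pattern behind a shutdown notice for an error-handling thread. It assumes, hypothetically, that the thread blocks on a queue of error reports. `ErrorWatcher`, `report`, and `shutdown` are illustrative names, not the comm API. Pushing a sentinel wakes the blocked thread so that it returns and can be joined explicitly, instead of being left waiting until a destructor runs.

```python
import queue
import threading

# Hypothetical sentinel used as the shutdown notice.
_SHUTDOWN = object()


class ErrorWatcher:
    """Hypothetical stand-in for a comm object's error-handling thread."""

    def __init__(self) -> None:
        self._events: "queue.Queue[object]" = queue.Queue()
        self.errors: list = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        while True:
            # Blocks until an error report or the shutdown notice arrives.
            event = self._events.get()
            if event is _SHUTDOWN:
                return  # exit cleanly instead of waiting forever
            self.errors.append(event)

    def report(self, msg: str) -> None:
        self._events.put(msg)

    def shutdown(self) -> bool:
        # Send the notice, then join so the thread never outlives the object.
        self._events.put(_SHUTDOWN)
        self._thread.join()
        return not self._thread.is_alive()


w = ErrorWatcher()
w.report("peer 1 disconnected")
print(w.shutdown(), w.errors)  # True ['peer 1 disconnected']
```

Because the queue is FIFO, any errors reported before the notice are still handled before the thread exits.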
Jiaming Yuan
8bad677c2f
Update collective implementation. (#10152)
* Update collective implementation.
- Clean up resources during `Finalize` to avoid handling threads in the destructor.
- Calculate the size for allgather automatically.
- Use a simple allgather for small allreduce calls (smaller than the number of workers).
2024-03-30 18:57:31 +08:00
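The small-allreduce path above can be sketched in Python with simulated ranks. This is a sketch under stated assumptions: "small" is taken to mean the buffer has fewer elements than there are workers, each rank's buffer is a plain list, and `allgather`, `allreduce`, and `fallback_allreduce` are hypothetical names. The fallback only stands in for whatever algorithm handles larger buffers, which the commit does not specify.

```python
import operator
from functools import reduce
from typing import Callable, List, Tuple

Buffer = List[int]
Op = Callable[[int, int], int]


def allgather(buffers: List[Buffer]) -> List[List[Buffer]]:
    # Simulated allgather: every rank receives a copy of all ranks' buffers, in rank order.
    return [[list(buf) for buf in buffers] for _ in buffers]


def allgather_allreduce(buffers: List[Buffer], op: Op) -> List[Buffer]:
    gathered = allgather(buffers)
    # Each rank reduces locally: element i is op folded over element i of every rank.
    return [[reduce(op, column) for column in zip(*rows)] for rows in gathered]


def fallback_allreduce(buffers: List[Buffer], op: Op) -> List[Buffer]:
    # Hypothetical placeholder for the path used with larger buffers (e.g. a ring);
    # here it simply reduces once and copies the result to every rank.
    result = [reduce(op, column) for column in zip(*buffers)]
    return [list(result) for _ in buffers]


def allreduce(buffers: List[Buffer], op: Op = operator.add) -> Tuple[List[Buffer], str]:
    world = len(buffers)
    n = len(buffers[0])
    if n < world:  # assumed threshold: fewer elements than workers
        return allgather_allreduce(buffers, op), "allgather"
    return fallback_allreduce(buffers, op), "fallback"


out, path = allreduce([[1, 2], [3, 4], [5, 6], [7, 8]])
print(path, out)  # allgather [[16, 20], [16, 20], [16, 20], [16, 20]]
```

When the buffer is tiny, each rank receives only `world * n` values. Gathering everything and reducing locally then costs a single collective step, and the multi-step exchanges of bandwidth-oriented algorithms are not needed.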
Jiaming Yuan
ca4801f81d
Work with IPv6 in the new tracker. (#10125)
2024-03-20 05:19:23 +08:00
Jiaming Yuan
4da4e092b5
[coll] Improvements and fixes for tracker and allreduce. (#9745)
- Allow the tracker to wait.
- Fix allreduce type cast.
- Return args from the federated tracker.
2023-11-02 04:06:46 +08:00
Jiaming Yuan
6755179e77
[coll] Add nccl. (#9726)
2023-10-28 16:33:58 +08:00
Jiaming Yuan
b771f58453
[coll] Define interface for bridging. (#9695)
* Define the basic interface that will be shared by the nccl, federated, and native implementations.
2023-10-20 16:20:48 +08:00