14 Commits

Author SHA1 Message Date
Jiaming Yuan
e6eefea5e2
[coll] Move the rabit poll helper. (#10349) 2024-05-31 08:02:21 +08:00
Jiaming Yuan
a5a58102e5
Revamp the rabit implementation. (#10112)
This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features:
- Federated learning for both CPU and GPU.
- NCCL.
- More data types.
- A unified interface for all the underlying implementations.
- Improved timeout handling for both tracker and workers.
- Exhausted tests with metrics (fixed a couple of bugs along the way).
- A reusable tracker for Python and JVM packages.
2024-05-20 11:56:23 +08:00
Jiaming Yuan
3fbb221fec
[coll] Implement shutdown for tracker and comm. (#10208)
- Force shutdown the tracker.
- Implement shutdown notice for error handling thread in comm.
2024-04-20 04:08:17 +08:00
Jiaming Yuan
4b10200456
[coll] Improve event loop. (#10199)
- Add a test for blocking calls.
- Do not require the queue to be empty after waking up; this frees up the thread to answer blocking calls.
- Handle EOF in read.
- Improve the error message in the result. Allow concatenation of multiple results.
2024-04-18 03:29:52 +08:00
Jiaming Yuan
ca4801f81d
Work with IPv6 in the new tracker. (#10125) 2024-03-20 05:19:23 +08:00
Jiaming Yuan
5ac233280e
Require context in aggregators. (#10075) 2024-02-28 03:12:42 +08:00
Jiaming Yuan
ada377c57e
[coll] Reduce the scope of lock in the event loop. (#9784) 2023-11-15 14:16:19 +08:00
Jiaming Yuan
946ae1c440
[coll] Implement a new tracker and a communicator. (#9650)
* [coll] Implement a new tracker and a communicator.

The new tracker and communicators communicate through the use of JSON documents. Along
with which, communicators are aware of each other.
2023-10-12 12:49:16 +08:00
Jiaming Yuan
b14e535e78
[Coll] Implement get host address in libxgboost. (#9644)
- Port `xgboost.tracker.get_host_ip` in C++.
2023-10-10 10:01:14 +08:00
Jiaming Yuan
b438d684d2
Utilities and cleanups for socket. (#9576)
- Use c++-17 nodiscard and nested ns.
- Add bind method to socket.
- Remove rabit parameters.
2023-09-14 01:41:42 +08:00
Jiaming Yuan
ccfc90e4c6
[rabit] Improved connection handling. (#9531)
- Enable timeout.
- Report connection error from the system.
- Handle retry for both tracker connection and peer connection.
2023-08-30 13:00:04 +08:00
Jiaming Yuan
bcb55d3b6a
Portable macro definition. (#8999) 2023-03-31 20:48:59 +08:00
Gavin Zhang
0c6266bc4a
SO_DOMAIN do not support on IBM i, using getsockname instead (#8437)
Co-authored-by: GavinZhang <zhanggan@cn.ibm.com>
2022-11-10 23:54:57 +08:00
Jiaming Yuan
b791446623
Initial support for IPv6 (#8225)
- Merge rabit socket into XGBoost.
- Dask interface support.
- Add test to the socket.
2022-09-21 18:06:50 +08:00