28 Commits

Author SHA1 Message Date
Jiaming Yuan
4da4e092b5
[coll] Improvements and fixes for tracker and allreduce. (#9745)
- Allow the tracker to wait.
- Fix allreduce type cast.
- Return args from the federated tracker.
2023-11-02 04:06:46 +08:00
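For context, the allreduce being fixed here is exposed through the `xgboost.collective` Python wrapper. A minimal single-worker sketch, assuming the `CommunicatorContext`, `allreduce`, and `Op` surface of this era of the codebase:

```python
import numpy as np
import xgboost.collective as coll

# Single-worker sketch: with no peers, allreduce is an identity operation,
# but the call still exercises the type-dispatch path the fix above touches.
with coll.CommunicatorContext():
    values = np.array([1.0, 2.0, 3.0], dtype=np.float32)
    total = coll.allreduce(values, coll.Op.SUM)  # element-wise sum across workers
    print(coll.get_rank(), total)
```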
Jiaming Yuan
bc995a4865
[coll] Add federated coll. (#9738)
- Define a new data type; the proto file is copied for now.
- Merge client and communicator into `FederatedColl`.
- Define CUDA variant.
- Migrate tests for CPU, add tests for CUDA.
2023-11-01 04:06:46 +08:00
Jiaming Yuan
80390e6cb6
[coll] Federated comm. (#9732) 2023-10-31 02:39:55 +08:00
Rong Ou
e164d51c43
Improve allgather functions (#9649) 2023-10-12 23:31:43 +08:00
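As a reminder of the contract these functions implement: in an allgather, every worker contributes a chunk and every worker receives the concatenation of all chunks in rank order. A toy, dependency-free sketch of that semantics (not the XGBoost implementation):

```python
def allgather(chunks):
    """Toy allgather: rank r contributes chunks[r]; every rank receives
    the concatenation of all contributions, in rank order."""
    gathered = [item for chunk in chunks for item in chunk]
    return [list(gathered) for _ in chunks]  # one identical copy per rank

# rank 0 holds [1, 2], rank 1 holds [3], rank 2 holds [4, 5]
print(allgather([[1, 2], [3], [4, 5]]))  # each rank sees [1, 2, 3, 4, 5]
```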
Jiaming Yuan
b438d684d2
Utilities and cleanups for socket. (#9576)
- Use C++17 `[[nodiscard]]` and nested namespaces.
- Add bind method to socket.
- Remove rabit parameters.
2023-09-14 01:41:42 +08:00
Jiaming Yuan
c1b2cff874
[CI] Check compiler warnings. (#9444) 2023-08-08 12:02:45 -07:00
Philip Hyunsu Cho
a5cd2412de
Replace setup.py with pyproject.toml (#9021)
* Create pyproject.toml
* Implement a custom build backend in the packager directory; the build logic from setup.py has been refactored and migrated into the new backend.
* Tested: `pip wheel .` (build wheel) and `python -m build --sdist .` (source distribution)
2023-04-20 13:51:39 -07:00
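A custom build backend of the kind described here is a Python module exposing the hooks standardized by PEP 517. A minimal skeleton (hook names and signatures are fixed by the PEP; the bodies and module path are illustrative, not XGBoost's actual packager code):

```python
# packager/pep517.py -- illustrative skeleton of a PEP 517 build backend.

def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    # Run the native build (e.g. CMake), copy the shared library into the
    # package, then assemble a .whl. Returns the basename of the wheel
    # written into wheel_directory.
    ...

def build_sdist(sdist_directory, config_settings=None):
    # Archive the sources (including C++ sources) into a .tar.gz and
    # return its basename.
    ...
```

In pyproject.toml, `build-backend` then points at this module, with `backend-path` covering the in-tree directory; that is how front ends like `pip wheel .` and `python -m build --sdist .` discover it.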
Jiaming Yuan
c5c8f643f2
Remove the cub submodule. (#8888)
XGBoost now uses CTK 11.8 for binary packages, so there is no need to maintain a cub submodule anymore.
2023-03-09 19:43:02 -08:00
Rong Ou
cbf98cb9c6
Add Allgather to collective communicator (#8765)
2023-02-09 11:31:22 +08:00
Rong Ou
77b069c25d
Support bitwise allreduce operations in the communicator (#8623) 2022-12-25 06:40:05 +08:00
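Bitwise allreduce folds integer buffers with AND/OR/XOR instead of arithmetic operators, which is useful for combining things like per-worker bitmasks. A toy reduction showing the semantics (plain NumPy, not the communicator API):

```python
import numpy as np
from functools import reduce

# One uint8 bitmask per worker; a bitwise-OR allreduce sets a bit in the
# result if any worker set it locally.
worker_masks = [np.array([0b0011], dtype=np.uint8),
                np.array([0b0101], dtype=np.uint8)]
combined = reduce(np.bitwise_or, worker_masks)
print(bin(int(combined[0])))  # 0b111
```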
Rong Ou
a8255ea678
Add an in-memory collective communicator (#8494) 2022-12-01 00:24:12 +08:00
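An in-memory communicator runs all "workers" as threads inside one process and reduces over shared state, which is what makes multi-worker tests cheap. A toy sketch of the idea (hypothetical, not the actual C++ implementation):

```python
import threading

class InMemoryAllreduce:
    """Toy in-memory allreduce: workers are threads; a barrier plus a
    shared list stands in for the network."""
    def __init__(self, world_size):
        self.contributions = [None] * world_size
        self.barrier = threading.Barrier(world_size)

    def allreduce_sum(self, rank, value):
        self.contributions[rank] = value
        self.barrier.wait()             # wait until everyone has contributed
        return sum(self.contributions)  # every rank computes the same sum

comm = InMemoryAllreduce(world_size=2)
results = []
threads = [threading.Thread(target=lambda r=r: results.append(comm.allreduce_sum(r, r + 1)))
           for r in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(results)  # [3, 3]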
Rong Ou
4449e30184
Always link federated proto statically (#8442) 2022-11-09 07:47:38 +08:00
Rong Ou
521086d56b
Make federated client more robust (#8351) 2022-10-18 13:52:44 +08:00
Rong Ou
8f3dee58be
Speed up tests with federated learning enabled (#8350)
* Re-enable timeouts

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-17 15:17:04 -07:00
Philip Hyunsu Cho
2faa744aba
[CI] Test federated learning plugin in the CI (#8325) 2022-10-12 13:57:39 -07:00
Rong Ou
39afdac3be
Better error message when world size and rank are set as strings (#8316)
Co-authored-by: jiamingy <jm.yuan@outlook.com>
2022-10-12 15:53:25 +08:00
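The failure mode here is configuration values arriving as strings (e.g. from environment variables) where integers are expected; the fix reports that clearly instead of failing deep in the native layer. A hypothetical validation helper in the same spirit (not the actual fix):

```python
def parse_int_config(name, value):
    """Coerce a config value to int, failing with a message that names
    the offending key and the value received (hypothetical helper)."""
    try:
        return int(value)
    except (TypeError, ValueError):
        raise TypeError(
            f"Expected an integer for '{name}', got {value!r} "
            f"of type {type(value).__name__}"
        ) from None

world_size = parse_int_config("DMLC_NUM_WORKER", "4")  # OK: returns 4
# parse_int_config("DMLC_WORKER_ID", [1])  # raises with a clear message
```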
Rong Ou
8d4038da57
Don't split input data in federated mode (#8279)
Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 18:19:28 -08:00
Rong Ou
668b8a0ea4
[Breaking] Switch from rabit to the collective communicator (#8257)
* fix size_t specialization
* really fix size_t
* try again
* add include
* more include
* fix lint errors
* remove rabit includes
* fix pylint error
* return dict from communicator context
* fix communicator shutdown
* fix dask test
* reset communicator mocklist
* fix distributed tests
* do not save device communicator
* fix jvm gpu tests
* add python test for federated communicator
* Update gputreeshap submodule

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 14:39:01 -08:00
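After this change, user code initializes the communicator through `xgboost.collective` rather than `xgboost.rabit`, and, per the "return dict from communicator context" bullet above, the context manager yields the resolved arguments back. A minimal sketch, assuming the post-#8257 Python surface:

```python
import xgboost.collective as coll

# Previously: xgboost.rabit.init(env_list). Now configuration is a dict of
# keyword arguments, and the context yields the resolved arguments back.
with coll.CommunicatorContext() as args:  # no args: single-worker defaults
    print("resolved communicator args:", args)
    print("rank", coll.get_rank(), "of", coll.get_world_size())
```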
Rong Ou
a2686543a9
Common interface for collective communication (#8057)
* implement broadcast for federated communicator
* implement allreduce
* add communicator factory
* add device adapter
* add device communicator to factory
* add rabit communicator
* add rabit communicator to the factory
* add nccl device communicator
* add synchronize to device communicator
* add back print and getprocessorname
* add python wrapper and c api
* clean up types
* fix non-gpu build
* try to fix ci
* fix std::size_t
* portable string compare ignore case
* c style size_t
* fix lint errors
* cross platform setenv
* fix memory leak
* fix lint errors
* address review feedback
* add python test for rabit communicator
* fix failing gtest
* use json to configure communicators
* fix lint error
* get rid of factories
* fix cpu build
* fix include
* fix python import
* don't export collective.py yet
* skip collective communicator pytest on windows
* add review feedback
* update documentation
* remove mpi communicator type
* fix tests
* shutdown the communicator separately

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2022-09-12 15:21:12 -07:00
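The JSON-style configuration mentioned in the bullets above selects the backend by name. For the federated communicator the arguments look roughly like this; the key names follow the federated plugin's parameters, but treat the exact spelling as an assumption:

```python
import xgboost.collective as coll

# Hypothetical two-worker federated setup; the server address and world
# size must match a running federated gRPC server.
config = {
    "xgboost_communicator": "federated",
    "federated_server_address": "localhost:9091",
    "federated_world_size": 2,
    "federated_rank": 0,  # each worker passes its own rank
}
with coll.CommunicatorContext(**config):
    print("federated rank", coll.get_rank(), "of", coll.get_world_size())
```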
Rong Ou
d6e2013c5f
Set max message size in insecure gRPC (#8203) 2022-08-26 16:33:51 +08:00
Rong Ou
ad3bc0edee
Allow insecure gRPC connections for federated learning (#8181)
* format
2022-08-19 12:16:14 +08:00
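The two gRPC entries here (insecure channels, #8181, and the message-size cap, #8203) map onto standard gRPC channel options. The plugin itself does this in C++; the equivalent client-side setup in gRPC's Python binding looks like this:

```python
import grpc

# Insecure channel (no TLS) with the message-size caps lifted; -1 means
# unlimited in gRPC's channel options.
channel = grpc.insecure_channel(
    "localhost:9091",
    options=[("grpc.max_receive_message_length", -1),
             ("grpc.max_send_message_length", -1)],
)
```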
Rong Ou
45dc1f818a
Make federated plugin work with cmake 3.16.3 (#8029) 2022-06-27 17:26:41 +08:00
Rong Ou
0725fd6081
Fix federated learning plugin (#8027) 2022-06-24 08:41:07 +08:00
Rong Ou
e5ec546da5
[Breaking] Remove rabit support for custom reductions and grow_local_histmaker updater (#7992) 2022-06-21 15:08:23 +08:00
Rong Ou
31e6902e43
Support GPU training in the NVFlare demo (#7965) 2022-06-02 21:52:36 +08:00
Rong Ou
d3429f2ff6
Increase gRPC max receive message size for federated learning (#7958) 2022-06-01 13:21:54 +08:00
Rong Ou
af907e2d0d
Demo of federated learning using NVFlare (#7879)
Co-authored-by: jiamingy <jm.yuan@outlook.com>
2022-05-14 22:45:41 +08:00
Rong Ou
14ef38b834
Initial support for federated learning (#7831)
Federated learning plugin for xgboost:
* A gRPC server to aggregate MPI-style requests (allgather, allreduce, broadcast) from federated workers.
* A Rabit engine for the federated environment.
* Integration test to simulate federated learning.

Additional follow-ups are needed to address GPU support, security, privacy, etc.
2022-05-05 21:49:22 +08:00
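The gRPC server described here is exposed to tests through a small Python entry point. A hedged sketch of pairing it with worker processes; the function name and positional signature follow the test suite of this era, and should be treated as an assumption:

```python
import multiprocessing
import xgboost.federated

if __name__ == "__main__":
    # Assumption: run_federated_server(port, world_size) blocks and serves
    # allgather/allreduce/broadcast until the process is terminated.
    server = multiprocessing.Process(
        target=xgboost.federated.run_federated_server, args=(9091, 2)
    )
    server.start()
    # ... launch 2 workers that connect with federated_server_address="localhost:9091" ...
    server.terminate()
```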