19 Commits

Author SHA1 Message Date
Rong Ou
77b069c25d
Support bitwise allreduce operations in the communicator (#8623) 2022-12-25 06:40:05 +08:00
Rong Ou
a8255ea678
Add an in-memory collective communicator (#8494) 2022-12-01 00:24:12 +08:00
Rong Ou
4449e30184
Always link federated proto statically (#8442) 2022-11-09 07:47:38 +08:00
Rong Ou
521086d56b
Make federated client more robust (#8351) 2022-10-18 13:52:44 +08:00
Rong Ou
8f3dee58be
Speed up tests with federated learning enabled (#8350)
* Speed up tests with federated learning enabled

* Re-enable timeouts

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-17 15:17:04 -07:00
Philip Hyunsu Cho
2faa744aba
[CI] Test federated learning plugin in the CI (#8325) 2022-10-12 13:57:39 -07:00
Rong Ou
39afdac3be
Better error message when world size and rank are set as strings (#8316)
Co-authored-by: jiamingy <jm.yuan@outlook.com>
2022-10-12 15:53:25 +08:00
Rong Ou
8d4038da57
Don't split input data in federated mode (#8279)
Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 18:19:28 -08:00
Rong Ou
668b8a0ea4
[Breaking] Switch from rabit to the collective communicator (#8257)
* Switch from rabit to the collective communicator

* fix size_t specialization

* really fix size_t

* try again

* add include

* more include

* fix lint errors

* remove rabit includes

* fix pylint error

* return dict from communicator context

* fix communicator shutdown

* fix dask test

* reset communicator mocklist

* fix distributed tests

* do not save device communicator

* fix jvm gpu tests

* add python test for federated communicator

* Update gputreeshap submodule

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 14:39:01 -08:00
Rong Ou
a2686543a9
Common interface for collective communication (#8057)
* implement broadcast for federated communicator

* implement allreduce

* add communicator factory

* add device adapter

* add device communicator to factory

* add rabit communicator

* add rabit communicator to the factory

* add nccl device communicator

* add synchronize to device communicator

* add back print and getprocessorname

* add python wrapper and c api

* clean up types

* fix non-gpu build

* try to fix ci

* fix std::size_t

* portable string compare ignore case

* c style size_t

* fix lint errors

* cross platform setenv

* fix memory leak

* fix lint errors

* address review feedback

* add python test for rabit communicator

* fix failing gtest

* use json to configure communicators

* fix lint error

* get rid of factories

* fix cpu build

* fix include

* fix python import

* don't export collective.py yet

* skip collective communicator pytest on windows

* add review feedback

* update documentation

* remove mpi communicator type

* fix tests

* shutdown the communicator separately

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2022-09-12 15:21:12 -07:00
Rong Ou
d6e2013c5f
Set max message size in insecure gRPC (#8203) 2022-08-26 16:33:51 +08:00
Rong Ou
ad3bc0edee
Allow insecure gRPC connections for federated learning (#8181)
* Allow insecure gRPC connections for federated learning

* format
2022-08-19 12:16:14 +08:00
Rong Ou
45dc1f818a
Make federated plugin work with cmake 3.16.3 (#8029) 2022-06-27 17:26:41 +08:00
Rong Ou
0725fd6081
fix federated learning plugin (#8027) 2022-06-24 08:41:07 +08:00
Rong Ou
e5ec546da5
[Breaking] Remove rabit support for custom reductions and grow_local_histmaker updater (#7992) 2022-06-21 15:08:23 +08:00
Rong Ou
31e6902e43
Support GPU training in the NVFlare demo (#7965) 2022-06-02 21:52:36 +08:00
Rong Ou
d3429f2ff6
Increase gRPC max receive message size for federated learning (#7958) 2022-06-01 13:21:54 +08:00
Rong Ou
af907e2d0d
Demo of federated learning using NVFlare (#7879)
Co-authored-by: jiamingy <jm.yuan@outlook.com>
2022-05-14 22:45:41 +08:00
Rong Ou
14ef38b834
Initial support for federated learning (#7831)
Federated learning plugin for xgboost:
* A gRPC server to aggregate MPI-style requests (allgather, allreduce, broadcast) from federated workers.
* A Rabit engine for the federated environment.
* Integration test to simulate federated learning.

Additional follow-ups are needed to address GPU support, better security, privacy, etc.
2022-05-05 21:49:22 +08:00
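
Note on the MPI-style requests named in #7831 (allgather, allreduce, broadcast): the sketch below illustrates only the semantics of these three collectives, simulated in a single process. It is not the plugin's gRPC server or its API. Every name here (`REDUCE_OPS`, `allgather`, `allreduce`, `broadcast`, and the operation strings) is a hypothetical choice made for illustration. Each function takes one contribution per worker, indexed by rank, and returns what each worker would receive.

```python
import operator
from functools import reduce

# Hypothetical operation names; the bitwise entries mirror the kind of
# operation mentioned in #8623, not the plugin's actual identifiers.
REDUCE_OPS = {
    "sum": operator.add,
    "max": max,
    "min": min,
    "bitwise_and": operator.and_,
    "bitwise_or": operator.or_,
    "bitwise_xor": operator.xor,
}


def allgather(contributions):
    """Every worker receives all workers' values, ordered by rank."""
    gathered = list(contributions)
    return [list(gathered) for _ in contributions]


def allreduce(contributions, op="sum"):
    """Every worker receives the element-wise reduction of all buffers.

    `contributions` holds one equal-length list per worker.
    """
    fn = REDUCE_OPS[op]
    reduced = [reduce(fn, column) for column in zip(*contributions)]
    return [list(reduced) for _ in contributions]


def broadcast(contributions, root=0):
    """Every worker receives the value held by the root rank."""
    value = contributions[root]
    return [value for _ in contributions]


if __name__ == "__main__":
    # Three simulated workers, each holding a two-element buffer.
    buffers = [[1, 2], [3, 4], [5, 6]]
    print(allreduce(buffers, op="sum"))
    print(allreduce([[0b01], [0b10]], op="bitwise_or"))
    print(allgather(["a", "b"]))
    print(broadcast([None, "model", None], root=1))
```

In a real deployment the aggregation step would sit behind a server that waits for all workers to submit before replying. The sketch collapses that round trip into a plain function call so the result of each collective is easy to see.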