25 Commits

Author SHA1 Message Date
Jiaming Yuan
54da4b3185
Cleanup to prepare for using mmap pointer in external memory. (#9317)
- Update SparseDMatrix comment.
- Use a pointer in the bitfield. We will replace the `std::vector<bool>` in `ColumnMatrix` with bitfield.
- Clean up the page source. The timer is removed as it's inaccurate once we swap the mmap pointer into the page.
2023-06-22 06:43:11 +08:00
Rong Ou
d8beb517ed
Support bitwise allreduce in NCCL communicator (#9300) 2023-06-17 01:56:50 +08:00
Rong Ou
e70810be8a
Refactor device communicator to make allreduce more flexible (#9295) 2023-06-14 03:53:03 +08:00
Jiaming Yuan
03bc6e6427
Remove unused variables. (#9210)
- remove used variables.
- Remove signed comparison warnings.
2023-05-28 05:24:15 +08:00
Rong Ou
5b69534b43
Support column split in multi-target hist (#9171) 2023-05-26 16:56:05 +08:00
Rong Ou
acd363033e
Fix running MGPU gtests (#9200) 2023-05-26 05:26:38 +08:00
Rong Ou
52311dcec9
Fix multi-threaded gtests (#9148) 2023-05-10 19:15:32 +08:00
Rong Ou
a320b402a5
More refactoring to take advantage of collective aggregators (#9081) 2023-04-26 03:36:09 +08:00
Rong Ou
8dbe0510de
More collective aggregators (#9060) 2023-04-22 03:32:05 +08:00
Jiaming Yuan
a7b3dd3176
Fix compiler warnings. (#9055) 2023-04-21 02:26:47 +08:00
Rong Ou
42d100de18
Make sure metrics work with federated learning (#9037) 2023-04-19 15:39:11 +08:00
Jiaming Yuan
4d665b3fb0
Restore clang tidy test. (#8861) 2023-03-03 13:47:04 -08:00
Rong Ou
74572b5d45
Add convenience method for allgather (#8804) 2023-02-15 11:37:11 +08:00
Rong Ou
cbf98cb9c6
Add Allgather to collective communicator (#8765)
* Add Allgather to collective communicator
2023-02-09 11:31:22 +08:00
Rong Ou
78396f8a6e
Initial support for column-split cpu predictor (#8676) 2023-01-18 06:33:13 +08:00
Jiaming Yuan
cfa994d57f
Multi-target support for L1 error. (#8652)
- Add matrix support to the median function.
- Iterate through each target for quantile computation.
2023-01-11 05:51:14 +08:00
Jiaming Yuan
badeff1d74
Init estimation for regression. (#8272) 2023-01-11 02:04:56 +08:00
Rong Ou
77b069c25d
Support bitwise allreduce operations in the communicator (#8623) 2022-12-25 06:40:05 +08:00
Jiaming Yuan
3e26107a9c
Rename and extract Context. (#8528)
* Rename `GenericParameter` to `Context`.
* Rename header file to reflect the change.
* Rename all references.
2022-12-07 04:58:54 +08:00
Rong Ou
a8255ea678
Add an in-memory collective communicator (#8494) 2022-12-01 00:24:12 +08:00
Rong Ou
39afdac3be
Better error message when world size and rank are set as strings (#8316)
Co-authored-by: jiamingy <jm.yuan@outlook.com>
2022-10-12 15:53:25 +08:00
Rong Ou
8d4038da57
Don't split input data in federated mode (#8279)
Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 18:19:28 -08:00
Rong Ou
668b8a0ea4
[Breaking] Switch from rabit to the collective communicator (#8257)
* Switch from rabit to the collective communicator

* fix size_t specialization

* really fix size_t

* try again

* add include

* more include

* fix lint errors

* remove rabit includes

* fix pylint error

* return dict from communicator context

* fix communicator shutdown

* fix dask test

* reset communicator mocklist

* fix distributed tests

* do not save device communicator

* fix jvm gpu tests

* add python test for federated communicator

* Update gputreeshap submodule

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 14:39:01 -08:00
Jiaming Yuan
b791446623
Initial support for IPv6 (#8225)
- Merge rabit socket into XGBoost.
- Dask interface support.
- Add test to the socket.
2022-09-21 18:06:50 +08:00
Rong Ou
a2686543a9
Common interface for collective communication (#8057)
* implement broadcast for federated communicator

* implement allreduce

* add communicator factory

* add device adapter

* add device communicator to factory

* add rabit communicator

* add rabit communicator to the factory

* add nccl device communicator

* add synchronize to device communicator

* add back print and getprocessorname

* add python wrapper and c api

* clean up types

* fix non-gpu build

* try to fix ci

* fix std::size_t

* portable string compare ignore case

* c style size_t

* fix lint errors

* cross platform setenv

* fix memory leak

* fix lint errors

* address review feedback

* add python test for rabit communicator

* fix failing gtest

* use json to configure communicators

* fix lint error

* get rid of factories

* fix cpu build

* fix include

* fix python import

* don't export collective.py yet

* skip collective communicator pytest on windows

* add review feedback

* update documentation

* remove mpi communicator type

* fix tests

* shutdown the communicator separately

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2022-09-12 15:21:12 -07:00