53 Commits

Author SHA1 Message Date
Dmitry Razdoburdin
c7e7ce7569
[SYCL] Add nodes initialisation (#10269)
---------

Co-authored-by: Dmitry Razdoburdin <>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2024-05-21 23:38:52 +08:00
Jiaming Yuan
a5a58102e5
Revamp the rabit implementation. (#10112)
This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features:
- Federated learning for both CPU and GPU.
- NCCL.
- More data types.
- A unified interface for all the underlying implementations.
- Improved timeout handling for both tracker and workers.
- Exhausted tests with metrics (fixed a couple of bugs along the way).
- A reusable tracker for Python and JVM packages.
2024-05-20 11:56:23 +08:00
Dmitry Razdoburdin
f588252481
[sycl] add loss guided hist building (#10251)
Co-authored-by: Dmitry Razdoburdin <>
2024-05-10 22:35:13 +08:00
Dmitry Razdoburdin
dcc9639b91
[sycl] add data initialisation for training (#10222)
Co-authored-by: Dmitry Razdoburdin <>
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2024-05-05 12:07:10 +08:00
Dmitry Razdoburdin
58513dc288
[SYCL] Add sampling initialization (#10216)
---------

Co-authored-by: Dmitry Razdoburdin <>
2024-04-25 04:35:52 +08:00
Jiaming Yuan
3fbb221fec
[coll] Implement shutdown for tracker and comm. (#10208)
- Force shutdown the tracker.
- Implement shutdown notice for error handling thread in comm.
2024-04-20 04:08:17 +08:00
Dmitry Razdoburdin
6e5c335cea
[SYCL] Add basic features for QuantileHistMaker (#10174)
---------

Co-authored-by: Dmitry Razdoburdin <>
2024-04-15 21:24:46 +08:00
Jiaming Yuan
8bad677c2f
Update collective implementation. (#10152)
* Update collective implementation.

- Cleanup resource during `Finalize` to avoid handling threads in destructor.
- Calculate the size for allgather automatically.
- Use simple allgather for small (smaller than the number of worker) allreduce.
2024-03-30 18:57:31 +08:00
Dmitry Razdoburdin
6a7c6a8ae6
add sycl reaslisation of ghist builder (#10138)
Co-authored-by: Dmitry Razdoburdin <>
2024-03-23 12:55:25 +08:00
Jiaming Yuan
53fc17578f
Use std::uint64_t for row index. (#10120)
- Use std::uint64_t instead of size_t to avoid implementation-defined type.
- Rename to bst_idx_t, to account for other types of indexing.
- Small cleanup to the base header.
2024-03-15 18:43:49 +08:00
Dmitry Razdoburdin
617970a0c2
[SYCL] Add split evaluation (#10119)
---------

Co-authored-by: Dmitry Razdoburdin <>
2024-03-15 01:46:46 +08:00
Jiaming Yuan
d07b7fe8c8
Small cleanup for mock tests. (#10085) 2024-03-04 23:32:11 +08:00
Dmitry Razdoburdin
7a61216690
[sycl] add partitioning and related tests (#10080)
Co-authored-by: Dmitry Razdoburdin <>
2024-03-02 01:49:27 +08:00
Dmitry Razdoburdin
761845f594
[SYCL] Implement row set collection. (#10057)
Co-authored-by: Dmitry Razdoburdin <>
2024-02-26 21:07:36 +08:00
Dmitry Razdoburdin
057f03cacc
[SYCL] Initial implementation of GHistIndexMatrix (#10045)
Co-authored-by: Dmitry Razdoburdin <>
2024-02-19 04:27:15 +08:00
Dmitry Razdoburdin
234674a0a6
[sync]. Add partition builder. (#10011)
---------

Co-authored-by: Dmitry Razdoburdin <>
2024-01-31 17:39:48 +08:00
Dmitry Razdoburdin
43897b8296
Sycl implementation for objective functions (#9846)
---------

Co-authored-by: Dmitry Razdoburdin <>
2023-12-12 14:41:50 +08:00
Dmitry Razdoburdin
381f1d3dc9
Add support inference on SYCL devices (#9800)
---------

Co-authored-by: Dmitry Razdoburdin <>
Co-authored-by: Nikolay Petrov <nikolay.a.petrov@intel.com>
Co-authored-by: Alexandra <alexandra.epanchinzeva@intel.com>
2023-12-04 16:15:57 +08:00
Jiaming Yuan
06bdc15e9b
[coll] Pass context to various functions. (#9772)
* [coll] Pass context to various functions.

In the future, the `Context` object would be required for collective operations, this PR
passes the context object to some required functions to prepare for swapping out the
implementation.
2023-11-08 09:54:05 +08:00
Jiaming Yuan
6c0a190f6d
[coll] Add comm group. (#9759)
- Implement `CommGroup` for double dispatching.
- Small cleanup to tracker for handling abort.
2023-11-07 11:12:31 +08:00
Jiaming Yuan
4da4e092b5
[coll] Improvements and fixes for tracker and allreduce. (#9745)
- Allow the tracker to wait.
- Fix allreduce type cast
- Return args from the federated tracker.
2023-11-02 04:06:46 +08:00
Jiaming Yuan
bc995a4865
[coll] Add federated coll. (#9738)
- Define a new data type, the proto file is copied for now.
- Merge client and communicator into `FederatedColl`.
- Define CUDA variant.
- Migrate tests for CPU, add tests for CUDA.
2023-11-01 04:06:46 +08:00
Jiaming Yuan
80390e6cb6
[coll] Federated comm. (#9732) 2023-10-31 02:39:55 +08:00
Rong Ou
e164d51c43
Improve allgather functions (#9649) 2023-10-12 23:31:43 +08:00
Rong Ou
66a0832778
Add tests for gpu_approx (#9553) 2023-09-07 17:21:58 +08:00
Rong Ou
c928dd4ff5
Support vertical federated learning with gpu_hist (#9539) 2023-09-03 11:37:11 +08:00
Rong Ou
6103dca0bb
Support column split in GPU evaluate splits (#9511) 2023-08-23 16:33:43 +08:00
Rong Ou
bde1ebc209
Switch back to the GPUIDX macro (#9438) 2023-08-04 15:14:31 +08:00
Rong Ou
c2b85ab68a
Clean up MGPU C++ tests (#9430) 2023-08-02 14:31:18 +08:00
Jiaming Yuan
04aff3af8e
Define the new device parameter. (#9362) 2023-07-13 19:30:25 +08:00
Rong Ou
f90771eec6
Fix device communicator dependency (#9346) 2023-06-29 10:34:30 +08:00
Rong Ou
e70810be8a
Refactor device communicator to make allreduce more flexible (#9295) 2023-06-14 03:53:03 +08:00
Jiaming Yuan
152e2fb072
Unify test helpers for creating ctx. (#9274) 2023-06-10 03:35:22 +08:00
Rong Ou
5b69534b43
Support column split in multi-target hist (#9171) 2023-05-26 16:56:05 +08:00
Rong Ou
52311dcec9
Fix multi-threaded gtests (#9148) 2023-05-10 19:15:32 +08:00
Rong Ou
511d4996b5
Rely on gRPC to generate random port (#9102) 2023-04-27 09:48:26 +08:00
Rong Ou
42d100de18
Make sure metrics work with federated learning (#9037) 2023-04-19 15:39:11 +08:00
Jiaming Yuan
fe9dff339c
Convert federated learner test into test suite. (#9018)
* Convert federated learner test into test suite.

- Add specialization to learning to rank.
2023-04-11 09:52:55 +08:00
Rong Ou
15e073ca9d
Make objectives work with vertical distributed and federated learning (#9002) 2023-04-03 17:07:42 +08:00
Rong Ou
ff26cd3212
More tests for column split and vertical federated learning (#8985)
Added some more tests for the learner and fit_stump, for both column-wise distributed learning and vertical federated learning.

Also moved the `IsRowSplit` and `IsColumnSplit` methods from the `DMatrix` to the `MetaInfo` since in some places we only have access to the `MetaInfo`. Added a new convenience method `IsVerticalFederatedLearning`.

Some refactoring of the testing fixtures.
2023-03-28 16:40:26 +08:00
Rong Ou
b240f055d3
Support vertical federated learning (#8932) 2023-03-22 14:25:26 +08:00
Rong Ou
cbf98cb9c6
Add Allgather to collective communicator (#8765)
* Add Allgather to collective communicator
2023-02-09 11:31:22 +08:00
Jiaming Yuan
3e26107a9c
Rename and extract Context. (#8528)
* Rename `GenericParameter` to `Context`.
* Rename header file to reflect the change.
* Rename all references.
2022-12-07 04:58:54 +08:00
Rong Ou
a8255ea678
Add an in-memory collective communicator (#8494) 2022-12-01 00:24:12 +08:00
Philip Hyunsu Cho
2faa744aba
[CI] Test federated learning plugin in the CI (#8325) 2022-10-12 13:57:39 -07:00
Rong Ou
39afdac3be
Better error message when world size and rank are set as strings (#8316)
Co-authored-by: jiamingy <jm.yuan@outlook.com>
2022-10-12 15:53:25 +08:00
Rong Ou
a2686543a9
Common interface for collective communication (#8057)
* implement broadcast for federated communicator

* implement allreduce

* add communicator factory

* add device adapter

* add device communicator to factory

* add rabit communicator

* add rabit communicator to the factory

* add nccl device communicator

* add synchronize to device communicator

* add back print and getprocessorname

* add python wrapper and c api

* clean up types

* fix non-gpu build

* try to fix ci

* fix std::size_t

* portable string compare ignore case

* c style size_t

* fix lint errors

* cross platform setenv

* fix memory leak

* fix lint errors

* address review feedback

* add python test for rabit communicator

* fix failing gtest

* use json to configure communicators

* fix lint error

* get rid of factories

* fix cpu build

* fix include

* fix python import

* don't export collective.py yet

* skip collective communicator pytest on windows

* add review feedback

* update documentation

* remove mpi communicator type

* fix tests

* shutdown the communicator separately

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2022-09-12 15:21:12 -07:00
Jiaming Yuan
bc818316f2
Prepare for improving Windows networking compatibility. (#8234)
* Prepare for improving Windows networking compatibility.

* Include dmlc filesystem indirectly as dmlc/filesystem.h includes windows.h, which
  conflicts with winsock2.h
* Define `NOMINMAX` conditionally.
* Link the winsock library when mysys32 is used.
* Add config file for read the doc.
2022-09-10 15:16:49 +08:00
Rong Ou
14ef38b834
Initial support for federated learning (#7831)
Federated learning plugin for xgboost:
* A gRPC server to aggregate MPI-style requests (allgather, allreduce, broadcast) from federated workers.
* A Rabit engine for the federated environment.
* Integration test to simulate federated learning.

Additional followups are needed to address GPU support, better security, and privacy, etc.
2022-05-05 21:49:22 +08:00
Christian Lorentzen
cf4f019ed6
[Breaking] Change default evaluation metric for classification to logloss / mlogloss (#6183)
* Change DefaultEvalMetric of classification from error to logloss

* Change default binary metric in plugin/example/custom_obj.cc

* Set old error metric in python tests

* Set old error metric in R tests

* Fix missed eval metrics and typos in R tests

* Fix setting eval_metric twice in R tests

* Add warning for empty eval_metric for classification

* Fix Dask tests

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-10-02 12:06:47 -07:00