Jiaming Yuan
292bb677e5
[EM] Support mmap backed ellpack. ( #10602 )
...
- Support resource view in ellpack.
- Define the CUDA version of MMAP resource.
- Define the CUDA version of malloc resource.
- Refactor CUDA runtime API wrappers, and add memory access related wrappers.
- Gather Windows macros into a single header.
2024-07-18 08:20:21 +08:00
Jiaming Yuan
89da9f9741
[fed] Split up federated test CMake file. ( #10566 )
...
- Collect all federated test files into the same directory.
- Independently list the files.
2024-07-11 13:09:18 +08:00
Jiaming Yuan
26eb68859f
Consistently report error in tests. ( #10453 )
2024-06-21 14:35:22 +08:00
Jiaming Yuan
c9f5fcaf21
[col] Small cleanup to federated comm. ( #10397 )
2024-06-07 21:19:04 +08:00
Dmitry Razdoburdin
c7e7ce7569
[SYCL] Add nodes initialisation ( #10269 )
...
---------
Co-authored-by: Dmitry Razdoburdin <>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2024-05-21 23:38:52 +08:00
Jiaming Yuan
a5a58102e5
Revamp the rabit implementation. ( #10112 )
...
This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features:
- Federated learning for both CPU and GPU.
- NCCL.
- More data types.
- A unified interface for all the underlying implementations.
- Improved timeout handling for both tracker and workers.
- Exhaustive tests with metrics (fixed a couple of bugs along the way).
- A reusable tracker for Python and JVM packages.
2024-05-20 11:56:23 +08:00
Dmitry Razdoburdin
f588252481
[sycl] add loss guided hist building ( #10251 )
...
Co-authored-by: Dmitry Razdoburdin <>
2024-05-10 22:35:13 +08:00
Dmitry Razdoburdin
dcc9639b91
[sycl] add data initialisation for training ( #10222 )
...
Co-authored-by: Dmitry Razdoburdin <>
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2024-05-05 12:07:10 +08:00
Dmitry Razdoburdin
58513dc288
[SYCL] Add sampling initialization ( #10216 )
...
---------
Co-authored-by: Dmitry Razdoburdin <>
2024-04-25 04:35:52 +08:00
Jiaming Yuan
3fbb221fec
[coll] Implement shutdown for tracker and comm. ( #10208 )
...
- Force shutdown the tracker.
- Implement shutdown notice for error handling thread in comm.
2024-04-20 04:08:17 +08:00
Dmitry Razdoburdin
6e5c335cea
[SYCL] Add basic features for QuantileHistMaker ( #10174 )
...
---------
Co-authored-by: Dmitry Razdoburdin <>
2024-04-15 21:24:46 +08:00
Jiaming Yuan
8bad677c2f
Update collective implementation. ( #10152 )
...
* Update collective implementation.
- Clean up resources during `Finalize` to avoid handling threads in destructor.
- Calculate the size for allgather automatically.
- Use simple allgather for small (smaller than the number of workers) allreduce.
2024-03-30 18:57:31 +08:00
Dmitry Razdoburdin
6a7c6a8ae6
add sycl realisation of ghist builder ( #10138 )
...
Co-authored-by: Dmitry Razdoburdin <>
2024-03-23 12:55:25 +08:00
Jiaming Yuan
53fc17578f
Use std::uint64_t for row index. ( #10120 )
...
- Use std::uint64_t instead of size_t to avoid implementation-defined type.
- Rename to bst_idx_t, to account for other types of indexing.
- Small cleanup to the base header.
2024-03-15 18:43:49 +08:00
Dmitry Razdoburdin
617970a0c2
[SYCL] Add split evaluation ( #10119 )
...
---------
Co-authored-by: Dmitry Razdoburdin <>
2024-03-15 01:46:46 +08:00
Jiaming Yuan
d07b7fe8c8
Small cleanup for mock tests. ( #10085 )
2024-03-04 23:32:11 +08:00
Dmitry Razdoburdin
7a61216690
[sycl] add partitioning and related tests ( #10080 )
...
Co-authored-by: Dmitry Razdoburdin <>
2024-03-02 01:49:27 +08:00
Dmitry Razdoburdin
761845f594
[SYCL] Implement row set collection. ( #10057 )
...
Co-authored-by: Dmitry Razdoburdin <>
2024-02-26 21:07:36 +08:00
Dmitry Razdoburdin
057f03cacc
[SYCL] Initial implementation of GHistIndexMatrix ( #10045 )
...
Co-authored-by: Dmitry Razdoburdin <>
2024-02-19 04:27:15 +08:00
Dmitry Razdoburdin
234674a0a6
[sycl] Add partition builder. ( #10011 )
...
---------
Co-authored-by: Dmitry Razdoburdin <>
2024-01-31 17:39:48 +08:00
Dmitry Razdoburdin
43897b8296
Sycl implementation for objective functions ( #9846 )
...
---------
Co-authored-by: Dmitry Razdoburdin <>
2023-12-12 14:41:50 +08:00
Dmitry Razdoburdin
381f1d3dc9
Add support for inference on SYCL devices ( #9800 )
...
---------
Co-authored-by: Dmitry Razdoburdin <>
Co-authored-by: Nikolay Petrov <nikolay.a.petrov@intel.com>
Co-authored-by: Alexandra <alexandra.epanchinzeva@intel.com>
2023-12-04 16:15:57 +08:00
Jiaming Yuan
06bdc15e9b
[coll] Pass context to various functions. ( #9772 )
...
* [coll] Pass context to various functions.
In the future, the `Context` object will be required for collective operations. This PR
passes the context object to some required functions to prepare for swapping out the
implementation.
2023-11-08 09:54:05 +08:00
Jiaming Yuan
6c0a190f6d
[coll] Add comm group. ( #9759 )
...
- Implement `CommGroup` for double dispatching.
- Small cleanup to tracker for handling abort.
2023-11-07 11:12:31 +08:00
Jiaming Yuan
4da4e092b5
[coll] Improvements and fixes for tracker and allreduce. ( #9745 )
...
- Allow the tracker to wait.
- Fix allreduce type cast.
- Return args from the federated tracker.
2023-11-02 04:06:46 +08:00
Jiaming Yuan
bc995a4865
[coll] Add federated coll. ( #9738 )
...
- Define a new data type; the proto file is copied for now.
- Merge client and communicator into `FederatedColl`.
- Define CUDA variant.
- Migrate tests for CPU, add tests for CUDA.
2023-11-01 04:06:46 +08:00
Jiaming Yuan
80390e6cb6
[coll] Federated comm. ( #9732 )
2023-10-31 02:39:55 +08:00
Rong Ou
e164d51c43
Improve allgather functions ( #9649 )
2023-10-12 23:31:43 +08:00
Rong Ou
66a0832778
Add tests for gpu_approx ( #9553 )
2023-09-07 17:21:58 +08:00
Rong Ou
c928dd4ff5
Support vertical federated learning with gpu_hist ( #9539 )
2023-09-03 11:37:11 +08:00
Rong Ou
6103dca0bb
Support column split in GPU evaluate splits ( #9511 )
2023-08-23 16:33:43 +08:00
Rong Ou
bde1ebc209
Switch back to the GPUIDX macro ( #9438 )
2023-08-04 15:14:31 +08:00
Rong Ou
c2b85ab68a
Clean up MGPU C++ tests ( #9430 )
2023-08-02 14:31:18 +08:00
Jiaming Yuan
04aff3af8e
Define the new device parameter. ( #9362 )
2023-07-13 19:30:25 +08:00
Rong Ou
f90771eec6
Fix device communicator dependency ( #9346 )
2023-06-29 10:34:30 +08:00
Rong Ou
e70810be8a
Refactor device communicator to make allreduce more flexible ( #9295 )
2023-06-14 03:53:03 +08:00
Jiaming Yuan
152e2fb072
Unify test helpers for creating ctx. ( #9274 )
2023-06-10 03:35:22 +08:00
Rong Ou
5b69534b43
Support column split in multi-target hist ( #9171 )
2023-05-26 16:56:05 +08:00
Rong Ou
52311dcec9
Fix multi-threaded gtests ( #9148 )
2023-05-10 19:15:32 +08:00
Rong Ou
511d4996b5
Rely on gRPC to generate random port ( #9102 )
2023-04-27 09:48:26 +08:00
Rong Ou
42d100de18
Make sure metrics work with federated learning ( #9037 )
2023-04-19 15:39:11 +08:00
Jiaming Yuan
fe9dff339c
Convert federated learner test into test suite. ( #9018 )
...
* Convert federated learner test into test suite.
- Add specialization to learning to rank.
2023-04-11 09:52:55 +08:00
Rong Ou
15e073ca9d
Make objectives work with vertical distributed and federated learning ( #9002 )
2023-04-03 17:07:42 +08:00
Rong Ou
ff26cd3212
More tests for column split and vertical federated learning ( #8985 )
...
Added some more tests for the learner and fit_stump, for both column-wise distributed learning and vertical federated learning.
Also moved the `IsRowSplit` and `IsColumnSplit` methods from the `DMatrix` to the `MetaInfo` since in some places we only have access to the `MetaInfo`. Added a new convenience method `IsVerticalFederatedLearning`.
Some refactoring of the testing fixtures.
2023-03-28 16:40:26 +08:00
Rong Ou
b240f055d3
Support vertical federated learning ( #8932 )
2023-03-22 14:25:26 +08:00
Rong Ou
cbf98cb9c6
Add Allgather to collective communicator ( #8765 )
...
* Add Allgather to collective communicator
2023-02-09 11:31:22 +08:00
Jiaming Yuan
3e26107a9c
Rename and extract Context. ( #8528 )
...
* Rename `GenericParameter` to `Context`.
* Rename header file to reflect the change.
* Rename all references.
2022-12-07 04:58:54 +08:00
Rong Ou
a8255ea678
Add an in-memory collective communicator ( #8494 )
2022-12-01 00:24:12 +08:00
Philip Hyunsu Cho
2faa744aba
[CI] Test federated learning plugin in the CI ( #8325 )
2022-10-12 13:57:39 -07:00
Rong Ou
39afdac3be
Better error message when world size and rank are set as strings ( #8316 )
...
Co-authored-by: jiamingy <jm.yuan@outlook.com>
2022-10-12 15:53:25 +08:00