Dmitry Razdoburdin
f588252481
[sycl] add loss guided hist building ( #10251 )
...
Co-authored-by: Dmitry Razdoburdin <>
2024-05-10 22:35:13 +08:00
Dmitry Razdoburdin
dcc9639b91
[sycl] add data initialisation for training ( #10222 )
...
Co-authored-by: Dmitry Razdoburdin <>
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2024-05-05 12:07:10 +08:00
Dmitry Razdoburdin
58513dc288
[SYCL] Add sampling initialization ( #10216 )
...
---------
Co-authored-by: Dmitry Razdoburdin <>
2024-04-25 04:35:52 +08:00
Jiaming Yuan
3fbb221fec
[coll] Implement shutdown for tracker and comm. ( #10208 )
...
- Force shutdown the tracker.
- Implement shutdown notice for error handling thread in comm.
2024-04-20 04:08:17 +08:00
Dmitry Razdoburdin
6e5c335cea
[SYCL] Add basic features for QuantileHistMaker ( #10174 )
...
---------
Co-authored-by: Dmitry Razdoburdin <>
2024-04-15 21:24:46 +08:00
Jiaming Yuan
8bad677c2f
Update collective implementation. ( #10152 )
...
* Update collective implementation.
- Cleanup resource during `Finalize` to avoid handling threads in destructor.
- Calculate the size for allgather automatically.
- Use simple allgather for small (smaller than the number of workers) allreduce.
2024-03-30 18:57:31 +08:00
Dmitry Razdoburdin
6a7c6a8ae6
add sycl realisation of ghist builder ( #10138 )
...
Co-authored-by: Dmitry Razdoburdin <>
2024-03-23 12:55:25 +08:00
Jiaming Yuan
53fc17578f
Use std::uint64_t for row index. ( #10120 )
...
- Use std::uint64_t instead of size_t to avoid implementation-defined type.
- Rename to bst_idx_t, to account for other types of indexing.
- Small cleanup to the base header.
2024-03-15 18:43:49 +08:00
Dmitry Razdoburdin
617970a0c2
[SYCL] Add split evaluation ( #10119 )
...
---------
Co-authored-by: Dmitry Razdoburdin <>
2024-03-15 01:46:46 +08:00
Dmitry Razdoburdin
7a61216690
[sycl] add partitioning and related tests ( #10080 )
...
Co-authored-by: Dmitry Razdoburdin <>
2024-03-02 01:49:27 +08:00
Dmitry Razdoburdin
761845f594
[SYCL] Implement row set collection. ( #10057 )
...
Co-authored-by: Dmitry Razdoburdin <>
2024-02-26 21:07:36 +08:00
Dmitry Razdoburdin
057f03cacc
[SYCL] Initial implementation of GHistIndexMatrix ( #10045 )
...
Co-authored-by: Dmitry Razdoburdin <>
2024-02-19 04:27:15 +08:00
Dmitry Razdoburdin
234674a0a6
[sycl]. Add partition builder. ( #10011 )
...
---------
Co-authored-by: Dmitry Razdoburdin <>
2024-01-31 17:39:48 +08:00
Dmitry Razdoburdin
2a6ab2547d
SYCL inference optimization ( #9876 )
...
---------
Co-authored-by: Dmitry Razdoburdin <>
2023-12-15 11:04:39 +08:00
Dmitry Razdoburdin
43897b8296
Sycl implementation for objective functions ( #9846 )
...
---------
Co-authored-by: Dmitry Razdoburdin <>
2023-12-12 14:41:50 +08:00
Jiaming Yuan
b3700bbb3f
Flexible find protobuf. ( #9867 )
2023-12-12 07:34:01 +08:00
Dmitry Razdoburdin
381f1d3dc9
Add support for inference on SYCL devices ( #9800 )
...
---------
Co-authored-by: Dmitry Razdoburdin <>
Co-authored-by: Nikolay Petrov <nikolay.a.petrov@intel.com>
Co-authored-by: Alexandra <alexandra.epanchinzeva@intel.com>
2023-12-04 16:15:57 +08:00
Jiaming Yuan
e9260de3f3
[breaking] Remove dense libsvm parser plugin. ( #9799 )
2023-11-23 00:12:39 +08:00
Jiaming Yuan
0715ab3c10
Use dlopen to load NCCL. ( #9796 )
...
This PR adds optional support for loading nccl with `dlopen` as an alternative to compile-time linking. This is to address the size bloat issue with the PyPI binary release.
- Add CMake option to load `nccl` at runtime.
- Add an NCCL stub.
After this, `nccl` will be fetched from PyPI when using pip to install XGBoost, either by a user or by `pyproject.toml`. Others who want to link nccl at compile time can continue to do so without any change.
At the moment, this is Linux-only since we only support MNMG (multi-node multi-GPU) on Linux.
2023-11-22 19:27:31 +08:00
Jiaming Yuan
06bdc15e9b
[coll] Pass context to various functions. ( #9772 )
...
* [coll] Pass context to various functions.
In the future, the `Context` object will be required for collective operations. This PR
passes the context object to some required functions to prepare for swapping out the
implementation.
2023-11-08 09:54:05 +08:00
Jiaming Yuan
6c0a190f6d
[coll] Add comm group. ( #9759 )
...
- Implement `CommGroup` for double dispatching.
- Small cleanup to tracker for handling abort.
2023-11-07 11:12:31 +08:00
Jiaming Yuan
4da4e092b5
[coll] Improvements and fixes for tracker and allreduce. ( #9745 )
...
- Allow the tracker to wait.
- Fix allreduce type cast
- Return args from the federated tracker.
2023-11-02 04:06:46 +08:00
Jiaming Yuan
bc995a4865
[coll] Add federated coll. ( #9738 )
...
- Define a new data type, the proto file is copied for now.
- Merge client and communicator into `FederatedColl`.
- Define CUDA variant.
- Migrate tests for CPU, add tests for CUDA.
2023-11-01 04:06:46 +08:00
Jiaming Yuan
80390e6cb6
[coll] Federated comm. ( #9732 )
2023-10-31 02:39:55 +08:00
James Lamb
eb562d3829
[CI] address cmakelint warnings about whitespace ( #9674 )
2023-10-14 12:46:07 +08:00
Rong Ou
e164d51c43
Improve allgather functions ( #9649 )
2023-10-12 23:31:43 +08:00
James Lamb
db8d117f7e
[CI] standardize endif() calls in CMake scripts ( #9637 )
2023-10-08 11:45:20 +08:00
Jiaming Yuan
b438d684d2
Utilities and cleanups for socket. ( #9576 )
...
- Use C++17 `nodiscard` and nested namespaces.
- Add bind method to socket.
- Remove rabit parameters.
2023-09-14 01:41:42 +08:00
Jiaming Yuan
972730cde0
Use matrix for gradient. ( #9508 )
...
- Use the `linalg::Matrix` for storing gradients.
- New API for the custom objective.
- Custom objective for multi-class/multi-target is now required to return the correct shape.
- Custom objective for Python can accept arrays with any strides (row-major, column-major).
2023-08-24 05:29:52 +08:00
Jiaming Yuan
c1b2cff874
[CI] Check compiler warnings. ( #9444 )
2023-08-08 12:02:45 -07:00
Philip Hyunsu Cho
a5cd2412de
Replace setup.py with pyproject.toml ( #9021 )
...
* Create pyproject.toml
* Implement a custom build backend (see below) in packager directory. Build logic from setup.py has been refactored and migrated into the new backend.
* Tested: `pip wheel .` (build wheel), `python -m build --sdist .` (source distribution)
2023-04-20 13:51:39 -07:00
Jiaming Yuan
26209a42a5
Define git attributes for renormalization. ( #8921 )
2023-03-16 02:43:11 +08:00
Jiaming Yuan
36a7396658
Replace dmlc any with std any. ( #8892 )
2023-03-11 06:11:04 +08:00
Jiaming Yuan
c5c8f643f2
Remove the cub submodule. ( #8888 )
...
XGBoost now uses CTK 11.8 for binary packages, so there's no need to maintain a cub
submodule anymore.
2023-03-09 19:43:02 -08:00
Philip Hyunsu Cho
6d8afb2218
[CI] Require C++17 + CMake 3.18; Use CUDA 11.8 in CI ( #8853 )
...
* Update to C++17
* Turn off unity build
* Update CMake to 3.18
* Use MSVC 2022 + CUDA 11.8
* Re-create stack for worker images
* Allocate more disk space for Windows
* Temporarily disable clang-tidy
* RAPIDS now requires Python 3.10+
* Unpin cuda-python
* Use latest NCCL
* Use Ubuntu 20.04 in RMM image
* Mark failing mgpu test as xfail
2023-03-01 09:22:24 -08:00
Rong Ou
cbf98cb9c6
Add Allgather to collective communicator ( #8765 )
...
* Add Allgather to collective communicator
2023-02-09 11:31:22 +08:00
Rong Ou
77b069c25d
Support bitwise allreduce operations in the communicator ( #8623 )
2022-12-25 06:40:05 +08:00
Jiaming Yuan
3e26107a9c
Rename and extract Context. ( #8528 )
...
* Rename `GenericParameter` to `Context`.
* Rename header file to reflect the change.
* Rename all references.
2022-12-07 04:58:54 +08:00
Rong Ou
a8255ea678
Add an in-memory collective communicator ( #8494 )
2022-12-01 00:24:12 +08:00
Rong Ou
4449e30184
Always link federated proto statically ( #8442 )
2022-11-09 07:47:38 +08:00
Rong Ou
521086d56b
Make federated client more robust ( #8351 )
2022-10-18 13:52:44 +08:00
Rong Ou
8f3dee58be
Speed up tests with federated learning enabled ( #8350 )
...
* Speed up tests with federated learning enabled
* Re-enable timeouts
Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-17 15:17:04 -07:00
Philip Hyunsu Cho
2faa744aba
[CI] Test federated learning plugin in the CI ( #8325 )
2022-10-12 13:57:39 -07:00
Rong Ou
39afdac3be
Better error message when world size and rank are set as strings ( #8316 )
...
Co-authored-by: jiamingy <jm.yuan@outlook.com>
2022-10-12 15:53:25 +08:00
Rong Ou
8d4038da57
Don't split input data in federated mode ( #8279 )
...
Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 18:19:28 -08:00
Rong Ou
668b8a0ea4
[Breaking] Switch from rabit to the collective communicator ( #8257 )
...
* Switch from rabit to the collective communicator
* fix size_t specialization
* really fix size_t
* try again
* add include
* more include
* fix lint errors
* remove rabit includes
* fix pylint error
* return dict from communicator context
* fix communicator shutdown
* fix dask test
* reset communicator mocklist
* fix distributed tests
* do not save device communicator
* fix jvm gpu tests
* add python test for federated communicator
* Update gputreeshap submodule
Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 14:39:01 -08:00
Rong Ou
a2686543a9
Common interface for collective communication ( #8057 )
...
* implement broadcast for federated communicator
* implement allreduce
* add communicator factory
* add device adapter
* add device communicator to factory
* add rabit communicator
* add rabit communicator to the factory
* add nccl device communicator
* add synchronize to device communicator
* add back print and getprocessorname
* add python wrapper and c api
* clean up types
* fix non-gpu build
* try to fix ci
* fix std::size_t
* portable string compare ignore case
* c style size_t
* fix lint errors
* cross platform setenv
* fix memory leak
* fix lint errors
* address review feedback
* add python test for rabit communicator
* fix failing gtest
* use json to configure communicators
* fix lint error
* get rid of factories
* fix cpu build
* fix include
* fix python import
* don't export collective.py yet
* skip collective communicator pytest on windows
* add review feedback
* update documentation
* remove mpi communicator type
* fix tests
* shutdown the communicator separately
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2022-09-12 15:21:12 -07:00
Rong Ou
d6e2013c5f
Set max message size in insecure gRPC ( #8203 )
2022-08-26 16:33:51 +08:00
Rong Ou
ad3bc0edee
Allow insecure gRPC connections for federated learning ( #8181 )
...
* Allow insecure gRPC connections for federated learning
* format
2022-08-19 12:16:14 +08:00
Rong Ou
45dc1f818a
Make federated plugin work with cmake 3.16.3 ( #8029 )
2022-06-27 17:26:41 +08:00