Federated learning plugin for XGBoost:
* A gRPC server to aggregate MPI-style requests (allgather, allreduce, broadcast) from federated workers.
* A Rabit engine for the federated environment.
* Integration test to simulate federated learning.
Additional follow-ups are needed to address GPU support, stronger security, privacy, etc.
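
For readers unfamiliar with the MPI-style collectives named above, the following is a minimal in-process sketch of their semantics. It does not use gRPC and is not the plugin's API; the function names and the list-of-workers representation are assumptions made for illustration only.

```python
from functools import reduce
from operator import add


def allgather(values):
    """Every worker contributes one value and receives the full list."""
    gathered = list(values)
    return [list(gathered) for _ in values]


def allreduce(buffers, op=add):
    """Every worker contributes an equal-length buffer and receives the
    element-wise reduction across all workers."""
    reduced = [reduce(op, column) for column in zip(*buffers)]
    return [list(reduced) for _ in buffers]


def broadcast(values, root=0):
    """Every worker receives the value held by the root worker."""
    return [values[root] for _ in values]


# Index i in each input list stands for worker i (hypothetical layout).
print(allgather([1, 2, 3]))
print(allreduce([[1, 2], [3, 4], [5, 6]]))
print(broadcast(["x", "y", "z"], root=1))
```

In a real federated setup the aggregation would happen on the server and each worker would only see its own result; the sketch returns all per-worker results at once to keep it self-contained.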
* Supply `-G;-src-in-ptx` when `USE_DEVICE_DEBUG` is set and debug mode is selected.
* Refactor CMake script to gather all CUDA configuration.
* Use CMAKE_CUDA_ARCHITECTURES. Close #6029.
* Add compute 80. Close #5999.
* [CI] Add RMM as an optional dependency
* Replace caching allocator with pool allocator from RMM
* Revert "Replace caching allocator with pool allocator from RMM"
This reverts commit e15845d4e72e890c2babe31a988b26503a7d9038.
* Use rmm::mr::get_default_resource()
* Try setting default resource (doesn't work yet)
* Allocate pool_mr in the heap
* Prevent leaking pool_mr handle
* Move EXPECT_DEATH() tests into a separate test suite suffixed DeathTest
* Turn off death tests for RMM
* Address reviewer's feedback
* Prevent leaking of cuda_mr
* Fix Jenkinsfile syntax
* Remove unnecessary function in Jenkinsfile
* [CI] Install NCCL into RMM container
* Run Python tests
* Try building with RMM, CUDA 10.0
* Do not use RMM for CUDA 10.0 target
* Actually test for test_rmm flag
* Fix TestPythonGPU
* Use CNMeM allocator, since pool allocator doesn't yet support multi-GPU
* Use 10.0 container to build RMM-enabled XGBoost
* Revert "Use 10.0 container to build RMM-enabled XGBoost"
This reverts commit 789021fa31112e25b683aef39fff375403060141.
* Fix Jenkinsfile
* [CI] Assign larger /dev/shm to NCCL
* Use 10.2 artifact to run multi-GPU Python tests
* Add CUDA 10.0 -> 11.0 cross-version test; remove CUDA 10.0 target
* Rename Conda env rmm_test -> gpu_test
* Use env var to opt into CNMeM pool for C++ tests
* Use identical CUDA version for RMM builds and tests
* Use Pytest fixtures to enable RMM pool in Python tests
* Move RMM to plugin/CMakeLists.txt; use PLUGIN_RMM
* Use per-device MR; use command arg in gtest
* Set CMake prefix path to use Conda env
* Use 0.15 nightly version of RMM
* Remove unnecessary header
* Fix a unit test when cudf is missing
* Add RMM demos
* Remove print()
* Use HostDeviceVector in GPU predictor
* Simplify pytest setup; use LocalCUDACluster fixture
* Address reviewers' comments
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
* Added plugin with DPC++-based predictor and objective function
* Update CMakeLists.txt
* Update regression_obj_oneapi.cc
* Added README.md for OneAPI plugin
* Added OneAPI predictor support to gbtree
* Update README.md
* Merged kernels in gradient computation. Enabled multiple loss functions with DPC++ backend
* Aligned plugin CMake files with latest master changes. Fixed whitespace typos
* Removed debug output
* [CI] Make oneapi_plugin a CMake target
* Added tests for OneAPI plugin for predictor and obj. functions
* Temporarily switched to default selector for device dispatching in OneAPI plugin to enable execution in environments without GPUs
* Updated readme file.
* Fixed USM usage in predictor
* Removed workaround with explicit templated names for DPC++ kernels
* Fixed warnings in plugin tests
* Fix CMake build of gtest
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
* Add OpenMP as CMake target
* Require CMake 3.12, to allow linking OpenMP target to objxgboost
* Specify OpenMP compiler flag for CUDA host compiler
* Require CMake 3.16+ if the OS is Mac OSX
* Use AppleClang in Mac tests.
* Update dmlc-core
This makes GPU Hist robust in a distributed environment, where some workers might not
be associated with any data in either training or evaluation.
* Disable rabit mock test for now: See #5012.
* Disable dask-cudf test at prediction for now: See #5003.
* Launch the dask job on all workers even though some might not have any data.
* Check 0 rows in elementwise evaluation metrics.
Using AUC and AUC-PR still throws an error. See #4663 for a robust fix.
* Add tests for edge cases.
* Add `LaunchKernel` wrapper handling zero sized grid.
* Move some parts of allreducer into a cu file.
* Don't validate feature names when the booster is empty.
* Sync number of columns in DMatrix.
num_feature is required to be the same across all workers in data split
mode.
* Filtering in the dask interface now by default syncs all boosters that are not
empty, instead of using rank 0.
* Fix Jenkins' GPU tests.
* Install dask-cuda from source in Jenkins' test.
Now all tests are actually running.
* Restore GPU Hist tree synchronization test.
* Check UUID of running devices.
The check is only performed on CUDA version >= 10.x, as 9.x doesn't have the UUID field.
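
As an illustration of the zero-row case in element-wise metrics mentioned above, here is a minimal sketch of an RMSE that stays well defined when some workers hold no data. It is not XGBoost's implementation: the function names are hypothetical, the cross-worker reduction is simulated with plain Python sums, and returning NaN when no worker has any rows is an assumption.

```python
import math


def local_rmse_parts(preds, labels):
    """Per-worker partial result: (sum of squared errors, row count).
    An empty worker contributes (0.0, 0) instead of failing."""
    sse = sum((p - y) ** 2 for p, y in zip(preds, labels))
    return float(sse), len(labels)


def global_rmse(parts):
    """Combine per-worker partials, as an allreduce(sum) would."""
    total_sse = sum(s for s, _ in parts)
    total_rows = sum(n for _, n in parts)
    if total_rows == 0:
        # Assumption: report NaN rather than divide by zero.
        return float("nan")
    return math.sqrt(total_sse / total_rows)


parts = [
    local_rmse_parts([1.0, 3.0], [1.0, 1.0]),  # worker with data
    local_rmse_parts([], []),                  # worker with zero rows
]
print(global_rmse(parts))
```

Because each worker reports a sum and a count rather than a local average, an empty worker adds nothing to either total and the division happens only once, after the reduction.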
* Fix CMake policy and project variables.
Use xgboost_SOURCE_DIR uniformly, add policy for CMake >= 3.13.
* Fix copying data to CPU
* Fix race condition in CPU predictor.
* Fix duplicated DMatrix construction.
* Don't download extra NCCL in CI script.
* Add CMake option to use bundled gtest from dmlc-core, so that it is easy to build XGBoost with gtest on Windows
* Consistently apply OpenMP flag to all targets. Force enable OpenMP when USE_CUDA is turned on.
* Insert vcomp140.dll into Windows wheels
* Add C++ and Python tests for CPU and GPU targets (CUDA 9.0, 10.0, 10.1)
* Prevent spurious msbuild failure
* Add GPU tests
* Upgrade dmlc-core
* Make CMakeLists.txt compatible with CMake 3.3; require CMake 3.11 for MSVC
* Use CMake 3.12 when sanitizer is enabled
* Disable funroll-loops for MSVC
* Use cmake version in container name
* Add missing arg
* Fix egrep use in ci_build.sh
* Display CMake version
* Do not set OpenMP_CXX_LIBRARIES for MSVC
* Use cmake_minimum_required()
* Refactor CMake scripts.
* Remove CMake CUDA wrapper.
* Bump CMake version for CUDA.
* Use CMake to handle Doxygen.
* Split up CMakeLists.txt.
* Export install target.
* Use modern CMake.
* Remove build.sh
* Workaround for gpu_hist test.
* Use CMake 3.12.
* Revert machine.conf.
* Move CLI test to GPU.
* Small cleanup.
* Support using XGBoost as submodule.
* Fix Windows
* Fix cpp tests on Windows
* Remove duplicated find_package.