xgboost

Author	SHA1	Message	Date
Jiaming Yuan	c1b2cff874	[CI] Check compiler warnings. (#9444 )	2023-08-08 12:02:45 -07:00
Philip Hyunsu Cho	a5cd2412de	Replace setup.py with pyproject.toml (#9021 ) * Create pyproject.toml * Implement a custom build backend (see below) in packager directory. Build logic from setup.py has been refactored and migrated into the new backend. * Tested: pip wheel . (build wheel), python -m build --sdist . (source distribution)	2023-04-20 13:51:39 -07:00
Jiaming Yuan	26209a42a5	Define git attributes for renormalization. (#8921 )	2023-03-16 02:43:11 +08:00
Jiaming Yuan	36a7396658	Replace dmlc any with std any. (#8892 )	2023-03-11 06:11:04 +08:00
Jiaming Yuan	c5c8f643f2	Remove the cub submodule. (#8888 ) XGBoost now uses CTK-11.8 for binary packages, there's no need to maintain a cub submodule anymore.	2023-03-09 19:43:02 -08:00
Philip Hyunsu Cho	6d8afb2218	[CI] Require C++17 + CMake 3.18; Use CUDA 11.8 in CI (#8853 ) * Update to C++17 * Turn off unity build * Update CMake to 3.18 * Use MSVC 2022 + CUDA 11.8 * Re-create stack for worker images * Allocate more disk space for Windows * Tempiorarily disable clang-tidy * RAPIDS now requires Python 3.10+ * Unpin cuda-python * Use latest NCCL * Use Ubuntu 20.04 in RMM image * Mark failing mgpu test as xfail	2023-03-01 09:22:24 -08:00
Rong Ou	cbf98cb9c6	Add Allgather to collective communicator (#8765 ) * Add Allgather to collective communicator	2023-02-09 11:31:22 +08:00
Rong Ou	77b069c25d	Support bitwise allreduce operations in the communicator (#8623 )	2022-12-25 06:40:05 +08:00
Jiaming Yuan	3e26107a9c	Rename and extract `Context`. (#8528 ) * Rename `GenericParameter` to `Context`. * Rename header file to reflect the change. * Rename all references.	2022-12-07 04:58:54 +08:00
Rong Ou	a8255ea678	Add an in-memory collective communicator (#8494 )	2022-12-01 00:24:12 +08:00
Rong Ou	4449e30184	Always link federated proto statically (#8442 )	2022-11-09 07:47:38 +08:00
Rong Ou	521086d56b	Make federated client more robust (#8351 )	2022-10-18 13:52:44 +08:00
Rong Ou	8f3dee58be	Speed up tests with federated learning enabled (#8350 ) * Speed up tests with federated learning enabled * Re-enable timeouts Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>	2022-10-17 15:17:04 -07:00
Philip Hyunsu Cho	2faa744aba	[CI] Test federated learning plugin in the CI (#8325 )	2022-10-12 13:57:39 -07:00
Rong Ou	39afdac3be	Better error message when world size and rank are set as strings (#8316 ) Co-authored-by: jiamingy <jm.yuan@outlook.com>	2022-10-12 15:53:25 +08:00
Rong Ou	8d4038da57	Don't split input data in federated mode (#8279 ) Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>	2022-10-05 18:19:28 -08:00
Rong Ou	668b8a0ea4	[Breaking] Switch from rabit to the collective communicator (#8257 ) * Switch from rabit to the collective communicator * fix size_t specialization * really fix size_t * try again * add include * more include * fix lint errors * remove rabit includes * fix pylint error * return dict from communicator context * fix communicator shutdown * fix dask test * reset communicator mocklist * fix distributed tests * do not save device communicator * fix jvm gpu tests * add python test for federated communicator * Update gputreeshap submodule Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>	2022-10-05 14:39:01 -08:00
Rong Ou	a2686543a9	Common interface for collective communication (#8057 ) * implement broadcast for federated communicator * implement allreduce * add communicator factory * add device adapter * add device communicator to factory * add rabit communicator * add rabit communicator to the factory * add nccl device communicator * add synchronize to device communicator * add back print and getprocessorname * add python wrapper and c api * clean up types * fix non-gpu build * try to fix ci * fix std::size_t * portable string compare ignore case * c style size_t * fix lint errors * cross platform setenv * fix memory leak * fix lint errors * address review feedback * add python test for rabit communicator * fix failing gtest * use json to configure communicators * fix lint error * get rid of factories * fix cpu build * fix include * fix python import * don't export collective.py yet * skip collective communicator pytest on windows * add review feedback * update documentation * remove mpi communicator type * fix tests * shutdown the communicator separately Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2022-09-12 15:21:12 -07:00
Rong Ou	d6e2013c5f	Set max message size in insecure gRPC (#8203 )	2022-08-26 16:33:51 +08:00
Rong Ou	ad3bc0edee	Allow insecure gRPC connections for federated learning (#8181 ) * Allow insecure gRPC connections for federated learning * format	2022-08-19 12:16:14 +08:00
Rong Ou	45dc1f818a	Make federated plugin work with cmake 3.16.3 (#8029 )	2022-06-27 17:26:41 +08:00
Rong Ou	0725fd6081	fix federated learning plugin (#8027 )	2022-06-24 08:41:07 +08:00
Jiaming Yuan	142a208a90	Fix compiler warnings. (#8022 ) - Remove/fix unused parameters - Remove deprecated code in rabit. - Update dmlc-core.	2022-06-22 21:29:10 +08:00
Rong Ou	e5ec546da5	[Breaking] Remove rabit support for custom reductions and `grow_local_histmaker` updater (#7992 )	2022-06-21 15:08:23 +08:00
Jiaming Yuan	d48123d23b	Fix rmm build (#7973 ) - Optionally switch to c++17 - Use rmm CMake target. - Workaround compiler errors. - Fix GPUMetric inheritance. - Run death tests even if it's built with RMM support. Co-authored-by: jakirkham <jakirkham@gmail.com>	2022-06-06 20:18:32 +08:00
Rong Ou	31e6902e43	Support GPU training in the NVFlare demo (#7965 )	2022-06-02 21:52:36 +08:00
Rong Ou	d3429f2ff6	Increase gRPC max receive message size for federated learning (#7958 )	2022-06-01 13:21:54 +08:00
Rong Ou	af907e2d0d	Demo of federated learning using NVFlare (#7879 ) Co-authored-by: jiamingy <jm.yuan@outlook.com>	2022-05-14 22:45:41 +08:00
Rong Ou	14ef38b834	Initial support for federated learning (#7831 ) Federated learning plugin for xgboost: * A gRPC server to aggregate MPI-style requests (allgather, allreduce, broadcast) from federated workers. * A Rabit engine for the federated environment. * Integration test to simulate federated learning. Additional followups are needed to address GPU support, better security, and privacy, etc.	2022-05-05 21:49:22 +08:00
Jiaming Yuan	fdf533f2b9	[POC] Experimental support for l1 error. (#7812 ) Support adaptive tree, a feature supported by both sklearn and lightgbm. The tree leaf is recomputed based on residue of labels and predictions after construction. For l1 error, the optimal value is the median (50 percentile). This is marked as experimental support for the following reasons: - The value is not well defined for distributed training, where we might have empty leaves for local workers. Right now I just use the original leaf value for computing the average with other workers, which might cause significant errors. - Some follow-ups are required, for exact, pruner, and optimization for quantile function. Also, we need to calculate the initial estimation.	2022-04-26 21:41:55 +08:00
Jiaming Yuan	5b1161bb64	Convert labels into tensor. (#7456 ) * Add a new ctor to tensor for `initilizer_list`. * Change labels from host device vector to tensor. * Rename the field from `labels_` to `labels` since it's a public member.	2021-12-17 00:58:35 +08:00
Jiaming Yuan	4100827971	Pass infomation about objective to tree methods. (#7385 ) * Define the `ObjInfo` and pass it down to every tree updater.	2021-11-04 01:52:44 +08:00
Jiaming Yuan	abec3dbf6d	Fix thread safety of softmax prediction. (#7104 )	2021-07-16 02:06:55 +08:00
Jiaming Yuan	f937f514aa	Remove lz4 compression with external memory. (#7076 )	2021-07-06 14:46:43 +08:00
Louis Desreumaux	9b530e5697	Improve OpenMP exception handling (#6680 )	2021-02-25 13:56:16 +08:00
Philip Hyunsu Cho	0f2ed21a9d	[Breaking] Change default evaluation metric for binary:logitraw objective to logloss (#6647 )	2021-01-29 00:12:12 +09:00
Akira Funahashi	8e0f5a6fc7	Update plugin instructions for CMake build (#6289 )	2020-10-26 17:42:07 -07:00
Christian Lorentzen	cf4f019ed6	[Breaking] Change default evaluation metric for classification to logloss / mlogloss (#6183 ) * Change DefaultEvalMetric of classification from error to logloss * Change default binary metric in plugin/example/custom_obj.cc * Set old error metric in python tests * Set old error metric in R tests * Fix missed eval metrics and typos in R tests * Fix setting eval_metric twice in R tests * Add warning for empty eval_metric for classification * Fix Dask tests Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-10-02 12:06:47 -07:00
Jiaming Yuan	e033caa3ba	Remove linking RMM library. (#6146 ) * Remove linking RMM library. * RMM is now header only. * Remove remaining reference.	2020-09-22 16:59:33 -07:00
Philip Hyunsu Cho	d0ccb13d09	Work around a compiler bug in MacOS AppleClang 11 (#6103 ) * Workaround a compiler bug in MacOS AppleClang * [CI] Run C++ test with MacOS Catalina + AppleClang 11.0.3 * [CI] Migrate cmake_test on MacOS from Travis CI to GitHub Actions * Install OpenMP runtime * [CI] Use CMake to locate lz4 lib	2020-09-09 21:21:55 -07:00
Philip Hyunsu Cho	9adb812a0a	RMM integration plugin (#5873 ) * [CI] Add RMM as an optional dependency * Replace caching allocator with pool allocator from RMM * Revert "Replace caching allocator with pool allocator from RMM" This reverts commit e15845d4e72e890c2babe31a988b26503a7d9038. * Use rmm::mr::get_default_resource() * Try setting default resource (doesn't work yet) * Allocate pool_mr in the heap * Prevent leaking pool_mr handle * Separate EXPECT_DEATH() in separate test suite suffixed DeathTest * Turn off death tests for RMM * Address reviewer's feedback * Prevent leaking of cuda_mr * Fix Jenkinsfile syntax * Remove unnecessary function in Jenkinsfile * [CI] Install NCCL into RMM container * Run Python tests * Try building with RMM, CUDA 10.0 * Do not use RMM for CUDA 10.0 target * Actually test for test_rmm flag * Fix TestPythonGPU * Use CNMeM allocator, since pool allocator doesn't yet support multiGPU * Use 10.0 container to build RMM-enabled XGBoost * Revert "Use 10.0 container to build RMM-enabled XGBoost" This reverts commit 789021fa31112e25b683aef39fff375403060141. * Fix Jenkinsfile * [CI] Assign larger /dev/shm to NCCL * Use 10.2 artifact to run multi-GPU Python tests * Add CUDA 10.0 -> 11.0 cross-version test; remove CUDA 10.0 target * Rename Conda env rmm_test -> gpu_test * Use env var to opt into CNMeM pool for C++ tests * Use identical CUDA version for RMM builds and tests * Use Pytest fixtures to enable RMM pool in Python tests * Move RMM to plugin/CMakeLists.txt; use PLUGIN_RMM * Use per-device MR; use command arg in gtest * Set CMake prefix path to use Conda env * Use 0.15 nightly version of RMM * Remove unnecessary header * Fix a unit test when cudf is missing * Add RMM demos * Remove print() * Use HostDeviceVector in GPU predictor * Simplify pytest setup; use LocalCUDACluster fixture * Address reviewers' commments Co-authored-by: Hyunsu Cho <chohyu01@cs.wasshington.edu>	2020-08-12 01:26:02 -07:00
Vladislav Epifanov	388f975cf5	Introducing DPC++-based plugin (predictor, objective function) supporting oneAPI programming model (#5825 ) * Added plugin with DPC++-based predictor and objective function * Update CMakeLists.txt * Update regression_obj_oneapi.cc * Added README.md for OneAPI plugin * Added OneAPI predictor support to gbtree * Update README.md * Merged kernels in gradient computation. Enabled multiple loss functions with DPC++ backend * Aligned plugin CMake files with latest master changes. Fixed whitespace typos * Removed debug output * [CI] Make oneapi_plugin a CMake target * Added tests for OneAPI plugin for predictor and obj. functions * Temporarily switched to default selector for device dispacthing in OneAPI plugin to enable execution in environments without gpus * Updated readme file. * Fixed USM usage in predictor * Removed workaround with explicit templated names for DPC++ kernels * Fixed warnings in plugin tests * Fix CMake build of gtest Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-08-08 18:40:40 -07:00
Philip Hyunsu Cho	0d411b0397	[CI] Simplify CMake build with modern CMake techniques (#5871 ) * [CI] Simplify CMake build * Make sure that plugins can be built * [CI] Install lz4 on Mac	2020-07-08 04:23:24 -07:00
Jiaming Yuan	0012f2ef93	Upgrade clang-tidy on CI. (#5469 ) * Correct all clang-tidy errors. * Upgrade clang-tidy to 10 on CI. Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-04-05 04:42:29 +08:00
Rory Mitchell	1de36cdf1e	Add link to GPU documentation (#5437 )	2020-03-24 09:29:29 +13:00
Jiaming Yuan	ac457c56a2	Use `UpdateAllowUnknown' for non-model related parameter. (#4961 ) * Use `UpdateAllowUnknown' for non-model related parameter. Model parameter can not pack an additional boolean value due to binary IO format. This commit deals only with non-model related parameter configuration. * Add tidy command line arg for use-dmlc-gtest.	2019-10-23 05:50:12 -04:00
Jiaming Yuan	ae536756ae	Add Model and Configurable interface. (#4945 ) * Apply Configurable to objective functions. * Apply Model to Learner and Regtree, gbm. * Add Load/SaveConfig to objs. * Refactor obj tests to use smart pointer. * Dummy methods for Save/Load Model.	2019-10-18 01:56:02 -04:00
Rong Ou	38ab79f889	Make HostDeviceVector single gpu only (#4773 ) * Make HostDeviceVector single gpu only	2019-08-26 09:51:13 +12:00
Jiaming Yuan	ab357dd41c	Remove plugin, cuda related code in automake & autoconf files (#4789 ) * Build plugin example with CMake. * Remove plugin, cuda related code in automake & autoconf files. * Fix typo in GPU doc.	2019-08-18 16:54:34 -04:00
Andy Adinets	72cd1517d6	Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage. (#3446 ) * Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage. - added distributions to HostDeviceVector - using HostDeviceVector for labels, weights and base margings in MetaInfo - using HostDeviceVector for offset and data in SparsePage - other necessary refactoring * Added const version of HostDeviceVector API calls. - const versions added to calls that can trigger data transfers, e.g. DevicePointer() - updated the code that uses HostDeviceVector - objective functions now accept const HostDeviceVector<bst_float>& for predictions * Updated src/linear/updater_gpu_coordinate.cu. * Added read-only state for HostDeviceVector sync. - this means no copies are performed if both host and devices access the HostDeviceVector read-only * Fixed linter and test errors. - updated the lz4 plugin - added ConstDeviceSpan to HostDeviceVector - using device % dh::NVisibleDevices() for the physical device number, e.g. in calls to cudaSetDevice() * Fixed explicit template instantiation errors for HostDeviceVector. - replaced HostDeviceVector<unsigned int> with HostDeviceVector<int> * Fixed HostDeviceVector tests that require multiple GPUs. - added a mock set device handler; when set, it is called instead of cudaSetDevice()	2018-08-30 14:28:47 +12:00

1 2

91 Commits