This PR replaces the original RABIT implementation with a new one, parts of which have already been merged into XGBoost. The new implementation features:
- Federated learning for both CPU and GPU.
- NCCL support for GPU communication.
- More data types.
- A unified interface for all the underlying implementations (see the sketch after this list).
- Improved timeout handling for both tracker and workers.
- Exhaustive tests with metrics (fixed a couple of bugs along the way).
- A reusable tracker for Python and JVM packages.
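As a rough sketch of what the unified interface looks like from Python (a sketch only: the parameter keys and the `Op` enum name are assumptions about the `xgboost.collective` module this work feeds into, so consult the module itself for the exact spelling):

```python
import numpy as np
import xgboost.collective as coll

# One entry point for every backend: the communicator type (RABIT over
# TCP, federated, ...) is chosen by configuration, not by calling a
# different API. The key names below are illustrative.
coll.init(dmlc_communicator="rabit",
          dmlc_tracker_uri="127.0.0.1",
          dmlc_tracker_port=9091)

print(f"worker {coll.get_rank()} of {coll.get_world_size()}")

# The same collective primitives are available regardless of backend.
values = np.asarray([float(coll.get_rank())])
total = coll.allreduce(values, coll.Op.SUM)

coll.finalize()
```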
* Clean up some pylint errors.
* Clean up pylint errors in the rabit modules.
* Make the data iter an abstract class and clean up private access (see the iterator sketch after this list).
* Clean up no-self-use warnings for the booster.
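For context on the iterator change, here is a minimal sketch of a user-defined iterator built on the abstract class (the batch data is made up, and the exact return convention of `next` may differ between versions):

```python
import numpy as np
from xgboost.core import DataIter

class BatchIter(DataIter):
    """Feed pre-loaded (X, y) batches through the abstract interface."""

    def __init__(self, batches):
        self._batches = batches
        self._it = 0
        super().__init__()

    def next(self, input_data) -> int:
        # Return 0 once exhausted, 1 while there are batches left.
        if self._it == len(self._batches):
            return 0
        X, y = self._batches[self._it]
        input_data(data=X, label=y)
        self._it += 1
        return 1

    def reset(self) -> None:
        # Called by XGBoost before a fresh pass over the data.
        self._it = 0

rng = np.random.default_rng(0)
batches = [(rng.normal(size=(32, 4)), rng.normal(size=32)) for _ in range(4)]
it = BatchIter(batches)
```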
- Add user configuration.
- Bring back the logic of using the scheduler address from Dask. This was removed while we were trying to support GKE; now we restore it and let XGBoost fall back to it when the direct guess and the host IP from user configuration both fail (sketched below).
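The restored lookup order, sketched as a hypothetical helper (`_tracker_host` is illustrative; the `xgboost.scheduler_address` Dask config key and `distributed.comm.get_address_host` are the knobs it leans on):

```python
import socket

import dask
from distributed import Client
from distributed.comm import get_address_host


def _tracker_host(client: Client) -> str:
    # 1. An address supplied through user configuration wins outright.
    configured = dask.config.get("xgboost.scheduler_address", default=None)
    if configured is not None:
        return configured
    # 2. Direct guess: resolve this node's own IP from its hostname.
    try:
        return socket.gethostbyname(socket.gethostname())
    except socket.gaierror:
        pass
    # 3. Last resort: the host part of the Dask scheduler's address,
    #    which every worker can reach by construction.
    return get_address_host(client.scheduler.address)
```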
* [dask] Use `distributed.MultiLock`. This enables training multiple models in parallel; see the sketch after this list.
* Conditionally import `MultiLock`.
* Use async train directly in the scikit-learn interface.
* Use `worker_client` when available.
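Putting the `MultiLock` items together, a sketch of the locking pattern (the `run_training` body is a placeholder, not the actual training call; it assumes an active Dask cluster):

```python
import asyncio
from typing import List

try:
    # MultiLock only exists in newer releases of distributed, so guard
    # the import; without it we simply skip the parallel-training guard.
    from distributed import MultiLock
except ImportError:
    MultiLock = None


async def train_on_workers(workers: List[str]) -> None:
    async def run_training() -> None:
        await asyncio.sleep(0)  # placeholder for the real training job

    if MultiLock is None:
        await run_training()
        return

    # One lock per participating worker: two jobs with disjoint worker
    # sets train in parallel, while overlapping jobs queue up instead of
    # interleaving on the same worker.
    async with MultiLock(names=workers):
        await run_training()
```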