xgboost

Author	SHA1	Message	Date
Jiaming Yuan	83a078b7e5	[backport] Fix sklearn test that calls a removed field (#8579 ) (#8636 ) Co-authored-by: Rong Ou <rong.ou@gmail.com>	2023-01-05 21:17:05 +08:00
Jiaming Yuan	575fba651b	[backport] [CI] Fix CI with updated dependencies. (#8631 ) (#8635 )	2023-01-05 19:10:58 +08:00
Jiaming Yuan	60a8c8ebba	[pyspark] sort qid for SparkRanker (#8497 ) (#8555 ) * [pyspark] sort qid for SparkRanker * resolve comments Co-authored-by: Bobby Wang <wbo4958@gmail.com>	2022-12-07 02:07:37 +08:00
Philip Hyunsu Cho	5b76acccff	Add back xgboost.rabit for backwards compatibility (#8408 ) (#8411 )	2022-11-02 07:56:55 -07:00
Jiaming Yuan	c884b9e888	Validate features for inplace predict. (#8359 )	2022-10-19 23:05:36 +08:00
Bobby Wang	76f95a6667	[pyspark] Filter out the unsupported train parameters (#8355 )	2022-10-18 23:26:02 +08:00
Jiaming Yuan	3901f5d9db	[pyspark] Cleanup data processing. (#8344 ) * Enable additional combinations of ctor parameters. * Unify procedures for QuantileDMatrix and DMatrix.	2022-10-18 14:56:23 +08:00
Jiaming Yuan	2176e511fc	Disable pytest-timeout for now. (#8348 )	2022-10-17 23:06:10 +08:00
Jiaming Yuan	97a5b088a5	[pyspark] Use quantile dmatrix. (#8284 )	2022-10-12 20:38:53 +08:00
Rory Mitchell	ce0382dcb0	[CI] Refactor tests to reduce CI time. (#8312 )	2022-10-12 11:32:06 +02:00
Jiaming Yuan	5545c49cfc	Require keyword args for data iterator. (#8327 )	2022-10-10 17:47:13 +08:00
Rong Ou	668b8a0ea4	[Breaking] Switch from rabit to the collective communicator (#8257 ) * Switch from rabit to the collective communicator * fix size_t specialization * really fix size_t * try again * add include * more include * fix lint errors * remove rabit includes * fix pylint error * return dict from communicator context * fix communicator shutdown * fix dask test * reset communicator mocklist * fix distributed tests * do not save device communicator * fix jvm gpu tests * add python test for federated communicator * Update gputreeshap submodule Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>	2022-10-05 14:39:01 -08:00
Jiaming Yuan	55cf24cc32	Obtain CSR matrix from DMatrix. (#8269 )	2022-09-29 20:41:43 +08:00
Jiaming Yuan	fcab51aa82	Support more pandas nullable types (#8262 ) - Float32/64 - Category.	2022-09-27 01:59:50 +08:00
WeichenXu	ff71c69adf	[pyspark] Add validation for param 'early_stopping_rounds' and 'validation_indicator_col' (#8250 ) Signed-off-by: Weichen Xu <weichen.xu@databricks.com>	2022-09-26 17:43:03 +08:00
Jiaming Yuan	b791446623	Initial support for IPv6 (#8225 ) - Merge rabit socket into XGBoost. - Dask interface support. - Add test to the socket.	2022-09-21 18:06:50 +08:00
Jiaming Yuan	fffb1fca52	Calculate `base_score` based on input labels for mae. (#8107 ) Fit an intercept as base score for abs loss.	2022-09-20 20:53:54 +08:00
Bobby Wang	4f42aa5f12	[pyspark] make the model saved by pyspark compatible (#8219 ) Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2022-09-20 16:43:49 +08:00
Bobby Wang	520586ffa7	[pyspark] fix empty data issue when constructing DMatrix (#8245 ) Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>	2022-09-20 16:43:20 +08:00
Jiaming Yuan	2e63af6117	Mitigate flaky data iter test. (#8244 ) - Reduce the number of batches. - Verify labels.	2022-09-14 17:54:14 +08:00
Rong Ou	a2686543a9	Common interface for collective communication (#8057 ) * implement broadcast for federated communicator * implement allreduce * add communicator factory * add device adapter * add device communicator to factory * add rabit communicator * add rabit communicator to the factory * add nccl device communicator * add synchronize to device communicator * add back print and getprocessorname * add python wrapper and c api * clean up types * fix non-gpu build * try to fix ci * fix std::size_t * portable string compare ignore case * c style size_t * fix lint errors * cross platform setenv * fix memory leak * fix lint errors * address review feedback * add python test for rabit communicator * fix failing gtest * use json to configure communicators * fix lint error * get rid of factories * fix cpu build * fix include * fix python import * don't export collective.py yet * skip collective communicator pytest on windows * add review feedback * update documentation * remove mpi communicator type * fix tests * shutdown the communicator separately Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2022-09-12 15:21:12 -07:00
Jiaming Yuan	b5eb36f1af	Add `max_cat_threshold` to GPU and handle missing cat values. (#8212 )	2022-09-07 00:57:51 +08:00
WeichenXu	d03794ce7a	[pyspark] Add param validation for "objective" and "eval_metric" param, and remove invalid booster params (#8173 ) Signed-off-by: Weichen Xu <weichen.xu@databricks.com>	2022-08-24 15:29:43 +08:00
WeichenXu	f4628c22a4	[pyspark] Implement SparkXGBRanker estimator (#8172 ) Signed-off-by: Weichen Xu <weichen.xu@databricks.com>	2022-08-23 02:35:19 +08:00
WeichenXu	53d2a733b0	[pyspark] Make Xgboost estimator support using sparse matrix as optimization (#8145 ) Signed-off-by: Weichen Xu <weichen.xu@databricks.com>	2022-08-19 01:57:28 +08:00
Jiaming Yuan	36e7c5364d	[dask] Deterministic rank assignment. (#8018 )	2022-08-11 19:17:58 +08:00
Jiaming Yuan	570f8ae4ba	Use black on more Python files. (#8137 )	2022-08-11 01:38:11 +08:00
Jiaming Yuan	446d536c23	Fix loading DMatrix binary in distributed env. (#8149 ) - Try to load DMatrix binary before trying to parse text input. - Remove some unmaintained code.	2022-08-10 22:53:16 +08:00
Jiaming Yuan	9ae547f994	Use config_context in sklearn interface. (#8141 )	2022-08-09 14:48:54 +08:00
Bobby Wang	03cc3b359c	[pyspark] support a list of feature column names (#8117 )	2022-08-08 17:05:27 +08:00
Jiaming Yuan	d87f69215e	Quantile DMatrix for CPU. (#8130 ) - Add a new `QuantileDMatrix` that works for both CPU and GPU. - Deprecate `DeviceQuantileDMatrix`.	2022-08-02 15:51:23 +08:00
Jiaming Yuan	546de5efd2	[pyspark] Cleanup data processing. (#8088 ) - Use numpy stack for handling list of arrays. - Reuse concat function from dask. - Prepare for `QuantileDMatrix`. - Remove unused code. - Use iterator for prediction to avoid initializing xgboost model	2022-07-26 15:00:52 +08:00
Bobby Wang	f801d3cf15	[PySpark] change the returning model type to string from binary (#8085 ) * [PySpark] change the returning model type to string from binary XGBoost pyspark can be accelerated by RAPIDS Accelerator seamlessly by changing the returning model type from binary to string.	2022-07-19 18:39:20 +08:00
Jiaming Yuan	2365f82750	[dask] Mitigate non-deterministic test. (#8077 )	2022-07-19 16:55:59 +08:00
Jiaming Yuan	647d3844dd	Make test for categorical data deterministic. (#8080 )	2022-07-15 14:48:39 +08:00
Jiaming Yuan	dae7a41baa	Update Python requirement to >=3.8. (#8071 ) Additional changes: - Use mamba for CPU test on Jenkins. - Cleanup CPU test dependencies. - Restore some of the modin tests	2022-07-14 18:01:47 +08:00
WeichenXu	176fec8789	PySpark XGBoost integration (#8020 ) Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2022-07-13 13:11:18 +08:00
Jiaming Yuan	8959622836	[dask] Use an invalid port for test. (#8064 )	2022-07-13 11:59:02 +08:00
Jiaming Yuan	4a87ea49b8	Reduce regularization for CPU gblinear. (#8013 )	2022-06-21 01:05:27 +08:00
Jiaming Yuan	b90c6d25e8	Implement `max_cat_threshold` for CPU. (#7957 )	2022-06-04 11:02:46 +08:00
Jiaming Yuan	13b15e07e8	Handle formatted JSON input. (#7953 )	2022-06-01 16:20:58 +08:00
Jiaming Yuan	bde4f25794	Handle missing categorical value in CPU evaluator. (#7948 )	2022-05-27 14:15:47 +08:00
Jiaming Yuan	18cbebaeb9	Unify the cat split storage for CPU. (#7937 ) * Unify the cat split storage for CPU. * Cleanup. * Workaround.	2022-05-26 04:14:40 -07:00
Jiaming Yuan	606be9e663	Handle missing values in one hot splits. (#7934 )	2022-05-24 20:48:41 +08:00
Jiaming Yuan	474366c020	Add convergence test for sparse datasets. (#7922 )	2022-05-23 18:07:26 +08:00
Jiaming Yuan	f93a727869	Address remaining mypy errors in python package. (#7914 )	2022-05-18 22:46:15 +08:00
Rong Ou	77d4a53c32	use RabitContext instead of init/finalize (#7911 )	2022-05-17 12:15:41 +08:00
Jiaming Yuan	db80671d6b	Fix monotone constraint with tuple input. (#7891 )	2022-05-13 04:00:03 +08:00
Jiaming Yuan	46e0bce212	Use maximum category in sketch. (#7853 )	2022-05-05 19:56:49 +08:00
Jiaming Yuan	fdf533f2b9	[POC] Experimental support for l1 error. (#7812 ) Support adaptive tree, a feature supported by both sklearn and lightgbm. The tree leaf is recomputed based on residue of labels and predictions after construction. For l1 error, the optimal value is the median (50 percentile). This is marked as experimental support for the following reasons: - The value is not well defined for distributed training, where we might have empty leaves for local workers. Right now I just use the original leaf value for computing the average with other workers, which might cause significant errors. - Some follow-ups are required, for exact, pruner, and optimization for quantile function. Also, we need to calculate the initial estimation.	2022-04-26 21:41:55 +08:00

479 Commits