xgboost

Author	SHA1	Message	Date
Jiaming Yuan	bdc1a3c178	Fix pyspark parameter. (#9460 ) - Don't pass the `use_gpu` parameter to the learner. - Fix GPU approx with PySpark.	2023-08-11 19:07:50 +08:00
Bobby Wang	d495a180d8	[pyspark] add logs for training (#9449 )	2023-08-09 18:32:23 +08:00
Jiaming Yuan	275da176ba	Document for device ordinal. (#9398 ) - Rewrite GPU demos. notebook is converted to script to avoid committing additional png plots. - Add GPU demos into the sphinx gallery. - Add RMM demos into the sphinx gallery. - Test for firing threads with different device ordinals.	2023-07-22 15:26:29 +08:00
Jiaming Yuan	6e18d3a290	[pyspark] Handle the `device` parameter in pyspark. (#9390 ) - Handle the new `device` parameter in PySpark. - Deprecate the old `use_gpu` parameter.	2023-07-18 08:47:03 +08:00
Jiaming Yuan	16eb41936d	Handle the new `device` parameter in dask and demos. (#9386 ) * Handle the new `device` parameter in dask and demos. - Check no ordinal is specified in the dask interface. - Update demos. - Update dask doc. - Update the condition for QDM.	2023-07-15 19:11:20 +08:00
Jiaming Yuan	04aff3af8e	Define the new `device` parameter. (#9362 )	2023-07-13 19:30:25 +08:00
Bobby Wang	320323f533	[pyspark] add parameters in the ctor of all estimators. (#9202 ) --------- Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2023-05-29 05:58:16 +08:00
Bobby Wang	6274fba0a5	[pyspark] support typing (#9172 )	2023-05-19 14:39:26 +08:00
Bobby Wang	caf326d508	[pyspark] Refactor and typing support for models (#9156 )	2023-05-17 16:38:51 +08:00
Bobby Wang	17add4776f	[pyspark] Don't stack for non feature columns (#9088 )	2023-04-25 23:09:12 +08:00
Bobby Wang	339f21e1bf	[pyspark] fix a type hint with old pyspark release (#9079 )	2023-04-24 20:04:14 +08:00
WeichenXu	191d0aa5cf	[spark] Make spark model have the same UID with its estimator (#9022 ) Signed-off-by: Weichen Xu <weichen.xu@databricks.com>	2023-04-14 02:53:30 +08:00
Jiaming Yuan	bac22734fb	Remove ntree limit in python package. (#8345 ) - Remove `ntree_limit`. The parameter has been deprecated since 1.4.0. - The SHAP package compatibility is broken.	2023-03-31 19:01:55 +08:00
Jiaming Yuan	c2b3a13e70	[breaking][skl] Remove parameter serialization. (#8963 ) - Remove parameter serialization in the scikit-learn interface. The scikit-learn interface `save_model` will save only the model and discard all hyper-parameters. This is to align with the native XGBoost interface, which distinguishes the hyper-parameter and model parameters. With the scikit-learn interface, model parameters are attributes of the estimator. For instance, `n_features_in_`, `n_classes_` are always accessible with `estimator.n_features_in_` and `estimator.n_classes_`, but not with the `estimator.get_params`. - Define a `load_model` method for classifier to load its own attributes. - Set n_estimators to None by default.	2023-03-27 21:34:10 +08:00
Jiaming Yuan	6a892ce281	Specify src path for isort. (#8867 )	2023-03-06 17:30:27 +08:00
mzzhang95	6cef9a08e9	[pyspark] Update eval_metric validation to support list of strings (#8826 )	2023-03-02 08:24:12 +08:00
WeichenXu	f27a7258c6	Fix feature types param (#8772 ) Signed-off-by: Weichen Xu <weichen.xu@databricks.com>	2023-02-14 02:16:42 +08:00
Jiaming Yuan	175986b739	[doc] Add missing document for pyspark ranker. [skip ci] (#8692 )	2023-01-18 07:52:18 +08:00
Bobby Wang	72ec0c5484	[pyspark] support pred_contribs (#8633 )	2023-01-11 16:51:12 +08:00
Bobby Wang	d3ad0524e7	[pyspark] Re-work _fit function (#8630 )	2023-01-04 18:21:57 +08:00
Bobby Wang	40a1a2ffa8	[pyspark] check use_qdm across all the workers (#8496 )	2022-12-08 18:09:17 +08:00
Bobby Wang	8e41ad24f5	[pyspark] sort qid for SparkRanker (#8497 ) * [pyspark] sort qid for SparkRanker * resolve comments	2022-12-01 16:40:35 -08:00
WeichenXu	67ea1c3435	[pyspark] Make QDM optional based on cuDF check (#8471 )	2022-11-27 14:58:54 +08:00
Jiaming Yuan	cfd2a9f872	Extract dask and spark test into distributed test. (#8395 ) - Move test files. - Run spark and dask separately to prevent conflicts. - Gather common code into the testing module.	2022-10-28 16:24:32 +08:00
Jiaming Yuan	d0b99bdd95	[pyspark] Add type hint to basic utilities. (#8375 )	2022-10-25 17:26:25 +08:00
Bobby Wang	76f95a6667	[pyspark] Filter out the unsupported train parameters (#8355 )	2022-10-18 23:26:02 +08:00
Jiaming Yuan	3901f5d9db	[pyspark] Cleanup data processing. (#8344 ) * Enable additional combinations of ctor parameters. * Unify procedures for QuantileDMatrix and DMatrix.	2022-10-18 14:56:23 +08:00
Jiaming Yuan	97a5b088a5	[pyspark] Use quantile dmatrix. (#8284 )	2022-10-12 20:38:53 +08:00
Rong Ou	668b8a0ea4	[Breaking] Switch from rabit to the collective communicator (#8257 ) * Switch from rabit to the collective communicator * fix size_t specialization * really fix size_t * try again * add include * more include * fix lint errors * remove rabit includes * fix pylint error * return dict from communicator context * fix communicator shutdown * fix dask test * reset communicator mocklist * fix distributed tests * do not save device communicator * fix jvm gpu tests * add python test for federated communicator * Update gputreeshap submodule Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>	2022-10-05 14:39:01 -08:00
Bobby Wang	c91fed083d	[pyspark] disable repartition_random_shuffle by default (#8283 )	2022-09-29 10:50:51 +08:00
WeichenXu	ff71c69adf	[pyspark] Add validation for param 'early_stopping_rounds' and 'validation_indicator_col' (#8250 ) Signed-off-by: Weichen Xu <weichen.xu@databricks.com>	2022-09-26 17:43:03 +08:00
WeichenXu	ab342af242	[pyspark] Fix xgboost spark estimator dataset repartition issues (#8231 )	2022-09-22 21:31:41 +08:00
Bobby Wang	4f42aa5f12	[pyspark] make the model saved by pyspark compatible (#8219 ) Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2022-09-20 16:43:49 +08:00
Bobby Wang	520586ffa7	[pyspark] fix empty data issue when constructing DMatrix (#8245 ) Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>	2022-09-20 16:43:20 +08:00
Bobby Wang	7ee10e3dbd	[pyspark] Cleanup the comments (#8217 )	2022-09-05 16:20:12 +08:00
WeichenXu	651f0a8889	[pyspark] Fixing xgboost.spark python doc (#8200 ) Signed-off-by: Weichen Xu <weichen.xu@databricks.com>	2022-08-25 14:41:48 +08:00
WeichenXu	d03794ce7a	[pyspark] Add param validation for "objective" and "eval_metric" param, and remove invalid booster params (#8173 ) Signed-off-by: Weichen Xu <weichen.xu@databricks.com>	2022-08-24 15:29:43 +08:00
WeichenXu	f4628c22a4	[pyspark] Implement SparkXGBRanker estimator (#8172 ) Signed-off-by: Weichen Xu <weichen.xu@databricks.com>	2022-08-23 02:35:19 +08:00
WeichenXu	53d2a733b0	[pyspark] Make Xgboost estimator support using sparse matrix as optimization (#8145 ) Signed-off-by: Weichen Xu <weichen.xu@databricks.com>	2022-08-19 01:57:28 +08:00
Bobby Wang	03cc3b359c	[pyspark] support a list of feature column names (#8117 )	2022-08-08 17:05:27 +08:00
Jiaming Yuan	546de5efd2	[pyspark] Cleanup data processing. (#8088 ) - Use numpy stack for handling list of arrays. - Reuse concat function from dask. - Prepare for `QuantileDMatrix`. - Remove unused code. - Use iterator for prediction to avoid initializing xgboost model	2022-07-26 15:00:52 +08:00
Jiaming Yuan	8bdea72688	[Python] Require black and isort for new Python files. (#8096 ) * [Python] Require black and isort for new Python files. - Require black and isort for spark and dask module. These files are relatively new and conform more closely to the black formatter. We will convert the rest of the library as we move forward. Other libraries including dask/distributed and optuna use the same formatting style and have a stricter standard. The black formatter is indeed quite nice, automating it can help us unify the code style. - Gather Python checks into a single script.	2022-07-20 10:25:24 +08:00
WeichenXu	f23cc92130	[pyspark] User guide doc and tutorials (#8082 ) Co-authored-by: Bobby Wang <wbo4958@gmail.com>	2022-07-19 22:25:14 +08:00
Bobby Wang	f801d3cf15	[PySpark] change the returning model type to string from binary (#8085 ) * [PySpark] change the returning model type to string from binary XGBoost pyspark can be accelerated by RAPIDS Accelerator seamlessly by changing the returning model type from binary to string.	2022-07-19 18:39:20 +08:00
Jiaming Yuan	e28f6f6657	[doc] Integrate pyspark module into sphinx doc [skip ci] (#8066 )	2022-07-17 10:46:09 +08:00
Bobby Wang	a33f35eecf	[PySpark] add gpu support for spark local mode (#8068 )	2022-07-17 07:59:06 +08:00
Bobby Wang	91bb9e2cb3	[PySpark] fix raw_prediction_col parameter and minor cleanup (#8067 )	2022-07-16 17:58:57 +08:00
WeichenXu	176fec8789	PySpark XGBoost integration (#8020 ) Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2022-07-13 13:11:18 +08:00

48 Commits