Jiaming Yuan
bdc1a3c178
Fix pyspark parameter. ( #9460 )
...
- Don't pass the `use_gpu` parameter to the learner.
- Fix GPU approx with PySpark.
2023-08-11 19:07:50 +08:00
Bobby Wang
d495a180d8
[pyspark] add logs for training ( #9449 )
2023-08-09 18:32:23 +08:00
Jiaming Yuan
275da176ba
Document for device ordinal. ( #9398 )
...
- Rewrite GPU demos. The notebook is converted to a script to avoid committing additional PNG plots.
- Add GPU demos into the sphinx gallery.
- Add RMM demos into the sphinx gallery.
- Test for firing threads with different device ordinals.
2023-07-22 15:26:29 +08:00
Jiaming Yuan
6e18d3a290
[pyspark] Handle the device parameter in pyspark. ( #9390 )
...
- Handle the new `device` parameter in PySpark.
- Deprecate the old `use_gpu` parameter.
2023-07-18 08:47:03 +08:00
Jiaming Yuan
16eb41936d
Handle the new device parameter in dask and demos. ( #9386 )
...
* Handle the new `device` parameter in dask and demos.
- Check no ordinal is specified in the dask interface.
- Update demos.
- Update dask doc.
- Update the condition for QDM.
2023-07-15 19:11:20 +08:00
Jiaming Yuan
04aff3af8e
Define the new device parameter. ( #9362 )
2023-07-13 19:30:25 +08:00
Bobby Wang
320323f533
[pyspark] add parameters in the ctor of all estimators. ( #9202 )
...
---------
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2023-05-29 05:58:16 +08:00
Bobby Wang
6274fba0a5
[pyspark] support typing ( #9172 )
2023-05-19 14:39:26 +08:00
Bobby Wang
caf326d508
[pyspark] Refactor and typing support for models ( #9156 )
2023-05-17 16:38:51 +08:00
Bobby Wang
17add4776f
[pyspark] Don't stack for non feature columns ( #9088 )
2023-04-25 23:09:12 +08:00
Bobby Wang
339f21e1bf
[pyspark] fix a type hint with old pyspark release ( #9079 )
2023-04-24 20:04:14 +08:00
WeichenXu
191d0aa5cf
[spark] Make spark model have the same UID with its estimator ( #9022 )
...
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2023-04-14 02:53:30 +08:00
Jiaming Yuan
bac22734fb
Remove ntree limit in python package. ( #8345 )
...
- Remove `ntree_limit`. The parameter has been deprecated since 1.4.0.
- The SHAP package compatibility is broken.
2023-03-31 19:01:55 +08:00
Jiaming Yuan
c2b3a13e70
[breaking][skl] Remove parameter serialization. ( #8963 )
...
- Remove parameter serialization in the scikit-learn interface.
The scikit-learn interface `save_model` will save only the model and discard all
hyper-parameters. This is to align with the native XGBoost interface, which distinguishes
hyper-parameters from model parameters.
With the scikit-learn interface, model parameters are attributes of the estimator. For
instance, `n_features_in_` and `n_classes_` are always accessible as
`estimator.n_features_in_` and `estimator.n_classes_`, but not through
`estimator.get_params`.
- Define a `load_model` method for classifier to load its own attributes.
- Set n_estimators to None by default.
2023-03-27 21:34:10 +08:00
Jiaming Yuan
6a892ce281
Specify src path for isort. ( #8867 )
2023-03-06 17:30:27 +08:00
mzzhang95
6cef9a08e9
[pyspark] Update eval_metric validation to support list of strings ( #8826 )
2023-03-02 08:24:12 +08:00
WeichenXu
f27a7258c6
Fix feature types param ( #8772 )
...
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2023-02-14 02:16:42 +08:00
Jiaming Yuan
175986b739
[doc] Add missing document for pyspark ranker. [skip ci] ( #8692 )
2023-01-18 07:52:18 +08:00
Bobby Wang
72ec0c5484
[pyspark] support pred_contribs ( #8633 )
2023-01-11 16:51:12 +08:00
Bobby Wang
d3ad0524e7
[pyspark] Re-work _fit function ( #8630 )
2023-01-04 18:21:57 +08:00
Bobby Wang
40a1a2ffa8
[pyspark] check use_qdm across all the workers ( #8496 )
2022-12-08 18:09:17 +08:00
Bobby Wang
8e41ad24f5
[pyspark] sort qid for SparkRanker ( #8497 )
...
* [pyspark] sort qid for SparkRanker
* resolve comments
2022-12-01 16:40:35 -08:00
WeichenXu
67ea1c3435
[pyspark] Make QDM optional based on cuDF check ( #8471 )
2022-11-27 14:58:54 +08:00
Jiaming Yuan
cfd2a9f872
Extract dask and spark test into distributed test. ( #8395 )
...
- Move test files.
- Run spark and dask separately to prevent conflicts.
- Gather common code into the testing module.
2022-10-28 16:24:32 +08:00
Jiaming Yuan
d0b99bdd95
[pyspark] Add type hint to basic utilities. ( #8375 )
2022-10-25 17:26:25 +08:00
Bobby Wang
76f95a6667
[pyspark] Filter out the unsupported train parameters ( #8355 )
2022-10-18 23:26:02 +08:00
Jiaming Yuan
3901f5d9db
[pyspark] Cleanup data processing. ( #8344 )
...
* Enable additional combinations of ctor parameters.
* Unify procedures for QuantileDMatrix and DMatrix.
2022-10-18 14:56:23 +08:00
Jiaming Yuan
97a5b088a5
[pyspark] Use quantile dmatrix. ( #8284 )
2022-10-12 20:38:53 +08:00
Rong Ou
668b8a0ea4
[Breaking] Switch from rabit to the collective communicator ( #8257 )
...
* Switch from rabit to the collective communicator
* fix size_t specialization
* really fix size_t
* try again
* add include
* more include
* fix lint errors
* remove rabit includes
* fix pylint error
* return dict from communicator context
* fix communicator shutdown
* fix dask test
* reset communicator mocklist
* fix distributed tests
* do not save device communicator
* fix jvm gpu tests
* add python test for federated communicator
* Update gputreeshap submodule
Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 14:39:01 -08:00
Bobby Wang
c91fed083d
[pyspark] disable repartition_random_shuffle by default ( #8283 )
2022-09-29 10:50:51 +08:00
WeichenXu
ff71c69adf
[pyspark] Add validation for param 'early_stopping_rounds' and 'validation_indicator_col' ( #8250 )
...
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2022-09-26 17:43:03 +08:00
WeichenXu
ab342af242
[pyspark] Fix xgboost spark estimator dataset repartition issues ( #8231 )
2022-09-22 21:31:41 +08:00
Bobby Wang
4f42aa5f12
[pyspark] make the model saved by pyspark compatible ( #8219 )
...
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2022-09-20 16:43:49 +08:00
Bobby Wang
520586ffa7
[pyspark] fix empty data issue when constructing DMatrix ( #8245 )
...
Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-09-20 16:43:20 +08:00
Bobby Wang
7ee10e3dbd
[pyspark] Cleanup the comments ( #8217 )
2022-09-05 16:20:12 +08:00
WeichenXu
651f0a8889
[pyspark] Fixing xgboost.spark python doc ( #8200 )
...
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2022-08-25 14:41:48 +08:00
WeichenXu
d03794ce7a
[pyspark] Add param validation for "objective" and "eval_metric" param, and remove invalid booster params ( #8173 )
...
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2022-08-24 15:29:43 +08:00
WeichenXu
f4628c22a4
[pyspark] Implement SparkXGBRanker estimator ( #8172 )
...
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2022-08-23 02:35:19 +08:00
WeichenXu
53d2a733b0
[pyspark] Make Xgboost estimator support using sparse matrix as optimization ( #8145 )
...
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2022-08-19 01:57:28 +08:00
Bobby Wang
03cc3b359c
[pyspark] support a list of feature column names ( #8117 )
2022-08-08 17:05:27 +08:00
Jiaming Yuan
546de5efd2
[pyspark] Cleanup data processing. ( #8088 )
...
- Use numpy stack for handling list of arrays.
- Reuse concat function from dask.
- Prepare for `QuantileDMatrix`.
- Remove unused code.
- Use iterator for prediction to avoid initializing xgboost model
2022-07-26 15:00:52 +08:00
Jiaming Yuan
8bdea72688
[Python] Require black and isort for new Python files. ( #8096 )
...
* [Python] Require black and isort for new Python files.
- Require black and isort for spark and dask module.
These files are relatively new and already conform more closely to the black formatter. We will
convert the rest of the library as we move forward.
Other libraries, including dask/distributed and optuna, use the same formatting style and
have a stricter standard. The black formatter is indeed quite nice; automating it can
help us unify the code style.
- Gather Python checks into a single script.
2022-07-20 10:25:24 +08:00
WeichenXu
f23cc92130
[pyspark] User guide doc and tutorials ( #8082 )
...
Co-authored-by: Bobby Wang <wbo4958@gmail.com>
2022-07-19 22:25:14 +08:00
Bobby Wang
f801d3cf15
[PySpark] change the returning model type to string from binary ( #8085 )
...
* [PySpark] change the returning model type to string from binary
XGBoost pyspark can be accelerated seamlessly by the RAPIDS Accelerator by
changing the returning model type from binary to string.
2022-07-19 18:39:20 +08:00
Jiaming Yuan
e28f6f6657
[doc] Integrate pyspark module into sphinx doc [skip ci] ( #8066 )
2022-07-17 10:46:09 +08:00
Bobby Wang
a33f35eecf
[PySpark] add gpu support for spark local mode ( #8068 )
2022-07-17 07:59:06 +08:00
Bobby Wang
91bb9e2cb3
[PySpark] fix raw_prediction_col parameter and minor cleanup ( #8067 )
2022-07-16 17:58:57 +08:00
WeichenXu
176fec8789
PySpark XGBoost integration ( #8020 )
...
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2022-07-13 13:11:18 +08:00