902 Commits

Author SHA1 Message Date
Rong Ou
99fa8dad2d
Add back xgboost.rabit for backwards compatibility (#8408)
* Add back xgboost.rabit for backwards compatibility

* fix my errors

* Fix lint

* Use FutureWarning

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-11-01 21:47:41 -07:00
Jiaming Yuan
a408c34558
Update JSON parser demo with categorical feature. (#8401)
- Parse categorical features in the Python example.
- Add tests.
- Update document.
2022-10-28 20:57:43 +08:00
Jiaming Yuan
cfd2a9f872
Extract dask and spark test into distributed test. (#8395)
- Move test files.
- Run spark and dask separately to prevent conflicts.
- Gather common code into the testing module.
2022-10-28 16:24:32 +08:00
Jiaming Yuan
f73520bfff
Bump development version to 2.0. (#8390)
2022-10-28 15:21:19 +08:00
Yizhi Liu
5699f60a88
Type fix for WebAssembly: use bst_ulong instead of size_t for ncol in CSR conversion. (#8369)
2022-10-26 19:21:45 +08:00
Jiaming Yuan
cf70864fa3
Move Python testing utilities into xgboost module. (#8379)
- Add typehints.
- Fixes for pylint.

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-26 16:56:11 +08:00
Jiaming Yuan
d0b99bdd95
[pyspark] Add type hint to basic utilities. (#8375)
2022-10-25 17:26:25 +08:00
Jiaming Yuan
c884b9e888
Validate features for inplace predict. (#8359)
2022-10-19 23:05:36 +08:00
luca-s
c47c71e34f
XGBRanker documentation: few clarifications (#8356)
2022-10-19 01:54:14 +08:00
Bobby Wang
76f95a6667
[pyspark] Filter out the unsupported train parameters (#8355)
2022-10-18 23:26:02 +08:00
Jiaming Yuan
3901f5d9db
[pyspark] Cleanup data processing. (#8344)
* Enable additional combinations of ctor parameters.
* Unify procedures for QuantileDMatrix and DMatrix.
2022-10-18 14:56:23 +08:00
luca-s
5647fc6542
XGBRanker documentation: missing default objective (#8347)
2022-10-18 10:43:29 +08:00
Rong Ou
8f3dee58be
Speed up tests with federated learning enabled (#8350)
* Speed up tests with federated learning enabled

* Re-enable timeouts

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-17 15:17:04 -07:00
Jiaming Yuan
2176e511fc
Disable pytest-timeout for now. (#8348)
2022-10-17 23:06:10 +08:00
Jiaming Yuan
fcddbc9264
Fix incorrect function name. (#8346)
2022-10-17 19:28:20 +08:00
Rong Ou
80e10e02ab
Avoid blank lines with federated training (#8342)
2022-10-14 14:55:01 +08:00
Jiaming Yuan
97a5b088a5
[pyspark] Use quantile dmatrix. (#8284)
2022-10-12 20:38:53 +08:00
Jiaming Yuan
c68684ff4c
Update parameter for categorical feature. (#8285)
2022-10-10 19:48:29 +08:00
Jiaming Yuan
5545c49cfc
Require keyword args for data iterator. (#8327)
2022-10-10 17:47:13 +08:00
Rong Ou
668b8a0ea4
[Breaking] Switch from rabit to the collective communicator (#8257)
* Switch from rabit to the collective communicator

* fix size_t specialization

* really fix size_t

* try again

* add include

* more include

* fix lint errors

* remove rabit includes

* fix pylint error

* return dict from communicator context

* fix communicator shutdown

* fix dask test

* reset communicator mocklist

* fix distributed tests

* do not save device communicator

* fix jvm gpu tests

* add python test for federated communicator

* Update gputreeshap submodule

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 14:39:01 -08:00
Jiaming Yuan
e47b3a3da3
Upgrade mypy. (#8302)
Some breaking changes were made in mypy.
2022-10-05 14:31:59 +08:00
Jiaming Yuan
97c3a80a34
Add C document to sphinx, fix arrow. (#8300)
- Group C API.
- Add C API sphinx doc.
- Consistent use of `OptionalArg` and the parameter name `config`.
- Remove call to deprecated functions in demo.
- Fix some formatting errors.
- Add links to c examples in the document (only visible with doxygen pages)
- Fix arrow.
2022-10-05 09:52:15 +08:00
Jiaming Yuan
55cf24cc32
Obtain CSR matrix from DMatrix. (#8269)
2022-09-29 20:41:43 +08:00
Bobby Wang
c91fed083d
[pyspark] disable repartition_random_shuffle by default (#8283)
2022-09-29 10:50:51 +08:00
Jiaming Yuan
6925b222e0
Fix mixed types with cuDF. (#8280)
2022-09-29 00:57:52 +08:00
Jiaming Yuan
f835368bcf
Mark next release as 1.7 instead of 2.0 (#8281)
2022-09-28 14:33:37 +08:00
Jiaming Yuan
fcab51aa82
Support more pandas nullable types (#8262)
- Float32/64
- Category.
2022-09-27 01:59:50 +08:00
WeichenXu
ff71c69adf
[pyspark] Add validation for param 'early_stopping_rounds' and 'validation_indicator_col' (#8250)
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2022-09-26 17:43:03 +08:00
WeichenXu
ab342af242
[pyspark] Fix xgboost spark estimator dataset repartition issues (#8231)
2022-09-22 21:31:41 +08:00
Jiaming Yuan
b791446623
Initial support for IPv6 (#8225)
- Merge rabit socket into XGBoost.
- Dask interface support.
- Add test to the socket.
2022-09-21 18:06:50 +08:00
Bobby Wang
4f42aa5f12
[pyspark] make the model saved by pyspark compatible (#8219)
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2022-09-20 16:43:49 +08:00
Bobby Wang
520586ffa7
[pyspark] fix empty data issue when constructing DMatrix (#8245)
Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-09-20 16:43:20 +08:00
Jiaming Yuan
bdf265076d
Make QuantileDMatrix default to sklearn estimators. (#8220)
2022-09-13 13:52:19 +08:00
Rong Ou
a2686543a9
Common interface for collective communication (#8057)
* implement broadcast for federated communicator

* implement allreduce

* add communicator factory

* add device adapter

* add device communicator to factory

* add rabit communicator

* add rabit communicator to the factory

* add nccl device communicator

* add synchronize to device communicator

* add back print and getprocessorname

* add python wrapper and c api

* clean up types

* fix non-gpu build

* try to fix ci

* fix std::size_t

* portable string compare ignore case

* c style size_t

* fix lint errors

* cross platform setenv

* fix memory leak

* fix lint errors

* address review feedback

* add python test for rabit communicator

* fix failing gtest

* use json to configure communicators

* fix lint error

* get rid of factories

* fix cpu build

* fix include

* fix python import

* don't export collective.py yet

* skip collective communicator pytest on windows

* add review feedback

* update documentation

* remove mpi communicator type

* fix tests

* shutdown the communicator separately

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2022-09-12 15:21:12 -07:00
Bobby Wang
7ee10e3dbd
[pyspark] Cleanup the comments (#8217)
2022-09-05 16:20:12 +08:00
Jiaming Yuan
ada4a86d1c
Fix dask interface with latest cupy. (#8210)
2022-09-03 03:10:43 +08:00
Rong Ou
b78bc734d9
Fix dask.py lint error (#8216)
2022-09-02 16:30:01 +08:00
WeichenXu
651f0a8889
[pyspark] Fixing xgboost.spark python doc (#8200)
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2022-08-25 14:41:48 +08:00
WeichenXu
d03794ce7a
[pyspark] Add param validation for "objective" and "eval_metric" param, and remove invalid booster params (#8173)
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2022-08-24 15:29:43 +08:00
Jiaming Yuan
9b32e6e2dc
Fix release script. (#8187) (#8195)
2022-08-23 15:08:30 +08:00
WeichenXu
f4628c22a4
[pyspark] Implement SparkXGBRanker estimator (#8172)
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2022-08-23 02:35:19 +08:00
Rong Ou
ad3bc0edee
Allow insecure gRPC connections for federated learning (#8181)
* Allow insecure gRPC connections for federated learning

* format
2022-08-19 12:16:14 +08:00
WeichenXu
53d2a733b0
[pyspark] Make Xgboost estimator support using sparse matrix as optimization (#8145)
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2022-08-19 01:57:28 +08:00
Gavin Zhang
40a10c217d
Use make on i system (#8178)
Co-authored-by: GavinZhang <zhanggan@cn.ibm.com>
2022-08-18 12:55:32 +08:00
Jiaming Yuan
36e7c5364d
[dask] Deterministic rank assignment. (#8018)
2022-08-11 19:17:58 +08:00
Jiaming Yuan
570f8ae4ba
Use black on more Python files. (#8137)
2022-08-11 01:38:11 +08:00
Jiaming Yuan
bdb291f1c2
[doc] Clarification for feature importance. (#8151)
2022-08-11 00:30:42 +08:00
Jiaming Yuan
9ae547f994
Use config_context in sklearn interface. (#8141)
2022-08-09 14:48:54 +08:00
Bobby Wang
03cc3b359c
[pyspark] support a list of feature column names (#8117)
2022-08-08 17:05:27 +08:00
Jiaming Yuan
d87f69215e
Quantile DMatrix for CPU. (#8130)
- Add a new `QuantileDMatrix` that works for both CPU and GPU.
- Deprecate `DeviceQuantileDMatrix`.
2022-08-02 15:51:23 +08:00