892 Commits

Author SHA1 Message Date
Jiaming Yuan
9f73127a23
Cleanup Python GPU tests. (#9934)
* Cleanup Python GPU tests.

- Remove the use of `gpu_hist` and `gpu_id` in cudf/cupy tests.
- Move base margin test into the testing directory.
2024-01-04 13:15:18 +08:00
Jiaming Yuan
0edd600f3d
[doc] Brief introduction to base_score. (#9882) 2023-12-17 13:34:34 +08:00
Jiaming Yuan
125bc812f8
[doc] Reference enable_categorical doc in sklearn. (#9884) 2023-12-14 23:29:19 +08:00
Jiaming Yuan
1aa8c8d9be
Support more scipy types. (#9881) 2023-12-14 18:28:37 +08:00
david-cortes
42173d7bc3
[doc] Clarify the effect of enable_categorical (#9877) 2023-12-13 08:39:41 +08:00
Jiaming Yuan
faf0f2df10
Support dataframe data format in native XGBoost. (#9828)
- Implement a columnar adapter.
- Refactor Python pandas handling code to avoid converting into a single numpy array.
- Add support in R for transforming columns.
- Support R data.frame and factor type.
2023-12-12 09:56:31 +08:00
Jiaming Yuan
1094d6015d
[py] Use the first found native library. (#9860) 2023-12-08 17:23:16 +08:00
Jiaming Yuan
39c637ee19
Use array interface in Python prediction return. (#9855) 2023-12-08 03:42:14 +08:00
Jiaming Yuan
e9f149481e
[sklearn] Fix loading model attributes. (#9808) 2023-11-27 17:19:01 +08:00
Jiaming Yuan
8fe1a2213c
Cleanup code for distributed training. (#9805)
* Cleanup code for distributed training.

- Merge `GetNcclResult` into nccl stub.
- Split up utilities from the main dask module.
- Let Channel return `Result` to accommodate nccl channel.
- Remove old `use_label_encoder` parameter.
2023-11-25 09:10:56 +08:00
Jiaming Yuan
e9260de3f3
[breaking] Remove dense libsvm parser plugin. (#9799) 2023-11-23 00:12:39 +08:00
Jiaming Yuan
0715ab3c10
Use dlopen to load NCCL. (#9796)
This PR adds optional support for loading nccl with `dlopen` as an alternative of compile time linking. This is to address the size bloat issue with the PyPI binary release.
- Add CMake option to load `nccl` at runtime.
- Add an NCCL stub.

After this, `nccl` will be fetched from PyPI when using pip to install XGBoost, either by a user or by `pyproject.toml`. Others who want to link the nccl at compile time can continue to do so without any change.

At the moment, this is Linux only since we only support MNMG on Linux.
2023-11-22 19:27:31 +08:00
Bobby Wang
178cfe70a8
[pyspark][doc] Test and doc for stage-level scheduling. (#9786) 2023-11-16 18:15:59 +08:00
Jiaming Yuan
c3a0622b49
Fix using categorical data with the score function of ranker. (#9753) 2023-11-07 07:29:11 +08:00
david-cortes
be20df8c23
[Python] Accept numpy generators as random_state (#9743)
* accept numpy generators for random_state

* make linter happy

* fix tests
2023-11-01 16:20:44 -07:00
omahs
2cfc90e8db
Fix typos (#9731) 2023-10-30 16:52:12 +08:00
Bobby Wang
1323531323
[pyspark] unify the way for determining whether runs on the GPU. (#9724) 2023-10-27 11:21:30 +08:00
Philip Hyunsu Cho
01d59ded00
Fix libpath logic for Windows (#9712)
* Fix libpath logic for Windows (#9687)

* Use sys.base_prefix instead of sys.prefix (#9711)

* Use sys.base_prefix instead of sys.prefix

* Update libpath.py too
2023-10-24 17:25:28 -07:00
Jiaming Yuan
3ca06ac51e
[doc] Mention data consistency for categorical features. (#9678) 2023-10-24 10:11:33 +08:00
Rong Ou
6fbe6248f4
More in-memory input support for column split (#9685) 2023-10-20 16:02:36 +08:00
Rong Ou
da6803b75b
Support column-wise data split with in-memory inputs (#9628)
---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2023-10-17 12:16:39 +08:00
Bobby Wang
4d1607eefd
[pyspark] Support stage-level scheduling for training (#9519) 2023-10-17 10:35:39 +08:00
Jiaming Yuan
4e5a7729c3
Fix lint errors. (#9634) 2023-10-09 19:04:31 +08:00
Jiaming Yuan
60526100e3
Support arrow through pandas ext types. (#9612)
- Use pandas extension type for pyarrow support.
- Additional support for QDM.
- Additional support for inplace_predict.
2023-09-28 17:00:16 +08:00
Jiaming Yuan
c75a3bc0a9
[breaking] [jvm-packages] Remove rabit check point. (#9599)
- Add `numBoostedRound` to jvm packages
- Remove rabit checkpoint version.
- Change the starting version of training continuation in JVM [breaking].
- Redefine the checkpoint version policy in jvm package. [breaking]
- Rename the Python check point callback parameter. [breaking]
- Unifies the checkpoint policy between Python and JVM.
2023-09-26 18:06:34 +08:00
Jiaming Yuan
a90d204942
Use array interface for testing numpy arrays. (#9602) 2023-09-23 03:13:48 +08:00
Jiaming Yuan
bbf5b9ee57
[dask] Move dask module into directory. (#9597) 2023-09-23 01:28:18 +08:00
Jiaming Yuan
9027686cac
Support pandas 2.1.0. (#9557) 2023-09-11 17:44:51 +08:00
Bobby Wang
6c791b5b47
[pyspark] support gpu transform (#9542)
---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2023-09-07 12:15:50 +08:00
Bobby Wang
419e052314
[pyspark] rework transform to reuse same code (#9292) 2023-09-04 15:57:16 +08:00
Jiaming Yuan
ccfc90e4c6
[rabit] Improved connection handling. (#9531)
- Enable timeout.
- Report connection error from the system.
- Handle retry for both tracker connection and peer connection.
2023-08-30 13:00:04 +08:00
Jiaming Yuan
1b87a1d8f8
[rabit] Small cleanup to tracker initialization. (#9524)
- Remove recover related code.
- Clean startup, no need to consider previously connected nodes.
2023-08-27 05:10:59 +08:00
Jiaming Yuan
209335b18c
Remove the deprecated Python rabit module. (#9523) 2023-08-27 03:37:05 +08:00
Jiaming Yuan
aa86bd5207
[dask] Filter models on worker. (#9518) 2023-08-25 20:23:47 +08:00
Jiaming Yuan
972730cde0
Use matrix for gradient. (#9508)
- Use the `linalg::Matrix` for storing gradients.
- New API for the custom objective.
- Custom objective for multi-class/multi-target is now required to return the correct shape.
- Custom objective for Python can accept arrays with any strides. (row-major, column-major)
2023-08-24 05:29:52 +08:00
Jiaming Yuan
044fea1281
Drop support for loading remote files. (#9504) 2023-08-21 23:34:05 +08:00
Jiaming Yuan
7f29a238e6
Return base score as intercept. (#9486) 2023-08-19 12:28:02 +08:00
Jiaming Yuan
58530b1bc4
Bump version to 2.1. (#9498) 2023-08-18 01:04:04 +08:00
Bobby Wang
68be454cfa
[pyspark] hotfix for GPU setup validation (#9495)
* [pyspark] fix a bug of validating gpu configuration

---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2023-08-17 16:01:39 +08:00
Jiaming Yuan
5188e27513
Fix version parsing with rc release. (#9493) 2023-08-16 22:44:58 +08:00
Jiaming Yuan
bdc1a3c178
Fix pyspark parameter. (#9460)
- Don't pass the `use_gpu` parameter to the learner.
- Fix GPU approx with PySpark.
2023-08-11 19:07:50 +08:00
Jiaming Yuan
1caa93221a
Use realloc for histogram cache and expose the cache limit. (#9455) 2023-08-10 14:05:27 +08:00
Jiaming Yuan
f05a23b41c
Use weakref instead of id for DataIter cache. (#9445)
- Fix case where Python reuses id from freed objects.
- Small optimization to column matrix with QDM by using `realloc` instead of copying data.
2023-08-10 00:40:06 +08:00
Bobby Wang
d495a180d8
[pyspark] add logs for training (#9449) 2023-08-09 18:32:23 +08:00
Jiaming Yuan
54029a59af
Bound the size of the histogram cache. (#9440)
- A new histogram collection with a limit in size.
- Unify histogram building logic between hist, multi-hist, and approx.
2023-08-08 03:21:26 +08:00
Hendrik Makait
f958e32683
Raise if expected workers are not alive in xgboost.dask.train (#9421) 2023-08-03 20:14:07 +08:00
Jiaming Yuan
7129988847
Accept only keyword arguments in data iterator. (#9431) 2023-08-03 12:44:16 +08:00
Jiaming Yuan
912e341d57
Initial GPU support for the approx tree method. (#9414) 2023-07-31 15:50:28 +08:00
Jiaming Yuan
851cba931e
Define best_iteration only if early stopping is used. (#9403)
* Define `best_iteration` only if early stopping is used.

This is the behavior specified by the document but not honored in the actual code.

- Don't set the attributes if there's no early stopping.
- Clean up the code for callbacks, and replace assertions with proper exceptions.
- Assign the attributes when early stopping `save_best` is used.
- Turn the attributes into Python properties.

---------

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2023-07-24 12:43:35 +08:00
Jiaming Yuan
01e00efc53
[breaking] Remove support for single string feature info. (#9401)
- Input must be a sequence of strings.
- Improve validation error message.
2023-07-24 11:06:30 +08:00