581 Commits

Author SHA1 Message Date
Jiaming Yuan
d2d01d977a
Remove unnecessary fetch operations in external memory. (#10342) 2024-05-31 13:16:40 +08:00
Jiaming Yuan
a5a58102e5
Revamp the rabit implementation. (#10112)
This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features:
- Federated learning for both CPU and GPU.
- NCCL support.
- More data types.
- A unified interface for all the underlying implementations.
- Improved timeout handling for both tracker and workers.
- Exhaustive tests with metrics (fixing a couple of bugs along the way).
- A reusable tracker for Python and JVM packages.
2024-05-20 11:56:23 +08:00
Jiaming Yuan
ca1d04bcb7
Release data in cache. (#10286) 2024-05-14 14:20:19 +08:00
Jiaming Yuan
f1f69ff10e
[CI] Fixes for using the latest modin. (#10285) 2024-05-14 12:13:35 +08:00
Jiaming Yuan
d81e319e78
Fixes for the latest pandas. (#10266)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2024-05-12 11:15:46 +08:00
Jiaming Yuan
73afef1a6e
Fixes for numpy 2.0. (#10252) 2024-05-07 03:54:32 +08:00
Jiaming Yuan
837d44a345
Support more sklearn tags for testing. (#10230) 2024-04-29 06:33:23 +08:00
Jiaming Yuan
1450aebb74
Fix pairwise objective with NDCG metric along with custom gain. (#10100)
* Fix pairwise objective with NDCG metric.

- Allow setting `ndcg_exp_gain` for `rank:pairwise`.

This is useful when using a pairwise objective with the NDCG metric.
2024-03-11 14:54:10 +08:00
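#10100 allows setting `ndcg_exp_gain` for `rank:pairwise`. This parameter chooses the gain function used by NDCG. The plain-Python sketch below computes NDCG two ways: with the exponential gain 2^rel - 1, and with the raw relevance label as the gain. Both use the standard log2(rank + 2) position discount. It is a reference computation for illustration only, not XGBoost's implementation. `dcg` and `ndcg` are hypothetical helper names.

```python
import math


# Hypothetical reference helpers for illustration; not XGBoost's NDCG code.
def dcg(relevance, exp_gain=True):
    """Discounted cumulative gain for relevance labels listed in ranked order."""
    total = 0.0
    for rank, rel in enumerate(relevance):
        # Exponential gain rewards highly relevant items much more than linear gain.
        gain = (2 ** rel - 1) if exp_gain else rel
        # Position 0 gets discount log2(2) = 1, position 1 gets log2(3), and so on.
        total += gain / math.log2(rank + 2)
    return total


def ndcg(relevance, exp_gain=True):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevance, reverse=True), exp_gain)
    return dcg(relevance, exp_gain) / ideal if ideal > 0 else 0.0


# The same ranking scores differently under the two gain functions.
ranking = [1, 2]
print(round(ndcg(ranking, exp_gain=True), 4), round(ndcg(ranking, exp_gain=False), 4))
```

The choice of gain matters whenever a query has several distinct relevance grades. This is why a metric computed with one gain function can disagree with an objective tuned under the other.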
Jiaming Yuan
e14c3b9325
Optional normalization for learning to rank. (#10094) 2024-03-08 12:41:21 +08:00
Jiaming Yuan
3941b31ade
Disable column sample by node for the exact tree method. (#10083) 2024-03-01 14:16:10 +08:00
Jiaming Yuan
8ea705e4d5
Support sample weight in sklearn custom objective. (#10050) 2024-02-21 00:43:14 +08:00
Jiaming Yuan
69a17d5114
Fix with None input. (#10052) 2024-02-20 22:34:22 +08:00
Louis Desreumaux
edf501d227
Implement contribution prediction with QuantileDMatrix (#10043)
---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2024-02-19 21:03:29 +08:00
Jiaming Yuan
54b71c8fba
Fix with black 24.1.1. (#10014) 2024-01-30 17:24:11 +08:00
Jiaming Yuan
65d7bf2dfe
Handle np integer in model slice and prediction. (#10007) 2024-01-26 04:58:48 +08:00
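#10007 lets model slicing and prediction accept numpy integers. A common Python idiom for accepting any integer-like value is `operator.index`. It succeeds for any object that implements `__index__`, which numpy integer scalars do, and raises `TypeError` for floats. The helper below is a hypothetical sketch of that idea, not XGBoost's code. `normalize_layer_key` and `FakeInt64` are names invented for illustration. `FakeInt64` stands in for `np.int64` so the sketch runs without numpy.

```python
import operator


def normalize_layer_key(key, num_rounds):
    """Hypothetical helper: map an int-like or slice key to (start, stop, step)."""
    if isinstance(key, slice):
        # slice.indices() already converts int-like bounds through __index__.
        return key.indices(num_rounds)
    # operator.index() accepts int and any type with __index__ (numpy integers
    # included) and raises TypeError for floats, unlike isinstance(key, int).
    i = operator.index(key)
    if i < 0:
        i += num_rounds
    if not 0 <= i < num_rounds:
        raise IndexError(f"layer {key!r} out of range for {num_rounds} rounds")
    return i, i + 1, 1


class FakeInt64:
    """Hypothetical stand-in for np.int64: not an int subclass, but implements __index__."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value


print(normalize_layer_key(FakeInt64(2), 10))
print(normalize_layer_key(slice(1, FakeInt64(4)), 10))
```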
Jiaming Yuan
d12cc1090a
Refactor tests for training continuation. (#9997) 2024-01-24 16:07:19 +08:00
Jiaming Yuan
0798e36d73
[breaking] Remove deprecated parameters in the skl interface. (#9986) 2024-01-15 20:40:05 +08:00
Jiaming Yuan
2f57bbde3c
Additional tests for attributes and model boosted rounds. (#9962) 2024-01-09 09:54:39 +08:00
Jiaming Yuan
b3eb5d0945
Use UBJ in Python checkpoint. (#9958) 2024-01-09 03:22:15 +08:00
Jiaming Yuan
9a30bdd313
Test loading models with invalid file extensions. (#9955) 2024-01-08 19:26:24 +08:00
Jiaming Yuan
38dd91f491
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00
Jiaming Yuan
c03a4d5088
Check support status for categorical features. (#9946) 2024-01-04 16:51:33 +08:00
Jiaming Yuan
621348abb3
Fix multi-output with alternating strategies. (#9933)
---------

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2024-01-04 16:41:13 +08:00
Jiaming Yuan
5f7b5a6921
Add tests for pickling with custom obj and metric. (#9943) 2024-01-04 14:52:48 +08:00
Jiaming Yuan
9f73127a23
Cleanup Python GPU tests. (#9934)
* Cleanup Python GPU tests.

- Remove the use of `gpu_hist` and `gpu_id` in cudf/cupy tests.
- Move base margin test into the testing directory.
2024-01-04 13:15:18 +08:00
Jiaming Yuan
a7226c0222
Fix feature names with special characters. (#9923) 2023-12-28 22:45:13 +08:00
Jiaming Yuan
1aa8c8d9be
Support more scipy types. (#9881) 2023-12-14 18:28:37 +08:00
Jiaming Yuan
faf0f2df10
Support dataframe data format in native XGBoost. (#9828)
- Implement a columnar adapter.
- Refactor Python pandas handling code to avoid converting into a single numpy array.
- Add support in R for transforming columns.
- Support R data.frame and factor type.
2023-12-12 09:56:31 +08:00
Jiaming Yuan
e9f149481e
[sklearn] Fix loading model attributes. (#9808) 2023-11-27 17:19:01 +08:00
Jiaming Yuan
c3a0622b49
Fix using categorical data with the score function of ranker. (#9753) 2023-11-07 07:29:11 +08:00
david-cortes
be20df8c23
[Python] Accept numpy generators as random_state (#9743)
* accept numpy generators for random_state

* make linter happy

* fix tests
2023-11-01 16:20:44 -07:00
Jiaming Yuan
3ca06ac51e
[doc] Mention data consistency for categorical features. (#9678) 2023-10-24 10:11:33 +08:00
Rong Ou
6fbe6248f4
More in-memory input support for column split (#9685) 2023-10-20 16:02:36 +08:00
Rong Ou
da6803b75b
Support column-wise data split with in-memory inputs (#9628)
---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2023-10-17 12:16:39 +08:00
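#9628 supports column-wise data split with in-memory inputs, where each worker holds its own subset of columns. The sketch below shows the bookkeeping under hypothetical names, and the partitioning scheme is an assumption. Columns are divided into contiguous blocks, and each worker's first global feature index is the exclusive prefix sum of the per-worker column counts. In a real distributed job those counts would be exchanged with a collective operation such as allgather. Here everything runs in one process, and nothing below is XGBoost API.

```python
from itertools import accumulate


def split_columns(rows, world_size):
    """Hypothetical: split each row's columns into contiguous blocks, one per worker."""
    n_cols = len(rows[0])
    base, extra = divmod(n_cols, world_size)
    shards, start = [], 0
    for rank in range(world_size):
        # The first `extra` workers take one additional column each.
        width = base + (1 if rank < extra else 0)
        shards.append([row[start:start + width] for row in rows])
        start += width
    return shards


def global_offsets(col_counts):
    """Hypothetical: first global feature index of each worker (exclusive prefix sum)."""
    return [0] + list(accumulate(col_counts))[:-1]


rows = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
shards = split_columns(rows, 2)
print(shards, global_offsets([len(s[0]) for s in shards]))
```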
Jiaming Yuan
60526100e3
Support arrow through pandas ext types. (#9612)
- Use pandas extension type for pyarrow support.
- Additional support for QDM.
- Additional support for inplace_predict.
2023-09-28 17:00:16 +08:00
Jiaming Yuan
c75a3bc0a9
[breaking] [jvm-packages] Remove rabit checkpoint. (#9599)
- Add `numBoostedRound` to JVM packages.
- Remove rabit checkpoint version.
- Change the starting version of training continuation in JVM [breaking].
- Redefine the checkpoint version policy in the JVM package [breaking].
- Rename the Python checkpoint callback parameter [breaking].
- Unify the checkpoint policy between Python and JVM.
2023-09-26 18:06:34 +08:00
Jiaming Yuan
9027686cac
Support pandas 2.1.0. (#9557) 2023-09-11 17:44:51 +08:00
Jiaming Yuan
ccfc90e4c6
[rabit] Improved connection handling. (#9531)
- Enable timeout.
- Report connection error from the system.
- Handle retry for both tracker connection and peer connection.
2023-08-30 13:00:04 +08:00
Jiaming Yuan
209335b18c
Remove the deprecated Python rabit module. (#9523) 2023-08-27 03:37:05 +08:00
Jiaming Yuan
7f29a238e6
Return base score as intercept. (#9486) 2023-08-19 12:28:02 +08:00
Jiaming Yuan
19b59938b7
Convert input to str for hypothesis note. (#9480) 2023-08-15 02:27:58 +08:00
Jiaming Yuan
05d7000096
Handle special characters in JSON model dump. (#9474) 2023-08-14 15:49:00 +08:00
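#9474, and #9923 for feature names, deal with special characters in model output. The sketch below shows why hand-built JSON breaks when a feature name contains a quote, and how a real JSON encoder avoids the problem. The field names `split` and `split_condition` imitate the style of a JSON tree dump but are illustrative only. Both functions are hypothetical.

```python
import json


# Hypothetical dump helpers for illustration; not XGBoost's model dump code.
def dump_split_naive(feature, threshold):
    # String formatting does no escaping, so a quote in `feature` ends the string early.
    return '{"split": "%s", "split_condition": %s}' % (feature, threshold)


def dump_split_escaped(feature, threshold):
    # json.dumps escapes quotes, backslashes, and control characters.
    return json.dumps({"split": feature, "split_condition": threshold})


name = 'price "USD"'
print(dump_split_escaped(name, 0.5))
```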
Jiaming Yuan
801116c307
Test scikit-learn model IO with gblinear. (#9459) 2023-08-13 23:41:49 +08:00
Jiaming Yuan
f05a23b41c
Use weakref instead of id for DataIter cache. (#9445)
- Fix case where Python reuses id from freed objects.
- Small optimization to column matrix with QDM by using `realloc` instead of copying data.
2023-08-10 00:40:06 +08:00
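#9445 addresses a Python pitfall: `id()` values are unique only among live objects, so a freed object's id can be handed to a new object. A cache keyed only by `id()` can therefore return data that belongs to a different iterator. The sketch below is a hypothetical cache that stores a weak reference next to each entry and treats the entry as stale if the reference no longer points at the object being looked up. `IdCache` and `MyIter` are invented names, not XGBoost's `DataIter` machinery.

```python
import weakref


class IdCache:
    """Hypothetical cache keyed by id(obj) that also keeps a weak reference to obj."""

    def __init__(self):
        self._entries = {}  # id(obj) -> (weakref.ref(obj), value)

    def put(self, obj, value):
        # A weak reference does not keep obj alive, unlike storing obj itself.
        self._entries[id(obj)] = (weakref.ref(obj), value)

    def get(self, obj):
        entry = self._entries.get(id(obj))
        if entry is None:
            return None
        ref, value = entry
        if ref() is not obj:
            # The original object was freed and its id reused: drop the stale entry.
            del self._entries[id(obj)]
            return None
        return value


class MyIter:
    """Hypothetical placeholder for a user-defined data iterator (supports weakrefs)."""


cache = IdCache()
it = MyIter()
cache.put(it, "cached quantiles")
print(cache.get(it))
```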
Philip Hyunsu Cho
7ce090e775
Handle UTF-8 paths correctly on Windows platform (#9443)
* Fix round-trip serialization with UTF-8 paths

* Add compiler version check

* Add comment to C API functions

* Add Python tests

* [CI] Update macOS deployment target

* Use std::filesystem instead of dmlc::TemporaryDirectory
2023-08-07 23:27:25 -07:00
Jiaming Yuan
54029a59af
Bound the size of the histogram cache. (#9440)
- A new histogram collection with a limit in size.
- Unify histogram building logic between hist, multi-hist, and approx.
2023-08-08 03:21:26 +08:00
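#9440 puts an upper bound on the memory held by cached histograms. The commit message does not state the eviction policy, so the sketch below assumes a least-recently-used policy purely for illustration. `BoundedHistCache` and its bin budget are hypothetical and do not mirror XGBoost's C++ histogram collection.

```python
from collections import OrderedDict


class BoundedHistCache:
    """Hypothetical node_id -> histogram cache capped by a total number of bins (LRU assumed)."""

    def __init__(self, max_bins):
        self.max_bins = max_bins
        self.total_bins = 0
        self._hists = OrderedDict()  # order of insertion/access doubles as LRU order

    def put(self, node_id, hist):
        if len(hist) > self.max_bins:
            raise ValueError("a single histogram exceeds the cache budget")
        if node_id in self._hists:
            self.total_bins -= len(self._hists.pop(node_id))
        # Evict least recently used histograms until the new one fits.
        while self.total_bins + len(hist) > self.max_bins:
            _, evicted = self._hists.popitem(last=False)
            self.total_bins -= len(evicted)
        self._hists[node_id] = hist
        self.total_bins += len(hist)

    def get(self, node_id):
        hist = self._hists.get(node_id)
        if hist is not None:
            self._hists.move_to_end(node_id)  # mark as most recently used
        return hist
```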
Jiaming Yuan
912e341d57
Initial GPU support for the approx tree method. (#9414) 2023-07-31 15:50:28 +08:00
Jiaming Yuan
851cba931e
Define best_iteration only if early stopping is used. (#9403)
* Define `best_iteration` only if early stopping is used.

This is the behavior specified by the documentation but not honored by the code.

- Don't set the attributes if there's no early stopping.
- Clean up the code for callbacks, and replace assertions with proper exceptions.
- Assign the attributes when early stopping `save_best` is used.
- Turn the attributes into Python properties.

---------

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2023-07-24 12:43:35 +08:00
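#9403 defines `best_iteration` only when early stopping is used and turns it into a Python property. The toy below sketches that contract with hypothetical names (`ToyBooster`, `toy_train`). It is not xgboost's `Booster` or `train`, and its early-stopping rule (stop after `early_stopping_rounds` rounds without improvement in a lower-is-better score) is a simplified assumption. Because the property raises `AttributeError`, `hasattr` reports the attribute as absent when early stopping is off.

```python
class ToyBooster:
    """Hypothetical stand-in for a booster; attributes are stored as strings."""

    def __init__(self):
        self._attrs = {}

    def set_attr(self, **kwargs):
        self._attrs.update({k: str(v) for k, v in kwargs.items()})

    @property
    def best_iteration(self):
        if "best_iteration" not in self._attrs:
            # AttributeError (not AssertionError) so hasattr() behaves naturally.
            raise AttributeError("`best_iteration` is only defined when early stopping is used.")
        return int(self._attrs["best_iteration"])


def toy_train(scores, early_stopping_rounds=None):
    """Hypothetical training loop over precomputed validation scores (lower is better)."""
    booster = ToyBooster()
    if early_stopping_rounds is None:
        return booster  # no early stopping: best_iteration stays undefined
    best, best_score, since_best = 0, float("inf"), 0
    for i, score in enumerate(scores):
        if score < best_score:
            best, best_score, since_best = i, score, 0
        else:
            since_best += 1
            if since_best >= early_stopping_rounds:
                break
    booster.set_attr(best_iteration=best, best_score=best_score)
    return booster
```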
Jiaming Yuan
01e00efc53
[breaking] Remove support for single string feature info. (#9401)
- Input must be a sequence of strings.
- Improve validation error message.
2023-07-24 11:06:30 +08:00
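#9401 requires feature info such as names and types to be a sequence of strings. A bare string is itself a sequence of characters, so accepting one would silently turn `"age"` into `["a", "g", "e"]`. The hypothetical validator below sketches the rule and its error messages. `validate_feature_info` is not an XGBoost function.

```python
from typing import List, Optional


def validate_feature_info(info, name: str) -> Optional[List[str]]:
    """Hypothetical check: `info` must be None or a non-string sequence of strings."""
    if info is None:
        return None
    if isinstance(info, str):
        # list("age") would be ["a", "g", "e"], so reject the ambiguous case early.
        raise TypeError(f"{name} must be a sequence of strings, got a single string: {info!r}")
    values = list(info)
    bad = [v for v in values if not isinstance(v, str)]
    if bad:
        raise TypeError(f"every entry of {name} must be a string, got {bad!r}")
    return values
```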
Jiaming Yuan
16eb41936d
Handle the new device parameter in dask and demos. (#9386)
* Handle the new `device` parameter in dask and demos.

- Check that no device ordinal is specified in the dask interface.
- Update demos.
- Update dask doc.
- Update the condition for QDM.
2023-07-15 19:11:20 +08:00