Jiaming Yuan
d2d01d977a
Remove unnecessary fetch operations in external memory. ( #10342 )
2024-05-31 13:16:40 +08:00
Jiaming Yuan
a5a58102e5
Revamp the rabit implementation. ( #10112 )
This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features:
- Federated learning for both CPU and GPU.
- NCCL.
- More data types.
- A unified interface for all the underlying implementations.
- Improved timeout handling for both tracker and workers.
- Exhaustive tests with metrics (fixed a couple of bugs along the way).
- A reusable tracker for Python and JVM packages.
2024-05-20 11:56:23 +08:00
Jiaming Yuan
ca1d04bcb7
Release data in cache. ( #10286 )
2024-05-14 14:20:19 +08:00
Jiaming Yuan
f1f69ff10e
[CI] Fixes for using the latest modin. ( #10285 )
2024-05-14 12:13:35 +08:00
Jiaming Yuan
d81e319e78
Fixes for the latest pandas. ( #10266 )
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2024-05-12 11:15:46 +08:00
Jiaming Yuan
73afef1a6e
Fixes for numpy 2.0. ( #10252 )
2024-05-07 03:54:32 +08:00
Jiaming Yuan
837d44a345
Support more sklearn tags for testing. ( #10230 )
2024-04-29 06:33:23 +08:00
Jiaming Yuan
1450aebb74
Fix pairwise objective with NDCG metric along with custom gain. ( #10100 )
* Fix pairwise objective with NDCG metric.
- Allow setting `ndcg_exp_gain` for `rank:pairwise`.
This is useful when using pairwise for objective but ndcg for metric.
2024-03-11 14:54:10 +08:00
Jiaming Yuan
e14c3b9325
Optional normalization for learning to rank. ( #10094 )
2024-03-08 12:41:21 +08:00
Jiaming Yuan
3941b31ade
Disable column sample by node for the exact tree method. ( #10083 )
2024-03-01 14:16:10 +08:00
Jiaming Yuan
8ea705e4d5
Support sample weight in sklearn custom objective. ( #10050 )
2024-02-21 00:43:14 +08:00
Jiaming Yuan
69a17d5114
Fix with None input. ( #10052 )
2024-02-20 22:34:22 +08:00
Louis Desreumaux
edf501d227
Implement contribution prediction with QuantileDMatrix ( #10043 )
---------
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2024-02-19 21:03:29 +08:00
Jiaming Yuan
54b71c8fba
Fix with black 24.1.1. ( #10014 )
2024-01-30 17:24:11 +08:00
Jiaming Yuan
65d7bf2dfe
Handle np integer in model slice and prediction. ( #10007 )
2024-01-26 04:58:48 +08:00
Jiaming Yuan
d12cc1090a
Refactor tests for training continuation. ( #9997 )
2024-01-24 16:07:19 +08:00
Jiaming Yuan
0798e36d73
[breaking] Remove deprecated parameters in the skl interface. ( #9986 )
2024-01-15 20:40:05 +08:00
Jiaming Yuan
2f57bbde3c
Additional tests for attributes and model boosted rounds. ( #9962 )
2024-01-09 09:54:39 +08:00
Jiaming Yuan
b3eb5d0945
Use UBJ in Python checkpoint. ( #9958 )
2024-01-09 03:22:15 +08:00
Jiaming Yuan
9a30bdd313
Test loading models with invalid file extensions. ( #9955 )
2024-01-08 19:26:24 +08:00
Jiaming Yuan
38dd91f491
Save model in ubj as the default. ( #9947 )
2024-01-05 17:53:36 +08:00
Jiaming Yuan
c03a4d5088
Check support status for categorical features. ( #9946 )
2024-01-04 16:51:33 +08:00
Jiaming Yuan
621348abb3
Fix multi-output with alternating strategies. ( #9933 )
---------
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2024-01-04 16:41:13 +08:00
Jiaming Yuan
5f7b5a6921
Add tests for pickling with custom obj and metric. ( #9943 )
2024-01-04 14:52:48 +08:00
Jiaming Yuan
9f73127a23
Cleanup Python GPU tests. ( #9934 )
* Cleanup Python GPU tests.
- Remove the use of `gpu_hist` and `gpu_id` in cudf/cupy tests.
- Move base margin test into the testing directory.
2024-01-04 13:15:18 +08:00
Jiaming Yuan
a7226c0222
Fix feature names with special characters. ( #9923 )
2023-12-28 22:45:13 +08:00
Jiaming Yuan
1aa8c8d9be
Support more scipy types. ( #9881 )
2023-12-14 18:28:37 +08:00
Jiaming Yuan
faf0f2df10
Support dataframe data format in native XGBoost. ( #9828 )
- Implement a columnar adapter.
- Refactor Python pandas handling code to avoid converting into a single numpy array.
- Add support in R for transforming columns.
- Support R data.frame and factor type.
2023-12-12 09:56:31 +08:00
Jiaming Yuan
e9f149481e
[sklearn] Fix loading model attributes. ( #9808 )
2023-11-27 17:19:01 +08:00
Jiaming Yuan
c3a0622b49
Fix using categorical data with the score function of ranker. ( #9753 )
2023-11-07 07:29:11 +08:00
david-cortes
be20df8c23
[Python] Accept numpy generators as random_state ( #9743 )
* accept numpy generators for random_state
* make linter happy
* fix tests
2023-11-01 16:20:44 -07:00
Jiaming Yuan
3ca06ac51e
[doc] Mention data consistency for categorical features. ( #9678 )
2023-10-24 10:11:33 +08:00
Rong Ou
6fbe6248f4
More in-memory input support for column split ( #9685 )
2023-10-20 16:02:36 +08:00
Rong Ou
da6803b75b
Support column-wise data split with in-memory inputs ( #9628 )
---------
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2023-10-17 12:16:39 +08:00
Jiaming Yuan
60526100e3
Support arrow through pandas ext types. ( #9612 )
- Use pandas extension type for pyarrow support.
- Additional support for QDM.
- Additional support for inplace_predict.
2023-09-28 17:00:16 +08:00
Jiaming Yuan
c75a3bc0a9
[breaking] [jvm-packages] Remove rabit check point. ( #9599 )
- Add `numBoostedRound` to jvm packages
- Remove rabit checkpoint version.
- Change the starting version of training continuation in JVM [breaking].
- Redefine the checkpoint version policy in jvm package. [breaking]
- Rename the Python check point callback parameter. [breaking]
- Unify the checkpoint policy between Python and JVM.
2023-09-26 18:06:34 +08:00
Jiaming Yuan
9027686cac
Support pandas 2.1.0. ( #9557 )
2023-09-11 17:44:51 +08:00
Jiaming Yuan
ccfc90e4c6
[rabit] Improved connection handling. ( #9531 )
- Enable timeout.
- Report connection error from the system.
- Handle retry for both tracker connection and peer connection.
2023-08-30 13:00:04 +08:00
Jiaming Yuan
209335b18c
Remove the deprecated Python rabit module. ( #9523 )
2023-08-27 03:37:05 +08:00
Jiaming Yuan
7f29a238e6
Return base score as intercept. ( #9486 )
2023-08-19 12:28:02 +08:00
Jiaming Yuan
19b59938b7
Convert input to str for hypothesis note. ( #9480 )
2023-08-15 02:27:58 +08:00
Jiaming Yuan
05d7000096
Handle special characters in JSON model dump. ( #9474 )
2023-08-14 15:49:00 +08:00
Jiaming Yuan
801116c307
Test scikit-learn model IO with gblinear. ( #9459 )
2023-08-13 23:41:49 +08:00
Jiaming Yuan
f05a23b41c
Use weakref instead of id for DataIter cache. ( #9445 )
- Fix case where Python reuses id from freed objects.
- Small optimization to column matrix with QDM by using `realloc` instead of copying data.
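
The `id`-reuse problem named in the first bullet can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from the PR: `Batch` and `CacheKey` are hypothetical names, and the real cache may be organized differently. The idea is that a weak reference either returns the original object or `None`, so a new object that happens to land on a freed object's `id()` is never mistaken for the cached one.

```python
import gc
import weakref


class Batch:
    """Hypothetical stand-in for an object whose identity keys a cache."""


class CacheKey:
    """Remember which object a cache entry was built from (hypothetical helper)."""

    def __init__(self, obj: object) -> None:
        # A weak reference does not keep the object alive, and calling it
        # returns None once the object has been freed.
        self._ref = weakref.ref(obj)

    def matches(self, obj: object) -> bool:
        # An identity check against the live referent cannot be fooled by a
        # new object that happens to receive the freed object's id().
        return self._ref() is obj


first = Batch()
key = CacheKey(first)
hit_while_alive = key.matches(first)

del first      # the only strong reference is gone
gc.collect()   # make collection explicit for non-refcounting interpreters

second = Batch()  # may or may not reuse the old id(); the check no longer cares
hit_after_free = key.matches(second)
```

Comparing stored `id()` values would report a hit whenever the interpreter recycles the address, whereas the weak reference ties the check to the lifetime of the original object.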
2023-08-10 00:40:06 +08:00
Philip Hyunsu Cho
7ce090e775
Handle UTF-8 paths correctly on Windows platform ( #9443 )
* Fix round-trip serialization with UTF-8 paths
* Add compiler version check
* Add comment to C API functions
* Add Python tests
* [CI] Update MacOS deployment target
* Use std::filesystem instead of dmlc::TemporaryDirectory
2023-08-07 23:27:25 -07:00
Jiaming Yuan
54029a59af
Bound the size of the histogram cache. ( #9440 )
- A new histogram collection with a limit in size.
- Unify histogram building logic between hist, multi-hist, and approx.
2023-08-08 03:21:26 +08:00
Jiaming Yuan
912e341d57
Initial GPU support for the approx tree method. ( #9414 )
2023-07-31 15:50:28 +08:00
Jiaming Yuan
851cba931e
Define best_iteration only if early stopping is used. ( #9403 )
* Define `best_iteration` only if early stopping is used.
This is the behavior specified by the document but not honored in the actual code.
- Don't set the attributes if there's no early stopping.
- Clean up the code for callbacks, and replace assertions with proper exceptions.
- Assign the attributes when early stopping `save_best` is used.
- Turn the attributes into Python properties.
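
The behavior described in these bullets can be sketched with a Python property that raises instead of returning a default. This is a hedged illustration only: `ModelSketch` and `record_early_stopping` are hypothetical names invented for the example and are not part of XGBoost, and the error message is made up.

```python
from typing import Dict, Optional


class ModelSketch:
    """Hypothetical stand-in for a trained model; not the XGBoost Booster."""

    def __init__(self) -> None:
        self._attrs: Dict[str, str] = {}

    def record_early_stopping(self, best_iteration: int) -> None:
        # Only an early-stopping run stores the attribute.
        self._attrs["best_iteration"] = str(best_iteration)

    @property
    def best_iteration(self) -> int:
        value: Optional[str] = self._attrs.get("best_iteration")
        if value is None:
            # A proper exception rather than an assertion or a silent default.
            raise AttributeError(
                "`best_iteration` is only defined when early stopping is used."
            )
        return int(value)


plain = ModelSketch()
has_attr_without_early_stopping = hasattr(plain, "best_iteration")

stopped = ModelSketch()
stopped.record_early_stopping(17)
best = stopped.best_iteration
```

Because the property raises `AttributeError` when nothing was recorded, `hasattr` reports the attribute as absent, which lets callers distinguish a run without early stopping from one that stopped at iteration 0.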
---------
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2023-07-24 12:43:35 +08:00
Jiaming Yuan
01e00efc53
[breaking] Remove support for single string feature info. ( #9401 )
- Input must be a sequence of strings.
- Improve validation error message.
2023-07-24 11:06:30 +08:00
Jiaming Yuan
16eb41936d
Handle the new device parameter in dask and demos. ( #9386 )
* Handle the new `device` parameter in dask and demos.
- Check no ordinal is specified in the dask interface.
- Update demos.
- Update dask doc.
- Update the condition for QDM.
2023-07-15 19:11:20 +08:00