914 Commits

Author SHA1 Message Date
Philip Hyunsu Cho
edb945d59b
[CI] Use native arm64 worker in GHAction to build M1 wheel (#10225)
* [CI] Use native arm64 worker in GHAction to build M1 wheel

* Set up Conda

* Use mamba

* debug

* fix

* fix

* fix

* fix

* fix

* Temporarily disable other tests

* Fix prefix

* Use micromamba

* Use conda-incubator/setup-miniconda

* Use mambaforge

* Fix

* Fix prefix

* Don't use deprecated set-output

* Add verbose output from build

* verbose

* Specify arch

* Bump setup-miniconda to v3

* Use Python 3.9

* Restore deleted files

* Workaround.

---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2024-04-26 10:16:55 -07:00
Bobby Wang
8fb05c8c95
[pyspark] Support stage-level scheduling for YARN/k8s (#10209) 2024-04-20 00:24:40 +08:00
Jiaming Yuan
303c603c7d
[pyspark] Reuse the collective communicator. (#10198) 2024-04-18 19:09:30 +08:00
github-actions[bot]
2925cebdca
[CI] Use latest RAPIDS; Pandas 2.0 compatibility fix (#10175)
* [CI] Update RAPIDS to latest stable

* [CI] Use rapidsai stable channel; fix syntax errors in Dockerfile.gpu

* Don't combine astype() with loc()

* Work around https://github.com/dmlc/xgboost/issues/10181

* Fix formatting

* Fix test

---------

Co-authored-by: hcho3 <hcho3@users.noreply.github.com>
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2024-04-15 13:38:53 -07:00
Jiaming Yuan
f0a138f33a
Fix pyspark with verbosity=3. (#10172) 2024-04-09 23:18:56 +08:00
Jiaming Yuan
ca4801f81d
Work with IPv6 in the new tracker. (#10125) 2024-03-20 05:19:23 +08:00
Jiaming Yuan
e14c3b9325
Optional normalization for learning to rank. (#10094) 2024-03-08 12:41:21 +08:00
Bobby Wang
d24df52bb9
[pyspark] rework the log (#10077) 2024-02-29 16:47:31 +08:00
Jiaming Yuan
eb281ff9b4
[CI] Fix JVM tests on GH Action (#10064)
---------

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2024-02-22 14:21:32 -08:00
Jiaming Yuan
8ea705e4d5
Support sample weight in sklearn custom objective. (#10050) 2024-02-21 00:43:14 +08:00
Jiaming Yuan
69a17d5114
Fix with None input. (#10052) 2024-02-20 22:34:22 +08:00
david-cortes
3abbbe41ac
[R] Add data iterator, quantile dmatrix, external memory, and missing feature_types (#9913) 2024-01-30 19:26:44 +08:00
Jiaming Yuan
54b71c8fba
Fix with black 24.1.1. (#10014) 2024-01-30 17:24:11 +08:00
Jiaming Yuan
65d7bf2dfe
Handle np integer in model slice and prediction. (#10007) 2024-01-26 04:58:48 +08:00
Jiaming Yuan
d12cc1090a
Refactor tests for training continuation. (#9997) 2024-01-24 16:07:19 +08:00
Jiaming Yuan
0798e36d73
[breaking] Remove deprecated parameters in the skl interface. (#9986) 2024-01-15 20:40:05 +08:00
Jiaming Yuan
01c4711556
Check __cuda_array_interface__ instead of cupy class. (#9971)
* Now XGBoost can directly consume CUDA data from torch.
2024-01-09 19:59:01 +08:00
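The duck-typed check named in #9971 can be sketched roughly as follows. This is a minimal illustration, not XGBoost's actual code: the helper `has_cuda_array_interface` and the stand-in class `FakeDeviceArray` are hypothetical names, and the dictionary values are placeholders that only mimic the general shape of the CUDA Array Interface.

```python
from typing import Any


def has_cuda_array_interface(data: Any) -> bool:
    """Return True if `data` exposes the CUDA Array Interface protocol.

    Hypothetical helper: it checks for the attribute instead of testing
    for one specific class such as a CuPy array.
    """
    return hasattr(data, "__cuda_array_interface__")


class FakeDeviceArray:
    """Stand-in for a GPU array type (hypothetical, holds no device memory)."""

    @property
    def __cuda_array_interface__(self) -> dict:
        # Placeholder values; a real array would report its device pointer.
        return {
            "shape": (2, 3),
            "typestr": "<f4",
            "data": (0, False),
            "version": 3,
        }


print(has_cuda_array_interface(FakeDeviceArray()))  # device-like object
print(has_cuda_array_interface([1.0, 2.0]))         # plain Python list
```

Checking for the protocol attribute rather than a concrete class is what lets any library that implements the interface be accepted without a dedicated import.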
Jiaming Yuan
b3eb5d0945
Use UBJ in Python checkpoint. (#9958) 2024-01-09 03:22:15 +08:00
Jiaming Yuan
fa5e2f6c45
Synthesize the AMES housing dataset for tests. (#9963) 2024-01-09 00:54:23 +08:00
Jiaming Yuan
38dd91f491
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00
Jiaming Yuan
621348abb3
Fix multi-output with alternating strategies. (#9933)
---------

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2024-01-04 16:41:13 +08:00
Jiaming Yuan
5f7b5a6921
Add tests for pickling with custom obj and metric. (#9943) 2024-01-04 14:52:48 +08:00
Jiaming Yuan
9f73127a23
Cleanup Python GPU tests. (#9934)
* Cleanup Python GPU tests.

- Remove the use of `gpu_hist` and `gpu_id` in cudf/cupy tests.
- Move base margin test into the testing directory.
2024-01-04 13:15:18 +08:00
Jiaming Yuan
0edd600f3d
[doc] Brief introduction to base_score. (#9882) 2023-12-17 13:34:34 +08:00
Jiaming Yuan
125bc812f8
[doc] Reference enable_categorical doc in sklearn. (#9884) 2023-12-14 23:29:19 +08:00
Jiaming Yuan
1aa8c8d9be
Support more scipy types. (#9881) 2023-12-14 18:28:37 +08:00
david-cortes
42173d7bc3
[doc] Clarify the effect of enable_categorical (#9877) 2023-12-13 08:39:41 +08:00
Jiaming Yuan
faf0f2df10
Support dataframe data format in native XGBoost. (#9828)
- Implement a columnar adapter.
- Refactor Python pandas handling code to avoid converting into a single numpy array.
- Add support in R for transforming columns.
- Support R data.frame and factor type.
2023-12-12 09:56:31 +08:00
Jiaming Yuan
1094d6015d
[py] Use the first found native library. (#9860) 2023-12-08 17:23:16 +08:00
Jiaming Yuan
39c637ee19
Use array interface in Python prediction return. (#9855) 2023-12-08 03:42:14 +08:00
Jiaming Yuan
e9f149481e
[sklearn] Fix loading model attributes. (#9808) 2023-11-27 17:19:01 +08:00
Jiaming Yuan
8fe1a2213c
Cleanup code for distributed training. (#9805)
* Cleanup code for distributed training.

- Merge `GetNcclResult` into nccl stub.
- Split up utilities from the main dask module.
- Let Channel return `Result` to accommodate nccl channel.
- Remove old `use_label_encoder` parameter.
2023-11-25 09:10:56 +08:00
Jiaming Yuan
e9260de3f3
[breaking] Remove dense libsvm parser plugin. (#9799) 2023-11-23 00:12:39 +08:00
Jiaming Yuan
0715ab3c10
Use dlopen to load NCCL. (#9796)
This PR adds optional support for loading NCCL with `dlopen` as an alternative to compile-time linking. This addresses the size bloat of the PyPI binary release.
- Add CMake option to load `nccl` at runtime.
- Add an NCCL stub.

After this change, `nccl` will be fetched from PyPI when XGBoost is installed with pip, either by a user or through `pyproject.toml`. Those who want to link NCCL at compile time can continue to do so without any change.

At the moment, this is Linux-only, since multi-node multi-GPU (MNMG) training is supported only on Linux.
2023-11-22 19:27:31 +08:00
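The runtime-loading idea described in #9796 can be illustrated with a small Python sketch. It is not the NCCL stub added by the PR (which lives in the native library); it only shows the general pattern of attempting to load a shared library at runtime and degrading gracefully when it is missing. On POSIX systems `ctypes.CDLL` loads libraries through `dlopen`. The helper name `try_load_library` and the file name `libnccl.so.2` are assumptions made for illustration.

```python
import ctypes
from typing import Optional


def try_load_library(name: str) -> Optional[ctypes.CDLL]:
    """Attempt to load a shared library at runtime; return None if unavailable."""
    try:
        return ctypes.CDLL(name)
    except OSError:
        # The library is not installed or cannot be resolved by the loader.
        return None


# "libnccl.so.2" is an assumed file name used only for illustration.
nccl = try_load_library("libnccl.so.2")
if nccl is None:
    print("NCCL not found; this sketch would fall back to a non-NCCL path")
else:
    print("NCCL loaded at runtime")
```

Deferring the load to runtime means the binary does not need to bundle the library, which is the size concern the commit message raises.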
Bobby Wang
178cfe70a8
[pyspark][doc] Test and doc for stage-level scheduling. (#9786) 2023-11-16 18:15:59 +08:00
Jiaming Yuan
c3a0622b49
Fix using categorical data with the score function of ranker. (#9753) 2023-11-07 07:29:11 +08:00
david-cortes
be20df8c23
[Python] Accept numpy generators as random_state (#9743)
* accept numpy generators for random_state

* make linter happy

* fix tests
2023-11-01 16:20:44 -07:00
omahs
2cfc90e8db
Fix typos (#9731) 2023-10-30 16:52:12 +08:00
Bobby Wang
1323531323
[pyspark] Unify the way of determining whether training runs on the GPU. (#9724) 2023-10-27 11:21:30 +08:00
Philip Hyunsu Cho
01d59ded00
Fix libpath logic for Windows (#9712)
* Fix libpath logic for Windows (#9687)

* Use sys.base_prefix instead of sys.prefix (#9711)

* Use sys.base_prefix instead of sys.prefix

* Update libpath.py too
2023-10-24 17:25:28 -07:00
Jiaming Yuan
3ca06ac51e
[doc] Mention data consistency for categorical features. (#9678) 2023-10-24 10:11:33 +08:00
Rong Ou
6fbe6248f4
More in-memory input support for column split (#9685) 2023-10-20 16:02:36 +08:00
Rong Ou
da6803b75b
Support column-wise data split with in-memory inputs (#9628)
---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2023-10-17 12:16:39 +08:00
Bobby Wang
4d1607eefd
[pyspark] Support stage-level scheduling for training (#9519) 2023-10-17 10:35:39 +08:00
Jiaming Yuan
4e5a7729c3
Fix lint errors. (#9634) 2023-10-09 19:04:31 +08:00
Jiaming Yuan
60526100e3
Support arrow through pandas ext types. (#9612)
- Use pandas extension type for pyarrow support.
- Additional support for QDM.
- Additional support for inplace_predict.
2023-09-28 17:00:16 +08:00
Jiaming Yuan
c75a3bc0a9
[breaking] [jvm-packages] Remove rabit checkpoint. (#9599)
- Add `numBoostedRound` to jvm packages
- Remove rabit checkpoint version.
- Change the starting version of training continuation in JVM [breaking].
- Redefine the checkpoint version policy in jvm package. [breaking]
- Rename the Python checkpoint callback parameter. [breaking]
- Unify the checkpoint policy between Python and JVM.
2023-09-26 18:06:34 +08:00
Jiaming Yuan
a90d204942
Use array interface for testing numpy arrays. (#9602) 2023-09-23 03:13:48 +08:00
Jiaming Yuan
bbf5b9ee57
[dask] Move dask module into directory. (#9597) 2023-09-23 01:28:18 +08:00
Jiaming Yuan
9027686cac
Support pandas 2.1.0. (#9557) 2023-09-11 17:44:51 +08:00