961 Commits

Author SHA1 Message Date
Jiaming Yuan
9ecb7583e9
[EM] Add basic distributed GPU tests. (#10861)
- Split Hist and Approx tests in unittests.
- Basic GPU tests for distributed.
2024-10-01 01:28:43 +08:00
Jiaming Yuan
92f1c48a22
[EM] Get quantile cuts from the extmem qdm. (#10860) 2024-10-01 00:59:28 +08:00
Jiaming Yuan
68a8865bc5
[CI] Fix PyLint errors. (#10837) 2024-09-24 14:09:32 +08:00
Jiaming Yuan
e228c1a121
[EM] Make page concatenation optional. (#10826)
This PR introduces a new parameter `extmem_concat_pages` that makes page concatenation optional for GPU hist. In addition, the documentation is updated for the new GPU-based external memory.
2024-09-24 06:19:28 +08:00
Dmitry Razdoburdin
d7599e095b
[SYCL] Add dask support for distributed (#10812) 2024-09-22 02:01:57 +08:00
Jiaming Yuan
d5e1c41b69
[coll] Use loky for rabit op tests. (#10828) 2024-09-20 16:46:05 +08:00
shlomota
de00e07087
Fix misleading error when feature names are missing during inference (#10814) 2024-09-13 23:30:50 +08:00
Dmitry Razdoburdin
bba6aa74fb
[SYCL] Fix for sycl support with sklearn estimators (#10806)
---------

Co-authored-by: Dmitry Razdoburdin <>
2024-09-09 14:14:07 +08:00
Jiaming Yuan
e1a2c1bbb3
[EM] Merge GPU partitioning with histogram building. (#10766)
- Stop concatenating pages if there's no subsampling.
- Use a single iteration for histogram build and partitioning.
2024-08-31 03:25:37 +08:00
Jiaming Yuan
34937fea41
[EM] Python wrapper for the ExtMemQuantileDMatrix. (#10762)
Not exposed in the documentation yet.

- Add C API.
- Add Python API.
- Basic CPU tests.
2024-08-29 04:08:25 +08:00
Philip Hyunsu Cho
7794d3da8a
Ensure that pip check does not fail due to bad platform tag (#10755)
* Remove custom tag generation

* Revert "Remove custom tag generation"

This reverts commit fe3cf0e8786c7dc05e1deced3a1c92cd79094735.

* Fetch an accurate platform tag from Pip 22+

* Fix formatting

* TOML allows trailing commas

* Update patch

* Add trailing comma

* Fix up patch

* Use `packaging`

---------

Co-authored-by: jakirkham <jakirkham@gmail.com>
2024-08-27 18:11:08 -07:00
Jiaming Yuan
d6ebcfb032
[EM] Support CPU quantile objective for external memory. (#10751) 2024-08-27 04:16:57 +08:00
Jiaming Yuan
06c4246ff1
[CI] Workaround mypy errors. (#10754) 2024-08-27 02:54:11 +08:00
Jiaming Yuan
2258bc870d
Add more tests and doc for QDM. (#10692) 2024-08-16 23:30:04 +08:00
Jiaming Yuan
3d8107adb8
Support doc link for the sklearn module. (#10287) 2024-08-06 02:35:32 +08:00
Jiaming Yuan
a185b693dc
Reduce warnings and flakiness in tests. (#10659)
- Fix warnings in tests.
- Try to reduce the flakiness of dask tests.
2024-08-03 07:32:47 +08:00
Jiaming Yuan
827d0e8edb
[breaking] Bump Python requirement to 3.10. (#10434)
- Bump the Python requirement.
- Fix type hints.
- Use loky to avoid deadlock.
- Workaround cupy-numpy compatibility issue on Windows caused by the `safe` casting rule.
- Simplify the repartitioning logic to avoid dask errors.
2024-07-30 17:31:06 +08:00
jakirkham
d4b82f50ab
Add Library\mingw-w64 to Windows search path (#10643) 2024-07-29 14:17:59 -07:00
Jiaming Yuan
fcae6301ec
[dask] Disable broadcast in the scatter call. (#10632) 2024-07-25 04:16:34 +08:00
Jiaming Yuan
0846ad860c
Optionally skip cupy on Windows. (#10611) 2024-07-20 22:12:12 +08:00
Philip Hyunsu Cho
326921dbe4
[CI] Build a CPU-only wheel under name xgboost-cpu (#10603) 2024-07-19 10:51:08 -07:00
david-cortes
8d0f2bfbaa
[doc] Add more detailed explanations for advanced objectives (#10283)
---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2024-07-08 19:17:31 +08:00
Jiaming Yuan
00264eb72b
[EM] Basic distributed test for external memory. (#10492) 2024-07-06 01:15:20 +08:00
Jiaming Yuan
e537b0969f
Fix boolean array for arrow-backed DF. (#10527) 2024-07-02 17:02:54 +08:00
Jiaming Yuan
a39fef2c67
[fed] Fixes for the encrypted GRPC backend. (#10503) 2024-07-02 15:15:12 +08:00
Jiaming Yuan
e8a962575a
[EM] Allow staging ellpack on host for GPU external memory. (#10488)
- New parameter `on_host`.
- Abstract format creation and stream creation into policy classes.
2024-06-28 04:42:18 +08:00
Jiaming Yuan
824fba783e
Remove support for deprecated format in Python. (#10490) 2024-06-27 11:31:53 +08:00
Jiaming Yuan
2d88d17008
Remove deprecated DeviceQuantileDMatrix. (#10491) 2024-06-27 11:30:51 +08:00
Philip Hyunsu Cho
9a8bb7d186
Require Pandas 1.2+ (#10476) 2024-06-22 14:15:22 -07:00
Philip Hyunsu Cho
bc3747bdce
[CI] Migrate to rockylinux8 / manylinux_2_28_x86_64 (#10399)
* [CI] Migrate to rockylinux8 / manylinux_2_28_x86_64

* Scrub all references to CentOS 7

* Fix

* Remove use of yum

* Use gcc-10 in cpu

* Temporarily disable -Werror

* Use GCC 9 for now

* Roll back gRPC

* Scrub all references to manylinux2014_x86_64

* Revise rename_whl.py to handle no-op rename

* Change JDK_VERSION back to 8

* Reviewer's comment

* Use GCC 10

* Use Spark 3.5.1, same as in pom.xml

* Fix JAR install
2024-06-17 12:07:49 -07:00
Jiaming Yuan
6c83c8c2ef
Allow blocking launch of federated tracker. (#10414)
---------

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2024-06-16 01:43:53 +08:00
Jiaming Yuan
bbff74d2ff
[dask] Workaround the tokenizer by changing the scatter function. (#10419)
---------

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2024-06-15 19:10:00 +08:00
Richard (Rick) Zamora
dc14f98f40
Avoid default tokenization in Dask (#10398)
---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2024-06-14 19:44:54 +08:00
Bobby Wang
cf0c1d0888
[pyspark] Avoid repartition. (#10408) 2024-06-12 02:26:10 +08:00
Christopher Tee
e0ebbc0746
[doc] Fix small typos (#10405) 2024-06-11 16:13:02 +08:00
Jiaming Yuan
9f6608d6aa
Add Python 3.12 classifier. (#10381) 2024-06-04 18:02:59 +08:00
Jiaming Yuan
43a57c4a85
Bump development version to 2.2. (#10376) 2024-06-04 12:59:16 +08:00
Jiaming Yuan
979e392deb
Fix warnings in GPU dask tests. (#10358) 2024-06-04 12:58:58 +08:00
Jiaming Yuan
e6eefea5e2
[coll] Move the rabit poll helper. (#10349) 2024-05-31 08:02:21 +08:00
Philip Hyunsu Cho
324f2d4e4a
Handle float128 generically (#10322) 2024-05-30 20:14:39 +08:00
Jiaming Yuan
a5a58102e5
Revamp the rabit implementation. (#10112)
This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features:
- Federated learning for both CPU and GPU.
- NCCL.
- More data types.
- A unified interface for all the underlying implementations.
- Improved timeout handling for both tracker and workers.
- Exhaustive tests with metrics (fixed a couple of bugs along the way).
- A reusable tracker for Python and JVM packages.
2024-05-20 11:56:23 +08:00
Jiaming Yuan
ba9b4cb1ee
Fix pylint. (#10296) 2024-05-17 13:28:39 +08:00
Jiaming Yuan
ca1d04bcb7
Release data in cache. (#10286) 2024-05-14 14:20:19 +08:00
Jiaming Yuan
d81e319e78
Fixes for the latest pandas. (#10266)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2024-05-12 11:15:46 +08:00
Jiaming Yuan
73afef1a6e
Fixes for numpy 2.0. (#10252) 2024-05-07 03:54:32 +08:00
Jiaming Yuan
837d44a345
Support more sklearn tags for testing. (#10230) 2024-04-29 06:33:23 +08:00
Jiaming Yuan
54754f29dd
[pyspark] Sort workers by task ID. (#10220) 2024-04-28 18:05:15 +08:00
Philip Hyunsu Cho
edb945d59b
[CI] Use native arm64 worker in GHAction to build M1 wheel (#10225)
* [CI] Use native arm64 worker in GHAction to build M1 wheel

* Set up Conda

* Use mamba

* debug

* fix

* fix

* fix

* fix

* fix

* Temporarily disable other tests

* Fix prefix

* Use micromamba

* Use conda-incubator/setup-miniconda

* Use mambaforge

* Fix

* Fix prefix

* Don't use deprecated set-output

* Add verbose output from build

* verbose

* Specify arch

* Bump setup-miniconda to v3

* Use Python 3.9

* Restore deleted files

* WAR.

---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2024-04-26 10:16:55 -07:00
Bobby Wang
8fb05c8c95
[pyspark] support stage-level for yarn/k8s (#10209) 2024-04-20 00:24:40 +08:00
Jiaming Yuan
303c603c7d
[pyspark] Reuse the collective communicator. (#10198) 2024-04-18 19:09:30 +08:00