867 Commits

Author SHA1 Message Date
Jiaming Yuan
e4ee4e79dc
[backport][sklearn] Fix loading model attributes. (#9808) (#9880) 2023-12-13 14:20:04 +08:00
Philip Hyunsu Cho
41ce8f28b2
[jvm-packages] Add Scala version suffix to xgboost-jvm package (#9776)
* Update JVM script (#9714)

* Bump version to 2.0.2; revamp pom.xml

* Update instructions in prepare_jvm_release.py

* Fix formatting
2023-11-08 10:17:26 -08:00
Jiaming Yuan
0ffc52e05c
[backport] Fix using categorical data with the ranker. (#9753) (#9778) 2023-11-09 01:20:52 +08:00
Philip Hyunsu Cho
a408254c2f
Use sys.base_prefix instead of sys.prefix (#9711)
* Use sys.base_prefix instead of sys.prefix

* Update libpath.py too
2023-10-23 23:31:40 -07:00
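The difference between the two prefixes in the commit above can be sketched as follows. Inside a virtual environment, `sys.prefix` points at the environment while `sys.base_prefix` points at the base Python installation; outside one, the two are equal. The helper names `in_virtualenv` and `candidate_lib_dirs` are hypothetical and are not taken from `libpath.py`. This is a minimal illustration, not the project's lookup logic.

```python
import os
import sys
from typing import List


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    return sys.prefix != sys.base_prefix


def candidate_lib_dirs() -> List[str]:
    """Hypothetical helper: directories in which a shared library might live."""
    # dict.fromkeys de-duplicates while preserving order; outside a virtual
    # environment both prefixes are identical, so only one entry remains.
    prefixes = list(dict.fromkeys([sys.prefix, sys.base_prefix]))
    return [os.path.join(prefix, "lib") for prefix in prefixes]
```

Calling `candidate_lib_dirs()` inside a virtual environment yields two directories, one under the environment and one under the base installation.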
Philip Hyunsu Cho
946ab53b57
Fix libpath logic for Windows (#9687) 2023-10-19 10:42:46 -07:00
Jiaming Yuan
f7da938458
[backport][pyspark] Support stage-level scheduling (#9519) (#9686)
Co-authored-by: Bobby Wang <wbo4958@gmail.com>
2023-10-18 14:05:08 +08:00
Jiaming Yuan
58aa98a796
Bump version to 2.0.1. (#9660) 2023-10-13 08:47:32 +08:00
Jiaming Yuan
e824b18bf6
[backport] Support pandas 2.1.0. (#9557) (#9655) 2023-10-12 11:29:59 +08:00
Jiaming Yuan
54d1d72d01
[backport] Use array interface for testing numpy arrays. (#9602) (#9635) 2023-10-08 11:45:49 +08:00
Jiaming Yuan
096047c547
Make 2.0 release. (#9567) 2023-09-12 00:20:49 +08:00
Jiaming Yuan
e75dd75bb2
[backport] [pyspark] support gpu transform (#9542) (#9559)
---------

Co-authored-by: Bobby Wang <wbo4958@gmail.com>
2023-09-07 17:21:09 +08:00
Jiaming Yuan
4d387cbfbf
[backport] [pyspark] rework transform to reuse same code (#9292) (#9558)
Co-authored-by: Bobby Wang <wbo4958@gmail.com>
2023-09-07 15:26:24 +08:00
Jiaming Yuan
4301558a57
Make 2.0.0 RC1. (#9492) 2023-08-17 16:16:51 +08:00
Bobby Wang
68be454cfa
[pyspark] hotfix for GPU setup validation (#9495)
* [pyspark] fix a bug of validating gpu configuration

---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2023-08-17 16:01:39 +08:00
Jiaming Yuan
5188e27513
Fix version parsing with rc release. (#9493) 2023-08-16 22:44:58 +08:00
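As an illustration of the kind of parsing the commit above refers to, here is a minimal sketch that accepts an optional release-candidate suffix. The function name `parse_version`, the regular expression, and the accepted spellings (`2.0.0rc1` and `2.0.0-rc1`) are assumptions made for this example; the actual fix in #9493 may differ.

```python
import re
from typing import Optional, Tuple

# Three numeric components followed by an optional "rcN" or "-rcN" suffix.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-?rc(\d+))?$")


def parse_version(text: str) -> Tuple[int, int, int, Optional[int]]:
    """Split a version string into (major, minor, patch, rc)."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognized version string: {text!r}")
    major, minor, patch, rc = match.groups()
    # rc is None for a final release such as "2.0.1".
    return int(major), int(minor), int(patch), (int(rc) if rc is not None else None)
```

A final release returns `None` in the last position, so callers can distinguish `2.0.0` from `2.0.0rc1` without string comparisons.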
Jiaming Yuan
bdc1a3c178
Fix pyspark parameter. (#9460)
- Don't pass the `use_gpu` parameter to the learner.
- Fix GPU approx with PySpark.
2023-08-11 19:07:50 +08:00
Jiaming Yuan
1caa93221a
Use realloc for histogram cache and expose the cache limit. (#9455) 2023-08-10 14:05:27 +08:00
Jiaming Yuan
f05a23b41c
Use weakref instead of id for DataIter cache. (#9445)
- Fix case where Python reuses id from freed objects.
- Small optimization to column matrix with QDM by using `realloc` instead of copying data.
2023-08-10 00:40:06 +08:00
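The problem named in the commit above is that `id()` values are only unique among live objects, so a cache keyed by `id` can mistake a new object for a freed one. A weak reference avoids this because it either resolves to the original object or to `None`. The sketch below shows the idea with hypothetical names (`Batch`, `RefCache`); it is not the DataIter implementation.

```python
import weakref


class Batch:
    """Stand-in for a user-defined data source (hypothetical)."""


class RefCache:
    """Cache one derived value, keyed by the identity of a live object."""

    def __init__(self):
        self._ref = None    # weak reference to the object the value was built from
        self._value = None

    def get(self, source, build):
        # A dead weak reference resolves to None, so a new object can never
        # be mistaken for a freed one, even if it reuses the same id().
        cached = self._ref() if self._ref is not None else None
        if cached is not source:
            self._value = build(source)
            self._ref = weakref.ref(source)
        return self._value
```

Repeated calls with the same live object reuse the cached value; passing a different object, or a new one created after the first was freed, triggers a rebuild.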
Bobby Wang
d495a180d8
[pyspark] add logs for training (#9449) 2023-08-09 18:32:23 +08:00
Jiaming Yuan
54029a59af
Bound the size of the histogram cache. (#9440)
- A new histogram collection with a limit in size.
- Unify histogram building logic between hist, multi-hist, and approx.
2023-08-08 03:21:26 +08:00
Hendrik Makait
f958e32683
Raise if expected workers are not alive in xgboost.dask.train (#9421) 2023-08-03 20:14:07 +08:00
Jiaming Yuan
7129988847
Accept only keyword arguments in data iterator. (#9431) 2023-08-03 12:44:16 +08:00
Jiaming Yuan
912e341d57
Initial GPU support for the approx tree method. (#9414) 2023-07-31 15:50:28 +08:00
Jiaming Yuan
851cba931e
Define best_iteration only if early stopping is used. (#9403)
* Define `best_iteration` only if early stopping is used.

This is the behavior specified by the document but not honored in the actual code.

- Don't set the attributes if there's no early stopping.
- Clean up the code for callbacks, and replace assertions with proper exceptions.
- Assign the attributes when early stopping `save_best` is used.
- Turn the attributes into Python properties.

---------

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2023-07-24 12:43:35 +08:00
Jiaming Yuan
01e00efc53
[breaking] Remove support for single string feature info. (#9401)
- Input must be a sequence of strings.
- Improve validation error message.
2023-07-24 11:06:30 +08:00
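The rule stated in the commit above (input must be a sequence of strings, and a single string is rejected) can be sketched as a small validator. The function name `validate_feature_info` and the error wording are assumptions for illustration only, not the library's code.

```python
from collections.abc import Sequence
from typing import List


def validate_feature_info(info: object, name: str = "feature_names") -> List[str]:
    """Return info as a list of strings, or raise TypeError."""
    # A str is itself a Sequence, so it has to be rejected explicitly.
    if isinstance(info, str) or not isinstance(info, Sequence):
        raise TypeError(
            f"{name} must be a sequence of strings, got {type(info).__name__}"
        )
    if not all(isinstance(item, str) for item in info):
        raise TypeError(f"every entry of {name} must be a string")
    return list(info)
```

With this check, `["f0", "f1"]` and `("f0",)` pass, while `"f0"` and `["f0", 1]` raise `TypeError`.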
Jiaming Yuan
275da176ba
Document for device ordinal. (#9398)
- Rewrite GPU demos. The notebook is converted to a script to avoid committing additional png plots.
- Add GPU demos into the sphinx gallery.
- Add RMM demos into the sphinx gallery.
- Test for firing threads with different device ordinals.
2023-07-22 15:26:29 +08:00
Jiaming Yuan
6e18d3a290
[pyspark] Handle the device parameter in pyspark. (#9390)
- Handle the new `device` parameter in PySpark.
- Deprecate the old `use_gpu` parameter.
2023-07-18 08:47:03 +08:00
Jiaming Yuan
b342ef951b
Make feature validation immutable. (#9388) 2023-07-16 06:52:55 +08:00
Jiaming Yuan
16eb41936d
Handle the new device parameter in dask and demos. (#9386)
* Handle the new `device` parameter in dask and demos.

- Check no ordinal is specified in the dask interface.
- Update demos.
- Update dask doc.
- Update the condition for QDM.
2023-07-15 19:11:20 +08:00
Jiaming Yuan
9da5050643
Turn warning messages into Python warnings. (#9387) 2023-07-15 07:46:43 +08:00
Jiaming Yuan
04aff3af8e
Define the new device parameter. (#9362) 2023-07-13 19:30:25 +08:00
Jiaming Yuan
20c52f07d2
Support exporting cut values (#9356) 2023-07-08 15:32:41 +08:00
edumugi
c3124813e8
Support numpy vertical split (#9365) 2023-07-08 13:18:12 +08:00
Oliver Holworthy
6c9c8a9001
Enable Installation of Python Package with System lib in a Virtual Environment (#9349) 2023-07-05 05:46:17 +08:00
Jiaming Yuan
e964654b8f
[skl] Enable cat feature without specifying tree method. (#9353) 2023-07-03 22:06:17 +08:00
Jiaming Yuan
39390cc2ee
[breaking] Remove the predictor param, allow fallback to prediction using DMatrix. (#9129)
- A `DeviceOrd` struct is implemented to indicate the device. It will eventually replace the `gpu_id` parameter.
- The `predictor` parameter is removed.
- Fallback to `DMatrix` when `inplace_predict` is not available.
- The heuristic for choosing a predictor is only used during training.
2023-07-03 19:23:54 +08:00
Jiaming Yuan
4066d68261
[doc] Clarify early stopping. (#9304) 2023-06-20 17:56:47 +08:00
Jiaming Yuan
ee6809e642
Use mmap for external memory. (#9282)
- Have basic infrastructure for mmap.
- Release file write handle.
2023-06-19 18:52:55 +08:00
Jiaming Yuan
ea0deeca68
Disable dense optimization in hist for distributed training. (#9272) 2023-06-10 02:31:34 +08:00
Jiaming Yuan
1fcc26a6f8
Set ndcg to default for LTR. (#8822)
- Add document.
- Add tests.
- Use `ndcg` with `topk` as default.
2023-06-09 23:31:33 +08:00
Jiaming Yuan
9fbde21e9d
Rework the precision metric. (#9222)
- Rework the precision metric for both CPU and GPU.
- Mention it in the document.
- Cleanup old support code for GPU ranking metric.
- Deterministic GPU implementation.

* Drop support for classification.

* type.

* use batch shape.

* lint.

* cpu build.

* cpu build.

* lint.

* Tests.

* Fix.

* Cleanup error message.
2023-06-02 20:49:43 +08:00
Jiaming Yuan
097f11b6e0
Support CUDA f16 without transformation. (#9207)
- Support f16 from cupy.
- Include CUDA header explicitly.
- Cleanup cmake nvtx support.
2023-05-30 20:54:31 +08:00
Bobby Wang
320323f533
[pyspark] add parameters in the ctor of all estimators. (#9202)
---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2023-05-29 05:58:16 +08:00
michael-gendy-mention-me
c5677a2b2c
Remove type: ignore hints (#9197) 2023-05-27 07:48:28 +08:00
Jiaming Yuan
3913ff470f
Import data lazily during tests. (#9176) 2023-05-23 03:58:31 +08:00
Bobby Wang
6274fba0a5
[pyspark] support typing (#9172) 2023-05-19 14:39:26 +08:00
Bobby Wang
caf326d508
[pyspark] Refactor and typing support for models (#9156) 2023-05-17 16:38:51 +08:00
Philip Hyunsu Cho
0cd4382d72
Fix config-settings handling in pip install (#9115)
* Fix config_settings handling in pip install

* Fix formatting

* Fix flag use_system_libxgboost

* Add setuptools to doc requirements.txt

* Fix mypy
2023-05-09 17:54:20 -07:00
Uriya Harpeness
a075aa24ba
Move python tool configurations to pyproject.toml, and add the python 3.11 classifier. (#9112) 2023-05-06 02:59:06 +08:00
Philip Hyunsu Cho
07b2d5a26d
Add useful links to pyproject.toml (#9114) 2023-05-02 12:47:15 -07:00