Jiaming Yuan
f7da938458
[backport][pyspark] Support stage-level scheduling (#9519) (#9686)
Co-authored-by: Bobby Wang <wbo4958@gmail.com>
2023-10-18 14:05:08 +08:00
Jiaming Yuan
58aa98a796
Bump version to 2.0.1. (#9660)
2023-10-13 08:47:32 +08:00
Jiaming Yuan
e824b18bf6
[backport] Support pandas 2.1.0. (#9557) (#9655)
2023-10-12 11:29:59 +08:00
Jiaming Yuan
54d1d72d01
[backport] Use array interface for testing numpy arrays. (#9602) (#9635)
2023-10-08 11:45:49 +08:00
Jiaming Yuan
096047c547
Make 2.0 release. (#9567)
2023-09-12 00:20:49 +08:00
Jiaming Yuan
e75dd75bb2
[backport] [pyspark] support gpu transform (#9542) (#9559)
---------
Co-authored-by: Bobby Wang <wbo4958@gmail.com>
2023-09-07 17:21:09 +08:00
Jiaming Yuan
4d387cbfbf
[backport] [pyspark] rework transform to reuse same code (#9292) (#9558)
Co-authored-by: Bobby Wang <wbo4958@gmail.com>
2023-09-07 15:26:24 +08:00
Jiaming Yuan
4301558a57
Make 2.0.0 RC1. (#9492)
2023-08-17 16:16:51 +08:00
Bobby Wang
68be454cfa
[pyspark] hotfix for GPU setup validation (#9495)
* [pyspark] fix a bug in validating the GPU configuration
---------
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2023-08-17 16:01:39 +08:00
Jiaming Yuan
5188e27513
Fix version parsing with rc release. (#9493)
2023-08-16 22:44:58 +08:00
Jiaming Yuan
bdc1a3c178
Fix pyspark parameter. (#9460)
- Don't pass the `use_gpu` parameter to the learner.
- Fix GPU approx with PySpark.
2023-08-11 19:07:50 +08:00
Jiaming Yuan
1caa93221a
Use realloc for histogram cache and expose the cache limit. (#9455)
2023-08-10 14:05:27 +08:00
Jiaming Yuan
f05a23b41c
Use weakref instead of id for DataIter cache. (#9445)
- Fix the case where Python reuses the id of a freed object.
- Small optimization to column matrix with QDM by using `realloc` instead of copying data.
2023-08-10 00:40:06 +08:00
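The id-reuse problem fixed by #9445 can be illustrated with a minimal, standard-library-only sketch. It is not XGBoost's DataIter cache; `IterCache` and `DataIterStandIn` are hypothetical names. Keying a cache on `id(obj)` alone can return a stale entry once the original object is freed and CPython hands the same id to a new object; storing a `weakref.ref` next to the value lets the cache detect that the original object is gone.

```python
import weakref


class IterCache:
    """Hypothetical cache keyed by id(obj), validated with a weak reference."""

    def __init__(self) -> None:
        # id(obj) -> (weak reference to obj, cached value)
        self._entries = {}

    def put(self, obj: object, value: object) -> None:
        self._entries[id(obj)] = (weakref.ref(obj), value)

    def get(self, obj: object):
        entry = self._entries.get(id(obj))
        if entry is None:
            return None
        ref, value = entry
        # A dead reference (or one pointing at another object) means the id was
        # reused by a new object after the original was freed: treat it as a miss.
        if ref() is not obj:
            del self._entries[id(obj)]
            return None
        return value


class DataIterStandIn:
    """Stand-in for a user-defined data iterator (plain classes are weak-referenceable)."""


if __name__ == "__main__":
    cache = IterCache()
    it = DataIterStandIn()
    cache.put(it, "cached DMatrix")
    print(cache.get(it))  # cached DMatrix
    del it  # the original iterator is freed; its id may be handed to the next object
    print(cache.get(DataIterStandIn()))  # None, even if the id was reused
```

For hashable objects, `weakref.WeakKeyDictionary` is a simpler alternative that drops entries automatically when the key object dies.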
Bobby Wang
d495a180d8
[pyspark] add logs for training (#9449)
2023-08-09 18:32:23 +08:00
Jiaming Yuan
54029a59af
Bound the size of the histogram cache. (#9440)
- A new histogram collection with a size limit.
- Unify histogram building logic between hist, multi-hist, and approx.
2023-08-08 03:21:26 +08:00
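The idea of a histogram collection with a size limit can be sketched as a cache keyed by tree-node index that never holds more than a fixed number of histograms. This is a hypothetical illustration (`BoundedHistCache` and `max_cached_nodes` are invented names), and least-recently-used eviction is an assumption chosen for the example; the commit message does not say which eviction policy XGBoost uses.

```python
from collections import OrderedDict
from typing import List


class BoundedHistCache:
    """Hypothetical histogram cache keyed by tree-node index, capped at
    `max_cached_nodes` entries. Eviction policy (LRU) is an assumption."""

    def __init__(self, max_cached_nodes: int) -> None:
        if max_cached_nodes < 1:
            raise ValueError("max_cached_nodes must be positive")
        self.max_cached_nodes = max_cached_nodes
        self._hists: OrderedDict = OrderedDict()  # node index -> histogram bins

    def __contains__(self, nidx: int) -> bool:
        return nidx in self._hists

    def get(self, nidx: int) -> List[float]:
        hist = self._hists[nidx]
        self._hists.move_to_end(nidx)  # mark as most recently used
        return hist

    def put(self, nidx: int, hist: List[float]) -> None:
        self._hists[nidx] = hist
        self._hists.move_to_end(nidx)
        # Drop the least recently used histograms once over the limit; an evicted
        # node's histogram has to be rebuilt from data if it is needed again.
        while len(self._hists) > self.max_cached_nodes:
            self._hists.popitem(last=False)
```

The trade-off is memory against recomputation: a small limit bounds peak memory on deep trees, at the cost of rebuilding histograms that could otherwise have been reused through the subtraction trick.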
Hendrik Makait
f958e32683
Raise if expected workers are not alive in xgboost.dask.train (#9421)
2023-08-03 20:14:07 +08:00
Jiaming Yuan
7129988847
Accept only keyword arguments in data iterator. (#9431)
2023-08-03 12:44:16 +08:00
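Restricting a callback to keyword arguments is a plain Python feature: a bare `*` in the signature makes every following parameter keyword-only. The sketch below is hypothetical and does not use XGBoost; `make_input_data` and `ListIter` are invented names that only mimic the shape of a data iterator whose `next` method feeds batches through an `input_data` callback.

```python
from typing import Any, Callable, Dict, List, Optional


def make_input_data(batches: List[Dict[str, Any]]) -> Callable[..., None]:
    """Return a hypothetical `input_data` callback that records each batch."""

    def input_data(*, data: Any, label: Optional[Any] = None,
                   weight: Optional[Any] = None) -> None:
        # The bare `*` makes every parameter keyword-only, so a call such as
        # input_data(X, y) raises TypeError instead of silently binding y to
        # the wrong slot.
        batches.append({"data": data, "label": label, "weight": weight})

    return input_data


class ListIter:
    """Minimal iterator in the style of a `next`/`reset` data iterator."""

    def __init__(self, xs: List[Any], ys: List[Any]) -> None:
        self._xs, self._ys, self._it = xs, ys, 0

    def next(self, input_data: Callable[..., None]) -> bool:
        if self._it == len(self._xs):
            return False  # no more batches
        input_data(data=self._xs[self._it], label=self._ys[self._it])
        self._it += 1
        return True

    def reset(self) -> None:
        self._it = 0


if __name__ == "__main__":
    received: List[Dict[str, Any]] = []
    it = ListIter([[1, 2], [3, 4]], [0, 1])
    while it.next(make_input_data(received)):
        pass
    print(len(received))  # 2
```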
Jiaming Yuan
912e341d57
Initial GPU support for the approx tree method. (#9414)
2023-07-31 15:50:28 +08:00
Jiaming Yuan
851cba931e
Define best_iteration only if early stopping is used. (#9403)
* Define `best_iteration` only if early stopping is used.
This is the behavior specified by the documentation but not honored by the actual code.
- Don't set the attributes if there's no early stopping.
- Clean up the code for callbacks, and replace assertions with proper exceptions.
- Assign the attributes when early stopping `save_best` is used.
- Turn the attributes into Python properties.
---------
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2023-07-24 12:43:35 +08:00
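The pattern described in #9403 (attribute defined only under early stopping, exposed as a Python property, with proper exceptions instead of assertions) can be sketched in plain Python. `TrainedModel` and `fake_train` are hypothetical stand-ins, not XGBoost's `Booster` or training loop. Because the property raises `AttributeError`, `hasattr(model, "best_iteration")` returns `False` when early stopping was not used.

```python
from typing import Optional, Sequence


class TrainedModel:
    """Hypothetical model object; `best_iteration` exists only after early stopping."""

    def __init__(self) -> None:
        self._best_iteration: Optional[int] = None

    @property
    def best_iteration(self) -> int:
        if self._best_iteration is None:
            # AttributeError (not an assertion) so that hasattr() behaves naturally.
            raise AttributeError(
                "`best_iteration` is only defined when early stopping is used."
            )
        return self._best_iteration

    @best_iteration.setter
    def best_iteration(self, value: int) -> None:
        self._best_iteration = int(value)


def fake_train(val_losses: Sequence[float],
               early_stopping_rounds: Optional[int] = None) -> TrainedModel:
    """Hypothetical loop over per-round validation losses (lower is better)."""
    model = TrainedModel()
    best, best_round = float("inf"), 0
    for rnd, loss in enumerate(val_losses):
        if loss < best:
            best, best_round = loss, rnd
        elif (early_stopping_rounds is not None
              and rnd - best_round >= early_stopping_rounds):
            break  # no improvement for `early_stopping_rounds` rounds
    if early_stopping_rounds is not None:
        # Only record the attribute when early stopping is active.
        model.best_iteration = best_round
    return model
```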
Jiaming Yuan
01e00efc53
[breaking] Remove support for single string feature info. (#9401)
- Input must be a sequence of strings.
- Improve validation error message.
2023-07-24 11:06:30 +08:00
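The validation rule in #9401 (input must be a sequence of strings) has one subtlety worth showing: in Python a bare `str` is itself a sequence of strings, so it has to be rejected explicitly. The helper below is a hypothetical sketch, not XGBoost's validator; the log does not say what a single string used to mean, so the sketch only demonstrates rejecting it with a clear error message.

```python
from typing import List, Optional, Sequence


def validate_feature_info(info: Optional[Sequence[str]], n_features: int,
                          name: str = "feature_names") -> Optional[List[str]]:
    """Hypothetical check that feature info is a sequence of strings."""
    if info is None:
        return None
    # A bare string is also a Sequence[str]; reject it so that "f0" is never
    # silently treated as ["f", "0"].
    if isinstance(info, str):
        raise TypeError(f"`{name}` must be a sequence of strings, got a single str.")
    values = list(info)
    if not all(isinstance(v, str) for v in values):
        raise TypeError(f"All entries of `{name}` must be strings.")
    if len(values) != n_features:
        raise ValueError(
            f"`{name}` has {len(values)} entries, but the data has {n_features} features."
        )
    return values
```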
Jiaming Yuan
275da176ba
Document for device ordinal. (#9398)
- Rewrite GPU demos. The notebook is converted to a script to avoid committing additional PNG plots.
- Add GPU demos into the sphinx gallery.
- Add RMM demos into the sphinx gallery.
- Test for firing threads with different device ordinals.
2023-07-22 15:26:29 +08:00
Jiaming Yuan
6e18d3a290
[pyspark] Handle the device parameter in pyspark. (#9390)
- Handle the new `device` parameter in PySpark.
- Deprecate the old `use_gpu` parameter.
2023-07-18 08:47:03 +08:00
Jiaming Yuan
b342ef951b
Make feature validation immutable. (#9388)
2023-07-16 06:52:55 +08:00
Jiaming Yuan
16eb41936d
Handle the new device parameter in dask and demos. (#9386)
* Handle the new `device` parameter in dask and demos.
- Check that no ordinal is specified in the dask interface.
- Update demos.
- Update dask doc.
- Update the condition for QDM.
2023-07-15 19:11:20 +08:00
Jiaming Yuan
9da5050643
Turn warning messages into Python warnings. (#9387)
2023-07-15 07:46:43 +08:00
Jiaming Yuan
04aff3af8e
Define the new device parameter. (#9362)
2023-07-13 19:30:25 +08:00
Jiaming Yuan
20c52f07d2
Support exporting cut values (#9356)
2023-07-08 15:32:41 +08:00
edumugi
c3124813e8
Support numpy vertical split (#9365)
2023-07-08 13:18:12 +08:00
Oliver Holworthy
6c9c8a9001
Enable Installation of Python Package with System lib in a Virtual Environment (#9349)
2023-07-05 05:46:17 +08:00
Jiaming Yuan
e964654b8f
[skl] Enable cat feature without specifying tree method. (#9353)
2023-07-03 22:06:17 +08:00
Jiaming Yuan
39390cc2ee
[breaking] Remove the predictor param, allow fallback to prediction using DMatrix. (#9129)
- A `DeviceOrd` struct is implemented to indicate the device. It will eventually replace the `gpu_id` parameter.
- The `predictor` parameter is removed.
- Fall back to `DMatrix` when `inplace_predict` is not available.
- The heuristic for choosing a predictor is only used during training.
2023-07-03 19:23:54 +08:00
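The role of a `DeviceOrd` (a device type plus an ordinal, replacing the old integer `gpu_id`) can be sketched in Python as a parser for device strings of the form `cpu`, `cuda`, `cuda:<ordinal>`, `gpu`, or `gpu:<ordinal>`. This `DeviceOrd` dataclass is a hypothetical mirror, not the C++ struct from #9129. Two details are assumptions labeled in the code: CPU is encoded with ordinal `-1`, and a bare `cuda`/`gpu` maps to ordinal `0`, whereas a real implementation may instead defer to the current device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceOrd:
    """Hypothetical Python mirror of a (device type, ordinal) pair."""

    kind: str     # "cpu" or "cuda"
    ordinal: int  # assumption: -1 for CPU, >= 0 for a CUDA device

    @staticmethod
    def parse(device: str) -> "DeviceOrd":
        name, sep, ordinal = device.strip().lower().partition(":")
        if name == "cpu":
            if sep:
                raise ValueError("The CPU device does not take an ordinal.")
            return DeviceOrd("cpu", -1)
        if name in ("cuda", "gpu"):  # treat "gpu" as an alias of "cuda"
            if not sep:
                return DeviceOrd("cuda", 0)  # assumption: default to device 0
            if not ordinal.isdigit():
                raise ValueError(f"Invalid device ordinal: {ordinal!r}")
            return DeviceOrd("cuda", int(ordinal))
        raise ValueError(f"Unknown device: {device!r}")

    def is_cuda(self) -> bool:
        return self.kind == "cuda"
```

A typed pair like this makes "CPU" and "GPU 0" impossible to confuse, which an overloaded integer such as `gpu_id = -1` does not.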
Jiaming Yuan
4066d68261
[doc] Clarify early stopping. (#9304)
2023-06-20 17:56:47 +08:00
Jiaming Yuan
ee6809e642
Use mmap for external memory. (#9282)
- Add basic infrastructure for mmap.
- Release the file write handle.
2023-06-19 18:52:55 +08:00
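Memory-mapping an external-memory page instead of reading it into a buffer can be illustrated with Python's standard `mmap` module. This is only a sketch of the general technique, not XGBoost's C++ implementation; the page layout (little-endian float32 values) and the `write_page`/`read_page` helpers are hypothetical. Closing the write handle before mapping mirrors, in spirit, the "release the file write handle" item above.

```python
import mmap
import os
import struct
import tempfile
from typing import List


def write_page(path: str, values: List[float]) -> None:
    """Write one page of little-endian float32 values to `path`."""
    # Leaving the `with` block closes (releases) the write handle before mapping.
    with open(path, "wb") as fo:
        fo.write(struct.pack(f"<{len(values)}f", *values))


def read_page(path: str) -> List[float]:
    """Map the page read-only and decode values directly from the mapping."""
    with open(path, "rb") as fi:
        with mmap.mmap(fi.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            n = len(mm) // 4  # 4 bytes per float32
            return list(struct.unpack_from(f"<{n}f", mm, 0))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        page = os.path.join(tmp, "page-0.bin")
        write_page(page, [1.0, 2.5, -3.0])
        print(read_page(page))  # [1.0, 2.5, -3.0]
```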
Jiaming Yuan
ea0deeca68
Disable dense optimization in hist for distributed training. (#9272)
2023-06-10 02:31:34 +08:00
Jiaming Yuan
1fcc26a6f8
Set ndcg to default for LTR. (#8822)
- Add document.
- Add tests.
- Use `ndcg` with `topk` as default.
2023-06-09 23:31:33 +08:00
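For reference, NDCG truncated at the top `k` positions can be computed as below. This is a plain-Python sketch of the standard formula, not XGBoost's CPU/GPU metric code. It assumes exponential gain `2^rel - 1` with a `log2(rank + 2)` discount (0-based rank), and returns `1.0` when a query has no relevant documents, which is one common convention and an assumption here.

```python
import math
from typing import List, Sequence


def dcg_at_k(rels: Sequence[float], k: int) -> float:
    """DCG over the first k relevance labels, already in ranked order."""
    # Exponential gain (2^rel - 1) with a log2 position discount; rank i is 0-based.
    return sum((2.0 ** r - 1.0) / math.log2(i + 2) for i, r in enumerate(rels[:k]))


def ndcg_at_k(scores: Sequence[float], labels: Sequence[float], k: int) -> float:
    """NDCG@k for a single query group."""
    # Rank documents by predicted score (descending); Python's sort is stable.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked: List[float] = [labels[i] for i in order]
    ideal = sorted(labels, reverse=True)
    idcg = dcg_at_k(ideal, k)
    # Assumption: a query with no relevant documents scores 1.0.
    return dcg_at_k(ranked, k) / idcg if idcg > 0 else 1.0
```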
Jiaming Yuan
9fbde21e9d
Rework the precision metric. (#9222)
- Rework the precision metric for both CPU and GPU.
- Mention it in the documentation.
- Clean up old support code for the GPU ranking metric.
- Deterministic GPU implementation.
* Drop support for classification.
* type.
* use batch shape.
* lint.
* cpu build.
* cpu build.
* lint.
* Tests.
* Fix.
* Clean up error message.
2023-06-02 20:49:43 +08:00
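With classification support dropped, the reworked metric is a ranking precision: the fraction of relevant documents among the top `k` predictions of a query. The sketch below is hypothetical and single-query only. It assumes binary relevance (a positive label counts as relevant) and divides by `k`; ties are broken by document index so the result is deterministic, which echoes, but is not, the deterministic GPU implementation mentioned above.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[float], k: int) -> float:
    """Hypothetical precision@k for one query group."""
    if k < 1:
        raise ValueError("k must be positive")
    # Sort by descending score, breaking ties by document index for determinism.
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    # Assumption: binary relevance, where any positive label counts as relevant.
    hits = sum(1 for i in order[:k] if labels[i] > 0)
    return hits / k
```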
Jiaming Yuan
097f11b6e0
Support CUDA f16 without transformation. (#9207)
- Support f16 from cupy.
- Include CUDA header explicitly.
- Clean up CMake NVTX support.
2023-05-30 20:54:31 +08:00
Bobby Wang
320323f533
[pyspark] add parameters in the ctor of all estimators. (#9202)
---------
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2023-05-29 05:58:16 +08:00
michael-gendy-mention-me
c5677a2b2c
Remove `type: ignore` hints (#9197)
2023-05-27 07:48:28 +08:00
Jiaming Yuan
3913ff470f
Import data lazily during tests. (#9176)
2023-05-23 03:58:31 +08:00
Bobby Wang
6274fba0a5
[pyspark] support typing (#9172)
2023-05-19 14:39:26 +08:00
Bobby Wang
caf326d508
[pyspark] Refactor and typing support for models (#9156)
2023-05-17 16:38:51 +08:00
Philip Hyunsu Cho
0cd4382d72
Fix config-settings handling in pip install (#9115)
* Fix config_settings handling in pip install
* Fix formatting
* Fix flag use_system_libxgboost
* Add setuptools to doc requirements.txt
* Fix mypy
2023-05-09 17:54:20 -07:00
Uriya Harpeness
a075aa24ba
Move python tool configurations to pyproject.toml, and add the python 3.11 classifier. (#9112)
2023-05-06 02:59:06 +08:00
Philip Hyunsu Cho
07b2d5a26d
Add useful links to pyproject.toml (#9114)
2023-05-02 12:47:15 -07:00
Jiaming Yuan
08ce495b5d
Use Booster context in DMatrix. (#8896)
- Pass context from booster to DMatrix.
- Use context instead of integer for `n_threads`.
- Check the consistency of the `max_bin` configuration.
- Test for all combinations of initialization options.
2023-04-28 21:47:14 +08:00
Jiaming Yuan
1f9a57d17b
[Breaking] Require format to be specified in input URI. (#9077)
Previously, we used `libsvm` as the default when the format was not specified. However, the
dmlc data parser is not particularly robust against errors, and the most common type of
error is an undefined format.
Along with this change, we will recommend that users use other data loaders instead. We
will continue to maintain the parsers, as they are currently used for many internal tests,
including federated learning.
2023-04-28 19:45:15 +08:00
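Requiring an explicit format means a data URI such as `train.txt?format=libsvm` is accepted while a bare `train.txt` is rejected with an actionable message. The parser below is a hypothetical sketch of that rule using `urllib.parse.parse_qs`, not the dmlc parser. The `?key=value` query, the optional `#cache` suffix, and the `SUPPORTED_FORMATS` set are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs

# Hypothetical set of accepted formats for this sketch.
SUPPORTED_FORMATS = {"libsvm", "csv"}


def parse_data_uri(uri: str) -> Tuple[str, Dict[str, str], Optional[str]]:
    """Split `path?key=value&...#cache` and insist on an explicit `format`."""
    path_and_query, _, cache = uri.partition("#")
    path, _, query = path_and_query.partition("?")
    # parse_qs maps each key to a list of values; keep the last one.
    params = {key: vals[-1] for key, vals in parse_qs(query).items()}
    fmt = params.get("format")
    if fmt is None:
        # No silent fallback to libsvm: fail early with a hint.
        raise ValueError(
            f"Data format must be specified in the URI, e.g. '{path}?format=libsvm'."
        )
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {fmt!r}")
    return path, params, cache or None
```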
Jiaming Yuan
e206b899ef
Rework MAP and Pairwise for LTR. (#9075)
2023-04-28 02:39:12 +08:00
Scott Gustafson
353ed5339d
Convert `DaskXGBClassifier.classes_` to an array (#8452)
---------
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2023-04-27 02:23:35 +08:00
Bobby Wang
17add4776f
[pyspark] Don't stack for non-feature columns (#9088)
2023-04-25 23:09:12 +08:00