830 Commits

Author SHA1 Message Date
Jiaming Yuan
ee6809e642
Use mmap for external memory. (#9282)
- Have basic infrastructure for mmap.
- Release file write handle.
2023-06-19 18:52:55 +08:00
Jiaming Yuan
ea0deeca68
Disable dense optimization in hist for distributed training. (#9272) 2023-06-10 02:31:34 +08:00
Jiaming Yuan
1fcc26a6f8
Set ndcg to default for LTR. (#8822)
- Add document.
- Add tests.
- Use `ndcg` with `topk` as default.
2023-06-09 23:31:33 +08:00
Jiaming Yuan
9fbde21e9d
Rework the precision metric. (#9222)
- Rework the precision metric for both CPU and GPU.
- Mention it in the document.
- Cleanup old support code for GPU ranking metric.
- Deterministic GPU implementation.

* Drop support for classification.

* type.

* use batch shape.

* lint.

* cpu build.

* cpu build.

* lint.

* Tests.

* Fix.

* Cleanup error message.
2023-06-02 20:49:43 +08:00
Jiaming Yuan
097f11b6e0
Support CUDA f16 without transformation. (#9207)
- Support f16 from cupy.
- Include CUDA header explicitly.
- Cleanup cmake nvtx support.
2023-05-30 20:54:31 +08:00
Bobby Wang
320323f533
[pyspark] add parameters in the ctor of all estimators. (#9202)
---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2023-05-29 05:58:16 +08:00
michael-gendy-mention-me
c5677a2b2c
Remove type: ignore hints (#9197) 2023-05-27 07:48:28 +08:00
Jiaming Yuan
3913ff470f
Import data lazily during tests. (#9176) 2023-05-23 03:58:31 +08:00
Bobby Wang
6274fba0a5
[pyspark] support tying (#9172) 2023-05-19 14:39:26 +08:00
Bobby Wang
caf326d508
[pyspark] Refactor and typing support for models (#9156) 2023-05-17 16:38:51 +08:00
Philip Hyunsu Cho
0cd4382d72
Fix config-settings handling in pip install (#9115)
* Fix config_settings handling in pip install

* Fix formatting

* Fix flag use_system_libxgboost

* Add setuptools to doc requirements.txt

* Fix mypy
2023-05-09 17:54:20 -07:00
Uriya Harpeness
a075aa24ba
Move python tool configurations to pyproject.toml, and add the python 3.11 classifier. (#9112) 2023-05-06 02:59:06 +08:00
Philip Hyunsu Cho
07b2d5a26d
Add useful links to pyproject.toml (#9114) 2023-05-02 12:47:15 -07:00
Jiaming Yuan
08ce495b5d
Use Booster context in DMatrix. (#8896)
- Pass context from booster to DMatrix.
- Use context instead of integer for `n_threads`.
- Check the consistency configuration for `max_bin`.
- Test for all combinations of initialization options.
2023-04-28 21:47:14 +08:00
Jiaming Yuan
1f9a57d17b
[Breaking] Require format to be specified in input URI. (#9077)
Previously, we use `libsvm` as default when format is not specified. However, the dmlc
data parser is not particularly robust against errors, and the most common type of error
is undefined format.

Along with which, we will recommend users to use other data loader instead. We will
continue the maintenance of the parsers as it's currently used for many internal tests
including federated learning.
2023-04-28 19:45:15 +08:00
Jiaming Yuan
e206b899ef
Rework MAP and Pairwise for LTR. (#9075) 2023-04-28 02:39:12 +08:00
Scott Gustafson
353ed5339d
Convert `DaskXGBClassifier.classes_` to an array (#8452)
---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2023-04-27 02:23:35 +08:00
Bobby Wang
17add4776f
[pyspark] Don't stack for non feature columns (#9088) 2023-04-25 23:09:12 +08:00
Bobby Wang
339f21e1bf
[pyspark] fix a type hint with old pyspark release (#9079) 2023-04-24 20:04:14 +08:00
Philip Hyunsu Cho
a5cd2412de
Replace setup.py with pyproject.toml (#9021)
* Create pyproject.toml
* Implement a custom build backend (see below) in packager directory. Build logic from setup.py has been refactored and migrated into the new backend.
* Tested: pip wheel . (build wheel), python -m build --sdist . (source distribution)
2023-04-20 13:51:39 -07:00
WeichenXu
191d0aa5cf
[spark] Make spark model have the same UID with its estimator (#9022)
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2023-04-14 02:53:30 +08:00
Jiaming Yuan
720a8c3273
[doc] Remove parameter type in Python doc strings. (#9005) 2023-04-01 04:04:30 +08:00
Jiaming Yuan
bac22734fb
Remove ntree limit in python package. (#8345)
- Remove `ntree_limit`. The parameter has been deprecated since 1.4.0.
- The SHAP package compatibility is broken.
2023-03-31 19:01:55 +08:00
Jiaming Yuan
a58055075b
[dask] Return the first valid booster instead of all valid ones. (#8993)
* [dask] Return the first valid booster instead of all valid ones.

- Reduce memory footprint of the returned model.

* mypy error.

* lint.

* duplicated.
2023-03-30 03:16:18 +08:00
Jiaming Yuan
acc110c251
[MT-TREE] Support prediction cache and model slicing. (#8968)
- Fix prediction range.
- Support prediction cache in mt-hist.
- Support model slicing.
- Make the booster a Python iterable by defining `__iter__`.
- Cleanup removed/deprecated parameters.
- A new field in the output model `iteration_indptr` for pointing to the ranges of trees for each iteration.
2023-03-27 23:10:54 +08:00
Jiaming Yuan
c2b3a13e70
[breaking][skl] Remove parameter serialization. (#8963)
- Remove parameter serialization in the scikit-learn interface.

The scikit-lear interface `save_model` will save only the model and discard all
hyper-parameters. This is to align with the native XGBoost interface, which distinguishes
the hyper-parameter and model parameters.

With the scikit-learn interface, model parameters are attributes of the estimator. For
instance, `n_features_in_`, `n_classes_` are always accessible with
`estimator.n_features_in_` and `estimator.n_classes_`, but not with the
`estimator.get_params`.

- Define a `load_model` method for classifier to load its own attributes.

- Set n_estimators to None by default.
2023-03-27 21:34:10 +08:00
Jiaming Yuan
21a52c7f98
[doc] Add introduction and notes for the sklearn interface. (#8948) 2023-03-23 13:30:42 +08:00
Jiaming Yuan
bf88dadb61
[doc] Fix callback example. (#8944) 2023-03-23 03:27:04 +08:00
Jiaming Yuan
151882dd26
Initial support for multi-target tree. (#8616)
* Implement multi-target for hist.

- Add new hist tree builder.
- Move data fetchers for tests.
- Dispatch function calls in gbm base on the tree type.
2023-03-22 23:49:56 +08:00
Jiaming Yuan
5891f752c8
Rework the MAP metric. (#8931)
- The new implementation is more strict as only binary labels are accepted. The previous implementation converts values greater than 1 to 1.
- Deterministic GPU. (no atomic add).
- Fix top-k handling.
- Precise definition of MAP. (There are other variants on how to handle top-k).
- Refactor GPU ranking tests.
2023-03-22 17:45:20 +08:00
Jiaming Yuan
f186c87cf9
Check inf in data for all types of DMatrix. (#8911) 2023-03-15 11:24:35 +08:00
Jiaming Yuan
7eba285a1e
Support sklearn cross validation for ranker. (#8859)
* Support sklearn cross validation for ranker.

- Add a convention for X to include a special `qid` column.

sklearn utilities consider only `X`, `y` and `sample_weight` for supervised learning
algorithms, but we need an additional qid array for ranking.

It's important to be able to support the cross validation function in sklearn since all
other tuning functions like grid search are based on cross validation.
2023-03-07 00:22:08 +08:00
Jiaming Yuan
6a892ce281
Specify src path for isort. (#8867) 2023-03-06 17:30:27 +08:00
mzzhang95
6cef9a08e9
[pyspark] Update eval_metric validation to support list of strings (#8826) 2023-03-02 08:24:12 +08:00
Jiaming Yuan
cce4af4acf
Initial support for quantile loss. (#8750)
- Add support for Python.
- Add objective.
2023-02-16 02:30:18 +08:00
WeichenXu
f27a7258c6
Fix feature types param (#8772)
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2023-02-14 02:16:42 +08:00
Jiaming Yuan
457f704e3d
Add quantile metric. (#8761) 2023-02-13 19:07:40 +08:00
Jiaming Yuan
225b3158f6
Support custom metric in sklearn ranker. (#8786) 2023-02-12 13:14:07 +08:00
Jiaming Yuan
8a16944664
Fix ranking with quantile dmatrix and group weight. (#8762) 2023-02-10 20:32:35 +08:00
Jiaming Yuan
c4802bfcd0
Cleanup booster param types. (#8756) 2023-02-07 15:52:19 +08:00
Jiaming Yuan
0f37a01dd9
Require black formatter for the python package. (#8748) 2023-02-07 01:53:33 +08:00
Jiaming Yuan
a2e433a089
Fix empty DMatrix with categorical features. (#8739) 2023-02-07 00:40:11 +08:00
Jiaming Yuan
c1786849e3
Use array interface for CSC matrix. (#8672)
* Use array interface for CSC matrix.

Use array interface for CSC matrix and align the interface with CSR and dense.

- Fix nthread issue in the R package DMatrix.
- Unify the behavior of handling `missing` with other inputs.
- Unify the behavior of handling `missing` around R, Python, Java, and Scala DMatrix.
- Expose `num_non_missing` to the JVM interface.
- Deprecate old CSR and CSC constructors.
2023-02-05 01:59:46 +08:00
BenEfrati
213b5602d9
Add sample_weight to eval_metric (#8706) 2023-02-05 00:06:38 +08:00
Jiaming Yuan
0e61ba57d6
Fix GPU L1 error. (#8749) 2023-02-04 03:02:00 +08:00
Jiaming Yuan
1325ba9251
Support primitive types of pyarrow-backed pandas dataframe. (#8653)
Categorical data (dictionary) is not supported at the moment.
2023-01-30 17:53:29 +08:00
Jiaming Yuan
9fb12b20a4
Cleanup the callback module. (#8702)
- Cleanup pylint markers.
- run formatter.
- Update examples of using callback.
2023-01-22 00:13:49 +08:00
James Lamb
6933240837
[python-package] remove unused functions in xgboost.data (#8695) 2023-01-19 08:02:54 +08:00
Jiaming Yuan
31b9cbab3d
Make sure input numpy array is aligned. (#8690)
- use `np.require` to specify that the alignment is required.
- scipy csr as well.
- validate input pointer in `ArrayInterface`.
2023-01-18 08:12:13 +08:00
Jiaming Yuan
175986b739
[doc] Add missing document for pyspark ranker. [skip ci] (#8692) 2023-01-18 07:52:18 +08:00