1373 Commits

Author SHA1 Message Date
Jiaming Yuan
1cf4d93246
Convert federated tests into test suite. (#9006)
- Add specialization for learning to rank.
2023-04-04 01:29:47 +08:00
Rong Ou
15e073ca9d
Make objectives work with vertical distributed and federated learning (#9002) 2023-04-03 17:07:42 +08:00
Jiaming Yuan
bac22734fb
Remove ntree limit in python package. (#8345)
- Remove `ntree_limit`. The parameter has been deprecated since 1.4.0.
- The SHAP package compatibility is broken.
2023-03-31 19:01:55 +08:00
Jiaming Yuan
d062a9e009
Define pair generation strategies for LTR. (#8984) 2023-03-30 12:00:35 +08:00
Philip Hyunsu Cho
6676c28cbc
[CI] Fix Windows wheel to be compatible with Poetry (#8991)
* [CI] Fix Windows wheel to be compatible with Poetry

* Typo

* Eagerly scan globs to avoid patching same file twice
2023-03-28 21:32:54 -07:00
Rong Ou
ff26cd3212
More tests for column split and vertical federated learning (#8985)
Added some more tests for the learner and fit_stump, for both column-wise distributed learning and vertical federated learning.

Also moved the `IsRowSplit` and `IsColumnSplit` methods from the `DMatrix` to the `MetaInfo` since in some places we only have access to the `MetaInfo`. Added a new convenience method `IsVerticalFederatedLearning`.

Some refactoring of the testing fixtures.
2023-03-28 16:40:26 +08:00
Jiaming Yuan
401ce5cf5e
Run linters with the multi output demo. (#8966) 2023-03-28 00:47:28 +08:00
Jiaming Yuan
acc110c251
[MT-TREE] Support prediction cache and model slicing. (#8968)
- Fix prediction range.
- Support prediction cache in mt-hist.
- Support model slicing.
- Make the booster a Python iterable by defining `__iter__`.
- Cleanup removed/deprecated parameters.
- A new field in the output model `iteration_indptr` for pointing to the ranges of trees for each iteration.
2023-03-27 23:10:54 +08:00
Jiaming Yuan
c2b3a13e70
[breaking][skl] Remove parameter serialization. (#8963)
- Remove parameter serialization in the scikit-learn interface.

The scikit-lear interface `save_model` will save only the model and discard all
hyper-parameters. This is to align with the native XGBoost interface, which distinguishes
the hyper-parameter and model parameters.

With the scikit-learn interface, model parameters are attributes of the estimator. For
instance, `n_features_in_`, `n_classes_` are always accessible with
`estimator.n_features_in_` and `estimator.n_classes_`, but not with the
`estimator.get_params`.

- Define a `load_model` method for classifier to load its own attributes.

- Set n_estimators to None by default.
2023-03-27 21:34:10 +08:00
Jiaming Yuan
151882dd26
Initial support for multi-target tree. (#8616)
* Implement multi-target for hist.

- Add new hist tree builder.
- Move data fetchers for tests.
- Dispatch function calls in gbm base on the tree type.
2023-03-22 23:49:56 +08:00
Jiaming Yuan
ea04d4c46c
[doc] [dask] Troubleshooting NCCL errors. (#8943) 2023-03-22 22:17:26 +08:00
Jiaming Yuan
5891f752c8
Rework the MAP metric. (#8931)
- The new implementation is more strict as only binary labels are accepted. The previous implementation converts values greater than 1 to 1.
- Deterministic GPU. (no atomic add).
- Fix top-k handling.
- Precise definition of MAP. (There are other variants on how to handle top-k).
- Refactor GPU ranking tests.
2023-03-22 17:45:20 +08:00
Rong Ou
b240f055d3
Support vertical federated learning (#8932) 2023-03-22 14:25:26 +08:00
Jiaming Yuan
a093770f36
Partitioner for multi-target tree. (#8922) 2023-03-16 18:49:34 +08:00
Jiaming Yuan
26209a42a5
Define git attributes for renormalization. (#8921) 2023-03-16 02:43:11 +08:00
Jiaming Yuan
f186c87cf9
Check inf in data for all types of DMatrix. (#8911) 2023-03-15 11:24:35 +08:00
Jiaming Yuan
72e8331eab
Reimplement the NDCG metric. (#8906)
- Add support for non-exp gain.
- Cache the DMatrix object to avoid re-calculating the IDCG.
- Make GPU implementation deterministic. (no atomic add)
2023-03-15 03:26:17 +08:00
Jiaming Yuan
8685556af2
Implement hist evaluator for multi-target tree. (#8908) 2023-03-15 01:42:51 +08:00
Jiaming Yuan
910ce580c8
Clear all cache after model load. (#8904) 2023-03-14 22:09:36 +08:00
Jiaming Yuan
c400fa1e8d
Predictor for vector leaf. (#8898) 2023-03-14 19:07:10 +08:00
Jiaming Yuan
8be6095ece
Implement NDCG cache. (#8893) 2023-03-13 22:16:31 +08:00
Jiaming Yuan
9bade7203a
Remove public access to tree model param. (#8902)
* Make tree model param a private member.
* Number of features and targets are immutable after construction.

This is to reduce the number of places where we can run configuration.
2023-03-13 20:55:10 +08:00
Jiaming Yuan
5ba3509dd3
Define multi expand entry. (#8895) 2023-03-13 19:31:05 +08:00
Jiaming Yuan
3689695d16
[CI] Run RMM gtests. (#8900)
* [CI] Run RMM gtests.

* Update test-cpp-gpu.sh

---------

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2023-03-12 03:14:31 +08:00
Jiaming Yuan
36a7396658
Replace dmlc any with std any. (#8892) 2023-03-11 06:11:04 +08:00
Jiaming Yuan
2aa838c75e
Define multi-strategy parameter. (#8890) 2023-03-11 02:58:01 +08:00
Jiaming Yuan
6deaec8027
Pass obj info by reference instead of by value. (#8889)
- Pass obj info into tree updater as const pointer.

This way we don't have to initialize the learner model param before configuring gbm, hence
breaking up the dependency of configurations.
2023-03-11 01:38:28 +08:00
Jiaming Yuan
c5c8f643f2
Remove the cub submodule. (#8888)
XGBoost now uses CTK-11.8 for binary packages, there's no need to maintain a cub
submodule anymore.
2023-03-09 19:43:02 -08:00
Jiaming Yuan
5feee8d4a9
Define core multi-target regression tree structure. (#8884)
- Define a new tree struct embedded in the `RegTree`.
- Provide dispatching functions in `RegTree`.
- Fix some c++-17 warnings about the use of nodiscard (currently we disable the warning on
  the CI).
- Use uint32_t instead of size_t for `bst_target_t` as it has a defined size and can be used
  as part of dmlc parameter.
- Hide the `Segment` struct inside the categorical split matrix.
2023-03-09 19:03:06 +08:00
Jiaming Yuan
46dfcc7d22
Define a new ranking parameter. (#8887) 2023-03-09 17:46:24 +08:00
Jiaming Yuan
f236640427
Support F order for the tensor type. (#8872)
- Add F order support for tensor and view.
- Use parameter pack for automatic type cast. (avoid excessive static cast for shape).
2023-03-08 03:27:49 +08:00
Jiaming Yuan
f7ce0ec0df
Upgrade gcc toolchain to 9.x. (#8878)
* Use new tool chain.

* Use gcc-9.

* Use cmake from system.

* DOn't link leak.
2023-03-07 08:25:23 -08:00
Jiaming Yuan
7eba285a1e
Support sklearn cross validation for ranker. (#8859)
* Support sklearn cross validation for ranker.

- Add a convention for X to include a special `qid` column.

sklearn utilities consider only `X`, `y` and `sample_weight` for supervised learning
algorithms, but we need an additional qid array for ranking.

It's important to be able to support the cross validation function in sklearn since all
other tuning functions like grid search are based on cross validation.
2023-03-07 00:22:08 +08:00
Jiaming Yuan
228a46e8ad
Support learning rate for zero-hessian objectives. (#8866) 2023-03-06 20:33:28 +08:00
Jiaming Yuan
6a892ce281
Specify src path for isort. (#8867) 2023-03-06 17:30:27 +08:00
Jiaming Yuan
4d665b3fb0
Restore clang tidy test. (#8861) 2023-03-03 13:47:04 -08:00
Rory Mitchell
69a50248b7
Fix scope of feature set pointers (#8850)
---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2023-03-02 12:37:14 +08:00
mzzhang95
6cef9a08e9
[pyspark] Update eval_metric validation to support list of strings (#8826) 2023-03-02 08:24:12 +08:00
Rong Ou
7cbaee9916
Support column split in approx tree method (#8847) 2023-03-02 03:59:07 +08:00
Philip Hyunsu Cho
6d8afb2218
[CI] Require C++17 + CMake 3.18; Use CUDA 11.8 in CI (#8853)
* Update to C++17

* Turn off unity build

* Update CMake to 3.18

* Use MSVC 2022 + CUDA 11.8

* Re-create stack for worker images

* Allocate more disk space for Windows

* Tempiorarily disable clang-tidy

* RAPIDS now requires Python 3.10+

* Unpin cuda-python

* Use latest NCCL

* Use Ubuntu 20.04 in RMM image

* Mark failing mgpu test as xfail
2023-03-01 09:22:24 -08:00
Jiaming Yuan
d54ef56f6f
Fix cache with gc (#8851)
- Make DMatrixCache thread-safe.
- Remove the use of thread-local memory.
2023-03-01 00:39:06 +08:00
Rong Ou
d9688f93c7
Support column-split in row partitioner (#8828) 2023-02-26 04:43:35 +08:00
Rong Ou
a65ad0bd9c
Support column split in histogram builder (#8811) 2023-02-17 22:37:01 +08:00
Jiaming Yuan
c0afdb6786
Fix CPU bin compression with categorical data. (#8809)
* Fix CPU bin compression with categorical data.

* The bug causes the maximum category to be lesser than 256 or the maximum number of bins when
the input data is dense.
2023-02-16 04:20:34 +08:00
Jiaming Yuan
cce4af4acf
Initial support for quantile loss. (#8750)
- Add support for Python.
- Add objective.
2023-02-16 02:30:18 +08:00
Jiaming Yuan
282b1729da
Specify the number of threads for parallel sort. (#8735)
* Specify the number of threads for parallel sort.

- Pass context object into argsort.
- Replace macros with inline functions.
2023-02-16 00:20:19 +08:00
Jiaming Yuan
81b2ee1153
Pass DMatrix into metric for caching. (#8790) 2023-02-13 22:15:05 +08:00
Jiaming Yuan
31d3ec07af
Extract device algorithms. (#8789) 2023-02-13 20:53:53 +08:00
Jiaming Yuan
457f704e3d
Add quantile metric. (#8761) 2023-02-13 19:07:40 +08:00
Jiaming Yuan
d11a0044cf
Generalize prediction cache. (#8783)
* Extract most of the functionality into `DMatrixCache`.
* Move API entry to independent file to reduce dependency on `predictor.h` file.
* Add test.

---------

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2023-02-13 12:36:43 +08:00