519 Commits

Author SHA1 Message Date
Jiaming Yuan
1f9a57d17b
[Breaking] Require format to be specified in input URI. (#9077)
Previously, we use `libsvm` as default when format is not specified. However, the dmlc
data parser is not particularly robust against errors, and the most common type of error
is undefined format.

Along with which, we will recommend users to use other data loader instead. We will
continue the maintenance of the parsers as it's currently used for many internal tests
including federated learning.
2023-04-28 19:45:15 +08:00
Jiaming Yuan
e206b899ef
Rework MAP and Pairwise for LTR. (#9075) 2023-04-28 02:39:12 +08:00
Jiaming Yuan
2c8d735cb3
Fix tests with pandas 2.0. (#9014)
* Fix tests with pandas 2.0.

- `is_categorical` is replaced by `is_categorical_dtype`.
- one hot encoding returns boolean type instead of integer type.
2023-04-11 00:17:34 +08:00
Jiaming Yuan
bac22734fb
Remove ntree limit in python package. (#8345)
- Remove `ntree_limit`. The parameter has been deprecated since 1.4.0.
- The SHAP package compatibility is broken.
2023-03-31 19:01:55 +08:00
Jiaming Yuan
acc110c251
[MT-TREE] Support prediction cache and model slicing. (#8968)
- Fix prediction range.
- Support prediction cache in mt-hist.
- Support model slicing.
- Make the booster a Python iterable by defining `__iter__`.
- Cleanup removed/deprecated parameters.
- A new field in the output model `iteration_indptr` for pointing to the ranges of trees for each iteration.
2023-03-27 23:10:54 +08:00
Jiaming Yuan
c2b3a13e70
[breaking][skl] Remove parameter serialization. (#8963)
- Remove parameter serialization in the scikit-learn interface.

The scikit-lear interface `save_model` will save only the model and discard all
hyper-parameters. This is to align with the native XGBoost interface, which distinguishes
the hyper-parameter and model parameters.

With the scikit-learn interface, model parameters are attributes of the estimator. For
instance, `n_features_in_`, `n_classes_` are always accessible with
`estimator.n_features_in_` and `estimator.n_classes_`, but not with the
`estimator.get_params`.

- Define a `load_model` method for classifier to load its own attributes.

- Set n_estimators to None by default.
2023-03-27 21:34:10 +08:00
Jiaming Yuan
151882dd26
Initial support for multi-target tree. (#8616)
* Implement multi-target for hist.

- Add new hist tree builder.
- Move data fetchers for tests.
- Dispatch function calls in gbm base on the tree type.
2023-03-22 23:49:56 +08:00
Jiaming Yuan
5891f752c8
Rework the MAP metric. (#8931)
- The new implementation is more strict as only binary labels are accepted. The previous implementation converts values greater than 1 to 1.
- Deterministic GPU. (no atomic add).
- Fix top-k handling.
- Precise definition of MAP. (There are other variants on how to handle top-k).
- Refactor GPU ranking tests.
2023-03-22 17:45:20 +08:00
Jiaming Yuan
f186c87cf9
Check inf in data for all types of DMatrix. (#8911) 2023-03-15 11:24:35 +08:00
Jiaming Yuan
910ce580c8
Clear all cache after model load. (#8904) 2023-03-14 22:09:36 +08:00
Jiaming Yuan
7eba285a1e
Support sklearn cross validation for ranker. (#8859)
* Support sklearn cross validation for ranker.

- Add a convention for X to include a special `qid` column.

sklearn utilities consider only `X`, `y` and `sample_weight` for supervised learning
algorithms, but we need an additional qid array for ranking.

It's important to be able to support the cross validation function in sklearn since all
other tuning functions like grid search are based on cross validation.
2023-03-07 00:22:08 +08:00
Jiaming Yuan
228a46e8ad
Support learning rate for zero-hessian objectives. (#8866) 2023-03-06 20:33:28 +08:00
Jiaming Yuan
6a892ce281
Specify src path for isort. (#8867) 2023-03-06 17:30:27 +08:00
Jiaming Yuan
cce4af4acf
Initial support for quantile loss. (#8750)
- Add support for Python.
- Add objective.
2023-02-16 02:30:18 +08:00
Jiaming Yuan
457f704e3d
Add quantile metric. (#8761) 2023-02-13 19:07:40 +08:00
Jiaming Yuan
225b3158f6
Support custom metric in sklearn ranker. (#8786) 2023-02-12 13:14:07 +08:00
Jiaming Yuan
8a16944664
Fix ranking with quantile dmatrix and group weight. (#8762) 2023-02-10 20:32:35 +08:00
Jiaming Yuan
199c421d60
Send default configuration from metric to objective. (#8760) 2023-02-09 20:18:07 +08:00
Jiaming Yuan
4ead65a28c
Increase timeout limit for linear. (#8767) 2023-02-09 18:20:12 +08:00
Jiaming Yuan
7b3d473593
[doc] Add demo for inference using individual tree. (#8752) 2023-02-07 04:40:18 +08:00
Jiaming Yuan
c1786849e3
Use array interface for CSC matrix. (#8672)
* Use array interface for CSC matrix.

Use array interface for CSC matrix and align the interface with CSR and dense.

- Fix nthread issue in the R package DMatrix.
- Unify the behavior of handling `missing` with other inputs.
- Unify the behavior of handling `missing` around R, Python, Java, and Scala DMatrix.
- Expose `num_non_missing` to the JVM interface.
- Deprecate old CSR and CSC constructors.
2023-02-05 01:59:46 +08:00
BenEfrati
213b5602d9
Add sample_weight to eval_metric (#8706) 2023-02-05 00:06:38 +08:00
Jiaming Yuan
0e61ba57d6
Fix GPU L1 error. (#8749) 2023-02-04 03:02:00 +08:00
Jiaming Yuan
1325ba9251
Support primitive types of pyarrow-backed pandas dataframe. (#8653)
Categorical data (dictionary) is not supported at the moment.
2023-01-30 17:53:29 +08:00
James Lamb
96e6b6beba
[ci] remove unused imports in tests (#8707) 2023-01-25 14:10:29 +08:00
Jiaming Yuan
31b9cbab3d
Make sure input numpy array is aligned. (#8690)
- use `np.require` to specify that the alignment is required.
- scipy csr as well.
- validate input pointer in `ArrayInterface`.
2023-01-18 08:12:13 +08:00
Jiaming Yuan
247946a875
Cache transformed in QuantileDMatrix for efficiency. (#8666) 2023-01-17 06:02:40 +08:00
Jiaming Yuan
d6018eb4b9
Remove all use of DeviceQuantileDMatrix. (#8665) 2023-01-17 00:04:10 +08:00
Jiaming Yuan
badeff1d74
Init estimation for regression. (#8272) 2023-01-11 02:04:56 +08:00
Jiaming Yuan
1b58d81315
[doc] Document Python inputs. (#8643) 2023-01-10 15:39:32 +08:00
Jiaming Yuan
e68a152d9e
Do not return internal value for get_params. (#8634) 2023-01-05 17:48:26 +08:00
Jiaming Yuan
6eaddaa9c3
[CI] Fix CI with updated dependencies. (#8631)
* [CI] Fix CI with updated dependencies.

- Fix jvm package get iris.

* Skip SHAP test for now.

* Revert "Skip SHAP test for now."

This reverts commit 9aa28b4d8aee53fa95d92d2a879c6783ff4b2faa.

* Catch all exceptions.
2023-01-03 21:04:04 -08:00
Jiaming Yuan
f6effa1734
Support Series and Python primitives in inplace_predict and QDM (#8547) 2022-12-17 00:15:15 +08:00
Rong Ou
42e6fbb0db
Fix sklearn test that calls a removed field (#8579) 2022-12-09 13:06:44 -08:00
Jiaming Yuan
deb3edf562
Support list and tuple for QDM. (#8542) 2022-12-10 01:14:44 +08:00
Jiaming Yuan
d666ba775e
Support all pandas nullable integer types. (#8480)
- Enumerate all pandas integer types.
- Tests for `None`, `nan`, and `pd.NA`
2022-11-28 22:38:16 +08:00
Jiaming Yuan
f2209c1fe4
Don't shuffle columns in categorical tests. (#8446) 2022-11-28 20:28:06 +08:00
Jiaming Yuan
8f97c92541
Support half type for pandas. (#8481) 2022-11-24 12:47:40 +08:00
Jiaming Yuan
e07245f110
Take datatable as row major input. (#8472)
* Take datatable as row major input.

Try to avoid a transform with dense table.
2022-11-24 09:20:13 +08:00
Jiaming Yuan
0d3da9869c
Require isort on all Python files. (#8420) 2022-11-08 12:59:06 +08:00
Rong Ou
99fa8dad2d
Add back xgboost.rabit for backwards compatibility (#8408)
* Add back xgboost.rabit for backwards compatibility

* fix my errors

* Fix lint

* Use FutureWarning

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-11-01 21:47:41 -07:00
Jiaming Yuan
a408c34558
Update JSON parser demo with categorical feature. (#8401)
- Parse categorical features in the Python example.
- Add tests.
- Update document.
2022-10-28 20:57:43 +08:00
Jiaming Yuan
cfd2a9f872
Extract dask and spark test into distributed test. (#8395)
- Move test files.
- Run spark and dask separately to prevent conflicts.
- Gather common code into the testing module.
2022-10-28 16:24:32 +08:00
Jiaming Yuan
cf70864fa3
Move Python testing utilities into xgboost module. (#8379)
- Add typehints.
- Fixes for pylint.

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-26 16:56:11 +08:00
Jiaming Yuan
c884b9e888
Validate features for inplace predict. (#8359) 2022-10-19 23:05:36 +08:00
Bobby Wang
76f95a6667
[pyspark] Filter out the unsupported train parameters (#8355) 2022-10-18 23:26:02 +08:00
Jiaming Yuan
3901f5d9db
[pyspark] Cleanup data processing. (#8344)
* Enable additional combinations of ctor parameters.
* Unify procedures for QuantileDMatrix and DMatrix.
2022-10-18 14:56:23 +08:00
Jiaming Yuan
2176e511fc
Disable pytest-timeout for now. (#8348) 2022-10-17 23:06:10 +08:00
Jiaming Yuan
97a5b088a5
[pyspark] Use quantile dmatrix. (#8284) 2022-10-12 20:38:53 +08:00
Rory Mitchell
ce0382dcb0
[CI] Refactor tests to reduce CI time. (#8312) 2022-10-12 11:32:06 +02:00