340 Commits

Author SHA1 Message Date
Jiaming Yuan
45ddc39c1d
Relax shotgun test. (#6918) 2021-04-30 09:03:12 +08:00
Jiaming Yuan
b31d37eac5
[CI] Fix custom metric test with empty dataset. (#6917) 2021-04-30 09:00:05 +08:00
Jiaming Yuan
8760ec4827
Ensure predict leaf output 1-dim vector where there's only 1 tree. (#6889) 2021-04-23 15:07:48 +08:00
Jiaming Yuan
54afa3ac7a
Relax shotgun test. (#6900)
It's non-deterministic algorithm, the test is flaky.
2021-04-23 13:01:44 +08:00
Jiaming Yuan
dee5ef2dfd
Typehint for Sklearn. (#6799) 2021-04-14 06:55:21 +08:00
giladmaya
aa0d8f20c1
Support configuring constraints by feature names (#6783)
Co-authored-by: fis <jm.yuan@outlook.com>
2021-04-04 06:53:33 +08:00
Jiaming Yuan
7e06c81894
Fix approximated predict contribution. (#6811) 2021-04-03 02:15:03 +08:00
Jiaming Yuan
47b62480af
More general predict proba. (#6817)
* Use `output_margin` for `softmax`.
* Add test for dask binary cls.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-04-01 19:52:12 +08:00
Jiaming Yuan
a5c852660b
Update document for sklearn model IO. (#6809)
* Update the use of JSON.
* Remove unnecessary type cast.
2021-04-01 15:52:36 +08:00
Jiaming Yuan
a59c7323b4
Fix inplace predict missing value. (#6787) 2021-03-27 05:36:10 +08:00
Jiaming Yuan
bcc0277338
Re-implement ROC-AUC. (#6747)
* Re-implement ROC-AUC.

* Binary
* MultiClass
* LTR
* Add documents.

This PR resolves a few issues:
  - Define a value when the dataset is invalid, which can happen if there's an
  empty dataset, or when the dataset contains only positive or negative values.
  - Define ROC-AUC for multi-class classification.
  - Define weighted average value for distributed setting.
  - A correct implementation for learning to rank task.  Previous
  implementation is just binary classification with averaging across groups,
  which doesn't measure ordered learning to rank.
2021-03-20 16:52:40 +08:00
Jiaming Yuan
4ee8340e79
Support column major array. (#6765) 2021-03-20 05:19:46 +08:00
Jiaming Yuan
23b4165a6b
Fix gamma deviance (#6761) 2021-03-20 01:56:17 +08:00
Jiaming Yuan
4f75f514ce
Fix GPU RF (#6755)
* Fix sampling.
2021-03-17 06:23:35 +08:00
Jiaming Yuan
325bc93e16
[dask] Use distributed.MultiLock (#6743)
* [dask] Use `distributed.MultiLock`

This enables training multiple models in parallel.

* Conditionally import `MultiLock`.
* Use async train directly in scikit learn interface.
* Use `worker_client` when available.
2021-03-16 14:19:41 +08:00
Philip Hyunsu Cho
366f3cb9d8
Add use_rmm flag to global configuration (#6656)
* Ensure RMM is 0.18 or later

* Add use_rmm flag to global configuration

* Modify XGBCachingDeviceAllocatorImpl to skip CUB when use_rmm=True

* Update the demo

* [CI] Pin NumPy to 1.19.4, since NumPy 1.19.5 doesn't work with latest Shap
2021-03-09 14:53:05 -08:00
Jiaming Yuan
a9b4a95225
Fix learning rate scheduler with cv. (#6720)
* Expose more methods in cvpack and packed booster.
* Fix cv context in deprecated callbacks.
* Fix document.
2021-02-28 13:57:42 +08:00
Jiaming Yuan
9da2287ab8
[breaking] Save booster feature info in JSON, remove feature name generation. (#6605)
* Save feature info in booster in JSON model.
* [breaking] Remove automatic feature name generation in `DMatrix`.

This PR is to enable reliable feature validation in Python package.
2021-02-25 18:54:16 +08:00
capybara
b6167cd2ff
[dask] Use client to persist collections (#6722)
Co-authored-by: fis <jm.yuan@outlook.com>
2021-02-25 16:40:38 +08:00
Jiaming Yuan
e8c5c53e2f
Use Predictor for dart. (#6693)
* Use normal predictor for dart booster.
* Implement `inplace_predict` for dart.
* Enable `dart` for dask interface now that it's thread-safe.
* categorical data should be working out of box for dart now.

The implementation is not very efficient as it has to pull back the data and
apply weight for each tree, but still a significant improvement over previous
implementation as now we no longer binary search for each sample.

* Fix output prediction shape on dataframe.
2021-02-09 23:30:19 +08:00
Jiaming Yuan
1335db6113
[dask] Improve documents. (#6687)
* Add tag for versions.
* use autoclass in sphinx build.
Made some class methods to be private to avoid exporting documents.
2021-02-09 09:20:58 +08:00
Jiaming Yuan
5d48d40d9a
Fix DMatrix slice with feature types. (#6689) 2021-02-09 08:13:51 +08:00
Jiaming Yuan
4656b09d5d
[breaking] Add prediction fucntion for DMatrix and use inplace predict for dask. (#6668)
* Add a new API function for predicting on `DMatrix`.  This function aligns
with rest of the `XGBoosterPredictFrom*` functions on semantic of function
arguments.
* Purge `ntree_limit` from libxgboost, use iteration instead.
* [dask] Use `inplace_predict` by default for dask sklearn models.
* [dask] Run prediction shape inference on worker instead of client.

The breaking change is in the Python sklearn `apply` function, I made it to be
consistent with other prediction functions where `best_iteration` is used by
default.
2021-02-08 18:26:32 +08:00
Jiaming Yuan
dbb5208a0a
Use __array_interface__ for creating DMatrix from CSR. (#6675)
* Use __array_interface__ for creating DMatrix from CSR.
* Add configuration.
2021-02-05 21:09:47 +08:00
Jiaming Yuan
a4101de678
Fix divide by 0 in feature importance when no split is found. (#6676) 2021-02-05 03:39:30 +08:00
Jiaming Yuan
72892cc80d
[dask] Disable gblinear and dart. (#6665) 2021-02-04 09:13:09 +08:00
Jiaming Yuan
411592a347
Enhance inplace prediction. (#6653)
* Accept array interface for csr and array.
* Accept an optional proxy dmatrix for metainfo.

This constructs an explicit `_ProxyDMatrix` type in Python.

* Remove unused doc.
* Add strict output.
2021-02-02 11:41:46 +08:00
Jiaming Yuan
87ab1ad607
[dask] Accept Future of model for prediction. (#6650)
This PR changes predict and inplace_predict to accept a Future of model, to avoid sending models to workers repeatably.

* Document is updated to reflect functionality additions in recent changes.
2021-02-02 08:45:52 +08:00
Jiaming Yuan
d8ec7aad5a
[dask] Add a 1 line sample to infer output shape. (#6645)
* [dask] Use a 1 line sample to infer output shape.

This is for inferring shape with direct prediction (without DaskDMatrix).
There are a few things that requires known output shape before carrying out
actual prediction, including dask meta data, output dataframe columns.

* Infer output shape based on local prediction.
* Remove set param in predict function as it's not thread safe nor necessary as
we now let dask to decide the parallelism.
* Simplify prediction on `DaskDMatrix`.
2021-01-30 18:55:50 +08:00
Jiaming Yuan
d167892c7e
[dask] Ensure model can be pickled. (#6651) 2021-01-28 21:47:57 +08:00
Jiaming Yuan
740d042255
Add base_margin for evaluation dataset. (#6591)
* Add base margin to evaluation datasets.
* Unify the code base for evaluation matrices.
2021-01-26 02:11:02 +08:00
Jiaming Yuan
4bf23c2391
Specify shape in prediction contrib and interaction. (#6614) 2021-01-26 02:08:22 +08:00
Jiaming Yuan
a275f40267
[dask] Rework base margin test. (#6627) 2021-01-22 17:49:13 +08:00
Jiaming Yuan
7bc56fa0ed
Use simple print in tracker print function. (#6609) 2021-01-21 21:15:43 +08:00
Jiaming Yuan
d6d72de339
Revert ntree limit fix (#6616)
The old (before fix) best_ntree_limit ignores the num_class parameters, which is incorrect. In before we workarounded it in c++ layer to avoid possible breaking changes on other language bindings. But the Python interpretation stayed incorrect. The PR fixed that in Python to consider num_class, but didn't remove the old workaround, so tree calculation in predictor is incorrect, see PredictBatch in CPUPredictor.
2021-01-19 23:51:16 +08:00
Jiaming Yuan
d356b7a071
Restore unknown data support. (#6595) 2021-01-14 04:51:16 +08:00
Jiaming Yuan
89a00a5866
[dask] Random forest estimators (#6602) 2021-01-13 20:59:20 +08:00
Jiaming Yuan
0027220aa0
[breaking] Remove duplicated predict functions, Fix attributes IO. (#6593)
* Fix attributes not being restored.
* Rename all `data` to `X`. [breaking]
2021-01-13 16:56:49 +08:00
Jiaming Yuan
03cd087da1
Remove duplicated DMatrix. (#6592) 2021-01-12 09:36:56 +08:00
Jiaming Yuan
c709f2aaaf
Fix evaluation result for XGBRanker. (#6594)
* Remove duplicated code, which fixes typo `evals_result` -> `evals_result_`.
2021-01-12 09:36:41 +08:00
Jiaming Yuan
78f2cd83d7
Suppress hypothesis health check for dask client. (#6589) 2021-01-11 14:11:57 +08:00
Jiaming Yuan
80065d571e
[dask] Add DaskXGBRanker (#6576)
* Initial support for distributed LTR using dask.

* Support `qid` in libxgboost.
* Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]`
  to avoid duplicated code.
* Define `DaskXGBRanker`.

The dask ranker doesn't support group structure, instead it uses query id and
convert to group ptr internally.
2021-01-08 18:35:09 +08:00
Jiaming Yuan
96d3d32265
[dask] Add shap tests. (#6575) 2021-01-08 14:59:27 +08:00
Jiaming Yuan
7c9dcbedbc
Fix best_ntree_limit for dart and gblinear. (#6579) 2021-01-08 10:05:39 +08:00
Jiaming Yuan
f5ff90cd87
Support _estimator_type. (#6582)
* Use `_estimator_type`.

For more info, see: https://scikit-learn.org/stable/developers/develop.html#estimator-types

* Model trained from dask can be loaded by single node skl interface.
2021-01-08 10:01:16 +08:00
Jiaming Yuan
60cfd14349
[dask, sklearn] Fix predict proba. (#6566)
* For sklearn:
  - Handles user defined objective function.
  - Handles `softmax`.

* For dask:
  - Use the implementation from sklearn, the previous implementation doesn't perform any extra handling.
2021-01-05 08:29:06 +08:00
Jiaming Yuan
516a93d25c
Fix best_ntree_limit. (#6569) 2021-01-03 05:58:54 +08:00
Jiaming Yuan
5e9e525223
Remove warnings in tests. (#6554) 2020-12-31 13:41:18 +08:00
Jiaming Yuan
de8fd852a5
[dask] Add type hints. (#6519)
* Add validate_features.
* Show type hints in doc.

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-12-29 19:41:02 +08:00
Jiaming Yuan
ca3da55de4
Support early stopping with training continuation, correct num boosted rounds. (#6506)
* Implement early stopping with training continuation.

* Add new C API for obtaining boosted rounds.

* Fix off by 1 in `save_best`.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-12-17 19:59:19 +08:00