152 Commits

Author SHA1 Message Date
Jiaming Yuan
663136aa08
Implement feature score for linear model. (#7048)
* Add feature score support for linear model.
* Port R interface to the new implementation.
* Add linear model support in Python.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-06-25 14:34:02 +08:00
Jiaming Yuan
1d4d345634
Tests for dask skl categorical data support. (#7054) 2021-06-24 16:33:57 +08:00
Jiaming Yuan
c4b9f4f622
Add enable_categorical to sklearn. (#7011) 2021-06-04 02:29:14 +08:00
Jiaming Yuan
816b789bf0
Add predictor to skl constructor. (#7000) 2021-05-29 04:52:56 +08:00
Jiaming Yuan
86e60e3ba8
Guard against index error in prediction. (#6982)
* Remove `best_ntree_limit` from documents.
2021-05-25 23:24:59 +08:00
Daniel Saxton
e41619b1fc
Link to valid tree_method values in docs (#6935) 2021-05-06 17:33:18 +08:00
Jiaming Yuan
a5d7094a45
Update documents. (#6856)
* Add early stopping section to prediction doc.
* Remove best_ntree_limit.
* Better doxygen output.
2021-04-16 12:41:03 +08:00
Jiaming Yuan
dee5ef2dfd
Typehint for Sklearn. (#6799) 2021-04-14 06:55:21 +08:00
Jiaming Yuan
47b62480af
More general predict proba. (#6817)
* Use `output_margin` for `softmax`.
* Add test for dask binary cls.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-04-01 19:52:12 +08:00
Jiaming Yuan
a5c852660b
Update document for sklearn model IO. (#6809)
* Update the use of JSON.
* Remove unnecessary type cast.
2021-04-01 15:52:36 +08:00
Jiaming Yuan
10ae0f9511
Fix doc for apply method. (#6796) 2021-03-31 15:28:31 +08:00
Jiaming Yuan
9da2287ab8
[breaking] Save booster feature info in JSON, remove feature name generation. (#6605)
* Save feature info in booster in JSON model.
* [breaking] Remove automatic feature name generation in `DMatrix`.

This PR is to enable reliable feature validation in Python package.
2021-02-25 18:54:16 +08:00
Jiaming Yuan
c375173dca
Support pylint 2.7.0 (#6726) 2021-02-25 12:49:58 +08:00
Jiaming Yuan
872e559b91
Use inplace predict for sklearn. (#6718)
* Use inplace predict for sklearn when possible.
2021-02-22 12:27:04 +08:00
Benjamin Lehmann
25077564ab
Fixes small typo in sklearn documentation (#6717)
Replaces "dowm" with "down" on parameter n_jobs
2021-02-20 07:36:06 +08:00
Jiaming Yuan
e8c5c53e2f
Use Predictor for dart. (#6693)
* Use normal predictor for dart booster.
* Implement `inplace_predict` for dart.
* Enable `dart` for dask interface now that it's thread-safe.
* categorical data should be working out of box for dart now.

The implementation is not very efficient as it has to pull back the data and
apply weight for each tree, but still a significant improvement over previous
implementation as now we no longer binary search for each sample.

* Fix output prediction shape on dataframe.
2021-02-09 23:30:19 +08:00
Jiaming Yuan
4656b09d5d
[breaking] Add prediction fucntion for DMatrix and use inplace predict for dask. (#6668)
* Add a new API function for predicting on `DMatrix`.  This function aligns
with rest of the `XGBoosterPredictFrom*` functions on semantic of function
arguments.
* Purge `ntree_limit` from libxgboost, use iteration instead.
* [dask] Use `inplace_predict` by default for dask sklearn models.
* [dask] Run prediction shape inference on worker instead of client.

The breaking change is in the Python sklearn `apply` function, I made it to be
consistent with other prediction functions where `best_iteration` is used by
default.
2021-02-08 18:26:32 +08:00
Jiaming Yuan
a4101de678
Fix divide by 0 in feature importance when no split is found. (#6676) 2021-02-05 03:39:30 +08:00
Jiaming Yuan
87ab1ad607
[dask] Accept Future of model for prediction. (#6650)
This PR changes predict and inplace_predict to accept a Future of model, to avoid sending models to workers repeatably.

* Document is updated to reflect functionality additions in recent changes.
2021-02-02 08:45:52 +08:00
Jiaming Yuan
740d042255
Add base_margin for evaluation dataset. (#6591)
* Add base margin to evaluation datasets.
* Unify the code base for evaluation matrices.
2021-01-26 02:11:02 +08:00
Jiaming Yuan
8942c98054
Define metainfo and other parameters for all DMatrix interfaces. (#6601)
This PR ensures all DMatrix types have a common interface.

* Fix logic in avoiding duplicated DMatrix in sklearn.
* Check for consistency between DMatrix types.
* Add doc for bounds.
2021-01-25 16:06:06 +08:00
Jiaming Yuan
89a00a5866
[dask] Random forest estimators (#6602) 2021-01-13 20:59:20 +08:00
Jiaming Yuan
0027220aa0
[breaking] Remove duplicated predict functions, Fix attributes IO. (#6593)
* Fix attributes not being restored.
* Rename all `data` to `X`. [breaking]
2021-01-13 16:56:49 +08:00
Jiaming Yuan
03cd087da1
Remove duplicated DMatrix. (#6592) 2021-01-12 09:36:56 +08:00
Jiaming Yuan
c709f2aaaf
Fix evaluation result for XGBRanker. (#6594)
* Remove duplicated code, which fixes typo `evals_result` -> `evals_result_`.
2021-01-12 09:36:41 +08:00
Jiaming Yuan
80065d571e
[dask] Add DaskXGBRanker (#6576)
* Initial support for distributed LTR using dask.

* Support `qid` in libxgboost.
* Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]`
  to avoid duplicated code.
* Define `DaskXGBRanker`.

The dask ranker doesn't support group structure, instead it uses query id and
convert to group ptr internally.
2021-01-08 18:35:09 +08:00
Jiaming Yuan
f5ff90cd87
Support _estimator_type. (#6582)
* Use `_estimator_type`.

For more info, see: https://scikit-learn.org/stable/developers/develop.html#estimator-types

* Model trained from dask can be loaded by single node skl interface.
2021-01-08 10:01:16 +08:00
Jiaming Yuan
60cfd14349
[dask, sklearn] Fix predict proba. (#6566)
* For sklearn:
  - Handles user defined objective function.
  - Handles `softmax`.

* For dask:
  - Use the implementation from sklearn, the previous implementation doesn't perform any extra handling.
2021-01-05 08:29:06 +08:00
Philip Hyunsu Cho
fa13992264
Calling XGBModel.fit() should clear the Booster by default (#6562)
* Calling XGBModel.fit() should clear the Booster by default

* Document the behavior of fit()

* Allow sklearn object to be passed in directly via xgb_model argument

* Fix lint
2020-12-31 11:02:08 -08:00
Jiaming Yuan
de8fd852a5
[dask] Add type hints. (#6519)
* Add validate_features.
* Show type hints in doc.

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-12-29 19:41:02 +08:00
Jiaming Yuan
610ee632cc
[Breaking] Rename data to X in predict_proba. (#6555)
New Scikit-Learn version uses keyword argument, and `X` is the predefined
keyword.

* Use pip to install latest Python graphviz on Windows CI.
2020-12-28 21:36:03 +08:00
Philip Hyunsu Cho
380f6f4ab8
Remove cupy.array_equal, since it's not compatible with cuPy 7.8 (#6528) 2020-12-18 09:16:52 -08:00
Jiaming Yuan
ca3da55de4
Support early stopping with training continuation, correct num boosted rounds. (#6506)
* Implement early stopping with training continuation.

* Add new C API for obtaining boosted rounds.

* Fix off by 1 in `save_best`.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-12-17 19:59:19 +08:00
Jiaming Yuan
d6386e45e8
Fix filtering callable objects in skl xgb param. (#6466)
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-12-05 17:20:36 +08:00
Jiaming Yuan
2ce2a1a4d8
[SKL] Propagate parameters to booster during set_param. (#6416) 2020-11-20 20:37:35 +08:00
Jiaming Yuan
fcfeb4959c
Deprecate positional arguments. (#6365)
Deprecate positional arguments in following functions:

- `__init__` for all classes in sklearn module.
- `fit` method for all classes in sklearn module.
- dask interface.
- `set_info` for `DMatrix` class.

Refactor the evaluation matrices handling.
2020-11-13 11:10:30 +08:00
Jiaming Yuan
c90f968d92
Update Python documents. (#6376) 2020-11-12 17:51:32 +08:00
Jiaming Yuan
e8884c4637
Document tree method for feature weights. (#6312) 2020-10-28 13:42:13 -07:00
Philip Hyunsu Cho
c8ec62103a
Deprecate LabelEncoder in XGBClassifier; Enable cuDF/cuPy inputs in XGBClassifier (#6269)
* Deprecate LabelEncoder in XGBClassifier; skip LabelEncoder for cuDF/cuPy inputs

* Add unit tests for cuDF and cuPy inputs with XGBClassifier

* Fix lint

* Clarify warning

* Move use_label_encoder option to XGBClassifier constructor

* Add a test for cudf.Series

* Add use_label_encoder to XGBRFClassifier doc

* Address reviewer feedback
2020-10-26 13:20:51 -07:00
Jiaming Yuan
81c37c28d5
Time the CPU tests on Jenkins. (#6257)
* Time the CPU tests on Jenkins.
* Reduce thread contention.
* Add doc.
* Skip heavy tests on ARM.
2020-10-20 17:19:07 -07:00
Jiaming Yuan
52452bebb9
Fix cls typo. (#6247) 2020-10-16 16:40:44 +08:00
Jiaming Yuan
a144daf034
Limit tree depth for GPU hist. (#6045) 2020-08-22 19:34:52 +08:00
Jiaming Yuan
7be2e04bd4
Fix scikit learn cls doc. (#6041) 2020-08-20 19:23:06 -07:00
Jiaming Yuan
4d99c58a5f
Feature weights (#5962) 2020-08-18 19:55:41 +08:00
Jiaming Yuan
1149a7a292
Fix sklearn doc. (#5980) 2020-08-05 12:26:19 +08:00
Jiaming Yuan
f5fdcbe194
Disable feature validation on sklearn predict prob. (#5953)
* Fix issue when scikit learn interface receives transformed inputs.
2020-07-29 19:26:44 +08:00
Alex
ae18a094b0
Add new skl model attribute for number of features (#5780) 2020-06-15 18:01:59 +08:00
Jiaming Yuan
f145241593
Let XGBoostError inherit ValueError. (#5696) 2020-05-26 08:34:56 +08:00
Jiaming Yuan
5af8161a1a
Implement Python data handler. (#5689)
* Define data handlers for DMatrix.
* Throw ValueError in scikit learn interface.
2020-05-22 11:53:55 +08:00
Jiaming Yuan
f27b6f9ba6
Update document. (#5572) 2020-04-22 02:37:37 +08:00