236 Commits

Author SHA1 Message Date
Jiaming Yuan
4656b09d5d
[breaking] Add prediction fucntion for DMatrix and use inplace predict for dask. (#6668)
* Add a new API function for predicting on `DMatrix`.  This function aligns
with rest of the `XGBoosterPredictFrom*` functions on semantic of function
arguments.
* Purge `ntree_limit` from libxgboost, use iteration instead.
* [dask] Use `inplace_predict` by default for dask sklearn models.
* [dask] Run prediction shape inference on worker instead of client.

The breaking change is in the Python sklearn `apply` function, I made it to be
consistent with other prediction functions where `best_iteration` is used by
default.
2021-02-08 18:26:32 +08:00
Jiaming Yuan
a4101de678
Fix divide by 0 in feature importance when no split is found. (#6676) 2021-02-05 03:39:30 +08:00
Jiaming Yuan
87ab1ad607
[dask] Accept Future of model for prediction. (#6650)
This PR changes predict and inplace_predict to accept a Future of model, to avoid sending models to workers repeatably.

* Document is updated to reflect functionality additions in recent changes.
2021-02-02 08:45:52 +08:00
Jiaming Yuan
740d042255
Add base_margin for evaluation dataset. (#6591)
* Add base margin to evaluation datasets.
* Unify the code base for evaluation matrices.
2021-01-26 02:11:02 +08:00
Jiaming Yuan
8942c98054
Define metainfo and other parameters for all DMatrix interfaces. (#6601)
This PR ensures all DMatrix types have a common interface.

* Fix logic in avoiding duplicated DMatrix in sklearn.
* Check for consistency between DMatrix types.
* Add doc for bounds.
2021-01-25 16:06:06 +08:00
Jiaming Yuan
89a00a5866
[dask] Random forest estimators (#6602) 2021-01-13 20:59:20 +08:00
Jiaming Yuan
0027220aa0
[breaking] Remove duplicated predict functions, Fix attributes IO. (#6593)
* Fix attributes not being restored.
* Rename all `data` to `X`. [breaking]
2021-01-13 16:56:49 +08:00
Jiaming Yuan
03cd087da1
Remove duplicated DMatrix. (#6592) 2021-01-12 09:36:56 +08:00
Jiaming Yuan
c709f2aaaf
Fix evaluation result for XGBRanker. (#6594)
* Remove duplicated code, which fixes typo `evals_result` -> `evals_result_`.
2021-01-12 09:36:41 +08:00
Jiaming Yuan
80065d571e
[dask] Add DaskXGBRanker (#6576)
* Initial support for distributed LTR using dask.

* Support `qid` in libxgboost.
* Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]`
  to avoid duplicated code.
* Define `DaskXGBRanker`.

The dask ranker doesn't support group structure, instead it uses query id and
convert to group ptr internally.
2021-01-08 18:35:09 +08:00
Jiaming Yuan
f5ff90cd87
Support _estimator_type. (#6582)
* Use `_estimator_type`.

For more info, see: https://scikit-learn.org/stable/developers/develop.html#estimator-types

* Model trained from dask can be loaded by single node skl interface.
2021-01-08 10:01:16 +08:00
Jiaming Yuan
60cfd14349
[dask, sklearn] Fix predict proba. (#6566)
* For sklearn:
  - Handles user defined objective function.
  - Handles `softmax`.

* For dask:
  - Use the implementation from sklearn, the previous implementation doesn't perform any extra handling.
2021-01-05 08:29:06 +08:00
Philip Hyunsu Cho
fa13992264
Calling XGBModel.fit() should clear the Booster by default (#6562)
* Calling XGBModel.fit() should clear the Booster by default

* Document the behavior of fit()

* Allow sklearn object to be passed in directly via xgb_model argument

* Fix lint
2020-12-31 11:02:08 -08:00
Jiaming Yuan
de8fd852a5
[dask] Add type hints. (#6519)
* Add validate_features.
* Show type hints in doc.

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-12-29 19:41:02 +08:00
Jiaming Yuan
610ee632cc
[Breaking] Rename data to X in predict_proba. (#6555)
New Scikit-Learn version uses keyword argument, and `X` is the predefined
keyword.

* Use pip to install latest Python graphviz on Windows CI.
2020-12-28 21:36:03 +08:00
Philip Hyunsu Cho
380f6f4ab8
Remove cupy.array_equal, since it's not compatible with cuPy 7.8 (#6528) 2020-12-18 09:16:52 -08:00
Jiaming Yuan
ca3da55de4
Support early stopping with training continuation, correct num boosted rounds. (#6506)
* Implement early stopping with training continuation.

* Add new C API for obtaining boosted rounds.

* Fix off by 1 in `save_best`.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-12-17 19:59:19 +08:00
Jiaming Yuan
d6386e45e8
Fix filtering callable objects in skl xgb param. (#6466)
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-12-05 17:20:36 +08:00
Jiaming Yuan
2ce2a1a4d8
[SKL] Propagate parameters to booster during set_param. (#6416) 2020-11-20 20:37:35 +08:00
Jiaming Yuan
fcfeb4959c
Deprecate positional arguments. (#6365)
Deprecate positional arguments in following functions:

- `__init__` for all classes in sklearn module.
- `fit` method for all classes in sklearn module.
- dask interface.
- `set_info` for `DMatrix` class.

Refactor the evaluation matrices handling.
2020-11-13 11:10:30 +08:00
Jiaming Yuan
c90f968d92
Update Python documents. (#6376) 2020-11-12 17:51:32 +08:00
Jiaming Yuan
e8884c4637
Document tree method for feature weights. (#6312) 2020-10-28 13:42:13 -07:00
Philip Hyunsu Cho
c8ec62103a
Deprecate LabelEncoder in XGBClassifier; Enable cuDF/cuPy inputs in XGBClassifier (#6269)
* Deprecate LabelEncoder in XGBClassifier; skip LabelEncoder for cuDF/cuPy inputs

* Add unit tests for cuDF and cuPy inputs with XGBClassifier

* Fix lint

* Clarify warning

* Move use_label_encoder option to XGBClassifier constructor

* Add a test for cudf.Series

* Add use_label_encoder to XGBRFClassifier doc

* Address reviewer feedback
2020-10-26 13:20:51 -07:00
Jiaming Yuan
81c37c28d5
Time the CPU tests on Jenkins. (#6257)
* Time the CPU tests on Jenkins.
* Reduce thread contention.
* Add doc.
* Skip heavy tests on ARM.
2020-10-20 17:19:07 -07:00
Jiaming Yuan
52452bebb9
Fix cls typo. (#6247) 2020-10-16 16:40:44 +08:00
Jiaming Yuan
a144daf034
Limit tree depth for GPU hist. (#6045) 2020-08-22 19:34:52 +08:00
Jiaming Yuan
7be2e04bd4
Fix scikit learn cls doc. (#6041) 2020-08-20 19:23:06 -07:00
Jiaming Yuan
4d99c58a5f
Feature weights (#5962) 2020-08-18 19:55:41 +08:00
Jiaming Yuan
1149a7a292
Fix sklearn doc. (#5980) 2020-08-05 12:26:19 +08:00
Jiaming Yuan
f5fdcbe194
Disable feature validation on sklearn predict prob. (#5953)
* Fix issue when scikit learn interface receives transformed inputs.
2020-07-29 19:26:44 +08:00
Alex
ae18a094b0
Add new skl model attribute for number of features (#5780) 2020-06-15 18:01:59 +08:00
Jiaming Yuan
f145241593
Let XGBoostError inherit ValueError. (#5696) 2020-05-26 08:34:56 +08:00
Jiaming Yuan
5af8161a1a
Implement Python data handler. (#5689)
* Define data handlers for DMatrix.
* Throw ValueError in scikit learn interface.
2020-05-22 11:53:55 +08:00
Jiaming Yuan
f27b6f9ba6
Update document. (#5572) 2020-04-22 02:37:37 +08:00
Jiaming Yuan
93df871c8c
Assert matching length of evaluation inputs. (#5540) 2020-04-18 06:52:55 +08:00
Jiaming Yuan
c69a19e2b1
Fix skl nan tag. (#5538) 2020-04-18 06:52:17 +08:00
Jiaming Yuan
dc2950fd90
Fix checking booster. (#5505)
* Use `get_params()` instead of `getattr` intrinsic.
2020-04-10 12:21:21 +08:00
Jiaming Yuan
c218d8ffbf
Enable parameter validation for skl. (#5477) 2020-04-03 10:23:58 +08:00
Darius Kharazi
71a8b8c65a
Fix simple typo: information.c -> information (#5384)
Closes #5383
2020-03-03 08:50:14 +08:00
Philip Hyunsu Cho
7ac7e8778f
Port patches from 1.0.0 branch (#5336)
* Remove f-string, since it's not supported by Python 3.5 (#5330)

* Remove f-string, since it's not supported by Python 3.5

* Add Python 3.5 to CI, to ensure compatibility

* Remove duplicated matplotlib

* Show deprecation notice for Python 3.5

* Fix lint

* Fix lint

* Fix a unit test that mistook MINOR ver for PATCH ver

* Enforce only major version in JSON model schema

* Bump version to 1.1.0-SNAPSHOT
2020-02-21 13:13:21 -08:00
David Díaz Vico
71e7e3b96f
Improved sklearn compatibility (#5255) 2020-02-03 13:30:45 +08:00
Jiaming Yuan
a5cc112eea
Export JSON config in get_params. (#5256) 2020-02-03 12:46:51 +08:00
Jiaming Yuan
472ded549d
Save Scikit-Learn attributes into learner attributes. (#5245)
* Remove the recommendation for pickle.

* Save skl attributes in booster.attr

* Test loading scikit-learn model with native booster.
2020-01-30 16:00:18 +08:00
Jiaming Yuan
40680368cf
Add constraint parameters to Scikit-Learn interface. (#5227)
* Add document for constraints.

* Fix a format error in doc for objective function.
2020-01-25 11:12:02 +08:00
OrdoAbChao
b4f952bd22 [Breaking] Remove Scikit-Learn default parameters (#5130)
* Simplify Scikit-Learn parameter management.

* Copy base class for removing duplicated parameter signatures.
* Set all parameters to None.
* Handle None in set_param.
* Extract the doc.

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2020-01-23 20:25:20 +08:00
Jiaming Yuan
ebc86a3afa
Disable parameter validation for Scikit-Learn interface. (#5167)
* Disable parameter validation for now.

Scikit-Learn passes all parameters down to XGBoost, whether they are used or
not.

* Add option `validate_parameters`.
2020-01-07 11:17:31 +08:00
Jiaming Yuan
0202e04a8e
Add base margin to sklearn interface. (#5151) 2019-12-24 09:43:41 +08:00
Jiaming Yuan
a4f5c86276
Allow using RandomState object from Numpy in sklearn interface. (#5049) 2019-11-19 10:56:39 +08:00
Jiaming Yuan
185e3f1916
Update GPU doc. (#4953) 2019-10-16 05:54:09 -04:00
Peter Badida
a9053aff83 Fix incorrectly displayed Note in the doc (#4943) 2019-10-14 03:45:23 -04:00