506 Commits

Author SHA1 Message Date
Jiaming Yuan
c709f2aaaf
Fix evaluation result for XGBRanker. (#6594)
* Remove duplicated code, which fixes the typo `evals_result` -> `evals_result_`.
2021-01-12 09:36:41 +08:00
Jiaming Yuan
80065d571e
[dask] Add DaskXGBRanker (#6576)
* Initial support for distributed LTR using dask.

* Support `qid` in libxgboost.
* Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]`
  to avoid duplicated code.
* Define `DaskXGBRanker`.

The dask ranker doesn't support the group structure; instead, it takes query IDs and
converts them to a group pointer internally.
2021-01-08 18:35:09 +08:00
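The conversion from query IDs to a group pointer mentioned above can be pictured as computing CSR-style boundaries: one offset where each query's rows start, plus a final offset equal to the row count. A minimal sketch in plain Python, assuming rows are already sorted so that each query's rows are contiguous; `qid_to_group_ptr` is a hypothetical helper, not an XGBoost function.

```python
from typing import List, Sequence


def qid_to_group_ptr(qid: Sequence[int]) -> List[int]:
    """Convert a per-row query ID column into group boundaries.

    Assumes rows with the same query ID are contiguous (e.g. sorted by qid).
    Group i spans rows ptr[i] up to, but not including, ptr[i + 1].
    """
    if len(qid) == 0:
        return [0]
    ptr = [0]
    for i in range(1, len(qid)):
        if qid[i] != qid[i - 1]:
            # A new query starts at row i.
            ptr.append(i)
    ptr.append(len(qid))
    return ptr


print(qid_to_group_ptr([3, 3, 3, 7, 7, 9]))  # [0, 3, 5, 6]
```

Unsorted query IDs would need a sort by `qid` first. Otherwise rows from one query would be split across several groups.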
Jiaming Yuan
7c9dcbedbc
Fix best_ntree_limit for dart and gblinear. (#6579) 2021-01-08 10:05:39 +08:00
Jiaming Yuan
f5ff90cd87
Support _estimator_type. (#6582)
* Use `_estimator_type`.

For more info, see: https://scikit-learn.org/stable/developers/develop.html#estimator-types

* A model trained with dask can be loaded by the single-node sklearn interface.
2021-01-08 10:01:16 +08:00
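Setting `_estimator_type` lets scikit-learn's duck typing recognize a class as a classifier or regressor without inheriting from a scikit-learn base class. At the time, scikit-learn's `is_classifier` checked this attribute. Newer scikit-learn releases rely on estimator tags, so treat the following as a sketch of the idea. All class and function names below are hypothetical.

```python
class SketchClassifier:
    # Declares the estimator kind the way scikit-learn's duck typing expects.
    _estimator_type = "classifier"


class SketchRegressor:
    _estimator_type = "regressor"


def is_classifier_sketch(estimator) -> bool:
    # Attribute check only: no inheritance from a base class is required.
    return getattr(estimator, "_estimator_type", None) == "classifier"


print(is_classifier_sketch(SketchClassifier()))  # True
print(is_classifier_sketch(SketchRegressor()))   # False
```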
Jiaming Yuan
60cfd14349
[dask, sklearn] Fix predict proba. (#6566)
* For sklearn:
  - Handles user defined objective function.
  - Handles `softmax`.

* For dask:
  - Use the implementation from sklearn; the previous implementation didn't perform any extra handling.
2021-01-05 08:29:06 +08:00
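scikit-learn's `predict_proba` convention is one column per class. For a binary logistic objective, the booster returns only the positive-class probability, so the second column has to be derived.

A minimal sketch with NumPy follows. `to_predict_proba` is a hypothetical helper that assumes its input already holds probabilities. A custom objective whose predictions are raw margins would first need a transform such as the sigmoid.

```python
import numpy as np


def to_predict_proba(pred: np.ndarray, n_classes: int) -> np.ndarray:
    """Shape booster output into a (n_samples, n_classes) probability array.

    Hypothetical helper. Assumes `pred` is either a 2-D array of per-class
    probabilities or a 1-D array of positive-class probabilities.
    """
    if pred.ndim == 2 and pred.shape[1] == n_classes:
        # Multi-class probabilities already have one column per class.
        return pred
    # Binary case: column 0 is P(class 0), column 1 is P(class 1).
    positive = pred.reshape(-1)
    return np.vstack((1.0 - positive, positive)).T


binary = to_predict_proba(np.array([0.2, 0.9]), n_classes=2)
print(binary.shape)  # (2, 2)
```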
Jiaming Yuan
516a93d25c
Fix best_ntree_limit. (#6569) 2021-01-03 05:58:54 +08:00
James Lamb
195a41cef1
[python-package] remove unnecessary files to reduce sdist size (fixes #6560) (#6565) 2021-01-02 15:56:39 +08:00
Philip Hyunsu Cho
fa13992264
Calling XGBModel.fit() should clear the Booster by default (#6562)
* Calling XGBModel.fit() should clear the Booster by default

* Document the behavior of fit()

* Allow sklearn object to be passed in directly via xgb_model argument

* Fix lint
2020-12-31 11:02:08 -08:00
Jiaming Yuan
de8fd852a5
[dask] Add type hints. (#6519)
* Add validate_features.
* Show type hints in doc.

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-12-29 19:41:02 +08:00
Jiaming Yuan
610ee632cc
[Breaking] Rename data to X in predict_proba. (#6555)
Newer Scikit-Learn versions pass this argument by keyword, and `X` is the predefined
keyword.

* Use pip to install latest Python graphviz on Windows CI.
2020-12-28 21:36:03 +08:00
Philip Hyunsu Cho
fbb980d9d3
Expand ~ into the home directory on Linux and MacOS (#6531) 2020-12-19 23:35:13 -08:00
Philip Hyunsu Cho
380f6f4ab8
Remove cupy.array_equal, since it's not compatible with cuPy 7.8 (#6528) 2020-12-18 09:16:52 -08:00
Jiaming Yuan
ca3da55de4
Support early stopping with training continuation, correct the number of boosted rounds. (#6506)
* Implement early stopping with training continuation.

* Add a new C API for obtaining the number of boosted rounds.

* Fix an off-by-one error in `save_best`.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-12-17 19:59:19 +08:00
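This commit includes an off-by-one fix in `save_best`. A common source of this kind of bug is that `best_iteration` is 0-based, so keeping the best model means keeping `best_iteration + 1` rounds.

The function below is a hypothetical sketch of patience-based early stopping on a lower-is-better metric. It does not reproduce XGBoost's callback or its handling of training continuation.

```python
from typing import List, Tuple


def early_stop(scores: List[float], rounds: int) -> Tuple[int, int]:
    """Return (best_iteration, rounds_to_keep) for a lower-is-better metric.

    `scores[i]` is the validation score after boosting round i (0-based).
    Training stops once `rounds` consecutive rounds fail to beat the best score.
    """
    best_score = float("inf")
    best_iteration = 0
    for i, score in enumerate(scores):
        if score < best_score:
            best_score, best_iteration = score, i
        elif i - best_iteration >= rounds:
            break
    # best_iteration is 0-based, so the best model has best_iteration + 1
    # rounds; keeping only best_iteration rounds would drop the best one.
    return best_iteration, best_iteration + 1


best, keep = early_stop([0.50, 0.40, 0.35, 0.36, 0.37, 0.38], rounds=2)
print(best, keep)  # 2 3
```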
Philip Hyunsu Cho
125b3c0f2d
Lazy import cuDF and Dask (#6522)
* Lazy import cuDF

* Lazy import Dask

* Fix lint

Co-authored-by: PSEUDOTENSOR / Jonathan McKinney <pseudotensor@gmail.com>
2020-12-17 01:51:35 -08:00
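Lazy importing keeps the package import cheap and working when optional dependencies such as cuDF or Dask are missing, because the import happens inside the code path that needs it. A minimal sketch of the pattern follows. `lazy_import` and `describe` are hypothetical helpers, and the real package may organize this differently.

```python
import importlib
from types import ModuleType
from typing import Optional


def lazy_import(name: str) -> Optional[ModuleType]:
    """Import an optional dependency on demand; return None if it is absent."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def describe(data) -> str:
    # cuDF is only imported when this function runs, not at package import.
    cudf = lazy_import("cudf")
    if cudf is not None and isinstance(data, cudf.DataFrame):
        return "cudf.DataFrame"
    return type(data).__name__


print(describe([1, 2, 3]))  # list
```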
Jiaming Yuan
d8d684538c
[CI] Split up main.yml, add mypy. (#6515) 2020-12-17 00:15:44 +08:00
Jiaming Yuan
0e97d97d50
Fix merge conflict. (#6512) 2020-12-16 18:02:25 +08:00
Jiaming Yuan
347f593169
Accept numpy array for DMatrix slice index. (#6368) 2020-12-16 14:42:52 +08:00
Jiaming Yuan
ef4a0e0aac
Fix DMatrix feature names/types IO. (#6507)
* Fix feature names/types IO

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-12-16 14:24:27 +08:00
Jiaming Yuan
3c3f026ec1
Move metric configuration into booster. (#6504) 2020-12-16 05:35:04 +08:00
Jiaming Yuan
d45c0d843b
Show partition status in dask error. (#6366) 2020-12-16 02:58:21 +08:00
ShvetsKS
8139849ab6
Fix handling of print period in EvaluationMonitor (#6499)
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2020-12-15 19:20:19 +08:00
Jiaming Yuan
a30461cf87
[dask] Support all parameters in regressor and classifier. (#6471)
* Add eval_metric.
* Add callback.
* Add feature weights.
* Add custom objective.
2020-12-14 07:35:56 +08:00
Philip Hyunsu Cho
0d483cb7c1
Bump version to 1.4.0 snapshot in master (#6486) 2020-12-10 07:38:08 -08:00
Jiaming Yuan
0ffaf0f5be
Fix dask ip resolution. (#6475)
This adopts the solution used in dask/dask-xgboost#40, which employs `get_host_ip` from the dmlc-core tracker.
2020-12-07 16:36:23 -08:00
Jiaming Yuan
47b86180f6
Don't validate features when the number of rows is 0. (#6472) 2020-12-07 18:08:51 +08:00
Jiaming Yuan
703c2d06aa
Fix global config default value. (#6470) 2020-12-06 06:15:33 +08:00
Jiaming Yuan
d6386e45e8
Fix filtering callable objects in skl xgb param. (#6466)
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-12-05 17:20:36 +08:00
Philip Hyunsu Cho
c103ec51d8
Enforce row-major order in cuPy array (#6459) 2020-12-03 18:29:10 -08:00
Philip Hyunsu Cho
4f70e14031
Fix docstring of config.py to use correct versionadded (#6458) 2020-12-03 10:41:53 -08:00
Philip Hyunsu Cho
fb56da5e8b
Add global configuration (#6414)
* Add management functions for global configuration: XGBSetGlobalConfig(), XGBGetGlobalConfig().
* Add Python interface: set_config(), get_config(), and config_context().
* Add unit tests for Python
* Add R interface: xgb.set.config(), xgb.get.config()
* Add unit tests for R

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2020-12-03 00:05:18 -08:00
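The Python interface added here exposes `set_config()`, `get_config()` and `config_context()`. The sketch below shows the save-and-restore pattern such a context manager typically follows.

It uses a hypothetical in-process dict, `_GLOBAL_CONFIG`, rather than the native `XGBSetGlobalConfig()`/`XGBGetGlobalConfig()` calls, and models only a `verbosity` key.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator

# Hypothetical process-wide configuration store.
_GLOBAL_CONFIG: Dict[str, Any] = {"verbosity": 1}


def set_config(**new_config: Any) -> None:
    unknown = set(new_config) - set(_GLOBAL_CONFIG)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    _GLOBAL_CONFIG.update(new_config)


def get_config() -> Dict[str, Any]:
    # Return a copy so callers cannot mutate the store by accident.
    return dict(_GLOBAL_CONFIG)


@contextmanager
def config_context(**new_config: Any) -> Iterator[None]:
    old_config = get_config()
    set_config(**new_config)
    try:
        yield
    finally:
        # Restore the previous values even if the body raises.
        set_config(**old_config)


with config_context(verbosity=0):
    print(get_config()["verbosity"])  # 0
print(get_config()["verbosity"])      # 1
```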
Jiaming Yuan
927c316aeb
Fix period in evaluation monitor. (#6441) 2020-11-29 03:18:33 +08:00
Jiaming Yuan
2ce2a1a4d8
[SKL] Propagate parameters to booster during set_param. (#6416) 2020-11-20 20:37:35 +08:00
Jiaming Yuan
a7b42adb74
Fix dask predict (#6412) 2020-11-20 10:10:52 +08:00
Jiaming Yuan
3ac173fc8b
Fix typo. (#6399) 2020-11-16 16:59:12 -08:00
Nikhil Choudhary
ae1662028a
Fixed a few grammatical mistakes in doc (#6393) 2020-11-15 13:48:08 +08:00
Jiaming Yuan
fcd6fad822
[dask] Small cleanup. (#6391) 2020-11-14 22:15:05 +08:00
Jiaming Yuan
4ccf92ea34
[dask] Fix union of workers. (#6375) 2020-11-13 16:55:05 +08:00
Jiaming Yuan
fcfeb4959c
Deprecate positional arguments. (#6365)
Deprecate positional arguments in the following functions:

- `__init__` for all classes in sklearn module.
- `fit` method for all classes in sklearn module.
- dask interface.
- `set_info` for `DMatrix` class.

Refactor the handling of evaluation matrices.
2020-11-13 11:10:30 +08:00
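A deprecation like this is usually implemented with a decorator. It accepts keyword-only parameters passed positionally, emits a warning, and forwards the values by name. scikit-learn used a similar decorator at the time.

The sketch below is simplified. `deprecate_positional_args` and the `fit` signature are hypothetical and do not reproduce XGBoost's code.

```python
import functools
import inspect
import warnings
from typing import Any, Callable


def deprecate_positional_args(func: Callable) -> Callable:
    """Warn, instead of failing, when keyword-only parameters arrive positionally."""
    params = inspect.signature(func).parameters.values()
    positional = [p.name for p in params
                  if p.kind == inspect.Parameter.POSITIONAL_OR_KEYWORD]
    keyword_only = [p.name for p in params
                    if p.kind == inspect.Parameter.KEYWORD_ONLY]

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        extra = len(args) - len(positional)
        if 0 < extra <= len(keyword_only):
            names = keyword_only[:extra]
            warnings.warn(
                f"Pass {names} as keyword arguments; passing them "
                "positionally is deprecated.",
                FutureWarning,
            )
            # Move the surplus positional values onto their keyword-only names.
            for name, value in zip(names, args[len(positional):]):
                if name in kwargs:
                    raise TypeError(f"got multiple values for argument {name!r}")
                kwargs[name] = value
            args = args[: len(positional)]
        # Any other mismatch is left to Python's normal argument checking.
        return func(*args, **kwargs)

    return wrapper


@deprecate_positional_args
def fit(X, y, *, sample_weight=None, verbose=True):  # hypothetical signature
    return sample_weight, verbose


fit([[1.0]], [0], [0.5])  # warns, then runs as fit(..., sample_weight=[0.5])
```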
Jiaming Yuan
c90f968d92
Update Python documents. (#6376) 2020-11-12 17:51:32 +08:00
Jiaming Yuan
6e12c2a6f8
[dask] Support running on GKE. (#6343)
* Avoid accessing `scheduler_info()['workers']`.
* Avoid calling `client.gather` inside task.
* Avoid using `client.scheduler_address`.
2020-11-11 18:04:34 +08:00
Jiaming Yuan
e65e3cf36e
Support shared library in system path. (#6362) 2020-11-10 16:04:25 +08:00
Jiaming Yuan
184e2eac7d
Add period to evaluation monitor. (#6348) 2020-11-10 07:47:48 +08:00
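A `period` on the evaluation monitor means printing the evaluation log only every `period`-th round. A minimal sketch of such a printer follows. `PeriodicPrinter` is a hypothetical stand-in, not XGBoost's `EvaluationMonitor`. Reporting the final skipped message when training ends is an assumption about the desired behavior.

```python
from typing import List, Optional


class PeriodicPrinter:
    """Print evaluation messages every `period` rounds, plus the last one."""

    def __init__(self, period: int = 1) -> None:
        if period < 1:
            raise ValueError("period must be positive")
        self.period = period
        self._latest: Optional[str] = None
        self.printed: List[str] = []

    def after_iteration(self, epoch: int, msg: str) -> None:
        if epoch % self.period == 0:
            self._emit(msg)
            self._latest = None
        else:
            # Remember the skipped message so the final round is not lost.
            self._latest = msg

    def after_training(self) -> None:
        if self._latest is not None:
            self._emit(self._latest)

    def _emit(self, msg: str) -> None:
        print(msg)
        self.printed.append(msg)


printer = PeriodicPrinter(period=3)
for epoch in range(5):
    printer.after_iteration(epoch, f"[{epoch}] valid-rmse:{1.0 / (epoch + 1):.3f}")
printer.after_training()  # rounds 0 and 3 were printed; this prints round 4
```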
Jiaming Yuan
2cc9662005
Support slicing tree model (#6302)
This PR is meant to end the confusion around `best_ntree_limit` and unify model slicing. With multi-class models and random forests, asking users to understand how to set `ntree_limit` is difficult and error-prone.

* Implement the save_best option in early stopping.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-11-02 23:27:39 -08:00
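Setting `ntree_limit` by hand is error-prone because a tree count depends on how many trees each boosting round adds. A slice over rounds does not have this problem.

The sketch below assumes each round appends `num_class * num_parallel_tree` trees in order. `slice_rounds` is a hypothetical helper that works on a plain list; it is not XGBoost's model slicing.

```python
from typing import List, Sequence


def slice_rounds(trees: Sequence[str], begin: int, end: int,
                 num_class: int = 1, num_parallel_tree: int = 1) -> List[str]:
    """Return the trees produced by boosting rounds [begin, end).

    Assumes every round appends num_class * num_parallel_tree trees,
    stored contiguously round by round.
    """
    per_round = num_class * num_parallel_tree
    return list(trees[begin * per_round:end * per_round])


# 3 rounds of a 2-class model with 2 parallel trees per class: 12 trees total.
trees = [f"tree{i}" for i in range(12)]
first_two_rounds = slice_rounds(trees, 0, 2, num_class=2, num_parallel_tree=2)
print(len(first_two_rounds))  # 8
```

A caller slicing by rounds never multiplies by `num_class` or `num_parallel_tree` itself, which is the mistake a raw tree limit invites.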
Rory Mitchell
29745c6df2
Fix inclusive scan for large sizes (#6234) 2020-11-03 17:01:43 +13:00
Jiaming Yuan
7756192906
[dask] Fix prediction on DaskDMatrix with multiple metadata. (#6333)
* Unify the meta handling methods.
2020-11-02 19:18:44 -05:00
Jiaming Yuan
6ff331b705
Fix Python callback. (#6320) 2020-10-30 05:03:44 +08:00
Jiaming Yuan
74ea82209b
Lazy import dask libraries. (#6309)
* Lazy import dask libraries.

* Lint && fix.

* Use short name.
2020-10-28 15:50:11 -07:00
Jiaming Yuan
e8884c4637
Document tree method for feature weights. (#6312) 2020-10-28 13:42:13 -07:00
Jiaming Yuan
b180223d18
Cleanup RABIT. (#6290)
* Remove recovery and MPI speed tests.
* Remove readme.
* Remove Python binding.
* Add checks in C API.
2020-10-27 08:48:22 +08:00
Philip Hyunsu Cho
c8ec62103a
Deprecate LabelEncoder in XGBClassifier; Enable cuDF/cuPy inputs in XGBClassifier (#6269)
* Deprecate LabelEncoder in XGBClassifier; skip LabelEncoder for cuDF/cuPy inputs

* Add unit tests for cuDF and cuPy inputs with XGBClassifier

* Fix lint

* Clarify warning

* Move use_label_encoder option to XGBClassifier constructor

* Add a test for cudf.Series

* Add use_label_encoder to XGBRFClassifier doc

* Address reviewer feedback
2020-10-26 13:20:51 -07:00
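With the built-in `LabelEncoder` deprecated, callers who pass `use_label_encoder=False` are expected to supply class labels already encoded as integers from 0 to `n_classes - 1`. One way to produce them is sketched below with NumPy's `np.unique(..., return_inverse=True)`. The label values are illustrative.

```python
import numpy as np

labels = np.array(["cat", "dog", "cat", "bird"])
# np.unique sorts the distinct labels and, for each input, returns the index
# of its label in that sorted array: integers in [0, n_classes).
classes, encoded = np.unique(labels, return_inverse=True)
print(classes)  # ['bird' 'cat' 'dog']
print(encoded)  # [1 2 1 0]
# Map integer class predictions back to the original labels.
decoded = classes[encoded]
```

Keeping `classes` around lets integer predictions be mapped back to the original labels.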