236 Commits

Author SHA1 Message Date
Chengyang
806c92c80b
Add Type Hints for Python Package (#7742)
Co-authored-by: Chengyang Gu <bridgream@gmail.com>
Co-authored-by: Jiamingy <jm.yuan@outlook.com>
2022-05-17 22:14:09 +08:00
Jiaming Yuan
c70fa502a5
Expose feature_types to sklearn interface. (#7821) 2022-04-21 20:23:35 +08:00
Jiaming Yuan
52d4eda786
Deprecate use_label_encoder in XGBClassifier. (#7822)
* Deprecate `use_label_encoder` in XGBClassifier.

* We have removed the encoder; now prepare to remove the indicator.
2022-04-21 13:14:02 +08:00
Chengyang
c92ab2ce49
Add type hints to core.py (#7707)
Co-authored-by: Chengyang Gu <bridgream@gmail.com>
Co-authored-by: jiamingy <jm.yuan@outlook.com>
2022-03-23 21:12:14 +08:00
Jiaming Yuan
83a66b4994
Support categorical data for hist. (#7695)
* Extract partitioner from hist.
* Implement categorical data support by passing the gradient index directly into the partitioner.
* Organize/update document.
* Remove code for negative hessian.
2022-02-25 03:47:14 +08:00
Jiaming Yuan
c859764d29
[doc] Clarify that states in callbacks are mutated. (#7685)
* Fix copy for cv.  This prevents inserting default callbacks into the input list.
* Clarify the behavior of callbacks in training/cv.
* Fix typos in doc.
2022-02-22 11:45:00 +08:00
Jiaming Yuan
0d0abe1845
Support optimal partitioning for GPU hist. (#7652)
* Implement `MaxCategory` in quantile.
* Implement partition-based split for GPU evaluation.  Currently, it's based on the existing evaluation function.
* Extract an evaluator from GPU Hist to store the needed states.
* Added some CUDA stream/event utilities.
* Update document with references.
* Fixed a bug in approx evaluator where the number of data points is less than the number of categories.
2022-02-15 03:03:12 +08:00
Jiaming Yuan
926af9951e
Add missing train parameter for sklearn interface. (#7629)
Some other parameters are still missing and rely on **kwargs, for instance parameters from
dart.
2022-02-08 13:20:19 +08:00
Philip Hyunsu Cho
b4340abf56
Add special handling for multi:softmax in sklearn predict (#7607)
* Add special handling for multi:softmax in sklearn predict

* Add test coverage
2022-01-29 15:54:49 -08:00
Jiaming Yuan
f84291c1e1
Fix max_cat_to_onehot doc annotation [skip ci] (#7592) 2022-01-23 16:33:23 +08:00
Jiaming Yuan
b4ec1682c6
Update document for multi output and categorical. (#7574)
* Group together categorical related parameters.
* Update documents about multioutput and categorical.
2022-01-19 04:35:17 +08:00
Jiaming Yuan
001503186c
Rewrite approx (#7214)
This PR rewrites the approx tree method to use codebase from hist for better performance and code sharing.

The rewrite has many benefits:
- Support for both `max_leaves` and `max_depth`.
- Support for `grow_policy`.
- Support for monotonic constraints.
- Support for feature weights.
- Support for easier bin configuration (`max_bin`).
- Support for categorical data.
- Faster performance on most datasets (many times faster).
- Support for prediction cache.
- Significantly better performance for external memory.
- Unites the code base between approx and hist.
2022-01-10 21:15:05 +08:00
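As a rough illustration of the options listed in #7214, the parameter dictionary below combines several of them for the rewritten approx tree method. This is a minimal sketch: the parameter names come from the commit message, the values are arbitrary assumptions chosen for illustration, and the dictionary is not taken from the commit itself.

```python
# Hypothetical parameter set for the rewritten `approx` tree method.
# Names follow the commit message; values are illustrative assumptions only.
approx_params = {
    "tree_method": "approx",
    "grow_policy": "lossguide",  # grow the tree leaf by leaf
    "max_leaves": 31,            # cap on the number of leaves
    "max_depth": 6,              # may be combined with max_leaves
    "max_bin": 64,               # number of histogram bins per feature
}

for name, value in approx_params.items():
    print(f"{name} = {value}")
```

A dictionary like this would typically be passed as the first argument to `xgboost.train` together with a `DMatrix`.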
Jiaming Yuan
54582f641a
[doc] Use cross references in sphinx doc. (#7522)
* Use cross references instead of URL.
* Fix auto doc for callback.
2022-01-05 03:21:25 +08:00
Jiaming Yuan
eb1efb54b5
Define feature_names_in_. (#7526)
* Define `feature_names_in_`.
* Raise attribute error if it's not defined.
2022-01-05 01:35:34 +08:00
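The behaviour described in #7526 (the attribute exists only when feature names are known, and accessing it otherwise raises `AttributeError`) can be sketched with a property. This is a minimal sketch, not the code from the commit; `SketchEstimator` and its `fit` signature are hypothetical.

```python
class SketchEstimator:
    """Hypothetical stand-in for a scikit-learn style estimator."""

    def __init__(self):
        self._feature_names = None

    def fit(self, columns):
        # Assumption: `columns` is an iterable of feature names seen during fit.
        self._feature_names = list(columns)
        return self

    @property
    def feature_names_in_(self):
        # Raising AttributeError makes hasattr() return False before fit,
        # which is how callers usually probe for this attribute.
        if self._feature_names is None:
            raise AttributeError(
                "`feature_names_in_` is defined only when the estimator "
                "was fitted with feature names."
            )
        return self._feature_names


unfitted = SketchEstimator()
fitted = SketchEstimator().fit(["age", "income"])
print(hasattr(unfitted, "feature_names_in_"))
print(fitted.feature_names_in_)
```

Using a property keeps `hasattr` checks meaningful for callers that only want to know whether names are available.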
Jiaming Yuan
8f0a42a266
Initial support for multi-label classification. (#7521)
* Add support in sklearn classifier.
2022-01-04 23:58:21 +08:00
Jiaming Yuan
6f8a4633b7
Fix Python typehint with upgraded mypy. (#7513) 2021-12-16 23:08:08 +08:00
Kian Meng Ang
d27a11ff87
Fix typos in python package (#7432) 2021-11-14 17:20:19 +08:00
Jiaming Yuan
97d7582457
Delay breaking changes to 1.6. (#7420)
The patch is too big to be backported.
2021-11-12 16:46:03 +08:00
Jiaming Yuan
154b15060e
Move callbacks from fit to __init__. (#7375) 2021-11-02 17:51:42 +08:00
Jiaming Yuan
c6769488b3
Typehint for subset of core API. (#7348) 2021-10-28 20:47:04 +08:00
Jiaming Yuan
45aef75cca
Move skl eval_metric and early_stopping rounds to model params. (#6751)
A new parameter `custom_metric` is added to `train` and `cv` to distinguish the behaviour from the old `feval`.  And `feval` is deprecated.  The new `custom_metric` receives transformed prediction when the built-in objective is used.  This enables XGBoost to use cost functions from other libraries like scikit-learn directly without going through the definition of the link function.

`eval_metric` and `early_stopping_rounds` in sklearn interface are moved from `fit` to `__init__` and are now saved as part of the scikit-learn model.  The old ones in `fit` function are now deprecated. The new `eval_metric` in `__init__` has the same new behaviour as `custom_metric`.

Added more detailed documents for the behaviour of custom objective and metric.
2021-10-28 17:20:20 +08:00
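The difference between the deprecated `feval` and the new `custom_metric` described in #6751 can be sketched without XGBoost itself. The example assumes a binary logistic objective, whose link function is the sigmoid, and assumes that the old `feval` is handed raw margins, as the commit message implies. The names `feval_style_metric`, `custom_metric_style`, and `log_loss` are hypothetical and not part of the library.

```python
import math


def sigmoid(margin):
    """Inverse link for a binary logistic objective."""
    return 1.0 / (1.0 + math.exp(-margin))


def log_loss(labels, probs):
    """Generic metric that expects probabilities, as most libraries do."""
    eps = 1e-15
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def feval_style_metric(labels, margins):
    # Old style (assumed): raw margins arrive, so the metric must
    # apply the link function itself before scoring.
    return log_loss(labels, [sigmoid(m) for m in margins])


def custom_metric_style(labels, probs):
    # New style: predictions are already transformed, so a generic
    # metric can be used directly.
    return log_loss(labels, probs)


labels = [1, 0, 1]
margins = [2.0, -1.0, 0.5]
probs = [sigmoid(m) for m in margins]
print(feval_style_metric(labels, margins), custom_metric_style(labels, probs))
```

Both functions produce the same score here; the point is only where the sigmoid is applied.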
Jiaming Yuan
3c4aa9b2ea
[breaking] Remove label encoder deprecated in 1.3. (#7357) 2021-10-28 13:24:29 +08:00
Jiaming Yuan
f56e2e9a66
Support categorical data with pandas Dataframe in inplace prediction (#7322) 2021-10-17 14:32:06 +08:00
Jiaming Yuan
5b17bb0031
Fix prediction with cat data in sklearn interface. (#7306)
* Specify DMatrix parameter for pre-processing dataframe.
* Add document about the behaviour of prediction.
2021-10-12 14:31:12 +08:00
Jiaming Yuan
e48e05e6e2
Add typehint to rabit module. (#7240) 2021-09-17 18:31:02 +08:00
Jiaming Yuan
c735c17f33
Disable callback and ES on random forest. (#7236) 2021-09-17 18:21:17 +08:00
Jiaming Yuan
b18f5f61b0
Fix pylint (#7241) 2021-09-17 11:50:36 +08:00
Jiaming Yuan
0ed979b096
Support more input types for categorical data. (#7220)
* Support more input types for categorical data.

* Shorten the type name from "categorical" to "c".
* Tests for np/cp array and scipy csr/csc/coo.
* Specify the type for feature info.
2021-09-16 20:39:30 +08:00
Jiaming Yuan
037dd0820d
Implement __sklearn_is_fitted__. (#7230) 2021-09-15 19:09:04 +08:00
Jiaming Yuan
ee8d1f5ed8
Fix histogram truncation. (#7181)
* Fix truncation.

* Lint.

* lint.
2021-08-24 18:34:32 -07:00
Jiaming Yuan
3290a4f3ed
Re-enable feature validation in predict proba. (#7177) 2021-08-22 15:28:08 +08:00
Jiaming Yuan
3f38d983a6
Fix prediction configuration. (#7159)
After the predictor parameter was added to the constructor, this configuration was broken.
2021-08-11 16:34:36 +08:00
Jiaming Yuan
8a84be37b8
Pass scikit learn estimator checks for regressor. (#7130)
* Check data shape.
* Check labels.
2021-08-03 18:58:20 +08:00
graue70
dfdf0b08fc
Fix typo and grammatical mistake in error message (#7134) 2021-07-28 17:17:05 +08:00
Jiaming Yuan
663136aa08
Implement feature score for linear model. (#7048)
* Add feature score support for linear model.
* Port R interface to the new implementation.
* Add linear model support in Python.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-06-25 14:34:02 +08:00
Jiaming Yuan
1d4d345634
Tests for dask skl categorical data support. (#7054) 2021-06-24 16:33:57 +08:00
Jiaming Yuan
c4b9f4f622
Add enable_categorical to sklearn. (#7011) 2021-06-04 02:29:14 +08:00
Jiaming Yuan
816b789bf0
Add predictor to skl constructor. (#7000) 2021-05-29 04:52:56 +08:00
Jiaming Yuan
86e60e3ba8
Guard against index error in prediction. (#6982)
* Remove `best_ntree_limit` from documents.
2021-05-25 23:24:59 +08:00
Daniel Saxton
e41619b1fc
Link to valid tree_method values in docs (#6935) 2021-05-06 17:33:18 +08:00
Jiaming Yuan
a5d7094a45
Update documents. (#6856)
* Add early stopping section to prediction doc.
* Remove best_ntree_limit.
* Better doxygen output.
2021-04-16 12:41:03 +08:00
Jiaming Yuan
dee5ef2dfd
Typehint for Sklearn. (#6799) 2021-04-14 06:55:21 +08:00
Jiaming Yuan
47b62480af
More general predict proba. (#6817)
* Use `output_margin` for `softmax`.
* Add test for dask binary cls.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-04-01 19:52:12 +08:00
Jiaming Yuan
a5c852660b
Update document for sklearn model IO. (#6809)
* Update the use of JSON.
* Remove unnecessary type cast.
2021-04-01 15:52:36 +08:00
Jiaming Yuan
10ae0f9511
Fix doc for apply method. (#6796) 2021-03-31 15:28:31 +08:00
Jiaming Yuan
9da2287ab8
[breaking] Save booster feature info in JSON, remove feature name generation. (#6605)
* Save feature info in booster in JSON model.
* [breaking] Remove automatic feature name generation in `DMatrix`.

This PR is to enable reliable feature validation in Python package.
2021-02-25 18:54:16 +08:00
Jiaming Yuan
c375173dca
Support pylint 2.7.0 (#6726) 2021-02-25 12:49:58 +08:00
Jiaming Yuan
872e559b91
Use inplace predict for sklearn. (#6718)
* Use inplace predict for sklearn when possible.
2021-02-22 12:27:04 +08:00
Benjamin Lehmann
25077564ab
Fixes small typo in sklearn documentation (#6717)
Replaces "dowm" with "down" on parameter n_jobs
2021-02-20 07:36:06 +08:00
Jiaming Yuan
e8c5c53e2f
Use Predictor for dart. (#6693)
* Use normal predictor for dart booster.
* Implement `inplace_predict` for dart.
* Enable `dart` for dask interface now that it's thread-safe.
* Categorical data should now work out of the box for dart.

The implementation is not very efficient, as it has to pull back the data and
apply the weight for each tree, but it is still a significant improvement over the
previous implementation because we no longer binary search for each sample.

* Fix output prediction shape on dataframe.
2021-02-09 23:30:19 +08:00