730 Commits

Author SHA1 Message Date
Jiaming Yuan
b56d3d5d5c
Fix with latest panda range index. (#7074) 2021-07-03 16:43:52 +08:00
Jiaming Yuan
93f3acdef9
Fix with latest pylint. (#7071) 2021-07-02 21:26:00 +08:00
Jiaming Yuan
a5d222fcdb
Handle categorical split in model histogram and dataframe. (#7065)
* Error on get_split_value_histogram when feature is categorical
* Add a category column to output dataframe
2021-07-02 13:10:36 +08:00
Philip Hyunsu Cho
dd4db347f3
Fix early stopping behavior with MAPE metric (#7061) 2021-06-26 03:02:33 +08:00
Jiaming Yuan
663136aa08
Implement feature score for linear model. (#7048)
* Add feature score support for linear model.
* Port R interface to the new implementation.
* Add linear model support in Python.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-06-25 14:34:02 +08:00
Jiaming Yuan
1d4d345634
Tests for dask skl categorical data support. (#7054) 2021-06-24 16:33:57 +08:00
Jiaming Yuan
da1ad798ca
Convert numpy float to Python float in feat score. (#7047) 2021-06-21 20:58:43 +08:00
Jiaming Yuan
29f8fd6fee
Support categorical split in tree model dump. (#7036) 2021-06-18 16:46:20 +08:00
Jiaming Yuan
86715e4cd4
Support categorical data for dask functional interface and DQM. (#7043)
* Support categorical data for dask functional interface and DQM.

* Implement categorical data support for GPU GK-merge.
* Add support for dask functional interface.
* Add support for DQM.

* Get newer cupy.
2021-06-18 13:06:52 +08:00
Jiaming Yuan
7dd29ffd47
Implement feature score in GBTree. (#7041)
* Categorical data support.
* Eliminate text parsing during feature score computation.
2021-06-18 11:53:16 +08:00
Jiaming Yuan
d9799b09d0
Categorical data support for cuDF. (#7042)
* Add support in DMatrix.
* Add support in DQM, except for iterator.
2021-06-17 13:54:33 +08:00
Jiaming Yuan
b56614e9b8
[R] Use new predict function. (#6819)
* Call new C prediction API.
* Add `strict_shape`.
* Add `iterationrange`.
* Update document.
2021-06-11 13:03:29 +08:00
Jiaming Yuan
c4b9f4f622
Add enable_categorical to sklearn. (#7011) 2021-06-04 02:29:14 +08:00
Jiaming Yuan
ee4f51a631
Support for all primitive types from array. (#7003)
* Change C API name.
* Test for all primitive types from array.
* Add native support for CPU 128 float.
* Convert boolean and float16 in Python.

* Fix dask version for now.
2021-06-01 08:34:48 +08:00
Jiaming Yuan
816b789bf0
Add predictor to skl constructor. (#7000) 2021-05-29 04:52:56 +08:00
Jiaming Yuan
89a49cf30e
Fix dask predict on DaskDMatrix with iteration_range. (#7005) 2021-05-29 04:43:12 +08:00
Jiaming Yuan
4cf95a6041
Support numpy array interface (#6998) 2021-05-27 16:08:22 +08:00
Jiaming Yuan
ab6fd304c4
[Python] Change development release postfix to dev (#6988) 2021-05-27 16:06:51 +08:00
Jiaming Yuan
86e60e3ba8
Guard against index error in prediction. (#6982)
* Remove `best_ntree_limit` from documents.
2021-05-25 23:24:59 +08:00
Mads R. B. Kristensen
81bdfb835d
lazy_isinstance(): use .__class__ for type check (#6974) 2021-05-21 11:33:08 +08:00
Jiaming Yuan
7e846bb965
Fix prediction on df with latest dask. (#6969) 2021-05-19 12:23:03 +08:00
Jiaming Yuan
d245bc891e
Add tolerance to early stopping. (#6942) 2021-05-14 00:19:51 +08:00
Jiaming Yuan
05ac415780
[dask] Set dataframe index in predict. (#6944) 2021-05-12 13:24:21 +08:00
vslaykovsky
2a9979e256
Fixed incorrect feature mismatch error message (#6949)
data.shape[0] denotes the number of samples, data.shape[1] is the number of features
2021-05-11 13:52:11 +08:00
Daniel Saxton
e41619b1fc
Link to valid tree_method values in docs (#6935) 2021-05-06 17:33:18 +08:00
Jose Manuel Llorens
4ddbaeea32
Improve warning when using np.ndarray subsets (#6934) 2021-05-04 13:24:41 +08:00
Jiaming Yuan
37ad60fe25
Enforce input data is not object. (#6927)
* Check for object data type.
* Allow strided arrays with greater underlying buffer size.
2021-05-02 00:09:01 +08:00
Jiaming Yuan
ef473b1f09
Disable pylint error. (#6911) 2021-04-29 01:01:37 +08:00
Jiaming Yuan
a2ecbdaa31
Add an API guard to prevent global variables being changed. (#6891) 2021-04-23 10:27:57 +08:00
Jiaming Yuan
233bdf105f
Remove setDaemon in tracker. (#6872) 2021-04-22 01:57:13 +08:00
Jiaming Yuan
146549260a
Bump version to 1.5.0 snapshot in master. (#6875) 2021-04-22 01:53:44 +08:00
Jiaming Yuan
a5d7094a45
Update documents. (#6856)
* Add early stopping section to prediction doc.
* Remove best_ntree_limit.
* Better doxygen output.
2021-04-16 12:41:03 +08:00
Jiaming Yuan
dee5ef2dfd
Typehint for Sklearn. (#6799) 2021-04-14 06:55:21 +08:00
giladmaya
aa0d8f20c1
Support configuring constraints by feature names (#6783)
Co-authored-by: fis <jm.yuan@outlook.com>
2021-04-04 06:53:33 +08:00
Jiaming Yuan
7e06c81894
Fix approximated predict contribution. (#6811) 2021-04-03 02:15:03 +08:00
Jiaming Yuan
0cced530ea
[doc] Clarify prediction function. (#6813) 2021-04-03 02:12:04 +08:00
Jiaming Yuan
47b62480af
More general predict proba. (#6817)
* Use `output_margin` for `softmax`.
* Add test for dask binary cls.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-04-01 19:52:12 +08:00
Jiaming Yuan
a5c852660b
Update document for sklearn model IO. (#6809)
* Update the use of JSON.
* Remove unnecessary type cast.
2021-04-01 15:52:36 +08:00
Jiaming Yuan
10ae0f9511
Fix doc for apply method. (#6796) 2021-03-31 15:28:31 +08:00
James Lamb
f01af43eb0
[dask] disable work stealing explicitly for training tasks (#6794) 2021-03-29 16:47:56 +08:00
Jiaming Yuan
4ee8340e79
Support column major array. (#6765) 2021-03-20 05:19:46 +08:00
Jiaming Yuan
325bc93e16
[dask] Use distributed.MultiLock (#6743)
* [dask] Use `distributed.MultiLock`

This enables training multiple models in parallel.

* Conditionally import `MultiLock`.
* Use async train directly in scikit learn interface.
* Use `worker_client` when available.
2021-03-16 14:19:41 +08:00
Jiaming Yuan
a9b4a95225
Fix learning rate scheduler with cv. (#6720)
* Expose more methods in cvpack and packed booster.
* Fix cv context in deprecated callbacks.
* Fix document.
2021-02-28 13:57:42 +08:00
Jiaming Yuan
9da2287ab8
[breaking] Save booster feature info in JSON, remove feature name generation. (#6605)
* Save feature info in booster in JSON model.
* [breaking] Remove automatic feature name generation in `DMatrix`.

This PR is to enable reliable feature validation in Python package.
2021-02-25 18:54:16 +08:00
capybara
b6167cd2ff
[dask] Use client to persist collections (#6722)
Co-authored-by: fis <jm.yuan@outlook.com>
2021-02-25 16:40:38 +08:00
Jiaming Yuan
c375173dca
Support pylint 2.7.0 (#6726) 2021-02-25 12:49:58 +08:00
Jiaming Yuan
872e559b91
Use inplace predict for sklearn. (#6718)
* Use inplace predict for sklearn when possible.
2021-02-22 12:27:04 +08:00
Benjamin Lehmann
25077564ab
Fixes small typo in sklearn documentation (#6717)
Replaces "dowm" with "down" on parameter n_jobs
2021-02-20 07:36:06 +08:00
Jiaming Yuan
bdedaab8d1
Fix pylint. (#6714) 2021-02-19 11:53:27 +08:00
James Lamb
dc97b5f19f
[dask] remove outdated comment (#6699) 2021-02-15 18:49:11 +08:00