Commit Graph

853 Commits

Author SHA1 Message Date
Jiaming Yuan
ef473b1f09 Disable pylint error. (#6911) 2021-04-29 01:01:37 +08:00
Jiaming Yuan
a2ecbdaa31 Add an API guard to prevent global variables being changed. (#6891) 2021-04-23 10:27:57 +08:00
Jiaming Yuan
233bdf105f Remove setDaemon in tracker. (#6872) 2021-04-22 01:57:13 +08:00
Jiaming Yuan
146549260a Bump version to 1.5.0 snapshot in master. (#6875) 2021-04-22 01:53:44 +08:00
Jiaming Yuan
a5d7094a45 Update documents. (#6856)
* Add early stopping section to prediction doc.
* Remove best_ntree_limit.
* Better doxygen output.
2021-04-16 12:41:03 +08:00
Jiaming Yuan
dee5ef2dfd Typehint for Sklearn. (#6799) 2021-04-14 06:55:21 +08:00
giladmaya
aa0d8f20c1 Support configuring constraints by feature names (#6783)
Co-authored-by: fis <jm.yuan@outlook.com>
2021-04-04 06:53:33 +08:00
Jiaming Yuan
7e06c81894 Fix approximated predict contribution. (#6811) 2021-04-03 02:15:03 +08:00
Jiaming Yuan
0cced530ea [doc] Clarify prediction function. (#6813) 2021-04-03 02:12:04 +08:00
Jiaming Yuan
47b62480af More general predict proba. (#6817)
* Use `output_margin` for `softmax`.
* Add test for dask binary cls.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-04-01 19:52:12 +08:00
Jiaming Yuan
a5c852660b Update document for sklearn model IO. (#6809)
* Update the use of JSON.
* Remove unnecessary type cast.
2021-04-01 15:52:36 +08:00
Jiaming Yuan
10ae0f9511 Fix doc for apply method. (#6796) 2021-03-31 15:28:31 +08:00
James Lamb
f01af43eb0 [dask] disable work stealing explicitly for training tasks (#6794) 2021-03-29 16:47:56 +08:00
Jiaming Yuan
4ee8340e79 Support column major array. (#6765) 2021-03-20 05:19:46 +08:00
Jiaming Yuan
325bc93e16 [dask] Use distributed.MultiLock (#6743)
* [dask] Use `distributed.MultiLock`

This enables training multiple models in parallel.

* Conditionally import `MultiLock`.
* Use async train directly in scikit learn interface.
* Use `worker_client` when available.
2021-03-16 14:19:41 +08:00
Jiaming Yuan
a9b4a95225 Fix learning rate scheduler with cv. (#6720)
* Expose more methods in cvpack and packed booster.
* Fix cv context in deprecated callbacks.
* Fix document.
2021-02-28 13:57:42 +08:00
Jiaming Yuan
9da2287ab8 [breaking] Save booster feature info in JSON, remove feature name generation. (#6605)
* Save feature info in booster in JSON model.
* [breaking] Remove automatic feature name generation in `DMatrix`.

This PR is to enable reliable feature validation in Python package.
2021-02-25 18:54:16 +08:00
capybara
b6167cd2ff [dask] Use client to persist collections (#6722)
Co-authored-by: fis <jm.yuan@outlook.com>
2021-02-25 16:40:38 +08:00
Jiaming Yuan
c375173dca Support pylint 2.7.0 (#6726) 2021-02-25 12:49:58 +08:00
Jiaming Yuan
872e559b91 Use inplace predict for sklearn. (#6718)
* Use inplace predict for sklearn when possible.
2021-02-22 12:27:04 +08:00
Benjamin Lehmann
25077564ab Fixes small typo in sklearn documentation (#6717)
Replaces "dowm" with "down" on parameter n_jobs
2021-02-20 07:36:06 +08:00
Jiaming Yuan
bdedaab8d1 Fix pylint. (#6714) 2021-02-19 11:53:27 +08:00
James Lamb
dc97b5f19f [dask] remove outdated comment (#6699) 2021-02-15 18:49:11 +08:00
Roffild
4c5d2608e0 [python-package] Fix class Booster: feature_types = None (#6705) 2021-02-13 17:50:23 +08:00
Ali
9b267a435e Bail out early if libxgboost exists in python setup (#6694)
Skip `copy_tree` when existing build is found.
2021-02-10 10:50:10 +08:00
Jiaming Yuan
e8c5c53e2f Use Predictor for dart. (#6693)
* Use normal predictor for dart booster.
* Implement `inplace_predict` for dart.
* Enable `dart` for dask interface now that it's thread-safe.
* categorical data should be working out of box for dart now.

The implementation is not very efficient as it has to pull back the data and
apply weight for each tree, but still a significant improvement over previous
implementation as now we no longer binary search for each sample.

* Fix output prediction shape on dataframe.
2021-02-09 23:30:19 +08:00
Jiaming Yuan
1335db6113 [dask] Improve documents. (#6687)
* Add tag for versions.
* use autoclass in sphinx build.
Made some class methods to be private to avoid exporting documents.
2021-02-09 09:20:58 +08:00
Jiaming Yuan
4656b09d5d [breaking] Add prediction fucntion for DMatrix and use inplace predict for dask. (#6668)
* Add a new API function for predicting on `DMatrix`.  This function aligns
with rest of the `XGBoosterPredictFrom*` functions on semantic of function
arguments.
* Purge `ntree_limit` from libxgboost, use iteration instead.
* [dask] Use `inplace_predict` by default for dask sklearn models.
* [dask] Run prediction shape inference on worker instead of client.

The breaking change is in the Python sklearn `apply` function, I made it to be
consistent with other prediction functions where `best_iteration` is used by
default.
2021-02-08 18:26:32 +08:00
Jiaming Yuan
dbb5208a0a Use __array_interface__ for creating DMatrix from CSR. (#6675)
* Use __array_interface__ for creating DMatrix from CSR.
* Add configuration.
2021-02-05 21:09:47 +08:00
Jiaming Yuan
a4101de678 Fix divide by 0 in feature importance when no split is found. (#6676) 2021-02-05 03:39:30 +08:00
Jiaming Yuan
72892cc80d [dask] Disable gblinear and dart. (#6665) 2021-02-04 09:13:09 +08:00
Jiaming Yuan
9d62b14591 Fix document. [skip ci] (#6669) 2021-02-02 20:43:31 +08:00
Jiaming Yuan
411592a347 Enhance inplace prediction. (#6653)
* Accept array interface for csr and array.
* Accept an optional proxy dmatrix for metainfo.

This constructs an explicit `_ProxyDMatrix` type in Python.

* Remove unused doc.
* Add strict output.
2021-02-02 11:41:46 +08:00
Jiaming Yuan
87ab1ad607 [dask] Accept Future of model for prediction. (#6650)
This PR changes predict and inplace_predict to accept a Future of model, to avoid sending models to workers repeatably.

* Document is updated to reflect functionality additions in recent changes.
2021-02-02 08:45:52 +08:00
Jiaming Yuan
d8ec7aad5a [dask] Add a 1 line sample to infer output shape. (#6645)
* [dask] Use a 1 line sample to infer output shape.

This is for inferring shape with direct prediction (without DaskDMatrix).
There are a few things that requires known output shape before carrying out
actual prediction, including dask meta data, output dataframe columns.

* Infer output shape based on local prediction.
* Remove set param in predict function as it's not thread safe nor necessary as
we now let dask to decide the parallelism.
* Simplify prediction on `DaskDMatrix`.
2021-01-30 18:55:50 +08:00
Jiaming Yuan
d167892c7e [dask] Ensure model can be pickled. (#6651) 2021-01-28 21:47:57 +08:00
Jiaming Yuan
740d042255 Add base_margin for evaluation dataset. (#6591)
* Add base margin to evaluation datasets.
* Unify the code base for evaluation matrices.
2021-01-26 02:11:02 +08:00
Jiaming Yuan
4bf23c2391 Specify shape in prediction contrib and interaction. (#6614) 2021-01-26 02:08:22 +08:00
Jiaming Yuan
8942c98054 Define metainfo and other parameters for all DMatrix interfaces. (#6601)
This PR ensures all DMatrix types have a common interface.

* Fix logic in avoiding duplicated DMatrix in sklearn.
* Check for consistency between DMatrix types.
* Add doc for bounds.
2021-01-25 16:06:06 +08:00
Jiaming Yuan
7bc56fa0ed Use simple print in tracker print function. (#6609) 2021-01-21 21:15:43 +08:00
Jiaming Yuan
26982f9fce Skip unused CMake argument in setup.py (#6611) 2021-01-21 17:25:33 +08:00
Jiaming Yuan
f0fd7629ae Add helper script and doc for releasing pip package. (#6613)
* Fix `long_description_content_type`.
2021-01-21 14:46:52 +08:00
Jiaming Yuan
d6d72de339 Revert ntree limit fix (#6616)
The old (before fix) best_ntree_limit ignores the num_class parameters, which is incorrect. In before we workarounded it in c++ layer to avoid possible breaking changes on other language bindings. But the Python interpretation stayed incorrect. The PR fixed that in Python to consider num_class, but didn't remove the old workaround, so tree calculation in predictor is incorrect, see PredictBatch in CPUPredictor.
2021-01-19 23:51:16 +08:00
Jiaming Yuan
d356b7a071 Restore unknown data support. (#6595) 2021-01-14 04:51:16 +08:00
Jiaming Yuan
89a00a5866 [dask] Random forest estimators (#6602) 2021-01-13 20:59:20 +08:00
Jiaming Yuan
0027220aa0 [breaking] Remove duplicated predict functions, Fix attributes IO. (#6593)
* Fix attributes not being restored.
* Rename all `data` to `X`. [breaking]
2021-01-13 16:56:49 +08:00
Jiaming Yuan
03cd087da1 Remove duplicated DMatrix. (#6592) 2021-01-12 09:36:56 +08:00
Jiaming Yuan
c709f2aaaf Fix evaluation result for XGBRanker. (#6594)
* Remove duplicated code, which fixes typo `evals_result` -> `evals_result_`.
2021-01-12 09:36:41 +08:00
Jiaming Yuan
80065d571e [dask] Add DaskXGBRanker (#6576)
* Initial support for distributed LTR using dask.

* Support `qid` in libxgboost.
* Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]`
  to avoid duplicated code.
* Define `DaskXGBRanker`.

The dask ranker doesn't support group structure, instead it uses query id and
convert to group ptr internally.
2021-01-08 18:35:09 +08:00
Jiaming Yuan
7c9dcbedbc Fix best_ntree_limit for dart and gblinear. (#6579) 2021-01-08 10:05:39 +08:00