* Define `best_iteration` only if early stopping is used.
This is the behavior specified by the documentation but not honored by the actual code.
- Don't set the attributes if there's no early stopping.
- Clean up the code for callbacks, and replace assertions with proper exceptions.
- Assign the attributes when early stopping `save_best` is used.
- Turn the attributes into Python properties.
---------
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
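The property-based behavior listed above can be sketched roughly as follows. This is an illustration only, not the library's code: `ToyBooster`, its `_attrs` dictionary, and `set_best` are hypothetical names. The point is that reading `best_iteration` raises an `AttributeError` unless early stopping has recorded a value.

```python
class ToyBooster:
    """Hypothetical stand-in for a booster; not the real xgboost class."""

    def __init__(self):
        # Nothing is stored up front: the attribute exists only after
        # early stopping has recorded it.
        self._attrs = {}

    def set_best(self, iteration):
        # Assumed to be called by an early-stopping callback.
        self._attrs["best_iteration"] = str(iteration)

    @property
    def best_iteration(self):
        value = self._attrs.get("best_iteration")
        if value is None:
            # A proper exception instead of an assertion.
            raise AttributeError(
                "`best_iteration` is only defined when early stopping is used."
            )
        return int(value)
```

Because the property raises `AttributeError`, `hasattr(booster, "best_iteration")` returns `False` for a model trained without early stopping.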
- A `DeviceOrd` struct is implemented to indicate the device. It will eventually replace the `gpu_id` parameter.
- The `predictor` parameter is removed.
- Fall back to `DMatrix` when `inplace_predict` is not available.
- The heuristic for choosing a predictor is only used during training.
* Replace all uses of deprecated function sklearn.datasets.load_boston
* More renaming
* Fix bad name
* Update assertion
* Fix n boosted rounds.
* Avoid over regularization.
* Rebase.
* Avoid over regularization.
* Whac-a-mole
Co-authored-by: fis <jm.yuan@outlook.com>
* Change C API name.
* Test for all primitive types from array.
* Add native support for CPU 128 float.
* Convert boolean and float16 in Python.
* Fix dask version for now.
* Use normal predictor for dart booster.
* Implement `inplace_predict` for dart.
* Enable `dart` for dask interface now that it's thread-safe.
* Categorical data should now work out of the box for dart.
The implementation is not very efficient, as it has to pull the data back and
apply the weight for each tree, but it is still a significant improvement over
the previous implementation because we no longer run a binary search for each
sample.
* Fix output prediction shape on dataframe.
* Add a new API function for predicting on `DMatrix`. Its argument semantics
are consistent with the rest of the `XGBoosterPredictFrom*` functions.
* Purge `ntree_limit` from libxgboost, use iteration instead.
* [dask] Use `inplace_predict` by default for dask sklearn models.
* [dask] Run prediction shape inference on worker instead of client.
The breaking change is in the Python sklearn `apply` function: it is now
consistent with the other prediction functions, which use `best_iteration` by
default.
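A minimal sketch of what "use `best_iteration` by default" could look like when resolving the range of boosting iterations. `resolve_iteration_range` is a hypothetical helper, not a function from the library, and it assumes that `(0, 0)` means "use all trees" and that the end of the range is exclusive.

```python
from typing import Optional, Tuple


def resolve_iteration_range(
    iteration_range: Optional[Tuple[int, int]],
    best_iteration: Optional[int],
) -> Tuple[int, int]:
    """Hypothetical helper; (0, 0) is assumed to mean "use all trees"."""
    if iteration_range is not None and iteration_range != (0, 0):
        # An explicit request from the caller always wins.
        return iteration_range
    if best_iteration is not None:
        # Default to the best model found by early stopping (end exclusive).
        return (0, best_iteration + 1)
    # No early stopping and no explicit range: use every tree.
    return (0, 0)
```

Under this sketch, a model trained without early stopping behaves as before, while one trained with early stopping stops at the best iteration unless the caller passes a range.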
* Accept array interface for csr and array.
* Accept an optional proxy dmatrix for metainfo.
This constructs an explicit `_ProxyDMatrix` type in Python.
* Remove unused doc.
* Add strict output.
The old (pre-fix) `best_ntree_limit` ignored the `num_class` parameter, which is incorrect. Previously we worked around this in the C++ layer to avoid possible breaking changes in other language bindings, but the Python interpretation stayed incorrect. The PR fixed the Python side to consider `num_class` but didn't remove the old workaround, so the tree calculation in the predictor is incorrect; see `PredictBatch` in `CPUPredictor`.
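The arithmetic behind this can be sketched as below. The function names are hypothetical, and the sketch assumes that each boosting iteration grows `num_parallel_tree * num_class` trees and that the C++ workaround amounted to scaling the limit by the number of classes.

```python
def ntree_limit_old(best_iteration, num_parallel_tree):
    # Old Python interpretation: num_class is ignored.
    return (best_iteration + 1) * num_parallel_tree


def ntree_limit_fixed(best_iteration, num_parallel_tree, num_class):
    # Fixed Python interpretation: one tree per class per parallel tree.
    return (best_iteration + 1) * num_parallel_tree * max(num_class, 1)


def cpp_workaround(ntree_limit, num_class):
    # Assumed shape of the stale workaround: scale the limit by num_class.
    return ntree_limit * max(num_class, 1)


best_iteration, num_parallel_tree, num_class = 9, 1, 3

# Old Python value plus the workaround gives the right tree count.
before = cpp_workaround(ntree_limit_old(best_iteration, num_parallel_tree), num_class)

# Fixed Python value plus the stale workaround counts the classes twice.
after = cpp_workaround(
    ntree_limit_fixed(best_iteration, num_parallel_tree, num_class), num_class
)
```

With these numbers `before` is 30 trees, which matches 10 iterations of 3 classes, while `after` is 90, so the workaround has to go once Python accounts for `num_class`.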
* Do not derive from unittest.TestCase (not needed for pytest)
* assertRaises -> pytest.raises
* Simplify test_empty_dmatrix with test parametrization
* setUpClass -> setup_class, tearDownClass -> teardown_class
* Don't import unittest; import pytest
* Use plain assert
* Use parametrized tests in more places
* Fix test_gpu_with_sklearn.py
* Put back run_empty_dmatrix_reg / run_empty_dmatrix_cls
* Fix test_eta_decay_gpu_hist
* Add parametrized tests for monotone constraints
* Fix test names
* Remove test parametrization
* Revise test_slice to be not flaky
* Add inplace prediction for dask-cudf.
* Remove Dockerfile.release, since it's not used anywhere
* Use Conda exclusively in CUDF and GPU containers
* Improve cupy memory copying.
* Add skip marks to tests.
* Add mgpu-cudf category on the CI to run all distributed tests.
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
Normal prediction with `DMatrix` is now thread-safe through locks. The newly added inplace prediction is thread-safe without locks.
When data is on device (cupy, cudf), the returned data is also on device.
* Implementation for numpy, csr, cudf and cupy.
* Implementation for dask.
* Remove sync in simple dmatrix.