The old (pre-fix) best_ntree_limit ignored the num_class parameter, which is incorrect. Previously we worked around this in the C++ layer to avoid possible breaking changes in other language bindings, but the Python interpretation stayed incorrect. The PR fixed Python to take num_class into account but didn't remove the old workaround, so the tree calculation in the predictor is now incorrect; see PredictBatch in CPUPredictor.
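To make the double counting concrete, here is a minimal sketch of the arithmetic. The three helper functions are hypothetical and only model the behaviour described above. In particular, the sketch assumes the C++ workaround multiplies the incoming limit by the number of classes, which is a reading of this message rather than a copy of the predictor code.

```python
def ntree_limit_old(best_iteration: int, num_parallel_tree: int = 1) -> int:
    """Old Python-side limit: counts rounds and parallel trees, ignores num_class."""
    return (best_iteration + 1) * num_parallel_tree


def ntree_limit_fixed(
    best_iteration: int, num_parallel_tree: int = 1, num_class: int = 1
) -> int:
    """Python-side limit after the fix: also counts one tree per class."""
    return (best_iteration + 1) * num_parallel_tree * max(num_class, 1)


def predictor_workaround(limit: int, num_class: int) -> int:
    """Assumed shape of the old C++ workaround: scale the limit by num_class."""
    return limit * max(num_class, 1)


# 10 boosting rounds (best_iteration == 9), 3 classes, no random forest.
old = ntree_limit_old(9)
fixed = ntree_limit_fixed(9, num_class=3)

print(predictor_workaround(old, 3))    # 30: workaround compensates for the old limit
print(predictor_workaround(fixed, 3))  # 90: classes counted twice once Python is fixed
```

Under these assumptions the model only holds 30 trees, so applying both the Python fix and the workaround asks for three times as many trees as exist.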
* Initial support for distributed LTR using dask.
* Support `qid` in libxgboost.
* Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]`
to avoid duplicated code.
* Define `DaskXGBRanker`.
The dask ranker doesn't support the group structure; instead it takes query ids and
converts them to a group ptr internally.
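The qid to group ptr conversion can be sketched as below. This is an illustration only, not the libxgboost code: `qid_to_group_ptr` is a hypothetical helper, and it assumes the query ids arrive sorted so that each query occupies one contiguous run of rows.

```python
from typing import List, Sequence


def qid_to_group_ptr(qid: Sequence[int]) -> List[int]:
    """Convert sorted query ids into CSR-style group boundaries.

    Rows group_ptr[k] .. group_ptr[k + 1] - 1 belong to the k-th query.
    """
    group_ptr = [0]
    for i in range(1, len(qid)):
        if qid[i] < qid[i - 1]:
            raise ValueError("qid must be sorted in non-decreasing order")
        if qid[i] != qid[i - 1]:
            # A new query starts at row i.
            group_ptr.append(i)
    if len(qid) > 0:
        group_ptr.append(len(qid))
    return group_ptr


print(qid_to_group_ptr([0, 0, 1, 1, 1, 2]))  # [0, 2, 5, 6]
```

The group sizes used by the older group-based interface are the differences between consecutive entries, here 2, 3 and 1.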
* For sklearn:
- Handles user defined objective function.
- Handles `softmax`.
* For dask:
- Use the implementation from sklearn; the previous implementation didn't perform any extra handling.
* Implement early stopping with training continuation.
* Add new C API for obtaining boosted rounds.
* Fix off-by-one error in `save_best`.
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
* Enable loading model from <1.0.0 trained with objective='binary:logitraw'
* Add binary:logitraw in model compatibility testing suite
* Feedback from @trivialfis: Override ProbToMargin() for LogisticRaw
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
* Add management functions for global configuration: XGBSetGlobalConfig(), XGBGetGlobalConfig().
* Add Python interface: set_config(), get_config(), and config_context().
* Add unit tests for Python
* Add R interface: xgb.set.config(), xgb.get.config()
* Add unit tests for R
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
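The relationship between the three Python functions can be sketched with the standard library alone. The functions below are local stand-ins that reuse the names from this message; they are not the xgboost implementation, and the single `verbosity` key is only an illustrative setting.

```python
import contextlib
from typing import Any, Dict, Iterator

# Stand-in for the process-wide configuration store.
_GLOBAL_CONFIG: Dict[str, Any] = {"verbosity": 1}


def set_config(**new_config: Any) -> None:
    """Update global settings, rejecting unknown keys."""
    unknown = set(new_config) - set(_GLOBAL_CONFIG)
    if unknown:
        raise ValueError(f"Unknown configuration keys: {sorted(unknown)}")
    _GLOBAL_CONFIG.update(new_config)


def get_config() -> Dict[str, Any]:
    """Return a copy so callers cannot mutate the global state by accident."""
    return dict(_GLOBAL_CONFIG)


@contextlib.contextmanager
def config_context(**new_config: Any) -> Iterator[None]:
    """Apply settings temporarily and restore the previous ones on exit."""
    old_config = get_config()
    set_config(**new_config)
    try:
        yield
    finally:
        set_config(**old_config)


with config_context(verbosity=3):
    print(get_config()["verbosity"])  # 3 inside the block
print(get_config()["verbosity"])      # 1 again after leaving it
```

The `try`/`finally` is the important design choice: the previous configuration is restored even if the body of the `with` block raises.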
* Do not derive from unittest.TestCase (not needed for pytest)
* assertRaises -> pytest.raises
* Simplify test_empty_dmatrix with test parametrization
* setUpClass -> setup_class, tearDownClass -> teardown_class
* Don't import unittest; import pytest
* Use plain assert
* Use parametrized tests in more places
* Fix test_gpu_with_sklearn.py
* Put back run_empty_dmatrix_reg / run_empty_dmatrix_cls
* Fix test_eta_decay_gpu_hist
* Add parametrized tests for monotone constraints
* Fix test names
* Remove test parametrization
* Revise test_slice to be not flaky
Deprecate positional arguments in the following functions:
- `__init__` for all classes in sklearn module.
- `fit` method for all classes in sklearn module.
- dask interface.
- `set_info` for `DMatrix` class.
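One common way to deprecate positional arguments is a decorator that still accepts them but warns and forwards them as keywords. The sketch below assumes that approach. `deprecate_positional_args` and the toy `fit` function are hypothetical names for illustration, not the code added here.

```python
import functools
import inspect
import warnings


def deprecate_positional_args(func):
    """Warn when keyword-only parameters are passed positionally."""
    params = inspect.signature(func).parameters
    positional = [
        n for n, p in params.items() if p.kind == p.POSITIONAL_OR_KEYWORD
    ]
    kwonly = [n for n, p in params.items() if p.kind == p.KEYWORD_ONLY]

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        extra = len(args) - len(positional)
        if extra > len(kwonly):
            raise TypeError(f"{func.__name__}() got too many positional arguments")
        if extra > 0:
            names = kwonly[:extra]
            warnings.warn(
                f"Pass {', '.join(names)} as keyword arguments; "
                "passing them positionally is deprecated.",
                FutureWarning,
            )
            # Re-route the surplus positional values to their keyword names.
            kwargs.update(zip(names, args[len(positional):]))
            args = args[: len(positional)]
        return func(*args, **kwargs)

    return wrapper


@deprecate_positional_args
def fit(X, y, *, sample_weight=None, base_margin=None):
    return {"X": X, "y": y, "sample_weight": sample_weight, "base_margin": base_margin}


fit("X", "y", sample_weight="w")  # keyword use: no warning
fit("X", "y", "w")                # still works, but emits a FutureWarning
```

Existing callers keep working during the deprecation period, and the warning names the arguments they need to change.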
Refactor the evaluation matrices handling.
This PR is meant to end the confusion around best_ntree_limit and unify model slicing. With multi-class models and random forests, asking users to work out how to set ntree_limit is difficult and error-prone.
* Implement the save_best option in early stopping.
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
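The intent of save_best can be illustrated with a small simulation over a list of validation scores. `early_stop` is a hypothetical helper, not the callback itself, and the sketch assumes a metric that is minimized. Note the `best_iteration + 1`: iterations are zero-based, so keeping the best model means keeping one more round than the index.

```python
from typing import List, Tuple


def early_stop(scores: List[float], rounds: int, save_best: bool) -> Tuple[int, int]:
    """Return (best_iteration, number_of_rounds_kept) for a minimized metric."""
    if not scores:
        raise ValueError("scores must not be empty")
    best_iteration = 0
    best_score = float("inf")
    trained = 0
    for i, score in enumerate(scores):
        trained = i + 1
        if score < best_score:
            best_score, best_iteration = score, i
        elif i - best_iteration >= rounds:
            # No improvement for `rounds` consecutive iterations: stop training.
            break
    # With save_best, drop the rounds trained after the best iteration.
    kept = best_iteration + 1 if save_best else trained
    return best_iteration, kept


scores = [0.5, 0.4, 0.3, 0.35, 0.36, 0.37, 0.2]
print(early_stop(scores, rounds=2, save_best=True))   # (2, 3)
print(early_stop(scores, rounds=2, save_best=False))  # (2, 5)
```

Without save_best the returned model still contains the two rounds trained after the best one.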
The CLI is not the most developed interface. Putting them into the correct directory can help new users avoid it, as most use cases come from a language binding.
* Deprecate LabelEncoder in XGBClassifier; skip LabelEncoder for cuDF/cuPy inputs
* Add unit tests for cuDF and cuPy inputs with XGBClassifier
* Fix lint
* Clarify warning
* Move use_label_encoder option to XGBClassifier constructor
* Add a test for cudf.Series
* Add use_label_encoder to XGBRFClassifier doc
* Address reviewer feedback
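With the label encoder deprecated, callers are presumably expected to hand the classifier integer labels in the range 0 to num_class - 1. A minimal sketch of doing that encoding by hand follows. `encode_labels` is a hypothetical helper that uses only the standard library and assumes the labels are hashable and mutually comparable.

```python
from typing import Hashable, List, Sequence, Tuple


def encode_labels(y: Sequence[Hashable]) -> Tuple[List[int], List[Hashable]]:
    """Map arbitrary labels to consecutive integers starting at 0.

    Returns the encoded labels and the sorted class list, so that
    classes[encoded[i]] == y[i] and predictions can be decoded later.
    """
    classes = sorted(set(y))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in y], classes


encoded, classes = encode_labels(["cat", "dog", "cat", "bird"])
print(encoded)  # [1, 2, 1, 0]
print(classes)  # ['bird', 'cat', 'dog']
```

Keeping the `classes` list around is what lets predicted class indices be mapped back to the original labels.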