* Don't set_params at the end of set_state.
* Also fix another issue found in dask prediction.
* Add note about prediction: other prediction modes are not supported at the moment.
* Initial support for cudf integration.
* Add two C APIs for consuming data and metainfo.
* Add CopyFrom for SimpleCSRSource as a generic function to consume the data.
* Add FromDeviceColumnar for consuming device data.
* Add new MetaInfo::SetInfo for consuming label, weight etc.
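A minimal sketch of how the cudf path above is consumed from the Python package (assumes a CUDA environment with cudf installed; details are illustrative):

```python
import cudf
import xgboost as xgb

# Synthetic device data for illustration.
df = cudf.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
y = cudf.Series([0, 1, 0])

# DMatrix consumes the device columns directly via the new C APIs;
# the label is routed through MetaInfo::SetInfo.
dtrain = xgb.DMatrix(df, label=y)
```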
* Refactor configuration [Part II].
* General changes:
** Remove `Init` methods to avoid ambiguity.
** Remove `Configure(std::map<>)` to avoid redundant copying and prepare for
parameter validation. (`std::vector` is returned from `InitAllowUnknown`).
** Add name to tree updaters for easier debugging.
* Learner changes:
** Make `LearnerImpl` the only source of configuration.
All configurations are stored and carried out by `LearnerImpl::Configure()`.
** Remove booster in C API.
Originally kept for "compatibility reasons", but the reason was never stated, so it is removed here.
** Add a `metric_names_` field in `LearnerImpl`.
** Remove `LazyInit`. Configuration will always be lazy.
** Run `Configure` before every iteration.
* Predictor changes:
** Allocate both the CPU and GPU predictors.
** Remove cpu_predictor from gpu_predictor.
`GBTree` is now used to dispatch the predictor.
** Remove some GPU Predictor tests.
* IO:
No IO changes. Binary model format stability is tested by comparing the hash values of saved models between two commits.
* Fix #4630, #4421: Preserve correct ordering between metrics, and always use the last metric for early stopping
* Clarify semantics of early stopping in presence of multiple valid sets and metrics
* Add a test
* Fix lint
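A hedged illustration of the clarified semantics above (synthetic data; when multiple eval sets and metrics are supplied, the last metric on the last eval set drives early stopping):

```python
import numpy as np
import xgboost as xgb

# Synthetic data for illustration.
X, y = np.random.rand(100, 5), np.random.randint(2, size=100)
dtrain = xgb.DMatrix(X[:80], label=y[:80])
dvalid = xgb.DMatrix(X[80:], label=y[80:])

params = {"objective": "binary:logistic",
          "eval_metric": ["auc", "logloss"]}  # ordering is preserved
bst = xgb.train(params, dtrain, num_boost_round=50,
                evals=[(dtrain, "train"), (dvalid, "valid")],
                early_stopping_rounds=5)
# Early stopping monitors the LAST metric ("logloss")
# on the LAST evaluation set ("valid").
```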
* The _maybe_pandas_xxx helpers should return their arguments unchanged when pandas is not installed
* Tests should not assume pandas is installed
* Mark tests which require pandas as such
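A sketch of the pass-through pattern described above (modeled on the `_maybe_pandas_data` helper; treat the exact signature as illustrative):

```python
try:
    import pandas as pd
    PANDAS_INSTALLED = True
except ImportError:
    PANDAS_INSTALLED = False

def _maybe_pandas_data(data, feature_names, feature_types):
    """Convert a pandas DataFrame; pass anything else through unchanged."""
    if not PANDAS_INSTALLED or not isinstance(data, pd.DataFrame):
        # No pandas, or not a DataFrame: return arguments unchanged.
        return data, feature_names, feature_types
    # ... DataFrame-specific conversion happens here ...
    return data.values, list(data.columns), feature_types
```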
* Implement tree model dump with a code generator.
* Split up generators.
* Implement graphviz generator.
* Use pattern matching.
* [Breaking] Return a Source in `to_graphviz` instead of Digraph in Python package.
Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
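Usage after the breaking change, with small synthetic data (the returned object is a `graphviz.Source` rather than a `Digraph`):

```python
import numpy as np
import xgboost as xgb

# Synthetic data for illustration.
dtrain = xgb.DMatrix(np.random.rand(20, 3),
                     label=np.random.randint(2, size=20))
bst = xgb.train({"max_depth": 2}, dtrain, num_boost_round=2)

src = xgb.to_graphviz(bst, num_trees=0)  # graphviz.Source, not Digraph
src.render("tree0")                      # render to tree0.pdf
print(src.source)                        # or inspect the raw DOT text
```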
* Add support for matrix slicing with query ID for cross-validation
* Hail-mary test of unrar installation for Windows tests
* Modify tests to run in GitHub CI
* Remove dependency on wget and unrar
* Save error log from R test
* Relax assertion in test_training
* Use int instead of bool in C function interface
* Revise R interface
* Add XGDMatrixSliceDMatrixEx and keep old XGDMatrixSliceDMatrix for API compatibility
* Brought back the silent parameter for the SKLearn-like API, marked as deprecated.
- added a deprecation notice and warning
- removed silent from the tests for the SKLearn-like API
* Added SKLearn-like random forest Python API.
- added XGBRFClassifier and XGBRFRegressor classes to SKL-like xgboost API
- also added n_gpus and gpu_id parameters to SKL classes
- added documentation describing how to use xgboost for random forests,
as well as existing caveats
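A short usage sketch of the new random-forest classes (synthetic data):

```python
from xgboost import XGBRFClassifier
from sklearn.datasets import make_classification

# Synthetic data for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# One boosting round, many parallel trees: a random forest.
rf = XGBRFClassifier(n_estimators=100, max_depth=4, random_state=0)
rf.fit(X, y)
print(rf.predict_proba(X[:5]))
```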
* Fix test_gpu_coordinate.
* Use `gpu_coord_descent` in test.
* Reduce number of running rounds.
* Remove nthread.
* Use githubusercontent for r-appveyor.
* Use githubusercontent in travis r tests.
* Prevent empty quantiles
* Revise and improve unit tests for quantile hist
* Remove unnecessary comment
* Add #2943 as a test case
* Skip test if no sklearn
* Revise misleading comments
* Enable the xgb_model parameter in the XGBClassifier scikit-learn API
https://github.com/dmlc/xgboost/issues/3049
* Add test_XGBClassifier_resume():
tests the xgb_model parameter in the XGBClassifier API.
* Update test_with_sklearn.py
* Fix lint
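A sketch of resuming training through the new parameter (synthetic data):

```python
from xgboost import XGBClassifier
from sklearn.datasets import make_classification

# Synthetic data for illustration.
X, y = make_classification(n_samples=200, random_state=0)

clf1 = XGBClassifier(n_estimators=30).fit(X, y)

# Resume training from the previous model for another 30 rounds.
clf2 = XGBClassifier(n_estimators=30)
clf2.fit(X, y, xgb_model=clf1.get_booster())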
* Unify logging facilities.
* Enhance `ConsoleLogger` to handle different verbosity.
* Override macros from `dmlc`.
* Don't use specialized gamma when building with GPU.
* Remove verbosity cache in monitor.
* Test monitor.
* Deprecate `silent`.
* Fix doc and messages.
* Fix python test.
* Fix silent tests.
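How the new knob is used (a minimal sketch with synthetic data; `verbosity` replaces the deprecated `silent`):

```python
import numpy as np
import xgboost as xgb

# Synthetic data for illustration.
dtrain = xgb.DMatrix(np.random.rand(50, 4),
                     label=np.random.randint(2, size=50))

# verbosity: 0 = silent, 1 = warning, 2 = info, 3 = debug.
bst = xgb.train({"verbosity": 2, "objective": "binary:logistic"},
                dtrain, num_boost_round=5)
```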
* Use gain for sklearn feature_importances_.
`gain` is a better feature importance criterion than the currently used `weight`.
* Added importance_type to the class.
* Fixed test.
* Fixed whitespace.
* Fixed variable name.
* Fixed deprecation warning.
* Fixed exp array.
* Fixed whitespace.
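A minimal sketch of the resulting behaviour (synthetic data):

```python
from xgboost import XGBClassifier
from sklearn.datasets import make_classification

# Synthetic data for illustration.
X, y = make_classification(n_samples=200, random_state=0)

# feature_importances_ now honours the chosen importance_type.
clf = XGBClassifier(importance_type="gain").fit(X, y)
print(clf.feature_importances_)
```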
* Fix #3730: scikit-learn 0.20 compatibility fix
sklearn.cross_validation was removed in scikit-learn 0.20,
so replace it with sklearn.model_selection
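The import change in a nutshell:

```python
# Before (removed in scikit-learn 0.20):
# from sklearn.cross_validation import train_test_split
# After:
from sklearn.model_selection import train_test_split
```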
* Display test names for Python tests for clarity
* Add interaction constraints
* Enable both interaction and monotonic constraints at the same time
* fix lint
* add R test, fix lint, update demo
* Use dmlc::JSONReader to express interaction constraints as nested lists; Use sparse arrays for bookkeeping
* Add Python test for interaction constraints
* Make the R interaction constraints parameter use feature indices instead of column names; fix R coding style
* Fix lint
* Add BlueTea88 to CONTRIBUTORS.md
* Short circuit when no constraint is specified; address review comments
* Add tutorial for feature interaction constraints
* Allow interaction constraints to be passed as a string; remove the redundant column_names argument
* Fix typo
* Address review comments
* Add comments to Python test
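A minimal sketch of the Python-side usage described above (synthetic data; the constraint is a string of nested lists of feature indices):

```python
import numpy as np
import xgboost as xgb

# Synthetic data for illustration.
X, y = np.random.rand(100, 5), np.random.rand(100)
dtrain = xgb.DMatrix(X, label=y)

params = {
    # Features 0-1 may interact with each other, and 2-4 with each
    # other, but splits may not mix the two groups.
    "interaction_constraints": "[[0, 1], [2, 3, 4]]",
    # Monotonic constraints can be combined with interaction constraints.
    "monotone_constraints": "(1,0,0,0,-1)",
}
bst = xgb.train(params, dtrain, num_boost_round=10)
```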
* Fix #3648: XGBClassifier.predict() should return margin scores when output_margin=True
* Fix tests to reflect correct implementation of XGBClassifier.predict(output_margin=True)
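Behaviour after the fix, in a small sketch (synthetic data):

```python
from xgboost import XGBClassifier
from sklearn.datasets import make_classification

# Synthetic data for illustration.
X, y = make_classification(n_samples=100, random_state=0)
clf = XGBClassifier(n_estimators=10).fit(X, y)

margins = clf.predict(X, output_margin=True)  # raw, untransformed scores
labels = clf.predict(X)                       # class labels, as before
```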
* Fix flaky test test_with_sklearn.test_sklearn_api_gblinear
* Added XGBRanker
* Fixed predict method and ranking test
* Reformatted code in accordance with PEP 8
* Fixed lint error
* Fixed docstring and added checks on objective
* Added ranking demo for Python
* Fixed suffix in rank.py
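A usage sketch of the new ranker (hypothetical data; documents belonging to the same query must be contiguous in X):

```python
import numpy as np
from xgboost import XGBRanker

# Three queries with 4, 3 and 3 documents respectively (synthetic data).
X = np.random.rand(10, 5)
y = np.random.randint(3, size=10)   # relevance grades
group = [4, 3, 3]                   # documents per query

ranker = XGBRanker(objective="rank:pairwise", n_estimators=20)
ranker.fit(X, y, group=group)
scores = ranker.predict(X)          # one score per document
```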
* Revert "Fix #3485, #3540: Don't use dropout for predicting test sets (#3556)"
This reverts commit 44811f233071c5805d70c287abd22b155b732727.
* Document behavior of predict() for DART booster
* Add notice to parameter.rst
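The documented behaviour, restated as a sketch (synthetic data): a plain predict() on a DART booster performs dropout, so pass ntree_limit to evaluate all trees.

```python
import numpy as np
import xgboost as xgb

# Synthetic data for illustration.
dtrain = xgb.DMatrix(np.random.rand(50, 4),
                     label=np.random.randint(2, size=50))
dtest = xgb.DMatrix(np.random.rand(10, 4))

num_round = 20
bst = xgb.train({"booster": "dart", "objective": "binary:logistic"},
                dtrain, num_boost_round=num_round)

# Use all trees when scoring a test set:
preds = bst.predict(dtest, ntree_limit=num_round)
```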
* Added quantile finding on the GPU.
- this includes datasets where weights are assigned to data rows
- because the quantiles found by the new algorithm differ from
those found by the old one, test thresholds in
tests/python-gpu/test_gpu_updaters.py have been adjusted
* Adjustments and improved testing for finding quantiles on the GPU.
- added C++ tests for the DeviceSketch() function
- reduced one of the thresholds in test_gpu_updaters.py
- adjusted the cuts found by the find_cuts_k kernel
* Add `'total_gain'` and `'total_cover'` as possible `importance_type`
arguments to `Booster.get_score` in the Python package.
`get_score` already accepts a `'gain'` argument, which returns each
feature's average gain over all of its splits. `'total_gain'` does the
same, but returns a total rather than an average. This seems more
intuitively meaningful, and also matches the behavior of the R package's
`xgb.importance` function.
I also added an analogous `'total_cover'` option for consistency.
This should resolve #3484.
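A minimal sketch of the new options (synthetic data):

```python
import numpy as np
import xgboost as xgb

# Synthetic data for illustration.
dtrain = xgb.DMatrix(np.random.rand(100, 4), label=np.random.rand(100))
bst = xgb.train({"max_depth": 3}, dtrain, num_boost_round=10)

avg_gain = bst.get_score(importance_type="gain")           # mean gain per split
total_gain = bst.get_score(importance_type="total_gain")   # summed over splits
total_cover = bst.get_score(importance_type="total_cover")
```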