* Generate column matrix from gHistIndex.
* Avoid synchronization with the sparse page once the cache is written.
* Cleanups: Remove member variables/functions, change the update routine to look like approx and gpu_hist.
* Remove pruner.
* Implement `MaxCategory` in quantile.
* Implement partition-based split for GPU evaluation. Currently, it's based on the existing evaluation function.
* Extract an evaluator from GPU Hist to store the needed states.
* Added some CUDA stream/event utilities.
* Update document with references.
* Fixed a bug in approx evaluator where the number of data points is less than the number of categories.
* Replace all uses of deprecated function sklearn.datasets.load_boston
* More renaming
* Fix bad name
* Update assertion
* Fix n boosted rounds.
* Avoid over regularization.
* Rebase.
* Avoid over regularization.
* Whac-a-mole
Co-authored-by: fis <jm.yuan@outlook.com>
This PR rewrites the approx tree method to use codebase from hist for better performance and code sharing.
The rewrite has many benefits:
- Support for both `max_leaves` and `max_depth`.
- Support for `grow_policy`.
- Support for mono constraint.
- Support for feature weights.
- Support for easier bin configuration (`max_bin`).
- Support for categorical data.
- Faster performance for most of the datasets. (many times faster)
- Support for prediction cache.
- Significantly better performance for external memory.
- Unites the code base between approx and hist.
* Add num target model parameter, which is configured from input labels.
* Change elementwise metric and indexing for weights.
* Add demo.
* Add tests.
Change from system Python to environment python3. For Ubuntu 20.04, only `python3` is
available and there's no `python`. So at least `python3` is consistent with Python
virtual env, Ubuntu and anaconda.
A new parameter `custom_metric` is added to `train` and `cv` to distinguish the behaviour from the old `feval`. And `feval` is deprecated. The new `custom_metric` receives transformed prediction when the built-in objective is used. This enables XGBoost to use cost functions from other libraries like scikit-learn directly without going through the definition of the link function.
`eval_metric` and `early_stopping_rounds` in sklearn interface are moved from `fit` to `__init__` and is now saved as part of the scikit-learn model. The old ones in `fit` function are now deprecated. The new `eval_metric` in `__init__` has the same new behaviour as `custom_metric`.
Added more detailed documents for the behaviour of custom objective and metric.
* Support more input types for categorical data.
* Shorten the type name from "categorical" to "c".
* Tests for np/cp array and scipy csr/csc/coo.
* Specify the type for feature info.
* Add feature score support for linear model.
* Port R interface to the new implementation.
* Add linear model support in Python.
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
* Ensure RMM is 0.18 or later
* Add use_rmm flag to global configuration
* Modify XGBCachingDeviceAllocatorImpl to skip CUB when use_rmm=True
* Update the demo
* [CI] Pin NumPy to 1.19.4, since NumPy 1.19.5 doesn't work with latest Shap
CLI is not most developed interface. Putting them into correct directory can help new users to avoid it as most of the use cases are from a language binding.