* Cleanup some pylint errors.
* Cleanup pylint errors in rabit modules.
* Make data iter an abstract class and cleanup private access.
* Cleanup no-self-use for booster.
- Mention standard install command for R package.
- Remove repeated "get source" step.
- Remove troubleshooting on Windows. It's outdated considering VS 2022 is already out.
* Implement `MaxCategory` in quantile.
* Implement partition-based split for GPU evaluation. Currently, it's based on the existing evaluation function.
* Extract an evaluator from GPU Hist to store the needed states.
* Added some CUDA stream/event utilities.
* Update document with references.
* Fixed a bug in approx evaluator where the number of data points is less than the number of categories.
Empty partition is different from empty dataset. For the former case, each worker has
non-empty dask collections, but each collection might contain empty partition.
This PR prepares the GHistIndexMatrix to host the column matrix which is used by the hist tree method by accepting sparse_threshold parameter.
Some cleanups are made to ensure the correct batch param is being passed into DMatrix along with some additional tests for correctness of SimpleDMatrix.
* Add a new utility for mapping function onto workers.
* Unify the type for feature names.
* Clean up the iterator.
* Fix prediction with DaskDMatrix worker specification.
* Fix base margin with DeviceQuantileDMatrix.
* Support vs 2022 in setup.py.
* Replace all uses of deprecated function sklearn.datasets.load_boston
* More renaming
* Fix bad name
* Update assertion
* Fix n boosted rounds.
* Avoid over regularization.
* Rebase.
* Avoid over regularization.
* Whac-a-mole
Co-authored-by: fis <jm.yuan@outlook.com>
This is the one last PR for removing omp global variable.
* Add context object to the `DMatrix`. This bridges `DMatrix` with https://github.com/dmlc/xgboost/issues/7308 .
* Require context to be available at the construction time of booster.
* Add `n_threads` support for R csc DMatrix constructor.
* Remove `omp_get_max_threads` in R glue code.
* Remove threading utilities that rely on omp global variable.
- Add user configuration.
- Bring back to the logic of using scheduler address from dask. This was removed when we were trying to support GKE, now we bring it back and let xgboost try it if direct guess or host IP from user config failed.
Note that when cub inside CUDA is being used, XGBoost performs checks on input size
instead of using internal cub function to accept inputs larger than maximum integer.