* Make tree model param a private member.
* Number of features and targets are immutable after construction.
This is to reduce the number of places where we can run configuration.
- Pass obj info into tree updater as const pointer.
This way we don't have to initialize the learner model param before configuring gbm, hence
breaking up the dependency of configurations.
- Define a new tree struct embedded in the `RegTree`.
- Provide dispatching functions in `RegTree`.
- Fix some c++-17 warnings about the use of nodiscard (currently we disable the warning on
the CI).
- Use uint32_t instead of size_t for `bst_target_t` as it has a defined size and can be used
as part of dmlc parameter.
- Hide the `Segment` struct inside the categorical split matrix.
* Support sklearn cross validation for ranker.
- Add a convention for X to include a special `qid` column.
sklearn utilities consider only `X`, `y` and `sample_weight` for supervised learning
algorithms, but we need an additional qid array for ranking.
It's important to be able to support the cross validation function in sklearn since all
other tuning functions like grid search are based on cross validation.
* Update to C++17
* Turn off unity build
* Update CMake to 3.18
* Use MSVC 2022 + CUDA 11.8
* Re-create stack for worker images
* Allocate more disk space for Windows
* Tempiorarily disable clang-tidy
* RAPIDS now requires Python 3.10+
* Unpin cuda-python
* Use latest NCCL
* Use Ubuntu 20.04 in RMM image
* Mark failing mgpu test as xfail
* Fix CPU bin compression with categorical data.
* The bug causes the maximum category to be lesser than 256 or the maximum number of bins when
the input data is dense.
* Extract most of the functionality into `DMatrixCache`.
* Move API entry to independent file to reduce dependency on `predictor.h` file.
* Add test.
---------
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
* Fix quantile tests running on multi-gpus
* Run some gtests with multiple GPUs
* fix mgpu test naming
* Instruct NCCL to print extra logs
* Allocate extra space in /dev/shm to enable NCCL
* use gtest_skip to skip mgpu tests
---------
Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
* Use array interface for CSC matrix.
Use array interface for CSC matrix and align the interface with CSR and dense.
- Fix nthread issue in the R package DMatrix.
- Unify the behavior of handling `missing` with other inputs.
- Unify the behavior of handling `missing` around R, Python, Java, and Scala DMatrix.
- Expose `num_non_missing` to the JVM interface.
- Deprecate old CSR and CSC constructors.