* added 'sorted()' to qid, and added pandas import
* Update learning_to_rank.rst
---------
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
This PR adds optional support for loading nccl with `dlopen` as an alternative of compile time linking. This is to address the size bloat issue with the PyPI binary release.
- Add CMake option to load `nccl` at runtime.
- Add an NCCL stub.
After this, `nccl` will be fetched from PyPI when using pip to install XGBoost, either by a user or by `pyproject.toml`. Others who want to link the nccl at compile time can continue to do so without any change.
At the moment, this is Linux only since we only support MNMG on Linux.
This aligns dask with pyspark, users need to explicitly call:
```
from xgboost.dask import DaskXGBClassifier
from xgboost import dask as dxgb
```
In future releases, we might stop using the default import and remove the lazy loader.
- Rewrite GPU demos. notebook is converted to script to avoid committing additional png plots.
- Add GPU demos into the sphinx gallery.
- Add RMM demos into the sphinx gallery.
- Test for firing threads with different device ordinals.
* Handle the new `device` parameter in dask and demos.
- Check no ordinal is specified in the dask interface.
- Update demos.
- Update dask doc.
- Update the condition for QDM.
- A `DeviceOrd` struct is implemented to indicate the device. It will eventually replace the `gpu_id` parameter.
- The `predictor` parameter is removed.
- Fallback to `DMatrix` when `inplace_predict` is not available.
- The heuristic for choosing a predictor is only used during training.
* Use ptr from mmap for `GHistIndexMatrix` and `ColumnMatrix`.
- Define a resource for holding various types of memory pointers.
- Define ref vector for holding resources.
- Swap the underlying resources for GHist and ColumnM.
- Add documentation for current status.
- s390x support is removed. It should work if you can compile XGBoost, all the old workaround code does is to get GCC to compile.
Previously, we use `libsvm` as default when format is not specified. However, the dmlc
data parser is not particularly robust against errors, and the most common type of error
is undefined format.
Along with which, we will recommend users to use other data loader instead. We will
continue the maintenance of the parsers as it's currently used for many internal tests
including federated learning.
* Implement multi-target for hist.
- Add new hist tree builder.
- Move data fetchers for tests.
- Dispatch function calls in gbm base on the tree type.
* Update to C++17
* Turn off unity build
* Update CMake to 3.18
* Use MSVC 2022 + CUDA 11.8
* Re-create stack for worker images
* Allocate more disk space for Windows
* Tempiorarily disable clang-tidy
* RAPIDS now requires Python 3.10+
* Unpin cuda-python
* Use latest NCCL
* Use Ubuntu 20.04 in RMM image
* Mark failing mgpu test as xfail