* Re-implement ROC-AUC.
* Binary
* MultiClass
* LTR
* Add documents.
This PR resolves a few issues:
- Define a value for invalid datasets, that is, when the dataset is empty or
contains only positive or only negative samples.
- Define ROC-AUC for multi-class classification.
- Define a weighted average value for the distributed setting.
- Implement ROC-AUC correctly for the learning to rank task. The previous
implementation was just binary classification averaged across groups,
which doesn't measure ordering in learning to rank.
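
To illustrate the invalid-dataset case described above, here is a minimal sketch of a weighted binary ROC-AUC computed by pairwise comparison. The function name `binary_roc_auc` and the `invalid_value` parameter are hypothetical, and the `0.5` default is a placeholder for illustration only; it is not a statement of what this PR returns. This is not the libxgboost implementation.

```python
from typing import Optional, Sequence


def binary_roc_auc(
    labels: Sequence[int],
    scores: Sequence[float],
    weights: Optional[Sequence[float]] = None,
    invalid_value: float = 0.5,  # placeholder, an assumption of this sketch
) -> float:
    """Weighted binary ROC-AUC via pairwise comparison (O(n^2), for clarity)."""
    if weights is None:
        weights = [1.0] * len(labels)
    pos = [(s, w) for y, s, w in zip(labels, scores, weights) if y == 1]
    neg = [(s, w) for y, s, w in zip(labels, scores, weights) if y != 1]
    # Empty dataset, or only one class present: the AUC is undefined,
    # so return the agreed-upon value instead of dividing by zero.
    if not pos or not neg:
        return invalid_value
    correct = 0.0
    total = 0.0
    for pos_score, pos_weight in pos:
        for neg_score, neg_weight in neg:
            pair_weight = pos_weight * neg_weight
            total += pair_weight
            if pos_score > neg_score:
                correct += pair_weight
            elif pos_score == neg_score:
                correct += 0.5 * pair_weight  # ties count as half
    return correct / total
```

Calling it with an empty dataset or a single-class dataset returns `invalid_value` rather than raising.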
* [dask] Use `distributed.MultiLock`
This enables training multiple models in parallel.
* Conditionally import `MultiLock`.
* Use async train directly in scikit-learn interface.
* Use `worker_client` when available.
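
A conditional import of `MultiLock` could look roughly like the sketch below. The helper name `multi_lock_factory` and the no-op fallback are assumptions made for illustration; the actual fallback in the dask interface may differ.

```python
from contextlib import nullcontext
from typing import Any, Callable


def multi_lock_factory() -> Callable[..., Any]:
    """Return `distributed.MultiLock` if importable, else a no-op stand-in."""
    try:
        # Only present in sufficiently recent releases of distributed.
        from distributed import MultiLock

        return MultiLock
    except ImportError:
        # Hypothetical fallback: accept and ignore any arguments, lock nothing.
        return lambda *args, **kwargs: nullcontext()
```

With the fallback, training still runs on older installations, only without the guarantee that parallel training jobs don't contend for the same workers.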
* Add a new API function for predicting on `DMatrix`. This function aligns
with the rest of the `XGBoosterPredictFrom*` functions in the semantics of its
arguments.
* Purge `ntree_limit` from libxgboost, use iteration instead.
* [dask] Use `inplace_predict` by default for dask sklearn models.
* [dask] Run prediction shape inference on worker instead of client.
The breaking change is in the Python sklearn `apply` function: it is now
consistent with the other prediction functions, where `best_iteration` is used
by default.
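
As a rough illustration of replacing `ntree_limit` with iterations, the sketch below translates a tree-count limit into a half-open range of boosting rounds. The helper name is hypothetical, and it assumes each round adds `num_parallel_tree` trees (per-class trees are ignored) and that `(0, 0)` means "use all rounds"; neither detail is taken from this PR.

```python
from typing import Optional, Tuple


def ntree_limit_to_iteration_range(
    ntree_limit: Optional[int],
    num_parallel_tree: int = 1,
) -> Tuple[int, int]:
    """Translate a tree-count limit into a half-open range of boosting rounds."""
    # No limit given (None or 0): signal "all rounds" with (0, 0).
    if not ntree_limit:
        return (0, 0)
    # Assumption: every boosting round contributes this many trees.
    trees_per_round = max(num_parallel_tree, 1)
    return (0, ntree_limit // trees_per_round)
```

For a boosted random forest with 4 parallel trees, a limit of 12 trees corresponds to the first 3 rounds.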
* Accept array interface for csr and array.
* Accept an optional proxy dmatrix for metainfo.
This constructs an explicit `_ProxyDMatrix` type in Python.
* Remove unused doc.
* Add strict output.
This PR ensures all DMatrix types have a common interface.
* Fix logic in avoiding duplicated DMatrix in sklearn.
* Check for consistency between DMatrix types.
* Add doc for bounds.
* Initial support for distributed LTR using dask.
* Support `qid` in libxgboost.
* Refactor `predict` and `n_features_in_`, `best_[score/iteration/ntree_limit]`
to avoid duplicated code.
* Define `DaskXGBRanker`.
The dask ranker doesn't support group structure; instead it uses query ID and
converts it to group ptr internally.
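
The conversion from query ID to group ptr can be sketched as follows. This is a plain Python illustration with a hypothetical function name, assuming rows of the same query are stored contiguously; it is not the libxgboost code.

```python
from typing import List, Sequence


def qid_to_group_ptr(qid: Sequence[int]) -> List[int]:
    """Build CSR-style group offsets from per-row query ids."""
    ptr = [0]
    seen = set()
    for i, q in enumerate(qid):
        if i == 0 or q != qid[i - 1]:
            # A new group starts here; the same id must not reappear later.
            if q in seen:
                raise ValueError("rows of the same query must be contiguous")
            seen.add(q)
            if i > 0:
                ptr.append(i)
    if len(qid) > 0:
        ptr.append(len(qid))  # closing offset of the last group
    return ptr
```

Group `k` then covers rows `ptr[k]` up to but not including `ptr[k + 1]`, so group sizes are the differences between consecutive offsets.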
* Do not derive from unittest.TestCase (not needed for pytest)
* assertRaises -> pytest.raises
* Simplify test_empty_dmatrix with test parametrization
* setUpClass -> setup_class, tearDownClass -> teardown_class
* Don't import unittest; import pytest
* Use plain assert
* Use parametrized tests in more places
* Fix test_gpu_with_sklearn.py
* Put back run_empty_dmatrix_reg / run_empty_dmatrix_cls
* Fix test_eta_decay_gpu_hist
* Add parametrized tests for monotone constraints
* Fix test names
* Remove test parametrization
* Revise test_slice so that it is not flaky
* Make external memory data partitioning deterministic.
* Change the meaning of `page_size` from bytes to number of rows.
* Design a data pool.
* Note for external memory.
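
With `page_size` counted in rows rather than bytes, page boundaries depend only on the row count, which is one way partitioning can become deterministic. A minimal sketch under that reading (the function name is hypothetical and this is not the external memory implementation):

```python
from typing import List, Tuple


def split_into_pages(n_rows: int, page_size: int) -> List[Tuple[int, int]]:
    """Return half-open (begin, end) row ranges, one per page."""
    if page_size <= 0:
        raise ValueError("page_size must be a positive number of rows")
    # Boundaries are a pure function of n_rows and page_size, so the same
    # input always produces the same pages.
    return [
        (begin, min(begin + page_size, n_rows))
        for begin in range(0, n_rows, page_size)
    ]
```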
* Enable unity build on Windows CI.
* Force garbage collect on test.
* Deprecate LabelEncoder in XGBClassifier; skip LabelEncoder for cuDF/cuPy inputs
* Add unit tests for cuDF and cuPy inputs with XGBClassifier
* Fix lint
* Clarify warning
* Move use_label_encoder option to XGBClassifier constructor
* Add a test for cudf.Series
* Add use_label_encoder to XGBRFClassifier doc
* Address reviewer feedback
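
Without the built-in label encoder, callers supply labels already encoded as consecutive integers starting from 0. A minimal sketch of doing that by hand is shown below; the helper name `encode_labels` is hypothetical, and in practice a library encoder may be preferable.

```python
from typing import Hashable, List, Sequence, Tuple


def encode_labels(labels: Sequence[Hashable]) -> Tuple[List[int], list]:
    """Map arbitrary labels to integers 0..num_class-1, in sorted class order."""
    classes = sorted(set(labels))
    index = {cls: i for i, cls in enumerate(classes)}
    encoded = [index[y] for y in labels]
    # `classes` is returned so predictions can be mapped back: classes[pred].
    return encoded, classes
```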
* [CI] Add RMM as an optional dependency
* Replace caching allocator with pool allocator from RMM
* Revert "Replace caching allocator with pool allocator from RMM"
This reverts commit e15845d4e72e890c2babe31a988b26503a7d9038.
* Use rmm::mr::get_default_resource()
* Try setting default resource (doesn't work yet)
* Allocate pool_mr in the heap
* Prevent leaking pool_mr handle
* Move EXPECT_DEATH() into a separate test suite suffixed DeathTest
* Turn off death tests for RMM
* Address reviewer's feedback
* Prevent leaking of cuda_mr
* Fix Jenkinsfile syntax
* Remove unnecessary function in Jenkinsfile
* [CI] Install NCCL into RMM container
* Run Python tests
* Try building with RMM, CUDA 10.0
* Do not use RMM for CUDA 10.0 target
* Actually test for test_rmm flag
* Fix TestPythonGPU
* Use CNMeM allocator, since pool allocator doesn't yet support multiGPU
* Use 10.0 container to build RMM-enabled XGBoost
* Revert "Use 10.0 container to build RMM-enabled XGBoost"
This reverts commit 789021fa31112e25b683aef39fff375403060141.
* Fix Jenkinsfile
* [CI] Assign larger /dev/shm to NCCL
* Use 10.2 artifact to run multi-GPU Python tests
* Add CUDA 10.0 -> 11.0 cross-version test; remove CUDA 10.0 target
* Rename Conda env rmm_test -> gpu_test
* Use env var to opt into CNMeM pool for C++ tests
* Use identical CUDA version for RMM builds and tests
* Use Pytest fixtures to enable RMM pool in Python tests
* Move RMM to plugin/CMakeLists.txt; use PLUGIN_RMM
* Use per-device MR; use command arg in gtest
* Set CMake prefix path to use Conda env
* Use 0.15 nightly version of RMM
* Remove unnecessary header
* Fix a unit test when cudf is missing
* Add RMM demos
* Remove print()
* Use HostDeviceVector in GPU predictor
* Simplify pytest setup; use LocalCUDACluster fixture
* Address reviewers' comments
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
* Publish artifacts only on the master and release branches
* Build CUDA only for Compute Capability 7.5 when building PRs
* Run all Windows jobs in a single worker image
* Build nightly XGBoost4J SNAPSHOT JARs with Scala 2.12 only
* Show skipped Python tests on Windows
* Make Graphviz optional for Python tests
* Add back C++ tests
* Unstash xgboost_cpp_tests
* Fix label to CUDA 10.1
* Install cuPy for CUDA 10.1
* Install jsonschema
* Address reviewer's feedback
* Implement GK sketching on GPU.
* Strong tests on quantile building.
* Handle sparse dataset by binary searching the column index.
* Hypothesis test on dask.
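
One way to read the sparse-data handling above is a lookup in a row whose column indices are stored in sorted order. The sketch below shows that idea on the CPU with a hypothetical helper; it is not the GPU sketching code.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def find_in_sparse_row(
    col_indices: Sequence[int],
    values: Sequence[float],
    column: int,
) -> Optional[float]:
    """Look up `column` in one sparse row whose column indices are sorted."""
    pos = bisect_left(col_indices, column)
    if pos < len(col_indices) and col_indices[pos] == column:
        return values[pos]
    return None  # the entry is missing for this column
```

The lookup costs O(log k) for a row with k stored entries, instead of scanning the row linearly.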
* Add cupy to Windows CI
* Update Jenkinsfile-win64
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
* Update tests/python-gpu/test_gpu_prediction.py
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>