* Re-implement ROC-AUC.
* Binary
* MultiClass
* LTR
* Add documents.
This PR resolves a few issues:
- Define a value when the dataset is invalid, which can happen if there's an
empty dataset, or when the dataset contains only positive or negative values.
- Define ROC-AUC for multi-class classification.
- Define weighted average value for distributed setting.
- A correct implementation for the learning-to-rank task. The previous
implementation was just binary classification averaged across groups,
which doesn't measure the quality of an ordered ranking.
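As a rough illustration of the invalid-dataset and distributed points above, here is a minimal Python sketch; it is not the code in this PR. It assumes NaN as the value for an invalid dataset and assumes each worker reports its local AUC together with a caller-supplied weight. Both choices, and the names `binary_roc_auc` and `merge_worker_aucs`, are hypothetical.

```python
import math


def binary_roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) ROC-AUC for 0/1 labels.

    Returns NaN (an assumed convention) when the dataset is empty or
    contains only one class, instead of raising an error.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return math.nan
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


def merge_worker_aucs(local):
    """Weighted average of (auc, weight) pairs, skipping invalid workers."""
    valid = [(a, w) for a, w in local if not math.isnan(a) and w > 0]
    if not valid:
        return math.nan
    total = sum(w for _, w in valid)
    return sum(a * w for a, w in valid) / total
```

The quadratic pair loop is only for readability; a sort-based version would be preferable for real data.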
* Ensure RMM is 0.18 or later
* Add use_rmm flag to global configuration
* Modify XGBCachingDeviceAllocatorImpl to skip CUB when use_rmm=True
* Update the demo
* [CI] Pin NumPy to 1.19.4, since NumPy 1.19.5 doesn't work with latest Shap
* [CI] Upgrade cuDF and RMM to 0.18 nightlies
* Modify RMM plugin to be compatible with RMM 0.18
* Update src/common/device_helpers.cuh
Co-authored-by: Mark Harris <mharris@nvidia.com>
* Fall back to CUB allocator if RMM memory pool is not set up
* Fix build
* Prevent memory leak
* Add note about lack of memory initialisation
* Add check for other fast allocators
* Set use_cub_allocator_ to true when RMM is not enabled
* Fix clang-tidy
* Do not demangle symbol; add check to ensure Linux+Clang/GCC combo
* [CI] Add RMM as an optional dependency
* Replace caching allocator with pool allocator from RMM
* Revert "Replace caching allocator with pool allocator from RMM"
This reverts commit e15845d4e72e890c2babe31a988b26503a7d9038.
* Use rmm::mr::get_default_resource()
* Try setting default resource (doesn't work yet)
* Allocate pool_mr in the heap
* Prevent leaking pool_mr handle
* Separate EXPECT_DEATH() in separate test suite suffixed DeathTest
* Turn off death tests for RMM
* Address reviewer's feedback
* Prevent leaking of cuda_mr
* Fix Jenkinsfile syntax
* Remove unnecessary function in Jenkinsfile
* [CI] Install NCCL into RMM container
* Run Python tests
* Try building with RMM, CUDA 10.0
* Do not use RMM for CUDA 10.0 target
* Actually test for test_rmm flag
* Fix TestPythonGPU
* Use CNMeM allocator, since pool allocator doesn't yet support multiGPU
* Use 10.0 container to build RMM-enabled XGBoost
* Revert "Use 10.0 container to build RMM-enabled XGBoost"
This reverts commit 789021fa31112e25b683aef39fff375403060141.
* Fix Jenkinsfile
* [CI] Assign larger /dev/shm to NCCL
* Use 10.2 artifact to run multi-GPU Python tests
* Add CUDA 10.0 -> 11.0 cross-version test; remove CUDA 10.0 target
* Rename Conda env rmm_test -> gpu_test
* Use env var to opt into CNMeM pool for C++ tests
* Use identical CUDA version for RMM builds and tests
* Use Pytest fixtures to enable RMM pool in Python tests
* Move RMM to plugin/CMakeLists.txt; use PLUGIN_RMM
* Use per-device MR; use command arg in gtest
* Set CMake prefix path to use Conda env
* Use 0.15 nightly version of RMM
* Remove unnecessary header
* Fix a unit test when cudf is missing
* Add RMM demos
* Remove print()
* Use HostDeviceVector in GPU predictor
* Simplify pytest setup; use LocalCUDACluster fixture
* Address reviewers' comments
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
* Implement GK sketching on GPU.
* Strong tests on quantile building.
* Handle sparse dataset by binary searching the column index.
* Hypothesis test on dask.
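To illustrate the sparse-data point above, a minimal Python sketch of looking up a feature value by binary searching the column indices of one row. It assumes a CSR layout with column indices sorted within each row; the function name `csr_feature_value` and the `missing` parameter are hypothetical, and the actual GPU code may be organised differently.

```python
from bisect import bisect_left


def csr_feature_value(indptr, indices, values, row, feature, missing=float("nan")):
    """Return values[row, feature] from a CSR matrix, or `missing` if absent.

    Assumes the column indices inside each row are sorted in ascending order.
    """
    beg, end = indptr[row], indptr[row + 1]
    # Binary search restricted to the slice of `indices` owned by this row.
    pos = bisect_left(indices, feature, beg, end)
    if pos < end and indices[pos] == feature:
        return values[pos]
    return missing
```

For example, with `indptr = [0, 2, 5]`, `indices = [1, 4, 0, 2, 3]` and `values = [10, 40, 1, 2, 3]`, row 0 holds features 1 and 4, so looking up feature 2 in row 0 yields the `missing` value.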
- move segment sorter to common
- this is the first of a handful of PRs that split the larger PR #5326
- it moves this facility to common (from the ranking objective class), so that it can be
used for metric computation
- it also wraps all the bare device pointers into spans.
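For readers unfamiliar with segmented sorting, a minimal CPU-side Python sketch of the idea: items are sorted only within their own group, never across groups. The name `segmented_argsort` and the `group_ptr` layout (offsets marking group boundaries) are assumptions for illustration, not the interface of the segment sorter itself.

```python
def segmented_argsort(values, group_ptr, descending=True):
    """Return indices that sort `values` within each segment independently.

    `group_ptr` holds segment offsets, e.g. [0, 3, 5] means two segments:
    items 0..2 and items 3..4.
    """
    order = []
    for beg, end in zip(group_ptr[:-1], group_ptr[1:]):
        # Sort only the indices that belong to this segment.
        idx = sorted(range(beg, end), key=lambda i: values[i], reverse=descending)
        order.extend(idx)
    return order
```

Both a ranking objective and a ranking metric need this per-group ordering, which is presumably why sharing it from a common location is useful.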
* - implementation of the MAP ranking algorithm
- also applied the suggestions made in the earlier ranking PRs
- made some performance improvements to the NDCG algorithm as well
This makes GPU Hist robust in a distributed environment, as some workers might not
be associated with any data in either training or evaluation.
* Disable rabit mock test for now: See #5012 .
* Disable dask-cudf test at prediction for now: See #5003
* Launch dask job for all workers even though they might not have any data.
* Check 0 rows in elementwise evaluation metrics.
Using AUC and AUC-PR still throws an error. See #4663 for a robust fix.
* Add tests for edge cases.
* Add `LaunchKernel` wrapper handling zero sized grid.
* Move some parts of allreducer into a cu file.
* Don't validate feature names when the booster is empty.
* Sync number of columns in DMatrix.
This is needed because num_feature is required to be the same across all workers
in data split mode.
* Filtering in the dask interface now by default syncs all boosters that are not
empty, instead of using rank 0.
* Fix Jenkins' GPU tests.
* Install dask-cuda from source in Jenkins' test.
Now all tests are actually running.
* Restore GPU Hist tree synchronization test.
* Check UUID of running devices.
The check is only performed on CUDA version >= 10.x, as 9.x doesn't have the UUID field.
* Fix CMake policy and project variables.
Use xgboost_SOURCE_DIR uniformly, add policy for CMake >= 3.13.
* Fix copying data to CPU
* Fix race condition in cpu predictor.
* Fix duplicated DMatrix construction.
* Don't download extra nccl in CI script.
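To illustrate why an elementwise metric can tolerate workers with 0 rows, a minimal Python sketch: each worker contributes a pair of partial sums, and an empty worker contributes zeros. The list passed to `global_rmse` stands in for an allreduce; the function names and the NaN result when every worker is empty are assumptions, not the behaviour of the code in this change.

```python
import math


def local_sums(labels, preds, weights=None):
    """Per-worker partial sums for weighted RMSE; safe for 0 rows."""
    if weights is None:
        weights = [1.0] * len(labels)
    residue = sum(w * (y - p) ** 2 for y, p, w in zip(labels, preds, weights))
    return residue, sum(weights)


def global_rmse(partials):
    """Combine per-worker (residue, weight) pairs, as an allreduce-sum would."""
    residue = sum(r for r, _ in partials)
    weight = sum(w for _, w in partials)
    if weight == 0:
        return math.nan  # assumed convention when no worker has any data
    return math.sqrt(residue / weight)
```

Because the division happens only after the sums are combined, a worker without data never divides by zero on its own.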
* - pairwise ranking objective implementation on GPU
- there are a couple more algorithms (NDCG and MAP) for which support will be added
in follow-up PRs
- with no label groups defined, computing the gradient is 90x faster on GPU (120m instance
mortgage dataset)
- it can be an order of magnitude faster with ~10 groups (and adequate cores
for the CPU implementation)
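As background for the pairwise objective, a minimal Python sketch of a first-order pairwise logistic gradient computed within groups. It enumerates every pair, which is an assumption made for clarity; the real objective may sample pairs and also needs second-order terms. The name `pairwise_gradients` and the `group_ptr` layout are hypothetical.

```python
import math


def pairwise_gradients(labels, scores, group_ptr):
    """Gradient of sum over pairs (i, j) with labels[i] > labels[j] of
    log(1 + exp(-(scores[i] - scores[j]))), restricted to each group."""
    grad = [0.0] * len(scores)
    for beg, end in zip(group_ptr[:-1], group_ptr[1:]):
        for i in range(beg, end):
            for j in range(beg, end):
                if labels[i] > labels[j]:
                    p = 1.0 / (1.0 + math.exp(-(scores[i] - scores[j])))
                    g = p - 1.0      # d(loss)/d(scores[i]), always <= 0
                    grad[i] += g     # push the better item up
                    grad[j] -= g     # push the worse item down
    return grad
```

With no label groups defined, all rows form a single group, i.e. `group_ptr = [0, len(scores)]`.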
* Add JSON config to rank obj.
* Move get transpose into cc.
* Clean up headers in host device vector, remove thrust dependency.
* Move span and host device vector into public.
* Install c++ headers.
* Short notes for c and c++.
Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
* Initial support for cudf integration.
* Add two C APIs for consuming data and metainfo.
* Add CopyFrom for SimpleCSRSource as a generic function to consume the data.
* Add FromDeviceColumnar for consuming device data.
* Add new MetaInfo::SetInfo for consuming label, weight etc.
* - set the appropriate device before freeing device memory...
- PR #4532 added a global memory tracker/logger to keep track of the number of (de)allocations
and peak memory usage on a per-device basis.
- this PR adds the appropriate check to make sure that the (de)allocation counts and memory usage
make sense for the device, since verbosity is typically increased on debug/non-retail builds.
* - pre-create cub allocators and reuse them
- create them once and do not resize them dynamically. we need to ensure that these allocators
are created and destroyed exactly once so that the appropriate device IDs are set