68 Commits

Author SHA1 Message Date
trivialfis
5a7f7e7d49 Implement devices to devices reshard. (#3721)
* Force clearing device memory before Reshard.
* Remove calculating row_segments for gpu_hist and gpu_sketch.
* Guard against changing device.
2018-09-28 17:40:23 +12:00
trivialfis
9119f9e369 Fix gpu devices. (#3693)
* Fix gpu_set normalized and unnormalized.
* Fix DeviceSpan.
2018-09-19 17:39:42 +12:00
trivialfis
60787ecebc Merge generic device helper functions into gpu set. (#3626)
* Remove the use of old NDevices* functions.
* Use GPUSet in timer.h.
2018-08-26 18:14:23 +12:00
Rory Mitchell
07ff52d54c
Dynamically allocate GPU histogram memory (#3519)
* Expand histogram memory dynamically to prevent large allocations for large tree depths (e.g. > 15)

* Remove GPU memory allocation messages. These are misleading as a large number of allocations are now dynamic.

* Fix appveyor R test
2018-07-28 21:22:41 +12:00
Andy Adinets
cc6a5a3666 Added finding quantiles on GPU. (#3393)
* Added finding quantiles on GPU.

- this includes datasets where weights are assigned to data rows
- as the quantiles found by the new algorithm are not the same
  as those found by the old one, test thresholds in
    tests/python-gpu/test_gpu_updaters.py have been adjusted.

* Adjustments and improved testing for finding quantiles on the GPU.

- added C++ tests for the DeviceSketch() function
- reduced one of the thresholds in test_gpu_updaters.py
- adjusted the cuts found by the find_cuts_k kernel
2018-07-27 14:03:16 +12:00
Thejaswi
2200939416 Upgrading to NCCL2 (#3404)
* Upgrading to NCCL2

* Part - II of NCCL2 upgradation

 - Doc updates to build with nccl2
 - Dockerfile.gpu update for a correct CI build with nccl2
 - Updated FindNccl package to have env-var NCCL_ROOT to take precedence

* Upgrading to v9.2 for CI workflow, since it has the nccl2 binaries available

* Added NCCL2 license + copy the nccl binaries into /usr location for the FindNccl module to find

* Set LD_LIBRARY_PATH variable to pick nccl2 binary at runtime

* Need the nccl2 library download instructions inside Dockerfile.release as well

* Use NCCL2 as a static library
2018-07-10 00:42:15 -07:00
Andy Adinets
286dccb8e8 GPU binning and compression. (#3319)
* GPU binning and compression.

- binning and index compression are done inside the DeviceShard constructor
- in case of a DMatrix with multiple row batches, it is first converted into a single row batch
2018-06-05 17:15:13 +12:00
Thejaswi
d367e4fc6b Fix for issue 3306. (#3324) 2018-05-23 13:42:20 +12:00
Andrew V. Adinetz
b8a0d66fe6 Multi-GPU HostDeviceVector. (#3287)
* Multi-GPU HostDeviceVector.

- HostDeviceVector instances can now span multiple devices, defined by GPUSet struct
- the interface of HostDeviceVector has been modified accordingly
- GPU objective functions are now multi-GPU
- GPU predicting from cache is now multi-GPU
- avoiding omp_set_num_threads() calls
- other minor changes
2018-05-05 08:00:05 +12:00
Rory Mitchell
a185ddfe03
Implement GPU accelerated coordinate descent algorithm (#3178)
* Implement GPU accelerated coordinate descent algorithm. 

* Exclude external memory tests for GPU
2018-04-20 14:56:35 +12:00
Rory Mitchell
ccf80703ef
Clang-tidy static analysis (#3222)
* Clang-tidy static analysis

* Modernise checks

* Google coding standard checks

* Identifier renaming according to Google style
2018-04-19 18:57:13 +12:00
Rory Mitchell
a1ec7b1716
Change reduce operation from thrust to cub. Fix for cuda 9.1 error (#3218)
* Change reduce operation from thrust to cub. Fix for cuda 9.1 runtime error

* Unit test sum reduce
2018-04-04 14:21:48 +12:00
Andrew V. Adinetz
24c2e41287 Fixed the bug with illegal memory access in test_large_sizes.py with 4 GPUs. (#3068)
- thrust::copy() called from dvec::copy() for gpairs invoked a GPU kernel instead of
  cudaMemcpy()
- this resulted in illegal memory access if the GPU running the kernel could not access
  the data being copied
- new version of dvec::copy() for thrust::device_ptr iterators calls cudaMemcpy(),
  avoiding the problem.
2018-02-01 16:54:46 +13:00
Thejaswi
84ab74f3a5 Objective function evaluation on GPU with minimal PCIe transfers (#2935)
* Added GPU objective function and no-copy interface.

- xgboost::HostDeviceVector<T> syncs automatically between host and device
- no-copy interfaces have been added
- default implementations just sync the data to host
  and call the implementations with std::vector
- GPU objective function, predictor, histogram updater process data
  directly on GPU
2018-01-12 21:33:39 +13:00
Rory Mitchell
40c6e2f0c8
Improved gpu_hist_experimental algorithm (#2866)
- Implement colsampling, subsampling for gpu_hist_experimental

 - Optimised multi-GPU implementation for gpu_hist_experimental

 - Make nccl optional

 - Add Volta architecture flag

 - Optimise RegLossObj

 - Add timing utilities for debug verbose mode

 - Bump required cuda version to 8.0
2017-11-11 13:58:40 +13:00
Rory Mitchell
13e7a2cff0 Various bug fixes (#2825)
* Fatal error if GPU algorithm selected without GPU support compiled

* Resolve type conversion warnings

* Fix gpu unit test failure

* Fix compressed iterator edge case

* Fix python unit test failures due to flake8 update on pip
2017-10-25 14:45:01 +13:00
Rory Mitchell
4cb2f7598b -Add experimental GPU algorithm for lossguided mode (#2755)
-Improved GPU algorithm unit tests
-Removed some thrust code to improve compile times
2017-10-01 00:18:35 +13:00
Rory Mitchell
15267eedf2 [GPU-Plugin] Major refactor 2 (#2664)
* Change cmake option

* Move source files

* Move google tests

* Move python tests

* Move benchmarks

* Move documentation

* Remove makefile support

* Fix test run

* Move GPU tests
2017-09-08 09:57:16 +12:00