* Fix#2905
* Fix gpu_exact test failures
* Fix bug in GPU prediction where multiple calls to batch prediction can produce incorrect results
* Fix GPU documentation formatting
- Implement colsampling, subsampling for gpu_hist_experimental
- Optimised multi-GPU implementation for gpu_hist_experimental
- Make nccl optional
- Add Volta architecture flag
- Optimise RegLossObj
- Add timing utilities for debug verbose mode
- Bump required cuda version to 8.0