sriramch 6757654337 Optimizations for quantisation on device (#4572)
* - do not create device vectors for the entire sparse page while computing histograms...
   - while creating the compressed histogram indices, the row vector is created for the entire
     sparse page batch. this is needless as we only process chunks at a time based on a slice
     of the total gpu memory
   - this pr will allocate only as much as required to store the ppropriate row indices and the entries

* - do not dereference row_ptrs once the device_vector has been created to elide host copies of those counts
   - instead, grab the entry counts directly from the sparsepage
2019-06-19 10:50:25 +12:00
..
2019-04-27 13:03:23 +08:00