This helps reduce the memory copying needed for dense data. In addition, it helps reduce memory usage even if external memory is not used.
- Decouple the number of symbols needed in the compressor from the number of features when the data is dense.
- Remove the fetch call in the `at_end_` iteration.
- Reduce synchronization and kernel launches by using `uvector` and the context.
- Remove the calculation of n_symbols in the accessor.
- Pack initialization steps into the parameter list.
- Pass the context into various ctors.
- Add a specialization for dense data to prepare for further compression.
- Support resource view in ellpack.
- Define the CUDA version of MMAP resource.
- Define the CUDA version of malloc resource.
- Refactor CUDA runtime API wrappers, and add wrappers related to memory access.
- Gather Windows macros into a single header.
- Pass context from booster to DMatrix.
- Use context instead of integer for `n_threads`.
- Check the consistency of the `max_bin` configuration.
- Test for all combinations of initialization options.
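
To illustrate why decoupling the symbol count from the number of features can matter for dense data, here is a minimal sketch. It assumes that sparse data stores a global bin index (so the alphabet grows with the total number of bins plus one null symbol) while dense data can store a feature-local bin index (so the alphabet is bounded by the widest feature). The helper names `SymbolBits`, `SparseSymbols`, and `DenseSymbols` are hypothetical and are not taken from the code base.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of bits needed to encode values in [0, n_symbols).
inline std::uint32_t SymbolBits(std::size_t n_symbols) {
  std::uint32_t bits = 1;
  while ((std::size_t{1} << bits) < n_symbols) {
    ++bits;
  }
  return bits;
}

// Assumption: sparse data stores a global bin index, so the alphabet is the
// total number of bins across all features plus one symbol for missing values.
inline std::size_t SparseSymbols(std::vector<std::size_t> const& bins_per_feature) {
  std::size_t total = 0;
  for (auto n : bins_per_feature) {
    total += n;
  }
  return total + 1;
}

// Assumption: dense data stores a feature-local bin index, so the alphabet is
// bounded by the feature with the most bins, independent of the feature count.
inline std::size_t DenseSymbols(std::vector<std::size_t> const& bins_per_feature) {
  if (bins_per_feature.empty()) {
    return 0;
  }
  return *std::max_element(bins_per_feature.begin(), bins_per_feature.end());
}
```

Under these assumptions, 100 features with 256 bins each need 15 bits per symbol in the sparse layout but only 8 bits in the dense layout, which is where the reduction in memory usage would come from.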