* Fix CPU bin compression with categorical data.
* The bug causes the maximum category to be lesser than 256 or the maximum number of bins when
the input data is dense.
* Split up column matrix initialization.
This PR splits the column matrix initialization into 2 steps, the first one initializes
the storage while the second one does the transpose. By doing so, we can reuse the code
for Quantile DMatrix.