Following classes are added to support dataframe in java binding:
- `Column` is an abstract type for a single column in tabular data.
- `ColumnBatch` is an abstract type for dataframe.
- `CuDFColumn` is an implementaiton of `Column` that consume cuDF column
- `CudfColumnBatch` is an implementation of `ColumnBatch` that consumes cuDF dataframe.
- `DeviceQuantileDMatrix` is the interface for quantized data.
The Java implementation mimics the Python interface and uses `__cuda_array_interface__` protocol for memory indexing. One difference is on JVM package, the data batch is staged on the host as java iterators cannot be reset.
Co-authored-by: jiamingy <jm.yuan@outlook.com>
* Support more input types for categorical data.
* Shorten the type name from "categorical" to "c".
* Tests for np/cp array and scipy csr/csc/coo.
* Specify the type for feature info.
* Add hessian to batch param in preparation of new approx impl.
* Extract a push method for gradient index matrix.
* Use span instead of vector ref for hessian in sketching.
* Create a binary format for gradient index.
On GPU we use rouding factor to truncate the gradient for deterministic results. This PR changes the gradient representation to fixed point number with exponent aligned with rounding factor.
[breaking] Drop non-deterministic histogram.
Use fixed point for shared memory.
This PR is to improve the performance of GPU Hist.
Co-authored-by: Andy Adinets <aadinets@nvidia.com>
* [CI] Automatically build GPU-enabled R package for Windows
* Update Jenkinsfile-win64
* Build R package for the release branch only
* Update install doc
Fix bug introduced in 17913713b554d820a8ce94226d854b4a5f1d8bbc (allow loading from byte array)
When loading model from stream, only last buffer read from the input stream is used to construct the model.
This may work for models smaller than 1 MiB (if you are lucky enough to read the whole model at once), but will always fail if the model is larger.