* [jvm-packages] move the dmatrix building into rabit context (#7823)
This fixes the DeviceQuantileDMatrix in a distributed environment.
* [doc] update the jvm tutorial to 1.6.1 [skip ci] (#7834)
* [Breaking][jvm-packages] Use barrier execution mode (#7836)
With the introduction of the barrier execution mode, we no longer need to kill the SparkContext when some XGBoost tasks fail. Instead, Spark handles the errors for us. This PR therefore removes the `killSparkContextOnWorkerFailure` parameter.
* [doc] remove the doc about killing SparkContext [skip ci] (#7840)
* [jvm-packages] remove the coalesce in barrier mode (#7846)
* [jvm-packages] Fix model compatibility (#7845)
* Ignore all Java exceptions when looking for Linux musl support (#7844)
Co-authored-by: Bobby Wang <wbo4958@gmail.com>
Co-authored-by: Michael Allman <msa@allman.ms>
* [jvm-packages] unify setFeaturesCol API for XGBoostRegressor (#7784)
* [jvm-packages] add doc for xgboost4j-spark-gpu (#7779)
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
* [jvm-packages] remove the dep of com.fasterxml.jackson (#7791)
* [jvm-packages] xgboost4j-spark should work when featuresCols is specified (#7789)
Co-authored-by: Bobby Wang <wbo4958@gmail.com>
* Extend array interface to handle ndarray.
The `ArrayInterface` class is extended to support multi-dim array inputs. Previously this
class handled only 2-dim input (a vector was also treated as a matrix). This PR specifies the
expected dimension at compile-time, so the array interface can perform various checks
automatically on input data. Also, adapters like CSR are more rigorous about their input.
Lastly, row vectors and column vectors are handled without intervention from the caller.
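As a rough illustration of the kind of shape check described above, here is a minimal Java sketch. It is not the actual implementation: the real `ArrayInterface` is a C++ class that fixes the expected dimension at compile time, whereas this sketch checks at runtime. The class `ShapeCheck`, the method `normalize`, and the exact acceptance rules are hypothetical and assumed for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration only; the real ArrayInterface is a C++ class and
// fixes the expected dimension at compile time rather than at runtime.
final class ShapeCheck {
  // Returns the shape normalized to `expectedDim` dimensions, or throws if the
  // input cannot be interpreted that way.
  static long[] normalize(long[] shape, int expectedDim) {
    if (shape.length == expectedDim) {
      return shape.clone();
    }
    // Assumed rule: a (1, n) row vector or an (n, 1) column vector is accepted
    // where a 1-dim array is expected.
    if (expectedDim == 1 && shape.length == 2 && (shape[0] == 1 || shape[1] == 1)) {
      return new long[] {shape[0] == 1 ? shape[1] : shape[0]};
    }
    // Assumed rule: a 1-dim vector of length n is accepted as an (n, 1) matrix.
    if (expectedDim == 2 && shape.length == 1) {
      return new long[] {shape[0], 1};
    }
    throw new IllegalArgumentException(
        "Expected " + expectedDim + " dimension(s), got shape " + Arrays.toString(shape));
  }

  public static void main(String[] args) {
    // A row vector passed where a 1-dim array is expected needs no reshaping
    // by the caller.
    System.out.println(Arrays.toString(normalize(new long[] {1, 5}, 1)));
    // A genuine matrix passed where a vector is expected is rejected.
    try {
      normalize(new long[] {2, 3}, 1);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The point of the sketch is only that the caller states the expected dimension once and the check, including the vector cases, happens in one place.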
The following classes are added to support dataframes in the Java binding:
- `Column` is an abstract type for a single column in tabular data.
- `ColumnBatch` is an abstract type for a dataframe.
- `CuDFColumn` is an implementation of `Column` that consumes a cuDF column.
- `CudfColumnBatch` is an implementation of `ColumnBatch` that consumes a cuDF dataframe.
- `DeviceQuantileDMatrix` is the interface for quantized data.
The Java implementation mimics the Python interface and uses the `__cuda_array_interface__` protocol for memory indexing. One difference is that in the JVM package the data batches are staged on the host, because Java iterators cannot be reset.
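To illustrate why staging is needed, the sketch below drains a one-shot `java.util.Iterator` into a list so the batches can be traversed again. It is a minimal sketch using only the standard library: the class `StagedBatches` is hypothetical and is not part of xgboost4j, and plain `int[]` arrays stand in for real column batches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of xgboost4j: copies a one-shot iterator into a
// list so the batches can be traversed more than once.
final class StagedBatches<T> implements Iterable<T> {
  private final List<T> staged = new ArrayList<>();

  StagedBatches(Iterator<T> source) {
    // Drain the source exactly once; afterwards the source is exhausted.
    while (source.hasNext()) {
      staged.add(source.next());
    }
  }

  // Each call returns a fresh read-only iterator over the staged batches,
  // which is the "reset" that a plain java.util.Iterator cannot provide.
  @Override
  public Iterator<T> iterator() {
    return Collections.unmodifiableList(staged).iterator();
  }

  int size() {
    return staged.size();
  }

  public static void main(String[] args) {
    // Stand-in for a stream of data batches that can only be read once.
    Iterator<int[]> oneShot = Arrays.asList(new int[] {1, 2}, new int[] {3}).iterator();
    StagedBatches<int[]> batches = new StagedBatches<>(oneShot);
    // Two passes over the same staged data.
    for (int pass = 0; pass < 2; pass++) {
      int values = 0;
      for (int[] batch : batches) {
        values += batch.length;
      }
      System.out.println("pass " + pass + ": " + values + " values");
    }
  }
}
```

The trade-off is memory: every batch stays resident on the host until the consumer is done, in exchange for being able to iterate more than once.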
Co-authored-by: jiamingy <jm.yuan@outlook.com>
* [CI] Clean up build for JVM packages
* Use correct path for saving native lib
* Fix groupId of maven-surefire-plugin
* Fix stashing of xgboost4j_jar_gpu
* [CI] Don't run xgboost4j-tester with GPU, since it doesn't use gpu_hist