* [jvm-packages] move the dmatrix building into rabit context (#7823)
This fixes the QuantileDeviceDMatrix in distributed environment.
* [doc] update the jvm tutorial to 1.6.1 [skip ci] (#7834)
* [Breaking][jvm-packages] Use barrier execution mode (#7836)
With the introduction of the barrier execution mode. we don't need to kill SparkContext when some xgboost tasks failed. Instead, Spark will handle the errors for us. So in this PR, `killSparkContextOnWorkerFailure` parameter is deleted.
* [doc] remove the doc about killing SparkContext [skip ci] (#7840)
* [jvm-package] remove the coalesce in barrier mode (#7846)
* [jvm-packages] Fix model compatibility (#7845)
* Ignore all Java exceptions when looking for Linux musl support (#7844)
Co-authored-by: Bobby Wang <wbo4958@gmail.com>
Co-authored-by: Michael Allman <msa@allman.ms>
* [jvm-packages] add hostIp and python exec for rabit tracker (#7808)
* Fix training continuation with categorical model. (#7810)
* Make sure the task is initialized before construction of tree updater.
This is a quick fix meant to be backported to 1.6, for a full fix we should pass the model
param into tree updater by reference instead.
Co-authored-by: Bobby Wang <wbo4958@gmail.com>
* [jvm-packages] unify setFeaturesCol API for XGBoostRegressor (#7784)
* [jvm-packages] add doc for xgboost4j-spark-gpu (#7779)
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
* [jvm-packages] remove the dep of com.fasterxml.jackson (#7791)
* [jvm-packages] xgboost4j-spark should work when featuresCols is specified (#7789)
Co-authored-by: Bobby Wang <wbo4958@gmail.com>
xgboost4j-spark provides 2 sets of API for setting features, one for CPU, another for GPU, which may cause confusion.
This PR removes the GPU API and adds an override CPU function setFeaturesCol to accept Array[String] parameters.
This PR rewrites the approx tree method to use codebase from hist for better performance and code sharing.
The rewrite has many benefits:
- Support for both `max_leaves` and `max_depth`.
- Support for `grow_policy`.
- Support for mono constraint.
- Support for feature weights.
- Support for easier bin configuration (`max_bin`).
- Support for categorical data.
- Faster performance for most of the datasets. (many times faster)
- Support for prediction cache.
- Significantly better performance for external memory.
- Unites the code base between approx and hist.
* Extend array interface to handle ndarray.
The `ArrayInterface` class is extended to support multi-dim array inputs. Previously this
class handles only 2-dim (vector is also matrix). This PR specifies the expected
dimension at compile-time and the array interface can perform various checks automatically
for input data. Also, adapters like CSR are more rigorous about their input. Lastly, row
vector and column vector are handled without intervention from the caller.
Spark 3.2 depends on 3.7.0-M11 which has changed some implicited functions'
signatures. And it will result the xgboost4j built against spark 3.0/3.1
failed when saving the model.
Following classes are added to support dataframe in java binding:
- `Column` is an abstract type for a single column in tabular data.
- `ColumnBatch` is an abstract type for dataframe.
- `CuDFColumn` is an implementaiton of `Column` that consume cuDF column
- `CudfColumnBatch` is an implementation of `ColumnBatch` that consumes cuDF dataframe.
- `DeviceQuantileDMatrix` is the interface for quantized data.
The Java implementation mimics the Python interface and uses `__cuda_array_interface__` protocol for memory indexing. One difference is on JVM package, the data batch is staged on the host as java iterators cannot be reset.
Co-authored-by: jiamingy <jm.yuan@outlook.com>
Fix bug introduced in 17913713b554d820a8ce94226d854b4a5f1d8bbc (allow loading from byte array)
When loading model from stream, only last buffer read from the input stream is used to construct the model.
This may work for models smaller than 1 MiB (if you are lucky enough to read the whole model at once), but will always fail if the model is larger.
* Add feature score support for linear model.
* Port R interface to the new implementation.
* Add linear model support in Python.
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
* Add `XGBOOST_RABIT_TRACKER_IP_FOR_TEST` to set rabit tracker IP
* change spark and rabit tracker IP to 127.0.0.1on GitHub Action.
Co-authored-by: fis <jm.yuan@outlook.com>
* Add ability to load booster direct from byte array
* fix compiler error
* move InputStream to byte-buffer conversion
- move it from Booster to XGBoost facade class
* [java] extending the library loader to use both OS and CPU architecture.
* Simplifying create_jni.py's architecture detection.
* Tidying up the architecture detection in create_jni.py