This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features:
- Federated learning for both CPU and GPU.
- NCCL.
- More data types.
- A unified interface for all the underlying implementations.
- Improved timeout handling for both tracker and workers.
- Exhausted tests with metrics (fixed a couple of bugs along the way).
- A reusable tracker for Python and JVM packages.
- Use the array interface internally.
- Deprecate `XGDMatrixSetDenseInfo`.
- Deprecate `XGDMatrixSetUIntInfo`.
- Move the handling of `DataType` into the deprecated C function.
---------
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
- Add `numBoostedRound` to jvm packages
- Remove rabit checkpoint version.
- Change the starting version of training continuation in JVM [breaking].
- Redefine the checkpoint version policy in jvm package. [breaking]
- Rename the Python check point callback parameter. [breaking]
- Unifies the checkpoint policy between Python and JVM.
---------
Co-authored-by: Stephan T. Lavavej <stl@nuwen.net>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
Co-authored-by: Joe <25804777+ByteSizedJoe@users.noreply.github.com>
- Use the `linalg::Matrix` for storing gradients.
- New API for the custom objective.
- Custom objective for multi-class/multi-target is now required to return the correct shape.
- Custom objective for Python can accept arrays with any strides. (row-major, column-major)
* 1. Add parameters to set feature names and feature types
2. Save feature names and feature types to native json model
* Change serialization and deserialization format to ubj.
Previously, we use `libsvm` as default when format is not specified. However, the dmlc
data parser is not particularly robust against errors, and the most common type of error
is undefined format.
Along with which, we will recommend users to use other data loader instead. We will
continue the maintenance of the parsers as it's currently used for many internal tests
including federated learning.
* Use array interface for CSC matrix.
Use array interface for CSC matrix and align the interface with CSR and dense.
- Fix nthread issue in the R package DMatrix.
- Unify the behavior of handling `missing` with other inputs.
- Unify the behavior of handling `missing` around R, Python, Java, and Scala DMatrix.
- Expose `num_non_missing` to the JVM interface.
- Deprecate old CSR and CSC constructors.
This PR removes auto-detection of MUSL-based Linux systems in favor of system properties the user can set to configure a specific path for a native library.
Following classes are added to support dataframe in java binding:
- `Column` is an abstract type for a single column in tabular data.
- `ColumnBatch` is an abstract type for dataframe.
- `CuDFColumn` is an implementaiton of `Column` that consume cuDF column
- `CudfColumnBatch` is an implementation of `ColumnBatch` that consumes cuDF dataframe.
- `DeviceQuantileDMatrix` is the interface for quantized data.
The Java implementation mimics the Python interface and uses `__cuda_array_interface__` protocol for memory indexing. One difference is on JVM package, the data batch is staged on the host as java iterators cannot be reset.
Co-authored-by: jiamingy <jm.yuan@outlook.com>
Fix bug introduced in 17913713b554d820a8ce94226d854b4a5f1d8bbc (allow loading from byte array)
When loading model from stream, only last buffer read from the input stream is used to construct the model.
This may work for models smaller than 1 MiB (if you are lucky enough to read the whole model at once), but will always fail if the model is larger.
* Add feature score support for linear model.
* Port R interface to the new implementation.
* Add linear model support in Python.
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
* Add `XGBOOST_RABIT_TRACKER_IP_FOR_TEST` to set rabit tracker IP
* change spark and rabit tracker IP to 127.0.0.1on GitHub Action.
Co-authored-by: fis <jm.yuan@outlook.com>
* Add ability to load booster direct from byte array
* fix compiler error
* move InputStream to byte-buffer conversion
- move it from Booster to XGBoost facade class
* [java] extending the library loader to use both OS and CPU architecture.
* Simplifying create_jni.py's architecture detection.
* Tidying up the architecture detection in create_jni.py