28 Commits

Author SHA1 Message Date
Bobby Wang
7949a8d5f4
[jvm-packages] support missing value when constructing dmatrix with iterator (#10628) 2024-07-23 23:25:07 +08:00
Jiaming Yuan
a5a58102e5
Revamp the rabit implementation. (#10112)
This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features:
- Federated learning for both CPU and GPU.
- NCCL.
- More data types.
- A unified interface for all the underlying implementations.
- Improved timeout handling for both tracker and workers.
- Exhausted tests with metrics (fixed a couple of bugs along the way).
- A reusable tracker for Python and JVM packages.
2024-05-20 11:56:23 +08:00
Jiaming Yuan
c75a3bc0a9
[breaking] [jvm-packages] Remove rabit check point. (#9599)
- Add `numBoostedRound` to jvm packages
- Remove rabit checkpoint version.
- Change the starting version of training continuation in JVM [breaking].
- Redefine the checkpoint version policy in jvm package. [breaking]
- Rename the Python check point callback parameter. [breaking]
- Unifies the checkpoint policy between Python and JVM.
2023-09-26 18:06:34 +08:00
Jon Yoquinto
d05ea589fb
Allow JVM-Package to access inplace predict method (#9167)
---------

Co-authored-by: Stephan T. Lavavej <stl@nuwen.net>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
Co-authored-by: Joe <25804777+ByteSizedJoe@users.noreply.github.com>
2023-09-12 07:29:51 +08:00
Jiaming Yuan
972730cde0
Use matrix for gradient. (#9508)
- Use the `linalg::Matrix` for storing gradients.
- New API for the custom objective.
- Custom objective for multi-class/multi-target is now required to return the correct shape.
- Custom objective for Python can accept arrays with any strides. (row-major, column-major)
2023-08-24 05:29:52 +08:00
jinmfeng001
a1367ea1f8
Set feature_names and feature_types in jvm-packages (#9364)
* 1. Add parameters to set feature names and feature types
2. Save feature names and feature types to native json model

* Change serialization and deserialization format to ubj.
2023-07-12 15:18:46 +08:00
Jiaming Yuan
c1786849e3
Use array interface for CSC matrix. (#8672)
* Use array interface for CSC matrix.

Use array interface for CSC matrix and align the interface with CSR and dense.

- Fix nthread issue in the R package DMatrix.
- Unify the behavior of handling `missing` with other inputs.
- Unify the behavior of handling `missing` around R, Python, Java, and Scala DMatrix.
- Expose `num_non_missing` to the JVM interface.
- Deprecate old CSR and CSC constructors.
2023-02-05 01:59:46 +08:00
Bobby Wang
f1e9bbcee5
[breakinig] [jvm-packages] change DeviceQuantileDmatrix into QuantileDMatrix (#8461) 2022-12-05 12:23:21 +08:00
Rong Ou
668b8a0ea4
[Breaking] Switch from rabit to the collective communicator (#8257)
* Switch from rabit to the collective communicator

* fix size_t specialization

* really fix size_t

* try again

* add include

* more include

* fix lint errors

* remove rabit includes

* fix pylint error

* return dict from communicator context

* fix communicator shutdown

* fix dask test

* reset communicator mocklist

* fix distributed tests

* do not save device communicator

* fix jvm gpu tests

* add python test for federated communicator

* Update gputreeshap submodule

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 14:39:01 -08:00
Rong Ou
7d43e74e71
JNI wrapper for the collective communicator (#8242) 2022-09-21 04:20:25 +08:00
Bobby Wang
78694405a6
[jvm-packages] add jni for setting feature name and type (#7966) 2022-06-03 11:09:48 +08:00
Jiaming Yuan
ac7a36367c
[jvm-packages] Implement new save_raw in jvm-packages. (#7570)
* New `toByteArray` that accepts a parameter for format.
2022-01-19 16:00:14 +08:00
Jiaming Yuan
ed95e77752
[jvm-packages] Update JNI header. (#7550) 2022-01-10 14:59:40 +08:00
Bobby Wang
0ee11dac77
[jvm-packages][xgboost4j-gpu] Support GPU dataframe and DeviceQuantileDMatrix (#7195)
Following classes are added to support dataframe in java binding:

- `Column` is an abstract type for a single column in tabular data.
- `ColumnBatch` is an abstract type for dataframe.

- `CuDFColumn` is an implementaiton of `Column` that consume cuDF column
- `CudfColumnBatch` is an implementation of `ColumnBatch` that consumes cuDF dataframe.

- `DeviceQuantileDMatrix` is the interface for quantized data.

The Java implementation mimics the Python interface and uses `__cuda_array_interface__` protocol for memory indexing.  One difference is on JVM package, the data batch is staged on the host as java iterators cannot be reset.

Co-authored-by: jiamingy <jm.yuan@outlook.com>
2021-09-24 14:25:00 +08:00
Hristo Iliev
da61d9460b
[jvm-packages] Add getNumFeature method (#6075)
* Add getNumFeature to the Java API
* Add getNumFeature to the Scala API
* Add unit tests for getNumFeature

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-09-07 20:57:46 -07:00
Honza Sterba
22209b7b95 [jvm-packages] Add BigDenseMatrix (#4383)
* Add BigDenseMatrix

* ability to create DMatrix with bigger than Integer.MAX_VALUE size arrays
* uses sun.misc.Unsafe

* make DMatrix test work from a jar as well
2019-09-18 20:46:14 -07:00
Jiaming Yuan
d669ea1eaa
Deprecate set group (#4864)
* Convert jvm package and R package.

* Restore for compatibility.
2019-09-17 21:26:54 -04:00
Xu Xiao
60a9af567c [jvm-packages] Add methods operating attributes of booster in jvm package, which follow API design in python package. (#4336) 2019-04-08 11:00:35 -07:00
zengxy
9e73087324 [jvm-packages] support specified feature names when getModelDump and getFeatureScore (#3733)
* [jvm-packages] support specified feature names for jvm when get ModelDump and get FeatureScore (#3725)

* typo and style fix
2018-10-04 09:05:42 -07:00
ebernhardson
d3b866e3fd [jvm-packages] Expose json formatted booster dumps (#2233) (#2234)
* Change Booster dump from XGBoosterDumpModel to XGBoosterDumpModelEx

Allows exposing multiple formatting options of model dumping.
2017-04-29 20:23:09 -07:00
Xin Yin
e7fbc8591f [jvm-packages] Scala implementation of the Rabit tracker. (#1612)
* [jvm-packages] Scala implementation of the Rabit tracker.

A Scala implementation of RabitTracker that is interface-interchangable with the
Java implementation, ported from `tracker.py` in the
[dmlc-core project](https://github.com/dmlc/dmlc-core).

* [jvm-packages] Updated Akka dependency in pom.xml.

* Refactored the RabitTracker directory structure.

* Fixed premature stopping of connection handler.

Added a new finite state "AwaitingPortNumber" to explicitly wait for the
worker to send the port, and close the connection. Stopping the actor
prematurely sends a TCP RST to the worker, causing the worker to crash
on AssertionError.

* Added interface IRabitTracker so that user can switch implementations.

* Default timeout duration changes.

* Dependency for Akka tests.

* Removed the main function of RabitTracker.

* A skeleton for testing Akka-based Rabit tracker.

* waitFor() in RabitTracker no longer throws exceptions.

* Completed unit test for the 'start' command of Rabit tracker.

* Preliminary support for Rabit Allreduce via JNI (no prepare function support yet.)

* Fixed the default timeout duration.

* Use Java container to avoid serialization issues due to intermediate wrappers.

* Added tests for Allreduce/model training using Scala Rabit tracker.

* Added spill-over unit test for the Scala Rabit tracker.

* Fixed a typo.

* Overhaul of RabitTracker interface per code review.

  - Removed methods start() waitFor() (no arguments) from IRabitTracker.
  - The timeout in start(timeout) is now worker connection timeout, as tcp
    socket binding timeout is less intuitive.
  - Dropped time unit from start(...) and waitFor(...) methods; the default
    time unit is millisecond.
  - Moved random port number generation into the RabitTrackerHandler.
  - Moved all Rabit-related classes to package ml.dmlc.xgboost4j.scala.rabit.

* More code refactoring and comments.

* Unified timeout constants. Readable tracker status code.

* Add comments to indicate that allReduce is for tests only. Removed all other variants.

* Removed unused imports.

* Simplified signatures of training methods.

 - Moved TrackerConf into parameter map.
 - Changed GeneralParams so that TrackerConf becomes a standalone parameter.
 - Updated test cases accordingly.

* Changed monitoring strategies.

* Reverted monitoring changes.

* Update test case for Rabit AllReduce.

* Mix in UncaughtExceptionHandler into IRabitTracker to prevent tracker from hanging due to exceptions thrown by workers.

* More comprehensive test cases for exception handling and worker connection timeout.

* Handle executor loss due to unknown cause: the newly spawned executor will attempt to connect to the tracker. Interrupt tracker in such case.

* Per code-review, removed training timeout from TrackerConf. Timeout logic must be implemented explicitly and externally in the driver code.

* Reverted scalastyle-config changes.

* Visibility scope change. Interface tweaks.

* Use match pattern to handle tracker_conf parameter.

* Minor clarification in JNI code.

* Clearer intent in match pattern to suppress warnings.

* Removed Future from constructor. Block in start() and waitFor() instead.

* Revert inadvertent comment changes.

* Removed debugging information.

* Updated test cases that are a bit finicky.

* Added comments on the reasoning behind the unit tests for testing Rabit tracker robustness.
2016-12-07 06:35:42 -08:00
Nan Zhu
37bc122c90 [jvm-packages] Robust dmatrix creation (#1613)
* add back train method but mark as deprecated

* robust matrix creation in jvm
2016-09-26 13:35:04 -04:00
tqchen
e8560c7909 [refactor] move java package to namespace java 2016-03-05 14:04:13 -05:00
tqchen
86871d4be9 [JVM] Add Iterator loading API 2016-03-04 17:37:46 -08:00
tqchen
e80d3db64b [DIST] Enable multiple thread and tracker, make rabit and xgboost more thread-safe by using thread local variables. 2016-03-03 20:36:14 -08:00
tqchen
5c9e50148a [JVM-PKG] Update JNI to include rabit codes 2016-03-02 22:12:17 -08:00
CodingCat
f8fff6c6fc rename files/packages 2016-03-01 23:48:35 -05:00
CodingCat
3b246c2420 re-structure Java API, add Scala API and consolidate the names of Java/Scala API 2016-03-01 20:53:41 -05:00