58 Commits

Author SHA1 Message Date
Bobby Wang
932d7201f9
[jvm-packages] refine tracker (#10313)
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2024-05-23 12:46:21 +08:00
Jiaming Yuan
a5a58102e5
Revamp the rabit implementation. (#10112)
This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new implementation features:
- Federated learning for both CPU and GPU.
- NCCL support.
- More data types.
- A unified interface for all the underlying implementations.
- Improved timeout handling for both the tracker and the workers.
- Exhaustive tests with metrics (fixed a couple of bugs along the way).
- A reusable tracker for the Python and JVM packages.
2024-05-20 11:56:23 +08:00
Philip Hyunsu Cho
1168a68872
[jvm-packages] Update release scripts (#9983)
* [jvm-packages] Add Scala version suffix to xgboost-jvm package (#9776)

* Update JVM script (#9714)

* Revamp pom.xml

* Update instructions in prepare_jvm_release.py

* Fix formatting

* [jvm-packages] Fix POM for xgboost-jvm metapackage (#9893)

* [jvm-packages] Fix POM for xgboost-jvm metapackage

* Add script for updating the Scala version

* Update change_scala_version.py to also change scala.version property (#9897)

* Remove 'release-cpu-only' profile

* Remove scala-2.13 profile; enable gpu package for Scala 2.13
2024-01-12 10:37:55 -08:00
Jiaming Yuan
58530b1bc4
Bump version to 2.1. (#9498) 2023-08-18 01:04:04 +08:00
Boris
a01df102c9
Scala 2.13 support. (#9099)
1. Updated the test logic
2. Added smoke tests for Spark examples.
3. Added integration tests for Spark with Scala 2.13
2023-05-27 19:34:02 +08:00
Boris
0e7377ba9c
Updated flink 1.8 -> 1.17. Added smoke tests for Flink (#9046) 2023-04-26 18:41:11 +08:00
dependabot[bot]
cff50fe3ef
Bump hadoop.version from 3.3.4 to 3.3.5 in /jvm-packages (#8962)
Bumps `hadoop.version` from 3.3.4 to 3.3.5.

Updates `hadoop-hdfs` from 3.3.4 to 3.3.5

Updates `hadoop-common` from 3.3.4 to 3.3.5

---
updated-dependencies:
- dependency-name: org.apache.hadoop:hadoop-hdfs
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: org.apache.hadoop:hadoop-common
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-23 16:12:04 +08:00
dependabot[bot]
85c3334c2b
Bump hadoop-common from 3.2.4 to 3.3.4 in /jvm-packages (#8882)
Bumps hadoop-common from 3.2.4 to 3.3.4.

---
updated-dependencies:
- dependency-name: org.apache.hadoop:hadoop-common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-08 13:15:39 +08:00
dependabot[bot]
4a99c9bdb8
Bump commons-lang3 from 3.9 to 3.12.0 in /jvm-packages (#8548)
Bumps commons-lang3 from 3.9 to 3.12.0.

---
updated-dependencies:
- dependency-name: org.apache.commons:commons-lang3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-06 20:13:46 +08:00
Philip Hyunsu Cho
2546d139d6
[jvm-packages] Add missing commons-lang3 dependency to xgboost4j-gpu (#8508)
* [jvm-packages] Add missing commons-lang3 dependency to xgboost4j-gpu

* Update commons-lang3
2022-12-01 16:27:11 -08:00
Jiaming Yuan
f73520bfff
Bump development version to 2.0. (#8390) 2022-10-28 15:21:19 +08:00
Rong Ou
668b8a0ea4
[Breaking] Switch from rabit to the collective communicator (#8257)
* Switch from rabit to the collective communicator

* fix size_t specialization

* really fix size_t

* try again

* add include

* more include

* fix lint errors

* remove rabit includes

* fix pylint error

* return dict from communicator context

* fix communicator shutdown

* fix dask test

* reset communicator mocklist

* fix distributed tests

* do not save device communicator

* fix jvm gpu tests

* add python test for federated communicator

* Update gputreeshap submodule

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 14:39:01 -08:00
Jiaming Yuan
f835368bcf
Mark next release as 1.7 instead of 2.0 (#8281) 2022-09-28 14:33:37 +08:00
dependabot[bot]
93966b0d19
Bump hadoop-common from 3.2.3 to 3.2.4 in /jvm-packages/xgboost4j-flink (#8157)
Bumps hadoop-common from 3.2.3 to 3.2.4.

---
updated-dependencies:
- dependency-name: org.apache.hadoop:hadoop-common
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-15 06:47:27 -08:00
dependabot[bot]
1bb1913811
Bump hadoop-common from 2.10.1 to 3.2.3 in /jvm-packages/xgboost4j-flink (#7801)
Bumps hadoop-common from 2.10.1 to 3.2.3.

---
updated-dependencies:
- dependency-name: org.apache.hadoop:hadoop-common
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-13 22:24:44 +08:00
Jiaming Yuan
522636cb52
Bump version. (#7769) 2022-03-31 06:33:22 +08:00
dependabot[bot]
87c01f49d8
Bump hadoop-common from 2.7.3 to 2.10.1 in /jvm-packages/xgboost4j-flink (#7641)
Bumps hadoop-common from 2.7.3 to 2.10.1.

---
updated-dependencies:
- dependency-name: org.apache.hadoop:hadoop-common
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-09 17:07:35 -08:00
Jiaming Yuan
f7caac2563
Bump version to 1.6.0 in master. (#7259) 2021-10-07 16:09:26 +08:00
Jiaming Yuan
146549260a
Bump version to 1.5.0 snapshot in master. (#6875) 2021-04-22 01:53:44 +08:00
Philip Hyunsu Cho
0d483cb7c1
Bump version to 1.4.0 snapshot in master (#6486) 2020-12-10 07:38:08 -08:00
Philip Hyunsu Cho
b3193052b3
Bump version to 1.3.0 snapshot in master (#6052) 2020-08-23 17:13:46 -07:00
Philip Hyunsu Cho
073b625bde
Bump version to 1.2.0 snapshot in master (#5733) 2020-05-31 00:11:34 -07:00
Bobby Wang
ad826e913f
[jvm-packages] add feature size for LabeledPoint and DataBatch (#5303)
* fix type error

* Validate number of features.

* resolve comments

* add feature size for LabeledPoint and DataBatch

* pass the feature size to native

* move feature size validating tests into a separate suite

* resolve comments

Co-authored-by: fis <jm.yuan@outlook.com>
2020-04-07 16:49:52 -07:00
Philip Hyunsu Cho
7ac7e8778f
Port patches from 1.0.0 branch (#5336)
* Remove f-string, since it's not supported by Python 3.5 (#5330)

* Remove f-string, since it's not supported by Python 3.5

* Add Python 3.5 to CI, to ensure compatibility

* Remove duplicated matplotlib

* Show deprecation notice for Python 3.5

* Fix lint

* Fix lint

* Fix a unit test that mistook MINOR ver for PATCH ver

* Enforce only major version in JSON model schema

* Bump version to 1.1.0-SNAPSHOT
2020-02-21 13:13:21 -08:00
Jiaming Yuan
010b8f1428 Revert "[jvm-packages] update rabit, surface new changes to spark, add parity and failure tests (#4876)" (#4965)
This reverts commit 86ed01c4bbecef66e1bc4d02fb13116bd6130fae.
2019-10-18 14:02:35 -07:00
Chen Qin
86ed01c4bb [jvm-packages] update rabit, surface new changes to spark, add parity and failure tests (#4876)
* Expose sets of rabit configurations to spark layer
2019-10-18 15:07:31 -04:00
Nan Zhu
1595e3f57b
upgrade version num (#4670)
* upgrade version num

* missing changes

* fix version script

* change versions

* rm files

* Update CMakeLists.txt
2019-07-17 15:25:35 -07:00
koertkuipers
3c506b076e [jvm-packages] upgrade to Scala 2.12 (#4574)
* bump Scala to 2.12, which requires Java 8 and also newer Flink and Akka

* put scala version in artifactId

* fix appveyor

* fix for scaladoc issue that looks like https://github.com/scala/bug/issues/10509

* fix ci_build

* update versions in generate_pom.py

* fix generate_pom.py

* Apache does not yet provide a Spark 2.4.3 distribution built with Scala 2.12, so for now use a tgz uploaded to S3

* Upload spark-2.4.3-bin-scala2.12-hadoop2.7.tgz to our own S3

* Update Dockerfile.jvm_cross

* Update Dockerfile.jvm_cross
2019-07-16 08:43:34 -07:00
Philip Hyunsu Cho
515f5f5c47
[RFC] Version 0.90 release candidate (#4475)
* Release 0.90

* Add script to automatically generate acknowledgment

* Update NEWS.md
2019-05-20 01:02:44 -07:00
Nan Zhu
5f34078fba
[jvm-packages] bump version for master (#4209)
* update version

* bump version
2019-03-04 23:12:24 -08:00
Nan Zhu
dc2bfbfde1
[jvm-packages] update version to 0.82-SNAPSHOT (#3920)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* fix scalastyle error

* fix scalastyle error

* update version

* 0.82
2018-11-18 16:47:48 -08:00
Philip Hyunsu Cho
78ec77fa97
Release 0.81 version (#3864)
* Release 0.81 version

* Update NEWS.md
2018-11-04 05:49:11 -08:00
Chen Qin
42b108136f [jvm-packages] bump flink version number (#3686)
* bump flink version number

* bump flink version number

* add missing hadoop dependency
2018-09-13 09:33:09 -07:00
Philip Hyunsu Cho
6288f6d563 Update JVM packages version to 0.81-SNAPSHOT (#3584) 2018-08-13 10:17:52 -07:00
Philip Hyunsu Cho
96826a3515
Release version 0.80 (#3541)
* Up versions

* Write release note for 0.80
2018-08-13 01:38:37 -07:00
Nan Zhu
f66731181f
Update 0.8 version num (#3358)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* update 0.80
2018-06-02 07:06:01 -07:00
Nan Zhu
e1f57b4417
[jvm-packages] scripts to cross-build and deploy artifacts to github (#3276)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* cross building files

* update

* build with docker

* remove

* temp

* update build script

* update pom

* update

* update version

* upload build

* fix path

* update README.md

* fix compiler version to 4.8.5
2018-04-28 07:41:30 -07:00
Nan Zhu
25b2919c44
[jvm-packages] change version of jvm to keep consistent with other pkgs (#3253)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* change version of jvm to keep consistent with other pkgs
2018-04-19 20:48:50 -07:00
Nan Zhu
9747ea2acb
[jvm-packages] fix the pattern in dev script and version mismatch (#3009)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* fix the pattern in dev script and version mismatch
2018-01-06 06:59:38 -08:00
Nan Zhu
14c6392381
[jvm-packages] add dev script to update version and update versions (#2998)
* add back train method but mark as deprecated

* add back train method but mark as deprecated

* fix scalastyle error

* fix scalastyle error

* add dev script to update version and update versions
2018-01-01 21:28:53 -08:00
Sergei Lebedev
69c3b78a29 [jvm-packages] Implemented early stopping (#2710)
* Allowed subsampling the test set from the training DataFrame/RDD

The implementation requires storing a (1 - trainTestRatio) fraction of the points
in memory to make the sampling work.

An alternative approach would be to construct the full DMatrix and then
slice it deterministically into train/test. The peak memory consumption
of that scenario, however, is twice the dataset size.

* Removed duplication from 'XGBoost.train'

Scala callers can (and should) use named arguments to supply a subset of
parameters. Method overloading is not required.

* Reuse the XGBoost seed parameter to stabilize train/test splitting (a small illustrative sketch follows at the end of this message)

* Added early stopping support to non-distributed XGBoost

Closes #1544

* Added early-stopping to distributed XGBoost

* Moved construction of 'watches' into a separate method

This commit also fixes the handling of 'baseMargin' which previously
was not added to the validation matrix.

* Addressed review comments
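
The seeded-split idea can be illustrated with a minimal sketch in plain Scala. This is not the actual XGBoost4J-Spark code: the helper name splitTrainTest and its parameters are hypothetical, and the sketch buffers both sides for simplicity, whereas the commit only keeps the test fraction in memory.

    import scala.util.Random

    // Minimal sketch (hypothetical helper, not the actual XGBoost4J-Spark code):
    // seeding the RNG with the XGBoost `seed` parameter makes the train/test
    // split reproducible every time the same partition is evaluated.
    def splitTrainTest[T](points: Iterator[T],
                          trainTestRatio: Double,
                          seed: Long): (Seq[T], Seq[T]) = {
      val rng = new Random(seed)
      val train = scala.collection.mutable.ArrayBuffer.empty[T]
      val test  = scala.collection.mutable.ArrayBuffer.empty[T]
      points.foreach { p =>
        // With probability trainTestRatio the point goes to the training set.
        if (rng.nextDouble() < trainTestRatio) train += p else test += p
      }
      (train.toSeq, test.toSeq)
    }
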
2017-09-29 12:06:22 -07:00
Sergei Lebedev
771a95aec6 [jvm-packages] Added baseMargin to ml.dmlc.xgboost4j.LabeledPoint (#2532)
* Converted ml.dmlc.xgboost4j.LabeledPoint to Scala

This makes it easy to integrate LabeledPoint with Spark DataFrame APIs,
which support encoding/decoding case classes out of the box (a short
illustrative sketch follows this note). An alternative solution would be
to keep LabeledPoint in Java and make it a Bean by generating boilerplate
getters/setters. I decided against that, even though the conversion in
this PR implies a public API change.

I also had to remove the factory methods fromSparseVector and
fromDenseVector because a) they would need to be duplicated to support
overloaded calls with extra data (e.g. weight); and b) Scala would expose
them via a mangled $.MODULE$ reference, which looks ugly in Java.

Additionally, this commit makes it possible to switch to LabeledPoint in
all public APIs and, effectively, to pass the initial margin/group as part of
the point. This seems to be the only reliable way of implementing distributed
learning with these data. Note that the group-size format used by single-node
XGBoost is not compatible with that scenario, since a partition split
could divide a group into two chunks.
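
A brief hedged sketch of the case-class/DataFrame round-trip mentioned above (simplified field list and class name; not the real LabeledPoint definition):

    import org.apache.spark.sql.SparkSession

    // Hypothetical, simplified stand-in for LabeledPoint.
    case class SimpleLabeledPoint(label: Float, indices: Array[Int], values: Array[Float])

    object EncoderDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
        import spark.implicits._

        // The Encoder is derived automatically because SimpleLabeledPoint is a case
        // class, so no hand-written Bean getters/setters are needed.
        val ds = Seq(
          SimpleLabeledPoint(1.0f, Array(0, 2), Array(0.5f, 1.5f)),
          SimpleLabeledPoint(0.0f, Array(1), Array(2.0f))
        ).toDS()

        ds.show(truncate = false)
        spark.stop()
      }
    }
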

* Switched to ml.dmlc.xgboost4j.LabeledPoint in RDD-based public APIs

Note that DataFrame-based and Flink APIs are not affected by this change.

* Removed baseMargin argument in favour of the LabeledPoint field

* Do a single pass over the partition in buildDistributedBoosters

Note that there is no formal guarantee that

    val repartitioned = rdd.repartition(42)
    repartitioned.zipPartitions(repartitioned.map(_ + 1)) { (it1, it2) => ... }

would do a single shuffle, but in practice this always seems to be the case.

* Exposed baseMargin in DataFrame-based API

* Addressed review comments

* Pass baseMargin to XGBoost.trainWithDataFrame via params

* Reverted MLLabeledPoint in Spark APIs

As discussed, baseMargin would only be supported for DataFrame-based APIs.

* Cleaned up baseMargin tests

- Removed RDD-based test, since the option is no longer exposed via
  public APIs
- Changed DataFrame-based one to check that adding a margin actually
  affects the prediction

* Pleased Scalastyle

* Addressed more review comments

* Pleased scalastyle again

* Fixed XGBoost.fromBaseMarginsToArray

which always returned an array of NaNs even when no base margin was
specified. Surprisingly, this only failed a few tests.
2017-08-10 14:29:26 -07:00
Sergei Lebedev
e62be19c70 Removed 'flink.suffix' and added 'flink.version' (#2277)
The former was just the Scala binary tag, and the latter was previously hardcoded in
the 'xgboost4j-flink' POM.
2017-05-10 08:42:40 -07:00
Xin Yin
e7fbc8591f [jvm-packages] Scala implementation of the Rabit tracker. (#1612)
* [jvm-packages] Scala implementation of the Rabit tracker.

A Scala implementation of RabitTracker that is interface-interchangeable with the
Java implementation, ported from `tracker.py` in the
[dmlc-core project](https://github.com/dmlc/dmlc-core).

* [jvm-packages] Updated Akka dependency in pom.xml.

* Refactored the RabitTracker directory structure.

* Fixed premature stopping of connection handler.

Added a new finite state "AwaitingPortNumber" to explicitly wait for the
worker to send the port, and then close the connection. Stopping the actor
prematurely sends a TCP RST to the worker, causing the worker to crash
with an AssertionError.

* Added interface IRabitTracker so that users can switch implementations.

* Default timeout duration changes.

* Dependency for Akka tests.

* Removed the main function of RabitTracker.

* A skeleton for testing Akka-based Rabit tracker.

* waitFor() in RabitTracker no longer throws exceptions.

* Completed unit test for the 'start' command of Rabit tracker.

* Preliminary support for Rabit Allreduce via JNI (no prepare function support yet).

* Fixed the default timeout duration.

* Use Java container to avoid serialization issues due to intermediate wrappers.

* Added tests for Allreduce/model training using Scala Rabit tracker.

* Added spill-over unit test for the Scala Rabit tracker.

* Fixed a typo.

* Overhaul of RabitTracker interface per code review.

  - Removed the no-argument start() and waitFor() methods from IRabitTracker.
  - The timeout in start(timeout) is now the worker connection timeout, as a TCP
    socket binding timeout is less intuitive.
  - Dropped the time unit from the start(...) and waitFor(...) methods; the default
    time unit is milliseconds.
  - Moved random port number generation into the RabitTrackerHandler.
  - Moved all Rabit-related classes to package ml.dmlc.xgboost4j.scala.rabit.

* More code refactoring and comments.

* Unified timeout constants. Readable tracker status code.

* Add comments to indicate that allReduce is for tests only. Removed all other variants.

* Removed unused imports.

* Simplified signatures of training methods.

 - Moved TrackerConf into parameter map.
 - Changed GeneralParams so that TrackerConf becomes a standalone parameter.
 - Updated test cases accordingly.

* Changed monitoring strategies.

* Reverted monitoring changes.

* Update test case for Rabit AllReduce.

* Mix UncaughtExceptionHandler into IRabitTracker to prevent the tracker from hanging due to exceptions thrown by workers.

* More comprehensive test cases for exception handling and worker connection timeout.

* Handle executor loss due to an unknown cause: the newly spawned executor will attempt to connect to the tracker; interrupt the tracker in such a case.

* Per code-review, removed training timeout from TrackerConf. Timeout logic must be implemented explicitly and externally in the driver code.

* Reverted scalastyle-config changes.

* Visibility scope change. Interface tweaks.

* Use match pattern to handle tracker_conf parameter.

* Minor clarification in JNI code.

* Clearer intent in match pattern to suppress warnings.

* Removed Future from constructor. Block in start() and waitFor() instead.

* Revert inadvertent comment changes.

* Removed debugging information.

* Updated test cases that are a bit finicky.

* Added comments on the reasoning behind the unit tests for testing Rabit tracker robustness.
2016-12-07 06:35:42 -08:00
Nan Zhu
fb02797e2a [jvm-packages] Integration with Spark Dataframe/Dataset (#1559)
* bump up to scala 2.11

* framework of data frame integration

* test consistency between RDD and DataFrame

* order preservation

* test order preservation

* example code and fix makefile

* improve type checking

* improve APIs

* user docs

* work around travis CI's limitation on log length

* adjust test structure

* integrate with Spark 1.x

* spark 2.x integration

* remove spark 1.x implementation but provide instructions on how to downgrade
2016-09-11 15:02:58 -04:00
Nan Zhu
6dabdd33e3 [jvm-packages] bump to next version (#1535)
* bump to next version

* fix

* fix
2016-09-01 12:18:21 -04:00
tqchen
90f7220736 [FLINK] remove nWorker from API 2016-03-14 16:18:35 -07:00
CodingCat
16b9e92328 force the user to set number of workers 2016-03-12 13:33:57 -05:00
CodingCat
400b1faecc adjust the API signature as well as the docs 2016-03-11 15:22:44 -05:00
CodingCat
ab68a0ccc7 fix examples 2016-03-11 13:57:03 -05:00