XGBoost4J: Distributed XGBoost for Scala/Java

Documentation | Resources | Release Notes

XGBoost4J is the JVM package of XGBoost. It brings all the optimizations and power of XGBoost to the JVM ecosystem.

Train XGBoost models in Scala and Java with easy customization.
Run distributed XGBoost natively on JVM frameworks such as Apache Flink and Apache Spark.

You can find more about XGBoost on the Documentation and Resources pages.

Add Maven Dependency

The XGBoost4J, XGBoost4J-Spark, etc. artifacts in the Maven repository are compiled with g++-4.8.5.

Access release version

maven

<dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j_2.12</artifactId>
    <version>latest_version_num</version>
</dependency>

sbt

 "ml.dmlc" %% "xgboost4j" % "latest_version_num"

For the latest release version number, please check here.

If you want to use XGBoost4J-Spark, replace xgboost4j with xgboost4j-spark in the dependency above.

Access SNAPSHOT version

You need to add GitHub as a repository:

maven:

<repository>
  <id>GitHub Repo</id>
  <name>GitHub Repo</name>
  <url>https://raw.githubusercontent.com/CodingCat/xgboost/maven-repo/</url>
</repository>

sbt:

resolvers += "GitHub Repo" at "https://raw.githubusercontent.com/CodingCat/xgboost/maven-repo/"

Then add the dependency as follows:

maven

<dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j_2.12</artifactId>
    <version>latest_version_num</version>
</dependency>

sbt

 "ml.dmlc" %% "xgboost4j" % "latest_version_num"

For the latest release version number, please check here.

If you want to use XGBoost4J-Spark, replace xgboost4j with xgboost4j-spark in the dependency above.

Examples

Full code examples for Scala, Java, Apache Spark, and Apache Flink can be found in the examples package.

NOTE on LIBSVM Format:

There is an inconsistency between XGBoost4J-Spark and other language bindings of XGBoost.

When users load a training set or test set in LIBSVM format with Spark using the following code snippet:

spark.read.format("libsvm").load("trainingset_libsvm")

Spark assumes that the feature indices in the dataset are 1-based. However, when you run prediction with other bindings of XGBoost (e.g. the Python API), XGBoost assumes that the indices are 0-based. This creates a pitfall for users who train a model with Spark but predict on a dataset in the same format with other bindings of XGBoost.
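One possible way to sidestep the indexing mismatch described above is to rewrite the feature indices of a LIBSVM file before handing it to a binding that expects 0-based indices. The following is a minimal sketch in plain Java, not part of XGBoost4J: the class name LibsvmIndexShift and the method shiftIndices are hypothetical, and the code assumes that every line has the simple form "label index:value index:value ..." with no qid fields or trailing comments.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of XGBoost4J: shifts the feature indices
// of a single LIBSVM-formatted line by a fixed offset.
public class LibsvmIndexShift {

    // Assumes the line looks like "label index:value index:value ...".
    // Use offset -1 to turn 1-based indices into 0-based ones.
    static String shiftIndices(String line, int offset) {
        String[] tokens = line.trim().split("\\s+");
        StringJoiner out = new StringJoiner(" ");
        out.add(tokens[0]); // the label is copied unchanged
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Expected index:value, got " + tokens[i]);
            }
            int shifted = Integer.parseInt(tokens[i].substring(0, colon)) + offset;
            if (shifted < 0) {
                throw new IllegalArgumentException("Negative feature index after shift: " + tokens[i]);
            }
            // Keep the ":value" part exactly as it appeared in the input.
            out.add(shifted + tokens[i].substring(colon));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A line with 1-based indices, as Spark's libsvm reader expects it.
        String oneBased = "1 1:0.5 3:2.0";
        System.out.println(shiftIndices(oneBased, -1)); // prints "1 0:0.5 2:2.0"
    }
}
```

Applying the shift with offset -1 to every line of a file whose indices are 1-based yields a file in which feature 1 becomes feature 0, which matches the 0-based assumption of the other bindings. Whether to convert the file or to keep separate copies per binding is a choice left to the user.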

Development

You can build/package xgboost4j locally with the following steps:

Linux:

Ensure Docker for Linux is installed.
Clone this repo: git clone --recursive https://github.com/dmlc/xgboost.git
Run the following command:

With Tests: ./xgboost/jvm-packages/dev/build-linux.sh
Skip Tests: ./xgboost/jvm-packages/dev/build-linux.sh --skip-tests

Windows:

Ensure Docker for Windows is installed.
Clone this repo: git clone --recursive https://github.com/dmlc/xgboost.git
Run the following command:

With Tests: .\xgboost\jvm-packages\dev\build-linux.cmd
Skip Tests: .\xgboost\jvm-packages\dev\build-linux.cmd --skip-tests

Note: this will create jars for deployment on Linux machines.