XGBoost4J: Distributed XGBoost for Scala/Java
XGBoost4J is the JVM package of XGBoost. It brings all the optimizations and power of XGBoost into the JVM ecosystem.
- Train XGBoost models in Scala and Java with easy customizations.
- Run distributed XGBoost natively on JVM frameworks such as Apache Flink and Apache Spark.
You can find more about XGBoost on the Documentation and Resource Page.
Add Maven Dependency
XGBoost4J, XGBoost4J-Spark, etc. in the Maven repository are compiled with g++-4.8.5.
Access release version
maven
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j_2.12</artifactId>
<version>latest_version_num</version>
</dependency>
sbt
"ml.dmlc" %% "xgboost4j" % "latest_version_num"
For the latest release version number, please check here.
If you want to use XGBoost4J-Spark, replace xgboost4j with xgboost4j-spark in the artifact name.
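The Maven and sbt snippets above name the same artifact differently: sbt's %% operator appends the Scala binary version, so "xgboost4j" resolves to xgboost4j_2.12 when the build uses Scala 2.12. The small Java sketch below makes that mapping explicit and renders the matching Maven dependency. The class and method names (XGBoostCoordinates, artifactId, mavenDependency) are hypothetical illustrations, not part of XGBoost4J, and Scala 2.12 is assumed only because it matches the snippet above.

```java
/**
 * Hypothetical helper (not part of XGBoost4J): shows how the sbt line
 *   "ml.dmlc" %% "xgboost4j" % "latest_version_num"
 * corresponds to the Maven coordinates above. sbt's %% appends
 * "_" + Scala binary version to the artifact name.
 */
public final class XGBoostCoordinates {

    private XGBoostCoordinates() {}

    /** Appends the Scala binary version suffix, as sbt's %% operator does. */
    public static String artifactId(String baseName, String scalaBinaryVersion) {
        return baseName + "_" + scalaBinaryVersion;
    }

    /** Renders a Maven <dependency> element for the given XGBoost4J module. */
    public static String mavenDependency(String baseName, String scalaBinaryVersion, String version) {
        return String.join("\n",
            "<dependency>",
            "  <groupId>ml.dmlc</groupId>",
            "  <artifactId>" + artifactId(baseName, scalaBinaryVersion) + "</artifactId>",
            "  <version>" + version + "</version>",
            "</dependency>");
    }

    public static void main(String[] args) {
        // "latest_version_num" is the README's placeholder; substitute a real release number.
        System.out.println(mavenDependency("xgboost4j", "2.12", "latest_version_num"));
        // For Spark, only the base artifact name changes.
        System.out.println(mavenDependency("xgboost4j-spark", "2.12", "latest_version_num"));
    }
}
```

Switching to XGBoost4J-Spark only changes the base name; the _2.12 suffix must still match the Scala version your project is built with.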
Access SNAPSHOT version
You need to add GitHub as a repository:
maven:
<repository>
<id>GitHub Repo</id>
<name>GitHub Repo</name>
<url>https://raw.githubusercontent.com/CodingCat/xgboost/maven-repo/</url>
</repository>
sbt:
resolvers += "GitHub Repo" at "https://raw.githubusercontent.com/CodingCat/xgboost/maven-repo/"
Then add the dependency as follows:
maven
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j_2.12</artifactId>
<version>latest_version_num</version>
</dependency>
sbt
"ml.dmlc" %% "xgboost4j" % "latest_version_num"
For the latest release version number, please check here.
If you want to use XGBoost4J-Spark, replace xgboost4j with xgboost4j-spark in the artifact name.
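The SNAPSHOT URL above appears to be a Maven repository served from a GitHub branch. Assuming it follows the standard Maven 2 repository layout (groupId dots become path segments, then artifactId, then version), the hypothetical Java sketch below derives where an artifact's JAR and its maven-metadata.xml would be located. Two points are assumptions to verify rather than facts about that repository: that it publishes maven-metadata.xml, and that its SNAPSHOT JARs use non-timestamped file names (SNAPSHOT deployments often use timestamped names instead).

```java
/**
 * Hypothetical helper: computes artifact locations under the standard
 * Maven 2 repository layout, e.g. ml.dmlc -> ml/dmlc.
 */
public final class MavenRepoLayout {

    private MavenRepoLayout() {}

    /** Artifact-level directory: repo/groupId-as-path/artifactId/ */
    public static String artifactDir(String repoUrl, String groupId, String artifactId) {
        String base = repoUrl.endsWith("/") ? repoUrl : repoUrl + "/";
        return base + groupId.replace('.', '/') + "/" + artifactId + "/";
    }

    /** Conventional location of the metadata file that lists published versions. */
    public static String metadataUrl(String repoUrl, String groupId, String artifactId) {
        return artifactDir(repoUrl, groupId, artifactId) + "maven-metadata.xml";
    }

    /** JAR location assuming a non-timestamped file name: version/artifactId-version.jar */
    public static String jarUrl(String repoUrl, String groupId, String artifactId, String version) {
        return artifactDir(repoUrl, groupId, artifactId)
            + version + "/" + artifactId + "-" + version + ".jar";
    }

    public static void main(String[] args) {
        String repo = "https://raw.githubusercontent.com/CodingCat/xgboost/maven-repo/";
        System.out.println(metadataUrl(repo, "ml.dmlc", "xgboost4j_2.12"));
        // "latest_version_num" is the README's placeholder, not a real version.
        System.out.println(jarUrl(repo, "ml.dmlc", "xgboost4j_2.12", "latest_version_num"));
    }
}
```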
Examples
Full code examples for Scala, Java, Apache Spark, and Apache Flink can be found in the examples package.
NOTE on LIBSVM Format:
There is an inconsistency between XGBoost4J-Spark and the other language bindings of XGBoost.
When users load a training or test set in LibSVM format with Spark using the following code snippet:
spark.read.format("libsvm").load("trainingset_libsvm")
Spark assumes that the dataset is 1-based indexed. However, when you predict with other bindings of XGBoost (e.g. the Python API), XGBoost assumes that the dataset is 0-based indexed. This creates a pitfall for users who train a model with Spark but predict on a dataset in the same format using another XGBoost binding: every feature index is effectively shifted by one.
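One way to avoid the mismatch is to rewrite the file before handing it to a 0-based binding, subtracting 1 from every feature index. The Java sketch below does this for a single line; the class and method names are hypothetical, and it assumes plain "label index:value ..." lines without qid: fields or trailing comments, both of which it does not handle.

```java
import java.util.StringJoiner;

/**
 * Hypothetical helper: rewrites one LibSVM line from 1-based feature indices
 * (what Spark's "libsvm" reader assumes) to 0-based indices (what other
 * XGBoost bindings assume). Labels and values are left unchanged.
 */
public final class LibSvmIndexShift {

    private LibSvmIndexShift() {}

    public static String oneBasedToZeroBased(String line) {
        String[] tokens = line.trim().split("\\s+");
        StringJoiner out = new StringJoiner(" ");
        out.add(tokens[0]); // the label is copied as-is
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Expected index:value, got " + tokens[i]);
            }
            int index = Integer.parseInt(tokens[i].substring(0, colon));
            if (index < 1) {
                throw new IllegalArgumentException("Index is not 1-based: " + index);
            }
            // Shift the index down by one and keep ":value" unchanged.
            out.add((index - 1) + tokens[i].substring(colon));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Feature 1 as Spark reads it is feature 0 for a 0-based binding.
        System.out.println(oneBasedToZeroBased("1 1:0.5 3:2.0 10:1")); // prints "1 0:0.5 2:2.0 9:1"
    }
}
```

Applying this line by line to the prediction file keeps feature positions consistent with a model trained through Spark's libsvm reader.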
Development
You can build/package xgboost4j locally with the following steps:
Linux:
- Ensure Docker for Linux is installed.
- Clone this repo:
  git clone --recursive https://github.com/dmlc/xgboost.git
- Run the following command:
  - With Tests:
    ./xgboost/jvm-packages/dev/build-linux.sh
  - Skip Tests:
    ./xgboost/jvm-packages/dev/build-linux.sh --skip-tests
Windows:
- Ensure Docker for Windows is installed.
- Clone this repo:
  git clone --recursive https://github.com/dmlc/xgboost.git
- Run the following command:
  - With Tests:
    .\xgboost\jvm-packages\dev\build-linux.cmd
  - Skip Tests:
    .\xgboost\jvm-packages\dev\build-linux.cmd --skip-tests
Note: this will create jars for deployment on Linux machines.