XGBoost4J: Distributed XGBoost for Scala/Java

Documentation | Resources | Release Notes

XGBoost4J is the JVM package of XGBoost. It brings all the optimizations and power of XGBoost into the JVM ecosystem.

Train XGBoost models in Scala and Java with easy customizations.
Run distributed XGBoost natively on JVM frameworks such as Apache Flink and Apache Spark.

You can find more about XGBoost on the Documentation and Resources pages.

Add Maven Dependency

The XGBoost4J, XGBoost4J-Spark, and related artifacts in the Maven repository are compiled with g++-4.8.5.

Access release version

Maven

<dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j_2.12</artifactId>
    <version>latest_version_num</version>
</dependency>
<dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j-spark_2.12</artifactId>
    <version>latest_version_num</version>
</dependency>

or with Scala 2.13

<dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j_2.13</artifactId>
    <version>latest_version_num</version>
</dependency>
<dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j-spark_2.13</artifactId>
    <version>latest_version_num</version>
</dependency>

sbt

libraryDependencies ++= Seq(
  "ml.dmlc" %% "xgboost4j" % "latest_version_num",
  "ml.dmlc" %% "xgboost4j-spark" % "latest_version_num"
)

For the latest release version number, please check here.

Access SNAPSHOT version

First add the following Maven repository hosted by the XGBoost project:

Maven:

<repository>
  <id>XGBoost4J Snapshot Repo</id>
  <name>XGBoost4J Snapshot Repo</name>
  <url>https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/snapshot/</url>
</repository>

sbt:

resolvers += "XGBoost4J Snapshot Repo" at "https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/snapshot/"

Then add XGBoost4J as a dependency:

Maven

<dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j_2.12</artifactId>
    <version>latest_version_num-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j-spark_2.12</artifactId>
    <version>latest_version_num-SNAPSHOT</version>
</dependency>

or with Scala 2.13

<dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j_2.13</artifactId>
    <version>latest_version_num-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j-spark_2.13</artifactId>
    <version>latest_version_num-SNAPSHOT</version>
</dependency>

sbt

libraryDependencies ++= Seq(
  "ml.dmlc" %% "xgboost4j" % "latest_version_num-SNAPSHOT",
  "ml.dmlc" %% "xgboost4j-spark" % "latest_version_num-SNAPSHOT"
)

For the latest SNAPSHOT version number, please check the repository listing.

GPU algorithm

To enable the GPU algorithm (tree_method='gpu_hist'), use the artifacts xgboost4j-gpu_2.12 and xgboost4j-spark-gpu_2.12 instead. Note that spark-rapids does not support Scala 2.13 yet (see NVIDIA/spark-rapids#1525), so the GPU algorithm can only be used with Scala 2.12.
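
As a rough illustration of where the tree_method setting goes, the sketch below builds a parameter map in plain Java. Only tree_method='gpu_hist' comes from this README. The class name GpuHistParams and the objective and max_depth values are placeholders chosen for the example. The map is only constructed here, since actually training with it would need the xgboost4j-gpu_2.12 artifact and a CUDA-capable machine.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of XGBoost4J.
public final class GpuHistParams {
    private GpuHistParams() {}

    /** Builds a booster parameter map that selects the GPU histogram algorithm. */
    public static Map<String, Object> build() {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("tree_method", "gpu_hist");      // from this README
        params.put("objective", "binary:logistic"); // placeholder value
        params.put("max_depth", 6);                 // placeholder value
        return params;
    }

    public static void main(String[] args) {
        // With xgboost4j-gpu_2.12 on the classpath, a map like this would be
        // handed to the training entry point as its parameter argument.
        System.out.println(build());
    }
}
```

The map is kept separate from any training call so the same settings can be reused for the Scala or Spark APIs, which also take key/value parameters.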

Examples

Full code examples for Scala, Java, Apache Spark, and Apache Flink can be found in the examples package.

NOTE on LIBSVM Format:

There is an inconsistency between XGBoost4J-Spark and the other language bindings of XGBoost.

When you use Spark to load a training or test set in LIBSVM format with the following code snippet:

spark.read.format("libsvm").load("trainingset_libsvm")

Spark assumes that the feature indices in the dataset are 1-based. However, when you run prediction with other bindings of XGBoost (e.g. the Python API), XGBoost assumes that the indices are 0-based. This is a pitfall for users who train a model with Spark and then predict on a dataset in the same format with another binding, because the feature indices are then off by one.
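
One possible way to work around the indexing mismatch is to rewrite the feature indices before handing the file to the other binding. The sketch below is a minimal example, assuming each line has the plain "label index:value ..." layout with integer indices and no qid or comment fields. The class LibsvmIndexShift and its method are hypothetical helpers, not part of XGBoost4J.

```java
// Hypothetical helper for illustration; not part of XGBoost4J.
public final class LibsvmIndexShift {
    private LibsvmIndexShift() {}

    /** Returns the LIBSVM line with every feature index shifted by delta. */
    public static String shiftIndices(String line, int delta) {
        String[] tokens = line.trim().split("\\s+");
        StringBuilder out = new StringBuilder(tokens[0]); // the label is kept as-is
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("expected index:value, got: " + tokens[i]);
            }
            int index = Integer.parseInt(tokens[i].substring(0, colon)) + delta;
            if (index < 0) {
                throw new IllegalArgumentException("negative feature index in: " + line);
            }
            // Append the new index followed by the original ":value" part.
            out.append(' ').append(index).append(tokens[i], colon, tokens[i].length());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Convert a 1-based line (as Spark reads it) to 0-based indices.
        System.out.println(shiftIndices("1 1:0.5 3:1.2", -1)); // prints "1 0:0.5 2:1.2"
    }
}
```

Passing -1 converts a 1-based file for use with a 0-based binding, and +1 goes the other way. Whichever direction you choose, apply it consistently to both the training and prediction data.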

Development

You can build/package xgboost4j locally with the following steps:

Linux:

Ensure Docker for Linux is installed.
Clone this repo: git clone --recursive https://github.com/dmlc/xgboost.git
Run the following command:

With Tests: ./xgboost/jvm-packages/dev/build-linux.sh
Skip Tests: ./xgboost/jvm-packages/dev/build-linux.sh --skip-tests

Windows:

Ensure Docker for Windows is installed.
Clone this repo: git clone --recursive https://github.com/dmlc/xgboost.git
Run the following command:

With Tests: .\xgboost\jvm-packages\dev\build-linux.cmd
Skip Tests: .\xgboost\jvm-packages\dev\build-linux.cmd --skip-tests

Note: this will create jars for deployment on Linux machines.