xgboost/doc/jvm/index.md
Nan Zhu fb02797e2a [jvm-packages] Integration with Spark Dataframe/Dataset (#1559)
2016-09-11 15:02:58 -04:00


XGBoost JVM Package


You have found the XGBoost JVM Package!

Installation

Currently, XGBoost4J only supports installation from source. Building XGBoost4J with Maven requires Maven 3 or newer and Java 7+.

Before you install XGBoost4J, you need to set the environment variable JAVA_HOME to your JDK directory so that your compiler can find jni.h, since XGBoost4J relies on JNI to implement the interaction between the JVM and native libraries.
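A quick way to check this before building is sketched below. The helper name has_jni_header and the example JDK path are assumptions made for illustration and are not part of XGBoost4J. The check relies on the usual JDK layout, where the header lives at include/jni.h under the JDK home.

```shell
# Hypothetical helper (not part of XGBoost4J): succeeds if the given
# directory looks like a JDK home that ships the JNI header.
has_jni_header() {
  [ -n "${1:-}" ] && [ -f "$1/include/jni.h" ]
}

# Example only; adjust the path to your own JDK installation.
# export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

if has_jni_header "${JAVA_HOME:-}"; then
  echo "JAVA_HOME looks usable: $JAVA_HOME"
else
  echo "jni.h not found under JAVA_HOME='${JAVA_HOME:-}'"
fi
```

If the second message is printed, JAVA_HOME is either unset or points at a JRE or some other directory without the JNI headers.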

Once JAVA_HOME is defined correctly, installing XGBoost4J is as simple as running mvn package in the jvm-packages directory. If you are confident that your local setup is correct, you can also skip the tests by running mvn -DskipTests=true package.
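The two build variants can be sketched as a small wrapper. The function name xgb4j_build_cmd and its "skip" argument are hypothetical and exist only for this illustration. The function prints the Maven command rather than running it, so you can inspect it first.

```shell
# Hypothetical helper: prints the Maven command for building XGBoost4J.
# Pass "skip" as the first argument to leave out the tests.
xgb4j_build_cmd() {
  if [ "${1:-}" = "skip" ]; then
    echo "mvn -DskipTests=true package"
  else
    echo "mvn package"
  fi
}

# Show both variants.
xgb4j_build_cmd
xgb4j_build_cmd skip

# To run a build for real from the root of an XGBoost checkout:
#   (cd jvm-packages && $(xgb4j_build_cmd skip))
```

Running the tests takes longer, but it is the safer choice on a machine where the native library has not been built before.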

Since its integration with the DataFrame/Dataset APIs of Spark 2.0, XGBoost4J-Spark can only be compiled against Spark 2.x. You can build XGBoost4J-Spark as a component of XGBoost4J by running mvn package, and you can specify the Spark version with mvn -Dspark.version=2.0.0 package. (To continue working with Spark 1.x, users need to update pom.xml by modifying properties such as spark.version, scala.version, and scala.binary.version. Users also need to change the implementation by replacing SparkSession with SQLContext and changing the type of API parameters from Dataset[_] to DataFrame.)
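As a minimal sketch of the version constraint, the helper below prints the build command for a requested Spark version and rejects anything outside 2.x. The name spark_build_cmd is an assumption for this example, and the check is a plain prefix match on the version string, not something Maven or XGBoost4J performs itself.

```shell
# Hypothetical helper: prints the build command for a given Spark version.
# Anything that does not start with "2." is rejected, because a 1.x build
# also needs the pom.xml and source changes described above.
spark_build_cmd() {
  case "${1:-}" in
    2.*)
      echo "mvn -Dspark.version=$1 package"
      ;;
    *)
      echo "unsupported Spark version: '${1:-}' (only 2.x builds as-is)" >&2
      return 1
      ;;
  esac
}

spark_build_cmd 2.0.0
```

The printed command is meant to be run from the jvm-packages directory, like the other Maven commands on this page.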

Contents

Resources

Scala API Docs