XGBoost4J: Distributed XGBoost for Scala/Java

Documentation | Resources | Release Notes

XGBoost4J is the JVM package of xgboost. It brings all the optimizations and power of xgboost to the JVM ecosystem.

  • Train XGBoost models in Scala and Java with easy customizations.
  • Run distributed xgboost natively on JVM frameworks such as Apache Flink and Apache Spark.

You can find more about XGBoost on the Documentation and Resources pages.

Add Maven Dependency

The XGBoost4J, XGBoost4J-Spark, etc. artifacts in the Maven repository are compiled with g++-4.8.5.

Access SNAPSHOT version

You need to add the GitHub repository as a repo:

Maven:

<repository>
  <id>GitHub Repo</id>
  <name>GitHub Repo</name>
  <url>https://raw.githubusercontent.com/CodingCat/xgboost/maven-repo/</url>
</repository>

sbt:

resolvers += "GitHub Repo" at "https://raw.githubusercontent.com/CodingCat/xgboost/maven-repo/"

Then add the dependency as follows:

Maven:

<dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j</artifactId>
    <version>latest_version_num</version>
</dependency>

sbt:

libraryDependencies += "ml.dmlc" % "xgboost4j" % "latest_version_num"

If you want to use xgboost4j-spark, replace xgboost4j with xgboost4j-spark in the dependency above.

Examples

Full code examples for Scala, Java, Apache Spark, and Apache Flink can be found in the examples package.

NOTE on LIBSVM Format:

  • Use 1-based ascending indexes for the LIBSVM format in distributed training mode.

    • Spark does the internal conversion and does not accept 0-based formats.
  • In contrast, use 0-based indexes when predicting in normal mode, for instance when using the saved model in the Python package.
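
If the same data has to serve both cases, the feature indexes can be shifted between the two conventions before the file is loaded. The following is a minimal sketch in plain Java, not part of XGBoost4J: the class name `LibsvmIndexShift` and its methods are hypothetical, and it assumes each line has the simple form `label index:value index:value ...` with integer indexes (no `qid:` tokens or comments).

```java
import java.util.StringJoiner;

/** Hypothetical helper (not part of XGBoost4J) that shifts LIBSVM feature indexes. */
final class LibsvmIndexShift {

    private LibsvmIndexShift() {}

    /**
     * Adds offset to every feature index in one LIBSVM line.
     * Throws if a shifted index is negative or the indexes are not strictly ascending.
     */
    static String shift(String line, int offset) {
        String[] tokens = line.trim().split("\\s+");
        StringJoiner out = new StringJoiner(" ");
        out.add(tokens[0]); // the label is copied unchanged
        int previous = Integer.MIN_VALUE;
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Expected index:value but got: " + tokens[i]);
            }
            int index = Integer.parseInt(tokens[i].substring(0, colon)) + offset;
            if (index < 0 || index <= previous) {
                throw new IllegalArgumentException("Indexes must be non-negative and ascending: " + line);
            }
            previous = index;
            out.add(index + tokens[i].substring(colon)); // substring(colon) keeps ":value"
        }
        return out.toString();
    }

    /** 0-based line to 1-based line, e.g. for distributed training. */
    static String toOneBased(String zeroBasedLine) {
        return shift(zeroBasedLine, 1);
    }

    /** 1-based line to 0-based line, e.g. for prediction in normal mode. */
    static String toZeroBased(String oneBasedLine) {
        return shift(oneBasedLine, -1);
    }

    public static void main(String[] args) {
        String oneBased = toOneBased("1 0:0.5 3:1.2");
        System.out.println(oneBased);              // prints "1 1:0.5 4:1.2"
        System.out.println(toZeroBased(oneBased)); // prints "1 0:0.5 3:1.2"
    }
}
```

Rejecting non-ascending or negative indexes up front is a design choice of this sketch, so that a malformed file fails at conversion time rather than later during training.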