* add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * init * more changes * temp * update * udpate rabit * change the histogram * update kfactor * sync per node stats * temp * update * final * code clean * update rabit * more cleanup * fix errors * fix failed tests * enforce c++11 * broadcast subsampled feature correctly * init col * temp * col sampling * fix histmastrix init * fix col sampling * remove cout * fix out of bound access * fix core dump remove core dump file * disbale test temporarily * update * add fid * print perf data * update * revert some changes * temp * temp * pass all tests * bring back some tests * recover some changes * fix lint issue * enable monotone and interaction constraints * don't specify default for monotone and interactions * recover column init part * more recovery * fix core dumps * code clean * revert some changes * fix test compilation issue * fix lint issue * resolve compilation issue * fix issues of lint caused by rebase * fix stylistic changes and change variable names * use regtree internal function * modularize depth width * address the comments * fix failed tests * wrap perf timers with class * fix lint * fix num_leaves count * fix indention * Update src/tree/updater_quantile_hist.cc Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.h Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.cc Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.cc Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.cc Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.h Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * merge * fix compilation
XGBoost4J: Distributed XGBoost for Scala/Java
Documentation | Resources | Release Notes
XGBoost4J is the JVM package of xgboost. It brings all the optimizations and power xgboost into JVM ecosystem.
- Train XGBoost models in scala and java with easy customizations.
- Run distributed xgboost natively on jvm frameworks such as Apache Flink and Apache Spark.
You can find more about XGBoost on Documentation and Resource Page.
Add Maven Dependency
XGBoost4J, XGBoost4J-Spark, etc. in maven repository is compiled with g++-4.8.5
Access release version
maven
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j</artifactId>
<version>latest_version_num</version>
</dependency>
sbt
"ml.dmlc" % "xgboost4j" % "latest_version_num"
For the latest release version number, please check here.
if you want to use xgboost4j-spark, you just need to replace xgboost4j with xgboost4j-spark
Access SNAPSHOT version
You need to add github as repo:
maven:
<repository>
<id>GitHub Repo</id>
<name>GitHub Repo</name>
<url>https://raw.githubusercontent.com/CodingCat/xgboost/maven-repo/</url>
</repository>
sbt:
resolvers += "GitHub Repo" at "https://raw.githubusercontent.com/CodingCat/xgboost/maven-repo/"
the add dependency as following:
maven
<dependency>
<groupId>ml.dmlc</groupId>
<artifactId>xgboost4j</artifactId>
<version>latest_version_num</version>
</dependency>
sbt
"ml.dmlc" % "xgboost4j" % "latest_version_num"
For the latest release version number, please check here.
if you want to use xgboost4j-spark, you just need to replace xgboost4j with xgboost4j-spark
Examples
Full code examples for Scala, Java, Apache Spark, and Apache Flink can be found in the examples package.
NOTE on LIBSVM Format:
There is an inconsistent issue between XGBoost4J-Spark and other language bindings of XGBoost.
When users use Spark to load trainingset/testset in LibSVM format with the following code snippet:
spark.read.format("libsvm").load("trainingset_libsvm")
Spark assumes that the dataset is 1-based indexed. However, when you do prediction with other bindings of XGBoost (e.g. Python API of XGBoost), XGBoost assumes that the dataset is 0-based indexed. It creates a pitfall for the users who train model with Spark but predict with the dataset in the same format in other bindings of XGBoost.