Files

Nan Zhu c18a3660fa Separate Depthwidth and Lossguide growing policy in fast histogram (#4102 )

2019-02-13 12:56:19 -08:00

dev

Separate Depthwidth and Lossguide growing policy in fast histogram (#4102 )

2019-02-13 12:56:19 -08:00

xgboost4j

Separate Depthwidth and Lossguide growing policy in fast histogram (#4102 )

2019-02-13 12:56:19 -08:00

xgboost4j-example

Distributed Fast Histogram Algorithm (#4011 )

2019-02-05 05:12:53 -08:00

xgboost4j-flink

[jvm-packages] update version to 0.82-SNAPSHOT (#3920 )

2018-11-18 16:47:48 -08:00

xgboost4j-spark

Separate Depthwidth and Lossguide growing policy in fast histogram (#4102 )

2019-02-13 12:56:19 -08:00

.gitignore

[DIST] Enable multiple thread and tracker, make rabit and xgboost more thread-safe by using thread local variables.

2016-03-03 20:36:14 -08:00

build_doc.sh

[BLOCKING] Adding JVM doc build to Jenkins CI (#3567 )

2018-08-09 13:27:01 -07:00

checkstyle-suppressions.xml

[jvm-packages] Fixed checkstyle excludes on Windows (#2370 )

2017-06-02 10:14:13 -07:00

checkstyle.xml

apply google-java-style indentation and impose import orders....

2016-03-03 12:59:18 -05:00

create_jni.py

Correct JVM CMake GPU flag. (#4071 )

2019-01-21 20:36:38 +08:00

pom.xml

[jvm-packages] fix the scalability issue of prediction (#4033 )

2018-12-29 20:46:30 -08:00

README.md

[jvm-packages] a better explanation about the inconsistent issue (#3524 )

2018-07-28 17:34:39 -07:00

scalastyle-config.xml

sketch of xgboost-spark

2016-03-05 08:44:55 -05:00

README.md

XGBoost4J: Distributed XGBoost for Scala/Java

Documentation | Resources | Release Notes

XGBoost4J is the JVM package of XGBoost. It brings all the optimizations and power of XGBoost to the JVM ecosystem.

Train XGBoost models in Scala and Java with easy customizations.
Run distributed XGBoost natively on JVM frameworks such as Apache Flink and Apache Spark.

You can find more about XGBoost on the Documentation and Resources pages.

Add Maven Dependency

The XGBoost4J, XGBoost4J-Spark, etc. artifacts in the Maven repository are compiled with g++-4.8.5.

Access release version

maven

<dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j</artifactId>
    <version>latest_version_num</version>
</dependency>

sbt

 "ml.dmlc" % "xgboost4j" % "latest_version_num"

For the latest release version number, please check here.

To use xgboost4j-spark, replace the artifact ID xgboost4j with xgboost4j-spark.

Access SNAPSHOT version

First, add the GitHub repository to your build:

maven:

<repository>
  <id>GitHub Repo</id>
  <name>GitHub Repo</name>
  <url>https://raw.githubusercontent.com/CodingCat/xgboost/maven-repo/</url>
</repository>

sbt:

resolvers += "GitHub Repo" at "https://raw.githubusercontent.com/CodingCat/xgboost/maven-repo/"

Then add the dependency as follows:

maven

<dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j</artifactId>
    <version>latest_version_num</version>
</dependency>

sbt

 "ml.dmlc" % "xgboost4j" % "latest_version_num"

For the latest release version number, please check here.

To use xgboost4j-spark, replace the artifact ID xgboost4j with xgboost4j-spark.

Examples

Full code examples for Scala, Java, Apache Spark, and Apache Flink can be found in the examples package.

NOTE on LIBSVM Format:

There is an inconsistency between XGBoost4J-Spark and the other language bindings of XGBoost.

It shows up when users load a training set or test set in LibSVM format with Spark, using the following code snippet:

spark.read.format("libsvm").load("trainingset_libsvm")

Spark assumes that the feature indices in the dataset are 1-based. However, other bindings of XGBoost (e.g. the Python API) assume that the feature indices are 0-based. This creates a pitfall for users who train a model with Spark and then predict on a dataset in the same format with another binding of XGBoost: the same file is read with feature indices that differ by one, so each value ends up under a different feature than the one the model was trained on.
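
One possible way to work around the mismatch is to rewrite the feature indices of a LibSVM file before handing it to the other binding. The following is only a minimal sketch in plain Java: the class LibSvmIndexShift and its method shiftIndices are hypothetical helpers written for this note, not part of XGBoost4J or Spark, and the sketch assumes each line has the plain form "label index:value index:value ..." with no qid field or comments. Whether to shift by +1 or -1, or at all, depends on how your file was produced.

```java
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not part of XGBoost4J or Spark.
final class LibSvmIndexShift {

    /**
     * Shifts every feature index in one LibSVM line by the given offset.
     * Assumes the line looks like "<label> <index>:<value> <index>:<value> ...".
     */
    static String shiftIndices(String line, int offset) {
        String[] tokens = line.trim().split("\\s+");
        StringJoiner out = new StringJoiner(" ");
        out.add(tokens[0]); // the label is copied unchanged
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Malformed feature: " + tokens[i]);
            }
            int index = Integer.parseInt(tokens[i].substring(0, colon)) + offset;
            if (index < 0) {
                throw new IllegalArgumentException("Negative index after shift: " + tokens[i]);
            }
            // Keep ":<value>" exactly as written; only the index changes.
            out.add(index + tokens[i].substring(colon));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A line written with 0-based indices, rewritten with 1-based indices.
        System.out.println(shiftIndices("1 0:0.5 3:1.2", 1)); // prints "1 1:0.5 4:1.2"
    }
}
```

Applying shiftIndices with offset 1 to every line turns a 0-based file into one that follows the 1-based convention, and offset -1 goes the other way. Keeping the values as raw strings avoids any change in their numeric formatting.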