107 Commits

Author SHA1 Message Date
XianXing Zhang
ce708c8e7f [jvm-packages] Leverage the Spark ml API to read DataFrame from files in LibSVM format. (#1785) 2016-11-20 21:28:03 -05:00
Nan Zhu
5217e53156 stylistic fix (#1789)
* stylistic fix

* try multiple repos

* fix

* fix
2016-11-19 22:03:10 -05:00
joandre
91b75f9b41 Fix a small typo in GeneralParams class. Change customEval parameter name from "custom_obj" to "custom_eval". (#1741) 2016-11-06 12:44:49 -05:00
Nan Zhu
6082184cd1 [jvm-packages] update API docs (#1713)
* add back train method but mark as deprecated

* fix scalastyle error

* update java doc

* update
2016-10-27 18:53:22 -07:00
Nan Zhu
d321375df5 [jvm-packages] Fix misconfiguration of nthread (#1709)
* add back train method but mark as deprecated

* fix scalastyle error

* change class to object in examples

* fix compilation error

* fix misconfiguration
2016-10-27 12:10:35 -04:00
Nan Zhu
f12074d355 [jvm-packages] release blog (#1706) 2016-10-26 21:35:42 -04:00
Nan Zhu
f801c22710 [jvm-packages] change class to object in examples (#1703)
* change class to object in examples

* fix compilation error
2016-10-26 14:54:56 -04:00
Nan Zhu
016ab89484 [jvm-packages] Parameter tuning tool for XGBoost (#1664) 2016-10-23 16:58:18 -04:00
Adam Pocock
445029bb82 [jvm-packages] XGBoost4j Windows fixes (#1639)
* Changes for Mingw64 compilation to ensure long is a consistent size.

Mainly impacts the Java API which would not compile, but there may be
silent errors on Windows with large datasets before this patch (as long
is 32 bits when compiled with mingw64 even in 64-bit mode).

* Adding ifdefs to ensure it still compiles on MacOS

* Makefile and create_jni.bat changes for Windows.

* Switching XGDMatrixCreateFromCSREx JNI call to use size_t cast

* Fixing lint error, adding profile switching to jvm-packages build to make create-jni.bat get called, adding myself to Contributors.Md
2016-10-18 08:35:25 -04:00
Nan Zhu
f5c776f64f [jvm-packages] add apache maven repo url and bump up default spark version to 2.0.1 (#1650)
* add apache maven repo url and bump up default spark version to 2.0.1
2016-10-13 08:55:03 -04:00
Nan Zhu
813a53882a [jvm-packages] deprecate Flaky test (#1662)
* deprecate flaky test
2016-10-13 07:21:24 -04:00
Nan Zhu
1673bcbe7e [jvm-packages] separate classification and regression model and integrate with ML package (#1608) 2016-09-30 11:49:03 -04:00
Nan Zhu
37bc122c90 [jvm-packages] Robust dmatrix creation (#1613)
* add back train method but mark as deprecated

* robust matrix creation in jvm
2016-09-26 13:35:04 -04:00
reg.zhuce
3ee145b8dc [jvm-packages] IndexOutOfBoundsException (#1589)
ml.dmlc.xgboost4j.scala.spark.XGBoost.scala:51

values is empty the first time we reach it, so values(0) throws an IndexOutOfBoundsException.
It should be dVector.values(i) instead of values(i).
2016-09-20 09:13:47 -04:00
Xin Yin
7245145712 [jvm-packages] Fixed the sanity check for parameter 'nthread' against 'spark.task.cpus'. (#1582) 2016-09-16 11:31:35 -04:00
Nan Zhu
4ad648e856 [jvm-packages] predictLeaf with Dataframe (#1576)
* add back train method but mark as deprecated

* predictLeaf with Dataset

* fix

* fix
2016-09-15 06:15:47 -04:00
Nan Zhu
bb388cbb31 default eval func (#1574) 2016-09-14 13:26:16 -04:00
Nan Zhu
fb02797e2a [jvm-packages] Integration with Spark Dataframe/Dataset (#1559)
* bump up to scala 2.11

* framework of data frame integration

* test consistency between RDD and DataFrame

* order preservation

* test order preservation

* example code and fix makefile

* improve type checking

* improve APIs

* user docs

* work around travis CI's limitation on log length

* adjust test structure

* integrate with Spark 1.x

* spark 2.x integration

* remove spark 1.x implementation but provide instructions on how to downgrade
2016-09-11 15:02:58 -04:00
Nan Zhu
6dabdd33e3 [jvm-packages] bump to next version (#1535)
* bump to next version

* fix

* fix
2016-09-01 12:18:21 -04:00
Nan Zhu
7fb3fbf577 impose shuffle when creating training RDD (#1531) 2016-08-31 07:34:10 -04:00
Nan Zhu
3f198b9fef [jvm-packages] allow training with missing values in xgboost-spark (#1525)
* allow training with missing values in xgboost-spark

* fix compilation error

* fix bug
2016-08-29 21:45:49 -04:00
Nan Zhu
74db1e8867 [jvm-packages] remove APIs with DMatrix from xgboost-spark (#1519)
* test consistency of prediction functions between DMatrix and RDD

* remove APIs with DMatrix from xgboost-spark

* fix compilation error in xgboost4j-example

* fix test cases
2016-08-28 21:25:49 -04:00
Nan Zhu
6d65aae091 [jvm-packages] test consistency of prediction functions with DMatrix and RDD (#1518)
* test consistency of prediction functions between DMatrix and RDD

* fix the failed test cases
2016-08-28 20:27:03 -04:00
Nan Zhu
d7f79255ec improve test of save/load model (#1515) 2016-08-27 17:16:22 -04:00
Nan Zhu
dc1125eb56 evaluation with RDD data (#1492) 2016-08-20 18:31:10 -04:00
Nan Zhu
582ee63e34 enable train multiple models by distinguishing stage IDs (#1493) 2016-08-20 16:37:07 -04:00
Nan Zhu
70432cac5b make IEvaluation serializable (#1487) 2016-08-19 13:12:39 -04:00
Fangzhou
a8adf16228 fix bug: doing rabit call after finalize in spark prediction phase (#1420) 2016-07-28 23:11:20 -05:00
Earthson Lu
d29edc677c fix #1377 spark-mllib scope: default => provided (#1381) 2016-07-20 23:10:49 -04:00
convexquad
313764b3be Expose predictLeaf functionality in Scala XGBoostModel (#1351) 2016-07-12 06:55:24 -04:00
Rahul
f14c160f4f [jvm-packages][xgboost4j-spark][Minor] Move sparkContext dependency from the XGBoostModel (#1335)
* Move sparkContext dependency from the XGBoostModel

* Update Spark example to declare SparkContext as implicit
2016-07-08 06:43:33 -04:00
Muhammad Haseeb Tariq
7533191af7 Typos in README (#1326)
* Inconsistency in libsvm formats

* note on libsvm formats

* typos in README

* Update README.md

* Update README.md

* Update README.md
2016-07-03 15:14:35 -04:00
Muhammad Haseeb Tariq
14f9697025 Inconsistency in libsvm formats (#1325)
* Inconsistency in libsvm formats

* note on libsvm formats
2016-07-03 10:49:41 -07:00
Nan Zhu
bd5b07873e [jvm-packages] create dmatrix with specified missing value (#1272)
* create dmatrix with specified missing value

* update dmlc-core

* support for predict method in spark package

repartitioning

work around

* add more elements to work around training set empty partition issue
2016-06-21 17:35:17 -04:00
Nan Zhu
c9a73fe2a9 explicitly throw exception when detecting empty partition in training dataset (#1281) 2016-06-15 16:03:37 -04:00
Nan Zhu
c6631ad2ed specify spark version (#1224) 2016-05-24 18:19:32 -04:00
Nan Zhu
c85b9012c6 [jvm-packages] xgboost4j-spark external memory (#1219)
* implement external memory support for XGBoost4J

* remove extra space

* enable external memory for prediction

* update doc
2016-05-22 14:01:28 -04:00
CodingCat
d8535313eb allow empty partitions 2016-03-23 12:30:06 -04:00
CodingCat
55ab1c6a22 adjust numWorkers for test 2016-03-18 10:34:36 -04:00
CodingCat
a31a978471 run native lib building command from maven 2016-03-16 16:47:08 -04:00
tqchen
90f7220736 [FLINK] remove nWorker from API 2016-03-14 16:18:35 -07:00
CodingCat
3a951d0ab8 getter of XGBoostModel 2016-03-14 07:26:51 -04:00
Nan Zhu
e3fa7753f5 Merge branch 'master' into master 2016-03-13 22:46:38 -04:00
CodingCat
6f92f1c117 update spark version to 1.6.1 2016-03-13 22:46:06 -04:00
CodingCat
f2ef958ebb support kryo serialization 2016-03-13 11:55:14 -04:00
CodingCat
9011acf52b jvm doc index 2016-03-13 09:20:51 -04:00
CodingCat
16b9e92328 force the user to set number of workers 2016-03-12 13:33:57 -05:00
CodingCat
5f441a29a8 set nthread to spark.task.cpus by default 2016-03-11 20:07:09 -05:00
CodingCat
a3b2e76230 update README for jvm-packages 2016-03-11 15:28:55 -05:00
CodingCat
400b1faecc adjust the API signature as well as the docs 2016-03-11 15:22:44 -05:00