234 Commits

Author SHA1 Message Date
Fangzhou
a8adf16228 fix bug: doing rabit call after finalize in spark prediction phase (#1420) 2016-07-28 23:11:20 -05:00
convexquad
313764b3be Expose predictLeaf functionality in Scala XGBoostModel (#1351) 2016-07-12 06:55:24 -04:00
Rahul
f14c160f4f [jvm-packages][xgboost4j-spark][Minor] Move sparkContext dependency from the XGBoostModel (#1335)
* Move sparkContext dependency from the XGBoostModel

* Update Spark example to declare SparkContext as implicit
2016-07-08 06:43:33 -04:00
Nan Zhu
bd5b07873e [jvm-packages] create dmatrix with specified missing value (#1272)
* create dmatrix with specified missing value

* update dmlc-core

* support for predict method in spark package

repartitioning

work around

* add more elements to work around training set empty partition issue
2016-06-21 17:35:17 -04:00
Nan Zhu
c9a73fe2a9 explicitly throw exception when detecting empty partition in training dataset (#1281) 2016-06-15 16:03:37 -04:00
Nan Zhu
c85b9012c6 [jvm-packages] xgboost4j-spark external memory (#1219)
* implement external memory support for XGBoost4J

* remove extra space

* enable external memory for prediction

* update doc
2016-05-22 14:01:28 -04:00
CodingCat
d8535313eb allow empty partitions 2016-03-23 12:30:06 -04:00
CodingCat
55ab1c6a22 adjust numWorkers for test 2016-03-18 10:34:36 -04:00
CodingCat
3a951d0ab8 getter of XGBoostModel 2016-03-14 07:26:51 -04:00
CodingCat
f2ef958ebb support kryo serialization 2016-03-13 11:55:14 -04:00
CodingCat
16b9e92328 force the user to set number of workers 2016-03-12 13:33:57 -05:00
CodingCat
5f441a29a8 set nthread to spark.task.cpus by default 2016-03-11 20:07:09 -05:00
CodingCat
400b1faecc adjust the API signature as well as the docs 2016-03-11 15:22:44 -05:00
CodingCat
aca0096b33 more updates for Flink
more fix
2016-03-11 10:15:49 -05:00
CodingCat
43d7a85bc9 change the API name since we support not only HDFS and local file system 2016-03-11 10:05:32 -05:00
Shaform
6558ef3273 support different types of filesystems 2016-03-11 22:06:40 +08:00
CodingCat
d47df5c1d8 allow the user to specify the worker number and avoid unnecessary shuffle 2016-03-10 06:58:30 -05:00
CodingCat
e0a3f1c000 nthread no larger than spark.task.cpus 2016-03-10 05:51:07 -05:00
CodingCat
005b1276d0 remove duplicate in stream close 2016-03-09 12:33:49 -05:00
CodingCat
852c5a4b32 code formatting in XGBoostModel 2016-03-09 12:31:35 -05:00
CodingCat
909c6af330 add test resources manually 2016-03-08 18:43:30 -05:00
CodingCat
fa03aaeb63 revise current API 2016-03-08 17:18:55 -05:00
tqchen
435a0425b9 [Spark] Refactor train, predict, add save 2016-03-06 21:51:08 -08:00
CodingCat
718a9d8c96 use another thread to control spark job 2016-03-06 15:46:27 -05:00
CodingCat
6499422e90 fix the merge 2016-03-06 15:22:05 -05:00
CodingCat
50337d1906 fix rabitEnv 2016-03-06 14:56:49 -05:00
CodingCat
808e30f9fc example of DistTrainWithSpark and trigger job with foreachPartition 2016-03-06 14:34:23 -05:00
CodingCat
f768edfede adjust the return values of RabitTracker.waitFor(), remove typesafe.Config 2016-03-06 08:44:04 -05:00
CodingCat
130ca7b00c test case for XGBoostSpark 2016-03-05 19:41:26 -05:00
CodingCat
f0647ec76d test resources 2016-03-05 18:18:07 -05:00
CodingCat
5c1af13f84 distributed in RDD 2016-03-05 17:50:40 -05:00
CodingCat
fb41e4e673 spark with new labeledpoint
fix import order
2016-03-05 17:22:34 -05:00
CodingCat
bb43177eb1 merge 2016-03-05 14:40:30 -05:00
CodingCat
b2d705ffb0 framework of xgboost-spark
iterator

return java iterator and recover test
2016-03-05 08:44:55 -05:00