29 Commits

Author SHA1 Message Date
Xin Yin
e7fbc8591f [jvm-packages] Scala implementation of the Rabit tracker. (#1612)
* [jvm-packages] Scala implementation of the Rabit tracker.

A Scala implementation of RabitTracker that is interface-interchangeable with the
Java implementation, ported from `tracker.py` in the
[dmlc-core project](https://github.com/dmlc/dmlc-core).

* [jvm-packages] Updated Akka dependency in pom.xml.

* Refactored the RabitTracker directory structure.

* Fixed premature stopping of connection handler.

Added a new finite state "AwaitingPortNumber" to explicitly wait for the
worker to send the port, and close the connection. Stopping the actor
prematurely sends a TCP RST to the worker, causing the worker to crash
on AssertionError.

* Added interface IRabitTracker so that users can switch implementations.

* Default timeout duration changes.

* Dependency for Akka tests.

* Removed the main function of RabitTracker.

* A skeleton for testing Akka-based Rabit tracker.

* waitFor() in RabitTracker no longer throws exceptions.

* Completed unit test for the 'start' command of Rabit tracker.

* Preliminary support for Rabit Allreduce via JNI (no prepare function support yet).

* Fixed the default timeout duration.

* Use Java container to avoid serialization issues due to intermediate wrappers.

* Added tests for Allreduce/model training using Scala Rabit tracker.

* Added spill-over unit test for the Scala Rabit tracker.

* Fixed a typo.

* Overhaul of RabitTracker interface per code review.

  - Removed the no-argument methods start() and waitFor() from IRabitTracker.
  - The timeout in start(timeout) is now the worker connection timeout, as a TCP
    socket binding timeout is less intuitive.
  - Dropped time unit from start(...) and waitFor(...) methods; the default
    time unit is millisecond.
  - Moved random port number generation into the RabitTrackerHandler.
  - Moved all Rabit-related classes to package ml.dmlc.xgboost4j.scala.rabit.

* More code refactoring and comments.

* Unified timeout constants. Readable tracker status code.

* Add comments to indicate that allReduce is for tests only. Removed all other variants.

* Removed unused imports.

* Simplified signatures of training methods.

 - Moved TrackerConf into parameter map.
 - Changed GeneralParams so that TrackerConf becomes a standalone parameter.
 - Updated test cases accordingly.

* Changed monitoring strategies.

* Reverted monitoring changes.

* Update test case for Rabit AllReduce.

* Mix UncaughtExceptionHandler into IRabitTracker to prevent the tracker from hanging due to exceptions thrown by workers.

* More comprehensive test cases for exception handling and worker connection timeout.

* Handle executor loss due to an unknown cause: the newly spawned executor will attempt to connect to the tracker. Interrupt the tracker in such a case.

* Per code-review, removed training timeout from TrackerConf. Timeout logic must be implemented explicitly and externally in the driver code.

* Reverted scalastyle-config changes.

* Visibility scope change. Interface tweaks.

* Use match pattern to handle tracker_conf parameter.

* Minor clarification in JNI code.

* Clearer intent in match pattern to suppress warnings.

* Removed Future from constructor. Block in start() and waitFor() instead.

* Revert inadvertent comment changes.

* Removed debugging information.

* Updated test cases that are a bit finicky.

* Added comments on the reasoning behind the unit tests for testing Rabit tracker robustness.
2016-12-07 06:35:42 -08:00
Ruimin Wang
d80cec3384 [jvm-packages] the first parameter in getModelDump should be featuremap path, not model path (#1788)
* fix the model dump in xgboost4j example

* Modify the dump model part of scala version

* add the forgotten modelInfos
2016-11-21 08:52:26 -05:00
XianXing Zhang
ce708c8e7f [jvm-packages] Leverage the Spark ml API to read DataFrame from files in LibSVM format. (#1785) 2016-11-20 21:28:03 -05:00
Nan Zhu
6082184cd1 [jvm-packages] update API docs (#1713)
* add back train method but mark as deprecated

* fix scalastyle error

* update java doc

* update
2016-10-27 18:53:22 -07:00
Nan Zhu
f12074d355 [jvm-packages] release blog (#1706) 2016-10-26 21:35:42 -04:00
Nan Zhu
f801c22710 [jvm-packages] change class to object in examples (#1703)
* change class to object in examples

* fix compilation error
2016-10-26 14:54:56 -04:00
Nan Zhu
016ab89484 [jvm-packages] Parameter tuning tool for XGBoost (#1664) 2016-10-23 16:58:18 -04:00
Nan Zhu
1673bcbe7e [jvm-packages] separate classification and regression model and integrate with ML package (#1608) 2016-09-30 11:49:03 -04:00
Nan Zhu
fb02797e2a [jvm-packages] Integration with Spark Dataframe/Dataset (#1559)
* bump up to scala 2.11

* framework of data frame integration

* test consistency between RDD and DataFrame

* order preservation

* test order preservation

* example code and fix makefile

* improve type checking

* improve APIs

* user docs

* work around travis CI's limitation on log length

* adjust test structure

* integrate with Spark 1.x

* spark 2.x integration

* remove spark 1.x implementation but provide instructions on how to downgrade
2016-09-11 15:02:58 -04:00
Nan Zhu
6dabdd33e3 [jvm-packages] bump to next version (#1535)
* bump to next version

* fix

* fix
2016-09-01 12:18:21 -04:00
Nan Zhu
74db1e8867 [jvm-packages] remove APIs with DMatrix from xgboost-spark (#1519)
* test consistency of prediction functions between DMatrix and RDD

* remove APIs with DMatrix from xgboost-spark

* fix compilation error in xgboost4j-example

* fix test cases
2016-08-28 21:25:49 -04:00
Earthson Lu
d29edc677c fix #1377 spark-mllib scope: default => provided (#1381) 2016-07-20 23:10:49 -04:00
Rahul
f14c160f4f [jvm-packages][xgboost4j-spark][Minor] Move sparkContext dependency from the XGBoostModel (#1335)
* Move sparkContext dependency from the XGBoostModel

* Update Spark example to declare SparkContext as implicit
2016-07-08 06:43:33 -04:00
Nan Zhu
c85b9012c6 [jvm-packages] xgboost4j-spark external memory (#1219)
* implement external memory support for XGBoost4J

* remove extra space

* enable external memory for prediction

* update doc
2016-05-22 14:01:28 -04:00
tqchen
90f7220736 [FLINK] remove nWorker from API 2016-03-14 16:18:35 -07:00
CodingCat
f2ef958ebb support kryo serialization 2016-03-13 11:55:14 -04:00
CodingCat
16b9e92328 force the user to set number of workers 2016-03-12 13:33:57 -05:00
CodingCat
400b1faecc adjust the API signature as well as the docs 2016-03-11 15:22:44 -05:00
CodingCat
ab68a0ccc7 fix examples 2016-03-11 13:57:03 -05:00
CodingCat
aca0096b33 more updates for Flink
more fix
2016-03-11 10:15:49 -05:00
CodingCat
43d7a85bc9 change the API name since we support not only HDFS and local file system 2016-03-11 10:05:32 -05:00
CodingCat
4e86c8c866 fix typo in README 2016-03-09 17:22:19 -05:00
CodingCat
7e30ada8c1 update README 2016-03-09 13:05:08 -05:00
CodingCat
c9830cd8b1 remove spark/flink examples 2016-03-09 12:31:35 -05:00
CodingCat
8cfa752fa0 add scala examples 2016-03-09 12:31:35 -05:00
CodingCat
a08cc8aad4 allow the user to define how many workers they need 2016-03-08 18:46:53 -05:00
CodingCat
fa03aaeb63 revise current API 2016-03-08 17:18:55 -05:00
tqchen
435a0425b9 [Spark] Refactor train, predict, add save 2016-03-06 21:51:08 -08:00
tqchen
c05c5bc7bc [DOC-JVM] Refactor JVM docs 2016-03-06 20:42:01 -08:00