xgboost

Author	SHA1	Message	Date
Xin Yin	e7fbc8591f	[jvm-packages] Scala implementation of the Rabit tracker. (#1612) * [jvm-packages] Scala implementation of the Rabit tracker. A Scala implementation of RabitTracker that is interface-interchangeable with the Java implementation, ported from `tracker.py` in the [dmlc-core project](https://github.com/dmlc/dmlc-core). * [jvm-packages] Updated Akka dependency in pom.xml. * Refactored the RabitTracker directory structure. * Fixed premature stopping of connection handler. Added a new finite state "AwaitingPortNumber" to explicitly wait for the worker to send the port, and close the connection. Stopping the actor prematurely sends a TCP RST to the worker, causing the worker to crash on AssertionError. * Added interface IRabitTracker so that user can switch implementations. * Default timeout duration changes. * Dependency for Akka tests. * Removed the main function of RabitTracker. * A skeleton for testing Akka-based Rabit tracker. * waitFor() in RabitTracker no longer throws exceptions. * Completed unit test for the 'start' command of Rabit tracker. * Preliminary support for Rabit Allreduce via JNI (no prepare function support yet.) * Fixed the default timeout duration. * Use Java container to avoid serialization issues due to intermediate wrappers. * Added tests for Allreduce/model training using Scala Rabit tracker. * Added spill-over unit test for the Scala Rabit tracker. * Fixed a typo. * Overhaul of RabitTracker interface per code review. - Removed methods start() waitFor() (no arguments) from IRabitTracker. - The timeout in start(timeout) is now worker connection timeout, as tcp socket binding timeout is less intuitive. - Dropped time unit from start(...) and waitFor(...) methods; the default time unit is millisecond. - Moved random port number generation into the RabitTrackerHandler. - Moved all Rabit-related classes to package ml.dmlc.xgboost4j.scala.rabit. * More code refactoring and comments. * Unified timeout constants. Readable tracker status code. * Add comments to indicate that allReduce is for tests only. Removed all other variants. * Removed unused imports. * Simplified signatures of training methods. - Moved TrackerConf into parameter map. - Changed GeneralParams so that TrackerConf becomes a standalone parameter. - Updated test cases accordingly. * Changed monitoring strategies. * Reverted monitoring changes. * Update test case for Rabit AllReduce. * Mix in UncaughtExceptionHandler into IRabitTracker to prevent tracker from hanging due to exceptions thrown by workers. * More comprehensive test cases for exception handling and worker connection timeout. * Handle executor loss due to unknown cause: the newly spawned executor will attempt to connect to the tracker. Interrupt tracker in such case. * Per code-review, removed training timeout from TrackerConf. Timeout logic must be implemented explicitly and externally in the driver code. * Reverted scalastyle-config changes. * Visibility scope change. Interface tweaks. * Use match pattern to handle tracker_conf parameter. * Minor clarification in JNI code. * Clearer intent in match pattern to suppress warnings. * Removed Future from constructor. Block in start() and waitFor() instead. * Revert inadvertent comment changes. * Removed debugging information. * Updated test cases that are a bit finicky. * Added comments on the reasoning behind the unit tests for testing Rabit tracker robustness.	2016-12-07 06:35:42 -08:00
Ruimin Wang	d80cec3384	[jvm-packages] the first parameter in getModelDump should be featuremap path not model path (#1788) * fix the model dump in xgboost4j example * Modify the dump model part of scala version * add the forgotten modelInfos	2016-11-21 08:52:26 -05:00
XianXing Zhang	ce708c8e7f	[jvm-packages] Leverage the Spark ml API to read DataFrame from files in LibSVM format. (#1785)	2016-11-20 21:28:03 -05:00
Nan Zhu	6082184cd1	[jvm-packages] update API docs (#1713) * add back train method but mark as deprecated * fix scalastyle error * update java doc * update	2016-10-27 18:53:22 -07:00
Nan Zhu	f12074d355	[jvm-packages] release blog (#1706)	2016-10-26 21:35:42 -04:00
Nan Zhu	f801c22710	[jvm-packages] change class to object in examples (#1703) * change class to object in examples * fix compilation error	2016-10-26 14:54:56 -04:00
Nan Zhu	016ab89484	[jvm-packages] Parameter tuning tool for XGBoost (#1664)	2016-10-23 16:58:18 -04:00
Nan Zhu	1673bcbe7e	[jvm-packages] separate classification and regression model and integrate with ML package (#1608)	2016-09-30 11:49:03 -04:00
Nan Zhu	fb02797e2a	[jvm-packages] Integration with Spark Dataframe/Dataset (#1559) * bump up to scala 2.11 * framework of data frame integration * test consistency between RDD and DataFrame * order preservation * test order preservation * example code and fix makefile * improve type checking * improve APIs * user docs * work around travis CI's limitation on log length * adjust test structure * integrate with Spark 1.x * spark 2.x integration * remove spark 1.x implementation but provide instructions on how to downgrade	2016-09-11 15:02:58 -04:00
Nan Zhu	6dabdd33e3	[jvm-packages] bump to next version (#1535) * bump to next version * fix * fix	2016-09-01 12:18:21 -04:00
Nan Zhu	74db1e8867	[jvm-packages] remove APIs with DMatrix from xgboost-spark (#1519) * test consistency of prediction functions between DMatrix and RDD * remove APIs with DMatrix from xgboost-spark * fix compilation error in xgboost4j-example * fix test cases	2016-08-28 21:25:49 -04:00
Earthson Lu	d29edc677c	fix #1377 spark-mllib scope: default => provided (#1381)	2016-07-20 23:10:49 -04:00
Rahul	f14c160f4f	[jvm-packages][xgboost4j-spark][Minor] Move sparkContext dependency from the XGBoostModel (#1335) * Move sparkContext dependency from the XGBoostModel * Update Spark example to declare SparkContext as implicit	2016-07-08 06:43:33 -04:00
Nan Zhu	c85b9012c6	[jvm-packages] xgboost4j-spark external memory (#1219) * implement external memory support for XGBoost4J * remove extra space * enable external memory for prediction * update doc	2016-05-22 14:01:28 -04:00
tqchen	90f7220736	[FLINK] remove nWorker from API	2016-03-14 16:18:35 -07:00
CodingCat	f2ef958ebb	support kryo serialization	2016-03-13 11:55:14 -04:00
CodingCat	16b9e92328	force the user to set number of workers	2016-03-12 13:33:57 -05:00
CodingCat	400b1faecc	adjust the API signature as well as the docs	2016-03-11 15:22:44 -05:00
CodingCat	ab68a0ccc7	fix examples	2016-03-11 13:57:03 -05:00
CodingCat	aca0096b33	more updates for Flink more fix	2016-03-11 10:15:49 -05:00
CodingCat	43d7a85bc9	change the API name since we support not only HDFS and local file system	2016-03-11 10:05:32 -05:00
CodingCat	4e86c8c866	fix typo in README	2016-03-09 17:22:19 -05:00
CodingCat	7e30ada8c1	update README	2016-03-09 13:05:08 -05:00
CodingCat	c9830cd8b1	remove spark/flink examples	2016-03-09 12:31:35 -05:00
CodingCat	8cfa752fa0	add scala examples	2016-03-09 12:31:35 -05:00
CodingCat	a08cc8aad4	allow the user to define how many workers they need	2016-03-08 18:46:53 -05:00
CodingCat	fa03aaeb63	revise current API	2016-03-08 17:18:55 -05:00
tqchen	435a0425b9	[Spark] Refactor train, predict, add save	2016-03-06 21:51:08 -08:00
tqchen	c05c5bc7bc	[DOC-JVM] Refactor JVM docs	2016-03-06 20:42:01 -08:00

29 Commits