xgboost

Author	SHA1	Message	Date
Sergei Lebedev	0db37c05bd	[jvm-packages] Deterministically XGBoost training on exception (#2405 ) Previously the code relied on the tracker process being terminated by the OS, which was not the case on Windows. Closes #2394	2017-06-12 20:19:28 -07:00
Nan Zhu	a607f697e3	[jvm-packages] Disable fast histo for spark (#2296 ) * add back train method but mark as deprecated * fix scalastyle error * disable fast histogram in xgboost4j-spark temporarily	2017-05-15 20:43:16 -07:00
Nan Zhu	428453f7d6	[jvm-packages] fix the persistence of XGBoostEstimator (#2265 ) * add back train method but mark as deprecated * fix scalastyle error * fix the persistence of XGBoostEstimator * test persistence of a complete pipeline * fix compilation issue * do not allow persist custom_eval and custom_obj * fix the failed test	2017-05-08 21:58:06 -07:00
ebernhardson	ccccf8a015	[jvm-packages] Accept groupData in spark model eval (#2244 ) * Support model evaluation for ranking tasks by accepting groupData in XGBoostModel.eval	2017-05-02 10:03:20 -07:00
Nan Zhu	392aa6d1d3	[jvm-packages] make XGBoostModel hold BoosterParams as well (#2214 ) * add back train method but mark as deprecated * fix scalastyle error * make XGBoostModel hold BoosterParams as well	2017-04-21 08:12:50 -07:00
Nan Zhu	a837fa9620	[jvm-packages] rdds containing boosters should be cleaned once we got boosters to driver (#2183 )	2017-04-11 06:12:49 -07:00
Nan Zhu	f08077606c	[jvm-packages] Clean external cache (#2181 ) * add back train method but mark as deprecated * fix scalastyle error * change class to object in examples * fix compilation error * small fix for cleanExternalCache	2017-04-10 07:49:58 -07:00
Nan Zhu	8d8cbcc6db	[jvm-packages] fixed several issues in unit tests (#2173 ) * add back train method but mark as deprecated * fix scalastyle error * change class to object in examples * fix compilation error * fix several issues in tests	2017-04-06 06:25:23 -07:00
cloverrose	288f309434	[jvm-packages] call setGroup for ranking task (#2066 ) * [jvm-packages] call setGroup for ranking task * passing groupData through xgBoostConfMap * fix original comment position * make groupData param * remove groupData variable, use xgBoostConfMap directly * set default groupData value * add use groupData tests * reduce rank-demo size * use TaskContext.getPartitionId() instead of mapPartitionsWithIndex * add DF use groupData test * remove unused variable	2017-03-06 15:45:06 -08:00
geoHeil	cf6b173bd7	[jvm-packages] Spark pipeline persistence (#1906 ) [jvm-packages] Spark pipeline persistence	2017-03-05 18:35:37 -08:00
Nan Zhu	ab13fd72bd	[jvm-packages] Scala/Java interface for Fast Histogram Algorithm (#1966 ) * add back train method but mark as deprecated * fix scalastyle error * first commit in scala binding for fast histo * java test * add missed scala tests * spark training * add back train method but mark as deprecated * fix scalastyle error * local change * first commit in scala binding for fast histo * local change * fix df frame test	2017-03-04 15:37:24 -08:00
Nan Zhu	ac30a0aff5	[jvm-packages][spark]Preserve num classes (#2068 ) * add back train method but mark as deprecated * fix scalastyle error * change class to object in examples * fix compilation error * bump spark version to 2.1 * preserve num_class issues * fix failed test cases * revising * add multi class test	2017-03-04 14:14:31 -08:00
hlsc	a92093388d	[jvm-packages] fix bug doing rabit call after finalize (#2079 ) [jvm-packages]fix bug doing rabit call after finalize	2017-03-02 16:46:57 -08:00
Nan Zhu	185fe1d645	[jvm-packages] use ML's para system to build the passed-in params to XGBoost (#2043 ) * add back train method but mark as deprecated * fix scalastyle error * use ML's para system to build the passed-in params to XGBoost * clean	2017-02-18 11:56:27 -08:00
DougM	acce11d3f4	fix MLlib CrossValidator issues (wrong default value configuration) #1941 (#2042 )	2017-02-18 08:10:47 -08:00
Ruimin Wang	d9584ab82e	refactor duplicate evaluation implementation (#1852 )	2016-12-08 20:33:40 -08:00
Xin Yin	e7fbc8591f	[jvm-packages] Scala implementation of the Rabit tracker. (#1612 ) * [jvm-packages] Scala implementation of the Rabit tracker. A Scala implementation of RabitTracker that is interface-interchangeable with the Java implementation, ported from `tracker.py` in the [dmlc-core project](https://github.com/dmlc/dmlc-core). * [jvm-packages] Updated Akka dependency in pom.xml. * Refactored the RabitTracker directory structure. * Fixed premature stopping of connection handler. Added a new finite state "AwaitingPortNumber" to explicitly wait for the worker to send the port, and close the connection. Stopping the actor prematurely sends a TCP RST to the worker, causing the worker to crash on AssertionError. * Added interface IRabitTracker so that user can switch implementations. * Default timeout duration changes. * Dependency for Akka tests. * Removed the main function of RabitTracker. * A skeleton for testing Akka-based Rabit tracker. * waitFor() in RabitTracker no longer throws exceptions. * Completed unit test for the 'start' command of Rabit tracker. * Preliminary support for Rabit Allreduce via JNI (no prepare function support yet.) * Fixed the default timeout duration. * Use Java container to avoid serialization issues due to intermediate wrappers. * Added tests for Allreduce/model training using Scala Rabit tracker. * Added spill-over unit test for the Scala Rabit tracker. * Fixed a typo. * Overhaul of RabitTracker interface per code review. - Removed methods start() waitFor() (no arguments) from IRabitTracker. - The timeout in start(timeout) is now worker connection timeout, as tcp socket binding timeout is less intuitive. - Dropped time unit from start(...) and waitFor(...) methods; the default time unit is millisecond. - Moved random port number generation into the RabitTrackerHandler. - Moved all Rabit-related classes to package ml.dmlc.xgboost4j.scala.rabit. * More code refactoring and comments. * Unified timeout constants. Readable tracker status code. * Add comments to indicate that allReduce is for tests only. Removed all other variants. * Removed unused imports. * Simplified signatures of training methods. - Moved TrackerConf into parameter map. - Changed GeneralParams so that TrackerConf becomes a standalone parameter. - Updated test cases accordingly. * Changed monitoring strategies. * Reverted monitoring changes. * Update test case for Rabit AllReduce. * Mix in UncaughtExceptionHandler into IRabitTracker to prevent tracker from hanging due to exceptions thrown by workers. * More comprehensive test cases for exception handling and worker connection timeout. * Handle executor loss due to unknown cause: the newly spawned executor will attempt to connect to the tracker. Interrupt tracker in such case. * Per code-review, removed training timeout from TrackerConf. Timeout logic must be implemented explicitly and externally in the driver code. * Reverted scalastyle-config changes. * Visibility scope change. Interface tweaks. * Use match pattern to handle tracker_conf parameter. * Minor clarification in JNI code. * Clearer intent in match pattern to suppress warnings. * Removed Future from constructor. Block in start() and waitFor() instead. * Revert inadvertent comment changes. * Removed debugging information. * Updated test cases that are a bit finicky. * Added comments on the reasoning behind the unit tests for testing Rabit tracker robustness.	2016-12-07 06:35:42 -08:00
Nan Zhu	965091c4bb	[jvm-packages] update methods in test cases to be consistent (#1780 ) * add back train method but mark as deprecated * fix scalastyle error * change class to object in examples * fix compilation error * update methods in test cases to be consistent * add blank lines * fix	2016-11-20 22:49:18 -05:00
joandre	91b75f9b41	Fix a small typo in GeneralParams class. Change customEval parameter name from "custom_obj" to "custom_eval". (#1741 )	2016-11-06 12:44:49 -05:00
Nan Zhu	6082184cd1	[jvm-packages] update API docs (#1713 ) * add back train method but mark as deprecated * fix scalastyle error * update java doc * update	2016-10-27 18:53:22 -07:00
Nan Zhu	d321375df5	[jvm-packages] Fix mis configure of nthread (#1709 ) * add back train method but mark as deprecated * fix scalastyle error * change class to object in examples * fix compilation error * fix mis configuration	2016-10-27 12:10:35 -04:00
Nan Zhu	016ab89484	[jvm-packages] Parameter tuning tool for XGBoost (#1664 )	2016-10-23 16:58:18 -04:00
Nan Zhu	813a53882a	[jvm-packages] deprecate Flaky test (#1662 ) * deprecate flaky test	2016-10-13 07:21:24 -04:00
Nan Zhu	1673bcbe7e	[jvm-packages] separate classification and regression model and integrate with ML package (#1608 )	2016-09-30 11:49:03 -04:00
reg.zhuce	3ee145b8dc	[jvm-packages] IndexOutOfBoundsException (#1589 ) ml.dmlc.xgboost4j.scala.spark.XGBoost.scala:51 values is empty when we meet it at first time, so values(0) throw an IndexOutOfBoundsException. It should be dVector.values(i) instead of values(i).	2016-09-20 09:13:47 -04:00
Xin Yin	7245145712	[jvm-packages] Fixed the sanity check for parameter 'nthread' against 'spark.task.cpus'. (#1582 )	2016-09-16 11:31:35 -04:00
Nan Zhu	4ad648e856	[jvm-packages] predictLeaf with Dataframe (#1576 ) * add back train method but mark as deprecated * predictLeaf with Dataset * fix * fix	2016-09-15 06:15:47 -04:00
Nan Zhu	bb388cbb31	default eval func (#1574 )	2016-09-14 13:26:16 -04:00
Nan Zhu	fb02797e2a	[jvm-packages] Integration with Spark Dataframe/Dataset (#1559 ) * bump up to scala 2.11 * framework of data frame integration * test consistency between RDD and DataFrame * order preservation * test order preservation * example code and fix makefile * improve type checking * improve APIs * user docs * work around travis CI's limitation on log length * adjust test structure * integrate with Spark 1.x * spark 2.x integration * remove spark 1.x implementation but provide instructions on how to downgrade	2016-09-11 15:02:58 -04:00
Nan Zhu	6dabdd33e3	[jvm-packages] bump to next version (#1535 ) * bump to next version * fix * fix	2016-09-01 12:18:21 -04:00
Nan Zhu	7fb3fbf577	impose shuffle when creating training RDD (#1531 )	2016-08-31 07:34:10 -04:00
Nan Zhu	3f198b9fef	[jvm-packages] allow training with missing values in xgboost-spark (#1525 ) * allow training with missing values in xgboost-spark * fix compilation error * fix bug	2016-08-29 21:45:49 -04:00
Nan Zhu	74db1e8867	[jvm-packages] remove APIs with DMatrix from xgboost-spark (#1519 ) * test consistency of prediction functions between DMatrix and RDD * remove APIs with DMatrix from xgboost-spark * fix compilation error in xgboost4j-example * fix test cases	2016-08-28 21:25:49 -04:00
Nan Zhu	6d65aae091	[jvm-packages] test consistency of prediction functions with DMatrix and RDD (#1518 ) * test consistency of prediction functions between DMatrix and RDD * fix the failed test cases	2016-08-28 20:27:03 -04:00
Nan Zhu	d7f79255ec	improve test of save/load model (#1515 )	2016-08-27 17:16:22 -04:00
Nan Zhu	dc1125eb56	evaluation with RDD data (#1492 )	2016-08-20 18:31:10 -04:00
Nan Zhu	582ee63e34	enable train multiple models by distinguishing stage IDs (#1493 )	2016-08-20 16:37:07 -04:00
Fangzhou	a8adf16228	fix bug: doing rabit call after finalize in spark prediction phase (#1420 )	2016-07-28 23:11:20 -05:00
Earthson Lu	d29edc677c	fix #1377 spark-mllib scope: default => provided (#1381 )	2016-07-20 23:10:49 -04:00
convexquad	313764b3be	Expose predictLeaf functionality in Scala XGBoostModel (#1351 )	2016-07-12 06:55:24 -04:00
Rahul	f14c160f4f	[jvm-packages][xgboost4j-spark][Minor] Move sparkContext dependency from the XGBoostModel (#1335 ) * Move sparkContext dependency from the XGBoostModel * Update Spark example to declare SparkContext as implicit	2016-07-08 06:43:33 -04:00
Nan Zhu	bd5b07873e	[jvm-packages] create dmatrix with specified missing value (#1272 ) * create dmatrix with specified missing value * update dmlc-core * support for predict method in spark package repartitioning work around * add more elements to work around training set empty partition issue	2016-06-21 17:35:17 -04:00
Nan Zhu	c9a73fe2a9	explicitly throw exception when detecting empty partition in training dataset (#1281 )	2016-06-15 16:03:37 -04:00
Nan Zhu	c6631ad2ed	specify spark version (#1224 )	2016-05-24 18:19:32 -04:00
Nan Zhu	c85b9012c6	[jvm-packages] xgboost4j-spark external memory (#1219 ) * implement external memory support for XGBoost4J * remove extra space * enable external memory for prediction * update doc	2016-05-22 14:01:28 -04:00
CodingCat	d8535313eb	allow empty partitions	2016-03-23 12:30:06 -04:00
CodingCat	55ab1c6a22	adjust numWorkers for test	2016-03-18 10:34:36 -04:00
CodingCat	3a951d0ab8	getter of XGBoostModel	2016-03-14 07:26:51 -04:00
Nan Zhu	e3fa7753f5	Merge branch 'master' into master	2016-03-13 22:46:38 -04:00
CodingCat	6f92f1c117	update spark version to 1.6.1	2016-03-13 22:46:06 -04:00

78 Commits