xgboost

Author	SHA1	Message	Date
Chen Qin	b29b8c2f34	[jvm-packages] update rabit, surface new changes to spark, add parity and failure tests (#4966 ) * [phase 1] expose sets of rabit configurations to spark layer * add back mutable import * disable ring_mincount till https://github.com/dmlc/rabit/pull/106d * Revert "disable ring_mincount till https://github.com/dmlc/rabit/pull/106d" This reverts commit 65e95a98e24f5eb53c6ba9ef9b2379524258984d. * apply latest rabit * fix build error * apply https://github.com/dmlc/xgboost/pull/4880 * downgrade cmake in rabit * point to rabit with DMLC_ROOT fix * relative path of rabit install prefix * split rabit parameters to another trait * misc * misc * Delete .classpath * Delete .classpath * Delete .classpath * Update XGBoostClassifier.scala * Update XGBoostRegressor.scala * Update GeneralParams.scala * Update GeneralParams.scala * Update GeneralParams.scala * Update GeneralParams.scala * Delete .classpath * Update RabitParams.scala * Update .gitignore * Update .gitignore * apply rabitParams to training * use string as rabit parameter value type * cleanup * add rabitEnv check * point to dmlc/rabit * per feedback * update private scope * misc * update rabit * add rabit_timtout, fix failing test. 
* split tests * allow build jvm with rabit mock * pass mock failures to rabit with test * add mock error and graceful handle rabit assertion error test * split mvn test * remove sign for test * update rabit * build jvm_packages with rabit mock * point back to dmlc/rabit * per feedback, update scala header * cleanup pom * per feedback * try fix lint * fix lint * per feedback, remove bootstrap_cache * per feedback 2 * try replace dev profile with passing mvn property * fix build error * remove mvn property and replace with env setting to build test jar * per feedback * revert copyright headlines, point to dmlc/rabit * revert python lint * remove multiple failure test case as retry is not enabled in spark * Update core.py * Update core.py * per feedback, style fix	2019-11-01 14:21:19 -07:00
Jiaming Yuan	010b8f1428	Revert "[jvm-packages] update rabit, surface new changes to spark, add parity and failure tests (#4876 )" (#4965 ) This reverts commit 86ed01c4bbecef66e1bc4d02fb13116bd6130fae.	2019-10-18 14:02:35 -07:00
Chen Qin	86ed01c4bb	[jvm-packages] update rabit, surface new changes to spark, add parity and failure tests (#4876 ) * Expose sets of rabit configurations to spark layer	2019-10-18 15:07:31 -04:00
Liangcai Li	82ee2317e8	Add case for LongParam. (#4885 ) To support specifying long parameter as String, the same as other basic type, such as Int, Double ...	2019-09-25 05:41:53 -07:00
Nan Zhu	fc8c9b0521	[jvm-packages] enable deterministic repartitioning when checkpoint is enabled (#4807 ) * do reparititoning in DataUtil * keep previous behavior of partitioning without checkpoint * deterministic repartitioning * change	2019-09-19 15:21:05 -07:00
Xu Xiao	277e25797b	[jvm-packages] refine numAliveCores method of SparkParallelismTracker (#4858 ) * refine numAliveCores * refine XGBoostToMLlibParams * fix waitForCondition * resolve conflicts * Update SparkParallelismTracker.scala	2019-09-19 15:18:29 -07:00
Nan Zhu	0184eb5d02	[jvm-packages] Refactor XGBoost.scala to put all params processing in one place (#4815 ) * cleaning checkpoint file after a successful file * address comments * refactor xgboost.scala to avoid multiple changes when adding params * consolidate params * fix compilation issue * fix failed test * fix wrong name * tyep conversion	2019-08-28 22:41:05 -07:00
Nan Zhu	7b5cbcc846	[jvm-packages] cleaning checkpoint file after a successful training (#4754 ) * cleaning checkpoint file after a successful file * address comments	2019-08-14 10:57:47 -07:00
Philip Hyunsu Cho	d333918f5e	[jvm-packages] Expose setMissing method in XGBoostClassificationModel / XGBoostRegressionModel (#4643 )	2019-07-07 16:02:44 -07:00
Nan Zhu	abffbe014e	[jvm-packages] delete all constraints from spark layer about obj and eval metrics and handle error in jvm layer (#4560 ) * temp * prediction part * remove supported* * add for test * fix param name * add rabit * update rabit * return value of rabit init * eliminate compilation warnings * update rabit * shutdown * update rabit again * check sparkcontext shutdown * fix logic * sleep * fix tests * test with relaxed threshold * create new thread each time * stop for job quitting * udpate rabit * update rabit * update rabit * update git modules	2019-06-27 08:47:37 -07:00
Jiaming Yuan	2f1319f273	Add `rmsle` metric and `reg:squaredlogerror` objective (#4541 )	2019-06-11 05:48:27 +08:00
Jiaming Yuan	0ce300e73a	[jvm-packages] Add back reg:linear for scala. (#4490 ) * Add back reg:linear for scala. * Fix linter.	2019-05-23 15:02:08 -07:00
Shaochen Shi	18e4fc3690	[jvm-packages] Automatically set maximize_evaluation_metrics if not explicitly given in XGBoost4J-Spark (#4446 ) * Automatically set maximize_evaluation_metrics if not explicitly given. * When custom_eval is set, require maximize_evaluation_metrics. * Update documents on early stop in XGBoost4J-Spark. * Fix code error.	2019-05-09 12:49:44 -07:00
Nan Zhu	995698b0cb	[BREAKING][jvm-packages] fix the non-zero missing value handling (#4349 ) * fix the nan and non-zero missing value handling * fix nan handling part * add missing value * Update MissingValueHandlingSuite.scala * Update MissingValueHandlingSuite.scala * stylistic fix	2019-04-26 11:10:33 -07:00
Xu Xiao	2d875ec019	[BLOCKING][jvm-packages] fix non-deterministic order within a partition (in the case of an upstream shuffle) on prediction (#4388 ) * [jvm-packages][hot-fix] fix column mismatch caused by zip actions at XGBooostModel.transformInternal * apply minibatch in prediction * an iterator-compatible minibatch prediction * regressor impl * continuous working on mini-batch prediction of xgboost4j-spark * Update Booster.java	2019-04-26 11:09:20 -07:00
Nan Zhu	65db8d0626	[jvm-packages] support spark 2.4 and compatibility test with previous xgboost version (#4377 ) * bump spark version * keep float.nan * handle brokenly changed name/value * add test * add model files * add model files * update doc	2019-04-17 11:33:13 -07:00
Nan Zhu	ad4de0d718	[jvm-packages] handle NaN as missing value explicitly (#4309 ) * handle nan * handle nan explicitly * make code better and handle sparse vector in spark * Update XGBoostGeneralSuite.scala	2019-03-30 19:34:26 +08:00
Nan Zhu	45c89a6792	[jvm-packages] logging version number (#4271 ) * print version number * add property file	2019-03-21 18:24:29 +08:00
Nan Zhu	359ed9c5bc	[jvm-packages] add configuration flag to control whether to cache transformed training set (#4268 ) * control whether to cache data * uncache	2019-03-18 10:13:28 +08:00
Jiaming Yuan	29a1356669	Deprecate `reg:linear' in favor of `reg:squarederror'. (#4267 ) * Deprecate `reg:linear' in favor of `reg:squarederror'. * Replace the use of `reg:linear'. * Replace the use of `silent`.	2019-03-17 17:55:04 +08:00
Shaochen Shi	224786f67f	[xgboost4j-spark] Allow set the parameter "maxLeaves". (#4226 ) * Allow set the parameter "maxLeaves". * Add "setMaxLeaves" to XGBoostRegressor.	2019-03-07 18:36:47 -08:00
Rong Ou	d506a8bc63	[jvm-packages] add `verbosity` param (#4138 )	2019-02-13 20:57:17 -08:00
Rong Ou	9b917cda4f	[jvm-packages] fix simple logic error :) (#4128 ) @CodingCat	2019-02-11 21:47:30 -08:00
Nan Zhu	3320a52192	[jvm-packages] force use per-group weights in spark layer (#4118 )	2019-02-10 05:38:03 +08:00
Rong Ou	2a9b085bc8	[jvm-packages] minor fix of params (#4114 )	2019-02-08 00:21:59 -08:00
Nan Zhu	05243642bb	[jvm-packages] better fix for shutdown applications (#4108 ) * intentionally failed task * throw exception * more * stop sparkcontext directly * stop from another thread * new scope * use a new thread * daemon threads * don't join the killer thread * remove injected errors * add comments	2019-02-07 09:02:17 -08:00
Nan Zhu	325b16bccd	[jvm-packages] fix return type of setEvalSets (#4105 )	2019-02-06 11:00:29 -08:00
Nan Zhu	ae3bb9c2d5	Distributed Fast Histogram Algorithm (#4011 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * init * allow hist algo * more changes * temp * update * remove hist sync * udpate rabit * change hist size * change the histogram * update kfactor * sync per node stats * temp * update * final * code clean * update rabit * more cleanup * fix errors * fix failed tests * enforce c++11 * fix lint issue * broadcast subsampled feature correctly * revert some changes * fix lint issue * enable monotone and interaction constraints * don't specify default for monotone and interactions * update docs	2019-02-05 05:12:53 -08:00
Nan Zhu	0d0ce32908	[jvm-packages] adding logs for parameters (#4091 )	2019-01-30 21:50:55 -08:00
KyleLi1985	dade7c3aff	[jvm-packages] Performance consideration and Alignment input parameter of repartition function (#4049 )	2019-01-07 08:38:05 -08:00
Nan Zhu	e290ec9a80	[jvm-packages] fix safe execution (#4046 )	2019-01-05 19:45:37 -08:00
Nan Zhu	f368d0de2b	[jvm-packages] fix the scalability issue of prediction (#4033 )	2018-12-29 20:46:30 -08:00
Nan Zhu	c055a32609	[jvm-packages]support multiple validation datasets in Spark (#3910 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * wrap iterators * enable copartition training and validationset * add parameters * converge code path and have init unit test * enable multi evals for ranking * unit test and doc * update example * fix early stopping * address the offline comments * udpate doc * test eval metrics * fix compilation issue * fix example	2018-12-17 21:03:57 -08:00
Huafeng Wang	42cac4a30b	[jvm-packages] Fix vector size of 'rawPredictionCol' in XGBoostClassificationModel (#3932 ) * Fix vector size of 'rawPredictionCol' in XGBoostClassificationModel * Fix UT	2018-11-23 21:09:43 -08:00
Nan Zhu	aa48b7e903	[jvm-packages][refactor] refactor XGBoost.scala (spark) (#3904 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * wrap iterators * remove unused code * refactor * fix typo	2018-11-15 20:38:28 -08:00
Nan Zhu	4ae225a08d	[Blocking][jvm-packages] fix the early stopping feature (#3808 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * temp * add method for classifier and regressor * update tutorial * address the comments * update	2018-10-23 14:53:13 -07:00
weitian	9504f411c1	[jvm-packages] For training data with group, empty RDD partition threw exception (#3749 ) (#3750 )	2018-10-09 09:03:22 -07:00
Nan Zhu	785094db53	[jvm-packages] fix issue when spark job execution thread cannot return before we execute first() (#3758 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * sparjJobThread * update * fix issue when spark job execution thread cannot return before we execute first()	2018-10-05 22:20:50 -07:00
weitian	efc4f85505	[jvm-packages] Fix #3489 : Spark repartitionForData can potentially shuffle all data and lose ordering required for ranking objectives (#3654 )	2018-10-03 08:43:55 -07:00
Sergei Lebedev	87aca8c244	[jvm-packages] Fixed the distributed updater check (#3739 ) The updater used in distributed training is grow_histmaker and not grow_colmaker as the error message stated prior to this commit.	2018-10-01 11:22:01 -07:00
Michael Mui	20a9e716bd	[jvm-packages] Fix "obj_type" error to enable custom objectives and evaluations (#3646 ) credits to @mmui	2018-09-14 12:06:33 -07:00
Jerry Lin	9acd549dc7	[jvm-packages] Add rank:ndcg and rank:map to Spark supported objectives (#3697 )	2018-09-13 09:51:24 -07:00
Joseph Bradley	14a8b96476	[jvm-packages] xgboost-spark warning when Spark encryption is turned on (#3667 ) * added test, commented out right now * reinstated test * added fix for checking encryption settings * fix by using RDD conf * fix compilation * renamed conf * use SparkSession if available * fix message * nop * code review fixes	2018-09-10 14:21:01 -07:00
Matthew Tovbin	beab6e08dd	Remove println in jsonDecode (#3665 ) Following issue #3578	2018-09-07 15:47:26 -07:00
Nan Zhu	3261002099	[jvm-packages] throw ControlThrowable instead of InterruptedException (#3632 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * interrupted exception is not rethrown	2018-08-25 20:30:21 -07:00
Nan Zhu	4912c1f9c6	[jvm-packages] fix checkpoint save/load (#3614 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix update checkpoint func	2018-08-21 12:34:24 -07:00
Matthew Tovbin	b53a5a262c	[jvm-packages] getTreeLimit return type should be Int	2018-08-17 09:36:00 -07:00
Matthew Tovbin	2b7a1c5780	[jvm-packages] Avoid losing precision when computing probabilities by converting to Double early (#3576 )	2018-08-13 14:05:07 -07:00
Matthew Tovbin	ce0f0568a6	Make sure 'thresholds' are considered when executing predict method (#3577 )	2018-08-13 14:04:47 -07:00
Mathew	06ef4db4cc	Fix Spark 2.2 Support (Amending #3062 ) (#3325 ) This pull request amends the broken #3062 allow Spark 2.2 to work. Please note this won't work in Spark <=2.1 as sc.removeSparkListener was implemented in Spark 2.2. (So perhaps a more general method is better, although that is what was attempted in #3062) This PR fixes: #3208, #3151 and the discussion in #1927. I do find it strange that #3062 dose not work in Spark 2.2, it's probably due to some sort of public/private issue in the org.apache.spark.scheduler.LiveListenerBus class inheritance (In Spark itself). The error is: `java.lang.NoSuchMethodError: org.apache.spark.scheduler.LiveListenerBus.removeListener(Ljava/lang/Object;)V`	2018-08-12 18:35:20 -07:00

145 Commits