xgboost

Author	SHA1	Message	Date
Anthony D'Amato	f58e41bad8	Fix deterministic partitioning with dataset containing Double.NaN (#5996 ) The functions featureValueOfSparseVector or featureValueOfDenseVector could return a Float.NaN if the input vectore was containing any missing values. This would make fail the partition key computation and most of the vectors would end up in the same partition. We fix this by avoid returning a NaN and simply use the row HashCode in this case. We added a test to ensure that the repartition is indeed now uniform on input dataset containing values by checking that the partitions size variance is below a certain threshold. Signed-off-by: Anthony D'Amato <anthony.damato@hotmail.fr>	2020-08-18 18:55:37 -07:00
Jiaming Yuan	f93f1c03fc	Rabit update. (#5978 ) * Remove parameter on JVM Packages.	2020-08-11 09:17:32 +08:00
Shaochen Shi	71197d1dfa	[jvm-packages] Fix wrong method name `setAllowZeroForMissingValue`. (#5740 ) * Allow non-zero for missing value when training. * Fix wrong method names. * Add a unit test * Move the getter/setter unit test to MissingValueHandlingSuite Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-08-01 17:16:42 -07:00
Bobby Wang	8943eb4314	[BLOCKING] [jvm-packages] add gpu_hist and enable gpu scheduling (#5171 ) * [jvm-packages] add gpu_hist tree method * change updater hist to grow_quantile_histmaker * add gpu scheduling * pass correct parameters to xgboost library * remove debug info * add use.cuda for pom * add CI for gpu_hist for jvm * add gpu unit tests * use gpu node to build jvm * use nvidia-docker * Add CLI interface to create_jni.py using argparse Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-07-26 21:53:24 -07:00
Bobby Wang	9f85e92602	[jvm-packages] update spark dependency to 3.0.0 (#5836 )	2020-07-12 20:58:30 -07:00
Zhang Zhang	1813804e36	Add new parameter singlePrecisionHistogram to xgboost4j-spark (#5811 ) Expose the existing 'singlePrecisionHistogram' param to the Spark layer.	2020-07-08 16:29:35 -07:00
Bobby Wang	ad826e913f	[jvm-packages]add feature size for LabelPoint and DataBatch (#5303 ) * fix type error * Validate number of features. * resolve comments * add feature size for LabelPoint and DataBatch * pass the feature size to native * move feature size validating tests into a separate suite * resolve comments Co-authored-by: fis <jm.yuan@outlook.com>	2020-04-07 16:49:52 -07:00
Nan Zhu	d7b45fbcaf	[jvm-packages] do not use multiple jobs to make checkpoints (#5082 ) * temp * temp * tep * address the comments * fix stylistic issues * fix * external checkpoint	2020-02-01 19:36:39 -08:00
Philip Hyunsu Cho	37fdfa03f8	[jvm-packages] Comply with scala style convention + fix broken unit test (#5134 ) * Fix scala style check * fix messed unit test	2019-12-18 17:26:58 -08:00
cpfarrell	bc9d88259f	[jvm-packages] Allow for bypassing spark missing value check (#4805 ) * Allow for bypassing spark missing value check * Update documentation for dealing with missing values in spark xgboost	2019-12-18 10:48:20 -08:00
Chen Qin	b29b8c2f34	[jvm-packages] update rabit, surface new changes to spark, add parity and failure tests (#4966 ) * [phase 1] expose sets of rabit configurations to spark layer * add back mutable import * disable ring_mincount till https://github.com/dmlc/rabit/pull/106d * Revert "disable ring_mincount till https://github.com/dmlc/rabit/pull/106d" This reverts commit 65e95a98e24f5eb53c6ba9ef9b2379524258984d. * apply latest rabit * fix build error * apply https://github.com/dmlc/xgboost/pull/4880 * downgrade cmake in rabit * point to rabit with DMLC_ROOT fix * relative path of rabit install prefix * split rabit parameters to another trait * misc * misc * Delete .classpath * Delete .classpath * Delete .classpath * Update XGBoostClassifier.scala * Update XGBoostRegressor.scala * Update GeneralParams.scala * Update GeneralParams.scala * Update GeneralParams.scala * Update GeneralParams.scala * Delete .classpath * Update RabitParams.scala * Update .gitignore * Update .gitignore * apply rabitParams to training * use string as rabit parameter value type * cleanup * add rabitEnv check * point to dmlc/rabit * per feedback * update private scope * misc * update rabit * add rabit_timtout, fix failing test. * split tests * allow build jvm with rabit mock * pass mock failures to rabit with test * add mock error and graceful handle rabit assertion error test * split mvn test * remove sign for test * update rabit * build jvm_packages with rabit mock * point back to dmlc/rabit * per feedback, update scala header * cleanup pom * per feedback * try fix lint * fix lint * per feedback, remove bootstrap_cache * per feedback 2 * try replace dev profile with passing mvn property * fix build error * remove mvn property and replace with env setting to build test jar * per feedback * revert copyright headlines, point to dmlc/rabit * revert python lint * remove multiple failure test case as retry is not enabled in spark * Update core.py * Update core.py * per feedback, style fix	2019-11-01 14:21:19 -07:00
Jiaming Yuan	010b8f1428	Revert "[jvm-packages] update rabit, surface new changes to spark, add parity and failure tests (#4876 )" (#4965 ) This reverts commit `86ed01c4bb`.	2019-10-18 14:02:35 -07:00
Chen Qin	86ed01c4bb	[jvm-packages] update rabit, surface new changes to spark, add parity and failure tests (#4876 ) * Expose sets of rabit configurations to spark layer	2019-10-18 15:07:31 -04:00
Liangcai Li	82ee2317e8	Add case for LongParam. (#4885 ) To support specifying long parameter as String, the same as other basic type, such as Int, Double ...	2019-09-25 05:41:53 -07:00
Nan Zhu	fc8c9b0521	[jvm-packages] enable deterministic repartitioning when checkpoint is enabled (#4807 ) * do reparititoning in DataUtil * keep previous behavior of partitioning without checkpoint * deterministic repartitioning * change	2019-09-19 15:21:05 -07:00
Xu Xiao	277e25797b	[jvm-packages] refine numAliveCores method of SparkParallelismTracker (#4858 ) * refine numAliveCores * refine XGBoostToMLlibParams * fix waitForCondition * resolve conflicts * Update SparkParallelismTracker.scala	2019-09-19 15:18:29 -07:00
Nan Zhu	0184eb5d02	[jvm-packages] Refactor XGBoost.scala to put all params processing in one place (#4815 ) * cleaning checkpoint file after a successful file * address comments * refactor xgboost.scala to avoid multiple changes when adding params * consolidate params * fix compilation issue * fix failed test * fix wrong name * tyep conversion	2019-08-28 22:41:05 -07:00
Nan Zhu	7b5cbcc846	[jvm-packages] cleaning checkpoint file after a successful training (#4754 ) * cleaning checkpoint file after a successful file * address comments	2019-08-14 10:57:47 -07:00
Philip Hyunsu Cho	d333918f5e	[jvm-packages] Expose setMissing method in XGBoostClassificationModel / XGBoostRegressionModel (#4643 )	2019-07-07 16:02:44 -07:00
Nan Zhu	abffbe014e	[jvm-packages] delete all constraints from spark layer about obj and eval metrics and handle error in jvm layer (#4560 ) * temp * prediction part * remove supported* * add for test * fix param name * add rabit * update rabit * return value of rabit init * eliminate compilation warnings * update rabit * shutdown * update rabit again * check sparkcontext shutdown * fix logic * sleep * fix tests * test with relaxed threshold * create new thread each time * stop for job quitting * udpate rabit * update rabit * update rabit * update git modules	2019-06-27 08:47:37 -07:00
Jiaming Yuan	2f1319f273	Add `rmsle` metric and `reg:squaredlogerror` objective (#4541 )	2019-06-11 05:48:27 +08:00
Jiaming Yuan	0ce300e73a	[jvm-packages] Add back reg:linear for scala. (#4490 ) * Add back reg:linear for scala. * Fix linter.	2019-05-23 15:02:08 -07:00
Shaochen Shi	18e4fc3690	[jvm-packages] Automatically set maximize_evaluation_metrics if not explicitly given in XGBoost4J-Spark (#4446 ) * Automatically set maximize_evaluation_metrics if not explicitly given. * When custom_eval is set, require maximize_evaluation_metrics. * Update documents on early stop in XGBoost4J-Spark. * Fix code error.	2019-05-09 12:49:44 -07:00
Nan Zhu	995698b0cb	[BREAKING][jvm-packages] fix the non-zero missing value handling (#4349 ) * fix the nan and non-zero missing value handling * fix nan handling part * add missing value * Update MissingValueHandlingSuite.scala * Update MissingValueHandlingSuite.scala * stylistic fix	2019-04-26 11:10:33 -07:00
Xu Xiao	2d875ec019	[BLOCKING][jvm-packages] fix non-deterministic order within a partition (in the case of an upstream shuffle) on prediction (#4388 ) * [jvm-packages][hot-fix] fix column mismatch caused by zip actions at XGBooostModel.transformInternal * apply minibatch in prediction * an iterator-compatible minibatch prediction * regressor impl * continuous working on mini-batch prediction of xgboost4j-spark * Update Booster.java	2019-04-26 11:09:20 -07:00
Nan Zhu	65db8d0626	[jvm-packages] support spark 2.4 and compatibility test with previous xgboost version (#4377 ) * bump spark version * keep float.nan * handle brokenly changed name/value * add test * add model files * add model files * update doc	2019-04-17 11:33:13 -07:00
Nan Zhu	ad4de0d718	[jvm-packages] handle NaN as missing value explicitly (#4309 ) * handle nan * handle nan explicitly * make code better and handle sparse vector in spark * Update XGBoostGeneralSuite.scala	2019-03-30 19:34:26 +08:00
Nan Zhu	45c89a6792	[jvm-packages] logging version number (#4271 ) * print version number * add property file	2019-03-21 18:24:29 +08:00
Nan Zhu	359ed9c5bc	[jvm-packages] add configuration flag to control whether to cache transformed training set (#4268 ) * control whether to cache data * uncache	2019-03-18 10:13:28 +08:00
Jiaming Yuan	29a1356669	Deprecate `reg:linear' in favor of` reg:squarederror'. (#4267 ) * Deprecate `reg:linear' in favor of `reg:squarederror'. * Replace the use of `reg:linear'. * Replace the use of `silent`.	2019-03-17 17:55:04 +08:00
Shaochen Shi	224786f67f	[xgboost4j-spark] Allow set the parameter "maxLeaves". (#4226 ) * Allow set the parameter "maxLeaves". * Add "setMaxLeaves" to XGBoostRegressor.	2019-03-07 18:36:47 -08:00
Rong Ou	d506a8bc63	[jvm-packages] add `verbosity` param (#4138 )	2019-02-13 20:57:17 -08:00
Rong Ou	9b917cda4f	[jvm-packages] fix simple logic error :) (#4128 ) @CodingCat	2019-02-11 21:47:30 -08:00
Nan Zhu	3320a52192	[jvm-packages] force use per-group weights in spark layer (#4118 )	2019-02-10 05:38:03 +08:00
Rong Ou	2a9b085bc8	[jvm-packages] minor fix of params (#4114 )	2019-02-08 00:21:59 -08:00
Nan Zhu	05243642bb	[jvm-packages] better fix for shutdown applications (#4108 ) * intentionally failed task * throw exception * more * stop sparkcontext directly * stop from another thread * new scope * use a new thread * daemon threads * don't join the killer thread * remove injected errors * add comments	2019-02-07 09:02:17 -08:00
Nan Zhu	325b16bccd	[jvm-packages] fix return type of setEvalSets (#4105 )	2019-02-06 11:00:29 -08:00
Nan Zhu	ae3bb9c2d5	Distributed Fast Histogram Algorithm (#4011 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * init * allow hist algo * more changes * temp * update * remove hist sync * udpate rabit * change hist size * change the histogram * update kfactor * sync per node stats * temp * update * final * code clean * update rabit * more cleanup * fix errors * fix failed tests * enforce c++11 * fix lint issue * broadcast subsampled feature correctly * revert some changes * fix lint issue * enable monotone and interaction constraints * don't specify default for monotone and interactions * update docs	2019-02-05 05:12:53 -08:00
Nan Zhu	0d0ce32908	[jvm-packages] adding logs for parameters (#4091 )	2019-01-30 21:50:55 -08:00
KyleLi1985	dade7c3aff	[jvm-packages] Performance consideration and Alignment input parameter of repartition function (#4049 )	2019-01-07 08:38:05 -08:00
Nan Zhu	e290ec9a80	[jvm-packages] fix safe execution (#4046 )	2019-01-05 19:45:37 -08:00
Nan Zhu	f368d0de2b	[jvm-packages] fix the scalability issue of prediction (#4033 )	2018-12-29 20:46:30 -08:00
Nan Zhu	c055a32609	[jvm-packages]support multiple validation datasets in Spark (#3910 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * wrap iterators * enable copartition training and validationset * add parameters * converge code path and have init unit test * enable multi evals for ranking * unit test and doc * update example * fix early stopping * address the offline comments * udpate doc * test eval metrics * fix compilation issue * fix example	2018-12-17 21:03:57 -08:00
Huafeng Wang	42cac4a30b	[jvm-packages] Fix vector size of 'rawPredictionCol' in XGBoostClassificationModel (#3932 ) * Fix vector size of 'rawPredictionCol' in XGBoostClassificationModel * Fix UT	2018-11-23 21:09:43 -08:00
Nan Zhu	aa48b7e903	[jvm-packages][refactor] refactor XGBoost.scala (spark) (#3904 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * wrap iterators * remove unused code * refactor * fix typo	2018-11-15 20:38:28 -08:00
Nan Zhu	4ae225a08d	[Blocking][jvm-packages] fix the early stopping feature (#3808 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * temp * add method for classifier and regressor * update tutorial * address the comments * update	2018-10-23 14:53:13 -07:00
weitian	9504f411c1	[jvm-packages] For training data with group, empty RDD partition threw exception (#3749 ) (#3750 )	2018-10-09 09:03:22 -07:00
Nan Zhu	785094db53	[jvm-packages] fix issue when spark job execution thread cannot return before we execute first() (#3758 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * sparjJobThread * update * fix issue when spark job execution thread cannot return before we execute first()	2018-10-05 22:20:50 -07:00
weitian	efc4f85505	[jvm-packages] Fix #3489 : Spark repartitionForData can potentially shuffle all data and lose ordering required for ranking objectives (#3654 )	2018-10-03 08:43:55 -07:00
Sergei Lebedev	87aca8c244	[jvm-packages] Fixed the distributed updater check (#3739 ) The updater used in distributed training is grow_histmaker and not grow_colmaker as the error message stated prior to this commit.	2018-10-01 11:22:01 -07:00

1 2 3 4 5

205 Commits