xgboost

Author	SHA1	Message	Date
Nan Zhu	65db8d0626	[jvm-packages] support spark 2.4 and compatibility test with previous xgboost version (#4377 ) * bump spark version * keep float.nan * handle brokenly changed name/value * add test * add model files * add model files * update doc	2019-04-17 11:33:13 -07:00
Jiaming Yuan	207f058711	Refactor CMake scripts. (#4323 ) * Refactor CMake scripts. * Remove CMake CUDA wrapper. * Bump CMake version for CUDA. * Use CMake to handle Doxygen. * Split up CMakeList. * Export install target. * Use modern CMake. * Remove build.sh * Workaround for gpu_hist test. * Use cmake 3.12. * Revert machine.conf. * Move CLI test to gpu. * Small cleanup. * Support using XGBoost as submodule. * Fix windows * Fix cpp tests on Windows * Remove duplicated find_package.	2019-04-15 10:08:12 -07:00
Adam Pocock	a448a8320c	[jvm-packages] Fixing the NativeLibLoader on Java 9+ (#4351 ) The old NativeLibLoader had a short-circuit load path which modified java.library.path and attempted to load the xgboost library from outside the jar first, falling back to loading the library from inside the jar. This path is a no-op every time when using XGBoost outside of its source tree. Additionally it triggers an illegal reflective access warning in the module system in 9, 10, and 11. On Java 12 the ClassLoader fields are not accessible via reflection (separately from the illegal reflective access warning), and so it fails in a way that isn't caught by the code which falls back to loading the library from inside the jar. This commit removes that code path and always loads the xgboost library from inside the jar file as it's a valid technique across multiple JVM implementations and works with all versions of Java.	2019-04-10 12:41:44 -07:00
Xu Xiao	60a9af567c	[jvm-packages] Add methods operating attributes of booster in jvm package, which follow API design in python package. (#4336 )	2019-04-08 11:00:35 -07:00
Nan Zhu	ad4de0d718	[jvm-packages] handle NaN as missing value explicitly (#4309 ) * handle nan * handle nan explicitly * make code better and handle sparse vector in spark * Update XGBoostGeneralSuite.scala	2019-03-30 19:34:26 +08:00
Rong Ou	7ea5b772fb	do not filter shared library files (#4303 )	2019-03-28 19:40:54 +08:00
Rong Ou	8c8021dfa7	use all cores to build on linux (#4304 )	2019-03-27 19:51:08 -07:00
Harry Braviner	b374e0a7ab	[jvm-packages] Allow suppression of Rabit output in Booster::train in xgboost4j (#4262 ) * Make train in xgboost4j respect print params Previously no setting in params argument of Booster::train would prevent the Rabit.trackerPrint call. This can fill up a lot of screen space in the case that many folds are being trained. * Setting "silent" in this map to "true", "True", a non-zero integer, or a string that can be parsed to such an int will prevent printing. * Setting "verbose_eval" to "False" or "false" will prevent printing. * Setting "verbose_eval" to an int (or a String parseable to an int) n will result in printing every n steps, or no printing if n is zero. This is to match the python behaviour described here: https://www.kaggle.com/c/rossmann-store-sales/discussion/17499 * Fixed 'slient' typo in xgboost4j test * private access on two methods	2019-03-21 18:25:12 +08:00
Nan Zhu	45c89a6792	[jvm-packages] logging version number (#4271 ) * print version number * add property file	2019-03-21 18:24:29 +08:00
Nan Zhu	359ed9c5bc	[jvm-packages] add configuration flag to control whether to cache transformed training set (#4268 ) * control whether to cache data * uncache	2019-03-18 10:13:28 +08:00
Jiaming Yuan	29a1356669	Deprecate `reg:linear' in favor of `reg:squarederror'. (#4267 ) * Deprecate `reg:linear' in favor of `reg:squarederror'. * Replace the use of `reg:linear'. * Replace the use of `silent`.	2019-03-17 17:55:04 +08:00
Shaochen Shi	224786f67f	[xgboost4j-spark] Allow set the parameter "maxLeaves". (#4226 ) * Allow set the parameter "maxLeaves". * Add "setMaxLeaves" to XGBoostRegressor.	2019-03-07 18:36:47 -08:00
Christopher Suchanek	ac3d03089b	[jvm-packages] remove shutdown of handler shutdown (#4224 )	2019-03-06 19:32:43 -08:00
Nan Zhu	5f34078fba	[jvm-packages] bump version for master (#4209 ) * update version * bump version	2019-03-04 23:12:24 -08:00
Yanbo Liang	9fefa2128d	[jvm-packages] Fix early stop with xgboost4j-spark (#4176 ) * Fix early stop with xgboost4j-spark * Update XGBoost.java * Update XGBoost.java * Update XGBoost.java To use -Float.MAX_VALUE as the lower bound, in case there is positive metric. * Only update best score if the current score is better (no update when equal) * Update xgboost-spark tutorial to fix early stopping docs.	2019-03-01 13:02:57 -08:00
Nan Zhu	1b7405f688	[jvm-packages] fix comments in objectiveTrait (#4174 )	2019-02-22 00:32:13 -08:00
Nan Zhu	dc2add96c5	[jvm-packages] upgrade spark version (#4170 )	2019-02-21 11:51:36 -08:00
Rong Ou	d506a8bc63	[jvm-packages] add `verbosity` param (#4138 )	2019-02-13 20:57:17 -08:00
Nan Zhu	c18a3660fa	Separate Depthwidth and Lossguide growing policy in fast histogram (#4102 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * init * more changes * temp * update * udpate rabit * change the histogram * update kfactor * sync per node stats * temp * update * final * code clean * update rabit * more cleanup * fix errors * fix failed tests * enforce c++11 * broadcast subsampled feature correctly * init col * temp * col sampling * fix histmastrix init * fix col sampling * remove cout * fix out of bound access * fix core dump remove core dump file * disbale test temporarily * update * add fid * print perf data * update * revert some changes * temp * temp * pass all tests * bring back some tests * recover some changes * fix lint issue * enable monotone and interaction constraints * don't specify default for monotone and interactions * recover column init part * more recovery * fix core dumps * code clean * revert some changes * fix test compilation issue * fix lint issue * resolve compilation issue * fix issues of lint caused by rebase * fix stylistic changes and change variable names * use regtree internal function * modularize depth width * address the comments * fix failed tests * wrap perf timers with class * fix lint * fix num_leaves count * fix indention * Update src/tree/updater_quantile_hist.cc Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.h Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.cc Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.cc Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.cc Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.h Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * merge * fix compilation	2019-02-13 12:56:19 -08:00
Rong Ou	9b917cda4f	[jvm-packages] fix simple logic error :) (#4128 ) @CodingCat	2019-02-11 21:47:30 -08:00
Nan Zhu	3320a52192	[jvm-packages] force use per-group weights in spark layer (#4118 )	2019-02-10 05:38:03 +08:00
Rong Ou	2a9b085bc8	[jvm-packages] minor fix of params (#4114 )	2019-02-08 00:21:59 -08:00
Nan Zhu	05243642bb	[jvm-packages] better fix for shutdown applications (#4108 ) * intentionally failed task * throw exception * more * stop sparkcontext directly * stop from another thread * new scope * use a new thread * daemon threads * don't join the killer thread * remove injected errors * add comments	2019-02-07 09:02:17 -08:00
Nan Zhu	325b16bccd	[jvm-packages] fix return type of setEvalSets (#4105 )	2019-02-06 11:00:29 -08:00
Nan Zhu	ae3bb9c2d5	Distributed Fast Histogram Algorithm (#4011 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * init * allow hist algo * more changes * temp * update * remove hist sync * udpate rabit * change hist size * change the histogram * update kfactor * sync per node stats * temp * update * final * code clean * update rabit * more cleanup * fix errors * fix failed tests * enforce c++11 * fix lint issue * broadcast subsampled feature correctly * revert some changes * fix lint issue * enable monotone and interaction constraints * don't specify default for monotone and interactions * update docs	2019-02-05 05:12:53 -08:00
Nan Zhu	0d0ce32908	[jvm-packages] adding logs for parameters (#4091 )	2019-01-30 21:50:55 -08:00
Jiaming Yuan	301cef4638	Correct JVM CMake GPU flag. (#4071 )	2019-01-21 20:36:38 +08:00
KyleLi1985	dade7c3aff	[jvm-packages] Performance consideration and Alignment input parameter of repartition function (#4049 )	2019-01-07 08:38:05 -08:00
Nan Zhu	773ddbcfcb	[BLOCKING] fix the issue with infrequent feature (#4045 ) * fix the issue with infrequent feature * handle exception * use only 2 workers * address the comments	2019-01-06 16:01:03 -08:00
Nan Zhu	e290ec9a80	[jvm-packages] fix safe execution (#4046 )	2019-01-05 19:45:37 -08:00
Shayak Banerjee	431c850c03	[jvm-packages] Updates to Java Booster to support other feature importance measures (#3801 ) * Updates to Booster to support other feature importances * Add returns for Java methods * Pass Scala style checks * Pass Java style checks * Fix indents * Use class instead of enum * Return map string double * A no longer broken build, thanks to mvn package local build * Add a unit test to increase code coverage back * Address code review on main code * Add more unit tests for different feature importance scores * Address more CR	2019-01-02 01:13:14 -08:00
Nan Zhu	f368d0de2b	[jvm-packages] fix the scalability issue of prediction (#4033 )	2018-12-29 20:46:30 -08:00
Nan Zhu	c055a32609	[jvm-packages]support multiple validation datasets in Spark (#3910 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * wrap iterators * enable copartition training and validationset * add parameters * converge code path and have init unit test * enable multi evals for ranking * unit test and doc * update example * fix early stopping * address the offline comments * udpate doc * test eval metrics * fix compilation issue * fix example	2018-12-17 21:03:57 -08:00
Nan Zhu	9c4ff50e83	[jvm-packages]Fix early stopping condition (#3928 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * update version * 0.82 * fix early stopping condition * remove unused * update comments * udpate comments * update test	2018-11-24 00:18:07 -08:00
Huafeng Wang	42cac4a30b	[jvm-packages] Fix vector size of 'rawPredictionCol' in XGBoostClassificationModel (#3932 ) * Fix vector size of 'rawPredictionCol' in XGBoostClassificationModel * Fix UT	2018-11-23 21:09:43 -08:00
Philip Hyunsu Cho	86aac98e54	[jvm-packages] Fix #3898 : use correct group ID for maven-site-plugin (#3937 )	2018-11-23 09:46:27 -08:00
Nan Zhu	dc2bfbfde1	[jvm-packages] update version to 0.82-SNAPSHOT (#3920 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * update version * 0.82	2018-11-18 16:47:48 -08:00
Nan Zhu	aa48b7e903	[jvm-packages][refactor] refactor XGBoost.scala (spark) (#3904 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * wrap iterators * remove unused code * refactor * fix typo	2018-11-15 20:38:28 -08:00
ajing	0ddb8a7661	Update README.md (#3872 ) SparkWithDataFrame was not there anymore. So replace with SparkMLlibPipeline.scala	2018-11-12 11:03:13 -08:00
Philip Hyunsu Cho	78ec77fa97	Release 0.81 version (#3864 ) * Release 0.81 version * Update NEWS.md	2018-11-04 05:49:11 -08:00
Philip Hyunsu Cho	2febc105a4	[jvm-packages] Fix JVM doc build (#3853 ) To get around of the bug https://issues.apache.org/jira/browse/SUREFIRE-1588, set useSystemClassLoader=false.	2018-11-01 15:16:08 -07:00
Matthew Tovbin	d81fedb955	[jvm-packages] RabitTracker for Scala: allow specifying host ip from the xgboost-tracker.properties file (#3833 )	2018-10-26 22:01:36 -07:00
Nan Zhu	4ae225a08d	[Blocking][jvm-packages] fix the early stopping feature (#3808 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * temp * add method for classifier and regressor * update tutorial * address the comments * update	2018-10-23 14:53:13 -07:00
Philip Hyunsu Cho	e26b5d63b2	[jvm-packages] Upgrade Scala to 2.11.12 to address CVE-2017-15288 (#3816 ) A privilege escalation vulnerability (CVE-2017-15288) has been identified in the Scala compilation daemon. See https://nvd.nist.gov/vuln/detail/CVE-2017-15288 Fix: Upgrade Scala to 2.11.12.	2018-10-22 10:15:30 -07:00
weitian	9504f411c1	[jvm-packages] For training data with group, empty RDD partition threw exception (#3749 ) (#3750 )	2018-10-09 09:03:22 -07:00
Nan Zhu	785094db53	[jvm-packages] fix issue when spark job execution thread cannot return before we execute first() (#3758 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * sparjJobThread * update * fix issue when spark job execution thread cannot return before we execute first()	2018-10-05 22:20:50 -07:00
zengxy	9e73087324	[jvm-packages] support specified feature names when getModelDump and getFeatureScore (#3733 ) * [jvm-packages] support specified feature names for jvm when get ModelDump and get FeatureScore (#3725) * typo and style fix	2018-10-04 09:05:42 -07:00
weitian	efc4f85505	[jvm-packages] Fix #3489 : Spark repartitionForData can potentially shuffle all data and lose ordering required for ranking objectives (#3654 )	2018-10-03 08:43:55 -07:00
Sergei Lebedev	87aca8c244	[jvm-packages] Fixed the distributed updater check (#3739 ) The updater used in distributed training is grow_histmaker and not grow_colmaker as the error message stated prior to this commit.	2018-10-01 11:22:01 -07:00
Nan Zhu	79d854c695	[jvm-packages] fix errors in example (#3719 ) * add back train method but mark as deprecated * fix scalastyle error * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * instrumentation * use log console * better measurement * fix erros in example * update histmaker	2018-09-22 16:39:38 -07:00

472 Commits