xgboost

Author	SHA1	Message	Date
Jiaming Yuan	f2b8cd2922	Add number of columns to native data iterator. (#5202 ) * Change native data iter into an adapter.	2020-02-25 23:42:01 +08:00
Jiaming Yuan	9f77c18b0d	Add JVM_CHECK_CALL. (#5199 ) * Added a check call macro in jvm package, prevents executing other functions from jvm when error occurred in XGBoost. For example, when prediction fails jvm should not try to allocate memory based on the output prediction size.	2020-02-18 11:10:55 +08:00
Nan Zhu	d7b45fbcaf	[jvm-packages] do not use multiple jobs to make checkpoints (#5082 ) * temp * temp * tep * address the comments * fix stylistic issues * fix * external checkpoint	2020-02-01 19:36:39 -08:00
Kodi Arfer	f100b8d878	[Breaking] Don't drop trees during DART prediction by default (#5115 ) * Simplify DropTrees calling logic * Add `training` parameter for prediction method. * [Breaking]: Add `training` to C API. * Change for R and Python custom objective. * Correct comment. Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2020-01-13 21:48:30 +08:00
Jiaming Yuan	7b65698187	Enforce correct data shape. (#5191 ) * Fix syncing DMatrix columns. * notes for tree method. * Enable feature validation for all interfaces except for jvm. * Better tests for boosting from predictions. * Disable validation on JVM.	2020-01-13 15:48:17 +08:00
Chen Qin	b29b8c2f34	[jvm-packages] update rabit, surface new changes to spark, add parity and failure tests (#4966 ) * [phase 1] expose sets of rabit configurations to spark layer * add back mutable import * disable ring_mincount till https://github.com/dmlc/rabit/pull/106d * Revert "disable ring_mincount till https://github.com/dmlc/rabit/pull/106d" This reverts commit 65e95a98e24f5eb53c6ba9ef9b2379524258984d. * apply latest rabit * fix build error * apply https://github.com/dmlc/xgboost/pull/4880 * downgrade cmake in rabit * point to rabit with DMLC_ROOT fix * relative path of rabit install prefix * split rabit parameters to another trait * misc * misc * Delete .classpath * Delete .classpath * Delete .classpath * Update XGBoostClassifier.scala * Update XGBoostRegressor.scala * Update GeneralParams.scala * Update GeneralParams.scala * Update GeneralParams.scala * Update GeneralParams.scala * Delete .classpath * Update RabitParams.scala * Update .gitignore * Update .gitignore * apply rabitParams to training * use string as rabit parameter value type * cleanup * add rabitEnv check * point to dmlc/rabit * per feedback * update private scope * misc * update rabit * add rabit_timtout, fix failing test. * split tests * allow build jvm with rabit mock * pass mock failures to rabit with test * add mock error and graceful handle rabit assertion error test * split mvn test * remove sign for test * update rabit * build jvm_packages with rabit mock * point back to dmlc/rabit * per feedback, update scala header * cleanup pom * per feedback * try fix lint * fix lint * per feedback, remove bootstrap_cache * per feedback 2 * try replace dev profile with passing mvn property * fix build error * remove mvn property and replace with env setting to build test jar * per feedback * revert copyright headlines, point to dmlc/rabit * revert python lint * remove multiple failure test case as retry is not enabled in spark * Update core.py * Update core.py * per feedback, style fix	2019-11-01 14:21:19 -07:00
Jiaming Yuan	010b8f1428	Revert "[jvm-packages] update rabit, surface new changes to spark, add parity and failure tests (#4876 )" (#4965 ) This reverts commit `86ed01c4bb`.	2019-10-18 14:02:35 -07:00
Chen Qin	86ed01c4bb	[jvm-packages] update rabit, surface new changes to spark, add parity and failure tests (#4876 ) * Expose sets of rabit configurations to spark layer	2019-10-18 15:07:31 -04:00
Jiaming Yuan	31030a8d3a	Set correct file permission. (#4964 )	2019-10-18 12:54:29 -04:00
Honza Sterba	22209b7b95	[jvm-packages] Add BigDenseMatrix (#4383 ) * Add BigDenseMatrix * ability to create DMatrix with bigger than Integer.MAX_VALUE size arrays * uses sun.misc.Unsafe * make DMatrix test work from a jar as well	2019-09-18 20:46:14 -07:00
Jiaming Yuan	d669ea1eaa	Deprecate set group (#4864 ) * Convert jvm package and R package. * Restore for compatibility.	2019-09-17 21:26:54 -04:00
Stephanie Yang	0fc7dcfe6c	Add public group getter for java and scala (#4838 ) * Add public group getter for java and scala * Remove unnecessary param from javadoc * Fix typo * Fix another typo * Add semicolon * Fix javadoc return statement * Fix missing return statement * Add a unit test	2019-09-09 10:07:48 -07:00
Oleksandr Pryimak	b68de018b8	[jvm-packages] jvm test should clean up after themselfs (#4706 )	2019-08-04 14:09:11 -07:00
koertkuipers	3c506b076e	[jvm-packages] upgrade to Scala 2.12 (#4574 ) * bump scala to 2.12 which requires java 8 and also newer flink and akka * put scala version in artifactId * fix appveyor * fix for scaladoc issue that looks like https://github.com/scala/bug/issues/10509 * fix ci_build * update versions in generate_pom.py * fix generate_pom.py * apache does not have a download for spark 2.4.3 distro using scala 2.12 yet, so for now i use a tgz i put on s3 * Upload spark-2.4.3-bin-scala2.12-hadoop2.7.tgz to our own S3 * Update Dockerfile.jvm_cross * Update Dockerfile.jvm_cross	2019-07-16 08:43:34 -07:00
Nan Zhu	abffbe014e	[jvm-packages] delete all constraints from spark layer about obj and eval metrics and handle error in jvm layer (#4560 ) * temp * prediction part * remove supported* * add for test * fix param name * add rabit * update rabit * return value of rabit init * eliminate compilation warnings * update rabit * shutdown * update rabit again * check sparkcontext shutdown * fix logic * sleep * fix tests * test with relaxed threshold * create new thread each time * stop for job quitting * udpate rabit * update rabit * update rabit * update git modules	2019-06-27 08:47:37 -07:00
Nan Zhu	fe2de6f415	[jvm-packages]fix silly bug in feature scoring (#4604 )	2019-06-25 20:49:01 -07:00
Bryan Woods	278562db13	Add support for cross-validation using query ID (#4474 ) * adding support for matrix slicing with query ID for cross-validation * hail mary test of unrar installation for windows tests * trying to modify tests to run in Github CI * Remove dependency on wget and unrar * Save error log from R test * Relax assertion in test_training * Use int instead of bool in C function interface * Revise R interface * Add XGDMatrixSliceDMatrixEx and keep old XGDMatrixSliceDMatrix for API compatibility	2019-05-23 10:45:02 -07:00
Nan Zhu	37dc82c3ff	[jvm-packages] allow partial evaluation of dataframe before prediction (#4407 ) * allow partial evaluation of dataframe before prediction * resume spark test * comments * Run unit tests after building JVM packages	2019-04-26 21:02:40 -07:00
Nan Zhu	995698b0cb	[BREAKING][jvm-packages] fix the non-zero missing value handling (#4349 ) * fix the nan and non-zero missing value handling * fix nan handling part * add missing value * Update MissingValueHandlingSuite.scala * Update MissingValueHandlingSuite.scala * stylistic fix	2019-04-26 11:10:33 -07:00
Jiaming Yuan	207f058711	Refactor CMake scripts. (#4323 ) * Refactor CMake scripts. * Remove CMake CUDA wrapper. * Bump CMake version for CUDA. * Use CMake to handle Doxygen. * Split up CMakeList. * Export install target. * Use modern CMake. * Remove build.sh * Workaround for gpu_hist test. * Use cmake 3.12. * Revert machine.conf. * Move CLI test to gpu. * Small cleanup. * Support using XGBoost as submodule. * Fix windows * Fix cpp tests on Windows * Remove duplicated find_package.	2019-04-15 10:08:12 -07:00
Adam Pocock	a448a8320c	[jvm-packages] Fixing the NativeLibLoader on Java 9+ (#4351 ) The old NativeLibLoader had a short-circuit load path which modified java.library.path and attempted to load the xgboost library from outside the jar first, falling back to loading the library from inside the jar. This path is a no-op every time when using XGBoost outside of it's source tree. Additionally it triggers an illegal reflective access warning in the module system in 9, 10, and 11. On Java 12 the ClassLoader fields are not accessible via reflection (separately from the illegal reflective acces warning), and so it fails in a way that isn't caught by the code which falls back to loading the library from inside the jar. This commit removes that code path and always loads the xgboost library from inside the jar file as it's a valid technique across multiple JVM implementations and works with all versions of Java.	2019-04-10 12:41:44 -07:00
Xu Xiao	60a9af567c	[jvm-packages] Add methods operating attributes of booster in jvm package, which follow API design in python package. (#4336 )	2019-04-08 11:00:35 -07:00
Harry Braviner	b374e0a7ab	[jvm-packages] Allow supression of Rabit output in Booster::train in xgboost4j (#4262 ) * Make train in xgboost4j respect print params Previously no setting in params argument of Booster::train would prevent the Rabit.trackerPrint call. This can fill up a lot of screen space in the case that many folds are being trained. * Setting "silent" in this map to "true", "True", a non-zero integer, or a string that can be parsed to such an int will prevent printing. * Setting "verbose_eval" to "False" or "false" will prevent printing. * Setting "verbose_eval" to an int (or a String parseable to an int) n will result in printing every n steps, or no printing is n is zero. This is to match the python behaviour described here: https://www.kaggle.com/c/rossmann-store-sales/discussion/17499 * Fixed 'slient' typo in xgboost4j test * private access on two methods	2019-03-21 18:25:12 +08:00
Nan Zhu	45c89a6792	[jvm-packages] logging version number (#4271 ) * print version number * add property file	2019-03-21 18:24:29 +08:00
Christopher Suchanek	ac3d03089b	[jvm-packages] remove shutdown of handler shutdown (#4224 )	2019-03-06 19:32:43 -08:00
Yanbo Liang	9fefa2128d	[jvm-packages] Fix early stop with xgboost4j-spark (#4176 ) * Fix early stop with xgboost4j-spark * Update XGBoost.java * Update XGBoost.java * Update XGBoost.java To use -Float.MAX_VALUE as the lower bound, in case there is positive metric. * Only update best score if the current score is better (no update when equal) * Update xgboost-spark tutorial to fix early stopping docs.	2019-03-01 13:02:57 -08:00
Nan Zhu	1b7405f688	[jvm-packages] fix comments in objectiveTrait (#4174 )	2019-02-22 00:32:13 -08:00
Nan Zhu	c18a3660fa	Separate Depthwidth and Lossguide growing policy in fast histogram (#4102 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * init * more changes * temp * update * udpate rabit * change the histogram * update kfactor * sync per node stats * temp * update * final * code clean * update rabit * more cleanup * fix errors * fix failed tests * enforce c++11 * broadcast subsampled feature correctly * init col * temp * col sampling * fix histmastrix init * fix col sampling * remove cout * fix out of bound access * fix core dump remove core dump file * disbale test temporarily * update * add fid * print perf data * update * revert some changes * temp * temp * pass all tests * bring back some tests * recover some changes * fix lint issue * enable monotone and interaction constraints * don't specify default for monotone and interactions * recover column init part * more recovery * fix core dumps * code clean * revert some changes * fix test compilation issue * fix lint issue * resolve compilation issue * fix issues of lint caused by rebase * fix stylistic changes and change variable names * use regtree internal function * modularize depth width * address the comments * fix failed tests * wrap perf timers with class * fix lint * fix num_leaves count * fix indention * Update src/tree/updater_quantile_hist.cc Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.h Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.cc Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.cc Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.cc Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.h Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * merge * fix compilation	2019-02-13 12:56:19 -08:00
Nan Zhu	ae3bb9c2d5	Distributed Fast Histogram Algorithm (#4011 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * init * allow hist algo * more changes * temp * update * remove hist sync * udpate rabit * change hist size * change the histogram * update kfactor * sync per node stats * temp * update * final * code clean * update rabit * more cleanup * fix errors * fix failed tests * enforce c++11 * fix lint issue * broadcast subsampled feature correctly * revert some changes * fix lint issue * enable monotone and interaction constraints * don't specify default for monotone and interactions * update docs	2019-02-05 05:12:53 -08:00
Shayak Banerjee	431c850c03	[jvm-packages] Updates to Java Booster to support other feature importance measures (#3801 ) * Updates to Booster to support other feature importances * Add returns for Java methods * Pass Scala style checks * Pass Java style checks * Fix indents * Use class instead of enum * Return map string double * A no longer broken build, thanks to mvn package local build * Add a unit test to increase code coverage back * Address code review on main code * Add more unit tests for different feature importance scores * Address more CR	2019-01-02 01:13:14 -08:00
Nan Zhu	c055a32609	[jvm-packages]support multiple validation datasets in Spark (#3910 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * wrap iterators * enable copartition training and validationset * add parameters * converge code path and have init unit test * enable multi evals for ranking * unit test and doc * update example * fix early stopping * address the offline comments * udpate doc * test eval metrics * fix compilation issue * fix example	2018-12-17 21:03:57 -08:00
Nan Zhu	9c4ff50e83	[jvm-packages]Fix early stopping condition (#3928 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * update version * 0.82 * fix early stopping condition * remove unused * update comments * udpate comments * update test	2018-11-24 00:18:07 -08:00
Matthew Tovbin	d81fedb955	[jvm-packages] RabitTracker for Scala: allow specifying host ip from the xgboost-tracker.properties file (#3833 )	2018-10-26 22:01:36 -07:00
Nan Zhu	4ae225a08d	[Blocking][jvm-packages] fix the early stopping feature (#3808 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * temp * add method for classifier and regressor * update tutorial * address the comments * update	2018-10-23 14:53:13 -07:00
zengxy	9e73087324	[jvm-packages] support specified feature names when getModelDump and getFeatureScore (#3733 ) * [jvm-packages] support specified feature names for jvm when get ModelDump and get FeatureScore (#3725) * typo and style fix	2018-10-04 09:05:42 -07:00
Matthew Tovbin	bad76048d1	Eliminate use of System.out + proper error logging (#3572 )	2018-08-09 10:06:17 -07:00
Nan Zhu	1c08b3b2ea	[jvm-packages] enable predictLeaf/predictContrib/treeLimit in 0.8 (#3532 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * partial finish * no test * add test cases * add test cases * address comments * add test for regressor * fix typo	2018-08-07 14:01:18 -07:00
Yun Ni	30d10ab035	Convert handle == nullptr from SegFault to user-friendly error. (#3021 ) * Convert SegFault to user-friendly error. * Apply the change to DMatrix API as well	2018-06-29 06:30:26 +00:00
Yanbo Liang	2c4359e914	[jvm-packages] XGBoost Spark integration refactor (#3387 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * [jvm-packages] XGBoost Spark integration refactor. (#3313) * XGBoost Spark integration refactor. * Make corresponding update for xgboost4j-example * Address comments. * [jvm-packages] Refactor XGBoost-Spark params to make it compatible with both XGBoost and Spark MLLib (#3326) * Refactor XGBoost-Spark params to make it compatible with both XGBoost and Spark MLLib * Fix extra space. * [jvm-packages] XGBoost Spark supports ranking with group data. (#3369) * XGBoost Spark supports ranking with group data. * Use Iterator.duplicate to prevent OOM. * Update CheckpointManagerSuite.scala * Resolve conflicts	2018-06-18 15:39:18 -07:00
Nan Zhu	49b9f39818	[jvm-packages] update xgboost4j cross build script to be compatible with older glibc (#3307 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * static glibc glibc++ * update to build with glib 2.12 * remove unsupported flags * update version number * remove properties * remove unnecessary command * update poms	2018-05-10 06:39:44 -07:00
Nan Zhu	e1f57b4417	[jvm-packages] scripts to cross-build and deploy artifacts to github (#3276 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * cross building files * update * build with docker * remove * temp * update build script * update pom * update * update version * upload build * fix path * update README.md * fix compiler version to 4.8.5	2018-04-28 07:41:30 -07:00
Yun Ni	740eba42f7	[jvm-packages] Add back the overriden finalize() method for SBooster (#3011 ) * Convert SIGSEGV to XGBoostError * Address CR Comments * Address CR Comments	2018-01-06 14:07:37 -08:00
Yun Ni	65fb4e3f5c	[jvm-packages] Prevent dispose being called on unfinalized JBooster (#3005 ) * [jvm-packages] Prevent dispose being called twice when finalize * Convert SIGSEGV to XGBoostError * Avoid creating a new SBooster with the same JBooster * Address CR Comments	2018-01-06 09:46:52 -08:00
Yun Ni	9004ca03ca	[jvm-packages] Saving models into a tmp folder every a few rounds (#2964 ) * [jvm-packages] Train Booster from an existing model * Align Scala API with Java API * Existing model should not load rabit checkpoint * Address minor comments * Implement saving temporary boosters and loading previous booster * Add more unit tests for loadPrevBooster * Add params to XGBoostEstimator * (1) Move repartition out of the temp model saving loop (2) Address CR comments * Catch a corner case of training next model with fewer rounds * Address comments * Refactor newly added methods into TmpBoosterManager * Add two files which is missing in previous commit * Rename TmpBooster to checkpoint	2017-12-29 08:36:41 -08:00
avinocur	0ad20f8fe0	Parameterize host-ip to pass to tracker.py (#2831 )	2017-11-29 11:14:34 -08:00
Sergei Lebedev	8e141427aa	[jvm-packages] Exposed train-time evaluation metrics (#2836 ) * [jvm-packages] Exposed train-time evaluation metrics They are accessible via 'XGBoostModel.summary'. The summary is not serialized with the model and is only available after the training. * Addressed review comments * Extracted model-related tests into 'XGBoostModelSuite' * Added tests for copying the 'XGBoostModel' * [jvm-packages] Fixed a subtle bug in train/test split Iterator.partition (naturally) assumes that the predicate is deterministic but this is not the case for r.nextDouble() <= trainTestRatio therefore sometimes the DMatrix(...) call got a NoSuchElementException and crashed the JVM due to lack of exception handling in XGBoost4jCallbackDataIterNext. * Make sure train/test objectives are different	2017-11-20 22:21:54 +01:00
Seth Hendrickson	a8f670d247	[jvm-packages] Add some documentation to xgboost4j-spark plus minor style edits (#2823 ) * add scala docs to several methods * indentation * license formatting * clarify distributed boosters * address some review comments * reduce doc lengths * change method name, clarify doc * reset make config * delete most comments * more review feedback	2017-11-02 13:16:02 -07:00
Sergei Lebedev	69c3b78a29	[jvm-packages] Implemented early stopping (#2710 ) * Allowed subsampling test from the training data frame/RDD The implementation requires storing 1 - trainTestRatio points in memory to make the sampling work. An alternative approach would be to construct the full DMatrix and then slice it deterministically into train/test. The peak memory consumption of such scenario, however, is twice the dataset size. * Removed duplication from 'XGBoost.train' Scala callers can (and should) use names to supply a subset of parameters. Method overloading is not required. * Reuse XGBoost seed parameter to stabilize train/test splitting * Added early stopping support to non-distributed XGBoost Closes #1544 * Added early-stopping to distributed XGBoost * Moved construction of 'watches' into a separate method This commit also fixes the handling of 'baseMargin' which previously was not added to the validation matrix. * Addressed review comments	2017-09-29 12:06:22 -07:00
Mahmoud Rawas	a7ce4d2462	Returning back LabeledPoint into public, in referece to the discussion in : https://github.com/dmlc/xgboost/pull/2532#discussion_r137172759 (#2677 )	2017-09-10 20:45:43 -07:00
Yun Ni	a00157543d	Support instance weights for xgboost4j-spark (#2642 ) * Support instance weights for xgboost4j-spark * Use 0.001 instead of 0 for weights * Address CR comments	2017-08-28 09:03:20 -07:00

154 Commits