xgboost

Author	SHA1	Message	Date
Philip Hyunsu Cho	1168a68872	[jvm-packages] Update release scripts (#9983 ) * [jvm-packages] Add Scala version suffix to xgboost-jvm package (#9776) * Update JVM script (#9714) * Revamp pom.xml * Update instructions in prepare_jvm_release.py * Fix formatting * [jvm-packages] Fix POM for xgboost-jvm metapackage (#9893) * [jvm-packages] Fix POM for xgboost-jvm metapackage * Add script for updating the Scala version * Update change_scala_version.py to also change scala.version property (#9897) * Remove 'release-cpu-only' profile * Remove scala-2.13 profile; enable gpu package for Scala 2.13	2024-01-12 10:37:55 -08:00
Jiaming Yuan	58530b1bc4	Bump version to 2.1. (#9498 )	2023-08-18 01:04:04 +08:00
Jiaming Yuan	f4fb2be101	[jvm-packages] Add the new `device` parameter. (#9385 )	2023-07-17 18:40:39 +08:00
Boris	a01df102c9	Scala 2.13 support. (#9099 ) 1. Updated the test logic 2. Added smoke tests for Spark examples. 3. Added integration tests for Spark with Scala 2.13	2023-05-27 19:34:02 +08:00
Jiaming Yuan	1f9a57d17b	[Breaking] Require format to be specified in input URI. (#9077 ) Previously, we use `libsvm` as default when format is not specified. However, the dmlc data parser is not particularly robust against errors, and the most common type of error is undefined format. Along with which, we will recommend users to use other data loader instead. We will continue the maintenance of the parsers as it's currently used for many internal tests including federated learning.	2023-04-28 19:45:15 +08:00
Boris	0e7377ba9c	Updated flink 1.8 -> 1.17. Added smoke tests for Flink (#9046 )	2023-04-26 18:41:11 +08:00
Jiaming Yuan	26209a42a5	Define git attributes for renormalization. (#8921 )	2023-03-16 02:43:11 +08:00
dependabot[bot]	4a99c9bdb8	Bump commons-lang3 from 3.9 to 3.12.0 in /jvm-packages (#8548 ) Bumps commons-lang3 from 3.9 to 3.12.0. --- updated-dependencies: - dependency-name: org.apache.commons:commons-lang3 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-12-06 20:13:46 +08:00
Philip Hyunsu Cho	2546d139d6	[jvm-packages] Add missing commons-lang3 dependency to xgboost4j-gpu (#8508 ) * [jvm-packages] Add missing commons-lang3 dependency to xgboost4j-gpu * Update commons-lang3	2022-12-01 16:27:11 -08:00
Jiaming Yuan	f73520bfff	Bump development version to 2.0. (#8390 )	2022-10-28 15:21:19 +08:00
Jiaming Yuan	f835368bcf	Mark next release as 1.7 instead of 2.0 (#8281 )	2022-09-28 14:33:37 +08:00
Bobby Wang	9fa7ed1743	[Breaking][jvm-packages] remove timeoutRequestWorkers parameter (#7839 )	2022-05-13 16:26:25 +08:00
Jiaming Yuan	522636cb52	Bump version. (#7769 )	2022-03-31 06:33:22 +08:00
Jiaming Yuan	f7caac2563	Bump version to 1.6.0 in master. (#7259 )	2021-10-07 16:09:26 +08:00
Jiaming Yuan	fbd58bf190	[jvm-packages] Create demo and test for xgboost4j early stopping. (#7252 )	2021-09-25 03:29:27 +08:00
Jiaming Yuan	146549260a	Bump version to 1.5.0 snapshot in master. (#6875 )	2021-04-22 01:53:44 +08:00
Philip Hyunsu Cho	0d483cb7c1	Bump version to 1.4.0 snapshot in master (#6486 )	2020-12-10 07:38:08 -08:00
Philip Hyunsu Cho	b3193052b3	Bump version to 1.3.0 snapshot in master (#6052 )	2020-08-23 17:13:46 -07:00
Bobby Wang	8943eb4314	[BLOCKING] [jvm-packages] add gpu_hist and enable gpu scheduling (#5171 ) * [jvm-packages] add gpu_hist tree method * change updater hist to grow_quantile_histmaker * add gpu scheduling * pass correct parameters to xgboost library * remove debug info * add use.cuda for pom * add CI for gpu_hist for jvm * add gpu unit tests * use gpu node to build jvm * use nvidia-docker * Add CLI interface to create_jni.py using argparse Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2020-07-26 21:53:24 -07:00
Philip Hyunsu Cho	073b625bde	Bump version to 1.2.0 snapshot in master (#5733 )	2020-05-31 00:11:34 -07:00
Philip Hyunsu Cho	7ac7e8778f	Port patches from 1.0.0 branch (#5336 ) * Remove f-string, since it's not supported by Python 3.5 (#5330) * Remove f-string, since it's not supported by Python 3.5 * Add Python 3.5 to CI, to ensure compatibility * Remove duplicated matplotlib * Show deprecation notice for Python 3.5 * Fix lint * Fix lint * Fix a unit test that mistook MINOR ver for PATCH ver * Enforce only major version in JSON model schema * Bump version to 1.1.0-SNAPSHOT	2020-02-21 13:13:21 -08:00
Jiaming Yuan	010b8f1428	Revert "[jvm-packages] update rabit, surface new changes to spark, add parity and failure tests (#4876 )" (#4965 ) This reverts commit 86ed01c4bbecef66e1bc4d02fb13116bd6130fae.	2019-10-18 14:02:35 -07:00
Chen Qin	86ed01c4bb	[jvm-packages] update rabit, surface new changes to spark, add parity and failure tests (#4876 ) * Expose sets of rabit configurations to spark layer	2019-10-18 15:07:31 -04:00
Nan Zhu	1595e3f57b	upgrade version num (#4670 ) * upgrade version num * missign changes * fix version script * change versions * rm files * Update CMakeLists.txt	2019-07-17 15:25:35 -07:00
koertkuipers	3c506b076e	[jvm-packages] upgrade to Scala 2.12 (#4574 ) * bump scala to 2.12 which requires java 8 and also newer flink and akka * put scala version in artifactId * fix appveyor * fix for scaladoc issue that looks like https://github.com/scala/bug/issues/10509 * fix ci_build * update versions in generate_pom.py * fix generate_pom.py * apache does not have a download for spark 2.4.3 distro using scala 2.12 yet, so for now i use a tgz i put on s3 * Upload spark-2.4.3-bin-scala2.12-hadoop2.7.tgz to our own S3 * Update Dockerfile.jvm_cross * Update Dockerfile.jvm_cross	2019-07-16 08:43:34 -07:00
Philip Hyunsu Cho	515f5f5c47	[RFC] Version 0.90 release candidate (#4475 ) * Release 0.90 * Add script to automatically generate acknowledgment * Update NEWS.md	2019-05-20 01:02:44 -07:00
Philip Hyunsu Cho	ea850ecd20	[CI] Refactor Jenkins CI pipeline + migrate all Linux tests to Jenkins (#4401 ) * All Linux tests are now in Jenkins CI * Tests are now de-coupled from builds. We can now build XGBoost with one version of CUDA/JDK and test it with another version of CUDA/JDK * Builds (compilation) are significantly faster because 1) They use C5 instances with faster CPU cores; and 2) build environment setup is cached using Docker containers	2019-04-26 18:39:12 -07:00
Nan Zhu	5f34078fba	[jvm-packages] bump version for master (#4209 ) * update version * bump version	2019-03-04 23:12:24 -08:00
Nan Zhu	ae3bb9c2d5	Distributed Fast Histogram Algorithm (#4011 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * init * allow hist algo * more changes * temp * update * remove hist sync * udpate rabit * change hist size * change the histogram * update kfactor * sync per node stats * temp * update * final * code clean * update rabit * more cleanup * fix errors * fix failed tests * enforce c++11 * fix lint issue * broadcast subsampled feature correctly * revert some changes * fix lint issue * enable monotone and interaction constraints * don't specify default for monotone and interactions * update docs	2019-02-05 05:12:53 -08:00
Nan Zhu	c055a32609	[jvm-packages]support multiple validation datasets in Spark (#3910 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * wrap iterators * enable copartition training and validationset * add parameters * converge code path and have init unit test * enable multi evals for ranking * unit test and doc * update example * fix early stopping * address the offline comments * udpate doc * test eval metrics * fix compilation issue * fix example	2018-12-17 21:03:57 -08:00
Nan Zhu	dc2bfbfde1	[jvm-packages] update version to 0.82-SNAPSHOT (#3920 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * update version * 0.82	2018-11-18 16:47:48 -08:00
ajing	0ddb8a7661	Update README.md (#3872 ) SparkWithDataFrame was not there anymore. So replace with SparkMLlibPipeline.scala	2018-11-12 11:03:13 -08:00
Philip Hyunsu Cho	78ec77fa97	Release 0.81 version (#3864 ) * Release 0.81 version * Update NEWS.md	2018-11-04 05:49:11 -08:00
Nan Zhu	79d854c695	[jvm-packages] fix errors in example (#3719 ) * add back train method but mark as deprecated * fix scalastyle error * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * instrumentation * use log console * better measurement * fix erros in example * update histmaker	2018-09-22 16:39:38 -07:00
Philip Hyunsu Cho	6288f6d563	Update JVM packages version to 0.81-SNAPSHOT (#3584 )	2018-08-13 10:17:52 -07:00
Philip Hyunsu Cho	96826a3515	Release version 0.80 (#3541 ) * Up versions * Write release note for 0.80	2018-08-13 01:38:37 -07:00
Nan Zhu	31d1baba3d	[jvm-packages] Tutorial of XGBoost4J-Spark (#3534 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * add new * update doc * finish Gang Scheduling * more * intro * Add sections: Prediction, Model persistence and ML pipeline. * Add XGBoost4j-Spark MLlib pipeline example * partial finished version * finish the doc * adjust code * fix the doc * use rst * Convert XGBoost4J-Spark tutorial to reST * Bring XGBoost4J up to date * add note about using hdfs * remove duplicate file * fix descriptions * update doc * Wrap HDFS/S3 export support as a note * update * wrap indexing_mode example in code block	2018-08-03 21:17:50 -07:00
Nan Zhu	e2f09db77a	[jvm-packages] minor fix for parameter name in example (#3507 )	2018-07-25 19:57:40 -07:00
Yanbo Liang	2c4359e914	[jvm-packages] XGBoost Spark integration refactor (#3387 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * [jvm-packages] XGBoost Spark integration refactor. (#3313) * XGBoost Spark integration refactor. * Make corresponding update for xgboost4j-example * Address comments. * [jvm-packages] Refactor XGBoost-Spark params to make it compatible with both XGBoost and Spark MLLib (#3326) * Refactor XGBoost-Spark params to make it compatible with both XGBoost and Spark MLLib * Fix extra space. * [jvm-packages] XGBoost Spark supports ranking with group data. (#3369) * XGBoost Spark supports ranking with group data. * Use Iterator.duplicate to prevent OOM. * Update CheckpointManagerSuite.scala * Resolve conflicts	2018-06-18 15:39:18 -07:00
Nan Zhu	f66731181f	Update 0.8 version num (#3358 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * update 0.80	2018-06-02 07:06:01 -07:00
Nan Zhu	e1f57b4417	[jvm-packages] scripts to cross-build and deploy artifacts to github (#3276 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * cross building files * update * build with docker * remove * temp * update build script * update pom * update * update version * upload build * fix path * update README.md * fix compiler version to 4.8.5	2018-04-28 07:41:30 -07:00
Yanbo Liang	4850f67b85	Fix broken link for xgboost-spark example. (#3275 )	2018-04-26 06:45:01 -07:00
Nan Zhu	25b2919c44	[jvm-packages] change version of jvm to keep consistent with other pkgs (#3253 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * change version of jvm to keep consistent with other pkgs	2018-04-19 20:48:50 -07:00
Nan Zhu	14c6392381	[jvm-packages] add dev script to update version and update versions (#2998 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * add dev script to update version and update versions	2018-01-01 21:28:53 -08:00
Sergei Lebedev	69c3b78a29	[jvm-packages] Implemented early stopping (#2710 ) * Allowed subsampling test from the training data frame/RDD The implementation requires storing 1 - trainTestRatio points in memory to make the sampling work. An alternative approach would be to construct the full DMatrix and then slice it deterministically into train/test. The peak memory consumption of such scenario, however, is twice the dataset size. * Removed duplication from 'XGBoost.train' Scala callers can (and should) use names to supply a subset of parameters. Method overloading is not required. * Reuse XGBoost seed parameter to stabilize train/test splitting * Added early stopping support to non-distributed XGBoost Closes #1544 * Added early-stopping to distributed XGBoost * Moved construction of 'watches' into a separate method This commit also fixes the handling of 'baseMargin' which previously was not added to the validation matrix. * Addressed review comments	2017-09-29 12:06:22 -07:00
Sergei Lebedev	771a95aec6	[jvm-packages] Added baseMargin to ml.dmlc.xgboost4j.LabeledPoint (#2532 ) * Converted ml.dmlc.xgboost4j.LabeledPoint to Scala This allows to easily integrate LabeledPoint with Spark DataFrame APIs, which support encoding/decoding case classes out of the box. Alternative solution would be to keep LabeledPoint in Java and make it a Bean by generating boilerplate getters/setters. I have decided against that, even thought the conversion in this PR implies a public API change. I also had to remove the factory methods fromSparseVector and fromDenseVector because a) they would need to be duplicated to support overloaded calls with extra data (e.g. weight); and b) Scala would expose them via mangled $.MODULE$ which looks ugly in Java. Additionally, this commit makes it possible to switch to LabeledPoint in all public APIs and effectively to pass initial margin/group as part of the point. This seems to be the only reliable way of implementing distributed learning with these data. Note that group size format used by single-node XGBoost is not compatible with that scenario, since the partition split could divide a group into two chunks. * Switched to ml.dmlc.xgboost4j.LabeledPoint in RDD-based public APIs Note that DataFrame-based and Flink APIs are not affected by this change. * Removed baseMargin argument in favour of the LabeledPoint field * Do a single pass over the partition in buildDistributedBoosters Note that there is no formal guarantee that val repartitioned = rdd.repartition(42) repartitioned.zipPartitions(repartitioned.map(_ + 1)) { it1, it2, => ... } would do a single shuffle, but in practice it seems to be always the case. * Exposed baseMargin in DataFrame-based API * Addressed review comments * Pass baseMargin to XGBoost.trainWithDataFrame via params * Reverted MLLabeledPoint in Spark APIs As discussed, baseMargin would only be supported for DataFrame-based APIs. * Cleaned up baseMargin tests - Removed RDD-based test, since the option is no longer exposed via public APIs - Changed DataFrame-based one to check that adding a margin actually affects the prediction * Pleased Scalastyle * Addressed more review comments * Pleased scalastyle again * Fixed XGBoost.fromBaseMarginsToArray which always returned an array of NaNs even if base margin was not specified. Surprisingly this only failed a few tests.	2017-08-10 14:29:26 -07:00
Sergei Lebedev	d535340459	[jvm-packages] Exposed baseMargin (#2450 ) * Disabled excessive Spark logging in tests * Fixed a singature of XGBoostModel.predict Prior to this commit XGBoostModel.predict produced an RDD with an array of predictions for each partition, effectively changing the shape wrt the input RDD. A more natural contract for prediction API is that given an RDD it returns a new RDD with the same number of elements. This allows the users to easily match inputs with predictions. This commit removes one layer of nesting in XGBoostModel.predict output. Even though the change is clearly non-backward compatible, I still think it is well justified. * Removed boxing in XGBoost.fromDenseToSparseLabeledPoints * Inlined XGBoost.repartitionData An if is more explicit than an opaque method name. * Moved XGBoost.convertBoosterToXGBoostModel to XGBoostModel * Check the input dimension in DMatrix.setBaseMargin Prior to this commit providing an array of incorrect dimensions would have resulted in memory corruption. Maybe backport this to C++? * Reduced nesting in XGBoost.buildDistributedBoosters * Ensured consistent naming of the params map * Cleaned up DataBatch to make it easier to comprehend * Made scalastyle happy * Added baseMargin to XGBoost.train and trainWithRDD * Deprecated XGBoost.train It is ambiguous and work only for RDDs. * Addressed review comments * Revert "Fixed a singature of XGBoostModel.predict" This reverts commit 06bd5dcae7780265dd57e93ed7d4135f4e78f9b4. * Addressed more review comments * Fixed NullPointerException in buildDistributedBoosters	2017-06-30 08:27:24 -07:00
Xin Yin	e7fbc8591f	[jvm-packages] Scala implementation of the Rabit tracker. (#1612 ) * [jvm-packages] Scala implementation of the Rabit tracker. A Scala implementation of RabitTracker that is interface-interchangable with the Java implementation, ported from `tracker.py` in the [dmlc-core project](https://github.com/dmlc/dmlc-core). * [jvm-packages] Updated Akka dependency in pom.xml. * Refactored the RabitTracker directory structure. * Fixed premature stopping of connection handler. Added a new finite state "AwaitingPortNumber" to explicitly wait for the worker to send the port, and close the connection. Stopping the actor prematurely sends a TCP RST to the worker, causing the worker to crash on AssertionError. * Added interface IRabitTracker so that user can switch implementations. * Default timeout duration changes. * Dependency for Akka tests. * Removed the main function of RabitTracker. * A skeleton for testing Akka-based Rabit tracker. * waitFor() in RabitTracker no longer throws exceptions. * Completed unit test for the 'start' command of Rabit tracker. * Preliminary support for Rabit Allreduce via JNI (no prepare function support yet.) * Fixed the default timeout duration. * Use Java container to avoid serialization issues due to intermediate wrappers. * Added tests for Allreduce/model training using Scala Rabit tracker. * Added spill-over unit test for the Scala Rabit tracker. * Fixed a typo. * Overhaul of RabitTracker interface per code review. - Removed methods start() waitFor() (no arguments) from IRabitTracker. - The timeout in start(timeout) is now worker connection timeout, as tcp socket binding timeout is less intuitive. - Dropped time unit from start(...) and waitFor(...) methods; the default time unit is millisecond. - Moved random port number generation into the RabitTrackerHandler. - Moved all Rabit-related classes to package ml.dmlc.xgboost4j.scala.rabit. * More code refactoring and comments. * Unified timeout constants. Readable tracker status code. * Add comments to indicate that allReduce is for tests only. Removed all other variants. * Removed unused imports. * Simplified signatures of training methods. - Moved TrackerConf into parameter map. - Changed GeneralParams so that TrackerConf becomes a standalone parameter. - Updated test cases accordingly. * Changed monitoring strategies. * Reverted monitoring changes. * Update test case for Rabit AllReduce. * Mix in UncaughtExceptionHandler into IRabitTracker to prevent tracker from hanging due to exceptions thrown by workers. * More comprehensive test cases for exception handling and worker connection timeout. * Handle executor loss due to unknown cause: the newly spawned executor will attempt to connect to the tracker. Interrupt tracker in such case. * Per code-review, removed training timeout from TrackerConf. Timeout logic must be implemented explicitly and externally in the driver code. * Reverted scalastyle-config changes. * Visibility scope change. Interface tweaks. * Use match pattern to handle tracker_conf parameter. * Minor clarification in JNI code. * Clearer intent in match pattern to suppress warnings. * Removed Future from constructor. Block in start() and waitFor() instead. * Revert inadvertent comment changes. * Removed debugging information. * Updated test cases that are a bit finicky. * Added comments on the reasoning behind the unit tests for testing Rabit tracker robustness.	2016-12-07 06:35:42 -08:00
Ruimin Wang	d80cec3384	[jvm-pacakges] the first parameter in getModelDump should be featuremap path not model path (#1788 ) * fix the model dump in xgboost4j example * Modify the dump model part of scala version * add the forgotten modelInfos	2016-11-21 08:52:26 -05:00
XianXing Zhang	ce708c8e7f	[jvm-packages] Leverage the Spark ml API to read DataFrame from files in LibSVM format. (#1785 )	2016-11-20 21:28:03 -05:00

76 Commits