xgboost

Author	SHA1	Message	Date
Philip Hyunsu Cho	6288f6d563	Update JVM packages version to 0.81-SNAPSHOT (#3584)	2018-08-13 10:17:52 -07:00
Philip Hyunsu Cho	96826a3515	Release version 0.80 (#3541) * Up versions * Write release note for 0.80	2018-08-13 01:38:37 -07:00
Matthew Tovbin	bad76048d1	Eliminate use of System.out + proper error logging (#3572)	2018-08-09 10:06:17 -07:00
Nan Zhu	1c08b3b2ea	[jvm-packages] enable predictLeaf/predictContrib/treeLimit in 0.8 (#3532) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * partial finish * no test * add test cases * add test cases * address comments * add test for regressor * fix typo	2018-08-07 14:01:18 -07:00
Yun Ni	30d10ab035	Convert handle == nullptr from SegFault to user-friendly error. (#3021) * Convert SegFault to user-friendly error. * Apply the change to DMatrix API as well	2018-06-29 06:30:26 +00:00
Yanbo Liang	2c4359e914	[jvm-packages] XGBoost Spark integration refactor (#3387) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * [jvm-packages] XGBoost Spark integration refactor. (#3313) * XGBoost Spark integration refactor. * Make corresponding update for xgboost4j-example * Address comments. * [jvm-packages] Refactor XGBoost-Spark params to make it compatible with both XGBoost and Spark MLLib (#3326) * Refactor XGBoost-Spark params to make it compatible with both XGBoost and Spark MLLib * Fix extra space. * [jvm-packages] XGBoost Spark supports ranking with group data. (#3369) * XGBoost Spark supports ranking with group data. * Use Iterator.duplicate to prevent OOM. * Update CheckpointManagerSuite.scala * Resolve conflicts	2018-06-18 15:39:18 -07:00
Nan Zhu	f66731181f	Update 0.8 version num (#3358) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * update 0.80	2018-06-02 07:06:01 -07:00
Nan Zhu	49b9f39818	[jvm-packages] update xgboost4j cross build script to be compatible with older glibc (#3307) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * static glibc glibc++ * update to build with glib 2.12 * remove unsupported flags * update version number * remove properties * remove unnecessary command * update poms	2018-05-10 06:39:44 -07:00
Nan Zhu	e1f57b4417	[jvm-packages] scripts to cross-build and deploy artifacts to github (#3276) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * cross building files * update * build with docker * remove * temp * update build script * update pom * update * update version * upload build * fix path * update README.md * fix compiler version to 4.8.5	2018-04-28 07:41:30 -07:00
Nan Zhu	25b2919c44	[jvm-packages] change version of jvm to keep consistent with other pkgs (#3253) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * change version of jvm to keep consistent with other pkgs	2018-04-19 20:48:50 -07:00
Yun Ni	740eba42f7	[jvm-packages] Add back the overridden finalize() method for SBooster (#3011) * Convert SIGSEGV to XGBoostError * Address CR Comments * Address CR Comments	2018-01-06 14:07:37 -08:00
Yun Ni	65fb4e3f5c	[jvm-packages] Prevent dispose being called on unfinalized JBooster (#3005) * [jvm-packages] Prevent dispose being called twice when finalize * Convert SIGSEGV to XGBoostError * Avoid creating a new SBooster with the same JBooster * Address CR Comments	2018-01-06 09:46:52 -08:00
Nan Zhu	9747ea2acb	[jvm-packages] fix the pattern in dev script and version mismatch (#3009) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix the pattern in dev script and version mismatch	2018-01-06 06:59:38 -08:00
Nan Zhu	14c6392381	[jvm-packages] add dev script to update version and update versions (#2998) * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * add dev script to update version and update versions	2018-01-01 21:28:53 -08:00
Yun Ni	9004ca03ca	[jvm-packages] Saving models into a tmp folder every a few rounds (#2964) * [jvm-packages] Train Booster from an existing model * Align Scala API with Java API * Existing model should not load rabit checkpoint * Address minor comments * Implement saving temporary boosters and loading previous booster * Add more unit tests for loadPrevBooster * Add params to XGBoostEstimator * (1) Move repartition out of the temp model saving loop (2) Address CR comments * Catch a corner case of training next model with fewer rounds * Address comments * Refactor newly added methods into TmpBoosterManager * Add two files which are missing in previous commit * Rename TmpBooster to checkpoint	2017-12-29 08:36:41 -08:00
avinocur	0ad20f8fe0	Parameterize host-ip to pass to tracker.py (#2831)	2017-11-29 11:14:34 -08:00
Sergei Lebedev	8e141427aa	[jvm-packages] Exposed train-time evaluation metrics (#2836) * [jvm-packages] Exposed train-time evaluation metrics They are accessible via 'XGBoostModel.summary'. The summary is not serialized with the model and is only available after the training. * Addressed review comments * Extracted model-related tests into 'XGBoostModelSuite' * Added tests for copying the 'XGBoostModel' * [jvm-packages] Fixed a subtle bug in train/test split Iterator.partition (naturally) assumes that the predicate is deterministic but this is not the case for r.nextDouble() <= trainTestRatio therefore sometimes the DMatrix(...) call got a NoSuchElementException and crashed the JVM due to lack of exception handling in XGBoost4jCallbackDataIterNext. * Make sure train/test objectives are different	2017-11-20 22:21:54 +01:00
Seth Hendrickson	a8f670d247	[jvm-packages] Add some documentation to xgboost4j-spark plus minor style edits (#2823) * add scala docs to several methods * indentation * license formatting * clarify distributed boosters * address some review comments * reduce doc lengths * change method name, clarify doc * reset make config * delete most comments * more review feedback	2017-11-02 13:16:02 -07:00
Sergei Lebedev	69c3b78a29	[jvm-packages] Implemented early stopping (#2710) * Allowed subsampling test from the training data frame/RDD The implementation requires storing 1 - trainTestRatio points in memory to make the sampling work. An alternative approach would be to construct the full DMatrix and then slice it deterministically into train/test. The peak memory consumption of such scenario, however, is twice the dataset size. * Removed duplication from 'XGBoost.train' Scala callers can (and should) use names to supply a subset of parameters. Method overloading is not required. * Reuse XGBoost seed parameter to stabilize train/test splitting * Added early stopping support to non-distributed XGBoost Closes #1544 * Added early-stopping to distributed XGBoost * Moved construction of 'watches' into a separate method This commit also fixes the handling of 'baseMargin' which previously was not added to the validation matrix. * Addressed review comments	2017-09-29 12:06:22 -07:00
Mahmoud Rawas	a7ce4d2462	Returning back LabeledPoint into public, in reference to the discussion in: https://github.com/dmlc/xgboost/pull/2532#discussion_r137172759 (#2677)	2017-09-10 20:45:43 -07:00
Yun Ni	a00157543d	Support instance weights for xgboost4j-spark (#2642) * Support instance weights for xgboost4j-spark * Use 0.001 instead of 0 for weights * Address CR comments	2017-08-28 09:03:20 -07:00
Sergei Lebedev	771a95aec6	[jvm-packages] Added baseMargin to ml.dmlc.xgboost4j.LabeledPoint (#2532) * Converted ml.dmlc.xgboost4j.LabeledPoint to Scala This allows to easily integrate LabeledPoint with Spark DataFrame APIs, which support encoding/decoding case classes out of the box. Alternative solution would be to keep LabeledPoint in Java and make it a Bean by generating boilerplate getters/setters. I have decided against that, even though the conversion in this PR implies a public API change. I also had to remove the factory methods fromSparseVector and fromDenseVector because a) they would need to be duplicated to support overloaded calls with extra data (e.g. weight); and b) Scala would expose them via mangled $.MODULE$ which looks ugly in Java. Additionally, this commit makes it possible to switch to LabeledPoint in all public APIs and effectively to pass initial margin/group as part of the point. This seems to be the only reliable way of implementing distributed learning with these data. Note that group size format used by single-node XGBoost is not compatible with that scenario, since the partition split could divide a group into two chunks. * Switched to ml.dmlc.xgboost4j.LabeledPoint in RDD-based public APIs Note that DataFrame-based and Flink APIs are not affected by this change. * Removed baseMargin argument in favour of the LabeledPoint field * Do a single pass over the partition in buildDistributedBoosters Note that there is no formal guarantee that val repartitioned = rdd.repartition(42) repartitioned.zipPartitions(repartitioned.map(_ + 1)) { it1, it2, => ... } would do a single shuffle, but in practice it seems to be always the case. * Exposed baseMargin in DataFrame-based API * Addressed review comments * Pass baseMargin to XGBoost.trainWithDataFrame via params * Reverted MLLabeledPoint in Spark APIs As discussed, baseMargin would only be supported for DataFrame-based APIs. * Cleaned up baseMargin tests - Removed RDD-based test, since the option is no longer exposed via public APIs - Changed DataFrame-based one to check that adding a margin actually affects the prediction * Pleased Scalastyle * Addressed more review comments * Pleased scalastyle again * Fixed XGBoost.fromBaseMarginsToArray which always returned an array of NaNs even if base margin was not specified. Surprisingly this only failed a few tests.	2017-08-10 14:29:26 -07:00
Philip Cho	03e213c7cd	Fix documentation for a misspelled parameter (#2569)	2017-08-02 21:50:09 +12:00
Sergei Lebedev	d535340459	[jvm-packages] Exposed baseMargin (#2450) * Disabled excessive Spark logging in tests * Fixed a signature of XGBoostModel.predict Prior to this commit XGBoostModel.predict produced an RDD with an array of predictions for each partition, effectively changing the shape wrt the input RDD. A more natural contract for prediction API is that given an RDD it returns a new RDD with the same number of elements. This allows the users to easily match inputs with predictions. This commit removes one layer of nesting in XGBoostModel.predict output. Even though the change is clearly non-backward compatible, I still think it is well justified. * Removed boxing in XGBoost.fromDenseToSparseLabeledPoints * Inlined XGBoost.repartitionData An if is more explicit than an opaque method name. * Moved XGBoost.convertBoosterToXGBoostModel to XGBoostModel * Check the input dimension in DMatrix.setBaseMargin Prior to this commit providing an array of incorrect dimensions would have resulted in memory corruption. Maybe backport this to C++? * Reduced nesting in XGBoost.buildDistributedBoosters * Ensured consistent naming of the params map * Cleaned up DataBatch to make it easier to comprehend * Made scalastyle happy * Added baseMargin to XGBoost.train and trainWithRDD * Deprecated XGBoost.train It is ambiguous and works only for RDDs. * Addressed review comments * Revert "Fixed a signature of XGBoostModel.predict" This reverts commit 06bd5dcae7780265dd57e93ed7d4135f4e78f9b4. * Addressed more review comments * Fixed NullPointerException in buildDistributedBoosters	2017-06-30 08:27:24 -07:00
Edi Bice	2911597f3d	[jvm-packages] Expose prediction feature contribution on the Java side (#2441) * Exposed prediction feature contribution on the Java side * was not supplying the newly added argument * Exposed from Scala-side as well * formatting (keep declaration in one line unless exceeding 100 chars)	2017-06-28 13:34:51 -07:00
Sergei Lebedev	91e778c6db	[jvm-packages] JNI Cosmetics (#2448) * [jvm-packages] Ensure the native library is loaded once Previously any class using XGBoostJNI queried NativeLibLoader to make sure the native library is loaded. This commit moves the initXGBoost call to XGBoostJNI, effectively delegating the initialization to the class loader. Note also, that now XGBoostJNI would NOT suppress an IOException if it occurred in initXGBoost. * [jvm-packages] Fused JNIErrorHandle with XGBoostJNI There was no reason for having a separate class.	2017-06-23 11:49:30 -07:00
Sergei Lebedev	2cb51f7097	[jvm-packages] Another pack of build/CI improvements (#2422) * [jvm-packages] Fixed compilation on Windows * [jvm-packages] Build the JNI bindings on Appveyor * [jvm-packages] Build & test on OS X * [jvm-packages] Re-applied the CMake build changes reverted by #2395 * Fixed Appveyor JVM build * Muted Maven on Travis * Don't link with libawt * "linux2"->"linux" Python2.x and 3.X use slightly different values for ``sys.platform``.	2017-06-21 12:28:35 -07:00
Sergei Lebedev	0db37c05bd	[jvm-packages] Deterministically XGBoost training on exception (#2405) Previously the code relied on the tracker process being terminated by the OS, which was not the case on Windows. Closes #2394	2017-06-12 20:19:28 -07:00
PSEUDOTENSOR / Jonathan McKinney	41efe32aa5	[GPU-Plugin] Multi-GPU for grow_gpu_hist histogram method using NVIDIA NCCL. (#2395)	2017-06-12 05:06:08 +12:00
Sergei Lebedev	3820ab6a0b	[jvm-packages] Minor improvements to the CMake build (#2379) * [jvm-packages] Fixed JNI_OnLoad overload It does not compile on Windows without proper export flags. * [jvm-packages] Use JNI types directly where appropriate * Removed lib hack from CMake build Prior to this commit the CMake build used a hardcoded lib prefix for libxgboost and libxgboost4j. Unfortunately this did not play well with Windows, which does not use the lib- prefix.	2017-06-09 08:25:09 -07:00
Sergei Lebedev	37c27ab8e8	[jvm-packages] Replaced create_jni.{bat,sh} with a Python version (#2383) * [jvm-packages] Replaced create_jni.{bat,sh} with a Python version This allows to have a single script for all platforms. * [jvm-packages] Added all configuration options to create_jni.py	2017-06-07 21:55:45 -07:00
Sergei Lebedev	2d9052bc7d	libxgboost4j is now part of the CMake build (#2373) * [jvm-packages] Added libxgboost4j to CMake build * [jvm-packages] Wired CMake build into create_jni.sh * Use newer CMake version on Travis * Lowered CMake version constraints * Fixed various quirks in the new CMake build	2017-06-03 17:14:51 -07:00
Sergei Lebedev	433269c335	Minor improvements to xgboost/jvm-packages build (#2356) * Specified 'exec-maven-plugin' version * Changed 'create_jni.sh' to fail on error and also report each of the executed commands, which makes it easier to debug.	2017-05-30 17:51:27 +02:00
ebernhardson	197a9eacc5	[jvm-packages] Expose json dumps to scala (#2247) * Add parameter passthru of format on Booster.getModelDump	2017-05-02 17:41:27 -07:00
ebernhardson	d3b866e3fd	[jvm-packages] Expose json formatted booster dumps (#2233) (#2234) * Change Booster dump from XGBoosterDumpModel to XGBoosterDumpModelEx Allows exposing multiple formatting options of model dumping.	2017-04-29 20:23:09 -07:00
Xin Yin	5b54b9437c	Fixed Exception handling for fragmented Rabit 'print' tracker command. Fixed unit test. (#2081)	2017-03-05 13:40:59 -08:00
Nan Zhu	ab13fd72bd	[jvm-packages] Scala/Java interface for Fast Histogram Algorithm (#1966) * add back train method but mark as deprecated * fix scalastyle error * first commit in scala binding for fast histo * java test * add missed scala tests * spark training * add back train method but mark as deprecated * fix scalastyle error * local change * first commit in scala binding for fast histo * local change * fix df frame test	2017-03-04 15:37:24 -08:00
Xin Yin	4fb7fdb240	[jvm-packages] Fixed java.nio.BufferUnderFlow issue in Scala Rabit tracker. (#1993) * [jvm-packages] Scala implementation of the Rabit tracker. A Scala implementation of RabitTracker that is interface-interchangeable with the Java implementation, ported from `tracker.py` in the [dmlc-core project](https://github.com/dmlc/dmlc-core). * [jvm-packages] Updated Akka dependency in pom.xml. * Refactored the RabitTracker directory structure. * Fixed premature stopping of connection handler. Added a new finite state "AwaitingPortNumber" to explicitly wait for the worker to send the port, and close the connection. Stopping the actor prematurely sends a TCP RST to the worker, causing the worker to crash on AssertionError. * Added interface IRabitTracker so that user can switch implementations. * Default timeout duration changes. * Dependency for Akka tests. * Removed the main function of RabitTracker. * A skeleton for testing Akka-based Rabit tracker. * waitFor() in RabitTracker no longer throws exceptions. * Completed unit test for the 'start' command of Rabit tracker. * Preliminary support for Rabit Allreduce via JNI (no prepare function support yet.) * Fixed the default timeout duration. * Use Java container to avoid serialization issues due to intermediate wrappers. * Added tests for Allreduce/model training using Scala Rabit tracker. * Added spill-over unit test for the Scala Rabit tracker. * Fixed a typo. * Overhaul of RabitTracker interface per code review. - Removed methods start() waitFor() (no arguments) from IRabitTracker. - The timeout in start(timeout) is now worker connection timeout, as tcp socket binding timeout is less intuitive. - Dropped time unit from start(...) and waitFor(...) methods; the default time unit is millisecond. - Moved random port number generation into the RabitTrackerHandler. - Moved all Rabit-related classes to package ml.dmlc.xgboost4j.scala.rabit. * More code refactoring and comments. * Unified timeout constants. Readable tracker status code. * Add comments to indicate that allReduce is for tests only. Removed all other variants. * Removed unused imports. * Simplified signatures of training methods. - Moved TrackerConf into parameter map. - Changed GeneralParams so that TrackerConf becomes a standalone parameter. - Updated test cases accordingly. * Changed monitoring strategies. * Reverted monitoring changes. * Update test case for Rabit AllReduce. * Mix in UncaughtExceptionHandler into IRabitTracker to prevent tracker from hanging due to exceptions thrown by workers. * More comprehensive test cases for exception handling and worker connection timeout. * Handle executor loss due to unknown cause: the newly spawned executor will attempt to connect to the tracker. Interrupt tracker in such case. * Per code-review, removed training timeout from TrackerConf. Timeout logic must be implemented explicitly and externally in the driver code. * Reverted scalastyle-config changes. * Visibility scope change. Interface tweaks. * Use match pattern to handle tracker_conf parameter. * Minor clarification in JNI code. * Clearer intent in match pattern to suppress warnings. * Removed Future from constructor. Block in start() and waitFor() instead. * Revert inadvertent comment changes. * Removed debugging information. * Updated test cases that are a bit finicky. * Added comments on the reasoning behind the unit tests for testing Rabit tracker robustness. * Fixed BufferUnderFlow bug in decoding tracker 'print' command. * Merge conflicts resolution.	2017-02-04 10:20:39 -08:00
Xin Yin	e7fbc8591f	[jvm-packages] Scala implementation of the Rabit tracker. (#1612) * [jvm-packages] Scala implementation of the Rabit tracker. A Scala implementation of RabitTracker that is interface-interchangeable with the Java implementation, ported from `tracker.py` in the [dmlc-core project](https://github.com/dmlc/dmlc-core). * [jvm-packages] Updated Akka dependency in pom.xml. * Refactored the RabitTracker directory structure. * Fixed premature stopping of connection handler. Added a new finite state "AwaitingPortNumber" to explicitly wait for the worker to send the port, and close the connection. Stopping the actor prematurely sends a TCP RST to the worker, causing the worker to crash on AssertionError. * Added interface IRabitTracker so that user can switch implementations. * Default timeout duration changes. * Dependency for Akka tests. * Removed the main function of RabitTracker. * A skeleton for testing Akka-based Rabit tracker. * waitFor() in RabitTracker no longer throws exceptions. * Completed unit test for the 'start' command of Rabit tracker. * Preliminary support for Rabit Allreduce via JNI (no prepare function support yet.) * Fixed the default timeout duration. * Use Java container to avoid serialization issues due to intermediate wrappers. * Added tests for Allreduce/model training using Scala Rabit tracker. * Added spill-over unit test for the Scala Rabit tracker. * Fixed a typo. * Overhaul of RabitTracker interface per code review. - Removed methods start() waitFor() (no arguments) from IRabitTracker. - The timeout in start(timeout) is now worker connection timeout, as tcp socket binding timeout is less intuitive. - Dropped time unit from start(...) and waitFor(...) methods; the default time unit is millisecond. - Moved random port number generation into the RabitTrackerHandler. - Moved all Rabit-related classes to package ml.dmlc.xgboost4j.scala.rabit. * More code refactoring and comments. * Unified timeout constants. Readable tracker status code. * Add comments to indicate that allReduce is for tests only. Removed all other variants. * Removed unused imports. * Simplified signatures of training methods. - Moved TrackerConf into parameter map. - Changed GeneralParams so that TrackerConf becomes a standalone parameter. - Updated test cases accordingly. * Changed monitoring strategies. * Reverted monitoring changes. * Update test case for Rabit AllReduce. * Mix in UncaughtExceptionHandler into IRabitTracker to prevent tracker from hanging due to exceptions thrown by workers. * More comprehensive test cases for exception handling and worker connection timeout. * Handle executor loss due to unknown cause: the newly spawned executor will attempt to connect to the tracker. Interrupt tracker in such case. * Per code-review, removed training timeout from TrackerConf. Timeout logic must be implemented explicitly and externally in the driver code. * Reverted scalastyle-config changes. * Visibility scope change. Interface tweaks. * Use match pattern to handle tracker_conf parameter. * Minor clarification in JNI code. * Clearer intent in match pattern to suppress warnings. * Removed Future from constructor. Block in start() and waitFor() instead. * Revert inadvertent comment changes. * Removed debugging information. * Updated test cases that are a bit finicky. * Added comments on the reasoning behind the unit tests for testing Rabit tracker robustness.	2016-12-07 06:35:42 -08:00
Nan Zhu	016ab89484	[jvm-packages] Parameter tuning tool for XGBoost (#1664)	2016-10-23 16:58:18 -04:00
Adam Pocock	445029bb82	[jvm-packages] XGBoost4j Windows fixes (#1639) * Changes for Mingw64 compilation to ensure long is a consistent size. Mainly impacts the Java API which would not compile, but there may be silent errors on Windows with large datasets before this patch (as long is 32-bits when compiled with mingw64 even in 64-bit mode). * Adding ifdefs to ensure it still compiles on MacOS * Makefile and create_jni.bat changes for Windows. * Switching XGDMatrixCreateFromCSREx JNI call to use size_t cast * Fixing lint error, adding profile switching to jvm-packages build to make create-jni.bat get called, adding myself to Contributors.Md	2016-10-18 08:35:25 -04:00
Nan Zhu	1673bcbe7e	[jvm-packages] separate classification and regression model and integrate with ML package (#1608)	2016-09-30 11:49:03 -04:00
Nan Zhu	37bc122c90	[jvm-packages] Robust dmatrix creation (#1613) * add back train method but mark as deprecated * robust matrix creation in jvm	2016-09-26 13:35:04 -04:00
Nan Zhu	6dabdd33e3	[jvm-packages] bump to next version (#1535) * bump to next version * fix * fix	2016-09-01 12:18:21 -04:00
Nan Zhu	70432cac5b	make IEvaluation serializable (#1487)	2016-08-19 13:12:39 -04:00
Nan Zhu	bd5b07873e	[jvm-packages] create dmatrix with specified missing value (#1272) * create dmatrix with specified missing value * update dmlc-core * support for predict method in spark package repartitioning work around * add more elements to work around training set empty partition issue	2016-06-21 17:35:17 -04:00
CodingCat	a31a978471	run native lib building command from maven	2016-03-16 16:47:08 -04:00
CodingCat	f2ef958ebb	support kryo serialization	2016-03-13 11:55:14 -04:00
CodingCat	400b1faecc	adjust the API signature as well as the docs	2016-03-11 15:22:44 -05:00
CodingCat	ab68a0ccc7	fix examples	2016-03-11 13:57:03 -05:00
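
The train/test split bug described in #2836 comes from routing rows with a predicate that is not deterministic (`r.nextDouble() <= trainTestRatio`), while, as the commit message notes, `Iterator.partition` assumes each row gets one consistent answer. The following is a simplified Java sketch of that pitfall and of a deterministic alternative; it is not the xgboost4j-spark implementation, and the class and method names (`TrainTestSplit`, `twoPassSplit`, `singlePassSplit`) are hypothetical. `twoPassSplit` asks the predicate separately for each side, so a random predicate can put a row on both sides or on neither. `singlePassSplit` draws one number per row from a seeded `java.util.Random` and routes the row exactly once, which also makes the split reproducible from the seed, loosely in the spirit of reusing the XGBoost seed parameter mentioned in #2710.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

public class TrainTestSplit {

    /** Holder for the two sides of a split (hypothetical helper type). */
    public static final class Split<T> {
        public final List<T> train = new ArrayList<>();
        public final List<T> test = new ArrayList<>();
    }

    /**
     * Fragile pattern: the predicate is consulted once when filling the train
     * side and again when filling the test side. This is only correct if the
     * predicate returns the same answer every time for the same row.
     */
    public static <T> Split<T> twoPassSplit(List<T> rows, Predicate<T> isTrain) {
        Split<T> s = new Split<>();
        for (T r : rows) {
            if (isTrain.test(r)) s.train.add(r);
        }
        for (T r : rows) {
            if (!isTrain.test(r)) s.test.add(r);
        }
        return s;
    }

    /**
     * Deterministic alternative: one draw per row from a seeded generator, and
     * each row is routed exactly once, so the sides are disjoint and together
     * contain every row. The same seed always reproduces the same split.
     */
    public static <T> Split<T> singlePassSplit(List<T> rows, double trainTestRatio, long seed) {
        Random rng = new Random(seed);
        Split<T> s = new Split<>();
        for (T r : rows) {
            if (rng.nextDouble() <= trainTestRatio) {
                s.train.add(r);
            } else {
                s.test.add(r);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) rows.add(i);

        // A random predicate, analogous to r.nextDouble() <= trainTestRatio.
        Random rng = new Random(42L);
        Split<Integer> fragile = twoPassSplit(rows, x -> rng.nextDouble() <= 0.8);
        int inBoth = 0;
        for (Integer r : fragile.train) {
            if (fragile.test.contains(r)) inBoth++;
        }
        System.out.println("two-pass: rows in both sides = " + inBoth
                + ", total routed = " + (fragile.train.size() + fragile.test.size()));

        Split<Integer> stable = singlePassSplit(rows, 0.8, 42L);
        System.out.println("single-pass: total routed = "
                + (stable.train.size() + stable.test.size()));
    }
}
```

Run as written, the single-pass line always reports 1000 routed rows, while the two-pass numbers depend on the random draws and will typically show rows counted on both sides. Scala's `Iterator.partition` is lazier than this two-loop analogue; according to the commit message, there the inconsistency surfaced as a `NoSuchElementException` when building the `DMatrix`.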

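#2081 and #1993 both concern decoding the tracker's 'print' command when its bytes arrive in fragments (the latter names a `BufferUnderFlow`, i.e. `java.nio.BufferUnderflowException`). One common way to avoid this class of bug is to buffer incoming bytes and decode a message only once its whole length-prefixed frame has arrived. The Java sketch below illustrates that technique; it is hypothetical, not code from the Scala Rabit tracker, and it assumes a frame layout of a 4-byte little-endian length followed by UTF-8 bytes. Check the actual tracker protocol before reusing any of these details.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FrameDecoder {

    // Bytes received so far that do not yet form a complete frame.
    private ByteBuffer pending = ByteBuffer.allocate(0);

    /**
     * Feeds one received fragment and returns every message it completes
     * (possibly none). Incomplete trailing bytes are kept for the next call,
     * so a frame split across reads is never decoded early.
     */
    public List<String> feed(byte[] fragment) {
        // Assumed wire format: little-endian int length, then that many UTF-8 bytes.
        ByteBuffer merged = ByteBuffer.allocate(pending.remaining() + fragment.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        merged.put(pending);
        merged.put(fragment);
        merged.flip();

        List<String> messages = new ArrayList<>();
        while (merged.remaining() >= Integer.BYTES) {
            merged.mark();
            int len = merged.getInt();
            if (len < 0) {
                throw new IllegalStateException("negative frame length: " + len);
            }
            if (merged.remaining() < len) {
                // Payload not fully received yet: rewind to the length field and wait.
                merged.reset();
                break;
            }
            byte[] payload = new byte[len];
            merged.get(payload);
            messages.add(new String(payload, StandardCharsets.UTF_8));
        }
        // Keep only the unconsumed bytes for the next call.
        pending = merged.slice();
        return messages;
    }

    /** Builds one frame in the assumed format (hypothetical helper for testing). */
    public static byte[] encode(String message) {
        byte[] body = message.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Integer.BYTES + body.length)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(body.length)
                .put(body)
                .array();
    }

    public static void main(String[] args) {
        byte[] frame = encode("hello from worker 3");
        FrameDecoder decoder = new FrameDecoder();
        // Deliver the frame one byte at a time, the worst case of TCP fragmentation.
        for (byte b : frame) {
            for (String msg : decoder.feed(new byte[] {b})) {
                System.out.println("decoded: " + msg);
            }
        }
    }
}
```

Fed one byte at a time, the decoder returns nothing until the final byte arrives and then prints `decoded: hello from worker 3` once. Rewinding with `mark()`/`reset()` keeps the length field in the buffer while the payload is still incomplete, so the next call re-reads it instead of misinterpreting payload bytes as a length.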

181 Commits