xgboost

Author	SHA1	Message	Date
Yang Zhang	cc012dac68	Updated sklearn_parallel.py for soon-to-be-deprecated modules (#2134)	2017-03-22 16:18:15 -05:00
Yang Zhang	f6f5003f79	Updated sklearn_examples.py for soon-to-be-deprecated modules (#2117)	2017-03-21 20:07:27 -07:00
Zhiquan	e65564ba59	Update rank_obj.cc (#2126) typo: PairwieRankObj -> PairwiseRankObj	2017-03-21 20:06:16 -07:00
Matthew R. Becker	95b7dbb1ea	ENH add gcc/g++ before clang for macs (#2125) * ENH add gcc/g++ before clang for macs - will default to clang anyways and supports separate gcc installs * BUG missed a ) - :(	2017-03-21 20:05:09 -07:00
Tianqi Chen	dc2fb978e1	new thread local requires xcode8	2017-03-17 09:40:34 -07:00
Icyblade Dai	301540f1d9	fix DeprecationWarning on sklearn.cross_validation (#2075) * fix DeprecationWarning on sklearn.cross_validation * fix syntax * fix kfold n_split issue * fix mistype * fix n_splits multiple value issue * split should pass an iterable * use np.arange instead of xrange, py3 compatibility	2017-03-17 08:38:22 -05:00
Tianqi Chen	d581a3d0e7	[UPDATE] Update rabit and threadlocal (#2114) * [UPDATE] Update rabit and threadlocal * minor fix to make build system happy * upgrade requirement to g++4.8 * upgrade dmlc-core * update travis	2017-03-16 18:48:37 -07:00
Luckick	b0c972aa4d	Typo Issue (#2100) Contruct to Construct	2017-03-16 10:38:25 -07:00
Oleg Sofrygin	9d19e13ed0	adding a copy of base_margin to slice, fixes a bug where base_margin was not copied during cross-validation (#2007)	2017-03-16 10:36:57 -07:00
Liam Huang	3a2b8332a6	bugfix: when metric's name contains `-` (#2090) When metric's name contains `-`, Python will complain about insufficient arguments to unpack.	2017-03-16 10:36:39 -07:00
ZhouYong	fee1181803	fix online prediction function in learner.h (#2010) When I use the online prediction function (`inline void Predict(const SparseBatch::Inst &inst, ... ) const;`), the results differ from those of the batch prediction function (`virtual void Predict(DMatrix* data, ...) const = 0`). Investigation showed that the online prediction function uses the `base_score_` parameter, while the batch prediction function does not. The `base_score_` value also differs each time the same model file is loaded: ``` 1st time: base_score_: 6.69023e-21 2nd time: base_score_: -3.7668e+19 3rd time: base_score_: 5.40507e+07 ``` Online prediction results are therefore affected by `base_score_`. After deleting the if condition (`if (out_preds->size() == 1)`), the online prediction is consistent with the batch prediction results, and the xgboost prediction results are consistent with the python version. Therefore, the online prediction function likely has a bug.	2017-03-16 10:35:52 -07:00
Matthew R. Becker	4a63f4ab43	BUG make sure to specify no openmp for some mac osx builds properly (#2095)	2017-03-10 18:36:15 -08:00
Shaform	15456c7882	Remove deprecated prefix `bst:` (#2091)	2017-03-09 09:06:37 -08:00
Holger Peters	95510b9667	Inform setuptools that this is a binary package (#1996) * Inform setuptools that this is a binary package that needs platform-tags in wheel names. This fixes issue #1995. * PEP8 Formatting * Add docstring	2017-03-07 09:26:04 -06:00
cloverrose	288f309434	[jvm-packages] call setGroup for ranking task (#2066) * [jvm-packages] call setGroup for ranking task * passing groupData through xgBoostConfMap * fix original comment position * make groupData param * remove groupData variable, use xgBoostConfMap directly * set default groupData value * add use groupData tests * reduce rank-demo size * use TaskContext.getPartitionId() instead of mapPartitionsWithIndex * add DF use groupData test * remove unused variable	2017-03-06 15:45:06 -08:00
geoHeil	cf6b173bd7	[jvm-packages] Spark pipeline persistence (#1906)	2017-03-05 18:35:37 -08:00
Xin Yin	5b54b9437c	Fixed Exception handling for fragmented Rabit 'print' tracker command. Fixed unit test. (#2081)	2017-03-05 13:40:59 -08:00
Nan Zhu	ab13fd72bd	[jvm-packages] Scala/Java interface for Fast Histogram Algorithm (#1966) * add back train method but mark as deprecated * fix scalastyle error * first commit in scala binding for fast histo * java test * add missed scala tests * spark training * add back train method but mark as deprecated * fix scalastyle error * local change * first commit in scala binding for fast histo * local change * fix df frame test	2017-03-04 15:37:24 -08:00
Nan Zhu	ac30a0aff5	[jvm-packages][spark]Preserve num classes (#2068) * add back train method but mark as deprecated * fix scalastyle error * change class to object in examples * fix compilation error * bump spark version to 2.1 * preserve num_class issues * fix failed test cases * revising * add multi class test	2017-03-04 14:14:31 -08:00
hlsc	a92093388d	[jvm-packages] fix bug doing rabit call after finalize (#2079)	2017-03-02 16:46:57 -08:00
Tianqi Chen	fd19b7a188	Automatically remove nan from input data when it is sparse. (#2062) * [DATALoad] Automatically remove Nan when load from sparse matrix * add log	2017-02-25 08:59:17 -08:00
moqiguzhu	5d093a7f4c	in `caret` settings, if you want to do 10*10 cross validation, you need to set repeats=10, number=10 and method=repeatedcv, (#2061) if you set method=cv, actually just one 10-fold cross validation will be run; fixes #2055	2017-02-25 09:16:19 -05:00
Eric Liu	7927031ffe	print_evaluation callback output on last iteration (#2036) verbose_eval docs claim it will log the last iteration (http://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.train). this is also consistent w/the behavior from 0.4. not a huge deal but I found it handy to see the last iter's result b/c my period is usually large. this doesn't address logging the last stage found by early_stopping (as noted in docs) as I'm not sure how to do that.	2017-02-24 23:06:48 -05:00
Vadim Khotilovich	b4d97d3cb8	R maintenance Feb2017 (#2045) * [R] better argument check in xgb.DMatrix; fixes #1480 * [R] showsd was a dummy; fixes #2044 * [R] better categorical encoding explanation in vignette; fixes #1989 * [R] new roxygen version docs update	2017-02-20 10:02:40 -08:00
Nan Zhu	63aec12a13	[jvm-packages] Bump spark to 2.1 (#2046)	2017-02-19 08:29:35 -08:00
Nan Zhu	185fe1d645	[jvm-packages] use ML's para system to build the passed-in params to XGBoost (#2043) * add back train method but mark as deprecated * fix scalastyle error * use ML's para system to build the passed-in params to XGBoost * clean	2017-02-18 11:56:27 -08:00
DougM	acce11d3f4	fix MLlib CrossValidator issues (wrong default value configuration) #1941 (#2042)	2017-02-18 08:10:47 -08:00
Theodore Vasiloudis	9fb46e2c5e	[trivial] Fix typo in Poisson metric name. (#2026)	2017-02-09 09:32:06 -08:00
ANtlord	f054d812dc	Fix typo in Python Package Introduction (#2023) Fixed #2016	2017-02-08 23:35:13 -05:00
Xin Yin	4fb7fdb240	[jvm-packages] Fixed java.nio.BufferUnderFlow issue in Scala Rabit tracker. (#1993) * [jvm-packages] Scala implementation of the Rabit tracker. A Scala implementation of RabitTracker that is interface-interchangeable with the Java implementation, ported from `tracker.py` in the [dmlc-core project](https://github.com/dmlc/dmlc-core). * [jvm-packages] Updated Akka dependency in pom.xml. * Refactored the RabitTracker directory structure. * Fixed premature stopping of connection handler. Added a new finite state "AwaitingPortNumber" to explicitly wait for the worker to send the port, and close the connection. Stopping the actor prematurely sends a TCP RST to the worker, causing the worker to crash on AssertionError. * Added interface IRabitTracker so that user can switch implementations. * Default timeout duration changes. * Dependency for Akka tests. * Removed the main function of RabitTracker. * A skeleton for testing Akka-based Rabit tracker. * waitFor() in RabitTracker no longer throws exceptions. * Completed unit test for the 'start' command of Rabit tracker. * Preliminary support for Rabit Allreduce via JNI (no prepare function support yet.) * Fixed the default timeout duration. * Use Java container to avoid serialization issues due to intermediate wrappers. * Added tests for Allreduce/model training using Scala Rabit tracker. * Added spill-over unit test for the Scala Rabit tracker. * Fixed a typo. * Overhaul of RabitTracker interface per code review. - Removed methods start() waitFor() (no arguments) from IRabitTracker. - The timeout in start(timeout) is now worker connection timeout, as tcp socket binding timeout is less intuitive. - Dropped time unit from start(...) and waitFor(...) methods; the default time unit is millisecond. - Moved random port number generation into the RabitTrackerHandler. - Moved all Rabit-related classes to package ml.dmlc.xgboost4j.scala.rabit. * More code refactoring and comments. * Unified timeout constants. 
Readable tracker status code. * Add comments to indicate that allReduce is for tests only. Removed all other variants. * Removed unused imports. * Simplified signatures of training methods. - Moved TrackerConf into parameter map. - Changed GeneralParams so that TrackerConf becomes a standalone parameter. - Updated test cases accordingly. * Changed monitoring strategies. * Reverted monitoring changes. * Update test case for Rabit AllReduce. * Mix in UncaughtExceptionHandler into IRabitTracker to prevent tracker from hanging due to exceptions thrown by workers. * More comprehensive test cases for exception handling and worker connection timeout. * Handle executor loss due to unknown cause: the newly spawned executor will attempt to connect to the tracker. Interrupt tracker in such case. * Per code-review, removed training timeout from TrackerConf. Timeout logic must be implemented explicitly and externally in the driver code. * Reverted scalastyle-config changes. * Visibility scope change. Interface tweaks. * Use match pattern to handle tracker_conf parameter. * Minor clarification in JNI code. * Clearer intent in match pattern to suppress warnings. * Removed Future from constructor. Block in start() and waitFor() instead. * Revert inadvertent comment changes. * Removed debugging information. * Updated test cases that are a bit finicky. * Added comments on the reasoning behind the unit tests for testing Rabit tracker robustness. * Fixed BufferUnderFlow bug in decoding tracker 'print' command. * Merge conflicts resolution.	2017-02-04 10:20:39 -08:00
geoHeil	2250b9b6d2	[jvm-packages] try setting default profile (#1891) * try setting default profile * add spark pipeline persistence * access spark session * copy paste sparks default parameter reader * remove unnecessary parameters, only change xml * remove unnecessary changes 2	2017-01-31 08:33:51 -08:00
yexu15	179b384e39	A fix regarding the compatibility with python 2.6 (#1981) * A fix regarding the compatibility with python 2.6 the syntax of {n: self.attr(n) for n in attr_names} is illegal in python 2.6 * Update core.py add a space after comma	2017-01-29 20:18:28 -08:00
Philip Cho	5d74578095	Disallow multiple roots for tree_method=hist (#1979) As discussed in issue #1978, tree_method=hist ignores the parameter param.num_roots; it simply assumes that the tree has only one root. In particular, when InitData() method initializes row_set_collection_, it simply assigns all rows to node 0, the value that's hard-coded. For now, the updater will simply fail when num_roots exceeds 1. I will revise the updater soon to support multiple roots.	2017-01-21 12:02:29 -08:00
Srivatsan Ramanujam	036ee55fe0	adding sample weights for XGBRegressor (was this forgotten?) (#1874)	2017-01-21 11:58:03 -08:00
Vadim Khotilovich	2b5b96d760	[R] various R code maintenance (#1964) * [R] xgb.save must work when handle is nil but raw exists * [R] print.xgb.Booster should still print other info when handle is nil * [R] rename internal function xgb.Booster to xgb.Booster.handle to make its intent clear * [R] rename xgb.Booster.check to xgb.Booster.complete and make it visible; more docs * [R] storing evaluation_log should depend only on watchlist, not on verbose * [R] reduce the excessive chattiness of unit tests * [R] only disable some tests in windows when it's not 64-bit * [R] clean-up xgb.DMatrix * [R] test xgb.DMatrix loading from libsvm text file * [R] store feature_names in xgb.Booster, use them from utility functions * [R] remove non-functional co-occurrence computation from xgb.importance * [R] verbose=0 is enough without a callback * [R] added forgotten xgb.Booster.complete.Rd; cran check fixes * [R] update installation instructions	2017-01-21 11:22:46 -08:00
wxchan	a073a2c3d4	fix ylim with max_num_features in python plot_importance (#1974)	2017-01-18 11:59:50 -08:00
Félix MIKAELIAN	a7d2833766	added the max_features parameter to the plot_importance function. (#1963) * added the max_features parameter to the plot_importance function. * renamed max_features parameter to max_num_features for better understanding * removed unwanted character in docstring	2017-01-16 14:49:47 -08:00
Philip Cho	49ff7c1649	Rename parameter in fast_hist to disambiguate (#1962)	2017-01-13 11:35:55 -08:00
Philip Cho	aeb4e76118	Histogram Optimized Tree Grower (#1940) * Support histogram-based algorithm + multiple tree growing strategy * Add a brand new updater to support histogram-based algorithm, which buckets continuous features into discrete bins to speed up training. To use it, set `tree_method = fast_hist` in the configuration. * Support multiple tree growing strategies. For now, two policies are supported: * `grow_policy=depthwise` (default): favor splitting at nodes closest to the root, i.e. grow depth-wise. * `grow_policy=lossguide`: favor splitting at nodes with highest loss change * Improve single-threaded performance * Unroll critical loops * Introduce specialized code for dense data (i.e. no missing values) * Additional training parameters: `max_leaves`, `max_bin`, `grow_policy`, `verbose` * Adding a small test for hist method * Fix memory error in row_set.h When std::vector is resized, a reference to one of its elements may become stale. Any such reference must be updated as well. * Resolve cross-platform compilation issues * Versions of g++ older than 4.8 lack support for a few C++11 features, e.g. alignas() and new initializer syntax. To support g++ 4.6, use pre-C++11 initializer and remove alignas(). * Versions of MSVC older than 2015 do not support alignas(). To support MSVC 2012, remove alignas(). * For g++ 4.8 and newer, alignas() is enabled for performance benefits. Some old compilers (MSVC 2012, g++ 4.6) do not support template aliases (which use `using` to declare type aliases). So always use `typedef`. 
* Fix a host of CI issues * Remove dependency for libz on osx * Fix heading for hist_util * Fix minor style issues * Add missing #include * Remove extraneous logging * Enable tree_method=hist in R * Renaming HistMaker to GHistBuilder to avoid confusion * Fix R integration * Respond to style comments * Consistent tie-breaking for priority queue using timestamps * Last-minute style fixes * Fix issuecomment-271977647 The way we quantize data is broken. The agaricus data consists of all categorical values. When NAs are converted into 0's, `HistCutMatrix::Init` assign both 0's and 1's to the same single bin. Why? gmat only the smallest value (0) and an upper bound (2), which is twice the maximum value (1). Add the maximum value itself to gmat to fix the issue. * Fix issuecomment-272266358 * Remove padding from cut values for the continuous case * For categorical/ordinal values, use midpoints as bin boundaries to be safe * Fix CI issue -- do not use xrange() Fix corner case in quantile sketch Signed-off-by: Philip Cho <chohyu01@cs.washington.edu> * Adding a test for an edge case in quantile sketcher max_bin=2 used to cause an exception. * Fix fast_hist test The test used to require a strictly increasing Test AUC for all examples. One of them exhibits a small blip in Test AUC before achieving a Test AUC of 1. (See bottom.) Solution: do not require monotonic increase for this particular example. [0] train-auc:0.99989 test-auc:0.999497 [1] train-auc:1 test-auc:0.999749 [2] train-auc:1 test-auc:0.999749 [3] train-auc:1 test-auc:0.999749 [4] train-auc:1 test-auc:0.999749 [5] train-auc:1 test-auc:0.999497 [6] train-auc:1 test-auc:1 [7] train-auc:1 test-auc:1 [8] train-auc:1 test-auc:1 [9] train-auc:1 test-auc:1	2017-01-13 09:25:55 -08:00
Luckick	ef8d92fc52	Validation Typo (#1949) change valudation to validation	2017-01-09 10:40:43 -08:00
Andrey Tereskin	cfb9b11aa4	Make lib path relative to fix setup error #1932 (#1947)	2017-01-09 10:40:24 -08:00
Vadim Khotilovich	87e897f428	[R] fix #1903 (#1929)	2017-01-06 13:16:37 -08:00
Vadim Khotilovich	d7406e07f3	[R] xgb.plot.tree fixes (#1939) * [R] a few fixes and improvements to xgb.plot.tree * [R] deprecate n_first_tree replace with trees; fix types in xgb.model.dt.tree	2017-01-06 11:09:51 -08:00
Vadim Khotilovich	d23ea5ca7d	An option for doing binomial+1 or epsilon-dropout from DART paper (#1922) * An option for doing binomial+1 or epsilon-dropout from DART paper * use callback-based discrete_distribution to make MSVC2013 happy	2017-01-05 16:23:22 -08:00
Tong He	ce84af7923	0.6-4 submission (#1935)	2017-01-04 23:31:05 -08:00
Muneyuki Noguchi	8b827425b2	Fix comment in cross_validation.py (#1923) cv() doesn't output std_value because show_stdv is set to False.	2017-01-02 09:40:41 -05:00
Kyle Willett	7e07b2b93d	Correcting small typos in documentation. (#1901)	2016-12-31 20:47:51 +08:00
Tong He	f5c85836bf	[R] Increase the version number, date and required R version (#1920) * remove unnecessary line	2016-12-30 21:29:26 -08:00
Qiang Kou (KK)	7948d1c799	disable openmp on solaris (#1912)	2016-12-28 11:32:56 -08:00
adamist521	119763bc49	cross_validation is included in model_selection module since sklearn 0.18 (#1908)	2016-12-26 04:11:56 -05:00
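
Note on commit aeb4e76118 (Histogram Optimized Tree Grower): its message describes a quantization bug in which a feature holding only 0's and 1's ended up in a single bin, and a fix that uses midpoints as bin boundaries for categorical/ordinal values. The following is a minimal Python sketch of that idea only, not the C++ code in `HistCutMatrix::Init`. The helpers `midpoint_cuts` and `bin_index` are hypothetical, and both the "first cut strictly greater than the value" lookup and the padding used for the last cut are assumptions made for illustration.

```python
from bisect import bisect_right


def midpoint_cuts(values):
    """Hypothetical helper: cut points for a feature with few distinct values.

    Assumes `values` is non-empty.
    """
    distinct = sorted(set(values))
    # One boundary halfway between each pair of neighbouring distinct values.
    cuts = [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]
    # Final upper bound strictly above the maximum, so every value lands in a bin.
    # The exact padding is an arbitrary choice for this sketch.
    top = distinct[-1]
    cuts.append(top + abs(top) + 1.0)
    return cuts


def bin_index(cuts, value):
    """Assumed lookup rule: index of the first cut strictly greater than `value`."""
    return bisect_right(cuts, value)


values = [0, 1, 1, 0, 1]

# Cuts as described in the bug report: only the smallest value and an upper bound.
broken_cuts = [0.0, 2.0]
broken_bins = [bin_index(broken_cuts, v) for v in values]

# Cuts built from midpoints keep 0 and 1 in separate bins.
good_cuts = midpoint_cuts(values)
good_bins = [bin_index(good_cuts, v) for v in values]

print(broken_bins)
print(good_cuts, good_bins)
```

Under these assumptions the cuts from the bug report send both 0 and 1 to the same bin, while the midpoint cuts separate them, which matches the behaviour the commit message describes.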
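
Note on commit d23ea5ca7d (binomial+1 option for DART): the log does not say how the option is implemented or what it is called. As a rough sketch, assuming "binomial+1" means that each existing tree is dropped independently with some probability and, if no tree was selected, one tree is picked uniformly at random, the selection step could look like the Python below. The function name `select_dropped_trees` and the parameters `rate_drop` and `one_drop` are illustrative assumptions here, not taken from the commit.

```python
import random


def select_dropped_trees(num_trees, rate_drop, one_drop=False, rng=random):
    """Hypothetical sketch of choosing which trees to drop in one DART round."""
    if num_trees == 0:
        return []
    # Plain binomial selection: each tree is dropped independently.
    dropped = [i for i in range(num_trees) if rng.random() < rate_drop]
    # "Plus one" (assumed meaning): never leave the dropped set empty.
    if one_drop and not dropped:
        dropped = [rng.randrange(num_trees)]
    return dropped


rng = random.Random(0)
# With a zero drop rate, plain binomial selection drops nothing...
plain = select_dropped_trees(10, rate_drop=0.0, one_drop=False, rng=rng)
# ...while the plus-one variant still drops exactly one tree.
plus_one = select_dropped_trees(10, rate_drop=0.0, one_drop=True, rng=rng)
print(plain, plus_one)
```

The fallback only triggers when the binomial draw selects nothing, so for larger drop rates the two variants behave the same most of the time.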


3050 Commits