* Added kwargs support for Sklearn API
* Updated NEWS and CONTRIBUTORS
* Fixed CONTRIBUTORS.md
* Added clarification of **kwargs and test for proper usage
* Fixed lint error
* Fixed more lint errors and a 'clf assigned but never used' warning
* Fixed more lint errors
* Fixed more lint errors
* Fixed issue with changes from different branch bleeding over
* Fixed issue with changes from other branch bleeding over
* Added note that kwargs may not be compatible with Sklearn
* Fixed linting on kwargs note
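A minimal sketch of the kwargs pass-through described in the entries above, assuming tree_method is a Booster parameter not otherwise exposed by the scikit-learn wrapper in this version (and, per the note above, such kwargs are not guaranteed to be fully compatible with scikit-learn tooling):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(100, 5)
y = np.random.randint(2, size=100)

# Booster parameters that are not explicit constructor arguments can now be
# forwarded through **kwargs; 'tree_method' is used here only as an example
# of such a parameter.
clf = xgb.XGBClassifier(max_depth=3, n_estimators=10, tree_method='approx')
clf.fit(X, y)
```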
* Added n_jobs and random_state to keep up to date with the sklearn API.
Deprecated nthread and seed. Added tests for new params and
deprecations.
* Fixed docstring to reflect updates to n_jobs and random_state.
* Fixed whitespace issues and removed nose import.
* Added deprecation note for nthread and seed in docstring.
* Attempted fix of deprecation tests.
* Second attempted fix to tests.
* Set n_jobs to 1.
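A usage sketch of the renaming described above, assuming the old names remain accepted for backward compatibility and trigger a DeprecationWarning:

```python
import xgboost as xgb

# New scikit-learn style parameter names.
model = xgb.XGBRegressor(n_jobs=1, random_state=42)

# The old names are deprecated but still accepted for now;
# they are expected to emit a DeprecationWarning.
legacy = xgb.XGBRegressor(nthread=1, seed=42)
```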
* [gblinear] add features contribution prediction; fix DumpModel bug
* [gbtree] minor changes to PredContrib
* [R] add feature contribution prediction to R
* [R] bump up version; update NEWS
* [gblinear] fix the base_margin issue; fixes #1969
* [R] list of matrices as output of multiclass feature contributions
* [gblinear] make order of DumpModel coefficients consistent: group index changes the fastest
* Fix compilation on OS X with GCC 7
Compilation failed with:
In file included from src/tree/tree_updater.cc:6:0:
include/xgboost/tree_updater.h:75:46: error: 'function' is not a member of 'std'
std::function<TreeUpdater* ()> > {
This was caused by a missing <functional> include.
* Fixed another occurrence of that issue spotted by @ClimberPG
* Add option to choose booster in the scikit interface (gbtree by default)
* Add option to choose booster in the scikit interface: complete docstring.
* Fix XGBClassifier to work with booster option
* Added test case for gblinear booster
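A brief sketch of selecting the booster through the scikit-learn wrapper, assuming it is exposed as a constructor argument as described above:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(200, 4)
y = np.random.randint(2, size=200)

# 'gbtree' remains the default; a linear booster can be requested explicitly.
clf = xgb.XGBClassifier(booster='gblinear', n_estimators=20)
clf.fit(X, y)
print(clf.predict(X[:5]))
```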
* [R] add native routines registration
* c_api.h needs to include <cstdint> since it uses fixed-width integer types
* [R] use registered native routines from R code
* [R] bump version; add info on native routine registration to the contributors guide
* make lint happy
* Add prediction of feature contributions
This implements the idea described at http://blog.datadive.net/interpreting-random-forests/,
which gives insight into how a prediction is composed of its feature contributions
and a bias.
* Support multi-class models
* Calculate learning_rate per-tree instead of using the one from the first tree
* Do not rely on node.base_weight * learning_rate having the same value as the node mean value (a.k.a. the leaf value, if it were a leaf); instead, calculate the node values lazily on the fly
* Add simple test for contributions feature
* Check against param.num_nodes instead of checking for non-zero length
* Loop over all roots instead of only the first
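A sketch of consuming the new contribution output from Python, assuming predict() exposes it via a pred_contribs flag; each output row is expected to hold one value per feature plus a trailing bias column, and the row sum should match the raw (margin) prediction:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(300, 5)
y = np.random.rand(300)
dtrain = xgb.DMatrix(X, label=y)

bst = xgb.train({'max_depth': 3, 'eta': 0.1}, dtrain, num_boost_round=10)

contribs = bst.predict(dtrain, pred_contribs=True)  # shape: (n_rows, n_features + 1)
margin = bst.predict(dtrain, output_margin=True)

# Per-feature contributions plus the bias column should add up to the margin prediction.
assert np.allclose(contribs.sum(axis=1), margin, atol=1e-5)
```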
* add back train method but mark as deprecated
* fix scalastyle error
* fix the persistence of XGBoostEstimator
* test persistence of a complete pipeline
* fix compilation issue
* do not allow persisting custom_eval and custom_obj
* fix the failing test
* [R] make sure things work for a single-split model; fixes #2191
* [R] add option use_int_id to xgb.model.dt.tree
* [R] add example of exporting tree plot to a file
* [R] set save_period = NULL as default in xgboost() to be the same as in xgb.train; fixes #2182
* [R] it's good practice after CRAN releases to bump up the package version in dev
* [R] allow xgb.DMatrix construction from integer dense matrices
* [R] xgb.DMatrix: silent parameter; improve documentation
* [R] xgb.model.dt.tree code style changes
* [R] update NEWS with parameter changes
* [R] code safety & style; handle non-strict matrix and inherited classes of input and model; fixes#2242
* [R] change to x.y.z.p R-package versioning scheme and set version to 0.6.4.3
* [R] add an R package versioning section to the contributors guide
* [R] R-package/README.md: clean up the redundant old installation instructions, link the contributors guide
* Reported in issue #2165. Dynamic scheduling of OpenMP loops involves
implicit synchronization. To implement synchronization, libgomp uses futexes
(fast userspace mutexes), whereas MinGW uses kernel-space mutexes, which are
more costly. With a chunk size of 1, the synchronization overhead may become
prohibitive on Windows machines.
Solution: use the 'guided' schedule to minimize the number of synchronizations.
* Storing and then loading a model loses any eval_metric that was
provided. This causes implementations that always store/load, like
xgboost4j-spark, to be unable to evaluate with the desired metric.
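The report above concerns xgboost4j-spark, but the underlying save/load behavior is shared; a hedged Python illustration of the problem and one possible workaround (re-setting the metric after loading) follows:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(200, 4)
y = np.random.randint(2, size=200)
dtrain = xgb.DMatrix(X[:150], label=y[:150])
dvalid = xgb.DMatrix(X[150:], label=y[150:])

bst = xgb.train({'objective': 'binary:logistic', 'eval_metric': 'auc'},
                dtrain, num_boost_round=10)
bst.save_model('model.bin')

loaded = xgb.Booster()
loaded.load_model('model.bin')

# eval_metric is not stored with the model, so it has to be set again after
# loading; otherwise evaluation falls back to the objective's default metric.
loaded.set_param({'eval_metric': 'auc'})
print(loaded.eval(dvalid, name='valid'))
```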
* This log appears to fire every time I ask the Python package to make a prediction. It's the only log that fires from XGBoost. When we're getting predictions on millions of items a day in production, this log seems out of place.
* add back train method but mark as deprecated
* fix scalastyle error
* change class to object in examples
* fix compilation error
* small fix for cleanExternalCache
* fix several issues in tests
* Bugfix 1: Fix segfault in multithreaded ApplySplitSparseData()
When there are more threads than rows in the row set, some threads end up
with empty ranges, causing them to crash (iend - 1 needs to be
accessible as part of the algorithm).
Fix: run only those threads with nonempty ranges.
* Add regression test for Bugfix 1
* Moving python_omp_test to existing python test group
It turns out you don't need to set "OMP_NUM_THREADS" to enable
multithreading; just add the nthread parameter.
* Bugfix 2: Fix corner case of ApplySplitSparseData() for categorical feature
When the split value is less than all cut points, split_cond is set
incorrectly.
Fix: set split_cond = -1 to indicate this scenario
* Bugfix 3: Initialize data layout indicator before using it
data_layout_ is accessed before being set; this variable determines
whether feature 0 is included in feat_set.
Fix: re-order code in InitData() to initialize data_layout_ first
* Adding regression test for Bugfix 2
Unfortunately, there is no regression test for Bugfix 3, as there is no
way to deterministically assign a value to an uninitialized variable.
* Add UpdatePredictionCache() option to updaters
Some updaters (e.g. fast_hist) have enough information to quickly compute the
prediction cache for the training data. Each updater may override the
UpdatePredictionCache() method to update the prediction cache. Note: this
trick does not apply to validation data.
* Respond to code review
* Disable some debug messages by default
* Document UpdatePredictionCache() interface
* Remove base_margin logic from UpdatePredictionCache() implementation
* Do not take a pointer to cfg, as the reference may become stale
* Improve multi-threaded performance
* Use columnwise accessor to accelerate ApplySplit() step,
with support for a compressed representation
* Parallel sort for evaluation step
* Inline BuildHist() function
* Cache gradient pairs when building histograms in BuildHist()
* Add missing #if macro
* Respond to code review
* Use wrapper to enable parallel sort on Linux
* Fix C++ compatibility issues
* MSVC doesn't support unsigned in OpenMP loops
* gcc 4.6 doesn't support the 'using' keyword
* Fix lint issues
* Respond to code review
* Fix bug in ApplySplitSparseData()
* Attempting to read beyond the end of a sparse column
* Mishandling the case where an entire range of rows has missing values
* Fix training continuation bug
Disable UpdatePredictionCache() in the first iteration. This way, we can
accommodate the scenario where we build off an existing (nonempty) ensemble.
* Add regression test for fast_hist
* Respond to code review
* Add back old version of ApplySplitSparseData
* Updated sklearn_parallel.py for soon-to-be-deprecated modules
* Updated predict_leaf_indices.py; use python3 print() as in other examples and removed an unused module
* This commit proposes a simpler, single compiler specification for OS X and *nix. It also lets people override the setting on both systems, not just *nix.