xgboost

Author	SHA1	Message	Date
Vadim Khotilovich	c66ca79221	[R] native routines registration (#2290 ) * [R] add native routines registration * c_api.h needs to include <cstdint> since it uses fixed width integer types * [R] use registered native routines from R code * [R] bump version; add info on native routine registration to the contributors guide * make lint happy	2017-05-14 11:00:46 -07:00
Maurus Cuelenaere	6bd1869026	Add prediction of feature contributions (#2003 ) * Add prediction of feature contributions This implements the idea described at http://blog.datadive.net/interpreting-random-forests/ which tries to give insight in how a prediction is composed of its feature contributions and a bias. * Support multi-class models * Calculate learning_rate per-tree instead of using the one from the first tree * Do not rely on node.base_weight * learning_rate having the same value as the node mean value (aka leaf value, if it were a leaf); instead calculate them (lazily) on-the-fly * Add simple test for contributions feature * Check against param.num_nodes instead of checking for non-zero length * Loop over all roots instead of only the first	2017-05-14 00:58:10 -05:00
Sergei Lebedev	e62be19c70	Removed 'flink.suffix' and added 'flink.version' (#2277 ) The former was just Scala binary tag, and the latter was hardcoded in the 'xgboost4j-flink' POM.	2017-05-10 08:42:40 -07:00
Nan Zhu	428453f7d6	[jvm-packages] fix the persistence of XGBoostEstimator (#2265 ) * add back train method but mark as deprecated * fix scalastyle error * fix the persistence of XGBoostEstimator * test persistence of a complete pipeline * fix compilation issue * do not allow persist custom_eval and custom_obj * fix the failed test	2017-05-08 21:58:06 -07:00
Rory Mitchell	6bf968efe6	[GPU Plugin] Fast histogram speed improvements. Updated benchmarks. (#2258 )	2017-05-08 09:21:38 -07:00
Dmitry Nikulin	98ea461532	Fix typo (#2264 )	2017-05-07 16:54:48 -07:00
ebernhardson	197a9eacc5	[jvm-packages] Expose json dumps to scala (#2247 ) * Add parameter passthru of format on Booster.getModelDump	2017-05-02 17:41:27 -07:00
ebernhardson	ccccf8a015	[jvm-packages] Accept groupData in spark model eval (#2244 ) * Support model evaluation for ranking tasks by accepting groupData in XGBoostModel.eval	2017-05-02 10:03:20 -07:00
Vadim Khotilovich	a375ad2822	[R] maintenance Apr 2017 (#2237 ) * [R] make sure things work for a single split model; fixes #2191 * [R] add option use_int_id to xgb.model.dt.tree * [R] add example of exporting tree plot to a file * [R] set save_period = NULL as default in xgboost() to be the same as in xgb.train; fixes #2182 * [R] it's a good practice after CRAN releases to bump up package version in dev * [R] allow xgb.DMatrix construction from integer dense matrices * [R] xgb.DMatrix: silent parameter; improve documentation * [R] xgb.model.dt.tree code style changes * [R] update NEWS with parameter changes * [R] code safety & style; handle non-strict matrix and inherited classes of input and model; fixes #2242 * [R] change to x.y.z.p R-package versioning scheme and set version to 0.6.4.3 * [R] add an R package versioning section to the contributors guide * [R] R-package/README.md: clean up the redundant old installation instructions, link the contributors guide	2017-05-01 22:51:34 -07:00
Philip Cho	d769b6bcb5	Fix performance degradation of BuildHist on Windows (#2243 ) Reported in issue #2165. Dynamic scheduling of OpenMP loops involves implicit synchronization. To implement synchronization, libgomp uses futex (fast userspace mutex), whereas MinGW uses a kernel-space mutex, which is more costly. With a chunk size of 1, synchronization overhead may become prohibitive on Windows machines. Solution: use 'guided' schedule to minimize the number of syncs	2017-05-01 15:54:44 -07:00
ebernhardson	da58f34ff8	Store metrics with learner (#2241 ) Storing and then loading a model loses any eval_metric that was provided. This causes implementations that always store/load, like xgboost4j-spark, to be unable to eval with the desired metric.	2017-04-30 14:23:24 -07:00
ebernhardson	d3b866e3fd	[jvm-packages] Expose json formatted booster dumps (#2233 ) (#2234 ) * Change Booster dump from XGBoosterDumpModel to XGBoosterDumpModelEx Allows exposing multiple formatting options of model dumping.	2017-04-29 20:23:09 -07:00
Qiang Kou (KK)	c441d0916e	fix #2228 (#2238 )	2017-04-29 18:44:08 -07:00
Rory Mitchell	8ab5d4611c	[GPU-Plugin] (#2227 ) * Add fast histogram algorithm * Fix Linux build * Add 'gpu_id' parameter	2017-04-25 16:37:10 -07:00
Tianqi Chen	d281c6aafa	Update CONTRIBUTORS.md	2017-04-22 08:46:31 -07:00
Alex Bain	dbaa5d0bdf	Disable invalid check for completely sparse batch that results in failed assertion for issue #1827 (#2213 )	2017-04-21 09:28:02 -07:00
Nan Zhu	392aa6d1d3	[jvm-packages] make XGBoostModel hold BoosterParams as well (#2214 ) * add back train method but mark as deprecated * fix scalastyle error * make XGBoostModel hold BoosterParams as well	2017-04-21 08:12:50 -07:00
Benjamin Pachev	e38bea3cdf	Update README.md (#2202 ) Add a link to a demo for the proposed PHP XGBoost wrapper.	2017-04-17 15:28:37 -07:00
avpronkin	31e800f340	erratum in index.md (#2203 ) Mxnet instead of XGBoost	2017-04-17 15:24:18 -07:00
Seong-Jin Kim	8222755564	Fix typo in R-package README.md (#2190 )	2017-04-13 20:22:23 +02:00
Preston Parry	1ab8088a09	Removes extraneous log (#2186 ) This log appears to fire every time I ask the python package to make a prediction. It's the only log that fires from XGBoost. When we're getting predictions on millions of items a day in production, this log seems out of place.	2017-04-11 17:38:29 -07:00
Nan Zhu	a837fa9620	[jvm-packages] rdds containing boosters should be cleaned once we got boosters to driver (#2183 )	2017-04-11 06:12:49 -07:00
Nan Zhu	f08077606c	[jvm-packages] Clean external cache (#2181 ) * add back train method but mark as deprecated * fix scalastyle error * change class to object in examples * fix compilation error * small fix for cleanExternalCache	2017-04-10 07:49:58 -07:00
Nan Zhu	8d8cbcc6db	[jvm-packages] fixed several issues in unit tests (#2173 ) * add back train method but mark as deprecated * fix scalastyle error * change class to object in examples * fix compilation error * fix several issues in tests	2017-04-06 06:25:23 -07:00
Philip Cho	2715baef64	Fix bugs in multithreaded ApplySplitSparseData() (#2161 ) * Bugfix 1: Fix segfault in multithreaded ApplySplitSparseData() When there are more threads than rows in rowset, some threads end up with empty ranges, causing them to crash. (iend - 1 needs to be accessible as part of algorithm) Fix: run only those threads with nonempty ranges. * Add regression test for Bugfix 1 * Moving python_omp_test to existing python test group Turns out you don't need to set "OMP_NUM_THREADS" to enable multithreading. Just add nthread parameter. * Bugfix 2: Fix corner case of ApplySplitSparseData() for categorical feature When split value is less than all cut points, split_cond is set incorrectly. Fix: set split_cond = -1 to indicate this scenario * Bugfix 3: Initialize data layout indicator before using it data_layout_ is accessed before being set; this variable determines whether feature 0 is included in feat_set. Fix: re-order code in InitData() to initialize data_layout_ first * Adding regression test for Bugfix 2 Unfortunately, no regression test for Bugfix 3, as there is no way to deterministically assign value to an uninitialized variable.	2017-04-02 11:37:39 -07:00
Denis M Korzhenkov	ed5e75de2f	Nonreproducible sequence of evaluations fixed (#2153 ) As `num_round=2` there is no `0003.model` file after training.	2017-03-29 10:11:23 -07:00
Rory Mitchell	a33fa05bda	GPU Plugin: Bug fix #2048 (#2155 )	2017-03-29 10:10:57 -07:00
Huffers	d45cf240a9	Remove xgboost's thread_local and switch to dmlc::ThreadLocalStore (#2121 ) * Remove xgboost's own version of thread_local and switch to dmlc::ThreadLocalStore (#2109) * Update dmlc-core	2017-03-27 09:09:18 -07:00
Philip Cho	14fba01b5a	Improve multi-threaded performance (#2104 ) * Add UpdatePredictionCache() option to updaters Some updaters (e.g. fast_hist) have enough information to quickly compute prediction cache for the training data. Each updater may override UpdatePredictionCache() method to update the prediction cache. Note: this trick does not apply to validation data. * Respond to code review * Disable some debug messages by default * Document UpdatePredictionCache() interface * Remove base_margin logic from UpdatePredictionCache() implementation * Do not take pointer to cfg, as reference may get stale * Improve multi-threaded performance * Use columnwise accessor to accelerate ApplySplit() step, with support for a compressed representation * Parallel sort for evaluation step * Inline BuildHist() function * Cache gradient pairs when building histograms in BuildHist() * Add missing #if macro * Respond to code review * Use wrapper to enable parallel sort on Linux * Fix C++ compatibility issues * MSVC doesn't support unsigned in OpenMP loops * gcc 4.6 doesn't support using keyword * Fix lint issues * Respond to code review * Fix bug in ApplySplitSparseData() * Attempting to read beyond the end of a sparse column * Mishandling the case where an entire range of rows has missing values * Fix training continuation bug Disable UpdatePredictionCache() in the first iteration. This way, we can accommodate the scenario where we build off of an existing (nonempty) ensemble. * Add regression test for fast_hist * Respond to code review * Add back old version of ApplySplitSparseData	2017-03-25 10:35:01 -07:00
Denis M Korzhenkov	332aea26a3	Formatting fixed for CLI parameters (#2145 ) Fixed list of parameters format for CLI mode	2017-03-24 08:54:58 -07:00
Laurae	5c13aa0a8a	GLM test unit: make run deterministic (#2147 )	2017-03-24 08:54:39 -07:00
付雨帆	f1fe024a9d	Update md grammar for the README.md (#2141 )	2017-03-23 11:02:06 -07:00
Qin Xiaoming	12cf0ae122	Update sparse_page_dmatrix.h (#2139 )	2017-03-23 11:01:40 -07:00
Yang Zhang	48835c3a4e	Update predict leaf indices (#2135 ) * Updated sklearn_parallel.py for soon-to-be-deprecated modules * Updated predict_leaf_indices.py; Use python3 print() as other examples and removed unused module	2017-03-22 19:12:34 -07:00
Matthew R. Becker	a4bae1bdcd	ENH more makefile updates (#2133 ) This commit proposes a simpler single compiler specification for OSX and nix. It also lets people override the setting on both systems, not just nix.	2017-03-22 16:22:15 -05:00
Yang Zhang	cc012dac68	Updated sklearn_parallel.py for soon-to-be-deprecated modules (#2134 )	2017-03-22 16:18:15 -05:00
Yang Zhang	f6f5003f79	Updated sklearn_examples.py for soon-to-be-deprecated modules (#2117 )	2017-03-21 20:07:27 -07:00
Zhiquan	e65564ba59	Update rank_obj.cc (#2126 ) typo: PairwieRankObj -> PairwiseRankObj	2017-03-21 20:06:16 -07:00
Matthew R. Becker	95b7dbb1ea	ENH add gcc/g++ before clang for macs (#2125 ) * ENH add gcc/g++ before clang for macs - will default to clang anyways and supports separate gcc installs * BUG missed a ) - :(	2017-03-21 20:05:09 -07:00
Tianqi Chen	dc2fb978e1	new thread local requires xcode8	2017-03-17 09:40:34 -07:00
Icyblade Dai	301540f1d9	fix DeprecationWarning on sklearn.cross_validation (#2075 ) * fix DeprecationWarning on sklearn.cross_validation * fix syntax * fix kfold n_split issue * fix mistype * fix n_splits multiple value issue * split should pass an iterable * use np.arange instead of xrange, py3 compatibility	2017-03-17 08:38:22 -05:00
Tianqi Chen	d581a3d0e7	[UPDATE] Update rabit and threadlocal (#2114 ) * [UPDATE] Update rabit and threadlocal * minor fix to make build system happy * upgrade requirement to g++4.8 * upgrade dmlc-core * update travis	2017-03-16 18:48:37 -07:00
Luckick	b0c972aa4d	Typo Issue (#2100 ) Contruct to Construct	2017-03-16 10:38:25 -07:00
Oleg Sofrygin	9d19e13ed0	adding a copy of base_margin to slice, fixes a bug where base_margin was not copied during cross-validation (#2007 )	2017-03-16 10:36:57 -07:00
Liam Huang	3a2b8332a6	bugfix: when metric's name contains `-` (#2090 ) When metric's name contains `-`, Python will complain about insufficient arguments to unpack.	2017-03-16 10:36:39 -07:00
ZhouYong	fee1181803	fix online prediction function in learner.h (#2010 ) I used the online prediction function (`inline void Predict(const SparseBatch::Inst &inst, ... ) const;`), and the results differed from those of the batch prediction function (`virtual void Predict(DMatrix* data, ...) const = 0`). Investigation showed that the online prediction function uses the `base_score_` parameter, whereas the batch prediction function does not. The `base_score_` value differs each time the same model file is loaded: ``` 1st time: base_score_: 6.69023e-21 2nd time: base_score_: -3.7668e+19 3rd time: base_score_: 5.40507e+07 ``` Online prediction results are affected by the `base_score_` parameter. After deleting the if condition (`if (out_preds->size() == 1)`), the online prediction is consistent with the batch prediction results, and the xgboost prediction results are consistent with the python version. Therefore, it is likely that the online prediction function has a bug	2017-03-16 10:35:52 -07:00
Matthew R. Becker	4a63f4ab43	BUG make sure to specify no openmp for some mac osx builds properly (#2095 )	2017-03-10 18:36:15 -08:00
Shaform	15456c7882	Remove deprecated prefix `bst:` (#2091 )	2017-03-09 09:06:37 -08:00
Holger Peters	95510b9667	Inform setuptools that this is a binary package (#1996 ) * Inform setuptools that this is a binary package that needs platform-tags in wheel names. This fixes issue #1995 . * PEP8 Formatting * Add docstring	2017-03-07 09:26:04 -06:00
cloverrose	288f309434	[jvm-packages] call setGroup for ranking task (#2066 ) * [jvm-packages] call setGroup for ranking task * passing groupData through xgBoostConfMap * fix original comment position * make groupData param * remove groupData variable, use xgBoostConfMap directly * set default groupData value * add use groupData tests * reduce rank-demo size * use TaskContext.getPartitionId() instead of mapPartitionsWithIndex * add DF use groupData test * remove unused variable	2017-03-06 15:45:06 -08:00
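
Note on commit 6bd1869026 (feature contributions): the message describes splitting a prediction into per-feature contributions plus a bias. The sketch below shows that idea on a single hand-built tree. It does not use the xgboost code or API: the `Node` class, the `contributions` function, and the choice to put the bias in the last slot are all hypothetical, picked only for this example. It also assumes each node stores the mean value of the training rows that reach it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; `value` is the mean prediction at this node."""
    value: float
    feature: Optional[int] = None   # split feature index; None marks a leaf
    threshold: float = 0.0          # rows with x[feature] < threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def contributions(root: Node, x: List[float], n_features: int) -> List[float]:
    """Return per-feature contributions for one row, with the bias last."""
    contribs = [0.0] * (n_features + 1)
    contribs[-1] = root.value       # bias: the value at the root
    node = root
    while node.feature is not None:
        child = node.left if x[node.feature] < node.threshold else node.right
        # Credit the change in node value to the feature that was split on.
        contribs[node.feature] += child.value - node.value
        node = child
    return contribs


# A small illustrative tree: split on feature 0, then on feature 1.
tree = Node(
    value=10.0, feature=0, threshold=5.0,
    left=Node(value=6.0, feature=1, threshold=2.0,
              left=Node(value=4.0), right=Node(value=8.0)),
    right=Node(value=14.0),
)

row = [3.0, 2.5]
result = contributions(tree, row, n_features=2)
print(result)        # contributions of feature 0, feature 1, then the bias
print(sum(result))   # equals the value of the leaf the row lands in
```

By construction the contributions and the bias sum to the value of the leaf that the row reaches. For an ensemble, the same walk would be repeated per tree and the results added.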

3035 Commits