xgboost

Author	SHA1	Message	Date
Tianqi Chen	d281c6aafa	Update CONTRIBUTORS.md	2017-04-22 08:46:31 -07:00
Alex Bain	dbaa5d0bdf	Disable invalid check for completely sparse batch that results in failed assertion for issue #1827 (#2213 )	2017-04-21 09:28:02 -07:00
Nan Zhu	392aa6d1d3	[jvm-packages] make XGBoostModel hold BoosterParams as well (#2214 ) * add back train method but mark as deprecated * fix scalastyle error * make XGBoostModel hold BoosterParams as well	2017-04-21 08:12:50 -07:00
Benjamin Pachev	e38bea3cdf	Update README.md (#2202 ) Add a link to a demo for the proposed PHP XGBoost wrapper.	2017-04-17 15:28:37 -07:00
avpronkin	31e800f340	erratum in index.md (#2203 ) Mxnet instead of XGBoost	2017-04-17 15:24:18 -07:00
Seong-Jin Kim	8222755564	Fix typo in R-package README.md (#2190 )	2017-04-13 20:22:23 +02:00
Preston Parry	1ab8088a09	Removes extraneous log (#2186 ) This log appears to fire every time I ask the python package to make a prediction. It's the only log that fires from XGBoost. When we're getting predictions on millions of items a day in production, this log seems out of place.	2017-04-11 17:38:29 -07:00
Nan Zhu	a837fa9620	[jvm-packages] rdds containing boosters should be cleaned once we got boosters to driver (#2183 )	2017-04-11 06:12:49 -07:00
Nan Zhu	f08077606c	[jvm-packages] Clean external cache (#2181 ) * add back train method but mark as deprecated * fix scalastyle error * change class to object in examples * fix compilation error * small fix for cleanExternalCache	2017-04-10 07:49:58 -07:00
Nan Zhu	8d8cbcc6db	[jvm-packages] fixed several issues in unit tests (#2173 ) * add back train method but mark as deprecated * fix scalastyle error * change class to object in examples * fix compilation error * fix several issues in tests	2017-04-06 06:25:23 -07:00
Philip Cho	2715baef64	Fix bugs in multithreaded ApplySplitSparseData() (#2161 ) * Bugfix 1: Fix segfault in multithreaded ApplySplitSparseData() When there are more threads than rows in rowset, some threads end up with empty ranges, causing them to crash. (iend - 1 needs to be accessible as part of algorithm) Fix: run only those threads with nonempty ranges. * Add regression test for Bugfix 1 * Moving python_omp_test to existing python test group Turns out you don't need to set "OMP_NUM_THREADS" to enable multithreading. Just add nthread parameter. * Bugfix 2: Fix corner case of ApplySplitSparseData() for categorical feature When split value is less than all cut points, split_cond is set incorrectly. Fix: set split_cond = -1 to indicate this scenario * Bugfix 3: Initialize data layout indicator before using it data_layout_ is accessed before being set; this variable determines whether feature 0 is included in feat_set. Fix: re-order code in InitData() to initialize data_layout_ first * Adding regression test for Bugfix 2 Unfortunately, no regression test for Bugfix 3, as there is no way to deterministically assign value to an uninitialized variable.	2017-04-02 11:37:39 -07:00
Denis M Korzhenkov	ed5e75de2f	Nonreproducible sequence of evaluations fixed (#2153 ) As `num_round=2` there is no `0003.model` file after training.	2017-03-29 10:11:23 -07:00
Rory Mitchell	a33fa05bda	GPU Plugin: Bug fix #2048 (#2155 )	2017-03-29 10:10:57 -07:00
Huffers	d45cf240a9	Remove xgboost's thread_local and switch to dmlc::ThreadLocalStore (#2121 ) * Remove xgboost's own version of thread_local and switch to dmlc::ThreadLocalStore (#2109) * Update dmlc-core	2017-03-27 09:09:18 -07:00
Philip Cho	14fba01b5a	Improve multi-threaded performance (#2104 ) * Add UpdatePredictionCache() option to updaters Some updaters (e.g. fast_hist) have enough information to quickly compute prediction cache for the training data. Each updater may override UpdaterPredictionCache() method to update the prediction cache. Note: this trick does not apply to validation data. * Respond to code review * Disable some debug messages by default * Document UpdatePredictionCache() interface * Remove base_margin logic from UpdatePredictionCache() implementation * Do not take pointer to cfg, as reference may get stale * Improve multi-threaded performance * Use columnwise accessor to accelerate ApplySplit() step, with support for a compressed representation * Parallel sort for evaluation step * Inline BuildHist() function * Cache gradient pairs when building histograms in BuildHist() * Add missing #if macro * Respond to code review * Use wrapper to enable parallel sort on Linux * Fix C++ compatibility issues * MSVC doesn't support unsigned in OpenMP loops * gcc 4.6 doesn't support using keyword * Fix lint issues * Respond to code review * Fix bug in ApplySplitSparseData() * Attempting to read beyond the end of a sparse column * Mishandling the case where an entire range of rows have missing values * Fix training continuation bug Disable UpdatePredictionCache() in the first iteration. This way, we can accommodate the scenario where we build off of an existing (nonempty) ensemble. * Add regression test for fast_hist * Respond to code review * Add back old version of ApplySplitSparseData	2017-03-25 10:35:01 -07:00
Denis M Korzhenkov	332aea26a3	Formatting fixed for CLI parameters (#2145 ) Fixed list of parameters format for CLI mode	2017-03-24 08:54:58 -07:00
Laurae	5c13aa0a8a	GLM test unit: make run deterministic (#2147 )	2017-03-24 08:54:39 -07:00
付雨帆	f1fe024a9d	Update md grammar for the README.md (#2141 )	2017-03-23 11:02:06 -07:00
Qin Xiaoming	12cf0ae122	Update sparse_page_dmatrix.h (#2139 )	2017-03-23 11:01:40 -07:00
Yang Zhang	48835c3a4e	Update predict leaf indices (#2135 ) * Updated sklearn_parallel.py for soon-to-be-deprecated modules * Updated predict_leaf_indices.py; Use python3 print() as other examples and removed unused module	2017-03-22 19:12:34 -07:00
Matthew R. Becker	a4bae1bdcd	ENH more makefile updates (#2133 ) This commit proposes a simpler single compiler specification for OSX and nix. It also lets people override the setting on both systems, not just nix.	2017-03-22 16:22:15 -05:00
Yang Zhang	cc012dac68	Updated sklearn_parallel.py for soon-to-be-deprecated modules (#2134 )	2017-03-22 16:18:15 -05:00
Yang Zhang	f6f5003f79	Updated sklearn_examples.py for soon-to-be-deprecated modules (#2117 )	2017-03-21 20:07:27 -07:00
Zhiquan	e65564ba59	Update rank_obj.cc (#2126 ) typo: PairwieRankObj -> PairwiseRankObj	2017-03-21 20:06:16 -07:00
Matthew R. Becker	95b7dbb1ea	ENH add gcc/g++ before clang for macs (#2125 ) * ENH add gcc/g++ before clang for macs - will default to clang anyways and supports separate gcc installs * BUG missed a ) - :(	2017-03-21 20:05:09 -07:00
Tianqi Chen	dc2fb978e1	new thread local requires xcode8	2017-03-17 09:40:34 -07:00
Icyblade Dai	301540f1d9	fix DeprecationWarning on sklearn.cross_validation (#2075 ) * fix DeprecationWarning on sklearn.cross_validation * fix syntax * fix kfold n_split issue * fix mistype * fix n_splits multiple value issue * split should pass an iterable * use np.arange instead of xrange, py3 compatibility	2017-03-17 08:38:22 -05:00
Tianqi Chen	d581a3d0e7	[UPDATE] Update rabit and threadlocal (#2114 ) * [UPDATE] Update rabit and threadlocal * minor fix to make build system happy * upgrade requirement to g++4.8 * upgrade dmlc-core * update travis	2017-03-16 18:48:37 -07:00
Luckick	b0c972aa4d	Typo Issue (#2100 ) Contruct to Construct	2017-03-16 10:38:25 -07:00
Oleg Sofrygin	9d19e13ed0	adding a copy of base_margin to slice, fixes a bug where base_margin was not copied during cross-validation (#2007 )	2017-03-16 10:36:57 -07:00
Liam Huang	3a2b8332a6	bugfix: when metric's name contains `-` (#2090 ) When metric's name contains `-`, Python will complain about insufficient arguments to unpack.	2017-03-16 10:36:39 -07:00
ZhouYong	fee1181803	fix online prediction function in learner.h (#2010 ) When I use the online prediction function (`inline void Predict(const SparseBatch::Inst &inst, ... ) const;`), the results differ from those of the batch prediction function (`virtual void Predict(DMatrix* data, ...) const = 0`). Investigation showed that the online prediction function uses the `base_score_` parameter, while the batch prediction function does not. The `base_score_` value also differs each time the same model file is loaded: ``` 1st time: base_score_: 6.69023e-21 2nd time: base_score_: -3.7668e+19 3rd time: base_score_: 5.40507e+07 ``` Online prediction results are therefore affected by the `base_score_` parameter. After deleting the if condition (`if (out_preds->size() == 1)`), the online prediction is consistent with the batch prediction results, and the xgboost prediction results are consistent with the Python version. Therefore, the online prediction function most likely has a bug.	2017-03-16 10:35:52 -07:00
Matthew R. Becker	4a63f4ab43	BUG make sure to specify no openmp for some mac osx builds properly (#2095 )	2017-03-10 18:36:15 -08:00
Shaform	15456c7882	Remove deprecated prefix `bst:` (#2091 )	2017-03-09 09:06:37 -08:00
Holger Peters	95510b9667	Inform setuptools that this is a binary package (#1996 ) * Inform setuptools that this is a binary package that needs platform-tags in wheel names. This fixes issue #1995 . * PEP8 Formatting * Add docstring	2017-03-07 09:26:04 -06:00
cloverrose	288f309434	[jvm-packages] call setGroup for ranking task (#2066 ) * [jvm-packages] call setGroup for ranking task * passing groupData through xgBoostConfMap * fix original comment position * make groupData param * remove groupData variable, use xgBoostConfMap directly * set default groupData value * add use groupData tests * reduce rank-demo size * use TaskContext.getPartitionId() instead of mapPartitionsWithIndex * add DF use groupData test * remove unused variable	2017-03-06 15:45:06 -08:00
geoHeil	cf6b173bd7	[jvm-packages] Spark pipeline persistence (#1906 ) [jvm-packages] Spark pipeline persistence	2017-03-05 18:35:37 -08:00
Xin Yin	5b54b9437c	Fixed Exception handling for fragmented Rabit 'print' tracker command. Fixed unit test. (#2081 )	2017-03-05 13:40:59 -08:00
Nan Zhu	ab13fd72bd	[jvm-packages] Scala/Java interface for Fast Histogram Algorithm (#1966 ) * add back train method but mark as deprecated * fix scalastyle error * first commit in scala binding for fast histo * java test * add missed scala tests * spark training * add back train method but mark as deprecated * fix scalastyle error * local change * first commit in scala binding for fast histo * local change * fix df frame test	2017-03-04 15:37:24 -08:00
Nan Zhu	ac30a0aff5	[jvm-packages][spark]Preserve num classes (#2068 ) * add back train method but mark as deprecated * fix scalastyle error * change class to object in examples * fix compilation error * bump spark version to 2.1 * preserve num_class issues * fix failed test cases * revising * add multi class test	2017-03-04 14:14:31 -08:00
hlsc	a92093388d	[jvm-packages] fix bug doing rabit call after finalize (#2079 ) [jvm-packages]fix bug doing rabit call after finalize	2017-03-02 16:46:57 -08:00
Tianqi Chen	fd19b7a188	Automatically remove nan from input data when it is sparse. (#2062 ) * [DATALoad] Automatically remove Nan when load from sparse matrix * add log	2017-02-25 08:59:17 -08:00
moqiguzhu	5d093a7f4c	in `caret` settings, if you want to do 10*10 cross validation, you need to set repeats=10, number=10 and method=repeatedcv, (#2061 ) if you set method=cv, only a single 10-fold cross validation is actually run; fixes #2055	2017-02-25 09:16:19 -05:00
Eric Liu	7927031ffe	print_evaluation callback output on last iteration (#2036 ) verbose_eval docs claim it will log the last iteration (http://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.train). this is also consistent w/the behavior from 0.4. not a huge deal but I found it handy to see the last iter's result b/c my period is usually large. this doesn't address logging the last stage found by early_stopping (as noted in docs) as I'm not sure how to do that.	2017-02-24 23:06:48 -05:00
Vadim Khotilovich	b4d97d3cb8	R maintenance Feb2017 (#2045 ) * [R] better argument check in xgb.DMatrix; fixes #1480 * [R] showsd was a dummy; fixes #2044 * [R] better categorical encoding explanation in vignette; fixes #1989 * [R] new roxygen version docs update	2017-02-20 10:02:40 -08:00
Nan Zhu	63aec12a13	[jvm-packages] Bump spark to 2.1 (#2046 )	2017-02-19 08:29:35 -08:00
Nan Zhu	185fe1d645	[jvm-packages] use ML's para system to build the passed-in params to XGBoost (#2043 ) * add back train method but mark as deprecated * fix scalastyle error * use ML's para system to build the passed-in params to XGBoost * clean	2017-02-18 11:56:27 -08:00
DougM	acce11d3f4	fix MLlib CrossValidator issues (wrong default value configuration) #1941 (#2042 )	2017-02-18 08:10:47 -08:00
Theodore Vasiloudis	9fb46e2c5e	[trivial] Fix typo in Poisson metric name. (#2026 )	2017-02-09 09:32:06 -08:00
ANtlord	f054d812dc	Fix typo in Python Package Introduction (#2023 ) Fixed #2016	2017-02-08 23:35:13 -05:00

3021 Commits