xgboost

Author	SHA1	Message	Date
Andy Adinets	cc6a5a3666	Added finding quantiles on GPU. (#3393 ) * Added finding quantiles on GPU. - this includes datasets where weights are assigned to data rows - as the quantiles found by the new algorithm are not the same as those found by the old one, test thresholds in tests/python-gpu/test_gpu_updaters.py have been adjusted. * Adjustments and improved testing for finding quantiles on the GPU. - added C++ tests for the DeviceSketch() function - reduced one of the thresholds in test_gpu_updaters.py - adjusted the cuts found by the find_cuts_k kernel	2018-07-27 14:03:16 +12:00
Henry Gouk	a13e29ece1	Add LASSO (#3429 ) * Allow multiple split constraints * Replace RidgePenalty with ElasticNet * Add test for checking Ridge, LASSO, and Elastic Net are implemented	2018-07-15 16:38:26 +12:00
Henry Gouk	64b8cffde3	Refactor of FastHistMaker to allow for custom regularisation methods (#3335 ) * Refactor to allow for custom regularisation methods * Implement compositional SplitEvaluator framework * Fixed segfault when no monotone_constraints are supplied. * Change pid to parentID * test_monotone_constraints.py now passes * Refactor ColMaker and DistColMaker to use SplitEvaluator * Performance optimisation when no monotone_constraints specified * Fix linter messages * Fix a few more linter errors * Update the amalgamation * Add bounds check * Add check for leaf node * Fix linter error in param.h * Fix clang-tidy errors on CI * Fix incorrect function name * Fix clang-tidy error in updater_fast_hist.cc * Enable SSE2 for Win32 R MinGW Addresses https://github.com/dmlc/xgboost/pull/3335#issuecomment-400535752 * Add contributor	2018-06-28 07:37:25 +00:00
Andy Adinets	286dccb8e8	GPU binning and compression. (#3319 ) * GPU binning and compression. - binning and index compression are done inside the DeviceShard constructor - in case of a DMatrix with multiple row batches, it is first converted into a single row batch	2018-06-05 17:15:13 +12:00
Rory Mitchell	ccf80703ef	Clang-tidy static analysis (#3222 ) * Clang-tidy static analysis * Modernise checks * Google coding standard checks * Identifier renaming according to Google style	2018-04-19 18:57:13 +12:00
redditur	d5f1b74ef5	'hist': Montonic Constraints (#3085 ) * Extended monotonic constraints support to 'hist' tree method. * Added monotonic constraints tests. * Fix the signature of NoConstraint::CalcSplitGain() * Document monotonic constraint support in 'hist' * Update signature of Update to account for latest refactor	2018-03-05 16:45:49 -08:00
Rory Mitchell	1b77903eeb	Fix several GPU bugs (#2916 ) * Fix #2905 * Fix gpu_exact test failures * Fix bug in GPU prediction where multiple calls to batch prediction can produce incorrect results * Fix GPU documentation formatting	2017-12-04 08:27:49 +13:00
Rory Mitchell	c51adb49b6	Monotone constraints for gpu_hist (#2904 )	2017-11-30 10:26:19 +13:00
Rory Mitchell	13e7a2cff0	Various bug fixes (#2825 ) * Fatal error if GPU algorithm selected without GPU support compiled * Resolve type conversion warnings * Fix gpu unit test failure * Fix compressed iterator edge case * Fix python unit test failures due to flake8 update on pip	2017-10-25 14:45:01 +13:00
Rory Mitchell	4cb2f7598b	-Add experimental GPU algorithm for lossguided mode (#2755 ) -Improved GPU algorithm unit tests -Removed some thrust code to improve compile times	2017-10-01 00:18:35 +13:00
Rory Mitchell	e6a9063344	Integer gradient summation for GPU histogram algorithm. (#2681 )	2017-09-08 15:07:29 +12:00
Philip Cho	ba820847f9	Patch to improve multithreaded performance scaling (#2493 ) * Patch to improve multithreaded performance scaling Change parallel strategy for histogram construction. Instead of partitioning data rows among multiple threads, partition feature columns instead. Useful heuristics for assigning partitions have been adopted from LightGBM project. * Add missing header to satisfy MSVC * Restore max_bin and related parameters to TrainParam * Fix lint error * inline functions do not require static keyword * Feature grouping algorithm accepting FastHistParam Feature grouping algorithm accepts many parameters (3+), and it gets annoying to pass them one by one. Instead, simply pass the reference to FastHistParam. The definition of FastHistParam has been moved to a separate header file to accomodate this change.	2017-07-07 08:25:07 -07:00
Rory Mitchell	5f1b0bb386	[GPU-Plugin] Unify gpu_gpair/bst_gpair. Refactor. (#2477 )	2017-07-01 17:31:13 +12:00
Rory Mitchell	0e48f87529	[GPU-Plugin] Make node_idx type 32 bit for hist algo. Set default n_gpus to 1. (#2445 )	2017-06-23 18:26:45 +12:00
PSEUDOTENSOR / Jonathan McKinney	41efe32aa5	[GPU-Plugin] Multi-GPU for grow_gpu_hist histogram method using NVIDIA NCCL. (#2395 )	2017-06-12 05:06:08 +12:00
Rory Mitchell	8ab5d4611c	[GPU-Plugin] (#2227 ) * Add fast histogram algorithm * Fix Linux build * Add 'gpu_id' parameter	2017-04-25 16:37:10 -07:00
Philip Cho	14fba01b5a	Improve multi-threaded performance (#2104 ) * Add UpdatePredictionCache() option to updaters Some updaters (e.g. fast_hist) has enough information to quickly compute prediction cache for the training data. Each updater may override UpdaterPredictionCache() method to update the prediction cache. Note: this trick does not apply to validation data. * Respond to code review * Disable some debug messages by default * Document UpdatePredictionCache() interface * Remove base_margin logic from UpdatePredictionCache() implementation * Do not take pointer to cfg, as reference may get stale * Improve multi-threaded performance * Use columnwise accessor to accelerate ApplySplit() step, with support for a compressed representation * Parallel sort for evaluation step * Inline BuildHist() function * Cache gradient pairs when building histograms in BuildHist() * Add missing #if macro * Respond to code review * Use wrapper to enable parallel sort on Linux * Fix C++ compatibility issues * MSVC doesn't support unsigned in OpenMP loops * gcc 4.6 doesn't support using keyword * Fix lint issues * Respond to code review * Fix bug in ApplySplitSparseData() * Attempting to read beyond the end of a sparse column * Mishandling the case where an entire range of rows have missing values * Fix training continuation bug Disable UpdatePredictionCache() in the first iteration. This way, we can accomodate the scenario where we build off of an existing (nonempty) ensemble. * Add regression test for fast_hist * Respond to code review * Add back old version of ApplySplitSparseData	2017-03-25 10:35:01 -07:00
Tianqi Chen	d581a3d0e7	[UPDATE] Update rabit and threadlocal (#2114 ) * [UPDATE] Update rabit and threadlocal * minor fix to make build system happy * upgrade requirement to g++4.8 * upgrade dmlc-core * update travis	2017-03-16 18:48:37 -07:00
Philip Cho	49ff7c1649	Rename parameter in fast_hist to disambiguate (#1962 )	2017-01-13 11:35:55 -08:00
Philip Cho	aeb4e76118	Histogram Optimized Tree Grower (#1940 ) * Support histogram-based algorithm + multiple tree growing strategy * Add a brand new updater to support histogram-based algorithm, which buckets continuous features into discrete bins to speed up training. To use it, set `tree_method = fast_hist` to configuration. * Support multiple tree growing strategies. For now, two policies are supported: * `grow_policy=depthwise` (default): favor splitting at nodes closest to the root, i.e. grow depth-wise. * `grow_policy=lossguide`: favor splitting at nodes with highest loss change * Improve single-threaded performance * Unroll critical loops * Introduce specialized code for dense data (i.e. no missing values) * Additional training parameters: `max_leaves`, `max_bin`, `grow_policy`, `verbose` * Adding a small test for hist method * Fix memory error in row_set.h When std::vector is resized, a reference to one of its element may become stale. Any such reference must be updated as well. * Resolve cross-platform compilation issues * Versions of g++ older than 4.8 lacks support for a few C++11 features, e.g. alignas() and new initializer syntax. To support g++ 4.6, use pre-C++11 initializer and remove alignas(). * Versions of MSVC older than 2015 does not support alignas(). To support MSVC 2012, remove alignas(). * For g++ 4.8 and newer, alignas() is enabled for performance benefits. Some old compilers (MSVC 2012, g++ 4.6) do not support template aliases (which uses `using` to declate type aliases). So always use `typedef`. * Fix a host of CI issues * Remove dependency for libz on osx * Fix heading for hist_util * Fix minor style issues * Add missing #include * Remove extraneous logging * Enable tree_method=hist in R * Renaming HistMaker to GHistBuilder to avoid confusion * Fix R integration * Respond to style comments * Consistent tie-breaking for priority queue using timestamps * Last-minute style fixes * Fix issuecomment-271977647 The way we quantize data is broken. The agaricus data consists of all categorical values. When NAs are converted into 0's, `HistCutMatrix::Init` assign both 0's and 1's to the same single bin. Why? gmat only the smallest value (0) and an upper bound (2), which is twice the maximum value (1). Add the maximum value itself to gmat to fix the issue. * Fix issuecomment-272266358 * Remove padding from cut values for the continuous case * For categorical/ordinal values, use midpoints as bin boundaries to be safe * Fix CI issue -- do not use xrange() Fix corner case in quantile sketch Signed-off-by: Philip Cho <chohyu01@cs.washington.edu> * Adding a test for an edge case in quantile sketcher max_bin=2 used to cause an exception. * Fix fast_hist test The test used to require a strictly increasing Test AUC for all examples. One of them exhibits a small blip in Test AUC before achieving a Test AUC of 1. (See bottom.) Solution: do not require monotonic increase for this particular example. [0] train-auc:0.99989 test-auc:0.999497 [1] train-auc:1 test-auc:0.999749 [2] train-auc:1 test-auc:0.999749 [3] train-auc:1 test-auc:0.999749 [4] train-auc:1 test-auc:0.999749 [5] train-auc:1 test-auc:0.999497 [6] train-auc:1 test-auc:1 [7] train-auc:1 test-auc:1 [8] train-auc:1 test-auc:1 [9] train-auc:1 test-auc:1	2017-01-13 09:25:55 -08:00
Vadim Khotilovich	a44032d095	[CORE] The update process for a tree model, and its application to feature importance (#1670 ) * [CORE] allow updating trees in an existing model * [CORE] in refresh updater, allow keeping old leaf values and update stats only * [R-package] xgb.train mod to allow updating trees in an existing model * [R-package] added check for nrounds when is_update * [CORE] merge parameter declaration changes; unify their code style * [CORE] move the update-process trees initialization to Configure; rename default process_type to 'default'; fix the trees and trees_to_update sizes comparison check * [R-package] unit tests for the update process type * [DOC] documentation for process_type parameter; improved docs for updater, Gamma and Tweedie; added some parameter aliases; metrics indentation and some were non-documented * fix my sloppy merge conflict resolutions * [CORE] add a TreeProcessType enum * whitespace fix	2016-12-04 09:33:52 -08:00
AbdealiJK	6f16f0ef58	Use bst_float consistently throughout (#1824 ) * Fix various typos * Add override to functions that are overridden gcc gives warnings about functions that are being overridden by not being marked as oveirridden. This fixes it. * Use bst_float consistently Use bst_float for all the variables that involve weight, leaf value, gradient, hessian, gain, loss_chg, predictions, base_margin, feature values. In some cases, when due to additions and so on the value can take a larger value, double is used. This ensures that type conversions are minimal and reduces loss of precision.	2016-11-30 10:02:10 -08:00
RAMitchell	be2f28ec08	Update build instructions, improve memory usage (#1811 )	2016-11-25 09:43:22 -08:00
Simon DENEL	58aa1129ea	Fixing a few typos (#1771 ) * Fixing a few typos * Fixing a few typos	2016-11-13 15:47:52 -08:00
RAMitchell	ac41845d4b	Add GPU accelerated tree construction plugin (#1679 )	2016-10-20 20:14:47 -07:00
Tianqi Chen	c93c9b7ed6	[TREE] Experimental version of monotone constraint (#1516 ) * [TREE] Experimental version of monotone constraint * Allow default detection of montone option * loose the condition of strict check * Update gbtree.cc	2016-09-07 21:28:43 -07:00
Vadim Khotilovich	75f401481f	no exception throwing within omp parallel; set nthread in Learner (#1421 )	2016-07-29 10:08:03 -07:00
Frank	3b73824842	Fix ambiguous call to abs(c or c++). (#1308 )	2016-06-29 14:28:28 -07:00
tqchen	63c4ad7617	[APPROX] Make global proposal default, add group ptr solution	2016-02-10 11:19:10 -08:00
tqchen	d75e3ed05d	[LIBXGBOOST] pass demo running.	2016-01-16 10:24:01 -08:00
tqchen	e4567bbc47	[REFACTOR] Add alias, allow missing variables, init gbm interface	2016-01-16 10:24:01 -08:00
tqchen	d4677b6561	[TREE] finish move of updater	2016-01-16 10:24:01 -08:00
tqchen	20043f63a6	[TREE] Move colmaker	2016-01-16 10:24:01 -08:00
tqchen	d530e0c14f	[REFACTOR] cleanup structure	2016-01-16 10:24:00 -08:00
Vadim Khotilovich	c70022e6c4	spelling, wording, and doc fixes in c++ code I was reading through the code and fixing some things in the comments. Only a few trivial actual code changes were made to make things more readable.	2015-12-12 21:40:12 -06:00
Tianqi Chen	fd8439ffbc	Update param.h enforce parallel option to 0 for now for stable result	2015-10-19 08:59:06 -07:00
tqchen	0162bb7034	lint half way	2015-07-03 18:31:52 -07:00
tqchen	e5dd894960	add a indicator opt	2015-06-02 11:38:06 -07:00
tqchen	09a841f810	auto turn on optimization	2015-05-15 23:54:34 -07:00
tqchen	792cff5abc	checkin some micro optimization	2015-05-15 23:54:03 -07:00
tqchen	23e46b7fa5	add max_delta_step	2015-03-26 09:47:16 -07:00
Tianqi Chen	2159d18f0b	Update param.h	2015-03-13 23:23:23 -07:00
tqchen	cd2bce4719	update with new rabit api	2015-01-19 21:32:25 -08:00
tqchen	08e9813c9b	potential BUG in skmaker?	2014-11-18 21:23:36 -08:00
tqchen	1b66a87456	checkin skmaker	2014-11-18 20:57:28 -08:00
tqchen	c86b83ea04	a version that compile	2014-11-15 17:41:03 -08:00
tqchen	c1f1bb9206	first ver	2014-11-15 09:46:30 -08:00
tqchen	a68ac8033e	refresher is now distributed	2014-10-17 14:48:32 -07:00
tqchen	aefe58a207	middle version	2014-10-16 10:38:49 -07:00
tqchen	91e34c6fb4	ok	2014-09-12 21:26:38 -07:00

1 2

64 Commits