* Expand histogram memory dynamically to prevent large allocations for large tree depths (e.g. > 15)
* Remove GPU memory allocation messages. These are misleading as a large number of allocations are now dynamic.
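A minimal sketch of the dynamic-histogram idea described above, assuming a simple node-indexed pool (class and member names here are illustrative, not the actual GPU plugin types): histogram storage is allocated per node on demand instead of reserving space for all 2^max_depth nodes up front.

```cpp
// Illustrative sketch only: grow histogram storage on demand instead of
// reserving space for all 2^max_depth nodes up front.
#include <cstddef>
#include <unordered_map>
#include <vector>

struct GradStats { double sum_grad = 0.0, sum_hess = 0.0; };

class HistogramPool {
 public:
  explicit HistogramPool(std::size_t n_bins) : n_bins_(n_bins) {}

  // Allocate (or reuse) the histogram for a node only when it is first needed.
  std::vector<GradStats>& GetOrAllocate(int node_id) {
    auto it = pool_.find(node_id);
    if (it == pool_.end()) {
      it = pool_.emplace(node_id, std::vector<GradStats>(n_bins_)).first;
    }
    return it->second;
  }

  // Histograms of finished nodes can be released to bound peak memory.
  void Release(int node_id) { pool_.erase(node_id); }

 private:
  std::size_t n_bins_;
  std::unordered_map<int, std::vector<GradStats>> pool_;
};
```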
* Fix AppVeyor R test
* Added quantile finding on the GPU.
- this includes datasets where weights are assigned to data rows
- as the quantiles found by the new algorithm are not the same
as those found by the old one, test thresholds in
tests/python-gpu/test_gpu_updaters.py have been adjusted.
* Adjustments and improved testing for finding quantiles on the GPU.
- added C++ tests for the DeviceSketch() function
- reduced one of the thresholds in test_gpu_updaters.py
- adjusted the cuts found by the find_cuts_k kernel
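For context, a minimal CPU-side sketch of weighted cut finding; the real implementation (DeviceSketch() and the find_cuts_k kernel) runs on the GPU with a sketch data structure, so the function below only illustrates the underlying idea: cuts are placed at roughly equal steps of cumulative row weight.

```cpp
// Illustrative CPU sketch of weighted quantile cuts; the real code runs on
// the GPU, but the underlying idea is the same.
#include <algorithm>
#include <utility>
#include <vector>

std::vector<float> WeightedCuts(std::vector<std::pair<float, float>> value_weight,
                                int n_cuts) {
  std::sort(value_weight.begin(), value_weight.end());
  double total = 0.0;
  for (const auto& vw : value_weight) total += vw.second;

  std::vector<float> cuts;
  double cum = 0.0;
  double next = total / (n_cuts + 1);  // equally spaced cumulative-weight ranks
  for (const auto& vw : value_weight) {
    cum += vw.second;
    if (cum >= next && (cuts.empty() || cuts.back() != vw.first)) {
      cuts.push_back(vw.first);
      next += total / (n_cuts + 1);
    }
  }
  return cuts;
}
```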
* Upgrading to NCCL2
* Part II of the NCCL2 upgrade
- Doc updates to build with nccl2
- Dockerfile.gpu update for a correct CI build with nccl2
- Updated the FindNccl module so that the NCCL_ROOT environment variable takes precedence
* Upgraded the CI workflow to CUDA 9.2, since it has the NCCL2 binaries available
* Added the NCCL2 license and copied the NCCL binaries into /usr so that the FindNccl module can find them
* Set the LD_LIBRARY_PATH variable so that the NCCL2 binary is picked up at runtime
* Add the NCCL2 library download instructions to Dockerfile.release as well
* Use NCCL2 as a static library
* Refactor to allow for custom regularisation methods
* Implement compositional SplitEvaluator framework (a short sketch of the idea follows below)
* Fixed segfault when no monotone_constraints are supplied.
* Change pid to parentID
* test_monotone_constraints.py now passes
* Refactor ColMaker and DistColMaker to use SplitEvaluator
* Performance optimisation when no monotone_constraints specified
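A short sketch of the compositional idea, with simplified, hypothetical method names rather than the exact SplitEvaluator interface: evaluators can wrap one another, so monotone-constraint handling composes with a base regularisation term instead of being hard-coded into ColMaker.

```cpp
// Simplified sketch of the compositional idea (method names are illustrative):
// evaluators can be chained so that, e.g., a monotone-constraint evaluator
// wraps an elastic-net regularisation evaluator.
#include <algorithm>
#include <cmath>
#include <memory>

class SplitEvaluator {
 public:
  virtual ~SplitEvaluator() = default;
  virtual double ComputeWeight(double sum_grad, double sum_hess) const = 0;
};

class ElasticNetEvaluator : public SplitEvaluator {
 public:
  ElasticNetEvaluator(double lambda, double alpha) : lambda_(lambda), alpha_(alpha) {}
  double ComputeWeight(double sum_grad, double sum_hess) const override {
    // L1-thresholded gradient divided by (hessian + L2 term).
    const double g = std::max(std::abs(sum_grad) - alpha_, 0.0) *
                     (sum_grad > 0 ? -1.0 : 1.0);
    return g / (sum_hess + lambda_);
  }
 private:
  double lambda_, alpha_;
};

// Wraps an inner evaluator and clamps leaf weights to honour a monotone bound.
class MonotonicEvaluator : public SplitEvaluator {
 public:
  MonotonicEvaluator(std::unique_ptr<SplitEvaluator> inner, double lower, double upper)
      : inner_(std::move(inner)), lower_(lower), upper_(upper) {}
  double ComputeWeight(double sum_grad, double sum_hess) const override {
    const double w = inner_->ComputeWeight(sum_grad, sum_hess);
    return std::min(std::max(w, lower_), upper_);
  }
 private:
  std::unique_ptr<SplitEvaluator> inner_;
  double lower_, upper_;
};
```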
* Fix linter messages
* Fix a few more linter errors
* Update the amalgamation
* Add bounds check
* Add check for leaf node
* Fix linter error in param.h
* Fix clang-tidy errors on CI
* Fix incorrect function name
* Fix clang-tidy error in updater_fast_hist.cc
* Enable SSE2 for Win32 R MinGW
Addresses https://github.com/dmlc/xgboost/pull/3335#issuecomment-400535752
* Add contributor
* Use sparse page as singular CSR matrix representation
* Simplify dmatrix methods
* Reduce statefulness of batch iterators
* BREAKING CHANGE: Remove prob_buffer_row parameter. Users are instead recommended to sample their dataset as a preprocessing step before using XGBoost.
* GPU binning and compression.
- binning and index compression are done inside the DeviceShard constructor
- in the case of a DMatrix with multiple row batches, it is first converted into a single row batch
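A host-side sketch of the bin-index compression step, assuming a plain bit-packing scheme with illustrative function names; the actual code performs this on the device inside the DeviceShard constructor.

```cpp
// Host-side sketch of bin-index compression: store each bin index using only
// as many bits as the number of symbols requires, instead of a full 32-bit
// integer per entry.
#include <cstddef>
#include <cstdint>
#include <vector>

inline int SymbolBits(std::size_t n_symbols) {
  int bits = 1;
  while ((std::size_t{1} << bits) < n_symbols) ++bits;
  return bits;
}

std::vector<std::uint64_t> PackBinIndices(const std::vector<std::uint32_t>& bins,
                                          std::size_t n_bins) {
  const int bits = SymbolBits(n_bins + 1);  // +1 reserves a symbol for "missing"
  std::vector<std::uint64_t> buffer((bins.size() * bits + 63) / 64, 0);
  for (std::size_t i = 0; i < bins.size(); ++i) {
    const std::size_t bit_pos = i * bits;
    buffer[bit_pos / 64] |= std::uint64_t{bins[i]} << (bit_pos % 64);
    if (bit_pos % 64 + bits > 64) {  // symbol straddles a 64-bit word boundary
      buffer[bit_pos / 64 + 1] |= std::uint64_t{bins[i]} >> (64 - bit_pos % 64);
    }
  }
  return buffer;
}
```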
* Increase precision of bst_float values in tree dumps
* Fix lint error and switch precision to right float variable
* Fix clang-tidy error
* Multi-GPU HostDeviceVector.
- HostDeviceVector instances can now span multiple devices, defined by GPUSet struct
- the interface of HostDeviceVector has been modified accordingly
- GPU objective functions are now multi-GPU
- GPU predicting from cache is now multi-GPU
- avoiding omp_set_num_threads() calls
- other minor changes
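A rough sketch of the GPUSet idea and the per-device slicing a multi-GPU HostDeviceVector needs; struct and function names below are illustrative, not the exact xgboost API.

```cpp
// Rough sketch: a contiguous range of device ordinals, plus the per-device
// slicing a multi-GPU HostDeviceVector needs (names are illustrative).
#include <algorithm>
#include <cstddef>
#include <utility>

struct GPUSet {
  int begin = 0;       // first device ordinal
  int n_devices = 0;   // number of devices spanned
  bool IsEmpty() const { return n_devices == 0; }
  int DeviceAt(int i) const { return begin + i; }
};

// Split [0, size) into n_devices near-equal contiguous segments; segment i is
// the portion of the vector owned by device GPUSet::DeviceAt(i).
// Assumes n_devices >= 1.
inline std::pair<std::size_t, std::size_t> ShardRange(std::size_t size,
                                                      int n_devices, int i) {
  const std::size_t chunk = (size + n_devices - 1) / n_devices;
  const std::size_t begin = std::min<std::size_t>(i * chunk, size);
  const std::size_t end = std::min<std::size_t>(begin + chunk, size);
  return {begin, end};
}
```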
* Extended monotonic constraints support to 'hist' tree method.
* Added monotonic constraints tests.
* Fix the signature of NoConstraint::CalcSplitGain()
* Document monotonic constraint support in 'hist'
* Update signature of Update to account for latest refactor
* Replaced std::vector-based interfaces with HostDeviceVector-based interfaces.
- replacement was performed in the learner, boosters, predictors,
updaters, and objective functions
- only interfaces used in training were replaced;
interfaces like PredictInstance() still use std::vector
- refactoring necessary for replacement of interfaces was also performed,
such as using HostDeviceVector in prediction cache
* HostDeviceVector-based interfaces for custom objective function example plugin.
* Added GPU objective function and no-copy interface.
- xgboost::HostDeviceVector<T> syncs automatically between host and device
- no-copy interfaces have been added
- default implementations just sync the data to host
and call the implementations with std::vector
- GPU objective function, predictor, histogram updater process data
directly on GPU
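A simplified sketch of the no-copy pattern with stand-in types (HostDeviceVectorSketch and ObjFunctionSketch are illustrative names, not the real classes): GPU-aware implementations override the device-vector overload, while the default implementation syncs the data to host and forwards to the existing std::vector interface.

```cpp
// Sketch of the no-copy pattern with simplified, illustrative types.
#include <vector>

// Stand-in for xgboost::HostDeviceVector<T>; only the host side is modelled.
template <typename T>
class HostDeviceVectorSketch {
 public:
  std::vector<T>& HostVector() { return host_; }  // real class also syncs from device
 private:
  std::vector<T> host_;
};

class ObjFunctionSketch {
 public:
  virtual ~ObjFunctionSketch() = default;

  // Existing CPU interface.
  virtual void GetGradient(const std::vector<float>& preds,
                           std::vector<float>* out_gpair) = 0;

  // No-copy interface: the default implementation syncs to host and forwards,
  // so objectives that are not GPU-aware keep working unchanged.
  virtual void GetGradient(HostDeviceVectorSketch<float>* preds,
                           HostDeviceVectorSketch<float>* out_gpair) {
    GetGradient(preds->HostVector(), &out_gpair->HostVector());
  }
};
```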
* Fix #2905
* Fix gpu_exact test failures
* Fix bug in GPU prediction where multiple calls to batch prediction can produce incorrect results
* Fix GPU documentation formatting
- Implement colsampling, subsampling for gpu_hist_experimental
- Optimised multi-GPU implementation for gpu_hist_experimental
- Make nccl optional
- Add Volta architecture flag
- Optimise RegLossObj
- Add timing utilities for debug verbose mode
- Bump required cuda version to 8.0
* Fatal error if GPU algorithm selected without GPU support compiled
* Resolve type conversion warnings
* Fix gpu unit test failure
* Fix compressed iterator edge case
* Fix python unit test failures due to flake8 update on pip
* [R] MSVC compatibility
* [GPU] allow seed in BernoulliRng up to size_t and scale to uint32_t
* R package build with cmake and CUDA
* R package CUDA build fixes and cleanups
* always export the R package native initialization routine on Windows
* update the install instructions doc
* fix lint
* use static_cast directly to set BernoulliRng seed
* [R] demo for GPU accelerated algorithm
* tidy up the R package cmake stuff
* R package cmake: install main dependency packages if needed
* [R] version bump in DESCRIPTION
* update NEWS
* added a short explanation of missing/sparse values to the FAQ
* Removal of redundant code/files.
* Removal of exact namespace in GPU plugin
* Revert double precision histograms to single precision for performance on Maxwell/Kepler
It has been reported that the new parallel algorithm (#2493) results in excessive
memory usage (see issue #2326). Until the issues are resolved, XGBoost should use
the old parallel algorithm by default. The user would have to specify
`enable_feature_grouping=1` manually to enable the new algorithm.
* Patch to improve multithreaded performance scaling
Change parallel strategy for histogram construction.
Instead of partitioning data rows among multiple threads, partition feature
columns. Useful heuristics for assigning partitions have been adopted
from the LightGBM project.
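An illustrative OpenMP sketch of the column-wise strategy (data layout and names are simplified): each thread owns a disjoint block of feature columns, so threads write to disjoint histogram slices and no per-thread histogram copies need to be merged afterwards.

```cpp
// Illustrative sketch of column-wise histogram construction.
#include <cstddef>
#include <vector>

struct GradientPair { double grad = 0.0, hess = 0.0; };
struct ColumnBlock {                 // one feature column in CSC-like form
  std::vector<int> row_index;        // rows with a value in this column
  std::vector<unsigned> bin_index;   // bin of each value within this feature
};

// (*hist)[fid] must be pre-sized to the number of bins of feature fid.
void BuildHistByColumns(const std::vector<ColumnBlock>& columns,
                        const std::vector<GradientPair>& gpair,   // per-row gradients
                        std::vector<std::vector<GradientPair>>* hist) {
  const int n_features = static_cast<int>(columns.size());
  // Threads are assigned whole feature columns, not blocks of rows, so each
  // thread accumulates into its own per-feature histograms without reduction.
  #pragma omp parallel for schedule(dynamic)
  for (int fid = 0; fid < n_features; ++fid) {
    std::vector<GradientPair>& feature_hist = (*hist)[fid];
    const ColumnBlock& col = columns[fid];
    for (std::size_t i = 0; i < col.row_index.size(); ++i) {
      GradientPair& bin = feature_hist[col.bin_index[i]];
      bin.grad += gpair[col.row_index[i]].grad;
      bin.hess += gpair[col.row_index[i]].hess;
    }
  }
}
```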
* Add missing header to satisfy MSVC
* Restore max_bin and related parameters to TrainParam
* Fix lint error
* inline functions do not require static keyword
* Feature grouping algorithm accepting FastHistParam
The feature grouping algorithm accepts many parameters (3+), and it is cumbersome
to pass them one by one. Instead, simply pass a reference to FastHistParam. The
definition of FastHistParam has been moved to a separate header file to
accommodate this change.
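A minimal sketch of the refactoring pattern; the struct fields below are illustrative stand-ins for the real FastHistParam members.

```cpp
// Illustrative sketch: bundle the knobs the feature-grouping algorithm needs
// into one struct and pass it by const reference, instead of threading
// three-plus scalar arguments through every call.
#include <cstddef>
#include <vector>

struct FastHistParamSketch {            // field names are illustrative
  float max_conflict_rate = 0.0f;
  int max_search_group = 100;
  bool enable_feature_grouping = false;
};

std::vector<std::vector<unsigned>> GroupFeatures(std::size_t n_features,
                                                 const FastHistParamSketch& param) {
  std::vector<std::vector<unsigned>> groups;
  // Trivial fallback: one group per feature. A real implementation would use
  // the fields of `param` to bundle rarely co-occurring features together.
  for (unsigned f = 0; f < n_features; ++f) groups.push_back({f});
  (void)param;  // the point is the single-struct parameter, not the grouping
  return groups;
}
```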
* Bugfix 1: Fix segfault in multithreaded ApplySplitSparseData()
When there are more threads than rows in the row set, some threads end up
with empty ranges, causing them to crash (iend - 1 needs to be
accessible as part of the algorithm).
Fix: run only those threads with nonempty ranges.
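An illustrative sketch of the fix, under the assumption of a simple static partition of rows into per-thread ranges: a thread whose range is empty skips its work instead of touching iend - 1.

```cpp
// Illustrative sketch: skip threads whose row range is empty.
#include <algorithm>
#include <cstddef>

void ApplySplitOverRows(std::size_t n_rows, int n_threads) {
  const std::size_t chunk = (n_rows + n_threads - 1) / n_threads;
  #pragma omp parallel for schedule(static)
  for (int tid = 0; tid < n_threads; ++tid) {
    const std::size_t ibegin = std::min<std::size_t>(tid * chunk, n_rows);
    const std::size_t iend = std::min<std::size_t>(ibegin + chunk, n_rows);
    if (ibegin >= iend) continue;   // empty range: nothing to do for this thread
    // ... safe to access rows [ibegin, iend), including iend - 1 ...
  }
}
```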
* Add regression test for Bugfix 1
* Moving python_omp_test to existing python test group
It turns out that setting "OMP_NUM_THREADS" is not needed to enable
multithreading; adding the nthread parameter is sufficient.
* Bugfix 2: Fix corner case of ApplySplitSparseData() for categorical feature
When the split value is less than all cut points, split_cond is set
incorrectly.
Fix: set split_cond = -1 to indicate this scenario
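An illustrative sketch of the sentinel logic described above (function name and exact semantics are simplified): the split condition is taken to be the index of the last cut point not exceeding the split value, and -1 marks the case where no such cut point exists.

```cpp
// Illustrative sketch of the -1 sentinel for split_cond.
#include <algorithm>
#include <vector>

int ComputeSplitCond(const std::vector<float>& cut_points, float split_value) {
  // First cut point strictly greater than the split value.
  auto it = std::upper_bound(cut_points.begin(), cut_points.end(), split_value);
  if (it == cut_points.begin()) {
    return -1;  // split value is less than all cut points
  }
  return static_cast<int>(it - cut_points.begin()) - 1;
}
```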
* Bugfix 3: Initialize data layout indicator before using it
data_layout_ is accessed before being set; this variable determines
whether feature 0 is included in feat_set.
Fix: re-order code in InitData() to initialize data_layout_ first
* Adding regression test for Bugfix 2
Unfortunately, there is no regression test for Bugfix 3, as there is no
way to deterministically assign a value to an uninitialized variable.
* Add UpdatePredictionCache() option to updaters
Some updaters (e.g. fast_hist) have enough information to quickly compute the
prediction cache for the training data. Each updater may override the
UpdatePredictionCache() method to update the prediction cache. Note: this
trick does not apply to validation data.
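A simplified sketch of the hook, with illustrative types and signatures rather than the exact TreeUpdater interface: an updater that already knows the new leaf value for every training row updates the cache in place and returns true; the default returns false so the predictor falls back to recomputation.

```cpp
// Illustrative sketch of the UpdatePredictionCache() hook.
#include <cstddef>
#include <vector>

class DMatrixSketch;   // stand-in for the training data handle

class TreeUpdaterSketch {
 public:
  virtual ~TreeUpdaterSketch() = default;

  // Return true if out_preds was updated in place for this training matrix.
  virtual bool UpdatePredictionCache(const DMatrixSketch* data,
                                     std::vector<float>* out_preds) {
    return false;   // default: updater has no cached leaf values to offer
  }
};

class FastHistUpdaterSketch : public TreeUpdaterSketch {
 public:
  bool UpdatePredictionCache(const DMatrixSketch* data,
                             std::vector<float>* out_preds) override {
    if (data != cached_data_ ||
        leaf_value_per_row_.size() != out_preds->size()) {
      return false;   // not the training matrix we built the tree on
    }
    for (std::size_t i = 0; i < out_preds->size(); ++i) {
      (*out_preds)[i] += leaf_value_per_row_[i];   // add this tree's contribution
    }
    return true;
  }
 private:
  const DMatrixSketch* cached_data_ = nullptr;
  std::vector<float> leaf_value_per_row_;
};
```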
* Respond to code review
* Disable some debug messages by default
* Document UpdatePredictionCache() interface
* Remove base_margin logic from UpdatePredictionCache() implementation
* Do not take a pointer to cfg, as the reference may become stale
* Improve multi-threaded performance
* Use columnwise accessor to accelerate ApplySplit() step,
with support for a compressed representation
* Parallel sort for evaluation step
* Inline BuildHist() function
* Cache gradient pairs when building histograms in BuildHist()
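A rough sketch of one plausible reading of the gradient-pair caching step in BuildHist(): gather the gradients of the node's rows into a contiguous buffer once, so the accumulation loop reads them sequentially. Names and data layout are illustrative, not the actual implementation.

```cpp
// Illustrative sketch of caching gradient pairs before histogram accumulation.
#include <cstddef>
#include <vector>

struct GradientPair { float grad = 0.f, hess = 0.f; };

// `hist` is indexed by global bin index (as in the cut matrix) and must be
// pre-sized to the total number of bins across all features.
void BuildHistSketch(const std::vector<std::size_t>& node_rows,     // rows in this node
                     const std::vector<unsigned>& row_bin_index,    // bin per (row, feature), flattened
                     std::size_t n_features,
                     const std::vector<GradientPair>& gpair,        // per-row gradients
                     std::vector<GradientPair>* hist) {
  // Step 1: cache the gradient pairs of this node's rows contiguously.
  std::vector<GradientPair> cached(node_rows.size());
  for (std::size_t i = 0; i < node_rows.size(); ++i) cached[i] = gpair[node_rows[i]];

  // Step 2: accumulate into the histogram with sequential reads of `cached`.
  for (std::size_t i = 0; i < node_rows.size(); ++i) {
    const std::size_t row = node_rows[i];
    for (std::size_t f = 0; f < n_features; ++f) {
      const unsigned bin = row_bin_index[row * n_features + f];
      (*hist)[bin].grad += cached[i].grad;
      (*hist)[bin].hess += cached[i].hess;
    }
  }
}
```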
* Add missing #if macro
* Respond to code review
* Use wrapper to enable parallel sort on Linux
* Fix C++ compatibility issues
* MSVC doesn't support unsigned loop variables in OpenMP loops
* gcc 4.6 doesn't support the `using` keyword for type aliases
* Fix lint issues
* Respond to code review
* Fix bug in ApplySplitSparseData()
- Attempting to read beyond the end of a sparse column
- Mishandling the case where an entire range of rows has missing values
* Fix training continuation bug
Disable UpdatePredictionCache() in the first iteration. This way, we can
accommodate the scenario where we build on an existing (nonempty) ensemble.
* Add regression test for fast_hist
* Respond to code review
* Add back old version of ApplySplitSparseData
As discussed in issue #1978, tree_method=hist ignores the parameter
param.num_roots; it simply assumes that the tree has only one root. In
particular, when the InitData() method initializes row_set_collection_, it simply
assigns all rows to node 0, a hard-coded value.
For now, the updater will simply fail when num_roots exceeds 1. I will revise
the updater soon to support multiple roots.
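An illustrative sketch of the behaviour described above (names simplified): every row is assigned to root node 0, and the updater refuses to run when num_roots exceeds 1.

```cpp
// Illustrative sketch of the current single-root assumption.
#include <cstddef>
#include <stdexcept>
#include <vector>

void InitRowSet(std::size_t n_rows, int num_roots,
                std::vector<int>* row_to_node) {
  if (num_roots != 1) {
    throw std::runtime_error("tree_method=hist does not support num_roots > 1 yet");
  }
  row_to_node->assign(n_rows, 0);   // every row starts at the single root, node 0
}
```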