xgboost

Author	SHA1	Message	Date
Rory Mitchell	9683fd433e	Overload device memory allocation (#4532 ) * Group source files, include headers in source files * Overload device memory allocation	2019-06-10 11:35:13 +12:00
Philip Hyunsu Cho	3f2fe25a32	Fix C++11 config parser (#4521 ) * Fix C++11 config parser * Use raw strings to improve readability of regex * Fix compilation for GCC 5.x Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2019-06-03 22:18:16 +08:00
Rory Mitchell	fbbae3386a	Smarter choice of histogram construction for distributed gpu_hist (#4519 ) * Smarter choice of histogram construction for distributed gpu_hist * Limit omp team size in ExecuteShards	2019-05-31 14:11:34 +12:00
fuhaoda	dd60fc23e6	Simplify INI-style config reader using C++11 STL (#4478 ) * simplify the config.h file * revise config.h * revised config.h * revise format * revise format issues * revise whitespace issues * revise whitespace namespace format issues * revise namespace format issues * format issues * format issues * format issues * format issues * Revert submodule changes * minor change * Update src/common/config.h Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu> * address format issue from trivialfis * Use correct cub submodule	2019-05-30 11:57:56 -07:00
sriramch	fed665ae8a	- training with external memory part 1 of 2 (#4486 ) * - training with external memory part 1 of 2 - this pr focuses on computing the quantiles using multiple gpus on a dataset that uses the external cache capabilities - there will a follow-up pr soon after this that will support creation of histogram indices on large dataset as well - both of these changes are required to support training with external memory - the sparse pages in dmatrix are taken in batches and the the cut matrices are incrementally built - also snuck in some (perf) changes related to sketches aggregation amongst multiple features across multiple sparse page batches. instead of aggregating the summary inside each device and merged later, it is aggregated in-place when the device is working on different rows but the same feature	2019-05-30 08:18:34 +12:00
Jiaming Yuan	c589eff941	De-duplicate GPU parameters. (#4454 ) * Only define `gpu_id` and `n_gpus` in `LearnerTrainParam` * Pass LearnerTrainParam through XGBoost vid factory method. * Disable all GPU usage when GPU related parameters are not specified (fixes XGBoost choosing GPU over aggressively). * Test learner train param io. * Fix gpu pickling.	2019-05-29 11:55:57 +08:00
sriramch	a3fedbeaa8	- fix issues with training with external memory on cpu (#4487 ) * - fix issues with training with external memory on cpu - use the batch size to determine the correct number of rows in a batch - use the right number of threads in omp parallalization if the batch size is less than the default omp max threads (applicable for the last batch) * - handle scenarios where last batch size is < available number of threads - augment tests such that we can test all scenarios (batch size <, >, = number of threads)	2019-05-29 12:31:30 +12:00
Jiaming Yuan	55e645c5f5	Revert hist init optimization. (#4502 )	2019-05-26 08:57:41 +08:00
Rong Ou	df2cdaca50	add cuda 10.1 support (#4468 )	2019-05-14 18:30:58 +00:00
Rong Ou	be0f346ec9	mgpu predictor using explicit offsets (#4438 ) * mgpu prediction using explicit sharding	2019-05-11 09:35:06 +12:00
Rong Ou	eaab364a63	More explict sharding methods for device memory (#4396 ) * Rename the Reshard method to Shard * Add a new Reshard method for sharding a vector that's already sharded	2019-05-01 11:47:22 +12:00
Rory Mitchell	5e582b0fa7	Combine thread launches into single launch per tree for gpu_hist (#4343 ) * Combine thread launches into single launch per tree for gpu_hist algorithm. * Address deprecation warning * Add manual column sampler constructor * Turn off omp dynamic to get a guaranteed number of threads * Enable openmp in cuda code	2019-04-29 09:58:34 +12:00
Egor Smirnov	711397d645	Optimizations of pre-processing for 'hist' tree method (#4310 ) * oprimizations for pre-processing * code cleaning * code cleaning * code cleaning after review * Apply suggestions from code review Co-Authored-By: SmirnovEgorRu <egor.smirnov@intel.com>	2019-04-16 17:36:19 -07:00
sriramch	2f7087eba1	Improve HostDeviceVector exception safety (#4301 ) * make the assignments of HostDeviceVector exception safe. * storing a dummy GPUDistribution instance in HDV for CPU based code. * change testxgboost binary location to build directory.	2019-03-31 22:48:58 +08:00
Rory Mitchell	3f312e30db	Retire DVec class in favour of c++20 style span for device memory. (#4293 )	2019-03-28 13:59:58 +13:00
Rong Ou	5aa42b5f11	jenkins build for cuda 10.0 (#4281 ) * jenkins build for cuda 10.0 * yum install nccl2 for cuda 10.0	2019-03-22 22:35:18 -07:00
Rory Mitchell	00465d243d	Optimisations for gpu_hist. (#4248 ) * Optimisations for gpu_hist. * Use streams to overlap operations. * ColumnSampler now uses HostDeviceVector to prevent repeatedly copying feature vectors to the device.	2019-03-20 13:30:06 +13:00
Jiaming Yuan	cf8d5b9b76	Mark CUDA 10.1 as unsupported. (#4265 )	2019-03-17 16:59:15 +08:00
Andy Adinets	b833b642ec	Improved multi-node multi-GPU random forests. (#4238 ) * Improved multi-node multi-GPU random forests. - removed rabit::Broadcast() from each invocation of column sampling - instead, syncing the PRNG seed when a ColumnSampler() object is constructed - this makes non-trivial column sampling significantly faster in the distributed case - refactored distributed GPU tests - added distributed random forests tests	2019-03-13 12:36:28 +13:00
Jiaming Yuan	7b9043cf71	Fix clang-tidy warnings. (#4149 ) * Upgrade gtest for clang-tidy. * Use CMake to install GTest instead of mv. * Don't enforce clang-tidy to return 0 due to errors in thrust. * Add a small test for tidy itself. * Reformat.	2019-03-13 02:25:51 +08:00
Tong He	259fb809e9	fix R-devel errors (#4251 )	2019-03-12 10:06:54 -07:00
Rory Mitchell	4eeeded7d1	Remove various synchronisations from cuda API calls, instrument monitor (#4205 ) * Remove various synchronisations from cuda API calls, instrument monitor with nvtx profiler ranges.	2019-03-10 15:01:23 +13:00
Philip Hyunsu Cho	f83e62dca5	Address #4042 : Prevent out-of-range access in column matrix (#4231 )	2019-03-08 17:11:42 -08:00
Rong Ou	9837b09b20	support cuda 10.1 (#4223 ) * support cuda 10.1 * add cuda 10.1 to jenkins build matrix	2019-03-08 12:22:12 +13:00
Matthew Jones	92b7577c62	[REVIEW] Enable Multi-Node Multi-GPU functionality (#4095 ) * Initial commit to support multi-node multi-gpu xgboost using dask * Fixed NCCL initialization by not ignoring the opg parameter. - it now crashes on NCCL initialization, but at least we're attempting it properly * At the root node, perform a rabit::Allreduce to get initial sum_gradient across workers * Synchronizing in a couple of more places. - now the workers don't go down, but just hang - no more "wild" values of gradients - probably needs syncing in more places * Added another missing max-allreduce operation inside BuildHistLeftRight * Removed unnecessary collective operations. * Simplified rabit::Allreduce() sync of gradient sums. * Removed unnecessary rabit syncs around ncclAllReduce. - this improves performance _significantly_ (7x faster for overall training, 20x faster for xgboost proper) * pulling in latest xgboost * removing changes to updater_quantile_hist.cc * changing use_nccl_opg initialization, removing unnecessary if statements * added definition for opaque ncclUniqueId struct to properly encapsulate GetUniqueId * placing struct defintion in guard to avoid duplicate code errors * addressing linting errors * removing * removing additional arguments to AllReduer initialization * removing distributed flag * making comm init symmetric * removing distributed flag * changing ncclCommInit to support multiple modalities * fix indenting * updating ncclCommInitRank block with necessary group calls * fix indenting * adding print statement, and updating accessor in vector * improving print statement to end-line * generalizing nccl_rank construction using rabit * assume device_ordinals is the same for every node * test, assume device_ordinals is identical for all nodes * test, assume device_ordinals is unique for all nodes * changing names of offset variable to be more descriptive, editing indenting * wrapping ncclUniqueId GetUniqueId() and aesthetic changes * adding synchronization, and tests for distributed * adding to tests * fixing broken #endif * fixing initialization of gpu histograms, correcting errors in tests * adding to contributors list * adding distributed tests to jenkins * fixing bad path in distributed test * debugging * adding kubernetes for distributed tests * adding proper import for OrderedDict * adding urllib3==1.22 to address ordered_dict import error * added sleep to allow workers to save their models for comparison * adding name to GPU contributors under docs	2019-03-02 10:03:22 +13:00
Jiaming Yuan	7ea5675679	Add PushCSC for SparsePage. (#4193 ) * Add PushCSC for SparsePage. * Move Push* definitions into cc file. * Add std:: prefix to `size_t` make clang++ happy. * Address monitor count == 0.	2019-03-02 01:58:08 +08:00
Philip Hyunsu Cho	549c8d6ae9	Prevent empty quantiles in fast hist (#4155 ) * Prevent empty quantiles * Revise and improve unit tests for quantile hist * Remove unnecessary comment * Add #2943 as a test case * Skip test if no sklearn * Revise misleading comments	2019-02-17 16:01:07 -08:00
Jiaming Yuan	2e618af743	Fix cpplint. (#4157 ) * Add comment after #endif. * Add missing headers.	2019-02-18 00:16:29 +08:00
Rory Mitchell	71a604fae3	Fix for windows compilation (#4139 )	2019-02-17 19:42:32 +13:00
Jiaming Yuan	1fe874e58a	Fix empty subspan. (#4151 ) * Silent the death tests.	2019-02-17 04:48:03 +08:00
Jiaming Yuan	754fe8142b	Make `HistCutMatrix::Init' be aware of groups. (#4115 ) * Add checks for group size. * Simple docs. * Search group index during hist cut matrix initialization. Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2019-02-16 04:39:41 +08:00
Nan Zhu	c18a3660fa	Separate Depthwidth and Lossguide growing policy in fast histogram (#4102 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * init * more changes * temp * update * udpate rabit * change the histogram * update kfactor * sync per node stats * temp * update * final * code clean * update rabit * more cleanup * fix errors * fix failed tests * enforce c++11 * broadcast subsampled feature correctly * init col * temp * col sampling * fix histmastrix init * fix col sampling * remove cout * fix out of bound access * fix core dump remove core dump file * disbale test temporarily * update * add fid * print perf data * update * revert some changes * temp * temp * pass all tests * bring back some tests * recover some changes * fix lint issue * enable monotone and interaction constraints * don't specify default for monotone and interactions * recover column init part * more recovery * fix core dumps * code clean * revert some changes * fix test compilation issue * fix lint issue * resolve compilation issue * fix issues of lint caused by rebase * fix stylistic changes and change variable names * use regtree internal function * modularize depth width * address the comments * fix failed tests * wrap perf timers with class * fix lint * fix num_leaves count * fix indention * Update src/tree/updater_quantile_hist.cc Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.h Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.cc Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.cc Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.cc Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.h Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * merge * fix compilation	2019-02-13 12:56:19 -08:00
Jiaming Yuan	f8ca2960fc	Use nccl group calls to prevent from dead lock. (#4113 ) * launch all reduce sequentially. * Fix gpu_exact test memory leak.	2019-02-08 06:12:39 +08:00
Jiaming Yuan	017c97b8ce	Clean up training code. (#3825 ) * Remove GHistRow, GHistEntry, GHistIndexRow. * Remove kSimpleStats. * Remove CheckInfo, SetLeafVec in GradStats and in SKStats. * Clean up the GradStats. * Cleanup calcgain. * Move LossChangeMissing out of common. * Remove [] operator from GHistIndexBlock.	2019-02-07 14:22:13 +08:00
Nan Zhu	ae3bb9c2d5	Distributed Fast Histogram Algorithm (#4011 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * init * allow hist algo * more changes * temp * update * remove hist sync * udpate rabit * change hist size * change the histogram * update kfactor * sync per node stats * temp * update * final * code clean * update rabit * more cleanup * fix errors * fix failed tests * enforce c++11 * fix lint issue * broadcast subsampled feature correctly * revert some changes * fix lint issue * enable monotone and interaction constraints * don't specify default for monotone and interactions * update docs	2019-02-05 05:12:53 -08:00
Egor Smirnov	5f151c5cf3	Performance optimizations for Intel CPUs (#3957 ) * Initial performance optimizations for xgboost * remove includes * revert float->double * fix for CI * fix for CI * fix for CI * fix for CI * fix for CI * fix for CI * fix for CI * fix for CI * fix for CI * fix for CI * Check existence of _mm_prefetch and __builtin_prefetch * Fix lint	2019-01-08 21:08:13 -08:00
Jiaming Yuan	1f022929f4	Use Span in gpu coordinate. (#4029 ) * Use Span in gpu coordinate. * Use Span in device code. * Fix shard size calculation. - Use lower_bound instead of upper_bound. * Check empty devices.	2019-01-02 11:32:43 +08:00
Jiaming Yuan	9897b5042f	Use Span in GPU exact updater. (#4020 ) * Use Span in GPU exact updater. * Add a small test.	2018-12-26 12:44:46 +08:00
Jiaming Yuan	c8c7b9649c	Fix and optimize logger (#4002 ) * Fix logging switch statement. * Remove debug_verbose_ in AllReducer. * Don't construct the stream when not needed. * Make default constructor deleted. * Remove redundant IsVerbose.	2018-12-17 19:23:05 +08:00
Andy Adinets	42bf90eb8f	Column sampling at individual nodes (splits). (#3971 ) * Column sampling at individual nodes (splits). * Documented colsample_bynode parameter. - also updated documentation for colsample_by* parameters * Updated documentation. * GetFeatureSet() returns shared pointer to std::vector. * Sync sampled columns across multiple processes.	2018-12-14 22:37:35 +08:00
Jiaming Yuan	e0a279114e	Unify logging facilities. (#3982 ) * Unify logging facilities. * Enhance `ConsoleLogger` to handle different verbosity. * Override macros from `dmlc`. * Don't use specialized gamma when building with GPU. * Remove verbosity cache in monitor. * Test monitor. * Deprecate `silent`. * Fix doc and messages. * Fix python test. * Fix silent tests.	2018-12-14 19:29:58 +08:00
Tong He	84a3af8dc0	Fix CRAN check warnings/notes (#3988 ) * fix * reorder declaration to match initialization	2018-12-12 08:23:20 -06:00
Andy Adinets	4be5edaf92	Initialized AllReducer counters to 0. (#3987 )	2018-12-12 09:09:20 +13:00
Rory Mitchell	93f9ce9ef9	Single precision histograms on GPU (#3965 ) * Allow single precision histogram summation in gpu_hist * Add python test, reduce run-time of gpu_hist tests * Update documentation	2018-12-10 10:55:30 +13:00
Philip Hyunsu Cho	9af6b689d6	Use int instead of char in CLI config parser (#3976 )	2018-12-07 01:00:21 -08:00
Jiaming Yuan	48dddfd635	Porting elementwise metrics to GPU. (#3952 ) * Port elementwise metrics to GPU. * All elementwise metrics are converted to static polymorphic. * Create a reducer for metrics reduction. * Remove const of Metric::Eval to accommodate CubMemory.	2018-12-01 18:46:45 +13:00
Rory Mitchell	a9d684db18	GPU performance logging/improvements (#3945 ) - Improved GPU performance logging - Only use one execute shards function - Revert performance regression on multi-GPU - Use threads to launch NCCL AllReduce	2018-11-29 14:36:51 +13:00
Rory Mitchell	7af0946ac1	Improve update position function for gpu_hist (#3895 )	2018-11-14 19:33:29 +13:00
Rory Mitchell	926eb651fe	Minor refactor of split evaluation in gpu_hist (#3889 ) * Refactor evaluate split into shard * Use span in evaluate split * Update google tests	2018-11-14 00:11:20 +13:00
Jiaming Yuan	daf77ca7b7	Enable running objectives with 0 GPU. (#3878 ) * Enable running objectives with 0 GPU. * Enable 0 GPU for objectives. * Add doc for GPU objectives. * Fix some objectives defaulted to running on all GPUs.	2018-11-13 20:19:59 +13:00


226 Commits