xgboost

Author	SHA1	Message	Date
Rong Ou	81c1cd40ca	add a test for cpu predictor using external memory (#4308 ) * add a test for cpu predictor using external memory * allow different page size for testing	2019-04-10 13:25:10 +12:00
Philip Hyunsu Cho	70be1e38c2	[CI] Optimize external Docker build cache (#4334 ) * When building pull requests, use Docker cache for master branch Docker build caches are per-branch, so new pull requests will initially have no build cache, causing the Docker containers to be built from scratch. New pull requests should use the cache associated with the master branch. This makes sense, since most pull requests do not modify the Dockerfile. * Add comments	2019-04-04 15:59:07 -07:00
Philip Hyunsu Cho	37c75aac41	[CI] Add external Docker build cache (#4331 )	2019-04-04 13:36:39 -07:00
sriramch	2f7087eba1	Improve HostDeviceVector exception safety (#4301 ) * make the assignments of HostDeviceVector exception safe. * storing a dummy GPUDistribution instance in HDV for CPU based code. * change testxgboost binary location to build directory.	2019-03-31 22:48:58 +08:00
Philip Hyunsu Cho	7aed8f3d48	[CI] Upgrade to GCC 5.3.1, CMake 3.6.0 (#4306 ) * Upgrade to GCC 5.3.1, CMake 3.6.0 * <regex> is now okay	2019-03-28 00:21:21 -07:00
Rory Mitchell	3f312e30db	Retire DVec class in favour of c++20 style span for device memory. (#4293 )	2019-03-28 13:59:58 +13:00
Jiaming Yuan	c85181dd8a	Remove remaining `silent` and `debug_verbose`. (#4299 )	2019-03-28 03:30:46 +08:00
Rory Mitchell	6d5b34d824	Further optimisations for gpu_hist. (#4283 ) - Fuse final update position functions into a single more efficient kernel - Refactor gpu_hist with a more explicit ellpack matrix representation	2019-03-24 17:17:22 +13:00
Rong Ou	5aa42b5f11	jenkins build for cuda 10.0 (#4281 ) * jenkins build for cuda 10.0 * yum install nccl2 for cuda 10.0	2019-03-22 22:35:18 -07:00
Rory Mitchell	00465d243d	Optimisations for gpu_hist. (#4248 ) * Optimisations for gpu_hist. * Use streams to overlap operations. * ColumnSampler now uses HostDeviceVector to prevent repeatedly copying feature vectors to the device.	2019-03-20 13:30:06 +13:00
Jiaming Yuan	29a1356669	Deprecate `reg:linear' in favor of `reg:squarederror'. (#4267 ) * Deprecate `reg:linear' in favor of `reg:squarederror'. * Replace the use of `reg:linear'. * Replace the use of `silent`.	2019-03-17 17:55:04 +08:00
Andy Adinets	4352fcdb15	Brought the silent parameter for the SKLearn-like API back, marked it deprecated. (#4255 ) * Brought the silent parameter for the SKLearn-like API back, marked it deprecated. - added deprecation notice and warning - removed silent from the tests for the SKLearn-like API	2019-03-14 09:45:08 +13:00
Andy Adinets	b833b642ec	Improved multi-node multi-GPU random forests. (#4238 ) * Improved multi-node multi-GPU random forests. - removed rabit::Broadcast() from each invocation of column sampling - instead, syncing the PRNG seed when a ColumnSampler() object is constructed - this makes non-trivial column sampling significantly faster in the distributed case - refactored distributed GPU tests - added distributed random forests tests	2019-03-13 12:36:28 +13:00
Jiaming Yuan	7b9043cf71	Fix clang-tidy warnings. (#4149 ) * Upgrade gtest for clang-tidy. * Use CMake to install GTest instead of mv. * Don't enforce clang-tidy to return 0 due to errors in thrust. * Add a small test for tidy itself. * Reformat.	2019-03-13 02:25:51 +08:00
Andy Adinets	a36c3ed4f4	Added SKLearn-like random forest Python API. (#4148 ) * Added SKLearn-like random forest Python API. - added XGBRFClassifier and XGBRFRegressor classes to SKL-like xgboost API - also added n_gpus and gpu_id parameters to SKL classes - added documentation describing how to use xgboost for random forests, as well as existing caveats	2019-03-12 22:28:19 +08:00
Matthew Jones	92b7577c62	[REVIEW] Enable Multi-Node Multi-GPU functionality (#4095 ) * Initial commit to support multi-node multi-gpu xgboost using dask * Fixed NCCL initialization by not ignoring the opg parameter. - it now crashes on NCCL initialization, but at least we're attempting it properly * At the root node, perform a rabit::Allreduce to get initial sum_gradient across workers * Synchronizing in a couple of more places. - now the workers don't go down, but just hang - no more "wild" values of gradients - probably needs syncing in more places * Added another missing max-allreduce operation inside BuildHistLeftRight * Removed unnecessary collective operations. * Simplified rabit::Allreduce() sync of gradient sums. * Removed unnecessary rabit syncs around ncclAllReduce. - this improves performance _significantly_ (7x faster for overall training, 20x faster for xgboost proper) * pulling in latest xgboost * removing changes to updater_quantile_hist.cc * changing use_nccl_opg initialization, removing unnecessary if statements * added definition for opaque ncclUniqueId struct to properly encapsulate GetUniqueId * placing struct defintion in guard to avoid duplicate code errors * addressing linting errors * removing * removing additional arguments to AllReduer initialization * removing distributed flag * making comm init symmetric * removing distributed flag * changing ncclCommInit to support multiple modalities * fix indenting * updating ncclCommInitRank block with necessary group calls * fix indenting * adding print statement, and updating accessor in vector * improving print statement to end-line * generalizing nccl_rank construction using rabit * assume device_ordinals is the same for every node * test, assume device_ordinals is identical for all nodes * test, assume device_ordinals is unique for all nodes * changing names of offset variable to be more descriptive, editing indenting * wrapping ncclUniqueId GetUniqueId() and aesthetic changes * adding synchronization, and tests for distributed * adding to tests * fixing broken #endif * fixing initialization of gpu histograms, correcting errors in tests * adding to contributors list * adding distributed tests to jenkins * fixing bad path in distributed test * debugging * adding kubernetes for distributed tests * adding proper import for OrderedDict * adding urllib3==1.22 to address ordered_dict import error * added sleep to allow workers to save their models for comparison * adding name to GPU contributors under docs	2019-03-02 10:03:22 +13:00
Jiaming Yuan	7ea5675679	Add PushCSC for SparsePage. (#4193 ) * Add PushCSC for SparsePage. * Move Push* definitions into cc file. * Add std:: prefix to `size_t` make clang++ happy. * Address monitor count == 0.	2019-03-02 01:58:08 +08:00
Patrick Ford	74009afcac	Added trees_to_df() method for Booster class (#4153 ) * add test_parse_tree.py to tests/python * Fix formatting * Fix pylint error * Ignore 'no member' error for Pandas dataframe	2019-02-26 13:28:24 -08:00
Rong Ou	8e0a08fbcf	Update python benchmarking script (#4164 ) * a few tweaks to speed up data generation * del variable to save memory * switch to random numpy arrays	2019-02-21 15:16:09 +13:00
Philip Hyunsu Cho	2aaae2e7bb	Fix #4163 : always copy sliced data (#4165 ) * Revert "Accept numpy array view. (#4147)" This reverts commit `a985a99cf0`. * Fix #4163: always copy sliced data * Remove print() from the test; check shape equality * Check if 'base' attribute exists * Fix lint * Address reviewer comment * Fix lint	2019-02-20 14:46:34 -08:00
Jiaming Yuan	cecbe0cf71	Fix test_gpu_coordinate. (#3974 ) * Fix test_gpu_coordinate. * Use `gpu_coord_descent` in test. * Reduce number of running rounds. * Remove nthread. * Use githubusercontent for r-appveyor. * Use githubusercontent in travis r tests.	2019-02-19 14:09:10 -08:00
Nan Zhu	1dac5e2410	more correct way to build node stats in distributed fast hist (#4140 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * more changes * temp * update * udpate rabit * change the histogram * update kfactor * sync per node stats * temp * update * final * code clean * update rabit * more cleanup * fix errors * fix failed tests * enforce c++11 * broadcast subsampled feature correctly * init col * temp * col sampling * fix histmastrix init * fix col sampling * remove cout * fix out of bound access * fix core dump remove core dump file * update * add fid * update * revert some changes * temp * temp * pass all tests * bring back some tests * recover some changes * fix lint issue * enable monotone and interaction constraints * don't specify default for monotone and interactions * recover column init part * more recovery * fix core dumps * code clean * revert some changes * fix test compilation issue * fix lint issue * resolve compilation issue * fix issues of lint caused by rebase * fix stylistic changes and change variable names * modularize depth width * address the comments * fix failed tests * wrap perf timers with class * temp * pass all lossguide * pass tests * add comments * more changes * use separate flow for single and tests * add test for lossguide hist * remove duplications * syncing stats for only once * recover more changes * recover more changes * fix root-stats * simplify code * remove outdated comments	2019-02-18 13:45:30 -08:00
Jiaming Yuan	a985a99cf0	Accept numpy array view. (#4147 ) * Accept array view (slice) in metainfo.	2019-02-18 22:21:34 +08:00
Philip Hyunsu Cho	549c8d6ae9	Prevent empty quantiles in fast hist (#4155 ) * Prevent empty quantiles * Revise and improve unit tests for quantile hist * Remove unnecessary comment * Add #2943 as a test case * Skip test if no sklearn * Revise misleading comments	2019-02-17 16:01:07 -08:00
Jiaming Yuan	e1240413c9	Fix gpu_hist apply_split test. (#4158 )	2019-02-18 02:48:28 +08:00
Jiaming Yuan	1fe874e58a	Fix empty subspan. (#4151 ) * Silent the death tests.	2019-02-17 04:48:03 +08:00
Jiaming Yuan	754fe8142b	Make `HistCutMatrix::Init' be aware of groups. (#4115 ) * Add checks for group size. * Simple docs. * Search group index during hist cut matrix initialization. Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>	2019-02-16 04:39:41 +08:00
Philip Hyunsu Cho	37ddfd7d6e	Fix broken R test: Install Homebrew GCC (#4142 ) * Fix broken R test: Install Homebrew GCC Missing GCC Fortran causes installation failure of a dependency package (igraph) * Register gfortran system-wide * Use correct keg * Set env vars to change compiler choice * Do not break other Mac builds * Nuclear option: symlink gfortran * Use /usr/local/bin instead of /usr/bin * Symlink library path too * Update run_test.sh	2019-02-15 07:23:05 -08:00
Nan Zhu	c18a3660fa	Separate Depthwidth and Lossguide growing policy in fast histogram (#4102 ) * add back train method but mark as deprecated * add back train method but mark as deprecated * add back train method but mark as deprecated * fix scalastyle error * fix scalastyle error * fix scalastyle error * fix scalastyle error * init * more changes * temp * update * udpate rabit * change the histogram * update kfactor * sync per node stats * temp * update * final * code clean * update rabit * more cleanup * fix errors * fix failed tests * enforce c++11 * broadcast subsampled feature correctly * init col * temp * col sampling * fix histmastrix init * fix col sampling * remove cout * fix out of bound access * fix core dump remove core dump file * disbale test temporarily * update * add fid * print perf data * update * revert some changes * temp * temp * pass all tests * bring back some tests * recover some changes * fix lint issue * enable monotone and interaction constraints * don't specify default for monotone and interactions * recover column init part * more recovery * fix core dumps * code clean * revert some changes * fix test compilation issue * fix lint issue * resolve compilation issue * fix issues of lint caused by rebase * fix stylistic changes and change variable names * use regtree internal function * modularize depth width * address the comments * fix failed tests * wrap perf timers with class * fix lint * fix num_leaves count * fix indention * Update src/tree/updater_quantile_hist.cc Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.h Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.cc Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.cc Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.cc Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * Update src/tree/updater_quantile_hist.h Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com> * merge * fix compilation	2019-02-13 12:56:19 -08:00
Rong Ou	3be1b9ae30	reformat benchmark_tree.py to get rid of lint errors (#4126 )	2019-02-13 18:54:56 +13:00
Jiaming Yuan	f8ca2960fc	Use nccl group calls to prevent from dead lock. (#4113 ) * launch all reduce sequentially. * Fix gpu_exact test memory leak.	2019-02-08 06:12:39 +08:00
Jiaming Yuan	017c97b8ce	Clean up training code. (#3825 ) * Remove GHistRow, GHistEntry, GHistIndexRow. * Remove kSimpleStats. * Remove CheckInfo, SetLeafVec in GradStats and in SKStats. * Clean up the GradStats. * Cleanup calcgain. * Move LossChangeMissing out of common. * Remove [] operator from GHistIndexBlock.	2019-02-07 14:22:13 +08:00
Jiaming Yuan	8905df4a18	Perform clang-tidy on both cpp and cuda source. (#4034 ) * Basic script for using compilation database. * Add `GENERATE_COMPILATION_DATABASE' to CMake. * Rearrange CMakeLists.txt. * Add basic python clang-tidy script. * Remove modernize-use-auto. * Add clang-tidy to Jenkins * Refine logic for correct path detection In Jenkins, the project root is of form /home/ubuntu/workspace/xgboost_PR-XXXX * Run clang-tidy in CUDA 9.2 container * Use clang_tidy container	2019-02-05 16:07:43 +08:00
Philip Hyunsu Cho	7a652a8c64	Speed up Jenkins by not compiling CMake (#4099 )	2019-02-03 00:08:14 -08:00
tmitanitky	59f868bc60	enable xgb_model in scklearn XGBClassifier and test. (#4092 ) * Enable xgb_model parameter in XGClassifier scikit-learn API https://github.com/dmlc/xgboost/issues/3049 * add test_XGBClassifier_resume(): test for xgb_model parameter in XGBClassifier API. * Update test_with_sklearn.py * Fix lint	2019-01-31 11:29:19 -08:00
Philip Hyunsu Cho	a1c35cadf0	Fix failing Travis CI on Mac (#4086 ) * Fix failing Travis CI on Mac Use Homebrew Addon + latest Mac image * Use long command for pytest * Downgrade OSX image to xcode9.3, to use Java 8 * Install pytest in Python 2 environment * Remove clang-tidy from Travis	2019-01-30 09:43:57 -08:00
Rory Mitchell	1fc37e4749	Require leaf statistics when expanding tree (#4015 ) * Cache left and right gradient sums * Require leaf statistics when expanding tree	2019-01-17 21:12:20 -08:00
Andy Adinets	0f8af85f64	Fixed single-GPU tests. (#4053 ) - ./testxgboost (without filters) failed if run on a multi-GPU machine because the memory was allocated on the current device, but device 0 was always passed into LaunchN	2019-01-11 09:33:15 +02:00
Jiaming Yuan	1f022929f4	Use Span in gpu coordinate. (#4029 ) * Use Span in gpu coordinate. * Use Span in device code. * Fix shard size calculation. - Use lower_bound instead of upper_bound. * Check empty devices.	2019-01-02 11:32:43 +08:00
Jiaming Yuan	be948df23f	Fix ignoring dart in updater configuration. (#4024 ) * Fix ignoring dart in updater configuration.	2018-12-26 18:24:45 +08:00
Jiaming Yuan	9897b5042f	Use Span in GPU exact updater. (#4020 ) * Use Span in GPU exact updater. * Add a small test.	2018-12-26 12:44:46 +08:00
Jiaming Yuan	85939c6a6e	Merge duplicated linear updater parameters. (#4013 ) * Merge duplicated linear updater parameters. * Split up coordinate descent parameter.	2018-12-22 13:21:49 +08:00
Rory Mitchell	f75a21af25	Reduce tree expand boilerplate code (#4008 )	2018-12-20 15:52:28 +13:00
Rory Mitchell	84c99f86f4	Combine TreeModel and RegTree (#3995 )	2018-12-19 12:16:40 +13:00
Jiaming Yuan	c8c7b9649c	Fix and optimize logger (#4002 ) * Fix logging switch statement. * Remove debug_verbose_ in AllReducer. * Don't construct the stream when not needed. * Make default constructor deleted. * Remove redundant IsVerbose.	2018-12-17 19:23:05 +08:00
Andy Adinets	42bf90eb8f	Column sampling at individual nodes (splits). (#3971 ) * Column sampling at individual nodes (splits). * Documented colsample_bynode parameter. - also updated documentation for colsample_by* parameters * Updated documentation. * GetFeatureSet() returns shared pointer to std::vector. * Sync sampled columns across multiple processes.	2018-12-14 22:37:35 +08:00
Jiaming Yuan	e0a279114e	Unify logging facilities. (#3982 ) * Unify logging facilities. * Enhance `ConsoleLogger` to handle different verbosity. * Override macros from `dmlc`. * Don't use specialized gamma when building with GPU. * Remove verbosity cache in monitor. * Test monitor. * Deprecate `silent`. * Fix doc and messages. * Fix python test. * Fix silent tests.	2018-12-14 19:29:58 +08:00
Rory Mitchell	3d81c48d3f	Remove leaf vector, add tree serialisation test, fix Windows tests (#3989 )	2018-12-13 10:28:38 +13:00
Rory Mitchell	93f9ce9ef9	Single precision histograms on GPU (#3965 ) * Allow single precision histogram summation in gpu_hist * Add python test, reduce run-time of gpu_hist tests * Update documentation	2018-12-10 10:55:30 +13:00
Jiaming Yuan	48dddfd635	Porting elementwise metrics to GPU. (#3952 ) * Port elementwise metrics to GPU. * All elementwise metrics are converted to static polymorphic. * Create a reducer for metrics reduction. * Remove const of Metric::Eval to accommodate CubMemory.	2018-12-01 18:46:45 +13:00

525 Commits