353 Commits

Author SHA1 Message Date
trivialfis
2c502784ff Span class. (#3548)
* Add basic Span class based on ISO C++20.

* Use Span<Entry const> instead of Inst in SparsePage.

* Add DeviceSpan in HostDeviceVector, use it in regression obj.
2018-08-14 17:58:11 +12:00
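
For reference, a Span in the ISO C++20 sense is a non-owning view over a contiguous block of memory. A minimal sketch of the idea (hypothetical code, not the class added in #3548):

    #include <cstddef>

    // Minimal non-owning view over contiguous memory, in the spirit of
    // the ISO C++20 std::span. Hypothetical sketch, not xgboost's class.
    template <typename T>
    class Span {
     public:
      Span(T* data, std::size_t size) : data_(data), size_(size) {}
      T& operator[](std::size_t i) const { return data_[i]; }
      T* begin() const { return data_; }
      T* end() const { return data_ + size_; }
      std::size_t size() const { return size_; }
     private:
      T* data_;
      std::size_t size_;
    };
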
Rory Mitchell
645996b12f Remove accidental SparsePage copies (#3583) 2018-08-12 17:49:38 -07:00
Rory Mitchell
bbb771f32e
Refactor parts of fast histogram utilities (#3564)
* Refactor parts of fast histogram utilities

* Removed byte packing from column matrix
2018-08-09 17:59:57 +12:00
Philip Hyunsu Cho
7fefd6865d
Fix #3402: wrong fid crashes distributed algorithm (#3535)
* Fix #3402: wrong fid crashes distributed algorithm

The bug was introduced by the recent DMatrix refactor (#3301). It was partially
fixed by #3408 but the example in #3402 was still failing. The example in #3402
will succeed after this fix is applied.

* Explicitly specify "this" to prevent compile error

* Add regression test

* Add distributed test to Travis matrix

* Install kubernetes Python package as dependency of dmlc tracker

* Add Python dependencies

* Add compile step

* Reduce size of regression test case

* Further reduce size of test
2018-08-04 19:20:04 -07:00
jqmp
dd07c25d12 Fix typo in ElasticNet threshold function (#3527) 2018-07-30 14:08:14 +12:00
Rory Mitchell
07ff52d54c
Dynamically allocate GPU histogram memory (#3519)
* Expand histogram memory dynamically to prevent large allocations for large tree depths (e.g. > 15)

* Remove GPU memory allocation messages. These are misleading as a large number of allocations are now dynamic.

* Fix appveyor R test
2018-07-28 21:22:41 +12:00
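
The idea behind #3519: a tree of depth d has up to 2^d nodes, so preallocating one histogram per possible node explodes for deep trees. Allocating per-node histograms only when a node is actually expanded avoids this. A host-side sketch under that assumption (names hypothetical):

    #include <cstddef>
    #include <unordered_map>
    #include <vector>

    // Grow histogram storage on demand instead of reserving 2^depth
    // histograms up front. Hypothetical sketch of the idea in #3519.
    class HistogramPool {
     public:
      explicit HistogramPool(std::size_t n_bins) : n_bins_(n_bins) {}
      // Returns the histogram for a node, allocating it on first use.
      std::vector<double>& GetOrAllocate(int node_id) {
        auto it = pool_.find(node_id);
        if (it == pool_.end()) {
          it = pool_.emplace(node_id, std::vector<double>(n_bins_, 0.0)).first;
        }
        return it->second;
      }
     private:
      std::size_t n_bins_;
      std::unordered_map<int, std::vector<double>> pool_;
    };
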
Andy Adinets
cc6a5a3666 Added quantile finding on the GPU. (#3393)
* Added quantile finding on the GPU.

- this includes datasets where weights are assigned to data rows
- as the quantiles found by the new algorithm are not the same as those found by the old one, test thresholds in tests/python-gpu/test_gpu_updaters.py have been adjusted

* Adjustments and improved testing for finding quantiles on the GPU.

- added C++ tests for the DeviceSketch() function
- reduced one of the thresholds in test_gpu_updaters.py
- adjusted the cuts found by the find_cuts_k kernel
2018-07-27 14:03:16 +12:00
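
For intuition, a weighted quantile cut is the feature value at which the cumulative row weight crosses the next fraction of the total weight. A simplified host-side sketch of that idea; the actual DeviceSketch() GPU algorithm is considerably more involved:

    #include <algorithm>
    #include <utility>
    #include <vector>

    // Simplified weighted quantile cuts: sort (value, weight) pairs and
    // emit a cut each time cumulative weight crosses total / k.
    // Illustrative only; not the DeviceSketch() algorithm itself.
    std::vector<float> WeightedCuts(std::vector<std::pair<float, float>> vw,
                                    int k) {
      std::sort(vw.begin(), vw.end());
      double total = 0.0;
      for (const auto& p : vw) total += p.second;
      std::vector<float> cuts;
      double cum = 0.0, step = total / k, next = step;
      for (const auto& p : vw) {
        cum += p.second;
        if (cum >= next) { cuts.push_back(p.first); next += step; }
      }
      return cuts;
    }
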
Rory Mitchell
a725272e19
Correct mistake from dmatrix refactor (#3408) 2018-07-24 15:03:36 +12:00
Rory Mitchell
0f145a0365
Resolve GPU bug on large files (#3472)
Remove calls to thrust copy, fix indexing bug
2018-07-16 20:43:45 +12:00
Henry Gouk
a13e29ece1 Add LASSO (#3429)
* Allow multiple split constraints

* Replace RidgePenalty with ElasticNet

* Add test for checking Ridge, LASSO, and Elastic Net are implemented
2018-07-15 16:38:26 +12:00
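
For context, the L1 term of an elastic net penalty enters the leaf weight through soft-thresholding of the gradient sum, w* = -ThresholdL1(G, alpha) / (H + lambda). A sketch of that standard formulation (mirroring, not reproducing, the code touched in #3429 and #3527):

    // Soft-thresholding for the L1 term of an elastic net penalty.
    // Sketch of the standard formulation; not lifted from the PR itself.
    inline double ThresholdL1(double g, double alpha) {
      if (g > alpha)  return g - alpha;
      if (g < -alpha) return g + alpha;
      return 0.0;
    }

    // Optimal leaf weight with L1 (alpha) and L2 (lambda) regularisation:
    // w* = -ThresholdL1(G, alpha) / (H + lambda)
    inline double CalcWeight(double sum_grad, double sum_hess,
                             double alpha, double lambda) {
      return -ThresholdL1(sum_grad, alpha) / (sum_hess + lambda);
    }
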
Thejaswi
2200939416 Upgrading to NCCL2 (#3404)
* Upgrading to NCCL2

* Part II of the NCCL2 upgrade

 - Doc updates for building with nccl2
 - Dockerfile.gpu update for a correct CI build with nccl2
 - Updated FindNccl package so the env var NCCL_ROOT takes precedence

* Upgrading to CUDA v9.2 for the CI workflow, since it has the nccl2 binaries available

* Added the NCCL2 license and copied the nccl binaries into /usr for the FindNccl module to find

* Set the LD_LIBRARY_PATH variable to pick up the nccl2 binary at runtime

* Added the nccl2 library download instructions to Dockerfile.release as well

* Use NCCL2 as a static library
2018-07-10 00:42:15 -07:00
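
NCCL2's core primitive here is an allreduce over per-GPU buffers. A minimal single-process, multi-GPU sketch against the NCCL2 API (error checking elided; the 8-communicator cap is an assumption of this sketch):

    #include <cuda_runtime.h>
    #include <nccl.h>

    // Minimal single-process NCCL2 allreduce across n_gpus devices.
    // Sketch only: error checking elided; assumes n_gpus <= 8.
    void AllReduceSum(float** dev_bufs, int count, int n_gpus) {
      ncclComm_t comms[8];
      ncclCommInitAll(comms, n_gpus, nullptr);  // nullptr: devices 0..n_gpus-1
      ncclGroupStart();  // required when one thread drives several devices
      for (int i = 0; i < n_gpus; ++i) {
        cudaSetDevice(i);
        // In-place sum; every GPU ends up with the global result.
        ncclAllReduce(dev_bufs[i], dev_bufs[i], count, ncclFloat, ncclSum,
                      comms[i], 0);
      }
      ncclGroupEnd();
      for (int i = 0; i < n_gpus; ++i) ncclCommDestroy(comms[i]);
    }
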
Henry Gouk
64b8cffde3 Refactor of FastHistMaker to allow for custom regularisation methods (#3335)
* Refactor to allow for custom regularisation methods

* Implement compositional SplitEvaluator framework

* Fixed segfault when no monotone_constraints are supplied.

* Change pid to parentID

* test_monotone_constraints.py now passes

* Refactor ColMaker and DistColMaker to use SplitEvaluator

* Performance optimisation when no monotone_constraints specified

* Fix linter messages

* Fix a few more linter errors

* Update the amalgamation

* Add bounds check

* Add check for leaf node

* Fix linter error in param.h

* Fix clang-tidy errors on CI

* Fix incorrect function name

* Fix clang-tidy error in updater_fast_hist.cc

* Enable SSE2 for Win32 R MinGW

Addresses https://github.com/dmlc/xgboost/pull/3335#issuecomment-400535752

* Add contributor
2018-06-28 07:37:25 +00:00
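
The compositional idea in #3335: each regulariser is a SplitEvaluator, and evaluators can wrap one another so that penalties stack. A hypothetical interface sketch, not the PR's exact classes:

    #include <memory>
    #include <utility>

    // Compositional split evaluation (hypothetical sketch): evaluators
    // wrap an inner evaluator, so regularisation penalties can stack.
    class SplitEvaluator {
     public:
      virtual ~SplitEvaluator() = default;
      virtual double SplitScore(double left_gain, double right_gain) const {
        return left_gain + right_gain;
      }
    };

    class ElasticNetEvaluator : public SplitEvaluator {
     public:
      ElasticNetEvaluator(std::unique_ptr<SplitEvaluator> inner,
                          double penalty)
          : inner_(std::move(inner)), penalty_(penalty) {}
      double SplitScore(double left_gain, double right_gain) const override {
        // Delegate to the wrapped evaluator, then apply this penalty.
        return inner_->SplitScore(left_gain, right_gain) - penalty_;
      }
     private:
      std::unique_ptr<SplitEvaluator> inner_;
      double penalty_;
    };
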
Thejaswi
0e78034607 Shared memory atomics while building histogram (#3384)
* Use shared memory atomics for building histograms, whenever possible
2018-06-19 16:03:09 +12:00
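
The pattern in #3384: accumulate each thread block's histogram contributions with atomics in fast on-chip shared memory, then flush once per block to global memory, cutting global atomic traffic. A hedged CUDA sketch that assumes the histogram fits in shared memory:

    // Build a histogram with shared-memory atomics, flushing block-local
    // sums to global memory once per block. Sketch; assumes n_bins is
    // small enough to fit in shared memory.
    // launch: HistKernel<<<n_blocks, n_threads, n_bins * sizeof(float)>>>(...)
    __global__ void HistKernel(const int* bin_idx, const float* grad,
                               int n, float* global_hist, int n_bins) {
      extern __shared__ float smem_hist[];
      for (int b = threadIdx.x; b < n_bins; b += blockDim.x)
        smem_hist[b] = 0.0f;
      __syncthreads();
      for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
           i += gridDim.x * blockDim.x) {
        atomicAdd(&smem_hist[bin_idx[i]], grad[i]);  // cheap, on-chip
      }
      __syncthreads();
      for (int b = threadIdx.x; b < n_bins; b += blockDim.x)
        atomicAdd(&global_hist[b], smem_hist[b]);    // one flush per block
    }
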
Rory Mitchell
a96039141a
Dmatrix refactor stage 1 (#3301)
* Use sparse page as singular CSR matrix representation

* Simplify dmatrix methods

* Reduce statefulness of batch iterators

* BREAKING CHANGE: Remove prob_buffer_row parameter. Users are instead recommended to sample their dataset as a preprocessing step before using XGBoost.
2018-06-07 10:25:58 +12:00
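
The "singular CSR representation" stores every batch as one compressed-sparse-row structure: an offset array delimiting each row's slice of a flat entry array. A schematic layout (field names illustrative, not the exact SparsePage definition):

    #include <cstddef>
    #include <vector>

    struct Entry { unsigned index; float fvalue; };  // (feature id, value)

    // Schematic CSR page: row i occupies data[offset[i] .. offset[i+1]).
    // Illustrative layout, not the exact SparsePage definition.
    struct SparsePageSketch {
      std::vector<std::size_t> offset;  // size = n_rows + 1
      std::vector<Entry> data;          // all non-missing entries, row-major
    };
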
Andy Adinets
286dccb8e8 GPU binning and compression. (#3319)
* GPU binning and compression.

- binning and index compression are done inside the DeviceShard constructor
- in case of a DMatrix with multiple row batches, it is first converted into a single row batch
2018-06-05 17:15:13 +12:00
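
Index compression here means storing each bin index with only as many bits as the number of bins requires, rather than a full 32-bit integer. A simplified host-side sketch of the bit-packing idea (not xgboost's actual writer):

    #include <cstdint>
    #include <vector>

    // Pack bin indices using just enough bits per symbol. Simplified
    // illustration of index compression, not xgboost's implementation.
    std::vector<std::uint8_t> PackBins(const std::vector<std::uint32_t>& bins,
                                       std::uint32_t n_bins) {
      int bits = 1;
      while ((1u << bits) < n_bins) ++bits;   // bits = ceil(log2(n_bins))
      std::vector<std::uint8_t> out((bins.size() * bits + 7) / 8, 0);
      std::size_t bitpos = 0;
      for (std::uint32_t v : bins) {
        for (int b = 0; b < bits; ++b, ++bitpos) {
          if (v & (1u << b))
            out[bitpos / 8] |= static_cast<std::uint8_t>(1u << (bitpos % 8));
        }
      }
      return out;
    }
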
Samuel O. Ronsin
cc79a65ab9 Increase precision of bst_float values in tree dumps (#3298)
* Increase precision of bst_float values in tree dumps

* Increase precision of bst_float values in tree dumps

* Fix lint error and switch precision to right float variable

* Fix clang-tidy error
2018-05-09 14:12:21 -07:00
Andrew V. Adinetz
b8a0d66fe6 Multi-GPU HostDeviceVector. (#3287)
* Multi-GPU HostDeviceVector.

- HostDeviceVector instances can now span multiple devices, defined by GPUSet struct
- the interface of HostDeviceVector has been modified accordingly
- GPU objective functions are now multi-GPU
- GPU predicting from cache is now multi-GPU
- avoiding omp_set_num_threads() calls
- other minor changes
2018-05-05 08:00:05 +12:00
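
Conceptually, a HostDeviceVector keeps a host copy and a device copy of the data and tracks which side is current, copying only when the stale side is accessed. A single-GPU sketch of that concept (the PR generalises it across a GPUSet; error checks elided):

    #include <cstddef>
    #include <cuda_runtime.h>
    #include <vector>

    // Lazy host/device synchronisation, single GPU. Sketch of the
    // concept only; the real class spans multiple devices via GPUSet.
    class HostDeviceVec {
     public:
      explicit HostDeviceVec(std::size_t n) : host_(n) {
        cudaMalloc(&device_, n * sizeof(float));
      }
      ~HostDeviceVec() { cudaFree(device_); }
      float* DevicePointer() {            // sync host -> device if stale
        if (!on_device_) {
          cudaMemcpy(device_, host_.data(), host_.size() * sizeof(float),
                     cudaMemcpyHostToDevice);
          on_device_ = true;
        }
        return device_;
      }
      std::vector<float>& HostVector() {  // sync device -> host if stale
        if (on_device_) {
          cudaMemcpy(host_.data(), device_, host_.size() * sizeof(float),
                     cudaMemcpyDeviceToHost);
          on_device_ = false;
        }
        return host_;
      }
     private:
      std::vector<float> host_;
      float* device_ = nullptr;
      bool on_device_ = false;
    };
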
Thejaswi
c80d51ccb3 Fix issue #3264, accuracy issues on k80 GPUs. (#3293) 2018-05-04 13:14:08 +12:00
Rory Mitchell
ccf80703ef
Clang-tidy static analysis (#3222)
* Clang-tidy static analysis

* Modernise checks

* Google coding standard checks

* Identifier renaming according to Google style
2018-04-19 18:57:13 +12:00
Rory Mitchell
a1ec7b1716
Change reduce operation from thrust to cub. Fix for cuda 9.1 error (#3218)
* Change reduce operation from thrust to cub. Fix for cuda 9.1 runtime error

* Unit test sum reduce
2018-04-04 14:21:48 +12:00
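
cub reductions follow a two-call pattern: the first call only sizes the temporary storage, the second performs the reduction. A minimal sketch of the pattern used here in place of thrust::reduce:

    #include <cub/cub.cuh>
    #include <cuda_runtime.h>

    // Two-phase cub reduction: query temp-storage size, then reduce.
    // Minimal sketch of the pattern; error handling elided.
    float SumOnDevice(const float* d_in, int n) {
      float* d_out = nullptr;
      cudaMalloc(&d_out, sizeof(float));
      void* d_temp = nullptr;
      size_t temp_bytes = 0;
      cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, n);  // size query
      cudaMalloc(&d_temp, temp_bytes);
      cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, n);  // reduce
      float result = 0.0f;
      cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
      cudaFree(d_temp);
      cudaFree(d_out);
      return result;
    }
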
Andrew V. Adinetz
a1b48afa41 Added back UpdatePredictionCache() in updater_gpu_hist.cu. (#3120)
* Added back UpdatePredictionCache() in updater_gpu_hist.cu.

- it had been there before, but wasn't ported to the new version
  of updater_gpu_hist.cu
2018-03-09 15:06:45 +13:00
redditur
d5f1b74ef5 'hist': Monotonic Constraints (#3085)
* Extended monotonic constraint support to the 'hist' tree method.

* Added monotonic constraints tests.

* Fix the signature of NoConstraint::CalcSplitGain()

* Document monotonic constraint support in 'hist'

* Update signature of Update to account for latest refactor
2018-03-05 16:45:49 -08:00
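
A monotone constraint on a feature rejects any split whose child weights would violate the requested direction; an increasing constraint, for example, requires the right child's weight to be at least the left's. A sketch of the standard check (this version zeroes the gain; the exact handling in the PR may differ):

    // Gain for a candidate split under a monotone constraint:
    // +1 requires w_left <= w_right, -1 the reverse, 0 unconstrained.
    // Sketch of the standard check, not the PR's exact code.
    inline double ConstrainedGain(double gain, double w_left, double w_right,
                                  int constraint) {
      if (constraint > 0 && w_left > w_right) return 0.0;  // violates increase
      if (constraint < 0 && w_left < w_right) return 0.0;  // violates decrease
      return gain;
    }
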
Andrew V. Adinetz
d5992dd881 Replaced std::vector-based interfaces with HostDeviceVector-based interfaces. (#3116)
* Replaced std::vector-based interfaces with HostDeviceVector-based interfaces.

- replacement was performed in the learner, boosters, predictors,
  updaters, and objective functions
- only interfaces used in training were replaced;
  interfaces like PredictInstance() still use std::vector
- refactoring necessary for replacement of interfaces was also performed,
  such as using HostDeviceVector in prediction cache

* HostDeviceVector-based interfaces for custom objective function example plugin.
2018-02-28 13:00:04 +13:00
Rory Mitchell
dd82b28e20
Update GPU code with dmatrix changes (#3117) 2018-02-17 12:11:48 +13:00
Rory Mitchell
f87802f00c
Fix GPU bugs (#3051)
* Change uint to unsigned int

* Fix no root predictions bug

* Remove redundant splitting due to numerical instability
2018-01-23 13:14:15 +13:00
Thejaswi
84ab74f3a5 Objective function evaluation on GPU with minimal PCIe transfers (#2935)
* Added GPU objective function and no-copy interface.

- xgboost::HostDeviceVector<T> syncs automatically between host and device
- no-copy interfaces have been added
- default implementations just sync the data to host
  and call the implementations with std::vector
- GPU objective function, predictor, histogram updater process data
  directly on GPU
2018-01-12 21:33:39 +13:00
PSEUDOTENSOR / Jonathan McKinney
4d36036fe6 Avoid repeated cuda API call in GPU predictor and only synchronize used GPUs (#2936) 2017-12-09 16:00:42 +13:00
Rory Mitchell
1b77903eeb
Fix several GPU bugs (#2916)
* Fix #2905

* Fix gpu_exact test failures

* Fix bug in GPU prediction where multiple calls to batch prediction can produce incorrect results

* Fix GPU documentation formatting
2017-12-04 08:27:49 +13:00
Rory Mitchell
c51adb49b6
Monotone constraints for gpu_hist (#2904) 2017-11-30 10:26:19 +13:00
Rory Mitchell
c55f14668e
Update gpu_hist algorithm (#2901) 2017-11-27 13:44:24 +13:00
Rory Mitchell
24f527a1c0
AVX gradients (#2878)
* AVX gradients

* Add google test for AVX

* Create fallback implementation, remove fma instruction

* Improved accuracy of AVX exp function
2017-11-27 08:56:01 +13:00
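
For illustration, an AVX path processes eight floats per instruction, with a scalar fallback when AVX is unavailable. A simplified sketch for a squared-error gradient grad = pred - label (the PR's vectorised exp for logistic loss is more involved):

    #include <immintrin.h>

    // grad[i] = pred[i] - label[i], 8 floats at a time with AVX,
    // scalar tail/fallback otherwise. Simplified illustration.
    void ComputeGradients(const float* pred, const float* label,
                          float* grad, int n) {
      int i = 0;
    #ifdef __AVX__
      for (; i + 8 <= n; i += 8) {
        __m256 p = _mm256_loadu_ps(pred + i);
        __m256 y = _mm256_loadu_ps(label + i);
        _mm256_storeu_ps(grad + i, _mm256_sub_ps(p, y));
      }
    #endif
      for (; i < n; ++i) grad[i] = pred[i] - label[i];  // fallback / tail
    }
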
Rory Mitchell
40c6e2f0c8
Improved gpu_hist_experimental algorithm (#2866)
- Implement colsampling, subsampling for gpu_hist_experimental

- Optimised multi-GPU implementation for gpu_hist_experimental

- Make nccl optional

- Add Volta architecture flag

- Optimise RegLossObj

- Add timing utilities for debug verbose mode

- Bump required cuda version to 8.0
2017-11-11 13:58:40 +13:00
Rory Mitchell
d9d5293cdb Add warnings for large labels when using GPU histogram algorithms (#2834) 2017-10-26 17:31:10 +13:00
Rory Mitchell
13e7a2cff0 Various bug fixes (#2825)
* Fatal error if GPU algorithm selected without GPU support compiled

* Resolve type conversion warnings

* Fix gpu unit test failure

* Fix compressed iterator edge case

* Fix python unit test failures due to flake8 update on pip
2017-10-25 14:45:01 +13:00
Rory Mitchell
4cb2f7598b Add experimental GPU algorithm for lossguided mode (#2755)
- Improved GPU algorithm unit tests
- Removed some thrust code to improve compile times
2017-10-01 00:18:35 +13:00
Vadim Khotilovich
74db9757b3 [R package] GPU support (#2732)
* [R] MSVC compatibility

* [GPU] allow seed in BernoulliRng up to size_t and scale to uint32_t

* R package build with cmake and CUDA

* R package CUDA build fixes and cleanups

* always export the R package native initialization routine on windows

* update the install instructions doc

* fix lint

* use static_cast directly to set BernoulliRng seed

* [R] demo for GPU accelerated algorithm

* tidy up the R package cmake stuff

* R package cmake: installs main dependency packages if needed

* [R] version bump in DESCRIPTION

* update NEWS

* added short missing/sparse values explanations to FAQ
2017-09-28 18:15:28 -05:00
Rory Mitchell
e6a9063344 Integer gradient summation for GPU histogram algorithm. (#2681) 2017-09-08 15:07:29 +12:00
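
Summing floats with atomics is non-deterministic because floating-point addition is not associative; converting gradients to fixed-point integers makes the accumulated bits independent of summation order. A host-side sketch of the fixed-point idea (the 2^16 scale factor is illustrative):

    #include <cstdint>
    #include <vector>

    // Order-independent summation via fixed-point integers: integer
    // addition is associative, so any summation order gives the same
    // bits. Sketch; the 2^16 scale factor here is illustrative.
    std::int64_t ToFixed(float g) {
      return static_cast<std::int64_t>(g * 65536.0f);
    }

    float SumGradients(const std::vector<float>& grads) {
      std::int64_t acc = 0;
      for (float g : grads) acc += ToFixed(g);  // deterministic in any order
      return static_cast<float>(acc) / 65536.0f;
    }
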
Rory Mitchell
15267eedf2 [GPU-Plugin] Major refactor 2 (#2664)
* Change cmake option

* Move source files

* Move google tests

* Move python tests

* Move benchmarks

* Move documentation

* Remove makefile support

* Fix test run

* Move GPU tests
2017-09-08 09:57:16 +12:00
Rory Mitchell
19a53814ce [GPU-Plugin] Major refactor (#2644)
* Removal of redundant code/files.
* Removal of exact namespace in GPU plugin
* Revert double precision histograms to single precision for performance on Maxwell/Kepler
2017-08-30 10:53:52 +12:00
PSEUDOTENSOR / Jonathan McKinney
ca7fc9fda3 [GPU-Plugin] Fix gpu_hist to allow matrices with more than 2^32 elements. Also fixed CPU hist algorithm. (#2518) 2017-07-18 11:19:27 +12:00
Philip Cho
64c8f6fa6d Use old parallel algorithm for histogram construction by default (#2501)
It has been reported that the new parallel algorithm (#2493) results in excessive
memory usage (see issue #2326). Until these issues are resolved, XGBoost should use
the old parallel algorithm by default. The user would have to specify
`enable_feature_grouping=1` manually to enable the new algorithm.
2017-07-10 09:35:48 -07:00
Vadim Khotilovich
7350085955 Fix broken make on windows (#2499)
* fix Makefile for make on windows

* clean up compilation warnings

* fix for `no file name for include` make warning
2017-07-08 09:17:31 -07:00
Philip Cho
ba820847f9 Patch to improve multithreaded performance scaling (#2493)
* Patch to improve multithreaded performance scaling

Change the parallel strategy for histogram construction: instead of
partitioning data rows among multiple threads, partition feature columns.
Useful heuristics for assigning partitions have been adopted from the
LightGBM project.

* Add missing header to satisfy MSVC

* Restore max_bin and related parameters to TrainParam

* Fix lint error

* inline functions do not require static keyword

* Feature grouping algorithm accepting FastHistParam

The feature grouping algorithm accepts many parameters (3+), and it gets annoying to
pass them one by one. Instead, simply pass a reference to FastHistParam. The
definition of FastHistParam has been moved to a separate header file to
accommodate this change.
2017-07-07 08:25:07 -07:00
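
Partitioning rows among threads makes every thread write to every histogram bin, forcing locks or per-thread copies; partitioning by feature column instead gives each thread exclusive ownership of its columns' bins. A simplified OpenMP sketch (assumes each feature's bins occupy a disjoint global index range):

    #include <cstddef>
    #include <vector>

    // Column-parallel histogram build: each thread owns whole feature
    // columns, so no two threads touch the same bins. Simplified sketch;
    // col_bins[f][r] is the global bin id of row r for feature f
    // (dense columns assumed).
    void BuildHistByColumn(const std::vector<std::vector<int>>& col_bins,
                           const std::vector<float>& grad,
                           std::vector<double>* hist) {
    #pragma omp parallel for schedule(dynamic)
      for (int f = 0; f < static_cast<int>(col_bins.size()); ++f) {
        // All bins of feature f belong to this thread alone.
        for (std::size_t r = 0; r < col_bins[f].size(); ++r) {
          (*hist)[col_bins[f][r]] += grad[r];
        }
      }
    }
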
Rory Mitchell
5f1b0bb386 [GPU-Plugin] Unify gpu_gpair/bst_gpair. Refactor. (#2477) 2017-07-01 17:31:13 +12:00
PSEUDOTENSOR / Jonathan McKinney
6b287177c8 [GPU-Plugin] Multi-GPU gpu_id bug fixes for grow_gpu_hist and grow_gpu methods, and additional documentation for the gpu plugin. (#2463) 2017-06-30 20:04:17 +12:00
Rory Mitchell
0e48f87529 [GPU-Plugin] Make node_idx type 32 bit for hist algo. Set default n_gpus to 1. (#2445) 2017-06-23 18:26:45 +12:00
PSEUDOTENSOR / Jonathan McKinney
41efe32aa5 [GPU-Plugin] Multi-GPU for grow_gpu_hist histogram method using NVIDIA NCCL. (#2395) 2017-06-12 05:06:08 +12:00
Rory Mitchell
6bf968efe6 [GPU Plugin] Fast histogram speed improvements. Updated benchmarks. (#2258) 2017-05-08 09:21:38 -07:00
Rory Mitchell
8ab5d4611c [GPU-Plugin] (#2227)
* Add fast histogram algorithm
* Fix Linux build
* Add 'gpu_id' parameter
2017-04-25 16:37:10 -07:00
Philip Cho
2715baef64 Fix bugs in multithreaded ApplySplitSparseData() (#2161)
* Bugfix 1: Fix segfault in multithreaded ApplySplitSparseData()

When there are more threads than rows in the rowset, some threads end up
with empty ranges, causing them to crash. (iend - 1 needs to be
accessible as part of the algorithm.)

Fix: run only those threads with nonempty ranges.

* Add regression test for Bugfix 1

* Moving python_omp_test to existing python test group

Turns out you don't need to set "OMP_NUM_THREADS" to enable
multithreading. Just add the nthread parameter.

* Bugfix 2: Fix corner case of ApplySplitSparseData() for categorical feature

When split value is less than all cut points, split_cond is set
incorrectly.

Fix: set split_cond = -1 to indicate this scenario

* Bugfix 3: Initialize data layout indicator before using it

data_layout_ is accessed before being set; this variable determines
whether feature 0 is included in feat_set.

Fix: re-order code in InitData() to initialize data_layout_ first

* Adding regression test for Bugfix 2

Unfortunately, there is no regression test for Bugfix 3, as there is no
way to deterministically assign a value to an uninitialized variable.
2017-04-02 11:37:39 -07:00
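
The fix for Bugfix 1 amounts to letting a thread run only when its assigned row range is nonempty, since the loop body reads iend - 1. A minimal sketch of the guard:

    #include <omp.h>

    // Partition [0, nrows) among threads, but let a thread run only if
    // its range is nonempty (the body dereferences iend - 1). Sketch.
    void ApplyRows(int nrows) {
    #pragma omp parallel
      {
        const int tid = omp_get_thread_num();
        const int nthread = omp_get_num_threads();
        const int chunk = (nrows + nthread - 1) / nthread;
        const int ibegin = tid * chunk;
        const int iend = (ibegin + chunk < nrows) ? ibegin + chunk : nrows;
        if (ibegin < iend) {  // guard: skip empty ranges
          // ... process rows [ibegin, iend); may read row iend - 1 ...
        }
      }
    }
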