26 Commits

Author SHA1 Message Date
Jiaming Yuan
095de3bf5f
Export c++ headers in CMake installation. (#4897)
* Move get transpose into cc.

* Clean up headers in host device vector, remove thrust dependency.

* Move span and host device vector into public.

* Install c++ headers.

* Short notes for c and c++.

Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2019-10-06 23:53:09 -04:00
Rong Ou
562bb0ae31 remove device shards (#4867) 2019-09-25 13:15:46 +08:00
Rong Ou
38ab79f889 Make HostDeviceVector single gpu only (#4773)
* Make HostDeviceVector single gpu only
2019-08-26 09:51:13 +12:00
Rong Ou
6edddd7966 Refactor DMatrix to return batches of different page types (#4686)
* Use explicit template parameter for specifying page type.
2019-08-03 15:10:34 -04:00
Jiaming Yuan
f0064c07ab
Refactor configuration [Part II]. (#4577)
* Refactor configuration [Part II].

* General changes:
** Remove `Init` methods to avoid ambiguity.
** Remove `Configure(std::map<>)` to avoid redundant copying and prepare for
   parameter validation. (`std::vector` is returned from `InitAllowUnknown`).
** Add name to tree updaters for easier debugging.

* Learner changes:
** Make `LearnerImpl` the only source of configuration.

    All configurations are stored and carried out by `LearnerImpl::Configure()`.

** Remove booster in C API.

    Originally kept for "compatibility reason", but did not state why.  So here
    we just remove it.

** Add a `metric_names_` field in `LearnerImpl`.
** Remove `LazyInit`.  Configuration will always be lazy.
** Run `Configure` before every iteration.

* Predictor changes:
** Allocate both cpu and gpu predictor.
** Remove cpu_predictor from gpu_predictor.

    `GBTree` is now used to dispatch the predictor.

** Remove some GPU Predictor tests.

* IO

No IO changes.  The binary model format stability is tested by comparing
hashing value of save models between two commits
2019-07-20 08:34:56 -04:00
sriramch
90f683b25b Set the appropriate device before freeing device memory... (#4566)
* - set the appropriate device before freeing device memory...
   - pr #4532 added a global memory tracker/logger to keep track of number of (de)allocations
     and peak memory usage on a per device basis.
   - this pr adds the appropriate check to make sure that the (de)allocation counts and memory usages
     makes sense for the device. since verbosity is typically increased on debug/non-retail builds.  
* - pre-create cub allocators and reuse them
   - create them once and not resize them dynamically. we need to ensure that these allocators
     are created and destroyed exactly once so that the appropriate device id's are set
2019-06-18 14:58:05 +12:00
Jiaming Yuan
c589eff941
De-duplicate GPU parameters. (#4454)
* Only define `gpu_id` and `n_gpus` in `LearnerTrainParam`
* Pass LearnerTrainParam through XGBoost vid factory method.
* Disable all GPU usage when GPU related parameters are not specified (fixes XGBoost choosing GPU over aggressively).
* Test learner train param io.
* Fix gpu pickling.
2019-05-29 11:55:57 +08:00
Rory Mitchell
3f312e30db
Retire DVec class in favour of c++20 style span for device memory. (#4293) 2019-03-28 13:59:58 +13:00
Rory Mitchell
5465b73e7c
Fix multi-GPU test failures (#4259) 2019-03-15 14:40:43 +13:00
Jiaming Yuan
7b9043cf71
Fix clang-tidy warnings. (#4149)
* Upgrade gtest for clang-tidy.
* Use CMake to install GTest instead of mv.
* Don't enforce clang-tidy to return 0 due to errors in thrust.
* Add a small test for tidy itself.

* Reformat.
2019-03-13 02:25:51 +08:00
Jiaming Yuan
1f022929f4
Use Span in gpu coordinate. (#4029)
* Use Span in gpu coordinate.

* Use Span in device code.
* Fix shard size calculation.
  - Use lower_bound instead of upper_bound.
* Check empty devices.
2019-01-02 11:32:43 +08:00
Jiaming Yuan
85939c6a6e
Merge duplicated linear updater parameters. (#4013)
* Merge duplicated linear updater parameters.

* Split up coordinate descent parameter.
2018-12-22 13:21:49 +08:00
Jiaming Yuan
e0a279114e
Unify logging facilities. (#3982)
* Unify logging facilities.

* Enhance `ConsoleLogger` to handle different verbosity.
* Override macros from `dmlc`.
* Don't use specialized gamma when building with GPU.
* Remove verbosity cache in monitor.
* Test monitor.
* Deprecate `silent`.
* Fix doc and messages.
* Fix python test.
* Fix silent tests.
2018-12-14 19:29:58 +08:00
Rory Mitchell
a9d684db18
GPU performance logging/improvements (#3945)
- Improved GPU performance logging

- Only use one execute shards function

- Revert performance regression on multi-GPU

- Use threads to launch NCCL AllReduce
2018-11-29 14:36:51 +13:00
Philip Hyunsu Cho
973fc8b1ff
Use consistent type for sharding GPU data in GPU coordinate updater (#3917)
* Use consistent type for sharding GPU data in GPU coordinate updater

* Use fast integer ceiling trick
2018-11-18 00:20:00 -08:00
Jiaming Yuan
97984f4890
Fix gpu coordinate running on multi-gpu. (#3893) 2018-11-13 19:09:55 +13:00
Jiaming Yuan
f1275f52c1
Fix specifying gpu_id, add tests. (#3851)
* Rewrite gpu_id related code.

* Remove normalised/unnormalised operatios.
* Address difference between `Index' and `Device ID'.
* Modify doc for `gpu_id'.
* Better LOG for GPUSet.
* Check specified n_gpus.
* Remove inappropriate `device_idx' term.
* Clarify GpuIdType and size_t.
2018-11-06 18:17:53 +13:00
Rory Mitchell
70d208d68c
Dmatrix refactor stage 2 (#3395)
* DMatrix refactor 2

* Remove buffered rowset usage where possible

* Transition to c++11 style iterators for row access

* Transition column iterators to C++ 11
2018-10-01 01:29:03 +13:00
trivialfis
9119f9e369 Fix gpu devices. (#3693)
* Fix gpu_set normalized and unnormalized.
* Fix DeviceSpan.
2018-09-19 17:39:42 +12:00
Andy Adinets
72cd1517d6 Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage. (#3446)
* Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage.

- added distributions to HostDeviceVector
- using HostDeviceVector for labels, weights and base margings in MetaInfo
- using HostDeviceVector for offset and data in SparsePage
- other necessary refactoring

* Added const version of HostDeviceVector API calls.

- const versions added to calls that can trigger data transfers, e.g. DevicePointer()
- updated the code that uses HostDeviceVector
- objective functions now accept const HostDeviceVector<bst_float>& for predictions

* Updated src/linear/updater_gpu_coordinate.cu.

* Added read-only state for HostDeviceVector sync.

- this means no copies are performed if both host and devices access
  the HostDeviceVector read-only

* Fixed linter and test errors.

- updated the lz4 plugin
- added ConstDeviceSpan to HostDeviceVector
- using device % dh::NVisibleDevices() for the physical device number,
  e.g. in calls to cudaSetDevice()

* Fixed explicit template instantiation errors for HostDeviceVector.

- replaced HostDeviceVector<unsigned int> with HostDeviceVector<int>

* Fixed HostDeviceVector tests that require multiple GPUs.

- added a mock set device handler; when set, it is called instead of cudaSetDevice()
2018-08-30 14:28:47 +12:00
trivialfis
60787ecebc Merge generic device helper functions into gpu set. (#3626)
* Remove the use of old NDevices* functions.
* Use GPUSet in timer.h.
2018-08-26 18:14:23 +12:00
trivialfis
2c502784ff Span class. (#3548)
* Add basic Span class based on ISO++20.

* Use Span<Entry const> instead of Inst in SparsePage.

* Add DeviceSpan in HostDeviceVector, use it in regression obj.
2018-08-14 17:58:11 +12:00
Rory Mitchell
645996b12f Remove accidental SparsePage copies (#3583) 2018-08-12 17:49:38 -07:00
Rory Mitchell
0f145a0365
Resolve GPU bug on large files (#3472)
Remove calls to thrust copy, fix indexing bug
2018-07-16 20:43:45 +12:00
Rory Mitchell
a96039141a
Dmatrix refactor stage 1 (#3301)
* Use sparse page as singular CSR matrix representation

* Simplify dmatrix methods

* Reduce statefullness of batch iterators

* BREAKING CHANGE: Remove prob_buffer_row parameter. Users are instead recommended to sample their dataset as a preprocessing step before using XGBoost.
2018-06-07 10:25:58 +12:00
Rory Mitchell
a185ddfe03
Implement GPU accelerated coordinate descent algorithm (#3178)
* Implement GPU accelerated coordinate descent algorithm. 

* Exclude external memory tests for GPU
2018-04-20 14:56:35 +12:00