* Pass pointer to model parameters.
This PR de-duplicates most of the model parameters except the one in
`tree_model.h`. One difficulty is `base_score` is a model property but can be
changed at runtime by objective function. Hence when performing model IO, we
need to save the one provided by users, instead of the one transformed by
objective. Here we created an immutable version of `LearnerModelParam` that
represents the value of model parameter after configuration.
* Use `UpdateAllowUnknown' for non-model related parameter.
Model parameter can not pack an additional boolean value due to binary IO
format. This commit deals only with non-model related parameter configuration.
* Add tidy command line arg for use-dmlc-gtest.
* Move get transpose into cc.
* Clean up headers in host device vector, remove thrust dependency.
* Move span and host device vector into public.
* Install c++ headers.
* Short notes for c and c++.
Co-Authored-By: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
* Refactor configuration [Part II].
* General changes:
** Remove `Init` methods to avoid ambiguity.
** Remove `Configure(std::map<>)` to avoid redundant copying and prepare for
parameter validation. (`std::vector` is returned from `InitAllowUnknown`).
** Add name to tree updaters for easier debugging.
* Learner changes:
** Make `LearnerImpl` the only source of configuration.
All configurations are stored and carried out by `LearnerImpl::Configure()`.
** Remove booster in C API.
Originally kept for "compatibility reason", but did not state why. So here
we just remove it.
** Add a `metric_names_` field in `LearnerImpl`.
** Remove `LazyInit`. Configuration will always be lazy.
** Run `Configure` before every iteration.
* Predictor changes:
** Allocate both cpu and gpu predictor.
** Remove cpu_predictor from gpu_predictor.
`GBTree` is now used to dispatch the predictor.
** Remove some GPU Predictor tests.
* IO
No IO changes. The binary model format stability is tested by comparing
hashing value of save models between two commits
* - set the appropriate device before freeing device memory...
- pr #4532 added a global memory tracker/logger to keep track of number of (de)allocations
and peak memory usage on a per device basis.
- this pr adds the appropriate check to make sure that the (de)allocation counts and memory usages
makes sense for the device. since verbosity is typically increased on debug/non-retail builds.
* - pre-create cub allocators and reuse them
- create them once and not resize them dynamically. we need to ensure that these allocators
are created and destroyed exactly once so that the appropriate device id's are set
* Only define `gpu_id` and `n_gpus` in `LearnerTrainParam`
* Pass LearnerTrainParam through XGBoost vid factory method.
* Disable all GPU usage when GPU related parameters are not specified (fixes XGBoost choosing GPU over aggressively).
* Test learner train param io.
* Fix gpu pickling.
* Upgrade gtest for clang-tidy.
* Use CMake to install GTest instead of mv.
* Don't enforce clang-tidy to return 0 due to errors in thrust.
* Add a small test for tidy itself.
* Reformat.
* Use Span in gpu coordinate.
* Use Span in device code.
* Fix shard size calculation.
- Use lower_bound instead of upper_bound.
* Check empty devices.
* Unify logging facilities.
* Enhance `ConsoleLogger` to handle different verbosity.
* Override macros from `dmlc`.
* Don't use specialized gamma when building with GPU.
* Remove verbosity cache in monitor.
* Test monitor.
* Deprecate `silent`.
* Fix doc and messages.
* Fix python test.
* Fix silent tests.
- Improved GPU performance logging
- Only use one execute shards function
- Revert performance regression on multi-GPU
- Use threads to launch NCCL AllReduce
* DMatrix refactor 2
* Remove buffered rowset usage where possible
* Transition to c++11 style iterators for row access
* Transition column iterators to C++ 11
* Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage.
- added distributions to HostDeviceVector
- using HostDeviceVector for labels, weights and base margings in MetaInfo
- using HostDeviceVector for offset and data in SparsePage
- other necessary refactoring
* Added const version of HostDeviceVector API calls.
- const versions added to calls that can trigger data transfers, e.g. DevicePointer()
- updated the code that uses HostDeviceVector
- objective functions now accept const HostDeviceVector<bst_float>& for predictions
* Updated src/linear/updater_gpu_coordinate.cu.
* Added read-only state for HostDeviceVector sync.
- this means no copies are performed if both host and devices access
the HostDeviceVector read-only
* Fixed linter and test errors.
- updated the lz4 plugin
- added ConstDeviceSpan to HostDeviceVector
- using device % dh::NVisibleDevices() for the physical device number,
e.g. in calls to cudaSetDevice()
* Fixed explicit template instantiation errors for HostDeviceVector.
- replaced HostDeviceVector<unsigned int> with HostDeviceVector<int>
* Fixed HostDeviceVector tests that require multiple GPUs.
- added a mock set device handler; when set, it is called instead of cudaSetDevice()
* Add basic Span class based on ISO++20.
* Use Span<Entry const> instead of Inst in SparsePage.
* Add DeviceSpan in HostDeviceVector, use it in regression obj.
* Use sparse page as singular CSR matrix representation
* Simplify dmatrix methods
* Reduce statefullness of batch iterators
* BREAKING CHANGE: Remove prob_buffer_row parameter. Users are instead recommended to sample their dataset as a preprocessing step before using XGBoost.