32 Commits

Author SHA1 Message Date
Rory Mitchell
1fc37e4749 Require leaf statistics when expanding tree (#4015)
* Cache left and right gradient sums

* Require leaf statistics when expanding tree
2019-01-17 21:12:20 -08:00
Rory Mitchell
f75a21af25
Reduce tree expand boilerplate code (#4008) 2018-12-20 15:52:28 +13:00
Andy Adinets
42bf90eb8f Column sampling at individual nodes (splits). (#3971)
* Column sampling at individual nodes (splits).

* Documented colsample_bynode parameter.

- also updated documentation for colsample_by* parameters

* Updated documentation.

* GetFeatureSet() returns shared pointer to std::vector.

* Sync sampled columns across multiple processes.
2018-12-14 22:37:35 +08:00
Rory Mitchell
3d81c48d3f
Remove leaf vector, add tree serialisation test, fix Windows tests (#3989) 2018-12-13 10:28:38 +13:00
Jiaming Yuan
19ee0a3579
Refactor fast-hist, add tests for some updaters. (#3836)
Add unittest for prune.

Add unittest for refresh.

Refactor fast_hist.

* Remove fast_hist_param.
* Rename to quantile_hist.

Add unittests for QuantileHist.

* Refactor QuantileHist into .h and .cc file.
* Remove sync.h.
* Remove MGPU_mock test.

Rename fast hist method to quantile hist.
2018-11-07 21:15:07 +13:00
Rory Mitchell
70d208d68c
Dmatrix refactor stage 2 (#3395)
* DMatrix refactor 2

* Remove buffered rowset usage where possible

* Transition to c++11 style iterators for row access

* Transition column iterators to C++ 11
2018-10-01 01:29:03 +13:00
Andy Adinets
72cd1517d6 Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage. (#3446)
* Replaced std::vector with HostDeviceVector in MetaInfo and SparsePage.

- added distributions to HostDeviceVector
- using HostDeviceVector for labels, weights and base margings in MetaInfo
- using HostDeviceVector for offset and data in SparsePage
- other necessary refactoring

* Added const version of HostDeviceVector API calls.

- const versions added to calls that can trigger data transfers, e.g. DevicePointer()
- updated the code that uses HostDeviceVector
- objective functions now accept const HostDeviceVector<bst_float>& for predictions

* Updated src/linear/updater_gpu_coordinate.cu.

* Added read-only state for HostDeviceVector sync.

- this means no copies are performed if both host and devices access
  the HostDeviceVector read-only

* Fixed linter and test errors.

- updated the lz4 plugin
- added ConstDeviceSpan to HostDeviceVector
- using device % dh::NVisibleDevices() for the physical device number,
  e.g. in calls to cudaSetDevice()

* Fixed explicit template instantiation errors for HostDeviceVector.

- replaced HostDeviceVector<unsigned int> with HostDeviceVector<int>

* Fixed HostDeviceVector tests that require multiple GPUs.

- added a mock set device handler; when set, it is called instead of cudaSetDevice()
2018-08-30 14:28:47 +12:00
Rory Mitchell
686e990ffc
GPU memory usage fixes + column sampling refactor (#3635)
* Remove thrust copy calls

* Fix  histogram memory usage

* Cap extreme histogram memory usage

* More efficient column sampling

* Use column sampler across updaters

* More efficient split evaluation on GPU with column sampling
2018-08-27 16:26:46 +12:00
trivialfis
2c502784ff Span class. (#3548)
* Add basic Span class based on ISO++20.

* Use Span<Entry const> instead of Inst in SparsePage.

* Add DeviceSpan in HostDeviceVector, use it in regression obj.
2018-08-14 17:58:11 +12:00
Rory Mitchell
645996b12f Remove accidental SparsePage copies (#3583) 2018-08-12 17:49:38 -07:00
Henry Gouk
a13e29ece1 Add LASSO (#3429)
* Allow multiple split constraints

* Replace RidgePenalty with ElasticNet

* Add test for checking Ridge, LASSO, and Elastic Net are implemented
2018-07-15 16:38:26 +12:00
Henry Gouk
64b8cffde3 Refactor of FastHistMaker to allow for custom regularisation methods (#3335)
* Refactor to allow for custom regularisation methods

* Implement compositional SplitEvaluator framework

* Fixed segfault when no monotone_constraints are supplied.

* Change pid to parentID

* test_monotone_constraints.py now passes

* Refactor ColMaker and DistColMaker to use SplitEvaluator

* Performance optimisation when no monotone_constraints specified

* Fix linter messages

* Fix a few more linter errors

* Update the amalgamation

* Add bounds check

* Add check for leaf node

* Fix linter error in param.h

* Fix clang-tidy errors on CI

* Fix incorrect function name

* Fix clang-tidy error in updater_fast_hist.cc

* Enable SSE2 for Win32 R MinGW

Addresses https://github.com/dmlc/xgboost/pull/3335#issuecomment-400535752

* Add contributor
2018-06-28 07:37:25 +00:00
Rory Mitchell
a96039141a
Dmatrix refactor stage 1 (#3301)
* Use sparse page as singular CSR matrix representation

* Simplify dmatrix methods

* Reduce statefullness of batch iterators

* BREAKING CHANGE: Remove prob_buffer_row parameter. Users are instead recommended to sample their dataset as a preprocessing step before using XGBoost.
2018-06-07 10:25:58 +12:00
Rory Mitchell
ccf80703ef
Clang-tidy static analysis (#3222)
* Clang-tidy static analysis

* Modernise checks

* Google coding standard checks

* Identifier renaming according to Google style
2018-04-19 18:57:13 +12:00
Andrew V. Adinetz
d5992dd881 Replaced std::vector-based interfaces with HostDeviceVector-based interfaces. (#3116)
* Replaced std::vector-based interfaces with HostDeviceVector-based interfaces.

- replacement was performed in the learner, boosters, predictors,
  updaters, and objective functions
- only interfaces used in training were replaced;
  interfaces like PredictInstance() still use std::vector
- refactoring necessary for replacement of interfaces was also performed,
  such as using HostDeviceVector in prediction cache

* HostDeviceVector-based interfaces for custom objective function example plugin.
2018-02-28 13:00:04 +13:00
Rory Mitchell
c51adb49b6
Monotone constraints for gpu_hist (#2904) 2017-11-30 10:26:19 +13:00
Rory Mitchell
e6a9063344 Integer gradient summation for GPU histogram algorithm. (#2681) 2017-09-08 15:07:29 +12:00
PSEUDOTENSOR / Jonathan McKinney
6b287177c8 [GPU-Plugin] Multi-GPU gpu_id bug fixes for grow_gpu_hist and grow_gpu methods, and additional documentation for the gpu plugin. (#2463) 2017-06-30 20:04:17 +12:00
Philip Cho
14fba01b5a Improve multi-threaded performance (#2104)
* Add UpdatePredictionCache() option to updaters

Some updaters (e.g. fast_hist) has enough information to quickly compute
prediction cache for the training data. Each updater may override
UpdaterPredictionCache() method to update the prediction cache. Note: this
trick does not apply to validation data.

* Respond to code review

* Disable some debug messages by default
* Document UpdatePredictionCache() interface
* Remove base_margin logic from UpdatePredictionCache() implementation
* Do not take pointer to cfg, as reference may get stale

* Improve multi-threaded performance

* Use columnwise accessor to accelerate ApplySplit() step,
  with support for a compressed representation
* Parallel sort for evaluation step
* Inline BuildHist() function
* Cache gradient pairs when building histograms in BuildHist()

* Add missing #if macro

* Respond to code review

* Use wrapper to enable parallel sort on Linux

* Fix C++ compatibility issues

* MSVC doesn't support unsigned in OpenMP loops
* gcc 4.6 doesn't support using keyword

* Fix lint issues

* Respond to code review

* Fix bug in ApplySplitSparseData()

* Attempting to read beyond the end of a sparse column
* Mishandling the case where an entire range of rows have missing values

* Fix training continuation bug

Disable UpdatePredictionCache() in the first iteration. This way, we can
accomodate the scenario where we build off of an existing (nonempty) ensemble.

* Add regression test for fast_hist

* Respond to code review

* Add back old version of ApplySplitSparseData
2017-03-25 10:35:01 -07:00
Tianqi Chen
d581a3d0e7 [UPDATE] Update rabit and threadlocal (#2114)
* [UPDATE] Update rabit and threadlocal

* minor fix to make build system happy

* upgrade requirement to g++4.8

* upgrade dmlc-core

* update travis
2017-03-16 18:48:37 -07:00
Tianqi Chen
fd19b7a188 Automatically remove nan from input data when it is sparse. (#2062)
* [DATALoad] Automatically remove Nan when load from sparse matrix

* add log
2017-02-25 08:59:17 -08:00
Simon DENEL
7078c41dad Changing omp_get_num_threads to omp_get_max_threads (#1831)
* Updating dmlc-core

* Changing omp_get_num_threads to omp_get_max_threads
2016-12-04 11:26:45 -08:00
AbdealiJK
6f16f0ef58 Use bst_float consistently throughout (#1824)
* Fix various typos

* Add override to functions that are overridden

gcc gives warnings about functions that are being overridden by not
being marked as oveirridden. This fixes it.

* Use bst_float consistently

Use bst_float for all the variables that involve weight,
leaf value, gradient, hessian, gain, loss_chg, predictions,
base_margin, feature values.

In some cases, when due to additions and so on the value can
take a larger value, double is used.

This ensures that type conversions are minimal and reduces loss of
precision.
2016-11-30 10:02:10 -08:00
Tianqi Chen
c93c9b7ed6 [TREE] Experimental version of monotone constraint (#1516)
* [TREE] Experimental version of monotone constraint

* Allow default detection of montone option

* loose the condition of strict check

* Update gbtree.cc
2016-09-07 21:28:43 -07:00
tqchen
2f2080a337 [TREE] Remove gap constraint, make tree construction more robust 2016-02-10 11:17:54 -08:00
tqchen
88447ca32e [MEM] Add rowset struct to save memory with billion level rows 2016-02-10 11:17:17 -08:00
samuel-liyi
d3540aacc5 change the formula of fsplit value 2016-02-08 15:00:04 +08:00
tqchen
d75e3ed05d [LIBXGBOOST] pass demo running. 2016-01-16 10:24:01 -08:00
tqchen
e4567bbc47 [REFACTOR] Add alias, allow missing variables, init gbm interface 2016-01-16 10:24:01 -08:00
tqchen
d4677b6561 [TREE] finish move of updater 2016-01-16 10:24:01 -08:00
tqchen
3128e1705b [TREE] Refactor colmaker 2016-01-16 10:24:01 -08:00
tqchen
20043f63a6 [TREE] Move colmaker 2016-01-16 10:24:01 -08:00