Jiaming Yuan
7ea5675679
Add PushCSC for SparsePage. ( #4193 )
...
* Add PushCSC for SparsePage.
* Move Push* definitions into cc file.
* Add std:: prefix to `size_t` to make clang++ happy.
* Address monitor count == 0.
2019-03-02 01:58:08 +08:00
Philip Hyunsu Cho
2aaae2e7bb
Fix #4163 : always copy sliced data ( #4165 )
...
* Revert "Accept numpy array view. (#4147 )"
This reverts commit a985a99cf0dacb26a5d734835473d492d3c2a0df.
* Fix #4163 : always copy sliced data
* Remove print() from the test; check shape equality
* Check if 'base' attribute exists
* Fix lint
* Address reviewer comment
* Fix lint
2019-02-20 14:46:34 -08:00
Rory Mitchell
c8c472f39a
Fix incorrect device in multi-GPU algorithm ( #4161 )
2019-02-20 09:23:15 +13:00
Nan Zhu
1dac5e2410
more correct way to build node stats in distributed fast hist ( #4140 )
...
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* more changes
* temp
* update
* update rabit
* change the histogram
* update kfactor
* sync per node stats
* temp
* update
* final
* code clean
* update rabit
* more cleanup
* fix errors
* fix failed tests
* enforce c++11
* broadcast subsampled feature correctly
* init col
* temp
* col sampling
* fix histmatrix init
* fix col sampling
* remove cout
* fix out of bound access
* fix core dump
remove core dump file
* update
* add fid
* update
* revert some changes
* temp
* temp
* pass all tests
* bring back some tests
* recover some changes
* fix lint issue
* enable monotone and interaction constraints
* don't specify default for monotone and interactions
* recover column init part
* more recovery
* fix core dumps
* code clean
* revert some changes
* fix test compilation issue
* fix lint issue
* resolve compilation issue
* fix issues of lint caused by rebase
* fix stylistic changes and change variable names
* modularize depth width
* address the comments
* fix failed tests
* wrap perf timers with class
* temp
* pass all lossguide
* pass tests
* add comments
* more changes
* use separate flow for single and tests
* add test for lossguide hist
* remove duplications
* sync stats only once
* recover more changes
* recover more changes
* fix root-stats
* simplify code
* remove outdated comments
2019-02-18 13:45:30 -08:00
Jiaming Yuan
a985a99cf0
Accept numpy array view. ( #4147 )
...
* Accept array view (slice) in metainfo.
2019-02-18 22:21:34 +08:00
Philip Hyunsu Cho
549c8d6ae9
Prevent empty quantiles in fast hist ( #4155 )
...
* Prevent empty quantiles
* Revise and improve unit tests for quantile hist
* Remove unnecessary comment
* Add #2943 as a test case
* Skip test if no sklearn
* Revise misleading comments
2019-02-17 16:01:07 -08:00
Jiaming Yuan
2e618af743
Fix cpplint. ( #4157 )
...
* Add comment after #endif.
* Add missing headers.
2019-02-18 00:16:29 +08:00
Rory Mitchell
71a604fae3
Fix for windows compilation ( #4139 )
2019-02-17 19:42:32 +13:00
Jiaming Yuan
1fe874e58a
Fix empty subspan. ( #4151 )
...
* Silence the death tests.
2019-02-17 04:48:03 +08:00
Jiaming Yuan
754fe8142b
Make `HistCutMatrix::Init' be aware of groups. ( #4115 )
...
* Add checks for group size.
* Simple docs.
* Search group index during hist cut matrix initialization.
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2019-02-16 04:39:41 +08:00
Nan Zhu
c18a3660fa
Separate Depthwise and Lossguide growing policy in fast histogram ( #4102 )
...
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* init
* more changes
* temp
* update
* update rabit
* change the histogram
* update kfactor
* sync per node stats
* temp
* update
* final
* code clean
* update rabit
* more cleanup
* fix errors
* fix failed tests
* enforce c++11
* broadcast subsampled feature correctly
* init col
* temp
* col sampling
* fix histmatrix init
* fix col sampling
* remove cout
* fix out of bound access
* fix core dump
remove core dump file
* disable test temporarily
* update
* add fid
* print perf data
* update
* revert some changes
* temp
* temp
* pass all tests
* bring back some tests
* recover some changes
* fix lint issue
* enable monotone and interaction constraints
* don't specify default for monotone and interactions
* recover column init part
* more recovery
* fix core dumps
* code clean
* revert some changes
* fix test compilation issue
* fix lint issue
* resolve compilation issue
* fix issues of lint caused by rebase
* fix stylistic changes and change variable names
* use regtree internal function
* modularize depth width
* address the comments
* fix failed tests
* wrap perf timers with class
* fix lint
* fix num_leaves count
* fix indentation
* Update src/tree/updater_quantile_hist.cc
Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>
* Update src/tree/updater_quantile_hist.h
Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>
* Update src/tree/updater_quantile_hist.cc
Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>
* Update src/tree/updater_quantile_hist.cc
Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>
* Update src/tree/updater_quantile_hist.cc
Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>
* Update src/tree/updater_quantile_hist.h
Co-Authored-By: CodingCat <CodingCat@users.noreply.github.com>
* merge
* fix compilation
2019-02-13 12:56:19 -08:00
Jiaming Yuan
f8ca2960fc
Use nccl group calls to prevent deadlock. ( #4113 )
...
* launch all reduce sequentially.
* Fix gpu_exact test memory leak.
2019-02-08 06:12:39 +08:00
Jiaming Yuan
017c97b8ce
Clean up training code. ( #3825 )
...
* Remove GHistRow, GHistEntry, GHistIndexRow.
* Remove kSimpleStats.
* Remove CheckInfo, SetLeafVec in GradStats and in SKStats.
* Clean up the GradStats.
* Cleanup calcgain.
* Move LossChangeMissing out of common.
* Remove [] operator from GHistIndexBlock.
2019-02-07 14:22:13 +08:00
Nan Zhu
ae3bb9c2d5
Distributed Fast Histogram Algorithm ( #4011 )
...
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* add back train method but mark as deprecated
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* fix scalastyle error
* init
* allow hist algo
* more changes
* temp
* update
* remove hist sync
* update rabit
* change hist size
* change the histogram
* update kfactor
* sync per node stats
* temp
* update
* final
* code clean
* update rabit
* more cleanup
* fix errors
* fix failed tests
* enforce c++11
* fix lint issue
* broadcast subsampled feature correctly
* revert some changes
* fix lint issue
* enable monotone and interaction constraints
* don't specify default for monotone and interactions
* update docs
2019-02-05 05:12:53 -08:00
Jiaming Yuan
1088dff42c
Prevent training without setting up caches. ( #4066 )
...
* Prevent training without setting up caches.
* Add warning for internal functions.
* Check number of features.
* Address reviewer's comment.
2019-02-03 01:03:29 -08:00
Rory Mitchell
1fc37e4749
Require leaf statistics when expanding tree ( #4015 )
...
* Cache left and right gradient sums
* Require leaf statistics when expanding tree
2019-01-17 21:12:20 -08:00
Egor Smirnov
5f151c5cf3
Performance optimizations for Intel CPUs ( #3957 )
...
* Initial performance optimizations for xgboost
* remove includes
* revert float->double
* fix for CI
* fix for CI
* fix for CI
* fix for CI
* fix for CI
* fix for CI
* fix for CI
* fix for CI
* fix for CI
* fix for CI
* Check existence of _mm_prefetch and __builtin_prefetch
* Fix lint
2019-01-08 21:08:13 -08:00
Kodi Arfer
6a569b8cd9
Avoid generating NaNs in UnwoundPathSum ( #3943 )
...
* Avoid generating NaNs in UnwoundPathSum.
Kudos to Jakub Zakrzewski for tracking down the bug.
* Add a test
2019-01-03 15:04:46 -08:00
Jiaming Yuan
55bc149efb
Fix sparse page segfault. ( #4040 )
...
* Remove usage of raw pointers in SparsePageSource.
2019-01-03 23:40:40 +08:00
Jiaming Yuan
1f022929f4
Use Span in gpu coordinate. ( #4029 )
...
* Use Span in gpu coordinate.
* Use Span in device code.
* Fix shard size calculation.
- Use lower_bound instead of upper_bound.
* Check empty devices.
2019-01-02 11:32:43 +08:00
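The "Use lower_bound instead of upper_bound" note in #4029 turns on how the two searches treat equal elements. A minimal sketch of that difference, using Python's `bisect` module as an analogue of the C++ functions; the `offsets` list is made up for illustration and is not taken from the commit:

```python
from bisect import bisect_left, bisect_right

# Hypothetical sorted offsets; the value 4 appears twice.
offsets = [0, 4, 4, 8]

# Analogue of std::lower_bound: first index whose element is >= 4.
first_not_less = bisect_left(offsets, 4)

# Analogue of std::upper_bound: first index whose element is > 4.
first_greater = bisect_right(offsets, 4)
```

When the searched value is present, the two results differ by the number of equal elements, so a size derived from one can be off compared with the other.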
Jiaming Yuan
be948df23f
Fix ignoring dart in updater configuration. ( #4024 )
...
* Fix ignoring dart in updater configuration.
2018-12-26 18:24:45 +08:00
Jiaming Yuan
9897b5042f
Use Span in GPU exact updater. ( #4020 )
...
* Use Span in GPU exact updater.
* Add a small test.
2018-12-26 12:44:46 +08:00
Jiaming Yuan
7735252925
Document num_parallel_tree. ( #4022 )
2018-12-25 22:00:58 +08:00
Jiaming Yuan
85939c6a6e
Merge duplicated linear updater parameters. ( #4013 )
...
* Merge duplicated linear updater parameters.
* Split up coordinate descent parameter.
2018-12-22 13:21:49 +08:00
Rory Mitchell
f75a21af25
Reduce tree expand boilerplate code ( #4008 )
2018-12-20 15:52:28 +13:00
Rory Mitchell
84c99f86f4
Combine TreeModel and RegTree ( #3995 )
2018-12-19 12:16:40 +13:00
Jiaming Yuan
c8c7b9649c
Fix and optimize logger ( #4002 )
...
* Fix logging switch statement.
* Remove debug_verbose_ in AllReducer.
* Don't construct the stream when not needed.
* Make default constructor deleted.
* Remove redundant IsVerbose.
2018-12-17 19:23:05 +08:00
Andy Adinets
42bf90eb8f
Column sampling at individual nodes (splits). ( #3971 )
...
* Column sampling at individual nodes (splits).
* Documented colsample_bynode parameter.
- also updated documentation for colsample_by* parameters
* Updated documentation.
* GetFeatureSet() returns shared pointer to std::vector.
* Sync sampled columns across multiple processes.
2018-12-14 22:37:35 +08:00
Jiaming Yuan
e0a279114e
Unify logging facilities. ( #3982 )
...
* Unify logging facilities.
* Enhance `ConsoleLogger` to handle different verbosity.
* Override macros from `dmlc`.
* Don't use specialized gamma when building with GPU.
* Remove verbosity cache in monitor.
* Test monitor.
* Deprecate `silent`.
* Fix doc and messages.
* Fix python test.
* Fix silent tests.
2018-12-14 19:29:58 +08:00
Rory Mitchell
3d81c48d3f
Remove leaf vector, add tree serialisation test, fix Windows tests ( #3989 )
2018-12-13 10:28:38 +13:00
Tong He
84a3af8dc0
Fix CRAN check warnings/notes ( #3988 )
...
* fix
* reorder declaration to match initialization
2018-12-12 08:23:20 -06:00
Andy Adinets
4be5edaf92
Initialized AllReducer counters to 0. ( #3987 )
2018-12-12 09:09:20 +13:00
Rory Mitchell
93f9ce9ef9
Single precision histograms on GPU ( #3965 )
...
* Allow single precision histogram summation in gpu_hist
* Add python test, reduce run-time of gpu_hist tests
* Update documentation
2018-12-10 10:55:30 +13:00
Philip Hyunsu Cho
9af6b689d6
Use int instead of char in CLI config parser ( #3976 )
2018-12-07 01:00:21 -08:00
Jiaming Yuan
48dddfd635
Porting elementwise metrics to GPU. ( #3952 )
...
* Port elementwise metrics to GPU.
* All elementwise metrics are converted to static polymorphic.
* Create a reducer for metrics reduction.
* Remove const of Metric::Eval to accommodate CubMemory.
2018-12-01 18:46:45 +13:00
Rory Mitchell
a9d684db18
GPU performance logging/improvements ( #3945 )
...
- Improved GPU performance logging
- Only use one execute shards function
- Revert performance regression on multi-GPU
- Use threads to launch NCCL AllReduce
2018-11-29 14:36:51 +13:00
Philip Hyunsu Cho
973fc8b1ff
Use consistent type for sharding GPU data in GPU coordinate updater ( #3917 )
...
* Use consistent type for sharding GPU data in GPU coordinate updater
* Use fast integer ceiling trick
2018-11-18 00:20:00 -08:00
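The "fast integer ceiling trick" named in #3917 is presumably the common idiom of adding `denominator - 1` before an integer division; that is an assumption, since the log does not show the code. A minimal sketch, with `ceil_div` and the shard example as hypothetical names:

```python
def ceil_div(numerator, denominator):
    """Ceiling of numerator / denominator for non-negative ints, without floats."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    # Adding (denominator - 1) pushes any non-zero remainder over the next multiple.
    return (numerator + denominator - 1) // denominator


# Hypothetical use: rows per shard when 10 rows are split across 4 devices.
rows_per_shard = ceil_div(10, 4)
```

Staying in integer arithmetic avoids a round trip through floating point, which matters when the operands are large index types.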
theycallhimavi
0a0d4239d3
Fix Typo in learner.cc ( #3902 )
2018-11-16 12:54:36 +13:00
Rory Mitchell
7af0946ac1
Improve update position function for gpu_hist ( #3895 )
2018-11-14 19:33:29 +13:00
Rory Mitchell
926eb651fe
Minor refactor of split evaluation in gpu_hist ( #3889 )
...
* Refactor evaluate split into shard
* Use span in evaluate split
* Update google tests
2018-11-14 00:11:20 +13:00
Jiaming Yuan
daf77ca7b7
Enable running objectives with 0 GPU. ( #3878 )
...
* Enable running objectives with 0 GPU.
* Enable 0 GPU for objectives.
* Add doc for GPU objectives.
* Fix some objectives defaulted to running on all GPUs.
2018-11-13 20:19:59 +13:00
Jiaming Yuan
97984f4890
Fix gpu coordinate running on multi-gpu. ( #3893 )
2018-11-13 19:09:55 +13:00
Philip Hyunsu Cho
be0bb7dd90
Remove unnecessary warning when 'gblinear' is selected ( #3888 )
2018-11-09 12:30:38 -08:00
Philip Hyunsu Cho
e38d5a6831
Document current limitation in number of features ( #3886 )
2018-11-09 00:32:43 -08:00
Jiaming Yuan
19ee0a3579
Refactor fast-hist, add tests for some updaters. ( #3836 )
...
Add unittest for prune.
Add unittest for refresh.
Refactor fast_hist.
* Remove fast_hist_param.
* Rename to quantile_hist.
Add unittests for QuantileHist.
* Refactor QuantileHist into .h and .cc file.
* Remove sync.h.
* Remove MGPU_mock test.
Rename fast hist method to quantile hist.
2018-11-07 21:15:07 +13:00
Philip Hyunsu Cho
2b045aa805
Make C++ unit tests run and pass on Windows ( #3869 )
...
* Make C++ unit tests run and pass on Windows
* Fix logic for external memory. The character ':' is part of the drive letter,
so remove the drive letter before splitting on ':'.
* Cosmetic syntax changes to keep MSVC happy.
* Fix lint
* Add Windows guard
2018-11-06 17:17:24 -08:00
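The external-memory fix in #3869 describes removing the drive letter before splitting on ':'. A rough sketch of that idea follows; `split_on_colon` is a hypothetical helper, and the input format (a ':'-separated string whose first field may start with a Windows drive letter) is an assumption rather than the actual XGBoost parsing code:

```python
import re


def split_on_colon(spec):
    """Split `spec` on ':' while keeping a leading Windows drive letter intact."""
    drive = ""
    # A drive letter is one ASCII letter, a colon, then a path separator.
    match = re.match(r"^[A-Za-z]:(?=[\\/])", spec)
    if match:
        drive = match.group(0)
        spec = spec[len(drive):]
    parts = spec.split(":")
    # Reattach the drive letter to the first field only.
    parts[0] = drive + parts[0]
    return parts
```

For example, `split_on_colon("C:\\data\\train.cache:extra")` keeps `C:\data\train.cache` as a single field, where a plain `split(":")` would cut it after `C`.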
Jiaming Yuan
f1275f52c1
Fix specifying gpu_id, add tests. ( #3851 )
...
* Rewrite gpu_id related code.
* Remove normalised/unnormalised operations.
* Address difference between `Index' and `Device ID'.
* Modify doc for `gpu_id'.
* Better LOG for GPUSet.
* Check specified n_gpus.
* Remove inappropriate `device_idx' term.
* Clarify GpuIdType and size_t.
2018-11-06 18:17:53 +13:00
Philip Hyunsu Cho
91537e7353
Fix #3342 and h2oai/h2o4gpu#625 : Save predictor parameters in model file ( #3856 )
...
* Fix #3342 and h2oai/h2o4gpu#625 : Save predictor parameters in model file
This allows pickled models to retain predictor attributes, such as
'predictor' (whether to use CPU or GPU) and 'n_gpu' (number of GPUs
to use). Related: h2oai/h2o4gpu#625
Closes #3342 .
TODO. Write a test.
* Fix lint
* Do not load GPU predictor into CPU-only XGBoost
* Add a test for pickling GPU predictors
* Make sample data big enough to pass multi GPU test
* Update test_gpu_predictor.cu
2018-11-03 21:45:38 -07:00
Philip Hyunsu Cho
ad68865d6b
[Blocking] Fix #3840 : Clean up logic for parsing tree_method parameter ( #3849 )
...
* Clean up logic for converting tree_method to updater sequence
* Use C++11 enum class for extra safety
Compiler will give warnings if switch statements don't handle all
possible values of C++11 enum class.
Also allow enum class to be used as DMLC parameter.
* Fix compiler error + lint
* Address reviewer comment
* Better docstring for DECLARE_FIELD_ENUM_CLASS
* Fix lint
* Add C++ test to see if tree_method is recognized
* Fix clang-tidy error
* Add test_learner.h to R package
* Update comments
* Fix lint error
2018-11-01 19:33:35 -07:00
Andy Adinets
2a59ff2f9b
Multi-GPU support in GPUPredictor. ( #3738 )
...
* Multi-GPU support in GPUPredictor.
- GPUPredictor is multi-GPU
- removed DeviceMatrix, as it has been made obsolete by using HostDeviceVector in DMatrix
* Replaced pointers with spans in GPUPredictor.
* Added a multi-GPU predictor test.
* Fix multi-gpu test.
* Fix n_rows < n_gpus.
* Reinitialize shards when GPUSet is changed.
* Tests range of data.
* Remove commented code.
* Remove commented code.
2018-10-23 22:59:11 -07:00