Jiaming Yuan
176110a22d
Support external memory in CPU histogram building. ( #7372 )
2021-11-23 01:13:33 +08:00
Jiaming Yuan
b0015fda96
Fix R CRAN failures. ( #7404 )
...
* Remove hist builder dtor.
* Initialize values.
* Tolerance.
* Remove the use of nthread in col maker.
2021-11-16 10:51:12 +08:00
Jiaming Yuan
d7d1b6e3a6
CPU evaluation for cat data. ( #7393 )
...
* Implementation for one hot based.
* Implementation for partition based. (LightGBM)
2021-11-06 14:41:35 +08:00
Jiaming Yuan
b06040b6d0
Implement a general array view. ( #7365 )
...
* Replace existing matrix and vector view.
This is to prepare for handling higher-dimensional data and predictions when we support multi-target models.
2021-11-05 04:16:11 +08:00
Jiaming Yuan
4100827971
Pass information about objective to tree methods. ( #7385 )
...
* Define the `ObjInfo` and pass it down to every tree updater.
2021-11-04 01:52:44 +08:00
Jiaming Yuan
8d7c6366d7
Accept histogram cut instead of gradient index in evaluation. ( #7336 )
2021-10-20 18:04:46 +08:00
Jiaming Yuan
8e619010d0
Extract CPUExpandEntry and HistParam. ( #7321 )
...
* Remove kRootNid.
* Check for empty hessian.
2021-10-17 14:22:25 +08:00
Jiaming Yuan
3515931305
Initial support for external memory in gradient index. ( #7183 )
...
* Add hessian to batch param in preparation of new approx impl.
* Extract a push method for gradient index matrix.
* Use span instead of vector ref for hessian in sketching.
* Create a binary format for gradient index.
2021-09-13 12:40:56 +08:00
Jiaming Yuan
149f209af6
Extract histogram builder from CPU Hist. ( #7152 )
...
* Extract the CPU histogram builder.
* Fix tests.
* Reduce number of histograms being built.
2021-08-09 21:15:21 +08:00
ShvetsKS
caa9e527dd
Remove extra sync for dense data ( #7120 )
...
Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>
2021-07-22 19:02:31 +08:00
Jiaming Yuan
615ab2b03e
Extract evaluate splits from CPU hist. ( #7079 )
...
Other than modularizing the split evaluation function, this PR also removes some more functions, including `InitNewNodes` and `BuildNodeStats`, along with some other unused variables. Also, scattered code like setting leaf weights is grouped into the split evaluator, and `NodeEntry` is simplified and made private. Another subtle difference from the original implementation is that the modified code doesn't call `tree[nidx].Parent()` to traverse upward.
2021-07-07 15:16:25 +08:00
Jiaming Yuan
1cd20efe68
Move GHistIndex into DMatrix. ( #7064 )
2021-07-01 00:44:49 +08:00
ShvetsKS
2567404ab6
Simplify sparse and dense CPU hist kernels ( #7029 )
...
* Simplify sparse and dense kernels
* Extract row partitioner.
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-06-11 18:26:30 +08:00
Jiaming Yuan
b56614e9b8
[R] Use new predict function. ( #6819 )
...
* Call new C prediction API.
* Add `strict_shape`.
* Add `iterationrange`.
* Update document.
2021-06-11 13:03:29 +08:00
ShvetsKS
5cdaac00c1
Remove feature grouping ( #7018 )
...
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-06-03 04:35:26 +08:00
ShvetsKS
57c732655e
Merge lossguide and depthwise strategies for CPU hist ( #7007 )
...
* fix java/scala test: max depth is also a valid parameter for lossguide
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-06-03 01:49:43 +08:00
ShvetsKS
55b823b27d
Reduce 'InitSampling' complexity and set gradients to zero ( #6922 )
...
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-05-29 04:52:23 +08:00
Jiaming Yuan
556a83022d
Implement unified update prediction cache for (gpu_)hist. ( #6860 )
...
* Implement utilities for linalg.
* Unify the update prediction cache functions.
* Implement update prediction cache for multi-class gpu hist.
2021-04-17 00:29:34 +08:00
Igor Rukhovich
19a2c54265
Prediction by indices (subsample < 1) ( #6683 )
...
* Another implementation of predicting by indices
* Fixed omp parallel_for variable type
* Removed SparsePageView from Updater
2021-03-16 15:08:20 +13:00
Louis Desreumaux
9b530e5697
Improve OpenMP exception handling ( #6680 )
2021-02-25 13:56:16 +08:00
ShvetsKS
7f4d3a91b9
Multiclass prediction caching for CPU Hist ( #6550 )
...
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-01-13 04:42:07 +08:00
Igor Rukhovich
5c8ccf4455
Improved InitSampling function speed by 2.12 times ( #6410 )
...
* Improved InitSampling function speed by 2.12 times
* Added explicit conversion
2020-12-15 20:59:24 -08:00
ShvetsKS
956beead70
Thread local memory allocation for BuildHist ( #6358 )
...
* thread mem locality
* fix apply
* cleanup
* fix lint
* fix tests
* simple try
* fix
* fix
* apply comments
* fix comments
* fix
* apply simple comment
Co-authored-by: ShvetsKS <kirill.shvets@intel.com>
2020-11-25 17:50:12 +03:00
Sergio Gavilán
b181a88f9f
Reduced some C++ compiler warnings ( #6197 )
...
* Removed some warnings
* Rebase with master
* Solved C++ Google Tests errors made by refactoring in order to remove warnings
* Undo renaming path -> path_
* Fix style check
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-10-29 12:36:00 -07:00
vcarpani
671971e12e
Compiler warnings ( #6286 )
...
* Fix warnings for json.h
* Fix warnings for metric.h
* Fix warnings for updater_quantile_hist.cc.
* Fix warnings for updater_histmaker.cc.
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-10-28 13:46:15 -07:00
ShvetsKS
a4ce0eae43
CPU predict performance improvement ( #6127 )
...
Co-authored-by: ShvetsKS <kirill.shvets@intel.com>
2020-10-08 15:50:21 +03:00
Jiaming Yuan
2fcc4f2886
Unify evaluation functions. ( #6037 )
2020-08-26 14:23:27 +08:00
Jiaming Yuan
4d99c58a5f
Feature weights ( #5962 )
2020-08-18 19:55:41 +08:00
boxdot
d268a2a463
Thread-safe prediction by making the prediction cache thread-local. ( #5853 )
...
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2020-07-30 12:33:50 +08:00
Philip Hyunsu Cho
4af857f95d
Add explicit template specialization for portability ( #5921 )
...
* Add explicit template specializations
* Adding Specialization for FileAdapterBatch
2020-07-22 12:31:17 -07:00
Philip Hyunsu Cho
1d22a9be1c
Revert "Reorder includes. ( #5749 )" ( #5771 )
...
This reverts commit d3a0efbf162f3dceaaf684109e1178c150b32de3.
2020-06-09 10:29:28 -07:00
Jiaming Yuan
d3a0efbf16
Reorder includes. ( #5749 )
...
* Reorder includes.
* R.
2020-06-03 17:30:47 +12:00
ShvetsKS
cd3d14ad0e
Add float32 histogram ( #5624 )
...
* new single_precision_histogram param was added.
Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>
Co-authored-by: fis <jm.yuan@outlook.com>
2020-06-03 11:24:53 +08:00
ShvetsKS
dd01e4ba8d
Distributed optimizations for 'hist' method with CPUs ( #5557 )
...
Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>
2020-05-20 06:03:03 +03:00
Oleksandr Kuvshynov
4e64e2ef8e
skip missing lookup if nothing is missing in CPU hist partition kernel. ( #5644 )
...
* [xgboost] skip missing lookup if nothing is missing
2020-05-12 05:50:08 +03:00
ShvetsKS
a2d86b8e4b
Optimizations for RNG in InitData kernel ( #5522 )
...
* optimizations for subsampling in InitData
Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>
2020-04-16 18:24:32 +03:00
Jiaming Yuan
7d52c0b8c2
Requires setting leaf stat when expanding tree. ( #5501 )
...
* Fix GPU Hist feature importance.
2020-04-10 12:27:03 +08:00
Jiaming Yuan
0012f2ef93
Upgrade clang-tidy on CI. ( #5469 )
...
* Correct all clang-tidy errors.
* Upgrade clang-tidy to 10 on CI.
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-04-05 04:42:29 +08:00
ShvetsKS
27a8e36fc3
Reducing memory consumption for 'hist' method on CPU ( #5334 )
2020-03-28 14:45:52 +13:00
Egor Smirnov
1b97eaf7a7
Optimized ApplySplit, BuildHist and UpdatePredictCache functions on CPU ( #5244 )
...
* Split up sparse and dense build hist kernels.
* Add `PartitionBuilder`.
2020-02-29 16:11:42 +08:00
Rong Ou
e4b74c4d22
Gradient based sampling for GPU Hist ( #5093 )
...
* Implement gradient based sampling for GPU Hist tree method.
* Add samplers and handle compacted page in GPU Hist.
2020-02-04 10:31:27 +08:00
Egor Smirnov
c67163250e
Optimized BuildHist function ( #5156 )
2020-01-29 23:32:57 -08:00
Egor Smirnov
7b17e76c5b
Optimized EvaluateSplit function ( #5138 )
...
* Add block based threading utilities.
2019-12-31 18:18:42 +08:00
Jiaming Yuan
04db125699
Quick fix for memory leak in CPU Hist. ( #5153 )
...
Closes https://github.com/dmlc/xgboost/issues/3579 .
* Don't use map.
2019-12-31 14:05:53 +08:00
Egor Smirnov
b1789b0346
added tracking execution time for UpdatePredictionCache function ( #5107 )
2019-12-10 01:32:56 +08:00
Jiaming Yuan
7ef5b78003
Implement JSON IO for updaters ( #5094 )
...
* Implement JSON IO for updaters.
* Remove parameters in split evaluator.
2019-12-07 00:24:00 +08:00
Jiaming Yuan
64af1ecf86
[Breaking] Remove num roots. ( #5059 )
2019-12-05 21:58:43 +08:00
Jiaming Yuan
97abcc7ee2
Extract interaction constraint from split evaluator. ( #5034 )
...
* Extract interaction constraints from split evaluator.
This is done mostly for model IO, where num_feature and interaction_constraints are copied in the split evaluator. Also, the interaction constraint is itself a feature selector, acting like the column sampler, and it's inefficient to bury it deep in the evaluator chain. Lastly, removing one more copied parameter is a win.
* Enable inc for approx tree method.
Now that the implementation is split from the evaluator class, it is also enabled for the approx method.
* Removing obsolete code in colmaker.
It was never documented nor actually used in the real world. Also, there isn't a single test for those code blocks.
* Unifying the types used for row and column.
As input datasets approach billions of rows, incorrect use of int is subject to overflow, and signed integer overflow is undefined behaviour. This PR starts the process of unifying the index types to unsigned integers. There are optimizations that can exploit this undefined behaviour, but after some testing I don't see that they are beneficial to XGBoost.
2019-11-14 20:11:41 +08:00
Philip Hyunsu Cho
f4e7b707c9
Revert #4529 ( #5008 )
...
* Revert "Optimize 'hist' for multi-core CPU (#4529)"
This reverts commit 4d6590be3c9a043d44d9e4fe0a456a9f8179ec72.
* Fix build
2019-11-12 09:35:03 -08:00
Jiaming Yuan
ac457c56a2
Use `UpdateAllowUnknown` for non-model related parameter. ( #4961 )
...
* Use `UpdateAllowUnknown` for non-model related parameter.
Model parameters cannot pack an additional boolean value due to the binary IO
format. This commit deals only with non-model related parameter configuration.
* Add tidy command line arg for use-dmlc-gtest.
2019-10-23 05:50:12 -04:00