Jiaming Yuan
3e26107a9c
Rename and extract Context. ( #8528 )
...
* Rename `GenericParameter` to `Context`.
* Rename header file to reflect the change.
* Rename all references.
2022-12-07 04:58:54 +08:00
Dmitry Razdoburdin
5bd849f1b5
Unify the partitioner for hist and approx.
...
Co-authored-by: dmitry.razdoburdin <drazdobu@jfldaal005.jf.intel.com>
Co-authored-by: jiamingy <jm.yuan@outlook.com>
2022-10-20 02:49:20 +08:00
Rong Ou
668b8a0ea4
[Breaking] Switch from rabit to the collective communicator ( #8257 )
...
* Switch from rabit to the collective communicator
* fix size_t specialization
* really fix size_t
* try again
* add include
* more include
* fix lint errors
* remove rabit includes
* fix pylint error
* return dict from communicator context
* fix communicator shutdown
* fix dask test
* reset communicator mocklist
* fix distributed tests
* do not save device communicator
* fix jvm gpu tests
* add python test for federated communicator
* Update gputreeshap submodule
Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 14:39:01 -08:00
Jiaming Yuan
abaa593aa0
Fix compiler warnings. ( #8059 )
...
- Remove unused parameters.
- Avoid comparison of different signedness.
2022-07-14 05:29:56 +08:00
Jiaming Yuan
1a33b50a0d
Fix compiler warnings. ( #7974 )
...
- Remove unused parameters. There are still many warnings that are not yet
addressed. Currently, the warnings in dmlc-core dominate the error log.
- Remove `distributed` parameter from metric.
- Fixes some warnings about signed comparison.
2022-06-06 22:56:25 +08:00
Rory Mitchell
71d3b2e036
Fuse gpu_hist all-reduce calls where possible ( #7867 )
2022-05-17 13:27:50 +02:00
Jiaming Yuan
4fcfd9c96e
Fix and cleanup for column matrix. ( #7901 )
...
* Fix missed type dispatching for dense columns with missing values.
* Code cleanup to reduce special cases.
* Reduce memory usage.
2022-05-16 21:11:50 +08:00
Jiaming Yuan
1b6538b4e5
[breaking] Drop single precision histogram ( #7892 )
...
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2022-05-13 19:54:55 +08:00
Jiaming Yuan
317d7be6ee
Always use partition based categorical splits. ( #7857 )
2022-05-03 22:30:32 +08:00
Rory Mitchell
90cce38236
Remove single_precision_histogram for gpu_hist ( #7828 )
2022-05-03 14:53:19 +02:00
Jiaming Yuan
fdf533f2b9
[POC] Experimental support for l1 error. ( #7812 )
...
Support adaptive tree, a feature supported by both sklearn and lightgbm. After the tree is constructed, each leaf value is recomputed from the residuals between labels and predictions.
For l1 error, the optimal value is the median (50th percentile).
This is marked as experimental support for the following reasons:
- The value is not well defined for distributed training, where we might have empty leaves for local workers. Right now I just use the original leaf value for computing the average with other workers, which might cause significant errors.
- Some follow-ups are required, for exact, pruner, and optimization for quantile function. Also, we need to calculate the initial estimation.
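A minimal sketch of the leaf recomputation described above, assuming a single worker and a single leaf. The helper names `l1_loss` and `recompute_leaf` are hypothetical and are not part of the XGBoost code base; they only illustrate why the median of the residuals is the optimal leaf value under l1 error.

```python
import statistics


def l1_loss(residuals, leaf_value):
    """Sum of absolute errors if every row in the leaf receives `leaf_value`."""
    return sum(abs(r - leaf_value) for r in residuals)


def recompute_leaf(labels, predictions):
    """Hypothetical helper: return the median residual of the rows in one leaf."""
    residuals = [y - p for y, p in zip(labels, predictions)]
    return statistics.median(residuals)


# Rows that fell into one leaf, with their current predictions.
labels = [1.0, 2.0, 10.0]
predictions = [0.5, 0.5, 0.5]

residuals = [y - p for y, p in zip(labels, predictions)]
median_leaf = recompute_leaf(labels, predictions)
mean_leaf = statistics.mean(residuals)

# The median never does worse than the mean under l1 error.
print(median_leaf, l1_loss(residuals, median_leaf))
print(mean_leaf, l1_loss(residuals, mean_leaf))
```

In distributed training a worker may have no rows in a given leaf, in which case the median is undefined locally. That is the case the first caveat above refers to.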
2022-04-26 21:41:55 +08:00
Jiaming Yuan
4d81c741e9
External memory support for hist ( #7531 )
...
* Generate column matrix from gHistIndex.
* Avoid synchronization with the sparse page once the cache is written.
* Cleanups: Remove member variables/functions, change the update routine to look like approx and gpu_hist.
* Remove pruner.
2022-03-22 00:13:20 +08:00
Jiaming Yuan
996cc705af
Small cleanup to hist tree method. ( #7735 )
...
* Remove special optimization using number of bins.
* Remove 1-based index for column sampling.
* Remove data layout.
* Unify update prediction cache.
2022-03-20 03:44:55 +08:00
Jiaming Yuan
83a66b4994
Support categorical data for hist. ( #7695 )
...
* Extract partitioner from hist.
* Implement categorical data support by passing the gradient index directly into the partitioner.
* Organize/update document.
* Remove code for negative hessian.
2022-02-25 03:47:14 +08:00
Jiaming Yuan
6762c45494
Small cleanup to gradient index and hist. ( #7668 )
...
* Code comments.
* Const accessor to index.
* Remove some weird variables in the `Index` class.
* Simplify the `MemStackAllocator`.
2022-02-23 11:37:21 +08:00
Jiaming Yuan
2775c2a1ab
Prepare external memory support for hist. ( #7638 )
...
This PR prepares the `GHistIndexMatrix` to host the column matrix, which is used by the hist tree method, by accepting the `sparse_threshold` parameter.
Some cleanups are made to ensure the correct batch param is being passed into DMatrix along with some additional tests for correctness of SimpleDMatrix.
2022-02-10 16:58:02 +08:00
Jiaming Yuan
5d7818e75d
Remove omp_get_max_threads in tree updaters. ( #7590 )
2022-01-26 19:55:47 +08:00
Philip Hyunsu Cho
20c0d60ac7
Restore functionality of max_depth=0 in hist ( #7551 )
...
* Restore functionality of max_depth=0 in hist
* Add test case
2022-01-11 01:37:44 +08:00
Jiaming Yuan
176110a22d
Support external memory in CPU histogram building. ( #7372 )
2021-11-23 01:13:33 +08:00
Jiaming Yuan
b0015fda96
Fix R CRAN failures. ( #7404 )
...
* Remove hist builder dtor.
* Initialize values.
* Tolerance.
* Remove the use of nthread in col maker.
2021-11-16 10:51:12 +08:00
Jiaming Yuan
d7d1b6e3a6
CPU evaluation for cat data. ( #7393 )
...
* Implementation for one hot based.
* Implementation for partition based. (LightGBM)
2021-11-06 14:41:35 +08:00
Jiaming Yuan
b06040b6d0
Implement a general array view. ( #7365 )
...
* Replace existing matrix and vector view.
This is to prepare for handling higher dimension data and prediction when we support multi-target models.
2021-11-05 04:16:11 +08:00
Jiaming Yuan
4100827971
Pass information about objective to tree methods. ( #7385 )
...
* Define the `ObjInfo` and pass it down to every tree updater.
2021-11-04 01:52:44 +08:00
Jiaming Yuan
8d7c6366d7
Accept histogram cut instead of gradient index in evaluation. ( #7336 )
2021-10-20 18:04:46 +08:00
Jiaming Yuan
8e619010d0
Extract CPUExpandEntry and HistParam. ( #7321 )
...
* Remove kRootNid.
* Check for empty hessian.
2021-10-17 14:22:25 +08:00
Jiaming Yuan
3515931305
Initial support for external memory in gradient index. ( #7183 )
...
* Add hessian to batch param in preparation of new approx impl.
* Extract a push method for gradient index matrix.
* Use span instead of vector ref for hessian in sketching.
* Create a binary format for gradient index.
2021-09-13 12:40:56 +08:00
Jiaming Yuan
149f209af6
Extract histogram builder from CPU Hist. ( #7152 )
...
* Extract the CPU histogram builder.
* Fix tests.
* Reduce number of histograms being built.
2021-08-09 21:15:21 +08:00
ShvetsKS
caa9e527dd
Remove extra sync for dense data ( #7120 )
...
Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>
2021-07-22 19:02:31 +08:00
Jiaming Yuan
615ab2b03e
Extract evaluate splits from CPU hist. ( #7079 )
...
Other than modularizing the split evaluation function, this PR also removes some more functions, including `InitNewNodes` and `BuildNodeStats`, along with some other unused variables. Also, scattered code like setting leaf weights is grouped into the split evaluator, and `NodeEntry` is simplified and made private. Another subtle difference from the original implementation is that the modified code doesn't call `tree[nidx].Parent()` to traverse upward.
2021-07-07 15:16:25 +08:00
Jiaming Yuan
1cd20efe68
Move GHistIndex into DMatrix. ( #7064 )
2021-07-01 00:44:49 +08:00
ShvetsKS
2567404ab6
Simplify sparse and dense CPU hist kernels ( #7029 )
...
* Simplify sparse and dense kernels
* Extract row partitioner.
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-06-11 18:26:30 +08:00
Jiaming Yuan
b56614e9b8
[R] Use new predict function. ( #6819 )
...
* Call new C prediction API.
* Add `strict_shape`.
* Add `iterationrange`.
* Update document.
2021-06-11 13:03:29 +08:00
ShvetsKS
5cdaac00c1
Remove feature grouping ( #7018 )
...
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-06-03 04:35:26 +08:00
ShvetsKS
57c732655e
Merge lossguide and depthwise strategies for CPU hist ( #7007 )
...
* fix java/scala test: max depth is also a valid parameter for lossguide
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-06-03 01:49:43 +08:00
ShvetsKS
55b823b27d
Reduce 'InitSampling' complexity and set gradients to zero ( #6922 )
...
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-05-29 04:52:23 +08:00
Jiaming Yuan
556a83022d
Implement unified update prediction cache for (gpu_)hist. ( #6860 )
...
* Implement utilities for linalg.
* Unify the update prediction cache functions.
* Implement update prediction cache for multi-class gpu hist.
2021-04-17 00:29:34 +08:00
Igor Rukhovich
19a2c54265
Prediction by indices (subsample < 1) ( #6683 )
...
* Another implementation of predicting by indices
* Fixed omp parallel_for variable type
* Removed SparsePageView from Updater
2021-03-16 15:08:20 +13:00
Louis Desreumaux
9b530e5697
Improve OpenMP exception handling ( #6680 )
2021-02-25 13:56:16 +08:00
ShvetsKS
7f4d3a91b9
Multiclass prediction caching for CPU Hist ( #6550 )
...
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-01-13 04:42:07 +08:00
Igor Rukhovich
5c8ccf4455
Improved InitSampling function speed by 2.12 times ( #6410 )
...
* Improved InitSampling function speed by 2.12 times
* Added explicit conversion
2020-12-15 20:59:24 -08:00
ShvetsKS
956beead70
Thread local memory allocation for BuildHist ( #6358 )
...
* thread mem locality
* fix apply
* cleanup
* fix lint
* fix tests
* simple try
* fix
* fix
* apply comments
* fix comments
* fix
* apply simple comment
Co-authored-by: ShvetsKS <kirill.shvets@intel.com>
2020-11-25 17:50:12 +03:00
Sergio Gavilán
b181a88f9f
Reduced some C++ compiler warnings ( #6197 )
...
* Removed some warnings
* Rebase with master
* Solved C++ Google Tests errors made by refactoring in order to remove warnings
* Undo renaming path -> path_
* Fix style check
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-10-29 12:36:00 -07:00
vcarpani
671971e12e
Compiler warnings ( #6286 )
...
* Fix warnings for json.h
* Fix warnings for metric.h
* Fix warnings for updater_quantile_hist.cc.
* Fix warnings for updater_histmaker.cc.
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-10-28 13:46:15 -07:00
ShvetsKS
a4ce0eae43
CPU predict performance improvement ( #6127 )
...
Co-authored-by: ShvetsKS <kirill.shvets@intel.com>
2020-10-08 15:50:21 +03:00
Jiaming Yuan
2fcc4f2886
Unify evaluation functions. ( #6037 )
2020-08-26 14:23:27 +08:00
Jiaming Yuan
4d99c58a5f
Feature weights ( #5962 )
2020-08-18 19:55:41 +08:00
boxdot
d268a2a463
Thread-safe prediction by making the prediction cache thread-local. ( #5853 )
...
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2020-07-30 12:33:50 +08:00
Philip Hyunsu Cho
4af857f95d
Add explicit template specialization for portability ( #5921 )
...
* Add explicit template specializations
* Adding Specialization for FileAdapterBatch
2020-07-22 12:31:17 -07:00
Philip Hyunsu Cho
1d22a9be1c
Revert "Reorder includes. ( #5749 )" ( #5771 )
...
This reverts commit d3a0efbf162f3dceaaf684109e1178c150b32de3.
2020-06-09 10:29:28 -07:00
Jiaming Yuan
d3a0efbf16
Reorder includes. ( #5749 )
...
* Reorder includes.
* R.
2020-06-03 17:30:47 +12:00