Jiaming Yuan
53fc17578f
Use std::uint64_t for row index. ( #10120 )
...
- Use std::uint64_t instead of size_t to avoid implementation-defined type.
- Rename to bst_idx_t, to account for other types of indexing.
- Small cleanup to the base header.
2024-03-15 18:43:49 +08:00
Jiaming Yuan
0ce4372bd4
Use UBJSON for serializing splits for vertical data split. ( #10059 )
2024-02-25 00:18:23 +08:00
Jiaming Yuan
cacb4b1fdd
Fix gain calculation in multi-target tree. ( #9978 )
2024-01-17 13:18:44 +08:00
Jiaming Yuan
972730cde0
Use matrix for gradient. ( #9508 )
...
- Use the `linalg::Matrix` for storing gradients.
- New API for the custom objective.
- Custom objective for multi-class/multi-target is now required to return the correct shape.
- Custom objective for Python can accept arrays with any strides. (row-major, column-major)
2023-08-24 05:29:52 +08:00
Jiaming Yuan
e93a274823
Small cleanup for histogram routines. ( #9427 )
...
* Small cleanup for histogram routines.
- Extract hist train param from GPU hist.
- Make histogram const after construction.
- Unify parameter names.
2023-08-02 18:28:26 +08:00
Jiaming Yuan
03bc6e6427
Remove unused variables. ( #9210 )
...
- remove used variables.
- Remove signed comparison warnings.
2023-05-28 05:24:15 +08:00
Rong Ou
5b69534b43
Support column split in multi-target hist ( #9171 )
2023-05-26 16:56:05 +08:00
Rong Ou
603f8ce2fa
Support hist in the partition builder under column split ( #9120 )
2023-05-11 05:24:29 +08:00
Jiaming Yuan
08ce495b5d
Use Booster context in DMatrix. ( #8896 )
...
- Pass context from booster to DMatrix.
- Use context instead of integer for `n_threads`.
- Check the consistency configuration for `max_bin`.
- Test for all combinations of initialization options.
2023-04-28 21:47:14 +08:00
Jiaming Yuan
a093770f36
Partitioner for multi-target tree. ( #8922 )
2023-03-16 18:49:34 +08:00
Jiaming Yuan
5ba3509dd3
Define multi expand entry. ( #8895 )
2023-03-13 19:31:05 +08:00
Rong Ou
d9688f93c7
Support column-split in row partitioner ( #8828 )
2023-02-26 04:43:35 +08:00
Jiaming Yuan
3e26107a9c
Rename and extract Context. ( #8528 )
...
* Rename `GenericParameter` to `Context`.
* Rename header file to reflect the change.
* Rename all references.
2022-12-07 04:58:54 +08:00
Dmitry Razdoburdin
5bd849f1b5
Unify the partitioner for hist and approx.
...
Co-authored-by: dmitry.razdoburdin <drazdobu@jfldaal005.jf.intel.com>
Co-authored-by: jiamingy <jm.yuan@outlook.com>
2022-10-20 02:49:20 +08:00
Jiaming Yuan
441ffc017a
Copy data from Ellpack to GHist. ( #8215 )
2022-09-06 23:05:49 +08:00
Jiaming Yuan
4a4e5c7c18
Prepare gradient index for Quantile DMatrix. ( #8103 )
...
* Prepare gradient index for Quantile DMatrix.
- Implement push batch with adapter batch.
- Implement `GetFvalue` for prediction.
2022-07-22 17:26:33 +08:00
Jiaming Yuan
4fcfd9c96e
Fix and cleanup for column matrix. ( #7901 )
...
* Fix missed type dispatching for dense columns with missing values.
* Code cleanup to reduce special cases.
* Reduce memory usage.
2022-05-16 21:11:50 +08:00
Jiaming Yuan
4d81c741e9
External memory support for hist ( #7531 )
...
* Generate column matrix from gHistIndex.
* Avoid synchronization with the sparse page once the cache is written.
* Cleanups: Remove member variables/functions, change the update routine to look like approx and gpu_hist.
* Remove pruner.
2022-03-22 00:13:20 +08:00
Jiaming Yuan
996cc705af
Small cleanup to hist tree method. ( #7735 )
...
* Remove special optimization using number of bins.
* Remove 1-based index for column sampling.
* Remove data layout.
* Unify update prediction cache.
2022-03-20 03:44:55 +08:00
Jiaming Yuan
83a66b4994
Support categorical data for hist. ( #7695 )
...
* Extract partitioner from hist.
* Implement categorical data support by passing the gradient index directly into the partitioner.
* Organize/update document.
* Remove code for negative hessian.
2022-02-25 03:47:14 +08:00
Jiaming Yuan
6762c45494
Small cleanup to gradient index and hist. ( #7668 )
...
* Code comments.
* Const accessor to index.
* Remove some weird variables in the `Index` class.
* Simplify the `MemStackAllocator`.
2022-02-23 11:37:21 +08:00
Jiaming Yuan
2775c2a1ab
Prepare external memory support for hist. ( #7638 )
...
This PR prepares the GHistIndexMatrix to host the column matrix which is used by the hist tree method by accepting sparse_threshold parameter.
Some cleanups are made to ensure the correct batch param is being passed into DMatrix along with some additional tests for correctness of SimpleDMatrix.
2022-02-10 16:58:02 +08:00
Jiaming Yuan
5d7818e75d
Remove omp_get_max_threads in tree updaters. ( #7590 )
2022-01-26 19:55:47 +08:00
Jiaming Yuan
5817840858
Remove omp_get_max_threads in data. ( #7588 )
2022-01-24 02:44:07 +08:00
Jiaming Yuan
9ab73f737e
Extract Sketch Entry from hist maker. ( #7503 )
...
* Extract Sketch Entry from hist maker.
* Add a new sketch container for sorted inputs.
* Optimize bin search.
2021-12-18 05:36:56 +08:00
Jiaming Yuan
4100827971
Pass infomation about objective to tree methods. ( #7385 )
...
* Define the `ObjInfo` and pass it down to every tree updater.
2021-11-04 01:52:44 +08:00
Jiaming Yuan
149f209af6
Extract histogram builder from CPU Hist. ( #7152 )
...
* Extract the CPU histogram builder.
* Fix tests.
* Reduce number of histograms being built.
2021-08-09 21:15:21 +08:00
Jiaming Yuan
615ab2b03e
Extract evaluate splits from CPU hist. ( #7079 )
...
Other than modularizing the split evaluation function, this PR also removes some more functions including `InitNewNodes` and `BuildNodeStats` among some other unused variables. Also, scattered code like setting leaf weights is grouped into the split evaluator and `NodeEntry` is simplified and made private. Another subtle difference with the original implementation is that the modified code doesn't call `tree[nidx].Parent()` to traversal upward.
2021-07-07 15:16:25 +08:00
Jiaming Yuan
1cd20efe68
Move GHistIndex into DMatrix. ( #7064 )
2021-07-01 00:44:49 +08:00
ShvetsKS
2567404ab6
Simplify sparse and dense CPU hist kernels ( #7029 )
...
* Simplify sparse and dense kernels
* Extract row partitioner.
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-06-11 18:26:30 +08:00
ShvetsKS
5cdaac00c1
Remove feature grouping ( #7018 )
...
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-06-03 04:35:26 +08:00
ShvetsKS
57c732655e
Merge lossgude and depthwise strategies for CPU hist ( #7007 )
...
* fix java/scala test: max depth is also valid parameter for lossguide
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-06-03 01:49:43 +08:00
ShvetsKS
55b823b27d
Reduce 'InitSampling' complexity and set gradients to zero ( #6922 )
...
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-05-29 04:52:23 +08:00
Igor Rukhovich
19a2c54265
Prediction by indices (subsample < 1) ( #6683 )
...
* Another implementation of predicting by indices
* Fixed omp parallel_for variable type
* Removed SparsePageView from Updater
2021-03-16 15:08:20 +13:00
Jiaming Yuan
f2f7dd87b8
Use view for SparsePage exclusively. ( #6590 )
2021-01-11 18:04:55 +08:00
ShvetsKS
956beead70
Thread local memory allocation for BuildHist ( #6358 )
...
* thread mem locality
* fix apply
* cleanup
* fix lint
* fix tests
* simple try
* fix
* fix
* apply comments
* fix comments
* fix
* apply simple comment
Co-authored-by: ShvetsKS <kirill.shvets@intel.com>
2020-11-25 17:50:12 +03:00
Sergio Gavilán
b181a88f9f
Reduced some C++ compiler warnings ( #6197 )
...
* Removed some warnings
* Rebase with master
* Solved C++ Google Tests errors made by refactoring in order to remove warnings
* Undo renaming path -> path_
* Fix style check
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-10-29 12:36:00 -07:00
Jiaming Yuan
2fcc4f2886
Unify evaluation functions. ( #6037 )
2020-08-26 14:23:27 +08:00
ShvetsKS
cd3d14ad0e
Add float32 histogram ( #5624 )
...
* new single_precision_histogram param was added.
Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>
Co-authored-by: fis <jm.yuan@outlook.com>
2020-06-03 11:24:53 +08:00
ShvetsKS
dd01e4ba8d
Distributed optimizations for 'hist' method with CPUs ( #5557 )
...
Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>
2020-05-20 06:03:03 +03:00
Oleksandr Kuvshynov
4e64e2ef8e
skip missing lookup if nothing is missing in CPU hist partition kernel. ( #5644 )
...
* [xgboost] skip missing lookup if nothing is missing
2020-05-12 05:50:08 +03:00
ShvetsKS
a2d86b8e4b
Optimizations for RNG in InitData kernel ( #5522 )
...
* optimizations for subsampling in InitData
* optimizations for subsampling in InitData
Co-authored-by: SHVETS, KIRILL <kirill.shvets@intel.com>
2020-04-16 18:24:32 +03:00
Jiaming Yuan
6671b42dd4
Use ellpack for prediction only when sparsepage doesn't exist. ( #5504 )
2020-04-10 12:15:46 +08:00
Jiaming Yuan
0012f2ef93
Upgrade clang-tidy on CI. ( #5469 )
...
* Correct all clang-tidy errors.
* Upgrade clang-tidy to 10 on CI.
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-04-05 04:42:29 +08:00
Jiaming Yuan
4942da64ae
Refactor tests with data generator. ( #5439 )
2020-03-27 06:44:44 +08:00
Egor Smirnov
1b97eaf7a7
Optimized ApplySplit, BuildHist and UpdatePredictCache functions on CPU ( #5244 )
...
* Split up sparse and dense build hist kernels.
* Add `PartitionBuilder`.
2020-02-29 16:11:42 +08:00
Egor Smirnov
c67163250e
Optimized BuildHist function ( #5156 )
2020-01-29 23:32:57 -08:00
Egor Smirnov
7b17e76c5b
Optimized EvaluateSplut function ( #5138 )
...
* Add block based threading utilities.
2019-12-31 18:18:42 +08:00
Jiaming Yuan
04db125699
Quick fix for memory leak in CPU Hist. ( #5153 )
...
Closes https://github.com/dmlc/xgboost/issues/3579 .
* Don't use map.
2019-12-31 14:05:53 +08:00
Jiaming Yuan
7ef5b78003
Implement JSON IO for updaters ( #5094 )
...
* Implement JSON IO for updaters.
* Remove parameters in split evaluator.
2019-12-07 00:24:00 +08:00