14 Commits

Author SHA1 Message Date
Jiaming Yuan
b5eb36f1af
Add max_cat_threshold to GPU and handle missing cat values. (#8212) 2022-09-07 00:57:51 +08:00
Jiaming Yuan
bde4f25794
Handle missing categorical value in CPU evaluator. (#7948) 2022-05-27 14:15:47 +08:00
Jiaming Yuan
4fcfd9c96e
Fix and cleanup for column matrix. (#7901)
* Fix missed type dispatching for dense columns with missing values.
* Code cleanup to reduce special cases.
* Reduce memory usage.
2022-05-16 21:11:50 +08:00
Jiaming Yuan
1b6538b4e5
[breaking] Drop single precision histogram (#7892)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2022-05-13 19:54:55 +08:00
Jiaming Yuan
317d7be6ee
Always use partition based categorical splits. (#7857) 2022-05-03 22:30:32 +08:00
Jiaming Yuan
83a66b4994
Support categorical data for hist. (#7695)
* Extract partitioner from hist.
* Implement categorical data support by passing the gradient index directly into the partitioner.
* Organize/update document.
* Remove code for negative hessian.
2022-02-25 03:47:14 +08:00
Jiaming Yuan
0d0abe1845
Support optimal partitioning for GPU hist. (#7652)
* Implement `MaxCategory` in quantile.
* Implement partition-based split for GPU evaluation.  Currently, it's based on the existing evaluation function.
* Extract an evaluator from GPU Hist to store the needed states.
* Added some CUDA stream/event utilities.
* Update document with references.
* Fixed a bug in approx evaluator where the number of data points is less than the number of categories.
2022-02-15 03:03:12 +08:00
Jiaming Yuan
2775c2a1ab
Prepare external memory support for hist. (#7638)
This PR prepares the GHistIndexMatrix to host the column matrix which is used by the hist tree method by accepting sparse_threshold parameter.

Some cleanups are made to ensure the correct batch param is being passed into DMatrix along with some additional tests for correctness of SimpleDMatrix.
2022-02-10 16:58:02 +08:00
Jiaming Yuan
5817840858
Remove omp_get_max_threads in data. (#7588) 2022-01-24 02:44:07 +08:00
Jiaming Yuan
9ab73f737e
Extract Sketch Entry from hist maker. (#7503)
* Extract Sketch Entry from hist maker.

* Add a new sketch container for sorted inputs.
* Optimize bin search.
2021-12-18 05:36:56 +08:00
Jiaming Yuan
176110a22d
Support external memory in CPU histogram building. (#7372) 2021-11-23 01:13:33 +08:00
Jiaming Yuan
d7d1b6e3a6
CPU evaluation for cat data. (#7393)
* Implementation for one hot based.
* Implementation for partition based. (LightGBM)
2021-11-06 14:41:35 +08:00
Jiaming Yuan
8d7c6366d7
Accept histogram cut instead gradient index in evaluation. (#7336) 2021-10-20 18:04:46 +08:00
Jiaming Yuan
615ab2b03e
Extract evaluate splits from CPU hist. (#7079)
Other than modularizing the split evaluation function, this PR also removes some more functions including `InitNewNodes` and `BuildNodeStats` among some other unused variables.  Also, scattered code like setting leaf weights is grouped into the split evaluator and `NodeEntry` is simplified and made private.  Another subtle difference with the original implementation is that the modified code doesn't call `tree[nidx].Parent()` to traversal upward.
2021-07-07 15:16:25 +08:00