11 Commits

Author SHA1 Message Date
Jiaming Yuan
b90c6d25e8
Implement `max_cat_threshold` for CPU. (#7957)
2022-06-04 11:02:46 +08:00
Jiaming Yuan
46e0bce212
Use maximum category in sketch. (#7853)
2022-05-05 19:56:49 +08:00
Jiaming Yuan
317d7be6ee
Always use partition-based categorical splits. (#7857)
2022-05-03 22:30:32 +08:00
Jiaming Yuan
0d0abe1845
Support optimal partitioning for GPU hist. (#7652)
* Implement `MaxCategory` in quantile.
* Implement partition-based split for GPU evaluation. Currently, it's based on the existing evaluation function.
* Extract an evaluator from GPU Hist to store the needed states.
* Add some CUDA stream/event utilities.
* Update documentation with references.
* Fix a bug in the approx evaluator where the number of data points is less than the number of categories.
2022-02-15 03:03:12 +08:00
Jiaming Yuan
deab0e32ba
Validate out-of-range categorical values. (#7576)
* Use float in CPU categorical set to preserve the input value.
* Check out-of-range values.
2022-01-18 20:16:19 +08:00
Jiaming Yuan
e5e47c3c99
Clarify the behavior of invalid categorical value handling. (#7529)
2022-01-13 16:11:52 +08:00
Jiaming Yuan
d7d1b6e3a6
CPU evaluation for cat data. (#7393)
* Implementation for one-hot-based splits.
* Implementation for partition-based splits (LightGBM).
2021-11-06 14:41:35 +08:00
Jiaming Yuan
a55d43ccfd
Add test for invalid categorical data values. (#7380)
* Add check during sketching.
2021-11-02 18:00:52 +08:00
Jiaming Yuan
ac9bfaa4f2
Handle missing values in dataframe with category dtype. (#7331)
* Replace -1 in pandas initializer.
* Unify `IsValid` functor.
* Mimic pandas data handling in cuDF glue code.
* Check invalid categories.
* Fix DDM sketching.
2021-10-28 03:33:54 +08:00
Jiaming Yuan
8fa32fdda2
Implement categorical data support for SHAP. (#7053)
* Add CPU implementation.
* Update GPUTreeSHAP.
* Add GPU implementation by defining custom split condition.
2021-06-25 19:02:46 +08:00
Jiaming Yuan
20c95be625
Expand categorical node. (#6028)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-08-25 18:53:57 +08:00