40 Commits

Author SHA1 Message Date
Jiaming Yuan
151882dd26
Initial support for multi-target tree. (#8616)
* Implement multi-target for hist.

- Add new hist tree builder.
- Move data fetchers for tests.
- Dispatch function calls in gbm base on the tree type.
2023-03-22 23:49:56 +08:00
Jiaming Yuan
8685556af2
Implement hist evaluator for multi-target tree. (#8908) 2023-03-15 01:42:51 +08:00
Jiaming Yuan
5ba3509dd3
Define multi expand entry. (#8895) 2023-03-13 19:31:05 +08:00
Jiaming Yuan
5feee8d4a9
Define core multi-target regression tree structure. (#8884)
- Define a new tree struct embedded in the `RegTree`.
- Provide dispatching functions in `RegTree`.
- Fix some c++-17 warnings about the use of nodiscard (currently we disable the warning on
  the CI).
- Use uint32_t instead of size_t for `bst_target_t` as it has a defined size and can be used
  as part of dmlc parameter.
- Hide the `Segment` struct inside the categorical split matrix.
2023-03-09 19:03:06 +08:00
Jiaming Yuan
228a46e8ad
Support learning rate for zero-hessian objectives. (#8866) 2023-03-06 20:33:28 +08:00
Jiaming Yuan
4d665b3fb0
Restore clang tidy test. (#8861) 2023-03-03 13:47:04 -08:00
Rong Ou
7cbaee9916
Support column split in approx tree method (#8847) 2023-03-02 03:59:07 +08:00
Rong Ou
a65ad0bd9c
Support column split in histogram builder (#8811) 2023-02-17 22:37:01 +08:00
Jiaming Yuan
282b1729da
Specify the number of threads for parallel sort. (#8735)
* Specify the number of threads for parallel sort.

- Pass context object into argsort.
- Replace macros with inline functions.
2023-02-16 00:20:19 +08:00
Jiaming Yuan
21a28f2cc5
Small refactor for hist builder. (#8698)
- Use span instead of vector as parameter. No perf change as the builder work on pointer.
- Use const pointer for reg tree.
2023-01-30 14:06:41 +08:00
Jiaming Yuan
e49e0998c0
Extract CPU sampling routines. (#8697) 2023-01-19 23:28:18 +08:00
Jiaming Yuan
3e26107a9c
Rename and extract Context. (#8528)
* Rename `GenericParameter` to `Context`.
* Rename header file to reflect the change.
* Rename all references.
2022-12-07 04:58:54 +08:00
Rong Ou
668b8a0ea4
[Breaking] Switch from rabit to the collective communicator (#8257)
* Switch from rabit to the collective communicator

* fix size_t specialization

* really fix size_t

* try again

* add include

* more include

* fix lint errors

* remove rabit includes

* fix pylint error

* return dict from communicator context

* fix communicator shutdown

* fix dask test

* reset communicator mocklist

* fix distributed tests

* do not save device communicator

* fix jvm gpu tests

* add python test for federated communicator

* Update gputreeshap submodule

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 14:39:01 -08:00
Dmitry Razdoburdin
eb7bbee2c9
Optional by-column histogram build. (#8233)
Co-authored-by: dmitry.razdoburdin <drazdobu@jfldaal005.jf.intel.com>
2022-09-22 05:16:13 +08:00
Jiaming Yuan
b5eb36f1af
Add max_cat_threshold to GPU and handle missing cat values. (#8212) 2022-09-07 00:57:51 +08:00
Jiaming Yuan
abaa593aa0
Fix compiler warnings. (#8059)
- Remove unused parameters.
- Avoid comparison of different signedness.
2022-07-14 05:29:56 +08:00
Jiaming Yuan
142a208a90
Fix compiler warnings. (#8022)
- Remove/fix unused parameters
- Remove deprecated code in rabit.
- Update dmlc-core.
2022-06-22 21:29:10 +08:00
Rong Ou
e5ec546da5
[Breaking] Remove rabit support for custom reductions and grow_local_histmaker updater (#7992) 2022-06-21 15:08:23 +08:00
Jiaming Yuan
1a33b50a0d
Fix compiler warnings. (#7974)
- Remove unused parameters. There are still many warnings that are not yet
addressed. Currently, the warnings in dmlc-core dominate the error log.
- Remove `distributed` parameter from metric.
- Fixes some warnings about signed comparison.
2022-06-06 22:56:25 +08:00
Jiaming Yuan
b90c6d25e8
Implement max_cat_threshold for CPU. (#7957) 2022-06-04 11:02:46 +08:00
Jiaming Yuan
bde4f25794
Handle missing categorical value in CPU evaluator. (#7948) 2022-05-27 14:15:47 +08:00
Jiaming Yuan
18cbebaeb9
Unify the cat split storage for CPU. (#7937)
* Unify the cat split storage for CPU.

* Cleanup.

* Workaround.
2022-05-26 04:14:40 -07:00
Jiaming Yuan
606be9e663
Handle missing values in one hot splits. (#7934) 2022-05-24 20:48:41 +08:00
Jiaming Yuan
1b6538b4e5
[breaking] Drop single precision histogram (#7892)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2022-05-13 19:54:55 +08:00
Jiaming Yuan
317d7be6ee
Always use partition based categorical splits. (#7857) 2022-05-03 22:30:32 +08:00
Jiaming Yuan
fdf533f2b9
[POC] Experimental support for l1 error. (#7812)
Support adaptive tree, a feature supported by both sklearn and lightgbm.  The tree leaf is recomputed based on residue of labels and predictions after construction.

For l1 error, the optimal value is the median (50 percentile).

This is marked as experimental support for the following reasons:
- The value is not well defined for distributed training, where we might have empty leaves for local workers. Right now I just use the original leaf value for computing the average with other workers, which might cause significant errors.
- Some follow-ups are required, for exact, pruner, and optimization for quantile function. Also, we need to calculate the initial estimation.
2022-04-26 21:41:55 +08:00
Jiaming Yuan
4d81c741e9
External memory support for hist (#7531)
* Generate column matrix from gHistIndex.
* Avoid synchronization with the sparse page once the cache is written.
* Cleanups: Remove member variables/functions, change the update routine to look like approx and gpu_hist.
* Remove pruner.
2022-03-22 00:13:20 +08:00
Jiaming Yuan
996cc705af
Small cleanup to hist tree method. (#7735)
* Remove special optimization using number of bins.
* Remove 1-based index for column sampling.
* Remove data layout.
* Unify update prediction cache.
2022-03-20 03:44:55 +08:00
Jiaming Yuan
83a66b4994
Support categorical data for hist. (#7695)
* Extract partitioner from hist.
* Implement categorical data support by passing the gradient index directly into the partitioner.
* Organize/update document.
* Remove code for negative hessian.
2022-02-25 03:47:14 +08:00
Jiaming Yuan
711f7f3851
Avoid std::terminate for R package. (#7661)
This is part of CRAN policies.
2022-02-17 01:27:20 +08:00
Jiaming Yuan
0d0abe1845
Support optimal partitioning for GPU hist. (#7652)
* Implement `MaxCategory` in quantile.
* Implement partition-based split for GPU evaluation.  Currently, it's based on the existing evaluation function.
* Extract an evaluator from GPU Hist to store the needed states.
* Added some CUDA stream/event utilities.
* Update document with references.
* Fixed a bug in approx evaluator where the number of data points is less than the number of categories.
2022-02-15 03:03:12 +08:00
Jiaming Yuan
e060519d4f
Avoid regenerating the gradient index for approx. (#7591) 2022-01-26 21:41:30 +08:00
Jiaming Yuan
deab0e32ba
Validate out of range categorical value. (#7576)
* Use float in CPU categorical set to preserve the input value.
* Check out of range values.
2022-01-18 20:16:19 +08:00
Jiaming Yuan
176110a22d
Support external memory in CPU histogram building. (#7372) 2021-11-23 01:13:33 +08:00
Jiaming Yuan
d4274bc556
Fix typo. (#7433) 2021-11-15 01:28:11 +08:00
Jiaming Yuan
d7d1b6e3a6
CPU evaluation for cat data. (#7393)
* Implementation for one hot based.
* Implementation for partition based. (LightGBM)
2021-11-06 14:41:35 +08:00
Jiaming Yuan
8d7c6366d7
Accept histogram cut instead gradient index in evaluation. (#7336) 2021-10-20 18:04:46 +08:00
Jiaming Yuan
8e619010d0
Extract CPUExpandEntry and HistParam. (#7321)
* Remove kRootNid.
* Check for empty hessian.
2021-10-17 14:22:25 +08:00
Jiaming Yuan
149f209af6
Extract histogram builder from CPU Hist. (#7152)
* Extract the CPU histogram builder.
* Fix tests.
* Reduce number of histograms being built.
2021-08-09 21:15:21 +08:00
Jiaming Yuan
615ab2b03e
Extract evaluate splits from CPU hist. (#7079)
Other than modularizing the split evaluation function, this PR also removes some more functions including `InitNewNodes` and `BuildNodeStats` among some other unused variables.  Also, scattered code like setting leaf weights is grouped into the split evaluator and `NodeEntry` is simplified and made private.  Another subtle difference with the original implementation is that the modified code doesn't call `tree[nidx].Parent()` to traversal upward.
2021-07-07 15:16:25 +08:00