1438 Commits

Author SHA1 Message Date
Tim Gates
cb40bbdadd
docs: fix simple typo, cannonical -> canonical (#8099)
There is a small typo in src/common/partition_builder.h.

Should read `canonical` rather than `cannonical`.

Signed-off-by: Tim Gates <tim.gates@iress.com>
2022-07-20 21:04:50 +08:00
QuellaZhang
703261e78f
[MSVC][std:c++latest] Fix compiler error (#8093)
Co-authored-by: QuellaZhang <zhangyi2090@163.com>
2022-07-20 15:15:39 +08:00
Jiaming Yuan
5156be0f49
Limit max_depth to 30 for GPU. (#8098) 2022-07-20 12:28:49 +08:00
Rong Ou
7a6b711eb8
Remove unused updater basemaker (#8091) 2022-07-19 15:41:27 +08:00
Jiaming Yuan
4083440690
Small cleanups to various data types. (#8086)
- Use `bst_bin_t` in batch param constructor.
- Use `StringView` to avoid `std::string` when appropriate.
- Avoid using `MetaInfo` in the quantile constructor to limit the scope of the parameter.
2022-07-18 22:39:36 +08:00
Jiaming Yuan
8dd96013f1
Split up column matrix initialization. (#8060)
* Split up column matrix initialization.

This PR splits the column matrix initialization into two steps: the first initializes
the storage, and the second performs the transpose. By doing so, we can reuse the code
for Quantile DMatrix.
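A minimal sketch of the two-step idea, assuming dense column-major storage of bin indices and a `-1` sentinel for missing values. `init_storage`, `push_batch` and `MISSING` are hypothetical names for illustration, not XGBoost's internal API. Step 1 allocates the columns once. Step 2 transposes sparse rows into them and can be called repeatedly with any source that yields rows in batches.

```python
from typing import List, Sequence, Tuple

MISSING = -1  # hypothetical sentinel for "no bin index stored for this cell"


def init_storage(n_rows: int, n_features: int) -> List[List[int]]:
    """Step 1: allocate column-major storage with every cell marked missing."""
    return [[MISSING] * n_rows for _ in range(n_features)]


def push_batch(
    columns: List[List[int]],
    rows: Sequence[Sequence[Tuple[int, int]]],
    base_rowid: int = 0,
) -> None:
    """Step 2: transpose a batch of sparse rows of (feature, bin) pairs into columns."""
    for offset, row in enumerate(rows):
        for feature, bin_idx in row:
            columns[feature][base_rowid + offset] = bin_idx


# Storage is initialized once, then filled from two separate batches.
cols = init_storage(n_rows=3, n_features=2)
push_batch(cols, [[(0, 5)], [(1, 7)]])
push_batch(cols, [[(0, 2), (1, 3)]], base_rowid=2)
print(cols)  # [[5, -1, 2], [-1, 7, 3]]
```

Keeping allocation separate from the transpose means the transpose step does not need to know where the rows came from. That is one plausible reading of why the split enables reuse.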
2022-07-14 10:34:47 +08:00
Jiaming Yuan
abaa593aa0
Fix compiler warnings. (#8059)
- Remove unused parameters.
- Avoid comparisons between integers of different signedness.
2022-07-14 05:29:56 +08:00
Rory Mitchell
0bdaca25ca
Use single precision in gain calculation, use pointers instead of span. (#8051) 2022-07-12 21:56:27 +02:00
Rory Mitchell
794cbaa60a
Fuse split evaluation kernels (#8026) 2022-07-05 10:24:31 +02:00
Jiaming Yuan
8746f9cddf
Rename IterativeDMatrix. (#8045) 2022-07-04 18:52:31 +08:00
Rory Mitchell
bc4f802b17
Batch UpdatePosition using cudaMemcpy (#7964) 2022-06-30 17:52:40 +02:00
kiwiwarmnfuzzy
2407381c3d
Force auc.cc to be statically linked (#8039) 2022-06-30 19:24:22 +08:00
Jiaming Yuan
f0c1b842bf
Implement sketching with adapter. (#8019) 2022-06-23 00:03:02 +08:00
Jiaming Yuan
142a208a90
Fix compiler warnings. (#8022)
- Remove/fix unused parameters
- Remove deprecated code in rabit.
- Update dmlc-core.
2022-06-22 21:29:10 +08:00
Rong Ou
e5ec546da5
[Breaking] Remove rabit support for custom reductions and grow_local_histmaker updater (#7992) 2022-06-21 15:08:23 +08:00
Jiaming Yuan
8f8bd8147a
Fix LTR with weighted Quantile DMatrix. (#7975)
* Fix LTR with weighted Quantile DMatrix.

* Better tests.
2022-06-09 01:33:41 +08:00
Jiaming Yuan
1a33b50a0d
Fix compiler warnings. (#7974)
- Remove unused parameters. There are still many warnings that are not yet
addressed. Currently, the warnings in dmlc-core dominate the error log.
- Remove `distributed` parameter from metric.
- Fix some warnings about signed comparison.
2022-06-06 22:56:25 +08:00
Jiaming Yuan
d48123d23b
Fix rmm build (#7973)
- Optionally switch to C++17.
- Use rmm CMake target.
- Work around compiler errors.
- Fix GPUMetric inheritance.
- Run death tests even if it's built with RMM support.

Co-authored-by: jakirkham <jakirkham@gmail.com>
2022-06-06 20:18:32 +08:00
Jiaming Yuan
b90c6d25e8
Implement max_cat_threshold for CPU. (#7957) 2022-06-04 11:02:46 +08:00
Jiaming Yuan
13b15e07e8
Handle formatted JSON input. (#7953) 2022-06-01 16:20:58 +08:00
Rong Ou
80339c3427
Enable distributed GPU training over Rabit (#7930) 2022-05-31 04:09:45 +08:00
Gyeongjae Choi
cc6d57aa0d
Add minimal emscripten build support (#7954) 2022-05-30 14:11:40 +08:00
Jiaming Yuan
bde4f25794
Handle missing categorical value in CPU evaluator. (#7948) 2022-05-27 14:15:47 +08:00
Jiaming Yuan
18cbebaeb9
Unify the cat split storage for CPU. (#7937)
* Unify the cat split storage for CPU.

* Cleanup.

* Workaround.
2022-05-26 04:14:40 -07:00
Jiaming Yuan
606be9e663
Handle missing values in one hot splits. (#7934) 2022-05-24 20:48:41 +08:00
Jiaming Yuan
18a38f7ca0
Refactor for GHistIndex. (#7923)
* Pass the sparse page as an adapter, in preparation for quantile DMatrix.
* Remove old external memory code like `rbegin` and extra `Init` function.
* Simplify type dispatch.
2022-05-23 23:04:53 +08:00
Rory Mitchell
f6babc814c
Do not initialise data structures to maximum possible tree size. (#7919) 2022-05-19 19:45:53 +02:00
Jiaming Yuan
edf9a9608e
Fix type conversion warning. (#7916) 2022-05-18 20:14:14 +08:00
Jiaming Yuan
765097d514
Simplify inplace-predict. (#7910)
Pass `X` as part of the proxy DMatrix instead of as an independent `dmlc::any`.
2022-05-18 17:52:00 +08:00
Jiaming Yuan
19775ffe15
Use adapter to initialize column matrix. (#7912) 2022-05-18 16:15:12 +08:00
Rory Mitchell
71d3b2e036
Fuse gpu_hist all-reduce calls where possible (#7867) 2022-05-17 13:27:50 +02:00
Jiaming Yuan
4fcfd9c96e
Fix and cleanup for column matrix. (#7901)
* Fix missed type dispatching for dense columns with missing values.
* Code cleanup to reduce special cases.
* Reduce memory usage.
2022-05-16 21:11:50 +08:00
Philip Hyunsu Cho
4cd14aee5a
Rename misspelled config parameter for pseudo-Huber (#7904) 2022-05-15 06:38:33 -07:00
Jiaming Yuan
1baad8650c
Small cleanup to Column. (#7898)
* Define forward iterator to hide the internal state.
2022-05-15 12:39:10 +08:00
Jiaming Yuan
1b6538b4e5
[breaking] Drop single precision histogram (#7892)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2022-05-13 19:54:55 +08:00
Jiaming Yuan
11d65fcb21
Extract partial sum into an independent function. (#7889) 2022-05-13 14:30:35 +08:00
Jiaming Yuan
94ca52b7b7
Fix overflow in prediction size. (#7885) 2022-05-12 02:44:03 +08:00
Philip Hyunsu Cho
d2bc0f0f08
Allow loading old models from RDS (#7864) 2022-05-06 22:49:38 -07:00
Rory Mitchell
7ef54e39ec
Small refactor to categoricals (#7858) 2022-05-05 17:47:02 +02:00
Rong Ou
14ef38b834
Initial support for federated learning (#7831)
Federated learning plugin for xgboost:
* A gRPC server to aggregate MPI-style requests (allgather, allreduce, broadcast) from federated workers.
* A Rabit engine for the federated environment.
* Integration test to simulate federated learning.

Additional follow-ups are needed to address GPU support, security, privacy, etc.
2022-05-05 21:49:22 +08:00
Jiaming Yuan
46e0bce212
Use maximum category in sketch. (#7853) 2022-05-05 19:56:49 +08:00
Jiaming Yuan
317d7be6ee
Always use partition based categorical splits. (#7857) 2022-05-03 22:30:32 +08:00
Rory Mitchell
90cce38236
Remove single_precision_histogram for gpu_hist (#7828) 2022-05-03 14:53:19 +02:00
Jiaming Yuan
288c52596c
Define bin type. (#7850) 2022-04-29 19:41:39 +08:00
Jiaming Yuan
fdf533f2b9
[POC] Experimental support for l1 error. (#7812)
Support adaptive trees, a feature supported by both sklearn and LightGBM. After a tree is constructed, each leaf value is recomputed from the residuals between labels and predictions.

For l1 error, the optimal value is the median (50th percentile).

This is marked as experimental support for the following reasons:
- The value is not well defined for distributed training, where local workers might have empty leaves. Right now, I use the original leaf value when averaging with other workers, which might introduce significant errors.
- Some follow-ups are required for the `exact` tree method, the pruner, and optimization of the quantile function. We also need to calculate the initial estimate.
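A minimal sketch of the leaf recomputation described above. It assumes unweighted rows, predictions taken from before the new tree is added, and no learning-rate scaling. `adaptive_leaf_values` is a hypothetical helper, not part of XGBoost's API. Each leaf becomes the median of the residuals `label - prediction` of the rows routed to it, and (also an assumption) a leaf that receives no rows keeps its original value.

```python
from statistics import median
from typing import Dict, List


def adaptive_leaf_values(
    labels: List[float],
    predictions: List[float],
    leaf_index: List[int],
    old_leaf_values: Dict[int, float],
) -> Dict[int, float]:
    """Hypothetical helper: set each leaf to the median residual of its rows."""
    residuals: Dict[int, List[float]] = {}
    for y, p, leaf in zip(labels, predictions, leaf_index):
        residuals.setdefault(leaf, []).append(y - p)

    new_values: Dict[int, float] = {}
    for leaf, old in old_leaf_values.items():
        r = residuals.get(leaf)
        # Assumption: an empty leaf falls back to its gradient-based value.
        new_values[leaf] = median(r) if r else old
    return new_values


labels = [1.0, 2.0, 10.0, 4.0, 5.0]
predictions = [0.0, 0.0, 0.0, 3.0, 3.0]
leaf_index = [0, 0, 0, 1, 1]
print(adaptive_leaf_values(labels, predictions, leaf_index, {0: 0.5, 1: 0.5, 2: -0.1}))
# {0: 2.0, 1: 1.5, 2: -0.1}
```

The median minimizes the sum of absolute residuals, so leaf 0 gets 2.0. The mean would give about 4.33, pulled up by the single residual of 10.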
2022-04-26 21:41:55 +08:00
Jiaming Yuan
401d451569
Clear configuration cache. (#7826) 2022-04-21 19:09:54 +08:00
Jiaming Yuan
5815df4c46
Remove warning in 1.4. (#7815) 2022-04-20 01:19:09 +08:00
Jiaming Yuan
5dea21273a
Fix training continuation with categorical model. (#7810)
* Make sure the task is initialized before construction of tree updater.

This is a quick fix meant to be backported to 1.6. For a full fix, we should pass the model
param into the tree updater by reference instead.
2022-04-15 18:21:02 +08:00
Jiaming Yuan
6fa1afdffc
Avoid compiler warning about comparison. (#7768) 2022-03-31 08:52:14 +08:00
Jiaming Yuan
522636cb52
Bump version. (#7769) 2022-03-31 06:33:22 +08:00