1345 Commits

Author SHA1 Message Date
Jiaming Yuan
446d536c23
Fix loading DMatrix binary in distributed env. (#8149)
- Try to load DMatrix binary before trying to parse text input.
- Remove some unmaintained code.
2022-08-10 22:53:16 +08:00
Jiaming Yuan
bcc8679a05
Update CUDA docker image and NCCL. (#8139) 2022-08-07 16:32:41 +08:00
Jiaming Yuan
d87f69215e
Quantile DMatrix for CPU. (#8130)
- Add a new `QuantileDMatrix` that works for both CPU and GPU.
- Deprecate `DeviceQuantileDMatrix`.
2022-08-02 15:51:23 +08:00
Jiaming Yuan
2c70751d1e
Implement iterative DMatrix for CPU. (#8116) 2022-07-26 22:34:21 +08:00
Jiaming Yuan
7785d65c8a
Fix feature weights with multiple column sampling. (#8100) 2022-07-22 20:23:05 +08:00
Jiaming Yuan
4a4e5c7c18
Prepare gradient index for Quantile DMatrix. (#8103)
- Implement push batch with adapter batch.
- Implement `GetFvalue` for prediction.
2022-07-22 17:26:33 +08:00
Rory Mitchell
1be09848a7
Refactor split evaluation kernel (#8073) 2022-07-21 15:41:50 +02:00
Tim Gates
cb40bbdadd
docs: fix simple typo, cannonical -> canonical (#8099)
There is a small typo in src/common/partition_builder.h.

Should read `canonical` rather than `cannonical`.

Signed-off-by: Tim Gates <tim.gates@iress.com>
2022-07-20 21:04:50 +08:00
QuellaZhang
703261e78f
[MSVC][std:c++latest] Fix compiler error (#8093)
Co-authored-by: QuellaZhang <zhangyi2090@163.com>
2022-07-20 15:15:39 +08:00
Jiaming Yuan
5156be0f49
Limit max_depth to 30 for GPU. (#8098) 2022-07-20 12:28:49 +08:00
Rong Ou
7a6b711eb8
Remove unused updater basemaker (#8091) 2022-07-19 15:41:27 +08:00
Jiaming Yuan
4083440690
Small cleanups to various data types. (#8086)
- Use `bst_bin_t` in batch param constructor.
- Use `StringView` to avoid `std::string` when appropriate.
- Avoid using `MetaInfo` in quantile constructor to limit the scope of parameter.
2022-07-18 22:39:36 +08:00
Jiaming Yuan
8dd96013f1
Split up column matrix initialization. (#8060)
This PR splits the column matrix initialization into two steps: the first initializes
the storage, and the second performs the transpose. This lets us reuse the code
for Quantile DMatrix.
2022-07-14 10:34:47 +08:00
Jiaming Yuan
abaa593aa0
Fix compiler warnings. (#8059)
- Remove unused parameters.
- Avoid comparison of different signedness.
2022-07-14 05:29:56 +08:00
Rory Mitchell
0bdaca25ca
Use single precision in gain calculation, use pointers instead of span. (#8051) 2022-07-12 21:56:27 +02:00
Rory Mitchell
794cbaa60a
Fuse split evaluation kernels (#8026) 2022-07-05 10:24:31 +02:00
Jiaming Yuan
8746f9cddf
Rename IterativeDMatrix. (#8045) 2022-07-04 18:52:31 +08:00
Rory Mitchell
bc4f802b17
Batch UpdatePosition using cudaMemcpy (#7964) 2022-06-30 17:52:40 +02:00
kiwiwarmnfuzzy
2407381c3d
Force auc.cc to be statically linked (#8039) 2022-06-30 19:24:22 +08:00
Jiaming Yuan
f0c1b842bf
Implement sketching with adapter. (#8019) 2022-06-23 00:03:02 +08:00
Jiaming Yuan
142a208a90
Fix compiler warnings. (#8022)
- Remove/fix unused parameters
- Remove deprecated code in rabit.
- Update dmlc-core.
2022-06-22 21:29:10 +08:00
Rong Ou
e5ec546da5
[Breaking] Remove rabit support for custom reductions and grow_local_histmaker updater (#7992) 2022-06-21 15:08:23 +08:00
Jiaming Yuan
8f8bd8147a
Fix LTR with weighted Quantile DMatrix. (#7975)
* Better tests.
2022-06-09 01:33:41 +08:00
Jiaming Yuan
1a33b50a0d
Fix compiler warnings. (#7974)
- Remove unused parameters. There are still many warnings that are not yet
addressed. Currently, the warnings in dmlc-core dominate the error log.
- Remove `distributed` parameter from metric.
- Fix some warnings about signed comparison.
2022-06-06 22:56:25 +08:00
Jiaming Yuan
d48123d23b
Fix rmm build (#7973)
- Optionally switch to c++17
- Use rmm CMake target.
- Work around compiler errors.
- Fix GPUMetric inheritance.
- Run death tests even if it's built with RMM support.

Co-authored-by: jakirkham <jakirkham@gmail.com>
2022-06-06 20:18:32 +08:00
Jiaming Yuan
b90c6d25e8
Implement max_cat_threshold for CPU. (#7957) 2022-06-04 11:02:46 +08:00
Jiaming Yuan
13b15e07e8
Handle formatted JSON input. (#7953) 2022-06-01 16:20:58 +08:00
Rong Ou
80339c3427
Enable distributed GPU training over Rabit (#7930) 2022-05-31 04:09:45 +08:00
Gyeongjae Choi
cc6d57aa0d
Add minimal emscripten build support (#7954) 2022-05-30 14:11:40 +08:00
Jiaming Yuan
bde4f25794
Handle missing categorical value in CPU evaluator. (#7948) 2022-05-27 14:15:47 +08:00
Jiaming Yuan
18cbebaeb9
Unify the cat split storage for CPU. (#7937)
* Cleanup.

* Workaround.
2022-05-26 04:14:40 -07:00
Jiaming Yuan
606be9e663
Handle missing values in one hot splits. (#7934) 2022-05-24 20:48:41 +08:00
Jiaming Yuan
18a38f7ca0
Refactor for GHistIndex. (#7923)
* Pass sparse page as adapter, which prepares for quantile dmatrix.
* Remove old external memory code like `rbegin` and extra `Init` function.
* Simplify type dispatch.
2022-05-23 23:04:53 +08:00
Rory Mitchell
f6babc814c
Do not initialise data structures to maximum possible tree size. (#7919) 2022-05-19 19:45:53 +02:00
Jiaming Yuan
edf9a9608e
Fix type conversion warning. (#7916) 2022-05-18 20:14:14 +08:00
Jiaming Yuan
765097d514
Simplify inplace-predict. (#7910)
Pass the `X` as part of Proxy DMatrix instead of an independent `dmlc::any`.
2022-05-18 17:52:00 +08:00
Jiaming Yuan
19775ffe15
Use adapter to initialize column matrix. (#7912) 2022-05-18 16:15:12 +08:00
Rory Mitchell
71d3b2e036
Fuse gpu_hist all-reduce calls where possible (#7867) 2022-05-17 13:27:50 +02:00
Jiaming Yuan
4fcfd9c96e
Fix and cleanup for column matrix. (#7901)
* Fix missed type dispatching for dense columns with missing values.
* Code cleanup to reduce special cases.
* Reduce memory usage.
2022-05-16 21:11:50 +08:00
Philip Hyunsu Cho
4cd14aee5a
Rename misspelled config parameter for pseudo-Huber (#7904) 2022-05-15 06:38:33 -07:00
Jiaming Yuan
1baad8650c
Small cleanup to Column. (#7898)
* Define forward iterator to hide the internal state.
2022-05-15 12:39:10 +08:00
Jiaming Yuan
1b6538b4e5
[breaking] Drop single precision histogram (#7892)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2022-05-13 19:54:55 +08:00
Jiaming Yuan
11d65fcb21
Extract partial sum into an independent function. (#7889) 2022-05-13 14:30:35 +08:00
Jiaming Yuan
94ca52b7b7
Fix overflow in prediction size. (#7885) 2022-05-12 02:44:03 +08:00
Philip Hyunsu Cho
d2bc0f0f08
Allow loading old models from RDS (#7864) 2022-05-06 22:49:38 -07:00
Rory Mitchell
7ef54e39ec
Small refactor to categoricals (#7858) 2022-05-05 17:47:02 +02:00
Rong Ou
14ef38b834
Initial support for federated learning (#7831)
Federated learning plugin for xgboost:
* A gRPC server to aggregate MPI-style requests (allgather, allreduce, broadcast) from federated workers.
* A Rabit engine for the federated environment.
* Integration test to simulate federated learning.

Additional follow-ups are needed to address GPU support, better security, privacy, etc.
2022-05-05 21:49:22 +08:00
Jiaming Yuan
46e0bce212
Use maximum category in sketch. (#7853) 2022-05-05 19:56:49 +08:00
Jiaming Yuan
317d7be6ee
Always use partition based categorical splits. (#7857) 2022-05-03 22:30:32 +08:00
Rory Mitchell
90cce38236
Remove single_precision_histogram for gpu_hist (#7828) 2022-05-03 14:53:19 +02:00