178 Commits

Author SHA1 Message Date
Rong Ou
da6803b75b
Support column-wise data split with in-memory inputs (#9628)
---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2023-10-17 12:16:39 +08:00
Rong Ou
e164d51c43
Improve allgather functions (#9649) 2023-10-12 23:31:43 +08:00
Jiaming Yuan
680d53db43
Extract JSON utils. (#9645) 2023-10-10 07:15:14 +08:00
Jiaming Yuan
4d7a187cb0
Remove XGBoosterGetModelRaw. (#9617)
Deprecated in 1.6.
2023-09-29 02:29:33 +08:00
Jiaming Yuan
d95be1c38d
Small cleanup to jvm iter adapter. (#9616)
- Remove header dependency on c_api
- Remove remaining code for arrow.
2023-09-29 00:39:07 +08:00
Jiaming Yuan
60526100e3
Support arrow through pandas ext types. (#9612)
- Use pandas extension type for pyarrow support.
- Additional support for QDM.
- Additional support for inplace_predict.
2023-09-28 17:00:16 +08:00
Jiaming Yuan
c75a3bc0a9
[breaking] [jvm-packages] Remove rabit check point. (#9599)
- Add `numBoostedRound` to jvm packages
- Remove rabit checkpoint version.
- Change the starting version of training continuation in JVM [breaking].
- Redefine the checkpoint version policy in jvm package. [breaking]
- Rename the Python check point callback parameter. [breaking]
- Unifies the checkpoint policy between Python and JVM.
2023-09-26 18:06:34 +08:00
Jiaming Yuan
8c676c889d
Remove internal use of gpu_id. (#9568) 2023-09-20 23:29:51 +08:00
Jiaming Yuan
ddf2e68821
Use the new DeviceOrd in the linalg module. (#9527) 2023-08-29 13:37:29 +08:00
Jiaming Yuan
be6a552956
[R] Support multi-class custom objective. (#9526) 2023-08-29 08:27:13 +08:00
Jiaming Yuan
972730cde0
Use matrix for gradient. (#9508)
- Use the `linalg::Matrix` for storing gradients.
- New API for the custom objective.
- Custom objective for multi-class/multi-target is now required to return the correct shape.
- Custom objective for Python can accept arrays with any strides. (row-major, column-major)
2023-08-24 05:29:52 +08:00
Jiaming Yuan
044fea1281
Drop support for loading remote files. (#9504) 2023-08-21 23:34:05 +08:00
Jiaming Yuan
16eb41936d
Handle the new device parameter in dask and demos. (#9386)
* Handle the new `device` parameter in dask and demos.

- Check no ordinal is specified in the dask interface.
- Update demos.
- Update dask doc.
- Update the condition for QDM.
2023-07-15 19:11:20 +08:00
Jiaming Yuan
04aff3af8e
Define the new device parameter. (#9362) 2023-07-13 19:30:25 +08:00
Jiaming Yuan
20c52f07d2
Support exporting cut values (#9356) 2023-07-08 15:32:41 +08:00
edumugi
c3124813e8
Support numpy vertical split (#9365) 2023-07-08 13:18:12 +08:00
Jiaming Yuan
39390cc2ee
[breaking] Remove the predictor param, allow fallback to prediction using DMatrix. (#9129)
- A `DeviceOrd` struct is implemented to indicate the device. It will eventually replace the `gpu_id` parameter.
- The `predictor` parameter is removed.
- Fallback to `DMatrix` when `inplace_predict` is not available.
- The heuristic for choosing a predictor is only used during training.
2023-07-03 19:23:54 +08:00
Jiaming Yuan
08ce495b5d
Use Booster context in DMatrix. (#8896)
- Pass context from booster to DMatrix.
- Use context instead of integer for `n_threads`.
- Check the consistency configuration for `max_bin`.
- Test for all combinations of initialization options.
2023-04-28 21:47:14 +08:00
Jiaming Yuan
151882dd26
Initial support for multi-target tree. (#8616)
* Implement multi-target for hist.

- Add new hist tree builder.
- Move data fetchers for tests.
- Dispatch function calls in gbm base on the tree type.
2023-03-22 23:49:56 +08:00
Jiaming Yuan
2aa838c75e
Define multi-strategy parameter. (#8890) 2023-03-11 02:58:01 +08:00
Mauro Leggieri
90c0633a28
Fixes compilation errors on MSVC x86 targets (#8823) 2023-02-26 03:20:28 +08:00
Jiaming Yuan
d11a0044cf
Generalize prediction cache. (#8783)
* Extract most of the functionality into `DMatrixCache`.
* Move API entry to independent file to reduce dependency on `predictor.h` file.
* Add test.

---------

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2023-02-13 12:36:43 +08:00
Jiaming Yuan
c1786849e3
Use array interface for CSC matrix. (#8672)
* Use array interface for CSC matrix.

Use array interface for CSC matrix and align the interface with CSR and dense.

- Fix nthread issue in the R package DMatrix.
- Unify the behavior of handling `missing` with other inputs.
- Unify the behavior of handling `missing` around R, Python, Java, and Scala DMatrix.
- Expose `num_non_missing` to the JVM interface.
- Deprecate old CSR and CSC constructors.
2023-02-05 01:59:46 +08:00
Jiaming Yuan
3760cede0f
Consistent use of context to specify number of threads. (#8733)
- Use context in all tests.
- Use context in R.
- Use context in C API DMatrix initialization. (0 threads is used as dft).
2023-01-30 15:25:31 +08:00
Jiaming Yuan
43152657d4
Extract JSON type check. (#8677)
- Reuse it in `GetMissing`.
- Add test.
2023-01-17 03:11:07 +08:00
Rong Ou
3ceeb8c61c
Add data split mode to DMatrix MetaInfo (#8568) 2022-12-25 20:37:37 +08:00
Jiaming Yuan
5f1a6fca0d
[R] Use new interface for creating DMatrix from CSR. (#8455)
* [R] Use new interface for creating DMatrix from CSR.

- CSC is still using the old API.

The old API is not aware of `nthread` parameter, which makes DMatrix to use all available
thread during construction and during transformation lie `SparsePage` -> `CSCPage`.
2022-11-23 21:36:43 +08:00
Rong Ou
8e76f5f595
Use DataSplitMode to configure data loading (#8434)
* Use `DataSplitMode` to configure data loading
2022-11-08 16:21:50 +08:00
Rong Ou
8f3dee58be
Speed up tests with federated learning enabled (#8350)
* Speed up tests with federated learning enabled

* Re-enable timeouts

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-17 15:17:04 -07:00
Jiaming Yuan
3ef1703553
Allow using string view to find JSON value. (#8332)
- Allow comparison between string and string view.
- Fix compiler warnings.
2022-10-13 17:10:13 +08:00
Rong Ou
39afdac3be
Better error message when world size and rank are set as strings (#8316)
Co-authored-by: jiamingy <jm.yuan@outlook.com>
2022-10-12 15:53:25 +08:00
Rong Ou
8d4038da57
Don't split input data in federated mode (#8279)
Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 18:19:28 -08:00
Rong Ou
668b8a0ea4
[Breaking] Switch from rabit to the collective communicator (#8257)
* Switch from rabit to the collective communicator

* fix size_t specialization

* really fix size_t

* try again

* add include

* more include

* fix lint errors

* remove rabit includes

* fix pylint error

* return dict from communicator context

* fix communicator shutdown

* fix dask test

* reset communicator mocklist

* fix distributed tests

* do not save device communicator

* fix jvm gpu tests

* add python test for federated communicator

* Update gputreeshap submodule

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 14:39:01 -08:00
Jiaming Yuan
97c3a80a34
Add C document to sphinx, fix arrow. (#8300)
- Group C API.
- Add C API sphinx doc.
- Consistent use of `OptionalArg` and the parameter name `config`.
- Remove call to deprecated functions in demo.
- Fix some formatting errors.
- Add links to c examples in the document (only visible with doxygen pages)
- Fix arrow.
2022-10-05 09:52:15 +08:00
Jiaming Yuan
55cf24cc32
Obtain CSR matrix from DMatrix. (#8269) 2022-09-29 20:41:43 +08:00
Jiaming Yuan
3fd331f8f2
Add checks to C pointer arguments. (#8254) 2022-09-22 19:02:22 +08:00
Rong Ou
a2686543a9
Common interface for collective communication (#8057)
* implement broadcast for federated communicator

* implement allreduce

* add communicator factory

* add device adapter

* add device communicator to factory

* add rabit communicator

* add rabit communicator to the factory

* add nccl device communicator

* add synchronize to device communicator

* add back print and getprocessorname

* add python wrapper and c api

* clean up types

* fix non-gpu build

* try to fix ci

* fix std::size_t

* portable string compare ignore case

* c style size_t

* fix lint errors

* cross platform setenv

* fix memory leak

* fix lint errors

* address review feedback

* add python test for rabit communicator

* fix failing gtest

* use json to configure communicators

* fix lint error

* get rid of factories

* fix cpu build

* fix include

* fix python import

* don't export collective.py yet

* skip collective communicator pytest on windows

* add review feedback

* update documentation

* remove mpi communicator type

* fix tests

* shutdown the communicator separately

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2022-09-12 15:21:12 -07:00
Rong Ou
ad3bc0edee
Allow insecure gRPC connections for federated learning (#8181)
* Allow insecure gRPC connections for federated learning

* format
2022-08-19 12:16:14 +08:00
Jiaming Yuan
446d536c23
Fix loading DMatrix binary in distributed env. (#8149)
- Try to load DMatrix binary before trying to parse text input.
- Remove some unmaintained code.
2022-08-10 22:53:16 +08:00
Jiaming Yuan
d87f69215e
Quantile DMatrix for CPU. (#8130)
- Add a new `QuantileDMatrix` that works for both CPU and GPU.
- Deprecate `DeviceQuantileDMatrix`.
2022-08-02 15:51:23 +08:00
Jiaming Yuan
2c70751d1e
Implement iterative DMatrix for CPU. (#8116) 2022-07-26 22:34:21 +08:00
Jiaming Yuan
142a208a90
Fix compiler warnings. (#8022)
- Remove/fix unused parameters
- Remove deprecated code in rabit.
- Update dmlc-core.
2022-06-22 21:29:10 +08:00
Jiaming Yuan
13b15e07e8
Handle formatted JSON input. (#7953) 2022-06-01 16:20:58 +08:00
Jiaming Yuan
765097d514
Simplify inplace-predict. (#7910)
Pass the `X` as part of Proxy DMatrix instead of an independent `dmlc::any`.
2022-05-18 17:52:00 +08:00
Jiaming Yuan
94ca52b7b7
Fix overflow in prediction size. (#7885) 2022-05-12 02:44:03 +08:00
Rong Ou
14ef38b834
Initial support for federated learning (#7831)
Federated learning plugin for xgboost:
* A gRPC server to aggregate MPI-style requests (allgather, allreduce, broadcast) from federated workers.
* A Rabit engine for the federated environment.
* Integration test to simulate federated learning.

Additional followups are needed to address GPU support, better security, and privacy, etc.
2022-05-05 21:49:22 +08:00
Jiaming Yuan
3c9b04460a
Move num_parallel_tree to model parameter. (#7751)
The size of forest should be a property of model itself instead of a training
hyper-parameter.
2022-03-29 02:32:42 +08:00
Jiaming Yuan
64575591d8
Use context in SetInfo. (#7687)
* Use the name `Context`.
* Pass a context object into `SetInfo`.
* Add context to proxy matrix.
* Add context to iterative DMatrix.

This is to remove the use of the default number of threads during `SetInfo` as a follow-up on
removing the global omp variable while preparing for CUDA stream semantic.  Currently, XGBoost
uses the legacy CUDA stream, we will gradually remove them in the future in favor of non-blocking streams.
2022-03-24 22:16:26 +08:00
Xiaochang Wu
613ec36c5a
Support building SimpleDMatrix from Arrow data format (#7512)
* Integrate with Arrow C data API.
* Support Arrow dataset.
* Support Arrow table.

Co-authored-by: Xiaochang Wu <xiaochang.wu@intel.com>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
Co-authored-by: Zhang Zhang <zhang.zhang@intel.com>
2022-03-15 13:25:19 +08:00
Jiaming Yuan
dac9eb13bd
Implement new save_raw in Python. (#7572)
* Expose the new C API function to Python.
* Remove old document and helper script.
* Small optimization to the `save_raw` and Json ctors.
2022-01-19 02:27:51 +08:00