154 Commits

Author SHA1 Message Date
Jiaming Yuan
292bb677e5
[EM] Support mmap backed ellpack. (#10602)
- Support resource view in ellpack.
- Define the CUDA version of MMAP resource.
- Define the CUDA version of malloc resource.
- Refactor cuda runtime API wrappers, and add memory access related wrappers.
- gather windows macros into a single header.
2024-07-18 08:20:21 +08:00
Jiaming Yuan
a5a58102e5
Revamp the rabit implementation. (#10112)
This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features:
- Federated learning for both CPU and GPU.
- NCCL.
- More data types.
- A unified interface for all the underlying implementations.
- Improved timeout handling for both tracker and workers.
- Exhausted tests with metrics (fixed a couple of bugs along the way).
- A reusable tracker for Python and JVM packages.
2024-05-20 11:56:23 +08:00
Jiaming Yuan
53fc17578f
Use std::uint64_t for row index. (#10120)
- Use std::uint64_t instead of size_t to avoid implementation-defined type.
- Rename to bst_idx_t, to account for other types of indexing.
- Small cleanup to the base header.
2024-03-15 18:43:49 +08:00
Louis Desreumaux
edf501d227
Implement contribution prediction with QuantileDMatrix (#10043)
---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2024-02-19 21:03:29 +08:00
Philip Hyunsu Cho
4dfbe2a893
[CI] Test building for 32-bit arch (#10021)
* [CI] Test building for 32-bit arch

* Update CMakeLists.txt

* Fix yaml

* Use Debian container

* Remove -Werror for 32-bit

* Revert "Remove -Werror for 32-bit"

This reverts commit c652bc6a037361bcceaf56fb01863210b462793d.

* Don't error for overloaded-virtual warning

* Ignore some warnings from dmlc-core

* Fix compiler warnings

* Fix formatting

* Apply suggestions from code review

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>

* Add more cast

---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2024-01-31 13:20:51 -08:00
Jiaming Yuan
faf0f2df10
Support dataframe data format in native XGBoost. (#9828)
- Implement a columnar adapter.
- Refactor Python pandas handling code to avoid converting into a single numpy array.
- Add support in R for transforming columns.
- Support R data.frame and factor type.
2023-12-12 09:56:31 +08:00
Jiaming Yuan
06bdc15e9b
[coll] Pass context to various functions. (#9772)
* [coll] Pass context to various functions.

In the future, the `Context` object would be required for collective operations, this PR
passes the context object to some required functions to prepare for swapping out the
implementation.
2023-11-08 09:54:05 +08:00
Jiaming Yuan
8c676c889d
Remove internal use of gpu_id. (#9568) 2023-09-20 23:29:51 +08:00
Rong Ou
d8c3cc92ae
More support for column split in gpu predictor (#9562) 2023-09-14 08:13:13 +08:00
Jiaming Yuan
ddf2e68821
Use the new DeviceOrd in the linalg module. (#9527) 2023-08-29 13:37:29 +08:00
Jiaming Yuan
f05294a6f2
Fix clang warnings. (#9447)
- static function in header. (which is marked as unused due to translation unit
visibility).
- Implicit copy operator is deprecated.
- Unused lambda capture.
- Moving a temporary variable prevents copy elision.
2023-08-09 15:34:45 +08:00
Jiaming Yuan
645037e376
Improve test coverage with predictor configuration. (#9354)
* Improve test coverage with predictor configuration.

- Test with ext memory.
- Test with QDM.
- Test with dart.
2023-07-05 15:17:22 +08:00
Jiaming Yuan
39390cc2ee
[breaking] Remove the predictor param, allow fallback to prediction using DMatrix. (#9129)
- A `DeviceOrd` struct is implemented to indicate the device. It will eventually replace the `gpu_id` parameter.
- The `predictor` parameter is removed.
- Fallback to `DMatrix` when `inplace_predict` is not available.
- The heuristic for choosing a predictor is only used during training.
2023-07-03 19:23:54 +08:00
Rong Ou
3a0f787703
Support column split in GPU predictor (#9343) 2023-07-03 04:05:34 +08:00
Jiaming Yuan
54da4b3185
Cleanup to prepare for using mmap pointer in external memory. (#9317)
- Update SparseDMatrix comment.
- Use a pointer in the bitfield. We will replace the `std::vector<bool>` in `ColumnMatrix` with bitfield.
- Clean up the page source. The timer is removed as it's inaccurate once we swap the mmap pointer into the page.
2023-06-22 06:43:11 +08:00
Rong Ou
ff122d61ff
More tests for cpu predictor with column split (#9270) 2023-06-08 22:47:19 +08:00
ZHAOKAI WANG
84d3fcb7ea
Fix cpu_predictor categorical feature disaptch (#9256) 2023-06-08 01:24:04 +08:00
Rong Ou
962a20693f
More support for column split in cpu predictor (#9244)
- Added column split support to `PredictInstance` and `PredictLeaf`.
- Refactoring of tests.
2023-06-05 08:05:38 +08:00
Jiaming Yuan
08ce495b5d
Use Booster context in DMatrix. (#8896)
- Pass context from booster to DMatrix.
- Use context instead of integer for `n_threads`.
- Check the consistency configuration for `max_bin`.
- Test for all combinations of initialization options.
2023-04-28 21:47:14 +08:00
Jiaming Yuan
0e470ef606
Optimize prediction with QuantileDMatrix. (#9096)
- Reduce overhead in `FVecDrop`.
- Reduce overhead caused by `HostVector()` calls.
2023-04-28 00:51:41 +08:00
Rong Ou
ff26cd3212
More tests for column split and vertical federated learning (#8985)
Added some more tests for the learner and fit_stump, for both column-wise distributed learning and vertical federated learning.

Also moved the `IsRowSplit` and `IsColumnSplit` methods from the `DMatrix` to the `MetaInfo` since in some places we only have access to the `MetaInfo`. Added a new convenience method `IsVerticalFederatedLearning`.

Some refactoring of the testing fixtures.
2023-03-28 16:40:26 +08:00
Jiaming Yuan
acc110c251
[MT-TREE] Support prediction cache and model slicing. (#8968)
- Fix prediction range.
- Support prediction cache in mt-hist.
- Support model slicing.
- Make the booster a Python iterable by defining `__iter__`.
- Cleanup removed/deprecated parameters.
- A new field in the output model `iteration_indptr` for pointing to the ranges of trees for each iteration.
2023-03-27 23:10:54 +08:00
Jiaming Yuan
151882dd26
Initial support for multi-target tree. (#8616)
* Implement multi-target for hist.

- Add new hist tree builder.
- Move data fetchers for tests.
- Dispatch function calls in gbm base on the tree type.
2023-03-22 23:49:56 +08:00
Jiaming Yuan
c400fa1e8d
Predictor for vector leaf. (#8898) 2023-03-14 19:07:10 +08:00
Jiaming Yuan
9bade7203a
Remove public access to tree model param. (#8902)
* Make tree model param a private member.
* Number of features and targets are immutable after construction.

This is to reduce the number of places where we can run configuration.
2023-03-13 20:55:10 +08:00
Jiaming Yuan
36a7396658
Replace dmlc any with std any. (#8892) 2023-03-11 06:11:04 +08:00
Jiaming Yuan
2aa838c75e
Define multi-strategy parameter. (#8890) 2023-03-11 02:58:01 +08:00
Jiaming Yuan
5feee8d4a9
Define core multi-target regression tree structure. (#8884)
- Define a new tree struct embedded in the `RegTree`.
- Provide dispatching functions in `RegTree`.
- Fix some c++-17 warnings about the use of nodiscard (currently we disable the warning on
  the CI).
- Use uint32_t instead of size_t for `bst_target_t` as it has a defined size and can be used
  as part of dmlc parameter.
- Hide the `Segment` struct inside the categorical split matrix.
2023-03-09 19:03:06 +08:00
Mauro Leggieri
90c0633a28
Fixes compilation errors on MSVC x86 targets (#8823) 2023-02-26 03:20:28 +08:00
Rong Ou
a65ad0bd9c
Support column split in histogram builder (#8811) 2023-02-17 22:37:01 +08:00
Jiaming Yuan
594371e35b
Fix CPP lint. (#8807) 2023-02-15 20:16:35 +08:00
Jiaming Yuan
d11a0044cf
Generalize prediction cache. (#8783)
* Extract most of the functionality into `DMatrixCache`.
* Move API entry to independent file to reduce dependency on `predictor.h` file.
* Add test.

---------

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2023-02-13 12:36:43 +08:00
Jiaming Yuan
34eee56256
Fix compiler warnings. (#8703)
Fix warnings about signed/unsigned comparisons.
2023-01-21 15:16:23 +08:00
Rong Ou
78396f8a6e
Initial support for column-split cpu predictor (#8676) 2023-01-18 06:33:13 +08:00
Jiaming Yuan
beefd28471
Split up SHAP from RegTree. (#8612)
* Split up SHAP from `RegTree`.

Simplify the tree interface.
2023-01-04 18:17:47 +08:00
Jiaming Yuan
43a647a4dd
Fix inference with categorical feature. (#8591) 2022-12-15 17:57:26 +08:00
Jiaming Yuan
3e26107a9c
Rename and extract Context. (#8528)
* Rename `GenericParameter` to `Context`.
* Rename header file to reflect the change.
* Rename all references.
2022-12-07 04:58:54 +08:00
Jiaming Yuan
fffb1fca52
Calculate base_score based on input labels for mae. (#8107)
Fit an intercept as base score for abs loss.
2022-09-20 20:53:54 +08:00
Jiaming Yuan
2c70751d1e
Implement iterative DMatrix for CPU. (#8116) 2022-07-26 22:34:21 +08:00
Jiaming Yuan
142a208a90
Fix compiler warnings. (#8022)
- Remove/fix unused parameters
- Remove deprecated code in rabit.
- Update dmlc-core.
2022-06-22 21:29:10 +08:00
Jiaming Yuan
765097d514
Simplify inplace-predict. (#7910)
Pass the `X` as part of Proxy DMatrix instead of an independent `dmlc::any`.
2022-05-18 17:52:00 +08:00
Jiaming Yuan
b52c4e13b0
[dask] Fix empty partition with pandas input. (#7644)
Empty partition is different from empty dataset.  For the former case, each worker has
non-empty dask collections, but each collection might contain empty partition.
2022-02-14 19:35:51 +08:00
Jiaming Yuan
2775c2a1ab
Prepare external memory support for hist. (#7638)
This PR prepares the GHistIndexMatrix to host the column matrix which is used by the hist tree method by accepting sparse_threshold parameter.

Some cleanups are made to ensure the correct batch param is being passed into DMatrix along with some additional tests for correctness of SimpleDMatrix.
2022-02-10 16:58:02 +08:00
Jiaming Yuan
e5e47c3c99
Clarify the behavior of invalid categorical value handling. (#7529) 2022-01-13 16:11:52 +08:00
Jiaming Yuan
68cdbc9c16
Remove omp_get_max_threads in CPU predictor. (#7519)
This is part of the on going effort to remove the dependency on global omp variables.
2022-01-04 22:12:15 +08:00
Jiaming Yuan
557ffc4bf5
Reduce base margin to 2 dim for now. (#7455) 2021-11-27 00:46:13 +08:00
Jiaming Yuan
d33854af1b
[Breaking] Accept multi-dim meta info. (#7405)
This PR changes base_margin into a 3-dim array, with one of them being reserved for multi-target classification. Also, a breaking change is made for binary serialization due to extra dimension along with a fix for saving the feature weights. Lastly, it unifies the prediction initialization between CPU and GPU. After this PR, the meta info setter in Python will be based on array interface.
2021-11-18 23:02:54 +08:00
Jiaming Yuan
a13321148a
Support multi-class with base margin. (#7381)
This is already partially supported but never properly tested. So the only possible way to use it is calling `numpy.ndarray.flatten` with `base_margin` before passing it into XGBoost. This PR adds proper support
for most of the data types along with tests.
2021-11-02 13:38:00 +08:00
Jiaming Yuan
d8a549e6ac
Avoid thread block with sparse data. (#7255) 2021-09-25 13:11:34 +08:00
Robert Maynard
1a75f43304
Allow compilation with nvcc 11.4 (#7131)
* Use type aliases for discard iterators

* update to include host_vector as thrust 1.12 doesn't bring it in as a side-effect

* cub::DispatchRadixSort requires signed offset types
2021-07-27 20:05:33 +08:00