Commit Graph

76 Commits

Author SHA1 Message Date
Jiaming Yuan
06bdc15e9b [coll] Pass context to various functions. (#9772)
* [coll] Pass context to various functions.

In the future, the `Context` object would be required for collective operations, this PR
passes the context object to some required functions to prepare for swapping out the
implementation.
2023-11-08 09:54:05 +08:00
Jiaming Yuan
ddf2e68821 Use the new DeviceOrd in the linalg module. (#9527) 2023-08-29 13:37:29 +08:00
Jiaming Yuan
645037e376 Improve test coverage with predictor configuration. (#9354)
* Improve test coverage with predictor configuration.

- Test with ext memory.
- Test with QDM.
- Test with dart.
2023-07-05 15:17:22 +08:00
Jiaming Yuan
39390cc2ee [breaking] Remove the predictor param, allow fallback to prediction using DMatrix. (#9129)
- A `DeviceOrd` struct is implemented to indicate the device. It will eventually replace the `gpu_id` parameter.
- The `predictor` parameter is removed.
- Fallback to `DMatrix` when `inplace_predict` is not available.
- The heuristic for choosing a predictor is only used during training.
2023-07-03 19:23:54 +08:00
Rong Ou
3a0f787703 Support column split in GPU predictor (#9343) 2023-07-03 04:05:34 +08:00
Rong Ou
ff122d61ff More tests for cpu predictor with column split (#9270) 2023-06-08 22:47:19 +08:00
ZHAOKAI WANG
84d3fcb7ea Fix cpu_predictor categorical feature disaptch (#9256) 2023-06-08 01:24:04 +08:00
Rong Ou
962a20693f More support for column split in cpu predictor (#9244)
- Added column split support to `PredictInstance` and `PredictLeaf`.
- Refactoring of tests.
2023-06-05 08:05:38 +08:00
Jiaming Yuan
08ce495b5d Use Booster context in DMatrix. (#8896)
- Pass context from booster to DMatrix.
- Use context instead of integer for `n_threads`.
- Check the consistency configuration for `max_bin`.
- Test for all combinations of initialization options.
2023-04-28 21:47:14 +08:00
Jiaming Yuan
0e470ef606 Optimize prediction with QuantileDMatrix. (#9096)
- Reduce overhead in `FVecDrop`.
- Reduce overhead caused by `HostVector()` calls.
2023-04-28 00:51:41 +08:00
Rong Ou
ff26cd3212 More tests for column split and vertical federated learning (#8985)
Added some more tests for the learner and fit_stump, for both column-wise distributed learning and vertical federated learning.

Also moved the `IsRowSplit` and `IsColumnSplit` methods from the `DMatrix` to the `MetaInfo` since in some places we only have access to the `MetaInfo`. Added a new convenience method `IsVerticalFederatedLearning`.

Some refactoring of the testing fixtures.
2023-03-28 16:40:26 +08:00
Jiaming Yuan
acc110c251 [MT-TREE] Support prediction cache and model slicing. (#8968)
- Fix prediction range.
- Support prediction cache in mt-hist.
- Support model slicing.
- Make the booster a Python iterable by defining `__iter__`.
- Cleanup removed/deprecated parameters.
- A new field in the output model `iteration_indptr` for pointing to the ranges of trees for each iteration.
2023-03-27 23:10:54 +08:00
Jiaming Yuan
151882dd26 Initial support for multi-target tree. (#8616)
* Implement multi-target for hist.

- Add new hist tree builder.
- Move data fetchers for tests.
- Dispatch function calls in gbm base on the tree type.
2023-03-22 23:49:56 +08:00
Jiaming Yuan
c400fa1e8d Predictor for vector leaf. (#8898) 2023-03-14 19:07:10 +08:00
Jiaming Yuan
9bade7203a Remove public access to tree model param. (#8902)
* Make tree model param a private member.
* Number of features and targets are immutable after construction.

This is to reduce the number of places where we can run configuration.
2023-03-13 20:55:10 +08:00
Jiaming Yuan
36a7396658 Replace dmlc any with std any. (#8892) 2023-03-11 06:11:04 +08:00
Mauro Leggieri
90c0633a28 Fixes compilation errors on MSVC x86 targets (#8823) 2023-02-26 03:20:28 +08:00
Rong Ou
a65ad0bd9c Support column split in histogram builder (#8811) 2023-02-17 22:37:01 +08:00
Jiaming Yuan
34eee56256 Fix compiler warnings. (#8703)
Fix warnings about signed/unsigned comparisons.
2023-01-21 15:16:23 +08:00
Rong Ou
78396f8a6e Initial support for column-split cpu predictor (#8676) 2023-01-18 06:33:13 +08:00
Jiaming Yuan
beefd28471 Split up SHAP from RegTree. (#8612)
* Split up SHAP from `RegTree`.

Simplify the tree interface.
2023-01-04 18:17:47 +08:00
Jiaming Yuan
3e26107a9c Rename and extract Context. (#8528)
* Rename `GenericParameter` to `Context`.
* Rename header file to reflect the change.
* Rename all references.
2022-12-07 04:58:54 +08:00
Jiaming Yuan
fffb1fca52 Calculate base_score based on input labels for mae. (#8107)
Fit an intercept as base score for abs loss.
2022-09-20 20:53:54 +08:00
Jiaming Yuan
2c70751d1e Implement iterative DMatrix for CPU. (#8116) 2022-07-26 22:34:21 +08:00
Jiaming Yuan
142a208a90 Fix compiler warnings. (#8022)
- Remove/fix unused parameters
- Remove deprecated code in rabit.
- Update dmlc-core.
2022-06-22 21:29:10 +08:00
Jiaming Yuan
765097d514 Simplify inplace-predict. (#7910)
Pass the `X` as part of Proxy DMatrix instead of an independent `dmlc::any`.
2022-05-18 17:52:00 +08:00
Jiaming Yuan
68cdbc9c16 Remove omp_get_max_threads in CPU predictor. (#7519)
This is part of the on going effort to remove the dependency on global omp variables.
2022-01-04 22:12:15 +08:00
Jiaming Yuan
d33854af1b [Breaking] Accept multi-dim meta info. (#7405)
This PR changes base_margin into a 3-dim array, with one of them being reserved for multi-target classification. Also, a breaking change is made for binary serialization due to extra dimension along with a fix for saving the feature weights. Lastly, it unifies the prediction initialization between CPU and GPU. After this PR, the meta info setter in Python will be based on array interface.
2021-11-18 23:02:54 +08:00
Jiaming Yuan
a13321148a Support multi-class with base margin. (#7381)
This is already partially supported but never properly tested. So the only possible way to use it is calling `numpy.ndarray.flatten` with `base_margin` before passing it into XGBoost. This PR adds proper support
for most of the data types along with tests.
2021-11-02 13:38:00 +08:00
Jiaming Yuan
d8a549e6ac Avoid thread block with sparse data. (#7255) 2021-09-25 13:11:34 +08:00
Jiaming Yuan
bbfffb444d Fix race condition in CPU shap. (#7050) 2021-06-21 10:03:15 +08:00
Jiaming Yuan
29f8fd6fee Support categorical split in tree model dump. (#7036) 2021-06-18 16:46:20 +08:00
Jiaming Yuan
f79cc4a7a4 Implement categorical prediction for CPU and GPU predict leaf. (#7001)
* Categorical prediction with CPU predictor and GPU predict leaf.

* Implement categorical prediction for CPU prediction.
* Implement categorical prediction for GPU predict leaf.
* Refactor the prediction functions to have a unified get next node function.

Co-authored-by: Shvets Kirill <kirill.shvets@intel.com>
2021-06-11 10:11:45 +08:00
Louis Desreumaux
9b530e5697 Improve OpenMP exception handling (#6680) 2021-02-25 13:56:16 +08:00
ShvetsKS
9f15b9e322 Optimize CPU prediction (#6696)
Co-authored-by: Shvets Kirill <kirill.shvets@intel.com>
2021-02-16 14:41:22 +08:00
Jiaming Yuan
e8c5c53e2f Use Predictor for dart. (#6693)
* Use normal predictor for dart booster.
* Implement `inplace_predict` for dart.
* Enable `dart` for dask interface now that it's thread-safe.
* categorical data should be working out of box for dart now.

The implementation is not very efficient as it has to pull back the data and
apply weight for each tree, but still a significant improvement over previous
implementation as now we no longer binary search for each sample.

* Fix output prediction shape on dataframe.
2021-02-09 23:30:19 +08:00
Jiaming Yuan
4656b09d5d [breaking] Add prediction fucntion for DMatrix and use inplace predict for dask. (#6668)
* Add a new API function for predicting on `DMatrix`.  This function aligns
with rest of the `XGBoosterPredictFrom*` functions on semantic of function
arguments.
* Purge `ntree_limit` from libxgboost, use iteration instead.
* [dask] Use `inplace_predict` by default for dask sklearn models.
* [dask] Run prediction shape inference on worker instead of client.

The breaking change is in the Python sklearn `apply` function, I made it to be
consistent with other prediction functions where `best_iteration` is used by
default.
2021-02-08 18:26:32 +08:00
Jiaming Yuan
411592a347 Enhance inplace prediction. (#6653)
* Accept array interface for csr and array.
* Accept an optional proxy dmatrix for metainfo.

This constructs an explicit `_ProxyDMatrix` type in Python.

* Remove unused doc.
* Add strict output.
2021-02-02 11:41:46 +08:00
Jiaming Yuan
c3c8e66fc9 Make prediction functions thread safe. (#6648) 2021-01-28 23:29:43 +08:00
Jiaming Yuan
f2f7dd87b8 Use view for SparsePage exclusively. (#6590) 2021-01-11 18:04:55 +08:00
Jiaming Yuan
8a17610666 Implement GPU predict leaf. (#6187) 2020-11-11 17:33:47 +08:00
ShvetsKS
d411f98d26 simple fix for static shedule in predict (#6357)
Co-authored-by: ShvetsKS <kirill.shvets@intel.com>
2020-11-09 17:01:30 +08:00
ShvetsKS
a4ce0eae43 CPU predict performance improvement (#6127)
Co-authored-by: ShvetsKS <kirill.shvets@intel.com>
2020-10-08 15:50:21 +03:00
Rory Mitchell
dda9e1e487 Update GPUTreeshap (#6163)
* Reduce shap test duration

* Test interoperability with shap package

* Add feature interactions

* Update GPUTreeShap
2020-09-28 09:43:47 +13:00
Jiaming Yuan
ee70a2380b Unify CPU hist sketching (#5880) 2020-08-12 01:33:06 +08:00
Philip Hyunsu Cho
1d22a9be1c Revert "Reorder includes. (#5749)" (#5771)
This reverts commit d3a0efbf16.
2020-06-09 10:29:28 -07:00
Jiaming Yuan
cacff9232a Remove column major specialization. (#5755)
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-06-05 16:19:14 +08:00
Jiaming Yuan
d3a0efbf16 Reorder includes. (#5749)
* Reorder includes.

* R.
2020-06-03 17:30:47 +12:00
Jiaming Yuan
0012f2ef93 Upgrade clang-tidy on CI. (#5469)
* Correct all clang-tidy errors.
* Upgrade clang-tidy to 10 on CI.

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-04-05 04:42:29 +08:00
Jiaming Yuan
6601a641d7 Thread safe, inplace prediction. (#5389)
Normal prediction with DMatrix is now thread safe with locks.  Added inplace prediction is lock free thread safe.

When data is on device (cupy, cudf), the returned data is also on device.

* Implementation for numpy, csr, cudf and cupy.

* Implementation for dask.

* Remove sync in simple dmatrix.
2020-03-30 15:35:28 +08:00