Commit Graph

124 Commits

Author SHA1 Message Date
Jiaming Yuan
594371e35b Fix CPP lint. (#8807) 2023-02-15 20:16:35 +08:00
Jiaming Yuan
d11a0044cf Generalize prediction cache. (#8783)
* Extract most of the functionality into `DMatrixCache`.
* Move API entry to independent file to reduce dependency on `predictor.h` file.
* Add test.

---------

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2023-02-13 12:36:43 +08:00
Jiaming Yuan
34eee56256 Fix compiler warnings. (#8703)
Fix warnings about signed/unsigned comparisons.
2023-01-21 15:16:23 +08:00
Rong Ou
78396f8a6e Initial support for column-split cpu predictor (#8676) 2023-01-18 06:33:13 +08:00
Jiaming Yuan
beefd28471 Split up SHAP from RegTree. (#8612)
* Split up SHAP from `RegTree`.

Simplify the tree interface.
2023-01-04 18:17:47 +08:00
Jiaming Yuan
43a647a4dd Fix inference with categorical feature. (#8591) 2022-12-15 17:57:26 +08:00
Jiaming Yuan
3e26107a9c Rename and extract Context. (#8528)
* Rename `GenericParameter` to `Context`.
* Rename header file to reflect the change.
* Rename all references.
2022-12-07 04:58:54 +08:00
Jiaming Yuan
fffb1fca52 Calculate base_score based on input labels for mae. (#8107)
Fit an intercept as base score for abs loss.
2022-09-20 20:53:54 +08:00
Jiaming Yuan
2c70751d1e Implement iterative DMatrix for CPU. (#8116) 2022-07-26 22:34:21 +08:00
Jiaming Yuan
142a208a90 Fix compiler warnings. (#8022)
- Remove/fix unused parameters
- Remove deprecated code in rabit.
- Update dmlc-core.
2022-06-22 21:29:10 +08:00
Jiaming Yuan
765097d514 Simplify inplace-predict. (#7910)
Pass the `X` as part of Proxy DMatrix instead of an independent `dmlc::any`.
2022-05-18 17:52:00 +08:00
Jiaming Yuan
b52c4e13b0 [dask] Fix empty partition with pandas input. (#7644)
Empty partition is different from empty dataset.  For the former case, each worker has
non-empty dask collections, but each collection might contain empty partition.
2022-02-14 19:35:51 +08:00
Jiaming Yuan
2775c2a1ab Prepare external memory support for hist. (#7638)
This PR prepares the GHistIndexMatrix to host the column matrix which is used by the hist tree method by accepting sparse_threshold parameter.

Some cleanups are made to ensure the correct batch param is being passed into DMatrix along with some additional tests for correctness of SimpleDMatrix.
2022-02-10 16:58:02 +08:00
Jiaming Yuan
e5e47c3c99 Clarify the behavior of invalid categorical value handling. (#7529) 2022-01-13 16:11:52 +08:00
Jiaming Yuan
68cdbc9c16 Remove omp_get_max_threads in CPU predictor. (#7519)
This is part of the on going effort to remove the dependency on global omp variables.
2022-01-04 22:12:15 +08:00
Jiaming Yuan
557ffc4bf5 Reduce base margin to 2 dim for now. (#7455) 2021-11-27 00:46:13 +08:00
Jiaming Yuan
d33854af1b [Breaking] Accept multi-dim meta info. (#7405)
This PR changes base_margin into a 3-dim array, with one of them being reserved for multi-target classification. Also, a breaking change is made for binary serialization due to extra dimension along with a fix for saving the feature weights. Lastly, it unifies the prediction initialization between CPU and GPU. After this PR, the meta info setter in Python will be based on array interface.
2021-11-18 23:02:54 +08:00
Jiaming Yuan
a13321148a Support multi-class with base margin. (#7381)
This is already partially supported but never properly tested. So the only possible way to use it is calling `numpy.ndarray.flatten` with `base_margin` before passing it into XGBoost. This PR adds proper support
for most of the data types along with tests.
2021-11-02 13:38:00 +08:00
Jiaming Yuan
d8a549e6ac Avoid thread block with sparse data. (#7255) 2021-09-25 13:11:34 +08:00
Robert Maynard
1a75f43304 Allow compilation with nvcc 11.4 (#7131)
* Use type aliases for discard iterators

* update to include host_vector as thrust 1.12 doesn't bring it in as a side-effect

* cub::DispatchRadixSort requires signed offset types
2021-07-27 20:05:33 +08:00
Jiaming Yuan
1c8fdf2218 Remove use of device_idx in dh::LaunchN. (#7063)
It's an unused parameter, removing it can make the CI log more readable.
2021-06-29 11:37:26 +08:00
Jiaming Yuan
8fa32fdda2 Implement categorical data support for SHAP. (#7053)
* Add CPU implementation.
* Update GPUTreeSHAP.
* Add GPU implementation by defining custom split condition.
2021-06-25 19:02:46 +08:00
Jiaming Yuan
bbfffb444d Fix race condition in CPU shap. (#7050) 2021-06-21 10:03:15 +08:00
Jiaming Yuan
29f8fd6fee Support categorical split in tree model dump. (#7036) 2021-06-18 16:46:20 +08:00
Jiaming Yuan
86715e4cd4 Support categorical data for dask functional interface and DQM. (#7043)
* Support categorical data for dask functional interface and DQM.

* Implement categorical data support for GPU GK-merge.
* Add support for dask functional interface.
* Add support for DQM.

* Get newer cupy.
2021-06-18 13:06:52 +08:00
Jiaming Yuan
f79cc4a7a4 Implement categorical prediction for CPU and GPU predict leaf. (#7001)
* Categorical prediction with CPU predictor and GPU predict leaf.

* Implement categorical prediction for CPU prediction.
* Implement categorical prediction for GPU predict leaf.
* Refactor the prediction functions to have a unified get next node function.

Co-authored-by: Shvets Kirill <kirill.shvets@intel.com>
2021-06-11 10:11:45 +08:00
Jiaming Yuan
a59c7323b4 Fix inplace predict missing value. (#6787) 2021-03-27 05:36:10 +08:00
Louis Desreumaux
9b530e5697 Improve OpenMP exception handling (#6680) 2021-02-25 13:56:16 +08:00
ShvetsKS
9f15b9e322 Optimize CPU prediction (#6696)
Co-authored-by: Shvets Kirill <kirill.shvets@intel.com>
2021-02-16 14:41:22 +08:00
Jiaming Yuan
e8c5c53e2f Use Predictor for dart. (#6693)
* Use normal predictor for dart booster.
* Implement `inplace_predict` for dart.
* Enable `dart` for dask interface now that it's thread-safe.
* categorical data should be working out of box for dart now.

The implementation is not very efficient as it has to pull back the data and
apply weight for each tree, but still a significant improvement over previous
implementation as now we no longer binary search for each sample.

* Fix output prediction shape on dataframe.
2021-02-09 23:30:19 +08:00
Jiaming Yuan
4656b09d5d [breaking] Add prediction fucntion for DMatrix and use inplace predict for dask. (#6668)
* Add a new API function for predicting on `DMatrix`.  This function aligns
with rest of the `XGBoosterPredictFrom*` functions on semantic of function
arguments.
* Purge `ntree_limit` from libxgboost, use iteration instead.
* [dask] Use `inplace_predict` by default for dask sklearn models.
* [dask] Run prediction shape inference on worker instead of client.

The breaking change is in the Python sklearn `apply` function, I made it to be
consistent with other prediction functions where `best_iteration` is used by
default.
2021-02-08 18:26:32 +08:00
Jiaming Yuan
411592a347 Enhance inplace prediction. (#6653)
* Accept array interface for csr and array.
* Accept an optional proxy dmatrix for metainfo.

This constructs an explicit `_ProxyDMatrix` type in Python.

* Remove unused doc.
* Add strict output.
2021-02-02 11:41:46 +08:00
Jiaming Yuan
c3c8e66fc9 Make prediction functions thread safe. (#6648) 2021-01-28 23:29:43 +08:00
Jiaming Yuan
f2f7dd87b8 Use view for SparsePage exclusively. (#6590) 2021-01-11 18:04:55 +08:00
Philip Hyunsu Cho
c31e3efa7c Pass correct split_type to GPU predictor (#6491)
* Pass correct split_type to GPU predictor

* Add a test
2020-12-11 19:30:00 -08:00
Honza Sterba
b0036b339b Optionaly fail when gpu_id is set to invalid value (#6342) 2020-11-28 15:14:12 +08:00
Jiaming Yuan
8a17610666 Implement GPU predict leaf. (#6187) 2020-11-11 17:33:47 +08:00
ShvetsKS
d411f98d26 simple fix for static shedule in predict (#6357)
Co-authored-by: ShvetsKS <kirill.shvets@intel.com>
2020-11-09 17:01:30 +08:00
Igor Moura
5e1e972aea Clean up warnings (#6325) 2020-10-30 23:50:29 +08:00
Jiaming Yuan
c4da967b5c Support unity build. (#6295)
* Support unity build.

* Setup on Windows Jenkins.

* Revert "Setup on Windows Jenkins."

This reverts commit 8345cb8d2b009eec8ae9fa6f16412a7c9b6ec12c.
2020-10-28 11:49:28 -07:00
Rory Mitchell
f0c3ff313f Update GPUTreeShap, add docs (#6281)
* Update GPUTreeShap, add docs

* Fix test

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-10-27 18:22:12 +13:00
Igor Moura
d1254808d5 Clean up C++ warnings (#6213) 2020-10-19 23:02:33 +08:00
ShvetsKS
a4ce0eae43 CPU predict performance improvement (#6127)
Co-authored-by: ShvetsKS <kirill.shvets@intel.com>
2020-10-08 15:50:21 +03:00
Jiaming Yuan
798af22ff4 Add categorical data support to GPU predictor. (#6165) 2020-09-29 11:25:34 +08:00
Jiaming Yuan
52c0b3f100 Fix error message. (#6176) 2020-09-29 11:18:25 +08:00
Rory Mitchell
dda9e1e487 Update GPUTreeshap (#6163)
* Reduce shap test duration

* Test interoperability with shap package

* Add feature interactions

* Update GPUTreeShap
2020-09-28 09:43:47 +13:00
Jiaming Yuan
c6f2b8c841 Upgrade gputreeshap. (#6099)
* Upgrade gputreeshap.

Co-authored-by: Rory Mitchell <r.a.mitchell.nz@gmail.com>
2020-09-15 12:57:22 +12:00
Rory Mitchell
2e907abdb8 Updates to GPUTreeShap (#6087)
* Extract paths on device

* Update GPUTreeShap
2020-09-06 13:39:08 +12:00
Jiaming Yuan
80c8547147 Make binary bin search reusable. (#6058)
* Move binary search row to hist util.
* Remove dead code.
2020-08-26 05:05:11 +08:00
Rory Mitchell
9a4e8b1d81 GPUTreeShap (#6038) 2020-08-25 12:47:41 +12:00