84 Commits

Author SHA1 Message Date
Jiaming Yuan
c400fa1e8d
Predictor for vector leaf. (#8898) 2023-03-14 19:07:10 +08:00
Rong Ou
66191e9926
Support cpu quantile sketch with column-wise data split (#8742) 2023-02-05 14:26:24 +08:00
Rong Ou
8af98e30fc
Use in-memory communicator to test quantile (#8710) 2023-01-27 23:28:28 +08:00
Jiaming Yuan
34eee56256
Fix compiler warnings. (#8703)
Fix warnings about signed/unsigned comparisons.
2023-01-21 15:16:23 +08:00
Rong Ou
78396f8a6e
Initial support for column-split cpu predictor (#8676) 2023-01-18 06:33:13 +08:00
Jiaming Yuan
3e26107a9c
Rename and extract Context. (#8528)
* Rename `GenericParameter` to `Context`.
* Rename header file to reflect the change.
* Rename all references.
2022-12-07 04:58:54 +08:00
Jiaming Yuan
6d1452074a
Remove MGPU cpp tests. (#8276)
Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-09-27 21:18:23 +08:00
Jiaming Yuan
fffb1fca52
Calculate base_score based on input labels for mae. (#8107)
Fit an intercept as base score for abs loss.
2022-09-20 20:53:54 +08:00
Jiaming Yuan
bc818316f2
Prepare for improving Windows networking compatibility. (#8234)
* Prepare for improving Windows networking compatibility.

* Include dmlc filesystem indirectly as dmlc/filesystem.h includes windows.h, which
  conflicts with winsock2.h
* Define `NOMINMAX` conditionally.
* Link the winsock library when mysys32 is used.
* Add config file for read the doc.
2022-09-10 15:16:49 +08:00
Jiaming Yuan
2c70751d1e
Implement iterative DMatrix for CPU. (#8116) 2022-07-26 22:34:21 +08:00
Jiaming Yuan
4a4e5c7c18
Prepare gradient index for Quantile DMatrix. (#8103)
* Prepare gradient index for Quantile DMatrix.

- Implement push batch with adapter batch.
- Implement `GetFvalue` for prediction.
2022-07-22 17:26:33 +08:00
Jiaming Yuan
142a208a90
Fix compiler warnings. (#8022)
- Remove/fix unused parameters
- Remove deprecated code in rabit.
- Update dmlc-core.
2022-06-22 21:29:10 +08:00
Jiaming Yuan
1a33b50a0d
Fix compiler warnings. (#7974)
- Remove unused parameters. There are still many warnings that are not yet
addressed. Currently, the warnings in dmlc-core dominate the error log.
- Remove `distributed` parameter from metric.
- Fixes some warnings about signed comparison.
2022-06-06 22:56:25 +08:00
Jiaming Yuan
765097d514
Simplify inplace-predict. (#7910)
Pass the `X` as part of Proxy DMatrix instead of an independent `dmlc::any`.
2022-05-18 17:52:00 +08:00
Jiaming Yuan
fdf533f2b9
[POC] Experimental support for l1 error. (#7812)
Support adaptive tree, a feature supported by both sklearn and lightgbm.  The tree leaf is recomputed based on residue of labels and predictions after construction.

For l1 error, the optimal value is the median (50 percentile).

This is marked as experimental support for the following reasons:
- The value is not well defined for distributed training, where we might have empty leaves for local workers. Right now I just use the original leaf value for computing the average with other workers, which might cause significant errors.
- Some follow-ups are required, for exact, pruner, and optimization for quantile function. Also, we need to calculate the initial estimation.
2022-04-26 21:41:55 +08:00
Jiaming Yuan
2775c2a1ab
Prepare external memory support for hist. (#7638)
This PR prepares the GHistIndexMatrix to host the column matrix which is used by the hist tree method by accepting sparse_threshold parameter.

Some cleanups are made to ensure the correct batch param is being passed into DMatrix along with some additional tests for correctness of SimpleDMatrix.
2022-02-10 16:58:02 +08:00
Jiaming Yuan
81210420c6
Remove omp_get_max_threads (#7608)
This is the one last PR for removing omp global variable.

* Add context object to the `DMatrix`.  This bridges `DMatrix` with https://github.com/dmlc/xgboost/issues/7308 .
* Require context to be available at the construction time of booster.
* Add `n_threads` support for R csc DMatrix constructor.
* Remove `omp_get_max_threads` in R glue code.
* Remove threading utilities that rely on omp global variable.
2022-01-28 16:09:22 +08:00
Jiaming Yuan
68cdbc9c16
Remove omp_get_max_threads in CPU predictor. (#7519)
This is part of the on going effort to remove the dependency on global omp variables.
2022-01-04 22:12:15 +08:00
Jiaming Yuan
5b1161bb64
Convert labels into tensor. (#7456)
* Add a new ctor to tensor for `initilizer_list`.
* Change labels from host device vector to tensor.
* Rename the field from `labels_` to `labels` since it's a public member.
2021-12-17 00:58:35 +08:00
Jiaming Yuan
557ffc4bf5
Reduce base margin to 2 dim for now. (#7455) 2021-11-27 00:46:13 +08:00
Jiaming Yuan
d33854af1b
[Breaking] Accept multi-dim meta info. (#7405)
This PR changes base_margin into a 3-dim array, with one of them being reserved for multi-target classification. Also, a breaking change is made for binary serialization due to extra dimension along with a fix for saving the feature weights. Lastly, it unifies the prediction initialization between CPU and GPU. After this PR, the meta info setter in Python will be based on array interface.
2021-11-18 23:02:54 +08:00
Jiaming Yuan
d8a549e6ac
Avoid thread block with sparse data. (#7255) 2021-09-25 13:11:34 +08:00
Jiaming Yuan
bd1f3a38f0
Rewrite sparse dmatrix using callbacks. (#7092)
- Reduce dependency on dmlc parsers and provide an interface for users to load data by themselves.
- Remove use of threaded iterator and IO queue.
- Remove `page_size`.
- Make sure the number of pages in memory is bounded.
- Make sure the cache can not be violated.
- Provide an interface for internal algorithms to process data asynchronously.
2021-07-16 12:33:31 +08:00
Jiaming Yuan
8fa32fdda2
Implement categorical data support for SHAP. (#7053)
* Add CPU implementation.
* Update GPUTreeSHAP.
* Add GPU implementation by defining custom split condition.
2021-06-25 19:02:46 +08:00
Jiaming Yuan
86715e4cd4
Support categorical data for dask functional interface and DQM. (#7043)
* Support categorical data for dask functional interface and DQM.

* Implement categorical data support for GPU GK-merge.
* Add support for dask functional interface.
* Add support for DQM.

* Get newer cupy.
2021-06-18 13:06:52 +08:00
Jiaming Yuan
f79cc4a7a4
Implement categorical prediction for CPU and GPU predict leaf. (#7001)
* Categorical prediction with CPU predictor and GPU predict leaf.

* Implement categorical prediction for CPU prediction.
* Implement categorical prediction for GPU predict leaf.
* Refactor the prediction functions to have a unified get next node function.

Co-authored-by: Shvets Kirill <kirill.shvets@intel.com>
2021-06-11 10:11:45 +08:00
Igor Rukhovich
19a2c54265
Prediction by indices (subsample < 1) (#6683)
* Another implementation of predicting by indices

* Fixed omp parallel_for variable type

* Removed SparsePageView from Updater
2021-03-16 15:08:20 +13:00
Jiaming Yuan
e8c5c53e2f
Use Predictor for dart. (#6693)
* Use normal predictor for dart booster.
* Implement `inplace_predict` for dart.
* Enable `dart` for dask interface now that it's thread-safe.
* categorical data should be working out of box for dart now.

The implementation is not very efficient as it has to pull back the data and
apply weight for each tree, but still a significant improvement over previous
implementation as now we no longer binary search for each sample.

* Fix output prediction shape on dataframe.
2021-02-09 23:30:19 +08:00
Jiaming Yuan
4656b09d5d
[breaking] Add prediction fucntion for DMatrix and use inplace predict for dask. (#6668)
* Add a new API function for predicting on `DMatrix`.  This function aligns
with rest of the `XGBoosterPredictFrom*` functions on semantic of function
arguments.
* Purge `ntree_limit` from libxgboost, use iteration instead.
* [dask] Use `inplace_predict` by default for dask sklearn models.
* [dask] Run prediction shape inference on worker instead of client.

The breaking change is in the Python sklearn `apply` function, I made it to be
consistent with other prediction functions where `best_iteration` is used by
default.
2021-02-08 18:26:32 +08:00
Jiaming Yuan
411592a347
Enhance inplace prediction. (#6653)
* Accept array interface for csr and array.
* Accept an optional proxy dmatrix for metainfo.

This constructs an explicit `_ProxyDMatrix` type in Python.

* Remove unused doc.
* Add strict output.
2021-02-02 11:41:46 +08:00
ShvetsKS
7f4d3a91b9
Multiclass prediction caching for CPU Hist (#6550)
Co-authored-by: Kirill Shvets <kirill.shvets@intel.com>
2021-01-13 04:42:07 +08:00
Jiaming Yuan
f2f7dd87b8
Use view for SparsePage exclusively. (#6590) 2021-01-11 18:04:55 +08:00
Jiaming Yuan
8a17610666
Implement GPU predict leaf. (#6187) 2020-11-11 17:33:47 +08:00
Jiaming Yuan
43efadea2e
Deterministic data partitioning for external memory (#6317)
* Make external memory data partitioning deterministic.

* Change the meaning of `page_size` from bytes to number of rows.

* Design a data pool.

* Note for external memory.

* Enable unity build on Windows CI.

* Force garbage collect on test.
2020-11-11 06:11:06 +08:00
Jiaming Yuan
798af22ff4
Add categorical data support to GPU predictor. (#6165) 2020-09-29 11:25:34 +08:00
Rory Mitchell
dda9e1e487
Update GPUTreeshap (#6163)
* Reduce shap test duration

* Test interoperability with shap package

* Add feature interactions

* Update GPUTreeShap
2020-09-28 09:43:47 +13:00
Rory Mitchell
9a4e8b1d81
GPUTreeShap (#6038) 2020-08-25 12:47:41 +12:00
Jiaming Yuan
75b8c22b0b
Fix prediction heuristic (#5955)
* Relax check for prediction.
* Relax test in spark test.
* Add tests in C++.
2020-07-29 19:24:07 +08:00
Jiaming Yuan
9b688aca3b
Fix mingw build with R. (#5918) 2020-07-22 02:56:49 +08:00
Jiaming Yuan
a3ec964346
Accept iterator in device dmatrix. (#5783)
* Remove Device DMatrix.
2020-07-07 21:44:48 +08:00
Jiaming Yuan
1a0801238e
Implement iterative DMatrix. (#5837) 2020-07-03 11:44:52 +08:00
Jiaming Yuan
cacff9232a
Remove column major specialization. (#5755)
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-06-05 16:19:14 +08:00
Jiaming Yuan
6671b42dd4
Use ellpack for prediction only when sparsepage doesn't exist. (#5504) 2020-04-10 12:15:46 +08:00
Jiaming Yuan
0012f2ef93
Upgrade clang-tidy on CI. (#5469)
* Correct all clang-tidy errors.
* Upgrade clang-tidy to 10 on CI.

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-04-05 04:42:29 +08:00
Jiaming Yuan
6601a641d7
Thread safe, inplace prediction. (#5389)
Normal prediction with DMatrix is now thread safe with locks.  Added inplace prediction is lock free thread safe.

When data is on device (cupy, cudf), the returned data is also on device.

* Implementation for numpy, csr, cudf and cupy.

* Implementation for dask.

* Remove sync in simple dmatrix.
2020-03-30 15:35:28 +08:00
Jiaming Yuan
4942da64ae
Refactor tests with data generator. (#5439) 2020-03-27 06:44:44 +08:00
Jiaming Yuan
655cf17b60
Predict on Ellpack. (#5327)
* Unify GPU prediction node.
* Add `PageExists`.
* Dispatch prediction on input data for GPU Predictor.
2020-02-23 06:27:03 +08:00
Jiaming Yuan
c35cdecddd
Move prediction cache to Learner. (#5220)
* Move prediction cache into Learner.

* Clean-ups

- Remove duplicated cache in Learner and GBM.
- Remove ad-hoc fix of invalid cache.
- Remove `PredictFromCache` in predictors.
- Remove prediction cache for linear altogether, as it's only moving the
  prediction into training process but doesn't provide any actual overall speed
  gain.
- The cache is now unique to Learner, which means the ownership is no longer
  shared by any other components.

* Changes

- Add version to prediction cache.
- Use weak ptr to check expired DMatrix.
- Pass shared pointer instead of raw pointer.
2020-02-14 13:04:23 +08:00
Jiaming Yuan
3136185bc5
JSON configuration IO. (#5111)
* Add saving/loading JSON configuration.
* Implement Python pickle interface with new IO routines.
* Basic tests for training continuation.
2019-12-15 17:31:53 +08:00
Jiaming Yuan
e089e16e3d
Pass pointer to model parameters. (#5101)
* Pass pointer to model parameters.

This PR de-duplicates most of the model parameters except the one in
`tree_model.h`.  One difficulty is `base_score` is a model property but can be
changed at runtime by objective function.  Hence when performing model IO, we
need to save the one provided by users, instead of the one transformed by
objective.  Here we created an immutable version of `LearnerModelParam` that
represents the value of model parameter after configuration.
2019-12-10 12:11:22 +08:00