229 Commits

Author SHA1 Message Date
Jiaming Yuan
cac2cd2e94
[R] Set number of threads in demos and tests. (#9591)
- Restrict the number of threads in IO.
- Specify the number of threads in demos and tests.
- Add helper scripts for checks.
2023-09-23 21:44:03 +08:00
Jiaming Yuan
8c676c889d
Remove internal use of gpu_id. (#9568) 2023-09-20 23:29:51 +08:00
Jiaming Yuan
adea842c83
Fix inplace predict with fallback when base margin is used. (#9536)
- Copy meta info from proxy DMatrix.
- Use `std::call_once` to emit less warnings.
2023-09-05 01:04:24 +08:00
Jiaming Yuan
ddf2e68821
Use the new DeviceOrd in the linalg module. (#9527) 2023-08-29 13:37:29 +08:00
Jiaming Yuan
972730cde0
Use matrix for gradient. (#9508)
- Use the `linalg::Matrix` for storing gradients.
- New API for the custom objective.
- Custom objective for multi-class/multi-target is now required to return the correct shape.
- Custom objective for Python can accept arrays with any strides. (row-major, column-major)
2023-08-24 05:29:52 +08:00
Jiaming Yuan
8c10af45a0
Delay the check for vector leaf. (#9509) 2023-08-23 01:53:40 +08:00
Jiaming Yuan
3c09399f29
Fix device dispatch for linear updater. (#9507) 2023-08-23 00:17:35 +08:00
Jiaming Yuan
912e341d57
Initial GPU support for the approx tree method. (#9414) 2023-07-31 15:50:28 +08:00
Jiaming Yuan
275da176ba
Document for device ordinal. (#9398)
- Rewrite GPU demos. notebook is converted to script to avoid committing additional png plots.
- Add GPU demos into the sphinx gallery.
- Add RMM demos into the sphinx gallery.
- Test for firing threads with different device ordinals.
2023-07-22 15:26:29 +08:00
Jiaming Yuan
6e18d3a290
[pyspark] Handle the device parameter in pyspark. (#9390)
- Handle the new `device` parameter in PySpark.
- Deprecate the old `use_gpu` parameter.
2023-07-18 08:47:03 +08:00
Jiaming Yuan
04aff3af8e
Define the new device parameter. (#9362) 2023-07-13 19:30:25 +08:00
Jiaming Yuan
97ed944209
Unify the hist tree method for different devices. (#9363) 2023-07-11 10:04:39 +08:00
Jiaming Yuan
41c6813496
Preserve order of saved updaters config. (#9355)
- Save the updater sequence as an array instead of object.
- Warn only once.

The compatibility is kept, but we should be able to break it as the config is not loaded
in pickle model and it's declared to be not stable.
2023-07-05 20:20:07 +08:00
Jiaming Yuan
645037e376
Improve test coverage with predictor configuration. (#9354)
* Improve test coverage with predictor configuration.

- Test with ext memory.
- Test with QDM.
- Test with dart.
2023-07-05 15:17:22 +08:00
Jiaming Yuan
39390cc2ee
[breaking] Remove the predictor param, allow fallback to prediction using DMatrix. (#9129)
- A `DeviceOrd` struct is implemented to indicate the device. It will eventually replace the `gpu_id` parameter.
- The `predictor` parameter is removed.
- Fallback to `DMatrix` when `inplace_predict` is not available.
- The heuristic for choosing a predictor is only used during training.
2023-07-03 19:23:54 +08:00
Jiaming Yuan
f4798718c7
Use hist as the default tree method. (#9320) 2023-06-27 23:04:24 +08:00
Jiaming Yuan
bc267dd729
Use ptr from mmap for GHistIndexMatrix and ColumnMatrix. (#9315)
* Use ptr from mmap for `GHistIndexMatrix` and `ColumnMatrix`.

- Define a resource for holding various types of memory pointers.
- Define ref vector for holding resources.
- Swap the underlying resources for GHist and ColumnM.
- Add documentation for current status.
- s390x support is removed. It should work if you can compile XGBoost, all the old workaround code does is to get GCC to compile.
2023-06-27 19:05:46 +08:00
Jiaming Yuan
acc110c251
[MT-TREE] Support prediction cache and model slicing. (#8968)
- Fix prediction range.
- Support prediction cache in mt-hist.
- Support model slicing.
- Make the booster a Python iterable by defining `__iter__`.
- Cleanup removed/deprecated parameters.
- A new field in the output model `iteration_indptr` for pointing to the ranges of trees for each iteration.
2023-03-27 23:10:54 +08:00
Jiaming Yuan
151882dd26
Initial support for multi-target tree. (#8616)
* Implement multi-target for hist.

- Add new hist tree builder.
- Move data fetchers for tests.
- Dispatch function calls in gbm base on the tree type.
2023-03-22 23:49:56 +08:00
Jiaming Yuan
a551bed803
Remove duplicated learning rate parameter. (#8941) 2023-03-22 20:51:14 +08:00
Jiaming Yuan
9bade7203a
Remove public access to tree model param. (#8902)
* Make tree model param a private member.
* Number of features and targets are immutable after construction.

This is to reduce the number of places where we can run configuration.
2023-03-13 20:55:10 +08:00
Jiaming Yuan
6deaec8027
Pass obj info by reference instead of by value. (#8889)
- Pass obj info into tree updater as const pointer.

This way we don't have to initialize the learner model param before configuring gbm, hence
breaking up the dependency of configurations.
2023-03-11 01:38:28 +08:00
Jiaming Yuan
228a46e8ad
Support learning rate for zero-hessian objectives. (#8866) 2023-03-06 20:33:28 +08:00
Jiaming Yuan
4d665b3fb0
Restore clang tidy test. (#8861) 2023-03-03 13:47:04 -08:00
Jiaming Yuan
cfa994d57f
Multi-target support for L1 error. (#8652)
- Add matrix support to the median function.
- Iterate through each target for quantile computation.
2023-01-11 05:51:14 +08:00
Jiaming Yuan
26c9882e23
Fix loading GPU pickle with a CPU-only xgboost distribution. (#8632)
We can handle loading the pickle on a CPU-only machine if the XGBoost is built with CUDA
enabled (Linux and Windows PyPI package), but not if the distribution is CPU-only (macOS
PyPI package).
2023-01-05 02:14:30 +08:00
Jiaming Yuan
3e26107a9c
Rename and extract Context. (#8528)
* Rename `GenericParameter` to `Context`.
* Rename header file to reflect the change.
* Rename all references.
2022-12-07 04:58:54 +08:00
Jiaming Yuan
3fc1046fd3
Reduce compiler warnings on CPU-only build. (#8483) 2022-11-29 00:04:16 +08:00
Jiaming Yuan
031d66ec27
Configuration for init estimation. (#8343)
* Configuration for init estimation.

* Check whether the model needs configuration based on const attribute `ModelFitted`
instead of a mutable state.
* Add parameter `boost_from_average` to tell whether the user has specified base score.
* Add tests.
2022-10-18 01:52:24 +08:00
Rong Ou
668b8a0ea4
[Breaking] Switch from rabit to the collective communicator (#8257)
* Switch from rabit to the collective communicator

* fix size_t specialization

* really fix size_t

* try again

* add include

* more include

* fix lint errors

* remove rabit includes

* fix pylint error

* return dict from communicator context

* fix communicator shutdown

* fix dask test

* reset communicator mocklist

* fix distributed tests

* do not save device communicator

* fix jvm gpu tests

* add python test for federated communicator

* Update gputreeshap submodule

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 14:39:01 -08:00
Jiaming Yuan
fffb1fca52
Calculate base_score based on input labels for mae. (#8107)
Fit an intercept as base score for abs loss.
2022-09-20 20:53:54 +08:00
Jiaming Yuan
142a208a90
Fix compiler warnings. (#8022)
- Remove/fix unused parameters
- Remove deprecated code in rabit.
- Update dmlc-core.
2022-06-22 21:29:10 +08:00
Jiaming Yuan
1a33b50a0d
Fix compiler warnings. (#7974)
- Remove unused parameters. There are still many warnings that are not yet
addressed. Currently, the warnings in dmlc-core dominate the error log.
- Remove `distributed` parameter from metric.
- Fixes some warnings about signed comparison.
2022-06-06 22:56:25 +08:00
Jiaming Yuan
765097d514
Simplify inplace-predict. (#7910)
Pass the `X` as part of Proxy DMatrix instead of an independent `dmlc::any`.
2022-05-18 17:52:00 +08:00
Jiaming Yuan
fdf533f2b9
[POC] Experimental support for l1 error. (#7812)
Support adaptive tree, a feature supported by both sklearn and lightgbm.  The tree leaf is recomputed based on residue of labels and predictions after construction.

For l1 error, the optimal value is the median (50 percentile).

This is marked as experimental support for the following reasons:
- The value is not well defined for distributed training, where we might have empty leaves for local workers. Right now I just use the original leaf value for computing the average with other workers, which might cause significant errors.
- Some follow-ups are required, for exact, pruner, and optimization for quantile function. Also, we need to calculate the initial estimation.
2022-04-26 21:41:55 +08:00
Jiaming Yuan
3c9b04460a
Move num_parallel_tree to model parameter. (#7751)
The size of forest should be a property of model itself instead of a training
hyper-parameter.
2022-03-29 02:32:42 +08:00
Jiaming Yuan
81210420c6
Remove omp_get_max_threads (#7608)
This is the one last PR for removing omp global variable.

* Add context object to the `DMatrix`.  This bridges `DMatrix` with https://github.com/dmlc/xgboost/issues/7308 .
* Require context to be available at the construction time of booster.
* Add `n_threads` support for R csc DMatrix constructor.
* Remove `omp_get_max_threads` in R glue code.
* Remove threading utilities that rely on omp global variable.
2022-01-28 16:09:22 +08:00
Jiaming Yuan
a1bcd33a3b
[breaking] Change internal model serialization to UBJSON. (#7556)
* Use typed array for models.
* Change the memory snapshot format.
* Add new C API for saving to raw format.
2022-01-16 02:11:53 +08:00
Jiaming Yuan
e94b766310
Fix early stopping with linear model. (#7554) 2022-01-13 21:53:06 +08:00
Jiaming Yuan
001503186c
Rewrite approx (#7214)
This PR rewrites the approx tree method to use codebase from hist for better performance and code sharing.

The rewrite has many benefits:
- Support for both `max_leaves` and `max_depth`.
- Support for `grow_policy`.
- Support for mono constraint.
- Support for feature weights.
- Support for easier bin configuration (`max_bin`).
- Support for categorical data.
- Faster performance for most of the datasets. (many times faster)
- Support for prediction cache.
- Significantly better performance for external memory.
- Unites the code base between approx and hist.
2022-01-10 21:15:05 +08:00
Jiaming Yuan
0df2ae63c7
Fix num_boosted_rounds for linear model. (#7538)
* Add note.

* Fix n boosted rounds.
2022-01-05 03:29:33 +08:00
Jiaming Yuan
28af6f9abb
Remove omp_get_max_threads in gbm and linear. (#7537)
* Use ctx in gbm.

* Use ctx threads in gbm and linear.
2022-01-05 03:28:52 +08:00
Jiaming Yuan
d33854af1b
[Breaking] Accept multi-dim meta info. (#7405)
This PR changes base_margin into a 3-dim array, with one of them being reserved for multi-target classification. Also, a breaking change is made for binary serialization due to extra dimension along with a fix for saving the feature weights. Lastly, it unifies the prediction initialization between CPU and GPU. After this PR, the meta info setter in Python will be based on array interface.
2021-11-18 23:02:54 +08:00
Jiaming Yuan
ca6f980932
Check number of trees in inplace predict. (#7409) 2021-11-12 18:20:23 +08:00
Jiaming Yuan
c968217ca8
[R] Fix global feature importance and predict with 1 sample. (#7394)
* [R] Fix global feature importance.

* Add implementation for tree index.  The parameter is not documented in C API since we
should work on porting the model slicing to R instead of supporting more use of tree
index.

* Fix the difference between "gain" and "total_gain".

* debug.

* Fix prediction.
2021-11-05 10:07:00 +08:00
Jiaming Yuan
b06040b6d0
Implement a general array view. (#7365)
* Replace existing matrix and vector view.

This is to prepare for handling higher dimension data and prediction when we support multi-target models.
2021-11-05 04:16:11 +08:00
Jiaming Yuan
4100827971
Pass infomation about objective to tree methods. (#7385)
* Define the `ObjInfo` and pass it down to every tree updater.
2021-11-04 01:52:44 +08:00
Jiaming Yuan
3a4f51f39f
Avoid calling CUDA code on CPU for linear model. (#7154) 2021-09-01 10:45:31 +08:00
Jiaming Yuan
d080b5a953
Fix model slicing. (#7149)
* Use correct pointer.
* Remove best_iteration/best_score.
2021-08-03 11:51:56 +08:00
Jiaming Yuan
1c8fdf2218
Remove use of device_idx in dh::LaunchN. (#7063)
It's an unused parameter, removing it can make the CI log more readable.
2021-06-29 11:37:26 +08:00