33 Commits

Author SHA1 Message Date
Hui Liu
6762230d9a namespace to reduce code 2023-10-27 10:51:32 -07:00
Hui Liu
4a4b528d54 add namespace aliases to reduce code 2023-10-27 09:11:55 -07:00
Hui Liu
15421e40d9 enable ROCm on latest XGBoost 2023-10-23 11:07:08 -07:00
Your Name
ffbbc9c968 add cuda to hip wrapper 2023-10-17 12:42:37 -07:00
Jiaming Yuan
8c676c889d
Remove internal use of gpu_id. (#9568) 2023-09-20 23:29:51 +08:00
amdsc21
5446c501af merge 23Mar01 2023-05-02 00:05:58 +02:00
Rong Ou
8dbe0510de
More collective aggregators (#9060) 2023-04-22 03:32:05 +08:00
Rong Ou
ba9d24ff7b
Make sure metrics work with column-wise distributed training (#9020) 2023-04-18 03:48:23 +08:00
amdsc21
7ee4734d3a rm device_helpers.hip.h from cu 2023-03-26 00:24:11 +01:00
amdsc21
b9d86d44d6 finish multiclass_metric.cu 2023-03-09 20:37:16 +01:00
amdsc21
6fa248b75f try elementwise_metric.cu 2023-03-08 22:42:48 +01:00
Jiaming Yuan
81b2ee1153
Pass DMatrix into metric for caching. (#8790) 2023-02-13 22:15:05 +08:00
Jiaming Yuan
9f598efc3e
Rename context in Metric. (#8686) 2023-01-17 01:10:13 +08:00
Jiaming Yuan
3e26107a9c
Rename and extract Context. (#8528)
* Rename `GenericParameter` to `Context`.
* Rename header file to reflect the change.
* Rename all references.
2022-12-07 04:58:54 +08:00
Rong Ou
668b8a0ea4
[Breaking] Switch from rabit to the collective communicator (#8257)
* Switch from rabit to the collective communicator

* fix size_t specialization

* really fix size_t

* try again

* add include

* more include

* fix lint errors

* remove rabit includes

* fix pylint error

* return dict from communicator context

* fix communicator shutdown

* fix dask test

* reset communicator mocklist

* fix distributed tests

* do not save device communicator

* fix jvm gpu tests

* add python test for federated communicator

* Update gputreeshap submodule

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 14:39:01 -08:00
Jiaming Yuan
142a208a90
Fix compiler warnings. (#8022)
- Remove/fix unused parameters
- Remove deprecated code in rabit.
- Update dmlc-core.
2022-06-22 21:29:10 +08:00
Jiaming Yuan
1a33b50a0d
Fix compiler warnings. (#7974)
- Remove unused parameters. There are still many warnings that are not yet
addressed. Currently, the warnings in dmlc-core dominate the error log.
- Remove `distributed` parameter from metric.
- Fixes some warnings about signed comparison.
2022-06-06 22:56:25 +08:00
Jiaming Yuan
5b1161bb64
Convert labels into tensor. (#7456)
* Add a new ctor to tensor for `initilizer_list`.
* Change labels from host device vector to tensor.
* Rename the field from `labels_` to `labels` since it's a public member.
2021-12-17 00:58:35 +08:00
Jiaming Yuan
0f7a9b42f1
Use double precision in metric calculation. (#7364) 2021-11-02 12:00:32 +08:00
Jiaming Yuan
4ddf8d001c
Deterministic result for element-wise/mclass metrics. (#7303)
Remove openmp reduction.
2021-10-13 14:22:40 +08:00
Louis Desreumaux
9b530e5697
Improve OpenMP exception handling (#6680) 2021-02-25 13:56:16 +08:00
Jiaming Yuan
7c2686146e
Dask device dmatrix (#5901)
* Fix softprob with empty dmatrix.
2020-07-17 13:17:43 +08:00
Philip Hyunsu Cho
1d22a9be1c
Revert "Reorder includes. (#5749)" (#5771)
This reverts commit d3a0efbf162f3dceaaf684109e1178c150b32de3.
2020-06-09 10:29:28 -07:00
Jiaming Yuan
d3a0efbf16
Reorder includes. (#5749)
* Reorder includes.

* R.
2020-06-03 17:30:47 +12:00
Rory Mitchell
d6d1035950
gpu_hist performance fixes (#5558)
* Remove unnecessary cuda API calls

* Fix histogram memory growth
2020-04-19 12:21:13 +12:00
Jiaming Yuan
ac457c56a2
Use `UpdateAllowUnknown' for non-model related parameter. (#4961)
* Use `UpdateAllowUnknown' for non-model related parameter.

Model parameter can not pack an additional boolean value due to binary IO
format.  This commit deals only with non-model related parameter configuration.

* Add tidy command line arg for use-dmlc-gtest.
2019-10-23 05:50:12 -04:00
Rong Ou
733ed24dd9 further cleanup of single process multi-GPU code (#4810)
* use subspan in gpu predictor instead of copying
* Revise `HostDeviceVector`
2019-08-30 05:27:23 -04:00
Rong Ou
38ab79f889 Make HostDeviceVector single gpu only (#4773)
* Make HostDeviceVector single gpu only
2019-08-26 09:51:13 +12:00
Jiaming Yuan
f0064c07ab
Refactor configuration [Part II]. (#4577)
* Refactor configuration [Part II].

* General changes:
** Remove `Init` methods to avoid ambiguity.
** Remove `Configure(std::map<>)` to avoid redundant copying and prepare for
   parameter validation. (`std::vector` is returned from `InitAllowUnknown`).
** Add name to tree updaters for easier debugging.

* Learner changes:
** Make `LearnerImpl` the only source of configuration.

    All configurations are stored and carried out by `LearnerImpl::Configure()`.

** Remove booster in C API.

    Originally kept for "compatibility reason", but did not state why.  So here
    we just remove it.

** Add a `metric_names_` field in `LearnerImpl`.
** Remove `LazyInit`.  Configuration will always be lazy.
** Run `Configure` before every iteration.

* Predictor changes:
** Allocate both cpu and gpu predictor.
** Remove cpu_predictor from gpu_predictor.

    `GBTree` is now used to dispatch the predictor.

** Remove some GPU Predictor tests.

* IO

No IO changes.  The binary model format stability is tested by comparing
hashing value of save models between two commits
2019-07-20 08:34:56 -04:00
sriramch
90f683b25b Set the appropriate device before freeing device memory... (#4566)
* - set the appropriate device before freeing device memory...
   - pr #4532 added a global memory tracker/logger to keep track of number of (de)allocations
     and peak memory usage on a per device basis.
   - this pr adds the appropriate check to make sure that the (de)allocation counts and memory usages
     makes sense for the device. since verbosity is typically increased on debug/non-retail builds.  
* - pre-create cub allocators and reuse them
   - create them once and not resize them dynamically. we need to ensure that these allocators
     are created and destroyed exactly once so that the appropriate device id's are set
2019-06-18 14:58:05 +12:00
Jiaming Yuan
c589eff941
De-duplicate GPU parameters. (#4454)
* Only define `gpu_id` and `n_gpus` in `LearnerTrainParam`
* Pass LearnerTrainParam through XGBoost vid factory method.
* Disable all GPU usage when GPU related parameters are not specified (fixes XGBoost choosing GPU over aggressively).
* Test learner train param io.
* Fix gpu pickling.
2019-05-29 11:55:57 +08:00
Rong Ou
eaab364a63 More explict sharding methods for device memory (#4396)
* Rename the Reshard method to Shard

* Add a new Reshard method for sharding a vector that's already sharded
2019-05-01 11:47:22 +12:00
Jiaming Yuan
84d992babc
GPU multiclass metrics (#4368)
* Port multi classes metrics to CUDA.
2019-04-15 17:47:47 +08:00