Commit Graph

235 Commits

Author SHA1 Message Date
Jiaming Yuan
08ce495b5d Use Booster context in DMatrix. (#8896)
- Pass context from booster to DMatrix.
- Use context instead of integer for `n_threads`.
- Check the consistency configuration for `max_bin`.
- Test for all combinations of initialization options.
2023-04-28 21:47:14 +08:00
Jiaming Yuan
1f9a57d17b [Breaking] Require format to be specified in input URI. (#9077)
Previously, we use `libsvm` as default when format is not specified. However, the dmlc
data parser is not particularly robust against errors, and the most common type of error
is undefined format.

Along with which, we will recommend users to use other data loader instead. We will
continue the maintenance of the parsers as it's currently used for many internal tests
including federated learning.
2023-04-28 19:45:15 +08:00
Rong Ou
a320b402a5 More refactoring to take advantage of collective aggregators (#9081) 2023-04-26 03:36:09 +08:00
Jiaming Yuan
acc110c251 [MT-TREE] Support prediction cache and model slicing. (#8968)
- Fix prediction range.
- Support prediction cache in mt-hist.
- Support model slicing.
- Make the booster a Python iterable by defining `__iter__`.
- Cleanup removed/deprecated parameters.
- A new field in the output model `iteration_indptr` for pointing to the ranges of trees for each iteration.
2023-03-27 23:10:54 +08:00
Jiaming Yuan
5891f752c8 Rework the MAP metric. (#8931)
- The new implementation is more strict as only binary labels are accepted. The previous implementation converts values greater than 1 to 1.
- Deterministic GPU. (no atomic add).
- Fix top-k handling.
- Precise definition of MAP. (There are other variants on how to handle top-k).
- Refactor GPU ranking tests.
2023-03-22 17:45:20 +08:00
Jiaming Yuan
a093770f36 Partitioner for multi-target tree. (#8922) 2023-03-16 18:49:34 +08:00
Jiaming Yuan
26209a42a5 Define git attributes for renormalization. (#8921) 2023-03-16 02:43:11 +08:00
Jiaming Yuan
8be6095ece Implement NDCG cache. (#8893) 2023-03-13 22:16:31 +08:00
Jiaming Yuan
46dfcc7d22 Define a new ranking parameter. (#8887) 2023-03-09 17:46:24 +08:00
Jiaming Yuan
f236640427 Support F order for the tensor type. (#8872)
- Add F order support for tensor and view.
- Use parameter pack for automatic type cast. (avoid excessive static cast for shape).
2023-03-08 03:27:49 +08:00
Jiaming Yuan
4d665b3fb0 Restore clang tidy test. (#8861) 2023-03-03 13:47:04 -08:00
Jiaming Yuan
cce4af4acf Initial support for quantile loss. (#8750)
- Add support for Python.
- Add objective.
2023-02-16 02:30:18 +08:00
Jiaming Yuan
282b1729da Specify the number of threads for parallel sort. (#8735)
* Specify the number of threads for parallel sort.

- Pass context object into argsort.
- Replace macros with inline functions.
2023-02-16 00:20:19 +08:00
Jiaming Yuan
31d3ec07af Extract device algorithms. (#8789) 2023-02-13 20:53:53 +08:00
Jiaming Yuan
457f704e3d Add quantile metric. (#8761) 2023-02-13 19:07:40 +08:00
Rong Ou
ed91e775ec Fix quantile tests running on multi-gpus (#8775)
* Fix quantile tests running on multi-gpus

* Run some gtests with multiple GPUs

* fix mgpu test naming

* Instruct NCCL to print extra logs

* Allocate extra space in /dev/shm to enable NCCL

* use gtest_skip to skip mgpu tests

---------

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2023-02-12 17:00:26 -08:00
Jiaming Yuan
17b709acb9 Rename ranking utils to threading utils. (#8785) 2023-02-12 05:41:18 +08:00
Jiaming Yuan
5f76edd296 Extract make metric name from ranking metric. (#8768)
- Extract the metric parsing routine from ranking.
- Add a test.
- Accept null for string view.
2023-02-09 18:30:21 +08:00
Jiaming Yuan
48cefa012e Support multiple alphas for segmented quantile. (#8758) 2023-02-07 17:17:59 +08:00
Jiaming Yuan
28bb01aa22 Extract optional weight. (#8747)
- Extract optional weight from coommon.h to reduce dependency on this header.
- Add test.
2023-02-07 03:11:53 +08:00
Rong Ou
66191e9926 Support cpu quantile sketch with column-wise data split (#8742) 2023-02-05 14:26:24 +08:00
Jiaming Yuan
c1786849e3 Use array interface for CSC matrix. (#8672)
* Use array interface for CSC matrix.

Use array interface for CSC matrix and align the interface with CSR and dense.

- Fix nthread issue in the R package DMatrix.
- Unify the behavior of handling `missing` with other inputs.
- Unify the behavior of handling `missing` around R, Python, Java, and Scala DMatrix.
- Expose `num_non_missing` to the JVM interface.
- Deprecate old CSR and CSC constructors.
2023-02-05 01:59:46 +08:00
Jiaming Yuan
3760cede0f Consistent use of context to specify number of threads. (#8733)
- Use context in all tests.
- Use context in R.
- Use context in C API DMatrix initialization. (0 threads is used as dft).
2023-01-30 15:25:31 +08:00
Rong Ou
8af98e30fc Use in-memory communicator to test quantile (#8710) 2023-01-27 23:28:28 +08:00
Jiaming Yuan
4416452f94 Return single thread from context when called inside omp region. (#8693) 2023-01-18 09:23:37 +08:00
Jiaming Yuan
43152657d4 Extract JSON type check. (#8677)
- Reuse it in `GetMissing`.
- Add test.
2023-01-17 03:11:07 +08:00
Jiaming Yuan
cfa994d57f Multi-target support for L1 error. (#8652)
- Add matrix support to the median function.
- Iterate through each target for quantile computation.
2023-01-11 05:51:14 +08:00
Jiaming Yuan
8d545ab2a2 Implement fit stump. (#8607) 2023-01-04 04:14:51 +08:00
Rong Ou
3ceeb8c61c Add data split mode to DMatrix MetaInfo (#8568) 2022-12-25 20:37:37 +08:00
Jiaming Yuan
38887a1876 Fix windows build on buildkite. (#8602) 2022-12-16 21:12:24 +08:00
Jiaming Yuan
43a647a4dd Fix inference with categorical feature. (#8591) 2022-12-15 17:57:26 +08:00
Jiaming Yuan
3e26107a9c Rename and extract Context. (#8528)
* Rename `GenericParameter` to `Context`.
* Rename header file to reflect the change.
* Rename all references.
2022-12-07 04:58:54 +08:00
Jiaming Yuan
e3bf5565ab Extract transform iterator. (#8498) 2022-12-05 21:37:07 +08:00
Rong Ou
8e76f5f595 Use DataSplitMode to configure data loading (#8434)
* Use `DataSplitMode` to configure data loading
2022-11-08 16:21:50 +08:00
Jiaming Yuan
3ef1703553 Allow using string view to find JSON value. (#8332)
- Allow comparison between string and string view.
- Fix compiler warnings.
2022-10-13 17:10:13 +08:00
Rong Ou
668b8a0ea4 [Breaking] Switch from rabit to the collective communicator (#8257)
* Switch from rabit to the collective communicator

* fix size_t specialization

* really fix size_t

* try again

* add include

* more include

* fix lint errors

* remove rabit includes

* fix pylint error

* return dict from communicator context

* fix communicator shutdown

* fix dask test

* reset communicator mocklist

* fix distributed tests

* do not save device communicator

* fix jvm gpu tests

* add python test for federated communicator

* Update gputreeshap submodule

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 14:39:01 -08:00
Rory Mitchell
d686bf52a6 Reduce time for some multi-gpu tests (#8288)
* Faster dask tests

* Reuse AllReducer objects in tests.

* Faster boost from prediction tests.

* Use rmm dask fixture.

* Speed up dask demo.

* mypy

* Format with black.

* mypy

* Clang-tidy

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-04 02:49:33 -08:00
Jiaming Yuan
6d1452074a Remove MGPU cpp tests. (#8276)
Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-09-27 21:18:23 +08:00
Rory Mitchell
8f77677193 Use quantised gradients in gpu_hist histograms (#8246) 2022-09-26 17:35:35 +02:00
Jiaming Yuan
fffb1fca52 Calculate base_score based on input labels for mae. (#8107)
Fit an intercept as base score for abs loss.
2022-09-20 20:53:54 +08:00
Jiaming Yuan
bc818316f2 Prepare for improving Windows networking compatibility. (#8234)
* Prepare for improving Windows networking compatibility.

* Include dmlc filesystem indirectly as dmlc/filesystem.h includes windows.h, which
  conflicts with winsock2.h
* Define `NOMINMAX` conditionally.
* Link the winsock library when mysys32 is used.
* Add config file for read the doc.
2022-09-10 15:16:49 +08:00
Jiaming Yuan
441ffc017a Copy data from Ellpack to GHist. (#8215) 2022-09-06 23:05:49 +08:00
Jiaming Yuan
7785d65c8a Fix feature weights with multiple column sampling. (#8100) 2022-07-22 20:23:05 +08:00
Jiaming Yuan
4083440690 Small cleanups to various data types. (#8086)
- Use `bst_bin_t` in batch param constructor.
- Use `StringView` to avoid `std::string` when appropriate.
- Avoid using `MetaInfo` in quantile constructor to limit the scope of parameter.
2022-07-18 22:39:36 +08:00
Jiaming Yuan
f0c1b842bf Implement sketching with adapter. (#8019) 2022-06-23 00:03:02 +08:00
Jiaming Yuan
142a208a90 Fix compiler warnings. (#8022)
- Remove/fix unused parameters
- Remove deprecated code in rabit.
- Update dmlc-core.
2022-06-22 21:29:10 +08:00
Jiaming Yuan
8f8bd8147a Fix LTR with weighted Quantile DMatrix. (#7975)
* Fix LTR with weighted Quantile DMatrix.

* Better tests.
2022-06-09 01:33:41 +08:00
Jiaming Yuan
1a33b50a0d Fix compiler warnings. (#7974)
- Remove unused parameters. There are still many warnings that are not yet
addressed. Currently, the warnings in dmlc-core dominate the error log.
- Remove `distributed` parameter from metric.
- Fixes some warnings about signed comparison.
2022-06-06 22:56:25 +08:00
Jiaming Yuan
d48123d23b Fix rmm build (#7973)
- Optionally switch to c++17
- Use rmm CMake target.
- Workaround compiler errors.
- Fix GPUMetric inheritance.
- Run death tests even if it's built with RMM support.

Co-authored-by: jakirkham <jakirkham@gmail.com>
2022-06-06 20:18:32 +08:00
Rong Ou
80339c3427 Enable distributed GPU training over Rabit (#7930) 2022-05-31 04:09:45 +08:00