38 Commits

Author SHA1 Message Date
Jiaming Yuan
e8a962575a
[EM] Allow staging ellpack on host for GPU external memory. (#10488)
- New parameter `on_host`.
- Abstract format creation and stream creation into policy classes.
2024-06-28 04:42:18 +08:00
Philip Hyunsu Cho
bc3747bdce
[CI] Migrate to rockylinux8 / manylinux_2_28_x86_64 (#10399)
* [CI] Migrate to rockylinux8 / manylinux_2_28_x86_64

* Scrub all references to CentOS 7

* Fix

* Remove use of yum

* Use gcc-10 in cpu

* Temporarily disable -Werror

* Use GCC 9 for now

* Roll back gRPC

* Scrub all references to manylinux2014_x86_64

* Revise rename_whl.py to handle no-op rename

* Change JDK_VERSION back to 8

* Reviewer's comment

* Use GCC 10

* Use Spark 3.5.1, same as in pom.xml

* Fix JAR install
2024-06-17 12:07:49 -07:00
Jiaming Yuan
a5a58102e5
Revamp the rabit implementation. (#10112)
This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features:
- Federated learning for both CPU and GPU.
- NCCL.
- More data types.
- A unified interface for all the underlying implementations.
- Improved timeout handling for both tracker and workers.
- Exhaustive tests with metrics (fixed a couple of bugs along the way).
- A reusable tracker for Python and JVM packages.
2024-05-20 11:56:23 +08:00
Jiaming Yuan
73afef1a6e
Fixes for numpy 2.0. (#10252) 2024-05-07 03:54:32 +08:00
github-actions[bot]
2925cebdca
[CI] Use latest RAPIDS; Pandas 2.0 compatibility fix (#10175)
* [CI] Update RAPIDS to latest stable

* [CI] Use rapidsai stable channel; fix syntax errors in Dockerfile.gpu

* Don't combine astype() with loc()

* Work around https://github.com/dmlc/xgboost/issues/10181

* Fix formatting

* Fix test

---------

Co-authored-by: hcho3 <hcho3@users.noreply.github.com>
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2024-04-15 13:38:53 -07:00
Jiaming Yuan
ca4801f81d
Work with IPv6 in the new tracker. (#10125) 2024-03-20 05:19:23 +08:00
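Supporting IPv6 mostly comes down to picking the right address family and bracketing literal IPv6 hosts when they are joined with a port. The helper below is a hypothetical standard-library sketch. `address_family` and `format_endpoint` are illustrative names, not part of XGBoost's tracker API, and treating hostnames as IPv4 is a simplification.

```python
import ipaddress
import socket


def address_family(host: str) -> socket.AddressFamily:
    """Return AF_INET6 for IPv6 literals and AF_INET otherwise (hypothetical helper)."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal (e.g. a hostname); this sketch defaults to IPv4.
        return socket.AF_INET
    return socket.AF_INET6 if ip.version == 6 else socket.AF_INET


def format_endpoint(host: str, port: int) -> str:
    """Join host and port, bracketing IPv6 literals as URLs do (RFC 3986)."""
    if address_family(host) == socket.AF_INET6:
        return f"[{host}]:{port}"
    return f"{host}:{port}"


print(format_endpoint("127.0.0.1", 9091))  # 127.0.0.1:9091
print(format_endpoint("::1", 9091))        # [::1]:9091
```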
Jiaming Yuan
8ea705e4d5
Support sample weight in sklearn custom objective. (#10050) 2024-02-21 00:43:14 +08:00
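With sample weights, a custom objective typically scales each row's gradient and Hessian by that row's weight. Below is a minimal NumPy sketch for weighted squared error. The exact way XGBoost's scikit-learn wrapper passes `sample_weight` to the callable is an assumption here; check the `XGBRegressor` documentation before relying on this signature.

```python
import numpy as np


def weighted_squared_error(y_true, y_pred, sample_weight=None):
    """Gradient and Hessian of 0.5 * w * (y_pred - y_true)**2 (illustrative sketch)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    grad = y_pred - y_true
    hess = np.ones_like(y_pred)
    if sample_weight is not None:
        # Reshape per-row weights so they broadcast over any target columns.
        w = np.asarray(sample_weight, dtype=np.float64).reshape(
            (-1,) + (1,) * (y_pred.ndim - 1)
        )
        grad = grad * w
        hess = hess * w
    return grad, hess


g, h = weighted_squared_error([1.0, 2.0], [1.5, 1.0], sample_weight=[2.0, 0.5])
```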
Jiaming Yuan
54b71c8fba
Fix with black 24.1.1. (#10014) 2024-01-30 17:24:11 +08:00
Jiaming Yuan
38dd91f491
Save model in ubj as the default. (#9947) 2024-01-05 17:53:36 +08:00
Jiaming Yuan
5f7b5a6921
Add tests for pickling with custom obj and metric. (#9943) 2024-01-04 14:52:48 +08:00
Jiaming Yuan
3ca06ac51e
[doc] Mention data consistency for categorical features. (#9678) 2023-10-24 10:11:33 +08:00
Rong Ou
6fbe6248f4
More in-memory input support for column split (#9685) 2023-10-20 16:02:36 +08:00
Rong Ou
da6803b75b
Support column-wise data split with in-memory inputs (#9628)
---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2023-10-17 12:16:39 +08:00
Jiaming Yuan
ccfc90e4c6
[rabit] Improved connection handling. (#9531)
- Enable timeout.
- Report connection error from the system.
- Handle retry for both tracker connection and peer connection.
2023-08-30 13:00:04 +08:00
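The retry behavior listed above can be sketched as a generic wrapper: retry on `OSError` (the Python base class for socket and connection errors), back off exponentially, and re-raise the last system error once the attempts run out so it can be reported. This is a hypothetical Python sketch, not XGBoost's own connection code; `with_retry` and `flaky_connect` are illustrative names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(connect: Callable[[], T], max_attempts: int = 3, backoff: float = 0.1) -> T:
    """Call `connect` until it succeeds, sleeping with exponential backoff between failures.

    OSError covers ConnectionRefusedError and socket.timeout (a subclass of OSError).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return connect()
        except OSError:
            if attempt == max_attempts - 1:
                raise  # surface the last system error to the caller
            time.sleep(backoff * (2 ** attempt))
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky_connect() -> str:
    # Simulated peer that refuses the first two connection attempts.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionRefusedError("peer not ready")
    return "connected"


result = with_retry(flaky_connect, max_attempts=5, backoff=0.01)
```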
Jiaming Yuan
972730cde0
Use matrix for gradient. (#9508)
- Use the `linalg::Matrix` for storing gradients.
- New API for the custom objective.
- Custom objectives for multi-class/multi-target models are now required to return gradients of the correct shape.
- Custom objectives in Python can accept arrays with any strides (row-major or column-major).
2023-08-24 05:29:52 +08:00
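For multi-target or multi-class models, the shape requirement above means the returned gradient and Hessian should match the prediction matrix: one row per sample, one column per target. A hypothetical NumPy sketch of a multi-target squared-error objective that validates the shape and works with either memory layout (the function name and argument order are illustrative, not XGBoost's callback signature):

```python
import numpy as np


def multi_target_squared_error(predt: np.ndarray, labels: np.ndarray):
    """Return (grad, hess), each with shape (n_samples, n_targets)."""
    if predt.shape != labels.shape:
        raise ValueError(f"shape mismatch: {predt.shape} vs {labels.shape}")
    # Element-wise NumPy arithmetic is valid for both C- and Fortran-ordered inputs.
    grad = predt - labels
    hess = np.ones_like(predt)
    return grad, hess


labels = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
predt = np.asfortranarray(np.zeros((3, 2)))  # column-major prediction matrix
grad, hess = multi_target_squared_error(predt, labels)
```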
Jiaming Yuan
f05a23b41c
Use weakref instead of id for DataIter cache. (#9445)
- Fix the case where Python reuses the id of a freed object.
- Small optimization to column matrix with QDM by using `realloc` instead of copying data.
2023-08-10 00:40:06 +08:00
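CPython may give a new object the same `id()` as one that was already freed, so an id-keyed cache can return stale entries for an unrelated object. A minimal sketch of the weak-reference alternative with the standard `weakref` module; `DataIterCache` is a hypothetical name, not XGBoost's class.

```python
import weakref


class DataIterCache:
    """Hypothetical single-entry cache keyed on a weak reference instead of id()."""

    def __init__(self) -> None:
        self._ref = None
        self._value = None

    def put(self, obj: object, value: object) -> None:
        self._ref = weakref.ref(obj)
        self._value = value

    def get(self, obj: object):
        # A dead weak reference returns None, so a recycled id() can never match.
        if self._ref is not None and self._ref() is obj:
            return self._value
        return None


class Iter:
    pass


cache = DataIterCache()
it = Iter()
cache.put(it, "cached-proxy")
assert cache.get(it) == "cached-proxy"
del it
# The new object may reuse the freed id(), but the cached weak reference is dead.
it2 = Iter()
print(cache.get(it2))  # None
```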
Jiaming Yuan
7129988847
Accept only keyword arguments in data iterator. (#9431) 2023-08-03 12:44:16 +08:00
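In Python, keyword-only parameters are declared with a bare `*`, so a positional call fails loudly instead of silently binding, say, a weight to the label slot. The `input_data` stand-in below is a hypothetical sketch of such a callback, not XGBoost's implementation.

```python
from typing import Any, Callable, Dict, List, Optional


def make_input_data(store: List[Dict[str, Any]]) -> Callable[..., None]:
    """Build a hypothetical keyword-only callback that records each batch it receives."""

    def input_data(
        *, data: Any, label: Optional[Any] = None, weight: Optional[Any] = None
    ) -> None:
        # Every parameter after the bare `*` must be passed by name.
        store.append({"data": data, "label": label, "weight": weight})

    return input_data


batches: List[Dict[str, Any]] = []
input_data = make_input_data(batches)
input_data(data=[[1.0, 2.0]], label=[0.0])  # accepted

rejected = False
try:
    input_data([[1.0, 2.0]], [0.0])  # positional call
except TypeError:
    rejected = True
print(rejected)  # True
```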
Jiaming Yuan
04aff3af8e
Define the new device parameter. (#9362) 2023-07-13 19:30:25 +08:00
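The `device` parameter is a string naming a device type and, optionally, an ordinal. A hypothetical parser into a (type, ordinal) pair is sketched below. The accepted spellings (`cpu`, `cuda`, `gpu`, optionally followed by `:<ordinal>`) are an assumption based on the XGBoost 2.x documentation, and returning `None` for a missing ordinal is a choice made for this sketch.

```python
from typing import NamedTuple, Optional


class Device(NamedTuple):
    kind: str               # "cpu" or "cuda"
    ordinal: Optional[int]  # None when no ordinal was given


def parse_device(value: str) -> Device:
    """Hypothetical parser for strings such as "cpu", "cuda", "cuda:1", "gpu:0"."""
    name, sep, ordinal = value.strip().lower().partition(":")
    if name == "gpu":
        name = "cuda"  # treat "gpu" as an alias in this sketch
    if name not in ("cpu", "cuda"):
        raise ValueError(f"unknown device: {value!r}")
    if not sep:
        return Device(name, None)
    if name == "cpu" or not ordinal.isdigit():
        raise ValueError(f"invalid device ordinal in {value!r}")
    return Device(name, int(ordinal))


print(parse_device("cuda:1"))  # Device(kind='cuda', ordinal=1)
```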
Jiaming Yuan
20c52f07d2
Support exporting cut values (#9356) 2023-07-08 15:32:41 +08:00
Jiaming Yuan
39390cc2ee
[breaking] Remove the predictor param, allow fallback to prediction using DMatrix. (#9129)
- A `DeviceOrd` struct is implemented to indicate the device. It will eventually replace the `gpu_id` parameter.
- The `predictor` parameter is removed.
- Fall back to `DMatrix` when `inplace_predict` is not available.
- The heuristic for choosing a predictor is only used during training.
2023-07-03 19:23:54 +08:00
Jiaming Yuan
ee6809e642
Use mmap for external memory. (#9282)
- Have basic infrastructure for mmap.
- Release file write handle.
2023-06-19 18:52:55 +08:00
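Memory-mapping a cache file lets the operating system page data in on demand instead of copying whole pages into buffers; the write handle is closed before the file is mapped for reading. A minimal standard-library sketch: the file layout and the `read_page` helper are hypothetical and unrelated to XGBoost's actual cache format.

```python
import mmap
import os
import tempfile


def read_page(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` from a cache file through a read-only memory map."""
    with open(path, "rb") as fh:
        # A mapping length of 0 maps the whole file.
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return bytes(mm[offset:offset + length])


# Write a small "cache" file and release the write handle before mapping it.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"page0page1page2")
page = read_page(path, 5, 5)
os.remove(path)
print(page)  # b'page1'
```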
Jiaming Yuan
3913ff470f
Import data lazily during tests. (#9176) 2023-05-23 03:58:31 +08:00
Jiaming Yuan
08ce495b5d
Use Booster context in DMatrix. (#8896)
- Pass context from booster to DMatrix.
- Use context instead of integer for `n_threads`.
- Check the consistency of the configuration for `max_bin`.
- Test for all combinations of initialization options.
2023-04-28 21:47:14 +08:00
Jiaming Yuan
1f9a57d17b
[Breaking] Require format to be specified in input URI. (#9077)
Previously, we used `libsvm` as the default when the format was not specified. However, the
dmlc data parser is not particularly robust against errors, and the most common type of error
is an undefined format.

Along with this change, we recommend that users use other data loaders instead. We will
continue to maintain the parsers, as they are currently used for many internal tests,
including federated learning.
2023-04-28 19:45:15 +08:00
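After this change, a text-file URI carries its format explicitly, as in `train.txt?format=libsvm` or `train.csv?format=csv`. A hypothetical helper that enforces the same rule before data reaches a loader is sketched below; it follows the query-string convention but is not XGBoost's parser, and the set of known formats is an assumption.

```python
from typing import Tuple
from urllib.parse import parse_qs

KNOWN_FORMATS = {"libsvm", "csv"}  # assumption: formats accepted by the text parser


def split_uri(uri: str) -> Tuple[str, str]:
    """Return (path, format) and fail loudly if no format is given."""
    path, _, query = uri.partition("?")
    fmt = parse_qs(query).get("format", [None])[0]
    if fmt is None:
        raise ValueError(f"{uri!r}: specify a format, e.g. '{path}?format=libsvm'")
    if fmt not in KNOWN_FORMATS:
        raise ValueError(f"{uri!r}: unknown format {fmt!r}")
    return path, fmt


print(split_uri("train.txt?format=libsvm"))  # ('train.txt', 'libsvm')

missing_rejected = False
try:
    split_uri("train.txt")  # no format: no longer silently treated as libsvm
except ValueError:
    missing_rejected = True
```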
Jiaming Yuan
e206b899ef
Rework MAP and Pairwise for LTR. (#9075) 2023-04-28 02:39:12 +08:00
Jiaming Yuan
151882dd26
Initial support for multi-target tree. (#8616)
* Implement multi-target for hist.

- Add new hist tree builder.
- Move data fetchers for tests.
- Dispatch function calls in gbm based on the tree type.
2023-03-22 23:49:56 +08:00
Jiaming Yuan
5891f752c8
Rework the MAP metric. (#8931)
- The new implementation is stricter, as only binary labels are accepted. The previous implementation converted values greater than 1 to 1.
- Deterministic GPU implementation (no atomic add).
- Fix top-k handling.
- Precise definition of MAP (there are other variants for handling top-k).
- Refactor GPU ranking tests.
2023-03-22 17:45:20 +08:00
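As a reference point for the stricter behavior, here is a hypothetical pure-Python sketch of MAP@k that rejects non-binary labels instead of clipping them. Since several top-k variants exist, note the choices made here: precision is averaged over the relevant documents found in the top k, and a query with no hit scores 0. XGBoost's exact definition may differ.

```python
from typing import Optional, Sequence, Tuple


def average_precision_at_k(
    scores: Sequence[float], labels: Sequence[int], k: Optional[int] = None
) -> float:
    """AP@k for a single query with binary relevance labels."""
    if any(label not in (0, 1) for label in labels):
        raise ValueError("MAP requires binary labels (0 or 1)")
    # Rank documents by descending score; the sort is stable for ties.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    if k is not None:
        ranked = ranked[:k]
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    queries: Sequence[Tuple[Sequence[float], Sequence[int]]], k: Optional[int] = None
) -> float:
    """Mean of AP@k over queries, each given as (scores, labels)."""
    if not queries:
        raise ValueError("no queries")
    return sum(average_precision_at_k(s, y, k) for s, y in queries) / len(queries)


ap = average_precision_at_k([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
map_score = mean_average_precision([([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]), ([0.2, 0.1], [0, 1])])
```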
Jiaming Yuan
6a892ce281
Specify src path for isort. (#8867) 2023-03-06 17:30:27 +08:00
Jiaming Yuan
8a16944664
Fix ranking with quantile dmatrix and group weight. (#8762) 2023-02-10 20:32:35 +08:00
Jiaming Yuan
cfa994d57f
Multi-target support for L1 error. (#8652)
- Add matrix support to the median function.
- Iterate through each target for quantile computation.
2023-01-11 05:51:14 +08:00
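For L1 error, the constant that minimizes the loss is the median of the residuals, so multi-target support amounts to computing one median per target column. A minimal standard-library sketch of that per-target loop; it ignores sample weights for simplicity and is not XGBoost's implementation. `per_target_median` is a hypothetical name.

```python
from statistics import median
from typing import List, Sequence


def per_target_median(
    labels: Sequence[Sequence[float]], predt: Sequence[Sequence[float]]
) -> List[float]:
    """Median residual (label - prediction) for each target column."""
    n_targets = len(labels[0])
    result = []
    for t in range(n_targets):  # one quantile computation per target
        residuals = [row_y[t] - row_p[t] for row_y, row_p in zip(labels, predt)]
        result.append(median(residuals))
    return result


labels = [[1.0, 10.0], [2.0, 20.0], [4.0, 40.0]]
predt = [[0.0, 0.0]] * 3
print(per_target_median(labels, predt))  # [2.0, 20.0]
```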
Jiaming Yuan
40343c8ee1
Test dask demos. (#8557)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2022-12-13 18:37:31 +08:00
Jiaming Yuan
d666ba775e
Support all pandas nullable integer types. (#8480)
- Enumerate all pandas integer types.
- Tests for `None`, `nan`, and `pd.NA`
2022-11-28 22:38:16 +08:00
Jiaming Yuan
f2209c1fe4
Don't shuffle columns in categorical tests. (#8446) 2022-11-28 20:28:06 +08:00
Jiaming Yuan
0d3da9869c
Require isort on all Python files. (#8420) 2022-11-08 12:59:06 +08:00
James Lamb
bf8de227a9
[CI] remove unused import in python tests (#8409) 2022-11-03 22:27:25 +08:00
Jiaming Yuan
a408c34558
Update JSON parser demo with categorical feature. (#8401)
- Parse categorical features in the Python example.
- Add tests.
- Update document.
2022-10-28 20:57:43 +08:00
Jiaming Yuan
cfd2a9f872
Extract dask and spark test into distributed test. (#8395)
- Move test files.
- Run spark and dask separately to prevent conflicts.
- Gather common code into the testing module.
2022-10-28 16:24:32 +08:00
Jiaming Yuan
cf70864fa3
Move Python testing utilities into xgboost module. (#8379)
- Add typehints.
- Fixes for pylint.

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-26 16:56:11 +08:00