455 Commits

Author SHA1 Message Date
Jiaming Yuan
8e2b874b4c
[doc] Add notes about RMM and device ordinal. [skip ci] (#10562)
- Remove the experimental tag, we have been running it for a long time now.
- Add notes about avoiding set CUDA device.
- Add link in parameter.
2024-07-10 13:00:57 +08:00
david-cortes
8d0f2bfbaa
[doc] Add more detailed explanations for advanced objectives (#10283)
---------

Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2024-07-08 19:17:31 +08:00
Jiaming Yuan
b4cc350ec5
Fix categorical data with external memory. (#10433) 2024-06-18 04:34:54 +08:00
Jiaming Yuan
c2e3d4f3cd
[dask] Update dask demo for using the new dask backend. (#10347) 2024-05-31 08:03:20 +08:00
Jiaming Yuan
e6eefea5e2
[coll] Move the rabit poll helper. (#10349) 2024-05-31 08:02:21 +08:00
Jiaming Yuan
a5a58102e5
Revamp the rabit implementation. (#10112)
This PR replaces the original RABIT implementation with a new one, which has already been partially merged into XGBoost. The new one features:
- Federated learning for both CPU and GPU.
- NCCL.
- More data types.
- A unified interface for all the underlying implementations.
- Improved timeout handling for both tracker and workers.
- Exhausted tests with metrics (fixed a couple of bugs along the way).
- A reusable tracker for Python and JVM packages.
2024-05-20 11:56:23 +08:00
Jiaming Yuan
73afef1a6e
Fixes for numpy 2.0. (#10252) 2024-05-07 03:54:32 +08:00
Jiaming Yuan
59d7b8dc72
[doc] Add typing to dask demos. (#10207) 2024-04-23 00:57:05 +08:00
Jiaming Yuan
4b10200456
[coll] Improve event loop. (#10199)
- Add a test for blocking calls.
- Do not require the queue to be empty after waking up; this frees up the thread to answer blocking calls.
- Handle EOF in read.
- Improve the error message in the result. Allow concatenation of multiple results.
2024-04-18 03:29:52 +08:00
Jiaming Yuan
54b71c8fba
Fix with black 24.1.1. (#10014) 2024-01-30 17:24:11 +08:00
Jiaming Yuan
d3f2dbe64f
[dask] Add seed to demos. (#10009) 2024-01-26 02:09:38 +08:00
Jiaming Yuan
d07e8b503e
Fix quantile regression demo. (#9991) 2024-01-17 13:19:08 +08:00
Jiaming Yuan
0798e36d73
[breaking] Remove deprecated parameters in the skl interface. (#9986) 2024-01-15 20:40:05 +08:00
Jiaming Yuan
b3eb5d0945
Use UBJ in Python checkpoint. (#9958) 2024-01-09 03:22:15 +08:00
Jiaming Yuan
9f73127a23
Cleanup Python GPU tests. (#9934)
* Cleanup Python GPU tests.

- Remove the use of `gpu_hist` and `gpu_id` in cudf/cupy tests.
- Move base margin test into the testing directory.
2024-01-04 13:15:18 +08:00
Jiaming Yuan
faf0f2df10
Support dataframe data format in native XGBoost. (#9828)
- Implement a columnar adapter.
- Refactor Python pandas handling code to avoid converting into a single numpy array.
- Add support in R for transforming columns.
- Support R data.frame and factor type.
2023-12-12 09:56:31 +08:00
david-cortes
2c0fc97306
Remove note about multi-quantile being python-only (#9854) 2023-12-07 05:17:15 +08:00
Jiaming Yuan
98238d63fa
[dask] Change document to avoid using default import. (#9742)
This aligns dask with pyspark, users need to explicitly call:

```
from xgboost.dask import DaskXGBClassifier
from xgboost import dask as dxgb
```

In future releases, we might stop using the default import and remove the lazy loader.
2023-11-07 02:44:39 +08:00
Jiaming Yuan
3ca06ac51e
[doc] Mention data consistency for categorical features. (#9678) 2023-10-24 10:11:33 +08:00
James Lamb
eb562d3829
[CI] address cmakelint warnings about whitespace (#9674) 2023-10-14 12:46:07 +08:00
James Lamb
db8d117f7e
[CI] standardize endif() calls in CMake scripts (#9637) 2023-10-08 11:45:20 +08:00
James Lamb
799f8485e2
[R] [CI] enforce lintr::function_left_parentheses_linter check (#9631) 2023-10-08 09:42:09 +08:00
Jiaming Yuan
c75a3bc0a9
[breaking] [jvm-packages] Remove rabit check point. (#9599)
- Add `numBoostedRound` to jvm packages
- Remove rabit checkpoint version.
- Change the starting version of training continuation in JVM [breaking].
- Redefine the checkpoint version policy in jvm package. [breaking]
- Rename the Python check point callback parameter. [breaking]
- Unifies the checkpoint policy between Python and JVM.
2023-09-26 18:06:34 +08:00
Rong Ou
a343ae3b34
fix dupliate gpu check (#9578) 2023-09-14 05:53:46 +08:00
Rong Ou
0f35493b65
Add GPU support to NVFlare demo (#9552) 2023-09-06 17:03:59 +08:00
Jiaming Yuan
972730cde0
Use matrix for gradient. (#9508)
- Use the `linalg::Matrix` for storing gradients.
- New API for the custom objective.
- Custom objective for multi-class/multi-target is now required to return the correct shape.
- Custom objective for Python can accept arrays with any strides. (row-major, column-major)
2023-08-24 05:29:52 +08:00
Jiaming Yuan
e6cf7a1278
Deprecate the command line interface. (#9485)
---------

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2023-08-21 06:47:48 +08:00
Sean Yang
12fe2fc06c
Fix federated learning demos and tests (#9488) 2023-08-16 15:25:05 +08:00
Jiaming Yuan
fd4335d0bf
[doc] Document the current status of some features. (#9469) 2023-08-13 23:42:27 +08:00
ShaneConneely
d638535581
Update README.md (#9462) 2023-08-11 04:02:04 +08:00
Jiaming Yuan
f05a23b41c
Use weakref instead of id for DataIter cache. (#9445)
- Fix case where Python reuses id from freed objects.
- Small optimization to column matrix with QDM by using `realloc` instead of copying data.
2023-08-10 00:40:06 +08:00
Jiaming Yuan
851cba931e
Define best_iteration only if early stopping is used. (#9403)
* Define `best_iteration` only if early stopping is used.

This is the behavior specified by the document but not honored in the actual code.

- Don't set the attributes if there's no early stopping.
- Clean up the code for callbacks, and replace assertions with proper exceptions.
- Assign the attributes when early stopping `save_best` is used.
- Turn the attributes into Python properties.

---------

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2023-07-24 12:43:35 +08:00
Jiaming Yuan
275da176ba
Document for device ordinal. (#9398)
- Rewrite GPU demos. notebook is converted to script to avoid committing additional png plots.
- Add GPU demos into the sphinx gallery.
- Add RMM demos into the sphinx gallery.
- Test for firing threads with different device ordinals.
2023-07-22 15:26:29 +08:00
Jiaming Yuan
0897477af0
Remove unmaintained jvm readme and dev scripts. (#9395) 2023-07-18 18:23:43 +08:00
Jiaming Yuan
16eb41936d
Handle the new device parameter in dask and demos. (#9386)
* Handle the new `device` parameter in dask and demos.

- Check no ordinal is specified in the dask interface.
- Update demos.
- Update dask doc.
- Update the condition for QDM.
2023-07-15 19:11:20 +08:00
Jiaming Yuan
cfa9c42eb4
Fix callback in AFT viz demo. (#9333)
* Fix callback in AFT viz demo.

- Update the callback function.
- Add lint check.
2023-06-26 22:35:02 +08:00
Jiaming Yuan
ee6809e642
Use mmap for external memory. (#9282)
- Have basic infrastructure for mmap.
- Release file write handle.
2023-06-19 18:52:55 +08:00
Jiaming Yuan
1fcc26a6f8
Set ndcg to default for LTR. (#8822)
- Add document.
- Add tests.
- Use `ndcg` with `topk` as default.
2023-06-09 23:31:33 +08:00
Jiaming Yuan
aba4559c4f
[doc] Update dask demo. (#9201) 2023-05-31 05:01:02 +08:00
Rong Ou
250b22dd22
Fix nvflare horizontal demo (#9124) 2023-05-05 16:48:22 +08:00
Jiaming Yuan
1f9a57d17b
[Breaking] Require format to be specified in input URI. (#9077)
Previously, we use `libsvm` as default when format is not specified. However, the dmlc
data parser is not particularly robust against errors, and the most common type of error
is undefined format.

Along with which, we will recommend users to use other data loader instead. We will
continue the maintenance of the parsers as it's currently used for many internal tests
including federated learning.
2023-04-28 19:45:15 +08:00
Rong Ou
fb941262b4
Add demo for vertical federated learning (#9103) 2023-04-28 16:03:21 +08:00
Jiaming Yuan
720a8c3273
[doc] Remove parameter type in Python doc strings. (#9005) 2023-04-01 04:04:30 +08:00
Jiaming Yuan
401ce5cf5e
Run linters with the multi output demo. (#8966) 2023-03-28 00:47:28 +08:00
Jiaming Yuan
acc110c251
[MT-TREE] Support prediction cache and model slicing. (#8968)
- Fix prediction range.
- Support prediction cache in mt-hist.
- Support model slicing.
- Make the booster a Python iterable by defining `__iter__`.
- Cleanup removed/deprecated parameters.
- A new field in the output model `iteration_indptr` for pointing to the ranges of trees for each iteration.
2023-03-27 23:10:54 +08:00
Jiaming Yuan
21a52c7f98
[doc] Add introduction and notes for the sklearn interface. (#8948) 2023-03-23 13:30:42 +08:00
Jiaming Yuan
151882dd26
Initial support for multi-target tree. (#8616)
* Implement multi-target for hist.

- Add new hist tree builder.
- Move data fetchers for tests.
- Dispatch function calls in gbm base on the tree type.
2023-03-22 23:49:56 +08:00
Jiaming Yuan
228a46e8ad
Support learning rate for zero-hessian objectives. (#8866) 2023-03-06 20:33:28 +08:00
Jiaming Yuan
6a892ce281
Specify src path for isort. (#8867) 2023-03-06 17:30:27 +08:00
Philip Hyunsu Cho
6d8afb2218
[CI] Require C++17 + CMake 3.18; Use CUDA 11.8 in CI (#8853)
* Update to C++17

* Turn off unity build

* Update CMake to 3.18

* Use MSVC 2022 + CUDA 11.8

* Re-create stack for worker images

* Allocate more disk space for Windows

* Tempiorarily disable clang-tidy

* RAPIDS now requires Python 3.10+

* Unpin cuda-python

* Use latest NCCL

* Use Ubuntu 20.04 in RMM image

* Mark failing mgpu test as xfail
2023-03-01 09:22:24 -08:00