Commit Graph

1080 Commits

Author SHA1 Message Date
Jiaming Yuan
16bca5d4a1 Support CPU input for device QuantileDMatrix. (#8136)
- Copy `GHistIndexMatrix` to `Ellpack` when needed.
2022-08-11 21:21:26 +08:00
Jiaming Yuan
36e7c5364d [dask] Deterministic rank assignment. (#8018) 2022-08-11 19:17:58 +08:00
Jiaming Yuan
d868126c39 [CI] Fix R build on Jenkins. (#8154) 2022-08-11 14:50:03 +08:00
Jiaming Yuan
570f8ae4ba Use black on more Python files. (#8137) 2022-08-11 01:38:11 +08:00
Jiaming Yuan
446d536c23 Fix loading DMatrix binary in distributed env. (#8149)
- Try to load DMatrix binary before trying to parse text input.
- Remove some unmaintained code.
2022-08-10 22:53:16 +08:00
Jiaming Yuan
8fc60b31bc Update PyPi wheel size limit. (#8150) 2022-08-10 18:49:57 +08:00
Jiaming Yuan
9ae547f994 Use config_context in sklearn interface. (#8141) 2022-08-09 14:48:54 +08:00
Bobby Wang
03cc3b359c [pyspark] support a list of feature column names (#8117) 2022-08-08 17:05:27 +08:00
Jiaming Yuan
bcc8679a05 Update CUDA docker image and NCCL. (#8139) 2022-08-07 16:32:41 +08:00
Jiaming Yuan
d87f69215e Quantile DMatrix for CPU. (#8130)
- Add a new `QuantileDMatrix` that works for both CPU and GPU.
- Deprecate `DeviceQuantileDMatrix`.
2022-08-02 15:51:23 +08:00
Jiaming Yuan
2cba1d9fcc Fix compatibility with latest cupy. (#8129)
* Fix compatibility with latest cupy.

* Freeze mypy.
2022-08-01 15:24:42 +08:00
Jiaming Yuan
2c70751d1e Implement iterative DMatrix for CPU. (#8116) 2022-07-26 22:34:21 +08:00
Jiaming Yuan
546de5efd2 [pyspark] Cleanup data processing. (#8088)
- Use numpy stack for handling list of arrays.
- Reuse concat function from dask.
- Prepare for `QuantileDMatrix`.
- Remove unused code.
- Use iterator for prediction to avoid initializing xgboost model
2022-07-26 15:00:52 +08:00
Jiaming Yuan
3970e4e6bb Move pylint helper from dmlc-core. (#8101)
* Move pylint helper from dmlc-core.

- Move the helper into the XGBoost ci_build.
- Run it with multiprocessing.

* Fix original test.
2022-07-23 08:12:37 +08:00
Jiaming Yuan
7785d65c8a Fix feature weights with multiple column sampling. (#8100) 2022-07-22 20:23:05 +08:00
Jiaming Yuan
4a4e5c7c18 Prepare gradient index for Quantile DMatrix. (#8103)
* Prepare gradient index for Quantile DMatrix.

- Implement push batch with adapter batch.
- Implement `GetFvalue` for prediction.
2022-07-22 17:26:33 +08:00
Rory Mitchell
1be09848a7 Refactor split valuation kernel (#8073) 2022-07-21 15:41:50 +02:00
Jiaming Yuan
ef11b024e8 Cleanup data generator. (#8094)
- Avoid duplicated definition of data shape.
- Explicitly define numpy iterator for CPU data.
2022-07-20 13:48:52 +08:00
Jiaming Yuan
8bdea72688 [Python] Require black and isort for new Python files. (#8096)
* [Python] Require black and isort for new Python files.

- Require black and isort for spark and dask module.

These files are relatively new and are more conform to the black formatter. We will
convert the rest of the library as we move forward.

Other libraries including dask/distributed and optuna use the same formatting style and
have a more strict standard. The black formatter is indeed quite nice, automating it can
help us unify the code style.

- Gather Python checks into a single script.
2022-07-20 10:25:24 +08:00
Bobby Wang
f801d3cf15 [PySpark] change the returning model type to string from binary (#8085)
* [PySpark] change the returning model type to string from binary

XGBoost pyspark can be can be accelerated by RAPIDS Accelerator seamlessly by
changing the returning model type from binary to string.
2022-07-19 18:39:20 +08:00
Jiaming Yuan
2365f82750 [dask] Mitigate non-deterministic test. (#8077) 2022-07-19 16:55:59 +08:00
Jiaming Yuan
4083440690 Small cleanups to various data types. (#8086)
- Use `bst_bin_t` in batch param constructor.
- Use `StringView` to avoid `std::string` when appropriate.
- Avoid using `MetaInfo` in quantile constructor to limit the scope of parameter.
2022-07-18 22:39:36 +08:00
Jiaming Yuan
0ce80b7bcf Mitigate flaky GPU test. (#8078)
The flakiness is caused by the global random engine, which will take some time to fix.
2022-07-16 13:45:32 +08:00
Jiaming Yuan
7a5586f3db Fix GPU quantile distributed test. (#8076) 2022-07-16 11:40:53 +08:00
Jiaming Yuan
647d3844dd Make test for categorical data deterministic. (#8080) 2022-07-15 14:48:39 +08:00
Jiaming Yuan
dae7a41baa Update Python requirement to >=3.8. (#8071)
Additional changes:
- Use mamba for CPU test on Jenkins.
- Cleanup CPU test dependencies.
- Restore some of the modin tests
2022-07-14 18:01:47 +08:00
Jiaming Yuan
abaa593aa0 Fix compiler warnings. (#8059)
- Remove unused parameters.
- Avoid comparison of different signedness.
2022-07-14 05:29:56 +08:00
WeichenXu
176fec8789 PySpark XGBoost integration (#8020)
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2022-07-13 13:11:18 +08:00
Jiaming Yuan
8959622836 [dask] Use an invalid port for test. (#8064) 2022-07-13 11:59:02 +08:00
Rory Mitchell
794cbaa60a Fuse split evaluation kernels (#8026) 2022-07-05 10:24:31 +02:00
Jiaming Yuan
8746f9cddf Rename IterativeDMatrix. (#8045) 2022-07-04 18:52:31 +08:00
Rory Mitchell
bc4f802b17 Batch UpdatePosition using cudaMemcpy (#7964) 2022-06-30 17:52:40 +02:00
Jiaming Yuan
f0c1b842bf Implement sketching with adapter. (#8019) 2022-06-23 00:03:02 +08:00
Jiaming Yuan
142a208a90 Fix compiler warnings. (#8022)
- Remove/fix unused parameters
- Remove deprecated code in rabit.
- Update dmlc-core.
2022-06-22 21:29:10 +08:00
Bobby Wang
e44a082620 [jvm-packages] update nccl version to 2.12.12-1 (#8015) 2022-06-21 17:34:09 +08:00
Jiaming Yuan
4a87ea49b8 Reduce regularization for CPU gblinear. (#8013) 2022-06-21 01:05:27 +08:00
Jiaming Yuan
d285d6ba2a Reduce regularization in GPU gblinear test. (#8010) 2022-06-20 23:55:12 +08:00
Jiaming Yuan
9b0eb66b78 Fix GPU driver test. (#8008)
* Initialize the training parameter.
2022-06-20 19:37:31 +08:00
Jiaming Yuan
637e42a0c0 Use 22.04 for RMM. (#8001)
22.06 is not released yet.
2022-06-17 04:07:31 +08:00
Jiaming Yuan
8f8bd8147a Fix LTR with weighted Quantile DMatrix. (#7975)
* Fix LTR with weighted Quantile DMatrix.

* Better tests.
2022-06-09 01:33:41 +08:00
Jiaming Yuan
1a33b50a0d Fix compiler warnings. (#7974)
- Remove unused parameters. There are still many warnings that are not yet
addressed. Currently, the warnings in dmlc-core dominate the error log.
- Remove `distributed` parameter from metric.
- Fixes some warnings about signed comparison.
2022-06-06 22:56:25 +08:00
Jiaming Yuan
d48123d23b Fix rmm build (#7973)
- Optionally switch to c++17
- Use rmm CMake target.
- Workaround compiler errors.
- Fix GPUMetric inheritance.
- Run death tests even if it's built with RMM support.

Co-authored-by: jakirkham <jakirkham@gmail.com>
2022-06-06 20:18:32 +08:00
Jiaming Yuan
b90c6d25e8 Implement max_cat_threshold for CPU. (#7957) 2022-06-04 11:02:46 +08:00
Jiaming Yuan
13b15e07e8 Handle formatted JSON input. (#7953) 2022-06-01 16:20:58 +08:00
Rong Ou
80339c3427 Enable distributed GPU training over Rabit (#7930) 2022-05-31 04:09:45 +08:00
Philip Hyunsu Cho
47224dd6d3 Use private mirror to host llvm-openmp tarballs (#7950) 2022-05-27 14:56:59 -07:00
Jiaming Yuan
bde4f25794 Handle missing categorical value in CPU evaluator. (#7948) 2022-05-27 14:15:47 +08:00
Philip Hyunsu Cho
2070afea02 [CI] Rotate package repository keys (#7943) 2022-05-26 17:06:46 -07:00
Jiaming Yuan
18cbebaeb9 Unify the cat split storage for CPU. (#7937)
* Unify the cat split storage for CPU.

* Cleanup.

* Workaround.
2022-05-26 04:14:40 -07:00
Jiaming Yuan
606be9e663 Handle missing values in one hot splits. (#7934) 2022-05-24 20:48:41 +08:00