Jiaming Yuan
d868126c39
[CI] Fix R build on Jenkins. ( #8154 )
2022-08-11 14:50:03 +08:00
Jiaming Yuan
570f8ae4ba
Use black on more Python files. ( #8137 )
2022-08-11 01:38:11 +08:00
Jiaming Yuan
446d536c23
Fix loading DMatrix binary in distributed env. ( #8149 )
...
- Try to load DMatrix binary before trying to parse text input.
- Remove some unmaintained code.
2022-08-10 22:53:16 +08:00
Jiaming Yuan
8fc60b31bc
Update PyPi wheel size limit. ( #8150 )
2022-08-10 18:49:57 +08:00
Jiaming Yuan
9ae547f994
Use config_context in sklearn interface. ( #8141 )
2022-08-09 14:48:54 +08:00
Bobby Wang
03cc3b359c
[pyspark] support a list of feature column names ( #8117 )
2022-08-08 17:05:27 +08:00
Jiaming Yuan
bcc8679a05
Update CUDA docker image and NCCL. ( #8139 )
2022-08-07 16:32:41 +08:00
Jiaming Yuan
d87f69215e
Quantile DMatrix for CPU. ( #8130 )
...
- Add a new `QuantileDMatrix` that works for both CPU and GPU.
- Deprecate `DeviceQuantileDMatrix`.
2022-08-02 15:51:23 +08:00
Jiaming Yuan
2cba1d9fcc
Fix compatibility with latest cupy. ( #8129 )
...
* Fix compatibility with latest cupy.
* Freeze mypy.
2022-08-01 15:24:42 +08:00
Jiaming Yuan
2c70751d1e
Implement iterative DMatrix for CPU. ( #8116 )
2022-07-26 22:34:21 +08:00
Jiaming Yuan
546de5efd2
[pyspark] Cleanup data processing. ( #8088 )
...
- Use numpy stack for handling list of arrays.
- Reuse concat function from dask.
- Prepare for `QuantileDMatrix`.
- Remove unused code.
- Use iterator for prediction to avoid initializing xgboost model.
2022-07-26 15:00:52 +08:00
Jiaming Yuan
3970e4e6bb
Move pylint helper from dmlc-core. ( #8101 )
...
* Move pylint helper from dmlc-core.
- Move the helper into the XGBoost ci_build.
- Run it with multiprocessing.
* Fix original test.
2022-07-23 08:12:37 +08:00
Jiaming Yuan
7785d65c8a
Fix feature weights with multiple column sampling. ( #8100 )
2022-07-22 20:23:05 +08:00
Jiaming Yuan
4a4e5c7c18
Prepare gradient index for Quantile DMatrix. ( #8103 )
...
* Prepare gradient index for Quantile DMatrix.
- Implement push batch with adapter batch.
- Implement `GetFvalue` for prediction.
2022-07-22 17:26:33 +08:00
Rory Mitchell
1be09848a7
Refactor split valuation kernel ( #8073 )
2022-07-21 15:41:50 +02:00
Jiaming Yuan
ef11b024e8
Cleanup data generator. ( #8094 )
...
- Avoid duplicated definition of data shape.
- Explicitly define numpy iterator for CPU data.
2022-07-20 13:48:52 +08:00
Jiaming Yuan
8bdea72688
[Python] Require black and isort for new Python files. ( #8096 )
...
* [Python] Require black and isort for new Python files.
- Require black and isort for spark and dask module.
These files are relatively new and conform more closely to the black formatter. We will
convert the rest of the library as we move forward.
Other libraries, including dask/distributed and optuna, use the same formatting style and
have a stricter standard. The black formatter is indeed quite nice; automating it can
help us unify the code style.
- Gather Python checks into a single script.
2022-07-20 10:25:24 +08:00
Bobby Wang
f801d3cf15
[PySpark] change the returning model type to string from binary ( #8085 )
...
* [PySpark] change the returning model type to string from binary
XGBoost pyspark can be accelerated by RAPIDS Accelerator seamlessly by
changing the returning model type from binary to string.
2022-07-19 18:39:20 +08:00
Jiaming Yuan
2365f82750
[dask] Mitigate non-deterministic test. ( #8077 )
2022-07-19 16:55:59 +08:00
Jiaming Yuan
4083440690
Small cleanups to various data types. ( #8086 )
...
- Use `bst_bin_t` in batch param constructor.
- Use `StringView` to avoid `std::string` when appropriate.
- Avoid using `MetaInfo` in quantile constructor to limit the scope of parameter.
2022-07-18 22:39:36 +08:00
Jiaming Yuan
0ce80b7bcf
Mitigate flaky GPU test. ( #8078 )
...
The flakiness is caused by the global random engine, which will take some time to fix.
2022-07-16 13:45:32 +08:00
Jiaming Yuan
7a5586f3db
Fix GPU quantile distributed test. ( #8076 )
2022-07-16 11:40:53 +08:00
Jiaming Yuan
647d3844dd
Make test for categorical data deterministic. ( #8080 )
2022-07-15 14:48:39 +08:00
Jiaming Yuan
dae7a41baa
Update Python requirement to >=3.8. ( #8071 )
...
Additional changes:
- Use mamba for CPU test on Jenkins.
- Cleanup CPU test dependencies.
- Restore some of the modin tests.
2022-07-14 18:01:47 +08:00
Jiaming Yuan
abaa593aa0
Fix compiler warnings. ( #8059 )
...
- Remove unused parameters.
- Avoid comparison of different signedness.
2022-07-14 05:29:56 +08:00
WeichenXu
176fec8789
PySpark XGBoost integration ( #8020 )
...
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2022-07-13 13:11:18 +08:00
Jiaming Yuan
8959622836
[dask] Use an invalid port for test. ( #8064 )
2022-07-13 11:59:02 +08:00
Rory Mitchell
794cbaa60a
Fuse split evaluation kernels ( #8026 )
2022-07-05 10:24:31 +02:00
Jiaming Yuan
8746f9cddf
Rename IterativeDMatrix. ( #8045 )
2022-07-04 18:52:31 +08:00
Rory Mitchell
bc4f802b17
Batch UpdatePosition using cudaMemcpy ( #7964 )
2022-06-30 17:52:40 +02:00
Jiaming Yuan
f0c1b842bf
Implement sketching with adapter. ( #8019 )
2022-06-23 00:03:02 +08:00
Jiaming Yuan
142a208a90
Fix compiler warnings. ( #8022 )
...
- Remove/fix unused parameters
- Remove deprecated code in rabit.
- Update dmlc-core.
2022-06-22 21:29:10 +08:00
Bobby Wang
e44a082620
[jvm-packages] update nccl version to 2.12.12-1 ( #8015 )
2022-06-21 17:34:09 +08:00
Jiaming Yuan
4a87ea49b8
Reduce regularization for CPU gblinear. ( #8013 )
2022-06-21 01:05:27 +08:00
Jiaming Yuan
d285d6ba2a
Reduce regularization in GPU gblinear test. ( #8010 )
2022-06-20 23:55:12 +08:00
Jiaming Yuan
9b0eb66b78
Fix GPU driver test. ( #8008 )
...
* Initialize the training parameter.
2022-06-20 19:37:31 +08:00
Jiaming Yuan
637e42a0c0
Use 22.04 for RMM. ( #8001 )
...
22.06 is not released yet.
2022-06-17 04:07:31 +08:00
Jiaming Yuan
8f8bd8147a
Fix LTR with weighted Quantile DMatrix. ( #7975 )
...
* Fix LTR with weighted Quantile DMatrix.
* Better tests.
2022-06-09 01:33:41 +08:00
Jiaming Yuan
1a33b50a0d
Fix compiler warnings. ( #7974 )
...
- Remove unused parameters. There are still many warnings that are not yet
addressed. Currently, the warnings in dmlc-core dominate the error log.
- Remove `distributed` parameter from metric.
- Fix some warnings about signed comparison.
2022-06-06 22:56:25 +08:00
Jiaming Yuan
d48123d23b
Fix rmm build ( #7973 )
...
- Optionally switch to C++17.
- Use rmm CMake target.
- Work around compiler errors.
- Fix GPUMetric inheritance.
- Run death tests even if it's built with RMM support.
Co-authored-by: jakirkham <jakirkham@gmail.com>
2022-06-06 20:18:32 +08:00
Jiaming Yuan
b90c6d25e8
Implement max_cat_threshold for CPU. ( #7957 )
2022-06-04 11:02:46 +08:00
Jiaming Yuan
13b15e07e8
Handle formatted JSON input. ( #7953 )
2022-06-01 16:20:58 +08:00
Rong Ou
80339c3427
Enable distributed GPU training over Rabit ( #7930 )
2022-05-31 04:09:45 +08:00
Philip Hyunsu Cho
47224dd6d3
Use private mirror to host llvm-openmp tarballs ( #7950 )
2022-05-27 14:56:59 -07:00
Jiaming Yuan
bde4f25794
Handle missing categorical value in CPU evaluator. ( #7948 )
2022-05-27 14:15:47 +08:00
Philip Hyunsu Cho
2070afea02
[CI] Rotate package repository keys ( #7943 )
2022-05-26 17:06:46 -07:00
Jiaming Yuan
18cbebaeb9
Unify the cat split storage for CPU. ( #7937 )
...
* Unify the cat split storage for CPU.
* Cleanup.
* Workaround.
2022-05-26 04:14:40 -07:00
Jiaming Yuan
606be9e663
Handle missing values in one hot splits. ( #7934 )
2022-05-24 20:48:41 +08:00
Jiaming Yuan
18a38f7ca0
Refactor for GHistIndex. ( #7923 )
...
* Pass sparse page as adapter, which prepares for quantile dmatrix.
* Remove old external memory code like `rbegin` and extra `Init` function.
* Simplify type dispatch.
2022-05-23 23:04:53 +08:00
Jiaming Yuan
474366c020
Add convergence test for sparse datasets. ( #7922 )
2022-05-23 18:07:26 +08:00