Jiaming Yuan
16bca5d4a1
Support CPU input for device QuantileDMatrix. ( #8136 )
...
- Copy `GHistIndexMatrix` to `Ellpack` when needed.
2022-08-11 21:21:26 +08:00
Jiaming Yuan
36e7c5364d
[dask] Deterministic rank assignment. ( #8018 )
2022-08-11 19:17:58 +08:00
Ravi Makhija
20d1bba1bb
Simplify Python getting started example ( #8153 )
...
Load data set via `sklearn` rather than a local file path.
2022-08-11 16:42:09 +08:00
Jiaming Yuan
d868126c39
[CI] Fix R build on Jenkins. ( #8154 )
2022-08-11 14:50:03 +08:00
Jiaming Yuan
570f8ae4ba
Use black on more Python files. ( #8137 )
2022-08-11 01:38:11 +08:00
Jiaming Yuan
bdb291f1c2
[doc] Clarification for feature importance. ( #8151 )
2022-08-11 00:30:42 +08:00
Jiaming Yuan
446d536c23
Fix loading DMatrix binary in distributed env. ( #8149 )
...
- Try to load DMatrix binary before trying to parse text input.
- Remove some unmaintained code.
2022-08-10 22:53:16 +08:00
Jiaming Yuan
8fc60b31bc
Update PyPi wheel size limit. ( #8150 )
2022-08-10 18:49:57 +08:00
Jiaming Yuan
9ae547f994
Use config_context in sklearn interface. ( #8141 )
2022-08-09 14:48:54 +08:00
Bobby Wang
03cc3b359c
[pyspark] support a list of feature column names ( #8117 )
2022-08-08 17:05:27 +08:00
Jiaming Yuan
bcc8679a05
Update CUDA docker image and NCCL. ( #8139 )
2022-08-07 16:32:41 +08:00
Praateek Mahajan
ff471b3fab
In PySpark Estimator example use the model with validation_indicator ( #8131 )
...
* use the validation_indicator model
* use the validation_indicator model for regression
2022-08-03 13:57:41 +08:00
Jiaming Yuan
d87f69215e
Quantile DMatrix for CPU. ( #8130 )
...
- Add a new `QuantileDMatrix` that works for both CPU and GPU.
- Deprecate `DeviceQuantileDMatrix`.
2022-08-02 15:51:23 +08:00
Jiaming Yuan
2cba1d9fcc
Fix compatibility with latest cupy. ( #8129 )
...
* Fix compatibility with latest cupy.
* Freeze mypy.
2022-08-01 15:24:42 +08:00
Philip Hyunsu Cho
24c2373080
[Doc] Indicate lack of py-xgboost-gpu on Windows ( #8127 )
2022-07-28 12:57:16 -07:00
Jiaming Yuan
2c70751d1e
Implement iterative DMatrix for CPU. ( #8116 )
2022-07-26 22:34:21 +08:00
Jiaming Yuan
546de5efd2
[pyspark] Cleanup data processing. ( #8088 )
...
- Use numpy stack for handling list of arrays.
- Reuse concat function from dask.
- Prepare for `QuantileDMatrix`.
- Remove unused code.
- Use iterator for prediction to avoid initializing xgboost model
2022-07-26 15:00:52 +08:00
Jiaming Yuan
3970e4e6bb
Move pylint helper from dmlc-core. ( #8101 )
...
* Move pylint helper from dmlc-core.
- Move the helper into the XGBoost ci_build.
- Run it with multiprocessing.
* Fix original test.
2022-07-23 08:12:37 +08:00
Jiaming Yuan
7785d65c8a
Fix feature weights with multiple column sampling. ( #8100 )
2022-07-22 20:23:05 +08:00
Jiaming Yuan
4a4e5c7c18
Prepare gradient index for Quantile DMatrix. ( #8103 )
...
* Prepare gradient index for Quantile DMatrix.
- Implement push batch with adapter batch.
- Implement `GetFvalue` for prediction.
2022-07-22 17:26:33 +08:00
Rory Mitchell
1be09848a7
Refactor split valuation kernel ( #8073 )
2022-07-21 15:41:50 +02:00
Tim Gates
cb40bbdadd
docs: fix simple typo, cannonical -> canonical ( #8099 )
...
There is a small typo in src/common/partition_builder.h.
Should read `canonical` rather than `cannonical`.
Signed-off-by: Tim Gates <tim.gates@iress.com>
2022-07-20 21:04:50 +08:00
QuellaZhang
703261e78f
[MSVC][std:c++latest] Fix compiler error ( #8093 )
...
Co-authored-by: QuellaZhang <zhangyi2090@163.com>
2022-07-20 15:15:39 +08:00
Jiaming Yuan
ef11b024e8
Cleanup data generator. ( #8094 )
...
- Avoid duplicated definition of data shape.
- Explicitly define numpy iterator for CPU data.
2022-07-20 13:48:52 +08:00
Jiaming Yuan
5156be0f49
Limit max_depth to 30 for GPU. ( #8098 )
2022-07-20 12:28:49 +08:00
Jiaming Yuan
8bdea72688
[Python] Require black and isort for new Python files. ( #8096 )
...
* [Python] Require black and isort for new Python files.
- Require black and isort for spark and dask module.
These files are relatively new and are more conform to the black formatter. We will
convert the rest of the library as we move forward.
Other libraries including dask/distributed and optuna use the same formatting style and
have a more strict standard. The black formatter is indeed quite nice, automating it can
help us unify the code style.
- Gather Python checks into a single script.
2022-07-20 10:25:24 +08:00
WeichenXu
f23cc92130
[pyspark] User guide doc and tutorials ( #8082 )
...
Co-authored-by: Bobby Wang <wbo4958@gmail.com>
2022-07-19 22:25:14 +08:00
Bobby Wang
f801d3cf15
[PySpark] change the returning model type to string from binary ( #8085 )
...
* [PySpark] change the returning model type to string from binary
XGBoost pyspark can be can be accelerated by RAPIDS Accelerator seamlessly by
changing the returning model type from binary to string.
2022-07-19 18:39:20 +08:00
Jiaming Yuan
2365f82750
[dask] Mitigate non-deterministic test. ( #8077 )
2022-07-19 16:55:59 +08:00
Rong Ou
7a6b711eb8
Remove unused updater basemaker ( #8091 )
2022-07-19 15:41:27 +08:00
Philip Hyunsu Cho
4325178822
[CI] Clear workspace after budget check ( #8092 )
...
* [CI] Clear workspace after budget check
* Windows too
2022-07-18 19:17:33 -07:00
Jiaming Yuan
4083440690
Small cleanups to various data types. ( #8086 )
...
- Use `bst_bin_t` in batch param constructor.
- Use `StringView` to avoid `std::string` when appropriate.
- Avoid using `MetaInfo` in quantile constructor to limit the scope of parameter.
2022-07-18 22:39:36 +08:00
Jiaming Yuan
e28f6f6657
[doc] Integrate pyspark module into sphinx doc [skip ci] ( #8066 )
2022-07-17 10:46:09 +08:00
Rafail Giavrimis
579ab23b10
Check cudf lazily ( #8084 )
2022-07-17 09:27:43 +08:00
Bobby Wang
a33f35eecf
[PySpark] add gpu support for spark local mode ( #8068 )
2022-07-17 07:59:06 +08:00
Bobby Wang
91bb9e2cb3
[PySpark] fix raw_prediction_col parameter and minor cleanup ( #8067 )
2022-07-16 17:58:57 +08:00
Jiaming Yuan
0ce80b7bcf
Mitigate flaky GPU test. ( #8078 )
...
The flakiness is caused by the global random engine, which will take some time to fix.
2022-07-16 13:45:32 +08:00
Jiaming Yuan
7a5586f3db
Fix GPU quantile distributed test. ( #8076 )
2022-07-16 11:40:53 +08:00
Jiaming Yuan
8fccc3c4ad
[dask] Fix potential error in demo. ( #8079 )
...
* Use dask_cudf instead.
2022-07-15 18:42:29 +08:00
Jiaming Yuan
647d3844dd
Make test for categorical data deterministic. ( #8080 )
2022-07-15 14:48:39 +08:00
Jiaming Yuan
dae7a41baa
Update Python requirement to >=3.8. ( #8071 )
...
Additional changes:
- Use mamba for CPU test on Jenkins.
- Cleanup CPU test dependencies.
- Restore some of the modin tests
2022-07-14 18:01:47 +08:00
Jiaming Yuan
8dd96013f1
Split up column matrix initialization. ( #8060 )
...
* Split up column matrix initialization.
This PR splits the column matrix initialization into 2 steps, the first one initializes
the storage while the second one does the transpose. By doing so, we can reuse the code
for Quantile DMatrix.
2022-07-14 10:34:47 +08:00
Philip Hyunsu Cho
36cf979b82
[CI] Fix S3 uploads ( #8069 )
...
* [CI] Fix S3 upload issues
* Don't launch Docker containers when uploading to S3
2022-07-13 16:23:00 -07:00
Jiaming Yuan
abaa593aa0
Fix compiler warnings. ( #8059 )
...
- Remove unused parameters.
- Avoid comparison of different signedness.
2022-07-14 05:29:56 +08:00
Jiaming Yuan
937352c78f
Fix R package Windows build. ( #8065 )
2022-07-14 05:27:38 +08:00
WeichenXu
176fec8789
PySpark XGBoost integration ( #8020 )
...
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>
2022-07-13 13:11:18 +08:00
Jiaming Yuan
8959622836
[dask] Use an invalid port for test. ( #8064 )
2022-07-13 11:59:02 +08:00
Rory Mitchell
0bdaca25ca
Use single precision in gain calculation, use pointers instead of span. ( #8051 )
2022-07-12 21:56:27 +02:00
Jiaming Yuan
a5bc8e2c6a
Fix mypy error with the latest dask. ( #8052 )
...
* Fix mypy error with latest dask.
Dask is adding type hints to its codebase and as the result, checks in XGBoost can be
performed more rigorously.
- Remove compatibility with old dask version where multi lock was missing.
- Restrict input of `X` to be non-series.
- Adopt latest definition of `Delayed`.
- Avoid passing optional `host_ip`.
- Avoid deprecated `worker.nthreads`.
2022-07-09 08:02:42 +08:00
Jiaming Yuan
210eb471e9
[R] Implement feature info for DMatrix. ( #8048 )
2022-07-09 05:57:39 +08:00