Commit Graph

5916 Commits

Author SHA1 Message Date
Jiaming Yuan
441ffc017a Copy data from Ellpack to GHist. (#8215) 2022-09-06 23:05:49 +08:00
Bobby Wang
7ee10e3dbd [pyspark] Cleanup the comments (#8217) 2022-09-05 16:20:12 +08:00
Jiaming Yuan
ada4a86d1c Fix dask interface with latest cupy. (#8210) 2022-09-03 03:10:43 +08:00
Dmitry Razdoburdin
deae99e662 Optimization/buildhist/hist util (#8218)
* BuildHistKernel optimization

Co-authored-by: dmitry.razdoburdin <drazdobu@jfldaal005.jf.intel.com>
2022-09-02 19:39:45 +08:00
Rong Ou
b78bc734d9 Fix dask.py lint error (#8216) 2022-09-02 16:30:01 +08:00
Philip Hyunsu Cho
56395d120b Work around MSVC behavior wrt constexpr capture (#8211)
* Work around MSVC behavior wrt constexpr capture

* Fix lint
2022-08-31 11:42:08 -08:00
CW
a868498c18 [doc] Update prediction.rst (#8214) 2022-08-31 21:00:12 +08:00
Jiaming Yuan
8dac90a593 Mark parameter validation non-experimental. (#8206) 2022-08-30 15:49:43 +08:00
Rong Ou
d6e2013c5f Set max message size in insecure gRPC (#8203) 2022-08-26 16:33:51 +08:00
WeichenXu
651f0a8889 [pyspark] Fixing xgboost.spark python doc (#8200)
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2022-08-25 14:41:48 +08:00
WeichenXu
d03794ce7a [pyspark] Add param validation for "objective" and "eval_metric" param, and remove invalid booster params (#8173)
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2022-08-24 15:29:43 +08:00
Jiaming Yuan
9b32e6e2dc Fix release script. (#8187) (#8195) 2022-08-23 15:08:30 +08:00
WeichenXu
f4628c22a4 [pyspark] Implement SparkXGBRanker estimator (#8172)
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2022-08-23 02:35:19 +08:00
Philip Hyunsu Cho
35ef8abc27 [CI] Prune unused archs from libnccl (#8179)
* [CI] Prune unused archs from libnccl

* Put pruning logic in CI directory

* Don't use --color in grep
2022-08-21 00:46:16 -08:00
Rong Ou
ad3bc0edee Allow insecure gRPC connections for federated learning (#8181)
* Allow insecure gRPC connections for federated learning

* format
2022-08-19 12:16:14 +08:00
WeichenXu
53d2a733b0 [pyspark] Make Xgboost estimator support using sparse matrix as optimization (#8145)
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2022-08-19 01:57:28 +08:00
Rory Mitchell
1703dc330f Optimise histogram kernels (#8118) 2022-08-18 14:07:26 +02:00
Gavin Zhang
40a10c217d Use make on i system (#8178)
Co-authored-by: GavinZhang <zhanggan@cn.ibm.com>
2022-08-18 12:55:32 +08:00
dependabot[bot]
93966b0d19 Bump hadoop-common from 3.2.3 to 3.2.4 in /jvm-packages/xgboost4j-flink (#8157)
Bumps hadoop-common from 3.2.3 to 3.2.4.

---
updated-dependencies:
- dependency-name: org.apache.hadoop:hadoop-common
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-15 06:47:27 -08:00
Andy Kattine
a9458fd844 Grammar Fix in Introduction to Boosted Trees (#8166)
Added "of" to "objective functions is that they consist of two parts" in line 32 of ./doc/tutorials/model.rst
2022-08-15 15:19:47 +08:00
Ravi Makhija
fa869eebd9 Edit grammar in custom metric tutorial (#8163) 2022-08-13 01:02:25 +08:00
Rory Mitchell
f421c26d35 Tune cuda architectures (#8152) 2022-08-11 13:36:47 -07:00
Jiaming Yuan
16bca5d4a1 Support CPU input for device QuantileDMatrix. (#8136)
- Copy `GHistIndexMatrix` to `Ellpack` when needed.
2022-08-11 21:21:26 +08:00
Jiaming Yuan
36e7c5364d [dask] Deterministic rank assignment. (#8018) 2022-08-11 19:17:58 +08:00
Ravi Makhija
20d1bba1bb Simplify Python getting started example (#8153)
Load data set via `sklearn` rather than a local file path.
2022-08-11 16:42:09 +08:00
Jiaming Yuan
d868126c39 [CI] Fix R build on Jenkins. (#8154) 2022-08-11 14:50:03 +08:00
Jiaming Yuan
570f8ae4ba Use black on more Python files. (#8137) 2022-08-11 01:38:11 +08:00
Jiaming Yuan
bdb291f1c2 [doc] Clarification for feature importance. (#8151) 2022-08-11 00:30:42 +08:00
Jiaming Yuan
446d536c23 Fix loading DMatrix binary in distributed env. (#8149)
- Try to load DMatrix binary before trying to parse text input.
- Remove some unmaintained code.
2022-08-10 22:53:16 +08:00
Jiaming Yuan
8fc60b31bc Update PyPi wheel size limit. (#8150) 2022-08-10 18:49:57 +08:00
Jiaming Yuan
9ae547f994 Use config_context in sklearn interface. (#8141) 2022-08-09 14:48:54 +08:00
Bobby Wang
03cc3b359c [pyspark] support a list of feature column names (#8117) 2022-08-08 17:05:27 +08:00
Jiaming Yuan
bcc8679a05 Update CUDA docker image and NCCL. (#8139) 2022-08-07 16:32:41 +08:00
Praateek Mahajan
ff471b3fab In PySpark Estimator example use the model with validation_indicator (#8131)
* use the validation_indicator model

* use the validation_indicator model for regression
2022-08-03 13:57:41 +08:00
Jiaming Yuan
d87f69215e Quantile DMatrix for CPU. (#8130)
- Add a new `QuantileDMatrix` that works for both CPU and GPU.
- Deprecate `DeviceQuantileDMatrix`.
2022-08-02 15:51:23 +08:00
Jiaming Yuan
2cba1d9fcc Fix compatibility with latest cupy. (#8129)
* Fix compatibility with latest cupy.

* Freeze mypy.
2022-08-01 15:24:42 +08:00
Philip Hyunsu Cho
24c2373080 [Doc] Indicate lack of py-xgboost-gpu on Windows (#8127) 2022-07-28 12:57:16 -07:00
Jiaming Yuan
2c70751d1e Implement iterative DMatrix for CPU. (#8116) 2022-07-26 22:34:21 +08:00
Jiaming Yuan
546de5efd2 [pyspark] Cleanup data processing. (#8088)
- Use numpy stack for handling list of arrays.
- Reuse concat function from dask.
- Prepare for `QuantileDMatrix`.
- Remove unused code.
- Use iterator for prediction to avoid initializing xgboost model
2022-07-26 15:00:52 +08:00
Jiaming Yuan
3970e4e6bb Move pylint helper from dmlc-core. (#8101)
* Move pylint helper from dmlc-core.

- Move the helper into the XGBoost ci_build.
- Run it with multiprocessing.

* Fix original test.
2022-07-23 08:12:37 +08:00
Jiaming Yuan
7785d65c8a Fix feature weights with multiple column sampling. (#8100) 2022-07-22 20:23:05 +08:00
Jiaming Yuan
4a4e5c7c18 Prepare gradient index for Quantile DMatrix. (#8103)
* Prepare gradient index for Quantile DMatrix.

- Implement push batch with adapter batch.
- Implement `GetFvalue` for prediction.
2022-07-22 17:26:33 +08:00
Rory Mitchell
1be09848a7 Refactor split valuation kernel (#8073) 2022-07-21 15:41:50 +02:00
Tim Gates
cb40bbdadd docs: fix simple typo, cannonical -> canonical (#8099)
There is a small typo in src/common/partition_builder.h.

Should read `canonical` rather than `cannonical`.

Signed-off-by: Tim Gates <tim.gates@iress.com>
2022-07-20 21:04:50 +08:00
QuellaZhang
703261e78f [MSVC][std:c++latest] Fix compiler error (#8093)
Co-authored-by: QuellaZhang <zhangyi2090@163.com>
2022-07-20 15:15:39 +08:00
Jiaming Yuan
ef11b024e8 Cleanup data generator. (#8094)
- Avoid duplicated definition of data shape.
- Explicitly define numpy iterator for CPU data.
2022-07-20 13:48:52 +08:00
Jiaming Yuan
5156be0f49 Limit max_depth to 30 for GPU. (#8098) 2022-07-20 12:28:49 +08:00
Jiaming Yuan
8bdea72688 [Python] Require black and isort for new Python files. (#8096)
* [Python] Require black and isort for new Python files.

- Require black and isort for spark and dask module.

These files are relatively new and conform more closely to the black formatter. We will
convert the rest of the library as we move forward.

Other libraries, including dask/distributed and optuna, use the same formatting style and
have a stricter standard. The black formatter is indeed quite nice, and automating it can
help us unify the code style.

- Gather Python checks into a single script.
2022-07-20 10:25:24 +08:00
WeichenXu
f23cc92130 [pyspark] User guide doc and tutorials (#8082)
Co-authored-by: Bobby Wang <wbo4958@gmail.com>
2022-07-19 22:25:14 +08:00
Bobby Wang
f801d3cf15 [PySpark] change the returning model type to string from binary (#8085)
* [PySpark] change the returning model type to string from binary

XGBoost pyspark can be accelerated by RAPIDS Accelerator seamlessly by
changing the returning model type from binary to string.
2022-07-19 18:39:20 +08:00