xgboost

Author	SHA1	Message	Date
Jiaming Yuan	2c70751d1e	Implement iterative DMatrix for CPU. (#8116 )	2022-07-26 22:34:21 +08:00
Jiaming Yuan	546de5efd2	[pyspark] Cleanup data processing. (#8088 ) - Use numpy stack for handling list of arrays. - Reuse concat function from dask. - Prepare for `QuantileDMatrix`. - Remove unused code. - Use iterator for prediction to avoid initializing xgboost model	2022-07-26 15:00:52 +08:00
Jiaming Yuan	3970e4e6bb	Move pylint helper from dmlc-core. (#8101 ) * Move pylint helper from dmlc-core. - Move the helper into the XGBoost ci_build. - Run it with multiprocessing. * Fix original test.	2022-07-23 08:12:37 +08:00
Jiaming Yuan	7785d65c8a	Fix feature weights with multiple column sampling. (#8100 )	2022-07-22 20:23:05 +08:00
Jiaming Yuan	4a4e5c7c18	Prepare gradient index for Quantile DMatrix. (#8103 ) * Prepare gradient index for Quantile DMatrix. - Implement push batch with adapter batch. - Implement `GetFvalue` for prediction.	2022-07-22 17:26:33 +08:00
Rory Mitchell	1be09848a7	Refactor split valuation kernel (#8073 )	2022-07-21 15:41:50 +02:00
Tim Gates	cb40bbdadd	docs: fix simple typo, cannonical -> canonical (#8099 ) There is a small typo in src/common/partition_builder.h. Should read `canonical` rather than `cannonical`. Signed-off-by: Tim Gates <tim.gates@iress.com>	2022-07-20 21:04:50 +08:00
QuellaZhang	703261e78f	[MSVC][std:c++latest] Fix compiler error (#8093 ) Co-authored-by: QuellaZhang <zhangyi2090@163.com>	2022-07-20 15:15:39 +08:00
Jiaming Yuan	ef11b024e8	Cleanup data generator. (#8094 ) - Avoid duplicated definition of data shape. - Explicitly define numpy iterator for CPU data.	2022-07-20 13:48:52 +08:00
Jiaming Yuan	5156be0f49	Limit `max_depth` to 30 for GPU. (#8098 )	2022-07-20 12:28:49 +08:00
Jiaming Yuan	8bdea72688	[Python] Require black and isort for new Python files. (#8096 ) * [Python] Require black and isort for new Python files. - Require black and isort for the spark and dask modules. These files are relatively new and conform more closely to the black formatter. We will convert the rest of the library as we move forward. Other libraries including dask/distributed and optuna use the same formatting style and have a stricter standard. The black formatter is indeed quite nice; automating it can help us unify the code style. - Gather Python checks into a single script.	2022-07-20 10:25:24 +08:00
WeichenXu	f23cc92130	[pyspark] User guide doc and tutorials (#8082 ) Co-authored-by: Bobby Wang <wbo4958@gmail.com>	2022-07-19 22:25:14 +08:00
Bobby Wang	f801d3cf15	[PySpark] change the returning model type to string from binary (#8085 ) * [PySpark] change the returning model type to string from binary XGBoost pyspark can be accelerated by RAPIDS Accelerator seamlessly by changing the returning model type from binary to string.	2022-07-19 18:39:20 +08:00
Jiaming Yuan	2365f82750	[dask] Mitigate non-deterministic test. (#8077 )	2022-07-19 16:55:59 +08:00
Rong Ou	7a6b711eb8	Remove unused updater basemaker (#8091 )	2022-07-19 15:41:27 +08:00
Philip Hyunsu Cho	4325178822	[CI] Clear workspace after budget check (#8092 ) * [CI] Clear workspace after budget check * Windows too	2022-07-18 19:17:33 -07:00
Jiaming Yuan	4083440690	Small cleanups to various data types. (#8086 ) - Use `bst_bin_t` in batch param constructor. - Use `StringView` to avoid `std::string` when appropriate. - Avoid using `MetaInfo` in quantile constructor to limit the scope of parameter.	2022-07-18 22:39:36 +08:00
Jiaming Yuan	e28f6f6657	[doc] Integrate pyspark module into sphinx doc [skip ci] (#8066 )	2022-07-17 10:46:09 +08:00
Rafail Giavrimis	579ab23b10	Check cudf lazily (#8084 )	2022-07-17 09:27:43 +08:00
Bobby Wang	a33f35eecf	[PySpark] add gpu support for spark local mode (#8068 )	2022-07-17 07:59:06 +08:00
Bobby Wang	91bb9e2cb3	[PySpark] fix raw_prediction_col parameter and minor cleanup (#8067 )	2022-07-16 17:58:57 +08:00
Jiaming Yuan	0ce80b7bcf	Mitigate flaky GPU test. (#8078 ) The flakiness is caused by the global random engine, which will take some time to fix.	2022-07-16 13:45:32 +08:00
Jiaming Yuan	7a5586f3db	Fix GPU quantile distributed test. (#8076 )	2022-07-16 11:40:53 +08:00
Jiaming Yuan	8fccc3c4ad	[dask] Fix potential error in demo. (#8079 ) * Use dask_cudf instead.	2022-07-15 18:42:29 +08:00
Jiaming Yuan	647d3844dd	Make test for categorical data deterministic. (#8080 )	2022-07-15 14:48:39 +08:00
Jiaming Yuan	dae7a41baa	Update Python requirement to >=3.8. (#8071 ) Additional changes: - Use mamba for CPU test on Jenkins. - Cleanup CPU test dependencies. - Restore some of the modin tests	2022-07-14 18:01:47 +08:00
Jiaming Yuan	8dd96013f1	Split up column matrix initialization. (#8060 ) * Split up column matrix initialization. This PR splits the column matrix initialization into 2 steps, the first one initializes the storage while the second one does the transpose. By doing so, we can reuse the code for Quantile DMatrix.	2022-07-14 10:34:47 +08:00
Philip Hyunsu Cho	36cf979b82	[CI] Fix S3 uploads (#8069 ) * [CI] Fix S3 upload issues * Don't launch Docker containers when uploading to S3	2022-07-13 16:23:00 -07:00
Jiaming Yuan	abaa593aa0	Fix compiler warnings. (#8059 ) - Remove unused parameters. - Avoid comparison of different signedness.	2022-07-14 05:29:56 +08:00
Jiaming Yuan	937352c78f	Fix R package Windows build. (#8065 )	2022-07-14 05:27:38 +08:00
WeichenXu	176fec8789	PySpark XGBoost integration (#8020 ) Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu> Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com>	2022-07-13 13:11:18 +08:00
Jiaming Yuan	8959622836	[dask] Use an invalid port for test. (#8064 )	2022-07-13 11:59:02 +08:00
Rory Mitchell	0bdaca25ca	Use single precision in gain calculation, use pointers instead of span. (#8051 )	2022-07-12 21:56:27 +02:00
Jiaming Yuan	a5bc8e2c6a	Fix mypy error with the latest dask. (#8052 ) * Fix mypy error with latest dask. Dask is adding type hints to its codebase and, as a result, checks in XGBoost can be performed more rigorously. - Remove compatibility with old dask versions where multi lock was missing. - Restrict input of `X` to be non-series. - Adopt latest definition of `Delayed`. - Avoid passing optional `host_ip`. - Avoid deprecated `worker.nthreads`.	2022-07-09 08:02:42 +08:00
Jiaming Yuan	210eb471e9	[R] Implement feature info for DMatrix. (#8048 )	2022-07-09 05:57:39 +08:00
Jiaming Yuan	701f32b227	[py-sckl] Raise import error if skl is not installed. (#8049 )	2022-07-09 05:56:46 +08:00
Rory Mitchell	794cbaa60a	Fuse split evaluation kernels (#8026 )	2022-07-05 10:24:31 +02:00
Jiaming Yuan	ff1c559084	Remove unused variable. (#8046 )	2022-07-05 01:59:22 +08:00
Jiaming Yuan	8746f9cddf	Rename `IterativeDMatrix`. (#8045 )	2022-07-04 18:52:31 +08:00
Jiaming Yuan	f24bfc7684	Bump R cache version. (#8044 )	2022-07-03 03:53:05 +08:00
Michael Chirico	3af02584c1	error early if missing DiagrammeR (#8037 )	2022-07-02 19:37:53 +08:00
Rory Mitchell	bc4f802b17	Batch UpdatePosition using cudaMemcpy (#7964 )	2022-06-30 17:52:40 +02:00
kiwiwarmnfuzzy	2407381c3d	Force auc.cc to be statically linked (#8039 )	2022-06-30 19:24:22 +08:00
Jiaming Yuan	e88d6e071d	Fix compiler warning in JSON IO. (#8031 ) Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>	2022-06-30 01:13:22 +08:00
Jiaming Yuan	dcaf580476	Fix Python package source install. (#8036 ) * Copy gputreeshap.	2022-06-29 21:45:09 +08:00
Rong Ou	6eb23353d7	Update nvflare demo for release 2.1.2 (#8038 )	2022-06-29 17:58:06 +08:00
Joris LIMONIER	f470ad3af9	Fix multiple typos (#8028 ) Fix 4 "graphiz" instead of "graphviz".	2022-06-27 19:21:58 +08:00
Rong Ou	45dc1f818a	Make federated plugin work with cmake 3.16.3 (#8029 )	2022-06-27 17:26:41 +08:00
Rong Ou	0725fd6081	fix federated learning plugin (#8027 )	2022-06-24 08:41:07 +08:00
Bobby Wang	a68580e2a7	[jvm-packages] fix executor crashing issue when transforming on xgboost4j-spark-gpu (#8025 ) * [jvm-packages] fix executor crashing issue when transforming on xgboost4j-spark-gpu The API XGBoosterSetParam is not thread-safe. During the transforming phase, XGBoost runs several transforming tasks at a time, and each of them sets the "gpu_id" and "predictor" parameters, so if several tasks (multiple threads) call XGBoosterSetParam simultaneously, it may corrupt memory and cause SIGSEGV. This PR first gets the booster from the broadcast and sets the correct gpu_id and predictor, and then all transforming tasks use the same booster to do the transforming.	2022-06-24 01:18:41 +08:00


6029 Commits