Jiaming Yuan
6d1452074a
Remove MGPU cpp tests. ( #8276 )
...
Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-09-27 21:18:23 +08:00
Jiaming Yuan
fcab51aa82
Support more pandas nullable types ( #8262 )
...
- Float32/64
- Category.
2022-09-27 01:59:50 +08:00
Rory Mitchell
8f77677193
Use quantised gradients in gpu_hist histograms ( #8246 )
2022-09-26 17:35:35 +02:00
WeichenXu
ff71c69adf
[pyspark] Add validation for param 'early_stopping_rounds' and 'validation_indicator_col' ( #8250 )
...
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2022-09-26 17:43:03 +08:00
Jiaming Yuan
3fd331f8f2
Add checks to C pointer arguments. ( #8254 )
2022-09-22 19:02:22 +08:00
Dmitry Razdoburdin
eb7bbee2c9
Optional by-column histogram build. ( #8233 )
...
Co-authored-by: dmitry.razdoburdin <drazdobu@jfldaal005.jf.intel.com>
2022-09-22 05:16:13 +08:00
Jiaming Yuan
b791446623
Initial support for IPv6 ( #8225 )
...
- Merge rabit socket into XGBoost.
- Dask interface support.
- Add test to the socket.
2022-09-21 18:06:50 +08:00
Jiaming Yuan
fffb1fca52
Calculate base_score based on input labels for mae. ( #8107 )
...
Fit an intercept as base score for abs loss.
2022-09-20 20:53:54 +08:00
Bobby Wang
4f42aa5f12
[pyspark] make the model saved by pyspark compatible ( #8219 )
...
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2022-09-20 16:43:49 +08:00
Bobby Wang
520586ffa7
[pyspark] fix empty data issue when constructing DMatrix ( #8245 )
...
Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-09-20 16:43:20 +08:00
Philip Hyunsu Cho
70df36c99c
[CI] Retire Jenkins server ( #8243 )
2022-09-14 08:46:23 -07:00
Jiaming Yuan
2e63af6117
Mitigate flaky data iter test. ( #8244 )
...
- Reduce the number of batches.
- Verify labels.
2022-09-14 17:54:14 +08:00
Rong Ou
a2686543a9
Common interface for collective communication ( #8057 )
...
* implement broadcast for federated communicator
* implement allreduce
* add communicator factory
* add device adapter
* add device communicator to factory
* add rabit communicator
* add rabit communicator to the factory
* add nccl device communicator
* add synchronize to device communicator
* add back print and getprocessorname
* add python wrapper and c api
* clean up types
* fix non-gpu build
* try to fix ci
* fix std::size_t
* portable string compare ignore case
* c style size_t
* fix lint errors
* cross platform setenv
* fix memory leak
* fix lint errors
* address review feedback
* add python test for rabit communicator
* fix failing gtest
* use json to configure communicators
* fix lint error
* get rid of factories
* fix cpu build
* fix include
* fix python import
* don't export collective.py yet
* skip collective communicator pytest on windows
* add review feedback
* update documentation
* remove mpi communicator type
* fix tests
* shutdown the communicator separately
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2022-09-12 15:21:12 -07:00
Jiaming Yuan
bc818316f2
Prepare for improving Windows networking compatibility. ( #8234 )
...
* Prepare for improving Windows networking compatibility.
* Include dmlc filesystem indirectly as dmlc/filesystem.h includes windows.h, which
conflicts with winsock2.h
* Define `NOMINMAX` conditionally.
* Link the winsock library when mysys32 is used.
* Add config file for read the doc.
2022-09-10 15:16:49 +08:00
Philip Hyunsu Cho
23faf656ad
[CI] Don't require manual approval for master branch ( #8235 )
2022-09-08 09:26:22 -08:00
Philip Hyunsu Cho
e888eb2fa9
[CI] Migrate CI pipelines from Jenkins to BuildKite ( #8142 )
...
* [CI] Migrate CI pipelines from Jenkins to BuildKite
* Require manual approval
* Less verbose output when pulling Docker
* Remove us-east-2 from metadata.py
* Add documentation
* Add missing underscore
* Add missing punctuation
* More specific instruction
* Better paragraph structure
2022-09-07 16:29:25 -08:00
Jiaming Yuan
b5eb36f1af
Add max_cat_threshold to GPU and handle missing cat values. ( #8212 )
2022-09-07 00:57:51 +08:00
Jiaming Yuan
441ffc017a
Copy data from Ellpack to GHist. ( #8215 )
2022-09-06 23:05:49 +08:00
WeichenXu
d03794ce7a
[pyspark] Add param validation for "objective" and "eval_metric" param, and remove invalid booster params ( #8173 )
...
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2022-08-24 15:29:43 +08:00
WeichenXu
f4628c22a4
[pyspark] Implement SparkXGBRanker estimator ( #8172 )
...
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2022-08-23 02:35:19 +08:00
Philip Hyunsu Cho
35ef8abc27
[CI] Prune unused archs from libnccl ( #8179 )
...
* [CI] Prune unused archs from libnccl
* Put pruning logic in CI directory
* Don't use --color in grep
2022-08-21 00:46:16 -08:00
Rong Ou
ad3bc0edee
Allow insecure gRPC connections for federated learning ( #8181 )
...
* Allow insecure gRPC connections for federated learning
* format
2022-08-19 12:16:14 +08:00
WeichenXu
53d2a733b0
[pyspark] Make Xgboost estimator support using sparse matrix as optimization ( #8145 )
...
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2022-08-19 01:57:28 +08:00
Jiaming Yuan
16bca5d4a1
Support CPU input for device QuantileDMatrix. ( #8136 )
...
- Copy `GHistIndexMatrix` to `Ellpack` when needed.
2022-08-11 21:21:26 +08:00
Jiaming Yuan
36e7c5364d
[dask] Deterministic rank assignment. ( #8018 )
2022-08-11 19:17:58 +08:00
Jiaming Yuan
d868126c39
[CI] Fix R build on Jenkins. ( #8154 )
2022-08-11 14:50:03 +08:00
Jiaming Yuan
570f8ae4ba
Use black on more Python files. ( #8137 )
2022-08-11 01:38:11 +08:00
Jiaming Yuan
446d536c23
Fix loading DMatrix binary in distributed env. ( #8149 )
...
- Try to load DMatrix binary before trying to parse text input.
- Remove some unmaintained code.
2022-08-10 22:53:16 +08:00
Jiaming Yuan
8fc60b31bc
Update PyPi wheel size limit. ( #8150 )
2022-08-10 18:49:57 +08:00
Jiaming Yuan
9ae547f994
Use config_context in sklearn interface. ( #8141 )
2022-08-09 14:48:54 +08:00
Bobby Wang
03cc3b359c
[pyspark] support a list of feature column names ( #8117 )
2022-08-08 17:05:27 +08:00
Jiaming Yuan
bcc8679a05
Update CUDA docker image and NCCL. ( #8139 )
2022-08-07 16:32:41 +08:00
Jiaming Yuan
d87f69215e
Quantile DMatrix for CPU. ( #8130 )
...
- Add a new `QuantileDMatrix` that works for both CPU and GPU.
- Deprecate `DeviceQuantileDMatrix`.
2022-08-02 15:51:23 +08:00
Jiaming Yuan
2cba1d9fcc
Fix compatibility with latest cupy. ( #8129 )
...
* Fix compatibility with latest cupy.
* Freeze mypy.
2022-08-01 15:24:42 +08:00
Jiaming Yuan
2c70751d1e
Implement iterative DMatrix for CPU. ( #8116 )
2022-07-26 22:34:21 +08:00
Jiaming Yuan
546de5efd2
[pyspark] Cleanup data processing. ( #8088 )
...
- Use numpy stack for handling list of arrays.
- Reuse concat function from dask.
- Prepare for `QuantileDMatrix`.
- Remove unused code.
- Use iterator for prediction to avoid initializing xgboost model
2022-07-26 15:00:52 +08:00
Jiaming Yuan
3970e4e6bb
Move pylint helper from dmlc-core. ( #8101 )
...
* Move pylint helper from dmlc-core.
- Move the helper into the XGBoost ci_build.
- Run it with multiprocessing.
* Fix original test.
2022-07-23 08:12:37 +08:00
Jiaming Yuan
7785d65c8a
Fix feature weights with multiple column sampling. ( #8100 )
2022-07-22 20:23:05 +08:00
Jiaming Yuan
4a4e5c7c18
Prepare gradient index for Quantile DMatrix. ( #8103 )
...
* Prepare gradient index for Quantile DMatrix.
- Implement push batch with adapter batch.
- Implement `GetFvalue` for prediction.
2022-07-22 17:26:33 +08:00
Rory Mitchell
1be09848a7
Refactor split valuation kernel ( #8073 )
2022-07-21 15:41:50 +02:00
Jiaming Yuan
ef11b024e8
Cleanup data generator. ( #8094 )
...
- Avoid duplicated definition of data shape.
- Explicitly define numpy iterator for CPU data.
2022-07-20 13:48:52 +08:00
Jiaming Yuan
8bdea72688
[Python] Require black and isort for new Python files. ( #8096 )
...
* [Python] Require black and isort for new Python files.
- Require black and isort for spark and dask module.
These files are relatively new and are more conform to the black formatter. We will
convert the rest of the library as we move forward.
Other libraries including dask/distributed and optuna use the same formatting style and
have a more strict standard. The black formatter is indeed quite nice, automating it can
help us unify the code style.
- Gather Python checks into a single script.
2022-07-20 10:25:24 +08:00
Bobby Wang
f801d3cf15
[PySpark] change the returning model type to string from binary ( #8085 )
...
* [PySpark] change the returning model type to string from binary
XGBoost pyspark can be can be accelerated by RAPIDS Accelerator seamlessly by
changing the returning model type from binary to string.
2022-07-19 18:39:20 +08:00
Jiaming Yuan
2365f82750
[dask] Mitigate non-deterministic test. ( #8077 )
2022-07-19 16:55:59 +08:00
Jiaming Yuan
4083440690
Small cleanups to various data types. ( #8086 )
...
- Use `bst_bin_t` in batch param constructor.
- Use `StringView` to avoid `std::string` when appropriate.
- Avoid using `MetaInfo` in quantile constructor to limit the scope of parameter.
2022-07-18 22:39:36 +08:00
Jiaming Yuan
0ce80b7bcf
Mitigate flaky GPU test. ( #8078 )
...
The flakiness is caused by the global random engine, which will take some time to fix.
2022-07-16 13:45:32 +08:00
Jiaming Yuan
7a5586f3db
Fix GPU quantile distributed test. ( #8076 )
2022-07-16 11:40:53 +08:00
Jiaming Yuan
647d3844dd
Make test for categorical data deterministic. ( #8080 )
2022-07-15 14:48:39 +08:00
Jiaming Yuan
dae7a41baa
Update Python requirement to >=3.8. ( #8071 )
...
Additional changes:
- Use mamba for CPU test on Jenkins.
- Cleanup CPU test dependencies.
- Restore some of the modin tests
2022-07-14 18:01:47 +08:00
Jiaming Yuan
abaa593aa0
Fix compiler warnings. ( #8059 )
...
- Remove unused parameters.
- Avoid comparison of different signedness.
2022-07-14 05:29:56 +08:00