Commit Graph

1138 Commits

Author SHA1 Message Date
Jiaming Yuan
60a8c8ebba [pyspark] sort qid for SparkRanker (#8497) (#8555)
* [pyspark] sort qid for SparkRandker

* resolve comments

Co-authored-by: Bobby Wang <wbo4958@gmail.com>
2022-12-07 02:07:37 +08:00
Jiaming Yuan
58bc225657 [backport] [CI] Fix github action mismatched glibcxx. (#8551) (#8552)
Split up the Linux test to use the toolchain from conda forge.
2022-12-06 21:35:26 +08:00
Philip Hyunsu Cho
db14e3feb7 Support null value in CUDA array interface. (#8486) (#8499) 2022-11-30 11:44:54 -08:00
Philip Hyunsu Cho
5b76acccff Add back xgboost.rabit for backwards compatibility (#8408) (#8411) 2022-11-02 07:56:55 -07:00
Dmitry Razdoburdin
5bd849f1b5 Unify the partitioner for hist and approx.
Co-authored-by: dmitry.razdoburdin <drazdobu@jfldaal005.jf.intel.com>
Co-authored-by: jiamingy <jm.yuan@outlook.com>
2022-10-20 02:49:20 +08:00
Jiaming Yuan
c884b9e888 Validate features for inplace predict. (#8359) 2022-10-19 23:05:36 +08:00
Bobby Wang
76f95a6667 [pyspark] Filter out the unsupported train parameters (#8355) 2022-10-18 23:26:02 +08:00
Jiaming Yuan
3901f5d9db [pyspark] Cleanup data processing. (#8344)
* Enable additional combinations of ctor parameters.
* Unify procedures for QuantileDMatrix and DMatrix.
2022-10-18 14:56:23 +08:00
Rong Ou
8f3dee58be Speed up tests with federated learning enabled (#8350)
* Speed up tests with federated learning enabled

* Re-enable timeouts

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-17 15:17:04 -07:00
Jiaming Yuan
031d66ec27 Configuration for init estimation. (#8343)
* Configuration for init estimation.

* Check whether the model needs configuration based on const attribute `ModelFitted`
instead of a mutable state.
* Add parameter `boost_from_average` to tell whether the user has specified base score.
* Add tests.
2022-10-18 01:52:24 +08:00
Jiaming Yuan
2176e511fc Disable pytest-timeout for now. (#8348) 2022-10-17 23:06:10 +08:00
Jiaming Yuan
748d516c50 [pyspark] Enable running GPU tests on variable number of GPUs. (#8335) 2022-10-13 21:03:45 +08:00
Jiaming Yuan
3ef1703553 Allow using string view to find JSON value. (#8332)
- Allow comparison between string and string view.
- Fix compiler warnings.
2022-10-13 17:10:13 +08:00
Philip Hyunsu Cho
29595102b9 [CI] Set up test analytics for CPU Python tests (#8333)
* [CI] Set up test analytics for CPU Python tests

* Install test collector
2022-10-12 23:15:50 -07:00
Philip Hyunsu Cho
2faa744aba [CI] Test federated learning plugin in the CI (#8325) 2022-10-12 13:57:39 -07:00
Jiaming Yuan
97a5b088a5 [pyspark] Use quantile dmatrix. (#8284) 2022-10-12 20:38:53 +08:00
Rory Mitchell
ce0382dcb0 [CI] Refactor tests to reduce CI time. (#8312) 2022-10-12 11:32:06 +02:00
Rong Ou
39afdac3be Better error message when world size and rank are set as strings (#8316)
Co-authored-by: jiamingy <jm.yuan@outlook.com>
2022-10-12 15:53:25 +08:00
Rory Mitchell
210915c985 Use integer gradients in gpu_hist split evaluation (#8274) 2022-10-11 12:16:27 +02:00
Jiaming Yuan
5545c49cfc Require keyword args for data iterator. (#8327) 2022-10-10 17:47:13 +08:00
Jiaming Yuan
e1f9f80df2 Use gpu predictor for get csr test. (#8323) 2022-10-10 16:12:37 +08:00
Philip Hyunsu Cho
50ff8a2623 More CI improvements (#8313)
* Reduce clutter in log of Python test

* Set up BuildKite test analytics

* Add separate step for building containers

* Enable incremental update of CI stack; custom agent IAM policy
2022-10-06 06:33:46 -08:00
Philip Hyunsu Cho
bc7a6ec603 Fix clang tidy (#8314)
* Fix clang-tidy

* Exempt clang-tidy from budget check

* Move clang-tidy
2022-10-06 05:16:06 -08:00
Rory Mitchell
909e49e214 Reduce docker image size. (#8306) 2022-10-05 15:55:51 -08:00
Rong Ou
668b8a0ea4 [Breaking] Switch from rabit to the collective communicator (#8257)
* Switch from rabit to the collective communicator

* fix size_t specialization

* really fix size_t

* try again

* add include

* more include

* fix lint errors

* remove rabit includes

* fix pylint error

* return dict from communicator context

* fix communicator shutdown

* fix dask test

* reset communicator mocklist

* fix distributed tests

* do not save device communicator

* fix jvm gpu tests

* add python test for federated communicator

* Update gputreeshap submodule

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-05 14:39:01 -08:00
Jiaming Yuan
e47b3a3da3 Upgrade mypy. (#8302)
Some breaking changes were made in mypy.
2022-10-05 14:31:59 +08:00
Philip Hyunsu Cho
b2bbf49015 Additional improvements to CI (#8303)
* Wait until budget check is complete

* Ensure that multi-GPU tests run for the master branch

* Fix
2022-10-04 03:03:38 -08:00
Rory Mitchell
d686bf52a6 Reduce time for some multi-gpu tests (#8288)
* Faster dask tests

* Reuse AllReducer objects in tests.

* Faster boost from prediction tests.

* Use rmm dask fixture.

* Speed up dask demo.

* mypy

* Format with black.

* mypy

* Clang-tidy

Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-10-04 02:49:33 -08:00
Philip Hyunsu Cho
ca0547bb65 [CI] Use RAPIDS 22.10 (#8298)
* [CI] Use RAPIDS 22.10

* Store CUDA and RAPIDS versions in one place

* Fix

* Add missing #include

* Update gputreeshap submodule

* Fix

* Remove outdated distributed tests
2022-10-03 23:18:07 -08:00
Philip Hyunsu Cho
37886a5dff [CI] Document the use of Docker wrapper script (#8297)
* [CI] Document the use of Docker wrapper script

* Grammer fixes

* Document buildkite pipeline defs

* tests/buildkite/*.sh isn't meant to run locally
2022-10-02 12:45:00 -07:00
Philip Hyunsu Cho
9af99760d4 Various CI savings (#8291) 2022-09-30 05:42:56 -07:00
Jiaming Yuan
299e5000a4 Fix buildkite label. (#8287) 2022-09-29 17:33:19 -07:00
Jiaming Yuan
55cf24cc32 Obtain CSR matrix from DMatrix. (#8269) 2022-09-29 20:41:43 +08:00
Philip Hyunsu Cho
b14c44ee5e [CI] Put Multi-GPU test suites in separate pipeline (#8286)
* [CI] Put Multi-GPU test suites in separate pipeline

* Avoid unset var error in Bash
2022-09-29 00:41:48 -08:00
Jiaming Yuan
6925b222e0 Fix mixed types with cuDF. (#8280) 2022-09-29 00:57:52 +08:00
Jiaming Yuan
6d1452074a Remove MGPU cpp tests. (#8276)
Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-09-27 21:18:23 +08:00
Jiaming Yuan
fcab51aa82 Support more pandas nullable types (#8262)
- Float32/64
- Category.
2022-09-27 01:59:50 +08:00
Rory Mitchell
8f77677193 Use quantised gradients in gpu_hist histograms (#8246) 2022-09-26 17:35:35 +02:00
WeichenXu
ff71c69adf [pyspark] Add validation for param 'early_stopping_rounds' and 'validation_indicator_col' (#8250)
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
2022-09-26 17:43:03 +08:00
Jiaming Yuan
3fd331f8f2 Add checks to C pointer arguments. (#8254) 2022-09-22 19:02:22 +08:00
Dmitry Razdoburdin
eb7bbee2c9 Optional by-column histogram build. (#8233)
Co-authored-by: dmitry.razdoburdin <drazdobu@jfldaal005.jf.intel.com>
2022-09-22 05:16:13 +08:00
Jiaming Yuan
b791446623 Initial support for IPv6 (#8225)
- Merge rabit socket into XGBoost.
- Dask interface support.
- Add test to the socket.
2022-09-21 18:06:50 +08:00
Jiaming Yuan
fffb1fca52 Calculate base_score based on input labels for mae. (#8107)
Fit an intercept as base score for abs loss.
2022-09-20 20:53:54 +08:00
Bobby Wang
4f42aa5f12 [pyspark] make the model saved by pyspark compatible (#8219)
Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2022-09-20 16:43:49 +08:00
Bobby Wang
520586ffa7 [pyspark] fix empty data issue when constructing DMatrix (#8245)
Co-authored-by: Hyunsu Philip Cho <chohyu01@cs.washington.edu>
2022-09-20 16:43:20 +08:00
Philip Hyunsu Cho
70df36c99c [CI] Retire Jenkins server (#8243) 2022-09-14 08:46:23 -07:00
Jiaming Yuan
2e63af6117 Mitigate flaky data iter test. (#8244)
- Reduce the number of batches.
- Verify labels.
2022-09-14 17:54:14 +08:00
Rong Ou
a2686543a9 Common interface for collective communication (#8057)
* implement broadcast for federated communicator

* implement allreduce

* add communicator factory

* add device adapter

* add device communicator to factory

* add rabit communicator

* add rabit communicator to the factory

* add nccl device communicator

* add synchronize to device communicator

* add back print and getprocessorname

* add python wrapper and c api

* clean up types

* fix non-gpu build

* try to fix ci

* fix std::size_t

* portable string compare ignore case

* c style size_t

* fix lint errors

* cross platform setenv

* fix memory leak

* fix lint errors

* address review feedback

* add python test for rabit communicator

* fix failing gtest

* use json to configure communicators

* fix lint error

* get rid of factories

* fix cpu build

* fix include

* fix python import

* don't export collective.py yet

* skip collective communicator pytest on windows

* add review feedback

* update documentation

* remove mpi communicator type

* fix tests

* shutdown the communicator separately

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2022-09-12 15:21:12 -07:00
Jiaming Yuan
bc818316f2 Prepare for improving Windows networking compatibility. (#8234)
* Prepare for improving Windows networking compatibility.

* Include dmlc filesystem indirectly as dmlc/filesystem.h includes windows.h, which
  conflicts with winsock2.h
* Define `NOMINMAX` conditionally.
* Link the winsock library when mysys32 is used.
* Add config file for read the doc.
2022-09-10 15:16:49 +08:00
Philip Hyunsu Cho
23faf656ad [CI] Don't require manual approval for master branch (#8235) 2022-09-08 09:26:22 -08:00