97 Commits

Author SHA1 Message Date
Jiaming Yuan
bcc8679a05
Update CUDA docker image and NCCL. (#8139) 2022-08-07 16:32:41 +08:00
Philip Hyunsu Cho
4325178822
[CI] Clear workspace after budget check (#8092)
* [CI] Clear workspace after budget check

* Windows too
2022-07-18 19:17:33 -07:00
Philip Hyunsu Cho
36cf979b82
[CI] Fix S3 uploads (#8069)
* [CI] Fix S3 upload issues

* Don't launch Docker containers when uploading to S3
2022-07-13 16:23:00 -07:00
Jiaming Yuan
d48123d23b
Fix rmm build (#7973)
- Optionally switch to c++17
- Use rmm CMake target.
- Workaround compiler errors.
- Fix GPUMetric inheritance.
- Run death tests even if it's built with RMM support.

Co-authored-by: jakirkham <jakirkham@gmail.com>
2022-06-06 20:18:32 +08:00
Philip Hyunsu Cho
e5ab8f3ebe
[CI] Speed up CPU test pipeline (#7772) 2022-04-01 02:39:04 +08:00
Philip Hyunsu Cho
1b25dd59f9
Use CUDA 11 in clang-tidy (#7701)
* Show command args when clang-tidy fails

* Add option to specify CUDA args

* Use clang-tidy 11

* [CI] Use CUDA 11
2022-02-24 15:15:07 -08:00
Philip Hyunsu Cho
0149f81a5a
[CI] Fix S3 upload (#7662) 2022-02-16 01:35:27 -08:00
Philip Hyunsu Cho
34a238ca98
[CI] Clean up Python wheel build pipeline (#7626)
* [CI] Always upload artifacts to [branch_name]/

* [CI] Move detailed setup inside build_python_wheels.sh

* Fix typo
2022-02-03 00:55:44 -08:00
Bobby Wang
e8c1eb99e4
[jvm-package] Clean up the legacy gpu support tests (#7523) 2021-12-21 09:15:51 +08:00
Philip Hyunsu Cho
2adf222fb2
[CI] CI cost saving (#7407)
* [CI] Drop CUDA 10.1; Require 11.0

* Change NCCL version

* Use CUDA 10.1 for clang-tidy, for now

* Remove JDK 11 and 12

* Fix NCCL version

* Don't require 11.0 just yet, until clang-tidy is fixed

* Skip MultiClassesSerializationTest.GpuHist
2021-11-17 21:02:20 -08:00
Jiaming Yuan
ca17f8a5fc
Dispatch thrust versions and upgrade rmm. (#7254)
Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2021-09-25 03:43:23 +08:00
Bobby Wang
0ee11dac77
[jvm-packages][xgboost4j-gpu] Support GPU dataframe and DeviceQuantileDMatrix (#7195)
Following classes are added to support dataframe in java binding:

- `Column` is an abstract type for a single column in tabular data.
- `ColumnBatch` is an abstract type for dataframe.

- `CuDFColumn` is an implementaiton of `Column` that consume cuDF column
- `CudfColumnBatch` is an implementation of `ColumnBatch` that consumes cuDF dataframe.

- `DeviceQuantileDMatrix` is the interface for quantized data.

The Java implementation mimics the Python interface and uses `__cuda_array_interface__` protocol for memory indexing.  One difference is on JVM package, the data batch is staged on the host as java iterators cannot be reset.

Co-authored-by: jiamingy <jm.yuan@outlook.com>
2021-09-24 14:25:00 +08:00
Philip Hyunsu Cho
f1a4a1ac95
[CI] Upgrade build image to CentOS 7 + GCC 8; require CUDA 10.1 and later (#7141) 2021-07-29 10:54:33 -07:00
Jiaming Yuan
f937f514aa
Remove lz4 compression with external memory. (#7076) 2021-07-06 14:46:43 +08:00
Philip Hyunsu Cho
05db6a6c29
[CI] Upgrade cuDF and RMM to 21.06 nightly (#7012)
* [CI] Upgrade cuDF and RMM to 21.06 nightly

* Trim outdated test cases

* Pin Dask version to 2021.05.0 for now
2021-06-02 11:59:30 -07:00
Philip Hyunsu Cho
90cd724be1
[CI] Fix CI/CD pipeline broken by latest auditwheel (4.0.0) (#6951) 2021-05-10 22:43:15 -07:00
Philip Hyunsu Cho
ea7a6a0321
[CI] Pack R package tarball with pre-built xgboost.so (with GPU support) (#6827)
* Add scripts for packaging R package with GPU-enabled libxgboost.so

* [CI] Automatically build R package tarball

* Add comments

* Don't build tarball for pull requests

* Update the installation doc
2021-04-07 21:15:34 -07:00
Philip Hyunsu Cho
0ad6e18a2a
[CI] Do not mix up stashed executable built for ARM and x86_64 platforms (#6646) 2021-01-27 23:57:26 +09:00
Philip Hyunsu Cho
55ee2bd77f
[CI] Add ARM64 test to Jenkins pipeline (#6643)
* Add ARM64 test to Jenkins pipeline

* Check for bundled libgomp

* Use a separate test suite for ARM64

* Ensure that x86 jobs don't run on ARM workers
2021-01-27 21:51:17 +09:00
Philip Hyunsu Cho
b8044e6136
[CI] Use manylinux2010_x86_64 container to vendor libgomp (#6485) 2020-12-10 07:37:15 -08:00
Philip Hyunsu Cho
84b726ef53
Vendor libgomp in the manylinux Python wheel (#6461)
* Vendor libgomp in the manylinux2014_aarch64 wheel

* Use vault repo, since CentOS 6 has reached End-of-Life on Nov 30

* Vendor libgomp in the manylinux2010_x86_64 wheel

* Run verification step inside the container
2020-12-03 19:55:32 -08:00
Philip Hyunsu Cho
7f101d1b33
[CI] Remove R check from Jenkins (#6372)
* Remove R check from Jenkins

* Print stacktrace when CRAN test fail in GitHub Actions

* Add verbose flag in tests/ci_build/print_r_stacktrace.sh

* Fix path in tests/ci_build/print_r_stacktrace.sh
2020-11-10 22:46:54 -08:00
Jiaming Yuan
b5c2a47b20
Drop single point model recovery (#6262)
* Pass rabit params in JVM package.
* Implement timeout using poll timeout parameter.
* Remove OOB data check.
2020-10-21 15:27:03 +08:00
Philip Hyunsu Cho
c991eb612d
[jvm-packages] Fix up build for xgboost4j-gpu, xgboost4j-spark-gpu (#6216)
* [CI] Clean up build for JVM packages

* Use correct path for saving native lib

* Fix groupId of maven-surefire-plugin

* Fix stashing of xgboost4j_jar_gpu

* [CI] Don't run xgboost4j-tester with GPU, since it doesn't use gpu_hist
2020-10-09 14:08:15 -07:00
Jiaming Yuan
4cfdcaaf7b
Move non-OpenMP gtest to GitHub Actions (#6210) 2020-10-08 00:58:21 -07:00
Philip Hyunsu Cho
f121f2738f
[CI] Fix Docker build for CUDA 11 (#6202) 2020-10-05 17:54:14 -07:00
Philip Hyunsu Cho
9338582d79
[CI] Fix CTest by running it in a correct directory (#6104)
* [CI] Fix CTest by running it in a correct directory

* [CI] Do not run dmlc-core unit tests with sanitizer
2020-09-09 10:31:09 -07:00
Jiaming Yuan
3dcd85fab5
Refactor rabit tests (#6096)
* Merge rabit tests into XGBoost.
* Run them On CI.
* Simplification for CMake scripts.
2020-09-09 12:30:29 +08:00
Philip Hyunsu Cho
cfced58c1c
[CI] Port CI fixes from the 1.2.0 branch (#6050)
* Fix a unit test on CLI, to handle RC versions

* [CI] Use mgpu machine to run gpu hist unit tests

* [CI] Build GPU-enabled JAR artifact and deploy to xgboost-maven-repo
2020-08-22 23:24:46 -07:00
Philip Hyunsu Cho
1fd29edf66
[CI] Migrate linters to GitHub Actions (#6035)
* [CI] Move lint to GitHub Actions

* [CI] Move Doxygen to GitHub Actions

* [CI] Move Sphinx build test to GitHub Actions

* [CI] Reduce workload for Windows R tests

* [CI] Move clang-tidy to Build stage
2020-08-19 12:33:51 -07:00
Philip Hyunsu Cho
e3ec7b01df
[CI] Cancel builds on subsequent pushes (#6011)
* [CI] Cancel builds on subsequent pushes

* Use a more secure method

* test commit
2020-08-13 11:17:39 -07:00
Philip Hyunsu Cho
9adb812a0a
RMM integration plugin (#5873)
* [CI] Add RMM as an optional dependency

* Replace caching allocator with pool allocator from RMM

* Revert "Replace caching allocator with pool allocator from RMM"

This reverts commit e15845d4e72e890c2babe31a988b26503a7d9038.

* Use rmm::mr::get_default_resource()

* Try setting default resource (doesn't work yet)

* Allocate pool_mr in the heap

* Prevent leaking pool_mr handle

* Separate EXPECT_DEATH() in separate test suite suffixed DeathTest

* Turn off death tests for RMM

* Address reviewer's feedback

* Prevent leaking of cuda_mr

* Fix Jenkinsfile syntax

* Remove unnecessary function in Jenkinsfile

* [CI] Install NCCL into RMM container

* Run Python tests

* Try building with RMM, CUDA 10.0

* Do not use RMM for CUDA 10.0 target

* Actually test for test_rmm flag

* Fix TestPythonGPU

* Use CNMeM allocator, since pool allocator doesn't yet support multiGPU

* Use 10.0 container to build RMM-enabled XGBoost

* Revert "Use 10.0 container to build RMM-enabled XGBoost"

This reverts commit 789021fa31112e25b683aef39fff375403060141.

* Fix Jenkinsfile

* [CI] Assign larger /dev/shm to NCCL

* Use 10.2 artifact to run multi-GPU Python tests

* Add CUDA 10.0 -> 11.0 cross-version test; remove CUDA 10.0 target

* Rename Conda env rmm_test -> gpu_test

* Use env var to opt into CNMeM pool for C++ tests

* Use identical CUDA version for RMM builds and tests

* Use Pytest fixtures to enable RMM pool in Python tests

* Move RMM to plugin/CMakeLists.txt; use PLUGIN_RMM

* Use per-device MR; use command arg in gtest

* Set CMake prefix path to use Conda env

* Use 0.15 nightly version of RMM

* Remove unnecessary header

* Fix a unit test when cudf is missing

* Add RMM demos

* Remove print()

* Use HostDeviceVector in GPU predictor

* Simplify pytest setup; use LocalCUDACluster fixture

* Address reviewers' commments

Co-authored-by: Hyunsu Cho <chohyu01@cs.wasshington.edu>
2020-08-12 01:26:02 -07:00
Philip Hyunsu Cho
5f3c811e84
[CI] Assign larger /dev/shm to NCCL (#5966)
* [CI] Assign larger /dev/shm to NCCL

* Use 10.2 artifact to run multi-GPU Python tests

* Add CUDA 10.0 -> 11.0 cross-version test; remove CUDA 10.0 target
2020-07-31 10:05:04 -07:00
Philip Hyunsu Cho
071e10c1d1
[CI] Fix broken Docker container 'cpu' (#5956) 2020-07-29 04:29:57 -07:00
Bobby Wang
8943eb4314
[BLOCKING] [jvm-packages] add gpu_hist and enable gpu scheduling (#5171)
* [jvm-packages] add gpu_hist tree method

* change updater hist to grow_quantile_histmaker

* add gpu scheduling

* pass correct parameters to xgboost library

* remove debug info

* add use.cuda for pom

* add CI for gpu_hist for jvm

* add gpu unit tests

* use gpu node to build jvm

* use nvidia-docker

* Add CLI interface to create_jni.py using argparse

Co-authored-by: Hyunsu Cho <chohyu01@cs.washington.edu>
2020-07-26 21:53:24 -07:00
Andy Adinets
b3d2e7644a
Support building XGBoost with CUDA 11 (#5808)
* Change serialization test.
* Add CUDA 11 tests on Linux CI.

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-07-20 07:58:41 +08:00
Philip Hyunsu Cho
ac9136ee49
Further improvements and savings in Jenkins pipeline (#5904)
* Publish artifacts only on the master and release branches

* Build CUDA only for Compute Capability 7.5 when building PRs

* Run all Windows jobs in a single worker image

* Build nightly XGBoost4J SNAPSHOT JARs with Scala 2.12 only

* Show skipped Python tests on Windows

* Make Graphviz optional for Python tests

* Add back C++ tests

* Unstash xgboost_cpp_tests

* Fix label to CUDA 10.1

* Install cuPy for CUDA 10.1

* Install jsonschema

* Address reviewer's feedback
2020-07-18 03:30:40 -07:00
Philip Hyunsu Cho
71b0528a2f
GPU implementation of AFT survival objective and metric (#5714)
* Add interval accuracy

* De-virtualize AFT functions

* Lint

* Refactor AFT metric using GPU-CPU reducer

* Fix R build

* Fix build on Windows

* Fix copyright header

* Clang-tidy

* Fix crashing demo

* Fix typos in comment; explain GPU ID

* Remove unnecessary #include

* Add C++ test for interval accuracy

* Fix a bug in accuracy metric: use log pred

* Refactor AFT objective using GPU-CPU Transform

* Lint

* Fix lint

* Use Ninja to speed up build

* Use time, not /usr/bin/time

* Add cpu_build worker class, with concurrency = 1

* Use concurrency = 1 only for CUDA build

* concurrency = 1 for clang-tidy

* Address reviewer's feedback

* Update link to AFT paper
2020-07-17 01:18:13 -07:00
Bobby Wang
730866a7bc
[CI] update spark version to 3.0.0 (#5890)
* [CI] update spark version to 3.0.0

* Update Dockerfile.jvm_cross

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-07-16 00:23:44 -07:00
Philip Hyunsu Cho
e0c179c7cc
[CI] Enforce daily budget in Jenkins CI (#5884)
* [CI] Throttle Jenkins CI

* Don't use Jenkins master instance
2020-07-13 21:51:11 -07:00
Philip Hyunsu Cho
0d411b0397
[CI] Simplify CMake build with modern CMake techniques (#5871)
* [CI] Simplify CMake build

* Make sure that plugins can be built

* [CI] Install lz4 on Mac
2020-07-08 04:23:24 -07:00
Jiaming Yuan
048d969be4
Implement GK sketching on GPU. (#5846)
* Implement GK sketching on GPU.
* Strong tests on quantile building.
* Handle sparse dataset by binary searching the column index.
* Hypothesis test on dask.
2020-07-07 12:16:21 +08:00
Philip Hyunsu Cho
a6d9a06b7b
[CI] Fix cuDF install; merge 'gpu' and 'cudf' test suite (#5814) 2020-06-19 16:42:57 +08:00
Philip Hyunsu Cho
b77e3e3fcc
[CI] Remove CUDA 9.0 from CI (#5745) 2020-06-01 18:15:45 -07:00
Philip Hyunsu Cho
91c646392d
Require Python 3.6+; drop Python 3.5 from CI (#5715) 2020-05-27 16:19:30 -07:00
Jiaming Yuan
9ad40901a8
Upgrade to CUDA 10.0 (#5649) (#5652)
Co-authored-by: fis <jm.yuan@outlook.com>

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2020-05-11 22:27:36 +08:00
Philip Hyunsu Cho
ef26bc45bf
Hide C++ symbols in libxgboost.so when building Python wheel (#5590)
* Hide C++ symbols in libxgboost.so when building Python wheel

* Update Jenkinsfile

* Add test

* Upgrade rabit

* Add setup.py option.

Co-authored-by: fis <jm.yuan@outlook.com>
2020-04-24 13:32:05 -07:00
Philip Hyunsu Cho
92913aaf7f
[CI] Use Vault repository to re-gain access to devtoolset-4 (#5589)
* [CI] Use Vault repository to re-gain access to devtoolset-4

* Use manylinux2010 tag

* Update Dockerfile.jvm

* Fix rename_whl.py

* Upgrade Pip, to handle manylinux2010 tag

* Update insert_vcomp140.py

* Update test_python.sh
2020-04-23 18:53:54 -07:00
Jiaming Yuan
ccd30e4491
Fix non-openmp build. (#5566)
* Add test to Jenkins.
* Fix threading utils tests.
* Require thread library.
2020-04-20 12:16:38 +08:00
Jiaming Yuan
468b1594d3
Fix CLI model IO. (#5535)
* Add test for comparing Python and CLI training result.
2020-04-16 07:48:47 +08:00