[CI] Migrate CI pipelines from Jenkins to BuildKite (#8142)
* [CI] Migrate CI pipelines from Jenkins to BuildKite
* Require manual approval
* Less verbose output when pulling Docker
* Remove us-east-2 from metadata.py
* Add documentation
* Add missing underscore
* Add missing punctuation
* More specific instruction
* Better paragraph structure
parent b397d64c96
commit e888eb2fa9
@@ -39,24 +39,58 @@ task of cross-compiling a Python wheel. (Note that ``cibuildwheel`` will call
 a glue code to call CMake and a C++ compiler to build the native library on the fly.)
 
 *******************************
-Reproducing errors from Jenkins
+Elastic CI Stack with BuildKite
 *******************************
 
-It is often useful to reproduce the particular testing environment from our Jenkins server for
-the purpose of troubleshooting a failing test. We use Docker containers heavily to package
-the testing environment, so you can use Docker to reproduce it on your own machine.
-
-1. Install Docker: https://docs.docker.com/engine/install/ubuntu/
-2. Install NVIDIA Docker runtime: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installing-on-ubuntu-and-debian
-   The runtime lets you access NVIDIA GPUs inside a Docker container.
-3. In a build log, all tests are invoked via the wrapper script ``tests/ci_build/ci_build.sh``.
-   Identify the test you'd like to reproduce locally, and note how the wrapper script was invoked for that test.
-   The invocation should look like this:
-
-   .. code-block:: bash
-
-     CI_DOCKER_EXTRA_PARAMS_INIT='--shm-size=4g' tests/ci_build/ci_build.sh gpu nvidia-docker \
-       --build-arg CUDA_VERSION_ARG=11.0 tests/ci_build/test_python.sh mgpu --use-rmm-pool
-
-4. You can now run the same command on your own machine. The wrapper script will automatically download and
-   set up the correct Docker container(s).
+`BuildKite <https://buildkite.com/home>`_ is a SaaS (Software as a Service) platform that orchestrates
+cloud machines to host CI pipelines. The BuildKite platform allows us to define cloud resources in
+a declarative fashion. Every configuration step is now documented explicitly as code.
+
+**Prerequisite**: You should have some knowledge of `CloudFormation <https://aws.amazon.com/cloudformation/>`_.
+CloudFormation lets us define a stack of cloud resources (EC2 machines, Lambda functions, S3 etc.) using
+a single YAML file.
+
+**Prerequisite**: Gain access to the XGBoost project's AWS account (``admin@xgboost-ci.net``), and then
+set up a credential pair in order to provision resources on AWS. See
+`Creating an IAM user in your AWS account <https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html>`_.
+
+* Option 1. Give full admin privileges to your IAM user. This is the simplest option.
+* Option 2. Give a limited set of permissions to your IAM user, to reduce the possibility of messing up other resources.
+  For this, use the script ``tests/buildkite/infrastructure/service-user/create_service_user.py``.
+
+=====================
+Worker Image Pipeline
+=====================
+
+Building images for worker machines used to be a chore: you'd provision an EC2 machine, SSH into it, and
+manually install the necessary packages. This process is not only laborious but also error-prone. You may
+forget to install a package or change a system configuration.
+
+No more. Now we have an automated pipeline for building images for worker machines.
+
+* Run ``tests/buildkite/infrastructure/worker-image-pipeline/create_worker_image_pipelines.py`` in order to provision
+  CloudFormation stacks named ``buildkite-linux-amd64-gpu-worker`` and ``buildkite-windows-gpu-worker``. They are
+  pipelines that create AMIs (Amazon Machine Images) for Linux and Windows workers, respectively.
+* Navigate to the CloudFormation web console to verify that the image builder pipelines have been provisioned. It may
+  take some time.
+* Once the pipelines have been fully provisioned, run the script
+  ``tests/buildkite/infrastructure/worker-image-pipeline/run_pipelines.py`` to execute the pipelines. New AMIs will be
+  uploaded to the EC2 service. You can locate them in the EC2 console.
+* Make sure to modify ``tests/buildkite/infrastructure/aws-stack-creator/metadata.py`` to use the correct AMI IDs.
+  (For ``linux-amd64-cpu`` and ``linux-arm64-cpu``, use the AMIs provided by BuildKite. Consult the ``AWSRegion2AMI``
+  section of https://s3.amazonaws.com/buildkite-aws-stack/latest/aws-stack.yml.)
+
+======================
+EC2 Autoscaling Groups
+======================
+
+In EC2, you can create auto-scaling groups, where you can dynamically adjust the number of worker instances according to
+workload. When a pull request is submitted, the following steps take place:
+
+1. GitHub sends a signal to the registered webhook, which connects to the BuildKite server.
+2. BuildKite sends a signal to a `Lambda <https://aws.amazon.com/lambda/>`_ function named ``Autoscaling``.
+3. The Lambda function sends a signal to the auto-scaling group. The group scales up and adds additional worker instances.
+4. New worker instances run the test jobs. Test results are reported back to BuildKite.
+5. When the test jobs complete, BuildKite sends a signal to ``Autoscaling``, which in turn requests the auto-scaling group
+   to scale down. Idle worker instances are shut down.
+
+To set up the auto-scaling groups, run the script ``tests/buildkite/infrastructure/aws-stack-creator/create_stack.py``.
+Check the CloudFormation web console to verify successful provisioning of the auto-scaling groups.
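The new documentation asks the operator to verify provisioning in the CloudFormation web console. As a hedged sketch (not part of this commit), the same check can be done from Python with boto3; the client argument is injectable so the helper can be exercised without AWS credentials, and the stack name below is one of those named in the documentation.

```python
# Sketch: query a CloudFormation stack's status instead of opening the web
# console. `client` is injectable for testing; with real credentials it
# defaults to a boto3 CloudFormation client.
try:
    import boto3  # listed in tests/buildkite/infrastructure/requirements.txt
except ImportError:  # let the sketch load where boto3 is not installed
    boto3 = None


def stack_status(stack_name, client=None, aws_region="us-west-2"):
    """Return the stack's status string, e.g. "CREATE_COMPLETE"."""
    if client is None:
        client = boto3.client("cloudformation", region_name=aws_region)
    r = client.describe_stacks(StackName=stack_name)
    return r["Stacks"][0]["StackStatus"]
```

With real credentials, `stack_status("buildkite-linux-amd64-gpu-worker")` would report on the worker-image stack described above.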
tests/buildkite/build-cpu-arm64.sh (new executable file, 43 lines)
@@ -0,0 +1,43 @@
#!/bin/bash

set -euo pipefail

WHEEL_TAG=manylinux2014_aarch64

echo "--- Build CPU code targeting ARM64"

source tests/buildkite/conftest.sh

command_wrapper="tests/ci_build/ci_build.sh aarch64 docker"

echo "--- Build libxgboost from the source"
$command_wrapper tests/ci_build/build_via_cmake.sh --conda-env=aarch64_test \
  -DUSE_OPENMP=ON -DHIDE_CXX_SYMBOLS=ON
echo "--- Run Google Test"
$command_wrapper bash -c "cd build && ctest --extra-verbose"

echo "--- Build binary wheel"
$command_wrapper bash -c \
  "cd python-package && rm -rf dist/* && python setup.py bdist_wheel --universal"
$command_wrapper python tests/ci_build/rename_whl.py python-package/dist/*.whl \
  ${BUILDKITE_COMMIT} ${WHEEL_TAG}

echo "--- Audit binary wheel to ensure it's compliant with manylinux2014 standard"
$command_wrapper auditwheel repair --plat ${WHEEL_TAG} python-package/dist/*.whl
$command_wrapper python tests/ci_build/rename_whl.py wheelhouse/*.whl \
  ${BUILDKITE_COMMIT} ${WHEEL_TAG}
mv -v wheelhouse/*.whl python-package/dist/
# Make sure that libgomp.so is vendored in the wheel
$command_wrapper bash -c \
  "unzip -l python-package/dist/*.whl | grep libgomp || exit -1"

echo "--- Upload Python wheel"
buildkite-agent artifact upload "python-package/dist/*.whl"
if [[ ($is_pull_request == 0) && ($is_release_branch == 1) ]]
then
  aws s3 cp python-package/dist/*.whl s3://xgboost-nightly-builds/${BRANCH_NAME}/ \
    --acl public-read --no-progress
fi

echo "--- Stash XGBoost CLI executable"
buildkite-agent artifact upload ./xgboost
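The script above verifies that auditwheel vendored libgomp into the wheel by listing the archive with `unzip -l`. Since a wheel is just a zip archive, the same check can be sketched in pure Python; the member names used here are illustrative, not taken from an actual XGBoost wheel.

```python
# Sketch of the "unzip -l ... | grep libgomp" vendoring check: open the wheel
# as a zip archive and look for the bundled library in its member list.
import io
import zipfile


def wheel_vendors_library(wheel_bytes: bytes, library: str = "libgomp") -> bool:
    """Return True if any archive member name mentions `library`."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as whl:
        return any(library in name for name in whl.namelist())
```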
tests/buildkite/build-cpu.sh (new executable file, 33 lines)
@@ -0,0 +1,33 @@
#!/bin/bash

set -euo pipefail

echo "--- Build CPU code"

source tests/buildkite/conftest.sh

command_wrapper="tests/ci_build/ci_build.sh cpu docker"

$command_wrapper rm -fv dmlc-core/include/dmlc/build_config_default.h
# This step is not necessary, but here we include it, to ensure that
# DMLC_CORE_USE_CMAKE flag is correctly propagated. We want to make sure that we use
# the configured header build/dmlc/build_config.h instead of
# include/dmlc/build_config_default.h.
echo "--- Build libxgboost from the source"
$command_wrapper tests/ci_build/build_via_cmake.sh -DPLUGIN_DENSE_PARSER=ON
echo "--- Run Google Test"
$command_wrapper bash -c "cd build && ctest --extra-verbose"
echo "--- Stash XGBoost CLI executable"
buildkite-agent artifact upload ./xgboost

# Sanitizer test
echo "--- Run Google Test with sanitizer enabled"
$command_wrapper tests/ci_build/build_via_cmake.sh -DUSE_SANITIZER=ON \
  -DENABLED_SANITIZERS="address;leak;undefined" -DCMAKE_BUILD_TYPE=Debug \
  -DSANITIZER_PATH=/usr/lib/x86_64-linux-gnu/
CI_DOCKER_EXTRA_PARAMS_INIT="-e ASAN_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer "`
  `"-e ASAN_OPTIONS=symbolize=1 "`
  `"-e UBSAN_OPTIONS=print_stacktrace=1:log_path=ubsan_error.log "`
  `"--cap-add SYS_PTRACE" \
  $command_wrapper bash -c "cd build && ctest --exclude-regex AllTestsInDMLCUnitTests "`
  `"--extra-verbose"
tests/buildkite/build-cuda-with-rmm.sh (new executable file, 26 lines)
@@ -0,0 +1,26 @@
#!/bin/bash

set -euo pipefail

CUDA_VERSION=11.0.3

source tests/buildkite/conftest.sh

echo "--- Build with CUDA ${CUDA_VERSION}, RMM enabled"

if [[ ($is_pull_request == 1) || ($is_release_branch == 0) ]]
then
  arch_flag="-DGPU_COMPUTE_VER=75"
else
  arch_flag=""
fi

command_wrapper="tests/ci_build/ci_build.sh rmm docker --build-arg "`
  `"CUDA_VERSION_ARG=$CUDA_VERSION"

echo "--- Build libxgboost from the source"
$command_wrapper tests/ci_build/build_via_cmake.sh --conda-env=gpu_test -DUSE_CUDA=ON \
  -DUSE_NCCL=ON -DPLUGIN_RMM=ON -DBUILD_WITH_CUDA_CUB=ON ${arch_flag}

echo "--- Stash C++ test executable (testxgboost)"
buildkite-agent artifact upload build/testxgboost
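The `arch_flag` conditional above reappears in several of the new build scripts: pull requests and non-release branches compile CUDA code for compute capability 7.5 only, presumably to keep CI turnaround short, while release builds leave the flag empty so CMake targets its default architecture set. Restated as a small Python sketch for reference:

```python
def gpu_arch_flag(is_pull_request: bool, is_release_branch: bool) -> str:
    """Mirror of the arch_flag conditional in the BuildKite build scripts."""
    if is_pull_request or not is_release_branch:
        return "-DGPU_COMPUTE_VER=75"  # fast CI build: single architecture
    return ""  # release build: CMake's default (full) architecture set
```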
tests/buildkite/build-cuda.sh (new executable file, 52 lines)
@@ -0,0 +1,52 @@
#!/bin/bash

set -euo pipefail

CUDA_VERSION=11.0.3
WHEEL_TAG=manylinux2014_x86_64

echo "--- Build with CUDA ${CUDA_VERSION}"

source tests/buildkite/conftest.sh

if [[ ($is_pull_request == 1) || ($is_release_branch == 0) ]]
then
  arch_flag="-DGPU_COMPUTE_VER=75"
else
  arch_flag=""
fi

command_wrapper="tests/ci_build/ci_build.sh gpu_build_centos7 docker --build-arg "`
  `"CUDA_VERSION_ARG=$CUDA_VERSION"

echo "--- Build libxgboost from the source"
$command_wrapper tests/ci_build/prune_libnccl.sh
$command_wrapper tests/ci_build/build_via_cmake.sh -DUSE_CUDA=ON -DUSE_NCCL=ON \
  -DUSE_OPENMP=ON -DHIDE_CXX_SYMBOLS=ON -DUSE_NCCL_LIB_PATH=ON \
  -DNCCL_INCLUDE_DIR=/usr/include -DNCCL_LIBRARY=/workspace/libnccl_static.a \
  ${arch_flag}
echo "--- Build binary wheel"
$command_wrapper bash -c \
  "cd python-package && rm -rf dist/* && python setup.py bdist_wheel --universal"
$command_wrapper python tests/ci_build/rename_whl.py python-package/dist/*.whl \
  ${BUILDKITE_COMMIT} ${WHEEL_TAG}

echo "--- Audit binary wheel to ensure it's compliant with manylinux2014 standard"
tests/ci_build/ci_build.sh auditwheel_x86_64 docker auditwheel repair \
  --plat ${WHEEL_TAG} python-package/dist/*.whl
$command_wrapper python tests/ci_build/rename_whl.py wheelhouse/*.whl \
  ${BUILDKITE_COMMIT} ${WHEEL_TAG}
mv -v wheelhouse/*.whl python-package/dist/
# Make sure that libgomp.so is vendored in the wheel
tests/ci_build/ci_build.sh auditwheel_x86_64 docker bash -c \
  "unzip -l python-package/dist/*.whl | grep libgomp || exit -1"

echo "--- Upload Python wheel"
buildkite-agent artifact upload python-package/dist/*.whl
if [[ ($is_pull_request == 0) && ($is_release_branch == 1) ]]
then
  aws s3 cp python-package/dist/*.whl s3://xgboost-nightly-builds/${BRANCH_NAME}/ \
    --acl public-read --no-progress
fi
echo "--- Stash C++ test executable (testxgboost)"
buildkite-agent artifact upload build/testxgboost
tests/buildkite/build-gpu-rpkg.sh (new executable file, 22 lines)
@@ -0,0 +1,22 @@
#!/bin/bash

set -euo pipefail

CUDA_VERSION=11.0.3

source tests/buildkite/conftest.sh

echo "--- Build XGBoost R package with CUDA"

if [[ ($is_pull_request == 0) && ($is_release_branch == 1) ]]
then

  tests/ci_build/ci_build.sh gpu_build_r_centos7 docker \
    --build-arg CUDA_VERSION_ARG=${CUDA_VERSION} tests/ci_build/build_r_pkg_with_cuda.sh \
    ${BUILDKITE_COMMIT}

  echo "--- Upload R tarball"
  aws s3 cp xgboost_r_gpu_linux_*.tar.gz s3://xgboost-nightly-builds/${BRANCH_NAME}/ \
    --acl public-read --no-progress

fi
tests/buildkite/build-jvm-doc.sh (new executable file, 14 lines)
@@ -0,0 +1,14 @@
#!/bin/bash

set -euo pipefail

source tests/buildkite/conftest.sh

echo "--- Build JVM packages doc"
tests/ci_build/ci_build.sh jvm docker tests/ci_build/build_jvm_doc.sh ${BRANCH_NAME}
if [[ ($is_pull_request == 0) && ($is_release_branch == 1) ]]
then
  echo "--- Upload JVM packages doc"
  aws s3 cp jvm-packages/${BRANCH_NAME}.tar.bz2 \
    s3://xgboost-docs/${BRANCH_NAME}.tar.bz2 --acl public-read --no-progress
fi
tests/buildkite/build-jvm-packages-gpu.sh (new executable file, 21 lines)
@@ -0,0 +1,21 @@
#!/bin/bash

set -euo pipefail

SPARK_VERSION=3.0.1
CUDA_VERSION=11.0.3

source tests/buildkite/conftest.sh

echo "--- Build XGBoost JVM packages with CUDA"

if [[ ($is_pull_request == 1) || ($is_release_branch == 0) ]]
then
  arch_flag="-DGPU_COMPUTE_VER=75"
else
  arch_flag=""
fi

tests/ci_build/ci_build.sh jvm_gpu_build nvidia-docker \
  --build-arg CUDA_VERSION_ARG=${CUDA_VERSION} tests/ci_build/build_jvm_packages.sh \
  ${SPARK_VERSION} -Duse.cuda=ON ${arch_flag}
tests/buildkite/build-jvm-packages.sh (new executable file, 16 lines)
@@ -0,0 +1,16 @@
#!/bin/bash

set -euo pipefail

SPARK_VERSION=3.0.1

source tests/buildkite/conftest.sh

echo "--- Build XGBoost JVM packages"
tests/ci_build/ci_build.sh jvm docker tests/ci_build/build_jvm_packages.sh \
  ${SPARK_VERSION}

echo "--- Stash XGBoost4J JARs"
buildkite-agent artifact upload "jvm-packages/xgboost4j/target/*.jar"
buildkite-agent artifact upload "jvm-packages/xgboost4j-spark/target/*.jar"
buildkite-agent artifact upload "jvm-packages/xgboost4j-example/target/*.jar"
tests/buildkite/build-rpkg-win64-gpu.ps1 (new file, 21 lines)
@@ -0,0 +1,21 @@
$ErrorActionPreference = "Stop"

. tests/buildkite/conftest.ps1

Write-Host "--- Build XGBoost R package with CUDA"

nvcc --version
if ( $is_release_branch -eq 1 ) {
  $arch_flag = "-DGPU_COMPUTE_VER=75"

  bash tests/ci_build/build_r_pkg_with_cuda_win64.sh $Env:BUILDKITE_COMMIT
  if ($LASTEXITCODE -ne 0) { throw "Last command failed" }

  Write-Host "--- Upload R tarball"
  Get-ChildItem . -Filter xgboost_r_gpu_win64_*.tar.gz |
  Foreach-Object {
    & aws s3 cp $_ s3://xgboost-nightly-builds/$Env:BUILDKITE_BRANCH/ `
      --acl public-read --no-progress
    if ($LASTEXITCODE -ne 0) { throw "Last command failed" }
  }
}
tests/buildkite/build-win64-gpu.ps1 (new file, 58 lines)
@@ -0,0 +1,58 @@
$ErrorActionPreference = "Stop"

. tests/buildkite/conftest.ps1

Write-Host "--- Build libxgboost on Windows with CUDA"

nvcc --version
if ( $is_release_branch -eq 0 ) {
  $arch_flag = "-DGPU_COMPUTE_VER=75"
} else {
  $arch_flag = ""
}
mkdir build
cd build
cmake .. -G"Visual Studio 15 2017 Win64" -DUSE_CUDA=ON -DCMAKE_VERBOSE_MAKEFILE=ON `
  -DGOOGLE_TEST=ON -DUSE_DMLC_GTEST=ON -DCMAKE_UNITY_BUILD=ON ${arch_flag}
$msbuild = -join @(
  "C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\MSBuild\\15.0"
  "\\Bin\\MSBuild.exe"
)
& $msbuild xgboost.sln /m /p:Configuration=Release /nodeReuse:false
if ($LASTEXITCODE -ne 0) { throw "Last command failed" }

Write-Host "--- Build binary wheel"
cd ../python-package
conda activate
& python setup.py bdist_wheel --universal
Get-ChildItem . -Filter dist/*.whl |
Foreach-Object {
  & python ../tests/ci_build/rename_whl.py $_.FullName $Env:BUILDKITE_COMMIT win_amd64
  if ($LASTEXITCODE -ne 0) { throw "Last command failed" }
}

Write-Host "--- Insert vcomp140.dll (OpenMP runtime) into the wheel"
cd dist
Copy-Item -Path ../../tests/ci_build/insert_vcomp140.py -Destination .
& python insert_vcomp140.py *.whl
if ($LASTEXITCODE -ne 0) { throw "Last command failed" }

Write-Host "--- Upload Python wheel"
cd ../..
Get-ChildItem . -Filter python-package/dist/*.whl |
Foreach-Object {
  & buildkite-agent artifact upload python-package/dist/$_
  if ($LASTEXITCODE -ne 0) { throw "Last command failed" }
}
if ( $is_release_branch -eq 1 ) {
  Get-ChildItem . -Filter python-package/dist/*.whl |
  Foreach-Object {
    & aws s3 cp python-package/dist/$_ s3://xgboost-nightly-builds/$Env:BUILDKITE_BRANCH/ `
      --acl public-read --no-progress
    if ($LASTEXITCODE -ne 0) { throw "Last command failed" }
  }
}

Write-Host "--- Stash C++ test executables"
& buildkite-agent artifact upload build/testxgboost.exe
& buildkite-agent artifact upload xgboost.exe
tests/buildkite/conftest.ps1 (new file, 11 lines)
@@ -0,0 +1,11 @@
if ( $Env:BUILDKITE_PULL_REQUEST -and ($Env:BUILDKITE_PULL_REQUEST -ne "false") ) {
  $is_pull_request = 1
} else {
  $is_pull_request = 0
}

if ( ($Env:BUILDKITE_BRANCH -eq "master") -or ($Env:BUILDKITE_BRANCH -match "release_.+") ) {
  $is_release_branch = 1
} else {
  $is_release_branch = 0
}
tests/buildkite/conftest.sh (new executable file, 22 lines)
@@ -0,0 +1,22 @@
#!/bin/bash

set -euo pipefail
set -x

if [[ -n $BUILDKITE_PULL_REQUEST && $BUILDKITE_PULL_REQUEST != "false" ]]
then
  is_pull_request=1
  export BRANCH_NAME=PR-$BUILDKITE_PULL_REQUEST
else
  is_pull_request=0
  export BRANCH_NAME=$BUILDKITE_BRANCH
fi

if [[ $BUILDKITE_BRANCH == "master" || $BUILDKITE_BRANCH == "release_"* ]]
then
  is_release_branch=1
else
  is_release_branch=0
fi

set +x
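conftest.sh derives three values from two BuildKite environment variables: `is_pull_request`, `BRANCH_NAME`, and `is_release_branch`; every other build script sources it and branches on them. For reference, the same rules rendered as a Python sketch:

```python
def classify_build(buildkite_pull_request: str, buildkite_branch: str):
    """Python rendering of the classification logic in conftest.sh."""
    # BuildKite sets BUILDKITE_PULL_REQUEST to the PR number, or "false"
    is_pull_request = bool(buildkite_pull_request) and buildkite_pull_request != "false"
    branch_name = (
        f"PR-{buildkite_pull_request}" if is_pull_request else buildkite_branch
    )
    is_release_branch = (
        buildkite_branch == "master" or buildkite_branch.startswith("release_")
    )
    return is_pull_request, is_release_branch, branch_name
```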
tests/buildkite/deploy-jvm-packages.sh (new executable file, 16 lines)
@@ -0,0 +1,16 @@
#!/bin/bash

set -euo pipefail

SPARK_VERSION=3.0.1
CUDA_VERSION=11.0.3

source tests/buildkite/conftest.sh

if [[ ($is_pull_request == 0) && ($is_release_branch == 1) ]]
then
  echo "--- Deploy JVM packages to xgboost-maven-repo S3 repo"
  tests/ci_build/ci_build.sh jvm_gpu_build docker \
    --build-arg CUDA_VERSION_ARG=${CUDA_VERSION} \
    tests/ci_build/deploy_jvm_packages.sh ${SPARK_VERSION}
fi
tests/buildkite/infrastructure/aws-stack-creator/create_stack.py (new file, 101 lines)
@@ -0,0 +1,101 @@
import argparse
import copy

import boto3

from metadata import AMI_ID, COMMON_STACK_PARAMS, STACK_PARAMS

TEMPLATE_URL = "https://s3.amazonaws.com/buildkite-aws-stack/latest/aws-stack.yml"


def get_availability_zones(*, aws_region):
    client = boto3.client("ec2", region_name=aws_region)
    r = client.describe_availability_zones(
        Filters=[
            {"Name": "region-name", "Values": [aws_region]},
            {"Name": "zone-type", "Values": ["availability-zone"]},
        ]
    )
    return sorted([x["ZoneName"] for x in r["AvailabilityZones"]])


def get_default_vpc(*, aws_region):
    ec2 = boto3.resource("ec2", region_name=aws_region)
    for x in ec2.vpcs.filter(Filters=[{"Name": "is-default", "Values": ["true"]}]):
        return x

    # Create the default VPC if it does not exist
    client = boto3.client("ec2", region_name=aws_region)
    r = client.create_default_vpc()
    default_vpc_id = r["Vpc"]["VpcId"]

    return ec2.Vpc(default_vpc_id)


def format_params(args, *, stack_id):
    default_vpc = get_default_vpc(aws_region=args.aws_region)
    azs = get_availability_zones(aws_region=args.aws_region)
    # For each of the first two availability zones (AZs), choose the default subnet
    subnets = [
        x.id
        for x in default_vpc.subnets.filter(
            Filters=[
                {"Name": "default-for-az", "Values": ["true"]},
                {"Name": "availability-zone", "Values": azs[:2]},
            ]
        )
    ]
    assert len(subnets) == 2

    params = copy.deepcopy(STACK_PARAMS[stack_id])
    params["ImageId"] = AMI_ID[stack_id][args.aws_region]
    params["BuildkiteQueue"] = stack_id
    params["CostAllocationTagValue"] = f"buildkite-{stack_id}"
    params["BuildkiteAgentToken"] = args.agent_token
    params["VpcId"] = default_vpc.id
    params["Subnets"] = ",".join(subnets)
    params.update(COMMON_STACK_PARAMS)
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in params.items()]


def get_full_stack_id(stack_id):
    return f"buildkite-{stack_id}-autoscaling-group"


def main(args):
    client = boto3.client("cloudformation", region_name=args.aws_region)

    for stack_id in AMI_ID:
        stack_id_full = get_full_stack_id(stack_id)
        print(f"Creating elastic CI stack {stack_id_full}...")

        params = format_params(args, stack_id=stack_id)

        response = client.create_stack(
            StackName=stack_id_full,
            TemplateURL=TEMPLATE_URL,
            Capabilities=[
                "CAPABILITY_IAM",
                "CAPABILITY_NAMED_IAM",
                "CAPABILITY_AUTO_EXPAND",
            ],
            OnFailure="ROLLBACK",
            EnableTerminationProtection=False,
            Parameters=params,
        )
        print(f"CI stack {stack_id_full} is in progress in the background")

    for stack_id in AMI_ID:
        stack_id_full = get_full_stack_id(stack_id)
        waiter = client.get_waiter("stack_create_complete")
        waiter.wait(StackName=stack_id_full)
        print(f"CI stack {stack_id_full} is now finished.")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--aws-region", type=str, required=True)
    parser.add_argument("--agent-token", type=str, required=True)
    args = parser.parse_args()
    main(args)
tests/buildkite/infrastructure/aws-stack-creator/metadata.py (new file, 101 lines)
@@ -0,0 +1,101 @@
AMI_ID = {
    # Managed by XGBoost team
    "linux-amd64-gpu": {
        "us-west-2": "ami-00ed92bd37f77bc33",
    },
    "linux-amd64-mgpu": {
        "us-west-2": "ami-00ed92bd37f77bc33",
    },
    "windows-gpu": {
        "us-west-2": "ami-0a1a2ea551a07ad5f",
    },
    # Managed by BuildKite
    "linux-amd64-cpu": {
        "us-west-2": "ami-075d4c25d5f0c17c1",
    },
    "pipeline-loader": {
        "us-west-2": "ami-075d4c25d5f0c17c1",
    },
    "linux-arm64-cpu": {
        "us-west-2": "ami-0952c6fb6db9a9891",
    },
}

STACK_PARAMS = {
    "linux-amd64-gpu": {
        "InstanceOperatingSystem": "linux",
        "InstanceType": "g4dn.xlarge",
        "AgentsPerInstance": "1",
        "MinSize": "0",
        "MaxSize": "8",
        "OnDemandPercentage": "100",
        "ScaleOutFactor": "1.0",
        "ScaleInIdlePeriod": "60",  # in seconds
    },
    "linux-amd64-mgpu": {
        "InstanceOperatingSystem": "linux",
        "InstanceType": "g4dn.12xlarge",
        "AgentsPerInstance": "1",
        "MinSize": "0",
        "MaxSize": "4",
        "OnDemandPercentage": "100",
        "ScaleOutFactor": "1.0",
        "ScaleInIdlePeriod": "60",  # in seconds
    },
    "windows-gpu": {
        "InstanceOperatingSystem": "windows",
        "InstanceType": "g4dn.2xlarge",
        "AgentsPerInstance": "1",
        "MinSize": "0",
        "MaxSize": "2",
        "OnDemandPercentage": "100",
        "ScaleOutFactor": "1.0",
        "ScaleInIdlePeriod": "600",  # in seconds
    },
    "linux-amd64-cpu": {
        "InstanceOperatingSystem": "linux",
        "InstanceType": "c5a.4xlarge",
        "AgentsPerInstance": "1",
        "MinSize": "0",
        "MaxSize": "16",
        "OnDemandPercentage": "100",
        "ScaleOutFactor": "1.0",
        "ScaleInIdlePeriod": "60",  # in seconds
    },
    "pipeline-loader": {
        "InstanceOperatingSystem": "linux",
        "InstanceType": "t3a.micro",
        "AgentsPerInstance": "1",
        "MinSize": "1",
        "MaxSize": "1",
        "OnDemandPercentage": "100",
        "ScaleOutFactor": "1.0",
        "ScaleInIdlePeriod": "60",  # in seconds
    },
    "linux-arm64-cpu": {
        "InstanceOperatingSystem": "linux",
        "InstanceType": "c6g.4xlarge",
        "AgentsPerInstance": "1",
        "MinSize": "0",
        "MaxSize": "8",
        "OnDemandPercentage": "100",
        "ScaleOutFactor": "1.0",
        "ScaleInIdlePeriod": "60",  # in seconds
    },
}

COMMON_STACK_PARAMS = {
    "BuildkiteAgentTimestampLines": "false",
    "BuildkiteWindowsAdministrator": "true",
    "AssociatePublicIpAddress": "true",
    "ScaleOutForWaitingJobs": "false",
    "EnableCostAllocationTags": "true",
    "CostAllocationTagName": "CreatedBy",
    "ECRAccessPolicy": "full",
    "ManagedPolicyARN": "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "EnableSecretsPlugin": "false",
    "EnableECRPlugin": "false",
    "EnableDockerLoginPlugin": "false",
    "EnableDockerUserNamespaceRemap": "false",
    "BuildkiteAgentExperiments": "normalised-upload-paths,resolve-commit-after-checkout",
}
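These dictionaries are consumed by `create_stack.py` earlier in this diff, which merges each stack's parameters with `COMMON_STACK_PARAMS` and reshapes them into the `[{"ParameterKey": ..., "ParameterValue": ...}]` list that CloudFormation's `create_stack` API expects. A minimal sketch of that reshaping:

```python
def to_cfn_parameters(stack_params: dict, common_params: dict) -> list:
    """Merge per-stack and common parameters into CloudFormation's shape."""
    merged = {**stack_params, **common_params}  # common values win, as in create_stack.py
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in merged.items()]
```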
tests/buildkite/infrastructure/requirements.txt (new file, 2 lines)
@@ -0,0 +1,2 @@
boto3
cfn_tools
@@ -0,0 +1,44 @@
import argparse
import os

import boto3

current_dir = os.path.dirname(__file__)


def main(args):
    with open(
        os.path.join(current_dir, "service-user-template.yml"), encoding="utf-8"
    ) as f:
        service_user_template = f.read()

    stack_id = "buildkite-elastic-ci-stack-service-user"

    print("Create a new IAM user with suitable permissions...")
    client = boto3.client("cloudformation", region_name=args.aws_region)
    response = client.create_stack(
        StackName=stack_id,
        TemplateBody=service_user_template,
        Capabilities=[
            "CAPABILITY_IAM",
            "CAPABILITY_NAMED_IAM",
        ],
        Parameters=[{"ParameterKey": "UserName", "ParameterValue": args.user_name}],
    )
    waiter = client.get_waiter("stack_create_complete")
    waiter.wait(StackName=stack_id)
    user = boto3.resource("iam", region_name=args.aws_region).User(args.user_name)
    key_pair = user.create_access_key_pair()
    print("Finished creating an IAM user with suitable permissions.")
    print(f"Access Key ID: {key_pair.access_key_id}")
    print(f"Secret Access Key: {key_pair.secret_access_key}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--aws-region", type=str, required=True)
    parser.add_argument(
        "--user-name", type=str, default="buildkite-elastic-ci-stack-user"
    )
    args = parser.parse_args()
    main(args)
@@ -0,0 +1,349 @@
---
AWSTemplateFormatVersion: "2010-09-09"
Description: "Buildkite Elastic CI Stack CloudFormation service user"

Parameters:
  UserName:
    Type: String
    Default: buildkite-elastic-ci-stack-user
    Description: Name of user to create

Outputs:
  UserNameOutput:
    Value: !Ref CloudFormationServiceUser
  UserArnOutput:
    Value: !GetAtt CloudFormationServiceUser.Arn

Resources:
  CloudFormationServiceUser:
    Type: AWS::IAM::User
    Properties:
      ManagedPolicyArns:
        - !Ref SubstackCrudPolicy
        - !Ref CrudPolicy
        - !Ref ImageBuilderPolicy
      UserName: !Ref UserName

  SubstackCrudPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      PolicyDocument:
        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Action": "cloudformation:*",
              "Resource": "*"
            },
            {
              "Effect": "Allow",
              "Action": [
                "serverlessrepo:GetApplication",
                "serverlessrepo:GetCloudFormationTemplate",
                "serverlessrepo:CreateCloudFormationTemplate"
              ],
              "Resource": "*"
            }
          ]
        }

  CrudPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      PolicyDocument:
        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Action": [
                "ec2:DescribeAccountAttributes",
                "ec2:DescribeAvailabilityZones",
                "ec2:DescribeInstances",
                "ec2:DescribeInternetGateways",
                "ec2:DescribeLaunchTemplateVersions",
                "ec2:DescribeLaunchTemplates",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeRouteTables",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcs",
                "ec2:CreateTags"
              ],
              "Resource": "*"
            },
            {
              "Effect": "Allow",
              "Action": [
                "ec2:CreateInternetGateway",
                "ec2:AttachInternetGateway",
                "ec2:DetachInternetGateway",
                "ec2:DeleteInternetGateway"
              ],
              "Resource": "arn:aws:ec2:*:*:internet-gateway/*"
            },
            {
              "Effect": "Allow",
              "Action": [
                "ec2:CreateLaunchTemplate",
                "ec2:CreateLaunchTemplateVersion",
                "ec2:DeleteLaunchTemplate"
              ],
              "Resource": "arn:aws:ec2:*:*:launch-template/*"
            },
            {
              "Effect": "Allow",
              "Action": [
                "ec2:AssociateRouteTable",
                "ec2:DisassociateRouteTable",
                "ec2:CreateRoute",
                "ec2:CreateRouteTable",
                "ec2:DeleteRoute",
                "ec2:DeleteRouteTable"
              ],
              "Resource": "arn:aws:ec2:*:*:route-table/*"
            },
            {
              "Effect": "Allow",
              "Action": [
                "ec2:AuthorizeSecurityGroupIngress",
                "ec2:RevokeSecurityGroupIngress",
                "ec2:CreateSecurityGroup",
                "ec2:DeleteSecurityGroup"
              ],
              "Resource": "arn:aws:ec2:*:*:security-group/*"
            },
            {
              "Effect": "Allow",
              "Action": "ec2:RunInstances",
              "Resource": "*"
            },
            {
              "Effect": "Allow",
              "Action": [
                "ec2:CreateSubnet",
                "ec2:DeleteSubnet",
                "ec2:AssociateRouteTable",
                "ec2:DisassociateRouteTable"
              ],
              "Resource": "arn:aws:ec2:*:*:subnet/*"
            },
            {
              "Effect": "Allow",
              "Action": [
                "ec2:CreateVpc",
                "ec2:CreateSecurityGroup",
                "ec2:ModifyVpcAttribute",
                "ec2:AttachInternetGateway",
                "ec2:DetachInternetGateway",
                "ec2:CreateSubnet",
                "ec2:CreateRouteTable",
                "ec2:DeleteVpc"
              ],
              "Resource": "arn:aws:ec2:*:*:vpc/*"
            },
            {
              "Effect": "Allow",
              "Action": [
                "ec2:CreateDefaultVpc",
                "ec2:CreateDefaultSubnet"
              ],
              "Resource": "*"
            },
            {
              "Effect": "Allow",
              "Action": [
                "iam:CreateInstanceProfile",
                "iam:GetInstanceProfile",
                "iam:AddRoleToInstanceProfile",
                "iam:RemoveRoleFromInstanceProfile",
                "iam:DeleteInstanceProfile"
              ],
              "Resource": "arn:aws:iam::*:instance-profile/*"
            },
            {
              "Effect": "Allow",
              "Action": [
                "kms:DescribeKey",
                "kms:CreateGrant",
                "kms:Decrypt",
                "kms:Encrypt"
              ],
              "Resource": "arn:aws:kms:*:*:key/*"
            },
            {
              "Effect": "Allow",
              "Action": [
                "lambda:CreateFunction",
                "lambda:GetFunction",
                "lambda:GetFunctionCodeSigningConfig",
                "lambda:AddPermission",
                "lambda:RemovePermission",
                "lambda:DeleteFunction",
                "lambda:InvokeFunction",
                "lambda:TagResource"
              ],
              "Resource": "arn:aws:lambda:*:*:function:*"
            },
            {
              "Effect": "Allow",
              "Action": [
                "logs:CreateLogGroup",
                "logs:PutRetentionPolicy",
                "logs:DeleteLogGroup"
              ],
              "Resource": "arn:aws:logs:*:*:log-group:*"
            },
            {
              "Effect": "Allow",
              "Action": [
                "s3:GetObject",
                "s3:CreateBucket",
                "s3:PutBucketAcl",
                "s3:PutBucketLogging",
                "s3:PutBucketTagging",
                "s3:PutBucketVersioning"
              ],
              "Resource": "arn:aws:s3:::*"
            },
            {
              "Effect": "Allow",
              "Action": [
                "ssm:GetParameter",
                "ssm:PutParameter",
                "ssm:DeleteParameter"
              ],
              "Resource": "arn:aws:ssm:*:*:parameter/*"
            },
            {
              "Effect": "Allow",
              "Action": [
                "iam:ListPolicies",
                "iam:ListInstanceProfiles",
                "iam:ListRoles",
                "iam:ListPolicyVersions",
                "iam:ListRolePolicies",
                "iam:ListAttachedRolePolicies",
                "iam:ListInstanceProfileTags",
                "iam:ListRoleTags",
                "iam:ListInstanceProfilesForRole",
                "iam:GetPolicyVersion",
                "iam:GetPolicy",
                "iam:GetInstanceProfile",
                "iam:GetRole",
                "iam:GetRolePolicy",
                "iam:TagPolicy",
                "iam:UntagPolicy",
                "iam:TagInstanceProfile",
                "iam:UntagInstanceProfile",
                "iam:TagRole",
                "iam:UntagRole",
                "iam:CreateRole",
                "iam:PassRole",
                "iam:DeleteRole",
                "iam:UpdateRoleDescription",
                "iam:UpdateRole",
                "iam:AddRoleToInstanceProfile",
                "iam:RemoveRoleFromInstanceProfile",
                "iam:CreateInstanceProfile",
                "iam:DeleteInstanceProfile",
                "iam:DetachRolePolicy",
                "iam:SetDefaultPolicyVersion",
                "iam:AttachRolePolicy",
                "iam:UpdateAssumeRolePolicy",
                "iam:PutRolePermissionsBoundary",
                "iam:DeleteRolePermissionsBoundary",
                "iam:CreatePolicy",
                "iam:DeletePolicyVersion",
                "iam:DeletePolicy",
                "iam:PutRolePolicy",
                "iam:DeleteRolePolicy"
              ],
              "Resource": "*"
            },
            {
              "Effect": "Allow",
              "Action": [
                "autoscaling:DescribeLifecycleHookTypes",
                "autoscaling:DescribeTerminationPolicyTypes",
                "autoscaling:DescribePolicies",
                "autoscaling:DescribeWarmPool",
                "autoscaling:DescribeScalingActivities",
                "autoscaling:DescribeScalingProcessTypes",
                "autoscaling:DescribeScheduledActions",
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeLifecycleHooks",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:PutLifecycleHook",
                "autoscaling:DeleteLifecycleHook",
                "autoscaling:SetInstanceProtection",
                "autoscaling:CreateAutoScalingGroup",
                "autoscaling:EnableMetricsCollection",
                "autoscaling:UpdateAutoScalingGroup",
                "autoscaling:DeleteAutoScalingGroup",
                "autoscaling:PutScalingPolicy",
                "autoscaling:DeletePolicy",
                "autoscaling:BatchPutScheduledUpdateGroupAction",
                "autoscaling:PutScheduledUpdateGroupAction",
                "autoscaling:DeleteScheduledAction",
                "autoscaling:PutWarmPool",
                "autoscaling:DeleteWarmPool",
                "autoscaling:TerminateInstanceInAutoScalingGroup",
                "autoscaling:AttachInstances"
              ],
              "Resource": "*"
            },
            {
              "Effect": "Allow",
              "Action": [
                "events:DescribeRule",
                "events:PutRule",
                "events:PutTargets",
                "events:RemoveTargets",
                "events:DeleteRule"
              ],
              "Resource": "arn:aws:events:*:*:rule/*"
            }
          ]
        }

  ImageBuilderPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      PolicyDocument:
        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Action": [
                "imagebuilder:CreateComponent",
                "imagebuilder:GetComponent",
                "imagebuilder:DeleteComponent",
                "imagebuilder:CreateImageRecipe",
                "imagebuilder:GetImageRecipe",
                "imagebuilder:DeleteImageRecipe",
                "imagebuilder:CreateImagePipeline",
                "imagebuilder:GetImagePipeline",
                "imagebuilder:DeleteImagePipeline",
                "imagebuilder:CreateInfrastructureConfiguration",
                "imagebuilder:GetInfrastructureConfiguration",
                "imagebuilder:DeleteInfrastructureConfiguration",
                "imagebuilder:CreateDistributionConfiguration",
                "imagebuilder:GetDistributionConfiguration",
                "imagebuilder:DeleteDistributionConfiguration",
                "imagebuilder:TagResource",
                "imagebuilder:StartImagePipelineExecution",
                "ec2:DescribeImages",
                "ec2:DescribeSnapshots",
                "ec2:DescribeRegions",
                "ec2:DescribeVolumes",
                "ec2:DescribeKeyPairs",
                "ec2:DescribeInstanceTypeOfferings"
              ],
              "Resource": "*"
            }
          ]
        }
@@ -0,0 +1,86 @@
import argparse
import copy
import json
import os
from urllib.request import urlopen

import boto3
import cfn_flip
from metadata import IMAGE_PARAMS

current_dir = os.path.dirname(__file__)

BUILDKITE_CF_TEMPLATE_URL = (
    "https://s3.amazonaws.com/buildkite-aws-stack/latest/aws-stack.yml"
)


def format_params(*, stack_id, aws_region, ami_mapping):
    params = copy.deepcopy(IMAGE_PARAMS[stack_id])
    with open(
        os.path.join(current_dir, params["BootstrapScript"]),
        encoding="utf-8",
    ) as f:
        bootstrap_script = f.read()
    params["BaseImageId"] = ami_mapping[aws_region][params["BaseImageId"]]
    params["BootstrapScript"] = bootstrap_script
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in params.items()]


def get_ami_mapping():
    with urlopen(BUILDKITE_CF_TEMPLATE_URL) as response:
        buildkite_cf_template = response.read().decode("utf-8")
    cfn_obj = json.loads(cfn_flip.to_json(buildkite_cf_template))
    return cfn_obj["Mappings"]["AWSRegion2AMI"]


def get_full_stack_id(stack_id):
    return f"buildkite-{stack_id}-worker"


def main(args):
    with open(
        os.path.join(current_dir, "ec2-image-builder-pipeline-template.yml"),
        encoding="utf-8",
    ) as f:
        ec2_image_pipeline_template = f.read()

    ami_mapping = get_ami_mapping()

    for stack_id in IMAGE_PARAMS:
        stack_id_full = get_full_stack_id(stack_id)
        print(f"Creating EC2 image builder stack {stack_id_full}...")

        params = format_params(
            stack_id=stack_id, aws_region=args.aws_region, ami_mapping=ami_mapping
        )

        client = boto3.client("cloudformation", region_name=args.aws_region)
        response = client.create_stack(
            StackName=stack_id_full,
            TemplateBody=ec2_image_pipeline_template,
            Capabilities=[
                "CAPABILITY_IAM",
                "CAPABILITY_NAMED_IAM",
                "CAPABILITY_AUTO_EXPAND",
            ],
            OnFailure="ROLLBACK",
            EnableTerminationProtection=False,
            Parameters=params,
        )
        print(
            f"EC2 image builder stack {stack_id_full} is in progress in the background"
        )

    for stack_id in IMAGE_PARAMS:
        stack_id_full = get_full_stack_id(stack_id)
        waiter = client.get_waiter("stack_create_complete")
        waiter.wait(StackName=stack_id_full)
        print(f"EC2 image builder stack {stack_id_full} is now finished.")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--aws-region", type=str, required=True)
    args = parser.parse_args()
    main(args)
@@ -0,0 +1,108 @@
---
AWSTemplateFormatVersion: "2010-09-09"
Description: "EC2 Image Builder pipelines to build workers"

Parameters:
  BaseImageId:
    Type: String
    Description: Base AMI to build a new image on top of.

  BootstrapScript:
    Type: String
    Description: Content of AMI customization script

  InstanceType:
    Type: String
    Description: Instance type for the Image Builder instances.

  InstanceOperatingSystem:
    Type: String
    Description: The operating system to run on the instance
    AllowedValues:
      - Linux
      - Windows
    Default: "Linux"

  VolumeSize:
    Type: Number
    Description: Size of EBS volume, in GiBs

Conditions:
  IsInstanceWindows:
    !Equals [ !Ref InstanceOperatingSystem, "Windows" ]

Resources:
  # IAM role for the image builder instance
  InstanceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Principal:
              Service: "ec2.amazonaws.com"
            Action: "sts:AssumeRole"
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
        - arn:aws:iam::aws:policy/EC2InstanceProfileForImageBuilder
        - arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

  InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Roles:
        - !Ref InstanceRole

  # Component that runs the bootstrap script
  BootstrapComponent:
    Type: AWS::ImageBuilder::Component
    Properties:
      Name: !Sub "${AWS::StackName}-bootstrap-component"
      Platform: !Ref InstanceOperatingSystem
      Version: "1.0.0"
      Description: Execute a bootstrap script.
      Data: !Ref BootstrapScript

  Recipe:
    Type: AWS::ImageBuilder::ImageRecipe
    Properties:
      Name: !Sub "${AWS::StackName}-image"
      Components:
        - ComponentArn: !Ref BootstrapComponent
      ParentImage: !Ref BaseImageId
      BlockDeviceMappings:
        - DeviceName: !If [IsInstanceWindows, "/dev/sda1", "/dev/xvda"]
          Ebs:
            DeleteOnTermination: true
            Encrypted: false
            VolumeSize: !Ref VolumeSize
            VolumeType: gp2
      Version: "1.0.0"

  Infrastructure:
    Type: AWS::ImageBuilder::InfrastructureConfiguration
    Properties:
      Name: !Sub "${AWS::StackName}-image-pipeline-infrastructure"
      InstanceProfileName: !Ref InstanceProfile
      InstanceTypes:
        - !Ref InstanceType
      TerminateInstanceOnFailure: true

  # Copy to this region only
  Distribution:
    Type: AWS::ImageBuilder::DistributionConfiguration
    Properties:
      Name: !Sub "${AWS::StackName}-image-pipeline-distribution-config"
      Distributions:
        - Region: !Ref AWS::Region
          AmiDistributionConfiguration: {}

  # Composition of the above elements
  Pipeline:
    Type: AWS::ImageBuilder::ImagePipeline
    Properties:
      Name: !Sub "${AWS::StackName}-image-pipeline"
      DistributionConfigurationArn: !Ref Distribution
      ImageRecipeArn: !Ref Recipe
      InfrastructureConfigurationArn: !Ref Infrastructure
@@ -0,0 +1,26 @@
name: BuildKiteLinuxAMD64GPUBootstrap
description: Set up worker image for linux-amd64-gpu pipeline
schemaVersion: 1.0

phases:
  - name: build
    steps:
      - name: SetupStep
        action: ExecuteBash
        inputs:
          commands:
            - |
              yum groupinstall -y "Development tools"
              yum install -y kernel-devel-$(uname -r)
              aws s3 cp --recursive s3://ec2-linux-nvidia-drivers/latest/ .
              chmod +x NVIDIA-Linux-x86_64*.run
              CC=/usr/bin/gcc10-cc ./NVIDIA-Linux-x86_64*.run --silent

              amazon-linux-extras install docker
              systemctl --now enable docker
              distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
                && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo \
                | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
              yum clean expire-cache
              yum install -y nvidia-docker2
              systemctl restart docker
@@ -0,0 +1,18 @@
IMAGE_PARAMS = {
    "linux-amd64-gpu": {
        "BaseImageId": "linuxamd64",
        # AMI ID is looked up from Buildkite's CloudFormation template
        "BootstrapScript": "linux-amd64-gpu-bootstrap.yml",
        "InstanceType": "g4dn.xlarge",
        "InstanceOperatingSystem": "Linux",
        "VolumeSize": "40",  # in GiBs
    },
    "windows-gpu": {
        "BaseImageId": "windows",
        # AMI ID is looked up from Buildkite's CloudFormation template
        "BootstrapScript": "windows-gpu-bootstrap.yml",
        "InstanceType": "g4dn.2xlarge",
        "InstanceOperatingSystem": "Windows",
        "VolumeSize": "80",  # in GiBs
    },
}
@@ -0,0 +1,22 @@
import argparse

import boto3
from create_worker_image_pipelines import get_full_stack_id
from metadata import IMAGE_PARAMS


def main(args):
    cf = boto3.resource("cloudformation", region_name=args.aws_region)
    builder_client = boto3.client("imagebuilder", region_name=args.aws_region)
    for stack_id in IMAGE_PARAMS:
        stack_id_full = get_full_stack_id(stack_id)
        pipeline_arn = cf.Stack(stack_id_full).Resource("Pipeline").physical_resource_id
        print(f"Running pipeline {pipeline_arn} to generate a new AMI...")
        r = builder_client.start_image_pipeline_execution(imagePipelineArn=pipeline_arn)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--aws-region", type=str, required=True)
    args = parser.parse_args()
    main(args)
@@ -0,0 +1,73 @@
name: BuildKiteWindowsGPUBootstrap
description: Set up worker image for windows-gpu pipeline
schemaVersion: 1.0

phases:
  - name: build
    steps:
      - name: SetupStep
        action: ExecutePowerShell
        inputs:
          commands:
            - |
              $ErrorActionPreference = "Stop"

              choco --version
              choco feature enable -n=allowGlobalConfirmation

              # CMake 3.18
              Write-Host '>>> Installing CMake 3.18...'
              choco install cmake --version 3.18.0 --installargs "ADD_CMAKE_TO_PATH=System"
              if ($LASTEXITCODE -ne 0) { throw "Last command failed" }

              # Notepad++
              Write-Host '>>> Installing Notepad++...'
              choco install notepadplusplus
              if ($LASTEXITCODE -ne 0) { throw "Last command failed" }

              # Miniconda
              Write-Host '>>> Installing Miniconda...'
              choco install miniconda3 /RegisterPython:1 /D:C:\tools\miniconda3
              C:\tools\miniconda3\Scripts\conda.exe init --user --system
              if ($LASTEXITCODE -ne 0) { throw "Last command failed" }
              . "C:\Windows\System32\WindowsPowerShell\v1.0\profile.ps1"
              if ($LASTEXITCODE -ne 0) { throw "Last command failed" }
              conda config --set auto_activate_base false
              conda config --prepend channels conda-forge

              # Install Java 11
              Write-Host '>>> Installing Java 11...'
              choco install openjdk11jre
              if ($LASTEXITCODE -ne 0) { throw "Last command failed" }

              # Install GraphViz
              Write-Host '>>> Installing GraphViz...'
              choco install graphviz
              if ($LASTEXITCODE -ne 0) { throw "Last command failed" }

              # Install Visual Studio Community 2017 (15.9)
              Write-Host '>>> Installing Visual Studio 2017 Community (15.9)...'
              choco install visualstudio2017community --version 15.9.23.0 `
                --params "--wait --passive --norestart"
              if ($LASTEXITCODE -ne 0) { throw "Last command failed" }
              choco install visualstudio2017-workload-nativedesktop --params `
                "--wait --passive --norestart --includeOptional"
              if ($LASTEXITCODE -ne 0) { throw "Last command failed" }

              # Install CUDA 11.0
              Write-Host '>>> Installing CUDA 11.0...'
              choco install cuda --version 11.0.3
              if ($LASTEXITCODE -ne 0) { throw "Last command failed" }

              # Install Python packages
              Write-Host '>>> Installing Python packages...'
              conda activate
              conda install -y mamba
              if ($LASTEXITCODE -ne 0) { throw "Last command failed" }

              # Install R
              Write-Host '>>> Installing R...'
              choco install r.project --version=3.6.3
              if ($LASTEXITCODE -ne 0) { throw "Last command failed" }
              choco install rtools --version=3.5.0.4
              if ($LASTEXITCODE -ne 0) { throw "Last command failed" }
22  tests/buildkite/pipeline-win64.yml  Normal file
@@ -0,0 +1,22 @@
steps:
  - block: ":rocket: Run this test job"
  #### -------- BUILD --------
  - label: ":windows: Build XGBoost for Windows with CUDA"
    command: "tests/buildkite/build-win64-gpu.ps1"
    key: build-win64-gpu
    agents:
      queue: windows-gpu
  - label: ":windows: Build XGBoost R package for Windows with CUDA"
    command: "tests/buildkite/build-rpkg-win64-gpu.ps1"
    key: build-rpkg-win64-gpu
    agents:
      queue: windows-gpu

  - wait

  #### -------- TEST --------
  - label: ":windows: Test XGBoost on Windows"
    command: "tests/buildkite/test-win64-gpu.ps1"
    key: test-win64-gpu
    agents:
      queue: windows-gpu
93  tests/buildkite/pipeline.yml  Normal file
@@ -0,0 +1,93 @@
env:
  DOCKER_CACHE_ECR_ID: "492475357299"
  DOCKER_CACHE_ECR_REGION: "us-west-2"
steps:
  - block: ":rocket: Run this test job"
  #### -------- BUILD --------
  - label: ":console: Run clang-tidy"
    command: "tests/buildkite/run-clang-tidy.sh"
    key: run-clang-tidy
    agents:
      queue: linux-amd64-cpu
  - label: ":console: Build CPU"
    command: "tests/buildkite/build-cpu.sh"
    key: build-cpu
    agents:
      queue: linux-amd64-cpu
  - label: ":console: Build CPU ARM64"
    command: "tests/buildkite/build-cpu-arm64.sh"
    key: build-cpu-arm64
    agents:
      queue: linux-arm64-cpu
  - label: ":console: Build CUDA"
    command: "tests/buildkite/build-cuda.sh"
    key: build-cuda
    agents:
      queue: linux-amd64-cpu
  - label: ":console: Build CUDA with RMM"
    command: "tests/buildkite/build-cuda-with-rmm.sh"
    key: build-cuda-with-rmm
    agents:
      queue: linux-amd64-cpu
  - label: ":console: Build R package with CUDA"
    command: "tests/buildkite/build-gpu-rpkg.sh"
    key: build-gpu-rpkg
    agents:
      queue: linux-amd64-cpu
  - label: ":console: Build JVM packages"
    command: "tests/buildkite/build-jvm-packages.sh"
    key: build-jvm-packages
    agents:
      queue: linux-amd64-cpu
  - label: ":console: Build JVM packages with CUDA"
    command: "tests/buildkite/build-jvm-packages-gpu.sh"
    key: build-jvm-packages-gpu
    agents:
      queue: linux-amd64-mgpu
  - label: ":console: Build JVM package doc"
    command: "tests/buildkite/build-jvm-doc.sh"
    key: build-jvm-doc
    agents:
      queue: linux-amd64-cpu

  - wait

  #### -------- TEST --------
  - label: ":console: Test Python package, CPU"
    command: "tests/buildkite/test-python-cpu.sh"
    key: test-python-cpu
    agents:
      queue: linux-amd64-cpu
  - label: ":console: Test Python package, CPU ARM64"
    command: "tests/buildkite/test-python-cpu-arm64.sh"
    key: test-python-cpu-arm64
    agents:
      queue: linux-arm64-cpu
  - label: ":console: Test Python package, single GPU"
    command: "tests/buildkite/test-python-gpu.sh gpu"
    key: test-python-gpu
    agents:
      queue: linux-amd64-gpu
  - label: ":console: Test Python package, 4 GPUs"
    command: "tests/buildkite/test-python-gpu.sh mgpu"
    key: test-python-mgpu
    agents:
      queue: linux-amd64-mgpu
  - label: ":console: Run Google Tests, 4 GPUs"
    command: "tests/buildkite/test-cpp-gpu.sh"
    key: test-cpp-gpu
    agents:
      queue: linux-amd64-mgpu
  - label: ":console: Run integration tests with JVM packages"
    command: "tests/buildkite/test-integration-jvm-packages.sh"
    key: test-integration-jvm-packages
    agents:
      queue: linux-amd64-cpu

  - wait

  - label: ":console: Deploy JVM packages"
    command: "tests/buildkite/deploy-jvm-packages.sh"
    key: deploy-jvm-packages
    agents:
      queue: linux-amd64-cpu
13  tests/buildkite/run-clang-tidy.sh  Executable file
@@ -0,0 +1,13 @@
#!/bin/bash

set -euo pipefail

CUDA_VERSION=11.0.3

echo "--- Run clang-tidy"

source tests/buildkite/conftest.sh

tests/ci_build/ci_build.sh clang_tidy docker \
  --build-arg CUDA_VERSION_ARG=${CUDA_VERSION} \
  python3 tests/ci_build/tidy.py --cuda-archs 75
21  tests/buildkite/test-cpp-gpu.sh  Executable file
@@ -0,0 +1,21 @@
#!/bin/bash

set -euo pipefail

CUDA_VERSION=11.0.3

source tests/buildkite/conftest.sh

echo "--- Run Google Tests with CUDA, using 4 GPUs"
buildkite-agent artifact download "build/testxgboost" . --step build-cuda
chmod +x build/testxgboost
tests/ci_build/ci_build.sh gpu nvidia-docker \
  --build-arg CUDA_VERSION_ARG=$CUDA_VERSION build/testxgboost

echo "--- Run Google Tests with CUDA, using 4 GPUs, RMM enabled"
rm -rfv build/
buildkite-agent artifact download "build/testxgboost" . --step build-cuda-with-rmm
chmod +x build/testxgboost
tests/ci_build/ci_build.sh rmm nvidia-docker \
  --build-arg CUDA_VERSION_ARG=$CUDA_VERSION bash -c \
  "source activate gpu_test && build/testxgboost --use-rmm-pool"
16  tests/buildkite/test-integration-jvm-packages.sh  Executable file
@@ -0,0 +1,16 @@
#!/bin/bash

set -euo pipefail

JDK_VERSION=8
SPARK_VERSION=3.0.1

source tests/buildkite/conftest.sh

echo "--- Test XGBoost4J on a machine with JDK ${JDK_VERSION}, Spark ${SPARK_VERSION}"
buildkite-agent artifact download "jvm-packages/xgboost4j/target/*.jar" . --step build-jvm-packages
buildkite-agent artifact download "jvm-packages/xgboost4j-spark/target/*.jar" . --step build-jvm-packages
buildkite-agent artifact download "jvm-packages/xgboost4j-example/target/*.jar" . --step build-jvm-packages
export CI_DOCKER_EXTRA_PARAMS_INIT='-e RUN_INTEGRATION_TEST=1'
tests/ci_build/ci_build.sh jvm_cross docker --build-arg JDK_VERSION=${JDK_VERSION} \
  --build-arg SPARK_VERSION=${SPARK_VERSION} tests/ci_build/test_jvm_cross.sh
11  tests/buildkite/test-python-cpu-arm64.sh  Executable file
@@ -0,0 +1,11 @@
#!/bin/bash

set -euo pipefail

source tests/buildkite/conftest.sh

echo "--- Test Python CPU ARM64"
buildkite-agent artifact download "python-package/dist/*.whl" . --step build-cpu-arm64
buildkite-agent artifact download "xgboost" . --step build-cpu-arm64
chmod +x ./xgboost
tests/ci_build/ci_build.sh aarch64 docker tests/ci_build/test_python.sh cpu-arm64
tests/buildkite/test-python-cpu.sh (new executable file, 14 lines)

#!/bin/bash

set -euo pipefail

echo "--- Test CPU code in Python env"

source tests/buildkite/conftest.sh

mkdir -pv python-package/dist
buildkite-agent artifact download "python-package/dist/*.whl" . --step build-cuda
buildkite-agent artifact download "xgboost" . --step build-cpu
chmod +x ./xgboost

tests/ci_build/ci_build.sh cpu docker tests/ci_build/test_python.sh cpu
tests/buildkite/test-python-gpu.sh (new executable file, 46 lines)

#!/bin/bash

set -euo pipefail

CUDA_VERSION=11.0.3

if [ "$#" -lt 1 ]
then
  suite=''
  args=''
else
  suite=$1
  shift 1
  args="$@"
fi

source tests/buildkite/conftest.sh

echo "--- Fetch build artifacts"
buildkite-agent artifact download "python-package/dist/*.whl" . --step build-cuda
buildkite-agent artifact download "build/testxgboost" . --step build-cuda
chmod +x build/testxgboost

# Allocate extra space in /dev/shm to enable NCCL
export CI_DOCKER_EXTRA_PARAMS_INIT='--shm-size=4g'

command_wrapper="tests/ci_build/ci_build.sh gpu nvidia-docker --build-arg "`
  `"CUDA_VERSION_ARG=$CUDA_VERSION"

# Run specified test suite
case "$suite" in
  gpu)
    echo "--- Test XGBoost Python package, single GPU"
    $command_wrapper tests/ci_build/test_python.sh $suite
    ;;

  mgpu)
    echo "--- Test XGBoost Python package, 4 GPUs"
    $command_wrapper tests/ci_build/test_python.sh $suite
    ;;

  *)
    echo "Usage: $0 {gpu|mgpu} [extra args to pass to pytest]"
    exit 1
    ;;
esac
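The argument handling at the top of `test-python-gpu.sh` (empty `suite` and `args` when no arguments are given; otherwise the first argument selects the suite and the rest are forwarded) can be sketched as a standalone function. This is a hypothetical demo for illustration only; `parse_suite` is not part of the commit:

```shell
#!/bin/bash
# Hypothetical standalone demo of the suite/args parsing pattern used in
# test-python-gpu.sh. parse_suite is not part of the commit.
parse_suite() {
  if [ "$#" -lt 1 ]
  then
    suite=''
    args=''
  else
    suite=$1
    shift 1
    args="$@"
  fi
  echo "suite=${suite} args=${args}"
}

parse_suite mgpu -k test_foo   # prints: suite=mgpu args=-k test_foo
```

Calling it with no arguments leaves both variables empty, which matches the script's behavior of falling through to the usage message in the `case` statement.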
tests/buildkite/test-win64-gpu.ps1 (new file, 37 lines)

$ErrorActionPreference = "Stop"

. tests/buildkite/conftest.ps1

Write-Host "--- Test XGBoost on Windows with CUDA"

New-Item python-package/dist -ItemType Directory -ea 0
New-Item build -ItemType Directory -ea 0
buildkite-agent artifact download "python-package/dist/*.whl" . --step build-win64-gpu
if ($LASTEXITCODE -ne 0) { throw "Last command failed" }
buildkite-agent artifact download "build/testxgboost.exe" . --step build-win64-gpu
if ($LASTEXITCODE -ne 0) { throw "Last command failed" }
buildkite-agent artifact download "xgboost.exe" . --step build-win64-gpu
if ($LASTEXITCODE -ne 0) { throw "Last command failed" }

nvcc --version

Write-Host "--- Run Google Tests"
& build/testxgboost.exe
if ($LASTEXITCODE -ne 0) { throw "Last command failed" }

Write-Host "--- Set up Python env"
conda activate
$env_name = -join("win64_", (New-Guid).ToString().replace("-", ""))
mamba env create -n ${env_name} --file=tests/ci_build/conda_env/win64_test.yml
conda activate ${env_name}
Get-ChildItem . -Filter python-package/dist/*.whl |
Foreach-Object {
  & python -m pip install python-package/dist/$_
  if ($LASTEXITCODE -ne 0) { throw "Last command failed" }
}

Write-Host "--- Run Python tests"
python -X faulthandler -m pytest -v -s -rxXs --fulltrace tests/python
Write-Host "--- Run Python tests with GPU"
python -X faulthandler -m pytest -v -s -rxXs --fulltrace -m "(not slow) and (not mgpu)"`
  tests/python-gpu
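The PowerShell script above checks `$LASTEXITCODE` after every external command because PowerShell does not abort when a native command fails. The Bash scripts in this commit get equivalent fail-fast behavior from `set -euo pipefail`. A minimal illustration of the contrast; the `run_step` helper below is hypothetical, not part of the commit:

```shell
#!/bin/bash
# With `set -e`, any failing command terminates the script, so Bash needs no
# per-command exit-code check (unlike the $LASTEXITCODE pattern in PowerShell).
set -euo pipefail

# Hypothetical helper: print a BuildKite-style section header, then run a command.
run_step() {
  echo "--- $1"
  shift
  "$@"      # if this command fails, set -e aborts the whole script here
}

run_step "Check shell" true
echo "all steps passed"
```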
@@ -10,7 +10,7 @@ RUN \
     yum update -y && \
     yum install -y devtoolset-7 && \
     # Python
-    wget --no-verbose -O conda.sh https://github.com/conda-forge/miniforge/releases/download/4.8.2-1/Miniforge3-4.8.2-1-Linux-aarch64.sh && \
+    wget -nv -O conda.sh https://github.com/conda-forge/miniforge/releases/download/4.8.2-1/Miniforge3-4.8.2-1-Linux-aarch64.sh && \
     bash conda.sh -b -p /opt/miniforge-python
 
 ENV PATH=/opt/miniforge-python/bin:$PATH
@@ -25,7 +25,7 @@ RUN conda env create -n aarch64_test --file=/scripts/aarch64_test.yml
 
 # Install lightweight sudo (not bound to TTY)
 RUN set -ex; \
-    wget --no-verbose -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-arm64" && \
+    wget -nv -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-arm64" && \
     chmod +x /usr/local/bin/gosu && \
     gosu nobody true
 
@@ -11,7 +11,7 @@ RUN \
     apt-get update && \
     apt-get install -y tar unzip wget git build-essential python3 python3-pip software-properties-common \
         apt-transport-https ca-certificates gnupg-agent && \
-    wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add - && \
+    wget -nv -O - https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add - && \
     add-apt-repository -u 'deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic-11 main' && \
     apt-get update && \
     apt-get install -y llvm-11 clang-tidy-11 clang-11 && \
@@ -31,7 +31,7 @@ ENV GOSU_VERSION 1.10
 
 # Install lightweight sudo (not bound to TTY)
 RUN set -ex; \
-    wget -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \
+    wget -nv -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \
     chmod +x /usr/local/bin/gosu && \
     gosu nobody true
 
@@ -15,7 +15,7 @@ RUN \
     wget -nv -nc https://cmake.org/files/v3.14/cmake-3.14.0-Linux-x86_64.sh --no-check-certificate && \
     bash cmake-3.14.0-Linux-x86_64.sh --skip-license --prefix=/usr && \
     # Python
-    wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh && \
+    wget -nv https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh && \
     bash Mambaforge-Linux-x86_64.sh -b -p /opt/python
 
 ENV PATH=/opt/python/bin:$PATH
@@ -32,7 +32,7 @@ RUN mamba env create -n cpu_test --file=/scripts/cpu_test.yml
 
 # Install lightweight sudo (not bound to TTY)
 RUN set -ex; \
-    wget -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \
+    wget -nv -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \
     chmod +x /usr/local/bin/gosu && \
     gosu nobody true
 
@@ -12,7 +12,7 @@ RUN \
     apt-get update && \
     apt-get install -y wget unzip bzip2 libgomp1 build-essential openjdk-8-jdk-headless && \
     # Python
-    wget -O Miniconda3.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
+    wget -nv -O Miniconda3.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
     bash Miniconda3.sh -b -p /opt/python
 
 ENV PATH=/opt/python/bin:$PATH
@@ -30,7 +30,7 @@ ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/
 
 # Install lightweight sudo (not bound to TTY)
 RUN set -ex; \
-    wget -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \
+    wget -nv -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \
     chmod +x /usr/local/bin/gosu && \
     gosu nobody true
 
@@ -18,7 +18,7 @@ RUN \
     wget -nv -nc https://cmake.org/files/v3.14/cmake-3.14.0-Linux-x86_64.sh --no-check-certificate && \
     bash cmake-3.14.0-Linux-x86_64.sh --skip-license --prefix=/usr && \
     # Python
-    wget -O Miniconda3.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
+    wget -nv -O Miniconda3.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
     bash Miniconda3.sh -b -p /opt/python
 
 # NCCL2 (License: https://docs.nvidia.com/deeplearning/sdk/nccl-sla/index.html)
@@ -37,7 +37,7 @@ ENV GOSU_VERSION 1.10
 
 # Install lightweight sudo (not bound to TTY)
 RUN set -ex; \
-    wget -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \
+    wget -nv -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \
     chmod +x /usr/local/bin/gosu && \
     gosu nobody true
 
@@ -16,15 +16,15 @@ RUN \
     apt-get update && \
     apt-get install -y tar unzip wget openjdk-$JDK_VERSION-jdk libgomp1 && \
     # Python
-    wget -O Miniconda3.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
+    wget -nv -O Miniconda3.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
     bash Miniconda3.sh -b -p /opt/python && \
     /opt/python/bin/pip install awscli && \
     # Maven
-    wget https://archive.apache.org/dist/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz && \
+    wget -nv https://archive.apache.org/dist/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz && \
     tar xvf apache-maven-3.6.1-bin.tar.gz -C /opt && \
     ln -s /opt/apache-maven-3.6.1/ /opt/maven && \
     # Spark
-    wget https://archive.apache.org/dist/spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop2.7.tgz && \
+    wget -nv https://archive.apache.org/dist/spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop2.7.tgz && \
     tar xvf spark-$SPARK_VERSION-bin-hadoop2.7.tgz -C /opt && \
     ln -s /opt/spark-$SPARK_VERSION-bin-hadoop2.7 /opt/spark
 
@@ -38,7 +38,7 @@ ENV GOSU_VERSION 1.10
 
 # Install lightweight sudo (not bound to TTY)
 RUN set -ex; \
-    wget -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \
+    wget -nv -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \
     chmod +x /usr/local/bin/gosu && \
     gosu nobody true
 
@@ -13,15 +13,15 @@ RUN \
     apt-get update && \
     apt-get install -y tar unzip wget openjdk-$JDK_VERSION-jdk libgomp1 && \
     # Python
-    wget -O Miniconda3.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
+    wget -nv -O Miniconda3.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
     bash Miniconda3.sh -b -p /opt/python && \
     /opt/python/bin/pip install awscli && \
     # Maven
-    wget https://archive.apache.org/dist/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz && \
+    wget -nv https://archive.apache.org/dist/maven/maven-3/3.6.1/binaries/apache-maven-3.6.1-bin.tar.gz && \
     tar xvf apache-maven-3.6.1-bin.tar.gz -C /opt && \
     ln -s /opt/apache-maven-3.6.1/ /opt/maven && \
     # Spark
-    wget https://archive.apache.org/dist/spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop2.7.tgz && \
+    wget -nv https://archive.apache.org/dist/spark/spark-$SPARK_VERSION/spark-$SPARK_VERSION-bin-hadoop2.7.tgz && \
     tar xvf spark-$SPARK_VERSION-bin-hadoop2.7.tgz -C /opt && \
     ln -s /opt/spark-$SPARK_VERSION-bin-hadoop2.7 /opt/spark
 
@@ -35,7 +35,7 @@ ENV GOSU_VERSION 1.10
 
 # Install lightweight sudo (not bound to TTY)
 RUN set -ex; \
-    wget -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \
+    wget -nv -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \
     chmod +x /usr/local/bin/gosu && \
     gosu nobody true
 
@@ -12,7 +12,7 @@ RUN \
     apt-get update && \
     apt-get install -y wget unzip bzip2 libgomp1 build-essential ninja-build git && \
     # Python
-    wget -O Miniconda3.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
+    wget -nv -O Miniconda3.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
     bash Miniconda3.sh -b -p /opt/python
 
 # NCCL2 (License: https://docs.nvidia.com/deeplearning/sdk/nccl-sla/index.html)
@@ -33,7 +33,7 @@ ENV GOSU_VERSION 1.10
 
 # Install lightweight sudo (not bound to TTY)
 RUN set -ex; \
-    wget -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \
+    wget -nv -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \
     chmod +x /usr/local/bin/gosu && \
     gosu nobody true
 
@@ -15,7 +15,7 @@ ENV GOSU_VERSION 1.10
 
 # Install lightweight sudo (not bound to TTY)
 RUN set -ex; \
-    wget -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \
+    wget -nv -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-amd64" && \
     chmod +x /usr/local/bin/gosu && \
     gosu nobody true
 
@@ -144,19 +144,19 @@ then
     DOCKER_CACHE_REPO="${DOCKER_CACHE_ECR_ID}.dkr.ecr.${DOCKER_CACHE_ECR_REGION}.amazonaws.com"
     echo "Using AWS ECR; repo URL = ${DOCKER_CACHE_REPO}"
     # Login for Docker registry
-    echo "\$(python3 -m awscli ecr get-login --no-include-email --region ${DOCKER_CACHE_ECR_REGION} --registry-ids ${DOCKER_CACHE_ECR_ID})"
-    $(python3 -m awscli ecr get-login --no-include-email --region ${DOCKER_CACHE_ECR_REGION} --registry-ids ${DOCKER_CACHE_ECR_ID})
+    echo "\$(aws ecr get-login --no-include-email --region ${DOCKER_CACHE_ECR_REGION} --registry-ids ${DOCKER_CACHE_ECR_ID})"
+    $(aws ecr get-login --no-include-email --region ${DOCKER_CACHE_ECR_REGION} --registry-ids ${DOCKER_CACHE_ECR_ID})
     # Pull pre-build container from Docker build cache,
     # if one exists for the particular branch or pull request
-    echo "docker pull ${DOCKER_CACHE_REPO}/${DOCKER_IMG_NAME}:${BRANCH_NAME}"
-    if docker pull "${DOCKER_CACHE_REPO}/${DOCKER_IMG_NAME}:${BRANCH_NAME}"
+    echo "docker pull --quiet ${DOCKER_CACHE_REPO}/${DOCKER_IMG_NAME}:${BRANCH_NAME}"
+    if docker pull --quiet "${DOCKER_CACHE_REPO}/${DOCKER_IMG_NAME}:${BRANCH_NAME}"
     then
         CACHE_FROM_CMD="--cache-from ${DOCKER_CACHE_REPO}/${DOCKER_IMG_NAME}:${BRANCH_NAME}"
     else
         # If the build cache is empty of the particular branch or pull request,
         # use the build cache associated with the master branch
-        echo "docker pull ${DOCKER_CACHE_REPO}/${DOCKER_IMG_NAME}:master"
-        docker pull "${DOCKER_CACHE_REPO}/${DOCKER_IMG_NAME}:master" || true
+        echo "docker pull --quiet ${DOCKER_CACHE_REPO}/${DOCKER_IMG_NAME}:master"
+        docker pull --quiet "${DOCKER_CACHE_REPO}/${DOCKER_IMG_NAME}:master" || true
         CACHE_FROM_CMD="--cache-from ${DOCKER_CACHE_REPO}/${DOCKER_IMG_NAME}:master"
     fi
 else
@@ -188,11 +188,11 @@ then
     echo "docker tag ${DOCKER_IMG_NAME} ${DOCKER_CACHE_REPO}/${DOCKER_IMG_NAME}:${BRANCH_NAME}"
     docker tag "${DOCKER_IMG_NAME}" "${DOCKER_CACHE_REPO}/${DOCKER_IMG_NAME}:${BRANCH_NAME}"
 
-    echo "python3 -m awscli ecr create-repository --repository-name ${DOCKER_IMG_NAME} --region ${DOCKER_CACHE_ECR_REGION} || true"
-    python3 -m awscli ecr create-repository --repository-name ${DOCKER_IMG_NAME} --region ${DOCKER_CACHE_ECR_REGION} || true
+    echo "aws ecr create-repository --repository-name ${DOCKER_IMG_NAME} --region ${DOCKER_CACHE_ECR_REGION} || true"
+    aws ecr create-repository --repository-name ${DOCKER_IMG_NAME} --region ${DOCKER_CACHE_ECR_REGION} || true
 
-    echo "docker push ${DOCKER_CACHE_REPO}/${DOCKER_IMG_NAME}:${BRANCH_NAME}"
-    docker push "${DOCKER_CACHE_REPO}/${DOCKER_IMG_NAME}:${BRANCH_NAME}"
+    echo "docker push --quiet ${DOCKER_CACHE_REPO}/${DOCKER_IMG_NAME}:${BRANCH_NAME}"
+    docker push --quiet "${DOCKER_CACHE_REPO}/${DOCKER_IMG_NAME}:${BRANCH_NAME}"
     if [[ $? != "0" ]]; then
         echo "ERROR: could not update Docker cache ${DOCKER_CACHE_REPO}/${DOCKER_IMG_NAME}:${BRANCH_NAME}"
         exit 1
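The cache-selection logic in `ci_build.sh` (try the image tag for the current branch, fall back to the `master` tag when it is missing) can be sketched in isolation. This is a hypothetical stand-in for illustration only: `choose_cache_tag`, `try_pull`, and the registry name below are not part of the commit, and `try_pull` fakes `docker pull --quiet` so the sketch runs without Docker:

```shell
#!/bin/bash
# Hypothetical sketch of the branch-then-master Docker cache fallback in
# ci_build.sh; try_pull stands in for `docker pull --quiet`.
choose_cache_tag() {
  local repo="$1" img="$2" branch="$3"
  if try_pull "${repo}/${img}:${branch}"; then
    echo "--cache-from ${repo}/${img}:${branch}"
  else
    # Branch cache missing: best-effort pull of master, then use it anyway,
    # mirroring `docker pull ... || true` in the original script.
    try_pull "${repo}/${img}:master" || true
    echo "--cache-from ${repo}/${img}:master"
  fi
}

# Fake registry state for the demo: only the :main tag exists.
try_pull() { [ "$1" = "registry.example.com/xgboost:main" ]; }

choose_cache_tag registry.example.com xgboost main        # hits the branch cache
choose_cache_tag registry.example.com xgboost feature-x   # falls back to master
```

The `|| true` matters: a missing `master` cache degrades the build to uncached rather than failing it.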
|
|||||||
@ -14,5 +14,6 @@ for wheel_path in sorted(glob.glob(sys.argv[1])):
|
|||||||
assert m, f'wheel_path = {wheel_path}'
|
assert m, f'wheel_path = {wheel_path}'
|
||||||
version = m.group(1)
|
version = m.group(1)
|
||||||
|
|
||||||
|
print(f"Inserting vcomp140.dll into {wheel_path}...")
|
||||||
with zipfile.ZipFile(wheel_path, 'a') as f:
|
with zipfile.ZipFile(wheel_path, 'a') as f:
|
||||||
f.write(vcomp140_path, 'xgboost-{}.data/data/xgboost/vcomp140.dll'.format(version))
|
f.write(vcomp140_path, 'xgboost-{}.data/data/xgboost/vcomp140.dll'.format(version))
|
||||||
|
|||||||