From 37886a5dffd8f45b6fb7c75d91d6efc767af3bd1 Mon Sep 17 00:00:00 2001
From: Philip Hyunsu Cho
Date: Sun, 2 Oct 2022 12:45:00 -0700
Subject: [PATCH] [CI] Document the use of Docker wrapper script (#8297)

* [CI] Document the use of Docker wrapper script

* Grammer fixes

* Document buildkite pipeline defs

* tests/buildkite/*.sh isn't meant to run locally
---
 doc/contrib/ci.rst          | 78 ++++++++++++++++++++++++++++++++++---
 tests/buildkite/conftest.sh |  7 ++++
 2 files changed, 80 insertions(+), 5 deletions(-)

diff --git a/doc/contrib/ci.rst b/doc/contrib/ci.rst
index abb1cd73f..6073e646a 100644
--- a/doc/contrib/ci.rst
+++ b/doc/contrib/ci.rst
@@ -38,12 +38,80 @@ task of cross-compiling a Python wheel. (Note that ``cibuildwheel`` will call
 ``setup.py bdist_wheel``. Since XGBoost has a native library component, ``setup.py`` contains
 a glue code to call CMake and a C++ compiler to build the native library on the fly.)
 
-*******************************
-Elastic CI Stack with BuildKite
-*******************************
+*********************************************************
+Reproduce CI testing environments using Docker containers
+*********************************************************
+In our CI pipelines, we use Docker containers extensively to package many software packages together.
+You can reproduce the same testing environment as the CI pipelines by running Docker locally.
+
+=============
+Prerequisites
+=============
+1. Install Docker: https://docs.docker.com/engine/install/ubuntu/
+2. Install NVIDIA Docker runtime: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installing-on-ubuntu-and-debian
+   The runtime lets you access NVIDIA GPUs inside a Docker container.
+
+==============================================
+Building and Running Docker containers locally
+==============================================
+For your convenience, we provide the wrapper script ``tests/ci_build/ci_build.sh``. You can use it as follows:
+
+.. code-block:: bash
+
+  tests/ci_build/ci_build.sh <CONTAINER_TYPE> <DOCKER_BINARY> --build-arg <BUILD_ARG> \
+    <COMMAND> ...
+
+where:
+
+* ``<CONTAINER_TYPE>`` is the identifier for the container. The wrapper script will use the
+  container definition (Dockerfile) located at ``tests/ci_build/Dockerfile.<CONTAINER_TYPE>``.
+  For example, setting the container type to ``gpu`` will cause the script to load the Dockerfile
+  ``tests/ci_build/Dockerfile.gpu``.
+* ``<DOCKER_BINARY>`` must be either ``docker`` or ``nvidia-docker``. Choose ``nvidia-docker``
+  as long as you need to run any GPU code.
+* ``<BUILD_ARG>`` is a build argument to be passed to Docker. Must be of form ``VAR=VALUE``.
+  Example: ``--build-arg CUDA_VERSION_ARG=11.0``. You can pass multiple ``--build-arg``.
+* ``<COMMAND>`` is the command to run inside the Docker container. This can be more than one argument.
+  Example: ``tests/ci_build/build_via_cmake.sh -DUSE_CUDA=ON -DUSE_NCCL=ON``.
+
+Optionally, you can set the environment variable ``CI_DOCKER_EXTRA_PARAMS_INIT`` to pass extra
+arguments to Docker. For example:
+
+.. code-block:: bash
+
+  # Allocate extra space in /dev/shm to enable NCCL
+  export CI_DOCKER_EXTRA_PARAMS_INIT='--shm-size=4g'
+  # Run multi-GPU test suite
+  tests/ci_build/ci_build.sh gpu nvidia-docker --build-arg CUDA_VERSION_ARG=11.0 \
+    tests/ci_build/test_python.sh mgpu
+
+To pass multiple extra arguments:
+
+.. code-block:: bash
+
+  export CI_DOCKER_EXTRA_PARAMS_INIT='-e VAR1=VAL1 -e VAR2=VAL2 -e VAR3=VAL3'
+
+********************************************
+Update pipeline definitions for BuildKite CI
+********************************************
 
 `BuildKite `_ is a SaaS (Software as a Service) platform that orchestrates
-cloud machines to host CI pipelines. The BuildKite platform allows us to define cloud resources in
+cloud machines to host CI pipelines. The BuildKite platform allows us to define CI pipelines as a
+declarative YAML file.
+
+The pipeline definitions are found in ``tests/buildkite/``:
+
+* ``tests/buildkite/pipeline-win64.yml``: This pipeline builds and tests XGBoost for the Windows platform.
+* ``tests/buildkite/pipeline-mgpu.yml``: This pipeline builds and tests XGBoost with access to multiple
+  NVIDIA GPUs.
+* ``tests/buildkite/pipeline.yml``: This pipeline builds and tests XGBoost with access to a single
+  NVIDIA GPU. Most tests are located here.
+
+****************************************
+Managing Elastic CI Stack with BuildKite
+****************************************
+
+BuildKite allows us to define cloud resources in
 a declarative fashion. Every configuration step is now documented explicitly as code.
 
 **Prerequisite**: You should have some knowledge of `CloudFormation `_.
@@ -93,4 +161,4 @@ workload. When a pull request is submitted, the following steps take place:
    to scale down. Idle worker instances are shut down.
 
 To set up the auto-scaling group, run the script ``tests/buildkite/infrastructure/aws-stack-creator/create_stack.py``.
-Check the CloudFormation web console to verify successful provision of auto-scaling groups.
\ No newline at end of file
+Check the CloudFormation web console to verify successful provision of auto-scaling groups.
diff --git a/tests/buildkite/conftest.sh b/tests/buildkite/conftest.sh
index 6f33799e8..4dcc522bf 100755
--- a/tests/buildkite/conftest.sh
+++ b/tests/buildkite/conftest.sh
@@ -3,6 +3,13 @@
 set -euo pipefail
 set -x
 
+if [[ -z ${BUILDKITE:-} ]]
+then
+  echo "$0 is not meant to run locally; it should run inside BuildKite."
+  echo "Please inspect the content of $0 and locate the desired command manually."
+  exit 1
+fi
+
 if [[ -n $BUILDKITE_PULL_REQUEST && $BUILDKITE_PULL_REQUEST != "false" ]]
 then
   is_pull_request=1
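
Note on the guard added to ``tests/buildkite/conftest.sh``: the script runs under ``set -u``, so the check is written as ``${BUILDKITE:-}`` rather than ``$BUILDKITE``. The sketch below is a standalone illustration of that behavior, not part of the patch. The function name ``running_in_buildkite`` and the two result variables are hypothetical, and it assumes the CI agent exports a non-empty ``BUILDKITE`` value.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper (not in the patch): succeeds when BUILDKITE is non-empty.
running_in_buildkite() {
  # ${BUILDKITE:-} expands to an empty string when the variable is unset,
  # so this test does not trip `set -u` on a developer machine.
  [[ -n ${BUILDKITE:-} ]]
}

# Case 1: variable unset, as on a local checkout.
unset BUILDKITE
if running_in_buildkite; then
  local_result="inside"
else
  local_result="outside"
fi
echo "BUILDKITE unset: $local_result"

# Case 2: variable set to a non-empty value (assumed CI behavior).
BUILDKITE=true
if running_in_buildkite; then
  ci_result="inside"
else
  ci_result="outside"
fi
echo "BUILDKITE=true: $ci_result"
```

A plain ``$BUILDKITE`` in the same test would abort the script with an "unbound variable" error before the explanatory message could be printed, which is presumably why the patch uses the default-value expansion.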