Commit 059d0f8

Merge pull request #448 from stackhpc/ci/s3-sync
Upload main images to Arcus S3 and sync clouds
2 parents 3f3f925 + 049f4fd commit 059d0f8

3 files changed, +147 -7 lines changed

.github/workflows/s3-image-sync.yml

Lines changed: 140 additions & 0 deletions
@@ -23,3 +23,143 @@ jobs:
         run: |
           echo "${{ secrets['ARCUS_S3_CFG'] }}" > ~/.s3cfg
         shell: bash
+
+      - name: Install s3cmd
+        run: |
+          sudo apt-get --yes install s3cmd
+
+      - name: Cleanup S3 bucket
+        run: |
+          s3cmd rm s3://${{ env.S3_BUCKET }} --recursive --force
+
+  image_upload:
+    runs-on: ubuntu-22.04
+    needs: s3_cleanup
+    concurrency: ${{ github.workflow }}-${{ github.ref }}-${{ matrix.build }}
+    strategy:
+      fail-fast: false
+      matrix:
+        build:
+          - RL8
+          - RL9
+          - RL9-cuda
+    env:
+      ANSIBLE_FORCE_COLOR: True
+      OS_CLOUD: openstack
+      CI_CLOUD: ${{ vars.CI_CLOUD }}
+    outputs:
+      ci_cloud: ${{ steps.ci.outputs.CI_CLOUD }}
+    steps:
+      - uses: actions/checkout@v2
+
+      - name: Record which cloud CI is running on
+        id: ci
+        run: |
+          echo "CI_CLOUD=${{ env.CI_CLOUD }}" >> "$GITHUB_OUTPUT"
+
+      - name: Setup environment
+        run: |
+          python3 -m venv venv
+          . venv/bin/activate
+          pip install -U pip
+          pip install $(grep -o 'python-openstackclient[><=0-9\.]*' requirements.txt)
+        shell: bash
+
+      - name: Write clouds.yaml
+        run: |
+          mkdir -p ~/.config/openstack/
+          echo "${{ secrets[format('{0}_CLOUDS_YAML', env.CI_CLOUD)] }}" > ~/.config/openstack/clouds.yaml
+        shell: bash
+
+      - name: Write s3cmd configuration
+        run: |
+          echo "${{ secrets['ARCUS_S3_CFG'] }}" > ~/.s3cfg
+        shell: bash
+
+      - name: Install s3cmd
+        run: |
+          sudo apt-get --yes install s3cmd
+
+      - name: Retrieve image name
+        run: |
+          TARGET_IMAGE=$(jq --arg version "${{ matrix.build }}" -r '.cluster_image[$version]' "${{ env.IMAGE_PATH }}")
+          echo "TARGET_IMAGE=${TARGET_IMAGE}" >> "$GITHUB_ENV"
+        shell: bash
+
+      - name: Download image to runner
+        run: |
+          . venv/bin/activate
+          openstack image save --file ${{ env.TARGET_IMAGE }} ${{ env.TARGET_IMAGE }}
+        shell: bash
+
+      - name: Upload Image to S3
+        run: |
+          echo "Uploading Image: ${{ env.TARGET_IMAGE }} to S3..."
+          s3cmd --multipart-chunk-size-mb=150 put ${{ env.TARGET_IMAGE }} s3://${{ env.S3_BUCKET }}
+        shell: bash
+
+  image_sync:
+    needs: image_upload
+    runs-on: ubuntu-22.04
+    concurrency: ${{ github.workflow }}-${{ github.ref }}-${{ matrix.cloud }}-${{ matrix.build }}
+    strategy:
+      fail-fast: false
+      matrix:
+        cloud:
+          - LEAFCLOUD
+          - SMS
+          - ARCUS
+        build:
+          - RL8
+          - RL9
+          - RL9-cuda
+        exclude:
+          - cloud: ${{ needs.image_upload.outputs.ci_cloud }}
+
+    env:
+      ANSIBLE_FORCE_COLOR: True
+      OS_CLOUD: openstack
+      CI_CLOUD: ${{ matrix.cloud }}
+    steps:
+      - uses: actions/checkout@v2
+
+      - name: Record which cloud CI is running on
+        run: |
+          echo CI_CLOUD: ${{ env.CI_CLOUD }}
+
+      - name: Setup environment
+        run: |
+          python3 -m venv venv
+          . venv/bin/activate
+          pip install -U pip
+          pip install $(grep -o 'python-openstackclient[><=0-9\.]*' requirements.txt)
+        shell: bash
+
+      - name: Write clouds.yaml
+        run: |
+          mkdir -p ~/.config/openstack/
+          echo "${{ secrets[format('{0}_CLOUDS_YAML', env.CI_CLOUD)] }}" > ~/.config/openstack/clouds.yaml
+        shell: bash
+
+      - name: Retrieve image name
+        run: |
+          TARGET_IMAGE=$(jq --arg version "${{ matrix.build }}" -r '.cluster_image[$version]' "${{ env.IMAGE_PATH }}")
+          echo "TARGET_IMAGE=${TARGET_IMAGE}" >> "$GITHUB_ENV"
+
+      - name: Download latest image if missing
+        run: |
+          . venv/bin/activate
+          bash .github/bin/get-s3-image.sh ${{ env.TARGET_IMAGE }} ${{ env.S3_BUCKET }}
+
+      - name: Cleanup OpenStack Image (on error or cancellation)
+        if: cancelled() || failure()
+        run: |
+          . venv/bin/activate
+          image_hanging=$(openstack image list --name ${{ env.TARGET_IMAGE }} -f value -c ID -c Status | grep -v ' active$' | awk '{print $1}')
+          if [ -n "$image_hanging" ]; then
+            echo "Cleaning up OpenStack image with ID: $image_hanging"
+            openstack image delete $image_hanging
+          else
+            echo "No image ID found, skipping cleanup."
+          fi
+        shell: bash
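The `image_sync` matrix crosses the three clouds with the three builds, then uses `exclude` to drop the cloud that `image_upload` reports as `ci_cloud`, since that is the cloud the images were downloaded from. As a rough illustration, here is a minimal bash sketch that enumerates the same combinations outside GitHub Actions. The function `sync_targets` is hypothetical; the cloud and build lists are copied from the matrix above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the (cloud, build) pairs that the image_sync
# matrix would run, given the cloud CI ran on (image_upload's ci_cloud output).
sync_targets() {
  local ci_cloud="$1"
  local clouds=(LEAFCLOUD SMS ARCUS)   # matrix.cloud
  local builds=(RL8 RL9 RL9-cuda)      # matrix.build
  local cloud build
  for cloud in "${clouds[@]}"; do
    # "exclude: - cloud: <ci_cloud>" removes every build for that cloud.
    if [ "$cloud" = "$ci_cloud" ]; then
      continue
    fi
    for build in "${builds[@]}"; do
      printf '%s %s\n' "$cloud" "$build"
    done
  done
}

# Example: CI ran on ARCUS, so only LEAFCLOUD and SMS receive images.
sync_targets ARCUS
```

GitHub Actions treats an `exclude` entry as a partial match. Listing only `cloud` therefore removes every `build` for that cloud, so with `ARCUS` excluded, six sync jobs remain and the sketch prints six lines.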

.github/workflows/upload-release-image.yml.sample

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ jobs:
           bash .github/bin/get-s3-image.sh ${{ inputs.image_name }} ${{ inputs.bucket_name }}
 
       - name: Cleanup OpenStack Image (on error or cancellation)
-        if: cancelled()
+        if: cancelled() || failure()
         run: |
           . venv/bin/activate
           image_hanging=$(openstack image list --name ${{ inputs.image_name }} -f value -c ID -c Status | grep -v ' active$' | awk '{print $1}')
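The cleanup step now fires on `failure()` as well as on cancellation. It treats any image whose status is not `active` (for example `queued` or `saving`) as a leftover from an interrupted upload. Below is a minimal bash sketch of that filter. It uses an awk condition that is equivalent, for two-column input, to the workflow's `grep -v ' active$' | awk '{print $1}'` pipeline. The function name `hanging_image_ids` and the sample IDs are made up, and canned text stands in for real `openstack image list` output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "ID Status" lines (as printed by
# `openstack image list -f value -c ID -c Status`) on stdin and print the
# IDs of images whose status is anything other than "active".
hanging_image_ids() {
  awk '$2 != "active" { print $1 }'
}

# Canned sample with made-up IDs in place of real OpenStack output.
sample='1111aaaa active
2222bbbb queued
3333cccc saving'

image_hanging=$(printf '%s\n' "$sample" | hanging_image_ids)
if [ -n "$image_hanging" ]; then
  # Dry run; $image_hanging is left unquoted on purpose, as in the workflow.
  echo "Would delete:" $image_hanging
else
  echo "No image ID found, skipping cleanup."
fi
```

`$image_hanging` may hold several newline-separated IDs. Leaving it unquoted, as the workflow does, passes each ID as a separate argument to `openstack image delete`, which accepts more than one image.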

packer/README.md

Lines changed: 6 additions & 6 deletions
@@ -7,9 +7,9 @@ The Packer configuration defined here builds "fat images" which contain binaries
 - Ensures re-deployment of the cluster or deployment of additional nodes can be completed even if packages are changed in upstream repositories (e.g. due to RockyLinux or OpenHPC updates).
 - Improves deployment speed by reducing the number of package downloads to improve deployment speed.
 
-By default, a fat image build starts from a RockyLinux GenericCloud image and updates all DNF packages already present.
+By default, a fat image build starts from a nightly image build containing Mellanox OFED, and updates all DNF packages already present. The 'latest' nightly build itself is from a RockyLinux GenericCloud image.
 
-The fat images StackHPC builds and test in CI are available from [GitHub releases](https://github.com/stackhpc/ansible-slurm-appliance/releases). However with some additional configuration it is also possible to:
+The fat images StackHPC builds and test in CI are available from [GitHub releases](https://github.com/stackhpc/ansible-slurm-appliance/releases). However with some additional configuration it is also possible to:
 1. Build site-specific fat images from scratch.
 2. Extend an existing fat image with additional software.
 

@@ -39,9 +39,9 @@ The steps for building site-specific fat images or extending an existing fat ima
 cd packer/
 PACKER_LOG=1 /usr/bin/packer build -only=openstack.openhpc --on-error=ask -var-file=$PKR_VAR_environment_root/builder.pkrvars.hcl openstack.pkr.hcl
 
-Note that the `-only` flag here restricts the build to the non-OFED fat image "source" (in Packer terminology). Other
+Note that the `-only` flag here restricts the build to the non-CUDA fat image "source" (in Packer terminology). Other
 source options are:
-- `-only=openstack.openhpc-ofed`: Build a fat image including Mellanox OFED
+- `-only=openstack.openhpc-cuda`: Build a fat image including CUDA packages.
 - `-only=openstack.openhpc-extra`: Build an image which extends an existing fat image - in this case the variable `source_image` or `source_image_name}` must also be set in the Packer variables file.
 
 5. The built image will be automatically uploaded to OpenStack with a name prefixed `openhpc-` and including a timestamp and a shortened git hash.
@@ -70,7 +70,7 @@ What is Slurm Appliance-specific are the details of how Ansible is run:
 openhpc-extra = ["foo"]
 }
 
-the build VM uses an existing "fat image" (rather than a RockyLinyux GenericCloud one) and is added to the `builder` and `foo` groups. This means only code targeting `builder` and `foo` groups runs. In this way an existing image can be extended with site-specific code, without modifying the part of the image which has already been tested in the StackHPC CI.
+the build VM uses an existing "fat image" (rather than a 'latest' nightly one) and is added to the `builder` and `foo` groups. This means only code targeting `builder` and `foo` groups runs. In this way an existing image can be extended with site-specific code, without modifying the part of the image which has already been tested in the StackHPC CI.
 
 - The playbook `ansible/fatimage.yml` is run which is only a subset of `ansible/site.yml`. This allows restricting the code
 which runs during build for cases where setting `builder` groupvars is not sufficient (e.g. a role always attempts to configure or start services). This may eventually be removed.
@@ -82,5 +82,5 @@ There are some things to be aware of when developing Ansible to run in a Packer
 - Build VM hostnames are not the same as for equivalent "real" hosts and do not contain `login`, `control` etc. Therefore variables used by the build VM must be defined as groupvars not hostvars.
 - Ansible may need to proxy to real compute nodes. If Packer should not use the same proxy to connect to the
 build VMs (e.g. build happens on a different network), proxy configuration should not be added to the `all` group.
-- Currently two fat image "sources" are defined, with and without OFED. This simplifies CI configuration by allowing the
+- Currently two fat image "sources" are defined, with and without CUDA. This simplifies CI configuration by allowing the
 default source images to be defined in the `openstack.pkr.hcl` definition.
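The README now lists three Packer sources selectable with `-only`. Below is a minimal bash sketch that maps a variant name to the matching flag. It prints the build command from the README's build steps without running it. The function `packer_only_flag` and the variant names `plain`, `cuda` and `extra` are hypothetical; the source names and the command line come from the README.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a build variant to the Packer source named in the
# README and print the matching -only flag. Returns non-zero for unknown input.
packer_only_flag() {
  case "$1" in
    plain) printf '%s\n' "-only=openstack.openhpc" ;;        # non-CUDA fat image
    cuda)  printf '%s\n' "-only=openstack.openhpc-cuda" ;;   # fat image with CUDA packages
    extra) printf '%s\n' "-only=openstack.openhpc-extra" ;;  # extends an existing fat image
    *)     printf 'unknown variant: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Dry run: print the build command for the chosen variant (default: plain).
if flag=$(packer_only_flag "${1:-plain}"); then
  echo "PACKER_LOG=1 /usr/bin/packer build ${flag} --on-error=ask" \
    "-var-file=\$PKR_VAR_environment_root/builder.pkrvars.hcl openstack.pkr.hcl"
fi
```

For `extra`, the README also requires `source_image` or `source_image_name` to be set in the Packer variables file. The sketch does not check this.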
