Commit 104243b

Move CI to arcus (#4)
* add ofed with ci on arcus
* create port in CI
* install all OFED packages by default
* don't remove ofed build deps (as removing ofed packages too)
* do both OFED and non-ofed builds in CI
* use github actions matrix w/ single packer build for ofed/non-ofed builds
* add ofed flag to CI job name
* Revert "add ofed flag to CI job name"
  This reverts commit 1a4fc93.
* try to fix github matrix
* delete TF-provisioned infra on success or cancellation
* Reduce to 20GB root disk size
* upload image to S3 prerelease bucket
* fix conditional templating warning
* ensure kernel upgrades cause reboot
* change to Rocky8.6 image
* convert from ofed branch
* temporarily disable CI on PR
* reduce diff to main
* reenable CI on PR
* remove terraform
* remove ofed suffix
* fix network
* update docs
* use RL8.5 instead of broken (no sudo/hostname) 8.6 image
* fix command not found in CI
* retry getting image name
* remove quotes from jq output
* fix download step
* add s3cmd install
1 parent d475e99 commit 104243b

File tree

12 files changed

+148
-189
lines changed

.github/arcus_bastion_fingerprint

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+|1|BwhEZQPqvZcdf9Phmh2mTPmIivU=|bHi1Nf8dYI8z1C+qsqQFPAty1xA= ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQChxwhZggdwj55gNzfDBzah0G8IeTPQjgMZrpboxp2BO4J+o1iZSwDj+2fqyhBGTE43vCJR13uEygz49XIy+t17qBNwHz4fVVR7jdMNymtbZoOsq9oAoBdGEICHrMzQsYZmT9+Wt74ZP2PKOOn+a+f2vg7YdeSy1UhT08iJlbXwCx56fCQnMJMOnZM9MXVLd4NUFN1TeOCIBQHwRiMJyJ7S7CdUKpyUqHOG85peKiPJ07C0RZ/W5HkYKqltwtvPGQd262p5eLC9j3nhOYSG2meRV8yTxYz3lDIPDx0+189CZ5NaxFSPCgqSYA24zavhPVLQqoct7nd7fcEw9JiTs+abZC6GckCONSHDLM+iRtWC/i5u21ZZDLxM9SIqPI96cYFszGeqyZoXxS5qPaIDHbQNAEqJp9ygNXgh9vuBo7E+aWYbFDTG0RuvW02fbmFfZw2/yXIr37+cQX+GPOnkfIRuHE3Hx5eN8C04v+BMrAfK2minawhG3A2ONJs9LI6QoeE=
+|1|whGSPLhKW4xt/7PWOZ1treg3PtA=|F5gwV8j0JYWDzjb6DvHHaqO+sxs= ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBCpCG881Gt3dr+nuVIC2uGEQkeVwG6WDdS1WcCoxXC7AG+Oi5bfdqtf4IfeLpWmeuEaAaSFH48ODFr76ViygSjU=
+|1|0V6eQ1FKO5NMKaHZeNFbw62mrJs=|H1vuGTbbtZD2MEgZxQf1PXPk+yU= ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIEnOtYByM3s2qvRT8SS1sn5z5sbwjzb1alm0B3emPcHJ

.github/smslabs_bastion_fingerprint

Lines changed: 0 additions & 3 deletions
This file was deleted.

.github/workflows/arcus.yml

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
+name: Build images on arcus:rcp-cloud-portal-demo
+on:
+  push:
+    branches:
+      - main
+  pull_request:
+concurrency: rcp-cloud-portal-demo # openstack project
+jobs:
+  arcus:
+    runs-on: ubuntu-20.04
+    defaults:
+      run:
+        working-directory: ./collections/ansible_collections/stackhpc/slurm_image_builder
+    outputs:
+      image_name: ${{ steps.manifest.outputs.IMAGE_NAME }}
+    steps:
+      - uses: actions/checkout@v2
+        with:
+          path: collections/ansible_collections/stackhpc/slurm_image_builder
+
+      - name: Setup ssh
+        run: |
+          set -x
+          mkdir ~/.ssh
+          echo "$SSH_KEY" > ~/.ssh/id_rsa
+          chmod 0600 ~/.ssh/id_rsa
+        env:
+          SSH_KEY: ${{ secrets.ARCUS_SSH_KEY }}
+
+      - name: Add bastion's ssh key to known_hosts
+        run: cat .github/arcus_bastion_fingerprint >> ~/.ssh/known_hosts
+        shell: bash
+
+      - name: Install ansible etc
+        run: ./setup.sh
+
+      - name: Write clouds.yaml
+        run: |
+          mkdir -p ~/.config/openstack/
+          echo "$CLOUDS_YAML" > ~/.config/openstack/clouds.yaml
+        shell: bash
+        env:
+          CLOUDS_YAML: ${{ secrets.ARCUS_CLOUDS_YAML }}
+
+      - name: Run image build
+        id: image_build
+        run: |
+          . venv/bin/activate
+          PACKER_LOG=1 packer build --on-error=ask -var-file=arcus.builder.pkrvars.hcl openstack.pkr.hcl
+        env:
+          OS_CLOUD: openstack
+
+      - name: Delete infrastructure
+        run: terraform destroy -auto-approve
+        env:
+          OS_CLOUD: openstack
+        if: ${{ success() || cancelled() }}
+
+      - name: Get created image name from manifest
+        id: manifest
+        run: |
+          . venv/bin/activate
+          IMAGE_ID=$(jq --raw-output '.builds[-1].artifact_id' packer-manifest.json)
+          while ! openstack image show -f value -c name $IMAGE_ID; do
+            sleep 30
+          done
+          IMAGE_NAME=$(openstack image show -f value -c name $IMAGE_ID)
+          echo "::set-output name=IMAGE_ID::$IMAGE_ID"
+          echo "::set-output name=IMAGE_NAME::$IMAGE_NAME"
+        env:
+          OS_CLOUD: openstack
+
+      - name: Download image to runner
+        run: |
+          . venv/bin/activate
+          openstack image save --file ${{ steps.manifest.outputs.IMAGE_NAME }} ${{ steps.manifest.outputs.IMAGE_ID }}
+        env:
+          OS_CLOUD: openstack
+
+      - name: Upload image to S3 prerelease bucket
+        run: |
+          echo "$S3_CFG" > ~/.s3cfg
+          sudo apt-get install s3cmd
+          s3cmd put ${{ steps.manifest.outputs.IMAGE_NAME }} s3://openhpc-images-prerelease
+        env:
+          S3_CFG: ${{ secrets.ARCUS_S3_CFG }}
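
The "Get created image name from manifest" step polls `openstack image show` in a `while` loop with no upper bound, so an image ID that never becomes visible would keep the job running until the workflow itself times out. A minimal sketch of a bounded alternative follows. The `retry` helper is hypothetical and not part of this commit, and the attempt count and delay are illustrative values only.

```shell
# Hypothetical helper, not part of the commit: run a command until it
# succeeds, giving up after a fixed number of attempts.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

Inside the workflow step this could be called as `retry 20 30 openstack image show -f value -c name "$IMAGE_ID"`, which would fail the step after roughly ten minutes instead of looping indefinitely.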

.github/workflows/smslabs.yml

Lines changed: 0 additions & 63 deletions
This file was deleted.

README.md

Lines changed: 7 additions & 3 deletions
@@ -10,14 +10,18 @@ Images contain the git description of this repo state used to build them in `/va

 # Creating Images

-TODO: Describe how to run in CI.
+For CI, simply trigger the workflow. Built images will be uploaded to Arcus `s3://openhpc-images-prerelease`.

-Current manual steps, assuming a Rocky Linux 8.5 host on [sms-lab](https://api.sms-lab.cloud/):
+Current manual steps, assuming a Rocky Linux 8.5 host on Arcus:
+
+1. Create an appropriate collections path, e.g:
+
+    mkdir -p collections/ansible_collections/stackhpc/slurm_image_builder

 1. Clone the repo
 1. Install environment: `./setup.sh`
 1. Activate venv if necessary: `. venv/bin/activate`
-1. Build image: `PACKER_LOG=1 packer build --on-error=ask -var-file=smslabs.builder.pkrvars.hcl openstack.pkr.hcl`
+1. Build image: `PACKER_LOG=1 packer build --on-error=ask -var-file=arcus.builder.pkrvars.hcl openstack.pkr.hcl`

 # Usage of Images

arcus.builder.pkrvars.hcl

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+port_id = "b04da514-c51f-4f33-885c-a57b352aa1b9"

openstack.pkr.hcl

Lines changed: 11 additions & 45 deletions
@@ -2,68 +2,34 @@
 # $ PACKER_LOG=1 packer build --on-error=ask -var-file=<something>.pkrvars.hcl openstack.pkr.hcl

 # "timestamp" template function replacement:s
-locals { timestamp = formatdate("YYMMDD-hhmm", timestamp())}
-
-variable "networks" {
-  type = list(string)
-}
+locals {timestamp = formatdate("YYMMDD-hhmm", timestamp())}

 variable "source_image_name" {
   type = string
-}
-
-variable "flavor" {
-  type = string
-}
-
-variable "ssh_username" {
-  type = string
-  default = "rocky"
-}
-
-variable "ssh_private_key_file" {
-  type = string
-  default = "~/.ssh/id_rsa"
-}
-
-variable "ssh_keypair_name" {
-  type = string
-}
-
-variable "security_groups" {
-  type = list(string)
-}
-
-variable "image_visibility" {
-  type = string
-  default = "Private"
+  default = "Rocky-8-GenericCloud-8.5-20211114.2.x86_64"
 }

 variable "ssh_bastion_host" {
   type = string
+  default = "128.232.222.183"
 }

 variable "ssh_bastion_username" {
   type = string
-}
-
-variable "ssh_bastion_private_key_file" {
-  type = string
-  default = "~/.ssh/id_rsa"
+  default = "slurm-app-ci"
 }

 source "openstack" "openhpc" {
-  flavor = "${var.flavor}"
-  networks = "${var.networks}"
+  flavor = "vm.alaska.cpu.general.tiny"
+  networks = ["4b6b2722-ee5b-40ec-8e52-a6610e14cc51"] # portal-internal
   source_image_name = "${var.source_image_name}" # NB: must already exist in OpenStack
-  ssh_username = "${var.ssh_username}"
+  ssh_username = "rocky"
   ssh_timeout = "20m"
-  ssh_private_key_file = "${var.ssh_private_key_file}" # TODO: doc same requirements as for qemu build?
-  ssh_keypair_name = "${var.ssh_keypair_name}" # TODO: doc this
+  ssh_private_key_file = "~/.ssh/id_rsa"
+  ssh_keypair_name = "slurm-app-ci"
   ssh_bastion_host = "${var.ssh_bastion_host}"
   ssh_bastion_username = "${var.ssh_bastion_username}"
-  ssh_bastion_private_key_file = "${var.ssh_bastion_private_key_file}"
-  security_groups = "${var.security_groups}"
+  ssh_bastion_private_key_file = "~/.ssh/id_rsa"
   image_name = "${source.name}-${local.timestamp}.qcow2"
 }

@@ -75,8 +41,8 @@ build {
     playbook_file = "playbooks/build.yml" # can't use ansible FQCN here
     use_proxy = false # see https://www.packer.io/docs/provisioners/ansible#troubleshooting
     extra_arguments = ["-v"]
-    # ansible_ssh_common_args: '-o ProxyCommand="ssh {{ bastion_user }}@{{ bastion_ip }} -W %h:%p"'
     ansible_ssh_extra_args = ["-o ProxyCommand='ssh ${var.ssh_bastion_username }@${ var.ssh_bastion_host} -W %h:%p'"]
+    # keep_inventory_file = true
   }

   post-processor "manifest" {
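
The `image_name` setting in the `source` block combines the source name with the `timestamp` local, so each build produces a name of the form `openhpc-YYMMDD-hhmm.qcow2`. As a rough shell equivalent, assuming Packer evaluates `timestamp()` in UTC and that `${source.name}` resolves to `openhpc` for this source block, the shape of the name can be reproduced as follows. The function name is hypothetical and used only for illustration.

```shell
# Sketch only: mirror formatdate("YYMMDD-hhmm", timestamp()) with date(1).
# %y%m%d-%H%M = two-digit year, month, day, 24-hour hour, minute (UTC).
expected_image_name() {
  local source_name=${1:-openhpc}
  printf '%s-%s.qcow2\n' "$source_name" "$(date -u +%y%m%d-%H%M)"
}

expected_image_name
```

The minute component is fixed when Packer evaluates the template, so a name computed separately may differ from the built image's name by a minute or more. The CI workflow avoids this by reading the image ID from `packer-manifest.json` instead.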

playbooks/download_image.yml

Lines changed: 0 additions & 29 deletions
This file was deleted.

roles/builder/tasks/dnf_packages.yml

Lines changed: 1 addition & 37 deletions
@@ -1,41 +1,5 @@
 ---

-- name: Install rpm keys
-  ansible.builtin.rpm_key:
-    key: "{{ item }}"
-    state: present
-  loop: "{{ rpm_keys }}"
-
-- name: Add dnf repos
-  ansible.builtin.get_url:
-    url: "{{ item }}"
-    dest: "/etc/yum.repos.d/{{ item.split('/')[-1] }}"
-  loop: "{{ dnf_add_repos }}"
-
-- name: Enable dnf repos
-  # NB: Doesn't use `dnf config-manager --set-enabled ...` as can't make that idempotent
-  lineinfile:
-    path: "{{ item }}"
-    create: false # raises error if not already installed
-    regexp: enabled=
-    line: enabled=1
-  loop: "{{ dnf_enabled_repos }}"
-
-- name: Upgrade dnf packages
-  dnf:
-    name: '*'
-    state: latest
-    exclude: "{{ dnf_update_exclude }}"
-
-- name: Reboot if required due to package upgrades
-  reboot:
-    post_reboot_delay: 30
-  when: lookup('fileglob', '/var/run/reboot-required') | length > 0
-
-- name: Wait for hosts to be reachable
-  wait_for_connection:
-    sleep: 15
-
 - name: Install dnf release packages
   dnf: "{{ item }}"
   loop: "{{ dnf_release_packages }}"
@@ -45,6 +9,6 @@
     name: "{{ dnf_latest_packages }}"
     state: latest

-- name: Install dnf packages:2
+- name: Install dnf packages at specific versions
   dnf:
     name: "{{ dnf_specific_packages }}"

roles/builder/tasks/dnf_repos.yml

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+- name: Install rpm keys
+  ansible.builtin.rpm_key:
+    key: "{{ item }}"
+    state: present
+  loop: "{{ rpm_keys }}"
+
+- name: Add dnf repos
+  ansible.builtin.get_url:
+    url: "{{ item }}"
+    dest: "/etc/yum.repos.d/{{ item.split('/')[-1] }}"
+  loop: "{{ dnf_add_repos }}"
+
+- name: Enable dnf repos
+  # NB: Doesn't use `dnf config-manager --set-enabled ...` as can't make that idempotent
+  lineinfile:
+    path: "{{ item }}"
+    create: false # raises error if not already installed
+    regexp: enabled=
+    line: enabled=1
+  loop: "{{ dnf_enabled_repos }}"
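
The "Enable dnf repos" task relies on `lineinfile` being idempotent: running it again on an already-enabled repo file changes nothing. The shell sketch below approximates that idea for illustration only. It rewrites every line containing `enabled=`, whereas `lineinfile` replaces only the last match and appends the line if nothing matches. The function name is hypothetical and not part of this commit.

```shell
# Sketch: force every "enabled=" line in a repo file to "enabled=1".
# Like `create: false` in the task, a missing file is treated as an error.
# Running it a second time leaves the file contents unchanged.
enable_repo_file() {
  local repo_file=$1 tmp status
  if [ ! -f "$repo_file" ]; then
    echo "missing repo file: $repo_file" >&2
    return 1
  fi
  tmp=$(mktemp) || return 1
  # Write to a temporary file first, then copy back so the original
  # file keeps its ownership and permissions.
  sed 's/^.*enabled=.*$/enabled=1/' "$repo_file" > "$tmp" && cat "$tmp" > "$repo_file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

A call such as `enable_repo_file /etc/yum.repos.d/example.repo` (path illustrative) would need root privileges on a real host.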
