OFED builder workflow #1132
Merged
28 commits
1a33e84 OFED workflow (assumptionsandg)
7d5337e Support DOCA OFED (assumptionsandg)
ffcf2a0 Push kernel modules (assumptionsandg)
764a1c2 Push OFED userspace packages (assumptionsandg)
78aed99 Fix build (assumptionsandg)
95c4e9f Fix kernel upgrade (assumptionsandg)
2b30dff Replace MLNX with DOCA (assumptionsandg)
0ddfcac Adjust lv_var_tmp to 2G (assumptionsandg)
395e9cc Fix workflow (assumptionsandg)
b902590 Disable gpg check for doca host (assumptionsandg)
35d8345 Fix inputs in workflow (assumptionsandg)
a135c1b Replace with_fileglob (assumptionsandg)
9d628e8 Remove trailing slash in base_path (assumptionsandg)
cbb7a1b Install kernel modules (assumptionsandg)
1d14550 Re-add the vault password (assumptionsandg)
c5e7657 Remove trailing dash from distribution (assumptionsandg)
b62f57d Remove LVM configuration (assumptionsandg)
29bda85 Use reset-bls-entries playbook in OFED workflow (assumptionsandg)
5a9126d Use replace instead of lineinfile (assumptionsandg)
699f244 Remove sed magic (assumptionsandg)
022a7ea Move OFED repositories to ofed.yml (assumptionsandg)
1840b05 Rename build-ofed to build-ofed-rocky (assumptionsandg)
c61deb2 Add precheck for noexec (assumptionsandg)
d989478 Update workflow (assumptionsandg)
2847ad9 WIP: OFED documentation (assumptionsandg)
1c44353 Fix no eol (assumptionsandg)
3490003 Add a release note (assumptionsandg)
5eaf53c Add to docs tree (assumptionsandg)
@@ -0,0 +1,254 @@
---
name: Build OFED packages
on:
  workflow_dispatch:
    inputs:
      rocky9:
        description: Build Rocky Linux 9
        type: boolean
        default: true
    secrets:
      KAYOBE_VAULT_PASSWORD:
        required: true
      CLOUDS_YAML:
        required: true
      OS_APPLICATION_CREDENTIAL_ID:
        required: true
      OS_APPLICATION_CREDENTIAL_SECRET:
        required: true

env:
  ANSIBLE_FORCE_COLOR: True
  KAYOBE_ENVIRONMENT: ci-builder
  KAYOBE_VAULT_PASSWORD: ${{ secrets.KAYOBE_VAULT_PASSWORD }}
jobs:
  overcloud-ofed-packages:
    name: Build OFED packages
    if: github.repository == 'stackhpc/stackhpc-kayobe-config'
    runs-on: arc-skc-host-image-builder-runner
    permissions: {}
    steps:
      - name: Install Package
        uses: ConorMacBride/install-package@main
        with:
          apt: git unzip nodejs python3-pip python3-venv openssh-server openssh-client jq

      - name: Start the SSH service
        run: |
          sudo /etc/init.d/ssh start

      - name: Checkout
        uses: actions/checkout@v4
        with:
          path: src/kayobe-config

      - name: Determine OpenStack release
        id: openstack_release
        run: |
          BRANCH=$(awk -F'=' '/defaultbranch/ {print $2}' src/kayobe-config/.gitreview)
          echo "openstack_release=${BRANCH}" | sed -E "s,(stable|unmaintained)/,," >> $GITHUB_OUTPUT

      - name: Clone StackHPC Kayobe repository
        uses: actions/checkout@v4
        with:
          repository: stackhpc/kayobe
          ref: refs/heads/stackhpc/${{ steps.openstack_release.outputs.openstack_release }}
          path: src/kayobe

      - name: Install Kayobe
        run: |
          mkdir -p venvs &&
          pushd venvs &&
          python3 -m venv kayobe &&
          source kayobe/bin/activate &&
          pip install -U pip &&
          pip install ../src/kayobe

      - name: Install terraform
        uses: hashicorp/setup-terraform@v2

      - name: Initialise terraform
        run: terraform init
        working-directory: ${{ github.workspace }}/src/kayobe-config/terraform/aio

      - name: Generate SSH keypair
        run: ssh-keygen -f id_rsa -N ''
        working-directory: ${{ github.workspace }}/src/kayobe-config/terraform/aio

      - name: Generate clouds.yaml
        run: |
          cat << EOF > clouds.yaml
          ${{ secrets.CLOUDS_YAML }}
          EOF
        working-directory: ${{ github.workspace }}/src/kayobe-config/terraform/aio

      - name: Output image tag
        id: image_tag
        run: |
          echo image_tag=$(grep stackhpc_rocky_9_overcloud_host_image_version: etc/kayobe/pulp-host-image-versions.yml | awk '{print $2}') >> $GITHUB_OUTPUT

      # Use the image override if set, otherwise use overcloud-os_distribution-os_release-tag
      - name: Output image name
        id: image_name
        run: |
          echo image_name=overcloud-rocky-9-${{ steps.image_tag.outputs.image_tag }} >> $GITHUB_OUTPUT

      - name: Generate terraform.tfvars
        run: |
          cat << EOF > terraform.tfvars
          ssh_public_key = "id_rsa.pub"
          ssh_username = "cloud-user"
          aio_vm_name = "skc-ofed-builder"
          aio_vm_image = "${{ env.VM_IMAGE }}"
          aio_vm_flavor = "en1.medium"
          aio_vm_network = "stackhpc-ci"
          aio_vm_subnet = "stackhpc-ci"
          aio_vm_interface = "ens3"
          EOF
        working-directory: ${{ github.workspace }}/src/kayobe-config/terraform/aio
        env:
          VM_IMAGE: ${{ steps.image_name.outputs.image_name }}

      - name: Terraform Plan
        run: terraform plan
        working-directory: ${{ github.workspace }}/src/kayobe-config/terraform/aio
        env:
          OS_CLOUD: "openstack"
          OS_APPLICATION_CREDENTIAL_ID: ${{ secrets.OS_APPLICATION_CREDENTIAL_ID }}
          OS_APPLICATION_CREDENTIAL_SECRET: ${{ secrets.OS_APPLICATION_CREDENTIAL_SECRET }}

      - name: Terraform Apply
        run: |
          for attempt in $(seq 5); do
              if terraform apply -auto-approve; then
                  echo "Created infrastructure on attempt $attempt"
                  exit 0
              fi
              echo "Failed to create infrastructure on attempt $attempt"
              sleep 10
              terraform destroy -auto-approve
              sleep 60
          done
          echo "Failed to create infrastructure after $attempt attempts"
          exit 1
        working-directory: ${{ github.workspace }}/src/kayobe-config/terraform/aio
        env:
          OS_CLOUD: "openstack"
          OS_APPLICATION_CREDENTIAL_ID: ${{ secrets.OS_APPLICATION_CREDENTIAL_ID }}
          OS_APPLICATION_CREDENTIAL_SECRET: ${{ secrets.OS_APPLICATION_CREDENTIAL_SECRET }}

      - name: Get Terraform outputs
        id: tf_outputs
        run: |
          terraform output -json
        working-directory: ${{ github.workspace }}/src/kayobe-config/terraform/aio

      - name: Write Terraform outputs
        run: |
          cat << EOF > src/kayobe-config/etc/kayobe/environments/ci-builder/tf-outputs.yml
          ${{ steps.tf_outputs.outputs.stdout }}
          EOF

      - name: Write Terraform network config
        run: |
          cat << EOF > src/kayobe-config/etc/kayobe/environments/ci-builder/tf-network-allocation.yml
          ---
          aio_ips:
            builder: "{{ access_ip_v4.value }}"
          EOF

      - name: Write Terraform network interface config
        run: |
          mkdir -p src/kayobe-config/etc/kayobe/environments/$KAYOBE_ENVIRONMENT/inventory/group_vars/seed
          rm -f src/kayobe-config/etc/kayobe/environments/$KAYOBE_ENVIRONMENT/inventory/group_vars/seed/network-interfaces
          cat << EOF > src/kayobe-config/etc/kayobe/environments/$KAYOBE_ENVIRONMENT/inventory/group_vars/seed/network-interfaces
          admin_interface: "{{ access_interface.value }}"
          aio_interface: "{{ access_interface.value }}"
          EOF

      - name: Manage SSH keys
        run: |
          mkdir -p ~/.ssh
          touch ~/.ssh/authorized_keys
          cat src/kayobe-config/terraform/aio/id_rsa.pub >> ~/.ssh/authorized_keys
          cp src/kayobe-config/terraform/aio/id_rsa* ~/.ssh/

      - name: Bootstrap the control host
        run: |
          source venvs/kayobe/bin/activate &&
          source src/kayobe-config/kayobe-env --environment ci-builder &&
          kayobe control host bootstrap

      - name: Run growroot playbook
        run: |
          source venvs/kayobe/bin/activate &&
          source src/kayobe-config/kayobe-env --environment ci-builder &&
          kayobe playbook run src/kayobe-config/etc/kayobe/ansible/growroot.yml
        env:
          KAYOBE_VAULT_PASSWORD: ${{ secrets.KAYOBE_VAULT_PASSWORD }}

      - name: Configure the seed host (Builder VM)
        run: |
          source venvs/kayobe/bin/activate &&
          source src/kayobe-config/kayobe-env --environment ci-builder &&
          kayobe seed host configure --skip-tags network,docker
        env:
          KAYOBE_VAULT_PASSWORD: ${{ secrets.KAYOBE_VAULT_PASSWORD }}

      - name: Run a distro-sync
        run: |
          source venvs/kayobe/bin/activate &&
          source src/kayobe-config/kayobe-env --environment ci-builder &&
          kayobe seed host command run --become --command "dnf distro-sync --refresh"
        env:
          KAYOBE_VAULT_PASSWORD: ${{ secrets.KAYOBE_VAULT_PASSWORD }}

      - name: Reset BLS entries on the seed host
        run: |
          source venvs/kayobe/bin/activate &&
          source src/kayobe-config/kayobe-env --environment ci-builder &&
          kayobe playbook run src/kayobe-config/etc/kayobe/ansible/reset-bls-entries.yml \
            -e "reset_bls_host=ofed-builder"
        env:
          KAYOBE_VAULT_PASSWORD: ${{ secrets.KAYOBE_VAULT_PASSWORD }}

      - name: Disable noexec in /var/tmp
        run: |
          source venvs/kayobe/bin/activate &&
          source src/kayobe-config/kayobe-env --environment ci-builder &&
          kayobe seed host command run --become --command "sed -i 's/noexec,//g' /etc/fstab"
        env:
          KAYOBE_VAULT_PASSWORD: ${{ secrets.KAYOBE_VAULT_PASSWORD }}

      - name: Reboot to apply the kernel update
        run: |
          source venvs/kayobe/bin/activate &&
          source src/kayobe-config/kayobe-env --environment ci-builder &&
          kayobe playbook run src/kayobe-config/etc/kayobe/ansible/reboot.yml
        env:
          KAYOBE_VAULT_PASSWORD: ${{ secrets.KAYOBE_VAULT_PASSWORD }}

      - name: Run OFED builder playbook
        run: |
          source venvs/kayobe/bin/activate &&
          source src/kayobe-config/kayobe-env --environment ci-builder &&
          kayobe playbook run src/kayobe-config/etc/kayobe/ansible/build-ofed-rocky.yml
        env:
          KAYOBE_VAULT_PASSWORD: ${{ secrets.KAYOBE_VAULT_PASSWORD }}

      - name: Run OFED upload playbook
        run: |
          source venvs/kayobe/bin/activate &&
          source src/kayobe-config/kayobe-env --environment ci-builder &&
          kayobe playbook run src/kayobe-config/etc/kayobe/ansible/push-ofed.yml
        env:
          KAYOBE_VAULT_PASSWORD: ${{ secrets.KAYOBE_VAULT_PASSWORD }}

      - name: Destroy
        run: terraform destroy -auto-approve
        working-directory: ${{ github.workspace }}/src/kayobe-config/terraform/aio
        env:
          OS_CLOUD: openstack
          OS_APPLICATION_CREDENTIAL_ID: ${{ secrets.OS_APPLICATION_CREDENTIAL_ID }}
          OS_APPLICATION_CREDENTIAL_SECRET: ${{ secrets.OS_APPLICATION_CREDENTIAL_SECRET }}
        if: always()
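
The Terraform Apply step in the workflow retries infrastructure creation and destroys any partial deployment between attempts. The following is a minimal sketch of that pattern as a reusable shell function. The helper name `retry_with_cleanup` and the `flaky` stand-in are hypothetical and not part of the workflow; in the workflow the command would be `terraform apply -auto-approve` and the cleanup `terraform destroy -auto-approve`.

```shell
# Hypothetical helper (not in the workflow): retry a command, running a
# cleanup command and sleeping between failed attempts.
# Usage: retry_with_cleanup ATTEMPTS DELAY_SECONDS CLEANUP_CMD CMD [ARGS...]
retry_with_cleanup() {
    max_attempts=$1
    delay=$2
    cleanup=$3
    shift 3
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            echo "Succeeded on attempt $attempt"
            return 0
        fi
        echo "Failed on attempt $attempt"
        # Word splitting is intended here: CLEANUP_CMD is a simple command string.
        $cleanup || true
        sleep "$delay"
        attempt=$((attempt + 1))
    done
    echo "Failed after $max_attempts attempts"
    return 1
}

# Stand-in for "terraform apply": fails until it has been called three times.
counter=$(mktemp)
echo 0 > "$counter"
flaky() {
    n=$(cat "$counter")
    n=$((n + 1))
    echo "$n" > "$counter"
    [ "$n" -ge 3 ]
}

retry_with_cleanup 5 0 true flaky
```

Unlike the workflow step, which calls `exit`, the sketch uses `return` so that it can be sourced into a larger script without terminating it.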
@@ -0,0 +1,55 @@
====
OFED
====

Warning: Experimental workflow subject to change

This section documents the workflow for building OFED packages for Release Train integration.

The workflow builds the OFED kernel modules against the latest kernel available in Release Train
(as configured in SKC) and packages them as RPMs to be uploaded to Ark. Additionally,
the workflow downloads the userspace OFED packages from the Nvidia repository and uploads these
to Ark.

Workflow
========

The workflow uses workflow_dispatch to manually request an OFED build. It deploys a builder
VM, applies Kayobe configuration to the builder, upgrades the kernel, reboots, then runs two
Ansible playbooks to build OFED and upload it to Ark.
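
As an illustration of requesting a build manually, the sketch below assembles a GitHub CLI invocation for the workflow. It assumes the ``gh`` CLI is installed and authenticated, and refers to the workflow by its name, ``Build OFED packages``. The helper name ``trigger_ofed_build`` and the ``DRY_RUN`` variable are hypothetical; by default the command is only printed.

```shell
# Hypothetical helper: print (default) or run the gh command that dispatches
# the OFED build workflow. Set DRY_RUN=0 to actually run it.
trigger_ofed_build() {
    repo="stackhpc/stackhpc-kayobe-config"
    workflow="Build OFED packages"
    # rocky9 is the boolean input declared under workflow_dispatch.
    set -- gh workflow run "$workflow" --repo "$repo" -f "rocky9=${1:-true}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

trigger_ofed_build true
```

The build can also be requested from the Actions tab of the repository, which is the usual route for workflow_dispatch workflows.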

Pre-requisites
--------------

Before building OFED packages, the workflow will ensure that:

* A full distro-sync has taken place, ensuring the kernel is upgraded.

* The bootloader has been configured to use the latest kernel.

* noexec is disabled for /var/tmp (the temporary logical volume).
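
The first two pre-requisites amount to the builder running the newest installed kernel after the reboot. A minimal sketch of such a check follows; it is not part of the workflow. The version strings are made-up sample data, and the helper name ``newest_kernel`` is hypothetical. On a real host the installed list could come from ``rpm -q kernel --queryformat '%{VERSION}-%{RELEASE}.%{ARCH}\n'`` and the running version from ``uname -r``.

```shell
# Hypothetical helper: read kernel versions on stdin and print the highest,
# using GNU sort's version ordering.
newest_kernel() {
    sort -V | tail -n 1
}

# Sample data (assumed, for illustration only).
installed="5.14.0-362.8.1.el9_3.x86_64
5.14.0-427.13.1.el9_4.x86_64
5.14.0-427.31.1.el9_4.x86_64"
running="5.14.0-427.13.1.el9_4.x86_64"

newest=$(printf '%s\n' "$installed" | newest_kernel)
if [ "$running" = "$newest" ]; then
    echo "Running the newest installed kernel: $running"
else
    echo "Reboot needed: running $running, newest installed is $newest"
fi
```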

build-ofed
----------

Currently we only support building Rocky Linux 9 OFED packages.

To set up OFED, we are required to build kernel modules for the OFED drivers, as
the kernels we provide in Release Train are unsupported by OFED. To accomplish this we
use the doca-kernel-support script from doca-extra.

The dependencies needed to build the OFED kernel modules are installed at the beginning
of the build playbook. We also install the base and appstream dependencies of the userspace
OFED packages here; this is intended to stop these dependencies being pulled in later when
we download the OFED packages from the doca-host repository.

At the end of the playbook, following the kernel module build, the OFED userspace packages
are downloaded from the upstream repository so that they can be uploaded to Ark.

push-ofed
---------

As we are not syncing OFED from any upstream source, and are instead creating our own
repository of custom packages, we are required to set up the Pulp distribution/publication
and upload the content directly to Ark. This playbook uses the Pulp CLI to upload the RPMs
to Ark.
@@ -0,0 +1,73 @@
---
- name: Build OFED packages
  become: true
  hosts: ofed-builder
  gather_facts: false
  tasks:
    - name: Check whether noexec is enabled for /var/tmp
      ansible.builtin.lineinfile:
        path: "/etc/fstab"
        regexp: "noexec"
        state: absent
      changed_when: false
      check_mode: true
      register: result
      failed_when: result.found

    - name: Install package dependencies
      ansible.builtin.dnf:
        name:
          - kpartx
          - perl
          - rpm-build
          - automake
          - patch
          - kernel
          - kernel-devel
          - autoconf
          - pciutils
          - kernel-modules-extra
          - kernel-rpm-macros
          - lsof
          - libtool
          - tk
          - gcc-gfortran
          - tcl
          - createrepo
          - cmake-filesystem
          - libnl3-devel
          - python3-devel
        state: latest
        update_cache: true

    - name: Add DOCA host repository package
      ansible.builtin.dnf:
        name: https://developer.nvidia.com/downloads/networking/secure/doca-sdk/DOCA_2.8/doca-host-2.8.0-204000_{{ stackhpc_pulp_doca_ofed_version }}_rhel9{{ stackhpc_pulp_repo_rocky_9_minor_version }}.x86_64.rpm
        disable_gpg_check: true

    - name: Install DOCA extra packages
      ansible.builtin.dnf:
        name: doca-extra

    - name: Create build directory
      ansible.builtin.file:
        path: /home/cloud-user/ofed
        state: directory
        mode: 0777

    - name: Set build directory
      ansible.builtin.replace:
        path: /opt/mellanox/doca/tools/doca-kernel-support
        regexp: 'TMP_DIR=\$1'
        replace: 'TMP_DIR=/home/cloud-user/ofed'

    - name: Build OFED kernel modules
      ansible.builtin.shell:
        cmd: |
          /opt/mellanox/doca/tools/doca-kernel-support

    - name: Download OFED userspace packages
      ansible.builtin.dnf:
        name: doca-ofed-userspace
        download_only: true
        download_dir: /home/cloud-user/ofed
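
The first task in the playbook fails if the string noexec appears anywhere in /etc/fstab, while the workflow removes it with the sed expression `s/noexec,//g`, which only matches when a comma follows. The sketch below applies the same expression to a made-up fstab (both entries are assumptions, not taken from the builder image) to show how the two interact. It writes to a variable rather than editing in place, whereas the workflow uses `sed -i`.

```shell
# Sample fstab content (assumed, for illustration only).
fstab=$(mktemp)
cat > "$fstab" << 'EOF'
/dev/mapper/rootvg-lv_var_tmp /var/tmp xfs defaults,noexec,nosuid,nodev 0 0
/dev/mapper/rootvg-lv_tmp /tmp xfs defaults,nodev,noexec 0 0
EOF

# Same expression as the workflow step, without -i.
cleaned=$(sed 's/noexec,//g' "$fstab")
printf '%s\n' "$cleaned"

# Count the lines the playbook precheck would still match.
remaining=$(printf '%s\n' "$cleaned" | grep -c noexec || true)
echo "Lines still containing noexec: $remaining"
```

In this sample the /var/tmp entry is cleaned, but the entry where noexec is the last mount option is left untouched, so the precheck would still fail. Whether that case arises depends on the fstab layout of the image in use.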