Add support for OFED #254
Merged
28 commits
- f72377d add ofed role (sjpb)
- 9cffe0b fix ofed dependencies install (sjpb)
- 720852f use mlnxofedinstall as recommended for Rocky8/9 now (sjpb)
- 8114bb1 Merge branch 'main' into ofed (sjpb)
- ba00f71 add build of OFED image to CI (sjpb)
- 18b8e3f add build of OFED image to CI (sjpb)
- 6c8e3cc fix ofed commands (sjpb)
- d1d0299 default to OFED hpc package selection (sjpb)
- d200b82 fix OFED packages concatenation on RL9 (sjpb)
- 7f257d6 autobuild on ofed branch (sjpb)
- 2a18a2f always build RL8 and RL9 images (sjpb)
- d68559d fix ofed_package_selection templating (sjpb)
- f4fa9ec fix ofed_build_packages (sjpb)
- 2901294 avoid OFED install timeouts (sjpb)
- 69d0562 Merge branch 'ofed' of github.com:stackhpc/ansible-slurm-appliance in… (sjpb)
- f49922b Merge branch 'main' into ofed (sjpb)
- 0427319 add additional packages for RL8 (sjpb)
- b854dd3 bump leafcloud build size for memory issues (sjpb)
- 81bcf36 fix missing packages for RL9 build (sjpb)
- 430af8a remove duplication in packer definition and allow for different OFED … (sjpb)
- 8a0ec9b add leafcloud OFED disk size (sjpb)
- db81091 workaround OFED/turbovnc install clash (sjpb)
- e9fe323 output multiple image names (sjpb)
- 84485ed bump CI to RL8 and RL9 OFED-enabled images (sjpb)
- 3223846 Merge branch 'main' into ofed (sjpb)
- 5b64a7c Merge branch 'main' into ofed (default to RL9) (sjpb)
- 4b09ba8 Merge branch 'main' into ofed (sjpb)
- 7b1afa0 bump CI images (non-OFED for RL8) (sjpb)
Role README, new file (`@@ -0,0 +1,21 @@`):

````markdown
# ofed

This role installs Mellanox OFED:
- It checks that the running kernel is the latest installed one, and errors if not.
- Installation uses the `mlnxofedinstall` command, with support for the running kernel
  and (by default) without firmware updates.

As OFED installation generally takes a long time, this role should normally only be used
during image build, for example by setting:

```
environments/groups/<environment>/groups:
[ofed:children]
builder
```

# Role variables

See `defaults/main.yml`.

Note that Ansible facts are required unless `ofed_distro_version` and `ofed_arch` are set explicitly.
````
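The kernel check in the first README bullet is done by the `Check current kernel is newest installed` task in `install.yml`: it rewrites each `dnf list --installed kernel` line into `uname -r` form and version-sorts the results. The Python sketch below mirrors that logic outside Ansible, for illustration only. `newest_installed_kernel` and `check_running_kernel` are hypothetical helpers, and the natural sort used here only approximates `community.general.version_sort`. The ordering matters: a plain string sort would rank `513.9.1` above `513.18.1`.

```python
import re
from typing import List, Tuple, Union

# Same pattern as the role's regex_replace: a dnf line such as
# "kernel.x86_64  4.18.0-513.18.1.el8_9  @baseos" becomes
# "4.18.0-513.18.1.el8_9.x86_64", i.e. the `uname -r` form.
DNF_KERNEL_LINE = re.compile(r"^\w+\.(\w+)\s+(\S+)\s+\S+\s*$")


def _version_key(version: str) -> List[Tuple[int, Union[int, str]]]:
    """Natural-sort key: digit runs compare as integers, other runs as text."""
    chunks = [c for c in re.split(r"(\d+)", version) if c]
    return [(0, int(c)) if c.isdigit() else (1, c) for c in chunks]


def newest_installed_kernel(dnf_output: str) -> str:
    """Return the newest kernel in `dnf list --installed kernel` output."""
    # Drop the "Installed Packages" header line, as stdout_lines[1:] does in the role.
    lines = dnf_output.splitlines()[1:]
    releases = [DNF_KERNEL_LINE.sub(r"\2.\1", line) for line in lines]
    return max(releases, key=_version_key)


def check_running_kernel(running: str, dnf_output: str) -> None:
    """Raise if the running kernel is not the newest installed one."""
    newest = newest_installed_kernel(dnf_output)
    if running != newest:
        raise RuntimeError(
            f"Kernel {running} is loaded but newer {newest} is installed: consider rebooting?"
        )
```

As in the role, lines that do not match the pattern are passed through unchanged rather than rejected, so the sketch assumes well-formed `dnf` output.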
`defaults/main.yml`, new file (`@@ -0,0 +1,28 @@`):

```yaml
ofed_version: 24.01-0.3.3.1
ofed_download_url: https://content.mellanox.com/ofed/MLNX_OFED-{{ ofed_version }}/MLNX_OFED_LINUX-{{ ofed_version }}-{{ ofed_distro }}{{ ofed_distro_version }}-{{ ofed_arch }}.tgz
ofed_distro: rhel # NB: not expected to work on other distros due to installation differences
ofed_distro_version: "{{ ansible_distribution_version }}" # e.g. '8.9'
ofed_arch: "{{ ansible_architecture }}"
ofed_tmp_dir: /tmp
ofed_update_firmware: false
ofed_build_packages: # may require additional packages depending on ofed_package_selection
  - autoconf
  - automake
  - gcc
  - gcc-gfortran
  - kernel-devel-{{ _ofed_loaded_kernel.stdout | trim }}
  - kernel-rpm-macros
  - libtool
  - lsof
  - patch
  - pciutils
  - perl
  - rpm-build
  - tcl
  - tk
ofed_build_rl8_packages:
  - gdb-headless
  - python36
ofed_package_selection: # list of package selection flags for mlnxofedinstall script
  - hpc
  - with-nfsrdma
```
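To see what these defaults resolve to, here is a minimal Python sketch that renders `ofed_download_url`, the directory the tarball unpacks into (used as `chdir` by the install task), and the build-package list. The helper names and the example facts (`rhel`, `8.9`, `9.3`, `x86_64`) are assumptions for illustration, and the sketch does not check that a bundle actually exists at the rendered URL.

```python
from typing import List

# Default values copied from defaults/main.yml.
OFED_VERSION = "24.01-0.3.3.1"
OFED_DISTRO = "rhel"
OFED_TMP_DIR = "/tmp"
OFED_BUILD_RL8_PACKAGES = ["gdb-headless", "python36"]


def ofed_bundle_name(version: str, distro: str, distro_version: str, arch: str) -> str:
    """Name shared by the tarball and the directory it unpacks to."""
    return f"MLNX_OFED_LINUX-{version}-{distro}{distro_version}-{arch}"


def ofed_download_url(version: str, distro: str, distro_version: str, arch: str) -> str:
    """Render the ofed_download_url default."""
    bundle = ofed_bundle_name(version, distro, distro_version, arch)
    return f"https://content.mellanox.com/ofed/MLNX_OFED-{version}/{bundle}.tgz"


def ofed_unpack_dir(tmp_dir: str, version: str, distro: str, distro_version: str, arch: str) -> str:
    """Render the chdir used by the 'Run OFED install script' task."""
    return f"{tmp_dir}/{ofed_bundle_name(version, distro, distro_version, arch)}/"


def build_packages(running_kernel: str, distro_version: str) -> List[str]:
    """Package list passed to dnf by the 'Install build prerequisites' task."""
    base = [
        "autoconf", "automake", "gcc", "gcc-gfortran",
        f"kernel-devel-{running_kernel.strip()}",  # mirrors `| trim`
        "kernel-rpm-macros", "libtool", "lsof", "patch",
        "pciutils", "perl", "rpm-build", "tcl", "tk",
    ]
    # Same literal comparison as the role: extras only when the version is exactly "8.9".
    return base + (OFED_BUILD_RL8_PACKAGES if distro_version == "8.9" else [])


if __name__ == "__main__":
    print(ofed_download_url(OFED_VERSION, OFED_DISTRO, "8.9", "x86_64"))
    print(ofed_unpack_dir(OFED_TMP_DIR, OFED_VERSION, OFED_DISTRO, "8.9", "x86_64"))
```

Because the role compares `ofed_distro_version == '8.9'` literally, another 8.x minor release would not pick up `ofed_build_rl8_packages`; comparing only the major version would be more general.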
`install.yml`, new file (`@@ -0,0 +1,73 @@`):

```yaml
- name: Get installed kernels
  command: dnf list --installed kernel
  register: _ofed_dnf_kernels
  changed_when: false

- name: Determine running kernel
  command: uname -r # e.g. 4.18.0-513.18.1.el8_9.x86_64
  register: _ofed_loaded_kernel
  changed_when: false

- name: Check current kernel is newest installed
  assert:
    that: _ofed_loaded_kernel.stdout == _ofed_dnf_kernels_newest
    fail_msg: "Kernel {{ _ofed_loaded_kernel.stdout }} is loaded but newer {{ _ofed_dnf_kernels_newest }} is installed: consider rebooting?"
  vars:
    _ofed_dnf_kernels_newest: >-
      {{ _ofed_dnf_kernels.stdout_lines[1:] | map('regex_replace', '^\w+\.(\w+)\s+(\S+)\s+\S+\s*$', '\2.\1') | community.general.version_sort | last }}
    # dnf line format e.g. "kernel.x86_64 4.18.0-513.18.1.el8_9 @baseos "

- name: Enable epel
  dnf:
    name: epel-release

- name: Check for existing OFED installation
  command: ofed_info
  changed_when: false
  failed_when:
    - _ofed_info.rc > 0
    - "'No such file or directory' not in _ofed_info.msg"
  register: _ofed_info

- name: Install build prerequisites
  dnf:
    name: "{{ ofed_build_packages + (ofed_build_rl8_packages if ofed_distro_version == '8.9' else []) }}"
  when: "'MLNX_OFED_LINUX-' + ofed_version not in _ofed_info.stdout"
  # don't want to install a load of prereqs unnecessarily

- name: Download and unpack Mellanox OFED tarball
  ansible.builtin.unarchive:
    src: "{{ ofed_download_url }}"
    dest: "{{ ofed_tmp_dir }}"
    remote_src: yes
  become: no
  when: "'MLNX_OFED_LINUX-' + ofed_version not in _ofed_info.stdout"

# Below from https://docs.nvidia.com/networking/display/mlnxofedv24010331/user+manual
- name: Run OFED install script
  command:
    cmd: >
      ./mlnxofedinstall
      --add-kernel-support
      {% if not ofed_update_firmware %}--without-fw-update{% endif %}
      --force
      --skip-repo
      {% for pkgsel in ofed_package_selection %}
      --{{ pkgsel }}
      {% endfor %}
    chdir: "{{ ofed_tmp_dir }}/MLNX_OFED_LINUX-{{ ofed_version }}-{{ ofed_distro }}{{ ofed_distro_version }}-{{ ofed_arch }}/"
  register: _ofed_install
  when: "'MLNX_OFED_LINUX-' + ofed_version not in _ofed_info.stdout"
  async: "{{ 45 * 60 }}" # wait for up to 45 minutes
  poll: 15 # check every 15 seconds

- name: Update initramfs
  command:
    cmd: dracut -f
  when: '"update your initramfs" in _ofed_install.stdout | default("")'
  failed_when: false # always shows errors due to deleted modules for inbox RDMA drivers

- name: Load the new driver
  command:
    cmd: /etc/init.d/openibd restart
  when: '"To load the new driver" in _ofed_install.stdout | default("")'
```
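The install task's folded `cmd` is Jinja-templated, which makes the final command line easy to misread. This Python sketch rebuilds the same argument order from `ofed_update_firmware` and `ofed_package_selection`, together with the plain substring checks the tasks use to skip reinstallation and to decide whether to run `dracut -f` and restart `openibd`. The function names are hypothetical, and it assumes, as the role's `when` conditions do, that `ofed_info` output contains the `MLNX_OFED_LINUX-<version>` string when that version is installed.

```python
from typing import List, Sequence


def already_installed(ofed_info_stdout: str, version: str) -> bool:
    """Skip condition shared by the prerequisite, download and install tasks."""
    return f"MLNX_OFED_LINUX-{version}" in ofed_info_stdout


def mlnxofedinstall_args(update_firmware: bool, package_selection: Sequence[str]) -> List[str]:
    """Arguments in the same order as the templated `cmd` of the install task."""
    args = ["./mlnxofedinstall", "--add-kernel-support"]
    if not update_firmware:
        args.append("--without-fw-update")
    args += ["--force", "--skip-repo"]
    # Each ofed_package_selection entry becomes a '--<flag>' option.
    args += [f"--{selection}" for selection in package_selection]
    return args


def post_install_commands(install_stdout: str) -> List[str]:
    """Follow-up commands the role runs, keyed on text in the installer output."""
    commands = []
    if "update your initramfs" in install_stdout:
        commands.append("dracut -f")
    if "To load the new driver" in install_stdout:
        commands.append("/etc/init.d/openibd restart")
    return commands


if __name__ == "__main__":
    # Role defaults: ofed_update_firmware=false, ofed_package_selection=[hpc, with-nfsrdma]
    print(" ".join(mlnxofedinstall_args(False, ["hpc", "with-nfsrdma"])))
```

With the role defaults this prints `./mlnxofedinstall --add-kernel-support --without-fw-update --force --skip-repo --hpc --with-nfsrdma`. An argument list is used in the sketch so that each flag is visible as a separate element.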
Role task entry point, new file (`@@ -0,0 +1 @@`):

```yaml
- include_tasks: install.yml
```