Support EESSI #252

Merged 16 commits on May 3, 2023
6 changes: 6 additions & 0 deletions .github/workflows/stackhpc.yml
@@ -99,6 +99,12 @@ jobs:
          . environments/.stackhpc/activate
          ansible-playbook -vv ansible/adhoc/hpctests.yml

      - name: Run EESSI tests
        run: |
          . venv/bin/activate
          . environments/.stackhpc/activate
          ansible-playbook -vv ansible/ci/check_eessi.yml

      - name: Confirm Open Ondemand is up (via SOCKS proxy)
        run: |
          . venv/bin/activate
10 changes: 10 additions & 0 deletions ansible/bootstrap.yml
@@ -112,6 +112,16 @@
tasks_from: config.yml
tags: config

- name: Setup EESSI
  hosts: eessi
  tags: eessi
  become: true
  gather_facts: false
  tasks:
    - name: Install and configure EESSI
      import_role:
        name: eessi

- hosts: update
  gather_facts: false
  become: yes
34 changes: 34 additions & 0 deletions ansible/ci/check_eessi.yml
@@ -0,0 +1,34 @@
---
- name: Run EESSI test job
  hosts: login[0]
  vars:
    eessi_test_rootdir: /home/eessi_test
  tasks:
    - name: Create test root directory
      file:
        path: "{{ eessi_test_rootdir }}"
        state: directory
        owner: "{{ ansible_user }}"
        group: "{{ ansible_user }}"
      become: true

    - name: Clone eessi-demo repo
      ansible.builtin.git:
        repo: "https://github.com/eessi/eessi-demo.git"
        dest: "{{ eessi_test_rootdir }}/eessi-demo"

    - name: Run test job
      ansible.builtin.shell:
        cmd: |
          source /cvmfs/pilot.eessi-hpc.org/latest/init/bash
          srun ./run.sh
        chdir: "{{ eessi_test_rootdir }}/eessi-demo/TensorFlow"
        executable: /bin/bash
      register: job_output

    - name: Fail if job output contains error
      fail:
        # Note: Job prints live progress bar to terminal, so use regex filter to remove this from stdout
        msg: "Test job using EESSI modules failed. Job output was: {{ job_output.stdout | regex_replace('\b', '') }}"
      when: '"Epoch 5/5" not in job_output.stdout'

6 changes: 3 additions & 3 deletions ansible/ci/check_sacct_hpctests.yml
@@ -2,7 +2,7 @@
gather_facts: false
become: true
vars:
-sacct_stdout_expected: |- # based on CI running hpctests as the first job - NB note no trailing newline
+sacct_stdout_expected: |- # based on CI running hpctests as the first job
JobID,JobName,State
1,pingpong.sh,COMPLETED
2,pingmatrix.sh,COMPLETED
@@ -18,10 +18,10 @@
register: sacct
- name: Check info for ended jobs
assert:
-that: sacct.stdout == sacct_stdout_expected
+that: sacct_stdout_expected in sacct.stdout
fail_msg: |
Expected:
--{{ sacct_stdout_expected }}--
Got:
--{{ sacct.stdout }}--
-success_msg: sacct shows hpctests jobs as first and only jobs
+success_msg: sacct shows hpctests jobs as first jobs in list
34 changes: 34 additions & 0 deletions ansible/roles/eessi/README.md
@@ -0,0 +1,34 @@
EESSI
=====

Configure the EESSI pilot repository for use on given hosts.

Requirements
------------

None.

Role Variables
--------------

- `cvmfs_quota_limit_mb`: Optional int. Maximum size of the local package cache on each node, in MB. Defaults to `10000` (10 GB).
- `cvmfs_config_overrides`: Optional dict. Key-value pairs for additional CernVM-FS settings; see the [official docs](https://cvmfs.readthedocs.io/en/stable/cpt-configure.html) for the list of options. Each dict key should be a valid config variable (e.g. `CVMFS_HTTP_PROXY`) and the corresponding dict value is set as that variable's value (e.g. `https://my-proxy.com`). These configuration parameters are written to the `/etc/cvmfs/default.local` config file on each host in the form `KEY=VALUE`.

Dependencies
------------

None.

Example Playbook
----------------

```yaml
- name: Setup EESSI
  hosts: eessi
  tags: eessi
  become: true
  tasks:
    - name: Install and configure EESSI
      import_role:
        name: eessi
```
11 changes: 11 additions & 0 deletions ansible/roles/eessi/defaults/main.yaml
@@ -0,0 +1,11 @@
---
# Default to 10GB
cvmfs_quota_limit_mb: 10000

cvmfs_config_default:
  CVMFS_CLIENT_PROFILE: single
  CVMFS_QUOTA_LIMIT: "{{ cvmfs_quota_limit_mb }}"

cvmfs_config_overrides: {}

cvmfs_config: "{{ cvmfs_config_default | combine(cvmfs_config_overrides) }}"
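
Note on how these defaults merge: the `combine` filter gives precedence to its argument, so keys in `cvmfs_config_overrides` replace same-named keys in `cvmfs_config_default`. A minimal sketch of an override follows. The file path and the proxy URL are assumptions for illustration and are not part of this PR.

```yaml
# Hypothetical location: environments/<env>/inventory/group_vars/all/eessi.yml
# Raise the local cache limit; this feeds CVMFS_QUOTA_LIMIT in the default dict.
cvmfs_quota_limit_mb: 20000

# Extra CernVM-FS settings, merged over cvmfs_config_default by the role.
cvmfs_config_overrides:
  CVMFS_HTTP_PROXY: "http://proxy.example.org:3128"  # placeholder URL, an assumption
```

With these values the role would be expected to write `CVMFS_CLIENT_PROFILE=single`, `CVMFS_QUOTA_LIMIT=20000` and `CVMFS_HTTP_PROXY=http://proxy.example.org:3128` to `/etc/cvmfs/default.local`, one `KEY=VALUE` pair per line.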
46 changes: 46 additions & 0 deletions ansible/roles/eessi/tasks/main.yaml
@@ -0,0 +1,46 @@
---
- name: Download Cern GPG key
  ansible.builtin.get_url:
    url: http://cvmrepo.web.cern.ch/cvmrepo/yum/RPM-GPG-KEY-CernVM
    dest: ./cvmfs-key.gpg

- name: Import downloaded GPG key
  command: rpm --import cvmfs-key.gpg

- name: Add CVMFS repo
  dnf:
    name: https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm

- name: Install CVMFS
  dnf:
    name: cvmfs

- name: Install EESSI CVMFS config
  dnf:
    name: https://github.com/EESSI/filesystem-layer/releases/download/latest/cvmfs-config-eessi-latest.noarch.rpm
    # NOTE: Can't find any docs on obtaining gpg key - maybe downloading directly from github is ok?
    disable_gpg_check: true

# Alternative version using official repo - still no GPG key :(
# - name: Add EESSI repo
#   dnf:
#     name: http://repo.eessi-infra.org/eessi/rhel/8/noarch/eessi-release-0-1.noarch.rpm

# - name: Install EESSI CVMFS config
#   dnf:
#     name: cvmfs-config-eessi

- name: Add base CVMFS config
  community.general.ini_file:
    dest: /etc/cvmfs/default.local
    section: null
    option: "{{ item.key }}"
    value: "{{ item.value }}"
    no_extra_spaces: true
  loop: "{{ cvmfs_config | dict2items }}"


# NOTE: Not clear how to make this idempotent
- name: Ensure CVMFS config is setup
  command:
    cmd: "cvmfs_config setup"
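
On the idempotency note above: one possible approach, sketched here as an assumption rather than something this PR does, is to register the result of the config task and run `cvmfs_config setup` only when that task reported a change. The variable name `cvmfs_default_local` is hypothetical.

```yaml
- name: Add base CVMFS config
  community.general.ini_file:
    dest: /etc/cvmfs/default.local
    section: null
    option: "{{ item.key }}"
    value: "{{ item.value }}"
    no_extra_spaces: true
  loop: "{{ cvmfs_config | dict2items }}"
  register: cvmfs_default_local  # hypothetical variable name

- name: Ensure CVMFS config is setup
  command:
    cmd: "cvmfs_config setup"
  # Assumption: setup only needs re-running when a config line changed.
  # A registered loop result reports "changed" if any item changed.
  when: cvmfs_default_local is changed
```

The trade-off is that a run which fails between the two tasks would skip setup on the next run, since the config file would then already be up to date. Whether that risk is acceptable depends on how the role is used.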
2 changes: 1 addition & 1 deletion environments/.stackhpc/builder.pkrvars.hcl
@@ -1,6 +1,6 @@
flavor = "vm.ska.cpu.general.small"
networks = ["a262aabd-e6bf-4440-a155-13dbc1b5db0e"] # WCDC-iLab-60
-source_image_name = "openhpc-230412-1447-e3769af6.qcow2" # https://github.com/stackhpc/ansible-slurm-appliance/pull/258
+source_image_name = "openhpc-230503-0944-bf8c3f63.qcow2" # https://github.com/stackhpc/ansible-slurm-appliance/pull/252
#source_image_name = "Rocky-8-GenericCloud-Base-8.7-20221130.0.x86_64.qcow2"
ssh_keypair_name = "slurm-app-ci"
security_groups = ["default", "SSH"]
2 changes: 1 addition & 1 deletion environments/.stackhpc/terraform/main.tf
@@ -17,7 +17,7 @@ variable "create_nodes" {
variable "cluster_image" {
description = "single image for all cluster nodes - a convenience for CI"
type = string
-default = "openhpc-230412-1447-e3769af6.qcow2" # https://github.com/stackhpc/ansible-slurm-appliance/pull/258
+default = "openhpc-230503-0944-bf8c3f63.qcow2" # https://github.com/stackhpc/ansible-slurm-appliance/pull/252
# default = "Rocky-8-GenericCloud-Base-8.7-20221130.0.x86_64.qcow2"
# default = "Rocky-8-GenericCloud-8.6.20220702.0.x86_64.qcow2"
}
3 changes: 3 additions & 0 deletions environments/common/inventory/groups
@@ -13,6 +13,9 @@ login
control
compute

[eessi:children]
# Hosts on which EESSI stack should be configured

[hpctests:children]
# Login group to use for running mpi-based testing.
login
3 changes: 3 additions & 0 deletions environments/common/layouts/everything
@@ -55,6 +55,9 @@ compute
[etc_hosts]
# Hosts to manage /etc/hosts e.g. if no internal DNS. See ansible/roles/etc_hosts/README.md

[eessi:children]
openhpc

[resolv_conf]
# Allows defining nameservers in /etc/resolv.conf - see ansible/roles/resolv_conf/README.md
