Ensure repo files using yum_repository #151

Merged (2 commits) on Apr 18, 2023
35 changes: 23 additions & 12 deletions README.md
@@ -10,16 +10,27 @@ The minimal image for nodes is a CentOS 7 or RockyLinux 8 GenericCloud image. Th

## Role Variables

`openhpc_release_repo`: Optional. Path to the `ohpc-release` repo to use. Defaults provide v1.3 for Centos 7 and v2 for Centos 8. Or, include this
package in the image.

`openhpc_slurm_service_enabled`: boolean, whether to enable the appropriate slurm service (slurmd/slurmctld)
`openhpc_version`: Optional. OpenHPC version to install. Defaults provide `1.3` for CentOS 7 and `2` for RockyLinux/CentOS 8.

`openhpc_extra_repos`: Optional list. Extra Yum repository definitions to configure, following the format of the Ansible
[yum_repository](https://docs.ansible.com/ansible/2.9/modules/yum_repository_module.html) module (see the example after this list). The following keys are respected for
each list element:
* `name`: Required
* `description`: Optional
* `file`: Required
* `baseurl`: Optional
* `metalink`: Optional
* `mirrorlist`: Optional
* `gpgcheck`: Optional
* `gpgkey`: Optional
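
For example, a minimal sketch of adding one site-local repository; the repository name, URLs and GPG key below are illustrative placeholders only:

```yaml
openhpc_extra_repos:
  - name: my-site-packages            # hypothetical repository, for illustration only
    file: my-site.repo
    description: "Site-local package repository"
    baseurl: "https://repo.example.com/el8/$basearch"
    gpgcheck: true
    gpgkey: "https://repo.example.com/RPM-GPG-KEY-example"
```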

`openhpc_slurm_service_enabled`: boolean, whether to enable the appropriate slurm service (slurmd/slurmctld).

`openhpc_slurm_service_started`: Optional boolean. Whether to start slurm services. If set to false, all services will be stopped. Defaults to `openhpc_slurm_service_enabled`.

`openhpc_slurm_control_host`: ansible host name of the controller e.g `"{{ groups['cluster_control'] | first }}"`
`openhpc_slurm_control_host`: Ansible host name of the controller, e.g. `"{{ groups['cluster_control'] | first }}"`.

`openhpc_packages`: additional OpenHPC packages to install
`openhpc_packages`: additional OpenHPC packages to install.
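
For example (package names below are illustrative; any package available from the configured repositories can be listed):

```yaml
openhpc_packages:
  - lmod-ohpc              # illustrative package names only
  - slurm-libpmi-ohpc
```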

`openhpc_enable`:
* `control`: whether to enable control host
@@ -62,7 +73,7 @@ For each group (if used) or partition any nodes in an ansible inventory group `<

`openhpc_job_maxtime`: Maximum job time limit, default `'60-0'` (60 days). See the [slurm.conf](https://slurm.schedmd.com/slurm.conf.html) parameter `MaxTime` for the format. The value should be quoted to avoid Ansible type conversion, as shown below.
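
A minimal example, assuming a 3-day limit set in group or host variables:

```yaml
openhpc_job_maxtime: '3-0'   # 3 days; quoted so Ansible keeps it as a string
```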

`openhpc_cluster_name`: name of the cluster
`openhpc_cluster_name`: name of the cluster.

`openhpc_config`: Optional. Mapping of additional parameters and values for `slurm.conf`. Note these will override any included in `templates/slurm.conf.j2`.
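
For illustration, a minimal sketch overriding two standard `slurm.conf` parameters (the parameter names are regular Slurm options, not defined by this role):

```yaml
openhpc_config:
  SlurmctldDebug: debug      # any slurm.conf parameter can be given here
  PreemptType: preempt/partition_prio
```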

@@ -99,9 +110,9 @@ accounting data such as start and end times. By default no job accounting is con
`jobcomp/filetxt`, `jobcomp/none`, `jobcomp/elasticsearch`.

`openhpc_slurm_job_acct_gather_type`: Mechanism for collecting job accounting data. Can be one
of `jobacct_gather/linux`, `jobacct_gather/cgroup` and `jobacct_gather/none`
of `jobacct_gather/linux`, `jobacct_gather/cgroup` and `jobacct_gather/none`.

`openhpc_slurm_job_acct_gather_frequency`: Sampling period for job accounting (seconds)
`openhpc_slurm_job_acct_gather_frequency`: Sampling period for job accounting (seconds).

`openhpc_slurm_job_comp_loc`: Location to store the job accounting records. Depends on value of
`openhpc_slurm_job_comp_type`; e.g. for `jobcomp/filetxt` this is a path on disk.
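
As an illustration (values are assumptions, not taken from this change), basic text-file job completion logging together with Linux accounting gathering might look like:

```yaml
openhpc_slurm_job_comp_type: jobcomp/filetxt
openhpc_slurm_job_comp_loc: /var/log/slurm_jobcomp.log   # path on disk for jobcomp/filetxt
openhpc_slurm_job_acct_gather_type: jobacct_gather/linux
openhpc_slurm_job_acct_gather_frequency: 30              # sample every 30 seconds
```
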
@@ -111,15 +122,15 @@ accounting data such as start and end times. By default no job accounting is con
The following options affect `slurmdbd.conf`. Please see the slurm [documentation](https://slurm.schedmd.com/slurmdbd.conf.html) for more details.
You will need to configure these variables if you have set `openhpc_enable.database` to `true`.

`openhpc_slurmdbd_port`: Port for slurmdb to listen on, defaults to `6819`
`openhpc_slurmdbd_port`: Port for slurmdbd to listen on, defaults to `6819`.

`openhpc_slurmdbd_mysql_host`: Hostname or IP where MariaDB is running, defaults to `openhpc_slurm_control_host`.

`openhpc_slurmdbd_mysql_database`: Database to use for accounting, defaults to `slurm_acct_db`
`openhpc_slurmdbd_mysql_database`: Database to use for accounting, defaults to `slurm_acct_db`.

`openhpc_slurmdbd_mysql_password`: Password for authenticating with the database. You must set this variable.

`openhpc_slurmdbd_mysql_username`: Username for authenticating with the database, defaults to `slurm`
`openhpc_slurmdbd_mysql_username`: Username for authenticating with the database, defaults to `slurm`.
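
For illustration, a minimal sketch of the accounting database settings; the vaulted password variable and hostname are placeholders, not defined by this role:

```yaml
# Assumes openhpc_enable.database is set to true elsewhere in the inventory
openhpc_slurmdbd_mysql_host: "{{ openhpc_slurm_control_host }}"   # default shown for clarity
openhpc_slurmdbd_mysql_database: slurm_acct_db
openhpc_slurmdbd_mysql_username: slurm
openhpc_slurmdbd_mysql_password: "{{ vault_slurmdbd_password }}"  # placeholder; keep in Ansible Vault
openhpc_slurmdbd_port: 6819
```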

## Example Inventory

Expand Down
55 changes: 52 additions & 3 deletions defaults/main.yml
@@ -1,4 +1,5 @@
---
openhpc_version: "{{ '1.3' if ansible_distribution_major_version == '7' else '2' }}"
openhpc_slurm_service_enabled: true
openhpc_slurm_service_started: "{{ openhpc_slurm_service_enabled }}"
openhpc_slurm_service:
@@ -48,9 +49,57 @@ openhpc_enable:
ohpc_slurm_services:
control: slurmctld
batch: slurmd
ohpc_release_repos:
"7": "https://github.com/openhpc/ohpc/releases/download/v1.3.GA/ohpc-release-1.3-1.el7.x86_64.rpm" # ohpc v1.3 for Centos 7
"8": "http://repos.openhpc.community/OpenHPC/2/EL_8/x86_64/ohpc-release-2-1.el8.x86_64.rpm" # ohpc v2 for Rocky 8

# Repository configuration
openhpc_extra_repos: []

ohpc_openhpc_repos:
"7":
- name: OpenHPC
file: OpenHPC.repo
description: "OpenHPC-1.3 - Base"
baseurl: "http://build.openhpc.community/OpenHPC:/1.3/CentOS_7"
gpgcheck: true
gpgkey: https://raw.githubusercontent.com/openhpc/ohpc/v1.3.5.GA/components/admin/ohpc-release/SOURCES/RPM-GPG-KEY-OpenHPC-1
- name: OpenHPC-updates
file: OpenHPC.repo
description: "OpenHPC-1.3 - Updates"
baseurl: "http://build.openhpc.community/OpenHPC:/1.3/updates/CentOS_7"
gpgcheck: true
gpgkey: https://raw.githubusercontent.com/openhpc/ohpc/v1.3.5.GA/components/admin/ohpc-release/SOURCES/RPM-GPG-KEY-OpenHPC-1
"8":
- name: OpenHPC
file: OpenHPC.repo
description: OpenHPC-2 - Base
baseurl: "http://repos.openhpc.community/OpenHPC/2/CentOS_8"
gpgcheck: true
gpgkey: https://raw.githubusercontent.com/openhpc/ohpc/v2.6.1.GA/components/admin/ohpc-release/SOURCES/RPM-GPG-KEY-OpenHPC-2
- name: OpenHPC-updates
file: OpenHPC.repo
description: OpenHPC-2 - Updates
baseurl: "http://repos.openhpc.community/OpenHPC/2/updates/CentOS_8"
gpgcheck: true
gpgkey: https://raw.githubusercontent.com/openhpc/ohpc/v2.6.1.GA/components/admin/ohpc-release/SOURCES/RPM-GPG-KEY-OpenHPC-2

ohpc_default_extra_repos:
"7":
- name: epel
file: epel.repo
description: "Extra Packages for Enterprise Linux 7 - $basearch"
metalink: "https://mirrors.fedoraproject.org/metalink?repo=epel-7&arch=$basearch&infra=$infra&content=$contentdir"
gpgcheck: true
gpgkey: "https://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-7"
"8":
- name: epel
file: epel.repo
description: "Extra Packages for Enterprise Linux 8 - $basearch"
metalink: "https://mirrors.fedoraproject.org/metalink?repo=epel-8&arch=$basearch&infra=$infra&content=$contentdir"
gpgcheck: true
gpgkey: "https://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-8"

# Concatenate all repo definitions here
ohpc_repos: "{{ ohpc_openhpc_repos[ansible_distribution_major_version] + ohpc_default_extra_repos[ansible_distribution_major_version] + openhpc_extra_repos }}"

openhpc_munge_key:
openhpc_login_only_nodes: ''
openhpc_module_system_install: true
1 change: 1 addition & 0 deletions tasks/drain.yml
@@ -14,6 +14,7 @@
- name: Drain compute node
command: "scontrol update nodename={{ inventory_hostname }} state=DRAIN reason='maintenance'"
when: inventory_hostname not in drained_nodes_results.stdout_lines
changed_when: true

- name: Check node has drained
command: "sinfo --noheader --Node --format='%N' --states=DRAINED"
29 changes: 14 additions & 15 deletions tasks/install.yml
@@ -1,23 +1,22 @@
---

- name: Gather package facts
package_facts:
manager: rpm

- name: Install ohpc-release package
yum:
name: "{{ openhpc_release_repo | default(ohpc_release_repos[ansible_distribution_major_version]) }}"
state: present
disable_gpg_check: True
when: "'ohpc-release' not in ansible_facts.packages"

- name: Update package facts
package_facts:
manager: rpm
- name: Ensure OpenHPC repos
ansible.builtin.yum_repository:
name: "{{ item.name }}"
description: "{{ item.description | default(omit) }}"
file: "{{ item.file }}"
baseurl: "{{ item.baseurl | default(omit) }}"
metalink: "{{ item.metalink | default(omit) }}"
mirrorlist: "{{ item.mirrorlist | default(omit) }}"
gpgcheck: "{{ item.gpgcheck | default(omit) }}"
gpgkey: "{{ item.gpgkey | default(omit) }}"
loop: "{{ ohpc_repos }}"
loop_control:
label: "{{ item.name }}"

- name: Include variables for OpenHPC version
include_vars:
file: "ohpc-{{ ansible_facts.packages['ohpc-release'][0]['version'] }}"
file: "ohpc-{{ openhpc_version }}"

- name: Find PowerTools repo
find:
18 changes: 11 additions & 7 deletions tasks/main.yml
@@ -20,22 +20,26 @@

- name: Configure
block:
- include: runtime.yml
- include_tasks: runtime.yml
when: openhpc_enable.runtime | default(false) | bool
tags: configure

- include: post-configure.yml
- name: Run post-configure tasks
include_tasks: post-configure.yml
when:
- openhpc_enable.runtime | default(false) | bool
# Requires operational slurm cluster
- openhpc_slurm_service_started | bool
tags: post-configure

- include: drain.yml
when: openhpc_enable.drain | default(false) | bool
delegate_to: "{{ openhpc_slurm_control_host }}"
- name: Run drain or resume tasks
block:
- name: Run drain tasks
include_tasks: drain.yml
when: openhpc_enable.drain | default(false) | bool

- include: resume.yml
when: openhpc_enable.resume | default(false) | bool
- name: Run resume tasks
include_tasks: resume.yml
when: openhpc_enable.resume | default(false) | bool
delegate_to: "{{ openhpc_slurm_control_host }}"
...
1 change: 1 addition & 0 deletions tasks/resume.yml
@@ -14,6 +14,7 @@
- name: Resume compute node
command: "scontrol update nodename={{ inventory_hostname }} state=RESUME"
when: inventory_hostname not in resumed_nodes_results.stdout_lines
changed_when: true

- name: Check node has resumed
command: "sinfo --noheader --Node --format='%N' --states=ALLOC,IDLE"