Commit ba8a38a

nodegroups using nodesets - doesn't handle empty nodegroups

1 parent e5897b7 commit ba8a38a
File tree

3 files changed: +71 −56 lines changed

README.md

Lines changed: 33 additions & 19 deletions
```diff
@@ -50,30 +50,44 @@ each list element:
 
 ### slurm.conf
 
-`openhpc_slurm_partitions`: Optional. List of one or more slurm partitions, default `[]`. Each partition may contain the following values:
-* `groups`: If there are multiple node groups that make up the partition, a list of group objects can be defined here.
-  Otherwise, `groups` can be omitted and the following attributes can be defined in the partition object:
-  * `name`: The name of the nodes within this group.
-  * `cluster_name`: Optional. An override for the top-level definition `openhpc_cluster_name`.
-  * `extra_nodes`: Optional. A list of additional node definitions, e.g. for nodes in this group/partition not controlled by this role. Each item should be a dict, with keys/values as per the ["NODE CONFIGURATION"](https://slurm.schedmd.com/slurm.conf.html#lbAE) docs for slurm.conf. Note the key `NodeName` must be first.
-  * `ram_mb`: Optional. The physical RAM available in each node of this group ([slurm.conf](https://slurm.schedmd.com/slurm.conf.html) parameter `RealMemory`) in MiB. This is set using ansible facts if not defined, equivalent to `free --mebi` total * `openhpc_ram_multiplier`.
-  * `ram_multiplier`: Optional. An override for the top-level definition `openhpc_ram_multiplier`. Has no effect if `ram_mb` is set.
+`openhpc_nodegroups`: Optional, default `[]`. List of mappings, each defining a
+unique set of homogeneous nodes:
+* `name`: Required. Name of node group.
+* `ram_mb`: Optional. The physical RAM available in each node of this group
+  ([slurm.conf](https://slurm.schedmd.com/slurm.conf.html) parameter `RealMemory`)
+  in MiB. This is set using ansible facts if not defined, equivalent to
+  `free --mebi` total * `openhpc_ram_multiplier`.
+* `ram_multiplier`: Optional. An override for the top-level definition
+  `openhpc_ram_multiplier`. Has no effect if `ram_mb` is set.
 * `gres`: Optional. List of dicts defining [generic resources](https://slurm.schedmd.com/gres.html). Each dict must define:
   - `conf`: A string with the [resource specification](https://slurm.schedmd.com/slurm.conf.html#OPT_Gres_1) but requiring the format `<name>:<type>:<number>`, e.g. `gpu:A100:2`. Note the `type` is an arbitrary string.
   - `file`: A string with the [File](https://slurm.schedmd.com/gres.conf.html#OPT_File) (path to device(s)) for this resource, e.g. `/dev/nvidia[0-1]` for the above example.
-
   Note [GresTypes](https://slurm.schedmd.com/slurm.conf.html#OPT_GresTypes) must be set in `openhpc_config` if this is used.
-
-* `default`: Optional. A boolean flag for whether this partion is the default. Valid settings are `YES` and `NO`.
-* `maxtime`: Optional. A partition-specific time limit following the format of [slurm.conf](https://slurm.schedmd.com/slurm.conf.html) parameter `MaxTime`. The default value is
+* `params`: Optional. Mapping of additional parameters and values for
+  [node configuration](https://slurm.schedmd.com/slurm.conf.html#lbAE).
+
+Each nodegroup will contain hosts from an Ansible inventory group named
+`{{ openhpc_cluster_name }}_{{ group_name }}`. Note that:
+- Each host may only appear in one nodegroup.
+- Hosts in a nodegroup are assumed to be homogeneous in terms of processor and memory.
+- Hosts may have arbitrary hostnames, but these should be lowercase to avoid a
+  mismatch between inventory and actual hostname.
+- An inventory group may be missing or empty, in which case the node group
+  contains no hosts.
+- If the inventory group is not empty the play must contain at least one host.
+  This is used to set `Sockets`, `CoresPerSocket`, `ThreadsPerCore` and
+  optionally `RealMemory` for the nodegroup.
+
+`openhpc_partitions`: Optional, default `[]`. List of mappings, each defining a
+partition. Each partition mapping may contain:
+* `name`: Required. Name of partition.
+* `groups`: Optional. List of nodegroup names. If omitted, the partition name
+  is assumed to match a nodegroup name.
+* `default`: Optional. A boolean flag for whether this partition is the default. Valid settings are `YES` and `NO`.
+* `maxtime`: Optional. A partition-specific time limit following the format of [slurm.conf](https://slurm.schedmd.com/slurm.conf.html) parameter `MaxTime`. The default value is
 given by `openhpc_job_maxtime`. The value should be quoted to avoid Ansible conversions.
-* `partition_params`: Optional. Mapping of additional parameters and values for [partition configuration](https://slurm.schedmd.com/slurm.conf.html#SECTION_PARTITION-CONFIGURATION).
-
-For each group (if used) or partition any nodes in an ansible inventory group `<cluster_name>_<group_name>` will be added to the group/partition. Note that:
-- Nodes may have arbitrary hostnames but these should be lowercase to avoid a mismatch between inventory and actual hostname.
-- Nodes in a group are assumed to be homogenous in terms of processor and memory.
-- An inventory group may be empty or missing, but if it is not then the play must contain at least one node from it (used to set processor information).
-
+* `params`: Optional. Mapping of additional parameters and values for
+  [partition configuration](https://slurm.schedmd.com/slurm.conf.html#SECTION_PARTITION-CONFIGURATION).
 
 `openhpc_job_maxtime`: Maximum job time limit, default `'60-0'` (60 days). See [slurm.conf](https://slurm.schedmd.com/slurm.conf.html) parameter `MaxTime` for format. The default is 60 days. The value should be quoted to avoid Ansible conversions.
 
```
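To make the new nodegroup/partition split concrete, a sketch of how the two variables might be set together — all names, sizes and values below are invented for illustration and are not taken from this commit:

```yaml
# Hypothetical group_vars sketch for the new variables (all values illustrative).
openhpc_cluster_name: mycluster

# Nodegroups: hosts are drawn from inventory groups mycluster_general / mycluster_gpu.
openhpc_nodegroups:
  - name: general
  - name: gpu
    gres:
      - conf: gpu:A100:2        # format <name>:<type>:<number>
        file: /dev/nvidia[0-1]  # device path(s) for this resource

# Partitions reference nodegroups by name; `groups` defaults to the partition name.
openhpc_partitions:
  - name: general
  - name: gpu
    groups:
      - gpu
    maxtime: '2-0'              # quoted to avoid Ansible type conversion
    params:
      PreemptMode: 'OFF'
```

As noted above, `GresTypes` must also be set in `openhpc_config` when `gres` is used.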
defaults/main.yml

Lines changed: 2 additions & 1 deletion
```diff
@@ -4,7 +4,8 @@ openhpc_slurm_service_started: "{{ openhpc_slurm_service_enabled }}"
 openhpc_slurm_service:
 openhpc_slurm_control_host: "{{ inventory_hostname }}"
 #openhpc_slurm_control_host_address:
-openhpc_slurm_partitions: []
+openhpc_partitions: []
+openhpc_nodegroups: []
 openhpc_cluster_name:
 openhpc_packages:
 - slurm-libpmi-ohpc
```

templates/slurm.conf.j2

Lines changed: 36 additions & 36 deletions
```diff
@@ -135,55 +135,55 @@ SlurmdSyslogDebug=info
 #SlurmSchedLogFile=
 #SlurmSchedLogLevel=
 #DebugFlags=
-#
-#
-# POWER SAVE SUPPORT FOR IDLE NODES - NOT SUPPORTED IN THIS APPLIANCE VERSION
 
 # LOGIN-ONLY NODES
 # Define slurmd nodes not in partitions for login-only nodes in "configless" mode:
 {%if openhpc_login_only_nodes %}{% for node in groups[openhpc_login_only_nodes] %}
 NodeName={{ node }}
 {% endfor %}{% endif %}
 
-# COMPUTE NODES
-# OpenHPC default configuration
 PropagateResourceLimitsExcept=MEMLOCK
 Epilog=/etc/slurm/slurm.epilog.clean
-{% set donehosts = [] %}
-{% for part in openhpc_slurm_partitions %}
-{% set nodelist = [] %}
-{% for group in part.get('groups', [part]) %}
-{% set group_name = group.cluster_name|default(openhpc_cluster_name) ~ '_' ~ group.name %}
-# openhpc_slurm_partitions group: {{ group_name }}
-{% set inventory_group_hosts = groups.get(group_name, []) %}
-{% if inventory_group_hosts | length > 0 %}
-{% set play_group_hosts = inventory_group_hosts | intersect (play_hosts) %}
-{% set first_host = play_group_hosts | first | mandatory('Group "' ~ group_name ~ '" contains no hosts in this play - was --limit used?') %}
-{% set first_host_hv = hostvars[first_host] %}
-{% set ram_mb = (first_host_hv['ansible_memory_mb']['real']['total'] * (group.ram_multiplier | default(openhpc_ram_multiplier))) | int %}
-{% for hostlist in (inventory_group_hosts | hostlist_expression) %}
-{% set gres = ' Gres=%s' % (','.join(group.gres | map(attribute='conf') )) if 'gres' in group else '' %}
-{% if hostlist not in donehosts %}
-NodeName={{ hostlist }} State=UNKNOWN RealMemory={{ group.get('ram_mb', ram_mb) }} Sockets={{first_host_hv['ansible_processor_count']}} CoresPerSocket={{ first_host_hv['ansible_processor_cores'] }} ThreadsPerCore={{ first_host_hv['ansible_processor_threads_per_core'] }}{{ gres }}
-{% endif %}
-{% set _ = nodelist.append(hostlist) %}
-{% set _ = donehosts.append(hostlist) %}
-{% endfor %}{# nodes #}
-{% endif %}{# inventory_group_hosts #}
-{% for extra_node_defn in group.get('extra_nodes', []) %}
-{{ extra_node_defn.items() | map('join', '=') | join(' ') }}
-{% set _ = nodelist.append(extra_node_defn['NodeName']) %}
-{% endfor %}
-{% endfor %}{# group #}
-{% if not nodelist %}{# empty partition #}
-{% set nodelist = ['""'] %}
-{% endif %}
-PartitionName={{part.name}} Default={{ part.get('default', 'YES') }} MaxTime={{ part.get('maxtime', openhpc_job_maxtime) }} State=UP Nodes={{ nodelist | join(',') }} {{ part.partition_params | default({}) | dict2parameters }}
-{% endfor %}{# partitions #}
+
+# COMPUTE NODES
+# OpenHPC default configuration
+{% for nodegroup in openhpc_nodegroups %}
+{% set inventory_group_name = openhpc_cluster_name ~ '_' ~ nodegroup.name %}
+{% set inventory_group_hosts = groups.get(inventory_group_name, []) %}
+{% if inventory_group_hosts | length > 0 %}
+{% set play_group_hosts = inventory_group_hosts | intersect (play_hosts) %}
+{% set first_host = play_group_hosts | first | mandatory('Inventory group "' ~ inventory_group_name ~ '" contains no hosts in this play - was --limit used?') %}
+{% set first_host_hv = hostvars[first_host] %}
+{% set ram_mb = (first_host_hv['ansible_memory_mb']['real']['total'] * (nodegroup.ram_multiplier | default(openhpc_ram_multiplier))) | int %}
+{% set hostlists = (inventory_group_hosts | hostlist_expression) %}{# hosts in inventory group aren't necessarily a single hostlist expression #}
+{% for hostlist in hostlists %}
+NodeName={{ hostlist }} {{ '' -}}
+State=UNKNOWN {{ '' -}}
+RealMemory={{ nodegroup.ram_mb | default(ram_mb) }} {{ '' -}}
+Sockets={{first_host_hv['ansible_processor_count'] }} {{ '' -}}
+CoresPerSocket={{ first_host_hv['ansible_processor_cores'] }} {{ '' -}}
+ThreadsPerCore={{ first_host_hv['ansible_processor_threads_per_core'] }} {{ '' -}}
+{{ nodegroup.params | default({}) | dict2parameters }} {{ '' -}}
+{% if 'gres' in nodegroup %}Gres={{ ','.join(nodegroup.gres | map(attribute='conf')) }}{% endif %}
+{% endfor %}{# hostlists #}
+
+NodeSet={{ nodegroup.name }} Nodes={{ ','.join(hostlists) }}{# no support for creating nodesets by Feature #}
+{% endif %}{# 1 or more hosts in inventory #}
+{% endfor %}
 
 # Define a non-existent node, in no partition, so that slurmctld starts even with all partitions empty
 NodeName=nonesuch
 
+# PARTITIONS
+{% for partition in openhpc_partitions %}
+PartitionName={{partition.name}} {{ '' -}}
+Default={{ partition.get('default', 'YES') }} {{ '' -}}
+MaxTime={{ partition.get('maxtime', openhpc_job_maxtime) }} {{ '' -}}
+State=UP Nodes={{ partition.get('groups', [partition.name]) | join(',') }} {{ '' -}}
+{{ partition.params | default({}) | dict2parameters }}
+{% endfor %}{# openhpc_partitions %}
 
 {% if openhpc_slurm_configless | bool %}SlurmctldParameters=enable_configless{% endif %}
 
+
 ReturnToService=2
```
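As a sanity check of the new template flow, suppose a single nodegroup `general` whose inventory group `mycluster_general` contains hosts `mycluster-compute-0` and `mycluster-compute-1`, each with 2 sockets, 8 cores per socket and 2 threads per core (all figures invented), with `RealMemory` computed from Ansible facts. The rendered slurm.conf fragment would look roughly like:

```
# COMPUTE NODES
# OpenHPC default configuration
NodeName=mycluster-compute-[0-1] State=UNKNOWN RealMemory=... Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
NodeSet=general Nodes=mycluster-compute-[0-1]

# Define a non-existent node, in no partition, so that slurmctld starts even with all partitions empty
NodeName=nonesuch

# PARTITIONS
PartitionName=general Default=YES MaxTime=60-0 State=UP Nodes=general
```

Note the limitation flagged in the commit message: for an empty inventory group the `{% if %}` skips the `NodeSet=` line entirely, so any partition referencing that nodegroup would point at an undefined nodeset.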

0 commit comments