* `gpgcheck`: Optional
* `gpgkey`: Optional

`openhpc_slurm_service_enabled`: Optional boolean. Whether to enable the appropriate slurm service (slurmd/slurmctld).

`openhpc_slurm_service_started`: Optional boolean. Whether to start slurm services. If set to false, all services will be stopped. Defaults to `openhpc_slurm_service_enabled`.
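
For example, a minimal sketch of stopping Slurm services across a cluster without disabling them (the values here are illustrative):

```yaml
openhpc_slurm_service_enabled: true
openhpc_slurm_service_started: false  # stops all Slurm services on the targeted hosts
```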

`openhpc_packages`: additional OpenHPC packages to install.

`openhpc_enable`:
* `control`: whether host should run slurmctld
* `database`: whether host should run slurmdbd
* `batch`: whether host should run slurmd
* `runtime`: whether to enable OpenHPC runtime

`openhpc_slurmdbd_host`: Optional. Where to deploy slurmdbd if you are using this role to deploy slurmdbd, otherwise where an existing slurmdbd is running. This should be the name of a host in your inventory. Set this to `none` to prevent the role from managing slurmdbd. Defaults to `openhpc_slurm_control_host`.

Note slurm's ["configless" mode](https://slurm.schedmd.com/configless_slurm.html) is always used.

`openhpc_munge_key`: Required. Define a munge key to use.

`openhpc_login_only_nodes`: Optional. The name of an ansible inventory group containing nodes which are login nodes (i.e. not also control nodes). These nodes must have `openhpc_enable.batch: true` and will run `slurmd` to contact the control node for config.
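
For example, with the inventory shown in the Example section below this might simply be (a sketch, assuming login nodes are in the `cluster_login` group):

```yaml
openhpc_login_only_nodes: cluster_login
```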

`openhpc_module_system_install`: Optional, default true. Whether or not to install an environment module system. If true, lmod will be installed. If false, you can either supply your own module system or go without one.

### slurm.conf

`openhpc_cluster_name`: Required, name of the cluster.

`openhpc_slurm_partitions`: Optional. List of one or more slurm partitions, default `[]` (a sketch follows the notes below). Each partition may contain the following values:
* `groups`: If there are multiple node groups that make up the partition, a list of group objects can be defined here. Otherwise, `groups` can be omitted and the following attributes can be defined in the partition object:

  Note [GresTypes](https://slurm.schedmd.com/slurm.conf.html#OPT_GresTypes) must be set in `openhpc_slurm_conf_overrides` if this is used.

* `default`: Optional. Whether this partition is the default; valid settings are `YES` and `NO`.
* `maxtime`: Optional. A partition-specific time limit following the format of [slurm.conf](https://slurm.schedmd.com/slurm.conf.html) parameter `MaxTime`. The default value is given by `openhpc_job_maxtime`. The value should be quoted to avoid Ansible conversions.
* `partition_params`: Optional. Mapping of additional parameters and values for [partition configuration](https://slurm.schedmd.com/slurm.conf.html#SECTION_PARTITION-CONFIGURATION).

For each group (if used) or partition any nodes in an ansible inventory group `<cluster_name>_<group_name>` are added to the group/partition. Note that:
- Nodes in a group are assumed to be homogeneous in terms of processor and memory.
- An inventory group may be empty or missing, but if it is not then the play must contain at least one node from it (used to set processor information).
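
A minimal sketch of a single-partition definition (the names and values here are illustrative):

```yaml
openhpc_slurm_partitions:
  - name: compute
    default: YES
    maxtime: '3-0'  # 3 days; quoted to avoid Ansible type conversion
    partition_params:
      AllowGroups: hpcusers  # hypothetical extra slurm.conf partition parameter
```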

`openhpc_job_maxtime`: Maximum job time limit, default `'60-0'` (60 days). See [slurm.conf](https://slurm.schedmd.com/slurm.conf.html) parameter `MaxTime` for format. The value should be quoted to avoid Ansible conversions.

`openhpc_ram_multiplier`: Optional, default `0.95`. Multiplier used in the calculation: `total_memory * openhpc_ram_multiplier` when setting `RealMemory` for the partition in slurm.conf. Can be overridden on a per-partition basis using `openhpc_slurm_partitions.ram_multiplier`. Has no effect if `openhpc_slurm_partitions.ram_mb` is set.
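
For example, with the default multiplier a node reporting 96000 MB of total memory is configured with `RealMemory=91200`.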

`openhpc_slurm_conf_default`: Optional. Multiline string giving default key=value parameters for `slurm.conf`. This may include jinja templating. See [defaults/main.yml](defaults/main.yml) for details. Values are only included here if either a) this role sets them to non-default values or b) they are parameterised from other role variables. Note any values here may be overridden using `openhpc_slurm_conf_overrides`.

`openhpc_slurm_conf_overrides`: Optional. Multiline string giving key=value parameters for `slurm.conf` to override those from `openhpc_slurm_conf_default`. This may include jinja templating. Note keys must be unique, so this cannot be used to add e.g. additional `NodeName=...` entries. TODO: Fix this via an additional var.
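
For example, a sketch increasing daemon logging via the standard `slurm.conf` parameters `SlurmctldDebug` and `SlurmdDebug`:

```yaml
openhpc_slurm_conf_overrides: |
  SlurmctldDebug=debug
  SlurmdDebug=debug
```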

`openhpc_slurm_conf_template`: Optional. Name/path of template for `slurm.conf`. The default template uses the relevant role variables and this should not usually need changing.

`openhpc_state_save_location`: Optional. Absolute path for Slurm controller state (`slurm.conf` parameter [StateSaveLocation](https://slurm.schedmd.com/slurm.conf.html#OPT_StateSaveLocation)).

#### Accounting

By default, no accounting storage is configured. To enable accounting:
* Configure a MariaDB or MySQL server as described in the slurm accounting [documentation](https://slurm.schedmd.com/accounting.html) on one of the nodes in your inventory and set `openhpc_enable.database` to `true` for this node.
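
For example, a minimal sketch of the variables for the database node, assuming it is also the Slurm control host (so the default `openhpc_slurmdbd_host` applies):

```yaml
openhpc_enable:
  control: true
  database: true
  runtime: true
```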

## Example

With this Ansible inventory:

```ini
[cluster_control]
control-0

[cluster_login]
login-0

[cluster_compute]
compute-0
compute-1
```

The following playbook deploys control, login and compute nodes with a customised `slurm.conf` adding debug logging. Everything from `openhpc_cluster_name` down, including the way the role is applied, is a sketch to adapt to your deployment:

```yaml
- hosts:
    - cluster_login
    - cluster_control
    - cluster_compute
  become: yes
  vars:
    openhpc_enable:
      control: "{{ inventory_hostname in groups['cluster_control'] }}"
      batch: "{{ inventory_hostname in groups['cluster_compute'] + groups['cluster_login'] }}"
      runtime: true
    openhpc_slurm_control_host: "{{ groups['cluster_control'] | first }}"
    # -- sketch from here on --
    openhpc_cluster_name: openhpc
    openhpc_munge_key: "{{ vault_munge_key }}"  # hypothetical variable holding your munge key
    openhpc_slurm_partitions:
      - name: compute
    openhpc_slurm_conf_overrides: |
      SlurmctldDebug=debug
      SlurmdDebug=debug
  roles:
    - role: stackhpc.openhpc  # assumes the role is installed under this name
```

For reference, the start of this role's default for `openhpc_slurm_conf_default` (see [defaults/main.yml](defaults/main.yml)) looks like:

```yaml
# only include non-default (as constant) or templated values (b/c another part of the role needs it)
openhpc_slurm_conf_default: |
  ClusterName={{ openhpc_cluster_name }}
  SlurmctldHost={{ openhpc_slurm_control_host }}{% if openhpc_slurm_control_host_address is defined %}({{ openhpc_slurm_control_host_address }}){% endif %}
```