Commit 9541669

Make home volume creation optional (#673)
* optional home vol using vol size
* use home_volume_provisioning
* automatically modify nfs configuration depending on home volume
* remove dead tf code
* fix prod docs
* make state volume provisioning optional
* address review comments
1 parent ad2cba8 commit 9541669

7 files changed

+163
-18
lines changed

docs/production.md

Lines changed: 44 additions & 12 deletions
Original file line number | Diff line number | Diff line change
@@ -58,7 +58,7 @@ and referenced from the `site` and `production` environments, e.g.:
5858

5959
Note that:
6060
- Environment-specific variables (`cluster_name`) should be hardcoded
61-
into the module block.
61+
into the cluster module block.
6262
- Environment-independent variables (e.g. maybe `cluster_net` if the
6363
same is used for staging and production) should be set as *defaults*
6464
in `environments/site/tofu/variables.tf`, and then don't need to
@@ -76,17 +76,49 @@ and referenced from the `site` and `production` environments, e.g.:
7676
instances) it may be necessary to configure or proxy `chronyd` via an
7777
environment hook.
7878
79-
- The cookiecutter provided OpenTofu configurations define resources for home and
80-
state volumes. The former may not be required if the cluster's `/home` is
81-
provided from an external filesystem (or Manila). In any case, in at least
82-
the production environment, and probably also in the staging environment,
83-
the volumes should be manually created and the resources changed to [data
84-
resources](https://opentofu.org/docs/language/data-sources/). This ensures that even if the cluster is deleted via tofu, the
85-
volumes will persist.
86-
87-
For a development environment, having volumes under tofu control via volume
88-
resources is usually appropriate as there may be many instantiations
89-
of this environment.
79+
- By default, the cookiecutter-provided OpenTofu configuration provisions two
80+
volumes and attaches them to the control node:
81+
- "$cluster_name-home" for NFS-shared home directories
82+
- "$cluster_name-state" for monitoring and Slurm data
83+
The volumes mean this data is persisted when the control node is rebuilt.
84+
However if the cluster is destroyed with `tofu destroy` then the volumes will
85+
also be deleted. This is undesirable for production environments and usually
86+
also for staging environments. Therefore the volumes should be manually
87+
created, e.g. via the CLI:
88+
89+
openstack volume create --size 200 mycluster-home # size in GB
90+
openstack volume create --size 100 mycluster-state
91+
92+
and OpenTofu configured to use those volumes instead of managing them itself
93+
by setting:
94+
95+
home_volume_provisioning = "attach"
96+
state_volume_provisioning = "attach"
97+
98+
either for a specific environment within the cluster module block in
99+
`environments/$ENV/tofu/main.tf`, or as the site default by changing the
100+
default in `environments/site/tofu/variables.tf`.
101+
102+
For a development environment allowing OpenTofu to manage the volumes using
103+
the default value of `"manage"` for those variables is usually appropriate, as
104+
it allows for multiple clusters to be created with this environment.
105+
106+
If no home volume at all is required because the home directories are provided
107+
by a parallel filesystem (e.g. Manila), set
108+
109+
home_volume_provisioning = "none"
110+
111+
In this case the NFS share for home directories is automatically disabled.
112+
113+
**NB:** To apply "attach" options to existing clusters, first remove the
114+
volume(s) from the tofu state, e.g.:
115+
116+
tofu state list # find the volume(s)
117+
tofu state rm 'module.cluster.openstack_blockstorage_volume_v3.state[0]'
118+
119+
This leaves the volume itself intact, but means OpenTofu "forgets" it. Then
120+
set the "attach" options and run `tofu apply` again - this should show there
121+
are no changes planned.
90122
91123
- Enable `etc_hosts` templating:
92124

environments/common/inventory/group_vars/all/nfs.yml

Lines changed: 13 additions & 1 deletion
@@ -11,7 +11,7 @@ _nfs_node_ips: "{{ groups['nfs'] | map('extract', hostvars, 'ansible_host') | jo
1111
# default *all* entries in nfs_configurations to only permitting mounts from above IPs:
1212
nfs_export_clients: "{{ _nfs_node_ips }}"
1313

14-
nfs_configurations:
14+
nfs_configuration_home_volume: # volume-backed home directories
1515
- comment: Export /exports/home from Slurm control node as /home
1616
nfs_enable:
1717
server: "{{ inventory_hostname in groups['control'] }}"
@@ -25,8 +25,20 @@ nfs_configurations:
2525
# accidentally overridden via default options
2626
nfs_export_options: 'rw,secure,root_squash'
2727

28+
nfs_configuration_compute_nodes: # cluster configuration for compute_init/slurm-controlled rebuild
2829
- comment: Export /exports/cluster from Slurm control node
2930
nfs_enable:
3031
server: "{{ inventory_hostname in groups['control'] }}"
3132
clients: false
3233
nfs_export: "/exports/cluster"
34+
35+
nfs_configurations_extra: [] # site-specific nfs shares
36+
37+
nfs_configurations: >- # construct stackhpc.nfs variable
38+
{{
39+
(nfs_configuration_home_volume if (cluster_home_volume | default(true)) else [])
40+
+
41+
nfs_configuration_compute_nodes
42+
+
43+
nfs_configurations_extra
44+
}}
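
The `nfs_configurations` expression above combines a conditional with list concatenation. In Jinja2, as in Python, a conditional expression binds more loosely than `+`, so the grouping of the conditional decides which lists end up in the result. The following is a minimal sketch in plain Python rather than Jinja2; the list contents and function names are hypothetical stand-ins for the variables in `nfs.yml`.

```python
# Hypothetical stand-ins for the three lists defined in nfs.yml.
home = ["home-export"]        # nfs_configuration_home_volume
compute = ["cluster-export"]  # nfs_configuration_compute_nodes
extra = ["site-export"]       # nfs_configurations_extra


def unparenthesised(cluster_home_volume: bool) -> list:
    # Parsed as: home if cond else ([] + compute + extra)
    # so the concatenation only happens in the "else" branch.
    return home if cluster_home_volume else [] + compute + extra


def parenthesised(cluster_home_volume: bool) -> list:
    # The conditional only selects the first operand; the other
    # two lists are always appended.
    return (home if cluster_home_volume else []) + compute + extra


for flag in (True, False):
    print(flag, unparenthesised(flag), parenthesised(flag))
```

With the flag set to true, the ungrouped form yields only the home export, while the grouped form also keeps the cluster and site-specific exports. With the flag set to false, both forms give the same result.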

environments/skeleton/{{cookiecutter.environment}}/tofu/control.tf

Lines changed: 6 additions & 2 deletions
@@ -1,5 +1,9 @@
11
locals {
2-
control_volumes = concat([openstack_blockstorage_volume_v3.state], var.home_volume_size > 0 ? [openstack_blockstorage_volume_v3.home][0] : [])
2+
control_volumes = concat(
3+
# convert maps to lists with zero or one entries:
4+
[for v in data.openstack_blockstorage_volume_v3.state: v],
5+
[for v in data.openstack_blockstorage_volume_v3.home: v]
6+
)
37
nodename = templatestring(
48
var.cluster_nodename_template,
59
{
@@ -83,7 +87,7 @@ resource "openstack_compute_instance_v2" "control" {
8387
8488
mounts:
8589
- [LABEL=state, ${var.state_dir}]
86-
%{if var.home_volume_size > 0}
90+
%{if var.home_volume_provisioning != "none"}
8791
- [LABEL=home, /exports/home]
8892
%{endif}
8993
EOF

environments/skeleton/{{cookiecutter.environment}}/tofu/inventory.tf

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ resource "local_file" "hosts" {
77
"login_groups": module.login
88
"compute_groups": module.compute
99
"state_dir": var.state_dir
10+
"cluster_home_volume": var.home_volume_provisioning != "none"
1011
},
1112
)
1213
filename = "../inventory/hosts.yml"

environments/skeleton/{{cookiecutter.environment}}/tofu/inventory.tpl

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@ all:
22
vars:
33
openhpc_cluster_name: ${cluster_name}
44
cluster_domain_suffix: ${cluster_domain_suffix}
5+
cluster_home_volume: ${cluster_home_volume}
56
cluster_compute_groups: ${jsonencode(keys(compute_groups))}
67

78
control:

environments/skeleton/{{cookiecutter.environment}}/tofu/variables.tf

Lines changed: 54 additions & 2 deletions
@@ -125,10 +125,38 @@ variable "state_volume_type" {
125125
default = null
126126
}
127127

128+
variable "state_volume_provisioning" {
129+
type = string
130+
default = "manage"
131+
description = <<-EOT
132+
How to manage the state volume. Valid values are:
133+
"manage": (Default) OpenTofu will create a volume "$cluster_name-state"
134+
and delete it when the cluster is destroyed. A volume
135+
with this name must not already exist. Use for demo and
136+
dev environments.
137+
"attach": A single volume named "$cluster_name-state" must already
138+
exist. It is not managed by OpenTofu so e.g. is left
139+
intact if the cluster is destroyed. Use for production
140+
environments.
141+
EOT
142+
validation {
143+
condition = contains(["manage", "attach"], var.state_volume_provisioning)
144+
error_message = <<-EOT
145+
state_volume_provisioning must be "manage" or "attach"
146+
EOT
147+
}
148+
}
149+
128150
variable "home_volume_size" {
129151
type = number
130-
description = "Size of state volume on control node, in GB"
131-
default = 100 # GB, 0 means no home volume
152+
description = "Size of home volume on control node, in GB."
153+
default = 100
154+
validation {
155+
condition = var.home_volume_provisioning == "manage" ? var.home_volume_size > 0 : true
156+
error_message = <<-EOT
157+
home_volume_size must be > 0 when var.home_volume_provisioning == "manage"
158+
EOT
159+
}
132160
}
133161

134162
variable "home_volume_type" {
@@ -137,6 +165,30 @@ variable "home_volume_type" {
137165
description = "Type of home volume, if not default type"
138166
}
139167

168+
variable "home_volume_provisioning" {
169+
type = string
170+
default = "manage"
171+
description = <<-EOT
172+
How to manage the home volume. Valid values are:
173+
"manage": (Default) OpenTofu will create a volume "$cluster_name-home"
174+
and delete it when the cluster is destroyed. A volume
175+
with this name must not already exist. Use for demo and
176+
dev environments.
177+
"attach": A single volume named "$cluster_name-home" must already
178+
exist. It is not managed by OpenTofu so e.g. is left
179+
intact if the cluster is destroyed. Use for production
180+
environments.
181+
"none": No home volume is used. Use if /home is provided by
182+
a parallel filesystem, e.g. manila.
183+
EOT
184+
validation {
185+
condition = contains(["manage", "attach", "none"], var.home_volume_provisioning)
186+
error_message = <<-EOT
187+
home_volume_provisioning must be one of "manage", "attach" or "none"
188+
EOT
189+
}
190+
}
191+
140192
variable "vnic_types" {
141193
type = map(string)
142194
description = <<-EOT
Lines changed: 44 additions & 1 deletion
@@ -1,16 +1,59 @@
11
resource "openstack_blockstorage_volume_v3" "state" {
2+
3+
# NB: Changes to this resource's "address" i.e. (label or for_each key)
4+
# may lose state data for existing clusters using this volume
5+
6+
count = var.state_volume_provisioning == "manage" ? 1 : 0
7+
28
name = "${var.cluster_name}-state" # last word used to label filesystem
39
description = "State for control node"
410
size = var.state_volume_size
511
volume_type = var.state_volume_type
612
}
713

14+
data "openstack_blockstorage_volume_v3" "state" {
15+
16+
/* We use a data resource whether or not TF is managing the volume, so the
17+
logic is all in one place. But that means this needs a dependency on the
18+
actual resource to avoid a race.
19+
20+
Because there may be no volume, this has to use for_each.
21+
*/
22+
23+
for_each = toset(
24+
(var.state_volume_provisioning == "manage") ?
25+
[for v in openstack_blockstorage_volume_v3.state: v.name] :
26+
["${var.cluster_name}-state"]
27+
)
28+
29+
name = each.key
30+
31+
}
32+
833
resource "openstack_blockstorage_volume_v3" "home" {
934

10-
count = var.home_volume_size > 0 ? 1 : 0
35+
# NB: Changes to this resource's "address" i.e. (label or for_each key)
36+
# may lose user data for existing clusters using this volume
37+
38+
count = var.home_volume_provisioning == "manage" ? 1 : 0
1139

1240
name = "${var.cluster_name}-home" # last word used to label filesystem
1341
description = "Home for control node"
1442
size = var.home_volume_size
1543
volume_type = var.home_volume_type
1644
}
45+
46+
data "openstack_blockstorage_volume_v3" "home" {
47+
48+
/* Comments as for the state volume. */
49+
50+
for_each = toset(
51+
(var.home_volume_provisioning == "manage") ?
52+
[for v in openstack_blockstorage_volume_v3.home: v.name] :
53+
(var.home_volume_provisioning == "attach") ?
54+
["${var.cluster_name}-home"] :
55+
[]
56+
)
57+
58+
name = each.key
59+
}
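
The way the `*_volume_provisioning` values drive the `count` on the managed volume resource and the `for_each` on the data source can be summarised as a small truth table. The following Python sketch only models that logic as read from the diff above; `plan_volume` and its return keys are hypothetical names, not part of the appliance or of OpenTofu.

```python
# Allowed provisioning modes per volume, mirroring the variable validations.
ALLOWED = {
    "home": ("manage", "attach", "none"),
    "state": ("manage", "attach"),
}


def plan_volume(cluster_name: str, kind: str, provisioning: str) -> dict:
    """Model which volume objects the configuration would declare."""
    if provisioning not in ALLOWED[kind]:
        raise ValueError(
            f"{kind}_volume_provisioning must be one of {ALLOWED[kind]}"
        )
    name = f"{cluster_name}-{kind}"
    # The managed resource uses count = 1 only in "manage" mode.
    resource_count = 1 if provisioning == "manage" else 0
    # The data source looks the volume up by name in "manage" and
    # "attach" modes, and has no instances in "none" mode.
    lookup_names = [] if provisioning == "none" else [name]
    return {
        "resource_count": resource_count,
        "data_lookup_names": lookup_names,
        # The control node attaches whatever the data source found.
        "attached": bool(lookup_names),
    }


for mode in ALLOWED["home"]:
    print(mode, plan_volume("mycluster", "home", mode))
```

In this model, `"manage"` and `"attach"` both end up attaching a volume named `mycluster-home`; they differ only in whether OpenTofu owns the volume, and therefore whether destroying the cluster deletes it.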
