Make home volume creation optional #673

Merged (9 commits) on May 29, 2025
56 changes: 44 additions & 12 deletions docs/production.md
@@ -58,7 +58,7 @@ and referenced from the `site` and `production` environments, e.g.:

Note that:
- Environment-specific variables (`cluster_name`) should be hardcoded
into the module block.
into the cluster module block.
- Environment-independent variables (e.g. maybe `cluster_net` if the
same is used for staging and production) should be set as *defaults*
in `environments/site/tofu/variables.tf`, and then don't need to
@@ -76,17 +76,49 @@ and referenced from the `site` and `production` environments, e.g.:
instances) it may be necessary to configure or proxy `chronyd` via an
environment hook.

- The cookiecutter provided OpenTofu configurations define resources for home and
state volumes. The former may not be required if the cluster's `/home` is
provided from an external filesystem (or Manila). In any case, in at least
the production environment, and probably also in the staging environment,
the volumes should be manually created and the resources changed to [data
resources](https://opentofu.org/docs/language/data-sources/). This ensures that even if the cluster is deleted via tofu, the
volumes will persist.

For a development environment, having volumes under tofu control via volume
resources is usually appropriate as there may be many instantiations
of this environment.
- By default, the cookiecutter-provided OpenTofu configuration provisions two
volumes and attaches them to the control node:
- "$cluster_name-home" for NFS-shared home directories
- "$cluster_name-state" for monitoring and Slurm data
These volumes mean the data persists when the control node is rebuilt.
However, if the cluster is destroyed with `tofu destroy` the volumes are
also deleted. This is undesirable for production environments, and usually
also for staging environments, so the volumes should instead be created
manually, e.g. via the CLI:

openstack volume create --size 200 mycluster-home # size in GB
openstack volume create --size 100 mycluster-state

and OpenTofu configured to use those volumes instead of managing them itself
by setting:

home_volume_provisioning = "attach"
state_volume_provisioning = "attach"

either for a specific environment, within the cluster module block in
`environments/$ENV/tofu/main.tf`, or as the site default, by changing the
default in `environments/site/tofu/variables.tf` (see the example module
block below).

For a development environment, allowing OpenTofu to manage the volumes using
the default value of `"manage"` for these variables is usually appropriate, as
it allows multiple clusters to be created from the same environment.

If no home volume at all is required, because the home directories are provided
by a parallel filesystem (e.g. Manila), set

home_volume_provisioning = "none"

In this case the NFS share for home directories is automatically disabled.

**NB:** To apply "attach" options to existing clusters, first remove the
volume(s) from the tofu state, e.g.:

tofu state list # find the volume(s)
tofu state rm 'module.cluster.openstack_blockstorage_volume_v3.state[0]'

This leaves the volume itself intact, but means OpenTofu "forgets" it. Then
set the "attach" options and run `tofu apply` again; this should report that
no changes are planned.
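
  For example, here is a minimal sketch of an environment's cluster module
  block with these settings; the module name `cluster` follows the notes
  above, while the source path and cluster name are hypothetical placeholders
  for whatever the site actually uses:

  ```hcl
  # environments/production/tofu/main.tf (illustrative sketch only)
  module "cluster" {
    source = "../../site/tofu/" # hypothetical path; use the site's actual module source

    cluster_name = "mycluster" # environment-specific, hardcoded per the notes above

    # Attach the manually-created volumes rather than letting OpenTofu manage them:
    home_volume_provisioning  = "attach"
    state_volume_provisioning = "attach"

    # ... other site/environment variables ...
  }
  ```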

- Enable `etc_hosts` templating:

14 changes: 13 additions & 1 deletion environments/common/inventory/group_vars/all/nfs.yml
@@ -11,7 +11,7 @@ _nfs_node_ips: "{{ groups['nfs'] | map('extract', hostvars, 'ansible_host') | jo
# default *all* entries in nfs_configurations to only permitting mounts from above IPs:
nfs_export_clients: "{{ _nfs_node_ips }}"

nfs_configurations:
nfs_configuration_home_volume: # volume-backed home directories
- comment: Export /exports/home from Slurm control node as /home
nfs_enable:
server: "{{ inventory_hostname in groups['control'] }}"
@@ -25,8 +25,20 @@ nfs_configurations:
# accidentally overridden via default options
nfs_export_options: 'rw,secure,root_squash'

nfs_configuration_compute_nodes: # cluster configuration for compute_init/slurm-controlled rebuild
- comment: Export /exports/cluster from Slurm control node
nfs_enable:
server: "{{ inventory_hostname in groups['control'] }}"
clients: false
nfs_export: "/exports/cluster"

nfs_configurations_extra: [] # site-specific nfs shares

nfs_configurations: >- # construct stackhpc.nfs variable
{{
(nfs_configuration_home_volume if (cluster_home_volume | default(true)) else [])
+
nfs_configuration_compute_nodes
+
nfs_configurations_extra
}}
@@ -1,5 +1,9 @@
locals {
control_volumes = concat([openstack_blockstorage_volume_v3.state], var.home_volume_size > 0 ? [openstack_blockstorage_volume_v3.home][0] : [])
control_volumes = concat(
# convert maps to lists with zero or one entries:
[for v in data.openstack_blockstorage_volume_v3.state: v],
[for v in data.openstack_blockstorage_volume_v3.home: v]
)
nodename = templatestring(
var.cluster_nodename_template,
{
@@ -83,7 +87,7 @@ resource "openstack_compute_instance_v2" "control" {

mounts:
- [LABEL=state, ${var.state_dir}]
%{if var.home_volume_size > 0}
%{if var.home_volume_provisioning != "none"}
- [LABEL=home, /exports/home]
%{endif}
EOF
@@ -7,6 +7,7 @@ resource "local_file" "hosts" {
"login_groups": module.login
"compute_groups": module.compute
"state_dir": var.state_dir
"cluster_home_volume": var.home_volume_provisioning != "none"
},
)
filename = "../inventory/hosts.yml"
@@ -2,6 +2,7 @@ all:
vars:
openhpc_cluster_name: ${cluster_name}
cluster_domain_suffix: ${cluster_domain_suffix}
cluster_home_volume: ${cluster_home_volume}
cluster_compute_groups: ${jsonencode(keys(compute_groups))}

control:
@@ -125,10 +125,38 @@ variable "state_volume_type" {
default = null
}

variable "state_volume_provisioning" {
type = string
default = "manage"
description = <<-EOT
How to manage the state volume. Valid values are:
"manage": (Default) OpenTofu will create a volume "$cluster_name-state"
and delete it when the cluster is destroyed. A volume
with this name must not already exist. Use for demo and
dev environments.
"attach": A single volume named "$cluster_name-state" must already
exist. It is not managed by OpenTofu so e.g. is left
intact if the cluster is destroyed. Use for production
environments.
EOT
validation {
condition = contains(["manage", "attach"], var.state_volume_provisioning)
error_message = <<-EOT
state_volume_provisioning must be "manage" or "attach"
EOT
}
}

variable "home_volume_size" {
type = number
description = "Size of state volume on control node, in GB"
default = 100 # GB, 0 means no home volume
description = "Size of state volume on control node, in GB."
default = 100
validation {
condition = var.home_volume_provisioning == "manage" ? var.home_volume_size > 0 : true
error_message = <<-EOT
home_volume_size must be > 0 when var.home_volume_provisioning == "manage"
EOT
}
}

variable "home_volume_type" {
@@ -137,6 +165,30 @@ variable "home_volume_type" {
description = "Type of home volume, if not default type"
}

variable "home_volume_provisioning" {
type = string
default = "manage"
description = <<-EOT
How to manage the home volume. Valid values are:
"manage": (Default) OpenTofu will create a volume "$cluster_name-home"
and delete it when the cluster is destroyed. A volume
with this name must not already exist. Use for demo and
dev environments.
"attach": A single volume named "$cluster_name-home" must already
exist. It is not managed by OpenTofu so e.g. is left
intact if the cluster is destroyed. Use for production
environments.
"none": No home volume is used. Use if /home is provided by
a parallel filesystem, e.g. manila.
EOT
validation {
condition = contains(["manage", "attach", "none"], var.home_volume_provisioning)
error_message = <<-EOT
home_volume_provisioning must be one of "manage", "attach" or "none"
EOT
}
}

variable "vnic_types" {
type = map(string)
description = <<-EOT
@@ -1,16 +1,59 @@
resource "openstack_blockstorage_volume_v3" "state" {

# NB: Changes to this resource's "address" i.e. (label or for_each key)
# may lose state data for existing clusters using this volume

count = var.state_volume_provisioning == "manage" ? 1 : 0

name = "${var.cluster_name}-state" # last word used to label filesystem
description = "State for control node"
size = var.state_volume_size
volume_type = var.state_volume_type
}

data "openstack_blockstorage_volume_v3" "state" {

/* We use a data resource whether or not TF is managing the volume, so the
logic is all in one place. But that means this needs a dependency on the
actual resource to avoid a race.

Because there may be no volume, this has to use for_each.
*/

for_each = toset(
(var.state_volume_provisioning == "manage") ?
[for v in openstack_blockstorage_volume_v3.state: v.name] :
["${var.cluster_name}-state"]
)

name = each.key

}

resource "openstack_blockstorage_volume_v3" "home" {

count = var.home_volume_size > 0 ? 1 : 0
# NB: Changes to this resource's "address" i.e. (label or for_each key)
# may lose user data for existing clusters using this volume

count = var.home_volume_provisioning == "manage" ? 1 : 0

name = "${var.cluster_name}-home" # last word used to label filesystem
description = "Home for control node"
size = var.home_volume_size
volume_type = var.home_volume_type
}

data "openstack_blockstorage_volume_v3" "home" {

/* Comments as for the state volume. */

for_each = toset(
(var.home_volume_provisioning == "manage") ?
[for v in openstack_blockstorage_volume_v3.home: v.name] :
(var.home_volume_provisioning == "attach") ?
["${var.cluster_name}-home"] :
[]
)

name = each.key
}