Commit eb216df

sjpb and m-bull authored
Support multiple networks in OpenTofu configurations (#548)
* support multiple networks for control node
* support multiple networks for all nodes w/ inventory output
* simplify control node definition and access IP
* add network docs
* Apply suggestions from code review

  Co-authored-by: Matt Anson <[email protected]>
* use first network as access network and support extra_networks only
* fixup control node for access network changes

---------

Co-authored-by: Matt Anson <[email protected]>
1 parent d6efcb6 commit eb216df

13 files changed, +231 -89 lines changed
docs/networks.md

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
+# Networking
+
+The default OpenTofu configurations in the appliance do not provision networks,
+subnets or associated infrastructure such as routers. The requirements are that:
+1. At least one network exists.
+2. The first network defined spans all nodes, referred to as the "access network".
+3. Only one subnet per network is attached to nodes.
+4. At least one network on each node provides outbound internet access (either
+directly, or via a proxy).
+
+Furthermore, it is recommended that the deploy host has an interface on the
+access network. While it is possible to e.g. use a floating IP on a login node
+as an SSH proxy to access the other nodes, this can create problems in recovering
+the cluster if the login node is unavailable and can make Ansible problems harder
+to debug.
+
+This page describes supported configurations and how to implement them using
+the OpenTofu variables. These will normally be set in
+`environments/site/tofu/terraform.tfvars` for the site base environment. If they
+need to be overridden for specific environments, this can be done via an OpenTofu
+module as discussed [here](./production.md).
+
+Note that if an OpenStack subnet has a gateway IP defined then nodes with ports
+attached to that subnet will get a default route set via that gateway.
+
+## Single network
+This is the simplest possible configuration. A single network and subnet are
+used for all nodes. The subnet provides outbound internet access via the default
+route defined by the subnet gateway (often an OpenStack router to an external
+network).
+
+```terraform
+cluster_networks = [
+    {
+        network = "netA"
+        subnet = "subnetA"
+    }
+]
+...
+```
+
+## Multiple homogeneous networks
+This is similar to the above, except each node has multiple networks. The first
+network, "netA", is the access network. Note that only one subnet may have a
+gateway defined; otherwise default routes via both subnets will be present, causing
+routing problems. It also shows the second network (netB) using direct-type
+vNICs for RDMA.
+
+```terraform
+cluster_networks = [
+    {
+        network = "netA"
+        subnet = "subnetA"
+    },
+    {
+        network = "netB"
+        subnet = "subnetB"
+    },
+]
+
+vnic_types = {
+    netB = "direct"
+}
+...
+```
+
+
+## Additional networks on some nodes
+
+This example shows how to modify variables for specific node groups. In this
+case a baremetal node group has a second network attached. As above, only a
+single subnet can have a gateway IP.
+
+```terraform
+cluster_networks = [
+    {
+        network = "netA"
+        subnet = "subnetA"
+    }
+]
+
+compute = {
+    general = {
+        nodes = ["general-0", "general-1"]
+    }
+    baremetal = {
+        nodes = ["baremetal-0", "baremetal-1"]
+        extra_networks = [
+            {
+                network = "netB"
+                subnet = "subnetB"
+            }
+        ]
+        vnic_types = {
+            netA = "baremetal"
+            netB = "baremetal"
+            ...
+        }
+    }
+}
+...
+```
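
The gateway constraint noted in the documentation above concerns subnets that exist outside the appliance, which does not provision networks itself. As a sketch only, assuming the second network happened to be managed separately with the OpenStack provider, a subnet without a gateway could be declared roughly as follows. The resource labels, the names `netB`/`subnetB` and the CIDR are hypothetical and are not part of this commit.

```terraform
# Sketch only: not part of the appliance, which expects networks to pre-exist.
terraform {
  required_providers {
    openstack = {
      source = "terraform-provider-openstack/openstack"
    }
  }
}

# Hypothetical second network, e.g. a high-performance data network.
resource "openstack_networking_network_v2" "net_b" {
  name           = "netB"
  admin_state_up = true
}

# no_gateway = true means ports on this subnet do not receive a default
# route, leaving the access network's subnet as the only gateway.
resource "openstack_networking_subnet_v2" "subnet_b" {
  name       = "subnetB"
  network_id = openstack_networking_network_v2.net_b.id
  cidr       = "10.1.0.0/24" # hypothetical range
  ip_version = 4
  no_gateway = true
}
```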

environments/.stackhpc/tofu/LEAFCLOUD.tfvars

Lines changed: 6 additions & 2 deletions
@@ -1,5 +1,9 @@
-cluster_net = "slurmapp-ci"
-cluster_subnet = "slurmapp-ci"
+cluster_networks = [
+    {
+        network = "slurmapp-ci"
+        subnet = "slurmapp-ci"
+    }
+]
 control_node_flavor = "ec1.medium" # small ran out of memory, medium gets down to ~100Mi mem free on deployment
 other_node_flavor = "en1.xsmall"
 state_volume_type = "unencrypted"

environments/.stackhpc/tofu/main.tf

Lines changed: 5 additions & 8 deletions
@@ -30,12 +30,10 @@ variable "cluster_image" {
     type = map(string)
 }
 
-variable "cluster_net" {}
+variable "cluster_networks" {}
 
-variable "cluster_subnet" {}
-
-variable "vnic_type" {
-    default = "normal"
+variable "vnic_types" {
+    default = {}
 }
 
 variable "state_volume_type"{
@@ -63,9 +61,8 @@ module "cluster" {
     source = "../../skeleton/{{cookiecutter.environment}}/tofu/"
 
     cluster_name = var.cluster_name
-    cluster_net = var.cluster_net
-    cluster_subnet = var.cluster_subnet
-    vnic_type = var.vnic_type
+    cluster_networks = var.cluster_networks
+    vnic_types = var.vnic_types
     key_pair = "slurm-app-ci"
     cluster_image_id = data.openstack_images_image_v2.cluster.id
     control_node_flavor = var.control_node_flavor
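
The CI environment above declares `cluster_networks` and `vnic_types` without type constraints. A minimal sketch of stricter declarations follows, assuming each network entry carries only the `network` and `subnet` keys used in this commit. The descriptions and the validation rule are illustrative assumptions, not the repository's actual variable definitions.

```terraform
# Sketch only: typed equivalents of the untyped variables in this diff.
variable "cluster_networks" {
  description = "Networks attached to all nodes; the first entry is the access network"
  type = list(object({
    network = string # OpenStack network name
    subnet  = string # OpenStack subnet name on that network
  }))

  validation {
    condition     = length(var.cluster_networks) > 0
    error_message = "At least one network must be defined."
  }
}

variable "vnic_types" {
  description = "Map of network name to vNIC type; unlisted networks use the default type"
  type        = map(string)
  default     = {}
}
```

With constraints like these, a misspelled key in a `.tfvars` entry would be reported at plan time rather than surfacing later as a failed attribute lookup.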

environments/skeleton/{{cookiecutter.environment}}/tofu/compute.tf

Lines changed: 11 additions & 9 deletions
@@ -7,25 +7,27 @@ module "compute" {
     nodes = each.value.nodes
     flavor = each.value.flavor
 
+    # always taken from top-level value:
     cluster_name = var.cluster_name
     cluster_domain_suffix = var.cluster_domain_suffix
-    cluster_net_id = data.openstack_networking_network_v2.cluster_net.id
-    cluster_subnet_id = data.openstack_networking_subnet_v2.cluster_subnet.id
-
+    key_pair = var.key_pair
+    environment_root = var.environment_root
+
     # can be set for group, defaults to top-level value:
     image_id = lookup(each.value, "image_id", var.cluster_image_id)
-    vnic_type = lookup(each.value, "vnic_type", var.vnic_type)
-    vnic_profile = lookup(each.value, "vnic_profile", var.vnic_profile)
+    vnic_types = lookup(each.value, "vnic_types", var.vnic_types)
+    vnic_profiles = lookup(each.value, "vnic_profiles", var.vnic_profiles)
     volume_backed_instances = lookup(each.value, "volume_backed_instances", var.volume_backed_instances)
     root_volume_size = lookup(each.value, "root_volume_size", var.root_volume_size)
+
+    # optionally set for group
+    networks = concat(var.cluster_networks, lookup(each.value, "extra_networks", []))
     extra_volumes = lookup(each.value, "extra_volumes", {})
-
     compute_init_enable = lookup(each.value, "compute_init_enable", [])
     ignore_image_changes = lookup(each.value, "ignore_image_changes", false)
 
-    key_pair = var.key_pair
-    environment_root = var.environment_root
+    # computed
     k3s_token = local.k3s_token
-    control_address = [for n in openstack_compute_instance_v2.control["control"].network: n.fixed_ip_v4 if n.access_network][0]
+    control_address = openstack_compute_instance_v2.control.access_ip_v4
     security_group_ids = [for o in data.openstack_networking_secgroup_v2.nonlogin: o.id]
 }
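
The `networks` argument above combines the cluster-wide list with any per-group `extra_networks`. The following standalone sketch reproduces that `concat`/`lookup` pattern using locals in place of the real variables; the group names and network names are hypothetical values borrowed from the documentation examples.

```terraform
# Sketch only: locals stand in for var.cluster_networks and var.compute.
locals {
  cluster_networks = [
    { network = "netA", subnet = "subnetA" },
  ]

  compute = {
    general = {
      nodes = ["general-0"]
    }
    baremetal = {
      nodes          = ["baremetal-0"]
      extra_networks = [{ network = "netB", subnet = "subnetB" }]
    }
  }

  # Same expression as in the module call: cluster-wide networks first,
  # then the group's extra networks, or nothing if the key is absent.
  group_networks = {
    for name, group in local.compute :
    name => concat(local.cluster_networks, lookup(group, "extra_networks", []))
  }
}

output "group_networks" {
  value = local.group_networks
}
```

Because the cluster-wide list always comes first in the concatenation, the access network stays at index 0 for every group, whatever extra networks a group adds.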

environments/skeleton/{{cookiecutter.environment}}/tofu/control.tf

Lines changed: 16 additions & 12 deletions
@@ -4,27 +4,27 @@ locals {
 
 resource "openstack_networking_port_v2" "control" {
 
-    name = "${var.cluster_name}-control"
-    network_id = data.openstack_networking_network_v2.cluster_net.id
+    for_each = {for net in var.cluster_networks: net.network => net}
+
+    name = "${var.cluster_name}-control-${each.key}"
+    network_id = data.openstack_networking_network_v2.cluster_net[each.key].id
     admin_state_up = "true"
 
     fixed_ip {
-        subnet_id = data.openstack_networking_subnet_v2.cluster_subnet.id
+        subnet_id = data.openstack_networking_subnet_v2.cluster_subnet[each.key].id
     }
 
     security_group_ids = [for o in data.openstack_networking_secgroup_v2.nonlogin: o.id]
 
     binding {
-        vnic_type = var.vnic_type
-        profile = var.vnic_profile
+        vnic_type = lookup(var.vnic_types, each.key, "normal")
+        profile = lookup(var.vnic_profiles, each.key, "{}")
     }
 }
 
 resource "openstack_compute_instance_v2" "control" {
 
-    for_each = toset(["control"])
-
-    name = "${var.cluster_name}-${each.key}"
+    name = "${var.cluster_name}-control"
     image_id = var.cluster_image_id
     flavor_name = var.control_node_flavor
     key_pair = var.key_pair
@@ -49,19 +49,23 @@ resource "openstack_compute_instance_v2" "control" {
         }
     }
 
-    network {
-        port = openstack_networking_port_v2.control.id
-        access_network = true
+    dynamic "network" {
+        for_each = {for net in var.cluster_networks: net.network => net}
+        content {
+            port = openstack_networking_port_v2.control[network.key].id
+            access_network = network.key == var.cluster_networks[0].network
+        }
     }
 
     metadata = {
         environment_root = var.environment_root
         k3s_token = local.k3s_token
+        # TODO: set k3s_subnet from access_network
     }
 
     user_data = <<-EOF
         #cloud-config
-        fqdn: ${var.cluster_name}-${each.key}.${var.cluster_name}.${var.cluster_domain_suffix}
+        fqdn: ${var.cluster_name}-control.${var.cluster_name}.${var.cluster_domain_suffix}
 
         bootcmd:
             %{for volume in local.control_volumes}
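
The control node changes above turn the network list into a map keyed by network name for `for_each`, then mark the access network by comparing each key with the first list element. A standalone sketch of that pattern follows, with hypothetical network names and locals standing in for `var.cluster_networks`.

```terraform
# Sketch only: local values replace var.cluster_networks.
locals {
  cluster_networks = [
    { network = "netA", subnet = "subnetA" },
    { network = "netB", subnet = "subnetB" },
  ]

  # List-to-map conversion as used for the port resources and the dynamic
  # "network" block: one entry per network, keyed by network name.
  networks_by_name = { for net in local.cluster_networks : net.network => net }

  # The access network is identified from the ORIGINAL list order.
  access_network_name = local.cluster_networks[0].network

  # Mirrors `access_network = network.key == var.cluster_networks[0].network`.
  is_access_network = {
    for name, net in local.networks_by_name :
    name => name == local.access_network_name
  }
}

output "is_access_network" {
  value = local.is_access_network
}
```

Maps are iterated in lexical key order rather than list order, so the explicit comparison against the first list element is what keeps the access network tied to the order given in `cluster_networks`.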

environments/skeleton/{{cookiecutter.environment}}/tofu/inventory.tf

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ resource "local_file" "hosts" {
     {
         "cluster_name": var.cluster_name,
         "cluster_domain_suffix": var.cluster_domain_suffix,
-        "control_instances": openstack_compute_instance_v2.control
+        "control": openstack_compute_instance_v2.control
         "login_groups": module.login
         "compute_groups": module.compute
         "state_dir": var.state_dir

environments/skeleton/{{cookiecutter.environment}}/tofu/inventory.tpl

Lines changed: 5 additions & 5 deletions
@@ -5,15 +5,13 @@ all:
 
 control:
     hosts:
-%{ for control in control_instances ~}
         ${ control.name }:
-            ansible_host: ${[for n in control.network: n.fixed_ip_v4 if n.access_network][0]}
-            instance_id: ${ control.id }
-%{ endfor ~}
+            ansible_host: ${control.access_ip_v4}
+            instance_id: ${control.id}
+            networks: ${jsonencode({for n in control.network: n.name => {"fixed_ip_v4": n.fixed_ip_v4, "fixed_ip_v6": n.fixed_ip_v6}})}
     vars:
         appliances_state_dir: ${state_dir} # NB needs to be set on group not host otherwise it is ignored in packer build!
 
-
 %{ for group_name in keys(login_groups) ~}
 ${cluster_name}_${group_name}:
     hosts:
@@ -22,6 +20,7 @@ ${cluster_name}_${group_name}:
             ansible_host: ${node.access_ip_v4}
             instance_id: ${ node.id }
             image_id: ${ node.image_id }
+            networks: ${jsonencode({for n in node.network: n.name => {"fixed_ip_v4": n.fixed_ip_v4, "fixed_ip_v6": n.fixed_ip_v6}})}
 %{ endfor ~}
 %{ endfor ~}
 
@@ -39,6 +38,7 @@ ${cluster_name}_${group_name}:
             ansible_host: ${node.access_ip_v4}
             instance_id: ${ node.id }
             image_id: ${ node.image_id }
+            networks: ${jsonencode({for n in node.network: n.name => {"fixed_ip_v4": n.fixed_ip_v4, "fixed_ip_v6": n.fixed_ip_v6}})}
 %{ endfor ~}
 %{ endfor ~}
 
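
The new `networks` host variable in the inventory template is built with `jsonencode` over each instance's `network` blocks. The sketch below applies the same expression to a hand-written list standing in for those blocks; the addresses and network names are made up for illustration.

```terraform
# Sketch only: a hypothetical stand-in for an instance's `network` blocks.
locals {
  node_networks = [
    { name = "netA", fixed_ip_v4 = "10.0.0.5", fixed_ip_v6 = "" },
    { name = "netB", fixed_ip_v4 = "10.1.0.5", fixed_ip_v6 = "" },
  ]

  # Same shape as the template expression: a JSON object keyed by network
  # name, each value holding the IPv4 and IPv6 fixed addresses.
  networks_json = jsonencode({
    for n in local.node_networks :
    n.name => { "fixed_ip_v4" : n.fixed_ip_v4, "fixed_ip_v6" : n.fixed_ip_v6 }
  })
}

output "networks_json" {
  value = local.networks_json
}
```

Emitting JSON here is convenient because YAML parsers accept it as a flow-style mapping, so the value can be placed directly after the `networks:` key in the rendered inventory.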

environments/skeleton/{{cookiecutter.environment}}/tofu/login.tf

Lines changed: 8 additions & 6 deletions
@@ -9,23 +9,25 @@ module "login" {
 
     cluster_name = var.cluster_name
     cluster_domain_suffix = var.cluster_domain_suffix
-    cluster_net_id = data.openstack_networking_network_v2.cluster_net.id
-    cluster_subnet_id = data.openstack_networking_subnet_v2.cluster_subnet.id
-
+
     # can be set for group, defaults to top-level value:
     image_id = lookup(each.value, "image_id", var.cluster_image_id)
-    vnic_type = lookup(each.value, "vnic_type", var.vnic_type)
-    vnic_profile = lookup(each.value, "vnic_profile", var.vnic_profile)
+    vnic_types = lookup(each.value, "vnic_types", var.vnic_types)
+    vnic_profiles = lookup(each.value, "vnic_profiles", var.vnic_profiles)
     volume_backed_instances = lookup(each.value, "volume_backed_instances", var.volume_backed_instances)
     root_volume_size = lookup(each.value, "root_volume_size", var.root_volume_size)
+
+    # optionally set for group
+    networks = concat(var.cluster_networks, lookup(each.value, "extra_networks", []))
     extra_volumes = lookup(each.value, "extra_volumes", {})
 
+    # can't be set for login
     compute_init_enable = []
     ignore_image_changes = false
 
     key_pair = var.key_pair
     environment_root = var.environment_root
     k3s_token = local.k3s_token
-    control_address = [for n in openstack_compute_instance_v2.control["control"].network: n.fixed_ip_v4 if n.access_network][0]
+    control_address = openstack_compute_instance_v2.control.access_ip_v4
     security_group_ids = [for o in data.openstack_networking_secgroup_v2.login: o.id]
 }

environments/skeleton/{{cookiecutter.environment}}/tofu/network.tf

Lines changed: 7 additions & 2 deletions
@@ -1,11 +1,16 @@
 
 data "openstack_networking_network_v2" "cluster_net" {
-    name = var.cluster_net
+
+    for_each = {for net in var.cluster_networks: net.network => net}
+
+    name = each.value.network
 }
 
 data "openstack_networking_subnet_v2" "cluster_subnet" {
 
-    name = var.cluster_subnet
+    for_each = {for net in var.cluster_networks: net.network => net}
+
+    name = each.value.subnet
 }
 
 data "openstack_networking_secgroup_v2" "login" {
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+
+data "openstack_networking_network_v2" "network" {
+
+    for_each = {for net in var.networks: net.network => net}
+
+    name = each.value.network
+}
+
+data "openstack_networking_subnet_v2" "subnet" {
+
+    for_each = {for net in var.networks: net.network => net}
+
+    name = each.value.subnet
+}
