Commit 944d4ed

feat: add blue/green upgrade strategy settings
1 parent 968b024 commit 944d4ed

21 files changed: 638 additions and 50 deletions

README.md

Lines changed: 5 additions & 0 deletions
@@ -275,8 +275,13 @@ The node_pools variable takes the following parameters:
 | min_cpu_platform | Minimum CPU platform to be used by the nodes in the pool. The nodes may be scheduled on the specified or newer CPU platform. | " " | Optional |
 | max_count | Maximum number of nodes in the NodePool. Must be >= min_count | 100 | Optional |
 | max_pods_per_node | The maximum number of pods per node in this cluster | null | Optional |
+| strategy | The upgrade strategy to be used for upgrading the nodes. Valid values are: `SURGE` or `BLUE_GREEN` | "SURGE" | Optional |
 | max_surge | The number of additional nodes that can be added to the node pool during an upgrade. Increasing max_surge raises the number of nodes that can be upgraded simultaneously. Can be set to 0 or greater. | 1 | Optional |
 | max_unavailable | The number of nodes that can be simultaneously unavailable during an upgrade. Increasing max_unavailable raises the number of nodes that can be upgraded in parallel. Can be set to 0 or greater. | 0 | Optional |
+| node_pool_soak_duration | Time needed after draining the entire blue pool. After this period, the blue pool will be cleaned up. By default, it is set to one hour (3600 seconds). The maximum length of the soak time is 7 days (604,800 seconds). | "3600s" | Optional |
+| batch_soak_duration | Soak time after each batch gets drained, with the default being zero seconds. | "0s" | Optional |
+| batch_node_count | Absolute number of nodes to drain in a batch. If it is set to zero, this phase will be skipped. | null | Optional |
+| batch_percentage | Percentage of nodes to drain in a batch. Must be in the range of [0.0, 1.0]. If it is set to zero, this phase will be skipped. | null | Optional |
 | min_count | Minimum number of nodes in the NodePool. Must be >=0 and <= max_count. Should be used when autoscaling is true | 1 | Optional |
 | name | The name of the node pool | | Required |
 | node_count | The number of nodes in the nodepool when autoscaling is false. Otherwise defaults to 1. Only valid for non-autoscaling clusters | | Required |
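
For illustration, a `node_pools` entry that opts a single pool into blue/green upgrades might look like the sketch below. The keys come from the table rows added in this hunk; the pool name and the specific values are hypothetical. `batch_soak_duration` is set explicitly because the README row documents a "0s" default while the `lookup` fallback in `cluster.tf` in this same commit is "60s". Only `batch_percentage` is set, on the assumption that the provider treats it and `batch_node_count` as alternatives.

```hcl
# Hypothetical terraform.tfvars fragment; names and values are examples only.
node_pools = [
  {
    name     = "blue-green-pool" # hypothetical pool name
    strategy = "BLUE_GREEN"      # overrides the module-wide default ("SURGE")

    # Wait 30 minutes after the whole blue pool is drained before cleanup.
    node_pool_soak_duration = "1800s"

    # Drain a quarter of the blue pool per batch, pausing 2 minutes between batches.
    batch_percentage    = 0.25
    batch_soak_duration = "120s"
  },
]
```

Pools that omit `strategy` keep following the module-level `strategy` variable, so existing surge-based pools should not need changes.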

autogen/main/README.md

Lines changed: 5 additions & 0 deletions
@@ -212,8 +212,13 @@ The node_pools variable takes the following parameters:
 | min_cpu_platform | Minimum CPU platform to be used by the nodes in the pool. The nodes may be scheduled on the specified or newer CPU platform. | " " | Optional |
 | max_count | Maximum number of nodes in the NodePool. Must be >= min_count | 100 | Optional |
 | max_pods_per_node | The maximum number of pods per node in this cluster | null | Optional |
+| strategy | The upgrade strategy to be used for upgrading the nodes. Valid values are: `SURGE` or `BLUE_GREEN` | "SURGE" | Optional |
 | max_surge | The number of additional nodes that can be added to the node pool during an upgrade. Increasing max_surge raises the number of nodes that can be upgraded simultaneously. Can be set to 0 or greater. | 1 | Optional |
 | max_unavailable | The number of nodes that can be simultaneously unavailable during an upgrade. Increasing max_unavailable raises the number of nodes that can be upgraded in parallel. Can be set to 0 or greater. | 0 | Optional |
+| node_pool_soak_duration | Time needed after draining the entire blue pool. After this period, the blue pool will be cleaned up. By default, it is set to one hour (3600 seconds). The maximum length of the soak time is 7 days (604,800 seconds). | "3600s" | Optional |
+| batch_soak_duration | Soak time after each batch gets drained, with the default being zero seconds. | "0s" | Optional |
+| batch_node_count | Absolute number of nodes to drain in a batch. If it is set to zero, this phase will be skipped. | null | Optional |
+| batch_percentage | Percentage of nodes to drain in a batch. Must be in the range of [0.0, 1.0]. If it is set to zero, this phase will be skipped. | null | Optional |
 | min_count | Minimum number of nodes in the NodePool. Must be >=0 and <= max_count. Should be used when autoscaling is true | 1 | Optional |
 | name | The name of the node pool | | Required |
 {% if beta_cluster %}

autogen/main/cluster.tf.tmpl

Lines changed: 24 additions & 3 deletions
@@ -676,9 +676,30 @@ resource "google_container_node_pool" "windows_pools" {
     auto_upgrade = lookup(each.value, "auto_upgrade", local.default_auto_upgrade)
   }
 
-  upgrade_settings {
-    max_surge       = lookup(each.value, "max_surge", 1)
-    max_unavailable = lookup(each.value, "max_unavailable", 0)
+  dynamic "upgrade_settings" {
+    for_each = lookup(each.value, "strategy", var.strategy) == "SURGE" ? [each.value] : []
+    content {
+      strategy        = lookup(each.value, "strategy", "SURGE")
+      max_surge       = lookup(each.value, "max_surge", 1)
+      max_unavailable = lookup(each.value, "max_unavailable", 0)
+    }
+  }
+
+  dynamic "upgrade_settings" {
+    for_each = lookup(each.value, "strategy", var.strategy) == "BLUE_GREEN" ? [each.value] : []
+    content {
+      strategy = lookup(each.value, "strategy", "BLUE_GREEN")
+
+      blue_green_settings {
+        node_pool_soak_duration = lookup(each.value, "node_pool_soak_duration", "3600s")
+
+        standard_rollout_policy {
+          batch_soak_duration = lookup(each.value, "batch_soak_duration", "60s")
+          batch_percentage    = lookup(each.value, "batch_percentage", null)
+          batch_node_count    = lookup(each.value, "batch_node_count", null)
+        }
+      }
+    }
   }
 
   node_config {
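
The two `dynamic "upgrade_settings"` blocks in this hunk use mutually exclusive `for_each` conditions, so a given pool should render exactly one `upgrade_settings` block. A minimal standalone sketch of that selection logic follows; `local.pool` is a hypothetical stand-in for `each.value`, and the literal `"SURGE"` fallback stands in for `var.strategy`.

```hcl
# Standalone sketch, not part of the commit. Run with `terraform apply` in an
# empty directory to inspect the output.
locals {
  # Hypothetical pool map standing in for each.value.
  pool = {
    name     = "example-pool"
    strategy = "BLUE_GREEN"
  }

  # Same conditional shape as the for_each expressions in the diff:
  # a one-element list renders the block once, an empty list renders nothing.
  surge_iterations      = lookup(local.pool, "strategy", "SURGE") == "SURGE" ? [local.pool] : []
  blue_green_iterations = lookup(local.pool, "strategy", "SURGE") == "BLUE_GREEN" ? [local.pool] : []
}

output "rendered_block_counts" {
  value = {
    surge      = length(local.surge_iterations)
    blue_green = length(local.blue_green_iterations)
  }
}
```

With `strategy = "BLUE_GREEN"` the surge count is 0 and the blue/green count is 1; removing the `strategy` key flips the two. A value other than `SURGE` or `BLUE_GREEN` would leave both lists empty, so no `upgrade_settings` block would be rendered at all.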

autogen/main/variables.tf.tmpl

Lines changed: 42 additions & 1 deletion
@@ -725,7 +725,6 @@ variable "enable_pod_security_policy" {
   default     = false
 }
 
-
 variable "enable_l4_ilb_subsetting" {
   type        = bool
   description = "Enable L4 ILB Subsetting on the cluster"
@@ -749,5 +748,47 @@ variable "enable_identity_service" {
   description = "Enable the Identity Service component, which allows customers to use external identity providers with the K8S API."
   default     = false
 }
+
+variable "strategy" {
+  type        = string
+  description = "The upgrade strategy to be used for upgrading the nodes. Valid values are: `SURGE`; `BLUE_GREEN`. By default strategy is `SURGE` (Optional)"
+  default     = "SURGE"
+}
+
+variable "max_surge" {
+  type        = number
+  description = "The number of additional nodes that can be added to the node pool during an upgrade. Increasing max_surge raises the number of nodes that can be upgraded simultaneously. Can be set to 0 or greater (Optional)"
+  default     = null
+}
+
+variable "max_unavailable" {
+  type        = number
+  description = "The number of nodes that can be simultaneously unavailable during an upgrade. Increasing max_unavailable raises the number of nodes that can be upgraded in parallel. Can be set to 0 or greater (Optional)"
+  default     = null
+}
+
+variable "node_pool_soak_duration" {
+  type        = string
+  description = "Time needed after draining the entire blue pool. After this period, the blue pool will be cleaned up (Optional)"
+  default     = "3600s"
+}
+
+variable "batch_soak_duration" {
+  type        = string
+  description = "Soak time after each batch gets drained (Optional)"
+  default     = "0s"
+}
+
+variable "batch_percentage" {
+  type        = string
+  description = "Percentage of the blue pool nodes to drain in a batch (Optional)"
+  default     = null
+}
+
+variable "batch_node_count" {
+  type        = number
+  description = "The number of blue nodes to drain in a batch (Optional)"
+  default     = null
+}
 {% endif %}
 {% endif %}
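
The `strategy` variable added above accepts any string, although its description lists only two valid values. One possible hardening, which is not part of this commit, is a `validation` block. The sketch below assumes a Terraform version with custom variable validation (0.13 or later); the shortened description and the error message wording are illustrative.

```hcl
# Sketch only: the commit declares this variable without a validation block.
variable "strategy" {
  type        = string
  description = "The upgrade strategy to be used for upgrading the nodes. Valid values are: `SURGE`; `BLUE_GREEN`."
  default     = "SURGE"

  validation {
    # Reject anything other than the two documented strategies at plan time.
    condition     = contains(["SURGE", "BLUE_GREEN"], var.strategy)
    error_message = "The strategy must be either SURGE or BLUE_GREEN."
  }
}
```

This only guards the module-level default. A per-pool `strategy` key inside `node_pools` is read with `lookup` and would not pass through this check.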

cluster.tf

Lines changed: 48 additions & 6 deletions
@@ -399,9 +399,30 @@ resource "google_container_node_pool" "pools" {
     auto_upgrade = lookup(each.value, "auto_upgrade", local.default_auto_upgrade)
   }
 
-  upgrade_settings {
-    max_surge       = lookup(each.value, "max_surge", 1)
-    max_unavailable = lookup(each.value, "max_unavailable", 0)
+  dynamic "upgrade_settings" {
+    for_each = lookup(each.value, "strategy", var.strategy) == "SURGE" ? [each.value] : []
+    content {
+      strategy        = lookup(each.value, "strategy", "SURGE")
+      max_surge       = lookup(each.value, "max_surge", 1)
+      max_unavailable = lookup(each.value, "max_unavailable", 0)
+    }
+  }
+
+  dynamic "upgrade_settings" {
+    for_each = lookup(each.value, "strategy", var.strategy) == "BLUE_GREEN" ? [each.value] : []
+    content {
+      strategy = lookup(each.value, "strategy", "BLUE_GREEN")
+
+      blue_green_settings {
+        node_pool_soak_duration = lookup(each.value, "node_pool_soak_duration", "3600s")
+
+        standard_rollout_policy {
+          batch_soak_duration = lookup(each.value, "batch_soak_duration", "60s")
+          batch_percentage    = lookup(each.value, "batch_percentage", null)
+          batch_node_count    = lookup(each.value, "batch_node_count", null)
+        }
+      }
+    }
   }
 
   node_config {
@@ -557,9 +578,30 @@ resource "google_container_node_pool" "windows_pools" {
     auto_upgrade = lookup(each.value, "auto_upgrade", local.default_auto_upgrade)
   }
 
-  upgrade_settings {
-    max_surge       = lookup(each.value, "max_surge", 1)
-    max_unavailable = lookup(each.value, "max_unavailable", 0)
+  dynamic "upgrade_settings" {
+    for_each = lookup(each.value, "strategy", var.strategy) == "SURGE" ? [each.value] : []
+    content {
+      strategy        = lookup(each.value, "strategy", "SURGE")
+      max_surge       = lookup(each.value, "max_surge", 1)
+      max_unavailable = lookup(each.value, "max_unavailable", 0)
+    }
+  }
+
+  dynamic "upgrade_settings" {
+    for_each = lookup(each.value, "strategy", var.strategy) == "BLUE_GREEN" ? [each.value] : []
+    content {
+      strategy = lookup(each.value, "strategy", "BLUE_GREEN")
+
+      blue_green_settings {
+        node_pool_soak_duration = lookup(each.value, "node_pool_soak_duration", "3600s")
+
+        standard_rollout_policy {
+          batch_soak_duration = lookup(each.value, "batch_soak_duration", "60s")
+          batch_percentage    = lookup(each.value, "batch_percentage", null)
+          batch_node_count    = lookup(each.value, "batch_node_count", null)
+        }
+      }
+    }
   }
 
   node_config {

modules/beta-private-cluster-update-variant/README.md

Lines changed: 12 additions & 0 deletions
@@ -163,6 +163,9 @@ Then perform the following commands on the root folder:
 | add\_master\_webhook\_firewall\_rules | Create master\_webhook firewall rules for ports defined in `firewall_inbound_ports` | `bool` | `false` | no |
 | add\_shadow\_firewall\_rules | Create GKE shadow firewall (the same as default firewall rules with firewall logs enabled). | `bool` | `false` | no |
 | authenticator\_security\_group | The name of the RBAC security group for use with Google security groups in Kubernetes RBAC. Group name must be in format gke-security-groups@yourdomain.com | `string` | `null` | no |
+| batch\_node\_count | The number of blue nodes to drain in a batch (Optional) | `number` | `null` | no |
+| batch\_percentage | Percentage of the blue pool nodes to drain in a batch (Optional) | `string` | `null` | no |
+| batch\_soak\_duration | Soak time after each batch gets drained (Optional) | `string` | `"0s"` | no |
 | cloudrun | (Beta) Enable CloudRun addon | `bool` | `false` | no |
 | cloudrun\_load\_balancer\_type | (Beta) Configure the Cloud Run load balancer type. External by default. Set to `LOAD_BALANCER_TYPE_INTERNAL` to configure as an internal load balancer. | `string` | `""` | no |
 | cluster\_autoscaling | Cluster autoscaling configuration. See [more details](https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#clusterautoscaling) | <pre>object({<br> enabled = bool<br> autoscaling_profile = string<br> min_cpu_cores = number<br> max_cpu_cores = number<br> min_memory_gb = number<br> max_memory_gb = number<br> gpu_resources = list(object({ resource_type = string, minimum = number, maximum = number }))<br> })</pre> | <pre>{<br> "autoscaling_profile": "BALANCED",<br> "enabled": false,<br> "gpu_resources": [],<br> "max_cpu_cores": 0,<br> "max_memory_gb": 0,<br> "min_cpu_cores": 0,<br> "min_memory_gb": 0<br>}</pre> | no |
@@ -227,6 +230,8 @@ Then perform the following commands on the root folder:
 | master\_authorized\_networks | List of master authorized networks. If none are provided, disallow external access (except the cluster node IPs, which GKE automatically whitelists). | `list(object({ cidr_block = string, display_name = string }))` | `[]` | no |
 | master\_global\_access\_enabled | Whether the cluster master is accessible globally (from any region) or only within the same region as the private endpoint. | `bool` | `true` | no |
 | master\_ipv4\_cidr\_block | (Beta) The IP range in CIDR notation to use for the hosted master network | `string` | `"10.0.0.0/28"` | no |
+| max\_surge | The number of additional nodes that can be added to the node pool during an upgrade. Increasing max\_surge raises the number of nodes that can be upgraded simultaneously. Can be set to 0 or greater (Optional) | `number` | `null` | no |
+| max\_unavailable | The number of nodes that can be simultaneously unavailable during an upgrade. Increasing max\_unavailable raises the number of nodes that can be upgraded in parallel. Can be set to 0 or greater (Optional) | `number` | `null` | no |
 | monitoring\_enable\_managed\_prometheus | Configuration for Managed Service for Prometheus. Whether or not the managed collection is enabled. | `bool` | `false` | no |
 | monitoring\_enabled\_components | List of services to monitor: SYSTEM\_COMPONENTS, WORKLOADS (provider version >= 3.89.0). Empty list is default GKE configuration. | `list(string)` | `[]` | no |
 | monitoring\_service | The monitoring service that the cluster should write metrics to. Automatically send metrics from pods in the cluster to the Google Cloud Monitoring API. VM metrics will be collected by Google Compute Engine regardless of this setting Available options include monitoring.googleapis.com, monitoring.googleapis.com/kubernetes (beta) and none | `string` | `"monitoring.googleapis.com/kubernetes"` | no |
@@ -236,6 +241,7 @@ Then perform the following commands on the root folder:
 | network\_policy\_provider | The network policy provider. | `string` | `"CALICO"` | no |
 | network\_project\_id | The project ID of the shared VPC's host (for shared vpc support) | `string` | `""` | no |
 | node\_metadata | Specifies how node metadata is exposed to the workload running on the node | `string` | `"GKE_METADATA"` | no |
+| node\_pool\_soak\_duration | Time needed after draining the entire blue pool. After this period, the blue pool will be cleaned up (Optional) | `string` | `"3600s"` | no |
 | node\_pools | List of maps containing node pools | `list(map(any))` | <pre>[<br> {<br> "name": "default-node-pool"<br> }<br>]</pre> | no |
 | node\_pools\_labels | Map of maps containing node labels by node-pool name | `map(map(string))` | <pre>{<br> "all": {},<br> "default-node-pool": {}<br>}</pre> | no |
 | node\_pools\_linux\_node\_configs\_sysctls | Map of maps containing linux node config sysctls by node-pool name | `map(map(string))` | <pre>{<br> "all": {},<br> "default-node-pool": {}<br>}</pre> | no |
@@ -259,6 +265,7 @@ Then perform the following commands on the root folder:
 | shadow\_firewall\_rules\_log\_config | The log\_config for shadow firewall rules. You can set this variable to `null` to disable logging. | <pre>object({<br> metadata = string<br> })</pre> | <pre>{<br> "metadata": "INCLUDE_ALL_METADATA"<br>}</pre> | no |
 | shadow\_firewall\_rules\_priority | The firewall priority of GKE shadow firewall rules. The priority should be less than default firewall, which is 1000. | `number` | `999` | no |
 | skip\_provisioners | Flag to skip all local-exec provisioners. It breaks `stub_domains` and `upstream_nameservers` variables functionality. | `bool` | `false` | no |
+| strategy | The upgrade strategy to be used for upgrading the nodes. Valid values are: `SURGE`; `BLUE_GREEN`. By default strategy is `SURGE` (Optional) | `string` | `"SURGE"` | no |
 | stub\_domains | Map of stub domains and their resolvers to forward DNS queries for a certain domain to an external DNS server | `map(list(string))` | `{}` | no |
 | subnetwork | The subnetwork to host the cluster in (required) | `string` | n/a | yes |
 | timeouts | Timeout for cluster operations. | `map(string)` | `{}` | no |
@@ -341,8 +348,13 @@ The node_pools variable takes the following parameters:
 | min_cpu_platform | Minimum CPU platform to be used by the nodes in the pool. The nodes may be scheduled on the specified or newer CPU platform. | " " | Optional |
 | max_count | Maximum number of nodes in the NodePool. Must be >= min_count | 100 | Optional |
 | max_pods_per_node | The maximum number of pods per node in this cluster | null | Optional |
+| strategy | The upgrade strategy to be used for upgrading the nodes. Valid values are: `SURGE` or `BLUE_GREEN` | "SURGE" | Optional |
 | max_surge | The number of additional nodes that can be added to the node pool during an upgrade. Increasing max_surge raises the number of nodes that can be upgraded simultaneously. Can be set to 0 or greater. | 1 | Optional |
 | max_unavailable | The number of nodes that can be simultaneously unavailable during an upgrade. Increasing max_unavailable raises the number of nodes that can be upgraded in parallel. Can be set to 0 or greater. | 0 | Optional |
+| node_pool_soak_duration | Time needed after draining the entire blue pool. After this period, the blue pool will be cleaned up. By default, it is set to one hour (3600 seconds). The maximum length of the soak time is 7 days (604,800 seconds). | "3600s" | Optional |
+| batch_soak_duration | Soak time after each batch gets drained, with the default being zero seconds. | "0s" | Optional |
+| batch_node_count | Absolute number of nodes to drain in a batch. If it is set to zero, this phase will be skipped. | null | Optional |
+| batch_percentage | Percentage of nodes to drain in a batch. Must be in the range of [0.0, 1.0]. If it is set to zero, this phase will be skipped. | null | Optional |
 | min_count | Minimum number of nodes in the NodePool. Must be >=0 and <= max_count. Should be used when autoscaling is true | 1 | Optional |
 | name | The name of the node pool | | Required |
 | placement_policy | Placement type to set for nodes in a node pool. Can be set as [COMPACT](https://cloud.google.com/kubernetes-engine/docs/how-to/compact-placement#overview) if desired | | Optional |
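
As a rough sketch of how the new module-level input might be combined with per-pool overrides, the call below sets `strategy` for the whole module and overrides it on one pool. The registry source path, the required inputs other than `subnetwork`, and every value shown are assumptions for illustration; check the module's own inputs table before relying on them.

```hcl
# Hypothetical module call; all names and values are placeholders.
module "gke" {
  # Assumed registry path for the submodule whose README is shown above.
  source = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster-update-variant"

  project_id        = "example-project" # placeholder
  name              = "example-cluster" # placeholder
  region            = "us-central1"     # placeholder
  network           = "example-vpc"     # placeholder
  subnetwork        = "example-subnet"  # placeholder
  ip_range_pods     = "example-pods"    # placeholder
  ip_range_services = "example-svc"     # placeholder

  # Module-wide default: pools without their own "strategy" key use blue/green.
  strategy = "BLUE_GREEN"

  node_pools = [
    {
      # Inherits BLUE_GREEN from var.strategy.
      name = "default-node-pool"
    },
    {
      # Per-pool override back to surge upgrades.
      name            = "surge-pool"
      strategy        = "SURGE"
      max_surge       = 2
      max_unavailable = 0
    },
  ]
}
```

In the `cluster.tf` hunks of this commit, only `var.strategy` is referenced as a fallback; the other settings fall back to literals inside `lookup`. On that reading, values such as the soak durations need to be set per pool in `node_pools` rather than through the module-level variables of the same name.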
