
Commit 0ed57d6

feat: add blue/green upgrade strategy settings
1 parent e9a72cf commit 0ed57d6

21 files changed, +638 -50 lines changed

README.md

Lines changed: 5 additions & 0 deletions
```diff
@@ -276,8 +276,13 @@ The node_pools variable takes the following parameters:
 | min_cpu_platform | Minimum CPU platform to be used by the nodes in the pool. The nodes may be scheduled on the specified or newer CPU platform. | " " | Optional |
 | max_count | Maximum number of nodes in the NodePool. Must be >= min_count | 100 | Optional |
 | max_pods_per_node | The maximum number of pods per node in this cluster | null | Optional |
+| strategy | The upgrade strategy to be used for upgrading the nodes. Valid values are `SURGE` or `BLUE_GREEN`. | "SURGE" | Optional |
 | max_surge | The number of additional nodes that can be added to the node pool during an upgrade. Increasing max_surge raises the number of nodes that can be upgraded simultaneously. Can be set to 0 or greater. | 1 | Optional |
 | max_unavailable | The number of nodes that can be simultaneously unavailable during an upgrade. Increasing max_unavailable raises the number of nodes that can be upgraded in parallel. Can be set to 0 or greater. | 0 | Optional |
+| node_pool_soak_duration | Soak time after the entire blue pool has been drained. After this period, the blue pool will be cleaned up. Defaults to one hour (3600 seconds); the maximum is 7 days (604,800 seconds). Only used when `strategy` is `BLUE_GREEN`. | "3600s" | Optional |
+| batch_soak_duration | Soak time after each batch gets drained. Defaults to zero seconds. Only used when `strategy` is `BLUE_GREEN`. | "0s" | Optional |
+| batch_node_count | Absolute number of nodes to drain in a batch. If set to zero, this phase is skipped. Only used when `strategy` is `BLUE_GREEN`; mutually exclusive with `batch_percentage`. | null | Optional |
+| batch_percentage | Percentage of nodes to drain in a batch, expressed as a fraction in the range [0.0, 1.0]. If set to zero, this phase is skipped. Only used when `strategy` is `BLUE_GREEN`; mutually exclusive with `batch_node_count`. | null | Optional |
 | min_count | Minimum number of nodes in the NodePool. Must be >=0 and <= max_count. Should be used when autoscaling is true | 1 | Optional |
 | name | The name of the node pool | | Required |
 | node_count | The number of nodes in the nodepool when autoscaling is false. Otherwise defaults to 1. Only valid for non-autoscaling clusters | | Required |
```
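As a hedged illustration of the new keys, a blue/green entry in `node_pools` might look like the fragment below. The pool name and all values are placeholders, not recommendations. Only `batch_percentage` is set, because GKE treats `batch_percentage` and `batch_node_count` as alternative ways of sizing a batch.

```hcl
node_pools = [
  {
    # Hypothetical pool name used for illustration.
    name     = "blue-green-pool"
    strategy = "BLUE_GREEN"

    # Keep the fully drained blue pool for 30 minutes before it is cleaned up.
    node_pool_soak_duration = "1800s"

    # Drain the blue pool a quarter at a time, soaking 5 minutes after each batch.
    batch_percentage    = 0.25
    batch_soak_duration = "300s"
  },
]
```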

autogen/main/README.md

Lines changed: 5 additions & 0 deletions
```diff
@@ -212,8 +212,13 @@ The node_pools variable takes the following parameters:
 | min_cpu_platform | Minimum CPU platform to be used by the nodes in the pool. The nodes may be scheduled on the specified or newer CPU platform. | " " | Optional |
 | max_count | Maximum number of nodes in the NodePool. Must be >= min_count | 100 | Optional |
 | max_pods_per_node | The maximum number of pods per node in this cluster | null | Optional |
+| strategy | The upgrade strategy to be used for upgrading the nodes. Valid values are `SURGE` or `BLUE_GREEN`. | "SURGE" | Optional |
 | max_surge | The number of additional nodes that can be added to the node pool during an upgrade. Increasing max_surge raises the number of nodes that can be upgraded simultaneously. Can be set to 0 or greater. | 1 | Optional |
 | max_unavailable | The number of nodes that can be simultaneously unavailable during an upgrade. Increasing max_unavailable raises the number of nodes that can be upgraded in parallel. Can be set to 0 or greater. | 0 | Optional |
+| node_pool_soak_duration | Soak time after the entire blue pool has been drained. After this period, the blue pool will be cleaned up. Defaults to one hour (3600 seconds); the maximum is 7 days (604,800 seconds). Only used when `strategy` is `BLUE_GREEN`. | "3600s" | Optional |
+| batch_soak_duration | Soak time after each batch gets drained. Defaults to zero seconds. Only used when `strategy` is `BLUE_GREEN`. | "0s" | Optional |
+| batch_node_count | Absolute number of nodes to drain in a batch. If set to zero, this phase is skipped. Only used when `strategy` is `BLUE_GREEN`; mutually exclusive with `batch_percentage`. | null | Optional |
+| batch_percentage | Percentage of nodes to drain in a batch, expressed as a fraction in the range [0.0, 1.0]. If set to zero, this phase is skipped. Only used when `strategy` is `BLUE_GREEN`; mutually exclusive with `batch_node_count`. | null | Optional |
 | min_count | Minimum number of nodes in the NodePool. Must be >=0 and <= max_count. Should be used when autoscaling is true | 1 | Optional |
 | name | The name of the node pool | | Required |
 {% if beta_cluster %}
```
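For comparison, a pool that keeps the `SURGE` strategy but upgrades more nodes at once could be sketched as follows (placeholder name and values). Because the module only reads the blue/green keys when a pool's effective strategy is `BLUE_GREEN`, keys such as `batch_node_count` would be ignored for this pool.

```hcl
node_pools = [
  {
    # Hypothetical pool name used for illustration.
    name     = "surge-pool"
    strategy = "SURGE"

    # Allow up to 3 extra nodes to be created during the upgrade...
    max_surge = 3
    # ...and at most 1 node to be unavailable at the same time.
    max_unavailable = 1
  },
]
```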

autogen/main/cluster.tf.tmpl

Lines changed: 24 additions & 3 deletions
```diff
@@ -684,9 +684,30 @@ resource "google_container_node_pool" "windows_pools" {
     auto_upgrade = lookup(each.value, "auto_upgrade", local.default_auto_upgrade)
   }
 
-  upgrade_settings {
-    max_surge       = lookup(each.value, "max_surge", 1)
-    max_unavailable = lookup(each.value, "max_unavailable", 0)
+  dynamic "upgrade_settings" {
+    for_each = lookup(each.value, "strategy", var.strategy) == "SURGE" ? [each.value] : []
+    content {
+      strategy        = lookup(each.value, "strategy", "SURGE")
+      max_surge       = lookup(each.value, "max_surge", 1)
+      max_unavailable = lookup(each.value, "max_unavailable", 0)
+    }
+  }
+
+  dynamic "upgrade_settings" {
+    for_each = lookup(each.value, "strategy", var.strategy) == "BLUE_GREEN" ? [each.value] : []
+    content {
+      strategy = lookup(each.value, "strategy", "BLUE_GREEN")
+
+      blue_green_settings {
+        node_pool_soak_duration = lookup(each.value, "node_pool_soak_duration", "3600s")
+
+        standard_rollout_policy {
+          batch_soak_duration = lookup(each.value, "batch_soak_duration", "60s")
+          batch_percentage    = lookup(each.value, "batch_percentage", null)
+          batch_node_count    = lookup(each.value, "batch_node_count", null)
+        }
+      }
+    }
   }
 
   node_config {
```
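To see how the two `dynamic "upgrade_settings"` blocks interact, consider a hypothetical pool entry `{ name = "bg-pool", strategy = "BLUE_GREEN", batch_node_count = 2, batch_soak_duration = "120s" }`. The `for_each` of the `SURGE` block evaluates to an empty list, so only the `BLUE_GREEN` block renders. The result should be roughly equivalent to writing the following by hand; `batch_percentage` resolves to `null` and is therefore left unset.

```hcl
  upgrade_settings {
    strategy = "BLUE_GREEN"

    blue_green_settings {
      # Not set on the pool, so the lookup falls back to "3600s".
      node_pool_soak_duration = "3600s"

      standard_rollout_policy {
        # Both values come straight from the hypothetical pool entry.
        batch_soak_duration = "120s"
        batch_node_count    = 2
      }
    }
  }
```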

autogen/main/variables.tf.tmpl

Lines changed: 42 additions & 1 deletion
```diff
@@ -727,7 +727,6 @@ variable "enable_pod_security_policy" {
   default     = false
 }
 
-
 variable "enable_l4_ilb_subsetting" {
   type        = bool
   description = "Enable L4 ILB Subsetting on the cluster"
@@ -751,5 +750,47 @@ variable "enable_identity_service" {
   description = "Enable the Identity Service component, which allows customers to use external identity providers with the K8S API."
   default     = false
 }
+
+variable "strategy" {
+  type        = string
+  description = "The upgrade strategy to be used for upgrading the nodes. Valid values are `SURGE` or `BLUE_GREEN`. By default, the strategy is `SURGE` (Optional)"
+  default     = "SURGE"
+}
+
+variable "max_surge" {
+  type        = number
+  description = "The number of additional nodes that can be added to the node pool during an upgrade. Increasing max_surge raises the number of nodes that can be upgraded simultaneously. Can be set to 0 or greater (Optional)"
+  default     = null
+}
+
+variable "max_unavailable" {
+  type        = number
+  description = "The number of nodes that can be simultaneously unavailable during an upgrade. Increasing max_unavailable raises the number of nodes that can be upgraded in parallel. Can be set to 0 or greater (Optional)"
+  default     = null
+}
+
+variable "node_pool_soak_duration" {
+  type        = string
+  description = "Soak time after the entire blue pool has been drained. After this period, the blue pool will be cleaned up (Optional)"
+  default     = "3600s"
+}
+
+variable "batch_soak_duration" {
+  type        = string
+  description = "Soak time after each batch gets drained (Optional)"
+  default     = "0s"
+}
+
+variable "batch_percentage" {
+  type        = string
+  description = "Percentage of the blue pool nodes to drain in a batch (Optional)"
+  default     = null
+}
+
+variable "batch_node_count" {
+  type        = number
+  description = "The number of blue nodes to drain in a batch (Optional)"
+  default     = null
+}
 {% endif %}
 {% endif %}
```
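The `strategy` description lists only two valid values, but nothing in the variable rejects others; a value such as `"ROLLING"` would simply match neither dynamic block, so no `upgrade_settings` would be rendered. A `validation` block could enforce the documented values. The sketch below is a possible extension, not part of this commit, and it only checks the module-level variable; per-pool `strategy` keys inside `node_pools` would still need a separate check.

```hcl
variable "strategy" {
  type        = string
  description = "The upgrade strategy to be used for upgrading the nodes. Valid values are `SURGE` or `BLUE_GREEN`. By default, the strategy is `SURGE` (Optional)"
  default     = "SURGE"

  # Possible addition: fail at plan time on anything other than the two documented strategies.
  validation {
    condition     = contains(["SURGE", "BLUE_GREEN"], var.strategy)
    error_message = "The strategy must be either \"SURGE\" or \"BLUE_GREEN\"."
  }
}
```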

cluster.tf

Lines changed: 48 additions & 6 deletions
```diff
@@ -405,9 +405,30 @@ resource "google_container_node_pool" "pools" {
     auto_upgrade = lookup(each.value, "auto_upgrade", local.default_auto_upgrade)
   }
 
-  upgrade_settings {
-    max_surge       = lookup(each.value, "max_surge", 1)
-    max_unavailable = lookup(each.value, "max_unavailable", 0)
+  dynamic "upgrade_settings" {
+    for_each = lookup(each.value, "strategy", var.strategy) == "SURGE" ? [each.value] : []
+    content {
+      strategy        = lookup(each.value, "strategy", "SURGE")
+      max_surge       = lookup(each.value, "max_surge", 1)
+      max_unavailable = lookup(each.value, "max_unavailable", 0)
+    }
+  }
+
+  dynamic "upgrade_settings" {
+    for_each = lookup(each.value, "strategy", var.strategy) == "BLUE_GREEN" ? [each.value] : []
+    content {
+      strategy = lookup(each.value, "strategy", "BLUE_GREEN")
+
+      blue_green_settings {
+        node_pool_soak_duration = lookup(each.value, "node_pool_soak_duration", "3600s")
+
+        standard_rollout_policy {
+          batch_soak_duration = lookup(each.value, "batch_soak_duration", "60s")
+          batch_percentage    = lookup(each.value, "batch_percentage", null)
+          batch_node_count    = lookup(each.value, "batch_node_count", null)
+        }
+      }
+    }
   }
 
   node_config {
@@ -577,9 +598,30 @@ resource "google_container_node_pool" "windows_pools" {
     auto_upgrade = lookup(each.value, "auto_upgrade", local.default_auto_upgrade)
   }
 
-  upgrade_settings {
-    max_surge       = lookup(each.value, "max_surge", 1)
-    max_unavailable = lookup(each.value, "max_unavailable", 0)
+  dynamic "upgrade_settings" {
+    for_each = lookup(each.value, "strategy", var.strategy) == "SURGE" ? [each.value] : []
+    content {
+      strategy        = lookup(each.value, "strategy", "SURGE")
+      max_surge       = lookup(each.value, "max_surge", 1)
+      max_unavailable = lookup(each.value, "max_unavailable", 0)
+    }
+  }
+
+  dynamic "upgrade_settings" {
+    for_each = lookup(each.value, "strategy", var.strategy) == "BLUE_GREEN" ? [each.value] : []
+    content {
+      strategy = lookup(each.value, "strategy", "BLUE_GREEN")
+
+      blue_green_settings {
+        node_pool_soak_duration = lookup(each.value, "node_pool_soak_duration", "3600s")
+
+        standard_rollout_policy {
+          batch_soak_duration = lookup(each.value, "batch_soak_duration", "60s")
+          batch_percentage    = lookup(each.value, "batch_percentage", null)
+          batch_node_count    = lookup(each.value, "batch_node_count", null)
+        }
+      }
+    }
   }
 
   node_config {
```
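Conversely, for a pool entry that sets none of the new keys (for example `{ name = "default-node-pool" }`), and assuming the module-level `strategy` variable keeps its `"SURGE"` default, only the first dynamic block renders. The result should match the previous static `upgrade_settings` block, plus an explicit `strategy`:

```hcl
  upgrade_settings {
    strategy        = "SURGE"
    max_surge       = 1 # lookup fallback, the pool sets no max_surge
    max_unavailable = 0 # lookup fallback, the pool sets no max_unavailable
  }
```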

modules/beta-private-cluster-update-variant/README.md

Lines changed: 12 additions & 0 deletions
```diff
@@ -163,6 +163,9 @@ Then perform the following commands on the root folder:
 | add\_master\_webhook\_firewall\_rules | Create master\_webhook firewall rules for ports defined in `firewall_inbound_ports` | `bool` | `false` | no |
 | add\_shadow\_firewall\_rules | Create GKE shadow firewall (the same as default firewall rules with firewall logs enabled). | `bool` | `false` | no |
 | authenticator\_security\_group | The name of the RBAC security group for use with Google security groups in Kubernetes RBAC. Group name must be in format [email protected] | `string` | `null` | no |
+| batch\_node\_count | The number of blue nodes to drain in a batch (Optional) | `number` | `null` | no |
+| batch\_percentage | Percentage of the blue pool nodes to drain in a batch (Optional) | `string` | `null` | no |
+| batch\_soak\_duration | Soak time after each batch gets drained (Optional) | `string` | `"0s"` | no |
 | cloudrun | (Beta) Enable CloudRun addon | `bool` | `false` | no |
 | cloudrun\_load\_balancer\_type | (Beta) Configure the Cloud Run load balancer type. External by default. Set to `LOAD_BALANCER_TYPE_INTERNAL` to configure as an internal load balancer. | `string` | `""` | no |
 | cluster\_autoscaling | Cluster autoscaling configuration. See [more details](https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#clusterautoscaling) | <pre>object({<br> enabled = bool<br> autoscaling_profile = string<br> min_cpu_cores = number<br> max_cpu_cores = number<br> min_memory_gb = number<br> max_memory_gb = number<br> gpu_resources = list(object({ resource_type = string, minimum = number, maximum = number }))<br> auto_repair = bool<br> auto_upgrade = bool<br> })</pre> | <pre>{<br> "auto_repair": true,<br> "auto_upgrade": true,<br> "autoscaling_profile": "BALANCED",<br> "enabled": false,<br> "gpu_resources": [],<br> "max_cpu_cores": 0,<br> "max_memory_gb": 0,<br> "min_cpu_cores": 0,<br> "min_memory_gb": 0<br>}</pre> | no |
@@ -227,6 +230,8 @@ Then perform the following commands on the root folder:
 | master\_authorized\_networks | List of master authorized networks. If none are provided, disallow external access (except the cluster node IPs, which GKE automatically whitelists). | `list(object({ cidr_block = string, display_name = string }))` | `[]` | no |
 | master\_global\_access\_enabled | Whether the cluster master is accessible globally (from any region) or only within the same region as the private endpoint. | `bool` | `true` | no |
 | master\_ipv4\_cidr\_block | (Beta) The IP range in CIDR notation to use for the hosted master network | `string` | `"10.0.0.0/28"` | no |
+| max\_surge | The number of additional nodes that can be added to the node pool during an upgrade. Increasing max\_surge raises the number of nodes that can be upgraded simultaneously. Can be set to 0 or greater (Optional) | `number` | `null` | no |
+| max\_unavailable | The number of nodes that can be simultaneously unavailable during an upgrade. Increasing max\_unavailable raises the number of nodes that can be upgraded in parallel. Can be set to 0 or greater (Optional) | `number` | `null` | no |
 | monitoring\_enable\_managed\_prometheus | Configuration for Managed Service for Prometheus. Whether or not the managed collection is enabled. | `bool` | `false` | no |
 | monitoring\_enabled\_components | List of services to monitor: SYSTEM\_COMPONENTS, WORKLOADS (provider version >= 3.89.0). Empty list is default GKE configuration. | `list(string)` | `[]` | no |
 | monitoring\_service | The monitoring service that the cluster should write metrics to. Automatically send metrics from pods in the cluster to the Google Cloud Monitoring API. VM metrics will be collected by Google Compute Engine regardless of this setting Available options include monitoring.googleapis.com, monitoring.googleapis.com/kubernetes (beta) and none | `string` | `"monitoring.googleapis.com/kubernetes"` | no |
@@ -236,6 +241,7 @@ Then perform the following commands on the root folder:
 | network\_policy\_provider | The network policy provider. | `string` | `"CALICO"` | no |
 | network\_project\_id | The project ID of the shared VPC's host (for shared vpc support) | `string` | `""` | no |
 | node\_metadata | Specifies how node metadata is exposed to the workload running on the node | `string` | `"GKE_METADATA"` | no |
+| node\_pool\_soak\_duration | Soak time after the entire blue pool has been drained. After this period, the blue pool will be cleaned up (Optional) | `string` | `"3600s"` | no |
 | node\_pools | List of maps containing node pools | `list(map(any))` | <pre>[<br> {<br> "name": "default-node-pool"<br> }<br>]</pre> | no |
 | node\_pools\_labels | Map of maps containing node labels by node-pool name | `map(map(string))` | <pre>{<br> "all": {},<br> "default-node-pool": {}<br>}</pre> | no |
 | node\_pools\_linux\_node\_configs\_sysctls | Map of maps containing linux node config sysctls by node-pool name | `map(map(string))` | <pre>{<br> "all": {},<br> "default-node-pool": {}<br>}</pre> | no |
@@ -259,6 +265,7 @@ Then perform the following commands on the root folder:
 | shadow\_firewall\_rules\_log\_config | The log\_config for shadow firewall rules. You can set this variable to `null` to disable logging. | <pre>object({<br> metadata = string<br> })</pre> | <pre>{<br> "metadata": "INCLUDE_ALL_METADATA"<br>}</pre> | no |
 | shadow\_firewall\_rules\_priority | The firewall priority of GKE shadow firewall rules. The priority should be less than default firewall, which is 1000. | `number` | `999` | no |
 | skip\_provisioners | Flag to skip all local-exec provisioners. It breaks `stub_domains` and `upstream_nameservers` variables functionality. | `bool` | `false` | no |
+| strategy | The upgrade strategy to be used for upgrading the nodes. Valid values are `SURGE` or `BLUE_GREEN`. By default, the strategy is `SURGE` (Optional) | `string` | `"SURGE"` | no |
 | stub\_domains | Map of stub domains and their resolvers to forward DNS queries for a certain domain to an external DNS server | `map(list(string))` | `{}` | no |
 | subnetwork | The subnetwork to host the cluster in (required) | `string` | n/a | yes |
 | timeouts | Timeout for cluster operations. | `map(string)` | `{}` | no |
@@ -341,8 +348,13 @@ The node_pools variable takes the following parameters:
 | min_cpu_platform | Minimum CPU platform to be used by the nodes in the pool. The nodes may be scheduled on the specified or newer CPU platform. | " " | Optional |
 | max_count | Maximum number of nodes in the NodePool. Must be >= min_count | 100 | Optional |
 | max_pods_per_node | The maximum number of pods per node in this cluster | null | Optional |
+| strategy | The upgrade strategy to be used for upgrading the nodes. Valid values are `SURGE` or `BLUE_GREEN`. | "SURGE" | Optional |
 | max_surge | The number of additional nodes that can be added to the node pool during an upgrade. Increasing max_surge raises the number of nodes that can be upgraded simultaneously. Can be set to 0 or greater. | 1 | Optional |
 | max_unavailable | The number of nodes that can be simultaneously unavailable during an upgrade. Increasing max_unavailable raises the number of nodes that can be upgraded in parallel. Can be set to 0 or greater. | 0 | Optional |
+| node_pool_soak_duration | Soak time after the entire blue pool has been drained. After this period, the blue pool will be cleaned up. Defaults to one hour (3600 seconds); the maximum is 7 days (604,800 seconds). Only used when `strategy` is `BLUE_GREEN`. | "3600s" | Optional |
+| batch_soak_duration | Soak time after each batch gets drained. Defaults to zero seconds. Only used when `strategy` is `BLUE_GREEN`. | "0s" | Optional |
+| batch_node_count | Absolute number of nodes to drain in a batch. If set to zero, this phase is skipped. Only used when `strategy` is `BLUE_GREEN`; mutually exclusive with `batch_percentage`. | null | Optional |
+| batch_percentage | Percentage of nodes to drain in a batch, expressed as a fraction in the range [0.0, 1.0]. If set to zero, this phase is skipped. Only used when `strategy` is `BLUE_GREEN`; mutually exclusive with `batch_node_count`. | null | Optional |
 | min_count | Minimum number of nodes in the NodePool. Must be >=0 and <= max_count. Should be used when autoscaling is true | 1 | Optional |
 | name | The name of the node pool | | Required |
 | placement_policy | Placement type to set for nodes in a node pool. Can be set as [COMPACT](https://cloud.google.com/kubernetes-engine/docs/how-to/compact-placement#overview) if desired | Optional |
```
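At the module level, the new `strategy` input acts as the fallback for pools that do not set their own `strategy` key, since the code reads it through `lookup(each.value, "strategy", var.strategy)`. A hedged sketch of a call to this submodule is shown below. The registry path, project, and network values are placeholders assumed for illustration, and the required inputs listed may not be exhaustive.

```hcl
module "gke" {
  # Assumed registry path for this submodule; pin a version in real configurations.
  source = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster-update-variant"

  # Placeholder values for illustration only.
  project_id        = "my-project-id"
  name              = "example-cluster"
  region            = "us-central1"
  network           = "example-network"
  subnetwork        = "example-subnetwork"
  ip_range_pods     = "example-pods-range"
  ip_range_services = "example-services-range"

  # Fallback strategy for every pool that does not set its own "strategy" key.
  strategy = "BLUE_GREEN"

  node_pools = [
    {
      # No "strategy" key, so this pool inherits BLUE_GREEN from the module input.
      name             = "bg-pool"
      batch_node_count = 1
    },
    {
      # Overrides the module-level fallback for this pool only.
      name      = "surge-pool"
      strategy  = "SURGE"
      max_surge = 2
    },
  ]
}
```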
