---
title: "Recovering from Drift"
description: "Recovering from Drift"
lead: "How ACK controllers detect and remediate resource drift"
draft: false
menu:
  docs:
    parent: "getting-started"
weight: 55
toc: true
---

Kubernetes controllers work on the principle of [constant
reconciliation][constant-reconciliation]. In essence, they continuously compare
the desired state of the system with its actual state and use the difference to
determine which actions are required to reach the desired end result.
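
The comparison at the heart of that loop can be sketched in a few lines of Python. This is a hypothetical model, not code from ACK or Kubernetes. State is a flat dictionary mapping field names to values, `None` stands for a resource that does not exist yet, and the action names and field names are illustrative only.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical model: a resource's state is a flat dict of field -> value.
State = Dict[str, Any]


def diff(desired: State, actual: State) -> State:
    """Return the desired fields whose actual value differs (or is missing)."""
    return {k: v for k, v in desired.items() if actual.get(k) != v}


def reconcile(desired: State, actual: Optional[State]) -> Tuple[str, State]:
    """Decide which action moves the actual state toward the desired state."""
    if actual is None:
        # Nothing exists yet: create it with the full desired state.
        return "create", dict(desired)
    changes = diff(desired, actual)
    if changes:
        # Only the differing fields need to be updated.
        return "update", changes
    # The actual state already meets the specification.
    return "none", {}


desired = {"versioning": "Enabled", "acl": "private"}
print(reconcile(desired, None))
print(reconcile(desired, {"versioning": "Suspended", "acl": "private"}))
print(reconcile(desired, dict(desired)))
```

The three calls return a `create` action with the full desired state, an `update` action containing only `versioning`, and no action at all. Fields that exist only in the actual state are ignored by this sketch.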

Once a controller has reconciled a resource to its desired state, it shouldn't
need to keep reconciling it: the actual state of the resource already meets the
specification. However, this only holds in a closed system, where the controller
is the only actor interacting with a resource. ACK controllers don't operate in a
closed system. They are not the only actors capable of modifying AWS resources;
other programs, or even people, may have their own permissions to do so. When
another actor modifies a resource after the ACK controller has reconciled it to
its desired state, that change is called "drift".

ACK controllers detect drift by continuing to reconcile resources after they
have reached their desired state, but with much longer delays between
reconciliation attempts. By default, all ACK controllers attempt to detect drift
once every **10 hours**. That is, every 10 hours after a resource has been
marked with the `ResourceSynced = true` condition, its owner controller
describes the resource in AWS to check whether it still matches the desired
state. If the controller detects a difference, it starts the reconciliation loop
again to return the resource to that state, just as it would after any other
change.
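
The drift-detection schedule described above can be sketched in Python as follows. This is a hypothetical model, not ACK's implementation. Timestamps are plain numbers of seconds, `describe` is a stand-in callback for the controller's call to the AWS API, and the function names and return values are illustrative.

```python
from typing import Any, Callable, Dict

# The documented default drift-detection period: 10 hours, in seconds.
DEFAULT_RESYNC_PERIOD_SECONDS = 10 * 60 * 60

# Hypothetical model: a resource's state is a flat dict of field -> value.
State = Dict[str, Any]


def is_due_for_drift_check(synced_at: float, now: float,
                           period: float = DEFAULT_RESYNC_PERIOD_SECONDS) -> bool:
    """True once `period` seconds have passed since the resource was last synced."""
    return now - synced_at >= period


def has_drifted(desired: State, describe: Callable[[], State]) -> bool:
    """Describe the resource and report whether any desired field no longer matches."""
    actual = describe()  # stand-in for the controller's AWS "describe" call
    return any(actual.get(field) != value for field, value in desired.items())


def resync(desired: State, describe: Callable[[], State],
           synced_at: float, now: float,
           period: float = DEFAULT_RESYNC_PERIOD_SECONDS) -> str:
    """Return "wait", "in-sync", or "reconcile" for a resource marked as synced."""
    if not is_due_for_drift_check(synced_at, now, period):
        return "wait"  # too early: the next drift check is not due yet
    if has_drifted(desired, describe):
        return "reconcile"  # drift found: restart the reconciliation loop
    return "in-sync"  # no drift found


def describe_drifted() -> State:
    # Simulates another actor having changed the resource outside Kubernetes.
    return {"versioning": "Suspended"}


desired = {"versioning": "Enabled"}
print(resync(desired, describe_drifted, synced_at=0, now=3_600))   # wait
print(resync(desired, describe_drifted, synced_at=0, now=36_000))  # reconcile
```

One hour after the last sync, no check is due yet. After 10 hours, the describe call reveals the out-of-band change and the resource is reconciled again.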

{{% hint type="info" title="Existing resource overrides" %}}
Some resources require more frequent drift remediation. Examples include
resources that run a stateful workload whose status changes frequently, such as
a SageMaker `TrainingJob`. For these resources, the controller authors may
already have shortened the drift remediation period to make the resource's
`Status` more responsive.

All override periods are logged to stdout when the controller starts.
{{% /hint %}}

## Overriding the drift remediation period

### For all resources owned by a controller

If you would like to decrease the drift remediation period for *all* resources
owned by a controller, set the `reconcile.defaultResyncPeriod` value in the Helm
chart's `values.yaml` file to the new period, in seconds:
| 56 | + |
| 57 | +```yaml |
| 58 | +reconcile: |
| 59 | + defaultResyncPeriod: 1800 # 30 minutes (in seconds) |
| 60 | +``` |

### For a single resource type

The most granular way to set the drift remediation period is per resource type.
The period then applies to all resources of that type managed by a single
controller, for example all S3 `Bucket` resources.

Add the resource kind and its overriding period (in seconds) to the
`reconcile.resourceResyncPeriods` value in the Helm chart's `values.yaml` file:

```yaml
reconcile:
  resourceResyncPeriods:
    Bucket: 1800 # 30 minutes (in seconds)
```
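
The way the two settings combine can be sketched as a small Python function that picks the period for a resource kind from the `reconcile` section of `values.yaml`, represented here as an already-parsed dictionary. The sketch assumes that a per-type entry takes precedence over `defaultResyncPeriod`, as the "most granular" wording suggests. When neither value is set, it falls back to the documented 10-hour default. The helper name is hypothetical, and the sketch does not model overrides chosen by the controller authors.

```python
from typing import Any, Mapping

# The documented default drift-detection period: 10 hours, in seconds.
DEFAULT_RESYNC_PERIOD_SECONDS = 10 * 60 * 60


def resync_period_for(kind: str, values: Mapping[str, Any]) -> int:
    """Hypothetical helper: resolve the drift remediation period (seconds)
    for a resource kind from parsed Helm values."""
    reconcile = values.get("reconcile") or {}
    per_kind = reconcile.get("resourceResyncPeriods") or {}
    if kind in per_kind:
        # Assumed precedence: a per-type period wins over the controller default.
        return int(per_kind[kind])
    return int(reconcile.get("defaultResyncPeriod", DEFAULT_RESYNC_PERIOD_SECONDS))


values = {"reconcile": {"defaultResyncPeriod": 1800,
                        "resourceResyncPeriods": {"Bucket": 600}}}
print(resync_period_for("Bucket", values))         # 600
print(resync_period_for("SomeOtherKind", values))  # 1800
print(resync_period_for("Bucket", {}))             # 36000
```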

[constant-reconciliation]: https://book.kubebuilder.io/cronjob-tutorial/controller-overview.html#whats-in-a-controller