Commit 04268d9

Add drift remediation configuration document (#1811)
Issue #, if available: Closes #1810

Description of changes: Adds a new document to the "Getting Started" tab of the documentation which outlines the definition and configuration options for ACK drift remediation. Also adds a link to it from the "How it Works" page.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
1 parent 587161b commit 04268d9

2 files changed: +86 -4 lines changed
docs/content/docs/community/how-it-works.md

Lines changed: 8 additions & 4 deletions
@@ -56,10 +56,13 @@ information it received from S3.
 ## Drift Detection and Remediation
 
 There are times where a resource that an ACK service controller is managing is
-modified outside of ACK, e.g. through the AWS CLI or the console. Every 10 hours,
-an ACK service controller will look for any drift and attempt to remediate. As
-part of the remediation, an ACK service controller will reconfigure the managed
-resource based on the `Spec`.
+modified outside of ACK, e.g. through the AWS CLI or the console. An ACK service
+controller will look for any drift and attempt to remediate every 10 hours
+(unless a different frequency is configured). As part of the remediation, an ACK
+service controller will reconfigure the managed resource based on the `Spec`.
+
+For more information about configuring the drift remediation period, see
+[Recovering from Drift][drift]
 
 [api-kind]: https://kubernetes.io/docs/reference/using-api/api-concepts/#standard-api-terminology
 [authz]: ../../user-docs/authorization/
@@ -69,3 +72,4 @@ resource based on the `Spec`.
 [crd]: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/
 [s3-cb-api]: https://docs.aws.amazon.com/AmazonS3/latest/API/API_CreateBucket.html
 [spec-status]: https://kubernetes.io/docs/concepts/overview/working-with-objects/kubernetes-objects/#object-spec-and-status
+[drift]: ../../user-docs/drift-recovery
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
---
title: "Recovering from Drift"
description: "Recovering from Drift"
lead: "How ACK controllers detect and remediate resource drift"
draft: false
menu:
  docs:
    parent: "getting-started"
weight: 55
toc: true
---

Kubernetes controllers work on the principle of [constant
reconciliation][constant-reconciliation]. In essence, they continuously compare
the desired state of the system with its actual state and use the difference to
determine the actions required to reach the desired state.

Once a controller has reconciled a resource to its desired state, the controller
shouldn't need to continue reconciling, because the actual state of the resource
meets the specification. However, this is only true for closed systems, where
the controller is the only actor interacting with a resource. ACK controllers
don't act in a closed system: they are not the only actors capable of modifying
the actual state of an AWS resource. Other programs, or even people, may have
their own privileges to do so. When another actor modifies a resource after the
ACK controller has reconciled it to its desired state, that's called "drift".

ACK controllers detect drift by continuing to reconcile resources after they
have reached their desired state, but with much longer delays between
reconciliation attempts. By default, all ACK controllers attempt to detect drift
once every **10 hours**. That is, every 10 hours after a resource has been
marked with the `ResourceSynced = true` condition, its owning controller
describes the resource in AWS to check whether it still matches the desired
state. If the controller detects a difference, it starts the reconciliation loop
again to get back to that state (just as it would after any other change).
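
The detection step described above can be sketched roughly as follows. This is
an illustrative Python model, not ACK's implementation: `DriftCheck`,
`check_for_drift`, and the dictionary-based resource state are hypothetical
names chosen for the sketch. Only the 10-hour default comes from the text.

```python
from dataclasses import dataclass, field

# 10 hours, the default drift detection period stated above.
DEFAULT_RESYNC_SECONDS = 10 * 60 * 60


@dataclass
class DriftCheck:
    """Outcome of one periodic drift check (hypothetical type)."""

    drifted_fields: list = field(default_factory=list)
    requeue_after_seconds: int = DEFAULT_RESYNC_SECONDS

    @property
    def drifted(self) -> bool:
        return bool(self.drifted_fields)


def check_for_drift(desired, observed, resync_seconds=DEFAULT_RESYNC_SECONDS):
    """Compare the desired spec with the state described from AWS.

    Only fields present in the desired spec are compared; extra fields
    reported by the service are ignored in this sketch.
    """
    drifted = sorted(k for k, v in desired.items() if observed.get(k) != v)
    return DriftCheck(drifted_fields=drifted, requeue_after_seconds=resync_seconds)


# Desired state from the resource spec versus what a describe call returned
# after someone changed the resource outside of the controller.
desired = {"versioning": "Enabled", "tags": {"team": "data"}}
observed = {"versioning": "Suspended", "tags": {"team": "data"}}

result = check_for_drift(desired, observed)
print(result.drifted, result.drifted_fields, result.requeue_after_seconds)
```

In this model a non-empty `drifted_fields` list is what would trigger another
pass through the reconciliation loop; either way, the next check is scheduled
`requeue_after_seconds` later.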

{{% hint type="info" title="Existing resource overrides" %}}
Some resources require more frequent drift remediation, for example resources
that run a stateful workload whose status changes frequently (such as a
SageMaker `TrainingJob`). For these resources, the drift remediation period may
already have been decreased by the controller authors to improve the
responsiveness of the resource's `Status`.

All override periods are logged to stdout when the controller is started.
{{% /hint %}}

## Overriding the drift remediation period

### For all resources owned by a controller

If you would like to decrease the drift remediation period for *all* resources
owned by a controller, set the `reconcile.defaultResyncPeriod` value in the
Helm chart `values.yaml` file to the new period in seconds, like so:

```yaml
reconcile:
  defaultResyncPeriod: 1800 # 30 minutes (in seconds)
```

### For a single resource type

The most granular option is to set the reconciliation period for all resources
of a given type, for example all S3 `Bucket` resources managed by a single
controller.

Add the resource name and the overriding period (in seconds) to the
`reconcile.resourceResyncPeriods` value in the Helm chart `values.yaml` like
so:

```yaml
reconcile:
  resourceResyncPeriods:
    Bucket: 1800 # 30 minutes (in seconds)
```
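
When both settings are present, a controller has to decide which period applies
to a given resource. The sketch below models one plausible resolution order: a
per-type entry in `reconcile.resourceResyncPeriods` first, then
`reconcile.defaultResyncPeriod`, then any override built in by the controller
authors, then the 10-hour default. That order, the function name
`resolve_resync_seconds`, the treatment of `0` as "unset", and the `OtherKind`
resource are assumptions made for illustration; the override periods the
controller logs at startup remain the authoritative source.

```python
DEFAULT_RESYNC_SECONDS = 10 * 60 * 60  # 10 hours, the documented default


def resolve_resync_seconds(kind, resource_periods=None, default_period=0,
                           builtin_periods=None):
    """Return the drift remediation period, in seconds, for a resource kind.

    Assumed precedence (an illustration, not confirmed by the text above):
      1. per-type value from reconcile.resourceResyncPeriods
      2. controller-wide reconcile.defaultResyncPeriod (0 means unset)
      3. override built in by the controller authors
      4. the 10-hour default
    """
    resource_periods = resource_periods or {}
    builtin_periods = builtin_periods or {}
    if kind in resource_periods:
        return resource_periods[kind]
    if default_period > 0:
        return default_period
    if kind in builtin_periods:
        return builtin_periods[kind]
    return DEFAULT_RESYNC_SECONDS


# Values shaped like the Helm snippets above, with a one-hour default.
values = {
    "reconcile": {
        "defaultResyncPeriod": 3600,
        "resourceResyncPeriods": {"Bucket": 1800},
    }
}
reconcile = values["reconcile"]

for kind in ("Bucket", "OtherKind"):  # "OtherKind" is a hypothetical resource
    period = resolve_resync_seconds(
        kind,
        resource_periods=reconcile["resourceResyncPeriods"],
        default_period=reconcile["defaultResyncPeriod"],
    )
    print(kind, period)
```

Under these assumptions, `Bucket` resources are checked every 30 minutes while
every other resource type falls back to the controller-wide one-hour period.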

[constant-reconciliation]: https://book.kubebuilder.io/cronjob-tutorial/controller-overview.html#whats-in-a-controller
