Support for multiple resource types and controller-runtime refactor #43
Conversation
// Emit events once per minute regardless of detected changes.
r := newReconcileLoop(time.Minute*1, options.GVK, mgr.GetClient())
I'm wondering if we really need this (both here and in the ansible-operator).
@shawn-hurley Can you comment on why we can't just set the resync option for the manager (which sets it for the cache)?
https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/manager/manager.go#L91-L95
syncPeriod := time.Minute
mgr, err := manager.New(cfg, manager.Options{Namespace: namespace, SyncPeriod: &syncPeriod})
Discussed offline.
For our use case the manager's resync is almost equivalent to the reconcile loop.
kubernetes-sigs/controller-runtime#88 (comment)
So we can use either.
However, the recently added RequeueAfter is probably the best way to periodically re-queue and re-trigger the reconcile for each CR.
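As a minimal sketch of that approach (the reconciler type name is illustrative, using the controller-runtime Reconcile signature of this era):

import (
	"time"

	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

type HelmAppReconciler struct{ /* client, installer, etc. */ }

func (r *HelmAppReconciler) Reconcile(request reconcile.Request) (reconcile.Result, error) {
	// ... reconcile the release for request.NamespacedName ...

	// Re-queue this CR after one minute, re-triggering Reconcile
	// even when no watch events occur.
	return reconcile.Result{RequeueAfter: time.Minute}, nil
}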
Agreed. I'll make the change and verify expected behavior.
_, ok := s.(map[string]interface{})
if !ok {
	u.Object["spec"] = map[string]interface{}{}
	r.Client.Update(context.TODO(), u)
Check return error
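For instance, a minimal sketch of the suggested fix (assuming the surrounding function returns a reconcile.Result and error):

if err := r.Client.Update(context.TODO(), u); err != nil {
	return reconcile.Result{}, err
}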
}

// Update the CR with the updated status.
r.Client.Update(context.TODO(), u)
Check return error.
if apierrors.IsNotFound(err) {
	u, err = r.Installer.UninstallRelease(u)
	if err != nil {
		log.Print(err)
This isn't too important for this PR, but how about using logrus or something else for more structured logging (info, error, warning, debug)?
We still haven't decided on the default logger that the SDK should use, since that impacts all users. operator-framework/operator-sdk#503
But we can pick one for the helm-operator.
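For illustration, a hedged sketch of logrus-style structured logging (the field names are made up for the example):

import log "github.com/sirupsen/logrus"

func logUninstallError(release string, err error) {
	// Structured fields and levels instead of a bare log.Print(err).
	log.WithField("release", release).WithError(err).Error("failed to uninstall release")
	log.WithField("release", release).Debug("requeueing for retry")
}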
It probably makes the most sense to go with the outcome of operator-framework/operator-sdk#503, since there's discussion and voting there about the preferences and technical merits of various loggers (and just for the sake of consistency). I'll open a new issue to follow up once that decision is made.
New issue #44 to track logging updates.
m := map[schema.GroupVersionKind]Installer{}

// Determine the watches file to use
watchesFile, ok := os.LookupEnv(HelmChartWatchesEnvVar)
The HelmChartWatchesEnvVar could be set to empty, HELM_CHART_WATCHES="", in which case ok=true and watchesFile="". We'll end up trying to load the watches file in that case, right?
We would, in which case there would be an error returned by NewFromWatches. I figured that if the environment variable is set (regardless of its value), we should use it since the user explicitly set it.
I could go either way on this though. We could also fall back to the default file for the empty string case.
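A minimal sketch of that fallback variant (defaultWatchesFile is a hypothetical constant for the default path):

watchesFile, ok := os.LookupEnv(HelmChartWatchesEnvVar)
if !ok || watchesFile == "" {
	// Treat an explicitly empty value the same as unset.
	watchesFile = defaultWatchesFile
}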
Added a commit to make the logic in this section of code more obvious.
It seems periodic reconciles will result in increasing release numbers even when there are no changes to the CR.
Agreed. I removed the check that skips upgrades for unchanged releases, to allow unwatched resources to be reconciled. I'll investigate the ramifications of, and possible workarounds for, increasing helm release version numbers.
I found one pretty significant issue with the constantly incrementing release version numbers. When we complete a release, we update the CR status with the full contents of the release, which changes every time we reconcile. Since the CR status is updated, it causes an immediate re-reconcile with the new status, which then triggers another update with an incremented release number, and another reconciliation, and so on, forever. It looks like our operator gets quickly rate-limited, but this is clearly something that needs to be resolved.

A release update can be executed as a dry-run, which will produce the new complete manifest without making any changes to the tiller release cache or to the cluster state. As long as release manifests are created deterministically, comparing manifests may be the best way to determine whether a chart release needs to be made vs. just reconciling the existing resources associated with the chart.

So the question is how to reconcile arbitrary resources.
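A hedged sketch of that manifest-comparison idea against helm v2's client API (the helper name and wiring are illustrative, not necessarily what this PR implements):

import (
	"k8s.io/helm/pkg/helm"
	"k8s.io/helm/pkg/proto/hapi/chart"
	"k8s.io/helm/pkg/proto/hapi/release"
)

// needsUpgrade dry-runs the upgrade and compares the rendered manifest
// against the deployed release's manifest; the dry run touches neither
// the tiller release cache nor the cluster state.
func needsUpgrade(c helm.Interface, name string, ch *chart.Chart, deployed *release.Release) (bool, error) {
	resp, err := c.UpdateReleaseFromChart(name, ch, helm.UpgradeDryRun(true))
	if err != nil {
		return false, err
	}
	return resp.GetRelease().GetManifest() != deployed.GetManifest(), nil
}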
One way to ignore an event resulting from a CR status update is via the status subresource. When the status subresource is enabled for a CRD, a change to the status does not increment the object's metadata.generation; only spec changes do. And we can filter out Update events that haven't changed the generation via a predicate in our controller:

pred := predicateForGenerationChange()
err = c.Watch(&source.Kind{Type: helmApp}, &handler.EnqueueRequestForObject{}, pred)
...
func predicateForGenerationChange() predicate.Predicate {
	return predicate.Funcs{
		CreateFunc: func(e event.CreateEvent) bool {
			return true
		},
		UpdateFunc: func(e event.UpdateEvent) bool {
			// Compare metadata.generation between the old and new object;
			// status-only updates don't change the generation.
			return e.MetaOld.GetGeneration() != e.MetaNew.GetGeneration()
		},
		DeleteFunc: func(e event.DeleteEvent) bool {
			return true
		},
	}
}

The only problem with the above is that the status subresource isn't enabled by default for k8s 1.10; it's alpha and feature-gated. It's beta in k8s 1.11 and 1.12 and should be enabled by default. For 1.10 we'll need to figure out another way besides the generation.
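One hedged possibility for 1.10 (a sketch only, not part of this PR): skip the generation entirely and compare the old and new spec directly in the predicate:

import (
	"reflect"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

func specChangePredicate() predicate.Predicate {
	return predicate.Funcs{
		UpdateFunc: func(e event.UpdateEvent) bool {
			oldU, okOld := e.ObjectOld.(*unstructured.Unstructured)
			newU, okNew := e.ObjectNew.(*unstructured.Unstructured)
			if !okOld || !okNew {
				// Unexpected types; enqueue to be safe.
				return true
			}
			// Reconcile only when the spec actually changed;
			// ignore status-only updates.
			return !reflect.DeepEqual(oldU.Object["spec"], newU.Object["spec"])
		},
	}
}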
* Replacing custom reconciliation loop with controller-runtime's reconcile.Result.RequeueAfter
* Fixing issue with infinite incrementing of helm release version numbers
* Adding resource reconciliation even when helm release is unchanged
* Improving logging
I think this latest commit solves the problem without the need for the status subresource, though that's not something I knew about, so thanks for the tip.

I think this raises a question about whether we should support helm's force option, and if so, how that would be conveyed in a CR (perhaps an annotation?). See helm/helm#1958
Added another commit which fixes a few more bugs:

Release retries
Most likely, the latest release is the one that just failed and now we're retrying it, so the latest release and the candidate release are identical. In this case, we reconcile the underlying resources when we should be retrying the failed update.
Fix: use the latest deployed release, not the latest release.

Failing uninstalls
Fix: ignore missing release errors during uninstalls.
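For the release-retries fix, a hedged sketch of looking up the latest deployed release with helm v2's client (the function name is illustrative):

import (
	"fmt"

	"k8s.io/helm/pkg/helm"
	"k8s.io/helm/pkg/proto/hapi/release"
)

// lastDeployedRelease returns the newest release whose status is
// DEPLOYED, skipping newer FAILED revisions so they get retried
// instead of being treated as the current state.
func lastDeployedRelease(c helm.Interface, name string) (*release.Release, error) {
	history, err := c.ReleaseHistory(name, helm.WithMaxHistory(256))
	if err != nil {
		return nil, err
	}
	var deployed *release.Release
	for _, rel := range history.GetReleases() {
		if rel.GetInfo().GetStatus().GetCode() != release.Status_DEPLOYED {
			continue
		}
		if deployed == nil || rel.GetVersion() > deployed.GetVersion() {
			deployed = rel
		}
	}
	if deployed == nil {
		return nil, fmt.Errorf("no deployed release found for %s", name)
	}
	return deployed, nil
}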
* new functions for resource reconciliation
* moved New* functions to separate file
@hasbro17 - This should be ready for another look. The PR is getting pretty big and has now expanded beyond the original scope. I'm wondering if it should be split into multiple PRs? There's probably still a bit more to do as well, such as updating the docs and the project layout.
@alecmerdler - Are there any community members who should be aware of these proposed changes?
@joelanford Nope, feel free to make whatever changes necessary.
@joelanford If you could break this PR up into two, that would be great. Perhaps a more logical split would be one PR for changes to the installer and another for the new controller-runtime refactoring, but I'll let you decide how best to split it up. And yeah, updating the docs and the project layout can be done in followup PRs after that.
This PR is meant to fix #41 and #42.

It is based on new scaffolding produced by running operator-sdk built from its refactor/controller-runtime branch.

It also adds support for watching multiple resource types and managing releases from multiple helm charts using a new YAML file for configuration. Support for the existing environment variables is maintained if the configuration file is not defined/present.

Much of the design inspiration is taken from github.com/water-hole/ansible-operator.

Other improvements were also made.