Bug 1849164: don't store full manifests in installplan status (for bundle images) #1589
Conversation
Force-pushed from 8cb0757 to 9816718
@ecordell: This pull request references Bugzilla bug 1849164, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
Force-pushed from 9816718 to 7ad841b
/retest
This is a good quick-and-dirty solution to the issue at hand -- I was unaware that we already unpack all objects in the bundle into separate configmaps, but if so this makes the most sense.
Let's see how the tests perform, but I'm a little concerned about the potential for transient read errors (failed to find configmap) impacting the actual step resolution. We are basically adding a roundtrip to the api-server for each object in the bundle.
Longer term, we are adding more and more objects to the bundle. Eventually we want to support arbitrary kube objects in the bundle. This approach may still work in that case, but I'm curious how it will perform.
```diff
@@ -1418,9 +1453,79 @@ func (o *Operator) ExecutePlan(plan *v1alpha1.InstallPlan) error {
 	ensurer := newStepEnsurer(kubeclient, crclient, dynamicClient)
 	b := newBuilder(kubeclient, dynamicClient, o.csvProvidedAPIsIndexer, o.logger)

+	refForStep := func(step *v1alpha1.Step) *UnpackedBundleReference {
```
I'm wondering if it's a bit more clear to hang the `refForStep` and `manifestForStep` functions off the `builder` struct, so that they are logically grouped with the object that executes the steps. From what I recall we wanted to eventually move the step logic in there. It also helps improve the readability of `ExecutePlan`.
I chose not to do that originally because the builder does not exist prior to 4.5 and I knew we intended to backport this.
But this is a good point - instead of adding to the builder, I'll make a new interface / method for resolving manifests and handling the step cache. In 4.5, the builder can depend on that interface, and in < 4.4 it can be used directly in the catalog operator.
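A minimal sketch of what that interface might look like, assuming a name like `ManifestResolver` (illustrative only, not the PR's actual code):

```go
package catalog

import (
	"github.com/operator-framework/api/pkg/operators/v1alpha1"
)

// ManifestResolver is a hypothetical name for the interface described
// above: it resolves the manifest for a step and hides the step cache.
type ManifestResolver interface {
	// ManifestForStep returns the manifest to apply for the step, reading
	// it from the referenced unpacked-bundle ConfigMap when the step only
	// stores a reference instead of the full object.
	ManifestForStep(step *v1alpha1.Step) (string, error)
}
```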
```
for i, step := range plan.Status.Plan {
	// TODO: should lazy load only when the manifest is actually needed
	manifest, err := manifestForStep(step)
```
See above -- I think linking this to step creation via the `builder` helps with the lazy loading.
nit: WDYT about having a `type Manifest string` and using that in the code? Helps improve readability, and we can hang methods off the type.
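A minimal sketch of the suggested type (hypothetical; per the reply below, the PR kept the API type as `string` and moved manifest handling into its own file instead):

```go
package catalog

import (
	"encoding/json"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// Manifest is the suggested named type for step manifests; methods for
// decoding or validating can hang off it. (Sketch of the review
// suggestion, not code from the PR.)
type Manifest string

// Unstructured decodes the manifest JSON into an unstructured object.
func (m Manifest) Unstructured() (*unstructured.Unstructured, error) {
	u := &unstructured.Unstructured{}
	if err := json.Unmarshal([]byte(m), u); err != nil {
		return nil, err
	}
	return u, nil
}
```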
Since the type on the API is `string`, we'd have to handle manifests that are already unpacked differently from manifests that are in configmaps - instead I'll move all manifest handling into its own file, which should make it easier to keep track of what's what?
```diff
@@ -297,7 +297,7 @@ func (b *BundleLookup) SetCondition(cond BundleLookupCondition) BundleLookupCond
 	if existing.Type != cond.Type {
 		continue
 	}
-	if existing.Status == cond.Status {
+	if existing.Status == cond.Status && existing.Reason == cond.Reason {
```
Should we open up a PR in `api` for this?
Good call - I will do that.
It would be nice to separate our utility methods from the data type in the future. What about defining decorator types in OLM that embed the data type instead?
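A rough sketch of that decorator idea, assuming an embedded wrapper living in OLM (the wrapper is hypothetical; the field and condition types follow the `api` types shown in the diff):

```go
package olm

import (
	"github.com/operator-framework/api/pkg/operators/v1alpha1"
)

// BundleLookup is a hypothetical decorator that embeds the API data type
// so OLM's utility methods can live outside the api repository.
type BundleLookup struct {
	v1alpha1.BundleLookup
}

// SetCondition replaces the condition of the same type, preserving the
// transition time when status and reason are unchanged (mirroring the
// diff above).
func (b *BundleLookup) SetCondition(cond v1alpha1.BundleLookupCondition) v1alpha1.BundleLookupCondition {
	for i, existing := range b.Conditions {
		if existing.Type != cond.Type {
			continue
		}
		if existing.Status == cond.Status && existing.Reason == cond.Reason {
			cond.LastTransitionTime = existing.LastTransitionTime
		}
		b.Conditions[i] = cond
		return cond
	}
	b.Conditions = append(b.Conditions, cond)
	return cond
}
```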
I will lgtm since this is an urgent bugfix.
/hold I'd like to test with the set of operators we discovered this with before merging.
Resyncs should cover those cases, but I agree that this needs more stress testing to get numbers.
The current implementation doesn't do any extra roundtripping - it reads the configmap data from the cache. It also only goes to the cache once per bundle (not once per object in the bundle). I wanted to do a followup that lazy loads the data and only checks if the step needs some work done (so it would skip installed steps / bundles for a sync). The only reason I didn't do that here is because I would've needed to change the
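For illustration, a hedged sketch of that cache read, assuming a `ConfigMapLister` backed by the shared informer cache (the helper name is hypothetical):

```go
package catalog

import (
	"fmt"

	listersv1 "k8s.io/client-go/listers/core/v1"
)

// manifestsForBundle reads the unpacked-bundle ConfigMap once per bundle
// from the informer cache; no API-server round trip is involved.
// (Hypothetical helper, not the PR's actual code.)
func manifestsForBundle(lister listersv1.ConfigMapLister, namespace, name string) (map[string]string, error) {
	cm, err := lister.ConfigMaps(namespace).Get(name)
	if err != nil {
		return nil, fmt.Errorf("getting unpacked bundle configmap %s/%s: %w", namespace, name, err)
	}
	return cm.Data, nil
}
```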
That makes sense - forgot we have our informer cache to work with
That makes sense.
@ecordell: This pull request references Bugzilla bug 1849164, which is valid. 3 validation(s) were run on this bug
This looks like a solid, highly backwards compatible fix. I expect we'll revisit some of the abstractions later (maybe when we implement/design the v2 APIs).
I do, however, have a single wrench:
IIRC, bundles are unpacked -- in their entirety -- into a single `ConfigMap`, which should run into the resource storage limit before the point that this fix will help.
^ this is true in the case of a single big bundle, but the patch introduced in this PR still definitely helps in the case where the number of dependencies included in an InstallPlan is the primary contributor to its size.
```
manifestForStep := func(step *v1alpha1.Step) (string, error) {
	manifest := step.Resource.Manifest
	ref := refForStep(step)
	if ref != nil {
```
nit: return early if `ref == nil` to drop the indentation below
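A sketch of the suggested shape (`manifestFromRef` is a hypothetical helper standing in for the body of the `ref != nil` branch shown above):

```go
manifestForStep := func(step *v1alpha1.Step) (string, error) {
	ref := refForStep(step)
	if ref == nil {
		// No bundle reference: the step already carries the full manifest.
		return step.Resource.Manifest, nil
	}
	// Resolve the manifest from the unpacked-bundle ConfigMap.
	return manifestFromRef(ref)
}
```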
```diff
@@ -94,7 +113,7 @@ func RBACForClusterServiceVersion(csv *v1alpha1.ClusterServiceVersion) (map[stri
 	// Create RoleBinding
 	roleBinding := &rbacv1.RoleBinding{
 		ObjectMeta: metav1.ObjectMeta{
-			Name: generateName(fmt.Sprintf("%s-%s", role.GetName(), permission.ServiceAccountName)),
+			Name: generateName(fmt.Sprintf("%s-%s", role.GetName(), permission.ServiceAccountName), HashObject(fmt.Sprintf("%s-%s", role.GetName(), permission.ServiceAccountName))),
```
nit: can this constitute an info leak?
nit: can the `Role` and `RoleBinding`s share a name since we're generating them 1:1?
```
func (o *Operator) unpackBundles(plan *v1alpha1.InstallPlan) (bool, *v1alpha1.InstallPlan, error) {
	out := plan.DeepCopy()
	unpacked := true
	unpackedCount := len(out.Status.BundleLookups)
```
Doesn't look like `unpackedCount` is being used to trigger any conditions.
whoops - a planned refactor that I abandoned. will remove.
There is a limit for how big a bundle can be right now, which is the amount that we can fit into a single configmap. This PR removes (well, extends greatly) the limit of how many bundles can be unpacked and installed via a single installplan. This limit was hit with 19 operators in a cluster.
Force-pushed from 7ad841b to 57cdfe1
Force-pushed from 57cdfe1 to d0e858d
```
	return utilrand.SafeEncodeString(fmt.Sprint(hasher.Sum32()))
}

func generateName(base string, hash string) string {
```
Is the contract of this function to return the concatenation of base+hash, truncating base such that the result is short enough to be an object name? I'm worried about slice range out-of-bound panics and overly-long return values. Couldn't base have length 127 in the invocation on L116?
pushed tests for this - thanks for calling it out. see rbac_test.go
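A hedged sketch of the resulting contract (the constant and the exact truncation rule here are assumptions; the PR's actual behavior is pinned down in rbac_test.go):

```go
package catalog

import "fmt"

// maxNameLength is assumed here to be the 63-character DNS-label limit;
// the PR may bound names differently.
const maxNameLength = 63

// generateName deterministically joins base and hash, truncating base so
// the result fits and the slice expression cannot go out of bounds.
// Assumes hash is short (e.g. an encoded fnv-32 sum).
func generateName(base, hash string) string {
	if len(base)+len(hash)+1 > maxNameLength {
		base = base[:maxNameLength-len(hash)-1]
	}
	return fmt.Sprintf("%s-%s", base, hash)
}
```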
/retest
"encoding/json" | ||
"fmt" | ||
|
||
"github.com/operator-framework/api/pkg/operators/v1alpha1" |
note: I've been asking myself whether we treat in-org dependencies as "external" for the purposes of import grouping. Thoughts?
the installplan status.plan field stores a list of all manifests that will be installed when the installplan is approved. if the installplan is resolving a large set of dependencies, it is possible for it to exceed various limits in kube (grpc request size limit, etcd value size limit, etc). this commit changes the way objects are tracked if they come from bundle images. in that case, the manifests are already unpacked onto the cluster into a configmap. the installplan now stores a reference to the source configmap for each step, rather than storing the manifest itself. this allows the installplan to scale to a large number of resolved objects.
Force-pushed from d0e858d to a7659db
Nice!
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ecordell, njhale. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/cherry-pick release-4.5
@ecordell: once the present PR merges, I will cherry-pick it on top of release-4.5 in a new PR and assign it to you.
/hold cancel
@ecordell: All pull requests linked via external trackers have merged: operator-framework/operator-lifecycle-manager#1589. Bugzilla bug 1849164 has been moved to the MODIFIED state.
@ecordell: new pull request created: #1594
Description of the change:
If a manifest comes from a bundle image, instead of storing the manifest itself in the installplan status, we now store a reference to the configmap that contains the manifest. When installing, OLM will look up the manifest and use the correct one.
One additional change was necessary - role/rolebinding name generation is now deterministic. This is because some resources for an installplan are generated by OLM, rather than read directly from the unpacked bundle, and we need identities for them so that we can track the status of an individual step.
Steps that are references will look like this:
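A sketch of such a step, with hypothetical names and an assumed shape for the reference JSON (the real field names may differ in detail):

```yaml
status:
  plan:
  - resolving: example-operator.v1.0.0   # hypothetical CSV name
    status: Unknown
    resource:
      group: ""
      version: v1
      kind: ServiceAccount
      name: example-operator
      sourceName: example-catalog
      sourceNamespace: olm
      # instead of the full object, the manifest field holds a JSON
      # reference to the ConfigMap containing the unpacked bundle:
      manifest: '{"kind":"ConfigMap","name":"bundle-unpack-abc123","namespace":"olm","catalogSourceName":"example-catalog","catalogSourceNamespace":"olm","replaces":""}'
```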
where the configmap references the unpacked bundle for that step.
Motivation for the change:
It is possible to hit size limits in the installplan, if resolution results in lots of large manifests. Bundle images are already unpacked into configmaps on-cluster, so persisting the unpacked manifests in the installplan is not necessary.