-
Notifications
You must be signed in to change notification settings - Fork 562
fix(openshift): block upgrades on invalid max properties #2302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(openshift): block upgrades on invalid max properties #2302
Conversation
53b048b
to
9780e9a
Compare
Is ignoring invalid max versions the right thing to do here? It seems like it would be fairly easy for someone to accidentally fat-finger a version number and end up getting an upgrade when one isn't actually supported, which could then lead to an even worse situation. Is it possible to error out and notify the user in some way that they have an invalid property value? |
I agree, maybe it safer to fail closed versus fail open when its a (delicate) update process? |
Thanks for the quick review.
I had the same thought as I was implementing this. I'll update the logic to fail closed and bubble an error to the ClusterVersion when an unparsable max property is found. (part of the reason I did it like this in the first place was the expectation of QE to fail open in the original BZ) |
/hold until I update this |
45c1dd9
to
1ec1e4b
Compare
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: njhale The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
1ec1e4b
to
10bf3c2
Compare
/hold cancel updated behavior to address feedback. |
10bf3c2
to
649f635
Compare
Block OpenShift upgrades while: - olm.maxOpenShiftVersion properties have invalid values - cluster information is unavailable; e.g. the desired version of the cluster is undefined - an installed operator has declared more than one olm.MaxOpenShiftVersion property See https://bugzilla.redhat.com/show_bug.cgi?id=1986753 for motivation. Signed-off-by: Nick Hale <[email protected]>
649f635
to
1314632
Compare
/hold cancel added behavior |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some thoughts, we can let @joelanford review as well.
/lgtm
/hold
@@ -105,58 +131,59 @@ func incompatibleOperators(ctx context.Context, cli client.Client) (skews, error | |||
} | |||
|
|||
if next == nil { | |||
return nil, nil | |||
// Note: This shouldn't happen | |||
return nil, fmt.Errorf("Failed to determine next OpenShift Y-stream release") | |||
} | |||
next.Minor++ | |||
|
|||
csvList := &operatorsv1alpha1.ClusterServiceVersionList{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is it possible to build this list excluding copied CSVs via the annotation, so we don't potentially make a huge cache call? Not sure how much that matters though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could do something with the newly introduced metadata client?
My preference would be to optimize in a follow up, since the topic of the PR is potentially release blocking.
s skews | ||
errs []error | ||
) | ||
var incompatible skews | ||
for _, csv := range csvList.Items { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One case I'm thinking about is where Operator A has a defined maxOpenshiftVersion property, but depends on Operator B which does not have one defined. In that case, on upgrade Operator A will report ready, so will Operator B, but B may break, breaking A.
Catching this case would require doing introspection on the CSV, and checking any dependencies for their properties, which may be overly restrictive. But it's a scenario you could imagine happening on the cluster somewhat frequently if a popular dependency doesn't specify its maxOpenshiftVersion. Not sure how important this is to address up-front though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, on upgrade Operator A will report ready, so will Operator B, but B may break, breaking A.
could you expand on how this relates to blocking cluster upgrades wrt maxOpenShiftVersion
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it would not block cluster upgrades per se but this scenario would break the Operator that was using the maxOpenShiftVersion property, since its dependencies don't specify max version
{ | ||
namespace: ns.GetName(), | ||
name: incompatible.GetName(), | ||
err: fmt.Errorf(`Defining more than one "%s" property is not allowed`, MaxOpenShiftVersionProperty), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future reference, you can also use %q
instead of "%s"
to get a quoted string, which helps when you'd rather use double quotes to define the string literal.
} | ||
|
||
func desiredRelease(ctx context.Context, cli client.Client) (*semver.Version, error) { | ||
cv := configv1.ClusterVersion{} | ||
if err := cli.Get(ctx, client.ObjectKey{Name: "version"}, &cv); err != nil { // "version" is the name of OpenShift's ClusterVersion singleton | ||
return nil, err | ||
return nil, &transientError{fmt.Errorf("Failed to get ClusterVersion: %w", err)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that just crossed my mind: Is this section of code disabled in non-OpenShift clusters? I assume maxOpenshiftVersion
is just ignored unless we're in Openshift?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the only call site for this function is in incompatibleOperators
, and that function is only used in the context of the ClusterOperator controller that handles the maxOpenshiftVersion logic.
I guess we gate the creation of this controller on the --writeStatusName
CLI flag that we set downstream, which seems like a relatively loose check (couldn't we discovery this API dynamically instead?), so tl;dr should be ignored on non-OCP clusters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joelanford, @timflannagan hit the nail on the head: the entire controller is disabled unless --writeStatusName
is given when running olm-operator
.
desired.Reason = ErrorCheckingOperatorCompatibility | ||
desired.Message = fmt.Sprintf("Encountered errors while checking compatibility with the next minor version of OpenShift: %s", err) | ||
} else if len(incompatible) > 0 { | ||
// Operators are installed that have incompatible and/or invalid max versions | ||
desired.Status = configv1.ConditionFalse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just trying to wrap my head around whether it makes sense to specify configv1.ConditionTrue if we're setting the reason to IncompatibleOperatorsInstalled
when encountering incompatibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disregard - I realize I was missing the full context and needed to look at the full function control flow.
/lgtm |
/lgtm |
Fail open when olm.maxOpenShiftVersion properties have invalidvalues. This means that all invalid values are logged but are ultimately
ignored when determining cluster upgradeability.
Block OpenShift upgrades while:
property
See https://bugzilla.redhat.com/show_bug.cgi?id=1986753 for motivation.