Skip to content

fix(openshift): block upgrades on invalid max properties #2302

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

njhale
Copy link
Member

@njhale njhale commented Jul 29, 2021

Fail open when olm.maxOpenShiftVersion properties have invalid
values. This means that all invalid values are logged but are ultimately
ignored when determining cluster upgradeability.

Block OpenShift upgrades while:

  • olm.maxOpenShiftVersion properties have invalid values
  • cluster information is unavailable; e.g. the desired version of the cluster is undefined
  • an installed operator has declared more than one olm.MaxOpenShiftVersion
    property

See https://bugzilla.redhat.com/show_bug.cgi?id=1986753 for motivation.

@openshift-ci openshift-ci bot requested review from anik120 and dinhxuanvu July 29, 2021 04:59
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 29, 2021
@njhale njhale force-pushed the fix/openshift/compat-best-effort branch from 53b048b to 9780e9a Compare July 29, 2021 04:59
@joelanford
Copy link
Member

Is ignoring invalid max versions the right thing to do here? It seems like it would be fairly easy for someone to accidentally fat-finger a version number and end up getting an upgrade when one isn't actually supported, which could then lead to an even worse situation.

Is it possible to error out and notify the user in some way that they have an invalid property value?

@exdx
Copy link
Member

exdx commented Jul 29, 2021

Is ignoring invalid max versions the right thing to do here? It seems like it would be fairly easy for someone to accidentally fat-finger a version number and end up getting an upgrade when one isn't actually supported, which could then lead to an even worse situation.

Is it possible to error out and notify the user in some way that they have an invalid property value?

I agree, maybe it safer to fail closed versus fail open when its a (delicate) update process?

@njhale
Copy link
Member Author

njhale commented Jul 29, 2021

@joelanford @exdx

Thanks for the quick review.

Is it possible to error out and notify the user in some way that they have an invalid property value?

I had the same thought as I was implementing this. I'll update the logic to fail closed and bubble an error to the ClusterVersion when an unparsable max property is found.

(part of the reason I did it like this in the first place was the expectation of QE to fail open in the original BZ)

@njhale
Copy link
Member Author

njhale commented Jul 29, 2021

/hold

until I update this

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 29, 2021
@njhale njhale force-pushed the fix/openshift/compat-best-effort branch 4 times, most recently from 45c1dd9 to 1ec1e4b Compare July 30, 2021 06:21
@openshift-ci
Copy link

openshift-ci bot commented Jul 30, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: njhale

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@njhale njhale force-pushed the fix/openshift/compat-best-effort branch from 1ec1e4b to 10bf3c2 Compare July 30, 2021 06:27
@njhale njhale changed the title fix(openshift): ignore invalid max properties fix(openshift): block upgrades on invalid max properties Jul 30, 2021
@njhale
Copy link
Member Author

njhale commented Jul 30, 2021

/hold cancel

updated behavior to address feedback.

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 30, 2021
@njhale njhale force-pushed the fix/openshift/compat-best-effort branch from 10bf3c2 to 649f635 Compare July 30, 2021 13:19
Block OpenShift upgrades while:

- olm.maxOpenShiftVersion properties have invalid values
- cluster information is unavailable; e.g. the desired version of the cluster is undefined
- an installed operator has declared more than one olm.MaxOpenShiftVersion
  property

See https://bugzilla.redhat.com/show_bug.cgi?id=1986753 for motivation.

Signed-off-by: Nick Hale <[email protected]>
@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 30, 2021
@njhale njhale force-pushed the fix/openshift/compat-best-effort branch from 649f635 to 1314632 Compare July 30, 2021 19:16
@njhale
Copy link
Member Author

njhale commented Jul 30, 2021

/hold cancel

added behavior

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 30, 2021
Copy link
Member

@exdx exdx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some thoughts, we can let @joelanford review as well.
/lgtm
/hold

@@ -105,58 +131,59 @@ func incompatibleOperators(ctx context.Context, cli client.Client) (skews, error
}

if next == nil {
return nil, nil
// Note: This shouldn't happen
return nil, fmt.Errorf("Failed to determine next OpenShift Y-stream release")
}
next.Minor++

csvList := &operatorsv1alpha1.ClusterServiceVersionList{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is it possible to build this list excluding copied CSVs via the annotation, so we don't potentially make a huge cache call? Not sure how much that matters though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could do something with the newly introduced metadata client?

My preference would be to optimize in a follow up, since the topic of the PR is potentially release blocking.

s skews
errs []error
)
var incompatible skews
for _, csv := range csvList.Items {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One case I'm thinking about is where Operator A has a defined maxOpenshiftVersion property, but depends on Operator B which does not have one defined. In that case, on upgrade Operator A will report ready, so will Operator B, but B may break, breaking A.

Catching this case would require doing introspection on the CSV, and checking any dependencies for their properties, which may be overly restrictive. But it's a scenario you could imagine happening on the cluster somewhat frequently if a popular dependency doesn't specify its maxOpenshiftVersion. Not sure how important this is to address up-front though.

Copy link
Member Author

@njhale njhale Aug 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, on upgrade Operator A will report ready, so will Operator B, but B may break, breaking A.

could you expand on how this relates to blocking cluster upgrades wrt maxOpenShiftVersion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it would not block cluster upgrades per se but this scenario would break the Operator that was using the maxOpenShiftVersion property, since its dependencies don't specify max version

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 2, 2021
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 2, 2021
{
namespace: ns.GetName(),
name: incompatible.GetName(),
err: fmt.Errorf(`Defining more than one "%s" property is not allowed`, MaxOpenShiftVersionProperty),
Copy link
Member

@joelanford joelanford Aug 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future reference, you can also use %q instead of "%s" to get a quoted string, which helps when you'd rather use double quotes to define the string literal.

}

func desiredRelease(ctx context.Context, cli client.Client) (*semver.Version, error) {
cv := configv1.ClusterVersion{}
if err := cli.Get(ctx, client.ObjectKey{Name: "version"}, &cv); err != nil { // "version" is the name of OpenShift's ClusterVersion singleton
return nil, err
return nil, &transientError{fmt.Errorf("Failed to get ClusterVersion: %w", err)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that just crossed my mind: Is this section of code disabled in non-OpenShift clusters? I assume maxOpenshiftVersion is just ignored unless we're in Openshift?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the only call site for this function is in incompatibleOperators, and that function is only used in the context of the ClusterOperator controller that handles the maxOpenshiftVersion logic.

I guess we gate the creation of this controller on the --writeStatusName CLI flag that we set downstream, which seems like a relatively loose check (couldn't we discovery this API dynamically instead?), so tl;dr should be ignored on non-OCP clusters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joelanford, @timflannagan hit the nail on the head: the entire controller is disabled unless --writeStatusName is given when running olm-operator.

desired.Reason = ErrorCheckingOperatorCompatibility
desired.Message = fmt.Sprintf("Encountered errors while checking compatibility with the next minor version of OpenShift: %s", err)
} else if len(incompatible) > 0 {
// Operators are installed that have incompatible and/or invalid max versions
desired.Status = configv1.ConditionFalse
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just trying to wrap my head around whether it makes sense to specify configv1.ConditionTrue if we're setting the reason to IncompatibleOperatorsInstalled when encountering incompatibility?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disregard - I realize I was missing the full context and needed to look at the full function control flow.

@timflannagan
Copy link
Member

/lgtm

@joelanford
Copy link
Member

/lgtm
/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 2, 2021
@openshift-ci openshift-ci bot merged commit 734c6d0 into operator-framework:master Aug 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants