Skip to content

Annotate CRDs that are installed alongside CSVs. #2114

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

benluddy
Copy link
Contributor

@benluddy benluddy commented Apr 22, 2021

In the absence of a way to correlate installed resources back to their
bundle of origin, it's possible for upgrades to complete
early. Upgrades begin when the catalog operator creates a new
CSV (with .spec.replaces set) as part of InstallPlan execution, and
complete when the olm operator sees that the new CSV has .status.phase
Succeeded. When both versions of the operator being upgraded own the
same CRD, the new CSV may use the existing CRD to satisfy its CRD
ownership requirement, thereby transitioning to Succeeded and deleting
the old CSV.

This problem can be avoided for namespace-scoped bundle resources,
which perform the transfer by adding the new CSV as an owner reference
and allowing garbage collection to remove the old owner reference
post-upgrade. This is not possible for CRDs because cluster-scoped
resources cannot have owner references to a namespace-scoped resource
like ClusterServiceVersion.

It may be necessary to make a similar change affecting other
cluster-scoped resources, but CRDs are a common culprit of early
upgrade completion. Ideally, a first-class, cluster-scoped Bundle
resource will obsolete the need to use an annotation to track bundle
of origin.

Upstream fix for: https://bugzilla.redhat.com/show_bug.cgi?id=1947946

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 22, 2021
@openshift-ci-robot
Copy link
Collaborator

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: benluddy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 22, 2021
@benluddy benluddy marked this pull request as ready for review April 26, 2021 15:15
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 26, 2021
@benluddy benluddy force-pushed the alongside branch 2 times, most recently from d7af223 to 30d431d Compare April 26, 2021 18:13
@exdx
Copy link
Member

exdx commented Apr 28, 2021

Just to add more context to the original description:

The problem of early success of the new CSV (based off the ownership of the same CRDs as the old CSV already on-cluster) is that the old CSV gets deleted too early. The old CSV may have ownerrefs to different resources which would then be GC'd automatically and cause the new CSV to go into a pending state. For example, if the old CSV owned a ServiceAccount, it would be removed before the new CSV could use it, and the new CSV would then be stuck in a Pending phase.

Early success of the new CSV can also have an impact in the case where CR validation fails, but the CSV is in a succeeded state (since the CRDs it needs are already on-cluster), which the user may not notice unless they check the InstallPlan.

Copy link
Member

@njhale njhale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job!

I think this is really close.

"github.com/pkg/errors"
"github.com/sirupsen/logrus"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is this supposed to be part of the following group of imports?

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 1, 2021
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 5, 2021
@benluddy
Copy link
Contributor Author

benluddy commented May 5, 2021

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 5, 2021
@benluddy
Copy link
Contributor Author

benluddy commented May 6, 2021

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 6, 2021
In the absence of a way to correlate installed resources back to their
bundle of origin, it's possible for upgrades to complete
early. Upgrades begin when the catalog operator creates a new
CSV (with .spec.replaces set) as part of InstallPlan execution, and
complete when the olm operator sees that the new CSV has .status.phase
Succeeded. When both versions of the operator being upgraded own the
same CRD, the new CSV may use the existing CRD to satisfy its CRD
ownership requirement, thereby transitioning to Succeeded and deleting
the old CSV.

This problem can be avoided for namespace-scoped bundle resources,
which perform the transfer by adding the new CSV as an owner reference
and allowing garbage collection to remove the old owner reference
post-upgrade. This is not possible for CRDs because cluster-scoped
resources cannot have owner references to a namespace-scoped resource
like ClusterServiceVersion.

It may be necessary to make a similar change affecting other
cluster-scoped resources, but CRDs are a common culprit of early
upgrade completion. Ideally, a first-class, cluster-scoped Bundle
resource will obsolete the need to use an annotation to track bundle
of origin.

Signed-off-by: Ben Luddy <[email protected]>
@njhale
Copy link
Member

njhale commented May 7, 2021

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label May 7, 2021
@openshift-merge-robot openshift-merge-robot merged commit 8118f7d into operator-framework:master May 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants