OCPBUGS-78: Cleanup conversion webhooks when an operator is uninstalled #360

Merged: 1 commit into openshift:master, Aug 25, 2022

Conversation

perdasilva
Contributor

Problem: When uninstalling a CSV, OLM has always avoided deleting the
associated CRD, because deleting a CRD also deletes all of its CRs on the
cluster, possibly resulting in user data loss.

OLM supports defining conversion webhooks within the CSV. On cluster,
conversion webhooks are defined with a CRD and point to a service that
handles conversion. If the service is unable to fulfill the request,
all requests against the CRs associated with the CRD will fail.

When uninstalling a CSV, OLM does not remove the conversion webhook from
the CRD, meaning that all requests against the CRs associated with the
CRD will fail, resulting in at least two concerns:

  1. OLM is unable to subsequently reinstall the operator. When installing
    a CSV, if the CRD already exists and instances of CRs exist as well,
    OLM performs a series of checks which ensure that none of the CRs are
    invalidated against the new schema. The existing CRD's conversion
    webhook points to a non-existent service, causing the check to fail
    and preventing installs.
  2. Broken conversion webhooks cause Kubernetes garbage collection to
    fail.

Solution: When a CSV is deleted, if no CSV exists that is replacing it,
set the CRD's conversion strategy to None.
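
The rule in the Solution paragraph can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the upstream commit: the `CRD`, `CSV`, and `Conversion` types, the `isBeingReplaced` and `cleanupConversionWebhooks` helpers, and the example names are all hypothetical stand-ins, simplified from the shape of a CRD's conversion settings so the example runs without any Kubernetes dependencies.

```go
package main

import (
	"fmt"
	"sort"
)

// ConversionStrategy mirrors the "None" and "Webhook" values a CRD's
// conversion strategy can take. Every type in this file is a simplified,
// hypothetical stand-in, not a real Kubernetes or OLM type.
type ConversionStrategy string

const (
	StrategyNone    ConversionStrategy = "None"
	StrategyWebhook ConversionStrategy = "Webhook"
)

// Conversion is a stand-in for a CRD's conversion settings.
type Conversion struct {
	Strategy       ConversionStrategy
	WebhookService string // service that handles conversion; empty for None
}

// CRD is a stand-in for a CustomResourceDefinition.
type CRD struct {
	Name       string
	Conversion Conversion
}

// CSV is a stand-in for a ClusterServiceVersion.
type CSV struct {
	Name      string
	Replaces  string   // name of the CSV this one replaces, if any
	OwnedCRDs []string // names of the CRDs owned by this CSV
}

// isBeingReplaced reports whether any remaining CSV replaces the deleted one.
func isBeingReplaced(deleted CSV, remaining []CSV) bool {
	for _, csv := range remaining {
		if csv.Replaces == deleted.Name {
			return true
		}
	}
	return false
}

// cleanupConversionWebhooks sets the conversion strategy of every
// webhook-converted CRD owned by the deleted CSV to None, unless another CSV
// is replacing the deleted one. It returns the sorted names of the CRDs it
// changed.
func cleanupConversionWebhooks(deleted CSV, remaining []CSV, crds map[string]*CRD) []string {
	if isBeingReplaced(deleted, remaining) {
		// An upgrade is in progress; the replacing CSV still needs the webhook.
		return nil
	}
	var changed []string
	for _, name := range deleted.OwnedCRDs {
		crd, ok := crds[name]
		if !ok || crd.Conversion.Strategy != StrategyWebhook {
			continue
		}
		// Drop the reference to the service that is about to disappear.
		crd.Conversion = Conversion{Strategy: StrategyNone}
		changed = append(changed, name)
	}
	sort.Strings(changed)
	return changed
}

func main() {
	crds := map[string]*CRD{
		"widgets.example.com": {
			Name:       "widgets.example.com",
			Conversion: Conversion{Strategy: StrategyWebhook, WebhookService: "widget-operator-service"},
		},
	}
	deleted := CSV{Name: "widget-operator.v1.0.0", OwnedCRDs: []string{"widgets.example.com"}}

	changed := cleanupConversionWebhooks(deleted, nil, crds)
	fmt.Println(changed, crds["widgets.example.com"].Conversion.Strategy)
}
```

The replacement check runs first so that an ordinary upgrade, where a newer CSV replaces the one being deleted, leaves the webhook configuration in place.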

Signed-off-by: Alexander Greene [email protected]

Upstream-commit: 94374983d448c56d031f0493b84b6dce37b84741
Upstream-repository: operator-lifecycle-manager

@perdasilva perdasilva changed the title Cleanup conversion webhooks when an operator is uninstalled (#2832) Cleanup conversion webhooks when an operator is uninstalled Aug 16, 2022
@openshift-ci
Contributor

openshift-ci bot commented Aug 16, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: perdasilva

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 16, 2022
@perdasilva
Contributor Author

perdasilva commented Aug 16, 2022

/hold don't merge before #359 gets merged and qe approves

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 16, 2022
@perdasilva perdasilva changed the title Cleanup conversion webhooks when an operator is uninstalled fix: Cleanup conversion webhooks when an operator is uninstalled Aug 16, 2022
@oceanc80
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 16, 2022
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Aug 16, 2022
@perdasilva
Contributor Author

/retest

@perdasilva
Contributor Author

/retest

3 similar comments
@perdasilva
Contributor Author

/retest

@perdasilva
Contributor Author

/retest

@anik120
Contributor

anik120 commented Aug 18, 2022

/retest

@perdasilva perdasilva changed the title fix: Cleanup conversion webhooks when an operator is uninstalled OCPBUGS-78: Cleanup conversion webhooks when an operator is uninstalled Aug 22, 2022
@openshift-ci-robot openshift-ci-robot added the jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. label Aug 22, 2022
@openshift-ci-robot

openshift-ci-robot commented Aug 22, 2022

@perdasilva: This pull request references [Jira Issue OCPBUGS-78](https://issues.redhat.com//browse/OCPBUGS-78), which is invalid:

  • expected the bug to target the "4.12.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Aug 22, 2022
@perdasilva
Contributor Author

/bugzilla refresh

@perdasilva
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Aug 22, 2022
@openshift-ci-robot

openshift-ci-robot commented Aug 22, 2022

@perdasilva: This pull request references [Jira Issue OCPBUGS-78](https://issues.redhat.com//browse/OCPBUGS-78), which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.12.0) matches configured target version for branch (4.12.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST)

@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Aug 22, 2022
@openshift-ci
Contributor

openshift-ci bot commented Aug 22, 2022

@perdasilva: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

Retaining the bugzilla/valid-bug label as it was manually added.

@perdasilva
Contributor Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 22, 2022
@jianzhangbjz
Contributor

We will cancel /hold after the testing passes.
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 24, 2022
@jianzhangbjz
Contributor

/hold

@kuiwang02

/label qe-approved
/unhold

@openshift-ci openshift-ci bot added qe-approved Signifies that QE has signed off on this PR and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Aug 24, 2022
@grokspawn
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 25, 2022
@openshift-ci-robot

/retest-required

Remaining retests: 2 against base HEAD f8c466a and 8 for PR HEAD 9ac51ba in total

@openshift-ci
Contributor

openshift-ci bot commented Aug 25, 2022

@perdasilva: all tests passed!

@openshift-merge-robot openshift-merge-robot merged commit 11644a5 into openshift:master Aug 25, 2022
@openshift-ci-robot

openshift-ci-robot commented Aug 25, 2022

@perdasilva: All pull requests linked via external trackers have merged:

[Jira Issue OCPBUGS-78](https://issues.redhat.com//browse/OCPBUGS-78) has been moved to the MODIFIED state.

@timflannagan
Contributor

/cherrypick release-4.11

@openshift-cherrypick-robot

@timflannagan: new pull request created: #388

@asmacdo
Contributor

asmacdo commented Nov 10, 2022

/cherrypick release-4.10

@openshift-cherrypick-robot

@asmacdo: #360 failed to apply on top of branch "release-4.10":

Applying: Cleanup conversion webhooks when an operator is uninstalled (#2832)
Using index info to reconstruct a base tree...
M	staging/operator-lifecycle-manager/pkg/controller/operators/olm/operator.go
M	staging/operator-lifecycle-manager/test/e2e/csv_e2e_test.go
M	staging/operator-lifecycle-manager/test/e2e/webhook_e2e_test.go
M	vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/olm/operator.go
Falling back to patching base and 3-way merge...
Auto-merging vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/olm/operator.go
Auto-merging staging/operator-lifecycle-manager/test/e2e/webhook_e2e_test.go
CONFLICT (content): Merge conflict in staging/operator-lifecycle-manager/test/e2e/webhook_e2e_test.go
Auto-merging staging/operator-lifecycle-manager/test/e2e/csv_e2e_test.go
Auto-merging staging/operator-lifecycle-manager/pkg/controller/operators/olm/operator.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Cleanup conversion webhooks when an operator is uninstalled (#2832)
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

perdasilva pushed a commit to perdasilva/operator-framework-olm that referenced this pull request Jan 15, 2025
)

Bumps [k8s.io/apiextensions-apiserver](https://github.com/kubernetes/apiextensions-apiserver) from 0.31.0 to 0.31.1.
- [Release notes](https://github.com/kubernetes/apiextensions-apiserver/releases)
- [Commits](kubernetes/apiextensions-apiserver@v0.31.0...v0.31.1)

---
updated-dependencies:
- dependency-name: k8s.io/apiextensions-apiserver
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Upstream-repository: api
Upstream-commit: e09acef76a53b7b14d2438275dff77a34bea88dc
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
jira/severity-moderate: Referenced Jira bug's severity is moderate for the branch this PR is targeting.
jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
lgtm: Indicates that a PR is ready to be merged.
qe-approved: Signifies that QE has signed off on this PR.