Skip to content

[release-4.5] Bug 1862407: fix(resolver): be smarter about the way downgrades are attempted #1689

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

ecordell
Copy link
Member

@ecordell ecordell commented Jul 29, 2020

"downgrade" here refers to downgrading potential updates back to what
has already been installed.

this fixes issues where OLM would otherwise not have been able to
determine a safe set of operators to update to - the case where, after
tentative updates to the current generation, there are required apis
missing from the set.

the previous algorithm solves this by downgrading the updates that
require the missing apis, which has the potential to downgrade back
to the initial set of operators.

in the case where the api was provided by some operator in the previous
generation, it would be possible to not downgrade all requirers, but
rather downgrade the operator that previously provided the api.

this commit implements an exhaustive search of potential downgrades for
a generation, and uses one without missing required apis if one is found

if none is found, the old behavior takes place, and the set may still
contract back to what is currently installed.

the new resolver avoids this issue entirely, so to avoid complicating
the implementation a limit of 64 updated operators per generation has
been added.

the implementation uses a bitmask to select either the old or new
operator from the set. finding all possible combinations of old/new
versions is then a simple integer count from 0 to the max.

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Docs updated or added to /docs
  • Commit messages sensible and descriptive

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 29, 2020
@@ -153,7 +162,65 @@ func (e *NamespaceGenerationEvolver) queryForRequiredAPIs() error {
return nil
}

func (e *NamespaceGenerationEvolver) downgradeUpdates() error {
// no need to attempt downgrades
if len(e.gen.MissingAPIs()) == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super nit: the caller for this function might be more obvious if this check happened outside the method itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just keeping in line with the rest of the object, which has a series of methods which may or may not affect the generation - adding the check in Evolve would make it read weirdly, IMO

@ecordell ecordell force-pushed the requiredapi-downgrade-allpermuations branch from 6c7f615 to d565dce Compare July 29, 2020 19:55
@ecordell ecordell changed the title fix(resolver): be smarter about the way downgrades are attempted Bug 1861690: fix(resolver): be smarter about the way downgrades are attempted Jul 29, 2020
@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. label Jul 29, 2020
@openshift-ci-robot
Copy link
Collaborator

@ecordell: This pull request references Bugzilla bug 1861690, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1861690: fix(resolver): be smarter about the way downgrades are attempted

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Jul 29, 2020
@Bowenislandsong
Copy link
Member

This PR failed tests for 1 times with 1 individual failed tests and 4 skipped tests. A test is considered flaky if failed on multiple commits.

totaltestcount: 1
flaketestcount: 1
skippedtestcount: 4
flaketests:

  • classname: End-to-end
    name: 'Subscription creation using existing CSV'
    counts: 1
    details:
    • count: 1
      error: |4-

      /home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/subscription_e2e_test.go:70
      
      	Error Trace:	subscription_e2e_test.go:77
      	            				runner.go:113
      	            				runner.go:64
      	            				it_node.go:26
      	            				spec.go:215
      	            				spec.go:138
      	            				spec_runner.go:200
      	            				spec_runner.go:170
      	            				spec_runner.go:66
      	            				suite.go:62
      	            				ginkgo_dsl.go:226
      	            				ginkgo_dsl.go:214
      	            				e2e_test.go:54
      	Error:      	Received unexpected error:
      	            	E2E bug detected: configmaps "mock-ocs" already exists
      
      /home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/vendor/github.com/stretchr/testify/require/require.go:1005
      
    meandurationsec: 0.055855
    skippedtests:
  • classname: End-to-end
    name: 'Catalog image update'
    counts: 1
    details: []
    meandurationsec: 0.163795
  • classname: End-to-end
    name: 'Subscriptions create required objects from Catalogs Given a Namespace
    when a CatalogSource is created with a bundle that contains prometheus objects
    creating a subscription using the CatalogSource should have created the expected
    prometheus objects
    '
    counts: 1
    details: []
    meandurationsec: 2.183249
  • classname: End-to-end
    name: 'Subscription updates existing install plan'
    counts: 1
    details: []
    meandurationsec: 0.024478
  • classname: End-to-end
    name: 'Subscriptions create required objects from Catalogs Given a Namespace
    when a CatalogSource is created with a bundle that contains prometheus objects
    creating a subscription using the CatalogSource should install the operator
    successfully
    '
    counts: 1
    details: []
    meandurationsec: 2.116935

@dinhxuanvu
Copy link
Member

The PR looks good to me. Plus, I tested it against OCS and it works.
/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 29, 2020
Copy link
Member Author

@ecordell ecordell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold

Noticed a bug from refactoring - will fix in the morning

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 30, 2020
@ecordell
Copy link
Member Author

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 30, 2020
@ecordell
Copy link
Member Author

/retest

@ecordell
Copy link
Member Author

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 30, 2020
@dinhxuanvu
Copy link
Member

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 30, 2020
@dinhxuanvu
Copy link
Member

/retest

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@dinhxuanvu
Copy link
Member

/retest

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

4 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

3 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@ecordell
Copy link
Member Author

/retest

@ecordell ecordell changed the title Bug 1861690: fix(resolver): be smarter about the way downgrades are attempted [release-4.5] Bug 1861690: fix(resolver): be smarter about the way downgrades are attempted Jul 31, 2020
@ecordell ecordell changed the base branch from master to release-4.5 July 31, 2020 11:22
@openshift-ci-robot openshift-ci-robot removed the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Jul 31, 2020
@openshift-ci-robot
Copy link
Collaborator

@ecordell: This pull request references Bugzilla bug 1861690, which is invalid:

  • expected the bug to target the "4.5.z" release, but it targets "4.6.0" instead
  • expected Bugzilla bug 1861690 to depend on a bug targeting a release in 4.6.0, 4.6.z and in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but no dependents were found

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

[release-4.5] Bug 1861690: fix(resolver): be smarter about the way downgrades are attempted

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Jul 31, 2020
"downgrade" here refers to downgrading potential updates back to what
has already been installed.

this fixes issues where OLM would otherwise not have been able to
determine a safe set of operators to update to - the case where, after
tentative updates to the current generation, there are required apis
missing from the set.

the previous algorithm solves this by downgrading the updates that
require the missing apis, which has the potential to downgrade back
to the initial set of operators.

in the case where the api was provided by some operator in the previous
generation, it would be possible to not downgrade all requirers, but
rather downgrade the operator that previously provided the api.

this commit implements an exhaustive search of potential downgrades for
a generation, and uses one without missing required apis if one is found

if none is found, the old behavior takes place, and the set may still
contract back to what is currently installed.

the new resolver avoids this issue entirely, so to avoid complicating
the implementation a limit of 64 updated operators per generation has
been added.

the implementation uses a bitmask to select either the old or new
operator from the set. finding all possible combinations of old/new
versions is then a simple integer count from 0 to the max.
@ecordell ecordell force-pushed the requiredapi-downgrade-allpermuations branch from d565dce to a58a338 Compare July 31, 2020 11:25
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jul 31, 2020
@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ecordell

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ecordell ecordell changed the title [release-4.5] Bug 1861690: fix(resolver): be smarter about the way downgrades are attempted [release-4.5] Bug 1862407: fix(resolver): be smarter about the way downgrades are attempted Jul 31, 2020
@openshift-ci-robot
Copy link
Collaborator

@ecordell: This pull request references Bugzilla bug 1862407, which is invalid:

  • expected dependent Bugzilla bug 1861690 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

[release-4.5] Bug 1862407: fix(resolver): be smarter about the way downgrades are attempted

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ecordell
Copy link
Member Author

/bugzilla refresh

(retargeted this to 4.5 since the new resolver doesn't have this issue)

@openshift-ci-robot
Copy link
Collaborator

@ecordell: This pull request references Bugzilla bug 1862407, which is invalid:

  • expected dependent Bugzilla bug 1861690 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is MODIFIED instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

(retargeted this to 4.5 since the new resolver doesn't have this issue)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ecordell
Copy link
Member Author

/retest

@ecordell
Copy link
Member Author

ecordell commented Aug 3, 2020

/bugzilla refresh

@openshift-ci-robot
Copy link
Collaborator

@ecordell: This pull request references Bugzilla bug 1862407, which is invalid:

  • expected dependent Bugzilla bug 1861690 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sdodson sdodson added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Aug 6, 2020
@sdodson
Copy link

sdodson commented Aug 6, 2020

Overrode bugzilla/valid-bug after reviewing the upstream bug, QE seems convinced that the implementation is not relevant to 4.6. This needs a fresh lgtm though.

@sdodson
Copy link

sdodson commented Aug 6, 2020

/retest

@dinhxuanvu
Copy link
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 6, 2020
@sdodson sdodson added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Aug 6, 2020
@openshift-merge-robot openshift-merge-robot merged commit 454ee08 into operator-framework:release-4.5 Aug 6, 2020
@openshift-ci-robot
Copy link
Collaborator

@ecordell: All pull requests linked via external trackers have merged: . Bugzilla bug 1862407 has been moved to the MODIFIED state.

In response to this:

[release-4.5] Bug 1862407: fix(resolver): be smarter about the way downgrades are attempted

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants