
Bug 1982250: (fix)InstallPlan: Do not tranisition IP to failed on OG/SA failure #119



anik120
Contributor

@anik120 anik120 commented Jul 14, 2021

In #2077, a new phase Failed was introduced for InstallPlans, and failure to
detect a valid OperatorGroup (OG) or Service Account (SA) in the namespace the
InstallPlan was being created in would transition the InstallPlan to the Failed
state; i.e. failing to detect these resources the first time the InstallPlan
was reconciled was treated as a permanent failure. This is a regression from
the previous behavior of InstallPlans, where failure to detect the OG/SA would
requeue the InstallPlan for reconciliation, so creating the required resources
before the informer queue's retry limit was reached would transition the
InstallPlan from the Installing phase to the Complete phase (unless the bundle
unpacking step failed, in which case #2093 introduced transitioning the
InstallPlan to the Failed phase).
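The difference between the two behaviors can be illustrated with a toy model. This is a minimal sketch, not OLM's implementation: `reconcile`, `checkOperatorGroup`, `ogReadyAt` and `maxRetries` are hypothetical names, and a plain loop with a fixed attempt count stands in for the rate-limited informer queue. The OperatorGroup is assumed to become visible only from attempt `ogReadyAt` onward.

```go
package main

import (
	"errors"
	"fmt"
)

// Phase mirrors the InstallPlan phases named in the PR description.
type Phase string

const (
	PhaseInstalling Phase = "Installing"
	PhaseComplete   Phase = "Complete"
	PhaseFailed     Phase = "Failed"
)

// errNoOperatorGroup is a hypothetical stand-in for the OG lookup error.
var errNoOperatorGroup = errors.New("no operator group found that is managing this namespace")

// checkOperatorGroup simulates the OG lookup: the OperatorGroup only becomes
// visible from attempt ogReadyAt onward, e.g. because it was applied together
// with the Subscription but reconciled slightly later.
func checkOperatorGroup(attempt, ogReadyAt int) error {
	if attempt < ogReadyAt {
		return errNoOperatorGroup
	}
	return nil
}

// reconcile makes up to maxRetries attempts. With failFast set, the first
// lookup error is treated as permanent (the behavior introduced in #2077);
// otherwise the InstallPlan is requeued and retried (the behavior this PR restores).
func reconcile(ogReadyAt, maxRetries int, failFast bool) Phase {
	for attempt := 0; attempt < maxRetries; attempt++ {
		if err := checkOperatorGroup(attempt, ogReadyAt); err != nil {
			if failFast {
				return PhaseFailed
			}
			continue // requeue and try again on the next attempt
		}
		return PhaseComplete
	}
	// Retries exhausted: in this toy model the plan just stays Installing.
	return PhaseInstalling
}

func main() {
	fmt.Println(reconcile(2, 5, true))  // OG arrives late, fail-fast
	fmt.Println(reconcile(2, 5, false)) // OG arrives late, requeue
	fmt.Println(reconcile(9, 5, false)) // OG never arrives within the limit
}
```

With `failFast` set (the regressed behavior), a late OperatorGroup leaves the plan `Failed`; with requeueing, the same plan reaches `Complete` as long as the OperatorGroup appears before the retries run out. What the real controller does once its retry limit is exhausted is outside this sketch, which simply leaves the plan `Installing`.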

This regression caused unexpected behavior for users whose infrastructure
applies a set of manifests simultaneously to install an operator, including a
Subscription to the operator (which creates InstallPlans) along with the
required OG/SAs. In those cases, whenever reconciliation of the OG/SA was
delayed, the InstallPlan would be transitioned to a permanent failure state.

This PR:

  • Removes the logic that transitioned the InstallPlan to Failed. Instead, the
    InstallPlan is again requeued on any reconciliation error.

  • Introduces logic to bubble up reconciliation errors through the InstallPlan's
    status.Conditions, e.g.:

When no OperatorGroup is detected:

conditions:
    - lastTransitionTime: "2021-06-23T18:16:00Z"
      lastUpdateTime: "2021-06-23T18:16:16Z"
      message: attenuated service account query failed - no operator group found that
        is managing this namespace
      reason: InstallCheckFailed
      status: "False"
      type: Installed

Then when a valid OperatorGroup is created:

conditions:
    - lastTransitionTime: "2021-06-23T18:33:37Z"
      lastUpdateTime: "2021-06-23T18:33:37Z"
      status: "True"
      type: Installed

Signed-off-by: Anik Bhattacharjee [email protected]

Upstream-repository: operator-lifecycle-manager
Upstream-commit: 3a3874b1e7a663742a6c839d9ff630c921d8c689

@openshift-ci
Contributor

openshift-ci bot commented Jul 14, 2021

@anik120: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

(fix)InstallPlan: Do not tranisition IP to failed on OG/SA failure

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 14, 2021
@anik120 anik120 changed the title (fix)InstallPlan: Do not tranisition IP to failed on OG/SA failure Bug 1982250: (fix)InstallPlan: Do not tranisition IP to failed on OG/SA failure Jul 14, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 14, 2021

@anik120: This pull request references Bugzilla bug 1982250, which is invalid:

  • expected the bug to target the "4.8.0" release, but it targets "4.8.z" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1982250: (fix)InstallPlan: Do not tranisition IP to failed on OG/SA failure


@openshift-ci openshift-ci bot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jul 14, 2021
@anik120
Contributor Author

anik120 commented Jul 14, 2021

/bugzilla refresh

@openshift-ci openshift-ci bot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jul 14, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 14, 2021

@anik120: This pull request references Bugzilla bug 1982250, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 1960455 is in the state ON_QA, which is one of the valid states (MODIFIED, ON_QA, VERIFIED)
  • dependent Bugzilla bug 1960455 targets the "4.9.0" release, which is one of the valid target releases: 4.9.0
  • bug has dependents

Requesting review from QA contact:
/cc @jianzhangbjz

In response to this:

/bugzilla refresh


@openshift-ci openshift-ci bot requested a review from jianzhangbjz July 14, 2021 16:30
@anik120
Contributor Author

anik120 commented Jul 14, 2021

/retest

timflannagan pushed a commit to timflannagan/operator-framework-olm that referenced this pull request Jul 14, 2021
* Add conditions array to OperatorCondition's spec

The conditions array in the spec is now available for the operator to
create/update as it progresses through the installation
process. As the spec is updated, the object generation is
incremented, so it can be used to track object changes.

Signed-off-by: Vu Dinh <[email protected]>

* Add OperatorCondition v2 with spec.conditions array

Signed-off-by: Vu Dinh <[email protected]>

Upstream-repository: api
Upstream-commit: bb9b80e8278978efdb06a6f5d5682eb3cad330ec
@kevinrizza
Member

/retest

@anik120
Contributor Author

anik120 commented Jul 15, 2021

/test e2e-aws-console-olm

Member

@dinhxuanvu dinhxuanvu left a comment


/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 15, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 15, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anik120, dinhxuanvu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@anik120
Contributor Author

anik120 commented Jul 15, 2021

/test e2e-aws-console-olm

@openshift-merge-robot openshift-merge-robot merged commit 2b803dd into openshift:release-4.8 Jul 15, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 15, 2021

@anik120: All pull requests linked via external trackers have merged:

Bugzilla bug 1982250 has been moved to the MODIFIED state.

In response to this:

Bug 1982250: (fix)InstallPlan: Do not tranisition IP to failed on OG/SA failure


openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/operator-framework-olm that referenced this pull request Aug 4, 2021
* Add conditions array to OperatorCondition's spec

The conditions array in the spec is now available for the operator to
create/update as it progresses through the installation
process. As the spec is updated, the object generation is
incremented, so it can be used to track object changes.

Signed-off-by: Vu Dinh <[email protected]>

* Add OperatorCondition v2 with spec.conditions array

Upstream-commit: bb9b80e8278978efdb06a6f5d5682eb3cad330ec
Upstream-repository: api

Signed-off-by: Vu Dinh <[email protected]>
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

4 participants