Bug 1982250: (fix)InstallPlan: Do not transition IP to failed on OG/SA failure #119
In #2077, a new phase `Failed` was introduced for InstallPlans. Failure to detect a valid OperatorGroup (OG) or ServiceAccount (SA) for the namespace the InstallPlan was being created in would transition the InstallPlan to the `Failed` state, i.e. failure to detect these resources when the InstallPlan was reconciled the first time was considered a permanent failure.

This is a regression from the previous behavior of InstallPlans, where failure to detect the OG/SA would requeue the InstallPlan for reconciliation. Creating the required resources before the retry limit of the informer queue was reached would then transition the InstallPlan from the `Installing` phase to the `Complete` phase (unless the bundle unpacking step failed, in which case #2093 introduced transitioning the InstallPlan to the `Failed` phase).

This regression introduced oddities for users whose infrastructure applies a set of manifests simultaneously to install an operator, where the set includes a Subscription to the operator (which creates InstallPlans) along with the required OG/SAs. In those cases, whenever there was a delay in the reconciliation of the OG/SA, the InstallPlan would be transitioned to a state of permanent failure.

This PR:

* Removes the logic that transitioned the InstallPlan to `Failed`. Instead, the InstallPlan will again be requeued for any reconciliation error.
* Introduces logic to bubble up reconciliation errors through the InstallPlan's status.Conditions, e.g.:

When no OperatorGroup is detected:

```
conditions:
- lastTransitionTime: "2021-06-23T18:16:00Z"
  lastUpdateTime: "2021-06-23T18:16:16Z"
  message: attenuated service account query failed - no operator group found that is managing this namespace
  reason: InstallCheckFailed
  status: "False"
  type: Installed
```

Then when a valid OperatorGroup is created:

```
conditions:
- lastTransitionTime: "2021-06-23T18:33:37Z"
  lastUpdateTime: "2021-06-23T18:33:37Z"
  status: "True"
  type: Installed
```

Signed-off-by: Anik Bhattacharjee <[email protected]>

Upstream-repository: operator-lifecycle-manager
Upstream-commit: 3a3874b1e7a663742a6c839d9ff630c921d8c689
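The requeue-and-report behavior described in this PR can be sketched roughly as follows. This is a minimal illustration, not the code in this change: the `InstallPlan`, `Condition`, `reconcile`, and `setInstalledCondition` names, the `exampleStart` timestamp, and the boolean `operatorGroupFound` input are all simplified, hypothetical stand-ins for the real OLM types and lookups. The only details taken from the description are the `Installing` and `Complete` phases, the `Installed` condition type, the `InstallCheckFailed` reason, and the error message.

```go
package main

import (
	"errors"
	"time"
)

// Phase is a simplified, hypothetical stand-in for the InstallPlan phase.
type Phase string

const (
	PhaseInstalling Phase = "Installing"
	PhaseComplete   Phase = "Complete"
)

// Condition mirrors the fields shown in the YAML examples (hypothetical type).
type Condition struct {
	Type               string
	Status             string
	Reason             string
	Message            string
	LastUpdateTime     time.Time
	LastTransitionTime time.Time
}

// InstallPlan is a hypothetical, minimal stand-in for the real resource.
type InstallPlan struct {
	Phase      Phase
	Conditions []Condition
}

// errNoOperatorGroup carries the message shown in the example condition.
var errNoOperatorGroup = errors.New("attenuated service account query failed - no operator group found that is managing this namespace")

// exampleStart is an arbitrary timestamp used only for illustration.
var exampleStart = time.Date(2021, 6, 23, 18, 16, 0, 0, time.UTC)

// setInstalledCondition records the outcome of one reconciliation attempt.
// LastTransitionTime only moves when the status actually changes.
func setInstalledCondition(ip *InstallPlan, checkErr error, now time.Time) {
	cond := Condition{Type: "Installed", Status: "True", LastUpdateTime: now, LastTransitionTime: now}
	if checkErr != nil {
		cond.Status = "False"
		cond.Reason = "InstallCheckFailed"
		cond.Message = checkErr.Error()
	}
	for i, existing := range ip.Conditions {
		if existing.Type != cond.Type {
			continue
		}
		if existing.Status == cond.Status {
			cond.LastTransitionTime = existing.LastTransitionTime
		}
		ip.Conditions[i] = cond
		return
	}
	ip.Conditions = append(ip.Conditions, cond)
}

// reconcile reports whether the InstallPlan should be requeued.
// A missing OperatorGroup is surfaced as a condition and retried;
// the phase is never moved to Failed for this kind of error.
func reconcile(ip *InstallPlan, operatorGroupFound bool, now time.Time) (requeue bool) {
	if !operatorGroupFound {
		setInstalledCondition(ip, errNoOperatorGroup, now)
		return true
	}
	setInstalledCondition(ip, nil, now)
	ip.Phase = PhaseComplete
	return false
}
```

In this sketch, repeated failed attempts keep the InstallPlan in `Installing` and only refresh `lastUpdateTime`, which is consistent with the differing timestamps in the first example above. Once the OperatorGroup appears, the next attempt flips the condition to `True` and completes the plan.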
@anik120: No Bugzilla bug is referenced in the title of this pull request.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@anik120: This pull request references Bugzilla bug 1982250, which is invalid:
/bugzilla refresh
@anik120: This pull request references Bugzilla bug 1982250, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 6 validation(s) were run on this bug
Requesting review from QA contact:
/retest
* Add conditions array to OperatorCondition's spec

  The conditions array in the spec is now available for the operator to create/update as it progresses through the installation process. As the spec is updated, the object generation is incremented, and it can be used for tracking object changes.

  Signed-off-by: Vu Dinh <[email protected]>

* Add OperatorCondition v2 with spec.conditions array

  Signed-off-by: Vu Dinh <[email protected]>

Upstream-repository: api
Upstream-commit: bb9b80e8278978efdb06a6f5d5682eb3cad330ec
/retest
/test e2e-aws-console-olm
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anik120, dinhxuanvu

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/test e2e-aws-console-olm
@anik120: All pull requests linked via external trackers have merged: Bugzilla bug 1982250 has been moved to the MODIFIED state.