Skip to content

Make CatalogSource the source of truth for available catalogs. #2779

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

benluddy
Copy link
Contributor

Internally, the catalog operator has always maintained a set of
registry clients for each CatalogSource. Although this set is
reconciled toward containing a client per CatalogSource object, there
is some latency before changes made to CatalogSources are reflected in
the client set, and differences between CatalogSources and client set
membership are a potential source of error.

Instead, the catalog operator should list CatalogSources from its
informer cache to determine which catalogs are reachable from a given
namespace.

@openshift-ci openshift-ci bot requested review from dinhxuanvu and timflannagan May 19, 2022 18:34
@openshift-ci
Copy link

openshift-ci bot commented May 19, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: benluddy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 19, 2022
@benluddy
Copy link
Contributor Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 19, 2022
for _, namespace := range namespaces {
cats, err := a.catsrcs.CatalogSources(namespace).List(labels.Everything())
if err != nil {
// todo: this must be properly exposed
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely needs to be addressed before merging.

}, nil
}

// this doesn't seem very robust. basically, this subscription condition should arrive directly at True/ErrorPreventedResolution without passing through any other non-Unknown states
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not happy with this test, but it consistently reproduced the bug for me on master.

@benluddy
Copy link
Contributor Author

Here's the progression of the ResolutionFailed condition with this patch. The second line is new:

v1alpha1.SubscriptionCondition{Type:"ResolutionFailed", Status:"Unknown", Reason:"", Message:"", LastHeartbeatTime:<nil>, LastTransitionTime:<nil>}
v1alpha1.SubscriptionCondition{Type:"ResolutionFailed", Status:"True", Reason:"ErrorPreventedResolution", Message:"error using catalog without-registry-server-q8l65 (in namespace subscription-e2e-q8q7t): registry server not reachable for catalogsource subscription-e2e-q8q7t/without-registry-server-q8l65", LastHeartbeatTime:<nil>, LastTransitionTime:<nil>}
v1alpha1.SubscriptionCondition{Type:"ResolutionFailed", Status:"True", Reason:"ErrorPreventedResolution", Message:"error using catalog without-registry-server-q8l65 (in namespace subscription-e2e-q8q7t): failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 10.96.106.125:50051: i/o timeout\"", LastHeartbeatTime:<nil>, LastTransitionTime:<nil>}

Without it, the unreachable catalog is treated as completely empty and resolution proceeds as normal. This can produce ResolutionFailed: true with reason ConstraintsNotSatsfiable, or it can actually choose to install something that is technically acceptable but lower priority than the candidate in the unreachable catalog. In other words, transient skew between the SourceStore and the informer cache can cause the catalog operator to behave nondeterministically.

@openshift-ci
Copy link

openshift-ci bot commented Jun 20, 2022

@benluddy: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 20, 2022
@anik120
Copy link
Contributor

anik120 commented Dec 15, 2022

Thanks for the PR @benluddy 🎉

Do you think you'll have some cycles to get this PR to a reviewable stage? Or would you prefer someone from the operator-framework team to take over this PR for you?

@oceanc80 oceanc80 force-pushed the catsrc-as-catalog-truth-not-registry-clients branch from df427eb to 87d4edf Compare December 16, 2022 20:24
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 16, 2022
return v1alpha1.SubscriptionCondition{}, err
}
cond := sub.Status.GetCondition(v1alpha1.SubscriptionResolutionFailed)
ctx.Ctx().Logf("%#v\n", cond)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean up logging

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 27, 2023
Internally, the catalog operator has always maintained a set of
registry clients for each CatalogSource. Although this set is
reconciled toward containing a client per CatalogSource object, there
is some latency before changes made to CatalogSources are reflected in
the client set, and differences between CatalogSources and client set
membership are a potential source of error.

Instead, the catalog operator should list CatalogSources from its
informer cache to determine which catalogs are reachable from a given
namespace.

Signed-off-by: Ben Luddy <[email protected]>
@benluddy benluddy force-pushed the catsrc-as-catalog-truth-not-registry-clients branch from 87d4edf to 7a2c454 Compare June 23, 2023 13:36
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 23, 2023
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 23, 2023
@openshift-merge-robot
Copy link
Collaborator

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@benluddy benluddy closed this Jan 26, 2024
@benluddy
Copy link
Contributor Author

I still think this is worthwhile, but I am several years out of the loop when it comes to operator framework issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants