Skip to content

wip(operators): watch preferred kinds dynamically #1670

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

njhale
Copy link
Member

@njhale njhale commented Jul 22, 2020

Description of the change:

  • Allow associating resources of all preferred GVKs in a cluster with an Operator resource.
  • Enable automatic adoption of components for all preferred GVKs

This means that any resource on the cluster can be labelled as an Operator component.

Motivation for the change:

The set of possible resource types that can comprise an operator changes dynamically as a cluster's control-plane is extended; w/ CRDs and extension APIServices. Often, an operator author considers certain custom resources of kinds they provide to be a component of their operator deployment ( e.g. a config CR).

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 22, 2020
@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: njhale

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 22, 2020
Watches(&source.Kind{Type: &apiregistrationv1.APIService{}}, enqueueCSV).
Watches(&source.Kind{Type: &operatorsv1alpha1.Subscription{}}, enqueueCSV).
Watches(&source.Kind{Type: &operatorsv1alpha1.InstallPlan{}}, enqueueCSV).
Watches(r.source, enqueueCSV).
Copy link
Member Author

@njhale njhale Jul 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to figure out a way to share caches/watches between controllers under a single manager. I don't like the idea watching and storing k|cluster resources|.

Comment on lines +273 to +278
var componentLists []runtime.Object
for _, gvk := range informable {
gvk.Kind = gvk.Kind + "List"
ul := &unstructured.UnstructuredList{}
ul.SetGroupVersionKind(gvk)
componentLists = append(componentLists, ul)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be deduplicated across the operator and adoption controller. This is probably generic enough to be it's own layer on top of the controller-runtime client.

@@ -218,6 +218,7 @@ func (o *Operator) AdoptComponent(component runtime.Object) (adopted bool, err e
labels[labelKey] = ""
adopted = true
}
m.SetLabels(labels)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes the changes stick for unstructured.Unstructured.

)

// InformableGVKs returns the set of GVKs available in discovery that support the list and watch verbs (informable).
func (d *Dynamic) InformableGVKs() (informables []schema.GroupVersionKind, err error) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to factor this out of Dynamic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getInformableGVKs might make the code more readable at the site this function is called.

https://dmitripavlutin.com/coding-like-shakespeare-practical-function-naming-conventions/#2-prefer-explanatory-names

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually quite a useful function in a standalone way

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getInformableGVKs might make the code more readable at the site this function is called.

https://dmitripavlutin.com/coding-like-shakespeare-practical-function-naming-conventions/#2-prefer-explanatory-names

I actually disagree, I think it's Go code convention not to have things like Get... in function names, unlike say JavaScript
https://golang.org/doc/effective_go.html#Getters

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a pet peeve of mine 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay fair point. I concede.

}

var resourceLists []*metav1.APIResourceList
resourceLists, err = d.client.ServerPreferredResources()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fails when discovery breaks for any resource.

informablesLock.RLock()
defer informablesLock.RUnlock()
log.Error(err, "could not get preferred resources, returning latest known gvks")
return knownInformables, nil
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to memoize the last successful APIResourceList and whittle it down with direct checks against the discovery client instead. This will let us weed out the problem resource types before returning.

}
}

}, 15*time.Second, d.stop)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we think about triggering this when APIServices and CRDs change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Maybe we can expand the InformableGVKs() function to not only memoize on known GVKs in case of discovery errors but also use that as a cache so that when new CRDs (GVKs) become available we can use the diff instead of going through the ServerPreferredResources again

}
}

}, 15*time.Second, d.stop)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this configurable?


}, 15*time.Second, d.stop)

synced.Wait()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Block until we set up the initial set of sources.

r := &unstructured.Unstructured{}
r.SetGroupVersionKind(gvk)
s := &source.Kind{Type: r}
if err = s.InjectCache(d.cache); err != nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

controller-runtime doesn't expose a way to stop individual sources/informers, see:

I'm curious to see if anyone has suggestions for how to get around this short of reimplementing controller-runtime's informerCache.

@Bowenislandsong
Copy link
Member

This PR failed tests for 1 times with 1 individual failed tests and 4 skipped tests. A test is considered flaky if failed on multiple commits.

totaltestcount: 1
flaketestcount: 1
skippedtestcount: 4
flaketests:

  • classname: End-to-end
    name: 'Subscription skip range'
    counts: 1
    details:
    • count: 1
      error: |4-

      /home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/test/e2e/subscription_e2e_test.go:92
      
      	Error Trace:	subscription_e2e_test.go:164
      	            				runner.go:113
      	            				runner.go:64
      	            				it_node.go:26
      	            				spec.go:215
      	            				spec.go:138
      	            				spec_runner.go:200
      	            				spec_runner.go:170
      	            				spec_runner.go:66
      	            				suite.go:62
      	            				ginkgo_dsl.go:226
      	            				ginkgo_dsl.go:214
      	            				e2e_test.go:54
      	Error:      	Received unexpected error:
      	            	etcdserver: request timed out
      
      /home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/vendor/github.com/stretchr/testify/require/require.go:1005
      
    meandurationsec: 28.239263
    skippedtests:
  • classname: End-to-end
    name: 'Subscription updates existing install plan'
    counts: 1
    details: []
    meandurationsec: 0.056757
  • classname: End-to-end
    name: 'Catalog image update'
    counts: 1
    details: []
    meandurationsec: 0.307005
  • classname: End-to-end
    name: 'Subscriptions create required objects from Catalogs Given a Namespace
    when a CatalogSource is created with a bundle that contains prometheus objects
    creating a subscription using the CatalogSource should have created the expected
    prometheus objects
    '
    counts: 1
    details: []
    meandurationsec: 2.184881
  • classname: End-to-end
    name: 'Subscriptions create required objects from Catalogs Given a Namespace
    when a CatalogSource is created with a bundle that contains prometheus objects
    creating a subscription using the CatalogSource should install the operator
    successfully
    '
    counts: 1
    details: []
    meandurationsec: 3.170511

@njhale
Copy link
Member Author

njhale commented Jul 23, 2020

/test e2e-gcp-upgrade

}

// InjectConfig injects the the REST Config dependency.
func (d *Dynamic) InjectConfig(config *rest.Config) (err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is InjectRESTConfigDependency too big of a name? It'll would get rid of the comment, and be self explanatory in the calling site too.

Copy link
Member Author

@njhale njhale Jul 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't disagree, but we don't control the method signatures used for injection.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll would get rid of the comment

Also, golangci-lint generally complains about exports w/o comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alas.

Copy link
Member

@exdx exdx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! just leaving some general feedback

synced.Add(1)

go wait.Until(func() {
defer once.Do(func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention here is to only call synced.Done() once after the function exits - but since the function is called within the Until() scope every 15 seconds are you sure this doesn't just get triggered the first time the function is executed within the Until loop?

This is probably standard code, just want to confirm.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention here is to only call synced.Done() once after the function exits - but since the function is called within the Until() scope every 15 seconds are you sure this doesn't just get triggered the first time the function is executed within the Until loop?

I'm not understanding the delineation you're making here. Could you rephrase?

}
}

}, 15*time.Second, d.stop)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Maybe we can expand the InformableGVKs() function to not only memoize on known GVKs in case of discovery errors but also use that as a cache so that when new CRDs (GVKs) become available we can use the diff instead of going through the ServerPreferredResources again

)

// InformableGVKs returns the set of GVKs available in discovery that support the list and watch verbs (informable).
func (d *Dynamic) InformableGVKs() (informables []schema.GroupVersionKind, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually quite a useful function in a standalone way

}

if len(informables) > 0 {
// Memoize the latest informables in case discovery fails on successive calls
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this touch a lot. May be interesting to play around with an e2e test that fails discovery calls intentionally and sees how the operator API behaves

&operatorsv1alpha1.SubscriptionList{},
&operatorsv1alpha1.InstallPlanList{},
&operatorsv1alpha1.ClusterServiceVersionList{},
informable, err := r.source.InformableGVKs()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this may take a while on an Openshift cluster with a ton of GVKs...maybe adding a log message indicating that this process is happening could be insightful

@exdx
Copy link
Member

exdx commented Jul 23, 2020

I think a lot of this work will be useful in the "OLM supports arbitrary kube types" feature upcoming in the next release. In fact if there's anything here that we could expose to that effort that would be super helpful.

@openshift-ci-robot
Copy link
Collaborator

@njhale: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 2, 2020
@openshift-ci-robot
Copy link
Collaborator

@njhale: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-gcp-upgrade e37184e link /test e2e-gcp-upgrade
ci/prow/e2e-upgrade e37184e link /test e2e-upgrade

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@njhale
Copy link
Member Author

njhale commented Sep 17, 2020

Closing this out to pursue an alternative: for each API group, whenever the default version set changes, spin up a new pod to watch the resources of that set.

@njhale njhale closed this Sep 17, 2020
@njhale njhale deleted the dynsrc branch July 23, 2021 02:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants