
🌱 [WIP] Conditional Controllers #1527


Conversation

kevindelgado
Contributor

This is an updated implementation of the design from #1192 to add conditional controller support.

It offers a way to configure controllers to seamlessly start, stop, and restart themselves based on whether the CRDs they watch exist on the cluster. It does so by:

  1. Implementing a stop channel on the informer map's MapEntry, retrievable from the cache's Informers interface, that fires when the informer has stopped.
  2. Adding a new Source called ConditionalSource, implemented by ConditionalKind. It has a Ready() method that blocks until the kind's type is installed on the cluster and ready to be controlled, and a StartNotifyDone() method that starts the underlying source and returns a channel that fires when the source (and its underlying informer) have stopped. (See the sketch after this list.)
  3. Having controllers maintain a list of ConditionalSources, known as conditionalWatches, that can be added like regular watches via the controller's Watch method. For any ConditionalSource passed to Watch(), the controller uses Ready() to decide when to call StartNotifyDone(), and uses the done channel returned by StartNotifyDone() to detect when the watch has stopped and it should wait on Ready() again.
  4. Having the manager recognize a special type of Runnable, known as a ConditionalRunnable, whose Ready() method indicates when the underlying Runnable can be run.
  5. Letting users supply a boolean conditional option to the builder's For, Owns, and Watches inputs, which creates ConditionalRunnables to be run by the manager. (A hypothetical usage sketch follows the testing note below.)
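
Since these pieces are only described in prose here, below is a minimal Go sketch of how the ConditionalSource and ConditionalRunnable contracts and the controller's restart loop could fit together. The method names come from the description above; the exact signatures, parameters, and package layout are assumptions, not the code in this PR.

```go
// Illustrative sketch only: approximates the contracts described above.
// Signatures and package layout are assumptions; the PR's code may differ.
package conditional

import (
	"context"

	"k8s.io/client-go/util/workqueue"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// ConditionalSource is a source whose start is gated on the watched CRD
// being installed on the cluster.
type ConditionalSource interface {
	// Ready blocks until the kind's type is installed on the cluster and
	// ready to be controlled, or until ctx is cancelled.
	Ready(ctx context.Context) error

	// StartNotifyDone starts the underlying source and returns a channel
	// that fires when the source (and its underlying informer) has stopped,
	// e.g. because the CRD was uninstalled.
	StartNotifyDone(ctx context.Context, h handler.EventHandler,
		q workqueue.RateLimitingInterface) (<-chan struct{}, error)
}

// ConditionalRunnable is a Runnable the manager starts only once Ready
// reports that the underlying Runnable can be run.
type ConditionalRunnable interface {
	manager.Runnable
	Ready(ctx context.Context) error
}

// runConditionalWatch shows the loop a controller could run per conditional
// watch: wait for Ready, start the source, wait for it to stop, repeat.
func runConditionalWatch(ctx context.Context, cs ConditionalSource,
	h handler.EventHandler, q workqueue.RateLimitingInterface) error {
	for {
		if err := cs.Ready(ctx); err != nil {
			return err // context cancelled or unrecoverable failure
		}
		done, err := cs.StartNotifyDone(ctx, h, q)
		if err != nil {
			return err
		}
		select {
		case <-done:
			// Informer stopped (e.g. the CRD was uninstalled); loop back
			// and wait for the CRD to be reinstalled.
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}
```

The important property is that a controller with a conditional watch never terminates when the CRD disappears; it parks on Ready() and resumes once the type is reinstalled.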

So far this has been manually tested, and there is a single integration test that checks the whole CRD install/uninstall/reinstall flow. I intend to add more unit tests for each component that has changed, but wanted to get feedback on the overall design first.
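
For completeness, opting in through the builder (item 5 above) might look roughly like the following from a user's point of view. builder.WithConditional, examplev1.Widget, and WidgetReconciler are hypothetical placeholders used only to illustrate the shape of the opt-in; they are not the actual API added by this PR.

```go
// Hypothetical usage sketch; builder.WithConditional, examplev1.Widget, and
// WidgetReconciler are placeholders, not the exact API introduced here.
func setupWidgetController(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		// With the conditional option, a missing Widget CRD would not fail
		// manager startup; the watch waits for the CRD to appear and stops
		// again if it is uninstalled.
		For(&examplev1.Widget{}, builder.WithConditional(true)).
		Complete(&WidgetReconciler{})
}
```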

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 15, 2021
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 15, 2021
@kevindelgado
Contributor Author

/assign @DirectXMan12

@coderanger
Contributor

I think I'm missing some background here, but what use case is this for? This seems like it would only come up in operators which can do stuff to any specified object type based on runtime inputs? That seems legit but want to clarify since we've gotten questions about this kind of feature before when "two normal controllers that both have an early return check" was the far simpler solution :)

@kevindelgado
Contributor Author

I think I'm missing some background here, but what use case is this for? This seems like it would only come up in operators which can do stuff to any specified object type based on runtime inputs? That seems legit but want to clarify since we've gotten questions about this kind of feature before when "two normal controllers that both have an early return check" was the far simpler solution :)

The use case we are targeting is when the owner of the operator is not an admin of the cluster(s) it is operating on and thus it doesn’t have any control over when CRDs are installed and uninstalled.

As the owner of the controller/operator, my operator should still run successfully on such a cluster and gracefully transition from a state of reconciling when the CRD I’m operating on is present to waiting for that CRD to be reinstalled when it no longer exists on the cluster (and vice versa).

Running two separate controllers that intentionally stop themselves is not always viable when you’re running multiple operators in a single binary (e.g. multi-cluster, or just sharing a cache for perf reasons like kubernetes’ controller-manager).

@kevindelgado kevindelgado force-pushed the conditional-controllers-ready-startnotifydone branch from b5f1ed8 to 32f8b21 on May 18, 2021 23:10
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kevindelgado
To complete the pull request process, please ask for approval from directxman12 after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

@kevindelgado: The following tests failed, say /retest to rerun all failed tests:

Test name                              | Commit  | Details | Rerun command
pull-controller-runtime-apidiff-master | 32f8b21 | link    | /test pull-controller-runtime-apidiff-master
pull-controller-runtime-test-master    | 32f8b21 | link    | /test pull-controller-runtime-test-master

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 23, 2021
@k8s-ci-robot
Contributor

@kevindelgado: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 21, 2021
@cnfatal

cnfatal commented Oct 18, 2021

nice feature

@tossmilestone

@kevindelgado This is an awesome feature for us. We implemented a similar one ourselves, but in a hacky way because the informer's stop channel is not exposed. Hope this PR gets merged soon.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 1, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tossmilestone

/reopen

@k8s-ci-robot
Contributor

@tossmilestone: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tossmilestone

@kevindelgado Will this feature be abandoned? If you are busy, I can help work on it and get it done.

Labels
  • cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
  • lifecycle/rotten - Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • needs-rebase - Indicates a PR cannot be merged because it has merge conflicts with HEAD.
  • size/XL - Denotes a PR that changes 500-999 lines, ignoring generated files.
7 participants