✨ Allow removing individual informers from the cache (#935) #936

Conversation
This change adds support for removing individual informers from the cache using the new Remove() method. This is allowed before or after the cache has been started. Informers are stopped at the time of removal - once stopped, they will no longer deliver events to registered event handlers, and registered watches will be aborted. Also adds a non-blocking API for getting informers without waiting for their cache to sync - GetInformerNonBlocking(). Signed-off-by: Oren Shomron <[email protected]>
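The mechanics behind per-informer removal can be sketched in plain Go. The following is a hypothetical, simplified model (the names `informerCache`, `informerEntry`, and the string key are illustrative, not the PR's actual types): each informer carries its own stop channel, and removal closes that channel and drops the entry from the map.

```go
package main

import (
	"fmt"
	"sync"
)

// informerEntry tracks one running informer and the channel used to stop
// just that informer, independent of the cache-wide stop channel.
type informerEntry struct {
	stop chan struct{}
}

// informerCache is a minimal stand-in for the real cache: it maps a
// GroupVersionKind-like key to its informer entry.
type informerCache struct {
	mu        sync.Mutex
	informers map[string]*informerEntry
}

func newInformerCache() *informerCache {
	return &informerCache{informers: make(map[string]*informerEntry)}
}

// getInformer returns the entry for gvk, creating it if needed. In the
// real cache this would also start the informer; here we only track it.
func (c *informerCache) getInformer(gvk string) *informerEntry {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.informers[gvk]; ok {
		return e
	}
	e := &informerEntry{stop: make(chan struct{})}
	c.informers[gvk] = e
	return e
}

// remove stops the informer for gvk (by closing its stop channel) and
// drops it from the map, mirroring the Remove() behavior the PR describes.
func (c *informerCache) remove(gvk string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.informers[gvk]; ok {
		close(e.stop)
		delete(c.informers, gvk)
	}
}

func (c *informerCache) has(gvk string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	_, ok := c.informers[gvk]
	return ok
}

func main() {
	c := newInformerCache()
	e := c.getInformer("apps/v1, Kind=Deployment")
	c.remove("apps/v1, Kind=Deployment")
	select {
	case <-e.stop:
		fmt.Println("informer stopped")
	default:
		fmt.Println("informer still running")
	}
	fmt.Println("present:", c.has("apps/v1, Kind=Deployment"))
}
```

A subsequent getInformer for the same key would create a fresh entry, which matches the "allowed before or after the cache has been started" behavior described above.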
Welcome @shomron!
Hi @shomron. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: shomron. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
@vincepri what's the best way to get some 👀 on this issue and decide how to proceed?
/ok-to-test
Sorry if I am missing some context, do we have an issue that describes the need for this? Sorry if I am out of the loop a little, just trying to understand when this would be used.
@shawn-hurley yes, the linked issue is #935 and the use case is described there. Let me know if you have any questions, I'd be happy to elaborate further.
```diff
@@ -126,7 +126,7 @@ func defaultOpts(config *rest.Config, opts Options) (Options, error) {
 	// Construct a new Mapper if unset
 	if opts.Mapper == nil {
 		var err error
-		opts.Mapper, err = apiutil.NewDiscoveryRESTMapper(config)
+		opts.Mapper, err = apiutil.NewDynamicRESTMapper(config)
```
This is a good change but should be its own PR as it doesn't really have anything to do with this PR
Fair enough, I will move this into a separate PR. 👍🏻
@shomron this means that controllers that use the given informer for their watch will silently degrade, correct?
@alvaroaleman yes that is correct. In our use case, we have a primary controller multiplexing watch events from dynamically watched resources and forwarding them on to interested parties using a stable
Okay. So from what I can see on a quick look, it is currently not possible to find out if an informer has event handlers registered (correct me if I missed something there). This means that there is no way for us to prevent removal of informers that are being used in a watch by a controller, and that in turn means that the implementation in this PR introduces a huge footgun. That being said, it would be great to support your use case, we've had something similar as well in the past. What prevents you from creating a cache per watch? cc @thetechnick
@alvaroaleman thank you for looking at this! Let me try to address your points: Regarding checking whether the informer has registered event handlers - in our case we know the informer has registered event handlers, and explicitly want to stop watching. I think the issue is not introspection, but that event handlers cannot be removed from informers. Tangentially, as I called out in #863 (comment) - it is now possible to stop an unmanaged controller with registered event handlers - which I believe would cause its work queue to grow unbounded.
This seems less than ideal. The role of the cache is to manage the lifecycle of informers over disparate resources and provide reader/writer abstractions on top. If we created a cache-per-resource, we would end up implementing a cache of caches, where each cache manages a single informer. I feel we could achieve a more concise design. Thoughts?
True, deregistering is the better solution. My point here is that this introduces a silent and hard-to-debug failure mode for everyone in order to fix a use case that is, in the grand scheme of things, rather exotic. And that is in my opinion not a good trade-off. Maybe we can add refcounting ourselves on top of the underlying informer and refuse to remove them if they have registered event handlers? Potentially with the option to say "I know this has N event handlers but this is okay because I am also stopping the controller"? This is far from an elegant solution, but from what I can see it's the best we can do without upstream changes. Maybe it would be a good idea to also start a discussion with upstream sig-apimachinery folks about whether they would consider adding an option to remove event handlers? Tangentially, do you know if it's safe to stop individual informers, or if that might leak goroutines and/or memory?
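The refcounting idea above can be sketched roughly as follows. This is a hypothetical illustration, not part of the PR: `countingInformer`, its methods, and the `force` flag are invented names showing the "refuse removal while handlers remain, unless the caller explicitly overrides" shape of the proposal.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// countingInformer wraps the refcounting proposal: track how many event
// handlers are registered and refuse removal while any remain.
type countingInformer struct {
	mu       sync.Mutex
	handlers int
	stopped  bool
}

func (i *countingInformer) AddEventHandler() {
	i.mu.Lock()
	defer i.mu.Unlock()
	i.handlers++
}

func (i *countingInformer) RemoveEventHandler() {
	i.mu.Lock()
	defer i.mu.Unlock()
	if i.handlers > 0 {
		i.handlers--
	}
}

// ErrHandlersRegistered is returned when removal would silently degrade
// a controller that still watches through this informer.
var ErrHandlersRegistered = errors.New("informer still has registered event handlers")

// Remove stops the informer. If force is false and handlers are still
// registered, it refuses; force=true is the "I know this has N event
// handlers but I am also stopping the controller" escape hatch.
func (i *countingInformer) Remove(force bool) error {
	i.mu.Lock()
	defer i.mu.Unlock()
	if i.handlers > 0 && !force {
		return ErrHandlersRegistered
	}
	i.stopped = true
	return nil
}

func main() {
	inf := &countingInformer{}
	inf.AddEventHandler()
	if err := inf.Remove(false); err != nil {
		fmt.Println("refused:", err)
	}
	if err := inf.Remove(true); err == nil {
		fmt.Println("force-removed")
	}
}
```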
I completely agree that this isn't an ideal solution, it's a workaround. The trade-off that is in this PR right now (big footgun for everyone to support a somewhat uncommon use case) is in my opinion not a good one. But I'd love for others to weigh in on that.
I'm torn on this. I think the use case totally makes sense. I also agree with everything Alvaro said about it being a footgun and possible solutions. I'm curious what other API machinery folks think about (de)registering event handlers, as that's definitely the most pure solution, albeit probably very non-trivial. If I had full choice, I'd probably prefer doc'ing a cache of caches as a short-term workaround for this case and fixing the event handlers long term to enable this scenario. Third opinions?
I ran into https://docs.google.com/document/d/1Wi3LM3sG6Qgfzm--bWb6R0SEKCkQCCt-ene6cO62FlM/edit?usp=sharing which touches on this use case
Having read the whole discussion, it might be worth taking a step back and writing a small proposal following the examples we already have in docs/proposals. In general, the use case is valid. That said, we need to make sure we don't open up a whole set of potential issues that weren't possible before. I'd like to see a way for us to support this feature while leaning on the side of safety and explicit user configuration/intent.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
I've been discussing a use case with some folks around this. The general gist is that a given CRD may or may not exist on a cluster, and you want to gracefully degrade when it's not present, but you want to run a controller while it is. In various cases you don't actually want to restart the manager, so it's useful to be able to say "run this controller and equivalent informers while a resource is available, otherwise don't".
All that being said, a decent first step for this might be to keep it internal, use it for that use case, and work on refcounting or something in parallel.
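The "run while available" pattern above can be sketched in plain Go. Everything here is hypothetical (the `supervisor` type and `setAvailable` hook are invented for illustration): a discovery poll would report whether the CRD exists, and the supervisor starts the controller when it appears and closes its stop channel when it disappears.

```go
package main

import (
	"fmt"
	"sync"
)

// supervisor runs a controller (any func taking a stop channel) only
// while its CRD is available, and stops it when the CRD goes away.
type supervisor struct {
	mu      sync.Mutex
	stop    chan struct{}
	wg      sync.WaitGroup
	running bool
}

// setAvailable would be invoked by some discovery poll each time it
// learns whether the CRD currently exists on the cluster.
func (s *supervisor) setAvailable(available bool, run func(stop <-chan struct{})) {
	s.mu.Lock()
	defer s.mu.Unlock()
	switch {
	case available && !s.running:
		s.stop = make(chan struct{})
		s.running = true
		s.wg.Add(1)
		stop := s.stop
		go func() {
			defer s.wg.Done()
			run(stop)
		}()
	case !available && s.running:
		close(s.stop) // the controller's informers/watches wind down here
		s.running = false
	}
}

// wait blocks until any running controller goroutine has exited.
func (s *supervisor) wait() { s.wg.Wait() }

func main() {
	s := &supervisor{}
	started := make(chan struct{})
	s.setAvailable(true, func(stop <-chan struct{}) {
		close(started)
		<-stop // pretend to reconcile until told to stop
	})
	<-started
	s.setAvailable(false, nil)
	s.wait()
	fmt.Println("controller stopped after CRD went away")
}
```

This is exactly where per-informer removal matters: stopping only the controller's informers, rather than the whole manager, is what the PR's Remove() enables.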
```go
// via the provided stop argument.
func (e *MapEntry) Start(stop <-chan struct{}) {
	// Stop on either the whole map stopping or just this informer being removed.
	internalStop, cancel := anyOf(stop, e.stop)
```
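For context, `anyOf` merges stop channels: it derives a channel that closes as soon as either input closes, plus a cancel func that releases the internal goroutine. The sketch below shows the channel-merge pattern under that assumption; the PR's actual helper may differ in signature.

```go
package main

import (
	"fmt"
	"sync"
)

// anyOf returns a channel closed when either a or b closes, and a cancel
// func that tears down the goroutine if neither ever closes.
func anyOf(a, b <-chan struct{}) (<-chan struct{}, func()) {
	out := make(chan struct{})
	cancelCh := make(chan struct{})
	var once sync.Once
	cancel := func() { once.Do(func() { close(cancelCh) }) }
	go func() {
		defer close(out)
		select {
		case <-a: // whole map stopping
		case <-b: // just this informer being removed
		case <-cancelCh: // caller no longer cares
		}
	}()
	return out, cancel
}

func main() {
	mapStop := make(chan struct{})
	informerStop := make(chan struct{})
	merged, cancel := anyOf(mapStop, informerStop)
	defer cancel()
	close(informerStop) // removing just this informer...
	<-merged            // ...also closes the merged stop channel
	fmt.Println("informer stopped")
}
```

This is also why the review points at #1116: a context-based stop mechanism would make this kind of ad-hoc channel merging unnecessary.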
#1116 should help here
Ahead of tomorrow's community meeting, I've been working on a more formal design doc for this as requested above. See #1192
@shomron: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
gonna close this for the moment, we'll roll it into whatever we decide to do with the stoppable controllers work and give attribution. |
Currently marked as ✨ but depending on whether we decide to extend existing interfaces, may become breaking ⚠️.
This PR is introduced without comprehensive tests to give an opportunity to iron out the API with the maintainers.
Signed-off-by: Oren Shomron [email protected]