Skip to content

Commit 6a6ffe4

Browse files
committed
Initial ConditionalControllers design doc
1 parent 6af4e7c commit 6a6ffe4

File tree

1 file changed

+204
-0
lines changed

1 file changed

+204
-0
lines changed

designs/conditional-controllers.md

Lines changed: 204 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,204 @@
1+
Conditionally Runnable Controllers
2+
==========================
3+
4+
## Summary
5+
6+
Enable controller managers to successfully operate when the CRD the controller
7+
is configured to watch does not exist.
8+
9+
Successful operation of a controller manager includes:
10+
* Starts and runs without error when a CRD the controller watches does not exist.
11+
* Begins watching the CRD once it is installed.
12+
* Unregisters (stops watching) once a CRD is uninstalled.
13+
14+
## Background/Motivation
15+
16+
Usually there is a 1:1 relationship between controller and resource and a 1:1
17+
relationship between controller and k8s cluster. When this is the case it is
18+
fine to assume that for a controller to run successfully, the resource it
19+
controls must be installed on the cluster that the controller is watching.
20+
21+
We are now seeing use cases where a single controller is responsible for
22+
multiple resources across multiple clusters. This creates situations where
23+
controllers need to successfully run even when the resource is not installed,
24+
and need to proceed successfully as it begins/terminates its watch on a resource
25+
when the resource is installed/uninstalled.
26+
27+
The current approach is to check the discovery for a CRDs existence prior to
28+
adding the controller to the manager. This has its limitations, as complexity
29+
increases greatly for users who need to manage multiple controllers for a
30+
mixture of CRDs that might not always be installed on the cluster or are
31+
installed in different order.
32+
33+
While there is a lot of groundwork needed to fully support multi-cluster
34+
controllers, this proposal offers incremental steps to supporting one specific use case
35+
(conditionally runnable controllers), of which hopefully some minimal form can be agreed
36+
upon before needing a comple multi-cluster story.
37+
38+
## Goals
39+
40+
(In order from most necessary to least)
41+
1. A mechanism for starting/stopping/restarting controllers and their caches.
42+
2. A solution for running a controller conditionally, such that it automatically
43+
starts/stops/restarts upon installation/uninstallation/reinstallation of its
44+
respective CRD in the cluster.
45+
3. An easy-to-use mechanism for solving goal #2 without the end user needing to
46+
understand too much.
47+
48+
## Non-Goals
49+
50+
TODO
51+
52+
## Proposal
53+
54+
The following proposal attempts to address the three goals in four separate
55+
steps.
56+
57+
The first goal of enabling the possibility of starting/stopping/restarting
58+
controllers is addressed in two parts, first where a mechanism is built to
59+
enable the removal of informers, and second where we expose this mechanism and
60+
enable the restarting of controllers
61+
62+
The second goal of a solution for running controllers conditionally is addressed
63+
by creating a wrapper around a controller called a ConditionalController that
64+
within its Start() function, it polls the discovery doc for the existence of the
65+
watched object and starts/stops/restarts the underlying controller as necessary
66+
67+
The third goal of an ergonomic mechanism to use ConditionalControllers is a
68+
small modification to the controller builder to enable running a controller as a
69+
ConditionalController.
70+
71+
###### Proof of concept
72+
A proof of concept PR exists at
73+
[#1180](https://github.com/kubernetes-sigs/controller-runtime/pull/1180)
74+
75+
Each commit maps to one step in the proposal and can loosely be considered going
76+
from most necessary to least.
77+
78+
### Informer Removal [Pre-req]
79+
80+
This proposal assumes a mechanism exists for removing individual informers. We
81+
are agnostic to how this is done, but the current proposal is built on top of
82+
Shomron’s proposed implementation of individual informer removal
83+
[#936](https://github.com/kubernetes-sigs/controller-runtime/pull/936).
84+
85+
86+
Risks/mitigations and alternatives are discussed in the linked PR as well as the
87+
corresponding issue
88+
[#935](https://github.com/kubernetes-sigs/controller-runtime/issues/935).
89+
90+
We are currently looking into whether that PR can be distilled further into a more
91+
minimal solution.
92+
93+
### Minimal hooks
94+
95+
Given that a mechanism exists to remove individual informers, the next step is
96+
to expose this removal functionality and enable safely
97+
starting/stopping/restarting controllers and their caches.
98+
99+
The proposal to do this is:
100+
1. Expose the `informerCache.Remove()` functionality on the cache interface.
101+
2. Expose the ability to reset the internal controller’s `Started` field.
102+
3. Expose a field on the internal controller to determine whether to `saveWatches`
103+
or not and use this field to not empty the controller’s `startWatches` when the
104+
controller is stopped.
105+
106+
#### Risks and Mitigations
107+
108+
* [#1139](https://github.com/kubernetes-sigs/controller-runtime/pull/1139) discusses why
109+
the ability to start a controller more than once was taken away. It's a little
110+
unclear what effect explicitly enabling multiple starts in the case of
111+
conditional controllers will hae on the number of workers and workqueues
112+
relative to expectations and metrics.
113+
* [#1163](https://github.com/kubernetes-sigs/controller-runtime/pull/1163) discusses the
114+
memory leak caused by no clearing out the watches internal slice. A possible
115+
mitigation is to clear out the slices upon ConditionalController shutdown.
116+
117+
#### Alternatives
118+
119+
* Instead of exposing ResetStart and SaveWatches on the internal controller struct
120+
it might be better to expose it on the controller interface. This is more public
121+
and creates more potential for abuse, but it prevents some hacky solutions
122+
discussed below around needing to cast to the internal controller or create
123+
extra interfaces.
124+
125+
### Conditional Controllers
126+
127+
With the minimal hooks needed to start/stop/restart controllers and their caches
128+
in place, the next step is to provide a wrapper controller around a traditional
129+
controller that starts/stops the underlying controller based on the existence of
130+
the CRD under watch.
131+
132+
The proposal to do this:
133+
1. `ConditionalController` that takes implements controller and with in Start()
134+
method:
135+
2. Polls the discovery doc every configurable amount of time and recognizes when
136+
the CRD has been installed/uninstalled.
137+
3. Upon installation it merges the caller’s stop channel with a local start channel
138+
and runs the underlying controller with the merged stop channel such that both
139+
the caller and this conditional controller can stop it.
140+
4. Upon uninstallation it sets the controllers `Started` field to false so that it
141+
can be restarted, indicates that the controller should save its watches upon
142+
stopping, and then stops the controller via the local stop channel and removes
143+
the cache for the object under watch.
144+
145+
#### Risks and Mitigations
146+
147+
* With the above minimal hooks exposing `ResetStart()` and `SaveWatches()` on the
148+
internal controller this creates the need to create a hacky intermediate
149+
interface (StoppableController) for testing and to avoid casting to the internal
150+
controller. The simplest solution is just to expose these on the controller
151+
interface (see above alternative), but there might be a better way.
152+
153+
#### Alternatives
154+
155+
* A more general solution could allow for conditional runnables of more than just
156+
controllers, where the user could supply the conditional check on their own,
157+
such that it’s not just limited to looking at the existence of a CRD. (but we
158+
believe that the common case will be a controller watching CRD install and thus
159+
there should still be a simple way to do this)
160+
* Nit: it's unclear if the controller package is the best place for this to live.
161+
Originally it was thought that controllerutil might be best, but that creates an
162+
import cycle.
163+
164+
### Builder Ergonomics
165+
166+
Since we provide simple ergonomics for creating controllers (builder), it would
167+
make sense to do the same for any conditional controller utility we create.
168+
169+
The current proposal (link) is to:
170+
1. Provide a forInput boolean option that the user can set to enable conditionally
171+
running and an option to configure the wait time that the ConditionalController
172+
will wait between attempts to poll the discovery doc.
173+
2. In the builder’s doController function, it will first create an unmanaged
174+
controller.
175+
3. If the conditionally run option is set, it will wrap the unmanaged controller in
176+
a conditional controller.
177+
4. Add the resulting controller to the manager.
178+
179+
#### Risks and Mitigations
180+
181+
TODO
182+
183+
#### Alternatives
184+
* A separate builder specifically for the conditional controller would prevent us
185+
from having to make changes to the current builder.
186+
* As noted above a more general conditional runnable will probably require its own
187+
builder.
188+
189+
## Acceptance Criteria
190+
191+
Ultimately, the end user must have some successful solution that involves
192+
starting with a controller for a CRD that can
193+
1. Run the mgr without CRD installed yet and mgr should start successfully
194+
2. After CRD installation the controller should start and run successfully
195+
3. Upon uninstalling the CRD, the controller should stop but the manager should
196+
continue running successfully
197+
4. Upon CRD reinstallation, the controller should once again start and run
198+
successfully with no disruption to the manager and its other runnables.
199+
200+
## Timeline of Events
201+
* 9/30/2020: Propose idea in design doc and proof-of-concept PR to
202+
controller-runtime
203+
* 10/8/2020: Discuss idea in community meeting
204+

0 commit comments

Comments
 (0)