Skip to content

Commit 2c01809

Browse files
vincepriRainbowMango
authored andcommitted
🐛 Fix a race condition between leader election and recorder
This change introduces better syncronization between the leader election code and the event recorder. Running tests with -race flag, we often saw a panic on a closed channel, the channel was the one that the event recorder was using internally. After digging more through the code, it seems that we weren't properly waiting for leader election code to stop completely, but instead we were only calling the cancel() function asking the leader election to stop. With this change, during a shutdown, we now wait for leader election to finish up any internal task before we return and close an internal channel. Only after leader election signals that the channel has been closed, and Run(...) has properly returned, we return execution to the stop procedure, where the event recorder is then stopped. Signed-off-by: Vince Prignano <[email protected]>
1 parent d8f70ba commit 2c01809

File tree

2 files changed

+15
-1
lines changed

2 files changed

+15
-1
lines changed

pkg/manager/internal.go

Lines changed: 14 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -121,6 +121,10 @@ type controllerManager struct {
121121
// it must be deferred until after gracefulShutdown is done.
122122
leaderElectionCancel context.CancelFunc
123123

124+
// leaderElectionStopped is an internal channel used to signal the stopping procedure that the
125+
// LeaderElection.Run(...) function has returned and the shutdown can proceed.
126+
leaderElectionStopped chan struct{}
127+
124128
// stop procedure engaged. In other words, we should not add anything else to the manager
125129
stopProcedureEngaged bool
126130

@@ -557,7 +561,12 @@ func (cm *controllerManager) waitForRunnableToEnd(shutdownCancel context.CancelF
557561
// Cancel leader election only after we waited. It will os.Exit() the app for safety.
558562
defer func() {
559563
if cm.leaderElectionCancel != nil {
564+
// After asking the context to be cancelled, make sure
565+
// we wait for the leader stopped channel to be closed, otherwise
566+
// we might encounter race conditions between this code
567+
// and the event recorder, which is used within leader election code.
560568
cm.leaderElectionCancel()
569+
<-cm.leaderElectionStopped
561570
}
562571
}()
563572

@@ -660,7 +669,11 @@ func (cm *controllerManager) startLeaderElection() (err error) {
660669
}
661670

662671
// Start the leader elector process
663-
go l.Run(ctx)
672+
go func() {
673+
l.Run(ctx)
674+
<-ctx.Done()
675+
close(cm.leaderElectionStopped)
676+
}()
664677
return nil
665678
}
666679

pkg/manager/manager.go

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -359,6 +359,7 @@ func New(config *rest.Config, options Options) (Manager, error) {
359359
livenessEndpointName: options.LivenessEndpointName,
360360
gracefulShutdownTimeout: *options.GracefulShutdownTimeout,
361361
internalProceduresStop: make(chan struct{}),
362+
leaderElectionStopped: make(chan struct{}),
362363
}, nil
363364
}
364365

0 commit comments

Comments
 (0)