Skip to content

✨ added per controller leader election #444

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

GrigoriyMikhalkin
Copy link
Contributor

Attempt to solve #364

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 25, 2019
@GrigoriyMikhalkin GrigoriyMikhalkin changed the title ✨added leader election per controller ✨added per controller leader election May 25, 2019
@GrigoriyMikhalkin
Copy link
Contributor Author

/assign @droot

Copy link
Member

@mengqiy mengqiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add some validation to check if there are 2 different controller or manager using the same LE namespace and ID when starting the manager.
We may need to add a new method in the controller interface to get the LE config.

"sigs.k8s.io/controller-runtime/pkg/source"
)

const (
// Values taken from: https://github.com/kubernetes/apiserver/blob/master/pkg/apis/config/v1alpha1/defaults.go
defaultLeaseDuration = 15 * time.Second
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are shared bits between controller and manager packages and they can potentially moved to sigs.k8s.io/controller-runtime/pkg/leaderelection/internal.

@GrigoriyMikhalkin
Copy link
Contributor Author

GrigoriyMikhalkin commented May 30, 2019

We should add some validation to check if there are 2 different controller or manager using the same LE namespace and ID when starting the manager.

@mengqiy You mean, check duplicate LE lock names used by manager and controllers that it manages? Or you mean totally unrelated managers and controllers? If later, i'm not sure how we can check that LE lock is created by different controller or manager.

@mengqiy
Copy link
Member

mengqiy commented May 30, 2019

You mean, check duplicate LE lock names used by manager and controllers that it manages? Or you mean totally unrelated managers and controllers? If later, i'm not sure how we can check that LE lock is created by different controller or manager.

I mean the former.

Copy link
Contributor

@DirectXMan12 DirectXMan12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this all functions somewhat awkwardly with the "needs leader election" logic we've just implemented. I think what we really want to do is let things that request leader election (via the interface) return an "id suffix", then have the manager take care of launching the leader election.

@DirectXMan12
Copy link
Contributor

that would also allow the manager to check for duplicate ids within a given manager

@GrigoriyMikhalkin
Copy link
Contributor Author

@mengqiy @DirectXMan12 Completely rewrote code. Added new runnable interface(called it SingletonRunnable, probably not the best naming). Now manager takes care of leader election for controllers, also checks that there's no duplicate LE IDs.

Copy link
Contributor

@DirectXMan12 DirectXMan12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already have the LeaderElectionRunnable interface, we should be able to use that.

@GrigoriyMikhalkin
Copy link
Contributor Author

GrigoriyMikhalkin commented Jun 4, 2019

we already have the LeaderElectionRunnable interface, we should be able to use that.

@DirectXMan12 LeaderElectionRunnable interface used for checking if runnable have to be run before or after manager was elected. Are you implying that we can add GetLeaderElectionID method to it and use NeedLeaderElection to determine if need to perform leader election on runnable? It seems like it can be a breaking change for some users(who implemented they own runnables), would it be ok?

@DirectXMan12
Copy link
Contributor

DirectXMan12 commented Jun 5, 2019

interface used for checking if runnable have to be run before or after manager was elected

Correct -- "run before" is equivalent to "don't need leader election".

We could actually just have NeedLeaderElection return an ID as well as a bool (or we could have the separate "GetID" method).

It seems like it can be a breaking change for some users(who implemented they own runnables), would it be ok

We're doing breaking changes this release anyway, but we could still have the default be to run under some standard leader election ID from the manager.

@GrigoriyMikhalkin GrigoriyMikhalkin force-pushed the CR-364 branch 2 times, most recently from 2a0249f to 62aa08d Compare June 9, 2019 08:52
@GrigoriyMikhalkin GrigoriyMikhalkin changed the title ✨added per controller leader election ✨ added per controller leader election Jun 9, 2019
Copy link
Contributor

@DirectXMan12 DirectXMan12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is going in the right direction, but I think you need to pull out the global leader lock, which isn't serving any purpose anymore.

if resourceLock := cm.resourceLocksMap[cm.leaderElectionID]; resourceLock != nil {
err := cm.startLeaderElection(cm.leaderElectionID, leaderelection.LeaderCallbacks{
OnStartedLeading: func(_ context.Context) {
cm.startLeaderElectionRunnables()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this double-leader election here -- in fact, it kinda defeats the purpose of per-controller leader election.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really, you want to try to start all of the locks independently.


// Check that leader election ID is unique
if _, exists := cm.resourceLocksMap[leID]; exists {
cm.errChan <- errors.New("LeaderElectionID must be unique")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's not entirely true -- a group of controllers could choose to run under the same leader election

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(which might make sense if they needed the same resources, etc.

@GrigoriyMikhalkin
Copy link
Contributor Author

Updated. Now there's no global leader lock. But i use sameLeaderElectionNamespace(and different duration parameters) for all per controller locks, is that fine? IMHO, i think it is, seems like specifying different namespaces for locks is redundant.

@mengqiy
Copy link
Member

mengqiy commented Jun 17, 2019

But i use same LeaderElectionNamespace (and different duration parameters) for all per controller locks, is that fine? IMHO, i think it is, seems like specifying different namespaces for locks is redundant.

How to handle LE namespace is debatable. Using the same LeaderElectionNamespace probably is not flexible enough. What if some controllers want to do LE in the same namespace as the manager and some controllers want to do LE in a dedicated namespace e.g. leader-election-system or kube-system?
IMHO we can make GetID return namespace and object ID or have another function for getting namespace e.g. GetNamespace(). If getting an empty namespace, we still default it to the same namespace as the manager.

@DirectXMan12 WDYT?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 28, 2019
@DirectXMan12
Copy link
Contributor

Outcome of the f2f at kubecon was to support this, but default to manager's LE ID. We need to rebase this and make sure that's the state it's in. I'll take a poke at it.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 5, 2020
@vincepri
Copy link
Member

@GrigoriyMikhalkin are you still interested in working on this change?

@GrigoriyMikhalkin
Copy link
Contributor Author

@GrigoriyMikhalkin are you still interested in working on this change?

Yes. I'm gonna update this PR this weekend.

@vincepri
Copy link
Member

Thank you!

@GrigoriyMikhalkin
Copy link
Contributor Author

GrigoriyMikhalkin commented Feb 23, 2020

Outcome of the f2f at kubecon was to support this, but default to manager's LE ID.

@DirectXMan12 So, by default, we want to make leader election for controllers?

@k8s-ci-robot k8s-ci-robot removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 23, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: GrigoriyMikhalkin
To complete the pull request process, please assign directxman12
You can assign the PR to them by writing /assign @directxman12 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Feb 23, 2020
@DirectXMan12
Copy link
Contributor

Basically, by default, keep the current behavior unless explicitly overridden by the controller.

@GrigoriyMikhalkin
Copy link
Contributor Author

Updated PR. Now LeaderElectionRunnable should implement GetLeaderElectionMode method, which returns one of (Non, per-manager, per-controller) leader election modes. Controllers, by default, use per-manager mode(current behavior).

@DirectXMan12, @vincepri please review

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 9, 2020
@k8s-ci-robot
Copy link
Contributor

@GrigoriyMikhalkin: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 9, 2020
@GrigoriyMikhalkin
Copy link
Contributor Author

Planning to fix conflicts this weekend. Is there still plans to merge this?

// e.g. controllers need to be run in leader election mode, while webhook server doesn't.
NeedLeaderElection() bool
// GetLeaderElectionMode returns leader election mode in which Runnable needs to be run.
GetLeaderElectionMode() leaderelection.Mode
Copy link
Member

@alvaroaleman alvaroaleman Jun 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can not do this. This will break all existing LeaderElection runnables in a way that they won't work anymore but without throwing any kind of compile error.

The current interface must stay as it is and remain having the same result as it has today. We can probably add a new interface and check for both.

@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 10, 2020
@fejta-bot
Copy link

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Copy link
Contributor

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants