Enable leader election in ws-manager-mk2 #18419

Merged

merged 2 commits into main from aledbf/ha-mk2 on Aug 14, 2023
Conversation

aledbf (Member) commented Aug 2, 2023

Description

Summary generated by Copilot

🤖 Generated by Copilot at f4df6f5

This pull request enhances the ws-manager-mk2 component, which is responsible for managing workspaces in Gitpod. It fixes an import error, removes an unnecessary flag, and enables leader election for the controller.
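
For reference, enabling leader election in a controller-runtime based manager generally looks like the sketch below. This is a minimal illustration, not the actual ws-manager-mk2 code; the election ID and namespace are placeholder values.

```go
package main

import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	ctrl.SetLogger(zap.New())

	// With leader election enabled, only one replica actively reconciles;
	// the others block on the election and take over if the Lease is lost.
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:          true,
		LeaderElectionID:        "ws-manager-mk2.gitpod.io", // placeholder ID
		LeaderElectionNamespace: "default",                  // placeholder namespace
	})
	if err != nil {
		panic(err)
	}

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```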

Related Issue(s)

Fixes ENG-53

How to test

  • Open the Preview environment and check workspaces work as expected
  • Kill one of the ws-manager-mk2 pods and check the logs of the remaining one for the leader election taking place

Preview status

Gitpod was successfully deployed to your preview environment.

Build Options

Build
  • /werft with-werft
    Run the build with werft instead of GHA
  • leeway-no-cache
  • /werft no-test
    Run Leeway with --dont-test
Publish
  • /werft publish-to-npm
  • /werft publish-to-jb-marketplace
Installer
  • analytics=segment
  • with-dedicated-emulation
  • workspace-feature-flags
    Add desired feature flags to the end of the line above, space separated
Preview Environment / Integration Tests
  • /werft with-local-preview
    If enabled this will build install/preview
  • /werft with-preview
  • /werft with-large-vm
  • /werft with-gce-vm
    If enabled this will create the environment on GCE infra
  • with-integration-tests=all
    Valid options are all, workspace, webapp, ide, jetbrains, vscode, ssh. If enabled, with-preview and with-large-vm will be enabled.
  • with-monitoring

/hold

@aledbf changed the title from "Enable leader election in wa-manager-mk2" to "Enable leader election in ws-manager-mk2" Aug 2, 2023
@aledbf marked this pull request as ready for review August 2, 2023 18:30
@aledbf requested a review from a team as a code owner August 2, 2023 18:30
```diff
@@ -176,7 +176,7 @@ func deployment(ctx *common.RenderContext) ([]runtime.Object, error) {
 		},
 		Spec: appsv1.DeploymentSpec{
 			Selector: &metav1.LabelSelector{MatchLabels: labels},
-			Replicas: common.Replicas(ctx, Component),
+			Replicas: pointer.Int32(2),
```

Member

This could cause issues with activity timeouts. The load balancer will not know which instance is the leader so workspace activity could be reported to the standby and workspaces will time out. I think in practice it will not be a problem because we send heartbeats often enough that the leader will still get notified before that happens. Something to be aware of though.

Member

That's a good point. I think we should find a solution for that before going with 2 replicas; otherwise it will become hard to investigate any timeout issues.

Member Author

> This could cause issues with activity timeouts. The load balancer will not know which instance is the leader so workspace activity could be reported to the standby and workspaces will time out. I think in practice it will not be a problem because we send heartbeats often enough that the leader will still get notified before that happens. Something to be aware of though.

I think there's a misconception here: only the leader processes requests for the CRD types.

The load balancer only sees one instance, not two

Member Author

> That's a good point. I think we should find a solution for that before going with 2 replicas; otherwise it will become hard to investigate any timeout issues.

Can you expand on what you mean by timeout issues?

Furisto (Member) commented Aug 3, 2023

> The load balancer only sees one instance, not two

Why does it see only one? If we have two replicas, would it not send requests to both? Does the standby not become ready?

> Can you expand on what you mean by timeout issues?

The workspaces send activity requests to ws-manager in order to mark the workspace as active. We store this information in memory in ws-manager. If the requests go to the standby, the active manager (which checks the timeouts) will not have the information that the workspace is active, because that information is stored by the standby.
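
To make the concern concrete, here is a minimal sketch of the kind of in-memory activity tracking being described (hypothetical names, not the actual ws-manager code): if MarkActive ran on the standby replica, the leader's map would never see the heartbeat, and its timeout check would consider the workspace idle.

```go
package sketch

import (
	"sync"
	"time"
)

// activityTracker is a hypothetical sketch of per-replica, in-memory activity state.
type activityTracker struct {
	mu           sync.Mutex
	lastActivity map[string]time.Time // workspace ID -> last heartbeat
}

// MarkActive records a heartbeat on whichever replica received the request.
// If that replica is the standby, the leader never learns about it.
func (a *activityTracker) MarkActive(workspaceID string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.lastActivity[workspaceID] = time.Now()
}

// HasTimedOut is what the leader would check; it only sees its own local map.
func (a *activityTracker) HasTimedOut(workspaceID string, timeout time.Duration) bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	last, ok := a.lastActivity[workspaceID]
	return !ok || time.Since(last) > timeout
}
```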

Member Author

Correct, the standby is not ready to serve, but it still passes the readiness health checks.

You can see this in the preview environment by killing the current leader while tailing the other pod's logs.

Member Author

> We store this information in memory in ws-manager

This would be wrong if, when we switch the leader, the other pod started killing workspaces without waiting for status reports from the workspaces.

Member Author

There's a callback we can implement when a pod is elected the new leader, so we can wait for workspace reports before changing anything.
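
For illustration, controller-runtime exposes the leadership transition through the manager's Elected() channel, so the idea could look roughly like this. This is a hypothetical helper, not the actual implementation; the grace period is an assumption.

```go
package sketch

import (
	"context"
	"time"

	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// waitForLeadershipThenWarmUp is a hypothetical helper: it blocks until this
// replica wins the leader election, then waits a grace period so workspaces
// can report fresh activity before any timeout handling kicks in.
func waitForLeadershipThenWarmUp(ctx context.Context, mgr manager.Manager, grace time.Duration) error {
	select {
	case <-mgr.Elected(): // channel is closed once this replica becomes the leader
	case <-ctx.Done():
		return ctx.Err()
	}

	select {
	case <-time.After(grace): // grace period for workspaces to send status reports
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}
```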

@roboquat merged commit 12d7430 into main Aug 14, 2023
@roboquat deleted the aledbf/ha-mk2 branch August 14, 2023 08:28
Furisto (Member) commented Aug 14, 2023

We have decided that the impact of the above scenario is smaller than the impact of not merging this PR.

aledbf added a commit that referenced this pull request Aug 14, 2023
roboquat pushed a commit that referenced this pull request Aug 14, 2023