Enable leader election in ws-manager-mk2 #18511


Merged
merged 6 commits on Aug 17, 2023

Conversation

aledbf (Member) commented Aug 14, 2023

Description

Replaces #18419

This introduces an edge case: when maintenance mode is triggered and we deploy a new version, the standby replica never receives the ConfigMap update. We change the strategy to have one or more standby replicas waiting to become the leader, with all replicas watching the configuration ConfigMap.
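Not the PR's actual code, but a minimal sketch (assuming controller-runtime, with illustrative names such as configWatcher, the election ID, and the namespace) of how this strategy can be wired up: leader election is enabled on the manager and backed by a Lease, so controllers reconcile only on the elected replica, while the configuration watcher is added as a runnable that opts out of leader election and therefore runs on every replica, leader and standbys alike.

```go
package main

import (
	"context"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// configWatcher is an illustrative runnable that keeps the component's
// configuration in sync with its ConfigMap.
type configWatcher struct{}

// Start runs until the context is cancelled, periodically re-reading the
// configuration. A real implementation would react to ConfigMap changes.
func (w *configWatcher) Start(ctx context.Context) error {
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return nil
		case <-ticker.C:
			// re-read and apply the configuration here
		}
	}
}

// NeedLeaderElection returning false means this runnable starts on every
// replica, so standbys keep their configuration current while waiting.
func (w *configWatcher) NeedLeaderElection() bool { return false }

var _ manager.Runnable = (*configWatcher)(nil)
var _ manager.LeaderElectionRunnable = (*configWatcher)(nil)

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		LeaderElection:          true,
		LeaderElectionID:        "ws-manager-mk2-leader", // illustrative lease name
		LeaderElectionNamespace: "default",               // illustrative namespace
	})
	if err != nil {
		panic(err)
	}

	// Runs on every replica because NeedLeaderElection() is false.
	if err := mgr.Add(&configWatcher{}); err != nil {
		panic(err)
	}

	// Controllers registered with this manager only start reconciling once
	// this replica has won the lease; standbys keep the watcher running.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```

A real implementation would more likely watch the ConfigMap through an informer instead of polling; the point here is only that the watcher is not gated on leadership, so a standby that later wins the lease already holds the current configuration.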

Summary generated by Copilot

🤖 Generated by Copilot at ae677ea

This pull request adds leader election for the ws-manager-mk2 component using the Kubernetes API and a Lease object. It removes the --leader-elect argument from the component and its deployment files, as it is no longer needed. It also reorders some imports in the sample-workspace command.

Preview status

Gitpod was successfully deployed to your preview environment.

Build Options

Build
  • /werft with-werft
    Run the build with werft instead of GHA
  • leeway-no-cache
  • /werft no-test
    Run Leeway with --dont-test
Publish
  • /werft publish-to-npm
  • /werft publish-to-jb-marketplace
Installer
  • analytics=segment
  • with-dedicated-emulation
  • workspace-feature-flags
    Add desired feature flags to the end of the line above, space separated
Preview Environment / Integration Tests
  • /werft with-local-preview
    If enabled this will build install/preview
  • /werft with-preview
  • /werft with-large-vm
  • /werft with-gce-vm
    If enabled this will create the environment on GCE infra
  • with-integration-tests=all
    Valid options are all, workspace, webapp, ide, jetbrains, vscode, ssh. If enabled, with-preview and with-large-vm will be enabled.
  • with-monitoring

/hold

aledbf (Member, Author) commented Aug 14, 2023

/gh run recreate-vm=true

Comment triggered a workflow run

Started workflow run: 5858026990

  • recreate_vm: true

aledbf (Member, Author) commented Aug 14, 2023

/gh run recreate-vm=true

Comment triggered a workflow run

Started workflow run: 5858423705

  • recreate_vm: true

aledbf force-pushed the alerdbf/ha-mk2 branch 2 times, most recently from 1a95c59 to 765d4cc on August 15, 2023 at 08:11
aledbf (Member, Author) commented Aug 15, 2023

/gh run recreate-vm=true

Comment triggered a workflow run

Started workflow run: 5870929595

  • recreate_vm: true

aledbf (Member, Author) commented Aug 15, 2023

/gh run recreate-vm=true

Comment triggered a workflow run

Started workflow run: 5871222912

  • recreate_vm: true

aledbf (Member, Author) commented Aug 16, 2023

/gh run recreate-vm=true

Comment triggered a workflow run

Started workflow run: 5875956095

  • recreate_vm: true

aledbf (Member, Author) commented Aug 16, 2023

/gh run recreate-vm=true

Comment triggered a workflow run

Started workflow run: 5877356816

  • recreate_vm: true

aledbf (Member, Author) commented Aug 16, 2023

/gh run recreate-vm=true

Comment triggered a workflow run

Started workflow run: 5878388959

  • recreate_vm: true

aledbf (Member, Author) commented Aug 16, 2023

/gh run recreate-vm=true

Comment triggered a workflow run

Started workflow run: 5879474569

  • recreate_vm: true

aledbf force-pushed the alerdbf/ha-mk2 branch 2 times, most recently from b58232d to 04c44f4 on August 16, 2023 at 21:40
WVerlaek (Member) left a comment

looks good, some questions

Also coming back to a question on the previous PR: how do we ensure workspaces don't time out when a pod gets elected as leader after it has been running for a while?

We use the controller's startup time as the workspace's last activity (see the linked code reference). If a pod that was standby for 1 hour gets elected, it won't yet have workspace activity stored (as it didn't receive MarkActive requests), so it will assume the workspace's last activity was 1 hour ago and time out the workspace.

We could change ManagerStartedAt to e.g. ControllerActiveAt, and set this once a pod becomes elected?
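To illustrate that suggestion (hypothetical names, not ws-manager-mk2's actual code): the fallback timestamp could be recorded only once a replica wins the election, by waiting on the manager's Elected channel before setting it.

```go
package main

import (
	"time"

	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// WorkspaceActivity is an illustrative stand-in for the activity tracking in
// ws-manager-mk2; ControllerActiveAt is the proposed replacement for the
// ManagerStartedAt fallback.
type WorkspaceActivity struct {
	ControllerActiveAt time.Time
}

// markControllerActive blocks until this replica becomes the leader, then
// records when it started acting as the controller. Timeout checks can fall
// back to this value for workspaces without recorded activity, instead of the
// process start time of a pod that may have been on standby for hours.
func markControllerActive(mgr manager.Manager, activity *WorkspaceActivity) {
	// Elected() returns a channel that is closed once this replica wins
	// leader election (or immediately if leader election is disabled).
	<-mgr.Elected()
	activity.ControllerActiveAt = time.Now()
}
```

Started as go markControllerActive(mgr, activity) before mgr.Start, this sets the timestamp at the moment a standby takes over rather than at process start.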

aledbf (Member, Author) commented Aug 17, 2023

Also coming back to a question on the previous PR: how do we ensure workspaces don't time out when a pod gets elected as leader after it has been running for a while?

This is similar to when we restart ws-manager-mk2 or deploy a new version. The worst-case scenario is that workspaces will run for longer than they should due to the lost state.

WVerlaek (Member) replied:

This is similar to when we restart ws-manager-mk2 or deploy a new version. The worst-case scenario is that workspaces will run for longer than they should due to the lost state.

I don't think it is, though: on a restart the ManagerStartedAt field also gets reset, but not on leader election. Unless I'm missing something, I do believe that pods will time out after an old standby pod gets elected.

go func() {
    for {
        <-mgr.Elected()
        activity.ManagerStartedAt = time.Now()
(truncated code reference)
aledbf (Member, Author) replied:

I don't think it is, though: on a restart the ManagerStartedAt field also gets reset, but not on leader election. Unless I'm missing something, I do believe that pods will time out after an old standby pod gets elected.

Here: the snippet above, where ManagerStartedAt is set once a pod is elected.
