Skip to content

[ws-manager-mk2] Remember existing workspaces after a restart #17066

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 29, 2023

Conversation

WVerlaek
Copy link
Member

Description

Fixes a bug where existing workspaces would not get metrics recorded after ws-manager-mk2 restarts, due to rememberWorkspace only being called for newly created workspaces

Related Issue(s)

How to test

Tested in a load test

Release Notes

NONE

Documentation

Build Options:

  • /werft with-werft
    Run the build with werft instead of GHA
  • leeway-no-cache
  • /werft no-test
    Run Leeway with --dont-test
Publish Options
  • /werft publish-to-npm
  • /werft publish-to-jb-marketplace
Installer Options
  • with-dedicated-emulation
  • with-ws-manager-mk2
  • workspace-feature-flags
    Add desired feature flags to the end of the line above, space separated

Preview Environment Options:

  • /werft with-local-preview
    If enabled this will build install/preview
  • /werft with-preview
  • /werft with-large-vm
  • /werft with-gce-vm
    If enabled this will create the environment on GCE infra
  • with-integration-tests=all
    Valid options are all, workspace, webapp, ide, jetbrains, vscode, ssh

@WVerlaek WVerlaek requested a review from a team March 28, 2023 14:20
@WVerlaek WVerlaek self-assigned this Mar 28, 2023
@github-actions github-actions bot added the team: workspace Issue belongs to the Workspace team label Mar 28, 2023
Comment on lines +305 to +307
if disposeErr != nil {
wsc.emitEvent(ws, "Backup", fmt.Errorf("failed to backup workspace: %w", disposeErr))
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixes a Backup failure event being generated even on successful backups, as the error is wrapped and therefore always non-nil, so a failure event gets generated

@@ -148,6 +148,11 @@ func (r *WorkspaceReconciler) Reconcile(ctx context.Context, req ctrl.Request) (
func (r *WorkspaceReconciler) actOnStatus(ctx context.Context, workspace *workspacev1.Workspace, workspacePods corev1.PodList) (ctrl.Result, error) {
log := log.FromContext(ctx)

if workspace.Status.Phase != workspacev1.WorkspacePhaseStopped && !r.metrics.containsWorkspace(workspace) {
// If the workspace hasn't stopped yet, and we don't know about this workspace yet, remember it.
r.metrics.rememberWorkspace(workspace, nil)
Copy link
Member Author

@WVerlaek WVerlaek Mar 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will construct a metricState for this workspace with the assumption that any existing conditions have already been reported on.

E.g. if the workspace already has a BackupFailed condition, the metricState will assume it's already been reported before ws-manager-mk2 restarted and not report it again.

We could potentially loose a few events due to this during a restart, but the alternative is re-reporting every condition for all existing workspaces after ws-manager-mk2 restarts

@roboquat roboquat merged commit c828d41 into main Mar 29, 2023
@roboquat roboquat deleted the wv/mk2-fix-metrics-after-restart branch March 29, 2023 18:43
@roboquat roboquat added deployed: workspace Workspace team change is running in production deployed Change is completely running in production labels Apr 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deployed: workspace Workspace team change is running in production deployed Change is completely running in production release-note-none size/S team: workspace Issue belongs to the Workspace team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants