[ws-manager-mk2] Refactor metrics with EverReady condition #17114
```go
if c := wsk8s.GetCondition(ws.Status.Conditions, string(workspacev1.WorkspaceConditionFailed)); c != nil {
	reason = StopReasonFailed
	if !wsk8s.ConditionPresentAndTrue(ws.Status.Conditions, string(workspacev1.WorkspaceConditionEverReady)) {
		// Don't record 'failed' if there was a start failure.
		reason = StopReasonStartFailure
	} else if strings.Contains(c.Message, "Pod ephemeral local storage usage exceeds the total limit of containers") {
		reason = StopReasonOutOfSpace
	}
} else if wsk8s.ConditionPresentAndTrue(ws.Status.Conditions, string(workspacev1.WorkspaceConditionAborted)) {
	reason = StopReasonAborted
} else if wsk8s.ConditionPresentAndTrue(ws.Status.Conditions, string(workspacev1.WorkspaceConditionTimeout)) {
	reason = StopReasonTimeout
} else if wsk8s.ConditionPresentAndTrue(ws.Status.Conditions, string(workspacev1.WorkspaceConditionClosed)) {
	reason = StopReasonTabClosed
} else {
	reason = StopReasonRegular
}
```
Re-ordered these checks compared to MK1, e.g. to check for failure first: if a workspace has both a failure and a closed condition, we want the failure reason to be reported instead of closed.
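The ordering can be checked in isolation. Below is a minimal, self-contained sketch that mirrors the stop-reason snippet with hypothetical stand-ins: `Condition`, `getCondition` and `presentAndTrue` replace the real `wsk8s` helpers and Kubernetes condition type. The condition type strings (`"Failed"`, `"EverReady"`, ...) are assumed, and the `StopReason` string values are invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Condition is a hypothetical, simplified stand-in for a Kubernetes condition:
// only the fields the stop-reason logic reads.
type Condition struct {
	Type    string
	Status  bool
	Message string
}

// StopReason names mirror the PR; the string values are assumptions.
type StopReason string

const (
	StopReasonFailed       StopReason = "failed"
	StopReasonStartFailure StopReason = "start-failure"
	StopReasonOutOfSpace   StopReason = "out-of-space"
	StopReasonAborted      StopReason = "aborted"
	StopReasonTimeout      StopReason = "timeout"
	StopReasonTabClosed    StopReason = "tab-closed"
	StopReasonRegular      StopReason = "regular"
)

// getCondition returns the condition of the given type, or nil if absent
// (hypothetical stand-in for wsk8s.GetCondition).
func getCondition(conds []Condition, t string) *Condition {
	for i := range conds {
		if conds[i].Type == t {
			return &conds[i]
		}
	}
	return nil
}

// presentAndTrue reports whether the condition exists and is true
// (hypothetical stand-in for wsk8s.ConditionPresentAndTrue).
func presentAndTrue(conds []Condition, t string) bool {
	c := getCondition(conds, t)
	return c != nil && c.Status
}

// stopReason checks Failed first, so a failure wins over Aborted,
// Timeout or Closed when several conditions are present at once.
func stopReason(conds []Condition) StopReason {
	if c := getCondition(conds, "Failed"); c != nil {
		if !presentAndTrue(conds, "EverReady") {
			return StopReasonStartFailure
		}
		if strings.Contains(c.Message, "Pod ephemeral local storage usage exceeds the total limit of containers") {
			return StopReasonOutOfSpace
		}
		return StopReasonFailed
	}
	switch {
	case presentAndTrue(conds, "Aborted"):
		return StopReasonAborted
	case presentAndTrue(conds, "Timeout"):
		return StopReasonTimeout
	case presentAndTrue(conds, "Closed"):
		return StopReasonTabClosed
	default:
		return StopReasonRegular
	}
}

func main() {
	conds := []Condition{
		{Type: "EverReady", Status: true},
		{Type: "Closed", Status: true},
		{Type: "Failed", Status: true, Message: "container crashed"},
	}
	fmt.Println(stopReason(conds)) // failed
}
```

Because the failure branch is evaluated first, a workspace that has both `Failed` and `Closed` is reported as failed rather than tab-closed.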
Different from MK1: instead of not incrementing the stop metric when there was a start failure, we now increment it with its own unique reason.
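To make the behavioural difference concrete, here is a small sketch built on a hypothetical in-memory counter keyed by the `reason` label. It is not the real Prometheus metric; with client_golang this would more likely be a `CounterVec` with a `reason` label. The `"start-failure"` label value and the function names are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// reasonStartFailure is an assumed label value for start failures.
const reasonStartFailure = "start-failure"

// stopCounter is a hypothetical, dependency-free stand-in for a counter
// with a single "reason" label, e.g. workspace_stops_total{reason="..."}.
type stopCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newStopCounter() *stopCounter {
	return &stopCounter{counts: make(map[string]int)}
}

// inc increments the series for the given reason label.
func (c *stopCounter) inc(reason string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[reason]++
}

// get returns the current value of the series for a reason label.
func (c *stopCounter) get(reason string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[reason]
}

// recordStopMK1 sketches the old behaviour: start failures are not counted.
func recordStopMK1(c *stopCounter, reason string) {
	if reason == reasonStartFailure {
		return
	}
	c.inc(reason)
}

// recordStopMK2 sketches the new behaviour: every stop is counted, and
// start failures show up under their own reason label.
func recordStopMK2(c *stopCounter, reason string) {
	c.inc(reason)
}

func main() {
	mk1, mk2 := newStopCounter(), newStopCounter()
	for _, r := range []string{"regular", reasonStartFailure, "tab-closed"} {
		recordStopMK1(mk1, r)
		recordStopMK2(mk2, r)
	}
	fmt.Println(mk1.get(reasonStartFailure), mk2.get(reasonStartFailure)) // 0 1
}
```

Counting start failures under a separate label keeps the total number of stops complete while still letting dashboards exclude them from the regular failure rate.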
Description
Introduce an `EverReady` condition, which is set once the workspace has become `Ready`, i.e. when content init succeeded and the supervisor readiness check passes. This condition then allows us to:
- Identify workspaces that reach the `Stopped` phase without ever becoming ready (start failures).
- Prevent a workspace from moving back from `Running` to `Initializing` if its container becomes unready. Once a workspace becomes ready, it should not move backwards in phase.

To track workspace failures that happen after startup, a new `workspace_failure_total` metric is added, which is incremented whenever a workspace receives the `Failed` condition. This includes content init and disposal failures, but also all other possible failures.
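A rough sketch of how a sticky `EverReady` flag and a failure counter could interact during status updates. Everything here is hypothetical: `workspace`, `metrics`, `observe`, `stop` and `isStartFailure` are illustrative names, conditions are reduced to booleans, and `FailureTotal` stands in for `workspace_failure_total`. This is not the ws-manager-mk2 controller code.

```go
package main

import "fmt"

// Phase is a hypothetical subset of the workspace phases mentioned in the PR.
type Phase string

const (
	PhaseInitializing Phase = "Initializing"
	PhaseRunning      Phase = "Running"
	PhaseStopped      Phase = "Stopped"
)

// workspace is a hypothetical, simplified view of a workspace status.
type workspace struct {
	Phase     Phase
	EverReady bool // sticky: once true, it stays true
	Failed    bool
}

// metrics holds a hypothetical stand-in for workspace_failure_total.
type metrics struct {
	FailureTotal int
}

// observe applies one status update. ready means content init succeeded and
// the readiness check passed; failed means the Failed condition is present.
func (m *metrics) observe(ws *workspace, ready, failed bool) {
	// Count a failure only on the transition into Failed, so repeated
	// updates of an already-failed workspace are not double counted.
	if failed && !ws.Failed {
		m.FailureTotal++
		ws.Failed = true
	}
	if ws.Phase == PhaseStopped {
		return // stopped is terminal in this sketch
	}
	if ready {
		ws.EverReady = true
		ws.Phase = PhaseRunning
		return
	}
	if !ws.EverReady {
		ws.Phase = PhaseInitializing
	}
	// Otherwise the container became unready after the workspace was ready:
	// keep the current phase instead of moving back to Initializing.
}

// stop moves the workspace into the Stopped phase.
func stop(ws *workspace) { ws.Phase = PhaseStopped }

// isStartFailure reports a workspace that stopped without ever being ready.
func isStartFailure(ws *workspace) bool {
	return ws.Phase == PhaseStopped && !ws.EverReady
}

func main() {
	m := &metrics{}
	ws := &workspace{}
	m.observe(ws, false, false) // still initializing
	m.observe(ws, true, false)  // ready: EverReady set, phase Running
	m.observe(ws, false, true)  // unready and Failed: phase stays Running
	stop(ws)
	fmt.Println(ws.Phase, ws.EverReady, m.FailureTotal, isStartFailure(ws)) // Stopped true 1 false
}
```

Counting only on the transition into `Failed` is a choice made in this sketch to keep the counter idempotent across repeated updates; the PR does not specify how that is handled.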
The `workspace_stops_total` metric is also updated to match MK1 behaviour by including the stop `reason` label.
Related Issue(s)
Fixes WKS-29, WKS-23
How to test
Run unit tests
Or in a preview env, check that a workspace starts and stops as expected and that the right metrics are incremented.
E.g. the following stop metric reasons are reported for a workspace with an image build: