
[obs] re-enable regular not active alerts #18341


Merged: 5 commits, Jul 31, 2023


kylos101 (Contributor) commented Jul 24, 2023

Description

Now that gitpod_workspace_regular_not_active_percentage_mk2 is fixed, re-enable regular not active alerts.

Additionally, given yesterday's incident, we had data to test and fix the related alert expressions.

Depends on https://github.com/gitpod-io/runbooks/pull/418


Related Issue(s)

Fixes ENG-20

How to test

Here you can see that the recording rule is working again (it no longer has a value of 1) after I ran the enable-alerts job on both workspace clusters.

[screenshot]

Here you can see how the updated GitpodWorkspaceTooManyRegularNotActiveMk2 alert (f608648) would have caught the incident. To confirm the expression, I port-forwarded us102's Prometheus:

[screenshot]

Similar corrections to GitpodWorkspacesNotStartingMk2 would also have helped catch the incident from July 26.

[screenshot]

Documentation

Preview status

Gitpod was successfully deployed to your preview environment.

Build Options

Build
  • /werft with-werft
    Run the build with werft instead of GHA
  • leeway-no-cache
  • /werft no-test
    Run Leeway with --dont-test
Publish
  • /werft publish-to-npm
  • /werft publish-to-jb-marketplace
Installer
  • analytics=segment
  • with-dedicated-emulation
  • workspace-feature-flags
    Add desired feature flags to the end of the line above, space separated
Preview Environment
  • /werft with-local-preview
    If enabled this will build install/preview
  • /werft with-preview
  • /werft with-large-vm
  • /werft with-gce-vm
    If enabled this will create the environment on GCE infra
  • with-integration-tests=all
    Valid options are all, workspace, webapp, ide, jetbrains, vscode, ssh
  • with-monitoring

/hold

kylos101 changed the title from "Kylos101/fix-active-rule" to "[ws-manager-mk2] re-enable regular not active alert and add type label to workspace_activity_total" Jul 24, 2023
kylos101 force-pushed the kylos101/fix-active-rule branch 4 times, most recently from 68fe33a to d93cad6, July 25, 2023 23:04
kylos101 changed the title from "[ws-manager-mk2] re-enable regular not active alert and add type label to workspace_activity_total" to "[obs] re-enable regular not active alerts" Jul 25, 2023
kylos101 marked this pull request as ready for review July 25, 2023 23:16
kylos101 requested a review from a team as a code owner July 25, 2023 23:16
roboquat added size/S and removed size/XS labels Jul 27, 2023
kylos101 force-pushed the kylos101/fix-active-rule branch 2 times, most recently from d208291 to b552177, July 27, 2023 19:51
kylos101 requested a review from easyCZ July 27, 2023 20:34
kylos101 (Contributor, Author)

Hey @easyCZ, could I ask for a follow-up review? I made a few changes. Yesterday's incident gave us a nice opportunity to test whether these underlying alerts would have triggered. 🙂

kylos101 force-pushed the kylos101/fix-active-rule branch from 750321c to d40f2c3, July 31, 2023 13:21
roboquat merged commit b90e12b into main Jul 31, 2023
roboquat deleted the kylos101/fix-active-rule branch July 31, 2023 13:31

3 participants