[test] Fix workspace integration tests #17222

Merged
merged 24 commits into from
May 3, 2023

kylos101
Contributor

@kylos101 kylos101 commented Apr 14, 2023

Description

Stabilize the workspace integration tests. We're down to one failure now, TestGitActions, which fails intermittently and which we'll tackle in a separate PR.

This required that we:

  1. update the packer image to the version being used in gen95 (WKS-107). The image contains many fixes (including a bump to containerd 1.6.20, K3s 1.26, etc.). These updates help avoid disk pressure issues, which were causing tests to fail intermittently. 💡 I suspect this was also leading to preview environments being unstable in general.
  2. make code changes in /test to account for recent product changes (like organizationId being required for createWorkspace with server)
  3. leverage the trust-manager manifests built into the packer image, and modify the startup script so that it retries applying the trust-manager manifests, to work around timing issues that were causing preview environment failures.

Also, I've added /test sub-folders to .github/CODEOWNERS. This should help teams (1) inspect changes, (2) assert that tests pass before merging to main, and (3) collaborate on testing.

Related

Fixes WKS-70
Fixes WKS-107
Fixes PDEO-7
#17335

How to test

  1. Create a local branch with git checkout -b some-name, and then build a preview with TF_VAR_infra_provider="gce" TF_VAR_with_large_vm=true leeway run dev:preview (Why? The branch for this PR is rate limited until May 2 at 6pm EST via Let's Encrypt.)
  2. Run the workspace integration tests from the kylos101/gen94-integ branch, with asset kylos101-gen94-integ-gha.8496. Here's a sample run that passed ✔️ .

Release Notes

NONE

Documentation

Build Options:

  • /werft with-werft
    Run the build with werft instead of GHA
  • leeway-no-cache
  • /werft no-test
    Run Leeway with --dont-test
Publish Options
  • /werft publish-to-npm
  • /werft publish-to-jb-marketplace
Installer Options
  • with-dedicated-emulation
  • with-ws-manager-mk2
  • workspace-feature-flags
    Add desired feature flags to the end of the line above, space separated

Preview Environment Options:

  • /werft with-local-preview
    If enabled this will build install/preview
  • /werft with-preview
  • /werft with-large-vm
  • /werft with-gce-vm
    If enabled this will create the environment on GCE infra
  • with-integration-tests=all
    Valid options are all, workspace, webapp, ide, jetbrains, vscode, ssh

/hold

@kylos101
Contributor Author

@mustard-mh and @akosyakov for 👀 , because I'm changing integration tests (workspace tests have been horribly broken), and adding /test sub-folders to CODEOWNERS. Will hopefully be marking this PR as ready for review soon. 🙏

@akosyakov
Member

On our side, if running with ide as the target works, then it is alright. When the PR is no longer a draft, we are happy to review. @iQQBot has looked into integration testing quite a lot and can for sure help 🙏

@kylos101 kylos101 force-pushed the kylos101/gen94-integ branch from 23ce740 to 61fa273 Compare April 26, 2023 02:19
@roboquat roboquat added size/XL and removed size/L labels Apr 26, 2023
@kylos101 kylos101 force-pushed the kylos101/gen94-integ branch from f293a1c to 8a85a26 Compare April 26, 2023 17:34
@roboquat roboquat added size/XXL and removed size/XL labels Apr 26, 2023
@kylos101 kylos101 force-pushed the kylos101/gen94-integ branch 5 times, most recently from 0aee955 to 6b27225 Compare April 28, 2023 18:45
@kylos101
Contributor Author

The trust-manager fix in action:

root@kylos101-gc38a6d66fe:/home/kyle# journalctl -t install-trustmanager -xf
Apr 28 18:51:26 kylos101-gc38a6d66fe install-trustmanager[3784]: Starting to install trust manager
Apr 28 18:51:26 kylos101-gc38a6d66fe install-trustmanager[3818]: Sleeping...
Apr 28 18:51:32 kylos101-gc38a6d66fe install-trustmanager[4988]: Trust manager applied
Apr 28 18:51:32 kylos101-gc38a6d66fe install-trustmanager[5039]: Sleeping...
Apr 28 18:51:38 kylos101-gc38a6d66fe install-trustmanager[5929]: Trust manager applied
Apr 28 18:51:38 kylos101-gc38a6d66fe install-trustmanager[5955]: Sleeping...
Apr 28 18:51:43 kylos101-gc38a6d66fe install-trustmanager[6497]: Trust manager applied
Apr 28 18:51:43 kylos101-gc38a6d66fe install-trustmanager[6523]: Sleeping...
Apr 28 18:51:48 kylos101-gc38a6d66fe install-trustmanager[7101]: Trust manager applied
Apr 28 18:52:11 kylos101-gc38a6d66fe install-trustmanager[8364]: Finishing installing trust manager

@kylos101 kylos101 force-pushed the kylos101/gen94-integ branch 2 times, most recently from fb33798 to b9a68fb Compare April 28, 2023 19:37
@kylos101
Contributor Author

Can build preview envs:

TF_VAR_infra_provider="gce" TF_VAR_with_large_vm=true leeway run dev:preview
SUCCESS: Installation is happy: https://kylos101-gc38a6d66fe.preview.gitpod-dev.com/workspaces

@kylos101 kylos101 force-pushed the kylos101/gen94-integ branch from b9a68fb to 87bd305 Compare April 28, 2023 20:23
kylos101 added 21 commits May 3, 2023 18:32
Might remove later...
But sometimes there's no team 🤷
We use UBP now, there is no more unleashed.

Also, remove the "ff" feature flag code (which was for PVC). It was mutating the username, resulting in Code 460 errors on createWorkspace
Tests intermittently fail with  to avoid intermittent failures
This way, we can assert tests are passing for all teams prior to merging
Test to see if flakiness goes away...
...and bump the timeout because we reduced parallel runs
This:
1. updates from K3s 1.23 to 1.26
2. requires that we remove PodSecurityPolicy changes (as it's no longer supported)
3. resolves intermittent disk pressure issues
* We were getting PSP from rook/ceph, which I think was for PVC
* We were getting PSP from the monitoring-satellite
…git actions.

Why? We miss state transitions: it's not guaranteed that each one will be returned, and there are other tests waiting.

For example, in the log below, we miss INITIALIZING, RUNNING, and STOPPING.

 workspace.go:369: attempt to create the workspace as user 0565bb3c-e724-4da9-84fb-22e2a7b23b8c, with context github.com/gitpod-io/gitpod-test-repo/tree/integration-test/commit
    workspace.go:411: attempt to get the workspace information: gitpodio-gitpodtestrepo-nscsowy1njb
    workspace.go:423: not preparing
    workspace.go:432: got the workspace information: gitpodio-gitpodtestrepo-nscsowy1njb
    workspace.go:460: wait for workspace to be fully up and running
    workspace.go:569: prepare for a connection with ws-manager
    workspace.go:590: established for a connection with ws-manager
    workspace.go:598: check if the status of workspace is in the running phase: 462f1325-3019-4547-8666-508e8353335e
    workspace.go:631: status: 462f1325-3019-4547-8666-508e8353335e, PENDING
    workspace.go:598: check if the status of workspace is in the running phase: 462f1325-3019-4547-8666-508e8353335e
    workspace.go:631: status: 462f1325-3019-4547-8666-508e8353335e, PENDING
    workspace.go:598: check if the status of workspace is in the running phase: 462f1325-3019-4547-8666-508e8353335e
    workspace.go:631: status: 462f1325-3019-4547-8666-508e8353335e, CREATING
    workspace.go:598: check if the status of workspace is in the running phase: 462f1325-3019-4547-8666-508e8353335e
    workspace.go:631: status: 462f1325-3019-4547-8666-508e8353335e, CREATING
    workspace.go:598: check if the status of workspace is in the running phase: 462f1325-3019-4547-8666-508e8353335e
    workspace.go:631: status: 462f1325-3019-4547-8666-508e8353335e, CREATING
    workspace.go:598: check if the status of workspace is in the running phase: 462f1325-3019-4547-8666-508e8353335e
    workspace.go:504: waiting for stopping the workspace: 462f1325-3019-4547-8666-508e8353335e
    workspace.go:514: attemp to delete the workspace: 462f1325-3019-4547-8666-508e8353335e
    workspace.go:797: confirmed the worksapce is stopped: 462f1325-3019-4547-8666-508e8353335e, STOPPED
    workspace.go:538: successfully terminated workspace
    git_test.go:172: failed to wait for the workspace to start up: cannot wait for workspace: context deadline exceeded
And use trust-manager from the packer image
@kylos101 kylos101 force-pushed the kylos101/gen94-integ branch from 81d9407 to a10d7d5 Compare May 3, 2023 18:33
@kylos101
Contributor Author

kylos101 commented May 3, 2023

One test intermittently failed here: TestOpenWorkspaceFromOutdatedPrebuild/prebuild/it_should_open_a_workspace_from_with_an_older_prebuild_initializer_successfully_and_run_the_init_task/classic: github.com/gitpod-io/gitpod/test/tests/components/ws-manager

Will follow-up in a separate PR.

@roboquat roboquat merged commit 99eb259 into main May 3, 2023
@roboquat roboquat deleted the kylos101/gen94-integ branch May 3, 2023 20:07
@roboquat roboquat added deployed: webapp Meta team change is running in production deployed: IDE IDE change is running in production deployed: workspace Workspace team change is running in production deployed Change is completely running in production labels May 15, 2023
Labels

  • deployed: IDE (IDE change is running in production)
  • deployed: webapp (Meta team change is running in production)
  • deployed: workspace (Workspace team change is running in production)
  • deployed (Change is completely running in production)
  • release-note-none
  • size/XXL
  • team: IDE
  • team: webapp (Issue belongs to the WebApp team)
  • team: workspace (Issue belongs to the Workspace team)

8 participants