[ws-manager-mk2] Loadgen fixes, concurrent reconciliation #16613
Changes from all commits: 1bacab8, 32918e8, b0c58b2, a70fe53, b4415e5
```diff
@@ -55,7 +55,7 @@ var benchmarkCommand = &cobra.Command{
 	}

 	var load loadgen.LoadGenerator
-	load = loadgen.NewFixedLoadGenerator(500*time.Millisecond, 300*time.Millisecond)
+	load = loadgen.NewFixedLoadGenerator(800*time.Millisecond, 300*time.Millisecond)
 	load = loadgen.NewWorkspaceCountLimitingGenerator(load, scenario.Workspaces)

 	template := &api.StartWorkspaceRequest{
```

WVerlaek (author): Slightly decreased the rate; this was creating workspaces too quickly for mk2. mk2's StartWorkspace request doesn't block, so it was actually keeping up with the 2/second rate. For mk1 load tests, the StartWorkspace request takes seconds (to minutes) to complete and never reached a rate of 2 starts/second anyway.

Reviewer: @WVerlaek were you hitting a rate limit of ws-manager-mk2, where it wasn't allowing additional gRPC connections? I assume yes, just curious.

WVerlaek: No, some workspaces were failing to start: too many were starting at once and pulling an image, which caused some pulls to fail. Increased the delay a bit to slow down workspace creation; at this rate it's still faster than what mk1 would handle.

Reviewer: Cool, I see, so it's just a natural breaking limit. Good to know! What was failing on pull? registry-facade, containerd? Something else? Just curious.

WVerlaek: The errors were failures to pull the image from registry-facade due to an IO timeout.
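For reference, a fixed load generator of this shape plausibly emits one "start a workspace" signal per interval, with the second argument adding random jitter on top of the base delay. A minimal sketch under that assumption — the actual loadgen package may be structured differently:

```go
import (
	"math/rand"
	"time"
)

// generate emits one start signal per interval, where each interval is
// the base delay plus up to `jitter` of random extra wait. The
// (delay, jitter) reading of NewFixedLoadGenerator's two arguments is
// an assumption based on the call site above; jitter must be > 0.
func generate(stop <-chan struct{}, delay, jitter time.Duration) <-chan struct{} {
	out := make(chan struct{})
	go func() {
		defer close(out)
		for {
			d := delay + time.Duration(rand.Int63n(int64(jitter)))
			select {
			case <-stop:
				return
			case <-time.After(d):
			}
			select {
			case <-stop:
				return
			case out <- struct{}{}: // one workspace start per signal
			}
		}
	}()
	return out
}
```

Under this reading, an 800ms base delay paces starts at roughly one per second, down from close to two per second at 500ms — a rate mk2 would actually sustain, since its StartWorkspace call returns immediately.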
```diff
@@ -222,9 +222,12 @@ func (w *WsmanExecutor) StopAll(ctx context.Context) error {
 		if err != nil {
 			log.Warnf("could not get workspaces: %v", err)
 		} else {
-			if len(resp.GetStatus()) == 0 {
+			n := len(resp.GetStatus())
+			if n == 0 {
 				break
 			}
+			ex := resp.GetStatus()[0]
+			log.Infof("%d workspaces remaining, e.g. %s", n, ex.Id)
 		}

 		select {
```

WVerlaek (author), on lines +225 to +230: Some extra logging while stopping workspaces after a load test, to show progress. Also include the workspace ID of one stopping workspace, to make it easy to inspect a workspace stuck in Stopping.
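For context, this hunk sits inside a polling loop that waits for all workspaces to stop. A rough sketch of what the surrounding loop likely looks like — the client field name `w.C`, the request type, and the 5-second poll interval are assumptions, chosen to match the accessors visible in the diff:

```go
// Poll until no workspaces remain, logging progress on each round.
for {
	resp, err := w.C.GetWorkspaces(ctx, &api.GetWorkspacesRequest{})
	if err != nil {
		log.Warnf("could not get workspaces: %v", err)
	} else {
		n := len(resp.GetStatus())
		if n == 0 {
			break // all workspaces have stopped
		}
		ex := resp.GetStatus()[0]
		log.Infof("%d workspaces remaining, e.g. %s", n, ex.Id)
	}

	select {
	case <-ctx.Done():
		return ctx.Err() // give up when the caller's deadline expires
	case <-time.After(5 * time.Second):
	}
}
```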
```diff
@@ -221,10 +221,11 @@ func configmap(ctx *common.RenderContext) ([]runtime.Object, error) {
 			Interrupted: util.Duration(5 * time.Minute),
 		},
 		//EventTraceLog: "", // todo(sje): make conditional based on config
-		ReconnectionInterval:           util.Duration(30 * time.Second),
-		RegistryFacadeHost:             fmt.Sprintf("reg.%s:%d", ctx.Config.Domain, common.RegistryFacadeServicePort),
-		WorkspaceCACertSecret:          customCASecret,
-		TimeoutMaxConcurrentReconciles: 5,
+		ReconnectionInterval:             util.Duration(30 * time.Second),
+		RegistryFacadeHost:               fmt.Sprintf("reg.%s:%d", ctx.Config.Domain, common.RegistryFacadeServicePort),
+		WorkspaceCACertSecret:            customCASecret,
+		WorkspaceMaxConcurrentReconciles: 15,
+		TimeoutMaxConcurrentReconciles:   15,
 	},
 	Content: struct {
 		Storage storageconfig.StorageConfig `json:"storage"`
```

WVerlaek (author), on lines +227 to +228: Set both the timeout and workspace controllers' max concurrent reconciles to 15. This number is somewhat arbitrary, but should be sufficient for us when looking at the metrics during the loadgen run. It's in config, so we can easily change it anyway.
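For context, values like these feed controller-runtime's MaxConcurrentReconciles option, which controls how many objects a controller reconciles in parallel (the default is 1). A minimal sketch of how such a value is typically wired up at controller setup — `WorkspaceReconciler`, `workspacev1`, and the config field are illustrative assumptions; only `controller.Options` and the builder calls are controller-runtime API:

```go
import (
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller"
)

// SetupWithManager registers the reconciler with the manager.
// MaxConcurrentReconciles lets up to N Workspace objects be reconciled
// in parallel instead of the default of 1.
func (r *WorkspaceReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&workspacev1.Workspace{}).
		WithOptions(controller.Options{
			MaxConcurrentReconciles: r.Config.WorkspaceMaxConcurrentReconciles,
		}).
		Complete(r)
}
```

Raising this matters for load tests in particular: with the default of 1, a single slow reconcile (e.g. one waiting on a pod) serializes progress across all workspaces.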
WVerlaek (author): Also finish disposal if the ContentReady condition isn't present. This fixes workspaces stuck in Stopping when the condition is never added, e.g. due to a workspace startup failure.
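A minimal sketch of what "finish disposal when ContentReady is absent" could look like using the apimachinery condition helpers — the condition type string and the function itself are illustrative, not the PR's actual code:

```go
import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// canFinishDisposal reports whether content disposal should complete.
// If ContentReady was never added (e.g. the workspace failed during
// startup), there is no content to back up, so disposal must finish
// rather than leave the workspace stuck in Stopping.
func canFinishDisposal(conds []metav1.Condition) bool {
	c := meta.FindStatusCondition(conds, "ContentReady")
	return c == nil || c.Status != metav1.ConditionTrue
}
```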