Skip to content

Add a report-only mode to the BalanceCapacity middleware #2495

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 11, 2020

Conversation

jtgeibel
Copy link
Member

@jtgeibel jtgeibel commented May 7, 2020

This adds a single environment variable that can be quickly set or
removed to toggle capacity enforcement. If this variable is set to any
value (including the empty string) then no requests will be rejected due
to load.

This will allow us to evolve the heuristics used in this middleware to
find the right balance before turning on enforcement.

This also increases the default logging percentage to a higher default.

r? @pietroalbini

This adds a single environment variable that can be quickly set or
removed to toggle capacity enforcement. If this variable is set to any
value (including the empty string) then no requests will be rejected due
to load.

This will allow us to evolve the heuristics used in this middleware to
find the right balance before turning on enforcement.

This also increases the default logging percentage to a higher default.
@jtgeibel
Copy link
Member Author

jtgeibel commented May 7, 2020

This is a quick and dirty implementation of the report-only functionality. I've captured some longer-term thoughts in #2496. My primary goal for this PR is to make the master branch deployable. (We can technically re-deploy master without this but that would require updating multiple environment variables and it isn't even clear that those environment variables are the right knobs long-term.)

Comment on lines 61 to 63
if self.report_only {
return handler.call(request);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of early exiting, it'd be nice to add some kind of log metadata like "would be dropped" to requests that would be dropped. This way we can analyze the impact with a single query in Papertrail.

@pietroalbini
Copy link
Member

Also, to unblock deployments to master we could simply set all environment variables to 100 beforehand.

@jtgeibel
Copy link
Member Author

jtgeibel commented May 8, 2020

I agree with your suggested changes above, and should be able to update the implementation later today or this weekend.

Also, to unblock deployments to master we could simply set all environment variables to 100 beforehand.

I just want to point out 1 non-obvious thing regarding this. With DB_POOL_SIZE=20 and SERVER_THREADS=25, bumping a limit to 100% will still leave 5 threads available to serve static assets and reject over-capacity requests. So we would need to set the variables to something greater than 125% to ensure the logic is completely disabled.

Sidebar: If there are more requests than threads then the web server queues the requests. This means that under heavy load the response times we log become inaccurate because a request can be blocked in the web server for some amount of time before recording the start time in the logging middleware.

jtgeibel added 2 commits May 9, 2020 16:48
Download requests are no longer counted as in-flight and the variable
name is updated to reflect this. By default 20% of database connections
are still reserved for updating download counts, but other requests will
be rejected less often. Additionally this saves 2 atomic updates when
serving the most popular request.
@jtgeibel jtgeibel force-pushed the web-capacity-report-only branch from f3bcf7d to 6494ac6 Compare May 9, 2020 20:49
@jtgeibel
Copy link
Member Author

jtgeibel commented May 9, 2020

Okay, I've added 2 commits. The first should address you comment. In report-only mode, a would_reject= field is added to the log. When not in this mode, the same reason is logged as a cause= to align with the existing terminology for failed requests.

The second commit moves downloads outside of the in-flight request tracking. By default 20% of database connections are still reserved for updating download counts, but by not counting these requests the limits should be hit less often.

@pietroalbini
Copy link
Member

This looks good!

@bors r+

@bors
Copy link
Contributor

bors commented May 11, 2020

📌 Commit 6494ac6 has been approved by pietroalbini

@bors
Copy link
Contributor

bors commented May 11, 2020

⌛ Testing commit 6494ac6 with merge 06509f7...

@bors
Copy link
Contributor

bors commented May 11, 2020

☀️ Test successful - checks-travis
Approved by: pietroalbini
Pushing 06509f7 to master...

@bors bors merged commit 06509f7 into rust-lang:master May 11, 2020
@jtgeibel jtgeibel deleted the web-capacity-report-only branch May 11, 2020 23:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants