ref(replay): Pause recording when replay requests are rate-limited #6733

Merged: 11 commits into master from lms-replay-ratelimits on Jan 12, 2023

Conversation

@Lms24 Lms24 (Member) commented Jan 11, 2023

Following up on #6710, this PR addresses option B and adds the following rate-limiting strategy to the Replay integration:

If we receive a rate limit response from the server after sending a replay event, we pause recording and flushing for the received rate limit period. After this period, we resume recording the same session, which triggers a full snapshot and hence should give us working replays. This should reduce segment loss due to rate limiting, since previously we did not handle rate limits specifically for replay (other than the transport's internal rate limiting). See the sketch below for an illustration.

I added a test to verify this behaviour but couldn't test it easily with a test app. Happy to take suggestions on how to test this better. Otherwise, I think this should be good to go.

ref #6710
ref #6533
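
To illustrate the described flow, here is a minimal, hedged sketch rather than the actual implementation. It reuses `updateRateLimits` and `isRateLimited` as they appear in the diff, and additionally assumes that `disabledUntil` and the `RateLimits` type are available from `@sentry/utils`; `ReplayLike` and `handleSendResponse` are hypothetical stand-ins for the Replay integration's internals.

```ts
import { disabledUntil, isRateLimited, updateRateLimits } from '@sentry/utils';
import type { RateLimits } from '@sentry/utils';
import type { TransportMakeRequestResponse } from '@sentry/types';

// Hypothetical stand-in for the Replay container's pause/resume API.
interface ReplayLike {
  isPaused(): boolean;
  pause(): void;
  resume(): void;
}

function handleSendResponse(
  replay: ReplayLike,
  rateLimits: RateLimits,
  response: TransportMakeRequestResponse,
): RateLimits {
  // Merge the rate limit headers of this response into the known limits.
  const updated = updateRateLimits(rateLimits, response);

  if (isRateLimited(updated, 'replay_event') || isRateLimited(updated, 'replay_recording')) {
    // Don't stack pauses if an earlier rate-limited request already paused us.
    if (!replay.isPaused()) {
      replay.pause();
      const resumeAfterMs = Math.max(0, disabledUntil(updated, 'replay_event') - Date.now());
      // Resuming records the same session again, which takes a new full DOM
      // snapshot, so subsequent segments are replayable.
      setTimeout(() => replay.resume(), resumeAfterMs);
    }
  }

  return updated;
}
```

The point of handling this at the replay level is that the transport's internal rate limiting alone would silently drop segments, whereas pausing and resuming keeps the session recoverable.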

@Lms24 Lms24 requested review from mydea and billyvg January 11, 2023 13:46
@Lms24 Lms24 marked this pull request as draft January 11, 2023 14:30
// TODO (v8): we can remove this guard once transport.send's type signature doesn't include void anymore
if (response) {
this._rateLimits = updateRateLimits(this._rateLimits, response);
if (isRateLimited(this._rateLimits, 'replay_event') || isRateLimited(this._rateLimits, 'replay_recording')) {
Member

If this happens, it means the request has failed, correct? Which means we will have a "missing" segment? (No need to handle it right now, just confirming).

Member Author

Yes, the segment we sent to get this response will be missing. Does this mess something up?
(I don't really see a way to get around this, fwiw)

@Lms24 Lms24 force-pushed the lms-replay-ratelimits branch from 8ea1580 to 5a88301 on January 11, 2023, 16:39
github-actions bot (Contributor) commented Jan 11, 2023

size-limit report 📦

| Path | Size |
| --- | --- |
| @sentry/browser - ES5 CDN Bundle (gzipped + minified) | 19.84 KB (0%) |
| @sentry/browser - ES5 CDN Bundle (minified) | 61.46 KB (0%) |
| @sentry/browser - ES6 CDN Bundle (gzipped + minified) | 18.62 KB (+0.03% 🔺) |
| @sentry/browser - ES6 CDN Bundle (minified) | 55 KB (0%) |
| @sentry/browser - Webpack (gzipped + minified) | 20.38 KB (0%) |
| @sentry/browser - Webpack (minified) | 66.55 KB (0%) |
| @sentry/react - Webpack (gzipped + minified) | 20.4 KB (0%) |
| @sentry/nextjs Client - Webpack (gzipped + minified) | 47.63 KB (0%) |
| @sentry/browser + @sentry/tracing - ES5 CDN Bundle (gzipped + minified) | 26.82 KB (+0.03% 🔺) |
| @sentry/browser + @sentry/tracing - ES6 CDN Bundle (gzipped + minified) | 25.25 KB (-0.01% 🔽) |
| @sentry/replay ES6 CDN Bundle (gzipped + minified) | 43.26 KB (+0.74% 🔺) |
| @sentry/replay - Webpack (gzipped + minified) | 38.5 KB (+0.46% 🔺) |

@Lms24 Lms24 force-pushed the lms-replay-ratelimits branch from f4b3567 to e1d8815 on January 11, 2023, 16:59
@Lms24 Lms24 marked this pull request as ready for review January 11, 2023 17:28
  // will retry in intervals of 5, 10, 30
- this._retryInterval = this._retryCount * this._retryInterval;
+ this._retryInterval = ++this._retryCount * this._retryInterval;
Member

nice optimization 👍
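
For reference, a tiny self-contained illustration of the progression this change produces. The base interval of 5 and the zero-initialized retry count are assumptions for the example, not values taken from the diff.

```ts
// Incrementing the retry count inline and multiplying it into the previous
// interval yields the 5, 10, 30 progression mentioned in the code comment.
let retryCount = 0;
let retryInterval = 5;

for (let attempt = 0; attempt < 3; attempt++) {
  retryInterval = ++retryCount * retryInterval;
  console.log(retryInterval); // 5, then 10, then 30
}
```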


setTimeout(() => {
__DEBUG_BUILD__ && logger.info('[Replay]', 'Resuming replay after rate limit');
this.resume();
Member

Out of curiosity cc @billyvg - when we resume, do we automatically do a full checkout? I am not totally sure, as we just call rrweb's record there, basically.

Member Author

At least that's what the JSDoc of startRecording tells me 😅

/**
* Start recording.
*
* Note that this will cause a new DOM checkout
*/
startRecording(): void {

Member

Ah, good point :D
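
For context, a hedged sketch of why restarting recording yields a full checkout, using rrweb's public record API directly; this is not the Replay integration's actual wiring, and the logging is purely illustrative.

```ts
import { record } from 'rrweb';

// Each fresh call to record() emits a Meta event followed by a full DOM
// snapshot (event.type === 2), so "resume by starting to record again"
// implies a new checkout.
const stopRecording = record({
  emit: (event, isCheckout) => {
    if (event.type === 2 || isCheckout) {
      console.log('[sketch] full DOM checkout taken');
    }
  },
});

// Pausing boils down to stopping; resuming calls record() again, which
// produces another full snapshot for the next segment.
stopRecording?.();
```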

const response = await transport.send(envelope);
// TODO (v8): we can remove this guard once transport.send's type signature doesn't include void anymore
if (response) {
this._rateLimits = updateRateLimits(this._rateLimits, response);
Member

Hmm, what's the reason to keep this on the class instance? Could we not just pass this through, e.g.:

const rateLimits = updateRateLimits({}, response);
// ...
this._handleRateLimits(rateLimits);

?

Member Author

I added _rateLimits to the class because I was initially worried about multiple pending requests that might each return different rate limit responses. We also keep the rateLimits object alive across multiple requests in the base transport.

But I guess the risk here is minimal, so I'm happy to change it to a local variable.

Member

Yeah, I am also not 100% sure. I guess if there are multiple pending requests, they would just both result in the _handleRateLimits being called, right? Maybe we should add some logic there to handle this. Thinking of either:

a) Check if isPaused() === true, and if so, do nothing
b) Cancel the previous timeout, if it exists, and create a new one

I'd say a) is the easier one - and it should still work if we run into the rate limit again, as it will just re-pause it then. WDYT?
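
For contrast, here is a hedged sketch of what option b) could look like; option a) is what was chosen, as shown in the _handleRateLimit snippet further down. The class shape and names here are illustrative, not the actual Replay container.

```ts
// Option b) from the discussion: cancel any previously scheduled resume and
// reschedule based on the newest rate limit response. Illustrative only.
class PausableRecorder {
  private _resumeTimeout: ReturnType<typeof setTimeout> | undefined;

  public pause(): void {
    // stop recording and flushing (stubbed for the sketch)
  }

  public resume(): void {
    // restart recording, which triggers a new full DOM checkout (stubbed)
  }

  public handleRateLimit(retryAfterMs: number): void {
    if (this._resumeTimeout) {
      clearTimeout(this._resumeTimeout);
    }
    this.pause();
    this._resumeTimeout = setTimeout(() => {
      this._resumeTimeout = undefined;
      this.resume();
    }, retryAfterMs);
  }
}
```

Option a) avoids this timeout bookkeeping: a rate limit response that arrives while recording is already paused is simply ignored, and a later one just re-pauses.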

Member Author

> if there are multiple pending requests, they would just both result in the _handleRateLimits being called, right?

Good point!

> I'd say a) is the easier one

Agreed, will do this!

private _handleRateLimit(): void {
// in case recording is already paused, we don't need to do anything, as we might have already paused because of a
// rate limit
if (this.isPaused()) {
Member

Nice and easy 👍

@mydea mydea (Member) left a comment

Super nice! I like it a lot. Great & relatively simple solution for a complex problem. 🚀

@Lms24 Lms24 force-pushed the lms-replay-ratelimits branch from f092ae9 to 6d61f5b on January 12, 2023, 14:14
@Lms24 Lms24 merged commit b921631 into master Jan 12, 2023
@Lms24 Lms24 deleted the lms-replay-ratelimits branch January 12, 2023 14:58
3 participants