ref(replay): Pause recording when replay requests are rate-limited #6733
Conversation
packages/replay/src/replay.ts
// TODO (v8): we can remove this guard once transport.send's type signature doesn't include void anymore
if (response) {
  this._rateLimits = updateRateLimits(this._rateLimits, response);
  if (isRateLimited(this._rateLimits, 'replay_event') || isRateLimited(this._rateLimits, 'replay_recording')) {
If this happens, it means the request has failed, correct? Which means we will have a "missing" segment? (No need to handle it right now, just confirming).
Yes, the segment we sent to get this response will be missing. Does this mess something up?
(I don't really see a way to get around this, fwiw)
// will retry in intervals of 5, 10, 30
- this._retryInterval = this._retryCount * this._retryInterval;
+ this._retryInterval = ++this._retryCount * this._retryInterval;
nice optimization 👍
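As a side note for readers of the diff: a minimal, standalone sketch of how the pre-increment produces the back-off described in the comment, assuming a base retry interval of 5 seconds (the actual constant lives elsewhere in the Replay code):

// Illustration only: mirrors the back-off logic in the diff above.
let retryCount = 0;
let retryInterval = 5_000; // assumed base interval in ms

function nextRetryInterval(): number {
  // 1 * 5000 = 5000, 2 * 5000 = 10000, 3 * 10000 = 30000
  retryInterval = ++retryCount * retryInterval;
  return retryInterval;
}

nextRetryInterval(); // 5000  -> retry after 5s
nextRetryInterval(); // 10000 -> retry after 10s
nextRetryInterval(); // 30000 -> retry after 30s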
setTimeout(() => {
  __DEBUG_BUILD__ && logger.info('[Replay]', 'Resuming replay after rate limit');
  this.resume();
Out of curiosity cc @billyvg - when we resume, do we automatically do a full checkout? I am not totally sure, as we basically just call rrweb's record there.
At least that's what the JSDoc of startRecording tells me 😅
sentry-javascript/packages/replay/src/replay.ts
Lines 204 to 209 in 1f1eefe
/**
 * Start recording.
 *
 * Note that this will cause a new DOM checkout
 */
startRecording(): void {
Ah, good point :D
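To summarize the exchange above, a simplified, hypothetical sketch of the pause/resume shape being assumed here; the _isPaused field and the method bodies are illustrative, only the point that resume() re-starts recording and thereby forces a new DOM checkout comes from the thread:

// Illustration only - not the actual Replay implementation.
class ReplayLike {
  // hypothetical flag backing isPaused()
  private _isPaused: boolean = false;

  public isPaused(): boolean {
    return this._isPaused;
  }

  public pause(): void {
    this._isPaused = true;
    // the real integration stops rrweb recording and flushing here
  }

  public resume(): void {
    this._isPaused = false;
    // startRecording() calls rrweb's record(); per its JSDoc this causes
    // a new DOM checkout, so the next segment starts with a full snapshot
    this.startRecording();
  }

  public startRecording(): void {
    // placeholder for the rrweb record() call
  }
}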
const response = await transport.send(envelope);
// TODO (v8): we can remove this guard once transport.send's type signature doesn't include void anymore
if (response) {
  this._rateLimits = updateRateLimits(this._rateLimits, response);
Hmm, what's the reason to keep this on the class instance? Could we not just pass this through, e.g.:
const rateLimits = updateRateLimits({}, response);
// ...
this._handleRateLimits(rateLimits);
?
I added _rateLimits to the class as I was initially worried about multiple pending requests which would both return different rate limit responses. Also, we keep the rateLimits object alive across multiple requests in the base transport.
But I guess the risk here is minimal, so I'm happy to change it to a local variable.
Yeah, I am also not 100% sure. I guess if there are multiple pending requests, they would just both result in _handleRateLimits being called, right? Maybe we should add some logic there to handle this. Thinking of either:
a) Check if isPaused() === true, and if so, do nothing
b) Cancel the previous timeout, if it exists, and create a new one
I'd say a) is the easier one - and it should still work if we run into the rate limit again, as it will just re-pause it then. WDYT?
> if there are multiple pending requests, they would just both result in the _handleRateLimits being called, right?

Good point!

> I'd say a) is the easier one

Agreed, will do this!
private _handleRateLimit(): void {
  // in case recording is already paused, we don't need to do anything, as we might have already paused because of a
  // rate limit
  if (this.isPaused()) {
Nice and easy 👍
Super nice! I like it a lot. Great & relatively simple solution for a complex problem. 🚀
Following up on #6710, this PR addresses option B and adds the following rate-limiting strategy to the Replay integration:
If we receive a rate limit response from the server after sending a replay event, we pause recording and flushing for the received rate limit period. After this period, we resume recording the same session, which triggers a full snapshot and hence should give us working replays. This should reduce segment loss due to rate limiting, as we previously did not handle rate limits specifically for Replay (beyond the transport's internal rate limiting).
I added a test to verify this behaviour but couldn't easily verify it with a test app. Happy to take suggestions on how to test this better. Otherwise, I think this should be good to go.
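For reviewers, a condensed sketch stitching together the snippets discussed in this thread. The PausableReplay interface and the way the pause duration is derived (via disabledUntil from @sentry/utils) are assumptions for illustration; only the pause -> wait -> resume shape and the isPaused() guard come from the PR itself:

import type { TransportMakeRequestResponse } from '@sentry/types';
import type { RateLimits } from '@sentry/utils';
import { disabledUntil, isRateLimited, updateRateLimits } from '@sentry/utils';

// Hypothetical minimal surface of the replay container used below.
interface PausableReplay {
  rateLimits: RateLimits;
  isPaused(): boolean;
  pause(): void;
  resume(): void;
}

function handleSendResponse(replay: PausableReplay, response: TransportMakeRequestResponse): void {
  replay.rateLimits = updateRateLimits(replay.rateLimits, response);

  if (
    !isRateLimited(replay.rateLimits, 'replay_event') &&
    !isRateLimited(replay.rateLimits, 'replay_recording')
  ) {
    return;
  }

  // Already paused (possibly because of an earlier rate limit) - nothing to do.
  if (replay.isPaused()) {
    return;
  }

  replay.pause();

  // Assumption: derive the pause duration from the rate limit headers.
  const resumeAfterMs =
    Math.max(
      disabledUntil(replay.rateLimits, 'replay_event'),
      disabledUntil(replay.rateLimits, 'replay_recording'),
    ) - Date.now();

  setTimeout(() => {
    // Resuming re-starts recording, which triggers a full DOM checkout.
    replay.resume();
  }, resumeAfterMs);
}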
ref #6710
ref #6533