feat(replay): Keep 30-60s instead of 0-60s of recording in replay error mode #6924
Conversation
Force-pushed from 02a0131 to 751ec4a
this.clear();

try {
  return await this._compressEvents(this._getAndIncrementId(), pendingEvents);
This change means that if something goes wrong when processing, we don't error out but just send the events uncompressed.
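A hedged sketch of that fallback behavior (the names and types here are illustrative stand-ins, not the actual SDK internals): if compression throws for any reason, the flush returns the raw events instead of failing.

```typescript
// Sketch of the fallback described above (illustrative names, not the
// actual SDK code): if compression fails, send the events uncompressed
// rather than erroring out.
type RecordingEvent = { type: number; timestamp: number; data?: unknown };

async function finishRecording(
  pendingEvents: RecordingEvent[],
  compressEvents: (events: RecordingEvent[]) => Promise<Uint8Array>,
): Promise<Uint8Array | string> {
  try {
    return await compressEvents(pendingEvents);
  } catch {
    // Fallback: the uncompressed JSON payload is still a valid segment body.
    return JSON.stringify(pendingEvents);
  }
}
```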
@@ -321,11 +322,10 @@ describe('Integration | errorSampleRate', () => {
    });
  });

- it('has correct timestamps when error occurs much later than initial pageload/checkout', async () => {
+ it('keeps up to the last two checkout events xxx', async () => {
Note: I tried to also add a test here that makes sure we clear out previous stuff after the previous checkout, but ran into many test issues here. So I ended up testing this in different places, mostly in addEvent.test as well as in eventBuffer.test.
Force-pushed from 751ec4a to a1807d0
size-limit report 📦
Force-pushed from 5238091 to 3a8f9f0
I still want to think about this some more, but a lot of these changes make sense.
The one thing I'm not sure about yet is compressing the entire buffer at once instead of having streaming updates. It will make it difficult for us to later break up large segments into multiple envelopes (e.g. if we wanted to ensure that segments are under 64KB so that we can use keepalive).
packages/replay/src/util/addEvent.ts
// Ensure we have the correct first checkout timestamp when an error occurs
if (!session.segmentId) {
  replay.getContext().earliestEvent = eventBuffer.getFirstCheckoutTimestamp();
The timestamp of the first checkout is not necessarily the earliest event. This is because we add performance entries at flush time and events there can happen before rrweb checkout.
hmm, good point! I wonder what the best way to solve this is then... I guess the most correct approach is to check this at first flush time, and just get the earliest timestamp from the events array? WDYT?
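A minimal sketch of that suggestion (the event shape is assumed for illustration, not the actual SDK type): at first flush, derive the earliest timestamp directly from the buffered events rather than from the first checkout.

```typescript
// Sketch: compute the earliest timestamp from the events array at flush
// time, since performance entries added at flush can predate the rrweb
// checkout. The event shape here is an assumption for illustration.
interface BufferedEvent {
  timestamp: number;
}

function getEarliestTimestamp(events: BufferedEvent[]): number | null {
  if (events.length === 0) {
    return null;
  }
  return events.reduce((min, event) => Math.min(min, event.timestamp), Infinity);
}
```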
const pendingEvents = this.pendingEvents.slice();
this.clear();

return Promise.resolve(this._finishRecording(pendingEvents));
make function async?
- return Promise.resolve(this._finishRecording(pendingEvents));
+ return this._finishRecording(pendingEvents);
We need to return a promise here to satisfy the interface, so need to wrap this.
if it's async, it will return a promise in any case :)
it's currently not async, though!
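For illustration, the difference this thread is circling: a non-async function must wrap its return value explicitly to satisfy a Promise-returning interface, while an async function wraps it automatically.

```typescript
// Non-async: the synchronous result must be wrapped explicitly to satisfy
// a Promise-returning interface.
function finishSync(events: string[]): Promise<string[]> {
  return Promise.resolve(events);
}

// Async: the return value is wrapped in a Promise automatically, so no
// explicit Promise.resolve is needed.
async function finishAsync(events: string[]): Promise<string[]> {
  return events;
}
```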
Force-pushed from 3a8f9f0 to 23828b2
Force-pushed from 4c5f3b3 to cb6be34
Superseded by new PR: #7025
Currently, we keep up to 60s of recording data in buffer, for when an error occurs. This means that if you have bad luck, you may only get a second of replay recording when an error happens.
This PR changes this by making sure we keep one full checkout worth of data in the cache, meaning you should get 30-60s of recording data instead (by also reducing the checkout time to every 30s instead of every 60s).
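The checkout-based eviction can be modeled roughly like this (a simplified sketch, not the actual eventBuffer implementation): whenever a new checkout arrives, everything before the previous checkout is dropped, so the buffer always holds one to two checkout intervals of events (30-60s at a 30s checkout interval).

```typescript
// Simplified model of the "keep up to the last two checkouts" behavior
// described above (not the actual eventBuffer code).
type BufferEvent = { timestamp: number; isCheckout?: boolean };

class ErrorModeBuffer {
  private events: BufferEvent[] = [];
  private previousCheckoutIndex = 0;

  addEvent(event: BufferEvent): void {
    if (event.isCheckout) {
      // Drop everything before the previous checkout; the previous interval
      // plus the new checkout covers 30-60s of recording.
      this.events = this.events.slice(this.previousCheckoutIndex);
      this.previousCheckoutIndex = this.events.length;
    }
    this.events.push(event);
  }

  getEvents(): BufferEvent[] {
    return this.events;
  }
}
```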
This was a bit more involved than anticipated, but in the process I also streamlined the worker code a bit. Now, the worker is stateless. Since we already keep the pending events in the main script memory anyhow, we can just send all events along when we want to compress them. This allows us to simplify this code quite a bit, especially with the changes here where the buffer has to hold a bit more state so we can clear only the desired cache.
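The stateless-worker protocol might look roughly like this (the message shape and names are assumptions for illustration; a real implementation would also run a compressor over the serialized events): every request carries the full pending event list, so the worker keeps no state between messages.

```typescript
// Sketch of a stateless compression worker handler (illustrative message
// shape, not the actual SDK worker protocol).
type CompressRequest = { id: number; events: unknown[] };
type CompressResponse = { id: number; payload: string };

function handleCompressRequest(request: CompressRequest): CompressResponse {
  // No state is read or written between requests: the full event list
  // arrives with every message, so the handler is a pure function.
  return { id: request.id, payload: JSON.stringify(request.events) };
}
```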
The only potential downside of this is that theoretically, the local cache could be very large, and we may transfer a single very large payload to the worker. However, my research showed that payloads up to 100-200MB should not be a problem, and I think we're unlikely to exceed this in 60 seconds. Worst case, compression would fail, in which case we fall back to just using the uncompressed payload.
This change also means that `addEvent` and `clear` can be sync again, as we always just write to local memory. Only `finish` has to be async, which also simplifies some things a bit!

This also fixes an issue where the initial event sent is usually uncompressed. This is because we immediately send the initial snapshot, at which time the worker is usually not loaded. This triggers the fallback behavior of sending the event uncompressed. With this PR, for the initial payload we wait for the worker to be loaded before sending, to avoid this.
Closes #6908
ref #6923