-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
ref(replay): improve dedupe logic #8836
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
size-limit report 📦
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice thanks @JonasBa -- we should probably revisit this and figure out the root cause as there shouldn't be dupes.
A test seems to be legit failing: looks like we are missing an expected "op": "navigation.navigate"
event
@@ -8,11 +8,7 @@ export function setupPerformanceObserver(replay: ReplayContainer): PerformanceOb | |||
const performanceObserverHandler = (list: PerformanceObserverEntryList): void => { | |||
// For whatever reason the observer was returning duplicate navigation | |||
// entries (the other entry types were not duplicated). | |||
const newPerformanceEntries = dedupePerformanceEntries( | |||
replay.performanceEvents, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I don't remember if we were seeing duplicate entries across calls to the observer handler
Hmm, that test is sus, I wondered why it only fails in the esm bundle test and not in cjs as well |
Yeah could be flakey |
@billyvg I was too eager on my claim and wrong on my first implementation. I discarded the fact that replay used to persist the entries (I assume in case of a clear buffer event), meaning we cannot just dedupe the list in a single pass like I did. I refactored this to respect the original implementation by treating both lists as queues (reading from each based on startTime) which merges the two lists in m+n time. I will try and simplify and add comments to the implementation as it's not IMO not the most readable piece of code and might not be worth using if we want to optimize for readability. |
This issue has gone three weeks without activity. In another week, I will close it. But! If you comment or otherwise update it, I will reset the clock, and if you remove the label "A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀 |
dedupePerformanceEntries showed up in a couple of browser profiles so I took the liberty of optimizing it. Since perf buffer registries can grow large in cases of SPA's, the execution time will grow and could become a bottleneck (this is still the case after my change)
The optimization relies on the fact that entries are sorted by startTime - since the list is sorted, we can do this in On vs On^2 time as we do not need to check the entire list of entries for duplicates, but we can just check the range of value i<x<=list size.
A possible further optimization would be to only query performance for getEntriesByType(navigation) and only dedupe those (same for LCP), which would allow us to skip iterating over elements that we are not detecting duplicates for anyways.
I ran a quick benchmark and confirmed the new dedupe is ~85% faster (on an input of ~200 perf entries)
Side note: since it seems that some perf registries are unbounded, it might make sense to add some safeguarding in place to guard from very large lists or prioritize certain entries over others.