Fix quadratic performance of the ListMerger in specific usage pattern #70910
Conversation
Looks promising, thanks for the PR.
```swift
g.wait()
}
}
}
```
thanks for including a benchmark 👍
@swift-ci please smoke test
@swift-ci please smoke test
@swift-ci please build toolchain
So, I've reviewed this patch, and I think it's correct. I'm a little ambivalent about taking it, though, because it feels like a very small piece of progress that builds on top of an existing mistake. Your patch is storing things in the actor's private state without any additional synchronization, and that's fine because we only access that state while we're the current thread processing the actor.

So... why don't we just go ahead and split the queue? There are only five supported priority values right now; we probably shouldn't fall over if we see an extended priority, but we're also not required to priority-sort them, so let's just make five doubly-linked lists and split jobs into the right list during preprocessing. We can keep a bit-field of which buckets have jobs in the status. Escalation can just splice the job out of its old list and put it in the new one.
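For illustration, here is a minimal sketch of the bucketed design being suggested. The names (`Job`, `BucketedJobQueue`) are hypothetical, not the actual runtime types, and it assumes exactly five canonical priorities with higher numeric values meaning higher priority:

```cpp
#include <cstdint>

struct Job {
  Job *prev = nullptr;
  Job *next = nullptr;
  int priority; // 0..4, one of the five canonical priorities
};

struct BucketedJobQueue {
  static constexpr int kNumBuckets = 5;
  Job *head[kNumBuckets] = {};
  Job *tail[kNumBuckets] = {};
  uint8_t nonEmptyBuckets = 0; // bit i is set iff bucket i has jobs

  // Preprocessing: append to the job's bucket in O(1).
  void enqueue(Job *job) {
    int b = job->priority;
    job->prev = tail[b];
    job->next = nullptr;
    if (tail[b]) tail[b]->next = job;
    else head[b] = job;
    tail[b] = job;
    nonEmptyBuckets |= uint8_t(1u << b);
  }

  // Splice a job out of its bucket in O(1); the doubly-linked links
  // make this possible without any traversal.
  void remove(Job *job) {
    int b = job->priority;
    (job->prev ? job->prev->next : head[b]) = job->next;
    (job->next ? job->next->prev : tail[b]) = job->prev;
    if (!head[b]) nonEmptyBuckets &= uint8_t(~(1u << b));
    job->prev = job->next = nullptr;
  }

  // Escalation: splice the job out of its old list into the new one.
  void escalate(Job *job, int newPriority) {
    remove(job);
    job->priority = newPriority;
    enqueue(job);
  }

  // Drain: the bit-field finds the highest non-empty bucket in O(1).
  // (__builtin_clz is a GCC/Clang intrinsic.)
  Job *drainOne() {
    if (!nonEmptyBuckets) return nullptr;
    int b = 31 - __builtin_clz(nonEmptyBuckets);
    Job *job = head[b];
    remove(job);
    return job;
  }
};
```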
We chatted off thread on the forums with @nickolas-pohilets -- he'll try the suggested approach. Marking request-changes for tracking purposes :)
@rjmccall, I get the general idea of having 5 doubly-linked lists, but I'm afraid I did not understand some of the details you've mentioned.

- What do you mean by "extended priority"? A value not present in the enum?
- When talking about priority-sorting, what does "them" refer to? Jobs (which currently are sorted by priority), lists, or something else?
- Why do we need bit-fields? Can't we just check the head pointers of the doubly-linked lists?
- Does the splicing for escalation happen already? I cannot find it in the code. AFAICS, escalating priority affects the priority of the thread that drains the actor, but does not change the relative order of the jobs. Am I missing something?
I've pushed a commit which implements the actor's queue as a single linked list, but with 5 insertion points. This is sufficient to ensure that each job is preprocessed in O(1). But it won't work for unknown priorities, and there are no changes related to priority escalation. This eliminates the usages of the `ListMerger`. Let me know what you think.
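For reference, this is roughly the shape of such a list, as a sketch under assumed names (`PriorityQueueList`, `lastOf`), not the actual commit: the list is kept sorted from highest to lowest priority, and one remembered insertion point per priority makes each insert O(1).

```cpp
struct Job {
  Job *next = nullptr;
  int priority; // 0 (lowest) .. 4 (highest), unknown values unsupported
};

struct PriorityQueueList {
  static constexpr int kNumPriorities = 5;
  Job *head = nullptr;
  // Last job of each priority currently in the list, or nullptr.
  Job *lastOf[kNumPriorities] = {};

  // Insert keeping the list sorted highest-priority-first. At most
  // kNumPriorities pointer checks and no traversal, so O(1) per job.
  void insert(Job *job) {
    int p = job->priority;
    Job *pred = lastOf[p];
    // No run of this priority yet: chain after the tail of the
    // nearest higher-priority run instead.
    for (int q = p + 1; !pred && q < kNumPriorities; ++q)
      pred = lastOf[q];
    if (pred) {
      job->next = pred->next;
      pred->next = job;
    } else {
      job->next = head;
      head = job;
    }
    lastOf[p] = job; // this job is now the insertion point for priority p
  }

  // The head is always one of the highest-priority jobs.
  Job *drainOne() {
    Job *job = head;
    if (!job) return nullptr;
    head = job->next;
    if (lastOf[job->priority] == job)
      lastOf[job->priority] = nullptr;
    return job;
  }
};
```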
Oh, sorry, I didn't see this comment. Let me respond to it as well for clarity's sake.

- Yes, exactly. I'd like to stay future-proof about supporting those other priority values as long as it doesn't cost us too much (and I don't expect it will).
- Jobs, yes. I'm just saying that it's okay if we effectively treat jobs with extended priorities as if they had one of the canonical priorities.
- We can, yes. That's probably not a significant burden.
- Oh, maybe I was thinking of something I've thought about doing but haven't ever done. Never mind, then.
Force-pushed from e7032ce to f7cdd42.
Updated. Prioritised and non-prioritised queues are fully separated. It was a bit challenging to update all the places that check whether the queue is empty; before, that required only an atomic read of the status. Renamed the queue field to `prioritizedJobs`, and I've had to share some code between the prioritised and non-prioritised paths. Benchmark results look similar.
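To illustrate the emptiness-check concern, here is a hedged sketch with invented flag names (the real runtime packs much more state into its status word): even with the queues separated, "is the actor empty?" can remain a single atomic load.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical flag names, for illustration only.
enum StatusFlags : uint32_t {
  HasUnprocessedJobs = 1u << 0, // incoming (non-prioritised) queue non-empty
  HasPrioritizedJobs = 1u << 1, // processed (prioritised) queue non-empty
};

struct ActorStatus {
  std::atomic<uint32_t> bits{0};

  // Still a single atomic read, even though there are now two queues.
  bool isEmpty() const {
    return (bits.load(std::memory_order_acquire) &
            (HasUnprocessedJobs | HasPrioritizedJobs)) == 0;
  }
};
```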
Very nice work, thank you for the update @nickolas-pohilets! Will give it a read shortly.
Force-pushed from 8921285 to a9d7613.
@swift-ci Please test
Looking pretty good. Thank you for persevering through a long review here! I think we're getting close to done.
Alright, this generally looks good. I'm fine with how the assertion in destroy currently works; I can take a look at that myself. Please squash it down to a reasonable set of final commits, and then we can see about getting this merged.
@swift-ci Please test
Force-pushed from a003496 to 87f4a33.
@swift-ci Please test
@swift-ci Please test macOS
Looks very nice, just a few minor comments.
…tes 2+ new jobs of the same priority. See https://forums.swift.org/t/quadratic-performance-of-the-listmerger-in-specific-use-case/69393
…d in O(1). Fully separated unprocessed jobs and processed jobs. Reverse jobs after updating status to minimise contention.
Force-pushed from 87f4a33 to 9ba09ff.
@rjmccall, do you want some TODOs to help you get back to the topic of ordering in actor destruction and assertions?
@swift-ci Please test
It's okay, I'll keep track of it. Thanks for asking, though.
@nickolas-pohilets Merged. Thank you!
These changes all reflect the name `DefaultActorImplFooter::prioritizedJobs`. See also #70910 (comment).
When visiting a tree and creating a new job per tree node, `ListMerger::merge()` can exhibit quadratic performance. Every time `DefaultActorImpl::drainOne()` is called, we have a queue of jobs consisting of 2 unprocessed jobs (in reverse order) and N processed ones, where N keeps growing. All jobs have the same priority. Unprocessed jobs are taken out of the queue and reversed in the process. A new instance of `ListMerger` is used to merge this list of 2 jobs into the list of N processed jobs. But the merge is always an append, and it requires a linear traversal of the entire list of N processed jobs to find the insertion point.

This PR fixes this by caching in the `DefaultActorImpl` information about the last insertion point of the `ListMerger`.

Comparing benchmark results before and after the change: as expected, there is no significant difference for a small number of jobs, but there is a noticeable improvement for a large number of jobs.
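To make the caching idea concrete, here is a minimal sketch of a merger with a cached insertion cursor. The names (`CachingListMerger`, `lastInsertion`) are illustrative, not the actual patch, and it merges one job at a time for brevity:

```cpp
struct Job {
  Job *next = nullptr;
  int priority; // higher value = higher priority
};

struct CachingListMerger {
  Job *head = nullptr;
  // Cached cursor that survives across merge() calls. The real patch
  // must also reset it when the list is consumed; that is elided here.
  Job *lastInsertion = nullptr;

  // Merge one job into the priority-sorted (descending) list.
  void merge(Job *job) {
    // Everything up to the cursor has priority >= the cursor's, so if
    // the new job sorts at or below the cursor we can resume scanning
    // there instead of rescanning from the head.
    Job *pos = (lastInsertion && lastInsertion->priority >= job->priority)
                   ? lastInsertion
                   : nullptr;
    Job **link = pos ? &pos->next : &head;
    while (*link && (*link)->priority >= job->priority)
      link = &(*link)->next;
    job->next = *link;
    *link = job;
    lastInsertion = job;
    // In the quadratic scenario (all jobs of equal priority), the
    // cursor is the tail, the loop exits immediately, and each merge
    // costs O(1) instead of O(N).
  }
};
```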
Note that the API used in the attached benchmark first schedules a task on the generic executor, which changes the pattern of enqueueing and draining: several drains may happen in a row, and `ListMerger::merge()` is called only for the first one. If in the future a new API allows scheduling jobs directly onto an actor, this change will have a much bigger impact.