
Fix quadratic performance of the ListMerger in a specific usage pattern #70910


Conversation

nickolas-pohilets
Contributor

When visiting a tree and creating a new job per tree node, ListMerger::merge() can exhibit quadratic performance.

Every time DefaultActorImpl::drainOne() is called, the queue consists of 2 unprocessed jobs (in reverse order) and N processed ones, where N keeps growing. All jobs have the same priority.

The unprocessed jobs are taken out of the queue and reversed in the process.

A new instance of ListMerger is used to merge this list of 2 jobs into the list of N processed jobs. But the merge is always an append, and finding the insertion point requires a linear traversal of the entire list of N processed jobs.

This PR fixes that by caching information about the ListMerger's last insertion point in the DefaultActorImpl.
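
For concreteness, here is a minimal sketch of the cached-insertion-point idea in the equal-priority case (the names `Job`, `CachedMergeQueue`, and `insertEqualPriority` are hypothetical illustrations, not the runtime's actual types):

```cpp
#include <cassert>

// Hypothetical stand-in for the runtime's job-list node type.
struct Job {
  Job *next = nullptr;
  int priority = 0;
};

// Without the cache, appending an equal-priority job means scanning past
// every already-processed job: O(N) per drain, O(N^2) over the tree walk.
Job *findInsertionPoint(Job *head, int priority) {
  Job *cur = head;
  while (cur->next && cur->next->priority >= priority)
    cur = cur->next;
  return cur;
}

// With the cache, the merge resumes from the previous insertion point, so
// the common append case costs O(1).
struct CachedMergeQueue {
  Job *head = nullptr;
  Job *lastInsertion = nullptr; // persisted in the actor between drains

  void insertEqualPriority(Job *job) {
    if (!head) {
      head = lastInsertion = job;
      return;
    }
    Job *pos = lastInsertion ? lastInsertion
                             : findInsertionPoint(head, job->priority);
    job->next = pos->next;
    pos->next = job;
    lastInsertion = job;
  }
};

int main() {
  CachedMergeQueue q;
  Job jobs[4];
  for (auto &j : jobs)
    q.insertEqualPriority(&j); // each append is O(1) thanks to the cache
  assert(q.head == &jobs[0] && q.lastInsertion == &jobs[3]);
}
```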

Before the change:

$ SWIFT_DETERMINISTIC_HASHING=1 DYLD_LIBRARY_PATH=$RELEASE_BUILD_DIR/lib/swift/macosx $RELEASE_BUILD_DIR/bin/Benchmark_O-arm64-apple-macosx10.13 --tags=concurrency --num-samples=100 --num-iters=10 
  # TEST           SAMPLES      MIN   MEDIAN      MAX
 50 AsyncTree.100       100   54.900   62.000  137.600
 51 AsyncTree.5000      100 3720.600 3926.800 4603.800

After the change:

$ SWIFT_DETERMINISTIC_HASHING=1 DYLD_LIBRARY_PATH=$RELEASE_BUILD_DIR/lib/swift/macosx $RELEASE_BUILD_DIR/bin/Benchmark_O-arm64-apple-macosx10.13 --tags=concurrency --num-samples=100 --num-iters=10
  # TEST           SAMPLES      MIN   MEDIAN      MAX
 50 AsyncTree.100       100   55.600   61.200  214.400
 51 AsyncTree.5000      100 2561.200 2796.100 3226.600

As expected, there is no significant difference for a small number of jobs, but there is a noticeable improvement for a large number of jobs.

Note that the API used in the attached benchmark first schedules the task on the generic executor, which changes the pattern of enqueueing and draining: several drains may happen in a row, and ListMerger::merge() is called only for the first one. If a future API allows scheduling jobs directly onto an actor, this change will have a much bigger impact.

Contributor

@ktoso ktoso left a comment

Looks promising, thanks for the PR


thanks for including a benchmark 👍

@ktoso
Contributor

ktoso commented Feb 12, 2024

@swift-ci please smoke test

@ktoso ktoso added the concurrency Feature: umbrella label for concurrency language features label Feb 12, 2024
@ktoso
Contributor

ktoso commented Feb 12, 2024

@swift-ci please smoke test

@ktoso
Contributor

ktoso commented Mar 21, 2024

@swift-ci please build toolchain

Contributor

@rjmccall rjmccall left a comment

So, I've reviewed this patch, and I think it's correct. I'm a little ambivalent about taking it, though, because it feels like a very small piece of progress that builds on top of an existing mistake. Your patch is storing things in the actor's private state without any additional synchronization, and that's fine because we only access that state while we're the current thread processing the actor.

So... why don't we just go ahead and split the queue? There are only five supported priority values right now; we probably shouldn't fall over if we see an extended priority, but we're also not required to priority-sort them, so let's just make five doubly-linked lists and split jobs into the right list during preprocessing. We can keep a bit-field of which buckets have jobs in the status. Escalation can just splice the job out of its old list and put it in the new one.
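
For illustration only, the bucketed-queue suggestion might be sketched like this (hypothetical names throughout; this is not code from the patch):

```cpp
#include <cassert>
#include <cstdint>

struct Job {
  Job *prev = nullptr;
  Job *next = nullptr;
  uint8_t bucket = 0; // canonical priority bucket, 0..4
};

struct BucketedJobQueue {
  static constexpr int NumBuckets = 5;
  Job *heads[NumBuckets] = {};
  Job *tails[NumBuckets] = {};
  uint8_t nonEmptyBits = 0; // bit i set => bucket i has jobs

  void enqueue(Job *job) {
    int b = job->bucket;
    job->prev = tails[b];
    job->next = nullptr;
    if (tails[b])
      tails[b]->next = job;
    else
      heads[b] = job;
    tails[b] = job;
    nonEmptyBits |= uint8_t(1u << b);
  }

  void unlink(Job *job) {
    int b = job->bucket;
    if (job->prev)
      job->prev->next = job->next;
    else
      heads[b] = job->next;
    if (job->next)
      job->next->prev = job->prev;
    else
      tails[b] = job->prev;
    if (!heads[b])
      nonEmptyBits &= uint8_t(~(1u << b));
    job->prev = job->next = nullptr;
  }

  // Escalation splices the job out of its old list and re-links it into
  // the new one; both steps are O(1) because the lists are doubly linked.
  void escalate(Job *job, uint8_t newBucket) {
    unlink(job);
    job->bucket = newBucket;
    enqueue(job);
  }
};

int main() {
  BucketedJobQueue q;
  Job job;
  job.bucket = 3;
  q.enqueue(&job);
  q.escalate(&job, 0);
  assert(q.heads[0] == &job && q.nonEmptyBits == 0b00001);
}
```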

Contributor

@ktoso ktoso left a comment

We chatted off-thread on the forums with @nickolas-pohilets -- he'll try the suggested approach. Marking request-changes for tracking purposes :)

@nickolas-pohilets
Contributor Author

@rjmccall, I get the general idea of having 5 doubly-linked lists, but I'm afraid I did not understand some of the details you mentioned.

we probably shouldn't fall over if we see an extended priority, but we're also not required to priority-sort them

What do you mean by "extended priority"? A value not present in the enum, e.g. 0x1d?

When talking about priority-sorting, what does "them" refer to? Jobs (which currently are sorted by priority), lists or something else?

We can keep a bit-field of which buckets have jobs in the status.

Why do we need bit-fields? Can't we just check the head pointers of the doubly-linked lists?

Escalation can just splice the job out of its old list and put it in the new one.

Does this happen already? I cannot find it in the code. AFAICS, escalating priority affects the priority of the thread that drains the actor, but does not change the relative order of the jobs. Am I missing something?

@nickolas-pohilets
Contributor Author

I've pushed a commit which implements the actor's queue as a singly linked list with 5 insertion points. This is sufficient to ensure that each job is preprocessed in O(1), but it won't work for unknown priorities, and there are no changes related to priority escalation. Let me know what you think.

This eliminates the usages of ListMerger from the default actor, but there are still 2 left in CooperativeGlobalExecutor.inc. It looks like JobQueue suffers from the same problem as the default actor and may also exhibit quadratic behaviour. If the suggested approach looks good, I can extract the reusable code from Actor.cpp, use it for JobQueue in CooperativeGlobalExecutor.inc, and remove ListMerger completely.
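
A rough sketch of the "single list with 5 insertion points" idea, assuming jobs stay sorted by priority bucket in one singly linked list (all names are illustrative, not the identifiers from the commit):

```cpp
#include <cassert>

struct Job {
  Job *next = nullptr;
  int bucket = 0; // 0 = highest priority, 4 = lowest
};

struct SegmentedJobList {
  static constexpr int NumBuckets = 5;
  Job head{};                   // sentinel node
  Job *bucketTails[NumBuckets]; // insertion point for each bucket

  SegmentedJobList() {
    for (auto &t : bucketTails)
      t = &head; // every bucket starts inserting right after the sentinel
  }

  // Insert after the current tail of the job's bucket, then advance every
  // insertion point that was sitting on the same node: O(1) per job, since
  // the loop is bounded by the constant number of buckets.
  void insert(Job *job) {
    Job *pos = bucketTails[job->bucket];
    job->next = pos->next;
    pos->next = job;
    for (int b = job->bucket; b < NumBuckets && bucketTails[b] == pos; ++b)
      bucketTails[b] = job;
  }

  Job *drainFirst() {
    Job *job = head.next;
    if (!job)
      return nullptr;
    head.next = job->next;
    // Any insertion point left dangling on the drained job falls back to
    // the sentinel (the drained job was the only one in those buckets).
    for (auto &t : bucketTails)
      if (t == job)
        t = &head;
    return job;
  }
};

int main() {
  SegmentedJobList list;
  Job low{nullptr, 4}, high{nullptr, 0};
  list.insert(&low);
  list.insert(&high); // higher priority lands ahead of `low` in O(1)
  assert(list.drainFirst() == &high);
  assert(list.drainFirst() == &low);
  assert(list.drainFirst() == nullptr);
}
```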

@rjmccall
Contributor

rjmccall commented Apr 3, 2024

@rjmccall, I get the general idea of having 5 doubly-linked lists, but I'm afraid I did not understand some of the details you mentioned.

Oh, sorry, I didn't see this comment. Let me respond to it as well for clarity's sake.

we probably shouldn't fall over if we see an extended priority, but we're also not required to priority-sort them

What do you mean by "extended priority"? A value not present in the enum, e.g. 0x1d?

Yes, exactly. I'd like to stay future-proof about supporting those other priority values as long as it doesn't cost us too much (and I don't expect it will).

When talking about priority-sorting, what does "them" refer to? Jobs (which currently are sorted by priority), lists or something else?

Jobs, yes. I'm just saying that it's okay if we effectively treat jobs with extended priorities as if they had one of the canonical priorities.

We can keep a bit-field of which buckets have jobs in the status.

Why do we need bit-fields? Can't we just check the head pointers of the doubly-linked lists?

We can, yes. That's probably not a significant burden.

Escalation can just splice the job out of its old list and put it in the new one.

Does this happen already? I cannot find it in the code. AFAICS, escalating priority affects the priority of the thread that drains the actor, but does not change the relative order of the jobs. Am I missing something?

Oh, maybe I was thinking of something I've thought about doing but haven't ever done. Never mind, then.

@nickolas-pohilets nickolas-pohilets force-pushed the mpokhylets/fix-list-merger-performance branch from e7032ce to f7cdd42 on April 12, 2024 21:46
@nickolas-pohilets
Contributor Author

nickolas-pohilets commented Apr 12, 2024

Updated.

The prioritised and non-prioritised queues are fully separated, and JobRef is removed. The prioritised queue is stored at the end of the actor layout to minimise false sharing.

It was a bit challenging to update all the places that check whether the queue is empty. Before, this required only an atomic read of the ActiveActorStatus, but now it also requires reading prioritisedJobs, which can be done only under the lock (see the sketch after this list):

  • in DefaultActorImpl::unlock() it is safe to check before unlocking
  • in DefaultActorImpl::tryLock() I had to add a flag to assert after locking
  • in DefaultActorImpl::destroy(), IIUC, it cannot be done, because a drainer can run concurrently with destruction. Is this correct? I've left a TODO about this.
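
A minimal sketch of the pattern behind the second bullet, assuming an ordinary mutex purely for illustration (the real runtime uses its own lock and ActiveActorStatus machinery; every name here is hypothetical):

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

struct ActorSketch {
  std::atomic<void *> firstUnprioritisedJob{nullptr};
  std::mutex drainLock;
  void *prioritisedJobsHead = nullptr; // readable only under drainLock

  bool tryLock(bool assertNoJobs) {
    if (!drainLock.try_lock())
      return false;
    // The prioritised queue can only be inspected while holding the lock,
    // so the emptiness assertion moves from before locking to after it.
    if (assertNoJobs)
      assert(prioritisedJobsHead == nullptr &&
             firstUnprioritisedJob.load(std::memory_order_relaxed) ==
                 nullptr);
    return true;
  }
};

int main() {
  ActorSketch actor;
  bool locked = actor.tryLock(/*assertNoJobs=*/true);
  assert(locked);
}
```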

Renamed ActiveActorStatus::FirstJob to ActiveActorStatus::FirstUnprioritisedJob to make it easier to audit all the usages.

I've had to share some code between stdlib/public/Concurrency/Actor.cpp and stdlib/public/Concurrency/CooperativeGlobalExecutor.inc, and I have doubts whether include/swift/ABI/MetadataValues.h is the best place for it.

Benchmark results look similar:

$ SWIFT_DETERMINISTIC_HASHING=1 DYLD_LIBRARY_PATH=$RELEASE_BUILD_DIR/lib/swift/macosx $RELEASE_BUILD_DIR/bin/Benchmark_O-arm64-apple-macosx10.13 --tags=concurrency --num-samples=100 --num-iters=10

  # TEST           SAMPLES      MIN   MEDIAN      MAX
 50 AsyncTree.100        100   55.300   63.100  120.000
 51 AsyncTree.5000       100 2543.200 2709.800 3281.800

@ktoso
Contributor

ktoso commented Apr 13, 2024

Very nice work, thank you for the update, @nickolas-pohilets! Will give it a read shortly.

@nickolas-pohilets nickolas-pohilets force-pushed the mpokhylets/fix-list-merger-performance branch from 8921285 to a9d7613 on April 19, 2024 13:16
@rjmccall
Contributor

@swift-ci Please test

Contributor

@rjmccall rjmccall left a comment

Looking pretty good. Thank you for persevering through a long review here! I think we're getting close to done.

Contributor

@rjmccall rjmccall left a comment

Alright, this generally looks good. I'm fine with how the assertion in destroy currently works; I can take a look at that myself. Please squash this down to a reasonable set of final commits, and then we can see about getting it merged.

@rjmccall
Contributor

rjmccall commented May 3, 2024

@swift-ci Please test

@nickolas-pohilets nickolas-pohilets force-pushed the mpokhylets/fix-list-merger-performance branch from a003496 to 87f4a33 on May 6, 2024 11:04
@nickolas-pohilets nickolas-pohilets requested a review from rjmccall May 6, 2024 12:29
@rjmccall
Contributor

rjmccall commented May 7, 2024

@swift-ci Please test

@rjmccall
Contributor

rjmccall commented May 7, 2024

@swift-ci Please test macOS

@rjmccall
Contributor

rjmccall commented May 8, 2024

I've asked @rokhinip and @mikeash to try to find time to take a look, but otherwise I think we're all set. We'll have to talk about our risk tolerance for merging this to the 6.0 branch.

Contributor

@mikeash mikeash left a comment

Looks very nice, just a few minor comments.

@nickolas-pohilets nickolas-pohilets force-pushed the mpokhylets/fix-list-merger-performance branch from 87f4a33 to 9ba09ff on May 10, 2024 09:05
@nickolas-pohilets nickolas-pohilets requested a review from mikeash May 10, 2024 09:06
@nickolas-pohilets
Contributor Author

@rjmccall, do you want some TODOs to help you get back to the topic of ordering in actor destruction and assertions?

@rjmccall
Contributor

@swift-ci Please test

@rjmccall
Contributor

@rjmccall, do you want some TODOs to help you get back to the topic of ordering in actor destruction and assertions?

It's okay, I'll keep track of it. Thanks for asking, though.

@rjmccall rjmccall merged commit e1a82f6 into swiftlang:main May 20, 2024
5 checks passed
@rjmccall
Contributor

@nickolas-pohilets Merged. Thank you!
