
Fix quadratic performance of the ListMerger in a specific usage pattern #70910


Conversation

nickolas-pohilets
Contributor

When visiting a tree and creating a new job per tree node, ListMerger::merge() can exhibit quadratic performance.

Every time DefaultActorImpl::drainOne() is called, the queue consists of 2 unprocessed jobs (in reverse order) and N processed ones, where N keeps growing. All jobs have the same priority.

The unprocessed jobs are taken out of the queue and reversed in the process.

A new instance of ListMerger is used to merge this list of 2 jobs into the list of N processed jobs. But the merge is always an append, and finding the insertion point requires a linear traversal of the entire list of N processed jobs.

This PR fixes that by caching information about the ListMerger's last insertion point in the DefaultActorImpl.
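
For concreteness, here is a minimal sketch of the cached-insertion-point idea in the equal-priority case (the names `Job`, `CachedMergeQueue`, and `insertEqualPriority` are hypothetical illustrations, not the runtime's actual types):

```cpp
#include <cassert>

// Hypothetical stand-in for the runtime's job-list node type.
struct Job {
  Job *next = nullptr;
  int priority = 0;
};

// Without the cache, appending an equal-priority job means scanning past
// every already-processed job: O(N) per drain, O(N^2) over the tree walk.
Job *findInsertionPoint(Job *head, int priority) {
  Job *cur = head;
  while (cur->next && cur->next->priority >= priority)
    cur = cur->next;
  return cur;
}

// With the cache, the merge resumes from the previous insertion point, so
// the common append case costs O(1).
struct CachedMergeQueue {
  Job *head = nullptr;
  Job *lastInsertion = nullptr; // persisted in the actor between drains

  void insertEqualPriority(Job *job) {
    if (!head) {
      head = lastInsertion = job;
      return;
    }
    Job *pos = lastInsertion ? lastInsertion
                             : findInsertionPoint(head, job->priority);
    job->next = pos->next;
    pos->next = job;
    lastInsertion = job;
  }
};

int main() {
  CachedMergeQueue q;
  Job jobs[4];
  for (auto &j : jobs)
    q.insertEqualPriority(&j); // each append is O(1) thanks to the cache
  assert(q.head == &jobs[0] && q.lastInsertion == &jobs[3]);
}
```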

Before the change:

$ SWIFT_DETERMINISTIC_HASHING=1 DYLD_LIBRARY_PATH=$RELEASE_BUILD_DIR/lib/swift/macosx $RELEASE_BUILD_DIR/bin/Benchmark_O-arm64-apple-macosx10.13 --tags=concurrency --num-samples=100 --num-iters=10 
  # TEST           SAMPLES      MIN   MEDIAN      MAX
 50 AsyncTree.100       100   54.900   62.000  137.600
 51 AsyncTree.5000      100 3720.600 3926.800 4603.800

After the change:

$ SWIFT_DETERMINISTIC_HASHING=1 DYLD_LIBRARY_PATH=$RELEASE_BUILD_DIR/lib/swift/macosx $RELEASE_BUILD_DIR/bin/Benchmark_O-arm64-apple-macosx10.13 --tags=concurrency --num-samples=100 --num-iters=10
  # TEST           SAMPLES      MIN   MEDIAN      MAX
 50 AsyncTree.100       100   55.600   61.200  214.400
 51 AsyncTree.5000      100 2561.200 2796.100 3226.600

As expected, there is no significant difference for a small number of jobs, but there is a noticeable improvement for a large number of jobs.

Note that the API used in the attached benchmark first schedules the task on the generic executor, which changes the pattern of enqueueing and draining: several drains may happen in a row, and ListMerger::merge() is called only for the first one. If a future API allows scheduling jobs directly onto an actor, this change will have a much bigger impact.

Contributor

@ktoso ktoso left a comment

Looks promising, thanks for the PR


thanks for including a benchmark 👍

@ktoso
Contributor

ktoso commented Feb 12, 2024

@swift-ci please smoke test

@ktoso ktoso added the concurrency Feature: umbrella label for concurrency language features label Feb 12, 2024
@ktoso
Contributor

ktoso commented Feb 12, 2024

@swift-ci please smoke test

@ktoso
Contributor

ktoso commented Mar 21, 2024

@swift-ci please build toolchain

Contributor

@rjmccall rjmccall left a comment

So, I've reviewed this patch, and I think it's correct. I'm a little ambivalent about taking it, though, because it feels like a very small piece of progress that builds on top of an existing mistake. Your patch is storing things in the actor's private state without any additional synchronization, and that's fine because we only access that state while we're the current thread processing the actor.

So... why don't we just go ahead and split the queue? There are only five supported priority values right now; we probably shouldn't fall over if we see an extended priority, but we're also not required to priority-sort them, so let's just make five doubly-linked lists and split jobs into the right list during preprocessing. We can keep a bit-field of which buckets have jobs in the status. Escalation can just splice the job out of its old list and put it in the new one.
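
For illustration only, the bucketed-queue suggestion might be sketched like this (hypothetical names throughout; this is not code from the patch):

```cpp
#include <cassert>
#include <cstdint>

struct Job {
  Job *prev = nullptr;
  Job *next = nullptr;
  uint8_t bucket = 0; // canonical priority bucket, 0..4
};

struct BucketedJobQueue {
  static constexpr int NumBuckets = 5;
  Job *heads[NumBuckets] = {};
  Job *tails[NumBuckets] = {};
  uint8_t nonEmptyBits = 0; // bit i set => bucket i has jobs

  void enqueue(Job *job) {
    int b = job->bucket;
    job->prev = tails[b];
    job->next = nullptr;
    if (tails[b])
      tails[b]->next = job;
    else
      heads[b] = job;
    tails[b] = job;
    nonEmptyBits |= uint8_t(1u << b);
  }

  void unlink(Job *job) {
    int b = job->bucket;
    if (job->prev)
      job->prev->next = job->next;
    else
      heads[b] = job->next;
    if (job->next)
      job->next->prev = job->prev;
    else
      tails[b] = job->prev;
    if (!heads[b])
      nonEmptyBits &= uint8_t(~(1u << b));
    job->prev = job->next = nullptr;
  }

  // Escalation splices the job out of its old list and re-links it into
  // the new one; both steps are O(1) because the lists are doubly linked.
  void escalate(Job *job, uint8_t newBucket) {
    unlink(job);
    job->bucket = newBucket;
    enqueue(job);
  }
};

int main() {
  BucketedJobQueue q;
  Job job;
  job.bucket = 3;
  q.enqueue(&job);
  q.escalate(&job, 0);
  assert(q.heads[0] == &job && q.nonEmptyBits == 0b00001);
}
```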

Contributor

@ktoso ktoso left a comment

We chatted off-thread on the forums with @nickolas-pohilets -- he'll try the suggested approach. Marking request-changes for tracking purposes :)

@nickolas-pohilets
Contributor Author

@rjmccall, I get the general idea of having 5 doubly-linked lists, but I'm afraid I did not understand some of the details you mentioned.

we probably shouldn't fall over if we see an extended priority, but we're also not required to priority-sort them

What do you mean by "extended priority"? A value not present in the enum, e.g. 0x1d?

When talking about priority-sorting, what does "them" refer to? Jobs (which currently are sorted by priority), lists or something else?

We can keep a bit-field of which buckets have jobs in the status.

Why do we need bit-fields? Can't we just check the head pointers of the doubly-linked lists?

Escalation can just splice the job out of its old list and put it in the new one.

Does this happen already? I cannot find it in the code. AFAICS, escalating priority affects the priority of the thread that drains the actor, but does not change the relative order of the jobs. Am I missing something?

@nickolas-pohilets
Contributor Author

I've pushed a commit which implements the actor's queue as a singly linked list with 5 insertion points. This is sufficient to ensure that each job is preprocessed in O(1), but it won't work for unknown priorities, and there are no changes related to priority escalation. Let me know what you think.

This eliminates the usages of ListMerger from the default actor, but there are still 2 left in CooperativeGlobalExecutor.inc. It looks like JobQueue suffers from the same problem as the default actor and may also exhibit quadratic behaviour. If the suggested approach looks good, I can extract the reusable code from Actor.cpp, use it for JobQueue in CooperativeGlobalExecutor.inc, and remove ListMerger completely.
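
A rough sketch of the "single list with 5 insertion points" idea, assuming jobs stay sorted by priority bucket in one singly linked list (all names are illustrative, not the identifiers from the commit):

```cpp
#include <cassert>

struct Job {
  Job *next = nullptr;
  int bucket = 0; // 0 = highest priority, 4 = lowest
};

struct SegmentedJobList {
  static constexpr int NumBuckets = 5;
  Job head{};                   // sentinel node
  Job *bucketTails[NumBuckets]; // insertion point for each bucket

  SegmentedJobList() {
    for (auto &t : bucketTails)
      t = &head; // every bucket starts inserting right after the sentinel
  }

  // Insert after the current tail of the job's bucket, then advance every
  // insertion point that was sitting on the same node: O(1) per job, since
  // the loop is bounded by the constant number of buckets.
  void insert(Job *job) {
    Job *pos = bucketTails[job->bucket];
    job->next = pos->next;
    pos->next = job;
    for (int b = job->bucket; b < NumBuckets && bucketTails[b] == pos; ++b)
      bucketTails[b] = job;
  }

  Job *drainFirst() {
    Job *job = head.next;
    if (!job)
      return nullptr;
    head.next = job->next;
    // Any insertion point left dangling on the drained job falls back to
    // the sentinel (the drained job was the only one in those buckets).
    for (auto &t : bucketTails)
      if (t == job)
        t = &head;
    return job;
  }
};

int main() {
  SegmentedJobList list;
  Job low{nullptr, 4}, high{nullptr, 0};
  list.insert(&low);
  list.insert(&high); // higher priority lands ahead of `low` in O(1)
  assert(list.drainFirst() == &high);
  assert(list.drainFirst() == &low);
  assert(list.drainFirst() == nullptr);
}
```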

@rjmccall
Contributor

rjmccall commented Apr 3, 2024

@rjmccall, I get the general idea of having 5 doubly-linked lists, but I'm afraid I did not understand some of the details you mentioned.

Oh, sorry, I didn't see this comment. Let me respond to it as well for clarity's sake.

we probably shouldn't fall over if we see an extended priority, but we're also not required to priority-sort them

What do you mean by "extended priority"? A value not present in the enum, e.g. 0x1d?

Yes, exactly. I'd like to stay future-proof about supporting those other priority values as long as it doesn't cost us too much (and I don't expect it will).

When talking about priority-sorting, what does "them" refer to? Jobs (which currently are sorted by priority), lists or something else?

Jobs, yes. I'm just saying that it's okay if we effectively treat jobs with extended priorities as if they had one of the canonical priorities.

We can keep a bit-field of which buckets have jobs in the status.

Why do we need bit-fields? Can't we just check the head pointers of the doubly-linked lists?

We can, yes. That's probably not a significant burden.

Escalation can just splice the job out of its old list and put it in the new one.

Does this happen already? I cannot find it in the code. AFAICS, escalating priority affects the priority of the thread that drains the actor, but does not change the relative order of the jobs. Am I missing something?

Oh, maybe I was thinking of something I've thought about doing but haven't ever done. Never mind, then.

@nickolas-pohilets nickolas-pohilets force-pushed the mpokhylets/fix-list-merger-performance branch from e7032ce to f7cdd42 on April 12, 2024 21:46
@nickolas-pohilets
Contributor Author

nickolas-pohilets commented Apr 12, 2024

Updated.

The prioritised and non-prioritised queues are fully separated, and JobRef is removed. The prioritised queue is stored at the end of the actor layout to minimise false sharing.

It was a bit challenging to update all the places that check whether the queue is empty. Before, this required only an atomic read of the ActiveActorStatus, but now it also requires reading prioritisedJobs, which can be done only under the lock (see the sketch after this list):

  • in DefaultActorImpl::unlock() it is safe to check before unlocking
  • in DefaultActorImpl::tryLock() I had to add a flag to assert after locking
  • in DefaultActorImpl::destroy(), IIUC, it cannot be done, because a drainer can run concurrently with destruction. Is this correct? I've left a TODO about this.
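
A minimal sketch of the pattern behind the second bullet, assuming an ordinary mutex purely for illustration (the real runtime uses its own lock and ActiveActorStatus machinery; every name here is hypothetical):

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

struct ActorSketch {
  std::atomic<void *> firstUnprioritisedJob{nullptr};
  std::mutex drainLock;
  void *prioritisedJobsHead = nullptr; // readable only under drainLock

  bool tryLock(bool assertNoJobs) {
    if (!drainLock.try_lock())
      return false;
    // The prioritised queue can only be inspected while holding the lock,
    // so the emptiness assertion moves from before locking to after it.
    if (assertNoJobs)
      assert(prioritisedJobsHead == nullptr &&
             firstUnprioritisedJob.load(std::memory_order_relaxed) ==
                 nullptr);
    return true;
  }
};

int main() {
  ActorSketch actor;
  bool locked = actor.tryLock(/*assertNoJobs=*/true);
  assert(locked);
}
```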

Renamed ActiveActorStatus::FirstJob to ActiveActorStatus::FirstUnprioritisedJob to make it easier to audit all the usages.

I've had to share some code between stdlib/public/Concurrency/Actor.cpp and stdlib/public/Concurrency/CooperativeGlobalExecutor.inc, and I have doubts whether include/swift/ABI/MetadataValues.h is the best place for it.

Benchmark results look similar:

$ SWIFT_DETERMINISTIC_HASHING=1 DYLD_LIBRARY_PATH=$RELEASE_BUILD_DIR/lib/swift/macosx $RELEASE_BUILD_DIR/bin/Benchmark_O-arm64-apple-macosx10.13 --tags=concurrency --num-samples=100 --num-iters=10

  # TEST           SAMPLES      MIN   MEDIAN      MAX
 50 AsyncTree.100        100   55.300   63.100  120.000
 51 AsyncTree.5000       100 2543.200 2709.800 3281.800

@ktoso
Contributor

ktoso commented Apr 13, 2024

Very nice work, thank you for the update, @nickolas-pohilets! Will give it a read shortly.

@nickolas-pohilets nickolas-pohilets force-pushed the mpokhylets/fix-list-merger-performance branch from 8921285 to a9d7613 on April 19, 2024 13:16
@rjmccall
Contributor

@swift-ci Please test

Contributor

@rjmccall rjmccall left a comment

Looking pretty good. Thank you for persevering through a long review here! I think we're getting close to done.

Contributor

@rjmccall rjmccall left a comment

Alright, this generally looks good. I'm fine with how the assertion in destroy currently works; I can take a look at that myself. Please squash this down to a reasonable set of final commits, and then we can see about getting it merged.

@rjmccall
Contributor

rjmccall commented May 3, 2024

@swift-ci Please test

@nickolas-pohilets nickolas-pohilets force-pushed the mpokhylets/fix-list-merger-performance branch from a003496 to 87f4a33 on May 6, 2024 11:04
@nickolas-pohilets nickolas-pohilets requested a review from rjmccall May 6, 2024 12:29
@rjmccall
Contributor

rjmccall commented May 7, 2024

@swift-ci Please test

@rjmccall
Contributor

rjmccall commented May 7, 2024

@swift-ci Please test macOS

@rjmccall
Contributor

rjmccall commented May 8, 2024

I've asked @rokhinip and @mikeash to try to find time to take a look, but otherwise I think we're all set. We'll have to talk about our risk tolerance for merging this to the 6.0 branch.

Contributor

@mikeash mikeash left a comment

Looks very nice, just a few minor comments.

@nickolas-pohilets nickolas-pohilets force-pushed the mpokhylets/fix-list-merger-performance branch from 87f4a33 to 9ba09ff on May 10, 2024 09:05
@nickolas-pohilets nickolas-pohilets requested a review from mikeash May 10, 2024 09:06
@nickolas-pohilets
Contributor Author

@rjmccall, do you want some TODOs to help you get back to the topic of ordering in actor destruction and assertions?

@rjmccall
Contributor

@swift-ci Please test

@rjmccall
Contributor

@rjmccall, do you want some TODOs to help you get back to the topic of ordering in actor destruction and assertions?

It's okay, I'll keep track of it. Thanks for asking, though.

@rjmccall rjmccall merged commit e1a82f6 into swiftlang:main May 20, 2024
5 checks passed
@rjmccall
Contributor

@nickolas-pohilets Merged. Thank you!
