-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[Parallel] Revert sequential task changes #109084
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Parallel] Revert sequential task changes #109084
Conversation
Created using spr 1.3.5-bogner
@llvm/pr-subscribers-lld-elf @llvm/pr-subscribers-llvm-support Author: Fangrui Song (MaskRay) Changeshttps://reviews.llvm.org/D148728 introduced In addition, the extra member variables have overhead. This patch restores the behavior before https://reviews.llvm.org/D148728 Fix #105958 Full diff: https://github.com/llvm/llvm-project/pull/109084.diff 1 Files Affected:
diff --git a/llvm/lib/Support/Parallel.cpp b/llvm/lib/Support/Parallel.cpp
index a3ef3d9c621b98..4aec5b14ebb146 100644
--- a/llvm/lib/Support/Parallel.cpp
+++ b/llvm/lib/Support/Parallel.cpp
@@ -12,7 +12,6 @@
#include "llvm/Support/Threading.h"
#include <atomic>
-#include <deque>
#include <future>
#include <thread>
#include <vector>
@@ -39,7 +38,7 @@ namespace {
class Executor {
public:
virtual ~Executor() = default;
- virtual void add(std::function<void()> func, bool Sequential = false) = 0;
+ virtual void add(std::function<void()> func) = 0;
virtual size_t getThreadCount() const = 0;
static Executor *getDefaultExecutor();
@@ -98,13 +97,10 @@ class ThreadPoolExecutor : public Executor {
static void call(void *Ptr) { ((ThreadPoolExecutor *)Ptr)->stop(); }
};
- void add(std::function<void()> F, bool Sequential = false) override {
+ void add(std::function<void()> F) override {
{
std::lock_guard<std::mutex> Lock(Mutex);
- if (Sequential)
- WorkQueueSequential.emplace_front(std::move(F));
- else
- WorkQueue.emplace_back(std::move(F));
+ WorkStack.push_back(std::move(F));
}
Cond.notify_one();
}
@@ -112,42 +108,23 @@ class ThreadPoolExecutor : public Executor {
size_t getThreadCount() const override { return ThreadCount; }
private:
- bool hasSequentialTasks() const {
- return !WorkQueueSequential.empty() && !SequentialQueueIsLocked;
- }
-
- bool hasGeneralTasks() const { return !WorkQueue.empty(); }
-
void work(ThreadPoolStrategy S, unsigned ThreadID) {
threadIndex = ThreadID;
S.apply_thread_strategy(ThreadID);
while (true) {
std::unique_lock<std::mutex> Lock(Mutex);
- Cond.wait(Lock, [&] {
- return Stop || hasGeneralTasks() || hasSequentialTasks();
- });
+ Cond.wait(Lock, [&] { return Stop || !WorkStack.empty(); });
if (Stop)
break;
- bool Sequential = hasSequentialTasks();
- if (Sequential)
- SequentialQueueIsLocked = true;
- else
- assert(hasGeneralTasks());
-
- auto &Queue = Sequential ? WorkQueueSequential : WorkQueue;
- auto Task = std::move(Queue.back());
- Queue.pop_back();
+ auto Task = std::move(WorkStack.back());
+ WorkStack.pop_back();
Lock.unlock();
Task();
- if (Sequential)
- SequentialQueueIsLocked = false;
}
}
std::atomic<bool> Stop{false};
- std::atomic<bool> SequentialQueueIsLocked{false};
- std::deque<std::function<void()>> WorkQueue;
- std::deque<std::function<void()>> WorkQueueSequential;
+ std::vector<std::function<void()>> WorkStack;
std::mutex Mutex;
std::condition_variable Cond;
std::promise<void> ThreadsCreated;
@@ -217,13 +194,18 @@ TaskGroup::~TaskGroup() {
void TaskGroup::spawn(std::function<void()> F, bool Sequential) {
#if LLVM_ENABLE_THREADS
if (Parallel) {
+ if (Sequential) {
+ // Act as worker thread 0.
+ threadIndex = 0;
+ F();
+ threadIndex = UINT_MAX;
+ return;
+ }
L.inc();
- detail::Executor::getDefaultExecutor()->add(
- [&, F = std::move(F)] {
- F();
- L.dec();
- },
- Sequential);
+ detail::Executor::getDefaultExecutor()->add([&, F = std::move(F)] {
+ F();
+ L.dec();
+ });
return;
}
#endif
|
llvm/lib/Support/Parallel.cpp
Outdated
@@ -217,13 +194,18 @@ TaskGroup::~TaskGroup() { | |||
void TaskGroup::spawn(std::function<void()> F, bool Sequential) { | |||
#if LLVM_ENABLE_THREADS | |||
if (Parallel) { | |||
if (Sequential) { | |||
// Act as worker thread 0. | |||
threadIndex = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this sequential behavior was only added (and seems only to be used by) the ELF relocations, can we not move this code there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate the expected code here? threadIndex
is to satisfy lld ELF relocation's requirement. It does need threadIndex == 0
. The initial threadIndex
is UINT_MAX is for assertion https://reviews.llvm.org/D148916
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, if a user was to mix sequential and parallel tasks within the same TaskGroup
then this could potentially lead to unintentional concurrent access to thread index 0
resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaskRay I meant this whole sequential flag could be solved by including the for... loop in the task at the call site, when serial=true, no? (in elf::scanRelocations) That way you get a valid threadIndex
and the items are executed sequentially.
bool serial = ...
parallel::TaskGroup tg;
auto outerFn = [&]() {
for (ELFFileBase *f : ctx.objectFiles) {
auto fn = [f]() {
RelocationScanner scanner;
for (InputSectionBase *s : f->getSections()) {
if (s && s->kind() == SectionBase::Regular && s->isLive() &&
(s->flags & SHF_ALLOC) &&
!(s->type == SHT_ARM_EXIDX && config->emachine == EM_ARM))
scanner.template scanSection<ELFT>(*s);
}
};
if (serial)
fn();
else
tg.spawn(fn);
}
};
if (serial)
tg.spawn(outerFn);
else
outerFn();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thus we don't need the Sequential
flag here in Parallel.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patch is an almost revert of reviews.llvm.org/D148728 .
Since the use case is so special, I'm fine with the style suggested by @aganea (also the initial style I used for parallel relocation scanning):
if (serial)
fn();
else
tg.spawn(fn);
}
};
if (serial)
tg.spawn(outerFn);
else
outerFn();
Perhaps Parallel.cpp could add setThreadIndex
and lld/ELF/Relocations.cpp could call setThreadIndex(0)
to make the pattern more explicit. It's up to lld to ensure correctness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this code would be left in the patch:
if (Sequential) {
// Act as worker thread 0.
threadIndex = 0; <<<<<<<<<<<<<<<<<<<
then threadIndex = 0 could clash with another thread having threadIndex = 0. Which might result in concurrent access to the resource with index 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@avl-llvm Yes, I think the goal would be to remove this chunk of code (everything related to Sequential
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks. So removing Sequential mode and including the for... loop in the task at the call site in elf::scanRelocations would be OK, then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I just realized the flipped condition :) Adopted the style and kept the threadIndex != UINT_MAX
assertion.
auto &Queue = Sequential ? WorkQueueSequential : WorkQueue; | ||
auto Task = std::move(Queue.back()); | ||
Queue.pop_back(); | ||
auto Task = std::move(WorkStack.back()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was missed in original code, but it seems that the check for WorkStack should be added here:
if (WorkStack.empty()) {
Lock.unlock();
continue;
}
Otherwise, WorkStack.back() might be called on empty WorkStack in case spurious wakeup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the code is fine. The condition_variable should prevent spurious wakeup.
Also, unique_lock::~unique_lock
releases ownership if owned.
@@ -99,7 +99,12 @@ TEST(Parallel, TaskGroupSequentialFor) { | |||
{ | |||
parallel::TaskGroup tg; | |||
for (size_t Idx = 0; Idx < 500; Idx++) | |||
tg.spawn([&Count, Idx]() { EXPECT_EQ(Count++, Idx); }, true); | |||
tg.spawn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This TaskGroupSequentialFor test is not necessary at all as "sequential" is removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was not clear enough. I meant that this patch should delete TaskGroupSequentialFor test from unittests/Support/ParallelTest.cpp. As it was added for sequential mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx. I just realized that build failure of the unittest. Deleted:)
There is a compilation issue with the unit tests, but otherwise seems good. |
if (serial) | ||
scanEH(); | ||
else | ||
tg.spawn(scanEH); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to run scanEH(); on main thread here? If it uses threadIndex then it will receive UINT_MAX which is unwanted. On the another hand scanEH contains whole loop and then all scanSection would be done in deterministic order. It looks like we can replace whole "if(serial)" with single "tg.spawn(scanEH);" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to run scanEH
in a worker thread to avoid UINT_MAX
threadIndex
.
The if (serial) scanEH(); else tg.spawn(scanEH);
pattern is similar to if (serial) fn(); else tg.spawn(fn);
.
scanEH
scans .eh_frame
in all input files. .eh_frame
relocations are smaller than other sections, so the previous code scans all .eh_frame
in one task.
Created using spr 1.3.5-bogner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
https://reviews.llvm.org/D148728 introduced `bool Sequential` to unify `execute` and the old `spawn` without argument. However, sequential tasks might be executed by any worker thread (non-deterministic), leading to non-determinism output for ld.lld -z nocombreloc (see https://reviews.llvm.org/D133003). In addition, the extra member variables have overhead. This sequential task has only been used for lld parallel relocation scanning. This patch restores the behavior before https://reviews.llvm.org/D148728 . Fix #105958 Pull Request: #109084
[Parallel] Revert sequential task changes https://reviews.llvm.org/D148728 introduced `bool Sequential` to unify `execute` and the old `spawn` without argument. However, sequential tasks might be executed by any worker thread (non-deterministic), leading to non-determinism output for ld.lld -z nocombreloc (see https://reviews.llvm.org/D133003). In addition, the extra member variables have overhead. This sequential task has only been used for lld parallel relocation scanning. This patch restores the behavior before https://reviews.llvm.org/D148728 . Fix #105958 Pull Request: llvm/llvm-project#109084 This fixes the non-reproducibility we had noticed when linking our EFI loaders, and for which we committed a workaround in f5ce3f4. MFC after: 3 days
Merged in b84d773 |
[Parallel] Revert sequential task changes https://reviews.llvm.org/D148728 introduced `bool Sequential` to unify `execute` and the old `spawn` without argument. However, sequential tasks might be executed by any worker thread (non-deterministic), leading to non-determinism output for ld.lld -z nocombreloc (see https://reviews.llvm.org/D133003). In addition, the extra member variables have overhead. This sequential task has only been used for lld parallel relocation scanning. This patch restores the behavior before https://reviews.llvm.org/D148728 . Fix #105958 Pull Request: llvm/llvm-project#109084 This fixes the non-reproducibility we had noticed when linking our EFI loaders, and for which we committed a workaround in f5ce3f4. MFC after: 3 days (cherry picked from commit 54521a2)
[Parallel] Revert sequential task changes https://reviews.llvm.org/D148728 introduced `bool Sequential` to unify `execute` and the old `spawn` without argument. However, sequential tasks might be executed by any worker thread (non-deterministic), leading to non-determinism output for ld.lld -z nocombreloc (see https://reviews.llvm.org/D133003). In addition, the extra member variables have overhead. This sequential task has only been used for lld parallel relocation scanning. This patch restores the behavior before https://reviews.llvm.org/D148728 . Fix #105958 Pull Request: llvm/llvm-project#109084 This fixes the non-reproducibility we had noticed when linking our EFI loaders, and for which we committed a workaround in f5ce3f4. MFC after: 3 days (cherry picked from commit 54521a2)
[Parallel] Revert sequential task changes https://reviews.llvm.org/D148728 introduced `bool Sequential` to unify `execute` and the old `spawn` without argument. However, sequential tasks might be executed by any worker thread (non-deterministic), leading to non-determinism output for ld.lld -z nocombreloc (see https://reviews.llvm.org/D133003). In addition, the extra member variables have overhead. This sequential task has only been used for lld parallel relocation scanning. This patch restores the behavior before https://reviews.llvm.org/D148728 . Fix #105958 Pull Request: llvm/llvm-project#109084 This fixes the non-reproducibility we had noticed when linking our EFI loaders, and for which we committed a workaround in f5ce3f4. MFC after: 3 days
https://reviews.llvm.org/D148728 introduced
bool Sequential
to unifyexecute
and the oldspawn
without argument. However, sequentialtasks might be executed by any worker thread (non-deterministic),
leading to non-determinism output for ld.lld -z nocombreloc (see
https://reviews.llvm.org/D133003).
In addition, the extra member variables have overhead.
This sequential task has only been used for lld parallel relocation
scanning.
This patch restores the behavior before https://reviews.llvm.org/D148728 .
Fix #105958