[Parallel] Revert sequential task changes #109084

Conversation

MaskRay
Member

@MaskRay MaskRay commented Sep 18, 2024

https://reviews.llvm.org/D148728 introduced bool Sequential to unify
execute and the old spawn overload without an argument. However, sequential
tasks might be executed by any worker thread (non-deterministically),
leading to non-deterministic output for ld.lld -z nocombreloc (see
https://reviews.llvm.org/D133003).

In addition, the extra member variables have overhead.
Sequential tasks have only been used by lld's parallel relocation
scanning.

This patch restores the behavior before https://reviews.llvm.org/D148728 .

Fix #105958

Created using spr 1.3.5-bogner
@llvmbot
Member

llvmbot commented Sep 18, 2024

@llvm/pr-subscribers-lld-elf
@llvm/pr-subscribers-lld

@llvm/pr-subscribers-llvm-support

Author: Fangrui Song (MaskRay)

Changes

https://reviews.llvm.org/D148728 introduced bool Sequential to unify
execute and the old spawn overload without an argument. However, sequential
tasks might be executed by any worker thread (non-deterministically),
leading to non-deterministic output for ld.lld -z nocombreloc.

In addition, the extra member variables have overhead.

This patch restores the behavior before https://reviews.llvm.org/D148728
, but temporarily sets threadIndex to 0 when the main thread is
executing the sequential task. In addition, it uses a std::vector-based
WorkStack instead of the std::deque work queues.

Fix #105958


Full diff: https://github.com/llvm/llvm-project/pull/109084.diff

1 Files Affected:

  • (modified) llvm/lib/Support/Parallel.cpp (+18-36)
diff --git a/llvm/lib/Support/Parallel.cpp b/llvm/lib/Support/Parallel.cpp
index a3ef3d9c621b98..4aec5b14ebb146 100644
--- a/llvm/lib/Support/Parallel.cpp
+++ b/llvm/lib/Support/Parallel.cpp
@@ -12,7 +12,6 @@
 #include "llvm/Support/Threading.h"
 
 #include <atomic>
-#include <deque>
 #include <future>
 #include <thread>
 #include <vector>
@@ -39,7 +38,7 @@ namespace {
 class Executor {
 public:
   virtual ~Executor() = default;
-  virtual void add(std::function<void()> func, bool Sequential = false) = 0;
+  virtual void add(std::function<void()> func) = 0;
   virtual size_t getThreadCount() const = 0;
 
   static Executor *getDefaultExecutor();
@@ -98,13 +97,10 @@ class ThreadPoolExecutor : public Executor {
     static void call(void *Ptr) { ((ThreadPoolExecutor *)Ptr)->stop(); }
   };
 
-  void add(std::function<void()> F, bool Sequential = false) override {
+  void add(std::function<void()> F) override {
     {
       std::lock_guard<std::mutex> Lock(Mutex);
-      if (Sequential)
-        WorkQueueSequential.emplace_front(std::move(F));
-      else
-        WorkQueue.emplace_back(std::move(F));
+      WorkStack.push_back(std::move(F));
     }
     Cond.notify_one();
   }
@@ -112,42 +108,23 @@ class ThreadPoolExecutor : public Executor {
   size_t getThreadCount() const override { return ThreadCount; }
 
 private:
-  bool hasSequentialTasks() const {
-    return !WorkQueueSequential.empty() && !SequentialQueueIsLocked;
-  }
-
-  bool hasGeneralTasks() const { return !WorkQueue.empty(); }
-
   void work(ThreadPoolStrategy S, unsigned ThreadID) {
     threadIndex = ThreadID;
     S.apply_thread_strategy(ThreadID);
     while (true) {
       std::unique_lock<std::mutex> Lock(Mutex);
-      Cond.wait(Lock, [&] {
-        return Stop || hasGeneralTasks() || hasSequentialTasks();
-      });
+      Cond.wait(Lock, [&] { return Stop || !WorkStack.empty(); });
       if (Stop)
         break;
-      bool Sequential = hasSequentialTasks();
-      if (Sequential)
-        SequentialQueueIsLocked = true;
-      else
-        assert(hasGeneralTasks());
-
-      auto &Queue = Sequential ? WorkQueueSequential : WorkQueue;
-      auto Task = std::move(Queue.back());
-      Queue.pop_back();
+      auto Task = std::move(WorkStack.back());
+      WorkStack.pop_back();
       Lock.unlock();
       Task();
-      if (Sequential)
-        SequentialQueueIsLocked = false;
     }
   }
 
   std::atomic<bool> Stop{false};
-  std::atomic<bool> SequentialQueueIsLocked{false};
-  std::deque<std::function<void()>> WorkQueue;
-  std::deque<std::function<void()>> WorkQueueSequential;
+  std::vector<std::function<void()>> WorkStack;
   std::mutex Mutex;
   std::condition_variable Cond;
   std::promise<void> ThreadsCreated;
@@ -217,13 +194,18 @@ TaskGroup::~TaskGroup() {
 void TaskGroup::spawn(std::function<void()> F, bool Sequential) {
 #if LLVM_ENABLE_THREADS
   if (Parallel) {
+    if (Sequential) {
+      // Act as worker thread 0.
+      threadIndex = 0;
+      F();
+      threadIndex = UINT_MAX;
+      return;
+    }
     L.inc();
-    detail::Executor::getDefaultExecutor()->add(
-        [&, F = std::move(F)] {
-          F();
-          L.dec();
-        },
-        Sequential);
+    detail::Executor::getDefaultExecutor()->add([&, F = std::move(F)] {
+      F();
+      L.dec();
+    });
     return;
   }
 #endif

Created using spr 1.3.5-bogner
@@ -217,13 +194,18 @@ TaskGroup::~TaskGroup() {
 void TaskGroup::spawn(std::function<void()> F, bool Sequential) {
 #if LLVM_ENABLE_THREADS
   if (Parallel) {
+    if (Sequential) {
+      // Act as worker thread 0.
+      threadIndex = 0;
Member

Since this sequential behavior was only added for (and seems to be used only by) ELF relocation scanning, can we not move this code there?

Member Author

Could you elaborate on the expected code here? threadIndex is there to satisfy lld's ELF relocation scanning requirement; it does need threadIndex == 0. The initial threadIndex value of UINT_MAX is for the assertion added in https://reviews.llvm.org/D148916
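Roughly, the pattern that assertion protects looks like the standalone sketch below (Shard and recordReloc are illustrative placeholders, not lld's actual code): per-thread state is indexed by getThreadIndex(), so the UINT_MAX "not a worker" sentinel must never reach the indexing.

    #include <cassert>
    #include <climits>
    #include <vector>

    // Mirrors the thread_local sentinel in Parallel.cpp: UINT_MAX means
    // "this thread is not a TaskGroup worker".
    static thread_local unsigned threadIndex = UINT_MAX;

    unsigned getThreadIndex() {
      assert(threadIndex != UINT_MAX && "not called from a worker thread");
      return threadIndex;
    }

    struct Shard {
      std::vector<int> relocs; // stand-in for per-thread relocation state
    };

    // Each worker writes only to its own shard, so no locking is needed --
    // provided every concurrently running task has a distinct, valid index.
    void recordReloc(std::vector<Shard> &shards, int reloc) {
      shards[getThreadIndex()].relocs.push_back(reloc);
    }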

Collaborator

IIUC, if a user were to mix sequential and parallel tasks within the same TaskGroup, this could potentially lead to unintentional concurrent access to thread-index-0 resources.

Member

@MaskRay I meant that this whole Sequential flag could be avoided by including the for... loop in the task at the call site (in elf::scanRelocations) when serial == true, no? That way you get a valid threadIndex and the items are executed sequentially.

  bool serial = ...
  parallel::TaskGroup tg;
  auto outerFn = [&]() {
    for (ELFFileBase *f : ctx.objectFiles) {
      auto fn = [f]() {
        RelocationScanner scanner;
        for (InputSectionBase *s : f->getSections()) {
          if (s && s->kind() == SectionBase::Regular && s->isLive() &&
              (s->flags & SHF_ALLOC) &&
              !(s->type == SHT_ARM_EXIDX && config->emachine == EM_ARM))
            scanner.template scanSection<ELFT>(*s);
        }
      };
      if (serial)
        fn();
      else
        tg.spawn(fn);

    }
  };
  if (serial)
    tg.spawn(outerFn);
  else
    outerFn();

Member

Thus we don't need the Sequential flag here in Parallel.cpp

Member Author

This patch is almost a revert of https://reviews.llvm.org/D148728 .

Since the use case is so special, I'm fine with the style suggested by @aganea (also the initial style I used for parallel relocation scanning):

      if (serial)
        fn();
      else
        tg.spawn(fn);

    }
  };
  if (serial)
    tg.spawn(outerFn);
  else
    outerFn();

Perhaps Parallel.cpp could add setThreadIndex and lld/ELF/Relocations.cpp could call setThreadIndex(0) to make the pattern more explicit. It's up to lld to ensure correctness.
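A hypothetical sketch of that alternative, continuing the standalone model in the earlier sketch (setThreadIndex is only being proposed here; it is not part of Parallel.h):

    // Hypothetical helper; the caller (lld) would be responsible for picking
    // an index that cannot clash with a live worker thread.
    void setThreadIndex(unsigned Index) { threadIndex = Index; }

    // Hypothetical use at the start of serial relocation scanning:
    //   setThreadIndex(0);
    //   ... run the whole scan loop on this single thread ...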

Collaborator

if this code were left in the patch:

    if (Sequential) {
      // Act as worker thread 0.
      threadIndex = 0;   <<<<<<<<<<<<<<<<<<<

then threadIndex = 0 could clash with another thread that also has threadIndex = 0, which might result in concurrent access to the resources for index 0.
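A hypothetical interleaving illustrating the concern (reusing the standalone Shard sketch from above, not actual lld code): two threads that both believe they are index 0 become unsynchronized writers to the same slot.

    #include <vector>

    struct Shard { std::vector<int> relocs; };

    // Real worker thread 0 (threadIndex == 0 assigned by the executor):
    void workerZero(std::vector<Shard> &shards) { shards[0].relocs.push_back(1); }

    // Sequential task that forced threadIndex = 0 while worker 0 may still run:
    void pinnedSequentialTask(std::vector<Shard> &shards) {
      shards[0].relocs.push_back(2); // unsynchronized with workerZero: a data race
    }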

Member

@avl-llvm Yes, I think the goal would be to remove this chunk of code (everything related to Sequential)

Collaborator
@avl-llvm avl-llvm Sep 19, 2024

I see, thanks. So removing Sequential mode and including the for... loop in the task at the call site in elf::scanRelocations would be OK, then.

Member Author

Thanks. I just realized the flipped condition :) Adopted the style and kept the threadIndex != UINT_MAX assertion.

Created using spr 1.3.5-bogner
@MaskRay MaskRay changed the title from "[Parallel] Ensure getThreadIndex()==0 for sequential tasks" to "[Parallel] Revert sequential task changes" on Sep 19, 2024
-      auto &Queue = Sequential ? WorkQueueSequential : WorkQueue;
-      auto Task = std::move(Queue.back());
-      Queue.pop_back();
+      auto Task = std::move(WorkStack.back());
Collaborator

This was missed in the original code, but it seems that a check for an empty WorkStack should be added here:

  if (WorkStack.empty()) {
    Lock.unlock();
    continue;
  }

Otherwise, WorkStack.back() might be called on an empty WorkStack in case of a spurious wakeup?

Member Author
@MaskRay MaskRay Sep 20, 2024

I think the code is fine. The predicate overload of Cond.wait re-checks the condition, so a spurious wakeup cannot slip past it.

Also, unique_lock::~unique_lock releases ownership if owned.
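For reference, the predicate overload is specified to behave like the loop in this small standalone sketch, which is why a spurious wakeup cannot reach WorkStack.back() with an empty stack:

    #include <condition_variable>
    #include <mutex>

    // Equivalent of CV.wait(Lock, P) per the C++ standard: a spurious wakeup
    // simply re-enters the loop and re-checks the predicate before returning.
    template <class Pred>
    void waitWithPredicate(std::condition_variable &CV,
                           std::unique_lock<std::mutex> &Lock, Pred P) {
      while (!P())
        CV.wait(Lock);
    }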

@@ -99,7 +99,12 @@ TEST(Parallel, TaskGroupSequentialFor) {
   {
     parallel::TaskGroup tg;
     for (size_t Idx = 0; Idx < 500; Idx++)
-      tg.spawn([&Count, Idx]() { EXPECT_EQ(Count++, Idx); }, true);
+      tg.spawn(
Collaborator

This TaskGroupSequentialFor test is no longer necessary, as "sequential" is removed.

Member Author

Deleted

Collaborator
@avl-llvm avl-llvm Sep 20, 2024

Sorry, I was not clear enough. I meant that this patch should delete the TaskGroupSequentialFor test from unittests/Support/ParallelTest.cpp, as it was added for the sequential mode.

Member Author

Thx. I just noticed the unittest build failure. Deleted :)

Created using spr 1.3.5-bogner
@aganea
Member

aganea commented Sep 20, 2024

There is a compilation issue with the unit tests, but otherwise this seems good.

  if (serial)
    scanEH();
  else
    tg.spawn(scanEH);
Collaborator

Do we need to run scanEH() on the main thread here? If it uses threadIndex then it will receive UINT_MAX, which is unwanted. On the other hand, scanEH contains the whole loop, so all scanSection calls would be done in a deterministic order. It looks like we can replace the whole "if (serial)" with a single "tg.spawn(scanEH);"?

Member Author

We need to run scanEH in a worker thread to avoid the UINT_MAX threadIndex.

The `if (serial) scanEH(); else tg.spawn(scanEH);` pattern is similar to `if (serial) fn(); else tg.spawn(fn);`.

scanEH scans .eh_frame in all input files. The .eh_frame relocation workload is smaller than that of other sections, so the previous code scans all .eh_frame sections in one task.
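A compressed, self-contained sketch of the adopted call-site shape (scanAll and items are placeholders, not lld's actual code; llvm::parallel::TaskGroup and spawn are the real API): in the serial case the whole loop becomes one spawned task, so it runs on a single worker thread with a valid threadIndex and in a deterministic order.

    #include "llvm/Support/Parallel.h"
    #include <vector>

    void scanAll(const std::vector<int> &items, bool serial) {
      llvm::parallel::TaskGroup tg;
      auto outerFn = [&] {
        for (int item : items) {
          auto fn = [item] { /* scan one item */ (void)item; };
          if (serial)
            fn();          // run in-order inside the single spawned task
          else
            tg.spawn(fn);  // one task per item, any worker, any order
        }
      };
      if (serial)
        tg.spawn(outerFn); // the whole loop is one task on one worker thread
      else
        outerFn();         // spawn per-item tasks from the current thread
      // The TaskGroup destructor waits for all spawned tasks to finish.
    }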

Created using spr 1.3.5-bogner
Collaborator
@avl-llvm avl-llvm left a comment

LGTM

MaskRay added a commit that referenced this pull request Sep 21, 2024
https://reviews.llvm.org/D148728 introduced `bool Sequential` to unify
`execute` and the old `spawn` without argument. However, sequential
tasks might be executed by any worker thread (non-deterministic),
leading to non-deterministic output for ld.lld -z nocombreloc (see
https://reviews.llvm.org/D133003).

In addition, the extra member variables have overhead.
This sequential task has only been used for lld parallel relocation
scanning.

This patch restores the behavior before https://reviews.llvm.org/D148728 .

Fix #105958

Pull Request: #109084
freebsd-git pushed a commit to freebsd/freebsd-src that referenced this pull request Sep 22, 2024
  [Parallel] Revert sequential task changes

  Pull Request: llvm/llvm-project#109084

This fixes the non-reproducibility we had noticed when linking our EFI
loaders, and for which we committed a workaround in f5ce3f4.

MFC after:	3 days
@MaskRay
Member Author

MaskRay commented Sep 23, 2024

Merged in b84d773

@MaskRay MaskRay closed this Sep 23, 2024
@MaskRay MaskRay deleted the users/MaskRay/spr/parallel-ensure-getthreadindex0-for-sequential-tasks branch September 23, 2024 00:27
freebsd-git pushed a commit to freebsd/freebsd-src that referenced this pull request Oct 8, 2024
  [Parallel] Revert sequential task changes

  Pull Request: llvm/llvm-project#109084

(cherry picked from commit 54521a2)
freebsd-git pushed a commit to freebsd/freebsd-src that referenced this pull request Oct 8, 2024
  [Parallel] Revert sequential task changes

  Pull Request: llvm/llvm-project#109084

(cherry picked from commit 54521a2)
bsdjhb pushed a commit to bsdjhb/cheribsd that referenced this pull request Jan 11, 2025
  [Parallel] Revert sequential task changes

  Pull Request: llvm/llvm-project#109084
Successfully merging this pull request may close these issues.

[lld] Thread-related non-reproducibility when linking FreeBSD EFI loader