
[mlir] DistinctAttributeAllocator: fix race #132935


Closed · wants to merge 1 commit

Conversation

cota
Contributor

@cota cota commented Mar 25, 2025

The bitfields added in #128566 (threadingIsEnabled : 1 and useThreadLocalAllocator : 1) are accessed without synchronization. Example tsan report:

WARNING: ThreadSanitizer: data race (pid=337)
  Write of size 1 at 0x7260001d0ff0 by thread T224:
    #0 disableThreadLocalStorage third_party/llvm/llvm-project/mlir/lib/IR/AttributeDetail.h:434:29
    #1 mlir::MLIRContext::disableThreadLocalStorage(bool) third_party/llvm/llvm-project/mlir/lib/IR/MLIRContext.cpp:723:40
    #2 mlir::PassManager::runWithCrashRecovery(mlir::Operation*, mlir::AnalysisManager) third_party/llvm/llvm-project/mlir/lib/Pass/PassCrashRecovery.cpp:423:8
[...]
Previous write of size 1 at 0x7260001d0ff0 by thread T222:
    #0 disableThreadLocalStorage third_party/llvm/llvm-project/mlir/lib/IR/AttributeDetail.h:434:29
    #1 mlir::MLIRContext::disableThreadLocalStorage(bool) third_party/llvm/llvm-project/mlir/lib/IR/MLIRContext.cpp:723:40
    #2 mlir::PassManager::runWithCrashRecovery(mlir::Operation*, mlir::AnalysisManager) third_party/llvm/llvm-project/mlir/lib/Pass/PassCrashRecovery.cpp:423:8

Fix the race by serializing accesses to these fields with the existing allocatorMutex.

@llvmbot llvmbot added the mlir:core (MLIR Core Infrastructure) and mlir labels Mar 25, 2025
@llvmbot
Member

llvmbot commented Mar 25, 2025

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-core

Author: Emilio Cota (cota)


Full diff: https://github.com/llvm/llvm-project/pull/132935.diff

1 File Affected:

  • (modified) mlir/lib/IR/AttributeDetail.h (+12-6)
diff --git a/mlir/lib/IR/AttributeDetail.h b/mlir/lib/IR/AttributeDetail.h
index 8fed18140433c..08ab3c0114265 100644
--- a/mlir/lib/IR/AttributeDetail.h
+++ b/mlir/lib/IR/AttributeDetail.h
@@ -413,16 +413,17 @@ class DistinctAttributeAllocator {
   /// Allocates a distinct attribute storage using a thread local bump pointer
   /// allocator to enable synchronization free parallel allocations.
   DistinctAttrStorage *allocate(Attribute referencedAttr) {
+    std::unique_lock<std::mutex> lock(allocatorMutex);
     if (!useThreadLocalAllocator && threadingIsEnabled) {
-      std::scoped_lock<std::mutex> lock(allocatorMutex);
-      return allocateImpl(referencedAttr);
+      return allocateImpl(referencedAttr, lock);
     }
-    return allocateImpl(referencedAttr);
+    return allocateImpl(referencedAttr, lock);
   }
 
   /// Sets a flag that stores if multithreading is enabled. The flag is used to
   /// decide if locking is needed when using a non thread-safe allocator.
   void disableMultiThreading(bool disable = true) {
+    std::scoped_lock<std::mutex> lock(allocatorMutex);
     threadingIsEnabled = !disable;
   }
 
@@ -431,12 +432,15 @@ class DistinctAttributeAllocator {
   /// beyond the lifetime of a child thread calling this function while ensuring
   /// thread-safe allocation.
   void disableThreadLocalStorage(bool disable = true) {
+    std::scoped_lock<std::mutex> lock(allocatorMutex);
     useThreadLocalAllocator = !disable;
   }
 
 private:
-  DistinctAttrStorage *allocateImpl(Attribute referencedAttr) {
-    return new (getAllocatorInUse().Allocate<DistinctAttrStorage>())
+  DistinctAttrStorage *allocateImpl(Attribute referencedAttr,
+                                    const std::unique_lock<std::mutex>& lock) {
+    assert(lock.owns_lock());
+    return new (getAllocatorInUse(lock).Allocate<DistinctAttrStorage>())
         DistinctAttrStorage(referencedAttr);
   }
 
@@ -444,7 +448,9 @@ class DistinctAttributeAllocator {
   /// thread-local, non-thread safe bump pointer allocator is used instead to
   /// prevent use-after-free errors whenever attribute storage created on a
   /// crash recover thread is accessed after the thread joins.
-  llvm::BumpPtrAllocator &getAllocatorInUse() {
+  llvm::BumpPtrAllocator &getAllocatorInUse(
+      const std::unique_lock<std::mutex>& lock) {
+    assert(lock.owns_lock());
     if (useThreadLocalAllocator)
       return allocatorCache.get();
     return allocator;


github-actions bot commented Mar 25, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@cota cota requested review from gysit and River707 March 25, 2025 13:51
@cota
Contributor Author

cota commented Mar 25, 2025

@abulavin I cannot add you as a reviewer but please take a look. Thanks.

github-actions bot

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff b24694371c62a71aab3550e983a6bf971ed721ff 7fbb72b897f5d3402cc22dd3c6edf56b46bf3402 --extensions h -- mlir/lib/IR/AttributeDetail.h
View the diff from clang-format here.
diff --git a/mlir/lib/IR/AttributeDetail.h b/mlir/lib/IR/AttributeDetail.h
index 08ab3c0114..6a481aad91 100644
--- a/mlir/lib/IR/AttributeDetail.h
+++ b/mlir/lib/IR/AttributeDetail.h
@@ -438,7 +438,7 @@ public:
 
 private:
   DistinctAttrStorage *allocateImpl(Attribute referencedAttr,
-                                    const std::unique_lock<std::mutex>& lock) {
+                                    const std::unique_lock<std::mutex> &lock) {
     assert(lock.owns_lock());
     return new (getAllocatorInUse(lock).Allocate<DistinctAttrStorage>())
         DistinctAttrStorage(referencedAttr);
@@ -448,8 +448,8 @@ private:
   /// thread-local, non-thread safe bump pointer allocator is used instead to
   /// prevent use-after-free errors whenever attribute storage created on a
   /// crash recover thread is accessed after the thread joins.
-  llvm::BumpPtrAllocator &getAllocatorInUse(
-      const std::unique_lock<std::mutex>& lock) {
+  llvm::BumpPtrAllocator &
+  getAllocatorInUse(const std::unique_lock<std::mutex> &lock) {
     assert(lock.owns_lock());
     if (useThreadLocalAllocator)
       return allocatorCache.get();

Contributor

@gysit gysit left a comment


LGTM

I am fine if this fixes the immediate problem. However, as you wrote on the other PR, the solution defeats the purpose of avoiding lock contention.

When reviewing I was under the impression that we can check these flags safely since they are only set in single-threaded code (for enabling/disabling multi-threading this is probably the case). But it seems like disableThreadLocalStorage is called from multi-threaded code? Would introducing a lock in disableThreadLocalStorage avoid the problem as well?

@cota
Contributor Author

cota commented Mar 25, 2025

Would introducing a lock in disableThreadLocalStorage avoid the problem as well?

That lock would protect useThreadLocalAllocator and we'd have to acquire the lock on all the other accesses to it. It's easier to reuse the existing lock.

When reviewing I was under the impression that we can check these flags safely since they are only set in single-threaded code

DistinctAttributeAllocator is a member of MLIRContext and therefore can be accessed by different threads when threading is enabled. We should either (1) make the bitfields in DistinctAttributeAllocator immutable members, so they are read-only and require no atomics, or (2) properly handle multi-threaded accesses if we don't want the bitfields to be immutable. I think @abulavin should make the call since they have all the context.
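
For illustration only, here is a minimal sketch of what option (2) could look like if the two bitfields were turned into atomics; the class name and the needsLock() helper are hypothetical, and the allocator logic is omitted:

```cpp
#include <atomic>

// Hypothetical sketch: replace the racy bitfields with std::atomic<bool> so
// that concurrent reads and writes are well-defined without allocatorMutex.
class DistinctAttributeAllocatorFlags {
public:
  void disableMultiThreading(bool disable = true) {
    threadingIsEnabled.store(!disable, std::memory_order_release);
  }
  void disableThreadLocalStorage(bool disable = true) {
    useThreadLocalAllocator.store(!disable, std::memory_order_release);
  }
  // Mirrors the check in allocate(): lock only when the shared allocator is
  // used while multi-threading is enabled.
  bool needsLock() const {
    return !useThreadLocalAllocator.load(std::memory_order_acquire) &&
           threadingIsEnabled.load(std::memory_order_acquire);
  }

private:
  std::atomic<bool> threadingIsEnabled{true};
  std::atomic<bool> useThreadLocalAllocator{true};
};
```

This would silence the tsan report without the extra locking, but, as the discussion below shows, it does not by itself address the interleaving of concurrent enable/disable calls.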

@gysit
Contributor

gysit commented Mar 25, 2025

But it seems like disableThreadLocalStorage is called from multi-threaded code? Would introducing a lock in disableThreadLocalStorage avoid the problem as well?

Note to self: locking probably does not help, since PassManager::runWithCrashRecovery executes in parallel and we want thread-local storage to stay disabled as long as at least one pass manager with crash recovery is running.

I guess that, given this additional complication, it may make sense to fall back to a bump pointer allocator & locking.
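
For reference, a rough sketch of that fallback, assuming a single shared allocator guarded by a mutex (the class and method names here are illustrative, not the PR's code):

```cpp
#include "llvm/Support/Allocator.h"
#include <mutex>
#include <utility>

// Illustrative fallback: one shared BumpPtrAllocator with every allocation
// serialized through a mutex. Simple and correct, at the cost of contention.
class SharedDistinctAttrAllocator {
public:
  template <typename StorageT, typename... Args>
  StorageT *allocate(Args &&...args) {
    std::scoped_lock<std::mutex> guard(allocatorMutex);
    return new (allocator.Allocate<StorageT>())
        StorageT(std::forward<Args>(args)...);
  }

private:
  std::mutex allocatorMutex;
  llvm::BumpPtrAllocator allocator;
};
```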

@joker-eph
Collaborator

Note to self: locking probably does not help, since PassManager::runWithCrashRecovery executes in parallel and we want thread-local storage to stay disabled as long as at least one pass manager with crash recovery is running.

We can't disable multi-threading while a pass manager is running, I believe.

@gysit
Contributor

gysit commented Mar 25, 2025

We can't disable multi-threading while a pass manager is running, I believe.

Right, maybe an enable-crash-recovery flag that is set together with the multi-threading flag could be used to decide which allocator to use.
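
For illustration, a hypothetical sketch of that idea; the names are assumptions, and both flags would only be toggled while no other threads are using the context:

```cpp
// Hypothetical: both flags are mutated only in single-threaded phases, so
// plain bools suffice and later reads need no synchronization.
class AllocatorSelection {
public:
  void setMultiThreading(bool enabled) { threadingIsEnabled = enabled; }
  void setCrashRecovery(bool enabled) { crashRecoveryEnabled = enabled; }

  // Avoid the thread-local allocator whenever crash recovery is active, so
  // storage is not destroyed when a crash-recovery thread joins.
  bool useThreadLocalAllocator() const { return !crashRecoveryEnabled; }

  // Lock only when the shared allocator is used with multi-threading on.
  bool needsLock() const { return crashRecoveryEnabled && threadingIsEnabled; }

private:
  bool threadingIsEnabled = true;
  bool crashRecoveryEnabled = false;
};
```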

@joker-eph
Collaborator

I suspect there is a more fundamental problem here.
If PassManager::runWithCrashRecovery() is executed by multiple threads, you can get the following sequence:

ctx->disableThreadLocalStorage(); // T1
ctx->disableThreadLocalStorage(); // T2
ctx->enableThreadLocalStorage(); // T1

At this point T2 can execute while thread-local storage is enabled again.
If we need to lock, I suspect the scope has to cover the entire duration of PassManager::runWithCrashRecovery(); or we could use a semaphore mechanism (increment an atomic integer instead of using a boolean).

Assuming I am correct with this, I would instead revert the original PR.
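
A minimal sketch of the counting variant mentioned above, assuming a hypothetical guard class rather than MLIR's actual API:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical: each runWithCrashRecovery() caller disables thread-local
// storage on entry and re-enables it on exit; it stays disabled until the
// last concurrent caller has finished.
class ThreadLocalStorageGate {
public:
  void disableThreadLocalStorage() { ++disableCount; }
  void enableThreadLocalStorage() {
    assert(disableCount.load() > 0 && "unbalanced enable");
    --disableCount;
  }
  // Thread-local allocation is safe only when no crash-recovery run is live.
  bool useThreadLocalAllocator() const { return disableCount.load() == 0; }

private:
  std::atomic<int> disableCount{0};
};
```

With such a counter, the T1/T2 interleaving above leaves thread-local storage disabled until both threads have called enableThreadLocalStorage().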

@gysit
Contributor

gysit commented Mar 25, 2025

At this point T2 can execute while thread-local storage is enabled again.
If we need to lock, I suspect the scope has to cover the entire duration of PassManager::runWithCrashRecovery(); or we could use a semaphore mechanism (increment an atomic integer instead of using a boolean).

I believe the example is indeed correct and a more complex fix is needed :(. In that case reverting may indeed be the better option. The downside is that it reintroduces the bug we had before.

@cota
Contributor Author

cota commented Mar 25, 2025

I also prefer the revert. Will revert #128566 now. Thanks everyone.

cota added a commit that referenced this pull request Mar 25, 2025
…e when crash reproduction is enabled" (#133000)

Reverts #128566. See as well the discussion in
#132935.
@cota cota closed this Mar 25, 2025
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 25, 2025
…bute storage when crash reproduction is enabled" (#133000)

Reverts llvm/llvm-project#128566. See as well the discussion in
llvm/llvm-project#132935.