Skip to content

[mlir] Fix DistinctAttributeUniquer deleting attribute storage when crash reproduction is enabled #128566

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

abulavin
Copy link
Contributor

@abulavin abulavin commented Feb 24, 2025

Currently, DistinctAttr uses an allocator wrapped in a ThreadLocalCache to manage attribute storage allocations. This ensures all allocations are freed when the allocator is destroyed.

However, this setup can cause use-after-free errors when mlir::PassManager runs its passes on a separate thread as a result of crash reproduction being enabled. Distinct attribute storages are created in the child thread's local storage and freed once the thread joins. Attempting to access these attributes after this can result in segmentation faults, such as during printing or alias analysis.

Example: This invocation of mlir-opt demonstrates the segfault issue due to distinct attributes being created in a child thread and their storage being freed once the thread joins:

mlir-opt --mlir-pass-pipeline-crash-reproducer=. --test-distinct-attrs mlir/test/IR/test-builtin-distinct-attrs.mlir

This pull request changes the distinct attribute allocator to use different allocators depending on whether or not threading is enabled and whether or not the pass manager is running its passes in a separate thread. If multithreading is disabled, a non thread-local allocator is used. If threading remains enabled and the pass manager invokes its pass pipelines in a child thread, then a non-thread local but synchronised allocator is used. This ensures that the lifetime of allocated storage persists beyond the lifetime of the child thread.

I have added two tests for the -test-distinct-attrs pass and the -enable-debug-info-on-llvm-scope passes that run them with crash reproduction enabled.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added mlir:core MLIR Core Infrastructure mlir:llvm mlir labels Feb 24, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 24, 2025

@llvm/pr-subscribers-mlir-llvm

@llvm/pr-subscribers-mlir-core

Author: Artemiy Bulavin (abulavin)

Changes

Currently, DistinctAttr uses a specific allocator wrapped in a ThreadLocalCache to manage attribute storage allocations. This ensures all allocations are freed when the allocator is destroyed.

However, this setup can cause use-after-free errors when mlir::PassManager runs on a separate thread as a result of crash reproduction being enabled. Distinct attribute storages are created in the child thread's local storage and freed once the thread joins. Attempting to access these attributes after this can result in segmentation faults, such as during printing or alias analysis.

Example: This invocation of mlir-opt demonstrates the segfault issue due to distinct attributes being created in a child thread and their storage being freed once the thread joins:

mlir-opt --mlir-disable-threading --mlir-pass-pipeline-crash-reproducer=. --test-distinct-attrs mlir/test/IR/test-builtin-distinct-attrs.mlir

This pull request changes the distinct attribute allocator to use just BumpPtrAllocator when threading is disabled in the MLIRContext. The absence of threading caused by --mlir-disable-threading means ThreadLocalCache is unnecessary. This change also avoids segmentation faults when generating local reproducers with crash reproduction.

I have added tests specifically for the -test-distinct-attrs pass and the enable-debug-info-on-llvm-scope passes that cover the distinct attr case speficifcally and also add a test case for a pass that creates these distinct attributes (the latter is how I discovered this issue.)


Full diff: https://github.com/llvm/llvm-project/pull/128566.diff

4 Files Affected:

  • (modified) mlir/lib/IR/AttributeDetail.h (+20-2)
  • (modified) mlir/lib/IR/MLIRContext.cpp (+1)
  • (added) mlir/test/Dialect/LLVMIR/add-debuginfo-func-scope-with-crash-reproduction.mlir (+19)
  • (added) mlir/test/IR/test-builtin-distinct-attrs-with-crash-reproduction.mlir (+17)
diff --git a/mlir/lib/IR/AttributeDetail.h b/mlir/lib/IR/AttributeDetail.h
index 26d40ac3a38f6..6787dc6d3e698 100644
--- a/mlir/lib/IR/AttributeDetail.h
+++ b/mlir/lib/IR/AttributeDetail.h
@@ -23,6 +23,7 @@
 #include "mlir/Support/ThreadLocalCache.h"
 #include "llvm/ADT/APFloat.h"
 #include "llvm/ADT/PointerIntPair.h"
+#include "llvm/Support/Allocator.h"
 #include "llvm/Support/TrailingObjects.h"
 
 namespace mlir {
@@ -409,14 +410,31 @@ class DistinctAttributeAllocator {
   operator=(const DistinctAttributeAllocator &) = delete;
 
   /// Allocates a distinct attribute storage using a thread local bump pointer
-  /// allocator to enable synchronization free parallel allocations.
+  /// allocator to enable synchronization free parallel allocations. If
+  /// threading is disabled on the owning MLIR context, a normal non
+  /// thread-local bump pointer allocator is used instead to prevent
+  /// use-after-free errors whenever attribute storage created on a crash
+  /// recover thread is accessed after the thread joins.
   DistinctAttrStorage *allocate(Attribute referencedAttr) {
-    return new (allocatorCache.get().Allocate<DistinctAttrStorage>())
+    return new (getAllocatorInUse().Allocate<DistinctAttrStorage>())
         DistinctAttrStorage(referencedAttr);
   }
 
+  void disableMultithreading(bool disable = true) {
+    useThreadLocalCache = !disable;
+  };
+
 private:
+  llvm::BumpPtrAllocator &getAllocatorInUse() {
+    if (useThreadLocalCache) {
+      return allocatorCache.get();
+    }
+    return allocator;
+  }
+
   ThreadLocalCache<llvm::BumpPtrAllocator> allocatorCache;
+  llvm::BumpPtrAllocator allocator;
+  bool useThreadLocalCache : 1;
 };
 } // namespace detail
 } // namespace mlir
diff --git a/mlir/lib/IR/MLIRContext.cpp b/mlir/lib/IR/MLIRContext.cpp
index 87782e84dd6e4..04f2cf8f7b1ec 100644
--- a/mlir/lib/IR/MLIRContext.cpp
+++ b/mlir/lib/IR/MLIRContext.cpp
@@ -596,6 +596,7 @@ void MLIRContext::disableMultithreading(bool disable) {
   // Update the threading mode for each of the uniquers.
   impl->affineUniquer.disableMultithreading(disable);
   impl->attributeUniquer.disableMultithreading(disable);
+  impl->distinctAttributeAllocator.disableMultithreading(disable);
   impl->typeUniquer.disableMultithreading(disable);
 
   // Destroy thread pool (stop all threads) if it is no longer needed, or create
diff --git a/mlir/test/Dialect/LLVMIR/add-debuginfo-func-scope-with-crash-reproduction.mlir b/mlir/test/Dialect/LLVMIR/add-debuginfo-func-scope-with-crash-reproduction.mlir
new file mode 100644
index 0000000000000..5f49df419bccb
--- /dev/null
+++ b/mlir/test/Dialect/LLVMIR/add-debuginfo-func-scope-with-crash-reproduction.mlir
@@ -0,0 +1,19 @@
+// Test that the enable-debug-info-scope-on-llvm-func pass can create its
+// DI attributes when running in the crash reproducer thread,
+
+// RUN: mlir-opt --mlir-disable-threading --mlir-pass-pipeline-crash-reproducer=. \
+// RUN:          --pass-pipeline="builtin.module(ensure-debug-info-scope-on-llvm-func)" \
+// RUN:          --mlir-print-debuginfo %s | FileCheck %s
+
+module {
+  llvm.func @func_no_debug() {
+    llvm.return loc(unknown)
+  } loc(unknown)
+} loc(unknown)
+
+// CHECK-LABEL: llvm.func @func_no_debug()
+// CHECK: llvm.return loc(#loc
+// CHECK: loc(#loc[[LOC:[0-9]+]])
+// CHECK: #di_file = #llvm.di_file<"<unknown>" in "">
+// CHECK: #di_subprogram = #llvm.di_subprogram<id = distinct[{{.*}}]<>, compileUnit = #di_compile_unit, scope = #di_file, name = "func_no_debug", linkageName = "func_no_debug", file = #di_file, line = 1, scopeLine = 1, subprogramFlags = "Definition|Optimized", type = #di_subroutine_type>
+// CHECK: #loc[[LOC]] = loc(fused<#di_subprogram>
diff --git a/mlir/test/IR/test-builtin-distinct-attrs-with-crash-reproduction.mlir b/mlir/test/IR/test-builtin-distinct-attrs-with-crash-reproduction.mlir
new file mode 100644
index 0000000000000..5c3b78bc6e154
--- /dev/null
+++ b/mlir/test/IR/test-builtin-distinct-attrs-with-crash-reproduction.mlir
@@ -0,0 +1,17 @@
+// This is a regression test that verifies that when running with crash
+// reproduction enabled, distinct attribute storage is not allocated in
+// thread-local storage. Since crash reproduction runs the pass manager in a
+// separate thread, using thread-local storage for distinct attributes causes
+// use-after-free errors once the thread that runs the pass manager joins.
+
+// RUN: mlir-opt --mlir-disable-threading --mlir-pass-pipeline-crash-reproducer=. %s -test-distinct-attrs | FileCheck %s
+
+// CHECK: #[[DIST0:.*]] = distinct[0]<42 : i32>
+// CHECK: #[[DIST1:.*]] = distinct[1]<42 : i32>
+#distinct = distinct[0]<42 : i32>
+
+// CHECK: @foo_1
+func.func @foo_1() {
+  // CHECK: "test.op"() {distinct.input = #[[DIST0]], distinct.output = #[[DIST1]]}
+  "test.op"() {distinct.input = #distinct} : () -> ()
+}

@llvmbot
Copy link
Member

llvmbot commented Feb 24, 2025

@llvm/pr-subscribers-mlir

Author: Artemiy Bulavin (abulavin)

Changes

Currently, DistinctAttr uses a specific allocator wrapped in a ThreadLocalCache to manage attribute storage allocations. This ensures all allocations are freed when the allocator is destroyed.

However, this setup can cause use-after-free errors when mlir::PassManager runs on a separate thread as a result of crash reproduction being enabled. Distinct attribute storages are created in the child thread's local storage and freed once the thread joins. Attempting to access these attributes after this can result in segmentation faults, such as during printing or alias analysis.

Example: This invocation of mlir-opt demonstrates the segfault issue due to distinct attributes being created in a child thread and their storage being freed once the thread joins:

mlir-opt --mlir-disable-threading --mlir-pass-pipeline-crash-reproducer=. --test-distinct-attrs mlir/test/IR/test-builtin-distinct-attrs.mlir

This pull request changes the distinct attribute allocator to use just BumpPtrAllocator when threading is disabled in the MLIRContext. The absence of threading caused by --mlir-disable-threading means ThreadLocalCache is unnecessary. This change also avoids segmentation faults when generating local reproducers with crash reproduction.

I have added tests specifically for the -test-distinct-attrs pass and the enable-debug-info-on-llvm-scope passes that cover the distinct attr case speficifcally and also add a test case for a pass that creates these distinct attributes (the latter is how I discovered this issue.)


Full diff: https://github.com/llvm/llvm-project/pull/128566.diff

4 Files Affected:

  • (modified) mlir/lib/IR/AttributeDetail.h (+20-2)
  • (modified) mlir/lib/IR/MLIRContext.cpp (+1)
  • (added) mlir/test/Dialect/LLVMIR/add-debuginfo-func-scope-with-crash-reproduction.mlir (+19)
  • (added) mlir/test/IR/test-builtin-distinct-attrs-with-crash-reproduction.mlir (+17)
diff --git a/mlir/lib/IR/AttributeDetail.h b/mlir/lib/IR/AttributeDetail.h
index 26d40ac3a38f6..6787dc6d3e698 100644
--- a/mlir/lib/IR/AttributeDetail.h
+++ b/mlir/lib/IR/AttributeDetail.h
@@ -23,6 +23,7 @@
 #include "mlir/Support/ThreadLocalCache.h"
 #include "llvm/ADT/APFloat.h"
 #include "llvm/ADT/PointerIntPair.h"
+#include "llvm/Support/Allocator.h"
 #include "llvm/Support/TrailingObjects.h"
 
 namespace mlir {
@@ -409,14 +410,31 @@ class DistinctAttributeAllocator {
   operator=(const DistinctAttributeAllocator &) = delete;
 
   /// Allocates a distinct attribute storage using a thread local bump pointer
-  /// allocator to enable synchronization free parallel allocations.
+  /// allocator to enable synchronization free parallel allocations. If
+  /// threading is disabled on the owning MLIR context, a normal non
+  /// thread-local bump pointer allocator is used instead to prevent
+  /// use-after-free errors whenever attribute storage created on a crash
+  /// recover thread is accessed after the thread joins.
   DistinctAttrStorage *allocate(Attribute referencedAttr) {
-    return new (allocatorCache.get().Allocate<DistinctAttrStorage>())
+    return new (getAllocatorInUse().Allocate<DistinctAttrStorage>())
         DistinctAttrStorage(referencedAttr);
   }
 
+  void disableMultithreading(bool disable = true) {
+    useThreadLocalCache = !disable;
+  };
+
 private:
+  llvm::BumpPtrAllocator &getAllocatorInUse() {
+    if (useThreadLocalCache) {
+      return allocatorCache.get();
+    }
+    return allocator;
+  }
+
   ThreadLocalCache<llvm::BumpPtrAllocator> allocatorCache;
+  llvm::BumpPtrAllocator allocator;
+  bool useThreadLocalCache : 1;
 };
 } // namespace detail
 } // namespace mlir
diff --git a/mlir/lib/IR/MLIRContext.cpp b/mlir/lib/IR/MLIRContext.cpp
index 87782e84dd6e4..04f2cf8f7b1ec 100644
--- a/mlir/lib/IR/MLIRContext.cpp
+++ b/mlir/lib/IR/MLIRContext.cpp
@@ -596,6 +596,7 @@ void MLIRContext::disableMultithreading(bool disable) {
   // Update the threading mode for each of the uniquers.
   impl->affineUniquer.disableMultithreading(disable);
   impl->attributeUniquer.disableMultithreading(disable);
+  impl->distinctAttributeAllocator.disableMultithreading(disable);
   impl->typeUniquer.disableMultithreading(disable);
 
   // Destroy thread pool (stop all threads) if it is no longer needed, or create
diff --git a/mlir/test/Dialect/LLVMIR/add-debuginfo-func-scope-with-crash-reproduction.mlir b/mlir/test/Dialect/LLVMIR/add-debuginfo-func-scope-with-crash-reproduction.mlir
new file mode 100644
index 0000000000000..5f49df419bccb
--- /dev/null
+++ b/mlir/test/Dialect/LLVMIR/add-debuginfo-func-scope-with-crash-reproduction.mlir
@@ -0,0 +1,19 @@
+// Test that the enable-debug-info-scope-on-llvm-func pass can create its
+// DI attributes when running in the crash reproducer thread,
+
+// RUN: mlir-opt --mlir-disable-threading --mlir-pass-pipeline-crash-reproducer=. \
+// RUN:          --pass-pipeline="builtin.module(ensure-debug-info-scope-on-llvm-func)" \
+// RUN:          --mlir-print-debuginfo %s | FileCheck %s
+
+module {
+  llvm.func @func_no_debug() {
+    llvm.return loc(unknown)
+  } loc(unknown)
+} loc(unknown)
+
+// CHECK-LABEL: llvm.func @func_no_debug()
+// CHECK: llvm.return loc(#loc
+// CHECK: loc(#loc[[LOC:[0-9]+]])
+// CHECK: #di_file = #llvm.di_file<"<unknown>" in "">
+// CHECK: #di_subprogram = #llvm.di_subprogram<id = distinct[{{.*}}]<>, compileUnit = #di_compile_unit, scope = #di_file, name = "func_no_debug", linkageName = "func_no_debug", file = #di_file, line = 1, scopeLine = 1, subprogramFlags = "Definition|Optimized", type = #di_subroutine_type>
+// CHECK: #loc[[LOC]] = loc(fused<#di_subprogram>
diff --git a/mlir/test/IR/test-builtin-distinct-attrs-with-crash-reproduction.mlir b/mlir/test/IR/test-builtin-distinct-attrs-with-crash-reproduction.mlir
new file mode 100644
index 0000000000000..5c3b78bc6e154
--- /dev/null
+++ b/mlir/test/IR/test-builtin-distinct-attrs-with-crash-reproduction.mlir
@@ -0,0 +1,17 @@
+// This is a regression test that verifies that when running with crash
+// reproduction enabled, distinct attribute storage is not allocated in
+// thread-local storage. Since crash reproduction runs the pass manager in a
+// separate thread, using thread-local storage for distinct attributes causes
+// use-after-free errors once the thread that runs the pass manager joins.
+
+// RUN: mlir-opt --mlir-disable-threading --mlir-pass-pipeline-crash-reproducer=. %s -test-distinct-attrs | FileCheck %s
+
+// CHECK: #[[DIST0:.*]] = distinct[0]<42 : i32>
+// CHECK: #[[DIST1:.*]] = distinct[1]<42 : i32>
+#distinct = distinct[0]<42 : i32>
+
+// CHECK: @foo_1
+func.func @foo_1() {
+  // CHECK: "test.op"() {distinct.input = #[[DIST0]], distinct.output = #[[DIST1]]}
+  "test.op"() {distinct.input = #distinct} : () -> ()
+}

Copy link
Contributor

@gysit gysit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this issue! I added some nit comments.

I did not fully understand why there is a thread join when running with the crash reproducer. Isn't multi-threading disabled from the beginning in this case? I would have assumed the ThreadLocalCache should still work and only free the allocated attributes once the main thread terminates.

In any case, the change makes sense since there is no need for thread local allocators if multi-threading is disabled.

@gysit gysit requested a review from River707 February 25, 2025 10:40
@joker-eph
Copy link
Collaborator

Trying to see if I understand why it's not a problem when multi-threading is enabled: is this because we never joins thread for the lifetime of the MLIRContext?

Also, you're fixing the issue for -mlir-pass-pipeline-crash-reproducer when multi-threading is disabled, but what about the combination of -mlir-pass-pipeline-crash-reproducer and multi-threading?

@abulavin
Copy link
Contributor Author

abulavin commented Feb 25, 2025

I did not fully understand why there is a thread join when running with the crash reproducer. Isn't multi-threading disabled from the beginning in this case? I would have assumed the ThreadLocalCache should still work and only free the allocated attributes once the main thread terminates.

When crash reproduction is enabled, the pass manager runs its pass pipeline on a separate thread here. This thread is created and joined outside of the MLIR context. My understanding is that disabling threading only disables the MLIR context threadpool, which has no effect on the thread in the link I provided that's used to run the pass pipeline. So the thread local cache (as intended) will allocate distinct attributes in storage local to this special 'crash recovery' thread, even when threading is disabled.

Trying to see if I understand why it's not a problem when multi-threading is enabled: is this because we never joins thread for the lifetime of the MLIRContext?

Also, you're fixing the issue for -mlir-pass-pipeline-crash-reproducer when multi-threading is disabled, but what about the combination of -mlir-pass-pipeline-crash-reproducer and multi-threading?

Ah apologies, I believe I have accidentally missed out some important information on this PR. The initial issue I was investigating involved generating local crash reproducers, which requires multithreading to be disabled. I need to double check what happens for regular crash reproducers with multi threading still enabled. Thanks for pointing this out.

@abulavin
Copy link
Contributor Author

@joker-eph @gysit Thanks for the review comments. After Mehdi's suggestion to test what happens when multi-threading and crash reproduction are both enabled, I have had to make some significant changes to fix this issue. As it turns out, enabling multithreading will still exhibit the same use-after-free error because the lifetime of the thread-local storage only lasts for the lifetime of the crash reproducer thread. To remedy this, I have added some extra logic to disable the use of thread-local storage in the DistinctAttributeAllocator when the PassManager runs with crash recovery enabled. Since in this case disabling thread-local storage now means thread-safety is a concern, if threading is not disabled then a synchronised allocator is used. I appreciate this may be less than ideal so I am open to suggestions to remedy this.

Please let me know if anything is not clear.

Copy link
Contributor

@gysit gysit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we need this additional flag for the crash recovery use case. We could of course drop thread local storage at the cost of more lock contention. It is unclear to me if that would be a good performance vs implementation complexity tradeoff. Probably not.

Another option could be to find a way to prolong the lifetime of the entries in the ThreadLocalCache. I doubt there is a good/low overhead solution though given it relies on thread local variables.

Copy link
Contributor

@gysit gysit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed the unreachable during the last review. I think if possible we should avoid it and I would also let the two disable functions just set one flag each and then interpret the flags in the allocate function.

Also the change is a bit outside of my area of expertise so it would be good to give the more knowledgeable reviewers time to have a look. Maybe @joker-eph could have a look to decide if the general approach makes sense.

Copy link
Contributor

@gysit gysit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for addressing the changes!

LGTM modulo some outdated comment.

Please give other reviewers some time to have a second look.

@abulavin
Copy link
Contributor Author

abulavin commented Mar 5, 2025

Please give other reviewers some time to have a second look.

Thank you for your review comments! I can see that the github bot has not added any more reviewers to this PR. Should other people be pinged on this? If so, can you suggest who may be interested in reviewing this PR?

@gysit gysit requested review from zero9178 and Mogball March 5, 2025 18:05
@gysit
Copy link
Contributor

gysit commented Mar 5, 2025

I added two more reviewers who may be interested. Feel free to land in a few days should there be no concerns.

Copy link
Contributor

@River707 River707 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much of a burden would it be to just enforce no multi-threading when doing crash reproducers? I'm a little worried about the complexity of differentiating thread local storage from multithreading being on.

@@ -23,7 +23,9 @@
#include "mlir/Support/ThreadLocalCache.h"
#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/PointerIntPair.h"
#include "llvm/Support/ErrorHandling.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need error handling?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is left over from a previous commit, apologies. I would like to clarify though, since clangd tells me there are other headers unused here, shall I remove those too as an en-passant tidy? Notably,

  • "mlir/Support/StorageUniquer.h"
  • "llvm/ADT/PointerIntPair.h"
  • "llvm/Support/TrailingObjects.h"

@abulavin
Copy link
Contributor Author

abulavin commented Mar 6, 2025

How much of a burden would it be to just enforce no multi-threading when doing crash reproducers?

This has pass pipeline performance implications for downstream projects that have crash reproduction always on. For example, the downstream project https://github.com/triton-lang/triton has crash reproducer always enabled when running the pass manager, as you can see here: https://github.com/triton-lang/triton/blob/main/python/src/ir.cc#L1817-L1839. If we enforce multithreading to be disabled then projects like this will not be able to benefit from pass parallelisation.

I appreciate that disabling/enabling thread local storage does add complexity however my intention with this PR was not to necessarily make this part of the MLIRContext's public interface. It's only really the pass manager that needs to disable thread-local storage for safety reasons but I could not find a nice way to do this without affecting MLIRContext.h. I think if this complexity could somehow be abstracted away without affecting the MLIRContext's public methods then it minimises its impact.

As @gysit has mentioned, I think the tradeoff here is we either use a lock around the allocator and accept lock contention cost but avoid complexity surrounding toggling thread-local storage, or, we accept this complexity in favour of avoiding lock contention overhead. My opinion is that if we choose to drop using thread-local storage for DistinctAttributeStorage, the performance impact on MLIR overall will be minimal as it only affects the storage allocation for this one attribute. I don't foresee a situation where allocating DistinctAttributeStorage suddenly becomes a performance bottleneck due to there being a lock versus thread-local storage.

@abulavin
Copy link
Contributor Author

@River707 @gysit thanks for the reviews! Please can you merge this PR for me as I do not have write access? Thanks

@gysit
Copy link
Contributor

gysit commented Mar 13, 2025

I am happy to land for you.

Can you update the PR description and title. I think the paragraph that describes the change is outdated now and it may be confusing to have this in the commit history. Especially since the plain allocator is not only used when threading is disabled.

I would also recommend a shorter titel such as [mlir] Fix DistinctAttributeUniquer or similar since the problem is not limited to the disable threading case.

@abulavin abulavin changed the title [mlir] Use non thread-local allocator for DistinctAttr when threading is disabled [mlir] Fix DistinctAttributeUniquer deleting storage when crash reproduction is enabled Mar 13, 2025
@abulavin abulavin changed the title [mlir] Fix DistinctAttributeUniquer deleting storage when crash reproduction is enabled [mlir] Fix DistinctAttributeUniquer deleting attribute storage Mar 13, 2025
@abulavin abulavin changed the title [mlir] Fix DistinctAttributeUniquer deleting attribute storage [mlir] Fix DistinctAttributeUniquer deleting attribute storage when crash reproduction is enabled Mar 13, 2025
@abulavin
Copy link
Contributor Author

abulavin commented Mar 13, 2025

Thanks, I have updated the title and description and am all done now.

@gysit gysit merged commit 0aa5ba4 into llvm:main Mar 13, 2025
7 checks passed
Copy link

@abulavin Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

frederik-h pushed a commit to frederik-h/llvm-project that referenced this pull request Mar 18, 2025
…rash reproduction is enabled (llvm#128566)

Currently, `DistinctAttr` uses an allocator wrapped in a
`ThreadLocalCache` to manage attribute storage allocations. This ensures
all allocations are freed when the allocator is destroyed.

However, this setup can cause use-after-free errors when
`mlir::PassManager` runs its passes on a separate thread as a result of
crash reproduction being enabled. Distinct attribute storages are
created in the child thread's local storage and freed once the thread
joins. Attempting to access these attributes after this can result in
segmentation faults, such as during printing or alias analysis.

Example: This invocation of `mlir-opt` demonstrates the segfault issue
due to distinct attributes being created in a child thread and their
storage being freed once the thread joins:
```
mlir-opt --mlir-pass-pipeline-crash-reproducer=. --test-distinct-attrs mlir/test/IR/test-builtin-distinct-attrs.mlir
```

This pull request changes the distinct attribute allocator to use
different allocators depending on whether or not threading is enabled
and whether or not the pass manager is running its passes in a separate
thread. If multithreading is disabled, a non thread-local allocator is
used. If threading remains enabled and the pass manager invokes its pass
pipelines in a child thread, then a non-thread local but synchronised
allocator is used. This ensures that the lifetime of allocated storage
persists beyond the lifetime of the child thread.

I have added two tests for the `-test-distinct-attrs` pass and the
`-enable-debug-info-on-llvm-scope` passes that run them with crash
reproduction enabled.
ThomasRaoux pushed a commit to triton-lang/triton that referenced this pull request Mar 21, 2025
Updating LLVM in order to pull in the following change:

- llvm/llvm-project#128566

For context, crash reproduction generation in MLIR will run the
`PassManager`'s passes in a child thread. The above PR fixes crashes for
when passes such as `add_di_scope` add `DistinctAttr` to the IR and
their storage is then accessed later once the child thread joins.
Pulling this in improves QoL for out-of-tree projects and makes the pass
manager more robust to the use of `DistinctAttr`.

This pin update has also introduced the deprecation of a
`llvm::TargetMachine::createTargetMachine` overload. I've updated the
callsites to use the non-deprecated overloads.

# New contributor declaration
- [x] I am not making a trivial change, such as fixing a typo in a
comment.
- [x] I have written a PR description following these
  [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.

- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
- [x] This PR does not need a test because `this PR only updates the
LLVM pin, so CI is sufficient`.

- Select one of the following.
  - [x] I have not added any `lit` tests.
- [ ] The `lit` tests I have added follow these [best
practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices),
including the "tests should be minimal" section. (Usually running Python
code
    and using the instructions it generates is not minimal.)
@cota
Copy link
Contributor

cota commented Mar 25, 2025

@abulavin this is causing TSAN failures in downstream projects that we have at Google.
The problem is that the bitfields added here (threadingIsEnabled : 1 and useThreadLocalAllocator : 1) are accessed without synchronization. Example tsan report:

WARNING: ThreadSanitizer: data race (pid=337)
  Write of size 1 at 0x7260001d0ff0 by thread T224:
    #0 disableThreadLocalStorage third_party/llvm/llvm-project/mlir/lib/IR/AttributeDetail.h:434:29
    #1 mlir::MLIRContext::disableThreadLocalStorage(bool) third_party/llvm/llvm-project/mlir/lib/IR/MLIRContext.cpp:723:40
    #2 mlir::PassManager::runWithCrashRecovery(mlir::Operation*, mlir::AnalysisManager) third_party/llvm/llvm-project/mlir/lib/Pass/PassCrashRecovery.cpp:423:8
[...]
Previous write of size 1 at 0x7260001d0ff0 by thread T222:
    #0 disableThreadLocalStorage third_party/llvm/llvm-project/mlir/lib/IR/AttributeDetail.h:434:29 
    #1 mlir::MLIRContext::disableThreadLocalStorage(bool) third_party/llvm/llvm-project/mlir/lib/IR/MLIRContext.cpp:723:40
    #2 mlir::PassManager::runWithCrashRecovery(mlir::Operation*, mlir::AnalysisManager) third_party/llvm/llvm-project/mlir/lib/Pass/PassCrashRecovery.cpp:423:8

An obvious fix is to also access the bit fields with allocatorMutex being locked. However from reading the thread above it seems that your intent was to avoid contention if possible. You could define an std::atomic to keep these bits in there, but you would still have to refactor the code to only read the flag once per call to allocate.

Given that this is introducing a race, I'd normally revert the PR. However, since it is also fixing a crash, I'd suggest we apply the obvious, non-scalable fix first, then you can work on making this more performant. If that sounds good, I've uploaded #132935 for review.

cota added a commit to cota/llvm-project that referenced this pull request Mar 25, 2025
The bitfields added in llvm#128566 (`threadingIsEnabled : 1`
and `useThreadLocalAllocator : 1`) are accessed without
synchronization. Example tsan report:

```
WARNING: ThreadSanitizer: data race (pid=337)
  Write of size 1 at 0x7260001d0ff0 by thread T224:
    #0 disableThreadLocalStorage third_party/llvm/llvm-project/mlir/lib/IR/AttributeDetail.h:434:29
    llvm#1 mlir::MLIRContext::disableThreadLocalStorage(bool) third_party/llvm/llvm-project/mlir/lib/IR/MLIRContext.cpp:723:40
    llvm#2 mlir::PassManager::runWithCrashRecovery(mlir::Operation*, mlir::AnalysisManager) third_party/llvm/llvm-project/mlir/lib/Pass/PassCrashRecovery.cpp:423:8
[...]
Previous write of size 1 at 0x7260001d0ff0 by thread T222:
    #0 disableThreadLocalStorage third_party/llvm/llvm-project/mlir/lib/IR/AttributeDetail.h:434:29
    llvm#1 mlir::MLIRContext::disableThreadLocalStorage(bool) third_party/llvm/llvm-project/mlir/lib/IR/MLIRContext.cpp:723:40
    llvm#2 mlir::PassManager::runWithCrashRecovery(mlir::Operation*, mlir::AnalysisManager) third_party/llvm/llvm-project/mlir/lib/Pass/PassCrashRecovery.cpp:423:8
```

Fix the race by serializing accesses to these fields with
the existing `allocatorMutex`.
chsigg pushed a commit to openxla/triton that referenced this pull request Mar 25, 2025
Updating LLVM in order to pull in the following change:

- llvm/llvm-project#128566

For context, crash reproduction generation in MLIR will run the
`PassManager`'s passes in a child thread. The above PR fixes crashes for
when passes such as `add_di_scope` add `DistinctAttr` to the IR and
their storage is then accessed later once the child thread joins.
Pulling this in improves QoL for out-of-tree projects and makes the pass
manager more robust to the use of `DistinctAttr`.

This pin update has also introduced the deprecation of a
`llvm::TargetMachine::createTargetMachine` overload. I've updated the
callsites to use the non-deprecated overloads.

- [x] I am not making a trivial change, such as fixing a typo in a
comment.
- [x] I have written a PR description following these
  [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.

- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
- [x] This PR does not need a test because `this PR only updates the
LLVM pin, so CI is sufficient`.

- Select one of the following.
  - [x] I have not added any `lit` tests.
- [ ] The `lit` tests I have added follow these [best
practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices),
including the "tests should be minimal" section. (Usually running Python
code
    and using the instructions it generates is not minimal.)
@cota
Copy link
Contributor

cota commented Mar 25, 2025

There is a follow-up discussion in #132935 and the conclusion is that we should revert this PR. Will do so now.

cota added a commit that referenced this pull request Mar 25, 2025
…e when crash reproduction is enabled" (#133000)

Reverts #128566. See as well the discussion in
#132935.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 25, 2025
…bute storage when crash reproduction is enabled" (#133000)

Reverts llvm/llvm-project#128566. See as well the discussion in
llvm/llvm-project#132935.
makslevental pushed a commit to makslevental/triton that referenced this pull request Mar 28, 2025
Updating LLVM in order to pull in the following change:

- llvm/llvm-project#128566

For context, crash reproduction generation in MLIR will run the
`PassManager`'s passes in a child thread. The above PR fixes crashes for
when passes such as `add_di_scope` add `DistinctAttr` to the IR and
their storage is then accessed later once the child thread joins.
Pulling this in improves QoL for out-of-tree projects and makes the pass
manager more robust to the use of `DistinctAttr`.

This pin update has also introduced the deprecation of a
`llvm::TargetMachine::createTargetMachine` overload. I've updated the
callsites to use the non-deprecated overloads.

# New contributor declaration
- [x] I am not making a trivial change, such as fixing a typo in a
comment.
- [x] I have written a PR description following these
  [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.

- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
- [x] This PR does not need a test because `this PR only updates the
LLVM pin, so CI is sufficient`.

- Select one of the following.
  - [x] I have not added any `lit` tests.
- [ ] The `lit` tests I have added follow these [best
practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices),
including the "tests should be minimal" section. (Usually running Python
code
    and using the instructions it generates is not minimal.)
vwbaker pushed a commit to openxla/triton that referenced this pull request Apr 3, 2025
Updating LLVM in order to pull in the following change:

- llvm/llvm-project#128566

For context, crash reproduction generation in MLIR will run the
`PassManager`'s passes in a child thread. The above PR fixes crashes for
when passes such as `add_di_scope` add `DistinctAttr` to the IR and
their storage is then accessed later once the child thread joins.
Pulling this in improves QoL for out-of-tree projects and makes the pass
manager more robust to the use of `DistinctAttr`.

This pin update has also introduced the deprecation of a
`llvm::TargetMachine::createTargetMachine` overload. I've updated the
callsites to use the non-deprecated overloads.

- [x] I am not making a trivial change, such as fixing a typo in a
comment.
- [x] I have written a PR description following these
  [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.

- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
- [x] This PR does not need a test because `this PR only updates the
LLVM pin, so CI is sufficient`.

- Select one of the following.
  - [x] I have not added any `lit` tests.
- [ ] The `lit` tests I have added follow these [best
practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices),
including the "tests should be minimal" section. (Usually running Python
code
    and using the instructions it generates is not minimal.)
loislo pushed a commit to openxla/triton that referenced this pull request Apr 14, 2025
Updating LLVM in order to pull in the following change:

- llvm/llvm-project#128566

For context, crash reproduction generation in MLIR will run the
`PassManager`'s passes in a child thread. The above PR fixes crashes for
when passes such as `add_di_scope` add `DistinctAttr` to the IR and
their storage is then accessed later once the child thread joins.
Pulling this in improves QoL for out-of-tree projects and makes the pass
manager more robust to the use of `DistinctAttr`.

This pin update has also introduced the deprecation of a
`llvm::TargetMachine::createTargetMachine` overload. I've updated the
callsites to use the non-deprecated overloads.

- [x] I am not making a trivial change, such as fixing a typo in a
comment.
- [x] I have written a PR description following these
  [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.

- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
- [x] This PR does not need a test because `this PR only updates the
LLVM pin, so CI is sufficient`.

- Select one of the following.
  - [x] I have not added any `lit` tests.
- [ ] The `lit` tests I have added follow these [best
practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices),
including the "tests should be minimal" section. (Usually running Python
code
    and using the instructions it generates is not minimal.)
vwbaker pushed a commit to openxla/triton that referenced this pull request Apr 22, 2025
Updating LLVM in order to pull in the following change:

- llvm/llvm-project#128566

For context, crash reproduction generation in MLIR will run the
`PassManager`'s passes in a child thread. The above PR fixes crashes for
when passes such as `add_di_scope` add `DistinctAttr` to the IR and
their storage is then accessed later once the child thread joins.
Pulling this in improves QoL for out-of-tree projects and makes the pass
manager more robust to the use of `DistinctAttr`.

This pin update has also introduced the deprecation of a
`llvm::TargetMachine::createTargetMachine` overload. I've updated the
callsites to use the non-deprecated overloads.

- [x] I am not making a trivial change, such as fixing a typo in a
comment.
- [x] I have written a PR description following these
  [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.

- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
- [x] This PR does not need a test because `this PR only updates the
LLVM pin, so CI is sufficient`.

- Select one of the following.
  - [x] I have not added any `lit` tests.
- [ ] The `lit` tests I have added follow these [best
practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices),
including the "tests should be minimal" section. (Usually running Python
code
    and using the instructions it generates is not minimal.)
vwbaker pushed a commit to openxla/triton that referenced this pull request Apr 25, 2025
Updating LLVM in order to pull in the following change:

- llvm/llvm-project#128566

For context, crash reproduction generation in MLIR will run the
`PassManager`'s passes in a child thread. The above PR fixes crashes for
when passes such as `add_di_scope` add `DistinctAttr` to the IR and
their storage is then accessed later once the child thread joins.
Pulling this in improves QoL for out-of-tree projects and makes the pass
manager more robust to the use of `DistinctAttr`.

This pin update has also introduced the deprecation of a
`llvm::TargetMachine::createTargetMachine` overload. I've updated the
callsites to use the non-deprecated overloads.

- [x] I am not making a trivial change, such as fixing a typo in a
comment.
- [x] I have written a PR description following these
  [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.

- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
- [x] This PR does not need a test because `this PR only updates the
LLVM pin, so CI is sufficient`.

- Select one of the following.
  - [x] I have not added any `lit` tests.
- [ ] The `lit` tests I have added follow these [best
practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices),
including the "tests should be minimal" section. (Usually running Python
code
    and using the instructions it generates is not minimal.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mlir:core MLIR Core Infrastructure mlir:llvm mlir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants