[X86] Avoid generating nested CALLSEQ for TLS pointer function arguments #106965

ritter-x2a · 2024-09-02T10:11:12Z

When a pointer to thread-local storage is passed in a function call,
ISel first lowers the call and wraps the resulting code in CALLSEQ
markers. Afterwards, to compute the pointer to TLS, a call to retrieve
the TLS base address is generated and then wrapped in a set of CALLSEQ
markers. If the latter call is inserted into the call sequence of the
former call, this leads to nested call frames, which are illegal and
lead to errors in the machine verifier.

This patch avoids surrounding the call to compute the TLS base address
in CALLSEQ markers if it is already surrounded by such markers. It
relies on zero-sized call frames being represented in the call frame
size info stored in the MachineBBs.

Fixes #45574 and #98042.

ritter-x2a · 2024-09-02T10:11:17Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @ritter-x2a and the rest of your teammates on Graphite

llvmbot · 2024-09-02T10:18:47Z

@llvm/pr-subscribers-backend-x86

Author: Fabian Ritter (ritter-x2a)

Changes

When a pointer to thread-local storage is passed in a function call,
ISel first lowers the call and wraps the resulting code in CALLSEQ
markers. Afterwards, to compute the pointer to TLS, a call to retrieve
the TLS base address is generated and then wrapped in a set of CALLSEQ
markers. If the latter call is inserted into the call sequence of the
former call, this leads to nested call frames, which are illegal and
lead to errors in the machine verifier.

This patch avoids surrounding the call to compute the TLS base address
in CALLSEQ markers if it is already surrounded by such markers. It
relies on zero-sized call frames being represented in the call frame
size info stored in the MachineBBs.

Fixes #45574 and #98042.

Full diff: https://github.com/llvm/llvm-project/pull/106965.diff

2 Files Affected:

(modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+7)
(added) llvm/test/CodeGen/X86/tls-function-argument.ll (+17)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index bbee0af109c74b..bf9777888df831 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -35593,6 +35593,13 @@ X86TargetLowering::EmitLoweredTLSAddr(MachineInstr &MI,
   // inside MC, therefore without the two markers shrink-wrapping
   // may push the prologue/epilogue pass them.
   const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
+
+  // Do not introduce CALLSEQ markers if we are already in a call sequence.
+  // Nested call sequences are not allowed and cause errors in the machine
+  // verifier.
+  if (TII.getCallFrameSizeAt(MI).has_value())
+    return BB;
+
   const MIMetadata MIMD(MI);
   MachineFunction &MF = *BB->getParent();
 
diff --git a/llvm/test/CodeGen/X86/tls-function-argument.ll b/llvm/test/CodeGen/X86/tls-function-argument.ll
new file mode 100644
index 00000000000000..ec2d664fc6b96f
--- /dev/null
+++ b/llvm/test/CodeGen/X86/tls-function-argument.ll
@@ -0,0 +1,17 @@
+; RUN: llc -verify-machineinstrs < %s -relocation-model=pic
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+; Passing a pointer to thread-local storage to a function can be problematic
+; since computing such addresses requires a function call that is introduced
+; very late in instruction selection. We need to ensure that we don't introduce
+; nested call sequence markers if this function call happens in a call sequence.
+
+@TLS = internal thread_local global i64 zeroinitializer, align 8
+declare void @bar(ptr)
+define internal void @foo() {
+call void @bar(ptr @TLS)
+call void @bar(ptr @TLS)
+ret void
+}
\ No newline at end of file

llvm/test/CodeGen/X86/tls-function-argument.ll

arsenm · 2024-09-04T14:10:00Z

llvm/test/CodeGen/X86/tls-function-argument.ll

@@ -0,0 +1,30 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=x86_64 -verify-machineinstrs < %s -relocation-model=pic | FileCheck %s


< %s to th eend, don't need -verify-machineinstrs

I think -verify-machineinstrs is useful here: Without my patch, the test fails in the MachineVerifier, where the call stack pseudos are checked. Without -verify-machineinstrs, this would only happen in builds with expensive checks enabled, and the test would be ineffective for other builds.

llvm/test/CodeGen/X86/tls-function-argument.ll

jyknight · 2024-09-04T15:17:42Z

This sounds sketchy to me. Is it really valid to enter a second call inside another call's CALLSEQ markers, but only if we avoid adding a second nested set of markers? It feels like attacking the symptom of the issue, but not the root cause. (I'm not certain it's not valid, but it just seems really suspicious...)

ritter-x2a · 2024-09-05T07:30:07Z

This sounds sketchy to me. Is it really valid to enter a second call inside another call's CALLSEQ markers, but only if we avoid adding a second nested set of markers? It feels like attacking the symptom of the issue, but not the root cause. (I'm not certain it's not valid, but it just seems really suspicious...)

From what I've gathered from the source comments and the patch introducing the code that inserts these CALLSEQ markers for TLSADDRs, their only point here is to stop shrink-wrapping from moving the function prologue/epilogue past the call to get the TLS address. This should also be given when the TLSADDR is in another CALLSEQ.

I am however by no means an expert on this topic; I'd appreciate more insights on which uses of CALLSEQ markers are and are not valid (besides the MachineVerifier checks).

jayfoad · 2024-09-05T12:26:18Z

This sounds sketchy to me. Is it really valid to enter a second call inside another call's CALLSEQ markers, but only if we avoid adding a second nested set of markers? It feels like attacking the symptom of the issue, but not the root cause. (I'm not certain it's not valid, but it just seems really suspicious...)

From what I've gathered from the source comments and the patch introducing the code that inserts these CALLSEQ markers for TLSADDRs, their only point here is to stop shrink-wrapping from moving the function prologue/epilogue past the call to get the TLS address. This should also be given when the TLSADDR is in another CALLSEQ.

I am however by no means an expert on this topic; I'd appreciate more insights on which uses of CALLSEQ markers are and are not valid (besides the MachineVerifier checks).

I also wondered about this. Are there other mechanisms that block shrink wrapping from moving the prologue? E.g. what if a regular instruction (not a call) has to come after the prologue, how would that be marked? Maybe adding an implicit use or def of some particular physical register would be enough??

When a pointer to thread-local storage is passed in a function call, ISel first lowers the call and wraps the resulting code in CALLSEQ markers. Afterwards, to compute the pointer to TLS, a call to retrieve the TLS base address is generated and then wrapped in a set of CALLSEQ markers. If the latter call is inserted into the call sequence of the former call, this leads to nested call frames, which are illegal and lead to errors in the machine verifier. This patch avoids surrounding the call to compute the TLS base address in CALLSEQ markers if it is already surrounded by such markers. It relies on zero-sized call frames being represented in the call frame size info stored in the MachineBBs. Fixes #45574 and #98042.

arsenm · 2024-10-10T14:03:06Z

llvm/lib/Target/X86/X86ISelLowering.cpp

@@ -35971,6 +35971,15 @@ X86TargetLowering::EmitLoweredTLSAddr(MachineInstr &MI,
  // inside MC, therefore without the two markers shrink-wrapping
  // may push the prologue/epilogue pass them.
  const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
+


I don't see why EmitLoweredTLS needs to be handled by raw MI emission. Why can't this just emit the call sequence during legalization, like for an ordinary call or dynamic_alloca? If you did that, you would avoid appearing in a call sequence in the first place

I now opened PR #113706 for an attempt to do that.

When lowering a TLS address for an ELF target, we introduce a call to obtain the TLS base address. So far, we do not insert CALLSEQ_START/END markers around this call when it is generated, but use a custom inserter to insert them in a later phase. This is problematic, since the TLS address call can land in a CALLSEQ for another calls before it is wrapped in its own CALLSEQ. That results in nested CALLSEQs, which are illegal and cause errors when expensive checks are enabled, e.g., in issues llvm#45574 and llvm#98042. This patch instead wraps each TLS address call in a CALLSEQ when it is generated so that instruction selection can avoid nested CALLSEQs. This is an alternative to PR llvm#106965, which instead changes the custom inserter to avoid generating CALLSEQs when the TLS address call is already in a CALLSEQ. This patch also effectively reverts commit 228978c, which introduced the CustomInserter that so far added the CALLSEQ around TLSAddrs. Fixes llvm#45574 and llvm#98042.

…113706) When lowering a TLS address for an ELF target, we introduce a call to obtain the TLS base address. So far, we do not insert CALLSEQ_START/END markers around this call when it is generated, but use a custom inserter to insert them in a later phase. This is problematic, since the TLS address call can land in a CALLSEQ for another call before it is wrapped in its own CALLSEQ. That results in nested CALLSEQs, which are illegal and cause errors when expensive checks are enabled, e.g., in issues #45574 and #98042. This patch instead wraps each TLS address call in a CALLSEQ when it is generated so that instruction selection can avoid nested CALLSEQs. This is an alternative to PR #106965, which instead changes the custom inserter to avoid generating CALLSEQs when the TLS address call is already in a CALLSEQ. This patch also effectively reverts commit [228978c](228978c), which introduced the CustomInserter that so far added the CALLSEQ around TLSAddrs. Fixes #45574 and #98042.

ritter-x2a · 2024-11-04T13:18:30Z

With #113706 now merged, this PR is obsolete.

…lvm#113706) When lowering a TLS address for an ELF target, we introduce a call to obtain the TLS base address. So far, we do not insert CALLSEQ_START/END markers around this call when it is generated, but use a custom inserter to insert them in a later phase. This is problematic, since the TLS address call can land in a CALLSEQ for another call before it is wrapped in its own CALLSEQ. That results in nested CALLSEQs, which are illegal and cause errors when expensive checks are enabled, e.g., in issues llvm#45574 and llvm#98042. This patch instead wraps each TLS address call in a CALLSEQ when it is generated so that instruction selection can avoid nested CALLSEQs. This is an alternative to PR llvm#106965, which instead changes the custom inserter to avoid generating CALLSEQs when the TLS address call is already in a CALLSEQ. This patch also effectively reverts commit [228978c](llvm@228978c), which introduced the CustomInserter that so far added the CALLSEQ around TLSAddrs. Fixes llvm#45574 and llvm#98042.

ritter-x2a mentioned this pull request Sep 2, 2024

[CodeGen] Compute call frame sizes instead of storing them in MachineBBs #106964

Closed

ritter-x2a requested review from arsenm, jayfoad, jyknight, kzhuravl, RKSimon, shiltian and topperc September 2, 2024 10:16

ritter-x2a marked this pull request as ready for review September 2, 2024 10:18

llvmbot added the backend:X86 label Sep 2, 2024

shiltian reviewed Sep 3, 2024

View reviewed changes

llvm/test/CodeGen/X86/tls-function-argument.ll Outdated Show resolved Hide resolved

shiltian requested a review from phoebewang September 3, 2024 02:25

phoebewang reviewed Sep 3, 2024

View reviewed changes

llvm/test/CodeGen/X86/tls-function-argument.ll Outdated Show resolved Hide resolved

llvm/test/CodeGen/X86/tls-function-argument.ll Outdated Show resolved Hide resolved

llvm/test/CodeGen/X86/tls-function-argument.ll Outdated Show resolved Hide resolved

ritter-x2a force-pushed the users/ritter-x2a/avoid-nested-callseqs-for-tls-args branch from 7159933 to dadca8c Compare September 3, 2024 15:03

arsenm reviewed Sep 4, 2024

View reviewed changes

ritter-x2a force-pushed the users/ritter-x2a/avoid-nested-callseqs-for-tls-args branch from dadca8c to c332034 Compare September 5, 2024 07:05

ritter-x2a force-pushed the users/ritter-x2a/avoid-nested-callseqs-for-tls-args branch from c332034 to a647e44 Compare September 5, 2024 10:43

ritter-x2a force-pushed the users/ritter-x2a/avoid-nested-callseqs-for-tls-args branch 2 times, most recently from 1bd72c4 to edc0f26 Compare September 20, 2024 10:42

ritter-x2a force-pushed the users/ritter-x2a/track-zero-sized-call-frames-in-machinebb branch from 20aa681 to 7020475 Compare October 10, 2024 10:15

ritter-x2a force-pushed the users/ritter-x2a/avoid-nested-callseqs-for-tls-args branch from edc0f26 to 58e4828 Compare October 10, 2024 10:16

arsenm reviewed Oct 10, 2024

View reviewed changes

ritter-x2a mentioned this pull request Oct 25, 2024

[X86] Insert CALLSEQ when lowering GlobalTLSAddress for ELF targets #113706

Merged

ritter-x2a closed this Nov 4, 2024

ritter-x2a deleted the users/ritter-x2a/avoid-nested-callseqs-for-tls-args branch November 4, 2024 13:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[X86] Avoid generating nested CALLSEQ for TLS pointer function arguments #106965

[X86] Avoid generating nested CALLSEQ for TLS pointer function arguments #106965

Uh oh!

ritter-x2a commented Sep 2, 2024

Uh oh!

ritter-x2a commented Sep 2, 2024

Uh oh!

llvmbot commented Sep 2, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

arsenm Sep 4, 2024

Uh oh!

ritter-x2a Sep 5, 2024

Uh oh!

Uh oh!

jyknight commented Sep 4, 2024

Uh oh!

ritter-x2a commented Sep 5, 2024

Uh oh!

jayfoad commented Sep 5, 2024

Uh oh!

arsenm Oct 10, 2024

Uh oh!

ritter-x2a Oct 25, 2024

Uh oh!

ritter-x2a commented Nov 4, 2024

Uh oh!

Uh oh!

		@@ -0,0 +1,30 @@
		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
		; RUN: llc -mtriple=x86_64 -verify-machineinstrs < %s -relocation-model=pic \| FileCheck %s

[X86] Avoid generating nested CALLSEQ for TLS pointer function arguments #106965

[X86] Avoid generating nested CALLSEQ for TLS pointer function arguments #106965

Uh oh!

Conversation

ritter-x2a commented Sep 2, 2024

Uh oh!

ritter-x2a commented Sep 2, 2024

Uh oh!

llvmbot commented Sep 2, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

arsenm Sep 4, 2024

Choose a reason for hiding this comment

Uh oh!

ritter-x2a Sep 5, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jyknight commented Sep 4, 2024

Uh oh!

ritter-x2a commented Sep 5, 2024

Uh oh!

jayfoad commented Sep 5, 2024

Uh oh!

arsenm Oct 10, 2024

Choose a reason for hiding this comment

Uh oh!

ritter-x2a Oct 25, 2024

Choose a reason for hiding this comment

Uh oh!

ritter-x2a commented Nov 4, 2024

Uh oh!

Uh oh!