Skip to content

[X86] Avoid generating nested CALLSEQ for TLS pointer function arguments #106965

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

ritter-x2a
Copy link
Member

When a pointer to thread-local storage is passed in a function call,
ISel first lowers the call and wraps the resulting code in CALLSEQ
markers. Afterwards, to compute the pointer to TLS, a call to retrieve
the TLS base address is generated and then wrapped in a set of CALLSEQ
markers. If the latter call is inserted into the call sequence of the
former call, this leads to nested call frames, which are illegal and
lead to errors in the machine verifier.

This patch avoids surrounding the call to compute the TLS base address
in CALLSEQ markers if it is already surrounded by such markers. It
relies on zero-sized call frames being represented in the call frame
size info stored in the MachineBBs.

Fixes #45574 and #98042.

Copy link
Member Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @ritter-x2a and the rest of your teammates on Graphite Graphite

@llvmbot
Copy link
Member

llvmbot commented Sep 2, 2024

@llvm/pr-subscribers-backend-x86

Author: Fabian Ritter (ritter-x2a)

Changes

When a pointer to thread-local storage is passed in a function call,
ISel first lowers the call and wraps the resulting code in CALLSEQ
markers. Afterwards, to compute the pointer to TLS, a call to retrieve
the TLS base address is generated and then wrapped in a set of CALLSEQ
markers. If the latter call is inserted into the call sequence of the
former call, this leads to nested call frames, which are illegal and
lead to errors in the machine verifier.

This patch avoids surrounding the call to compute the TLS base address
in CALLSEQ markers if it is already surrounded by such markers. It
relies on zero-sized call frames being represented in the call frame
size info stored in the MachineBBs.

Fixes #45574 and #98042.


Full diff: https://github.com/llvm/llvm-project/pull/106965.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+7)
  • (added) llvm/test/CodeGen/X86/tls-function-argument.ll (+17)
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index bbee0af109c74b..bf9777888df831 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -35593,6 +35593,13 @@ X86TargetLowering::EmitLoweredTLSAddr(MachineInstr &MI,
   // inside MC, therefore without the two markers shrink-wrapping
   // may push the prologue/epilogue pass them.
   const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
+
+  // Do not introduce CALLSEQ markers if we are already in a call sequence.
+  // Nested call sequences are not allowed and cause errors in the machine
+  // verifier.
+  if (TII.getCallFrameSizeAt(MI).has_value())
+    return BB;
+
   const MIMetadata MIMD(MI);
   MachineFunction &MF = *BB->getParent();
 
diff --git a/llvm/test/CodeGen/X86/tls-function-argument.ll b/llvm/test/CodeGen/X86/tls-function-argument.ll
new file mode 100644
index 00000000000000..ec2d664fc6b96f
--- /dev/null
+++ b/llvm/test/CodeGen/X86/tls-function-argument.ll
@@ -0,0 +1,17 @@
+; RUN: llc -verify-machineinstrs < %s -relocation-model=pic
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+; Passing a pointer to thread-local storage to a function can be problematic
+; since computing such addresses requires a function call that is introduced
+; very late in instruction selection. We need to ensure that we don't introduce
+; nested call sequence markers if this function call happens in a call sequence.
+
+@TLS = internal thread_local global i64 zeroinitializer, align 8
+declare void @bar(ptr)
+define internal void @foo() {
+call void @bar(ptr @TLS)
+call void @bar(ptr @TLS)
+ret void
+}
\ No newline at end of file

@shiltian shiltian requested a review from phoebewang September 3, 2024 02:25
@ritter-x2a ritter-x2a force-pushed the users/ritter-x2a/avoid-nested-callseqs-for-tls-args branch from 7159933 to dadca8c Compare September 3, 2024 15:03
@@ -0,0 +1,30 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc -mtriple=x86_64 -verify-machineinstrs < %s -relocation-model=pic | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

< %s to th eend, don't need -verify-machineinstrs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think -verify-machineinstrs is useful here: Without my patch, the test fails in the MachineVerifier, where the call stack pseudos are checked. Without -verify-machineinstrs, this would only happen in builds with expensive checks enabled, and the test would be ineffective for other builds.

@jyknight
Copy link
Member

jyknight commented Sep 4, 2024

This sounds sketchy to me. Is it really valid to enter a second call inside another call's CALLSEQ markers, but only if we avoid adding a second nested set of markers? It feels like attacking the symptom of the issue, but not the root cause. (I'm not certain it's not valid, but it just seems really suspicious...)

@ritter-x2a ritter-x2a force-pushed the users/ritter-x2a/avoid-nested-callseqs-for-tls-args branch from dadca8c to c332034 Compare September 5, 2024 07:05
Copy link
Member Author

This sounds sketchy to me. Is it really valid to enter a second call inside another call's CALLSEQ markers, but only if we avoid adding a second nested set of markers? It feels like attacking the symptom of the issue, but not the root cause. (I'm not certain it's not valid, but it just seems really suspicious...)

From what I've gathered from the source comments and the patch introducing the code that inserts these CALLSEQ markers for TLSADDRs, their only point here is to stop shrink-wrapping from moving the function prologue/epilogue past the call to get the TLS address. This should also be given when the TLSADDR is in another CALLSEQ.

I am however by no means an expert on this topic; I'd appreciate more insights on which uses of CALLSEQ markers are and are not valid (besides the MachineVerifier checks).

@ritter-x2a ritter-x2a force-pushed the users/ritter-x2a/avoid-nested-callseqs-for-tls-args branch from c332034 to a647e44 Compare September 5, 2024 10:43
@jayfoad
Copy link
Contributor

jayfoad commented Sep 5, 2024

This sounds sketchy to me. Is it really valid to enter a second call inside another call's CALLSEQ markers, but only if we avoid adding a second nested set of markers? It feels like attacking the symptom of the issue, but not the root cause. (I'm not certain it's not valid, but it just seems really suspicious...)

From what I've gathered from the source comments and the patch introducing the code that inserts these CALLSEQ markers for TLSADDRs, their only point here is to stop shrink-wrapping from moving the function prologue/epilogue past the call to get the TLS address. This should also be given when the TLSADDR is in another CALLSEQ.

I am however by no means an expert on this topic; I'd appreciate more insights on which uses of CALLSEQ markers are and are not valid (besides the MachineVerifier checks).

I also wondered about this. Are there other mechanisms that block shrink wrapping from moving the prologue? E.g. what if a regular instruction (not a call) has to come after the prologue, how would that be marked? Maybe adding an implicit use or def of some particular physical register would be enough??

@ritter-x2a ritter-x2a force-pushed the users/ritter-x2a/avoid-nested-callseqs-for-tls-args branch 2 times, most recently from 1bd72c4 to edc0f26 Compare September 20, 2024 10:42
When a pointer to thread-local storage is passed in a function call,
ISel first lowers the call and wraps the resulting code in CALLSEQ
markers. Afterwards, to compute the pointer to TLS, a call to retrieve
the TLS base address is generated and then wrapped in a set of CALLSEQ
markers. If the latter call is inserted into the call sequence of the
former call, this leads to nested call frames, which are illegal and
lead to errors in the machine verifier.

This patch avoids surrounding the call to compute the TLS base address
in CALLSEQ markers if it is already surrounded by such markers. It
relies on zero-sized call frames being represented in the call frame
size info stored in the MachineBBs.

Fixes #45574 and #98042.
@ritter-x2a ritter-x2a force-pushed the users/ritter-x2a/track-zero-sized-call-frames-in-machinebb branch from 20aa681 to 7020475 Compare October 10, 2024 10:15
@ritter-x2a ritter-x2a force-pushed the users/ritter-x2a/avoid-nested-callseqs-for-tls-args branch from edc0f26 to 58e4828 Compare October 10, 2024 10:16
@@ -35971,6 +35971,15 @@ X86TargetLowering::EmitLoweredTLSAddr(MachineInstr &MI,
// inside MC, therefore without the two markers shrink-wrapping
// may push the prologue/epilogue pass them.
const TargetInstrInfo &TII = *Subtarget.getInstrInfo();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why EmitLoweredTLS needs to be handled by raw MI emission. Why can't this just emit the call sequence during legalization, like for an ordinary call or dynamic_alloca? If you did that, you would avoid appearing in a call sequence in the first place

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now opened PR #113706 for an attempt to do that.

ritter-x2a added a commit to ritter-x2a/llvm-project that referenced this pull request Oct 25, 2024
When lowering a TLS address for an ELF target, we introduce a call to obtain
the TLS base address. So far, we do not insert CALLSEQ_START/END markers around
this call when it is generated, but use a custom inserter to insert them in a
later phase. This is problematic, since the TLS address call can land in a
CALLSEQ for another calls before it is wrapped in its own CALLSEQ. That results
in nested CALLSEQs, which are illegal and cause errors when expensive checks
are enabled, e.g., in issues llvm#45574 and llvm#98042.

This patch instead wraps each TLS address call in a CALLSEQ when it is
generated so that instruction selection can avoid nested CALLSEQs. This is an
alternative to PR llvm#106965, which instead changes the custom inserter to avoid
generating CALLSEQs when the TLS address call is already in a CALLSEQ.

This patch also effectively reverts commit 228978c, which introduced the
CustomInserter that so far added the CALLSEQ around TLSAddrs.

Fixes llvm#45574 and llvm#98042.
ritter-x2a added a commit to ritter-x2a/llvm-project that referenced this pull request Nov 4, 2024
When lowering a TLS address for an ELF target, we introduce a call to obtain
the TLS base address. So far, we do not insert CALLSEQ_START/END markers around
this call when it is generated, but use a custom inserter to insert them in a
later phase. This is problematic, since the TLS address call can land in a
CALLSEQ for another calls before it is wrapped in its own CALLSEQ. That results
in nested CALLSEQs, which are illegal and cause errors when expensive checks
are enabled, e.g., in issues llvm#45574 and llvm#98042.

This patch instead wraps each TLS address call in a CALLSEQ when it is
generated so that instruction selection can avoid nested CALLSEQs. This is an
alternative to PR llvm#106965, which instead changes the custom inserter to avoid
generating CALLSEQs when the TLS address call is already in a CALLSEQ.

This patch also effectively reverts commit 228978c, which introduced the
CustomInserter that so far added the CALLSEQ around TLSAddrs.

Fixes llvm#45574 and llvm#98042.
ritter-x2a added a commit that referenced this pull request Nov 4, 2024
…113706)

When lowering a TLS address for an ELF target, we introduce a call to
obtain the TLS base address. So far, we do not insert CALLSEQ_START/END
markers around this call when it is generated, but use a custom inserter
to insert them in a later phase. This is problematic, since the TLS
address call can land in a CALLSEQ for another call before it is
wrapped in its own CALLSEQ. That results in nested CALLSEQs, which are
illegal and cause errors when expensive checks are enabled, e.g., in
issues #45574 and #98042.

This patch instead wraps each TLS address call in a CALLSEQ when it is
generated so that instruction selection can avoid nested CALLSEQs. This
is an alternative to PR #106965, which instead changes the custom
inserter to avoid generating CALLSEQs when the TLS address call is
already in a CALLSEQ.

This patch also effectively reverts commit
[228978c](228978c),
which introduced the CustomInserter that so far added the CALLSEQ around
TLSAddrs.

Fixes #45574 and #98042.
@ritter-x2a ritter-x2a closed this Nov 4, 2024
@ritter-x2a
Copy link
Member Author

With #113706 now merged, this PR is obsolete.

@ritter-x2a ritter-x2a deleted the users/ritter-x2a/avoid-nested-callseqs-for-tls-args branch November 4, 2024 13:19
PhilippRados pushed a commit to PhilippRados/llvm-project that referenced this pull request Nov 6, 2024
…lvm#113706)

When lowering a TLS address for an ELF target, we introduce a call to
obtain the TLS base address. So far, we do not insert CALLSEQ_START/END
markers around this call when it is generated, but use a custom inserter
to insert them in a later phase. This is problematic, since the TLS
address call can land in a CALLSEQ for another call before it is
wrapped in its own CALLSEQ. That results in nested CALLSEQs, which are
illegal and cause errors when expensive checks are enabled, e.g., in
issues llvm#45574 and llvm#98042.

This patch instead wraps each TLS address call in a CALLSEQ when it is
generated so that instruction selection can avoid nested CALLSEQs. This
is an alternative to PR llvm#106965, which instead changes the custom
inserter to avoid generating CALLSEQs when the TLS address call is
already in a CALLSEQ.

This patch also effectively reverts commit
[228978c](llvm@228978c),
which introduced the CustomInserter that so far added the CALLSEQ around
TLSAddrs.

Fixes llvm#45574 and llvm#98042.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants