Skip to content

[X86] Avoid generating nested CALLSEQ for TLS pointer function arguments #106965

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 23 additions & 2 deletions llvm/include/llvm/CodeGen/MachineFrameInfo.h
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ class MachineFunction;
class MachineBasicBlock;
class BitVector;
class AllocaInst;
class MachineFrameSizeInfo;
class TargetInstrInfo;

/// The CalleeSavedInfo class tracks the information need to locate where a
Expand Down Expand Up @@ -283,6 +284,10 @@ class MachineFrameInfo {
/// It is only valid during and after prolog/epilog code insertion.
uint64_t MaxCallFrameSize = ~UINT64_C(0);

/// Call frame sizes for the MachineFunction's MachineBasicBlocks. This is set
/// by the MachineFrameSizeInfo constructor and cleared by its destructor.
MachineFrameSizeInfo *SizeInfo = nullptr;

/// The number of bytes of callee saved registers that the target wants to
/// report for the current function in the CodeView S_FRAMEPROC record.
unsigned CVBytesOfCalleeSavedRegisters = 0;
Expand Down Expand Up @@ -676,6 +681,13 @@ class MachineFrameInfo {
}
void setMaxCallFrameSize(uint64_t S) { MaxCallFrameSize = S; }

/// Return an object that can be queried for call frame sizes at specific
/// locations in the MachineFunction. Constructing a MachineFrameSizeInfo
/// object for the MachineFunction automatically makes it available via this
/// field during the object's lifetime.
MachineFrameSizeInfo *getSizeInfo() const { return SizeInfo; }
void setSizeInfo(MachineFrameSizeInfo *SI) { SizeInfo = SI; }

/// Returns how many bytes of callee-saved registers the target pushed in the
/// prologue. Only used for debug info.
unsigned getCVBytesOfCalleeSavedRegisters() const {
Expand Down Expand Up @@ -851,15 +863,24 @@ class MachineFrameInfo {
/// MachineBasicBlocks of a MachineFunction based on call frame setup and
/// destroy pseudo instructions. Usually, no call frame is open at block
/// boundaries, except if a call sequence has been split into multiple blocks.
/// Computing this information is deferred until it is queried.
///
/// Computing this information is deferred until it is queried. Upon
/// construction, a MachineFrameSizeInfo object registers itself in the
/// MachineFunction's MachineFrameInfo (and it unregisters when destructed).
/// While registered, it can be retrieved via MachineFrameInfo::getSizeInfo().
///
/// This class assumes that call frame instructions are placed properly, i.e.,
/// every program path hits a frame destroy of equal size after hitting a frame
/// setup, and a frame setup of equal size before a frame destroy. Nested call
/// frame sequences are not allowed.
class MachineFrameSizeInfo {
public:
MachineFrameSizeInfo(MachineFunction &MF) : MF(MF) {}
MachineFrameSizeInfo(MachineFunction &MF) : MF(MF) {
assert(MF.getFrameInfo().getSizeInfo() == nullptr);
MF.getFrameInfo().setSizeInfo(this);
}

~MachineFrameSizeInfo() { MF.getFrameInfo().setSizeInfo(nullptr); }

/// Get the call frame size just before MI. Contains no value if MI is not in
/// a call sequence. Zero-sized call frames are possible.
Expand Down
6 changes: 6 additions & 0 deletions llvm/lib/CodeGen/FinalizeISel.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,12 @@ static std::pair<bool, bool> runImpl(MachineFunction &MF) {
const TargetInstrInfo *TII = MF.getSubtarget().getInstrInfo();
const TargetLowering *TLI = MF.getSubtarget().getTargetLowering();

// Pseudo-Lowering might require the sizes of call frames, so compute them
// (lazily). The MachineFrameSizeInfo registers itself in MF's
// MachineFrameInfo for the SizeInfo's lifetime and does not need to be passed
// explicitly.
const MachineFrameSizeInfo MFSI(MF);

// Iterate through each instruction in the function, looking for pseudos.
for (MachineFunction::iterator I = MF.begin(), E = MF.end(); I != E; ++I) {
MachineBasicBlock *MBB = &*I;
Expand Down
9 changes: 9 additions & 0 deletions llvm/lib/Target/X86/X86ISelLowering.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -35971,6 +35971,15 @@ X86TargetLowering::EmitLoweredTLSAddr(MachineInstr &MI,
// inside MC, therefore without the two markers shrink-wrapping
// may push the prologue/epilogue pass them.
const TargetInstrInfo &TII = *Subtarget.getInstrInfo();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why EmitLoweredTLS needs to be handled by raw MI emission. Why can't this just emit the call sequence during legalization, like for an ordinary call or dynamic_alloca? If you did that, you would avoid appearing in a call sequence in the first place

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now opened PR #113706 for an attempt to do that.

// Do not introduce CALLSEQ markers if we are already in a call sequence.
// Nested call sequences are not allowed and cause errors in the machine
// verifier.
MachineFrameSizeInfo *MFSI = MI.getMF()->getFrameInfo().getSizeInfo();
assert(MFSI && "Call frame size information needs to be available!");
if (MFSI->getCallFrameSizeAt(MI).has_value())
return BB;

const MIMetadata MIMD(MI);
MachineFunction &MF = *BB->getParent();

Expand Down
30 changes: 30 additions & 0 deletions llvm/test/CodeGen/X86/tls-function-argument.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc -mtriple=x86_64 -verify-machineinstrs -relocation-model=pic < %s | FileCheck %s

; Passing a pointer to thread-local storage to a function can be problematic
; since computing such addresses requires a function call that is introduced
; very late in instruction selection. We need to ensure that we don't introduce
; nested call sequence markers if this function call happens in a call sequence.

@TLS = internal thread_local global i64 zeroinitializer, align 8
declare void @bar(ptr)
define internal void @foo() {
; CHECK-LABEL: foo:
; CHECK: # %bb.0:
; CHECK-NEXT: pushq %rbx
; CHECK-NEXT: .cfi_def_cfa_offset 16
; CHECK-NEXT: .cfi_offset %rbx, -16
; CHECK-NEXT: leaq TLS@TLSLD(%rip), %rdi
; CHECK-NEXT: callq __tls_get_addr@PLT
; CHECK-NEXT: leaq TLS@DTPOFF(%rax), %rbx
; CHECK-NEXT: movq %rbx, %rdi
; CHECK-NEXT: callq bar@PLT
; CHECK-NEXT: movq %rbx, %rdi
; CHECK-NEXT: callq bar@PLT
; CHECK-NEXT: popq %rbx
; CHECK-NEXT: .cfi_def_cfa_offset 8
; CHECK-NEXT: retq
call void @bar(ptr @TLS)
call void @bar(ptr @TLS)
ret void
}
Loading