Skip to content

[X86][IPRA] Add getIPRACSRegs since frame registers are risked to be optimized out. #109597

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Sep 26, 2024

Conversation

FreddyLeaf
Copy link
Contributor

@FreddyLeaf FreddyLeaf commented Sep 23, 2024

X86 IPRA had below correctness issue: https://gcc.godbolt.org/z/6hh88xv9r
This patch is a workaround to fix it.

This patch is a workaround to fix the correctness of IPRA on X86.
@llvmbot
Copy link
Member

llvmbot commented Sep 23, 2024

@llvm/pr-subscribers-backend-x86

Author: Freddy Ye (FreddyLeaf)

Changes

This patch is a workaround to fix the correctness of IPRA on X86.


Full diff: https://github.com/llvm/llvm-project/pull/109597.diff

6 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/TargetRegisterInfo.h (+4)
  • (modified) llvm/lib/CodeGen/TargetFrameLoweringImpl.cpp (+5-1)
  • (modified) llvm/lib/Target/X86/X86CallingConv.td (+3)
  • (modified) llvm/lib/Target/X86/X86RegisterInfo.cpp (+5)
  • (modified) llvm/lib/Target/X86/X86RegisterInfo.h (+3)
  • (added) llvm/test/CodeGen/X86/ipra-local-linkage-2.ll (+37)
diff --git a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
index 1a2f31e199336a..0f6484fddfe61f 100644
--- a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
@@ -489,6 +489,10 @@ class TargetRegisterInfo : public MCRegisterInfo {
   virtual const MCPhysReg*
   getCalleeSavedRegs(const MachineFunction *MF) const = 0;
 
+  /// Return a null-terminated list of all of the callee-saved registers on
+  /// this target when IPRA is on. Normally, this list should be null.
+  virtual const MCPhysReg *getIPRACSRegs(const MachineFunction *MF) const = 0;
+
   /// Return a mask of call-preserved registers for the given calling convention
   /// on the current function. The mask should include all call-preserved
   /// aliases. This is used by the register allocator to determine which
diff --git a/llvm/lib/CodeGen/TargetFrameLoweringImpl.cpp b/llvm/lib/CodeGen/TargetFrameLoweringImpl.cpp
index 7d054cb7c7c71f..364cc933731dec 100644
--- a/llvm/lib/CodeGen/TargetFrameLoweringImpl.cpp
+++ b/llvm/lib/CodeGen/TargetFrameLoweringImpl.cpp
@@ -107,8 +107,12 @@ void TargetFrameLowering::determineCalleeSaves(MachineFunction &MF,
   // are preferred over callee saved registers.
   if (MF.getTarget().Options.EnableIPRA &&
       isSafeForNoCSROpt(MF.getFunction()) &&
-      isProfitableForNoCSROpt(MF.getFunction()))
+      isProfitableForNoCSROpt(MF.getFunction())) {
+    const MCPhysReg *IPRACSRegs = TRI.getIPRACSRegs(&MF);
+    for (unsigned i = 0; IPRACSRegs[i]; ++i)
+      SavedRegs.set(IPRACSRegs[i]);
     return;
+  }
 
   // Get the callee saved register list...
   const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
diff --git a/llvm/lib/Target/X86/X86CallingConv.td b/llvm/lib/Target/X86/X86CallingConv.td
index 307aeb2ea4c6fd..472823a6d036ba 100644
--- a/llvm/lib/Target/X86/X86CallingConv.td
+++ b/llvm/lib/Target/X86/X86CallingConv.td
@@ -1104,6 +1104,9 @@ def CC_X86 : CallingConv<[
 
 def CSR_NoRegs : CalleeSavedRegs<(add)>;
 
+def CSR_IPRA_32 : CalleeSavedRegs<(add EBP)>;
+def CSR_IPRA_64 : CalleeSavedRegs<(add RBP)>;
+
 def CSR_32 : CalleeSavedRegs<(add ESI, EDI, EBX, EBP)>;
 def CSR_64 : CalleeSavedRegs<(add RBX, R12, R13, R14, R15, RBP)>;
 
diff --git a/llvm/lib/Target/X86/X86RegisterInfo.cpp b/llvm/lib/Target/X86/X86RegisterInfo.cpp
index 1d8808f4e2b7d0..302d50581e1e6b 100644
--- a/llvm/lib/Target/X86/X86RegisterInfo.cpp
+++ b/llvm/lib/Target/X86/X86RegisterInfo.cpp
@@ -410,6 +410,11 @@ X86RegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const {
   return CallsEHReturn ? CSR_32EHRet_SaveList : CSR_32_SaveList;
 }
 
+const MCPhysReg *
+X86RegisterInfo::getIPRACSRegs(const MachineFunction *MF) const {
+  return Is64Bit ? CSR_IPRA_64_SaveList : CSR_IPRA_32_SaveList;
+}
+
 const MCPhysReg *X86RegisterInfo::getCalleeSavedRegsViaCopy(
     const MachineFunction *MF) const {
   assert(MF && "Invalid MachineFunction pointer.");
diff --git a/llvm/lib/Target/X86/X86RegisterInfo.h b/llvm/lib/Target/X86/X86RegisterInfo.h
index 2f73698a4b94d3..68ee372f27b14d 100644
--- a/llvm/lib/Target/X86/X86RegisterInfo.h
+++ b/llvm/lib/Target/X86/X86RegisterInfo.h
@@ -99,6 +99,9 @@ class X86RegisterInfo final : public X86GenRegisterInfo {
   /// callee-save registers on this target.
   const MCPhysReg *
   getCalleeSavedRegs(const MachineFunction* MF) const override;
+  /// getIPRACSRegs - This API can be removed when rbp is safe to optimized out
+  /// when IPRA is on.
+  const MCPhysReg *getIPRACSRegs(const MachineFunction *MF) const override;
   const MCPhysReg *
   getCalleeSavedRegsViaCopy(const MachineFunction *MF) const;
   const uint32_t *getCallPreservedMask(const MachineFunction &MF,
diff --git a/llvm/test/CodeGen/X86/ipra-local-linkage-2.ll b/llvm/test/CodeGen/X86/ipra-local-linkage-2.ll
new file mode 100644
index 00000000000000..f337dfc989d81a
--- /dev/null
+++ b/llvm/test/CodeGen/X86/ipra-local-linkage-2.ll
@@ -0,0 +1,37 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -enable-ipra < %s | FileCheck %s
+
+; This test is to ensure rbp is correctly saved/restored before/after the
+; inline asm call in foo()
+
+target triple = "x86_64--"
+
+define internal void @foo() norecurse nounwind {
+; CHECK-LABEL: foo:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    pushq %rbp
+; CHECK-NEXT:    #APP
+; CHECK-NEXT:    xorl %ebp, %ebp
+; CHECK-NEXT:    #NO_APP
+; CHECK-NEXT:    popq %rbp
+; CHECK-NEXT:    retq
+  call void asm sideeffect "xor %ebp, %ebp", "~{ebp}"()
+  ret void
+}
+
+define void @bar(i32 %X) "frame-pointer"="all" nounwind {
+; CHECK-LABEL: bar:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    pushq %rbp
+; CHECK-NEXT:    movq %rsp, %rbp
+; CHECK-NEXT:    subq $16, %rsp
+; CHECK-NEXT:    callq foo
+; CHECK-NEXT:    movl $5, -4(%rbp)
+; CHECK-NEXT:    addq $16, %rsp
+; CHECK-NEXT:    popq %rbp
+; CHECK-NEXT:    retq
+  call void @foo()
+  %addr = alloca i32, align 4
+  store i32 5, ptr %addr, align 4
+  ret void
+}

@KanRobert KanRobert self-requested a review September 23, 2024 05:52
@FreddyLeaf FreddyLeaf requested a review from RKSimon September 23, 2024 07:23
@weiguozhi weiguozhi requested a review from rnk September 23, 2024 17:46
@weiguozhi
Copy link
Contributor

X86 IPRA had below correctness issue: https://gcc.godbolt.org/z/6hh88xv9r This patch is a workaround to fix it.

The instructions for function foo is already broken even without considering IPRA. The RBP is not preserved according to the standard C calling convention.

Copy link
Collaborator

@rnk rnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would benefit from a clearer explanation of the problem in a GitHub issue.

If I try to fill in the details myself, I guess the issue arises because LLVM doesn't use RBP the same way other callee-saved registers are used. We simply reserve it from the set of allocatable registers, and hard code its usage to address stack memory when hasFP is true.

This should generalize to cover base pointers as well (ESI / RBP), where you'll see the same problem if you construct a complicated enough test case.

Do the main IPRA stakeholders feel like this is an acceptable fix? It effectively prevents making RBP an IPRA allocatable register, which seems like it might not be acceptable.

@FreddyLeaf
Copy link
Contributor Author

X86 IPRA had below correctness issue: https://gcc.godbolt.org/z/6hh88xv9r This patch is a workaround to fix it.

The instructions for function foo is already broken even without considering IPRA. The RBP is not preserved according to the standard C calling convention.

Emm. The rbp can be correctly preserved if we removed the option of -enable-ipra in the example, not sure if I understood you correctly.

@FreddyLeaf
Copy link
Contributor Author

FreddyLeaf commented Sep 24, 2024

I think this would benefit from a clearer explanation of the problem in a GitHub issue.

If I try to fill in the details myself, I guess the issue arises because LLVM doesn't use RBP the same way other callee-saved registers are used. We simply reserve it from the set of allocatable registers, and hard code its usage to address stack memory when hasFP is true.

This should generalize to cover base pointers as well (ESI / RBP), where you'll see the same problem if you construct a complicated enough test case.

Do the main IPRA stakeholders feel like this is an acceptable fix? It effectively prevents making RBP an IPRA allocatable register, which seems like it might not be acceptable.

Good idea. I'll create a new issue with more details. For ESI/RSI, since they are originally not X86 Callee Saved Registers, so I think IPRA won't have opportunity to optimize its push/pop out in the leaf function. Currently X86 didn't enable IPRA on by default. This issue is found by running cpu2017 benchmarks with explicitly enabling it on. I'll try add some IPRA guys. If you know some active ones, pls help add as well :)

@FreddyLeaf
Copy link
Contributor Author

Created a github issue: #109739

@FreddyLeaf
Copy link
Contributor Author

This issue is mainly due to $rbp is still not assigned at the stage of PEI pass. Instead it's a frame index: %stack.0.addr. So the risked registers should not only be $rbp, but all possible physical frame registers. Will update.
And sorry for the wrong conclusion above, ESI is indeed x86 callee saved register.

@FreddyLeaf FreddyLeaf changed the title [X86][IPRA] Add getIPRACSRegs since rbp is risked to be optimized out. [X86][IPRA] Add getIPRACSRegs since frame registers are risked to be optimized out. Sep 24, 2024
@FreddyLeaf
Copy link
Contributor Author

Though this patch is paying attention to IPRA optimizations on callee save operations, the main component of IPRA optimizations should be contributed from caller save operations :)

@FreddyLeaf FreddyLeaf marked this pull request as draft September 24, 2024 08:55
@FreddyLeaf
Copy link
Contributor Author

need to fix CI, set as draft first

@FreddyLeaf FreddyLeaf marked this pull request as ready for review September 24, 2024 12:00
@FreddyLeaf
Copy link
Contributor Author

ping for review

@FreddyLeaf
Copy link
Contributor Author

So the problemed code this PR is paying attention now looks like so: https://gcc.godbolt.org/z/Tr8G5b36z

Copy link
Collaborator

@rnk rnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a major IPRA stakeholder, so there might be a better owner, but I think correctness comes before performance tradeoffs, so we should go ahead and land this and work to optimize from a less buggy state. Thanks for the fix!

@KanRobert
Copy link
Contributor

Should we enable IPRA by default for x86 after this patch?

@FreddyLeaf
Copy link
Contributor Author

FreddyLeaf commented Sep 26, 2024

Should we enable IPRA by default for x86 after this patch?

It depends on broader performance data, which I haven't got yet. We can consider it after then.

@FreddyLeaf FreddyLeaf merged commit 14b567d into llvm:main Sep 26, 2024
8 checks passed
@FreddyLeaf FreddyLeaf deleted the ipra_fix2 branch September 26, 2024 02:57
Sterling-Augustine pushed a commit to Sterling-Augustine/llvm-project that referenced this pull request Sep 27, 2024
…optimized out. (llvm#109597)

X86 IPRA had below correctness issue:
https://gcc.godbolt.org/z/6hh88xv9r
This patch is a workaround to fix it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants