
[MachineLICM] Give opportunity to analyze physregs for invariance. #84779


Merged
merged 2 commits into llvm:main from fix-remat-regression2 on Mar 15, 2024

Conversation

sdesmalen-arm
Collaborator

At the moment MachineLoopInfo has a very simple way to determine if a use of a physical register will be invariant: unless it is a constant value or an ignorable use, it is not considered invariant.

From a compile-time performance perspective this makes a lot of sense, but it prevents code that uses implicit physical registers from being hoisted until the later MachineLICM pass (after register allocation), which has far fewer opportunities to hoist.

For AArch64 SME we use an implicit physical register ($vg) to avoid rematerialization beyond certain instructions. Doing this led to regressions because simple expressions were no longer hoisted by Early MachineLICM.

This patch adds some extra checks to 'isLoopInvariant' to see whether any defs of the register are found inside the loop; if not, the use can be considered loop invariant.

We expect the impact on compile time to be negligible because there is an incentive for users to reduce the need for the smstart/smstop instructions that define $vg. In any case, we've put the functionality behind a target interface to limit it to specific registers.
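As an illustration of that target interface, a hypothetical out-of-tree target could opt one of its own status-like registers into the analysis by overriding the new hook. The names FooRegisterInfo and Foo::CTRL below are made up for this sketch; the in-tree change only does this for AArch64::VG, as the diff further down shows.

// Hypothetical sketch of a target opting a register into the analysis.
// Only registers accepted here are scanned for defs by MachineLoopInfo,
// so the set should be limited to rarely-defined mode/status registers.
bool FooRegisterInfo::shouldAnalyzePhysregInMachineLoopInfo(
    MCRegister R) const {
  return R == Foo::CTRL; // hypothetical mode register, analogous to $vg
}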

@llvmbot
Member

llvmbot commented Mar 11, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Sander de Smalen (sdesmalen-arm)

Changes

At the moment MachineLoopInfo has a very simple way to determine if a use of a physical register will be invariant: unless it is a constant value or an ignorable use, it is not considered invariant.

From a compile-time performance perspective this makes a lot of sense, but it prevents code that uses implicit physical registers from being hoisted until the later MachineLICM pass (after register allocation), which has far fewer opportunities to hoist.

For AArch64 SME we use an implicit physical register ($vg) to avoid rematerialization beyond certain instructions. Doing this led to regressions because simple expressions were no longer hoisted by Early MachineLICM.

This patch adds some extra checks to 'isLoopInvariant' to see whether any defs of the register are found inside the loop; if not, the use can be considered loop invariant.

We expect the impact on compile time to be negligible because there is an incentive for users to reduce the need for the smstart/smstop instructions that define $vg. In any case, we've put the functionality behind a target interface to limit it to specific registers.


Full diff: https://github.com/llvm/llvm-project/pull/84779.diff

6 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/MachineLoopInfo.h (+3)
  • (modified) llvm/include/llvm/CodeGen/TargetRegisterInfo.h (+4)
  • (modified) llvm/lib/CodeGen/MachineLoopInfo.cpp (+18-1)
  • (modified) llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp (+5)
  • (modified) llvm/lib/Target/AArch64/AArch64RegisterInfo.h (+2)
  • (added) llvm/test/CodeGen/AArch64/sme-machine-licm-vg.mir (+64)
diff --git a/llvm/include/llvm/CodeGen/MachineLoopInfo.h b/llvm/include/llvm/CodeGen/MachineLoopInfo.h
index 445c9b1c3bc00f..967c4a70ca4699 100644
--- a/llvm/include/llvm/CodeGen/MachineLoopInfo.h
+++ b/llvm/include/llvm/CodeGen/MachineLoopInfo.h
@@ -89,6 +89,9 @@ class MachineLoop : public LoopBase<MachineBasicBlock, MachineLoop> {
 private:
   friend class LoopInfoBase<MachineBasicBlock, MachineLoop>;
 
+  /// Returns true if the given physreg has no defs inside the loop.
+  bool isLoopInvariantImplicitPhysReg(Register Reg) const;
+
   explicit MachineLoop(MachineBasicBlock *MBB)
     : LoopBase<MachineBasicBlock, MachineLoop>(MBB) {}
 
diff --git a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
index e7c9ecd2e1851a..bf7e770aea8091 100644
--- a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
@@ -572,6 +572,10 @@ class TargetRegisterInfo : public MCRegisterInfo {
     return false;
   }
 
+  /// Returns true if MachineLoopInfo should analyze the given physreg
+  /// for loop invariance.
+  virtual bool shouldAnalyzePhysregInMachineLoopInfo(MCRegister R) const { return false; }
+
   /// Physical registers that may be modified within a function but are
   /// guaranteed to be restored before any uses. This is useful for targets that
   /// have call sequences where a GOT register may be updated by the caller
diff --git a/llvm/lib/CodeGen/MachineLoopInfo.cpp b/llvm/lib/CodeGen/MachineLoopInfo.cpp
index 1492c8c366fb41..6e7f4062f7e0f5 100644
--- a/llvm/lib/CodeGen/MachineLoopInfo.cpp
+++ b/llvm/lib/CodeGen/MachineLoopInfo.cpp
@@ -198,6 +198,23 @@ MDNode *MachineLoop::getLoopID() const {
   return LoopID;
 }
 
+bool MachineLoop::isLoopInvariantImplicitPhysReg(Register Reg) const {
+  MachineFunction *MF = getHeader()->getParent();
+  MachineRegisterInfo *MRI = &MF->getRegInfo();
+
+  if (MRI->isConstantPhysReg(Reg))
+    return true;
+
+  if (!MF->getSubtarget()
+           .getRegisterInfo()
+           ->shouldAnalyzePhysregInMachineLoopInfo(Reg))
+    return false;
+
+  return !llvm::any_of(
+      MRI->def_instructions(Reg),
+      [this, Reg](const MachineInstr &MI) { return this->contains(&MI); });
+}
+
 bool MachineLoop::isLoopInvariant(MachineInstr &I,
                                   const Register ExcludeReg) const {
   MachineFunction *MF = I.getParent()->getParent();
@@ -226,7 +243,7 @@ bool MachineLoop::isLoopInvariant(MachineInstr &I,
         // it could get allocated to something with a def during allocation.
         // However, if the physreg is known to always be caller saved/restored
         // then this use is safe to hoist.
-        if (!MRI->isConstantPhysReg(Reg) &&
+        if (!isLoopInvariantImplicitPhysReg(Reg) &&
             !(TRI->isCallerPreservedPhysReg(Reg.asMCReg(), *I.getMF())) &&
             !TII->isIgnorableUse(MO))
           return false;
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
index ec59778cbb0efa..ad29003f1e8173 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
@@ -1072,3 +1072,8 @@ bool AArch64RegisterInfo::shouldCoalesce(
 
   return true;
 }
+
+bool AArch64RegisterInfo::shouldAnalyzePhysregInMachineLoopInfo(
+    MCRegister R) const {
+  return R == AArch64::VG;
+}
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.h b/llvm/lib/Target/AArch64/AArch64RegisterInfo.h
index e93352c13b3c72..5c8a5e029584fc 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.h
@@ -145,6 +145,8 @@ class AArch64RegisterInfo final : public AArch64GenRegisterInfo {
 
   void getOffsetOpcodes(const StackOffset &Offset,
                         SmallVectorImpl<uint64_t> &Ops) const override;
+
+  bool shouldAnalyzePhysregInMachineLoopInfo(MCRegister R) const override;
 };
 
 } // end namespace llvm
diff --git a/llvm/test/CodeGen/AArch64/sme-machine-licm-vg.mir b/llvm/test/CodeGen/AArch64/sme-machine-licm-vg.mir
new file mode 100644
index 00000000000000..e6cce9af0daf0e
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sme-machine-licm-vg.mir
@@ -0,0 +1,64 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
+# RUN: llc -mtriple=aarch64--linux-gnu -run-pass=early-machinelicm %s -verify-machineinstrs -o - | FileCheck %s
+---
+name:            test_should_hoist_pfalse
+tracksRegLiveness: true
+body:             |
+  ; CHECK-LABEL: name: test_should_hoist_pfalse
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
+  ; CHECK-NEXT:   liveins: $x0, $x1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:gpr64 = COPY $x1
+  ; CHECK-NEXT:   [[COPY1:%[0-9]+]]:gpr64 = COPY $x0
+  ; CHECK-NEXT:   MSRpstatesvcrImm1 1, 1, csr_aarch64_smstartstop, implicit-def dead $nzcv, implicit $vg, implicit-def $vg
+  ; CHECK-NEXT:   [[COPY2:%[0-9]+]]:gpr64all = COPY [[COPY1]]
+  ; CHECK-NEXT:   [[COPY3:%[0-9]+]]:gpr64all = COPY [[COPY]]
+  ; CHECK-NEXT:   [[PFALSE:%[0-9]+]]:ppr = PFALSE implicit $vg
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   successors: %bb.2(0x40000000), %bb.1(0x40000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[PHI:%[0-9]+]]:gpr64common = PHI [[COPY2]], %bb.0, %5, %bb.1
+  ; CHECK-NEXT:   [[PHI1:%[0-9]+]]:gpr64sp = PHI [[COPY3]], %bb.0, %7, %bb.1
+  ; CHECK-NEXT:   STR_PXI [[PFALSE]], [[PHI]], 0
+  ; CHECK-NEXT:   [[SUBSXri:%[0-9]+]]:gpr64 = SUBSXri [[PHI1]], 1, 0, implicit-def $nzcv
+  ; CHECK-NEXT:   [[COPY4:%[0-9]+]]:gpr64all = COPY [[SUBSXri]]
+  ; CHECK-NEXT:   [[INCD_XPiI:%[0-9]+]]:gpr64 = INCD_XPiI [[PHI]], 31, 1
+  ; CHECK-NEXT:   [[COPY5:%[0-9]+]]:gpr64all = COPY [[INCD_XPiI]]
+  ; CHECK-NEXT:   Bcc 1, %bb.1, implicit $nzcv
+  ; CHECK-NEXT:   B %bb.2
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2:
+  ; CHECK-NEXT:   MSRpstatesvcrImm1 1, 0, csr_aarch64_smstartstop, implicit-def dead $nzcv, implicit $vg, implicit-def $vg
+  ; CHECK-NEXT:   RET_ReallyLR
+  bb.0:
+    successors: %bb.1
+    liveins: $x0, $x1
+
+    %5:gpr64 = COPY $x1
+    %4:gpr64 = COPY $x0
+    MSRpstatesvcrImm1 1, 1, csr_aarch64_smstartstop, implicit-def dead $nzcv, implicit $vg, implicit-def $vg
+    %6:gpr64all = COPY %4
+    %7:gpr64all = COPY %5
+
+  bb.1:
+    successors: %bb.2, %bb.1
+
+    %0:gpr64common = PHI %6, %bb.0, %3, %bb.1
+    %1:gpr64sp = PHI %7, %bb.0, %2, %bb.1
+    %8:ppr = PFALSE implicit $vg
+    STR_PXI killed %8, %0, 0
+    %9:gpr64 = SUBSXri %1, 1, 0, implicit-def $nzcv
+    %2:gpr64all = COPY %9
+    %10:gpr64 = INCD_XPiI %0, 31, 1
+    %3:gpr64all = COPY %10
+
+
+    Bcc 1, %bb.1, implicit $nzcv
+    B %bb.2
+
+  bb.2:
+    MSRpstatesvcrImm1 1, 0, csr_aarch64_smstartstop, implicit-def dead $nzcv, implicit $vg, implicit-def $vg
+    RET_ReallyLR
+...


github-actions bot commented Mar 11, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Collaborator

@rampitec rampitec left a comment


This looks reasonable, but isn't it the same as isIgnorableUse? Or do you need this because you want hoisting but still want to prevent rematerialization?

@sdesmalen-arm
Collaborator Author

This looks reasonable, but isn't it the same as isIgnorableUse? Or do you need this because you want hoisting but still want to prevent rematerialization?

Yes, the latter. We've added the implicit def/use of these registers to prevent rematerialization from occurring.
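In MIR this shows up as an extra implicit operand on the affected instructions; for example, in the test added by this patch:

    %8:ppr = PFALSE implicit $vg

The extra operand makes the instruction depend on the current value of $vg, so it cannot be rematerialized past an smstart/smstop that redefines $vg.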

@rampitec
Collaborator

This looks reasonable, but isn't it the same as isIgnorableUse? Or do you need this because you want hoisting but still want to prevent rematerialization?

Yes, the latter. We've added the implicit def/use of these registers to prevent rematerialization from occurring.

Is it the same as adding code to ignore $vg to the isIgnorableUse instead and then override isTriviallyReMaterializable to check for $vg imp-use and return false?

@sdesmalen-arm
Collaborator Author

This looks reasonable, but isn't it the same as isIgnorableUse? Or do you need this because you want hoisting but still want to prevent rematerialization?

Yes, the latter. We've added the implicit def/use of these registers to prevent rematerialization from occurring.

Is it the same as adding code to ignore $vg to the isIgnorableUse instead and then override isTriviallyReMaterializable to check for $vg imp-use and return false?

Unless I misunderstand what you're suggesting, I believe it is not the same, because implicit $vg cannot be ignored. The dependence on $vg is important when trying to determine whether the instruction can be hoisted: any definition of it in the loop should stop hoisting from happening. In contrast, isIgnorableUse does not result in any check for loop invariance of a physical register; it would instead just ignore the implicit use and continue the analysis as if it had never seen it.
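For reference, this is the shape of the check after this patch, condensed from the MachineLoopInfo.cpp hunk above (not a verbatim copy): a physreg use only blocks hoisting if none of the three conditions hold, and the new check still walks the defs of the register, whereas an ignorable use is skipped without looking at defs at all.

// Condensed from the isLoopInvariant hunk above: the use of Reg blocks
// hoisting only if all three checks fail.
if (!isLoopInvariantImplicitPhysReg(Reg) &&   // new: Reg has no defs inside this loop
    !TRI->isCallerPreservedPhysReg(Reg.asMCReg(), *I.getMF()) &&
    !TII->isIgnorableUse(MO))                 // would drop the operand from the analysis entirely
  return false;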

@rampitec
Collaborator

Unless I misunderstand what you're suggesting, I believe it is not the same because implicit $vg cannot be ignored. The dependence on $vg is important when trying to determine whether the instruction can be hoisted. Any definition of it in the loop should stop hoisting from happening. In contrast, isIgnorableUse does not result in any check for loop invariance of a physical register, it would instead just ignore the implicit use and continue the analysis as if it never saw the implicit use.

OK, understood, thanks for the explanation!

Collaborator

@rampitec rampitec left a comment


LGTM, but please give others time to review it.

@sdesmalen-arm
Collaborator Author

@sjoerdmeijer @FreddyLeaf any other comments?

@banach-space
Contributor

Thanks for preparing this 🙏🏻

@sjoerdmeijer @FreddyLeaf any other comments?

This unblocks some nice improvements for us in MLIR. Given that there's always a delay between MLIR and IREE (which is where we ultimately need this), could this be merged before the end of the week?

@sdesmalen-arm sdesmalen-arm merged commit 8310fd3 into llvm:main Mar 15, 2024
@sdesmalen-arm sdesmalen-arm deleted the fix-remat-regression2 branch March 15, 2024 09:30
@vitalybuka
Collaborator

Could you please fix or revert https://lab.llvm.org/buildbot/#/builders/238/builds/8439
CC @kstoimenov

@sdesmalen-arm
Collaborator Author

Could you please fix or revert https://lab.llvm.org/buildbot/#/builders/238/builds/8439 CC @kstoimenov

I don't understand; the build you pointed to passed successfully?

@vitalybuka
Collaborator

vitalybuka commented Mar 18, 2024

Could you please fix or revert https://lab.llvm.org/buildbot/#/builders/238/builds/8439 CC @kstoimenov

I don't understand, the build you pointed to passed successfully?

Sorry, I probably pasted the wrong one.
I don't remember what that was, and all the bots we are looking at were green after this patch.
