Skip to content

[llvm][CodeGen] avoid repeated interval calculation in window scheduler #132352

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

huaatian
Copy link
Contributor

@huaatian huaatian commented Mar 21, 2025

Some new registers are reused when replacing some old ones in
certain use case of ModuloScheduleExpander. It is necessary to
avoid repeated interval calculations for these registers.

Some new registers may be reused when replacing some old ones in
certain use case of ModuloScheduleExpander. It is necessary to
avoid repeated interval calculations for these registers.
@llvmbot
Copy link
Member

llvmbot commented Mar 21, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Hua Tian (huaatian)

Changes

Some new registers may be reused when replacing some old ones in
certain use case of ModuloScheduleExpander. It is necessary to
avoid repeated interval calculations for these registers.


Full diff: https://github.com/llvm/llvm-project/pull/132352.diff

3 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/ModuloSchedule.h (+2-1)
  • (modified) llvm/lib/CodeGen/ModuloSchedule.cpp (+5-5)
  • (added) llvm/test/CodeGen/AArch64/aarch64-swp-ws-live-intervals.mir (+103)
diff --git a/llvm/include/llvm/CodeGen/ModuloSchedule.h b/llvm/include/llvm/CodeGen/ModuloSchedule.h
index b6000ba05d882..a1b949844371f 100644
--- a/llvm/include/llvm/CodeGen/ModuloSchedule.h
+++ b/llvm/include/llvm/CodeGen/ModuloSchedule.h
@@ -60,6 +60,7 @@
 #ifndef LLVM_CODEGEN_MODULOSCHEDULE_H
 #define LLVM_CODEGEN_MODULOSCHEDULE_H
 
+#include "llvm/ADT/SmallSet.h"
 #include "llvm/CodeGen/MachineFunction.h"
 #include "llvm/CodeGen/MachineLoopUtils.h"
 #include "llvm/CodeGen/TargetInstrInfo.h"
@@ -189,7 +190,7 @@ class ModuloScheduleExpander {
   InstrChangesTy InstrChanges;
 
   /// Record the registers that need to compute live intervals.
-  SmallVector<Register> NoIntervalRegs;
+  SmallSet<Register, 8> NoIntervalRegs;
 
   void generatePipelinedLoop();
   void generateProlog(unsigned LastStage, MachineBasicBlock *KernelBB,
diff --git a/llvm/lib/CodeGen/ModuloSchedule.cpp b/llvm/lib/CodeGen/ModuloSchedule.cpp
index d208a62a99372..50fc0f437a5c5 100644
--- a/llvm/lib/CodeGen/ModuloSchedule.cpp
+++ b/llvm/lib/CodeGen/ModuloSchedule.cpp
@@ -551,7 +551,7 @@ void ModuloScheduleExpander::generateExistingPhis(
 
               if (IsLast && np == NumPhis - 1) {
                 replaceRegUsesAfterLoop(Def, NewReg, BB, MRI);
-                NoIntervalRegs.push_back(NewReg);
+                NoIntervalRegs.insert(NewReg);
               }
               continue;
             }
@@ -594,7 +594,7 @@ void ModuloScheduleExpander::generateExistingPhis(
       // epilog.
       if (IsLast && np == NumPhis - 1) {
         replaceRegUsesAfterLoop(Def, NewReg, BB, MRI);
-        NoIntervalRegs.push_back(NewReg);
+        NoIntervalRegs.insert(NewReg);
       }
 
       // In the kernel, a dependent Phi uses the value from this Phi.
@@ -617,7 +617,7 @@ void ModuloScheduleExpander::generateExistingPhis(
       auto It = CurStageMap.find(LoopVal);
       if (It != CurStageMap.end()) {
         replaceRegUsesAfterLoop(Def, It->second, BB, MRI);
-        NoIntervalRegs.push_back(It->second);
+        NoIntervalRegs.insert(It->second);
       }
     }
   }
@@ -740,7 +740,7 @@ void ModuloScheduleExpander::generatePhis(
         }
         if (IsLast && np == NumPhis - 1) {
           replaceRegUsesAfterLoop(Def, NewReg, BB, MRI);
-          NoIntervalRegs.push_back(NewReg);
+          NoIntervalRegs.insert(NewReg);
         }
       }
     }
@@ -1083,7 +1083,7 @@ void ModuloScheduleExpander::updateInstruction(MachineInstr *NewMI,
       VRMap[CurStageNum][reg] = NewReg;
       if (LastDef) {
         replaceRegUsesAfterLoop(reg, NewReg, BB, MRI);
-        NoIntervalRegs.push_back(NewReg);
+        NoIntervalRegs.insert(NewReg);
       }
     } else if (MO.isUse()) {
       MachineInstr *Def = MRI.getVRegDef(reg);
diff --git a/llvm/test/CodeGen/AArch64/aarch64-swp-ws-live-intervals.mir b/llvm/test/CodeGen/AArch64/aarch64-swp-ws-live-intervals.mir
new file mode 100644
index 0000000000000..4f0e52239b90c
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/aarch64-swp-ws-live-intervals.mir
@@ -0,0 +1,103 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc --mtriple=aarch64 %s -O2 -run-pass=pipeliner -o - | FileCheck %s
+
+...
+---
+name:            foo
+tracksRegLiveness: true
+body:             |
+  ; CHECK-LABEL: name: foo
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.3(0x80000000)
+  ; CHECK-NEXT:   liveins: $x0
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:gpr64common = COPY $x0
+  ; CHECK-NEXT:   [[FMOVD0_:%[0-9]+]]:fpr64 = FMOVD0
+  ; CHECK-NEXT:   [[MOVi32imm:%[0-9]+]]:gpr32 = MOVi32imm 1
+  ; CHECK-NEXT:   [[SUBREG_TO_REG:%[0-9]+]]:gpr64sp = SUBREG_TO_REG 0, [[MOVi32imm]], %subreg.sub_32
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.3:
+  ; CHECK-NEXT:   successors: %bb.4(0x40000000), %bb.7(0x40000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[FADDDrr:%[0-9]+]]:fpr64 = nofpexcept FADDDrr [[FMOVD0_]], [[FMOVD0_]], implicit $fpcr
+  ; CHECK-NEXT:   [[SUBSXri:%[0-9]+]]:gpr64 = nsw SUBSXri [[SUBREG_TO_REG]], 1, 0, implicit-def $nzcv
+  ; CHECK-NEXT:   [[COPY1:%[0-9]+]]:gpr64sp = COPY [[SUBSXri]]
+  ; CHECK-NEXT:   [[FMOVDi:%[0-9]+]]:fpr64 = FMOVDi 112
+  ; CHECK-NEXT:   Bcc 0, %bb.7, implicit $nzcv
+  ; CHECK-NEXT:   B %bb.4
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.4:
+  ; CHECK-NEXT:   successors: %bb.5(0x80000000), %bb.6(0x00000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[FADDDrr1:%[0-9]+]]:fpr64 = nofpexcept FADDDrr [[FADDDrr]], [[FMOVD0_]], implicit $fpcr
+  ; CHECK-NEXT:   [[FADDDrr2:%[0-9]+]]:fpr64 = nofpexcept FADDDrr [[FMOVD0_]], [[FMOVD0_]], implicit $fpcr
+  ; CHECK-NEXT:   [[SUBSXri1:%[0-9]+]]:gpr64 = nsw SUBSXri [[COPY1]], 1, 0, implicit-def $nzcv
+  ; CHECK-NEXT:   [[COPY2:%[0-9]+]]:gpr64all = COPY [[SUBSXri1]]
+  ; CHECK-NEXT:   [[FMOVDi1:%[0-9]+]]:fpr64 = FMOVDi 112
+  ; CHECK-NEXT:   Bcc 0, %bb.6, implicit $nzcv
+  ; CHECK-NEXT:   B %bb.5
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.5:
+  ; CHECK-NEXT:   successors: %bb.6(0x04000000), %bb.5(0x7c000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[PHI:%[0-9]+]]:gpr64sp = PHI [[COPY2]], %bb.4, %24, %bb.5
+  ; CHECK-NEXT:   [[PHI1:%[0-9]+]]:fpr64 = PHI [[FMOVDi1]], %bb.4, %25, %bb.5
+  ; CHECK-NEXT:   [[PHI2:%[0-9]+]]:fpr64 = PHI [[FMOVDi]], %bb.4, [[PHI1]], %bb.5
+  ; CHECK-NEXT:   [[PHI3:%[0-9]+]]:fpr64 = PHI [[FADDDrr2]], %bb.4, %22, %bb.5
+  ; CHECK-NEXT:   [[PHI4:%[0-9]+]]:fpr64 = PHI [[FADDDrr1]], %bb.4, %23, %bb.5
+  ; CHECK-NEXT:   [[SUBSXri2:%[0-9]+]]:gpr64 = nsw SUBSXri [[PHI]], 1, 0, implicit-def $nzcv
+  ; CHECK-NEXT:   [[FADDDrr3:%[0-9]+]]:fpr64 = nofpexcept FADDDrr [[PHI2]], [[FMOVD0_]], implicit $fpcr
+  ; CHECK-NEXT:   [[FADDDrr4:%[0-9]+]]:fpr64 = nofpexcept FADDDrr [[PHI3]], [[PHI2]], implicit $fpcr
+  ; CHECK-NEXT:   [[COPY3:%[0-9]+]]:gpr64all = COPY [[SUBSXri2]]
+  ; CHECK-NEXT:   STRDui [[PHI4]], [[COPY]], 0
+  ; CHECK-NEXT:   [[FMOVDi2:%[0-9]+]]:fpr64 = FMOVDi 112
+  ; CHECK-NEXT:   Bcc 1, %bb.5, implicit $nzcv
+  ; CHECK-NEXT:   B %bb.6
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.6:
+  ; CHECK-NEXT:   successors: %bb.7(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[PHI5:%[0-9]+]]:fpr64 = PHI [[FMOVDi]], %bb.4, [[PHI1]], %bb.5
+  ; CHECK-NEXT:   [[PHI6:%[0-9]+]]:fpr64 = PHI [[FADDDrr2]], %bb.4, [[FADDDrr3]], %bb.5
+  ; CHECK-NEXT:   [[PHI7:%[0-9]+]]:fpr64 = PHI [[FADDDrr1]], %bb.4, [[FADDDrr4]], %bb.5
+  ; CHECK-NEXT:   STRDui [[PHI7]], [[COPY]], 0
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.7:
+  ; CHECK-NEXT:   successors: %bb.2(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   [[PHI8:%[0-9]+]]:fpr64 = PHI [[FMOVD0_]], %bb.3, [[PHI5]], %bb.6
+  ; CHECK-NEXT:   [[PHI9:%[0-9]+]]:fpr64 = PHI [[FADDDrr]], %bb.3, [[PHI6]], %bb.6
+  ; CHECK-NEXT:   [[FADDDrr5:%[0-9]+]]:fpr64 = nofpexcept FADDDrr [[PHI9]], [[PHI8]], implicit $fpcr
+  ; CHECK-NEXT:   STRDui [[FADDDrr5]], [[COPY]], 0
+  ; CHECK-NEXT:   B %bb.2
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2:
+  ; CHECK-NEXT:   RET_ReallyLR
+  bb.0:
+    successors: %bb.1(0x80000000)
+    liveins: $x0
+
+    %0:gpr64common = COPY $x0
+    %1:fpr64 = FMOVD0
+    %2:gpr32 = MOVi32imm 1
+    %3:gpr64all = SUBREG_TO_REG 0, killed %2, %subreg.sub_32
+
+  bb.1:
+    successors: %bb.2(0x04000000), %bb.1(0x7c000000)
+
+    %4:gpr64sp = PHI %3, %bb.0, %5, %bb.1
+    %6:fpr64 = PHI %1, %bb.0, %7, %bb.1
+    %8:fpr64 = PHI %1, %bb.0, %6, %bb.1
+    %9:fpr64 = nofpexcept FADDDrr %8, %1, implicit $fpcr
+    %10:fpr64 = nofpexcept FADDDrr killed %9, %6, implicit $fpcr
+    STRDui killed %10, %0, 0
+    %11:gpr64 = nsw SUBSXri %4, 1, 0, implicit-def $nzcv
+    %5:gpr64all = COPY %11
+    %7:fpr64 = FMOVDi 112
+    Bcc 1, %bb.1, implicit $nzcv
+    B %bb.2
+
+  bb.2:
+    RET_ReallyLR
+
+...

@huaatian
Copy link
Contributor Author

huaatian commented Mar 26, 2025

@arsenm Do you think it is appropriate to not calculate the register's live interval here, but instead delay the calculation until getInterval() is called, as suggested by @ytmukai
#132677 (comment)

@ytmukai
Copy link
Contributor

ytmukai commented Mar 28, 2025

I added the following code to test the behavior when there is no interval.

 void SwingSchedulerDAG::registerPressureFilter(NodeSetType &NodeSets) {
+  for (unsigned Reg = 0; Reg < MRI.getNumVirtRegs(); ++Reg) {
+    Register VReg = Register::index2VirtReg(Reg);
+    if (LIS.hasInterval(VReg))
+      LIS.removeInterval(VReg);
+  }
   for (auto &NS : NodeSets) {

As a result, there was no change in the nodes where ExceedPressure is set. When dumping LIS after registerPressureFilter(), only the registers defined in the target block were recalculated, so it seems there is no influence from other loops.

@huaatian
Copy link
Contributor Author

huaatian commented Apr 2, 2025

@arsenm Do you think it is appropriate to not calculate the register's live interval here, but instead delay the calculation until getInterval() is called, as suggested by @ytmukai#132677 (comment)

We are very much looking forward to your suggestions. Thank you! @arsenm

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferring the computation is fine, as long as the verifier is happy

@ytmukai
Copy link
Contributor

ytmukai commented Apr 2, 2025

Strictly speaking, it might be necessary to fix and restore the following code. The intention of this code is to invalidate the interval because the reference to an existing register was changed. However, the existing register is not ToReg but FromReg. Additionally, the invalidation should be done using removeInterval().
As mentioned above, within the scope currently used by MachinePipeliner, the interval of this register will not be referenced further.

if (!LIS.hasInterval(ToReg))
LIS.createEmptyInterval(ToReg);

@huaatian
Copy link
Contributor Author

huaatian commented Apr 2, 2025

Strictly speaking, it might be necessary to fix and restore the following code. The intention of this code is to invalidate the interval because the reference to an existing register was changed. However, the existing register is not ToReg but FromReg. Additionally, the invalidation should be done using removeInterval(). As mentioned above, within the scope currently used by MachinePipeliner, the interval of this register will not be referenced further.

@ytmukai From the code, it seems that an empty interval is being created for ToReg, which has caused some issues. I think we can remove the interval for FromReg, but it might be more appropriate to implement this in a new PR.

@huaatian huaatian merged commit 7e65944 into llvm:main Apr 3, 2025
11 checks passed
kraj pushed a commit to kraj/llvm-project that referenced this pull request Apr 14, 2025
…er (llvm#132352)

Some new registers are reused when replacing some old ones in
certain use case of ModuloScheduleExpander. It is necessary to
avoid repeated interval calculations for these registers.

(cherry picked from commit 7e65944)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants