-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[llvm][CodeGen] avoid repeated interval calculation in window scheduler #132352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[llvm][CodeGen] avoid repeated interval calculation in window scheduler #132352
Conversation
Some new registers may be reused when replacing some old ones in certain use case of ModuloScheduleExpander. It is necessary to avoid repeated interval calculations for these registers.
@llvm/pr-subscribers-backend-aarch64 Author: Hua Tian (huaatian) ChangesSome new registers may be reused when replacing some old ones in Full diff: https://github.com/llvm/llvm-project/pull/132352.diff 3 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/ModuloSchedule.h b/llvm/include/llvm/CodeGen/ModuloSchedule.h
index b6000ba05d882..a1b949844371f 100644
--- a/llvm/include/llvm/CodeGen/ModuloSchedule.h
+++ b/llvm/include/llvm/CodeGen/ModuloSchedule.h
@@ -60,6 +60,7 @@
#ifndef LLVM_CODEGEN_MODULOSCHEDULE_H
#define LLVM_CODEGEN_MODULOSCHEDULE_H
+#include "llvm/ADT/SmallSet.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineLoopUtils.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
@@ -189,7 +190,7 @@ class ModuloScheduleExpander {
InstrChangesTy InstrChanges;
/// Record the registers that need to compute live intervals.
- SmallVector<Register> NoIntervalRegs;
+ SmallSet<Register, 8> NoIntervalRegs;
void generatePipelinedLoop();
void generateProlog(unsigned LastStage, MachineBasicBlock *KernelBB,
diff --git a/llvm/lib/CodeGen/ModuloSchedule.cpp b/llvm/lib/CodeGen/ModuloSchedule.cpp
index d208a62a99372..50fc0f437a5c5 100644
--- a/llvm/lib/CodeGen/ModuloSchedule.cpp
+++ b/llvm/lib/CodeGen/ModuloSchedule.cpp
@@ -551,7 +551,7 @@ void ModuloScheduleExpander::generateExistingPhis(
if (IsLast && np == NumPhis - 1) {
replaceRegUsesAfterLoop(Def, NewReg, BB, MRI);
- NoIntervalRegs.push_back(NewReg);
+ NoIntervalRegs.insert(NewReg);
}
continue;
}
@@ -594,7 +594,7 @@ void ModuloScheduleExpander::generateExistingPhis(
// epilog.
if (IsLast && np == NumPhis - 1) {
replaceRegUsesAfterLoop(Def, NewReg, BB, MRI);
- NoIntervalRegs.push_back(NewReg);
+ NoIntervalRegs.insert(NewReg);
}
// In the kernel, a dependent Phi uses the value from this Phi.
@@ -617,7 +617,7 @@ void ModuloScheduleExpander::generateExistingPhis(
auto It = CurStageMap.find(LoopVal);
if (It != CurStageMap.end()) {
replaceRegUsesAfterLoop(Def, It->second, BB, MRI);
- NoIntervalRegs.push_back(It->second);
+ NoIntervalRegs.insert(It->second);
}
}
}
@@ -740,7 +740,7 @@ void ModuloScheduleExpander::generatePhis(
}
if (IsLast && np == NumPhis - 1) {
replaceRegUsesAfterLoop(Def, NewReg, BB, MRI);
- NoIntervalRegs.push_back(NewReg);
+ NoIntervalRegs.insert(NewReg);
}
}
}
@@ -1083,7 +1083,7 @@ void ModuloScheduleExpander::updateInstruction(MachineInstr *NewMI,
VRMap[CurStageNum][reg] = NewReg;
if (LastDef) {
replaceRegUsesAfterLoop(reg, NewReg, BB, MRI);
- NoIntervalRegs.push_back(NewReg);
+ NoIntervalRegs.insert(NewReg);
}
} else if (MO.isUse()) {
MachineInstr *Def = MRI.getVRegDef(reg);
diff --git a/llvm/test/CodeGen/AArch64/aarch64-swp-ws-live-intervals.mir b/llvm/test/CodeGen/AArch64/aarch64-swp-ws-live-intervals.mir
new file mode 100644
index 0000000000000..4f0e52239b90c
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/aarch64-swp-ws-live-intervals.mir
@@ -0,0 +1,103 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc --mtriple=aarch64 %s -O2 -run-pass=pipeliner -o - | FileCheck %s
+
+...
+---
+name: foo
+tracksRegLiveness: true
+body: |
+ ; CHECK-LABEL: name: foo
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.3(0x80000000)
+ ; CHECK-NEXT: liveins: $x0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:gpr64common = COPY $x0
+ ; CHECK-NEXT: [[FMOVD0_:%[0-9]+]]:fpr64 = FMOVD0
+ ; CHECK-NEXT: [[MOVi32imm:%[0-9]+]]:gpr32 = MOVi32imm 1
+ ; CHECK-NEXT: [[SUBREG_TO_REG:%[0-9]+]]:gpr64sp = SUBREG_TO_REG 0, [[MOVi32imm]], %subreg.sub_32
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.3:
+ ; CHECK-NEXT: successors: %bb.4(0x40000000), %bb.7(0x40000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[FADDDrr:%[0-9]+]]:fpr64 = nofpexcept FADDDrr [[FMOVD0_]], [[FMOVD0_]], implicit $fpcr
+ ; CHECK-NEXT: [[SUBSXri:%[0-9]+]]:gpr64 = nsw SUBSXri [[SUBREG_TO_REG]], 1, 0, implicit-def $nzcv
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:gpr64sp = COPY [[SUBSXri]]
+ ; CHECK-NEXT: [[FMOVDi:%[0-9]+]]:fpr64 = FMOVDi 112
+ ; CHECK-NEXT: Bcc 0, %bb.7, implicit $nzcv
+ ; CHECK-NEXT: B %bb.4
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.4:
+ ; CHECK-NEXT: successors: %bb.5(0x80000000), %bb.6(0x00000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[FADDDrr1:%[0-9]+]]:fpr64 = nofpexcept FADDDrr [[FADDDrr]], [[FMOVD0_]], implicit $fpcr
+ ; CHECK-NEXT: [[FADDDrr2:%[0-9]+]]:fpr64 = nofpexcept FADDDrr [[FMOVD0_]], [[FMOVD0_]], implicit $fpcr
+ ; CHECK-NEXT: [[SUBSXri1:%[0-9]+]]:gpr64 = nsw SUBSXri [[COPY1]], 1, 0, implicit-def $nzcv
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:gpr64all = COPY [[SUBSXri1]]
+ ; CHECK-NEXT: [[FMOVDi1:%[0-9]+]]:fpr64 = FMOVDi 112
+ ; CHECK-NEXT: Bcc 0, %bb.6, implicit $nzcv
+ ; CHECK-NEXT: B %bb.5
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.5:
+ ; CHECK-NEXT: successors: %bb.6(0x04000000), %bb.5(0x7c000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[PHI:%[0-9]+]]:gpr64sp = PHI [[COPY2]], %bb.4, %24, %bb.5
+ ; CHECK-NEXT: [[PHI1:%[0-9]+]]:fpr64 = PHI [[FMOVDi1]], %bb.4, %25, %bb.5
+ ; CHECK-NEXT: [[PHI2:%[0-9]+]]:fpr64 = PHI [[FMOVDi]], %bb.4, [[PHI1]], %bb.5
+ ; CHECK-NEXT: [[PHI3:%[0-9]+]]:fpr64 = PHI [[FADDDrr2]], %bb.4, %22, %bb.5
+ ; CHECK-NEXT: [[PHI4:%[0-9]+]]:fpr64 = PHI [[FADDDrr1]], %bb.4, %23, %bb.5
+ ; CHECK-NEXT: [[SUBSXri2:%[0-9]+]]:gpr64 = nsw SUBSXri [[PHI]], 1, 0, implicit-def $nzcv
+ ; CHECK-NEXT: [[FADDDrr3:%[0-9]+]]:fpr64 = nofpexcept FADDDrr [[PHI2]], [[FMOVD0_]], implicit $fpcr
+ ; CHECK-NEXT: [[FADDDrr4:%[0-9]+]]:fpr64 = nofpexcept FADDDrr [[PHI3]], [[PHI2]], implicit $fpcr
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:gpr64all = COPY [[SUBSXri2]]
+ ; CHECK-NEXT: STRDui [[PHI4]], [[COPY]], 0
+ ; CHECK-NEXT: [[FMOVDi2:%[0-9]+]]:fpr64 = FMOVDi 112
+ ; CHECK-NEXT: Bcc 1, %bb.5, implicit $nzcv
+ ; CHECK-NEXT: B %bb.6
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.6:
+ ; CHECK-NEXT: successors: %bb.7(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[PHI5:%[0-9]+]]:fpr64 = PHI [[FMOVDi]], %bb.4, [[PHI1]], %bb.5
+ ; CHECK-NEXT: [[PHI6:%[0-9]+]]:fpr64 = PHI [[FADDDrr2]], %bb.4, [[FADDDrr3]], %bb.5
+ ; CHECK-NEXT: [[PHI7:%[0-9]+]]:fpr64 = PHI [[FADDDrr1]], %bb.4, [[FADDDrr4]], %bb.5
+ ; CHECK-NEXT: STRDui [[PHI7]], [[COPY]], 0
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.7:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[PHI8:%[0-9]+]]:fpr64 = PHI [[FMOVD0_]], %bb.3, [[PHI5]], %bb.6
+ ; CHECK-NEXT: [[PHI9:%[0-9]+]]:fpr64 = PHI [[FADDDrr]], %bb.3, [[PHI6]], %bb.6
+ ; CHECK-NEXT: [[FADDDrr5:%[0-9]+]]:fpr64 = nofpexcept FADDDrr [[PHI9]], [[PHI8]], implicit $fpcr
+ ; CHECK-NEXT: STRDui [[FADDDrr5]], [[COPY]], 0
+ ; CHECK-NEXT: B %bb.2
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: RET_ReallyLR
+ bb.0:
+ successors: %bb.1(0x80000000)
+ liveins: $x0
+
+ %0:gpr64common = COPY $x0
+ %1:fpr64 = FMOVD0
+ %2:gpr32 = MOVi32imm 1
+ %3:gpr64all = SUBREG_TO_REG 0, killed %2, %subreg.sub_32
+
+ bb.1:
+ successors: %bb.2(0x04000000), %bb.1(0x7c000000)
+
+ %4:gpr64sp = PHI %3, %bb.0, %5, %bb.1
+ %6:fpr64 = PHI %1, %bb.0, %7, %bb.1
+ %8:fpr64 = PHI %1, %bb.0, %6, %bb.1
+ %9:fpr64 = nofpexcept FADDDrr %8, %1, implicit $fpcr
+ %10:fpr64 = nofpexcept FADDDrr killed %9, %6, implicit $fpcr
+ STRDui killed %10, %0, 0
+ %11:gpr64 = nsw SUBSXri %4, 1, 0, implicit-def $nzcv
+ %5:gpr64all = COPY %11
+ %7:fpr64 = FMOVDi 112
+ Bcc 1, %bb.1, implicit $nzcv
+ B %bb.2
+
+ bb.2:
+ RET_ReallyLR
+
+...
|
@arsenm Do you think it is appropriate to not calculate the register's live interval here, but instead delay the calculation until getInterval() is called, as suggested by @ytmukai? |
I added the following code to test the behavior when there is no interval. void SwingSchedulerDAG::registerPressureFilter(NodeSetType &NodeSets) {
+ for (unsigned Reg = 0; Reg < MRI.getNumVirtRegs(); ++Reg) {
+ Register VReg = Register::index2VirtReg(Reg);
+ if (LIS.hasInterval(VReg))
+ LIS.removeInterval(VReg);
+ }
for (auto &NS : NodeSets) { As a result, there was no change in the nodes where |
We are very much looking forward to your suggestions. Thank you! @arsenm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deferring the computation is fine, as long as the verifier is happy
Strictly speaking, it might be necessary to fix and restore the following code. The intention of this code is to invalidate the interval because the reference to an existing register was changed. However, the existing register is not llvm-project/llvm/lib/CodeGen/ModuloSchedule.cpp Lines 352 to 353 in e0fee65
|
@ytmukai From the code, it seems that an empty interval is being created for ToReg, which has caused some issues. I think we can remove the interval for FromReg, but it might be more appropriate to implement this in a new PR. |
…er (llvm#132352) Some new registers are reused when replacing some old ones in certain use case of ModuloScheduleExpander. It is necessary to avoid repeated interval calculations for these registers. (cherry picked from commit 7e65944)
Some new registers are reused when replacing some old ones in
certain use case of ModuloScheduleExpander. It is necessary to
avoid repeated interval calculations for these registers.