[AArch64] Re-enable rematerialization for streaming-mode-changing functions. #83235
Conversation
We can add implicit defs/uses of the 'VG' register to the instructions to prevent the register allocator from rematerializing values in between streaming-mode changes, as the def/use of VG will further nail down the ordering that comes out of ISel. This avoids the heavy-handed approach of preventing any kind of rematerialization. While we could add 'VG' as a Use to all SVE instructions, we only really need to do this for instructions that are rematerializable, as the smstart/smstop instructions and pseudos act as scheduling barriers, which is sufficient to prevent other instructions from being scheduled in between the streaming-mode-changing call sequence. However, we may revisit this in the future.
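To make the hazard concrete, here is a hedged source-level sketch (not taken from the patch): consume() is a hypothetical non-streaming callee, and the example assumes ACLE SME support (the __arm_streaming attribute and the arm_sve.h svcntb() intrinsic).

#include <stdint.h>
#include <arm_sve.h>

void consume(uint64_t x);        // hypothetical non-streaming callee

void f(void) __arm_streaming {
  uint64_t bytes = svcntb();     // vector length in bytes, read in streaming mode
  consume(bytes);                // the compiler brackets this call with smstop/smstart
  // If the register allocator rematerialized 'bytes' for the second use and
  // the CNTB landed between the smstop and the smstart, it would read the
  // non-streaming vector length and could produce a different value.
  consume(bytes);
}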
@llvm/pr-subscribers-backend-aarch64 Author: Sander de Smalen (sdesmalen-arm). Patch is 55.00 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/83235.diff 23 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index 39c96092f10319..f88bfd0778fb0b 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -9494,40 +9494,13 @@ bool AArch64InstrInfo::isReallyTriviallyReMaterializable(
// Avoid rematerializing rematerializable instructions that use/define
// scalable values, such as 'pfalse' or 'ptrue', which result in different
// results when the runtime vector length is different.
- const MachineRegisterInfo &MRI = MF.getRegInfo();
const MachineFrameInfo &MFI = MF.getFrameInfo();
- if (any_of(MI.operands(), [&MRI, &MFI](const MachineOperand &MO) {
+ if (any_of(MI.operands(), [&MFI](const MachineOperand &MO) {
if (MO.isFI() &&
MFI.getStackID(MO.getIndex()) == TargetStackID::ScalableVector)
return true;
- if (!MO.isReg())
- return false;
-
- if (MO.getReg().isVirtual()) {
- const TargetRegisterClass *RC = MRI.getRegClass(MO.getReg());
- return AArch64::ZPRRegClass.hasSubClassEq(RC) ||
- AArch64::PPRRegClass.hasSubClassEq(RC);
- }
- return AArch64::ZPRRegClass.contains(MO.getReg()) ||
- AArch64::PPRRegClass.contains(MO.getReg());
}))
return false;
-
- // Avoid rematerializing instructions that return a value that is
- // different depending on vector length, even when it is not returned
- // in a scalable vector/predicate register.
- switch (MI.getOpcode()) {
- default:
- break;
- case AArch64::RDVLI_XI:
- case AArch64::ADDVL_XXI:
- case AArch64::ADDPL_XXI:
- case AArch64::CNTB_XPiI:
- case AArch64::CNTH_XPiI:
- case AArch64::CNTW_XPiI:
- case AArch64::CNTD_XPiI:
- return false;
- }
}
return TargetInstrInfo::isReallyTriviallyReMaterializable(MI);
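For clarity, the net effect of this hunk, reconstructed from the +/- lines above (a sketch assuming the surrounding LLVM context), is that only scalable-vector stack objects still block rematerialization here; VL-sensitive opcodes are instead covered by their implicit use of VG:

// After the patch, the ZPR/PPR register-class checks and the opcode switch
// are gone; only frame indices into the scalable-vector region are checked.
const MachineFrameInfo &MFI = MF.getFrameInfo();
if (any_of(MI.operands(), [&MFI](const MachineOperand &MO) {
      return MO.isFI() &&
             MFI.getStackID(MO.getIndex()) == TargetStackID::ScalableVector;
    }))
  return false;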
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
index b919c116445c8b..531f21f9c043a2 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
@@ -443,6 +443,9 @@ AArch64RegisterInfo::getStrictlyReservedRegs(const MachineFunction &MF) const {
Reserved.set(SubReg);
}
+ // VG cannot be allocated
+ Reserved.set(AArch64::VG);
+
if (MF.getSubtarget<AArch64Subtarget>().hasSME2()) {
for (MCSubRegIterator SubReg(AArch64::ZT0, this, /*self=*/true);
SubReg.isValid(); ++SubReg)
diff --git a/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
index acf067f2cc5a9d..2907ba74ff8108 100644
--- a/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
@@ -233,6 +233,8 @@ def MSRpstatePseudo :
(ins svcr_op:$pstatefield, timm0_1:$imm, GPR64:$rtpstate, timm0_1:$expected_pstate, variable_ops), []>,
Sched<[WriteSys]> {
let hasPostISelHook = 1;
+ let Uses = [VG];
+ let Defs = [VG];
}
def : Pat<(AArch64_smstart (i32 svcr_op:$pstate), (i64 GPR64:$rtpstate), (i64 timm0_1:$expected_pstate)),
diff --git a/llvm/lib/Target/AArch64/SMEInstrFormats.td b/llvm/lib/Target/AArch64/SMEInstrFormats.td
index 44d9a8ac7cb677..33cb5f9734b819 100644
--- a/llvm/lib/Target/AArch64/SMEInstrFormats.td
+++ b/llvm/lib/Target/AArch64/SMEInstrFormats.td
@@ -223,6 +223,8 @@ def MSRpstatesvcrImm1
let Inst{8} = imm;
let Inst{7-5} = 0b011; // op2
let hasPostISelHook = 1;
+ let Uses = [VG];
+ let Defs = [VG];
}
def : InstAlias<"smstart", (MSRpstatesvcrImm1 0b011, 0b1)>;
diff --git a/llvm/lib/Target/AArch64/SVEInstrFormats.td b/llvm/lib/Target/AArch64/SVEInstrFormats.td
index 789ec817d3d8b8..c8ca1832ec18dd 100644
--- a/llvm/lib/Target/AArch64/SVEInstrFormats.td
+++ b/llvm/lib/Target/AArch64/SVEInstrFormats.td
@@ -365,6 +365,7 @@ class sve_int_ptrue<bits<2> sz8_64, bits<3> opc, string asm, PPRRegOp pprty,
let ElementSize = pprty.ElementSize;
let hasSideEffects = 0;
let isReMaterializable = 1;
+ let Uses = [VG];
}
multiclass sve_int_ptrue<bits<3> opc, string asm, SDPatternOperator op> {
@@ -755,6 +756,7 @@ class sve_int_pfalse<bits<6> opc, string asm>
let hasSideEffects = 0;
let isReMaterializable = 1;
+ let Uses = [VG];
}
multiclass sve_int_pfalse<bits<6> opc, string asm> {
@@ -1090,6 +1092,7 @@ class sve_int_count<bits<3> opc, string asm>
let hasSideEffects = 0;
let isReMaterializable = 1;
+ let Uses = [VG];
}
multiclass sve_int_count<bits<3> opc, string asm, SDPatternOperator op> {
@@ -1982,6 +1985,7 @@ class sve_int_dup_mask_imm<string asm>
let DecoderMethod = "DecodeSVELogicalImmInstruction";
let hasSideEffects = 0;
let isReMaterializable = 1;
+ let Uses = [VG];
}
multiclass sve_int_dup_mask_imm<string asm> {
@@ -2862,6 +2866,7 @@ class sve_int_arith_vl<bit opc, string asm, bit streaming_sve = 0b0>
let Inst{4-0} = Rd;
let hasSideEffects = 0;
+ let Uses = [VG];
}
class sve_int_read_vl_a<bit op, bits<5> opc2, string asm, bit streaming_sve = 0b0>
@@ -2882,6 +2887,7 @@ class sve_int_read_vl_a<bit op, bits<5> opc2, string asm, bit streaming_sve = 0b
let hasSideEffects = 0;
let isReMaterializable = 1;
+ let Uses = [VG];
}
//===----------------------------------------------------------------------===//
@@ -4699,6 +4705,7 @@ class sve_int_dup_imm<bits<2> sz8_64, string asm,
let hasSideEffects = 0;
let isReMaterializable = 1;
+ let Uses = [VG];
}
multiclass sve_int_dup_imm<string asm> {
@@ -4741,6 +4748,7 @@ class sve_int_dup_fpimm<bits<2> sz8_64, Operand fpimmtype,
let hasSideEffects = 0;
let isReMaterializable = 1;
+ let Uses = [VG];
}
multiclass sve_int_dup_fpimm<string asm> {
@@ -5657,6 +5665,7 @@ class sve_int_index_ii<bits<2> sz8_64, string asm, ZPRRegOp zprty,
let hasSideEffects = 0;
let isReMaterializable = 1;
+ let Uses = [VG];
}
multiclass sve_int_index_ii<string asm> {
@@ -9308,6 +9317,7 @@ class sve2p1_ptrue_pn<string mnemonic, bits<2> sz, PNRP8to15RegOp pnrty, SDPatte
let hasSideEffects = 0;
let isReMaterializable = 1;
+ let Uses = [VG];
}
diff --git a/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-declare.mir b/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-declare.mir
index d44d45ea03b6bc..aca2816225e3ee 100644
--- a/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-declare.mir
+++ b/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-declare.mir
@@ -193,7 +193,7 @@ body: |
liveins: $z0, $z1, $p0, $p1, $w0
renamable $p2 = COPY killed $p0
- renamable $p0 = PTRUE_S 31
+ renamable $p0 = PTRUE_S 31, implicit $vg
ST1W_IMM killed renamable $z0, renamable $p0, %stack.0.z0.addr, 0 :: (store unknown-size into %ir.z0.addr, align 16)
ST1W_IMM killed renamable $z1, renamable $p0, %stack.1.z1.addr, 0 :: (store unknown-size into %ir.z1.addr, align 16)
STR_PXI killed renamable $p2, %stack.2.p0.addr, 0 :: (store unknown-size into %ir.p0.addr, align 2)
diff --git a/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir b/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir
index 75917ef32ae2ad..0ea180b20730f2 100644
--- a/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir
+++ b/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir
@@ -111,7 +111,7 @@ body: |
STRXui killed renamable $x1, %stack.1, 0, debug-location !8
DBG_VALUE %stack.1, $noreg, !11, !DIExpression(DW_OP_constu, 16, DW_OP_plus, DW_OP_deref), debug-location !8
- renamable $p2 = PTRUE_S 31, debug-location !DILocation(line: 4, column: 1, scope: !5)
+ renamable $p2 = PTRUE_S 31, implicit $vg, debug-location !DILocation(line: 4, column: 1, scope: !5)
ST1W_IMM renamable $z0, renamable $p2, %stack.2, 0, debug-location !DILocation(line: 5, column: 1, scope: !5)
DBG_VALUE %stack.2, $noreg, !12, !DIExpression(DW_OP_deref), debug-location !DILocation(line: 5, column: 1, scope: !5)
ST1W_IMM renamable $z1, killed renamable $p2, %stack.3, 0, debug-location !DILocation(line: 6, column: 1, scope: !5)
diff --git a/llvm/test/CodeGen/AArch64/live-debugvalues-sve.mir b/llvm/test/CodeGen/AArch64/live-debugvalues-sve.mir
index 8903ca2b865b98..612453ab53f438 100644
--- a/llvm/test/CodeGen/AArch64/live-debugvalues-sve.mir
+++ b/llvm/test/CodeGen/AArch64/live-debugvalues-sve.mir
@@ -145,7 +145,7 @@ body: |
liveins: $z1
ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp, debug-location !34
- renamable $p0 = PTRUE_S 31, debug-location !34
+ renamable $p0 = PTRUE_S 31, implicit $vg, debug-location !34
$x0 = ADDXri %stack.0, 0, 0, debug-location !34
ST1W_IMM renamable $z1, killed renamable $p0, %stack.0, 0, debug-location !34 :: (store unknown-size into %stack.0, align 16)
$z0 = COPY renamable $z1, debug-location !34
@@ -157,7 +157,7 @@ body: |
$z7 = COPY renamable $z1, debug-location !34
BL @bar, csr_aarch64_sve_aapcs, implicit-def dead $lr, implicit $sp, implicit $z0, implicit $z1, implicit $z2, implicit $z3, implicit $z4, implicit $z5, implicit $z6, implicit $z7, implicit $x0, implicit-def $sp, implicit-def $z0, implicit-def $z1, debug-location !34
ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp, debug-location !34
- renamable $p0 = PTRUE_S 31, debug-location !34
+ renamable $p0 = PTRUE_S 31, implicit $vg, debug-location !34
$z3 = IMPLICIT_DEF
renamable $z1 = LD1W_IMM renamable $p0, %stack.0, 0, debug-location !34 :: (load unknown-size from %stack.0, align 16)
ST1W_IMM renamable $z3, killed renamable $p0, %stack.0, 0 :: (store unknown-size into %stack.0, align 16)
diff --git a/llvm/test/CodeGen/AArch64/sve-localstackalloc.mir b/llvm/test/CodeGen/AArch64/sve-localstackalloc.mir
index 3fbb7889c8b79b..6063c8dfc792c9 100644
--- a/llvm/test/CodeGen/AArch64/sve-localstackalloc.mir
+++ b/llvm/test/CodeGen/AArch64/sve-localstackalloc.mir
@@ -48,7 +48,7 @@ body: |
%2:gpr32 = COPY $w0
%1:zpr = COPY $z1
%0:zpr = COPY $z0
- %5:ppr_3b = PTRUE_B 31
+ %5:ppr_3b = PTRUE_B 31, implicit $vg
%6:gpr64sp = ADDXri %stack.0, 0, 0
ST1B_IMM %1, %5, %6, 1 :: (store unknown-size, align 16)
ST1B_IMM %0, %5, %stack.0, 0 :: (store unknown-size into %stack.0, align 16)
diff --git a/llvm/test/CodeGen/AArch64/sve-pfalse-machine-cse.mir b/llvm/test/CodeGen/AArch64/sve-pfalse-machine-cse.mir
index b76fe7821b6c69..8395a7619fbb46 100644
--- a/llvm/test/CodeGen/AArch64/sve-pfalse-machine-cse.mir
+++ b/llvm/test/CodeGen/AArch64/sve-pfalse-machine-cse.mir
@@ -11,15 +11,15 @@ body: |
; CHECK: liveins: $p0
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:ppr = COPY $p0
- ; CHECK-NEXT: [[PFALSE:%[0-9]+]]:ppr = PFALSE
+ ; CHECK-NEXT: [[PFALSE:%[0-9]+]]:ppr = PFALSE implicit $vg
; CHECK-NEXT: [[UZP1_PPP_B:%[0-9]+]]:ppr = UZP1_PPP_B [[COPY]], [[PFALSE]]
; CHECK-NEXT: [[UZP1_PPP_B1:%[0-9]+]]:ppr = UZP1_PPP_B killed [[UZP1_PPP_B]], [[PFALSE]]
; CHECK-NEXT: $p0 = COPY [[UZP1_PPP_B1]]
; CHECK-NEXT: RET_ReallyLR implicit $p0
%0:ppr = COPY $p0
- %2:ppr = PFALSE
+ %2:ppr = PFALSE implicit $vg
%3:ppr = UZP1_PPP_B %0, %2
- %4:ppr = PFALSE
+ %4:ppr = PFALSE implicit $vg
%5:ppr = UZP1_PPP_B killed %3, %4
$p0 = COPY %5
RET_ReallyLR implicit $p0
diff --git a/llvm/test/CodeGen/AArch64/sve-pseudos-expand-undef.mir b/llvm/test/CodeGen/AArch64/sve-pseudos-expand-undef.mir
index df0e50de4d1a7a..ae70f91a4ec641 100644
--- a/llvm/test/CodeGen/AArch64/sve-pseudos-expand-undef.mir
+++ b/llvm/test/CodeGen/AArch64/sve-pseudos-expand-undef.mir
@@ -26,7 +26,7 @@ body: |
name: expand_mls_to_msb
body: |
bb.0:
- renamable $p0 = PTRUE_B 31
+ renamable $p0 = PTRUE_B 31, implicit $vg
renamable $z0 = MLS_ZPZZZ_B_UNDEF killed renamable $p0, killed renamable $z2, killed renamable $z0, killed renamable $z1
RET_ReallyLR implicit $z0
...
@@ -36,7 +36,7 @@ body: |
name: expand_mla_to_mad
body: |
bb.0:
- renamable $p0 = PTRUE_B 31
+ renamable $p0 = PTRUE_B 31, implicit $vg
renamable $z0 = MLA_ZPZZZ_B_UNDEF killed renamable $p0, killed renamable $z2, killed renamable $z0, killed renamable $z1
RET_ReallyLR implicit $z0
...
diff --git a/llvm/test/CodeGen/AArch64/sve-ptest-removal-cmpeq.mir b/llvm/test/CodeGen/AArch64/sve-ptest-removal-cmpeq.mir
index 81318aa5c2a58d..5169113697dc60 100644
--- a/llvm/test/CodeGen/AArch64/sve-ptest-removal-cmpeq.mir
+++ b/llvm/test/CodeGen/AArch64/sve-ptest-removal-cmpeq.mir
@@ -174,7 +174,7 @@ body: |
%1:zpr = COPY $z0
%0:ppr_3b = COPY $p0
%2:ppr = CMPEQ_PPzZI_B %0, %1, 0, implicit-def dead $nzcv
- %3:ppr = PTRUE_B 31
+ %3:ppr = PTRUE_B 31, implicit $vg
PTEST_PP killed %3, killed %2, implicit-def $nzcv
%4:gpr32 = COPY $wzr
%5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
@@ -409,14 +409,14 @@ body: |
; CHECK-LABEL: name: cmpeq_imm_nxv16i8_ptest_not_all_active
; CHECK: %2:ppr = CMPEQ_PPzZI_B %0, %1, 0, implicit-def dead $nzcv
- ; CHECK-NEXT: %3:ppr = PTRUE_B 0
+ ; CHECK-NEXT: %3:ppr = PTRUE_B 0, implicit $vg
; CHECK-NEXT: PTEST_PP killed %3, killed %2, implicit-def $nzcv
; CHECK-NEXT: %4:gpr32 = COPY $wzr
; CHECK-NEXT: %5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
%1:zpr = COPY $z0
%0:ppr_3b = COPY $p0
%2:ppr = CMPEQ_PPzZI_B %0, %1, 0, implicit-def dead $nzcv
- %3:ppr = PTRUE_B 0
+ %3:ppr = PTRUE_B 0, implicit $vg
PTEST_PP killed %3, killed %2, implicit-def $nzcv
%4:gpr32 = COPY $wzr
%5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
@@ -446,14 +446,14 @@ body: |
; CHECK-LABEL: name: cmpeq_imm_nxv16i8_ptest_of_halfs
; CHECK: %2:ppr = CMPEQ_PPzZI_B %0, %1, 0, implicit-def dead $nzcv
- ; CHECK-NEXT: %3:ppr = PTRUE_H 31
+ ; CHECK-NEXT: %3:ppr = PTRUE_H 31, implicit $vg
; CHECK-NEXT: PTEST_PP killed %3, killed %2, implicit-def $nzcv
; CHECK-NEXT: %4:gpr32 = COPY $wzr
; CHECK-NEXT: %5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
%1:zpr = COPY $z0
%0:ppr_3b = COPY $p0
%2:ppr = CMPEQ_PPzZI_B %0, %1, 0, implicit-def dead $nzcv
- %3:ppr = PTRUE_H 31
+ %3:ppr = PTRUE_H 31, implicit $vg
PTEST_PP killed %3, killed %2, implicit-def $nzcv
%4:gpr32 = COPY $wzr
%5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
diff --git a/llvm/test/CodeGen/AArch64/sve-ptest-removal-whilege.mir b/llvm/test/CodeGen/AArch64/sve-ptest-removal-whilege.mir
index 8f7467d99154e3..c1d9dfff73447c 100644
--- a/llvm/test/CodeGen/AArch64/sve-ptest-removal-whilege.mir
+++ b/llvm/test/CodeGen/AArch64/sve-ptest-removal-whilege.mir
@@ -30,7 +30,7 @@ body: |
; CHECK-NEXT: %5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
%1:gpr32 = COPY $w1
%0:gpr32 = COPY $w0
- %2:ppr = PTRUE_B 31
+ %2:ppr = PTRUE_B 31, implicit $vg
%3:ppr = WHILEGE_PWW_B %0, %1, implicit-def dead $nzcv
PTEST_PP killed %2, killed %3, implicit-def $nzcv
%4:gpr32 = COPY $wzr
@@ -63,7 +63,7 @@ body: |
; CHECK-NOT: PTEST
%1:gpr64 = COPY $x1
%0:gpr64 = COPY $x0
- %2:ppr = PTRUE_B 31
+ %2:ppr = PTRUE_B 31, implicit $vg
%3:ppr = WHILEGE_PXX_B %0, %1, implicit-def dead $nzcv
PTEST_PP killed %2, killed %3, implicit-def $nzcv
%4:gpr32 = COPY $wzr
@@ -98,7 +98,7 @@ body: |
; CHECK-NOT: PTEST
%1:gpr32 = COPY $w1
%0:gpr32 = COPY $w0
- %2:ppr = PTRUE_H 31
+ %2:ppr = PTRUE_H 31, implicit $vg
%4:ppr = WHILEGE_PWW_H %0, %1, implicit-def dead $nzcv
PTEST_PP %2, %4, implicit-def $nzcv
%6:gpr32 = COPY $wzr
@@ -133,7 +133,7 @@ body: |
; CHECK-NOT: PTEST
%1:gpr64 = COPY $x1
%0:gpr64 = COPY $x0
- %2:ppr = PTRUE_H 31
+ %2:ppr = PTRUE_H 31, implicit $vg
%4:ppr = WHILEGE_PXX_H %0, %1, implicit-def dead $nzcv
PTEST_PP %2, %4, implicit-def $nzcv
%6:gpr32 = COPY $wzr
@@ -168,7 +168,7 @@ body: |
; CHECK-NOT: PTEST
%1:gpr32 = COPY $w1
%0:gpr32 = COPY $w0
- %2:ppr = PTRUE_S 31
+ %2:ppr = PTRUE_S 31, implicit $vg
%4:ppr = WHILEGE_PWW_S %0, %1, implicit-def dead $nzcv
PTEST_PP %2, %4, implicit-def $nzcv
%6:gpr32 = COPY $wzr
@@ -203,7 +203,7 @@ body: |
; CHECK-NOT: PTEST
%1:gpr64 = COPY $x1
%0:gpr64 = COPY $x0
- %2:ppr = PTRUE_S 31
+ %2:ppr = PTRUE_S 31, implicit $vg
%4:ppr = WHILEGE_PXX_S %0, %1, implicit-def dead $nzcv
PTEST_PP %2, %4, implicit-def $nzcv
%6:gpr32 = COPY $wzr
@@ -238,7 +238,7 @@ body: |
; CHECK-NOT: PTEST
%1:gpr32 = COPY $w1
%0:gpr32 = COPY $w0
- %2:ppr = PTRUE_D 31
+ %2:ppr = PTRUE_D 31, implicit $vg
%4:ppr = WHILEGE_PWW_D %0, %1, implicit-def dead $nzcv
PTEST_PP %2, %4, implicit-def $nzcv
%6:gpr32 = COPY $wzr
@@ -273,7 +273,7 @@ body: |
; CHECK-NOT: PTEST
%1:gpr64 = COPY $x1
%0:gpr64 = COPY $x0
- %2:ppr = PTRUE_D 31
+ %2:ppr = PTRUE_D 31, implicit $vg
%4:ppr = WHILEGE_PXX_D %0, %1, implicit-def dead $nzcv
PTEST_PP %2, %4, implicit-def $nzcv
%6:gpr32 = COPY $wzr
@@ -313,7 +313,7 @@ body: |
; CHECK-NEXT: %5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
%1:gpr64 = COPY $x1
%0:gpr64 = COPY $x0
- %2:ppr = PTRUE_B 0
+ %2:ppr = PTRUE_B 0, implicit $vg
%3:ppr = WHILEGE_PXX_B %0, %1, implicit-def dead $nzcv
PTEST_PP killed %2, killed %3, implicit-def $nzcv
%4:gpr32 = COPY $wzr
@@ -353,7 +353,7 @@ body: |
; CHECK-NEXT: %5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
%1:gpr64 = COPY $x1
%0:gpr64 = COPY $x0
- %2:ppr = PTRUE_H 31
+ %2:ppr = PTRUE_H 31, implicit $vg
%3:ppr = WHILEGE_PXX_B %0, %1, implicit-def dead $nzcv
PTEST_PP killed %2, killed %3, implicit-def $nzcv
%4:gpr32 = COPY $wzr
@@ -393,7 +393,7 @@ body: |
; CHECK-NEXT: %5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
%1:gpr64 = COPY $x1
%0:gpr64 = COPY $x0
- %2:ppr = PTRUE_S 31
+ %2:ppr = PTRUE_S 31, implicit $vg
%3:ppr = WHILEGE_PXX_B %0, %1, implicit-def dead $nzcv
PTEST_PP killed %2, killed %3, implicit-def $nzcv
%4:gpr32 = COPY $wzr
@@ -433,7 +433,7 @@ body: |
; CHECK-NEXT: %5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
%1:gpr64 = COPY $x1
%0:gpr64 = COPY $x0
- %2:ppr = PTRUE_D 31
+ %2:ppr = PTRUE_D 31, implicit $vg
%3:ppr = WHILEGE_PXX_B %0, %1, implicit-def dead $nzcv
PTEST_PP killed %2, killed %3, implicit-def $nzcv
%4:gpr32 = COPY $wzr
diff --git a/llvm/test/CodeGen/AArch64/sve-ptest-removal-whilegt.mir b/llvm/test/CodeGen/AArch64/sve-ptest-removal-whilegt.mir
index 217d984560e36c..c6df21f85db773 100644
--- a/llvm/test/CodeGen/AArch64/sve-ptest-removal-whilegt.mir
+++ b/llvm/test/CodeGen/AArch64/sve-ptest-removal-whilegt.mir
@@ -30,7 +30,7 @@ body: |
; CHECK-NEXT: %5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
%1:gpr32 = COPY $w1
%0:gpr32 = COPY $w0
- %2:ppr = PTRUE_B 31
+ %2:ppr = PTRUE_B 31, implicit $vg
%3:ppr = WHILEGT_PWW_B %0, %1, implicit-def dead $nzcv
PTEST_PP killed %2, killed %3, implicit-def $nzcv
%4:gpr32 = COPY $wzr
@@ -63,7 +63,7 @@ body: |
; CHECK-NOT: PTEST
%1:gpr64 = COPY $x1
%0:gpr64 = COPY $x0
- %2:ppr = PTRUE_B 31
+ %2:ppr = PTRUE_B 31, implicit $vg
%3:ppr = WHILEGT_PXX_B %0, %1, implicit-def dead $nzcv
PTEST_PP killed %2, killed %3, implicit-def $nzcv
%4:gpr32 = COPY $wzr
@@ -98,7 +98,7 @@ body: |
; CHECK-NOT: PTEST
%1:gpr32 = COPY $w1
%0:gpr32 = COPY $w0
- %2:ppr = PTRUE_H 31
+ %2:ppr = PTRUE_H 31, implicit $vg
%4:ppr = WHILEGT_PWW_H %0, %1, implicit-def dead $nzcv
PTEST_PP %2, %4, implicit-def $nzcv
%6:gpr32 = COPY $wzr
@@ -133,7 +133,7 @@ body: |
; CHECK-NOT: PTEST
%1:gpr64 = COPY $x1
%0:gpr64 = COPY $x0
- %2:ppr = PTRUE_H 31
+ %2:ppr = PTRUE_H 31, implicit $vg
%4:ppr = WHILEGT_PXX_H %0, %1, implicit-def dead $nzcv
PTEST_PP %2, %4, implicit-def $nzcv
%6:gpr32 = COPY $wzr
@@ -168,7 +168,7 @@ body: |
; CHECK-NOT: PT...
[truncated]
The approach looks sensible to me. I do think we should roll out the … [truncated]
// VG cannot be allocated
Reserved.set(AArch64::VG);
Should this be protected by hasSVEorSME? Or does the definition of MSRpstatePseudo preclude that?
I'm not sure it really matters. None of the tests fail, so it seems valid to just mark it as reserved, even if it's not used when functions are not compiled for SVE or SME.
// Avoid rematerializing rematerializable instructions that use/define
// scalable values, such as 'pfalse' or 'ptrue', which result in different
// results when the runtime vector length is different.
This comment likely needs updating?
if (any_of(MI.operands(), [&MFI](const MachineOperand &MO) {
      return MO.isFI() &&
             MFI.getStackID(MO.getIndex()) == TargetStackID::ScalableVector;
Can you provide an example of a rematerialisable instruction that doesn't say it uses VG but has a scalable vector frame index? I'm not saying there isn't one [1], but I'm wondering if there are specific cases that can be handled rather than having to walk every instruction's operands.
[1] Perhaps COPY FI:op?
The rematerialisable instruction that doesn't use VG (but would need it) is ADDXri, as this is used to materialise a frame address, which only happens after register allocation. I've updated the PR to handle those instructions specifically, rather than walking through all instructions and their operands.
The corresponding test is @call_to_non_streaming_pass_sve_objects in llvm/test/CodeGen/AArch64/sme-streaming-interface.ll.
...by moving the functionality to ADDXri, which is used for materializing addresses of frame objects; for example, gpr64:%dst = ADDXri fi:#0, 0 gets expanded to an actual address by adding SP/FP + offset. If the object is in the scalable-vector region on the stack, then the offset will be multiplied by 'mul VL', therefore reading VG. This patch adds 'implicit $vg' to those instructions, which removes the need for scanning all operands of all instructions for frame indices. I believe this is the only instruction where this is strictly required, because other SVE instructions are either not rematerializable or they have an implicit use of VG.
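As a hedged illustration of why such an expansion reads VG (not the actual frame-lowering code; the function name is made up): VG holds the vector length in 64-bit granules, so an offset expressed in 'mul VL' units depends on its runtime value.

#include <cstdint>

// Offset of a scalable stack object: N 'mul VL' units, where VL in bytes is
// VG * 8. This is the quantity an ADDVL-style expansion adds to SP/FP.
int64_t scalableRegionOffsetBytes(int64_t NumVLUnits, uint64_t VG) {
  int64_t VLBytes = static_cast<int64_t>(VG) * 8;
  return NumVLUnits * VLBytes;
}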