[AArch64] Re-enable rematerialization for streaming-mode-changing functions. #83235


Merged: 3 commits, Feb 29, 2024

Conversation

sdesmalen-arm (Collaborator)

We can add implicit defs/uses of the 'VG' register to the instructions to prevent the register allocator from rematerializing values in between streaming-mode changes, as the def/use of VG will further nail down the ordering that comes out of ISel. This avoids the heavy-handed approach of preventing any kind of rematerialization.

While we could add 'VG' as a Use to all SVE instructions, we only really need to do this for instructions that are rematerializable, as the smstart/smstop instructions and pseudos act as scheduling barriers, which is sufficient to prevent other instructions from being scheduled in between the streaming-mode-changing call sequence. However, we may revisit this in the future.
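The def-use pinning this describes can be illustrated with a hand-written MIR sketch (illustrative only; the register names and mode-change sequence are simplified and not taken from the patch):

```mir
; Without an implicit use of VG, the register allocator may rematerialize
; the PTRUE below the smstop, where the streaming vector length differs:
;
;   renamable $p0 = PTRUE_B 31            ; computed with the streaming VG
;   MSRpstatesvcrImm1 0b011, 0b0, ...     ; smstop: now defs VG
;   ...                                   ; a rematerialized PTRUE_B here
;                                         ; would observe the wrong VG
;
; With this patch, PTRUE_B carries an implicit use of VG and the
; smstart/smstop pseudos def/use VG, so rematerializing it across the
; mode change would violate the VG def-use chain and is rejected:
;
;   renamable $p0 = PTRUE_B 31, implicit $vg
```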

@llvmbot (Member) commented Feb 28, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Sander de Smalen (sdesmalen-arm)

Changes



Patch is 55.00 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/83235.diff

23 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (+1-28)
  • (modified) llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp (+3)
  • (modified) llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td (+2)
  • (modified) llvm/lib/Target/AArch64/SMEInstrFormats.td (+2)
  • (modified) llvm/lib/Target/AArch64/SVEInstrFormats.td (+10)
  • (modified) llvm/test/CodeGen/AArch64/debug-info-sve-dbg-declare.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/live-debugvalues-sve.mir (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/sve-localstackalloc.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/sve-pfalse-machine-cse.mir (+3-3)
  • (modified) llvm/test/CodeGen/AArch64/sve-pseudos-expand-undef.mir (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/sve-ptest-removal-cmpeq.mir (+5-5)
  • (modified) llvm/test/CodeGen/AArch64/sve-ptest-removal-whilege.mir (+12-12)
  • (modified) llvm/test/CodeGen/AArch64/sve-ptest-removal-whilegt.mir (+12-12)
  • (modified) llvm/test/CodeGen/AArch64/sve-ptest-removal-whilehi.mir (+12-12)
  • (modified) llvm/test/CodeGen/AArch64/sve-ptest-removal-whilehs.mir (+12-12)
  • (modified) llvm/test/CodeGen/AArch64/sve-ptest-removal-whilele.mir (+12-12)
  • (modified) llvm/test/CodeGen/AArch64/sve-ptest-removal-whilelo.mir (+12-12)
  • (modified) llvm/test/CodeGen/AArch64/sve-ptest-removal-whilels.mir (+12-12)
  • (modified) llvm/test/CodeGen/AArch64/sve-ptest-removal-whilelt.mir (+12-12)
  • (modified) llvm/test/CodeGen/AArch64/sve-ptest-removal-whilerw.mir (+8-8)
  • (modified) llvm/test/CodeGen/AArch64/sve-ptest-removal-whilewr.mir (+8-8)
  • (modified) llvm/test/CodeGen/AArch64/sve2p1_copy_pnr.mir (+6-6)
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index 39c96092f10319..f88bfd0778fb0b 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -9494,40 +9494,13 @@ bool AArch64InstrInfo::isReallyTriviallyReMaterializable(
     // Avoid rematerializing rematerializable instructions that use/define
     // scalable values, such as 'pfalse' or 'ptrue', which result in different
     // results when the runtime vector length is different.
-    const MachineRegisterInfo &MRI = MF.getRegInfo();
     const MachineFrameInfo &MFI = MF.getFrameInfo();
-    if (any_of(MI.operands(), [&MRI, &MFI](const MachineOperand &MO) {
+    if (any_of(MI.operands(), [&MFI](const MachineOperand &MO) {
           if (MO.isFI() &&
               MFI.getStackID(MO.getIndex()) == TargetStackID::ScalableVector)
             return true;
-          if (!MO.isReg())
-            return false;
-
-          if (MO.getReg().isVirtual()) {
-            const TargetRegisterClass *RC = MRI.getRegClass(MO.getReg());
-            return AArch64::ZPRRegClass.hasSubClassEq(RC) ||
-                   AArch64::PPRRegClass.hasSubClassEq(RC);
-          }
-          return AArch64::ZPRRegClass.contains(MO.getReg()) ||
-                 AArch64::PPRRegClass.contains(MO.getReg());
         }))
       return false;
-
-    // Avoid rematerializing instructions that return a value that is
-    // different depending on vector length, even when it is not returned
-    // in a scalable vector/predicate register.
-    switch (MI.getOpcode()) {
-    default:
-      break;
-    case AArch64::RDVLI_XI:
-    case AArch64::ADDVL_XXI:
-    case AArch64::ADDPL_XXI:
-    case AArch64::CNTB_XPiI:
-    case AArch64::CNTH_XPiI:
-    case AArch64::CNTW_XPiI:
-    case AArch64::CNTD_XPiI:
-      return false;
-    }
   }
 
   return TargetInstrInfo::isReallyTriviallyReMaterializable(MI);
diff --git a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
index b919c116445c8b..531f21f9c043a2 100644
--- a/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp
@@ -443,6 +443,9 @@ AArch64RegisterInfo::getStrictlyReservedRegs(const MachineFunction &MF) const {
       Reserved.set(SubReg);
   }
 
+  // VG cannot be allocated
+  Reserved.set(AArch64::VG);
+
   if (MF.getSubtarget<AArch64Subtarget>().hasSME2()) {
     for (MCSubRegIterator SubReg(AArch64::ZT0, this, /*self=*/true);
          SubReg.isValid(); ++SubReg)
diff --git a/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td b/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
index acf067f2cc5a9d..2907ba74ff8108 100644
--- a/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
@@ -233,6 +233,8 @@ def MSRpstatePseudo :
            (ins svcr_op:$pstatefield, timm0_1:$imm, GPR64:$rtpstate, timm0_1:$expected_pstate, variable_ops), []>,
     Sched<[WriteSys]> {
   let hasPostISelHook = 1;
+  let Uses = [VG];
+  let Defs = [VG];
 }
 
 def : Pat<(AArch64_smstart (i32 svcr_op:$pstate), (i64 GPR64:$rtpstate), (i64 timm0_1:$expected_pstate)),
diff --git a/llvm/lib/Target/AArch64/SMEInstrFormats.td b/llvm/lib/Target/AArch64/SMEInstrFormats.td
index 44d9a8ac7cb677..33cb5f9734b819 100644
--- a/llvm/lib/Target/AArch64/SMEInstrFormats.td
+++ b/llvm/lib/Target/AArch64/SMEInstrFormats.td
@@ -223,6 +223,8 @@ def MSRpstatesvcrImm1
   let Inst{8} = imm;
   let Inst{7-5} = 0b011; // op2
   let hasPostISelHook = 1;
+  let Uses = [VG];
+  let Defs = [VG];
 }
 
 def : InstAlias<"smstart",    (MSRpstatesvcrImm1 0b011, 0b1)>;
diff --git a/llvm/lib/Target/AArch64/SVEInstrFormats.td b/llvm/lib/Target/AArch64/SVEInstrFormats.td
index 789ec817d3d8b8..c8ca1832ec18dd 100644
--- a/llvm/lib/Target/AArch64/SVEInstrFormats.td
+++ b/llvm/lib/Target/AArch64/SVEInstrFormats.td
@@ -365,6 +365,7 @@ class sve_int_ptrue<bits<2> sz8_64, bits<3> opc, string asm, PPRRegOp pprty,
   let ElementSize = pprty.ElementSize;
   let hasSideEffects = 0;
   let isReMaterializable = 1;
+  let Uses = [VG];
 }
 
 multiclass sve_int_ptrue<bits<3> opc, string asm, SDPatternOperator op> {
@@ -755,6 +756,7 @@ class sve_int_pfalse<bits<6> opc, string asm>
 
   let hasSideEffects = 0;
   let isReMaterializable = 1;
+  let Uses = [VG];
 }
 
 multiclass sve_int_pfalse<bits<6> opc, string asm> {
@@ -1090,6 +1092,7 @@ class sve_int_count<bits<3> opc, string asm>
 
   let hasSideEffects = 0;
   let isReMaterializable = 1;
+  let Uses = [VG];
 }
 
 multiclass sve_int_count<bits<3> opc, string asm, SDPatternOperator op> {
@@ -1982,6 +1985,7 @@ class sve_int_dup_mask_imm<string asm>
   let DecoderMethod = "DecodeSVELogicalImmInstruction";
   let hasSideEffects = 0;
   let isReMaterializable = 1;
+  let Uses = [VG];
 }
 
 multiclass sve_int_dup_mask_imm<string asm> {
@@ -2862,6 +2866,7 @@ class sve_int_arith_vl<bit opc, string asm, bit streaming_sve = 0b0>
   let Inst{4-0}   = Rd;
 
   let hasSideEffects = 0;
+  let Uses = [VG];
 }
 
 class sve_int_read_vl_a<bit op, bits<5> opc2, string asm, bit streaming_sve = 0b0>
@@ -2882,6 +2887,7 @@ class sve_int_read_vl_a<bit op, bits<5> opc2, string asm, bit streaming_sve = 0b
 
   let hasSideEffects = 0;
   let isReMaterializable = 1;
+  let Uses = [VG];
 }
 
 //===----------------------------------------------------------------------===//
@@ -4699,6 +4705,7 @@ class sve_int_dup_imm<bits<2> sz8_64, string asm,
 
   let hasSideEffects = 0;
   let isReMaterializable = 1;
+  let Uses = [VG];
 }
 
 multiclass sve_int_dup_imm<string asm> {
@@ -4741,6 +4748,7 @@ class sve_int_dup_fpimm<bits<2> sz8_64, Operand fpimmtype,
 
   let hasSideEffects = 0;
   let isReMaterializable = 1;
+  let Uses = [VG];
 }
 
 multiclass sve_int_dup_fpimm<string asm> {
@@ -5657,6 +5665,7 @@ class sve_int_index_ii<bits<2> sz8_64, string asm, ZPRRegOp zprty,
 
   let hasSideEffects = 0;
   let isReMaterializable = 1;
+  let Uses = [VG];
 }
 
 multiclass sve_int_index_ii<string asm> {
@@ -9308,6 +9317,7 @@ class sve2p1_ptrue_pn<string mnemonic, bits<2> sz, PNRP8to15RegOp pnrty, SDPatte
 
   let hasSideEffects = 0;
   let isReMaterializable = 1;
+  let Uses = [VG];
 }
 
 
diff --git a/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-declare.mir b/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-declare.mir
index d44d45ea03b6bc..aca2816225e3ee 100644
--- a/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-declare.mir
+++ b/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-declare.mir
@@ -193,7 +193,7 @@ body:             |
     liveins: $z0, $z1, $p0, $p1, $w0
 
     renamable $p2 = COPY killed $p0
-    renamable $p0 = PTRUE_S 31
+    renamable $p0 = PTRUE_S 31, implicit $vg
     ST1W_IMM killed renamable $z0, renamable $p0, %stack.0.z0.addr, 0 :: (store unknown-size into %ir.z0.addr, align 16)
     ST1W_IMM killed renamable $z1, renamable $p0, %stack.1.z1.addr, 0 :: (store unknown-size into %ir.z1.addr, align 16)
     STR_PXI killed renamable $p2, %stack.2.p0.addr, 0 :: (store unknown-size into %ir.p0.addr, align 2)
diff --git a/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir b/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir
index 75917ef32ae2ad..0ea180b20730f2 100644
--- a/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir
+++ b/llvm/test/CodeGen/AArch64/debug-info-sve-dbg-value.mir
@@ -111,7 +111,7 @@ body:             |
     STRXui killed renamable $x1, %stack.1, 0, debug-location !8
     DBG_VALUE %stack.1, $noreg, !11, !DIExpression(DW_OP_constu, 16, DW_OP_plus, DW_OP_deref), debug-location !8
 
-    renamable $p2 = PTRUE_S 31, debug-location !DILocation(line: 4, column: 1, scope: !5)
+    renamable $p2 = PTRUE_S 31, implicit $vg, debug-location !DILocation(line: 4, column: 1, scope: !5)
     ST1W_IMM renamable $z0, renamable $p2, %stack.2, 0, debug-location !DILocation(line: 5, column: 1, scope: !5)
     DBG_VALUE %stack.2, $noreg, !12, !DIExpression(DW_OP_deref), debug-location !DILocation(line: 5, column: 1, scope: !5)
     ST1W_IMM renamable $z1, killed renamable $p2, %stack.3, 0, debug-location !DILocation(line: 6, column: 1, scope: !5)
diff --git a/llvm/test/CodeGen/AArch64/live-debugvalues-sve.mir b/llvm/test/CodeGen/AArch64/live-debugvalues-sve.mir
index 8903ca2b865b98..612453ab53f438 100644
--- a/llvm/test/CodeGen/AArch64/live-debugvalues-sve.mir
+++ b/llvm/test/CodeGen/AArch64/live-debugvalues-sve.mir
@@ -145,7 +145,7 @@ body:             |
     liveins: $z1
 
     ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp, debug-location !34
-    renamable $p0 = PTRUE_S 31, debug-location !34
+    renamable $p0 = PTRUE_S 31, implicit $vg, debug-location !34
     $x0 = ADDXri %stack.0, 0, 0, debug-location !34
     ST1W_IMM renamable $z1, killed renamable $p0, %stack.0, 0, debug-location !34 :: (store unknown-size into %stack.0, align 16)
     $z0 = COPY renamable $z1, debug-location !34
@@ -157,7 +157,7 @@ body:             |
     $z7 = COPY renamable $z1, debug-location !34
     BL @bar, csr_aarch64_sve_aapcs, implicit-def dead $lr, implicit $sp, implicit $z0, implicit $z1, implicit $z2, implicit $z3, implicit $z4, implicit $z5, implicit $z6, implicit $z7, implicit $x0, implicit-def $sp, implicit-def $z0, implicit-def $z1, debug-location !34
     ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp, debug-location !34
-    renamable $p0 = PTRUE_S 31, debug-location !34
+    renamable $p0 = PTRUE_S 31, implicit $vg, debug-location !34
     $z3 = IMPLICIT_DEF
     renamable $z1 = LD1W_IMM renamable $p0, %stack.0, 0, debug-location !34 :: (load unknown-size from %stack.0, align 16)
     ST1W_IMM renamable $z3, killed renamable $p0, %stack.0, 0 :: (store unknown-size into %stack.0, align 16)
diff --git a/llvm/test/CodeGen/AArch64/sve-localstackalloc.mir b/llvm/test/CodeGen/AArch64/sve-localstackalloc.mir
index 3fbb7889c8b79b..6063c8dfc792c9 100644
--- a/llvm/test/CodeGen/AArch64/sve-localstackalloc.mir
+++ b/llvm/test/CodeGen/AArch64/sve-localstackalloc.mir
@@ -48,7 +48,7 @@ body:             |
     %2:gpr32 = COPY $w0
     %1:zpr = COPY $z1
     %0:zpr = COPY $z0
-    %5:ppr_3b = PTRUE_B 31
+    %5:ppr_3b = PTRUE_B 31, implicit $vg
     %6:gpr64sp = ADDXri %stack.0, 0, 0
     ST1B_IMM %1, %5, %6, 1 :: (store unknown-size, align 16)
     ST1B_IMM %0, %5, %stack.0, 0 :: (store unknown-size into %stack.0, align 16)
diff --git a/llvm/test/CodeGen/AArch64/sve-pfalse-machine-cse.mir b/llvm/test/CodeGen/AArch64/sve-pfalse-machine-cse.mir
index b76fe7821b6c69..8395a7619fbb46 100644
--- a/llvm/test/CodeGen/AArch64/sve-pfalse-machine-cse.mir
+++ b/llvm/test/CodeGen/AArch64/sve-pfalse-machine-cse.mir
@@ -11,15 +11,15 @@ body:             |
     ; CHECK: liveins: $p0
     ; CHECK-NEXT: {{  $}}
     ; CHECK-NEXT: [[COPY:%[0-9]+]]:ppr = COPY $p0
-    ; CHECK-NEXT: [[PFALSE:%[0-9]+]]:ppr = PFALSE
+    ; CHECK-NEXT: [[PFALSE:%[0-9]+]]:ppr = PFALSE implicit $vg
     ; CHECK-NEXT: [[UZP1_PPP_B:%[0-9]+]]:ppr = UZP1_PPP_B [[COPY]], [[PFALSE]]
     ; CHECK-NEXT: [[UZP1_PPP_B1:%[0-9]+]]:ppr = UZP1_PPP_B killed [[UZP1_PPP_B]], [[PFALSE]]
     ; CHECK-NEXT: $p0 = COPY [[UZP1_PPP_B1]]
     ; CHECK-NEXT: RET_ReallyLR implicit $p0
     %0:ppr = COPY $p0
-    %2:ppr = PFALSE
+    %2:ppr = PFALSE implicit $vg
     %3:ppr = UZP1_PPP_B %0, %2
-    %4:ppr = PFALSE
+    %4:ppr = PFALSE implicit $vg
     %5:ppr = UZP1_PPP_B killed %3, %4
     $p0 = COPY %5
     RET_ReallyLR implicit $p0
diff --git a/llvm/test/CodeGen/AArch64/sve-pseudos-expand-undef.mir b/llvm/test/CodeGen/AArch64/sve-pseudos-expand-undef.mir
index df0e50de4d1a7a..ae70f91a4ec641 100644
--- a/llvm/test/CodeGen/AArch64/sve-pseudos-expand-undef.mir
+++ b/llvm/test/CodeGen/AArch64/sve-pseudos-expand-undef.mir
@@ -26,7 +26,7 @@ body:             |
 name: expand_mls_to_msb
 body:             |
   bb.0:
-    renamable $p0 = PTRUE_B 31
+    renamable $p0 = PTRUE_B 31, implicit $vg
     renamable $z0 = MLS_ZPZZZ_B_UNDEF killed renamable $p0, killed renamable $z2, killed renamable $z0, killed renamable $z1
     RET_ReallyLR implicit $z0
 ...
@@ -36,7 +36,7 @@ body:             |
 name: expand_mla_to_mad
 body:             |
   bb.0:
-    renamable $p0 = PTRUE_B 31
+    renamable $p0 = PTRUE_B 31, implicit $vg
     renamable $z0 = MLA_ZPZZZ_B_UNDEF killed renamable $p0, killed renamable $z2, killed renamable $z0, killed renamable $z1
     RET_ReallyLR implicit $z0
 ...
diff --git a/llvm/test/CodeGen/AArch64/sve-ptest-removal-cmpeq.mir b/llvm/test/CodeGen/AArch64/sve-ptest-removal-cmpeq.mir
index 81318aa5c2a58d..5169113697dc60 100644
--- a/llvm/test/CodeGen/AArch64/sve-ptest-removal-cmpeq.mir
+++ b/llvm/test/CodeGen/AArch64/sve-ptest-removal-cmpeq.mir
@@ -174,7 +174,7 @@ body:             |
     %1:zpr = COPY $z0
     %0:ppr_3b = COPY $p0
     %2:ppr = CMPEQ_PPzZI_B %0, %1, 0, implicit-def dead $nzcv
-    %3:ppr = PTRUE_B 31
+    %3:ppr = PTRUE_B 31, implicit $vg
     PTEST_PP killed %3, killed %2, implicit-def $nzcv
     %4:gpr32 = COPY $wzr
     %5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
@@ -409,14 +409,14 @@ body:             |
 
     ; CHECK-LABEL: name: cmpeq_imm_nxv16i8_ptest_not_all_active
     ; CHECK: %2:ppr = CMPEQ_PPzZI_B %0, %1, 0, implicit-def dead $nzcv
-    ; CHECK-NEXT: %3:ppr = PTRUE_B 0
+    ; CHECK-NEXT: %3:ppr = PTRUE_B 0, implicit $vg
     ; CHECK-NEXT: PTEST_PP killed %3, killed %2, implicit-def $nzcv
     ; CHECK-NEXT: %4:gpr32 = COPY $wzr
     ; CHECK-NEXT: %5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
     %1:zpr = COPY $z0
     %0:ppr_3b = COPY $p0
     %2:ppr = CMPEQ_PPzZI_B %0, %1, 0, implicit-def dead $nzcv
-    %3:ppr = PTRUE_B 0
+    %3:ppr = PTRUE_B 0, implicit $vg
     PTEST_PP killed %3, killed %2, implicit-def $nzcv
     %4:gpr32 = COPY $wzr
     %5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
@@ -446,14 +446,14 @@ body:             |
 
     ; CHECK-LABEL: name: cmpeq_imm_nxv16i8_ptest_of_halfs
     ; CHECK: %2:ppr = CMPEQ_PPzZI_B %0, %1, 0, implicit-def dead $nzcv
-    ; CHECK-NEXT: %3:ppr = PTRUE_H 31
+    ; CHECK-NEXT: %3:ppr = PTRUE_H 31, implicit $vg
     ; CHECK-NEXT: PTEST_PP killed %3, killed %2, implicit-def $nzcv
     ; CHECK-NEXT: %4:gpr32 = COPY $wzr
     ; CHECK-NEXT: %5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
     %1:zpr = COPY $z0
     %0:ppr_3b = COPY $p0
     %2:ppr = CMPEQ_PPzZI_B %0, %1, 0, implicit-def dead $nzcv
-    %3:ppr = PTRUE_H 31
+    %3:ppr = PTRUE_H 31, implicit $vg
     PTEST_PP killed %3, killed %2, implicit-def $nzcv
     %4:gpr32 = COPY $wzr
     %5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
diff --git a/llvm/test/CodeGen/AArch64/sve-ptest-removal-whilege.mir b/llvm/test/CodeGen/AArch64/sve-ptest-removal-whilege.mir
index 8f7467d99154e3..c1d9dfff73447c 100644
--- a/llvm/test/CodeGen/AArch64/sve-ptest-removal-whilege.mir
+++ b/llvm/test/CodeGen/AArch64/sve-ptest-removal-whilege.mir
@@ -30,7 +30,7 @@ body:             |
     ; CHECK-NEXT: %5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
     %1:gpr32 = COPY $w1
     %0:gpr32 = COPY $w0
-    %2:ppr = PTRUE_B 31
+    %2:ppr = PTRUE_B 31, implicit $vg
     %3:ppr = WHILEGE_PWW_B %0, %1, implicit-def dead $nzcv
     PTEST_PP killed %2, killed %3, implicit-def $nzcv
     %4:gpr32 = COPY $wzr
@@ -63,7 +63,7 @@ body:             |
     ; CHECK-NOT: PTEST
     %1:gpr64 = COPY $x1
     %0:gpr64 = COPY $x0
-    %2:ppr = PTRUE_B 31
+    %2:ppr = PTRUE_B 31, implicit $vg
     %3:ppr = WHILEGE_PXX_B %0, %1, implicit-def dead $nzcv
     PTEST_PP killed %2, killed %3, implicit-def $nzcv
     %4:gpr32 = COPY $wzr
@@ -98,7 +98,7 @@ body:             |
     ; CHECK-NOT: PTEST
     %1:gpr32 = COPY $w1
     %0:gpr32 = COPY $w0
-    %2:ppr = PTRUE_H 31
+    %2:ppr = PTRUE_H 31, implicit $vg
     %4:ppr = WHILEGE_PWW_H %0, %1, implicit-def dead $nzcv
     PTEST_PP %2, %4, implicit-def $nzcv
     %6:gpr32 = COPY $wzr
@@ -133,7 +133,7 @@ body:             |
     ; CHECK-NOT: PTEST
     %1:gpr64 = COPY $x1
     %0:gpr64 = COPY $x0
-    %2:ppr = PTRUE_H 31
+    %2:ppr = PTRUE_H 31, implicit $vg
     %4:ppr = WHILEGE_PXX_H %0, %1, implicit-def dead $nzcv
     PTEST_PP %2, %4, implicit-def $nzcv
     %6:gpr32 = COPY $wzr
@@ -168,7 +168,7 @@ body:             |
     ; CHECK-NOT: PTEST
     %1:gpr32 = COPY $w1
     %0:gpr32 = COPY $w0
-    %2:ppr = PTRUE_S 31
+    %2:ppr = PTRUE_S 31, implicit $vg
     %4:ppr = WHILEGE_PWW_S %0, %1, implicit-def dead $nzcv
     PTEST_PP %2, %4, implicit-def $nzcv
     %6:gpr32 = COPY $wzr
@@ -203,7 +203,7 @@ body:             |
     ; CHECK-NOT: PTEST
     %1:gpr64 = COPY $x1
     %0:gpr64 = COPY $x0
-    %2:ppr = PTRUE_S 31
+    %2:ppr = PTRUE_S 31, implicit $vg
     %4:ppr = WHILEGE_PXX_S %0, %1, implicit-def dead $nzcv
     PTEST_PP %2, %4, implicit-def $nzcv
     %6:gpr32 = COPY $wzr
@@ -238,7 +238,7 @@ body:             |
     ; CHECK-NOT: PTEST
     %1:gpr32 = COPY $w1
     %0:gpr32 = COPY $w0
-    %2:ppr = PTRUE_D 31
+    %2:ppr = PTRUE_D 31, implicit $vg
     %4:ppr = WHILEGE_PWW_D %0, %1, implicit-def dead $nzcv
     PTEST_PP %2, %4, implicit-def $nzcv
     %6:gpr32 = COPY $wzr
@@ -273,7 +273,7 @@ body:             |
     ; CHECK-NOT: PTEST
     %1:gpr64 = COPY $x1
     %0:gpr64 = COPY $x0
-    %2:ppr = PTRUE_D 31
+    %2:ppr = PTRUE_D 31, implicit $vg
     %4:ppr = WHILEGE_PXX_D %0, %1, implicit-def dead $nzcv
     PTEST_PP %2, %4, implicit-def $nzcv
     %6:gpr32 = COPY $wzr
@@ -313,7 +313,7 @@ body:             |
     ; CHECK-NEXT: %5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
     %1:gpr64 = COPY $x1
     %0:gpr64 = COPY $x0
-    %2:ppr = PTRUE_B 0
+    %2:ppr = PTRUE_B 0, implicit $vg
     %3:ppr = WHILEGE_PXX_B %0, %1, implicit-def dead $nzcv
     PTEST_PP killed %2, killed %3, implicit-def $nzcv
     %4:gpr32 = COPY $wzr
@@ -353,7 +353,7 @@ body:             |
     ; CHECK-NEXT: %5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
     %1:gpr64 = COPY $x1
     %0:gpr64 = COPY $x0
-    %2:ppr = PTRUE_H 31
+    %2:ppr = PTRUE_H 31, implicit $vg
     %3:ppr = WHILEGE_PXX_B %0, %1, implicit-def dead $nzcv
     PTEST_PP killed %2, killed %3, implicit-def $nzcv
     %4:gpr32 = COPY $wzr
@@ -393,7 +393,7 @@ body:             |
     ; CHECK-NEXT: %5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
     %1:gpr64 = COPY $x1
     %0:gpr64 = COPY $x0
-    %2:ppr = PTRUE_S 31
+    %2:ppr = PTRUE_S 31, implicit $vg
     %3:ppr = WHILEGE_PXX_B %0, %1, implicit-def dead $nzcv
     PTEST_PP killed %2, killed %3, implicit-def $nzcv
     %4:gpr32 = COPY $wzr
@@ -433,7 +433,7 @@ body:             |
     ; CHECK-NEXT: %5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
     %1:gpr64 = COPY $x1
     %0:gpr64 = COPY $x0
-    %2:ppr = PTRUE_D 31
+    %2:ppr = PTRUE_D 31, implicit $vg
     %3:ppr = WHILEGE_PXX_B %0, %1, implicit-def dead $nzcv
     PTEST_PP killed %2, killed %3, implicit-def $nzcv
     %4:gpr32 = COPY $wzr
diff --git a/llvm/test/CodeGen/AArch64/sve-ptest-removal-whilegt.mir b/llvm/test/CodeGen/AArch64/sve-ptest-removal-whilegt.mir
index 217d984560e36c..c6df21f85db773 100644
--- a/llvm/test/CodeGen/AArch64/sve-ptest-removal-whilegt.mir
+++ b/llvm/test/CodeGen/AArch64/sve-ptest-removal-whilegt.mir
@@ -30,7 +30,7 @@ body:             |
     ; CHECK-NEXT: %5:gpr32 = CSINCWr %4, $wzr, 0, implicit $nzcv
     %1:gpr32 = COPY $w1
     %0:gpr32 = COPY $w0
-    %2:ppr = PTRUE_B 31
+    %2:ppr = PTRUE_B 31, implicit $vg
     %3:ppr = WHILEGT_PWW_B %0, %1, implicit-def dead $nzcv
     PTEST_PP killed %2, killed %3, implicit-def $nzcv
     %4:gpr32 = COPY $wzr
@@ -63,7 +63,7 @@ body:             |
     ; CHECK-NOT: PTEST
     %1:gpr64 = COPY $x1
     %0:gpr64 = COPY $x0
-    %2:ppr = PTRUE_B 31
+    %2:ppr = PTRUE_B 31, implicit $vg
     %3:ppr = WHILEGT_PXX_B %0, %1, implicit-def dead $nzcv
     PTEST_PP killed %2, killed %3, implicit-def $nzcv
     %4:gpr32 = COPY $wzr
@@ -98,7 +98,7 @@ body:             |
     ; CHECK-NOT: PTEST
     %1:gpr32 = COPY $w1
     %0:gpr32 = COPY $w0
-    %2:ppr = PTRUE_H 31
+    %2:ppr = PTRUE_H 31, implicit $vg
     %4:ppr = WHILEGT_PWW_H %0, %1, implicit-def dead $nzcv
     PTEST_PP %2, %4, implicit-def $nzcv
     %6:gpr32 = COPY $wzr
@@ -133,7 +133,7 @@ body:             |
     ; CHECK-NOT: PTEST
     %1:gpr64 = COPY $x1
     %0:gpr64 = COPY $x0
-    %2:ppr = PTRUE_H 31
+    %2:ppr = PTRUE_H 31, implicit $vg
     %4:ppr = WHILEGT_PXX_H %0, %1, implicit-def dead $nzcv
     PTEST_PP %2, %4, implicit-def $nzcv
     %6:gpr32 = COPY $wzr
@@ -168,7 +168,7 @@ body:             |
     ; CHECK-NOT: PT...
[truncated]

paulwalker-arm (Collaborator)

The approach looks sensible to me. I do think we should roll out the VG change across the other instructions, or at least those that directly access VG like ADDVL, but I agree that's better done as follow-on work.

Comment on lines +446 to +447
// VG cannot be allocated
Reserved.set(AArch64::VG);
Collaborator

Should this be protected by hasSVEorSME? Or does the definition of MSRpstatePseudo preclude that?

Collaborator Author

I'm not sure it really matters. None of the tests fail so it seems valid to just mark it as reserved, even if it's not used when functions are not compiled for SVE or SME.

Comment on lines 9494 to 9496
// Avoid rematerializing rematerializable instructions that use/define
// scalable values, such as 'pfalse' or 'ptrue', which result in different
// results when the runtime vector length is different.
Collaborator

This comment likely needs updating?

Comment on lines 9498 to 9500
if (any_of(MI.operands(), [&MFI](const MachineOperand &MO) {
return MO.isFI() &&
MFI.getStackID(MO.getIndex()) == TargetStackID::ScalableVector;
Collaborator

Can you provide an example of a rematerialisable instruction that doesn't say it uses VG but has a scalable-vector frame index? I'm not saying there isn't one [1], but I'm wondering if there are specific cases that can be handled rather than having to walk every instruction's operands.

[1] Perhaps COPY FI:op?

Collaborator Author

The rematerialisable instruction that doesn't use VG (but would need it) is ADDXri, as it is used to materialise a frame address, which only happens after register allocation. I've updated the PR to handle those instructions specifically, rather than walking through all instructions and their operands.

The corresponding test is @call_to_non_streaming_pass_sve_objects in llvm/test/CodeGen/AArch64/sme-streaming-interface.ll.
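The ADDXri case can be sketched a bit more concretely (hand-written MIR, illustrative only — the stack slot, offsets, and registers are made up, not taken from the patch or the test):

```mir
; Before frame lowering, the VL dependence of the address is invisible:
%addr:gpr64sp = ADDXri %stack.0, 0, 0  ; %stack.0 has a ScalableVector stack ID
; After frame lowering this becomes roughly "base + n * VL", e.g.:
;   $x0 = ADDVL_XXI $fp, -2            ; reads VG
; so rematerializing the ADDXri across a streaming-mode change would
; compute the address with the wrong vector length. Adding 'implicit $vg'
; to such ADDXri instructions exposes that dependence to the allocator.
```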

...by moving the functionality to ADDXri, which is used for materializing
addresses to frame-objects, for example:

  gpr64:%dst = ADDXri fi:#0, 0

gets expanded to an actual address, by adding SP/FP + offset. If the
object is in the scalable vector region on the stack, then the offset will
be multiplied by 'mul VL', therefore reading VG.

This patch adds 'implicit $vg' to those instructions, which removes the
need for scanning all operands of all instructions for frame-indices.

I believe this is the only instruction where this is strictly required,
because other SVE instructions are either not rematerializable, or they
have an implicit use of VG.
sdesmalen-arm merged commit 5bd01ac into llvm:main on Feb 29, 2024