[PowerPC] Add phony subregisters to cover the high half of the VSX registers. #94628

stefanp-synopsys · 2024-06-06T15:15:33Z

On PowerPC there are 128 bit VSX registers. These registers are half overlapped with 64 bit floating point registers (FPR). The 64 bit half of the VXS register that does not overlap with the FPR does not overlap with any other register class. The FPR are the only subregisters of the VSX registers but they do not fully cover the 128 bit super register. This leads to incorrect lane masks being created.

This patch adds phony registers for the other half of the VSX registers in order to fully cover them and to make sure that the lane masks are not the same for the VSX and the floating point register.

llvmbot · 2024-06-06T15:16:08Z

@llvm/pr-subscribers-llvm-regalloc

@llvm/pr-subscribers-backend-powerpc

Author: Stefan Pintilie (stefanp-ibm)

Changes

On PowerPC we have VSX registers which overlap with floating point registers.
However, the floating point registers only overlap with half of each VSX
register while the other half is never used alone. This patch adds phony
registers for the other half of the VSX registers in order to fully cover them
and to make sure that the lane masks are not the same for the VSX and the
floating point register.

Note: This patch is still Work in Progress as there are a number of LIT
failures that need to be investigated.

Full diff: https://github.com/llvm/llvm-project/pull/94628.diff

4 Files Affected:

(modified) llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp (+85-1)
(modified) llvm/lib/Target/PowerPC/PPCRegisterInfo.td (+22-9)
(modified) llvm/test/CodeGen/PowerPC/frem.ll (-5)
(added) llvm/test/CodeGen/PowerPC/subreg-lanemasks.mir (+63)

diff --git a/llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp b/llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
index 9e8da59615df..a2d089f32cab 100644
--- a/llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
+++ b/llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
@@ -435,7 +435,91 @@ BitVector PPCRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
     }
   }
 
-  assert(checkAllSuperRegsMarked(Reserved));
+  // Mark phony regsiters for the VSR high bits as reserved so that they are
+  // not used.
+  Reserved.set(PPC::FH0);
+  Reserved.set(PPC::FH1);
+  Reserved.set(PPC::FH2);
+  Reserved.set(PPC::FH3);
+  Reserved.set(PPC::FH4);
+  Reserved.set(PPC::FH5);
+  Reserved.set(PPC::FH6);
+  Reserved.set(PPC::FH7);
+  Reserved.set(PPC::FH8);
+  Reserved.set(PPC::FH9);
+  Reserved.set(PPC::FH10);
+  Reserved.set(PPC::FH11);
+  Reserved.set(PPC::FH12);
+  Reserved.set(PPC::FH13);
+  Reserved.set(PPC::FH14);
+  Reserved.set(PPC::FH15);
+  Reserved.set(PPC::FH16);
+  Reserved.set(PPC::FH17);
+  Reserved.set(PPC::FH18);
+  Reserved.set(PPC::FH19);
+  Reserved.set(PPC::FH20);
+  Reserved.set(PPC::FH21);
+  Reserved.set(PPC::FH22);
+  Reserved.set(PPC::FH23);
+  Reserved.set(PPC::FH24);
+  Reserved.set(PPC::FH25);
+  Reserved.set(PPC::FH26);
+  Reserved.set(PPC::FH27);
+  Reserved.set(PPC::FH28);
+  Reserved.set(PPC::FH29);
+  Reserved.set(PPC::FH30);
+  Reserved.set(PPC::FH31);
+
+  Reserved.set(PPC::VFH0);
+  Reserved.set(PPC::VFH1);
+  Reserved.set(PPC::VFH2);
+  Reserved.set(PPC::VFH3);
+  Reserved.set(PPC::VFH4);
+  Reserved.set(PPC::VFH5);
+  Reserved.set(PPC::VFH6);
+  Reserved.set(PPC::VFH7);
+  Reserved.set(PPC::VFH8);
+  Reserved.set(PPC::VFH9);
+  Reserved.set(PPC::VFH10);
+  Reserved.set(PPC::VFH11);
+  Reserved.set(PPC::VFH12);
+  Reserved.set(PPC::VFH13);
+  Reserved.set(PPC::VFH14);
+  Reserved.set(PPC::VFH15);
+  Reserved.set(PPC::VFH16);
+  Reserved.set(PPC::VFH17);
+  Reserved.set(PPC::VFH18);
+  Reserved.set(PPC::VFH19);
+  Reserved.set(PPC::VFH20);
+  Reserved.set(PPC::VFH21);
+  Reserved.set(PPC::VFH22);
+  Reserved.set(PPC::VFH23);
+  Reserved.set(PPC::VFH24);
+  Reserved.set(PPC::VFH25);
+  Reserved.set(PPC::VFH26);
+  Reserved.set(PPC::VFH27);
+  Reserved.set(PPC::VFH28);
+  Reserved.set(PPC::VFH29);
+  Reserved.set(PPC::VFH30);
+  Reserved.set(PPC::VFH31);
+
+  assert(checkAllSuperRegsMarked(Reserved,
+			         {PPC::FH0,  PPC::FH1,  PPC::FH2,  PPC::FH3,
+				  PPC::FH4,  PPC::FH5,  PPC::FH6,  PPC::FH7,
+				  PPC::FH8,  PPC::FH9,  PPC::FH10, PPC::FH11,
+				  PPC::FH12, PPC::FH13, PPC::FH14, PPC::FH15, 
+				  PPC::FH16, PPC::FH17, PPC::FH18, PPC::FH19,
+				  PPC::FH20, PPC::FH21, PPC::FH22, PPC::FH23,
+				  PPC::FH24, PPC::FH25, PPC::FH26, PPC::FH27,
+				  PPC::FH28, PPC::FH29, PPC::FH30, PPC::FH31, 
+				  PPC::VFH0,  PPC::VFH1,  PPC::VFH2,  PPC::VFH3,
+				  PPC::VFH4,  PPC::VFH5,  PPC::VFH6,  PPC::VFH7,
+				  PPC::VFH8,  PPC::VFH9,  PPC::VFH10, PPC::VFH11,
+				  PPC::VFH12, PPC::VFH13, PPC::VFH14, PPC::VFH15, 
+				  PPC::VFH16, PPC::VFH17, PPC::VFH18, PPC::VFH19,
+				  PPC::VFH20, PPC::VFH21, PPC::VFH22, PPC::VFH23,
+				  PPC::VFH24, PPC::VFH25, PPC::VFH26, PPC::VFH27,
+				  PPC::VFH28, PPC::VFH29, PPC::VFH30, PPC::VFH31}));
   return Reserved;
 }
 
diff --git a/llvm/lib/Target/PowerPC/PPCRegisterInfo.td b/llvm/lib/Target/PowerPC/PPCRegisterInfo.td
index 8a37e40414ee..30e936a157e0 100644
--- a/llvm/lib/Target/PowerPC/PPCRegisterInfo.td
+++ b/llvm/lib/Target/PowerPC/PPCRegisterInfo.td
@@ -17,6 +17,7 @@ def sub_un : SubRegIndex<1, 3>;
 def sub_32 : SubRegIndex<32>;
 def sub_32_hi_phony : SubRegIndex<32,32>;
 def sub_64 : SubRegIndex<64>;
+def sub_64_hi_phony : SubRegIndex<64,64>;
 def sub_vsx0 : SubRegIndex<128>;
 def sub_vsx1 : SubRegIndex<128, 128>;
 def sub_gp8_x0 : SubRegIndex<64>;
@@ -77,19 +78,19 @@ class VF<bits<5> num, string n> : PPCReg<n> {
 }
 
 // VR - One of the 32 128-bit vector registers
-class VR<VF SubReg, string n> : PPCReg<n> {
+class VR<VF SubReg, VF SubRegH, string n> : PPCReg<n> {
   let HWEncoding{4-0} = SubReg.HWEncoding{4-0};
   let HWEncoding{5} = 0;
-  let SubRegs = [SubReg];
-  let SubRegIndices = [sub_64];
+  let SubRegs = [SubReg, SubRegH];
+  let SubRegIndices = [sub_64, sub_64_hi_phony];
 }
 
 // VSRL - One of the 32 128-bit VSX registers that overlap with the scalar
 // floating-point registers.
-class VSRL<FPR SubReg, string n> : PPCReg<n> {
+class VSRL<FPR SubReg, FPR SubRegH, string n> : PPCReg<n> {
   let HWEncoding = SubReg.HWEncoding;
-  let SubRegs = [SubReg];
-  let SubRegIndices = [sub_64];
+  let SubRegs = [SubReg, SubRegH];
+  let SubRegIndices = [sub_64, sub_64_hi_phony];
 }
 
 // VSXReg - One of the VSX registers in the range vs32-vs63 with numbering
@@ -155,6 +156,12 @@ foreach Index = 0-31 in {
                 DwarfRegNum<[!add(Index, 32), !add(Index, 32)]>;
 }
 
+let isArtificial = 1 in {
+  foreach Index = 0-31 in {
+    def FH#Index : FPR<-1, "">;
+  }
+}
+
 // Floating-point pair registers
 foreach Index = { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 } in {
   def Fpair#Index : FPPair<"fp"#Index, Index>;
@@ -168,15 +175,21 @@ foreach Index = 0-31 in {
                  DwarfRegNum<[!add(Index, 77), !add(Index, 77)]>;
 }
 
+let isArtificial = 1 in {
+  foreach Index = 0-31 in {
+    def VFH#Index : VF<-1, "">;
+  }
+}
+
 // Vector registers
 foreach Index = 0-31 in {
-  def V#Index : VR<!cast<VF>("VF"#Index), "v"#Index>,
-                DwarfRegNum<[!add(Index, 77), !add(Index, 77)]>;
+  def V#Index : VR<!cast<VF>("VF"#Index), !cast<VF>("VFH"#Index), "v"#Index>,
+                DwarfRegNum<[!add(Index, 77)]>;
 }
 
 // VSX registers
 foreach Index = 0-31 in {
-  def VSL#Index : VSRL<!cast<FPR>("F"#Index), "vs"#Index>,
+  def VSL#Index : VSRL<!cast<FPR>("F"#Index), !cast<FPR>("FH"#Index), "vs"#Index>,
                   DwarfRegAlias<!cast<FPR>("F"#Index)>;
 }
 
diff --git a/llvm/test/CodeGen/PowerPC/frem.ll b/llvm/test/CodeGen/PowerPC/frem.ll
index 8cb68e60f7f9..19b4b1c9cdf9 100644
--- a/llvm/test/CodeGen/PowerPC/frem.ll
+++ b/llvm/test/CodeGen/PowerPC/frem.ll
@@ -70,7 +70,6 @@ define <4 x float> @frem4x32(<4 x float> %a, <4 x float> %b) {
 ; CHECK-NEXT:    xscvspdpn 2, 0
 ; CHECK-NEXT:    bl fmodf
 ; CHECK-NEXT:    nop
-; CHECK-NEXT:    # kill: def $f1 killed $f1 def $vsl1
 ; CHECK-NEXT:    xxmrghd 0, 1, 61
 ; CHECK-NEXT:    xscvspdpn 1, 62
 ; CHECK-NEXT:    xscvspdpn 2, 63
@@ -84,7 +83,6 @@ define <4 x float> @frem4x32(<4 x float> %a, <4 x float> %b) {
 ; CHECK-NEXT:    xscvspdpn 2, 0
 ; CHECK-NEXT:    bl fmodf
 ; CHECK-NEXT:    nop
-; CHECK-NEXT:    # kill: def $f1 killed $f1 def $vsl1
 ; CHECK-NEXT:    xxmrghd 0, 61, 1
 ; CHECK-NEXT:    lxv 63, 80(1) # 16-byte Folded Reload
 ; CHECK-NEXT:    lxv 62, 64(1) # 16-byte Folded Reload
@@ -124,11 +122,8 @@ define <2 x double> @frem2x64(<2 x double> %a, <2 x double> %b) {
 ; CHECK-NEXT:    xscpsgndp 61, 1, 1
 ; CHECK-NEXT:    xxswapd 1, 62
 ; CHECK-NEXT:    xxswapd 2, 63
-; CHECK-NEXT:    # kill: def $f1 killed $f1 killed $vsl1
-; CHECK-NEXT:    # kill: def $f2 killed $f2 killed $vsl2
 ; CHECK-NEXT:    bl fmod
 ; CHECK-NEXT:    nop
-; CHECK-NEXT:    # kill: def $f1 killed $f1 def $vsl1
 ; CHECK-NEXT:    xxmrghd 34, 61, 1
 ; CHECK-NEXT:    lxv 63, 64(1) # 16-byte Folded Reload
 ; CHECK-NEXT:    lxv 62, 48(1) # 16-byte Folded Reload
diff --git a/llvm/test/CodeGen/PowerPC/subreg-lanemasks.mir b/llvm/test/CodeGen/PowerPC/subreg-lanemasks.mir
new file mode 100644
index 000000000000..28a4d0347f10
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/subreg-lanemasks.mir
@@ -0,0 +1,63 @@
+# RUN: llc -mcpu=pwr10 -O3 -ppc-track-subreg-liveness -verify-machineinstrs \
+# RUN:   -mtriple=powerpc64le-unknown-linux-gnu -run-pass=greedy,virtregrewriter \
+# RUN:   -debug-only=regalloc -o - %s 2>&1 >/dev/null | FileCheck %s
+
+# Keep track of all of the lanemasks for various subregsiters.
+#
+# CHECK: %3 [80r,80d:0) 0@80r  L000000000000000C [80r,80d:0) 0@80r  weight:0.000000e+00
+# CHECK: %4 [96r,96d:0) 0@96r  L0000000000003000 [96r,96d:0) 0@96r  weight:0.000000e+00
+# CHECK: %5 [112r,112d:0) 0@112r  L000000000000000C [112r,112d:0) 0@112r  weight:0.000000e+00
+# CHECK: %6 [128r,128d:0) 0@128r  L0000000000003000 [128r,128d:0) 0@128r  weight:0.000000e+00
+# CHECK: %7 [144r,144d:0) 0@144r  L0000000000000004 [144r,144d:0) 0@144r  weight:0.000000e+00
+# CHECK: %8 [160r,160d:0) 0@160r  L0000000000001000 [160r,160d:0) 0@160r  weight:0.000000e+00
+# CHECK: %9 [176r,176d:0) 0@176r  L0000000000000004 [176r,176d:0) 0@176r  weight:0.000000e+00
+# CHECK: %10 [192r,192d:0) 0@192r  L0000000000001000 [192r,192d:0) 0@192r  weight:0.000000e+00
+# CHECK: %11 [208r,208d:0) 0@208r  L0000000000004000 [208r,208d:0) 0@208r  weight:0.000000e+00
+# CHECK: %12 [224r,224d:0) 0@224r  L0000000000010000 [224r,224d:0) 0@224r  weight:0.000000e+00
+# CHECK: %13 [240r,240d:0) 0@240r  L000000000000300C [240r,240d:0) 0@240r  weight:0.000000e+00
+# CHECK: %14 [256r,256d:0) 0@256r  L000000000003C000 [256r,256d:0) 0@256r  weight:0.000000e+00
+
+
+# CHECK:       0B bb.0
+# CHECK-NEXT:    liveins
+# CHECK-NEXT:  16B  %0:vsrc = COPY $v2
+# CHECK-NEXT:  32B  %float:fprc = COPY %0.sub_64:vsrc
+# CHECK-NEXT:  48B  dead undef %pair.sub_vsx0:vsrprc = COPY $v2
+# CHECK-NEXT:  64B  undef %15.sub_vsx1:vsrprc = COPY $v3
+# CHECK-NEXT:  80B  dead undef %3.sub_vsx0:vsrprc = COPY %0:vsrc
+# CHECK-NEXT:  96B  dead undef %4.sub_vsx1:vsrprc = COPY %0:vsrc
+# CHECK-NEXT:  112B  dead undef %5.sub_vsx0:accrc = COPY %0:vsrc
+# CHECK-NEXT:  128B  dead undef %6.sub_vsx1:accrc = COPY %0:vsrc
+# CHECK-NEXT:  144B  dead undef %7.sub_64:vsrprc = COPY %float:fprc
+# CHECK-NEXT:  160B  dead undef %8.sub_vsx1_then_sub_64:vsrprc = COPY %float:fprc
+# CHECK-NEXT:  176B  dead undef %9.sub_64:accrc = COPY %float:fprc
+# CHECK-NEXT:  192B  dead undef %10.sub_vsx1_then_sub_64:accrc = COPY %float:fprc
+# CHECK-NEXT:  208B  dead undef %11.sub_pair1_then_sub_64:accrc = COPY %float:fprc
+# CHECK-NEXT:  224B  dead undef %12.sub_pair1_then_sub_vsx1_then_sub_64:accrc = COPY %float:fprc
+# CHECK-NEXT:  240B  dead undef %13.sub_pair0:accrc = COPY %15:vsrprc
+# CHECK-NEXT:  256B  dead undef %14.sub_pair1:accrc = COPY %15:vsrprc
+
+
+---
+name:            test
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $v2, $v3
+    %0:vsrc = COPY $v2
+    %float:fprc = COPY %0.sub_64
+    undef %pair.sub_vsx0:vsrprc = COPY $v2
+    undef %pair.sub_vsx1:vsrprc = COPY $v3
+    undef %1.sub_vsx0:vsrprc = COPY %0
+    undef %2.sub_vsx1:vsrprc = COPY %0
+    undef %3.sub_vsx0:accrc = COPY %0
+    undef %4.sub_vsx1:accrc = COPY %0
+    undef %5.sub_64:vsrprc = COPY %float
+    undef %6.sub_vsx1_then_sub_64:vsrprc = COPY %float
+    undef %7.sub_64:accrc = COPY %float
+    undef %8.sub_vsx1_then_sub_64:accrc = COPY %float
+    undef %9.sub_pair1_then_sub_64:accrc = COPY %float
+    undef %10.sub_pair1_then_sub_vsx1_then_sub_64:accrc = COPY %float
+    undef %11.sub_pair0:accrc = COPY %pair
+    undef %12.sub_pair1:accrc = COPY %pair
+...

github-actions · 2024-06-06T15:18:40Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp

llvm/lib/Target/PowerPC/PPCRegisterInfo.td

arsenm · 2024-06-12T19:12:26Z

llvm/lib/CodeGen/RegisterCoalescer.cpp

+    LHSVals.pruneSubRegValues(LHS, ShrinkMask);
+    RHSVals.pruneSubRegValues(LHS, ShrinkMask);
+  } else if (TrackSubRegLiveness && !CP.getDstIdx() && CP.getSrcIdx()) {
+    LHS.createSubRangeFrom(LIS->getVNInfoAllocator(),
+                           CP.getNewRC()->getLaneMask(), LHS);
+    mergeSubRangeInto(LHS, RHS, TRI->getSubRegIndexLaneMask(CP.getSrcIdx()), CP,
+                      CP.getDstIdx());
+    LHSVals.pruneMainSegments(LHS, ShrinkMainRange);


Needs a new MIR test?

I have now added a test case with test/CodeGen/PowerPC/subreg-coalescer.mir

arsenm · 2024-06-12T19:13:01Z

llvm/test/CodeGen/PowerPC/subreg-lanemasks.mir

@@ -0,0 +1,63 @@
+# RUN: llc -mcpu=pwr10 -O3 -ppc-track-subreg-liveness -verify-machineinstrs \
+# RUN:   -mtriple=powerpc64le-unknown-linux-gnu -run-pass=greedy,virtregrewriter \
+# RUN:   -debug-only=regalloc -o - %s 2>&1 >/dev/null | FileCheck %s


Do you really need the debug output? Need REQUIRES: asserts if so.

I fixed this test in the patch that has gone in with #94363 .

arsenm · 2024-06-12T19:13:13Z

llvm/test/CodeGen/PowerPC/subreg-lanemasks.mir

@@ -0,0 +1,63 @@
+# RUN: llc -mcpu=pwr10 -O3 -ppc-track-subreg-liveness -verify-machineinstrs \


Use -verify-regalloc instead of -verify-machineinstrs. Also shouldn't need -O3

bzEq · 2024-06-13T00:16:41Z

llvm/lib/CodeGen/RegisterCoalescer.cpp

+                           CP.getNewRC()->getLaneMask(), LHS);
+    mergeSubRangeInto(LHS, RHS, TRI->getSubRegIndexLaneMask(CP.getSrcIdx()), CP,
+                      CP.getDstIdx());
+    LHSVals.pruneMainSegments(LHS, ShrinkMainRange);
    LHSVals.pruneSubRegValues(LHS, ShrinkMask);
    RHSVals.pruneSubRegValues(LHS, ShrinkMask);


Looks no need to prune RHS's subregvals, since RHS doesn't contain any subranges.

I'm not sure what you mean by RHS doesn't contain any subregs. I think I remember seeing subregs on both the RHS and LHS in IR. Taken from one of our tests for example:

%48:vsrprc = INSERT_SUBREG %45, killed %47, %subreg.sub_vsx0

This is probably not what you mean...

I'm not sure what you mean by RHS doesn't contain any subregs.

RHS as a LiveInterval doesn't contain any subranges(if it does, another branch prior this one has handled that). Looking into pruneSubRegValues, the method will iterate over LiveInterval's subranges and shrink range accordingly. Since RHS doesn't contain any subranges, so RHS should not be changed.

bzEq · 2024-06-13T00:19:49Z

llvm/lib/CodeGen/RegisterCoalescer.cpp

@@ -3672,6 +3672,14 @@ bool RegisterCoalescer::joinVirtRegs(CoalescerPair &CP) {
    // having stale segments.
    LHSVals.pruneMainSegments(LHS, ShrinkMainRange);

+    LHSVals.pruneSubRegValues(LHS, ShrinkMask);
+    RHSVals.pruneSubRegValues(LHS, ShrinkMask);
+  } else if (TrackSubRegLiveness && !CP.getDstIdx() && CP.getSrcIdx()) {


Add comment like "MachineVerifier expects subrange to be created if a register is partially defined when subregliveness is enabled"?

If CP has a src idx, I would expect the source value to have subranges when TrackSubRegLiveness is on.
Bottom line, I don't see why we need this case and thus echoing what @arsenm said wrt adding a test.

IIRC, it has something to do with SUBREG_TO_REG in which neither Src nor Dst has subranges.

SUBREG_TO_REG coalescing is pretty broken. It would be great if you could migrate to not using it at all. It's fundamentally flawed and I think should be removed

I'm not sure we can achieve it in short term. Grep a bit, PowerPC has the most lines of code involving SUBREG_TO_REG comparing to X86 and AArch64.

I'm not sure we can achieve it in short term. Grep a bit, PowerPC has the most lines of code involving SUBREG_TO_REG comparing to X86 and AArch64.

I agree we should be moving away from SUBREG_TO_REG however I also agree that for PowerPC this is more of a long term goal. For this patch we will probably have to continue like this.

Is it possible to split this change and test into its own PR

Is it possible to split this change and test into its own PR

The issue with splitting this test and change away is that the test passes anyway unless it is accompanied by the subregsiter changes in this PR. So, unfortunately, if I split out the test and add it to top of trunk main the test will PASS even without the changes to RegisterCoalescer.cpp. The error is exposed by the subregister changes that we made in PPCRegisterInfo.td.

That's fine, pre-committing passing tests is good

stefanp-synopsys

I think this patch is ready for another review. I will also remove the [WIP] from the title as it is no longer a work in progress.

stefanp-synopsys · 2024-06-20T14:41:50Z

llvm/lib/CodeGen/RegisterCoalescer.cpp

+    LHSVals.pruneSubRegValues(LHS, ShrinkMask);
+    RHSVals.pruneSubRegValues(LHS, ShrinkMask);
+  } else if (TrackSubRegLiveness && !CP.getDstIdx() && CP.getSrcIdx()) {
+    LHS.createSubRangeFrom(LIS->getVNInfoAllocator(),
+                           CP.getNewRC()->getLaneMask(), LHS);
+    mergeSubRangeInto(LHS, RHS, TRI->getSubRegIndexLaneMask(CP.getSrcIdx()), CP,
+                      CP.getDstIdx());
+    LHSVals.pruneMainSegments(LHS, ShrinkMainRange);


I have now added a test case with test/CodeGen/PowerPC/subreg-coalescer.mir

stefanp-synopsys · 2024-06-20T14:44:23Z

llvm/lib/CodeGen/RegisterCoalescer.cpp

@@ -3672,6 +3672,14 @@ bool RegisterCoalescer::joinVirtRegs(CoalescerPair &CP) {
    // having stale segments.
    LHSVals.pruneMainSegments(LHS, ShrinkMainRange);

+    LHSVals.pruneSubRegValues(LHS, ShrinkMask);
+    RHSVals.pruneSubRegValues(LHS, ShrinkMask);
+  } else if (TrackSubRegLiveness && !CP.getDstIdx() && CP.getSrcIdx()) {


I'm not sure we can achieve it in short term. Grep a bit, PowerPC has the most lines of code involving SUBREG_TO_REG comparing to X86 and AArch64.

I agree we should be moving away from SUBREG_TO_REG however I also agree that for PowerPC this is more of a long term goal. For this patch we will probably have to continue like this.

llvm/lib/Target/PowerPC/PPCRegisterInfo.td

stefanp-synopsys · 2024-06-20T14:50:34Z

llvm/lib/CodeGen/RegisterCoalescer.cpp

+                           CP.getNewRC()->getLaneMask(), LHS);
+    mergeSubRangeInto(LHS, RHS, TRI->getSubRegIndexLaneMask(CP.getSrcIdx()), CP,
+                      CP.getDstIdx());
+    LHSVals.pruneMainSegments(LHS, ShrinkMainRange);
    LHSVals.pruneSubRegValues(LHS, ShrinkMask);
    RHSVals.pruneSubRegValues(LHS, ShrinkMask);


I'm not sure what you mean by RHS doesn't contain any subregs. I think I remember seeing subregs on both the RHS and LHS in IR. Taken from one of our tests for example:

%48:vsrprc = INSERT_SUBREG %45, killed %47, %subreg.sub_vsx0

This is probably not what you mean...

bzEq · 2024-06-20T15:57:19Z

llvm/test/CodeGen/PowerPC/subreg-coalescer.mir

@@ -0,0 +1,27 @@
+# RUN: llc -mtriple powerpc64le-unknown-linux-gnu -mcpu=pwr8 -x mir < %s \


Can we pre-commit this test case? Without this PR, this case should fail verification?

Can we pre-commit this test case? Without this PR, this case should fail verification?

Well, the issue is that without the fix in RegisterCoalescer.cpp the test will fail. However, without this entire patch this test will still pass. We need the fix in the register classes for this test to start failing.

Ok, I see. The HasDisjunctSubRegs bit is false without this PR, so that MRI->shouldTrackSubRegLiveness(...) returns false.

stefanp-synopsys · 2024-06-27T02:47:35Z

I've split a patch from this one:
#96839

It contains two tests. One is for X86 which actually fails without the fix to the register coalescer and the other is the PowerPC test that passes without the PowerPC subregister changes.

The patch #96839 should go in before this one.

chenzheng1030

The change in the PPC part looks mostly good to me except one unexpected change for the dwarfregnum.

chenzheng1030 · 2024-06-20T01:52:51Z

llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp

@@ -434,7 +434,6 @@ BitVector PPCRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
      }
    }
  }
-


Nit: remove this unrelated change?

chenzheng1030 · 2024-06-27T05:57:25Z

llvm/lib/Target/PowerPC/PPCRegisterInfo.td

-  def V#Index : VR<!cast<VF>("VF"#Index), "v"#Index>,
-                DwarfRegNum<[!add(Index, 77), !add(Index, 77)]>;
+  def V#Index : VR<!cast<VF>("VF"#Index), !cast<VF>("VFH"#Index), "v"#Index>,
+                DwarfRegNum<[!add(Index, 77)]>;


Why we remove the second !add(Index, 77)? I think it is still needed for 32/64 bit?

chenzheng1030 · 2024-07-16T02:07:56Z

llvm/test/CodeGen/PowerPC/aix-vec_insert_elt.ll

@@ -750,25 +750,21 @@ entry:
 define <2 x double> @testDoubleImm1(<2 x double> %a, double %b) {
 ; CHECK-64-LABEL: testDoubleImm1:
 ; CHECK-64:       # %bb.0: # %entry
-; CHECK-64-NEXT:    # kill: def $f1 killed $f1 def $vsl1


this looks like degradation.

> renamable $f1 = COPY $f1, implicit-def $vsl1 Identity copy: renamable $f1 = COPY $f1, implicit-def $vsl1 replace by: renamable $f1 = KILL $f1, implicit-def $vsl1

The Kill pseudo opcode is inserted from this register rewrite.

Kill can tell the later pass that the f1/vsl1 is killed before instruction COPY. See VirtRegRewriter::handleIdentityCopy(). I don't think adding phony registers for the high part of vsl1 should break this part.

Kill can tell the later pass that the f1/vsl1 is killed before instruction COPY

I think f1/vsl1 is killed at the COPY, not before, so that this identity COPY is replaced by a KILL instruction. I'm wondering how we get rid of the identity COPY after adding phony registers.

I took a look at this and the copy has changed which is why the kill is no longer emitted.
Before the change:

renamable $f1 = COPY $f1, implicit-def $vsl1

After the change:

renamable $f1 = COPY $f1

Basically, we are missing the implicit-def. There is nothing wrong with not having the kill if the implicit-def is not there. However, I'm not sure if the implicit-def is necessary here or not.

A few more details here. This is as intended. As you mentioned, the kill is added by handleIdentityCopy() and is guarded by if (MI.getOperand(1).isUndef() || MI.getNumOperands() > 2) { . If we are missing the implicit-def we have exactly two operands and so we don't add the kill.

However, I'm not sure if the implicit-def is necessary here or not.

Maybe the implicit-def is still necessary at least for the case where the other part of the super register is not allocatable, i.e., once f1 is defined, the vsl1 should also be defined as the phony part can not be allocated.

If the phony lane isn't alive at the COPY, I think it's ok we don't have the implicit-def vsl1.

If the phony lane isn't alive at the COPY, I think it's ok we don't have the implicit-def vsl1.

I think that is what is happening here. I took a look at one of the smaller tests combine-fneg.ll to see and the MIR looks like this:

liveins: $f1, $v2, $v3 undef %3.sub_64:vsrc = COPY $f1 %4:vsrc = XXPERMDI killed %3:vsrc, %3:vsrc, 0 // This is the only use of %3

The only reason we use the VSRC register is because that is the type of register that XXPERMDI requires. The other lane is never used and the value in $f1 is just splatted into %4.

stefanp-synopsys · 2024-07-18T20:11:41Z

Note that this change goes on top of:
#96839

However, I can't seem to get git to show only the diff after that change. So, not compared to main but compared to [stefanp/RegCoalesceFix]. Therefore, this is the link to see that smaller diff:
https://github.com/llvm/llvm-project/pull/94628/files/6c707edffd1e368461cf50cc0068f2c29ece56d8..df2892d03bb5f7cde6f84bacf8ecc267c5dc9212

Once that patch goes in then I can rebase properly and compare against main.

arsenm · 2024-07-19T10:07:21Z

llvm/lib/Target/PowerPC/PPCRegisterInfo.td

@@ -155,6 +156,18 @@ foreach Index = 0-31 in {
                DwarfRegNum<[!add(Index, 32), !add(Index, 32)]>;
 }

+let isArtificial = 1 in {


This isArtificial field looks undocumented in Target.td, can you add some description to it?

Yes! I have added a comment.

I more meant it needs docs in Target.td, not just this one use

stefanp-synopsys · 2024-07-24T03:19:23Z

The patch that this depended on has gone in.
I have rebased this patch on top of it.

llvm/lib/Target/PowerPC/PPCRegisterInfo.td

lei137

I looked through all the tests that this patch updated and it seem to be doing what was expected so that part LGTM.

arsenm

I don't know anything about ppc but seems fine

bzEq

The solution and test change look good to me.

…SX registers. On PowerPC we have VSX registers which overlap with floating point registers. However, the floating point registers only overlap with half of each VSX register while the other half is never used alone. This patch adds phony registers for the other half of the VSX registers in order to fully cover them and to make sure that the lane masks are not the same for the VSX and the floating point register. Note: This patch is still Work in Progress as there are a number of LIT failures that need to be investigated.

…er of tests.

stefanp-synopsys · 2024-08-01T13:57:42Z

/cherry-pick 53c37f3

llvmbot · 2024-08-01T14:02:33Z

Failed to cherry-pick: 53c37f3

https://github.com/llvm/llvm-project/actions/runs/10199739617

Please manually backport the fix and push it to your github fork. Once this is done, please create a pull request

…gisters. (llvm#94628) On PowerPC there are 128 bit VSX registers. These registers are half overlapped with 64 bit floating point registers (FPR). The 64 bit half of the VXS register that does not overlap with the FPR does not overlap with any other register class. The FPR are the only subregisters of the VSX registers but they do not fully cover the 128 bit super register. This leads to incorrect lane masks being created. This patch adds phony registers for the other half of the VSX registers in order to fully cover them and to make sure that the lane masks are not the same for the VSX and the floating point register. (cherry picked from commit 53c37f3)

llvmbot added the backend:PowerPC label Jun 6, 2024

stefanp-synopsys requested review from chenzheng1030 and ecnelises June 6, 2024 15:18

stefanp-synopsys self-assigned this Jun 6, 2024

stefanp-synopsys requested review from MatzeB, lei137 and bzEq June 6, 2024 15:21

bzEq requested a review from kparzysz June 6, 2024 16:34

bzEq reviewed Jun 6, 2024

View reviewed changes

llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp Outdated Show resolved Hide resolved

llvm/lib/Target/PowerPC/PPCRegisterInfo.td Show resolved Hide resolved

llvm/lib/Target/PowerPC/PPCRegisterInfo.td Show resolved Hide resolved

llvmbot added the llvm:regalloc label Jun 12, 2024

arsenm reviewed Jun 12, 2024

View reviewed changes

bzEq reviewed Jun 13, 2024

View reviewed changes

bzEq requested a review from qcolombet June 13, 2024 00:22

stefanp-synopsys force-pushed the stefanp/FixForMissingSubregs branch 2 times, most recently from 6cb5de0 to 7740d7e Compare June 18, 2024 17:59

stefanp-synopsys commented Jun 20, 2024

View reviewed changes

stefanp-synopsys changed the title ~~[WIP][PowerPC] Add phony subregisters to cover the high half of the VSX registers.~~ [PowerPC] Add phony subregisters to cover the high half of the VSX registers. Jun 20, 2024

bzEq reviewed Jun 20, 2024

View reviewed changes

bzEq requested a review from arsenm June 22, 2024 05:07

chenzheng1030 reviewed Jun 27, 2024

View reviewed changes

stefanp-synopsys force-pushed the stefanp/FixForMissingSubregs branch from bf8be4b to eb2dc76 Compare July 15, 2024 19:48

chenzheng1030 reviewed Jul 16, 2024

View reviewed changes

stefanp-synopsys force-pushed the stefanp/FixForMissingSubregs branch from eb2dc76 to df2892d Compare July 18, 2024 20:02

arsenm reviewed Jul 19, 2024

View reviewed changes

stefanp-synopsys force-pushed the stefanp/FixForMissingSubregs branch from 7980fea to 6158697 Compare July 24, 2024 03:16

stefanp-synopsys force-pushed the stefanp/FixForMissingSubregs branch from 6158697 to eaff954 Compare July 25, 2024 15:32

arsenm reviewed Jul 25, 2024

View reviewed changes

llvm/lib/Target/PowerPC/PPCRegisterInfo.td Outdated Show resolved Hide resolved

lei137 reviewed Jul 26, 2024

View reviewed changes

arsenm approved these changes Jul 26, 2024

View reviewed changes

bzEq approved these changes Jul 27, 2024

View reviewed changes

stefanp-synopsys added 10 commits July 29, 2024 09:44

Add fix for RegisterCoalescer

5173f7f

Added the register class for the phony registers. Also updated a numb…

f2a2eec

…er of tests.

Updated remaining failing test cases.

bb20b4c

Fixed up one more test case that I missed.

b06ddd7

Added test subreg-coalescer.mir for the RegisterCoalescer.

deda75f

Fixed a whitespace issue and a missing DwarfRegNum.

e17c8f8

Added comment for isArtificial.

2b00e8f

Updated test case after rebase

fe33901

Fixed typo in comment.

c1ef2be

stefanp-synopsys force-pushed the stefanp/FixForMissingSubregs branch from 0b3478b to c1ef2be Compare July 29, 2024 15:12

stefanp-synopsys merged commit 53c37f3 into llvm:main Jul 29, 2024
4 of 6 checks passed

stefanp-synopsys added this to the LLVM 19.X Release milestone Aug 1, 2024

llvmbot added the release:cherry-pick-failed label Aug 1, 2024

mgabka mentioned this pull request Aug 22, 2024

Add release note about ABI mgabka/llvm-project#6

Open

thewtex mentioned this pull request Feb 10, 2025

llvmorg 19.1.5 libcxxabi pthread lib name #126605

Closed

		@@ -0,0 +1,63 @@
		# RUN: llc -mcpu=pwr10 -O3 -ppc-track-subreg-liveness -verify-machineinstrs \

		@@ -0,0 +1,27 @@
		# RUN: llc -mtriple powerpc64le-unknown-linux-gnu -mcpu=pwr8 -x mir < %s \

[PowerPC] Add phony subregisters to cover the high half of the VSX registers. #94628

[PowerPC] Add phony subregisters to cover the high half of the VSX registers. #94628

Uh oh!

Conversation

stefanp-synopsys commented Jun 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jun 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Jun 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stefanp-synopsys left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stefanp-synopsys commented Jun 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chenzheng1030 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stefanp-synopsys commented Jun 6, 2024 •

edited

Loading

llvmbot commented Jun 6, 2024 •

edited

Loading

github-actions bot commented Jun 6, 2024 •

edited

Loading

stefanp-synopsys commented Jun 27, 2024 •

edited

Loading