Skip to content

[PowerPC] Add phony subregisters to cover the high half of the VSX registers. #94628

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
35 changes: 27 additions & 8 deletions llvm/lib/Target/PowerPC/PPCRegisterInfo.td
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ def sub_un : SubRegIndex<1, 3>;
def sub_32 : SubRegIndex<32>;
def sub_32_hi_phony : SubRegIndex<32,32>;
def sub_64 : SubRegIndex<64>;
def sub_64_hi_phony : SubRegIndex<64,64>;
def sub_vsx0 : SubRegIndex<128>;
def sub_vsx1 : SubRegIndex<128, 128>;
def sub_gp8_x0 : SubRegIndex<64>;
Expand Down Expand Up @@ -77,19 +78,19 @@ class VF<bits<5> num, string n> : PPCReg<n> {
}

// VR - One of the 32 128-bit vector registers
class VR<VF SubReg, string n> : PPCReg<n> {
class VR<VF SubReg, VF SubRegH, string n> : PPCReg<n> {
let HWEncoding{4-0} = SubReg.HWEncoding{4-0};
let HWEncoding{5} = 0;
let SubRegs = [SubReg];
let SubRegIndices = [sub_64];
let SubRegs = [SubReg, SubRegH];
let SubRegIndices = [sub_64, sub_64_hi_phony];
}

// VSRL - One of the 32 128-bit VSX registers that overlap with the scalar
// floating-point registers.
class VSRL<FPR SubReg, string n> : PPCReg<n> {
class VSRL<FPR SubReg, FPR SubRegH, string n> : PPCReg<n> {
let HWEncoding = SubReg.HWEncoding;
let SubRegs = [SubReg];
let SubRegIndices = [sub_64];
let SubRegs = [SubReg, SubRegH];
let SubRegIndices = [sub_64, sub_64_hi_phony];
}

// VSXReg - One of the VSX registers in the range vs32-vs63 with numbering
Expand Down Expand Up @@ -155,6 +156,22 @@ foreach Index = 0-31 in {
DwarfRegNum<[!add(Index, 32), !add(Index, 32)]>;
}

// The FH and VFH registers have been marked as Artifical because there are no
// instructions on PowerPC that use those register classes. They only exist
// in order to ensure that the super registers (V and VSL) are covered by their
// subregisters and have correct subregister lane masks.
let isArtificial = 1 in {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isArtificial field looks undocumented in Target.td, can you add some description to it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I have added a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I more meant it needs docs in Target.td, not just this one use

foreach Index = 0-31 in {
def FH#Index : FPR<-1, "">;
def VFH#Index : VF<-1, "">;
}
}

let isAllocatable = 0, CopyCost = -1 in {
def VFHRC : RegisterClass<"PPC", [f64], 64, (sequence "VFH%u", 0, 31)>;
def FHRC : RegisterClass<"PPC", [f64], 64, (sequence "FH%u", 0, 31)>;
}

// Floating-point pair registers
foreach Index = { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 } in {
def Fpair#Index : FPPair<"fp"#Index, Index>;
Expand All @@ -168,17 +185,19 @@ foreach Index = 0-31 in {
DwarfRegNum<[!add(Index, 77), !add(Index, 77)]>;
}

let CoveredBySubRegs = 1 in {
// Vector registers
foreach Index = 0-31 in {
def V#Index : VR<!cast<VF>("VF"#Index), "v"#Index>,
def V#Index : VR<!cast<VF>("VF"#Index), !cast<VF>("VFH"#Index), "v"#Index>,
DwarfRegNum<[!add(Index, 77), !add(Index, 77)]>;
}

// VSX registers
foreach Index = 0-31 in {
def VSL#Index : VSRL<!cast<FPR>("F"#Index), "vs"#Index>,
def VSL#Index : VSRL<!cast<FPR>("F"#Index), !cast<FPR>("FH"#Index), "vs"#Index>,
DwarfRegAlias<!cast<FPR>("F"#Index)>;
}
}

// Dummy VSX registers, this defines string: "vs32"-"vs63", and is only used for
// asm printing.
Expand Down
4 changes: 0 additions & 4 deletions llvm/test/CodeGen/PowerPC/aix-vec_insert_elt.ll
Original file line number Diff line number Diff line change
Expand Up @@ -750,25 +750,21 @@ entry:
define <2 x double> @testDoubleImm1(<2 x double> %a, double %b) {
; CHECK-64-LABEL: testDoubleImm1:
; CHECK-64: # %bb.0: # %entry
; CHECK-64-NEXT: # kill: def $f1 killed $f1 def $vsl1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like degradation.

> renamable $f1 = COPY $f1, implicit-def $vsl1
Identity copy: renamable $f1 = COPY $f1, implicit-def $vsl1
  replace by: renamable $f1 = KILL $f1, implicit-def $vsl1

The Kill pseudo opcode is inserted from this register rewrite.

Kill can tell the later pass that the f1/vsl1 is killed before instruction COPY. See VirtRegRewriter::handleIdentityCopy(). I don't think adding phony registers for the high part of vsl1 should break this part.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kill can tell the later pass that the f1/vsl1 is killed before instruction COPY

I think f1/vsl1 is killed at the COPY, not before, so that this identity COPY is replaced by a KILL instruction. I'm wondering how we get rid of the identity COPY after adding phony registers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look at this and the copy has changed which is why the kill is no longer emitted.
Before the change:

renamable $f1 = COPY $f1, implicit-def $vsl1

After the change:

renamable $f1 = COPY $f1

Basically, we are missing the implicit-def. There is nothing wrong with not having the kill if the implicit-def is not there. However, I'm not sure if the implicit-def is necessary here or not.

A few more details here. This is as intended. As you mentioned, the kill is added by handleIdentityCopy() and is guarded by if (MI.getOperand(1).isUndef() || MI.getNumOperands() > 2) { . If we are missing the implicit-def we have exactly two operands and so we don't add the kill.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I'm not sure if the implicit-def is necessary here or not.

Maybe the implicit-def is still necessary at least for the case where the other part of the super register is not allocatable, i.e., once f1 is defined, the vsl1 should also be defined as the phony part can not be allocated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the phony lane isn't alive at the COPY, I think it's ok we don't have the implicit-def vsl1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the phony lane isn't alive at the COPY, I think it's ok we don't have the implicit-def vsl1.

I think that is what is happening here. I took a look at one of the smaller tests combine-fneg.ll to see and the MIR looks like this:

  liveins: $f1, $v2, $v3                                                                                                                                                                                        
    undef %3.sub_64:vsrc = COPY $f1                                                                                    
    %4:vsrc = XXPERMDI killed %3:vsrc, %3:vsrc, 0   // This is the only use of %3

The only reason we use the VSRC register is because that is the type of register that XXPERMDI requires. The other lane is never used and the value in $f1 is just splatted into %4.

; CHECK-64-NEXT: xxpermdi 34, 1, 34, 1
; CHECK-64-NEXT: blr
;
; CHECK-32-LABEL: testDoubleImm1:
; CHECK-32: # %bb.0: # %entry
; CHECK-32-NEXT: # kill: def $f1 killed $f1 def $vsl1
; CHECK-32-NEXT: xxpermdi 34, 1, 34, 1
; CHECK-32-NEXT: blr
;
; CHECK-64-P10-LABEL: testDoubleImm1:
; CHECK-64-P10: # %bb.0: # %entry
; CHECK-64-P10-NEXT: # kill: def $f1 killed $f1 def $vsl1
; CHECK-64-P10-NEXT: xxpermdi 34, 1, 34, 1
; CHECK-64-P10-NEXT: blr
;
; CHECK-32-P10-LABEL: testDoubleImm1:
; CHECK-32-P10: # %bb.0: # %entry
; CHECK-32-P10-NEXT: # kill: def $f1 killed $f1 def $vsl1
; CHECK-32-P10-NEXT: xxpermdi 34, 1, 34, 1
; CHECK-32-P10-NEXT: blr
entry:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1099,7 +1099,6 @@ define double @getd1(<2 x double> %vd) {
; CHECK-LABEL: getd1:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: xxswapd 1, 34
; CHECK-NEXT: # kill: def $f1 killed $f1 killed $vsl1
; CHECK-NEXT: blr
entry:
%vecext = extractelement <2 x double> %vd, i32 1
Expand All @@ -1115,7 +1114,6 @@ define double @getveld(<2 x double> %vd, i32 signext %i) {
; CHECK-NEXT: lvsl 3, 0, 3
; CHECK-NEXT: vperm 2, 2, 2, 3
; CHECK-NEXT: xxlor 1, 34, 34
; CHECK-NEXT: # kill: def $f1 killed $f1 killed $vsl1
; CHECK-NEXT: blr
entry:
%vecext = extractelement <2 x double> %vd, i32 %i
Expand Down
Loading