[X86] Complete AMD znver4 AVX512 zeroing idioms #108740

Merged
83 changes: 68 additions & 15 deletions llvm/lib/Target/X86/X86ScheduleZnver4.td
@@ -1839,35 +1839,59 @@ def Zn4WriteFZeroIdiom : SchedWriteVariant<[
 ]>;
 // NOTE: XORPSrr, XORPDrr are not zero-cycle!
 def : InstRW<[Zn4WriteFZeroIdiom], (instrs VXORPSrr, VXORPDrr,
-                                           VANDNPSrr, VANDNPDrr)>;
+                                           VXORPSZ128rr,
+                                           VXORPDZ128rr,
+                                           VANDNPSrr, VANDNPDrr,
+                                           VANDNPSZ128rr,
+                                           VANDNPDZ128rr)>;

 def Zn4WriteFZeroIdiomY : SchedWriteVariant<[
     SchedVar<MCSchedPredicate<ZeroIdiomPredicate>, [Zn4WriteZeroLatency]>,
     SchedVar<NoSchedPred, [WriteFLogicY]>
 ]>;
 def : InstRW<[Zn4WriteFZeroIdiomY], (instrs VXORPSYrr, VXORPDYrr,
-                                            VANDNPSYrr, VANDNPDYrr)>;
+                                            VXORPSZ256rr,
+                                            VXORPDZ256rr,
+                                            VANDNPSYrr, VANDNPDYrr,
+                                            VANDNPSZ256rr,
+                                            VANDNPDZ256rr)>;

+def Zn4WriteFZeroIdiomZ : SchedWriteVariant<[
+    SchedVar<MCSchedPredicate<ZeroIdiomPredicate>, [Zn4WriteZeroLatency]>,
+    SchedVar<NoSchedPred, [WriteFLogicZ]>
+]>;
+def : InstRW<[Zn4WriteFZeroIdiomZ], (instrs VXORPSZrr, VXORPDZrr,
+                                            VANDNPSZrr, VANDNPDZrr)>;
+
 def Zn4WriteVZeroIdiomLogicX : SchedWriteVariant<[
     SchedVar<MCSchedPredicate<ZeroIdiomPredicate>, [Zn4WriteZeroLatency]>,
     SchedVar<NoSchedPred, [WriteVecLogicX]>
 ]>;
 // NOTE: PXORrr,PANDNrr are not zero-cycle!
-def : InstRW<[Zn4WriteVZeroIdiomLogicX], (instrs VPXORrr, VPANDNrr)>;
+def : InstRW<[Zn4WriteVZeroIdiomLogicX], (instrs VPXORrr,
+                                                 VPXORDZ128rr,
+                                                 VPXORQZ128rr,
+                                                 VPANDNrr,
+                                                 VPANDNDZ128rr,
+                                                 VPANDNQZ128rr)>;

-// TODO: This should be extended to incorporate all of the AVX512 zeroing
-// idioms that can be executed by the renamer.
-def Zn4WriteVZeroIdiomLogicZ : SchedWriteVariant<[
+def Zn4WriteVZeroIdiomLogicY : SchedWriteVariant<[
     SchedVar<MCSchedPredicate<ZeroIdiomPredicate>, [Zn4WriteZeroLatency]>,
-    SchedVar<NoSchedPred, [WriteVecLogicZ]>
+    SchedVar<NoSchedPred, [WriteVecLogicY]>
 ]>;
-def : InstRW<[Zn4WriteVZeroIdiomLogicZ], (instrs VPXORDZrr)>;
+def : InstRW<[Zn4WriteVZeroIdiomLogicY], (instrs VPXORYrr,
+                                                 VPXORDZ256rr,
+                                                 VPXORQZ256rr,
+                                                 VPANDNYrr,
+                                                 VPANDNDZ256rr,
+                                                 VPANDNQZ256rr)>;

-def Zn4WriteVZeroIdiomLogicY : SchedWriteVariant<[
+def Zn4WriteVZeroIdiomLogicZ : SchedWriteVariant<[
     SchedVar<MCSchedPredicate<ZeroIdiomPredicate>, [Zn4WriteZeroLatency]>,
-    SchedVar<NoSchedPred, [WriteVecLogicY]>
+    SchedVar<NoSchedPred, [WriteVecLogicZ]>
 ]>;
-def : InstRW<[Zn4WriteVZeroIdiomLogicY], (instrs VPXORYrr, VPANDNYrr)>;
+def : InstRW<[Zn4WriteVZeroIdiomLogicZ], (instrs VPXORDZrr, VPXORQZrr,
+                                                 VPANDNDZrr, VPANDNQZrr)>;

 def Zn4WriteVZeroIdiomALUX : SchedWriteVariant<[
     SchedVar<MCSchedPredicate<ZeroIdiomPredicate>, [Zn4WriteZeroLatency]>,
@@ -1877,15 +1901,29 @@ def Zn4WriteVZeroIdiomALUX : SchedWriteVariant<[
 // PCMPGTBrr, PCMPGTWrr, PCMPGTDrr, PCMPGTQrr are not zero-cycle!
 def : InstRW<[Zn4WriteVZeroIdiomALUX],
              (instrs VPSUBBrr, VPSUBWrr, VPSUBDrr, VPSUBQrr,
-                     VPCMPGTBrr, VPCMPGTWrr, VPCMPGTDrr, VPCMPGTQrr)>;
+                     VPSUBBZ128rr, VPSUBWZ128rr, VPSUBDZ128rr, VPSUBQZ128rr,
+                     VPCMPGTBrr, VPCMPGTWrr, VPCMPGTDrr, VPCMPGTQrr,
+                     VPCMPGTBZ128rr, VPCMPGTWZ128rr,
+                     VPCMPGTDZ128rr, VPCMPGTQZ128rr)>;

Collaborator:

@ganeshgit Please can you confirm that the AVX512 VPCMPGT Z128/Z256/Z style compares (which write to a k-reg) are zero idioms? It says so in the SoG, but I'm concerned it's a cut-and-paste typo.

Contributor:

Yes, they are zero idioms. In AVX these would write a YMM register, and in AVX512 they write a K register. So, they are okay.


 def Zn4WriteVZeroIdiomALUY : SchedWriteVariant<[
     SchedVar<MCSchedPredicate<ZeroIdiomPredicate>, [Zn4WriteZeroLatency]>,
     SchedVar<NoSchedPred, [WriteVecALUY]>
 ]>;
 def : InstRW<[Zn4WriteVZeroIdiomALUY],
              (instrs VPSUBBYrr, VPSUBWYrr, VPSUBDYrr, VPSUBQYrr,
-                     VPCMPGTBYrr, VPCMPGTWYrr, VPCMPGTDYrr, VPCMPGTQYrr)>;
+                     VPSUBBZ256rr, VPSUBWZ256rr, VPSUBDZ256rr, VPSUBQZ256rr,
+                     VPCMPGTBYrr, VPCMPGTWYrr, VPCMPGTDYrr, VPCMPGTQYrr,
+                     VPCMPGTBZ256rr, VPCMPGTWZ256rr,
+                     VPCMPGTDZ256rr, VPCMPGTQZ256rr)>;

+def Zn4WriteVZeroIdiomALUZ : SchedWriteVariant<[
+    SchedVar<MCSchedPredicate<ZeroIdiomPredicate>, [Zn4WriteZeroLatency]>,
+    SchedVar<NoSchedPred, [WriteVecALUZ]>
+]>;
+def : InstRW<[Zn4WriteVZeroIdiomALUZ],
+             (instrs VPSUBBZrr, VPSUBWZrr, VPSUBDZrr, VPSUBQZrr,
+                     VPCMPGTBZrr, VPCMPGTWZrr, VPCMPGTDZrr, VPCMPGTQZrr)>;
+
 def : IsZeroIdiomFunction<[
   // GPR Zero-idioms.
@@ -1940,9 +1978,24 @@ def : IsZeroIdiomFunction<[
   ], ZeroIdiomPredicate>,

   // AVX ZMM Zero-idioms.
-  // TODO: This should be expanded to incorporate all AVX512 zeroing idioms.
   DepBreakingClass<[
-    VPXORDZrr
+    // fp variants.
+    VXORPSZrr, VXORPDZrr,
+    VXORPSZ128rr, VXORPDZ128rr, VXORPSZ256rr, VXORPDZ256rr,
+    VANDNPSZrr, VANDNPDZrr,
+    VANDNPSZ128rr, VANDNPDZ128rr, VANDNPSZ256rr, VANDNPDZ256rr,
+
+    // int variants.
+    VPCMPGTBZrr, VPCMPGTWZrr, VPCMPGTDZrr, VPCMPGTQZrr,
+    VPCMPGTBZ128rr, VPCMPGTWZ128rr, VPCMPGTDZ128rr, VPCMPGTQZ128rr,
+    VPCMPGTBZ256rr, VPCMPGTWZ256rr, VPCMPGTDZ256rr, VPCMPGTQZ256rr,
+    VPANDNDZrr, VPANDNQZrr,
+    VPANDNDZ128rr, VPANDNQZ128rr, VPANDNDZ256rr, VPANDNQZ256rr,
+    VPXORDZrr, VPXORQZrr,
+    VPXORDZ128rr, VPXORQZ128rr, VPXORDZ256rr, VPXORQZ256rr,
+    VPSUBBZrr, VPSUBWZrr, VPSUBDZrr, VPSUBQZrr,
+    VPSUBBZ128rr, VPSUBWZ128rr, VPSUBDZ128rr, VPSUBQZ128rr,
+    VPSUBBZ256rr, VPSUBWZ256rr, VPSUBDZ256rr, VPSUBQZ256rr,
   ], ZeroIdiomPredicate>,
 ]>;
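
For readers less familiar with `SchedWriteVariant`, the selection this change relies on can be modelled roughly as follows. This is an illustrative Python sketch, not LLVM code. The `Inst` class, the `FALLBACK_WRITE` table and the function names are hypothetical, and it assumes `ZeroIdiomPredicate` amounts to "operands 1 and 2 are the same register". The opcode-to-write pairs are copied from a few of the `InstRW` entries in the diff.

```python
from dataclasses import dataclass
from typing import Optional

ZERO_LATENCY = "Zn4WriteZeroLatency"

# Fallback write class per opcode, taken from a handful of the InstRW
# entries in the diff (hypothetical lookup table, not an LLVM structure).
FALLBACK_WRITE = {
    "VPXORDZ128rr": "WriteVecLogicX",
    "VPANDNQZ256rr": "WriteVecLogicY",
    "VPXORQZrr": "WriteVecLogicZ",
    "VXORPSZ256rr": "WriteFLogicY",
    "VPCMPGTDZ256rr": "WriteVecALUY",
}


@dataclass(frozen=True)
class Inst:
    opcode: str
    dst: str   # operand 0 (a k-register for the AVX512 compares)
    src1: str  # operand 1
    src2: str  # operand 2


def same_source_regs(inst: Inst) -> bool:
    # Assumption: the zero-idiom predicate only compares the two sources.
    return inst.src1 == inst.src2


def resolve_write(inst: Inst) -> Optional[str]:
    """Pick the zero-latency write for a zero idiom, else the fallback."""
    fallback = FALLBACK_WRITE.get(inst.opcode)
    if fallback is None:
        return None  # opcode is not covered by this sketch
    return ZERO_LATENCY if same_source_regs(inst) else fallback
```

Under these assumptions, `resolve_write(Inst("VPXORQZrr", "zmm0", "zmm1", "zmm1"))` selects `Zn4WriteZeroLatency`, while the same opcode with distinct sources falls back to `WriteVecLogicZ`. The destination operand plays no part in the choice. That fits the review reply that the k-register compares still qualify.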
