-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[X86] Prevent APX NDD compression when it creates a partial write #132051
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[X86] Prevent APX NDD compression when it creates a partial write #132051
Conversation
APX NDD instructions may be compressed when the result is also a source. For 8/16b instructions, this may create partial register write hazards if a previous super-register def is within the partial reg update clearance, or incorrect code if the super-register is not dead. This change prevents compression when the super-register is marked as an implicit define, which the virtual rewriter already adds in the case where a subregister is defined but the super-register is not dead. The BreakFalseDeps interface is also updated to add implicit super-register defs for NDD instructions that would incur partial-write stalls if compressed to legacy ops.
@llvm/pr-subscribers-backend-x86 Author: Daniel Zabawa (daniel-zabawa) ChangesAPX NDD instructions may be compressed when the result is also a source. For 8/16b instructions, this may create partial register write hazards if a previous super-register def is within the partial reg update clearance, or incorrect code if the super-register is not dead. This change prevents compression when the super-register is marked as an implicit define, which the virtual rewriter already adds in the case where a subregister is defined but the super-register is not dead. The BreakFalseDeps interface is also updated to add implicit super-register defs for NDD instructions that would incur partial-write stalls if compressed to legacy ops. Full diff: https://github.com/llvm/llvm-project/pull/132051.diff 4 Files Affected:
diff --git a/llvm/lib/Target/X86/X86CompressEVEX.cpp b/llvm/lib/Target/X86/X86CompressEVEX.cpp
index b80c21b008f4b..6c22bd9c46b4f 100644
--- a/llvm/lib/Target/X86/X86CompressEVEX.cpp
+++ b/llvm/lib/Target/X86/X86CompressEVEX.cpp
@@ -237,6 +237,23 @@ static bool CompressEVEXImpl(MachineInstr &MI, const X86Subtarget &ST) {
return 0;
return I->NewOpc;
};
+
+ // Redundant NDD ops cannot be safely compressed if either:
+ // - the legacy op would introduce a partial write that BreakFalseDeps
+ // identified as a potential stall, or
+ // - the op is writing to a subregister of a live register, i.e. the
+ // full (zeroed) result is used.
+ // Both cases are indicated by an implicit def of the superregister.
+ if (IsRedundantNDD) {
+ Register Dst = MI.getOperand(0).getReg();
+ if (Dst &&
+ (X86::GR16RegClass.contains(Dst) || X86::GR8RegClass.contains(Dst))) {
+ Register Super = getX86SubSuperRegister(Dst, ST.is64Bit() ? 64 : 32);
+ if (MI.definesRegister(Super, /*TRI=*/nullptr))
+ IsRedundantNDD = false;
+ }
+ }
+
// NonNF -> NF only if it's not a compressible NDD instruction and eflags is
// dead.
unsigned NewOpc = IsRedundantNDD
diff --git a/llvm/lib/Target/X86/X86InstrInfo.cpp b/llvm/lib/Target/X86/X86InstrInfo.cpp
index 5c65171dd83b0..a8a533de30cbf 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.cpp
+++ b/llvm/lib/Target/X86/X86InstrInfo.cpp
@@ -6793,19 +6793,42 @@ static bool hasPartialRegUpdate(unsigned Opcode, const X86Subtarget &Subtarget,
unsigned X86InstrInfo::getPartialRegUpdateClearance(
const MachineInstr &MI, unsigned OpNum,
const TargetRegisterInfo *TRI) const {
- if (OpNum != 0 || !hasPartialRegUpdate(MI.getOpcode(), Subtarget))
+
+ // With the NDD/ZU features, ISel may generate NDD/ZU ops which
+ // appear to perform partial writes. We detect these based on flags
+ // and register class.
+ bool HasNDDPartialWrite = false;
+ if (OpNum == 0 && (Subtarget.hasNDD() || Subtarget.hasZU()) &&
+ X86II::hasNewDataDest(MI.getDesc().TSFlags)) {
+ Register Reg = MI.getOperand(0).getReg();
+ if (Reg.isVirtual()) {
+ auto &MRI = MI.getParent()->getParent()->getRegInfo();
+ if (auto *TRC = MRI.getRegClassOrNull(Reg))
+ HasNDDPartialWrite = (TRC->getID() == X86::GR16RegClassID ||
+ TRC->getID() == X86::GR8RegClassID);
+ } else
+ HasNDDPartialWrite =
+ X86::GR8RegClass.contains(Reg) || X86::GR16RegClass.contains(Reg);
+ }
+
+ if (OpNum != 0 ||
+ !(HasNDDPartialWrite || hasPartialRegUpdate(MI.getOpcode(), Subtarget)))
return 0;
- // If MI is marked as reading Reg, the partial register update is wanted.
+ // For non-NDD ops, if MI is marked as reading Reg, the partial register
+ // update is wanted, hence we return 0.
+ // For NDD ops, if MI is marked as reading Reg, then it is possible to
+ // compress to a legacy form in CompressEVEX, which would create an
+ // unwanted partial update, so we return the clearance.
const MachineOperand &MO = MI.getOperand(0);
Register Reg = MO.getReg();
- if (Reg.isVirtual()) {
- if (MO.readsReg() || MI.readsVirtualRegister(Reg))
- return 0;
- } else {
- if (MI.readsRegister(Reg, TRI))
- return 0;
- }
+ bool ReadsReg = false;
+ if (Reg.isVirtual())
+ ReadsReg = (MO.readsReg() || MI.readsVirtualRegister(Reg));
+ else
+ ReadsReg = MI.readsRegister(Reg, TRI);
+ if (ReadsReg != HasNDDPartialWrite)
+ return 0;
// If any instructions in the clearance range are reading Reg, insert a
// dependency breaking instruction, which is inexpensive and is likely to
@@ -7229,6 +7252,18 @@ void X86InstrInfo::breakPartialRegDependency(
.addReg(Reg, RegState::Undef)
.addReg(Reg, RegState::Undef);
MI.addRegisterKilled(Reg, TRI, true);
+ } else if ((X86::GR16RegClass.contains(Reg) ||
+ X86::GR8RegClass.contains(Reg)) &&
+ X86II::hasNewDataDest(MI.getDesc().TSFlags)) {
+ // This case is only expected for NDD ops which appear to be partial
+ // writes, but are not due to the zeroing of the upper part. Here
+ // we add an implicit def of the superegister, which prevents
+ // CompressEVEX from converting this to a legacy form.
+ Register SuperReg =
+ getX86SubSuperRegister(Reg, Subtarget.is64Bit() ? 64 : 32);
+ MachineInstrBuilder BuildMI(*MI.getParent()->getParent(), &MI);
+ if (!MI.definesRegister(SuperReg, /*TRI=*/nullptr))
+ BuildMI.addReg(SuperReg, RegState::ImplicitDefine);
}
}
diff --git a/llvm/test/CodeGen/X86/apx/ndd-false-deps-asm.mir b/llvm/test/CodeGen/X86/apx/ndd-false-deps-asm.mir
new file mode 100644
index 0000000000000..e297e87413ada
--- /dev/null
+++ b/llvm/test/CodeGen/X86/apx/ndd-false-deps-asm.mir
@@ -0,0 +1,89 @@
+# NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+# RUN: llc %s -mtriple=x86_64-unknown -mattr=+ndd,+egpr -start-before=break-false-deps -show-mc-encoding -o - | FileCheck --check-prefixes=ASM,RCDEFAULT %s
+# RUN: llc %s -mtriple=x86_64-unknown -mattr=+ndd,+egpr -start-before=break-false-deps -partial-reg-update-clearance=1 -show-mc-encoding -o - | FileCheck --check-prefixes=ASM,RC1 %s
+#
+# Check that BreakFalseDeps detects cases where an ND instruction would cause a partial register write
+# if compressed to a legacy op. MIR has been modified to force different register assignments.
+#
+# For partial_write, the ADD16rr_ND is compressible, but will become a partial write after compression.
+# Compression is inhibited if the eax definition is within the partial-reg-update-clearance threshold.
+#
+# For no_partial_write, the ADD16rr_ND is incompressible hence it cannot become a partial write.
+# This case checks that an implicit-def of eax is not added by breakPartialRegDependency.
+#
+--- |
+ define signext i16 @partial_write(ptr %p, i32 %a, i32 %b, i16 signext %x, i16 signext %y) #0 {
+ ; RCDEFAULT-LABEL: partial_write:
+ ; RCDEFAULT: # %bb.0: # %entry
+ ; RCDEFAULT-NEXT: addl %esi, %edx, %eax # encoding: [0x62,0xf4,0x7c,0x18,0x01,0xf2]
+ ; RCDEFAULT-NEXT: movl %eax, (%rdi) # encoding: [0x89,0x07]
+ ; RCDEFAULT-NEXT: addw %cx, %ax, %ax # encoding: [0x62,0xf4,0x7d,0x18,0x01,0xc8]
+ ; RCDEFAULT-NEXT: retq # encoding: [0xc3]
+ ;
+ ; RC1-LABEL: partial_write:
+ ; RC1: # %bb.0: # %entry
+ ; RC1-NEXT: addl %esi, %edx, %eax # encoding: [0x62,0xf4,0x7c,0x18,0x01,0xf2]
+ ; RC1-NEXT: movl %eax, (%rdi) # encoding: [0x89,0x07]
+ ; RC1-NEXT: addw %cx, %ax # EVEX TO LEGACY Compression encoding: [0x66,0x01,0xc8]
+ ; RC1-NEXT: retq # encoding: [0xc3]
+ entry:
+ %add = add nsw i32 %b, %a
+ store i32 %add, ptr %p, align 4, !tbaa !1
+ %add1 = trunc i32 %add to i16
+ %add2 = add i16 %add1, %x
+ ret i16 %add2
+ }
+
+ define signext i16 @no_partial_write(ptr %p, i32 %a, i32 %b, i16 signext %x, i16 signext %y) #0 {
+ ; ASM-LABEL: no_partial_write:
+ ; ASM: # %bb.0: # %entry
+ ; ASM-NEXT: addl %esi, %edx # EVEX TO LEGACY Compression encoding: [0x01,0xf2]
+ ; ASM-NEXT: movl %edx, (%rdi) # encoding: [0x89,0x17]
+ ; ASM-NEXT: addw %cx, %dx, %ax # encoding: [0x62,0xf4,0x7d,0x18,0x01,0xca]
+ ; ASM-NEXT: retq # encoding: [0xc3]
+ entry:
+ %add = add nsw i32 %b, %a
+ store i32 %add, ptr %p, align 4, !tbaa !1
+ %add1 = trunc i32 %add to i16
+ %add2 = add i16 %add1, %x
+ ret i16 %add2
+ }
+ attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write) uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx16,+cx8,+egpr,+fxsr,+mmx,+ndd,+sse,+sse2,+x87" "tune-cpu"="generic" }
+
+ !llvm.module.flags = !{!0}
+
+ !0 = !{i32 1, !"wchar_size", i32 4}
+ !1 = !{!2, !2, i64 0}
+ !2 = !{!"int", !3, i64 0}
+ !3 = !{!"omnipotent char", !4, i64 0}
+ !4 = !{!"Simple C/C++ TBAA"}
+...
+---
+name: partial_write
+tracksRegLiveness: true
+noVRegs: true
+noPhis: true
+isSSA: false
+body: |
+ bb.0.entry:
+ liveins: $ecx, $edx, $esi, $rdi
+ renamable $eax = nsw ADD32rr_ND killed renamable $edx, killed renamable $esi, implicit-def dead $eflags
+ MOV32mr killed renamable $rdi, 1, $noreg, 0, $noreg, renamable $eax :: (store (s32) into %ir.p, !tbaa !1)
+ renamable $ax = ADD16rr_ND renamable $ax, renamable $cx, implicit-def dead $eflags, implicit killed $ecx, implicit $eax
+ RET64 $ax
+...
+---
+name: no_partial_write
+tracksRegLiveness: true
+noVRegs: true
+noPhis: true
+isSSA: false
+body: |
+ bb.0.entry:
+ liveins: $ecx, $edx, $esi, $rdi
+
+ renamable $edx = nsw ADD32rr_ND killed renamable $edx, killed renamable $esi, implicit-def dead $eflags
+ MOV32mr killed renamable $rdi, 1, $noreg, 0, $noreg, renamable $edx :: (store (s32) into %ir.p, !tbaa !1)
+ renamable $ax = ADD16rr_ND renamable $dx, renamable $cx, implicit-def dead $eflags, implicit killed $ecx, implicit killed $edx
+ RET64 $ax
+...
diff --git a/llvm/test/CodeGen/X86/apx/ndd-false-deps.mir b/llvm/test/CodeGen/X86/apx/ndd-false-deps.mir
new file mode 100644
index 0000000000000..376a89180290d
--- /dev/null
+++ b/llvm/test/CodeGen/X86/apx/ndd-false-deps.mir
@@ -0,0 +1,96 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc %s -mtriple=x86_64-unknown -mattr=+ndd,+egpr -run-pass=break-false-deps -o - | FileCheck --check-prefixes=MIR,RCDEFAULT %s
+# RUN: llc %s -mtriple=x86_64-unknown -mattr=+ndd,+egpr -run-pass=break-false-deps -partial-reg-update-clearance=1 -o - | FileCheck --check-prefixes=MIR,RC1 %s
+#
+# Check that BreakFalseDeps detects cases where an ND instruction would cause a partial register write
+# if compressed to a legacy op. MIR has been modified to force different register assignments.
+#
+# For partial_write, the ADD16rr_ND is compressible, but will become a partial write after compression.
+# Compression is inhibited if the eax definition is within the partial-reg-update-clearance threshold.
+#
+# For no_partial_write, the ADD16rr_ND is incompressible hence it cannot become a partial write.
+# This case checks that an implicit-def of eax is not added by breakPartialRegDependency.
+#
+--- |
+ define signext i16 @partial_write(ptr %p, i32 %a, i32 %b, i16 signext %x, i16 signext %y) #0 {
+ entry:
+ %add = add nsw i32 %b, %a
+ store i32 %add, ptr %p, align 4, !tbaa !1
+ %add1 = trunc i32 %add to i16
+ %add2 = add i16 %add1, %x
+ ret i16 %add2
+ }
+
+ define signext i16 @no_partial_write(ptr %p, i32 %a, i32 %b, i16 signext %x, i16 signext %y) #0 {
+ entry:
+ %add = add nsw i32 %b, %a
+ store i32 %add, ptr %p, align 4, !tbaa !1
+ %add1 = trunc i32 %add to i16
+ %add2 = add i16 %add1, %x
+ ret i16 %add2
+ }
+ attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write) uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx16,+cx8,+egpr,+fxsr,+mmx,+ndd,+sse,+sse2,+x87" "tune-cpu"="generic" }
+
+ !llvm.module.flags = !{!0}
+
+ !0 = !{i32 1, !"wchar_size", i32 4}
+ !1 = !{!2, !2, i64 0}
+ !2 = !{!"int", !3, i64 0}
+ !3 = !{!"omnipotent char", !4, i64 0}
+ !4 = !{!"Simple C/C++ TBAA"}
+...
+---
+name: partial_write
+tracksRegLiveness: true
+noVRegs: true
+noPhis: true
+isSSA: false
+body: |
+ bb.0.entry:
+ liveins: $ecx, $edx, $esi, $rdi
+ ; RCDEFAULT-LABEL: name: partial_write
+ ; RCDEFAULT: liveins: $ecx, $edx, $esi, $rdi
+ ; RCDEFAULT-NEXT: {{ $}}
+ ; RCDEFAULT-NEXT: renamable $eax = nsw ADD32rr_ND killed renamable $edx, killed renamable $esi, implicit-def dead $eflags
+ ; RCDEFAULT-NEXT: MOV32mr killed renamable $rdi, 1, $noreg, 0, $noreg, renamable $eax :: (store (s32) into %ir.p, !tbaa !1)
+ ; RCDEFAULT-NEXT: renamable $ax = ADD16rr_ND renamable $ax, renamable $cx, implicit-def dead $eflags, implicit killed $ecx, implicit $eax, implicit-def $rax
+ ; RCDEFAULT-NEXT: RET64 $ax
+ ;
+ ; RC1-LABEL: name: partial_write
+ ; RC1: liveins: $ecx, $edx, $esi, $rdi
+ ; RC1-NEXT: {{ $}}
+ ; RC1-NEXT: renamable $eax = nsw ADD32rr_ND killed renamable $edx, killed renamable $esi, implicit-def dead $eflags
+ ; RC1-NEXT: MOV32mr killed renamable $rdi, 1, $noreg, 0, $noreg, renamable $eax :: (store (s32) into %ir.p, !tbaa !1)
+ ; RC1-NEXT: renamable $ax = ADD16rr_ND renamable $ax, renamable $cx, implicit-def dead $eflags, implicit killed $ecx, implicit $eax
+ ; RC1-NEXT: RET64 $ax
+ renamable $eax = nsw ADD32rr_ND killed renamable $edx, killed renamable $esi, implicit-def dead $eflags
+ MOV32mr killed renamable $rdi, 1, $noreg, 0, $noreg, renamable $eax :: (store (s32) into %ir.p, !tbaa !1)
+ renamable $ax = ADD16rr_ND renamable $ax, renamable $cx, implicit-def dead $eflags, implicit killed $ecx, implicit $eax
+ RET64 $ax
+...
+---
+name: no_partial_write
+tracksRegLiveness: true
+noVRegs: true
+noPhis: true
+isSSA: false
+body: |
+ bb.0.entry:
+ liveins: $ecx, $edx, $esi, $rdi
+
+ ; MIR-LABEL: name: no_partial_write
+ ; MIR: liveins: $ecx, $edx, $esi, $rdi
+ ; MIR-NEXT: {{ $}}
+ ; MIR-NEXT: renamable $edx = nsw ADD32rr_ND killed renamable $edx, killed renamable $esi, implicit-def dead $eflags
+ ; MIR-NEXT: MOV32mr killed renamable $rdi, 1, $noreg, 0, $noreg, renamable $edx :: (store (s32) into %ir.p, !tbaa !1)
+ ; MIR-NEXT: renamable $ax = ADD16rr_ND renamable $dx, renamable $cx, implicit-def dead $eflags, implicit killed $ecx, implicit killed $edx
+ ; MIR-NEXT: RET64 $ax
+ renamable $edx = nsw ADD32rr_ND killed renamable $edx, killed renamable $esi, implicit-def dead $eflags
+ MOV32mr killed renamable $rdi, 1, $noreg, 0, $noreg, renamable $edx :: (store (s32) into %ir.p, !tbaa !1)
+ renamable $ax = ADD16rr_ND renamable $dx, renamable $cx, implicit-def dead $eflags, implicit killed $ecx, implicit killed $edx
+ RET64 $ax
+...
+## NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+# MIR: {{.*}}
+# RC1: {{.*}}
+# RCDEFAULT: {{.*}}
|
note the tests are split over two files because I was hitting errors when RUN lines produced both MIR and asm output, which is likely not a common usage for update_mir_test_checks. If there's a better way to handle that I'd be glad to learn it. |
Register Dst = MI.getOperand(0).getReg(); | ||
if (Dst && | ||
(X86::GR16RegClass.contains(Dst) || X86::GR8RegClass.contains(Dst))) { | ||
Register Super = getX86SubSuperRegister(Dst, ST.is64Bit() ? 64 : 32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to check is64Bit()
, NDD only available under 64bit.
llvm/lib/Target/X86/X86InstrInfo.cpp
Outdated
// appear to perform partial writes. We detect these based on flags | ||
// and register class. | ||
bool HasNDDPartialWrite = false; | ||
if (OpNum == 0 && (Subtarget.hasNDD() || Subtarget.hasZU()) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to check hasNDD()
again given NDD instruction been selected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to prevent checking individual instructions if the feature is disabled. This code previously only considered select instructions covered by hasPartialRegUpdate, so I was concerned about compile time impact now that a much wider set of opcodes are considered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Should the ||
actually be &&
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I think there won't be much difference, we still need to check hasNDD()
each time, and checking TSFlags
won't be much expensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true - removed it either way. I am actually not certain on whether hasZU was really appropriate as well.
llvm/lib/Target/X86/X86InstrInfo.cpp
Outdated
X86::GR8RegClass.contains(Reg) || X86::GR16RegClass.contains(Reg); | ||
} | ||
|
||
if (OpNum != 0 || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe move
if (OpNum != 0)
return 0;
at the beginning?
llvm/lib/Target/X86/X86InstrInfo.cpp
Outdated
// For NDD ops, if MI is marked as reading Reg, then it is possible to | ||
// compress to a legacy form in CompressEVEX, which would create an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we will mark the NDD dest as reading Reg given the NDD dest is purely a destination register. I think we probably don't have the knowledge if the source and dest will be the same register in the virtual register phase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In virtual registers we'd have a tied use operand - they can't be the same register. The situation we're trying to detect is when it's valid for compression, which is only known after regalloc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is, there won't be tied operand for NDD instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RIght, I'm aware - this check for reads of the result only can only succeed post-RA. It really doesn't appear to be needed if we're still on VRegs, but I left that alone.
BreakFalseDeps only runs after regalloc, so I don't see the problem here; an NDD op that doesn't read its result will not be compressible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is confusing and code is less efficient. Suggested change:
bool IsND = X86II::hasNewDataDest(MI.getDesc().TSFlags);
... ...
// For NDD ops, if the dest uses the same register as source, then it is possible to
// compress to a legacy form in CompressEVEX, which would create an
// unwanted partial update, so we return the clearance.
if (IsND || !Reg.isVirtual()) {
if (MI.readsRegister(Reg, TRI))
return HasNDDPartialWrite;
} else {
if (MO.readsReg() || MI.readsVirtualRegister(Reg))
return 0;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll reword the comment, but the source changes would change the behaviour for the ND and non-ND cases.
I'm not certain that the virtual register case is ever used, but perhaps that's better left as a TODO to confirm and change the function to return 0 outright if the destination is virtual?
// This case is only expected for NDD ops which appear to be partial | ||
// writes, but are not due to the zeroing of the upper part. Here | ||
// we add an implicit def of the superegister, which prevents | ||
// CompressEVEX from converting this to a legacy form. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you check it with large applications like benchmark? I'm not sure if we can arbitrarily adding operand to a MI. Maybe add -verify-machineinstrs
in the tests to check first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add verification to the tests. I don't believe it's arbitrary - an NDD op always writes to the full (physical) register, so this accurately reflects the semantics post-regalloc. The idea is to indicate in MIR that the zero-upper behavior is intentional and needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not about the register relationship. It is the machine instrcution operands match its defination. We use Defs/Uses
for implict operands as well, as a result, the number of operands of a machine instrcution is constant vaule during its life time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked Geekbench 6 and coremark-pro again. I was using Geekbench during development because it showed several instances of 8/16b ops generated with the ndd feature enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
llvm/lib/Target/X86/X86InstrInfo.cpp
Outdated
if (X86II::hasNewDataDest(MI.getDesc().TSFlags)) { | ||
Register Reg = MI.getOperand(0).getReg(); | ||
if (Reg.isVirtual()) { | ||
auto &MRI = MI.getParent()->getParent()->getRegInfo(); | ||
if (auto *TRC = MRI.getRegClassOrNull(Reg)) | ||
HasNDDPartialWrite = (TRC->getID() == X86::GR16RegClassID || | ||
TRC->getID() == X86::GR8RegClassID); | ||
} else | ||
HasNDDPartialWrite = | ||
X86::GR8RegClass.contains(Reg) || X86::GR16RegClass.contains(Reg); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (X86II::hasNewDataDest(MI.getDesc().TSFlags)) { | |
Register Reg = MI.getOperand(0).getReg(); | |
if (Reg.isVirtual()) { | |
auto &MRI = MI.getParent()->getParent()->getRegInfo(); | |
if (auto *TRC = MRI.getRegClassOrNull(Reg)) | |
HasNDDPartialWrite = (TRC->getID() == X86::GR16RegClassID || | |
TRC->getID() == X86::GR8RegClassID); | |
} else | |
HasNDDPartialWrite = | |
X86::GR8RegClass.contains(Reg) || X86::GR16RegClass.contains(Reg); | |
} | |
bool IsND = X86II::hasNewDataDest(MI.getDesc().TSFlags); | |
Register Reg = MI.getOperand(0).getReg(); | |
if (IsND) { | |
if (Reg.isVirtual()) | |
return 0; | |
HasNDDPartialWrite = X86::GR8RegClass.contains(Reg) || X86::GR16RegClass.contains(Reg); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can simply the code without handling VRegs for NDD.
## NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line: | ||
# MIR: {{.*}} | ||
# RC1: {{.*}} | ||
# RCDEFAULT: {{.*}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the problem here? The prefixes are there, did you manually edit them? If you do, remove the autogenerated note and these comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's b/c check prefix is shared between different checks. The algorithm does not handle it well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was left over from a previous iteration of update_mir_test_checks. I've removed it and it's no longer added by the script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some nits.
!llvm.module.flags = !{!0} | ||
|
||
!0 = !{i32 1, !"wchar_size", i32 4} | ||
!1 = !{!2, !2, i64 0} | ||
!2 = !{!"int", !3, i64 0} | ||
!3 = !{!"omnipotent char", !4, i64 0} | ||
!4 = !{!"Simple C/C++ TBAA"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: These are not needed.
!llvm.module.flags = !{!0} | ||
|
||
!0 = !{i32 1, !"wchar_size", i32 4} | ||
!1 = !{!2, !2, i64 0} | ||
!2 = !{!"int", !3, i64 0} | ||
!3 = !{!"omnipotent char", !4, i64 0} | ||
!4 = !{!"Simple C/C++ TBAA"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
# RUN: llc %s -mtriple=x86_64-unknown -mattr=+ndd,+egpr -start-before=break-false-deps -verify-machineinstrs -show-mc-encoding -o - | FileCheck --check-prefixes=ASM,RCDEFAULT %s | ||
# RUN: llc %s -mtriple=x86_64-unknown -mattr=+ndd,+egpr -start-before=break-false-deps -verify-machineinstrs -partial-reg-update-clearance=1 -show-mc-encoding -o - | FileCheck --check-prefixes=ASM,RC1 %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We usually use CHECK/COMMON/ALL for the shared one.
# RUN: llc %s -mtriple=x86_64-unknown -mattr=+ndd,+egpr -run-pass=break-false-deps -o - | FileCheck --check-prefixes=MIR,RCDEFAULT %s | ||
# RUN: llc %s -mtriple=x86_64-unknown -mattr=+ndd,+egpr -run-pass=break-false-deps -partial-reg-update-clearance=1 -o - | FileCheck --check-prefixes=MIR,RC1 %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
APX NDD instructions may be compressed when the result is also a source. For 8/16b instructions, this may create partial register write hazards if a previous super-register def is within the partial reg update clearance, or incorrect code if the super-register is not dead.
This change prevents compression when the super-register is marked as an implicit define, which the virtual rewriter already adds in the case where a subregister is defined but the super-register is not dead.
The BreakFalseDeps interface is also updated to add implicit super-register defs for NDD instructions that would incur partial-write stalls if compressed to legacy ops.