Skip to content

[X86] Prevent APX NDD compression when it creates a partial write #132051

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 21, 2025

Conversation

daniel-zabawa
Copy link
Contributor

APX NDD instructions may be compressed when the result is also a source. For 8/16b instructions, this may create partial register write hazards if a previous super-register def is within the partial reg update clearance, or incorrect code if the super-register is not dead.

This change prevents compression when the super-register is marked as an implicit define, which the virtual rewriter already adds in the case where a subregister is defined but the super-register is not dead.

The BreakFalseDeps interface is also updated to add implicit super-register defs for NDD instructions that would incur partial-write stalls if compressed to legacy ops.

APX NDD instructions may be compressed when the result is also a source.
For 8/16b instructions, this may create partial register write hazards
if a previous super-register def is within the partial reg update
clearance, or incorrect code if the super-register is not dead.

This change prevents compression when the super-register is marked
as an implicit define, which the virtual rewriter already adds in
the case where a subregister is defined but the super-register is not
dead.

The BreakFalseDeps interface is also updated to add implicit
super-register defs for NDD instructions that would incur partial-write
stalls if compressed to legacy ops.
@daniel-zabawa daniel-zabawa marked this pull request as ready for review March 19, 2025 21:01
@llvmbot
Copy link
Member

llvmbot commented Mar 19, 2025

@llvm/pr-subscribers-backend-x86

Author: Daniel Zabawa (daniel-zabawa)

Changes

APX NDD instructions may be compressed when the result is also a source. For 8/16b instructions, this may create partial register write hazards if a previous super-register def is within the partial reg update clearance, or incorrect code if the super-register is not dead.

This change prevents compression when the super-register is marked as an implicit define, which the virtual rewriter already adds in the case where a subregister is defined but the super-register is not dead.

The BreakFalseDeps interface is also updated to add implicit super-register defs for NDD instructions that would incur partial-write stalls if compressed to legacy ops.


Full diff: https://github.com/llvm/llvm-project/pull/132051.diff

4 Files Affected:

  • (modified) llvm/lib/Target/X86/X86CompressEVEX.cpp (+17)
  • (modified) llvm/lib/Target/X86/X86InstrInfo.cpp (+44-9)
  • (added) llvm/test/CodeGen/X86/apx/ndd-false-deps-asm.mir (+89)
  • (added) llvm/test/CodeGen/X86/apx/ndd-false-deps.mir (+96)
diff --git a/llvm/lib/Target/X86/X86CompressEVEX.cpp b/llvm/lib/Target/X86/X86CompressEVEX.cpp
index b80c21b008f4b..6c22bd9c46b4f 100644
--- a/llvm/lib/Target/X86/X86CompressEVEX.cpp
+++ b/llvm/lib/Target/X86/X86CompressEVEX.cpp
@@ -237,6 +237,23 @@ static bool CompressEVEXImpl(MachineInstr &MI, const X86Subtarget &ST) {
       return 0;
     return I->NewOpc;
   };
+
+  // Redundant NDD ops cannot be safely compressed if either:
+  // - the legacy op would introduce a partial write that BreakFalseDeps
+  // identified as a potential stall, or
+  // - the op is writing to a subregister of a live register, i.e. the
+  // full (zeroed) result is used.
+  // Both cases are indicated by an implicit def of the superregister.
+  if (IsRedundantNDD) {
+    Register Dst = MI.getOperand(0).getReg();
+    if (Dst &&
+        (X86::GR16RegClass.contains(Dst) || X86::GR8RegClass.contains(Dst))) {
+      Register Super = getX86SubSuperRegister(Dst, ST.is64Bit() ? 64 : 32);
+      if (MI.definesRegister(Super, /*TRI=*/nullptr))
+        IsRedundantNDD = false;
+    }
+  }
+
   // NonNF -> NF only if it's not a compressible NDD instruction and eflags is
   // dead.
   unsigned NewOpc = IsRedundantNDD
diff --git a/llvm/lib/Target/X86/X86InstrInfo.cpp b/llvm/lib/Target/X86/X86InstrInfo.cpp
index 5c65171dd83b0..a8a533de30cbf 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.cpp
+++ b/llvm/lib/Target/X86/X86InstrInfo.cpp
@@ -6793,19 +6793,42 @@ static bool hasPartialRegUpdate(unsigned Opcode, const X86Subtarget &Subtarget,
 unsigned X86InstrInfo::getPartialRegUpdateClearance(
     const MachineInstr &MI, unsigned OpNum,
     const TargetRegisterInfo *TRI) const {
-  if (OpNum != 0 || !hasPartialRegUpdate(MI.getOpcode(), Subtarget))
+
+  // With the NDD/ZU features, ISel may generate NDD/ZU ops which
+  // appear to perform partial writes. We detect these based on flags
+  // and register class.
+  bool HasNDDPartialWrite = false;
+  if (OpNum == 0 && (Subtarget.hasNDD() || Subtarget.hasZU()) &&
+      X86II::hasNewDataDest(MI.getDesc().TSFlags)) {
+    Register Reg = MI.getOperand(0).getReg();
+    if (Reg.isVirtual()) {
+      auto &MRI = MI.getParent()->getParent()->getRegInfo();
+      if (auto *TRC = MRI.getRegClassOrNull(Reg))
+        HasNDDPartialWrite = (TRC->getID() == X86::GR16RegClassID ||
+                              TRC->getID() == X86::GR8RegClassID);
+    } else
+      HasNDDPartialWrite =
+          X86::GR8RegClass.contains(Reg) || X86::GR16RegClass.contains(Reg);
+  }
+
+  if (OpNum != 0 ||
+      !(HasNDDPartialWrite || hasPartialRegUpdate(MI.getOpcode(), Subtarget)))
     return 0;
 
-  // If MI is marked as reading Reg, the partial register update is wanted.
+  // For non-NDD ops, if MI is marked as reading Reg, the partial register
+  // update is wanted, hence we return 0.
+  // For NDD ops, if MI is marked as reading Reg, then it is possible to
+  // compress to a legacy form in CompressEVEX, which would create an
+  // unwanted partial update, so we return the clearance.
   const MachineOperand &MO = MI.getOperand(0);
   Register Reg = MO.getReg();
-  if (Reg.isVirtual()) {
-    if (MO.readsReg() || MI.readsVirtualRegister(Reg))
-      return 0;
-  } else {
-    if (MI.readsRegister(Reg, TRI))
-      return 0;
-  }
+  bool ReadsReg = false;
+  if (Reg.isVirtual())
+    ReadsReg = (MO.readsReg() || MI.readsVirtualRegister(Reg));
+  else
+    ReadsReg = MI.readsRegister(Reg, TRI);
+  if (ReadsReg != HasNDDPartialWrite)
+    return 0;
 
   // If any instructions in the clearance range are reading Reg, insert a
   // dependency breaking instruction, which is inexpensive and is likely to
@@ -7229,6 +7252,18 @@ void X86InstrInfo::breakPartialRegDependency(
         .addReg(Reg, RegState::Undef)
         .addReg(Reg, RegState::Undef);
     MI.addRegisterKilled(Reg, TRI, true);
+  } else if ((X86::GR16RegClass.contains(Reg) ||
+              X86::GR8RegClass.contains(Reg)) &&
+             X86II::hasNewDataDest(MI.getDesc().TSFlags)) {
+    // This case is only expected for NDD ops which appear to be partial
+    // writes, but are not due to the zeroing of the upper part. Here
+    // we add an implicit def of the superegister, which prevents
+    // CompressEVEX from converting this to a legacy form.
+    Register SuperReg =
+        getX86SubSuperRegister(Reg, Subtarget.is64Bit() ? 64 : 32);
+    MachineInstrBuilder BuildMI(*MI.getParent()->getParent(), &MI);
+    if (!MI.definesRegister(SuperReg, /*TRI=*/nullptr))
+      BuildMI.addReg(SuperReg, RegState::ImplicitDefine);
   }
 }
 
diff --git a/llvm/test/CodeGen/X86/apx/ndd-false-deps-asm.mir b/llvm/test/CodeGen/X86/apx/ndd-false-deps-asm.mir
new file mode 100644
index 0000000000000..e297e87413ada
--- /dev/null
+++ b/llvm/test/CodeGen/X86/apx/ndd-false-deps-asm.mir
@@ -0,0 +1,89 @@
+# NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+# RUN: llc %s -mtriple=x86_64-unknown -mattr=+ndd,+egpr -start-before=break-false-deps -show-mc-encoding -o - | FileCheck --check-prefixes=ASM,RCDEFAULT %s
+# RUN: llc %s -mtriple=x86_64-unknown -mattr=+ndd,+egpr -start-before=break-false-deps -partial-reg-update-clearance=1 -show-mc-encoding -o - | FileCheck --check-prefixes=ASM,RC1 %s
+#
+# Check that BreakFalseDeps detects cases where an ND instruction would cause a partial register write
+# if compressed to a legacy op. MIR has been modified to force different register assignments.
+#
+# For partial_write, the ADD16rr_ND is compressible, but will become a partial write after compression.
+# Compression is inhibited if the eax definition is within the partial-reg-update-clearance threshold.
+#
+# For no_partial_write, the ADD16rr_ND is incompressible hence it cannot become a partial write.
+# This case checks that an implicit-def of eax is not added by breakPartialRegDependency.
+#
+--- |
+  define signext i16 @partial_write(ptr %p, i32 %a, i32 %b, i16 signext %x, i16 signext %y) #0 {
+  ; RCDEFAULT-LABEL: partial_write:
+  ; RCDEFAULT:       # %bb.0: # %entry
+  ; RCDEFAULT-NEXT:    addl %esi, %edx, %eax # encoding: [0x62,0xf4,0x7c,0x18,0x01,0xf2]
+  ; RCDEFAULT-NEXT:    movl %eax, (%rdi) # encoding: [0x89,0x07]
+  ; RCDEFAULT-NEXT:    addw %cx, %ax, %ax # encoding: [0x62,0xf4,0x7d,0x18,0x01,0xc8]
+  ; RCDEFAULT-NEXT:    retq # encoding: [0xc3]
+  ;
+  ; RC1-LABEL: partial_write:
+  ; RC1:       # %bb.0: # %entry
+  ; RC1-NEXT:    addl %esi, %edx, %eax # encoding: [0x62,0xf4,0x7c,0x18,0x01,0xf2]
+  ; RC1-NEXT:    movl %eax, (%rdi) # encoding: [0x89,0x07]
+  ; RC1-NEXT:    addw %cx, %ax # EVEX TO LEGACY Compression encoding: [0x66,0x01,0xc8]
+  ; RC1-NEXT:    retq # encoding: [0xc3]
+  entry:
+    %add = add nsw i32 %b, %a
+    store i32 %add, ptr %p, align 4, !tbaa !1
+    %add1 = trunc i32 %add to i16
+    %add2 = add i16 %add1, %x
+    ret i16 %add2
+  }
+
+  define signext i16 @no_partial_write(ptr %p, i32 %a, i32 %b, i16 signext %x, i16 signext %y) #0 {
+  ; ASM-LABEL: no_partial_write:
+  ; ASM:       # %bb.0: # %entry
+  ; ASM-NEXT:    addl %esi, %edx # EVEX TO LEGACY Compression encoding: [0x01,0xf2]
+  ; ASM-NEXT:    movl %edx, (%rdi) # encoding: [0x89,0x17]
+  ; ASM-NEXT:    addw %cx, %dx, %ax # encoding: [0x62,0xf4,0x7d,0x18,0x01,0xca]
+  ; ASM-NEXT:    retq # encoding: [0xc3]
+  entry:
+    %add = add nsw i32 %b, %a
+    store i32 %add, ptr %p, align 4, !tbaa !1
+    %add1 = trunc i32 %add to i16
+    %add2 = add i16 %add1, %x
+    ret i16 %add2
+  }
+  attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write) uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx16,+cx8,+egpr,+fxsr,+mmx,+ndd,+sse,+sse2,+x87" "tune-cpu"="generic" }
+
+  !llvm.module.flags = !{!0}
+
+  !0 = !{i32 1, !"wchar_size", i32 4}
+  !1 = !{!2, !2, i64 0}
+  !2 = !{!"int", !3, i64 0}
+  !3 = !{!"omnipotent char", !4, i64 0}
+  !4 = !{!"Simple C/C++ TBAA"}
+...
+---
+name:            partial_write
+tracksRegLiveness: true
+noVRegs:         true
+noPhis:          true
+isSSA:           false
+body:             |
+  bb.0.entry:
+    liveins: $ecx, $edx, $esi, $rdi
+    renamable $eax = nsw ADD32rr_ND killed renamable $edx, killed renamable $esi, implicit-def dead $eflags
+    MOV32mr killed renamable $rdi, 1, $noreg, 0, $noreg, renamable $eax :: (store (s32) into %ir.p, !tbaa !1)
+    renamable $ax = ADD16rr_ND renamable $ax, renamable $cx, implicit-def dead $eflags, implicit killed $ecx, implicit $eax
+    RET64 $ax
+...
+---
+name:            no_partial_write
+tracksRegLiveness: true
+noVRegs:         true
+noPhis:          true
+isSSA:           false
+body:             |
+  bb.0.entry:
+    liveins: $ecx, $edx, $esi, $rdi
+
+    renamable $edx = nsw ADD32rr_ND killed renamable $edx, killed renamable $esi, implicit-def dead $eflags
+    MOV32mr killed renamable $rdi, 1, $noreg, 0, $noreg, renamable $edx :: (store (s32) into %ir.p, !tbaa !1)
+    renamable $ax = ADD16rr_ND renamable $dx, renamable $cx, implicit-def dead $eflags, implicit killed $ecx, implicit killed $edx
+    RET64 $ax
+...
diff --git a/llvm/test/CodeGen/X86/apx/ndd-false-deps.mir b/llvm/test/CodeGen/X86/apx/ndd-false-deps.mir
new file mode 100644
index 0000000000000..376a89180290d
--- /dev/null
+++ b/llvm/test/CodeGen/X86/apx/ndd-false-deps.mir
@@ -0,0 +1,96 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc %s -mtriple=x86_64-unknown -mattr=+ndd,+egpr -run-pass=break-false-deps -o - | FileCheck --check-prefixes=MIR,RCDEFAULT %s
+# RUN: llc %s -mtriple=x86_64-unknown -mattr=+ndd,+egpr -run-pass=break-false-deps -partial-reg-update-clearance=1 -o - | FileCheck --check-prefixes=MIR,RC1 %s
+#
+# Check that BreakFalseDeps detects cases where an ND instruction would cause a partial register write
+# if compressed to a legacy op. MIR has been modified to force different register assignments.
+#
+# For partial_write, the ADD16rr_ND is compressible, but will become a partial write after compression.
+# Compression is inhibited if the eax definition is within the partial-reg-update-clearance threshold.
+#
+# For no_partial_write, the ADD16rr_ND is incompressible hence it cannot become a partial write.
+# This case checks that an implicit-def of eax is not added by breakPartialRegDependency.
+#
+--- |
+  define signext i16 @partial_write(ptr %p, i32 %a, i32 %b, i16 signext %x, i16 signext %y) #0 {
+  entry:
+    %add = add nsw i32 %b, %a
+    store i32 %add, ptr %p, align 4, !tbaa !1
+    %add1 = trunc i32 %add to i16
+    %add2 = add i16 %add1, %x
+    ret i16 %add2
+  }
+
+  define signext i16 @no_partial_write(ptr %p, i32 %a, i32 %b, i16 signext %x, i16 signext %y) #0 {
+  entry:
+    %add = add nsw i32 %b, %a
+    store i32 %add, ptr %p, align 4, !tbaa !1
+    %add1 = trunc i32 %add to i16
+    %add2 = add i16 %add1, %x
+    ret i16 %add2
+  }
+  attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: write) uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx16,+cx8,+egpr,+fxsr,+mmx,+ndd,+sse,+sse2,+x87" "tune-cpu"="generic" }
+
+  !llvm.module.flags = !{!0}
+
+  !0 = !{i32 1, !"wchar_size", i32 4}
+  !1 = !{!2, !2, i64 0}
+  !2 = !{!"int", !3, i64 0}
+  !3 = !{!"omnipotent char", !4, i64 0}
+  !4 = !{!"Simple C/C++ TBAA"}
+...
+---
+name:            partial_write
+tracksRegLiveness: true
+noVRegs:         true
+noPhis:          true
+isSSA:           false
+body:             |
+  bb.0.entry:
+    liveins: $ecx, $edx, $esi, $rdi
+    ; RCDEFAULT-LABEL: name: partial_write
+    ; RCDEFAULT: liveins: $ecx, $edx, $esi, $rdi
+    ; RCDEFAULT-NEXT: {{  $}}
+    ; RCDEFAULT-NEXT: renamable $eax = nsw ADD32rr_ND killed renamable $edx, killed renamable $esi, implicit-def dead $eflags
+    ; RCDEFAULT-NEXT: MOV32mr killed renamable $rdi, 1, $noreg, 0, $noreg, renamable $eax :: (store (s32) into %ir.p, !tbaa !1)
+    ; RCDEFAULT-NEXT: renamable $ax = ADD16rr_ND renamable $ax, renamable $cx, implicit-def dead $eflags, implicit killed $ecx, implicit $eax, implicit-def $rax
+    ; RCDEFAULT-NEXT: RET64 $ax
+    ;
+    ; RC1-LABEL: name: partial_write
+    ; RC1: liveins: $ecx, $edx, $esi, $rdi
+    ; RC1-NEXT: {{  $}}
+    ; RC1-NEXT: renamable $eax = nsw ADD32rr_ND killed renamable $edx, killed renamable $esi, implicit-def dead $eflags
+    ; RC1-NEXT: MOV32mr killed renamable $rdi, 1, $noreg, 0, $noreg, renamable $eax :: (store (s32) into %ir.p, !tbaa !1)
+    ; RC1-NEXT: renamable $ax = ADD16rr_ND renamable $ax, renamable $cx, implicit-def dead $eflags, implicit killed $ecx, implicit $eax
+    ; RC1-NEXT: RET64 $ax
+    renamable $eax = nsw ADD32rr_ND killed renamable $edx, killed renamable $esi, implicit-def dead $eflags
+    MOV32mr killed renamable $rdi, 1, $noreg, 0, $noreg, renamable $eax :: (store (s32) into %ir.p, !tbaa !1)
+    renamable $ax = ADD16rr_ND renamable $ax, renamable $cx, implicit-def dead $eflags, implicit killed $ecx, implicit $eax
+    RET64 $ax
+...
+---
+name:            no_partial_write
+tracksRegLiveness: true
+noVRegs:         true
+noPhis:          true
+isSSA:           false
+body:             |
+  bb.0.entry:
+    liveins: $ecx, $edx, $esi, $rdi
+
+    ; MIR-LABEL: name: no_partial_write
+    ; MIR: liveins: $ecx, $edx, $esi, $rdi
+    ; MIR-NEXT: {{  $}}
+    ; MIR-NEXT: renamable $edx = nsw ADD32rr_ND killed renamable $edx, killed renamable $esi, implicit-def dead $eflags
+    ; MIR-NEXT: MOV32mr killed renamable $rdi, 1, $noreg, 0, $noreg, renamable $edx :: (store (s32) into %ir.p, !tbaa !1)
+    ; MIR-NEXT: renamable $ax = ADD16rr_ND renamable $dx, renamable $cx, implicit-def dead $eflags, implicit killed $ecx, implicit killed $edx
+    ; MIR-NEXT: RET64 $ax
+    renamable $edx = nsw ADD32rr_ND killed renamable $edx, killed renamable $esi, implicit-def dead $eflags
+    MOV32mr killed renamable $rdi, 1, $noreg, 0, $noreg, renamable $edx :: (store (s32) into %ir.p, !tbaa !1)
+    renamable $ax = ADD16rr_ND renamable $dx, renamable $cx, implicit-def dead $eflags, implicit killed $ecx, implicit killed $edx
+    RET64 $ax
+...
+## NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+# MIR: {{.*}}
+# RC1: {{.*}}
+# RCDEFAULT: {{.*}}

@daniel-zabawa
Copy link
Contributor Author

note the tests are split over two files because I was hitting errors when RUN lines produced both MIR and asm output, which is likely not a common usage for update_mir_test_checks. If there's a better way to handle that I'd be glad to learn it.

Register Dst = MI.getOperand(0).getReg();
if (Dst &&
(X86::GR16RegClass.contains(Dst) || X86::GR8RegClass.contains(Dst))) {
Register Super = getX86SubSuperRegister(Dst, ST.is64Bit() ? 64 : 32);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to check is64Bit(), NDD only available under 64bit.

// appear to perform partial writes. We detect these based on flags
// and register class.
bool HasNDDPartialWrite = false;
if (OpNum == 0 && (Subtarget.hasNDD() || Subtarget.hasZU()) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to check hasNDD() again given NDD instruction been selected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to prevent checking individual instructions if the feature is disabled. This code previously only considered select instructions covered by hasPartialRegUpdate, so I was concerned about compile time impact now that a much wider set of opcodes are considered.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Should the || actually be &&?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think there won't be much difference, we still need to check hasNDD() each time, and checking TSFlags won't be much expensive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true - removed it either way. I am actually not certain on whether hasZU was really appropriate as well.

X86::GR8RegClass.contains(Reg) || X86::GR16RegClass.contains(Reg);
}

if (OpNum != 0 ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move

if (OpNum != 0)
  return 0;

at the beginning?

Comment on lines 6820 to 6821
// For NDD ops, if MI is marked as reading Reg, then it is possible to
// compress to a legacy form in CompressEVEX, which would create an
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we will mark the NDD dest as reading Reg given the NDD dest is purely a destination register. I think we probably don't have the knowledge if the source and dest will be the same register in the virtual register phase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In virtual registers we'd have a tied use operand - they can't be the same register. The situation we're trying to detect is when it's valid for compression, which is only known after regalloc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is, there won't be tied operand for NDD instructions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RIght, I'm aware - this check for reads of the result only can only succeed post-RA. It really doesn't appear to be needed if we're still on VRegs, but I left that alone.

BreakFalseDeps only runs after regalloc, so I don't see the problem here; an NDD op that doesn't read its result will not be compressible.

Copy link
Contributor

@phoebewang phoebewang Mar 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is confusing and code is less efficient. Suggested change:

  bool IsND = X86II::hasNewDataDest(MI.getDesc().TSFlags);
  ... ...
  // For NDD ops, if the dest uses the same register as source, then it is possible to
  // compress to a legacy form in CompressEVEX, which would create an
  // unwanted partial update, so we return the clearance.
  if (IsND || !Reg.isVirtual()) {
    if (MI.readsRegister(Reg, TRI))
      return HasNDDPartialWrite;
  } else {
    if (MO.readsReg() || MI.readsVirtualRegister(Reg))
      return 0;
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll reword the comment, but the source changes would change the behaviour for the ND and non-ND cases.
I'm not certain that the virtual register case is ever used, but perhaps that's better left as a TODO to confirm and change the function to return 0 outright if the destination is virtual?

Comment on lines +7258 to +7261
// This case is only expected for NDD ops which appear to be partial
// writes, but are not due to the zeroing of the upper part. Here
// we add an implicit def of the superegister, which prevents
// CompressEVEX from converting this to a legacy form.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check it with large applications like benchmark? I'm not sure if we can arbitrarily adding operand to a MI. Maybe add -verify-machineinstrs in the tests to check first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add verification to the tests. I don't believe it's arbitrary - an NDD op always writes to the full (physical) register, so this accurately reflects the semantics post-regalloc. The idea is to indicate in MIR that the zero-upper behavior is intentional and needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not about the register relationship. It is the machine instrcution operands match its defination. We use Defs/Uses for implict operands as well, as a result, the number of operands of a machine instrcution is constant vaule during its life time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked Geekbench 6 and coremark-pro again. I was using Geekbench during development because it showed several instances of 8/16b ops generated with the ndd feature enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment on lines 6804 to 6814
if (X86II::hasNewDataDest(MI.getDesc().TSFlags)) {
Register Reg = MI.getOperand(0).getReg();
if (Reg.isVirtual()) {
auto &MRI = MI.getParent()->getParent()->getRegInfo();
if (auto *TRC = MRI.getRegClassOrNull(Reg))
HasNDDPartialWrite = (TRC->getID() == X86::GR16RegClassID ||
TRC->getID() == X86::GR8RegClassID);
} else
HasNDDPartialWrite =
X86::GR8RegClass.contains(Reg) || X86::GR16RegClass.contains(Reg);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (X86II::hasNewDataDest(MI.getDesc().TSFlags)) {
Register Reg = MI.getOperand(0).getReg();
if (Reg.isVirtual()) {
auto &MRI = MI.getParent()->getParent()->getRegInfo();
if (auto *TRC = MRI.getRegClassOrNull(Reg))
HasNDDPartialWrite = (TRC->getID() == X86::GR16RegClassID ||
TRC->getID() == X86::GR8RegClassID);
} else
HasNDDPartialWrite =
X86::GR8RegClass.contains(Reg) || X86::GR16RegClass.contains(Reg);
}
bool IsND = X86II::hasNewDataDest(MI.getDesc().TSFlags);
Register Reg = MI.getOperand(0).getReg();
if (IsND) {
if (Reg.isVirtual())
return 0;
HasNDDPartialWrite = X86::GR8RegClass.contains(Reg) || X86::GR16RegClass.contains(Reg);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can simply the code without handling VRegs for NDD.

Comment on lines 93 to 96
## NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
# MIR: {{.*}}
# RC1: {{.*}}
# RCDEFAULT: {{.*}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the problem here? The prefixes are there, did you manually edit them? If you do, remove the autogenerated note and these comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's b/c check prefix is shared between different checks. The algorithm does not handle it well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was left over from a previous iteration of update_mir_test_checks. I've removed it and it's no longer added by the script.

Copy link
Contributor

@phoebewang phoebewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some nits.

Comment on lines 53 to 59
!llvm.module.flags = !{!0}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!2, !2, i64 0}
!2 = !{!"int", !3, i64 0}
!3 = !{!"omnipotent char", !4, i64 0}
!4 = !{!"Simple C/C++ TBAA"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: These are not needed.

Comment on lines 34 to 40
!llvm.module.flags = !{!0}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!2, !2, i64 0}
!2 = !{!"int", !3, i64 0}
!3 = !{!"omnipotent char", !4, i64 0}
!4 = !{!"Simple C/C++ TBAA"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

Comment on lines 2 to 3
# RUN: llc %s -mtriple=x86_64-unknown -mattr=+ndd,+egpr -start-before=break-false-deps -verify-machineinstrs -show-mc-encoding -o - | FileCheck --check-prefixes=ASM,RCDEFAULT %s
# RUN: llc %s -mtriple=x86_64-unknown -mattr=+ndd,+egpr -start-before=break-false-deps -verify-machineinstrs -partial-reg-update-clearance=1 -show-mc-encoding -o - | FileCheck --check-prefixes=ASM,RC1 %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We usually use CHECK/COMMON/ALL for the shared one.

Comment on lines 2 to 3
# RUN: llc %s -mtriple=x86_64-unknown -mattr=+ndd,+egpr -run-pass=break-false-deps -o - | FileCheck --check-prefixes=MIR,RCDEFAULT %s
# RUN: llc %s -mtriple=x86_64-unknown -mattr=+ndd,+egpr -run-pass=break-false-deps -partial-reg-update-clearance=1 -o - | FileCheck --check-prefixes=MIR,RC1 %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

@phoebewang phoebewang merged commit 5afa0fa into llvm:main Mar 21, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants