Skip to content

[X86] Remove redundant test after setzucc #129506

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 3, 2025
Merged

Conversation

phoebewang
Copy link
Contributor

Patch #96594 substitutes setcc + zext pair with setzucc, but it results in redundant test because X86FlagsCopyLowering doesn't recognize it.

This patch removes redundant test by reverting setzucc to setcc (optimized) + zext.

Patch llvm#96594 substitutes setcc + zext pair with setzucc, but it results in
redundant test because X86FlagsCopyLowering doesn't recognize it.

This patch removes redundant test by reverting setzucc to setcc (optimized)
+ zext.
@llvmbot
Copy link
Member

llvmbot commented Mar 3, 2025

@llvm/pr-subscribers-backend-x86

Author: Phoebe Wang (phoebewang)

Changes

Patch #96594 substitutes setcc + zext pair with setzucc, but it results in redundant test because X86FlagsCopyLowering doesn't recognize it.

This patch removes redundant test by reverting setzucc to setcc (optimized) + zext.


Full diff: https://github.com/llvm/llvm-project/pull/129506.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86FlagsCopyLowering.cpp (+24-1)
  • (modified) llvm/test/CodeGen/X86/apx/setzucc.ll (+2-2)
diff --git a/llvm/lib/Target/X86/X86FlagsCopyLowering.cpp b/llvm/lib/Target/X86/X86FlagsCopyLowering.cpp
index ab4c70aee486a..873dce4e922d1 100644
--- a/llvm/lib/Target/X86/X86FlagsCopyLowering.cpp
+++ b/llvm/lib/Target/X86/X86FlagsCopyLowering.cpp
@@ -615,7 +615,7 @@ bool X86FlagsCopyLoweringPass::runOnMachineFunction(MachineFunction &MF) {
           MRI->replaceRegWith(MI.getOperand(0).getReg(),
                               CopyDefI.getOperand(0).getReg());
           MI.eraseFromParent();
-        } else if (X86::isSETCC(Opc)) {
+        } else if (X86::isSETCC(Opc) || X86::isSETZUCC(Opc)) {
           rewriteSetCC(*TestMBB, TestPos, TestLoc, MI, CondRegs);
         } else if (isArithmeticOp(Opc)) {
           rewriteArithmetic(*TestMBB, TestPos, TestLoc, MI, CondRegs);
@@ -781,6 +781,29 @@ void X86FlagsCopyLoweringPass::rewriteSetCC(MachineBasicBlock &MBB,
   if (!CondReg)
     CondReg = promoteCondToReg(MBB, Pos, Loc, Cond);
 
+  if (X86::isSETZUCC(MI.getOpcode())) {
+    // SETZUCC is generated for register only for now.
+    assert(!MI.mayStore() && "Cannot handle memory variants");
+    assert(MI.getOperand(0).isReg() &&
+           "Cannot have a non-register defined operand to SETZUcc!");
+    Register OldReg = MI.getOperand(0).getReg();
+    // Drop Kill flags on the old register before replacing. CondReg may have
+    // a longer live range.
+    MRI->clearKillFlags(OldReg);
+    for (auto &Use : MRI->use_instructions(OldReg)) {
+      assert(Use.getOpcode() == X86::INSERT_SUBREG &&
+             "SETZUCC should be only used by INSERT_SUBREG");
+      Use.getOperand(2).setReg(CondReg);
+      // Recover MOV32r0 before INSERT_SUBREG, which removed by SETZUCC.
+      Register ZeroReg = MRI->createVirtualRegister(&X86::GR32RegClass);
+      BuildMI(*Use.getParent(), &Use, Use.getDebugLoc(), TII->get(X86::MOV32r0),
+              ZeroReg);
+      Use.getOperand(1).setReg(ZeroReg);
+    }
+    MI.eraseFromParent();
+    return;
+  }
+
   // Rewriting a register def is trivial: we just replace the register and
   // remove the setcc.
   if (!MI.mayStore()) {
diff --git a/llvm/test/CodeGen/X86/apx/setzucc.ll b/llvm/test/CodeGen/X86/apx/setzucc.ll
index 1f2b612a17088..6eb2d6966ecd8 100644
--- a/llvm/test/CodeGen/X86/apx/setzucc.ll
+++ b/llvm/test/CodeGen/X86/apx/setzucc.ll
@@ -60,11 +60,11 @@ define i32 @flags_copy_lowering() nounwind {
 ; CHECK-NEXT:    setb %sil
 ; CHECK-NEXT:    adcl $0, %ecx
 ; CHECK-NEXT:    testb %sil, %sil
-; CHECK-NEXT:    setzune %dl
-; CHECK-NEXT:    testb %sil, %sil
 ; CHECK-NEXT:    je .LBB4_3
 ; CHECK-NEXT:  # %bb.2: # %bb1
 ; CHECK-NEXT:    # in Loop: Header=BB4_1 Depth=1
+; CHECK-NEXT:    xorl %edx, %edx
+; CHECK-NEXT:    movb %sil, %dl
 ; CHECK-NEXT:    testb %al, %al
 ; CHECK-NEXT:    jne .LBB4_1
 ; CHECK-NEXT:  .LBB4_3: # %bb2

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@phoebewang phoebewang merged commit 3c9429f into llvm:main Mar 3, 2025
13 checks passed
@phoebewang phoebewang deleted the setzucc branch March 3, 2025 13:08
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. I don't understand this. Why can't we just remove the testb %sil, %sil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The testb %sil, %sil is due to adcl $0, %ecx clobbers EFLAGS.

jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025
Patch llvm#96594 substitutes setcc + zext pair with setzucc, but it results
in redundant test because X86FlagsCopyLowering doesn't recognize it.

This patch removes redundant test by reverting setzucc to setcc
(optimized) + zext.
fzou1 added a commit to fzou1/llvm-project that referenced this pull request May 30, 2025
1. Added pattern to emit SetZUCC instruction with APX NDD enabled to avoid
   false dependency with SetCC. SetCC is emitted with APX NDD disabled.
2. Reverted part of llvm#129506 (changing
   setzucc back to setcc + zext). Keeping the check of SetZUCC instruction will
   call rewriteSetCC for SetZUCC instruction and remove redundant test after
   SetZUCC in X86 Flags Copy Lowering pass.
3. Also added SetZUCC support in FixupSetCC pass to eliminate zext instruction
   after SetZUCC.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants