Skip to content

[X86] Fix the issue of creating index reg negations #135632

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 16, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 17 additions & 2 deletions llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -275,8 +275,23 @@ namespace {
#define GET_ND_IF_ENABLED(OPC) (Subtarget->hasNDD() ? OPC##_ND : OPC)
// Negate the index if needed.
if (AM.NegateIndex) {
unsigned NegOpc = VT == MVT::i64 ? GET_ND_IF_ENABLED(X86::NEG64r)
: GET_ND_IF_ENABLED(X86::NEG32r);
unsigned NegOpc;
switch (VT.SimpleTy) {
default:
llvm_unreachable("Unsupported VT!");
case MVT::i64:
NegOpc = GET_ND_IF_ENABLED(X86::NEG64r);
break;
case MVT::i32:
NegOpc = GET_ND_IF_ENABLED(X86::NEG32r);
break;
case MVT::i16:
NegOpc = GET_ND_IF_ENABLED(X86::NEG16r);
break;
case MVT::i8:
NegOpc = GET_ND_IF_ENABLED(X86::NEG8r);
break;
}
SDValue Neg = SDValue(CurDAG->getMachineNode(NegOpc, DL, VT, MVT::i32,
AM.IndexReg), 0);
AM.IndexReg = Neg;
Expand Down
71 changes: 71 additions & 0 deletions llvm/test/CodeGen/X86/apx/ndd-neg-addr-index.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
; RUN: llc %s -mtriple=x86_64-unknown -mattr=+ndd -verify-machineinstrs --show-mc-encoding -o - | FileCheck %s --check-prefix=NDD


define void @neg_8bit_1(i1 %cmp) {
; NDD-LABEL: neg_8bit_1:
; NDD: # %bb.0: # %entry
; NDD-NEXT: andb $1, %dil, %al # encoding: [0x62,0xf4,0x7c,0x18,0x80,0xe7,0x01]
; NDD-NEXT: movzbl 0, %ecx # encoding: [0x0f,0xb6,0x0c,0x25,0x00,0x00,0x00,0x00]
; NDD-NEXT: negb %al, %al # encoding: [0x62,0xf4,0x7c,0x18,0xf6,0xd8]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it not compressed to non-NDD? same below

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to clear the high bits of rax for lea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it should be cleared by line 8

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I think it's to prevent partial write then, see #132051

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phoebewang , thank you for your explanation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this patch. But it seems #132051 creates sub-optimal code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By sub-optimal, do you mean the code size? I think it's a general improvement considering the following sub has a dependency to the long lantency movzbl in the old code: https://godbolt.org/z/zexdsPnch

; NDD-NEXT: leab 2(%rcx,%rax), %al # encoding: [0x66,0x8d,0x44,0x01,0x02]
; NDD-NEXT: movb %al, 0 # encoding: [0x88,0x04,0x25,0x00,0x00,0x00,0x00]
; NDD-NEXT: retq # encoding: [0xc3]
entry:
%cond = select i1 %cmp, i8 1, i8 2
%0 = load i8, ptr null, align 4
%add = add i8 %cond, %0
store i8 %add, ptr null, align 4
ret void
}

define void @neg_8bit_2(i8 %int8) {
; NDD-LABEL: neg_8bit_2:
; NDD: # %bb.0: # %entry
; NDD-NEXT: # kill: def $edi killed $edi def $rdi
; NDD-NEXT: addb %dil, %dil, %al # encoding: [0x62,0xf4,0x7c,0x18,0x00,0xff]
; NDD-NEXT: negb %al, %al # encoding: [0x62,0xf4,0x7c,0x18,0xf6,0xd8]
; NDD-NEXT: leab 1(%rdi,%rax), %al # encoding: [0x66,0x8d,0x44,0x07,0x01]
; NDD-NEXT: mulb %dil # encoding: [0x40,0xf6,0xe7]
; NDD-NEXT: testb %al, %al # encoding: [0x84,0xc0]
; NDD-NEXT: retq # encoding: [0xc3]
entry:
%0 = shl i8 %int8, 1
%sub = sub i8 %int8, %0
%add = add i8 %sub, 1
%div = mul i8 %add, %int8
%cmp = icmp slt i8 %div, 0
br i1 %cmp, label %label2, label %label1

label1: ; preds = %entry
ret void

label2: ; preds = %entry
ret void
}

define i32 @neg_16bit(i16 %0) {
; NDD-LABEL: neg_16bit:
; NDD: # %bb.0: # %entry
; NDD-NEXT: # kill: def $edi killed $edi def $rdi
; NDD-NEXT: incw %di, %ax # encoding: [0x62,0xf4,0x7d,0x18,0xff,0xc7]
; NDD-NEXT: addw $256, %di, %cx # encoding: [0x62,0xf4,0x75,0x18,0x81,0xc7,0x00,0x01]
; NDD-NEXT: # imm = 0x100
; NDD-NEXT: testw %ax, %ax # encoding: [0x66,0x85,0xc0]
; NDD-NEXT: cmovsl %ecx, %eax # EVEX TO LEGACY Compression encoding: [0x0f,0x48,0xc1]
; NDD-NEXT: andw $-256, %ax # EVEX TO LEGACY Compression encoding: [0x66,0x25,0x00,0xff]
; NDD-NEXT: negw %ax, %ax # encoding: [0x62,0xf4,0x7d,0x18,0xf7,0xd8]
; NDD-NEXT: leaw 1(%rdi,%rax), %ax # encoding: [0x66,0x8d,0x44,0x07,0x01]
; NDD-NEXT: movzwl %ax, %eax # encoding: [0x0f,0xb7,0xc0]
; NDD-NEXT: movq %rax, 0 # encoding: [0x48,0x89,0x04,0x25,0x00,0x00,0x00,0x00]
; NDD-NEXT: xorl %eax, %eax # encoding: [0x31,0xc0]
; NDD-NEXT: retq # encoding: [0xc3]
entry:
%add = add i16 %0, 1
%rem = srem i16 %add, 256
%1 = zext i16 %rem to i19
%2 = sext i19 %1 to i64
%3 = getelementptr i8, ptr null, i64 %2
store ptr %3, ptr null, align 4
ret i32 0
}
Loading