
[DAG] visitEXTRACT_VECTOR_ELT - constant fold legal fp imm values #74304


Merged (1 commit) on Dec 7, 2023
13 changes: 13 additions & 0 deletions llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -22243,6 +22243,19 @@ SDValue DAGCombiner::visitEXTRACT_VECTOR_ELT(SDNode *N) {
   unsigned NumElts = VecVT.getVectorNumElements();
   unsigned VecEltBitWidth = VecVT.getScalarSizeInBits();
 
+  // See if the extracted element is constant, in which case fold it if its
+  // a legal fp immediate.
+  if (IndexC && ScalarVT.isFloatingPoint()) {
+    APInt EltMask = APInt::getOneBitSet(NumElts, IndexC->getZExtValue());
+    KnownBits KnownElt = DAG.computeKnownBits(VecOp, EltMask);
+    if (KnownElt.isConstant()) {
Review comment from arsenm (Contributor), Dec 7, 2023:
Why is this fancy instead of just using dyn_cast<ConstantSDNode>?

Reply from the author (Collaborator):
It's to handle bitcasts of vector data (e.g. <2 x i32>), which don't get folded away if we have multiple uses.

+      APFloat CstFP =
+          APFloat(DAG.EVTToAPFloatSemantics(ScalarVT), KnownElt.getConstant());
+      if (TLI.isFPImmLegal(CstFP, ScalarVT))
+        return DAG.getConstantFP(CstFP, DL, ScalarVT);
+    }
+  }
+
   // TODO: These transforms should not require the 'hasOneUse' restriction, but
   // there are regressions on multiple targets without it. We can end up with a
   // mess of scalar and vector code if we reduce only part of the DAG to scalar.
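The added combine can be modeled outside LLVM in three steps: read the bits of the selected lane, reinterpret them as a float (the role of `APFloat(DAG.EVTToAPFloatSemantics(ScalarVT), KnownElt.getConstant())`), and fold only if the target reports that value as a legal FP immediate. The sketch below is a simplified, hypothetical C++ model, not LLVM code. It assumes a fully constant 128-bit vector of four little-endian f32 lanes (`Vec128`), replaces `computeKnownBits` with a direct lane read, and uses a made-up `isFPImmLegalSketch` predicate that accepts only +0.0 in place of `TLI.isFPImmLegal`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical model of a constant <4 x float> vector: four 32-bit lanes,
// lane 0 holding the least significant bits (little-endian lane order).
using Vec128 = std::array<uint32_t, 4>;

// Reinterpret one lane's bits as an IEEE-754 single. This plays the role of
// building an APFloat from the element's known constant bits.
float laneAsFloat(const Vec128 &V, unsigned Idx) {
  uint32_t Bits = V[Idx];
  float F;
  std::memcpy(&F, &Bits, sizeof F);
  return F;
}

// Hypothetical stand-in for TLI.isFPImmLegal: accept only +0.0, whose bit
// pattern is all zeros. -0.0 (0x80000000) is rejected.
bool isFPImmLegalSketch(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof Bits);
  return Bits == 0;
}

// Fold extract_vector_elt(V, Idx) to a scalar FP constant when the lane is a
// legal immediate; otherwise return std::nullopt and keep the extract.
std::optional<float> foldExtractOfConstant(const Vec128 &V, unsigned Idx) {
  if (Idx >= V.size())
    return std::nullopt; // No such lane: nothing to fold.
  float F = laneAsFloat(V, Idx);
  if (isFPImmLegalSketch(F))
    return F;
  return std::nullopt;
}
```

Under these assumptions, extracting lane 0 of {0.0, 0.0, 0.0, 1.0} folds to +0.0, while lane 3 (1.0) is left as an extract because the sketch's predicate rejects it. A real target's `isFPImmLegal` accepts its own, target-specific set of values.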
@@ -69,11 +69,10 @@ define void @insert_vec_v23i32_uaddlv_from_v8i16(ptr %0) {
; CHECK: ; %bb.0: ; %entry
; CHECK-NEXT: movi.2d v0, #0000000000000000
; CHECK-NEXT: movi.2d v2, #0000000000000000
-; CHECK-NEXT: add x8, x0, #88
+; CHECK-NEXT: str wzr, [x0, #88]
; CHECK-NEXT: uaddlv.8h s1, v0
; CHECK-NEXT: stp q0, q0, [x0, #16]
; CHECK-NEXT: stp q0, q0, [x0, #48]
-; CHECK-NEXT: st1.s { v0 }[2], [x8]
; CHECK-NEXT: str d0, [x0, #80]
; CHECK-NEXT: mov.s v2[0], v1[0]
; CHECK-NEXT: ucvtf.4s v1, v2
@@ -30,10 +30,10 @@ define [1 x <4 x float>] @test1() {
define [1 x <4 x float>] @test2() {
; CHECK-LABEL: .p2align 4, 0x0 ; -- Begin function test2
; CHECK-NEXT: lCPI1_0:
-; CHECK-NEXT: .long 0x00000000 ; float 0
-; CHECK-NEXT: .long 0x00000000 ; float 0
-; CHECK-NEXT: .long 0x00000000 ; float 0
-; CHECK-NEXT: .long 0x3f800000 ; float 1
+; CHECK-NEXT: .long 0x80000000 ; float -0
+; CHECK-NEXT: .long 0x80000000 ; float -0
+; CHECK-NEXT: .long 0x80000000 ; float -0
+; CHECK-NEXT: .long 0xbf800000 ; float -1
; CHECK-NEXT: .section __TEXT,__text,regular,pure_instructions
; CHECK-NEXT: .globl _test2
; CHECK-NEXT: .p2align 2
@@ -43,17 +43,7 @@ define [1 x <4 x float>] @test2() {
; CHECK-NEXT: Lloh2:
; CHECK-NEXT: adrp x8, lCPI1_0@PAGE
; CHECK-NEXT: Lloh3:
-; CHECK-NEXT: ldr q1, [x8, lCPI1_0@PAGEOFF]
-; CHECK-NEXT: mov s2, v1[1]
-; CHECK-NEXT: fneg s0, s1
-; CHECK-NEXT: mov s3, v1[2]
-; CHECK-NEXT: mov s1, v1[3]
-; CHECK-NEXT: fneg s2, s2
-; CHECK-NEXT: fneg s3, s3
-; CHECK-NEXT: fneg s1, s1
-; CHECK-NEXT: mov.s v0[1], v2[0]
-; CHECK-NEXT: mov.s v0[2], v3[0]
-; CHECK-NEXT: mov.s v0[3], v1[0]
+; CHECK-NEXT: ldr q0, [x8, lCPI1_0@PAGEOFF]
; CHECK-NEXT: ret
;
%constexpr = fneg float extractelement (<4 x float> bitcast (<1 x i128> <i128 84405977732342157929391748327801880576> to <4 x float>), i32 0)
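The test2 IR takes element 0 of a `<1 x i128>` constant bitcast to `<4 x float>`, an instance of the bitcast vector data that the review reply gives as the reason for using `computeKnownBits`. As a hedged illustration only, the sketch below decodes that i128 into four f32 lanes, assuming little-endian lane order; `I128Hi`, `I128Lo`, `bitcastToV4F32` and `bitsOf` are hypothetical helpers, not LLVM APIs. The decimal constant equals 0x3F800000 shifted left by 96 bits, so elements 0 to 2 are +0.0 and element 3 is 1.0.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// The <1 x i128> constant in test2, 84405977732342157929391748327801880576,
// is 0x3F800000 shifted left by 96 bits. Standard C++ has no 128-bit integer,
// so it is held here as two 64-bit halves.
constexpr uint64_t I128Hi = 0x3F80000000000000ull;
constexpr uint64_t I128Lo = 0x0000000000000000ull;

// Model of bitcasting <1 x i128> to <4 x float> with little-endian lane
// order: lane 0 takes bits [31:0] and lane 3 takes bits [127:96].
std::array<float, 4> bitcastToV4F32(uint64_t Hi, uint64_t Lo) {
  const uint32_t Lanes[4] = {
      static_cast<uint32_t>(Lo), static_cast<uint32_t>(Lo >> 32),
      static_cast<uint32_t>(Hi), static_cast<uint32_t>(Hi >> 32)};
  std::array<float, 4> Out{};
  for (unsigned I = 0; I < 4; ++I)
    std::memcpy(&Out[I], &Lanes[I], sizeof(float));
  return Out;
}

// Raw bit pattern of a float, comparable with the ".long" constant-pool lines.
uint32_t bitsOf(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof Bits);
  return Bits;
}
```

Negating the decoded lanes gives 0x80000000 for the zero lanes and 0xbf800000 for lane 3, which matches the -0 and -1 entries in the updated `lCPI1_0` constant pool.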
5 changes: 2 additions & 3 deletions llvm/test/CodeGen/X86/2011-10-19-widen_vselect.ll
@@ -50,13 +50,12 @@ define void @zero_test() {
; X86-LABEL: zero_test:
; X86: # %bb.0: # %entry
; X86-NEXT: xorps %xmm0, %xmm0
-; X86-NEXT: movlps %xmm0, (%eax)
+; X86-NEXT: movsd %xmm0, (%eax)
; X86-NEXT: retl
;
; X64-LABEL: zero_test:
; X64: # %bb.0: # %entry
-; X64-NEXT: xorps %xmm0, %xmm0
-; X64-NEXT: movlps %xmm0, (%rax)
+; X64-NEXT: movq $0, (%rax)
; X64-NEXT: retq
entry:
%0 = select <2 x i1> undef, <2 x float> undef, <2 x float> zeroinitializer
2 changes: 1 addition & 1 deletion llvm/test/CodeGen/X86/2012-07-10-extload64.ll
@@ -30,7 +30,7 @@ define void @store_64(ptr %ptr) {
; X86: # %bb.0: # %BB
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: xorps %xmm0, %xmm0
-; X86-NEXT: movlps %xmm0, (%eax)
+; X86-NEXT: movsd %xmm0, (%eax)
; X86-NEXT: retl
;
; X64-LABEL: store_64:
2 changes: 1 addition & 1 deletion llvm/test/CodeGen/X86/fold-load-vec.ll
@@ -10,8 +10,8 @@ define void @sample_test(ptr %source, ptr %dest) nounwind {
; CHECK-NEXT: subq $24, %rsp
; CHECK-NEXT: movq %rdi, {{[0-9]+}}(%rsp)
; CHECK-NEXT: movq %rsi, {{[0-9]+}}(%rsp)
+; CHECK-NEXT: movq $0, (%rsp)
; CHECK-NEXT: xorps %xmm0, %xmm0
-; CHECK-NEXT: movlps %xmm0, (%rsp)
; CHECK-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
; CHECK-NEXT: movlps %xmm0, (%rsp)
; CHECK-NEXT: movlps %xmm0, (%rsi)
14 changes: 6 additions & 8 deletions llvm/test/CodeGen/X86/half.ll
@@ -1082,12 +1082,11 @@ define void @main.158() #0 {
; BWON-F16C-LABEL: main.158:
; BWON-F16C: # %bb.0: # %entry
-; BWON-F16C-NEXT: vxorps %xmm0, %xmm0, %xmm0
-; BWON-F16C-NEXT: vcvtps2ph $4, %xmm0, %xmm0
-; BWON-F16C-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
-; BWON-F16C-NEXT: vcvtph2ps %xmm0, %xmm0
-; BWON-F16C-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; BWON-F16C-NEXT: vucomiss %xmm0, %xmm1
; BWON-F16C-NEXT: vxorps %xmm0, %xmm0, %xmm0
+; BWON-F16C-NEXT: vcvtps2ph $4, %xmm0, %xmm1
+; BWON-F16C-NEXT: vpmovzxwq {{.*#+}} xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero
+; BWON-F16C-NEXT: vcvtph2ps %xmm1, %xmm1
+; BWON-F16C-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
+; BWON-F16C-NEXT: vucomiss %xmm1, %xmm2
; BWON-F16C-NEXT: jae .LBB20_2
; BWON-F16C-NEXT: # %bb.1: # %entry
; BWON-F16C-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
@@ -1100,8 +1099,7 @@ define void @main.158() #0 {
; CHECK-I686-LABEL: main.158:
; CHECK-I686: # %bb.0: # %entry
; CHECK-I686-NEXT: subl $12, %esp
-; CHECK-I686-NEXT: pxor %xmm0, %xmm0
-; CHECK-I686-NEXT: movd %xmm0, (%esp)
+; CHECK-I686-NEXT: movl $0, (%esp)
; CHECK-I686-NEXT: calll __truncsfhf2
; CHECK-I686-NEXT: pextrw $0, %xmm0, %eax
; CHECK-I686-NEXT: movw %ax, (%esp)