Skip to content

[X86][AVX] Add missing X86ISD::VBROADCAST(v2i64 -> v4i64) isel pattern for AVX1 targets #102853

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 13, 2024

Conversation

Maetveis
Copy link
Contributor

An internal bug revealed that this form can be formed from existing combines, and then fail to select. Use the same pattern as v2f64 -> v4f64 (the instructions are moving bits around the actual type does not matter here).

…n for AVX1 targets

An internal bug revealed that this form can be formed from existing
combines, and then fail to select. Add the same pattern as v2f64 -> v4f64
(the instructions are moving bits around the actual type does not matter here).
@llvmbot
Copy link
Member

llvmbot commented Aug 12, 2024

@llvm/pr-subscribers-backend-x86

Author: Mészáros Gergely (Maetveis)

Changes

An internal bug revealed that this form can be formed from existing combines, and then fail to select. Use the same pattern as v2f64 -> v4f64 (the instructions are moving bits around the actual type does not matter here).


Full diff: https://github.com/llvm/llvm-project/pull/102853.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86InstrSSE.td (+5)
  • (modified) llvm/test/CodeGen/X86/combine-concatvectors.ll (+23)
diff --git a/llvm/lib/Target/X86/X86InstrSSE.td b/llvm/lib/Target/X86/X86InstrSSE.td
index 5f9211edfa161b..515e7d80a39637 100644
--- a/llvm/lib/Target/X86/X86InstrSSE.td
+++ b/llvm/lib/Target/X86/X86InstrSSE.td
@@ -7797,6 +7797,11 @@ let Predicates = [HasAVX1Only] in {
             (VPSHUFDri (VMOV64toPQIrr GR64:$src), 0x44)>;
   def : Pat<(v2i64 (X86VBroadcastld64 addr:$src)),
             (VMOVDDUPrm addr:$src)>;
+
+  def : Pat<(v4i64 (X86VBroadcast v2i64:$src)),
+            (VINSERTF128rr (INSERT_SUBREG (v4i64 (IMPLICIT_DEF)),
+              (v2i64 (VMOVDDUPrr VR128:$src)), sub_xmm),
+              (v2i64 (VMOVDDUPrr VR128:$src)), 1)>;
 }
 
 //===----------------------------------------------------------------------===//
diff --git a/llvm/test/CodeGen/X86/combine-concatvectors.ll b/llvm/test/CodeGen/X86/combine-concatvectors.ll
index 17d22607cfea88..5b06a8091dce7e 100644
--- a/llvm/test/CodeGen/X86/combine-concatvectors.ll
+++ b/llvm/test/CodeGen/X86/combine-concatvectors.ll
@@ -118,3 +118,26 @@ define <4 x float> @concat_of_broadcast_v4f32_v8f32(ptr %a0, ptr %a1, ptr %a2) {
   %shuffle1 = shufflevector <8 x float> %ld2, <8 x float> %shuffle, <4 x i32> <i32 6, i32 15, i32 12, i32 3>
   ret <4 x float> %shuffle1
 }
+
+define <4 x i64> @broadcast_of_shuffle_v2i64_v4i64(<16 x i8> %vecinit.i) {
+; AVX1-LABEL: broadcast_of_shuffle_v2i64_v4i64:
+; AVX1:       # %bb.0: # %entry
+; AVX1-NEXT:    vpsllq $56, %xmm0, %xmm0
+; AVX1-NEXT:    vmovddup {{.*#+}} xmm0 = xmm0[0,0]
+; AVX1-NEXT:    vinsertf128 $1, %xmm0, %ymm0, %ymm0
+; AVX1-NEXT:    retq
+;
+; AVX2-LABEL: broadcast_of_shuffle_v2i64_v4i64:
+; AVX2:       # %bb.0: # %entry
+; AVX2-NEXT:    vpsllq $56, %xmm0, %xmm0
+; AVX2-NEXT:    vpbroadcastq %xmm0, %ymm0
+; AVX2-NEXT:    retq
+entry:
+  %vecinit15.i = shufflevector <16 x i8> %vecinit.i, <16 x i8> poison, <16 x i32> zeroinitializer
+  %0 = bitcast <16 x i8> %vecinit15.i to <2 x i64>
+  %1 = extractelement <2 x i64> %0, i64 0
+  %2 = and i64 %1, -72057594037927936 ; 0xFF00 0000 0000 0000
+  %3 = insertelement <4 x i64> poison, i64 %2, i64 0
+  %4 = shufflevector <4 x i64> %3, <4 x i64> poison, <4 x i32> zeroinitializer
+  ret <4 x i64> %4
+}

@RKSimon RKSimon self-requested a review August 12, 2024 08:05
def : Pat<(v4i64 (X86VBroadcast v2i64:$src)),
(VINSERTF128rr (INSERT_SUBREG (v4i64 (IMPLICIT_DEF)),
(v2i64 (VMOVDDUPrr VR128:$src)), sub_xmm),
(v2i64 (VMOVDDUPrr VR128:$src)), 1)>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use VPSHUFDri 0x44 instead to reduce domain changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will change it.

I'm not versed in these instruction level considerations yet, are you referring to less mixing of "floating point" and integer instructions? Is this sentence in the Intel Optimization Manual relevant?

When writing SIMD code that works for both integer and floating-point data, use the subset of SIMD convert
instructions or load/store instructions to ensure that the input operands in XMM registers contain data types that
are properly defined to match the instruction.
Code sequences containing cross-typed usage produce the same result across different implementa-
tions but incur a significant performance penalty. Using SSE/SSE2/SSE3/SSSE3/SSE44.1 instruc-
tions to operate on type-mismatched SIMD data in the XMM register is strongly discouraged.

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the v4i32 -> v8i32 equivalent broadcast pattern occur as well?

@Maetveis
Copy link
Contributor Author

Does the v4i32 -> v8i32 equivalent broadcast pattern occur as well?

I couldn't produce it by changing the types in the IR: https://godbolt.org/z/6nTPjhsqf

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - cheers

Copy link
Contributor

@phoebewang phoebewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@phoebewang phoebewang merged commit aa03327 into llvm:main Aug 13, 2024
5 of 8 checks passed
@Maetveis Maetveis deleted the vbroadcast branch October 8, 2024 11:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants