[X86][AVX] Add missing X86ISD::VBROADCAST(v2i64 -> v4i64) isel pattern for AVX1 targets #102853
Conversation
An internal bug revealed that this form can be created by existing combines and then fail to select. Add the same pattern as v2f64 -> v4f64 (the instructions just move bits around; the actual type does not matter here).
@llvm/pr-subscribers-backend-x86
Author: Mészáros Gergely (Maetveis)
Changes
An internal bug revealed that this form can be created by existing combines and then fail to select. Use the same pattern as v2f64 -> v4f64 (the instructions just move bits around; the actual type does not matter here).
Full diff: https://github.com/llvm/llvm-project/pull/102853.diff
2 Files Affected:
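For reference, the existing v2f64 -> v4f64 pattern being mirrored sits under the same HasAVX1Only predicate in X86InstrSSE.td; a sketch of it (quoted from memory, not part of this diff) for comparison with the addition below:

  // Existing AVX1-only FP broadcast pattern: duplicate the low 64 bits
  // within the xmm, then copy that xmm into the upper ymm lane.
  def : Pat<(v4f64 (X86VBroadcast v2f64:$src)),
            (VINSERTF128rr (INSERT_SUBREG (v4f64 (IMPLICIT_DEF)),
                            (v2f64 (VMOVDDUPrr VR128:$src)), sub_xmm),
            (v2f64 (VMOVDDUPrr VR128:$src)), 1)>;

Since VMOVDDUP and VINSERTF128 only move bits, the same sequence broadcasts a qword regardless of whether it is typed as f64 or i64.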
diff --git a/llvm/lib/Target/X86/X86InstrSSE.td b/llvm/lib/Target/X86/X86InstrSSE.td
index 5f9211edfa161b..515e7d80a39637 100644
--- a/llvm/lib/Target/X86/X86InstrSSE.td
+++ b/llvm/lib/Target/X86/X86InstrSSE.td
@@ -7797,6 +7797,11 @@ let Predicates = [HasAVX1Only] in {
(VPSHUFDri (VMOV64toPQIrr GR64:$src), 0x44)>;
def : Pat<(v2i64 (X86VBroadcastld64 addr:$src)),
(VMOVDDUPrm addr:$src)>;
+
+ def : Pat<(v4i64 (X86VBroadcast v2i64:$src)),
+ (VINSERTF128rr (INSERT_SUBREG (v4i64 (IMPLICIT_DEF)),
+ (v2i64 (VMOVDDUPrr VR128:$src)), sub_xmm),
+ (v2i64 (VMOVDDUPrr VR128:$src)), 1)>;
}
//===----------------------------------------------------------------------===//
diff --git a/llvm/test/CodeGen/X86/combine-concatvectors.ll b/llvm/test/CodeGen/X86/combine-concatvectors.ll
index 17d22607cfea88..5b06a8091dce7e 100644
--- a/llvm/test/CodeGen/X86/combine-concatvectors.ll
+++ b/llvm/test/CodeGen/X86/combine-concatvectors.ll
@@ -118,3 +118,26 @@ define <4 x float> @concat_of_broadcast_v4f32_v8f32(ptr %a0, ptr %a1, ptr %a2) {
%shuffle1 = shufflevector <8 x float> %ld2, <8 x float> %shuffle, <4 x i32> <i32 6, i32 15, i32 12, i32 3>
ret <4 x float> %shuffle1
}
+
+define <4 x i64> @broadcast_of_shuffle_v2i64_v4i64(<16 x i8> %vecinit.i) {
+; AVX1-LABEL: broadcast_of_shuffle_v2i64_v4i64:
+; AVX1: # %bb.0: # %entry
+; AVX1-NEXT: vpsllq $56, %xmm0, %xmm0
+; AVX1-NEXT: vmovddup {{.*#+}} xmm0 = xmm0[0,0]
+; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
+; AVX1-NEXT: retq
+;
+; AVX2-LABEL: broadcast_of_shuffle_v2i64_v4i64:
+; AVX2: # %bb.0: # %entry
+; AVX2-NEXT: vpsllq $56, %xmm0, %xmm0
+; AVX2-NEXT: vpbroadcastq %xmm0, %ymm0
+; AVX2-NEXT: retq
+entry:
+ %vecinit15.i = shufflevector <16 x i8> %vecinit.i, <16 x i8> poison, <16 x i32> zeroinitializer
+ %0 = bitcast <16 x i8> %vecinit15.i to <2 x i64>
+ %1 = extractelement <2 x i64> %0, i64 0
+ %2 = and i64 %1, -72057594037927936 ; 0xFF00 0000 0000 0000
+ %3 = insertelement <4 x i64> poison, i64 %2, i64 0
+ %4 = shufflevector <4 x i64> %3, <4 x i64> poison, <4 x i32> zeroinitializer
+ ret <4 x i64> %4
+}
llvm/lib/Target/X86/X86InstrSSE.td (Outdated)
  def : Pat<(v4i64 (X86VBroadcast v2i64:$src)),
            (VINSERTF128rr (INSERT_SUBREG (v4i64 (IMPLICIT_DEF)),
                            (v2i64 (VMOVDDUPrr VR128:$src)), sub_xmm),
            (v2i64 (VMOVDDUPrr VR128:$src)), 1)>;
Use VPSHUFDri 0x44 instead to reduce domain changes?
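A sketch of how the suggested integer-domain variant could look; VPSHUFDri with immediate 0x44 (dword order [0,1,0,1]) duplicates the low qword without leaving the integer domain. This is illustrative only; the exact form that landed may differ:

  // VPSHUFD imm 0x44 replicates dwords 0 and 1 (the low qword),
  // avoiding the FP-domain VMOVDDUP for the in-register duplication.
  def : Pat<(v4i64 (X86VBroadcast v2i64:$src)),
            (VINSERTF128rr (INSERT_SUBREG (v4i64 (IMPLICIT_DEF)),
                            (v2i64 (VPSHUFDri VR128:$src, 0x44)), sub_xmm),
            (v2i64 (VPSHUFDri VR128:$src, 0x44)), 1)>;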
Sure, I will change it.
I'm not well versed in these instruction-level considerations yet; are you referring to less mixing of "floating-point" and integer instructions? Is this passage from the Intel Optimization Manual relevant?
When writing SIMD code that works for both integer and floating-point data, use the subset of SIMD convert instructions or load/store instructions to ensure that the input operands in XMM registers contain data types that are properly defined to match the instruction.
Code sequences containing cross-typed usage produce the same result across different implementations but incur a significant performance penalty. Using SSE/SSE2/SSE3/SSSE3/SSE4.1 instructions to operate on type-mismatched SIMD data in the XMM register is strongly discouraged.
Does the v4i32 -> v8i32 equivalent broadcast pattern occur as well?
I couldn't produce it by changing the types in the IR: https://godbolt.org/z/6nTPjhsqf
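For illustration, if that form ever did reach instruction selection, the analogous pattern would presumably be the following (hypothetical, not part of this change; VPSHUFDri with immediate 0 splats element 0 across all four dwords):

  // Hypothetical v4i32 -> v8i32 analogue of the new pattern.
  def : Pat<(v8i32 (X86VBroadcast v4i32:$src)),
            (VINSERTF128rr (INSERT_SUBREG (v8i32 (IMPLICIT_DEF)),
                            (v4i32 (VPSHUFDri VR128:$src, 0)), sub_xmm),
            (v4i32 (VPSHUFDri VR128:$src, 0)), 1)>;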
LGTM - cheers
LGTM.