[DAGCombiner] Don't drop atomic property of original load. #75626
Conversation
@llvm/pr-subscribers-llvm-selectiondag @llvm/pr-subscribers-backend-x86
Author: Jonas Paulsson (JonPsson1)
Changes
It seems that visitBITCAST() fails to pass on the atomic property of the load it is converting:
This patch simply passes the original MMO instead of the individual values, which looks OK given that those values all come from the MMO anyway. This way, the seq_cst bit is not dropped. I expected this to be NFC, but got some X86 test failures, which I auto-updated. I got warnings from the update_llc_test_checks.py script with these tests:
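For readers following along, here is a small self-contained C++ model of the failure mode. These are toy structs and function names invented for illustration, not LLVM's MachineMemOperand or SelectionDAG::getLoad: the point is only that rebuilding a memory operand from its individual pieces (pointer info, alignment, flags, AA info) silently defaults everything else, including the atomic ordering, whereas reusing the original operand keeps it.

```cpp
#include <cassert>

// Toy stand-ins for the state a memory operand carries (illustrative only).
enum class Ordering { NotAtomic, SequentiallyConsistent };

struct MemOperand {
  unsigned Flags = 0;                 // volatile/nontemporal-style flag bits
  unsigned AlignBytes = 1;            // alignment in bytes
  Ordering Ord = Ordering::NotAtomic; // atomic ordering, stored separately from Flags
};

// Old path: build a fresh operand from individual values -> the ordering is
// lost, because it is not one of the values that gets forwarded.
MemOperand rebuildFromPieces(const MemOperand &Orig) {
  MemOperand New;
  New.Flags = Orig.Flags;
  New.AlignBytes = Orig.AlignBytes;
  return New; // New.Ord defaults to NotAtomic
}

// New path: hand the original operand through unchanged -> the ordering survives.
const MemOperand &reuseOperand(const MemOperand &Orig) { return Orig; }

int main() {
  MemOperand SeqCstLoad{/*Flags=*/0, /*AlignBytes=*/8,
                        Ordering::SequentiallyConsistent};
  assert(rebuildFromPieces(SeqCstLoad).Ord == Ordering::NotAtomic);         // dropped
  assert(reuseOperand(SeqCstLoad).Ord == Ordering::SequentiallyConsistent); // kept
  return 0;
}
```

The actual one-line change in DAGCombiner.cpp below corresponds to switching from the first path to the second.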
Patch is 41.99 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/75626.diff
8 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 1d4bfa6fde0352..5934e3ca1b7219 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -15165,8 +15165,7 @@ SDValue DAGCombiner::visitBITCAST(SDNode *N) {
*LN0->getMemOperand())) {
SDValue Load =
DAG.getLoad(VT, SDLoc(N), LN0->getChain(), LN0->getBasePtr(),
- LN0->getPointerInfo(), LN0->getAlign(),
- LN0->getMemOperand()->getFlags(), LN0->getAAInfo());
+ LN0->getMemOperand());
DAG.ReplaceAllUsesOfValueWith(N0.getValue(1), Load.getValue(1));
return Load;
}
diff --git a/llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast_from_memory.ll b/llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast_from_memory.ll
index 6d5f8a78cb1d70..99d9f6b41e70dd 100644
--- a/llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast_from_memory.ll
+++ b/llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast_from_memory.ll
@@ -3647,10 +3647,10 @@ define void @vec384_i32_widen_to_i128_factor4_broadcast_to_v3i128_factor3(ptr %i
; AVX-NEXT: vmovdqa (%rdi), %xmm2
; AVX-NEXT: vmovdqa 16(%rdi), %xmm3
; AVX-NEXT: vpaddb 48(%rsi), %xmm3, %xmm3
-; AVX-NEXT: vpaddb 32(%rsi), %xmm2, %xmm2
; AVX-NEXT: vpaddb 16(%rsi), %xmm0, %xmm0
-; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)
+; AVX-NEXT: vpaddb 32(%rsi), %xmm2, %xmm2
; AVX-NEXT: vmovdqa %xmm2, 32(%rdx)
+; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)
; AVX-NEXT: vmovdqa %xmm3, 48(%rdx)
; AVX-NEXT: vmovdqa %xmm1, (%rdx)
; AVX-NEXT: vzeroupper
@@ -3833,10 +3833,10 @@ define void @vec384_i64_widen_to_i128_factor2_broadcast_to_v3i128_factor3(ptr %i
; AVX-NEXT: vmovdqa (%rdi), %xmm2
; AVX-NEXT: vmovdqa 16(%rdi), %xmm3
; AVX-NEXT: vpaddb 48(%rsi), %xmm3, %xmm3
-; AVX-NEXT: vpaddb 32(%rsi), %xmm2, %xmm2
; AVX-NEXT: vpaddb 16(%rsi), %xmm0, %xmm0
-; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)
+; AVX-NEXT: vpaddb 32(%rsi), %xmm2, %xmm2
; AVX-NEXT: vmovdqa %xmm2, 32(%rdx)
+; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)
; AVX-NEXT: vmovdqa %xmm3, 48(%rdx)
; AVX-NEXT: vmovdqa %xmm1, (%rdx)
; AVX-NEXT: vzeroupper
diff --git a/llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-2.ll b/llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-2.ll
index eeea912a56a69a..04fd6d9300c18d 100644
--- a/llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-2.ll
+++ b/llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-2.ll
@@ -501,136 +501,138 @@ define void @load_i16_stride2_vf64(ptr %in.vec, ptr %out.vec0, ptr %out.vec1) no
; SSE-LABEL: load_i16_stride2_vf64:
; SSE: # %bb.0:
; SSE-NEXT: subq $40, %rsp
-; SSE-NEXT: movdqa 96(%rdi), %xmm13
-; SSE-NEXT: movdqa %xmm13, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; SSE-NEXT: movdqa 112(%rdi), %xmm3
-; SSE-NEXT: movdqa %xmm3, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; SSE-NEXT: movdqa 128(%rdi), %xmm11
-; SSE-NEXT: movdqa %xmm11, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; SSE-NEXT: movdqa 144(%rdi), %xmm2
+; SSE-NEXT: movdqa 160(%rdi), %xmm14
+; SSE-NEXT: movdqa %xmm14, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; SSE-NEXT: movdqa 176(%rdi), %xmm2
; SSE-NEXT: movdqa %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; SSE-NEXT: movdqa 160(%rdi), %xmm10
-; SSE-NEXT: movdqa %xmm10, (%rsp) # 16-byte Spill
-; SSE-NEXT: movdqa 176(%rdi), %xmm4
-; SSE-NEXT: movdqa %xmm4, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; SSE-NEXT: movdqa (%rdi), %xmm9
-; SSE-NEXT: movdqa %xmm9, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; SSE-NEXT: movdqa 16(%rdi), %xmm1
+; SSE-NEXT: movdqa 64(%rdi), %xmm11
+; SSE-NEXT: movdqa %xmm11, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; SSE-NEXT: movdqa 80(%rdi), %xmm1
; SSE-NEXT: movdqa %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; SSE-NEXT: movdqa 32(%rdi), %xmm12
-; SSE-NEXT: movdqa %xmm12, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; SSE-NEXT: movdqa 48(%rdi), %xmm14
-; SSE-NEXT: movdqa %xmm14, %xmm0
+; SSE-NEXT: movdqa 96(%rdi), %xmm9
+; SSE-NEXT: movdqa %xmm9, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; SSE-NEXT: movdqa 112(%rdi), %xmm4
+; SSE-NEXT: movdqa %xmm4, (%rsp) # 16-byte Spill
+; SSE-NEXT: movdqa (%rdi), %xmm10
+; SSE-NEXT: movdqa %xmm10, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; SSE-NEXT: movdqa 16(%rdi), %xmm7
+; SSE-NEXT: movdqa %xmm7, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; SSE-NEXT: movdqa 32(%rdi), %xmm13
+; SSE-NEXT: movdqa %xmm13, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; SSE-NEXT: movdqa 48(%rdi), %xmm0
+; SSE-NEXT: movdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
; SSE-NEXT: pslld $16, %xmm0
; SSE-NEXT: psrad $16, %xmm0
-; SSE-NEXT: pslld $16, %xmm12
-; SSE-NEXT: psrad $16, %xmm12
-; SSE-NEXT: packssdw %xmm0, %xmm12
-; SSE-NEXT: movdqa %xmm4, %xmm0
+; SSE-NEXT: pslld $16, %xmm13
+; SSE-NEXT: psrad $16, %xmm13
+; SSE-NEXT: packssdw %xmm0, %xmm13
+; SSE-NEXT: movdqa %xmm7, %xmm0
; SSE-NEXT: pslld $16, %xmm0
; SSE-NEXT: psrad $16, %xmm0
; SSE-NEXT: pslld $16, %xmm10
; SSE-NEXT: psrad $16, %xmm10
; SSE-NEXT: packssdw %xmm0, %xmm10
-; SSE-NEXT: movdqa %xmm1, %xmm0
+; SSE-NEXT: movdqa %xmm4, %xmm0
; SSE-NEXT: pslld $16, %xmm0
; SSE-NEXT: psrad $16, %xmm0
; SSE-NEXT: pslld $16, %xmm9
; SSE-NEXT: psrad $16, %xmm9
; SSE-NEXT: packssdw %xmm0, %xmm9
-; SSE-NEXT: movdqa %xmm2, %xmm0
+; SSE-NEXT: movdqa %xmm1, %xmm0
; SSE-NEXT: pslld $16, %xmm0
; SSE-NEXT: psrad $16, %xmm0
; SSE-NEXT: pslld $16, %xmm11
; SSE-NEXT: psrad $16, %xmm11
; SSE-NEXT: packssdw %xmm0, %xmm11
-; SSE-NEXT: movdqa %xmm3, %xmm0
+; SSE-NEXT: movdqa %xmm2, %xmm0
; SSE-NEXT: pslld $16, %xmm0
; SSE-NEXT: psrad $16, %xmm0
-; SSE-NEXT: pslld $16, %xmm13
-; SSE-NEXT: psrad $16, %xmm13
-; SSE-NEXT: packssdw %xmm0, %xmm13
-; SSE-NEXT: movdqa 240(%rdi), %xmm0
-; SSE-NEXT: movdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; SSE-NEXT: pslld $16, %xmm14
+; SSE-NEXT: psrad $16, %xmm14
+; SSE-NEXT: packssdw %xmm0, %xmm14
+; SSE-NEXT: movdqa 144(%rdi), %xmm7
+; SSE-NEXT: movdqa %xmm7, %xmm0
; SSE-NEXT: pslld $16, %xmm0
; SSE-NEXT: psrad $16, %xmm0
-; SSE-NEXT: movdqa 224(%rdi), %xmm7
-; SSE-NEXT: movdqa %xmm7, %xmm15
+; SSE-NEXT: movdqa 128(%rdi), %xmm8
+; SSE-NEXT: movdqa %xmm8, %xmm15
; SSE-NEXT: pslld $16, %xmm15
; SSE-NEXT: psrad $16, %xmm15
; SSE-NEXT: packssdw %xmm0, %xmm15
-; SSE-NEXT: movdqa 80(%rdi), %xmm3
-; SSE-NEXT: movdqa %xmm3, %xmm1
+; SSE-NEXT: movdqa 240(%rdi), %xmm12
+; SSE-NEXT: movdqa %xmm12, %xmm1
; SSE-NEXT: pslld $16, %xmm1
; SSE-NEXT: psrad $16, %xmm1
-; SSE-NEXT: movdqa 64(%rdi), %xmm5
-; SSE-NEXT: movdqa %xmm5, %xmm4
+; SSE-NEXT: movdqa 224(%rdi), %xmm5
+; SSE-NEXT: movdqa %xmm5, %xmm3
+; SSE-NEXT: pslld $16, %xmm3
+; SSE-NEXT: psrad $16, %xmm3
+; SSE-NEXT: packssdw %xmm1, %xmm3
+; SSE-NEXT: movdqa 208(%rdi), %xmm6
+; SSE-NEXT: movdqa %xmm6, %xmm4
; SSE-NEXT: pslld $16, %xmm4
; SSE-NEXT: psrad $16, %xmm4
-; SSE-NEXT: packssdw %xmm1, %xmm4
-; SSE-NEXT: movdqa 208(%rdi), %xmm8
-; SSE-NEXT: movdqa %xmm8, %xmm6
-; SSE-NEXT: pslld $16, %xmm6
-; SSE-NEXT: psrad $16, %xmm6
; SSE-NEXT: movdqa 192(%rdi), %xmm2
; SSE-NEXT: movdqa %xmm2, %xmm1
; SSE-NEXT: pslld $16, %xmm1
; SSE-NEXT: psrad $16, %xmm1
-; SSE-NEXT: packssdw %xmm6, %xmm1
-; SSE-NEXT: psrad $16, %xmm14
-; SSE-NEXT: movdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
-; SSE-NEXT: psrad $16, %xmm0
-; SSE-NEXT: packssdw %xmm14, %xmm0
-; SSE-NEXT: movdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; SSE-NEXT: packssdw %xmm4, %xmm1
; SSE-NEXT: movdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
; SSE-NEXT: psrad $16, %xmm0
-; SSE-NEXT: movdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm6 # 16-byte Reload
-; SSE-NEXT: psrad $16, %xmm6
-; SSE-NEXT: packssdw %xmm0, %xmm6
-; SSE-NEXT: movdqa %xmm6, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; SSE-NEXT: movdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm4 # 16-byte Reload
+; SSE-NEXT: psrad $16, %xmm4
+; SSE-NEXT: packssdw %xmm0, %xmm4
+; SSE-NEXT: movdqa %xmm4, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
; SSE-NEXT: movdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
; SSE-NEXT: psrad $16, %xmm0
-; SSE-NEXT: movdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm14 # 16-byte Reload
-; SSE-NEXT: psrad $16, %xmm14
-; SSE-NEXT: packssdw %xmm0, %xmm14
-; SSE-NEXT: psrad $16, %xmm3
-; SSE-NEXT: psrad $16, %xmm5
-; SSE-NEXT: packssdw %xmm3, %xmm5
+; SSE-NEXT: movdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm4 # 16-byte Reload
+; SSE-NEXT: psrad $16, %xmm4
+; SSE-NEXT: packssdw %xmm0, %xmm4
+; SSE-NEXT: movdqa %xmm4, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
; SSE-NEXT: movdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
; SSE-NEXT: psrad $16, %xmm0
-; SSE-NEXT: movdqa (%rsp), %xmm6 # 16-byte Reload
-; SSE-NEXT: psrad $16, %xmm6
-; SSE-NEXT: packssdw %xmm0, %xmm6
-; SSE-NEXT: movdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
+; SSE-NEXT: movdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm4 # 16-byte Reload
+; SSE-NEXT: psrad $16, %xmm4
+; SSE-NEXT: packssdw %xmm0, %xmm4
+; SSE-NEXT: movdqa %xmm4, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; SSE-NEXT: movdqa (%rsp), %xmm0 # 16-byte Reload
; SSE-NEXT: psrad $16, %xmm0
-; SSE-NEXT: movdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm3 # 16-byte Reload
-; SSE-NEXT: psrad $16, %xmm3
-; SSE-NEXT: packssdw %xmm0, %xmm3
+; SSE-NEXT: movdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm4 # 16-byte Reload
+; SSE-NEXT: psrad $16, %xmm4
+; SSE-NEXT: packssdw %xmm0, %xmm4
+; SSE-NEXT: psrad $16, %xmm7
+; SSE-NEXT: psrad $16, %xmm8
+; SSE-NEXT: packssdw %xmm7, %xmm8
; SSE-NEXT: movdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
; SSE-NEXT: psrad $16, %xmm0
+; SSE-NEXT: movdqa {{[-0-9]+}}(%r{{[sb]}}p), %xmm7 # 16-byte Reload
; SSE-NEXT: psrad $16, %xmm7
; SSE-NEXT: packssdw %xmm0, %xmm7
-; SSE-NEXT: psrad $16, %xmm8
+; SSE-NEXT: psrad $16, %xmm6
; SSE-NEXT: psrad $16, %xmm2
-; SSE-NEXT: packssdw %xmm8, %xmm2
+; SSE-NEXT: packssdw %xmm6, %xmm2
+; SSE-NEXT: psrad $16, %xmm12
+; SSE-NEXT: psrad $16, %xmm5
+; SSE-NEXT: packssdw %xmm12, %xmm5
; SSE-NEXT: movdqa %xmm1, 96(%rsi)
-; SSE-NEXT: movdqa %xmm4, 32(%rsi)
-; SSE-NEXT: movdqa %xmm15, 112(%rsi)
-; SSE-NEXT: movdqa %xmm13, 48(%rsi)
-; SSE-NEXT: movdqa %xmm11, 64(%rsi)
-; SSE-NEXT: movdqa %xmm9, (%rsi)
-; SSE-NEXT: movdqa %xmm10, 80(%rsi)
-; SSE-NEXT: movdqa %xmm12, 16(%rsi)
+; SSE-NEXT: movdqa %xmm3, 112(%rsi)
+; SSE-NEXT: movdqa %xmm15, 64(%rsi)
+; SSE-NEXT: movdqa %xmm14, 80(%rsi)
+; SSE-NEXT: movdqa %xmm11, 32(%rsi)
+; SSE-NEXT: movdqa %xmm9, 48(%rsi)
+; SSE-NEXT: movdqa %xmm10, (%rsi)
+; SSE-NEXT: movdqa %xmm13, 16(%rsi)
+; SSE-NEXT: movdqa %xmm5, 112(%rdx)
; SSE-NEXT: movdqa %xmm2, 96(%rdx)
-; SSE-NEXT: movdqa %xmm7, 112(%rdx)
-; SSE-NEXT: movdqa %xmm3, 64(%rdx)
-; SSE-NEXT: movdqa %xmm6, 80(%rdx)
-; SSE-NEXT: movdqa %xmm5, 32(%rdx)
-; SSE-NEXT: movdqa %xmm14, 48(%rdx)
+; SSE-NEXT: movdqa %xmm7, 80(%rdx)
+; SSE-NEXT: movdqa %xmm8, 64(%rdx)
+; SSE-NEXT: movdqa %xmm4, 48(%rdx)
; SSE-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
-; SSE-NEXT: movaps %xmm0, (%rdx)
+; SSE-NEXT: movaps %xmm0, 32(%rdx)
; SSE-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
; SSE-NEXT: movaps %xmm0, 16(%rdx)
+; SSE-NEXT: movaps {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Reload
+; SSE-NEXT: movaps %xmm0, (%rdx)
; SSE-NEXT: addq $40, %rsp
; SSE-NEXT: retq
;
diff --git a/llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-4.ll b/llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-4.ll
index 8eb26687600404..22e353f2502d66 100644
--- a/llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-4.ll
+++ b/llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-4.ll
@@ -1138,11 +1138,11 @@ define void @load_i16_stride4_vf32(ptr %in.vec, ptr %out.vec0, ptr %out.vec1, pt
; SSE-LABEL: load_i16_stride4_vf32:
; SSE: # %bb.0:
; SSE-NEXT: subq $248, %rsp
-; SSE-NEXT: movdqa 224(%rdi), %xmm3
+; SSE-NEXT: movdqa 160(%rdi), %xmm3
; SSE-NEXT: movdqa %xmm3, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; SSE-NEXT: movdqa 192(%rdi), %xmm4
+; SSE-NEXT: movdqa 128(%rdi), %xmm4
; SSE-NEXT: movdqa %xmm4, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; SSE-NEXT: movdqa 208(%rdi), %xmm5
+; SSE-NEXT: movdqa 144(%rdi), %xmm5
; SSE-NEXT: movdqa %xmm5, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
; SSE-NEXT: movdqa 96(%rdi), %xmm2
; SSE-NEXT: movdqa %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
@@ -1162,22 +1162,22 @@ define void @load_i16_stride4_vf32(ptr %in.vec, ptr %out.vec0, ptr %out.vec1, pt
; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm6[0,2,2,3]
; SSE-NEXT: movdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
; SSE-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[0,1,0,2,4,5,6,7]
-; SSE-NEXT: pshufd {{.*#+}} xmm10 = xmm2[0,2,2,3]
-; SSE-NEXT: pshuflw {{.*#+}} xmm2 = xmm10[0,1,0,2,4,5,6,7]
+; SSE-NEXT: pshufd {{.*#+}} xmm13 = xmm2[0,2,2,3]
+; SSE-NEXT: pshuflw {{.*#+}} xmm2 = xmm13[0,1,0,2,4,5,6,7]
; SSE-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1]
; SSE-NEXT: movsd {{.*#+}} xmm2 = xmm1[0],xmm2[1]
; SSE-NEXT: movapd %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; SSE-NEXT: pshufd {{.*#+}} xmm9 = xmm5[0,2,2,3]
-; SSE-NEXT: pshuflw {{.*#+}} xmm0 = xmm9[0,2,2,3,4,5,6,7]
-; SSE-NEXT: pshufd {{.*#+}} xmm8 = xmm4[0,2,2,3]
-; SSE-NEXT: pshuflw {{.*#+}} xmm1 = xmm8[0,2,2,3,4,5,6,7]
+; SSE-NEXT: pshufd {{.*#+}} xmm15 = xmm5[0,2,2,3]
+; SSE-NEXT: pshuflw {{.*#+}} xmm0 = xmm15[0,2,2,3,4,5,6,7]
+; SSE-NEXT: pshufd {{.*#+}} xmm11 = xmm4[0,2,2,3]
+; SSE-NEXT: pshuflw {{.*#+}} xmm1 = xmm11[0,2,2,3,4,5,6,7]
; SSE-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; SSE-NEXT: movdqa 240(%rdi), %xmm0
+; SSE-NEXT: movdqa 176(%rdi), %xmm0
; SSE-NEXT: movdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; SSE-NEXT: pshufd {{.*#+}} xmm6 = xmm0[0,2,2,3]
-; SSE-NEXT: pshuflw {{.*#+}} xmm0 = xmm6[0,1,0,2,4,5,6,7]
-; SSE-NEXT: pshufd {{.*#+}} xmm7 = xmm3[0,2,2,3]
-; SSE-NEXT: pshuflw {{.*#+}} xmm2 = xmm7[0,1,0,2,4,5,6,7]
+; SSE-NEXT: pshufd {{.*#+}} xmm10 = xmm0[0,2,2,3]
+; SSE-NEXT: pshuflw {{.*#+}} xmm0 = xmm10[0,1,0,2,4,5,6,7]
+; SSE-NEXT: pshufd {{.*#+}} xmm9 = xmm3[0,2,2,3]
+; SSE-NEXT: pshuflw {{.*#+}} xmm2 = xmm9[0,1,0,2,4,5,6,7]
; SSE-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1]
; SSE-NEXT: movsd {{.*#+}} xmm2 = xmm1[0],xmm2[1]
; SSE-NEXT: movapd %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
@@ -1185,8 +1185,8 @@ define void @load_i16_stride4_vf32(ptr %in.vec, ptr %out.vec0, ptr %out.vec1, pt
; SSE-NEXT: movdqa %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
; SSE-NEXT: movdqa 16(%rdi), %xmm0
; SSE-NEXT: movdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; SSE-NEXT: pshufd {{.*#+}} xmm5 = xmm0[0,2,2,3]
-; SSE-NEXT: pshuflw {{.*#+}} xmm0 = xmm5[0,2,2,3,4,5,6,7]
+; SSE-NEXT: pshufd {{.*#+}} xmm8 = xmm0[0,2,2,3]
+; SSE-NEXT: pshuflw {{.*#+}} xmm0 = xmm8[0,2,2,3,4,5,6,7]
; SSE-NEXT: pshufd {{.*#+}} xmm12 = xmm1[0,2,2,3]
; SSE-NEXT: pshuflw {{.*#+}} xmm1 = xmm12[0,2,2,3,4,5,6,7]
; SSE-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
@@ -1194,33 +1194,33 @@ define void @load_i16_stride4_vf32(ptr %in.vec, ptr %out.vec0, ptr %out.vec1, pt
; SSE-NEXT: movdqa %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
; SSE-NEXT: movdqa 48(%rdi), %xmm0
; SSE-NEXT: movdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; SSE-NEXT: pshufd {{.*#+}} xmm11 = xmm0[0,2,2,3]
-; SSE-NEXT: pshuflw {{.*#+}} xmm0 = xmm11[0,1,0,2,4,5,6,7]
-; SSE-NEXT: pshufd {{.*#+}} xmm14 = xmm2[0,2,2,3]
-; SSE-NEXT: pshuflw {{.*#+}} xmm2 = xmm14[0,1,0,2,4,5,6,7]
+; SSE-NEXT: pshufd {{.*#+}} xmm7 = xmm0[0,2,2,3]
+; SSE-NEXT: pshuflw {{.*#+}} xmm0 = xmm7[0,1,0,2,4,5,6,7]
+; SSE-NEXT: pshufd {{.*#+}} xmm6 = xmm2[0,2,2,3]
+; SSE-NEXT: pshuflw {{.*#+}} xmm2 = xmm6[0,1,0,2,4,5,6,7]
; SSE-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm0[0],xmm2[1],xmm0[1]
; SSE-NEXT: movsd {{.*#+}} xmm2 = xmm1[0],xmm2[1]
; SSE-NEXT: movapd %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; SSE-NEXT: movdqa 128(%rdi), %xmm0
+; SSE-NEXT: movdqa 192(%rdi), %xmm0
; SSE-NEXT: movdqa %xmm0, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; SSE-NEXT: movdqa 144(%rdi), %xmm1
+; SSE-NEXT: movdqa 208(%rdi), %xmm1
; SSE-NEXT: movdqa %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; SSE-NEXT: pshufd {{.*#+}} xmm13 = xmm1[0,2,2,3]
-; SSE-NEXT: pshuflw {{.*#+}} xmm1 = xmm13[0,2,2,3,4,5,6,7]
+; SSE-NEXT: pshufd {{.*#+}} xmm5 = xmm1[0,2,2,3]
+; SSE-NEXT: pshuflw {{.*#+}} xmm1 = xmm5[0,2,2,3,4,5,6,7]
; SSE-NEXT: pshufd {{.*#+}} xmm4 = xmm0[0,2,2,3]
; SSE-NEXT: pshuflw {{.*#+}} xmm0 = xmm4[0,2,2,3,4,5,6,7]
; SSE-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
-; SSE-NEXT: movdqa 160(%rdi), %xmm2
+; SSE-NEXT: movdqa 224(%rdi), %xmm2
; SSE-NEXT: movdqa %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; SSE-NEXT: movdqa 176(%rdi), %xmm1
+; SSE-NEXT: movdqa 240(%rdi), %xmm1
; SSE-NEXT: movdqa %xmm1, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
; SSE-NEXT: pshufd {{.*#+}} xmm3 = xmm1[0,2,2,3]
; SSE-NEXT: pshuflw {{.*#+}} xmm1 = xmm3[0,1,0,2,4,5,6,7]
; SSE-NEXT: pshufd {{.*#+}} xmm2 = xmm2[0,2,2,3]
-; SSE-NEXT: pshuflw {{.*#+}} xmm15 = xmm2[0,1,0,2,4,5,6,7]
-; SSE-NEXT: punpckldq {{.*#+}} xmm15 = xmm15[0],xmm1[0],xmm15[1],xmm1[1]
-; SSE-NEXT: movsd {{.*#+}} xmm15 = xmm0[0],xmm15[1]
-; SSE-NEXT: movapd %xmm15, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; SSE-NEXT: pshuflw {{.*#+}} xmm14 = xmm2[0,1,0,2,4,5,6,7]
+; SSE-NEXT: punpckldq {{.*#+}} xmm14 = xmm14[0],xmm1[0],xmm14[1],xmm1[1]
+; SSE-NEXT: movsd {{.*#+}} xmm14 = xmm0[0],xmm14[1]
+; SSE-NEXT: movapd %xmm14, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
; SSE-NEXT: pshuflw $237, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
; SSE-NEXT: # xmm0 = mem[1,3,2,3,4,5,6,7]
; SSE-NEXT: pshuflw $237, (%rsp), %xmm1 # 16-byte Folded Reload
@@ -1228,33 +1228,32 @@ define void @load_i16_stride4_vf32(ptr %in.vec, ptr %out.vec0, ptr %out.vec1, pt
; SSE-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
; SSE-NEXT: pshuflw $212, {{[-0-9]+}}(%r{{[sb]}}p), %xmm0 # 16-byte Folded Reload
; SSE-NEXT: # xmm0 = mem[0,1,1,3,4,5,6,7]
-; SSE-NEXT: pshuflw {{.*#+}} xmm10 = xmm10[0,1,1,3,4,5,6,7]
-; SSE-NEXT: punpckldq {{.*#+}} xmm10 = xmm10[0],xmm0[0],xmm10[1],xmm0[1]
-; SSE-NEXT: movsd {{.*#+}} xmm10 = xmm1[0],xmm10[1]
-; SSE-NEXT: movapd %xmm10, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
-; SSE-NEXT: pshuflw {{.*#+}} xmm0 = xmm9[1,3,2,3,4,5,6,7]
-; SSE-NEXT: pshuflw {{.*#+}} xmm1 = xmm8[1,3,2,3,4,5,6,7]
+; SSE-NEXT: pshuflw {{.*#+}} xmm13 = xmm13[0,1,1,3,4,5,6,7]
+; SSE-NEXT: punpckldq {{.*#+}} xmm13 = xmm13[0],xmm0[0],xmm13[1],xmm0[1]
+; SSE-NEXT: movsd {{.*#+}} xmm13 = xmm1[0],xmm13[1]
+; SSE-NEXT: movapd %xmm13, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
+; SSE-NEXT: pshuflw {{.*#+}} xmm0 = xmm15[1,3,2,3,4,5,6,7]
+; SSE-NEXT: pshuflw {{.*#+}} xmm1 = xmm11[1,3,2,3,4,5,6,7]
; SSE-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
-; SSE-NEXT: pshuflw {{.*#+}} xmm0 = xmm6[0,1,1,3,4,5,6,7]
-; SSE-NEXT: pshuflw {{.*#+}} xmm6 = xmm7[0,1,1,3,4,5,6,7]
-; SSE-NEXT: punpckldq {{.*#+}} xmm6 = xmm6[0],xmm0[0],xmm6[1],xmm0[1]
-; SSE-NEXT: movsd {{.*#+}} xmm6 = xmm1[0],xmm6[1]
-; SSE-NEXT: movapd %xmm6, (%rsp) # 16-byte Spill
-; SSE-NEXT: pshuflw {{.*#+}} xmm0 = xmm5[1,3,2,3,4,5,6,7]
+; SSE-NEXT: pshuflw {{.*#+}} xmm0 = xmm10[0,1,1,3,4,5,6,7]
+; SSE-NEXT: pshuflw {{.*#+}} xmm9 = xmm9[0,1,1,3,4,5,6,7]
+; SSE-NEXT: punpckldq {{.*#+}} xmm9 = xmm9[0],xmm0[0],xmm9...
[truncated]
You can test this locally with the following command:
git-clang-format --diff b522675816d6d5c0b0ae23e14115f4aa5a081583 5d8c80dca1b316494fa29ed8770c696274ea791d -- llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
View the diff from clang-format here:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 5934e3ca1b..f82d574ce9 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -15163,9 +15163,8 @@ SDValue DAGCombiner::visitBITCAST(SDNode *N) {
if (TLI.isLoadBitCastBeneficial(N0.getValueType(), VT, DAG,
*LN0->getMemOperand())) {
- SDValue Load =
- DAG.getLoad(VT, SDLoc(N), LN0->getChain(), LN0->getBasePtr(),
- LN0->getMemOperand());
+ SDValue Load = DAG.getLoad(VT, SDLoc(N), LN0->getChain(),
+ LN0->getBasePtr(), LN0->getMemOperand());
DAG.ReplaceAllUsesOfValueWith(N0.getValue(1), Load.getValue(1));
return Load;
}
The update warnings are not a problem - we reuse the same check-prefixes for all the vector-interleaved-* tests to make them easier to maintain, so there are some cases where prefixes don't get used.
Are you able to create a test for the lost atomic property? I haven't checked, but my guess is the x86 cases are losing the original dereferenceable memory size or something, as the SSE/AVX cases are splitting 512-bit loads down to 4 x 128 bits, so this patch is helping.
Shouldn't a seq_cst load be an ISD::ATOMIC_LOAD, not an ISD::LOAD?
This change causes the same X86 changes.
getAlign() combines the base alignment of the MMO with the MMO offset. This can cause the base alignment of the newly created MMO to become smaller than that of the original MMO, which can produce different results if the MMO is later split into multiple MMOs with different offsets.
I don't quite follow how this could go wrong, as folding the bitcast doesn't change the memory access in any way. Shouldn't the later splitting also take the original offset into account?
I may have explained that badly. The code should have used getOriginalAlign to begin with - not the first case of this I've seen. Using getAlign probably decreased the alignment based on the trailing zeros of the current offset. A later split probably produced a new offset that had more trailing zeros. By using the original alignment we are able to infer a larger alignment for this new offset; with getAlign it has been overly reduced, so the new offset can't increase it. This patch has the same effect as using getOriginalAlign.
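A standalone illustration of this point (plain C++, with the alignment math reproduced inline so nothing depends on LLVM headers; the concrete numbers and function name are made up): folding the offset into the alignment up front loses information that a later split could otherwise recover.

```cpp
#include <cstdint>
#include <cstdio>

// Largest power of two dividing both the base alignment and the offset,
// i.e. the alignment that can be guaranteed at (base + offset).
static uint64_t alignAt(uint64_t BaseAlign, uint64_t Offset) {
  if (Offset == 0)
    return BaseAlign;
  uint64_t OffsetAlign = Offset & (~Offset + 1); // lowest set bit of Offset
  return BaseAlign < OffsetAlign ? BaseAlign : OffsetAlign;
}

int main() {
  // Original MMO: 16-byte base alignment, currently used at offset 4.
  uint64_t BaseAlign = 16, CurOffset = 4;

  // getAlign()-style value: base alignment already folded with the offset.
  uint64_t Reduced = alignAt(BaseAlign, CurOffset); // 4

  // A later combine splits the access and lands 12 bytes further on,
  // i.e. at offset 16 from the original base.
  uint64_t Extra = 12;
  uint64_t FromOriginalAlign = alignAt(BaseAlign, CurOffset + Extra); // 16
  uint64_t FromReducedAlign = alignAt(Reduced, Extra);                // 4

  std::printf("original base align recovers %llu, pre-reduced align only %llu\n",
              (unsigned long long)FromOriginalAlign,
              (unsigned long long)FromReducedAlign);
  return 0;
}
```

Keeping the original base alignment lets the split access at offset 16 still be known to be 16-byte aligned; the pre-reduced value can never climb back above 4.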
Good question - it seems SystemZ::lowerATOMIC_LOAD() simply replaces the ATOMIC_LOAD with a normal LOAD, which is generally OK for the architecture since all naturally aligned loads are atomic. The MachineMemOperand has the isAtomic() method, which is used by MemSDNode::isSimple(). LoadSDNode and AtomicSDNode are separate subclasses derived from MemSDNode. This is a little confusing to me: if isAtomic() is only relevant for AtomicSDNode, then AtomicSDNode should override isSimple() to always return 'false', and MemSDNode should not care about isAtomic(). I agree that the intent may actually be that MMO->isAtomic() on any regular LOAD/STORE is OK to ignore, in which case this patch is not really relevant..? Maybe a comment documenting this would be more appropriate if that is the case. On the other hand, if DAGCombiner is indeed free to ignore MMO->isAtomic() on any STORE/LOAD, then the SystemZ lowering makes me a little worried: could DAGCombiner ever e.g. combine loads into an underaligned result, say two 4-byte loads merged into a single 8-byte load aligned only to 4 bytes?
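To spell out the relationship being described, a toy sketch built only from the description above (illustrative names and bodies, not copied from the LLVM headers):

```cpp
// Toy model of the hierarchy discussed above (illustrative only).
struct MemOperandModel {
  bool Atomic = false;   // what the memory operand's isAtomic() would report
  bool Volatile = false;
};

struct MemSDNodeModel {  // common base of both node kinds
  MemOperandModel MMO;
  bool isAtomic() const { return MMO.Atomic; }
  bool isVolatile() const { return MMO.Volatile; }
  // The point of contention: isSimple() consults the MMO's atomic bit even
  // for plain loads/stores, not just for ATOMIC_* nodes.
  bool isSimple() const { return !isAtomic() && !isVolatile(); }
};

struct LoadSDNodeModel : MemSDNodeModel {};   // ISD::LOAD
struct AtomicSDNodeModel : MemSDNodeModel {}; // ISD::ATOMIC_LOAD etc.

int main() {
  LoadSDNodeModel L;
  L.MMO.Atomic = true;         // a plain LOAD carrying an atomic MMO
  return L.isSimple() ? 1 : 0; // returns 0: non-simple despite being ISD::LOAD
}
```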
We have various TLI.allowsMemoryAccess checks for load combines, which should prevent merging consecutive atomic loads etc. Is it worth doing the OriginalAlign fix as a preliminary patch?
It seems this patch actually shouldn't go in, since it is probably not intentional to have DAGCombiner work on LOADs that are actually atomic. In other words, the MMO atomic state would generally be ignored on an ISD::LOAD, so if anything SystemZ::lowerATOMIC_LOAD() should be improved. OK to close this?
I think this should still go ahead. I still think it's a good idea to get rid of ISD::ATOMIC_LOAD and migrate to using loads with atomic operands. This is a code simplification anyway.
Pulled out of llvm#75626 to allow it to focus on atomic loads.
Not sure I agree with that.
Agreed, this patch is good to submit as a simplification and bugfix for the alignment, regardless of the issues around ATOMIC_LOAD.