[DAG] Preserve NUW when reassociating #87621
Similarly to the generic case below, preserve the NUW flag when reassociating adds with constants.
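The reasoning behind keeping the flag can be checked by brute force. The following is a minimal sketch, not code from the patch: it models the fold `(add (add x, c1), c2) -> (add x, c1 + c2)` on 8-bit values and counts the cases where the reassociated add would wrap even though the original flags promised no unsigned wrap. The helper names `addIsNUW`, `countViolations`, and `selfCheck` are hypothetical, and the 8-bit width is an assumption chosen only to keep the search small.

```cpp
#include <cassert>
#include <cstdint>

// True if a + b does not wrap in 8-bit unsigned arithmetic.
static bool addIsNUW(uint8_t a, uint8_t b) {
  return static_cast<unsigned>(a) + static_cast<unsigned>(b) <= 0xFFu;
}

// Counts (x, c1, c2) triples for which the reassociated add
// x + (c1 + c2) wraps, given that the outer add (x + c1) + c2 is NUW.
// If requireInnerNUW is set, the inner add x + c1 must be NUW as well,
// which mirrors the condition the patch checks before setting the flag.
static unsigned countViolations(bool requireInnerNUW) {
  unsigned violations = 0;
  for (unsigned x = 0; x < 256; ++x) {
    for (unsigned c1 = 0; c1 < 256; ++c1) {
      for (unsigned c2 = 0; c2 < 256; ++c2) {
        const uint8_t inner = static_cast<uint8_t>(x + c1);
        const bool innerNUW =
            addIsNUW(static_cast<uint8_t>(x), static_cast<uint8_t>(c1));
        const bool outerNUW = addIsNUW(inner, static_cast<uint8_t>(c2));
        if (!outerNUW || (requireInnerNUW && !innerNUW))
          continue;
        // The constants are folded modulo 2^8, like a fixed-width constant fold.
        const uint8_t folded = static_cast<uint8_t>(c1 + c2);
        if (!addIsNUW(static_cast<uint8_t>(x), folded))
          ++violations;
      }
    }
  }
  return violations;
}

// With both flags required there are no violations; with only the outer
// flag, counterexamples exist (for example x = 255, c1 = 1, c2 = 1).
void selfCheck() {
  assert(countViolations(true) == 0);
  assert(countViolations(false) > 0);
}
```

Under these assumptions, the check suggests why both the inner and the outer add need the flag before it is set on the new node: requiring only the outer flag admits cases where the folded add wraps.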
@llvm/pr-subscribers-backend-webassembly
@llvm/pr-subscribers-backend-amdgpu

Author: Piotr Sobczak (piotrAMD)

Changes

Similarly to the generic case below, preserve the NUW flag when reassociating adds with constants.

Patch is 49.19 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/87621.diff

4 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 0a473180538a5d..c7393be201dd74 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -1166,8 +1166,13 @@ SDValue DAGCombiner::reassociateOpsCommutative(unsigned Opc, const SDLoc &DL,
   if (DAG.isConstantIntBuildVectorOrConstantInt(peekThroughBitcasts(N01))) {
     if (DAG.isConstantIntBuildVectorOrConstantInt(peekThroughBitcasts(N1))) {
       // Reassociate: (op (op x, c1), c2) -> (op x, (op c1, c2))
-      if (SDValue OpNode = DAG.FoldConstantArithmetic(Opc, DL, VT, {N01, N1}))
-        return DAG.getNode(Opc, DL, VT, N00, OpNode);
+      if (SDValue OpNode = DAG.FoldConstantArithmetic(Opc, DL, VT, {N01, N1})) {
+        SDNodeFlags NewFlags;
+        if (N0.getOpcode() == ISD::ADD && N0->getFlags().hasNoUnsignedWrap() &&
+            Flags.hasNoUnsignedWrap())
+          NewFlags.setNoUnsignedWrap(true);
+        return DAG.getNode(Opc, DL, VT, N00, OpNode, NewFlags);
+      }
       return SDValue();
     }
     if (TLI.isReassocProfitable(DAG, N0, N1)) {
diff --git a/llvm/test/CodeGen/AMDGPU/bf16.ll b/llvm/test/CodeGen/AMDGPU/bf16.ll
index 98658834e89784..bf4302c156d83d 100644
--- a/llvm/test/CodeGen/AMDGPU/bf16.ll
+++ b/llvm/test/CodeGen/AMDGPU/bf16.ll
@@ -5678,22 +5678,18 @@ define { <32 x i32>, bfloat } @test_overflow_stack(bfloat %a, <32 x i32> %b) {
; GFX11-NEXT: scratch_load_b32 v33, off, s32 offset:8
; GFX11-NEXT: scratch_load_b32 v32, off, s32 offset:4
; GFX11-NEXT: scratch_load_b32 v31, off, s32
-; GFX11-NEXT: v_readfirstlane_b32 s0, v0
-; GFX11-NEXT: s_clause 0x4
-; GFX11-NEXT: scratch_store_b128 off, v[18:21], s0 offset:64
-; GFX11-NEXT: scratch_store_b128 off, v[10:13], s0 offset:32
-; GFX11-NEXT: scratch_store_b128 off, v[6:9], s0 offset:16
-; GFX11-NEXT: scratch_store_b128 off, v[2:5], s0
-; GFX11-NEXT: scratch_store_b16 off, v1, s0 offset:128
-; GFX11-NEXT: s_add_i32 s1, s0, 0x70
-; GFX11-NEXT: s_add_i32 s2, s0, 0x60
-; GFX11-NEXT: s_add_i32 s3, s0, 0x50
-; GFX11-NEXT: s_add_i32 s0, s0, 48
+; GFX11-NEXT: s_clause 0x5
+; GFX11-NEXT: scratch_store_b128 v0, v[22:25], off offset:80
+; GFX11-NEXT: scratch_store_b128 v0, v[18:21], off offset:64
+; GFX11-NEXT: scratch_store_b128 v0, v[14:17], off offset:48
+; GFX11-NEXT: scratch_store_b128 v0, v[10:13], off offset:32
+; GFX11-NEXT: scratch_store_b128 v0, v[6:9], off offset:16
+; GFX11-NEXT: scratch_store_b128 v0, v[2:5], off
; GFX11-NEXT: s_waitcnt vmcnt(0)
-; GFX11-NEXT: scratch_store_b128 off, v[30:33], s1
-; GFX11-NEXT: scratch_store_b128 off, v[26:29], s2
-; GFX11-NEXT: scratch_store_b128 off, v[22:25], s3
-; GFX11-NEXT: scratch_store_b128 off, v[14:17], s0
+; GFX11-NEXT: s_clause 0x2
+; GFX11-NEXT: scratch_store_b128 v0, v[30:33], off offset:112
+; GFX11-NEXT: scratch_store_b128 v0, v[26:29], off offset:96
+; GFX11-NEXT: scratch_store_b16 v0, v1, off offset:128
; GFX11-NEXT: s_setpc_b64 s[30:31]
%ins.0 = insertvalue { <32 x i32>, bfloat } poison, <32 x i32> %b, 0
%ins.1 = insertvalue { <32 x i32>, bfloat } %ins.0 ,bfloat %a, 1
@@ -8827,19 +8823,6 @@ define <32 x double> @global_extload_v32bf16_to_v32f64(ptr addrspace(1) %ptr) {
; GFX11-NEXT: global_load_u16 v32, v[1:2], off offset:54
; GFX11-NEXT: global_load_u16 v33, v[1:2], off offset:58
; GFX11-NEXT: global_load_u16 v1, v[1:2], off offset:62
-; GFX11-NEXT: v_readfirstlane_b32 s0, v0
-; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
-; GFX11-NEXT: s_add_i32 s1, s0, 0xf0
-; GFX11-NEXT: s_add_i32 s2, s0, 0xe0
-; GFX11-NEXT: s_add_i32 s3, s0, 0xd0
-; GFX11-NEXT: s_add_i32 s4, s0, 0xc0
-; GFX11-NEXT: s_add_i32 s5, s0, 0xb0
-; GFX11-NEXT: s_add_i32 s6, s0, 0xa0
-; GFX11-NEXT: s_add_i32 s7, s0, 0x90
-; GFX11-NEXT: s_add_i32 s8, s0, 0x70
-; GFX11-NEXT: s_add_i32 s9, s0, 0x60
-; GFX11-NEXT: s_add_i32 s10, s0, 0x50
-; GFX11-NEXT: s_add_i32 s11, s0, 48
; GFX11-NEXT: s_waitcnt vmcnt(31)
; GFX11-NEXT: v_lshlrev_b32_e32 v39, 16, v3
; GFX11-NEXT: s_waitcnt vmcnt(30)
@@ -8936,23 +8919,23 @@ define <32 x double> @global_extload_v32bf16_to_v32f64(ptr addrspace(1) %ptr) {
; GFX11-NEXT: v_cvt_f64_f32_e32 v[5:6], v5
; GFX11-NEXT: v_cvt_f64_f32_e32 v[3:4], v2
; GFX11-NEXT: v_cvt_f64_f32_e32 v[1:2], v37
-; GFX11-NEXT: scratch_store_b128 off, v[96:99], s1
-; GFX11-NEXT: scratch_store_b128 off, v[84:87], s2
-; GFX11-NEXT: scratch_store_b128 off, v[80:83], s3
-; GFX11-NEXT: scratch_store_b128 off, v[68:71], s4
-; GFX11-NEXT: scratch_store_b128 off, v[64:67], s5
-; GFX11-NEXT: scratch_store_b128 off, v[52:55], s6
-; GFX11-NEXT: scratch_store_b128 off, v[48:51], s7
-; GFX11-NEXT: scratch_store_b128 off, v[33:36], s0 offset:128
-; GFX11-NEXT: scratch_store_b128 off, v[29:32], s8
-; GFX11-NEXT: scratch_store_b128 off, v[25:28], s9
-; GFX11-NEXT: scratch_store_b128 off, v[21:24], s10
-; GFX11-NEXT: scratch_store_b128 off, v[17:20], s0 offset:64
-; GFX11-NEXT: scratch_store_b128 off, v[13:16], s11
-; GFX11-NEXT: s_clause 0x2
-; GFX11-NEXT: scratch_store_b128 off, v[9:12], s0 offset:32
-; GFX11-NEXT: scratch_store_b128 off, v[5:8], s0 offset:16
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s0
+; GFX11-NEXT: s_clause 0xf
+; GFX11-NEXT: scratch_store_b128 v0, v[96:99], off offset:240
+; GFX11-NEXT: scratch_store_b128 v0, v[84:87], off offset:224
+; GFX11-NEXT: scratch_store_b128 v0, v[80:83], off offset:208
+; GFX11-NEXT: scratch_store_b128 v0, v[68:71], off offset:192
+; GFX11-NEXT: scratch_store_b128 v0, v[64:67], off offset:176
+; GFX11-NEXT: scratch_store_b128 v0, v[52:55], off offset:160
+; GFX11-NEXT: scratch_store_b128 v0, v[48:51], off offset:144
+; GFX11-NEXT: scratch_store_b128 v0, v[33:36], off offset:128
+; GFX11-NEXT: scratch_store_b128 v0, v[29:32], off offset:112
+; GFX11-NEXT: scratch_store_b128 v0, v[25:28], off offset:96
+; GFX11-NEXT: scratch_store_b128 v0, v[21:24], off offset:80
+; GFX11-NEXT: scratch_store_b128 v0, v[17:20], off offset:64
+; GFX11-NEXT: scratch_store_b128 v0, v[13:16], off offset:48
+; GFX11-NEXT: scratch_store_b128 v0, v[9:12], off offset:32
+; GFX11-NEXT: scratch_store_b128 v0, v[5:8], off offset:16
+; GFX11-NEXT: scratch_store_b128 v0, v[1:4], off
; GFX11-NEXT: s_setpc_b64 s[30:31]
%load = load <32 x bfloat>, ptr addrspace(1) %ptr
%fpext = fpext <32 x bfloat> %load to <32 x double>
diff --git a/llvm/test/CodeGen/AMDGPU/function-returns.ll b/llvm/test/CodeGen/AMDGPU/function-returns.ll
index acadee27981710..401cbce00ac9a8 100644
--- a/llvm/test/CodeGen/AMDGPU/function-returns.ll
+++ b/llvm/test/CodeGen/AMDGPU/function-returns.ll
@@ -1561,34 +1561,28 @@ define <33 x i32> @v33i32_func_void() #0 {
; GFX11-NEXT: buffer_load_b128 v[9:12], off, s[0:3], 0 offset:80
; GFX11-NEXT: buffer_load_b128 v[13:16], off, s[0:3], 0 offset:64
; GFX11-NEXT: buffer_load_b128 v[17:20], off, s[0:3], 0 offset:48
-; GFX11-NEXT: buffer_load_b128 v[21:24], off, s[0:3], 0 offset:16
-; GFX11-NEXT: buffer_load_b128 v[25:28], off, s[0:3], 0
-; GFX11-NEXT: buffer_load_b128 v[29:32], off, s[0:3], 0 offset:32
+; GFX11-NEXT: buffer_load_b128 v[21:24], off, s[0:3], 0 offset:32
+; GFX11-NEXT: buffer_load_b128 v[25:28], off, s[0:3], 0 offset:16
+; GFX11-NEXT: buffer_load_b128 v[29:32], off, s[0:3], 0
; GFX11-NEXT: buffer_load_b32 v33, off, s[0:3], 0 offset:128
-; GFX11-NEXT: v_readfirstlane_b32 s0, v0
-; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
-; GFX11-NEXT: s_add_i32 s1, s0, 0x70
-; GFX11-NEXT: s_add_i32 s2, s0, 0x60
-; GFX11-NEXT: s_add_i32 s3, s0, 0x50
-; GFX11-NEXT: s_add_i32 s4, s0, 48
; GFX11-NEXT: s_waitcnt vmcnt(8)
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
+; GFX11-NEXT: scratch_store_b128 v0, v[1:4], off offset:112
; GFX11-NEXT: s_waitcnt vmcnt(7)
-; GFX11-NEXT: scratch_store_b128 off, v[5:8], s2
+; GFX11-NEXT: scratch_store_b128 v0, v[5:8], off offset:96
; GFX11-NEXT: s_waitcnt vmcnt(6)
-; GFX11-NEXT: scratch_store_b128 off, v[9:12], s3
+; GFX11-NEXT: scratch_store_b128 v0, v[9:12], off offset:80
; GFX11-NEXT: s_waitcnt vmcnt(5)
-; GFX11-NEXT: scratch_store_b128 off, v[13:16], s0 offset:64
+; GFX11-NEXT: scratch_store_b128 v0, v[13:16], off offset:64
; GFX11-NEXT: s_waitcnt vmcnt(4)
-; GFX11-NEXT: scratch_store_b128 off, v[17:20], s4
+; GFX11-NEXT: scratch_store_b128 v0, v[17:20], off offset:48
; GFX11-NEXT: s_waitcnt vmcnt(3)
-; GFX11-NEXT: scratch_store_b128 off, v[21:24], s0 offset:16
+; GFX11-NEXT: scratch_store_b128 v0, v[21:24], off offset:32
; GFX11-NEXT: s_waitcnt vmcnt(2)
-; GFX11-NEXT: scratch_store_b128 off, v[25:28], s0
+; GFX11-NEXT: scratch_store_b128 v0, v[25:28], off offset:16
; GFX11-NEXT: s_waitcnt vmcnt(1)
-; GFX11-NEXT: scratch_store_b128 off, v[29:32], s0 offset:32
+; GFX11-NEXT: scratch_store_b128 v0, v[29:32], off
; GFX11-NEXT: s_waitcnt vmcnt(0)
-; GFX11-NEXT: scratch_store_b32 off, v33, s0 offset:128
+; GFX11-NEXT: scratch_store_b32 v0, v33, off offset:128
; GFX11-NEXT: s_setpc_b64 s[30:31]
%ptr = load volatile ptr addrspace(1), ptr addrspace(4) undef
%val = load <33 x i32>, ptr addrspace(1) %ptr
@@ -1850,34 +1844,28 @@ define { <32 x i32>, i32 } @struct_v32i32_i32_func_void() #0 {
; GFX11-NEXT: buffer_load_b128 v[9:12], off, s[0:3], 0 offset:80
; GFX11-NEXT: buffer_load_b128 v[13:16], off, s[0:3], 0 offset:64
; GFX11-NEXT: buffer_load_b128 v[17:20], off, s[0:3], 0 offset:48
-; GFX11-NEXT: buffer_load_b128 v[21:24], off, s[0:3], 0 offset:16
-; GFX11-NEXT: buffer_load_b128 v[25:28], off, s[0:3], 0
-; GFX11-NEXT: buffer_load_b128 v[29:32], off, s[0:3], 0 offset:32
+; GFX11-NEXT: buffer_load_b128 v[21:24], off, s[0:3], 0 offset:32
+; GFX11-NEXT: buffer_load_b128 v[25:28], off, s[0:3], 0 offset:16
+; GFX11-NEXT: buffer_load_b128 v[29:32], off, s[0:3], 0
; GFX11-NEXT: buffer_load_b32 v33, off, s[0:3], 0 offset:128
-; GFX11-NEXT: v_readfirstlane_b32 s0, v0
-; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
-; GFX11-NEXT: s_add_i32 s1, s0, 0x70
-; GFX11-NEXT: s_add_i32 s2, s0, 0x60
-; GFX11-NEXT: s_add_i32 s3, s0, 0x50
-; GFX11-NEXT: s_add_i32 s4, s0, 48
; GFX11-NEXT: s_waitcnt vmcnt(8)
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
+; GFX11-NEXT: scratch_store_b128 v0, v[1:4], off offset:112
; GFX11-NEXT: s_waitcnt vmcnt(7)
-; GFX11-NEXT: scratch_store_b128 off, v[5:8], s2
+; GFX11-NEXT: scratch_store_b128 v0, v[5:8], off offset:96
; GFX11-NEXT: s_waitcnt vmcnt(6)
-; GFX11-NEXT: scratch_store_b128 off, v[9:12], s3
+; GFX11-NEXT: scratch_store_b128 v0, v[9:12], off offset:80
; GFX11-NEXT: s_waitcnt vmcnt(5)
-; GFX11-NEXT: scratch_store_b128 off, v[13:16], s0 offset:64
+; GFX11-NEXT: scratch_store_b128 v0, v[13:16], off offset:64
; GFX11-NEXT: s_waitcnt vmcnt(4)
-; GFX11-NEXT: scratch_store_b128 off, v[17:20], s4
+; GFX11-NEXT: scratch_store_b128 v0, v[17:20], off offset:48
; GFX11-NEXT: s_waitcnt vmcnt(3)
-; GFX11-NEXT: scratch_store_b128 off, v[21:24], s0 offset:16
+; GFX11-NEXT: scratch_store_b128 v0, v[21:24], off offset:32
; GFX11-NEXT: s_waitcnt vmcnt(2)
-; GFX11-NEXT: scratch_store_b128 off, v[25:28], s0
+; GFX11-NEXT: scratch_store_b128 v0, v[25:28], off offset:16
; GFX11-NEXT: s_waitcnt vmcnt(1)
-; GFX11-NEXT: scratch_store_b128 off, v[29:32], s0 offset:32
+; GFX11-NEXT: scratch_store_b128 v0, v[29:32], off
; GFX11-NEXT: s_waitcnt vmcnt(0)
-; GFX11-NEXT: scratch_store_b32 off, v33, s0 offset:128
+; GFX11-NEXT: scratch_store_b32 v0, v33, off offset:128
; GFX11-NEXT: s_setpc_b64 s[30:31]
%ptr = load volatile ptr addrspace(1), ptr addrspace(4) undef
%val = load { <32 x i32>, i32 }, ptr addrspace(1) %ptr
@@ -2143,33 +2131,24 @@ define { i32, <32 x i32> } @struct_i32_v32i32_func_void() #0 {
; GFX11-NEXT: buffer_load_b128 v[25:28], off, s[0:3], 0 offset:144
; GFX11-NEXT: buffer_load_b128 v[29:32], off, s[0:3], 0 offset:128
; GFX11-NEXT: buffer_load_b32 v33, off, s[0:3], 0
-; GFX11-NEXT: v_readfirstlane_b32 s0, v0
-; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
-; GFX11-NEXT: s_add_i32 s1, s0, 0xf0
-; GFX11-NEXT: s_add_i32 s2, s0, 0xe0
-; GFX11-NEXT: s_add_i32 s3, s0, 0xd0
-; GFX11-NEXT: s_add_i32 s4, s0, 0xc0
-; GFX11-NEXT: s_add_i32 s5, s0, 0xb0
-; GFX11-NEXT: s_add_i32 s6, s0, 0xa0
-; GFX11-NEXT: s_add_i32 s7, s0, 0x90
; GFX11-NEXT: s_waitcnt vmcnt(8)
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
+; GFX11-NEXT: scratch_store_b128 v0, v[1:4], off offset:240
; GFX11-NEXT: s_waitcnt vmcnt(7)
-; GFX11-NEXT: scratch_store_b128 off, v[5:8], s2
+; GFX11-NEXT: scratch_store_b128 v0, v[5:8], off offset:224
; GFX11-NEXT: s_waitcnt vmcnt(6)
-; GFX11-NEXT: scratch_store_b128 off, v[9:12], s3
+; GFX11-NEXT: scratch_store_b128 v0, v[9:12], off offset:208
; GFX11-NEXT: s_waitcnt vmcnt(5)
-; GFX11-NEXT: scratch_store_b128 off, v[13:16], s4
+; GFX11-NEXT: scratch_store_b128 v0, v[13:16], off offset:192
; GFX11-NEXT: s_waitcnt vmcnt(4)
-; GFX11-NEXT: scratch_store_b128 off, v[17:20], s5
+; GFX11-NEXT: scratch_store_b128 v0, v[17:20], off offset:176
; GFX11-NEXT: s_waitcnt vmcnt(3)
-; GFX11-NEXT: scratch_store_b128 off, v[21:24], s6
+; GFX11-NEXT: scratch_store_b128 v0, v[21:24], off offset:160
; GFX11-NEXT: s_waitcnt vmcnt(2)
-; GFX11-NEXT: scratch_store_b128 off, v[25:28], s7
+; GFX11-NEXT: scratch_store_b128 v0, v[25:28], off offset:144
; GFX11-NEXT: s_waitcnt vmcnt(1)
-; GFX11-NEXT: scratch_store_b128 off, v[29:32], s0 offset:128
+; GFX11-NEXT: scratch_store_b128 v0, v[29:32], off offset:128
; GFX11-NEXT: s_waitcnt vmcnt(0)
-; GFX11-NEXT: scratch_store_b32 off, v33, s0
+; GFX11-NEXT: scratch_store_b32 v0, v33, off
; GFX11-NEXT: s_setpc_b64 s[30:31]
%ptr = load volatile ptr addrspace(1), ptr addrspace(4) undef
%val = load { i32, <32 x i32> }, ptr addrspace(1) %ptr
diff --git a/llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll b/llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll
index c1d682689903ad..3b078c41f4a849 100644
--- a/llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll
+++ b/llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll
@@ -1989,256 +1989,138 @@ define amdgpu_gfx <512 x i32> @return_512xi32() #0 {
; GFX11-NEXT: s_mov_b32 s2, s0
; GFX11-NEXT: v_dual_mov_b32 v4, s3 :: v_dual_mov_b32 v3, s2
; GFX11-NEXT: v_dual_mov_b32 v2, s1 :: v_dual_mov_b32 v1, s0
-; GFX11-NEXT: v_readfirstlane_b32 s0, v0
-; GFX11-NEXT: s_clause 0x7
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s0 offset:1024
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s0 offset:512
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s0 offset:256
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s0 offset:128
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s0 offset:64
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s0 offset:32
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s0 offset:16
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s0
-; GFX11-NEXT: s_add_i32 s1, s0, 0x7f0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x7e0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x7d0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x7c0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x7b0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x7a0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x790
-; GFX11-NEXT: s_add_i32 s2, s0, 0x780
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x770
-; GFX11-NEXT: s_add_i32 s2, s0, 0x760
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x750
-; GFX11-NEXT: s_add_i32 s2, s0, 0x740
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x730
-; GFX11-NEXT: s_add_i32 s2, s0, 0x720
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x710
-; GFX11-NEXT: s_add_i32 s2, s0, 0x700
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x6f0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x6e0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x6d0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x6c0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x6b0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x6a0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x690
-; GFX11-NEXT: s_add_i32 s2, s0, 0x680
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x670
-; GFX11-NEXT: s_add_i32 s2, s0, 0x660
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x650
-; GFX11-NEXT: s_add_i32 s2, s0, 0x640
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x630
-; GFX11-NEXT: s_add_i32 s2, s0, 0x620
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x610
-; GFX11-NEXT: s_add_i32 s2, s0, 0x600
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x5f0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x5e0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x5d0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x5c0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x5b0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x5a0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x590
-; GFX11-NEXT: s_add_i32 s2, s0, 0x580
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x570
-; GFX11-NEXT: s_add_i32 s2, s0, 0x560
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x550
-; GFX11-NEXT: s_add_i32 s2, s0, 0x540
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x530
-; GFX11-NEXT: s_add_i32 s2, s0, 0x520
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x510
-; GFX11-NEXT: s_add_i32 s2, s0, 0x500
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x4f0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x4e0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x4d0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x4c0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x4b0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x4a0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2...
[truncated]
@llvm/pr-subscribers-llvm-selectiondag
index c1d682689903ad..3b078c41f4a849 100644
--- a/llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll
+++ b/llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll
@@ -1989,256 +1989,138 @@ define amdgpu_gfx <512 x i32> @return_512xi32() #0 {
; GFX11-NEXT: s_mov_b32 s2, s0
; GFX11-NEXT: v_dual_mov_b32 v4, s3 :: v_dual_mov_b32 v3, s2
; GFX11-NEXT: v_dual_mov_b32 v2, s1 :: v_dual_mov_b32 v1, s0
-; GFX11-NEXT: v_readfirstlane_b32 s0, v0
-; GFX11-NEXT: s_clause 0x7
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s0 offset:1024
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s0 offset:512
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s0 offset:256
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s0 offset:128
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s0 offset:64
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s0 offset:32
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s0 offset:16
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s0
-; GFX11-NEXT: s_add_i32 s1, s0, 0x7f0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x7e0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x7d0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x7c0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x7b0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x7a0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x790
-; GFX11-NEXT: s_add_i32 s2, s0, 0x780
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x770
-; GFX11-NEXT: s_add_i32 s2, s0, 0x760
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x750
-; GFX11-NEXT: s_add_i32 s2, s0, 0x740
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x730
-; GFX11-NEXT: s_add_i32 s2, s0, 0x720
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x710
-; GFX11-NEXT: s_add_i32 s2, s0, 0x700
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x6f0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x6e0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x6d0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x6c0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x6b0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x6a0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x690
-; GFX11-NEXT: s_add_i32 s2, s0, 0x680
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x670
-; GFX11-NEXT: s_add_i32 s2, s0, 0x660
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x650
-; GFX11-NEXT: s_add_i32 s2, s0, 0x640
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x630
-; GFX11-NEXT: s_add_i32 s2, s0, 0x620
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x610
-; GFX11-NEXT: s_add_i32 s2, s0, 0x600
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x5f0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x5e0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x5d0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x5c0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x5b0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x5a0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x590
-; GFX11-NEXT: s_add_i32 s2, s0, 0x580
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x570
-; GFX11-NEXT: s_add_i32 s2, s0, 0x560
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x550
-; GFX11-NEXT: s_add_i32 s2, s0, 0x540
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x530
-; GFX11-NEXT: s_add_i32 s2, s0, 0x520
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x510
-; GFX11-NEXT: s_add_i32 s2, s0, 0x500
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x4f0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x4e0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x4d0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x4c0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2
-; GFX11-NEXT: s_add_i32 s1, s0, 0x4b0
-; GFX11-NEXT: s_add_i32 s2, s0, 0x4a0
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s1
-; GFX11-NEXT: scratch_store_b128 off, v[1:4], s2...
[truncated]
WebAssembly tests need updating.
LGTM once all tests are updated.
Commoned up code, updated tests.
LGTM. It gives some nice codegen improvements.
Incidentally we could also preserve the "disjoint" flag when reassociating OR.
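As a sanity check on the flag reasoning in this thread, here is a minimal standalone sketch (not part of the patch, and not using any LLVM API) that exhaustively tests 8-bit values. It assumes the usual semantics: NUW means the unsigned add does not wrap, and "disjoint" means the two OR operands share no set bits. All function names are hypothetical and chosen for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Case: both original adds carry NUW, i.e. x + c1 does not wrap and
// (x + c1) + c2 does not wrap. Returns true if, in every such case,
// x + (c1 + c2) also does not wrap and produces the same value.
bool nuwSurvivesReassociation() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned c1 = 0; c1 < 256; ++c1)
      for (unsigned c2 = 0; c2 < 256; ++c2) {
        if (x + c1 > 0xFF || x + c1 + c2 > 0xFF)
          continue; // one of the original flags would not hold; skip
        unsigned folded = (c1 + c2) & 0xFF; // constant folding wraps mod 2^8
        if (x + folded > 0xFF || x + folded != x + c1 + c2)
          return false;
      }
  return true;
}

// Case: only the outer add is known not to wrap. With x = 255, c1 = 1,
// c2 = 1 the inner add wraps to 0, the outer add (0 + 1) does not wrap,
// but the reassociated add 255 + 2 does. So NUW on the outer add alone
// does not justify NUW on the new node.
bool outerNuwAloneIsInsufficient() {
  uint8_t x = 255, c1 = 1, c2 = 1;
  uint8_t inner = static_cast<uint8_t>(x + c1);
  bool outerWraps = unsigned(inner) + c2 > 0xFF;
  uint8_t folded = static_cast<uint8_t>(c1 + c2);
  bool newWraps = unsigned(x) + folded > 0xFF;
  return inner == 0 && !outerWraps && newWraps;
}

// Case: both original ORs are disjoint, i.e. x & c1 == 0 and
// (x | c1) & c2 == 0. Returns true if x | (c1 | c2) is then always
// disjoint as well.
bool disjointSurvivesReassociation() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned c1 = 0; c1 < 256; ++c1)
      for (unsigned c2 = 0; c2 < 256; ++c2) {
        if ((x & c1) != 0 || ((x | c1) & c2) != 0)
          continue; // one of the original flags would not hold; skip
        if ((x & (c1 | c2)) != 0)
          return false;
      }
  return true;
}

// Convenience wrapper: call this from a main() to run all three checks.
void checkAll() {
  assert(nuwSurvivesReassociation());
  assert(outerNuwAloneIsInsufficient());
  assert(disjointSurvivesReassociation());
}
```

The second function illustrates why the patch requires the flag on both the inner add and the outer add before setting it on the new node. The 8-bit width is only for exhaustiveness; the argument does not depend on the bit width.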
; CHECK: i32.add $push9=, $[[SP:[0-9]+]], $pop8
; CHECK: i32.const $push0=, 16
; CHECK: i32.add $push1=, $pop9, $pop0
; CHECK: i32.const $push0=, 24
Maybe autogenerate the checks in this file? But I guess there is no need since you have already updated the manual checks.