AMDGPU: Make frame index folding logic consistent with eliminateFrameIndex #129633

arsenm · 2025-03-04T02:54:14Z

This adds handling of s_add_u32, which eliminateFrameIndex handles, and
removes handling of s_or_b32 and s_and_b32, which it does not. I was working
on handling them in #102345, but need to finish that patch. This fixes a
regression exposed by a316539 where the final instruction would use two
literals.
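
To illustrate the two-literal problem, here is a minimal standalone sketch, not the code in SIFoldOperands.cpp. It assumes a simplified operand model (`OperandKind`, `Operand`, and both helper functions are hypothetical names invented for this example) and relies on the AMDGPU rule that integers in the range -16..64 are inline constants while any other 32-bit value needs a literal. A frame index is treated as its resolved offset, since that is what it becomes after frame index elimination.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified operand model for illustration only; this is not LLVM's
// MachineOperand.
enum class OperandKind { Register, Immediate, FrameIndex };

struct Operand {
  OperandKind Kind;
  int64_t Value; // Immediate value, or resolved frame offset for FrameIndex.
};

// AMDGPU integer inline constants cover -16..64; anything else needs a
// 32-bit literal in the encoding.
bool isInlineInteger(int64_t V) { return V >= -16 && V <= 64; }

// Count the literals the instruction would need once every frame index has
// been replaced by its resolved offset.
int countLiteralsAfterElimination(const std::vector<Operand> &Srcs) {
  int Literals = 0;
  for (const Operand &Op : Srcs) {
    if (Op.Kind == OperandKind::Register)
      continue;
    if (!isInlineInteger(Op.Value))
      ++Literals;
  }
  return Literals;
}

// A scalar ALU instruction can encode at most one literal.
bool isEncodable(const std::vector<Operand> &Srcs) {
  return countLiteralsAfterElimination(Srcs) <= 1;
}
```

With the values from the new `no_fold_literal_and_fi_s_or_b32` test (frame offset 4096, constant 12345), the folded form `{FrameIndex 4096, Immediate 12345}` needs two literals and is rejected, while the unfolded form `{Register, Immediate 12345}` needs only one. An add can avoid this because the offset can be merged into the existing constant; the model above does not attempt that merge.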

llvmbot · 2025-03-04T02:54:39Z

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/129633.diff

6 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIFoldOperands.cpp (+1-2)
(modified) llvm/test/CodeGen/AMDGPU/fold-operands-frame-index.mir (+146)
(modified) llvm/test/CodeGen/AMDGPU/fold-operands-s-add-copy-to-vgpr.mir (+16-10)
(modified) llvm/test/CodeGen/AMDGPU/frame-index-elimination.ll (+42)
(modified) llvm/test/CodeGen/AMDGPU/huge-private-buffer.ll (+35-19)
(modified) llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll (+31-30)

diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index eb9aabf8b6317..26a700e054217 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -227,8 +227,7 @@ bool SIFoldOperandsImpl::frameIndexMayFold(
   const unsigned Opc = UseMI.getOpcode();
   switch (Opc) {
   case AMDGPU::S_ADD_I32:
-  case AMDGPU::S_OR_B32:
-  case AMDGPU::S_AND_B32:
+  case AMDGPU::S_ADD_U32:
   case AMDGPU::V_ADD_U32_e32:
   case AMDGPU::V_ADD_CO_U32_e32:
     // TODO: Possibly relax hasOneUse. It matters more for mubuf, since we have
diff --git a/llvm/test/CodeGen/AMDGPU/fold-operands-frame-index.mir b/llvm/test/CodeGen/AMDGPU/fold-operands-frame-index.mir
index 6ab1395a0dcca..93a4c4cc6fd17 100644
--- a/llvm/test/CodeGen/AMDGPU/fold-operands-frame-index.mir
+++ b/llvm/test/CodeGen/AMDGPU/fold-operands-frame-index.mir
@@ -393,3 +393,149 @@ body:             |
     SI_RETURN implicit $vgpr0, implicit $vgpr1
 
 ...
+
+---
+name:  fold_frame_index__s_add_u32__fi_const
+tracksRegLiveness: true
+frameInfo:
+  maxAlignment:    4
+  localFrameSize:  16384
+stack:
+  - { id: 0, size: 16384, alignment: 4, local-offset: 0 }
+body:             |
+  bb.0:
+    ; CHECK-LABEL: name: fold_frame_index__s_add_u32__fi_const
+    ; CHECK: [[S_ADD_U32_:%[0-9]+]]:sreg_32 = S_ADD_U32 %stack.0, 128, implicit-def $scc
+    ; CHECK-NEXT: $sgpr4 = COPY [[S_ADD_U32_]]
+    ; CHECK-NEXT: SI_RETURN implicit $sgpr4
+    %0:sreg_32 = S_MOV_B32 %stack.0
+    %1:sreg_32 = S_ADD_U32 %0, 128, implicit-def $scc
+    $sgpr4 = COPY %1
+    SI_RETURN implicit $sgpr4
+...
+
+---
+name:  fold_frame_index__s_add_u32__const_fi
+tracksRegLiveness: true
+frameInfo:
+  maxAlignment:    4
+  localFrameSize:  16384
+stack:
+  - { id: 0, size: 16384, alignment: 4, local-offset: 0 }
+body:             |
+  bb.0:
+    ; CHECK-LABEL: name: fold_frame_index__s_add_u32__const_fi
+    ; CHECK: [[S_ADD_U32_:%[0-9]+]]:sreg_32 = S_ADD_U32 128, %stack.0, implicit-def $scc
+    ; CHECK-NEXT: $sgpr4 = COPY [[S_ADD_U32_]]
+    ; CHECK-NEXT: SI_RETURN implicit $sgpr4
+    %0:sreg_32 = S_MOV_B32 %stack.0
+    %1:sreg_32 = S_ADD_U32 128, %0, implicit-def $scc
+    $sgpr4 = COPY %1
+    SI_RETURN implicit $sgpr4
+...
+
+---
+name:  fold_frame_index__s_add_u32__fi_inlineimm
+tracksRegLiveness: true
+frameInfo:
+  maxAlignment:    4
+  localFrameSize:  16384
+stack:
+  - { id: 0, size: 16384, alignment: 4, local-offset: 0 }
+body:             |
+  bb.0:
+    ; CHECK-LABEL: name: fold_frame_index__s_add_u32__fi_inlineimm
+    ; CHECK: [[S_ADD_U32_:%[0-9]+]]:sreg_32 = S_ADD_U32 %stack.0, 16, implicit-def $scc
+    ; CHECK-NEXT: $sgpr4 = COPY [[S_ADD_U32_]]
+    ; CHECK-NEXT: SI_RETURN implicit $sgpr4
+    %0:sreg_32 = S_MOV_B32 %stack.0
+    %1:sreg_32 = S_ADD_U32 %0, 16, implicit-def $scc
+    $sgpr4 = COPY %1
+    SI_RETURN implicit $sgpr4
+...
+
+---
+name:  fold_frame_index__s_add_u32__inlineimm_fi
+tracksRegLiveness: true
+frameInfo:
+  maxAlignment:    4
+  localFrameSize:  16384
+stack:
+  - { id: 0, size: 16384, alignment: 4, local-offset: 0 }
+body:             |
+  bb.0:
+    ; CHECK-LABEL: name: fold_frame_index__s_add_u32__inlineimm_fi
+    ; CHECK: [[S_ADD_U32_:%[0-9]+]]:sreg_32 = S_ADD_U32 16, %stack.0, implicit-def $scc
+    ; CHECK-NEXT: $sgpr4 = COPY [[S_ADD_U32_]]
+    ; CHECK-NEXT: SI_RETURN implicit $sgpr4
+    %0:sreg_32 = S_MOV_B32 %stack.0
+    %1:sreg_32 = S_ADD_U32 16, %0, implicit-def $scc
+    $sgpr4 = COPY %1
+    SI_RETURN implicit $sgpr4
+...
+
+---
+name:            no_fold_literal_and_fi_s_or_b32
+tracksRegLiveness: true
+frameInfo:
+  maxAlignment:    16
+  localFrameSize:  8192
+stack:
+  - { id: 0, size: 4096, alignment: 4, local-offset: 0 }
+  - { id: 1, size: 4096, alignment: 16, local-offset: 4096 }
+body:             |
+  bb.0:
+    ; CHECK-LABEL: name: no_fold_literal_and_fi_s_or_b32
+    ; CHECK: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 %stack.1
+    ; CHECK-NEXT: [[S_AND_B32_:%[0-9]+]]:sreg_32 = S_AND_B32 killed [[S_MOV_B32_]], 12345, implicit-def dead $scc
+    ; CHECK-NEXT: S_ENDPGM 0, implicit [[S_AND_B32_]]
+    %0:sreg_32 = S_MOV_B32 12345
+    %1:sreg_32 = S_MOV_B32 %stack.1
+    %2:sreg_32 = S_AND_B32 killed %1, killed %0, implicit-def dead $scc
+    S_ENDPGM 0, implicit %2
+
+...
+
+---
+name:            no_fold_literal_or_fi_s_or_b32
+tracksRegLiveness: true
+frameInfo:
+  maxAlignment:    16
+  localFrameSize:  8192
+stack:
+  - { id: 0, size: 4096, alignment: 4, local-offset: 0 }
+  - { id: 1, size: 4096, alignment: 16, local-offset: 4096 }
+body:             |
+  bb.0:
+    ; CHECK-LABEL: name: no_fold_literal_or_fi_s_or_b32
+    ; CHECK: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 %stack.1
+    ; CHECK-NEXT: [[S_OR_B32_:%[0-9]+]]:sreg_32 = S_OR_B32 killed [[S_MOV_B32_]], 12345, implicit-def dead $scc
+    ; CHECK-NEXT: S_ENDPGM 0, implicit [[S_OR_B32_]]
+    %0:sreg_32 = S_MOV_B32 12345
+    %1:sreg_32 = S_MOV_B32 %stack.1
+    %2:sreg_32 = S_OR_B32 killed %1, killed %0, implicit-def dead $scc
+    S_ENDPGM 0, implicit %2
+
+...
+
+---
+name:            no_fold_literal_and_fi_s_mul_i32
+tracksRegLiveness: true
+frameInfo:
+  maxAlignment:    16
+  localFrameSize:  8192
+stack:
+  - { id: 0, size: 4096, alignment: 4, local-offset: 0 }
+  - { id: 1, size: 4096, alignment: 16, local-offset: 4096 }
+body:             |
+  bb.0:
+    ; CHECK-LABEL: name: no_fold_literal_and_fi_s_mul_i32
+    ; CHECK: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 %stack.1
+    ; CHECK-NEXT: [[S_MUL_I32_:%[0-9]+]]:sreg_32 = S_MUL_I32 killed [[S_MOV_B32_]], 12345, implicit-def dead $scc
+    ; CHECK-NEXT: S_ENDPGM 0, implicit [[S_MUL_I32_]]
+    %0:sreg_32 = S_MOV_B32 12345
+    %1:sreg_32 = S_MOV_B32 %stack.1
+    %2:sreg_32 = S_MUL_I32 killed %1, killed %0, implicit-def dead $scc
+    S_ENDPGM 0, implicit %2
+
+...
diff --git a/llvm/test/CodeGen/AMDGPU/fold-operands-s-add-copy-to-vgpr.mir b/llvm/test/CodeGen/AMDGPU/fold-operands-s-add-copy-to-vgpr.mir
index ab0aa16cf6c09..2bdc3f671897c 100644
--- a/llvm/test/CodeGen/AMDGPU/fold-operands-s-add-copy-to-vgpr.mir
+++ b/llvm/test/CodeGen/AMDGPU/fold-operands-s-add-copy-to-vgpr.mir
@@ -394,8 +394,10 @@ stack:
 body:             |
   bb.0:
     ; CHECK-LABEL: name: fold_s_or_b32__mov_fi_const_copy_to_virt_vgpr
-    ; CHECK: [[V_OR_B32_e32_:%[0-9]+]]:vgpr_32 = V_OR_B32_e32 128, %stack.0, implicit $exec
-    ; CHECK-NEXT: SI_RETURN implicit [[V_OR_B32_e32_]]
+    ; CHECK: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 %stack.0
+    ; CHECK-NEXT: [[S_OR_B32_:%[0-9]+]]:sreg_32 = S_OR_B32 [[S_MOV_B32_]], 128, implicit-def dead $scc
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY [[S_OR_B32_]]
+    ; CHECK-NEXT: SI_RETURN implicit [[COPY]]
     %0:sreg_32 = S_MOV_B32 %stack.0
     %1:sreg_32 = S_OR_B32 %0, 128, implicit-def dead $scc
     %2:vgpr_32 = COPY %1
@@ -410,8 +412,10 @@ stack:
 body:             |
   bb.0:
     ; CHECK-LABEL: name: fold_s_or_b32__const_copy_mov_fi_to_virt_vgpr
-    ; CHECK: [[V_OR_B32_e32_:%[0-9]+]]:vgpr_32 = V_OR_B32_e32 128, %stack.0, implicit $exec
-    ; CHECK-NEXT: SI_RETURN implicit [[V_OR_B32_e32_]]
+    ; CHECK: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 %stack.0
+    ; CHECK-NEXT: [[S_OR_B32_:%[0-9]+]]:sreg_32 = S_OR_B32 128, [[S_MOV_B32_]], implicit-def dead $scc
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY [[S_OR_B32_]]
+    ; CHECK-NEXT: SI_RETURN implicit [[COPY]]
     %0:sreg_32 = S_MOV_B32 %stack.0
     %1:sreg_32 = S_OR_B32 128, %0, implicit-def dead $scc
     %2:vgpr_32 = COPY %1
@@ -426,8 +430,8 @@ stack:
 body:             |
   bb.0:
     ; CHECK-LABEL: name: fold_s_or_b32__fi_imm_copy_to_virt_vgpr
-    ; CHECK: %1:vgpr_32 = disjoint V_OR_B32_e64 64, %stack.0, implicit $exec
-    ; CHECK-NEXT: SI_RETURN implicit %1
+    ; CHECK: [[V_OR_B32_e64_:%[0-9]+]]:vgpr_32 = disjoint V_OR_B32_e64 64, %stack.0, implicit $exec
+    ; CHECK-NEXT: SI_RETURN implicit [[V_OR_B32_e64_]]
     %0:sreg_32 = disjoint S_OR_B32 %stack.0, 64, implicit-def dead $scc
     %1:vgpr_32 = COPY %0
     SI_RETURN implicit %1
@@ -441,8 +445,8 @@ stack:
 body:             |
   bb.0:
     ; CHECK-LABEL: name: fold_s_or_b32__imm_fi_copy_to_virt_vgpr
-    ; CHECK: %1:vgpr_32 = disjoint V_OR_B32_e64 64, %stack.0, implicit $exec
-    ; CHECK-NEXT: SI_RETURN implicit %1
+    ; CHECK: [[V_OR_B32_e64_:%[0-9]+]]:vgpr_32 = disjoint V_OR_B32_e64 64, %stack.0, implicit $exec
+    ; CHECK-NEXT: SI_RETURN implicit [[V_OR_B32_e64_]]
     %0:sreg_32 = disjoint S_OR_B32 64, %stack.0, implicit-def dead $scc
     %1:vgpr_32 = COPY %0
     SI_RETURN implicit %1
@@ -521,8 +525,10 @@ stack:
 body:             |
   bb.0:
     ; CHECK-LABEL: name: fold_s_and_b32__mov_fi_const_copy_to_virt_vgpr
-    ; CHECK: [[V_AND_B32_e32_:%[0-9]+]]:vgpr_32 = V_AND_B32_e32 128, %stack.0, implicit $exec
-    ; CHECK-NEXT: SI_RETURN implicit [[V_AND_B32_e32_]]
+    ; CHECK: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 %stack.0
+    ; CHECK-NEXT: [[S_AND_B32_:%[0-9]+]]:sreg_32 = S_AND_B32 [[S_MOV_B32_]], 128, implicit-def dead $scc
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY [[S_AND_B32_]]
+    ; CHECK-NEXT: SI_RETURN implicit [[COPY]]
     %0:sreg_32 = S_MOV_B32 %stack.0
     %1:sreg_32 = S_AND_B32 %0, 128, implicit-def dead $scc
     %2:vgpr_32 = COPY %1
diff --git a/llvm/test/CodeGen/AMDGPU/frame-index-elimination.ll b/llvm/test/CodeGen/AMDGPU/frame-index-elimination.ll
index 004403f46a4d4..7125e7740c10a 100644
--- a/llvm/test/CodeGen/AMDGPU/frame-index-elimination.ll
+++ b/llvm/test/CodeGen/AMDGPU/frame-index-elimination.ll
@@ -374,4 +374,46 @@ vector.body.i.i.i.i:                              ; preds = %.shuffle.then.i.i.i
   ret void
 }
 
+; GCN-LABEL: {{^}}fi_sop2_and_literal_error:
+; GCN: s_and_b32 s{{[0-9]+}}, s{{[0-9]+}}, 0x1fe00
+define amdgpu_kernel void @fi_sop2_and_literal_error() #0 {
+entry:
+  %.omp.reduction.element.i.i.i.i = alloca [1024 x i32], align 4, addrspace(5)
+  %Total3.i.i = alloca [1024 x i32], align 16, addrspace(5)
+  %p2i = ptrtoint ptr addrspace(5) %Total3.i.i to i32
+  br label %.shuffle.then.i.i.i.i
+
+.shuffle.then.i.i.i.i:                            ; preds = %.shuffle.then.i.i.i.i, %entry
+  store i64 0, ptr addrspace(5) null, align 4
+  %or = and i32 %p2i, -512
+  %icmp = icmp ugt i32 %or, 9999999
+  br i1 %icmp, label %.shuffle.then.i.i.i.i, label %vector.body.i.i.i.i
+
+vector.body.i.i.i.i:                              ; preds = %.shuffle.then.i.i.i.i
+  %wide.load9.i.i.i.i = load <2 x i32>, ptr addrspace(5) %.omp.reduction.element.i.i.i.i, align 4
+  store <2 x i32> %wide.load9.i.i.i.i, ptr addrspace(5) null, align 4
+  ret void
+}
+
+; GCN-LABEL: {{^}}fi_sop2_or_literal_error:
+; GCN: s_or_b32 s{{[0-9]+}}, s{{[0-9]+}}, 0x3039
+define amdgpu_kernel void @fi_sop2_or_literal_error() #0 {
+entry:
+  %.omp.reduction.element.i.i.i.i = alloca [1024 x i32], align 4, addrspace(5)
+  %Total3.i.i = alloca [1024 x i32], align 16, addrspace(5)
+  %p2i = ptrtoint ptr addrspace(5) %Total3.i.i to i32
+  br label %.shuffle.then.i.i.i.i
+
+.shuffle.then.i.i.i.i:                            ; preds = %.shuffle.then.i.i.i.i, %entry
+  store i64 0, ptr addrspace(5) null, align 4
+  %or = or i32 %p2i, 12345
+  %icmp = icmp ugt i32 %or, 9999999
+  br i1 %icmp, label %.shuffle.then.i.i.i.i, label %vector.body.i.i.i.i
+
+vector.body.i.i.i.i:                              ; preds = %.shuffle.then.i.i.i.i
+  %wide.load9.i.i.i.i = load <2 x i32>, ptr addrspace(5) %.omp.reduction.element.i.i.i.i, align 4
+  store <2 x i32> %wide.load9.i.i.i.i, ptr addrspace(5) null, align 4
+  ret void
+}
+
 attributes #0 = { nounwind }
diff --git a/llvm/test/CodeGen/AMDGPU/huge-private-buffer.ll b/llvm/test/CodeGen/AMDGPU/huge-private-buffer.ll
index 2cb440b1b7a01..08ea81ad81ae5 100644
--- a/llvm/test/CodeGen/AMDGPU/huge-private-buffer.ll
+++ b/llvm/test/CodeGen/AMDGPU/huge-private-buffer.ll
@@ -7,9 +7,10 @@
 ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1200 -amdgpu-enable-vopd=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,SCRATCH2048K %s
 
 ; GCN-LABEL: {{^}}scratch_buffer_known_high_masklo16:
-; GCN: v_mov_b32_e32 [[FI:v[0-9]+]], 0{{$}}
-; GCN: v_and_b32_e32 v{{[0-9]+}}, 0xfffc, [[FI]]
-; GCN: {{flat|global}}_store_{{dword|b32}} v[{{[0-9]+:[0-9]+}}],
+; GCN: s_mov_b32 [[FI:s[0-9]+]], 0{{$}}
+; GCN: s_and_b32 s{{[0-9]+}}, [[FI]], 0xfffc
+; GCN: v_mov_b32_e32 [[VFI:v[0-9]+]], [[FI]]{{$}}
+; GCN: {{flat|global}}_store_{{dword|b32}} v[{{[0-9]+:[0-9]+}}], [[VFI]]
 define amdgpu_kernel void @scratch_buffer_known_high_masklo16() {
   %alloca = alloca i32, align 4, addrspace(5)
   store volatile i32 15, ptr addrspace(5) %alloca
@@ -20,11 +21,15 @@ define amdgpu_kernel void @scratch_buffer_known_high_masklo16() {
 }
 
 ; GCN-LABEL: {{^}}scratch_buffer_known_high_masklo17:
-; GCN: v_mov_b32_e32 [[FI:v[0-9]+]], 0{{$}}
-; SCRATCH128K-NOT: v_and_b32
-; SCRATCH256K: v_and_b32_e32 v{{[0-9]+}}, 0x1fffc, [[FI]]
-; SCRATCH1024K: v_and_b32_e32 v{{[0-9]+}}, 0x1fffc, [[FI]]
-; SCRATCH2048K: v_and_b32_e32 v{{[0-9]+}}, 0x1fffc, [[FI]]
+; SCRATCH256K: s_mov_b32 [[FI:s[0-9]+]], 0{{$}}
+; SCRATCH256K: s_and_b32 s{{[0-9]+}}, [[FI]], 0x1fffc
+
+; SCRATCH1024K: s_mov_b32 [[FI:s[0-9]+]], 0{{$}}
+; SCRATCH1024K: s_and_b32 s{{[0-9]+}}, [[FI]], 0x1fffc
+
+; SCRATCH2048K: s_mov_b32 [[FI:s[0-9]+]], 0{{$}}
+; SCRATCH2048K: s_and_b32 s{{[0-9]+}}, [[FI]], 0x1fffc
+
 ; GCN: {{flat|global}}_store_{{dword|b32}} v[{{[0-9]+:[0-9]+}}],
 define amdgpu_kernel void @scratch_buffer_known_high_masklo17() {
   %alloca = alloca i32, align 4, addrspace(5)
@@ -36,11 +41,17 @@ define amdgpu_kernel void @scratch_buffer_known_high_masklo17() {
 }
 
 ; GCN-LABEL: {{^}}scratch_buffer_known_high_masklo18:
-; GCN: v_mov_b32_e32 [[FI:v[0-9]+]], 0{{$}}
-; SCRATCH128K-NOT: v_and_b32
-; SCRATCH256K-NOT: v_and_b32
-; SCRATCH1024K: v_and_b32_e32 v{{[0-9]+}}, 0x3fffc, [[FI]]
-; SCRATCH2048K: v_and_b32_e32 v{{[0-9]+}}, 0x3fffc, [[FI]]
+; SCRATCH128K: v_mov_b32_e32 [[FI:v[0-9]+]], 0{{$}}
+; SCRATCH256K: v_mov_b32_e32 [[FI:v[0-9]+]], 0{{$}}
+; SCRATCH128K-NOT: and_b32
+; SCRATCH256K-NOT: and_b32
+
+; SCRATCH1024K: s_mov_b32 [[FI:s[0-9]+]], 0{{$}}
+; SCRATCH1024K: s_and_b32 s{{[0-9]+}}, [[FI]], 0x3fffc
+
+; SCRATCH2048K: s_mov_b32 [[FI:s[0-9]+]], 0{{$}}
+; SCRATCH2048K: s_and_b32 s{{[0-9]+}}, [[FI]], 0x3fffc
+
 ; GCN: {{flat|global}}_store_{{dword|b32}} v[{{[0-9]+:[0-9]+}}],
 define amdgpu_kernel void @scratch_buffer_known_high_masklo18() {
   %alloca = alloca i32, align 4, addrspace(5)
@@ -52,11 +63,16 @@ define amdgpu_kernel void @scratch_buffer_known_high_masklo18() {
 }
 
 ; GCN-LABEL: {{^}}scratch_buffer_known_high_masklo20:
-; GCN: v_mov_b32_e32 [[FI:v[0-9]+]], 0{{$}}
-; SCRATCH128K-NOT: v_and_b32
-; SCRATCH256K-NOT: v_and_b32
-; SCRATCH1024K-NOT: v_and_b32
-; SCRATCH2048K: v_and_b32_e32 v{{[0-9]+}}, 0xffffc, [[FI]]
+; SCRATCH128K: v_mov_b32_e32 [[FI:v[0-9]+]], 0{{$}}
+; SCRATCH256K: v_mov_b32_e32 [[FI:v[0-9]+]], 0{{$}}
+; SCRATCH1024K: v_mov_b32_e32 [[FI:v[0-9]+]], 0{{$}}
+
+; SCRATCH128K-NOT: and_b32
+; SCRATCH256K-NOT: and_b32
+; SCRATCH1024K-NOT: and_b32
+
+; SCRATCH2048K: s_mov_b32 [[FI:s[0-9]+]], 0{{$}}
+; SCRATCH2048K: s_and_b32 s{{[0-9]+}}, [[FI]], 0xffffc
 ; GCN: {{flat|global}}_store_{{dword|b32}} v[{{[0-9]+:[0-9]+}}],
 define amdgpu_kernel void @scratch_buffer_known_high_masklo20() {
   %alloca = alloca i32, align 4, addrspace(5)
@@ -69,7 +85,7 @@ define amdgpu_kernel void @scratch_buffer_known_high_masklo20() {
 
 ; GCN-LABEL: {{^}}scratch_buffer_known_high_masklo21:
 ; GCN: v_mov_b32_e32 [[FI:v[0-9]+]], 0{{$}}
-; GCN-NOT: v_and_b32
+; GCN-NOT: and_b32
 ; GCN: {{flat|global}}_store_{{dword|b32}} v[{{[0-9]+:[0-9]+}}],
 define amdgpu_kernel void @scratch_buffer_known_high_masklo21() {
   %alloca = alloca i32, align 4, addrspace(5)
diff --git a/llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll b/llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll
index 8ec3b7e2508ac..a3ebaec4811a9 100644
--- a/llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll
+++ b/llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll
@@ -224,54 +224,55 @@ define amdgpu_kernel void @local_stack_offset_uses_sp_flat(ptr addrspace(1) %out
 ; MUBUF-NEXT:    s_cbranch_scc1 .LBB2_1
 ; MUBUF-NEXT:  ; %bb.2: ; %split
 ; MUBUF-NEXT:    v_mov_b32_e32 v1, 0x4000
-; MUBUF-NEXT:    v_mov_b32_e32 v2, 0x4000
-; MUBUF-NEXT:    v_or_b32_e32 v0, 0x12c0, v1
-; MUBUF-NEXT:    v_or_b32_e32 v1, 0x12d4, v2
-; MUBUF-NEXT:    v_mov_b32_e32 v2, 0x4000
-; MUBUF-NEXT:    buffer_load_dword v5, v1, s[0:3], 0 offen glc
-; MUBUF-NEXT:    s_waitcnt vmcnt(0)
-; MUBUF-NEXT:    v_or_b32_e32 v1, 0x12d0, v2
-; MUBUF-NEXT:    v_mov_b32_e32 v2, 0x4000
-; MUBUF-NEXT:    buffer_load_dword v4, v1, s[0:3], 0 offen glc
+; MUBUF-NEXT:    v_or_b32_e32 v0, 0x12d4, v1
+; MUBUF-NEXT:    v_mov_b32_e32 v1, 0x4000
+; MUBUF-NEXT:    s_movk_i32 s4, 0x4000
+; MUBUF-NEXT:    buffer_load_dword v5, v0, s[0:3], 0 offen glc
 ; MUBUF-NEXT:    s_waitcnt vmcnt(0)
-; MUBUF-NEXT:    v_or_b32_e32 v1, 0x12c4, v2
-; MUBUF-NEXT:    buffer_load_dword v6, v1, s[0:3], 0 offen glc
+; MUBUF-NEXT:    v_or_b32_e32 v0, 0x12d0, v1
+; MUBUF-NEXT:    v_mov_b32_e32 v1, 0x4000
+; MUBUF-NEXT:    s_or_b32 s4, s4, 0x12c0
+; MUBUF-NEXT:    buffer_load_dword v4, v0, s[0:3], 0 offen glc
 ; MUBUF-NEXT:    s_waitcnt vmcnt(0)
-; MUBUF-NEXT:    buffer_load_dword v7, v0, s[0:3], 0 offen glc
+; MUBUF-NEXT:    v_or_b32_e32 v0, 0x12c4, v1
+; MUBUF-NEXT:    v_mov_b32_e32 v3, 0x4000
+; MUBUF-NEXT:    buffer_load_dword v1, v0, s[0:3], 0 offen glc
 ; MUBUF-NEXT:    s_waitcnt vmcnt(0)
-; MUBUF-NEXT:    v_mov_b32_e32 v1, 0x4000
-; MUBUF-NEXT:    v_mov_b32_e32 v2, 0x4000
-; MUBUF-NEXT:    v_or_b32_e32 v0, 0x12cc, v1
-; MUBUF-NEXT:    v_or_b32_e32 v1, 0x12c8, v2
-; MUBUF-NEXT:    v_mov_b32_e32 v2, 0x4000
+; MUBUF-NEXT:    v_mov_b32_e32 v0, s4
+; MUBUF-NEXT:    v_or_b32_e32 v2, 0x12cc, v3
+; MUBUF-NEXT:    v_mov_b32_e32 v6, 0x4000
 ; MUBUF-NEXT:    buffer_load_dword v0, v0, s[0:3], 0 offen glc
 ; MUBUF-NEXT:    s_waitcnt vmcnt(0)
-; MUBUF-NEXT:    v_mov_b32_e32 v3, 0x4000
-; MUBUF-NEXT:    buffer_load_dword v1, v1, s[0:3], 0 offen glc
+; MUBUF-NEXT:    v_mov_b32_e32 v7, 0x4000
+; MUBUF-NEXT:    buffer_load_dword v3, v2, s[0:3], 0 offen glc
 ; MUBUF-NEXT:    s_waitcnt vmcnt(0)
-; MUBUF-NEXT:    v_mov_b32_e32 v10, 0x4000
-; MUBUF-NEXT:    buffer_load_dword v8, v2, s[0:3], 0 offen glc
+; MUBUF-NEXT:    v_or_b32_e32 v2, 0x12c8, v6
+; MUBUF-NEXT:    v_mov_b32_e32 v8, 0x4000
+; MUBUF-NEXT:    v_mov_b32_e32 v9, 0x4000
+; MUBUF-NEXT:    buffer_load_dword v2, v2, s[0:3], 0 offen glc
 ; MUBUF-NEXT:    s_waitcnt vmcnt(0)
-; MUBUF-NEXT:    v_mov_b32_e32 v2, 0x4000
-; MUBUF-NEXT:    buffer_load_dword v9, v2, s[0:3], 0 offen offset:4 glc
+; MUBUF-NEXT:    v_mov_b32_e32 v10, 0x4000
+; MUBUF-NEXT:    buffer_load_dword v6, v7, s[0:3], 0 offen glc
 ; MUBUF-NEXT:    s_waitcnt vmcnt(0)
 ; MUBUF-NEXT:    v_mov_b32_e32 v11, 0x4000
-; MUBUF-NEXT:    buffer_load_dword v2, v3, s[0:3], 0 offen offset:8 glc
+; MUBUF-NEXT:    buffer_load_dword v7, v8, s[0:3], 0 offen offset:4 glc
 ; MUBUF-NEXT:    s_waitcnt vmcnt(0)
 ; MUBUF-NEXT:    v_mov_b32_e32 v12, 0x4000
-; MUBUF-NEXT:    buffer_load_dword v3, v10, s[0:3], 0 offen offset:12 glc
+; MUBUF-NEXT:    buffer_load_dword v8, v9, s[0:3], 0 offen offset:8 glc
 ; MUBUF-NEXT:    s_waitcnt vmcnt(0)
 ; MUBUF-NEXT:    s_load_dwordx2 s[4:5], s[8:9], 0x0
+; MUBUF-NEXT:    buffer_load_dword v9, v10, s[0:3], 0 offen offset:12 glc
+; MUBUF-NEXT:    s_waitcnt vmcnt(0)
+; MUBUF-NEXT:    v_add_co_u32_e32 v2, vcc, v2, v8
 ; MUBUF-NEXT:    buffer_load_dword v10, v11, s[0:3], 0 offen offset:16 glc
 ; MUBUF-NEXT:    s_waitcnt vmcnt(0)
-; MUBUF-NEXT:    v_add_co_u32_e32 v2, vcc, v1, v2
+; MUBUF-NEXT:    v_addc_co_u32_e32 v3, vcc, v3, v9, vcc
 ; MUBUF-NEXT:    buffer_load_dword v11, v12, s[0:3], 0 offen offset:20 glc
 ; MUBUF-NEXT:    s_waitcnt vmcnt(0)
-; MUBUF-NEXT:    v_addc_co_u32_e32 v3, vcc, v0, v3, vcc
-; MUBUF-NEXT:    v_add_co_u32_e32 v0, vcc, v7, v8
-; MUBUF-NEXT:    v_addc_co_u32_e32 v1, vcc, v6, v9, vcc
-; MUBUF-NEXT:    v_add_co_u32_e32 v4, vcc, v4, v10
+; MUBUF-NEXT:    v_add_co_u32_e32 v0, vcc, v0, v6
+; MUBUF-NEXT:    v_addc_co_u32_e32 v1, vcc, v1, v7, vcc
 ; MUBUF-NEXT:    v_mov_b32_e32 v12, 0
+; MUBUF-NEXT:    v_add_co_u32_e32 v4, vcc, v4, v10
 ; MUBUF-NEXT:    v_addc_co_u32_e32 v5, vcc, v5, v11, vcc
 ; MUBUF-NEXT:    s_waitcnt lgkmcnt(0)
 ; MUBUF-NEXT:    global_store_dwordx2 v12, v[4:5], s[4:5] offset:16

arsenm · 2025-03-05T01:03:04Z

Merge activity

Mar 4, 8:03 PM EST: A user started a stack merge that includes this pull request via Graphite.
Mar 4, 8:10 PM EST: Graphite couldn't merge this PR because it had conflicts with the trunk branch.
Mar 4, 10:07 PM EST: A user started a stack merge that includes this pull request via Graphite.
Mar 4, 10:09 PM EST: A user merged this pull request with Graphite.

llvm-ci · 2025-03-05T03:25:51Z

LLVM Buildbot has detected a new failure on builder openmp-s390x-linux running on systemz-1 while building llvm at step 6 "test-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/88/builds/8749

Here is the relevant piece of the build log for the reference

Step 6 (test-openmp) failure: test (failure)
******************** TEST 'libomp :: tasking/issue-94260-2.c' FAILED ********************
Exit Code: -11

Command Output (stdout):
--
# RUN: at line 1
/home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/./bin/clang -fopenmp   -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test -L /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -fno-omit-frame-pointer -mbackchain -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/ompt /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/tasking/issue-94260-2.c -o /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp -lm -latomic && /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp
# executed command: /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/./bin/clang -fopenmp -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test -L /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -fno-omit-frame-pointer -mbackchain -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/ompt /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/tasking/issue-94260-2.c -o /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp -lm -latomic
# executed command: /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp
# note: command had no output on stdout or stderr
# error: command failed with exit status: -11

--

********************

arsenm added backend:AMDGPU llvm:globalisel labels Mar 4, 2025 — with Graphite App

arsenm requested review from jayfoad, jofrn, Pierre-vh, rampitec, ritter-x2a and shiltian March 4, 2025 02:54

arsenm marked this pull request as ready for review March 4, 2025 02:54

arsenm mentioned this pull request Mar 4, 2025

AMDGPU: Handle s_add_u32 in eliminateFrameIndex #129628

Merged

rampitec approved these changes Mar 4, 2025

arsenm force-pushed the users/arsenm/eliminateFrameIndex-s-add-u32 branch 2 times, most recently from 7ef23b8 to 62bfd5c Compare March 5, 2025 01:06

Base automatically changed from users/arsenm/eliminateFrameIndex-s-add-u32 to main March 5, 2025 01:09

arsenm force-pushed the users/arsenm/amdgpu/make-frameIndexMayFold-consistent-eliminateFrameIndex branch from 259d50b to d48682d Compare March 5, 2025 02:20

arsenm merged commit 3e53aea into main Mar 5, 2025
8 of 10 checks passed

arsenm deleted the users/arsenm/amdgpu/make-frameIndexMayFold-consistent-eliminateFrameIndex branch March 5, 2025 03:09
