[AMDGPU] Fix predicates for BUFFER_ATOMIC_FMIN/FMAX patterns #89066

Merged · 1 commit · Apr 17, 2024

2 changes: 1 addition & 1 deletion llvm/lib/Target/AMDGPU/BUFInstructions.td
@@ -1726,7 +1726,7 @@ let SubtargetPredicate = isGFX12Plus in {
defm : SIBufferAtomicPat_Common<"SIbuffer_atomic_cond_sub_u32", i32, "BUFFER_ATOMIC_COND_SUB_U32_VBUFFER", ["noret"]>;
}

-let SubtargetPredicate = isGFX6GFX7GFX10Plus in {
+let OtherPredicates = [isGFX6GFX7GFX10Plus] in {

Contributor Author

It is annoying not to be able to use SubtargetPredicate for something that is clearly a subtarget predicate, but it clashes with the use of SubtargetPredicate = HasUnrestrictedSOffset inside SIBufferAtomicPat.

Collaborator

Yeah, I came across this in the past too.

Contributor

I would expect these to be swapped, with HasUnrestrictedSOffset in OtherPredicates.

Contributor Author

Right, but that just causes the converse problem, as there are already a bunch of cases that override OtherPredicates in SIBufferAtomicPat.

defm : SIBufferAtomicPat<"SIbuffer_atomic_fmin", f32, "BUFFER_ATOMIC_FMIN">;
defm : SIBufferAtomicPat<"SIbuffer_atomic_fmax", f32, "BUFFER_ATOMIC_FMAX">;
}
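
The clash described in the review comments above can be illustrated with a cut-down TableGen sketch. This is only an approximation under stated assumptions: `Predicate` and `DemoPat` are simplified stand-ins rather than the real LLVM classes, the predicate defs carry placeholder condition strings, and `DemoBufferAtomicPat`, `DEMO_FMIN` and `DEMO_FMAX` are hypothetical names. Only the shape of the `let` nesting mirrors the change: the multiclass already sets the single `SubtargetPredicate` field, so the outer constraint is placed in the `OtherPredicates` list instead.

```tablegen
// Simplified stand-in for a predicate record (assumption, not the LLVM class).
class Predicate<string cond> {
  string CondString = cond;
}

// Placeholder definitions; the condition strings are illustrative only.
def TruePredicate          : Predicate<"">;
def isGFX6GFX7GFX10Plus    : Predicate<"subtarget generation check">;
def HasUnrestrictedSOffset : Predicate<"soffset restriction check">;

// Hypothetical pattern class with one single-valued slot and one list slot.
class DemoPat {
  Predicate SubtargetPredicate = TruePredicate;
  list<Predicate> OtherPredicates = [];
}

// Hypothetical multiclass that, like SIBufferAtomicPat in the discussion,
// sets SubtargetPredicate on the records it creates.
multiclass DemoBufferAtomicPat {
  let SubtargetPredicate = HasUnrestrictedSOffset in
  def _Pat : DemoPat;
}

// The outer constraint goes into the list field, so it does not compete
// with the SubtargetPredicate assignment made inside the multiclass.
let OtherPredicates = [isGFX6GFX7GFX10Plus] in {
  defm DEMO_FMIN : DemoBufferAtomicPat;
  defm DEMO_FMAX : DemoBufferAtomicPat;
}
```

Running a file like this through `llvm-tblgen`, which prints the resulting records by default, should show `DEMO_FMIN_Pat` and `DEMO_FMAX_Pat` carrying both `SubtargetPredicate = HasUnrestrictedSOffset` and `OtherPredicates = [isGFX6GFX7GFX10Plus]`. As the last review comment notes, the same kind of clash would reappear for any multiclass that itself sets `OtherPredicates`.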
72 changes: 72 additions & 0 deletions llvm/test/CodeGen/AMDGPU/fp-min-max-buffer-atomics.ll
@@ -4,12 +4,14 @@
; RUN: llc < %s -mtriple=amdgcn -mcpu=gfx1010 -verify-machineinstrs | FileCheck %s -check-prefix=GFX10
; RUN: llc < %s -mtriple=amdgcn -mcpu=gfx1030 -verify-machineinstrs | FileCheck %s -check-prefix=GFX1030
; RUN: llc < %s -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs | FileCheck %s -check-prefix=GFX1100
; RUN: llc < %s -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs | FileCheck %s -check-prefix=GFX12

; RUN: llc < %s -global-isel -mtriple=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck %s -check-prefix=G_SI
; RUN: llc < %s -global-isel -mtriple=amdgcn -mcpu=hawaii -verify-machineinstrs | FileCheck %s -check-prefix=G_GFX7
; RUN: llc < %s -global-isel -mtriple=amdgcn -mcpu=gfx1010 -verify-machineinstrs | FileCheck %s -check-prefix=G_GFX10
; RUN: llc < %s -global-isel -mtriple=amdgcn -mcpu=gfx1030 -verify-machineinstrs | FileCheck %s -check-prefix=G_GFX1030
; RUN: llc < %s -global-isel -mtriple=amdgcn -mcpu=gfx1100 -verify-machineinstrs | FileCheck %s -check-prefix=G_GFX1100
; RUN: llc < %s -global-isel -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs | FileCheck %s -check-prefix=GFX12

declare float @llvm.amdgcn.raw.buffer.atomic.fmin.f32(float, <4 x i32>, i32, i32, i32 immarg)
declare float @llvm.amdgcn.raw.buffer.atomic.fmax.f32(float, <4 x i32>, i32, i32, i32 immarg)
@@ -70,6 +72,18 @@ define amdgpu_kernel void @raw_buffer_atomic_min_noret_f32(<4 x i32> inreg %rsrc
; GFX1100-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX1100-NEXT: s_endpgm
;
; GFX12-LABEL: raw_buffer_atomic_min_noret_f32:
; GFX12: ; %bb.0: ; %main_body
; GFX12-NEXT: s_clause 0x1
; GFX12-NEXT: s_load_b64 s[4:5], s[0:1], 0x34
; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
; GFX12-NEXT: s_wait_kmcnt 0x0
; GFX12-NEXT: v_dual_mov_b32 v0, s4 :: v_dual_mov_b32 v1, s5
; GFX12-NEXT: buffer_atomic_min_num_f32 v0, v1, s[0:3], null offen
; GFX12-NEXT: s_nop 0
; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX12-NEXT: s_endpgm
;
; G_SI-LABEL: raw_buffer_atomic_min_noret_f32:
; G_SI: ; %bb.0: ; %main_body
; G_SI-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd
@@ -170,6 +184,15 @@ define amdgpu_ps void @raw_buffer_atomic_min_rtn_f32(<4 x i32> inreg %rsrc, floa
; GFX1100-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX1100-NEXT: s_endpgm
;
; GFX12-LABEL: raw_buffer_atomic_min_rtn_f32:
; GFX12: ; %bb.0: ; %main_body
; GFX12-NEXT: buffer_atomic_min_num_f32 v0, v1, s[0:3], null offen th:TH_ATOMIC_RETURN
; GFX12-NEXT: s_wait_loadcnt 0x0
; GFX12-NEXT: global_store_b32 v[0:1], v0, off
; GFX12-NEXT: s_nop 0
; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX12-NEXT: s_endpgm
;
; G_SI-LABEL: raw_buffer_atomic_min_rtn_f32:
; G_SI: ; %bb.0: ; %main_body
; G_SI-NEXT: buffer_atomic_fmin v0, v1, s[0:3], 0 offen glc
@@ -292,6 +315,20 @@ define amdgpu_kernel void @raw_buffer_atomic_min_rtn_f32_off4_slc(<4 x i32> inre
; GFX1100-NEXT: ds_store_b32 v1, v0
; GFX1100-NEXT: s_endpgm
;
; GFX12-LABEL: raw_buffer_atomic_min_rtn_f32_off4_slc:
; GFX12: ; %bb.0: ; %main_body
; GFX12-NEXT: s_clause 0x1
; GFX12-NEXT: s_load_b96 s[4:6], s[0:1], 0x34
; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
; GFX12-NEXT: s_wait_kmcnt 0x0
; GFX12-NEXT: v_dual_mov_b32 v0, s4 :: v_dual_mov_b32 v1, s5
; GFX12-NEXT: s_mov_b32 s4, 4
; GFX12-NEXT: buffer_atomic_min_num_f32 v0, v1, s[0:3], s4 offen th:TH_ATOMIC_NT_RETURN
; GFX12-NEXT: v_mov_b32_e32 v1, s6
; GFX12-NEXT: s_wait_loadcnt 0x0
; GFX12-NEXT: ds_store_b32 v1, v0
; GFX12-NEXT: s_endpgm
;
; G_SI-LABEL: raw_buffer_atomic_min_rtn_f32_off4_slc:
; G_SI: ; %bb.0: ; %main_body
; G_SI-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0xd
@@ -427,6 +464,18 @@ define amdgpu_kernel void @raw_buffer_atomic_max_noret_f32(<4 x i32> inreg %rsrc
; GFX1100-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX1100-NEXT: s_endpgm
;
; GFX12-LABEL: raw_buffer_atomic_max_noret_f32:
; GFX12: ; %bb.0: ; %main_body
; GFX12-NEXT: s_clause 0x1
; GFX12-NEXT: s_load_b64 s[4:5], s[0:1], 0x34
; GFX12-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
; GFX12-NEXT: s_wait_kmcnt 0x0
; GFX12-NEXT: v_dual_mov_b32 v0, s4 :: v_dual_mov_b32 v1, s5
; GFX12-NEXT: buffer_atomic_max_num_f32 v0, v1, s[0:3], null offen
; GFX12-NEXT: s_nop 0
; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX12-NEXT: s_endpgm
;
; G_SI-LABEL: raw_buffer_atomic_max_noret_f32:
; G_SI: ; %bb.0: ; %main_body
; G_SI-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd
@@ -527,6 +576,15 @@ define amdgpu_ps void @raw_buffer_atomic_max_rtn_f32(<4 x i32> inreg %rsrc, floa
; GFX1100-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX1100-NEXT: s_endpgm
;
; GFX12-LABEL: raw_buffer_atomic_max_rtn_f32:
; GFX12: ; %bb.0: ; %main_body
; GFX12-NEXT: buffer_atomic_max_num_f32 v0, v1, s[0:3], null offen th:TH_ATOMIC_RETURN
; GFX12-NEXT: s_wait_loadcnt 0x0
; GFX12-NEXT: global_store_b32 v[0:1], v0, off
; GFX12-NEXT: s_nop 0
; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX12-NEXT: s_endpgm
;
; G_SI-LABEL: raw_buffer_atomic_max_rtn_f32:
; G_SI: ; %bb.0: ; %main_body
; G_SI-NEXT: buffer_atomic_fmax v0, v1, s[0:3], 0 offen glc
@@ -641,6 +699,20 @@ define amdgpu_kernel void @raw_buffer_atomic_max_rtn_f32_off4_slc(<4 x i32> inre
; GFX1100-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX1100-NEXT: s_endpgm
;
; GFX12-LABEL: raw_buffer_atomic_max_rtn_f32_off4_slc:
; GFX12: ; %bb.0: ; %main_body
; GFX12-NEXT: s_load_b256 s[0:7], s[0:1], 0x24
; GFX12-NEXT: s_wait_kmcnt 0x0
; GFX12-NEXT: v_dual_mov_b32 v0, s4 :: v_dual_mov_b32 v1, s5
; GFX12-NEXT: s_mov_b32 s4, 4
; GFX12-NEXT: buffer_atomic_max_num_f32 v0, v1, s[0:3], s4 offen th:TH_ATOMIC_NT_RETURN
; GFX12-NEXT: v_mov_b32_e32 v1, 0
; GFX12-NEXT: s_wait_loadcnt 0x0
; GFX12-NEXT: global_store_b32 v1, v0, s[6:7]
; GFX12-NEXT: s_nop 0
; GFX12-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
; GFX12-NEXT: s_endpgm
;
; G_SI-LABEL: raw_buffer_atomic_max_rtn_f32_off4_slc:
; G_SI: ; %bb.0: ; %main_body
; G_SI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x9