[AMDGPU] - Add s_bitreplicate intrinsic #69209
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2 | ||
; RUN: llc -march=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 -global-isel=1 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11 %s | ||
; RUN: llc -march=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 -global-isel=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11 %s | ||
|
||
declare i64 @llvm.amdgcn.s.bitreplicate(i32) | ||
|
||
define i64 @test_s_bitreplicate_constant() { | ||
; GFX11-LABEL: test_s_bitreplicate_constant: | ||
; GFX11: ; %bb.0: ; %entry | ||
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) | ||
; GFX11-NEXT: s_bitreplicate_b64_b32 s[0:1], 0x85fe3a92 | ||
As a follow up you could implement constant folding for this in |
||
; GFX11-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1 | ||
; GFX11-NEXT: s_setpc_b64 s[30:31] | ||
entry: | ||
%br = call i64 @llvm.amdgcn.s.bitreplicate(i32 u0x85FE3A92) | ||
ret i64 %br | ||
} | ||
|
||
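The constant-folding follow-up suggested for `test_s_bitreplicate_constant` needs a reference for what the instruction computes. The sketch below assumes the semantics described in the RDNA3 ISA guide for `S_BITREPLICATE_B64_B32` (input bit `i` is copied to output bits `2*i` and `2*i+1`). The helper name `bitReplicate` is hypothetical and is not an existing LLVM API; where such folding would actually live in LLVM is not stated in the review.

```cpp
#include <cstdint>

// Reference model of s_bitreplicate_b64_b32 (assumed semantics):
// each of the 32 input bits is doubled, so input bit i lands in
// output bits 2*i and 2*i+1. Hypothetical helper, not an LLVM API.
inline std::uint64_t bitReplicate(std::uint32_t src) {
  std::uint64_t result = 0;
  for (unsigned i = 0; i < 32; ++i) {
    if ((src >> i) & 1u) {
      // 0b11 shifted to the pair of output bits for input bit i.
      result |= static_cast<std::uint64_t>(3) << (2 * i);
    }
  }
  return result;
}
```

Under these assumed semantics, a folder would replace the call on the constant `0x85FE3A92` from the test with the 64-bit value `0xC033FFFC0FCCC30C`.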
define amdgpu_cs void @test_s_bitreplicate_sgpr(i32 inreg %mask, ptr addrspace(1) %out) { | ||
; GFX11-LABEL: test_s_bitreplicate_sgpr: | ||
; GFX11: ; %bb.0: ; %entry | ||
; GFX11-NEXT: s_bitreplicate_b64_b32 s[0:1], s0 | ||
; GFX11-NEXT: v_dual_mov_b32 v3, s1 :: v_dual_mov_b32 v2, s0 | ||
; GFX11-NEXT: global_store_b64 v[0:1], v[2:3], off | ||
; GFX11-NEXT: s_nop 0 | ||
; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) | ||
; GFX11-NEXT: s_endpgm | ||
entry: | ||
%br = call i64 @llvm.amdgcn.s.bitreplicate(i32 %mask) | ||
store i64 %br, ptr addrspace(1) %out | ||
Any particular reason to use |
As I understand it, the I needed this (or a similar) cc, because then the |
Thanks for explaining. I had not noticed that this test uses |
I admit it is a bit odd at first glance. However, I did not want to 'clutter' the other tests with the stores, so I made an exception for this case. |
||
ret void | ||
} | ||
|
||
define i64 @test_s_bitreplicate_vgpr(i32 %mask) { | ||
; GFX11-LABEL: test_s_bitreplicate_vgpr: | ||
; GFX11: ; %bb.0: ; %entry | ||
; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) | ||
; GFX11-NEXT: v_readfirstlane_b32 s0, v0 | ||
; GFX11-NEXT: s_bitreplicate_b64_b32 s[0:1], s0 | ||
; GFX11-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1 | ||
; GFX11-NEXT: s_setpc_b64 s[30:31] | ||
entry: | ||
%br = call i64 @llvm.amdgcn.s.bitreplicate(i32 %mask) | ||
ret i64 %br | ||
} |
This unfortunately seems necessary.
The FixSGPRCopies pass incorrectly decides to turn `s_bitreplicate` into a VALU instruction when there is a VGPR input. This is a similar issue to D45826.
I refrained from writing a helper method because there would be only two calls for now. If it makes sense to add one, I will.
Emitting readfirstlane is just wrong; you have to emulate the operation with a VALU expansion if needed.
Even with the `convergent` attribute of the intrinsic? I thought we could rule out any divergent inputs with that.
No, at most that prevents the optimizer from introducing new divergence. You cannot guarantee that an arbitrary input was uniform from the start.
There is more discussion and reasoning about this in the main thread of this PR, but as far as I understand the intention, the intrinsic only accepts uniform values. From the intrinsic definition:
A uniform value can still end up in a VGPR, so it makes sense to me to use readfirstlane in that case.
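
The disagreement in this thread comes down to what happens when lanes hold different values. Below is a toy model of the two lowerings, not the actual LLVM code: it assumes readfirstlane yields the value of the first active lane (lane 0 here), assumes the bit-doubling semantics of `s_bitreplicate_b64_b32`, and uses a 4-lane wave for brevity. All names (`replicateBits`, `viaReadFirstLane`, `viaPerLaneExpansion`, `kLanes`) are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy wave size; real waves have 32 or 64 lanes.
constexpr std::size_t kLanes = 4;
using Wave32 = std::array<std::uint32_t, kLanes>;
using Wave64 = std::array<std::uint64_t, kLanes>;

// Assumed semantics: input bit i is copied to output bits 2*i and 2*i+1.
inline std::uint64_t replicateBits(std::uint32_t src) {
  std::uint64_t result = 0;
  for (unsigned i = 0; i < 32; ++i)
    if ((src >> i) & 1u)
      result |= static_cast<std::uint64_t>(3) << (2 * i);
  return result;
}

// Scalar lowering: read lane 0's value once (modeling readfirstlane),
// run the scalar operation, and broadcast the result to every lane.
inline Wave64 viaReadFirstLane(const Wave32 &mask) {
  Wave64 out{};
  out.fill(replicateBits(mask[0]));
  return out;
}

// VALU-style expansion: every lane computes on its own input value.
inline Wave64 viaPerLaneExpansion(const Wave32 &mask) {
  Wave64 out{};
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    out[lane] = replicateBits(mask[lane]);
  return out;
}
```

In this model the two lowerings agree whenever all lanes hold the same value, which is the uniform-value-in-a-VGPR case described above. They differ as soon as any lane's input differs from lane 0's, which is the case the reviewer objects to.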