-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[NVPTX] Check Before inserting AddrSpaceCastInst in NVPTXLoweringAlloca #106127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
2506e94
0c9a46d
6ec68a5
1bdc7fa
bd08783
f1680bc
833e061
0cee1cd
26210e3
f84a849
22beb79
8b6aa8c
7c4e9a3
1f9c51e
7a54f90
dd7d7f2
895d692
0fa98d1
4a020cb
b1f7b70
1b84635
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,5 @@ | ||
; RUN: opt < %s -S -nvptx-lower-alloca -infer-address-spaces | FileCheck %s | ||
; RUN: opt < %s -S -nvptx-lower-alloca | FileCheck %s --check-prefix LOWERALLOCAONLY | ||
; RUN: llc < %s -march=nvptx64 -mcpu=sm_35 | FileCheck %s --check-prefix PTX | ||
; RUN: %if ptxas %{ llc < %s -march=nvptx64 -mcpu=sm_35 | %ptxas-verify %} | ||
|
||
|
@@ -11,13 +12,32 @@ define void @kernel() { | |
%A = alloca i32 | ||
; CHECK: addrspacecast ptr %A to ptr addrspace(5) | ||
; CHECK: store i32 0, ptr addrspace(5) {{%.+}} | ||
; LOWERALLOCAONLY: [[V1:%.*]] = addrspacecast ptr %A to ptr addrspace(5) | ||
; LOWERALLOCAONLY: [[V2:%.*]] = addrspacecast ptr addrspace(5) [[V1]] to ptr | ||
; LOWERALLOCAONLY: store i32 0, ptr [[V2]], align 4 | ||
; PTX: st.local.u32 [{{%rd[0-9]+}}], {{%r[0-9]+}} | ||
store i32 0, ptr %A | ||
call void @callee(ptr %A) | ||
ret void | ||
} | ||
|
||
define void @alloca_in_explicit_local_as() { | ||
; LABEL: @lower_alloca_addrspace5 | ||
; PTX-LABEL: .visible .func alloca_in_explicit_local_as( | ||
%A = alloca i32, addrspace(5) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Huh. I don't know that AS-specific alloca is a thing. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I didn't knew it was supported, but if it is supported, then NVPTX should be setting this for all There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Things in non-generic AS tends to get in the way of optimizations, so doing it up-front will likely result in worse code. Perhaps one day everything will handle non-default AS as well as generic AS, but I don't think we're there yet. So, for now, I do not see any benefit of allocas in explicit AS -- we can easily infer the AS if/when we need to, and our priority is to eliminate as many of allocas as we can as they are a major bad news for performance, so not getting in the way of LLVM optimizations is more important. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Adding this test to make sure the newly added logic is working correctly (instead of asserting). I don't have too much understanding with the optimization tradeoff you mentioned here so I'll defer it to your best judgement. Just wanna make sure if there is any action items I shall take for this PR specifically wrt this test? 🙏 Shall I add a comment to make the intention of this test more clear? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The test is OK. Extra comments are helpful, but it's even better if the test is self-documenting. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sounds good, pushed an update. |
||
; CHECK: store i32 0, ptr addrspace(5) {{%.+}} | ||
; PTX: st.local.u32 [%SP+0], {{%r[0-9]+}} | ||
; LOWERALLOCAONLY: [[V1:%.*]] = addrspacecast ptr addrspace(5) %A to ptr | ||
; LOWERALLOCAONLY: store i32 0, ptr [[V1]], align 4 | ||
store i32 0, ptr addrspace(5) %A | ||
call void @callee(ptr addrspace(5) %A) | ||
ret void | ||
} | ||
|
||
declare void @callee(ptr) | ||
declare void @callee_addrspace5(ptr addrspace(5)) | ||
|
||
!nvvm.annotations = !{!0} | ||
!nvvm.annotations = !{!1} | ||
!0 = !{ptr @kernel, !"kernel", i32 1} | ||
!1 = !{ptr @alloca_in_explicit_local_as, !"alloca_in_explicit_local_as", i32 1} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add an overall comment here describing what we do here and why. Otherwise it's not obvious why we do the seemingly pointless casting to AS5 and back.
Basically, we need to make sure that LLVM has info that alloca is in AS(5), but alloca's users still need a generic pointer to operate on.
For allocas in generic AS, we add asc to AS(5) and back to generic.
For allocas in AS5, we just need an ASC to generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done!