-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[NVPTX] Check Before inserting AddrSpaceCastInst in NVPTXLoweringAlloca #106127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NVPTX] Check Before inserting AddrSpaceCastInst in NVPTXLoweringAlloca #106127
Conversation
@llvm/pr-subscribers-backend-nvptx Author: weiwei chen (weiweichen) ChangesIf Full diff: https://github.com/llvm/llvm-project/pull/106127.diff 2 Files Affected:
diff --git a/llvm/lib/Target/NVPTX/NVPTXLowerAlloca.cpp b/llvm/lib/Target/NVPTX/NVPTXLowerAlloca.cpp
index 369238436083c7..62b789c008bb43 100644
--- a/llvm/lib/Target/NVPTX/NVPTXLowerAlloca.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXLowerAlloca.cpp
@@ -72,12 +72,21 @@ bool NVPTXLowerAlloca::runOnFunction(Function &F) {
Changed = true;
auto ETy = allocaInst->getAllocatedType();
auto LocalAddrTy = PointerType::get(ETy, ADDRESS_SPACE_LOCAL);
- auto NewASCToLocal = new AddrSpaceCastInst(allocaInst, LocalAddrTy, "");
- auto GenericAddrTy = PointerType::get(ETy, ADDRESS_SPACE_GENERIC);
- auto NewASCToGeneric =
- new AddrSpaceCastInst(NewASCToLocal, GenericAddrTy, "");
- NewASCToLocal->insertAfter(allocaInst);
- NewASCToGeneric->insertAfter(NewASCToLocal);
+ PointerType *allocInstPtrTy = dyn_cast_or_null<PointerType>(allocaInst->getType()->getScalarType());
+ assert(allocInstPtrTy && "AllocInst scalar type is not a PointerType.");
+ Instruction* NewASCToGeneric = allocaInst;
+ if(allocInstPtrTy->getAddressSpace() != ADDRESS_SPACE_LOCAL) {
+ // Only insert a new AddrSpaceCastInst if
+ // allocaInst is not already in ADDRESS_SPACE_LOCAL.
+ auto NewASCToLocal =
+ new AddrSpaceCastInst(allocaInst, LocalAddrTy, "");
+ auto GenericAddrTy = PointerType::get(ETy, ADDRESS_SPACE_GENERIC);
+ NewASCToGeneric =
+ new AddrSpaceCastInst(NewASCToLocal, GenericAddrTy, "");
+ NewASCToLocal->insertAfter(allocaInst);
+ NewASCToGeneric->insertAfter(NewASCToLocal);
+ }
+
for (Use &AllocaUse : llvm::make_early_inc_range(allocaInst->uses())) {
// Check Load, Store, GEP, and BitCast Uses on alloca and make them
// use the converted generic address, in order to expose non-generic
diff --git a/llvm/test/CodeGen/NVPTX/lower-alloca.ll b/llvm/test/CodeGen/NVPTX/lower-alloca.ll
index b1c34c8b5ecd78..68793dffb01e8a 100644
--- a/llvm/test/CodeGen/NVPTX/lower-alloca.ll
+++ b/llvm/test/CodeGen/NVPTX/lower-alloca.ll
@@ -17,7 +17,21 @@ define void @kernel() {
ret void
}
+define void @alloc_already_in_addrspace5() {
+; LABEL: @lower_alloca_addrspace5
+; PTX-LABEL: .visible .func alloc_already_in_addrspace5(
+ %A = alloca i32, addrspace(5)
+; CHECK-NOT: addrspacecast ptr %A to ptr addrspace(5)
+; CHECK: store i32 0, ptr addrspace(5) {{%.+}}
+; PTX: st.local.u32 [%SP+0], {{%r[0-9]+}}
+ store i32 0, ptr addrspace(5) %A
+ call void @callee(ptr addrspace(5) %A)
+ ret void
+}
+
declare void @callee(ptr)
+declare void @callee_addrspace5(ptr addrspace(5))
!nvvm.annotations = !{!0}
!0 = !{ptr @kernel, !"kernel", i32 1}
+!1 = !{ptr @alloc_already_in_addrspace5, !"alloc_already_in_addrspace5", i32 1}
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
@weiweichen, where is this assertion? Maybe it needs to be changed. |
It comes from here
where
@justinfargnoli, would it be better to remove ☝️ instead? |
Interesting, I would've though that this is allowed. However, the IR Spec says otherwise (source):
Since the IR spec does not allow this, we should not remove the check. |
…eic/nvptx/check-before-addrspacecast
Good point! Thank you for checking the spec! Leaving the check as is and perhaps the fix in this PR makes more sense? |
Agreed :) |
define void @alloc_already_in_addrspace5() { | ||
; LABEL: @lower_alloca_addrspace5 | ||
; PTX-LABEL: .visible .func alloc_already_in_addrspace5( | ||
%A = alloca i32, addrspace(5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh. I don't know that AS-specific alloca is a thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't knew it was supported, but if it is supported, then NVPTX should be setting this for all alloca
s, and then converting the resulting pointer back to generic, if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Things in non-generic AS tends to get in the way of optimizations, so doing it up-front will likely result in worse code. Perhaps one day everything will handle non-default AS as well as generic AS, but I don't think we're there yet.
So, for now, I do not see any benefit of allocas in explicit AS -- we can easily infer the AS if/when we need to, and our priority is to eliminate as many of allocas as we can as they are a major bad news for performance, so not getting in the way of LLVM optimizations is more important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this test to make sure the newly added logic is working correctly (instead of asserting). I don't have too much understanding with the optimization tradeoff you mentioned here so I'll defer it to your best judgement. Just wanna make sure if there is any action items I shall take for this PR specifically wrt this test? 🙏 Shall I add a comment to make the intention of this test more clear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test is OK. Extra comments are helpful, but it's even better if the test is self-documenting.
E.g. renaming the test above to alloca_in_explicit_local_as
would do the trick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, pushed an update.
PointerType *AllocInstPtrTy = | ||
cast<PointerType>(allocaInst->getType()->getScalarType()); | ||
Instruction *NewASCToGeneric = allocaInst; | ||
if (AllocInstPtrTy->getAddressSpace() != ADDRESS_SPACE_LOCAL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably add an assert that alloca's address space must be generic. We can't cast any other AS to local.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alloca's address space must be generic
Is this a hard requirement for NVPTX?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe so. We can only cast to generic or from generic.
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvta
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, interesting, thank you for pointing out this definition! Maybe the original assert is fine then 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, wait, the spec definition of casting can only from generic or from generic is not conflicting with the change here, since we basically are saying if the source is not generic, please don't cast. This is not in violation with the spec, no? Or is there a restriction that alloca
can only happen in generic
space but not any other spaces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the choice was between alloca in generic AS and alloca in AS(5), then indeed the choice would be to cast or skip. However, considering that alloca's AS can be explicitly specified to be in any AS, it would be prudent to have an assertion, so we catch unexpected input early, at the point where we care about it.
If we don't have an assertion, and we get an alloca in AS 333, we'll happily insert an ASC which will cause lowering problems further down in the pipeline and whoever will have to debug that will have to back-track to here before moving on to the real root cause -- nonsensical input IR. I'd much rather break things here, as soon as we're aware that something is wrong.
Definitely agree that assert should be there if the AS is not local (5)! However, the change in my PR does not remove that assert if the AS is say 333
because the if
added to avoid cast will not be true and the cast
will still happen and assert, right? Do you mean that an explicit assert should be added in NVPTXLowerAlloca.cpp
instead of rely on the constructor of AddrSpaceCastInst
to assert? If so, I think that's a good point too, and we can provide more meaningful error message as well.
I guess one thing I feel a bit confused about is whether alloca
has to be in generic
only. Based on my read of the discussions, I feel like it's not the case, it can be either in generic
(which the compiler will later cast it to local
) or in local
(already, so no cast is needed by the compiler). In other words, alloca
technically can only be in local
and compiler will either do a cast if it's generic
from the user program, or assert/error out if AS is not either local
or generic
. Is this a correct understanding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that an explicit assert should be added in NVPTXLowerAlloca.cpp instead of rely on the constructor of AddrSpaceCastInst
Yes. It's better to catch it early, rather than add more wrong things, and wait for something to break later on, and then back-track.
By default, IR operates on generic pointers. Specific address spaces are target-dependent and are not guaranteed to be supported, even if IR remains syntactically valid. For NVPTX, we can only support allocas in the default AS or AS(5).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the right way for NVPTX / LLVM to exploit that all generic statespace allocas point to local statespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that an explicit assert should be added in NVPTXLowerAlloca.cpp instead of rely on the constructor of AddrSpaceCastInst
Yes. It's better to catch it early, rather than add more wrong things, and wait for something to break later on, and then back-track.
By default, IR operates on generic pointers. Specific address spaces are target-dependent and are not guaranteed to be supported, even if IR remains syntactically valid. For NVPTX, we can only support allocas in the default AS or AS(5).
Sounds good! I pushed a commit with the early checking and assert with a more specific error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the right way for NVPTX / LLVM to exploit that all generic statespace allocas point to local statespace?
I'm not sure there's much to exploit. The best we can do with allocas is to get rid of them, which makes their AS a moot point. The allocas that are still left around, will be eventually lowered into the .local
storage in PTX. Theoretically knowing AS could be used to tell LLVM that cost of the loads/stores is high, but I do not see it making much of a difference whether LLVM can do anything different based on that info. If it needs to use the alloca, it has to do it, and the potential performance difference between direct or generic pointer access will be dwarfed by the actual cost of touching the .local
memory.
So, accepting alloca addrspace (5)
is fine for completeness sake, but I don't see it making any measurable positive impact on performance.
…eic/nvptx/check-before-addrspacecast
NewASCToGeneric->insertAfter(NewASCToLocal); | ||
PointerType *AllocInstPtrTy = | ||
cast<PointerType>(allocaInst->getType()->getScalarType()); | ||
Instruction *NewASCToGeneric = allocaInst; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could use a better name, given that ASC may or may not be there now. AllocaInGenericAS
?
I'd also push the variables down, into the if
body -- some of them seem to be needed by the now-conditional code only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, this is original name in the source code, happy to rename it if that's better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're changing what the code does, Adjusting variable names as appropriate is the right thing to do.
// allocaInst is not already in ADDRESS_SPACE_LOCAL. | ||
auto NewASCToLocal = | ||
new AddrSpaceCastInst(allocaInst, LocalAddrTy, ""); | ||
auto GenericAddrTy = PointerType::get(ETy, ADDRESS_SPACE_GENERIC); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Single-use variables could be pushed into the AddrSpaceCastInst
call without hurting readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also just pushing the same original code around instead of new addition of the PR, but happy to apply the suggested change if that helps with readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, taking a closer look at the original logic of the code block
auto ETy = allocaInst->getAllocatedType();
auto LocalAddrTy = PointerType::get(ETy, ADDRESS_SPACE_LOCAL);
auto NewASCToLocal = new AddrSpaceCastInst(allocaInst, LocalAddrTy, "");
auto GenericAddrTy = PointerType::get(ETy, ADDRESS_SPACE_GENERIC);
auto NewASCToGeneric =
new AddrSpaceCastInst(NewASCToLocal, GenericAddrTy, "");
NewASCToLocal->insertAfter(allocaInst);
NewASCToGeneric->insertAfter(NewASCToLocal);
It seems to be casting the AS to Local
and then back Generic
before moving on. This looks very inefficient if the allocaInst
is already in Generic
and this is only here so that it can hit the assert
???
I updated the logic to:
- don't do any AS cast if
allocInst
is already inGeneric
- If
allocInst
is inLocal
, cast it toGeneric
. - Applied some of the name changing and readability suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, actually, I found out that -infer-address-spaces
won't work properly if the extra ASCast to LOCAL is not inserted if alloca
is already in Generic . So even alloca
is already in generic, we still need to insert two ASCast to make go to local
and back to generic
to make -infer-address-spaces
work properly (i.e. pass the test) 🤯 🤯 🤯 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-infer-address-spaces won't work properly if the extra ASCast to LOCAL is not inserted if alloca is already in Generic
That didn't parse. Are you saying that inferring does not work for the plain alloca()
-- it is already in generic AS. In other words it's broken right now?
Or that your patch skipped that "asc(asc(alloca, to AS(5)), to AS(0)" that the current code adds and that infer address spaces relies on? In that case, yes, we'll still need to keep those ASCs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it's a confusion from the original test ⬇️ (as well as my explanation 🤦 ). Let me elaborate:
define void @kernel() {
; LABEL: @lower_alloca
; PTX-LABEL: .visible .entry kernel(
%A = alloca i32
; CHECK: addrspacecast ptr %A to ptr addrspace(5)
; CHECK: store i32 0, ptr addrspace(5) {{%.+}}
; PTX: st.local.u32 [{{%rd[0-9]+}}], {{%r[0-9]+}}
store i32 0, ptr %A
call void @callee(ptr %A)
ret void
}
As shown above, FileCheck is checking that store
has a ptr addrsapce(5)
. This only happens if there is a chain of "asc(asc(alloca(AS(0)), to AS(5)), to AS(0)". Otherwise, the store
here would have a ptr
in generic space.
Currently the test is ran with -infer-address-spaces
:
; RUN: opt < %s -S -nvptx-lower-alloca -infer-address-spaces | FileCheck %s
Here are the outputs of main
with and without -infer-address-space
without -infer-address-space
:
(autovenv) [M] weiwei.chen@Weiweis-MacBook-Pro ~/research/modularml/modular/third-party/llvm-project 🍔 opt < llvm/test/CodeGen/NVPTX/lower-alloca.ll -S -nvptx-lower-alloca
; ModuleID = '<stdin>'
source_filename = "<stdin>"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-unknown-unknown"
define void @kernel() {
%A = alloca i32, align 4
%1 = addrspacecast ptr %A to ptr addrspace(5)
%2 = addrspacecast ptr addrspace(5) %1 to ptr
store i32 0, ptr %2, align 4
call void @callee(ptr %A)
ret void
}
declare void @callee(ptr)
!nvvm.annotations = !{!0}
!0 = !{ptr @kernel, !"kernel", i32 1}
with -infer-address-space
:
(autovenv) [M] weiwei.chen@Weiweis-MacBook-Pro ~/research/modularml/modular/third-party/llvm-project 🍔 opt < llvm/test/CodeGen/NVPTX/lower-alloca.ll -S -nvptx-lower-alloca -infer-address-spaces
; ModuleID = '<stdin>'
source_filename = "<stdin>"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-unknown-unknown"
define void @kernel() {
%A = alloca i32, align 4
%1 = addrspacecast ptr %A to ptr addrspace(5)
store i32 0, ptr addrspace(5) %1, align 4
call void @callee(ptr %A)
ret void
}
declare void @callee(ptr)
!nvvm.annotations = !{!0}
!0 = !{ptr @kernel, !"kernel", i32 1}
define void @alloc_already_in_addrspace5() { | ||
; LABEL: @lower_alloca_addrspace5 | ||
; PTX-LABEL: .visible .func alloc_already_in_addrspace5( | ||
%A = alloca i32, addrspace(5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Things in non-generic AS tends to get in the way of optimizations, so doing it up-front will likely result in worse code. Perhaps one day everything will handle non-default AS as well as generic AS, but I don't think we're there yet.
So, for now, I do not see any benefit of allocas in explicit AS -- we can easily infer the AS if/when we need to, and our priority is to eliminate as many of allocas as we can as they are a major bad news for performance, so not getting in the way of LLVM optimizations is more important.
…eic/nvptx/check-before-addrspacecast
…eic/nvptx/check-before-addrspacecast
…eic/nvptx/check-before-addrspacecast
…eic/nvptx/check-before-addrspacecast
…eic/nvptx/check-before-addrspacecast
; LABEL: @lower_alloca_addrspace5 | ||
; PTX-LABEL: .visible .func alloca_in_explicit_local_as( | ||
%A = alloca i32, addrspace(5) | ||
; CHECK-NOT: addrspacecast ptr %A to ptr addrspace(5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to see a positive check here matching what we do want to see.
Negative checks are rather fragile. E.g. an unintentional typo in what we match will makes them always succeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense, adding a few positive checks to just to check the effect of -nvptx-lower-alloca
@@ -70,14 +70,30 @@ bool NVPTXLowerAlloca::runOnFunction(Function &F) { | |||
for (auto &I : BB) { | |||
if (auto allocaInst = dyn_cast<AllocaInst>(&I)) { | |||
Changed = true; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add an overall comment here describing what we do here and why. Otherwise it's not obvious why we do the seemingly pointless casting to AS5 and back.
Basically, we need to make sure that LLVM has info that alloca is in AS(5), but alloca's users still need a generic pointer to operate on.
For allocas in generic AS, we add asc to AS(5) and back to generic.
For allocas in AS5, we just need an ASC to generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done!
Got distracted by some other things and getting back to this now. If there is no objections, I'm gonna merge this one, and will definitely do follow-up PRs if there are more issues need to be addressed. 🙏 |
If
allocaInst
is already inADDRESS_SPACE_LOCAL
, there is no need to do an explicit cast which will actually fail assertion withAddrSpaceCastInst
. Only insert the cast when needed.