Skip to content

[NVPTX] Check Before inserting AddrSpaceCastInst in NVPTXLoweringAlloca #106127

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

weiweichen
Copy link
Contributor

If allocaInst is already in ADDRESS_SPACE_LOCAL, there is no need to do an explicit cast which will actually fail assertion with AddrSpaceCastInst. Only insert the cast when needed.

@weiweichen weiweichen marked this pull request as ready for review August 26, 2024 20:09
@llvmbot
Copy link
Member

llvmbot commented Aug 26, 2024

@llvm/pr-subscribers-backend-nvptx

Author: weiwei chen (weiweichen)

Changes

If allocaInst is already in ADDRESS_SPACE_LOCAL, there is no need to do an explicit cast which will actually fail assertion with AddrSpaceCastInst. Only insert the cast when needed.


Full diff: https://github.com/llvm/llvm-project/pull/106127.diff

2 Files Affected:

  • (modified) llvm/lib/Target/NVPTX/NVPTXLowerAlloca.cpp (+15-6)
  • (modified) llvm/test/CodeGen/NVPTX/lower-alloca.ll (+14)
diff --git a/llvm/lib/Target/NVPTX/NVPTXLowerAlloca.cpp b/llvm/lib/Target/NVPTX/NVPTXLowerAlloca.cpp
index 369238436083c7..62b789c008bb43 100644
--- a/llvm/lib/Target/NVPTX/NVPTXLowerAlloca.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXLowerAlloca.cpp
@@ -72,12 +72,21 @@ bool NVPTXLowerAlloca::runOnFunction(Function &F) {
         Changed = true;
         auto ETy = allocaInst->getAllocatedType();
         auto LocalAddrTy = PointerType::get(ETy, ADDRESS_SPACE_LOCAL);
-        auto NewASCToLocal = new AddrSpaceCastInst(allocaInst, LocalAddrTy, "");
-        auto GenericAddrTy = PointerType::get(ETy, ADDRESS_SPACE_GENERIC);
-        auto NewASCToGeneric =
-            new AddrSpaceCastInst(NewASCToLocal, GenericAddrTy, "");
-        NewASCToLocal->insertAfter(allocaInst);
-        NewASCToGeneric->insertAfter(NewASCToLocal);
+        PointerType *allocInstPtrTy = dyn_cast_or_null<PointerType>(allocaInst->getType()->getScalarType());
+        assert(allocInstPtrTy && "AllocInst scalar type is not a PointerType.");
+        Instruction* NewASCToGeneric = allocaInst;
+        if(allocInstPtrTy->getAddressSpace() != ADDRESS_SPACE_LOCAL) {
+          // Only insert a new AddrSpaceCastInst if
+          // allocaInst is not already in ADDRESS_SPACE_LOCAL.
+          auto NewASCToLocal =
+              new AddrSpaceCastInst(allocaInst, LocalAddrTy, "");
+          auto GenericAddrTy = PointerType::get(ETy, ADDRESS_SPACE_GENERIC);
+          NewASCToGeneric =
+              new AddrSpaceCastInst(NewASCToLocal, GenericAddrTy, "");
+          NewASCToLocal->insertAfter(allocaInst);
+          NewASCToGeneric->insertAfter(NewASCToLocal);
+        }
+
         for (Use &AllocaUse : llvm::make_early_inc_range(allocaInst->uses())) {
           // Check Load, Store, GEP, and BitCast Uses on alloca and make them
           // use the converted generic address, in order to expose non-generic
diff --git a/llvm/test/CodeGen/NVPTX/lower-alloca.ll b/llvm/test/CodeGen/NVPTX/lower-alloca.ll
index b1c34c8b5ecd78..68793dffb01e8a 100644
--- a/llvm/test/CodeGen/NVPTX/lower-alloca.ll
+++ b/llvm/test/CodeGen/NVPTX/lower-alloca.ll
@@ -17,7 +17,21 @@ define void @kernel() {
   ret void
 }
 
+define void @alloc_already_in_addrspace5() {
+; LABEL: @lower_alloca_addrspace5
+; PTX-LABEL: .visible .func alloc_already_in_addrspace5(
+  %A = alloca i32, addrspace(5)
+; CHECK-NOT: addrspacecast ptr %A to ptr addrspace(5)
+; CHECK: store i32 0, ptr addrspace(5) {{%.+}}
+; PTX: st.local.u32 [%SP+0], {{%r[0-9]+}}
+  store i32 0, ptr addrspace(5) %A
+  call void @callee(ptr addrspace(5) %A)
+  ret void
+}
+
 declare void @callee(ptr)
+declare void @callee_addrspace5(ptr addrspace(5))
 
 !nvvm.annotations = !{!0}
 !0 = !{ptr @kernel, !"kernel", i32 1}
+!1 = !{ptr @alloc_already_in_addrspace5, !"alloc_already_in_addrspace5", i32 1}

@weiweichen weiweichen requested review from nikic and Artem-B August 26, 2024 20:10
Copy link

github-actions bot commented Aug 26, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@weiweichen weiweichen requested a review from Mogball August 26, 2024 20:19
@justinfargnoli
Copy link
Contributor

do an explicit cast which will actually fail assertion with AddrSpaceCastInst

@weiweichen, where is this assertion? Maybe it needs to be changed.

@weiweichen
Copy link
Contributor Author

do an explicit cast which will actually fail assertion with AddrSpaceCastInst

@weiweichen, where is this assertion? Maybe it needs to be changed.

It comes from here

AddrSpaceCastInst::AddrSpaceCastInst(Value *S, Type *Ty, const Twine &Name,
                                     InsertPosition InsertBefore)
    : CastInst(Ty, AddrSpaceCast, S, Name, InsertBefore) {
  assert(castIsValid(getOpcode(), S, Ty) && "Illegal AddrSpaceCast");
}

where castIsValid(getOpcode(), S, Ty) returns false if S is already in the same address space which is from the logic here

    if (SrcPtrTy->getAddressSpace() == DstPtrTy->getAddressSpace())
      return false;

@justinfargnoli, would it be better to remove ☝️ instead?

@justinfargnoli
Copy link
Contributor

Interesting, I would've though that this is allowed. However, the IR Spec says otherwise (source):

The ‘addrspacecast’ instruction takes a pointer or vector of pointer value to cast and a pointer type to cast it to, which must have a different address space.


would it be better to remove ☝️ instead?

Since the IR spec does not allow this, we should not remove the check.

@weiweichen
Copy link
Contributor Author

Interesting, I would've though that this is allowed. However, the IR Spec says otherwise (source):

The ‘addrspacecast’ instruction takes a pointer or vector of pointer value to cast and a pointer type to cast it to, which must have a different address space.

would it be better to remove ☝️ instead?

Since the IR spec does not allow this, we should not remove the check.

Good point! Thank you for checking the spec! Leaving the check as is and perhaps the fix in this PR makes more sense?

@justinfargnoli
Copy link
Contributor

Good point! Thank you for checking the spec! Leaving the check as is and perhaps the fix in this PR makes more sense?

Agreed :)

define void @alloc_already_in_addrspace5() {
; LABEL: @lower_alloca_addrspace5
; PTX-LABEL: .visible .func alloc_already_in_addrspace5(
%A = alloca i32, addrspace(5)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh. I don't know that AS-specific alloca is a thing.

Copy link
Contributor

@gonzalobg gonzalobg Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't knew it was supported, but if it is supported, then NVPTX should be setting this for all allocas, and then converting the resulting pointer back to generic, if needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things in non-generic AS tends to get in the way of optimizations, so doing it up-front will likely result in worse code. Perhaps one day everything will handle non-default AS as well as generic AS, but I don't think we're there yet.

So, for now, I do not see any benefit of allocas in explicit AS -- we can easily infer the AS if/when we need to, and our priority is to eliminate as many of allocas as we can as they are a major bad news for performance, so not getting in the way of LLVM optimizations is more important.

Copy link
Contributor Author

@weiweichen weiweichen Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this test to make sure the newly added logic is working correctly (instead of asserting). I don't have too much understanding with the optimization tradeoff you mentioned here so I'll defer it to your best judgement. Just wanna make sure if there is any action items I shall take for this PR specifically wrt this test? 🙏 Shall I add a comment to make the intention of this test more clear?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is OK. Extra comments are helpful, but it's even better if the test is self-documenting.
E.g. renaming the test above to alloca_in_explicit_local_as would do the trick.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, pushed an update.

PointerType *AllocInstPtrTy =
cast<PointerType>(allocaInst->getType()->getScalarType());
Instruction *NewASCToGeneric = allocaInst;
if (AllocInstPtrTy->getAddressSpace() != ADDRESS_SPACE_LOCAL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably add an assert that alloca's address space must be generic. We can't cast any other AS to local.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alloca's address space must be generic

Is this a hard requirement for NVPTX?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@weiweichen weiweichen Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, interesting, thank you for pointing out this definition! Maybe the original assert is fine then 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, wait, the spec definition of casting can only from generic or from generic is not conflicting with the change here, since we basically are saying if the source is not generic, please don't cast. This is not in violation with the spec, no? Or is there a restriction that alloca can only happen in generic space but not any other spaces?

Copy link
Contributor Author

@weiweichen weiweichen Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the choice was between alloca in generic AS and alloca in AS(5), then indeed the choice would be to cast or skip. However, considering that alloca's AS can be explicitly specified to be in any AS, it would be prudent to have an assertion, so we catch unexpected input early, at the point where we care about it.

If we don't have an assertion, and we get an alloca in AS 333, we'll happily insert an ASC which will cause lowering problems further down in the pipeline and whoever will have to debug that will have to back-track to here before moving on to the real root cause -- nonsensical input IR. I'd much rather break things here, as soon as we're aware that something is wrong.

Definitely agree that assert should be there if the AS is not local (5)! However, the change in my PR does not remove that assert if the AS is say 333 because the if added to avoid cast will not be true and the cast will still happen and assert, right? Do you mean that an explicit assert should be added in NVPTXLowerAlloca.cpp instead of rely on the constructor of AddrSpaceCastInst to assert? If so, I think that's a good point too, and we can provide more meaningful error message as well.

I guess one thing I feel a bit confused about is whether alloca has to be in generic only. Based on my read of the discussions, I feel like it's not the case, it can be either in generic (which the compiler will later cast it to local) or in local (already, so no cast is needed by the compiler). In other words, alloca technically can only be in local and compiler will either do a cast if it's generic from the user program, or assert/error out if AS is not either local or generic. Is this a correct understanding?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that an explicit assert should be added in NVPTXLowerAlloca.cpp instead of rely on the constructor of AddrSpaceCastInst

Yes. It's better to catch it early, rather than add more wrong things, and wait for something to break later on, and then back-track.

By default, IR operates on generic pointers. Specific address spaces are target-dependent and are not guaranteed to be supported, even if IR remains syntactically valid. For NVPTX, we can only support allocas in the default AS or AS(5).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the right way for NVPTX / LLVM to exploit that all generic statespace allocas point to local statespace?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that an explicit assert should be added in NVPTXLowerAlloca.cpp instead of rely on the constructor of AddrSpaceCastInst

Yes. It's better to catch it early, rather than add more wrong things, and wait for something to break later on, and then back-track.

By default, IR operates on generic pointers. Specific address spaces are target-dependent and are not guaranteed to be supported, even if IR remains syntactically valid. For NVPTX, we can only support allocas in the default AS or AS(5).

Sounds good! I pushed a commit with the early checking and assert with a more specific error message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the right way for NVPTX / LLVM to exploit that all generic statespace allocas point to local statespace?

I'm not sure there's much to exploit. The best we can do with allocas is to get rid of them, which makes their AS a moot point. The allocas that are still left around, will be eventually lowered into the .local storage in PTX. Theoretically knowing AS could be used to tell LLVM that cost of the loads/stores is high, but I do not see it making much of a difference whether LLVM can do anything different based on that info. If it needs to use the alloca, it has to do it, and the potential performance difference between direct or generic pointer access will be dwarfed by the actual cost of touching the .local memory.

So, accepting alloca addrspace (5) is fine for completeness sake, but I don't see it making any measurable positive impact on performance.

NewASCToGeneric->insertAfter(NewASCToLocal);
PointerType *AllocInstPtrTy =
cast<PointerType>(allocaInst->getType()->getScalarType());
Instruction *NewASCToGeneric = allocaInst;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use a better name, given that ASC may or may not be there now. AllocaInGenericAS ?
I'd also push the variables down, into the if body -- some of them seem to be needed by the now-conditional code only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, this is original name in the source code, happy to rename it if that's better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're changing what the code does, Adjusting variable names as appropriate is the right thing to do.

// allocaInst is not already in ADDRESS_SPACE_LOCAL.
auto NewASCToLocal =
new AddrSpaceCastInst(allocaInst, LocalAddrTy, "");
auto GenericAddrTy = PointerType::get(ETy, ADDRESS_SPACE_GENERIC);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single-use variables could be pushed into the AddrSpaceCastInst call without hurting readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also just pushing the same original code around instead of new addition of the PR, but happy to apply the suggested change if that helps with readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, taking a closer look at the original logic of the code block

        auto ETy = allocaInst->getAllocatedType();
        auto LocalAddrTy = PointerType::get(ETy, ADDRESS_SPACE_LOCAL);
        auto NewASCToLocal = new AddrSpaceCastInst(allocaInst, LocalAddrTy, "");
        auto GenericAddrTy = PointerType::get(ETy, ADDRESS_SPACE_GENERIC);
        auto NewASCToGeneric =
            new AddrSpaceCastInst(NewASCToLocal, GenericAddrTy, "");
        NewASCToLocal->insertAfter(allocaInst);
        NewASCToGeneric->insertAfter(NewASCToLocal);

It seems to be casting the AS to Local and then back Generic before moving on. This looks very inefficient if the allocaInst is already in Generic and this is only here so that it can hit the assert???

I updated the logic to:

  1. don't do any AS cast if allocInst is already in Generic
  2. If allocInst is in Local, cast it to Generic.
  3. Applied some of the name changing and readability suggestions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, actually, I found out that -infer-address-spaces won't work properly if the extra ASCast to LOCAL is not inserted if alloca is already in Generic . So even alloca is already in generic, we still need to insert two ASCast to make go to local and back to generic to make -infer-address-spaces work properly (i.e. pass the test) 🤯 🤯 🤯 .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-infer-address-spaces won't work properly if the extra ASCast to LOCAL is not inserted if alloca is already in Generic

That didn't parse. Are you saying that inferring does not work for the plain alloca() -- it is already in generic AS. In other words it's broken right now?

Or that your patch skipped that "asc(asc(alloca, to AS(5)), to AS(0)" that the current code adds and that infer address spaces relies on? In that case, yes, we'll still need to keep those ASCs.

Copy link
Contributor Author

@weiweichen weiweichen Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it's a confusion from the original test ⬇️ (as well as my explanation 🤦 ). Let me elaborate:

define void @kernel() {
; LABEL: @lower_alloca
; PTX-LABEL: .visible .entry kernel(
  %A = alloca i32
; CHECK: addrspacecast ptr %A to ptr addrspace(5)
; CHECK: store i32 0, ptr addrspace(5) {{%.+}}
; PTX: st.local.u32 [{{%rd[0-9]+}}], {{%r[0-9]+}}
  store i32 0, ptr %A
  call void @callee(ptr %A)
  ret void
}

As shown above, FileCheck is checking that store has a ptr addrsapce(5). This only happens if there is a chain of "asc(asc(alloca(AS(0)), to AS(5)), to AS(0)". Otherwise, the store here would have a ptr in generic space.
Currently the test is ran with -infer-address-spaces:

; RUN: opt < %s -S -nvptx-lower-alloca -infer-address-spaces | FileCheck %s

Here are the outputs of main with and without -infer-address-space

without -infer-address-space:

(autovenv) [M] weiwei.chen@Weiweis-MacBook-Pro ~/research/modularml/modular/third-party/llvm-project 🍔 opt < llvm/test/CodeGen/NVPTX/lower-alloca.ll -S -nvptx-lower-alloca 
; ModuleID = '<stdin>'
source_filename = "<stdin>"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-unknown-unknown"

define void @kernel() {
  %A = alloca i32, align 4
  %1 = addrspacecast ptr %A to ptr addrspace(5)
  %2 = addrspacecast ptr addrspace(5) %1 to ptr
  store i32 0, ptr %2, align 4
  call void @callee(ptr %A)
  ret void
}

declare void @callee(ptr)

!nvvm.annotations = !{!0}

!0 = !{ptr @kernel, !"kernel", i32 1}

with -infer-address-space :

(autovenv) [M] weiwei.chen@Weiweis-MacBook-Pro ~/research/modularml/modular/third-party/llvm-project 🍔 opt < llvm/test/CodeGen/NVPTX/lower-alloca.ll -S -nvptx-lower-alloca -infer-address-spaces 
; ModuleID = '<stdin>'
source_filename = "<stdin>"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-unknown-unknown"

define void @kernel() {
  %A = alloca i32, align 4
  %1 = addrspacecast ptr %A to ptr addrspace(5)
  store i32 0, ptr addrspace(5) %1, align 4
  call void @callee(ptr %A)
  ret void
}

declare void @callee(ptr)

!nvvm.annotations = !{!0}

!0 = !{ptr @kernel, !"kernel", i32 1}

define void @alloc_already_in_addrspace5() {
; LABEL: @lower_alloca_addrspace5
; PTX-LABEL: .visible .func alloc_already_in_addrspace5(
%A = alloca i32, addrspace(5)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things in non-generic AS tends to get in the way of optimizations, so doing it up-front will likely result in worse code. Perhaps one day everything will handle non-default AS as well as generic AS, but I don't think we're there yet.

So, for now, I do not see any benefit of allocas in explicit AS -- we can easily infer the AS if/when we need to, and our priority is to eliminate as many of allocas as we can as they are a major bad news for performance, so not getting in the way of LLVM optimizations is more important.

; LABEL: @lower_alloca_addrspace5
; PTX-LABEL: .visible .func alloca_in_explicit_local_as(
%A = alloca i32, addrspace(5)
; CHECK-NOT: addrspacecast ptr %A to ptr addrspace(5)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to see a positive check here matching what we do want to see.
Negative checks are rather fragile. E.g. an unintentional typo in what we match will makes them always succeed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, adding a few positive checks to just to check the effect of -nvptx-lower-alloca

@@ -70,14 +70,30 @@ bool NVPTXLowerAlloca::runOnFunction(Function &F) {
for (auto &I : BB) {
if (auto allocaInst = dyn_cast<AllocaInst>(&I)) {
Changed = true;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add an overall comment here describing what we do here and why. Otherwise it's not obvious why we do the seemingly pointless casting to AS5 and back.

Basically, we need to make sure that LLVM has info that alloca is in AS(5), but alloca's users still need a generic pointer to operate on.

For allocas in generic AS, we add asc to AS(5) and back to generic.
For allocas in AS5, we just need an ASC to generic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done!

@weiweichen
Copy link
Contributor Author

Got distracted by some other things and getting back to this now. If there is no objections, I'm gonna merge this one, and will definitely do follow-up PRs if there are more issues need to be addressed. 🙏

@weiweichen weiweichen merged commit 2e58d92 into llvm:main Sep 12, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants