Skip to content

[NVPTX] Check Before inserting AddrSpaceCastInst in NVPTXLoweringAlloca #106127

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
2506e94
Only insert AddrSpaceCastInst when necessary.
weiweichen Aug 26, 2024
0c9a46d
Add unittest.
weiweichen Aug 26, 2024
6ec68a5
Formattttttttt.
weiweichen Aug 26, 2024
1bdc7fa
Address review comment.
weiweichen Aug 26, 2024
bd08783
Merge branch 'main' of https://github.com/llvm/llvm-project into weiw…
weiweichen Aug 26, 2024
f1680bc
Formatting again, ahhhhh.
weiweichen Aug 26, 2024
833e061
Merge branch 'main' of https://github.com/llvm/llvm-project into weiw…
weiweichen Sep 3, 2024
0cee1cd
Add early assert for alloc address space check.
weiweichen Sep 3, 2024
26210e3
Merge branch 'main' of https://github.com/llvm/llvm-project into weiw…
weiweichen Sep 4, 2024
f84a849
Update conditional logic.
weiweichen Sep 4, 2024
22beb79
Address some review comments.
weiweichen Sep 4, 2024
8b6aa8c
Merge branch 'main' of https://github.com/llvm/llvm-project into weiw…
weiweichen Sep 4, 2024
7c4e9a3
Fix unit test.
weiweichen Sep 4, 2024
1f9c51e
Merge branch 'main' of https://github.com/llvm/llvm-project into weiw…
weiweichen Sep 4, 2024
7a54f90
Add -infer-address-spaces back to test.
weiweichen Sep 4, 2024
dd7d7f2
Merge branch 'main' of https://github.com/llvm/llvm-project into weiw…
weiweichen Sep 4, 2024
895d692
Address review comment to make test name more explicit.
weiweichen Sep 4, 2024
0fa98d1
Merge branch 'main' of https://github.com/llvm/llvm-project into weiw…
weiweichen Sep 4, 2024
4a020cb
Update test case to check nvptx-lower-alloca only without infer-addre…
weiweichen Sep 6, 2024
b1f7b70
Add more comments about the ASC logic in NVPTXLowerAlloca pass.
weiweichen Sep 6, 2024
1b84635
Merge branch 'main' of https://github.com/llvm/llvm-project into weiw…
weiweichen Sep 6, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
52 changes: 38 additions & 14 deletions llvm/lib/Target/NVPTX/NVPTXLowerAlloca.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -24,9 +24,9 @@
//
//===----------------------------------------------------------------------===//

#include "MCTargetDesc/NVPTXBaseInfo.h"
#include "NVPTX.h"
#include "NVPTXUtilities.h"
#include "MCTargetDesc/NVPTXBaseInfo.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
Expand Down Expand Up @@ -55,8 +55,8 @@ class NVPTXLowerAlloca : public FunctionPass {

char NVPTXLowerAlloca::ID = 1;

INITIALIZE_PASS(NVPTXLowerAlloca, "nvptx-lower-alloca",
"Lower Alloca", false, false)
INITIALIZE_PASS(NVPTXLowerAlloca, "nvptx-lower-alloca", "Lower Alloca", false,
false)

// =============================================================================
// Main function for this pass.
Expand All @@ -70,14 +70,38 @@ bool NVPTXLowerAlloca::runOnFunction(Function &F) {
for (auto &I : BB) {
if (auto allocaInst = dyn_cast<AllocaInst>(&I)) {
Changed = true;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add an overall comment here describing what we do here and why. Otherwise it's not obvious why we do the seemingly pointless casting to AS5 and back.

Basically, we need to make sure that LLVM has info that alloca is in AS(5), but alloca's users still need a generic pointer to operate on.

For allocas in generic AS, we add asc to AS(5) and back to generic.
For allocas in AS5, we just need an ASC to generic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done!

PointerType *AllocInstPtrTy =
cast<PointerType>(allocaInst->getType()->getScalarType());
unsigned AllocAddrSpace = AllocInstPtrTy->getAddressSpace();
assert((AllocAddrSpace == ADDRESS_SPACE_GENERIC ||
AllocAddrSpace == ADDRESS_SPACE_LOCAL) &&
"AllocaInst can only be in Generic or Local address space for "
"NVPTX.");

Instruction *AllocaInLocalAS = allocaInst;
auto ETy = allocaInst->getAllocatedType();
auto LocalAddrTy = PointerType::get(ETy, ADDRESS_SPACE_LOCAL);
auto NewASCToLocal = new AddrSpaceCastInst(allocaInst, LocalAddrTy, "");
auto GenericAddrTy = PointerType::get(ETy, ADDRESS_SPACE_GENERIC);
auto NewASCToGeneric =
new AddrSpaceCastInst(NewASCToLocal, GenericAddrTy, "");
NewASCToLocal->insertAfter(allocaInst);
NewASCToGeneric->insertAfter(NewASCToLocal);

// We need to make sure that LLVM has info that alloca needs to go to
// ADDRESS_SPACE_LOCAL for InferAddressSpace pass.
//
// For allocas in ADDRESS_SPACE_LOCAL, we add addrspacecast to
// ADDRESS_SPACE_LOCAL and back to ADDRESS_SPACE_GENERIC, so that
// the alloca's users still use a generic pointer to operate on.
//
// For allocas already in ADDRESS_SPACE_LOCAL, we just need
// addrspacecast to ADDRESS_SPACE_GENERIC.
if (AllocAddrSpace == ADDRESS_SPACE_GENERIC) {
auto ASCastToLocalAS = new AddrSpaceCastInst(
allocaInst, PointerType::get(ETy, ADDRESS_SPACE_LOCAL), "");
ASCastToLocalAS->insertAfter(allocaInst);
AllocaInLocalAS = ASCastToLocalAS;
}

auto AllocaInGenericAS = new AddrSpaceCastInst(
AllocaInLocalAS, PointerType::get(ETy, ADDRESS_SPACE_GENERIC), "");
AllocaInGenericAS->insertAfter(AllocaInLocalAS);

for (Use &AllocaUse : llvm::make_early_inc_range(allocaInst->uses())) {
// Check Load, Store, GEP, and BitCast Uses on alloca and make them
// use the converted generic address, in order to expose non-generic
Expand All @@ -87,23 +111,23 @@ bool NVPTXLowerAlloca::runOnFunction(Function &F) {
auto LI = dyn_cast<LoadInst>(AllocaUse.getUser());
if (LI && LI->getPointerOperand() == allocaInst &&
!LI->isVolatile()) {
LI->setOperand(LI->getPointerOperandIndex(), NewASCToGeneric);
LI->setOperand(LI->getPointerOperandIndex(), AllocaInGenericAS);
continue;
}
auto SI = dyn_cast<StoreInst>(AllocaUse.getUser());
if (SI && SI->getPointerOperand() == allocaInst &&
!SI->isVolatile()) {
SI->setOperand(SI->getPointerOperandIndex(), NewASCToGeneric);
SI->setOperand(SI->getPointerOperandIndex(), AllocaInGenericAS);
continue;
}
auto GI = dyn_cast<GetElementPtrInst>(AllocaUse.getUser());
if (GI && GI->getPointerOperand() == allocaInst) {
GI->setOperand(GI->getPointerOperandIndex(), NewASCToGeneric);
GI->setOperand(GI->getPointerOperandIndex(), AllocaInGenericAS);
continue;
}
auto BI = dyn_cast<BitCastInst>(AllocaUse.getUser());
if (BI && BI->getOperand(0) == allocaInst) {
BI->setOperand(0, NewASCToGeneric);
BI->setOperand(0, AllocaInGenericAS);
continue;
}
}
Expand Down
20 changes: 20 additions & 0 deletions llvm/test/CodeGen/NVPTX/lower-alloca.ll
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
; RUN: opt < %s -S -nvptx-lower-alloca -infer-address-spaces | FileCheck %s
; RUN: opt < %s -S -nvptx-lower-alloca | FileCheck %s --check-prefix LOWERALLOCAONLY
; RUN: llc < %s -march=nvptx64 -mcpu=sm_35 | FileCheck %s --check-prefix PTX
; RUN: %if ptxas %{ llc < %s -march=nvptx64 -mcpu=sm_35 | %ptxas-verify %}

Expand All @@ -11,13 +12,32 @@ define void @kernel() {
%A = alloca i32
; CHECK: addrspacecast ptr %A to ptr addrspace(5)
; CHECK: store i32 0, ptr addrspace(5) {{%.+}}
; LOWERALLOCAONLY: [[V1:%.*]] = addrspacecast ptr %A to ptr addrspace(5)
; LOWERALLOCAONLY: [[V2:%.*]] = addrspacecast ptr addrspace(5) [[V1]] to ptr
; LOWERALLOCAONLY: store i32 0, ptr [[V2]], align 4
; PTX: st.local.u32 [{{%rd[0-9]+}}], {{%r[0-9]+}}
store i32 0, ptr %A
call void @callee(ptr %A)
ret void
}

define void @alloca_in_explicit_local_as() {
; LABEL: @lower_alloca_addrspace5
; PTX-LABEL: .visible .func alloca_in_explicit_local_as(
%A = alloca i32, addrspace(5)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh. I don't know that AS-specific alloca is a thing.

Copy link
Contributor

@gonzalobg gonzalobg Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't knew it was supported, but if it is supported, then NVPTX should be setting this for all allocas, and then converting the resulting pointer back to generic, if needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things in non-generic AS tends to get in the way of optimizations, so doing it up-front will likely result in worse code. Perhaps one day everything will handle non-default AS as well as generic AS, but I don't think we're there yet.

So, for now, I do not see any benefit of allocas in explicit AS -- we can easily infer the AS if/when we need to, and our priority is to eliminate as many of allocas as we can as they are a major bad news for performance, so not getting in the way of LLVM optimizations is more important.

Copy link
Contributor Author

@weiweichen weiweichen Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this test to make sure the newly added logic is working correctly (instead of asserting). I don't have too much understanding with the optimization tradeoff you mentioned here so I'll defer it to your best judgement. Just wanna make sure if there is any action items I shall take for this PR specifically wrt this test? 🙏 Shall I add a comment to make the intention of this test more clear?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is OK. Extra comments are helpful, but it's even better if the test is self-documenting.
E.g. renaming the test above to alloca_in_explicit_local_as would do the trick.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, pushed an update.

; CHECK: store i32 0, ptr addrspace(5) {{%.+}}
; PTX: st.local.u32 [%SP+0], {{%r[0-9]+}}
; LOWERALLOCAONLY: [[V1:%.*]] = addrspacecast ptr addrspace(5) %A to ptr
; LOWERALLOCAONLY: store i32 0, ptr [[V1]], align 4
store i32 0, ptr addrspace(5) %A
call void @callee(ptr addrspace(5) %A)
ret void
}

declare void @callee(ptr)
declare void @callee_addrspace5(ptr addrspace(5))

!nvvm.annotations = !{!0}
!nvvm.annotations = !{!1}
!0 = !{ptr @kernel, !"kernel", i32 1}
!1 = !{ptr @alloca_in_explicit_local_as, !"alloca_in_explicit_local_as", i32 1}