Skip to content

[AMDGPU][Verifier] Check address space of alloca instruction #135820

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 26, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions llvm/lib/IR/Verifier.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4395,6 +4395,11 @@ void Verifier::visitAllocaInst(AllocaInst &AI) {
verifySwiftErrorValue(&AI);
}

if (TT.isAMDGPU()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this check the triple? This should check the default alloca AS in the DL. Same for globals. Basically everything that has a dedicated AS in the DL should only be created in that AS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is because of this folded comment #135820 (comment). There are some test cases that basically have alloca in all ASs. Also, based on the discussion in #136865, specifically #136865 (comment) and #136865 (comment), we can't rely on DL for this purpose.

Check(AI.getAddressSpace() == AMDGPUAS::PRIVATE_ADDRESS,
"alloca on amdgpu must be in addrspace(5)", &AI);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to check it against the alloca addrspace from the data layout instead of hardcoding 5?

My thinking is that this way we could extend this check to most other targets as well (mainly excluding wasm and any other that may be using multiple alloca address spaces).

(Another more involved alternative would be to allow specifying multiple alloca address spaces in DL, and making the first the preferred one. Then this could be a target-independent check.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with that is the datalayout isn't always right. Even if you have the triple, the right datalaoyut doesn't get pulled. e.g.

; RUN llvm-as < %s
target triple = "amdgcn-amd-amdhsa"

define void @foo() {
  %alloca = alloca i32
  store volatile i32 0, ptr %alloca
  ret void
}

In this case DL.getAllocaAddrSpace() is 0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think llc and opt do default to the correct data layout based on triple nowadays, but it's possible llvm-as doesn't do this, as a low level tool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llc yes, not opt I think. There are a bunch of tests with spurious datalayout = "A5" to compensate

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually does seem to work for opt now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikic I'm thinking about how to update test cases after we extend DL. There are many test cases w/o triple nor data layout, but they have alloca in multiple ASs. After we extend DL, and assert that alloca must be in those ASs, how are we gonna deal with those test cases? Explicitly add AX, AY, AZ? Also, targets such as NVPTX actually require alloca to be in AS5, but their data layout doesn't have that. They do have a fix-up to convert alloca to AS5. However, if we enforce alloca via DL, it also needs update of the data layout string for NVPTX, which is quite intrusive. I can't speak for NVPTX but it might be quite challenging.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikic I'm thinking about how to update test cases after we extend DL. There are many test cases w/o triple nor data layout, but they have alloca in multiple ASs. After we extend DL, and assert that alloca must be in those ASs, how are we gonna deal with those test cases? Explicitly add AX, AY, AZ?

Yes, that's what I'd expect to happen.

Also, targets such as NVPTX actually require alloca to be in AS5, but their data layout doesn't have that. They do have a fix-up to convert alloca to AS5. However, if we enforce alloca via DL, it also needs update of the data layout string for NVPTX, which is quite intrusive. I can't speak for NVPTX but it might be quite challenging.

The target data layouts change all the time, this is not a problem in itself.


Something I'm uncertain about with this approach is that this basically adds some information to the data layout that is only used for verification purposes. I don't think we have any other properties like that. Maybe that's a sign that it's not the right approach?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want it just to be for verification only? The point @arsenm made was that DL doesn't assert anything. If we can make it assert, it is gonna be more useful? Also, if we enforce alloca to be in specific AS(s), it is a form of assertion right? Therefore, we can use it in the middle end for optimization. Of course if a target accepts multiple ASs for alloca, we can't assert it must be in exact one, but for those only have one, we can assert.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 to making this a list in the DL. It's overly limiting and it's a lot of work for something we do not care about at all. KISS and just check == 5 for AMDGPU

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with leaving it at that for now.

}

visitInstruction(AI);
}

Expand Down
16 changes: 0 additions & 16 deletions llvm/test/CodeGen/AMDGPU/assert-wrong-alloca-addrspace.ll

This file was deleted.

5 changes: 3 additions & 2 deletions llvm/test/CodeGen/AMDGPU/lower-indirect-lds-references.ll
Original file line number Diff line number Diff line change
Expand Up @@ -16,8 +16,9 @@ define amdgpu_kernel void @offloading_kernel() {
}

define void @call_unknown() {
%1 = alloca ptr, align 8
%2 = call i32 %1()
%alloca = alloca ptr, align 8, addrspace(5)
%alloca.cast = addrspacecast ptr addrspace(5) %alloca to ptr
%ret = call i32 %alloca.cast()
ret void
}

Expand Down
3 changes: 0 additions & 3 deletions llvm/test/Transforms/InstCombine/alloca-in-non-alloca-as.ll
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,6 @@

; Gracefully handle the alloca that is not in the alloca AS (=5)

target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
target triple = "amdgcn-amd-amdhsa"

declare void @use(ptr)
declare void @use2(ptr, ptr)

Expand Down
2,149 changes: 1,202 additions & 947 deletions llvm/test/Transforms/OpenMP/custom_state_machines.ll

Large diffs are not rendered by default.

2,866 changes: 1,582 additions & 1,284 deletions llvm/test/Transforms/OpenMP/custom_state_machines_pre_lto.ll

Large diffs are not rendered by default.

1,331 changes: 776 additions & 555 deletions llvm/test/Transforms/OpenMP/spmdization.ll

Large diffs are not rendered by default.

253 changes: 124 additions & 129 deletions llvm/test/Transforms/OpenMP/spmdization_constant_prop.ll

Large diffs are not rendered by default.

570 changes: 291 additions & 279 deletions llvm/test/Transforms/OpenMP/spmdization_indirect.ll

Large diffs are not rendered by default.

102 changes: 102 additions & 0 deletions llvm/test/Verifier/AMDGPU/alloca.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
; RUN: not llvm-as %s --disable-output 2>&1 | FileCheck %s

target triple = "amdgcn-amd-amdhsa"

; CHECK: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.0 = alloca i32, align 4
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.1 = alloca i32, align 4, addrspace(1)
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.2 = alloca i32, align 4, addrspace(2)
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.3 = alloca i32, align 4, addrspace(3)
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.4 = alloca i32, align 4, addrspace(4)
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.6 = alloca i32, align 4, addrspace(6)
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.7 = alloca i32, align 4, addrspace(7)
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.8 = alloca i32, align 4, addrspace(8)
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.9 = alloca i32, align 4, addrspace(9)
define void @static_alloca() {
entry:
%alloca.0 = alloca i32, align 4
%alloca.1 = alloca i32, align 4, addrspace(1)
%alloca.2 = alloca i32, align 4, addrspace(2)
%alloca.3 = alloca i32, align 4, addrspace(3)
%alloca.4 = alloca i32, align 4, addrspace(4)
%alloca.5 = alloca i32, align 4, addrspace(5)
%alloca.6 = alloca i32, align 4, addrspace(6)
%alloca.7 = alloca i32, align 4, addrspace(7)
%alloca.8 = alloca i32, align 4, addrspace(8)
%alloca.9 = alloca i32, align 4, addrspace(9)
ret void
}

; CHECK: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.0 = alloca i32, i32 %n, align 4
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.1 = alloca i32, i32 %n, align 4, addrspace(1)
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.2 = alloca i32, i32 %n, align 4, addrspace(2)
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.3 = alloca i32, i32 %n, align 4, addrspace(3)
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.4 = alloca i32, i32 %n, align 4, addrspace(4)
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.6 = alloca i32, i32 %n, align 4, addrspace(6)
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.7 = alloca i32, i32 %n, align 4, addrspace(7)
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.8 = alloca i32, i32 %n, align 4, addrspace(8)
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.9 = alloca i32, i32 %n, align 4, addrspace(9)
define void @dynamic_alloca_i32(i32 %n) {
entry:
%alloca.0 = alloca i32, i32 %n, align 4
%alloca.1 = alloca i32, i32 %n, align 4, addrspace(1)
%alloca.2 = alloca i32, i32 %n, align 4, addrspace(2)
%alloca.3 = alloca i32, i32 %n, align 4, addrspace(3)
%alloca.4 = alloca i32, i32 %n, align 4, addrspace(4)
%alloca.5 = alloca i32, i32 %n, align 4, addrspace(5)
%alloca.6 = alloca i32, i32 %n, align 4, addrspace(6)
%alloca.7 = alloca i32, i32 %n, align 4, addrspace(7)
%alloca.8 = alloca i32, i32 %n, align 4, addrspace(8)
%alloca.9 = alloca i32, i32 %n, align 4, addrspace(9)
ret void
}

; CHECK: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.0 = alloca i32, i64 %n, align 4
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.1 = alloca i32, i64 %n, align 4, addrspace(1)
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.2 = alloca i32, i64 %n, align 4, addrspace(2)
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.3 = alloca i32, i64 %n, align 4, addrspace(3)
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.4 = alloca i32, i64 %n, align 4, addrspace(4)
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.6 = alloca i32, i64 %n, align 4, addrspace(6)
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.7 = alloca i32, i64 %n, align 4, addrspace(7)
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.8 = alloca i32, i64 %n, align 4, addrspace(8)
; CHECK-NEXT: alloca on amdgpu must be in addrspace(5)
; CHECK-NEXT: %alloca.9 = alloca i32, i64 %n, align 4, addrspace(9)
define void @dynamic_alloca_i64(i64 %n) {
entry:
%alloca.0 = alloca i32, i64 %n, align 4
%alloca.1 = alloca i32, i64 %n, align 4, addrspace(1)
%alloca.2 = alloca i32, i64 %n, align 4, addrspace(2)
%alloca.3 = alloca i32, i64 %n, align 4, addrspace(3)
%alloca.4 = alloca i32, i64 %n, align 4, addrspace(4)
%alloca.5 = alloca i32, i64 %n, align 4, addrspace(5)
%alloca.6 = alloca i32, i64 %n, align 4, addrspace(6)
%alloca.7 = alloca i32, i64 %n, align 4, addrspace(7)
%alloca.8 = alloca i32, i64 %n, align 4, addrspace(8)
%alloca.9 = alloca i32, i64 %n, align 4, addrspace(9)
ret void
}
Loading