-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AMDGPU] SelDAG: fix lowering of undefined workitem intrinsics #126058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
GlobalISel already handles undefined workitem.id.{x,y,z} intrinsics, SelDAG failed in AMDGPUISelLowering.cpp due to a failed assertion in `AMDGPUTargetLowering::loadInputValue`: `Arg && "Attempting to load missing argument"`. This commit changes the behavior of SelDAG to instead use a zero constant. This LLVM defect was identified via the AMD Fuzzing project.
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-amdgpu Author: Robert Imschweiler (ro-i) ChangesGlobalISel already handles undefined workitem.id.{x,y,z} intrinsics, SelDAG failed in AMDGPUISelLowering.cpp due to a failed assertion in This LLVM defect was identified via the AMD Fuzzing project. Full diff: https://github.com/llvm/llvm-project/pull/126058.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index b632c50dae0e359..47654890ff50e74 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -8790,11 +8790,28 @@ SDValue SITargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
AMDGPUFunctionArgInfo::LDS_KERNEL_ID);
}
case Intrinsic::amdgcn_workitem_id_x:
- return lowerWorkitemID(DAG, Op, 0, MFI->getArgInfo().WorkItemIDX);
+ if (!MFI->getArgInfo().WorkItemIDX) {
+ // It's undefined behavior if a function marked with the amdgpu-no-*
+ // attributes uses the corresponding intrinsic.
+ return DAG.getConstant(0, SDLoc(Op),
+ EVT::getIntegerVT(*DAG.getContext(), 32));
+ } else {
+ return lowerWorkitemID(DAG, Op, 0, MFI->getArgInfo().WorkItemIDX);
+ }
case Intrinsic::amdgcn_workitem_id_y:
- return lowerWorkitemID(DAG, Op, 1, MFI->getArgInfo().WorkItemIDY);
+ if (!MFI->getArgInfo().WorkItemIDY) {
+ return DAG.getConstant(0, SDLoc(Op),
+ EVT::getIntegerVT(*DAG.getContext(), 32));
+ } else {
+ return lowerWorkitemID(DAG, Op, 1, MFI->getArgInfo().WorkItemIDY);
+ }
case Intrinsic::amdgcn_workitem_id_z:
- return lowerWorkitemID(DAG, Op, 2, MFI->getArgInfo().WorkItemIDZ);
+ if (!MFI->getArgInfo().WorkItemIDZ) {
+ return DAG.getConstant(0, SDLoc(Op),
+ EVT::getIntegerVT(*DAG.getContext(), 32));
+ } else {
+ return lowerWorkitemID(DAG, Op, 2, MFI->getArgInfo().WorkItemIDZ);
+ }
case Intrinsic::amdgcn_wavefrontsize:
return DAG.getConstant(MF.getSubtarget<GCNSubtarget>().getWavefrontSize(),
SDLoc(Op), MVT::i32);
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.workitem.id-undefined-attr.ll b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.workitem.id-undefined-attr.ll
new file mode 100644
index 000000000000000..0e6a448fcae5a96
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.workitem.id-undefined-attr.ll
@@ -0,0 +1,36 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn -mcpu=gfx942 -O0 --global-isel=false -o - %s | FileCheck %s
+
+declare i32 @llvm.amdgcn.workitem.id.x() #0
+declare i32 @llvm.amdgcn.workitem.id.y() #0
+declare i32 @llvm.amdgcn.workitem.id.z() #0
+
+define amdgpu_ps i32 @test_workitem_id_x() #1 {
+; CHECK-LABEL: test_workitem_id_x:
+; CHECK: ; %bb.0:
+; CHECK-NEXT: s_mov_b32 s0, 0
+; CHECK-NEXT: ; return to shader part epilog
+ %id = call i32 @llvm.amdgcn.workitem.id.x()
+ ret i32 %id
+}
+
+define amdgpu_ps i32 @test_workitem_id_y() #1 {
+; CHECK-LABEL: test_workitem_id_y:
+; CHECK: ; %bb.0:
+; CHECK-NEXT: s_mov_b32 s0, 0
+; CHECK-NEXT: ; return to shader part epilog
+ %id = call i32 @llvm.amdgcn.workitem.id.y()
+ ret i32 %id
+}
+
+define amdgpu_ps i32 @test_workitem_id_z() #1 {
+; CHECK-LABEL: test_workitem_id_z:
+; CHECK: ; %bb.0:
+; CHECK-NEXT: s_mov_b32 s0, 0
+; CHECK-NEXT: ; return to shader part epilog
+ %id = call i32 @llvm.amdgcn.workitem.id.z()
+ ret i32 %id
+}
+
+attributes #0 = { nounwind readnone }
+attributes #1 = { nounwind }
|
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.workitem.id-undefined-attr.ll
Outdated
Show resolved
Hide resolved
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.workitem.id-undefined-attr.ll
Outdated
Show resolved
Hide resolved
; UNDEF-LABEL: undefined_workitem_z_only: | ||
; UNDEF: ; %bb.0: | ||
; UNDEF-NEXT: s_endpgm | ||
%id.z = call i32 @llvm.amdgcn.workitem.id.z() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test isn't showing anything. It needs to have a real use of the value. Plus it doesn't have any of the attributes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the tests and moved them to a separate file because they need to use a different calling convention than amdgpu_kernel
so that the workitem.id registers are not defined. For the SelDAG case, I need -O0
, otherwise everything is optimized away (which seems maybe a bit weird).
declare i32 @llvm.amdgcn.workitem.id.y() | ||
declare i32 @llvm.amdgcn.workitem.id.z() | ||
|
||
define amdgpu_ps void @undefined_workitems(ptr %p, ptr %q, ptr %r) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this function is marked with amdgpu-no-*
attribute. Is it implied by the CC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed so since:
- GlobalISel behaves this way, too.
- the docs for “amdgpu-no-workitem-id-x” say: "Indicates the function does not depend on the value of the llvm.amdgcn.workitem.id.x intrinsic. [...] The backend can generally infer this during code generation, so typically there is no benefit to frontends marking functions with this."
- the (spare) docs on CC specify for
amdgpu_ps
, for example: "Used for Mesa/AMDPAL pixel shaders. ..TODO:: Describe." which led me to assume that it does not guarantee the Initial Kernel Execution State, as described foramdgpu_kernel
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with one nit.
@@ -0,0 +1,61 @@ | |||
; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5 | |||
; RUN: llc -mtriple=amdgcn -mcpu=gfx942 -O0 -stop-after=si-fix-sgpr-copies -o - %s | FileCheck --check-prefix=SelDAG %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonder why you want to stop it after si-fix-sgpr-copies
instead of something earlier? I suppose right after amdgpu-isel
would be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
works too (see my latest push) :) It just doesn't show the IMPLICIT_DEF
s explicitly.
; RUN: llc -mtriple=amdgcn -mcpu=gfx942 -O0 -stop-after=amdgpu-isel -o - %s | FileCheck --check-prefix=SelDAG %s | ||
; RUN: llc -global-isel -mtriple=amdgcn -mcpu=gfx942 -stop-after=legalizer -o - %s | FileCheck --check-prefix=GlobalISel %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check prefixes should be all capitalized. MIR tests should be used as sparingly as possible, and in this case you should just run to the end and check the final ISA. If you were checking the MIR, these should check at a common point after selection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, done. But then I only get an empty block (so %bb.0: s_endpgm
). At least for GlobalIsel (no matter if -O0
or not) and also for SelDAG if I remove the -O0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's what you expect, the store of undef was deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Sorry, I guess I misunderstood what you said on another PR: that one should be able to see something directly in the tests. I've understood this to mean that one should be able to more or less explicitly see the undefs and not only see the resulting code where everything is optimized away.
; GlobalISel-NEXT: G_STORE [[DEF]](s32), [[MV2]](p0) :: (store (s32) into %ir.r) | ||
; GlobalISel-NEXT: S_ENDPGM 0 | ||
%id.x = call i32 @llvm.amdgcn.workitem.id.x() | ||
store i32 %id.x, ptr %p |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use addrspace(1) pointers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. What is the difference in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Different instructions, but overall global is the simplest address space
I pushed the changes to my existing branch. Should I make another PR? |
Yes, can't do anything with the merged one |
…126058) GlobalISel already handles undefined workitem.id.{x,y,z} intrinsics, SelDAG failed in AMDGPUISelLowering.cpp due to a failed assertion in `AMDGPUTargetLowering::loadInputValue`: `Arg && "Attempting to load missing argument"`. This commit changes the behavior of SelDAG to instead use a zero constant. This LLVM defect was identified via the AMD Fuzzing project.
…126058) GlobalISel already handles undefined workitem.id.{x,y,z} intrinsics, SelDAG failed in AMDGPUISelLowering.cpp due to a failed assertion in `AMDGPUTargetLowering::loadInputValue`: `Arg && "Attempting to load missing argument"`. This commit changes the behavior of SelDAG to instead use a zero constant. This LLVM defect was identified via the AMD Fuzzing project.
…126058) GlobalISel already handles undefined workitem.id.{x,y,z} intrinsics, SelDAG failed in AMDGPUISelLowering.cpp due to a failed assertion in `AMDGPUTargetLowering::loadInputValue`: `Arg && "Attempting to load missing argument"`. This commit changes the behavior of SelDAG to instead use a zero constant. This LLVM defect was identified via the AMD Fuzzing project.
GlobalISel already handles undefined workitem.id.{x,y,z} intrinsics, SelDAG failed in AMDGPUISelLowering.cpp due to a failed assertion in
AMDGPUTargetLowering::loadInputValue
:Arg && "Attempting to load missing argument"
. This commit changes the behavior of SelDAG to instead use a zero constant.This LLVM defect was identified via the AMD Fuzzing project.