-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AMDGPU] SelDAG: fix lowering of undefined workitem intrinsics #126058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5 | ||
; RUN: llc -mtriple=amdgcn -mcpu=gfx942 -O0 -stop-after=amdgpu-isel -o - %s | FileCheck --check-prefix=SelDAG %s | ||
; RUN: llc -global-isel -mtriple=amdgcn -mcpu=gfx942 -stop-after=legalizer -o - %s | FileCheck --check-prefix=GlobalISel %s | ||
|
||
declare i32 @llvm.amdgcn.workitem.id.x() | ||
declare i32 @llvm.amdgcn.workitem.id.y() | ||
declare i32 @llvm.amdgcn.workitem.id.z() | ||
|
||
define amdgpu_ps void @undefined_workitems(ptr %p, ptr %q, ptr %r) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't see this function is marked with
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I assumed so since:
|
||
; SelDAG-LABEL: name: undefined_workitems | ||
; SelDAG: bb.0 (%ir-block.0): | ||
; SelDAG-NEXT: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5 | ||
; SelDAG-NEXT: {{ $}} | ||
; SelDAG-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr5 | ||
; SelDAG-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr4 | ||
; SelDAG-NEXT: [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr3 | ||
; SelDAG-NEXT: [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr2 | ||
; SelDAG-NEXT: [[COPY4:%[0-9]+]]:vgpr_32 = COPY $vgpr1 | ||
; SelDAG-NEXT: [[COPY5:%[0-9]+]]:vgpr_32 = COPY $vgpr0 | ||
; SelDAG-NEXT: [[REG_SEQUENCE:%[0-9]+]]:sreg_64 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY]], %subreg.sub1 | ||
; SelDAG-NEXT: [[REG_SEQUENCE1:%[0-9]+]]:sreg_64 = REG_SEQUENCE [[COPY3]], %subreg.sub0, [[COPY2]], %subreg.sub1 | ||
; SelDAG-NEXT: [[REG_SEQUENCE2:%[0-9]+]]:sreg_64 = REG_SEQUENCE [[COPY5]], %subreg.sub0, [[COPY4]], %subreg.sub1 | ||
; SelDAG-NEXT: [[COPY6:%[0-9]+]]:vreg_64_align2 = COPY [[REG_SEQUENCE]] | ||
; SelDAG-NEXT: [[COPY7:%[0-9]+]]:vreg_64_align2 = COPY [[REG_SEQUENCE1]] | ||
; SelDAG-NEXT: [[COPY8:%[0-9]+]]:vreg_64_align2 = COPY [[REG_SEQUENCE2]] | ||
; SelDAG-NEXT: S_ENDPGM 0 | ||
; | ||
; GlobalISel-LABEL: name: undefined_workitems | ||
; GlobalISel: bb.1 (%ir-block.0): | ||
; GlobalISel-NEXT: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5 | ||
; GlobalISel-NEXT: {{ $}} | ||
; GlobalISel-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0 | ||
; GlobalISel-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1 | ||
; GlobalISel-NEXT: [[MV:%[0-9]+]]:_(p0) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32) | ||
; GlobalISel-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2 | ||
; GlobalISel-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr3 | ||
; GlobalISel-NEXT: [[MV1:%[0-9]+]]:_(p0) = G_MERGE_VALUES [[COPY2]](s32), [[COPY3]](s32) | ||
; GlobalISel-NEXT: [[COPY4:%[0-9]+]]:_(s32) = COPY $vgpr4 | ||
; GlobalISel-NEXT: [[COPY5:%[0-9]+]]:_(s32) = COPY $vgpr5 | ||
; GlobalISel-NEXT: [[MV2:%[0-9]+]]:_(p0) = G_MERGE_VALUES [[COPY4]](s32), [[COPY5]](s32) | ||
; GlobalISel-NEXT: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF | ||
; GlobalISel-NEXT: [[COPY6:%[0-9]+]]:_(s32) = COPY [[DEF]](s32) | ||
; GlobalISel-NEXT: G_STORE [[COPY6]](s32), [[MV]](p0) :: (store (s32) into %ir.p) | ||
; GlobalISel-NEXT: [[COPY7:%[0-9]+]]:_(s32) = COPY [[DEF]](s32) | ||
; GlobalISel-NEXT: G_STORE [[COPY7]](s32), [[MV1]](p0) :: (store (s32) into %ir.q) | ||
; GlobalISel-NEXT: G_STORE [[DEF]](s32), [[MV2]](p0) :: (store (s32) into %ir.r) | ||
; GlobalISel-NEXT: S_ENDPGM 0 | ||
%id.x = call i32 @llvm.amdgcn.workitem.id.x() | ||
store i32 %id.x, ptr %p | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. use addrspace(1) pointers There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done. What is the difference in this case? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Different instructions, but overall global is the simplest address space |
||
%id.y = call i32 @llvm.amdgcn.workitem.id.y() | ||
store i32 %id.y, ptr %q | ||
%id.z = call i32 @llvm.amdgcn.workitem.id.z() | ||
store i32 %id.z, ptr %r | ||
ret void | ||
} |
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check prefixes should be all capitalized. MIR tests should be used as sparingly as possible, and in this case you should just run to the end and check the final ISA. If you were checking the MIR, these should check at a common point after selection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, done. But then I only get an empty block (so
%bb.0: s_endpgm
). At least for GlobalIsel (no matter if-O0
or not) and also for SelDAG if I remove the-O0
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's what you expect, the store of undef was deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Sorry, I guess I misunderstood what you said on another PR: that one should be able to see something directly in the tests. I've understood this to mean that one should be able to more or less explicitly see the undefs and not only see the resulting code where everything is optimized away.