-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[AMDGPU] Use poison instead of undef for non-demanded elements #75914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-amdgpu Author: Nikita Popov (nikic) ChangesReturn poison instead of undef for non-demanded lanes in the AMDGPU demanded element simplification hook. I'm putting up a PR for this in case the notion of dead lanes for these dmask target intrinsics is different from LLVM's usual. Full diff: https://github.com/llvm/llvm-project/pull/75914.diff 3 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
index ee93d9eb4c0a05..69867bf218efc9 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
@@ -1261,7 +1261,7 @@ static Value *simplifyAMDGCNMemoryIntrinsicDemanded(InstCombiner &IC,
unsigned NewNumElts = DemandedElts.popcount();
if (!NewNumElts)
- return UndefValue::get(IIVTy);
+ return PoisonValue::get(IIVTy);
if (NewNumElts >= VWidth && DemandedElts.isMask()) {
if (DMaskIdx >= 0)
@@ -1299,7 +1299,7 @@ static Value *simplifyAMDGCNMemoryIntrinsicDemanded(InstCombiner &IC,
if (IsLoad) {
if (NewNumElts == 1) {
- return IC.Builder.CreateInsertElement(UndefValue::get(IIVTy), NewCall,
+ return IC.Builder.CreateInsertElement(PoisonValue::get(IIVTy), NewCall,
DemandedElts.countr_zero());
}
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts-inseltpoison.ll b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts-inseltpoison.ll
index 53300e0c9771a9..1567a9cb7b3f92 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts-inseltpoison.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts-inseltpoison.ll
@@ -4792,7 +4792,7 @@ define amdgpu_ps float @extract_elt0_image_sample_2d_v4f32_f32(float %s, float %
define amdgpu_ps float @extract_elt0_dmask_0000_image_sample_3d_v4f32_f32(float %s, float %t, float %r, <8 x i32> inreg %sampler, <4 x i32> inreg %rsrc) #0 {
; CHECK-LABEL: @extract_elt0_dmask_0000_image_sample_3d_v4f32_f32(
-; CHECK-NEXT: ret float undef
+; CHECK-NEXT: ret float poison
;
%data = call <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f32(i32 0, float %s, float %t, float %r, <8 x i32> %sampler, <4 x i32> %rsrc, i1 false, i32 0, i32 0)
%elt0 = extractelement <4 x float> %data, i32 0
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts.ll b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts.ll
index ce809a5676b82f..3465bd7be3e678 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts.ll
@@ -4791,7 +4791,7 @@ define amdgpu_ps float @extract_elt0_image_sample_2d_v4f32_f32(float %s, float %
define amdgpu_ps float @extract_elt0_dmask_0000_image_sample_3d_v4f32_f32(float %s, float %t, float %r, <8 x i32> inreg %sampler, <4 x i32> inreg %rsrc) #0 {
; CHECK-LABEL: @extract_elt0_dmask_0000_image_sample_3d_v4f32_f32(
-; CHECK-NEXT: ret float undef
+; CHECK-NEXT: ret float poison
;
%data = call <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f32(i32 0, float %s, float %t, float %r, <8 x i32> %sampler, <4 x i32> %rsrc, i1 false, i32 0, i32 0)
%elt0 = extractelement <4 x float> %data, i32 0
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There shouldn't be a reason to prefer undef over poison here. However I'm wondering if it's correct to use undef in the first place. I briefly skimmed the hardware docs for images and I believe the behavior for 0 is to actually act like dmask=1 and return a 0
Are these docs public? I had a hard time finding information on what these intrinsics do. |
Mostly yes. https://llvm.org/docs/AMDGPUUsage.html#additional-documentation links to the various ISA public documents (see the MIMG sections for dmask). The intrinsics for the most part directly map to the instructions of the same names. It covers dmask but sometimes unusual edge cases aren't mentioned in the public docs (like I don't see the 0 case mentioned here) |
Thanks! Looking at the RDNA 3 docs, it does have a note on this as well:
I went ahead and pushed a commit to skip simplification for the dmask==0 case entirely, as it has special semantics. Does that sound reasonable? |
1a72c6f
to
058496c
Compare
Return poison instead of undef for non-demanded lanes in the AMDGPU demanded element simplification hook.
I'm putting up a PR for this in case the notion of dead lanes for these dmask target intrinsics is different from LLVM's usual.