Skip to content

[AMDGPU] Use poison instead of undef for non-demanded elements #75914

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 20, 2023

Conversation

nikic
Copy link
Contributor

@nikic nikic commented Dec 19, 2023

Return poison instead of undef for non-demanded lanes in the AMDGPU demanded element simplification hook.

I'm putting up a PR for this in case the notion of dead lanes for these dmask target intrinsics is different from LLVM's usual.

@llvmbot
Copy link
Member

llvmbot commented Dec 19, 2023

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-amdgpu

Author: Nikita Popov (nikic)

Changes

Return poison instead of undef for non-demanded lanes in the AMDGPU demanded element simplification hook.

I'm putting up a PR for this in case the notion of dead lanes for these dmask target intrinsics is different from LLVM's usual.


Full diff: https://github.com/llvm/llvm-project/pull/75914.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts-inseltpoison.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts.ll (+1-1)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
index ee93d9eb4c0a05..69867bf218efc9 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
@@ -1261,7 +1261,7 @@ static Value *simplifyAMDGCNMemoryIntrinsicDemanded(InstCombiner &IC,
 
   unsigned NewNumElts = DemandedElts.popcount();
   if (!NewNumElts)
-    return UndefValue::get(IIVTy);
+    return PoisonValue::get(IIVTy);
 
   if (NewNumElts >= VWidth && DemandedElts.isMask()) {
     if (DMaskIdx >= 0)
@@ -1299,7 +1299,7 @@ static Value *simplifyAMDGCNMemoryIntrinsicDemanded(InstCombiner &IC,
 
   if (IsLoad) {
     if (NewNumElts == 1) {
-      return IC.Builder.CreateInsertElement(UndefValue::get(IIVTy), NewCall,
+      return IC.Builder.CreateInsertElement(PoisonValue::get(IIVTy), NewCall,
                                             DemandedElts.countr_zero());
     }
 
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts-inseltpoison.ll b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts-inseltpoison.ll
index 53300e0c9771a9..1567a9cb7b3f92 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts-inseltpoison.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts-inseltpoison.ll
@@ -4792,7 +4792,7 @@ define amdgpu_ps float @extract_elt0_image_sample_2d_v4f32_f32(float %s, float %
 
 define amdgpu_ps float @extract_elt0_dmask_0000_image_sample_3d_v4f32_f32(float %s, float %t, float %r, <8 x i32> inreg %sampler, <4 x i32> inreg %rsrc) #0 {
 ; CHECK-LABEL: @extract_elt0_dmask_0000_image_sample_3d_v4f32_f32(
-; CHECK-NEXT:    ret float undef
+; CHECK-NEXT:    ret float poison
 ;
   %data = call <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f32(i32 0, float %s, float %t, float %r, <8 x i32> %sampler, <4 x i32> %rsrc, i1 false, i32 0, i32 0)
   %elt0 = extractelement <4 x float> %data, i32 0
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts.ll b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts.ll
index ce809a5676b82f..3465bd7be3e678 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts.ll
@@ -4791,7 +4791,7 @@ define amdgpu_ps float @extract_elt0_image_sample_2d_v4f32_f32(float %s, float %
 
 define amdgpu_ps float @extract_elt0_dmask_0000_image_sample_3d_v4f32_f32(float %s, float %t, float %r, <8 x i32> inreg %sampler, <4 x i32> inreg %rsrc) #0 {
 ; CHECK-LABEL: @extract_elt0_dmask_0000_image_sample_3d_v4f32_f32(
-; CHECK-NEXT:    ret float undef
+; CHECK-NEXT:    ret float poison
 ;
   %data = call <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f32(i32 0, float %s, float %t, float %r, <8 x i32> %sampler, <4 x i32> %rsrc, i1 false, i32 0, i32 0)
   %elt0 = extractelement <4 x float> %data, i32 0

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be a reason to prefer undef over poison here. However I'm wondering if it's correct to use undef in the first place. I briefly skimmed the hardware docs for images and I believe the behavior for 0 is to actually act like dmask=1 and return a 0

@nikic
Copy link
Contributor Author

nikic commented Dec 19, 2023

There shouldn't be a reason to prefer undef over poison here. However I'm wondering if it's correct to use undef in the first place. I briefly skimmed the hardware docs for images and I believe the behavior for 0 is to actually act like dmask=1 and return a 0

Are these docs public? I had a hard time finding information on what these intrinsics do.

@arsenm
Copy link
Contributor

arsenm commented Dec 19, 2023

Are these docs public? I had a hard time finding information on what these intrinsics do.

Mostly yes. https://llvm.org/docs/AMDGPUUsage.html#additional-documentation links to the various ISA public documents (see the MIMG sections for dmask). The intrinsics for the most part directly map to the instructions of the same names. It covers dmask but sometimes unusual edge cases aren't mentioned in the public docs (like I don't see the 0 case mentioned here)

@nikic
Copy link
Contributor Author

nikic commented Dec 19, 2023

Are these docs public? I had a hard time finding information on what these intrinsics do.

Mostly yes. https://llvm.org/docs/AMDGPUUsage.html#additional-documentation links to the various ISA public documents (see the MIMG sections for dmask). The intrinsics for the most part directly map to the instructions of the same names. It covers dmask but sometimes unusual edge cases aren't mentioned in the public docs (like I don't see the 0 case mentioned here)

Thanks! Looking at the RDNA 3 docs, it does have a note on this as well:

If DMASK==0, the TA overrides DMASK=1 and puts zeros in VGPR followed by LWE status if exists. TFE status is not generated since the fetch is dropped.

I went ahead and pushed a commit to skip simplification for the dmask==0 case entirely, as it has special semantics. Does that sound reasonable?

@nikic nikic force-pushed the amdgpu-demanded-elts branch from 1a72c6f to 058496c Compare December 20, 2023 08:58
@nikic nikic merged commit 9d60e95 into llvm:main Dec 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants