[AMDGPU] Use poison instead of undef for non-demanded elements #75914

nikic · 2023-12-19T09:23:27Z

Return poison instead of undef for non-demanded lanes in the AMDGPU demanded element simplification hook.

I'm putting up a PR for this in case the notion of dead lanes for these dmask target intrinsics is different from LLVM's usual.

llvmbot · 2023-12-19T09:23:55Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-amdgpu

Author: Nikita Popov (nikic)

Changes

Return poison instead of undef for non-demanded lanes in the AMDGPU demanded element simplification hook.

I'm putting up a PR for this in case the notion of dead lanes for these dmask target intrinsics is different from LLVM's usual.

Full diff: https://github.com/llvm/llvm-project/pull/75914.diff

3 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp (+2-2)
(modified) llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts-inseltpoison.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts.ll (+1-1)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
index ee93d9eb4c0a05..69867bf218efc9 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
@@ -1261,7 +1261,7 @@ static Value *simplifyAMDGCNMemoryIntrinsicDemanded(InstCombiner &IC,
 
   unsigned NewNumElts = DemandedElts.popcount();
   if (!NewNumElts)
-    return UndefValue::get(IIVTy);
+    return PoisonValue::get(IIVTy);
 
   if (NewNumElts >= VWidth && DemandedElts.isMask()) {
     if (DMaskIdx >= 0)
@@ -1299,7 +1299,7 @@ static Value *simplifyAMDGCNMemoryIntrinsicDemanded(InstCombiner &IC,
 
   if (IsLoad) {
     if (NewNumElts == 1) {
-      return IC.Builder.CreateInsertElement(UndefValue::get(IIVTy), NewCall,
+      return IC.Builder.CreateInsertElement(PoisonValue::get(IIVTy), NewCall,
                                             DemandedElts.countr_zero());
     }
 
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts-inseltpoison.ll b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts-inseltpoison.ll
index 53300e0c9771a9..1567a9cb7b3f92 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts-inseltpoison.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts-inseltpoison.ll
@@ -4792,7 +4792,7 @@ define amdgpu_ps float @extract_elt0_image_sample_2d_v4f32_f32(float %s, float %
 
 define amdgpu_ps float @extract_elt0_dmask_0000_image_sample_3d_v4f32_f32(float %s, float %t, float %r, <8 x i32> inreg %sampler, <4 x i32> inreg %rsrc) #0 {
 ; CHECK-LABEL: @extract_elt0_dmask_0000_image_sample_3d_v4f32_f32(
-; CHECK-NEXT:    ret float undef
+; CHECK-NEXT:    ret float poison
 ;
   %data = call <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f32(i32 0, float %s, float %t, float %r, <8 x i32> %sampler, <4 x i32> %rsrc, i1 false, i32 0, i32 0)
   %elt0 = extractelement <4 x float> %data, i32 0
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts.ll b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts.ll
index ce809a5676b82f..3465bd7be3e678 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-demanded-vector-elts.ll
@@ -4791,7 +4791,7 @@ define amdgpu_ps float @extract_elt0_image_sample_2d_v4f32_f32(float %s, float %
 
 define amdgpu_ps float @extract_elt0_dmask_0000_image_sample_3d_v4f32_f32(float %s, float %t, float %r, <8 x i32> inreg %sampler, <4 x i32> inreg %rsrc) #0 {
 ; CHECK-LABEL: @extract_elt0_dmask_0000_image_sample_3d_v4f32_f32(
-; CHECK-NEXT:    ret float undef
+; CHECK-NEXT:    ret float poison
 ;
   %data = call <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f32(i32 0, float %s, float %t, float %r, <8 x i32> %sampler, <4 x i32> %rsrc, i1 false, i32 0, i32 0)
   %elt0 = extractelement <4 x float> %data, i32 0

arsenm

There shouldn't be a reason to prefer undef over poison here. However I'm wondering if it's correct to use undef in the first place. I briefly skimmed the hardware docs for images and I believe the behavior for 0 is to actually act like dmask=1 and return a 0

nikic · 2023-12-19T09:35:43Z

There shouldn't be a reason to prefer undef over poison here. However I'm wondering if it's correct to use undef in the first place. I briefly skimmed the hardware docs for images and I believe the behavior for 0 is to actually act like dmask=1 and return a 0

Are these docs public? I had a hard time finding information on what these intrinsics do.

arsenm · 2023-12-19T09:42:40Z

Are these docs public? I had a hard time finding information on what these intrinsics do.

Mostly yes. https://llvm.org/docs/AMDGPUUsage.html#additional-documentation links to the various ISA public documents (see the MIMG sections for dmask). The intrinsics for the most part directly map to the instructions of the same names. It covers dmask but sometimes unusual edge cases aren't mentioned in the public docs (like I don't see the 0 case mentioned here)

nikic · 2023-12-19T10:12:33Z

Are these docs public? I had a hard time finding information on what these intrinsics do.

Mostly yes. https://llvm.org/docs/AMDGPUUsage.html#additional-documentation links to the various ISA public documents (see the MIMG sections for dmask). The intrinsics for the most part directly map to the instructions of the same names. It covers dmask but sometimes unusual edge cases aren't mentioned in the public docs (like I don't see the 0 case mentioned here)

Thanks! Looking at the RDNA 3 docs, it does have a note on this as well:

If DMASK==0, the TA overrides DMASK=1 and puts zeros in VGPR followed by LWE status if exists. TFE status is not generated since the fetch is dropped.

I went ahead and pushed a commit to skip simplification for the dmask==0 case entirely, as it has special semantics. Does that sound reasonable?

nikic requested review from arsenm and nhaehnle December 19, 2023 09:23

llvmbot added backend:AMDGPU llvm:transforms labels Dec 19, 2023

arsenm approved these changes Dec 19, 2023

View reviewed changes

nhaehnle approved these changes Dec 19, 2023

View reviewed changes

nikic added 3 commits December 20, 2023 09:56

[AMDGPU] Use poison instead of undef for non-demanded elements

092b865

Don't simplify image intrinsic with dmask == 0

fbdec0a

Update tests after rebase

058496c

nikic force-pushed the amdgpu-demanded-elts branch from 1a72c6f to 058496c Compare December 20, 2023 08:58

nikic merged commit 9d60e95 into llvm:main Dec 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AMDGPU] Use poison instead of undef for non-demanded elements #75914

[AMDGPU] Use poison instead of undef for non-demanded elements #75914

nikic commented Dec 19, 2023

Uh oh!

llvmbot commented Dec 19, 2023 •

edited

Loading

Uh oh!

arsenm left a comment

Uh oh!

nikic commented Dec 19, 2023

Uh oh!

arsenm commented Dec 19, 2023

Uh oh!

nikic commented Dec 19, 2023

Uh oh!

Uh oh!

[AMDGPU] Use poison instead of undef for non-demanded elements #75914

[AMDGPU] Use poison instead of undef for non-demanded elements #75914

Conversation

nikic commented Dec 19, 2023

Uh oh!

llvmbot commented Dec 19, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arsenm left a comment

Choose a reason for hiding this comment

Uh oh!

nikic commented Dec 19, 2023

Uh oh!

arsenm commented Dec 19, 2023

Uh oh!

nikic commented Dec 19, 2023

Uh oh!

Uh oh!

llvmbot commented Dec 19, 2023 •

edited

Loading