AMDGPU: Simplify demanded bits on readlane/writeline index arguments #117963

arsenm · 2024-11-28T03:53:35Z

The main goal is to fold away wave64 code when compiled for wave32.
If we have out of bounds indexing, these will now clamp down to
a low bit which may CSE with the operations on the low half of the
wave.

arsenm · 2024-11-28T03:53:45Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

llvmbot · 2024-11-28T03:54:23Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

The main goal is to fold away wave64 code when compiled for wave32.
If we have out of bounds indexing, these will now clamp down to
a low bit which may CSE with the operations on the low half of the
wave.

Full diff: https://github.com/llvm/llvm-project/pull/117963.diff

3 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp (+42-1)
(modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h (+4)
(modified) llvm/test/Transforms/InstCombine/AMDGPU/lane-index-simplify-demanded-bits.ll (+96-51)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
index 18a09c39a06387..a0bb3e181ac526 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
@@ -450,6 +450,37 @@ static bool isTriviallyUniform(const Use &U) {
   return false;
 }
 
+/// Simplify a lane index operand (e.g. llvm.amdgcn.readlane src1).
+///
+/// The instruction only reads the low 5 bits for wave32, and 6 bits for wave64.
+bool GCNTTIImpl::simplifyDemandedLaneMaskArg(InstCombiner &IC,
+                                             IntrinsicInst &II,
+                                             unsigned LaneArgIdx) const {
+  unsigned MaskBits = ST->isWaveSizeKnown() && ST->isWave32() ? 5 : 6;
+  APInt DemandedMask(32, maskTrailingOnes<unsigned>(MaskBits));
+
+  KnownBits Known(32);
+  if (IC.SimplifyDemandedBits(&II, LaneArgIdx, DemandedMask, Known))
+    return true;
+
+  if (!Known.isConstant())
+    return false;
+
+  // Unlike the DAG version, SimplifyDemandedBits does not change
+  // constants. Make sure we clamp these down. Out of bounds indexes may appear
+  // in wave64 code compiled for wave32.
+
+  Value *LaneArg = II.getArgOperand(LaneArgIdx);
+  Constant *MaskedConst =
+      ConstantInt::get(LaneArg->getType(), Known.getConstant() & DemandedMask);
+  if (MaskedConst != LaneArg) {
+    II.getOperandUse(LaneArgIdx).set(MaskedConst);
+    return true;
+  }
+
+  return false;
+}
+
 std::optional<Instruction *>
 GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
   Intrinsic::ID IID = II.getIntrinsicID();
@@ -1092,7 +1123,17 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
     const Use &Src = II.getArgOperandUse(0);
     if (isTriviallyUniform(Src))
       return IC.replaceInstUsesWith(II, Src.get());
-    break;
+
+    if (IID == Intrinsic::amdgcn_readlane &&
+        simplifyDemandedLaneMaskArg(IC, II, 1))
+      return &II;
+
+    return std::nullopt;
+  }
+  case Intrinsic::amdgcn_writelane: {
+    if (simplifyDemandedLaneMaskArg(IC, II, 1))
+      return &II;
+    return std::nullopt;
   }
   case Intrinsic::amdgcn_trig_preop: {
     // The intrinsic is declared with name mangling, but currently the
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
index 10956861650ab3..585f38fc02c29c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
@@ -220,6 +220,10 @@ class GCNTTIImpl final : public BasicTTIImplBase<GCNTTIImpl> {
 
   bool canSimplifyLegacyMulToMul(const Instruction &I, const Value *Op0,
                                  const Value *Op1, InstCombiner &IC) const;
+
+  bool simplifyDemandedLaneMaskArg(InstCombiner &IC, IntrinsicInst &II,
+                                   unsigned LaneAgIdx) const;
+
   std::optional<Instruction *> instCombineIntrinsic(InstCombiner &IC,
                                                     IntrinsicInst &II) const;
   std::optional<Value *> simplifyDemandedVectorEltsIntrinsic(
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/lane-index-simplify-demanded-bits.ll b/llvm/test/Transforms/InstCombine/AMDGPU/lane-index-simplify-demanded-bits.ll
index b686f447b8d3c9..327d68bdf550e4 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/lane-index-simplify-demanded-bits.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/lane-index-simplify-demanded-bits.ll
@@ -18,30 +18,45 @@ define i32 @readlane_31(i32 %arg) #0 {
 }
 
 define i32 @readlane_32(i32 %arg) #0 {
-; CHECK-LABEL: define i32 @readlane_32(
-; CHECK-SAME: i32 [[ARG:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG]], i32 32)
-; CHECK-NEXT:    ret i32 [[RES]]
+; WAVE64-LABEL: define i32 @readlane_32(
+; WAVE64-SAME: i32 [[ARG:%.*]]) #[[ATTR0]] {
+; WAVE64-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG]], i32 32)
+; WAVE64-NEXT:    ret i32 [[RES]]
+;
+; WAVE32-LABEL: define i32 @readlane_32(
+; WAVE32-SAME: i32 [[ARG:%.*]]) #[[ATTR0]] {
+; WAVE32-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG]], i32 0)
+; WAVE32-NEXT:    ret i32 [[RES]]
 ;
   %res = call i32 @llvm.amdgcn.readlane.i32(i32 %arg, i32 32)
   ret i32 %res
 }
 
 define i32 @readlane_33(i32 %arg) #0 {
-; CHECK-LABEL: define i32 @readlane_33(
-; CHECK-SAME: i32 [[ARG:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG]], i32 33)
-; CHECK-NEXT:    ret i32 [[RES]]
+; WAVE64-LABEL: define i32 @readlane_33(
+; WAVE64-SAME: i32 [[ARG:%.*]]) #[[ATTR0]] {
+; WAVE64-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG]], i32 33)
+; WAVE64-NEXT:    ret i32 [[RES]]
+;
+; WAVE32-LABEL: define i32 @readlane_33(
+; WAVE32-SAME: i32 [[ARG:%.*]]) #[[ATTR0]] {
+; WAVE32-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG]], i32 1)
+; WAVE32-NEXT:    ret i32 [[RES]]
 ;
   %res = call i32 @llvm.amdgcn.readlane.i32(i32 %arg, i32 33)
   ret i32 %res
 }
 
 define i32 @readlane_63(i32 %arg) #0 {
-; CHECK-LABEL: define i32 @readlane_63(
-; CHECK-SAME: i32 [[ARG:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG]], i32 63)
-; CHECK-NEXT:    ret i32 [[RES]]
+; WAVE64-LABEL: define i32 @readlane_63(
+; WAVE64-SAME: i32 [[ARG:%.*]]) #[[ATTR0]] {
+; WAVE64-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG]], i32 63)
+; WAVE64-NEXT:    ret i32 [[RES]]
+;
+; WAVE32-LABEL: define i32 @readlane_63(
+; WAVE32-SAME: i32 [[ARG:%.*]]) #[[ATTR0]] {
+; WAVE32-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG]], i32 31)
+; WAVE32-NEXT:    ret i32 [[RES]]
 ;
   %res = call i32 @llvm.amdgcn.readlane.i32(i32 %arg, i32 63)
   ret i32 %res
@@ -50,7 +65,7 @@ define i32 @readlane_63(i32 %arg) #0 {
 define i32 @readlane_64(i32 %arg) #0 {
 ; CHECK-LABEL: define i32 @readlane_64(
 ; CHECK-SAME: i32 [[ARG:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG]], i32 64)
+; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG]], i32 0)
 ; CHECK-NEXT:    ret i32 [[RES]]
 ;
   %res = call i32 @llvm.amdgcn.readlane.i32(i32 %arg, i32 64)
@@ -58,11 +73,16 @@ define i32 @readlane_64(i32 %arg) #0 {
 }
 
 define i32 @readlane_and_31(i32 %arg, i32 %idx) #0 {
-; CHECK-LABEL: define i32 @readlane_and_31(
-; CHECK-SAME: i32 [[ARG:%.*]], i32 [[IDX:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[IDX_CLAMP:%.*]] = and i32 [[IDX]], 31
-; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG]], i32 [[IDX_CLAMP]])
-; CHECK-NEXT:    ret i32 [[RES]]
+; WAVE64-LABEL: define i32 @readlane_and_31(
+; WAVE64-SAME: i32 [[ARG:%.*]], i32 [[IDX:%.*]]) #[[ATTR0]] {
+; WAVE64-NEXT:    [[IDX_CLAMP:%.*]] = and i32 [[IDX]], 31
+; WAVE64-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG]], i32 [[IDX_CLAMP]])
+; WAVE64-NEXT:    ret i32 [[RES]]
+;
+; WAVE32-LABEL: define i32 @readlane_and_31(
+; WAVE32-SAME: i32 [[ARG:%.*]], i32 [[IDX:%.*]]) #[[ATTR0]] {
+; WAVE32-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG]], i32 [[IDX]])
+; WAVE32-NEXT:    ret i32 [[RES]]
 ;
   %idx.clamp = and i32 %idx, 31
   %res = call i32 @llvm.amdgcn.readlane.i32(i32 %arg, i32 %idx.clamp)
@@ -72,8 +92,7 @@ define i32 @readlane_and_31(i32 %arg, i32 %idx) #0 {
 define i32 @readlane_and_63(i32 %arg, i32 %idx) #0 {
 ; CHECK-LABEL: define i32 @readlane_and_63(
 ; CHECK-SAME: i32 [[ARG:%.*]], i32 [[IDX:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[IDX_CLAMP:%.*]] = and i32 [[IDX]], 63
-; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG]], i32 [[IDX_CLAMP]])
+; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG]], i32 [[IDX]])
 ; CHECK-NEXT:    ret i32 [[RES]]
 ;
   %idx.clamp = and i32 %idx, 63
@@ -92,10 +111,15 @@ define i32 @readlane_poison(i32 %arg) #0 {
 }
 
 define float @readlane_f32_63(float %arg) #0 {
-; CHECK-LABEL: define float @readlane_f32_63(
-; CHECK-SAME: float [[ARG:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[RES:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[ARG]], i32 63)
-; CHECK-NEXT:    ret float [[RES]]
+; WAVE64-LABEL: define float @readlane_f32_63(
+; WAVE64-SAME: float [[ARG:%.*]]) #[[ATTR0]] {
+; WAVE64-NEXT:    [[RES:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[ARG]], i32 63)
+; WAVE64-NEXT:    ret float [[RES]]
+;
+; WAVE32-LABEL: define float @readlane_f32_63(
+; WAVE32-SAME: float [[ARG:%.*]]) #[[ATTR0]] {
+; WAVE32-NEXT:    [[RES:%.*]] = call float @llvm.amdgcn.readlane.f32(float [[ARG]], i32 31)
+; WAVE32-NEXT:    ret float [[RES]]
 ;
   %res = call float @llvm.amdgcn.readlane.f32(float %arg, i32 63)
   ret float %res
@@ -116,30 +140,45 @@ define i32 @writelane_31(i32 %arg0, i32 %arg1) #0 {
 }
 
 define i32 @writelane_32(i32 %arg0, i32 %arg1) #0 {
-; CHECK-LABEL: define i32 @writelane_32(
-; CHECK-SAME: i32 [[ARG0:%.*]], i32 [[ARG1:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.writelane.i32(i32 [[ARG0]], i32 32, i32 [[ARG1]])
-; CHECK-NEXT:    ret i32 [[RES]]
+; WAVE64-LABEL: define i32 @writelane_32(
+; WAVE64-SAME: i32 [[ARG0:%.*]], i32 [[ARG1:%.*]]) #[[ATTR0]] {
+; WAVE64-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.writelane.i32(i32 [[ARG0]], i32 32, i32 [[ARG1]])
+; WAVE64-NEXT:    ret i32 [[RES]]
+;
+; WAVE32-LABEL: define i32 @writelane_32(
+; WAVE32-SAME: i32 [[ARG0:%.*]], i32 [[ARG1:%.*]]) #[[ATTR0]] {
+; WAVE32-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.writelane.i32(i32 [[ARG0]], i32 0, i32 [[ARG1]])
+; WAVE32-NEXT:    ret i32 [[RES]]
 ;
   %res = call i32 @llvm.amdgcn.writelane.i32(i32 %arg0, i32 32, i32 %arg1)
   ret i32 %res
 }
 
 define i32 @writelane_33(i32 %arg0, i32 %arg1) #0 {
-; CHECK-LABEL: define i32 @writelane_33(
-; CHECK-SAME: i32 [[ARG0:%.*]], i32 [[ARG1:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.writelane.i32(i32 [[ARG0]], i32 33, i32 [[ARG1]])
-; CHECK-NEXT:    ret i32 [[RES]]
+; WAVE64-LABEL: define i32 @writelane_33(
+; WAVE64-SAME: i32 [[ARG0:%.*]], i32 [[ARG1:%.*]]) #[[ATTR0]] {
+; WAVE64-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.writelane.i32(i32 [[ARG0]], i32 33, i32 [[ARG1]])
+; WAVE64-NEXT:    ret i32 [[RES]]
+;
+; WAVE32-LABEL: define i32 @writelane_33(
+; WAVE32-SAME: i32 [[ARG0:%.*]], i32 [[ARG1:%.*]]) #[[ATTR0]] {
+; WAVE32-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.writelane.i32(i32 [[ARG0]], i32 1, i32 [[ARG1]])
+; WAVE32-NEXT:    ret i32 [[RES]]
 ;
   %res = call i32 @llvm.amdgcn.writelane.i32(i32 %arg0, i32 33, i32 %arg1)
   ret i32 %res
 }
 
 define i32 @writelane_63(i32 %arg0, i32 %arg1) #0 {
-; CHECK-LABEL: define i32 @writelane_63(
-; CHECK-SAME: i32 [[ARG0:%.*]], i32 [[ARG1:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.writelane.i32(i32 [[ARG0]], i32 63, i32 [[ARG1]])
-; CHECK-NEXT:    ret i32 [[RES]]
+; WAVE64-LABEL: define i32 @writelane_63(
+; WAVE64-SAME: i32 [[ARG0:%.*]], i32 [[ARG1:%.*]]) #[[ATTR0]] {
+; WAVE64-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.writelane.i32(i32 [[ARG0]], i32 63, i32 [[ARG1]])
+; WAVE64-NEXT:    ret i32 [[RES]]
+;
+; WAVE32-LABEL: define i32 @writelane_63(
+; WAVE32-SAME: i32 [[ARG0:%.*]], i32 [[ARG1:%.*]]) #[[ATTR0]] {
+; WAVE32-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.writelane.i32(i32 [[ARG0]], i32 31, i32 [[ARG1]])
+; WAVE32-NEXT:    ret i32 [[RES]]
 ;
   %res = call i32 @llvm.amdgcn.writelane.i32(i32 %arg0, i32 63, i32 %arg1)
   ret i32 %res
@@ -148,7 +187,7 @@ define i32 @writelane_63(i32 %arg0, i32 %arg1) #0 {
 define i32 @writelane_64(i32 %arg0, i32 %arg1) #0 {
 ; CHECK-LABEL: define i32 @writelane_64(
 ; CHECK-SAME: i32 [[ARG0:%.*]], i32 [[ARG1:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.writelane.i32(i32 [[ARG0]], i32 64, i32 [[ARG1]])
+; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.writelane.i32(i32 [[ARG0]], i32 0, i32 [[ARG1]])
 ; CHECK-NEXT:    ret i32 [[RES]]
 ;
   %res = call i32 @llvm.amdgcn.writelane.i32(i32 %arg0, i32 64, i32 %arg1)
@@ -156,11 +195,16 @@ define i32 @writelane_64(i32 %arg0, i32 %arg1) #0 {
 }
 
 define i32 @writelane_and_31(i32 %arg0, i32 %arg1, i32 %idx) #0 {
-; CHECK-LABEL: define i32 @writelane_and_31(
-; CHECK-SAME: i32 [[ARG0:%.*]], i32 [[ARG1:%.*]], i32 [[IDX:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[IDX_CLAMP:%.*]] = and i32 [[IDX]], 31
-; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.writelane.i32(i32 [[ARG0]], i32 [[IDX_CLAMP]], i32 [[ARG1]])
-; CHECK-NEXT:    ret i32 [[RES]]
+; WAVE64-LABEL: define i32 @writelane_and_31(
+; WAVE64-SAME: i32 [[ARG0:%.*]], i32 [[ARG1:%.*]], i32 [[IDX:%.*]]) #[[ATTR0]] {
+; WAVE64-NEXT:    [[IDX_CLAMP:%.*]] = and i32 [[IDX]], 31
+; WAVE64-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.writelane.i32(i32 [[ARG0]], i32 [[IDX_CLAMP]], i32 [[ARG1]])
+; WAVE64-NEXT:    ret i32 [[RES]]
+;
+; WAVE32-LABEL: define i32 @writelane_and_31(
+; WAVE32-SAME: i32 [[ARG0:%.*]], i32 [[ARG1:%.*]], i32 [[IDX:%.*]]) #[[ATTR0]] {
+; WAVE32-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.writelane.i32(i32 [[ARG0]], i32 [[IDX]], i32 [[ARG1]])
+; WAVE32-NEXT:    ret i32 [[RES]]
 ;
   %idx.clamp = and i32 %idx, 31
   %res = call i32 @llvm.amdgcn.writelane.i32(i32 %arg0, i32 %idx.clamp, i32 %arg1)
@@ -170,8 +214,7 @@ define i32 @writelane_and_31(i32 %arg0, i32 %arg1, i32 %idx) #0 {
 define i32 @writelane_and_63(i32 %arg0, i32 %arg1, i32 %idx) #0 {
 ; CHECK-LABEL: define i32 @writelane_and_63(
 ; CHECK-SAME: i32 [[ARG0:%.*]], i32 [[ARG1:%.*]], i32 [[IDX:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[IDX_CLAMP:%.*]] = and i32 [[IDX]], 63
-; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.writelane.i32(i32 [[ARG0]], i32 [[IDX_CLAMP]], i32 [[ARG1]])
+; CHECK-NEXT:    [[RES:%.*]] = call i32 @llvm.amdgcn.writelane.i32(i32 [[ARG0]], i32 [[IDX]], i32 [[ARG1]])
 ; CHECK-NEXT:    ret i32 [[RES]]
 ;
   %idx.clamp = and i32 %idx, 63
@@ -190,16 +233,18 @@ define i32 @writelane_poison(i32 %arg0, i32 %arg1) #0 {
 }
 
 define float @writelane_f32_63(float %arg0, float %arg1) #0 {
-; CHECK-LABEL: define float @writelane_f32_63(
-; CHECK-SAME: float [[ARG0:%.*]], float [[ARG1:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:    [[RES:%.*]] = call float @llvm.amdgcn.writelane.f32(float [[ARG0]], i32 63, float [[ARG1]])
-; CHECK-NEXT:    ret float [[RES]]
+; WAVE64-LABEL: define float @writelane_f32_63(
+; WAVE64-SAME: float [[ARG0:%.*]], float [[ARG1:%.*]]) #[[ATTR0]] {
+; WAVE64-NEXT:    [[RES:%.*]] = call float @llvm.amdgcn.writelane.f32(float [[ARG0]], i32 63, float [[ARG1]])
+; WAVE64-NEXT:    ret float [[RES]]
+;
+; WAVE32-LABEL: define float @writelane_f32_63(
+; WAVE32-SAME: float [[ARG0:%.*]], float [[ARG1:%.*]]) #[[ATTR0]] {
+; WAVE32-NEXT:    [[RES:%.*]] = call float @llvm.amdgcn.writelane.f32(float [[ARG0]], i32 31, float [[ARG1]])
+; WAVE32-NEXT:    ret float [[RES]]
 ;
   %res = call float @llvm.amdgcn.writelane.f32(float %arg0, i32 63, float %arg1)
   ret float %res
 }
 
 attributes #0 = { nounwind }
-;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
-; WAVE32: {{.*}}
-; WAVE64: {{.*}}

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp

jayfoad · 2024-11-28T08:31:54Z

If we have out of bounds indexing, these will now clamp down to
a low bit which may CSE with the operations on the low half of the
wave.

Should mention that in the comment in the code. It was not clear to me why you would want to clamp constants.

The main goal is to fold away wave64 code when compiled for wave32. If we have out of bounds indexing, these will now clamp down to a low bit which may CSE with the operations on the low half of the wave.

arsenm · 2024-12-06T15:10:57Z

ping

jayfoad · 2024-12-06T15:24:51Z

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp

+bool GCNTTIImpl::simplifyDemandedLaneMaskArg(InstCombiner &IC,
+                                             IntrinsicInst &II,
+                                             unsigned LaneArgIdx) const {
+  unsigned MaskBits = ST->getWavefrontSizeLog2();


I haven't been following closely, but isn't wave size a tri-state thing now, 32 or 64 or unknown? What does getWavefrontSizeLog2 return if it's unknown?

It is and is not a tri-state at the same time. The high level wavesize queries, like getWavefrontSize(Log2) will return 64(6) in the unknown case, which is safe in this context. We have the additional query to check if the wavesize is known precise, and isn't the maximum

It is and is not a tri-state at the same time.

:(

64(6) in the unknown case, which is safe in this context

Agreed.

arsenm mentioned this pull request Nov 28, 2024

AMDGPU: Add baseline test for lane index simplification #117962

Merged

arsenm added the backend:AMDGPU label Nov 28, 2024 — with Graphite App

arsenm requested review from jayfoad, jhuber6 and rampitec November 28, 2024 03:54

arsenm marked this pull request as ready for review November 28, 2024 03:54

llvmbot added llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms labels Nov 28, 2024

shiltian reviewed Nov 28, 2024

View reviewed changes

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp Outdated Show resolved Hide resolved

arsenm force-pushed the users/arsenm/amdgpu-baseline-test-lane-index-simplify branch from 3363529 to bd8f0b9 Compare December 2, 2024 13:35

arsenm force-pushed the users/arsenm/amdgpu-simplify-demanded-readlane-index branch from 4c11c64 to 5a617f8 Compare December 2, 2024 13:36

Base automatically changed from users/arsenm/amdgpu-baseline-test-lane-index-simplify to main December 2, 2024 19:50

arsenm added 3 commits December 2, 2024 19:51

AMDGPU: Simplify demanded bits on readlane/writeline index arguments

f1172c1

The main goal is to fold away wave64 code when compiled for wave32. If we have out of bounds indexing, these will now clamp down to a low bit which may CSE with the operations on the low half of the wave.

Just use getWavefrontSizeLog2

1894d97

Reword comment

45498a1

arsenm force-pushed the users/arsenm/amdgpu-simplify-demanded-readlane-index branch from 5a617f8 to 45498a1 Compare December 2, 2024 19:51

jayfoad reviewed Dec 6, 2024

View reviewed changes

jayfoad approved these changes Dec 6, 2024

View reviewed changes

arsenm merged commit c74e223 into main Dec 6, 2024
8 checks passed

arsenm deleted the users/arsenm/amdgpu-simplify-demanded-readlane-index branch December 6, 2024 15:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

AMDGPU: Simplify demanded bits on readlane/writeline index arguments #117963

AMDGPU: Simplify demanded bits on readlane/writeline index arguments #117963

Uh oh!

arsenm commented Nov 28, 2024

Uh oh!

arsenm commented Nov 28, 2024 •

edited

Loading

Uh oh!

llvmbot commented Nov 28, 2024 •

edited

Loading

Uh oh!

Uh oh!

jayfoad commented Nov 28, 2024

Uh oh!

arsenm commented Dec 6, 2024

Uh oh!

jayfoad Dec 6, 2024

Uh oh!

arsenm Dec 6, 2024

Uh oh!

jayfoad Dec 6, 2024

Uh oh!

Uh oh!

Uh oh!

AMDGPU: Simplify demanded bits on readlane/writeline index arguments #117963

AMDGPU: Simplify demanded bits on readlane/writeline index arguments #117963

Uh oh!

Conversation

arsenm commented Nov 28, 2024

Uh oh!

arsenm commented Nov 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Nov 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

jayfoad commented Nov 28, 2024

Uh oh!

arsenm commented Dec 6, 2024

Uh oh!

jayfoad Dec 6, 2024

Choose a reason for hiding this comment

Uh oh!

arsenm Dec 6, 2024

Choose a reason for hiding this comment

Uh oh!

jayfoad Dec 6, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

arsenm commented Nov 28, 2024 •

edited

Loading

llvmbot commented Nov 28, 2024 •

edited

Loading