[AMDGPU] InstCombine llvm.amdgcn.ds.bpermute with uniform arguments #130133

Merged
jayfoad merged 2 commits into llvm:main from instcombine-ds-bpermute on Apr 10, 2025

Conversation

jayfoad (Contributor) commented Mar 6, 2025

Reland #129895 with a fix to avoid trying to combine bpermute of
bitcast.
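
For readers skimming the diff below, here is a minimal before/after sketch of the two combines this reland enables, written as LLVM IR in the style of the included tests; the value names are illustrative, not taken from the patch. Note that ds_bpermute takes a byte address (lane id * 4), which is why the uniform-address case shifts right by 2 to form a lane index for readlane. The reland additionally guards the existing bitcast-folding path so it no longer fires for ds.bpermute, whose data operand is operand 1 rather than operand 0.

; Uniform data operand: the call's result is the data itself.
  %v = call i32 @llvm.amdgcn.ds.bpermute(i32 %addr, i32 7)
; After the combine, all uses of %v are replaced with 7 and the call goes away.

; Trivially uniform address operand (e.g. a readfirstlane result).
  %v = call i32 @llvm.amdgcn.ds.bpermute(i32 %uniform.addr, i32 %data)
; After the combine, the byte address is converted to a lane index and the
; call is rewritten to readlane, which is known to be always uniform:
  %lane = lshr i32 %uniform.addr, 2
  %v = call i32 @llvm.amdgcn.readlane.i32(i32 %data, i32 %lane)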

@llvmbot llvmbot added backend:AMDGPU, llvm:instcombine, and llvm:transforms labels Mar 6, 2025
llvmbot (Member) commented Mar 6, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

Changes

Reland #129895 with a fix to avoid trying to combine bpermute of
bitcast.


Full diff: https://github.com/llvm/llvm-project/pull/130133.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp (+23-4)
  • (modified) llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll (+39)
  • (modified) llvm/test/Transforms/InstCombine/AMDGPU/bitcast-fold-lane-ops.ll (+12)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
index ebe740f884ea6..0e500cca8beb1 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
@@ -1118,9 +1118,11 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
   }
   case Intrinsic::amdgcn_permlane64:
   case Intrinsic::amdgcn_readfirstlane:
-  case Intrinsic::amdgcn_readlane: {
-    // If the first argument is uniform these intrinsics return it unchanged.
-    const Use &Src = II.getArgOperandUse(0);
+  case Intrinsic::amdgcn_readlane:
+  case Intrinsic::amdgcn_ds_bpermute: {
+    // If the data argument is uniform these intrinsics return it unchanged.
+    unsigned SrcIdx = IID == Intrinsic::amdgcn_ds_bpermute ? 1 : 0;
+    const Use &Src = II.getArgOperandUse(SrcIdx);
     if (isTriviallyUniform(Src))
       return IC.replaceInstUsesWith(II, Src.get());
 
@@ -1129,7 +1131,8 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
      return &II;
 
     // readfirstlane.ty0 (bitcast ty1 x to ty0) -> bitcast (readfirstlane.ty1)
-    if (auto *BC = dyn_cast<BitCastInst>(Src); BC && BC->hasOneUse()) {
+    if (auto *BC = dyn_cast<BitCastInst>(Src); BC && BC->hasOneUse() &&
+        IID != Intrinsic::amdgcn_ds_bpermute) {
       Value *BCSrc = BC->getOperand(0);
 
       // TODO: Handle this for update_dpp, mov_ddp8, and all permlane variants.
@@ -1152,6 +1155,22 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
       }
     }
 
+    // If the lane argument of bpermute is uniform, change it to readlane. This
+    // generates better code and can enable further optimizations because
+    // readlane is AlwaysUniform.
+    if (IID == Intrinsic::amdgcn_ds_bpermute) {
+      const Use &Lane = II.getArgOperandUse(0);
+      if (isTriviallyUniform(Lane)) {
+        Value *NewLane = IC.Builder.CreateLShr(Lane, 2);
+        Function *NewDecl = Intrinsic::getOrInsertDeclaration(
+            II.getModule(), Intrinsic::amdgcn_readlane, II.getType());
+        II.setCalledFunction(NewDecl);
+        II.setOperand(0, Src);
+        II.setOperand(1, NewLane);
+        return &II;
+      }
+    }
+
     return std::nullopt;
   }
   case Intrinsic::amdgcn_writelane: {
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
index 3c190efca7acf..843b436aa1b0f 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
@@ -6583,3 +6583,42 @@ define i32 @prng_poison_i32() {
   %prng = call i32 @llvm.amdgcn.prng.b32(i32 poison)
   ret i32 %prng
 }
+
+; --------------------------------------------------------------------
+; llvm.amdgcn.ds.bpermute
+; --------------------------------------------------------------------
+
+define amdgpu_kernel void @ds_bpermute_uniform_src(ptr addrspace(1) %out, i32 %lane) {
+; CHECK-LABEL: @ds_bpermute_uniform_src(
+; CHECK-NEXT:    store i32 7, ptr addrspace(1) [[OUT:%.*]], align 4
+; CHECK-NEXT:    ret void
+;
+  %v = call i32 @llvm.amdgcn.ds.bpermute(i32 %lane, i32 7)
+  store i32 %v, ptr addrspace(1) %out
+  ret void
+}
+
+define amdgpu_kernel void @ds_bpermute_constant_lane(ptr addrspace(1) %out, i32 %src) {
+; CHECK-LABEL: @ds_bpermute_constant_lane(
+; CHECK-NEXT:    [[V:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[SRC:%.*]], i32 7)
+; CHECK-NEXT:    store i32 [[V]], ptr addrspace(1) [[OUT:%.*]], align 4
+; CHECK-NEXT:    ret void
+;
+  %v = call i32 @llvm.amdgcn.ds.bpermute(i32 28, i32 %src)
+  store i32 %v, ptr addrspace(1) %out
+  ret void
+}
+
+define amdgpu_kernel void @ds_bpermute_uniform_lane(ptr addrspace(1) %out, i32 %lanearg, i32 %src) {
+; CHECK-LABEL: @ds_bpermute_uniform_lane(
+; CHECK-NEXT:    [[LANE:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[LANEARG:%.*]])
+; CHECK-NEXT:    [[TMP1:%.*]] = lshr i32 [[LANE]], 2
+; CHECK-NEXT:    [[V:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[SRC:%.*]], i32 [[TMP1]])
+; CHECK-NEXT:    store i32 [[V]], ptr addrspace(1) [[OUT:%.*]], align 4
+; CHECK-NEXT:    ret void
+;
+  %lane = call i32 @llvm.amdgcn.readfirstlane(i32 %lanearg)
+  %v = call i32 @llvm.amdgcn.ds.bpermute(i32 %lane, i32 %src)
+  store i32 %v, ptr addrspace(1) %out
+  ret void
+}
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/bitcast-fold-lane-ops.ll b/llvm/test/Transforms/InstCombine/AMDGPU/bitcast-fold-lane-ops.ll
index e458fbd712370..02f50228339b1 100644
--- a/llvm/test/Transforms/InstCombine/AMDGPU/bitcast-fold-lane-ops.ll
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/bitcast-fold-lane-ops.ll
@@ -311,3 +311,15 @@ define i32 @test_bitcast_f32_to_i32_readlane_convergencetoken(float %val, i32 in
   %result = call i32 @llvm.amdgcn.readlane.i32(i32 %bitcast, i32 %lane.index) [ "convergencectrl"(token %t) ]
   ret i32 %result
 }
+
+define i32 @test_bitcast_f32_to_i32_ds_bpermute(float %val, i32 %addr) {
+; CHECK-LABEL: define i32 @test_bitcast_f32_to_i32_ds_bpermute(
+; CHECK-SAME: float [[VAL:%.*]], i32 [[ADDR:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[BITCAST:%.*]] = bitcast float [[VAL]] to i32
+; CHECK-NEXT:    [[RESULT:%.*]] = call i32 @llvm.amdgcn.ds.bpermute(i32 [[ADDR]], i32 [[BITCAST]])
+; CHECK-NEXT:    ret i32 [[RESULT]]
+;
+  %bitcast = bitcast float %val to i32
+  %result = call i32 @llvm.amdgcn.ds.bpermute(i32 %addr, i32 %bitcast)
+  ret i32 %result
+}

%v = call i32 @llvm.amdgcn.ds.bpermute(i32 %lane, i32 %src)
store i32 %v, ptr addrspace(1) %out
ret void
}
Contributor


Test bitcast with trivially uniform arg? I guess we could strip no-op casts.
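
A hedged sketch of the kind of test being suggested here, with a trivially uniform value behind the bitcast; the function name and the choice of readfirstlane are illustrative, not part of this patch:

define i32 @test_bitcast_of_uniform_f32_ds_bpermute(float %val, i32 %addr) {
  ; Make the value trivially uniform, then hide it behind a no-op cast.
  %uniform = call float @llvm.amdgcn.readfirstlane.f32(float %val)
  %bitcast = bitcast float %uniform to i32
  %result = call i32 @llvm.amdgcn.ds.bpermute(i32 %addr, i32 %bitcast)
  ret i32 %result
}

If I read the current patch correctly, the combine would not fire here because isTriviallyUniform sees only the bitcast; stripping no-op casts, as suggested, would let the bpermute fold away to the bitcast value.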

@jayfoad jayfoad merged commit e3350a6 into llvm:main Apr 10, 2025
11 checks passed
@jayfoad jayfoad deleted the instcombine-ds-bpermute branch April 10, 2025 09:36
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
[AMDGPU] InstCombine llvm.amdgcn.ds.bpermute with uniform arguments (llvm#130133)

Reland llvm#129895 with a fix to avoid trying to combine bpermute of
bitcast.