[AMDGPULowerBufferFatPointers] Simplify and fix GEP offset emission #95115

nikic · 2024-06-11T13:52:26Z

Use emitGEPOffset() to emit the GEP offset, which already has all the necessary logic.

This also fixes the nuw flag incorrectly being set on the offset calculation, while only nsw is correct.

Use emitGEPOffset() to emit the GEP offset, which already has all the necessary logic. This also fixes the nuw flag incorrectly being set on the offset calculation, while only nsw is correct.

llvmbot · 2024-06-11T13:53:06Z

@llvm/pr-subscribers-backend-amdgpu

Author: Nikita Popov (nikic)

Changes

Use emitGEPOffset() to emit the GEP offset, which already has all the necessary logic.

This also fixes the nuw flag incorrectly being set on the offset calculation, while only nsw is correct.

Full diff: https://github.com/llvm/llvm-project/pull/95115.diff

3 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp (+9-41)
(modified) llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll (+40-34)
(modified) llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-unoptimized-debug-data.ll (+12-12)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp b/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
index f878bd9465d38..77238d184adb0 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
@@ -200,6 +200,7 @@
 #include "llvm/ADT/SetOperations.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/Analysis/ConstantFolding.h"
+#include "llvm/Analysis/Utils/Local.h"
 #include "llvm/CodeGen/TargetPassConfig.h"
 #include "llvm/IR/AttributeMask.h"
 #include "llvm/IR/Constants.h"
@@ -1393,50 +1394,17 @@ PtrParts SplitPtrStructs::visitGetElementPtrInst(GetElementPtrInst &GEP) {
   IRB.SetInsertPoint(&GEP);
 
   auto [Rsrc, Off] = getPtrParts(Ptr);
-  Type *OffTy = Off->getType();
   const DataLayout &DL = GEP.getModule()->getDataLayout();
   bool InBounds = GEP.isInBounds();
 
-  // In order to call collectOffset() and thus not have to reimplement it,
-  // we need the GEP's pointer operand to have ptr addrspace(7) type
-  GEP.setOperand(GEP.getPointerOperandIndex(),
-                 PoisonValue::get(IRB.getPtrTy(AMDGPUAS::BUFFER_FAT_POINTER)));
-  MapVector<Value *, APInt> VariableOffs;
-  APInt ConstOffVal = APInt::getZero(BufferOffsetWidth);
-  if (!GEP.collectOffset(DL, BufferOffsetWidth, VariableOffs, ConstOffVal))
-    report_fatal_error("Scalable vector or unsized struct in fat pointer GEP");
-  GEP.setOperand(GEP.getPointerOperandIndex(), Ptr);
-  Value *OffAccum = nullptr;
-  // Accumulate offsets together before adding to the base in order to preserve
-  // as many of the inbounds properties as possible.
-  for (auto [Arg, Multiple] : VariableOffs) {
-    if (auto *OffVecTy = dyn_cast<VectorType>(OffTy))
-      if (!Arg->getType()->isVectorTy())
-        Arg = IRB.CreateVectorSplat(OffVecTy->getElementCount(), Arg);
-    Arg = IRB.CreateIntCast(Arg, OffTy, /*isSigned=*/true);
-    if (!Multiple.isOne()) {
-      if (Multiple.isPowerOf2())
-        Arg = IRB.CreateShl(Arg, Multiple.logBase2(), "", /*hasNUW=*/InBounds,
-                            /*HasNSW=*/InBounds);
-      else
-        Arg = IRB.CreateMul(Arg, ConstantExpr::getIntegerValue(OffTy, Multiple),
-                            "", /*hasNUW=*/InBounds, /*hasNSW=*/InBounds);
-    }
-    if (OffAccum)
-      OffAccum = IRB.CreateAdd(OffAccum, Arg, "", /*hasNUW=*/InBounds,
-                               /*hasNSW=*/InBounds);
-    else
-      OffAccum = Arg;
-  }
-  if (!ConstOffVal.isZero()) {
-    Constant *ConstOff = ConstantExpr::getIntegerValue(OffTy, ConstOffVal);
-    if (OffAccum)
-      OffAccum = IRB.CreateAdd(OffAccum, ConstOff, "", /*hasNUW=*/InBounds,
-                               /*hasNSW=*/InBounds);
-    else
-      OffAccum = ConstOff;
-  }
-
+  // In order to call emitGEPOffset() and thus not have to reimplement it,
+  // we need the GEP result to have ptr addrspace(7) type.
+  Type *FatPtrTy = IRB.getPtrTy(AMDGPUAS::BUFFER_FAT_POINTER);
+  if (auto *VT = dyn_cast<VectorType>(Off->getType()))
+    FatPtrTy = VectorType::get(FatPtrTy, VT->getElementCount());
+  GEP.mutateType(FatPtrTy);
+  Value *OffAccum = emitGEPOffset(&IRB, DL, &GEP);
+  GEP.mutateType(Ptr->getType());
   if (!OffAccum) { // Constant-zero offset
     SplitUsers.insert(&GEP);
     return {Rsrc, Off};
diff --git a/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll b/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll
index 85cc810988a71..60c6890689ccb 100644
--- a/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll
+++ b/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll
@@ -10,12 +10,13 @@ define ptr addrspace(7) @gep(ptr addrspace(7) %in, i32 %idx) {
 ; CHECK-SAME: ({ ptr addrspace(8), i32 } [[IN:%.*]], i32 [[IDX:%.*]]) #[[ATTR0:[0-9]+]] {
 ; CHECK-NEXT:    [[IN_RSRC:%.*]] = extractvalue { ptr addrspace(8), i32 } [[IN]], 0
 ; CHECK-NEXT:    [[IN_OFF:%.*]] = extractvalue { ptr addrspace(8), i32 } [[IN]], 1
-; CHECK-NEXT:    [[TMP1:%.*]] = mul nuw nsw i32 [[IDX]], 40
-; CHECK-NEXT:    [[TMP2:%.*]] = add nuw nsw i32 [[TMP1]], 32
-; CHECK-NEXT:    [[RET:%.*]] = add i32 [[IN_OFF]], [[TMP2]]
-; CHECK-NEXT:    [[TMP3:%.*]] = insertvalue { ptr addrspace(8), i32 } poison, ptr addrspace(8) [[IN_RSRC]], 0
-; CHECK-NEXT:    [[TMP4:%.*]] = insertvalue { ptr addrspace(8), i32 } [[TMP3]], i32 [[RET]], 1
-; CHECK-NEXT:    ret { ptr addrspace(8), i32 } [[TMP4]]
+; CHECK-NEXT:    [[RET_IDX:%.*]] = mul nsw i32 [[IDX]], 40
+; CHECK-NEXT:    [[RET_OFFS:%.*]] = add nsw i32 [[RET_IDX]], 8
+; CHECK-NEXT:    [[RET_OFFS1:%.*]] = add nsw i32 [[RET_OFFS]], 24
+; CHECK-NEXT:    [[RET:%.*]] = add i32 [[IN_OFF]], [[RET_OFFS1]]
+; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue { ptr addrspace(8), i32 } poison, ptr addrspace(8) [[IN_RSRC]], 0
+; CHECK-NEXT:    [[TMP2:%.*]] = insertvalue { ptr addrspace(8), i32 } [[TMP1]], i32 [[RET]], 1
+; CHECK-NEXT:    ret { ptr addrspace(8), i32 } [[TMP2]]
 ;
   %ret = getelementptr inbounds {i32, [4 x ptr]}, ptr addrspace(7) %in, i32 %idx, i32 1, i32 3
   ret ptr addrspace(7) %ret
@@ -26,12 +27,13 @@ define <2 x ptr addrspace(7)> @gep_vectors(<2 x ptr addrspace(7)> %in, <2 x i32>
 ; CHECK-SAME: ({ <2 x ptr addrspace(8)>, <2 x i32> } [[IN:%.*]], <2 x i32> [[IDX:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    [[IN_RSRC:%.*]] = extractvalue { <2 x ptr addrspace(8)>, <2 x i32> } [[IN]], 0
 ; CHECK-NEXT:    [[IN_OFF:%.*]] = extractvalue { <2 x ptr addrspace(8)>, <2 x i32> } [[IN]], 1
-; CHECK-NEXT:    [[TMP1:%.*]] = mul nuw nsw <2 x i32> [[IDX]], <i32 40, i32 40>
-; CHECK-NEXT:    [[TMP2:%.*]] = add nuw nsw <2 x i32> [[TMP1]], <i32 32, i32 32>
-; CHECK-NEXT:    [[RET:%.*]] = add <2 x i32> [[IN_OFF]], [[TMP2]]
-; CHECK-NEXT:    [[TMP3:%.*]] = insertvalue { <2 x ptr addrspace(8)>, <2 x i32> } poison, <2 x ptr addrspace(8)> [[IN_RSRC]], 0
-; CHECK-NEXT:    [[TMP4:%.*]] = insertvalue { <2 x ptr addrspace(8)>, <2 x i32> } [[TMP3]], <2 x i32> [[RET]], 1
-; CHECK-NEXT:    ret { <2 x ptr addrspace(8)>, <2 x i32> } [[TMP4]]
+; CHECK-NEXT:    [[RET_IDX:%.*]] = mul nsw <2 x i32> [[IDX]], <i32 40, i32 40>
+; CHECK-NEXT:    [[RET_OFFS:%.*]] = add nsw <2 x i32> [[RET_IDX]], <i32 8, i32 8>
+; CHECK-NEXT:    [[RET_OFFS1:%.*]] = add nsw <2 x i32> [[RET_OFFS]], <i32 24, i32 24>
+; CHECK-NEXT:    [[RET:%.*]] = add <2 x i32> [[IN_OFF]], [[RET_OFFS1]]
+; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue { <2 x ptr addrspace(8)>, <2 x i32> } poison, <2 x ptr addrspace(8)> [[IN_RSRC]], 0
+; CHECK-NEXT:    [[TMP2:%.*]] = insertvalue { <2 x ptr addrspace(8)>, <2 x i32> } [[TMP1]], <2 x i32> [[RET]], 1
+; CHECK-NEXT:    ret { <2 x ptr addrspace(8)>, <2 x i32> } [[TMP2]]
 ;
   %ret = getelementptr inbounds {i32, [4 x ptr]}, <2 x ptr addrspace(7)> %in, <2 x i32> %idx, i32 1, i32 3
   ret <2 x ptr addrspace(7)> %ret
@@ -44,13 +46,14 @@ define <2 x ptr addrspace(7)> @gep_vector_scalar(<2 x ptr addrspace(7)> %in, i64
 ; CHECK-NEXT:    [[IN_OFF:%.*]] = extractvalue { <2 x ptr addrspace(8)>, <2 x i32> } [[IN]], 1
 ; CHECK-NEXT:    [[DOTSPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[IDX]], i64 0
 ; CHECK-NEXT:    [[DOTSPLAT:%.*]] = shufflevector <2 x i64> [[DOTSPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
-; CHECK-NEXT:    [[TMP1:%.*]] = trunc <2 x i64> [[DOTSPLAT]] to <2 x i32>
-; CHECK-NEXT:    [[TMP2:%.*]] = mul nuw nsw <2 x i32> [[TMP1]], <i32 40, i32 40>
-; CHECK-NEXT:    [[TMP3:%.*]] = add nuw nsw <2 x i32> [[TMP2]], <i32 32, i32 32>
-; CHECK-NEXT:    [[RET:%.*]] = add <2 x i32> [[IN_OFF]], [[TMP3]]
-; CHECK-NEXT:    [[TMP4:%.*]] = insertvalue { <2 x ptr addrspace(8)>, <2 x i32> } poison, <2 x ptr addrspace(8)> [[IN_RSRC]], 0
-; CHECK-NEXT:    [[TMP5:%.*]] = insertvalue { <2 x ptr addrspace(8)>, <2 x i32> } [[TMP4]], <2 x i32> [[RET]], 1
-; CHECK-NEXT:    ret { <2 x ptr addrspace(8)>, <2 x i32> } [[TMP5]]
+; CHECK-NEXT:    [[DOTSPLAT_C:%.*]] = trunc <2 x i64> [[DOTSPLAT]] to <2 x i32>
+; CHECK-NEXT:    [[RET_IDX:%.*]] = mul nsw <2 x i32> [[DOTSPLAT_C]], <i32 40, i32 40>
+; CHECK-NEXT:    [[RET_OFFS:%.*]] = add nsw <2 x i32> [[RET_IDX]], <i32 8, i32 8>
+; CHECK-NEXT:    [[RET_OFFS1:%.*]] = add nsw <2 x i32> [[RET_OFFS]], <i32 24, i32 24>
+; CHECK-NEXT:    [[RET:%.*]] = add <2 x i32> [[IN_OFF]], [[RET_OFFS1]]
+; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue { <2 x ptr addrspace(8)>, <2 x i32> } poison, <2 x ptr addrspace(8)> [[IN_RSRC]], 0
+; CHECK-NEXT:    [[TMP2:%.*]] = insertvalue { <2 x ptr addrspace(8)>, <2 x i32> } [[TMP1]], <2 x i32> [[RET]], 1
+; CHECK-NEXT:    ret { <2 x ptr addrspace(8)>, <2 x i32> } [[TMP2]]
 ;
   %ret = getelementptr inbounds {i32, [4 x ptr]}, <2 x ptr addrspace(7)> %in, i64 %idx, i32 1, i32 3
   ret <2 x ptr addrspace(7)> %ret
@@ -61,11 +64,11 @@ define ptr addrspace(7) @simple_gep(ptr addrspace(7) %ptr, i32 %off) {
 ; CHECK-SAME: ({ ptr addrspace(8), i32 } [[PTR:%.*]], i32 [[OFF:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    [[PTR_RSRC:%.*]] = extractvalue { ptr addrspace(8), i32 } [[PTR]], 0
 ; CHECK-NEXT:    [[PTR_OFF:%.*]] = extractvalue { ptr addrspace(8), i32 } [[PTR]], 1
-; CHECK-NEXT:    [[TMP1:%.*]] = shl i32 [[OFF]], 2
-; CHECK-NEXT:    [[RET:%.*]] = add i32 [[PTR_OFF]], [[TMP1]]
-; CHECK-NEXT:    [[TMP2:%.*]] = insertvalue { ptr addrspace(8), i32 } poison, ptr addrspace(8) [[PTR_RSRC]], 0
-; CHECK-NEXT:    [[TMP3:%.*]] = insertvalue { ptr addrspace(8), i32 } [[TMP2]], i32 [[RET]], 1
-; CHECK-NEXT:    ret { ptr addrspace(8), i32 } [[TMP3]]
+; CHECK-NEXT:    [[RET_IDX:%.*]] = mul i32 [[OFF]], 4
+; CHECK-NEXT:    [[RET:%.*]] = add i32 [[PTR_OFF]], [[RET_IDX]]
+; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue { ptr addrspace(8), i32 } poison, ptr addrspace(8) [[PTR_RSRC]], 0
+; CHECK-NEXT:    [[TMP2:%.*]] = insertvalue { ptr addrspace(8), i32 } [[TMP1]], i32 [[RET]], 1
+; CHECK-NEXT:    ret { ptr addrspace(8), i32 } [[TMP2]]
 ;
   %ret = getelementptr i32, ptr addrspace(7) %ptr, i32 %off
   ret ptr addrspace(7) %ret
@@ -76,11 +79,11 @@ define ptr addrspace(7) @simple_inbounds_gep(ptr addrspace(7) %ptr, i32 %off) {
 ; CHECK-SAME: ({ ptr addrspace(8), i32 } [[PTR:%.*]], i32 [[OFF:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    [[PTR_RSRC:%.*]] = extractvalue { ptr addrspace(8), i32 } [[PTR]], 0
 ; CHECK-NEXT:    [[PTR_OFF:%.*]] = extractvalue { ptr addrspace(8), i32 } [[PTR]], 1
-; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw nsw i32 [[OFF]], 2
-; CHECK-NEXT:    [[RET:%.*]] = add i32 [[PTR_OFF]], [[TMP1]]
-; CHECK-NEXT:    [[TMP2:%.*]] = insertvalue { ptr addrspace(8), i32 } poison, ptr addrspace(8) [[PTR_RSRC]], 0
-; CHECK-NEXT:    [[TMP3:%.*]] = insertvalue { ptr addrspace(8), i32 } [[TMP2]], i32 [[RET]], 1
-; CHECK-NEXT:    ret { ptr addrspace(8), i32 } [[TMP3]]
+; CHECK-NEXT:    [[RET_IDX:%.*]] = mul nsw i32 [[OFF]], 4
+; CHECK-NEXT:    [[RET:%.*]] = add i32 [[PTR_OFF]], [[RET_IDX]]
+; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue { ptr addrspace(8), i32 } poison, ptr addrspace(8) [[PTR_RSRC]], 0
+; CHECK-NEXT:    [[TMP2:%.*]] = insertvalue { ptr addrspace(8), i32 } [[TMP1]], i32 [[RET]], 1
+; CHECK-NEXT:    ret { ptr addrspace(8), i32 } [[TMP2]]
 ;
   %ret = getelementptr inbounds i32, ptr addrspace(7) %ptr, i32 %off
   ret ptr addrspace(7) %ret
@@ -91,9 +94,10 @@ define ptr addrspace(7) @zero_gep(ptr addrspace(7) %ptr) {
 ; CHECK-SAME: ({ ptr addrspace(8), i32 } [[PTR:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    [[PTR_RSRC:%.*]] = extractvalue { ptr addrspace(8), i32 } [[PTR]], 0
 ; CHECK-NEXT:    [[PTR_OFF:%.*]] = extractvalue { ptr addrspace(8), i32 } [[PTR]], 1
+; CHECK-NEXT:    [[RET:%.*]] = add i32 [[PTR_OFF]], 0
 ; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue { ptr addrspace(8), i32 } poison, ptr addrspace(8) [[PTR_RSRC]], 0
-; CHECK-NEXT:    [[RET:%.*]] = insertvalue { ptr addrspace(8), i32 } [[TMP1]], i32 [[PTR_OFF]], 1
-; CHECK-NEXT:    ret { ptr addrspace(8), i32 } [[RET]]
+; CHECK-NEXT:    [[TMP2:%.*]] = insertvalue { ptr addrspace(8), i32 } [[TMP1]], i32 [[RET]], 1
+; CHECK-NEXT:    ret { ptr addrspace(8), i32 } [[TMP2]]
 ;
   %ret = getelementptr i8, ptr addrspace(7) %ptr, i32 0
   ret ptr addrspace(7) %ret
@@ -105,9 +109,10 @@ define ptr addrspace(7) @zero_gep_goes_second(ptr addrspace(7) %v0, i32 %arg) {
 ; CHECK-NEXT:    [[V0_RSRC:%.*]] = extractvalue { ptr addrspace(8), i32 } [[V0]], 0
 ; CHECK-NEXT:    [[V0_OFF:%.*]] = extractvalue { ptr addrspace(8), i32 } [[V0]], 1
 ; CHECK-NEXT:    [[V1:%.*]] = add i32 [[V0_OFF]], [[ARG]]
+; CHECK-NEXT:    [[V2:%.*]] = add i32 [[V1]], 0
 ; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue { ptr addrspace(8), i32 } poison, ptr addrspace(8) [[V0_RSRC]], 0
-; CHECK-NEXT:    [[V2:%.*]] = insertvalue { ptr addrspace(8), i32 } [[TMP1]], i32 [[V1]], 1
-; CHECK-NEXT:    ret { ptr addrspace(8), i32 } [[V2]]
+; CHECK-NEXT:    [[TMP2:%.*]] = insertvalue { ptr addrspace(8), i32 } [[TMP1]], i32 [[V2]], 1
+; CHECK-NEXT:    ret { ptr addrspace(8), i32 } [[TMP2]]
 ;
   %v1 = getelementptr i8, ptr addrspace(7) %v0, i32 %arg
   %v2 = getelementptr i8, ptr addrspace(7) %v1, i32 0
@@ -119,7 +124,8 @@ define ptr addrspace(7) @zero_gep_goes_first(ptr addrspace(7) %v0, i32 %arg) {
 ; CHECK-SAME: ({ ptr addrspace(8), i32 } [[V0:%.*]], i32 [[ARG:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    [[V0_RSRC:%.*]] = extractvalue { ptr addrspace(8), i32 } [[V0]], 0
 ; CHECK-NEXT:    [[V0_OFF:%.*]] = extractvalue { ptr addrspace(8), i32 } [[V0]], 1
-; CHECK-NEXT:    [[V2:%.*]] = add i32 [[V0_OFF]], [[ARG]]
+; CHECK-NEXT:    [[V1:%.*]] = add i32 [[V0_OFF]], 0
+; CHECK-NEXT:    [[V2:%.*]] = add i32 [[V1]], [[ARG]]
 ; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue { ptr addrspace(8), i32 } poison, ptr addrspace(8) [[V0_RSRC]], 0
 ; CHECK-NEXT:    [[TMP2:%.*]] = insertvalue { ptr addrspace(8), i32 } [[TMP1]], i32 [[V2]], 1
 ; CHECK-NEXT:    ret { ptr addrspace(8), i32 } [[TMP2]]
diff --git a/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-unoptimized-debug-data.ll b/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-unoptimized-debug-data.ll
index 69387e67c1c7e..5828d5bf6f11c 100644
--- a/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-unoptimized-debug-data.ll
+++ b/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-unoptimized-debug-data.ll
@@ -28,32 +28,32 @@ define float @debug_stash_pointer(ptr addrspace(8) %buf, i32 %idx, ptr addrspace
 ; CHECK-NEXT:    [[BUF_PTR_2_PTR_RSRC:%.*]] = inttoptr i128 [[TMP4]] to ptr addrspace(8), !dbg [[DBG27]]
 ; CHECK-NEXT:    [[BUF_PTR_2_PTR_OFF:%.*]] = trunc i160 [[BUF_PTR_2]] to i32, !dbg [[DBG27]]
 ; CHECK-NEXT:    tail call void @llvm.dbg.value(metadata { ptr addrspace(8), i32 } undef, metadata [[META16:![0-9]+]], metadata !DIExpression()), !dbg [[DBG27]]
-; CHECK-NEXT:    [[TMP5:%.*]] = shl i32 [[IDX]], 2, !dbg [[DBG28:![0-9]+]]
-; CHECK-NEXT:    [[BUF_PTR_3:%.*]] = add i32 [[BUF_PTR_2_PTR_OFF]], [[TMP5]], !dbg [[DBG28]]
+; CHECK-NEXT:    [[BUF_PTR_3_IDX:%.*]] = mul i32 [[IDX]], 4, !dbg [[DBG28:![0-9]+]]
+; CHECK-NEXT:    [[BUF_PTR_3:%.*]] = add i32 [[BUF_PTR_2_PTR_OFF]], [[BUF_PTR_3_IDX]], !dbg [[DBG28]]
 ; CHECK-NEXT:    tail call void @llvm.dbg.value(metadata { ptr addrspace(8), i32 } undef, metadata [[META17:![0-9]+]], metadata !DIExpression()), !dbg [[DBG28]]
 ; CHECK-NEXT:    [[BUF_PTR_3_INT_RSRC:%.*]] = ptrtoint ptr addrspace(8) [[BUF_PTR_2_PTR_RSRC]] to i160, !dbg [[DBG29:![0-9]+]]
-; CHECK-NEXT:    [[TMP6:%.*]] = shl nuw i160 [[BUF_PTR_3_INT_RSRC]], 32, !dbg [[DBG29]]
+; CHECK-NEXT:    [[TMP5:%.*]] = shl nuw i160 [[BUF_PTR_3_INT_RSRC]], 32, !dbg [[DBG29]]
 ; CHECK-NEXT:    [[BUF_PTR_3_INT_OFF:%.*]] = zext i32 [[BUF_PTR_3]] to i160, !dbg [[DBG29]]
-; CHECK-NEXT:    [[BUF_PTR_3_INT:%.*]] = or i160 [[TMP6]], [[BUF_PTR_3_INT_OFF]], !dbg [[DBG29]]
+; CHECK-NEXT:    [[BUF_PTR_3_INT:%.*]] = or i160 [[TMP5]], [[BUF_PTR_3_INT_OFF]], !dbg [[DBG29]]
 ; CHECK-NEXT:    store i160 [[BUF_PTR_3_INT]], ptr addrspace(5) [[BUF_PTR_VAR]], align 32, !dbg [[DBG29]]
 ; CHECK-NEXT:    [[BUF_PTR_4:%.*]] = load i160, ptr addrspace(5) [[BUF_PTR_VAR]], align 32, !dbg [[DBG30:![0-9]+]]
-; CHECK-NEXT:    [[TMP7:%.*]] = lshr i160 [[BUF_PTR_4]], 32, !dbg [[DBG30]]
-; CHECK-NEXT:    [[TMP8:%.*]] = trunc i160 [[TMP7]] to i128, !dbg [[DBG30]]
-; CHECK-NEXT:    [[BUF_PTR_4_PTR_RSRC:%.*]] = inttoptr i128 [[TMP8]] to ptr addrspace(8), !dbg [[DBG30]]
+; CHECK-NEXT:    [[TMP6:%.*]] = lshr i160 [[BUF_PTR_4]], 32, !dbg [[DBG30]]
+; CHECK-NEXT:    [[TMP7:%.*]] = trunc i160 [[TMP6]] to i128, !dbg [[DBG30]]
+; CHECK-NEXT:    [[BUF_PTR_4_PTR_RSRC:%.*]] = inttoptr i128 [[TMP7]] to ptr addrspace(8), !dbg [[DBG30]]
 ; CHECK-NEXT:    [[BUF_PTR_4_PTR_OFF:%.*]] = trunc i160 [[BUF_PTR_4]] to i32, !dbg [[DBG30]]
 ; CHECK-NEXT:    tail call void @llvm.dbg.value(metadata { ptr addrspace(8), i32 } undef, metadata [[META18:![0-9]+]], metadata !DIExpression()), !dbg [[DBG30]]
 ; CHECK-NEXT:    [[RET:%.*]] = call float @llvm.amdgcn.raw.ptr.buffer.load.f32(ptr addrspace(8) align 4 [[BUF_PTR_4_PTR_RSRC]], i32 [[BUF_PTR_4_PTR_OFF]], i32 0, i32 0), !dbg [[DBG31:![0-9]+]]
 ; CHECK-NEXT:    tail call void @llvm.dbg.value(metadata float [[RET]], metadata [[META19:![0-9]+]], metadata !DIExpression()), !dbg [[DBG31]]
 ; CHECK-NEXT:    [[AUX_PTR_2:%.*]] = load i160, ptr addrspace(5) [[AUX_PTR_VAR]], align 32, !dbg [[DBG32:![0-9]+]]
-; CHECK-NEXT:    [[TMP9:%.*]] = lshr i160 [[AUX_PTR_2]], 32, !dbg [[DBG32]]
-; CHECK-NEXT:    [[TMP10:%.*]] = trunc i160 [[TMP9]] to i128, !dbg [[DBG32]]
-; CHECK-NEXT:    [[AUX_PTR_2_PTR_RSRC:%.*]] = inttoptr i128 [[TMP10]] to ptr addrspace(8), !dbg [[DBG32]]
+; CHECK-NEXT:    [[TMP8:%.*]] = lshr i160 [[AUX_PTR_2]], 32, !dbg [[DBG32]]
+; CHECK-NEXT:    [[TMP9:%.*]] = trunc i160 [[TMP8]] to i128, !dbg [[DBG32]]
+; CHECK-NEXT:    [[AUX_PTR_2_PTR_RSRC:%.*]] = inttoptr i128 [[TMP9]] to ptr addrspace(8), !dbg [[DBG32]]
 ; CHECK-NEXT:    [[AUX_PTR_2_PTR_OFF:%.*]] = trunc i160 [[AUX_PTR_2]] to i32, !dbg [[DBG32]]
 ; CHECK-NEXT:    tail call void @llvm.dbg.value(metadata { ptr addrspace(8), i32 } undef, metadata [[META20:![0-9]+]], metadata !DIExpression()), !dbg [[DBG32]]
 ; CHECK-NEXT:    [[BUF_PTR_4_PTR_INT_RSRC:%.*]] = ptrtoint ptr addrspace(8) [[BUF_PTR_4_PTR_RSRC]] to i160, !dbg [[DBG33:![0-9]+]]
-; CHECK-NEXT:    [[TMP11:%.*]] = shl nuw i160 [[BUF_PTR_4_PTR_INT_RSRC]], 32, !dbg [[DBG33]]
+; CHECK-NEXT:    [[TMP10:%.*]] = shl nuw i160 [[BUF_PTR_4_PTR_INT_RSRC]], 32, !dbg [[DBG33]]
 ; CHECK-NEXT:    [[BUF_PTR_4_PTR_INT_OFF:%.*]] = zext i32 [[BUF_PTR_4_PTR_OFF]] to i160, !dbg [[DBG33]]
-; CHECK-NEXT:    [[BUF_PTR_4_PTR_INT:%.*]] = or i160 [[TMP11]], [[BUF_PTR_4_PTR_INT_OFF]], !dbg [[DBG33]]
+; CHECK-NEXT:    [[BUF_PTR_4_PTR_INT:%.*]] = or i160 [[TMP10]], [[BUF_PTR_4_PTR_INT_OFF]], !dbg [[DBG33]]
 ; CHECK-NEXT:    call void @llvm.amdgcn.raw.ptr.buffer.store.i160(i160 [[BUF_PTR_4_PTR_INT]], ptr addrspace(8) align 32 [[AUX_PTR_2_PTR_RSRC]], i32 [[AUX_PTR_2_PTR_OFF]], i32 0, i32 0), !dbg [[DBG33]]
 ; CHECK-NEXT:    ret float [[RET]], !dbg [[DBG34:![0-9]+]]
 ;

krzysz00

LGTM, approved, thank you for the cleanup

krzysz00 · 2024-06-11T14:44:39Z

llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll

@@ -119,7 +124,8 @@ define ptr addrspace(7) @zero_gep_goes_first(ptr addrspace(7) %v0, i32 %arg) {
 ; CHECK-SAME: ({ ptr addrspace(8), i32 } [[V0:%.*]], i32 [[ARG:%.*]]) #[[ATTR0]] {
 ; CHECK-NEXT:    [[V0_RSRC:%.*]] = extractvalue { ptr addrspace(8), i32 } [[V0]], 0
 ; CHECK-NEXT:    [[V0_OFF:%.*]] = extractvalue { ptr addrspace(8), i32 } [[V0]], 1
-; CHECK-NEXT:    [[V2:%.*]] = add i32 [[V0_OFF]], [[ARG]]
+; CHECK-NEXT:    [[V1:%.*]] = add i32 [[V0_OFF]], 0


Silly question: is there a good way to fold away these sort of trivial computations? Because I think I've seen them make their way down into ISel and beyond

(Maybe there's a way to do a localized instcombine?)

InstSimplify. Can also use the IRBuilder that tries to simplify everything

Yes, we could use InstSimplifyFolder. We could also avoid emitting these in emitGEPOffset() directly -- it's currently used primarily in InstCombine where this doesn't matter, but for other usages it's probably nice to not generate "add 0", especially as zero-indices are somewhat common in "old-style" GEPs.

Turns out that this particular zero case was actually handled by the pass itself, and I broke it. Fixed by cb3a6bd.

krzysz00 · 2024-06-11T14:50:19Z

I will note that addrspace(7) has the offset part being unsigned

nikic · 2024-06-11T15:46:41Z

I will note that addrspace(7) has the offset part being unsigned

Does that mean that doing a gep inbounds (gep inbounds p, 1), -1 on addrspace(7) is not legal? I'd expect only the final offset to be treated as unsigned, not intermediate calculations.

krzysz00 · 2024-06-11T15:51:05Z

@nikic Right, no, it's the final offset that's unsigned

[AMDGPULowerBufferFatPointers] Simplify and fix GEP offset emission

ede2679

Use emitGEPOffset() to emit the GEP offset, which already has all the necessary logic. This also fixes the nuw flag incorrectly being set on the offset calculation, while only nsw is correct.

nikic requested review from arsenm and krzysz00 June 11, 2024 13:52

llvmbot added the backend:AMDGPU label Jun 11, 2024

krzysz00 approved these changes Jun 11, 2024

View reviewed changes

nikic merged commit 6fc63ab into llvm:main Jun 12, 2024
9 checks passed

nikic deleted the lower-buffer branch June 12, 2024 07:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AMDGPULowerBufferFatPointers] Simplify and fix GEP offset emission #95115

[AMDGPULowerBufferFatPointers] Simplify and fix GEP offset emission #95115

Uh oh!

nikic commented Jun 11, 2024

Uh oh!

llvmbot commented Jun 11, 2024

Uh oh!

krzysz00 left a comment

Uh oh!

krzysz00 Jun 11, 2024

Uh oh!

arsenm Jun 11, 2024

Uh oh!

nikic Jun 11, 2024

Uh oh!

nikic Jun 12, 2024

Uh oh!

krzysz00 commented Jun 11, 2024

Uh oh!

nikic commented Jun 11, 2024

Uh oh!

krzysz00 commented Jun 11, 2024

Uh oh!

Uh oh!

Uh oh!

[AMDGPULowerBufferFatPointers] Simplify and fix GEP offset emission #95115

[AMDGPULowerBufferFatPointers] Simplify and fix GEP offset emission #95115

Uh oh!

Conversation

nikic commented Jun 11, 2024

Uh oh!

llvmbot commented Jun 11, 2024

Uh oh!

krzysz00 left a comment

Choose a reason for hiding this comment

Uh oh!

krzysz00 Jun 11, 2024

Choose a reason for hiding this comment

Uh oh!

arsenm Jun 11, 2024

Choose a reason for hiding this comment

Uh oh!

nikic Jun 11, 2024

Choose a reason for hiding this comment

Uh oh!

nikic Jun 12, 2024

Choose a reason for hiding this comment

Uh oh!

krzysz00 commented Jun 11, 2024

Uh oh!

nikic commented Jun 11, 2024

Uh oh!

krzysz00 commented Jun 11, 2024

Uh oh!

Uh oh!

Uh oh!