[SeparateConstOffsetFromGEP] propagate const offset through GEP chains #143470


Open · wants to merge 3 commits into base: main

Conversation

AlexMaclean (Member):

When separating the constant offset from a GEP, if the pointer operand is a constant ptradd (likely generated when we performed this transform on that GEP), we accumulate that offset into the current offset. This ensures that when there is a chain of GEPs, the constant offset reaches the final memory instruction, where it can likely be folded into the addressing.
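As a rough illustration (not part of the patch; the values %p, %i, %j and the i8-based offsets are made up, though the shape mirrors the @basic test added below), the intent is to take a chain such as

  %off1 = add i64 %i, 8
  %gep1 = getelementptr i8, ptr %p, i64 %off1
  %off2 = add i64 %j, 16
  %gep2 = getelementptr i8, ptr %gep1, i64 %off2
  %v    = load i32, ptr %gep2, align 4

and accumulate both constants into a single trailing ptradd next to the load, roughly:

  %b1   = getelementptr i8, ptr %p, i64 %i
  %b2   = getelementptr i8, ptr %b1, i64 %j
  %addr = getelementptr i8, ptr %b2, i64 24
  %v    = load i32, ptr %addr, align 4

so that the 24-byte constant offset sits next to the memory instruction, where the backend can fold it into the addressing mode.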

@llvmbot (Member) commented Jun 10, 2025:

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-nvptx

Author: Alex MacLean (AlexMaclean)

Changes



Patch is 20.67 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/143470.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp (+37-11)
  • (added) llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/gep-chain.ll (+383)
diff --git a/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp b/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
index 320b79203c0b3..b478c72a14a71 100644
--- a/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
+++ b/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
@@ -1048,19 +1048,31 @@ bool SeparateConstOffsetFromGEP::splitGEP(GetElementPtrInst *GEP) {
   if (GEP->getType()->isVectorTy())
     return false;
 
+  // If the base of this GEP is a ptradd of a constant, let's pass the
+  // constant along. This ensures that when we have a chain of GEPs, the
+  // constant offset from each is accumulated.
+  Value *NewBase;
+  const APInt *BaseOffset;
+  const bool ExtractBase =
+      match(GEP->getPointerOperand(),
+            m_PtrAdd(m_Value(NewBase), m_APInt(BaseOffset)));
+
+  const int64_t BaseByteOffset = ExtractBase ? BaseOffset->getSExtValue() : 0;
+
   // The backend can already nicely handle the case where all indices are
   // constant.
-  if (GEP->hasAllConstantIndices())
+  if (GEP->hasAllConstantIndices() && !ExtractBase)
     return false;
 
   bool Changed = canonicalizeArrayIndicesToIndexSize(GEP);
 
   bool NeedsExtraction;
-  int64_t AccumulativeByteOffset = accumulateByteOffset(GEP, NeedsExtraction);
+  int64_t AccumulativeByteOffset =
+      BaseByteOffset + accumulateByteOffset(GEP, NeedsExtraction);
 
   TargetTransformInfo &TTI = GetTTI(*GEP->getFunction());
 
-  if (!NeedsExtraction) {
+  if (!NeedsExtraction && !ExtractBase) {
     Changed |= reorderGEP(GEP, TTI);
     return Changed;
   }
@@ -1084,7 +1096,9 @@ bool SeparateConstOffsetFromGEP::splitGEP(GetElementPtrInst *GEP) {
 
   // Track information for preserving GEP flags.
   bool AllOffsetsNonNegative = AccumulativeByteOffset >= 0;
-  bool AllNUWPreserved = true;
+  bool AllNUWPreserved = GEP->hasNoUnsignedWrap();
+  bool NewGEPInBounds = GEP->isInBounds();
+  bool NewGEPNUSW = GEP->hasNoUnsignedSignedWrap();
 
   // Remove the constant offset in each sequential index. The resultant GEP
   // computes the variadic base.
@@ -1120,6 +1134,16 @@ bool SeparateConstOffsetFromGEP::splitGEP(GetElementPtrInst *GEP) {
       }
     }
   }
+  if (ExtractBase) {
+    GEPOperator *Base = cast<GEPOperator>(GEP->getPointerOperand());
+    AllNUWPreserved &= Base->hasNoUnsignedWrap();
+    NewGEPInBounds &= Base->isInBounds();
+    NewGEPNUSW &= Base->hasNoUnsignedSignedWrap();
+    AllOffsetsNonNegative &= BaseByteOffset >= 0;
+
+    GEP->setOperand(0, NewBase);
+    RecursivelyDeleteTriviallyDeadInstructions(Base);
+  }
 
   // Clear the inbounds attribute because the new index may be off-bound.
   // e.g.,
@@ -1147,7 +1171,7 @@ bool SeparateConstOffsetFromGEP::splitGEP(GetElementPtrInst *GEP) {
 
   // If the initial GEP was NUW and all operations that we reassociate were NUW
   // additions, the resulting GEPs are also NUW.
-  if (GEP->hasNoUnsignedWrap() && AllNUWPreserved) {
+  if (AllNUWPreserved) {
     NewGEPFlags |= GEPNoWrapFlags::noUnsignedWrap();
     // If the initial GEP additionally had NUSW (or inbounds, which implies
     // NUSW), we know that the indices in the initial GEP must all have their
@@ -1155,13 +1179,13 @@ bool SeparateConstOffsetFromGEP::splitGEP(GetElementPtrInst *GEP) {
     // add-operands therefore also don't have their signbit set. Therefore, all
     // indices of the resulting GEPs are non-negative -> we can preserve
     // the inbounds/nusw flag.
-    CanPreserveInBoundsNUSW |= GEP->hasNoUnsignedSignedWrap();
+    CanPreserveInBoundsNUSW |= NewGEPNUSW;
   }
 
   if (CanPreserveInBoundsNUSW) {
-    if (GEP->isInBounds())
+    if (NewGEPInBounds)
       NewGEPFlags |= GEPNoWrapFlags::inBounds();
-    else if (GEP->hasNoUnsignedSignedWrap())
+    else if (NewGEPNUSW)
       NewGEPFlags |= GEPNoWrapFlags::noUnsignedSignedWrap();
   }
 
@@ -1242,11 +1266,13 @@ bool SeparateConstOffsetFromGEP::run(Function &F) {
 
   DL = &F.getDataLayout();
   bool Changed = false;
-  for (BasicBlock &B : F) {
-    if (!DT->isReachableFromEntry(&B))
+
+  ReversePostOrderTraversal<Function *> RPOT(&F);
+  for (BasicBlock *B : RPOT) {
+    if (!DT->isReachableFromEntry(B))
       continue;
 
-    for (Instruction &I : llvm::make_early_inc_range(B))
+    for (Instruction &I : llvm::make_early_inc_range(*B))
       if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(&I))
         Changed |= splitGEP(GEP);
     // No need to split GEP ConstantExprs because all its indices are constant
diff --git a/llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/gep-chain.ll b/llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/gep-chain.ll
new file mode 100644
index 0000000000000..2587fd26fdef2
--- /dev/null
+++ b/llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/gep-chain.ll
@@ -0,0 +1,383 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -mtriple=nvptx64-nvidia-cuda -S -passes=separate-const-offset-from-gep  < %s | FileCheck %s
+
+%struct.uchar4 = type { i8, i8, i8, i8 }
+
+define ptr @basic(ptr %ptr, i64 %offset1, i64 %offset2) {
+; CHECK-LABEL: define ptr @basic(
+; CHECK-SAME: ptr [[PTR:%.*]], i64 [[OFFSET1:%.*]], i64 [[OFFSET2:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr [[STRUCT_UCHAR4:%.*]], ptr [[PTR]], i64 [[OFFSET1]]
+; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr [[STRUCT_UCHAR4]], ptr [[TMP1]], i64 [[OFFSET2]]
+; CHECK-NEXT:    [[GEP24:%.*]] = getelementptr i8, ptr [[TMP2]], i64 72
+; CHECK-NEXT:    ret ptr [[GEP24]]
+;
+  %offset3 = add i64 %offset1, 8
+  %gep1 = getelementptr %struct.uchar4, ptr %ptr, i64 %offset3
+  %offset4 = add i64 %offset2, 10
+  %gep2 = getelementptr %struct.uchar4, ptr %gep1, i64 %offset4
+  ret ptr %gep2
+}
+
+define i32 @more_interesting(ptr %ptr, i32 %offset1, i32 %offset2) {
+; CHECK-LABEL: define i32 @more_interesting(
+; CHECK-SAME: ptr [[PTR:%.*]], i32 [[OFFSET1:%.*]], i32 [[OFFSET2:%.*]]) {
+; CHECK-NEXT:    [[IDXPROM:%.*]] = sext i32 [[OFFSET1]] to i64
+; CHECK-NEXT:    [[GEP1:%.*]] = getelementptr inbounds [[STRUCT_UCHAR4:%.*]], ptr [[PTR]], i64 [[IDXPROM]]
+; CHECK-NEXT:    [[IDXPROM1:%.*]] = sext i32 [[OFFSET2]] to i64
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds [[STRUCT_UCHAR4]], ptr [[GEP1]], i64 [[IDXPROM1]]
+; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds i8, ptr [[TMP1]], i64 8
+; CHECK-NEXT:    [[V1:%.*]] = load i32, ptr [[TMP2]], align 4
+; CHECK-NEXT:    [[IDXPROM2:%.*]] = sext i32 [[OFFSET2]] to i64
+; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds [[STRUCT_UCHAR4]], ptr [[TMP1]], i64 [[IDXPROM2]]
+; CHECK-NEXT:    [[V2:%.*]] = load i32, ptr [[TMP4]], align 4
+; CHECK-NEXT:    [[R:%.*]] = add i32 [[V1]], [[V2]]
+; CHECK-NEXT:    ret i32 [[R]]
+;
+  %gep1 = getelementptr inbounds %struct.uchar4, ptr %ptr, i32 %offset1
+  %gep2 = getelementptr inbounds nuw i8, ptr %gep1, i32 8
+  %gep3 = getelementptr inbounds %struct.uchar4, ptr %gep2, i32 %offset2
+  %v1 = load i32, ptr %gep3, align 4
+  %gep4 = getelementptr inbounds i8, ptr %gep3, i32 -8
+  %gep5 = getelementptr inbounds %struct.uchar4, ptr %gep4, i32 %offset2
+  %v2 = load i32, ptr %gep5, align 4
+  %r = add i32 %v1, %v2
+  ret i32 %r
+}
+
+;; Check nuw/nusw/inbounds flag propagation
+
+; GEPs with nusw flag. All indices and offsets are non-negative.
+define ptr @test_0(ptr %p, i32 %i) {
+; CHECK-LABEL: define ptr @test_0(
+; CHECK-SAME: ptr [[P:%.*]], i32 [[I:%.*]]) {
+; CHECK-NEXT:    [[I_PROM:%.*]] = zext i32 [[I]] to i64
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr nusw i8, ptr [[P]], i64 [[I_PROM]]
+; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr nusw i8, ptr [[TMP1]], i64 2
+; CHECK-NEXT:    ret ptr [[ARRAYIDX22]]
+;
+  %ptradd = getelementptr nuw inbounds i8, ptr %p, i32 1
+  %i.prom = zext i32 %i to i64
+  %idx2 = add i64 %i.prom, 1
+  %arrayidx2 = getelementptr nusw i8, ptr %ptradd, i64 %idx2
+  ret ptr %arrayidx2
+}
+
+; GEPs with inbounds flag. All indices and offsets are non-negative.
+define ptr @test_1(ptr %p, i32 %i) {
+; CHECK-LABEL: define ptr @test_1(
+; CHECK-SAME: ptr [[P:%.*]], i32 [[I:%.*]]) {
+; CHECK-NEXT:    [[I_PROM:%.*]] = zext i32 [[I]] to i64
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 [[I_PROM]]
+; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr inbounds i8, ptr [[TMP1]], i64 2
+; CHECK-NEXT:    ret ptr [[ARRAYIDX22]]
+;
+  %ptradd = getelementptr inbounds i8, ptr %p, i32 1
+  %i.prom = zext i32 %i to i64
+  %idx2 = add i64 %i.prom, 1
+  %arrayidx2 = getelementptr nuw inbounds i8, ptr %ptradd, i64 %idx2
+  ret ptr %arrayidx2
+}
+
+; GEPs with nusw flag. All indices and offsets are non-negative.
+define ptr @test_2(ptr %p, i32 %i) {
+; CHECK-LABEL: define ptr @test_2(
+; CHECK-SAME: ptr [[P:%.*]], i32 [[I:%.*]]) {
+; CHECK-NEXT:    [[I_PROM:%.*]] = zext i32 [[I]] to i64
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr nusw i8, ptr [[P]], i64 [[I_PROM]]
+; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr nusw i8, ptr [[TMP1]], i64 2
+; CHECK-NEXT:    ret ptr [[ARRAYIDX22]]
+;
+  %ptradd = getelementptr nusw i8, ptr %p, i32 1
+  %i.prom = zext i32 %i to i64
+  %idx2 = add nuw i64 %i.prom, 1
+  %arrayidx2 = getelementptr nuw inbounds i8, ptr %ptradd, i64 %idx2
+  ret ptr %arrayidx2
+}
+
+; GEPs with inbounds flag. All indices and offsets are non-negative.
+define ptr @test_3(ptr %p, i32 %i) {
+; CHECK-LABEL: define ptr @test_3(
+; CHECK-SAME: ptr [[P:%.*]], i32 [[I:%.*]]) {
+; CHECK-NEXT:    [[I_PROM:%.*]] = zext i32 [[I]] to i64
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 [[I_PROM]]
+; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr inbounds i8, ptr [[TMP1]], i64 2
+; CHECK-NEXT:    ret ptr [[ARRAYIDX22]]
+;
+  %ptradd = getelementptr nuw inbounds i8, ptr %p, i32 1
+  %i.prom = zext i32 %i to i64
+  %idx2 = add nuw i64 %i.prom, 1
+  %arrayidx2 = getelementptr inbounds i8, ptr %ptradd, i64 %idx2
+  ret ptr %arrayidx2
+}
+
+; GEPs with nusw flag. All indices and offsets are non-negative.
+define ptr @test_4(ptr %p, i32 %i) {
+; CHECK-LABEL: define ptr @test_4(
+; CHECK-SAME: ptr [[P:%.*]], i32 [[I:%.*]]) {
+; CHECK-NEXT:    [[I_PROM:%.*]] = zext i32 [[I]] to i64
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr nusw i8, ptr [[P]], i64 [[I_PROM]]
+; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr nusw i8, ptr [[TMP1]], i64 2
+; CHECK-NEXT:    ret ptr [[ARRAYIDX22]]
+;
+  %ptradd = getelementptr nuw inbounds i8, ptr %p, i32 1
+  %i.prom = zext i32 %i to i64
+  %idx2 = add i64 %i.prom, 1
+  %arrayidx2 = getelementptr nuw nusw i8, ptr %ptradd, i64 %idx2
+  ret ptr %arrayidx2
+}
+
+; GEPs with inbounds flag. All indices and offsets are non-negative.
+define ptr @test_5(ptr %p, i32 %i) {
+; CHECK-LABEL: define ptr @test_5(
+; CHECK-SAME: ptr [[P:%.*]], i32 [[I:%.*]]) {
+; CHECK-NEXT:    [[I_PROM:%.*]] = zext i32 [[I]] to i64
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 [[I_PROM]]
+; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr inbounds i8, ptr [[TMP1]], i64 2
+; CHECK-NEXT:    ret ptr [[ARRAYIDX22]]
+;
+  %ptradd = getelementptr nuw inbounds i8, ptr %p, i32 1
+  %i.prom = zext i32 %i to i64
+  %idx2 = add i64 %i.prom, 1
+  %arrayidx2 = getelementptr nuw inbounds i8, ptr %ptradd, i64 %idx2
+  ret ptr %arrayidx2
+}
+
+; GEPs with nuw flag. Negative offsets.
+define ptr @test_6(ptr %p, i64 %i) {
+; CHECK-LABEL: define ptr @test_6(
+; CHECK-SAME: ptr [[P:%.*]], i64 [[I:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr nuw i8, ptr [[P]], i64 [[I]]
+; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr nuw i8, ptr [[TMP1]], i64 -11
+; CHECK-NEXT:    ret ptr [[ARRAYIDX22]]
+;
+  %ptradd = getelementptr nuw nusw i8, ptr %p, i32 -1
+  %idx2 = add nuw i64 %i, -10
+  %arrayidx2 = getelementptr nuw i8, ptr %ptradd, i64 %idx2
+  ret ptr %arrayidx2
+}
+
+; GEPs with nuw and nusw flags. Negative offsets.
+define ptr @test_7(ptr %p, i64 %i) {
+; CHECK-LABEL: define ptr @test_7(
+; CHECK-SAME: ptr [[P:%.*]], i64 [[I:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr nusw nuw i8, ptr [[P]], i64 [[I]]
+; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr nusw nuw i8, ptr [[TMP1]], i64 -11
+; CHECK-NEXT:    ret ptr [[ARRAYIDX22]]
+;
+  %ptradd = getelementptr nuw inbounds i8, ptr %p, i32 -1
+  %idx2 = add nuw i64 %i, -10
+  %arrayidx2 = getelementptr nuw nusw i8, ptr %ptradd, i64 %idx2
+  ret ptr %arrayidx2
+}
+
+; GEPs with nuw flag. Mixed positive/negative offsets.
+define ptr @test_8(ptr %p, i64 %i) {
+; CHECK-LABEL: define ptr @test_8(
+; CHECK-SAME: ptr [[P:%.*]], i64 [[I:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr nuw i8, ptr [[P]], i64 [[I]]
+; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr nuw i8, ptr [[TMP1]], i64 -9
+; CHECK-NEXT:    ret ptr [[ARRAYIDX22]]
+;
+  %ptradd = getelementptr nuw i8, ptr %p, i32 1
+  %idx2 = add nuw i64 %i, -10
+  %arrayidx2 = getelementptr nuw nusw i8, ptr %ptradd, i64 %idx2
+  ret ptr %arrayidx2
+}
+
+; GEPs with nuw and nusw flags. Mixed positive/negative offsets.
+define ptr @test_9(ptr %p, i64 %i) {
+; CHECK-LABEL: define ptr @test_9(
+; CHECK-SAME: ptr [[P:%.*]], i64 [[I:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr nusw nuw i8, ptr [[P]], i64 [[I]]
+; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr nusw nuw i8, ptr [[TMP1]], i64 -9
+; CHECK-NEXT:    ret ptr [[ARRAYIDX22]]
+;
+  %ptradd = getelementptr nuw inbounds i8, ptr %p, i32 1
+  %idx2 = add nuw i64 %i, -10
+  %arrayidx2 = getelementptr nuw nusw i8, ptr %ptradd, i64 %idx2
+  ret ptr %arrayidx2
+}
+
+; GEPs with nuw flag. Mixed negative/positive offsets.
+define ptr @test_10(ptr %p, i64 %i) {
+; CHECK-LABEL: define ptr @test_10(
+; CHECK-SAME: ptr [[P:%.*]], i64 [[I:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr nuw i8, ptr [[P]], i64 [[I]]
+; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr nuw i8, ptr [[TMP1]], i64 9
+; CHECK-NEXT:    ret ptr [[ARRAYIDX22]]
+;
+  %ptradd = getelementptr nuw i8, ptr %p, i32 -1
+  %idx2 = add nuw i64 %i, 10
+  %arrayidx2 = getelementptr nuw nusw i8, ptr %ptradd, i64 %idx2
+  ret ptr %arrayidx2
+}
+
+; GEPs with nuw and nusw flags. Mixed negative/positive offsets.
+define ptr @test_11(ptr %p, i64 %i) {
+; CHECK-LABEL: define ptr @test_11(
+; CHECK-SAME: ptr [[P:%.*]], i64 [[I:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr nusw nuw i8, ptr [[P]], i64 [[I]]
+; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr nusw nuw i8, ptr [[TMP1]], i64 9
+; CHECK-NEXT:    ret ptr [[ARRAYIDX22]]
+;
+  %ptradd = getelementptr nuw inbounds i8, ptr %p, i32 -1
+  %idx2 = add nuw i64 %i, 10
+  %arrayidx2 = getelementptr nuw nusw i8, ptr %ptradd, i64 %idx2
+  ret ptr %arrayidx2
+}
+
+; GEPs with nuw flag. All positive offsets.
+define ptr @test_12(ptr %p, i64 %i) {
+; CHECK-LABEL: define ptr @test_12(
+; CHECK-SAME: ptr [[P:%.*]], i64 [[I:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr nuw i8, ptr [[P]], i64 [[I]]
+; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr nuw i8, ptr [[TMP1]], i64 2
+; CHECK-NEXT:    ret ptr [[ARRAYIDX22]]
+;
+  %ptradd = getelementptr nuw nusw i8, ptr %p, i32 1
+  %idx2 = add nuw i64 %i, 1
+  %arrayidx2 = getelementptr nuw i8, ptr %ptradd, i64 %idx2
+  ret ptr %arrayidx2
+}
+
+; GEPs with nuw and nusw flags. All positive offsets.
+define ptr @test_13(ptr %p, i64 %i) {
+; CHECK-LABEL: define ptr @test_13(
+; CHECK-SAME: ptr [[P:%.*]], i64 [[I:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr nusw nuw i8, ptr [[P]], i64 [[I]]
+; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr nusw nuw i8, ptr [[TMP1]], i64 2
+; CHECK-NEXT:    ret ptr [[ARRAYIDX22]]
+;
+  %ptradd = getelementptr nuw inbounds i8, ptr %p, i32 1
+  %idx2 = add nuw i64 %i, 1
+  %arrayidx2 = getelementptr nuw nusw i8, ptr %ptradd, i64 %idx2
+  ret ptr %arrayidx2
+}
+
+; GEPs with nuw flag. Zext index with negative offsets.
+define ptr @test_14(ptr %p, i32 %i) {
+; CHECK-LABEL: define ptr @test_14(
+; CHECK-SAME: ptr [[P:%.*]], i32 [[I:%.*]]) {
+; CHECK-NEXT:    [[I_PROM:%.*]] = zext i32 [[I]] to i64
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr nuw i8, ptr [[P]], i64 [[I_PROM]]
+; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr nuw i8, ptr [[TMP1]], i64 -11
+; CHECK-NEXT:    ret ptr [[ARRAYIDX22]]
+;
+  %ptradd = getelementptr nuw nusw i8, ptr %p, i32 -1
+  %i.prom = zext i32 %i to i64
+  %idx2 = add nuw i64 %i.prom, -10
+  %arrayidx2 = getelementptr nuw i8, ptr %ptradd, i64 %idx2
+  ret ptr %arrayidx2
+}
+
+; GEPs with nuw and nusw flags. Zext index with negative offsets.
+define ptr @test_15(ptr %p, i32 %i) {
+; CHECK-LABEL: define ptr @test_15(
+; CHECK-SAME: ptr [[P:%.*]], i32 [[I:%.*]]) {
+; CHECK-NEXT:    [[I_PROM:%.*]] = zext i32 [[I]] to i64
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr nusw nuw i8, ptr [[P]], i64 [[I_PROM]]
+; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr nusw nuw i8, ptr [[TMP1]], i64 -11
+; CHECK-NEXT:    ret ptr [[ARRAYIDX22]]
+;
+  %ptradd = getelementptr nuw nusw i8, ptr %p, i32 -1
+  %i.prom = zext i32 %i to i64
+  %idx2 = add nuw i64 %i.prom, -10
+  %arrayidx2 = getelementptr nuw inbounds i8, ptr %ptradd, i64 %idx2
+  ret ptr %arrayidx2
+}
+
+; GEPs with nuw flag. Zext index with mixed positive/negative offsets.
+define ptr @test_16(ptr %p, i32 %i) {
+; CHECK-LABEL: define ptr @test_16(
+; CHECK-SAME: ptr [[P:%.*]], i32 [[I:%.*]]) {
+; CHECK-NEXT:    [[I_PROM:%.*]] = zext i32 [[I]] to i64
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr nuw i8, ptr [[P]], i64 [[I_PROM]]
+; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr nuw i8, ptr [[TMP1]], i64 -9
+; CHECK-NEXT:    ret ptr [[ARRAYIDX22]]
+;
+  %ptradd = getelementptr nuw i8, ptr %p, i32 1
+  %i.prom = zext i32 %i to i64
+  %idx2 = add nuw i64 %i.prom, -10
+  %arrayidx2 = getelementptr nuw nusw i8, ptr %ptradd, i64 %idx2
+  ret ptr %arrayidx2
+}
+
+; GEPs with nuw and nusw flags. Zext index with mixed positive/negative offsets.
+define ptr @test_17(ptr %p, i32 %i) {
+; CHECK-LABEL: define ptr @test_17(
+; CHECK-SAME: ptr [[P:%.*]], i32 [[I:%.*]]) {
+; CHECK-NEXT:    [[I_PROM:%.*]] = zext i32 [[I]] to i64
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr nusw nuw i8, ptr [[P]], i64 [[I_PROM]]
+; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr nusw nuw i8, ptr [[TMP1]], i64 -9
+; CHECK-NEXT:    ret ptr [[ARRAYIDX22]]
+;
+  %ptradd = getelementptr nuw nusw i8, ptr %p, i32 1
+  %i.prom = zext i32 %i to i64
+  %idx2 = add nuw i64 %i.prom, -10
+  %arrayidx2 = getelementptr nuw inbounds i8, ptr %ptradd, i64 %idx2
+  ret ptr %arrayidx2
+}
+
+; GEPs with nuw flag. Zext index with mixed negative/positive offsets.
+define ptr @test_18(ptr %p, i32 %i) {
+; CHECK-LABEL: define ptr @test_18(
+; CHECK-SAME: ptr [[P:%.*]], i32 [[I:%.*]]) {
+; CHECK-NEXT:    [[I_PROM:%.*]] = zext i32 [[I]] to i64
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr nuw i8, ptr [[P]], i64 [[I_PROM]]
+; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr nuw i8, ptr [[TMP1]], i64 9
+; CHECK-NEXT:    ret ptr [[ARRAYIDX22]]
+;
+  %ptradd = getelementptr nuw nusw i8, ptr %p, i32 -1
+  %i.prom = zext i32 %i to i64
+  %idx2 = add nuw i64 %i.prom, 10
+  %arrayidx2 = getelementptr nuw i8, ptr %ptradd, i64 %idx2
+  ret ptr %arrayidx2
+}
+
+; GEPs with nuw and nusw flags. Zext index with mixed negative/positive offsets.
+define ptr @test_19(ptr %p, i32 %i) {
+; CHECK-LABEL: define ptr @test_19(
+; CHECK-SAME: ptr [[P:%.*]], i32 [[I:%.*]]) {
+; CHECK-NEXT:    [[I_PROM:%.*]] = zext i32 [[I]] to i64
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr nusw nuw i8, ptr [[P]], i64 [[I_PROM]]
+; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr nusw nuw i8, ptr [[TMP1]], i64 9
+; CHECK-NEXT:    ret ptr [[ARRAYIDX22]]
+;
+  %ptradd = getelementptr nuw nusw i8, ptr %p, i32 -1
+  %i.prom = zext i32 %i to i64
+  %idx2 = add nuw i64 %i.prom, 10
+  %arrayidx2 = getelementptr nuw inbounds i8, ptr %ptradd, i64 %idx2
+  ret ptr %arrayidx2
+}
+
+; GEPs with nuw flag. Zext index with all positive offsets.
+define ptr @test_20(ptr %p, i32 %i) {
+; CHECK-LABEL: define ptr @test_20(
+; CHECK-SAME: ptr [[P:%.*]], i32 [[I:%.*]]) {
+; CHECK-NEXT:    [[I_PROM:%.*]] = zext i32 [[I]] to i64
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr nuw i8, ptr [[P]], i64 [[I_PROM]]
+; CHECK-NEXT:    [[ARRAYIDX22:%.*]] = getelementptr nuw i8, ptr [[TMP1]], i64 2
+; CHECK-NEXT:    ret ptr [[ARRAYIDX22]]
+;
+  %ptradd = getelementptr nuw i8, ptr %p, i32 1
+  %i.prom = zext i32 %i to i64
+  %idx2 = add nuw i64 %i.prom, 1
+  %arrayidx2 = getelementptr nuw nusw i8, ptr %ptradd, i64 %idx2
+  ret ptr %arrayidx2
...
[truncated]

@nikic (Contributor) left a comment:
Looks like there are some test failures.

This looks plausible, but I also feel like this is something we should be doing in InstCombine? (That is, trying to combine multiple constant offset GEPs into one, even if there are intermediate GEPs, as long as use-constraints allow.)

Inline review comment on the changed traversal code:

    for (BasicBlock &B : F) {
      if (!DT->isReachableFromEntry(&B))

    ReversePostOrderTraversal<Function *> RPOT(&F);

Contributor:

How is this related to the rest of the patch?

Member (Author):

The intent is to ensure that when there is a chain of GEPs, we always separate constant offsets in order. This way, the logic that picks up the constant offset from the first GEP and passes it along through the second always works as expected. My understanding is that a standard iteration doesn't have very strong guarantees about the ordering of blocks, while an RPOT ensures GEPs are transformed in order.
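For illustration only (hypothetical IR, not taken from the patch or its tests): if the first GEP of a chain lives in a predecessor block of its user, an RPO walk guarantees the defining block is visited first, so the constant ptradd produced by splitting the first GEP already exists when the dependent GEP is processed.

  define i32 @chain(ptr %p, i64 %i, i64 %j) {
  entry:
    %off1 = add i64 %i, 8
    ; Visited first under RPO; splitting leaves a trailing constant ptradd of 8.
    %gep1 = getelementptr i8, ptr %p, i64 %off1
    br label %use

  use:
    %off2 = add i64 %j, 16
    ; By the time this GEP is split, its pointer operand has been rewritten to the
    ; constant ptradd produced above, so the 8 can be accumulated into this GEP's
    ; constant offset and carried down to the load.
    %gep2 = getelementptr i8, ptr %gep1, i64 %off2
    %v = load i32, ptr %gep2, align 4
    ret i32 %v
  }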

@AlexMaclean (Member, Author):

> This looks plausible, but I also feel like this is something we should be doing in InstCombine? (That is, trying to combine multiple constant offset GEPs into one, even if there are intermediate GEPs, as long as use-constraints allow.)

As things are currently divided, I think it makes a lot of sense to keep this logic here. This pass isn't run by most targets, and its behavior is very dependent on TTI.isLegalAddressingMode. Since we assume that the constant offsets we're generating will be used in the addressing of some memory instruction, we're also less concerned about use constraints than we would be in InstCombine. While I agree there are likely some limited ways to do similar transformations in IC, here I think it's safe to be a bit more aggressive and rely on target-specific information.

@AlexMaclean (Member, Author):

@nikic @jrbyrnes @krzysz00 Ping for review

AlexMaclean requested review from arsenm and Artem-B (June 17, 2025, 14:26)
@krzysz00 (Contributor) left a comment:
I can't see anything wrong here, so weak approve from me (I'd give it a bit of time to see if other folks have thoughts).

@nikic (Contributor) commented Jun 17, 2025:

> This looks plausible, but I also feel like this is something we should be doing in InstCombine? (That is, trying to combine multiple constant offset GEPs into one, even if there are intermediate GEPs, as long as use-constraints allow.)

> As things are currently divided, I think it makes a lot of sense to keep this logic here. This pass isn't run by most targets, and its behavior is very dependent on TTI.isLegalAddressingMode. Since we assume that the constant offsets we're generating will be used in the addressing of some memory instruction, we're also less concerned about use constraints than we would be in InstCombine. While I agree there are likely some limited ways to do similar transformations in IC, here I think it's safe to be a bit more aggressive and rely on target-specific information.

I don't disagree. My general thinking here was more along the lines that we should try to canonicalize this as far as we can in InstCombine, and have SeparateConstOffsetFromGEP pick up the parts that InstCombine can't do (such as use constraints, as you mention).

We're moving towards splitting up GEPs in InstCombine (see #137297 for a first step), which will increase our ability to make these kinds of transforms at that level.
