
[InstCombine] Handle scalable geps in EmitGEPOffset #71565


Merged: 1 commit, Nov 8, 2023

Conversation

davemgreen (Collaborator)

This adds handling for scalable vectors in emitGEPOffset. This was noticed in some tests that Biplob was creating, so it might be unlikely to come up much in practice. I've attempted to add test coverage for the various places EmitGEPOffset is called. The vscale intrinsics will currently be emitted multiple times, relying on later CSE to combine them.
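To make the mechanics concrete, here is a condensed sketch of the scaling step (it mirrors the Local.cpp diff below; the IR in the comments follows the CHECK lines in the new tests):

TypeSize TSize = DL.getTypeAllocSize(GTI.getIndexedType());
uint64_t Size = TSize.getKnownMinValue() & PtrSizeMask;
if (Size != 1 || TSize.isScalable()) {
  // For a GEP such as
  //   %a = getelementptr <vscale x 4 x i32>, ptr %p, i64 %i
  // the element alloc size is vscale x 16 bytes, so the byte offset must
  // be %i * (vscale * 16) rather than a compile-time constant.
  auto *ScaleC = ConstantInt::get(IntIdxTy, Size);
  // CreateVScale(C) materialises a call to llvm.vscale scaled by C. Each
  // GEP operand currently gets its own call; later CSE combines them.
  Value *Scale =
      !TSize.isScalable() ? ScaleC : Builder->CreateVScale(ScaleC);
  Op = Builder->CreateMul(Op, Scale, GEP->getName() + ".idx",
                          /*HasNUW=*/false, /*HasNSW=*/isInBounds);
}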

llvmbot added the llvm:analysis and llvm:transforms labels on Nov 7, 2023
llvmbot (Member) commented on Nov 7, 2023:

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: David Green (davemgreen)


Full diff: https://github.com/llvm/llvm-project/pull/71565.diff

3 Files Affected:

  • (modified) llvm/lib/Analysis/Local.cpp (+7-4)
  • (modified) llvm/test/Transforms/InstCombine/icmp-gep.ll (+70)
  • (modified) llvm/test/Transforms/InstCombine/sub-gep.ll (+61)
diff --git a/llvm/lib/Analysis/Local.cpp b/llvm/lib/Analysis/Local.cpp
index 672e6a190a57ff8..ded6007663845e0 100644
--- a/llvm/lib/Analysis/Local.cpp
+++ b/llvm/lib/Analysis/Local.cpp
@@ -45,7 +45,8 @@ Value *llvm::emitGEPOffset(IRBuilderBase *Builder, const DataLayout &DL,
   for (User::op_iterator i = GEP->op_begin() + 1, e = GEP->op_end(); i != e;
        ++i, ++GTI) {
     Value *Op = *i;
-    uint64_t Size = DL.getTypeAllocSize(GTI.getIndexedType()) & PtrSizeMask;
+    TypeSize TSize = DL.getTypeAllocSize(GTI.getIndexedType());
+    uint64_t Size = TSize.getKnownMinValue() & PtrSizeMask;
     if (Constant *OpC = dyn_cast<Constant>(Op)) {
       if (OpC->isZeroValue())
         continue;
@@ -70,10 +71,12 @@ Value *llvm::emitGEPOffset(IRBuilderBase *Builder, const DataLayout &DL,
     // Convert to correct type.
     if (Op->getType() != IntIdxTy)
       Op = Builder->CreateIntCast(Op, IntIdxTy, true, Op->getName() + ".c");
-    if (Size != 1) {
+    if (Size != 1 || TSize.isScalable()) {
       // We'll let instcombine(mul) convert this to a shl if possible.
-      Op = Builder->CreateMul(Op, ConstantInt::get(IntIdxTy, Size),
-                              GEP->getName() + ".idx", false /*NUW*/,
+      auto *ScaleC = ConstantInt::get(IntIdxTy, Size);
+      Value *Scale =
+          !TSize.isScalable() ? ScaleC : Builder->CreateVScale(ScaleC);
+      Op = Builder->CreateMul(Op, Scale, GEP->getName() + ".idx", false /*NUW*/,
                               isInBounds /*NSW*/);
     }
     AddOffset(Op);
diff --git a/llvm/test/Transforms/InstCombine/icmp-gep.ll b/llvm/test/Transforms/InstCombine/icmp-gep.ll
index 5cc6d9f80bac462..7d266b7b246ae08 100644
--- a/llvm/test/Transforms/InstCombine/icmp-gep.ll
+++ b/llvm/test/Transforms/InstCombine/icmp-gep.ll
@@ -396,3 +396,73 @@ define i1 @test61_as1(ptr addrspace(1) %foo, i16 %i, i16 %j) {
   ret i1 %cmp
 ; Don't transform non-inbounds GEPs.
 }
+
+define i1 @test_scalable_same(ptr %x) {
+; CHECK-LABEL: @test_scalable_same(
+; CHECK-NEXT:    ret i1 false
+;
+  %a = getelementptr <vscale x 4 x i8>, ptr %x, i64 8
+  %b = getelementptr inbounds <vscale x 4 x i8>, ptr %x, i64 8
+  %c = icmp ugt ptr %a, %b
+  ret i1 %c
+}
+
+define i1 @test_scalable_x(ptr %x) {
+; CHECK-LABEL: @test_scalable_x(
+; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[A_IDX_MASK:%.*]] = and i64 [[TMP1]], 576460752303423487
+; CHECK-NEXT:    [[C:%.*]] = icmp eq i64 [[A_IDX_MASK]], 0
+; CHECK-NEXT:    ret i1 [[C]]
+;
+  %a = getelementptr <vscale x 4 x i8>, ptr %x, i64 8
+  %c = icmp eq ptr %a, %x
+  ret i1 %c
+}
+
+define i1 @test_scalable_xc(ptr %x) {
+; CHECK-LABEL: @test_scalable_xc(
+; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[A_IDX_MASK:%.*]] = and i64 [[TMP1]], 576460752303423487
+; CHECK-NEXT:    [[C:%.*]] = icmp eq i64 [[A_IDX_MASK]], 0
+; CHECK-NEXT:    ret i1 [[C]]
+;
+  %a = getelementptr <vscale x 4 x i8>, ptr %x, i64 8
+  %c = icmp eq ptr %x, %a
+  ret i1 %c
+}
+
+define i1 @test_scalable_xy(ptr %foo, i64 %i, i64 %j) {
+; CHECK-LABEL: @test_scalable_xy(
+; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP2:%.*]] = shl i64 [[TMP1]], 2
+; CHECK-NEXT:    [[GEP2_IDX:%.*]] = mul nsw i64 [[TMP2]], [[J:%.*]]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP4:%.*]] = shl i64 [[TMP3]], 4
+; CHECK-NEXT:    [[GEP1_IDX:%.*]] = mul nsw i64 [[TMP4]], [[I:%.*]]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp sgt i64 [[GEP2_IDX]], [[GEP1_IDX]]
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %bit = addrspacecast ptr %foo to ptr addrspace(3)
+  %gep1 = getelementptr inbounds <vscale x 4 x i32>, ptr addrspace(3) %bit, i64 %i
+  %gep2 = getelementptr inbounds <vscale x 4 x i8>, ptr %foo, i64 %j
+  %cast1 = addrspacecast ptr addrspace(3) %gep1 to ptr
+  %cmp = icmp ult ptr %cast1, %gep2
+  ret i1 %cmp
+}
+
+define i1 @test_scalable_ij(ptr %foo, i64 %i, i64 %j) {
+; CHECK-LABEL: @test_scalable_ij(
+; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP2:%.*]] = shl i64 [[TMP1]], 4
+; CHECK-NEXT:    [[GEP1_IDX:%.*]] = mul nsw i64 [[TMP2]], [[I:%.*]]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP4:%.*]] = shl i64 [[TMP3]], 2
+; CHECK-NEXT:    [[GEP2_IDX:%.*]] = mul nsw i64 [[TMP4]], [[J:%.*]]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i64 [[GEP1_IDX]], [[GEP2_IDX]]
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %gep1 = getelementptr inbounds <vscale x 4 x i32>, ptr %foo, i64 %i
+  %gep2 = getelementptr inbounds <vscale x 4 x i8>, ptr %foo, i64 %j
+  %cmp = icmp ult ptr %gep1, %gep2
+  ret i1 %cmp
+}
diff --git a/llvm/test/Transforms/InstCombine/sub-gep.ll b/llvm/test/Transforms/InstCombine/sub-gep.ll
index a8f8a82e675b026..af5fa5352ace73f 100644
--- a/llvm/test/Transforms/InstCombine/sub-gep.ll
+++ b/llvm/test/Transforms/InstCombine/sub-gep.ll
@@ -370,6 +370,67 @@ define i64 @gep_diff_with_bitcast(ptr %p, i64 %idx) {
   ret i64 %i6
 }
 
+define i64 @sub_scalable(ptr noundef %val1) {
+; CHECK-LABEL: @sub_scalable(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP1:%.*]] = shl i64 [[TMP0]], 4
+; CHECK-NEXT:    ret i64 [[TMP1]]
+;
+entry:
+  %gep1 = getelementptr <vscale x 4 x i32>, ptr %val1, i64 1
+  %sub.ptr.lhs.cast.i = ptrtoint ptr %gep1 to i64
+  %sub.ptr.rhs.cast.i = ptrtoint ptr %val1 to i64
+  %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
+  ret i64 %sub.ptr.sub.i
+}
+
+define i64 @sub_scalable2(ptr noundef %val1) {
+; CHECK-LABEL: @sub_scalable2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP1:%.*]] = shl i64 [[TMP0]], 4
+; CHECK-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[GEP2_IDX:%.*]] = shl i64 [[TMP2]], 5
+; CHECK-NEXT:    [[GEPDIFF:%.*]] = sub i64 [[TMP1]], [[GEP2_IDX]]
+; CHECK-NEXT:    ret i64 [[GEPDIFF]]
+;
+entry:
+  %gep1 = getelementptr <vscale x 4 x i32>, ptr %val1, i64 1
+  %sub.ptr.lhs.cast.i = ptrtoint ptr %gep1 to i64
+  %gep2 = getelementptr <vscale x 4 x i32>, ptr %val1, i64 2
+  %sub.ptr.rhs.cast.i = ptrtoint ptr %gep2 to i64
+  %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
+  ret i64 %sub.ptr.sub.i
+}
+
+define i64 @nullptrtoint_scalable_c() {
+; CHECK-LABEL: @nullptrtoint_scalable_c(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[PTR_IDX:%.*]] = shl i64 [[TMP0]], 7
+; CHECK-NEXT:    ret i64 [[PTR_IDX]]
+;
+entry:
+  %ptr = getelementptr inbounds <vscale x 4 x i32>, ptr null, i64 8
+  %ret = ptrtoint ptr %ptr to i64
+  ret i64 %ret
+}
+
+define i64 @nullptrtoint_scalable_x(i64 %x) {
+; CHECK-LABEL: @nullptrtoint_scalable_x(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP1:%.*]] = shl i64 [[TMP0]], 4
+; CHECK-NEXT:    [[PTR_IDX:%.*]] = mul nsw i64 [[TMP1]], [[X:%.*]]
+; CHECK-NEXT:    ret i64 [[PTR_IDX]]
+;
+entry:
+  %ptr = getelementptr inbounds <vscale x 4 x i32>, ptr null, i64 %x
+  %ret = ptrtoint ptr %ptr to i64
+  ret i64 %ret
+}
+
 define i1 @_gep_phi1(ptr noundef %str1) {
 ; CHECK-LABEL: @_gep_phi1(
 ; CHECK-NEXT:  entry:

nikic (Contributor) left a comment:

LGTM

Comment on:

    GEP->getName() + ".idx", false /*NUW*/,
    auto *ScaleC = ConstantInt::get(IntIdxTy, Size);
    Value *Scale =
        !TSize.isScalable() ? ScaleC : Builder->CreateVScale(ScaleC);
Contributor:

It looks like ScaleC could possibly be a VectorType judging from the CreateVectorSplat call above. Do we need a test for this?

davemgreen (Collaborator, Author) replied:

Thanks for the suggestion. It looks like the above CreateVectorSplat wouldn't handle scalable vectors either. That is actually a different form of scalable vector, so I'll try to address it in a separate patch.

Comment on:

    @@ -45,7 +45,8 @@ Value *llvm::emitGEPOffset(IRBuilderBase *Builder, const DataLayout &DL,
    for (User::op_iterator i = GEP->op_begin() + 1, e = GEP->op_end(); i != e;
         ++i, ++GTI) {
    Value *Op = *i;
    uint64_t Size = DL.getTypeAllocSize(GTI.getIndexedType()) & PtrSizeMask;
    TypeSize TSize = DL.getTypeAllocSize(GTI.getIndexedType());
    uint64_t Size = TSize.getKnownMinValue() & PtrSizeMask;
Collaborator:

I'm not entirely clear on the role of PtrSizeMask, but it seems odd to be applying this to the known min value for a scalable vector and not the full dynamic type size. Doesn't seem to match the fixed vector semantics.
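For reference, here is a minimal sketch of how the mask is presumably formed earlier in emitGEPOffset (an assumption from the surrounding code, not part of this diff):

// Assumed context, not part of this diff: the mask keeps only as many low
// bits as the index type is wide, truncating a constant element size the
// way pointer-width arithmetic would wrap. For a scalable type it can only
// truncate the known-minimum factor, which is the mismatch noted above.
uint64_t PtrSizeMask =
    ~0ULL >> (64 - IntIdxTy->getScalarSizeInBits());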

Comment on:

    // We'll let instcombine(mul) convert this to a shl if possible.
    Op = Builder->CreateMul(Op, ConstantInt::get(IntIdxTy, Size),
                            GEP->getName() + ".idx", false /*NUW*/,
    auto *ScaleC = ConstantInt::get(IntIdxTy, Size);
Collaborator:

This expression should just be IRBuilder::CreateTypeSize(IntIdxTy, TSize).
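A minimal sketch of that suggestion, assuming IRBuilderBase::CreateTypeSize (which returns the plain constant for fixed sizes and wraps it in a vscale multiply for scalable ones):

// Sketch of the suggested simplification; CreateTypeSize subsumes the
// manual isScalable() branch. Note it would also drop the PtrSizeMask
// truncation, which the follow-up commit found to be removable anyway.
Value *Scale = Builder->CreateTypeSize(IntIdxTy, TSize);
Op = Builder->CreateMul(Op, Scale, GEP->getName() + ".idx",
                        /*HasNUW=*/false, /*HasNSW=*/isInBounds);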

davemgreen (Collaborator, Author):

Thanks for the suggestions. The IRBuilder method for building a TypeSize was not one I knew about. I've tried to address your comments in the follow-up in #71699.

davemgreen deleted the gh-sve-emitGEPOffset branch on November 10, 2023.
davemgreen added a commit to davemgreen/llvm-project that referenced this pull request Nov 10, 2023
Following up on llvm#71565, this makes scalable splats in EmitGEPOffset use the
ElementCount as opposed to assuming it is fixed width, and attempts to handle
scalable offsets with vector geps by splatting the vscale to each vector lane.

It appears that the `& PtrSizeMask` can be removed without altering any of the
existing tests, or any of the tests I tried across AArch64/Arm.
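A hypothetical sketch of the lane-splat handling that commit describes (the variable names are assumptions, not the actual llvm#71699 code):

// Hypothetical sketch of the follow-up's vector-GEP handling; names such
// as IdxVecTy are assumptions, not the actual llvm#71699 code.
auto *IdxVecTy = cast<VectorType>(Op->getType());
// Build the scalar vscale * Size factor, then broadcast it using the index
// vector's ElementCount, which stays correct for scalable vectors where a
// fixed lane count would not.
Value *Scale = Builder->CreateVScale(
    ConstantInt::get(IdxVecTy->getElementType(), Size));
Value *SplatScale =
    Builder->CreateVectorSplat(IdxVecTy->getElementCount(), Scale);
Op = Builder->CreateMul(Op, SplatScale, GEP->getName() + ".idx");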
davemgreen added a commit that referenced this pull request Nov 11, 2023
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023