[InstCombine] Extend folding of aggregate construction to cases when source aggregates are partially available #100828

weiguozhi · 2024-07-26T22:48:46Z

Function foldAggregateConstructionIntoAggregateReuse can fold
insertvalue(phi(extractvalue(src1), extractvalue(src2)))
into
phi(src1, src2)
when we can find source aggregates in all predecessors.

This patch extends it to handle following case
insertvalue(phi(extractvalue(src1), elm2))
into
phi(src1, insertvalue(elm2))
with the condition that the predecessor without source aggregate has only one successor.

llvmbot · 2024-07-26T22:49:22Z

@llvm/pr-subscribers-llvm-transforms

Author: None (weiguozhi)

Changes

Function foldAggregateConstructionIntoAggregateReuse can fold
insertvalue(phi(extractvalue(src1), extractvalue(src2)))
into
phi(src1, src2)
when we can find source aggregates in all predecessors.

This patch extends it to handle following case
insertvalue(phi(extractvalue(src1), elm2))
into
phi(src1, insertvalue(elm2))
with the condition that the predecessor without source aggregate has only one successor.

Full diff: https://github.com/llvm/llvm-project/pull/100828.diff

4 Files Affected:

(modified) llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp (+30-3)
(added) llvm/test/Transforms/InstCombine/fold-aggregate-reconstruction.ll (+215)
(modified) llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll (+3-4)
(modified) llvm/test/Transforms/InstCombine/phi-aware-aggregate-reconstruction.ll (+12-14)

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
index 753ed55523c84..5377bf6f033d0 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
@@ -1106,6 +1106,7 @@ Instruction *InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse(
   // from which all the elements were originally extracted from?
   // Note that we want for the map to have stable iteration order!
   SmallDenseMap<BasicBlock *, Value *, 4> SourceAggregates;
+  bool FoundSrcAgg = false;
   for (BasicBlock *Pred : Preds) {
     std::pair<decltype(SourceAggregates)::iterator, bool> IV =
         SourceAggregates.insert({Pred, nullptr});
@@ -1117,9 +1118,35 @@ Instruction *InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse(
     // aggregate produced by OrigIVI must have been originally extracted from
     // the same aggregate. Is that so? Can we find said original aggregate?
     SourceAggregate = FindCommonSourceAggregate(UseBB, Pred);
-    if (Describe(SourceAggregate) != AggregateDescription::Found)
-      return nullptr; // Give up.
-    IV.first->second = *SourceAggregate;
+    if (Describe(SourceAggregate) == AggregateDescription::Found) {
+      FoundSrcAgg = true;
+      IV.first->second = *SourceAggregate;
+    } else {
+      // If UseBB is the single successor of Pred, we can add InsertValue to
+      // Pred.
+      if (succ_size(Pred) != 1)
+        return nullptr; // Give up.
+    }
+  }
+
+  if (!FoundSrcAgg)
+    return nullptr;
+
+  // For predecessors without appropriate source aggregate, create one in the
+  // predecessor.
+  for (auto &It : SourceAggregates) {
+    if (Describe(It.second) == AggregateDescription::Found)
+      continue;
+
+    BasicBlock *Pred = It.first;
+    Builder.SetInsertPoint(Pred, Pred->getTerminator()->getIterator());
+    Value *V = PoisonValue::get(AggTy);
+    for (auto I : enumerate(AggElts)) {
+      Value *Elt = (*I.value())->DoPHITranslation(UseBB, Pred);
+      V = Builder.CreateInsertValue(V, Elt, I.index());
+    }
+
+    It.second = V;
   }
 
   // All good! Now we just need to thread the source aggregates here.
diff --git a/llvm/test/Transforms/InstCombine/fold-aggregate-reconstruction.ll b/llvm/test/Transforms/InstCombine/fold-aggregate-reconstruction.ll
new file mode 100644
index 0000000000000..a75e9f1580ab9
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/fold-aggregate-reconstruction.ll
@@ -0,0 +1,215 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -passes=instcombine < %s | FileCheck %s
+
+declare {ptr, i64} @bar(i64)
+
+; Basic test.
+define {ptr, i64} @test1(i1 %cond1, ptr %p1, ptr %p2) {
+; CHECK-LABEL: define { ptr, i64 } @test1(
+; CHECK-SAME: i1 [[COND1:%.*]], ptr [[P1:%.*]], ptr [[P2:%.*]]) {
+; CHECK-NEXT:    br i1 [[COND1]], label %[[BBB1:.*]], label %[[BBB2:.*]]
+; CHECK:       [[BBB1]]:
+; CHECK-NEXT:    [[CALL1:%.*]] = call { ptr, i64 } @bar(i64 0)
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[BBB2]]:
+; CHECK-NEXT:    [[VAL21:%.*]] = load ptr, ptr [[P1]], align 8
+; CHECK-NEXT:    [[VAL22:%.*]] = load i64, ptr [[P2]], align 4
+; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue { ptr, i64 } poison, ptr [[VAL21]], 0
+; CHECK-NEXT:    [[TMP2:%.*]] = insertvalue { ptr, i64 } [[TMP1]], i64 [[VAL22]], 1
+; CHECK-NEXT:    br label %[[EXIT]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    [[RES_MERGED:%.*]] = phi { ptr, i64 } [ [[CALL1]], %[[BBB1]] ], [ [[TMP2]], %[[BBB2]] ]
+; CHECK-NEXT:    ret { ptr, i64 } [[RES_MERGED]]
+;
+  br i1 %cond1, label %bbb1, label %bbb2
+
+bbb1:
+  %call1 = call {ptr, i64} @bar(i64 0)
+  %val11 = extractvalue { ptr, i64 } %call1, 0
+  %val12 = extractvalue { ptr, i64 } %call1, 1
+  br label %exit
+
+bbb2:
+  %val21 = load ptr, ptr %p1
+  %val22 = load i64, ptr %p2
+  br label %exit
+
+exit:
+  %val1 = phi ptr [%val11, %bbb1], [%val21, %bbb2]
+  %val2 = phi i64 [%val12, %bbb1], [%val22, %bbb2]
+  %tmp = insertvalue { ptr, i64 } poison, ptr %val1, 0
+  %res = insertvalue { ptr, i64 } %tmp, i64 %val2, 1
+  ret {ptr, i64} %res
+}
+
+; Test with more predecessors.
+define {ptr, i64} @test2(i1 %cond1, i1 %cond2, ptr %p1, ptr %p2) {
+; CHECK-LABEL: define { ptr, i64 } @test2(
+; CHECK-SAME: i1 [[COND1:%.*]], i1 [[COND2:%.*]], ptr [[P1:%.*]], ptr [[P2:%.*]]) {
+; CHECK-NEXT:    br i1 [[COND1]], label %[[BBB1:.*]], label %[[BBB4:.*]]
+; CHECK:       [[BBB1]]:
+; CHECK-NEXT:    br i1 [[COND2]], label %[[BBB2:.*]], label %[[BBB3:.*]]
+; CHECK:       [[BBB2]]:
+; CHECK-NEXT:    [[CALL1:%.*]] = call { ptr, i64 } @bar(i64 0)
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[BBB3]]:
+; CHECK-NEXT:    [[CALL2:%.*]] = call { ptr, i64 } @bar(i64 1)
+; CHECK-NEXT:    br label %[[EXIT]]
+; CHECK:       [[BBB4]]:
+; CHECK-NEXT:    [[VAL31:%.*]] = load ptr, ptr [[P1]], align 8
+; CHECK-NEXT:    [[VAL32:%.*]] = load i64, ptr [[P2]], align 4
+; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue { ptr, i64 } poison, ptr [[VAL31]], 0
+; CHECK-NEXT:    [[TMP2:%.*]] = insertvalue { ptr, i64 } [[TMP1]], i64 [[VAL32]], 1
+; CHECK-NEXT:    br label %[[EXIT]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    [[RES_MERGED:%.*]] = phi { ptr, i64 } [ [[CALL1]], %[[BBB2]] ], [ [[CALL2]], %[[BBB3]] ], [ [[TMP2]], %[[BBB4]] ]
+; CHECK-NEXT:    ret { ptr, i64 } [[RES_MERGED]]
+;
+  br i1 %cond1, label %bbb1, label %bbb4
+
+bbb1:
+  br i1 %cond2, label %bbb2, label %bbb3
+
+bbb2:
+  %call1 = call {ptr, i64} @bar(i64 0)
+  %val11 = extractvalue { ptr, i64 } %call1, 0
+  %val12 = extractvalue { ptr, i64 } %call1, 1
+  br label %exit
+
+bbb3:
+  %call2 = call {ptr, i64} @bar(i64 1)
+  %val21 = extractvalue { ptr, i64 } %call2, 0
+  %val22 = extractvalue { ptr, i64 } %call2, 1
+  br label %exit
+
+bbb4:
+  %val31 = load ptr, ptr %p1
+  %val32 = load i64, ptr %p2
+  br label %exit
+
+exit:
+  %val1 = phi ptr [%val11, %bbb2], [%val21, %bbb3], [%val31, %bbb4]
+  %val2 = phi i64 [%val12, %bbb2], [%val22, %bbb3], [%val32, %bbb4]
+  %tmp = insertvalue { ptr, i64 } poison, ptr %val1, 0
+  %res = insertvalue { ptr, i64 } %tmp, i64 %val2, 1
+  ret {ptr, i64} %res
+}
+
+; Test with multiple PHI instructions.
+define {ptr, i64} @test3(i1 %cond1, i1 %cond2, ptr %val31, i64 %val32) {
+; CHECK-LABEL: define { ptr, i64 } @test3(
+; CHECK-SAME: i1 [[COND1:%.*]], i1 [[COND2:%.*]], ptr [[VAL31:%.*]], i64 [[VAL32:%.*]]) {
+; CHECK-NEXT:    br i1 [[COND1]], label %[[BBB1:.*]], label %[[BBB4:.*]]
+; CHECK:       [[BBB1]]:
+; CHECK-NEXT:    br i1 [[COND2]], label %[[BBB2:.*]], label %[[BBB3:.*]]
+; CHECK:       [[BBB2]]:
+; CHECK-NEXT:    [[CALL1:%.*]] = call { ptr, i64 } @bar(i64 0)
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[BBB3]]:
+; CHECK-NEXT:    [[CALL2:%.*]] = call { ptr, i64 } @bar(i64 1)
+; CHECK-NEXT:    br label %[[BBB5:.*]]
+; CHECK:       [[BBB4]]:
+; CHECK-NEXT:    [[CALL3:%.*]] = call { ptr, i64 } @bar(i64 2)
+; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue { ptr, i64 } poison, ptr [[VAL31]], 0
+; CHECK-NEXT:    [[TMP2:%.*]] = insertvalue { ptr, i64 } [[TMP1]], i64 [[VAL32]], 1
+; CHECK-NEXT:    br label %[[BBB5]]
+; CHECK:       [[BBB5]]:
+; CHECK-NEXT:    [[DOTMERGED:%.*]] = phi { ptr, i64 } [ [[CALL2]], %[[BBB3]] ], [ [[TMP2]], %[[BBB4]] ]
+; CHECK-NEXT:    br label %[[EXIT]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    [[RES_MERGED:%.*]] = phi { ptr, i64 } [ [[CALL1]], %[[BBB2]] ], [ [[DOTMERGED]], %[[BBB5]] ]
+; CHECK-NEXT:    ret { ptr, i64 } [[RES_MERGED]]
+;
+  br i1 %cond1, label %bbb1, label %bbb4
+
+bbb1:
+  br i1 %cond2, label %bbb2, label %bbb3
+
+bbb2:
+  %call1 = call {ptr, i64} @bar(i64 0)
+  %val11 = extractvalue { ptr, i64 } %call1, 0
+  %val12 = extractvalue { ptr, i64 } %call1, 1
+  br label %exit
+
+bbb3:
+  %call2 = call {ptr, i64} @bar(i64 1)
+  %val21 = extractvalue { ptr, i64 } %call2, 0
+  %val22 = extractvalue { ptr, i64 } %call2, 1
+  br label %bbb5
+
+bbb4:
+  %call3 = call {ptr, i64} @bar(i64 2)
+  br label %bbb5
+
+bbb5:
+  %val41 = phi ptr [%val21, %bbb3], [%val31, %bbb4]
+  %val42 = phi i64 [%val22, %bbb3], [%val32, %bbb4]
+  br label %exit
+
+exit:
+  %val1 = phi ptr [%val11, %bbb2], [%val41, %bbb5]
+  %val2 = phi i64 [%val12, %bbb2], [%val42, %bbb5]
+  %tmp = insertvalue { ptr, i64 } poison, ptr %val1, 0
+  %res = insertvalue { ptr, i64 } %tmp, i64 %val2, 1
+  ret {ptr, i64} %res
+}
+
+; Negative test, bbb4 has multiple successors, so we don't add insertvalue to it.
+define {ptr, i64} @test4(i1 %cond1, i1 %cond2, ptr %p1, ptr %p2) {
+; CHECK-LABEL: define { ptr, i64 } @test4(
+; CHECK-SAME: i1 [[COND1:%.*]], i1 [[COND2:%.*]], ptr [[P1:%.*]], ptr [[P2:%.*]]) {
+; CHECK-NEXT:    br i1 [[COND1]], label %[[BBB1:.*]], label %[[BBB4:.*]]
+; CHECK:       [[BBB1]]:
+; CHECK-NEXT:    br i1 [[COND2]], label %[[BBB2:.*]], label %[[BBB3:.*]]
+; CHECK:       [[BBB2]]:
+; CHECK-NEXT:    [[CALL1:%.*]] = call { ptr, i64 } @bar(i64 0)
+; CHECK-NEXT:    [[VAL11:%.*]] = extractvalue { ptr, i64 } [[CALL1]], 0
+; CHECK-NEXT:    [[VAL12:%.*]] = extractvalue { ptr, i64 } [[CALL1]], 1
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[BBB3]]:
+; CHECK-NEXT:    [[CALL2:%.*]] = call { ptr, i64 } @bar(i64 1)
+; CHECK-NEXT:    [[VAL21:%.*]] = extractvalue { ptr, i64 } [[CALL2]], 0
+; CHECK-NEXT:    [[VAL22:%.*]] = extractvalue { ptr, i64 } [[CALL2]], 1
+; CHECK-NEXT:    br label %[[EXIT]]
+; CHECK:       [[BBB4]]:
+; CHECK-NEXT:    [[VAL31:%.*]] = load ptr, ptr [[P1]], align 8
+; CHECK-NEXT:    [[VAL32:%.*]] = load i64, ptr [[P2]], align 4
+; CHECK-NEXT:    [[COND3_NOT:%.*]] = icmp eq i64 [[VAL32]], 0
+; CHECK-NEXT:    br i1 [[COND3_NOT]], label %[[EXIT]], label %[[BBB4]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    [[VAL1:%.*]] = phi ptr [ [[VAL11]], %[[BBB2]] ], [ [[VAL21]], %[[BBB3]] ], [ [[VAL31]], %[[BBB4]] ]
+; CHECK-NEXT:    [[VAL2:%.*]] = phi i64 [ [[VAL12]], %[[BBB2]] ], [ [[VAL22]], %[[BBB3]] ], [ [[VAL32]], %[[BBB4]] ]
+; CHECK-NEXT:    [[TMP:%.*]] = insertvalue { ptr, i64 } poison, ptr [[VAL1]], 0
+; CHECK-NEXT:    [[RES:%.*]] = insertvalue { ptr, i64 } [[TMP]], i64 [[VAL2]], 1
+; CHECK-NEXT:    ret { ptr, i64 } [[RES]]
+;
+  br i1 %cond1, label %bbb1, label %bbb4
+
+bbb1:
+  br i1 %cond2, label %bbb2, label %bbb3
+
+bbb2:
+  %call1 = call {ptr, i64} @bar(i64 0)
+  %val11 = extractvalue { ptr, i64 } %call1, 0
+  %val12 = extractvalue { ptr, i64 } %call1, 1
+  br label %exit
+
+bbb3:
+  %call2 = call {ptr, i64} @bar(i64 1)
+  %val21 = extractvalue { ptr, i64 } %call2, 0
+  %val22 = extractvalue { ptr, i64 } %call2, 1
+  br label %exit
+
+bbb4:
+  %val31 = load ptr, ptr %p1
+  %val32 = load i64, ptr %p2
+  %cond3 = icmp ne i64 %val32, 0
+  br i1 %cond3, label %bbb4, label %exit
+
+exit:
+  %val1 = phi ptr [%val11, %bbb2], [%val21, %bbb3], [%val31, %bbb4]
+  %val2 = phi i64 [%val12, %bbb2], [%val22, %bbb3], [%val32, %bbb4]
+  %tmp = insertvalue { ptr, i64 } poison, ptr %val1, 0
+  %res = insertvalue { ptr, i64 } %tmp, i64 %val2, 1
+  ret {ptr, i64} %res
+}
diff --git a/llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll b/llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll
index fbf58d47a32d2..41e3197e5a2f0 100644
--- a/llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll
+++ b/llvm/test/Transforms/InstCombine/merging-multiple-stores-into-successor.ll
@@ -166,14 +166,13 @@ define %struct.half @one_elem_struct_merge(i1 %cond, %struct.half %a, half %b) {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    br i1 [[COND:%.*]], label [[BB0:%.*]], label [[BB1:%.*]]
 ; CHECK:       BB0:
-; CHECK-NEXT:    [[TMP0:%.*]] = extractvalue [[STRUCT_HALF:%.*]] [[A:%.*]], 0
 ; CHECK-NEXT:    br label [[SINK:%.*]]
 ; CHECK:       BB1:
+; CHECK-NEXT:    [[TMP0:%.*]] = insertvalue [[STRUCT_HALF:%.*]] poison, half [[B:%.*]], 0
 ; CHECK-NEXT:    br label [[SINK]]
 ; CHECK:       sink:
-; CHECK-NEXT:    [[STOREMERGE:%.*]] = phi half [ [[TMP0]], [[BB0]] ], [ [[B:%.*]], [[BB1]] ]
-; CHECK-NEXT:    [[VAL1:%.*]] = insertvalue [[STRUCT_HALF]] poison, half [[STOREMERGE]], 0
-; CHECK-NEXT:    ret [[STRUCT_HALF]] [[VAL1]]
+; CHECK-NEXT:    [[VAL1_MERGED:%.*]] = phi [[STRUCT_HALF]] [ [[A:%.*]], [[BB0]] ], [ [[TMP0]], [[BB1]] ]
+; CHECK-NEXT:    ret [[STRUCT_HALF]] [[VAL1_MERGED]]
 ;
 entry:
   %alloca = alloca i64
diff --git a/llvm/test/Transforms/InstCombine/phi-aware-aggregate-reconstruction.ll b/llvm/test/Transforms/InstCombine/phi-aware-aggregate-reconstruction.ll
index 6b7ac445839d2..706ce4e02f231 100644
--- a/llvm/test/Transforms/InstCombine/phi-aware-aggregate-reconstruction.ll
+++ b/llvm/test/Transforms/InstCombine/phi-aware-aggregate-reconstruction.ll
@@ -97,28 +97,26 @@ end:
   ret { i32, i32 } %i8
 }
 
-; When coming from %left, elements are swapped
-define { i32, i32 } @negative_test2({ i32, i32 } %agg_left, { i32, i32 } %agg_right, i1 %c) {
-; CHECK-LABEL: @negative_test2(
+; When coming from %left, elements are swapped, and new insertvalue is created.
+; From right side the old aggregate can be directly reused.
+define { i32, i32 } @positive_test2({ i32, i32 } %agg_left, { i32, i32 } %agg_right, i1 %c) {
+; CHECK-LABEL: @positive_test2(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    br i1 [[C:%.*]], label [[LEFT:%.*]], label [[RIGHT:%.*]]
 ; CHECK:       left:
 ; CHECK-NEXT:    [[I2:%.*]] = extractvalue { i32, i32 } [[AGG_LEFT:%.*]], 1
 ; CHECK-NEXT:    [[I0:%.*]] = extractvalue { i32, i32 } [[AGG_LEFT]], 0
 ; CHECK-NEXT:    call void @foo()
+; CHECK-NEXT:    [[TMP0:%.*]] = insertvalue { i32, i32 } poison, i32 [[I2]], 0
+; CHECK-NEXT:    [[TMP1:%.*]] = insertvalue { i32, i32 } [[TMP0]], i32 [[I0]], 1
 ; CHECK-NEXT:    br label [[END:%.*]]
 ; CHECK:       right:
-; CHECK-NEXT:    [[I4:%.*]] = extractvalue { i32, i32 } [[AGG_RIGHT:%.*]], 1
-; CHECK-NEXT:    [[I3:%.*]] = extractvalue { i32, i32 } [[AGG_RIGHT]], 0
 ; CHECK-NEXT:    call void @bar()
 ; CHECK-NEXT:    br label [[END]]
 ; CHECK:       end:
-; CHECK-NEXT:    [[I5:%.*]] = phi i32 [ [[I2]], [[LEFT]] ], [ [[I3]], [[RIGHT]] ]
-; CHECK-NEXT:    [[I6:%.*]] = phi i32 [ [[I0]], [[LEFT]] ], [ [[I4]], [[RIGHT]] ]
+; CHECK-NEXT:    [[I8_MERGED:%.*]] = phi { i32, i32 } [ [[TMP1]], [[LEFT]] ], [ [[AGG_RIGHT:%.*]], [[RIGHT]] ]
 ; CHECK-NEXT:    call void @baz()
-; CHECK-NEXT:    [[I7:%.*]] = insertvalue { i32, i32 } undef, i32 [[I5]], 0
-; CHECK-NEXT:    [[I8:%.*]] = insertvalue { i32, i32 } [[I7]], i32 [[I6]], 1
-; CHECK-NEXT:    ret { i32, i32 } [[I8]]
+; CHECK-NEXT:    ret { i32, i32 } [[I8_MERGED]]
 ;
 entry:
   %i0 = extractvalue { i32, i32 } %agg_left, 0
@@ -417,14 +415,14 @@ define { i32, i32 } @test8({ i32, i32 } %agg_left, { i32, i32 } %agg_right, i1 %
 ; CHECK:       left:
 ; CHECK-NEXT:    call void @foo()
 ; CHECK-NEXT:    switch i32 [[VAL_LEFT:%.*]], label [[IMPOSSIBLE:%.*]] [
-; CHECK-NEXT:    i32 -42, label [[END:%.*]]
-; CHECK-NEXT:    i32 42, label [[END]]
+; CHECK-NEXT:      i32 -42, label [[END:%.*]]
+; CHECK-NEXT:      i32 42, label [[END]]
 ; CHECK-NEXT:    ]
 ; CHECK:       right:
 ; CHECK-NEXT:    call void @bar()
 ; CHECK-NEXT:    switch i32 [[VAL_RIGHT:%.*]], label [[IMPOSSIBLE]] [
-; CHECK-NEXT:    i32 42, label [[END]]
-; CHECK-NEXT:    i32 -42, label [[END]]
+; CHECK-NEXT:      i32 42, label [[END]]
+; CHECK-NEXT:      i32 -42, label [[END]]
 ; CHECK-NEXT:    ]
 ; CHECK:       impossible:
 ; CHECK-NEXT:    unreachable

PR Link: llvm/llvm-project#100828

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp

llvm/test/Transforms/InstCombine/fold-aggregate-reconstruction.ll

PR Link: llvm/llvm-project#100828

dtcxzyw

This patch causes InstCombine to hang when building https://github.com/jqlang/jq:

; timeout 5s bin/opt -passes=instcombine reduced.ll -S
define { i64, ptr } @minmax_by({ i64, ptr } %0, i1 %1, i1 %2, { i64, ptr } %3) {
  %5 = extractvalue { i64, ptr } %0, 0
  %6 = extractvalue { i64, ptr } %0, 1
  br label %7

7:                                                ; preds = %14, %4
  %.sroa.01.0 = phi i64 [ %5, %4 ], [ %.sroa.01.1, %14 ]
  %.sroa.3.0 = phi ptr [ %6, %4 ], [ %.sroa.3.1, %14 ]
  br i1 %1, label %8, label %15

8:                                                ; preds = %7
  br i1 %2, label %9, label %12

9:                                                ; preds = %8
  %10 = extractvalue { i64, ptr } %3, 0
  %11 = extractvalue { i64, ptr } %3, 1
  br label %14

12:                                               ; preds = %8
  %13 = getelementptr { i64, ptr }, ptr null, i32 0, i32 0
  br label %14

14:                                               ; preds = %12, %9
  %.sroa.01.1 = phi i64 [ %10, %9 ], [ %.sroa.01.0, %12 ]
  %.sroa.3.1 = phi ptr [ %11, %9 ], [ %.sroa.3.0, %12 ]
  br label %7

15:                                               ; preds = %7
  %.fca.0.insert = insertvalue { i64, ptr } zeroinitializer, i64 %.sroa.01.0, 0
  %.fca.1.insert = insertvalue { i64, ptr } %.fca.0.insert, ptr %.sroa.3.0, 1
  ret { i64, ptr } %.fca.1.insert
}

weiguozhi · 2024-07-30T19:53:45Z

This patch causes InstCombine to hang when building https://github.com/jqlang/jq:

; timeout 5s bin/opt -passes=instcombine reduced.ll -S
define { i64, ptr } @minmax_by({ i64, ptr } %0, i1 %1, i1 %2, { i64, ptr } %3) {
  %5 = extractvalue { i64, ptr } %0, 0
  %6 = extractvalue { i64, ptr } %0, 1
  br label %7

7:                                                ; preds = %14, %4
  %.sroa.01.0 = phi i64 [ %5, %4 ], [ %.sroa.01.1, %14 ]
  %.sroa.3.0 = phi ptr [ %6, %4 ], [ %.sroa.3.1, %14 ]
  br i1 %1, label %8, label %15

8:                                                ; preds = %7
  br i1 %2, label %9, label %12

9:                                                ; preds = %8
  %10 = extractvalue { i64, ptr } %3, 0
  %11 = extractvalue { i64, ptr } %3, 1
  br label %14

12:                                               ; preds = %8
  %13 = getelementptr { i64, ptr }, ptr null, i32 0, i32 0
  br label %14

14:                                               ; preds = %12, %9
  %.sroa.01.1 = phi i64 [ %10, %9 ], [ %.sroa.01.0, %12 ]
  %.sroa.3.1 = phi ptr [ %11, %9 ], [ %.sroa.3.0, %12 ]
  br label %7

15:                                               ; preds = %7
  %.fca.0.insert = insertvalue { i64, ptr } zeroinitializer, i64 %.sroa.01.0, 0
  %.fca.1.insert = insertvalue { i64, ptr } %.fca.0.insert, ptr %.sroa.3.0, 1
  ret { i64, ptr } %.fca.1.insert
}

weiguozhi · 2024-07-30T20:01:01Z

Thanks for the quick test.

The new insertvalue is added to a predecessor with only one successor, so it intends to reduce the number of executed insertvalue instruction. But it doesn't behave like this when a loop is involved. The loop exiting edges and loop latches don't guarantee reduced dynamic number of insertvalue instructions. We should check these cases.

weiguozhi · 2024-08-20T20:02:41Z

ping

PR Link: llvm/llvm-project#100828

dtcxzyw

Another reproducer causing instcombine to hang:

bin/opt -passes=instcombine reduced.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

define { i64, ptr } @minmax_by({ i64, ptr } %0, i1 %1, i1 %2, { i64, ptr } %3) {
  %5 = extractvalue { i64, ptr } %0, 0
  %6 = extractvalue { i64, ptr } %0, 1
  br label %7

7:                                                ; preds = %14, %4
  %.sroa.01.0 = phi i64 [ %5, %4 ], [ %.sroa.01.1, %14 ]
  %.sroa.3.0 = phi ptr [ %6, %4 ], [ %.sroa.3.1, %14 ]
  br i1 %1, label %8, label %15

8:                                                ; preds = %7
  br i1 %2, label %9, label %12

9:                                                ; preds = %8
  %10 = extractvalue { i64, ptr } %3, 0
  %11 = extractvalue { i64, ptr } %3, 1
  br label %14

12:                                               ; preds = %8
  %13 = getelementptr { i64, ptr }, ptr null, i32 0, i32 0
  br label %14

14:                                               ; preds = %12, %9
  %.sroa.01.1 = phi i64 [ %10, %9 ], [ %.sroa.01.0, %12 ]
  %.sroa.3.1 = phi ptr [ %11, %9 ], [ %.sroa.3.0, %12 ]
  br label %7

15:                                               ; preds = %7
  %.fca.0.insert = insertvalue { i64, ptr } zeroinitializer, i64 %.sroa.01.0, 0
  %.fca.1.insert = insertvalue { i64, ptr } %.fca.0.insert, ptr %.sroa.3.0, 1
  ret { i64, ptr } %.fca.1.insert
}

BTW, I don't think this transform is beneficial since it blocks some select-related optimizations:
Before this patch:

define hidden { i64, i64 } @"_ZN4core6result19Result$LT$T$C$E$GT$7map_err17h84015741f1851455E.llvm.9397651224479784633"(i64 noundef %0, i64 %1) unnamed_addr #0 {
   %3 = insertvalue { i64, i64 } poison, i64 %0, 0
   %4 = insertvalue { i64, i64 } %3, i64 %1, 1
   ret { i64, i64 } %4
 }

After this patch:

define hidden { i64, i64 } @"_ZN4core6result19Result$LT$T$C$E$GT$7map_err17h84015741f1851455E.llvm.9397651224479784633"(i64 noundef %0, i64 %1) unnamed_addr #0 {
   %3 = icmp eq i64 %0, -9223372036854775807
   %4 = insertvalue { i64, i64 } poison, i64 %0, 0
   %5 = insertvalue { i64, i64 } %4, i64 %1, 1
   %.merged = select i1 %3, { i64, i64 } { i64 -9223372036854775807, i64 undef }, { i64, i64 } %5
   ret { i64, i64 } %.merged
 }

Can you provide some performance data to demonstrate the usefulness of this patch?

weiguozhi · 2024-08-21T17:23:59Z

The test case is extracted from hot path of tcmalloc, so it has real world impact.

This patch uses LoopInfo, so you need to run it with

bin/opt -passes='instcombine<use-loop-info>' reduced.ll

I ran the patch on your smaller test case, and got

opt -passes='instcombine<use-loop-info>' reduced2.ll -S

define hidden { i64, i64 } @"_ZN4core6result19Result$LT$T$C$E$GT$7map_err17h84015741f1851E.llvm"(i64 noundef %0, i64 %1) unnamed_addr {
  %3 = insertvalue { i64, i64 } poison, i64 %0, 0
  %4 = insertvalue { i64, i64 } %3, i64 %1, 1
  ret { i64, i64 } %4
}

How did you run it?

dtcxzyw · 2024-08-21T17:35:14Z

The test case is extracted from hot path of tcmalloc, so it has real world impact.

This patch uses LoopInfo, so you need to run it with
bin/opt -passes='instcombine<use-loop-info>' reduced.ll
I ran the patch on your smaller test case, and got
opt -passes='instcombine<use-loop-info>' reduced2.ll -S

define hidden { i64, i64 } @"_ZN4core6result19Result$LT$T$C$E$GT$7map_err17h84015741f1851E.llvm"(i64 noundef %0, i64 %1) unnamed_addr {
  %3 = insertvalue { i64, i64 } poison, i64 %0, 0
  %4 = insertvalue { i64, i64 } %3, i64 %1, 1
  ret { i64, i64 } %4
}
How did you run it?

I run it with opt -O3. UseLoopInfo is disabled by default due to compilation time regression: ef6f235

weiguozhi · 2024-08-21T18:22:05Z

How did you run it?

I run it with opt -O3. UseLoopInfo is disabled by default due to compilation time regression: ef6f235

With -O3 I got

opt -O3 reduced2.ll -S

; ModuleID = 'reduced2.ll'
source_filename = "reduced2.ll"

; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
define hidden { i64, i64 } @"_ZN4core6result19Result$LT$T$C$E$GT$7map_err17h84015741f1851E.llvm"(i64 noundef %0, i64 %1) unnamed_addr #0 {
  %3 = insertvalue { i64, i64 } poison, i64 %0, 0
  %4 = insertvalue { i64, i64 } %3, i64 %1, 1
  ret { i64, i64 } %4
}

Thank you for the LoopInfo information. In order to make it more usable I should avoid the LoopInfo usage. I will think it more.

weiguozhi · 2024-09-09T22:35:53Z

Now if I need to add insertvalue into predecessor, I restrict it to

1, UseBB == OrigBB, then it can not cross a loop
2, If profile is available, we can only add insertvalue into colder BB.

PR Link: llvm/llvm-project#100828

weiguozhi · 2024-09-16T21:15:39Z

ping

nikic · 2024-09-16T21:43:19Z

Some quick thoughts:

Looking at https://github.com/dtcxzyw/llvm-opt-benchmark/pull/1288/files, a significant number of the transforms look non-profitable. This transforms is reasonable if it actually drops extractelement + insertelement pairs, but if you just move insertelements to a predecessor while converting scalar phis into aggregate phis, I think that's a loss, because optimizations generally have troubles with aggregate SSA values. I think this happens mostly if the other phi operands are constants.
BFI in InstCombine -- wat? It looks like it's currently only used in SLC. Let's not use it in InstCombine proper.
To protect against cycles you probably want to check isBackEdge() instead, which was recently added.
succ_size(Pred) != 1 => This should probably explicitly check for an unconditional branch, so you don't need to deal with the weird edge cases? (Like say, a single-successor callbr with aggregate return).

weiguozhi · 2024-09-18T18:26:21Z

Looking at https://github.com/dtcxzyw/llvm-opt-benchmark/pull/1288/files, a significant number of the transforms look non-profitable. This transforms is reasonable if it actually drops extractelement + insertelement pairs, but if you just move insertelements to a predecessor while converting scalar phis into aggregate phis, I think that's a loss, because optimizations generally have troubles with aggregate SSA values. I think this happens mostly if the other phi operands are constants.

The original code removes fully redundant insertvalue, this patch enhances it to remove partial redundant insertvalue, so it actually drop redundant insertvalue. Avoiding constant aggregate sounds reasonable, I added code to check it.

BFI in InstCombine -- wat? It looks like it's currently only used in SLC. Let's not use it in InstCombine proper.

Deleted.

To protect against cycles you probably want to check isBackEdge() instead, which was recently added.

Thanks for the info. Currently I restrict it to UseBB == OrigBB && succ_size(Pred) == 1, it can already avoid cycles.

succ_size(Pred) != 1 => This should probably explicitly check for an unconditional branch, so you don't need to deal with the weird edge cases? (Like say, a single-successor callbr with aggregate return).

Changed it to checking for unconditional branch.

weiguozhi · 2024-10-15T23:59:37Z

ping

dtcxzyw · 2024-10-16T07:39:40Z

I don't think we can benefit from introducing more aggregate phi nodes. If these phi nodes are converted into selects in subsequent passes (e.g., SimplifyCFG), it is hard for InstCombine to fold. See dtcxzyw/llvm-opt-benchmark#1351 (comment).

If you think this patch will reduce total number of instructions, I don't agree with this argument. Can you provide some performance data on SPEC/llvm-test-suite?

Compilation time impact looks ok. Don't worry about this :)

I will accept this patch if:

most of the regressions related to selects on aggregate types are fixed.
- bench/llvm/optimized/LowerExpectIntrinsic.cpp.ll
This patch results in a net performance improvement on some benchmarks.

weiguozhi · 2024-10-18T18:54:45Z

Add new instcombine pattern to transform

         extract (select (cond, insert(agg, elem), FV))
     ->  select (cond, elem, extract(FV))

so the regression of dtcxzyw/llvm-opt-benchmark#1351 (comment) can be fixed.

weiguozhi · 2024-10-25T00:17:46Z

I tested it on speccpu 2006 int, this patch is triggerred 19 times for 483.xalancbmk only, but the generated code is same. So no impact.

The real world impact of this patch is to a hot path in tcmalloc. In function _ZN8tcmalloc17tcmalloc_internal14TCMallocPolicyINS0_12CppOomPolicyENS0_18DefaultAlignPolicyENS0_25AllocationAccessHotPolicyENS0_17InvokeHooksPolicyENS0_21IsSizeReturningPolicyENS0_24LocalNumaPartitionPolicyEE10to_pointerEPvmocPolicyINS0_12CppOomPolicyENS0_18DefaultAlignPolicyENS0_25AllocationAccessHotPolicyENS0_17InvokeHooksPolicyENS0_21IsSizeReturningPolicyENS0_24LocalNumaPartitionPolicyEE10to_pointerEPvm (will be inlined into tcmalloc_size_returning_operator_new) there is a code sequence similar to

BB1:
   {ptr, i64} %v1 = call {ptr, i64} @func1
   %f1 = extractvalue { ptr, i64 } %v1, 0
   %f2 = extractvalue { ptr, i64 } %v1, 1
   br label %exit

BB2:
   {ptr, i64} %v2 = call {ptr, i64} @_ZN8tcmalloc17tcmalloc_internal14TCMallocPolicyINS0_12CppOomPolicyENS0_18DefaultAlignPolicyENS0_25AllocationAccessHotPolicyENS0_17InvokeHooksPolicyENS0_21IsSizeReturningPolicyENS0_24LocalNumaPartitionPolicyEE10to_pointerEPvmocPolicyINS0_12CppOomPolicyENS0_18DefaultAlignPolicyENS0_25AllocationAccessHotPolicyENS0_17InvokeHooksPolicyENS0_21IsSizeReturningPolicyENS0_24LocalNumaPartitionPolicyEE10to_pointerEPvm 
   %f3 = extractvalue { ptr, i64 } %v2, 0
   %f4 = extractvalue { ptr, i64 } %v2, 1
   br label %exit

exit:
   %f5 = phi ptr [%f1, %BB1], [%f3, %BB2]
   %f6 = phi i64 [%f2, %BB1], [%f4, %BB2]
   %res0 = insertvalue { ptr, i64 } poison, ptr %f5, 0
   %res = insertvalue { ptr, i64 } %res0, i64 %f6, 1
   ret {ptr, i64} %res

Usually function foldAggregateConstructionIntoAggregateReuse can optimize away all of the insertvalue/extractvalue,

BB1:
   {ptr, i64} %v1 = call {ptr, i64} @func1
   br label %exit

BB2:
   {ptr, i64} %v2 = call {ptr, i64} @_ZN8tcmalloc17tcmalloc_internal14TCMallocPolicyINS0_12CppOomPolicyENS0_18DefaultAlignPolicyENS0_25AllocationAccessHotPolicyENS0_17InvokeHooksPolicyENS0_21IsSizeReturningPolicyENS0_24LocalNumaPartitionPolicyEE10to_pointerEPvmocPolicyINS0_12CppOomPolicyENS0_18DefaultAlignPolicyENS0_25AllocationAccessHotPolicyENS0_17InvokeHooksPolicyENS0_21IsSizeReturningPolicyENS0_24LocalNumaPartitionPolicyEE10to_pointerEPvm 
   br label %exit

exit:
   %res = phi {ptr, i64} [%v1, BB1], [%v2, BB2]
   ret {ptr, i64} %res

But in one of our applications built with thinlto, with profile information thinlto causes the function _ZN8tcmalloc17tcmalloc_internal14TCMallocPolicyINS0_12CppOomPolicyENS0_18DefaultAlignPolicyENS0_25AllocationAccessHotPolicyENS0_17InvokeHooksPolicyENS0_21IsSizeReturningPolicyENS0_24LocalNumaPartitionPolicyEE10to_pointerEPvmocPolicyINS0_12CppOomPolicyENS0_18DefaultAlignPolicyENS0_25AllocationAccessHotPolicyENS0_17InvokeHooksPolicyENS0_21IsSizeReturningPolicyENS0_24LocalNumaPartitionPolicyEE10to_pointerEPvm to be inlined, so I got

BB1:
   {ptr, i64} %v1 = call {ptr, i64} @func1
   %f1 = extractvalue { ptr, i64 } %v1, 0
   %f2 = extractvalue { ptr, i64 } %v1, 1
   br label %exit

BB2:
   %f3 = phi ptr [], []
   %f4 = load i64 ...
   br label %exit

exit:
   %f5 = phi ptr [%f1, %BB1], [%f3, %BB2]
   %f6 = phi i64 [%f2, %BB1], [%f4, %BB2]
   %res0 = insertvalue { ptr, i64 } poison, ptr %f5, 0
   %res = insertvalue { ptr, i64 } %res0, i64 %f6, 1
   ret {ptr, i64} %res

This instruction sequence can't be handled by the original foldAggregateConstructionIntoAggregateReuse, so worse tcmalloc_size_returning_operator_new was generated.

llvm/test/Transforms/InstCombine/fold-aggregate-reconstruction.ll

dtcxzyw · 2024-10-30T07:24:20Z

IR diff looks good to me. I believe the regression can be addressed by using simplifyInstruction in FoldOpIntoSelect: https://alive2.llvm.org/ce/z/HiwgPk. We discussed about this in #84686 (comment).

weiguozhi · 2024-11-08T00:53:59Z

ping

dtcxzyw · 2024-11-08T07:43:19Z

IR diff looks good to me. I believe the regression can be addressed by using simplifyInstruction in FoldOpIntoSelect: https://alive2.llvm.org/ce/z/HiwgPk. We discussed about this in #84686 (comment).

@weiguozhi Are you willing to try this?

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp

weiguozhi · 2024-11-08T19:14:39Z

IR diff looks good to me. I believe the regression can be addressed by using simplifyInstruction in FoldOpIntoSelect: https://alive2.llvm.org/ce/z/HiwgPk. We discussed about this in #84686 (comment).

@weiguozhi Are you willing to try this?

Yes, I tried it. As you have noticed it doesn't handle non-const aggregates. We can enhance FoldOpIntoSelect to do the non-const transform in aef5402

     extract (select (cond, insert(agg, elem), FV))
 ->  select (cond, elem, extract(FV))

dtcxzyw · 2024-11-10T08:58:43Z

IR diff looks good to me. I believe the regression can be addressed by using simplifyInstruction in FoldOpIntoSelect: https://alive2.llvm.org/ce/z/HiwgPk. We discussed about this in #84686 (comment).

@weiguozhi Are you willing to try this?

Yes, I tried it. As you have noticed it doesn't handle non-const aggregates. We can enhance FoldOpIntoSelect to do the non-const transform in aef5402
     extract (select (cond, insert(agg, elem), FV))
 ->  select (cond, elem, extract(FV))

Can you file a separate PR?

…source aggregates are partially available Function foldAggregateConstructionIntoAggregateReuse can fold insertvalue(phi(extractvalue(src1), extractvalue(src2))) into phi(src1, src2) when we can find source aggregates in all predecessors. This patch extends it to handle following case insertvalue(phi(extractvalue(src1), elm2)) into phi(src1, insertvalue(elm2)) with the condition that the predecessor without source aggregate has only one successor.

add an insertvalue to PredBB.

BFI in InstCombine has limited usage, it's usually not available, so remove it. Also avoid constructing constant aggregate.

This reverts commit aef54025298b496d4f0b8766816700659b2f9015. This optimization has been implemented in llvm@9a844a3.

dtcxzyw

LGTM. Please wait for additional approval from other reviewers.

dtcxzyw · 2024-11-19T03:05:11Z

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp

+    // Avoid constructing constant aggregate because constant value may expose
+    // more optimizations.
+    bool ConstAgg = true;
+    for (auto [Idx, Val] : enumerate(AggElts)) {


Suggested change

for (auto [Idx, Val] : enumerate(AggElts)) {

for (auto Val : AggElts) {

Idx is unused.

dtcxzyw · 2024-11-19T03:07:02Z

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp

+      continue;
+
+    BasicBlock *Pred = It.first;
+    Builder.SetInsertPoint(Pred, Pred->getTerminator()->getIterator());


Suggested change

Builder.SetInsertPoint(Pred, Pred->getTerminator()->getIterator());

Builder.SetInsertPoint(Pred->getTerminator());

nikic · 2024-11-19T10:27:40Z

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp

+  }
+
+  if (!FoundSrcAgg)
+    return nullptr;


If I'm understanding correctly, this currently checks that at least one predecessor folds, rather than that all but one folds, so we could end up inserting extra insertvalues in many predecessors. Shouldn't we be limiting this such that we only insert into at most one predecessor?

For example, here is a slight variant of test2 with the current patch:

define {ptr, i64} @test2(i1 %cond1, i1 %cond2, ptr %p1, ptr %p2) { ; CHECK-LABEL: define { ptr, i64 } @test2( ; CHECK-SAME: i1 [[COND1:%.*]], i1 [[COND2:%.*]], ptr [[P1:%.*]], ptr [[P2:%.*]]) { ; CHECK-NEXT: br i1 [[COND1]], label %[[BBB1:.*]], label %[[BBB4:.*]] ; CHECK: [[BBB1]]: ; CHECK-NEXT: br i1 [[COND2]], label %[[BBB2:.*]], label %[[BBB3:.*]] ; CHECK: [[BBB2]]: ; CHECK-NEXT: [[CALL1:%.*]] = call { ptr, i64 } @bar(i64 0) ; CHECK-NEXT: br label %[[EXIT:.*]] ; CHECK: [[BBB3]]: ; CHECK-NEXT: [[VAL31:%.*]] = load ptr, ptr [[P1]], align 8 ; CHECK-NEXT: [[VAL32:%.*]] = load i64, ptr [[P2]], align 4 ; CHECK-NEXT: [[TMP1:%.*]] = insertvalue { ptr, i64 } poison, ptr [[VAL31]], 0 ; CHECK-NEXT: [[TMP2:%.*]] = insertvalue { ptr, i64 } [[TMP1]], i64 [[VAL32]], 1 ; CHECK-NEXT: br label %[[EXIT]] ; CHECK: [[BBB4]]: ; CHECK-NEXT: [[VAL33:%.*]] = load ptr, ptr [[P1]], align 8 ; CHECK-NEXT: [[VAL34:%.*]] = load i64, ptr [[P2]], align 4 ; CHECK-NEXT: [[TMP3:%.*]] = insertvalue { ptr, i64 } poison, ptr [[VAL33]], 0 ; CHECK-NEXT: [[TMP4:%.*]] = insertvalue { ptr, i64 } [[TMP3]], i64 [[VAL34]], 1 ; CHECK-NEXT: br label %[[EXIT]] ; CHECK: [[EXIT]]: ; CHECK-NEXT: [[RES_MERGED:%.*]] = phi { ptr, i64 } [ [[CALL1]], %[[BBB2]] ], [ [[TMP2]], %[[BBB3]] ], [ [[TMP4]], %[[BBB4]] ] ; CHECK-NEXT: ret { ptr, i64 } [[RES_MERGED]] ; br i1 %cond1, label %bbb1, label %bbb4 bbb1: br i1 %cond2, label %bbb2, label %bbb3 bbb2: %call1 = call {ptr, i64} @bar(i64 0) %val11 = extractvalue { ptr, i64 } %call1, 0 %val12 = extractvalue { ptr, i64 } %call1, 1 br label %exit bbb3: %val21 = load ptr, ptr %p1 %val22 = load i64, ptr %p2 br label %exit bbb4: %val31 = load ptr, ptr %p1 %val32 = load i64, ptr %p2 br label %exit exit: %val1 = phi ptr [%val11, %bbb2], [%val21, %bbb3], [%val31, %bbb4] %val2 = phi i64 [%val12, %bbb2], [%val22, %bbb3], [%val32, %bbb4] %tmp = insertvalue { ptr, i64 } poison, ptr %val1, 0 %res = insertvalue { ptr, i64 } %tmp, i64 %val2, 1 ret {ptr, i64} %res }

We remove extractvalues in one branch, but trade this with insertvalues in two branches. This doesn't look profitable, especially once we extend this to even more branches that don't fold.

This doesn't look profitable, especially once we extend this to even more branches that don't fold.

The latter one is better: https://godbolt.org/z/Yfhjhd3Yn

Because we limits the insertvalue to be moved to immediate predecessors and UseBB is the only successor of PredBB, the total execution count of UseBB is the sum of all predecessors. So once we have found some source aggregate in a PredBB, add insertvalue to all other predecessors still results in less dynamic count of insertvalue instructions.

nikic · 2024-11-19T10:53:25Z

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp

+    // If OrigIVI is in UseBB and it's the only successor of PredBB, PredBB
+    // can't be in inner loop.
+    if (UseBB != OrigBB)
+      return nullptr;


I have a hard time understanding why this check is sufficient to prevent cycles in all cases, or even how the UseBB != OrigBB condition is related. I guess this is supposed to check the case where OrigBB is a loop header with phis and UseBB is an exit block with inserts -- but couldn't it fail on non-sunk inserts?

It limits the insertvalue can only be inserted into immediate predecessors. When combining the previous condition that UseBB is the only successor of PredBB, the insertvalue can never be moved across any BB with multiple successors, such as a loop exit node. So it prevents moving insertvalue

across loop iterations

to inner loop from outside

weiguozhi requested a review from nikic as a code owner July 26, 2024 22:48

llvmbot added the llvm:transforms label Jul 26, 2024

weiguozhi requested review from arsenm and gandhi56 July 26, 2024 22:49

weiguozhi requested a review from LebedevRI July 26, 2024 22:52

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Jul 27, 2024

pre-commit: test PR100828

4c861ca

PR Link: llvm/llvm-project#100828

dtcxzyw mentioned this pull request Jul 27, 2024

pre-commit: test PR100828 dtcxzyw/llvm-opt-benchmark#1087

Closed

dtcxzyw requested changes Jul 27, 2024

View reviewed changes

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp Outdated Show resolved Hide resolved

arsenm reviewed Jul 30, 2024

View reviewed changes

llvm/test/Transforms/InstCombine/fold-aggregate-reconstruction.ll Outdated Show resolved Hide resolved

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Jul 30, 2024

pre-commit: test PR100828

4369896

PR Link: llvm/llvm-project#100828

dtcxzyw mentioned this pull request Jul 30, 2024

pre-commit: test PR100828 (New) dtcxzyw/llvm-opt-benchmark#1107

Closed

dtcxzyw requested changes Jul 30, 2024

View reviewed changes

weiguozhi closed this Jul 30, 2024

weiguozhi reopened this Jul 30, 2024

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Aug 21, 2024

pre-commit: test PR100828

8660b5c

PR Link: llvm/llvm-project#100828

dtcxzyw mentioned this pull request Aug 21, 2024

pre-commit: test PR100828 (New) dtcxzyw/llvm-opt-benchmark#1223

Closed

dtcxzyw requested changes Aug 21, 2024

View reviewed changes

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Sep 10, 2024

pre-commit: test PR100828

2c0e5af

PR Link: llvm/llvm-project#100828

dtcxzyw mentioned this pull request Sep 10, 2024

pre-commit: test PR100828 (3rd try) dtcxzyw/llvm-opt-benchmark#1288

Closed

arsenm requested a review from dtcxzyw October 16, 2024 04:08

dtcxzyw requested a review from goldsteinn October 16, 2024 07:39

dtcxzyw reviewed Oct 26, 2024

View reviewed changes

llvm/test/Transforms/InstCombine/fold-aggregate-reconstruction.ll Show resolved Hide resolved

dtcxzyw reviewed Nov 8, 2024

View reviewed changes

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp Outdated Show resolved Hide resolved

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp Outdated Show resolved Hide resolved

weiguozhi added 9 commits November 18, 2024 11:52

If a value is defined in UseBB, we can't use it in PredBB, so we can't

e7a60fc

add an insertvalue to PredBB.

Prevent add insertvalue to inner loops.

b15c315

Use control flow and profile information to avoid infinite loop.

f63b726

Remove the usage of BFI

d9fe299

BFI in InstCombine has limited usage, it's usually not available, so remove it. Also avoid constructing constant aggregate.

Change the condition expression.

8db6d10

Change the loop iterator.

aca50cd

Revert "Do the transform"

0737620

This reverts commit aef54025298b496d4f0b8766816700659b2f9015. This optimization has been implemented in llvm@9a844a3.

Rebase test case.

954f6c8

weiguozhi force-pushed the combineInsertValue branch from 9cd8625 to 954f6c8 Compare November 18, 2024 22:26

llvmbot added the llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes label Nov 18, 2024

dtcxzyw mentioned this pull request Nov 19, 2024

pre-commit: PR100828 dtcxzyw/llvm-opt-benchmark#1714

Closed

dtcxzyw approved these changes Nov 19, 2024

View reviewed changes

nikic reviewed Nov 19, 2024

View reviewed changes

Fixing styles.

087f907

nikic approved these changes Nov 20, 2024

View reviewed changes

weiguozhi merged commit 9424f3d into llvm:main Nov 21, 2024
6 of 8 checks passed

	for (auto [Idx, Val] : enumerate(AggElts)) {
	for (auto Val : AggElts) {

	Builder.SetInsertPoint(Pred, Pred->getTerminator()->getIterator());
	Builder.SetInsertPoint(Pred->getTerminator());

[InstCombine] Extend folding of aggregate construction to cases when source aggregates are partially available #100828

[InstCombine] Extend folding of aggregate construction to cases when source aggregates are partially available #100828

Uh oh!

Conversation

weiguozhi commented Jul 26, 2024

Uh oh!

llvmbot commented Jul 26, 2024

Uh oh!

Uh oh!

Uh oh!

dtcxzyw left a comment

Choose a reason for hiding this comment

Uh oh!

weiguozhi commented Jul 30, 2024

Uh oh!

weiguozhi commented Jul 30, 2024

Uh oh!

weiguozhi commented Aug 20, 2024

Uh oh!

dtcxzyw left a comment

Choose a reason for hiding this comment

Uh oh!

weiguozhi commented Aug 21, 2024

Uh oh!

dtcxzyw commented Aug 21, 2024

Uh oh!

weiguozhi commented Aug 21, 2024

Uh oh!

weiguozhi commented Sep 9, 2024

Uh oh!

weiguozhi commented Sep 16, 2024

Uh oh!

nikic commented Sep 16, 2024

Uh oh!

weiguozhi commented Sep 18, 2024

Uh oh!

weiguozhi commented Oct 15, 2024

Uh oh!

dtcxzyw commented Oct 16, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

weiguozhi commented Oct 18, 2024

Uh oh!

weiguozhi commented Oct 25, 2024

Uh oh!

Uh oh!

dtcxzyw commented Oct 30, 2024

Uh oh!

weiguozhi commented Nov 8, 2024

Uh oh!

dtcxzyw commented Nov 8, 2024

Uh oh!

Uh oh!

Uh oh!

weiguozhi commented Nov 8, 2024

Uh oh!

dtcxzyw commented Nov 10, 2024

Uh oh!

dtcxzyw left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

dtcxzyw commented Oct 16, 2024 •

edited

Loading