[LV] Don't simplify wide binops to constants if non-uniform #121898


Closed
wants to merge 2 commits

Conversation

lukel97
Contributor

@lukel97 lukel97 commented Jan 7, 2025

After 6d6eea9, we started simplifying more operands of widened binops when they were known to be constant via SCEV.

However in the example in #119173, we were simplifying a reduction phi that was constant in the original IR:

<x1> vector loop: {
  vector.body:
    WIDEN-REDUCTION-PHI ir<%add45> = phi ir<5>, ir<%add> ; <5, 0, 0, 0>
    WIDEN ir<%add> = add ir<0>, ir<%add45> ; <5, 0, 0, 0>
  No successors
}

-->

<x1> vector loop: {
  vector.body:
    WIDEN-REDUCTION-PHI ir<%add45> = phi ir<5>, ir<%add> ; <5, 0, 0, 0>
    WIDEN ir<%add> = add ir<0>, ir<5> ; <5, 5, 5, 5>
  No successors
}

Whilst the underlying value is constant, the widened reduction PHI isn't uniform, so we can't simplify it.

This fixes #119173 by checking that the operand is known to be uniform. The same fix is also needed in the legacy cost model, in order to avoid re-triggering the cost-model mismatch assertion from #107015.

@llvmbot
Member

llvmbot commented Jan 7, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes

After 6d6eea9 we started simplifying more operands of binops when they were known constant via SCEV.

However in the example in #119173, we were simplifying a reduction phi that was constant in the original IR:

<x1> vector loop: {
  vector.body:
    WIDEN-REDUCTION-PHI ir<%add45> = phi ir<5>, ir<%add> ; <5, 0, 0, 0>
    WIDEN ir<%add> = add ir<0>, ir<%add45> ; <5, 0, 0, 0>
  No successors
}

-->

<x1> vector loop: {
  vector.body:
    WIDEN-REDUCTION-PHI ir<%add45> = phi ir<5>, ir<%add> ; <5, 0, 0, 0>
    WIDEN ir<%add> = add ir<0>, ir<5> ; <5, 5, 5, 5>
  No successors
}

Whilst the underlying value is constant, the widened reduction PHI isn't uniform so we can't simplify it.

This fixes #119173 by checking if the operand is known to be uniform, but also requires doing the same fix in the legacy cost model as well in order to avoid the cost-model mismatch assertion from #107015 again.


Full diff: https://github.com/llvm/llvm-project/pull/121898.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+11-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/mul-simplification.ll (+55-18)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 0797100b182cb1..dadad7a4d81287 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -6716,9 +6716,16 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I,
     // If we're speculating on the stride being 1, the multiplication may
     // fold away.  We can generalize this for all operations using the notion
     // of neutral elements.  (TODO)
+    auto IsAlwaysOne = [this, VF](Value *V) {
+      // Reduction phi SCEVs may be constant when scalar, but non-uniform when
+      // vectorized and unfoldable.
+      if (auto *I = dyn_cast<Instruction>(V);
+          I && !isUniformAfterVectorization(I, VF))
+        return false;
+      return PSE.getSCEV(V)->isOne();
+    };
     if (I->getOpcode() == Instruction::Mul &&
-        (PSE.getSCEV(I->getOperand(0))->isOne() ||
-         PSE.getSCEV(I->getOperand(1))->isOne()))
+        (IsAlwaysOne(I->getOperand(0)) || IsAlwaysOne(I->getOperand(1))))
       return 0;
 
     // Detect reduction patterns
@@ -8632,6 +8639,8 @@ VPWidenRecipe *VPRecipeBuilder::tryToWiden(Instruction *I,
       // to replace operands with constants.
       ScalarEvolution &SE = *PSE.getSE();
       auto GetConstantViaSCEV = [this, &SE](VPValue *Op) {
+        if (!vputils::isUniformAfterVectorization(Op))
+          return Op;
         Value *V = Op->getUnderlyingValue();
         if (isa<Constant>(V) || !SE.isSCEVable(V->getType()))
           return Op;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/mul-simplification.ll b/llvm/test/Transforms/LoopVectorize/AArch64/mul-simplification.ll
index 0ff98d2abe776c..a8d44421a3c37b 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/mul-simplification.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/mul-simplification.ll
@@ -7,24 +7,10 @@ target triple = "arm64-apple-macosx"
 define i64 @mul_select_operand_known_1_via_scev() {
 ; CHECK-LABEL: define i64 @mul_select_operand_known_1_via_scev() {
 ; CHECK-NEXT:  [[ENTRY:.*]]:
-; CHECK-NEXT:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
-; CHECK:       [[VECTOR_PH]]:
-; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
-; CHECK:       [[VECTOR_BODY]]:
-; CHECK-NEXT:    [[INDEX:%.*]] = phi i32 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
-; CHECK-NEXT:    [[VEC_PHI:%.*]] = phi <2 x i64> [ <i64 12, i64 1>, %[[VECTOR_PH]] ], [ [[VEC_PHI]], %[[VECTOR_BODY]] ]
-; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 2
-; CHECK-NEXT:    br i1 true, label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
-; CHECK:       [[MIDDLE_BLOCK]]:
-; CHECK-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vector.reduce.mul.v2i64(<2 x i64> [[VEC_PHI]])
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
-; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_MERGE_RDX:%.*]] = phi i64 [ [[TMP0]], %[[MIDDLE_BLOCK]] ], [ 12, %[[ENTRY]] ]
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 2, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
 ; CHECK-NEXT:    br label %[[LOOP:.*]]
 ; CHECK:       [[LOOP]]:
-; CHECK-NEXT:    [[RED:%.*]] = phi i64 [ [[BC_MERGE_RDX]], %[[SCALAR_PH]] ], [ [[RED_NEXT:%.*]], %[[LOOP]] ]
-; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
+; CHECK-NEXT:    [[RED:%.*]] = phi i64 [ 12, %[[ENTRY]] ], [ [[RED_NEXT:%.*]], %[[LOOP]] ]
+; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ [[IV_NEXT:%.*]], %[[LOOP]] ]
 ; CHECK-NEXT:    [[TMP1:%.*]] = and i32 [[IV]], 1
 ; CHECK-NEXT:    [[CMP1_I:%.*]] = icmp eq i32 [[TMP1]], 0
 ; CHECK-NEXT:    [[NARROW_I:%.*]] = select i1 [[CMP1_I]], i32 1, i32 [[IV]]
@@ -32,9 +18,9 @@ define i64 @mul_select_operand_known_1_via_scev() {
 ; CHECK-NEXT:    [[RED_NEXT]] = mul nsw i64 [[RED]], [[MUL]]
 ; CHECK-NEXT:    [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
 ; CHECK-NEXT:    [[EC:%.*]] = icmp eq i32 [[IV]], 1
-; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT]], label %[[LOOP]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT:    br i1 [[EC]], label %[[EXIT:.*]], label %[[LOOP]]
 ; CHECK:       [[EXIT]]:
-; CHECK-NEXT:    [[RES:%.*]] = phi i64 [ [[RED_NEXT]], %[[LOOP]] ], [ [[TMP0]], %[[MIDDLE_BLOCK]] ]
+; CHECK-NEXT:    [[RES:%.*]] = phi i64 [ [[RED_NEXT]], %[[LOOP]] ]
 ; CHECK-NEXT:    ret i64 [[RES]]
 ;
 entry:
@@ -56,6 +42,57 @@ exit:
   %res = phi i64 [ %red.next, %loop ]
   ret i64 %res
 }
+
+define i32 @add_reduction_select_operand_constant_but_non_uniform() {
+; CHECK-LABEL: define i32 @add_reduction_select_operand_constant_but_non_uniform() {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; CHECK:       [[VECTOR_PH]]:
+; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
+; CHECK:       [[VECTOR_BODY]]:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i32 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[VEC_PHI:%.*]] = phi <4 x i32> [ <i32 42, i32 0, i32 0, i32 0>, %[[VECTOR_PH]] ], [ [[TMP2:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[VEC_PHI1:%.*]] = phi <4 x i32> [ zeroinitializer, %[[VECTOR_PH]] ], [ [[TMP1:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[TMP2]] = add <4 x i32> zeroinitializer, [[VEC_PHI]]
+; CHECK-NEXT:    [[TMP1]] = add <4 x i32> zeroinitializer, [[VEC_PHI1]]
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i32 [[INDEX]], 8
+; CHECK-NEXT:    [[TMP0:%.*]] = icmp eq i32 [[INDEX_NEXT]], 64
+; CHECK-NEXT:    br i1 [[TMP0]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       [[MIDDLE_BLOCK]]:
+; CHECK-NEXT:    [[BIN_RDX:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]
+; CHECK-NEXT:    [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX]])
+; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
+; CHECK:       [[SCALAR_PH]]:
+; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i32 [ 64, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; CHECK-NEXT:    [[BC_MERGE_RDX:%.*]] = phi i32 [ [[TMP3]], %[[MIDDLE_BLOCK]] ], [ 42, %[[ENTRY]] ]
+; CHECK-NEXT:    br label %[[LOOP:.*]]
+; CHECK:       [[LOOP]]:
+; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[ADD2_REASS:%.*]], %[[LOOP]] ]
+; CHECK-NEXT:    [[RDX:%.*]] = phi i32 [ [[BC_MERGE_RDX]], %[[SCALAR_PH]] ], [ [[RDX_NEXT:%.*]], %[[LOOP]] ]
+; CHECK-NEXT:    [[ADD2_REASS]] = add i32 [[IV]], 1
+; CHECK-NEXT:    [[RDX_NEXT]] = add i32 0, [[RDX]]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ult i32 [[ADD2_REASS]], 64
+; CHECK-NEXT:    br i1 [[CMP]], label %[[LOOP]], label %[[EXIT]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    [[ADD_LCSSA:%.*]] = phi i32 [ [[RDX_NEXT]], %[[LOOP]] ], [ [[TMP3]], %[[MIDDLE_BLOCK]] ]
+; CHECK-NEXT:    ret i32 [[ADD_LCSSA]]
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
+  %rdx = phi i32 [ 42, %entry ], [ %rdx.next, %loop ]
+
+  %iv.next = add i32 %iv, 1
+  %rdx.next = add i32 0, %rdx
+
+  %cmp = icmp ult i32 %iv.next, 64
+  br i1 %cmp, label %loop, label %exit
+
+exit:
+  ret i32 %rdx.next
+}
 ;.
 ; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
 ; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}

@llvmbot
Member

llvmbot commented Jan 7, 2025

@llvm/pr-subscribers-vectorizers


Contributor

@antoniofrighetto antoniofrighetto left a comment


Thanks for fixing it, this makes sense to me. Let's wait for Florian's feedback as well.

@arcbbb
Copy link
Contributor

arcbbb commented Jan 10, 2025

I was pondering this: if we replace the reduction phi with a constant, the vector operation wouldn't be necessary.
Something like this: https://github.com/arcbbb/llvm-project/commit/fix-reduc-opt/

@fhahn
Copy link
Contributor

fhahn commented Jan 16, 2025

Would it be possible to add the test case first? I've been thinking about some potential alternatives, but it seems like none of the options are really great :(

@lukel97
Copy link
Contributor Author

lukel97 commented Jan 17, 2025

Would it be possible to add the test case first? I've been thinking about some potential alternatives, but it seems like none of the options are really great :(

I committed the test case in e83e0c3. This isn't an area of code that I'm super familiar with so I'd be happy to close this PR and let you take over if needed!

@ayalz
Copy link
Collaborator

ayalz commented Jan 20, 2025

... I've been thinking about some potential alternatives, but it seems like none of the options are really great :(

An add Reduction with a constant addend is an Induction, by definition, but a zero addend leads SCEV (rightfully) to consider the Phi an invariant rather than an AddRec?

Such a redundant invariant Phi chain best be RAUW'd with its start value? Better apply such a scalar preliminary cleanup before reaching LV, rather than within it?

Contributor

@fhahn fhahn left a comment


Sorry this dropped off my radar.

I think it would probably be better to only use SCEV for live-ins for now, to avoid introducing another divergence in case LoopVectorizationCostModel::isUniform... and the vputils version disagree. I tried something like this in #125436, and it looks like it also only impacts the tests as here.

We should be able to do @ayal's suggestion building on top of #124432 eventually, i.e. a very rough version like fhahn@b8e0f63

@lukel97
Copy link
Contributor Author

lukel97 commented Feb 3, 2025

I think it would probably be better to only use SCEV for live-ins for now, to avoid introducing another divergence in case LoopVectorizationCostModel::isUniform... and the vputils version disagree. I tried something like this in #125436, and it looks like it also only impacts the tests as here.

Agreed that makes sense, closing this PR

@lukel97 lukel97 closed this Feb 3, 2025

Successfully merging this pull request may close these issues.

[clang] Miscompilation at -O3
6 participants