Skip to content

[ConstraintElim] Handle switch cases with the same destination #76928

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

dtcxzyw
Copy link
Member

@dtcxzyw dtcxzyw commented Jan 4, 2024

The original implementation only adds the fact cond == C if BB dominates DestBB and there is a unique edge from BB to DestBB.
This patch adds more range information for switch cases with the same destination if BB is the unique predecessor of DestBB.

The original motivation of this patch is to optimize a pattern in cvc5. For example:

%bf.load.i = load i16, ptr %d_kind.i, align 8
  %bf.clear.i = and i16 %bf.load.i, 1023
  %bf.cast.i = zext nneg i16 %bf.clear.i to i32
  switch i32 %bf.cast.i, label %if.else [
    i32 335, label %if.then
    i32 303, label %if.then
  ]

if.then:                                          ; preds = %entry, %entry
  %d_children.i.i = getelementptr inbounds %"class.cvc5::internal::expr::NodeValue", ptr %0, i64 0, i32 3
  %cmp.i.i.i.i.i = icmp eq i16 %bf.clear.i, 1023
  %cond.i.i.i.i.i = select i1 %cmp.i.i.i.i.i, i32 -1, i32 %bf.cast.i

%cmp.i.i.i.i.i always evaluates to false because %bf.clear.i can only be 335 or 303.

Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=8c72ff716b3e4b298695fa3faf6add860c6dbcb2&to=24a93e6c04b60b5d7d567379b8a0cd082c6554b0&stat=instructions%3Au

stage1-O3 stage1-ReleaseThinLTO stage1-ReleaseLTO-g stage1-O0-g stage2-O3 stage2-O0-g stage2-clang
-0.00% +0.02% +0.01% +0.03% -0.05% +0.03% +0.02%

@llvmbot
Copy link
Member

llvmbot commented Jan 4, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Yingwei Zheng (dtcxzyw)

Changes

The original implementation only adds the fact cond == C if BB dominates DestBB and there is a unique edge from BB to DestBB.
This patch adds more range information for switch cases with the same destination if BB is the unique predecessor of DestBB.

The original motivation of this patch is to optimize a pattern in cvc5. For example:

%bf.load.i = load i16, ptr %d_kind.i, align 8
  %bf.clear.i = and i16 %bf.load.i, 1023
  %bf.cast.i = zext nneg i16 %bf.clear.i to i32
  switch i32 %bf.cast.i, label %if.else [
    i32 335, label %if.then
    i32 303, label %if.then
  ]

if.then:                                          ; preds = %entry, %entry
  %d_children.i.i = getelementptr inbounds %"class.cvc5::internal::expr::NodeValue", ptr %0, i64 0, i32 3
  %cmp.i.i.i.i.i = icmp eq i16 %bf.clear.i, 1023
  %cond.i.i.i.i.i = select i1 %cmp.i.i.i.i.i, i32 -1, i32 %bf.cast.i

%cmp.i.i.i.i.i always evaluates to false because %bf.clear.i can only be 335 or 303.

Compile-time impact: http://llvm-compile-time-tracker.com/compare.php?from=8c72ff716b3e4b298695fa3faf6add860c6dbcb2&to=24a93e6c04b60b5d7d567379b8a0cd082c6554b0&stat=instructions%3Au

stage1-O3 stage1-ReleaseThinLTO stage1-ReleaseLTO-g stage1-O0-g stage2-O3 stage2-O0-g stage2-clang
-0.00% +0.02% +0.01% +0.03% -0.05% +0.03% +0.02%

Full diff: https://github.com/llvm/llvm-project/pull/76928.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/ConstraintElimination.cpp (+42-6)
  • (modified) llvm/test/Transforms/ConstraintElimination/switch.ll (+113-4)
diff --git a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
index cc93c8617c05e2..59c23ea5993f6f 100644
--- a/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
+++ b/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
@@ -1057,13 +1057,49 @@ void State::addInfoFor(BasicBlock &BB) {
   }
 
   if (auto *Switch = dyn_cast<SwitchInst>(BB.getTerminator())) {
-    for (auto &Case : Switch->cases()) {
-      BasicBlock *Succ = Case.getCaseSuccessor();
-      Value *V = Case.getCaseValue();
-      if (!canAddSuccessor(BB, Succ))
+    DenseMap<BasicBlock *, SmallVector<ConstantInt *>> BBMap;
+    for (auto &Case : Switch->cases())
+      BBMap[Case.getCaseSuccessor()].push_back(Case.getCaseValue());
+    // TODO: handle default dest
+    BBMap.erase(Switch->getDefaultDest());
+    Value *Cond = Switch->getCondition();
+    Type *Ty = Switch->getCondition()->getType();
+    unsigned BitWidth = Ty->getScalarSizeInBits();
+    for (auto &[Succ, Values] : BBMap) {
+      if (Succ->getUniquePredecessor() != &BB)
         continue;
-      WorkList.emplace_back(FactOrCheck::getConditionFact(
-          DT.getNode(Succ), CmpInst::ICMP_EQ, Switch->getCondition(), V));
+      DomTreeNode *Node = DT.getNode(Succ);
+      if (Values.size() == 1)
+        WorkList.emplace_back(FactOrCheck::getConditionFact(
+            Node, CmpInst::ICMP_EQ, Cond, Values.back()));
+      else {
+        APInt SignedMin = APInt::getSignedMaxValue(BitWidth);
+        APInt SignedMax = APInt::getSignedMinValue(BitWidth);
+        APInt UnsignedMin = APInt::getMaxValue(BitWidth);
+        APInt UnsignedMax = APInt::getMinValue(BitWidth);
+
+        for (ConstantInt *V : Values) {
+          const APInt &C = V->getValue();
+          SignedMin = APIntOps::smin(SignedMin, C);
+          SignedMax = APIntOps::smax(SignedMax, C);
+          UnsignedMin = APIntOps::umin(UnsignedMin, C);
+          UnsignedMax = APIntOps::umax(UnsignedMax, C);
+        }
+        if (!SignedMin.isMinSignedValue())
+          WorkList.emplace_back(FactOrCheck::getConditionFact(
+              Node, CmpInst::ICMP_SGE, Cond, ConstantInt::get(Ty, SignedMin)));
+        if (!SignedMax.isMaxSignedValue())
+          WorkList.emplace_back(FactOrCheck::getConditionFact(
+              Node, CmpInst::ICMP_SLE, Cond, ConstantInt::get(Ty, SignedMax)));
+        if (!UnsignedMin.isMinValue())
+          WorkList.emplace_back(
+              FactOrCheck::getConditionFact(Node, CmpInst::ICMP_UGE, Cond,
+                                            ConstantInt::get(Ty, UnsignedMin)));
+        if (!UnsignedMax.isMaxValue())
+          WorkList.emplace_back(
+              FactOrCheck::getConditionFact(Node, CmpInst::ICMP_ULE, Cond,
+                                            ConstantInt::get(Ty, UnsignedMax)));
+      }
     }
     return;
   }
diff --git a/llvm/test/Transforms/ConstraintElimination/switch.ll b/llvm/test/Transforms/ConstraintElimination/switch.ll
index b104b26e83cff2..fd581d081b4d3b 100644
--- a/llvm/test/Transforms/ConstraintElimination/switch.ll
+++ b/llvm/test/Transforms/ConstraintElimination/switch.ll
@@ -151,10 +151,8 @@ define i1 @switch_same_destination_for_different_cases(i8 %x) {
 ; CHECK:       exit.2:
 ; CHECK-NEXT:    [[C_3:%.*]] = icmp ult i8 [[X]], 7
 ; CHECK-NEXT:    call void @use(i1 [[C_3]])
-; CHECK-NEXT:    [[C_4:%.*]] = icmp ult i8 [[X]], 6
-; CHECK-NEXT:    call void @use(i1 [[C_4]])
-; CHECK-NEXT:    [[C_5:%.*]] = icmp ult i8 [[X]], 11
-; CHECK-NEXT:    call void @use(i1 [[C_5]])
+; CHECK-NEXT:    call void @use(i1 false)
+; CHECK-NEXT:    call void @use(i1 true)
 ; CHECK-NEXT:    [[C_6:%.*]] = icmp ult i8 [[X]], 10
 ; CHECK-NEXT:    ret i1 [[C_6]]
 ;
@@ -181,4 +179,115 @@ exit.2:
   ret i1 %c.6
 }
 
+define i1 @test_switch_with_same_dest(i32 %a) {
+; CHECK-LABEL: @test_switch_with_same_dest(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[B:%.*]] = and i32 [[A:%.*]], 1023
+; CHECK-NEXT:    switch i32 [[B]], label [[SW_DEFAULT:%.*]] [
+; CHECK-NEXT:      i32 37, label [[SW_BB:%.*]]
+; CHECK-NEXT:      i32 38, label [[SW_BB]]
+; CHECK-NEXT:    ]
+; CHECK:       sw.bb:
+; CHECK-NEXT:    ret i1 false
+; CHECK:       sw.default:
+; CHECK-NEXT:    ret i1 false
+;
+entry:
+  %b = and i32 %a, 1023
+  switch i32 %b, label %sw.default [
+  i32 37, label %sw.bb
+  i32 38, label %sw.bb
+  ]
+
+sw.bb:
+  %cmp = icmp eq i32 %b, 1023
+  ret i1 %cmp
+sw.default:
+  ret i1 false
+}
+
+define i1 @test_switch_with_same_dest_zext(i16 %a) {
+; CHECK-LABEL: @test_switch_with_same_dest_zext(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[B:%.*]] = and i16 [[A:%.*]], 1023
+; CHECK-NEXT:    [[B_EXT:%.*]] = zext nneg i16 [[B]] to i32
+; CHECK-NEXT:    switch i32 [[B_EXT]], label [[SW_DEFAULT:%.*]] [
+; CHECK-NEXT:      i32 37, label [[SW_BB:%.*]]
+; CHECK-NEXT:      i32 38, label [[SW_BB]]
+; CHECK-NEXT:    ]
+; CHECK:       sw.bb:
+; CHECK-NEXT:    ret i1 false
+; CHECK:       sw.default:
+; CHECK-NEXT:    ret i1 false
+;
+entry:
+  %b = and i16 %a, 1023
+  %b.ext = zext nneg i16 %b to i32
+  switch i32 %b.ext, label %sw.default [
+  i32 37, label %sw.bb
+  i32 38, label %sw.bb
+  ]
+
+sw.bb:
+  %cmp = icmp eq i16 %b, 1023
+  ret i1 %cmp
+sw.default:
+  ret i1 false
+}
+
+; Negative tests
+
+define i1 @test_switch_with_same_dest_fail1(i32 %a) {
+; CHECK-LABEL: @test_switch_with_same_dest_fail1(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[B:%.*]] = and i32 [[A:%.*]], 1023
+; CHECK-NEXT:    switch i32 [[B]], label [[SW_BB:%.*]] [
+; CHECK-NEXT:      i32 37, label [[SW_BB]]
+; CHECK-NEXT:      i32 38, label [[SW_BB]]
+; CHECK-NEXT:    ]
+; CHECK:       sw.bb:
+; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[B]], 1023
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+entry:
+  %b = and i32 %a, 1023
+  switch i32 %b, label %sw.bb [
+  i32 37, label %sw.bb
+  i32 38, label %sw.bb
+  ]
+
+sw.bb:
+  %cmp = icmp eq i32 %b, 1023
+  ret i1 %cmp
+}
+
+define i1 @test_switch_with_same_dest_fail2(i32 %a, i1 %cond) {
+; CHECK-LABEL: @test_switch_with_same_dest_fail2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[B:%.*]] = and i32 [[A:%.*]], 1023
+; CHECK-NEXT:    br i1 [[COND:%.*]], label [[IF_THEN:%.*]], label [[SW_BB:%.*]]
+; CHECK:       if.then:
+; CHECK-NEXT:    switch i32 [[B]], label [[SW_BB]] [
+; CHECK-NEXT:      i32 37, label [[SW_BB]]
+; CHECK-NEXT:      i32 38, label [[SW_BB]]
+; CHECK-NEXT:    ]
+; CHECK:       sw.bb:
+; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[B]], 1023
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+entry:
+  %b = and i32 %a, 1023
+  br i1 %cond, label %if.then, label %sw.bb
+
+if.then:
+  switch i32 %b, label %sw.bb [
+  i32 37, label %sw.bb
+  i32 38, label %sw.bb
+  ]
+
+sw.bb:
+  %cmp = icmp eq i32 %b, 1023
+  ret i1 %cmp
+}
+
 declare void @use(i1)

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Jan 4, 2024
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a good fit for ConstraintElimination. I'd expect either IPSCCP or CVP to optimize such cases. If they don't, we should fix it there.

@dtcxzyw
Copy link
Member Author

dtcxzyw commented Jan 4, 2024

I don't think this is a good fit for ConstraintElimination. I'd expect either IPSCCP or CVP to optimize such cases. If they don't, we should fix it there.

We have handled this in CVP:

// If the edge was formed by a switch on the value, then we may know exactly
// what it is.
if (SwitchInst *SI = dyn_cast<SwitchInst>(BBFrom->getTerminator())) {
Value *Condition = SI->getCondition();
if (!isa<IntegerType>(Val->getType()))
return ValueLatticeElement::getOverdefined();
bool ValUsesConditionAndMayBeFoldable = false;
if (Condition != Val) {
// Check if Val has Condition as an operand.
if (User *Usr = dyn_cast<User>(Val))
ValUsesConditionAndMayBeFoldable = isOperationFoldable(Usr) &&
usesOperand(Usr, Condition);
if (!ValUsesConditionAndMayBeFoldable)
return ValueLatticeElement::getOverdefined();
}
assert((Condition == Val || ValUsesConditionAndMayBeFoldable) &&
"Condition != Val nor Val doesn't use Condition");
bool DefaultCase = SI->getDefaultDest() == BBTo;
unsigned BitWidth = Val->getType()->getIntegerBitWidth();
ConstantRange EdgesVals(BitWidth, DefaultCase/*isFullSet*/);
for (auto Case : SI->cases()) {
APInt CaseValue = Case.getCaseValue()->getValue();
ConstantRange EdgeVal(CaseValue);
if (ValUsesConditionAndMayBeFoldable) {
User *Usr = cast<User>(Val);
const DataLayout &DL = BBTo->getModule()->getDataLayout();
ValueLatticeElement EdgeLatticeVal =
constantFoldUser(Usr, Condition, CaseValue, DL);
if (EdgeLatticeVal.isOverdefined())
return ValueLatticeElement::getOverdefined();
EdgeVal = EdgeLatticeVal.getConstantRange();
}
if (DefaultCase) {
// It is possible that the default destination is the destination of
// some cases. We cannot perform difference for those cases.
// We know Condition != CaseValue in BBTo. In some cases we can use
// this to infer Val == f(Condition) is != f(CaseValue). For now, we
// only do this when f is identity (i.e. Val == Condition), but we
// should be able to do this for any injective f.
if (Case.getCaseSuccessor() != BBTo && Condition == Val)
EdgesVals = EdgesVals.difference(EdgeVal);
} else if (Case.getCaseSuccessor() == BBTo)
EdgesVals = EdgesVals.unionWith(EdgeVal);
}
return ValueLatticeElement::getRange(std::move(EdgesVals));
}

However, it cannot fix the original issue because the condition of switch is %bf.cast.i but the value to compare is %bf.clear.i. So I fixed it in ConstraintElimination.
If we plan to fix it in CVP, we should make LazyValueInfoImpl::getValueAt recursive, which is more expensive than the approach of this patch.

godbolt: https://godbolt.org/z/38TMcv4xE

@nikic
Copy link
Contributor

nikic commented Jan 4, 2024

@dtcxzyw Can we solve this by folding switch(zext(x)) to switch(x) instead? That seems like a generally beneficial transform...

dtcxzyw added a commit that referenced this pull request Jan 5, 2024
This patch folds `switch(zext/sext(X))` into `switch(X)`.
The original motivation of this patch is to optimize a pattern found in
cvc5. For example:
```
  %bf.load.i = load i16, ptr %d_kind.i, align 8
  %bf.clear.i = and i16 %bf.load.i, 1023
  %bf.cast.i = zext nneg i16 %bf.clear.i to i32
  switch i32 %bf.cast.i, label %if.else [
    i32 335, label %if.then
    i32 303, label %if.then
  ]

if.then:                                          ; preds = %entry, %entry
  %d_children.i.i = getelementptr inbounds %"class.cvc5::internal::expr::NodeValue", ptr %0, i64 0, i32 3
  %cmp.i.i.i.i.i = icmp eq i16 %bf.clear.i, 1023
  %cond.i.i.i.i.i = select i1 %cmp.i.i.i.i.i, i32 -1, i32 %bf.cast.i
```
`%cmp.i.i.i.i.i` always evaluates to false because `%bf.clear.i` can
only be 335 or 303.
Folding `switch i32 %bf.cast.i` to `switch i16 %bf.clear.i` will help
`CVP` to handle this case.
See also
#76928 (comment).

Compile-time impact:
http://llvm-compile-time-tracker.com/compare.php?from=7954c57124b495fbdc73674d71f2e366e4afe522&to=502b13ed34e561d995ae1f724cf06d20008bd86f&stat=instructions:u

|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|+0.03%|+0.06%|+0.07%|+0.00%|-0.02%|-0.03%|+0.02%|
dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Jan 6, 2024
@nikic
Copy link
Contributor

nikic commented Jan 10, 2024

Is this still needed now that the other changes have landed?

@dtcxzyw
Copy link
Member Author

dtcxzyw commented Jan 10, 2024

Is this still needed now that the other changes have landed?

Looks like there are other cases which are not covered by #76988.
I am investigating another case found in bench/cvc5/optimized/quantifiers_rewriter.cpp.ll:

  %bf.load.i633 = load i16, ptr %d_kind.i632, align 8
  %bf.clear.i634 = and i16 %bf.load.i633, 1023
  %bf.cast.i635 = zext nneg i16 %bf.clear.i634 to i32
  %cmp298 = icmp eq i32 %128, %bf.cast.i635
  br i1 %cmp298, label %land.lhs.true299, label %if.else332

land.lhs.true299:                                 ; preds = %invoke.cont294
  switch i32 %128, label %if.else332 [
    i32 21, label %if.then303
    i32 19, label %if.then303
  ]

 if.then303:                                       ; preds = %land.lhs.true299, %land.lhs.true299
   store i8 1, ptr %childChanged, align 1
   %cmp.i.i.i.i.i640 = icmp eq i16 %bf.clear.i634, 1023
   %cond.i.i.i.i.i641 = select i1 %cmp.i.i.i.i.i640, i32 -1, i32 %128

%cmp.i.i.i.i.i640 = icmp eq i16 %bf.clear.i634, 1023 always evaluates to false.

justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
This patch folds `switch(zext/sext(X))` into `switch(X)`.
The original motivation of this patch is to optimize a pattern found in
cvc5. For example:
```
  %bf.load.i = load i16, ptr %d_kind.i, align 8
  %bf.clear.i = and i16 %bf.load.i, 1023
  %bf.cast.i = zext nneg i16 %bf.clear.i to i32
  switch i32 %bf.cast.i, label %if.else [
    i32 335, label %if.then
    i32 303, label %if.then
  ]

if.then:                                          ; preds = %entry, %entry
  %d_children.i.i = getelementptr inbounds %"class.cvc5::internal::expr::NodeValue", ptr %0, i64 0, i32 3
  %cmp.i.i.i.i.i = icmp eq i16 %bf.clear.i, 1023
  %cond.i.i.i.i.i = select i1 %cmp.i.i.i.i.i, i32 -1, i32 %bf.cast.i
```
`%cmp.i.i.i.i.i` always evaluates to false because `%bf.clear.i` can
only be 335 or 303.
Folding `switch i32 %bf.cast.i` to `switch i16 %bf.clear.i` will help
`CVP` to handle this case.
See also
llvm#76928 (comment).

Compile-time impact:
http://llvm-compile-time-tracker.com/compare.php?from=7954c57124b495fbdc73674d71f2e366e4afe522&to=502b13ed34e561d995ae1f724cf06d20008bd86f&stat=instructions:u

|stage1-O3|stage1-ReleaseThinLTO|stage1-ReleaseLTO-g|stage1-O0-g|stage2-O3|stage2-O0-g|stage2-clang|
|--|--|--|--|--|--|--|
|+0.03%|+0.06%|+0.07%|+0.00%|-0.02%|-0.03%|+0.02%|
@dtcxzyw dtcxzyw marked this pull request as draft February 8, 2024 05:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants