Skip to content

[InstCombine] Re-queue users of phi when nsw/nuw flags of add are inferred #113933

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 18, 2024

Conversation

dtcxzyw
Copy link
Member

@dtcxzyw dtcxzyw commented Oct 28, 2024

This patch re-queue users of phi when one of its incoming add instructions is updated. If an add instruction is updated, the analysis results of phis may be improved. Thus we may further fold some users of this phi node.

See the following case:

define i8 @trunc_in_loop_exit_block() {
; CHECK-LABEL: @trunc_in_loop_exit_block(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    br label [[LOOP:%.*]]
; CHECK:       loop:
; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
; CHECK-NEXT:    [[PHI:%.*]] = phi i32 [ 1, [[ENTRY]] ], [ [[IV_NEXT]], [[LOOP_LATCH]] ]
; CHECK-NEXT:    [[CMP:%.*]] = icmp samesign ult i32 [[IV]], 100
; CHECK-NEXT:    br i1 [[CMP]], label [[LOOP_LATCH]], label [[EXIT:%.*]]
; CHECK:       loop.latch:
; CHECK-NEXT:    [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
; CHECK-NEXT:    br label [[LOOP]]
; CHECK:       exit:
; CHECK-NEXT:    [[TRUNC:%.*]] = trunc i32 [[PHI]] to i8
; CHECK-NEXT:    ret i8 [[TRUNC]]
;
entry:
  br label %loop

loop:
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]
  %phi = phi i32 [ 1, %entry ], [ %iv.next, %loop.latch ]
  %cmp = icmp ult i32 %iv, 100
  br i1 %cmp, label %loop.latch, label %exit

loop.latch:
  %iv.next = add i32 %iv, 1
  br label %loop

exit:
  %trunc = trunc i32 %phi to i8
  ret i8 %trunc
}

%iv u< 100 -> infer nsw/nuw for %iv.next = add i32 %iv, 1
-> %iv is non-negative -> infer samesign for %cmp = icmp ult i32 %iv, 100.
Without re-queuing users of phi nodes, we cannot improve %cmp in one iteration.

However, it is weird that this patch doesn't handle new add instructions with nsw/nuw flags. I tried to do this in InstructionWorklist::pushUsersToWorkList. Unfortunately the compile-time impact is huge: http://llvm-compile-time-tracker.com/compare.php?from=2d26ef09fc87472cd42ea219c8f9267599872958&to=c92bfb8e4b98790c444031dda95d78772d9e1ea4&stat=instructions%3Au

If this patch makes sense, I will add some comments and more tests.

Address review comment #112642 (comment). This patch also fixes some non-fixpoint issues in tests.

if (auto *PHI = dyn_cast<PHINode>(U))
Worklist.pushUsersToWorkList(*PHI);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This obviously has a positive effect, but I don't quite understand why this isn't already handled by pushUsersToWorkList in the normal InstCombine loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand why this isn't already handled by pushUsersToWorkList in the normal InstCombine loop

Take the above case as an example. If the add instruction is updated, %iv and %phi will be re-queued. But %cmp cannot be re-queued since we do not change %iv.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah duh

@dtcxzyw dtcxzyw force-pushed the perf/add-phi-worklist branch from 8dbe0d8 to cbcc1dd Compare November 11, 2024 06:44
@dtcxzyw dtcxzyw marked this pull request as ready for review November 11, 2024 06:44
@dtcxzyw dtcxzyw requested a review from nikic as a code owner November 11, 2024 06:44
@llvmbot
Copy link
Member

llvmbot commented Nov 11, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Yingwei Zheng (dtcxzyw)

Changes

This patch re-queue users of phi when one of its incoming add instructions is updated. If an add instruction is updated, the analysis results of phis may be improved. Thus we may further fold some users of this phi node.

See the following case:

define i8 @<!-- -->trunc_in_loop_exit_block() {
; CHECK-LABEL: @<!-- -->trunc_in_loop_exit_block(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    br label [[LOOP:%.*]]
; CHECK:       loop:
; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
; CHECK-NEXT:    [[PHI:%.*]] = phi i32 [ 1, [[ENTRY]] ], [ [[IV_NEXT]], [[LOOP_LATCH]] ]
; CHECK-NEXT:    [[CMP:%.*]] = icmp samesign ult i32 [[IV]], 100
; CHECK-NEXT:    br i1 [[CMP]], label [[LOOP_LATCH]], label [[EXIT:%.*]]
; CHECK:       loop.latch:
; CHECK-NEXT:    [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
; CHECK-NEXT:    br label [[LOOP]]
; CHECK:       exit:
; CHECK-NEXT:    [[TRUNC:%.*]] = trunc i32 [[PHI]] to i8
; CHECK-NEXT:    ret i8 [[TRUNC]]
;
entry:
  br label %loop

loop:
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop.latch ]
  %phi = phi i32 [ 1, %entry ], [ %iv.next, %loop.latch ]
  %cmp = icmp ult i32 %iv, 100
  br i1 %cmp, label %loop.latch, label %exit

loop.latch:
  %iv.next = add i32 %iv, 1
  br label %loop

exit:
  %trunc = trunc i32 %phi to i8
  ret i8 %trunc
}

%iv u&lt; 100 -> infer nsw/nuw for %iv.next = add i32 %iv, 1
-> %iv is non-negative -> infer samesign for %cmp = icmp ult i32 %iv, 100.
Without re-queuing users of phi nodes, we cannot improve %cmp in one iteration.

However, it is weird that this patch doesn't handle new add instructions with nsw/nuw flags. I tried to do this in InstructionWorklist::pushUsersToWorkList. Unfortunately the compile-time impact is huge: http://llvm-compile-time-tracker.com/compare.php?from=2d26ef09fc87472cd42ea219c8f9267599872958&amp;to=c92bfb8e4b98790c444031dda95d78772d9e1ea4&amp;stat=instructions%3Au

If this patch makes sense, I will add some comments and more tests.

Address review comment #112642 (comment). This patch also fixes some non-fixpoint issues in tests.


Full diff: https://github.com/llvm/llvm-project/pull/113933.diff

4 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp (+7)
  • (modified) llvm/test/Transforms/InstCombine/2007-10-31-RangeCrash.ll (+3-10)
  • (modified) llvm/test/Transforms/InstCombine/cast_phi.ll (+1-1)
  • (modified) llvm/test/Transforms/SimpleLoopUnswitch/2007-08-01-LCSSA.ll (+6-17)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
index 21588aca512758..cde3c4f3a630db 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
@@ -1868,6 +1868,13 @@ Instruction *InstCombinerImpl::visitAdd(BinaryOperator &I) {
   if (Instruction *Res = foldBinOpOfSelectAndCastOfSelectCondition(I))
     return Res;
 
+  if (Changed) {
+    for (User *U : I.users()) {
+      if (auto *PHI = dyn_cast<PHINode>(U))
+        Worklist.pushUsersToWorkList(*PHI);
+    }
+  }
+
   return Changed ? &I : nullptr;
 }
 
diff --git a/llvm/test/Transforms/InstCombine/2007-10-31-RangeCrash.ll b/llvm/test/Transforms/InstCombine/2007-10-31-RangeCrash.ll
index b5ae08e1daa3af..3936d027de599d 100644
--- a/llvm/test/Transforms/InstCombine/2007-10-31-RangeCrash.ll
+++ b/llvm/test/Transforms/InstCombine/2007-10-31-RangeCrash.ll
@@ -1,25 +1,18 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
 ; RUN: opt < %s -S -passes=instcombine | FileCheck %s
 
-; We do not reach a fixpoint, because we first have to infer nsw on the IV add,
-; and could eliminate the icmp slt afterwards, but don't revisit it.
-
 target datalayout = "E-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f128:64:128"
 
-define i32 @test() "instcombine-no-verify-fixpoint" {
-; CHECK-LABEL: define i32 @test(
-; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+define i32 @test() {
+; CHECK-LABEL: define i32 @test() {
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    br i1 true, label [[BB_I:%.*]], label [[CALCULATECOLORSPECIFICBLACKLEVEL_EXIT:%.*]]
 ; CHECK:       bb.i:
 ; CHECK-NEXT:    br label [[BB51_I_I:%.*]]
 ; CHECK:       bb27.i.i:
-; CHECK-NEXT:    [[TMP50_I_I:%.*]] = add nsw i32 [[X_0_I_I:%.*]], 2
 ; CHECK-NEXT:    br label [[BB51_I_I]]
 ; CHECK:       bb51.i.i:
-; CHECK-NEXT:    [[X_0_I_I]] = phi i32 [ [[TMP50_I_I]], [[BB27_I_I:%.*]] ], [ 0, [[BB_I]] ]
-; CHECK-NEXT:    [[TMP54_I_I:%.*]] = icmp slt i32 [[X_0_I_I]], 0
-; CHECK-NEXT:    br i1 [[TMP54_I_I]], label [[BB27_I_I]], label [[BB57_I_I:%.*]]
+; CHECK-NEXT:    br i1 false, label [[BB27_I_I:%.*]], label [[BB57_I_I:%.*]]
 ; CHECK:       bb57.i.i:
 ; CHECK-NEXT:    ret i32 0
 ; CHECK:       calculateColorSpecificBlackLevel.exit:
diff --git a/llvm/test/Transforms/InstCombine/cast_phi.ll b/llvm/test/Transforms/InstCombine/cast_phi.ll
index 6b05edc31deb87..a457e520d30cd6 100644
--- a/llvm/test/Transforms/InstCombine/cast_phi.ll
+++ b/llvm/test/Transforms/InstCombine/cast_phi.ll
@@ -316,7 +316,7 @@ define i8 @trunc_in_loop_exit_block() "instcombine-no-verify-fixpoint" {
 ; CHECK:       loop:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
 ; CHECK-NEXT:    [[PHI:%.*]] = phi i32 [ 1, [[ENTRY]] ], [ [[IV_NEXT]], [[LOOP_LATCH]] ]
-; CHECK-NEXT:    [[CMP:%.*]] = icmp ult i32 [[IV]], 100
+; CHECK-NEXT:    [[CMP:%.*]] = icmp samesign ult i32 [[IV]], 100
 ; CHECK-NEXT:    br i1 [[CMP]], label [[LOOP_LATCH]], label [[EXIT:%.*]]
 ; CHECK:       loop.latch:
 ; CHECK-NEXT:    [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
diff --git a/llvm/test/Transforms/SimpleLoopUnswitch/2007-08-01-LCSSA.ll b/llvm/test/Transforms/SimpleLoopUnswitch/2007-08-01-LCSSA.ll
index fb342322b2da7a..1ad57ce936bb9b 100644
--- a/llvm/test/Transforms/SimpleLoopUnswitch/2007-08-01-LCSSA.ll
+++ b/llvm/test/Transforms/SimpleLoopUnswitch/2007-08-01-LCSSA.ll
@@ -1,40 +1,29 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
 ; RUN: opt < %s -S -passes='loop(simple-loop-unswitch),instcombine' -verify-memoryssa | FileCheck %s
 
-; We do not reach a fixpoint, because we first have to infer nsw on the IV add,
-; and could eliminate the icmp slt afterwards, but don't revisit it.
-
 @.str9 = external constant [1 x i8]
 
 declare i32 @strcmp(ptr, ptr)
 
-define i32 @_ZN9Generator6strregEPKc(ptr %this, ptr %s) "instcombine-no-verify-fixpoint" {
+define i32 @_ZN9Generator6strregEPKc(ptr %this, ptr %s) {
 ; CHECK-LABEL: define i32 @_ZN9Generator6strregEPKc(
-; CHECK-SAME: ptr [[THIS:%.*]], ptr [[S:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-SAME: ptr [[THIS:%.*]], ptr [[S:%.*]]) {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[TMP122:%.*]] = icmp eq ptr [[S]], null
 ; CHECK-NEXT:    br label [[BB184:%.*]]
 ; CHECK:       bb55:
 ; CHECK-NEXT:    ret i32 0
 ; CHECK:       bb88:
-; CHECK-NEXT:    br i1 [[TMP122]], label [[BB154:%.*]], label [[BB128:%.*]]
+; CHECK-NEXT:    br i1 poison, label [[BB154:%.*]], label [[BB128:%.*]]
 ; CHECK:       bb128:
-; CHECK-NEXT:    [[TMP138:%.*]] = call i32 @strcmp(ptr noundef nonnull dereferenceable(1) null, ptr noundef nonnull dereferenceable(1) [[S]])
-; CHECK-NEXT:    [[IFTMP_37_0_IN4:%.*]] = icmp eq i32 [[TMP138]], 0
-; CHECK-NEXT:    br i1 [[IFTMP_37_0_IN4]], label [[BB250:%.*]], label [[BB166:%.*]]
+; CHECK-NEXT:    br i1 poison, label [[BB250:%.*]], label [[BB166:%.*]]
 ; CHECK:       bb154:
 ; CHECK-NEXT:    br i1 false, label [[BB250]], label [[BB166]]
 ; CHECK:       bb166:
-; CHECK-NEXT:    [[TMP175:%.*]] = add i32 [[IDX_0:%.*]], 1
-; CHECK-NEXT:    [[TMP183:%.*]] = add nsw i32 [[I33_0:%.*]], 1
 ; CHECK-NEXT:    br label [[BB184]]
 ; CHECK:       bb184:
-; CHECK-NEXT:    [[I33_0]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[TMP183]], [[BB166]] ]
-; CHECK-NEXT:    [[IDX_0]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP175]], [[BB166]] ]
-; CHECK-NEXT:    [[TMP49:%.*]] = icmp slt i32 [[I33_0]], 0
-; CHECK-NEXT:    br i1 [[TMP49]], label [[BB88:%.*]], label [[BB55:%.*]]
+; CHECK-NEXT:    br i1 false, label [[BB88:%.*]], label [[BB55:%.*]]
 ; CHECK:       bb250:
-; CHECK-NEXT:    ret i32 [[IDX_0]]
+; CHECK-NEXT:    ret i32 poison
 ;
 entry:
   %s_addr.0 = select i1 false, ptr @.str9, ptr %s

@@ -1868,6 +1868,13 @@ Instruction *InstCombinerImpl::visitAdd(BinaryOperator &I) {
if (Instruction *Res = foldBinOpOfSelectAndCastOfSelectCondition(I))
return Res;

if (Changed) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're only interested in recurrence phis here, right? So can we check first if an add operand is a phi, and then whether it is in the user list? That way we can avoid the full users scan in most cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not mentioning it earlier, but I think the cleanest way to do this is to use matchSimpleRecurrence to get the PHINode. If it's not a simple recurrence we're unlikely to get anything useful out of it anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@dtcxzyw dtcxzyw force-pushed the perf/add-phi-worklist branch from cbcc1dd to d2a5040 Compare November 16, 2024 10:29
@llvmbot llvmbot added the llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes label Nov 16, 2024
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dtcxzyw dtcxzyw merged commit db90673 into llvm:main Nov 18, 2024
8 checks passed
@dtcxzyw dtcxzyw deleted the perf/add-phi-worklist branch November 18, 2024 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants