-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[Pipelines] Move IPSCCP after inliner pipeline #96620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Moving the Interprocedural Constant Propagation (IPSCCP) pass to run after the inliner pipeline can enhance optimization effectiveness. Performance uplift for SPEC2017:548.exchange2_r on rv64gc is over 40%.
Please update these tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We currently rely on IPSCCP happening early, so it can make use of dominating conditions before they have been converted to selects.
This patch causes some significant performance regressions on llvm-test-suite (rv64gc-O3-thinlto):
|
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-clang Author: SiHuaN (sihuan) ChangesThis patch significantly improves the performance of LLVM for the SPEC2017:548.exchange2_r benchmark, with a performance uplift of over 40% on the rv64gc. During our investigation into the significant performance disparity between GCC and LLVM on the SPEC2017:548.exchange2_r benchmark on RISC-V, we identified that the primary difference stems from constant propagation optimization. $ objdump -D exchange2_r_gcc | grep "digits_2.*:$"
000000000001d480 <__brute_force_MOD_digits_2.isra.0>:
000000000001f0f6 <__brute_force_MOD_digits_2.constprop.7.isra.0>:
000000000001fdd0 <__brute_force_MOD_digits_2.constprop.6.isra.0>:
0000000000020900 <__brute_force_MOD_digits_2.constprop.5.isra.0>:
00000000000211c4 <__brute_force_MOD_digits_2.constprop.4.isra.0>:
0000000000022002 <__brute_force_MOD_digits_2.constprop.3.isra.0>:
0000000000022d6a <__brute_force_MOD_digits_2.constprop.2.isra.0>:
0000000000023898 <__brute_force_MOD_digits_2.constprop.1.isra.0>: However, in LLVM, this function is not split: $ objdump -D exchange2_r_llvm | grep "digits_2.*:$"
00000000000115a0 <_QMbrute_forcePdigits_2>: By applying this patch, LLVM now exhibits similar behavior, resulting in a substantial performance uplift. $ objdump -D exchange2_r_patched_llvm | grep "digits_2.*:$"
0000000000011ab0 <_QMbrute_forcePdigits_2>:
0000000000018a4e <_QMbrute_forcePdigits_2.specialized.1>:
0000000000019820 <_QMbrute_forcePdigits_2.specialized.2>:
000000000001a436 <_QMbrute_forcePdigits_2.specialized.3>:
000000000001ae78 <_QMbrute_forcePdigits_2.specialized.4>:
000000000001ba8e <_QMbrute_forcePdigits_2.specialized.5>:
000000000001c7e6 <_QMbrute_forcePdigits_2.specialized.6>:
000000000001d072 <_QMbrute_forcePdigits_2.specialized.7>:
000000000001dad0 <_QMbrute_forcePdigits_2.specialized.8>: And we used
Additionally, I performed tests on x86_64, yielding similar results:
Full diff: https://github.com/llvm/llvm-project/pull/96620.diff 12 Files Affected:
diff --git a/clang/test/CodeGen/attr-counted-by.c b/clang/test/CodeGen/attr-counted-by.c
index 79922eb4159f1..8d0e39d0e3dad 100644
--- a/clang/test/CodeGen/attr-counted-by.c
+++ b/clang/test/CodeGen/attr-counted-by.c
@@ -639,7 +639,7 @@ void test6(struct anon_struct *p, int index) {
p->array[index] = __builtin_dynamic_object_size(p->array, 1);
}
-// SANITIZE-WITH-ATTR-LABEL: define dso_local i64 @test6_bdos(
+// SANITIZE-WITH-ATTR-LABEL: define dso_local range(i64 0, -3) i64 @test6_bdos(
// SANITIZE-WITH-ATTR-SAME: ptr nocapture noundef readonly [[P:%.*]]) local_unnamed_addr #[[ATTR2]] {
// SANITIZE-WITH-ATTR-NEXT: entry:
// SANITIZE-WITH-ATTR-NEXT: [[DOT_COUNTED_BY_GEP:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 8
@@ -649,7 +649,7 @@ void test6(struct anon_struct *p, int index) {
// SANITIZE-WITH-ATTR-NEXT: [[TMP1:%.*]] = select i1 [[DOTINV]], i64 0, i64 [[TMP0]]
// SANITIZE-WITH-ATTR-NEXT: ret i64 [[TMP1]]
//
-// NO-SANITIZE-WITH-ATTR-LABEL: define dso_local i64 @test6_bdos(
+// NO-SANITIZE-WITH-ATTR-LABEL: define dso_local range(i64 0, -3) i64 @test6_bdos(
// NO-SANITIZE-WITH-ATTR-SAME: ptr nocapture noundef readonly [[P:%.*]]) local_unnamed_addr #[[ATTR2]] {
// NO-SANITIZE-WITH-ATTR-NEXT: entry:
// NO-SANITIZE-WITH-ATTR-NEXT: [[DOT_COUNTED_BY_GEP:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 8
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 926515c9508a9..5659c116e9c95 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -1204,6 +1204,12 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
else
MPM.addPass(buildInlinerPipeline(Level, Phase));
+ MPM.addPass(IPSCCPPass(
+ IPSCCPOptions(/*AllowFuncSpec=*/
+ Level != OptimizationLevel::Os &&
+ Level != OptimizationLevel::Oz &&
+ !isLTOPreLink(Phase))));
+
// Remove any dead arguments exposed by cleanups, constant folding globals,
// and argument promotion.
MPM.addPass(DeadArgumentEliminationPass());
diff --git a/llvm/test/Other/new-pm-defaults.ll b/llvm/test/Other/new-pm-defaults.ll
index 489aed40c190b..32fd2e2d3ffbf 100644
--- a/llvm/test/Other/new-pm-defaults.ll
+++ b/llvm/test/Other/new-pm-defaults.ll
@@ -229,6 +229,7 @@
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Invalidating analysis: ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
+; CHECK-O-NEXT: Running pass: IPSCCPPass
; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
; CHECK-O-NEXT: Running pass: CoroCleanupPass
; CHECK-O-NEXT: Running pass: GlobalOptPass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
index 064362eabbf83..40468e917bc40 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
@@ -1,4 +1,4 @@
-; The IR below was crafted so as:
+ ; The IR below was crafted so as:
; 1) To have a loop, so we create a loop pass manager
; 2) To be "immutable" in the sense that no pass in the standard
; pipeline will modify it.
@@ -156,6 +156,7 @@
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Invalidating analysis: ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
+; CHECK-O-NEXT: Running pass: IPSCCPPass
; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
; CHECK-O-NEXT: Running pass: CoroCleanupPass
; CHECK-POSTLINK-O-NEXT: Running pass: GlobalOptPass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
index 19a44867e434a..b4b63d8cb3d6f 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
@@ -140,6 +140,7 @@
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Invalidating analysis: ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
+; CHECK-O-NEXT: Running pass: IPSCCPPass
; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
; CHECK-O-NEXT: Running pass: CoroCleanupPass
; CHECK-O-NEXT: Running pass: GlobalOptPass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
index ac80a31d8fd4b..72753a169d288 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
@@ -148,6 +148,7 @@
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Invalidating analysis: ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
+; CHECK-O-NEXT: Running pass: IPSCCPPass
; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
; CHECK-O-NEXT: Running pass: CoroCleanupPass
; CHECK-O-NEXT: Running pass: GlobalOptPass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
index 42ef49f8f7c7e..896feeb801f5f 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
@@ -188,6 +188,7 @@
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Invalidating analysis: ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
+; CHECK-O-NEXT: Running pass: IPSCCPPass
; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
; CHECK-O-NEXT: Running pass: CoroCleanupPass
; CHECK-O-NEXT: Running pass: GlobalOptPass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
index e74f88c1a3bf9..687e9d08ee31d 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
@@ -187,6 +187,7 @@
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Invalidating analysis: ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
+; CHECK-O-NEXT: Running pass: IPSCCPPass
; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
; CHECK-O-NEXT: Running pass: CoroCleanupPass
; CHECK-O-NEXT: Running pass: GlobalOptPass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
index 210a4ef1f7664..0f87f3a5b1615 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
@@ -152,6 +152,7 @@
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Invalidating analysis: ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
+; CHECK-O-NEXT: Running pass: IPSCCPPass
; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
; CHECK-O-NEXT: Running pass: CoroCleanupPass
; CHECK-O-NEXT: Running pass: GlobalOptPass
diff --git a/llvm/test/Transforms/PhaseOrdering/AArch64/constraint-elimination-placement.ll b/llvm/test/Transforms/PhaseOrdering/AArch64/constraint-elimination-placement.ll
index 335a850f1ec68..bd29141786ebb 100644
--- a/llvm/test/Transforms/PhaseOrdering/AArch64/constraint-elimination-placement.ll
+++ b/llvm/test/Transforms/PhaseOrdering/AArch64/constraint-elimination-placement.ll
@@ -105,29 +105,20 @@ define void @test2(ptr %this) #0 {
; CHECK-NEXT: i64 17, label [[IF_END_I31:%.*]]
; CHECK-NEXT: ]
; CHECK: if.end.i:
-; CHECK-NEXT: [[CALL8_I_I:%.*]] = tail call fastcc noundef i32 @test2_fn6()
-; CHECK-NEXT: [[TRUNC_I_I:%.*]] = trunc i32 [[CALL8_I_I]] to i8
-; CHECK-NEXT: [[CALL1_I1_I:%.*]] = tail call i1 @test2_fn4(i8 [[TRUNC_I_I]])
+; CHECK-NEXT: [[CALL1_I1_I:%.*]] = tail call i1 @test2_fn4(i8 0)
; CHECK-NEXT: [[TMP0:%.*]] = xor i1 [[CALL1_I1_I]], true
; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP0]])
; CHECK-NEXT: br label [[COMMON_RET]]
; CHECK: test2_fn2.exit12:
-; CHECK-NEXT: [[CALL8_I_I8:%.*]] = tail call fastcc noundef i32 @test2_fn6()
-; CHECK-NEXT: [[TRUNC_I_I9:%.*]] = trunc i32 [[CALL8_I_I8]] to i8
-; CHECK-NEXT: [[CALL1_I1_I10:%.*]] = tail call i1 @test2_fn4(i8 [[TRUNC_I_I9]])
+; CHECK-NEXT: [[CALL1_I1_I10:%.*]] = tail call i1 @test2_fn4(i8 0)
; CHECK-NEXT: [[TMP1:%.*]] = xor i1 [[CALL1_I1_I10]], true
; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP1]])
-; CHECK-NEXT: [[CMP4_I11:%.*]] = icmp eq i32 [[CALL8_I_I8]], 0
-; CHECK-NEXT: br i1 [[CMP4_I11]], label [[TEST2_FN2_EXIT24:%.*]], label [[COMMON_RET]]
-; CHECK: common.ret:
-; CHECK-NEXT: ret void
-; CHECK: test2_fn2.exit24:
; CHECK-NEXT: store i8 0, ptr [[THIS]], align 4
; CHECK-NEXT: br label [[COMMON_RET]]
+; CHECK: common.ret:
+; CHECK-NEXT: ret void
; CHECK: if.end.i31:
-; CHECK-NEXT: [[CALL8_I_I32:%.*]] = tail call fastcc noundef i32 @test2_fn6()
-; CHECK-NEXT: [[TRUNC_I_I33:%.*]] = trunc i32 [[CALL8_I_I32]] to i8
-; CHECK-NEXT: [[CALL1_I1_I34:%.*]] = tail call i1 @test2_fn4(i8 [[TRUNC_I_I33]])
+; CHECK-NEXT: [[CALL1_I1_I34:%.*]] = tail call i1 @test2_fn4(i8 0)
; CHECK-NEXT: [[TMP2:%.*]] = xor i1 [[CALL1_I1_I34]], true
; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP2]])
; CHECK-NEXT: br label [[COMMON_RET]]
@@ -162,16 +153,12 @@ define i1 @test2_fn2(ptr %__rhs) #0 {
; CHECK-NEXT: [[CMP2_NOT:%.*]] = icmp eq i64 [[CALL]], [[COND_I]]
; CHECK-NEXT: br i1 [[CMP2_NOT]], label [[IF_END:%.*]], label [[CLEANUP:%.*]]
; CHECK: if.end:
-; CHECK-NEXT: [[CALL8_I:%.*]] = tail call fastcc noundef i32 @test2_fn6()
-; CHECK-NEXT: [[TRUNC_I:%.*]] = trunc i32 [[CALL8_I]] to i8
-; CHECK-NEXT: [[CALL1_I1:%.*]] = tail call i1 @test2_fn4(i8 [[TRUNC_I]])
+; CHECK-NEXT: [[CALL1_I1:%.*]] = tail call i1 @test2_fn4(i8 0)
; CHECK-NEXT: [[TMP0:%.*]] = xor i1 [[CALL1_I1]], true
; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP0]])
-; CHECK-NEXT: [[CMP4:%.*]] = icmp eq i32 [[CALL8_I]], 0
; CHECK-NEXT: br label [[CLEANUP]]
; CHECK: cleanup:
-; CHECK-NEXT: [[RETVAL_0:%.*]] = phi i1 [ [[CMP4]], [[IF_END]] ], [ false, [[ENTRY:%.*]] ]
-; CHECK-NEXT: ret i1 [[RETVAL_0]]
+; CHECK-NEXT: ret i1 [[CMP2_NOT]]
;
entry:
%call = call i64 @strlen(ptr %__rhs)
@@ -231,11 +218,6 @@ entry:
}
define internal i32 @test2_fn6() {
-; CHECK-LABEL: define internal fastcc noundef i32 @test2_fn6(
-; CHECK-SAME: ) unnamed_addr #[[ATTR5]] {
-; CHECK-NEXT: entry:
-; CHECK-NEXT: ret i32 0
-;
entry:
%call = call i32 @memcmp(ptr @.str.79, ptr @.str.79, i64 2)
ret i32 %call
diff --git a/llvm/test/Transforms/PhaseOrdering/dce-after-argument-promotion.ll b/llvm/test/Transforms/PhaseOrdering/dce-after-argument-promotion.ll
index c33fcfbe6ed97..cc144a700d63a 100644
--- a/llvm/test/Transforms/PhaseOrdering/dce-after-argument-promotion.ll
+++ b/llvm/test/Transforms/PhaseOrdering/dce-after-argument-promotion.ll
@@ -9,10 +9,9 @@
define internal void @f(ptr byval(%struct.ss) align 8 %b, ptr byval(i32) align 4 %X) noinline nounwind {
; CHECK-LABEL: define {{[^@]+}}@f
-; CHECK-SAME: (i32 [[B_0:%.*]]){{[^#]*}} #[[ATTR0:[0-9]+]] {
+; CHECK-SAME: (){{[^#]*}} #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: entry:
-; CHECK-NEXT: [[TEMP:%.*]] = add i32 [[B_0]], 1
-; CHECK-NEXT: store i32 [[TEMP]], ptr [[DUMMY]], align 4
+; CHECK-NEXT: store i32 2, ptr [[DUMMY]], align 4
; CHECK-NEXT: ret void
;
entry:
@@ -27,7 +26,7 @@ define i32 @test(ptr %X) {
; CHECK-LABEL: define {{[^@]+}}@test
; CHECK-SAME: (ptr {{[^%]*}} [[X:%.*]]){{[^#]*}} #[[ATTR1:[0-9]+]] {
; CHECK-NEXT: entry:
-; CHECK-NEXT: tail call {{.*}}void @f(i32 1)
+; CHECK-NEXT: tail call {{.*}}void @f()
; CHECK-NEXT: ret i32 0
;
entry:
diff --git a/llvm/test/Transforms/PhaseOrdering/deletion-of-loops-that-became-side-effect-free.ll b/llvm/test/Transforms/PhaseOrdering/deletion-of-loops-that-became-side-effect-free.ll
index 689f4a9798a75..09c7b1cebef8e 100644
--- a/llvm/test/Transforms/PhaseOrdering/deletion-of-loops-that-became-side-effect-free.ll
+++ b/llvm/test/Transforms/PhaseOrdering/deletion-of-loops-that-became-side-effect-free.ll
@@ -1,7 +1,7 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt -passes='default<O3>' -S < %s | FileCheck %s --check-prefixes=ALL,O3
-; RUN: opt -passes='default<O2>' -S < %s | FileCheck %s --check-prefixes=ALL,O2
-; RUN: opt -passes='default<O1>' -S < %s | FileCheck %s --check-prefixes=ALL,O1
+; RUN: opt -passes='default<O3>' -S < %s | FileCheck %s
+; RUN: opt -passes='default<O2>' -S < %s | FileCheck %s
+; RUN: opt -passes='default<O1>' -S < %s | FileCheck %s
; All these tests should optimize to a single comparison
; of the original argument with null. There should be no loops.
@@ -9,10 +9,10 @@
%struct.node = type { ptr, i32 }
define dso_local zeroext i1 @is_not_empty_variant1(ptr %p) {
-; ALL-LABEL: @is_not_empty_variant1(
-; ALL-NEXT: entry:
-; ALL-NEXT: [[TOBOOL_NOT3_I:%.*]] = icmp ne ptr [[P:%.*]], null
-; ALL-NEXT: ret i1 [[TOBOOL_NOT3_I]]
+; CHECK-LABEL: @is_not_empty_variant1(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TOBOOL_NOT3_I:%.*]] = icmp ne ptr [[P:%.*]], null
+; CHECK-NEXT: ret i1 [[TOBOOL_NOT3_I]]
;
entry:
%p.addr = alloca ptr, align 8
@@ -51,10 +51,10 @@ while.end:
}
define dso_local zeroext i1 @is_not_empty_variant2(ptr %p) {
-; ALL-LABEL: @is_not_empty_variant2(
-; ALL-NEXT: entry:
-; ALL-NEXT: [[TOBOOL_NOT4_I:%.*]] = icmp ne ptr [[P:%.*]], null
-; ALL-NEXT: ret i1 [[TOBOOL_NOT4_I]]
+; CHECK-LABEL: @is_not_empty_variant2(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TOBOOL_NOT4_I:%.*]] = icmp ne ptr [[P:%.*]], null
+; CHECK-NEXT: ret i1 [[TOBOOL_NOT4_I]]
;
entry:
%p.addr = alloca ptr, align 8
@@ -96,35 +96,10 @@ while.end:
}
define dso_local zeroext i1 @is_not_empty_variant3(ptr %p) {
-; O3-LABEL: @is_not_empty_variant3(
-; O3-NEXT: entry:
-; O3-NEXT: [[TOBOOL_NOT4_I:%.*]] = icmp ne ptr [[P:%.*]], null
-; O3-NEXT: ret i1 [[TOBOOL_NOT4_I]]
-;
-; O2-LABEL: @is_not_empty_variant3(
-; O2-NEXT: entry:
-; O2-NEXT: [[TOBOOL_NOT4_I:%.*]] = icmp ne ptr [[P:%.*]], null
-; O2-NEXT: ret i1 [[TOBOOL_NOT4_I]]
-;
-; O1-LABEL: @is_not_empty_variant3(
-; O1-NEXT: entry:
-; O1-NEXT: [[TOBOOL_NOT4_I:%.*]] = icmp eq ptr [[P:%.*]], null
-; O1-NEXT: br i1 [[TOBOOL_NOT4_I]], label [[COUNT_NODES_VARIANT3_EXIT:%.*]], label [[WHILE_BODY_I:%.*]]
-; O1: while.body.i:
-; O1-NEXT: [[SIZE_06_I:%.*]] = phi i64 [ [[INC_I:%.*]], [[WHILE_BODY_I]] ], [ 0, [[ENTRY:%.*]] ]
-; O1-NEXT: [[P_ADDR_05_I:%.*]] = phi ptr [ [[TMP0:%.*]], [[WHILE_BODY_I]] ], [ [[P]], [[ENTRY]] ]
-; O1-NEXT: [[CMP_I:%.*]] = icmp ne i64 [[SIZE_06_I]], -1
-; O1-NEXT: call void @llvm.assume(i1 [[CMP_I]])
-; O1-NEXT: [[TMP0]] = load ptr, ptr [[P_ADDR_05_I]], align 8
-; O1-NEXT: [[INC_I]] = add i64 [[SIZE_06_I]], 1
-; O1-NEXT: [[TOBOOL_NOT_I:%.*]] = icmp eq ptr [[TMP0]], null
-; O1-NEXT: br i1 [[TOBOOL_NOT_I]], label [[COUNT_NODES_VARIANT3_EXIT_LOOPEXIT:%.*]], label [[WHILE_BODY_I]], !llvm.loop [[LOOP0:![0-9]+]]
-; O1: count_nodes_variant3.exit.loopexit:
-; O1-NEXT: [[PHI_CMP:%.*]] = icmp ne i64 [[INC_I]], 0
-; O1-NEXT: br label [[COUNT_NODES_VARIANT3_EXIT]]
-; O1: count_nodes_variant3.exit:
-; O1-NEXT: [[SIZE_0_LCSSA_I:%.*]] = phi i1 [ false, [[ENTRY]] ], [ [[PHI_CMP]], [[COUNT_NODES_VARIANT3_EXIT_LOOPEXIT]] ]
-; O1-NEXT: ret i1 [[SIZE_0_LCSSA_I]]
+; CHECK-LABEL: @is_not_empty_variant3(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TOBOOL_NOT4_I:%.*]] = icmp ne ptr [[P:%.*]], null
+; CHECK-NEXT: ret i1 [[TOBOOL_NOT4_I]]
;
entry:
%p.addr = alloca ptr, align 8
|
It has been fixed. But this patch didn't show a positive net effect :( |
Does that mean it has a negative net effect, or its neutral (in which case the original motivating case should be enough). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running IPSCCP twice seems like quite a heavy hammer, I'd expect a noticeable compile-time impact.
I'd recommend to try to extract a reproducer from your motivating use case and check why IPSCCP cannot perform the desired optimization before inlining. Note that we run SCCP after inlining I think, which is the non-IP version of IPSCCP
It is neutral.
But this patch may have an impact on compile time. |
Yes, this causes a massive compile-time regression: http://llvm-compile-time-tracker.com/compare.php?from=253a294b54a6096a0b66f840931dd0e345d70c4f&to=20869e8ed55c58696ce4e9dc3d18cd6af01ed49c&stat=instructions:u |
@sihuan Are you testing your patch on LTO? Before applying your patch, the function is split into two patches:
With your patch, it exhibits the same behavior. Do you know how I can get the same behavior as gcc where digits_2 is split into eight parts? Are you testing with other options? |
@joshua-arch1
When using LTO, I don't think there is a problem that performance is worse than GCC, so this patch is not specific to LTO. |
Hi @sihuan, I have tried to apply yout initial design where IPSCCP is directly moved after inliner. It can work with LTO and |
This patch significantly improves the performance of LLVM for the SPEC2017:548.exchange2_r benchmark, with a performance uplift of over 40% on the rv64gc.
During our investigation into the significant performance disparity between GCC and LLVM on the SPEC2017:548.exchange2_r benchmark on RISC-V, we identified that the primary difference stems from constant propagation optimization.
In GCC, the hotspot function
digits_2
is split into several parts:However, in LLVM, this function is not split:
By applying this patch, LLVM now exhibits similar behavior, resulting in a substantial performance uplift.
And we used
perf stat
to measure the instruction count forexchange2_r 0
on rv64gc, as shown in the table below:Additionally, I performed tests on x86_64, yielding similar results: