
[Pipelines] Move IPSCCP after inliner pipeline #96620

Open: wants to merge 4 commits into main

Conversation

@sihuan (Contributor) commented Jun 25, 2024

This patch significantly improves LLVM's performance on the SPEC2017:548.exchange2_r benchmark, with a performance uplift of over 40% on rv64gc.

During our investigation into the significant performance disparity between GCC and LLVM on the SPEC2017:548.exchange2_r benchmark on RISC-V, we identified that the primary difference stems from constant propagation optimization.
In GCC, the hotspot function digits_2 is split into several parts:

$ objdump -D exchange2_r_gcc | grep "digits_2.*:$"
000000000001d480 <__brute_force_MOD_digits_2.isra.0>:
000000000001f0f6 <__brute_force_MOD_digits_2.constprop.7.isra.0>:
000000000001fdd0 <__brute_force_MOD_digits_2.constprop.6.isra.0>:
0000000000020900 <__brute_force_MOD_digits_2.constprop.5.isra.0>:
00000000000211c4 <__brute_force_MOD_digits_2.constprop.4.isra.0>:
0000000000022002 <__brute_force_MOD_digits_2.constprop.3.isra.0>:
0000000000022d6a <__brute_force_MOD_digits_2.constprop.2.isra.0>:
0000000000023898 <__brute_force_MOD_digits_2.constprop.1.isra.0>:

However, in LLVM, this function is not split:

$ objdump -D exchange2_r_llvm | grep "digits_2.*:$"
00000000000115a0 <_QMbrute_forcePdigits_2>:

By applying this patch, LLVM now exhibits similar behavior, resulting in a substantial performance uplift.

$ objdump -D exchange2_r_patched_llvm | grep "digits_2.*:$"
0000000000011ab0 <_QMbrute_forcePdigits_2>:
0000000000018a4e <_QMbrute_forcePdigits_2.specialized.1>:
0000000000019820 <_QMbrute_forcePdigits_2.specialized.2>:
000000000001a436 <_QMbrute_forcePdigits_2.specialized.3>:
000000000001ae78 <_QMbrute_forcePdigits_2.specialized.4>:
000000000001ba8e <_QMbrute_forcePdigits_2.specialized.5>:
000000000001c7e6 <_QMbrute_forcePdigits_2.specialized.6>:
000000000001d072 <_QMbrute_forcePdigits_2.specialized.7>:
000000000001dad0 <_QMbrute_forcePdigits_2.specialized.8>:
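
For readers unfamiliar with these *.specialized.N symbols, here is a minimal sketch of the transformation they come from (hypothetical IR, not extracted from exchange2_r): when IPSCCP runs with function specialization enabled (AllowFuncSpec), a call site that passes a constant argument may get its callee cloned, with the constant propagated into the clone so that depth checks and recursive arguments can fold.

; Hypothetical example, for illustration only.
define internal i32 @work(i32 %depth) {
entry:
  %done = icmp eq i32 %depth, 0
  br i1 %done, label %base, label %rec
base:
  ret i32 1
rec:
  %next = add nsw i32 %depth, -1
  %r = call i32 @work(i32 %next)     ; recursive call with a smaller depth
  %sum = add nsw i32 %r, 1
  ret i32 %sum
}

define i32 @driver() {
  ; With the constant visible here, the specializer may emit a clone such as
  ; @work.specialized.1 in which %depth is replaced by 3.
  %r = call i32 @work(i32 3)
  ret i32 %r
}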

We used perf stat to measure the instruction count of exchange2_r 0 on rv64gc, as shown in the table below:

Compiler Instructions
GCC #d28ea8e5 55,965,728,914
LLVM #62d44fbd 105,416,890,241
LLVM #62d44fbd with this patch 62,693,427,761

Additionally, I performed tests on x86_64, yielding similar results:

Compiler cpu_atom instructions
LLVM #62d44fbd 100,147,914,793
LLVM #62d44fbd with this patch 53,077,337,115

Moving the Interprocedural Constant Propagation (IPSCCP) pass to run after the
inliner pipeline can enhance optimization effectiveness. Performance uplift
for SPEC2017:548.exchange2_r on rv64gc is over 40%.
@dtcxzyw (Member) commented Jun 25, 2024

Failed Tests (14):
LLVM :: Other/new-pm-defaults.ll
LLVM :: Other/new-pm-thinlto-postlink-defaults.ll
LLVM :: Other/new-pm-thinlto-postlink-pgo-defaults.ll
LLVM :: Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
LLVM :: Other/new-pm-thinlto-prelink-defaults.ll
LLVM :: Other/new-pm-thinlto-prelink-pgo-defaults.ll
LLVM :: Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
LLVM :: ThinLTO/X86/devirt2.ll
LLVM :: ThinLTO/X86/devirt_promote.ll
LLVM :: ThinLTO/X86/devirt_promote_legacy.ll
LLVM :: Transforms/PhaseOrdering/AArch64/constraint-elimination-placement.ll
LLVM :: Transforms/PhaseOrdering/dce-after-argument-promotion.ll
LLVM :: Transforms/PhaseOrdering/deletion-of-loops-that-became-side-effect-free.ll
LLVM :: Transforms/PhaseOrdering/icmp-ashr-breaking-select-idiom.ll

Please update these tests.

@nikic (Contributor) left a comment

We currently rely on IPSCCP happening early, so it can make use of dominating conditions before they have been converted to selects.
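
As a minimal, hypothetical illustration of this point (not code from this PR): IPSCCP can use PredicateInfo to exploit a branch condition that dominates a call, but once later passes sink the calls and replace the control flow with a select, that per-call-site constant is gone.

declare void @sink(i32)

; Early in the pipeline: the dominating %cmp lets IPSCCP treat %x as 0 in %a,
; so the first call can be rewritten to @sink(i32 0).
define void @early(i32 %x) {
entry:
  %cmp = icmp eq i32 %x, 0
  br i1 %cmp, label %a, label %b
a:
  call void @sink(i32 %x)
  ret void
b:
  call void @sink(i32 1)
  ret void
}

; After the common call has been sunk behind a select, there is only one call
; site and its argument is no longer a constant that SCCP can propagate.
define void @late(i32 %x) {
entry:
  %cmp = icmp eq i32 %x, 0
  %arg = select i1 %cmp, i32 %x, i32 1
  call void @sink(i32 %arg)
  ret void
}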

@dtcxzyw (Member) commented Jun 25, 2024

This patch causes some significant performance regressions on llvm-test-suite (rv64gc-O3-thinlto):

Name Before After Ratio
SingleSource/Benchmarks/Shootout/Shootout-random 2.150161677 3.300161641 +53.5%
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv 0.111845159 0.145389494 +30.0%
SingleSource/Benchmarks/Adobe-C++/functionobjects 5.489498263 6.827863965 +24.4%

@llvmbot added the clang and llvm:transforms labels on Jun 27, 2024
@llvmbot (Member) commented Jun 27, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-clang

Author: SiHuaN (sihuan)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/96620.diff

12 Files Affected:

  • (modified) clang/test/CodeGen/attr-counted-by.c (+2-2)
  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+6)
  • (modified) llvm/test/Other/new-pm-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-defaults.ll (+2-1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll (+1)
  • (modified) llvm/test/Transforms/PhaseOrdering/AArch64/constraint-elimination-placement.ll (+7-25)
  • (modified) llvm/test/Transforms/PhaseOrdering/dce-after-argument-promotion.ll (+3-4)
  • (modified) llvm/test/Transforms/PhaseOrdering/deletion-of-loops-that-became-side-effect-free.ll (+15-40)
diff --git a/clang/test/CodeGen/attr-counted-by.c b/clang/test/CodeGen/attr-counted-by.c
index 79922eb4159f1..8d0e39d0e3dad 100644
--- a/clang/test/CodeGen/attr-counted-by.c
+++ b/clang/test/CodeGen/attr-counted-by.c
@@ -639,7 +639,7 @@ void test6(struct anon_struct *p, int index) {
   p->array[index] = __builtin_dynamic_object_size(p->array, 1);
 }
 
-// SANITIZE-WITH-ATTR-LABEL: define dso_local i64 @test6_bdos(
+// SANITIZE-WITH-ATTR-LABEL: define dso_local range(i64 0, -3) i64 @test6_bdos(
 // SANITIZE-WITH-ATTR-SAME: ptr nocapture noundef readonly [[P:%.*]]) local_unnamed_addr #[[ATTR2]] {
 // SANITIZE-WITH-ATTR-NEXT:  entry:
 // SANITIZE-WITH-ATTR-NEXT:    [[DOT_COUNTED_BY_GEP:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 8
@@ -649,7 +649,7 @@ void test6(struct anon_struct *p, int index) {
 // SANITIZE-WITH-ATTR-NEXT:    [[TMP1:%.*]] = select i1 [[DOTINV]], i64 0, i64 [[TMP0]]
 // SANITIZE-WITH-ATTR-NEXT:    ret i64 [[TMP1]]
 //
-// NO-SANITIZE-WITH-ATTR-LABEL: define dso_local i64 @test6_bdos(
+// NO-SANITIZE-WITH-ATTR-LABEL: define dso_local range(i64 0, -3) i64 @test6_bdos(
 // NO-SANITIZE-WITH-ATTR-SAME: ptr nocapture noundef readonly [[P:%.*]]) local_unnamed_addr #[[ATTR2]] {
 // NO-SANITIZE-WITH-ATTR-NEXT:  entry:
 // NO-SANITIZE-WITH-ATTR-NEXT:    [[DOT_COUNTED_BY_GEP:%.*]] = getelementptr inbounds i8, ptr [[P]], i64 8
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 926515c9508a9..5659c116e9c95 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -1204,6 +1204,12 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
   else
     MPM.addPass(buildInlinerPipeline(Level, Phase));
 
+  MPM.addPass(IPSCCPPass(
+              IPSCCPOptions(/*AllowFuncSpec=*/
+                            Level != OptimizationLevel::Os &&
+                            Level != OptimizationLevel::Oz &&
+                            !isLTOPreLink(Phase))));
+
   // Remove any dead arguments exposed by cleanups, constant folding globals,
   // and argument promotion.
   MPM.addPass(DeadArgumentEliminationPass());
diff --git a/llvm/test/Other/new-pm-defaults.ll b/llvm/test/Other/new-pm-defaults.ll
index 489aed40c190b..32fd2e2d3ffbf 100644
--- a/llvm/test/Other/new-pm-defaults.ll
+++ b/llvm/test/Other/new-pm-defaults.ll
@@ -229,6 +229,7 @@
 ; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
 ; CHECK-O-NEXT: Invalidating analysis: ShouldNotRunFunctionPassesAnalysis
 ; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
+; CHECK-O-NEXT: Running pass: IPSCCPPass
 ; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
 ; CHECK-O-NEXT: Running pass: CoroCleanupPass
 ; CHECK-O-NEXT: Running pass: GlobalOptPass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
index 064362eabbf83..40468e917bc40 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
@@ -1,4 +1,4 @@
-; The IR below was crafted so as:
+	; The IR below was crafted so as:
 ; 1) To have a loop, so we create a loop pass manager
 ; 2) To be "immutable" in the sense that no pass in the standard
 ;    pipeline will modify it.
@@ -156,6 +156,7 @@
 ; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
 ; CHECK-O-NEXT: Invalidating analysis: ShouldNotRunFunctionPassesAnalysis
 ; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
+; CHECK-O-NEXT: Running pass: IPSCCPPass
 ; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
 ; CHECK-O-NEXT: Running pass: CoroCleanupPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: GlobalOptPass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
index 19a44867e434a..b4b63d8cb3d6f 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
@@ -140,6 +140,7 @@
 ; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
 ; CHECK-O-NEXT: Invalidating analysis: ShouldNotRunFunctionPassesAnalysis
 ; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
+; CHECK-O-NEXT: Running pass: IPSCCPPass
 ; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
 ; CHECK-O-NEXT: Running pass: CoroCleanupPass
 ; CHECK-O-NEXT: Running pass: GlobalOptPass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
index ac80a31d8fd4b..72753a169d288 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
@@ -148,6 +148,7 @@
 ; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
 ; CHECK-O-NEXT: Invalidating analysis: ShouldNotRunFunctionPassesAnalysis
 ; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
+; CHECK-O-NEXT: Running pass: IPSCCPPass
 ; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
 ; CHECK-O-NEXT: Running pass: CoroCleanupPass
 ; CHECK-O-NEXT: Running pass: GlobalOptPass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
index 42ef49f8f7c7e..896feeb801f5f 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
@@ -188,6 +188,7 @@
 ; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
 ; CHECK-O-NEXT: Invalidating analysis: ShouldNotRunFunctionPassesAnalysis
 ; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
+; CHECK-O-NEXT: Running pass: IPSCCPPass
 ; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
 ; CHECK-O-NEXT: Running pass: CoroCleanupPass
 ; CHECK-O-NEXT: Running pass: GlobalOptPass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
index e74f88c1a3bf9..687e9d08ee31d 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
@@ -187,6 +187,7 @@
 ; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
 ; CHECK-O-NEXT: Invalidating analysis: ShouldNotRunFunctionPassesAnalysis
 ; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
+; CHECK-O-NEXT: Running pass: IPSCCPPass
 ; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
 ; CHECK-O-NEXT: Running pass: CoroCleanupPass
 ; CHECK-O-NEXT: Running pass: GlobalOptPass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
index 210a4ef1f7664..0f87f3a5b1615 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
@@ -152,6 +152,7 @@
 ; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
 ; CHECK-O-NEXT: Invalidating analysis: ShouldNotRunFunctionPassesAnalysis
 ; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
+; CHECK-O-NEXT: Running pass: IPSCCPPass
 ; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
 ; CHECK-O-NEXT: Running pass: CoroCleanupPass
 ; CHECK-O-NEXT: Running pass: GlobalOptPass
diff --git a/llvm/test/Transforms/PhaseOrdering/AArch64/constraint-elimination-placement.ll b/llvm/test/Transforms/PhaseOrdering/AArch64/constraint-elimination-placement.ll
index 335a850f1ec68..bd29141786ebb 100644
--- a/llvm/test/Transforms/PhaseOrdering/AArch64/constraint-elimination-placement.ll
+++ b/llvm/test/Transforms/PhaseOrdering/AArch64/constraint-elimination-placement.ll
@@ -105,29 +105,20 @@ define void @test2(ptr %this) #0 {
 ; CHECK-NEXT:      i64 17, label [[IF_END_I31:%.*]]
 ; CHECK-NEXT:    ]
 ; CHECK:       if.end.i:
-; CHECK-NEXT:    [[CALL8_I_I:%.*]] = tail call fastcc noundef i32 @test2_fn6()
-; CHECK-NEXT:    [[TRUNC_I_I:%.*]] = trunc i32 [[CALL8_I_I]] to i8
-; CHECK-NEXT:    [[CALL1_I1_I:%.*]] = tail call i1 @test2_fn4(i8 [[TRUNC_I_I]])
+; CHECK-NEXT:    [[CALL1_I1_I:%.*]] = tail call i1 @test2_fn4(i8 0)
 ; CHECK-NEXT:    [[TMP0:%.*]] = xor i1 [[CALL1_I1_I]], true
 ; CHECK-NEXT:    tail call void @llvm.assume(i1 [[TMP0]])
 ; CHECK-NEXT:    br label [[COMMON_RET]]
 ; CHECK:       test2_fn2.exit12:
-; CHECK-NEXT:    [[CALL8_I_I8:%.*]] = tail call fastcc noundef i32 @test2_fn6()
-; CHECK-NEXT:    [[TRUNC_I_I9:%.*]] = trunc i32 [[CALL8_I_I8]] to i8
-; CHECK-NEXT:    [[CALL1_I1_I10:%.*]] = tail call i1 @test2_fn4(i8 [[TRUNC_I_I9]])
+; CHECK-NEXT:    [[CALL1_I1_I10:%.*]] = tail call i1 @test2_fn4(i8 0)
 ; CHECK-NEXT:    [[TMP1:%.*]] = xor i1 [[CALL1_I1_I10]], true
 ; CHECK-NEXT:    tail call void @llvm.assume(i1 [[TMP1]])
-; CHECK-NEXT:    [[CMP4_I11:%.*]] = icmp eq i32 [[CALL8_I_I8]], 0
-; CHECK-NEXT:    br i1 [[CMP4_I11]], label [[TEST2_FN2_EXIT24:%.*]], label [[COMMON_RET]]
-; CHECK:       common.ret:
-; CHECK-NEXT:    ret void
-; CHECK:       test2_fn2.exit24:
 ; CHECK-NEXT:    store i8 0, ptr [[THIS]], align 4
 ; CHECK-NEXT:    br label [[COMMON_RET]]
+; CHECK:       common.ret:
+; CHECK-NEXT:    ret void
 ; CHECK:       if.end.i31:
-; CHECK-NEXT:    [[CALL8_I_I32:%.*]] = tail call fastcc noundef i32 @test2_fn6()
-; CHECK-NEXT:    [[TRUNC_I_I33:%.*]] = trunc i32 [[CALL8_I_I32]] to i8
-; CHECK-NEXT:    [[CALL1_I1_I34:%.*]] = tail call i1 @test2_fn4(i8 [[TRUNC_I_I33]])
+; CHECK-NEXT:    [[CALL1_I1_I34:%.*]] = tail call i1 @test2_fn4(i8 0)
 ; CHECK-NEXT:    [[TMP2:%.*]] = xor i1 [[CALL1_I1_I34]], true
 ; CHECK-NEXT:    tail call void @llvm.assume(i1 [[TMP2]])
 ; CHECK-NEXT:    br label [[COMMON_RET]]
@@ -162,16 +153,12 @@ define i1 @test2_fn2(ptr %__rhs) #0 {
 ; CHECK-NEXT:    [[CMP2_NOT:%.*]] = icmp eq i64 [[CALL]], [[COND_I]]
 ; CHECK-NEXT:    br i1 [[CMP2_NOT]], label [[IF_END:%.*]], label [[CLEANUP:%.*]]
 ; CHECK:       if.end:
-; CHECK-NEXT:    [[CALL8_I:%.*]] = tail call fastcc noundef i32 @test2_fn6()
-; CHECK-NEXT:    [[TRUNC_I:%.*]] = trunc i32 [[CALL8_I]] to i8
-; CHECK-NEXT:    [[CALL1_I1:%.*]] = tail call i1 @test2_fn4(i8 [[TRUNC_I]])
+; CHECK-NEXT:    [[CALL1_I1:%.*]] = tail call i1 @test2_fn4(i8 0)
 ; CHECK-NEXT:    [[TMP0:%.*]] = xor i1 [[CALL1_I1]], true
 ; CHECK-NEXT:    tail call void @llvm.assume(i1 [[TMP0]])
-; CHECK-NEXT:    [[CMP4:%.*]] = icmp eq i32 [[CALL8_I]], 0
 ; CHECK-NEXT:    br label [[CLEANUP]]
 ; CHECK:       cleanup:
-; CHECK-NEXT:    [[RETVAL_0:%.*]] = phi i1 [ [[CMP4]], [[IF_END]] ], [ false, [[ENTRY:%.*]] ]
-; CHECK-NEXT:    ret i1 [[RETVAL_0]]
+; CHECK-NEXT:    ret i1 [[CMP2_NOT]]
 ;
 entry:
   %call = call i64 @strlen(ptr %__rhs)
@@ -231,11 +218,6 @@ entry:
 }
 
 define internal i32 @test2_fn6() {
-; CHECK-LABEL: define internal fastcc noundef i32 @test2_fn6(
-; CHECK-SAME: ) unnamed_addr #[[ATTR5]] {
-; CHECK-NEXT:  entry:
-; CHECK-NEXT:    ret i32 0
-;
 entry:
   %call = call i32 @memcmp(ptr @.str.79, ptr @.str.79, i64 2)
   ret i32 %call
diff --git a/llvm/test/Transforms/PhaseOrdering/dce-after-argument-promotion.ll b/llvm/test/Transforms/PhaseOrdering/dce-after-argument-promotion.ll
index c33fcfbe6ed97..cc144a700d63a 100644
--- a/llvm/test/Transforms/PhaseOrdering/dce-after-argument-promotion.ll
+++ b/llvm/test/Transforms/PhaseOrdering/dce-after-argument-promotion.ll
@@ -9,10 +9,9 @@
 
 define internal void @f(ptr byval(%struct.ss) align 8 %b, ptr byval(i32) align 4 %X) noinline nounwind  {
 ; CHECK-LABEL: define {{[^@]+}}@f
-; CHECK-SAME: (i32 [[B_0:%.*]]){{[^#]*}} #[[ATTR0:[0-9]+]] {
+; CHECK-SAME: (){{[^#]*}} #[[ATTR0:[0-9]+]] {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    [[TEMP:%.*]] = add i32 [[B_0]], 1
-; CHECK-NEXT:    store i32 [[TEMP]], ptr [[DUMMY]], align 4
+; CHECK-NEXT:    store i32 2, ptr [[DUMMY]], align 4
 ; CHECK-NEXT:    ret void
 ;
 entry:
@@ -27,7 +26,7 @@ define i32 @test(ptr %X) {
 ; CHECK-LABEL: define {{[^@]+}}@test
 ; CHECK-SAME: (ptr {{[^%]*}} [[X:%.*]]){{[^#]*}} #[[ATTR1:[0-9]+]] {
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    tail call {{.*}}void @f(i32 1)
+; CHECK-NEXT:    tail call {{.*}}void @f()
 ; CHECK-NEXT:    ret i32 0
 ;
 entry:
diff --git a/llvm/test/Transforms/PhaseOrdering/deletion-of-loops-that-became-side-effect-free.ll b/llvm/test/Transforms/PhaseOrdering/deletion-of-loops-that-became-side-effect-free.ll
index 689f4a9798a75..09c7b1cebef8e 100644
--- a/llvm/test/Transforms/PhaseOrdering/deletion-of-loops-that-became-side-effect-free.ll
+++ b/llvm/test/Transforms/PhaseOrdering/deletion-of-loops-that-became-side-effect-free.ll
@@ -1,7 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt -passes='default<O3>' -S < %s  | FileCheck %s --check-prefixes=ALL,O3
-; RUN: opt -passes='default<O2>' -S < %s  | FileCheck %s --check-prefixes=ALL,O2
-; RUN: opt -passes='default<O1>' -S < %s  | FileCheck %s --check-prefixes=ALL,O1
+; RUN: opt -passes='default<O3>' -S < %s  | FileCheck %s
+; RUN: opt -passes='default<O2>' -S < %s  | FileCheck %s
+; RUN: opt -passes='default<O1>' -S < %s  | FileCheck %s
 
 ; All these tests should optimize to a single comparison
 ; of the original argument with null. There should be no loops.
@@ -9,10 +9,10 @@
 %struct.node = type { ptr, i32 }
 
 define dso_local zeroext i1 @is_not_empty_variant1(ptr %p) {
-; ALL-LABEL: @is_not_empty_variant1(
-; ALL-NEXT:  entry:
-; ALL-NEXT:    [[TOBOOL_NOT3_I:%.*]] = icmp ne ptr [[P:%.*]], null
-; ALL-NEXT:    ret i1 [[TOBOOL_NOT3_I]]
+; CHECK-LABEL: @is_not_empty_variant1(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TOBOOL_NOT3_I:%.*]] = icmp ne ptr [[P:%.*]], null
+; CHECK-NEXT:    ret i1 [[TOBOOL_NOT3_I]]
 ;
 entry:
   %p.addr = alloca ptr, align 8
@@ -51,10 +51,10 @@ while.end:
 }
 
 define dso_local zeroext i1 @is_not_empty_variant2(ptr %p) {
-; ALL-LABEL: @is_not_empty_variant2(
-; ALL-NEXT:  entry:
-; ALL-NEXT:    [[TOBOOL_NOT4_I:%.*]] = icmp ne ptr [[P:%.*]], null
-; ALL-NEXT:    ret i1 [[TOBOOL_NOT4_I]]
+; CHECK-LABEL: @is_not_empty_variant2(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TOBOOL_NOT4_I:%.*]] = icmp ne ptr [[P:%.*]], null
+; CHECK-NEXT:    ret i1 [[TOBOOL_NOT4_I]]
 ;
 entry:
   %p.addr = alloca ptr, align 8
@@ -96,35 +96,10 @@ while.end:
 }
 
 define dso_local zeroext i1 @is_not_empty_variant3(ptr %p) {
-; O3-LABEL: @is_not_empty_variant3(
-; O3-NEXT:  entry:
-; O3-NEXT:    [[TOBOOL_NOT4_I:%.*]] = icmp ne ptr [[P:%.*]], null
-; O3-NEXT:    ret i1 [[TOBOOL_NOT4_I]]
-;
-; O2-LABEL: @is_not_empty_variant3(
-; O2-NEXT:  entry:
-; O2-NEXT:    [[TOBOOL_NOT4_I:%.*]] = icmp ne ptr [[P:%.*]], null
-; O2-NEXT:    ret i1 [[TOBOOL_NOT4_I]]
-;
-; O1-LABEL: @is_not_empty_variant3(
-; O1-NEXT:  entry:
-; O1-NEXT:    [[TOBOOL_NOT4_I:%.*]] = icmp eq ptr [[P:%.*]], null
-; O1-NEXT:    br i1 [[TOBOOL_NOT4_I]], label [[COUNT_NODES_VARIANT3_EXIT:%.*]], label [[WHILE_BODY_I:%.*]]
-; O1:       while.body.i:
-; O1-NEXT:    [[SIZE_06_I:%.*]] = phi i64 [ [[INC_I:%.*]], [[WHILE_BODY_I]] ], [ 0, [[ENTRY:%.*]] ]
-; O1-NEXT:    [[P_ADDR_05_I:%.*]] = phi ptr [ [[TMP0:%.*]], [[WHILE_BODY_I]] ], [ [[P]], [[ENTRY]] ]
-; O1-NEXT:    [[CMP_I:%.*]] = icmp ne i64 [[SIZE_06_I]], -1
-; O1-NEXT:    call void @llvm.assume(i1 [[CMP_I]])
-; O1-NEXT:    [[TMP0]] = load ptr, ptr [[P_ADDR_05_I]], align 8
-; O1-NEXT:    [[INC_I]] = add i64 [[SIZE_06_I]], 1
-; O1-NEXT:    [[TOBOOL_NOT_I:%.*]] = icmp eq ptr [[TMP0]], null
-; O1-NEXT:    br i1 [[TOBOOL_NOT_I]], label [[COUNT_NODES_VARIANT3_EXIT_LOOPEXIT:%.*]], label [[WHILE_BODY_I]], !llvm.loop [[LOOP0:![0-9]+]]
-; O1:       count_nodes_variant3.exit.loopexit:
-; O1-NEXT:    [[PHI_CMP:%.*]] = icmp ne i64 [[INC_I]], 0
-; O1-NEXT:    br label [[COUNT_NODES_VARIANT3_EXIT]]
-; O1:       count_nodes_variant3.exit:
-; O1-NEXT:    [[SIZE_0_LCSSA_I:%.*]] = phi i1 [ false, [[ENTRY]] ], [ [[PHI_CMP]], [[COUNT_NODES_VARIANT3_EXIT_LOOPEXIT]] ]
-; O1-NEXT:    ret i1 [[SIZE_0_LCSSA_I]]
+; CHECK-LABEL: @is_not_empty_variant3(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TOBOOL_NOT4_I:%.*]] = icmp ne ptr [[P:%.*]], null
+; CHECK-NEXT:    ret i1 [[TOBOOL_NOT4_I]]
 ;
 entry:
   %p.addr = alloca ptr, align 8

@dtcxzyw (Member) commented Jun 27, 2024

This patch causes some significant performance regressions on llvm-test-suite (rv64gc-O3-thinlto):

Name Before After Ratio
SingleSource/Benchmarks/Shootout/Shootout-random 2.150161677 3.300161641 +53.5%
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv 0.111845159 0.145389494 +30.0%
SingleSource/Benchmarks/Adobe-C++/functionobjects 5.489498263 6.827863965 +24.4%

It has been fixed. But this patch didn't show a positive net effect :(

@goldsteinn (Contributor) commented:

This patch causes some significant performance regressions on llvm-test-suite (rv64gc-O3-thinlto):
Name Before After Ratio
SingleSource/Benchmarks/Shootout/Shootout-random 2.150161677 3.300161641 +53.5%
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv/trisolv 0.111845159 0.145389494 +30.0%
SingleSource/Benchmarks/Adobe-C++/functionobjects 5.489498263 6.827863965 +24.4%

It has been fixed. But this patch didn't show a positive net effect :(

Does that mean it has a negative net effect, or is it neutral (in which case the original motivating case should be enough)?

@fhahn (Contributor) left a comment

Running IPSCCP twice seems like quite a heavy hammer; I'd expect a noticeable compile-time impact.

I'd recommend trying to extract a reproducer from your motivating use case and checking why IPSCCP cannot perform the desired optimization before inlining. Note that I think we already run SCCP, the non-interprocedural version of IPSCCP, after inlining.
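
As an illustration of the kind of reduction this suggests (a hypothetical sketch, not reduced from exchange2_r): one pattern to look for is a constant that only reaches the interesting call site after inlining, for example because it flows through an externally visible function whose callers the early IPSCCP run cannot enumerate.

; @leaf is internal and could be specialized on a constant %depth.
define internal i32 @leaf(i32 %depth) {
entry:
  %cmp = icmp sgt i32 %depth, 4
  br i1 %cmp, label %deep, label %shallow
deep:
  ret i32 1
shallow:
  ret i32 0
}

; @wrapper is externally visible, so early IPSCCP must assume unknown callers
; and cannot treat %n as a constant at the call to @leaf.
define i32 @wrapper(i32 %n) {
  %r = call i32 @leaf(i32 %n)
  ret i32 %r
}

define i32 @top() {
  ; Once @wrapper is inlined here, @leaf is called with the literal 8, which an
  ; IPSCCP/function-specialization run after the inliner could exploit.
  %r = call i32 @wrapper(i32 8)
  ret i32 %r
}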

@dtcxzyw (Member) commented Jun 27, 2024

Does that mean it has a negative net effect, or is it neutral

It is neutral.

(in which case the original motivating case should be enough).

But this patch may have an impact on compile time.

@nikic (Contributor) commented Jun 27, 2024

@joshua-arch1 (Contributor) commented Jan 16, 2025

@sihuan Are you testing your patch with LTO? Before applying your patch, the function is split into two parts:

0000000000071270 <_QMbrute_forcePdigits_2.specialized.1>:
0000000000074630 <_QMbrute_forcePdigits_2.specialized.5>:

With your patch, it exhibits the same behavior. Do you know how I can get the same behavior as GCC, where digits_2 is split into eight parts? Are you testing with other options?

@sihuan (Contributor, Author) commented Jan 16, 2025

@joshua-arch1
I tested this patch with LTO and it is consistent with what you describe: with the -flto option enabled, digits_2 has two specializations whether or not the patch is applied.
However, with LTO enabled, LLVM's performance is comparable to GCC's (in terms of instruction count). Here are some figures:

On x86_64 Arch Linux, via perf stat ./exchange2_r 0:

Compiler Instructions
gfortran O3 (archlinux gcc-fortran 14.2.1+r134+gab884fffe3fc-2) 54,865,827,741
gfortran O3 lto (archlinux gcc-fortran 14.2.1+r134+gab884fffe3fc-2) 54,186,161,980
flang O3 (#62d44fbd) 107,953,750,439
flang O3 lto (#62d44fbd) 53,391,922,257
flang O3 (patched #62d44fbd) 53,146,581,727
flang O3 lto (patched #62d44fbd) 53,391,760,016

When using LTO, I don't think there is a problem of performance being worse than GCC's, so this patch is not specific to LTO.

@joshua-arch1 (Contributor) commented:

@joshua-arch1 I tested this patch with LTO and it is consistent with what you describe: with the -flto option enabled, digits_2 has two specializations whether or not the patch is applied. However, with LTO enabled, LLVM's performance is comparable to GCC's (in terms of instruction count). Here are some figures:

On x86_64 Arch Linux, via perf stat ./exchange2_r 0:

Compiler Instructions
gfortran O3 (archlinux gcc-fortran 14.2.1+r134+gab884fffe3fc-2) 54,865,827,741
gfortran O3 lto (archlinux gcc-fortran 14.2.1+r134+gab884fffe3fc-2) 54,186,161,980
flang O3 (#62d44fbd) 107,953,750,439
flang O3 lto (#62d44fbd) 53,391,922,257
flang O3 (patched #62d44fbd) 53,146,581,727
flang O3 lto (patched #62d44fbd) 53,391,760,016
When using LTO, I don't think there is a problem of performance being worse than GCC's, so this patch is not specific to LTO.

Hi @sihuan, I have tried applying your initial design, where IPSCCP is moved directly after the inliner. It works with LTO, and digits_2 has eight specializations, just like gfortran.
