-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[RISCV] Enable global merging by default #115495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RISCV] Enable global merging by default #115495
Conversation
…lobals by default AArch64 left this disabled after seeing some cases of slightly worse codegen that weren't tracked down, so I suggest as a path to incrementally moving towards enable globals merging we follow suit, and evaluate turning on later. This patch disables merging of external globals, but also adds a flag to override that. This reduces churn in test cases, simplifies benchmarking runs, and this flag can be removed later.
Stacks on top of <llvm#115484>. From the discussion at the round-table at the RISC-V Summit it was clear people see cases where global merging would help. So the direction of enabling it by default and iteratively working to enable it in more cases or to improve the heuristics seems sensible. This patch tries to make a minimal step in that direction.
@llvm/pr-subscribers-backend-risc-v Author: Alex Bradbury (asb) ChangesStacks on top of <#115484>. From the discussion at the round-table at the RISC-V Summit it was clear Full diff: https://github.com/llvm/llvm-project/pull/115495.diff 7 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index a5afbcfd79710f..4fc185525d8d25 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -53,6 +53,13 @@ static cl::opt<cl::boolOrDefault>
EnableGlobalMerge("riscv-enable-global-merge", cl::Hidden,
cl::desc("Enable the global merge pass"));
+static cl::opt<bool> ForceEnableGlobalMergeExternalGlobals(
+ "riscv-force-enable-global-merge-external-globals", cl::Hidden,
+ cl::init(false),
+ cl::desc(
+ "If the global merge pass is enabled, force enable global merging of "
+ "external globals (overriding any logic that might disable it)"));
+
static cl::opt<bool>
EnableMachineCombiner("riscv-enable-machine-combiner",
cl::desc("Enable the machine combiner pass"),
@@ -469,10 +476,17 @@ bool RISCVPassConfig::addPreISel() {
addPass(createBarrierNoopPass());
}
- if (EnableGlobalMerge == cl::BOU_TRUE) {
+ if ((TM->getOptLevel() != CodeGenOptLevel::None &&
+ EnableGlobalMerge == cl::BOU_UNSET) ||
+ EnableGlobalMerge == cl::BOU_TRUE) {
+ // FIXME: Like AArch64, we disable extern global merging by default due to
+ // concerns it might regress some workloads. Unlike AArch64, we don't
+ // currently support enabling the pass in an "OnlyOptimizeForSize" mode.
+ // Investigating and addressing both items are TODO.
addPass(createGlobalMergePass(TM, /* MaxOffset */ 2047,
/* OnlyOptimizeForSize */ false,
- /* MergeExternalByDefault */ true));
+ /* MergeExternalByDefault */
+ ForceEnableGlobalMergeExternalGlobals));
}
return false;
diff --git a/llvm/test/CodeGen/RISCV/O3-pipeline.ll b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
index a74c842c595526..8fd9ae98503665 100644
--- a/llvm/test/CodeGen/RISCV/O3-pipeline.ll
+++ b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
@@ -74,6 +74,7 @@
; CHECK-NEXT: Exception handling preparation
; CHECK-NEXT: A No-Op Barrier Pass
; CHECK-NEXT: FunctionPass Manager
+; CHECK-NEXT: Merge internal globals
; CHECK-NEXT: Dominator Tree Construction
; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
; CHECK-NEXT: Function Alias Analysis Results
diff --git a/llvm/test/CodeGen/RISCV/global-merge-minsize-smalldata-nonzero.ll b/llvm/test/CodeGen/RISCV/global-merge-minsize-smalldata-nonzero.ll
index c5471389302124..156a34db827456 100644
--- a/llvm/test/CodeGen/RISCV/global-merge-minsize-smalldata-nonzero.ll
+++ b/llvm/test/CodeGen/RISCV/global-merge-minsize-smalldata-nonzero.ll
@@ -1,8 +1,9 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=riscv64 -riscv-enable-global-merge -verify-machineinstrs < %s \
-; RUN: | FileCheck %s -check-prefix=SMALL-DATA
-; RUN: llc -mtriple=riscv64 -riscv-enable-global-merge -global-merge-min-data-size=0 \
-; RUN: -verify-machineinstrs < %s | FileCheck %s -check-prefix=MINSIZE
+; RUN: llc -mtriple=riscv64 -riscv-force-enable-global-merge-external-globals \
+; RUN: -verify-machineinstrs < %s | FileCheck %s -check-prefix=SMALL-DATA
+; RUN: llc -mtriple=riscv64 -global-merge-min-data-size=0 \
+; RUN: -riscv-force-enable-global-merge-external-globals -verify-machineinstrs < %s \
+; RUN: | FileCheck %s -check-prefix=MINSIZE
@ig1 = internal global i32 0, align 4
@ig2 = internal global i32 0, align 4
diff --git a/llvm/test/CodeGen/RISCV/global-merge-minsize-smalldata-zero.ll b/llvm/test/CodeGen/RISCV/global-merge-minsize-smalldata-zero.ll
index 8e4d72af00ebce..e41f14d394b7ca 100644
--- a/llvm/test/CodeGen/RISCV/global-merge-minsize-smalldata-zero.ll
+++ b/llvm/test/CodeGen/RISCV/global-merge-minsize-smalldata-zero.ll
@@ -1,8 +1,9 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=riscv32 -riscv-enable-global-merge -verify-machineinstrs < %s \
-; RUN: | FileCheck %s -check-prefix=SMALL-DATA
-; RUN: llc -mtriple=riscv64 -riscv-enable-global-merge -global-merge-min-data-size=5 \
-; RUN: -verify-machineinstrs < %s | FileCheck %s -check-prefix=MINSIZE
+; RUN: llc -mtriple=riscv32 -riscv-force-enable-global-merge-external-globals \
+; RUN: -verify-machineinstrs < %s | FileCheck %s -check-prefix=SMALL-DATA
+; RUN: llc -mtriple=riscv64 -global-merge-min-data-size=5 \
+; RUN: -riscv-force-enable-global-merge-external-globals -verify-machineinstrs < %s \
+; RUN: | FileCheck %s -check-prefix=MINSIZE
@ig1 = internal global i32 0, align 4
@ig2 = internal global i32 0, align 4
diff --git a/llvm/test/CodeGen/RISCV/global-merge-minsize.ll b/llvm/test/CodeGen/RISCV/global-merge-minsize.ll
index e405425832acbb..1d65d9d1732ba3 100644
--- a/llvm/test/CodeGen/RISCV/global-merge-minsize.ll
+++ b/llvm/test/CodeGen/RISCV/global-merge-minsize.ll
@@ -1,8 +1,8 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=riscv32 -riscv-enable-global-merge -verify-machineinstrs < %s \
-; RUN: | FileCheck %s -check-prefix=RV32
-; RUN: llc -mtriple=riscv32 -riscv-enable-global-merge -global-merge-min-data-size=5 \
-; RUN: -verify-machineinstrs < %s | FileCheck %s -check-prefix=RV32-MINSIZE
+; RUN: llc -mtriple=riscv32 -riscv-force-enable-global-merge-external-globals \
+; RUN: -verify-machineinstrs < %s | FileCheck %s -check-prefix=RV32
+; RUN: llc -mtriple=riscv32 -global-merge-min-data-size=5 \
+; RUN: -riscv-force-enable-global-merge-external-globals -verify-machineinstrs < %s | FileCheck %s -check-prefix=RV32-MINSIZE
@ig1 = internal global i32 0, align 4
@ig2 = internal global i32 0, align 4
diff --git a/llvm/test/CodeGen/RISCV/global-merge-offset.ll b/llvm/test/CodeGen/RISCV/global-merge-offset.ll
index 13afcba181719e..bb8264ee438545 100644
--- a/llvm/test/CodeGen/RISCV/global-merge-offset.ll
+++ b/llvm/test/CodeGen/RISCV/global-merge-offset.ll
@@ -1,12 +1,12 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: sed 's/ArrSize/100/g' %s | llc -mtriple=riscv32 -riscv-enable-global-merge \
-; RUN: -verify-machineinstrs | FileCheck %s
-; RUN: sed 's/ArrSize/100/g' %s | llc -mtriple=riscv64 -riscv-enable-global-merge \
-; RUN: -verify-machineinstrs | FileCheck %s
-; RUN: sed 's/ArrSize/101/g' %s | llc -mtriple=riscv32 -riscv-enable-global-merge \
-; RUN: -verify-machineinstrs | FileCheck %s -check-prefix=CHECK-TOOBIG
-; RUN: sed 's/ArrSize/101/g' %s | llc -mtriple=riscv64 -riscv-enable-global-merge \
-; RUN: -verify-machineinstrs | FileCheck %s -check-prefix=CHECK-TOOBIG
+; RUN: sed 's/ArrSize/100/g' %s | llc -mtriple=riscv32 \
+; RUN: -riscv-force-enable-global-merge-external-globals -verify-machineinstrs | FileCheck %s
+; RUN: sed 's/ArrSize/100/g' %s | llc -mtriple=riscv64 \
+; RUN: -riscv-force-enable-global-merge-external-globals -verify-machineinstrs | FileCheck %s
+; RUN: sed 's/ArrSize/101/g' %s | llc -mtriple=riscv32 \
+; RUN: -riscv-force-enable-global-merge-external-globals -verify-machineinstrs | FileCheck %s -check-prefix=CHECK-TOOBIG
+; RUN: sed 's/ArrSize/101/g' %s | llc -mtriple=riscv64 \
+; RUN: -riscv-force-enable-global-merge-external-globals -verify-machineinstrs | FileCheck %s -check-prefix=CHECK-TOOBIG
; This test demonstrates that the MaxOffset is set correctly for RISC-V by
; constructing an input that is at the limit and comparing.
diff --git a/llvm/test/CodeGen/RISCV/global-merge.ll b/llvm/test/CodeGen/RISCV/global-merge.ll
index 20379ee2e7dacd..8b4043ed476f77 100644
--- a/llvm/test/CodeGen/RISCV/global-merge.ll
+++ b/llvm/test/CodeGen/RISCV/global-merge.ll
@@ -1,8 +1,12 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=riscv32 -riscv-enable-global-merge -verify-machineinstrs < %s \
-; RUN: | FileCheck %s
-; RUN: llc -mtriple=riscv64 -riscv-enable-global-merge -verify-machineinstrs < %s \
-; RUN: | FileCheck %s
+; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s | FileCheck %s
+; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s | FileCheck %s
+; RUN: llc -mtriple=riscv32 \
+; RUN: -riscv-force-enable-global-merge-external-globals -verify-machineinstrs < %s \
+; RUN: | FileCheck -check-prefix=CHECK-WEXTERN %s
+; RUN: llc -mtriple=riscv64 \
+; RUN: -riscv-force-enable-global-merge-external-globals -verify-machineinstrs < %s \
+; RUN: | FileCheck -check-prefix=CHECK-WEXTERN %s
@ig1 = internal global i32 0, align 4
@ig2 = internal global i32 0, align 4
@@ -21,9 +25,21 @@ define void @f1(i32 %a) nounwind {
; CHECK-NEXT: sw a0, %lo(.L_MergedGlobals)(a1)
; CHECK-NEXT: addi a1, a1, %lo(.L_MergedGlobals)
; CHECK-NEXT: sw a0, 4(a1)
-; CHECK-NEXT: sw a0, 8(a1)
-; CHECK-NEXT: sw a0, 12(a1)
+; CHECK-NEXT: lui a1, %hi(eg1)
+; CHECK-NEXT: sw a0, %lo(eg1)(a1)
+; CHECK-NEXT: lui a1, %hi(eg2)
+; CHECK-NEXT: sw a0, %lo(eg2)(a1)
; CHECK-NEXT: ret
+;
+; CHECK-WEXTERN-LABEL: f1:
+; CHECK-WEXTERN: # %bb.0:
+; CHECK-WEXTERN-NEXT: lui a1, %hi(.L_MergedGlobals)
+; CHECK-WEXTERN-NEXT: sw a0, %lo(.L_MergedGlobals)(a1)
+; CHECK-WEXTERN-NEXT: addi a1, a1, %lo(.L_MergedGlobals)
+; CHECK-WEXTERN-NEXT: sw a0, 4(a1)
+; CHECK-WEXTERN-NEXT: sw a0, 8(a1)
+; CHECK-WEXTERN-NEXT: sw a0, 12(a1)
+; CHECK-WEXTERN-NEXT: ret
store i32 %a, ptr @ig1, align 4
store i32 %a, ptr @ig2, align 4
store i32 %a, ptr @eg1, align 4
|
In terms of benchmarking, I'd set off a run on the SpacemiT K1 but before I'd matched AArch64 in disabling merging for external globals. For the specrate benchmarks that I'd been running, the mean 500.perlbench_r runtime is ~0.7% higher after this change. gcc_r is perhaps a coulple of tenths of a percent faster after the change, but in both cases I'd need to do a proper statistical test beyond just looking at the mean and standard deviation. I'll kick off a run for this precise change, but my expectation is very minimal impact indeed (if people have particular benchmarks in mind that may be impacted, that would be very welcome). As noted in the above commit, I think minimal impact would be fine - we'd be more aligned to AArch64, and can look to iteratively improve the impact of globals merging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM because the discussion from round table was clearly in this direction and the change is straight forward. However, please wait a couple of days for other input and in particular, for any suggestions on blocking benchmarking. (I would recommend measuring drystone since it tends to be very sensitive to gp-relaxation.)
…lobal-merging-by-default
I've had a look at this + Dhrystone and this change actually makes zero codegen difference for it at all. This is the same as for AArch64, and due to MergeExternalByDefault being False. It does kick in if that is force enabled (and in the default configuration for Arm), so this is obvious a benchmark to look at more closely for the next step after this lands - looking to enable that option. |
This follows up #115495 by enabling merging of external globals by default, which had been left as a next step in order to make the previous change more incremental and so we can more easily narrow down on any identified regressions. Enabling merging of external globals matches what Arm does (for non mach-o targets), though AArch64 doesn't as there were [some concerns](https://reviews.llvm.org/D61947) it might cause regressions in some cases. See #117880 for benchmark figures and discussion.
Stacks on top of #115484.
From the discussion at the round-table at the RISC-V Summit it was clear
people see cases where global merging would help. So the direction of
enabling it by default and iteratively working to enable it in more
cases or to improve the heuristics seems sensible. This patch tries to
make a minimal step in that direction.