[RISCV] Enable merging of external globals by default #117880

asb · 2024-11-27T13:34:26Z

This follows up #115495 by enabling merging of external globals by default, which had been left as a next step in order to make the previous change more incremental and so we can more easily narrow down on any identified regressions.

Enabling merging of external globals matches what Arm does (for non mach-o targets), though AArch64 doesn't as there were some concerns it might cause regressions in some cases.

Here are the results I have from specrate benchmarks (train workload, figures are execution time in seconds):

	oldmerge mean	oldmerge stdev	newmerge mean	newmerge stdev	abs old vs new change	old vs new change as %
500.perlbench_r	159.6	0.4	159.7	0.6	0.1	0.06%
502.gcc_r	408.8	2.3	409.7	2.4	0.9	0.22%
505.mcf_r	619.4	3.6	620.6	4.1	1.2	0.19%
520.omnetpp_r	573.7	5	569.1	2.4	-4.6	-0.80%
525.x264_r	183.9	0.2	184	0.4	0.1	0.05%
531.deepsjeng_r	280.6	0.9	280	0.7	-0.6	-0.21%
541.leela_r	284.6	1.1	284.5	0.9	-0.1	-0.04%
557.xz_r	208.8	3	209.5	3.2	0.7	0.34%

This is on the Spacemit X60, with essentially rva22+v. omnetpp is the main one that sees a potentially positive change, though there was an annoyingly high standard deviation on that benchmark.

Looking at dhyrstone we see a positive codegen change in Proc0 which translates to a reduced instruction count, though not decreased runtime:

$ perf stat -r100 ./dry.oldmerge

 Performance counter stats for './dry.oldmerge' (100 runs):

            815.02 msec task-clock:u                     #    0.999 CPUs utilized            ( +-  0.02% )
                 3      context-switches:u               #    3.681 /sec                     ( +-  2.75% )
                 0      cpu-migrations:u                 #    0.000 /sec
                46      page-faults:u                    #   56.443 /sec
     1,304,052,614      cycles:u                         #    1.600 GHz                      ( +-  0.02% )
     1,601,705,197      instructions:u                   #    1.23  insn per cycle           ( +-  0.00% )
       100,021,180      branches:u                       #  122.728 M/sec                    ( +-  0.00% )
             3,427      branch-misses:u                  #    0.00% of all branches          ( +-  0.68% )

          0.815862 +- 0.000137 seconds time elapsed  ( +-  0.02% )

$ perf stat -r100 ./dry.newmerge

 Performance counter stats for './dry.newmerge' (100 runs):

            815.29 msec task-clock:u                     #    0.999 CPUs utilized            ( +-  0.00% )
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
                46      page-faults:u                    #   56.429 /sec
     1,304,461,619      cycles:u                         #    1.600 GHz                      ( +-  0.00% )
     1,501,837,638      instructions:u                   #    1.15  insn per cycle           ( +-  0.00% )
       100,021,173      branches:u                       #  122.697 M/sec                    ( +-  0.00% )
             2,813      branch-misses:u                  #    0.00% of all branches          ( +-  0.82% )

         0.8161623 +- 0.0000465 seconds time elapsed  ( +-  0.01% )

EDIT: I should elaborate on my overall view here / what feedback I'm looking for. At least for the tests I've run, the differences seem mainly in the noise and the reduction in instructions in dhrystone is meaningful enough (even if execution time isn't affected for the spacemit x60 microarch) that it seems sensible to move forwards. But I'd welcome any opposing thoughts / concerns, or indeed further input from people who've enabled globals merging downstream. I'd understand this probably included external globals merging too as enabled in this patch, but it would be helpful to confirm.

llvmbot · 2024-11-27T13:35:06Z

@llvm/pr-subscribers-backend-risc-v

Author: Alex Bradbury (asb)

Changes

This follows up #115495 by enabling merging of external globals by default, which had been left as a next step in order to make the previous change more incremental and so we can more easily narrow down on any identified regressions.

Enabling merging of external globals matches what Arm does (for non mach-o targets), though AArch64 doesn't as there were some concerns it might cause regressions in some cases.

Here are the results I have from specrate benchmarks:

	oldmerge mean	oldmerge stdev	newmerge mean	newmerge stdev	abs old vs new change	old vs new change as %
500.perlbench_r	159.6	0.4	159.7	0.6	0.1	0.06%
502.gcc_r	408.8	2.3	409.7	2.4	0.9	0.22%
505.mcf_r	619.4	3.6	620.6	4.1	1.2	0.19%
520.omnetpp_r	573.7	5	569.1	2.4	-4.6	-0.80%
525.x264_r	183.9	0.2	184	0.4	0.1	0.05%
531.deepsjeng_r	280.6	0.9	280	0.7	-0.6	-0.21%
541.leela_r	284.6	1.1	284.5	0.9	-0.1	-0.04%
557.xz_r	208.8	3	209.5	3.2	0.7	0.34%

This is on the Spacemit X60, with essentially rva22+v. omnetpp is the main one that sees a potentially positive change, though there was an annoyingly high standard deviation on that benchmark.

Looking at dhyrstone we see a positive codegen change in Proc0 which translates to a reduced instruction count, though not decreased runtime:

$ perf stat -r100 ./dry.oldmerge

 Performance counter stats for './dry.oldmerge' (100 runs):

            815.02 msec task-clock:u                     #    0.999 CPUs utilized            ( +-  0.02% )
                 3      context-switches:u               #    3.681 /sec                     ( +-  2.75% )
                 0      cpu-migrations:u                 #    0.000 /sec
                46      page-faults:u                    #   56.443 /sec
     1,304,052,614      cycles:u                         #    1.600 GHz                      ( +-  0.02% )
     1,601,705,197      instructions:u                   #    1.23  insn per cycle           ( +-  0.00% )
       100,021,180      branches:u                       #  122.728 M/sec                    ( +-  0.00% )
             3,427      branch-misses:u                  #    0.00% of all branches          ( +-  0.68% )

          0.815862 +- 0.000137 seconds time elapsed  ( +-  0.02% )

$ perf stat -r100 ./dry.newmerge

 Performance counter stats for './dry.newmerge' (100 runs):

            815.29 msec task-clock:u                     #    0.999 CPUs utilized            ( +-  0.00% )
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
                46      page-faults:u                    #   56.429 /sec
     1,304,461,619      cycles:u                         #    1.600 GHz                      ( +-  0.00% )
     1,501,837,638      instructions:u                   #    1.15  insn per cycle           ( +-  0.00% )
       100,021,173      branches:u                       #  122.697 M/sec                    ( +-  0.00% )
             2,813      branch-misses:u                  #    0.00% of all branches          ( +-  0.82% )

         0.8161623 +- 0.0000465 seconds time elapsed  ( +-  0.01% )

Full diff: https://github.com/llvm/llvm-project/pull/117880.diff

6 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVTargetMachine.cpp (+4-7)
(modified) llvm/test/CodeGen/RISCV/global-merge-minsize-smalldata-nonzero.ll (+3-4)
(modified) llvm/test/CodeGen/RISCV/global-merge-minsize-smalldata-zero.ll (+3-4)
(modified) llvm/test/CodeGen/RISCV/global-merge-minsize.ll (+2-2)
(modified) llvm/test/CodeGen/RISCV/global-merge-offset.ll (+4-4)
(modified) llvm/test/CodeGen/RISCV/global-merge.ll (+6-6)

diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index fa507653264ccd..7a2e2a90699553 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -51,12 +51,9 @@ static cl::opt<cl::boolOrDefault>
     EnableGlobalMerge("riscv-enable-global-merge", cl::Hidden,
                       cl::desc("Enable the global merge pass"));
 
-static cl::opt<bool> ForceEnableGlobalMergeExternalGlobals(
-    "riscv-force-enable-global-merge-external-globals", cl::Hidden,
-    cl::init(false),
-    cl::desc(
-        "If the global merge pass is enabled, force enable global merging of "
-        "external globals (overriding any logic that might disable it)"));
+static cl::opt<bool> DisableGlobalMergeExternalGlobals(
+    "riscv-disable-global-merge-external-globals", cl::Hidden, cl::init(false),
+    cl::desc("Disable global merging of external globals"));
 
 static cl::opt<bool>
     EnableMachineCombiner("riscv-enable-machine-combiner",
@@ -484,7 +481,7 @@ bool RISCVPassConfig::addPreISel() {
     addPass(createGlobalMergePass(TM, /* MaxOffset */ 2047,
                                   /* OnlyOptimizeForSize */ false,
                                   /* MergeExternalByDefault */
-                                  ForceEnableGlobalMergeExternalGlobals));
+                                  !DisableGlobalMergeExternalGlobals));
   }
 
   return false;
diff --git a/llvm/test/CodeGen/RISCV/global-merge-minsize-smalldata-nonzero.ll b/llvm/test/CodeGen/RISCV/global-merge-minsize-smalldata-nonzero.ll
index 156a34db827456..77fe1783bb5d5e 100644
--- a/llvm/test/CodeGen/RISCV/global-merge-minsize-smalldata-nonzero.ll
+++ b/llvm/test/CodeGen/RISCV/global-merge-minsize-smalldata-nonzero.ll
@@ -1,8 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=riscv64 -riscv-force-enable-global-merge-external-globals \
-; RUN:   -verify-machineinstrs < %s | FileCheck %s -check-prefix=SMALL-DATA
-; RUN: llc -mtriple=riscv64 -global-merge-min-data-size=0 \
-; RUN:   -riscv-force-enable-global-merge-external-globals -verify-machineinstrs < %s \
+; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \
+; RUN:   | FileCheck %s -check-prefix=SMALL-DATA
+; RUN: llc -mtriple=riscv64 -global-merge-min-data-size=0 -verify-machineinstrs < %s \
 ; RUN:   | FileCheck %s -check-prefix=MINSIZE
 
 @ig1 = internal global i32 0, align 4
diff --git a/llvm/test/CodeGen/RISCV/global-merge-minsize-smalldata-zero.ll b/llvm/test/CodeGen/RISCV/global-merge-minsize-smalldata-zero.ll
index e41f14d394b7ca..c29749c17a5b5c 100644
--- a/llvm/test/CodeGen/RISCV/global-merge-minsize-smalldata-zero.ll
+++ b/llvm/test/CodeGen/RISCV/global-merge-minsize-smalldata-zero.ll
@@ -1,8 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=riscv32 -riscv-force-enable-global-merge-external-globals \
-; RUN:   -verify-machineinstrs < %s | FileCheck %s -check-prefix=SMALL-DATA
-; RUN: llc -mtriple=riscv64 -global-merge-min-data-size=5 \
-; RUN:   -riscv-force-enable-global-merge-external-globals  -verify-machineinstrs < %s \
+; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
+; RUN:   | FileCheck %s -check-prefix=SMALL-DATA
+; RUN: llc -mtriple=riscv64 -global-merge-min-data-size=5 -verify-machineinstrs < %s \
 ; RUN:   | FileCheck %s -check-prefix=MINSIZE
 
 @ig1 = internal global i32 0, align 4
diff --git a/llvm/test/CodeGen/RISCV/global-merge-minsize.ll b/llvm/test/CodeGen/RISCV/global-merge-minsize.ll
index 1d65d9d1732ba3..915dde388cffdc 100644
--- a/llvm/test/CodeGen/RISCV/global-merge-minsize.ll
+++ b/llvm/test/CodeGen/RISCV/global-merge-minsize.ll
@@ -1,8 +1,8 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=riscv32 -riscv-force-enable-global-merge-external-globals \
+; RUN: llc -mtriple=riscv32 \
 ; RUN:   -verify-machineinstrs < %s | FileCheck %s -check-prefix=RV32
 ; RUN: llc -mtriple=riscv32 -global-merge-min-data-size=5 \
-; RUN:   -riscv-force-enable-global-merge-external-globals -verify-machineinstrs < %s | FileCheck %s -check-prefix=RV32-MINSIZE
+; RUN:   -verify-machineinstrs < %s | FileCheck %s -check-prefix=RV32-MINSIZE
 
 @ig1 = internal global i32 0, align 4
 @ig2 = internal global i32 0, align 4
diff --git a/llvm/test/CodeGen/RISCV/global-merge-offset.ll b/llvm/test/CodeGen/RISCV/global-merge-offset.ll
index bb8264ee438545..c1074bc8ca97ed 100644
--- a/llvm/test/CodeGen/RISCV/global-merge-offset.ll
+++ b/llvm/test/CodeGen/RISCV/global-merge-offset.ll
@@ -1,12 +1,12 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: sed 's/ArrSize/100/g' %s | llc -mtriple=riscv32 \
-; RUN:   -riscv-force-enable-global-merge-external-globals -verify-machineinstrs | FileCheck %s
+; RUN:   -verify-machineinstrs | FileCheck %s
 ; RUN: sed 's/ArrSize/100/g' %s | llc -mtriple=riscv64 \
-; RUN:   -riscv-force-enable-global-merge-external-globals -verify-machineinstrs | FileCheck %s
+; RUN:   -verify-machineinstrs | FileCheck %s
 ; RUN: sed 's/ArrSize/101/g' %s | llc -mtriple=riscv32 \
-; RUN:   -riscv-force-enable-global-merge-external-globals -verify-machineinstrs | FileCheck %s -check-prefix=CHECK-TOOBIG
+; RUN:   -verify-machineinstrs | FileCheck %s -check-prefix=CHECK-TOOBIG
 ; RUN: sed 's/ArrSize/101/g' %s | llc -mtriple=riscv64 \
-; RUN:   -riscv-force-enable-global-merge-external-globals -verify-machineinstrs | FileCheck %s -check-prefix=CHECK-TOOBIG
+; RUN:   -verify-machineinstrs | FileCheck %s -check-prefix=CHECK-TOOBIG
 
 ; This test demonstrates that the MaxOffset is set correctly for RISC-V by
 ; constructing an input that is at the limit and comparing.
diff --git a/llvm/test/CodeGen/RISCV/global-merge.ll b/llvm/test/CodeGen/RISCV/global-merge.ll
index 327a6b54f4be25..fa0dda3736906b 100644
--- a/llvm/test/CodeGen/RISCV/global-merge.ll
+++ b/llvm/test/CodeGen/RISCV/global-merge.ll
@@ -1,11 +1,11 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s | FileCheck %s
-; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s | FileCheck %s
-; RUN: llc -mtriple=riscv32 \
-; RUN:     -riscv-force-enable-global-merge-external-globals -verify-machineinstrs < %s \
+; RUN: llc -mtriple=riscv32 -riscv-disable-global-merge-external-globals \
+; RUN:     -verify-machineinstrs < %s | FileCheck %s
+; RUN: llc -mtriple=riscv64 -riscv-disable-global-merge-external-globals \
+; RUN:     -verify-machineinstrs < %s | FileCheck %s
+; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
 ; RUN:   | FileCheck -check-prefix=CHECK-WEXTERN %s
-; RUN: llc -mtriple=riscv64 \
-; RUN:     -riscv-force-enable-global-merge-external-globals -verify-machineinstrs < %s \
+; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \
 ; RUN:   | FileCheck -check-prefix=CHECK-WEXTERN %s
 
 @ig1 = internal global i32 0, align 4

llvm/lib/Target/RISCV/RISCVTargetMachine.cpp

…lobal-merge-on-external (thanks Craig!)

preames

LGTM.

Thank you for the performance data. I agree the performance is pretty much in the noise, but this makes sense in concept. Given that, I think moving forward here is the right answer.

topperc

LGTM

…merging-mergeexternalbydefault

llvm-ci · 2024-12-11T16:55:57Z

LLVM Buildbot has detected a new failure on builder openmp-offload-libc-amdgpu-runtime running on omp-vega20-1 while building llvm at step 6 "test-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/10066

Here is the relevant piece of the build log for the reference

Step 6 (test-openmp) failure: test (failure)
******************** TEST 'libomp :: tasking/issue-94260-2.c' FAILED ********************
Exit Code: -11

Command Output (stdout):
--
# RUN: at line 1
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp   -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/openmp/runtime/test -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -fno-omit-frame-pointer -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/openmp/runtime/test/ompt /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/openmp/runtime/test/tasking/issue-94260-2.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp -lm -latomic && /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/./bin/clang -fopenmp -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/openmp/runtime/test -L /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -fno-omit-frame-pointer -I /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/openmp/runtime/test/ompt /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/openmp/runtime/test/tasking/issue-94260-2.c -o /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp -lm -latomic
# executed command: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp
# note: command had no output on stdout or stderr
# error: command failed with exit status: -11

--

********************

[RISCV] Enable merging of external globals by default

0496e16

asb requested review from preames, mshockwave, michaelmaitland and topperc November 27, 2024 13:34

llvmbot added the backend:RISC-V label Nov 27, 2024

topperc reviewed Nov 27, 2024

View reviewed changes

llvm/lib/Target/RISCV/RISCVTargetMachine.cpp Outdated Show resolved Hide resolved

Delete ForceEnableGlobalMergeExternalGlobals as it's redundant with g…

2e3ee91

…lobal-merge-on-external (thanks Craig!)

preames approved these changes Dec 4, 2024

View reviewed changes

topperc approved these changes Dec 4, 2024

View reviewed changes

Merge remote-tracking branch 'origin/main' into 2024q4-riscv-globals-…

c0fe781

…merging-mergeexternalbydefault

asb merged commit 3787fbf into llvm:main Dec 11, 2024
5 of 7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[RISCV] Enable merging of external globals by default #117880

[RISCV] Enable merging of external globals by default #117880

Uh oh!

asb commented Nov 27, 2024 •

edited

Loading

Uh oh!

llvmbot commented Nov 27, 2024

Uh oh!

Uh oh!

preames left a comment

Uh oh!

topperc left a comment

Uh oh!

Uh oh!

llvm-ci commented Dec 11, 2024

Uh oh!

Uh oh!

[RISCV] Enable merging of external globals by default #117880

[RISCV] Enable merging of external globals by default #117880

Uh oh!

Conversation

asb commented Nov 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Nov 27, 2024

Uh oh!

Uh oh!

preames left a comment

Choose a reason for hiding this comment

Uh oh!

topperc left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

llvm-ci commented Dec 11, 2024

Uh oh!

Uh oh!

asb commented Nov 27, 2024 •

edited

Loading