Skip to content

[PowerPC][GlobalMerge] Reduce TOC usage by merging internal and private global data #101224

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 14, 2024

Conversation

amy-kwan
Copy link
Contributor

This patch aims to reduce TOC usage by merging internal and private global data.

Moreover, we also add the GlobalMerge pass within the PPCTargetMachine pipeline, which is disabled by default. This transformation can be enabled by -ppc-global-merge.

@llvmbot
Copy link
Member

llvmbot commented Jul 30, 2024

@llvm/pr-subscribers-backend-powerpc

Author: Amy Kwan (amy-kwan)

Changes

This patch aims to reduce TOC usage by merging internal and private global data.

Moreover, we also add the GlobalMerge pass within the PPCTargetMachine pipeline, which is disabled by default. This transformation can be enabled by -ppc-global-merge.


Full diff: https://github.com/llvm/llvm-project/pull/101224.diff

5 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/GlobalMerge.h (+4)
  • (modified) llvm/include/llvm/CodeGen/Passes.h (+3-1)
  • (modified) llvm/lib/CodeGen/GlobalMerge.cpp (+22-5)
  • (modified) llvm/lib/Target/PowerPC/PPCTargetMachine.cpp (+13)
  • (modified) llvm/test/CodeGen/PowerPC/merge-private.ll (+10-10)
diff --git a/llvm/include/llvm/CodeGen/GlobalMerge.h b/llvm/include/llvm/CodeGen/GlobalMerge.h
index 13ad67d4544bc..ef767d548dc6e 100644
--- a/llvm/include/llvm/CodeGen/GlobalMerge.h
+++ b/llvm/include/llvm/CodeGen/GlobalMerge.h
@@ -28,6 +28,10 @@ struct GlobalMergeOptions {
   bool MergeConst = false;
   /// Whether we should merge global variables that have external linkage.
   bool MergeExternal = true;
+  /// Whether we should merge global variables that have private linkage.
+  bool MergePrivateGlobals = false;
+  /// Whether we should merge constant global variables.
+  bool MergeConstantGlobals = false;
   /// Whether we should try to optimize for size only.
   /// Currently, this applies a dead simple heuristic: only consider globals
   /// used in minsize functions for merging.
diff --git a/llvm/include/llvm/CodeGen/Passes.h b/llvm/include/llvm/CodeGen/Passes.h
index cafb9781698a2..b401a8d9f10fd 100644
--- a/llvm/include/llvm/CodeGen/Passes.h
+++ b/llvm/include/llvm/CodeGen/Passes.h
@@ -476,7 +476,9 @@ namespace llvm {
   ///
   Pass *createGlobalMergePass(const TargetMachine *TM, unsigned MaximalOffset,
                               bool OnlyOptimizeForSize = false,
-                              bool MergeExternalByDefault = false);
+                              bool MergeExternalByDefault = false,
+                              bool MergePrivateByDefault = false,
+                              bool MergeConstantByDefault = false);
 
   /// This pass splits the stack into a safe stack and an unsafe stack to
   /// protect against stack-based overflow vulnerabilities.
diff --git a/llvm/lib/CodeGen/GlobalMerge.cpp b/llvm/lib/CodeGen/GlobalMerge.cpp
index e420c2bb7a25e..4056860fddd93 100644
--- a/llvm/lib/CodeGen/GlobalMerge.cpp
+++ b/llvm/lib/CodeGen/GlobalMerge.cpp
@@ -196,11 +196,14 @@ class GlobalMerge : public FunctionPass {
   }
 
   explicit GlobalMerge(const TargetMachine *TM, unsigned MaximalOffset,
-                       bool OnlyOptimizeForSize, bool MergeExternalGlobals)
+                       bool OnlyOptimizeForSize, bool MergeExternalGlobals,
+                       bool MergePrivateGlobals, bool MergeConstantGlobals)
       : FunctionPass(ID), TM(TM) {
     Opt.MaxOffset = MaximalOffset;
     Opt.SizeOnly = OnlyOptimizeForSize;
     Opt.MergeExternal = MergeExternalGlobals;
+    Opt.MergePrivateGlobals = MergePrivateGlobals;
+    Opt.MergeConstantGlobals = MergeConstantGlobals;
     initializeGlobalMergePass(*PassRegistry::getPassRegistry());
   }
 
@@ -475,7 +478,8 @@ bool GlobalMergeImpl::doMerge(const SmallVectorImpl<GlobalVariable *> &Globals,
   auto &DL = M.getDataLayout();
 
   LLVM_DEBUG(dbgs() << " Trying to merge set, starts with #"
-                    << GlobalSet.find_first() << "\n");
+                    << GlobalSet.find_first() << ", total of " << Globals.size()
+                    << "\n");
 
   bool Changed = false;
   ssize_t i = GlobalSet.find_first();
@@ -551,6 +555,8 @@ bool GlobalMergeImpl::doMerge(const SmallVectorImpl<GlobalVariable *> &Globals,
     MergedGV->setAlignment(MaxAlign);
     MergedGV->setSection(Globals[i]->getSection());
 
+    LLVM_DEBUG(dbgs() << "MergedGV:  " << *MergedGV << "\n");
+
     const StructLayout *MergedLayout = DL.getStructLayout(MergedTy);
     for (ssize_t k = i, idx = 0; k != j; k = GlobalSet.find_next(k), ++idx) {
       GlobalValue::LinkageTypes Linkage = Globals[k]->getLinkage();
@@ -700,6 +706,11 @@ bool GlobalMergeImpl::run(Module &M) {
       else
         Globals[{AddressSpace, Section}].push_back(&GV);
     }
+    LLVM_DEBUG(dbgs() << "GV "
+                      << ((DL.getTypeAllocSize(Ty) < Opt.MaxOffset)
+                              ? "to merge: "
+                              : "not to merge: ")
+                      << GV << "\n");
   }
 
   for (auto &P : Globals)
@@ -710,7 +721,7 @@ bool GlobalMergeImpl::run(Module &M) {
     if (P.second.size() > 1)
       Changed |= doMerge(P.second, M, false, P.first.first);
 
-  if (EnableGlobalMergeOnConst)
+  if (Opt.MergeConstantGlobals)
     for (auto &P : ConstGlobals)
       if (P.second.size() > 1)
         Changed |= doMerge(P.second, M, true, P.first.first);
@@ -720,8 +731,14 @@ bool GlobalMergeImpl::run(Module &M) {
 
 Pass *llvm::createGlobalMergePass(const TargetMachine *TM, unsigned Offset,
                                   bool OnlyOptimizeForSize,
-                                  bool MergeExternalByDefault) {
+                                  bool MergeExternalByDefault,
+                                  bool MergePrivateByDefault,
+                                  bool MergeConstantByDefault) {
   bool MergeExternal = (EnableGlobalMergeOnExternal == cl::BOU_UNSET) ?
     MergeExternalByDefault : (EnableGlobalMergeOnExternal == cl::BOU_TRUE);
-  return new GlobalMerge(TM, Offset, OnlyOptimizeForSize, MergeExternal);
+  bool MergeConstant = EnableGlobalMergeOnConst.getNumOccurrences() > 0
+                           ? EnableGlobalMergeOnConst
+                           : MergeConstantByDefault;
+  return new GlobalMerge(TM, Offset, OnlyOptimizeForSize, MergeExternal,
+                         MergePrivateByDefault, MergeConstant);
 }
diff --git a/llvm/lib/Target/PowerPC/PPCTargetMachine.cpp b/llvm/lib/Target/PowerPC/PPCTargetMachine.cpp
index 1ef891d1b677a..e4045ec304435 100644
--- a/llvm/lib/Target/PowerPC/PPCTargetMachine.cpp
+++ b/llvm/lib/Target/PowerPC/PPCTargetMachine.cpp
@@ -111,6 +111,15 @@ static cl::opt<bool> EnablePPCGenScalarMASSEntries(
              "(scalar) entries"),
     cl::Hidden);
 
+static cl::opt<bool>
+    EnableGlobalMerge("ppc-global-merge", cl::Hidden, cl::init(false),
+                      cl::desc("Enable the global merge pass"));
+
+static cl::opt<unsigned>
+    GlobalMergeMaxOffset("ppc-global-merge-max-offset", cl::Hidden,
+                         cl::init(0x7fff),
+                         cl::desc("Maximum global merge offset"));
+
 extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializePowerPCTarget() {
   // Register the targets
   RegisterTargetMachine<PPCTargetMachine> A(getThePPC32Target());
@@ -491,6 +500,10 @@ void PPCPassConfig::addIRPasses() {
 }
 
 bool PPCPassConfig::addPreISel() {
+  if (EnableGlobalMerge)
+    addPass(createGlobalMergePass(TM, GlobalMergeMaxOffset, false, false, true,
+                                  true));
+
   if (MergeStringPool && getOptLevel() != CodeGenOptLevel::None)
     addPass(createPPCMergeStringPoolPass());
 
diff --git a/llvm/test/CodeGen/PowerPC/merge-private.ll b/llvm/test/CodeGen/PowerPC/merge-private.ll
index 6cf276990d7ea..6ed2d6dfc542b 100644
--- a/llvm/test/CodeGen/PowerPC/merge-private.ll
+++ b/llvm/test/CodeGen/PowerPC/merge-private.ll
@@ -1,15 +1,15 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc -verify-machineinstrs -mtriple powerpc64-ibm-aix-xcoff -mcpu=pwr8 \
-; RUN:     -ppc-asm-full-reg-names < %s | FileCheck %s \
+; RUN:     -ppc-asm-full-reg-names -ppc-global-merge=true < %s | FileCheck %s \
 ; RUN:     --check-prefix=AIX64
 ; RUN: llc -verify-machineinstrs -mtriple powerpc-ibm-aix-xcoff -mcpu=pwr8 \
-; RUN:     -ppc-asm-full-reg-names < %s | FileCheck %s \
+; RUN:     -ppc-asm-full-reg-names -ppc-global-merge=true < %s | FileCheck %s \
 ; RUN:     --check-prefix=AIX32
 ; RUN: llc -verify-machineinstrs -mtriple powerpc64le-unknown-linux -mcpu=pwr8 \
-; RUN:     -ppc-asm-full-reg-names < %s | FileCheck %s \
+; RUN:     -ppc-asm-full-reg-names -ppc-global-merge=true < %s | FileCheck %s \
 ; RUN:     --check-prefix=LINUX64LE
 ; RUN: llc -verify-machineinstrs -mtriple powerpc64-unknown-linux -mcpu=pwr8 \
-; RUN:     -ppc-asm-full-reg-names < %s | FileCheck %s \
+; RUN:     -ppc-asm-full-reg-names -ppc-global-merge=true < %s | FileCheck %s \
 ; RUN:     --check-prefix=LINUX64BE
 
 @.str = private unnamed_addr constant [15 x i8] c"Private global\00", align 1
@@ -24,7 +24,7 @@ define dso_local void @print_func() {
 ; AIX64-NEXT:    stdu r1, -128(r1)
 ; AIX64-NEXT:    std r0, 144(r1)
 ; AIX64-NEXT:    std r31, 120(r1) # 8-byte Folded Spill
-; AIX64-NEXT:    ld r31, L..C0(r2) # @__ModuleStringPool
+; AIX64-NEXT:    ld r31, L..C0(r2) # @_MergedGlobals
 ; AIX64-NEXT:    mr r3, r31
 ; AIX64-NEXT:    bl .puts[PR]
 ; AIX64-NEXT:    nop
@@ -43,7 +43,7 @@ define dso_local void @print_func() {
 ; AIX32-NEXT:    stwu r1, -64(r1)
 ; AIX32-NEXT:    stw r0, 72(r1)
 ; AIX32-NEXT:    stw r31, 60(r1) # 4-byte Folded Spill
-; AIX32-NEXT:    lwz r31, L..C0(r2) # @__ModuleStringPool
+; AIX32-NEXT:    lwz r31, L..C0(r2) # @_MergedGlobals
 ; AIX32-NEXT:    mr r3, r31
 ; AIX32-NEXT:    bl .puts[PR]
 ; AIX32-NEXT:    nop
@@ -64,9 +64,9 @@ define dso_local void @print_func() {
 ; LINUX64LE-NEXT:    .cfi_offset r30, -16
 ; LINUX64LE-NEXT:    std r30, -16(r1) # 8-byte Folded Spill
 ; LINUX64LE-NEXT:    stdu r1, -48(r1)
-; LINUX64LE-NEXT:    addis r3, r2, .L__ModuleStringPool@toc@ha
+; LINUX64LE-NEXT:    addis r3, r2, .L_MergedGlobals@toc@ha
 ; LINUX64LE-NEXT:    std r0, 64(r1)
-; LINUX64LE-NEXT:    addi r30, r3, .L__ModuleStringPool@toc@l
+; LINUX64LE-NEXT:    addi r30, r3, .L_MergedGlobals@toc@l
 ; LINUX64LE-NEXT:    mr r3, r30
 ; LINUX64LE-NEXT:    bl puts
 ; LINUX64LE-NEXT:    nop
@@ -87,9 +87,9 @@ define dso_local void @print_func() {
 ; LINUX64BE-NEXT:    .cfi_def_cfa_offset 128
 ; LINUX64BE-NEXT:    .cfi_offset lr, 16
 ; LINUX64BE-NEXT:    .cfi_offset r30, -16
-; LINUX64BE-NEXT:    addis r3, r2, .L__ModuleStringPool@toc@ha
+; LINUX64BE-NEXT:    addis r3, r2, .L_MergedGlobals@toc@ha
 ; LINUX64BE-NEXT:    std r30, 112(r1) # 8-byte Folded Spill
-; LINUX64BE-NEXT:    addi r30, r3, .L__ModuleStringPool@toc@l
+; LINUX64BE-NEXT:    addi r30, r3, .L_MergedGlobals@toc@l
 ; LINUX64BE-NEXT:    mr r3, r30
 ; LINUX64BE-NEXT:    bl puts
 ; LINUX64BE-NEXT:    nop

@@ -28,6 +28,10 @@ struct GlobalMergeOptions {
bool MergeConst = false;
/// Whether we should merge global variables that have external linkage.
bool MergeExternal = true;
/// Whether we should merge global variables that have private linkage.
bool MergePrivateGlobals = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MergePrivateGlobals is unused?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Like the internal global variables, private global variables should always be merged and should not be guarded under an option. You've already done this in #101222 :)

Comment on lines 739 to 741
bool MergeConstant = EnableGlobalMergeOnConst.getNumOccurrences() > 0
? EnableGlobalMergeOnConst
: MergeConstantByDefault;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool MergeConstant = EnableGlobalMergeOnConst.getNumOccurrences() > 0
? EnableGlobalMergeOnConst
: MergeConstantByDefault;
bool MergeConstant = EnableGlobalMergeOnConst || MergeConstantByDefault;

@amy-kwan amy-kwan force-pushed the users/amy-kwan/ppc-global-merge-private-internal branch from 61a41df to d14bd7c Compare August 2, 2024 03:09
Copy link
Member

@redstar redstar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@amy-kwan amy-kwan force-pushed the users/amy-kwan/ppc-global-merge-private-internal branch from d14bd7c to 10d3137 Compare August 2, 2024 16:23
Copy link
Collaborator

@chenzheng1030 chenzheng1030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks very much for enabling this pass on PPC.

Base automatically changed from users/amy-kwan/update-globalmerge-private-condition to main August 13, 2024 14:13
@amy-kwan amy-kwan force-pushed the users/amy-kwan/ppc-global-merge-private-internal branch from 10d3137 to 10af3fb Compare August 13, 2024 16:00
Copy link

github-actions bot commented Aug 13, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

…te global data

This patch aims to reduce TOC usage by merging internal and private global data.

Moreover, we also add the GlobalMerge pass within the PPCTargetMachine pipeline,
which is disabled by default. This transformation can be enabled by -ppc-global-merge.
@amy-kwan amy-kwan force-pushed the users/amy-kwan/ppc-global-merge-private-internal branch from 10af3fb to 40e0221 Compare August 14, 2024 04:21
@amy-kwan amy-kwan merged commit 5e990b0 into main Aug 14, 2024
8 checks passed
@amy-kwan amy-kwan deleted the users/amy-kwan/ppc-global-merge-private-internal branch August 14, 2024 14:14
@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 14, 2024

LLVM Buildbot has detected a new failure on builder llvm-x86_64-debian-dylib running on gribozavr4 while building llvm at step 7 "test-build-unified-tree-check-llvm".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/4942

Here is the relevant piece of the build log for the reference:

Step 7 (test-build-unified-tree-check-llvm) failure: test (failure)
******************** TEST 'LLVM :: ExecutionEngine/JITLink/AArch64/ELF_relocations.s' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: rm -rf /b/1/llvm-x86_64-debian-dylib/build/test/ExecutionEngine/JITLink/AArch64/Output/ELF_relocations.s.tmp && mkdir -p /b/1/llvm-x86_64-debian-dylib/build/test/ExecutionEngine/JITLink/AArch64/Output/ELF_relocations.s.tmp
+ rm -rf /b/1/llvm-x86_64-debian-dylib/build/test/ExecutionEngine/JITLink/AArch64/Output/ELF_relocations.s.tmp
+ mkdir -p /b/1/llvm-x86_64-debian-dylib/build/test/ExecutionEngine/JITLink/AArch64/Output/ELF_relocations.s.tmp
RUN: at line 2: /b/1/llvm-x86_64-debian-dylib/build/bin/llvm-mc -triple=aarch64-unknown-linux-gnu -x86-relax-relocations=false    -position-independent -filetype=obj -o /b/1/llvm-x86_64-debian-dylib/build/test/ExecutionEngine/JITLink/AArch64/Output/ELF_relocations.s.tmp/elf_reloc.o /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/ExecutionEngine/JITLink/AArch64/ELF_relocations.s
+ /b/1/llvm-x86_64-debian-dylib/build/bin/llvm-mc -triple=aarch64-unknown-linux-gnu -x86-relax-relocations=false -position-independent -filetype=obj -o /b/1/llvm-x86_64-debian-dylib/build/test/ExecutionEngine/JITLink/AArch64/Output/ELF_relocations.s.tmp/elf_reloc.o /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/ExecutionEngine/JITLink/AArch64/ELF_relocations.s
RUN: at line 4: /b/1/llvm-x86_64-debian-dylib/build/bin/llvm-jitlink -noexec               -abs external_data=0xdeadbeef               -abs external_func=0xcafef00d               -check /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/ExecutionEngine/JITLink/AArch64/ELF_relocations.s /b/1/llvm-x86_64-debian-dylib/build/test/ExecutionEngine/JITLink/AArch64/Output/ELF_relocations.s.tmp/elf_reloc.o
+ /b/1/llvm-x86_64-debian-dylib/build/bin/llvm-jitlink -noexec -abs external_data=0xdeadbeef -abs external_func=0xcafef00d -check /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/ExecutionEngine/JITLink/AArch64/ELF_relocations.s /b/1/llvm-x86_64-debian-dylib/build/test/ExecutionEngine/JITLink/AArch64/Output/ELF_relocations.s.tmp/elf_reloc.o
Expression 'decode_operand(test_adr_gotpage_external, 1) =      (got_addr(elf_reloc.o, external_data)[32:12] -         test_adr_gotpage_external[32:12])' is false: 0x108 != 0xffffffffffe00108
/b/1/llvm-x86_64-debian-dylib/build/bin/llvm-jitlink: Some checks in /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/ExecutionEngine/JITLink/AArch64/ELF_relocations.s failed

--

********************


qiaojbao pushed a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Aug 29, 2024
…b32e22b9d

Local branch amd-gfx e0cb32e Merged main:98119718603c6e623b34e3d2160b65b26d9bfcdc into amd-gfx:a22f3eb12095
Remote branch main 5e990b0 [PowerPC][GlobalMerge] Reduce TOC usage by merging internal and private global data (llvm#101224)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants