Skip to content

[AMDGPU] Improve StructurizeCFG pass performance: avoid redundant DebugLoc map initialization. NFC. #130568

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 11, 2025

Conversation

vpykhtin
Copy link
Contributor

Previously, the TermDL (BB terminator → DebugLoc) map was initialized at the start of processing each function's region, creating entries for the entire function. This could be inefficient for large functions.

This patch improves performance by creating map entries only when needed—when a terminator is being killed or when a flow block is created. Additionally, entries are removed immediately after use, preventing unnecessary map growth and ensuring DebugLocs are not "retracked."

A mapless variant was also explored, but due to limited familiarity with the structurizer, it was not pursued further.

In my cases, this change improves performance by 2-3×.

@llvmbot
Copy link
Member

llvmbot commented Mar 10, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Valery Pykhtin (vpykhtin)

Changes

Previously, the TermDL (BB terminator → DebugLoc) map was initialized at the start of processing each function's region, creating entries for the entire function. This could be inefficient for large functions.

This patch improves performance by creating map entries only when needed—when a terminator is being killed or when a flow block is created. Additionally, entries are removed immediately after use, preventing unnecessary map growth and ensuring DebugLocs are not "retracked."

A mapless variant was also explored, but due to limited familiarity with the structurizer, it was not pursued further.

In my cases, this change improves performance by 2-3×.


Full diff: https://github.com/llvm/llvm-project/pull/130568.diff

1 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/StructurizeCFG.cpp (+21-15)
diff --git a/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp b/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
index 89a2a7ac9be3f..db6321cc2f71e 100644
--- a/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
+++ b/llvm/lib/Transforms/Scalar/StructurizeCFG.cpp
@@ -361,6 +361,8 @@ class StructurizeCFG {
 
   void rebuildSSA();
 
+  void setDebugLoc(BranchInst *Br, BasicBlock *BB);
+
 public:
   void init(Region *R);
   bool run(Region *R, DominatorTree *DT);
@@ -595,14 +597,6 @@ void StructurizeCFG::collectInfos() {
     // Find the last back edges
     analyzeLoops(RN);
   }
-
-  // Reset the collected term debug locations
-  TermDL.clear();
-
-  for (BasicBlock &BB : *Func) {
-    if (const DebugLoc &DL = BB.getTerminator()->getDebugLoc())
-      TermDL[&BB] = DL;
-  }
 }
 
 /// Insert the missing branch conditions
@@ -924,12 +918,24 @@ void StructurizeCFG::simplifyAffectedPhis() {
   } while (Changed);
 }
 
+void StructurizeCFG::setDebugLoc(BranchInst *Br, BasicBlock *BB) {
+  auto I = TermDL.find(BB);
+  if (I == TermDL.end())
+    return;
+
+  Br->setDebugLoc(I->second);
+  TermDL.erase(I);
+}
+
 /// Remove phi values from all successors and then remove the terminator.
 void StructurizeCFG::killTerminator(BasicBlock *BB) {
   Instruction *Term = BB->getTerminator();
   if (!Term)
     return;
 
+  if (const DebugLoc &DL = Term->getDebugLoc())
+    TermDL[BB] = DL;
+
   for (BasicBlock *Succ : successors(BB))
     delPhiValues(BB, Succ);
 
@@ -974,7 +980,7 @@ void StructurizeCFG::changeExit(RegionNode *Node, BasicBlock *NewExit,
     BasicBlock *BB = Node->getNodeAs<BasicBlock>();
     killTerminator(BB);
     BranchInst *Br = BranchInst::Create(NewExit, BB);
-    Br->setDebugLoc(TermDL[BB]);
+    setDebugLoc(Br, BB);
     addPhiValues(BB, NewExit);
     if (IncludeDominator)
       DT->changeImmediateDominator(NewExit, BB);
@@ -990,10 +996,10 @@ BasicBlock *StructurizeCFG::getNextFlow(BasicBlock *Dominator) {
                                         Func, Insert);
   FlowSet.insert(Flow);
 
-  // use a temporary variable to avoid a use-after-free if the map's storage is
-  // reallocated
-  DebugLoc DL = TermDL[Dominator];
-  TermDL[Flow] = std::move(DL);
+  auto *Term = Dominator->getTerminator();
+  if (const DebugLoc &DL =
+          Term ? Term->getDebugLoc() : TermDL.lookup(Dominator))
+    TermDL[Flow] = DL;
 
   DT->addNewBlock(Flow, Dominator);
   ParentRegion->getRegionInfo()->setRegionFor(Flow, ParentRegion);
@@ -1088,7 +1094,7 @@ void StructurizeCFG::wireFlow(bool ExitUseAllowed,
 
     // let it point to entry and next block
     BranchInst *Br = BranchInst::Create(Entry, Next, BoolPoison, Flow);
-    Br->setDebugLoc(TermDL[Flow]);
+    setDebugLoc(Br, Flow);
     Conditions.push_back(Br);
     addPhiValues(Flow, Entry);
     DT->changeImmediateDominator(Entry, Flow);
@@ -1129,7 +1135,7 @@ void StructurizeCFG::handleLoops(bool ExitUseAllowed,
   LoopEnd = needPrefix(false);
   BasicBlock *Next = needPostfix(LoopEnd, ExitUseAllowed);
   BranchInst *Br = BranchInst::Create(Next, LoopStart, BoolPoison, LoopEnd);
-  Br->setDebugLoc(TermDL[LoopEnd]);
+  setDebugLoc(Br, LoopEnd);
   LoopConds.push_back(Br);
   addPhiValues(LoopEnd, LoopStart);
   setPrevNode(Next);

@vpykhtin vpykhtin merged commit bb2e85f into llvm:main Mar 11, 2025
11 checks passed
@vpykhtin vpykhtin deleted the speedup_structurizecfg_debugloc branch March 11, 2025 06:19
@llvm-ci
Copy link
Collaborator

llvm-ci commented Mar 11, 2025

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime-2 running on rocm-worker-hw-02 while building llvm at step 5 "compile-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/10/builds/1048

Here is the relevant piece of the build log for the reference
Step 5 (compile-openmp) failure: build (failure)
...
0.148 [393/130/182] Generating header uchar.h from /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/runtimes/../libc/include/uchar.yaml
0.148 [392/130/183] Generating /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/include/amdgcn-amd-amdhsa/llvm-libc-decls/time.h
0.149 [391/130/184] Generating header time.h from /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/runtimes/../libc/include/time.yaml
0.149 [390/130/185] Building CXX object libc/src/math/amdgpu/CMakeFiles/libc.src.math.amdgpu.lgamma.dir/lgamma.cpp.o
0.150 [389/130/186] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.getenv.dir/getenv.cpp.o
0.150 [388/130/187] Building CXX object libc/src/strings/CMakeFiles/libc.src.strings.strncasecmp.dir/strncasecmp.cpp.o
0.150 [387/130/188] Generating header stdlib.h from /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/runtimes/../libc/include/stdlib.yaml
0.151 [386/130/189] Generating header stdio.h from /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/runtimes/../libc/include/stdio.yaml
0.152 [385/130/190] Generating /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/include/amdgcn-amd-amdhsa/llvm-libc-decls/signal.h
0.155 [384/130/191] Building CXX object libc/src/string/CMakeFiles/libc.src.string.strstr.dir/strstr.cpp.o
FAILED: libc/src/string/CMakeFiles/libc.src.string.strstr.dir/strstr.cpp.o 
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/./bin/clang++ --target=amdgcn-amd-amdhsa -DLIBC_NAMESPACE=__llvm_libc_21_0_0_git -D__LIBC_USE_FLOAT16_CONVERSION -I/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc -isystem /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.build/include/amdgcn-amd-amdhsa -O3 -DNDEBUG --target=amdgcn-amd-amdhsa -DLIBC_QSORT_IMPL=LIBC_QSORT_QUICK_SORT -DLIBC_ADD_NULL_CHECKS "-DLIBC_MATH=(LIBC_MATH_SKIP_ACCURATE_PASS | LIBC_MATH_SMALL_TABLES | LIBC_MATH_NO_ERRNO | LIBC_MATH_NO_EXCEPT)" -fpie -DLIBC_FULL_BUILD -nostdlibinc -ffixed-point -fno-exceptions -fno-lax-vector-conversions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -ftrivial-auto-var-init=pattern -fno-omit-frame-pointer -Wall -Wextra -Werror -Wconversion -Wno-sign-conversion -Wdeprecated -Wno-c99-extensions -Wno-gnu-imaginary-constant -Wno-pedantic -Wimplicit-fallthrough -Wwrite-strings -Wextra-semi -Wnewline-eof -Wnonportable-system-include-path -Wstrict-prototypes -Wthread-safety -Wglobal-constructors -nogpulib -fvisibility=hidden -fconvergent-functions -flto -Wno-multi-gpu -Xclang -mcode-object-version=none -DLIBC_COPT_PUBLIC_PACKAGING -UNDEBUG -fno-builtin-strstr -MD -MT libc/src/string/CMakeFiles/libc.src.string.strstr.dir/strstr.cpp.o -MF libc/src/string/CMakeFiles/libc.src.string.strstr.dir/strstr.cpp.o.d -o libc/src/string/CMakeFiles/libc.src.string.strstr.dir/strstr.cpp.o -c /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/string/strstr.cpp
In file included from /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/string/strstr.cpp:13:
In file included from /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/string/memory_utils/inline_strstr.h:15:
In file included from /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/string/string_utils.h:22:
In file included from /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/string/memory_utils/inline_bzero.h:15:
In file included from /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/string/memory_utils/inline_memset.h:14:
In file included from /home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/string/memory_utils/utils.h:15:
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/__support/endian_internal.h:43:13: error: unknown type name 'uint8_t'
   43 | LIBC_INLINE uint8_t
      |             ^
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/__support/endian_internal.h:44:48: error: use of undeclared identifier 'uint8_t'
   44 | Endian<__ORDER_LITTLE_ENDIAN__>::to_big_endian<uint8_t>(uint8_t v) {
      |                                                ^
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/__support/endian_internal.h:44:48: error: use of undeclared identifier 'uint8_t'
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/__support/endian_internal.h:44:57: error: unknown type name 'uint8_t'
   44 | Endian<__ORDER_LITTLE_ENDIAN__>::to_big_endian<uint8_t>(uint8_t v) {
      |                                                         ^
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/__support/endian_internal.h:49:13: error: unknown type name 'uint8_t'
   49 | LIBC_INLINE uint8_t
      |             ^
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/__support/endian_internal.h:50:51: error: use of undeclared identifier 'uint8_t'
   50 | Endian<__ORDER_LITTLE_ENDIAN__>::to_little_endian<uint8_t>(uint8_t v) {
      |                                                   ^
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/__support/endian_internal.h:50:51: error: use of undeclared identifier 'uint8_t'
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/__support/endian_internal.h:50:60: error: unknown type name 'uint8_t'
   50 | Endian<__ORDER_LITTLE_ENDIAN__>::to_little_endian<uint8_t>(uint8_t v) {
      |                                                            ^
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/__support/endian_internal.h:55:13: error: unknown type name 'uint16_t'
   55 | LIBC_INLINE uint16_t
      |             ^
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/__support/endian_internal.h:56:48: error: use of undeclared identifier 'uint16_t'
   56 | Endian<__ORDER_LITTLE_ENDIAN__>::to_big_endian<uint16_t>(uint16_t v) {
      |                                                ^
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/__support/endian_internal.h:56:48: error: use of undeclared identifier 'uint16_t'
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/__support/endian_internal.h:56:58: error: unknown type name 'uint16_t'
   56 | Endian<__ORDER_LITTLE_ENDIAN__>::to_big_endian<uint16_t>(uint16_t v) {
      |                                                          ^
/home/botworker/builds/openmp-offload-amdgpu-runtime-2/llvm.src/libc/src/__support/endian_internal.h:61:13: error: unknown type name 'uint16_t'

@vpykhtin
Copy link
Contributor Author

The https://lab.llvm.org/buildbot/#/builders/10/builds/1048 failure looks unrelated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants