Skip to content

[MemProf] Tolerate missing leaf debug frames #71233

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 4, 2023

Conversation

teresajohnson
Copy link
Contributor

Loosen up the matching so that a missing leaf debug frame in the profile
does not prevent matching an allocation context if we can match further
up the inlined call context. This relies on the pre-inliner, which was
already the default when performing normal PGO feedback along with the
MemProf feedback, but to ensure matching is not affected by the presence
of PGO, enable the pre-inliner for MemProf feedback as well.

Loosen up the matching so that a missing leaf debug frame in the profile
does not prevent matching an allocation context if we can match further
up the inlined call context. This relies on the pre-inliner, which was
already the default when performing normal PGO feedback along with the
MemProf feedback, but to ensure matching is not affected by the presence
of PGO, enable the pre-inliner for MemProf feedback as well.
@llvmbot llvmbot added PGO Profile Guided Optimizations llvm:transforms labels Nov 3, 2023
@llvmbot
Copy link
Member

llvmbot commented Nov 3, 2023

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-pgo

Author: Teresa Johnson (teresajohnson)

Changes

Loosen up the matching so that a missing leaf debug frame in the profile
does not prevent matching an allocation context if we can match further
up the inlined call context. This relies on the pre-inliner, which was
already the default when performing normal PGO feedback along with the
MemProf feedback, but to ensure matching is not affected by the presence
of PGO, enable the pre-inliner for MemProf feedback as well.


Full diff: https://github.com/llvm/llvm-project/pull/71233.diff

8 Files Affected:

  • (modified) llvm/include/llvm/Passes/PassBuilder.h (+3-1)
  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+50-39)
  • (modified) llvm/lib/Transforms/Instrumentation/MemProfiler.cpp (+9-9)
  • (added) llvm/test/Other/new-pm-memprof.ll (+12)
  • (added) llvm/test/Transforms/PGOProfile/Inputs/memprof_missing_leaf.exe ()
  • (added) llvm/test/Transforms/PGOProfile/Inputs/memprof_missing_leaf.memprofraw ()
  • (modified) llvm/test/Transforms/PGOProfile/Inputs/update_memprof_inputs.sh (+31)
  • (added) llvm/test/Transforms/PGOProfile/memprof_missing_leaf.ll (+92)
diff --git a/llvm/include/llvm/Passes/PassBuilder.h b/llvm/include/llvm/Passes/PassBuilder.h
index 2c7ceda7998eda1..5aaed05f5ae2f99 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -628,11 +628,13 @@ class PassBuilder {
   Error parseModulePassPipeline(ModulePassManager &MPM,
                                 ArrayRef<PipelineElement> Pipeline);
 
+  void addPreInlinerPasses(ModulePassManager &MPM, OptimizationLevel Level,
+                           ThinOrFullLTOPhase LTOPhase);
+
   void addPGOInstrPasses(ModulePassManager &MPM, OptimizationLevel Level,
                          bool RunProfileGen, bool IsCS,
                          bool AtomicCounterUpdate, std::string ProfileFile,
                          std::string ProfileRemappingFile,
-                         ThinOrFullLTOPhase LTOPhase,
                          IntrusiveRefCntPtr<vfs::FileSystem> FS);
 
   // Extension Point callbacks
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index baea2913338cda7..f4a4802bcb649f3 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -733,47 +733,52 @@ void PassBuilder::addRequiredLTOPreLinkPasses(ModulePassManager &MPM) {
   MPM.addPass(NameAnonGlobalPass());
 }
 
-void PassBuilder::addPGOInstrPasses(ModulePassManager &MPM,
-                                    OptimizationLevel Level, bool RunProfileGen,
-                                    bool IsCS, bool AtomicCounterUpdate,
-                                    std::string ProfileFile,
-                                    std::string ProfileRemappingFile,
-                                    ThinOrFullLTOPhase LTOPhase,
-                                    IntrusiveRefCntPtr<vfs::FileSystem> FS) {
+void PassBuilder::addPreInlinerPasses(ModulePassManager &MPM,
+                                      OptimizationLevel Level,
+                                      ThinOrFullLTOPhase LTOPhase) {
   assert(Level != OptimizationLevel::O0 && "Not expecting O0 here!");
-  if (!IsCS && !DisablePreInliner) {
-    InlineParams IP;
+  if (DisablePreInliner)
+    return;
+  InlineParams IP;
 
-    IP.DefaultThreshold = PreInlineThreshold;
+  IP.DefaultThreshold = PreInlineThreshold;
 
-    // FIXME: The hint threshold has the same value used by the regular inliner
-    // when not optimzing for size. This should probably be lowered after
-    // performance testing.
-    // FIXME: this comment is cargo culted from the old pass manager, revisit).
-    IP.HintThreshold = Level.isOptimizingForSize() ? PreInlineThreshold : 325;
-    ModuleInlinerWrapperPass MIWP(
-        IP, /* MandatoryFirst */ true,
-        InlineContext{LTOPhase, InlinePass::EarlyInliner});
-    CGSCCPassManager &CGPipeline = MIWP.getPM();
+  // FIXME: The hint threshold has the same value used by the regular inliner
+  // when not optimzing for size. This should probably be lowered after
+  // performance testing.
+  // FIXME: this comment is cargo culted from the old pass manager, revisit).
+  IP.HintThreshold = Level.isOptimizingForSize() ? PreInlineThreshold : 325;
+  ModuleInlinerWrapperPass MIWP(
+      IP, /* MandatoryFirst */ true,
+      InlineContext{LTOPhase, InlinePass::EarlyInliner});
+  CGSCCPassManager &CGPipeline = MIWP.getPM();
 
-    FunctionPassManager FPM;
-    FPM.addPass(SROAPass(SROAOptions::ModifyCFG));
-    FPM.addPass(EarlyCSEPass()); // Catch trivial redundancies.
-    FPM.addPass(SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(
-        true)));                    // Merge & remove basic blocks.
-    FPM.addPass(InstCombinePass()); // Combine silly sequences.
-    invokePeepholeEPCallbacks(FPM, Level);
+  FunctionPassManager FPM;
+  FPM.addPass(SROAPass(SROAOptions::ModifyCFG));
+  FPM.addPass(EarlyCSEPass()); // Catch trivial redundancies.
+  FPM.addPass(SimplifyCFGPass(SimplifyCFGOptions().convertSwitchRangeToICmp(
+      true)));                    // Merge & remove basic blocks.
+  FPM.addPass(InstCombinePass()); // Combine silly sequences.
+  invokePeepholeEPCallbacks(FPM, Level);
 
-    CGPipeline.addPass(createCGSCCToFunctionPassAdaptor(
-        std::move(FPM), PTO.EagerlyInvalidateAnalyses));
+  CGPipeline.addPass(createCGSCCToFunctionPassAdaptor(
+      std::move(FPM), PTO.EagerlyInvalidateAnalyses));
 
-    MPM.addPass(std::move(MIWP));
+  MPM.addPass(std::move(MIWP));
 
-    // Delete anything that is now dead to make sure that we don't instrument
-    // dead code. Instrumentation can end up keeping dead code around and
-    // dramatically increase code size.
-    MPM.addPass(GlobalDCEPass());
-  }
+  // Delete anything that is now dead to make sure that we don't instrument
+  // dead code. Instrumentation can end up keeping dead code around and
+  // dramatically increase code size.
+  MPM.addPass(GlobalDCEPass());
+}
+
+void PassBuilder::addPGOInstrPasses(ModulePassManager &MPM,
+                                    OptimizationLevel Level, bool RunProfileGen,
+                                    bool IsCS, bool AtomicCounterUpdate,
+                                    std::string ProfileFile,
+                                    std::string ProfileRemappingFile,
+                                    IntrusiveRefCntPtr<vfs::FileSystem> FS) {
+  assert(Level != OptimizationLevel::O0 && "Not expecting O0 here!");
 
   if (!RunProfileGen) {
     assert(!ProfileFile.empty() && "Profile use expecting a profile file!");
@@ -1104,6 +1109,12 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
   MPM.addPass(createModuleToFunctionPassAdaptor(std::move(GlobalCleanupPM),
                                                 PTO.EagerlyInvalidateAnalyses));
 
+  // Invoke the pre-inliner passes for instrumentation PGO or MemProf.
+  if (PGOOpt && Phase != ThinOrFullLTOPhase::ThinLTOPostLink &&
+      (PGOOpt->Action == PGOOptions::IRInstr ||
+       PGOOpt->Action == PGOOptions::IRUse || !PGOOpt->MemoryProfile.empty()))
+    addPreInlinerPasses(MPM, Level, Phase);
+
   // Add all the requested passes for instrumentation PGO, if requested.
   if (PGOOpt && Phase != ThinOrFullLTOPhase::ThinLTOPostLink &&
       (PGOOpt->Action == PGOOptions::IRInstr ||
@@ -1111,7 +1122,7 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
     addPGOInstrPasses(MPM, Level,
                       /*RunProfileGen=*/PGOOpt->Action == PGOOptions::IRInstr,
                       /*IsCS=*/false, PGOOpt->AtomicCounterUpdate,
-                      PGOOpt->ProfileFile, PGOOpt->ProfileRemappingFile, Phase,
+                      PGOOpt->ProfileFile, PGOOpt->ProfileRemappingFile,
                       PGOOpt->FS);
     MPM.addPass(PGOIndirectCallPromotion(false, false));
   }
@@ -1332,12 +1343,12 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
       addPGOInstrPasses(MPM, Level, /*RunProfileGen=*/true,
                         /*IsCS=*/true, PGOOpt->AtomicCounterUpdate,
                         PGOOpt->CSProfileGenFile, PGOOpt->ProfileRemappingFile,
-                        LTOPhase, PGOOpt->FS);
+                        PGOOpt->FS);
     else if (PGOOpt->CSAction == PGOOptions::CSIRUse)
       addPGOInstrPasses(MPM, Level, /*RunProfileGen=*/false,
                         /*IsCS=*/true, PGOOpt->AtomicCounterUpdate,
                         PGOOpt->ProfileFile, PGOOpt->ProfileRemappingFile,
-                        LTOPhase, PGOOpt->FS);
+                        PGOOpt->FS);
   }
 
   // Re-compute GlobalsAA here prior to function passes. This is particularly
@@ -1831,12 +1842,12 @@ PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level,
       addPGOInstrPasses(MPM, Level, /*RunProfileGen=*/true,
                         /*IsCS=*/true, PGOOpt->AtomicCounterUpdate,
                         PGOOpt->CSProfileGenFile, PGOOpt->ProfileRemappingFile,
-                        ThinOrFullLTOPhase::FullLTOPostLink, PGOOpt->FS);
+                        PGOOpt->FS);
     else if (PGOOpt->CSAction == PGOOptions::CSIRUse)
       addPGOInstrPasses(MPM, Level, /*RunProfileGen=*/false,
                         /*IsCS=*/true, PGOOpt->AtomicCounterUpdate,
                         PGOOpt->ProfileFile, PGOOpt->ProfileRemappingFile,
-                        ThinOrFullLTOPhase::FullLTOPostLink, PGOOpt->FS);
+                        PGOOpt->FS);
   }
 
   // Break up allocas
diff --git a/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp b/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
index 2b29ea2a65fdc87..eb9866727deef59 100644
--- a/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
+++ b/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
@@ -808,19 +808,19 @@ static void readMemprof(Module &M, Function &F,
         auto CalleeGUID = Function::getGUID(Name);
         auto StackId = computeStackId(CalleeGUID, GetOffset(DIL),
                                       ProfileHasColumns ? DIL->getColumn() : 0);
-        // LeafFound will only be false on the first iteration, since we either
-        // set it true or break out of the loop below.
+        // Check if we have found the profile's leaf frame. If yes, collect
+        // the rest of the call's inlined context starting here. If not, see if
+        // we find a match further up the inlined context (in case the profile
+        // was missing debug frames at the leaf).
         if (!LeafFound) {
           AllocInfoIter = LocHashToAllocInfo.find(StackId);
           CallSitesIter = LocHashToCallSites.find(StackId);
-          // Check if the leaf is in one of the maps. If not, no need to look
-          // further at this call.
-          if (AllocInfoIter == LocHashToAllocInfo.end() &&
-              CallSitesIter == LocHashToCallSites.end())
-            break;
-          LeafFound = true;
+          if (AllocInfoIter != LocHashToAllocInfo.end() ||
+              CallSitesIter != LocHashToCallSites.end())
+            LeafFound = true;
         }
-        InlinedCallStack.push_back(StackId);
+        if (LeafFound)
+          InlinedCallStack.push_back(StackId);
       }
       // If leaf not in either of the maps, skip inst.
       if (!LeafFound)
diff --git a/llvm/test/Other/new-pm-memprof.ll b/llvm/test/Other/new-pm-memprof.ll
new file mode 100644
index 000000000000000..c98a1fdd35d8d5e
--- /dev/null
+++ b/llvm/test/Other/new-pm-memprof.ll
@@ -0,0 +1,12 @@
+;; Ensure we invoke the preinliner when feeding back a memprof profile.
+
+;; The opt invocation will fail as the profdata file is empty, which is fine
+;; since we are simply testing the pass pipeline below.
+; RUN: not opt -debug-pass-manager -passes='default<O2>' -memory-profile-file=/dev/null %s 2>&1 | FileCheck %s
+
+; CHECK: Running pass: InlinerPass on (foo)
+; CHECK: Running pass: MemProfUsePass
+
+define void @foo() {
+  ret void
+}
diff --git a/llvm/test/Transforms/PGOProfile/Inputs/memprof_missing_leaf.exe b/llvm/test/Transforms/PGOProfile/Inputs/memprof_missing_leaf.exe
new file mode 100755
index 000000000000000..212f8f8ecce7668
Binary files /dev/null and b/llvm/test/Transforms/PGOProfile/Inputs/memprof_missing_leaf.exe differ
diff --git a/llvm/test/Transforms/PGOProfile/Inputs/memprof_missing_leaf.memprofraw b/llvm/test/Transforms/PGOProfile/Inputs/memprof_missing_leaf.memprofraw
new file mode 100644
index 000000000000000..3a06639d3a2bec8
Binary files /dev/null and b/llvm/test/Transforms/PGOProfile/Inputs/memprof_missing_leaf.memprofraw differ
diff --git a/llvm/test/Transforms/PGOProfile/Inputs/update_memprof_inputs.sh b/llvm/test/Transforms/PGOProfile/Inputs/update_memprof_inputs.sh
index cbf9a401a607235..417c7c4ecc58a6a 100755
--- a/llvm/test/Transforms/PGOProfile/Inputs/update_memprof_inputs.sh
+++ b/llvm/test/Transforms/PGOProfile/Inputs/update_memprof_inputs.sh
@@ -94,3 +94,34 @@ ${LLVMPROFDATA} merge --text ${OUTDIR}/memprof_pgo.profraw -o ${OUTDIR}/memprof_
 rm ${OUTDIR}/memprof.cc
 rm ${OUTDIR}/pgo.exe
 rm ${OUTDIR}/memprof_pgo.profraw
+
+# Use musttail to simulate a missing leaf debug frame in the profiled binary.
+# Note that changes in the code below which affect relative line number
+# offsets of calls from their parent function can affect callsite matching in
+# the LLVM IR.
+cat > ${OUTDIR}/memprof_missing_leaf.cc << EOF
+#include <new>
+#ifndef USE_MUSTTAIL
+#define USE_MUSTTAIL 0
+#endif
+
+// clang::musttail requires that the argument signature matches that of the caller.
+void *bar(std::size_t s) {
+#if USE_MUSTTAIL
+  [[clang::musttail]] return ::operator new (s);
+#else
+  return new char[s];
+#endif
+}
+
+int main() {
+  char *a = (char *)bar(1);
+  delete a;
+  return 0;
+}
+EOF
+
+${CLANG} ${COMMON_FLAGS} -fmemory-profile -DUSE_MUSTTAIL=1 ${OUTDIR}/memprof_missing_leaf.cc -o ${OUTDIR}/memprof_missing_leaf.exe
+env MEMPROF_OPTIONS=log_path=stdout ${OUTDIR}/memprof_missing_leaf.exe > ${OUTDIR}/memprof_missing_leaf.memprofraw
+
+rm ${OUTDIR}/memprof_missing_leaf.cc
diff --git a/llvm/test/Transforms/PGOProfile/memprof_missing_leaf.ll b/llvm/test/Transforms/PGOProfile/memprof_missing_leaf.ll
new file mode 100644
index 000000000000000..0ff98b310014afb
--- /dev/null
+++ b/llvm/test/Transforms/PGOProfile/memprof_missing_leaf.ll
@@ -0,0 +1,92 @@
+;; Tests memprof profile matching when the leaf frame is missing in the
+;; profile. In this case the call to operator new was inlined before
+;; matching and we are able to match the next call frame up the inlined
+;; context.
+
+;; Avoid failures on big-endian systems that can't read the profile properly
+; REQUIRES: x86_64-linux
+
+;; TODO: Use text profile inputs once that is available for memprof.
+;; # To update the Inputs below, run Inputs/update_memprof_inputs.sh.
+;; # To generate below LLVM IR for use in matching:
+;; $ clang++ -gmlt -fdebug-info-for-profiling -S memprof_missing_leaf.cc \
+;; 	-O2 -mllvm -disable-llvm-optzns -flto
+;; $ opt < memprof_missing_leaf.s -passes='cgscc(inline)' -S >memprof_missing_leaf.ll
+
+;; $ clang++ -gmlt -fdebug-info-for-profiling -fno-omit-frame-pointer \
+;;      -fno-optimize-sibling-calls memprof.cc -S -emit-llvm
+
+; RUN: llvm-profdata merge %S/Inputs/memprof_missing_leaf.memprofraw --profiled-binary %S/Inputs/memprof_missing_leaf.exe -o %t.memprofdata
+; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -S | FileCheck %s
+
+; CHECK: call {{.*}} @_Znam{{.*}} #[[ATTR:[0-9]+]]
+; CHECK: attributes #[[ATTR]] = {{.*}} "memprof"="notcold"
+
+; ModuleID = '<stdin>'
+source_filename = "memprof_missing_leaf.cc"
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+; Function Attrs: nobuiltin allocsize(0)
+declare noundef nonnull ptr @_Znam(i64 noundef) #0
+
+; Function Attrs: mustprogress norecurse uwtable
+define dso_local noundef i32 @main() #1 !dbg !8 {
+entry:
+  %s.addr.i = alloca i64, align 8
+  %retval = alloca i32, align 4
+  %a = alloca ptr, align 8
+  store i32 0, ptr %retval, align 4
+  store i64 1, ptr %s.addr.i, align 8, !tbaa !11
+  %0 = load i64, ptr %s.addr.i, align 8, !dbg !15, !tbaa !11
+  %call.i = call noalias noundef nonnull ptr @_Znam(i64 noundef %0) #3, !dbg !18
+  store ptr %call.i, ptr %a, align 8, !dbg !19, !tbaa !20
+  %1 = load ptr, ptr %a, align 8, !dbg !22, !tbaa !20
+  %isnull = icmp eq ptr %1, null, !dbg !23
+  br i1 %isnull, label %delete.end, label %delete.notnull, !dbg !23
+
+delete.notnull:                                   ; preds = %entry
+  call void @_ZdlPv(ptr noundef %1) #4, !dbg !23
+  br label %delete.end, !dbg !23
+
+delete.end:                                       ; preds = %delete.notnull, %entry
+  ret i32 0, !dbg !24
+}
+
+; Function Attrs: nobuiltin nounwind
+declare void @_ZdlPv(ptr noundef) #2
+
+attributes #0 = { nobuiltin allocsize(0) "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
+attributes #1 = { mustprogress norecurse uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
+attributes #2 = { nobuiltin nounwind "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
+attributes #3 = { builtin allocsize(0) }
+attributes #4 = { builtin nounwind }
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3, !4, !5, !6, !7}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1, producer: "clang version 18.0.0 ([email protected]:llvm/llvm-project.git 71bf052ec90e77cb4aa66505d47cbc4b6016ac1d)", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly, splitDebugInlining: false, debugInfoForProfiling: true, nameTableKind: None)
+!1 = !DIFile(filename: "memprof_missing_leaf.cc", directory: ".", checksumkind: CSK_MD5, checksum: "f1445a8699406a6b826128704d257677")
+!2 = !{i32 7, !"Dwarf Version", i32 5}
+!3 = !{i32 2, !"Debug Info Version", i32 3}
+!4 = !{i32 1, !"wchar_size", i32 4}
+!5 = !{i32 8, !"PIC Level", i32 2}
+!6 = !{i32 7, !"PIE Level", i32 2}
+!7 = !{i32 7, !"uwtable", i32 2}
+!8 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 15, type: !9, scopeLine: 15, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0)
+!9 = !DISubroutineType(types: !10)
+!10 = !{}
+!11 = !{!12, !12, i64 0}
+!12 = !{!"long", !13, i64 0}
+!13 = !{!"omnipotent char", !14, i64 0}
+!14 = !{!"Simple C++ TBAA"}
+!15 = !DILocation(line: 11, column: 19, scope: !16, inlinedAt: !17)
+!16 = distinct !DISubprogram(name: "bar", linkageName: "_Z3barm", scope: !1, file: !1, line: 7, type: !9, scopeLine: 7, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0)
+!17 = distinct !DILocation(line: 16, column: 21, scope: !8)
+!18 = !DILocation(line: 11, column: 10, scope: !16, inlinedAt: !17)
+!19 = !DILocation(line: 16, column: 9, scope: !8)
+!20 = !{!21, !21, i64 0}
+!21 = !{!"any pointer", !13, i64 0}
+!22 = !DILocation(line: 17, column: 10, scope: !8)
+!23 = !DILocation(line: 17, column: 3, scope: !8)
+!24 = !DILocation(line: 18, column: 3, scope: !8)

;; # To update the Inputs below, run Inputs/update_memprof_inputs.sh.
;; # To generate below LLVM IR for use in matching:
;; $ clang++ -gmlt -fdebug-info-for-profiling -S memprof_missing_leaf.cc \
;; -O2 -mllvm -disable-llvm-optzns -flto
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you comment on why we need "-disable-llvm-optzns". It seems strange that it follows the O2 option.
Also is there a difference here for flto vs flto=thin?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was using -flto just to get it to emit the LLVM assembly, but I switched to -emit-llvm instead. I thought I needed -mllvm -disable-llvm-optzns to prevent dead code elimination of my allocation, but I just tried without and don't need that, so I have cleaned that up

Copy link
Contributor

@snehasish snehasish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@teresajohnson teresajohnson merged commit 87f5e22 into llvm:main Nov 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:transforms PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants