Skip to content

[Memprof] Adds instrumentation support for memprof with histograms. #100834

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

mattweingarten
Copy link
Contributor

@mattweingarten mattweingarten commented Jul 26, 2024

This patch allows running -fmemory-profile without the flag -memprof-use-callbacks, meaning the RecordAccessesHistogram is injected into IR as a sequence of instructions. This significantly increases performance of the instrumented binary.

@llvmbot llvmbot added compiler-rt PGO Profile Guided Optimizations llvm:transforms labels Jul 26, 2024
@llvmbot
Copy link
Member

llvmbot commented Jul 26, 2024

@llvm/pr-subscribers-pgo

@llvm/pr-subscribers-llvm-transforms

Author: Matthew Weingarten (mattweingarten)

Changes

This patch allows running -fmemory-profile without the flag -memprof-use-callbacks, meaning the RecordAccessesHistogram is injected into IR as a sequence of instructions. This significantly increases performance of the instrumented binary.

Changes the HISTOGRAM_GRANULARITY from 8U to 8ULL to correct the shadow address translation.


Full diff: https://github.com/llvm/llvm-project/pull/100834.diff

2 Files Affected:

  • (modified) compiler-rt/lib/memprof/memprof_mapping.h (+1-1)
  • (modified) llvm/lib/Transforms/Instrumentation/MemProfiler.cpp (+37-15)
diff --git a/compiler-rt/lib/memprof/memprof_mapping.h b/compiler-rt/lib/memprof/memprof_mapping.h
index fef8acfcfc921..6da385ab3d6e2 100644
--- a/compiler-rt/lib/memprof/memprof_mapping.h
+++ b/compiler-rt/lib/memprof/memprof_mapping.h
@@ -55,7 +55,7 @@ extern uptr kHighMemEnd; // Initialized in __memprof_init.
 // computed by summing up all individual 1 byte counters. This can incur an
 // accuracy penalty.
 
-#define HISTOGRAM_GRANULARITY 8U
+#define HISTOGRAM_GRANULARITY 8ULL
 
 #define HISTOGRAM_MAX_COUNTER 255U
 
diff --git a/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp b/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
index 2c5d749d4a67a..0e4d21671d2a7 100644
--- a/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
+++ b/llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
@@ -61,6 +61,9 @@ constexpr int LLVM_MEM_PROFILER_VERSION = 1;
 // Size of memory mapped to a single shadow location.
 constexpr uint64_t DefaultMemGranularity = 64;
 
+// Size of memory mapped to a single histogram bucket.
+constexpr uint64_t HistogramGranularity = 8;
+
 // Scale from granularity down to shadow size.
 constexpr uint64_t DefaultShadowScale = 3;
 
@@ -192,7 +195,8 @@ namespace {
 struct ShadowMapping {
   ShadowMapping() {
     Scale = ClMappingScale;
-    Granularity = ClMappingGranularity;
+    // Histogram Granularity is always 8U for now.
+    Granularity = ClHistogram ? HistogramGranularity : ClMappingGranularity;
     Mask = ~(Granularity - 1);
   }
 
@@ -288,10 +292,6 @@ ModuleMemProfilerPass::ModuleMemProfilerPass() = default;
 PreservedAnalyses ModuleMemProfilerPass::run(Module &M,
                                              AnalysisManager<Module> &AM) {
 
-  assert((!ClHistogram || (ClHistogram && ClUseCalls)) &&
-         "Cannot use -memprof-histogram without Callbacks. Set "
-         "memprof-use-callbacks");
-
   ModuleMemProfiler Profiler(M);
   if (Profiler.instrumentModule(M))
     return PreservedAnalyses::none();
@@ -489,16 +489,38 @@ void MemProfiler::instrumentAddress(Instruction *OrigIns,
     return;
   }
 
-  // Create an inline sequence to compute shadow location, and increment the
-  // value by one.
-  Type *ShadowTy = Type::getInt64Ty(*C);
-  Type *ShadowPtrTy = PointerType::get(ShadowTy, 0);
-  Value *ShadowPtr = memToShadow(AddrLong, IRB);
-  Value *ShadowAddr = IRB.CreateIntToPtr(ShadowPtr, ShadowPtrTy);
-  Value *ShadowValue = IRB.CreateLoad(ShadowTy, ShadowAddr);
-  Value *Inc = ConstantInt::get(Type::getInt64Ty(*C), 1);
-  ShadowValue = IRB.CreateAdd(ShadowValue, Inc);
-  IRB.CreateStore(ShadowValue, ShadowAddr);
+  if (ClHistogram) {
+    Type *ShadowTy = Type::getInt8Ty(*C);
+    Type *ShadowPtrTy = PointerType::get(ShadowTy, 0);
+    Value *Shadow = IRB.CreateAnd(AddrLong, Mapping.Mask);
+    Shadow = IRB.CreateLShr(Shadow, Mapping.Scale);
+    assert(DynamicShadowOffset);
+    Value *ShadowInteger = IRB.CreateAdd(Shadow, DynamicShadowOffset);
+
+    Value *ShadowAddr = IRB.CreateIntToPtr(ShadowInteger, ShadowPtrTy);
+    Value *ShadowValue = IRB.CreateLoad(ShadowTy, ShadowAddr);
+    Value *MaxCount = ConstantInt::get(Type::getInt8Ty(*C), 255);
+    Value *Cmp = IRB.CreateICmpULT(ShadowValue, MaxCount);
+    // Insert Basic block that increments histogram bucket if value is less
+    // than 255.
+    Instruction *IncBlock =
+        SplitBlockAndInsertIfThen(Cmp, InsertBefore, /*Unreachable=*/false);
+    IRB.SetInsertPoint(IncBlock);
+    Value *Inc = ConstantInt::get(Type::getInt8Ty(*C), 1);
+    ShadowValue = IRB.CreateAdd(ShadowValue, Inc);
+    IRB.CreateStore(ShadowValue, ShadowAddr);
+  } else {
+    // Create an inline sequence to compute shadow location, and increment the
+    // value by one.
+    Type *ShadowTy = Type::getInt64Ty(*C);
+    Type *ShadowPtrTy = PointerType::get(ShadowTy, 0);
+    Value *ShadowPtr = memToShadow(AddrLong, IRB);
+    Value *ShadowAddr = IRB.CreateIntToPtr(ShadowPtr, ShadowPtrTy);
+    Value *ShadowValue = IRB.CreateLoad(ShadowTy, ShadowAddr);
+    Value *Inc = ConstantInt::get(Type::getInt64Ty(*C), 1);
+    ShadowValue = IRB.CreateAdd(ShadowValue, Inc);
+    IRB.CreateStore(ShadowValue, ShadowAddr);
+  }
 }
 
 // Create the variable for the profile file name.

Copy link
Contributor

@teresajohnson teresajohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a test. E.g. see the instrumentation IR checking in llvm/test/Instrumentation/HeapProfiler/basic.ll.

@@ -55,7 +55,7 @@ extern uptr kHighMemEnd; // Initialized in __memprof_init.
// computed by summing up all individual 1 byte counters. This can incur an
// accuracy penalty.

#define HISTOGRAM_GRANULARITY 8U
#define HISTOGRAM_GRANULARITY 8ULL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably go in as a separate PR since I think it affects both versions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to PR: #100949.

Changed commit comment as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will keep the change in this PR for now until the other PR is merge, just so tests do not fail, but remove it before pushing.

@@ -192,7 +195,8 @@ namespace {
struct ShadowMapping {
ShadowMapping() {
Scale = ClMappingScale;
Granularity = ClMappingGranularity;
// Histogram Granularity is always 8U for now.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the relevance of this comment

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh are you explaining why it is not a cl::opt? That's sort of unclear

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that was what I was referring to. Removed comment since its not necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an assert then if the user specifies both ClHistogram and ClMappingGranularity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, done.

Value *Shadow = IRB.CreateAnd(AddrLong, Mapping.Mask);
Shadow = IRB.CreateLShr(Shadow, Mapping.Scale);
assert(DynamicShadowOffset);
Value *ShadowInteger = IRB.CreateAdd(Shadow, DynamicShadowOffset);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't the above 4 lines just reuse memToShadow?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, there is a lot of common stuff between the histogram and non-histogram cases. Can you just make a few pieces of it condition, such as the selection of the ShadowTy and the conditional handling?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, fixed.

@mattweingarten
Copy link
Contributor Author

Needs a test. E.g. see the instrumentation IR checking in llvm/test/Instrumentation/HeapProfiler/basic.ll.

Added basic-histogram.ll for testing

Copy link
Contributor

@teresajohnson teresajohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with assert suggested in my last comment.

@mattweingarten mattweingarten force-pushed the MemprofHistogramInstrumentation branch from 07a5c65 to bfaf9c9 Compare July 29, 2024 19:49
…grams.

This patch allows running `-fmemory-profile` without `-memprof-use-callbacks`. This significantly increases
performance of the instrumented binary.

Additionally, changes the `HISTOGRAM_GRANULARITY` from 8U to 8ULL to correct shadow address translation.
@mattweingarten mattweingarten force-pushed the MemprofHistogramInstrumentation branch from 56d14e1 to 32e8c99 Compare July 29, 2024 23:08
@teresajohnson teresajohnson merged commit 17993eb into llvm:main Jul 29, 2024
4 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler-rt llvm:transforms PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants