Skip to content

[MC][profgen][NFC] Expand auto for MCDecodedPseudoProbe #102788

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

aaupov
Copy link
Contributor

@aaupov aaupov commented Aug 11, 2024

Expand autos in select places in preparation to #102789.

aaupov added 2 commits August 10, 2024 21:33
Created using spr 1.3.4
@llvmbot llvmbot added mc Machine (object) code PGO Profile Guided Optimizations labels Aug 11, 2024
@llvmbot
Copy link
Member

llvmbot commented Aug 11, 2024

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-pgo

Author: Amir Ayupov (aaupov)

Changes

Expand autos in select places to trigger type casts.


Full diff: https://github.com/llvm/llvm-project/pull/102788.diff

2 Files Affected:

  • (modified) llvm/lib/MC/MCPseudoProbe.cpp (+2-2)
  • (modified) llvm/tools/llvm-profgen/ProfileGenerator.cpp (+1-1)
diff --git a/llvm/lib/MC/MCPseudoProbe.cpp b/llvm/lib/MC/MCPseudoProbe.cpp
index bb9538d55b7aa..e9e391c8eadd7 100644
--- a/llvm/lib/MC/MCPseudoProbe.cpp
+++ b/llvm/lib/MC/MCPseudoProbe.cpp
@@ -630,7 +630,7 @@ void MCPseudoProbeDecoder::printProbeForAddress(raw_ostream &OS,
                                                 uint64_t Address) {
   auto It = Address2ProbesMap.find(Address);
   if (It != Address2ProbesMap.end()) {
-    for (auto &Probe : It->second) {
+    for (const MCDecodedPseudoProbe &Probe : It->second) {
       OS << " [Probe]:\t";
       Probe.print(OS, GUID2FuncDescMap, true);
     }
@@ -657,7 +657,7 @@ MCPseudoProbeDecoder::getCallProbeForAddr(uint64_t Address) const {
   const auto &Probes = It->second;
 
   const MCDecodedPseudoProbe *CallProbe = nullptr;
-  for (const auto &Probe : Probes) {
+  for (const MCDecodedPseudoProbe &Probe : Probes) {
     if (Probe.isCall()) {
       // Disabling the assert and returning first call probe seen so far.
       // Subsequent call probes, if any, are ignored. Due to the the way
diff --git a/llvm/tools/llvm-profgen/ProfileGenerator.cpp b/llvm/tools/llvm-profgen/ProfileGenerator.cpp
index 175556c2220e6..c840e7789edfe 100644
--- a/llvm/tools/llvm-profgen/ProfileGenerator.cpp
+++ b/llvm/tools/llvm-profgen/ProfileGenerator.cpp
@@ -1194,7 +1194,7 @@ void ProfileGeneratorBase::extractProbesFromRange(
           Binary->getAddress2ProbesMap();
       auto It = Address2ProbesMap.find(IP.Address);
       if (It != Address2ProbesMap.end()) {
-        for (const auto &Probe : It->second) {
+        for (const MCDecodedPseudoProbe &Probe : It->second) {
           ProbeCounter[&Probe] += Count;
         }
       }

@@ -657,7 +657,7 @@ MCPseudoProbeDecoder::getCallProbeForAddr(uint64_t Address) const {
const auto &Probes = It->second;

const MCDecodedPseudoProbe *CallProbe = nullptr;
for (const auto &Probe : Probes) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expand autos in select places to trigger type casts.

Can you explain? auto is a syntax sugar. Probes is of type std::list<MCDecodedPseudoProbe>, so the deduced type for auto is MCDecodedPseudoProbe. What type extra casting do you expect by changing from auto to explicit type?

Copy link
Contributor Author

@aaupov aaupov Aug 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #102789, Probes (when accessed via Address2ProbesMap) become std::vector<std::reference_wrapper<MCDecodedPseudoProbe>>.
auto would expand to a reference_wrapper, so to access the underlying pseudo probe, it would need explicit get. To avoid that, we can use reference_wrapper::operator T& which is functionally identical to get. This is achieved by providing MCDecodedPseudoProbe & as type of a range-based variable.

https://en.cppreference.com/w/cpp/utility/functional/reference_wrapper/get

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then this change should be part of #102789, and should not be separated. Otherwise the motivation and effect of this change is lost, because the base side of this change still use std::list, and the change description stating "trigger type casts" can be confusing.

Suggest you merge it back (or at least fix change description).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then this change should be part of #102789, and should not be separated. Otherwise the motivation and effect of this change is lost, because the base side of this change still use std::list, and the change description stating "trigger type casts" can be confusing.

Suggest you merge it back (or at least fix change description).

I intended to minimize the changes in #102789 and make all preparation changes small, isolated, and verifiably NFC. I've changed the description, hope it makes sense now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks.

I intended to minimize the changes in #102789 and make all preparation changes small, isolated, and verifiably NFC.

You might have gone a bit too far. :) And this isn't the first instance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but still I though it's worth clarifying. Generally the purpose of splitting a patch isn't to strip things into individual patches as much as possible, but to help reviewers: 1) to make sure a patch is manageable in size for review; 2) group changes with logical connection/separation (components/steps). The other change isn't that big anyways, and evidently splitting here actually make it more difficult for reviewers instead of helping them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for clarifying. I'm used to operate under the assumption of extremely limited reviewer time, hence splitting into minimal/standalone changes. If you're fine with slightly larger and more loosely connected changes, that's also good with me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your assumption is right, and you do what's needed to help reviewer. More often than not splitting helps, but not always. It's not about personal preference, but what helps reviewer.

@aaupov aaupov changed the base branch from users/aaupov/spr/main.mcprofgennfc-expand-auto-for-mcdecodedpseudoprobe to main August 11, 2024 06:46
@aaupov aaupov changed the base branch from main to users/aaupov/spr/main.mcprofgennfc-expand-auto-for-mcdecodedpseudoprobe August 11, 2024 06:47
aaupov added 2 commits August 10, 2024 23:47
Created using spr 1.3.4

[skip ci]
Created using spr 1.3.4
@aaupov aaupov changed the base branch from users/aaupov/spr/main.mcprofgennfc-expand-auto-for-mcdecodedpseudoprobe to main August 11, 2024 06:48
@aaupov aaupov merged commit cd15d12 into main Aug 11, 2024
6 of 9 checks passed
@aaupov aaupov deleted the users/aaupov/spr/mcprofgennfc-expand-auto-for-mcdecodedpseudoprobe branch August 11, 2024 06:48
aaupov added a commit that referenced this pull request Aug 26, 2024
…unction records

Use #102774 to allocate storage for decoded probes (`PseudoProbeVec`)
and function records (`InlineTreeVec`).

Leverage that to also shrink sizes of `MCDecodedPseudoProbe`:
- Drop Guid since it's accessible via `InlineTree`.

`MCDecodedPseudoProbeInlineTree`:
- Keep track of probes and inlinees using `ArrayRef`s now that probes
  and function records belonging to the same function are allocated
  contiguously.

This reduces peak RSS from 13.7 GiB to 9.7 GiB and pseudo probe parsing
time (as part of perf2bolt) from 15.3s to 9.6s for a large binary with
400MiB .pseudo_probe section containing 43M probes and 25M function
records.

Depends on:
#102774
#102787
#102788

Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm

Reviewed By: wlei-llvm

Pull Request: #102789
aaupov added a commit that referenced this pull request Sep 4, 2024
…unction records

Use #102774 to allocate storage for decoded probes (`PseudoProbeVec`)
and function records (`InlineTreeVec`).

Leverage that to also shrink sizes of `MCDecodedPseudoProbe`:
- Drop Guid since it's accessible via `InlineTree`.

`MCDecodedPseudoProbeInlineTree`:
- Keep track of probes and inlinees using `ArrayRef`s now that probes
  and function records belonging to the same function are allocated
  contiguously.

This reduces peak RSS from 13.7 GiB to 9.7 GiB and pseudo probe parsing
time (as part of perf2bolt) from 15.3s to 9.6s for a large binary with
400MiB .pseudo_probe section containing 43M probes and 25M function
records.

Depends on:
#102774
#102787
#102788

Reviewers: maksfb, rafaelauler, dcci, ayermolo, wlei-llvm

Reviewed By: wlei-llvm

Pull Request: #102789
aaupov added a commit that referenced this pull request Sep 4, 2024
Expand autos in select places in preparation to #102789.

Reviewers: dcci, maksfb, WenleiHe, rafaelauler, ayermolo, wlei-llvm

Reviewed By: WenleiHe, wlei-llvm

Pull Request: #102788
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mc Machine (object) code PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants