Skip to content

[RFC][MC] Cache MCRegAliasIterator #93510

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 14, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
65 changes: 19 additions & 46 deletions llvm/include/llvm/MC/MCRegisterInfo.h
Original file line number Diff line number Diff line change
Expand Up @@ -187,6 +187,9 @@ class MCRegisterInfo {
DenseMap<MCRegister, int> L2SEHRegs; // LLVM to SEH regs mapping
DenseMap<MCRegister, int> L2CVRegs; // LLVM to CV regs mapping

mutable std::vector<std::vector<MCPhysReg>> RegAliasesCache;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: I wanted to use a vector of unique_ptr but it makes it too awkward to use. I can't even use reserve in that case, so I didn't do that indirection in the end.

The vector is resized exactly once upon init, and each vector is 3 pointers, so for AMDGPU we have 9000*24 bytes so about 216KB for this map, which is worth it IMO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can reduce the size by using SmallVector<, 0> instead of std::vector. (Using SmallVector is generally always preferred in LLVM, even if you don't use the inline storage.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use std::optional in place of unique_ptr.
Sentinel looks fine, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikic SmallVector is 32 bytes, it has one more pointer for some reason.
Though it means I can use SmallVector<MCPhysReg, 4> and store small alias sets inline, so I think it's worth the size increase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but SmallVector doesn't have a shrink_to_fit, so it may end up using quite a bit more memory. I don't think it's worth it then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikic SmallVector is 32 bytes, it has one more pointer for some reason.

That doesn't sound right. SmallVector should be 16 bytes large if the element type is 4 bytes... uh, I just saw that MCPhyReg is only 2 bytes, which increases SmallVector size to 24 bytes. And of course there's no way to disable that heuristic. So yeah, SmallVector won't make things smaller here (but it also shouldn't be 32 bytes either...)

Though it means I can use SmallVector<MCPhysReg, 4> and store small alias sets inline, so I think it's worth the size increase.

But this also means that you allocate those 4 inline elements for all of the registers that are not actually queried. It's probably not worthwhile in this context.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SmallVector should be 16 bytes large if the element type is 4 bytes... uh, I just saw that MCPhyReg is only 2 bytes, which increases SmallVector size to 24 bytes.

Why does it get bigger when the element type gets smaller? Is it worth fixing that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because SmallVector<char> sometimes needs to hold more than 4GB of data. Use of 64-bit sizes should probably be limited to just that case and not also extend to SmallVector<uint16_t>...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArrayRef<MCPhysReg> getCachedAliasesOf(MCPhysReg R) const;

/// Iterator class that can traverse the differentially encoded values in
/// DiffLists. Don't use this class directly, use one of the adaptors below.
class DiffListIterator
Expand Down Expand Up @@ -263,6 +266,7 @@ class MCRegisterInfo {
friend class MCRegUnitIterator;
friend class MCRegUnitMaskIterator;
friend class MCRegUnitRootIterator;
friend class MCRegAliasIterator;

/// Initialize MCRegisterInfo, called by TableGen
/// auto-generated routines. *DO NOT USE*.
Expand Down Expand Up @@ -298,6 +302,8 @@ class MCRegisterInfo {
EHDwarf2LRegsSize = 0;
Dwarf2LRegs = nullptr;
Dwarf2LRegsSize = 0;

RegAliasesCache.resize(NumRegs);
}

/// Used to initialize LLVM register to Dwarf
Expand Down Expand Up @@ -723,63 +729,30 @@ class MCRegUnitRootIterator {
}
};

/// MCRegAliasIterator enumerates all registers aliasing Reg. If IncludeSelf is
/// set, Reg itself is included in the list. This iterator does not guarantee
/// any ordering or that entries are unique.
/// MCRegAliasIterator enumerates all registers aliasing Reg.
class MCRegAliasIterator {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if I should move the old implementation to a MCRegAliasIteratorUncached class, and then make MCRegAliasIterator the default cached implementation.

It makes things a bit cleaner (and makes the caching system easier to remove someday), but makes it very difficult to add a cache "killswitch" if we feel like one is necessary (like if we don't want caching on all targets).

private:
MCRegister Reg;
const MCRegisterInfo *MCRI;
bool IncludeSelf;

MCRegUnitIterator RI;
MCRegUnitRootIterator RRI;
MCSuperRegIterator SI;
const MCPhysReg *It = nullptr;
const MCPhysReg *End = nullptr;

public:
MCRegAliasIterator(MCRegister Reg, const MCRegisterInfo *MCRI,
bool IncludeSelf)
: Reg(Reg), MCRI(MCRI), IncludeSelf(IncludeSelf) {
// Initialize the iterators.
for (RI = MCRegUnitIterator(Reg, MCRI); RI.isValid(); ++RI) {
for (RRI = MCRegUnitRootIterator(*RI, MCRI); RRI.isValid(); ++RRI) {
for (SI = MCSuperRegIterator(*RRI, MCRI, true); SI.isValid(); ++SI) {
if (!(!IncludeSelf && Reg == *SI))
return;
}
}
}
}

bool isValid() const { return RI.isValid(); }

MCRegister operator*() const {
assert(SI.isValid() && "Cannot dereference an invalid iterator.");
return *SI;
bool IncludeSelf) {
ArrayRef<MCPhysReg> Cache = MCRI->getCachedAliasesOf(Reg);
assert(Cache.back() == Reg);
It = Cache.begin();
End = Cache.end();
if (!IncludeSelf)
--End;
}

void advance() {
// Assuming SI is valid.
++SI;
if (SI.isValid()) return;

++RRI;
if (RRI.isValid()) {
SI = MCSuperRegIterator(*RRI, MCRI, true);
return;
}
bool isValid() const { return It != End; }

++RI;
if (RI.isValid()) {
RRI = MCRegUnitRootIterator(*RI, MCRI);
SI = MCSuperRegIterator(*RRI, MCRI, true);
}
}
MCRegister operator*() const { return *It; }

MCRegAliasIterator &operator++() {
assert(isValid() && "Cannot move off the end of the list.");
do advance();
while (!IncludeSelf && isValid() && *SI == Reg);
++It;
return *this;
}
};
Expand Down
84 changes: 84 additions & 0 deletions llvm/lib/MC/MCRegisterInfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,90 @@

using namespace llvm;

namespace {
/// MCRegAliasIterator enumerates all registers aliasing Reg. This iterator
/// does not guarantee any ordering or that entries are unique.
class MCRegAliasIteratorImpl {
private:
MCRegister Reg;
const MCRegisterInfo *MCRI;

MCRegUnitIterator RI;
MCRegUnitRootIterator RRI;
MCSuperRegIterator SI;

public:
MCRegAliasIteratorImpl(MCRegister Reg, const MCRegisterInfo *MCRI)
: Reg(Reg), MCRI(MCRI) {

// Initialize the iterators.
for (RI = MCRegUnitIterator(Reg, MCRI); RI.isValid(); ++RI) {
for (RRI = MCRegUnitRootIterator(*RI, MCRI); RRI.isValid(); ++RRI) {
for (SI = MCSuperRegIterator(*RRI, MCRI, true); SI.isValid(); ++SI) {
if (Reg != *SI)
return;
}
}
}
}

bool isValid() const { return RI.isValid(); }

MCRegister operator*() const {
assert(SI.isValid() && "Cannot dereference an invalid iterator.");
return *SI;
}

void advance() {
// Assuming SI is valid.
++SI;
if (SI.isValid())
return;

++RRI;
if (RRI.isValid()) {
SI = MCSuperRegIterator(*RRI, MCRI, true);
return;
}

++RI;
if (RI.isValid()) {
RRI = MCRegUnitRootIterator(*RI, MCRI);
SI = MCSuperRegIterator(*RRI, MCRI, true);
}
}

MCRegAliasIteratorImpl &operator++() {
assert(isValid() && "Cannot move off the end of the list.");
do
advance();
while (isValid() && *SI == Reg);
return *this;
}
};
} // namespace

ArrayRef<MCPhysReg> MCRegisterInfo::getCachedAliasesOf(MCPhysReg R) const {
auto &Aliases = RegAliasesCache[R];
if (!Aliases.empty())
return Aliases;

for (MCRegAliasIteratorImpl It(R, this); It.isValid(); ++It)
Aliases.push_back(*It);

sort(Aliases);
Aliases.erase(unique(Aliases), Aliases.end());
assert(none_of(Aliases, [&](auto &Cur) { return R == Cur; }) &&
"MCRegAliasIteratorImpl includes Self!");

// Always put "self" at the end, so the iterator can choose to ignore it.
// For registers without aliases, it also serves as a sentinel value that
// tells us to not recompute the alias set.
Aliases.push_back(R);
Aliases.shrink_to_fit();
return Aliases;
}

MCRegister
MCRegisterInfo::getMatchingSuperReg(MCRegister Reg, unsigned SubIdx,
const MCRegisterClass *RC) const {
Expand Down
15 changes: 8 additions & 7 deletions llvm/test/CodeGen/ARM/constant-island-movwt.mir
Original file line number Diff line number Diff line change
Expand Up @@ -898,13 +898,14 @@ body: |
# CHECK-NEXT: CONSTPOOL_ENTRY 1, %const.0, 4
# CHECK-NEXT: {{^ $}}
# CHECK-NEXT: bb.2.entry (align 2):
# CHECK-NEXT: liveins: $d13, $s27, $r10, $r9, $r8, $s26, $d12, $s25, $s24,
# CHECK-SAME: $d15, $s30, $s31, $d14, $s28, $s29, $lr, $r0, $d21,
# CHECK-SAME: $r3, $q10, $d20, $d17, $r2, $d25, $q11, $d22, $d23,
# CHECK-SAME: $r1, $q8, $d16, $s3, $q14, $d28, $d29, $d19, $s17,
# CHECK-SAME: $d8, $s16, $r6, $r7, $r4, $q12, $q9, $d18, $s0, $q15,
# CHECK-SAME: $d30, $d31, $r12, $s1, $d0, $d24, $s2, $d1, $q0, $s6,
# CHECK-SAME: $d3, $d2, $s4, $q1, $s7, $s5, $d9, $s18, $s19, $q4
# CHECK-NEXT: liveins: $s26, $s27, $r10, $r9, $r8, $d13, $s24, $s25,
# CHECK-SAME: $d12, $d15, $s30, $s31, $d14, $s28, $s29, $lr,
# CHECK-SAME: $d21, $q10, $r7, $r0, $d20, $d17, $r2, $q12,
# CHECK-SAME: $q11, $d22, $d23, $r1, $q8, $d16, $d30, $q14,
# CHECK-SAME: $d28, $d29, $d19, $s17, $r4, $d8, $r6, $r3,
# CHECK-SAME: $s16, $d25, $q9, $d18, $s0, $d31, $s3, $q15,
# CHECK-SAME: $r12, $d0, $s1, $d24, $d1, $s2, $q0, $s5, $d2,
# CHECK-SAME: $q1, $s4, $s7, $d3, $s6, $d9, $s18, $s19, $q4
# CHECK-NEXT: {{^ $}}
# CHECK-NEXT: $r5 = t2MOVi16 target-flags(arm-lo16) @.str.84, 14 /* CC::al */, $noreg
# CHECK-NEXT: $r5 = t2MOVTi16 $r5, target-flags(arm-hi16) @.str.84, 14 /* CC::al */, $noreg
Loading