Skip to content

[RFC][MC] Cache MCRegAliasIterator #93510

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 14, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
114 changes: 90 additions & 24 deletions llvm/include/llvm/MC/MCRegisterInfo.h
Original file line number Diff line number Diff line change
Expand Up @@ -187,6 +187,9 @@ class MCRegisterInfo {
DenseMap<MCRegister, int> L2SEHRegs; // LLVM to SEH regs mapping
DenseMap<MCRegister, int> L2CVRegs; // LLVM to CV regs mapping

mutable DenseMap<MCPhysReg, std::vector<MCPhysReg>> RegAliasesCache;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also just be a std::vector<std::unique_ptr<std::vector<MCPhysReg>>> where register IDs are the indexes into the vector.

It'd use more space upfront (for AMDGPU, about 9000*8 bytes) but makes accesses a lot faster, and would be more memory efficient if we end up caching tons of aliases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the total size of this map (worst case) for AMDGPU?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realistically I don't know if a program could use all registers available. I don't think so.
Theoretically, there are 9000 registers and I'm not sure how much space DenseMap needs, I think it's a lot more than what std::vector would use

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theoretical minimum size of worst case would be 897289722 -> ~154MiB?
The upfront cost of a 70KiB vector version without the risk of multiple megabyte map seems preferable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I counted correctly, there will be 1'864'820 registers in all vectors in the map if every register is asked for its aliases. The number doesn't look like horribly large, so I guess caching is a reasonable trade-off.

ArrayRef<MCPhysReg> getCachedAliasesOf(MCPhysReg R) const;

/// Iterator class that can traverse the differentially encoded values in
/// DiffLists. Don't use this class directly, use one of the adaptors below.
class DiffListIterator
Expand Down Expand Up @@ -263,6 +266,7 @@ class MCRegisterInfo {
friend class MCRegUnitIterator;
friend class MCRegUnitMaskIterator;
friend class MCRegUnitRootIterator;
friend class MCRegAliasIterator;

/// Initialize MCRegisterInfo, called by TableGen
/// auto-generated routines. *DO NOT USE*.
Expand Down Expand Up @@ -726,60 +730,122 @@ class MCRegUnitRootIterator {
/// MCRegAliasIterator enumerates all registers aliasing Reg. If IncludeSelf is
/// set, Reg itself is included in the list. This iterator does not guarantee
/// any ordering or that entries are unique.
///
/// This iterator can work in two modes: cached and uncached.
///
/// In Uncached mode, this uses RegUnit/RegUnitRoot/SuperReg iterators to
/// find all aliases. This is very expensive if the target (such as AMDGPU)
/// has a lot of register and register aliases. It can also cause us
/// to iterate the same register many times over.
///
/// In cached mode, the iterator requests MCRegisterInfo for a cache.
/// MCRegisterInfo then returns a cached list of sorted, uniqued
/// aliases for that register. This allows the iterator to be faster
/// past the first request, but also to iterate much less in some
/// cases, giving a slight speedup to any code that's using it.
class MCRegAliasIterator {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if I should move the old implementation to a MCRegAliasIteratorUncached class, and then make MCRegAliasIterator the default cached implementation.

It makes things a bit cleaner (and makes the caching system easier to remove someday), but makes it very difficult to add a cache "killswitch" if we feel like one is necessary (like if we don't want caching on all targets).

private:
MCRegister Reg;
const MCRegisterInfo *MCRI;
bool IncludeSelf;
bool IncludeSelf : 1;
bool IsCached : 1;

/// TODO: Should we dynamically allocate this? If we don't make caching
/// specific for each target, we probably should in order to keep the size of
/// this object under control. RegAliasIterator is currently 80 bytes.
struct UncachedData {
MCRegUnitIterator RI;
MCRegUnitRootIterator RRI;
MCSuperRegIterator SI;
};

MCRegUnitIterator RI;
MCRegUnitRootIterator RRI;
MCSuperRegIterator SI;
struct CachedData {
ArrayRef<MCPhysReg> Aliases;
// Index into Aliases. The actual index is (Idx - IncludeSelf) because, when
// IncludeSelf is used, we use zero to represent self.
unsigned Idx;
};

union {
UncachedData Iters;
CachedData Cache;
};

public:
MCRegAliasIterator(MCRegister Reg, const MCRegisterInfo *MCRI,
bool IncludeSelf)
: Reg(Reg), MCRI(MCRI), IncludeSelf(IncludeSelf) {
bool IncludeSelf, bool UseCache = true)
: Reg(Reg), MCRI(MCRI), IncludeSelf(IncludeSelf) {

IsCached = UseCache;
if (IsCached) {
Cache.Aliases = MCRI->getCachedAliasesOf(Reg);
Cache.Idx = 0;
return;
}

// Initialize the iterators.
for (RI = MCRegUnitIterator(Reg, MCRI); RI.isValid(); ++RI) {
for (RRI = MCRegUnitRootIterator(*RI, MCRI); RRI.isValid(); ++RRI) {
for (SI = MCSuperRegIterator(*RRI, MCRI, true); SI.isValid(); ++SI) {
if (!(!IncludeSelf && Reg == *SI))
for (Iters.RI = MCRegUnitIterator(Reg, MCRI); Iters.RI.isValid();
++Iters.RI) {
for (Iters.RRI = MCRegUnitRootIterator(*Iters.RI, MCRI);
Iters.RRI.isValid(); ++Iters.RRI) {
for (Iters.SI = MCSuperRegIterator(*Iters.RRI, MCRI, true);
Iters.SI.isValid(); ++Iters.SI) {
if (!(!IncludeSelf && Reg == *Iters.SI))
return;
}
}
}
}

bool isValid() const { return RI.isValid(); }
bool isValid() const {
return IsCached ? (Cache.Idx < (Cache.Aliases.size() + IncludeSelf))
: Iters.RI.isValid();
}

MCRegister operator*() const {
assert(SI.isValid() && "Cannot dereference an invalid iterator.");
return *SI;
if (IsCached) {
if (IncludeSelf && (Cache.Idx == 0))
return Reg;
return Cache.Aliases[Cache.Idx - IncludeSelf];
}

assert(Iters.SI.isValid() && "Cannot dereference an invalid iterator.");
return *Iters.SI;
}

void advance() {
if (IsCached) {
++Cache.Idx;
return;
}

// Assuming SI is valid.
++SI;
if (SI.isValid()) return;
++Iters.SI;
if (Iters.SI.isValid())
return;

++RRI;
if (RRI.isValid()) {
SI = MCSuperRegIterator(*RRI, MCRI, true);
++Iters.RRI;
if (Iters.RRI.isValid()) {
Iters.SI = MCSuperRegIterator(*Iters.RRI, MCRI, true);
return;
}

++RI;
if (RI.isValid()) {
RRI = MCRegUnitRootIterator(*RI, MCRI);
SI = MCSuperRegIterator(*RRI, MCRI, true);
++Iters.RI;
if (Iters.RI.isValid()) {
Iters.RRI = MCRegUnitRootIterator(*Iters.RI, MCRI);
Iters.SI = MCSuperRegIterator(*Iters.RRI, MCRI, true);
}
}

MCRegAliasIterator &operator++() {
assert(isValid() && "Cannot move off the end of the list.");
do advance();
while (!IncludeSelf && isValid() && *SI == Reg);
if (IsCached)
advance();
else {
do
advance();
while (!IncludeSelf && isValid() && *Iters.SI == Reg);
}
return *this;
}
};
Expand Down
20 changes: 20 additions & 0 deletions llvm/lib/MC/MCRegisterInfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,26 @@

using namespace llvm;

ArrayRef<MCPhysReg> MCRegisterInfo::getCachedAliasesOf(MCPhysReg R) const {
if (auto It = RegAliasesCache.find(R); It != RegAliasesCache.end())
return It->second;

// TODO: Should we have a DenseSet instead & then convert it
// to vector? Or even a BitVector that's then converted to a normal
// MCPhysReg vector?
auto &Aliases = RegAliasesCache[R];
for (MCRegAliasIterator It(R, this, /*IncludeSelf=*/false,
/*UseCache=*/false);
It.isValid(); ++It) {
Aliases.push_back(*It);
}

llvm::sort(Aliases);
Aliases.erase(unique(Aliases), Aliases.end());
Aliases.shrink_to_fit();
return Aliases;
}

MCRegister
MCRegisterInfo::getMatchingSuperReg(MCRegister Reg, unsigned SubIdx,
const MCRegisterClass *RC) const {
Expand Down
14 changes: 7 additions & 7 deletions llvm/test/CodeGen/ARM/constant-island-movwt.mir
Original file line number Diff line number Diff line change
Expand Up @@ -898,13 +898,13 @@ body: |
# CHECK-NEXT: CONSTPOOL_ENTRY 1, %const.0, 4
# CHECK-NEXT: {{^ $}}
# CHECK-NEXT: bb.2.entry (align 2):
# CHECK-NEXT: liveins: $d13, $s27, $r10, $r9, $r8, $s26, $d12, $s25, $s24,
# CHECK-SAME: $d15, $s30, $s31, $d14, $s28, $s29, $lr, $r0, $d21,
# CHECK-SAME: $r3, $q10, $d20, $d17, $r2, $d25, $q11, $d22, $d23,
# CHECK-SAME: $r1, $q8, $d16, $s3, $q14, $d28, $d29, $d19, $s17,
# CHECK-SAME: $d8, $s16, $r6, $r7, $r4, $q12, $q9, $d18, $s0, $q15,
# CHECK-SAME: $d30, $d31, $r12, $s1, $d0, $d24, $s2, $d1, $q0, $s6,
# CHECK-SAME: $d3, $d2, $s4, $q1, $s7, $s5, $d9, $s18, $s19, $q4
# CHECK-NEXT: liveins: $d13, $s26, $r10, $r9, $r8, $s27, $d12, $s24, $s25,
# CHECK-SAME: $d15, $s30, $s31, $d14, $s28, $s29, $lr, $d21, $r0,
# CHECK-SAME: $q10, $r7, $d20, $d17, $r2, $d25, $q11, $d22, $d23,
# CHECK-SAME: $s1, $q8, $d16, $s3, $q14, $d28, $d29, $d19, $s16,
# CHECK-SAME: $r4, $d8, $r6, $r3, $q12, $s17, $q9, $d18, $s0, $d31,
# CHECK-SAME: $q15, $d30, $r1, $d0, $r12, $d24, $s2, $d1, $q0, $s7,
# CHECK-SAME: $s4, $d2, $s6, $q1, $s5, $d3, $d9, $s18, $s19, $q4
# CHECK-NEXT: {{^ $}}
# CHECK-NEXT: $r5 = t2MOVi16 target-flags(arm-lo16) @.str.84, 14 /* CC::al */, $noreg
# CHECK-NEXT: $r5 = t2MOVTi16 $r5, target-flags(arm-hi16) @.str.84, 14 /* CC::al */, $noreg
Loading