-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[RegAllocFast] Replace UsedInInstr with vector #96323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
A SparseSet adds an avoidable layer of indirection and possibly looping control flow. Avoid this overhead by using a vector instead to store UsedInInstrs and PhysRegUses. To avoid clearing the vector after every instruction, use a monotonically increasing counter. The two maps are now merged and the lowest bit indicates whether the use is relevant for the livethrough handling code only.
@llvm/pr-subscribers-llvm-regalloc Author: Alexis Engelke (aengelke) ChangesA SparseSet adds an avoidable layer of indirection and possibly looping control flow. Avoid this overhead by using a vector to store UsedInInstrs and PhysRegUses. To avoid clearing the vector after every instruction, use a monotonically increasing counter. The two maps are now merged and the lowest bit indicates whether the use is relevant for the livethrough handling code only. Not really related: iterating over regunits isn't fast. I assume (but haven't verified) that this comes from data dependencies from the diff list storage. I think a more efficient way to map regs to reg units, maybe even by reordering enum values, would improve performance. (It's not really slow either, but it happens extremely often and is therefore noticable.) Full diff: https://github.com/llvm/llvm-project/pull/96323.diff 1 Files Affected:
diff --git a/llvm/lib/CodeGen/RegAllocFast.cpp b/llvm/lib/CodeGen/RegAllocFast.cpp
index fa9b7576bf9ed..3a29a703a88a3 100644
--- a/llvm/lib/CodeGen/RegAllocFast.cpp
+++ b/llvm/lib/CodeGen/RegAllocFast.cpp
@@ -253,11 +253,19 @@ class RegAllocFastImpl {
SmallVector<MachineInstr *, 32> Coalesced;
- using RegUnitSet = SparseSet<uint16_t, identity<uint16_t>>;
- /// Set of register units that are used in the current instruction, and so
+ /// Track register units that are used in the current instruction, and so
/// cannot be allocated.
- RegUnitSet UsedInInstr;
- RegUnitSet PhysRegUses;
+ ///
+ /// In the first phase (tied defs/early clobber), we consider also physical
+ /// uses, afterwards, we don't. if the lowest bit isn't set, it's a solely
+ /// physical use (markPhysRegUsedInInstr), otherwise, it's a normal use. To
+ /// avoid resetting the entire vector after every instruction, we track the
+ /// instruction "generation" in the remaining 31 bits -- this meands, that if
+ /// UsedInInstr[Idx] < InstrGen, the register unit is unused. InstrGen is
+ /// never zero and always incremented by two.
+ uint32_t InstrGen;
+ SmallVector<unsigned, 0> UsedInInstr;
+
SmallVector<unsigned, 8> DefOperandIndexes;
// Register masks attached to the current instruction.
SmallVector<const uint32_t *> RegMasks;
@@ -271,7 +279,7 @@ class RegAllocFastImpl {
/// Mark a physreg as used in this instruction.
void markRegUsedInInstr(MCPhysReg PhysReg) {
for (MCRegUnit Unit : TRI->regunits(PhysReg))
- UsedInInstr.insert(Unit);
+ UsedInInstr[Unit] = InstrGen | 1;
}
// Check if physreg is clobbered by instruction's regmask(s).
@@ -285,26 +293,25 @@ class RegAllocFastImpl {
bool isRegUsedInInstr(MCPhysReg PhysReg, bool LookAtPhysRegUses) const {
if (LookAtPhysRegUses && isClobberedByRegMasks(PhysReg))
return true;
- for (MCRegUnit Unit : TRI->regunits(PhysReg)) {
- if (UsedInInstr.count(Unit))
- return true;
- if (LookAtPhysRegUses && PhysRegUses.count(Unit))
+ for (MCRegUnit Unit : TRI->regunits(PhysReg))
+ if (UsedInInstr[Unit] >= (InstrGen | !LookAtPhysRegUses))
return true;
- }
return false;
}
/// Mark physical register as being used in a register use operand.
/// This is only used by the special livethrough handling code.
void markPhysRegUsedInInstr(MCPhysReg PhysReg) {
- for (MCRegUnit Unit : TRI->regunits(PhysReg))
- PhysRegUses.insert(Unit);
+ for (MCRegUnit Unit : TRI->regunits(PhysReg)) {
+ assert(UsedInInstr[Unit] <= InstrGen && "non-phys use before phys use?");
+ UsedInInstr[Unit] = InstrGen;
+ }
}
/// Remove mark of physical register being used in the instruction.
void unmarkRegUsedInInstr(MCPhysReg PhysReg) {
for (MCRegUnit Unit : TRI->regunits(PhysReg))
- UsedInInstr.erase(Unit);
+ UsedInInstr[Unit] = 0;
}
enum : unsigned {
@@ -1382,7 +1389,12 @@ void RegAllocFastImpl::allocateInstruction(MachineInstr &MI) {
// - The "free def operands" step has to come last instead of first for tied
// operands and early-clobbers.
- UsedInInstr.clear();
+ InstrGen += 2;
+ // In the event we ever get more than 2**31 instructions...
+ if (InstrGen == 0) {
+ UsedInInstr.assign(UsedInInstr.size(), 0);
+ InstrGen = 2;
+ }
RegMasks.clear();
BundleVirtRegsMap.clear();
@@ -1443,8 +1455,6 @@ void RegAllocFastImpl::allocateInstruction(MachineInstr &MI) {
// heuristic to figure out a good operand order before doing
// assignments.
if (NeedToAssignLiveThroughs) {
- PhysRegUses.clear();
-
while (ReArrangedImplicitOps) {
ReArrangedImplicitOps = false;
findAndSortDefOperandIndexes(MI);
@@ -1769,10 +1779,8 @@ bool RegAllocFastImpl::runOnMachineFunction(MachineFunction &MF) {
MRI->freezeReservedRegs();
RegClassInfo.runOnMachineFunction(MF);
unsigned NumRegUnits = TRI->getNumRegUnits();
- UsedInInstr.clear();
- UsedInInstr.setUniverse(NumRegUnits);
- PhysRegUses.clear();
- PhysRegUses.setUniverse(NumRegUnits);
+ InstrGen = 0;
+ UsedInInstr.assign(NumRegUnits, 0);
// initialize the virtual->physical register map to have a 'null'
// mapping for all virtual registers
|
llvm/lib/CodeGen/RegAllocFast.cpp
Outdated
UsedInInstr.clear(); | ||
InstrGen += 2; | ||
// In the event we ever get more than 2**31 instructions... | ||
if (InstrGen == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth putting an unlikely here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't change assembly, but added it anyway for clarity.
A SparseSet adds an avoidable layer of indirection and possibly looping control flow. Avoid this overhead by using a vector to store UsedInInstrs and PhysRegUses. To avoid clearing the vector after every instruction, use a monotonically increasing counter. The two maps are now merged and the lowest bit indicates whether the use is relevant for the livethrough handling code only.
A SparseSet adds an avoidable layer of indirection and possibly looping control flow. Avoid this overhead by using a vector to store UsedInInstrs and PhysRegUses.
To avoid clearing the vector after every instruction, use a monotonically increasing counter. The two maps are now merged and the lowest bit indicates whether the use is relevant for the livethrough handling code only.
http://llvm-compile-time-tracker.com/compare.php?from=739a9605677dd736971b17a7888f9d18fd245904&to=86e8efec8f9e09e7ed0aaaf47d5bea3fe793b1f6&stat=instructions:u
Not really related: iterating over regunits isn't fast. I assume (but haven't verified) that this comes from data dependencies from the diff list storage. I think a more efficient way to map regs to reg units, maybe even by reordering enum values, would improve performance. (It's not really slow either, but it happens extremely often and is therefore noticable.)