[AMDGPU] Emit s_singleuse_vdst instructions when a register is used multiple times in the same instruction. #89601
Changes from all commits
d8598a3
81b2d80
9ded9f8
71adc8f
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -83,37 +83,42 @@ class AMDGPUInsertSingleUseVDST : public MachineFunctionPass { | |
// instruction to be marked as a single use producer. | ||
bool AllProducerOperandsAreSingleUse = true; | ||
|
||
for (const auto &Operand : MI.operands()) { | ||
if (!Operand.isReg()) | ||
continue; | ||
const auto Reg = Operand.getReg(); | ||
|
||
// Count the number of times each register is read. | ||
if (Operand.readsReg()) | ||
for (const MCRegUnit &Unit : TRI->regunits(Reg)) | ||
RegisterUseCount[Unit]++; | ||
|
||
// Do not attempt to optimise across exec mask changes. | ||
if (MI.modifiesRegister(AMDGPU::EXEC, TRI)) { | ||
for (auto &UsedReg : RegisterUseCount) | ||
UsedReg.second = 2; | ||
} | ||
// Gather a list of Registers used before updating use counts to avoid | ||
// double counting registers that appear multiple times in a single | ||
// MachineInstr. | ||
SmallVector<MCRegUnit> RegistersUsed; | ||
|
||
// If we are at the point where the register first became live, | ||
// check if the operands are single use. | ||
if (!MI.modifiesRegister(Reg, TRI)) | ||
continue; | ||
for (const auto &Operand : MI.all_defs()) { | ||
const auto Reg = Operand.getReg(); | ||
|
||
const auto RegUnits = TRI->regunits(Reg); | ||
if (any_of(RegUnits, [&RegisterUseCount](const MCRegUnit &Unit) { | ||
if (any_of(RegUnits, [&RegisterUseCount](const MCRegUnit Unit) { | ||
return RegisterUseCount[Unit] > 1; | ||
})) | ||
AllProducerOperandsAreSingleUse = false; | ||
|
||
// Reset uses count when a register is no longer live. | ||
for (const MCRegUnit &Unit : RegUnits) | ||
for (const MCRegUnit Unit : RegUnits) | ||
RegisterUseCount.erase(Unit); | ||
} | ||
|
||
for (const auto &Operand : MI.all_uses()) { | ||
What happens with regmask operands? What happens with call uses?
From the point of view of this pass, I'm not sure these need special handling. The pass will treat them as regular instructions, just with a large number of registers used.
I guess you should disable all optimization across a call, like you do across exec mask changes, but that can be a separate patch.
||
const auto Reg = Operand.getReg(); | ||
|
||
// Count the number of times each register is read. | ||
for (const MCRegUnit Unit : TRI->regunits(Reg)) { | ||
if (!is_contained(RegistersUsed, Unit)) | ||
RegistersUsed.push_back(Unit); | ||
} | ||
} | ||
for (const MCRegUnit Unit : RegistersUsed) | ||
RegisterUseCount[Unit]++; | ||
|
||
// Do not attempt to optimise across exec mask changes. | ||
if (MI.modifiesRegister(AMDGPU::EXEC, TRI)) { | ||
for (auto &UsedReg : RegisterUseCount) | ||
UsedReg.second = 2; | ||
} | ||
if (AllProducerOperandsAreSingleUse && SIInstrInfo::isVALU(MI)) { | ||
// TODO: Replace with candidate logging for instruction grouping | ||
// later. | ||
|
Anything tracking per-instruction liveness should be split into a helper function. I feel like there's already something to do something similar for you somewhere, but I can't seem to find it. Is this just building a map from regunit to a 0-1-or-2 counter?
Thanks for the feedback.
Essentially, yes. We are counting the uses but the only values we care about are 0, 1 or >1.
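To make the "0, 1 or >1" idea concrete, here is a minimal standalone sketch of the per-instruction tracking split into a helper, as the review suggests. It is not the code from this patch and does not use LLVM headers: `RegUnit`, `UseState`, `Instr` and `UseTracker` are hypothetical stand-ins, and the sketch assumes instructions are visited bottom-up within a block. It mirrors the order of operations in the diff: check and erase the defs, count each used register unit once per instruction, then saturate every live counter when the exec mask changes.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the LLVM types used by the pass.
using RegUnit = unsigned;

// Only three states matter: never read, read once, read more than once.
enum class UseState : std::uint8_t { Unused = 0, SingleUse = 1, MultiUse = 2 };

struct Instr {
  std::vector<std::vector<RegUnit>> Defs; // register units of each def operand
  std::vector<std::vector<RegUnit>> Uses; // register units of each use operand
  bool ModifiesExec = false;
};

class UseTracker {
  std::unordered_map<RegUnit, UseState> State;

  // Saturating increment: Unused -> SingleUse -> MultiUse -> MultiUse.
  static UseState bump(UseState S) {
    return S == UseState::Unused ? UseState::SingleUse : UseState::MultiUse;
  }

public:
  // Visit instructions bottom-up. Returns true if no register unit defined
  // by MI has been read more than once by the instructions already visited.
  bool visit(const Instr &MI) {
    bool AllDefsSingleUse = true;
    for (const auto &DefUnits : MI.Defs) {
      for (RegUnit U : DefUnits) {
        auto It = State.find(U);
        if (It == State.end())
          continue;
        if (It->second == UseState::MultiUse)
          AllDefsSingleUse = false;
        State.erase(It); // the unit is not live above its definition
      }
    }

    // Collect each used unit once so that an instruction reading the same
    // register in two operands contributes a single use.
    std::vector<RegUnit> Used;
    for (const auto &UseUnits : MI.Uses)
      for (RegUnit U : UseUnits)
        if (std::find(Used.begin(), Used.end(), U) == Used.end())
          Used.push_back(U);
    for (RegUnit U : Used)
      State[U] = bump(State[U]); // operator[] default-inserts Unused

    // Be conservative across exec mask changes: treat every live unit as
    // multi-use.
    if (MI.ModifiesExec)
      for (auto &Entry : State)
        Entry.second = UseState::MultiUse;

    return AllDefsSingleUse;
  }
};
```

With this shape, an instruction that reads the same register in two operands leaves its producer eligible, while two separate consumers, or an intervening exec write, do not. Disabling the optimisation across calls, as suggested in the review, could follow the same pattern as the exec case.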